Legend: marked places; the transition t' is fired; no transition can be fired.
Fig. 4.5. A Reachability Tree
ii. Pre- and Post-Conditions Calculus: This algorithm [3] is called with the root of RT and with the parameters of the current execution context. Property: For every plan Π modeled as an RPN, if the FFCC algorithm applied to the reachability tree (RT) of Π returns true (i.e. Π is consistent) and the environment is stable, then no failure situation can occur. Discussion: In order to avoid a combinatorial explosion, there exist algorithms for constructing a reduced RT. In our context, the RT can be optimized as follows: let n_i and n_j be two nodes of RT. The RT can be optimized through analysis of the method calls as follows:
- Independent Nodes: there is no interference between the Pre- and Post-Conditions of n_i on the one hand and the Pre- and Post-Conditions of n_j on the other hand, i.e. the associated transitions can be fired simultaneously whatever their ordering (the global execution is unaffected). Here, an arbitrary ordering is decided (e.g. Self.Put and Self.Go_To can be exchanged), which allows many sub-trees to be cut (the right sub-tree in Fig. 4.5).
- Semi-Independent Nodes: there is no interference between the Post-Conditions of n_i and n_j, i.e. the associated transitions do not affect the same attributes. If the exchanged firing sequences ({n_i, n_j} or {n_j, n_i}) can be detected to lead to the same marking, then the sub-tree starting from this marking can be cut. The obtained graph is then acyclic and merges the redundant sub-trees.
(Both tests are sketched right after this list.)
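The two reduction tests can be phrased as simple checks on the attributes touched by the Pre- and Post-Conditions of the transitions involved. The following sketch is only illustrative: the attribute-set representation (reads/writes) and the example methods are assumptions made here, not part of the RPN machinery of [3].

# Sketch of the two reduction tests, assuming each transition is described by
# the sets of attributes its pre-conditions read and its post-conditions write.
# The dictionary keys (reads/writes) are illustrative only.

def independent(t1, t2):
    """Neither transition interferes with the other's pre- or post-conditions,
    so they may be fired in either order (an arbitrary order can be fixed)."""
    touched1 = t1["reads"] | t1["writes"]
    touched2 = t2["reads"] | t2["writes"]
    return not (t1["writes"] & touched2) and not (t2["writes"] & touched1)

def semi_independent(t1, t2):
    """Post-conditions do not affect the same attributes: both firing orders
    lead to the same marking, so the duplicated sub-tree can be cut."""
    return not (t1["writes"] & t2["writes"])

put   = {"reads": {"holds_block"}, "writes": {"block_position", "holds_block"}}
go_to = {"reads": {"robot_position"}, "writes": {"robot_position"}}

print(independent(put, go_to))       # True: Self.Put and Self.Go_To commute
print(semi_independent(put, go_to))  # True as well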
Handling Negative Interactions: Now another agent (e.g. Cl2) sends its plan to Cv, who processes Π2 (Cl2's plan). GCOA as Generalization of the COA Algorithm [3]: The Conveyor starts a new coordination. The first and second phases are the same as in the case of positive interactions. Cv chooses the plan Π2. Negative interactions arise when the two refinements (Π1 and Π2) have shared attributes (e.g. the boat volume constraint). Now, Cv has to solve internal negative interactions before proposing a merging plan to Cl2. Again the GCOA is divided into two steps.
i. Internal Structural Merging by Sequencing: Cv connects Π1 and Π2 by creating a place p_{e,i} for each pair of transitions (t_e, t_i) in End(Π1) x Init(Π2) and two arcs, in order to generate a merged plan Πm (an executable sketch of this step is given after item ii):

Function Sequencing(in Π1, Π2: Plan): Plan;
{this function merges Π1 and Π2, produces a merged plan Πm and the synchronization places}
begin
  Let TE = {t_e in Π1 / t_e is an end transition} and TI = {t_i in Π2 / t_i is an initial transition} (i.e. t_i has no predecessor in Π2)
  for all (t_e, t_i) in TE x TI do
    Create a place p_{e,i}
    Create an input arc IA_{e,i} from t_e to p_{e,i}
    Create an output arc OA_{e,i} from p_{e,i} to t_i
    (i.e. Post(p_{e,i}, t_e) = 1 and Pre(p_{e,i}, t_i) = 1)
  endfor
  Πm := Merged_Plan(Π1, Π2, {p_{e,i}}, {IA_{e,i}}, {OA_{e,i}})
  return (Πm, {p_{e,i}})
end {Sequencing}

ii. Parallelization by Moving up Arcs: Cv applies the FFCC algorithm to the merged net Πm obtained by sequencing. If the calculus returns true then the planner proceeds to the parallelization phase by moving up the arcs recursively in order to introduce a maximum of parallelism in the plan. This algorithm [3] tries to move (or eliminate) the synchronization places. The predecessor transition of each synchronization place is replaced by its own predecessor transition in two cases: the transition which precedes the predecessor transition is not fired, or is not being fired. If both the Pre- and Post-Conditions remain valid, then a new arc replaces the old one. The result of this parallelization is to satisfy both Cl1 and Cl2 by executing the merged net Πm.
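The Sequencing step can be rendered executably on a toy plan representation. In the sketch below the dictionary-based encoding of plans, the way end and initial transitions are detected, and all identifiers are assumptions made for illustration; they are not the actual RPN implementation of [3].

# Sketch of the Sequencing step on a toy Petri-net representation.
# in_arcs[t] / out_arcs[t] list the input / output places of transition t.

def sequencing(plan1, plan2):
    """Connect every end transition of plan1 to every initial transition of
    plan2 through a fresh synchronization place; return the merged plan and
    the synchronization places."""
    end_ts = [t for t in plan1["transitions"] if not plan1["out_arcs"].get(t)]
    init_ts = [t for t in plan2["transitions"] if not plan2["in_arcs"].get(t)]

    merged = {
        "places": plan1["places"] + plan2["places"],
        "transitions": plan1["transitions"] + plan2["transitions"],
        "in_arcs": {**plan1["in_arcs"], **plan2["in_arcs"]},
        "out_arcs": {**plan1["out_arcs"], **plan2["out_arcs"]},
    }
    sync_places = []
    for te in end_ts:
        for ti in init_ts:
            p = f"p_{te}_{ti}"                               # synchronization place
            sync_places.append(p)
            merged["places"].append(p)
            merged["out_arcs"].setdefault(te, []).append(p)  # arc te -> p
            merged["in_arcs"].setdefault(ti, []).append(p)   # arc p -> ti
    return merged, sync_places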
Moving up arcs: t_e ; t_i becomes t_e // t_i.
Fig. 4.6. Sequencing and Parallelization
Remark: At each moving up the arcs, the FFCC algorithm is applied to the new net.
The exchanged plans are the old ones augmented with synchronization places upstream and downstream. This algorithm can be optimized at the consistency control level. In fact, the coherence checking can be applied in an incremental way to each previous plan Π2.
4.6 Hybrid Automata Formalism for Multi-Agent Planning
The second model of multi-agent planning we developed is based on Hybrid Automata [7], which represent an alternative formalism to deal with multi-agent planning when temporal constraints play an important role. In this modelling, the agents' behaviour (through individual plans and multi-agent plans) is state-driven. The interest of these automata is that they can model different clocks evolving with different speeds. These clocks may be the resources of each agent and the time. A Hybrid Automaton is composed of two elements, a finite automaton and a set of clocks:
• A finite automaton A is a tuple A = <Q, E, Tr, q0, l>, where Q is the finite set of states, E the set of labels, Tr the set of edges, q0 the initial location, and l an application associating with each state of Q the elementary properties verified in this state.
• A set of clocks H, used to specify quantitative constraints associated with the edges. In hybrid automata, the clocks may evolve with different speeds.
Tr is a set of edges t such that, for t in Tr, t = <s, ({g}, e, {r}), s'>, where:
• s and s' are elements of Q; they respectively model the source and the target of the edge t = <s, ({g}, e, {r}), s'>, such that:
- {g} is the set of guards; it represents the conditions on the clocks;
- e is the transition label, an element of E;
- {r} is the set of actions on the clocks.
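A direct way to capture this tuple in a program is sketched below; the class and field names are assumptions made here for illustration, not part of the formalism of [7].

# Illustrative encoding of A = <Q, E, Tr, q0, l> together with a clock set H.
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

@dataclass
class Edge:
    source: str                                   # s in Q
    guard: Callable[[Dict[str, float]], bool]     # {g}: condition on the clocks
    label: str                                    # e in E
    resets: Dict[str, float]                      # {r}: actions on the clocks
    target: str                                   # s' in Q

@dataclass
class HybridAutomaton:
    states: FrozenSet[str]                        # Q
    labels: FrozenSet[str]                        # E
    edges: List[Edge]                             # Tr
    initial: str                                  # q0
    props: Dict[str, FrozenSet[str]]              # l: state -> elementary properties
    clocks: Dict[str, float]                      # H: clock -> evolution speed

    def step(self, state: str, clocks: Dict[str, float], label: str):
        """Fire an edge with the given label if its guard holds; return the new
        state and the clock valuation after applying the reset actions."""
        for e in self.edges:
            if e.source == state and e.label == label and e.guard(clocks):
                return e.target, {**clocks, **e.resets}
        return None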
Multi-agent plans can be modeled by a network of synchronized hybrid automata (a more detailed presentation can be found in [4]). They are of particular interest since they take into account the agents' features and the time as parameters of the simulation (those variables may be modeled by different clocks evolving with different speeds inside the automata). All the parameters of the planning problem may be represented in the hybrid automata: the tasks to be accomplished are represented by the reachable states in the automata; the relation between tasks by the edges; the pre-, post- and interruption conditions by the guards of the edges; and finally the different variables by the clocks of the automata. Let us define the synchronized product. Considering n hybrid automata Ai = <Qi, Ei, Tri, q0,i, li, Hi>, for i = 1, ..., n:
• Q = Q1 x ... x Qn;
• T = {((q1, ..., qn), (e1, ..., en), (q'1, ..., q'n)) | for each i, either ei = '-' and q'i = qi, or ei ≠ '-' and (qi, ei, q'i) in Tri};
• q0 = (q0,1, q0,2, ..., q0,n);
• H = H1 x ... x Hn.
So, in this product, each automaton may do a local transition, or do nothing (the empty action, modeled by '-') during a transition. It is not necessary to synchronize all the transitions of all the automata. The synchronization consists of a set of synchronization labels that identify the transitions to be synchronized. Consequently, an execution of the synchronized product is an execution of the Cartesian product restricted to the labels of the synchronized transitions. In our case, we only synchronize the edges concerning the temporal connectors S_start and S_end. Indeed, the synchronization of individual agents' plans is done with respect to functional constraints and classical synchronization techniques of the automata formalism, like "send/receive" messages. The Hybrid Automata formalism and the associated coordination mechanisms are detailed in [5].
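A small executable sketch of the synchronized product for two finite automata follows; the tuple-based representation and the synchronization set are assumptions of this example, with '-' standing for the empty action.

# Sketch of the synchronized product of two finite automata: a component
# either fires one of its own edges or stays idle ('-'); only the label
# combinations listed in the synchronization set are kept.
from itertools import product

def sync_product(A1, A2, sync):
    """A_i = {states, edges, initial} with edges as (src, label, dst) triples;
    sync is the set of allowed label pairs."""
    edges = []
    for q1, q2 in product(A1["states"], A2["states"]):
        moves1 = [(e[1], e[2]) for e in A1["edges"] if e[0] == q1] + [("-", q1)]
        moves2 = [(e[1], e[2]) for e in A2["edges"] if e[0] == q2] + [("-", q2)]
        for (l1, t1), (l2, t2) in product(moves1, moves2):
            if (l1, l2) in sync:
                edges.append(((q1, q2), (l1, l2), (t1, t2)))
    return {
        "states": set(product(A1["states"], A2["states"])),
        "edges": edges,
        "initial": (A1["initial"], A2["initial"]),
    }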
4.7 Conclusion
The two models presented in this paper are suitable for multi-agent planning. Recursive Petri nets allow the modelling of plans (both at the agent and multi-agent levels) and their management when abstraction and dynamic refinement are required. RPN easily allows the synchronization of individual agents' plans. They are, in particular, interesting for multi-agent validation thanks to the building of the reachability tree, if combined with reduction techniques (in order to avoid the combinatorial explosion of the number of states). The main shortcoming of this model is the absence of explicit handling of temporal constraints. This is why we developed a model based on Hybrid Automata, which can represent different clocks evolving with different speeds. These clocks may be the resources of each agent and the time.
References
1. R. Alur and D. Dill. A Theory of Timed Automata. Theoretical Computer Science, Vol. 126, n. 2, pages 183-235 (1994).
2. A. Barrett and D.S. Weld. Characterizing Subgoal Interactions for Planning. In Proceedings of IJCAI-93, pp. 1388-1393 (1993).
3. A. El Fallah Seghrouchni and S. Haddad. A Recursive Model for Distributed Planning. In the proceedings of ICMAS'96. IEEE publisher, Japan (1996).
4. A. El Fallah-Seghrouchni, I. Degirmenciyan-Cartault and F. Marc. Framework for Multi-Agent Planning based on Hybrid Automata. In the proceedings of CEEMAS 03 (Central and Eastern European Conference on Multi-Agent Systems). LNAI 2691, Springer Verlag, Prague (2003).
5. A. El Fallah-Seghrouchni, F. Marc and I. Degirmenciyan-Cartault. Modelling, Control and Validation of Multi-Agent Plans in Highly Dynamic Context. To appear in the proceedings of AAMAS 04. ACM Publisher, New York (2004).
6. M.P. Georgeff. Planning. In Readings in Planning. Morgan Kaufmann Publishers, Inc., San Mateo, California (1990).
7. T.A. Henzinger. The Theory of Hybrid Automata. In the proceedings of the 11th IEEE Symposium on Logic in Computer Science, pages 278-292 (1996).
8. T.A. Henzinger, Pei-Hsin Ho and H. Wong-Toi. HyTech: a model checker for hybrid systems. Journal of Software Tools for Technology Transfer, Vol. 1, n. 1/2, pages 110-122 (2001).
9. K. Jensen. High-level Petri Nets, Theory and Application. Springer-Verlag (1991).
10. F. von Martial. Coordination of Plans in a Multi-Agent World by Taking Advantage of the Favor Relation. In Proceedings of the Tenth International Workshop on Distributed Artificial Intelligence (1990).
Creating Common Beliefs in Rescue Situations
Barbara Dunin-Kęplicz and Rineke Verbrugge
Institute of Informatics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland [email protected]
Institute of Computer Science, Polish Academy of Sciences, Ordona 21, 01-237 Warsaw, Poland
Institute of Artificial Intelligence, University of Groningen, Grote Kruisstraat 2/1, 9712 TS Groningen, The Netherlands [email protected]
Summary. In some rescue or emergency situations, agents may act individually or on the basis of minimal coordination, while in others, full-fledged teamwork provides the only means for the rescue action to succeed. In such dynamic and often unpredictable situations agents' awareness of their involvement becomes crucial, but one can expect that only beliefs can be obtained by means of communication and reasoning. A suitable level of communication should be naturally tuned to the circumstances. Thus in some situations individual belief may suffice, while in others everybody in a group should believe a fact, or even the strongest notion of common belief is the relevant one. Even though common knowledge cannot in general be established by communication, in this paper we present a procedure for establishing common beliefs in rescue situations by minimal communication. Because the low-level part of the procedure involves file transmission (e.g. by TCP or the alternating-bit protocol), next to a general assumption on trust some additional assumptions on communication channels are needed. If in the considered situation communication is hampered to such an extent that establishing a common belief is not possible, creating a special kind of mutual intention (defined by us in other papers) within a rescue team may be of help.
5.1 Introduction
Looking at emergency situations in their complexity, a rather powerful knowledge-based system is needed to cope with them in a dynamic and often unpredictable environment. In emergencies, coordination and cooperation are on the one hand vital, and on the other hand more difficult to achieve than in normal circumstances. To make the situation even more complex, time is critical for rescues to succeed, and communication is often hampered. Also, expertise from different fields is usually needed. Multiagent systems exactly fit the bill: they deliver means for organizing complex, sometimes spectacular interactions among different, physically and/or logically distributed knowledge-based entities [1]:
A MAS can be defined as a loosely coupled network of problem solvers that work together to solve problems that are beyond the individual capabilities or knowledge of each problem solver. This paper is concerned with a specific kind of MAS, namely a team. A team is a group in which the agents are restricted to having a common goal of some sort in which team-members typically cooperate and assist each other in achieving their common goal. Rescuing people from a crisis or emergency situation is a complex example of such a common goal. Emergency situations may be classified along different lines. It is not our purpose to provide a detailed classification here, but an important dimension of classification is along the need for teamwork. A central joint mental attitude addressed in teamwork is collective intention. We agree with [2] that: Joint intention by a team does not consist merely of simultaneous and coordinated individual actions; to act together, a team must be aware of and care about the status of the group effort as a whole. In some rescue situations, agents may act individually or on the basis of minimal coordination, while in others, full-fledged teamwork, based on a collective intention, provides the only means for the rescue action to succeed. MAS can be organized using different paradigms or metaphors. For teamwork, BDI (Beliefs, Desires, Intentions) systems form a proper paradigm. Thus, some multiagent systems may be viewed as intentional systems implementing practical reasoning — the everyday process of deciding, step by step, which action to perform next. This model of agency originates from Michael Bratman's theory of human rational choice and action [3]. His theory is based on a complex interplay of informational and motivational aspects, constituting together a belief-desire-intention model of rational agency. Intuitively, an agent's beliefs correspond to information the agent has about the environment, including other agents. An agent's desires or goals represent states of affairs (options) that the agent would choose. Finally, an agent's intentions represent a special subset of its desires, namely the options that it has indeed chosen to achieve. The decision process of a BDI agent leads to the construction of agent's commitment, leading directly to action execution. The BDI model of agency comprises beliefs referring to agent's informational attitudes, intentions and then commitments referring to its motivational attitudes. The theory of informational attitudes has been formalized in terms of epistemic logic as in [4, 5]. As regards motivational attitudes, the situation is much more complex. In Cooperative Problem Solving (henceforth CPS), a group as a whole needs to act in a coherent pre-planned way, presenting a unified collective motivational attitude. This attitude, while staying in accordance with individual attitudes of group members, should have a higher priority than individual ones. Thus, from the perspective of CPS these attitudes are considered on three levels: individual, social (bilateral), and collective.
When analyzing rescue situations from the viewpoint of BDI systems, one of the first purposes is to define the scope and strength of the motivational and informational attitudes needed for successful team action. These determine the strength and scope of the necessary communication. In [6], [7], we give a generic method for the system developer to tune the type of collective commitment to the application in question, to the organizational structure of the group or institution, and to the environment, especially to its communicative possibilities. In this paper, however, the essential question is in what terms to define the communication necessary for teamwork in rescue situations. Knowledge, which always corresponds to the facts and can be justified by a formal proof or less rigorous argumentation, is the strongest and therefore preferred informational attitude. The strongest notion of knowledge in a group is common knowledge, which is the basis of all conventions and the preferred basis of coordination. Halpern and Moses proved that common knowledge of certain facts is on the one hand necessary for coordination in well-known standard examples, while on the other hand it cannot be established by communication if there is any uncertainty about the communication channel [4]. In practice in MAS, agents make do with belief instead of knowledge for at least the following reasons. First, in MAS perception provides the main background for beliefs. In a dynamic unpredictable environment the natural limits of perception may give rise to false beliefs or to beliefs that, while true, still cannot be fully justified by the agent. Second, communication channels may be of uncertain quality, so that even if a trustworthy sender knows a certain fact, the receiver may only believe it. Common belief is the notion of group belief which is constructed in a similar way as common knowledge. Thus, even though it puts fewer constraints on the communication environment than common knowledge, it is still logically highly complex. For efficiency reasons it is often important to minimize the level of communication among agents. This level should be tuned to the circumstances under consideration. Thus in some situations individual belief may suffice, while in others everybody in a group should believe a fact, and in yet others the strongest notion of common belief is needed. In this paper we aim to present a method for establishing common beliefs in rescue situations by minimal communication. If in the considered situation communication is hampered to such an extent that establishing a common belief is not possible, we attempt some alternative solutions. The paper is structured in the following manner. In Section 5.2, a short reminder is given about individual and group notions of knowledge and belief, and the difficulty of achieving common belief in certain circumstances. Then, a procedure for creating common beliefs is introduced in Section 5.3, which also discusses the assumptions on the environment and the agents that are needed for the procedure to be effective. Section 5.4 presents three case studies of rescue situations where various collective attitudes enabling appropriate teamwork are established, tuned to the communicative possibilities of the environment. Finally, Section 5.5 discusses related work and provides some ideas about future research.
5.2 Knowledge and belief in groups
In multiagent systems, agents' awareness of the situation they are involved in is a necessary ingredient. Awareness in MAS is understood as a reduction of the general meaning of this notion to the state of an agent's beliefs (or knowledge when possible) about itself, about other agents, as well as about the state of the environment, including the situation they are involved in. Assuming such a scope of this notion, different epistemic logics can be used when modelling agents' awareness. This awareness may be expressed in terms of any informational (individual or collective) attitude fitting the given circumstances. In rescue situations, where the situation is usually rather complex and hard to predict, one can expect that only beliefs can be obtained.
5.2.1 Individual and common beliefs
To represent beliefs, we adopt a standard KD45_n system for n agents as explained in [4], where we take BEL(i, φ) to have as intended meaning "agent i believes proposition φ". A stronger notion than the one for belief is knowledge, often called "justified true belief". The usual axiom system for individual knowledge within a group is S5_n, i.e. a version of KD45_n where the consistency axiom is replaced by the (stronger) truth axiom KNOW(i, φ) → φ. We do not define knowledge in terms of belief. Definitions occurring in the MAS-literature (such as KNOW(i, φ) ↔ BEL(i, φ) ∧ φ) ...
C1. E-BEL_G(φ) ↔ ∧_{i∈G} BEL(i, φ)
C2. C-BEL_G(φ) ↔ E-BEL_G(φ ∧ C-BEL_G(φ))
RC1. From φ → E-BEL_G(ψ ∧ φ) infer φ → C-BEL_G(ψ)
R2. From φ infer BEL(i, φ)
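On a finite doxastic model, the interplay of E-BEL_G and C-BEL_G expressed by axiom C2 above can be illustrated by a greatest-fixpoint computation: C-BEL_G(φ) holds exactly at the worlds that survive repeated application of E-BEL_G(φ ∧ ·). The toy model below and all names are illustrative assumptions, not part of the original text.

# Toy model: worlds, one accessibility relation per agent, a valuation of phi.
# C-BEL_G(phi) is computed as the greatest fixpoint of
#   X = { w : every world reachable from w by some agent in G is in phi ∩ X }.

def e_bel(worlds, rel, group, target):
    """Worlds where every agent in the group believes 'target'."""
    return {w for w in worlds
            if all(rel[i].get(w, set()) <= target for i in group)}

def c_bel(worlds, rel, group, phi):
    """Greatest fixpoint: start from all worlds and shrink."""
    x = set(worlds)
    while True:
        new_x = e_bel(worlds, rel, group, phi & x)
        if new_x == x:
            return x
        x = new_x

worlds = {1, 2, 3}
rel = {"a": {1: {2}, 2: {2}, 3: {3}},       # agent a's accessibility relation
       "b": {1: {2}, 2: {2}, 3: {2}}}       # agent b's accessibility relation
phi = {2}                                    # worlds where phi holds
print(c_bel(worlds, rel, ["a", "b"], phi))   # {1, 2}: common belief of phi holds there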
) is the only clause that defines new0 in PK ∪ {δ0}, we have that sat(s0, φ) ∈ M(PK) iff new0 ∈ M(PK ∪ {δ0}), (iii) since the transformation rules, when applied according to the transformation strategy UFV, preserve the perfect model, new0 ∈ M(PK ∪ {δ0}) iff new0 ∈ M(T), and finally, (iv) by the definition of perfect model, (iv.1) if new0 ← occurs in T then new0 ∈ M(T) and (iv.2) if no clause with head new0 occurs in T then new0 ∉ M(T). Let us return to our example of Figure 7.1 and let us suppose that we want to verify that K0, s0 |= ¬ef ¬ef a holds, that is, for every state reachable from the initial state s0 it is possible to get to a state where a holds. Let P0 be the program constructed at Step 1 as indicated in Section 7.3. Step 2 of our verification method consists in transforming the program P0 ∪ {δ0}, where δ0 is the clause new0 ← sat(s0, ¬ef ¬ef a), into a program where new0 ← occurs. We will present this transformation in Section 7.3.3.
7.3.1 The Transformation Rules
The process of transforming a given program PK, thereby deriving program T, can be formalized as a sequence P0, ..., Pn of programs, called a transformation sequence, where: (i) P0 = PK, (ii) Pn = T, and (iii) for i = 0, ..., n-1, program Pi+1 is obtained from program Pi by applying one of the transformation rules listed below. These rules are variants of the rules considered in the literature for transforming logic programs (see, in particular, [20, 21]). The atomic definition rule allows us to introduce a new predicate definition by adding to program Pi a new clause whose body consists of one atom only. We can use this rule to add clause δ0 to program PK at the beginning of Step 2 of our verification method.
R1. Atomic Definition. We introduce a new clause, called a definition, of the form:
δ : newp(X1, ..., Xm) ← A
where: (i) newp is a predicate symbol not occurring in P0, ..., Pi, (ii) X1, ..., Xm are the variables occurring in A, and (iii) the predicate of A occurs in P0. By atomic
definition (or definition, for short), we derive the new program P^^-i = P^ U {5}. For i > 0, DefSi denotes the set of definitions introduced during the transformation sequence PQ^. .. ^Pi.ln particular, Defs^ = 0. The unfolding rule corresponds to a symbolic computation step. It replaces a clause 7 in Pi by the set of all clauses that can be derived by applying a resolution step w.r.t. a literal L occurring in the body of 7. We have a positive and a negative unfolding rule, according to the case where L is a positive or negative literal. Notice that in the negative unfolding rule (see case R2n below) the literal L should be either valid or failed. We say that an atom A is valid in a program P iff there exists a unit clause i7 ^— in P such that A is an instance of H. We say that an atom A is^ failed in P iff there exists no clause H ^ Gin P such that A is unifiable with B. The negated atom -^A is valid iff A is failed and -^A is failed iff A is valid. R2. Unfolding. Let 7 : H '^ Gi A L A G2 be a. clause in Pi and let P/ be a variant of Pi without common variables with 7. We consider the following two cases. (R2p: Positive Unfolding) Let L be a positive literal. By unfolding ^ w.r.t, L v/e derive the set of clauses r : {{H ^GiAGA G2)'d | (i) i^ ^ G is a clause in P^ and (ii) L and K are unifiable with mgu 'd} We derive the new program P^+i = (P^ — {7}) U P. (R2n: Negative Unfolding) Let L be a negative literal, (i) If L is valid in P/, then by unfolding ^ w.r.t. L ^e derive the clause
η : H ← G1 ∧ G2
and we derive the new program P^+i = {Pi - {7}) U {7]}. (ii) If L is failed in P/, then by unfolding j w.r.t. L we derive the new program P,+i =Pi- {7}. The atomic folding rule allows us to replace an atom A which is an instance of the body of a definition by the corresponding instance of the head of the definition. R3. Atomic Folding. Let 7 : i? ^ Gi AL AG2 be a clause in Pi and IciS : N ^ A be a clause in Defs^ without common variables with 7. We consider the following two cases. (R3p: Positive Folding) Let L be the atom A'd for some substitution i9. By folding 7 w.r.t. A using S we derive the clause
η : H ← G1 ∧ Nϑ ∧ G2
and we derive the new program Pi_{-i = {Pi — {7}) U {r/}. (R3n: Negative Folding) Let L be the negated atom ^Ad. By folding 7 w.r.t. -^A using S, we derive the clause
η : H ← G1 ∧ ¬Nϑ ∧ G2
and we derive the new program Pi^i = {Pi — {7}) U {rj}. The following clause removal rule may be used for removing from Pi a redundant clause 7, that is, a clause 7 such that M{Pi) = M{Pi - {7}). Let us first introduce the following definitions. The set of useless predicates in a program P is the maximal set U of predicate symbols occurring in P such that a predicate p is in C/ iff for every clause p{...) ^— G in P , the body G is of the form Gi A q{...) A G2 for some predicate g in C/. A clause is useless iff the predicate in its head is useless.
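The set of useless predicates is the largest set closed under the condition just stated, so it can be computed as a greatest fixpoint. The following sketch does this on a simplified clause representation (head predicate plus the predicates occurring in the body); the representation is an assumption of this example, not the notation of the paper.

# Sketch: computing the useless predicates of a program given as a list of
# clauses (head_predicate, [body_predicates]).  Start from all head predicates
# and repeatedly discard those having some clause whose body mentions no
# (currently) useless predicate; what remains is the maximal set U.

def useless_predicates(clauses):
    useless = {h for h, _ in clauses}
    changed = True
    while changed:
        changed = False
        for h, body in clauses:
            # a clause whose body contains no useless predicate "supports" h
            if h in useless and not (set(body) & useless):
                useless.discard(h)
                changed = True
    return useless

program = [("p", ["q"]), ("q", ["p"]), ("r", []), ("s", ["r"])]
print(useless_predicates(program))   # {'p', 'q'}: p and q only feed each other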
R4. Clause Removal, Let 7 be a clause in Pi. By clause removal we derive the new program P^+i = Pi — {7} if one of the following cases occurs: (R4s: Removal of Subsumed Clause) 7 is a clause H
Phase B. Remove&Unfold(PA, T)
The f/ipy strategy is divided into two phases: Phase A and Phase B. Phase A takes program PK, and clause ^o as input and returns program PA as output. Initially program PA is the empty set of clauses. During Phase A we use the following two sets of clauses: (1) Defs, which is the set of definitions introduced during the transformation process, and (2) A, which is the set of definitions that have been introduced but not yet unfolded. At the beginning of Phase A we apply the definition rule and we add clause 6Q to Defs and A. Then the strategy performs a WHILE-DO loop whose body consists of applications of the unfolding rule according to the Unfold procedure (see below), followed by applications of the definition and folding rules according to the DefineSzFold procedure (see below). The Unfold procedure takes a definition clause 5 e A and derives a set F of clauses by applying one or more times the positive unfolding rule starting from clause S. Every clause in F is of the form iJ ^ Li A . . . A Ln, where n > 0 and each literal Li is either an atom of the form sat{s, V^) or a negated atom of the form -^sat{s, ijj). The DefineSzFold procedure takes as input: (i) the set F of clauses and (ii) the set Defs of definitions, and returns as output: (i) a set NewDefs of new definitions and (ii) a set ^ of transformed clauses. NewDefs is the set of definitions of the form newp
the definitions of NewDefs are added to Defs and, in the set A, clause 5 is replaced by the clauses of NewDefs. Phase B is realized by the Removek, Unfold procedure, which transforms PA by repeatedly removing useless clauses, subsumed clauses, and applying the positive and the negative unfolding rule w.r.t. valid or failed literals. Upon termination this procedure returns a program T where new^ is either valid or failed, that is, either (i) the unit clause new^ ^— occurs in T or (ii) no clause with head new^ occurs in T. In case (i) we have that /C, SQ \= ^ and in case (ii) we have that /C, SQ ^ (/?. For a Kripke structure /C with a finite set of states the L^FV strategy terminates. In particular, only a finite number of definitions will be introduced during the execution of the WHILE-DO loop because only a finite number of distinct atoms of the form sat{s^ ^) can be generated. Indeed s is an element of a finite set of states and ^ is a (proper or not) subformula of the given formula ^. Thus, the UFV strategy is a decision procedure for checking whether or not /C, 5o \= ^ holds for any given finite state Kripke structure /C, initial state 5o, and temporal formula (f. 7.3.3 An Example of Application of the Verification Method In this section we will see our transformation strategy in action for the verification of a property of the finite state system of Figure 7.1. We want to show that in that system, for every state which is reachable from the initial state SQ, it is possible to get to a state where a holds, that is, /Co, 5o |= ag ef a. The initial program PQ which encodes the Kripke structure /Co, is the following: 1. sat{s2,CL) <— 4. sat{sQ^ef F) <— sat{si,ef F) 2. sat{S,-^F) ^ -^sat(S,F) 5. satlsi, ef F) ^ sat{so, ef F) 3. sat{S, ef F) ^ sat(S,F) 6. sat{si, ef F) ^ sat{s2, efF) 7. sat{s2, ef F) <— sat{s2, efF) Program PQ has been obtained by unfolding the program Pjc which encodes a generic Kripke structure /C, by using the clauses which define the relations elem and t relative to /Co (see Section 7.2). We have not considered the clauses for temporal formulas of the form F l A F2 and afF because they are not needed during the application of the transformation strategy. The clause 5Q is newO ^— sat{soj -^ef ^ef a). (Recall that ag ef ^ is equivalent to ->ef ->ef (f.) By applying the Unfold procedure, we unfold clause 5o using clause 2 and we get: 8. newO ^— ^sat{so, ef -> ef a). Then we apply the DefinekFold procedure. We introduce the definition: 9. newl <— sat{sQ, ef -> ef a) and we fold clause 8 using clause 9. We get the following clause: 10. newO
11. newl
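The transformation above certifies that K0, s0 |= ag ef a. Since K0 is finite, the same property can also be checked by a direct explicit-state computation, which helps to see what the derived unit clause new0 ← stands for. The sketch below hard-codes the transition relation read off clauses 4-7 of P0 (t(s0, s1), t(s1, s0), t(s1, s2), t(s2, s2)) and the fact that a holds in s2 (clause 1); everything else in it is illustrative.

# Explicit-state check of K0, s0 |= ag ef a.

t = {"s0": {"s1"}, "s1": {"s0", "s2"}, "s2": {"s2"}}
holds_a = {"s2"}

def ef(target):
    """Backward fixpoint: states from which some path reaches 'target'."""
    x = set(target)
    changed = True
    while changed:
        changed = False
        for s, succs in t.items():
            if s not in x and succs & x:
                x.add(s)
                changed = True
    return x

def reachable(start):
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(t[s])
    return seen

# ag ef a  ==  not ef not (ef a): every reachable state can still reach a.
print(reachable("s0") <= ef(holds_a))   # True, i.e. K0, s0 |= ag ef a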
7.4 Verification of Infinite State Systems Our transformational approach for the verification of properties of protocols can be extended from finite state systems to infinite state systems. In order to specify infinite state systems we extend the method described in Section 7.2 by using constraint logic programs [15]. These programs generalize logic programs by allowing the bodies of the clauses to contain constraints. Constraints are formulas that define relations over some given domains, such as real numbers, integers, and trees. For our purposes here, constraints are simply first order formulas whose predicate symbols are taken from a distinct set. These constraints can be evaluated by using constraint solvers that are realized by ad hoc algorithms. The semantics of a constraint c is defined by the usual first order satisfaction relation V \= c, where X> is a fixed interpretation. The notion of perfect model can be extended from logic programs to constraint logic programs in a straightforward way [9].
A transition relation t on an infinite set of states can be specified by using constraints in the body of the clauses that define t. For instance, the clause t{X, Y) <— X >0 AY = X'\-1 specifies a transition relation on the set of the integer numbers. We will see a more elaborate example in Section 7.4.1. In order to encode the satisfaction relation JC,s \= (pfora, Kripke structure /C with an infinite set of states, we can construct a constraint logic program Pjc similarly to what we have described in Section 7.2. However, in order to encode the satisfiability of a temporal formula of the form af (p, the method of Section 7.2 introduces a clause for each state of/C and this is impossible for infinite state systems. Fortunately, this problem can be overcome for a large class of infinite state systems by using constraints as indicated in [9] (see also Section 7.4.1 for an example). In order to extend our verification method to the case of infinite state systems, we need to extend the transformation rules and the transformation strategy presented in Section 7.3 to the case of constraint logic programs. The extensions of the definition, unfolding, folding, and clause replacement rules can be found in [8, 9]. Moreover, we will use the following two transformation rules which specifically refer to constraints: (i) Rule R4f, for deleting a clause whose body contains an unsatisfiable constraint, and (ii) Rule R5, for replacing a constraint by an equivalent one. R4f. Removal of Clauses with Unsatisfiable Body. Let 7 be a clause of the form H <^ c A G in Pi. Suppose that c is unsatisfiable, that is, V \= -'3(c), where 3(c) is the existential closure of c. Then, we derive the new program Pi^i = Pi — {7}. R5. Constraint Replacement. Let 71 : ^ ^ ci A G be a clause in Pi. Suppose that for some constraint C2, we have that: PhV(3yi...3nci^3Zi...3Z^C2) where: (i) F i , . . . , Y^ are the variables occurring free in ci and not in {H, G}, (ii) Z i , . . . , Zm are the variables occurring free in C2 and not in {H, G}, and V((/?) denotes the universal closure of formula ^p. Then by constraint replacement we derive the clause 72 : E ^ C2/\G and we derive the new program P^+i = {Pi — {71}) U {72}. The transformation strategy L^FV presented in Section 7.3.2 can be extended to constraint logic programs that encode infinite state systems. During the execution of this strategy, we apply the modified transformation rules for constraint logic programs. In particular, during the execution of the DefineSzFold procedure, when applying the definition rule, we introduce new definitions of the form: NewH ^c{X)Asat{X,2/j) where: (i) X is a variable ranging over states, (ii) c{X) is a constraint representing a possibly infinite set of states, and (iii) V^ is a temporal formula. The main issue that arises when dealing with infinite state systems is that the termination of the f/FV strategy is no longer guaranteed. Indeed, for any given temporal formula ip, an infinite number of constrained atoms of the form c{X) A sat{X^ 2p) with non-equivalent constraints may be generated and, thus, an infinite number of non-equivalent definitions may be introduced.
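To see how a constrained transition acts symbolically on infinitely many states at once, consider the transition t(X, Y) ← X > 0 ∧ Y = X + 1 mentioned above. The sketch below represents a set of integer states as an interval and computes its image under this single transition; the interval representation and the function name are assumptions of this example.

# Sketch: the constrained transition t(X, Y) <- X > 0 /\ Y = X + 1 viewed as a
# successor operator on sets of integer states, each set given as an interval
# (lo, hi), with None standing for an unbounded endpoint.

def post(interval):
    """Successor states of the interval under X > 0 /\ Y = X + 1."""
    lo, hi = interval
    lo = 1 if lo is None else max(lo, 1)                  # enforce the guard X > 0
    if hi is not None and hi < lo:
        return None                                        # guard unsatisfiable
    return (lo + 1, None if hi is None else hi + 1)        # apply Y = X + 1

print(post((None, None)))   # (2, None): from any X > 0, successors are Y >= 2
print(post((5, 10)))        # (6, 11)
print(post((-3, 0)))        # None: the guard rules every state out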
For instance, during the verification of the mutual exclusion property of the Bakery protocol (see Section 7.4.1 below), starting from the initial definition SQ : newO
(B1, B2), respectively, is represented by the 4-tuple (A1, A2, B1, B2). The transition relation t of the two agent system from an old state OldState to a new state NewState is defined as follows:
TA. t(OldState, NewState) ← tA(OldState, NewState)
TB. t(OldState, NewState) ← tB(OldState, NewState)
where the transition relation tA for the agent A is given by the following clauses whose bodies are conjunctions of constraints (see also Figure 7.2):
A1. tA((think, A2, B1, B2), (wait, A21, B1, B2)) ← A21 = B2 + 1
A2. tA((wait, A2, B1, B2), (use, A2, B1, B2)) ← A2 < B2
A3. tA((wait, A2, B1, B2), (use, A2, B1, B2)) ← B2 = 0
A4. tA((use, A2, B1, B2), (think, A21, B1, B2)) ← A21 = 0
The following analogous clauses define the transition relation tB for the agent B:
B1. tB((A1, A2, think, B2), (A1, A2, wait, B21)) ← B21 = A2 + 1
B2. tB((A1, A2, wait, B2), (A1, A2, use, B2)) ← B2 < A2
B3. tB((A1, A2, wait, B2), (A1, A2, use, B2)) ← A2 = 0
B4. tB((A1, A2, use, B2), (A1, A2, think, B21)) ← B21 = 0
Fig. 7.2. The transitions of agent A.
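The clauses A1-A4 and B1-B4 can be rendered directly as a successor function on states (A1, A2, B1, B2). The sketch below does so and adds a bounded breadth-first exploration as a sanity check that no explored state has both agents in use; only the transformation described in the text proves the property for the full infinite state space.

# Direct rendering of the Bakery transition clauses plus a bounded exploration.
from collections import deque

def successors(state):
    a1, a2, b1, b2 = state
    succs = []
    if a1 == "think": succs.append(("wait", b2 + 1, b1, b2))                 # A1
    if a1 == "wait" and (a2 < b2 or b2 == 0):
        succs.append(("use", a2, b1, b2))                                    # A2, A3
    if a1 == "use": succs.append(("think", 0, b1, b2))                       # A4
    if b1 == "think": succs.append((a1, a2, "wait", a2 + 1))                 # B1
    if b1 == "wait" and (b2 < a2 or a2 == 0):
        succs.append((a1, a2, "use", b2))                                    # B2, B3
    if b1 == "use": succs.append((a1, a2, "think", 0))                       # B4
    return succs

def bounded_check(bound=10000):
    init = ("think", 0, "think", 0)
    seen, queue = {init}, deque([init])
    while queue and len(seen) < bound:
        s = queue.popleft()
        assert not (s[0] == "use" and s[2] == "use"), f"unsafe state {s}"
        for n in successors(s):
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return len(seen)

print("states explored:", bounded_check())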
Notice that the two agent system has an infinite number of states, because counters may increase in an unbounded way, as indicated by the underlined states of the following computation path starting from the initial state {think, 0, think, 0): {think, 0, think, 0), {wait, 1, think, 0), {wait, 1, wait, 2), {use, 1, wait, 2), {think, 0, wait, 2), {think, 0, use, 2), {wait, 3, use, 2), {wait, 3, think, 0), . . . We may apply our verification method for checking the mutual exclusion property of the Bakery protocol. This property is expressed by the CTL formula: ->e/ unsafe, where for all states s, elem{s, unsafe) holds iff s is of the form {use, A2, use, 5 2 ) , that is, unsafe holds iff both agents A and 5 are in the control state use. The initial program Pjc which encodes the Kripke structure of the Bakery protocol with two agents A and 5 , is the following one:
1. sat{{use, A2, use, B2)J unsafe) <^ 2. sat{S,^F) ^^sat{S,F) 3. sat{S,efF) ^ sat{S,F) 4. sat{S, ef F) ^ t{S, T) A sat{T, ef F) together with the clauses TA, TB, A1-A4, and B1-B4, which define the transition relation t. The initial clause So for the mutual exclusion property is: newOme ^— sat {{think, 0, think, 0), -le/ unsafe) For the Bakery protocol we may also want to prove the starvation freedom property which ensures that an agent, say A, which requests the shared resource, will eventually get it. This property is expressed by the CTL formula: ag {waitA -^ af use A), which is equivalent to: -^ef {waitA A ^af useA). For the elementary properties waitA and useA, the satisfaction relation is defined by the clauses: sat{{wait, A2, Bl, B2), waitA) <— sat{{use, A2, S I , S2), useA) ^ For the starvation freedom property the initial clause 6o is: newOsf <— sat {{think, 0, think, 0),->ef {waitA A->af use A)) The clauses for sat{X,-^F), sat{X,FlAF2), and sat{X, ef F) are clauses S2-S5 (see Section 7.2). We do not have the space here to list all clauses for sat{X, af F). These clauses include clause S6 (see Section 7.2) together with one or more clauses of the form S7 (see Section 7.2) for each state s of the form {A1,A2,B1,B2), where Al and J51 belong to {think, wait, use}. For instance, the clause for the state {think, A2, think, B2) is: sat{{think,A2, think, B2), af F) ^ A21 = B2-hl A 521 = ^ 2 + 1 A sat{{wait, A21, think, 52), af F) A sat{{think, A2, wait, B21), af F) The remaining clauses for sat{X, af F) can be found in [9]. By using our experimental constraint logic program transformation system MAP [14] we have been able to automatically verify the mutual exclusion and the starvation freedom properties of the Bakery protocol. We have verified some more properties of various other protocols by using our system MAP running on a Linux machine with a 900 MHz clock, and the results of these experiments are reported in the following Table 7.1. The verification times we have obtained demonstrate that our system performs well w.r.t. the DMC system [6] and the other systems cited in [6, 7].
7.5 Conclusions and Related Work We have presented a method for verifying CTL properties of protocols for multiagent systems specified as constraint logic programs. For systems which have a finite number of states, the method is complete and can be used as a decision procedure. For systems which have an infinite number of states, the method may not terminate. However, for a large class of infinite state systems, the method terminates if we use suitable generalization operators. We have applied our method for proving safety and liveness properties of several infinite state protocols.
Table 7.1. Experimental results of the verification of some properties of various protocols. The protocols and the properties are taken from [6]. The verification time is expressed in seconds.

Protocol | Property | Verification Time
Bakery (mutual exclusion) | ¬ef unsafe | 0.2
Bakery (mutual exclusion) | ag (waitA → af useA) | 2.3
Ticket (mutual exclusion) | ¬ef unsafe | 0.6
Ticket (mutual exclusion) | ag (waitA → af useA) | 3.0
Berkeley RISC (cache coherence) | ¬ef (dirty ≥ 2) | 2.0
Berkeley RISC (cache coherence) | ¬ef (dirty ≥ 1 ∧ shared ≥ 1) | 1.3
Xerox Dragon (cache coherence) | ¬ef (dirty ≥ 2) | 1.3
Xerox Dragon (cache coherence) | ¬ef (dirty ≥ 1 ∧ shared_clean ≥ 1) | 0.9
Xerox Dragon (cache coherence) | ¬ef (dirty ≥ 1 ∧ shared_dirty ≥ 1) | 1.0
DEC Firefly (cache coherence) | ¬ef (dirty ≥ 2) | 0.4
DEC Firefly (cache coherence) | ¬ef (dirty ≥ 1 ∧ shared ≥ 1) | 0.4
Illinois University (cache coherence) | ¬ef (dirty ≥ 2) | 0.3
Illinois University (cache coherence) | ¬ef (dirty ≥ 1 ∧ shared ≥ 1) | 0.3
MESI (cache coherence) | ¬ef (dirty ≥ 2) | 0.3
MESI (cache coherence) | ¬ef (dirty ≥ 1 ∧ shared ≥ 1) | 0.2
Our verification method is related to others presented in the literature for the proofs of properties of concurrent systems which use the logic programming paradigm. Among them we mention the following ones. In [18] the authors present XMC, a model checking system implemented in the tabulation-based logic programming language XSB. XMC can verify temporal properties expressed in the altemation-free fragment of the /i-calculus of finite state systems specified in a CCS-like language. In [17] a model checker is presented for verifying CTL properties of finite state systems, by using logic programs with finite constraint domains that are closed under conjunction, disjunction, variable projection and negation. The verification process is performed by executing a constraint logic program encoding the semantics of CTL in an extended execution model that uses constructive negation and tabled resolution. In [10] an automatic method for verifying safety properties of infinite state Petri nets with parametric initial markings is presented. The method constructs the reachability set of the Petri net being verified by computing the least fixpoint of a logic program with Presburger arithmetic constraints. A method for the verification of some CTL properties of infinite state systems using constraint logic programming is described in [7]. Suitable constraint logic programs which encode the system and the property to be verified, are introduced, and then, the CTL properties are verified by computing exact and approximated least and greatest fixed points of those programs. Finally, the use of program transformation for verifying properties of infinite state systems has been investigated in [12, 19]. In particular, (i) specialization of logic programs and abstract interpretation are used in [12] for the verification of safety properties of infinite state systems, and (ii) unfold/fold transformation rules
are applied in [19] for proving safety and liveness properties of parameterized finite state systems with various network topologies.
Acknowledgements Many thanks to Dr. A. Jankowski, Prof. A. Skowron, and Dr. M. Szczuka for their kind invitation at the International Workshop MSRAS 2004. We would like also to thank Prof. G. Delzanno, Prof. S. Etalle, and Prof. M. Leuschel for helpful discussions and comments.
References
1. K.R. Apt and R.N. Bol. Logic programming and negation: A survey. J. Logic Programming, 19/20:9-71, 1994.
2. R.M. Burstall and J. Darlington. A transformation system for developing recursive programs. JACM, 24(1):44-67, January 1977.
3. L. Cardelli and A.D. Gordon. Mobile ambients. Theoretical Computer Science, 240(1):177-213, 2000.
4. W. Chen and D.S. Warren. Tabled evaluation with delaying for general logic programs. JACM, 43(1), 1996.
5. E.M. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 2000.
6. G. Delzanno. Automatic verification of parametrized cache coherence protocols. In Proc. CAV 2000, LNCS 1855, 55-68. Springer, 2000.
7. G. Delzanno and A. Podelski. Model checking in CLP. In R. Cleaveland (ed.) Proc. TACAS '99, LNCS 1579, 223-239. Springer, 1999.
8. S. Etalle and M. Gabbrielli. Transformations of CLP modules. Theoretical Computer Science, 166:101-146, 1996.
9. F. Fioravanti, A. Pettorossi, and M. Proietti. Verifying CTL properties of infinite state systems by specializing constraint logic programs. R. 544, IASI-CNR, Roma, Italy, 2001.
10. L. Fribourg and H. Olsen. Proving safety properties of infinite state systems by compilation into Presburger arithmetic. In Proc. CONCUR '97, LNCS 1243, 96-107. Springer-Verlag, 1997.
11. L. Lamport. A new solution of Dijkstra's concurrent programming problem. CACM, 17(8):453-455, 1974.
12. M. Leuschel and T. Massart. Infinite state model checking by abstract interpretation and program specialization. In Proc. LOPSTR '99, LNCS 1817, 63-82. Springer, 1999.
13. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, Berlin, 1987.
14. MAP group. The MAP transformation system. Available from: http://www.iasi.rm.cnr.it/~proietti/system.html, 1995-2004.
15. K. Marriott and P. Stuckey. Programming with Constraints: An Introduction. The MIT Press, 1998.
16. R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes. Part I and II. Information and Computation, 100(1):1-77, 1992.
17. U. Nilsson and J. Lübcke. Constraint logic programming for local and symbolic model-checking. In Proc. CL 2000, LNAI 1861, 384-398. Springer, 2000.
18. Y.S. Ramakrishna, C.R. Ramakrishnan, I.V. Ramakrishnan, S.A. Smolka, T. Swift, and D.S. Warren. Efficient model checking using tabled resolution. In Proc. CAV '97, LNCS 1254, 143-154. Springer, 1997.
19. A. Roychoudhury, K. Narayan Kumar, C.R. Ramakrishnan, I.V. Ramakrishnan, and S.A. Smolka. Verification of parameterized systems using logic program transformations. In Proc. TACAS 2000, LNCS 1785, 172-187. Springer, 2000.
20. H. Seki. Unfold/fold transformation of stratified programs. Theoretical Computer Science, 86:101-139, 1991.
21. H. Tamaki and T. Sato. Unfold/fold transformation of logic programs. In S.-Å. Tärnlund (ed.) Proc. 2nd Int. Conf. Logic Programming, 127-138, Uppsala, Sweden, 1984.
8 Mereological Foundations to Approximate Reasoning Lech Polkowski* Polish-Japanese Institute of Information Technology, Koszykowa 86, 02008 Warsaw, Poland Department of Mathematics and Computer Science, University of Warmia and Mazury, Zolnierska 14a, 10561 Olsztyn, Poland email: [email protected]
Summary. In this article, we intend to present a synthetic account of mereological foundations for approximate reasoning along with an outline of applications of this approach to modern paradigms like Granular Computing and Spatial Reasoning. Key words: rough set theory, rough mereology, rough inclusions, granulation, spatial reasoning, granular rough set theory
8.1 Introduction: Rough Sets and Mereology in Approximate Reasoning
We begin with an example that will demonstrate basic ideas of rough sets and mereology and will introduce at the same time a problem of approximate reasoning. The well-known heap paradox due to Eubulides of Miletus consists in two assumptions: 1. 1 grain does not make a heap; 2. if n grains do not make a heap then n + 1 grains do not make a heap, along with a tacit assumption that there is a number making the heap. The paradox results when one applies mathematical induction to infer from 1 and 2 that there is no number of grains that could make a heap. Let us suppose that we denote with heap_number the set of natural numbers n such that n grains make a heap, and introduce two sets non_heap_number and B, disjoint from each other and from heap_number, such that the union of the three sets is the set N of natural numbers. We modify the rules 1, 2 by adopting the new set of rules: 3. 1 ∈ non_heap_number; 4. if n ∈ non_heap_number then n + 1 ∈ non_heap_number ∨ n + 1 ∈ B; 5. if n ∈ B then n + 1 ∈ B ∨ n + 1 ∈ heap_number. Using 3-5 and the tacit assumption, one infers that there is a number of grains at which one passes from the non_heap state to B, and then a number of grains at which one goes from B to the heap state. The set B is a witness to the vagueness of the notion of a heap [18].
*This article is an extended version of the plenary talk given by the author at MSRAS 2004 in Plock, Poland on June 7, 2004.
The example illustrates aptly the idea of approximate reasoning: one attempts at a description of an imprecise concept, e.g., of a heap, by means of a precise set of notions. As a result, such description may be only approximate. This example also illustrates well the idea of a rough set: in order to describe the imprecise concept of a heap, one does introduce a region B that does witness the imprecision: we actually do not know at what number n the state of non_heap changes into B and at what number m the state B changes into the heap state, yet we know it must be so: B is the region of uncertainty. Finally, one may observe that the notion of a heap is of mass (collective) nature and as such, it may be described better in terms of parts then in terms of elements, i.e., a heap should rather be discussed in mereological terms. Actually, the heap paradox catches in an ingenious way the fact that in describing a heap one passes from settheoretical description in terms of elements to a mereological description in terms of parts; this passing happens somewhere in the set B. 8.1.1 A formal idea of a rough set The idea of a rough set was proposed by Zdzislaw Pawlak [11] in the context of knowledge represented as an equivalence relation E on a set U of entities. Equivalence classes of R contain entities that are indiscernible with respect to R, and a concept (a subset of the set 17) X is said to be exact in case it is a union of a family of equivalence classes of R; otherwise, X is said to be rough. An information system (Pawlak, see [10]) is a well-known way of presenting data, and representing the relation i? ; it is symbolically represented as a pair A={U,A). The symbol U denotes a set of objects, and the symbol A denotes the set of attributes. Each pair (attribute, object) is uniquely assigned a value: given a £ A,u eU, the value a{u) is an element of the value set V. Each entity (object) u £ U is represented in the information system A by its information set InfA{u) = {(a, a{u)) : a e A}, that corresponds to the u-th row of the data table A. Two objects, u,w, may have the same information set: Inf{u) = Inf{w) in which case they are said to be A-indiscemible (see [11], [10]); the relation IND{A) = {{u,w) : InfA{u) = Inf^iw)} is said to be the A-indiscemibility relation. It is an equivalence relation that renders a form of the general relation R. The symbol [U]A denotes the equivalence class of the relation IND{A) containing u. It follows that a concept X is A-exact if and only if X is a union of equivalence classes: X = U { M A • '^ ^ -^}It may be observed that the notion of indiscemibility may be defined with respect to any set B C A of attributes, i. e., a B-information set InfB{u) — {(a, a{uf) : a G 5 } is defined and then the relation of B-indiscemibility, IND{B) = {{u, w) : InfB{u) = Irifsiw)} is introduced, classes of which form B-exact sets. Given a set B of attributes, and a concept X C U, that is not J5-exact, there exists u e U with neither [U]B Q X nor [U]B Q U \X. Thus, the B-exact sets, BLOWX
= {u ∈ U : [u]_B ⊆ X}, and B^UPP X = {u ∈ U : [u]_B ∩ X ≠ ∅}
are distinct, and B_LOW X ⊆ X ⊆ B^UPP X. The set B_LOW X is the lower B-approximation to X, whereas B^UPP X is the upper B-approximation to X. The concept X is said to be B-rough. B(X) = B^UPP X \ B_LOW X is the B-boundary region, the counterpart to B in the heap example.
8.1.2 Mereology
From among mereological theories of sets, we choose the chronologically first, and conceptually most elegant, viz., the mereology proposed by Leśniewski, see [4], based on the notion of a part.
Parts. The relation of being a part, denoted π, satisfies the requirements,
(P1) xπy ∧ yπz ⇒ xπz.
(P2) xπy ⇒ ¬(yπx).
It follows that ¬(xπx).   (8.1)
The relation of proper containment ⊂ in a family of sets satisfies (P1), (P2). The notion of a π-element (mereological element), el_π, is defined as follows,
(El) x el_π y ⇔ xπy ∨ x = y.
By (El) and (P1-2), el_π is a partial ordering on the mereological universe. It follows by (El) that el_⊂ = ⊆ is the mereo-element relation in any family of sets.
Class operator. The relation of a part is defined for individual objects, not for collections of objects, and the class operator allows us to make collections of objects into objects. The definition of the class operator is based on the notion of element el_π; we denote this operator with the symbol Cls_π. Given a non-empty collection M of objects, the class of M, denoted Cls_π M, is the object that satisfies the following requirements,
(Cls1) if x ∈ M then x el_π Cls_π M.
(Cls2) if x el_π Cls_π M then there exist objects y, z with the properties that y el_π x, y el_π z, and z ∈ M.
(Cls3) for each non-empty collection M, the class Cls_π M exists and it is unique.
The reader has certainly observed that the object Cls_⊆ M, in case of a collection M of sets in a field of sets, is the union ⋃M. We may conclude that mereology, a set theory based on the notion of a part, is a feasible vehicle for a non-standard set theory that renders intuitions fundamental to rough set theory, and commonly expressed in the language of naive set theory at the cost of operating with collections of objects, not objects themselves.
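Before turning to the formal treatment, the lower and upper approximations of Sect. 8.1.1 can be computed directly from a small information system, as in the following sketch; the table and attribute names are illustrative only.

# Sketch: B-indiscernibility classes and the approximations B_LOW(X), B_UPP(X)
# computed from a toy information system (object -> {attribute: value}).
from collections import defaultdict

table = {
    "u1": {"a": 1, "b": 0},
    "u2": {"a": 1, "b": 0},
    "u3": {"a": 0, "b": 1},
    "u4": {"a": 0, "b": 0},
}

def classes(B):
    """Equivalence classes of IND(B): objects with equal B-information sets."""
    groups = defaultdict(set)
    for u, info in table.items():
        groups[tuple(sorted((a, info[a]) for a in B))].add(u)
    return list(groups.values())

def approximations(X, B):
    cls = classes(B)
    low = {u for c in cls if c <= X for u in c}
    upp = {u for c in cls if c & X for u in c}
    return low, upp

X = {"u1", "u3"}
low, upp = approximations(X, {"a", "b"})
print(low)         # {'u3'}: the lower approximation B_LOW(X)
print(upp)         # {'u1', 'u2', 'u3'}: the upper approximation B_UPP(X)
print(upp - low)   # {'u1', 'u2'}: the boundary region B(X), witnessing roughness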
8.2 Reasoning in Rough Set Framework: Set Theory and Logic for Rough Sets
From a formal point of view, given a knowledge R on a set of entities U, each exact set X satisfies the dichotomy
∀u ∈ U [u ∈ X ∨ u ∈ U \ X].   (8.2)
To reveal the nature of (8.2), let us observe that in case X is exact, \lueU[ueX^
[U]R C X].
(8.3)
Formula (8.3) suggests a new notion of element, viz., one we denote with the symbol G*, defined for any set X C t/ and u £U, as, ^ G* X <^
[U]R C
X.
(8.4)
8.2.1 Rough set theory Collecting together the facts on rough sets and exact sets and a new notion of an element, we may define Rough Set Theory, RZF (Rough Zermelo-Fraenkel) as a set theory whose instances are tuples of the form,
(c/,u,nA,G,G^<s,7^),
(8.5)
where C/ is a set (in standard ZF), U, fi, \ are standard set operations of the union, the intersection and the complement (set difference as well), G denotes the standard element predicate, G* denotes the rough element predicate, and, moreover the following requirements are satisfied, where the notion of containment C is based on the element predicate G: 1. £^ is the family of subsets of U with the property: uG*XViiG* t / \ X for each u eU, 2. (5, U, n, \) is a C-complete Boolean algebra. 3. X en^X ^e, for each X CU,
8 Mereological Foundations to Approximate Reasoning
121
4. £ is C-co-initial as well as C-co-final in the power set 2^; hence, for each X en, there exist Y,Z eS such that 7 C X C Z. 5. t/ G* X if and only if 3 7 eS.ueY CX. A model ofRZF is induced naturally in any knowledge base (t/, {Ri : i G /}) by considering classes of the relation i? = P|^ Ri, the relation G* defined as x e* X if and only if[x]R C X, and S defined as the collection of exact sets with respect to R. By properties 2 and 4, for each X e IZ, there exist sets i{X),c{X) G S that satisfy, i{X) C X C c{X), (8.6) moreover,
• i{X) = sup{Y •
c{X) = inf{Z
eS-.YcXY eS:X
CZ},
meaning that i(X) is the largest in £ subset of X, and c{X) is the smallest in £ superset of X; these sets are counterparts of lower, respectively, upper approximations, BLOWX, B^P^X, B{X) = c{X) \ i{X) is the boundary region. 8.2.2 A logic for rough sets Intuitively, a rough set X induces a 3-valued logical structure: the case u G i{X) is interpreted as u with certainty in X (truth value 1), the case u ^U\X\^ interpreted as u with certainty not in X (truth value 0) whereas u G B{X) is interpreted as uncertain state (state of truth | ) . In the context of RZF, we define an intensional unary predicate logic RL. We consider a set Fred of unary predicates of the form 7(x), where the variable x runs over the set C/, that have denotations (either exact or rough) in the set C/, along with logical connectives N of negation and C of implication. Instead of the symbol C, in particular formulas, we will use the more familiar symbol =>. Intensions We define the intension / as a mapping from the family £ into the set {0,1, ^ } , where 0,1 denote logical values of falsity and truth, respectively, and \ denotes the uncertain state (in the Lukasiewicz 3-valued logic (see [2]) it is denoted 2, or ^) of neither being false nor being true. Formulas of rough logic RL will be evaluated as extensions at particular sets A e £. The construction of / , will be done in few steps. 1. First, given a predicate 7(0:), we consider its meaning, or denotation, [[7]] = {x e U : 7(x) holds true}. 2. We assume the following denotation rules: [[A^7]]=\[[7]]; [[Cl6]]=\Mu[[S]]. Given A e £, the extension / ^ of / at ^ is defined as follows,
(8.7)
122
Lech Polkowski
{
1 incase A C [[7]], 0 incase y l n [ [ 7 ] ] = 0 ,
(8.8)
^ otherwise. Remark We may notice that the condition yl C [[7]] is equivalent to the condition A C ^([[7]]), and, similarly, conditions A n [[7]] = 0 and A D c([[7]]) = 0 are equivalent. On the basis of (8.7), (8.8), we now establish truth tables for operators N, C. In Table 1, the first row represents truth states of 7 whereas the second row gives the corresponding truth states of Nj. In Table 2, the values of 1^(0^6) are given in the /^(7)-row and the /^(5)-column. Table 8.1. Truth table for N NpOl\
Table 8.2. Truth table for C CO 1±_ 0 111 1 0 1I 1 1 in case An [[7]] C [[S]]; ^, otherwise
Relations to Lukasiewicz logic
Table 8.3. Truth table for N in L3 lOi Np 0 1
Table 8.4. Truth table for C in L3 CO 0 1 11 1 0 1 1 11 2 2
H
8 Mereological Foundations to Approximate Reasoning
123
It is convenient to recall here the truth table (Tables 3 and 4) of 3-valued Lukasiewicz logic that we denote L3. A comparison of Tables 1, 3 and Tables 2, 4 shows that the only difference between L3 and rough logic RL is in treatment of implication between uncertain states of truth: whereas L3 assigns to this case value 1, rough logic discerns in this case between 1 and ^, assigning the former to the extension at yl € £ in case ^ n [[7]] C [[S]] only. Relations between the two logics may be expressed by means of the notion of an acceptable formula. We assume the ordering of states of truth: 0 < ^ < 1. Following a practice established in many-valued logic, we set a threshold, in our case ^, and we say that a formula in rough logic is acceptable if and only if its state of truth is at least ^ at every yl-extension. A formula 7 of rough logic will be called a theorem whenever its value is 1 at every yl-extension. Similarly, a formula 7 of rough logic will be called a theorem (respectively, acceptable) with respect to a family M of exact sets whenever its value is 1 at every yl-extension with A e M: I^{j) = 1 for every A e M (respectively, /^(7) > ^ for every A E A^). A theorem of I/3 is any formula whose value is 1 regardless of values of variables in it. Collapse is the operation that transforms a formula of rough logic RL of unary predicates into a formula in L3 by forgetting about variable x and disregarding parentheses related to usage of the variable. As a result, we have the same form of a formula in both logics. Then we have (see [13]): a formula 7 is acceptable in rough logic RL if the collapsed formula is a theorem of the logic L3. Similarly, one may verify (see op.cit.),that, for each theorem 7 of rough logic RL, the collapsed formula is a theorem of the 3-valued Lukasiewicz logic. For theorems in both logics, derivation rules, (Modus P o n e n s ) ^ ^ , (Modus Tollens) ^^^^^ are valid. Let us observe that Modus Ponens is not valid in case of acceptable formulas of rough logic whereas Modus Tollens is valid in that case (see op.cit.).
8,3 Rough Mereology We have seen in sect. 8.1.2 that mereology-based theory of sets is a proper theory of sets for rough set theory. In order to accomodate variants of rough set theory like Variable Precision Rough Sets (Ziarko, see [24]), and to account for modem paradigms like granular computing (Zadeh, see [5], [21]), or computing with words (Zadeh, [23], see [9]), it is suitable to extend the mereology based rough set theory by considering a more general relation of a part to degree.
124
Lech Polkowski
8.3.1 Set partial inclusions As a starting point, we can consider the theoretical proposition of Lukasiewicz [7] of endowing formulas of unary predicate calculus interpreted in a finite set U with partial degrees (states) of truth. Given, e.g., an implication p{x) => q{x), its degree of truth is given by the value of the fraction, \{u:p{u)Aq{u)}\ \{u:p{u)}\
^^^^
The formula (8.9) has been exploited in many contexts, e.g., in rough membership functions (Pawlak and Skowron, op.cit.), accuracy and coverage coefficients for decision rules (Tsumoto, [20]), association rules (Agrawal et al., [1]), variable precision rough sets (Ziarko, [24]), approximation spaces (Skowron and Stepaniuk, [19]). The following properties of a measure of partial containment, /i(rr, y) between two objects, X, y, defined according to (8.9) may be derived, (SIl)/x(x,x) = 1. (SI2) fi{x, y) = 1 if and only ifXCY, (SB) if /i(a:, y) = 1 then fi{z, x) < fi{z^ y) for each non-empty set z. We will call set partial inclusions functions defined on pairs of non-empty sets and satisfying (SIl-3). In general, measures fi will be called rough inclusions, see [17]. 8.3.2 Rough inclusions We consider a universe U of non-empty objects along with a mereological relation TT of a part, inducing the mereological element relation el-j^. A rough inclusion, is a relation fx^^ CU x U x [0,1] that satisfies the following requirements, (RIl) jU7r(x, X, 1) for each x eU. (RI2) /XTT(x, 2/, 1) if and only if xel^^y for each pair x, y of elements of U. (RI3) if /i7r(^5 y? 1) then for each z e U, and each r G [0,1], the implication holds: if f^^{z, X, r) then fi^{z, y, r). (RI4) if/XTTC^J 2/5 '^) and s
8 Mereological Foundations to Approximate Reasoning
125
8.3.3 Rough inclusions in information systems We would like to begin with single elements of the universe U of an information system (U^A), on which to define rough inclusions. First, we want to address the problem of transitive rough inclusions. We recall that a t-norm T^ is archimedean if in addition it is continuous and T{x,x) < X for each x € (0,1). It is well known (Ling, see [6], cf. [15]) that any archimedean t-norm T, can be represented in the form, T{x,y)=9[f{x) + f{y)),
(8.10)
where / : [0,1] -^ [0,1] is continuous decreasing and g is the pseudo-inverse to /^ We will consider the quotient set UIMD = U/IND{A), and we define attributes on UjMD by means of the formula, a(M/iVD(.4)) = «(^)For each pair x, y of elements of UIND, we define the discemibility set DIS{x, y) {a£ A: a(x) j^ a{y)} C A. For an archimedean t-norm, T, we define a relation ^T by letting,
M^,J/,r)^5(l^^^M)>,.
(8.11)
Then, /XT is a rough inclusion that satisfies the transitivity rule, see [14],
f^T{x,y,r),/j.T{y,z,s)
(8.12)
fZT{x,z,T{r,s))
Particular examples of rough inclusions are the Menger rough inclusion, (MRI, in short) and the Lukasiewicz rough inclusion (LRI, in short), corresponding, respectively, to the product t-norm TM{x,y) = x - y, and the Lukasiewicz product TL{X, y) = max{0, x-\-y -l). The Menger Rough inclusion For the t-norm T/v/, the generating function f{x) = —Inx whereas g{y) = e~^ is the pseudo-inverse to / . The rough inclusion ^TM ^^ given by the formula, \DIS{x,y)\
fj.TM{^,y.r)^e
1^1
>r.
(8.13)
^i.e., a map T : [0,1]^ -^ [0,1] that is symmetric, associative, increasing and satisfies r(a:,0) = 0. ^This means that g(x) = 1 for rr G [0, / ( I ) ] , g{x) = 0 for x € [/(O), 1], and g{x) = /-^(x)forx€[/(l),/(0)].
126
Lech Polkowski
The Lukasiewicz rough inclusion For t-norm TL, the generating function f{x) inverse to / . Therefore,
= I — x and g = f is the pseudo-
,,^i:,,y,r)^l-\£l^>r.
(8.14)
Let us observe that rough inclusions based on sets DIS are necessarily symmetric. Table 8.5. The information system A U ai a2 Cis a4 a:i 1 1 1 2 X2 X3 X4 X5 X6 X7 X8
1 2 3 3 3 1 2
0 0 2 1 2 2 0
1 0 1 1 1 0 1 0 1 2 01 02
For the information system A in Table 5, we calculate values of LRI, shown in Table 6; as //TL is symmetric, we show only the upper triangle of values. Table 8.6. /x^ for Table 5 U
Xl X2
Xs
X4 Xs
XQ XJ XS
xi 1 X2 X3X4 -
0.5 0.25 0.25 0.5 0.5 0.25 0.25 1 0.5 0.5 0.5 0.25 0.25 0.25 - 1 0.25 0.25 0.25 0.25 0.5 - - 1 0.75 0.75 0.25 0
X5-
-
-
-
1
0.5
0 0
X6- - X7 - - -
- - 1 0.25 0.25 - - - 1 0.25
X8 -
-
-
-
-
-
-
1
Rough inclusions over relational information systems In some applications, a need may arise, to stratify objects more subtly than it is secured by sets DIS. A particular answer to this need can be provided by a relational information system by which we mean a system {U, A , R), where R — {Ra ' CL G A} with Ra ^Va xVa^ relation in the value set Va.
8 Mereological Foundations to Approximate Reasoning
127
A modified set DIS^{x,y) is defined as follows; DIS^{x,y) = {a e A : Ra{ci{^),o,{y))}' Then, for any archimedean t-norm T, and non-reflexive, nonsymmetric, transitive, and linear, relation R, we define the rough inclusion /x^ by the modified formula,
^,^i:c,y,r)^gi\^l^^^>r,
(8.15)
where g is the pseudo-inverse to / in the representation r ( r , s) = g{f{r) -f f{s)); clearly, the notion of a part is here: xn^y if and only \i x ^ y and Ra{a{y), a{x)) for each a e A. Let us look at values of /x^ in Table 7 for the information system in Table 5 with value sets ordered linearly as a subset of real numbers. Table 8.7. I^^TL for Table 5 U
X\
XI X2 X3 X4 X5 X6 X7 X8
1 1 0.75 0.5 0.5 0.5 0.75 0.75 0.5 1 0.5 0.5 0.5 0.25 0.5 0.5 0.5 1 1 0.5 0.5 0.25 0.75 0.75 0.75 1 0.75 1 1 0.75 0.75 0.75 0.75 1 0.75 0.75 1 0.5 0.5 0.75 1 1 1 1 1 1 1 1 0.5 0.75 0.5 0.5 0.5 0.25 1 0.5 0.5 0.75 0.75 0.25 0.25 0.25 0.75 1
X2
X^ X4 X5
X6
Xj
Xs
As expected, the modified rough inclusion is non-symmetric. We now discuss problems of granulation of knowledge, showing an approach to them by means of rough inclusions.
8.4 Rough Mereological Granule Calculus Granular computing paradigm proposed by Lotfi Zadeh, is based on the idea of making entities into granules of entities and performing calculations on granules in order to reduce computational cost of approximate problem solving. Here we propose a general scheme for granular computing based on rough mereology. We assume a rough inclusion /i^ on a mereological universe {U, el^r) with a part relation TT. For given r < 1 and x E C/, we let, Qrix) = Cls{%),
(8.16)
%iy)^f^^{y,x,r).
(8.17)
where The class gr{x) collects all atomic objects satisfying the class definition with the concept iZv.
128
Lech Polkowski
We will call the class gr{x) the r-granule about x; it may be interpreted as a neighborhood of x of radius r. We may also regard the formula yyirX as stating similarity oiyiox (to degree r). We do not discuss here the problem of representation of granules; in general, one may apply sets or lists as the underlying representation structure. The following are general properties of the granule operator gr induced by a rough inclusion /i^r, see [14]. 1. 2. 3. 4. 5.
\ifi^{y,x,r)i\iQnyel^gr{x). if /^TT(^, y-t ^) A yelT^z then xel^^gr (z). \/z.[zelT^y => 3w, q.{welT^z A WCIT^Q A finiQ^ ^j ^)] => yel-Kgri^)if yelT^grix) A zel^ry then zel^^grix). if 5 < r then gr{'^)elT^gs{x).
8.4.1 Granulation via archimedean t~norm based rough inclusions For an archimedean t-norm T — induced rough inclusion /i^, we have a more detailed result, viz., the equivalence, 6. for each x, y G UjNDy ^^lirgriy) if and only if fj^rix, y, ^)We consider the information system of Table 5 along with values of rough inclusions fiTL^I^TL Siv^^» respectively, in Tables 6 and 7. Admitting r = .5, we list below granules of radii .5 about objects xi — xg in both cases. We denote with the symbol gi^gf, respectively, the granule go.bixi) defined by ^J'TL^^^TL' respectively, presenting them as sets. We have, 1- gi =
2. g2 =
{XI,X2,X5,XG},
{xi,X2,Xs,X4,Xs},
3. gs = { X 2 , X 3 , X 8 } ,
4. g4 = {X2,X4,X5,X6}, 5. g5 = {xi,X2,X4,X5,X6}, 6. g6 =
{XI,X4,X^,XG},
7. gj = {xj}, 8. ^8 = {xs.xs}, what provides an intricate relationship among granules: i^g^^g^ Q gs, gs Q g2, ^2, g5 incomparable by inclusion, gr isolated. We may contrast this picture with that for fXj,^. 2. g^ = 9^ =97=U\ 3- 9s =
{xe},
{xi,X2,X3,X7,Xs},
providing a nested sequence of three distinct granules.
8 Mereological Foundations to Approximate Reasoning
129
8.4.2 Extending rough inclusions to granules We now extend /XTT over pairs of the form x, g, where x G Ujjsfo, 9 a granule. We define /x^r in this case as follows, fi^{x,g,r)
<^3y e UiND^yel^g and iJ.^{x,y,r).
(8.18)
The notion of element el^ on pairs x, g follows by treating p as a class of elements under element relation el^, i.e., we admit that, xelng if and only if for each element z of x, there exist elements w,t such that WEIT^Z, welT^q, and g{q). By g{q) true, we mean that q has the property defining g. Clearly, the extended relation e/^ is transitive. The relation //TT on pairs x,y or x^g, where x^y £ UjjsfD, and g a concept, is a rough inclusion. Finally, we define ^^r on pairs of the form g,h, of granules by means of the formula, f^nig, h, r) ^ \/xel^g3y.yel^h and /i^(x, y, r). (8.19) The corresponding notion of element extended to pairs of concepts g, h will be defined as follows, gelT^h if and only if for each element xel-j^g there exist elements w, y with wel-j^x, welT^y and yelT^h. The extended most general form of /u is a rough inclusion. Extended archimedean rough inclusions For a rough inclusion fir based on an archimedean t-norm T, the extended rough inclusion satisfies the generalized transitivity rule, liT{k,g,r),^iT{g,h,s) ^T{Kh,T{r,s))
'
^^-^^^
We also notice a property of granules based on /^T, xelT,gr{y) =^ 9s{x)el^gT(r,s){y)-
(8.21)
8.4.3 Approximations by granules Given a granule g, and r, 5 e (0,1), we can define, by means of the class operator, approximations to g by granules as follows, subject to the restriction that classes in question are non-empty. The /x, r-lower approximation to ^ by a collection H = {h} of granules is the class.
130
Lech Polkowski
ClsLOwifJ', H, r) = Cls^{ii, g, r, iJ), where ^{ji, g, r, H){h) holds if and only if fi{h, g^ r) and h e H hold. Similarly, the /x, s-upper approximation to p by a collection H of granules is the class, Cls^PP{lJL, H, r) = Cls^ifi, g, r, H), where ^()Li, p, r, if )(/i) holds if and only if fi{h, g,t) -^ t < s and h e H hold. Taking set inclusion /i in the above definitions, we obtain approximations in the Variable Precision Rough Set Model (Ziarko, op.cit.).
8.5 Spatial Reasoning Based on Rough Mereology The scheme presented above may be exploited as a basis for spatial reasoning. We present a sketch of this approach. This approach is based on the functor C of being connected that satisfies the following, (CI) xCx\ meaning reflexivity of C. (C2) xCy ==> yCx\ meaning symmetry. (C3) \iz.{zCx ^^=> zCy)] ==> {x = y)\ meaning extensionality."* In terms of connections, schemes for spatial reasoning are constructed, see [3]. 8.5.1 Connections from rough inclusions In this section we investigate some methods for inducing connections from rough inclusions /x = /XTT, see [16]. Limit connection We define a functor CT as follows, xCry ^=^ - ( 3 r , 5 < l.ext{gr{x),gs{y))),
(8.22)
where ext{x^ y) is satisfied in case x, y have no common parts. Clearly, (C1-C2) hold with CT irrespective of a rough inclusion fi applied . The status of (C3) depends on /i. In case x ^ y^v/e have, e.g., zelx and ext{z, y) for some z. Clearly, CT{Z, X)\ to prove -I(CT(>2^, t/)), we add a new property of /x: (RM5) ext{x,y) ==:^ 3s < l.Vt > 5.-i[/i(x,y,t)]. Under (RM5), CT induced via // does satisfy (C3), i.e. it is a connection. 8.5.2 From Graded Connections to Connections We begin with a definition of an individual BdrX. BdrX ~ CIST^{II'^{X)), where/i;^(x)(2) 4=^ ^{z,x,r) A -i(3s > We introduce a graded (r, s)-connection C{r, s) (r, s
r.fi{z,x,s)).
8 Mereological Foundations to Approximate Reasoning xC{r, s)y <^==^ 3w.welTrBdrX A welT^{Bdsy).
131 (8.23)
We have then (i) xC(l, l)x\ (ii) xC(r, s)y = > yC{s, r)x. Concerning the property (C3), we adopt here a new approach. It is valid from theoretical point of view to assume that we may have "infinitesimal" parts i.e. objects as "small" as desired.^ Infinitesimal parts model We adopt a new axiom of infinitesimal parts (IP) -^{xelj^y) ==^ yr > O.Bz.zel^^x^ s <
r.zfif{y).
Our rendering of the property (C3) under (IP) is as follows: (C3)/p -^{xel^y) = > Vr > 0.3^, s > r.zC{l, l)x A zC{l, s)y. Connections from Graded Connections Our notion of a connection will depend on a threshold, a, set according to the needs of the context of reasoning. Given 0 < a < 1, we define a functor Ca as follows, xCay -^^ 3r, s > a.xC{r, s)y.
(8.24)
Then the functor C^ has all the properties (C1)-(C3) of a connection, see [16].
References 1. Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI (1996) Fast discovery of association rules. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, and Uthurusamy R (eds) Advances in Knowledge Discovery and Data Mining. AAAI Press. 2. Borkowski L (ed)(1970) Jan Lukasiewicz. Selected Works. North Holland - Polish Sci. Publ., Amsterdam - Warsaw 3. Cohn AG, Varzi A (1998) Connections relations in mereotopology. In: Prade H (ed) Proceedings ECAI 98, 13th European Conference on Artificial Intelligence. Wiley, Chichester 4. Lesniewski S (1982) On the foundations of mathematics. Topoi 2: 7-52 5. Lin TY, Yao YY, Zadeh LA (eds) (2001) Rough Sets, Granular Computing and Data Mining. Physica, Heidelberg 6. Ling CH (1965) Representation of associative functions. Publ. Math. Debrecen 12 : 189212 7. Lukasiewicz J (1913) Die Logischen Grundlagen der Wahrscheinlichtkeitsrechnung. Krakow (see [2]) ^Cf. an analogous assumption in mereology based on connection [8]).
132
Lech Polkowski
8. Masolo C, Vieu L (1993) Atomicity vs. infinite divisibility. In: Freksa C, Mark DM (eds) Spatial Information Theory, LNCS vol. 692. Springer, Berlin 9. Pal SK, Polkowski L, Skowron A (eds) (2004) Rough-neural Computing. Techniques for Computing with Words. Springer, Berlin 10. Pawlak Z (1992) Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht 11. PawlakZ (1982)Rough sets.Intem. J. Comp. Inf. Sci.ll : 341-356 12. Pawlak Z, Skowron A (1994) Rough membership functions. In: Yager RR, Fedrizzi M, Kacprzyk J (eds) Advances in the Dempster-Schafer Theory of Evidence. Wiley, New York 13. Polkowski L (2004) A note on 3-valued rough logic accepting decision rules.Fundamenta Informaticae, 61(1): 3 7 ^ 5 14. Polkowski L (2003)Rough mereology: A rough set paradigm for unifying rough set theory and fuzzy set theory. Fundamenta Informaticae 54(1): 67-88 15. Polkowski L (2002) Rough Sets. Mathematical Foundations. Physica, Heidelberg 16. Polkowski L (2001) On connection synthesis via rough mereology. Fundamenta Informaticae 46 (1/2): 83-96 17. Polkowski L, Skowron A (1996) Rough mereology: a new paradigm for approximate reasoning. International Journal of Approximate Reasoning 15(4): 333-365 18. Read S (1995)Thinking about Logic: An Introduction to the Philosophy of Logic. Oxford U.R 19. Skowron A, Stepaniuk J (2001) Information granules: Towards foundations of granular computing. International Journal for Intelligent Systems 16: 57-85 20. Tsumoto S (1998) Automated induction of medical system expert rules from clinical databases based on rough set theory. Information Sciences 112: 67-84 21. Yao YY (2004)Information granulation and approximation. In: [9]. 22. Zadeh LA (1965) Fuzzy sets. Information and Control 8: 338-353 23. Zadeh LA, Kacprzyk J (eds) (1999) Computing with Words in Information/Intelligent Systems 1. Physica, Heidelberg 24. Ziarko W (1993) Variable precision rough set model. J. Computer and System Sciences 46:39-59
Data Security and Null Value Imputation in Distributed Information Systems Zbigniew W. Ras^'^ and Agnieszka Dardziiiska^ ^ UNC-Charlotte, Department of Computer Science, Charlotte, N.C. 28223, USA ^ Polish Academy of Sciences, Institute of Computer Science, Ordona 21, 01-237 Warsaw, Poland ^ Bialystok Technical Univ., Dept. of Mathematics, ul. Wiejska45A, 15-351 Bialystok, Poland Summary. Distributed Information System (DIS) is seen as a collection of autonomous information systems which can collaborate with each other. This collaboration can be driven by requests for knowledge needed to predict what values should replace null values in missing or incomplete attributes. Any incompleteness in data can be seen either as the result of a partial knowledge about properties of objects stored in DIS or some attributes might be just hidden from users because of the security reason. Clearly, in the second case, we have to be certain that the missing values can not be predicted from the available data by chase, distributed chase or any other null value imputation method. Let us assume that an attributes d is hidden at one of the sites of DIS, denoted by S and called a client. With a goal to reconstruct this hidden attribute, a request for a definition of this attribute can be sent by S to some of its remote sites (see [15]). These definitions stored in a knowledge-base KB can be used by Chase algorithm (see [4], [6]) to impute missing attribute values describing objects in S. In this paper we show how to identify these objects and what additional values in S have to be hidden from users to guarantee that initially hidden attribute values in S can not be properly predicted by Distributed Chase.
9.1 Introduction Distributed information system is a system that connects a number of information systems using network communication technology. In this paper, we assume that these systems are autonomous and incomplete. Incompleteness is understood by allowing to have a set of weighted attribute values as a value of an attribute. Additionally, we assume that the sum of these weights has to be equal 1. The definition of an information system of type A and Distributed Information System {DIS) proposed in this paper is a modification of definitions given by Ras in [14] and used later by Ras and Dardzinska in [15] to talk about semantic inconsistencies among sites of DIS from the query answering point of view. The type A is introduced mainly to monitor if weights assigned to values of attributes by Chase algorithm are greater than or equal to A. If the weight assigned by Chase to one of the attribute values
134
Zbigniew W. Ras and Agnieszka Dardzinska
is less than the allowed threshold value, then this attribute value has to be ruled out. Semantic inconsistencies are due to different interpretations of attributes and their values among sites (for instance one site can interpret the concept young differently than other sites). Different interpretations are also due to the way each site is handling null values. Null value replacement by a value suggested either by statistical or some rule-based methods is quite common before a query is answered by QAS. Ontology ([10], [11], [18], [19], [20], [2], [3], [21], [7]) is a set of terms of a particular information domain and the relationships among them. Currently, there is a great deal of interest in the development of ontology to facilitate knowledge sharing in general and information systems integration in particular. Ontologies and interontology relationships among them are created by experts in corresponding domain, but they can also represent a particular point of view of the global information system by describing customized domains. Ontologies can be expressed, for instance, using statements in description logics. These descriptions are organized as a lattice and may be considered as semantically rich metadata that capture the information content of underlying data repositories. As ontologies are abstractions, they can describe almost any kind of data format. Also, to allow an intelligent query processing, we assume that any information system in DIS is described by one or more ontologies. Inter-ontology relationships can be seen as a semantical bridge between autonomous information systems so they can collaborate and understand each other. In [15], the notion of the optimal rough semantics and a method of its construction was proposed. The rough semantics can be used to model and nicely handle semantic inconsistencies among sites due to different interpretations of incomplete values. Distributed chase is a chase algorithm [1] linked with a site S of DIS, called a client, which is similar to Chasel [6] and Chase2 [4] with additional assumption concerning the creation of knowledge bases at all sites of DIS involved in the process of solving a query submitted to 5. The knowledge base at the client site contains rules extracted from S and also rules extracted from information systems at its remote sites. The structure of the knowledge base and its properties are the same as properties of the knowledge bases used in Chasel or Chase2 algorithms. The difference only lies in the process required to collect these rules. In a distributed framework, these rules are extracted from the local and remote sites, usually under different semantics. Although the names of the attributes are often the same among sites, their granularity levels may differ from site to site. As the result of these differences, the knowledge base has to satisfy certain properties in order to be used by Chase, The same properties are required by the query answering system based on Chase. In this paper, we mainly concentrate on the problem of reconstructing values of attributes which are hidden from users for either some or all objects stored at one of the sites of S called a client. The knowledge related to hidden attributes in S can be extracted at remote sites for S which means that hidden values for some objects in S might be reconstructed even with a high degree of certainty. To avoid this, system S has to be made more incomplete. 
The goal of this paper is to propose a strategy for identifying the minimal number of values in S which additionally have to be hidden from users to guarantee that hidden attribute values in S can not be reconstructed by chase or distributed chase.
9 Data Security and Null Value Imputation in DIS
135
9.2 Query Processing with Incomplete Data In real life, data are often collected and stored in information systems residing at many different locations, built independently, instead of collecting them and storing at only one single location. In such cases we talk about distributed (autonomous) information systems. It is very possible that an attribute is missing or hidden in one of them while it occurs in many others. Also, in one information system, an attribute might be partially hidden, while in other systems the same attribute is either complete or close to being complete. Assume that user submits a query to one of the information systems (called a client) which involves some hidden or non-local attributes. In such a case, network communication technology is used to get definitions of these unknown or hidden attributes from other information systems (called servers). All these new definitions form a knowledge base which can be used to chase both missing and hidden attributes at the client site. But, before any chase-based algorithm can be applied, semantic inconsistencies among sites have to be resolved first. For instance, it can be done by taking rough semantics [Ras & Dardzinska], mentioned earlier. Otherwise, an inter-ontology relationship between local ontologies associated with two involved information systems has to be provided. Definition 1: We say that S = (X, A, V) is a partially incomplete information system of type A, if S is an incomplete information system and the following three conditions hold: •
as{x) is defined for any a: G X, a G A,
•
(Vx G X)(Va G A)[{as[x) = {(a,,p,) : 1 < i < m})-^
•
(Vx G X){Wa G A)[{as{x) = {(a,,p,) : 1 < i < m}) ^ {\/i){pi > A)].
YZiPi
= ^l
Now, let us assume that 5i, 52 are partially incomplete information systems, both of type A. The same set X of objects is stored in both systems and the same set A of attributes is used to describe them. The meaning and granularity of values of attributes from A in both systems Si, S2 are also the same. Additionally, we assume thata5,(x) = {{aii,pii) : 1 < m i } Sindas^ix) = {(a2i,P2i) : 1 < ^ 2 } . We say that <5-containment relation ^ holds between ^i and S2, if the following three conditions hold: •
(Vx G X)(Va G A)[card{as^{x)) > card{as^{x))],
•
(Vx G X){\/a G A)[[card{as^{x)) = card{as^{x))] -^ [£i^j \P2i-P2j\ > iZi^j \Pli-V\3\]]' [Ei#j IP2i - P231 - Yli^j \Pii - Pij II] > ^•
•
Instead of saying that 5-containment relation holds between Si and ^2, we can equivalentiy say that Si was transformed into ^2 by (5-containment mapping ^. This fact can be presented as a statement ^ ( 5 i ) = ^2 or (Vx G X)(Va G
136
Zbigniew W. Ras and Agnieszka Dardzinska
A)[^{asi{x)) = ^{as2{^))]' Similarly, we can either say that as^{x) was transformed into as2 (x) by ^ or that (5-containment relation ^ holds between as^ {x) and as^ix). So, if 5-containment mapping ^ converts an information system 5 to 5', then 5 ' is more complete than S. Saying another words, for a minimum one pair (a, x) e A X X, either ^ has to decrease the number of attribute values in as{x) or the average difference between confidences assigned to attribute values in as{x) has to be increased minimum by S. To give an example of a (^-containment mapping ^, let us take two information systems Si, S2 both of the type A, represented as Table 9.1 and Table 9.2. Also, we assume that 5 = ^. Table 9.1. Information System Si
X
a
b
c
d
c
Xi
{(ai. l),(«2,i)}
{(^1:>l)db2.. >!)}
Cl
di
{(ei,.i),(e2,i)}
X2
{(^2, i),(«3,|)}
{(^1:.5)'(^2,.1)}
d2
ei
{(ci,i),(C3,i)}
d2
63
C2
di
62
X3 X4
as
X5
xe X7
{(ei, i).(e2,|)}
{(«i, i),(a2,i)}
61
C2
^2
62
C3
d2
{(62,
^2
{(^1,.i),(^2,>!)}
{(ci,i),(C2,i)}
d2
62
62
Cl
di
63
X8
ei
|),(e3,|)}
Table 9.2. Information System ^2 X
a
b
c
d
e
XI
{(ai,i),(a2,i)}
{(6i,i),(62,i)}
ci
di
{(ei, | ) , (62, f ) }
X2
{(a2a).(«3,f)} ^1
{(ci,|),(c2,i)}
ci2 ei
X3
ai
{(ci,i),(c3,|)}
c?2
X4
as
C2
X5
{(«!,!), («2,i)}
61
C2
Xe
a2
^2
C3
C?2 { ( e 2 , | ) , ( e 3 , | ) }
X7
a2
{(^1,1), (^2, I ) }
Cl
d2 62
X8
{(^1, | ) , ( a 2 , | ) }
&2
Cl
(ii
^2
ea 62 ei
63
9 Data Security and Null Value Imputation in DIS
137
It can be easily checked that the values assigned to e{xi), b{x2), c{x2), a{xs), e(x4), a{xs), c{x7), and a{xs) in 5i are different than the corresponding values in 52. In each of these eight cases, an attribute value assigned to an object in 52 is less general than the value assigned to the same object in Si. Also, it can be easily checked that ^ satisfies S restriction. It means that ^{Si) = 52.
9.3 Query Processing with Distributed Data and Chase Assume now that L{D) = {{t -^ Vc) e D : c e In{A)} (called a knowledge-base) is a set of all rules extracted from 5 = (X, A, V) by ERID{S, Ai, A2), where In{A) is the set of incomplete attributes in 5 and Ai, A2 are thresholds for minimum support and minimum confidence, correspondingly. ERID is the algorithm for discovering rules from incomplete information systems, presented by Dardzinska and Ras in [5] and used as a part of Chase algorithm in [16]. The type of incompleteness in [16] is the same as in this paper. Assume now that a query q{B) is submitted to system 5 = (X, A, V), where B is the set of all attributes used in g(B) and that AnB ^^. Attributes in JB — [^ n B] are called either foreign or hidden in 5. If 5 is a part of a distributed information system, definitions of such attributes can be extracted at remote sites for 5 (see [15]). Clearly, all semantic inconsistencies and differences in granularity of attribute values among sites have to be resolved first. To simplify the problem, we adopt the same assumption as in [15]. It means that different granularity of attribute values and different interpretation of incomplete attribute values are only allowed among sites. It was shown in [15] that to process a query of type q{B) at site 5, we can discover definitions of values of attributes from B — [AoB] at the remote sites for 5 and next use them to answer q{B). Hidden attributes for 5, can be seen as attributes entirely incomplete in 5, which means values (either exact or partially incomplete) of such attributes have to be ascribed to all objects in 5. Stronger the consensus among sites on a value to be ascribed to X, better the result of the ascription process for x can be expected in most of the cases. The question remains, whether the values predicted by the imputation process are correct. Possible approach, to this type of problems, is to start with a complete information system and remove randomly from it, let's say, 10 percent of its values and next run the imputation algorithm on the resulting system. The next step is to compare the descriptions of objects in the system which is the outcome of the imputation algorithm with descriptions of the same objects in the original system. Clearly, hidden attribute values are known to some of the users. So, we can run the imputation algorithm for all hidden attributes and compare the results with values known to be correct. Descriptions of objects for which hidden attribute values are predicted reasonably well should be made more incomplete. Before we continue this discussion, we have to decide first on the interpretation of functors or and and, denoted in this paper by + and *, correspondingly. We will adopt the semantics of tenns proposed by Ras & Joshi in [17] as their semantics has all the properties required for the query transformation process to be sound and complete [see
138
Zbigniew W. Ras and Agnieszka Dardziriska
[17]]. It was shown that their semantics satisfies the following distributive property: tl * (^2 + ^3) = (^1 * ^2) -f (^1 * ^3).
Let us assume that S = (X, A^ V) is an information system of type A and t is a term in predicate calculus constructed, in a standard way, from values of attributes in V seen as constants and from two functors + and *. By Ns{t), we mean the standard interpretation of a term t in 5 defined as (see [17]):, • • •
Ns{v) = {{oc,p) : {v,p) e a{x)}, for any v e K , Ns{ti+t2)=Ns{ti)®Ns{t2), Ns{ti*t2) = Ns{ti)^Ns{t2),
where, for any Nsih) • •
= {ixi,pi)}i^i,
iV^fe) = {{xj^Qj)}jeJ^ we have:
Ns{ti) e Ns{t2) = {{xi,Pi)}ie{i-J) ^ {{^3^Pj)}je{J-i) Nsih) 0 Ns{t2) = {{xi,Pi ' qi)}ieiinJ)'
U
{ixi^rnax{pi,qi))}ieinj,
The incomplete value imputation algorithm Chase (see [16]), based on the above semantics, converts information system S of type A to a new more complete information system Chase{S) of the same type. This algorithm assumes partial incompleteness of data (sets of weighted attribute values can be assigned to an object as its value) in system S. Rules discovery system ERID (see [4]) was used to extract rules from this type of incomplete data set and next applied in Chase algorithm. Now, let us assume that a partially incomplete information system S of type A is used to store descriptions of objects. When a query asking for objects in S, satisfying some hidden property, is submitted to 5, its query answering system QAS will replace S by Chase{S) and next will solve the query using, for instance, the strategy proposed Ras & Joshi in [17]. Clearly, we have to make sure that our system is secure and objects in S which satisfy this hidden property can not be retrieved by QAS.
9.4 Distribution, Inconsistency, and Distributed Chase As we already pointed out, the knowledge base L{D), contains rules extracted locally at the client site (information system queried by user) as well as rules extracted from information systems at its remote sites. Since rules are extracted from different information systems, inconsistencies in semantics, if any, have to be resolved before any query can be processed. There are two options: •
a knowledge base L{D) at the client site is kept consistent (in this scenario all inconsistencies have to be resolved before rules are stored in the knowledge base),
•
a knowledge base at the client site is inconsistent (values of the same attribute used in two rules extracted at different sites may be of different granularity levels and may have different semantics associated with them).
9 Data Security and Null Value Imputation in DIS
139
In general, we assume that the information stored in ontologies and, if needed, in inter-ontologies (if they are provided) is sufficient to resolve inconsistencies in semantics of all sites involved in Chase. Inconsistencies related to the confidence of conflicting rules stored in L{D) do not have to be resolved at all (algorithm Chase does not have such a requirement).
•^3 1 g [q b\ cl
qs2 •4
Si\b\a
Tpl
^ ZlJ
KB
! 1
1 1 11 1 KB
fsf rt ~ extracted from S\
Fig. 9.1. Global extraction and exchange of knowledge
The fact, that rules stored in L{D) can be extracted at different sites and under different interpretations of incomplete values, is not pleasant assuming that we need to use them in Chase. In all such cases, following the same approach as in [15], rough semantics can be used for interpreting rules in L{D). One of the problems related to an incomplete information system S = (X, A, V) is the freedom how new values are constructed to replace incomplete values in 5, be-
140
Zbigniew W. Ras and Agnieszka Dardzinska
fore any rule extraction process begins. This replacement of incomplete attribute values can be done either by Chase or/and by a number of available statistical methods (see [9]). This implies that semantics of queries submitted to S and queries processed by the query answering system QAS based on Chase, may often differ. In such cases, following again the approach in [15], rough semantics can be used by QAS to handle this problem. In this paper we only concentrate on granularity-based semantic inconsistencies. Assume first that Si — {Xi,Ai,Vi) is an information system for any i G / and that all of them form a Distributed Information System (DIS). Additionally, we assume that, if a e AiH Aj, then only the granularity levels of a in Si and Sj may differ but conceptually its meaning, both in Si and Sj is the same. Assume now that D = Ui^j L{Di) is a set of rules which can be used by Chase algorithm, associated with any of the sites of DIS, and L{Di) contains rules extracted from S^. Now, let us say that system Sk,kelis queried be a user. Chase algorithm, to be applicable to Sk, has to be based on rules from D which satisfy the following conditions: •
•
•
attribute value used in the decision part of a rule from D has the granularity level either equal to or finer than the granularity level of the corresponding attribute in Sk. the granularity level of any attribute used in the classification part of a rule from D is either equal or softer than the granularity level of the corresponding attribute inSk, attribute used in the decision part of a rule from D either does not belong to A^ or is incomplete in 5^.
These three conditions are called distributed Chase 5A;-applicability conditions. Let Lk{D) denotes the subset of rules in D satisfying these three Chase Skapplicability conditions. Assuming now that a match between the attribute value used in the description of the tuple t and the attribute value used in a description of a rule s -^ d £ Lk{D) is found, the following two cases should be considered: •
•
an attribute involved in matching is the decision attribute m s -^ d. If two attribute values, involved in that match, have different granularity, then the decision value d has to be replaced by a softer value which granularity will match the granularity of the corresponding attribute in Sk. an attribute a involved in matching is the classification attribute ins -^ d. If two attribute values, involved in that match, have different granularity, then the value of attribute a has to be replaced by a finer value which granularity will match the granularity of a in S^.
The new set of rules constructed from Lk(D), following the above two steps, is called granularity-repaired set of rules. So, the assumption that Lk{D) satisfies distributed Chase 5^-applicability conditions is sufficient to run Chase successfully on Sk using this new granularity-repaired set of rules. In Figure 9.1, we present two consecutive states of a distributed information system consisting of Si, S2, S3.
9 Data Security and Null Value Imputation in DIS
141
In the first state, all values of all hidden attributes in all three information systems have to be identified. System ^i sends request qs^ to the other two information systems asking them for definitions of its hidden attributes. Similarly, system ^2 sends request qs2 to the other two information systems asking them for definitions of its hidden attributes. Now, system S^ sends request ^53 to the other two information systems also asking them for definitions of its hidden attributes. Next, rules describing the requested definitions are extracted from each of these three information systems and sent to the systems which requested them. It means, the set L{Di) is sent to 52 and 53, the set L{D2) is sent to Si and 53, and the set L{Ds) is sent to Si and 52. The second state of a distributed information system, presented in Figure 9.1, shows all three information systems with the corresponding L{Di) sets, i G {1,2,3}, all abbreviated as KB. Now, the Chase algorithm is run independently at each of our three sites. Resulting information systems are: Chase{Si), Chase{S2), and Chase{Sz). Now, the whole process is recursively repeated. It means, both hidden and incomplete attributes in all three new information systems are identified again. Next, each of these three systems is sending requests to the other two systems asking for definitions of its either hidden or incomplete attributes and when these definitions are received, they are stored in the corresponding KB sets. Now, Chase algorithm is run again at each of these three sites. The whole process is repeated till some fixed point is reached (no changes in attribute values assigned to objects are observed in all 3 systems). When this step is accomplished, a query containing some hidden attribute values can be submitted to any Si,i G {1,2,3} and processed in a standard way.
9.5 Distributed Chase and Security Problem of Hidden Attributes Assume now that an information system 5 = (X, A, V) is a part of DIS and attribute h e A has to be hidden. For that purpose, we construct 5^ = (X, A, V) to replace 5, where: • • •
as{x) = as^{x), for any a e A — {6}, x e X, bsf, (x) is undefined, for any x e X, bs{x) e Vb.
Users can submit queries to Sb and not to 5. What about the information system Chase{Sb)l How it differs from 5? Clearly, bs{x) can be equal to bchase{Sb)i^) for a number of objects in X. If this is the case, additional values of attributes for all these objects should be hidden. In this section, we show how to identify the minimal number of values which should be additionally hidden in 5^ to guarantee that values of attribute b can not be reconstructed by Chase for any x e X. We present our strategy using system 5 = (X, A, V) of type A = | from Table 9.1 as an example. We also assume that attribute d is hidden in 5. The corresponding system 5^ of type A = | is given as Table 9.3. Also, assume that the following rules have been extracted at the remote sites for Sd:
142
Zbigniew W. Ras and Agnieszka Dardzinska Table 9.3. Information System Sd
X
a
6
c
Xl
{(«i. l)-(«2,i)}
{(61.. 3 ) ' ( ^ ' 3 ) /
Cl
X2
{(«2,
e {(^1,^5)'(e2,§)}
i),(a3,|)} {(bl: , | ) , ( b 2 , | ) } 62
X3
d
ei
{(ci,, | ) , ( C 3 ,
1)}
e3
C2
{(ei, f),(e2,|)}
i)-(a2,|)} h
C2
ei
0'2
h
C3
{(^2, I),(e3,f)}
a2
{{bl: .i),(&2,f)}
{(ci,
b2
Cl
X4
as
X5
{(ai,
X6 X7 X8
ri = ^2 = ^3 = ^4 = ^5 = ^6 =
|),(e2,
i)}
62
ea
[a2 • 62 -^
Let us consider the first tuple in Sd. It supports rule r i , r2, r4, rs and, TQ. Rule ri supports value ^2 with weight [[L32| -. 1|3J]] - 3 - l = | . Rule r2 supports value di with weight [| • 1] • 2 • 1 = | . Rule r4ipportj supports value ^2 with weight [| • 1] • 3 • 1 = 1. Rule rs supports value di with weight [ | - | ] - 2 - l = | . Rule re supports value 0^2 with weight [| • 1] • 4 • 1 = | . So, ^ is the total support for value d2 whereas § is the total support for the value di. Because § • ^ < ^, then the value di is rule out and the same can not be predicted by Chase. Now, let us take tuple XQ. This tuple supports only two rules: ri and r^. Rule ri supports value d2 with weight [1 • 1] • 3 • 1 = 3. Rule rs supports value d2 with weight [1 • 1] • 3 • 1 = 3. It can be easily checked that by removing value 62 from the description of XQ we decrease the total support of value d2 to 3 but still keep the support of di equal to zero. Also, additional removal of C3 or a2 will not help, {cs, 02,62} is the smallest set which has to be removed from the description of XQ to guarantee that the value ^2 will be not assigned by Chase as the value of d for XQ. NOW, let us take tuple X7. This tuple supports three rules: r i , r2, and re. Rule ri supports value c?2 with weight [1 • | ] • 3 • 1 = | . Rule r2 supports value di with weight [1 • | ] • 2 • 1 = | . Rule re supports value ^2 with weight [1 • 1] • 4 • 1 = | . So, ^ is the total support for value ^2, whereas | is the total support for value di, which means that the value di is rule out. By removing the value 02 from the description of object xj, both values di, ^2 will be assigned as possible values of the attribute d for X7. Following similar strategy for the remaining objects, we get a new information system Sd represented by Table 9.4: Clearly, the hidden attribute d can not be reconstructed by distributed Chase from the available data in Sd, for any object x.
9 Data Security and Null Value Imputation in DIS
143
Table 9.4. Information System Sd X
a
b
c
^1
{(«i,|),(«2,|)}
{(6i,i),(62,|)}
ci
X2
{(a2,i),(a3,|)}
{(6i,|),(62,i)}
2^3
62
a:^4
as
X5
{(ai,|),(a2,|)}
{(C1,|),(C3,|)} C2
&i
C2
X6
a:8
d
e {(ei,|),(e2,^)}
63 {(ei,|),(e2,|)} ei {(e2,|),(e3,|)}
^2
ci
63
In general, for any tuple x, we identify all rules supported by that tuple. Next, on the basis of these rules, we calculate the total support for each value of the hidden attribute. These total supports are used to calculate the confidence in each of this values. If the confidence in any of them is below the threshold A, then such a value is ruled out. We need minimum two weighted values remaining if the correct value is one of them. This can be achieved by replacing some values in S^ by Null Values. The strategy outlined in this paper shows how to search for such minimal sets.
9.6 Security of Hidden Attributes and Testing In this section, we give more precise description of the algorithm for identifying the minimal number of cells in Sd which additionally have to be hidden from users in order to guarantee that attribute d cannot be reconstructed by them through Distributed Chase. Finally, we test that algorithm on data obtained from one of the insurance companies. Assume that KB contains rules extracted in DIS at server sites for Sd with a goal to reconstruct hidden attribute d in Sd. In this section, by d{x) we mean the value of d for x which is hidden in Sd. For each object x in Sd, we look first for all rules in KB supported by x. Several cases have to be considered: •
There is only one rule r = [t —> di] in KB supported by object x in 5^. If d{x) = di, then value di is predicted correctly by r. It means that minimum one of the attributes listed in t has to be additionally hidden for x in Sd- This attribute can be chosen randomly (its corresponding slot is denoted by hidl).
•
There is a set of rules {ri = [ti —> c?i],r2 = [^2 —> G?i],...,rfc = [tk —^ di]} in KB supported by x. If d{x) = G?I, then value di is predicted correctly by rules from {ri, r2,..., rfc}. It means that a set containing at least a minimal set of attributes covering all terms {^1,^2,...,^/.} has to be additionally hidden for x in
144
Zbigniew W. Ras and Agnieszka Dardzinska S (corresponding slots are denoted by hid2).
•
There is a set of rules {ri = [ti —> di],r2 = [^2 —> ^2], •••, ^fc = [h —^ dk]} in KB supported by x. Let [s^, Q] denotes support and confidence of rule r^, for i < k. Let Confsd {d'^ x, KB) denotes the confidence in attribute value d' € Vd for X in Sd driven by iiTB. It is defined as ^{si - Ci : [1 < i < k] A [d^ = di]}/ J2{^i ' Ci : 1 < i < k}.lf d{x) = dj and A is the threshold for minimal confidence in attribute values describing objects in Sd, then > A], we do - if ConfsAdj^x,KB) > A and {3d ^^ dj)[ConfsAd,x,KB) not have to hide any additional slots for x. -
if ConfsMji^^KB) > A and {Wd 7^ dj)[Confs^{d,x,KB) have to hide additional slots (denoted by hidS) for x.
< A], we
-
If Confs^ {dj, X, KB) < A and (3d 7^ dj) [Confs^ {d, x, KB) > A], we do not have to hide additional slots for x.
So, each slot asd{x) which has to be hidden is assigned to one of the 3 groups: hidl, hid2, hidS. To check what is the percentage of slots which have to be additionally hidden in Sd in order to guarantee that a randomly hidden attribute d can not be reconstructed by distributed Chase, we use sampling data table containing 10,000 objects described by 100 attributes. These objects are extracted randomly from a complete database describing customers of an insurance company. To build DIS environment as simple as possible (without problems related to handling different granularity and different semantics of attributes at different sites and without either using a global ontology or building inter-ontology bridges between local ontologies), this data table was randomly partitioned into 4 equal tables containing 2,500 tuples each. Next, from each of these tables 40 attributes (columns) have been randomly removed leaving 4 data tables of the size 2,500 x 60 each. One of these tables is called a client and the remaining 3 are called servers. All of them represent sites in DIS. Now, for all objects at the client site, we have hidden values of one of the attributes which was chosen randomly. This attribute is denoted by d. At each server site, if d is listed in its domain schema, we learn descriptions of d using See5 software (data are complete so for that purpose we do not have to use ERID). All these descriptions, in the form of rules, have been stored in KB of the client. Distributed Chase was applied to predict what is the real value of a hidden attribute for each object X at the client site. The threshold value A = 0.125 was used in our example. The number of additional slots required to be hidden: •
3176 slots of /izdl-type (2.117% of slots at client table)
•
811 slots of hid2-typQ (0.54% of slots at client table)
•
24 slots of hidS-type (0.016% of slots at client table)
9 Data Security and Null Value Imputation in DIS
145
It should be observed that the majority of slots which additionally are hidden at the client site are uniquely predicted {hidl and hide2 types) by rules in KB.
9.7 Conclusion Proposed strategy shows the steps one has to follow if he needs to identify additional slots in a data table which have to be jointly hidden with a chosen hidden attribute. Presented example gives also the percentage of additional slots which have to hidden in a data table at the client site to guarantee the security of a hidden attribute from the point of view of a distributed Chase. To improve our strategy, we can look for additional hidden slots taking into consideration their influence on predicting incorrect values for a chosen hidden attribute d. To be more precise, if there is a rule r = [t —> di] in KB supported by object X in Sd which identifies d{x) correctly, one of the attributes listed in t has to be additionally hidden for x in 5^. We can chose this attribute randomly but also we can identify which attribute used in t has the highest influence on predicting incorrect values for our hidden attribute. Similar strategy can be followed for slots of the type hid2 and hid^ in the data table at the client site.
References 1. Atzeni, P., DeAntonellis, V. (1992) Relational database theory, The Benjamin Cummings Publishing Company 2. Benjamins, V. R., Fensel, D., Perez, A. G. (1998) Knowledge management through ontologies, in Proceedings of the 2nd International Conference on Practical Aspects of Knowledge Management (PAKM-98), Basel, Switzerland. 3. Chandrasekaran, B., Josephson, J. R., Benjamins, V. R. (1998) The ontology of tasks and methods, in Proceedings of the 11th Workshop on Knowledge Acquisition, Modeling and Management, Banff, Alberta, Canada 4. Dardzinska, A., Ras, Z.W. (2003) Rule-Based Chase Algorithm for Partially Incomplete Information Systems, in Proceedings of the Second International Workshop on Active Mining (AM'2003), Maebashi City, Japan, October, 2003, 42-51 5. Dardzinska, A., Ras, Z.W. (2003) On Rules Discovery from Incomplete Information Systems, in Proceedings of ICDM'03 Workshop on Foundations and New Directions of Data Mining, (Eds: T.Y. Lin, X. Hu, S. Ohsuga, C. Liau), Melbourne, Florida, IEEE Computer Society, 2003, 31-35 6. Dardzinska, A., Ras, Z.W. (2003) Chasing Unknown Values in Incomplete Information Systems, in Proceedings of ICDM'03 Workshop on Foundations and New Directions of Data Mining, (Eds: T.Y. Lin, X. Hu, S. Ohsuga, C. Liau), Melbourne, Florida, IEEE Computer Society, 2003, 24-30 7. Fensel, D., (1998), Ontologies: a silver bullet for knowledge management and electronic commerce. Springer-Verlag, 1998 8. Grzymala-Busse, J. (1997) A new version of the rule induction system LERS, in Fundamenta Informaticae, Vol. 31, No. 1, 27-39
146
Zbigniew W. Ras and Agnieszka Dardzinska
9. Giudici, P. (2003) Applied Data Mining, Statistical Methods for Business and Industry, Wiley, West Sussex, England 10. Guarino, N., ed. (1998) Formal Ontology in Information Systems, lOS Press, Amsterdam 11. Guarino, N., Giaretta, P. (1995) Ontologies and knowledge bases, towards a terminological clarification, in Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing, lOS Press 12. Pawlak, Z. (1991) Rough sets-theoretical aspects of reasoning about data, Kluwer, Dordrecht 13. Pawlak, Z. (1991) Information systems - theoretical foundations, in Information Systems Journal, Vol. 6, 1981, 205-218 14. Ras, Z.W. (1994) Dictionaries in a distributed knowledge-based system, in Concurrent Engineering: Research and Applications, Conference Proceedings, Pittsburgh, Penn., Concurrent Technologies Corporation, pp. 383-390 15. Ras, Z.W., Dardzinska, A. (2004) Ontology Based Distributed Autonomous Knowledge Systems, in Information Systems International Journal, Elsevier, Vol. 29, No. 1, 2004, 47-58 16. Ras, Z.W., Dardzinska, A. (2004) Query Answering based on Collaboration and Chase, in the Proceedings of FQAS'04 Conference, Lyon, France, LNCS/LNAI, Springer-Verlag, 2004, will appear 17. Ras, Z.W., Joshi, S. Query approximate answering system for an incomplete DKBS, in Fundamenta Informaticae Journal, lOS Press, Vol. 30, No. 3/4, 1997, 313-324 18. Sowa, J.F. (2000a) Ontology, metadata, and semiotics, in B. Ganter Sz G. W. Mineau, eds.. Conceptual Structures: Logical, Linguistic, and Computational Issues, LNAI, No. 1867, Springer-Verlag, Berlin, 2000, pp. 55-81 19. Sowa, J.F. (2000b) Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole Publishing Co., Pacific Grove, CA. 20. Sowa, J.F. (1999a) Ontological categories, in L. Albertazzi, ed.. Shapes of Forms: From Gestalt Psychology and Phenomenology to Ontology and Mathematics, Kluwer Academic Publishers, Dordrecht, 1999, pp. 307-340. 21. Van Heijst, G., Schreiber, A., Wielinga, B. (1997) Using explicit ontologies in KBS development, in International Journal of Human and Computer Studies, Vol. 46, No. 2/3, 183-292.
10 Basic Principles and Foundations of Information Monitoring Systems Alexander Ryjov Chair of Mathematical Foundations of Intelligent Systems Department of Mechanics and Mathematics Lomonosov' Moscow State University, Moscow, Russia
ryj [email protected]
10.1 Introduction This article describes main ideas of Information Monitoring Systems (IMS) and applications of IMS in real-world problems. Information monitoring systems relate to a class of hierarchical fuzzy discrete dynamic systems. The theoretical base of such class of systems is made by the fuzzy sets theory, discrete mathematics, methods of the analysis of hierarchies which was developed independently in works of Zadeh [9, 10], Messarovich [2], Saaty [8] and others. IMS address to process uniformly diverse, multi-level, fragmentary, unreliable, and varying in time information about some problem/process. Based on this type of information IMS allow perform monitoring of the problem/process evolution and work out strategic plans of problem/process development. These capabilities open a broad area of applications in business (marketing, management, strategic planning), socio-political problems (elections, control of bilateral and multilateral agreements, terrorism), etc. One of such applications is a system for monitoring and evaluation of state's nuclear activities (department of safeguards, IAEA) [3] - have been shortly described in the report.
10.2 Basic elements of IMS and their characteristic We shall name a task of evaluation of a current state of the problem/process and elaboration of the forecasts of its development as an information monitoring problem and human-computer systems ensuring support of a similar sort of information problems - information monitoring systems. Basic elements of monitoring system at the top level are the information space, in which information about the state of the problem/process circulates, and expert (experts), working with this information and making conclusions about the state of the problem/process and forecasts of its development. The information space represents a set of various information elements, which can be characterized as follows: •
diversity of the information carriers, i.e. fixing of the information in the articles, newspapers, computer kind, audio- and video- information etc.;
• fragmentariness: the information most often concerns some fragment of the problem, and different fragments may be "covered" by information to different degrees;
• multiple levels: the information may concern the whole problem, some of its parts, or a particular element of the problem;
• varying degrees of reliability: the information can contain particular data of varying reliability, indirect data, and results of conclusions drawn from reliable information or from indirect conclusions;
• possible discrepancy: information from various sources may coincide, differ slightly, or contradict one another outright;
• variation in time: the problem develops in time, so information about the same element of the problem obtained at different moments may, and should, differ;
• possible bias: the information reflects certain interests of its source and can therefore be tendentious; in particular cases it may be misinformation (for example, in political problems or in problems connected with competition).

The experts are the active element of the monitoring system: observing and studying elements of the information space, they draw conclusions about the state of the problem and the prospects of its development, taking into account the properties of the information space listed above.
10.3 Basic principles of information monitoring technology

Information monitoring systems allow:
• diverse, multi-level, fragmentary, unreliable information varying in time to be processed in a uniform way;
• evaluations of the status of the whole problem/process and/or of its particular aspects to be obtained;
• various situations in the subject area to be simulated;
• "critical ways" of the development of the problem/process to be revealed, i.e. those elements of the problem a small change in whose status may qualitatively change the status of the problem/process as a whole.
Taking into account the given features of the information and the specific methods of its processing, the main features of information monitoring technology can be stated as follows:

• The system provides a facility for taking into account data conveyed by different information vehicles (journals, video clips, newspapers, documents in electronic form, etc.). This facility is provided by storing in the system database a reference to an evaluated piece of information if it is not a document in electronic form; if the information is a document in electronic form, then both the evaluated information (or part thereof) and a reference to it are stored in the system. Thus the system makes it possible to take into account and use in an analysis all pieces of information which have a relationship to the subject area, irrespective of the vehicles concerned.
• The system makes it possible to process fragmentary information. For this purpose a considerable part of the model is represented in the form of a tree. It is clear that for complex problems/processes such a representation of the model is something of a simplification; however, in this way good presentation and simplicity of operation with the model are attained.
• Information with different degrees of reliability, some of it possibly tendentious or biased, can be processed in the system. This is achieved by reflecting the influence of a particular piece of information on the status of the elements of the model of the problem with the aid of fuzzy linguistic values. It should be borne in mind that an evaluation of an element of the model may either change under the influence of newly received information or remain unchanged (i.e. be confirmed).
• Time is one of the parameters of the system. This makes it possible to have a complete picture of the variation of the status of the model with time.
Thus, systems constructed on the basis of this technology maintain a model of the problem that develops in time. The model is supported by references to all the information materials chosen by the analysts, together with overall and individual evaluations of the status of the problem/process. Using time as one of the parameters of the system allows retrospective analysis to be conducted and forecasts of the development of the problem/process to be built. There is also the possibility of identifying "critical points", i.e. those elements of the model a small change in which can cause significant changes in the status of the whole problem/process. Knowledge of such elements is of large practical significance: it allows "critical points" of the problem/process to be revealed and measures to be worked out for blocking undesirable situations or achieving desirable ones, i.e. to some extent to steer the development of the problem/process in the desired direction.
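To make the above concrete, the following minimal Python sketch (the class and field names are hypothetical choices made for illustration; the chapter does not prescribe any particular data structures) shows how a tree-structured model with time-stamped linguistic evaluations and document references could be stored and queried retrospectively.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Evaluation:
    time: int                 # discrete time of the evaluation
    value: str                # linguistic value, e.g. "small", "considerable"
    source_ref: str           # reference to the document that caused it

@dataclass
class Element:
    name: str
    children: List["Element"] = field(default_factory=list)
    history: List[Evaluation] = field(default_factory=list)

    def evaluate(self, time: int, value: str, source_ref: str) -> None:
        """Record a new (or confirming) evaluation of this element."""
        self.history.append(Evaluation(time, value, source_ref))

    def status_at(self, time: int) -> Optional[str]:
        """Retrospective analysis: the latest evaluation not later than `time`."""
        past = [e for e in self.history if e.time <= time]
        return max(past, key=lambda e: e.time).value if past else None

# Usage: a two-level fragment of a model
root = Element("problem")
leaf = Element("aspect-1")
root.children.append(leaf)
leaf.evaluate(1, "small", "doc-17")
leaf.evaluate(5, "considerable", "doc-42")
print(leaf.status_at(3))   # -> "small"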
10.4 Theoretical Basis

For effective practical application of the proposed technological solutions it is necessary to solve a series of theoretical problems; the corresponding results are given below.
10.4.1 Problem 1: Modeling of human perception

It is assumed that the expert describes the degree of inconsistency of the obtained information (for example, the readiness or potential readiness of certain processes in a country [3]) in the form of linguistic values. The subjective degree of convenience of such a description depends on the selection and composition of these linguistic values. Let us explain this with a model example.
Example 1. Suppose it is required to evaluate a quantity of plutonium. Let us consider two extreme situations ([3]).
Situation 1. Only two values are permitted: "small" and "considerable quantity".
Situation 2. Many values are permitted: "very small", "not very considerable quantity", ..., "not small and not considerable quantity", ..., "considerable quantity".
Situation 1 is inconvenient: for many cases both permitted values may be unsuitable, and in describing them we must select between two "bad" values. Situation 2 is also inconvenient: in describing a specific quantity of nuclear material, several of the permitted values may be suitable, and we again experience a problem, now because we are forced to select between two or more "good" values. Could a set of linguistic values be optimal in this sense?
It is assumed that the system tracks the development of the problem, i.e. its variation in time. It is also assumed that it integrates the evaluations of different experts, which means that one object may be described by different experts. It is therefore desirable to have assurances that different experts describe one and the same object in the most "uniform" way. On the basis of the above we may formulate the first problem as follows:
Problem 1. Is it possible, taking into account certain features of man's perception of objects of the real world and of their description, to formulate a rule for selecting the optimal set of values of characteristics on the basis of which these objects may be described?
Two optimality criteria are possible:
Criterion 1. We regard as optimal those sets of values through whose use man experiences the minimum uncertainty in describing objects.
Criterion 2. If an object is described by a certain number of experts, we regard as optimal those sets of values which provide the minimum degree of divergence of the descriptions.
This problem may be reformulated as the problem of constructing an information granulation procedure that is optimal from the point of view of criteria 1 and 2.
Let us consider t fuzzy variables with names a_1, a_2, ..., a_t, specified on one universal set U. We shall call such a collection a semantic space s_t. Let us introduce a system of restrictions on the membership functions μ_j(u) of the fuzzy variables comprising s_t. We shall consider that:
(1) for every j (1 ≤ j ≤ t) there exists U_j^1 ≠ ∅, where U_j^1 = {u ∈ U : μ_j(u) = 1}, and U_j^1 is an interval or a point;
(2) for every j (1 ≤ j ≤ t), μ_j(u) does not decrease on the left of U_j^1 and does not increase on the right of U_j^1 (since, according to (1), U_j^1 is an interval or a point, the notions "on the left" and "on the right" are determined unambiguously).
Requirements 1 and 2 are quite natural for membership functions of concepts forming a semantic space. The first one signifies that, for any concept used on the universal set, there exists at least one object which is a standard for the given concept.
If there are many such standards, they are positioned in a series and are not "scattered" over the universe. The second requirement signifies that, if two objects are "similar" in the metric sense in the universal set, they are also "similar" in the sense of membership of a certain concept.
Henceforth we shall need to use characteristic functions as well as membership functions, and so we shall also require the following technical condition:
(3) for every j (1 ≤ j ≤ t), μ_j(u) has not more than two points of discontinuity of the first kind.
For brevity let us designate requirements (1)-(3) as L. Let us also introduce a system of restrictions on the sets of membership functions of the fuzzy variables comprising s_t. We shall consider that:
(4) completeness: for every u ∈ U there exists j (1 ≤ j ≤ t) such that μ_j(u) > 0;
(5) orthogonality: for every u ∈ U, Σ_{j=1}^{t} μ_j(u) = 1.
Requirements 4 and 5 also have a quite natural interpretation. Requirement 4, the completeness requirement, signifies that for any object of the universal set there exists at least one concept to which it may belong; this means that our semantic space has no "holes". Requirement 5, the orthogonality requirement, signifies that we do not permit the use of semantically similar concepts or synonyms, and that we require sufficient distinction between the concepts used. Note that this requirement is fulfilled or not fulfilled depending on the method used for constructing the membership functions of the concepts forming the semantic space. Note also that all the results given below remain valid under a certain weakening of the orthogonality requirement [7], but its description would require a series of additional concepts; we therefore retain this requirement. For brevity we shall designate requirements 4 and 5 as G. We shall call a semantic space consisting of fuzzy variables whose membership functions satisfy requirements (1)-(3), and whose collection satisfies requirements (4) and (5), a complete orthogonal semantic space, and denote the class of such spaces by G(L).
As can be seen from Example 1, different semantic spaces have different degrees of internal uncertainty. Is it possible to measure this degree of uncertainty? For complete orthogonal semantic spaces the answer to this question is yes. To prove this fact and derive a corresponding formula, we need to introduce a few additional concepts.
Let there be a certain collection of t membership functions s_t ∈ G(L), s_t = {μ_1(u), ..., μ_t(u)}. Let us call the collection of t characteristic functions ŝ_t = {h_1(u), ..., h_t(u)} the most similar collection of characteristic functions, if

    h_j(u) = 1 if μ_j(u) = max_{1≤i≤t} μ_i(u), and h_j(u) = 0 otherwise   (1 ≤ j ≤ t).

It is not difficult to see that, if a complete orthogonal semantic space consists not of membership functions but of characteristic functions, then no uncertainty arises when describing objects in it: the expert unambiguously chooses the term a_j if the object is in the corresponding region of the universal set, and different experts describe one and the same object with one and the same term.
This situation may be illustrated as follows. Assume that we have scales of a certain accuracy and the opportunity to weigh a certain material; moreover, we have agreed that, if the weight of the material falls within a certain range, it belongs to one of the categories. Then the situation is accurately described. The problem is that for our task no such scales exist, nor do we have the opportunity to weigh the objects of interest to us on them. We can assume, however, that of two semantic spaces, the one having the lesser uncertainty is the one which is most "similar" to a space consisting of characteristic functions. In mathematics, a degree of similarity can be expressed by a distance. Is it possible to introduce a distance between semantic spaces? For complete orthogonal semantic spaces it is possible.

Lemma 1. Let s_t ∈ G(L) and s'_t ∈ G(L), with s_t = {μ_1(u), ..., μ_t(u)}, s'_t = {μ'_1(u), ..., μ'_t(u)}, and let ρ(f, g) be a metric in L. Then

    ρ*(s_t, s'_t) = Σ_{j=1}^{t} ρ(μ_j, μ'_j)

is a metric in G_t(L).

The semantic statements formulated in the analysis of s_t may be formalized as follows. Let s_t ∈ G(L). As the measure of uncertainty of s_t we shall take the value of a functional ξ(s_t), defined on the elements of G(L) and taking values in [0, 1] (i.e. ξ : G(L) → [0, 1]), satisfying the following conditions (axioms):
A1. ξ(s_t) = 0 if s_t is a set of characteristic functions;
A2. Let s_t, s'_{t'} ∈ G(L), where t and t' may or may not be equal. Then ξ(s_t) ≤ ξ(s'_{t'}) if ρ(s_t, ŝ_t) ≤ ρ(s'_{t'}, ŝ'_{t'}), where ρ(·, ·) is some metric in G(L).
Do such functionals exist? The answer to this question is given by the following theorem.

Theorem 1 (existence). Let s_t ∈ G(L). Then the functional
    ξ(s_t) = (1/|U|) ∫_U f( μ_{i*}(u) - μ_{i**}(u) ) du                         (10.1)

where

    μ_{i*}(u) = max_{1≤i≤t} μ_i(u),    μ_{i**}(u) = max_{1≤i≤t, i≠i*} μ_i(u),   (10.2)

and where f satisfies the requirements F1: f(0) = 1, f(1) = 0, and F2: f does not increase, is a measure of uncertainty of s_t, i.e. it satisfies A1 and A2.
There are many functionals satisfying the conditions of Theorem 1. They are described in sufficient detail in [13]. The simplest of them is the functional in which the function f is linear. It is not difficult to see that conditions F1 and F2 are satisfied by the sole linear function f(x) = 1 - x. Substituting it in Eq. (10.1), we obtain the following simplest measure of uncertainty of a complete orthogonal semantic space:

    ξ(s_t) = (1/|U|) ∫_U ( 1 - (μ_{i*}(u) - μ_{i**}(u)) ) du                    (10.3)
where μ_{i*}(u), μ_{i**}(u) are determined by the relations (10.2). Let us denote the integrand in (10.3) by η(s_t, u):

    η(s_t, u) = 1 - (μ_{i*}(u) - μ_{i**}(u)).

Now we may give the following interpretation of the measure of uncertainty (10.1).

Fig. 10.1. Interpretation of the measure of uncertainty

Interpretation. Let us consider the process of describing objects in the framework of the semantic space shown in Fig. 10.1. For the objects u_1 and u_5, a person will without hesitation select one of the terms (a_1 and a_2 respectively). For the object u_2 the user starts hesitating between the terms a_1 and a_2. This hesitation increases and attains its peak for the object u_4; at this point the terms a_1 and a_2 are indistinguishable. All the experts will be unanimous in describing the objects u_1 and u_5, while in describing u_2 a certain divergence arises, which attains its peak for the object u_4. Let us now consider the formula for η(s_t, u). It is not difficult to see that

    0 = η(s_t, u_5) = η(s_t, u_1) < η(s_t, u_2) < η(s_t, u_3) < η(s_t, u_4) = 1.

Thus η(s_t, u) actually reflects the degree of uncertainty which a person experiences in describing objects in the framework of the corresponding semantic space, or the degree of divergence of the experts' opinions in such a description. The degree of fuzziness ξ(s_t) (10.1) is then the average measure of such uncertainty over all the objects of the universal set. The following theorems are true [7]:
Let us define the following subsets of the set of functions L:
L̂ is the set of functions from L which are piecewise linear and linear on Û = {u ∈ U : μ_j(u) < 1 for all j, 1 ≤ j ≤ t};
L̄ is the set of functions from L which are piecewise linear on U (including Û).

Theorem 2. Let s_t ∈ G(L̂). Then ξ(s_t) = Δ / (2|U|), where Δ = |Û|.

Theorem 3. Let s_t ∈ G(L̄). Then ξ(s_t) = cΔ / (2|U|), where Δ = |Û| and c ≤ 1, c = Const.
Let g be a one-to-one function defined on U. This function induces a transformation of any s_t ∈ G_t(L) on the universe U into g(s_t) on the universe U' = g(U) = {u' : u' = g(u), u ∈ U}. The induced transformation is defined as follows: g(s_t) is the set of membership functions {μ'_1(u'), ..., μ'_t(u')}, where μ'_j(u') = μ'_j(g(u)) = μ_j(g^{-1}(u')) = μ_j(u), μ_j ∈ s_t, 1 ≤ j ≤ t. The following example illustrates this definition.
Example 2. Let s_t ∈ G(L), let U be the universe of s_t and let g be an expansion (compression) of the universe U. In this case g(s_t) is the set of functions produced from s_t by the same expansion (compression).

Theorem 4. Let s_t ∈ G(L), let U be the universe of s_t, let g be a linear one-to-one function defined on U, and let ξ(s_t) ≠ 0. Then ξ(s_t) = ξ(g(s_t)).

An important aspect of the practical use of any model is its stability. It is quite natural that in identifying the parameters of the model (in our case, when constructing the membership functions) small measurement errors can occur. If the model is sensitive to such errors, its practical use is very problematic. Let us therefore assume that the membership functions are not given exactly but only with a certain "accuracy" δ (Fig. 10.2). We call this situation the δ-model and denote it by G^δ(L). In this situation we can calculate the top (ξ̄(s_t)) and bottom (ξ̲(s_t)) estimates of the degree of fuzziness ξ(s_t).

Fig. 10.2. δ-model
Theorem 5. Let s_t ∈ G^δ(L̂). Then

    ξ̲(s_t) = (Δ / (2|U|)) (1 - δ_1)²                                            (10.4)

    ξ̄(s_t) = (Δ / (2|U|)) (1 + 2δ_1)                                            (10.5)
Comparing the results of Theorem 2 and Theorem 5, we see that for small values of δ the main laws of our model are preserved. We can therefore use this technique of estimating the degree of fuzziness in practical tasks, since it has been shown to be stable. Based on the results described above, we can propose the following rule for selecting the optimal set of values of the characteristics on the basis of which objects are described (a numerical sketch of this procedure is given after the list):

• All the "reasonable" sets of linguistic values are formulated.
• Each such set is represented in the form of a complete orthogonal semantic space.
• For each set the measure of uncertainty (10.1) is calculated.
• As the optimal set, minimizing both the uncertainty in the description of objects and the degree of divergence of the experts' opinions, we select the one whose uncertainty is minimal.
We can formulate the following summary for this section. It has been shown that a method can be formulated for selecting the optimal set of values of qualitative attributes (a collection of granules [5]). Moreover, this method is stable, i.e. the natural small errors that may occur in constructing the membership functions do not have a significant influence on the selection of the optimal set of values. The sets which are optimal according to criteria 1 and 2 coincide. Following this method, we may describe objects with the minimum possible uncertainty, i.e. guarantee optimal operation of the information monitoring system from this point of view.

10.4.2 Problem 2: Information retrieval in fuzzy databases

Information monitoring technology assumes that information materials (or references to them) and their linguistic evaluations are stored in the system database. In this connection the following problem arises.
Problem 2. Is it possible to define indices of the quality of information retrieval in fuzzy (linguistic) databases and to formulate a rule for selecting a set of linguistic values whose use provides the maximum indices of quality of information retrieval?
This problem may be reformulated as the problem of constructing an information granulation procedure that is optimal from the point of view of information retrieval in fuzzy (linguistic) databases.
Information monitoring systems are human-machine information systems. The user's evaluations of the accessible information materials are stored in the database
of the system and are used for evaluating the current status of the problem and for forecasting its development (see section 10.3). In this sense the system database is the basis of the information model of the subject area. The quality of this basis (and, accordingly, of the model of the problem) is expressed, in particular, through the parameters of information retrieval: if the database containing the linguistic descriptions of the objects of the subject area allows qualitative and effective search for relevant information, then the information monitoring system will also work qualitatively and effectively.
As in section 10.4.1, we shall assume that the set of linguistic values describing the degree of discrepancy of the received information, or the possibility of realization of some processes in a state, can be represented as an element of G(L). In our study of the process of information search in databases whose objects have a linguistic description, we introduced the concepts of information losses (π_X(U)) and information noise (ν_X(U)). These concepts apply to information searches in databases whose attributes have a set of values X modeled by the fuzzy sets in s_t. The meaning of these concepts can be informally described as follows. While interacting with the system, a user formulates a query for objects satisfying certain linguistic characteristics and gets an answer according to his search request. If he knew the real (not the linguistic) values of the characteristics, he would probably delete some of the objects returned by the system (information noise), and he would probably add some others from the database (information losses) not returned by the system. Information noise and information losses have their origin in the fuzziness of the linguistic descriptions of the characteristics.
Let us denote by N(u) the number of objects, with descriptions stored in the database, that possess a real (physical, not linguistic) value equal to u, and by p_j (j = 1, ..., t) the probability that a request refers to the j-th value of the characteristic. The following theorems are true [8].

Theorem 6. Let s_t ∈ G(L̂), N(u) = N = Const and p_j = 1/t (j = 1, ..., t). Then

    π_X(U) = ν_X(U) = (2N / t) ξ(s_t).

Theorem 7. Let s_t ∈ G(L̄), N(u) = N = Const and p_j = 1/t (j = 1, ..., t). Then

    π_X(U) = ν_X(U) = c ξ(s_t),

where c is a constant which depends on N only.

Theorem 8. Let s_t ∈ G^δ(L̂), N(u) = N = Const and p_j = 1/t (j = 1, ..., t). Then

    π̲_X(U) = ν̲_X(U) = (2N / (t(1 + 2δ_1))) ξ̲(s_t),    π̄_X(U) = ν̄_X(U) = (2N(1 + 2δ_1) / t) ξ̄(s_t).
Based on the results described above, we can propose the following rule for selecting a set of linguistic values whose use provides the maximum indices of quality of information retrieval (a simulation sketch of the losses and noise follows the list):

• All the "reasonable" sets of linguistic values are formulated.
• Each such set is represented as a complete orthogonal semantic space.
• For each set the measure of uncertainty (10.1) is calculated.
• As the optimal set of linguistic values, whose use provides the maximum indices of quality of information retrieval, we select the one for which the ratio ξ(s_t)/t is minimal.
We can formulate the following summary for this section. It has been shown that indices of the quality of information retrieval in fuzzy (linguistic) databases can be introduced and formalized, and that a method can be formulated for selecting the optimal set of values of qualitative attributes which provides the maximum quality indices of information retrieval. Moreover, this method is stable, i.e. the natural small errors in the construction of the membership functions do not have a significant effect on the selection of the optimal set of values. This allows us to claim that the proposed methods can be used in practical tasks and guarantee optimal work of information monitoring systems.

10.4.3 Problem 3: Aggregation of information in fuzzy hierarchical systems

Because the model of the problem/process has a hierarchical structure (see section 10.2), the choice and tuning of aggregation operators for the nodes of the model is one more important issue in the development of IMS. We may formulate this problem as follows:
Problem 3. Is it possible to propose procedures for information aggregation in fuzzy hierarchical dynamic systems which allow us to minimize contradictoriness in the model of the problem/process in an IMS?
Let the model of the object or process be a tree D with nodes d_j (j = 0, ..., N_D), each of which is linked with a set of linguistic values X_j describing the state of the node. Every non-leaf node is associated with an operator of aggregation of information which, on the basis of the evaluations of the states of the subordinated nodes, computes its own state (i.e. chooses one of the elements of the corresponding set of values). Frequently the choice of such an operator is determined by the properties of the model; for example, for technical problems/processes min is a good enough operator. Often, however, this choice is not so obvious (for example, for problems from the areas of political science, sociology or medicine). For these cases it is necessary to develop methods for choosing adequate aggregation operators on the basis of the information accessible from experts and of the analysis of the functioning of the model.
Let us consider some non-leaf node d_{j_0} with subordinate nodes (in the sense of the tree D) d_{j_1}, d_{j_2}, ..., d_{j_{N_{j_0}}}. Then the operator of aggregation of information (OAI) is a function defined on the set of all possible values of the subordinated nodes and taking values in the set X_{j_0}:

    O_{j_0} : X_{j_1} × X_{j_2} × ... × X_{j_{N_{j_0}}} → X_{j_0}                 (10.6)
The sets X_j (j = 0, ..., N_D) are sets of linguistic values a_j^1, a_j^2, ..., a_j^{t_j}. Let us denote by M[O_{j_0}] the set of possible OAI for the node d_{j_0}. It is obvious that for a concrete element of the model the number of possible OAI is large: from (10.6) it follows directly that

    |M[O_{j_0}]| = t_{j_0}^( t_{j_1} · t_{j_2} · ... · t_{j_{N_{j_0}}} ).

Our task is the choice of a concrete operator O_j ∈ M[O_j] for every non-leaf node d_j of the model D. This choice is based on some information I_j about the "ideal" OAI O_j ∈ M[O_j]. This information is represented by two sets:

    I_j = I_j^(1) ∪ I_j^(2),                                                     (10.7)

where I_j^(1) is a set of statements of experts about the "correct behaviour" of O_j, and I_j^(2) is a set of records of the "work" of O_j. The following statements illustrate I_j^(1): "If d_{j_1} = a_{j_1}^{k_1} and d_{j_2} = a_{j_2}^{k_2}, ..., and d_{j_{N_{j_0}}} = a_{j_{N_{j_0}}}^{k_N}, then d_{j_0} = a_{j_0}^{k_0}"; "If d_{j_1} is strongly increasing, then d_{j_0} is decreasing"; "d_{j_0} is monotonically decreasing in all arguments", etc. I_j^(2) is a table of the following kind:

    (d_{j_1}, d_{j_2}, ..., d_{j_{N_{j_0}}})                      |  d_{j_0}
    ---------------------------------------------------------------------------------------------------------
    (a_{j_1}^1, a_{j_2}^1, ..., a_{j_{N_{j_0}}}^1)                |  an element of {a_{j_0}^1, ..., a_{j_0}^{t_{j_0}}}, or empty
    ...                                                           |  ...
    (a_{j_1}^{t_{j_1}}, ..., a_{j_{N_{j_0}}}^{t_{j_{N_{j_0}}}})   |  an element of {a_{j_0}^1, ..., a_{j_0}^{t_{j_0}}}, or empty

The left column of the table is the collection of all pairwise different value combinations of d_{j_1}, d_{j_2}, ..., d_{j_{N_{j_0}}}; the right column contains values of d_{j_0} based on I_j^(1), or empty records. At the beginning of the work the table contains only d_{j_0} values of the first type (those based on I_j^(1)). As information is received and evaluated by the user, the table is filled in on the basis of calculations for the chosen operator O_j, up to the moment when the user disagrees with a "theoretical" value of d_{j_0}. If such a contradiction does not arise, the operator has been chosen successfully and is an adequate OAI for the given node of the model. If a contradiction does arise, the procedure of choosing an adequate OAI has to be repeated, but on the basis of the extended and, perhaps, refined (together with the expert) information I_j^(1) and I_j^(2). This process is repeated until the table is filled completely; the final table is an adequate OAI for the given node. A sketch of this candidate-filtering loop is given below. Unfortunately we cannot present all our algorithms for choosing an adequate OAI here, due to the article's volume limitations.
Here we can only formulate the following summary. It has been shown that approaches based on different interpretations of aggregation operators can be proposed: geometrical, logical, and learning-based; the last includes learning based on genetic algorithms and learning based on neural networks. These approaches are described in detail in [6].
10.5 Application features

Several applied information monitoring systems based on the technology described above have been developed. Based on this experience, we can formulate the following necessary stages of the development process:
• conceptual design;
• development of a demonstration prototype;
• development of a prototype of the system and its operational testing;
• development of the final system.
The volume of this article does not allow us to describe these stages in detail, so we focus only on matters of principle. The most difficult point in the development process is the elaboration of the structure of the problem/process model. In some well-developed areas (marketing, medicine) we used descriptions of the process from professional books and references (such as [1]) as a draft of the model, coordinated this draft with professional experts (the conceptual design and demonstration prototype stages), and "tuned" the improved draft during testing of the system (the prototype development stage). Sometimes the problem/process to be monitored is already formalized enough for the application of information monitoring technology. An example of this situation is the procedure for evaluating a state's nuclear programme in the IAEA [3]: the previously developed so-called physical model of the nuclear fuel cycle was a good basis for the model of the problem/process in the information monitoring system. Based on this model, a prototype information monitoring system has been developed. This prototype can [3]:
• Provide a tool for continuous monitoring of the status of the subject area.
• Provide the IAEA expert with a tool to input into the system documents concerning the States' nuclear activities in textual format, or references to documents in the form of hard copies, video topics, audio reports, etc.
• Produce an evaluation of the influence of an obtained sign on the status of the elements of the model, and change (or confirm) their status accordingly.
• Provide a tool for examining the status of the subject area at several levels of detail.
• Detect inconsistencies between the declared States' capabilities for processing nuclear material and those capabilities as established by the Agency through analysis of information from other available sources.
• Assess the importance of any detected inconsistency from the point of view of a change in the States' capabilities to produce HEU and Pu.
• Detect "critical points" important from the point of view of the production of HEU and Pu, information about which is crucial for resolving an inconsistency between a country's declaration and its capabilities for processing nuclear material as established by the Agency.
• Provide storage in its database of all the documents evaluated by the expert, or references to them, with linkage to specific elements of the model of the nuclear activity of a country.
• Provide the IAEA expert with a tool for retrospective analysis of the changes in the evaluations of each element of the model, with the possibility of viewing the corresponding document or obtaining references to it.
• Record changes occurring in the system and provide the user with a tool for analyzing them.
10.6 Conclusion

IMS work with diverse, multi-level, fragmentary, unreliable information about some problem/process that varies in time, and allow the evolution of the problem/process to be monitored and strategic plans for its development to be worked out. The most difficult point in the development process is the elaboration of the structure of the problem/process model. A promising way to automate this step (the development of the model of the problem/process) is the application of advanced technologies such as data mining; our first experiments have shown that data mining can be a good tool for this task, especially if enough data on the problem/process is available. The methods developed for problems 1-3 (section 10.4) allow us to guarantee optimal work of an IMS.
References

1. Kotler, P. Marketing Management (10th Edition). Prentice Hall, 1999, 784 p.
2. Messarovich, M.D., Macko, D., Takahara, Y. Theory of hierarchical multilevel systems. Academic Press, N.Y. - London, 1970, 344 p.
3. Ryjov, A., Belenki, A., Hooper, R., Pouchkarev, V., Fattah, A. and Zadeh, L.A. Development of an Intelligent System for Monitoring and Evaluation of Peaceful Nuclear Activities (DISNA), IAEA, STR-310, Vienna, 1998, 122 p.
4. Ryjov, A. Estimation of fuzziness degree and its application in intelligent systems development. Intelligent Systems, V.1, 1996, p. 205-216 (in Russian).
5. Ryjov, A. Fuzzy Information Granulation as a Model of Human Perception: some Mathematical Aspects. Proceedings of the Eighth International Fuzzy Systems Association World Congress, 1999, p. 82-86.
6. Ryjov, A. On information aggregation in fuzzy hierarchical systems. Intelligent Systems, V.6, 2001, p. 341-364 (in Russian).
7. Ryjov, A. The principles of fuzzy set theory and measurement of fuzziness. Moscow, Dialog-MSU Publishing, 1988, 116 p. (in Russian).
8. Ryjov, A. Models of information retrieval in fuzzy databases. Moscow, MSU Publishing, 2004, 96 p. (in Russian).
9. Saaty, T.L. The Analysis of the Hierarchy Process. Moscow, Radio and Swjaz, 1993, 315 p. (in Russian).
10. Zadeh, L.A. Fuzzy sets. Information and Control, 1965, v.8, pp. 338-353.
11. Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning. Parts 1, 2, 3. Inform. Sci. 8, 199-249; 8, 301-357; 9, 43-80 (1975).
11 Modelling Unreliable and Untrustworthy Agent Behaviour

Marek Sergot
Department of Computing, Imperial College London, 180 Queen's Gate, London SW7 2BZ, UK
mj [email protected]

Summary. It cannot always be assumed that agents will behave as they are supposed to behave. Agents may fail to comply with system norms deliberately, in open agent systems or other competitive settings, or unintentionally, in unreliable environments because of factors beyond their control. In addition to analysing system properties that hold if specifications/norms are followed correctly, it is also necessary to predict, test, and verify the properties that hold if system norms are violated, and to test the effectiveness of introducing proposed control, enforcement, and recovery mechanisms. (C+)++ is an extended form of the action language C+ of Giunchiglia, Lee, Lifschitz, McCain, and Turner, designed for representing norms of behaviour and institutional aspects of (human or computer) societies. We present the permission component of (C+)++ and then illustrate on a simple example how it can be used in conjunction with standard model checkers for the temporal logic CTL to verify system properties in the case where agents may fail to comply with system norms.
11.1 Introduction

It is a common assumption in many multi-agent systems that agents will behave as they are intended to behave. Even in systems such as IMPACT [1], where the language of 'obligation' and 'permission' is employed in the specification of agent behaviour, there is an explicit, built-in assumption that agents always fulfill their obligations and never perform actions that are prohibited. For systems constructed by a single designer and operating on a stable and reliable platform, this is a perfectly reasonable assumption. There are at least two main circumstances in which the assumption must be abandoned. In open agent societies, where agents are programmed by different parties, where there is no direct access to an agent's internal state, and where agents do not necessarily share a common goal, it cannot be assumed that all agents will behave according to the system norms that govern their behaviour. Agents must be assumed to be untrustworthy because they act on behalf of parties with competing interests, and so may fail, or even choose not, to conform to the society's norms in order to achieve their individual goals. It is then usual to impose sanctions to discourage norm-violating behaviour and to provide some form of reparation when it does occur.
The second circumstance is where agents may fail to behave as intended because of factors beyond their control. This is likely to become commonplace as multi-agent systems are increasingly deployed on dynamic distributed environments. Agents in such circumstances are unreliable, but not because they deliberately seek to gain unfair advantage over others. Imposition of sanctions to discourage norm-violating behaviour is pointless, though there is a point to specifying reparation and recovery norms. There is a third, less common, circumstance, where deliberate violations may be allowed in order to deal with exceptional or unanticipated situations. An example of discretionary violation of access control policies in computer security is discussed in [2]. In all these cases it is meaningful to speak of obligations and permissions, and to describe agent behaviour as governed by norms, which may be violated, accidentally or on purpose. In addition to analysing system properties that hold if specifications/norms are followed correctly, it is also necessary to predict, test, and verify the properties that hold if these norms are violated, and to test the effectiveness of introducing proposed control, enforcement, and recovery mechanisms. In previous work [3, 4, 5] we presented a framework for specifying open agent societies in terms of permissions, obligations, and other more complex normative relations. Norms are represented in various action formalisms in order to provide an executable specification of the agent society. This work, however, did not address verification of system properties, except in a limited sense. In another strand of work [6, 7], we have addressed verification of system properties but that was in the specific context of reasoning about knowledge in distributed and multi-agent systems. We showed that by adding a simple deontic component to the formalism of 'interpreted systems' [8] it is possible to determine formally which of a system's critical properties are compromised when agents fail to behave according to prescribed communication protocols, and then to determine formally the effectiveness of introducing additional controller agents whose role is to enforce compliance. In this paper we conduct a similar exercise, but focussing now on agent behaviours generally rather than on communication and epistemic properties specifically. We present the main elements of a formalism (C+)++ which we have been developing for representing norms of behaviour and institutional aspects of (human or computer) societies [9]; we then present a simple example to sketch how it can be used in modelling unreliable/untrustworthy agent behaviour. (C+)++ is an extended form of the action language C+ of Giunchiglia, Lee, Lifschitz, McCain, and Turner [10], a formalism for specifying and reasoning about the effects of actions and the persistence ('inertia') of facts over time. An 'action description' in C+ is a set of C+ rules which define a transition system of a certain kind. Implementations supporting a range of querying and planning tasks are available, notably in the form of the 'Causal Calculator' CCALC. Our extended version (C+)++ provides two main extensions. The first is a means of expressing 'counts as' relations between actions, also referred to as 'conventional generation' of actions. This feature will not be discussed in this paper.
The second extension is a way of specifying the permitted (acceptable, legal) states of a transition system and its permitted (acceptable, legal) transitions. This will be the focus of attention in this paper.
A main attraction of the C+ formalism compared to other action languages in the AI literature is that it has an explicit semantics in terms of transition systems and also a semantics in terms of a nonmonotonic formalism ('causal theories', summarised below) which provides a route to implementation via translations to executable logic programs. The emphasis in this paper is on the transition system semantics. Transition systems provide a bridge between AI formalisms and standard methods in other areas of computer science. We exploit this link by applying standard temporal logic model checkers to verify system properties of transition systems defined using the language (C+)++. Two points of clarification: (1) We do not distinguish in this paper between deliberate and unintentional norm violation, and (2) we are modelling agent behaviour from an external "bird's eye" perspective. We do not discuss an agent's internal state or its internal reasoning mechanisms.
11.2 The language C+

The language C was introduced by Giunchiglia and Lifschitz [11]. It applies the ideas of 'causal theories' to reasoning about the effects of actions and the persistence ('inertia') of facts ('fluents'), building on earlier suggestions by McCain and Turner. C+ extends C by allowing multi-valued fluents as well as Boolean fluents and generalises the form of rules in various ways. The definitive presentation of C+, and its relationship to 'causal theories', is [10]. An implementation supporting a range of querying and planning tasks is available in the form of the Causal Calculator (CCALC)¹. We present here a concise, and necessarily rather dense, summary of the language. Some features (notably 'statically determined fluents') are omitted for simplicity. There are also some minor syntactic and terminological differences from the version presented in [10], and we give particular emphasis to the transition system semantics.

Syntax and semantics

We begin with σ, a multi-valued propositional signature, which is partitioned into a (non-empty) set σ^f of fluent constants and a (non-empty) set σ^a of action constants. For each constant c ∈ σ there is a finite, non-empty set dom(c) of values. For simplicity, in this paper we will assume that each dom(c) has at least two elements. An atom of the signature is an expression c=v, where c ∈ σ and v ∈ dom(c); c=v is a fluent atom when c ∈ σ^f and an action atom when c ∈ σ^a. A Boolean constant is one whose domain is the set of truth values {t, f}. When c is a Boolean constant, we often write c for c=t and ¬c as a shorthand for c=f. Formulas are constructed from the atoms using the usual propositional connectives. The expressions ⊤ and ⊥ are 0-ary connectives, with the usual interpretation. A fluent formula is a formula whose constants all belong to σ^f; an action formula is a formula whose constants all belong to σ^a, except that ⊤ and ⊥ are fluent formulas but not action formulas.

¹ http://www.cs.utexas.edu/users/tag/cc
An interpretation of a multi-valued signature σ is a function mapping every constant c to some v ∈ dom(c); an interpretation X is said to satisfy an atom c=v if X(c) = v, and in this case we write X ⊨ c=v. The satisfaction relation ⊨ is extended from atoms to formulas in accordance with the standard truth tables for the propositional connectives. We let the expression I(σ) stand for the set of interpretations of σ. For convenience, we adopt the convention that an interpretation X of σ is identified with the set of atoms that are satisfied by X, i.e., X ⊨ c=v iff c=v ∈ X for any atom c=v of σ.
Every action description D of C+ defines a labelled transition system (S, A, R) where
• S is a (non-empty) set of states, each of which is an interpretation of the fluent constants σ^f of D; S ⊆ I(σ^f);
• A is a set of transition labels, sometimes referred to as action labels or events; A is the set of interpretations of the action constants σ^a, A = I(σ^a);
• R is a set of transitions, R ⊆ S × A × S.
A minimal sketch of these notions, in Python, follows.
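The dictionary representation and helper names below are choices made for illustration only; they are not part of C+ itself. Interpretations are mappings from constants to values, and satisfaction of atoms can be evaluated directly.

from itertools import product

fluents = {"loc": ["W", "t", "E"]}     # a fluent constant with three possible values
actions = {"move": ["go", "stay"]}     # an action constant with two possible values

def interpretations(signature):
    """All interpretations I(sigma): every assignment of a domain value to each constant."""
    names = list(signature)
    for values in product(*(signature[n] for n in names)):
        yield dict(zip(names, values))

def satisfies(interp, atom):
    """X |= c=v, where an atom is written as a (constant, value) pair."""
    constant, value = atom
    return interp[constant] == value

label = {"move": "go"}                           # a transition label: an interpretation of sigma^a
print(satisfies(label, ("move", "go")))          # True: the label is of type move=go
print(len(list(interpretations(fluents))))       # |I(sigma^f)| = 3 candidate states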
For example, suppose there are three agents, a, b, and c, which can move in direction E, W, N, or S, or remain idle. Suppose (for the sake of an example) that they can also whistle as they move. Let the action signature consist of action constants move(a), move(b), move(c) with domains {E, W, N, S, idle}, and Boolean action constants whistle(a), whistle(b), whistle(c). Then one possible interpretation of the action signature, and therefore one possible transition label, is {move(a)=E, move(b)=N, move(c)=idle, whistle(a), ¬whistle(b), whistle(c)}. Because every transition label ε is an interpretation of the action signature σ^a, action formulas α can be evaluated on the transition labels. We sometimes say that a transition (s, ε, s') is a transition of type α when ε ⊨ α.
An action description D in C+ is a set of causal laws, which are expressions of the following three forms. A static law is an expression:

    F if G                                                                        (11.1)

where F and G are fluent formulas. Static laws express constraints on states. A fluent dynamic law is an expression:

    F if G after ψ                                                                (11.2)

where F and G are fluent formulas and ψ is any formula of signature σ^f ∪ σ^a. Informally, (11.2) states that fluent formula F is satisfied by the resulting state s' of any transition (s, ε, s') with s ∪ ε ⊨ ψ, as long as fluent formula G is also satisfied by s'. Some examples follow. An action dynamic law is an expression:

    α if ψ                                                                        (11.3)

where α is an action formula and ψ is any formula of signature σ^f ∪ σ^a. Action dynamic laws are used to express, among other things, that any transition of type α
must also be of type α' (α' if α), or that any transition from a state satisfying fluent formula G must be of type β (β if G). Examples will be provided in later sections.
The C+ language provides various abbreviations for common forms of causal laws. We will employ the following in this paper.
α causes F if G expresses that fluent formula F is satisfied by any state following the occurrence of a transition of type α from a state satisfying fluent formula G. It is shorthand for the fluent dynamic law F if ⊤ after G ∧ α. α causes F is shorthand for F if ⊤ after α.
nonexecutable α if G expresses that there is no transition of type α from a state satisfying fluent formula G. It is shorthand for the fluent dynamic law ⊥ if ⊤ after G ∧ α, or α causes ⊥ if G.
inertial f states that values of the fluent constant f persist by default ('inertia') from one state to the next. It is shorthand for the collection of fluent dynamic laws f=v if f=v after f=v for every v ∈ dom(f).
Of most interest are definite action descriptions, which are action descriptions in which the head of every law (static, fluent dynamic, or action dynamic) is either an atom or the symbol ⊥, and in which no atom is the head of infinitely many laws of D. We will restrict attention to definite action descriptions in this paper.
Now for the semantics. (See [9] for further details.) Let T_static(s) stand for the heads of all static laws in D whose bodies are satisfied by s; let E(s, ε, s') stand for the heads of all fluent dynamic laws in D whose bodies are satisfied by the transition (s, ε, s'); and let A(ε, s) stand for the heads of all action dynamic laws whose bodies are satisfied by the transition (s, ε, s'):

    T_static(s)  =def  { F : F if G is in D, s ⊨ G }
    E(s, ε, s')  =def  { F : F if G after ψ is in D, s' ⊨ G, s ∪ ε ⊨ ψ }
    A(ε, s)      =def  { α : α if ψ is in D, s ∪ ε ⊨ ψ }

Let D be a definite action description and σ^f its fluent signature. A set s of fluent atoms is a state of D iff it satisfies the static laws of D, that is, iff
• s ⊨ T_static(s)  (i.e., T_static(s) ⊆ s).
(s, ε, s') is a transition of D iff s and s' are interpretations of the fluent signature σ^f and ε is an interpretation of the action signature σ^a such that:
• s ⊨ T_static(s)  (T_static(s) ⊆ s; s is a state of D)
• s' = T_static(s') ∪ E(s, ε, s')
• ε ⊨ A(ε, s)  (A(ε, s) ⊆ ε)
One can see from the definition that s' is a state of D when (s, ε, s') is a transition of D.

Paths

Finally, when (S, A, R) is a labelled transition system, a path of length m is a sequence s_0 ε_0 s_1 ... s_{m-1} ε_{m-1} s_m (m ≥ 0) such that (s_{i-1}, ε_{i-1}, s_i) ∈ R for i ∈ 1..m. We will also be interested in infinite (ω-length) paths.
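The definitions of states and transitions above can be checked by brute force for a very small action description. The following Python sketch encodes laws as tuples of atom sets, which is an assumption made here for illustration (it is not CCALC input and handles only laws whose heads and bodies are conjunctions of atoms), and enumerates the transition system of a one-fluent, one-action example.

from itertools import product

fluents = {"p": [True, False]}
actions = {"a": [True, False]}

static_laws = []                                           # F if G          -> (head, G)
fluent_dynamic_laws = [                                    # F if G after psi -> (head, G, psi)
    (("p", True),  set(),           {("a", True)}),        # a causes p
    (("p", True),  {("p", True)},   {("p", True)}),        # inertial p  (value True)
    (("p", False), {("p", False)},  {("p", False)}),       # inertial p  (value False)
]
action_dynamic_laws = []                                   # alpha if psi    -> (head, psi)

def interps(sig):
    names = list(sig)
    for vals in product(*(sig[n] for n in names)):
        yield frozenset(zip(names, vals))

def T_static(s):
    return {h for h, G in static_laws if G <= s}

def E(s, eps, s1):
    return {h for h, G, psi in fluent_dynamic_laws if G <= s1 and psi <= (s | eps)}

def A(eps, s):
    return {h for h, psi in action_dynamic_laws if psi <= (s | eps)}

states = [s for s in interps(fluents) if T_static(s) <= s]
transitions = [(s, eps, s1)
               for s in states for eps in interps(actions) for s1 in interps(fluents)
               if s1 == frozenset(T_static(s1) | E(s, eps, s1)) and A(eps, s) <= eps]

for s, eps, s1 in transitions:
    print(dict(s), dict(eps), "->", dict(s1))     # a=True forces p; otherwise p persists by inertia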
Causal theories The language C-h can be regarded as a higher-level notation for defining particular classes of theories in the non-monotonic formalism of 'causal theories', and indeed this is how it is presented in [10]. For present purposes the important points are these: for every (definite) action description D and non-negative integer m there is a natural translation from D toa causal theory F ^ which encodes the paths of length m in the transition system defined by D; moreoever, for every definite causal theory F ^ there is a formula com;?(F^) of (classical) prepositional logic whose (classical) models are in 1-1 correspondence with the paths of length m in the transition system defined by D. Thus, one method of computation for C+ action descriptions is to construct the formula comp{T^) from the action description D and then employ a (standard, classical) satisfaction solver to determine the models of comp{r^). This is the method employed in the 'Causal Calculator' C C A L C . We summarise the main steps for completeness; the reader may wish to skip the details on first reading. A full account is given in [10]. A causal theory of signature cr is a set of expressions ('causal rules') of the form F<=G where F and G are formulas of signature a. F is the head of the rule and G is the body. A rule F ^ Gisiobe read as saying that F is 'caused' if G is true, or (perhaps better), that there is an explanation for the truth of F if G is true. Let F be a causal theory and let X be an interpretation of its signature. The redact T^ is the set of all rules of F whose bodies are satified by the interpretation X: F ^ =def {F I F <^ G is a rule in F and X \=G}.Xisa model of F iff X is the unique model (in the sense of multi-valued signatures) of F ^ . A causal theory F is definite if the head of every rule of F is an atom or ±, and no atom is the head of infinitely many rules of F. Every definite causal theory F can be translated into a formula comp(r) of (classical) propositional logic via the process of 'literal completion': for each atom c=v construct the formula c=v <e^ Gi V • VG^ where G i , . . . , Gn (n > 0) are the bodies of the rules of F with head c=v\ comp(F) is the conjunction of all such formulas together with formulas -iF for each rule of the form _L <^ F in F. The models of a definite causal theory F are precisely the (classical) models of its literal completion, comp{T). Given an action description D in C-f, and any non-negative integer m, translation to the corresponding causal theory F ^ proceeds as follows. The signature of F ^ is obtained by time-stamping every fluent and action constant of D with non-negative integers between 0 and m: the (new) atom f[i]=v represents that fluent f=v holds at integer time z, or more precisely, that f=v is satisfied by the state 5, of a path So ^0 • • • €m-iSmOf the transition system defined by D; the atom a[i]=t' represents that action atom a=v is satisfied by the transition e^ of such a path. In what follows, i/:[i] is shorthand for the formula obtained by replacing every atom c=v in z/' by the the timestamped atom c[i]=i;. Now, for every static law F if G in D and every i e 0.. m, include in F £ a causal rule of the form
11 Modelling Unreliable and Untrustworthy Agent Behaviour
167
For every fluent dynamic law F \f G after ipin D and every i G 0.. m—1, include a causal rule of the form F[i-\-l]<=G[i+l]AiP\i] And for every action dynamic law o; if -0 in D and every i e 0.. m—1, include a causal rule of the form a[i] 4= ip[i]
We also require the following 'exogeneity laws'. For every fluent constant / and every v e dom{f), include a causal rule: /[0]=^ <= f[0]=v And for every action constant a, every v G dom{a), and every z G 0.. m—1, include a causal rule: a[z]=i; 4= a[z]=f (There are some further complications in the full C+ language concerning 'statically determined' fluents and non-exogenous actions, which we are ignoring here for simplicity.) It is straightforward to check [10] that the models of causal theory F ^ , and hence the (classical) models of the propositional logic formula comp(r^), correspond 1-1 to the paths of length m of the transition system defined by the C-h action description D. In particular, models of comp{T^) encode the transitions defined by D and models of comp^T^) the states defined by D, Given an action description D and a non-negative integer m, the 'Causal Calculator' CCALC performs the translation to the causal theory F ^ , constructs comp(F^), and then invokes a standard propositional satisfaction solver to find the (classical) models of comp(F£). So, for example, plans of length m from an initial state satisfying fluent formula F to a goal state satisfying fluent formula G can be found by determining the models of the (classical) propositional formula co7np(F^) A F [ 0 ] AG[m]. It must be emphasised, however, that C-h is a language for defining labelled transition systems (of a certain kind), and is not restricted to use with C C A L C . A variety of other languages can be interpreted on the transition system defined by a C-i- action description. In particular, in later sections we will look at the use of the branching time temporal logic CTL for expressing system properties to be checked on transition systems defined by C+. Example (trains) The following example is used in [12, 13] to illustrate the use of alternating-time logic (ATL) for determining the effectiveness of 'social laws' designed to co-ordinate the actions of agents in multi-agent system. We will use the example for a different purpose: in this section, to illustrate use of the language C-f, and in later sections, to show how the extended form C+"^''"can be used to analyse variants of the example in which agents may fail to obey social laws.
There are two trains, a and b, with a running clockwise round a double track, and b running anti-clockwise. There is a tunnel in which the double track becomes a single track. If the trains are both inside the tunnel at the same time they will collide. The tunnel can thus be seen as a kind of critical section, or as a resource for which the trains must compete.
There are obviously many ways in which the example can be formulated. The following will suffice for present purposes. Although it may seem unnecessarily complicated, this formulation is convenient for the more elaborate versions of the example to come later. Let fluent constants loc(a) and loc(b) represent the locations of trains a and b respectively. They both have possible values {W, t, E}. For action constants, we take a and b with possible values {go, stay}. (Action constants act(a) and act(b) may be easier to read but we choose a and b for brevity.) The C+ action description representing the possible movements of the trains is as follows. We will call this action description D_trains.

    inertial loc(a), loc(b)

    % train a moves clockwise:
    a=go causes loc(a)=t if loc(a)=W
    a=go causes loc(a)=E if loc(a)=t
    a=go causes loc(a)=W if loc(a)=E

    % train b moves anti-clockwise:
    b=go causes loc(b)=t if loc(b)=E
    b=go causes loc(b)=W if loc(b)=t
    b=go causes loc(b)=E if loc(b)=W

    % collisions:
    collision iff loc(a)=t ∧ loc(b)=t        % for convenience
    nonexecutable a=go if collision
    nonexecutable b=go if collision

The Boolean fluent constant collision is introduced for convenience.² The example can be formulated perfectly well without it. The transition system defined by D_trains is shown in Fig. 11.1.

² For readers familiar with C+, a law of the form F iff G is used here as shorthand for the pair of laws F if G and default ¬F. default ¬F is a C+ abbreviation for the law ¬F if ¬F.
Fig. 11.1. The transition system for the trains example. A state label such as EW is short for {loc(a)=E, loc(b)=W}. Horizontal edges, labelled a-, are transitions in which train a moves and b does not. Vertical edges, labelled -b, are transitions in which train b moves and a does not. Diagonal edges, unlabelled in the diagram, are transitions in which both trains move. Reflexive edges, corresponding to transitions in which neither train moves, are omitted from the diagram to reduce clutter.
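The transition system of Fig. 11.1 is small enough to enumerate directly. The Python sketch below is only an illustration of the figure (it is neither CCALC nor a model checker, and the encoding is an assumption made here): it builds the states and transitions of D_trains and checks, for instance, that the collision state tt is reachable when both trains are free to move.

from itertools import product

LOCS = ["W", "t", "E"]
clockwise = {"W": "t", "t": "E", "E": "W"}          # movement of train a
anticlockwise = {"E": "t", "t": "W", "W": "E"}      # movement of train b

states = list(product(LOCS, LOCS))                  # (loc(a), loc(b)); ("t", "t") is the collision state

def transitions():
    for (la, lb) in states:
        if (la, lb) == ("t", "t"):                  # nonexecutable a=go / b=go if collision
            yield (la, lb), ("stay", "stay"), (la, lb)
            continue
        for ea, eb in product(["go", "stay"], repeat=2):
            na = clockwise[la] if ea == "go" else la
            nb = anticlockwise[lb] if eb == "go" else lb
            yield (la, lb), (ea, eb), (na, nb)

R = list(transitions())

# Simple reachability check: without any norms, a collision is reachable from (W, E).
frontier, seen = [("W", "E")], set()
while frontier:
    s = frontier.pop()
    if s in seen:
        continue
    seen.add(s)
    frontier += [s1 for (s0, e, s1) in R if s0 == s]
print(("t", "t") in seen)      # True: the unconstrained system can reach the collision state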
11.3 The language (C+)++

(C+)++ is an extended form of the language C+ designed for representing norms of behaviour and institutional aspects of (human or computer) societies [9]. It provides two main extensions to C+. The first is a means of expressing 'counts as' relations between actions, also referred to as 'conventional generation' of actions. That will not be discussed further in this paper. The second extension is a way of specifying the permitted (acceptable, legal) states of a transition system and its permitted (acceptable, legal) transitions.

Syntax and semantics

An action description of (C+)++ defines a coloured transition system, which is a structure of the form (S, A, R, S_g, R_g), where (S, A, R) is a labelled transition system of the kind defined by C+ action descriptions, and where the two new components are
• S_g ⊆ S, the set of 'permitted' ('acceptable', 'ideal', 'legal') states; we call S_g the 'green' states of the system;
• R_g ⊆ R, the set of 'permitted' ('acceptable', 'ideal', 'legal') transitions; we call R_g the 'green' transitions of the system.
We refer to the complements S - S_g and R - R_g as the 'red states' and 'red transitions', respectively. Semantical devices which partition states (and here, transitions) into two categories are familiar in the field of deontic logic, where they are known to yield rather simplistic logics; a full discussion of their adequacy is outside the scope of this paper. It is also possible to consider a more elaborate structure of partially coloured transition systems, in which states and transitions can be green, red, or uncoloured, but we shall not present that version here.
170
Marek Sergot
A coloured transition system (S, A, R^ 5g, Rg) must further satisfy the following constraint: •
if (s, e, 5') G jRg and s e Sg then 5' G Sg.
We refer to this as the green-green-green constraint. The idea is that occurrence of a permitted (green) transition in a permitted (green) state must always lead to a permitted (green) state. All other possible combinations of green/red states and green/red transitions are allowed. In particular, and contra the assumptions underpinning JohnJules Meyer's construction of 'dynamic deontic logic' [14], a non-permitted (red) transition can result in a permitted (green) state. Similarly, it is easy to devise examples in which a permitted (green) transition can lead to a non-permitted (red) state. Some illustrations will arise in the examples to be considered later. The only combination that cannot occur is the one eliminated by the 'green-green-green' constraint: a permitted (green) transition from a permitted (green) state cannot lead to a nonpermitted (red) state. The language C-j-"^"^ extends the language C-\- with two new forms of rules. A state permission law is an expression of the form not-permitted F
(11.4)
where F is a fluent formula. An action permission law is an expression of the form not-permitted aifi/j
(11-5)
where a is an action formula and ip is any formula of signature a^Ua^. not-permitted a is an abbreviation for not-permitted a if T. It is also convenient to allow two variants of rule forms (11.4) and (11.5), allowing oblig F as an abbreviation for not-permitted -iF and oblig a as an abbreviation for not-permitted -na. Informally, in the transition system defined by an action description D, a state s is red whenever 5 |= F for any state permission law not-permitted F . All other states are green by default. A transition (s, e, 5') is red whenever sU e \= ip and e \= a for any action permission law not-permitted a \f F after ip. All other transitions are green, subject to the 'green-green-green' constraint which may impose further conditions on the possible colouring of a given transition. Let D be an action description of C-(-"^"^. i^basic refers to the subset of laws of D that are also laws of C-j-. The transition system defined by D has the states 5 and transitions R that are defined by its C-f- component, J5basic» and green states Sg and green transitions Rg given by Sg =cief S — 5red, Rg =def R ~ Rred where S'red =def {« | s |= F for somc law not-permitted F in D} -Rred =def K^, 6, s') | s U 6 |= ^, € \= a foT somc law not-permitted a if V^ in D} U {(5, e, 5') I 5 G Sg and 5' ^ 5g} The second component of the iZred definition ensures that the 'green-green-green' constraint is satisfied.
11 Modelling Unreliable and Untrustworthy Agent Behaviour
171
Example Consider the trains example of section 11.2. A collision is undesirable, unacceptable, not permitted ('red'). Construct an action description Di of C-f"^"^ by adding to the C+ action description Dtrains of section 11.2 the state permission law not-permitted collision
(11.6)
The coloured transition system defined by Di is the transition system of Fig. 11.1 with the collision state tt coloured red and all other states coloured green. The three transitions leading to the collision state are coloured red because of the green-greengreen constraint; all other transistions, including the transition from the collision state to itself, are green. Causal theories Any (definite) action description of C-f"^"*" can be translated to the language of (definite) causal theories, as follows. Let D be an action description and m a non-negative integer. The translation of the C+ component -Dbasic of D proceeds as usual. For the permission laws, introduce two new fluent and action constants, status and trans respectively, both with possible values green and red. They will be used to represent the colour of a state and the colour of a transition, respectively. For every state permission law not-permitted F and time index i e 0.. m, include in r ^ a causal rule of the form status[z]=red <= F[i], and for every z G 0.. m, a causal rule of the form status[z]=green <^ status[z]=green to specify the default colour of a state. A state permission rule of the form oblig F produces causal rules of the form status[i]=red <^ ~*F[i]. For every action permission law not-permitted a if V^ and time index i e 0.. m—1, include in r £ a causal rule of the form trans[i]=red <= a[i] A ilj[i], and for every i e 0..m—1, a causal rule of the form trans[i]=green <^ trans[i]=green to specify the default colour of a transition. An action permission law of the form oblig a if -0 produces causal rules of the form trans[i]=red <= -^a[i] A il;[i]. Finally, to capture the 'green-green-green' constraint, include for every i e 0.. m—1 a causal rule of the form trans[z]=red <= status[i]=green A status[i+l]=red
(11-7)
It is straightforward to show [9] that models of the causal theory r £ correspond to all paths of length m through the coloured transition system defined by D, where the fluent constant status and the action constant trans encode the colours of the states and transitions, respectively. Notice that, although action descriptions in C+"*""'" can be translated to causal theories, they cannot be translated to action descriptions of C-{-: there is no form of causal law in C+ which translates to the green-green-green constraint (11.7). In addition to permission laws of the form (11.4) and (11.5), which are convenient but rather restrictive, the C-\-~^~^ language allows distinguished fluent and action constants status and trans to be used explicitly in formulas and causal laws. The atoms status=red and trans=red can then be regarded as what are sometimes called 'violation constants' in deontic logic. It is also easy to allow more 'shades' of red and green to allow different notions of permitted/legal/acceptable to be mixed. We will not employ that device in the examples discussed in this paper.
172
Marek Sergot
Example (trains, continued) The action description Di of the previous section states that collisions are not permitted but says nothing about how the trains should ensure that collisions are avoided. Suppose, for the sake of an example, that we impose additional norms (social laws), as follows: no train is permitted to enter the tunnel unless the other train has just emerged. (We assume that this will be observed by the train that is preparing to enter.) Will such a law be effective in avoiding collisions? To construct a C^"^"^ action description D2 for this version, ignore (11.6) and instead add to the C+ action description Dtrains the following laws. First, it is convenenient to define the following auxiliary action constants (all Boolean): enter (a) iff a=go A loc (a)=W exit (a) Iff a=qo A loc (a)=t ^ ^ ^ ^ enter {b) iff b=go A loc {b)=E exit (6) iff b=go A loc (6)=t
(11.8)
Again, these are introduced merely for convenience; the example can be constructed easily enough without them. Now we formulate the social laws: not-permitted enter (a) if loc {b)^\N not-permitted enter (6) if loc (a)7^E The coloured transition system for this version of the example is shown in Fig. 11.2. Notice that since we are now using green/red to represent what is permitted/not permitted from the point of view of train behaviour, we have discarded the state permission law (11.6). Consequently the collision state tt is coloured green not red. We could combine the two notions of permission expressed by laws (11.6) and (11.9), for instance by introducing two different shades of green and red and relating them to each other, but we do not have space to discuss that option here. How do we test the effectiveness of the social laws (11.9)? Since the causal theory r^^ encodes the transitions defined by D2, the following captures the property that if both trains comply with the social laws, no collisions will occur. comp(r^^) \= -ico//mon[0] A trans[0]=green -^ ->collision[l]
(11.10)
This can be checked, as in C C A L C , by using a standard sat-solver to determine that the formula comp{r^^) A ->collision[0] A trans[0]=green A collision[l] is not satisfiable. The property (11.10) is equivalently expressed as: comp{Ti'^) 1= -tcollision[0] A collision[l] —> trans[0]=red
(H-H)
which says that a collision occurs only following a transition in which either one train or both violate the norms. Notice that comp(r^^) ^ trans[0]=green -^ -tcollision[l]: as formulated by D2, the transition from a collision state to itself is green.
11 Modelling Unreliable and Untrustworthy Agent Behaviour
EW
WW
tW
EE
WE
tE
Et
Wt
tt
173
Fig. 11.2. Coloured transition system defined by action description D2. Dotted lines indicate red transitions. All states and all other transitions are green. Reflexive edges (all green) are omitted for clarity. One major advantage of taking C+ as the basic action formalism, as we see it, is its explicit transition system semantics, which enables a wide range of other analytical techniques to be applied. In particular, system properties can be expressed in the branching time temporal logic CTL and verified on the transition system defined by a C-\- or C-f-"^"^ action description using standard model checking systems. We will say that a formula (f of CTL is valid on a (coloured) transition system (S',I(cr^),i?, 5g,J^g) defined by C+"^"^ action description D when s U e \= (f for every 5 U e such that (5, e, s') G R for some state s\ The definition is quite standard, except for a small adjustment to allow action constants in (/? to continue to be evaluated on transition labels e. (And we do not distinguish any particular set of initial states; all sets in S are initial states.) We will also say in that case that formula (p is valid on the action description D. In CTL, the formula AX (f expresses that (p is satisfied in the next state in all future branching paths from now.^ EX is the dual of AX: EXcp = -lAX -K^. EX (p expresses that cp is satisfied in the next state of some future branching path from now. The properties (ILIO) and (ILl 1) can thus be expressed in CTL as follows: -> collision A tra ns=green —^ AX -• collision
(1L12)
or equivalently -^collision A EX collision -^ trans=red. It is easily verified by reference to Fig. 11.2 that these formulas are valid on the action description D2. Also valid is the CTL formula EX trans=green which expresses that there is always a permitted action for both trains. This is true even in collision states, since the only available transition is then the one where both trains remain idle, and that transition is green. The CTL formula EF collision is also valid on D2, signifying that in every state there is at least one path from then on with collision true somewhere in the future.^
^so U eo \= AX (p if for every infinite path so eosiei • • we have that si U ei |= cp. ^so U €0 \= Ef (p if there is an (infinite) path SQ €Q - • • Sm ^m - • • with Sm U €m \= (p for some m > 0.
174
Marek Sergot
11.4 Example: a simple co-ordination mechanism We now consider a slightly more elaborate version of the trains example. In general, we want to be able to verify formally whether the introduction of additional control mechanisms—additional controller agents, communication devices, restrictions on agents' possible actions—are effective in ensuring that agents comply with the norms ('social laws') that govern their behaviour. For the trains, we might consider a controller of some kind, or traffic lights, or some mechanism by which the trains communicate their locations to one another. For the sake of an example, we will suppose that there is a physical token (a metal ring, say) which has to be collected before a train can enter the tunnel. A train must pick up the token before entering the tunnel, and it must deposit it outside the tunnel as it exits. No train may enter the tunnel without possession of the token. To construct the C-f"^+ action description D3 for this version of the example, we begin as usual with the C-f action description Dtrains of section 11.2. We add a fluent constant tok to represent the position of the token. It has values {W, E, a, b}. tok=\N represents that the token is lying at the West end of the tunnel, tok=a that the token is currendy held by train a, and so on. We add Boolean action constants pick (a), pick (6) to represent that a (resp., b) picks up the token, and drop (a), drop (6) to represent that a (resp., 6) drops the token at its current location. For convenience, we will keep the action constants enter (a), enter (6), exit (a), exit (b) defined as in D2 of the previous section. The following causal laws describe the effects of picking up and dropping the token. To avoid duplication, x and / are variables ranging over a and b and locations W, E, t respectively. inertial tok drop (x) causes tok=l if tok=x A loc {x)=l nonexecutable drop (x) if tok^x pick (x) causes tok-x nonexecutable pick {x)
if loc
{x)^tok
The above specifies that the token can be dropped by train x only if train x has the token (tok=x), and it can be picked up by train x only if train x and the token are currently at the same location (loc {x)==tok). Notice that, as defined, an action drop (x) A x=stay drops the token at the current location of train x, and drop (x) A x=^go drops it at the new location of train x after it has moved. Since tok=\ is not a well-formed atom, it is not possible that (there is no transition in which) the token is dropped inside the tunnel, pick {x) A x=go represents an action in which train x picks up the token and moves with it. More refined representations could of course be constructed but this simple version will suffice for present purposes. The action description D3 is completed by adding the following permission laws: not-permitted enter (x) if tok^x A -^pick (x) oblig drop (x) if exit (x) A tok=x
11 Modelling Unreliable and Untrustworthy Agent Behaviour
175
It may be helpful to note that in C+"^~^, the first of these laws is equivalent to oblig pick (x) if enter (x) A tok^x The coloured transition system defined by action description D3 is larger and more complicated than that for D2 of the previous section, and cannot be drawn easily in its entirety. A fragment is shown in Fig. 11.3.
-Et
WW
Et
-WW
Et-
WW
-tE
ET
WW-
IE
EW
-WE
tE
EW-
WE
tE-
-tw
-EW
WE
-tt
tw
EW
WE-
It
Tw
-EE
-Wt
xJ
tw-
EE
Wt
EE-
WT
EE
Wt-
tt-
Fig. 11.3. Fragment of the coloured transition system defined by D3. The figure shows all states but not all transitions. The dash in state labels indicates the position of the token: it is at W/E when the dash is on the left/right, and with train a/b when the dash appears above the location of a/b. Dotted lines depict red transitions. All other depicted transitions, and all states, are green. One property we might wish to verify on D3 is that collisions are guaranteed to be avoided if both trains comply with the norms ('social laws'). Using the 'Causal Calculator' C C A L C , we can try to determine whether co7np{r^)
\= -^collision[0] A trans[0]=green A . . . A trans[m—l]=green —> -^collision[m]
that is, whether the formula comp{T^) A -^collision[0] A trans[0]=green A • • • A trans[m—l]=green A collision[7n] is satisfiable. But what should we take as the
176
Marek Sergot
length m of the longest path to be considered? In some circumstances it is possible to find a suitable value m for the longest path to be considered but it is far from obvious in this example what that value is, or even if one exists. The problem can be formulated conveniently as a model checking problem in CTL. The CTL formula E[trans=green U collision] expresses that there is at least one path with collision true at some future state and trans=green true on all intervening transitions.^ So the property we want to verify can be expressed in CTL as follows: -^collision —^ ->E[trans=green U collision]
(11-14)
It can be seen from Fig. 11.3 that property (11.14) is not valid on the action description D3: there are green transitions leading to collision states, from states where there is already a train inside the tunnel without the token. However, as long as we consider states in which both trains are initially outside the tunnel, the safety property we seek can be verified. The following formula is valid on D3: loc {a)^X A loc {b)y^X -^ -<E[trans=green U collision]
(11.15)
We are often interested in the verification of formulas such as (11.14) and (11.15) which express system properties conditional on norm compliance (conditional on all transitions being green). Verfication of such properties is particularly easy: translate the coloured transition system M = (5, A, jR, 5g, Rg) to the transition system M' = (5g, A, iZg) obtained by deleting all red states and red transitions from M. Now, since in CTL E[T U (^] = EF (f, instead of checking, for example, formula (11.15) on M we can check whether loc {a)^X A loc {b)^X —> -lEF collision
(11.16)
is valid on A^'. This is a standard model checking problem.
11.5 Conclusion We have presented the permission component of the action language C-f-"^"*" and sketched how it can be applied to modelling systems in which agents do not necessarily comply with the norms ('social laws') that govern their behaviour. Space limitations prevented us from discussing more elaborate examples where non-compliance with one set of norms imposes further norms for reparation and/or recovery. We are currently working on the use of C-\-'^'^ as the input language for various temporal logic model checkers, CTL in particular. Scaleability is of course an issue; however, here the limits are set by the model checking techniques employed and not by the use of CH-"*"^. At the MSRAS workshop, our attention was drawn to the model checking technique of [15] which uses program transformations on constraint logic programs representing transition systems to verify formulas of CTL. Since action descriptions in C-h, and in C-\-'^'^, can be related via causal theories to logic programs [10], this presents an interesting avenue to explore. ^5o U €0 1= E[y?i U (/?2] if there is an (infinite) path so eo • • • Sm€m • • • with Sm U em |= (p2 for some m >0 and with SiU ei \= <^i for all 0 < z < TTI.
11 Modelling Unreliable and Untrustworthy Agent Behaviour
177
References 1. Subrahmanian, V.S., Bonatti, P., Dix, J., Eiter, T., Kraus, S., Ozcan, F., Ross, R.: Heterogeneous Agent Systems. MIT Press, Cambridge (2000) 2. Rissanen, E., Sadighi Firozabadi, B., Sergot, M.J.: Towards a mechanism for discretionary overriding of access control (position paper). In: Proc. 12th International Workshop on Security Protocols, Cambridge, April 2004. (2004) 3. Artikis, A., Pitt, J., Sergot, M.J.: Animated specification of computational societies. In Castelfranchi, C , Johnson, W.L., eds.: Proc. 1st International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'02), Bologna, ACM Press (2002) 1053-1062 4. Artikis, A., Sergot, M.J., Pitt, J.: Specifying electronic societies with the Causal Calculator. In Giunchiglia, K, Odell, J., Weiss, G., eds.: Agent-Oriented Software Engineering III. Proc. 3rd International Workshop (AOSE 2002), Bologna, July 2002. LNCS 2585, Springer (2003) 1-15 5. Artikis, A., Sergot, M.J., Pitt, J.: An executable specification of an argumentation protocol. In: Proc. 9th International Conference on Artificial Intelligence and Law (ICAIL'03), Edinburgh, ACM Press (2003) 1-11 6. Lomuscio, A., Sergot, M.J.: Deontic interpreted systems. Studia Logica 75 (2003) 63-92 7. Lomuscio, A., Sergot, M.J.: A formalisation of violation, error recovery, and enforcement in the bit transmission problem. Journal of Applied Logic 2 (2004) 93-116 8. Fagin, R., Halpern, J.Y., Moses, Y., Vardi, M.Y.: Reasoning about Knowledge. MIT Press, Cambridge (1995) 9. Sergot, M.: The language C-\--^^. In Pitt, J., ed.: The Open Agent Society. Wiley (2004) (In press). Extended version: Technical Report 2004/8. Department of Computing, Imperial College, London. 10. Giunchiglia, E., Lee, J., Lifschitz, V., McCain, N., Turner, H.: Nonmonotonic causal theories. Artificial Intelligence 153 (2004) 49-104 11. Giunchiglia, E., Lifschitz, V.: An action language based on causal explanation: Preliminary report. In: Proc. AAAI-98, AAAI Press (1998) 623-630 12. van der Hoek, W, Roberts, M., Wooldridge, M.: Social laws in alternating time: Effectiveness, feasibility, and synthesis. Technical report, Dept. of Computer Science, University of Liverpool (2004) Submitted. 13. Jamroga, W, van der Hoek, W, Wooldridge, M.: On obligations and abilities. In Lomuscio, A., Nute, D., eds.: Proc. 7th International Workshop on Deontic Logic in Computer Science (DEON'04), Madeira, May 2004. LNAI 3065, Springer (2004) 165-181 14. Meyer, J-J.: A different approach to deontic logic: Deontic logic viewed as a variant of dynamic logic. Notre Dame Journal of Formal Logic 29 (1988) 109-136 15. Fioravanti, R, Pettorossi, A., Proietti, M.: Verifying CTL properties of infinite state systems by specializing constraint logic programs. In: Proceedings of Second ACM-Sigplan International Workshop on Verification and Computational Logic (VCL*01), Florence, September 2001. (2001) 85-96 Expanded version: Technical Report R.544, lASI-CNR, Rome.
12 Nearest Neighbours without k Hui Wang^, Ivo Duntsch^, Giinther Gediga^, and Gongde Guo^ ^ School of Computing and Mathematics University of Ulster at Jordanstown Northern Ireland, BT37 OQB, United Kingdom h . wang|g. guo@uls t . a c . u k ^ Department of Computer Science Brock University St. Catherines, Ontario, L2S 3AI, Canada d u e n t s c h | g e d i g a @ c o s c . b r o c k u . ca Summary. In data mining, the k-Nearest-Neighbours (kNN) method for classification is simple and effective [3, 1]. The success of kNN in classification is dependent on the selection of a "good" value for k, so in a sense kNN is biased by k. However, it is unclear what is a universally good value for k. We propose to solve this choice-of-k issue by an alternative formalism which uses a sequence of values for k. Each value for k defines a neighbourhood for a data record - a set of k nearest neighbours, which contains some degree of support for each class with respect to the data record. It is our aim to select a set of neighbourhoods and aggregate their supports to create a classifier less biased by k. in print To this end we use a probability function G, which is defined in terms of a mass Junction for events weighted by a measurement of events. A mass function is an assignment of basic probability to events. In the case of classification, events can be interpreted as neighbourhoods, and the mass function can be interpreted in terms of class proportions in neighbourhoods. Therefore, a mass function represents degrees of support for a class in various neighbourhoods. We show that under this specification G is a linear function of the conditional probability of classes given a data record, which can be used directly for classification. Based on these findings we propose a new classification procedure. Experiment shows that this classification procedure is indeed less biased by k, and that it displays a saturating property as the number of neighbourhoods increases. Experiment further shows that the performance of our classification procedure at saturation is comparable to the best performance of kNN. Consequently, when we use kNN for classification we do not need to be concerned with k; instead, we need to select a set of neighbourhoods and apply the procedure presented here. Key words: k-nearest neighbour, data mining and knowledge discovery, Dempster-Shafer theory, contextual probability, pignistic probability
180
Hui Wang, Ivo Duntsch, Gunther Gediga, and Gongde Guo
12.1 Introduction k-Nearest-Neighbours (kNN) is a popular method for classification. It is simple but effective in many cases [3, 2, 1]. For a data record t to be classified, k nearest neighbours are retrieved, which form a neighbourhood of t. Majority voting with or without weighting among the data records in the neighbourhood is used to decide the classification for t. To apply kNN we need to choose a value for k and a metric for selecting nearest neighbours, and the success of classification is very much dependent on the choices of k and the metric. In a sense kNN is biased by k and the metric. In this paper we look at the choice-of-A: issue only. There are many ways of choosing the k value, but a simple one is to run the algorithm many times with different k values and choose the one with best performance. This approach is effective, but it lacks theoretical justification. Each neighbourhood contains some degree of support for all classes. If we can aggregate all these supports, we could end up with a less biased classification in the sense that it is not too dependent on a single value for k. We call this the aggregation problem. This is the motivation for the work reported in this paper. The Bayes Rule can be applied to this problem as follows. Let D be a dataset, t be a record to be classified, and Ai,A2,'" , ^ n be a series of neighbourhoods of t corresponding to different k values - A;i, A;2, • *' ? ^n- For example, Ai is the set of the nearest ki neighbours of t. Then we have t £ Aifori = 1,- -- ,n. The problem can then be formulated as the calculation of P(c|t, ^1,^42, • • • , ^ n ) . where c is a class label. According to Bayes Rule we have P{c\t,A^,A„
,A„)-
p(,,i^,i^...^^^) P(i|c)P(c) -, since ^ G Ai
Pit) -Pic\t)
Consequently, the neighbourhoods do not play any role at all! To calculate P{c\t) we can take either the regression approach or the class-conditional approach [3], which are two well-studied research areas. In order to aggregate the various supports from neighbourhoods for classification we can try to simply add some measure of the supports. Since the neighbourhoods are not mutually exclusive (i.e., Ai D Aj may be non-empty), probability theory does not apply directly. Dempster-Shafer (D-S) Theory does not require mutual exclusiveness, but it gives two numbers as lower (belief) and upper (plausibility) probabilities. This may give rise to difficulty in using the two numbers for classification. This further motivated us to seek to find a probability function that is additive, does not require mutual exclusiveness, and produces a single number. In the nearest neighbours methodology, the closer the neighbours are to a data record the more relevant they are. In the same spirit different neighbourhoods should play different
12 Nearest Neighbours without k
181
roles in classification, and the smaller a neighbourhood the more relevant it is in classification. This effort resulted in a probability function, which generalises the classical probability function. This function takes into account any set of neighbourhoods, and the smaller a neighbourhood the more significant its contribution.
12.2 Contextual probability Let i? be a finite set called/ram^ of discernment. A mass function is m : 2^ -^ [0,1] such that
Y. ^W = 1
(12.1)
X
The mass function is interpreted as a representation (or measure) of knowledge about i?, and m{A) is interpreted as a degree of support for A. Our objective is to extend our knowledge to those events that we do not know explicitly in m. Therefore we consider a function G :2^ -^ [0,1] such that for any ACQ G(^)=5:m(X)^i^,
(12.2)
The interpretation of the above definition is as follows. Event A may not be known explicitly in the representation of our knowledge, but we know explicitly some events X that are related to it (i.e., A overlaps with Xox Af\X ^^). Part of the knowledge about X, m(X), should then be shared by A, and a measure of this part is 1^4 n
x\/\x\. Theorem 1. G is a probability Junction on Q. That is to say, 1. For any ACQ. G{A) > 0; 2. G{Q) = 1; 3. For Ai,A2e fi, G{A^ U A-z) = G(Ai) + 6 ( ^ 2 ) ifAi n ylj = 0. Proof. The first claim is true following the fact that m{X) > 0 for any XCQ. The equation holds when A = $. The second claim is true since G{Q) = ^xcn "^(-^)Let's now consider the third claim. Xn{AiU A2) = {Xr\Ai)U{Xr\ A2). If Ai n A2 = 0then \Xr\{AiLlA2)\ = \Xr\Ai\ + | X n ^ 2 | - As a result we have
G{A, U ^2) = E m{X)^
'Tir
= E -(^)
182
Hui Wang, Ivo Diintsch, Giinther Gediga, and Gongde Quo
We therefore call G a contextual probability function to emphasize the fact that the probability values are derived from various "contexts" ^. For simplicity, if A is a singleton set, e.g., A = {a}, we write G{a) for G{{a}). Now we look at an example. Example 1. Let i? = {a, 6, c, d, e, / } , and the mass function m be as follows: m{{a,h}) m({a, 6,c}) m{{a^h^c^d}) m({a,6,c,d,e,/})
= = = =
0.3 0.4 0.1 0.2
Suppose that we are interested in the probabilities of the events: {a}, {6}, {c}, {d}, {e}, {/}, {6, c}, {a, 6, d]. According to the definition of G function, we have G{a) =m({a,6}) x J M L + m({a,6,c}) x
r^^^,+
m{{a, b, c,d}) X
'["^1 + m{{a,b,c, d,e, / } ) x '•^"^' \{a,b,c,d}\ ' ' ' ' ' \{a,b,c,d,e,f}\ =0.3 X 1/2 + 0.4 X 1/3 + 0.1 X 1/4 + 0.2 x 1/6 = 41/120 Similarly we have G(6) = G{a), G{c) = 23/120, G{d) = 7/120, G(e) = 4/120, and G{f) = G(e). Clearly G(o) + G{b) + G(c) + G{d) + G(e) + G ( / ) = 1. Further on, we have G{{aAd])
=m({a,6}) x K ^ + ^ ( { « , 5 , c } ) x r r ^ + ||a,6|| Ka,6,c}| + m({a,6,c,c/})x-ii^l^ + m({a,6,c,d,e,/})x
1^^'^'^^' |{a,6,c,d,e,/}| 0.3 + 0.4 X 2/3 + 0.1 X 3/4 + 0.2 x 3/6 = 89/120 : G{a) + G{h) + G{d)
Similarly we have G({6, c}) = 64/120 = G{h) + G(c).
12.3 An interpretation of the mass function The mass function can be interpreted in different ways. In this section we present one interpretation in order to solve the aggregation problem. ^The contextual probability is also known as pignistic probability, which was invented by Smets (see, for example, [4]). The probability (contextual or pignistic) function serves as a theoretical basis for the novel methods developed in this paper.
12 Nearest Neighbours without k
183
Let 5 be a finite set of class labels, and i? be a (finite) dataset each element of which has a class label in S. The labelling is denoted by a function / : i? -^ 5 so that for X G i?, / ( x ) is the class label of x. Consider a class c e S. Let N ^= |i7|, N^ =^ \{x e Q : f{x) = c}|, and ^xcn ^(^1^)- Th^ mass function for c is defined as mc : 2 that, for ACf2, ^c[A)
—> [0,1] such
def P{C\A) P{C\A) = :=^ p . .yv = —TF—
(12.3)
Clearly X ; x c r ? ^ c ( ^ ) = lLet ( ^ be the combinatorial number representing the number of ways of picking n unordered outcomes from N possibilities. From combinatorics we know that /M _
\nj
—
AT!
(N-n)\n\'
Lemma 1, If the distribution over Q is uniform, then
i=l
Proof Mc=J2
^ ( ^ i ^ ) ' definition
XCi?
v ^ P(X|c)P(c) ^
xcn
P{X\c)Nc/N , ' , •^\X\/N \X\/N
, uniformdistnbution ' N
XCi?
^ 1 =^Nc E ~ i=l
'
'
E
i=l
E
XCi7,|X|=2
^(^1^)' probability property
XCQ,\X\=ix^X
=Nc E
~ (i^i^) E
=Nc E
" (i^i^)' probability distribution property
=-TT(2^
^(^1^)' combination
- 1), combinatorics
184
Hui Wang, Ivo Diintsch, Giinther Gediga, and Gongde Quo Based on the mass function we define G^-.l^ ^ [0,1] such that, for A C i? G,{A)=
^ m , ( X ) ^ ^ ^
(12.4)
From the result in Section 12.2 we know that Gc{A) is a probability function, so it has all the properties of a probability function. l - t cc '^ i : E f = i h ((f-T^) - (f-2^) and /3e '^ t E f = i * ( f - 2 ^ . Since Me = ^ ( 2 ^ - 1) we have /3e = 2 ? ^ E t i :^G^2% Clearly /3e is independent of c, so we drop the subscript for /3. Theorem 2. iff f^e distribution over fi is uniform then, for a & Q and c& S, Gc{a) = P{c\a)ac + 13 Proof. By definition we have Gc(a) = E
^ ^ - < ^ W ^
=
1 P{c\X) ^
E 1
^ ^
P(c|X)
According to Bayes rule, P{c\X) =
P{c)P{X\c) P{X)
If the distribution over Q is uniform, then P{c) = N^/N and P(A') = |X|/Ar. Therefore we have P{c\X) =
j^^P{X\c)
As a result we have
Note also that P{a\c) = P{c\a)P{a)/P{c). If the distribution is uniform, P{a) = 1/N and P{c) = Nc/N. Therefore P{a\c) = P{c\a)/Nc. Following similar workings as in the proof of Lemma 1 we then have «c(«) = # E
4 {(f-7^)^(«k) + (f--.^[l - Pia\c)]}
=NcP{a\c)ac + (3 = P{c\a)ac + (3
12 Nearest Neighbours without k
185
Now we consider an example. Example 2. Consider a set J? = {a, 6, c} with uniform probability distribution. There are two classes (H- and - ) on the elements: {a: +, b: -, c: -1}. We want to show the relationship between P{c\A) and Gc{A) for AC.Q. Here AT = 3, Ar+ = 1, and iV_ = 2. Then we have M+ = 7/3, M_ = 14/3, a+ = 15/28, a_ = 15/56, and (3 = 13/84. The P{A) values are obtained by the uniform distribution assumption, and the additive property of probability functions. The P{c\A) are calculated from P{A,c) and P{A) by P\c\A) = P{A, c)/P{A). The mdA) are calculated according to Eq. 12.3. The Gc{A) are calculated according to Eq. 12.2. The results of calculation are shown in Table 12.1. Now we illustrate the calculation of Gc{a). r
(n\
rr. /f^iN • ^4.({a, 6}) + m^.({fl, c}) , m.^,({Q,6,c}) _ 3
_3^
i_ _
^
"7'^14'^21~84
G^(a) =m_({a}) + ^-(K^}) + ^-(K^i) + ^-({^^^^^}) = -A 2 _ 13 ~ 28 "^ 3 X 14 ~ 84 Note that due to the additive property of G, if \A\ > 1 we can obtain Gc{A) accordingtoGe(^) = Ex6AGc(a;). From Table 12.1 we can verify that the relationship between Gci^A) and P(c\A) holds for singleton A. We can also verify that the same relationship does not hold for non-singleton A, For example P(H-|{a, 6})Q;+ + ^ = 1/2 X 15/28 -I-13/84 = 71/168, whereas G^.({a,6}) = 71/84. The above example is meant to show the relationship between P{c\A) and Gc{A), In practice our knowledge is only up to a certain higher granularity level, and it is our aim to apply the knowledge to infer on more detailed cases. For some tasks (e.g., classification), we don't know P(c\x) for some a; G i?, but we may know P{c\A) for some ACQ, The task can be tackled by approximating P{c\x). Now we present an example to show how this can be done.
12.4 kNN for classification using multiple neighbourhoods Based on the contextual probability, we designed and implemented a kNN classification procedure based on multiple neighbourhoods - nokNN classifier or just nokNN for short. To classify a new data record we consider its h neighbourhoods, each of which is determined as in the standard kNN method. Supports for the class membership of the data record from these neighbourhoods are aggregated to give a combined class
186
Hui Wang, Ivo Duntsch, Gunther Gediga, and Gongde Guo
Table 12.1. Q = {a, 6, c} with uniform probability distribution. The elements in the set fall into two classes (+ and —): {a: +, b: -, c: -1}. The table is meant to show the relationship between P and G.
A 1 \ J__ PiA) 1 P{A^ P{A-)\ P{MA)\ P{-\A)\ m+{A) m-{A)\ G^iA) G-{A)\
A 1 [Ml PiA) 1 2/3 ^(A+) 1/3 PiA-)\ 1/3 P{MA)\ 1/2 PHA)\ 1/2 m+(A) 3/14 m-{A)\ 3/28 71/84 G^A) G-{A)\ 97/168
{a} {b} {c} 1/3 1/3 1/3 1/3 0 0 0 1/3 1/3 1 0 0 0 1 1 3/7 0 0 0 3/14 3/14 58/84 13/84 13/84 13/84 71/168 71/168 {a,c} {b,c} {a,b,c} 2/3 2/3 1 1/3 0 1/3 1/3 2/3 2/3 1/2 0 1/3 1/2 1 2/3 3/14 1/7 0 3/28 3/14 2/14 71/84 26/84 1 97/168 142/168 1
distribution, and the classification is done by simply choosing the class which has the highest conditional probability. This algorithm was evaluated with real world datasets in order to see if and how aggregating different neighbourhoods improves classification accuracy. In the experiment we used 7 public datasets available from UC Irvine Machine Learning Repository. General information about these datasets is shown in Table 12.2. Table 12.2. General information about the datasets used in the experiment including number of attributes, number of examples, and number of classes. Data #Attr. #Exa. #Cls Australian 14 690 2 Colic 22 368 2 Diabetes 8 768 2 Hepatitis 19 155 2 Iris 4 150 3 Sonar 60 208 2 Wine 13 178 3
12 Nearest Neighbours without k
187
In order to compare with the standard kNN we also implemented the standard voting kNN. kNN and nokNN were both implemented in C++ '^. In the experiment, 10 neighbourhoods were used and for every dataset, kNN was run with varying neighbourhoods (e.g., 1st neighbourhood, 2nd neighbourhood) and nokNN was run with varying number of neighbourhoods (e.g., 1 neighbourhood, 2 neighbourhoods). Due to page limit we can not show full details of the results. Figures 12.1 and 12.2 show the full details for one dataset - Diabetes, and the averages for all datasets. Table 12.3 shows that worst and best performance of kNN along with the corresponding "k" values, and the performance of nokNN when all 10 neighbourhoods were used. Note that 5-fold cross validation was used.
Diabetes
4
5
6
7
Neighbourhood
Average
i§ in
(0
c .-g *^
o 2 o o
CO
> w o ^
83 82.5 82 81.5 81 80.5 4
5
6
7
8
9
10
Neighbourhood
Fig. 12.1. ^NN results.
From the experiment result it is clear that the standard voting kNN performance varies when different neighbourhoods are used while nokNN performance improves with increasing number of neighbourhoods but stabilises after certain stages. Fur"^The implementation can be found at ~cbcj 23/papers/nokNN.html.
http://www.infj.ulst.ac.uk/
188
Hui Wang, Ivo Dtintsch, Gunther Gediga, and Gongde Guo Diabetes
2
3
4
5
6
7
8
9
10
Count of neighbourhoods
Average
u ih %
c 2
85 84
(0 83 u > £ 0) 82
+«•
^
(A
o o^ 81 O
1
2
3
4
5
6
7
8
9
10
Count of neighbourhoods
Fig. 12.2. noA;NN results.
Table 12.3. The worst and best performance of kNN along with the corresponding values for k. Also the performance of nokNN when 10 neighbourhoods are used. Dataset Australian Colic Diabetes Hepatitis Iris Sonar Wine Average
nokNN kNN Worst case Best case All of 10 k %correct k %correct %correct 85.15 2 83.04 10 85.48 82.63 7 79.64 2 82.89 74.86 1 71.73 2 74.22 79.35 1 78.71 2 79.35 96.00 1 93.33 3 96.00 76.43 10 65.89 1 72.08 93.21 3 89.29 1 92.65 83.24 83.95 80.23
12 Nearest Neighbours without k
189
thermore the stabilised performance is comparable (in fact slightly better in our experiment on the datasets) to the best performance of kNN within 10 neighbourhoods.
12.5 Summary and conclusion In this paper we have discussed the "choice-of-A:" issue related to the kNN method for classification. In order for kNN to be less dependent on the choice of value for k, we proposed to look at multiple sets of nearest neighbours rather than just one set of k nearest neighbours. A set of neighbours is here called a neighbourhood. For a data record t each neighbourhood bears certain support for different possible classes. The key question is: how can we aggregate these supports to give a more reliable support value which better reveals the true class of t? In order to answer this question we presented a probability function, G. It is defined in terms of a mass function on events and it takes into account the cardinality of events. A mass function is a basic probability assignment for events. For the classification problem, an event is specified as a neighbourhood and a mass function is taken to represent the degree of support for a particular class from different neighbourhoods. Under this specification we have shown that G is a linear function of conditional probability, which can be used to determine the class of a new data record. In other words we calculate G from a set of neighbourhoods, then we calculate the conditional probability from G according the linear equation, and finally we classify based on the conditional probability. We designed and implemented a classification algorithm based on the contextual probability - nokNN. Experiment on some public datasets shows that using nokNN the classification performance (accuracy) increases as the number of neighbourhoods increases but stabilises soon after a few number of neighbourhoods; using the standard voting kNN, however, the classification performance varies when different neighbourhoods are used. Experiment further shows that the stabilised performance of nokNN is comparable (in fact, slightly better than) to the best performance of kNN. This fulfils our objective.
References 1. Atkeson, C. G., Moore, A. W., and Schaal, S. (1997). Locally weighted learning. Artificial Intelligence Review, 11(1-5): 11-73. 2. Han, J. and Kamber, M. (2000). Data Mining : Concepts and Techniques. Morgan Kaufmann. 3. Hand, D., Mannila, H., and Smyth, P. (2001). Principles of Data Mining. The MIT Press. 4. Smets, P. and Kennes, R. (1994). The transferable belief model. Artificial Intelligence, 66(2):191-234.
13 Classifiers Based on Approximate Reasoning Schemes Jan Bazan^ and Andrzej Skowron^ ^ Institute of Mathematics, University of Rzeszow Rejtana 16A, 35-959 Rzeszow, Poland [email protected] ^ Institute of Mathematics, Warsaw University Banacha 2, 02-097 Warsaw, Poland [email protected]
Summary. We discuss classifiers [3] for complex concepts constructed from data sets and domain knowledge using approximate reasoning schemes (AR schemes). The approach is based on granular computing methods developed using rough set and rough mereological approaches [9, 13, 7]. In experiments we use a road simulator (see [15]) making it possible to collect data, e.g., on vehicle-agents movement on the road, at the crossroads, and data from different sensor-agents. We compare the quality of two classifiers: the standard rough set classifier based on the set of minimal decision rules and the classifier based on AR schemes.
13.1 Introduction A classification algorithm {classifier) permits making a forecast in new situations on the basis of accumulated knowledge. We consider here classifiers predicting decisions for objects previously unseen; each new object will be assigned to a class belonging to a predefined set of classes on the basis of observed values of suitably chosen attributes (features). Many approaches have been proposed for constructing of classifiers. Among them we would like to mention classical and modem statistical techniques, neural networks, decision trees, decision rules and inductive logic programming (see e.g. [5] for more details). One of the most popular methods for classification algorithms constructing is based on learning rules from examples. The standard rough set methods based on calculation of so called local reducts makes it possible to compute, for a given data, the descriptions of concepts by means of minimal consistent decision rules (see, e.g., [6], [2]). Searching for relevant patterns for complex concepts can be performed using AR schemes. AR schemes (see, e.g., [13]) can be treated as approximations of reasoning performed on concepts from domain knowledge and they represent relevant patterns for complex classifier construction. The proposed approach is based on granular
192
Jan Bazan and Andrzej Skowron
computing methods developed using rough set and rough mereological approaches [9,13,7]. In our experiments we use a road simulator (see [15]) making it possible to collect data, e.g., on vehicle-agents movement on the road and at the crossroads and data from different sensor-agents. The simulator also registers a few more features, whose values are defined by an expert. Any AR scheme is constructed from labelled approximate rules, called productions that can be extracted from data using domain knowledge [13]. In the paper we present a method for extracting productions from data collected by road simulator and an algorithm for classifying objects by productions, that can be treated as an algorithm for on-line synthesis of AR scheme for any tested object. We report experiments supporting our hypothesis that classifiers induced using the AR schemes are of higher quality than the traditional rough set classifiers (see Section 13.5). For comparison we use data sets generated by road simulator.
13.2 Approximate reasoning scheme One of the main tasks of data exploration [4] is discovery from available data and expert knowledge of concept approximations expressing properties of the investigated objects and rules expressing dependencies between concepts. Approximation of a given concept can be constructed using relevant patterns. Any such pattern describes a set of objects belonging to the concept to a degree p where 0 < p < 1. Relevant patterns for complex concepts can be represented by AR schemes. AR schemes can be treated as approximations of reasoning performed on concepts from domain knowledge. Any AR scheme is constructed from labeled approximate rules, called productions. Productions can be extracted from data using domain knowledge. We define productions as a parameterized implications with premises and conclusion built from patterns sufficiently included in the approximated concept. C3> "large"
C3> "medium" CI > "medium" C2 > "large" C3> "small" CI > "small" C2 > "medium "
CI > "smair
C2 > "small"
Fig. 13.1. The example of production as a collection of three production rules In Figure 13.1 we present an example of production for some concepts CI, C2 and C3 approximated by three linearly ordered layers small, medium, and large. This
13 Classifiers Based on Approximate Reasoning Schemes
193
production is a collection of three simpler rules, called production rules, with the following interpretation: (1) if inclusion degree to a concept CI is at least medium and to concept C2 at least large then the inclusion degree to a concept C3 is at least large', (2) if the inclusion degree to a concept CI is at least small and to a concept C2 at least medium then the inclusion degree to a concept C3 is at least medium', (3) if the inclusion degree to a concept CI is at least small and to a concept C2 at least small then the inclusion degree to a concept C5 is at least small. The concept from the upper level of production is called the target concept of production, whilst the concept from the lower level of production are called the source concepts of production. For example, in case of production from Figure 13.1 C3 is the target concept and CI, C2 are the source concepts.
Cl-si^'smair C2>"memmf C4>''smair
Fig. 13.2. Synthesis of approximate reasoning scheme
One can construct AR scheme by composing single production rules chosen from different productions from a family of productions for various target concepts. In Figure 13.2 we have two productions. The target concept of the first production is C5 and the target concept of the second production is the concept C3. We select one production rule from the first production and one production rule from the second production. These production rules are composed and then a simple AR-scheme is obtained that can be treated as a new two-levels production rule. Notice, that the target pattern of lower production rule in this AR-scheme is the same as one of the source patterns from the higher production rule. In this case, the common pattern is
194
Jan Bazan and Andrzej Skowron
described as follows: inclusion degree (of some pattern) to a concept C3 is at least medium. In this way, we can compose AR-schemes into hierarchical and multilevel structures using productions constructed for various concepts.
13.3 Road simulator Road simulator (see [15]) is a tool for generating data sets recording vehicle movement on the road and at the crossroads (see [15]). Such data is extremely crucial in testing of complex decision systems monitoring the situation on the road that are working on the basis of information coming from different devices. ,.^MiM Maximal number of vehicles: 20 Current number of vehicles: 14 Humidily: LACK Visibility: 500 Traffic parameter of main road: 0.5 Traffic parameter of subordinate road: 0.2 Current simulation step: 68 (from 500) Saving data: NO
SOUTH I
Fig. 13.3. The board of simulation Driving simulation takes place on a board (see Figure 13.3) which presents a crossroads together with access roads. During the simulation the vehicles may enter the board from all four directions that is East, West, North and South. The vehicles coming to the crossroads form South and North have the right of way in relation to the vehicles coming from West and East. Each of the vehicles entering the board has only one aim - to drive through the crossroads safely and leave the board. The simulation takes place step by step and during each of its steps the vehicles may perform the following maneuvers during
13 Classifiers Based on Approximate Reasoning Schemes
195
the simulation: passing, overtaking, changing direction (at the crossroads), changing lane, entering the traffic from the minor road into the main road, stopping and pulling out. Planning each vehicle's further steps takes place independently in each step of the simulation. Each vehicle, is "observing" the surrounding situation on the road, keeping in mind its destination and its own parameters (driver's profile), makes an independent decision about its further steps; whether it should accelerate, decelerate and what (if any) maneuver should be commenced, continued, ended or stopped. Making decisions concerning further driving, a given vehicle takes under consideration its parameters and the driving parameters of five vehicles next to it which are marked FRl, FR2, FL, BR and BL (see Figure 13.4).
FL-H-
1
-+-FR2
i-
-H—FRl
BL-
4 i-
A given vehicle
h-BR
Fig. 13.4. A given vehicle and five vehicles next to it
During the simulation the system registers a series of parameters of the local simulations, that is simulations connected with each vehicle separately, as well as two global parameters of the simulation that is parameters connected with driving conditions during the simulation. The value of each simulation parameter may vary and what follows it has to be treated as a certain attribute taking values from a specified value set. We associate the simulation parameters with the readouts of different measuring devices or technical equipment placed inside the vehicle or in the outside environment (e.g., by the road, in a helicopter observing the situation on the road, in a police car). These are devices and equipment playing the role of detecting devices or converters meaning sensors (e.g., a thermometer, range finder, video camera, radar, image and sound converter). The attributes taking the simulation parameter values, by analogy to devices providing their values will be called sensors. The exemplary sensors are the following: initial and current road (four roads), distance from the crossroads (in screen units), current lane (two lanes), position of the vehicle on the road (values from 0.0 to 1.0), vehicle speed (values from 0.0 to 10.0), acceleration and deceleration, distance of a given vehicle from FRl, FL, BR and BL
196
Jan Bazan and Andrzej Skowron
vehicles and between FRl and FR2 (in screen units), appearance of the vehicle at the crossroad (binary values), visibility (expressed in screen units values from 50 to 500), humidity (slipperiness) of the road (three values: lack of humidity - dry road, low humidity, high humidity). If, for some reason, the value of one of the sensors may not be determined, the value of the parameter becomes equal NULL (missing value). Apart from sensors the simulator registers a few more attributes, whose values are determined using the sensor's values in a way determined by an expert. These parameters in the present simulator version take the binary values and are therefore called concepts. The results returned by testing concepts are very often in a form YES, NO or DOES NOT CONCERN (NULL value). Here are exemplary concepts: 1. 2. 3. 4. 5.
Is the vehicle forcing the right of way at the crossroads? Is there free space on the right lane in order to end the overtaking maneuver? Will the vehicle be able to easily overtake before the oncoming car? Will the vehicle be able to brake before the crossroads? Is the distance from the FRl vehicle too short or do we predict it may happen shortly? 6. Is the vehicle overtaking safely? 7. Is the vehicle driving safely? Besides binary concepts, simulator registers for any such concept one special attribute that approximates binary concept by six linearly ordered layers: certainly YES, rather YES, possibly YES, possibly NO, rather NO and certainly NO. Some concepts related to the situation of the road are simple and classifiers for them can be induced directly from sensor measurement but for more complex concepts this is infeasible. In searching for classifiers for such concepts domain knowledge can be helpful. The relationships between concepts represented in domain knowledge can be used to construct hierarchical relationship diagrams. Such diagrams can be used to induce multi-layered classifiers for complex concepts (see [14] and next section). In Figure 13.5 there is an exemplary relationship diagram for the above mentioned concepts. The concept specification and concept dependencies are usually not given automatically in accumulated data sets. Therefore they should be extracted from a domain knowledge. Hence, the role of human experts is very important in our approach. During the simulation, when a new vehicle appears on the board, its so called driver's profile is determined. It may take one of the following values: a very careful driver, a careful driver and a careless driver. Driver's profile is the identity of the driver and according to this identity further decisions as to the way of driving are made. Depending on the driver's profile and weather conditions (humidity of the road and visibility) speed limits are determined, which cannot be exceeded. The generated data during the simulation are stored in a data table (information system). Each row of the table depicts the situation of a single vehicle and the sensors' and concepts' values are registered for a given vehicle and the FRl, FR2, FL,
13 Classifiers Based on Approximate Reasoning Schemes
197
Safe driving Safe overtaking
Safe distance from FL during overtaking
Forcing the right of way
Possibility of going back to the right lane
Possibility of safe stopping before the crossroads
SENSORS Fig. 13.5. The relationship diagram for presented concepts
BL and BR vehicles (associated with a given vehicle). Within each simulation step descriptions of situations of all the vehicles on the road are saved to file.
13.4 Algorithm for classifying objects by production In this section we present an algorithm for classifying objects by a given production but first of all we have to describe the method for the production inducing. To outline a method for production inducing let us assume that a given concept C registered by road simulator depends on two concepts CI and C2 (registered be road simulator too). Each of these concepts can be approximated by six linearly ordered layers: certainly YES, rather YES, possibly YES, possibly NO, rather NO and certainly NO. We induce classifiers for concepts CI and C2. These classifiers generate the corresponding weight (the name of one of six approximation layers) for any tested object. We construct for the target concept C a table T over the Cartesian product of sets defined by weight patterns for CI, C2, assuming that some additional constraints hold. Next, we add to the table T the last column, that is an expert decision. From the table T, we extract production rules describing dependencies between these three concepts. In Figure 13.6 we illustrate the process of extracting production rule for concept C and for the approximation layer rather YES of concept C The production rule can be extracted in the following four steps: 1. Select all rows from the table T in which values of column C is not less than rather YES.
198
Jan Bazan and Andrzej Skowron The tatget pattern of production rule
C2 C 1 1 ^^ 1 certainly YES certainly YES certainly YES \ certainly NO certBinly NO \ certainly YES rather YES 1 possibly YES possibly NO possibly YES \ 1 certainly NO 1 rather YES
1 possibly YES possibly NO 1 possibly YES ratiierYES
rather NO \ rather YES \
1 possibly YES certainly NO possibly NO \ C1> possibly YES 1 certainly YES rather YES certainly YES | possibly NO
1 certainly NO
C2> rather YES
The source patterns of production rule
certainly NO \
certainly YES > rather YES > possibly YES > possibly NO > rather NO > certainly NO Fig. 13.6. The illustration of production rule extracting 2. Find minimal values of attributes CI and C2 from table T for selected rows in the previous step (in our example it easy to see that for the attribute CI minimal value is possibly YES and for the attribute C2 minimal value is rather YES). 3. Set sources patterns of new production rule on the basis of minimal values of attributes that were found in the previous step. 4. Set the target pattern of new production, i.e., concept C with the value rather YES. Finally, we obtain the production rule: (*) If (CI > possibly YES) and {C2 > rather YES) then (C > rather YES). A given tested object can be classified by the production rule (*), when weights generated for the object by classifiers induced for concepts from the rule premise are at least equal to degrees from source (premise) patterns of the rule. Then the production rule classifies tested object to the target (conclusion) pattern. 1 1
[Fig. 13.7. Classifying tested objects by a single production rule: the object u1 (with C1 = certainly YES and C2 = rather YES) matches both source patterns of rule (*), whereas the object u2 does not match the source pattern "C2 ≥ rather YES".]
For example, the object u1 from Figure 13.7 is classified by the production rule (*) because it is matched by both patterns from the left hand side of the production rule (*), whereas the object u2 from Figure 13.7 is not classified by the production rule (*) because it is not matched by the second source pattern of the production rule (*) (the value of the attribute C2 is less than rather YES). The method of extracting a production rule presented above can be applied for various values of the attribute C. In this way, we obtain a collection of production rules, which we call a production. Using production rules selected from a production we can compose AR schemes (see Section 13.2). In this way relevant patterns for more complex concepts are constructed. Any tested object is classified by an AR scheme if it is matched by all sensory patterns from this AR scheme. The method of object classification based on a production can be described as follows:
1. Preclassify the object to the production domain.
2. Classify the object by the production.
We assume that for any production a production guard (a boolean expression) is given. Such a guard describes the production domain and is used in the preclassification of tested objects. The production guard is constructed using domain knowledge. An object can be classified by a given production if it satisfies the production guard. For example, let us assume that the production P is generated for the concept "Is the vehicle overtaking safely?". Then an object-vehicle u is classified by the production P iff u is overtaking. Otherwise, the message "HAS NOTHING TO DO WITH (OVERTAKING)" is returned. Now we can present the algorithm for classifying objects by a production.
Algorithm 1. The algorithm for classifying objects by production
Step 1. Select a complex concept C from the relationship diagram.
Step 2. If the tested object should not be classified by the production P extracted for the selected concept C, i.e., it does not satisfy the production guard, return HAS NOTHING TO DO WITH.
Step 3. Find a rule from the production P that classifies the object with the maximal degree to the target concept of the rule; if such a rule of P does not exist, return I DO NOT KNOW.
Step 4. Generate a decision value for the object from the degree extracted in the previous step: if the extracted degree is greater than possibly YES, then the object is classified to the concept C (return YES); else the object is not classified to the concept C (return NO).
The algorithm for classifying objects by production presented above can be treated as an algorithm of dynamic synthesis of an AR scheme for any tested object. It is easy to see that during classification any tested object is classified by a single production rule selected from the production.
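Algorithm 1 admits the following compact Python reading; the representation of productions, guards and weights is assumed here for the sake of the example and does not reflect any particular implementation.

LAYERS = ["certainly YES", "rather YES", "possibly YES",
          "possibly NO", "rather NO", "certainly NO"]

def at_least(a, b):
    return LAYERS.index(a) <= LAYERS.index(b)

def classify_by_production(obj, production, guard, weights):
    # production: list of rules {"premises": {concept: layer, ...}, "target": layer}
    # guard(obj): boolean production guard used for preclassification (Step 2)
    # weights(obj): layer values of the premise concepts for obj (from classifiers)
    if not guard(obj):
        return "HAS NOTHING TO DO WITH"
    w = weights(obj)
    # Step 3: rules whose source patterns are matched by the object's weights.
    matching = [r for r in production
                if all(at_least(w[c], p) for c, p in r["premises"].items())]
    if not matching:
        return "I DO NOT KNOW"
    best = min(matching, key=lambda r: LAYERS.index(r["target"]))
    # Step 4: YES only when the extracted degree is greater than possibly YES.
    return "YES" if LAYERS.index(best["target"]) < LAYERS.index("possibly YES") else "NO"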
Table 13.1. Results of experiments for the concept: "Is the vehicle overtaking safely?"

Decision class           Method   Accuracy   Coverage   Real accuracy
YES                      RS       0.949      0.826      0.784
                         ARS      0.974      0.973      0.948
NO                       RS       0.889      0.979      0.870
                         ARS      0.926      1.0        0.926
All classes (YES + NO)   RS       0.999      0.996      0.995
                         ARS      0.999      0.999      0.998
This means that the production rule is dynamically assigned to the tested object. In other words, the approximate reasoning scheme is dynamically synthesized for any tested object. We claim that the quality of the classifier presented above is higher than that of the classifier constructed using the algorithm based on the set of minimal decision rules. In the next section we present the results of experiments with data sets generated by the road simulator supporting this claim.
13.5 Experiments with Data
To verify the effectiveness of classifiers based on AR schemes, we have implemented our algorithms in the AS-lib programming library. This is an extension of the RSES-lib 2.1 programming library creating the computational kernel of the RSES system [16]. The experiments have been performed on the data sets obtained from the road simulator. We have applied the train and test method for estimating accuracy (see e.g. [5]). The data set consists of 18101 objects generated by the road simulator. This set was randomly divided into the train table (9050 objects) and the test table (9051 objects). In our experiments, we compared the quality of two classifiers: RS and ARS. For inducing RS we use the RSES system, generating the set of minimal decision rules that are next used for classifying situations from the testing data. ARS is based on AR schemes. We compared the RS and ARS classifiers using the accuracy of classification, the learning time and the rule set size. We also checked the robustness of the classifiers. Table 13.1 and Table 13.2 show the results of the considered classification algorithms for the concept "Is the vehicle overtaking safely?" and for the concept "Is the vehicle driving safely?", respectively. One can see that the accuracy of the algorithm ARS is higher than the accuracy of the algorithm RS for the analyzed data set. Table 13.3 shows the learning time and the number of decision rules induced for the considered classifiers. In the case of the number of decision rules we present the number of rules averaged over all concepts from the relationship diagram.
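For reference, the quantities reported in the tables below can be computed as in this short Python sketch; treating the reported "real accuracy" as the product of accuracy and coverage is an assumption of the illustration (it agrees with the tables), and the function names are invented.

def evaluate(classifier, test_objects, true_decision):
    # classifier(obj) returns "YES", "NO", or None when it abstains
    # (e.g. answers I DO NOT KNOW or HAS NOTHING TO DO WITH).
    answers = {obj: classifier(obj) for obj in test_objects}
    answered = [obj for obj in test_objects if answers[obj] is not None]
    coverage = len(answered) / len(test_objects)
    correct = [obj for obj in answered if answers[obj] == true_decision(obj)]
    accuracy = len(correct) / len(answered) if answered else 0.0
    return accuracy, coverage, accuracy * coverage  # the last is "real accuracy" here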
Table 13.2. Results of experiments for the concept: "Is the vehicle driving safely?"

Decision class           Method   Accuracy   Coverage   Real accuracy
YES                      RS       0.978      0.946      0.925
                         ARS      0.962      0.992      0.954
NO                       RS       0.633      0.740      0.468
                         ARS      0.862      0.890      0.767
All classes (YES + NO)   RS       0.964      0.935      0.901
                         ARS      0.958      0.987      0.945
Table 13.3. Learning time and the rule set size for concept: "Is the vehicle driving safely?"

Method   Learning time   Rule set size
RS       801 seconds     835
ARS      247 seconds     189
One can see that the learning time for ARS is much shorter than for RS and the number of decision rules induced for ARS is much lower than the number of decision rules induced for RS.
13.6 Summary
We have discussed a method for the construction (from data and domain knowledge) of classifiers for complex concepts using AR schemes (ARS classifiers). The experiments showed that:
• the accuracy of classification by ARS is better than the accuracy of the RS classifier,
• the learning time for ARS is much shorter than for RS,
• the number of decision rules induced for ARS is much lower than the number of decision rules induced for RS.
Finally, the ARS classifier is much more robust than the RS classifier. The results are consistent with the rough-mereological approach.
Acknowledgement. The research has been supported by the grant 3 T11C 002 26 from the Ministry of Scientific Research and Information Technology of the Republic of Poland.
References
1. Bazan J. (1998) A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. In: [8]: 321-365
2. Bazan J., Nguyen H. S., Skowron A., Szczuka M. (2003) A view on rough set concept approximation. LNAI 2639, Springer, Heidelberg: 181-188
3. Friedman J. H., Hastie T., Tibshirani R. (2001) The elements of statistical learning: Data mining, inference, and prediction. Springer, Heidelberg
4. Kloesgen W., Zytkow J. (eds) (2002) Handbook of KDD. Oxford University Press
5. Michie D., Spiegelhalter D. J., Taylor C. C. (1994) Machine learning, neural and statistical classification. Ellis Horwood, New York
6. Pawlak Z. (1991) Rough sets: Theoretical aspects of reasoning about data. Kluwer, Dordrecht
7. Pal S. K., Polkowski L., Skowron A. (eds) (2004) Rough-Neuro Computing: Techniques for Computing with Words. Springer-Verlag, Berlin
8. Polkowski L., Skowron A. (eds) (1998) Rough Sets in Knowledge Discovery 1-2. Physica-Verlag, Heidelberg
9. Polkowski L., Skowron A. (1999) Towards adaptive calculus of granules. In: [17]: 201-227
10. Polkowski L., Skowron A. (2000) Rough mereology in information systems. A case study: Qualitative spatial reasoning. In: Polkowski L., Lin T. Y., Tsumoto S. (eds), Rough Sets: New Developments in Knowledge Discovery in Information Systems, Studies in Fuzziness and Soft Computing 56, Physica-Verlag, Heidelberg: 89-135
11. Skowron A. (2001) Toward intelligent systems: Calculi of information granules. Bulletin of the International Rough Set Society 5 (1-2): 9-30
12. Skowron A., Stepaniuk J. (2001) Information granules: Towards foundations of granular computing. International Journal of Intelligent Systems 16 (1): 57-86
13. Skowron A., Stepaniuk J. (2002) Information granules and rough-neuro computing. In: [7]: 43-84
14. Stone P. (2000) Layered Learning in Multi-Agent Systems: A Winning Approach to Robotic Soccer. The MIT Press, Cambridge, MA
15. The Road Simulator Homepage - logic.mimuw.edu.pl/~bazan/simulator
16. The RSES Homepage - logic.mimuw.edu.pl/~rses
17. Zadeh L. A., Kacprzyk J. (eds) (1999) Computing with Words in Information/Intelligent Systems 1-2. Physica-Verlag, Heidelberg
14 Towards Rough Applicability of Rules
Anna Gomolinska*
University of Bialystok, Department of Mathematics, Akademicka 2, 15-267 Bialystok, Poland [email protected]
Summary. In this article, we further study the problem of soft applicability of rules within the framework of approximation spaces. Such forms of applicability are generally called rough. The starting point is the notion of graded applicability of a rule to an object, introduced in our previous work and referred to as fundamental. The abstract concept of rough applicability of rules comprises a vast number of particular cases. In the present paper, we generalize the fundamental form of applicability in two ways. Firstly, we more intensively exploit the idea of rough approximation of sets of objects. Secondly, a graded applicability of a rule to a set of objects is defined. A better understanding of rough applicability of rules is important for building the ontology of an approximate reason and, in the sequel, for modeling of complex systems, e.g., systems of social agents. Key words: approximation space, ontology of approximate reason, information granule, graded meaning of formulas, applicability of rules To Emilia
14.1 Introduction It is hardly an exaggeration to say that soft application of rules is the prevailing form of rule following in real life situations. Though some rules (e.g., instructions, regulations, laws, etc.) are supposed to be strictly followed, it usually means "as strictly as possible" in practice. Typically, people tend to apply rules "softly" whenever the expected advantages (gain) surpass the possible loss (failure, harm). Soft application of rules is usually more efficient and effective than the strict one, however, at the cost of the results obtained. In many cases, adaptation to changing situations requires a
*Many thanks to James Peters, Alberto Pettorossi, Andrzej Skowron, Dominik Ślęzak, and last but not least to the anonymous referee for useful and insightful remarks. The research has been partially supported by the grant 3T11C00226 from the Ministry of Scientific Research and Information Technology of the Republic of Poland.
change in the mode of application of rules only, retaining the rules unchanged. Allowing rules to be applied softly simplifies multi-attribute decision making under missing or uncertain information as well. As a research problem, applicability of rules concerns strategies (meta-rules) which specify the permissive conditions for passing from premises to conclusions of rules. In this paper, we analyze soft applicability of rules within the framework of approximation spaces (ASs) or, in other words, rough applicability of rules. The first step has already been made by introducing the concept of graded applicability of a rule to an object of an AS [3]. This fundamental form of applicability is based on the graded satisfiability and meaning of formulas and their sets, studied in [2]. The intuitive idea is that a rule r is applicable to an object u in degree t iff a sufficiently large part of the set of premises of r is satisfied for u in a sufficient degree, where sufficiency is determined by t. We aim at extending and refining this notion step by step. For the time being, we propose two generalizations. In the first one, the idea of approximation of sets of objects is exploited more intensively. The second approach consists in extending the graded applicability of a rule to an object to the case of graded applicability of a rule to a set of objects. Studying various rough forms of applicability of rules is important for building the ontology of an approximate reason. In [9], Peters et al. consider structural aspects of such an ontology. A basic assumption made is that an approximate reason is a capability of an agent. Agents classify information granules, derived from sensors or received from other agents, in the context of ASs. One of the fundamental forms of reasoning is a reflective judgment that a particular object (granule of information) matches a particular pattern. In the case of rules, agents judge whether or not, and how far, an object (set of objects) matches the conditions for applicability of a rule. As explained in [9]: Judgment in agents is a faculty of thinking about (classifying) the particular relative to decision rules derived from data. Judgment in agents is reflective but not in the classical philosophical sense [...]. In an agent, a reflective judgment itself is an assertion that a particular decision rule derived from data is applicable to an object (input). [...] Again, unlike Kant's notion of judgment, a reflective judgment is not the result of searching for a universal that pertains to a particular set of values of descriptors. Rather, a reflective judgment by an agent is a form of recognition that a particular vector of sensor values pertains to a particular rule in some degree. The ontology of an approximate reason may serve as a basis for modeling of complex systems like systems of social, highly adaptive agents, where rules are allowed to be followed flexibly and approximately. Since one and the same rule may be applied in many ways depending, among others, on the agent and the situation of (inter)action, we can to a higher extent capture the complexity of the modelled system by means of relatively few rules. Moreover, agents are given more autonomy in applying rules. From the technical point of view, degrees of applicability may serve as lists of tuning parameters to control application of rules. Another area of possible use of rough applicability is multi-attribute classification (and, in particular, decision making). In
the case of an object to which no classification rule is applicable in the strict sense, we may try to apply an available rule roughly. This happens in real life, e.g., in the process of selection of the best candidate(s), where no candidate fully satisfies the requirements. If a decision is to be made anyway, some conditions should be omitted or their satisfiability should be treated less strictly. Rough applicability may also help in the classification of objects where some values of attributes are missing. In Sect. 14.2, approximation spaces are overviewed. Section 14.3 is devoted to the notions of graded satisfiability and meaning of formulas. In Sect. 14.4, we generalize the fundamental notion of applicability in the two directions mentioned earlier. Section 14.5 contains a concise summary.
14.2 Approximation Spaces
The general notion of an approximation space (AS) was proposed by Skowron and Stepaniuk [13, 14, 16]. Any such space is a triple M = (U, Γ, κ), where U is a non-empty set, Γ : U → ℘U is an uncertainty mapping, and κ : (℘U)² → [0,1] is a rough inclusion function (RIF). ℘U and (℘U)² denote the power set of U and the Cartesian product ℘U × ℘U, respectively. Originally, Γ and κ were equipped with tuning parameters, and the term "parameterized" was therefore used in connection with ASs. Exemplary ASs are the rough ASs, induced by the Pawlak information systems [6, 8]. Elements of U, called objects and denoted by u with subscripts whenever needed, are known by their properties only. Therefore, some objects may be viewed as similar. Objects similar to an object u constitute a granule of information in the sense of Zadeh [17]. Indiscernibility may be seen as a special case of similarity. Since every object is obviously similar to itself, the universe U of M is covered by a family of granules of information. The uncertainty mapping Γ is a basic mathematical tool to describe formally the granulation of information on U. For every object u, Γu is a set of objects similar to u, called an elementary granule of information drawn to u. By assumption, u ∈ Γu. Elementary granules are merely building blocks to construct more complex information granules which form, possibly hierarchical, systems of granules. Simple examples of complex granules are the results of set-theoretical operations on granules obtained at some earlier stages, rough approximations of concepts, or meanings of formulas and sets of formulas in ASs. An adaptive calculus of granules, measure(s) of closeness and inclusion of granules, construction of complex granules from simpler ones which satisfy a given specification are a few examples of related problems (see, e.g., [11, 12, 15, 16]). In our approach, a RIF κ : (℘U)² → [0,1] is a function which assigns to every pair (x, y) of subsets of U a number in [0,1] expressing the degree of inclusion of x in y, and which satisfies postulates (A1)-(A3) for any x, y, z ⊆ U: (A1) κ(x, y) = 1 iff x ⊆ y; (A2) If x ≠ ∅, then κ(x, y) = 0 iff x ∩ y = ∅; (A3) If y ⊆ z, then κ(x, y) ≤ κ(x, z). Thus, our RIFs are somewhat stronger than the ones characterized by the axioms of rough mereology, proposed by Polkowski and Skowron [10, 12].
Rough mereology extends Leśniewski's mereology [4] to a theory of the relationship of being-a-part-in-degree. Among various RIFs, the standard ones deserve special attention. Let the cardinality of a set x be denoted by #x. Given a non-empty finite set U and x, y ⊆ U, the standard RIF, κ_£, is defined by κ_£(x, y) = #(x ∩ y)/#x if x ≠ ∅, and κ_£(x, y) = 1 otherwise. The notion of a standard RIF, based on the frequency count, goes back to Łukasiewicz [5]. In our framework, where infinite sets of objects are allowed, by a quasi-standard RIF we understand any RIF which for finite first arguments is like the standard one. In M, sets of objects (concepts) may be approximated in various ways (see, e.g., [1] for a discussion and references). In [14, 16], a concept x ⊆ U is approximated by means of the lower and upper rough approximation mappings low, upp : ℘U → ℘U, respectively, defined by

low x = {u ∈ U | κ(Γu, x) = 1} and upp x = {u ∈ U | κ(Γu, x) > 0}.   (14.1)

By (A1)-(A3), the lower and upper rough approximations of x, low x and upp x, are equal to {u ∈ U | Γu ⊆ x} and {u ∈ U | Γu ∩ x ≠ ∅}, respectively. Ziarko [18, 19] generalized the Pawlak rough set model [7, 8] to a variable-precision rough set model by introducing variable-precision positive and negative regions of sets of objects. Let t ∈ [0,1]. Within the AS framework, in line with (14.1), the mappings of t-positive and t-negative regions of sets of objects, pos_t, neg_t : ℘U → ℘U, respectively, may be defined as follows, for any set of objects x:¹
K{FU,X)
> t} and neg^a: = {u E U \ K{FU,X)
< t}. (14.2)
Notice that lowx = pos^x and uppx = U — neggX.
14.3 The Graded IVfeaning of Formulas Suppose a formal language L expressing properties of M is given. The set of all formulas of L is denoted by FOR. We briefly recall basic ideas concerning the graded satisfiability and meaning of formulas and their sets, studied in [2]. Given a relation of (crisp) satisfiability of formulas for objects of [/, \=c, the c-meaning (or, simply, meaning) of a formula a is understood as the extension of a, i.e., as the set | |a| |c = {u £ U \ u \=c a}. For simplicity, "c" will be omitted in formulas whenever possible. By introducing degrees t G [0,1], we take into account the fact that objects are perceived through the granules of information attached to them. In the formulas below, li 1=^ a reads as "a is t-satisfied for u" and \\ct\\t denotes the t-meaning of a: u\=ta
iff K{FU, \\a\\) > t and ||a||t = {u £ U \ u ^t (^}-
The original definitions, proposed by Ziarko, are somewhat different.
(14.3)
In other words, ||α||_t = pos_t ||α||. Next, for t ∈ T = [0,1] ∪ {c}, the set of all formulas which are t-satisfied for an object u is denoted by |u|_t, i.e., |u|_t = {α ∈ FOR | u |=_t α}. Notice that it may be t = c here. The graded satisfiability of a formula for an object is generalized on the left-hand side to a graded satisfiability of a formula for a set of objects, and on the right-hand side to a graded satisfiability of a set of formulas for an object, where degrees are elements of T1 = T × [0,1]. For any n-tuple t and i = 1, ..., n, let π_i t denote the i-th element of t. For simplicity, we use |=_t, |·|_t, and ||·||_t both for the (object, formula) case as well as for its generalizations. Thus, for any object u, a set of objects x, a formula α, a set of formulas X, a RIF κ* : (℘FOR)² → [0,1], and t ∈ T1,

x |=_t α iff κ(x, ||α||_{π1t}) ≥ π2t, and |x|_t = {α ∈ FOR | x |=_t α};
u |=_t X iff κ*(X, |u|_{π1t}) ≥ π2t, and ||X||_t = {u ∈ U | u |=_t X}.   (14.4)
u |=_t X reads as "X is t-satisfied for u", and ||X||_t is the t-meaning of X. Observe that |=_t extends the classical, crisp notions of satisfiability of the sorts (set-of-objects, formula) and (object, set-of-formulas). Along the standard lines, x |= α iff ∀u ∈ x. u |= α, and u |= X iff ∀α ∈ X. u |= α. Hence, x |= α iff x |=_{(c,1)} α, and u |= X iff u |=_{(c,1)} X. Properties of the graded satisfiability and meaning of formulas and sets of formulas may be found in [2]. Let us only mention that a non-empty finite set of formulas X cannot be replaced by a conjunction ⋀X of all its elements as it happens in the classical, crisp case. In the graded case, one can only prove that ||⋀X||_t ⊆ ||X||_{(t,1)}, where t ∈ T, but the converse may not hold.
14.4 The Graded Applicability of Rules Generalized
All rules over L, denoted by r with subscripts whenever needed, constitute a set RUL. Any rule r is a pair of finite sets of formulas of L, where the first element, P_r, is the set of premises of r and the second element of the pair is a non-empty set of conclusions of r. Along the standard lines, a rule which is not applicable in a considered sense is called inapplicable. A rule r is applicable to an object u in the classical sense iff the whole set of premises P_r is satisfied for u. The graded applicability of a rule to an object, viewed as a fundamental form of rough applicability here, is obtained by replacing the crisp satisfiability by its graded counterpart and by weakening the condition that all premises be satisfied [3]. Thus, for any t ∈ T1,

r ∈ apl_t u iff κ*(P_r, |u|_{π1t}) ≥ π2t, i.e., iff u ∈ ||P_r||_t.   (14.5)
r ∈ apl_t u reads as "r is t-applicable to u".² Properties of apl_t are presented in [3]. Let us only note that the classical applicability and the (c, 1)-applicability coincide.
²Equivalently, "r is applicable to u in degree t".
Example 1. In the textile industry, a norm determining whether or not the quality of water to be used in the process of dyeing of textiles is satisfactory may be written
as a decision rule r with 16 premises and one conclusion (d, yes). In this case, the objects of the AS considered are samples of water. The c-meaning of the conclusion of r is the set of all samples of water u ∈ U such that the water may be used for dyeing of textiles, i.e., ||(d, yes)|| = {u ∈ U | d(u) = yes}. Let a1, ..., a7 denote the attributes: colour (mgPt/l), turbidity (mgSiO2/l), suspensions (mg/l), oxygen consumption (mgO2/l), hardness (mval/l), Fe content (mg/l), and Mn content (mg/l), respectively. Then, (a1, [0,20]), (a2, [0,15]), (a3, [0,20]), (a4, [0,20]), (a5, [0,1.8]), (a6, [0,0.1]), and (a7, [0,0.05]) are exemplary premises of r. For instance, the c-meaning of (a2, [0,15]) is the set of all samples of water such that their turbidity does not exceed 15 mgSiO2/l, i.e., ||(a2, [0,15])|| = {u ∈ U | a2(u) ≤ 15}. Suppose that the values of a2, a3 slightly exceed 15, 20 for some sample u, respectively, i.e., the second and the third premises are not satisfied for u, whereas all remaining premises hold for u. That is, r is inapplicable to the sample u in the classical sense, yet it is (c, 0.875)-applicable to u. Under special conditions as, e.g., serious time constraints, applicability of r to u in degree (c, 0.875) may be viewed as sufficient or, in other words, the quality of u may be viewed as satisfactory if the gain expected surpasses the possible loss. Observe that r ∈ apl_t u iff u ∈ I_{℘U} ||P_r||_t, where I_{℘U} is the identity mapping on ℘U. A natural generalization of (14.5) is obtained by taking a mapping f_Φ : ℘U → ℘U instead of I_{℘U}, where Φ is a possibly empty list of parameters. For instance, f_Φ may be an approximation mapping. In this way, we obtain a family of mappings apl^f_t : U → ℘RUL, parameterized by t ∈ T1 and Φ, and such that for any r and u,
r ∈ apl^f_t u iff u ∈ f_Φ ||P_r||_t.   (14.6)

The family is partially ordered by ⊑, where for any t1, t2 ∈ T1,

apl^f_{t1} ⊑ apl^f_{t2} iff ∀u ∈ U. apl^f_{t1} u ⊆ apl^f_{t2} u.   (14.7)
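Continuing the toy reading started after (14.2), the fundamental applicability (14.5) and its generalization (14.6) can be sketched as follows; the representation of premises and meanings, and the use of the (quasi-)standard RIF for κ*, are assumptions of this illustration, and the functions kappa, low and the mapping gamma are those of the earlier sketch.

def meaning_t(premises, meaning, gamma, universe, t):
    # ||P_r||_t for t = (pi1, pi2): objects u with kappa*(P_r, |u|_pi1) >= pi2,
    # where meaning(alpha) is the crisp meaning ||alpha|| of a premise.
    pi1, pi2 = t
    def satisfied(u, alpha):                       # u |=_pi1 alpha, cf. (14.3)
        if pi1 == "c":
            return u in meaning(alpha)
        return kappa(gamma(u), meaning(alpha)) >= pi1
    def kappa_star(formulas, sat):                 # (quasi-)standard RIF on formula sets
        formulas = list(formulas)
        return 1.0 if not formulas else sum(1 for a in formulas if a in sat) / len(formulas)
    result = set()
    for u in universe:
        sat_u = {a for a in premises if satisfied(u, a)}
        if kappa_star(premises, sat_u) >= pi2:
            result.add(u)
    return result

def applicable(premises, meaning, gamma, universe, t, u, f=lambda s: s):
    # r in apl^f_t u iff u in f(||P_r||_t); the default f is the identity as in (14.5),
    # while e.g. f = lambda s: low(gamma, s, universe) gives the "certainly applicable" variant.
    return u in f(meaning_t(premises, meaning, gamma, universe, t))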
The general notion of rough applicability, introduced above, comprises a number of particular cases, including the fundamental one. In fact, apl_t = apl^{I_℘U}_t. Next, e.g., r ∈ apl^low_t u iff u ∈ low ||P_r||_t iff r is t-applicable to every object similar to u. In the same vein, r ∈ apl^upp_t u iff u ∈ upp ||P_r||_t iff r is t-applicable to some object similar to u. We can also say that r is certainly t-applicable and possibly t-applicable to u, respectively. In the variable-precision case, for f = pos_s and s ∈ [0,1], r ∈ apl^f_t u iff u ∈ pos_s ||P_r||_t iff r is t-applicable to a sufficiently large part of Γu, where sufficiency is determined by s. In a more sophisticated case, where f = pos_s ∘ low (∘ denotes the concatenation of mappings), r ∈ apl^f_t u iff u ∈ pos_s low ||P_r||_t iff κ(Γu, low ||P_r||_t) ≥ s iff r is certainly t-applicable to a sufficiently large part of Γu, where sufficiency is determined by s. Etc. For t = (t1, t2) ∈ [0,1]², the various forms of rough t-applicability are determined up to granularity of information. An object u is merely viewed as a representative of the granule of information Γu drawn to it. More precisely, a rule r may practically be treated as applicable to u even if no premise is, in fact, satisfied for u.
It is enough that premises are satisfiable for a sufficiently large part of the set of objects similar to u. If used reasonably, this feature may be advantageous in the case of missing data. The very idea is intensified in the case of pos_s. Then, r is t-applicable to u in the sense of pos_s iff it is t-applicable to a sufficiently large part of the set of objects similar to u, where sufficiency is determined by s. This form of applicability may be helpful in the classification of u if we cannot check whether or not r is applicable to u and, on the other hand, it is known that r is applicable to a sufficiently large part of the set of objects similar to u. Next, rough applicability in the sense of low is useful in modeling of such situations, where the stress is laid on the equal treatment of all objects forming a granule of information. A form of stability of rules may be defined, where r is called stable in a sense considered if for every u, r is applicable to u iff r is applicable to all objects similar to u in the very sense. Example 2. Consider a situation of decision making whether or not to support a student financially. In this case, objects of the AS are students applying for a bursary. Suppose that some data concerning a person u is missing, which makes decision rules inapplicable to u in the classical, crisp sense. For simplicity, assume that r would be the only decision rule applicable to u unless the data were missing. Let α be the premise of r of which we cannot be sure if it is satisfied for u or not. Suppose that for 80% of students whose cases are similar to the case of u, all premises of r are satisfied. Then, to the advantage of u, we may view r as practically applicable to u. Formally, r is (0.8, 1)-applicable to u. Additionally, let r be (0.8, 0.9)-applicable to 65% of objects similar to u. In sum, r is (0.8, 0.9)-applicable to u in the sense of pos_{0.65}. The second (and last) generalization of the fundamental notion of rough applicability, proposed here, consists in extension of applicability of a rule to an object to the case of applicability of a rule to a set of objects. In the classical case, a rule is applicable to a set of objects x iff it is applicable to each element of x. For any a, let (a)^n denote the tuple consisting of n copies of a, and let (a)^1 be abbreviated by (a). For arbitrary tuples s, t, st denotes their concatenation. Next, if t is at least a pair of items (i.e., an n-tuple for n ≥ 2), then
r ∈ Apl_t x iff κ(x, ||P_r||_{(π1t, ..., π_{n-1}t)}) ≥ π_n t.   (14.8)
Thus, a family of mappings Apl_t : ℘U → ℘RUL is obtained, parameterized by t ∈ T2 and partially ordered by a relation ⊑, where for any t1, t2 ∈ T2,

Apl_{t1} ⊑ Apl_{t2} iff ∀x ⊆ U. Apl_{t1} x ⊆ Apl_{t2} x.   (14.9)
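Under the reading of (14.8) adopted above, applicability of a rule to a set of objects adds one more rough-inclusion test on top of the previous sketch; again, this is only an illustration reusing kappa and meaning_t from the earlier fragments.

def applicable_to_set(premises, meaning, gamma, universe, t, x):
    # r in Apl_t x for t = (pi1, pi2, pi3): kappa(x, ||P_r||_(pi1, pi2)) >= pi3.
    pi1, pi2, pi3 = t
    return kappa(x, meaning_t(premises, meaning, gamma, universe, (pi1, pi2))) >= pi3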
The graded applicability, introduced above, is an exemplary notion of rough applicability of a rule to a complex object which is a set of objects of the underlying
approximation space M in our case. This notion may be useful in modeling of a number of situations. Three such cases are sketched below. Example 3. Suppose that objects of an AS are questions which may be subject to negotiation. Then, sets of objects are packets of such questions and represent possible negotiation problems. Let the rules considered be decision rules on how to solve particular problems. We can rank decision rules depending, among others, on their graded applicability to given negotiation problems. The more questions solved positively by a rule, the better is the rule. Example 4. Let objects of an AS be school students in a town. A conmiittee constructs rules to rank classes of students in order to award a prize to the best class. They search for the most universal rule(s) satisfying some additional conditions. A rule r is viewed as more universal than a rule r' iff r applies in a considered sense to larger parts of given classes of students than r' does. Example 5. In a factory, every lot of products is tested whether or not the articles comply with a norm r or, in other words, how far the norm r is applicable in some considered sense to every lot of products. In this case, products are objects of an AS and lots of products are the complex objects considered. A lot x passes the test if a sufficiently large part of x complies with r or, in other words, if r applies to x in a sufficient degree. Below, we present a number of properties of the forms of applicability of rules defined earlier. For natural numbers n > l , z = l , . . . , n , non-empty partially ordered sets {xi, pU, s e [0,1], and t, t' 6 Ti, we have: (a) Where / = pos^, aplfix = Apl^^^^Pw. (6) ap4^- = a p i r ^ and a p l ^ ^ = {}{^viu
\ f = posj.
s>0
(c) If Fu ~ Fu' and g G {upp o /$,pos5 o /$}? then apl^^z = apl^-u' and aplf Ti = aplf u'. (d) If /$ is monotone and t ^t\
then aplj^f C. aplf^.
(e) apll^- C apl, C apl^^^.
(/) Apli(i)^ = n^^P^*^ \uex}. Proof. We prove (d), (f) only. For (d) consider a rule r and assume (dl) /$ is monotone and (d2) t :< t'. First, we show (d3) \\Pr\\t' ^ WPrWt- Consider the non-trivial case only, where nit,nit' ^ c. Assume that u e WPrWr- Then
14 Towards Rough Applicability of Rules
211
(d4) K*{Pr, I^^UitO > ^^2^' by the definition of graded meaning. Observe that for any formula a, if K{ru, \\a\\) > 7rit\ then K{ru, \\a\\) > nit by (d2). Hence, Hint' Q lulTTit' As a consequence, K*{Pr,\u\^^t') < K,*{Pr,\u\^^t) by (A3). Hence, A^*(P^, \u\^,t) > ^2t' > TTS^ by (d2), (d4). Thus, u G \\Pr\\t by the definition of graded meaning. In the sequel, /$||Pr||t' Q /$||Pr||t by (dl), (d3). Hence, r £ apl/^^ implies r G aplf'^ by the definition of graded applicability in the sense of /$. Incase (f), for any rule r,r e Apl^^^^x iff x C ||Pr||t iff Vii G x.u G \\Pr\\t iff Wu G x.r G apl^w iff r G p|{apl^w \u G x}. D Let us briefly comment the results. By (a), rough applicability of a rule to u in the sense of pos^ and the graded applicability of a rule to Fu coincide, (b) is a direct consequence of the properties of approximation mappings, (c) states that the fundamental notion of rough applicability as well as the graded forms of applicability in the sense of uppo/$ and pos^o/$ are determined up to granulation of information. By (d), ift:
then Apl^, C Apl^.
(/) Apl(i)5/ E Apl(^„),, E Apl(o)3(g) f l f l Apl,a: = Apl(i)3[/ = Apl(,,i,i)C/ = { r G R U L | | | P , | | = C / } . xCUteT2
{h) If Pr C Pr' and 7r2t = 1, then r' G Apl^x implies r G Apl^x. (i) If 3a G Pr.||a||7rit = 0, 7r2t = 1, and TTS* > 0, then r G Apl^x iff a: = 0. (j) If x' n llPrlUt = 0, then r G Apl^(x U x') implies r G Apl^x and r G Apl^x impUes r G Apl^(a: — x'). (fc) If x' C ||Pr||
212
Anna Gomolinska
then Fu C ||a||. Since u G Fu, it holds u |= a as required. As a consequence, (g3) \u\i C \u\. In the next step, we prove (g4) ||Pr||(i,i) = t/ iff ||Pr|| = U (recall that \\Pr\\ = ||-Pr||(c,i))- "=^" Assume ||Pr||(i,i) = U, Hence, for every object u, Pr C \u\i by the definition of (1, l)-meaning. In virtue of (g3), Pr C \u\. Hence ||Pr|| = U by the definition of meaning. "<=" Assume \\Pr\\ — U. Hence, for every object u, Pr C \u\ by the definition of meaning. In other words, Vu € f/.Va e Pr.u e \\a\l i.e., Va G Pr.||<^|| = U, Hence, \/u G ^.Va G Pr.Fu C ||Q;||. Thus, Vt/ G C^.Va G Pr-^ t=i oi by the definition of ^=i, i.e., Wu G C/.Va G Pr.a G |^x|i, i.e., Vu G f/.Pr S l^li- Hence, ||Pr||(i,i) = ^ by the definition of (1, l)-meaning. By (gl), (g2), and (g4), it holds that (g5) Apl^ipC/ = Apl(c,i,i)C^- Observe that (g6) for any x C U, f]{Ap\^x \ t G T2} = Apl^^^sa: by (e),'(f). Next, we show that (gl) fKApl^i^sx \ x C U} = Ap^^^^sU. "C" is obvious. To prove " 2 ' \ consider a rule r G Apl(i)3{7. By the definition of (1,1,1)applicability, U C ||Pr||(i,i). Hence, for any x C U, x C ||Pr||(i,i). Again by the definition of (1,1,1)-applicability, r G Apl^i^ao: for every set of objects x. Hence, r G n{Apl(i)3a: \ x C U}. Thus, CiiApl^x \ x CU At GT2} = Ap\^yU by (g6), (g7). Hence, (g) finally follows by (g2), (g5). D Some comments can be handy. First, as directly follows from the definitions of applicability, the (c, 1,1)-applicability is the same as the classical applicability. Next, if 7r2t = TTst = 1, then a rule r is ^-applicable to a set of objects x iff every premise of r is
14 Towards Rough Applicability of Rules
213
14.5 Summary The aim of this paper was to further analyze rough applicability of rules. We generaUzed the fundamental concept of graded applicability in two ways, where, nevertheless, all premises of a rule were treated on equal terms. In the future, rules with premises partitioned into classes will be of interest. Applicability is only one aspect of application of rules. An analysis of the results of rough application and the question of rough quality of rules are of importance as well. The latter problem is closely related to propagation of uncertainty. Obviously, not all concepts of rough applicability can prove useful from the practical point of view. Nevertheless, some of them deserve our attention as they seem to describe formally certain forms of soft applicability of rules, observed in real life situations.
References 1. Gomolinska A (2002) A comparative study of some generalized rough approximations. Fundamenta Informaticae 51(1-2): 103-119 2. Gomolinska A (2004) A graded meaning of formulas in approximation spaces. Fundamenta Informaticae 60:159-172 3. Gomolinska A (2004) A graded applicability of rules. In: Tsumoto S, Slowiriski R, Komorowski J, Grzymala-Busse J W (eds) Proc 4th Int Conf Rough Sets and Current Trends in Computing (RSCTC'2004), Uppsala, Sweden, 2004, June 1-5, LNAI 3066. Springer, Berlin Heidelberg, pp 213-218 4. Le^niewski S (1916) Foundations of the general set theory 1 (in Polish). Works of the Polish Scientific Circle 2 Moscow Also in: Surma S J et al (eds) (1992) Stanislaw Lesniewski collected works. Kluwer Dordrecht, pp 128-173 5. Lukasiewicz J (1913) Die logischen Grundlagen der Wahrscheinlichkeitsrechnung. Krakow Also in: Borkowski L (ed) (1970) Jan Lukasiewicz - Selected works. North Holland Amsterdam London, Polish Sci Publ Warsaw, pp 16-63 6. Pawlak Z (1981) Information systems - theoretical foundations. Information Systems 6(3):205-218 7. Pawlak Z (1982) Rough sets. Int J Computer and Information Sciences 11:341-356 8. Pawlak Z (1991) Rough sets - Theoretical aspects of reasoning about data. Kluwer Dordrecht 9. Peters J F, Skowron A, Stepaniuk J, Ramanna S (2002) Towards an ontology of approximate reason. Fundamenta Informaticae 51(1-2): 157-173 10. Polkowski L, Skowron A (1996) Rough mereology: A new paradigm for approximate reasoning. Int J Approximated Reasoning 15(4):333-365 11. Polkowski L, Skowron A (1999) Towards adaptive calculus of granules. In: Zadeh L A, Kacprzyk J (eds) Computing with words in information/intelligent systems 1. Physica Heidelberg, pp 201-228 12. Polkowski L, Skowron A (2001) Rough mereological calculi of granules: A rough set approach to computation. J Comput Intelligence 17(3):472-492 13. Skowron A, Stepaniuk J (1994) Generalized approximation spaces. In: Proc 3rd Int Workshop on Rough Sets and Soft Computing, San Jose, USA, 1994, November 10-12, pp 156-163
214
Anna Gomolinska
14. Skowron A, Stepaniuk J (1996) Tolerance approximation spaces. Fundamenta Informaticae 27:245-253 15. Skowron A, Stepaniuk J, Peters J F (2003) Towards discovery of relevant patterns from parameterized schemes of information granule construction. In: Inuiguchi M, Hirano S, Tsumoto S (eds) Rough set theory and granular computing. Springer Berlin Heidelberg, pp 97-108 16. Stepaniuk J (2001) Knowledge discovery by application of rough set models. In: Polkowski L, Tsumoto S, Lin T Y (eds) Rough set methods and applications: New developments in knowledge discovery in information systems. Physica Heidelberg New York, pp 137-233 17. Zadeh L A (1973) Outline of a new approach to the analysis of complex system and decision processes. IEEE Trans on Systems, Man, and Cybernetics 3:28^4 18. Ziarko W (1993) Variable precision rough set model. J Computer and System Sciences 46(l):39-59 19. Ziarko W (2001) Probabilistic decision tables in the variable precision rough set model. J Comput Intelligence 17(3):593-603
15 On the Computer-Assisted Reasoning about Rough Sets Adam Grabowski * Institute of Mathematics, University of Bialystok, Akademicka 2, 15-267 Bialystok, Poland, adain@math. uwb. edu. p i Summary. The paper presents some of the issues concerning a formal description of rough sets. We require the indiscemibility relation to be a tolerance of the carrier, not an equivalence relation, as in the Pawlak's classical approach. As a tool for formalization we use the Mizar system, which is equipped with the largest formalized library of mathematical facts. This uniform and computer-checked for correctness framework seems to present a satisfactory level of generality and may be used by other systems as well as it is easily readable for humans. Key words: tolerance approximation spaces, formalized mathematics, automated reasoning, knowledge representation
15.1 Introduction Do we need another formal approach to the rough approximations? The formalizations are different and the generalizations go in different directions. Our approach differs from all those previously known (e.g. [2]) because we require the encoding to be machine-checkable. The idea of using a computer as a math-assistant is not new, also in the rough set conmiunity. Well-known existing systems (Rosetta, RSES, etc.) are a good example of how automatic tools may be used with the developed theory as a background. We would like to discuss the non-KDD approach, i.e. how the (rough set) theory can be machine-formalized. We are also going to propose our formal approach to some basic notions. By a formalization of mathematics we mean the encoding of mathematics in a formal language sufficiently detailed for a computer program to verify the correctness. As two main applications of formalized mathematics we may point out representation, and from this presentation of mathematics, and verification of the correctness of the formalized knowledge. Since it is hard to develop a uniform system which performs well in all stages of automated reasoning, there is a need for a flexible cooperation between various *Many thanks are due to Anna Gomoliiiska for the motivation for this work.
computer systems specialized in specific problem domains. Recently, especially under the auspices of the European Union (Mathematical Knowledge Management MKM-Net and Calculemus - Systems for Integrated of Computation and Deduction, to note the most important networks), a number of such experiments with an agent-oriented mechanized reasoning approach were conducted, e.g. MathWeb or the Logic Broker Architecture, just to name a few. More detailed discussion on the topic can be found e.g. in [1]. The feasibility and usefulness of a distributed multiagent system for problem solving in formalized mathematics will depend on flexibility and value of its components. There are several efforts to join proof-checkers, e.g. Mizar with his classical first order logic and ZFC set theory as a base (it has a language relatively close to that used by mathematicians and computer scientists), provers (EQP/Otter is one of the most famous for the solution of the many equational problems with MACE as a tool for finite model generator), and a module translating results into an XML (Extensible Markup Language) style with its growing popularity as (one of the) information interchange standards for industry and academia. Mathematica, Maple or other computer algebra systems can be used for solving ordinary mathematical problems. Highly configured user interface based on the popular GNU editor Emacs may be a good choice not only for programmers, finally e.g. Omega group can provide a broker architecture. Mizar Mathematical Library itself can also be a subject for research - e.g. for knowledge discovery as a large database of mathematical facts. Some works on the using Mizar as a tool for checking computer program for correctness are also known. For some reasons, in this paper we focused mainly on the proof-checking agent. Among systems which are designed for this purpose we may enumerate here: Coq, NuPRL/MetaPRL, Mizar, Isabelle/Isar, HOL, PVS, two of them are declarative (that is enable to write scripts in a style close to the mathematical jargon): Mizar and Isar. We concentrated on building a sufficiently general and flexible framework which should give the opportunity for reusing it by other automated math-assistants and enable further generalizations. Obviously, we also wanted to choose a system which does not force a researcher to write something in a way far from his/her usual mathematical style. In our opinion, the language and typing mechanisms available in the Mizar system serve good for this purpose.^ We would like to describe some of the issues concerned with the first (as we know of) computer-checked formalization of rough set theory basics. The paper is organized as follows. The second section deals with a brief description of the system we have chosen as a basic tool. In the sections 3-6 the fundamental concepts for tolerance approximation spaces were described, rough approximations, rough inclusion predicate, and rough membership functions, respectively. The next section contains the full formalization of a chosen lemma as an example while in the last one we draw some concluding remarks. ^The Mizar project started in the early seventies of the previous century in Flock, Poland, where MSRAS 2004 workshop took place, under the auspices of Plock Scientific Society.
All Mizar examples will be cited in the typewriter font; for lack of space we will not provide full proofs (with the exception of Section 7). They are available both in every Mizar distribution and online from the website of the Mizar project.
15.2 The Mizar System We are going to shed some light on a system we have chosen for the formalization. Its description as well as many useful links are available on the web page of the project [8], so we will focus here on the generalities to make this presentation possibly selfcontained. The Mizar system is based on three programs (agents in some sense): the accommodator (which imports all the necessary notions and theorems from the Mizar Mathematical Library - MML), the verifier (the core of the system, which parses texts and verifies the logical correctness of the reasoning), and the exporter/transferer (which separates reusable knowledge for inclusion in the database and exports it). There are precompiled binaries for Intel platforms: Linux, Solaris and Win32 freely available from the homepage of the project. The most important part of the system however is the large computer-managed database of mathematical facts. As of the time of writing, MML consisted of 834 Mizar articles - over 60 megabytes of texts written by more than 150 authors. This large repository contains 7093 definitions and 36847 theorems covering different disciplines in mathematics and computer science. Since the Mizar type system is based on ZFC set theory with the classical first-order logic, the developed repository is close in some sense to the Bourbaki school. Comparing to the Bourbaki project, MML is developed by many more authors which causes problems with uniformization of the library. The core of it is organized into the encyclopedia-like style, although the original division into articles assigned to authors is kept. The Library Committee of the Association of Mizar Users (which is a non-profit organization), reserves the rights to revise authors' work. Since the system evolves, articles must be kept compatible with it. There is a service, MML Query, which enables advanced article browsing and searching functions. Every Mizar distribution contains full texts, their abstracts (with proofs removed), and the database as well as a collection of proof-enhancing software. All articles are also automatically translated into the ETEX source and are available on the Internet at [8] as the Journal of Formalized Mathematics. Although the Mizar checking software is developed by a small group of progranmiers, MML is open for external developments. All new articles are reviewed by the Library Committee and the accepted ones are included into MML. Fundamental Theorem of Algebra, Birkhoff Variety Theorem, the equivalence of Robbins and Boolean algebras ([5]), Wedderbum Theorem, or Stone representation theorem for Boolean algebras are examples of what is already proved in MML. The projects
of formalization of Compendium of continuous lattices^ or the first machine-checked proof of the Jordan Curve Theorem are on their way, to note the most important ones.
15.3 Tolerance Approximation Spaces Following Jarvinen works (e.g. [6]), it can be argued that neither reflexivity, symmetry, nor transitivity are indispensable properties of indiscemibility relations. In our view, we have chosen an approach introduced originally in [11], i.e. we require this relation to be a tolerance. Obviously, we could develop in parallel two views for the notion of an approximation space: the classical and the generalized one, however it could be easily explored by MML software that some theorems are corollaries from the others, and they would be a subject for deleting from the library. This may be treated as a lack of focus of some kind, because some theorems remain true only in a classical version. In our opinion, this framework is sufficiently uniform as forced by the criterion of a generalization level. One of the most important constructors for types used in MML is the notion of a structure. The antecedent of all other structures, which has only one field the carrier, is called 1 - s o r t e d . Inheritance mechanisms and type polymorphism implemented in Mizar allow notions defined for a certain structure to be used also for all its descendants. Out of nearly 100 structures declared in MML we have chosen R e l S t r , that is a relational structure - a carrier together with a relation defined on it, technically named I n t e r n a l R e l . It corresponds to (C/, IND), but the properties of the fields of a structure are usually added to it by the adjectives (attributes), as in the example below. d e f i n i t i o n l e t P be R e l S t r ; a t t r P i s with.equivalence means :: ROUGHS.lidef 2 the I n t e r n a l R e l of P i s Equivalence_Relation of the c a r r i e r of P; a t t r P i s with_tolerance means :: ROUGHS_l:def 3 the I n t e r n a l R e l of P i s Tolerance of the c a r r i e r of P; end;
As we already noticed, Mizar articles can be revised. Actually, the changes are very often. At the beginning of MML development, tolerances and equivalence relations were introduced independently. While the Mizar language evolved to enable more inheritance mechanisms working, equivalence relations became tolerances in MML thanks to the distribution of their types into adjectives. After this revision tolerances are defined in MML as reflexive and symmetric relations. If transitivity is added, they turn into equivalence relations, as usual. ^Mizar formalization is acknowledged in the revised edition of the Compendium, issued as Continuous lattices and domains by G. Gierz et al., Cambridge, 2003.
definition let X be set;
  mode Tolerance of X is total reflexive symmetric Relation of X;
end;
The notion of a mode as a basic constructor for types will be explained later in detail. After we prepared the necessary formal apparatus, we may construct an example of a tolerance approximation space and introduce appropriate mode. definition mode Tolerance_Space i s with_tolerance non empty RelStr; end;
In this manner, if we consider a subset of a given tolerance (or approximation) space, the associated tolerance (or equivalence, respectively) relation is also fixed as a hidden argument. Thanks to the attributes we can force structure properties, but also classify objects of other types. This is the way crisp and rough subsets of a given tolerance space are defined. d e f i n i t i o n l e t A be Tolerance_Space; l e t X be Subset of A; a t t r X i s rough means :: ROUGHS.l:def 7 BndAp X <> { } ; end;
The functor BndAp is just a set-theoretical difference between the upper and the lower approximation (i.e. boundary) of X described in the next section thoroughly. notation l e t A be Tolerance.Space; l e t X be Subset of A; antonym X i s exact for X i s rough; end;
Antonyms and synonyms may be introduced if the author would prefer to use his own notation or lexicon for yet defined notion. The use of this mechanism however is controlled by the people responsible for the library in order to avoid the lexical overloading of MML.
15.4 Rough Approximations The basic ideas of RST deal with situations in which the objects of a certain universe can be identified only within the limits determined by the knowledge represented by a given indiscemibility relation (that is, by a internal relation of a relational structure). Regardless which approach we are claiming, the key notion of RST is the notion of an approximation. A lower approximation of a set X consists of objects which are surely (w.r.t. indiscemibility relation) in X. Similarly, the upper one extends the lower for the objects which are possibly in X. Formally, we have XR = {X&U: [X\R C X}. Its translation to Mizar is quite similar:
definition let A be Tolerance_Space, X be Subset of A;
  func LAp X -> Subset of A equals :: ROUGHS_1:def 4
    { x where x is Element of A : Class (the InternalRel of A, x) c= X };
end;
On the right hand side of an arrow " - > " the so-called mother type of a functor is given. Because in a variant of ZF set theory types of all objects expand to the type s e t (except Mizar structures which are treated in a different way), the user may drop this part of a definition not to restrict its type. We wanted Mizar to understand automatically that approximations yield subsets of an approximation space. For uniformity purposes, we used notation C l a s s (R, x) instead of originally introduced in MML n e i g h b o u r h o o d (x, R) - even if we dealt with tolerances, not equivalence relations. Because of implemented inheritance mechanisms and adjectives it worked surprisingly well. The Mizar articles are plain ASCII files, so some usual (often close to its ETgX equivalents) abbreviations are claimed: "c=" stands for the set-theoretical inclusion, " i n " for G," {}" for 0, " \ / " and "/ \ " for the union and the intersection of sets, respectively. The double colon starts a comment, while semicolon is a delimiter for a sentence. Another important construction in the Mizar language which we extensively used, was cluster, that is a collection of attributes. There are three kinds of cluster registrations: •
existential, because in Mizar all types are required to be non-empty, so the existence of the object which satisfies all these properties has to be proved. We needed to construct an example of an approximation space; r e g i s t r a t i o n l e t A be non diagonal Approximation.Space; cluster rough Subset of A; existence; end;
The considered approximation space A, which appears in the locus (argument of the definition), has to be non diagonal. If A were diagonal, i.e. if its indiscernibility relation were included in the identity relation, then all subsets of A would become crisp, with no possibility of constructing a rough subset.
• functorial, i.e. the involved functor has certain properties, used e.g. to ensure that lower and upper approximations are exact (see the example below);

registration let A be Approximation_Space, X be Subset of A;
  cluster LAp X -> exact;
  coherence;
end;
Functorial clusters are most frequent due to the big number of introduced functors (5484 in MML). The possibility of adding an adjective to the type of an object is also useful (e.g. often we force that an object is non-empty in this way).
• conditional, stating e.g. that all approximation spaces are tolerance spaces.
registration
  cluster with_equivalence -> with_tolerance RelStr;
  coherence;
end;
This kind of a cluster is relatively rare (see Table 15.1) because of a strong type expansion mechanism. Table 15.1 contains the number of clusters of all kinds compared to those introduced in [4].

Table 15.1. Number of clusters in MML vs. RST development

type          in MML   in [4]
existential     1501        7
functorial      3181        9
conditional     1131        7
total           5813       23
As it sometimes happens among other theories (compare e.g. the construction of fuzzy sets), paradoxically the notion of a rough set is not the central point of RST as a whole. Rough sets are in fact classes of abstraction w.r.t. rough equality of sets and their formal treatment varies. Majority of authors (w^ith Pawlak in [9] for instance) define a rough set as an underlying class of abstraction (as noted above), but some of them (cf. [2]) claim for simplicity that a rough set is an ordered pair containing the lower and the upper limit of fluctuation of the argument X. These two approaches are not equivalent, and we decided to define a rough set also in the latter sense. d e f i n i t i o n l e t A be Approximation.Space; l e t X be Subset of A; mode RoughSet of X means :: ROUGHS_l:def 8 i t = [LAp X, UAp X]; end;
What should be recalled here, there are so-called modes in the Mizar language which correspond with the notion of a type. To properly define a mode, one should only prove its existence. As it can be easily observed, because the above definiens determines a unique object for every subset X of a fixed approximation space A, this can be reformulated as a functor definition in the Mizar language. If both approximations coincide, the notion collapses and the resulting set is exact, i.e. a set in the classical sense. Unfortunately, in the above mentioned approach, this is not the case. In [4] we did not use this notion in fact, but we have chosen some other solution which describes rough sets more effectively, i.e. by attributes.
222
Adam Grabowski
15.5 Rough Inclusions and Equality Now we are going to briefly present the fundamental predicate for the rough set theory: rough equahty predicate (the lower version is cited below, while the dual upper equality - notation is "= "", and assumes the equality of upper approximations of sets). d e f i n i t i o n l e t A be Tolerance_Space, X, Y be Subset of A; pred X _= Y means :: ROUGHS^lidef 14 LAp X « LAp Y; reflexivity; symmetry; end;
Two additional properties (reflexivity and symmetry) were added with their trivial proofs: e.g. the first one forces the checker to accept that X ^^ X without any justification. In Mizar it is also possible to introduce the so-called redefinitions, that is to give another definiens, if equivalence of it and the original one can be proved (in the case above, the rough equality can be defined e.g. as a conjunction of two rough inclusions). This mechanism may be also applied to our Mizar definition of a rough set generated by a subset of approximation space - as an ordered pair of its lower and upper approximation, not as classes of abstraction w.r.t. rough equality relation.
15.6 Membership Functions Employing the notion of indiscemibility the concept of a membership fiinction for rough sets was defined in [10] as
^^^""^ -
\I{x)\
'
where \A\ denotes cardinality of A. Because the original approach deals with equivalence relations, I{x) is equal to [x]/, i.e. an equivalence class of the relation / containing element x. Using tolerances we should write rather x/I instead. Also in Mizar we can choose between C l a s s and n e i g h b o u r h o o d , as we already noted in the fourth section. As it can be expected, for a finite tolerance space A and X which is a subset of it, a function //^ is defined as follows. d e f i n i t i o n l e t A be f i n i t e Tolerance_Space; l e t X be Subset of A; func MemberFimc (X, A) -> Function of the carrier of A, REAL means for X being Element of A holds i t . x = card (X / \ Class (the InternalRel of A, x)) / (card Class (the InternalRel of A, x ) ) ; end;
15 On the Computer-Assisted Reasoning about Rough Sets
223
Actually, the dot " . " stands in MML for the function application, i t in the definiens denotes the defined object. Extensive usage of attributes make formulation of some theorems even simpler (at least, in our opinion) than in the natural language, because it enables us e.g. to state that JJ,-^ is equal to the characteristic function xx (theorem 44 from [4]) for a discrete finite approximation space A (that is, with the identity as an indiscemibility relation) in the following way: theorem :: ROUGHS.1:44 for A being discrete finite Approximation.Space, X being Subset of A holds
15.7 Example of the Formalization We formalized 19 definitions, 61 theorems with proofs, and 23 cluster registrations in [4]. This representation in Mizar of the rough set theory basics is 1771 lines long (the length of a line is restricted to 80 characters), which takes 54855 bytes of text. In this section we are going to show one chosen lemma together with its proof given in [6] and its full formalization in the Mizar language."^ Lemma 9. Let Re Tol{U) and X,Y XRUYR
=
CU. IfX is R-definable, then {XUY)R.
Proof. It is obvious that XRUYR C {X U Y)R. Let x e {X U Y)R, i.e., x/R C X U y. If x/R n X 7^ 0, then x e X^ md x £ XR because X is i?-definable. If x/R D X = 0, then necessarily x/R C Y and x e YR. D Hence, in both cases x e XR U YR . What is worth noticing, the attribute e x a c t (sometimes called i?-definable in the literature) has been defined earlier to describe sets with their lower and upper approximations equal (that is, crisp). Defining new synonyms and redefinitions however is also possible here. One of the features of the Mizar language which reflects closely mathematical vernacular is reasoning per cases (subsequent cases are marked by the keyword s u p p o s e ) . The references (after by) for XB00LE_1 (which is identifier of the file containing theorems about Boolean properties of sets) take external theorems from MML as premises, all other labels are local. Obviously, some parts of proofs in the literature may be hard for machine translation (compare "It is obvious that..." above), other may depend on the checker architecture (especially if an author would like to drive remaining part of his/her proof analogously to the earlier one). However, the choice of the above example is rather accidental.
'^In fact, to keep this presentation compact, we dropped dual conjunct of this lemma.
224
Adam Grabowski
theorem Lemma_9: for A X Y LAp proof let
being Tolerance_Space, being exact Subset of A, being Subset of A holds X \/ LAp Y = LAp (X \/ Y)
A be Tolerance^Space, X be exact Subset of A, Y be Subset of A; thus LAp X \/ LAp Y c= LAp (X \/ Y) by Th26; let X be set; assume Al: X in LAp (X \/ Y ) ; then A2: Class (the InternalRel of A, x) c= X \/ Y by Th8; A3: LAp X c= LAp X \/ LAp Y & LAp Y c= LAp X \/ LAp Y by XB00LE_1:7; per cases; suppose Class (the InternalRel of A, x) meets X; then X in UAp X by Al, Thll; then X in LAp X by Thl5; hence x in LAp X \/ LAp Y by A3; suppose Class (the InternalRel of A, x) misses X; then Class (the InternalRel of A, x) c^ Y by A2, XB00LE_1:73; then X in LAp Y by Al, Th9; hence x in LAp X \/ LAp Y by A3; end;
Even though Mizar source itself is not especially hard to read for a mathematician, some translation services are available. The final version converted automatically back to the natural language looks like below: For every tolerance space A, every exact subset X of A, and every subset Y of A holds LAp(X) U LAp(y) = LAp(X U Y). The name de Bruijn factor is claimed by automated reasoning researchers to describe "loss factor" between the size of an ordinary mathematical exposition and its full formal translation inside a computer. However in Wiedijk's considerations and Mizar examples contained in [12] it is equal to four (although in the sixties of the previous century de Bruijn assumed it to be about ten times bigger), in our case two is a good upper approximation.
15.8 Conclusions The purpose of our work was to develop a uniform formalization of basic notions of rough set theory. For lack of space we concentrated in this outline mainly on the notions of rough approximations and a membership function. Following [6] and [10], we formalized in [4] properties of rough approximations and membership functions
15 On the Computer-Assisted Reasoning about Rough Sets
225
based on tolerances, rough inclusion and equality, rough set notion and associated basic properties. The adjectives and type modifiers mechanisms available in the Mizar type theory made our work quite feasible. Even if we take into account that the transitivity was dropped from the classical indiscemibility relation treated as equivalence relation, further generalizations (e.g. variable precision model originated from [14]) are still possible. It is important that by including the formalization of rough sets into MML we made it usable for a number of automated deduction tools and other digital repositories. The Mizar system closely cooperates with OMDOC system to share its mathematical library via a format close to XML. Works concerning exchange of results between automatic theorem provers (e.g. Otter) and Mizar (already resulted in successful solution of Robbins problem) are on their way. Formal concept analysis, as well as fuzzy set theory is also well developed in MML. Successful experiments with theory merging mechanisms implemented in Mizar (e.g. to describe topological groups or continuous lattices) are quite promising to go further with rough concept analysis as defined in [7] or to do the machine encoding of the connections between fuzzy set theory and rough sets. We also started with the formalization of a paper [3], which focuses upon a comparison of some generalized rough approximations of sets. We hope that much more interconnections can be discovered automatically. Rough set researchers could be also assisted in searching in a distributed library of facts for analogies between rough sets and other domains. Eventually, it could be helpful within the rough set domain itself, thanks to e.g. proof restructurization utilities available in Mizar system itself - as well as other specialized tools. One the most useful at this stage is discovering irrelevant assumptions of theorems and lemmas. Comparatively low de Bruijn factor allows us to say that the Mizar system seems to be effective and the library is quite well developed to go further with the encoding of the rough set theory. Moreover, the tools which automatically translate the Mizar articles back into the ET^X source close to the mathematical vernacular are available. This makes our development not only machine- but also human-readable.
References 1. Ch. Benzmiiller, M. Jamnik, M. Kerber, V. Sorge, Agent-based mathematical reasoning, Electronic Notes in Theoretical Computer Science, 23(3), 1999. 2. E. Bryniarski, Formal conception of rough sets, Fundamenta Informaticae, 27(2-3), 1996, pp. 109-136. 3. A. Gomolinska. A comparative study of some generalized rough approximations, Fundamenta Informaticae, 51(1-2), 2002, pp. 103-119. 4. A. Grabowski, Basic properties of rough sets and rough membership function, to appear in Formalized Mathematics, 12(1), 2004, available at h t t p : / / m i z a r . org/JFM/Voll5 / r o u g h s . l . html.
226
Adam Grabowski
5. A. Grabowski, Robbins algebras vs. Boolean algebras, in Proceedings of Mathematical Knowledge Management Conference, Linz, Austria, 2001, available at http://www.emis.de/proceedings/MKM2001/. 6. J. Jarvinen, Approximations and rough sets based on tolerances, in: W. Ziarko, Y. Yao (eds.). Proceedings of RSCTC 2000, LNAI2005, Springer, 2001, pp. 182-189. 7. R. E. Kent, Rough concept analysis: a synthesis of rough sets andformal concept analysis, Fundamenta Informaticae, 27(2-3), 1996, pp. 169-181. 8. The Mizar Home Page, h t t p : / / m i z a r . o r g . 9. Z. Pawlak, Rough sets, International Journal of Information and Computer Science, 11(5), 1982, pp. 341-356. 10. Z. Pawlak, A. Skowron, Rough membership functions, in: R. R. Yaeger, M. Fedrizzi, and J. Kacprzyk (eds.), Advances in the Dempster-Shafer Theory of Evidence, Wiley, New York, 1994, pp. 251-271. 11. A. Skowron, J. Stepaniuk, Tolerance approximation spaces, Fundamenta Informaticae, 27(2-3), 1996, pp. 245-253. 12. F. Wiedijk, The de Bruijnfactor, h t t p : / /www. c s . k u n . n l / ~ f r e e k / f a c t o r / . 13. L. Zadeh, Fuzzy sets, Information and Control, 8, 1965, pp. 338-353. 14. W. Ziarko, Variable precision rough sets model. Journal of Computer and System Sciences, 46(1), 1993, pp. 39-59.
16 Similarity-Based Data Reduction and Classification Gongde Guo^'^, Hui Wang\ David Bell^, and Zhining Liao^ ^ School of Computing and Mathematics, University of Ulster, BT37 OQB, UK { G . G U O , H.Wang, Z . L i a o } @ u l s t e r . a c . u k ^ School of Computer Science, Queen's University Belfast, BT7 INN, UK [email protected] ^ School of Computer and Information Science, Fujian University of Technology Fuzhou, 350014, China Summary. The ^-Nearest-Neighbors (^NN) is a simple but effective method for classification. The major drawbacks with respect to ^NN are (1) low efficiency and (2) dependence on the parameter k. In this paper, we propose a novel similarity-based data reduction method and several variations aimed at overcoming these shortcomings. Our method constructs a similarity-based model for the data, which replaces the data to serve as the basis of classification. The value of k is automatically determined, is varied in terms of local data distribution, and is optimal in terms of classification accuracy. The construction of the model significantly reduces the number of data for learning, thus making classification faster. Experiments conducted on some public data sets show that the proposed methods compare well with other data reduction methods in both efficiency and effectiveness. Key words: data reduction, classification, /:-Nearest-Neighbors
16.1 Introduction The ^-Nearest-Neighbors (^NN) is a non-parametric classification method, which is simple but effective in many cases [6]. For an instance dt to be classified, its k nearest neighbors are retrieved, and this forms a neighborhood of dt. Majority voting among the instances in the neighborhood is generally used to decide the classification for dt with or without consideration of distance-based weighting. In contrast to its conceptual simplicity, the A:NN method performs as well as any other possible classifier when applied to non-trivial problems. Over the last 50 years, this simple classification method has been intensively used in a broad range of applications such as medical diagnosis, text categorization [9], pattern recognition, data mining, and e-commerce etc. However, to apply kNN we need to choose an appropriate value for k, and the success of classification is very much dependent on this value. In a sense, the kNN method is biased by k. There are many ways of choosing the k value, but a simple one is to run the algorithm many times with different k values and choose the one with the best performance, but this is not a pragmatic method in real applications.
228
Gongde Guo, Hui Wang, David Bell, and Zhining Liao
In order for A:NN to be less dependent on the choice of k, we proposed to look at multiple sets of nearest neighbors rather than just one set of A:-nearest neighbors [12]. The proposed formalism is based on contextual probability, and the idea is to aggregate the support of multiple sets of nearest neighbors for various classes to give a more reliable support value, which better reveals the true class of dt. As it aimed at improving classification accuracy and alleviating the dependence on k, the efficiency of the method in its basic form is worse than ^NN, though it is indeed less dependent on k and is able to achieve classification performance close to that for the best k. From the point of view of its implementation, the A:NN method consists of a search of pre-labelled instances given a particular distance definition to find k nearest neighbors for each new instance. If the number of instances available is very large, the computational burden for ^NN method is unbearable. This drawback prohibits it in many applications such as dynamic web mining for a large repository. These drawbacks of A:NN method motivate us to find a way of instances reduction which only chooses a few representatives to be stored for use for classification in order to improve efficiency whilst both preserving its classification accuracy and alleviating the dependence on k.
16.2 Related work Many researchers have addressed the problem of training set size reduction. Hart [7] made one of the first attempts to reduce the size of the training set with his Condensed Nearest Neighbor Rule (CNN). His algorithm finds a subset S of the training set T such that every instance of T is closer to an instance of S of the same class than to an instance of 5 of a different class. In this way, the subset S can be used to classify all the instances in T correctly. Ritter et al. [8] extended the condensed NN method in their Selective Nearest Neighbor Rule (SNN) such that every instance of T must be closer to an instance of S of the same class than to any instance of T (instead of S) of a different class. Further, the method ensures a minimal subset satisfying these conditions. Gate [5] introduced the Reduced Nearest Neighbor Rule (RNN). The RNN algorithm starts with S-T and removes each instance from S if such a removal does not cause any other instances in T to be misclassified by the instances remaining in S. Wilson [13] developed the Edited Nearest Neighbor (ENN) algorithm in which S starts out the same as T, and then each instance in S is removed if it does not agree with the majority of its k nearest neighbors. The Repeated ENN (RENN) applies the ENN algorithm repeatedly until all instances remaining have a majority of their neighbors with the same class, which continues to widen the gap between classes and smooth the decision boundary. Tomek [11] extends the ENN with his AUk-NN method of editing. This algorithm works as follows: for i=l to k, flag as bad any instance not classified correctly by its / nearest neighbors. After completing the loop all k times, remove any instances from S flagged as bad. Other methods include ELGrow {Encoding Length Grow), Explore by Cameron-Jones [3], IB1~IB5 by Aha et al. [1][2], Dropl~Drop5, and DEL by Wilson et al. [15] etc. From the experimental results conducted by Wilson et al., the average classification
16 Similarity-Based Data Reduction and Classification
229
accuracy of those methods on reduced data sets is lower than that on the original data sets due to the fact that purely instances selection suffers information loss to some extent. In the next section, we introduce a novel similarity-based data reduction method (SBModel). It is a type of inductive learning methods. The method constructs a similarity-based model for the data by selecting a subset S with some extra information from training set T, which replaces the data to serve as the basis of classification. The model consists of a set of representatives of the training data, as regions in the data space. Based on SBModel, two variations of SBModel called e-SBModel and p-SBModel are also presented which aim at improving both the efficiency and effectiveness of SBModel. The experimental results and a comparison with other methods will be reported in section 16.4.
16.3 Similarity-Based Data Reduction 16.3.1 The Basic Idea of Similarity-Based Data Reduction Looking at Figure 16.1, the Iris data with two features: petal length and petal width is used for demonstration. It contains 150 instances with three classes represented as diamond, square, and triangle respectively, and is plotted in 2-dimensional data space. In Figure 16.1, the horizontal axis is for feature petal length and the vertical axis is for feature petal width.
3 y^
2.5-
N^m(rfi). M^(di)-43
2
^ ^ i v
1.5 •
* A A ^
1 •
y
kCksslI
*
UCl«iss2| |»C1MS3|
0.50C
\W* 2
4
6
8
Fig. 16.1. Data distribution in 2-dimension data space Given a similarity measure, many instances with the same class label are close to each other in many local areas. In each local region, the central instance di looking at Figure 16.1 for example, with some extra information such as Cls{di) - the class label of instance df, Num{di) - the number of instances inside the local region; Sim{di) - the similarity of the furthest instance inside the local region to di, and Rep{di) - 2i representation of di, might be a good representative of this local region. If we take these representatives as a model to represent the whole training set, it will significantly reduce the number of instances for classification, thereby improving efficiency.
230
Gongde Guo, Hui Wang, David Bell, and Zhining Liao
For a new instance to be classified in the classification stage, if it is covered by a representative it will be classified by the class label of this representative. If not, we calculate the distance of the new instance to each representative's nearest boundary and take each representative's nearest boundary as an instance, then classify the new instance in the spirit of kNN. 16.3.2 Terminology and Definitions Before we give more details about the designs of the proposed algorithms, some important terms (or concepts) need to be explicitly defined first. Definition 1. 7. Neighborhood A neighborhood is a term referred to a given instance in data space. A neighborhood of a given instance is defined to be a set of nearest neighbors of this instance. 2. Local Neighborhood A local neighborhood is a neighborhood, which covers the maximal number of instances with the same class label 3. Local €-Neighborhood A local ^-neighborhood is a neighborhood which covers the maximal number of instances in the same class label with allowed e exceptions. 4. Global Neighborhood The global neighborhood is defined to be the largest local neighborhood among a set of local neighborhoods in each cycle of model construction stage. 5. Global £-Neighborhood The global ^-neighborhood is defined to be the largest local e-neighborhood among a set of local e-neighborhoods in each cycle of model construction stage. The global e-neighborhood is defined to be the largest local ^-neighborhood among a set of local ^-neighborhoods in each cycle of model construction stage. With the above definitions, given a training set, each instance has a local neighborhood. Based on these local neighborhoods, the global neighborhood can be obtained. This global neighborhood can be seen as a representative to represent all the instances covered by it. For instances not covered by any representative we repeat the above operation until all the instances have been covered by chosen representatives. All representatives obtained in the model construction process are used for replacing the data and serving as the basis of classification. There are two obvious advantages: (1) we needn't choose a specific k in the sense of A:NN for our method in the model construction process. The number of instances covered by a representative can be seen as an optimal k which is generated automatically in the model construction process, and is different for different representatives; (2) using a list of chosen representatives as a model for classification not only reduces the number of instances for classification, but also significantly improves the efficiency. From this point of view, the proposed method overcomes the two shortcomings of ^NN.
16 Similarity-Based Data Reduction and Classification
231
16.3.3 Modelling and Classification Algorithm Let D be a collection of n class-known instances {di, G?2, • * * ,dn},di e D. For handling heterogeneous applications - those with both numeric and nominal features, we use HVDM distance function (to be presented later) as a default similarity measure to describe the following algorithms. The detailed model construction algorithm of SBModel is described as follows: Step 1: Select a similarity measure and create a similarity matrix for a given training setD. Step 2: Set to 'ungrouped' the tag of all instances. Step 3: For each 'ungrouped' instance, find its local neighborhood. Step 4: Among all the local neighborhoods obtained in step 3, find its global neighborhood Ni, Create a representative {Cls{di),Sim{di),Nu'm{di), Rep{di)) into M to represent all the instances covered by Ni, and then set to 'grouped' the tag of all the instances covered by Ni. Step 5: Repeat step 3 and step 4 until all the instances in the training set have been set to 'grouped'. Step 6: Model M consists of all the representatives collected from the above learning process. In the above algorithm, D represents a given training set, M represents the created model. The elements of representative {Cls{di), Sim{di)^Num{di)^ Rep{di)) respectively represent the class label of di, the HVDM distance of di to the furthest instance among the instances covered by Ni', the number of instances covered by Ni, and a representation of di itself. In step 4, if there are more than one local neighborhood having the same maximal number of neighbors, we choose the one with minimal value of Sim{di), i.e. the one with highest density, as representative. The classification algorithm of SBModel is described as follows: Step 1: For a new instance dt to be classified, calculate its similarity to all representatives in the model M. Step 2: If dt is covered by a representative {Cls{dj),Sim{dj),Num{dj), Rep{dj)), i.e. the HVDM distance of dt to dj is smaller than Sim{dj), dt is classified as the class label of dj. Step 3: If no representative in the model M covers dt, classify dt as the class label of a representative which boundary is closest to dt. The HVDM distance of dt to a representative di's nearest boundary is equal to the difference of the HVDM distance of di to dt minus Sim{di). In an attempt to improve the classification accuracy of SBModel, we implemented two different pruning methods in our SBModel. One method is to remove both the representatives from the model M created by SBModel that only cover a few instances and the relevant instances covered by these representatives from the training set, and then to construct the model again using SBModel from the revised training set. The SBModel algorithm based on this pruning method is called p-SBModel. The model construction algorithm of p-SBModel is described as follows:
232
Gongde Guo, Hui Wang, David Bell, and Zhining Liao
Step 1: For each representative in the model M created by SBModel that only covers a few (a pre-defined threshold by users) instances, remove all the instances covered by this representative from the training set D. Set model M=0, then go to step 2. Step 2: Construct model M from the revised training set D again by using SBModel. Step 3: The final model M consists of all the representatives collected from the above pruning process. The second pruning method modifies the step 3 in the model construction algorithm of SBModel to allow each local neighborhood to cover e (called error tolerance rate) instances with different class label to the majority class label in this neighborhood. This modification integrates the pruning work into the process of model construction. The SBModel algorithm based on this pruning method is called sSBModel. The detailed model construction algorithm of e-SBModel is described as follows: Step 1: Select a similarity measure and create a similarity matrix for a given training setD. Step 2: Set to 'ungrouped' the tag of all instances. Step 3: For each 'ungrouped' instance, find its local ^-neighborhood. Step 4: Among all the local ^-neighborhoods obtained in step 3, find its global sneighborhood Ni. Create a representative {Cls(di),Sim{di),Num{di), Rep{di)) into M to represent all the instances covered by A^^, and then set to 'grouped' the tag of all the instances covered by A^^. Step 5: Repeat step 3 and step 4 until all the instances in the training set have been set to 'grouped'. Step 6: Model M consists of all the representatives collected from the above learning process. The SBModel is a basic algorithm with s=0 (error tolerance rate) and without pruning.
16.4 Experiment and Evaluation 16.4.1 Data Sets To evaluate the SBModel method and its variations, fifteen public data sets have been collected from the UCI machine learning repository for training and testing. Some information about these data sets is listed in Table 16.1. The meaning of the title in each column is follows: NF-Number of Features, NN-Number of Nominal features, NO-Number of Ordinal features, NB-Number of Binary features, NI-Number of Instances, CD-Class Distribution. 16.4.2 Experimental Environment Experiments use the 10-fold cross validation method to evaluate the performance of SBModel and its variations and to compare them with C5.0, A:NN (Voting ^NN), and w^NN (Distance weighted A:NN). We implemented SBModel and its variations, ^NN
16 Similarity-Based Data Reduction and Classification
233
Table 16.1. Some information about the data sets Dataset
NF NN NO NB
Australian Colic Diabetes Glass HCleveland Heart Hepatitis Ionosphere Iris LiverBupa Sonar Vehicle Vote Wine Zoo
14 23 8 9 13 13 19 34 4 6 60 18 16 13 16
NI
CD
4 383:307 6 4 690 232:136 16 7 0 368 268:500 0 8 0 768 0 9 0 214 70:17:76:0:13:9:29 164:139 7 3 303 3 120:150 7 3 270 3 1 12 155 6 32:123 126:225 0 34 0 351 4 0 150 50:50:50 0 145:200 6 0 345 0 97:111 0 60 0 208 0 18 0 846 212:217:218:199 267:168 0 16 435 0 59:71:48 0 13 0 178 16 0 0 90 37:18:3:12:4:7:9
and wA;NN in our own prototype. The C5.0 algorithm used in our experiments were implemented in the Clementine' software package. The experimental results of other editing and condensing algorithms to be compared here are obtained from Wilson's experiments [15]. In voting ^NN, the k neighbors are implicitly assumed to have equal weight in decision, regardless of their distances to the instance x to be classified. It is intuitively appealing to give different weights to the k neighbors based on their distances to jc, with closer neighbors having greater weights. In wA:NN, the k neighbors are assigned to different weights. Let c/ be a distance measure, and xi, 0:2, • • ? ^fc be the A: nearest neighbors of ;c arranged in increasing order of d{xi,x). So xi is the first nearest neighbor of jc. The distance weight Wi for i*^ neighbor Xi is defined as follows:
I
1
if d{xk,x) = d{xi,x)
Instance x is assigned to the class for which the weights of the representatives among the k nearest neighbors sum to the greatest value. In order to handle heterogeneous applications - those with both ordinal and nominal features - we use a heterogeneous distance function HVDM [14] as the distance function in the experiments, which is defined as:
HVDM{^,y) = where the function daixa-, Va) is the distance for feature a and is defined as:
234
Gongde Guo, Hui Wang, David Bell, and Zhining Liao
(
1
if Xa or ya is unknown;
otherwise
vdma{Xa,ya) if a is nominal, else l^a — ?/a|/4cra if a is numeric In above distance function, aa is the standard deviation of the values occurring for feature a in the instances in the training set D, and vdma{xa, Va) is the distance function for nominal features called Value Difference Metric [10]. Using the VDM, the distance between two values Xa and ya of a single feature a is given as: C=l
^'^
^^
where Nx^ is the number of times feature a had value Xa', Nx^,c is the number of times feature a had value Xa and the output class was c; and C is the number of output classes. 16.4.3 Experimental Results [Experiment 1] In this experiment, our goal is to evaluate the basic SBModel algorithm, and to compare its experimental results with C5.0, kNN and wA:NN. So the error tolerance rate is set to 0, k for kNN is set to 1, 3, 5 respectively, k for wA:NN is set to 5, and the allowed minimal number of instances covered by a representative (N) in the final model of SBModel is set to 2, 3, 4, 5 respectively. Under these settings, A comparison of C5.0, SBModel, ^NN, and wkNN in classification accuracy using 10-fold cross validation is presented in Table 16.2. The reduction rate of SBModel is listed in Table 16.3. Note that in Table 16.2 and Table 16.3, N = / means each representative in the final model of SBModel at least covers / instances of the training set. From the experimental results, the average classification accuracy of the proposed SBModel method in its basic form on fifteen training sets is better than C5.0, and is comparable to A:NN and wA:NN. But the SBModel significantly improves the efficiency of A:NN by keeping only 9.19 percent (N=4) of the original instances for classification with only a slight decreasing in accuracy (81.29%) in comparison with A:NN (82.58%) and wkNN (82.34%). [Experiment 2] In this experiment, our goal is to evaluate s-SBModel. So we tune the error tolerance rate £ in a small range from 0 to 4 for each training set, and choose the e for obtaining relatively higher classification accuracy. The experimental results are presented in Table 16.4. Note that in Table 16.4 heading RR for short represents 'Reduction Rate'. From the experimental results in Table 16.4, e-SBModel obtains better performance than C5.0, SBModel, ^NN, and wkNN. Even when A^=5,6:-SBModel still obtains 82.93% classification accuracy which is higher than 79.96% of C5.0, 82.58% of ^NN, and 82.34% of wA:NN (Refer to Table 16.2 for more details). In this situation, £-SBModel only keeps 7.67 percent instances of the original training set for classification, thus significantly improving the efficiency whilst improving the classification accuracy, ofA:NN.
16 Similarity-Based Data Reduction and Classification
235
Table 16.2. A comparison of C5.0, SBModel, ^NN, and w^NN in classification accuracy Dataset
C5.0 N=2 N=3 N=4 N=5 A:NN(1) itNN(3) ^NN(5) witNN(5)
Australian Colic Diabetes Glass HCleveland Heart Hepatitis Ionosphere Iris LiverBupa Sonar Vehicle Vote Wine Zoo
85.5 80.9 76.6 66.3 74.9 75.6 80.7 84.5 92.0 65.8 69.4 67.9 96.1 92.1 91.1
79.42 78.89 70.92 68.10 78.33 76.30 80.67 87.14 95.33 60.00 88.00 68.57 91.30 95.88 96.67
82.75 83.89 73.03 66.67 82.33 80.37 80.67 85.14 94.67 66.47 83.50 71.79 92.17 94.71 95.56
85.22 83.06 74.21 67.62 81.00 80.37 83.33 84.00 96.67 66.47 85.00 69.29 92.17 94.71 95.56
82.46 81.94 72.37 67.62 81.33 77.41 83.33 87.14 95.33 66.47 86.50 71.43 90.87 95.29 95.56
Average
79.96 82.16 82.03 81.29 80.22 81.03
82.25
82.58
82.34
84.20 81.67 75.00 65.24 82.67 80.37 83.33 94.29 96.00 63.53 84.00 65.83 88.70 95.29 92.22
84.64 82.50 74.08 65.24 80.33 80.74 85.33 93.71 96.00 64.41 82.50 65.36 88.70 94.71 92.22
84.78 83.06 74.21 61.43 80.33 80.37 87.33 92.57 96.00 63.82 80.00 63.69 88.70 94.12 88.89
84.63 82.50 74.74 55.71 78.00 77.78 87.33 91.43 96.00 61.76 79.50 62.26 88.70 94.12 88.89
Ta ble 16.3. The reduction rate of SBModel in the firlal model Dataset
N=2
N=3
N=4
N=5
Australian Colic Diabetes Glass HCleveland Heart Hepatitis Ionosphere Iris LiverBupa Sonar Vehicle Vote Wine Zoo
86.81 78.26 80.47 79.44 84.16 84.81 85.81 81.48 95.33 73.62 81.73 80.38 91.38 90.45 91.11
90.43 84.24 86.98 88.32 87.79 88.52 88.39 85.19 96.00 83.48 86.06 87.83 93.53 90.45 92.22
92.17 87.50 89.58 90.19 91.42 90.00 90.32 88.60 96.00 88.70 87.50 91.96 93.97 92.13 92.22
92.46 88.86 91.67 93.93 92.74 91.48 91.61 89.74 96.00 92.75 90.87 93.50 94.40 92.70 93.33
Average
84.35 88.63 90.81 92.40
236
Gongde Guo, Hui Wang, David Bell, and Zhining Liao Table 16.4. The classification accuracy and reduction rate of s-SBModel Dataset
£ N=2 RR
N=3 RR N=4
RR N=5 RR
Australian Colic Diabetes Glass HCleveland Heart Hepatitis Ionosphere Iris LiverBupa Sonar Vehicle Vote Wine Zoo Average
2 84.93 90.43 84.93 90.43 85.22 92.17 85.51 92.46 1 83.06 78.26 83.06 84.24 82.78 87.50 83.61 88.86 1 74.34 80.47 74.47 86.98 75.13 89.58 75.53 91.67 3 69.52 90.19 69.52 90.19 69.52 90.19 69.05 93.93 4 81.67 92.08 81.67 92.08 81.67 92.08 81.67 92.08 1 80.74 84.81 81.11 88.52 81.85 90.00 81.11 91.48 1 88.00 85.81 89.33 88.39 88.67 90.32 88.67 91.61 1 93.71 81.48 93.71 85.19 92.86 88.60 92.57 89.74 0 96.00 95.33 96.00 96.00 96.00 96.00 96.00 96.00 2 68.53 83.48 68.53 83.48 68.24 88.70 67.94 92.75 2 82.50 86.54 82.50 86.54 82.50 88.94 81.50 90.38 2 66.43 87.83 66.43 87.83 66.55 91.96 66.67 93.50 4 91.74 94.40 91.74 94.40 91.74 94.40 91.74 94.40 0 95.29 90.45 94.71 90.45 94.12 92.13 94.12 92.70 0 92.22 91.11 92.22 92.22 88.89 92.22 88.89 93.33 83.25 87.51 83.33 89.13 83.05 90.99 82.93 92.33
[Experiment 3] In this experiment, our goal is to evaluate p-SBModel. It is a nonparametric classification method which conducts pruning work by removing both the representatives from the model M that only cover 1 instances (it means no any induction being done for this representative) and the relevant instances covered by these representatives from the training set, and then constructing the model from the revised training set again. The experimental results are presented in Table 16.5. Form the experimental results shown in Table 16.5, it is clear that with the same classification accuracy, p-SBModel has a slight higher reduction rate than SBModel on average. The main merit of the /?-SBModel algorithm is that it does not need any parameter to be set in both modelling and classification stages. However, its average classification accuracy is comparable to A:NN and wA:NN. It keeps only 10.13 percent instances of the original training set for classification. [Experiment 4] In this experiment, we compare our SBModel method and its variations with other algorithms in the literature in average classification accuracy and reduction rate. These algorithms to be compared in the experiment include CNN, SNN, IB3, DEL, ENN, RENN, Allk-NN, ELGrow, Explore and Drop3, each of which has been described in section 16.2 in this paper. The experimental results are presented in Figure 16.2. Note that the values of the horizontal axis In Figure 16.2 represent different algorithms, i.e. 1-CNN, 2-SNN, 3-IB3, 4-DEL, 5-Drop3, 6-ENN, 7-RENN, 8-Allk-NN, 9-ELGrow, 10-Explore, 11-SBModel, 12-(^-SBModel), 13-(p-SBModel). From the experimental results, it is clear that the average classification accuracy and reduction rate of our proposed SBModel method and its variations on fifteen data sets are better than other data reduction methods in 10-fold cross validation with exceptions of
16 Similarity-Based Data Reduction and Classification
237
Table 16.5. A comparison of A:NN, SBModel, andp-SBModel Dataset
itNN(5) wfcNN(5) RR SBModel(3) RR p-SBModel RR
LiverBupa Sonar Vehicle Vote Wine Zoo
85.22 83.06 74.21 67.62 81.00 80.37 83.33 84.00 96.67 66.47 85.00 69.29 92.17 94.71 95.56
82.46 81.94 72.37 67.62 81.33 77.41 83.33 87.14 95.33 66.47 86.50 71.43 90.87 95.29 95.56
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
84.64 82.50 74.08 65.24 80.33 80.74 85.33 93.71 96.00 64.41 82.50 65.36 88.70 94.71 92.22
90.43 84.24 86.98 88.32 87.79 88.52 88.39 85.19 96.00 83.48 86.06 87.83 93.53 90.45 92.22
86.23 82.78 73.16 65.24 80.67 81.85 84.67 92.00 95.33 62.94 82.50 67.26 90.00 94.71 91.11
95.22 88.59 87.11 84.58 89.11 91.11 96.77 87.18 95.33 82.03 86.54 83.69 96.98 90.45 93.33
Average
82.58
82.34
0
82.03
88.63
82.03
89.87
Australian Colic Diabetes Glass HCleveland Heart Hepatitis Ionosphere
Iris
m Classification Accuracy
• Reduction Rate
120 -1 100 - §^B'«^^^^^^^^^©^^^l*^^^^^^^N^^^^^^^^^^^^^^^^^^Rf ^F% 80 - s ^ w f 4 i ^ ^ a ^ s ^ : f i i ^ ^ ^ ^ s t i M ^ a B m 60 40 - ^^^ M' fli B l H i ^ • m i : ^ : W H M 1 SHf 11 I H I-Hi ^ l K1Pml^^B" 20 - M^M •SK^ ^B^«*^^Bl!-''~ WUf -mlLkimli-*'" *ffl^S £HfT-^Hr>>^BHC .mm^''-'mm^^'l - • i ^ « i ' « i , ^ @ i •
m ai 1
1
2
3
4
5
6
7
8
9
10
11
12
13
Fig. 16.2. Average classification accuracy and reduction rate ELGrow and Explore in reduction rate. Though ELGrow obtains a highest reduction rate among all the algorithms for comparison, its rather lower classification accuracy counteracts its advantage in reduction rate. Explore seems to be a competitive algorithm with a higher reduction rate and a slight lower classification accuracy in comparison with our proposed SBModel and its variations. Otherwise, Drop3 is the one closest to our algorithms both in classification accuracy and reduction rate.
16,5 Conclusions In this paper we have presented a novel solution for dealing with the shortcomings of ^NN. To overcome the problems of low efficiency and dependency on k, we select a few representatives from training set with some extra information to represent the whole training set. In the selection of each representative we use the optimal but different k, decided automatically according to the local data distribution, to eliminate
238
Gongde Guo, Hui Wang, David Bell, and Zhining Liao
the dependency on k without user's intervention. Experimental results carried out on fifteen public data sets have shown that SBModel and its variations e-SBModel and p-SBModel are quite competitive for classification. Their average classification accuracies on fifteen public sets are better than C5.0 and are comparable with A:NN, and wA:NN. But our proposed SBModel and its variations significantly reduce the number of the instances in the final model for classification with a reduction rate ranging from 88.63% to 92.33%. Moreover, comparing to other reduction techniques, s-SBModel obtains the best performance. It only keeps 7.67 percent instances of the original training set on average for classification whilst improving the classification accuracy of A:NN and wA:NN. It is a good alterative of ^NN in many application areas such as for text categorization and for financial stock market analysis and prediction.
References 1. Aha DW, Kibler k, Albert MK (1991) Instance-Based Learning Algorithms, Machine Learning, 6, pp.37-66. 2. Aha DW (1992) Tolerating Noisy, Irrelevant and Novel Attributes in Instance-Based Learning Algorithms, International Journal of Man-Machine Studies, 36, pp. 267-287. 3. Cameron-Jones, RM (1995) Instance Selection by Encoding Length Heuristic with Random Mutation Hill Climbing, Proc. of the 8th Australian Joint Conference on Artificial Intelligence, pp. 99-106. 4. Devijver P, Kittler J (1972) Pattern Recognition: A Statistical Approach, Prentice-Hall, Englewood Cliffs, NJ. 5. Gates G (1972) The Reduced Nearest Neighbor Rule, IEEE Transactions on Information Theory, 18, pp. 431-433. 6. Hand D, Mannila H, Smyth P (2001) Principles of Data Mining, The MIT Press. 7. Hart P (1968) The Condensed Nearest Neighbor Rule, IEEE Transactions on Information Theory, 14,515-516. 8. Riter GL, Woodruff HB, Lowry SR et al (1975) An Algorithm for a Selective Nearest Neighbor Decision Rule. IEEE Transactions on Information Theory, 21-6, November, pp. 665-669. 9. Sebastiani F (2002) Machine Learning in Automated Text Categorization, In ACM Computing Surveys, Vol. 34, No. 1, pp. 1-47. 10. StanfiU C, Waltz D (1986) Toward Memory-Based Reasoning Communications of the ACM, 29, pp. 1213-1228. 11. Tomek A (1976) An Experiment with the Edited Nearest-Neighbor Rule. IEEE Transactions on Systems, Man, and Cybernetics, 6-6, pp. 448-452. 12. Wang H (2003) Contextual Probability, in Journal of Telecommunications and Information Technology, 4(3):92-97. 13. Wilson DL (1972) Asymptotic Properties of Nearest Neighbor Rules Using Edited Data, IEEE Transactions on Systems, Man, and Cybernetics, 2-3, pp. 408-421. 14. Wilson DR, Martinez TR(1997) Improved Heterogeneous Distance Functions, Journal of Artificial Intelligence Research (JAIR), 6-1, pp. 1-34. 15. Wilson DR, Martinez TR (2000)Reduction Techniques for Instance-Based Learning Algorithms, Machine Learning, 38-3, pp. 257-286.
17 Decision TVees and Reducts for Distributed Decision Tables Mikhail Ju. Moshkov Institute of Computer Science, University of Silesia 39, B^ziiiska St., Sosnowiec, 41-200, Poland [email protected]
Summary. In the paper greedy algorithms for construction of decision trees and relative reducts for joint decision table generated by distributed decision tables are studied. Two ways for definition of joint decision table are considered: based on the assumption that the universe of joint table is the intersection of universes of distributed tables, and based on the assumption that the universe of joint table is the union of universes of distributed tables. Furthermore, a case is considered when the information about distributed decision tables is given in the form of decision rule systems. Key words: distributed decision tables, decision trees, relative reducts, greedy algorithms
17.1 Introduction In the paper distributed decision tables are investigated which can be useful for the study of multi-agent systems. Let T i , . . . , T^ be decision tables, and { a i , . . . , an} be the set of attributes of these tables. In the paper two questions are considered: how we can define a joint decision table T with attributes a i , . . . , a^ generated by tables Ti,... ,Tm. and how we can construct decision trees and relative reducts for the table T. We study two extreme cases: • •
The universe of the table T is the intersection of universes of tables T i , . . . , Tm. The universe of the table T is the union of universes of tables T i , . . . , T^n.
In reality, we consider more complicated situation when we do not know exactly universes of the tables T i , . . . , r ^ . In this case we must use upper approximations of the table T which are tables containing at least all rows from T. We study two such approximations which are minimal in some sense: the table T^ for the case of the intersection of universes, and the table T^ for the case of the union of universes. We show that in the first case (intersection of universes) even simple problems (for given tables T i , . . . , T^ and a decision tree it is required to recognize is this tree a decision tree for the table T^; for given tables T i , . . . , T^ and a subset of the set
240
Mikhail Ju. Moshkov
{ a i , . . . , ttn} it is required to recognize is this subset a relative reduct for the table T^) are NP-hard. We consider approaches to minimization of decision tree depth and relative reduct cardinality on some subsets of decision trees and relative reducts. In the second case (union of universes) the situation is similar to the situation for single decision table: there exist greedy algorithms for decision tree depth minimization and for relative reduct cardinality minimization which have relatively good bounds on precision. Furthermore, we consider the following problem. Let we have a complicated system consisting of parts Q i , . . . , Qm- For each part Qj the set of normal states is described by a decision rule system Sj. It is required to recognize for each part Qj is the state of this part normal or abnormal. We consider an algorithm for minimization of depth of decision trees solving this problem, and bounds on precision for this algorithm. The results of the paper are obtained in the frameworks of rough set theory [8,9]. However, for simplicity we consider only "crisp" decision tables in which there are no equal rows labelling by different decisions. The paper consists of six sections. In the second section we consider known results on algorithms for minimization of decision tree depth and relative reduct cardinality for single decision table. In the third and fourth sections we consider algorithms for construction of decision trees and relative reducts for joint tables T^ and T^ respectively. In the fifth section we consider an algorithm which for given rule systems 5 i , . . . , Sm constructs a decision tree that recognizes the presence of realized rules in each of these systems. The sixth section contains short conclusion.
17.2 Single Decision Table Consider a decision table T (see Fig. 17.1) which has t columns labelling by attributes a i , . . . , at. These attributes take values from the set {0,1}. For simplicity, we assume that rows are pairwise different (rows in our case correspond not to objects from the universe, but to equivalence classes of the indiscemibility relation). Each row is labelled by a decision d.
Fig. 17.1. Decision table T
We correspond the classification problem to the decision table T: for a given row it is required to find the decision attached to this row. To this end we can use values of attributes a i , . . . ,at.
17 Decision Trees and Reducts for Distributed Decision Tables
241
Test for the table T is a subset of attributes which allow to separate each pair of rows with different decisions. Relative reduct (reduct) for the table T is a test for which each proper subset is not a test. Decision tree for the table T is a tree with the root in which each terminal node is labelled by a decision, each non-terminal node is labelled by an attribute, two edges start in each non-terminal node, and these edges are labelled by numbers 0 and 1. For each row the work of the decision tree is finished in terminal node labelling by the decision corresponding to the considered row. The depth of the tree is the maximal length of a path from the root to a terminal node. It is well known that the problem of reduct cardinality minimization and the problem of decision tree depth minimization are NP-hard problems. So we will consider only approximate algorithms for optimization of reducts and trees. 17.2.1 Greedy Algorithm for Decision Tree Construction Denote by P{T) the number of unordered pairs of rows with different decisions. This number will be interpreted as uncertainty of the table T. Sub-table of the table T is any table obtained from T by removal of some rows. For any a^ G { a i , . . . , a^} and b e {0,1} denote by T{ai^ b) the sub-table of the table T consisting of rows which on the intersection with the column labelling by a^ contain the number b. If we compute the value of the attribute a^ then the uncertainty in the worst case will be equal to t/(T,a,) = niax{P(T(a,,0)),P(r(a,,l))} . Let P{T) ^ 0. Then we compute the value of an attribute a^ for which C/(T, a^) has minimal value. Depending on the value of ai the given row will be localized either in the sub-table T(ai, 0) or in the sub-table T{ai, 1), etc. The algorithm will finish its work when in the constructed tree for any terminal node for sub-table T' corresponding to this node the equality P{V) = 0 holds. It is clear that the considered greedy algorithm has polynomial time complexity.Denote by h{T) the minimal depth of a decision tree for the table T. Denote by hgreedyiT) the depth of a decision tree for the table T constructed by the greedy algorithm. It is known [3,4] that hgreedyiT)
< h{T)\nP{T)
+ 1 .
Using results of Feige [1], one can show (see [5]) that if NP2DTIME(n^(^^s^^sn)) then for any e > 0 there is no polynomial algorithm that for a given decision table T constructs a decision tree for this table which depth is at most
{l-e)h{T)lnP{T)
.
Thus, the considered algorithm is close to best polynomial approximate algorithms for decision tree depth minimization.
242
Mikhail Ju. Moshkov
It is possible to use another uncertainty measures in the considered greedy algorithm. Let F be an uncertainty measure, T a decision table, V a sub-table of T, a^ an attribute of T and h G {0,1}. Consider conditions which allow to obtain bounds on precision of greedy algorithm: (a) F{T) - F{T{au h)) > F{r) - F{r{au b)). (b) F{T) = 0 iff r has no rows with different decisions. (c) If F{T) — 0 then T has no rows with different decisions. One can show that if F satisfies conditions (a) and (b) then Veedy(T)
\nP{T)
+ 1 .
Using results of Feige [1] one can show that if NP 2 DTIME(n^(^^s^^sn)) then for any ^ > 0 there is no polynomial algorithm that for a given decision table T constructs a reduct for this table which cardinality is at most {l~e)R{T)\nP{T)
.
Thus, the considered algorithm is close to best polynomial approximate algorithms for reduct cardinality minimization. To obtain bounds on precision of this algorithm it is important that we can represent the considered problem as a set cover problem. We can weaken this condition and consider a set cover problem such that each cover corresponds to a reduct, but not each reduct corresponds to a cover. In this case we will solve the problem of reduct cardinality minimization not on the set of all reducts, but we will be able to obtain some bounds on precision of greedy algorithm on the considered subset of reducts.
17 Decision Trees and Reducts for Distributed Decision Tables
243
17.3 Distributed Decision Tables. Intersection of Universes In this section we consider the case when the universe of joint decision table is the intersection of universes of distributed decision tables. 17.3.1 Joint Decision Table T^ Let Ti,..., Tyn be decision tables and { a i , . . . , a^} be the set of attributes of these tables. Let b = ( 6 i , . . . , 6^) ^ {0,1}*^, j G { 1 , . . . , m} and {a^^,..., a^,} be the set of attributes of the table Tj. We will say that the row b corresponds to the table Tj if {hi^,..., 6i J is a row of the table Tj. In the last case we will say that (6^^,... ,bij is the row from Tj corresponding to b. Let us define the table T^ (see Fig. 17.2). This table has n columns labelling by attributes a i , . . . , a^. The row b = ( 6 i , . . . , 6n) G {0, l}'^ is a row of the table T^ iff b corresponds to each table Tj,j e { 1 , . . . , m}. This row is labelled by the tuple (cfi,..., d^) where dj is the decision attached to the row from Tj corresponding to h,j e { 1 , . . . , m}. Sometimes we will denote the table T"^ by Ti x ... x T^.
{di,...
,dm)
Fig. 17.2. Joint decision table T^ One can interpret the table T^ in the following way. Let t/i,..., Um be universes corresponding to tables Ti,..., T^ and f/n = C^i Pi... n f/^. If we know the set C/p we can consider the table T(C/n) with attributes ai,..., On, rows corresponding to objects from C/n» and decisions of the kind ( d i , . . . , d^). Assume now that we do not know the set C/n- In this case we must consider an upper approximation of the table T{U^) which is a table containing all rows from T{Up). The table T^ is the minimal upper approximation in the case when we have no any additional information about the set
17.3.2 On Construction of Decision Trees and Reducts for T ^ Our aim is to construct decision trees and reducts for the table T^. Unfortunately, it is very difficult to work with this table. One can show that the following problems are NP-hard: •
For given tables Ti,..., T^ it is required to recognize is the table T^ = Ti x ... x Tm empty.
244
• • •
Mikhail Ju. Moshkov
For given tables Ti,..., T^ it is required to recognize has the table T^ rows with different decisions. For given tables Ti,..., T^ and decision tree F it is required to recognize is F a. decision tree for the table T^. For given tables Ti,..., T^ and a subset D of the set { a i , . . . , a^} it is required to recognize is P a reduct for the table T^.
So really we can use only sufficient conditions for decision tree to be a decision tree for the table T^ and for a subset of the set { a i , . . . , an} to be a reduct for the table T^. If P 7^ NP then there are no simple (polynomial) uncertainty measures satisfying the condition (b). Now we consider two examples of polynomial uncertainty measures satisfying the conditions (a) and (c). L e t a i i , . . . , a i , e { a i , . . . ,an}, 6 1 , . . . ,6^ G {0,1}, and Oi = ( a i , , 6 i ) . . . ( a ^ , , 6 t ) .
Denote by T^a the sub-table of T^ which consists of rows that on the intersection with columns labelling by a^^,..., a^^ have numbers 6 1 , . . . , 6^. Let j € { 1 , . . . , m} and Aj be the set of attributes of the table Tj. Denote by Tja the sub-table of Tj consisting of rows which on the intersection with column labelling by a^^ have number bk for each a^^ G Aj d {a^,,..., a^ J . Consider an uncertainty measure Fi such that F i ( r ^ a ) = P(Tia) H- ... + P ( T ^ a ) . One can show that this measure satisfies the conditions (a) and (c). Unfortunately, the considered measure does not allow to use relationships among tables Ti,,.. ,Tm- Describe another essentially more complicated but polynomial measure which takes into account some of such relationships. Consider an uncertainty measure F2. For simplicity, we define the value of this measure only for the table T^ (the value F2 (T^a) can be defined in the similar way). Set F2(T^) = GI + . . . - K G ^
.
Let j e { 1 , . . . , m}, and the table Tj have p rows r i , . . . , rp. Then
Gj-E^i^: where 1 < ^ < A; < p, and rows r^ and r^ have different decisions. Let q G { l , . . . , p } . Then ^^q
^ ql
'''
^qm
where V^^,i = 1 , . . . , m, is the number of rows r in the table Tj x Ti such that r^ is the row from Tj corresponding to r. It is not difficult to prove that this measure satisfies the conditions (a) and (c). One can show that if P 7^ NP then it is impossible to reduce effectively the problem of reduct cardinality minimization to a set cover problem. However, we can
17 Decision Trees and Redacts for Distributed Decision Tables
245
consider set cover problems such that each cover corresponds to a reduct, but not each reduct corresponds to a cover. Consider an example. Denote B{Tj), j — 1 , . . . , m, the set of unordered pairs of rows from Tj with different decisions, B = J5(ri) U . . . U B(Tm), and Q the set of pairs from B separating by a^, i = 1 , . . . , n. It is not difficult to show that the set cover problem for the set B and family { C i , . . . , C^} of subsets of B has the following properties: each cover corresponds to a reduct for the table T^, but (in general case) not each reduct corresponds to a cover.
17.4 Distributed Decision Tables. Union of Universes In this section we consider the case when the universe of joint decision table is the union of universes of distributed decision tables. 17.4.1 Joint Decision Table T ^ Let Ti,..., Tm be decision tables and { a i , . . . , «„} be the set of attributes of these tables. Let us define the table T^ (see Fig. 17.3). This table has n columns labelling by attributes a i , . . . , a^. The row b = (fei,..., 6^) G {0,1}"^ is a row of the table T^ iff there exists j G {1, • •., m} such that b corresponds to the table Tj. This row is labelled by the tuple (d*,..., cij^) where dj is the decision dj attached to the row from Tj corresponding to b, if b corresponds to the table Tj, and gap otherwise, j e {l,...,m}.
(di, — , . . . ,dm)
Fig. 17.3. Joint decision table T"^ Two tuples of decisions and gaps will be considered as different iff there exists digit in which these tuples have different decisions (in the other words, we will interpret gap as an arbitrary decision). We must localize a given row in a sub-table of the table T^ which does not contain rows labelling by different tuples of decisions and gaps. Most part of results considered in Sect. 17.2 is valid for joint tables T^ too. One can interpret the table T^ in the following way. Let C/i,..., Um be universes corresponding to tables Ti, ...,r^, and Uu = (7i U ... U 1/^. If we know the set U\j we can consider the table T{Uu) with attributes ai,..., a^, rows corresponding to objects from Uu, and decisions of the kind (di, —,..., dm)- Assume now that we do not know the set C/y In this case we must consider an upper approximation of the table T{U[j) which is a table containing all rows from T{Uu). The table T^ is the minimal upper approximation in the case when we have no any additional information about the set f/y-
17.4.2 On Construction of Decision Trees and Reducts for T^∪

Let {a_{i1}, ..., a_{it}} ⊆ {a_1, ..., a_n}, b_1, ..., b_t ∈ {0,1}, and α = (a_{i1}, b_1) ... (a_{it}, b_t). Consider the uncertainty measure F_1 such that

    F_1(T^∪α) = P(T_1α) + ... + P(T_mα).

One can show that this measure satisfies the conditions (a) and (b). So we can use the greedy algorithm for decision tree depth minimization based on the measure F_1, and we can obtain relatively good bounds on the precision of this algorithm. The number of nodes in the constructed tree can grow exponentially with m. However, we can effectively simulate the work of this tree by constructing a path from the root to a terminal node. Denote by B(T_j), j = 1, ..., m, the set of unordered pairs of rows from T_j with different decisions, set B = B(T_1) ∪ ... ∪ B(T_m), and denote by C_i the set of pairs from B separated by the attribute a_i, i = 1, ..., n. It is not difficult to prove that the problem of reduct cardinality minimization for T^∪ is equivalent to the set cover problem for the set B and the family {C_1, ..., C_n} of subsets of B. So we can use the greedy algorithm for minimization of reduct cardinality, and we can obtain relatively good bounds on the precision of this algorithm.
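The greedy set cover step mentioned above can be sketched as follows (Python; it consumes an instance (B, C) such as the one built in the earlier sketch, and the names are illustrative). For the union-type joint table the returned attribute set is a reduct, with the usual ln|B| + 1 approximation guarantee of greedy set cover.

    def greedy_reduct(B, C):
        """Greedy set cover: repeatedly choose the attribute separating the
        largest number of still-unseparated pairs from B."""
        uncovered, reduct = set(B), []
        while uncovered:
            a = max(C, key=lambda i: len(C[i] & uncovered))
            gain = C[a] & uncovered
            if not gain:
                raise ValueError("some pair cannot be separated by any attribute")
            reduct.append(a)
            uncovered -= gain
        return reduct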
17.5 From Decision Rule Systems to Decision Tree

Instead of distributed decision tables, we can have information on such tables represented in the form of decision rule systems. Suppose we have a complicated object Q whose state is described by the values of attributes a_1, ..., a_n. Let Q_1, ..., Q_m be parts of Q. For j = 1, ..., m the state of Q_j is described by the values of attributes from a subset A_j of the set {a_1, ..., a_n}. For every Q_j we have a system S_j of decision rules of the kind

    a_{i1} = b_1 ∧ ... ∧ a_{it} = b_t → normal,

where a_{i1}, ..., a_{it} are pairwise different attributes from A_j, and b_1, ..., b_t are values of these attributes (not necessarily numbers from {0,1}). These rules describe the set of all normal states of Q_j. We assume that for any j ∈ {1, ..., m} and for any two rules from S_j, the set of conditions of the first rule is not a subset of the set of conditions of the second rule. We also assume that all combinations of attribute values are possible, and that for each attribute there exists an "abnormal" value which does not occur in the rules (for example, a missing value of the attribute). For each part Q_j we must either find a rule from S_j which is realized (in this case Q_j is in a normal state) or show that all rules from S_j are non-realized (in this case Q_j is in an abnormal state).
Consider a simple algorithm for the construction of a decision tree solving this problem. In fact, we construct a path from the root to a terminal node of the tree. We describe the main step of this algorithm, which consists of 6 sub-steps.

Main step:
1. Find a minimal set of attributes which covers all rules (an attribute covers a rule if it occurs in this rule).
2. Compute the values of all attributes from this cover.
3. Remove all rules which contradict the obtained values of the attributes. If after this sub-step a system of rules S_j, j ∈ {1, ..., m}, becomes empty, then the corresponding part Q_j is in an abnormal state.
4. Remove from the left-hand side of each rule all conditions (equalities) containing attributes from the cover.
5. Remove all rules with an empty left-hand side (such rules are realized). Remove all rules from each system S_j which has realized rules. For each such system the corresponding part Q_j is in a normal state.
6. If the obtained rule system is not empty, then repeat the main step.

Denote by h the minimal depth of a decision tree solving the considered problem, by h_alg the depth of the decision tree constructed by this algorithm, and by L the maximal length of a rule in the system S = S_1 ∪ ... ∪ S_m. One can prove that

    max{L, h} ≤ h_alg ≤ L × h.

It is possible to modify the considered algorithm so that the cover of rules by attributes is constructed using the greedy algorithm for the set cover problem. Denote by h_alg^greedy the depth of the constructed decision tree. By N we denote the number of rules in the system S. One can prove that
    h_alg^greedy ≤ L × h × (ln N + 1).
Some other algorithms for transformation of decision rule systems into decision trees can be found in [6].
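The main step above can be simulated directly on a system of rules. The sketch below (Python; function and variable names are illustrative, and a greedy cover is used in sub-step 1 instead of an exact minimal one) follows the six sub-steps for one root-to-terminal path, asking for attribute values through a caller-supplied function.

    def diagnose(rule_systems, read_attribute):
        """Follow the main step: (1) cover the remaining rules by attributes,
        (2) read those attributes, (3) drop contradicted rules, (4) shorten the
        remaining rules, (5) detect realized rules, (6) repeat while rules remain.
        rule_systems: list of rule lists; a rule is a dict {attribute: value}.
        read_attribute(a): returns the observed value of attribute a."""
        systems = [[dict(r) for r in s] for s in rule_systems]
        state = [None] * len(systems)          # 'normal' / 'abnormal' per part
        while any(systems):
            pending = [r for s in systems for r in s]
            cover, uncovered = [], list(range(len(pending)))
            while uncovered:                   # greedy cover of rules by attributes
                a = max({a for i in uncovered for a in pending[i]},
                        key=lambda a: sum(a in pending[i] for i in uncovered))
                cover.append(a)
                uncovered = [i for i in uncovered if a not in pending[i]]
            values = {a: read_attribute(a) for a in cover}
            for j, s in enumerate(systems):
                s = [r for r in s if all(values[a] == v
                                         for a, v in r.items() if a in values)]
                s = [{a: v for a, v in r.items() if a not in values} for r in s]
                if any(not r for r in s):      # a realized rule: empty left side
                    state[j], s = 'normal', []
                elif not s and state[j] is None:
                    state[j] = 'abnormal'      # every rule was contradicted
                systems[j] = s
        return ['abnormal' if v is None else v for v in state]

Each pass through the outer loop removes at least one condition from every remaining rule, so the number of passes is bounded by the maximal rule length L.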
17.6 Conclusion

In this paper, algorithms for the construction of decision trees and relative reducts for the joint decision table generated by distributed decision tables have been considered. Some of these algorithms can be useful in applications.
Acknowledgments The author is greatly indebted to Andrzej Skowron for stimulating discussions.
References

1. Feige U (1996) A threshold of ln n for approximating set cover (Preliminary version). In: Proceedings of the 28th Annual ACM Symposium on the Theory of Computing
2. Johnson D S (1974) J Comput System Sci 9:256-278
3. Moshkov M Ju (1982) Academy of Sciences Doklady 265:550-552 (in Russian); English translation: Sov Phys Dokl 27:528-530
4. Moshkov M Ju (1983) Conditional tests. In: Yablonskii S V (ed) Problems of Cybernetics 40. Nauka Publishers, Moscow (in Russian)
5. Moshkov M Ju (1997) Algorithms for construction of decision trees. In: Proceedings of the First European Symposium Principles of Data Mining and Knowledge Discovery, LNCS 1263, Springer-Verlag
6. Moshkov M Ju (2001) On transformation of decision rule systems into decision trees. In: Proceedings of the Seventh International Workshop Discrete Mathematics and its Applications 1 (in Russian)
7. Nigmatullin R G (1969) Method of steepest descent in problems on cover. In: Memoirs of Symposium Problems of Precision and Efficiency of Computing Algorithms 5 (in Russian)
8. Pawlak Z (1991) Rough Sets - Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht Boston London
9. Skowron A, Rauszer C (1992) The discernibility matrices and functions in information systems. In: Slowinski R (ed) Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory. Kluwer Academic Publishers, Dordrecht Boston London
18 Learning Concept Approximation from Uncertain Decision Tables

Nguyen Sinh Hoa¹ and Nguyen Hung Son²

¹ Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warsaw, Poland
² Institute of Mathematics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland
e-mails: {hoa,son}@mimuw.edu.pl
Summary. We present a hierarchical learning approach to the approximation of a complex concept from experimental data, using an inference diagram as domain knowledge. The solution, based on rough set and rough mereology theory, requires designing a learning method from uncertain decision tables. We examine the effectiveness of the proposed approach by comparing it with standard learning approaches with respect to different criteria on artificial data sets generated by a road traffic simulator.
18.1 Introduction

Concept approximation is an important problem in data mining [4]. In a typical process of concept approximation we assume that we are given information consisting of the values of conditional and decision attributes on objects from a finite subset (training set, sample) of the universe, and using this information one should induce approximations of the concept over the whole universe. In many learning tasks, e.g., identification of dangerous situations on the road by an unmanned vehicle aircraft (UAV), the target concept is too complex and cannot be approximated directly from feature value vectors. In some cases, when the target concept is a composition of simpler ones, layered learning [14] is an alternative approach to concept approximation. Given a hierarchical concept decomposition, the main idea is to gradually synthesize the target concept from simpler ones. The learning process can be imagined as a tree-like structure with the target concept located at the highest layer. At the lowest layer, basic concepts are approximated using feature values available from a data set. At the next layer, more complex concepts are synthesized from basic concepts. This process is repeated for successive layers. The importance of hierarchical concept synthesis is now well recognized by researchers (see, e.g., [9, 6]). An idea of hierarchical concept synthesis in the rough mereological and granular computing frameworks has been developed (see, e.g.,
[9, 6, 10]), and problems connected with compound concept approximation are discussed, e.g., in [6, 11, 1, 13]. In this paper we concentrate on concepts that are specified by decision classes in decision systems [7]. The crucial point in inducing concept approximations is to create descriptions of concepts in such a way that it is possible to maintain an acceptable level of imprecision all the way from the basic attributes to the final decision. We discuss some strategies for concept composition founded on the rough set approach. We also examine the effectiveness of the layered learning approach by comparing it with a standard rule-based learning approach. The quality of the new approach is verified with respect to the generality of the concept approximation, its preciseness, the computation time required for concept induction, and the concept description lengths. Experiments are carried out on an artificial data set generated by a road traffic simulator.
18.2 Basic notions

Many problems in machine learning, pattern recognition or data mining can be formulated as concept approximation problems. Formally, given a universe 𝒰 of objects (cases, states, patients, observations, etc.) and a concept X which can be interpreted as a subset of 𝒰, the problem is to find a description of X that can be expressed in a predefined descriptive language ℒ. We assume that ℒ consists of formulas that are interpretable as subsets of 𝒰. The concept approximation problem can be formulated as the problem of searching for an (approximate) description of a concept X based on a finite set of examples U ⊆ 𝒰 called the training set. The approximation is required to be close to the original concept. The closeness of the approximation to the original concept can be measured by different criteria, such as accuracy or description length, which can themselves be estimated on so-called testing examples. Usually, we assume that the input data for the concept approximation problem are given by a decision table, which is a tuple S = (U, A, dec), where U is a non-empty, finite set of training objects, A is a non-empty, finite set of attributes, and dec ∉ A is a distinguished attribute called the decision. Each attribute a ∈ A corresponds to a function a : U → V_a called an evaluation function, where V_a is called the domain of a. For any non-empty set of attributes B ⊆ A and any object x ∈ U, we define the B-information vector of x by inf_B(x) = {(a, a(x)) : a ∈ B}. The set INF_B(S) = {inf_B(x) : x ∈ U} is called the B-information set of S. Without loss of generality, we assume that the domain of the decision dec is equal to V_dec = {1, ..., d}. For any k ∈ V_dec, the set CLASS_k = {x ∈ U : dec(x) = k} is called the k-th decision class of S. The decision dec determines a partition of U into decision classes, i.e., U = CLASS_1 ∪ ... ∪ CLASS_d. In the case of the concept approximation problem, we can assume that V_dec = {yes, no} and U = CLASS_yes ∪ CLASS_no.
The approximate description of a concept can be induced by any algorithm from the inductive learning area, such as rule extraction or decision tree induction. In the next section we concentrate on methods based on layered learning and rough set theory. In some concept approximation problems (see the next section), we have to approximate the given concept from an uncertain decision table. In such uncertain decision tables, attribute values are not determined exactly. Formally, every attribute a ∈ A is associated with an evaluation function ν_a : U × V_a → [0,1]. Assume that V_a = {v_1, ..., v_{k_a}} is the domain of the (discrete) attribute a; then the vector ν_a(x) = [ν_a(x, v_1), ..., ν_a(x, v_{k_a})] is called the value distribution of attribute a for the object x. We also assume that the condition

    Σ_{v_i ∈ V_a} ν_a(x, v_i) ≤ 1

holds for any object x ∈ U. In the layered learning approach, standard decision tables are used to approximate the basic concepts, which are located in the lowest layer. For concepts that are located on the higher levels (called compound concepts), we can use uncertain decision tables to induce their approximations.
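A possible in-memory representation of such an uncertain decision table is sketched below (Python; the class and field names are not taken from the paper, and the sum check mirrors the condition above).

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class UncertainDecisionTable:
        """Decision table in which every conditional attribute is given by a
        value distribution nu_a(x) over its domain instead of a single value."""
        domains: Dict[str, List[str]]                 # attribute -> its domain V_a
        objects: List[Dict[str, Dict[str, float]]] = field(default_factory=list)
        decisions: List[str] = field(default_factory=list)

        def add(self, dists: Dict[str, Dict[str, float]], dec: str) -> None:
            for a, dist in dists.items():
                assert set(dist) <= set(self.domains[a]), f"unknown value for {a}"
                assert sum(dist.values()) <= 1.0 + 1e-9, f"distribution of {a} exceeds 1"
            self.objects.append(dists)
            self.decisions.append(dec)

    # an object whose membership in concept C1 is 0.7 "yes" / 0.2 "no"
    t = UncertainDecisionTable(domains={"C1": ["yes", "no"], "C2": ["yes", "no"]})
    t.add({"C1": {"yes": 0.7, "no": 0.2}, "C2": {"yes": 0.1, "no": 0.8}}, dec="yes")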
18.3 Rough Sets and the Concept Approximation Problem

18.3.1 Basic idea

The rough set methodology for concept approximation can be described as follows. Let X ⊆ 𝒰 be a concept and let U ⊆ 𝒰 be a finite sample of 𝒰. Assume that for any x ∈ U it is known whether x ∈ X ∩ U or x ∈ U − X. Any pair P = (L, U) is called a rough approximation of X if it satisfies the following conditions:

1. L ⊆ U ⊆ 𝒰;
2. L and U are subsets of 𝒰 expressible in the language ℒ;
3. L ∩ U ⊆ X ∩ U ⊆ U ∩ U;
4. L is maximal (and U is minimal) in the family of sets definable in ℒ satisfying condition 3.

The sets L and U are called the lower and the upper approximation of the concept X ⊆ 𝒰, respectively. The set BN = U \ L is called the boundary region of the approximation of X. The set X is called rough with respect to its approximations (L, U) if L ≠ U; otherwise X is called crisp in 𝒰. The pair (L, U) is also called a rough set (for the concept X). Condition (4) in the above list can be replaced by inclusion to a degree, to make it possible to induce approximations of higher quality of the concept on the whole universe 𝒰. In practical applications the last condition in the above definition can be hard to satisfy; hence, by using some heuristics, we construct sub-optimal instead of maximal or minimal sets. The rough approximation of a concept can also be defined by means of a rough membership function.
Definition 1. Let X ⊆ 𝒰 be a concept and let the decision table S = (U, A, dec) describe the training objects U ⊆ 𝒰. A function μ_X : 𝒰 → [0,1] is called a rough membership function of the concept X ⊆ 𝒰 if and only if (L_{μ_X}, U_{μ_X}) is a rough approximation of X (induced from the sample U), where L_{μ_X} = {x ∈ 𝒰 : μ_X(x) = 1} and U_{μ_X} = {x ∈ 𝒰 : μ_X(x) > 0}.

Many methods of discovering rough approximations of concepts from data have been proposed, e.g., methods based on reducts [7, 8], on k-NN classifiers [1], or on decision rules [1]. Let us recall the construction of the rough membership function in the concept approximation approach based on decision rules. Given a decision table S = (U, A, dec), let us assume that RULES(S) is a set of decision rules induced by some rule extraction method. For any object x ∈ 𝒰, let MatchRules(S, x) be the set of rules from RULES(S) supported by x. One can define the rough membership function μ_CLASSk : 𝒰 → [0,1] for the concept determined by CLASS_k as follows:

1. Let R_yes be the set of all decision rules from MatchRules(S, x) for the k-th class and let R_no be the set of decision rules from MatchRules(S, x) for the other classes.
2. We define two real values w_yes, w_no by

    w_yes = Σ_{r ∈ R_yes} strength(r)   and   w_no = Σ_{r ∈ R_no} strength(r),

where strength(r) is a normalized function depending on the length, support and confidence of r, and on some global information about the decision table S such as the table size and the class distribution (see [2]).
3. One can define the value of μ_CLASSk(x) by

    μ_CLASSk(x) =
        undetermined                     if max(w_yes, w_no) < ω,
        0                                if w_no − w_yes ≥ θ and w_no > ω,
        1                                if w_yes − w_no ≥ θ and w_yes > ω,
        (θ + (w_yes − w_no)) / (2θ)      in the other cases,

where ω, θ are parameters set by the user. These parameters make it possible to control, in a flexible way, the size of the boundary region for the approximations established according to Definition 1.

18.3.2 Rough set based layered learning

In this section we discuss a strategy for composing concepts that are described in terms of concepts established on top of already existing ones. A method for concept composition is a crucial point in concept synthesis. We discuss a method that gives us the ability to control the level of approximation quality all the way from the basic concepts to the target concept. Let us assume that a concept hierarchy H is given. The concept hierarchy should contain either an inference diagram or a dependency diagram that connects the target concept with the input attributes through intermediate concepts. A training set is represented
by a decision table S = (U, A, D), where D is a set of decision attributes corresponding to all intermediate concepts and to the target concept. Decision values indicate whether an object belongs to the basic concepts and to the target concept, respectively. Using the information available from the concept hierarchy, for each basic concept C_b one can create a training decision system S_{C_b} = (U, A_{C_b}, dec_{C_b}), where A_{C_b} ⊆ A and dec_{C_b} ∈ D. To approximate the concept C_b one can apply any classical method (e.g., k-NN, supervised clustering, or a rule-based approach [5]) to the table S_{C_b}. In the further discussion we assume that the basic concepts are approximated by rule-based classifiers (see Section 18.2) derived from the relevant decision tables.
Fig. 18.1. The construction of compound concept approximation using rough descriptions of simpler concepts

To avoid overly complicated notation, let us limit ourselves to the case of constructing a compound concept approximation on the basis of two simpler concept approximations. Assume we have two concepts C1 and C2 that are given to us in the form of rule-based approximations derived from decision systems S_{C1} = (U, A_{C1}, dec_{C1}) and S_{C2} = (U, A_{C2}, dec_{C2}). Hence we are given two rough membership functions μ_{C1}(x), μ_{C2}(x). These functions are determined with the use of the parameter sets {w_yes^{C1}, w_no^{C1}, ω^{C1}, θ^{C1}} and {w_yes^{C2}, w_no^{C2}, ω^{C2}, θ^{C2}}, respectively. We want to establish a similar set of parameters {w_yes^C, w_no^C, ω^C, θ^C} for the target concept C, which we want to describe with the use of a rough membership function μ_C. As previously, the parameters ω, θ controlling the boundary region are user-configurable, but we need to derive {w_yes^C, w_no^C} from the data. The issue is to define a decision system from which the rules used to define the approximations can be derived. This problem can be described by an uncertain decision table as follows. The uncertain decision system S_C = (U, A_C, dec_C) that is necessary for learning an approximation of the concept C contains conditional attributes A_C = {a_{C1}, a_{C2}}
related to the simpler concepts C1 and C2. There are two possibilities for defining the evaluation functions ν_{a_{C1}} and ν_{a_{C2}}: 1. by rough membership functions, i.e., by the values of the functions μ_{C1} and μ_{C2};
2. by voting weights.

We propose the following methods for learning an approximation of a compound concept from uncertain decision tables.

Naive method: One can treat the uncertain decision table S_C as a normal decision table S' with more attributes. By extracting rules from S' (using discretization as preprocessing), the rule-based approximations of the concept C are created. It is important to observe that such rules describing C use attributes that are in fact classifiers themselves. Therefore, in order to have a more readable and intuitively understandable description, as well as more control over the quality of the approximation (especially for new cases), it pays to stratify and interpret the attribute domains of the attributes in A_C.

Stratification method: Instead of using just the value of a membership function or weight, we would prefer to use linguistic statements such as "the likeliness of the occurrence of C1 is low". In order to do that we have to map the attribute value sets onto some limited family of subsets. Such subsets are then identified with notions such as "certain", "low", "high", etc. It is quite natural, especially in the case of attributes being membership functions, to introduce linearly ordered subsets of the attribute ranges, e.g., {negative, low, medium, high, positive}. This yields a fuzzy-like layout, or linguistic variables, of attribute values. One may (and in some cases should) also consider the case when these subsets overlap. Stratification of attribute values and the introduction of linguistic variables attached to the inference hierarchy serve multiple purposes. First, they provide a way of representing knowledge in a more human-readable format, since if we have a new situation (a new object x* ∈ 𝒰 \ U) to be classified (checked against compliance with the concept C), we may use rules like: if the compliance of x* with C1 is high or medium and the compliance of x* with C2 is high, then x* ∈ C. Another advantage of imposing the division of attribute value sets lies in the extended control over the flexibility and validity of the system constructed in this way. As we may define the linguistic variables and the corresponding intervals, we gain the ability to make the system more stable and inductively correct. In this way we control the general layout of the boundary regions for the simpler concepts that contribute to the construction
of the target concept. The process of setting the intervals for attribute values may be performed by hand, especially when additional background information about the nature of the described problem is available. One may also rely on automated methods for such interval construction, such as clustering, template analysis and discretization. An extended discussion of the foundations of this approach, which is related to rough-neural computing [6] and computing with words, can be found in [13, 12].
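As an illustration of the stratification method, the sketch below maps a membership degree onto the linearly ordered labels mentioned above. The cut points are purely hypothetical; in practice they would come from an expert or from discretization or clustering, and the intervals may even be allowed to overlap.

    def stratify(mu: float, cuts=(0.0, 0.25, 0.5, 0.75, 1.0),
                 labels=("negative", "low", "medium", "high", "positive")) -> str:
        """Map a rough-membership degree to a linguistic label."""
        if mu <= cuts[0]:
            return labels[0]
        if mu >= cuts[-1]:
            return labels[-1]
        for upper, label in zip(cuts[1:-1], labels[1:-1]):
            if mu <= upper:
                return label
        return labels[-2]

    # e.g. a rule precondition "compliance with C1 is high or medium"
    ok = stratify(0.62) in ("high", "medium")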
18.4 Experimental Results

To verify the quality of hierarchical classifiers, we performed some experiments with the road simulator system.

18.4.1 Road simulator

Learning to recognize and predict traffic situations on the road is the main issue in many unmanned vehicle aircraft (UVA) projects. It is a good example of a hierarchical concept approximation problem. We demonstrate the proposed layered learning approach on our own simulation system. ROAD SIMULATOR is a computer tool generating data sets consisting of recordings of vehicle movements on roads and at crossroads. Such data sets are then used to learn and test complex concept classifiers working on information coming from different devices (sensors) monitoring the situation on the road. Let us present the most important features of this system. During the simulation the system registers a series of parameters of the local simulations, that is, simulations connected with each vehicle separately, as well as global parameters of the simulation, that is, parameters connected with the driving conditions during the simulation. The local parameters are related to the driver's profile, which is randomly determined when a new vehicle appears on the board and may not be changed until it disappears from the board. The global parameters, like visibility and weather conditions, are set randomly according to some scenario. We associate the simulation parameters with the readouts of different measuring devices or technical equipment placed inside the vehicle or in the outside environment (e.g., by the road, in a police car, etc.). Apart from those sensors, the simulator registers a few more attributes, whose values are determined on the basis of the sensors' values in a way specified by an expert. In the present simulator version these parameters take binary values and are therefore called concepts. Concept definitions very often have the form of a question which one can answer YES, NO or NULL (does not apply). An exemplary relationship diagram for the above-mentioned concepts is presented in Figure 18.3. During the simulation, data may be generated and stored in a text file. The generated data have the form of a rectangular board (information system). Each line of the board depicts the situation of a single vehicle, and the sensors' and concepts' values are registered for the given vehicle and its neighboring vehicles.
Fig. 18.2. The board of simulation (showing the maximal and current number of vehicles, humidity, visibility, the traffic parameters of the main and subordinate roads, the current simulation step, and whether data are being saved).

Fig. 18.3. The relationship diagram for the presented concepts: safe driving, safe overtaking, safe distance from FL during overtaking, possibility of going back to the right lane, possibility of safe stopping before the crossroads, and the sensor layer.
Within each simulation step, descriptions of the situations of all the vehicles are saved to the file.
18.4.2 Experiment setup

We have generated 6 training data sets: c10_s100, c10_s200, c10_s300, c10_s400, c10_s500, c20_s500, and 6 corresponding testing data sets named c10_s100N, c10_s200N, c10_s300N, c10_s400N, c10_s500N, c20_s500N. All data sets consist of 100 attributes. The smallest data set consists of over 700 situations (100 simulation units) and the largest data set consists of over 8000 situations (500 simulation units). We compare the accuracy of two classifiers, i.e., RS: the standard classifier induced by the rule set method, and RS-L: the hierarchical classifier induced by the RS-layered learning method. In the first approach, we employed the RSES system [3] to generate the set of minimal decision rules. We use the simple voting strategy for conflict resolution when classifying new situations. In the RS-layered learning approach, from the training table we create five sub-tables to learn five basic concepts (see Figure 18.3): C1: "safe distance from FL during overtaking", C2: "possibility of safe stopping before crossroads", C3: "possibility of going back to the right lane", C4: "safe distance from FR1", C5: "forcing the right of way". These tables are created using the information available from the concept relationship diagram presented in Figure 18.3. A concept at the next level is C6: "safe overtaking". To approximate the concept C6, we create a table with three conditional attributes. These attributes describe the fitting degrees of an object to the concepts C1, C2, C3, respectively. The decision attribute has three values YES, NO, or NULL corresponding to the cases of safe overtaking, dangerous overtaking, and not applicable (the overtaking has not been made by the car). The target concept C7: "safe driving" is located in the third level of the concept decomposition hierarchy. To approximate C7 we also create a decision table with three attributes, representing the fitting degrees of objects to the concepts C4, C5, C6, respectively. The decision attribute has two possible values, YES or NO, depending on whether a car satisfies the global safety condition or not. The comparison is performed with respect to the following criteria: (1) accuracy of classification, (2) covering rate for new cases (generality), and (3) computing time necessary for classifier synthesis.

Classification accuracy: Similarly to real-life situations, the decision class "safe driving = YES" is dominating. The decision class "safe driving = NO" takes only 4%-9% of the training sets. Searching for an approximation of the "safe driving = NO" class with high precision and generality is a challenge for learning algorithms. In the experiments we concentrate on the quality of the "NO" class approximation. In Table 18.1 we present the classification accuracy of the RS and RS-L classifiers. One can observe that the accuracy on the "YES" class of both the standard and the hierarchical classifier is high, whereas the accuracy on the "NO" class is very poor, particularly in the case of the standard classifier. The hierarchical classifier showed to be much better than the
standard classifier for this class. The accuracy on the "NO" class of the hierarchical classifier is quite high when the training sets reach a sufficient size.

Table 18.1. Classification accuracy of the standard and hierarchical classifiers.

  Accuracy    | Total        | Class YES    | Class NO
              | RS     RS-L  | RS     RS-L  | RS     RS-L
  c10_s100N   | 0.94   0.97  | 1      1     | 0      0
  c10_s200N   | 0.99   0.96  | 1      0.98  | 0.60   0.75
  c10_s300N   | 0.99   0.98  | 1      0.98  | 0      0.78
  c10_s400N   | 0.96   0.77  | 0.96   0.77  | 0.64   0.57
  c10_s500N   | 0.96   0.89  | 0.99   0.90  | 0.80   0.30
  c20_s500N   | 0.99   0.89  | 0.99   0.88  | 0.44   0.93
  Average     | 0.97   0.91  | 0.99   0.92  | 0.34   0.63
Covering rate: The generality of classifiers is usually evaluated by their recognition ability for unseen objects. In this section we analyze the covering rate of the classifiers for new objects. One can observe a pattern similar to that for the accuracy. The recognition rate of situations belonging to the "NO" class is very poor in the case of the standard classifier. One can see in Table 18.2 the improvement in the coverage of the "YES" and "NO" classes achieved by the hierarchical classifier.

Table 18.2. Covering rate for the standard and hierarchical classifiers.

  Covering rate | Total        | Class YES    | Class NO
                | RS     RS-L  | RS     RS-L  | RS     RS-L
  c10_s100N     | 0.44   0.72  | 0.44   0.74  | 0.50   0.38
  c10_s200N     | 0.72   0.73  | 0.73   0.74  | 0.50   0.63
  c10_s300N     | 0.47   0.68  | 0.49   0.69  | 0.10   0.44
  c10_s400N     | 0.74   0.90  | 0.76   0.93  | 0.23   0.35
  c10_s500N     | 0.72   0.86  | 0.74   0.88  | 0.40   0.69
  c20_s500N     | 0.62   0.89  | 0.65   0.89  | 0.17   0.86
  Average       | 0.62   0.79  | 0.64   0.81  | 0.32   0.55
Computing speed: With respect to time, the layered learning approach shows a tremendous advantage in comparison with the standard learning approach. In the case of the standard classifier, the computational time is measured as the time required to compute the rule set used for the decision class approximation. In the case of the hierarchical classifier, the computational time is the total time required for approximating all sub-concepts and the target concept. One can see in Table 18.3 that the speed-up ratio of the layered learning approach over the standard one reaches from 40 to 130 times (all experiments were performed on a computer with an AMD Athlon 1.4 GHz processor and 256 MB RAM).

Table 18.3. Time for standard and hierarchical classifier generation.

  Tables    | RS       | RS-L    | Speed-up ratio
  c10_s100  | 94 s     | 2.3 s   | 40
  c10_s200  | 714 s    | 6.7 s   | 106
  c10_s300  | 1450 s   | 10.6 s  | 136
  c10_s400  | 2103 s   | 34.4 s  | 60
  c10_s500  | 3586 s   | 38.9 s  | 92
  c20_s500  | 10209 s  | 98 s    | 104
  Average   |          |         | 90
18.5 Conclusion

We presented a new method for concept synthesis. It is based on the layered learning approach. Unlike the traditional approach, in the layered learning approach the concept approximations are induced not only from the available data sets but also from the expert's domain knowledge. In this paper, we assume that this knowledge is represented by a concept dependency hierarchy. The layered learning approach has shown to be promising for complex concept synthesis. Experimental results with the road traffic simulation show the advantages of this new approach in comparison with the standard approach.

Acknowledgements: The research has been partially supported by the grant 3T11C00226 from the Ministry of Scientific Research and Information Technology of the Republic of Poland. The authors are deeply grateful to Dr. J. Bazan for his road simulator system and to Prof. A. Skowron for valuable discussions on the layered learning approach.
References

1. J. Bazan, H. S. Nguyen, A. Skowron, and M. Szczuka. A view on rough set concept approximation. In G. Wang, Q. Liu, Y. Yao, and A. Skowron, editors, RSFDGrC'2003, Chongqing, China, volume 2639 of LNAI, pages 181-188, Heidelberg, Germany, 2003. Springer-Verlag.
2. J. G. Bazan. A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. In L. Polkowski and A. Skowron, editors, Rough Sets in Knowledge Discovery 1: Methodology and Applications, pages 321-365. Physica-Verlag, Heidelberg, Germany, 1998.
3. J. G. Bazan and M. Szczuka. RSES and RSESlib - a collection of tools for rough set computations. In W. Ziarko and Y. Yao, editors, RSCTC'02, volume 2005 of LNAI, pages 106-113, Banff, Canada, October 16-19 2000. Springer-Verlag.
4. W. Kloesgen and J. Zytkow, editors. Handbook of Knowledge Discovery and Data Mining. Oxford University Press, Oxford, 2002.
5. T. Mitchell. Machine Learning. McGraw-Hill, 1998.
6. S. K. Pal, L. Polkowski, and A. Skowron, editors. Rough-Neural Computing: Techniques for Computing with Words. Cognitive Technologies. Springer-Verlag, Heidelberg, Germany, 2003.
7. Z. Pawlak. Rough Sets: Theoretical Aspects of Reasoning about Data, volume 9 of System Theory, Knowledge Engineering and Problem Solving. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991.
8. Z. Pawlak and A. Skowron. A rough set approach for decision rules generation. In Proc. of IJCAI'93, pages 114-119, Chambery, France, 1993. Morgan Kaufmann.
9. L. Polkowski and A. Skowron. Rough mereology: A new paradigm for approximate reasoning. International Journal of Approximate Reasoning, 15(4):333-365, 1996.
10. A. Skowron. Approximate reasoning by agents in distributed environments. In N. Zhong, J. Liu, S. Ohsuga, and J. Bradshaw, editors, IAT'01, Japan, pages 28-39. World Scientific, Singapore, 2001.
11. A. Skowron. Approximation spaces in rough neurocomputing. In M. Inuiguchi, S. Tsumoto, and S. Hirano, editors, Rough Set Theory and Granular Computing, pages 13-22. Springer-Verlag, Heidelberg, Germany, 2003.
12. A. Skowron and J. Stepaniuk. Information granule decomposition. Fundamenta Informaticae, 47(3-4):337-350, 2001.
13. A. Skowron and M. Szczuka. Approximate reasoning schemes: Classifiers for computing with words. In Proceedings of SMPS 2002, Advances in Soft Computing, pages 338-345, Heidelberg, Germany, 2002. Springer-Verlag.
14. P. Stone. Layered Learning in Multi-Agent Systems: A Winning Approach to Robotic Soccer. The MIT Press, Cambridge, MA, 2000.
19 In Search for Action Rules of the Lowest Cost

Zbigniew W. Ras¹,² and Angelina A. Tzacheva¹

¹ UNC-Charlotte, Computer Science Dept., Charlotte, NC 28223, USA
² Polish Academy of Sciences, Institute of Computer Science, Ordona 21, 01-237 Warsaw, Poland
Summary. There are two aspects of interestingness of rules that have been studied in the data mining literature: objective and subjective measures ([2], [1], [3], [11], [12]). Objective measures are data-driven and domain-independent. Generally, they evaluate the rules based on their quality and the similarity between them. Subjective measures, including unexpectedness, novelty [11], and actionability, are user-driven and domain-dependent. A rule is actionable if a user can take an action to his/her advantage based on this rule ([2], [1], [3]). Action rules, introduced in [7] and investigated further in [8], are constructed from actionable rules. To construct them, the authors assume that the attributes in a database are divided into two groups: stable and flexible. Flexible attributes provide a tool for giving hints to a user about what changes within the values of flexible attributes are needed for a given group of objects in order to re-classify these objects to another decision class. Ras and Gupta (see [10]) proposed how to construct action rules when the information system is distributed with autonomous sites. Additionally, the notions of the cost and the feasibility of an action rule are introduced in this paper. A heuristic strategy for constructing feasible action rules which have high confidence and possibly the lowest cost is also proposed. The interestingness of such action rules is the highest among actionable rules.
19.1 Introduction

There are two aspects of interestingness of rules that have been studied in the data mining literature: objective and subjective measures ([2], [1], [3], [11], [12]). Objective measures are data-driven and domain-independent. Generally, they evaluate the rules based on their quality and the similarity between them. Subjective measures, including unexpectedness, novelty [11], and actionability, are user-driven and domain-dependent. A rule is actionable if a user can take an action to his/her advantage based on this rule ([2], [1], [3]). Action rules, introduced in [7] and investigated further in [8], are constructed from actionable rules. They suggest ways to re-classify consumers to a desired state. However, quite often, such a change cannot be made directly to a chosen attribute (for instance, to the attribute profit). In such situations, definitions of such an attribute in terms of other attributes have to be learned. These definitions are used to construct action rules showing what changes in the values of attributes are needed, for a given consumer, in order to re-classify this consumer the way the
business user wants. This re-classification may mean that a consumer who was not interested in a certain product may now buy it, and therefore may shift into a group of more profitable customers. These groups of customers are described by the values of classification attributes in a decision system schema. Ras and Gupta, in [10], assume that the information system is distributed and its sites are autonomous. They claim that it is wise to search for action rules at remote sites when the action rules extracted at the client site cannot be implemented in practice (they are too expensive, too risky, or the business user is unable to make such changes). Also, they show under what assumptions two action rules extracted at two different sites can be composed. One of these assumptions requires that the semantics of attributes, including the interpretation of null values, have to be the same at both sites. In the present paper, this assumption is relaxed. Additionally, we introduce the notions of the cost and the feasibility of an action rule. Usually, a number of action rules or chains of action rules can be applied to re-classify a given set of objects. The cost associated with changes of values within one attribute is usually different from the cost associated with changes of values within another attribute. We present a strategy for constructing chains of action rules, driven by the changes of attribute values suggested by another action rule, which are needed to reclassify some objects. Such a chain of action rules uniquely defines a new action rule and is built with the goal of lowering the cost of reclassifying these objects. Silberschatz and Tuzhilin [11], [12] quantify actionability in terms of unexpectedness and define unexpectedness as a subjective measure of interestingness. They have shown that the most actionable knowledge is unexpected and that most of the unexpected knowledge is actionable. So, by discovering action rules of possibly the lowest cost, we obtain the most actionable knowledge and at the same time the most unexpected knowledge related to a desired reclassification of objects.
19.2 Information System and Action Rules

An information system is used for representing knowledge. Its definition, presented here, is due to Pawlak [4]. By an information system we mean a triple S = (U, A, V), where:

• U is a nonempty, finite set called the universe,
• A is a nonempty, finite set of attributes, i.e., a : U → V_a is a function for each a ∈ A,
• V = ∪{V_a : a ∈ A}, where V_a is the set of values of the attribute a ∈ A.
Elements of U are called objects. In this paper, they are often seen as customers. Attributes are interpreted as features, offers made by a bank, characteristic conditions, etc. By a decision table we mean any information system where the set of attributes is partitioned into conditions and decisions. Additionally, we assume that the set of conditions is partitioned into stable conditions and flexible conditions. For simplicity, we also assume that there is only one decision attribute. Date of birth is an example of a stable attribute. The interest rate on a customer account is an example
of a flexible attribute (it depends on the bank). We adopt the following definition of a decision table. By a decision table we mean an information system of the form S = (U, A_St ∪ A_Fl ∪ {d}), where d ∉ A_St ∪ A_Fl is a distinguished attribute called the decision. The elements of A_St are called stable conditions, whereas the elements of A_Fl are called flexible conditions. As an example of a decision table we take S = ({x1, x2, x3, x4, x5, x6, x7, x8}, {a, c} ∪ {b} ∪ {d}) represented by Table 19.1. The set {a, c} lists the stable attributes, b is a flexible attribute and d is the decision attribute. Also, we assume that H denotes a high profit and L a low one.
Table 19.1. Decision System S

  x   | a | b | c | d
  x1  | 0 | S | 0 | L
  x2  | 0 | R | 1 | L
  x3  | 0 | S | 0 | L
  x4  | 0 | R | 1 | L
  x5  | 2 | P | 2 | L
  x6  | 2 | P | 2 | L
  x7  | 2 | S | 2 | H
  x8  | 2 | S | 2 | H
In order to induce rules in which the THEN part consists of the decision attribute d and the IF part consists of attributes belonging to A_St ∪ A_Fl, sub-tables (U, B ∪ {d}) of S, where B is a d-reduct (see [4]) of S, should be used for rule extraction. By L(r) we mean the set of all attributes listed in the IF part of a rule r. For example, if r = [(a, 2) * (b, S) → (d, H)] is a rule, then L(r) = {a, b}. By d(r) we denote the decision value of a rule. In our example d(r) = H. If r1, r2 are rules and B ⊆ A_St ∪ A_Fl is a set of attributes, then r1/B = r2/B means that the conditional parts of the rules r1, r2 restricted to the attributes B are the same. For example, if r1 = [(b, S) * (c, 2) → (d, H)], then r1/{b} = r/{b}. In our example, we get the following optimal rules:

    (a, 0) → (d, L), (c, 0) → (d, L), (b, R) → (d, L), (c, 1) → (d, L),
    (b, P) → (d, L), (a, 2) * (b, S) → (d, H), (b, S) * (c, 2) → (d, H).

Now, let us assume that (a, v → w) denotes the fact that the value of attribute a has been changed from v to w. Similarly, the term (a, v → w)(x) means that a(x) = v has been changed to a(x) = w. In other words, the property (a, v) of the object x has been changed to the property (a, w).
Let S = (U, A_St ∪ A_Fl ∪ {d}) be a decision table and let r1, r2 be rules extracted from S. Assume that B1 is a maximal subset of A_St such that r1/B1 = r2/B1, d(r1) = k1, d(r2) = k2, and the user is interested in reclassifying objects from class k1 to class k2. Also, assume that (b1, b2, ..., bp) is a list of all attributes in L(r1) ∩ L(r2) ∩ A_Fl on which r1, r2 differ, and r1(b1) = v1, r1(b2) = v2, ..., r1(bp) = vp, r2(b1) = w1, r2(b2) = w2, ..., r2(bp) = wp. By an (r1, r2)-action rule on x ∈ U we mean an expression (see [7]):

    [(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ ... ∧ (bp, vp → wp)](x) ⇒ [(d, k1 → k2)](x).

The rule is valid if its value on x is true in S, i.e., there is an object x1 ∈ U which does not contradict x on the stable attributes in S and (∀i ≤ p)[bi(x1) = wi] ∧ d(x1) = k2. Otherwise it is false.
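The following small sketch shows how an (r1, r2)-action rule could be assembled from two extracted rules under the definition above (Python; the dictionary representation of rules and the function name are illustrative, and the treatment of stable attributes is simplified to an agreement check on the stable attributes that both rules mention).

    def action_rule(r1, r2, stable, flexible, decision="d"):
        """r1, r2: rules given as dicts (attribute -> value), including the
        decision under the key 'd'.  Returns the list of required value changes
        and the resulting decision change, or None if the rules disagree on the
        stable attributes they share."""
        if any(r1[a] != r2[a] for a in set(r1) & set(r2) & stable):
            return None
        changes = [(b, r1[b], r2[b])
                   for b in set(r1) & set(r2) & flexible if r1[b] != r2[b]]
        return {"changes": changes, "effect": (decision, r1[decision], r2[decision])}

    r1 = {"b": "R", "d": "L"}                  # (b, R) -> (d, L)
    r2 = {"b": "S", "c": 2, "d": "H"}          # (b, S) * (c, 2) -> (d, H)
    print(action_rule(r1, r2, stable={"a", "c"}, flexible={"b"}))
    # {'changes': [('b', 'R', 'S')], 'effect': ('d', 'L', 'H')}

For the two rules from the example above, this yields the action rule (b, R → S)(x) ⇒ (d, L → H)(x).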
19.3 Distributed Information System

By a distributed information system we mean a pair DS = ({S_i}_{i∈I}, L), where:

• I is a set of sites,
• S_i = (X_i, A_i, V_i) is an information system for any i ∈ I,
• L is a symmetric, binary relation on the set I showing which systems can directly communicate with each other.
A distributed information system DS = ({S_i}_{i∈I}, L) is consistent if the following condition holds:

    (∀i)(∀j)(∀x ∈ X_i ∩ X_j)(∀a ∈ A_i ∩ A_j) [(a_[S_i](x) ⊆ a_[S_j](x)) or (a_[S_j](x) ⊆ a_[S_i](x))].

Consistency basically means that the information about any object x in one system can be either more general or more specific than in the other. In other words, two systems cannot store conflicting information about any object x. Another problem which has to be taken into consideration is the semantics of attributes which are common for a client and some of its remote sites. This semantics may easily differ from site to site. Sometimes, such a difference in semantics can be repaired quite easily. For instance, if Temperature in Celsius is used at one site and Temperature in Fahrenheit at the other, a simple mapping will fix the problem. If the information systems are complete and two attributes have the same name and differ only in their granularity level, a new hierarchical attribute can be formed to fix the problem. If the databases are incomplete, the problem is more complex because of the number of options available for interpreting incomplete values (including null values). The problem is especially difficult in a distributed framework when chase techniques, based on rules extracted at the client and at remote sites (see [6]), are used by the client to impute current values by values which are less incomplete. In this paper we concentrate on granularity-based semantic inconsistencies. Assume first that S_i = (X_i, A_i, V_i) is an information system for any i ∈ I and that
all S_i's form a Distributed Information System (DIS). Additionally, we assume that if a ∈ A_i ∩ A_j, then only the granularity levels of a in S_i and S_j may differ, but conceptually its meaning is the same in both S_i and S_j. Assume now that L(D_i) is the set of action rules extracted from S_i, which means that D = ∪_{i∈I} L(D_i) is the set of action rules which can be used in the process of distributed action rule discovery. Now, let us say that the system S_k, k ∈ I, is queried by a user for an action rule reclassifying objects with respect to the decision attribute d. Any strategy for discovering action rules from S_k based on action rules D' ⊆ D is called sound if the following three conditions are satisfied:

• for any action rule in D', the value of its decision attribute d is of a granularity level either equal to or finer than the granularity level of the attribute d in S_k;
• for any action rule in D', the granularity level of any attribute a used in the classification part of that rule is either equal to or softer than the granularity level of a in S_k;
• the attribute used in the decision part of a rule has to be classified as flexible in S_k.
In the next section, we assume that if any attribute is used at two different sites of DIS, then at both of them its semantics is the same and its attribute values are of the same granularity level.
19.4 Cost and Feasibility of Action Rules

Assume now that DS = ({S_i : i ∈ I}, L) is a distributed information system (DIS), where S_i = (X_i, A_i, V_i), i ∈ I. Let b ∈ A_i be a flexible attribute in S_i and let b1, b2 ∈ V_i be two of its values. By ρ_{S_i}(b1, b2) we mean a number from (0, +∞] which describes the average cost of changing the attribute value from b1 to b2 for any of the qualifying objects in S_i. An object x ∈ X_i qualifies for the change from b1 to b2 if b(x) = b1. If the implementation of the above change is not feasible for one of the qualifying objects in S_i, then we write ρ_{S_i}(b1, b2) = +∞. A value of ρ_{S_i}(b1, b2) close to zero is interpreted as meaning that the change of values from b1 to b2 is quite easy to accomplish for the qualifying objects in S_i, whereas a large value of ρ_{S_i}(b1, b2) means that this change of values is practically very difficult to achieve for some of the qualifying objects in S_i. If ρ_{S_i}(b1, b2) < ρ_{S_i}(b3, b4), then we say that the change of values from b1 to b2 is more feasible than the change from b3 to b4. We assume here that the values ρ_{S_i}(b_{j1}, b_{j2}) are provided by experts for each of the information systems S_i. They are seen as atomic expressions and will be used to introduce the formal notions of the feasibility and the cost of action rules in S_i. So, let us assume that r = [(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ ... ∧ (bp, vp → wp)](x) ⇒ (d, k1 → k2)(x) is an (r1, r2)-action rule. By the cost of r, denoted by cost(r), we mean the value Σ{ρ_{S_i}(v_k, w_k) : 1 ≤ k ≤ p}. We say that r is feasible if cost(r) < ρ_{S_i}(k1, k2). It means that for any feasible rule r, the cost of the conditional part of r is lower than the cost of its decision part, and clearly cost(r) < +∞.
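A minimal sketch of these cost and feasibility computations (Python; the dictionary "price" stands in for the expert-provided values ρ_{S_i}, and all names are illustrative):

    INF = float("inf")

    def rule_cost(action_terms, price):
        """Cost of the conditional part of an action rule: the sum of the
        expert-given costs rho_Si(v_k, w_k) of its value changes."""
        return sum(price.get((b, v, w), INF) for (b, v, w) in action_terms)

    def is_feasible(action_terms, effect, price):
        """The rule is feasible when the cost of its conditional part is
        smaller than the cost of achieving the decision change directly."""
        d, k1, k2 = effect
        return rule_cost(action_terms, price) < price.get((d, k1, k2), INF)

    # hypothetical cost table rho_Si provided by an expert
    price = {("b", "R", "S"): 20.0, ("d", "L", "H"): 100.0}
    r = {"changes": [("b", "R", "S")], "effect": ("d", "L", "H")}
    print(rule_cost(r["changes"], price), is_feasible(r["changes"], r["effect"], price))
    # -> 20.0 True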
Assume now that d is a decision attribute in S_i, k1, k2 ∈ V_d, and the user would like to re-classify some customers in S_i from the group k1 to the group k2. To achieve this goal he may look for an appropriate action rule, possibly of the lowest cost value, to get a hint as to which attribute values have to be changed. To be more precise, let us assume that R_{S_i}[(d, k1 → k2)] denotes the set of all action rules extracted from S_i which have the term (d, k1 → k2) in their decision part. Among these rules he may identify one which has the lowest cost value. But the rule he gets may still have a cost value much too high to be of any help to him. Let us notice that the cost of the action rule r = [(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ ... ∧ (bp, vp → wp)](x) ⇒ (d, k1 → k2)(x) might be high only because of the high cost value of one of the sub-terms in the conditional part of the rule. Let us assume that (bj, vj → wj) is that term. In such a case, we may look for an action rule in R_{S_i}[(bj, vj → wj)] which has the smallest cost value. Assume that r1 = [(b_{j1}, v_{j1} → w_{j1}) ∧ (b_{j2}, v_{j2} → w_{j2}) ∧ ... ∧ (b_{jq}, v_{jq} → w_{jq})](y) ⇒ (bj, vj → wj)(y) is such a rule which is also feasible in S_i. Since x, y ∈ X_i, we can compose r with r1, getting a new feasible rule which is given below:

    [(b1, v1 → w1) ∧ ... ∧ [(b_{j1}, v_{j1} → w_{j1}) ∧ (b_{j2}, v_{j2} → w_{j2}) ∧ ... ∧ (b_{jq}, v_{jq} → w_{jq})] ∧ ... ∧ (bp, vp → wp)](x) ⇒ (d, k1 → k2)(x).

Clearly, the cost of this new rule is lower than the cost of r. However, if its support in S_i gets too low, then such a rule has no value to the user. Otherwise, we may recursively follow this strategy, trying to lower the cost of re-classifying objects from the group k1 into the group k2. Each successful step will produce a new action rule whose cost is lower than the cost of the current rule. This heuristic strategy always terminates, because there is a finite number of action rules and any action rule can be applied only once on each path of this recursive strategy. One can argue that if the set R_{S_i}[(d, k1 → k2)] contains all action rules reclassifying objects from group k1 into group k2, then any new action rule obtained as the result of the above recursive strategy should already be in that set. We do not agree with this statement, since in practice R_{S_i}[(d, k1 → k2)] is only a subset of all action rules. Firstly, it takes too much time (the complexity is exponential) to generate all possible rules from an information system, and secondly, even if we extract such rules, it still takes too much time to generate all possible action rules from them. So the applicability of the proposed recursive strategy to search for rules of lowest cost is highly justified. Again, let us assume that the user would like to reclassify some objects in S_i from the class k1 to the class k2 and that ρ_{S_i}(k1, k2) is the current cost of doing that. Each action rule in R_{S_i}[(d, k1 → k2)] gives us an alternative way to achieve the same result but at a different cost. If we limit ourselves to the system S_i, then clearly we cannot go beyond the set R_{S_i}[(d, k1 → k2)]. But, if we allow action rules to be extracted at other information systems and used jointly with the local action rules, then
the number of attributes which can be involved in reclassifying objects in S_i will increase, and thus we may further lower the cost of the desired reclassification. So, let us assume the following scenario. The action rule r = [(b1, v1 → w1) ∧ (b2, v2 → w2) ∧ ... ∧ (bp, vp → wp)](x) ⇒ (d, k1 → k2)(x), extracted from the information system S_i, is not feasible because at least one of its terms, let us say (bj, vj → wj) where 1 ≤ j ≤ p, has too high a cost ρ_{S_i}(vj, wj) assigned to it. In this case we look for a new feasible action rule r1 = [(b_{j1}, v_{j1} → w_{j1}) ∧ (b_{j2}, v_{j2} → w_{j2}) ∧ ... ∧ (b_{jq}, v_{jq} → w_{jq})](y) ⇒ (bj, vj → wj)(y) which, concatenated with r, will decrease the cost value of the desired reclassification. So, the current setting looks the same as the one we already had, except that this time we additionally assume that r1 is extracted from another information system in DS. For simplicity, we also assume that the semantics and the granularity levels of all attributes listed in both information systems are the same. By the concatenation of action rule r1 with action rule r we mean a new feasible action rule r1 ∘ r of the form:

    [(b1, v1 → w1) ∧ ... ∧ [(b_{j1}, v_{j1} → w_{j1}) ∧ (b_{j2}, v_{j2} → w_{j2}) ∧ ... ∧ (b_{jq}, v_{jq} → w_{jq})] ∧ ... ∧ (bp, vp → wp)](x) ⇒ (d, k1 → k2)(x),

where x is an object in S_i = (X_i, A_i, V_i). Some of the attributes in {b_{j1}, b_{j2}, ..., b_{jq}} may not belong to A_i. Also, the support of r1 is calculated in the information system from which r1 was extracted. Let us denote that system by S_m = (X_m, A_m, V_m) and the set of objects in X_m supporting r1 by Sup_{S_m}(r1). Assume that Sup_{S_i}(r) is the set of objects in S_i supporting the rule r. The domain of r1 ∘ r is the same as the domain of r, which is equal to Sup_{S_i}(r). Before we define the notion of similarity between two objects belonging to two different information systems, we assume that A_i = {b1, b2, b3, b4}, A_m = {b1, b2, b3, b5, b6}, and the objects x ∈ X_i, y ∈ X_m are defined by the table below:

Table 19.2. Object x from S_i and y from S_m

      | b1 | b2 | b3 | b4 | b5 | b6
  x   | v1 | v2 | v3 | v4 |    |
  y   | v1 | w2 | w3 |    | w5 | w6
The similarity ρ(x, y) between x and y is then [1 + 0 + 0 + 1/2 + 1/2 + 1/2]/6 = [2 + 1/2]/6 = 5/12. To give a more formal definition of similarity, we assume that ρ(x, y) = [Σ{ρ(bi(x), bi(y)) : bi ∈ (A_i ∪ A_m)}] / card(A_i ∪ A_m), where:
• ρ(bi(x), bi(y)) = 0, if bi(x) ≠ bi(y);
• ρ(bi(x), bi(y)) = 1, if bi(x) = bi(y);
• ρ(bi(x), bi(y)) = 1/2, if either bi(x) or bi(y) is undefined.
Let us assume that ρ(x, Sup_{S_m}(r1)) = max{ρ(x, y) : y ∈ Sup_{S_m}(r1)}, for each x ∈ Sup_{S_i}(r). By the confidence of r1 ∘ r we mean

    Conf(r1 ∘ r) = [Σ{ρ(x, Sup_{S_m}(r1)) : x ∈ Sup_{S_i}(r)} / card(Sup_{S_i}(r))] · Conf(r1) · Conf(r),

where Conf(r) is the confidence of the rule r in S_i and Conf(r1) is the confidence of the rule r1 in S_m. If we allow action rules extracted from S_i to be concatenated with action rules extracted at other sites of the DIS, we increase the total number of generated action rules, and thus our chance of lowering the cost of reclassifying objects in S_i also increases, but possibly at the price of decreased confidence.
19.5 Heuristic Strategy for the Lowest Cost Reclassification of Objects Let us assume that we wish to reclassify as many objects as possible in the system Si, which is a part of DIS, from the class described by value ki of the attribute d to the class k2. The reclassification ki -^ k2 jointly with its cost psi {ki,k2) is seen as the information stored in the initial node no of the search graph built from nodes generated recursively by feasible action rules taken initially from i?^. [(d, ki -> ^2)]. For instance, the rule r = [{bi,vi -> wi) A (62,^2 -^ ^2) A ... A {bp.Vp -^ Wp)]{x) =^ {d,ki -^k2){x) applied to the node UQ = {[ki -^ k2^ pSi (^15 ^2)]} generates the node ni = {[vi -^wi,ps,{vi,wi)],[v2 -^W2,pSiiv2,W2)],..., [Vp -^Wp,ps,{Vp,Wp)]}, and from rii we can generate the node ^2 = {[Vl -^Wi,pSi{vi,Wi)],[v2 -^W2,pSi{v2,yJ2)],'", [Vji -^ Wji,ps,(Vjl,Wji)],[Vj2 -^Wj2,pSi{Vj2,Wj2)l..., [Vjq -> VJjq^ps.iVjq.Wjq)], ..., [Vp -> Wp, ps,iVp,Wp)]} assuming that the action rule n = [{bjl^Vjl -> Wji) A {bj2,Vj2 -^ Wj2) A ... A {bjq.Vjq -^ Wjq)]{y) => {bj.Vj -^Wj){y) from Rs^ [{bjiVj -^ '^j)] is applied to ni. /see Section 4/ This information can be written equivalently as: r(no) = n i , ri(ni) = n2, [ri o r](no) = n2. Also, we should notice here that ri is extracted from S^ and Supsm (^1) ^ ^rn whereas r is extracted from 5^ and Sups^ (r) C Xi. By Sup Si (r) we mean the domain of action rule r (set of objects in 5^ supporting r). The search graph can be seen as a directed graph G which is dynamically built by applying action rules to its nodes. The initial node no of the graph G contains information coming from the user, associated with the system Si, about what objects in Xi he would like to reclassify and how and what is his current cost of this reclassification. Any other node n in G shows an alternative way to achieve the same reclassification with a cost that is lower than the cost assigned to all nodes which are preceding n in G. Clearly, the confidence of action rules labelling the path from the
initial node to the node n is as much important as the information about reclassification and its cost stored in node n. Information from what sites in DIS these action rules have been extracted and how similar the objects at these sites are to the objects in Si is important as well. Information stored at the node {[^;i -^ wi.ps,(^^1,^i)], [v2 -^ ^2,pSi{v2,W2)\,..., [vp -^ Wp,ps,{vp,Wp)]} says that by reclassifying any object x supported by rule r from the class vi to the class Wi, for any i < p, we also reclassify that object from the class ki to k2. The confidence in the reclassification of x supported by node {[vi -^ 'Wi,pSi{vi,wi)],[v2 -^ W2,pSi{y2i'i^^2)],-'A'^P ^ Wp,ps,{vp,Wp)]} IS tht Same as the confidence of the rule r. Before we give a heuristic strategy for identifying a node in G, built for a desired reclassification of objects in Si, with a cost possibly the lowest among all the nodes reachable from the node no, we have to introduce additional notations. So, assume that N is the set of nodes in our dynamically built directed graph G and no is its initial node. For any node n e N,by f{n) = {Yn,{[vn,j -^ Wnj,pSi{yn,j^'^n,j)]}jein) ^^ mcau its domain, the reclassification steps related to objects in Xi, and their cost all assigned by reclassification function f to the node n, where Yn C Xi /Graph G is built for the client site Si/. Let us assume that/(n) = (Yndb^n.k -^ ifn,fc,p5i(^n,fc,^t^n,fc)]}fc€/.)-We say that action rule r, extracted from Si, is applicable to the node n if: • •
• Y_n ∩ Sup_Si(r) ≠ ∅,
• (∃k ∈ I_n)[r ∈ R_Si[v_{n,k} → w_{n,k}]] /see Section 4 for the definition of R_Si[...]/.
Similarly, we say that an action rule r, extracted from S_m, is applicable to the node n if:
• (∃x ∈ Y_n)(∃y ∈ Sup_Sm(r))[ρ(x, y) ≤ λ] /ρ(x, y) is the similarity relation between x and y (see Section 4 for its definition) and λ is a given similarity threshold/,
• (∃k ∈ I_n)[r ∈ R_Sm[v_{n,k} → w_{n,k}]] /see Section 4 for the definition of R_Sm[...]/.
It has to be noticed that the reclassification of objects assigned to a node of G may refer to attributes which are not necessarily attributes listed in S_i. In this case, the user associated with S_i has to decide what the cost of such a reclassification is at his site, since such a cost may differ from site to site.

Now, let RA(n) be the set of all action rules applicable to the node n. We say that the node n is completely covered by action rules from RA(n) if Y_n = ∪{Sup(r) : r ∈ RA(n)}. Otherwise, we say that n is partially covered by action rules.

How is the domain Y_n of a node n in the graph G constructed for the system S_i calculated? The reclassification (d, k1 → k2), jointly with its cost ρ_Si(k1, k2), is stored in the initial node n0 of the search graph G. Its domain Y_0 is defined as the set-theoretical union of the domains of the feasible action rules in R_Si[(d, k1 → k2)] applied to X_i. This domain can still be extended by any object x ∈ X_i if the following condition holds: (∃m)(∃r ∈ R_Sm[k1 → k2])(∃y ∈ Sup_Sm(r))[ρ(x, y) ≤ λ].
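To make the applicability tests and domain construction above concrete, the sketch below (Python) encodes nodes and action rules in a minimal form; the data structures ActionRule and Node, the similarity function rho and the threshold lam are illustrative assumptions rather than part of the original formulation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActionRule:
    antecedent: tuple        # terms (attribute, v, w) of the left-hand side
    consequent: tuple        # single term (attribute, v, w)
    support: frozenset       # objects supporting the rule, Sup_S(r)
    site: str                # the system the rule was extracted from

@dataclass
class Node:
    steps: dict              # {(v, w): cost} -- reclassification steps with their costs
    domain: set              # Y_n, the objects of X_i covered by the node

def applicable_local(rule: ActionRule, node: Node) -> bool:
    """Rule extracted from S_i: its support must intersect Y_n and its
    consequent v -> w must be one of the node's pending steps."""
    _, v, w = rule.consequent
    return bool(node.domain & rule.support) and (v, w) in node.steps

def applicable_remote(rule: ActionRule, node: Node, rho, lam: float) -> bool:
    """Rule extracted from a remote site S_m: additionally, some object of Y_n
    must be lambda-similar to an object supporting the rule."""
    _, v, w = rule.consequent
    if (v, w) not in node.steps:
        return False
    return any(rho(x, y) <= lam for x in node.domain for y in rule.support)
```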
Each rule applied to the node n0 generates a new node in G whose domain is calculated in a similar way to that of n0. To be more precise, assume that n is such a node and f(n) = (Y_n, {[v_{n,k} → w_{n,k}, ρ_Si(v_{n,k}, w_{n,k})]}_{k ∈ I_n}). Its domain Y_n is defined as the set-theoretical union of the domains of the feasible action rules in ∪{R_Si[v_{n,k} → w_{n,k}] : k ∈ I_n} applied to X_i. Similarly to n0, this domain can still be extended by any object x ∈ X_i if the following condition holds: (∃m)(∃k ∈ I_n)(∃r ∈ R_Sm[v_{n,k} → w_{n,k}])(∃y ∈ Sup_Sm(r))[ρ(x, y) ≤ λ]. Clearly, for all other nodes dynamically generated in G, the definition of their domains is the same as the one above.

Property 1. An object x can be reclassified according to the data stored in a node n only if x belongs to the domain of each node along the path from the node n0 to n.

Property 2. Assume that x can be reclassified according to the data stored in a node n and f(n) = (Y_n, {[v_{n,k} → w_{n,k}, ρ_Si(v_{n,k}, w_{n,k})]}_{k ∈ I_n}). The cost Cost_{k1→k2}(n, x) assigned to the node n in reclassifying x from k1 to k2 is equal to Σ{ρ_Si(v_{n,k}, w_{n,k}) : k ∈ I_n}.

Property 3. Assume that x can be reclassified according to the data stored in a node n and that the action rules r, r1, r2, ..., rj label the edges along the path from the node n0 to n. The confidence Conf_{k1→k2}(n, x) assigned to the node n in reclassifying x from k1 to k2 is equal to Conf[rj ∘ ... ∘ r2 ∘ r1 ∘ r] /see Section 4/.

Property 4. If a node n_{j2} is a successor of the node n_{j1}, then Conf_{k1→k2}(n_{j2}, x) ≤ Conf_{k1→k2}(n_{j1}, x).

Property 5. If a node n_{j2} is a successor of the node n_{j1}, then Cost_{k1→k2}(n_{j2}, x) ≤ Cost_{k1→k2}(n_{j1}, x).

Let us assume that we wish to reclassify as many objects as possible in the system S_i, which is a part of DIS, from the class described by the value k1 of the attribute d to the class k2. We also assume that R is the set of all action rules extracted either from the system S_i or from any of its remote sites in DIS. The reclassification (d, k1 → k2), jointly with its cost ρ_Si(k1, k2), represents the information stored in the initial node n0 of the search graph G. By λ_conf we mean the minimal confidence in reclassification acceptable by the user, and by λ_cost the maximal cost the user is willing to pay for the reclassification. The algorithm Build-and-Search generates, for each object x in S_i, the reclassification rules satisfying the thresholds for minimal confidence and maximal cost.

Algorithm Build-and-Search(R, x, λ_conf, λ_cost, n, m);
Input: set of action rules R, object x which the user would like to reclassify, threshold value λ_conf for minimal confidence, threshold value λ_cost for maximal cost, node n of the graph G.
Output: node m representing an acceptable reclassification of objects from S_i.
begin
if Cost_{k1→k2}(n, x) > λ_cost then
generate all successors of n using rules from R;
while n_i is a successor of n do
  if Conf_{k1→k2}(n_i, x) < λ_conf then stop
  else if Cost_{k1→k2}(n_i, x) ≤ λ_cost then Output[n_i]
  else Build-and-Search(R, x, λ_conf, λ_cost, n_i, m)
end

Now, calling the procedure Build-and-Search(R, x, λ_conf, λ_cost, n0, m), we get the reclassification rules for x satisfying the thresholds for minimal confidence and maximal cost. The procedure stops on the first node n which satisfies both thresholds: λ_conf for minimal confidence and λ_cost for maximal cost. Clearly, this strategy can be enhanced by allowing recursive calls on any node n when both thresholds are satisfied by n, and forcing the recursive calls to stop on the first node n_i succeeding n for which Cost_{k1→k2}(n_i, x) ≤ λ_cost and Conf_{k1→k2}(n_i, x) < λ_conf. Then, the recursive procedure should terminate not on n_i but on the node which is its direct predecessor.
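For illustration only, the control flow of Build-and-Search can be rendered in Python as below; successors, cost and conf are assumed callbacks standing for the node expansion step and for the Cost and Conf measures of Properties 2 and 3, and the original "stop" is realized as pruning of the current branch.

```python
def build_and_search(rules, x, lam_conf, lam_cost, n, successors, cost, conf, out):
    """Collects in `out` the nodes representing reclassifications of object x
    that satisfy both thresholds (minimal confidence, maximal cost)."""
    if cost(n, x) <= lam_cost:                 # n already acceptable: nothing to expand
        return
    for n_i in successors(n, rules):           # successors generated by applicable rules
        if conf(n_i, x) < lam_conf:            # confidence only decreases along a path,
            continue                           # so this branch can be pruned ('stop')
        if cost(n_i, x) <= lam_cost:
            out.append(n_i)                    # first node satisfying both thresholds
        else:
            build_and_search(rules, x, lam_conf, lam_cost, n_i,
                             successors, cost, conf, out)
```

The call build_and_search(R, x, lam_conf, lam_cost, n0, ...) then corresponds to Build-and-Search(R, x, λ_conf, λ_cost, n0, m) above.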
19.6 Conclusion

The root of the directed search graph G is used to store information about objects assigned to a certain class, jointly with the cost of reclassifying them to a new, desired class. Each node in the graph G shows an alternative way to achieve the same goal. The reclassification strategy assigned to a node n has a cost lower than the cost of the reclassification strategy assigned to its parent. Any node n in G can be reached from the root by following one or more paths. This means that the confidence of the reclassification strategy assigned to n should be calculated as the maximum confidence among the confidences assigned to all paths from the root of G to n.

The search strategy based on the dynamic construction of the graph G (described in the previous section) is exponential with respect to the number of active dimensions in all information systems involved in the search for a possibly cheapest reclassification strategy. This strategy is also exponential with respect to the number of values of flexible attributes in all information systems involved in that search. We believe that the most promising strategy should be based on a global ontology [14] showing the semantic relationships between concepts (attributes and their values) used to define objects in DIS. These relationships can be used by a search algorithm to decide which path in the search graph G should be explored first. If sufficient information from the global ontology is not available, probabilistic strategies (the Monte Carlo method) can be used to decide which path in G to follow.
References

1. Adomavicius, G., Tuzhilin, A., (1997), Discovery of actionable patterns in databases: the action hierarchy approach, in Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD97), Newport Beach, CA, AAAI Press, 1997
2. Liu, B., Hsu, W., Chen, S., (1997), Using general impressions to analyze discovered classification rules, in Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD97), Newport Beach, CA, AAAI Press, 1997
3. Liu, B., Hsu, W., Mun, L.-F., (1996), Finding interesting patterns using user expectations, DISCS Technical Report No. 7, 1996
4. Pawlak, Z., (1985), Rough sets and decision tables, in Lecture Notes in Computer Science 208, Springer-Verlag, 1985, 186-196
5. Pawlak, Z., (1991), Rough Sets: Theoretical aspects of reasoning about data, Kluwer Academic Publishers, 1991
6. Ras, Z., Dardzinska, A., (2002), Handling semantic inconsistencies in query answering based on distributed knowledge mining, in Foundations of Intelligent Systems, Proceedings of the ISMIS'02 Symposium, LNCS/LNAI, No. 2366, Springer-Verlag, 2002, 66-74
7. Ras, Z., Wieczorkowska, A., (2000), Action Rules: how to increase profit of a company, in Principles of Data Mining and Knowledge Discovery (Eds. D.A. Zighed, J. Komorowski, J. Zytkow), Proceedings of PKDD'00, Lyon, France, LNCS/LNAI, No. 1910, Springer-Verlag, 2000, 587-592
8. Ras, Z.W., Tsay, L.-S., (2003), Discovering Extended Action-Rules (System DEAR), in Intelligent Information Systems 2003, Proceedings of the IIS'2003 Symposium, Zakopane, Poland, Advances in Soft Computing, Springer-Verlag, 2003, 293-300
9. Ras, Z.W., Tzacheva, A., (2003), Discovering semantic inconsistencies to improve action rules mining, in Intelligent Information Systems 2003, Advances in Soft Computing, Proceedings of the IIS'2003 Symposium, Zakopane, Poland, Springer-Verlag, 2003, 301-310
10. Ras, Z., Gupta, S., (2002), Global action rules in distributed knowledge systems, in Fundamenta Informaticae Journal, IOS Press, Vol. 51, No. 1-2, 2002, 175-184
11. Silberschatz, A., Tuzhilin, A., (1995), On subjective measures of interestingness in knowledge discovery, in Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD95), AAAI Press, 1995
12. Silberschatz, A., Tuzhilin, A., (1996), What makes patterns interesting in knowledge discovery systems, in IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, 1996
13. Skowron, A., Grzymala-Busse, J., (1991), From the Rough Set Theory to the Evidence Theory, in ICS Research Reports, 8/91, Warsaw University of Technology, October, 1991
14. Sowa, J.F., (2000), Ontology, Metadata and Semiotics, in Conceptual Structures: Logical, Linguistic, and Computational Issues, B. Ganter, G.W. Mineau (Eds), LNAI 1867, Springer-Verlag, 2000, 55-81
15. Suzuki, E., Kodratoff, Y., (1998), Discovery of surprising exception rules based on intensity of implication, in Proc. of the Second Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 1998
20

Circularity in Rule Knowledge Bases Detection using Decision Unit Approach

Roman Siminski and Alicja Wakulicz-Deja

University of Silesia, Institute of Computer Science, Będzińska 39, 41-200 Sosnowiec, Poland
[email protected], wakulicz@us.edu.pl
20.1 Introduction

Expert systems are programs that have extended the range of application of software systems to non-structural, ill-defined problems. Perhaps the crucial characteristic of expert systems that distinguishes them from classical software systems is the impossibility of obtaining a correct and complete formal specification. This comes from the nature of the knowledge engineering process, which is essentially a modelling discipline, where the result of the modelling activity is the modelling process itself. Expert systems are programs that solve problems using knowledge acquired, usually, from human experts in the problem domain, as opposed to conventional software that solves problems by following algorithms. But expert systems are programs, and programs must be validated.

Regarding the classical definition of validation, as stated in [1]: determination of the correctness of a program with respect to the user needs and requirements, we claim that it is adequate for knowledge-based systems (KBS). But we encounter differences if we try to use the classical verification methods of software engineering. The tasks performed by knowledge-based systems usually cannot be correctly and completely specified; these tasks are usually ill-structured and no efficient algorithmic approach is known for them. KBS are constructed using declarative languages (usually rule-based) that are interpreted by inference engines. This kind of programming is concerned with truth values, rule dependencies and heuristic associations, in contrast to conventional programming, which deals with variables, conditionals, loops and procedures. The knowledge base of an expert system contains a program, usually constructed using rule-based languages; the knowledge engineer uses declarative languages or specialized expert system shells.

In this work we concentrate our attention on the verification of rule knowledge bases of expert systems. We assume that the inference engine and other parts of the expert system do not need any verification, for example because they derive their properties from a commercial expert system shell. Although the basic validation concepts are common for knowledge and software engineering, we encounter difficulties if we try to apply the classical definitions of
verification and validation (from software engineering) to knowledge engineering. Verification methods for conventional software are not directly applicable to expert systems, and new, specific verification methods are required. In our previous works [2, 3, 4, 5] we presented some theoretical and practical information about verification and validation of knowledge bases, as well as some of the best-known methods and tools described in the references. Perhaps the best reference materials can be found on Alun Preece's home page (http://www.csd.abdn.ac.uk/~apreece), especially in [8, 9, 10, 11, 12].

We can identify several kinds of anomalies in rule knowledge bases. A. Preece divides them into four groups: redundancy, ambivalence, circularity and deficiency. In the present work we discuss only one kind of anomaly: circularity. Circular rule sequences are undesirable because they may cause endless loops, as long as the inference system does not recognize them at execution time. We present a circularity detection algorithm based on the decision unit conception described in detail in [6, 7].
20.2 Circularity - the problem in backward chaining systems

Circularity presents an urgent problem in backward chaining systems. A knowledge base has circularity iff it contains some set of rules such that a loop could occur when the rules are fired. In other words, a knowledge base is circular if it contains a circular sequence of rules, that is, a sequence of rules such that the right-hand side of all but the last rule is contained in the left-hand side of the next rule in the sequence, and the right-hand side of the last rule is contained in the left-hand side of the first rule of the sequence. More formally [10], a knowledge base R contains circular dependencies if there is a hypothesis H that unifies with the consequent of a rule R in the rule base R, where R is firable only when H is supplied as an input to R:

(∃R ∈ R)(∃E ∈ E)(∃H ∈ H)[H = conseq(R) ∧ ¬firable(R, R, E) ∧ firable(R, R, E ∪ {H})]    (20.1)

where the function conseq(R) supplies the literal from the consequent of the rule R: for R = L1 ∧ L2 ∧ ... ∧ Lm → M, conseq(R) = M. E, called an environment, is a subset of the legal input literals (one that does not violate a semantic constraint). H, the set of inferable hypotheses, is defined as the set of literals appearing in the consequents and their instances: H ∈ H if (∃R ∈ R)(conseq(R) = H). The predicate firable states that a rule R ∈ R is firable if there is some environment E such that the antecedent of R is a logical consequence of supplying E as input to R.

We can distinguish between a direct cycle, where a rule calls itself:

P(x) ∧ R(x) → R(x)    (20.2)

and an indirect cycle, where a sequence of rules leads back to its starting literal:

R1: P(x) ∧ Q(x) → R(x)
R2: R(x) ∧ S(x) → P(x)    (20.3)
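As a small illustration (not part of the original chapter), a rule base of this form can be checked for circular sequences by treating each rule as a set of edges from its antecedent literals to its consequent and searching the resulting directed graph for a cycle; the rule representation below is an assumption made for the sketch.

```python
def has_circularity(rules):
    """rules: list of (antecedent_literals, consequent_literal).
    Returns True if some sequence of rules leads from a literal back to itself."""
    edges = {}
    for antecedent, consequent in rules:       # consequent depends on each antecedent literal
        for lit in antecedent:
            edges.setdefault(lit, set()).add(consequent)

    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def dfs(lit):
        colour[lit] = GREY
        for nxt in edges.get(lit, ()):
            c = colour.get(nxt, WHITE)
            if c == GREY or (c == WHITE and dfs(nxt)):
                return True                    # reached a literal on the current path
        colour[lit] = BLACK
        return False

    return any(colour.get(l, WHITE) == WHITE and dfs(l) for l in edges)

# the indirect cycle formed by R1 and R2 above:
print(has_circularity([({'P', 'Q'}, 'R'), ({'R', 'S'}, 'P')]))   # True
```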
Fig. 20.1. An example of a circular rule sequence
20.3 Decision units

In real-world rule knowledge bases, literals are often coded using attribute-value pairs. In this chapter we briefly introduce the conception of decision units determined on a rule base containing Horn clause rules, where literals are coded using attribute-value pairs. We assume backward inference. A decision unit U is defined as a triple U = (I, O, R), where I denotes a set of input entries, O denotes a set of output entries and R denotes a set of rules fulfilling a given grouping criterion. These sets are defined as follows:

I = {(attr_i, val_ij) : ∃r ∈ R, (attr_i, val_ij) ∈ antec(r)}
O = {(attr_i, val_ij) : ∀r ∈ R, attr_i = conclAttr(r)}
R = {r_i : ∀i ≠ j, r_i, r_j ∈ R, conclAttr(r_i) = conclAttr(r_j)}    (20.4)
Two functions are defined on a rule r: conclAttr(r) returns the attribute from the conclusion of the rule r, and antec(r) is the set of conditions of the rule r. As can be seen, the decision unit U contains the set of rules R, where each rule r ∈ R has the same attribute in the literal appearing in its conclusion part. All rules grouped within a decision unit take part in an inference process confirming the goal described by the attribute appearing in the conclusion part of each rule. Such a unit is often considered to be a part of a decision system, and thus it is called a decision unit. All pairs (attribute, value) appearing in the conditional parts of the rules are called decision unit input entries, whilst all pairs (attribute, value) appearing in the conclusion parts of the rules of the set R are called decision unit output entries.

Summarising, the idea of decision units allows arranging rule-based knowledge according to a clear and simple criterion. Rules within a given unit work out or confirm the goal determined by a single attribute. When the rules are closed within a given unit and the sets of input and output entries are introduced, it is possible to review the base at a higher abstraction level. This simultaneously reveals the global connections, which are difficult to detect immediately on the basis of inspecting the rule list. The decision unit idea can be well used in the knowledge base verification and
Fig. 20.2. The structure of the decision unit U (rule set R with input entries I and output entries O)
validation process and in the pragmatic issue of modelling, which is the subject presented later in this paper.
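Following the grouping criterion of this section, a decision unit net can be derived from a rule list by collecting all rules sharing the same conclusion attribute; the sketch below is a minimal Python illustration, with the rule representation (conditions and conclusion as attribute-value pairs) assumed for the example.

```python
from collections import defaultdict

def decision_units(rules):
    """rules: list of dicts {'cond': [(attr, val), ...], 'concl': (attr, val)}.
    Returns one unit per conclusion attribute: its rules R, input entries I, output entries O."""
    units = defaultdict(lambda: {'I': set(), 'O': set(), 'R': []})
    for r in rules:
        unit = units[r['concl'][0]]      # grouping criterion: the conclusion attribute
        unit['R'].append(r)
        unit['O'].add(r['concl'])        # output entries of the unit
        unit['I'].update(r['cond'])      # input entries of the unit
    return dict(units)

# two rules confirming the attribute 'c' form a single decision unit:
rules = [{'cond': [('a', 1), ('b', 2)], 'concl': ('c', 1)},
         {'cond': [('a', 2)],           'concl': ('c', 2)}]
print(sorted(decision_units(rules)['c']['I']))   # [('a', 1), ('a', 2), ('b', 2)]
```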
20.4 Decision units in knowledge base verification

The introduction of decision units allows dividing anomalies into local and global ones:

• local anomalies appear within a decision unit considered individually, and their detection is local;
• global anomalies are disclosed at the level of the decision unit net; their detection is based on the analysis of connections between units and is global.
A single decision unit can be considered as a model of an elementary, partial decision worked out by the system. The reason is that all rules constituting a decision unit have the same conclusion attribute. All conclusions create a set of unit output entries specifying the inference goals that can be confirmed. The decision unit net allows us to formulate a global verification method. On the strength of the analysis of connections between decision units, it is possible to detect local anomalies in rules, such as deficiency, redundancy, incoherence or circularity, which create chains during an inference process. We can apply considerations at the unit level using black box and glass box techniques. Abstracting from the internal structure of the units which create the net allows us to detect characteristic symptoms of global anomalies. This can then trigger a detailed analysis taking into account the internal structure of each unit. Such an analysis is nevertheless limited to a given fragment of the net, indicated previously by the black box verification method.
20.5 Circular relationship detection technique using decision units

There is one particular case of circularity: circularity inside a decision unit. This is an example of a local anomaly. We can detect this kind of circularity at the local level by building a local causal graph; this case is presented in Fig. 20.3. The global circular rule relationship detection technique shall be presented by example. Figure 20.4a presents such an example. A net can be described as a directed graph. After abandoning the distinction between input and output entries and after rejecting the vertices which stand for disjoint input and output entries, the graph assumes the shape presented in Figure 20.4b. Such a graph shall be called a global relationship decision unit graph. As can be seen, there are two cycles: 1-2-3-1 and 1-3-1.
Fig. 20.3. An example of circularity in a decision unit - a local causal graph
The presence of cycles can indicate the appearance of a cyclic relationship in the considered rule base. Figure 20.4c presents a case where there is no cyclic relationship; the arcs denote particular rules (to make the graph clearer, the textual descriptions have been omitted). On the contrary, Figure 20.4d presents a case where both cycles previously present in Figure 20.4b stand for real cycles in the rule base. Thus, the presence of a cyclic relationship in the decision unit relationship graph is an indication to carry out an inspection for the presence of cyclic relationships at the global level. This can be achieved by creating a suitable cause-effect (causal) graph representing the relations between the input and output entries of the units involved in the cyclic relations indicated by the decision unit relationship diagram. The scanned graph shall consist of only the nodes and arcs necessary to determine circularity, which limits the scanned area.

Fig. 20.4. Circularity in the decision units net
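One possible implementation of this global check, given only as an illustration, connects unit u to unit v whenever an output entry of u occurs among the input entries of v, and then searches the resulting directed graph for cycles such as 1-2-3-1 and 1-3-1; the unit representation is the same assumed dict (with 'I' and 'O' entry sets) as in the sketch of Section 20.3.

```python
def unit_graph(units):
    """Directed edge u -> v if an output entry of unit u is an input entry of unit v."""
    return {u: {v for v in units if v != u and units[u]['O'] & units[v]['I']}
            for u in units}

def cyclic_units(graph):
    """Units lying on some cycle of the relationship graph; only these net fragments
    require the detailed (glass box) inspection of the local causal graphs."""
    suspicious, path, seen = set(), [], set()

    def dfs(u):
        if u in path:                               # closed a cycle on the current path
            suspicious.update(path[path.index(u):])
            return
        if u in seen:
            return
        seen.add(u)
        path.append(u)
        for v in graph.get(u, ()):
            dfs(v)
        path.pop()

    for u in graph:
        dfs(u)
    return suspicious
```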
20.6 Summary

This paper presents the usage of decision units in the circularity detection task. Decision units allow a modular organisation of the rule knowledge base, which facilitates programming and base verification, simultaneously increasing the clarity of the achieved results and the ergonomics of the knowledge engineer's labour. Decision units are simple and intuitive models describing relations in a rule knowledge base, being a direct superstructure of the rule-base model. Decision units allow us to reduce the search space in the circularity detection task. Thus, decision units can be considered as a source of simple heuristics which make verification simpler and more efficient. In the same way we can reduce search spaces in other verification tasks, e.g. in redundancy detection in rule chains, by taking into account only selected paths in rule chains and performing verification more quickly. The knowledge base shown as a decision unit
net allows us to make a convenient presentation in graphic form, both on a computer screen and in print. Thus, we can present verification results in a clear and user-friendly form. The difficulties of rule base verification can be described in a natural and intuitive way. Detected anomalies and suggested methods of their elimination can be presented to the knowledge engineer in a suggestive way, without the necessity of referring to complex conceptual issues.
References

1. Adrion W.R., Branstad M.A., Cherniavsky J.C. (1982), Validation, verification and testing of computer software, ACM Computing Surveys, June, 14(2), pp. 159-192.
2. Siminski R., Wakulicz-Deja A. (1998), Principles and Practice in Knowledge Bases Verification, Proceedings of IIS VII, Intelligent Information Systems, Malbork, Poland, 15-19.06.1998, pp. 203-211.
3. Siminski R. (1998), Methods and Tools for Knowledge Bases Verification and Validation, Proceedings of CAI'98 - Colloquia in Artificial Intelligence, 28-30.9.1998, Lodz, Poland.
4. Siminski R., Wakulicz-Deja A. (1998), Principles and Practice in Knowledge Bases Verification, Proceedings of IIS'98 - Intelligent Information Systems VII, 15-19.6.1998, Malbork, Poland.
5. Siminski R., Wakulicz-Deja A. (1999), Dynamic Verification of Knowledge Bases, Proceedings of IIS'99, Intelligent Information Systems VIII, 14-18.6.1999, Ustron, Poland.
6. Siminski R., Wakulicz-Deja A. (2000), Verification of Rule Knowledge Bases Using Decision Units, Advances in Soft Computing, Intelligent Information Systems, Physica-Verlag, Springer Verlag Company, 2000.
7. Siminski R., Wakulicz-Deja A. (2003), Decision units as a tool for rule base modeling and verification, Proceedings of Intelligent Information Systems: Intelligent Information Processing and Web Mining, 2-5.6.2003, Zakopane, Poland, Advances in Soft Computing, Physica-Verlag, Springer Verlag Company, 2003, pp. 553-556.
8. Preece A.D. (1991), Methods for Verifying Expert System Knowledge Bases, [email protected].
9. Preece A.D. (1991a), Verifying expert system knowledge bases: An example, [email protected].
10. Preece A.D. (1994), Foundation and Application of Knowledge Base Verification, International Journal of Intelligent Systems, 9, pp. 683-701.
11. Preece A.D., Batarekh A., Shinghal R. (1990), Verifying Rule-Based Systems, [email protected].
12. Preece A.D., Shinghal R., Batarekh A. (1992), Principles and Practices in Verifying Rule-Based Systems, Knowledge Engineering Review, vol. 7, no. 2, pp. 115-141, [email protected].
21

Feedforward Concept Networks*

Dominik Ślęzak^{1,2}, Marcin Szczuka^3, and Jakub Wróblewski^2

^1 Department of Computer Science, University of Regina, Regina, SK, S4S 0A2, Canada
^2 Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warsaw, Poland
^3 Institute of Mathematics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland
[email protected], [email protected], jakubw@pjwstk.edu.pl

* Supported by grant 3T11C00226 from the Polish Ministry of Scientific Research and Information Technology. The first author was also supported by a grant from the Faculty of Science, University of Regina.
Summary. The paper presents an approach to the construction of hierarchical structures of data-based concepts (granules), extending the idea of feedforward neural networks. The operations of processing the concept information and changing the concept specification through the network layers are discussed. Examples of the concepts and their connections are provided with respect to a case study of learning hierarchical rule-based classifiers from data. The proposed methods are related to the foundations of granular and rough-neural computing.
21.1 Introduction

If we take a look at the standard approach to classification and decision support with the use of learning systems, we quickly realize that it does not always fit the purpose. Equipped with a hypothesis formation (learning) algorithm, we attempt to find a possibly direct mapping from the input values to decisions. Such an approach does not always result in success, for various reasons. We address the situation when the desired solution should be more fine-grained, namely, it should have an internal structure. Although possibly hard to find and learn, such an architecture repays us by providing significant extensions in terms of flexibility, generality and expressiveness of the yielded model.

We attempt to show our view on the process of construction and tuning of hierarchical structures of concepts (which can also be referred to as granules of knowledge [11, 13, 14]). We address these structures as feedforward concept networks, which can be regarded as a special case of the hierarchical structures developed within the rough-neural computing (RNC) methodology [6, 8, 9]. In particular, we consider
classifier networks, where the input concepts correspond to the classified objects' behavior with respect to standard classifiers and the output (target) concept reflects the final classification. The basic idea is that the relationship between such input and output concepts (granules) is not direct but based on internal layers of intermediate elements, which help in a more reliable transition from the basic information to a possibly compound classification goal.

We strive for a formalization of our approach using analogies rooted in other areas, such as artificial neural networks [3, 4], ensembles of classifiers [2, 12, 21], and layered learning [20]. We show how the presented ideas can be exploited within the wider frameworks of rough-neural and granular computing. We also make an effort to provide examples of the actual models outlined in our earlier, application-oriented papers [18, 19].

The paper starts with a general overview sketching the main points of the proposed approach. We provide some intuitions and familiarize the reader with the mechanisms present in our proposed model. Then, we go step by step through a formalization describing the kinds of dependencies that drive the whole approach. Where possible, we provide examples to better ground the ideas.
21.2 Hierarchical learning and classification

Let us start by explaining how we intend to treat the notion of a concept. In general, a concept is an element drawn from a parameterized concept space. By a proper setting of these parameters we choose the right concept. Note that we do not initially demand that all concepts come from the same space. Such an informal definition of a concept space can be referred to the notion of an information granule system S = (G, R, Sem), where G is a set of parameterized formulas called information granules, R is a (parameterized) relational structure, and Sem is the semantics of G in R (cf. [14]). In our approach, we focus especially on concept parametrization and on the ability of parameterized construction of new concepts from others. In this sense, our understanding of a concept space can be regarded as equivalent to an information granule system, and the terms concept and granule can be used exchangeably.

Let a concept represent an element acting on the basis of information originating from other concepts or directly from the data source. To better depict the whole structure, it is convenient to exploit the analogy with artificial neural networks. In this case, a concept corresponds to a signal transmitted through a neuron, the basic computing unit. Dependencies between concepts, their precedence and importance, are represented by weighted connections between nodes. Similarly to a feedforward neural network, operations can be performed from bottom to top. They can correspond to the following goals:

Construction of compound concepts from elementary ones. It can be observed in case-based reasoning (cf. [5]), layered learning (cf. [20]), as well as rough mereology [10] and rough-neural computing [6, 8, 9], where we want to approximate target concepts step by step, using simpler concepts that are easier to learn directly from data.
Construction of simple concepts from advanced ones. It can be considered for the synthesis of classifiers, where we start from compound concepts (granules) reflecting the behavior of a given object with respect to particular, often compound classification systems, and we tend to obtain a very simple concept of the decision class that the object should belong to [9, 11].

The first goal corresponds to the generalization of simple concepts, while the second corresponds to the instantiation of a general concept in a simpler, more specialized concept (cf. [16]). Obviously, we do not assume that the above are the only possible types of constructions. For instance, in a classification problem, decision classes can have a compound semantics requiring gradual specification, corresponding to the first type of construction. Then, once we reach an appropriate level of expressiveness, we follow the second scenario to synthesize those compound specifications towards obtaining the final response of the classifier network.
21.3 General network architecture

When considering hierarchical structures for compound concept formation, several issues pop up. At the very general level of hierarchy construction/learning, one has to make choices with respect to homogeneity and synchronization. We mention below how these factors determine the complexity of the construction task.

Homogeneous vs. heterogeneous. At each level of the hierarchy we make a choice of the type of concepts (granules) to be used. In the simplest case each node implements the same type of mapping. We have studied such a fully homogeneous system in [18, 19] to express probabilistic classifiers based on rough set reducts [12] and the Naive Bayes approach. The first step towards heterogeneity is to permit different types of concepts to be used at various levels of the hierarchy, while retaining uniformity across a single layer. This creates a typical layered learning model [20]. Finally, we may remove all restrictions on the uniformity of models in neighboring nodes. In this way we produce a more general but harder to control structure.

Synchronous vs. asynchronous. This issue is concerned with the layout of connections between nodes. If it has an easily recognizable layered structure, we regard it as synchronized. In other words, we can analyze the hierarchical structure in a level-by-level manner and, consequently, have the ability to clearly indicate the level of abstraction for composite concepts. If we permit the connections to be established on a less restrictive basis, the synchronization is lost. Then, nodes from non-consecutive levels may interact and the whole idea of simple-to-compound precedence of concepts becomes less usable.

The layouts of classifier networks for various levels of homogeneity and synchronization are illustrated in Figure 21.1. The simplest case of a homogeneous and synchronized network corresponds to Figure 21.1a. The partly homogeneous, synchronized architecture that we are attempting to formalize in this paper is shown in Figure 21.1b. Figures 21.1c and 21.1d represent the harder cases.
One can see that other cases are also possible. For instance, we can consider the asynchronous but homogeneous network described in [1], where the nodes correspond semantically to the complex concepts we want to approximate, although syntactically the operations within the nodes remain of the same type, regardless of whether those nodes represent advanced or very initial concepts.
Fig. 21.1. Examples of network layout: a. both synchronized and homogeneous; b. synchronized and partly heterogeneous; c. synchronized and heterogeneous; d. neither synchronized nor homogeneous.
21.4 Hierarchical concept schemes

In this section we present a general notation for feedforward networks transmitting concepts. Since we restrict ourselves to the two easier architecture cases illustrated by Figures 21.1a and 21.1b, we can consider the following notion:

Definition 1. By a hierarchical concept scheme we mean a tuple (C, MAP), where C = {C_1, ..., C_n, C} is a collection of concept spaces (information granule systems), C is called the target concept space, and the concept mappings

MAP = {map_i : C_i → C_{i+1} : i = 1, ..., n; C_{n+1} = C}    (21.1)
are the functions linking consecutive concept spaces. We assume that any feedforward concept network corresponds to (C, MAP), i.e. each i-th layer provides us with the elements of C_i. In the case of total homogeneity, we have the equalities C_1 = ... = C_n = C and map_1 = ... = map_n = identity. For a partly homogeneous architecture, some of the mappings can remain identities, but we should also expect non-trivial mappings between concepts of an entirely different nature, where C_i ≠ C_{i+1}.
Following the structure of a feedforward neural network, we calculate the inputs to each next layer as combinations of the concepts from the previous one. In general, we cannot expect the traditional definition of a linear combination to apply. Still, intuition says that the labels of connections should somehow express the level of the concepts' importance in the formation of the new ones. We refer to this intuition in terms of so-called generalized linear combinations:

Definition 2. A feedforward concept scheme is a triple (C, MAP, LIN), where

LIN = {lin_i : 2^{C_i × W_i} → C_i : i = 1, ..., n}    (21.2)

defines generalized linear combinations over the concept spaces C_i. For any i = 1, ..., n, W_i denotes the space of the combination parameters. If W_i is a partial or total ordering, then we interpret its elements as weights reflecting the relative importance of particular concepts in the construction of the resulting concept.

Let m(i) ∈ N denote the number of nodes in the i-th network layer. For any i = 1, ..., n, the nodes of the i-th and (i+1)-th layers are connected by links labeled with parameters w^{j(i)}_{j(i+1)} ∈ W_i, for j(i) = 1, ..., m(i) and j(i+1) = 1, ..., m(i+1). For any collection of concepts c^1_i, ..., c^{m(i)}_i ∈ C_i occurring as the outputs of the i-th network layer in a given situation, the input to the j(i+1)-th node in the (i+1)-th layer takes the following form:

c^{j(i+1)}_{in} = map_i(lin_i({(c^{j(i)}_i, w^{j(i)}_{j(i+1)}) : j(i) = 1, ..., m(i)}))    (21.3)
(21.4)
as shown in Figure 21.2b. These two possibilities reflect construction tendencies described in Section 21.2. Function (21.4) can be applied to construction of more compound concepts parameterized by the elements of Wi, while the usage of Definitions 1 and 2 results rather in potential syntactical simplification of the new concepts (which can, however, still become more compound semantically). One can see that function genmap and the corresponding illustration 21.2b refer directly to the ideas of synthesizing concepts (granules, standards, approximations) known from rough-neural computing, rough mereology, and the theory of approximation spaces (cf. [6, 11, 14]). On the other hand, splitting genmap's functionality, as proposed by formula (21.3) and illustrated in 21.2a, provides us with a framework more comparable to the original artificial neural networks and their supervised learning capabiHties (cf. [19,18]).
286
Dominik Slf zak, Marcin Szczuka, and Jakub Wroblewski
21.5 Weighted compound concepts Beginning with the input layer of the network, we expect it to provide the conceptssignals c j , . . . , c^^^^ G Ci, which will be then transmitted towards the target layer using (21.3). If we learn the network related directly to real-valued training sample, then we get Ci = R, lirii can be defined as classical linear combination (with Wi = M), and mapi as identity. An example of a more compound concept space originates from our previous studies [18, 19]:
C;]^7|^^'^ = map.
olin^
Fig. 21.2. Production of new concepts in consecutive layers: a. the concepts arefirstweighted and combined within the original space Ci using function lirii and then mapped to a new concept in Ct+i; b. the concepts are transformed directly to the new space Ci-^i by using the generalized concept mapping (21.4). Example 1. Let us assume that the input layer nodes correspond to various classifiers and the task is to combine them within a general system, which synthesizes the input classifications in an optimal way. For any object, each input classifier induces possibly incomplete vector of beliefs in the object's membership to particular decision classes. Let DEC denote the set of decision classes specified for a given classification problem. By the weighted decision space WDEC we mean the family of subsets of DEC with elements labeled by their beliefs, i.e.:
WDEC=
U
{{k,iik)'keX,yik^
(21.5)
XCDEC
Any weighted decision Jl = {(fc,jLtfc) : A; G Xjx^fjik G R} corresponds to a subset Xji C DEC of decision classes for which the beliefs fik ^^ are known. Another example corresponds to the specific classifiers - the sets of decision rules obtained using the methodology of rough sets [12, 21]. The way of parametrization is comparable to the proceedings with classification granules in [11, 14]. Example 2. Let DESC denote the family of logical descriptions, which can be used to define decision rules for a given classification problem. Every rule is labeled with its description amie ^ DESC and decision information, which takes - in the most
21 Feedforward Concept Networks
287
general framework - the form of Jlmie ^ WDEC. For a new object, we measure its degree of satisfaction of the rule's description (usually zero-one), combine it with the number of training objects satisfying amie, and come out with the number appmie € M expressing the level of rule's applicability to this object. As a result, by the decision rule set space RULS we mean the family of all sets of elements of DESC labeled by weighted decision sets and the degrees of applicability, i.e.: RULS =
(J
{(a, Jl, app) :a£X,p£
WDEC, app £ E}
(21.6)
XCDESC
Definition 3. By a weighted compound concept space C we mean a space of collections of sub-concepts/ram some sub-concept space S (possibly from several spaces), labeled with the concept parameters/ram a given space V, i.e.: C^
[j {{s,Vs):seX,VseV} xcs
(21.7)
For a given c= {{s,Vs) : s e X^ Vg G V}, where Xc C S is the range ofc, parameters Vs EV reflect relative importance of sub-concepts s G Xc within Ci. Just like in case of combination parameters Wi in Definition 2, we can assume a partial or total ordering over the concept parameters. A perfect situation would be then to be able to combine these two kinds of parameters while calculating the generalized linear combinations and observe how the sub-concepts from various outputs of the previous layer fight for their importance in the next one. For the sake of simplicity, we further restrict ourselves to the case of real numbers, as stated by Definition 4. However, in general Wi does not need to be E. Let us consider a classifier network, similar to Example 2, where decision rules are described by parameters of accuracy and importance (initially equal to their support). A concept transmitted by network refers to rules matched by an input object. The generalized linear combination of such concepts may be parameterized by vectors (w^O) G Wi and defined as a union of rules, where importance is expressed by w and 9 states a threshold for the rules' accuracy. Definition 4. Let the i-th network layer correspond to the weighted compound concept space Ci based on sub-concept space Si and parameters V^ = E. Consider the j{i -\-l)-th node in the next layer We define its input as follows:
lini{{{4^'\w^g,^) : j(i) = l,...,m(i)}) =
(21.8)
where Xj(^i) C Si is simplified notation for the range of the weighted compound concept c^^*^ and Vs G E denotes the importance of sub-concept s e Si in c^^^K Formula (21.8) can be applied both to WDEC and RULS. In case of WDEC, the sub-concept space equals to DEC. The sum J2j(i)-sex i ^^(I-i-i)^« gathers the
288
Dominik Sl^zak, Marcin Szczuka, and Jakub Wroblewski
weighted beliefs of the previous layer's nodes in the given decision class s e DEC. In the case of RULS we do the same with the weighted applicability degrees for the elements-rules belonging to the sub-concept space i ^ E ^ C x WDEC. It is interesting to compare our method of the parameterized concept transformation with the way of proceeding with classification granules and decision rules in the other rough set based approaches [11, 12, 14, 21]. Actually, at this level, we do not provide anything novel but rewriting well known examples within a more unified framework. A more visible difference can be observed in the next section, where we complete our methodology.
O
to be ^ / \ classified
RULS layers
WDEC^l—I for the ' I 1 nhiect
Fig. 21.3. The network-based object classification: the previously trained decision rule sets are activated by an object by means of their applicabihty to its classification; then the rule set concepts are processed and mapped to the weighted decisions using function (21.9); finally the most appropriate decision for the given object is produced.
21.6 Activation functions The possible layout combining the concept spaces DEC, WDEC, and RULS with the partly homogeneous classifier network is illustrated by Figure 21.3. Given a new object, we initiate the input layer with the degrees of applicability of the rules in particular rule-sets to this object. After processing with this type of concept along (possibly) several layers, we use the concept mapping function map{ruls) = { (fc, Eia,ji,app)eruis:keXj, ^PP ' f^k) : k € U(,a,fi,app)Eruls
4
X;
(21.9) that is we simply summarize the beliefs (weighted by the rules' applicability) in particular decision classes. Similarly, we finally map the weighted decision to the decision class, which is assigned with the highest resulting belief. The intermediate layers in Figure 21.3 are designed to help in voting among the classification results obtained from particular rule sets. Traditional rough set approach (cf. [12]) assumes specification of a fixed voting function, which, in our terminology, would correspond to the direct concept mapping from the first RULS
21 Feedforward Concept Networks
289
layer into DEC, with no hidden layers and without possibility of tuning the weights of connections. An improved adaptive approach (cf. [21]) enables us to adjust the rule sets, although the voting scheme still remains fixed. In the same time, the proposed method provides us with a framework for tuning the weights and, in this way, learning adaptively the voting formula (cf. [6, 11, 14]). Still, the scheme based only on generalized linear combinations and concept mappings is not adjustable enough. The reader may check that composition of functions (21.8) for elements of RULS and WD EC with (21.9) results in the collapsed single-layer structure corresponding to the most basic weighted voting among decision rules. This is exactly what happens with classical feedforward neural network models with no non-linear activation functions translating the signals within particular neurons. Therefore, we should consider such functions as well. Definition 5. Neural concept scheme is a quadruple (C, MAV, CXM, ACT), where the first three entities are provided by Definitions 1, 2, and ACT = {acti : Ci -^ Ci : 2 = 2 , . . . ,n + 1}
(21.10)
is the set of activation fiinctions, which can be used to relate the inputs to the outputs within each i-th layer of a network. It is reasonable to assume some properties of ACT, which would work for the proposed generalized scheme analogously to the classical case. Given a compound concept consisting of some interacting parts, we would like, for instance, to guarantee that a relative importance of those parts remains roughly unchanged. Such a requirement, corresponding to monotonicity and continuity of real functions, is well expressible for weighted compound concepts introduced in Definition 3. Given a concept Ci G Ci represented as the weighted collection of sub-concepts, we claim that its more important (better weighted) sub-concepts should keep more influence on the concept acti{ci) G Ci than the others. In [18, 19] we introduced sigmoidal activation function working on probability vectors comparable to the structure of WD EC in Example 1. That function, originated from the studies on monotonic decision measures in [15], can be actually generalized onto any space of compound concepts weighted with real values: Definition 6. By a-sigmoidal activation function for weighted compound concept space C with the real concept parameters, we mean function act^ : C -^ C parameterized by a> 0 which modifies these parameters in the following way: act2:{c) = {\s,^-^^^^^^^y.it,vt)ec\
(21.11)
By composition of lirii and mapi, which specify the concepts c^^^'^ ^ e C^+i as inputs to the nodes in the (z+l)-th layer, with functions actf_^i modifying the concepts within the entire nodes, we obtain a classification model with a satisfiable expressive and adaptive power. If we apply this kind of function to the rule sets, we modify the rules' applicability degrees by their internal comparison. Such performance cannot
290
Dominik Slf zak, Marcin Szczuka, and Jakub Wroblewski
be obtained using the classical neural networks with the nodes assigned to every single rule. Appropriate tuning of a > 0 results in activation/deactivation of the rules with a relative higher/lower applicability. Similar characteristics can be observed within WDEC, where the decision beliefs compete with each other in the voting process (cf. [15]). The presented framework allows for modeling also other interesting behaviors. For instance, the decision rules which inhibit influence of other rules (so called exceptions) can be easily achieved by negative weights and proper activation functions, what would be hard to emulate by plain, negation-free conjunctive decision rules. Further research is needed to compare the capabilities of the proposed construction with other hierarchical approaches [6, 10, 9, 20].
21.7 Learning in classifier networks A cautious reader have probably already noticed the arising question about the proper choice of connection weights in the network. The weights are ultimately the component that decides about the performance of entire scheme. As we will try to advocate, it is - at least to some extent - possible to learn them in a manner similar to the case of standard neural networks. Backpropagation, the way we want to use it here, is a method for reducing the global error of a network by performing local changes in weights' values. The key issue is to have a method for dispatching the value of the network's global error functional among the nodes (cf. [4]). This method, when shaped in the form of an algorithm, should provide the direction of the weight update vector, which is then applied according to the learning coefficient. For the standard neural network model (cf. [3]) this algorithm selects the direction of weight update using the gradient of error functional and the current input. Obviously, numerous versions and modifications of gradient-based algorithm exist. In the more complicated models which we are dealing with, the idea of backpropagation transfers into the demand for a general method of establishing weight updates. This method should comply to the general principles postulated for the rough-neural models (cf. [8, 21]). Namely, the algorithm for the weight updates should provide a certain form of mutual monotonicity i.e. small and local changes in weights should not rapidly divert the behavior of the whole scheme and, at the same time, a small overall network error should result in merely cosmetic changes in the weight vectors. The need of introducing automatic backpropagation-like algorithms to rough-neural computing were addressed recently in [6]. It can be referred to some already specified solutions like, e.g., the one proposed for rough-fuzzy neural networks in [7]. Still, general framework for RNC is missing, where a special attention must be paid on the issue of interpreting and calculating partial error derivatives with respect to the complex structures' parameters. We do not claim to have discovered the general principle for constructing backpropagation-like algorithms for the concept (granule) networks. Still, in [18,19] we have been able to construct generalization of gradient-based method for the homogeneous neural concept schemes based on the space WDEC. The step to partly homogeneous schemes is natural for the class of weighted compound concepts.
21 Feedforward Concept Networks
291
which can be processed using the same type of activation function. For instance, in case of the scheme illustrated by Figure 21.3, the conservative choice of mappings, which turn to be differentiable and regular, permits direct translation from the previous case. Hence, by small adjustment of the algorithm developed previously, we get a recipe for learning the weight vectors. An example of two-dimensional weights {w, 6) e Wi proposed in Section 21.4 is much harder to translate into backpropagation language. One of the most important features of classical backpropagation algorithm is that we can achieve the local minimum of an error function (on a set of examples) by local, easy to compute, change of the weight value. It does not remain easy for two real-valued parameters instead of one. Moreover, parameter ^ is a rule threshold (fuzzified by a kind of sigmoidal characterisitcs to achieve differentiable model) and, therefore, by adjusting its value we are switching on and off (almost, up to the proposed sigmoidal function) entire rules, causing dramatic error changes. This is an illustration of the problems arising when we are dealing with more complicated parameter spaces - In many cases we have to use dedicated, time-consuming local optimization algorithms. Yet another issue is concerned with the second „tooth" of backpropagation: transmitting the error value backward the network. The question is how to modify the error value due to connection weight, assuming that the weight is generalized (e.g. the vector as above). The error value should be translated into value compatible with the previous layer of classifiers, and should be useful for an algorithm of parameters modification. It means that information about error transmitted to the previous layer can be not only a real-valued signal, but e.g. a complete description of each rule's positive or negative contribution to the classifier performance in the next layer.
21.8 Conclusions We have discussed construction of hierarchical concept schemes aiming at layered learning of mappings between the inputs and desired outputs of classifiers. We proposed a generalized structure of feedforward neural-like network approximating the intermediate concepts in a way similar to traditional neurocomputing approaches. We provided the examples of compound concepts corresponding to the decision rule based classifiers and showed some intuition concerning their processing through the network. Although we have some experience with neural networks transmitting non-trivial concepts [18, 19], this is definitely the very beginning of more general theoretical studies. The most emerging issue is the extension of proposed framework onto more advanced structures than the introduced weighted compound concepts, without loosing a general interpretation of monotonic activation functions, as well as relaxation of quite limiting mathematical requirements corresponding to the general idea of learning based on the error backpropagation. We are going to challenge these problems by developing theoretical and practical foundations, as well as by referring to other approaches, especially those related to rough-neural computing [6, 8,9].
292
Dominik Sl^zak, Marcin Szczuka, and Jakub Wroblewski
References 1. Bazan, J., Nguyen, S.H., Nguyen, H.S., Skowron, A.: Rough Set Methods in Approximation of Hierarchical Concepts. In: Proc. of RSCTC'2004. LNAI3066, Springer Verlag (2004) pp. 346-355 2. Dietterich, T.: Machine learning research: four current directions. AI Magazine 18/4 (1997) pp. 97-136. 3. Hecht-Nielsen, R.: Neurocomputing. Addison-Wesley (1990). 4. le Cun, Y.: A theoretical framework for backpropagation. In: Neural Networks - concepts and theory. IEEE Computer Society Press (1992). 5. Lenz, M., Bartsch-Spoerl, B., Burkhard, H.-D., Wess, S. (eds.): Case-Based Reasoning Technology: From Foundations to Applications. LNAI1400, Springer (1998). 6. Pal, S.K., Peters, J.F., Polkowski, L., Skowron, A,: Rough-Neural Computing: An Introduction. In: S.K. Pal, L. Polkowski, A. Skowron (eds.), Rough-Neural Computing. Cognitive Technologies Series, Springer (2004) pp. 15-^1. 7. Pedrycz, W., Peters, J.F.: Learning in fuzzy Petri nets. In: J. Cardoso, H. Scarpelli (eds.), Fuzziness in Petri Nets. Physica (1998) pp. 858-886. 8. Peters, J.F., Szczuka, M.: Rough neurocomputing: a survey of basic models of neurocomputation. In: Proc. of RSCTC'2002. LNAI 2475, Springer (2002) pp. 309-315. 9. Polkowski, L., Skowron, A.: Rough-neuro computing. In: W. Ziarko, Y.Y. Yao (eds.), Proc. of RSCTC'2000. LNAI 2005, Springer (2001) pp. 57-64. 10. Polkowski, L., Skowron, A.: Rough mereological calculi of granules: A rough set approach to computation. Computational Intelligence, 17/3 (2001) pp. 472-492. 11. Skowron, A.: Approximate Reasoning by Agents in Distributed Environments. Invited speech at IAT'2001. Maebashi, Japan (2001). 12. Skowron, A., Pawlak, Z., Komorowski, J., Polkowski, L.: A rough set perspective on data and knowledge. In: W. Kloesgen, J. Zytkow (eds.). Handbook of KDD. Oxford University Press (2002) pp. 134-149. 13. Skowron, A., Stepaniuk, J.: Information granules: Towards foundations of granular computing. International Journal of Intelligent Systems 16/1 (2001) pp. 57-86. 14. Skowron, A., Stepaniuk, J.: Information Granules and Rough-Neural Computing. In: S.K. Pal, L. Polkowski, A. Skowron (eds.), Rough-Neural Computing. Cognitive Technologies Series, Springer (2004) pp. 43-84. 15. Sl^zak, D.: Normalized decision functions and measures for inconsistent decision tables analysis. Fundamenta Informaticae 44/3 (2000) pp. 291-319. 16. Sl^zak, D., Szczuka, M., Wrdblewski, J.: Harnessing classifier networks - towards hierarchical concept construction. In: Proc. of RSCTC'2004, Springer (2004). 17. Sl^zak, D., Wroblewski, J.: Application of Normalized Decision Measures to the New Case Classification. In: W. Ziarko, Y. Yao (eds.), Proc. of RSCTC'2000. LNAI 2005, Springer (2001) pp. 553-560. 18. Sl^zak, D., Wroblewski, J., Szczuka, M.: Neural Network Architecture for Synthesis of the Probabilistic Rule Based Classifiers. ENTCS 82/4, Elsevier (2003). 19. Sl^zak, D., Wroblewski, J., Szczuka, M.: Constructing Extensions of Bayesian Classifiers with use of Normalizing Neural Networks. In: N. Zhong, Z. Ras, S. Tsumoto, E. Suzuki (eds.), Proc. of ISMIS'2003. LNAI 2871, Springer (2002) pp. 408-416. 20. Stone, P.: Layered Learning in Multiagent Systems: A Winning Approach to Robotic Soccer. MIT Press, Cambridge MA (2000). 21. Wroblewski, J.: Adaptive aspects of combining approximation spaces. In: S.K. Pal, L. Polkowski, A. Skowron (eds.), Rough-Neural Computing. Cognitive Technologies Series, Springer (2004) pp. 139-156.
22
Extensions of Partial Structures and Their Application to Modelling of Multiagent Systems Bozena Staruch Faculty of Mathematics and Computer Science University of Warmia and Mazury Zolnierska 14a, 10-561 Olsztyn, Poland bs [email protected] Summary. Various formal approaches to modelling of multiagent systems were used, e.g., logics of knowledge and various kinds of modal logics [4]. We discuss an approach to multiagent systems based on assumption that the agents possess only partial information about global states, see [6]. We make a general assumption that agents perceive the world by fragmentary observations only [8, 4]. We propose to use partial structures for agent modelling and we present some consequences of such an algebraic approach. Such partial structures are incrementally enriched by new information. These enriched structures are represented by extensions of the given partial model. The extension of partial structure is a basic notion of this paper. It makes it possible for a given agent to model hypotheses about extensions of the observable world. An agent can express the properties of the states by properties of the partial structure he has at his disposal. We assume that every agent knows the signature of the language that we use for modelling agents.
22.1 Introduction A partial structure is a partial algebra [2, 1] enriched with predicates. For simplicity, we use a language with a sufficient number of constants and, in consequence, we describe theories of partial structures in terms of atomic formulas with constants and, additionally, inequalities between some constants. Such formulas can be treated as constraints defining the discernibility conditions that should be preserved, e.g., during data reduction [8]. Our theoretical considerations split into two methods: a partial-model-theoretic one and a logical one. We investigate two kinds of sets of first order sentences. An infallible set of sentences (a partial theory) contains all sentences that should be satisfied in every extension of the given family of partial structures. A possible set of sentences is a set of sentences that is satisfied in a certain extension of the given family of partial structures. Any partial algebraic structure is closely related to its partial theory. The theory of a partial structure, being the intersection of the theories of all its extensions, corresponds to the common part of the extensions considered in non-monotonic logics [5].
Temporal, modal, multimodal and epistemic logics are used to express properties of extensions of partial structures (see, e.g., [10],[12] or [13]). We investigate the inconsistency problem that may appear in multiagent systems during extending and synthesizing (fusion) of partial results. From logical point of view, inconsistency could appear if the theory of a partial structure representing knowledge of a given agent is logically inconsistent under available information for this single agent or other agents. From algebraic point of view, inconsistency could appear when identification of different constants by agents is necessary. The main tool we use for fusion of partial results is the coproduct operation. For any family of partial structures there exists the unique (up to isomorphism) coproduct that is constructed as a disjoint sum of partial structures factored by a congruence identifying constants that should be identified. Then, inconsistency can be recognized during the construction of this congruence. Notice that Pawlak's information systems [8], can be naturally represented by partial structures. For example, any such system can be considered as a relational structure with some partial operations. Extensions of partial structures can also be applied to problems concerning data analysis in information systems such as the decomposition problem or the synthesis (fusion) problem of partial results [7]. We also consider multiagent systems where some further logical constraints (in form of atomic formulas), controlling the extension process, are added. The paper is organized as follows. We introduce basic facts on partial structures in Section 2. We define there extensions of a partial structure and of a family of partial structures, as well. In Subsection 2.1 we give the construction of coproduct of the given family of partial structures. Section 3 includes the logical part of our theory. We give here a definition of possible and infallible sets of sentences. In the next section we discuss how our algebraic approach can be used in multiagent systems.
22.2 Partial structures We use partial algebra theory [2, 1] throughout the paper. Almost all facts concerning partial algebras extend easily to partial structures [10-13]. We consider a signature (F, C, Π, n), with at most countable (finite in practice) and pairwise disjoint sets of function, constant and predicate symbols and with an arity function n : F ∪ Π → N, where N denotes the set of nonnegative integers. Any constant is a 0-ary function, so we generally omit the set of constants from the signature and write it explicitly only when necessary. Definition 1. A partial structure of signature (F, Π, n) is a triple A = (A, (f^A)_{f∈F}, (r^A)_{r∈Π}) such that for every f ∈ F, f^A is a partial n(f)-ary operation on A (viewed as a relation, f^A ⊆ A^{n(f)} × A; its domain is denoted by dom f^A), and for every r ∈ Π, r^A ⊆ A^{n(r)}. We say that A is a total structure of signature (F, Π, n) if all its operations are defined everywhere. An operation or relation is discrete if its domain is empty. A partial structure A is discrete if all its operations and relations are discrete.
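As a small, non-authoritative illustration of how a finite partial structure of this kind could be represented in code, the Python sketch below stores partial operations as dictionaries and predicates as sets of tuples; the names PartialStructure, ops and rels are our own assumptions, not notation from the chapter.

from dataclasses import dataclass, field
from itertools import product

@dataclass
class PartialStructure:
    universe: set                              # the carrier set A
    ops: dict = field(default_factory=dict)    # f -> {argument tuple: value}, possibly partial
    rels: dict = field(default_factory=dict)   # r -> set of argument tuples

    def is_total(self, arities):
        # total iff every operation is defined on every argument tuple of its arity
        return all(
            set(product(self.universe, repeat=arities[f])) == set(table)
            for f, table in self.ops.items()
        )

# a discrete structure has empty ops/rels tables; a constant c is a 0-ary
# operation, stored as {(): value} when defined and as {} when undefined
B = PartialStructure(universe={"a", "b"}, ops={"f": {("a",): "b"}}, rels={"r": {("a",)}})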
Notice that for any constant symbol c, the appropriate operation c^A is either a distinguished element of A or is undefined. Every structure (even a total one) of a given signature is a partial structure of any wider signature; the additional operations and relations are then discrete. Remark 1. We will use Pawlak's information systems [8] for presenting examples, so let us recall some definitions here. An information system is a pair S = (U, A), where each attribute a ∈ A is identified with a function a : U → V_a from the universe U of objects into the set V_a of all possible values of a. A formula a = v_a is called a descriptor, and a template is defined as a conjunction of descriptors ⋀(a_i, v_{a_i}), where a_i ∈ A and a_i ≠ a_j for i ≠ j. A decision table is an information system of the form A = (U, A ∪ {d}), where d ∉ A is a distinguished attribute called the decision. For every set of attributes B ⊆ A, an equivalence relation, denoted by IND_A(B) and called the B-indiscernibility relation, is defined by
IND_A(B) = {(u, u') ∈ U × U : for every a ∈ B, a(u) = a(u')}    (22.1)
Objects u, u' satisfying the relation IND_A(B) are indiscernible by attributes from B. If A = (U, A) is an information system, B ⊆ A is a set of attributes and X ⊆ U is a set of objects, then the sets B̲X = {u ∈ U : [u]_B ⊆ X} and B̄X = {u ∈ U : [u]_B ∩ X ≠ ∅} are called the B-lower and the B-upper approximation of X in A, respectively. The set BN_B(X) = B̄X - B̲X will be called the B-boundary of X. In rough set theory one also considers approximations determined by a tolerance relation instead of an equivalence relation. Our approach can be used there, too. Example 1. We interpret the information system as a partial structure A = (U, R), where R = {r_{a,v} : a ∈ A, v ∈ V_a}, and r_{a,v} is a unary relation such that for every x ∈ U, x ∈ r_{a,v} iff a(x) = v. Partial operations can also be considered there. Example 2. Every partially ordered set is a partial lattice of signature with two binary operations ∨ and ∧ of the least upper bound and the greatest lower bound, respectively. Definition 2. A homomorphism of partial structures h : A → B of signature (F, Π, n) is a function h : A → B such that, for any f ∈ F, if ā ∈ dom f^A then h ∘ ā ∈ dom f^B and then h(f^A(ā)) = f^B(h ∘ ā), and for any r ∈ Π and a_1, ..., a_{n(r)} ∈ A, if r^A(a_1, ..., a_{n(r)}) then r^B(h(a_1), ..., h(a_{n(r)})). Definition 3. A partial structure B is an extension of a partial structure A iff there exists an injective homomorphism e_A : A → B. If B is total, then we say that B is a completion of A.
• E(A) denotes the class of all extensions and
• T(A) denotes the class of all completions of A.
Remark 2. For applications in further sections we use a generalization of the above notion of extension. By a generalized extension of the given partial structure A we understand any partial structure B (even of an extended signature) such that there exists a homomorphism h : A → B preserving some a priori chosen constraints. Properties of extensions defined by monomorphisms are important from the theoretical point of view and can easily be carried over to more general cases. We also consider extensions under some further constraints which follow from the assumption that extensions belong to special classes of partial structures [13]. Definition 4. A is a weak substructure of B iff the identity embedding id_A : A → B is a homomorphism of partial structures. Hence, every partial structure is an extension of its weak substructure. We do not recall here the notions of a relative substructure and a closed substructure. Example 3. If B is a subtable of the given information system A, then the partial structure corresponding to A is an extension of B. By a subtable we mean any subset of the given universe with some attributes and some values of these attributes. We allow null attribute values in subtables. B = (U_B, R_B) is a weak substructure of the given information system A = (U, R) if U_B ⊆ U and R_B ⊆ R (then also B ⊆ A). It means that if x ∈ r^B_{a,v} then x ∈ r^A_{a,v}. Hence, it may be that a(x) = v in A while x ∈ U_B and a occurs in B but a(x) is not determined in B. Example 4. For generalized extensions we discern some constants. For example, let A = (U, R) be a relational system corresponding to an information system A = (U, A) and let X ⊆ U. Take the language L_U in which every object of U is a constant. Assume that every constant of the lower approximation A̲(X) should be discerned from every constant from the complement, while no assumption is taken for objects in the boundary region of the concept. One can describe the above discernibility using decision tables. To do that, let d be an additional decision attribute such that d(x) = 1 for every x ∈ A̲(X) and d(x) = 0 for every x ∈ U \ A̲(X). Example 5. For the partial lattices A, B, C described in Figure 22.1, A is an extension of B and obviously B is a weak substructure of A. A is a generalized extension of C under the assumption that a ≠ b, b ≠ c and c ≠ a. The appropriate homomorphism glues c with d. 22.2.1 Extensions of a family of partial structures We assume that a family of partial structures (agents) is given. Every possible extension should include (in some way) every member of the given family, as well as the entire family. Let us take the following definition:
Fig. 22.1. Partial lattices
Definition 5. Let ℜ = (A_i)_{i∈I} be a family of partial structures of a given fixed signature. A partial structure B is an extension of ℜ iff B is an extension of every A_i ∈ ℜ, and B is a generalized extension of ℜ iff B is a generalized extension of every A_i ∈ ℜ. E(ℜ) and T(ℜ) denote the classes of extensions and completions of the family ℜ, respectively. Definition 6. Let ℜ = (A_i)_{i∈I} be a family of partial structures of signature (F, C, Π, n). A partial structure B of the same signature is called a coproduct of ℜ iff there exists a family of homomorphisms h_i : A_i → B for every A_i ∈ ℜ, and if for a certain partial structure C there exists a family of homomorphisms g_i : A_i → C, then there exists a unique homomorphism h : B → C such that h ∘ h_i = g_i. Proposition 1. For any family of partial structures the coproduct of this family exists and is unique up to isomorphism. Construction of coproducts of partial structures. Let ℜ = (A_i)_{i∈I} be a family of partial structures of signature (F, C, Π, n). We assume that there are no 0-ary functional symbols in F, i.e., all constants are included in C. For any A_i ∈ ℜ, let A_i⁰ denote its reduct to the signature (F, Π, n). We first take a disjoint sum ⋃ℜ⁰ of the family ℜ⁰ = {A_i⁰ : A_i ∈ ℜ}. We then take care of the appropriate identification of the existing constants and set Θ_0 = {((c^{A_i}, i), (c^{A_j}, j)) : A_i, A_j ∈ ℜ, c ∈ C, c^{A_i} and c^{A_j} exist}. Moreover, let Θ be the congruence relation on ⋃ℜ⁰ generated by Θ_0. Finally, we set B = ⋃ℜ⁰/Θ together with the family of appropriate homomorphisms h_i : A_i → B given by h_i(a) = [(a, i)]Θ. Proposition 2. The partial structure B constructed above is a coproduct of the family ℜ.
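The constant-identifying step of this construction is easy to prototype. Below is a rough Python sketch (our own, not the chapter's algorithm) that builds the classes of the congruence generated by Θ_0 over a disjoint sum using a union-find structure, and reports an inconsistency when two constants that an a priori constraint requires to stay distinct end up glued; all names (UnionFind, coproduct_carrier, forbidden) are ours. A full implementation would additionally close the congruence under the partial operations before taking the quotient.

class UnionFind:
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def coproduct_carrier(structures, constants, forbidden=()):
    """structures: dict i -> (universe, {constant name -> value or None}).
    Elements of the disjoint sum are pairs (a, i).  Returns the classes of the
    congruence generated by identifying equally named defined constants, or
    raises if a forbidden pair of constants gets glued."""
    uf = UnionFind()
    for i, (universe, _) in structures.items():
        for a in universe:
            uf.find((a, i))
    for c in constants:                      # generate Theta_0
        defined = [(cs[c], i) for i, (_, cs) in structures.items() if cs.get(c) is not None]
        for x in defined[1:]:
            uf.union(defined[0], x)
    for c1, c2 in forbidden:                 # a priori constraints c1 != c2
        x1 = [(cs[c1], i) for i, (_, cs) in structures.items() if cs.get(c1) is not None]
        x2 = [(cs[c2], i) for i, (_, cs) in structures.items() if cs.get(c2) is not None]
        if x1 and x2 and uf.find(x1[0]) == uf.find(x2[0]):
            raise ValueError(f"inconsistent: constants {c1} and {c2} were identified")
    classes = {}
    for i, (universe, _) in structures.items():
        for a in universe:
            classes.setdefault(uf.find((a, i)), set()).add((a, i))
    return list(classes.values())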
Example 6.
Fig. 22.2. Coproduct of the partial lattices A, B, C
If each of the above homomorphisms is injective, then we call the coproduct the free sum of ℜ, and the free sum is an extension of ℜ. If there are no constants in the signature, then the disjoint sum of the family ℜ is a free sum of ℜ. The coproduct is a generalized extension of ℜ if it preserves all the a priori chosen inequalities. We also consider coproducts under further constraints in the form of a set of atomic formulas. Look at Figure 22.2. Assume that a, b, c, d, e are constants in the signature of the language. In the coproduct of A, B, C the meets a ∧ b, a ∧ c, b ∧ c must be determined, because they have been determined in A. It follows from the construction of the coproduct that a ∨ b, a ∨ c, b ∨ c must all be determined and must be equal. Notice that this coproduct is not an extension of the given family of partial lattices, but it is a generalized extension of this family when it is assumed that a ≠ b, b ≠ c and c ≠ a. And it is not a generalized extension of {A, B, C} if we assume that all the constants a, b, c, d, e are pairwise distinct. We see from this example that, depending on the initial conditions, the coproduct of the given family of partial structures may or may not be a generalized extension. If it is not a generalized extension, then generating the congruence in the construction of the coproduct forces us to identify constants which are assumed to be different. In this situation we say that the given family of partial structures is inconsistent with the undertaken assumptions (inconsistent, for short). This inconsistency is closely related to logical inconsistency: the given family ℜ of partial structures is inconsistent with the undertaken assumptions A iff the infallible set of sentences for ℜ (defined in the next section) is inconsistent with A. It is important to know what to do when inconsistency appears. We consider the simplest approach: detect which condition causes problems and take a crisp decision of rejecting it. In the above example, by rejecting d ≠ e we obtain a consistent generalized extension. For applications it is worth assuming that the partial structures under consideration are finite, with finite sets of functional and relational symbols. In this situation we can check inconsistency in a finite, predictable time. Various methods of conflict resolution, dependent on the application problem, are possible. For example, facts that cause conflicts can be rejected, or one can use voting to eliminate facts causing conflicts. In
general, the process of eliminating some constraint can require some more advanced negotiations between agents.
22.3 Possible and infallible sets of sentences We present in this section the logical part of our approach. Let L be a first order language of signature (F, Π, n). The set of all sentences of the language L is denoted by Sent(L). Assume that A is a given partial structure in the signature of L (a partial structure of L, for short). Definition 7. A set of sentences Σ ⊆ Sent(L) is possible for A iff there is a total structure B ∈ T(A) such that B ⊨ Σ. The set of sentences P_A = ⋂{Th(B) : B ∈ T(A)} is called the infallible set of sentences for A. We also say that P_A is the theory of the partial model A. Notice that a set of sentences is possible for a partial structure A if it is possible for a certain extension of A. The infallible set of sentences for a partial structure A is also the intersection of the infallible sets for all its extensions. Notice here that an extension used in non-monotonic logics corresponds to theories of total structures, whereas the infallible set for a partial structure corresponds to the intersection of non-monotonic extensions. The properties of possible and infallible sets of sentences are described and proved in [10-13]. If ℜ is a family of partial structures then we define possibility and infallibility for ℜ analogously, and if P_ℜ denotes the set of sentences infallible for ℜ then we have the following: 1. P_ℜ = Cn(⋃{P_{A_i} : A_i ∈ ℜ}), where Cn denotes the classical operator of first order consequences. 2. P_ℜ is logically consistent iff T(ℜ) is nonempty. Let A be a partial structure of a language L of signature (F, Π, n). We extend L to L_A by adding a set of constants C_A = {c_a : a ∈ A}. Now, we describe all the information about A in L_A. Let Σ_A be the sum of the following sets: Σ_F = {f(c_{a_1}, ..., c_{a_{n(f)}}) = c_a : f ∈ F, (a_1, ..., a_{n(f)}) ∈ dom f^A, f^A(a_1, ..., a_{n(f)}) = a}, Σ_Π = {r(c_{a_1}, ..., c_{a_{n(r)}}) : r ∈ Π, r^A(a_1, ..., a_{n(r)})}, Σ_{C_A} = {c_a ≠ c_b : a, b ∈ A, a ≠ b}. Remark 3. When dealing with generalized extensions as in Remark 2, we do not assume that the homomorphism witnessing an extension is injective. Then in place of Σ_A we may take a set Σ_F ∪ Σ_Π ∪ Σ_C, where Σ_C is any subset of Σ_{C_A}. Then all the results for extensions transform easily to the generalized ones. Definition 8. Let A be a partial structure of a language L and let L_A be the language as above. We say that a partial structure A' is an expansion of A to L_A iff A' = A, f^{A'} = f^A and r^{A'} = r^A for every f ∈ F and r ∈ Π, and c_a^{A'} = a for every a ∈ A.
Proposition 3. For any partial structure A of L, P_{A'} = Cn(Σ_A) and P_A = P_{A'} ∩ Sent(L).
For a family ℜ = (A_i)_{i∈I} of partial structures of a given language L we can take a language L_ℜ extending L by the set of constants C_ℜ = {c_{a_i} : a_i ∈ A_i, i ∈ I}. Here the set Σ_ℜ = ⋃ Σ_{A_i} has properties analogous to those of Σ_A.
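As a small illustration (ours, not from the chapter), consider a partial structure A with universe A = {a, b}, one unary operation f with f^A(a) = b and f^A(b) undefined, and one unary predicate r with r^A = {a}. Then Σ_F = {f(c_a) = c_b}, Σ_Π = {r(c_a)} and Σ_{C_A} = {c_a ≠ c_b}, so Σ_A = {f(c_a) = c_b, r(c_a), c_a ≠ c_b} and P_{A'} = Cn(Σ_A). For instance, the sentence ∃x r(x) belongs to P_A, whereas ∃x (r(x) ∧ ¬r(f(x))) is possible for A (it holds in some completions, e.g. one keeping r = {a}) but is not infallible, since it fails in a completion where r = A.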
22.4 Partial structures and their extensions in multiagent systems Let us consider a team of agents with a supervisor agent who collects information, deals with conflicts and distributes knowledge. Generally, the system that we discuss resembles the hierarchy of a firm. There is the chief director and n departments with their chiefs; the departments 1, ..., n are divided into n_i, i = 1, ..., n, sections with their chiefs, and so on. There are information channels from sections to departments and from departments to the director. Agents that are immediately higher are able to receive information from their subordinate (child) agents, fuse the information, resolve conflicts, and change the knowledge of their children. There are also information channels between departments and between sections, and so on. These channels can be used only for exchanging information. It may be assumed that such a frame for agent systems is obligatory. For simplicity we do not assume any frame except the supervisor agent. However, agents can create subsystems of child agents and supervise them. They can also exchange information if they need to. The relationships between agents follow from the existence of a homomorphism. We represent collected information as a partial structure of a language L with signature (F, Π, n). We assume that the environment of the problem one works on is described in a language of arbitrarily large signature. We can reach this world by observing finite fragments only. Hence the signature we use should be finite, but sufficient for the observed fragments, and whenever a new interrelation is discovered, the signature can be extended. At the beginning we perform the following steps:
• We give names to the observable objects, discover interrelations between objects, and decide which of them should be written as relations, partial operations or constants, and additionally which names describe different objects.
• Depending on the application problem we decide which interrelations should be preserved while extending.
• We represent our knowledge by means of a finite partial structure A of a language L with signature (F, Π, n), with information about the discernibility of some names. The names of observed objects are elements of the structure - either all of them or those that are important for some reason.
Having a partial structure A of a language L in signature (F, Π, n), we extend the language to L_A. Let A' denote the expansion of A to L_A. The discernibility of some names is written down as a subset Σ ⊆ Σ_A. We distribute the knowledge to n agents Ag_1, ..., Ag_n. The method of distribution depends on the problem we try to solve but, most generally, we select n weak substructures A_1, ..., A_n from A. Every agent
Ag_i, i = 1, ..., n, is provided with knowledge represented by A_i and, additionally, we provide him/her with a set of inequalities Σ_i ⊆ Σ. There are many possibilities for doing that: we can take relative or closed substructures, or substructures covering A (or not), or even n copies of A. The inequalities from Σ could be distributed to the agents by various procedures. We describe a situation where the agents get Σ_i ⊆ Σ such that there exists a set of constants C_i ⊆ C_A with Σ_i = {c_a ≠ c_b : c_a, c_b ∈ C_i, a ≠ b}. It means that all the constants from C_i are pairwise unequal. As we will show below, this situation is easy to manage from the theoretical point of view. Thus, the knowledge of an agent Ag_i is represented by a partial structure A_i such that A_i = (C_i, (f^{A_i})_{f∈F}, (r^{A_i})_{r∈Π}) is a weak substructure of A' and, additionally, for every c_a, c_b ∈ C_i with a ≠ b it is assumed that c_a ≠ c_b. Notice that from the logical point of view Σ_i ⊆ Σ_{A_i} ⊆ Σ_A; hence ⋃ Σ_i ⊆ ⋃ Σ_{A_i} ⊆ Σ_A is a consistent set of sentences in L_A. The knowledge distribution process is based on properties of a finite number of constants. In such an algebraic approach we can take advantage of homomorphisms, congruences, quotient structures, coproducts, and so on. We have at our disposal a closely related logical approach, where we can use infallible sets of sentences, consistency, and so on. We are able to propose logical methods of knowledge distribution via the standard family of partial structures (see [13]), whereas logic, not being effective, would be less useful in applications. We also consider multiagent systems where some further logical constraints (in the form of atomic formulas), controlling the extension process, are added. Hence, let every agent Ag_i possess a set A_i of atomic formulas. Example 7. Let our information be written as a relational system A = (U, R) corresponding to an information system A = (U, A). Let X ⊆ U be a set of objects. Take the language L_U; thus every object is now a constant. We discern every constant of the lower approximation A̲(X) from every constant from the complement, while no assumption is taken for objects in the boundary region of the concept. Now we distribute A to n agents Ag_1, ..., Ag_n, giving to every Ag_i, i = 1, ..., n, a weak substructure (subtable) A_i = (U_i, R_{A_i}). This distribution depends on an expert decision; it may be a covering of A or only a covering of a chosen (training) set of objects. Moreover, every agent Ag_i has at his disposal the sets of objects X_i = X ∩ U_i and U_i \ X_i, makes his own approximations of these sets and discerns constants. Additionally, every agent Ag_i gets a set of descriptors (or templates) which is either derived from his own information or is obtained from the system. Hence, every agent approximates a part of the concept described by X; he can get new objects, new attributes, new attribute values and a new set of descriptors. Now one can consider fusion (the coproduct) of the information obtained from the agents.
22.4.1 Agents acting Now we are at the starting point, i.e., at a moment t = 0. We have the fixed language L_A. The local state of every agent Ag_i is represented by a partial structure A_i equipped with a set of inequalities of constants from C_i and also with a set of atomic formulas A_i. We make the following assumptions.
• Every agent knows the signature of the language. One can also consider the situation where every agent knows all the sets C_i.
• Every agent is able to exchange information with others. He then performs fusion and resolves his internal conflicts using the construction of the coproduct.
• Every agent can build his own system of child agents in the same way as the whole system.
• For every moment of time there is a supervising agent, with his own constraints (in the form of atomic formulas), who collects all the information and deals with inconsistency using the coproduct.
Now, knowledge is distributed and we are going to show how agents act. Every agent possesses his knowledge and is able to collect new information independently. New information can be obtained by the agent through his own exploration of the world, through the activity of his child agents, or by exchanging information with other agents. The new information takes the form of new objects, new operation and relation symbols, new determinations of constants, extensions of the domains of some relations and operations, and new constraints that can be derived or added. Assume that at some time our system is in a state s. It means that the agents have collected information and written it as partial structures A_1^s, ..., A_n^s which are consistent generalized extensions of A_1, ..., A_n, respectively. The main tool for fusion is the notion of coproduct, which plays the supervising role in the system. Additionally, a set A_s of atomic formulas is given. We construct a coproduct S of the family of partial structures A, A_1^s, ..., A_n^s. If S is a generalized extension of A_1, ..., A_n and A_s holds in S, then the system is consistent and knowledge can be redistributed; otherwise we should resolve conflicts. Now S plays the role of A, i.e., we take a new language L_S and the expansion of S to this language. The most general way of knowledge redistribution is repetition of the process. It is often necessary to redistribute the knowledge in accordance, for example, either with the initial information (i.e., preserving A_i for every agent Ag_i) or with the actually sent information. Notice that it is not necessary to stop the system for synthesis. Agents can work during this process. In this situation every agent should synthesize his actual results with the redistributed knowledge (i.e., construct a coproduct of the two) and remove any resulting inconsistency on his own. 22.4.2 Dealing with inconsistency From the logical point of view, inconsistency may appear when the set of sentences ⋃ P_{A_i^s} is either not possible for the family A_1, ..., A_n, or is possible for this family
but is inconsistent with A_s; that is, it is logically inconsistent. Algebraic inconsistency occurs when the coproduct is inconsistent with the given constraints. There are two kinds of inconsistency: (i) the first one is "internal", when an agent, say Ag_i, needs to identify constants that were different in A_i as a consequence of extending his knowledge; (ii) an "external" one, when the knowledge of every agent is internally consistent, but there are conflicts in the whole system. Notice that the decision to remove some determinations of constants and operations is not irreversible, since the agent can resend the same information. If this happens "often", then we have a signal that something is wrong and we have to correct our initial knowledge. We remove inconsistency while exchanging information. If agent Ag_j sends some information to Ag_i, then Ag_i should resolve conflicts using a coproduct as above. The need for exchanging information can be recognized by Ag_i when he gets a constant and he knows that Ag_j has some information about this constant. One can use the following schema for dealing with inconsistency. The first identification of constants in the process of generating the congruence in the construction of the coproduct is a signal that inconsistency could appear. From the course of this process the supervisor detects the cause of the inconsistency and sends orders to the detected agents to remove the given determinations. We do not assume that the process should stop for this control. If agents work during this time, then they fuse their actual knowledge with the redistributed one and resolve conflicts under the assumption that the knowledge from the supervisor is more important. The proposed system may work permanently, stopping after a human decision or when some conditions are satisfied, e.g., time conditions or restrictions on the system size.
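The supervision schema sketched above can be prototyped along the following lines. This is our own illustration only; the function names and the reuse of coproduct_carrier from the earlier sketch are assumptions, not the chapter's algorithm.

def supervise(base, agent_states, constants, forbidden, atomic_constraints_hold):
    """One synthesis round: fuse the base structure with the agents' current
    extensions, detect constants glued against the forbidden inequalities,
    and return either the fused carrier or the conflicting constraint."""
    structures = {0: base}
    structures.update({i + 1: s for i, s in enumerate(agent_states)})
    try:
        fused = coproduct_carrier(structures, constants, forbidden)
    except ValueError as conflict:
        # inconsistency detected while generating the congruence:
        # the supervisor would now order some agent to withdraw a determination
        return ("conflict", str(conflict))
    if not atomic_constraints_hold(fused):
        return ("conflict", "the atomic constraints A_s fail in the coproduct")
    return ("ok", fused)   # knowledge can be redistributed from the fused structure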
22.5 Conclusions We have presented a way of modelling multiagent systems via partial algebraic methods. We propose the coproduct operator to fuse knowledge and resolve conflicts under some logical-algebraic constraints. Further constraints may also be considered, for example on the number of agents or on the system size.
References
1. Bartol, W., 'Introduction to the Theory of Partial Algebras', Lectures on Algebras, Equations and Partiality, Univ. of Balearic Islands, Technical Report B-006 (1992), pp. 36-71.
2. Burmeister, P., 'A Model Theoretic Oriented Approach to Partial Algebras', Mathematical Research 32, Akademie-Verlag, Berlin (1986).
3. Burris, S., Sankappanavar, H.P., 'A Course in Universal Algebra', Springer-Verlag, Berlin (1981).
4. Fagin, R., Halpern, J., Moses, Y., Vardi, M.Y., 'Reasoning About Knowledge', MIT Press, Cambridge MA (1995).
5. Gabbay, D.M., Hogger, C.J., Robinson, A.A., 'Handbook of Logic in Artificial Intelligence and Logic Programming 3: Nonmonotonic Reasoning and Uncertain Reasoning', Oxford University Press, Oxford (1994).
6. d'Inverno, M., Luck, M., 'Understanding Agent Systems', Springer-Verlag, Heidelberg (2004).
7. Pal, S.K., Polkowski, L., Skowron, A. (Eds.), 'Rough-Neural Computing: Techniques for Computing with Words', Springer-Verlag, Berlin (2004).
8. Pawlak, Z., 'Rough Sets. Theoretical Aspects of Reasoning about Data', Kluwer Academic Publishers, Dordrecht (1991).
9. Shoenfield, J.R., 'Mathematical Logic', Addison-Wesley Publishing Company, New York (1967).
10. Staruch, B., 'Derivation from Partial Knowledge in Partial Models', Bulletin of the Section of Logic 32 (2002), pp. 75-84.
11. Staruch, B., Staruch, B., 'Possible sets of equations', Bulletin of the Section of Logic 32 (2002), pp. 85-95.
12. Staruch, B., Staruch, B., 'Partial Algebras in Logic', submitted to Logika, Acta Universitatis Wratislaviensis (2002).
13. Staruch, B., Staruch, B., 'First order theories for partial models', accepted for publication in Studia Logica (2003).
23 Tolerance Information Granules Jaroslaw Stepaniuk Department of Computer Science Bialystok University of Technology Wiejska45a, 15-351 Bialystok, Poland [email protected] Summary. In this paper we discuss tolerance information granule systems. We present examples of information granules and we consider two kinds of basic relations between them, namely inclusion and closeness. The relations between more complex information granules can be defined by extension of the relations defined on parts of information granules. In many application areas related to knowledge discovery in databases there is a need for algorithmic methods making it possible to discover relevant information granules. Examples of SQL implementations of discussed algorithms are included.
23.1 Introduction Recent years have shown a rapid growth of interest in granular computing. Information granules are collections of entities that are arranged together due to their similarity, functional adjacency or indiscernibility [13], [14]. The process of forming information granules is referred to as information granulation. Granular computing, as opposed to numeric computing, is knowledge-oriented. Knowledge based processing is a cornerstone of knowledge discovery and data mining [3]. A way of constructing information granules and describing them is a common problem no matter which path (fuzzy sets, rough sets, ...) we follow. In this paper we follow the rough set approach [7] to constructing information granules. Different kinds of information granules will be discussed in the following sections of this paper. The paper is organized as follows. In Section 23.2 we recall selected notions of the tolerance rough set model. In Section 23.3 we discuss information granule systems. In Section 23.4 we present examples of information granules. In Section 23.5 we discuss searching for optimal tolerance granules.
23.2 Selected Notions of Tolerance Rough Sets In this section we recall selected notions of the tolerance rough set model [8], [9], [11], [12].
We recall the general definition of an approximation space [9], [11], [12], which can be used, for example, for introducing the tolerance based rough set model and the variable precision rough set model. For every non-empty set U, let P(U) denote the set of all subsets of U.
Definition 1. A parameterized approximation space is a system AS_{#,$} = (U, I_#, ν_$), where
• U is a non-empty set of objects,
• I_# : U → P(U) is a granulation function,
• ν_$ : P(U) × P(U) → [0,1] is a rough inclusion function.
The granulation function defines for every object x a set of similarly described objects. A constructive definition of the granulation function can be based on the assumption that some metrics (distances) are given on attribute values. For example, if for some attribute a ∈ A a metric δ_a : V_a × V_a → [0, ∞) is given, where V_a is the set of all values of attribute a, then one can define the following granulation function:
y ∈ I_#(x) if and only if δ_a(a(x), a(y)) ≤ f_a(a(x), a(y)),
where f_a : V_a × V_a → [0, ∞) is a given threshold function. A set X ⊆ U is definable in AS_{#,$} if and only if it is a union of some values of the granulation function. The rough inclusion function defines the degree of inclusion between two subsets of U [9].
The rough inclusion is typically taken to be the standard one, ν(X, Y) = card(X ∩ Y)/card(X) for X ≠ ∅ and ν(X, Y) = 1 for X = ∅. This measure is widely used by the data mining and rough set communities; however, Jan Lukasiewicz [5] was the first who used this idea to estimate the probability of implications. The lower and the upper approximations of subsets of U are defined as follows.
Definition 2. For an approximation space AS_{#,$} = (U, I_#, ν_$) and any subset X ⊆ U the lower and the upper approximations are defined by
LOW(AS_{#,$}, X) = {x ∈ U : ν_$(I_#(x), X) = 1},
UPP(AS_{#,$}, X) = {x ∈ U : ν_$(I_#(x), X) > 0},
respectively. Approximations of concepts (sets) are constructed on the basis of background knowledge. Obviously, concepts are also related to objects unseen so far. Hence it is very useful to define parameterized approximations with parameters tuned in the process of searching for approximations of concepts. This idea is crucial for the construction of concept approximations using rough set methods. In our notation, # and $ denote vectors of parameters which can be tuned in the process of concept approximation. Approximation spaces are illustrated in Figure 23.1.
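The two definitions above translate directly into code. The Python sketch below is our own illustration (the attribute tables, thresholds and names are assumptions, not the chapter's data): it builds a threshold-based granulation function and computes the lower and upper approximations of a concept.

def granule(x, universe, table, dist, eps):
    # I_#(x): all objects y whose attribute values lie within the thresholds eps[a]
    return {y for y in universe
            if all(dist[a](table[a][x], table[a][y]) <= eps[a] for a in eps)}

def inclusion(X, Y):
    # standard rough inclusion: |X & Y| / |X|, with the empty set included to degree 1
    return 1.0 if not X else len(X & Y) / len(X)

def approximations(universe, table, dist, eps, concept):
    lower = {x for x in universe
             if inclusion(granule(x, universe, table, dist, eps), concept) == 1.0}
    upper = {x for x in universe
             if inclusion(granule(x, universe, table, dist, eps), concept) > 0.0}
    return lower, upper

# toy usage: one numeric and one symbolic attribute
universe = {1, 2, 3, 4}
table = {"num": {1: 0.1, 2: 0.2, 3: 0.9, 4: 1.0}, "sym": {1: "a", 2: "a", 3: "b", 4: "a"}}
dist = {"num": lambda u, v: abs(u - v), "sym": lambda u, v: 0 if u == v else 1}
eps = {"num": 0.15, "sym": 0}
print(approximations(universe, table, dist, eps, concept={1, 2}))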
Fig. 23.1. Approximation Spaces with Two Vectors # 1 and # 2 of Parameters
Rough sets can approximately describe sets of patients, events, outcomes, keywords, etc. that may be otherwise difficult to circumscribe. We recall the notion of the positive region of a classification in the case of generalized approximation spaces [12]. Definition 3. Let AS_{#,$} = (U, I_#, ν_$) be an approximation space and, for a natural number r > 1, let a set {X_1, ..., X_r} be a classification of objects (i.e. X_1, ..., X_r ⊆ U, ⋃_{i=1}^r X_i = U and X_i ∩ X_j = ∅ for i ≠ j, where i, j = 1, ..., r). The positive region of the classification {X_1, ..., X_r} with respect to the approximation space AS_{#,$} is defined by POS(AS_{#,$}, {X_1, ..., X_r}) = ⋃_{i=1}^r LOW(AS_{#,$}, X_i). Let DT = (U, A ∪ {d}) be a decision table [7], where U is a set of objects, A is a set of condition attributes and d is the decision. For every condition attribute a ∈ A a distance function δ_a : V_a × V_a → [0, ∞) is known, defined as follows:
for numeric attributes
δ_a(a(x_i), a(x_j)) = |a(x_i) - a(x_j)|,
for symbolic attributes
δ_a(a(x_i), a(x_j)) = 0 if a(x_i) = a(x_j) and 1 if a(x_i) ≠ a(x_j).
For every attribute a ∈ A we determine a tolerance threshold ε_a. If we know an attribute a ∈ A and ε_a, the tolerance relation T_a(ε_a) (a reflexive and symmetric relation) is defined as follows:
∀ x_i, x_j ∈ U  ((x_i, x_j) ∈ T_a(ε_a) ⇔ δ_a(a(x_i), a(x_j)) ≤ ε_a).
If we have a set of attributes B ⊆ A and thresholds ε_{a_i}, where a_i ∈ B, the tolerance relation T_B(ε_{a_1}, ε_{a_2}, ..., ε_{a_n}) is defined as follows:
(x_i, x_j) ∈ T_B(ε_{a_1}, ..., ε_{a_n}) ⇔ δ_a(a(x_i), a(x_j)) ≤ ε_a for every a ∈ B.
Let DT = (U, A ∪ {d}). The discernibility matrix of the table DT is a square n × n matrix, where n is the number of objects in the set U. The discernibility matrix M(x_i, x_j) is defined as follows:
M(x_i, x_j) = {a ∈ A : δ_a(a(x_i), a(x_j)) > ε_a} if d(x_i) ≠ d(x_j), and M(x_i, x_j) = ∅ if d(x_i) = d(x_j).
Let Q be a quality function which is defined, for example, by:
Q(ε_{a_1}, ε_{a_2}, ..., ε_{a_n}) = (NRdRIA/NRd) * w + (NRdRIA/NRIA) * (1 - w),
where:
• NRd is the number of pairs of objects which have the same decision attribute value;
• NRIA is the number of pairs of objects which are in the relation T_A(ε_{a_1}, ε_{a_2}, ..., ε_{a_n});
• NRdRIA is the number of pairs of objects which have the same decision attribute value and are in the relation T_A(ε_{a_1}, ε_{a_2}, ..., ε_{a_n});
• w is a weight taking values in the interval [0,1].
If we want to know whether our vector of tolerance thresholds or our decision rules are optimal, we check the values returned by the quality function. The optimal vector or rule is the one with the highest value of the quality function.
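A direct way to evaluate a candidate threshold vector is to count the three quantities over all pairs of objects. The sketch below is our own illustration (the table layout and names are assumptions), not the authors' implementation.

from itertools import combinations

def quality(universe, cond, decision, dist, eps, w=0.5):
    """Q(eps) = (NRdRIA/NRd)*w + (NRdRIA/NRIA)*(1-w), counted over all pairs i < j."""
    nrd = nria = nrdria = 0
    for x, y in combinations(sorted(universe), 2):
        same_d = decision[x] == decision[y]
        in_ta = all(dist[a](cond[a][x], cond[a][y]) <= eps[a] for a in eps)
        nrd += same_d
        nria += in_ta
        nrdria += same_d and in_ta
    if nrd == 0 or nria == 0:
        return 0.0
    return (nrdria / nrd) * w + (nrdria / nria) * (1 - w)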
23.3 Information Granule Systems In this section, we present a basic notion for our approach, i.e., the information granule system. Any such system S consists of a set of granules G. Moreover, a family of relations with the intended meaning "to be a part to a degree" between information granules is distinguished. The degree structure is described by a relation "to be an exact part". More formally, an information granule system is any tuple
S = (G, H, <, {ν_p}_{p∈H})    (23.1)
where
1. G is a finite set of granules;
2. H is a finite set of granule inclusion degrees with a binary relation < which defines on H a structure used to compare the degrees;
3. ν_p ⊆ G × G is a binary relation "to be a part to a degree at least p" between information granules from G, called rough inclusion.
One can consider the following examples of elementary granules:
1. a set of descriptors of the form (a, v), where a ∈ A and v ∈ V_a, for some finite attribute set A and value sets V_a;
2. a set of descriptor conjunctions.
In the standard rough set model information granules correspond to indiscernibility classes of an equivalence relation. Let, for example, U be a set of cars (see Figure 23.2) and let us consider two attributes, colour and type of car's body. Let V_colour = {white, yellow, black, green} and V_type = {van, sedan, station wagon}. In this case we obtain twelve information granules corresponding to conjunctions of descriptors, e.g. (colour, white) ∧ (type, van), (colour, yellow) ∧ (type, van), ...
Fig. 23.2. Granules in the Standard Rough Set Model (a grid of granules indexed by colour ∈ {white, yellow, black, green} and type ∈ {van, sedan, station wagon})
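In code, the twelve elementary granules of this example are simply the indiscernibility classes of the two attributes. The sketch below is our own illustration with a made-up car table; it groups objects by their (colour, type) signature.

from collections import defaultdict

cars = {
    "c1": {"colour": "white", "type": "van"},
    "c2": {"colour": "white", "type": "van"},
    "c3": {"colour": "black", "type": "sedan"},
    "c4": {"colour": "green", "type": "station wagon"},
}

granules = defaultdict(set)
for car, desc in cars.items():
    granules[(desc["colour"], desc["type"])].add(car)

# each key is a conjunction of descriptors, each value the corresponding granule
for signature, members in granules.items():
    print(signature, "->", members)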
For a set X of cars the lower and the upper approximations are also depicted in Figure 23.2. Examples of complex granules are tolerance granules created by means of a similarity (tolerance) relation between elementary granules, decision rules, or sets of decision rules. 23.3.1 Syntax and Semantics of Information Granules Usually, together with an approximation space, there is also specified a set of formulas Φ expressing properties of objects. Hence, we assume that together with the approximation space AS_{#,$} there are given
• a set of formulas Φ over some language,
• the semantics Sem of formulas from Φ, i.e., a function from Φ into the power set P(U).
Let us consider an example [7]. We define a language L_IS used for elementary granule description, where IS = (U, A) is an information system. The syntax of L_IS is defined recursively by
1. (a in V) ∈ L_IS, for any a ∈ A and V ⊆ V_a.
2. If α ∈ L_IS then ¬α ∈ L_IS.
3. If α, β ∈ L_IS then α ∧ β ∈ L_IS.
4. If α, β ∈ L_IS then α ∨ β ∈ L_IS.
The semantics of formulas from L_IS with respect to an information system IS is defined recursively by
1. Sem_IS(a in V) = {x ∈ U : a(x) ∈ V}.
2. Sem_IS(¬α) = U - Sem_IS(α).
3. Sem_IS(α ∧ β) = Sem_IS(α) ∩ Sem_IS(β).
4. Sem_IS(α ∨ β) = Sem_IS(α) ∪ Sem_IS(β).
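The recursive semantics above can be evaluated directly over an information system. The sketch below is ours, with an assumed tuple-based encoding of formulas; it simply mirrors the four clauses.

def sem(formula, universe, table):
    """Formulas are nested tuples: ("in", a, V), ("not", f), ("and", f, g), ("or", f, g)."""
    op = formula[0]
    if op == "in":
        _, a, values = formula
        return {x for x in universe if table[a][x] in values}
    if op == "not":
        return universe - sem(formula[1], universe, table)
    if op == "and":
        return sem(formula[1], universe, table) & sem(formula[2], universe, table)
    if op == "or":
        return sem(formula[1], universe, table) | sem(formula[2], universe, table)
    raise ValueError(f"unknown connective {op!r}")

# e.g. sem(("and", ("in", "colour", {"white"}), ("in", "type", {"van"})), universe, table)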
A typical method used by the rough set approach [7] for the constructive definition of the uncertainty function is the following: for any object x ∈ U, there is given information Inf_A(x) (the information signature of x in A) which can be interpreted as a conjunction EF_B(x) of selectors a = a(x) for a ∈ A, and the set I_#(x) is equal to Sem_IS(EF_B(x)) = Sem_IS(⋀_{a∈A} a = a(x)). One can consider a more general case, taking as possible values of I_#(x) any set ||α||_IS containing x. Next, from the family of such sets the resulting neighborhood I_#(x) can be selected. One can also use another approach by considering more general approximation spaces in which I_#(x) is a family of subsets of U. We now present the syntax and the semantics of examples of information granules. These granules are constructed by taking collections of already specified granules. They are parameterized by parameters which can be tuned in applications. In
the following sections we discuss some other kinds of operations on granules as well as the inclusion and closeness relations for such granules. Let us note that any information granule g can formally be defined by a pair (Syn(g), Sem(g)) consisting of the granule's syntax Syn(g) and semantics Sem(g). However, for simplicity of notation we often use only one component of the information granule to denote it.
23.4 Examples of Information Granules
Elementary granules. In an information system IS = (U, A), elementary granules are defined by EF_B(x), where EF_B is a conjunction of selectors (descriptors) of the form a = a(x), B ⊆ A and x ∈ U. For example, the meaning of an elementary granule a = 1 ∧ b = 1 is defined by Sem_IS(a = 1 ∧ b = 1) = {x ∈ U : a(x) = 1 & b(x) = 1}. Thus, in the system
S_B = (G_B, H, <, {ν_p}_{p∈H})    (23.2)
of elementary granules, G_B is a set of conjunctions of selectors, H = [0,1] and ν_p(EF_B, EF'_B) if and only if
card(Sem_IS(EF_B) ∩ Sem_IS(EF'_B)) / card(Sem_IS(EF_B)) ≥ p.
The number of conjuncts in the granule can be taken as one of the parameters to be tuned, which is well known as the dropping condition technique in machine learning. One can extend the set of elementary granules assuming that if α is any Boolean combination of descriptors over A, then (B̲α) and (B̄α) define the syntax of elementary granules too, for any B ⊆ A.
Sequences of granules. Let us assume that S is a sequence of granules and the semantics Sem_IS(•) in IS of its elements have been defined. We extend Sem_IS(•) to S by Sem_IS(S) = {Sem_IS(g)}_{g∈S}.
Example 1. Granules defined by rules in information systems are examples of sequences of granules. Let IS be an information system and let (α, β) be a new information granule received from the rule "if α then β", where α, β are elementary granules of IS. The semantics Sem_IS((α, β)) of (α, β) is the pair of sets (Sem_IS(α), Sem_IS(β)). If the right hand sides of rules represent decision classes, then among the parameters to be tuned in classification is the number of conjuncts on the left hand sides of rules. A typical goal is to search for the minimal (or close to minimal) number of such conjuncts (corresponding to the largest generalization) which still guarantees a satisfactory degree of inclusion in a decision class.
Sets of granules. Let us assume that a set G of granules and the semantics Sem_IS(•)
in IS for granules from G have been defined. We extend Sem_IS(•) to the family of sets H ⊆ G by Sem_IS(H) = {Sem_IS(g) : g ∈ H}. One can consider as a parameter of any such granule its cardinality or its size (e.g., the length of the granule representation). In the first case, a typical problem is to search in a given family of granules for a granule of the smallest cardinality sufficiently close to a given one. Example 2. One can consider granules defined by sets of rules. Assume that there is a set of rules Rule_Set = {(α_i, β_i) : i = 1, ..., k}. The semantics of Rule_Set is defined by Sem_IS(Rule_Set) = {Sem_IS((α_i, β_i)) : i = 1, ..., k}. The above mentioned searching problem for a set of granules corresponds, in the case of rule sets, to searching for the simplest representation of a given rule collection by another set of rules (or a single rule) sufficiently close to the collection. Example 3. Let us consider a set G of elementary information granules - describing possible situations - together with a decision table DT_α for any situation α ∈ G. Assume Rule_Set(DT_α) to be a set of decision rules generated from the decision table DT_α (e.g., in the minimal form). Now let us consider a new granule {(α, Rule_Set(DT_α)) : α ∈ G} with semantics defined by {Sem_DT((α, Rule_Set(DT_α))) : α ∈ G} = {(Sem_IS(α), Sem_DT(Rule_Set(DT_α))) : α ∈ G}. An example of a parameter to be tuned is the number of situations represented in such a granule. A typical task is to search for a granule with the minimal number of situations creating, together with the sets of rules corresponding to them, a granule sufficiently close to the original one. Extension of granules defined by tolerance relation. Now we present examples of granules obtained by application of a tolerance relation (i.e., a reflexive and symmetric relation; for more information see, e.g., [9]). Example 4. One can consider extensions of elementary granules defined by a tolerance relation. Let IS = (U, A) be an information system and let τ be a tolerance relation on elementary granules of IS. Any pair (τ : α) is called a τ-elementary granule. The semantics Sem_IS((τ : α)) of (τ : α) is the family {Sem_IS(β) : (β, α) ∈ τ}. Parameters to be tuned in searching for a relevant tolerance granule can be its support (represented by the number of objects supporting it) and the degree of its inclusion (or closeness) in some other granules, as well as parameters specifying the tolerance relation. Example 5. Let us consider granules defined by rules of tolerance information systems [9]. Let IS = (U, A) be an information system and let τ be a tolerance relation on elementary granules of IS. If "if α then β" is a rule in IS then the semantics of a new information granule (τ : α, β) is defined by Sem_IS((τ : α, β)) =
Sem_IS((τ : α)) × Sem_IS((τ : β)). Parameters to be tuned are the same as in the case of granules being sets of more elementary granules, as well as parameters of the tolerance relation. Example 6. We consider granules defined by sets of decision rules corresponding to a given evidence in tolerance decision tables. Let DT = (U, A, d) be a decision table and let τ be a tolerance relation on elementary granules of IS = (U, A). Now, any granule (α, Rule_Set(DT_α)) can be considered as a representative of the information granule cluster (τ : (α, Rule_Set(DT_α))) with the semantics Sem_DT((τ : (α, Rule_Set(DT_α)))) = {Sem_DT((β, Rule_Set(DT_β))) : (β, α) ∈ τ}. One can see that the considered case is a special case of the information granules from Example 3, with G defined by the tolerance relation.
23.5 Searching for Optimal Tolerance Granules In this section we discuss searching for tolerance granules. One of the problems when generating decision rules is the determination of optimal tolerance thresholds. The quality of the generated rules depends on the right choice of the tolerance threshold vector. To find optimal tolerance thresholds we have to count the differences between objects in the decision table. For DT = (U, A ∪ {d}) and δ_a for every attribute a ∈ A we can build a new decision table DT' = (U', A' ∪ {D}), where:
U' = {(x_i, x_j) ∈ U × U : i < j},
A' = {a' : U' → R_+ such that a'((x_i, x_j)) = δ_a(a(x_i), a(x_j)) for a ∈ A},
and the decision D((x_i, x_j)) equals 0 if d(x_i) = d(x_j) and 1 otherwise (cf. the SQL query below).
Next we can search for a tolerance threshold vector for the attributes which describe objects. The easiest way is to consider all possible combinations of vectors, which is what the "check all" method does. For all combinations an estimate of the quality function is calculated. The best threshold vector is the one with the highest value of the estimated quality function. This problem is computationally complex. Experiments showed that pairs of objects with different decision attribute values do not improve the quality of tolerance thresholds; it is enough to consider only pairs of objects with the same decision attribute value. If we want to use this method for many objects we have to divide the table DT' into parts. The second method is the heuristic "step forward" method. Threshold values are computed for every attribute separately. Let k > 0 be a given natural number. This method consists of the following steps:
Jaroslaw Stepaniuk
1. choose k best thresholds values for first attribute; 2. choose k best thresholds values for next attribute, from thresholds already chosen and actually counted; 3. repeat second step for all condition attributes; 4. choose k best thresholds values for first attribute considering all other threshold values. In our further presentations we will use the following data table: DT {ID, num, sym, d) where the first attribute is a unique identifier for an object and there is numeric attribute num and symbolic attribute sym and decision attribute d. Query which creates DT' table is as follows: CREATE TABLE DT' (num' INTEGER, sym' INTEGER, D INTEGER); Query which inserts data into DT' table: INSERT INTO DT' SELECT ABS(DTO.num-DTl.num) AS num', IIF(DTO.sym=DTl.sym,0,l) AS sym', IIF(DTO.d=DTl.d,0,l) AS D FROM DT AS DTO, DT AS DTI WHERE DTO.ID < DTI.ID; An idea of query which calculates NRd is as follows: SELECT COUNT (*) AS NRd FROM DT' WHERE (D=0); A sketch of SQL query for step forward method is presented below: SELECT DISTINCT
(((NRdRIA/NRd)*w)+((NRdRIA/NRIA)*(l-w))) AS q, (SELECT COUNT (*) FROM DT' WHERE (D=0 AND num' < A.num' AND sym'=B.sym')) AS NRdRIA, (SELECT COUNT (*) FROM DT' WHERE (num'
23 Tolerance Information Granules
315
FROM DT' WHERE (num' < A.num' AND sym'=B.sym')) <> 0 AND A.D=0 AND B.D=0 AND ((A.niiin'= THRES.O.l) OR ... OR (A.num'= THRES_0_k)); For more detailed presentation of SQL queries in searching for tolerance thresholds and generation of tolerance decision rules see [1].
Conclusions Syntax and semantics of information granules is discussed. An approach to tolerance information granules is presented. The examples are illustrated using SQL language. The approach seems to be promising and will be further explored.
Acknowledgements The research has been supported by the grant 4T11C014 25 from Ministry of Scientific Research and Information Technology of the Republic of Poland.
References 1. Dakowicz M., Stepaniuk J.: Tolerance Rough Sets and Data Base Management Systems, Proceedings of Concurrency, Specification and Programming Workshop, Czama, Poland, September 25-27,2003, 108-119. 2. Garcia-Molina H., Ullman J., Widom J.: Database Systems: The Complete Book, Prentice Hall, 2002. 3. Kloesgen W., Zytkow J. (Eds.): Handbook of Knowledge Discovery and Data Mining, Oxford University Press, Oxford, 2002. 4. Krawiec K., Slowinski R., Vanderpooten D.: Learning of Decision Rules from Similarity Based Rough Approximations. In: Skowron A., Polkowski L.(Eds.) Rough Sets in Knowledge Discovery. Physica Verlag, Heidelberg, 1998, 37-54. 5. Lukasiewicz J.: Die logischen grundlagen der wahrscheinilchkeitsrechnung, Krakow 1913. In Borkowski L., ed.: Jan Lukasiewicz - Selected Works. North Holland Publishing Company, Amstardam, London, Polish Scientific Publishers, Warsaw, 1970. 6. Pal S.K., Polkowski L., Skowron A. (Eds.): Rough-Neural Computing: Techniques for Computing with Words. Springer-Verlag, Berlin, 2004. 7. Pawlak Z.: Rough Sets. Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Dordrecht, 1991.
316
Jaroslaw Stepaniuk
8. Skowron A., Stepaniuk J.: Generalized Approximation Spaces, Proceedings of the Third International Workshop on Rough Sets and Soft Computing, November 10-12,1994, San Jose, California, USA, 18-21. 9. Skowron A., Stepaniuk J.: Tolerance Approximation Spaces, Fundamenta Informaticae, vol. 27 (2,3), 1996,245-253. 10. SQL standards: http://www.jcc.com/SQLPages/jccs_sql.htm . 11. Stepaniuk J.: Optimizations of Rough Set Model, Fundamenta Informaticae vol. 36(2-3), 1998, 265-283. 12. Stepaniuk J.: Knowledge Discovery by Application of Rough Set Models, L. Polkowski, S. Tsumoto, T.Y. Lin (Eds.), Rough Set Methods and Applications. New Developments in Knowledge Discovery in Information Systems, Physica-Verlag, Heidelberg, 2000, 137233. 13. Zadeh L.A.: Toward a theory of fuzzy information granulation and its certainty in human reasoning and fuzzy logic. Fuzzy Sets and Systems 90, 1997, 111-127. 14. Zadeh L.A.: A new direction in AI: Toward a computational theory of perceptions. AI Magazine 22(1), 2001, 73-84.
24
Attribute Reduction Based on Equivalence Relation Defined on Attribute Set and Its Power Set Ling Wei^ and Wenxiu Zhang'.2
•
^ Institute for Information and System Science, Faculty of Science, Xi'an Jiaotong University, Xi'an, People's Republic of China Department of Mathematics, North-west University, Xi'an, People's Republic of China q j jwv@nwu. e d u . en ^ Institute for Information and System Science, Faculty of Science, Xi'an Jiaotong University, Xi'an, People's Republic of China [email protected] t u . e d u . e n Summary. The knowledge discovery in information systems, essentially, is to classify the objects according to attributes and to study the relation among those classes. Attribute reduction, which is to find a minimum attribute set that can keep the classification ability, is one of the most important problems in knowledge discovery in information system. The general method to study attribute reduction in information system is rough set theory, whose theoretical basis is the equivalence relation created on universe. Novotny. M.(1998) [17] has proposed a new idea to study attribute reduction by creating equivalence relation on attribute set. In this paper, we develop this idea to study attribute reduction through creating equivalence relations on attribute set and its power set. This paper begins with the basis theory of information systems, including definitions of information systems, and equivalence relation RB on universe. Furthermore, two equivalence relations r and R are defined on attribute set and its power set separately. In the next section, two closed operators - C(R) and C{r) are created. Using these two operators, we get two corresponding closed set families - Cr, CR, which are defined as CR = {B, C(R){B) = B}, Cr = {B : C{r){B) = B}. Further, we study properties of these two closed set families, and prove that CR is a subset of Cr. One of the most important result is the necessary and sufficient condition about Cr=CR. This equivalence condition is described by elements of attribute set's division. Finally, based on the equivalence proposition, we find an easy method to acquire attribute reduction when Cr=CR. This method is easy to understand and use. Key words: information system, equivalence relation, closed operator, closed set
24.1 Introduction Knowledge discovery in database is an intelligent method of discovering unknown or unexplored relationship within a large database. It is defined as the nontrivial process *This work was supported by 973 Program of China (No. 2002CB312200).
318
Ling Wei and Wenxiu Zhang
of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [1]. Many different methods and tools are used in knowledge discovery of databases, such as Neural Network algorithm (NN) based on network structure, Genetic Algorithm (GA), Rough Sets (RS), Fuzzy Sets (FS) [2, 3, 4, 5, 6] and so on. Today, knowledge discovery is applied in a wide spectrum of fields: finance, banking, retail sails, manufacturing, monitoring and diagnosis, health care, marketing, and science data acquisition, among others [7, 8, 9, 10]. The knowledge discovery in information systems, essentially, is to classify the objects according to attributes and to study the relation among those classes. So, the knowledge discovery in information systems is the discovery about concepts and rules. Research literatures in this topic focus on attribute reduction. We all know not all of the attributes in an information system have the same importance. Some attributes are absolutely unnecessary, therefore deleting them would hardly affect the classification ability; some are absolutely necessary, and deleting them would surely lead to a decrease in classification ability; for those attributes are relatively necessary, keeping such attribute with others can improve the classification ability. Attribute reduction is to find the minimum attribute set to keep the classification ability [11,12]. The theory of rough sets is an important method in knowledge discovery field. It has been introduced by Zdislaw Pawlak to deal with imprecise or vague concept [2]. In recent years, we witnessed a rapid growth of interest in rough set theory and its applications [3, 13, 14, 15, 16]. Rough set theory study information systems through the equivalence relation created on universe (i.e. objects set). This idea inspired our initiative of studying information systems by utilizing equivalence relation created on attributes set. Miroslav Novotny (1998) has proposed this idea [17] and studied information systems using this method. In this paper, we studied not only the information systems, but also the relations based on attribute set. The paper begins with basic concept about information systems, and then we define two equivalence relations on attributes set and its power set separately. Then we create two closed set families C^, CR , and examine their relationship and properties. Finally, we find the necessary and sufficient condition about Cr=CR.
24.2 Equivalence Relation Based on Attribute Set and Its Power Set

Definition 1. An information system is defined as a three-tuple IS = (U, A, F), where U = {x_1, ..., x_n} is a finite set of objects, x_i (i <= n) being an object; A = {a_1, ..., a_m} is a finite set of attributes, a_j (j <= m) being an attribute; and F = {f_j : j <= m} is a set of relationships between U and A, with f_j : U -> V_j (j <= m), where V_j is the value set (domain) of attribute a_j. An information system is called a decision system when the attributes in A are composed of a condition attribute set C and a decision attribute set D, i.e. A = C ∪ D, C ∩ D = ∅; it is then denoted by DS = (U, C, D, F). From Definition 1 we see that an information system corresponds to a relational database table, and vice versa. That is to say, an information system is the abstract
description of a relational database table. In general, we say an information system (U, A, F) is a pure attribute system if each attribute in it classifies the object set into at least two classes. Such an information system has no one-valued attribute; namely, for any a ∈ A, |V_a| ≥ 2 holds. In fact, a one-valued attribute is ineffective for knowledge discovery in information systems, since it has no classification ability, so such attributes are always deleted before the information system is analyzed. The information systems we study in this paper are pure information systems obtained after the deletion of one-valued attributes. Suppose IS = (U, A, F) is an information system. For arbitrary B ⊆ A, define

R_B = {(x_i, x_j) : f_l(x_i) = f_l(x_j) for all a_l ∈ B}.    (24.1)

Then R_B is an equivalence relation on U, which can be proved directly. In particular, for an attribute b ∈ A, R_{b} = {(x_i, x_j) : f_b(x_i) = f_b(x_j)}. Let

R = {(B, C) : R_B = R_C},    r = {({b}, {c}) : R_{b} = R_{c}}.    (24.2)
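To make these constructions concrete, the following sketch (in Python, which is not used in the original chapter and serves only as illustration) computes the partition U/R_B induced by an attribute subset B, tests whether two subsets are R-equivalent, and groups attributes into r-classes; the toy data are hypothetical.

```python
# A toy information system: objects x1..x3, attributes a1..a3 (hypothetical values).
U = ["x1", "x2", "x3"]
A = ["a1", "a2", "a3"]
f = {"x1": {"a1": 1, "a2": 2, "a3": 2},
     "x2": {"a1": 2, "a2": 1, "a3": 2},
     "x3": {"a1": 2, "a2": 2, "a3": 1}}

def partition(B):
    """Return U/R_B as a frozenset of blocks: objects are R_B-equivalent
    iff they agree on every attribute in B (formula (24.1))."""
    blocks = {}
    for x in U:
        key = tuple(f[x][a] for a in sorted(B))
        blocks.setdefault(key, set()).add(x)
    return frozenset(frozenset(block) for block in blocks.values())

def R_equivalent(B, C):
    """(B, C) belongs to the congruence R iff R_B = R_C, i.e. both induce the same partition."""
    return partition(B) == partition(C)

# r-classes of single attributes: a and b are r-equivalent iff R_{a} = R_{b}.
r_classes = {}
for a in A:
    r_classes.setdefault(partition({a}), set()).add(a)

print("U/R_A      :", [sorted(b) for b in partition(set(A))])
print("A/r classes:", [sorted(c) for c in r_classes.values()])
print("R({a1},{a1,a2})?", R_equivalent({"a1"}, {"a1", "a2"}))
```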
We then have the following conclusions from [17].

Theorem 1. R is a congruence relation on P(A) and r is an equivalence relation on A, where P(A) denotes the power set of A. A congruence relation is an equivalence relation satisfying the following condition: for any B_i, C_i ∈ P(A), i = 1, 2, if (B_1, C_1) ∈ R and (B_2, C_2) ∈ R, then (B_1 ∪ B_2, C_1 ∪ C_2) ∈ R.
24.3 Relation Between Equivalence Classifications Based on the Attribute Set and Its Power Set

In this section, we create two closed operators and the corresponding closed set families using the above equivalence relations R and r. Furthermore, we study the relation between these two closed set families, obtain the equivalence condition for CR = Cr, and derive a method of attribute reduction when CR = Cr.

24.3.1 About the Closed Operators C(R) and C(r)

Definition 2. Suppose (P(A), ⊆) is an ordered set and C is a mapping of P(A) into itself. Then C : P(A) → P(A) is called a closed operator if it satisfies the following three conditions: (1) B ⊆ C(B) for any B ∈ P(A) (law of extensivity); (2) if B_1, B_2 ∈ P(A) and B_1 ⊆ B_2, then C(B_1) ⊆ C(B_2) (law of monotony); (3) C(C(B)) = C(B) for any B ∈ P(A) (law of idempotence).
Because R is an equivalence relation on P(A), there exists an equivalence classification of P(A):

P(A)/R = {[B]_R : B ⊆ A},    (24.3)

where [B]_R = {C : R_B = R_C}. Similarly, there is an equivalence classification of A:

A/r = {[b]_r : b ∈ A},    (24.4)

where [b]_r = {a ∈ A : (a, b) ∈ r}.

Theorem 2. Suppose [B]_R (B ⊆ A) is the equivalence class of B with respect to R and [b]_r (b ∈ A) is the equivalence class of b with respect to r. Define

C(R)(B) = ∪[B]_R    (24.5)

and

C(r)(B) = ∪{E ∈ A/r : E ∩ B ≠ ∅};    (24.6)
then C(R) and C(r) are closed operators.

Proof. The definition of C(R)(B) implies B ⊆ C(R)(B), so the mapping C(R) is extensive. Suppose that B_1, B_2 ∈ P(A) and B_1 ⊆ B_2 holds. We have (B_1, C(R)(B_1)) ∈ R and (B_2, C(R)(B_2)) ∈ R. Then (B_1 ∪ B_2, C(R)(B_1) ∪ C(R)(B_2)) ∈ R, that is, (B_2, C(R)(B_1) ∪ C(R)(B_2)) ∈ R. So C(R)(B_2) ⊇ C(R)(B_1) ∪ C(R)(B_2) ⊇ C(R)(B_1), and the mapping C(R) is monotone. From (C(R)(C(R)(B)), C(R)(B)) ∈ R and (B, C(R)(B)) ∈ R we have (C(R)(C(R)(B)), B) ∈ R, so C(R)(C(R)(B)) ⊆ C(R)(B); on the other hand, the extensivity and monotony of C(R) imply that C(R)(B) ⊆ C(R)(C(R)(B)). So C(R)(C(R)(B)) = C(R)(B) and the law of idempotence is satisfied. Therefore, C(R) is a closed operator on P(A). The proof that C(r) is a closed operator on P(A) can be found in [17].

24.3.2 Relation Between CR and Cr

Because C(R) and C(r) are closed operators on P(A), we denote

CR = {B : C(R)(B) = B},    Cr = {B : C(r)(B) = B}.    (24.7)

These are the closed set families corresponding to the closed operators C(R) and C(r). We will study the properties of CR and Cr.
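As an illustration only (not part of the original paper), the following Python sketch computes C(R)(B) and C(r)(B) for a hypothetical three-attribute system and enumerates the two closed set families; the data values are assumptions chosen for a small example.

```python
from itertools import chain, combinations

U = ["x1", "x2", "x3"]
A = frozenset(["a1", "a2", "a3"])
f = {"x1": {"a1": 1, "a2": 2, "a3": 2},
     "x2": {"a1": 2, "a2": 1, "a3": 2},
     "x3": {"a1": 2, "a2": 2, "a3": 1}}

def partition(B):
    blocks = {}
    for x in U:
        blocks.setdefault(tuple(f[x][a] for a in sorted(B)), set()).add(x)
    return frozenset(frozenset(b) for b in blocks.values())

def powerset(s):
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))]

def C_R(B):
    """C(R)(B) = union of [B]_R, i.e. of all C with R_C = R_B (formula (24.5))."""
    return frozenset().union(*[C for C in powerset(A) if partition(C) == partition(B)])

def C_r(B):
    """C(r)(B) = union of the A/r classes that intersect B (formula (24.6))."""
    classes = {}
    for a in A:
        classes.setdefault(partition({a}), set()).add(a)
    return frozenset().union(*[E for E in classes.values() if E & set(B)]) if B else frozenset()

CR = [B for B in powerset(A) if C_R(B) == B]
Cr = [B for B in powerset(A) if C_r(B) == B]
print("C_R family:", sorted(sorted(B) for B in CR))
print("C_r family:", sorted(sorted(B) for B in Cr))
```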
Theorem 3. CR and Cr have the following properties.

(1) ∅ ∈ Cr, A ∈ Cr.
(2) ∅ ∈ CR, A ∈ CR.
(3) If E ∈ A/r, then {b} ∈ [E]_R for any b ∈ E.
(4) If E ∈ A/r, then E′ ∈ [E]_R for any nonempty E′ ⊆ E.
(5) If E ∈ A/r, then E ∈ Cr; that is, A/r ⊆ Cr.
(6) Cr can be described as Cr = {∪_{E∈H} E : H ⊆ A/r}.
(7) If E ∈ A/r and E ∩ B ≠ ∅, then B ∪ E ∈ [B]_R.
(8) Suppose A = {a_1, ..., a_m}. If A/r = {E_1, ..., E_m}, i.e. E_i = {a_i}, then Cr = P(A).

Proof. (1) The definition of Cr implies these two results. (2) The definition of CR implies A ∈ CR, and the fact that the information system we study is pure implies ∅ ∈ CR. (3) If E ∈ A/r, then R_{a} = R_{b} holds for any a, b ∈ E. So

R_E = ∩_{a∈E} R_{a} = R_{b},

that is, {b} ∈ [E]_R. (4) For any b ∈ E′ ⊆ E we have R_{b} ⊇ R_{E′} ⊇ R_E; moreover, R_{b} = R_E by property (3). Hence R_{E′} = R_E, so E′ ∈ [E]_R. (5) Suppose E ∈ A/r. For any F (≠ E) in A/r we have F ∩ E = ∅; that is, among the elements of A/r, only E itself has a nonempty intersection with E. From the definition of C(r)(B) in formula (24.6) we then see that C(r)(E) = E, and it follows that E ∈ Cr. (6) Every element B of Cr satisfies

B = ∪{E ∈ A/r : E ∩ B ≠ ∅},

so Cr = {∪_{E∈H} E : H ⊆ A/r}. This is another description of Cr. (7) If b ∈ E ∩ B, then R_{B∪E} = R_B ∩ R_E = R_B ∩ R_{b} = R_B, so B ∪ E ∈ [B]_R. (8) Since E_i = {a_i}, we have, for every C ∈ P(A),

C = ∪{E ∈ A/r : E ⊆ C} = ∪{E ∈ A/r : E ∩ C ≠ ∅} = C(r)(C),

so C ∈ Cr and therefore Cr = P(A). This property shows that if each attribute's classification result is different, then Cr = P(A).

Theorem 4. There exists an inclusion relation between CR and Cr: CR ⊆ Cr.

Proof. Suppose B ∈ CR, which implies C(R)(B) = B. For every E ∈ A/r with E ∩ B ≠ ∅ we know B ∪ E ∈ [B]_R from property (7) of Theorem 3. So C(R)(B) ⊇ B ∪ E ⊇ B. Since C(R)(B) = B, B ∪ E = B holds, that is, E ⊆ B. So
C(r)(B) = ∪{E ∈ A/r : E ∩ B ≠ ∅} = ∪{E ∈ A/r : E ⊆ B} ⊆ B.

On the other hand, for every b ∈ B there exists E ∈ A/r which satisfies b ∈ E and E ∩ B ≠ ∅, and then b ∈ E ⊆ C(r)(B). So B ⊆ C(r)(B) is proved. The above shows that B = C(r)(B), so B ∈ Cr, and therefore CR ⊆ Cr. In general, only the relation CR ⊆ Cr holds; Theorem 4 gives the precise relation between the two families, namely that CR is a subset of Cr. The next subsection gives the necessary and sufficient condition for CR = Cr.

24.3.3 Equivalence Condition for CR = Cr and an Attribute Reduction Method

Generally speaking, CR and Cr are different. For example, in the information system described in Table 24.1,

Table 24.1. An information system with CR ≠ Cr
      a1  a2  a3
x1     1   2   2
x2     2   1   2
x3     2   2   1
we have CR = {∅, {a1}, {a2}, {a3}, A} and Cr = P(A).
This section gives the equivalence condition for CR = Cr.

Theorem 5. The necessary and sufficient condition for CR = Cr is: whenever B ≠ C, R_B ≠ R_C, where

B = ∪_{E_i∈α} E_i,    C = ∪_{E_i∈β} E_i,

and α, β ⊆ A/r satisfy α ≠ β.

Proof. We only need to prove the following equivalent proposition: CR ≠ Cr ⇔ there exist α, β ⊆ A/r with α ≠ β such that, for B = ∪_{E_i∈α} E_i and C = ∪_{E_i∈β} E_i with B ≠ C, we have R_B = R_C.

Sufficiency. If there exist α, β ⊆ A/r with B and C defined as in the proposition, then B, C ∈ Cr. At the same time, when B ≠ C, R_B = R_C holds. Then R_{B∪C} = R_B ∩ R_C = R_B = R_C, so we get C(R)(B) ⊇ B ∪ C ⊃ B or C(R)(C) ⊇ B ∪ C ⊃ C; that is to say, C(R)(B) ≠ B or C(R)(C) ≠ C. That
means that B ∉ CR or C ∉ CR. Therefore, we have CR ≠ Cr.

Necessity. If CR ≠ Cr, then there exists a B which satisfies C(r)(B) = B and C(R)(B) ≠ B. As B ∈ Cr, property (6) of Theorem 3 shows that B can be described as

B = ∪_{E_i∈α} E_i,

where α ⊆ A/r. Put C = C(R)(B). Since C(R) is a closed operator, C = C(R)(C), so C ∈ CR ⊆ Cr, and then C can be described as

C = ∪_{E_i∈β} E_i,

where β ⊆ A/r. It is obvious that α ≠ β and B ≠ C, but C = C(R)(B), hence R_B = R_C.

Analyzing Theorem 5 in detail, we can naturally find and prove an attribute reduction method using the equivalence condition when CR = Cr. The method is as follows: selecting one element (an attribute) from each E_i arbitrarily, we obtain a set containing such attributes, which is exactly an attribute reduction of the information system (U, A, F). If some E_i is a one-point set, then its element must be a kernel (core) attribute. We state this method as Theorem 6.

Theorem 6. Suppose (U, A, F) is an information system with CR = Cr, and let A/r = {E_1, ..., E_n}. Then B is a reduction of A ⇔ B = {e_1, ..., e_n}, where e_i ∈ E_i (i = 1, 2, ..., n). If there is an E_k with |E_k| = 1, then its element is a kernel attribute.

We give two examples of attribute reduction using Theorem 6. In these examples, the equivalence relations are R = {(B, C) : R_B = R_C} and r = {({b}, {c}) : R_{b} = R_{c}}.

Example 1. The information system is shown in Table 24.2. The attribute set is A = {a_1, a_2, a_3} and the object set is U = {x_1, x_2, x_3, x_4}.

Table 24.2. An information system for Example 1
      a1  a2  a3
x1     1   2   1
x2     2   1   2
x3     3   1   2
x4     1   1   2
We can easily get the following results: CR = Cr = {∅, {a1}, {a2, a3}, A},
A/r = {{a1}, {a2, a3}}. Therefore, the attribute reduction of this information system is {a1, a2} or {a1, a3}, and a1 is a kernel attribute.

Example 2. The information system is shown in Table 24.3.

Table 24.3. An information system for Example 2

      a1  a2  a3
x1     1   1   1
x2     1   1   2
x3     1   2   2
x4     2   2   2
We can easily get the following results: A/r = {{a1}, {a2}, {a3}}. Therefore, the attribute reduction of this information system is A itself.
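As a way of checking the examples mechanically, the following sketch (Python, not part of the original paper; the data reproduce Table 24.2) computes A/r and enumerates the reductions given by Theorem 6, under the assumption CR = Cr.

```python
from itertools import product

# Example 1 (Table 24.2): objects x1..x4, attributes a1..a3.
U = ["x1", "x2", "x3", "x4"]
A = ["a1", "a2", "a3"]
f = {"x1": {"a1": 1, "a2": 2, "a3": 1},
     "x2": {"a1": 2, "a2": 1, "a3": 2},
     "x3": {"a1": 3, "a2": 1, "a3": 2},
     "x4": {"a1": 1, "a2": 1, "a3": 2}}

def partition(B):
    blocks = {}
    for x in U:
        blocks.setdefault(tuple(f[x][a] for a in sorted(B)), set()).add(x)
    return frozenset(frozenset(b) for b in blocks.values())

# A/r: attributes are r-equivalent iff they induce the same partition of U.
classes = {}
for a in A:
    classes.setdefault(partition({a}), []).append(a)
A_over_r = list(classes.values())
print("A/r =", A_over_r)

# Theorem 6 (assuming C_R = C_r): every choice of one attribute per class is a reduction;
# singleton classes contribute kernel attributes.
reductions = [set(choice) for choice in product(*A_over_r)]
kernel = [E[0] for E in A_over_r if len(E) == 1]
print("reductions:", reductions)
print("kernel attributes:", kernel)
```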
24.4 Conclusion
In this paper, we study the following problems in an information system (U, A, F): the connection between the equivalence relations r and R defined on the attribute set A and on its power set P(A), respectively; the sufficient and necessary condition for CR = Cr; and a method of attribute reduction when CR = Cr. Using these results, we can simplify knowledge discovery in information systems by transferring the research from the attribute set's power set to the attribute set itself. In addition, our study places no restriction on the attribute set B: if B is a condition attribute set, we obtain knowledge of the raw information system; if B is a decision attribute set, we obtain knowledge of the decision system. All the results of this paper can serve as a theoretical basis for knowledge discovery in information systems. Of course, we believe that there may exist more useful methods and theories which can improve knowledge discovery in information systems, and we hope this paper can inspire further research in this direction.
References 1. U. Fayyad, G. Piatetsky-Shapiro, and P.Smyth(1996) From Data Mining to Knowledge Discovery: An Overview. In: U. Fayyad, G. Piatetsky-Shapiro, P.Smyth, R. Uthurusamy (eds) Advances in knowledge Discovery and Data Mining, MIT Press, Cambridge, Mass.
2. Pawlak Z (1991) Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht.
3. Polkowski L, Lin T Y, Tsumoto S (eds) Rough Set Methods and Applications: New Developments in Knowledge Discovery in Information Systems. Physica-Verlag, Heidelberg.
4. Siddhartha B, Olivier V. Pictet, Gilles Z (2002) Knowledge-intensive genetic discovery in foreign exchange markets. IEEE Transactions on Evolutionary Computation 6(2):169-181.
5. Engelbert Mephu Nguifo, Vincent Duquenne, Michel Liquiere (2003) Introduction - Concept Lattice-Based Theory, Methods and Tools for Knowledge Discovery in Databases: Applications. Applied Artificial Intelligence 17(3):177-180.
6. K. J. Adams, D. A. Bell, Liam P. Maguire, J. McGregor (2003) Knowledge Discovery from Decision Tables by the Use of Multiple-Valued Logic. Artificial Intelligence Review 19(2):153-176.
7. U. Fayyad (1996) Data Mining and Knowledge Discovery: Making Sense Out of Data. IEEE Expert, 20-25.
8. T. Oyama, K. Kitano, Kenji Satou, T. (2002) Extraction of knowledge on protein-protein interaction by association rule discovery. Bioinformatics 18(5):705-714.
9. Kay Chen Tan, Q. Yu, C. M. Heng, T. H. Lee (2003) Evolutionary computing for knowledge discovery in medical diagnosis. Artificial Intelligence in Medicine 27(2):129-154.
10. Carolina Silva, Cirano Iochpe, Paulo Engel (2003) Using Knowledge Discovery to Identify Analysis Patterns for Geographic Databases. ICEIS (2):359-364.
11. Kryszkiewicz M, Rybinski H (1993) Finding reducts in composed information systems. In: Ziarko W (ed) Rough Sets, Fuzzy Sets and Knowledge Discovery.
12. Wroblewski J (1995) Finding minimal reducts using genetic algorithms. In: Wang P P (ed) Proceedings of the International Workshop on Rough Sets and Soft Computing at the Second Annual Joint Conference on Information Sciences (JCIS'95), Wrightsville Beach, NC.
13. James F. Peters, Andrzej Skowron (2002) A rough set approach to knowledge discovery. International Journal of Intelligent Systems 17(2):109-112.
14. Yasdi R (1995) Combining Rough Sets learning and neural learning method to deal with uncertain and imprecise information. Neurocomputing 7(1):61-84.
15. Slowinski R (1995) Rough set approach to decision analysis. AI Expert 5:19-25.
16. Aijun A et al (1996) Discovering rules for water demand prediction - an enhanced rough set approach. Engineering Applications of Artificial Intelligence 9(6):645-653.
17. Miroslav Novotny (1998) Dependence Spaces of Information Systems. In: E. Orlowska (ed) Incomplete Information: Rough Set Analysis. Physica-Verlag.
25 Query Cost Model Constructed and Analyzed in a Dynamic Environment
Zhining Liao, Hui Wang, David Glass, and Gongde Guo
School of Computing and Mathematics, Faculty of Engineering, University of Ulster, BT37 0QB, UK {Z.Liao, H.Wang, Dh.Glass, G.Guo}@ulster.ac.uk
Summary. Query processing over the Internet involving multiple data sources has proven to be one of the most difficult and important problems in the modern e-data-sharing society. In this new data processing environment, three major factors affect the cost of a query: the network congestion situation, the server state (server workload), and data/query complexity. In this paper, we construct cost models for estimating the cost of a query and split the query cost into a data-search cost and a data-transmission cost. We also study how to capture changes of the query system in order to update the cost models whenever needed, and we use a real discrete Fourier transform method to filter the noise from the main trend of the network and the query system, yielding more accurate cost models. We can then choose the best query plan according to the updated cost model. Key words: query optimization, cost model, data processing
25.1 Introduction To meet the growing needs for sharing pre-existing data sources over the Internet, data integration from a multitude of autonomous data sources has recently been a research focus. Query optimization is a major stage in data integration over the Internet. It requires the estimated costs of possible query plans to select the best query plan in terms of costs. The key challenges arise due to the dynamics and unpredictability of the workloads of both the network and the autonomy of remote data sources. These sources may not provide availability metrics for accurate cost estimation. Therefore, some methods of deriving cost models for autonomous data sources at a global level are significantly important in order to accurately process query. Several methods proposed [1, 3, 7-13] assume that the system environment (both the network itself and the remote servers) does not change significantly over time. Therefore, the impact of the two factors is not explicitly involved in query cost estimation. The significance of recognizing the impact of the overall system contention states has been studied in two separate research projects recently. In [11-13], the effects of the workload of a server on the cost of a query are investigated and a method to decide the contention states of a server is developed. Cost models derived through sampling queries for
each contention state are also constructed for estimating the costs of further queries. This research has been concentrated on the workload of a server only and the experiments have been carried out in a peer-to-peer environment, and it is assumed that the network is steady and does not consume much of the query cost. In [5, 9], on the other hand, the importance of coping with the dynamic network environment is addressed. The effects of the network factor are investigated and a cost estimation model is proposed that measures the costs of the same query in different network situations, e.g., different times of the day. The main drawbacks of the method are twofold. First of all, the dimension of Time has the minimum scale of one hour. If a remote source is highly dynamic, hourly intervals may be too large to reflect the change of the server. Secondly, this approach considers only the quantity of data to be transferred and does not consider the variety of queries using different operators. The complexity of a query can affect its response time significantly even under the same workload of the network. In [4], We combine two factors (network congestion situation, server contention states) together as system contention states to discuss the effect of system contention states on the cost of a query. In this paper, we construct the estimated cost model and split the query cost into the cost of the network and the cost of servers. We also propose a model to update the cost model to adapt the changing environment.
25.2 System contention states and cost models

25.2.1 Grouping costs of sample queries using clustering techniques

To establish the relationship between the contention states of a system and the cost of a query, a sample query is carefully designed. This query is of reasonable complexity and can be evaluated by the remote server quickly. It is executed on the remote server at fixed time intervals over a 24-hour period, and the costs of the query (T_i) at these time points (t_i) are collected. To determine an appropriate set of contention states for a dynamic environment, an algorithm often used for multi-dimensional data clustering [6] is modified to cluster one-dimensional data (i.e., the cost of the sample query in terms of time spent by the server). The key idea underlying the algorithm is to place each data object (the cost of a sample query) in its own cluster initially and then gradually merge clusters into larger ones until the desired number of clusters has been found. To determine appropriate cost models for system contention states, we carry out multiple regression analysis to build cost formulae [2, 11]. The multiple regression process allows us to establish a statistical relationship between the costs of queries and the relevant contributing (explanatory) variables, listed below. More details can be found in [4].

25.2.2 Multiple linear regression cost models

There are mainly five factors affecting the cost of a query in the wide area environment, as follows:
1. The number of tuples in an operand table.
2. The number of tuples in the result table.
3. The cardinality of an intermediate table.
4. The tuple length of the result table.
5. Contention in the system, including system factors such as CPU, I/O, or data items, and network factors such as network speed and data volume.

Other factors (e.g. the tuple length of an operand) are less important in the wide area environment, and some other factors (e.g. the physical size or indexes of a database) are not available in most cases in query processing systems over the Internet, since every data source is autonomous. Knowing the factors affecting the cost of a query, we can construct the cost model as follows. Let X_1, X_2, ..., X_p be p explanatory variables, which correspond to the factors discussed above. They do not have to represent different independent variables; it is allowed, for example, that X_3 = X_1 * X_2. The response (dependent) variable Y (which is the query cost in this paper) tends to vary in a systematic way with the explanatory variables X_i. For a unary query, let R_u be the operand table, N_u the cardinality of the operand table, N_r the cardinality of the result table, and L_r the tuple length of the result table. Then LN_r = N_r * L_r is the data volume to be transferred to the user. The cost estimation formula for unary queries is:

Y = B_0 + B_1·N_u + B_2·N_r + B_3·LN_r    (25.1)

For a join query, let R_u1 and R_u2 be the two operand tables, N_u1 and N_u2 their cardinalities, N_r the cardinality of the result table, and L_r the tuple length of the result table. Then LN_r is the data volume of the result table to be transferred to the user:

Y = B_0 + B_1·N_u1 + B_2·N_u2 + B_3·N_r + B_4·LN_r    (25.2)
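Fitting such a formula is a standard least-squares problem. The sketch below (Python with NumPy; the observations are entirely hypothetical and only illustrate the shape of the computation, not the paper's data) fits the unary-query model (25.1) and applies it to a new query.

```python
import numpy as np

# Hypothetical observations of unary-query costs: each row is
# (N_u, N_r, LN_r) for one sample query, y is its observed cost in seconds.
X = np.array([[1000,  120,  9600],
              [5000,  800, 64000],
              [2000,  300, 24000],
              [8000, 1500, 90000],
              [3000,  450, 36000]], dtype=float)
y = np.array([0.42, 1.95, 0.80, 3.10, 1.15])

# Fit Y = B0 + B1*N_u + B2*N_r + B3*LN_r (formula (25.1)) by least squares.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print("fitted coefficients B0..B3:", coef)

# Estimate the cost of a new unary query with assumed characteristics.
new_query = np.array([1.0, 4000, 600, 48000])
print("estimated cost:", float(new_query @ coef))
```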
25.3 The cost of a query in the wide area environment

As we know, the network delay, the server state and the data volume all affect the cost of a query. The estimated cost (in time) of a query plan is therefore divided into three parts:

TotalTime = Σ_{i=1..k} (Time1_i + Time2_i + Time3_i)    (25.3)

TotalTime is the time from when a query is submitted until the user gets all of the data for the query. Time1 is the time to transmit the query from the query agent to the server over the Internet. Time2 is the time taken by a remote data source to perform a sub-query. Time3 is the time to transfer all the resulting data from a remote data source to the query agent. Since a global query is decomposed into many sub-queries and some of these sub-queries can be performed in parallel, the variable k is the (estimated) number of sub-queries that will be carried out in sequence; the total time taken to answer the global query is the sum of the times taken by the sequential sub-queries. Here we focus on Time1 and Time3. In [4], we proposed a sample query method to establish the relationship between a system contention state and the cost of a query. The process of the sample query method is to send a query to a server from a query agent and to record the relevant time points. Here we extend the method: we record several time points (T_s, T_rf, T_rl) and the numbers of data packages that the query agent sent and received (Num_s, Num_r). T_s: the time point at which the query is sent out from the query agent; T_rf: the time point at which the query agent receives the first data package; T_rl: the time point at which the last data package is received by the query agent; Num_s: the number of data packages sent to the server when the agent
submits a query; Num_r: the number of data packages received by the query agent for the data of the query. The query agent records T_s, T_rf and T_rl. Then we can use the following formulae to calculate the cost of a single data package (C_s) transmitted over the Internet (for simplicity we use a unary query as an example):

TotalTime = T_rl − T_s    (25.4)

Time3 = T_rl − T_rf + (T_rl − T_rf)/(Num_r − 1)    (25.5)

C_s = Time3/(Num_r − 1)    (25.6)

Time1 = Num_s · C_s    (25.7)

Time2 = TotalTime − Time1 − Time3    (25.8)

From formulae (25.4) to (25.8) we can separate the cost of a query into the server cost and the network cost. So when we use the sample query method, we can obtain the network cost, calculate the network speed during the time when the sample query was submitted, and determine the server's contention state by the method used in [4]. Then we can make a more accurate estimate of the cost of a query by estimating the cost of the server and of the network separately.
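A direct transcription of this decomposition is shown below (Python; the timestamps and package counts in the example are invented measurements, used only to exercise formulae (25.4)-(25.8)).

```python
def decompose_query_cost(t_s, t_rf, t_rl, num_s, num_r):
    """Split an observed query execution into network and server components,
    following formulae (25.4)-(25.8). Timestamps are in seconds; num_s/num_r
    are the numbers of packages sent and received (num_r must be > 1)."""
    total_time = t_rl - t_s                                   # (25.4)
    time3 = (t_rl - t_rf) + (t_rl - t_rf) / (num_r - 1)       # (25.5) result transfer
    c_s = time3 / (num_r - 1)                                 # (25.6) cost per package
    time1 = num_s * c_s                                       # (25.7) query transmission
    time2 = total_time - time1 - time3                        # (25.8) server processing
    return {"total": total_time, "network_send": time1,
            "server": time2, "network_receive": time3}

# Hypothetical measurement of one sample query.
print(decompose_query_cost(t_s=0.00, t_rf=0.35, t_rl=1.15, num_s=2, num_r=9))
```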
25.4 Updating query cost models in the wide area environment

After we have constructed the cost models for the query system, we can estimate the query cost accurately. But in the wide area environment, many factors can change the cost models. Although such an environment may not change dramatically during the execution of one query, the costs of the same query executed at different times can be significantly different. The query optimizer may then not choose the best query plan for a query, which would cause a serious performance problem. In this paper, we tackle this problem by changing the cost model, and we analyze two aspects of the query optimizer - the changes of the network and the changes of the server contention states - in order to capture the changing environment so that the cost model can be kept up to date at all times.

25.4.1 Updating the estimated formula for the cost of the network

In most situations the speed of the network changes irregularly and is affected by many kinds of factors; it is difficult to estimate the network state from a single observation made when the query is submitted to the server. To cope with this problem, the Real Discrete Fourier Transform is used to filter the noise and keep the main trend of the network speed; we can then estimate the network state based on this main trend. It is assumed that the recorded raw data about the speed of the network, N_speed(t), is composed additively of a long-term signal Nl_speed(t) and a noise n(t), that is, N_speed(t) = Nl_speed(t) + n(t). If we are able to remove n(t) from N_speed(t), then we obtain the main trend of the network speed. In our study, the Fourier transform is applied to the raw data to identify the long-term signal Nl_speed(t), as it is constructed mainly from waves with low frequency (slow changes over time), while the noise signal is constructed from waves with high frequency (fast changes over time).
The formulae are as follows. The n-point (n a power of 2) Real Discrete Fourier Transform of a signal x = [x_t], t = 0, 1, ..., n−1, is defined to be a sequence X of n/2+1 complex numbers X_f = R_f + i·I_f, f = 0, 1, ..., n/2, in which

R_f = Σ_{t=0}^{n−1} x_t cos(2πft/n),    I_f = Σ_{t=0}^{n−1} x_t sin(2πft/n),    f = 0, ..., n/2.    (25.9)

Here i is the imaginary unit. The signal x can be recovered by the inverse transform, with the reduction of noise:

x_t = (1/n) [ (R_0 + R_{n/2} cos(πt))/2 + Σ_{f=1}^{n/2−1} R_f cos(2πft/n) + Σ_{f=1}^{n/2−1} I_f sin(2πft/n) ],    t = 0, ..., n−1.    (25.10)

As we observed, although the changes of the network speed are irregular over a short time, the main trend of the network speed changes regularly over the period of a week: the speed of the network from Monday to Friday is much lower than the speed at the weekend, and peak time (8:00 am to 6:00 pm) is much busier than off-peak time. If the estimated cost of the network is significantly different from the observed cost, we can put the observed cost data into the data set of network costs at the proper time points and rebuild the formula for estimating the cost of the network using formulae (25.9) and (25.10). In this way we can capture the changes of the network and obtain a new cost formula for it.

25.4.2 Updating the cost model for the server cost

As we know, many factors can affect the server contention states, including the data volume, the number of visitors, the physical data distribution/organization on disk, the local database conceptual/physical schemas, etc. These factors usually change little by little, and a significant change may be accumulated after a certain period of time (e.g., a couple of days, weeks, or months). To deal with this problem, one direct way is to rebuild the cost model by the query sampling method [4] after obtaining new observed data of query costs. Two drawbacks still exist in this approach. First of all, a relatively high overhead is caused by frequently rebuilding the cost model. The other problem is caused by unpredicted factors such as temporary interruption of servers, network congestion, failure of some routes on the data transmission path, or virus attacks in the wide area environment. Such factors may appear randomly and lead to a very high cost of a query; when we record such a high cost, this kind of data is noise in the data set of query costs. If the data set with noise is used to rebuild the cost model, the query optimizer will choose an inefficient execution plan for a query based on the rebuilt cost model, and the effect of the noise will last a long time. To deal with the first problem, we record all sample data points since the cost model was last rebuilt and use them to update the cost model all at once, reducing the updating overhead. We also use the current cost model to estimate the sample queries and compare the observed data of the sample queries to the estimated data. If the error rate is high (e.g., the number of queries with a large cost estimation error is beyond a threshold), we use the observed data to update the cost model and capture the changes of the environment. For the second problem, we pre-process the data set of query costs before it is used to rebuild the new cost model. There is a clear advantage if we pre-process the raw data and work
on the pre-processed information. In this paper, we use a real Fourier transform to filter the noise and keep the main trend of the changes of the environment. It is assumed that the data set of the costs of the sample query, Cs(t), is composed of a long-term signal Csl(t) and a noise n(t), that is, Cs(t) = Csl(t) + n(t). If we are able to remove n(t) from Cs(t), the main trend of the changes of the environment can be obtained. The cost model rebuilt from the pre-processed data set reflects the query system more accurately, so the proposed techniques are quite promising for maintaining accurate cost models efficiently in a changing environment.
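One simple way to realize this low-pass filtering is sketched below (Python/NumPy; the cutoff fraction and the sample series are assumptions for illustration, not parameters taken from the paper).

```python
import numpy as np

def filter_main_trend(samples, keep_fraction=0.1):
    """Keep only the lowest-frequency components of a series of observed
    costs (or network speeds), in the spirit of formulae (25.9)-(25.10):
    high-frequency coefficients are treated as noise and zeroed out.
    keep_fraction is an assumed tuning parameter, not from the paper."""
    x = np.asarray(samples, dtype=float)
    spectrum = np.fft.rfft(x)                      # real DFT: n/2 + 1 coefficients
    cutoff = max(1, int(len(spectrum) * keep_fraction))
    spectrum[cutoff:] = 0.0                        # drop fast changes over time
    return np.fft.irfft(spectrum, n=len(x))        # inverse transform = main trend

# Hypothetical hourly cost samples of one sample query over two days (n = 48).
rng = np.random.default_rng(0)
hours = np.arange(48)
observed = 1.0 + 0.4 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.15, 48)
trend = filter_main_trend(observed, keep_fraction=0.15)
print(np.round(trend[:6], 3))
```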
25.5 Conclusions

In this paper, we construct cost models for estimating the cost of a query and study the behavior of the wide area network. We can thus estimate the cost of a query by separating the estimated query cost into a data-search cost and a data-transmission cost in the wide area environment. We also study how to capture the changes of the network and of the query system, and we use a real discrete Fourier transform method to filter the noise from the main trend of the network and the query system. We then use the pre-processed data sets to rebuild the cost models of the network and the query system, so the cost models can be kept up to date whenever needed and the best query plan can be chosen according to the new situation.
References 1. Adali S.,Candan K.S., Papakonstantinou Y., Subrahmanian V.S, Query caching and optimization in distributed mediator systems, Proc. of ACM SIGMOD'96, 1996, pp.l37-I48 2. Chatterjee S., Price ^..Regression Analysis by Example John Wiley & Sons, 1991 3. Du W, Query optimization in heterogeneous DBMS, Proc. of VLDB'92, 1992, pp.l03119 4. Liu W., Liao Z., Jun H., Query cost estimation through remote server analysis over the Internet, Proc. Of Wr03, 2003, pp.345-355 5. Gruser J.R., Raschid L, Zadorozhny V., Zhan T, Learning response time for web-sources using query feedback and application in query optimization, VLDB Journal, 9(1), 2000, pp. 18-37 6. Muralikrishna M., Dewitt D.J., Equi-depth histograms for estimating selectivity factors for multi-dimensional queries, Proc. of SIGMOD'88,1988, pp.28-36. 7. Roth M.T., Ozcan R, Haas L.M, Cost models DO matter: providing cost information for diverse data sources in a federated system, Proc. of VLDB'99,1999, pp.599-610. 8. Ling Y., Sun W, A supplement to sampling-based methods for query size estimation in a database system SIGMOD Record, 21(4), 1992, pp.12-15 9. Zadorozhny V., Raschid L., Zhan T., Bright L, Validating an Access Cost Model for Wide Area Applications. Cooperative Information Systems, Cooperative Information Systems, Vol9,2001,pp.371-385 10. Zhu Q., Larson, P. A, Query Sampling Method of Estimating Local Cost Parameters in a Multidatabase System, Proc. of ICDE'94, 1994, pp. 144-153 11. Zhu Q., Larson P., Building Regression Cost Models for Multidatabase Systems, Proc.PDIS'96, 1996, pp. 220-231. 12. Zhu Q., Motheramgari S., Sun Y, Cost estimation for large queries via fractional analysisand probabilistic approach in dynamic multidatabase environments, Proc. of DEXA'OO, 2000, pp.509-525 13. Zhu Q., Motheramgari S., Sun Y, Developing cost models with qualitative variables for dynamic multidatabase environments, Proc. of ICDE'OO, 2000.
26 The Efficiency of the Rules' Classification Based on the Cluster Analysis Method and Salton's Method Agnieszka Nowak and Alicja Wakulicz-Deja University of Silesia, Institute of Computer Science B^dzinska 39, 41-200 Sosnowiec, Poland [email protected] [email protected]
26.1 Introduction

The aim of this paper is a comparison of document classification based on distance analysis (cluster analysis) and classification based on coefficients of similarity (Salton's method). We are interested in whether the resulting groups of documents are identical (similar) and whether new cases are classified similarly. Positive results will give us a basis for using the method based on coefficients of similarity to classify sets of rules. Classification based on coefficients of similarity is used to build document structures and for retrieval in search engines, while cluster analysis is used for the classification of facts and rules in a knowledge base. It seems that a mechanism based on coefficients of similarity can be very convenient for the inference process. This paper addresses the first issue, i.e. the comparison of the rule classification process based on the cluster analysis method and on Salton's method.
26.2 The classification as a solution to the recognition problem

Classification is the task of recognizing the membership of various types of objects in some classes. Every object is described by a set of attribute values, which makes it possible to calculate whether the object belongs to a specific class. Assume D is the set of objects which should be recognized. Then, on this set, an equivalence relation K ⊆ D × D, the so-called classification, can be defined. It defines the division of the set D into a collection of equivalence classes {D_i} [1].

26.2.1 The structure of the attribute space

The measurement of the attributes of the objects, both the model objects and the new ones, is the initial element of any classification algorithm. It maps each object d into
a point of the space X. This space is treated as a Euclidean space, where each document is an n-dimensional vector x = (x_1, x_2, ..., x_n) ∈ X. From among all classification methods we pay attention to minimal-distance methods, which require choosing a distance measure. It can be any mapping ρ : X × X → R for which, for all vectors x, y, z ∈ X, the usual metric conditions hold [3]:

ρ(x, y) ≥ 0,    ρ(x, y) = ρ(y, x),    ρ(x, z) ≤ ρ(x, y) + ρ(y, z).    (26.1)
26.2.2 Methods of estimating distance and similarity

The cluster analysis method studies the distances between objects, whereas Salton's method studies their similarity. There is a very close relationship between them: the smaller the distance between two objects, the larger the degree of their similarity (and vice versa). The literature offers many different similarity and distance measures; most often the following are used [1]:

Euclidean measure:

d(x, y) = sqrt( Σ_{i=1}^{n} (x_i − y_i)² )    (26.2)

Cosine measure:

p(x, y) = ( Σ_{i=1}^{n} x_i·y_i ) / ( sqrt(Σ_{i=1}^{n} x_i²) · sqrt(Σ_{i=1}^{n} y_i²) )    (26.3)

where the vectors x and y are compared in the n-dimensional space.

26.2.3 The types of classification

There are many methods of object classification, among which the most important and most general are graphical, hierarchical and k-optimizing methods. The most popular of all is the k-means method, and that is why this method was chosen for the classification of rules in this paper.
26.3 The k-means method

In the literature we can find many versions of the k-means method. The simplest version tells us to choose k objects from the set at random as the centers of k groups and to assign the remaining objects to them. In each iteration the number of groups is fixed and only the composition of the groups can change. At the front of each group there is one representative (the center), which is the center of gravity of the group's set of document vectors [1]. The k-means algorithm:
1. For every cluster its center is computed.
2. Every considered object is assigned to the closest center.
3. If stabilization has not been reached yet, objects near the cluster boundaries are moved between clusters, and new, corrected centroids are determined.
4. Steps 2 and 3 are repeated until stabilization is achieved (no object can be placed in a better group).

Our goal is to obtain minimum distances inside the clusters and maximum distances between clusters.
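The following sketch (Python; not part of the original paper) implements this simple k-means variant for fixed-length attribute vectors such as the 8-position rule vectors used later in the paper; the data, k and seed are hypothetical.

```python
import random

def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5   # formula (26.2)

def k_means(vectors, k, max_iter=100, seed=0):
    random.seed(seed)
    centers = random.sample(vectors, k)          # k random objects as initial centers
    assignment = None
    for _ in range(max_iter):
        # steps 1-2: assign every object to its closest center
        new_assignment = [min(range(k), key=lambda j: euclidean(v, centers[j]))
                          for v in vectors]
        if new_assignment == assignment:          # step 4: stabilization reached
            break
        assignment = new_assignment
        # step 3: recompute each center as the center of gravity of its group
        for j in range(k):
            members = [v for v, g in zip(vectors, assignment) if g == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

# Hypothetical 8-position rule vectors (value of attribute i, or 0 if unused).
rules = [[0, 0, 0, 0, 1, 0, 0, 4], [0, 0, 0, 0, 1, 0, 0, 3],
         [0, 1, 0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 2, 0, 0, 4]]
centers, groups = k_means(rules, k=2)
print("centers:", centers, "groups:", groups)
```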
26.4 Salton's method

Initially we suppose that all documents are loose (unassigned) documents. Each document is subjected to a density test, which checks how many documents are situated in its neighborhood: more than N_1 documents should have a correlation coefficient with the tested document higher than some parameter p_1, and more than N_2 documents higher than p_2. If the document passes the density test, we determine the minimal and maximal groups and choose p_min as the minimal value of the correlation coefficient. The group is created from the documents whose correlation with the central element is higher than p_min. Next, for each group, we create a centroid vector; it consists of the attributes describing the objects of the given group. This centroid vector is compared with the whole set. To create the final groups, the threshold parameters are calculated again, and the combined process is repeated for all unconnected elements [2].
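A rough sketch of this grouping step is given below (Python; the thresholds N1, N2, p1, p2 follow the values used later in Sect. 26.5.3, while the choice of p_min, the data and everything else are illustrative assumptions, since the paper does not spell out those details).

```python
def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = sum(a * a for a in x) ** 0.5
    ny = sum(b * b for b in y) ** 0.5
    return dot / (nx * ny) if nx and ny else 0.0      # formula (26.3)

def salton_group(docs, center_idx, p1=0.85, n1=3, p2=0.70, n2=5):
    """Density test around one candidate center, then group formation.
    Returns (group indices, centroid) or None if the test fails."""
    center = docs[center_idx]
    sims = {i: cosine(docs[i], center) for i in range(len(docs)) if i != center_idx}
    if sum(s > p1 for s in sims.values()) <= n1 or sum(s > p2 for s in sims.values()) <= n2:
        return None                                    # density test failed
    p_min = min(s for s in sims.values() if s > p2)    # assumed choice of p_min
    group = [center_idx] + [i for i, s in sims.items() if s >= p_min]
    members = [docs[i] for i in group]
    centroid = [sum(col) / len(members) for col in zip(*members)]
    return group, centroid

# Hypothetical 8-position rule vectors; document 0 is the candidate center.
docs = [[0,0,0,0,1,0,0,4], [0,0,0,0,1,0,0,3], [0,0,0,0,2,0,0,4], [0,0,0,0,1,0,0,4],
        [0,0,0,0,1,1,0,4], [0,0,0,0,2,0,0,3], [0,0,0,0,1,0,0,5], [0,1,0,1,0,0,0,0]]
print(salton_group(docs, center_idx=0))
```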
26.5 The description of the stages of the analysis

The analysis presented in this paper was carried out on a test set of 24 rules of different lengths. Its aim was the formation of two groups of documents and the selection of classification vectors for the created groups, so that new cases can easily be classified and the final value of the decision attribute selected. Groups of combined objects are created, and the retrieval process is carried out only on selected groups.

26.5.1 The guidelines

To simplify the problem, every object is normalized to a fixed dimension: the values of the attributes are written position by position, from 1 to n (e.g. the condition "attribute 5 = 1 and attribute 8 = 4" becomes the vector [0 0 0 0 1 0 0 4]). The first stage of the classification concerned the conditional parts of the rules: each rule was replaced by an 8-dimensional fixed-length vector. At each position of this vector there is the value of the given attribute if it appears in the conditional part of the given rule; if the rule does not use the attribute, the position holds zero. As a result we obtain a data set with a uniform structure, which is easy to analyze. We want to create groups of similar objects (documents) with a representative at the front of each group. These objects are recorded in the new cases table. Besides the conditional part
of a given rule, the table also stores its weight. If an object is new, it is added to the new cases table with a weight equal to one; the weights of existing cases are suitably increased.
Fig. 26.1. The structure of the new cases table for the examined set of rules
26.5.2 The cluster analysis method

We choose k cluster centers at random, e.g. k_1 ⇔ (5 = 1, 8 = 4) ⇔ [0 0 0 0 1 0 0 4], and similarly k_2; the remaining objects (among them x_18, x_20, x_21, x_23 and x_24) are then assigned to the nearest center.
In the following iterations we check the stability of this structure. To do so, we calculate again the distance of each object to the group centers. As new group centers we choose the so-called centers of gravity, calculated according to the average-link (weighted average connections) method: the central object, the so-called centroid C_i, is calculated as the average over all objects of the given cluster, according to the following measure:

C_i = (1/N) Σ_{j=1}^{N} O_ji    (26.4)

where N is the number of objects and O_ji is the value of object O_j at the i-th position. In this way we obtain new group centers, which we use to form new groups; these are k_1 = [0 0 0 0 1 0 0 3] and k_2 = [0 0 0 1 0 0 0 0]. After the next iteration it turned out that the groups did not change their state, which means that at this point we
can stop building the cluster structure further. If no object moved to another group, then the group centers did not change, so the distances of the individual objects have not changed either. If we wanted to aim at the best possible result, we should choose completely new group centers and compare the outcome with the results obtained so far; finally, we should choose the classification which best achieves both goals of cluster analysis.

26.5.3 Salton's method

In the proposed Rocchio algorithm, Salton requires certain established parameters (the so-called density test parameters): p_1 > 0.85, N_1 = 3, p_2 > 0.7, N_2 = 5. Additionally, we choose one object x_c at random as the potential center of the first group, x_c1 ⇔ (5 = 1, 8 = 4) ⇔ [0 0 0 0 1 0 0 4], and we try to form a new group around this object. We calculate the similarity of each object to the chosen one according to the cosine measure. The value of this measure belongs to the interval [0, 1]; a value close to 1 indicates that the two objects are very similar to each other, while a value close to 0 means the reverse. The obtained coefficients are sorted in decreasing order, so at the top of the hierarchy are the objects whose similarity to the group center is the largest. The chosen objects passed the density test, so now, on the basis of the split group, we establish p_min. For the groups formed in this way we create the centroid vector as the center of gravity of the vectors belonging to the maximal group. It turns out that this vector has the same form as the vector created by the cluster analysis method. Now it is necessary to run the density test for the centroid and finally establish the size of the group: we have to calculate the similarity of the initial group's objects with the established x_c = [0 0 0 0 1 0 0 3]. It turned out that m_1 = m_2, so we calculate p_min = 0.86 again, and all objects whose coefficient of similarity to x_c is higher than or equal to p_min are added to the first group. Therefore, after two iterations the following structure is obtained: the centroid C_1 = [0 0 0 0 1 0 0 3] with the group X_1 = {x_1, x_3, x_4, x_19, x_22}; the centroid C_2 = [0 1 0 1 0 0 0 0] with the group X_2 = {x_6, x_9, x_10, x_11, x_12, x_13, x_14, x_21, x_23}; and the third group of loose documents L = X_3 = {x_2, x_5, x_7, x_8, x_15, x_16, x_17, x_18, x_20, x_24}.
26.6 The received results - final conclusions

The group structure formed in this way lets us classify further new cases. We intend to examine the similarity of new objects to the objects that have already been classified and placed in the cases table. If the same vector as the one being analyzed is already present in the cases table, we increase the weight of this case. Additionally, we check the uniqueness of the decision for this case: if there is only one decision, the weight for this case is appropriately increased; in the opposite situation we appropriately increase the weight of each such case, or of the one with the largest frequency. If the case is new, it is added at the end of the table with a weight equal to
1. As a result of the classification process applied to the same set of objects, two groups of rules were created by both methods. The contents of these groups are very similar: the same objects belong to corresponding groups, and even the centroids are the same. The conclusion is that both methods achieve the same result; only the initial parameters of the two algorithms differ. No doubt, apart from the similarities, the methods also have differences:
1. The k-means method generates all groups in each step of the algorithm and their number does not change, whereas Salton's method generates only one group in each iteration.
2. The k-means method chooses for each object the cluster with the smallest distance, while Salton's method lets an object that is similar to more than one group belong to several of them at the same time; it can also be classified into the group of loose documents.
According to these observations it is really hard to assess which method is better, which raises the question of how to measure the efficiency of the obtained classifications. At this stage we can say that the k-means method with the k-medoid modification (which will be the object of further research) is more useful than Salton's method: it calculates the total cost of each classification as the sum of all distances of the objects to their centers, and the classification with the smallest cost is the best possible solution. In our research the cluster analysis method will be used as the reference method for creating groups.
References 1. E. Gatnar: Symboliczne metody klasyfikacji danych, W-wa 1998, PWN [Polish] 2. G. Salton: SMART - automatyczny system wyszukiwania informacjU W-wa 1975, WNT [Polish] 3. R. Tadeusiewicz, M. Flasiiiski: Rozpoznawanie obrazow, W-wa, PWN, [Polish]
27 Extracting Minimal Templates in a Decision Table
Barbara Marszal-Paszek and Piotr Paszek
Institute of Computer Science, University of Silesia, Bedzinska 39, 41-200 Sosnowiec, Poland [email protected]
Summary. In 1991 the basic functions of the evidence theory were defined in terms of the concepts of the rough set theory. In this paper we use these functions in specifying minimal templates of decision tables. The problem of finding such templates is NP-hard; hence, we propose some heuristics based on genetic algorithms. Key words: templates, rough sets, evidence theory
27.1 Basic Concepts of the Rough Set Theory

In 1982 Z. Pawlak created the rough set theory as an innovative mathematical tool for describing knowledge, including uncertain and inexact knowledge [2]. In this theory, knowledge is based on the possibility (capability) of classifying objects. The objects may be, for instance, real objects, statements, abstract concepts, or processes. We recall some basic definitions of the rough set theory [2].

Definition 1. Information system
A pair A = (U, A) will be called an information system, where U is a nonempty, finite set called the universe, and A is a nonempty, finite set of attributes. Each attribute a ∈ A is a function a : U → V_a, where V_a is the set of values of attribute a, called the domain of the attribute.

One of the basic concepts of the rough set theory is the indiscernibility relation. It can be defined for a given information system as follows.

Definition 2. Indiscernibility relation
The B-indiscernibility relation IND(B) for B ⊆ A is defined by

IND(B) = {(x, y) ∈ U × U : a(x) = a(y) for all a ∈ B}.
The indiscernibility relation is an equivalence relation. The equivalence class of the relation IND(B) determined by an object x is the set of all objects B-indiscernible from x. By [x]_B we denote the equivalence class of x, i.e., the set

[x]_B = {y ∈ U : x IND(B) y}.

Now we can introduce the definitions of the upper approximation and the lower approximation of a set, which make it possible to define rough concepts.

Definition 3. Upper and lower approximations of a set
For any X ⊆ U and B ⊆ A, the B-lower approximation B̲X of X and the B-upper approximation B̅X of X are defined by

B̲X = {x ∈ U : [x]_B ⊆ X},    B̅X = {x ∈ U : [x]_B ∩ X ≠ ∅}.

The set

BN_B(X) = B̅X \ B̲X

is called the B-boundary region of X.

Definition 4. Positive region and negative region of a set
For each X ⊆ U and B ⊆ A, the B-positive region of the set X is defined by

POS_B(X) = B̲X.

The B-negative region of the set X is the complement (within the universe) of the B-positive one, i.e.,

NEG_B(X) = U \ B̅X.

Any decision table is a specific information system.

Definition 5. Decision table
Let A = (U, A) be an information system and let A = C ∪ D, where C, D are nonempty, disjoint subsets of A. The set C is called the set of condition attributes and the set D is called the set of decision attributes. The tuple A = (U, A, C, D) is referred to as a decision table.
27 Extracting Minimal Templates in a Decision Table
341
Let A = ([/, Au{d}) be a decision table, where d ^ Aisa. distinguished attribute called the decision. The decision d determines a partition of universe U into decision classes X i , . . . , X^(d), where r{d) = \{k : 3xeu d{x) = A;}| (means the number of different values of a decision attribute) is called the rank ofd. Let App.ClassA{d) be a family of sets defined by
App,ClassA{d)
= {AXu • • •, AXr(d)} U {BdA{0) : 6 e OA and \e\ > 1}
where
5d^(0) = f ) B A ^ ^ ( X , ) n n - ^ ^ A ( X i )
and
OA = {I,.. • ,r{d)}.
Now we defined a function of the form of F^ : OA —> App.ClassA (d) by (AXi for F^(^) = }
0 = {i}, i e { l , . . . , r ( d ) } e = ID 9 C OA, \0\ > 1
and the standard basic probability assignment TUA • 2^^ —> R+, is defined [4] by mA{0)=^-^^
for any
OCOA-
Theorem 1. For any 0 C OA the following equality holds [4]: BelA{0) =
\A u Xi\ '^'
It shows that BelA{0) (a behef function) is the ratio of the number of objects in U that may certainly be classified to the union Ui^gXi to the number of all objects inU. Corollary 1. For any 6 C OA the following equality holds [4]: \A U Xi\ PIA{0)
=
'^'
\u\
Hence, the plausibility function value for 6 is the ratio of the number of objects that may probably classified to the union Ui^gXi to the number of all objects in U.
342
Barbara Marszal-Paszek and Piotr Paszek
27.3 Templates in a Decision Table A template T in a decision table A = {U, A, C,d) is any sequence vi,... ,Vn, where Vi G Vai U {*} and ^ = { a i , . . . , an}. If the symbol * appears in a given template it means that the value of the marked attribute is not restricted by the template. We say that a given object matches a given template T = (t^i,..., Vn) if ai{x) = vi, for each i such that Vi ^ ^. The size of T is equal to the number of elements of T different from *. Minimal Template Problem For a given decision table we would like to find minimal templates (i.e., having possibly the smallest number of positions different from *) that define sets with some relevant properties of functions Bel and PI over the decision table. Let A be the decision table. By A T we denote the restrction of A to the template T, i.e., UT-{X G U : ai{x) — Vi, for alH G { 1 , . . . , n} such that Vi ^ *} and AT is the set of attributes of A restricted to UT. We consider the following problem: Minimal Template Problem (MTP) Input: decision table A; thresholds e^K e (0,1) and a natural number n. Output: minimal (with respect to the length) templates T for which there exists a set ^ Q ^AT ^ith at most n elements satisfying the following conditions: \PIAA^)
- BelAAO)\ < e
|P/A^(^)|>1-6
^
> if.
(27.1) (27.2) (27.3)
The support of the template T should be sufficiently large (condition (27.3)) and the set of decisions in A T should be well approximated, i.e., for some possible small set 0 C ©A^ the union |J^^^ Xi of decision classes Xi (i e 6) of A T is approximated with the high quality described by conditions (27.1)-(27.2). The condition (27.1) states that the relative size of the boundary region of [J^^Q Xi is sufficiently small and the condition (27.2) is expressing the fact that the the relative size of UT — [Ji^e Xi is sufficiently small. Any such template T can be also interpreted as a decision rule (association rule [1]) of the form aT —^ d £ 0 where ar = /\{ai = Vi : Vi j^ *}. Such rules can be interesting in case where there are no rules (with the right hand side described by a single decision value) in a given decision table that have satisfactory support (see inequality (27.3)). Then we search for rules having a sufficiently large support with respect to some minimal sets 0 of decision values. To extract such templates we use genetic algorithms.
27 Extracting Minimal Templates in a Decision Table
343
The templates can be used in decomposition of large data sets into smaller ones that are feasible for analysis by the existing rough set algorithms. Notice that the result of decomposition will be a forest of decision trees rather than a single decision tree. For applications some more advanced versions of the minimal template problem can be considered such as the critical pattern problem. Such critical patterns are defined by adding one more requirement to templates considered previously. This requirement states that any extension of the extracted pattern T has also the analogous properties as the template T.
27.4 Templates defined by Approximate Reducts with Respect to Decision Value Sets In this section we consider an approach for constructing from a given decision table decision tables that are semi-optimal with respect to the quality measures analogous to the defined in the previous section. Such tables are defined by subsets of condition attributes. This leads to decreasing the discemibility between objects. However, some unions of decision classes can be still approximated with the high quality. These attribute subsets can be treated as (approximate) reducts assuming that some grouping of decision values in the original decision table can be performed. Let A = ([/, A^ C, d) be a decision table and let Oj = d{U). We define a partial order on pairs (B, 0) where B C A and ^ C ©j by ( 5 , / ) < {B\ r ) iff EC B' and / C / ' .
(27.4)
For any B C A by AB we consider a restriction to B of the decision table A = (f/, A, C, d), i.e., A B = (C/, 5 U d, 5 , d). We consider the following problem: Minimal Redact Problem (MRP) Input: decision table A; thresholds k,l,€ e (0, l),k < I and a natural number n. Output: minimal pairs (5,6) such that B C A ,6 C 0 ^ ^ with at most n elements satisfying the following conditions: \PIAA^)
- BelAA^)\ < e
\PlAs{e)\>l-s k\U/IND{A)\
< \U/IND{B)\
(27.5) (27.6)
< l\U/IND{A)\.
(27.7)
The conditions (27.5)-(27.6) are analogous to those formulated for the previous problem. The requirement (27.7) states that the number of S-indiscmibility classes is reduced comparing the number of A-indiscemibility classes (it is less than l\IND{A)\) but its is still significant (it is greater than k\IND{A)\). We use genetic algorithm to generate semi-optimal solutions of the above problem. The discussed reducts can be used for decomposition of larger data tables into smaller ones that are feasible for the existing rough set based algorithms.
344
Barbara Marszal-Paszek and Piotr Paszek
27.5 Conclusion In the paper it was suggested that the relationships between the rough set theory and the evidence theory could be used in finding minimal templates for a given decision table, also for a table with grouped decision classes. Extracting the templates from data is a problem that consists in finding some reducts in a given decision table. Any such reduct is a set of attributes with a minimal number of attributes, which warrants, among others, that the difference between the belief function and the plausibility function is sufficiently small. The discussed reducts can be used in decomposition of larger data tables into smaller ones tractable by the existing rough set algorithms. The last statement is a subject of our current research. Ackonowledgments We wish to express our thanks to Professor Andrzej Skowron for his helpful comments during making this work.
References 1. Friedman, J.H. and Hastie, T., and Tibshirani, R.: The elements of statistical learning: Data mining, inference, and prediction. Springer, Heidelberg 2001. 2. Pawlak, Z.: Rough Sets: Theoretical aspects of reasoning about data. Boston: Kluwer Academic Publishers (1991). 3. Shafer, G.: A mathematical theory of evidence. Princeton University Press (1976). 4. Skowron, A. and Grzymala-Busse, J.: From the Rough Set Theory to the Evidence Theory. In R.R. Yager, M. Fedrizzi, J. Kacprzyk (eds.). Advances in the Dempster-Shafer Theory of Evidence. New York: Wiley (1994) 193-236.
Part II
Application Domains and Case Studies
28 Programming Bounded Rationality

Hans-Dieter Burkhard
Institute of Informatics, Humboldt University, D-10099 Berlin, Germany
[email protected]
Summary. Research on Artificial Intelligence and Robotics helps to understand problems of rationality. Autonomous robots have to act and react in complex environments under the constraints of their bounded resources. Simple daily tasks are much more difficult to implement than playing chess. Soccer-playing robots are considered as a test field for rational behavior. Our implementations are inspired by the belief-desire-intention model.
28.1 Introduction

Control of autonomous robots and agents in dynamic environments is interesting from a cognitive point of view as well as from an application point of view. Different architectures have been inspired by cognitive issues as well as by technical requirements. It is commonly accepted that simple stimulus-response behavior and deliberative decisions are both useful, according to different requirements; hybrid architectures are built to combine both. The underlying conflict between complex computations for efficient behavior and bounded resources is known as bounded rationality. In his book [2], Bratman investigated a model based on the concepts of belief (what the agent supposes the world to be), desires (what the agent desires, but does not necessarily try to achieve), and intentions (what the agent intends to achieve and the actions he wants to perform for those achievements). There is a sophisticated process of refinement of intentions and plans which finally leads to actions. Implementations of this model are known as BDI architectures (BDI = belief, desire, intention, [7], [11]). Actually, the usage of these notions in implementations is not uniform. There have been successful models in a multi-modal logical framework, and BDI approaches are often identified with these models and their applications. In contrast, our approach uses the BDI concepts as the basis for data structures in object-oriented implementations. The aim is to implement and maintain a data structure which corresponds to the idea of intentions as hierarchical partial plans that are completed only when necessary. We especially address problems known as upwards and recognizant
failures: in hybrid architectures certain failures are recognized at the lower levels but need handling on higher levels. As an example we use the soccer domain from the RoboCup initiative. The author would like to thank the previous and current members of the RoboCup teams "AT Humboldt" (Simulation League) and "German Team" (Sony Four Legged Robotic League) for many fruitful discussions. The paper could not have been written without their theoretical and practical work. The work is supported by the German Research Association (DFG) in the main research program 1125 "Cooperating teams of mobile robots in dynamic and competitive environments".
28.2 Robot Control in Dynamic Environments

Control of autonomous robots in dynamic environments is often considered under the viewpoint of moving vehicles that need fast short-term reactions (e.g. obstacle avoidance) but have sufficient time for long-term planning. Layered architectures with simple "reactive" low-level behavior and complex high-level behavior are state of the art (cf. [1], [6]). Only the low-level reactions are considered time critical, while re-planning after the occurrence of unexpected events may take longer. The vehicle must perform an immediate stop if the road is suddenly closed, but then it can wait until a new path is planned. This need not be true in other scenarios. Soccer can serve as an example. During an offensive, the players of team A are oriented forward. Suddenly, when the opponents get control of the ball (e.g. after a missed pass by team A), team A should immediately switch to defensive play. What does that mean? Firstly, players should immediately stop running forward; they no longer need to run free. Up to this point, it is comparable to the situation of a vehicle which stops moving before an obstacle. Secondly, it means immediately adapting to defensive play: instead of running free (trying to keep distance from opponents) they have to mark (try to come closer to opponents and to attack them). There is no time for making new plans; here the situation differs from the case of vehicles.

28.2.1 Soccer Robots: The RoboCup Initiative

The soccer scenario has proven to be very useful for the discussion of problems and for the evaluation of proposed solutions for autonomous mobile robots in dynamic environments. Such environments are characterized by fast changes; plans may become invalid through unpredictable events. It is not possible to predict future events in general. Moreover, sensory information is incomplete and unreliable. Robots have to be aware that their skills may not be successful and that their plans may fail. One may theoretically think about a plan to play the ball via several players from the goal-kick to the opponents' goal, but nobody would expect that plan to work. Note that there is a great difference to a chess program: it is easy to write a program for finding the ultimately best moves; it is "only" a question of complexity to run this program. But nobody is able to write a similar program for soccer-playing robots.
Different leagues of RoboCup [8], [4] are devoted to different aspects. Leagues with real robots investigate the problems of the most suitable hardware and its integrated usage (body materials, drives, legs, sensors, energy, control etc.). The simulation league uses a virtual environment, where software aspects can be studied long before the related hardware is available in practice. RoboCup Rescue investigates the problems of rescue robots. While soccer robots have to work totally autonomously without any outside control, the challenge for rescue robots is the coordination of humans and robots. Rescue robots are assistant robots. In case of a disaster, there will not be enough human experts to control several hundreds or thousands of robots. Moreover, it is questionable whether an outside human controller could receive and process all relevant data needed to make robots move safely on disrupted ground. But humans can give global advice, and if a robot finds a victim, then humans will have to decide about further steps. The vision of RoboCup is stated as follows: "By the year 2050, develop a team of fully autonomous humanoid robots that can win against the human world soccer champion team." The challenge behind this vision is twofold: the robots should act autonomously, and they must be accepted as opponents of a human team. This means that a gun is not allowed, that shape and speed must respect certain constraints, and a lot of further requirements. In fact, it is a vision: nobody knows if the goal can be reached at all. But one should recall the development from the first aircraft to the first man in space, or the development from the first computers to the success of Deep Blue in chess. RoboCup is not understood as simply a competition; the RoboCup community explores the strengths and limits of autonomous robots. With the long-term vision in mind, new research challenges are discussed and fixed year by year. There is a common understanding that solutions and programs are disclosed for the use of all teams. This explains the huge progress of RoboCup robots since the start in 1997. It is not really important whether robots can eventually win against humans. What counts is the development of science and techniques by evaluation in competition (comparable to the development of motor cars and aircraft one hundred years ago), and the exchange of solutions. And it is a great amount of fun, not least for those young people who will see the outcomes in fifty years. They have their own competition: the games of RoboCup Junior [9].

28.2.2 Horizontal Modularization

Software programs for robot control have to deal with a lot of different aspects ranging from scene interpretation over decision/planning/coordination to actor control. Many efforts and discussions are devoted to appropriate structures and technologies of software design (cf. e.g. [1], [6], [10]). Natural cognition may serve as a model. There exist some commonly used concepts (belief, plan, goal, ...) borrowed from such models. This makes communication easier for system developers and programmers. But there are also objections against the usage of mental notions for machines. Control items for robots include
• sensors and a perception unit to process inputs from the environment,
• behavior control (with different complexities ranging from simple stimulus-response behavior to long-term deliberative behavior),
• actors and basic action control to act in the environment (sometimes using direct feedback with sensors).
Communication capabilities are included in both the sensors and the actors. Operation according to this first "horizontal" modularization is referred to as the "sense-think-act" cycle: first the sensory inputs are interpreted, next decisions take place in order to determine appropriate actions, and then actions are performed. Simple control structures perform this cycle just as it is. More complex tasks need planning and coordination. The "think" step may maintain some memory to keep past commitments. Those commitments (goals, plans) may affect current decisions. The "sense" step may maintain another memory (the "worldmodel") to keep older information for current use. The worldmodel is important if the environment is not totally observable at every point in time. The soccer ball may be hidden by other players; hence it is useful to have an idea of hidden objects by memorizing their former appearances. No memory is needed in chess: there the players have complete access to the current state of the chess board. Besides the "horizontal" modularization according to sense-think-act, there may be different "vertical" layers according to complexity issues. The decision processes are often split into a fast reactive layer with short decision cycles and a deliberative layer for long-term planning with longer cycles. Each level may have its own sense-think-act cycle. The synchronization of such complex layers is a real challenge for software technology.
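As a concrete illustration of the horizontal modularization just described, the following is a minimal sketch of a sense-think-act cycle with the two persistent memories mentioned above (worldmodel and commitments). The class and method names are illustrative assumptions, not part of any RoboCup code base.

```python
class Agent:
    """Minimal sense-think-act skeleton with persistent belief and commitment memory."""

    def __init__(self):
        self.worldmodel = {}    # belief state: past-directed memory filled by "sense"
        self.commitments = []   # commitment state: future-directed memory kept by "think"

    def sense(self, observation):
        # Interpret sensory input and merge it into the worldmodel; older entries
        # are kept so that temporarily hidden objects (e.g. the ball) stay known.
        self.worldmodel.update(observation)

    def think(self):
        # Decide on an action; prior commitments (goals, plans) may restrict the
        # choice so that behavior does not oscillate with every noisy observation.
        if self.commitments:
            return self.commitments[0]
        return "default_behavior"

    def act(self, action):
        # Hand the chosen action to the actuators (stubbed here).
        print("executing", action)

    def run_cycle(self, observation):
        self.sense(observation)
        self.act(self.think())


# Example usage: two cycles; the ball position persists even if not re-observed.
agent = Agent()
agent.run_cycle({"ball": (3.0, 1.5)})
agent.run_cycle({})   # ball hidden this cycle, but still in the worldmodel
```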
28.3 How to Implement a Double Pass?

28.3.1 Simple Skills

Soccer robots need basic skills like pass, score, dribble, intercept, position. Simple stimulus-response behavior is sufficient to some extent. Interception of a moving ball can serve as an illustration. A very simple player runs in a straight line to the place where he sees the ball. While the ball is moving, he adjusts his direction every time he looks at the ball, and as a result he follows a curved path. A somewhat more skillful player anticipates the optimal point for interception and runs directly to this point. As discussed in [3], there are different possibilities for the anticipation of the optimal intercept point, i.e. for determining an optimal speed vector v(d, u) given the distance d and the speed u of the ball. Machine learning methods are broadly used in RoboCup, but one might also think of complex calculations using physical laws. Such an intercept is still realized as stimulus-response behavior in a cycle: observe the ball, determine the optimal intercept point, run to this point. Then there is another difficulty caused by noisy data and/or imprecise calculations: the calculated intercept point may vary for each observation. This leads to oscillating behavior when
the robot adapts strictly to the latest calculation. Fast adaptation may lead to worse results, while some "inertia" would be better. If stability must be enforced by software (if simple physical inertia is not sufficient), the program has to compare older results with newer ones and make appropriate interpolations. Therefore, a memory for older decisions is necessary.

28.3.2 Coordination

More complex problems are illustrated by basic coordination problems. The decision processes become more and more complex (and subject to further stability problems) as the time horizon is enlarged. Here are some examples of decision processes in the RoboCup scenario (a sketch of the underlying intercept calculation follows the list):
• A player decides if he can intercept the ball, i.e. if the ball is reachable for him. The decision process can use the procedures for computing v(d, u) from above. It can be extended to calculate the interception point p(d, u) and time t(d, u).
• A player decides if he can intercept the ball before any other player can. He has to compare his own chances with the interception times of the other players (e.g. using the methods for calculating p(d, u) and t(d, u) from the viewpoint of the other players).
• A player has to assess the chances of a pass. If he were to kick the ball with a fixed speed vector u, he could determine which player can intercept first by the calculation method from above. Such calculations can be done for several hypothetical kick directions. After evaluating different kicks, the player can decide on a certain pass, or on some other action if there is no good chance of passing.
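A minimal sketch of such an intercept computation is given below. It assumes a simple constant-velocity ball model and players with fixed maximal speeds; the function names and the time-stepping scheme are illustrative assumptions, not the method of [3].

```python
import math

def intercept(ball_pos, ball_vel, player_pos, player_speed, dt=0.1, horizon=10.0):
    """Return (intercept_point, time) for the earliest reachable ball position,
    or None if the ball cannot be reached within the horizon."""
    t = 0.0
    while t <= horizon:
        # Predicted ball position at time t (constant-velocity model).
        bx = ball_pos[0] + ball_vel[0] * t
        by = ball_pos[1] + ball_vel[1] * t
        dist = math.hypot(bx - player_pos[0], by - player_pos[1])
        if dist <= player_speed * t:      # player can be there in time
            return (bx, by), t
        t += dt
    return None

def fastest_interceptor(ball_pos, ball_vel, players):
    """players: dict name -> (position, max_speed). Returns the name of the
    player with the smallest interception time, or None."""
    best, best_t = None, float("inf")
    for name, (pos, speed) in players.items():
        result = intercept(ball_pos, ball_vel, pos, speed)
        if result and result[1] < best_t:
            best, best_t = name, result[1]
    return best

# Example: decide whether "self" reaches the ball before the opponent.
players = {"self": ((0.0, 0.0), 6.0), "opponent": ((5.0, 5.0), 6.0)}
print(fastest_interceptor((2.0, 2.0), (1.0, 0.0), players))
```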
Up to this point, the methods are comparable to decision procedures as used in chess: try to imagine what happens for different alternatives and choose the best one. The projection into the future uses some kind of simulation of possible futures. Unfortunately, those decision procedures do not scale up in time. It is impossible to make such an anticipation even for standard situations like a corner kick. On the other hand, corner kicks provide patterns of behavior which are worth using for robot control. A central problem is the incompleteness of the underlying "plans". It is not possible to set up a complete recipe for intended actions. The plans are usually not fully instantiated even during their execution. Therefore it is sometimes overlooked that there is a plan at all. In any case, classical planning methods using search strategies are of limited use. As an example, we consider the course of actions during a double pass. We will use this example extensively in this paper. The course of actions is known only in principle at the beginning; the details become known afterwards. At the end, the description from the viewpoint of the first player could contain items like this:
1. Dribble from point p1 with speed v1 to point p2 between time t1 and time t2.
2. Kick with speed vector u1 at time t3. (Pass to team mate.)
3. Run from point p3 with speed v2 to point p4 between time t4 and time t5. (Run over opponent.)
4. Run from point p5 with speed v3 to point p6 between time t6 and time t7. (Run for reposition.)
5. Run from point p7 with speed v4 to point p8 between time t8 and time t9. (Run for reposition.)
6. Intercept at time t10.
The parameters of this "script" cannot be defined in advance. For example, after the kick at time t3, only the parameters p1, v1, p2, t1, t2, u1, t3 are known, while the others are still undefined. Besides incompletely known physical constraints, they depend on the behavior of the team mate and the opponent. The question is how one can deal efficiently with such incomplete plans. How is it possible to implement behaviors like a double pass or other more complex standard situations if classical planning is not applicable? Only partial plans can be instantiated during deliberation (e.g. the scheme of the double pass), while the concrete parameters are determined following the principle of least commitment. Moreover, there is a hierarchy leading from "general" intentions to somewhat more "specific" ones. The general intention (play double pass) is broken down into specific intentions dribble, kick etc. The intention dribble is again composed of simpler actions (run, ball handling, ...) and so on. As time proceeds, the concrete lower-level intentions are completed, but there is no complete plan from the beginning. Actually, these requirements correspond to the analysis in [2], chapter 3.1: "Partiality and hierarchy combine with the inertia of plans to give many intentions and actions a hybrid character: at one and the same time, a new intention or action may be both deliberative in one respect and nondeliberative in another. An intention or action may be the immediate upshot of deliberation, and so deliberative. But that very deliberation may have taken as fixed a background of prior intentions and plans that are not up for reconsideration at the time of the deliberation. I may hold fixed my intention to earn a doctorate in philosophy while deliberating about what school to go on, what to write a thesis on, and so on. It is by way of such plans - plans that are partial, hierarchical, resist reconsideration, and eventually control conduct - that the connection between our deliberation and our action is systematically extended over time. The partiality of such plans is essential to their usefulness to us. But on the other side of the coin of partiality are the patterns of reasoning I have been emphasizing: reasoning from a prior intention to further, more specific intentions, or to further intentions concerning means or preliminary steps. In such reasoning we fill in plans in ways required for them successfully to guide our conduct." The implementation of processes for this kind of planning is a question of appropriate data structures. Such data structures must implement something like a "mental" state.
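One possible way to represent such a partially instantiated script in code is sketched below: every step of the double pass exists from the start, but its parameters stay unset (None) until they become known during execution. This is a hypothetical illustration of the least-commitment idea, not the author's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Tuple

Point = Tuple[float, float]

@dataclass
class Step:
    """One step of the double-pass script; unknown parameters remain None."""
    action: str
    params: dict = field(default_factory=dict)

    def bind(self, **known):
        # Fill in parameters as needed, as soon as they become known.
        self.params.update(known)

    def fully_instantiated(self) -> bool:
        return all(v is not None for v in self.params.values())

# The scheme of the double pass is fixed in advance ...
double_pass = [
    Step("dribble",   {"p_from": None, "p_to": None, "speed": None}),
    Step("kick",      {"speed_vector": None, "time": None}),
    Step("run_over",  {"p_from": None, "p_to": None}),
    Step("intercept", {"time": None}),
]

# ... but the concrete values are bound only when execution reaches them.
double_pass[0].bind(p_from=(10.0, 5.0), p_to=(18.0, 5.0), speed=0.8)
double_pass[1].bind(speed_vector=(2.5, 1.0), time=34.2)
print([s.fully_instantiated() for s in double_pass])   # [True, True, False, False]
```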
28.4 "Mental" States

28.4.1 Belief and Commitment

By common understanding, simple stimulus-response behavior does not use any state. Whenever the agent receives the same input, it reacts with the same behavior. A closer consideration may reveal further underlying assumptions like determinism, an appropriate correspondence between input and output, appropriate cycle rates etc. On the other hand, if the actions of the agent depend not only on the most recent input, then there must be a state for memorizing things that happened in the past. Such things may concern past observations and past decisions of the agent. The concept of state is used in different ways. Sometimes it simply means the different outcomes of a switch assignment. In this paper, the crucial point for talking about a ("persistent") state is the fact that its contents persist over a longer time period than only one sense-think-act cycle. According to their contents and impacts, we distinguish between belief states and commitment states. The belief state contains information concerning past and recent events in the environment; therefore it is called the "worldmodel". Belief can be understood as the overall notion for the information derived from the sensors. It contains the inferred model of the external world. Therefore it processes data from sensors (including communicated information). It may also contain information about internal facts (e.g. battery status) as derived by internal sensors. Usually, the agent does not have direct access by sensors to all relevant information. Therefore, the agent may use inferential methods based on general knowledge about the world. It may simulate the expected course of affairs to get some idea about the state of the world. The internal representation of the outside world may not be correct; hence this representation is called belief, and not knowledge. The inconsistencies may originate in unreliable data and misleading assumptions. While the usage of the concept of belief is similar in all approaches, the concepts related to commitments are used in different ways (which corrupts their usefulness in discussions to a certain extent). Commitment states contain the information about goals, plans etc. This information originates from the internal decisions of the agent concerning desirable events and facts in the future. Bratman proposes a segmentation of the commitment state into desires, intentions, and plans. As cited above, intentions and plans are closely related: "Intentions are plans writ large." As intentions are refined more and more, they become steps of a plan to be performed. Intentions and plans describe and determine future actions that the agent has already committed to perform. In contrast, desires are pre-commitments for structuring the decision process. They are candidates for forthcoming intentions. Both belief and commitment states are used to memorize contents from the past to the present and into the future. But the contents of the belief states are concerned with events that happened in the past, while the commitment states are concerned with events that should happen in the future. According to this distinction, belief
states are past-directed, commitment states are future-directed. Note that this differs from the use of past, present and future states in [6]. Interestingly, the simulation methods of the worldmodel used for extrapolating present facts from older ones are also useful for extrapolation into the future: the agent can imagine the consequences of possible actions and events in the deliberation process. Therewith he can evaluate the results of possible plans and choose the best one.

28.4.2 The Option Tree

As discussed above, plans/intentions are hierarchically organized. The appropriate data structure is a tree. A double pass intention consists of the sub-intentions dribble, pass, run, reposition, intercept. Dribble consists of further sub-intentions and so on. Intentions are partially determined during execution: there are choices left for later commitment, e.g. how to perform dribbling. Only one of the existing alternatives will become an intention. Others will remain desirable, but without commitment at that moment in time (they are in the status "desire"). Others may simply not be candidates for commitment. The underlying general concept (the domain for desires and intentions) is that of options. Options describe alternative courses of action which may become intentions according to commitments. Options are hierarchically structured. Options are composed of sub-options: the double pass option consists of the sub-options dribble, pass, run, reposition, intercept. Options can be realized in different ways: the overall option play-soccer can be realized by the option offensive, or by the option defensive, or by some other useful option. We have two kinds of branches in the option tree:
• Choice-branches: they lead to different alternatives for performing the option at the top of the branch. A pass can be performed by a forward-kick, a sideward-kick, a bicycle-kick etc.
• Sequence-branches: they lead to a sequence of sub-options which must be performed to realize the option at the top of the branch (like the sub-options dribble, pass, run, reposition, intercept for a double pass).
The process of commitment for a complete intention means successive top-down commitments to a specific sub-option at the related choice-branches. For example, the first part of a commitment might be offensive. Offensive can be realized in different ways; therefore a next commitment is necessary, for example double pass. Then we have the sequential sub-options dribble, pass, run, reposition, intercept. Each of them can be realized in different ways. Commitments are necessary for the alternatives of each of these sub-options: for the style of dribbling, for a special kind of pass etc. The commitments need not be made at the same time; instead, they may be made as needed. But afterwards, all commitments are in place and we can talk about a completed intention. It has the form of a sub-tree in the option tree. Such a complete intention tree contains exactly one successor for each choice-branch
starting from the top of the tree, and it contains all sub-options for the sequence-branches in this tree. An example from the soccer domain is given in Figure 28.1. The numbers (e.g. in DoublePass/1) denote the role (here: the first player) in a cooperative plan.
[Figure: option tree with intention subtree; legible node labels include Offensive, Defensive, Score, DoublePass/1, DoublePass/2, ChangeWings/1, Attack, OffsideTrap, and Kick.]
Fig. 28.1. Option Tree with Intention Subtree for DoublePass/1
According to our discussions above, the idea behind this description of intentions is the definition of a partial hierarchical plan for the activities. At any concrete point in time, the robot has to act according to the current intentions. It is necessary that there is a process of completing the intention in time. A proposed implementation will be described below in Section 28.6. Intentions as described here provide the means for describing the commitment state of the robot. There are two aspects:
1. The options at the bottom of a choice-branch in the option tree are annotated according to their commitment status: roughly speaking, annotations may be of the form "intended", "desired", "not desired". The options which are marked "intended" form the intention tree. Actually, the annotations in our implementation are numbers (e.g. according to expected utilities). Intentions are defined top down according to the highest scores; desires are defined according to the next highest scores.
2. The options at the bottom of the sequence-branches in the intention tree are annotated according to their execution status: their annotations are of the form "active", "waiting", "done". There is exactly one active option at each level of the intention tree. The active options form a path from the top of the tree to a certain leaf determining the current action. This path is called the activation path. At the time when the first player passes to the second one, the activation path (cf. Figure 28.1) consists of PlaySoccer-Offensive-DoublePass/1-Pass- ... down to a concrete action (e.g. a kick command with specified power and direction).
The other options are marked "done" if they have already been executed, or "waiting" if they will be executed in the future. Further annotations are needed for the parameters: we call this the agenda. It contains the identification of players (e.g. which players are cooperating in the double pass) and parameters of actions (e.g. anticipated intercept points, kick directions). The specifications are postponed as far as possible. Altogether, the annotations in the option tree describe the mental state concerning the commitments of the robot. It is a single data structure over all levels of the hierarchy for all related purposes, no matter whether it concerns high-level or low-level control.
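A minimal sketch of such an annotated option tree is given below. The node fields (commitment score, execution status, agenda) mirror the two kinds of annotations described above; the class design and the option names are illustrative assumptions, not the actual data structures of the RoboCup teams mentioned in the introduction.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Option:
    name: str
    branch: str = "choice"          # kind of branch below this node: "choice" or "sequence"
    children: List["Option"] = field(default_factory=list)
    score: float = 0.0              # commitment annotation (e.g. expected utility)
    status: str = "waiting"         # execution annotation: "active" | "waiting" | "done"
    agenda: dict = field(default_factory=dict)   # late-bound parameters

    def intended_child(self) -> Optional["Option"]:
        # At a choice-branch, the child with the highest score is the intention;
        # the next highest remain desires.
        return max(self.children, key=lambda c: c.score) if self.children else None

    def activation_path(self) -> List[str]:
        # Follow the "active" options from this node down to the current action.
        path = [self.name]
        for child in self.children:
            if child.status == "active":
                path += child.activation_path()
                break
        return path

# Tiny example tree: PlaySoccer -> Offensive -> DoublePass/1 -> Pass -> Kick
kick = Option("Kick", status="active", agenda={"power": 80, "direction": 0.3})
passing = Option("Pass", branch="sequence", children=[kick], status="active")
dp1 = Option("DoublePass/1", branch="sequence", children=[passing],
             score=0.9, status="active")
offensive = Option("Offensive", children=[dp1, Option("Score", score=0.4)],
                   score=0.8, status="active")
root = Option("PlaySoccer", children=[offensive, Option("Defensive", score=0.2)],
              status="active")

print(root.activation_path())   # ['PlaySoccer', 'Offensive', 'DoublePass/1', 'Pass', 'Kick']
```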
28.5 Re-deliberation: When to Abandon a Double Pass?

28.5.1 Stability vs. Adaptation

Dynamic environments, incomplete and imprecise information, unreliable actors: there is no guarantee of success, and plans may fail to reach their goals. The idea behind intentions is to guide further deliberation, for reasons of complexity and stability of behavior. Hence intentions should be stable. But if there is enough evidence of failure, intentions should be cancelled to prevent fanaticism. What are the right criteria? We have already mentioned the problem of oscillating behavior in the case of too early adaptation. In the extreme case, the agent might not reach any goal if he always decides for another plan before completing any of them: Buridan's ass starves between two piles of hay. Usually both piles are considered to be equally attractive, but sensor noise might cause the attractiveness of the piles to oscillate. To be completely rational, the agent would have to decide not only on the evaluation of the single options. He should take into account the consequences and costs of changing plans, the possibility for later improvements etc. Especially for coordination with other agents, stability has a very large impact, and changing plans might result in a lot of overhead for negotiation and new commitments. As long as resources are bounded, attempts at stability may fail in certain cases, but will perform better on balance over longer times. In natural systems, stability is often enforced by physical constraints: inertia forces smooth movements, and restricted communication forces agents to follow former agreements in cooperation. There are good reasons to keep old intentions, but it must be possible to cancel intentions if necessary. The program of the robot has to manage such critical situations. Consider the situation during a double pass after the first player has passed to his team mate. He runs forward over the opponent and trusts that his team mate will later pass back to him. Sometimes critical situations arise with the danger that the team mate will not reach the ball. The intention forces the player to keep running forward. But if there is enough evidence of failure, the intention has to be cancelled. Moreover, the new situation has to be managed by re-deliberation.
Re-deliberation is of course needed if the double pass definitely fails, i.e. if the team mate does not reach the ball but an opponent does. In this case, our player should not continue the double pass actions by running forward. Instead he should switch immediately to defense: instead of running away from the opponent, he must try to come close to him. Re-deliberation may also be desirable in the case of unexpectedly better alternatives. After intercepting the ball, the team mate might have the possibility of scoring due to a mistake of the opponent goalie. Then he should not pass back to the first player according to the double pass script; he should take the chance of scoring. It is a difficult task for the programmer to implement the two contradictory requirements:
1. efficiency of decision procedures to avoid time-consuming re-considerations as long as things evolve as expected, and at the same time
2. maintaining the possibility of changing behavior in the case of unexpected courses of events.
Often it is necessary not only to cancel an intention, but also to clean up further activities, similar to an interrupt in a runtime system.

28.5.2 Failing Upwards

The situation becomes especially difficult if a failure is first recognized at a lower level of control but needs handling on a higher level. This situation is called failing upwards. We consider again the situation during a double pass when the second player loses the ball. Both players of the team are affected; both have to switch from offensive to defensive play. This means a switch, a re-deliberation, on a higher level of control. But there is an important difference:
• The second player, the expected receiver of the pass, cannot continue according to his double pass / role 2 intention. The failure becomes known immediately.
• The first player could still perform actions according to his double pass / role 1 intention; he could continue running forward. But there is no reason to do that anymore. This is the harder problem of "upwards recognition": it must be assured by some means that the failure is recognized on the higher level, so that the switch from offensive to defensive play takes place.
A crucial point is the real-time constraints of an application. In soccer, the switch from offensive play to defensive play should be performed immediately. This means that the real-time constraints affect all levels of control (unlike the re-planning of a path in vehicle control).
28.6 How to Perform a Double Pass

The special requirements we have identified so far are:
• Need for possibly time-consuming long-term deliberation.
• Need for re-deliberation in the case of upwards failures under real-time constraints.
There is no choice: taking into account that deliberation may be time consuming and that fast decisions for actions are necessary, we have to have two parallel processes (threads or something similar), as in hybrid architectures: one for deliberation, the other for fast actions. But in our approach, we use the complete data structure of the option tree for both processes. The hierarchy allows behaviors and plans to be described in a uniform way, ranging from single actions on the lowest level up to long-term plans on the highest levels. Partial plans are represented by only partially defined annotations, i.e. partially defined intention trees and partially defined parameters in the agenda. Both processes traverse the whole option tree (respectively, intention tree). They add or modify the annotations of the trees according to long-term deliberation and short-term decisions, respectively. Hence they are called passes. The passes perform different tasks:

The Deliberator performs the time-consuming calculations for the choice of goals and for long-term planning. It sets up a partial hierarchical plan (intention tree). Following the least-commitment idea, the plan is refined as time goes on. Normally the deliberator does not have time problems since it works sufficiently far ahead. Time-critical decisions are left to the executor.

The Executor performs the immediate decisions. Based on the preparatory work of the deliberator, its decision space is restricted to the current intention. It has to determine the concrete parameters, e.g. for a kick, at the time when the kick is to be performed, using the newest sensor information. On the lower levels it controls behaviors similar to the reactive layers of hybrid control. But before doing that, it has to traverse the higher levels in order to check conditions and to perform decisions if necessary.

It is important to realize that both processes are independently running passes through all levels of the hierarchy. This is in contrast to the runtime organization in layered architectures (where short-term decisions only affect the lowest level) and in imperative programming (where only the procedure on top of the stack is active).

28.6.1 Deliberator: When to Perform a Double Pass?

While the player runs to intercept the ball, he can already make guesses about the situation when he will have control over the ball: could there be a chance of scoring, would it be desirable to change the wings, is a partner for a pass available, and so on. He can set up possible (partial) plans for the future. They are called desires, not intentions, because there is no commitment to any of them. Which of them acquires the status of an intention will be decided later. Note that different desires can be set up, possibly by different threads. They provide the possibility to be prepared for the handling of recognizant failures. Consider the players during the double pass. There may be an already prepared alternative plan P (a desire) for defensive play. If it unfortunately happens that the opponents get the ball, then the players cancel the intention double pass and can switch immediately
to the newly installed intention for defensive play according to the prepared plan P, without loss of time. The deliberator has to choose from all possibilities: it decides on the prior intention. Since there is no need to fill in all the details, deliberation can start very early, e.g. while a player is still running to intercept the ball. Of course, these evaluations are only preliminary; thus they remain in the status of desires with different degrees of desirability. The degrees are expressed as annotations at the choice-branches in the option tree. They are updated as time goes on until the player gets control over the ball. Then the highest ranked desire becomes the new intention. For example, this new intention is change wings. Note that until this point in time, the executor was working on an older intention terminating with the intercept. Now, when the old intention is completed, the new intention for change wings provides a new scope for the executor. Meanwhile, the deliberator can already evaluate the options desirable after the player has finished his actions for the change wings intention. Actually, the choices and evaluations by the deliberator are very complex. In general, it has to process a very large search space. The search space is based on the known parameters of the ball and of the 22 players with respect to localization, speed and further conditions. Additional parameters like distances and guesses about opponent tactics can be derived. Not all parameters really influence any decision. The hierarchical structure helps to split the space into more manageable parts. Standard situations provide generic cases for cooperative play. Using methods from Case Based Reasoning (CBR, cf. [5]), a concrete situation can be matched to a standard situation. For example, a triggering feature for the double pass is an opponent in the way of an offensive player controlling the ball. The standard situation (the "case") provides a standard scheme ("solution") for an intention. Using CBR methods for adaptation, a concrete intention can be specified. The option hierarchy serves as a structure for describing cases.

28.6.2 Executor: How to Realize a Double Pass?

The executor has to guarantee that relevant checks and decisions are performed under real-time constraints. In complex environments, it is not possible to do the complete deliberation, or re-deliberation in the case of failure, in a short time. The restricted scope of the activation path in the intention tree is a means to minimize the work of the executor. There it can perform the relevant steps on all levels of the hierarchy. Again, we consider the work of the executor during the double pass while the first player is running over the opponent. The condition "is offensive play still applicable" (i.e. does the team maintain control over the ball) is checked while traversing the high level concerning offensive or defensive play. For this condition, it does not matter what kind of offensive play (which sub-intention) is running. As long as ball control belongs to the own team, the executor steps down to the next lower level. There, on some intermediate level, the player checks whether his team mate has already passed the ball back to him. If not, he continues to run to a free space behind the opponent. Otherwise he changes to the next sub-option run for intercept. It proceeds this way
down to the basic level, following and modifying the activation path. Parameters of the agenda are calculated if necessary. In each cycle, the executor starts by traversing the higher levels in order to check conditions and to perform decisions if necessary. This permits an efficient handling of events that would otherwise cause "upwards failures". In our example, the condition "does the team still maintain control over the ball" is checked already on the high level. If the ball is lost, the executor can immediately stop further traversal of the double pass intention. Instead it can switch to an appropriate alternative desire for defensive play, which becomes the new intention. Hopefully, such an alternative has been prepared by the deliberator in time. But at least there will be no waste of time hopelessly continuing the double pass. Thereby, failures are not checked at the low level and then reported to the higher levels; instead, the executor checks for possible failures on the levels they belong to. The crucial point behind this is a problem of software technology. In a stack-oriented runtime system with control given to the last called procedure (the procedure on top of the runtime stack) we have two alternatives:
• The last called procedure performs the necessary tests. In our example, there is an active procedure which performs running forward. This procedure has to test whether the team mate still maintains control over the ball.
• Some higher-level procedure performs the tests, according to the impact of the consequences. This is the procedure for performing the double pass, which has to be cancelled if the ball is lost.
The latter alternative is the conceptually correct choice. But in a stack-oriented architecture, the procedure becomes active only after the called subroutines have terminated. This can result in a considerable delay. If tests are performed and reported upwards by the low-level procedure run forward, this procedure will become overloaded by related tests for all higher-level options using this procedure. Skills like running forward are used in different contexts. Tests of the relevant conditions for all these contexts would result in the overload. Different run forward skills for different contexts would lead to repeated parts of code.
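The following is a minimal sketch of how an executor pass could traverse the activation path and check each condition on the level it belongs to, so that a lost ball cancels the whole double pass intention immediately instead of being reported upwards. The option names, conditions, and worldmodel keys are illustrative assumptions, not the author's implementation.

```python
# Hedged sketch: each level of the activation path carries its own applicability
# condition; the executor checks conditions top-down and cancels at the level
# where a condition fails.

def executor_pass(activation_path, worldmodel):
    """activation_path: list of (option_name, condition) from root to leaf action.
    Returns the leaf action to execute, or the name of the option to cancel."""
    for name, condition in activation_path:
        if condition is not None and not condition(worldmodel):
            # Failure detected on the level it belongs to: cancel this option
            # (and implicitly its whole subtree); re-deliberation picks a desire.
            return ("cancel", name)
    return ("execute", activation_path[-1][0])

# Conditions attached to the levels of the double-pass activation path.
path = [
    ("PlaySoccer",      None),
    ("Offensive",       lambda wm: wm["own_team_has_ball"]),   # checked up here
    ("DoublePass/1",    lambda wm: not wm["pass_failed"]),
    ("RunOverOpponent", None),
]

print(executor_pass(path, {"own_team_has_ball": True,  "pass_failed": False}))
# ('execute', 'RunOverOpponent')
print(executor_pass(path, {"own_team_has_ball": False, "pass_failed": False}))
# ('cancel', 'Offensive')  -> switch to a prepared defensive desire
```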
28.7 Conclusion

The problem of coordinated actions in robot soccer was considered under the viewpoint of bounded rationality. The proposed architecture (called the "Double Pass Architecture" because of its two passes for the deliberator and the executor, respectively) was inspired by the work of Bratman: prior intentions set the screen for later decisions, which lead to further intentions and finally to actions. Intentions and plans are hierarchically structured and only partially defined. To maintain a related "mental" state for commitments, our architecture uses the data structure of the option tree. A commitment state is given by annotations of this tree. The annotations define the current intention
tree, the activation path in this tree, and the parameters in the agenda. Additional sub-trees may define desires as candidates for later intentions. Deliberator and executor are to some extent motivated by [2], chapter 3.3: "Practical reasoning, then, has two levels: prior intentions and plans pose problems and provide a filter on options that are potential solutions to those problems; desire-belief reasons enter as considerations to be weighed in deliberating between relevant and admissible options." There is still ongoing work on the suitable distribution of work between deliberator and executor. An experimental version is under development for the simulation league of RoboCup in the project "Architectures and Learning on the Base of Mental Models" of the research program 1125 "Cooperating teams of mobile robots in dynamic and competitive environments", supported by the German Research Association (DFG). An important feature is the single data structure and the handling of so-called upwards failures. Actually, they are not reported upwards anymore because they are already detected on the right level of responsibility. An often discussed practical problem concerns the question whether the explicit treatment of complex behaviors is necessary at all. Without explicit representation, a double pass may simply emerge from appropriately tuned simpler behaviors. In fact, such emergent behavior can sometimes be observed in RoboCup competitions. Emergence can occur by chance only, or by intentional fine-tuning of simpler behaviors by the programmer. At least in the latter case, the double pass is implicitly present in the program. But the question is how far this approach can scale up. Our approach is based on an explicit representation using the data structure of the option tree. For an outside observer it makes no difference whether the observed behavior is implicitly or explicitly implemented. We are back at an interesting cognitive question: which "mental" states have a real representation in the mind, and which are only useful ascriptions of observed behavior?
References

1. Arkin R C (1998) Behavior Based Robotics. MIT Press
2. Bratman M E (1987) Intentions, Plans, and Practical Reason. Harvard University Press, Massachusetts
3. Burkhard H D (2002) Real Time Control for Autonomous Mobile Robots. Fundamenta Informaticae 51(3):251-270
4. Kitano H, Asada M, Kuniyoshi Y, Noda I, Osawa E, Matsubara H (1997) RoboCup: A challenge problem for AI. AI Magazine 18(1):73-85
5. Lenz M, Bartsch-Sporl B, Burkhard H D, Wess S (eds) (1998) Case Based Reasoning Technology. From Foundations to Applications. LNAI 1400, Springer
6. Murphy R R (2000) Introduction to AI Robotics. MIT Press
7. Rao A S, Georgeff M P (1991) Modeling agents within a BDI-architecture. In: Fikes R, Sandewall E (eds) Proc. of the 2nd Int. Conf. on Principles of Knowledge Representation and Reasoning (KR'91)
8. RoboCup. The Robot World Cup Initiative: www.robocup.org. Annual proceedings of the RoboCup Workshops/Symposia appear in the Springer LNAI series
9. RoboCupJunior website: www.robocupjunior.org
10. Weiß G (ed) (1999) Multiagent Systems. A Modern Approach to Distributed Artificial Intelligence. MIT Press
11. Wooldridge M (1999) Intelligent Agents. In: [10]: 27-78
29 Generalized Game Theory's Contribution to Multi-agent Modelling
Addressing Problems of Social Regulation, Social Order, and Effective Security

Tom R. Burns¹, Jose Castro Caldas², and Ewa Roszkowska³,⁴

¹ Uppsala Theory Circle, Department of Sociology, University of Uppsala, Box 624, 75126 Uppsala, Sweden. [email protected]
² Dinamia-ISCTE, Av. Das Forcas Armadas, Lisboa 1649-026, Portugal. [email protected]
³ Faculty of Economics, University of Bialystok, Warszawska 63, 15-062 Bialystok
⁴ Bialystok School of Economics, Choroszczanska 31, 15-732 Bialystok, Poland. erosz@w3cache.uwb.edu.pl
Summary. The paper is divided into three parts. In Section 29.1, Generalized Game Theory (GGT) is outlined, and its applications in formalizing key social science concepts such as institutions, social relationships, roles, judgment, and games are presented. Institutions operate as a type of social algorithm, organizing and regulating agents playing different roles as they engage in deliberation and judgment activities and make and implement collective decisions. Section 29.2 presents simple multi-agent simulation models and selected results of the simulation. Section 29.3 briefly outlines an agenda for societal research based on the application of GGT to explaining and managing problems of insecurity and social disorder in multi-agent systems. In the GGT perspective, the problem of security can be formulated in terms of regulating a system and its agents, and dealing with social disorder and crisis.
29.1 An Outline of GGT: Game Structures and Game Processes

GGT can be characterized as a cultural institutional approach to game conceptualization and analysis. Social theory concepts such as norm, value, belief, role, social relationship, and institution as well as game interaction structures can be defined in a uniform way in terms of rules and rule complexes.
Given a concrete situation S_t in context t (time, space, social environment), a general game structure is represented as a particular rule complex G(t) ([5],[12]).¹ Informally speaking, a rule complex is a set consisting of rules and/or other rule complexes. The G(t) complex includes as subcomplexes of rules the players' social roles vis-a-vis one another along with relevant norms and other rules. Suppose that a group or collective I = {1, ..., m} of actors is involved in a game G(t). ROLE(i, t, G) denotes actor i's role complex in G(t) (we drop the G indexing of the role). ROLE(I, t) denotes the role configuration of all actors in I engaged in G(t). For every i = 1, ..., m, ROLE(i, t) is a subcomplex of ROLE(I, t) and the latter complex is a subcomplex of G(t), i.e.

    ROLE(i, t) ⊆_g ROLE(I, t) ⊆_g G(t).    (29.1)
The game structure G(t) then consists of a configuration of two or more roles together with R, some general rules (and rule complexes) of the game:

    G(t) = [ROLE_1, ROLE_2, ..., ROLE_k; R].    (29.2)
R contains rules (or rule complexes) which describe and regulate the game, such as the "rules of the game", general norms, practical rules (for instance, initiation and stop rules in a procedure or algorithm) and meta-rules indicating, for instance, how seriously or strictly the roles and rules of the game are to be implemented, and possibly rules specifying ways to adapt or adjust the rule complexes to particular situations. G(t) also contains rules defining legitimate or appropriate players, their legitimate or appropriate options, their values, preferences, or utility functions, and their bases of judgment and action determination. The game structure G(t) is distinguished from the game process. The game process entails the participating agents applying and performing the appropriate rules and rule complexes of the game structure in a given time and context situation t. The game process entails "playing the game", that is, agents collecting information, making judgments, performing their roles, possibly innovating, and, in general, exercising "human agency."

¹ In practice, a totality of rules can have a more complicated structure than merely a set of rules. The notion of rule complex was introduced as a generalization of a set of rules. Formally, a rule complex is a set obtained according to the following formation rules: any finite set of rules is a rule complex; if C, D are rule complexes, then C ∪ D and p(C) are rule complexes; if C ⊆ D and D is a rule complex, then C is a rule complex. A rule complex C is a subcomplex of a rule complex D, written C ⊆_g D, if C = D or C is obtained from D by deletion of some (or all) occurrences of rules in D or by removal of redundant parentheses. The rule complex is a major concept in GGT. The motivation behind the development of this concept has been to consider repertoires of rules in all their complexity, with complex interdependencies among the rules, and hence not merely to consider them as sets of rules. The organization of rules in rule complexes provides us with a powerful tool to investigate and describe various sorts of rules with respect to their functions, such as values, norms, judgment rules, prescriptive rules, and meta-rules, as well as more complex objects consisting of rules, such as roles, routines, algorithms, and models of reality, as well as social relationships and games.

More specifically, a rule complex, let us say C ⊆_g R, contains
rules defining what is a bona fide player (as opposed to, for instance, an "outsider" or a non-human "agent" such as a "robot" or "nature"). In classical game terms, C would specify that all players are "rational" beings (with well-defined goals, consistent in their preferences, and exercising freedom of choice), etc. In real social life, however, there are typically social categories such as "gender" (male, female, or "mixtures") or "under age" (for instance, < 18) defining roles along with specified action opportunities and constraints. There are often rules in C defining or specifying the characteristics of people assigned to particular roles, for instance, persons of a particular social class, status, profession or occupation. We shall also see that persons in their roles may also be distinguished in terms of their level of commitment or performance (see the simulation studies later). An actor's role is specified in GGT in terms of a few basic cognitive and normative components, that is, rule subcomplexes (see Figure 29.1). The key role components are the following:
1. VALUE(i, t) represents actor i's value complex and provides inputs to generating evaluations and preferences through judgment processes. Actor i's value complex in situation t consists of evaluative rules assigning value to relevant things, states of the world, deeds, and people in the situation, defining, among other things, what is "good", "bad", "right", "wrong", "acceptable", "unacceptable". The strength or resilience of a value reflects commitment, a type of meta-value (see Section 29.2). Rules in VALUE(i, t) may specify (or generate) preference relations among a set of objects X, where X may be the available strategies or the outcomes of interaction in a game matrix, or persons or classes of persons. And, of course, VALUE(i, t) may contain a well-defined "utility function."
2. MODEL(i, t) represents actor i's belief structure about herself, others, and her environment, as well as relevant conditions and constraints in the situation S_t. It contains beliefs (which are a type of rule) about important characteristics of the game situation and its context.
3. ACT(i, t) represents the repertoire of acts, strategies, routines, programs, and actions available to actor i in her particular role in the game situation S_t. These are acts that are obligated, allowed, or possible. Other actions may be allowed only under special circumstances, or may be discouraged or forbidden. Such a normative action complex includes general principles, guidelines, directives, and regulations indicating what to do and not do in the course of interacting with others in playing the game. It includes rules specifying a set of strategies (or principles for generating or discovering strategies) in actor i's role vis-a-vis others in game G(t).
4. J(i, t), the judgment complex, consists of rules which enable agent i to come to conclusions about truth, validity, value, or the choice of strategic action(s) in a given situation. In general, judgment is a process of operation on objects. The types of objects on which judgments can operate are values, norms, beliefs, data, strategies, and outcomes. There are also different kinds of outputs or conclusions of judgment operations, such as evaluations, beliefs, data,
procedures, or rule complexes. For our purposes here, we concentrate on action judgments and decisions.
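To make the formal notions above more concrete, the following is a minimal sketch of a rule complex as a recursive container and of a role built from the four subcomplexes VALUE, MODEL, ACT, and J. The class design is purely illustrative; GGT itself does not prescribe any particular implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union

Rule = Callable[..., object]                 # a rule is modeled here simply as a callable

@dataclass
class RuleComplex:
    """A set-like container holding rules and/or other rule complexes."""
    elements: List[Union[Rule, "RuleComplex"]] = field(default_factory=list)

    def union(self, other: "RuleComplex") -> "RuleComplex":
        # C ∪ D is again a rule complex (one of the formation rules in footnote 1).
        return RuleComplex(self.elements + other.elements)

    def rules(self) -> List[Rule]:
        # Flatten nested complexes into the plain rules they contain.
        out = []
        for e in self.elements:
            out += e.rules() if isinstance(e, RuleComplex) else [e]
        return out

@dataclass
class Role:
    """ROLE(i, t): the four key subcomplexes of an actor's role."""
    value: RuleComplex     # VALUE(i, t): evaluative rules, preferences
    model: RuleComplex     # MODEL(i, t): beliefs about self, others, environment
    act: RuleComplex       # ACT(i, t): repertoire of available actions/strategies
    judgment: RuleComplex  # J(i, t): rules producing conclusions and choices

# Example: a tiny role whose value complex prefers cooperative outcomes.
prefer_cooperation = lambda outcome: 1.0 if outcome == "cooperate" else 0.2
role = Role(value=RuleComplex([prefer_cooperation]),
            model=RuleComplex(), act=RuleComplex([lambda: "cooperate"]),
            judgment=RuleComplex())
print(len(role.value.rules()))   # 1
```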
Judgment is a core concept in GGT ([6],[7],[8],[10]). The major basis of judgment is a process of comparing and determining similarity. The capacity of actors to judge similarity or likeness (that is, up to some threshold, which is specified by a meta-rule or norm of stringency) plays a major part in the construction, selection, and judgment of action. This is also the foundation for rule-following or rule-application activity.²

² This relates to the Wittgensteinian problem of "following a rule" ([18],[19]). From the GGT perspective, "following a rule" (or rule complex) entails several phases and a sequence of judgments: in particular, activation and application together with relevant judgments such as those of appropriateness for a given situation or judgments of applicability. To apply a rule (or rule complex), one has to know (1) the conditions under which the application is possible or allowed and (2) the particular conditions of execution or application of the rule (in part, whether other rules may have to be applied earlier). The application of a rule (or rule complex) is not then simply a straightforward matter of "following" and "implementing" it: the conditions of execution may be problematic; the situation (or situational data) may not fit or be fully coherent with respect to the rule (or rule complexes); actors may reject or refuse to seriously implement a rule (or rule complex); a rule (or rule complex) may be incompatible or inconsistent with another rule that is to be applied in the situation. In general, actors may experience ambiguity, contradiction, dilemmas, and predicaments in connection with "following a rule", making for a problematic situation and possibly the unwillingness or inability to "follow a particular rule."

Definition 1. Given a player i with a judgment complex J(i, t) in situation S_t, i applies J(i, t) in operating on a vector of objects a = (a_1, ..., a_n), where a_1, ..., a_n ∈ U, and U is a universe of objects. This process we refer to as the judgment process. The objects, actor i, and situation t specify the context of judgment. The judgment process results in conclusions, which we denote by J(i, t)(a). For a given situation, we can simply write J(a). It is also possible that no rule or rule complex is produced, and we write J(a) = ∅.

29.1.1 The Principle of Action Determination: A Type of Judgment

In making their judgments and decisions about an action or plan B (or between A and B), players activate relevant or appropriate values, norms, and commitments. These are used in the assessment of options through a comparison-evaluation process. In determining or deciding action, a player compares and judges the similarity between an option B, or a pair of options A and B, and the appropriate, primary value or goal which the actor is oriented to realizing or achieving in the situation. More precisely, the actor tries to determine if a finite set of expected or predicted qualia or attributes of option B, Q(B), is sufficiently similar to the set of those qualia Q(v) which the norm or value v (or a vector of values) prescribes.
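A minimal sketch of such a similarity judgment, which the principle of action determination below formalizes, is given here. The representation of qualia as plain attribute sets and the Jaccard-style similarity measure are illustrative assumptions; GGT leaves the concrete similarity metric open, with the stringency threshold playing the part of the meta-rule mentioned above.

```python
def similarity(q_option: set, q_value: set) -> float:
    """Degree to which the option's qualia overlap the qualia prescribed by the value
    (a simple Jaccard-style measure; GGT does not fix a particular metric)."""
    if not q_option and not q_value:
        return 1.0
    return len(q_option & q_value) / len(q_option | q_value)

def judge(q_option: set, q_value: set, stringency: float = 0.7) -> bool:
    """J(i, t)(Q(B), Q(v)): is Q(B) 'sufficiently similar' to Q(v)
    up to the threshold set by the meta-rule of stringency?"""
    return similarity(q_option, q_value) >= stringency

def determine_action(options: dict, q_value: set, stringency: float = 0.7):
    """Pick an option whose qualia satisfy the value; if several do, take the one
    with the highest degree of realization (cf. the comparison of A and B)."""
    acceptable = {name: similarity(q, q_value)
                  for name, q in options.items() if judge(q, q_value, stringency)}
    return max(acceptable, key=acceptable.get) if acceptable else None

# Example: a value prescribing fair, cooperative, norm-conforming conduct.
Q_v = {"fair", "cooperative", "norm_conforming"}
options = {
    "A": {"fair", "selfish"},
    "B": {"fair", "cooperative", "norm_conforming", "slow"},
}
print(determine_action(options, Q_v))   # 'B'
```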
The principle of action determination states: Given an actor i in ROLE(i, t) which entails the value v (or a vector of values) in VALUE(i, t), specifying qualia and standards Q(v) that are to be focused on and realized in role decisions and performances in interaction situation G(t), actor i in G(t), guided by value v (or a vector of values), tries to construct, or to find and select, an action pattern or option B, where B is characterized by qualia or properties Q(B) which satisfy the following rough or approximate equation:³

J(i, t)(Q(B), Q(v)) = sufficiently similar    (29.3)

that is, the conclusion of the judgment process is that Q(B) is judged sufficiently similar⁴ to Q(v).

³ Elsewhere ([10],[16]) we have elaborated this model using a fuzzy set conceptualization. The general formulation of equation (29.3) relates to the notion of "satisficing" introduced by Simon [17].

⁴ In a game with given action alternatives for each of the players, a judgment procedure such as optimization may be used under some conditions. Actors proceed to compare and determine the similarity or goodness of fit between the value indications Q(v) and the expected consequences Q(·) of the different action alternatives. The actor chooses - and tries to produce - those actions which minimize the difference between the anticipated consequences of the actions and the prescribed or indicated consequences of the value v. This is following or applying a value in another sense than that of constructing an action on the basis of a value. More precisely, for two alternatives, A and B, the consequences of each alternative, Q(A) and Q(B), are compared to the consequences prescribed by value v, Q(v). Consider that the actors can compare the results of their judgments by means of a relation <. Suppose that the results of actor 1's judgments of the two options A and B are d1(A) and d1(B), that is, the degrees to which A and B realize or satisfy v, respectively. Moreover, it is judged that d1(A) < d1(B), that is, that action B better realizes v than action A. Finally, the actor chooses the action which is maximal with respect to <, namely, in this simple example, B.
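As a rough illustration of this comparison-evaluation logic (equation 29.3 and the alternative-selection procedure of footnote 4), the following Python sketch represents qualia as numeric degrees on a few dimensions; the dimension names, the similarity measure, and the 0.8 threshold are illustrative assumptions of ours, not part of GGT itself.

```python
# A minimal sketch (not the authors' formalism) of action determination:
# an option B is acceptable when its expected qualia Q(B) are judged
# "sufficiently similar" to the qualia Q(v) prescribed by the operative value v.
# Qualia are dictionaries mapping (hypothetical) quality names to degrees in [0, 1].

def similarity(q_option, q_value):
    """Crude similarity: one minus the mean absolute gap on the prescribed dimensions."""
    keys = set(q_value)  # compare only on the dimensions the value prescribes
    gap = sum(abs(q_option.get(k, 0.0) - q_value[k]) for k in keys) / len(keys)
    return 1.0 - gap

def judge(q_option, q_value, threshold=0.8):
    """Equation (29.3): conclude 'sufficiently similar' or not (threshold assumed)."""
    return similarity(q_option, q_value) >= threshold

def choose(options, q_value):
    """Footnote-4 style procedure: pick the alternative whose expected
    consequences best realize the value (maximal degree d_i)."""
    return max(options, key=lambda name: similarity(options[name], q_value))

if __name__ == "__main__":
    # Hypothetical qualia for a value v ("fair play") and two options A, B.
    q_v = {"fairness": 0.9, "own_gain": 0.5}
    options = {"A": {"fairness": 0.3, "own_gain": 0.9},
               "B": {"fairness": 0.8, "own_gain": 0.6}}
    print(judge(options["B"], q_v))   # True: B is sufficiently similar to v
    print(choose(options, q_v))       # 'B' better realizes v than A
```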
29.1.2 Game Processes

Interaction or games taking place under well-defined conditions entail the application and implementation of relevant rule complexes. This is usually not a mechanical process. Actors conduct situational analyses; they find that rules have to be interpreted, filled in, and adapted to the specific circumstances (see footnote 2). Some interaction processes may be interrupted or blocked because of application problems: contradictions among rules, situational constraints or barriers, social pressures from actors within G(t), and also pressures originating from outside the game situation, that is, the larger context (see later). In general, then, human agents do not only apply the relevant values and norms specified in their roles. They bring to their roles values and norms from other relationships. For instance, their roles as parents may come into play and affect performance in work roles (or vice versa). They also develop personal "interests" in the course of playing their roles, and these may violate the spirit if not the letter of the norms and values defining appropriate role behavior. More extremely, they may reject compliance and willfully deviate, for reasons of particular interests or even ideals (as modeled in our simulation studies). Finally, agents may misinterpret, mis-analyze, and, in general, make mistakes in applying and performing rules. In sum, role behavior is not fully predictable or reliable.

We investigate and model multi-agent social systems in which the agents have different roles and role relationships. Most modern social systems of interest can be characterized in this way. That is, there are already pre-existing institutional arrangements or social structures (see Figure 29.1). How are games played? Two or more roles (1, 2, 3, ..., m) in relation to one another generate interaction patterns, outcomes, and developments. Consider for simplicity's sake a two role model:
[Figure 29.1 here: two social agents (agent B shown with ROLE2 and its MODEL(2,t), VALUE(2,t), ACT(2,t), and J(2,t) components) embedded in culture/institutional arrangements and in physical ecosystem structures (time, space, and other conditions), producing interactions and outcomes.]
Fig. 29.1. Two role model of interaction embedded in Cultural-Institutional and Natural context

Classical games are special cases, namely closed games with given players as well as given action alternatives and outcomes. The players have anomic type social relationships and operate with specified, fixed value complexes or preferences orienting them to self-interested pursuit of their own interests or values [8]. Such closed game situations with specified players, given value and judgment complexes (for instance, maxmin or another optimization procedure (see footnote 4)), as well as given alternatives and outcomes, are analytically distinguishable from open game situations.
Open and closed games are distinguished more precisely in terms of the properties of the action complex ACT(I, t, G) for the group of players I at time t in game G(t) (see [8]). In closed game conditions, ACT(i, t, G) is specified and invariant for each actor i in I, situation S_t, and game G(t). Such closure is characteristic of classical games (as well as parlour games), whereas most real human games and interaction processes are open. In open games, the actors participating in G(t) construct or "fill in" ACT(I, t, G) in the course of non-routine interactions. Also, in such open games, actors may change values (including changes in preference structures and utility functions), models, and judgment complexes. For instance, in a bargaining process, the actors may alter their strategies or introduce new strategies - or develop particular feelings and undergo shifts in their values and judgment complexes - during the course of their negotiations. In such bargaining processes, the particular social relationships among the actors involved - whether relations of anomie, rivalry, or solidarity - guide the construction of options and the patterning of interaction and outcomes. In general, for each actor i in I, her repertoire of actions, ACT(i, t, G), is constructed by her (and possibly others) in the course of her interactions. She tends to do this in accordance with the norms and values relevant to her role at t. In open game situations, actors may construct and elaborate strategies and outcomes in the course of interaction, for instance in the case of a bargaining game in market exchange ([8],[16]). In such bargaining processes, established social relationships among the actors involved guide the construction of options and the patterns of interaction and outcomes. There is a socially constructed "bargaining space" (settlement possibilities) varying as a function of the particular social relationship in the context of which the bargaining interactions take place. The relationship - the particular social rules and expectations associated with it - makes for greater or lesser deception and communicative distortion, greater or lesser transaction costs, and a greater or lesser likelihood of successful bargaining. The difficulties - and transaction costs - of reaching a settlement are greatest for pure rivals. They would be more likely to risk missing a settlement than pragmatic "egoists." This is because rivals tend to suppress the potential cooperative features of the game situation in favor of pursuing their rivalry. Pure "egoists" are more likely to effectively resolve some of the collective action dilemmas in the bargaining setting. Friends may exclude bargaining altogether as a precaution against undermining their friendship. Or, if they do choose to conduct business together, their tendencies to self-sacrifice may make for bargaining difficulties and increased transaction costs in reaching a settlement [8].

Let us consider the role relationship {ROLE(1), ROLE(2), R} of actors 1 and 2, respectively, in their positions in an institutional arrangement in which they play a game G(t). Such role relationships typically consist of shared as well as interlocked rule complexes. A shared rule or rule complex is one that belongs to both rule complexes, ROLE(1) and ROLE(2).
The concept of interlocked complementary rule complexes means that a rule in one actor's role complex concerning his or her behavior toward the other has a corresponding rule in the other actor's complex. For instance, in the case of a superordinate-subordinate role relationship, a rule k in ROLE(1) specifies that actor 1 has the right to ask 2 certain questions, to make particular evaluations, and to direct actions and sanction 2.
In 2's complex there is a rule or rule complex m obligating 2 to recognize and respond appropriately to actor 1 asking questions, making particular evaluations, directing certain actions, and sanctioning actor 2.

Human action is multi-dimensional and open to differing interpretations and judgment processes. The focus may be, for instance, on: (i) the outcomes of the action ("consequentialism" or "instrumental rationality"); (ii) the adherence to a norm or law prescribing particular action(s) ("duty theory"); (iii) the emotional qualities of the action ("feel good theory"); (iv) the expressive qualities of the action (action oriented to communication and the reaction of others, as in "dramaturgy"); or (v) combinations of these. Role incumbents exercise their agency and focus on specific qualia in particular contexts because, among other reasons, (1) such behavior is prescribed by their roles, (2) such behavior is institutionalized in the form of routines, or (3) the actors have no time or computational capability to deal with other qualia. Thus, games may be played out in different ways, as actors operate within constraints, determine their choices and actions, and, in general, exercise their agency ([10],[16]):

• routine interactions. The actors utilize habitual modalities (bureaucratic routines, standard operating procedures (s.o.p.'s), etc.) in their interaction.
• consequentialist-oriented interactions. Actors pay attention to the outcomes of their actions, applying values in determining their choices and behavior on the basis of outcomes realizing values.
• normativist-oriented interactions. Actors pay attention to, and judge on the basis of norms, the qualities or attributes of action and interaction, applying role specific norms in determining what are right and proper actions.
• emotional interactions.
• symbolic communication and rituals.
Or, there may be some combination of these, including mixtures such as agents oriented to outcomes interacting with others oriented to the qualities of the action. Or, someone following a routine interacts with another agent who operates according to a "feel good" principle. For our purposes here, we focus on the first three patterns.

(1) Routine interactions (including standard operating procedures (s.o.p.'s), rituals, etc.). In general, the actors operate fixed (without modifiability) interlocked algorithms in their interaction vis-a-vis one another: ALG1 ⊆_g ROLE(1, t), ALG2 ⊆_g ROLE(2, t).⁵ They have either a common rule r_0 ∈ R or rules with common premises, but with conclusions that are different for each of them. Under particular conditions (specified by the premises), the rule or rules initiate the performance of ALG1 and ALG2 for actors 1 and 2, respectively. These are interlocking, meaning that realization in performance of the conclusions of some rule r_1i (or rule complex) in ALG1 provides the premises for a rule (or rule complex) r_2j in ALG2.

⁵ Algorithms are characterized as rule complexes, i.e. ALG1 = {r_11, r_12, ..., r_1m}; ALG2 = {r_21, r_22, ..., r_2n}, where r_1j ∈ ALG1 and r_2j ∈ ALG2 is a rule or complex of rules. Interlocked algorithms means that the conclusions of at least one subset (subcomplex) of each algorithm provide the premises for at least one subset (subcomplex) of the other. Thus, a failure of one or more rules in a subset (subcomplex) to be activated means a failure in the overall procedure.
In a certain sense the two algorithms are sub-routines making up a role-relationship algorithm, ALG = {ALG1, ALG2}, that is, a more global algorithm which is at the same time a subcomplex of the complex {ROLE1, ROLE2, R}. The sub-routines in an operative algorithm ALG characterized by process equilibrium are consistent. Consistency means, in a GGT practical perspective, that the implementation of the rule(s) in ALG1 in its (their) proper order does not interfere with the performance of the rule(s) in ALG2. Typically, there will be rules for turn-taking and other rules indicating which actor takes the lead under certain conditions, etc. (see [4] concerning interactions between superordinates and subordinates). As the actors enact the interlocked algorithms, certain steps (applications of particular designated rules) are followed.

Routine interaction equilibria. The interaction will be a process equilibrium if all the local conditions for applying and executing the algorithms are satisfied and the interlocking algorithms are consistent. Note that the agents in the purest case of routine interaction do not pay attention to results and do not make explicit value judgments, as in the cases of consequentialist and normativist oriented action and interaction. They attend to following the rules of the algorithm. Blockage of performance of the global algorithm and disequilibrium will result whenever: (1) the two sub-algorithms contain at least one contradiction such that the conclusions of a rule in ALG1 contradict the premises of a rule to be executed subsequently in ALG2;⁶ (2) one or both actors make mistakes or mis-perform, so that the conclusions of a rule in ALG1 contradict the premises of a rule to be performed subsequently in ALG2; or (3) in the context of performance, not all the situational or local conditions for applying and executing rules in the procedures are satisfied. Routine interaction is disrupted or blocked by unexpected conditions, or more generally chaos, so that certain situational conditions necessary for the performance of the interlocked role algorithms are not met.

In the following two cases, action determination takes place according to the application of values in value judgments - this according to the principle of action determination (equation 29.3). The role relationship is characterized by quasi-algorithms (that is, not complete algorithms) and rule complexes. The value-directed judgment processes construct and/or select among available alternatives, programs, and rule complexes in general. In other words, the actors check to see if their values are realized in the actions they undertake vis-a-vis one another. In their "strategic" or instrumental actions and interactions, the focus of the actors is on the consequences (con) of actions and interactions, that is, v specifies Q(v) which Q(con(A)) is to satisfy. In the case of normativist (that is, "non-consequentialist") action, the actors focus on the properties (pro) of actions and interactions themselves, that is, v specifies Q(v) which Q(pro(A)) is to realize. Cognitively and evaluatively, these are substantially different modalities of action determination.
⁶ A weaker version is that they fail to satisfy the premises of a rule to be performed subsequently in ALG2 (that is, the conclusions of a later rule may contradict earlier rules, but this is not a problem when actors automatically and locally perform in the way indicated here).
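To make the idea of interlocked routine algorithms concrete, here is a minimal Python sketch in which each rule consumes premises and produces conclusions, and the conclusions of one actor's rules supply the premises of the other's; the rule names and "facts" are hypothetical, and the representation is ours rather than the GGT formalism.

```python
# A rough sketch of two interlocked role algorithms: a routine interaction is in
# "process equilibrium" when every rule can eventually fire; a rule left pending
# corresponds to blockage of the global algorithm.

# rule = (name, set of premises required, set of conclusions produced)
ALG1 = [("r11_ask_report", {"start"}, {"report_requested"}),
        ("r12_evaluate", {"report_delivered"}, {"evaluation_given"})]
ALG2 = [("r21_deliver", {"report_requested"}, {"report_delivered"}),
        ("r22_acknowledge", {"evaluation_given"}, {"task_closed"})]

def run_routine(alg1, alg2, facts):
    """Fire applicable rules from either algorithm until none applies."""
    pending, fired = list(alg1) + list(alg2), []
    progress = True
    while progress:
        progress = False
        for rule in list(pending):
            name, premises, conclusions = rule
            if premises <= facts:        # all premises of this rule are satisfied
                facts |= conclusions     # its conclusions become shared facts
                fired.append(name)
                pending.remove(rule)
                progress = True
    return fired, pending                # non-empty 'pending' means blockage

if __name__ == "__main__":
    fired, blocked = run_routine(ALG1, ALG2, {"start"})
    print(fired)    # the interlocked sequence r11 -> r21 -> r12 -> r22
    print(blocked)  # []: process equilibrium; a mis-performance would block it
```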
(2) Consequentialist-oriented interactions. Given the role complex of actors 1 and 2, {ROLE(1), ROLE(2), R}, the actors orient to trying to realize role-specified values in the outcomes or payoffs of the action(s) under consideration. The actors focus on dimensions and qualia of action outcomes, "states of the world", or "payoffs" associated either with a constructed action or with a set of action alternatives under consideration in a choice situation; in a word, v specifies Q(v) which Q(con(A)) is to satisfy. One form of such a mode of action determination is found in classical game theory, entailing the players of the game attempting to maximize or optimise a result or outcome for self. In this case, the actors are assumed to be self-interested, autonomous agents. Related forms of interaction have been investigated by Burns (see [2]) and in [8],[10],[16]. These entail, among others, (1) variation in the goals of the actors: actors may be oriented strategically to one another, for instance in trying to help one another ("other-orientation"), or they may be oriented to jointly or collectively beneficial outcomes; and (2) the actors orienting in their game situation to multiple values, which often results in dilemmas, or in divergent or contradictory conclusions about what to do (see [8]). Typically, their action judgment process will then involve the use of procedures such as weighting schemes, lexicographic orderings, and other methods in order to resolve dilemmas. The value orientations, the relationship between the actors' values, and their modalities of action judgment will depend on the type of social role relationship between the actors. For instance:

1. Solidary actors expect one another to determine action(s) which realize a collective value or mutual satisfaction, "a just division", etc. These are the appropriate values which would appear in the action determination principle and which the actors would try to realize in constructing and/or selecting actions in the course of their interactions.

2. Competitive or rivalry relationship. Each actor is motivated (operating with value orientations) to try to construct or find strategies that give results better for self than for other ("relative gain"). The methods of classical game theory, as well as other methods, may be used for constructing or selecting actions that would give the desired results.

3. Relationship of indifference (strictly speaking, this is not a role relationship). The actors only concern themselves with the best result for self, ignoring the other agent. Depending on situational conditions, the agents may be motivated to cooperate (for instance, if payoffs are convergent for the actors in the situation) or to compete (if payoffs are divergent in the situation).

Consequentialist-oriented equilibria. In classical game theory, autonomous agents concern themselves with their particular self-interest. One major result is the Nash equilibrium, from which no actor in the game can improve his or her individual situation by choosing an action or outcome differing from the equilibrium. GGT considers another variant, more sophisticated (and realistic) than either preference ordering or maximization of utility. Each actor engaged in a game has, in the context of their particular role relationship, a value complex defining a minimally acceptable level as well as an ideal or maximum goal to aim for.
So, an outcome that satisfies the minimum for each actor would be a type of equilibrium. However, those who had hoped for a better result are likely to be disappointed and partially dissatisfied, and are inclined to search for other possibilities. So the equilibrium under such conditions is an unstable one. The more that the agents realize their more ideal expectations, the more stable the equilibrium. For instance, this is true of negotiated contracts and prices on a market. Related work ([8],[10],[16]) shows that there are multiple equilibria, and that the social relationships among the actors (for example, relations of solidarity, competition, or enmity) determine the particular equilibrium (or equilibria), as well as the lack of equilibria, that obtain in a particular game such as a prisoners' dilemma game. Actors with a competitive relationship where each tries to outdo the other will lack an equilibrium in the strict case. In consequentialist oriented interaction, the actors cannot determine an equilibrium if they fail to obtain, or to be able to use, information about outcomes or information about the connection between actions and outcomes. Also, disequilibrium results if actors' expectations or predictions about outcomes satisfying values fail to materialize. A narrow focus on outcomes - ignoring the qualities, including the ethical qualities, of action and interaction - implies that actors behave as if "the ends justify the means." On the one hand, this greatly simplifies judgmental computations. Once consideration of the qualities of actions enters - and, more generally, once actors are motivated by and take into account multiple values - they are likely to be faced with dilemmas and tendencies toward vacillating behavior [8].

(3) Normativist-oriented interactions. In a context t, the rule complex {ROLE(1, t), ROLE(2, t), R} specifies how actors in their roles should act vis-a-vis one another. The actors pay attention to such qualities of the interaction as, for instance, "cooperation", "taking one another into account", or "fair play". These determinations entail a comparison-judgment of an action or actions focusing on the qualities that satisfy or realize one or more norms referring to the intrinsic properties of the actions. That is, v specifies Q(v) which Q(pro(A)) is to satisfy. Again, actors in solidary relationships focus on producing actions and interactions that are defined as "cooperative", "solidary", "fair play", etc. Rivals would focus on producing competitive-like activities.

Normativist equilibria. Such equilibria obtain when the actors are able to satisfy sufficiently the value(s), that is, the norm(s), which apply to their actions and interaction. Of course, if the satisfaction level is minimal, the equilibrium may be an unstable one, because one or more actors would be inclined to try to improve the performance of a given action scheme or to construct another scheme. Stability would, of course, obtain if they judge other schemes to be unavailable. However, under such conditions of dissatisfaction, the equilibrium would be an unstable one [10]. Obviously, disequilibrium obtains if conditions in the situation make it impossible to perform the appropriate actions correctly. In addition, mistakes may be made so that the actual performances do not satisfy the norm.
The situation would also be problematic if the actors cannot obtain sufficient information either to enact the norm or to judge the qualities of their actions (for instance, Hamlet is uncertain whether he should apply the norm of revenge, avenging his father's death, since he is not sure whether his uncle (with his mother) murdered his father).
A narrow focus only on the intrinsic properties of action - ignoring the consequences - is highly problematic in general. That is, the actors consider action as "right" regardless of outcomes. Those who are in competitive social relationships generate actions that have the qualities of "competitiveness", "one-upmanship", etc. "Equilibrium" patterns would entail the actors generating activities satisfying the norms of competitive action. Note that there are equilibrium interactions among rivals when the focus is on the qualities of the action rather than on the outcomes (where equilibrium is not possible in a strict sense, as pointed out earlier).

Several remarks are in order about the different judgment modalities for determining action: (1) Interaction processes may be characterized by some combination of routine, outcome oriented, and normatively oriented determinations, for instance combinations of instrumental or strategic calculation with normative considerations. However, as indicated earlier, this requires particular judgment procedures to resolve value dilemmas or conflicts when they arise in the course of interaction. (2) Actors who interact according to a common rule regime or complex are aware that they are playing a particular game together - that is, one with given rules, to which the players orient (but which they may adhere to or comply with to varying degrees). G(t) is not just any collection of rules whatever. Of course, the actors may disagree (or try to deceive one another) about what the actual rules are. (3) The GGT conception of a game does not contain a procedure to "solve the game." In other words, the concept of a game structure is not a means or a procedure to solve it, for instance to resolve the problems of coordination or conflict which it may entail. The players of a game have, of course, procedures to determine action and, possibly, also to resolve conflicts [9]. (4) In the GGT framework, rules are distinguished from the performance of rules, namely the process of applying and implementing rules in concrete activities (in time and space). Among these activities is not only the performance of particular action rules such as norms and prescribed procedures but also adaptation to interaction situations and conditions. (5) In the GGT perspective, the results of a performed game are: (a) patterns of interaction which are largely predictable, within rough limits, on the basis of the game complex G(t), the actors' roles in the game, and situational conditions as well as the larger context (see Figure 29.1); (b) some outcomes will be equilibria, others not. When one or another outcome is a normative equilibrium (satisfying or realizing common norms or values), then it is likely to be a stable, enduring result. Otherwise, one or more of the players (or even external players) may challenge it. Interaction is not mechanical; (c) role performances do not necessarily result in equilibria. Certain game contexts - and configurations of rule complexes - have a greater likelihood of ending in stable results or outcomes than others.
In sum, the game structure G(t) in GGT is a rule complex whose subcomplexes are the roles that different agents play vis-a-vis one another, and the roles are made up of subcomplexes representing key behavioral functions:

{MODEL(i, t), VALUE(i, t), ACT(i, t), J(i, t)} ⊆_g ROLE(i, t) ⊆_g ROLE(I, t) ⊆_g G(t)    (29.4)
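A small data-structure sketch may help fix ideas about the nesting in equation (29.4); the Python classes and the example rules below are illustrative placeholders of our own, not the formal rule-complex machinery of GGT.

```python
# A minimal data-structure sketch of equation (29.4): the four behavioural
# sub-complexes sit inside an actor's role, roles sit inside the role
# configuration of the player group I, and that sits inside the game complex
# G(t). Rule contents are placeholder strings.

from dataclasses import dataclass, field

@dataclass
class Role:                                          # ROLE(i, t)
    model: list = field(default_factory=list)        # MODEL(i, t): beliefs
    value: list = field(default_factory=list)        # VALUE(i, t): evaluative rules
    act:   list = field(default_factory=list)        # ACT(i, t): action repertoire
    judge: list = field(default_factory=list)        # J(i, t): judgment rules

@dataclass
class Game:                                          # G(t)
    roles: dict = field(default_factory=dict)        # ROLE(I, t): one Role per actor
    shared_rules: list = field(default_factory=list) # R: shared/interlocked rules

g = Game(
    roles={
        "seller": Role(value=["seek a fair price"], act=["make offer", "accept"]),
        "buyer":  Role(value=["seek a fair price"], act=["make bid", "accept"]),
    },
    shared_rules=["alternate offers until agreement or break-off"],
)
print(g.roles["buyer"].act)   # the ACT(i, t) sub-complex of the buyer's role
```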
GGT treats games, roles and role relationships, judgment, and interaction processes as socially embedded, as indicated in Figure 29.1.
29.2 Applications: Simple Simulation Models of Social Regulation

Formulating multi-agent game models in the GGT framework provides new insights and policy options. Typically, however, analytic results can only be obtained in cases of highly simplified and stylised situations, for instance 2- or 3-agent closed games. For dealing with greater complexity - as well as for exploring the consequences of any given proposal for institutional change - simulation is an essential tool [11]. A variety of situations of social regulation of interest to scientists and policymakers may be tackled within this framework. Our multi-agent models start with an institution or institutional arrangement, that is, a specified rule regime that is applied to a population of agents in a particular institutional situation or domain. The regime defines the roles of the agents, their access to resources, and their rights and obligations with respect to such key actions as complying with norms or deviating from them, monitoring, judging, and sanctioning. A simple application related to social regulation and the maintenance of order will be presented briefly in the form of a simulation model that provides a framework for further extensions. The model was implemented with REPAST.⁷

The basic game situation involves a type of collective action game, where participants choose to contribute or not contribute to a collective good. This might involve making a contribution or adhering to a social norm benefiting each and all but entailing sacrifices or "costs" to individual agents. There may be some degree of social regulation and control. Two general types of regulation are of interest: (1) in one, the population of agents (I), citizens, interact and influence one another within local network relationships; they are then subject to neighborhood or community regulation, not specialized, centralized regulation; (2) in the other situation, there is a specialized, central regulator A who tries to enforce in the population of agents (I) adherence to a norm (which may be an official policy or law). For this limited paper, we report only on the results of simulation runs with the first type of model.

The Basic Model: Diffuse Regulative Processes (that is, self- or organic regulation)

The model involves the following constitutive elements: a world or interaction domain, and a population of citizens/neighbors who differ in their levels of commitment to the social norm and in their communicative capabilities, and who act locally as "neighbors" vis-a-vis one another. We start with a population of citizens or neighbors who have defined roles in relation to one another. In GGT we define the value orientations, cognitive models, and action and judgment complexes of the population or of various sub-populations of the population.

⁷ The programs for the basic model are found at http://www.soc.uu.se/research/utc/download/
The world is a 2-D grid (a torus) inhabited by agents (one in each cell). In each time unit of the simulation, agents engage in collective action (modeled as a game of public good provision). On each iteration, as many public good games are played as there are cells in the grid. The neighborhoods playing the game in each iteration are randomly selected (one neighborhood may play the game more than once in one iteration). A neighborhood is a square of eight cells. In the public good game each agent has the choice of contributing or not contributing (0-1 choice). Thus:

1. We start with a population I of individuals who interact in neighborhoods.

2. We define the value, cognitive (model), action, and judgment complexes of the population or of the various sub-populations of the entire population.

(A) Value factor: the level of commitment to complying with community norm(s) varies across the population of individuals and may change over time.
(i) Case 1: There is a subpopulation of "good citizens" (Type 1). They are committed to varying degrees to the role of good citizen, that is, to complying with laws or norms. There is also a subpopulation of citizens (Type 0) that rejects to varying degrees the role of good citizen and develops another role, an anti-social role with a commitment to anti-community norms.
(ii) Case 2: Case 1 is extended by considering dynamic feedback, where the levels and patterns of compliance in the neighborhood affect actors' levels of commitment to the "good citizen" role (or its antithesis).

(B) Cognitive model factor: The agents observe what their neighbours are doing (whether they are complying or not complying). They take the local pattern into account in making judgments about how to behave in the situation. Note that this "neighborhood principle" may be expanded to include more of the population than just the closest neighbors.

(C) Action factor: The action opportunities in the situation are to comply or not comply. An expanded version takes into account the communication of judgments or feelings among agents, for instance, "I am not complying but would like to," or "I am complying but would rather not."

(D) Judgment factor: The agent's disposition for compliance or non-compliance is jointly determined by the degree of prior commitment of the agent and his or her perception of the rate of compliance in the neighbourhood. This disposition is a measure of the likelihood of a certain kind of behavior. In case 2 the degree of commitment is updated whenever an agent acts, increasing or decreasing depending on whether the rate of compliance in the relevant neighbourhood is higher or lower than the agent's prior degree of commitment.
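The following Python sketch renders this verbal description in code; it is not the authors' REPAST implementation (see footnote 7), and the specific functional forms - equal weighting of commitment and observed compliance to obtain the disposition, a fixed increment for the case 2 commitment update, and the omission of the communication-rate mechanism - are our simplifying assumptions.

```python
import random

SIZE, RATE_ADJ, ITERATIONS = 20, 0.01, 2500   # RATE_ADJ = 0 gives case 1

def make_agent(prop_t=0.5, avg_com0=0.1, avg_com1=0.9):
    """One citizen: a type (0 or 1) fixes the initial commitment to the norm."""
    is_type1 = random.random() < prop_t
    return {"commit": avg_com1 if is_type1 else avg_com0, "comply": False}

grid = [[make_agent() for _ in range(SIZE)] for _ in range(SIZE)]

def neighbourhood(x, y):
    """The eight cells around (x, y) on the torus."""
    return [grid[(x + dx) % SIZE][(y + dy) % SIZE]
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def play_public_good_game(x, y):
    """One local public good game among the eight neighbours of cell (x, y)."""
    cells = neighbourhood(x, y)
    observed = sum(a["comply"] for a in cells) / len(cells)   # local compliance rate
    for a in cells:
        disposition = 0.5 * a["commit"] + 0.5 * observed      # assumed functional form
        a["comply"] = random.random() < disposition
        # Case 2 feedback: commitment moves toward the observed local rate.
        step = RATE_ADJ if observed > a["commit"] else -RATE_ADJ
        a["commit"] = min(1.0, max(0.0, a["commit"] + step))

for _ in range(ITERATIONS):
    for _ in range(SIZE * SIZE):              # as many games per iteration as cells
        play_public_good_game(random.randrange(SIZE), random.randrange(SIZE))

compliance = sum(a["comply"] for row in grid for a in row) / SIZE ** 2
print(f"final compliance rate: {compliance:.2f}")
```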
29.2.1 Simulation results with the basic model

The main results obtained in a computational experiment that included 10 runs of 2500 iterations of the simulation for each set of parameter values (see the parameters and values tested in Table 29.1) may be summarized in the following general conclusions:
[Figure 29.2 here: an actor's degree of commitment and his or her perception of neighbourhood behaviour jointly determine behaviour (cases 1 and 2); the feedback from neighbourhood behaviour back to commitment operates only in case 2.]

Fig. 29.2. The Basic Simulation Model
• In case 1 the simulation converges to rates of partial compliance that positively depend on the initial proportion of actor types, the relative levels of commitment of the two populations, and the relative communication rate between the two populations (see Figure 29.3);
• In case 2 the simulation converges to levels of compliance that positively depend only on the proportion of types (the variation of the relative level of commitment or of the relative communication rate has little or no effect).

Table 29.1. Simulation Parameters

Parameter                                                   | Values tested
Average Degree of Commitment for Agents of Type 0 (AvgCom0) | 0.1, 0.5, 0.9
Average Degree of Commitment for Agents of Type 1 (AvgCom1) | 0.1, 0.5, 0.9
Communication Rate for Agents of Type 0 (ComR0)             | 0.1, 0.5, 0.9
Communication Rate for Agents of Type 1 (ComR1)             | 0.1, 0.5, 0.9
# agents type 1 / # agents type 0 (PropT)                   | 0.25, 1, 4
Rate of Adjustment of Commitment (RateAdj)                  | 0 (case 1), 0.01 (case 2)
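A sketch of how this computational experiment over the Table 29.1 parameter grid might be organized is given below; the simulation call itself is only a placeholder, but the enumeration shows where the 486 parameter settings come from and how the "relative commitment" values used as the columns of Table 29.2 arise as ratios AvgCom1/AvgCom0.

```python
# Enumerate the parameter combinations of Table 29.1; each combination would be
# run 10 times for 2500 iterations (represented here by a placeholder call).

from itertools import product

AVG_COM0 = AVG_COM1 = COM_R0 = COM_R1 = (0.1, 0.5, 0.9)
PROP_T   = (0.25, 1, 4)
RATE_ADJ = (0, 0.01)                        # 0 = case 1, 0.01 = case 2

combos = list(product(AVG_COM0, AVG_COM1, COM_R0, COM_R1, PROP_T, RATE_ADJ))
print(len(combos))                          # 486 parameter settings in all

ratios = sorted({round(c1 / c0, 2) for c0 in AVG_COM0 for c1 in AVG_COM1})
print(ratios)   # [0.11, 0.2, 0.56, 1.0, 1.8, 5.0, 9.0] -- the Table 29.2 columns

for avg_com0, avg_com1, com_r0, com_r1, prop_t, rate_adj in combos:
    for run in range(10):                   # 10 runs per parameter setting
        pass  # here one would run the simulation for 2500 iterations
```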
With respect to the problem of norm sustainability, the simulation results suggest the following:

The sustainability of a norm depends crucially on: (a) the proportion in the population of those who share the norm; (b) the comparative levels of commitment of those who share the norm and those who oppose it; and (c) the comparative communicative capabilities of both groups. A committed and/or communicative minority may shift the balance of norm compliance (non-compliance) in its favour. For instance, this is the result obtained for case 1 with PropT = 0.25, ComR0=0, ComR1=0, AvgCom0=0.1, AvgCom1=0.9 (see Table 29.2 and Figure 29.3).

The sustainability of norm compliance also depends on the spatial configuration of the initial distribution of types of agents.
For instance, in case 1, with PropT = 0.25, ComR0=0, ComR1=0, AvgCom0=0.1, AvgCom1=0.9, an initial random distribution leads to convergence close to a 70% compliance rate in the population. A compact minority of type 1 agents leads, on the other hand, to an outcome of only 30% compliance.

A clustered or compact minority may shift the balance of norm compliance/non-compliance in its favour. For instance, in case 1, with PropT = 0.25, ComR0=0, ComR1=0, AvgCom0=0.9, AvgCom1=0.1, an initial random distribution leads to convergence close to a 0% rate of compliance in the population. A compact minority of type 1 agents leads, on the other hand, to an outcome of over 10% compliance.

Simulation runs with "walls"⁸ perpetuate (or enable the sustainability of) a particular group, whether a group of good citizens or of anti-social types.

⁸ A "wall" is a space on the grid which is left empty so that neighborhoods are split - there is no direct influence or communication across a wall.

Table 29.2. Simulation results of Case 1: relative commitment, proportion of actor types, and rate of compliance.

                 Relative Commitment
PropT    0.11   0.20   0.56   1.00   1.80   5.00   9.00
0.25      3%     5%    12%    20%    32%    52%    65%
1.00     11%    18%    36%    50%    64%    83%    89%
4.00     34%    46%    70%    80%    88%    95%    97%

The basic model allows for extensions that may be used to explore specialized and centralized regulative processes. These multi-agent systems include at least two types of social roles, that of a regulator and those of citizens or neighbors. The public agency A is responsible for solving the collective action problem - that is, the contribution of individual citizens to a collective good or compliance with a norm or law. The agency tries to control the actions of the population(s) of societal agents (such as citizens, business enterprises, public agencies). It does this by monitoring, assessing, sanctioning, etc. It determines a public policy or strategy judged to ensure compliance and enforces it through one or more concrete measures (incentives, sanctions, moral appeals, or persuasion). Formal regulatory actions are assumed to cost money and other resources, but the resources can be allocated in different ways (over time and space, and in relation to concrete events and developments). The population of citizens I is required or expected to adhere to the norm, policy, or law, and the regulator tries to steer or control them. In such a model, the population may or may not be homogeneous in its moral predisposition (see below). Different agency strategies and resource levels are run in relation to the population I, which is predisposed to, for instance, conditional contribution (or non-contribution). The level of compliance is a function of the perceived level of monitoring and sanctioning by the regulator (and possibly also by neighbors) concerning compliance with a norm which prescribes contribution or compliance.
[Figure 29.3 here: average rate of compliance in the population, shaded in bands (up to 20%, 20%-40%, 40%-60%, 60%-80%, 80%-100%), plotted against relative commitment (AvgCom1/AvgCom0, from 0.11 to 9.00) and PropT (0.25, 1.00, 4.00), with ComR0=0 and ComR1=0.]
Fig. 29.3. Computational experiment results: Average rate of compliance in the population in case 1 (data from Table 29.2)

In the extended model, the following role structures are specified: (1) the value complexes of the regulatory and citizen groups; (2) the cognitive models of the regulatory and citizen groups; (3) the regulator's and citizen groups' action opportunities and constraints; (4) the judgment functions of the regulator and citizen groups. A key aspect of the model is that failure to effectively regulate agents' behavior may result in a major system failure. For instance, in the case of electricity, excessive electricity consumption overloads the system and a "blackout" or crisis occurs. "Blackout" can refer not only to a collapse of electrical supply but to other types of societal "breakdown." It is a concept that would apply to any major resource depletion (or pollution crisis or widespread violence) that threatens to undermine and destabilize a functioning social order, whether temporarily or over some extended period of time - a time factor that can be investigated using the types of models that we have developed. Such models enable us to systematically explore the relationships between regulatory agent value orientations and goals, budget levels, other resource constraints, the allocation of resources among different implementation strategies, and the character and behavior of the population of regulatees. The model - and variants of it - allows us to investigate changes in strategy - for instance, relating to resource use - as a function of policies, budget and other resource constraints, and methods and patterns of enforcement. The policies and programs of agent A impact the different populations differentially, because of the diverse values, interests, and capabilities of the populations.
When one considers that the population I may be divided into sub-populations, which not only have varying degrees of commitment to law and order but themselves engage in regulatory processes, then we are in a position to utilize such a technology to investigate multiple regulative processes in a complex society.
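As a schematic sketch of the centralized-regulation extension (our own stylization, not the authors' extended model), the fragment below gives the regulator a limited auditing budget and lets each citizen's disposition to comply depend on her commitment and on the perceived level of monitoring; all parameter values and functional forms are assumptions for illustration.

```python
# A hypothetical one-period slice of the regulator extension: agency A audits as
# many citizens as its budget allows, sanctions detected non-compliers, and a
# population-wide non-compliance rate above a threshold counts as a "blackout".

import random

N, BUDGET, COST_PER_CHECK, BLACKOUT_LEVEL = 400, 50.0, 1.0, 0.6

citizens = [{"commit": random.uniform(0.0, 1.0), "sanctioned": 0} for _ in range(N)]

def period(perceived_monitoring):
    checks = int(BUDGET / COST_PER_CHECK)          # how many citizens A can audit
    audited = set(random.sample(range(N), checks))
    non_compliers = 0
    for idx, c in enumerate(citizens):
        disposition = 0.5 * c["commit"] + 0.5 * perceived_monitoring  # assumed form
        if random.random() >= disposition:         # the citizen does not comply
            non_compliers += 1
            if idx in audited:
                c["sanctioned"] += 1
                c["commit"] = min(1.0, c["commit"] + 0.05)  # sanction raises future compliance
    return non_compliers / N

rate = period(perceived_monitoring=BUDGET / (COST_PER_CHECK * N))
print(rate, "-> blackout" if rate > BLACKOUT_LEVEL else "-> order maintained")
```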
29.3 An Agenda for Research: GGT Institutional Analysis Applied to Security Problems in Multi-agent Systems

In the GGT perspective, institutions are understood as durable social rule systems or regimes, structuring and regulating agents' games (interactions) and the social system as a whole ([4],[3]). Their operation may be considered from different perspectives. Rational choice theory (as well as classical game theory) tends to view institutions as imposing constraints on choice or as influencing the cost-benefit analyses of self-interested rational individuals. In other accounts the role of institutions is broader: they not only constrain or influence through incentives; they define roles with particular values and obligations, providing motivations for action that transcend self-interest by appealing to the agent's social identity and collective understanding. Analysing problems of security from an institutionalist perspective and understanding how institutions regulate and stabilise social systems - and why they sometimes fail to do so - demands an understanding of the social and psychological mechanisms that mediate institutions and behaviour [13]. Research into such mechanisms suggests that, besides constraining and influencing through incentives, institutions carry with them expressive and normative dimensions. These frame and affect the interpretation of situations, influencing individual judgements, attitudes, and behaviour.

Research such as that based on GGT has been developed beyond the narrow boundaries of the rational choice and classical game paradigm ([1],[2],[8],[15],[16]). It recognizes agents who are faced with, and try to resolve, judgment dilemmas, e.g., between personal preferences and the obligations associated with social roles and norms. They may embark on courses of action that the mere logic of "incentives" would ignore or forbid in the name of rationality. More importantly, it suggests that incentives, namely pecuniary ones, undermine in some contexts the disposition of most individuals to express - and to act in terms of - solidarity and cooperative intentions, exercising self-restraint and making sacrifices for the common good. Control over individual action through increased monitoring and generalized incentives is likely, under some conditions, to be self-defeating. Transparent incentive structures may transform the interpretations and predispositions of actors in the situations by framing choices in cost-benefit terms instead of community commitment terms. Increased monitoring may convey a message of breakdown in cooperation, or the impression of excessive control by a regulator, possibly triggering shirking and other forms of "free-riding" [14].

More generally, the problem of security can be formulated in the GGT perspective in terms of regulating multi-agent systems, dealing with social disorder and crisis. Such regulation is organized and regulated through particular institutions, which may fail to function properly or effectively under particular conditions (as in several of our simulation runs). For instance:
1. Private property rule regimes may increase or decrease conflict and social disorder. Regimes which allow for overlapping rights or claims, that is, unclear or imprecise boundaries or fences, may be conflict-generating and result in social disorder. Therefore, the design of such regimes is typically of critical importance to social order and security.

2. Institutional arrangements operating as regulatory devices may decrease conflict and social disorder on labour, commodity, and financial markets. Again, the designed arrangements may be reasonably effective or tragically ineffective. For instance, in advanced countries there are elaborate institutional arrangements to regulate and resolve labor-management conflicts: namely, systems of negotiation, mediation, and arbitration, among others. Similarly, on commodity markets there are product safety controls, product guarantees, etc. On the international level, the World Trade Organization (WTO) arrangements regulate trade and exchange in many areas, reducing conflict, instability, and disorder. Also on the international level, major conflicts between debtor and creditor countries and their banks are regulated through negotiation and the intervention of powerful, developed countries (as in dealing with the Latin American and Asian financial crises). Under some conditions, however, regulatory arrangements fail to resolve or constrain social conflict, but rather increase it, as when debtor countries decide they must default on loans, generating insecurity and disorder. In general, regulatory institutions may fail in a number of ways.

3. Money and financial systems are subject to regulation, conflict, and crisis. For instance, money and financial markets are regulated on the national level, but increasingly, as they function globally, they tend to be unstable. There are powerful forces of "escalation" and rapid, destabilizing developments and crises.

4. Institutionalized societal conflict-resolution procedures such as multi-lateral negotiation, judiciary processes, and democratic systems, among others, are means of regulating conflicts and social instability in many contemporary societies [9]. There are limitations to such methods. For instance, "resolutions" may not be accepted by committed minorities. What then? In societies with deep cleavages, certain democratic forms such as "first past the post" and majority rule do not resolve conflict and maintain social order, but rather tend to intensify conflict and generate disorder. In such "democratic" social systems, a majority readily dominates a minority (or minorities), and the latter may reject such domination and mobilize to neutralize or even undermine it, setting the stage for confrontation and escalating conflict. Similarly, a judicial or arbitration procedure to settle conflicts depends on people's belief or confidence in its competence, impartiality, and objectivity. Again, there may be substantial groups in society who distrust, and act in opposition to, the judiciary or arbitrator because these are coupled or identified with a majority or oppressive group. Or, they have a history of unjust, arbitrary, or incompetent actions.
"Integrating institutions" such as Republican arrangements treat all citizens equally - and insist that all act equally in the public sphere. However, this may take the form of insisting on keeping religion or religious symbols out of the public sphere. Many religions or religious groups can accept this. But some religious groups (Christian, Jewish, and Muslim) may find it problematic because they may not separate so neatly the private and public spheres, as the Republican model demands. So, the elimination of scarves, as in France, can be seen or experienced as provocative to many Muslims. In general, some institutions in a given context operate to integrate, others to separate (establishing walls between) agents in society. Under which conditions would "walling" institutions function effectively and when would they fail? (See footnote 8). Demographics and resource conditions as well as types and levels of interdependencies across "walls" are important factors, among others, in determining conditions of success or failure of one type of arrangement or another. A number of societal regulatory problem-situations may be identified and analysed using the GGT approach: 1. The regulator problematique: finding sustainable order and security within the constraints of bounded knowledge and scarce resources. 2. Unanticipated consequences of social action in complex systems (non-calculable risks). 3. Social dilemmas and tragic choices. For instance, the tragic choice between security and civil rights and liberties in the face of rising terrorism. 4. Regulatory dialectical game: the interaction between regulator and regulatees is such that as the regulator tries to increase the effectiveness (degree or reliability) of control, the regulatees search for and find counter-measures and loopholes (or sufficient numbers of them do so that social order is disrupted and destabilized). 5. Escalating Processes: Conflict processes may be subject to positive feedback loops. This is characteristic also of armaments races. Other societal processes characterizable in this way are bandwagon effects and speculative bubbles. 6. Self-fulfilling (and self-defeating) prophesies as key mechanisms in social life. Examples are many: bandwagon effects, stock market bubbles, socio-economic depressions which are "predicted". In general, a major principle of social life is that what people believe to be real is often real in its consequences. This may be exemplified in extreme cases by beliefs about "powerful" ghosts in the Middle Ages, or "the multitude of Communists" said to be operating during the McCarthy Period in the USA; or, the threat of genetically modified organisms (GMOs) or other material "threats" that may or may not be real but the belief in their "reality" has real consequences as a result of their impact on human action and interaction. Also, what people believe not to be real has typically real consequences. The earlier, widely-held belief about the non-transferability of BSE ("mad-cow disease") to humans illustrate this. In conclusion, institutional analysis based on GGT suggests key mechanisms of societal order and stability and also appropriate design of regulatory institutions to operate effectively in dealing with security and social order issues. Security depends
Security depends on detecting and recognizing dangers or threats and responding to these appropriately and effectively. In general, GGT indicates socio-cognitive, value, judgmental, and organizational factors that play a role in limiting or undermining effective regulation, for instance:

1. A socio-cognitive model or frame is oriented to the wrong dimensions and misses or misinterprets, for instance, threat-signals; as a result, there is a failure to detect, understand, or respond to real dangers or hazards.
2. Similarly, institutionalised values may lead key agents to ignore or downplay particularly threatening hazards or dangers.
3. The capabilities needed to deal with particular dangers or hazards have not been developed, or are limited in scale, so that key regulatory systems are easily overloaded.
4. Cultural patterns of communication may be such that, for instance, agents conceal or distort information, contributing to failures of detection or of effective response to particular dangers or hazards.
5. Fragmentation among agents - due to ethnic, class, and gender reasons as well as other bases of social cleavage - contributes to the blockage of information exchange and knowledge sharing. Such fragmentation may also arise in the context of deficient social organization, such that those responsible for security or social order fail to effectively share information and knowledge relating to major hazards and dangers.
6. Institutional hierarchy tends to block significant information flow, for instance about performance failures (administrative structures are typically hierarchical in character and illustrate time and time again the problem of blocked or distorted information flow and predictable performance failings).

In sum, modern social systems are complex and require sophisticated and multidimensional theoretical frameworks and models with which to understand and regulate them. GGT is a promising candidate in this regard.

Acknowledgement. We are grateful to Nora Machado for contributing to the conceptualization and representation of Figure 29.1.
References

1. Anderson E. (2000). "Beyond Homo Economicus: New Developments in Theories of Social Norms." Philosophy and Public Affairs, 29, n. 2.
2. Burns T. R. (1990). "Models of Social and Market Exchange: Toward a Sociological Theory of Games and Human Interaction." In: C. Calhoun, M. W. Meyer, and W. R. Scott (eds), Structures of Power and Constraints: Essays in Honor of Peter M. Blau. New York: Cambridge University Press.
3. Burns T. R., Carson M. (2002). "Actors, Paradigms, and Institutional Dynamics: The Theory of Social Rule Systems Applied to Radical Reforms." In: R. Hollingsworth, K.H. Muller, E.J. Hollingsworth (eds), Advancing Socio-Economics: An Institutionalist Perspective. Oxford: Rowman and Littlefield.
4. Burns T.R., Flam H. (1987). The Shaping of Social Organization: Social Rule System Theory with Applications. London: Sage Publications.
5. Burns T. R., Gomolinska A. (1998). "Modeling Social Game Systems by Rule Complexes." In: L. Polkowski and A. Skowron (eds.), Rough Sets and Current Trends in Computing. Berlin/Heidelberg: Springer-Verlag, 581-584.
6. Burns T. R., Gomolinska A. (2001). "Socio-cognitive Mechanisms of Belief Change: Application of Generalized Game Theory to Belief Revision, Social Fabrication, and Self-fulfilling Prophesy." Cognitive Systems, Vol. 2(1), 39-54.
7. Burns T. R., Gomolinska A. (2000). "The Theory of Socially Embedded Games: The Mathematics of Social Relationships, Rule Complexes, and Action Modalities." Quality and Quantity: International Journal of Methodology, Vol. 34(4): 379-406.
8. Burns T.R., Gomolinska A., Meeker L. D. (2001). "The Theory of Socially Embedded Games: Applications and Extensions to Open and Closed Games." Quality and Quantity: International Journal of Methodology, Vol. 35(1): 1-32.
9. Burns T.R., Roszkowska E. (2005). "Conflict and Conflict Resolution: A Societal-Institutional Approach." In: M. Raith, Procedural Approaches to Conflict Resolution. Springer Press, Berlin/London. In press.
10. Burns T.R., Roszkowska E. (2004). "Fuzzy Games and Equilibria: The Perspective of the General Theory of Games on Nash and Normative Equilibria." In: Pal S.K., Polkowski L., Skowron A. (eds), Rough-Neural Computing: Techniques for Computing with Words. Springer-Verlag, 435-470.
11. Caldas J. C., Coelho H. (1999). "The Origin of Institutions: Socio-economic Processes, Choice, Norms, and Conventions." Journal of Artificial Societies and Social Simulation, Vol. 2, no. 2. http://www.soc.surrey.ac.uk/JASSS/2/2/1.html
12. Gomolinska A. (1999). "Rule Complexes for Representing Social Actors and Interactions." Studies in Logic, Grammar, and Rhetoric, Vol. 3(16): 95-108.
13. Hodgson G.M. (2002). "The Evolution of Institutions: An Agenda for Future Theoretical Research." Constitutional Political Economy, 13, pp. 111-127.
14. Kahan D.M. (2002). "The Logic of Reciprocity: Trust, Collective Action and Law." Yale Law and Economics Research Paper No. 201. Yale Law School, John M. Olin Center for Studies in Law, Economics, and Public Policy. New Haven, Conn.: Yale University.
15. Leijonhufvud A. (1993). "Toward a Not-Too-Rational Macroeconomics." Southern Economic Journal, Vol. 60:1, 1-13.
16. Roszkowska E., Burns T.R. (2002). "Fuzzy Judgment in Bargaining Games: Diverse Patterns of Price Determination and Transaction in Buyer-Seller Exchange." Paper presented at the First World Congress of Game Theory, Bilbao, Spain, 2000. Available at http://www.soc.uu.se/publications/fulltext/tb_market-pricing-game.doc.
17. Simon H. (1969). The Sciences of the Artificial. Cambridge: MIT Press.
18. Winch P. (1958). The Idea of a Social Science and Its Relation to Philosophy. London: Routledge & Kegan Paul.
19. Wittgenstein L. (1958). Remarks on the Foundations of Mathematics. Oxford: Blackwell.
30 Multi-Agent Decision Support System for Disaster Response and Evacuation

Alexander Smirnov, Michael Pashkin, Nikolai Chilov, Tatiana Levashova, and Andrew Krizhanovsky

St. Petersburg Institute for Informatics and Automation, The Russian Academy of Sciences, 39, 14th Line, St. Petersburg, 199178, Russia
{smir,michael,nick,oleg,aka}@mail.iias.spb.su
Summary. The paper describes an agent-based approach and its application to the intelligent support of disaster response and evacuation operations. The approach is based on the idea of knowledge logistics, which stands for the integration and transfer of the right knowledge from distributed sources to the right person within the right context at the right time to the right purpose. In the approach the problem is represented as configuring a network of knowledge sources, and the approach is called the "KSNet-approach". The paper concentrates on such aspects as the multi-agent architecture and the knowledge representation formalism, and presents an application of the approach via a case study.
30.1 Introduction

Disaster response and evacuation operations are very likely to be based on a number of different, quasi-volunteer, vaguely organized groups of people, non-government organizations, institutions providing humanitarian aid, and also army troops and official governmental initiatives. Here many participants will be ready to share information with some well-specified community [1]. Therefore, to manage such operations, efficient knowledge sharing between the multiple participating parties is required. This knowledge must be pertinent, clear, and correct, and it must be processed and delivered to the appropriate locations in a timely manner, so that it can provide for situation awareness. This is even more important when the operations involve coalitions uniting the resources of both government (military, security service, community service, etc.) and non-government organizations. As a result, systems aimed at the intelligent support of disaster response and evacuation operations have to meet a number of requirements, including (i) support of knowledge sharing, (ii) a distributed architecture for collaborative work, (iii) interoperability with other information systems, (iv) dynamic (on-the-fly) problem solving, (v) the ability to work with uncertain information, (vi) a constraint network notation for real-world problem description, and others.
Since successful operation management can be achieved through knowledge of the status and dynamics of the situation and its comprehension, it can be stated that the right knowledge from distributed sources has to be integrated and transferred to the right person within the right context at the right time to the right purpose. The aggregate of these interrelated activities is referred to as Knowledge Logistics [2].
30.2 Approach to Knowledge Logistics for Disaster Response and Evacuation Operations
Knowledge logistics (KL) takes place in a network-centric environment. Unlike hierarchical organizations with fixed commander-subordinate relationships, nodes of a network-centric environment are autonomous decision making units that can serve other units and also be served by them. With regard to computer systems the network-centric environment is based on advanced information technologies such as intelligent agents, ontology management, Web intelligence, Semantic Web and markup languages. Support of disaster response and evacuation operations in the network-centric environment requires rapid processing and analysis of a large body of up-to-date (preferably real-time) information from distributed and heterogeneous sources (experts, electronic documents, real-time sensors, weather forecasts, etc.). Hence, one of the key components of situational awareness is fusion of information from different sources. The most influential fusion model in the area of information fusion is the JDL Data Fusion Model [3]. It combines five levels of fusion: 0) sub-object data assessment, 1) object assessment, 2) situation assessment, 3) impact assessment, and 4) process refinement. The approach presented here combines KL and information fusion at level 2 (situation assessment) and is based on such advanced information technologies as intelligent agents, markup languages, ontology management, and others.
30.2.1 Multiagent Architecture
As an implementation of the approach the system called "KSNet" (acronym of Knowledge Source Network) has been developed. This system uses intelligent software agents to provide access to distributed heterogeneous knowledge sources. Multiagent systems offer an efficient way to understand, manage, and use distributed, large-scale, dynamic, open, and heterogeneous computing and information systems [4,5]. A multi-agent system architecture based on the Foundation for Intelligent Physical Agents (FIPA) Reference Model [6] was chosen as a technological basis for the system, since it provides standards for heterogeneous interacting agents and agent-based systems, and specifies ontologies and negotiation protocols. FIPA-based technological kernel agents used in the system are: wrapper (interaction with knowledge sources), facilitator ("yellow pages" directory service for the agents), mediator (task execution control), and user agent (interaction with users). The following problem-oriented agents specific for KL and scenarios of their collaboration have been developed: translation agent (terms translation between different vocabularies),
knowledge fusion (KF) agent (KF operation performance), configuration agent (efficient use of knowledge sources), ontology management agent (ontology operations performance), expert assistant agent (interaction with experts), and monitoring agent (verification of knowledge sources). A community of agents is represented in Fig. 30.1 according to the above-described principles and functions of the system "KSNet". A detailed description of the multi-agent architecture can be found in [7]. Below, two agents specific to disaster response and evacuation operations are described.
KB - knowledge base, KF - knowledge fusion
Fig. 30.1. Agent-based architecture for the system 'KSNet'
30.2.2 Monitoring Agent
Since disaster response and evacuation operations take place in a dynamic environment, continuous run-time monitoring and tracing of this environment is one of the key factors of success. This means that successful disaster response and evacuation operations can be achieved through the comprehension of the situation, knowledge of the status and dynamics of its elements and prediction of the states of the operation environment in the future. In the KSNet-approach a monitoring agent is provided for permanent checking of the knowledge sources in order to have up-to-date information about the current situation. It enables timely planning of activities in a network-centric environment since decision makers will have up-to-date information.
30.2.3 Knowledge Fusion Agent
Agents and services for disaster response and evacuation operation support have to be dynamic; in other words, on-the-fly business operation planning and management based on adaptive agents is required. Possibilities to add new agents and remove or modify existing ones based on the results of monitoring have to be provided. For
this purpose the described approach implements adaptive agents. These agents may modify themselves when solving a particular task. For example, within the KSNet-approach there is an agent attached to an application that is responsible for configuration problem solving (e.g., coalition configuration, routing, etc.) based on existing knowledge. The task is described by an ontology stored in an ontology library. Upon receiving a task the application loads an appropriate ontology and generates an executable module for its solving "on-the-fly". A novel "on-the-fly" compilation mechanism is proposed to solve such varying problems. In a rough outline this mechanism is based on the following concepts (Fig. 30.2):
- a pre-processed user request defines (1) which ontologies are to be used for the problem domain description, and (2) which knowledge sources are to be used;
- C++ code is generated on the basis of information extracted from (1) the user request (goal, goal objects, etc.), (2) appropriate ontologies (classes, attributes, and constraints), and (3) suitable knowledge sources;
- the compilation is performed in an environment of the C++ project prepared in advance;
- failed compilations/executions do not fail the system work as a whole; an appropriate error message is generated.
Fig. 30.2. The concept of the "on-the-fly" compilation mechanism
The essence of the proposed on-the-fly compilation mechanism is to write the ontology elements (classes, attributes, constraints) to a C++ file directly so that it can be compiled into an ILOG-powered program (as mentioned above, ILOG was chosen as the constraint solver in this project). The service responsible for the problem solving creates the C++ file based on these data and inserts the generated source code into the program (the Microsoft Visual Studio project prepared in advance). The program is compiled in order to create an executable file in the form of a dynamic-link library (DLL). After that the service calls a function from the DLL to solve the problem.
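To make the compile-and-call cycle concrete, a minimal sketch is given below. It is not the KSNet implementation: the file names, the exported function solveTask and the direct call to the Visual C++ compiler (cl.exe) are illustrative assumptions only; the real system compiles a prepared Visual Studio project filled with code generated from ontology elements and the user request.

// Sketch: generate a C++ source file, compile it into a DLL and call the
// exported solver function. Assumptions: Windows, cl.exe on the path,
// generated entry point "extern "C" int solveTask(double*, int)".
#include <windows.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>

typedef int (*SolveFn)(double* result, int size);

int main() {
    // 1. Write the generated code (in the described system this part would be
    //    produced from ontology classes, attributes, constraints and the request).
    std::ofstream src("generated_task.cpp");
    src << "extern \"C\" __declspec(dllexport) int solveTask(double* r, int n) {\n"
           "    for (int i = 0; i < n; ++i) r[i] = i * 2.0;  // placeholder for generated constraint code\n"
           "    return 0;\n"
           "}\n";
    src.close();

    // 2. Compile it into a DLL; a failed compilation only produces an error message.
    if (std::system("cl /nologo /LD generated_task.cpp /Fegenerated_task.dll") != 0) {
        std::fprintf(stderr, "compilation failed - report error, keep the system running\n");
        return 1;
    }

    // 3. Load the DLL and call the solver function.
    HMODULE dll = LoadLibraryA("generated_task.dll");
    if (!dll) return 1;
    SolveFn solve = (SolveFn)GetProcAddress(dll, "solveTask");
    if (solve) {
        double result[4];
        solve(result, 4);
        std::printf("first value: %f\n", result[0]);
    }
    FreeLibrary(dll);
    return 0;
}

The same pattern (generate source, compile to a DLL, load it and call the exported entry point) is what allows varying problems to be solved without restarting the system.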
The experiments showed that for complex tasks the compilation time is significantly less than the time of task solving by the generated program.
30.2.4 Ontologies and Knowledge Imprecision
The main principles considered during development of the presented approach and the system "KSNet" originate from characteristics of modern e-business applications. These applications widely use ontologies as a common language for business process / enterprise modelling [8-9]. Since agents must represent their knowledge via ontologies [5], the FIPA ontology definition was used for the ontology description. Thus, the described approach is focused on utilizing reusable knowledge through ontological descriptions [10], with the object-oriented constraint network paradigm being considered as a common knowledge representation notation, which correlates with the semantic metadata representation concept of the Semantic Web project. As a general model of ontology representation in the system "KSNet", an object-oriented constraint network paradigm was proposed [2]. This model defines the common ontology notation used in the system. According to this representation an ontology (A) is defined as: A = (O, Q, D, C), where: O - a set of object classes ("classes"), each of the entities in a class is considered as an instance of the class; Q - a set of class attributes ("attributes"); D - a set of attribute domains ("domains"); and C - a set of constraints. However, when dealing with knowledge, uncertainties may arise due to the following reasons: (i) a lack of information, (ii) invalidity of information, (iii) subjectivity, (iv) a lack of knowledge about a problem, (v) unverbalizability of the problem, (vi) imprecision of the problem solving methods. Taking uncertainties into account is especially important for such operations as disaster response that have to be fast and take place under extreme conditions (e.g., damaged communications). To process the uncertain knowledge the formalism of fuzzy object-oriented constraint networks described as (O, Q, D, C^, W, T, Ip) has been chosen, where C^ - a set of constraints, and each constraint contains a membership function μ with values in [0, 1] associated with a weight ωc representing its weight (importance) or priority; W - a weighting scheme, i.e. a function combining the satisfaction degree of a constraint μ(c) with ωc for estimation of the weighted satisfaction degree μw(c); T - an aggregation function, which performs simple partial regulating on the defined values, defining C^; and Ip - the information content (instances of classes) of the constraint network, which has a probabilistic nature. Constraints of attributes belonging to classes, compatibility structural constraints, hierarchical structural constraints and "one-level" structural constraints are hard constraints. All of them have to be satisfied in the found solution, i.e. for each of them ωc = 1. Some functional constraints and constraints of domains belonging to attributes can be considered as soft constraints. Within the KSNet-approach the following types of uncertainties have been selected: (i) variable contents and structures of knowledge sources, (ii) uncertainty presented in knowledge sources, (iii) low assurance of experts in their knowledge, (iv) complexity of an application domain formalization, (v) terminological conflicts
during translation of knowledge from one ontology to another, (vi) complexity of user request recognition, and (vii) incompatibility of knowledge stored in different sources. This list does not pretend to cover all possible types of uncertainties.
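A compact data-structure sketch of the (O, Q, D, C^, W, T, Ip) notation is shown below. It is only an illustration under simplifying assumptions: all type and constraint names are invented, the weighting scheme W is taken as the product of the satisfaction degree and the constraint weight, and the aggregation T as the minimum over all constraints.

// Sketch of a fuzzy object-oriented constraint network (O, Q, D, C^, W, T, Ip).
#include <algorithm>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct Attribute { std::string name; double lo, hi; };                  // element of Q with its domain from D
struct ObjectClass { std::string name; std::vector<Attribute> attrs; }; // element of O

struct FuzzyConstraint {                                                // element of C^
    std::string name;
    double weight;                                                      // omega_c in [0,1]; 1 = hard constraint
    std::function<double(const std::map<std::string, double>&)> mu;     // satisfaction degree in [0,1]
};

// W: weighted satisfaction degree of a single constraint
double weighted(const FuzzyConstraint& c, const std::map<std::string, double>& inst) {
    return c.weight * c.mu(inst);
}

// T: aggregation over all constraints (minimum chosen here as one possible option)
double aggregate(const std::vector<FuzzyConstraint>& C,
                 const std::map<std::string, double>& inst) {           // inst plays the role of Ip
    double t = 1.0;
    for (const auto& c : C) t = std::min(t, weighted(c, inst));
    return t;
}

int main() {
    ObjectClass supplier{"Supplier", {{"volume", 0.0, 200.0}, {"time", 0.0, 72.0}}};  // O, Q, D
    std::vector<FuzzyConstraint> C = {
        // hard constraint (weight 1): delivered volume must not exceed 100
        {"capacity", 1.0, [](const std::map<std::string, double>& v) {
            return v.at("volume") <= 100.0 ? 1.0 : 0.0; }},
        // soft constraint (weight 0.6): prefer delivery time below 24 h
        {"delivery", 0.6, [](const std::map<std::string, double>& v) {
            double t = v.at("time"); return t <= 24.0 ? 1.0 : 24.0 / t; }}
    };
    std::map<std::string, double> instance = {{"volume", 80.0}, {"time", 30.0}};
    return aggregate(C, instance) > 0.0 ? 0 : 1;  // non-zero degree: instance acceptable
}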
30.3 Case Study
The case study considers a fictitious Binni region [11]. The aim of the Binni scenario is to provide a rich environment, focusing on new aspects of coalition problems and new technologies demonstrating the ability of distributed systems for intelligent support to supply services in an increasingly dynamic environment. For the task of mobile hospital creation presented here the following knowledge sources can be considered: supplies-related information (required quantities of materials, required times of delivery), available suppliers (constraints on suppliers' capabilities, capacities, locations), available providers of transportation services (constraints on available types, routes, and time of delivery), and geography and weather of the Binni region (constraints on types, routes, and time of delivery, e.g. by air, by trucks, by off-road vehicles). The problem of automatic knowledge discovery is a subject of future research. For the case study the list of knowledge sources containing information for the user request processing was defined by an expert team. This list of sources is not fixed. The scalable architecture of the system KSNet allows seamless attaching of new sources in order to get new features and to take into account more factors for the tasks solved. The general problem considered in this case study for the system "KSNet" has the following formulation: "Define suppliers, transportation routes and schedules for building a hospital of given capacity at given location by given time". The following required information was defined:
- hospital-related information (constraints on its structure, required quantities of components, required delivery schedules);
- available suppliers (constraints on suppliers' capabilities, capacities, locations);
- available providers of transportation services (constraints on available types, routes, and time of delivery);
- geography and weather of the Binni region (constraints on types, routes, and time of delivery, e.g. by air, by trucks, by off-road vehicles).
30.3.1 Application and Task Ontologies for Hospital Configuration
As a result of the analysis of the problem the following application ontology describing the problem was built. A number of existing ontologies corresponding to the described problem were found in Internet ontology libraries [12-17]. These ontologies represent a hospital in different manners using different representation formats. Firstly, the ontologies were imported from the source formats into the system notation (e.g., ontology parts corresponding to the hospital representation of the North American Industry Classification System (NAICS) code [15] and United
Nations Standard Products and Services Code (UNSPSC) [16] were imported from DAML+OIL source format). After that, they were included into the ontology library, henceforth they can be reused for the solution of similar problems. Next, ontology parts relevant to the request were combined into a single ontology (Fig.30.3). Fig.30.3 has the following notation. Reused ontology classes (the classes adopted from the Internet's ontology libraries) are shown by firm lines, reused classes that were renamed are shown by dotted lines, new ontology classes (the classes included by experts) are outlined by thick lines, firm unidirectional arrows represent hierarchical relationships "is-a", dotted unidirectional arrows represent hierarchical relationships "part-of", double-headed arrows show associative relationships. Ontology part corresponding to AO included into the case study is represented by the shaded area. The built application ontology was expanded with regard to extension of the class
"Hospital configuration". This class represents a complex task that was split into subtasks. As a result a task ontology of hospital configuration was built (Fig. 30.4).
Fig. 30.3. "Mobile hospital" application ontology
(Fig. 30.4 shows the subtask nodes of the "Hospital configuration" task: Components definition, Staff definition, BOM definition (BOM - Bill of Materials), Hospital allocation, Logistics, Resource allocation, Routing problem, Route availability.)
Fig. 30.4. Task ontology "Hospital configuration"
Within the complex task the following subtasks were defined:
Hospital allocation. This subproblem is devoted to finding the most appropriate location for a hospital to be built considering such factors as locations of the disaster, water resources, nearby cities and towns, communications facilities (e.g., locations of airports, roads, etc.) and decision maker's choice and priorities.
Logistics. This subproblem is devoted to finding the most efficient ways of delivery of the hospital's components from available suppliers considering such factors as communications facilities (e.g., locations of airports, roads, etc.), their conditions (e.g., good, damaged or destroyed roads), weather conditions (e.g., rains, storms, etc.) and decision maker's choice and priorities.
Components Definition. This subproblem is devoted to finding the most efficient components for the hospital considering such factors as component suppliers, their capacities, prices, transportation time and costs and decision maker's choice and priorities.
The subtask of "Staff definition" is shown as greyed since it was out of the scope of the project.
30.3.2 Example Solutions of the Problem
In order to provide up-to-date routing plans the system monitors the current situation in the region. For this purpose an emulated news Web site has been implemented that contains information about weather and events in the considered region. A specially designed wrapper reads news and finds which cities/areas are not currently available for transportation. Besides, it reads weather conditions and accordingly corrects transportation time and costs for appropriate routes. The presented example illustrates finding a routing plan for the same conditions but with different user preferences, namely: minimize time; minimize time, then costs; minimize both time and costs; minimize costs, then time; minimize costs. In Fig. 30.5 - Fig. 30.7 the results for the different choices are presented and compared.
Fig. 30.5. Routing plan for the minimize time preference (in this solution four vehicles/vehicle groups are used to provide maximum of concurrency).
For illustration of the results a map is generated that uses the following notations. Dots are the cities of the region. The city of Aida is the city where the hospital is to be located. The bigger cities are the cities where suppliers are located (Libar, Higgville, Ugwulu, Langford, Nedalla, Laki, Dado). Transportation routes are shown as lines. The colored trucks denote the routes of particular vehicles/vehicle groups.
Fig. 30.6. Routing plan for the minimize both time and costs and minimize costs, then time preferences (in this solution three vehicles/vehicle groups are used).
Fig. 30.7. Routing plan for the minimize costs preference (in this solution one vehicle/vehicle group is used to provide minimum of costs).
Fig. 30.8. Routing plans for different criteria (time and costs minimization preferences).
Fig. 30.8 represents a comparison of the routing plans created for the different criteria. As can be seen, while the importance of one of the parameters increases (e.g., the importance of costs increases from left to right), the value of that parameter decreases (the red line with diamonds for the costs) and vice versa (the green line with squares for the time).
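The five preference settings can be read as different orderings over (time, cost) pairs. The sketch below only illustrates this reading: the plan names, times and costs are invented, "minimize time, then costs" is encoded as a lexicographic comparison and "minimize both time and costs" as an equal-weight sum, which is an assumption rather than the system's actual objective function.

// Sketch: comparing candidate routing plans under different user preferences.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Plan { const char* name; double time; double cost; };

int main() {
    std::vector<Plan> plans = {            // invented example values
        {"four vehicle groups",  1900.0, 2450.0},
        {"three vehicle groups", 2050.0, 2200.0},
        {"one vehicle group",    2400.0, 1950.0}
    };

    // minimize time, then costs (lexicographic order)
    auto timeThenCost = *std::min_element(plans.begin(), plans.end(),
        [](const Plan& a, const Plan& b) {
            return a.time != b.time ? a.time < b.time : a.cost < b.cost; });

    // minimize both time and costs (equal-weight sum, an assumption)
    auto both = *std::min_element(plans.begin(), plans.end(),
        [](const Plan& a, const Plan& b) {
            return a.time + a.cost < b.time + b.cost; });

    std::printf("time-first choice: %s\n", timeThenCost.name);
    std::printf("balanced choice:   %s\n", both.name);
    return 0;
}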
30.4 Conclusions
Within the presented approach knowledge logistics is coupled with information fusion based on constraint satisfaction techniques. Utilizing ontologies and the compatibility of the employed ontology notation with modern standards enables semantic interoperability with other knowledge-based systems and services and facilitates knowledge sharing. Application of constraint networks allows rapid problem manipulation by adding/changing/removing its components (objects, constraints, etc.) and usage of such existing efficient technologies as ILOG. The agent-based architecture increases scalability, efficiency and interoperability of the system "KSNet". Applicability of the approach is illustrated via a case study of on-the-fly portable hospital configuration as a problem of health service logistics.
Acknowledgements
Some parts of the research were done as parts of the ISTC partner project # 1993P funded by the Air Force Research Laboratory at Rome, NY, the project # 16.2.44 of the research program "Mathematical Modelling and Intelligent Systems", project # 1.9 of the research program "Fundamental Basics of Information Technologies and Computer Systems" of the Russian Academy of Sciences, and grant # 02-01-00284 of the Russian Foundation for Basic Research. Some prototypes were developed using software granted by ILOG Inc.
References
1. Pechoucek, M., Marik, V., Barta, J.: CplanT: an acquaintance model based coalition formation multi-agent system. In: Proc. of the Second Int. Workshop of Central and Eastern Europe on Multi-Agent Systems (CEEMAS'2001) (2001), pp. 209-216.
2. Smirnov, A., Pashkin, M., Chilov, N., Levashova, T., Haritatos, F.: Knowledge source network configuration approach to knowledge logistics. Int. J. of General Systems, Vol. 32, No. 3 (2003) pp. 251-269.
3. Salerno, J., Hinman, M., Boulware, D., Bello, P.: Information fusion for situational awareness. In: Proc. of the 6th Int. Conf. on Information Fusion (2003) pp. 507-513.
4. Payne, T., Singh, R., Sycara, K.: Communicating Agents in Open Multi-Agent Systems. In: First GSFC/JPL Workshop on Radical Agent Concepts (WRAC) (2002) pp. 365-371.
5. Weiss, G. (ed.): Multiagent Systems: a Modern Approach to Distributed Artificial Intelligence. The MIT Press, Cambridge MA, USA (2000).
6. Foundation for Intelligent Physical Agents (FIPA), 2004. http://www.fipa.org.
7. Smirnov, A., Pashkin, M., Chilov, N., Levashova, T.: Agent-based support of mass customization for corporate knowledge management. Eng. Applications of Artificial Intelligence, Vol. 16, No. 4 (2003) pp. 349-364.
8. Goossenaerts, J., Pelletier, C.: Enterprise Ontologies and Knowledge Management. In: Proc. of the 1st Int. Conf. on Concurrent Enterprising "Engineering the Knowledge Economy through Co-operation" (2001) pp. 281-285.
9. O'Leary, D. E.: Different Firms, Different Ontologies, and No One Best Ontology. IEEE Intelligent Systems, September/October (2000) pp. 72-78.
10. Guarino, N., Welty, C.: Towards a Methodology for Ontology-based Model Engineering. In: Bezivin, J., Ernst, J. (eds.) Proc. of the ECOOP-2000 Workshop on Model Engineering (2000).
11. Rathmell, R. A.: A Coalition Force Scenario "Binni - Gateway to the Golden Bowl of Africa." In: A. Tate (ed.) Proc. of the Int. Workshop on Knowledge-Based Planning for Coalition Forces (1999) pp. 115-125.
12. Clin-Act (Clinical Activity). The ON9.3 Library of Ontologies: Ontology Group of IP-CNR (a part of the Institute of Psychology of the Italian National Research Council (CNR)) (2000). URL: http://saussure.irnikant.rm.cnr.it/onto/.
13. Hpkb-Upper-Level-Kernel-Latest: Upper Cyc / HPKB IKB Ontology with links to SENSUS, Version 1.4. Ontolingua Ontology Server (1998). URL: http://www-ksl-svc.stanford.edu:5915.
14. Weather Theory. Loom ontology browser. Information Sciences Institute, The University of Southern California (1997). URL: http://sevak.isi.edu:4676/loom/shuttle.html.
15. North American Industry Classification System code. DAML Ontology Library, Stanford University (2001). URL: http://opencyc.sourceforge.net/daml/naics.daml.
16. The UNSPSC Code (Universal Standard Products and Services Classification Code). DAML Ontology Library, Stanford University (2001). URL: http://www.ksl.stanford.edu/projects/DAML/UNSPSC.daml.
17. WebOnto: Knowledge Media Institute (KMI). The Open University, UK (2002). URL: http://eldora.open.ac.uk:3000/webonto.
31 Intelligent System for Environmental Noise Monitoring
Andrzej Czyzewski (1), Bozena Kostek (1,2), and Henryk Skarzynski (2)
(1) Gdansk University of Technology, ul. Narutowicza 11/12, PL-80-952 Gdansk, Poland, [email protected], http://www.multimed.org/
(2) Institute of Physiology and Pathology of Hearing, ul. Pstrowskiego 1, 01-952 Warsaw, [email protected]
Summary. The telemonitoring system developed at the Multimedia Systems Department of the Gdansk University of Technology, aimed at monitoring environmental noise levels, is discussed. A system presentation is provided, consisting of descriptions of the following elements: noise measurement units, computer noise measuring software, an Internet multimedia noise monitoring service and soft computing algorithms applied to the analysis of the system database content. The results of noise measurements were compared with acquired subjective opinions on noise annoyance. A new GIS layer was produced on the basis of this study employing data produced with soft computing algorithms. The engineered intelligent application may help in diminishing the occurrence of hearing problems and other diseases caused by environmental & industrial noise.
31.1 Introduction
A considerable portion of hearing and psychosomatic diseases is caused by excessive industry, urban and traffic noise or any unwanted sounds occurring in everyday life. Consequently, it is expected that a reduction of their occurrence will be achieved as a result of implementation of the solutions that have been developed within the project scope. The latest technological advances in information technology were used in the course of the project realization [1][2]. Consequently, it is shown in the paper that the presented solutions are based on some innovative ideas and inexpensive technical means for measuring noise and assessing its annoyance [6][7][8]. The intelligent processing of the resulting data allows fast evaluation of noise influence on humans. It is expected that implementation of the noise telemonitoring system covering the whole country will contribute to raising awareness of society and authorities with regard to the influence of noise on health. Furthermore, it turns out to be an essential factor in the future improvement of the environmental noise conditions.
31.2 System design
The recently developed multimedia noise monitoring system is addressed to all users interested in problems related to noise. It offers not only objective noise measuring methods, but also electronic questionnaires for a subjective opinion survey. The noise measuring system consists of the following functional elements: a USB device with a measuring microphone which is used for signal acquisition (the device can be connected to any PC computer equipped with a USB interface) and software for calculating noise parameters (Fig. 31.1) according to valid norms [3].
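As an orientation for what such software computes, the equivalent continuous level can be obtained by energy-averaging short-term A-weighted levels. The sketch below uses only this standard relation; the sample values are invented and this is not the project's measurement code.

// Sketch: energy-averaging of short-term A-weighted levels L_i [dB] into an
// equivalent continuous level Leq = 10*log10((1/N) * sum(10^(L_i/10))).
#include <cmath>
#include <cstdio>
#include <vector>

double equivalentLevel(const std::vector<double>& levelsDb) {
    double energy = 0.0;
    for (double L : levelsDb) energy += std::pow(10.0, L / 10.0);
    return 10.0 * std::log10(energy / levelsDb.size());
}

int main() {
    std::vector<double> levels = {62.0, 65.5, 70.2, 58.4, 73.1};  // made-up 1-second levels, dB(A)
    std::printf("Leq = %.1f dB(A)\n", equivalentLevel(levels));
    return 0;
}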
Fig. 31.1. Sample window of the noise measurement program.
The application cooperates with the USB device and another system component, which is an Internet service with a specially dedicated database open to exploring with data mining algorithms. For the needs of remote and continuous noise measurements, another device based on a specially designed microcomputer was developed. The device enables continuous measuring of noise and makes possible a wireless transmission of results to the database of the system. Consequently, the system was equipped with a modem for data transmission by the GPRS protocol. The elements of the device are presented in Fig. 31.2. The implemented calibration method allows the usage of the developed sound interface and the measuring microphone. When the measurements are over, the program can send results through the Internet to the central server for their common storage, processing and analysis. Noise measurement in the Multimedia Noise Monitoring System uses the client's PC system or a dedicated embedded miniature PC computer system for automatic measurements. The prototype device developed to achieve these goals offered extended functionality compared to the requirements of the MNMS (Multimedia Noise Monitoring System) system [2]. The extensions resulted from the idea that the same device
could also be used in telemedical applications developed earlier by the staff of the Multimedia Systems Department in close co-operation with the Warsaw-based Institute of Physiology and Pathology of Hearing [4,5].
Fig. 31.2. Components of the environmental noise monitoring system: a) compact computer; b) GPRS (General Packet Radio Service) sender; c) GPS (Global Positioning System) receiver
It was assumed that every measurement would be triggered by the PC host and would consist of two phases. During the first phase the system records a noise sample which is used for further calculations. The acquisition of geographical location data takes place afterwards. After the successful measurement completion the device is put into suspension mode in order to reduce power consumption and prolong its effective use outdoors. The proposed device acts as a meter allowing for a simultaneous measurement of noise and geographical localization data. This can ease and speed up the generation of noise maps and allows quick discerning of places where the noise level is dangerously high. The device itself, along with the accompanying software, should be an interesting alternative to noise level meters currently available on the market. The scheme of the general MSMH system characteristic is seen in Fig. 31.3. The system architecture allows for measurement data acquisition using two methods. The first one is based on the application of a specially prepared network communication protocol. It is also used to control automatic measurement stations. The second method involves direct data entry into the database using the SQL text protocol. The database is accessible through a Web service. A dedicated server has been set up in order to support the monitoring devices. TCP/IP communication support has been implemented, compliant with the prepared protocol. The server supporting the TCP/IP protocol and the communication with the SQL database has been equipped with additional abstraction classes.
Fig. 31.3. The general MSMH system lay-out
A common C++ class lets clients be independent of the operating system on which the server runs. The Gnome Database Access library allows significant flexibility in the server's co-operation with various SQL databases. Such an approach resulted from the need for easier modification and upgrade of the system in the future. The second key server component is the Web service responsible for the presentation and processing of data collected by the engineered multimedia noise monitoring system. PHP (PHP Hypertext Pre-processor) has been selected as the principal programming language. The service modules responsible for database operations and chart generation have been underscored. This approach, separating the presentation part and the data storage part (operated by the use of static templates of Web pages) from the service logic, enables simple modification of the content and behavior of the service. The Web service of the Multimedia Noise Monitoring System consists of a number of modules and sub-modules, whose mutual relations are presented in Fig. 31.4. The main part of the Web service consists of three basic modules: Administrative Panel, System User Zone and Operational Module.
31.3 Presenting Acquired Data
One of the main functions of the Multimedia Noise Monitoring System is the presentation of data in a form comprehensible for the user. The function offers two methods of data visualization: traditional charts and noise maps.
Fig. 31.4. Web Service structure
31.3.1 Noise map modules The engineered service has been designed for displaying simple noise maps. The main feature of a noise map is the visualization of sound intensity in a specifically defined area. In various GIS (Geographical Information System) systems noise maps are one of many information layers presented to the user. In case of the noise maps module of the system the multi-layer rule has also been maintained, but it is simplified in order to allow its display in a Web browser. The presentation of the system's noise map consists of a series of overlapping raster images. An example of a noise map for Gdansk University of Technology is presented in Fig. 31.5. When the map is displayed in a Web browser, specific images are adjusted to the zoom and offset coefficient entered by the user. In a hierarchy of a series of images a noise map layer is presented as the lowest positioned layer. Raster image presenting the sound intensity information in a given area is read from the Web server, just like the remaining images. In majority of cases
it is a final result of calculations and simulations in a given application supporting noise map creation.
Fig. 31.5. Sample acoustic map - noise at the area of Gdansk University of Technology
31.3.2 Measurement Results Data Visualization
The user must specify which region the searched measurement point is located in. For the selected region a list of cities with measurement devices is displayed. After selecting the city the user will see the map of the selected area with marked measurement points. The final selection concerns a specific measurement point. It can be selected by clicking a box on the map or by selecting a measurement point from the list. For each measurement point one can specify a time range, for which specific parameters will be presented. An example of a selected measurement point can be observed in Fig. 31.6. After selecting a measurement point and specifying a required time range one can display the results in graphic or table form. The measurement card for a given point contains a table including available noise parameters and a chart presenting the results in a graphic form. Fig. 31.7 presents an example of a page containing the results of measurements. By clicking a selected parameter in the table one can add or remove it from the chart. To simplify the process of viewing the results for other points, appropriate links have been added. Therefore one can select another measurement point for the same city or specify a new location.
31.3.3 Visualizing Survey Results
The Web service offers access to the survey to every interested user. The survey enables users to express their own, subjective opinion about the acoustic climate in the place of residence. Subjective research is a perfect addition to objective measurements, as it allows collecting information about noise annoyance directly from the inhabitants of an area. Survey results are automatically processed by the system. A number of results' presentation methods have been prepared. They may be charted on the map of regions of the country, for a given city in the form of circle charts or in the form of collective circle charts for the whole country. The user may select an appropriate presentation method.
Fig. 31.6. Selecting the location at which the measurements are made.
31.4 Analysis of database content
The content of the database can be analyzed in various ways. One possibility is assessing the subjective impression of loudness of environmental noise by people living or working in areas endangered by excessive noise levels. This is a way of assessing hearing acuity together with overall sensitivity to sound, which may also be determined by some psychological causes (tiredness, nervousness or, conversely, habituation effects). The second case assumes that the overall response to noise, revealing noise annoyance and possibly the risk of hearing diseases, may also be determined on the basis of analyzing data gathered from subjective opinions.
Fig. 31.7. Web page displaying results of acoustic measurements.
The hearing sensitivity is understood in this context as sensitivity to perceived sound (employing the hearing sense and psychological determinants). The sensitivity decrease could be objective (hearing loss) or subjective (habituation effects). This kind of study seems very interesting for hearing pathologists, psychologists and environmental engineers. The analysis can be done employing statistical or data mining tools. We decided to perform database querying with two data mining approaches:
1. based on assessing fuzzy rules and perception-based data processing principles introduced by Zadeh [9]. The difference d between the typical (regular) noise loudness impression and the impression reported by system users (expressed in dB) is the subject of our interest, because it can reflect the decrease of generally understood hearing sensitivity. It is assumed that the users express their subjective impressions in natural language (from NONE to ULTRA LOUD), thus fuzzy logic provides a suitable tool for the computing in our case [4];
2. based on collecting subjective assessment results in a decision table and applying the rough set data analysis method. This approach was successfully tried earlier with regard to subjective opinion processing related to the "computing with words" concept [10].
31.4.1 Case 1: Perception-based data processing
In the first discussed case there are two premises. One is associated with the information on regular loudness scaling; in further considerations it is represented by
the Norm variable. The other premise is associated with the investigated results of noisy-area inhabitants' subjective loudness impression; it is represented by the Sub variable. In order to differentiate the fuzzy sets associated with individual premises, labels of fuzzy sets associated with the first premise use lower case letters while those of fuzzy sets associated with the second premise employ upper case letters. On the basis of available information one can design a rule base according to the following guidelines:
- Premises pointing to consistency of loudness sensation evaluation for regular loudness scaling and for the investigated loudness scaling generate a decision stating no scaling differences, marked with the label "none", e.g.:
If Norm is soft AND Sub is SOFT THEN d is none
Zero (none) difference is a special case of difference. The experimental results lead to a conclusion that the output of the described fuzzy system can be described by a set of thirteen membership functions (Fig. 31.8) expressing the difference between the loudness sensation evaluation in noisy conditions and the evaluation for regular hearing. Fuzzy sets obtained in this fashion can be described with the following labels (describing the difference size): the MF in the middle of Fig. 31.8 corresponds to the label none, and the remaining labels are: very small, very small+, small, small+, medium, medium+, large, large+, very large, very large+, total, total+. Labels marked with the "+" sign denote a positive difference (hypersensitivity). From the mid MF to the left the assigned labels refer to the negative difference.
Fig. 31.8. Output membership functions.
If the given result of loudness scaling differs by one category of loudness sensation evaluation, the decision is associated with the output labeled "small" in the case of a negative difference or "small+" for a positive difference, e.g.:
If Norm is very soft AND Sub is NONE THEN d is small
IF Norm is loud AND Sub is ULTRA LOUD THEN d is small+
It was found that differences by two or more categories are less frequent; however, they are also supported by adequate decision rules, e.g.:
IF Norm is soft AND Sub is LOUD THEN d is medium+
IF Norm is ultra loud AND Sub is SOFT THEN d is large
IF Norm is medium AND Sub is ULTRA LOUD THEN d is large+
The fuzzy rule processing followed by defuzzification allows for calculating crisp values of d for various points on the acoustical map of the investigated region. First the definition of membership functions for the input variables is needed. An example of a set of membership functions for the frequency band of 500 Hz obtained by approximating the factual values of membership functions with triangles is illustrated in Fig. 31.9. In practice such functions should be determined for various frequency sub-bands, typically with center frequencies of 500 Hz, 1 kHz, 2 kHz and 4 kHz. The approximation of fuzzy set boundaries is done algorithmically on the plane containing scattered data. The algorithm used by the authors to determine the triangular membership functions involves the following steps:
Fig. 31.9. Approximation of fuzzy set boundaries. Dots represent (processed) answers of noise monitoring system users as to their subjective noise loudness impression (expressed in terms of degree of membership to individual loudness categories). The X-axis represents measured noise levels. Labels of membership functions are: NONE, VERY SOFT, SOFT, MEDIUM, LOUD, VERY LOUD, ULTRA LOUD.
- Finding the value of the first element belonging to the given fuzzy set (the value of the first argument for which the factual membership function takes a non-zero value).
- For determining the first arm of the triangle one considers all the elements of the factual membership function MF fulfilling the equation:
x_i : MF(x_i) - MF(x_{i-1}) > 0    (31.1)
where i - indices of arguments of the membership function MF fulfilling condition (31.1).
- Calculating parameters a1 and b1 of the straight line y = a1x + b1.
- For determining the second arm of the triangle one considers all the elements of the factual membership function MF fulfilling the equation:
x_i : MF(x_i) - MF(x_{i-1}) <= 0    (31.2)
where i - indices of arguments of the membership function MF fulfilling condition (31.2).
- Calculating parameters a2 and b2 of the straight line y = a2x + b2.
- Calculating the point of intersection of the straight lines y = a1x + b1 and y = a2x + b2 (determining the triangle vertex).
- Calculating the zeros of both lines.
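A small sketch of these steps is given below. It assumes the factual membership values are available on an increasing grid of noise levels and, as one possible choice, fits each arm by least squares; the sample data and names are illustrative only.

// Sketch: fitting a triangular membership function to factual values MF(x_i).
// Rising differences form the first arm, non-rising differences the second;
// each arm is approximated by a least-squares line, the vertex is their
// intersection and the triangle feet are the zeros of both lines.
#include <cstdio>
#include <vector>

struct Line { double a, b; };  // y = a*x + b

Line fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    int n = static_cast<int>(x.size());
    for (int i = 0; i < n; ++i) { sx += x[i]; sy += y[i]; sxx += x[i]*x[i]; sxy += x[i]*y[i]; }
    double a = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    return { a, (sy - a * sx) / n };
}

int main() {
    // invented noise levels [dB] and averaged membership degrees for one loudness category
    std::vector<double> x  = {60, 65, 70, 75, 80, 85, 90};
    std::vector<double> mf = {0.0, 0.2, 0.6, 1.0, 0.7, 0.3, 0.0};

    std::vector<double> xr, yr, xf, yf;                 // rising / falling parts
    for (size_t i = 1; i < x.size(); ++i) {
        if (mf[i] - mf[i-1] > 0) { xr.push_back(x[i]); yr.push_back(mf[i]); }
        else                     { xf.push_back(x[i]); yf.push_back(mf[i]); }
    }
    Line l1 = fitLine(xr, yr), l2 = fitLine(xf, yf);

    double vx = (l2.b - l1.b) / (l1.a - l2.a);          // vertex (intersection of both arms)
    double left = -l1.b / l1.a, right = -l2.b / l2.a;   // zeros of both lines
    std::printf("triangle: left %.1f, vertex %.1f, right %.1f dB\n", left, vx, right);
    return 0;
}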
As in this case individual elements may belong to more than two fuzzy sets, further fuzzy logic-based processing is more complicated [4]. A side effect is that membership functions which share a part of their domain with domains of other membership functions (intersection with more than two other fuzzy sets) may not have the maximum value equal to 1. It turns out that such a situation is only possible when determining membership functions on the basis of averaged results of loudness scaling, as only then each fuzzy set "neighbors" (intersects) at most two other fuzzy sets and there are elements for which the average value of loudness scaling results points directly to a given category of loudness sensation evaluation. Usually the situation when the whole population of regular-hearing persons would evaluate the hearing sensation of a given test signal level exactly the same does not happen. As in fuzzy processing using functions that reach the maximum membership value of 1 is recommended, in the discussed case one needs to normalize each membership function.
31.4.2 Case 2: Rough-set based data analysis
Since our system measures noise levels and simultaneously allows people to express their subjective opinions about the noise harmfulness (using electronic questionnaires), it is possible to investigate the relation between noise occurrence and annoyance of people exposed to noise. Meanwhile, the noise annoyance can also be defined objectively, on the basis of measured data. A new noise annoyance indicator is defined according to the ISO 1996-2 norm. The day-evening-night level Lden, in decibels (dB), is defined by the following formula:
Lden = 10 * log10 [ (1/24) * ( 12 * 10^(Lday/10) + 4 * 10^((Levening+5)/10) + 8 * 10^((Lnight+10)/10) ) ]    (31.3)
in which:
- Lday is the A-weighted long-term average sound level as defined in ISO 1996-2:1987, determined over all the day periods of a year;
- Levening is the A-weighted long-term average sound level as defined in ISO 1996-2:1987, determined over all the evening periods of a year;
- Lnight is the A-weighted long-term average sound level as defined in ISO 1996-2:1987, determined over all the night periods of a year.
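A direct transcription of formula (31.3) into code, with the 5 dB evening and 10 dB night penalties weighted by 12, 4 and 8 hours, might look as follows (the input levels are arbitrary examples).

// Sketch: day-evening-night level Lden according to formula (31.3).
#include <cmath>
#include <cstdio>

double lden(double lday, double levening, double lnight) {
    double e = 12.0 * std::pow(10.0, lday / 10.0)
             +  4.0 * std::pow(10.0, (levening + 5.0) / 10.0)
             +  8.0 * std::pow(10.0, (lnight + 10.0) / 10.0);
    return 10.0 * std::log10(e / 24.0);
}

int main() {
    // arbitrary example levels in dB(A)
    std::printf("Lden = %.1f dB\n", lden(60.0, 57.0, 50.0));
    return 0;
}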
Adequate norms define permissible noise levels as shown in Tab. 31.1, presenting demands for quiet areas. The assessment of the annoyance provided by the proposed system is based on both the measurement procedures resulting in the Lden value and a collection of information related to the respondent's subjective evaluation. The rough set data analysis serves in the engineered system as an expert procedure allowing for the correlation between the objective measure and the subjective evaluation. Therefore the main objective is to compare assessments of exposure to noise done by a human with the objective data coming from the measuring module of the system. The respondent's answers are obtained from the electronic questionnaires. The corresponding data are collected in Pawlak's decision table along with the Lden value (Tab. 31.2). The attributes A1, ..., Am are related to parameters such as: respondent's age, perception of loudness, noise occurrence and noise type, vulnerability to distraction by noise, to noise interference on communication and work performance, to anxiety & fear, and finally to subjective perception of stress. Among the attributes contained in Tab. 31.2 one can see those related to the categories of loudness perception. This is a way to find the correlation between subjective perceptions and objectively measured values based on rough set data processing. The data may have a descriptive character (string values) or can be expressed by numerical ranges. The following decision attribute set is valid for descriptive values: {none, low, medium, high, very high, ultra high}.
Table 31.1. Normative values of noise indicators.
Area | Equivalent level, Lden | Maximum level, LAFmax
Residential areas and noise-sensitive buildings housing public institutions (schools, hospitals, nursing homes, etc.) | 55 dB | 70 dB
Single buildings in the open country | 55 dB | 70 dB
Service enterprises (hotels, offices, etc.) | 60 dB | 75 dB
Recreational areas where people stay overnight (holiday houses, allotment gardens, caravan parks, etc.) | 50 dB | 65 dB
Other recreational areas where people do not stay overnight | 55 dB | 70 dB
The rules derived on the basis of the analyzed cases contained in the respondents' database fields are of the following form:
(attribute A1 = value a_i1) and ... and (attribute Am = value a_im) => (decision D = value d_i)
The data are gathered from all respondents over a period of time. Having collected results for a number of respondents, these data are then processed by the rough set algorithm [10].
Table 31.2. Decision table containing respondents' data.
Respondent / attribute | A1 | A2 | ... | Am | D
t1 | a11 | a12 | ... | a1m | d1
t2 | a21 | a22 | ... | a2m | d2
... | ... | ... | ... | ... | ...
tn | an1 | an2 | ... | anm | dn
In Tab. 31.3 some records from the data collected by the system are shown. These data are related to street noise evaluation.
Table 31.3. Database of respondents' records.
Respondent / Parameters | A | B | C | D | E | F | G | H | I | J | Annoyance
1 | 85 | loud | cont. | cont. | med. | med. | med. | low | med. | 54 | med.
2 | 85 | very loud | impuls. | freq. | high | high | med. | high | high | 54 | high
Denotations:
A - Lden [dB]
B - Loudness {none, very soft, soft, medium, loud, very loud, ultra loud}
C - Type of noise {impulsive, non-stationary, stationary, continuous}
D - Occurrence {rare, frequent, often, continuous}
E - Distraction {low, medium, high}
F - Communication Interference {low, medium, high}
G - Performance Interference {low, medium, high}
H - Anxiety & Fear {low, medium, high, very high, ultra high}
I - Stress {low, medium, high}
J - Age category {positive integer value}
Annoyance - {very low, low, medium, high, very high}
The first step of rough set processing is related to the elimination of rows in decision tables that are duplicated (superfluous data elimination). Further steps result in the generation of rules and the rough set measure, and the computation of reducts allowing one to obtain the reduced form of rules based on the indispensable attributes only [4][5][7]. Examples of rules processed by the system are shown below. One can see that some cases will be contradictory depending on the respondent's age and his/her sensitivity to the noise exposure, and in addition on the type of noise.
IF A=85 AND B=loud AND C=cont. AND D=cont. AND E=med. AND F=med. AND G=med. AND H=low AND I=med. AND J=54 => Annoyance=med.
IF A=85 AND B=very loud AND C=impulsive AND D=freq. AND E=high AND F=high AND G=med. AND H=high AND I=high AND J=54 => Annoyance=high
The real interest does not concern, however, the meaning of the above shown rules, which is obvious to any data analyst, but is related to the rough measure μRS associated with these rules. That is because the rough measure can be used in the
system as a weighting factor for determining the correlation between the objective values of the measured Lden and the subjectively evaluated annoyance. The accuracy of decisions produced by the intelligent database analysis algorithm is expected to grow higher as the number of respondents' records is increased.
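To illustrate how such a weighting factor can be obtained, the sketch below computes the rough measure of a rule as the fraction of records matching its condition part that also carry its decision value. The records and the attribute subset are invented examples; this is not the system's implementation.

// Sketch: rough measure of a decision rule over a small decision table:
// mu_RS = |records matching condition AND decision| / |records matching condition|.
#include <cstdio>
#include <map>
#include <string>
#include <vector>

using Record = std::map<std::string, std::string>;

double roughMeasure(const std::vector<Record>& table,
                    const Record& condition, const std::string& decision) {
    int matched = 0, supported = 0;
    for (const auto& r : table) {
        bool ok = true;
        for (const auto& c : condition)
            if (r.at(c.first) != c.second) { ok = false; break; }
        if (!ok) continue;
        ++matched;
        if (r.at("Annoyance") == decision) ++supported;
    }
    return matched ? double(supported) / matched : 0.0;
}

int main() {
    std::vector<Record> table = {   // invented respondents' records
        {{"Loudness","loud"},      {"Occurrence","cont."}, {"Stress","med."}, {"Annoyance","med."}},
        {{"Loudness","loud"},      {"Occurrence","cont."}, {"Stress","med."}, {"Annoyance","high"}},
        {{"Loudness","very loud"}, {"Occurrence","freq."}, {"Stress","high"}, {"Annoyance","high"}}
    };
    Record condition = {{"Loudness","loud"}, {"Occurrence","cont."}};
    std::printf("mu_RS = %.2f\n", roughMeasure(table, condition, "med."));  // prints 0.50
    return 0;
}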
31.5 Conclusions
Discerning the correlation of objectively measured quantities and perception-based categories is one of the most important problems in many disciplines of science, including acoustics. The engineered noise telemonitoring system was designed in such a way that it allows measuring noise in endangered areas and studying the influence of environmental noise on humans. Two kinds of soft computing algorithms were employed to that end, originating from fuzzy set and rough set theories. The engineered intelligent application may help in diminishing the occurrence of hearing problems and other diseases caused by environmental & industrial noise.
References
1. Czyzewski A., Kotus J.: Web-Based Acoustic Noise Measurement System. 116th Audio Engineering Society Convention, Preprint No. 6006, Berlin, 08-11 May, 2004.
2. Czyzewski A., Kotus J.: "Universal system for diagnosing environmental noise". Management of Environmental Quality, pp. 294-305, vol. 15, No. 3, 2004.
3. Directive 2002/49/EC of the European Parliament.
4. Czyzewski A., Kostek B., Suchomski P.: Automatic Assessment of the Hearing Aid Dynamics Based on Fuzzy Logic - Part I. The Third IASTED Int. Conf. on Artificial Intelligence and Applications, September 8-10, 2003, Benalmadena, Spain.
5. http://www.telewelfare.com/
6. Miedema H. M. E., Vos H.: Noise sensitivity and reactions to noise and other environmental conditions. J. Acoust. Soc. Am. 113 (3), 1492-1504 (2003).
7. Ouis D.: Annoyance from road traffic noise: a review. Journal of Environmental Psychology, 21, 101-120 (2001).
8. Heinonen-Guzejev M., Vuorinen H. S., Kaprio J., Heikkila K., Mussalo-Rauhamaa H.: Self-report of transportation noise exposure, annoyance and noise sensitivity in relation to noise map information. Journal of Sound and Vibration 234 (2), 191-206 (2000).
9. Zadeh L.A.: A New Direction in AI: Toward a Computational Theory of Perceptions. AI Magazine 22(1): 73-84 (2001).
10. Kostek B.: "Computing with words" Concept Applied to Musical Information Retrieval. Electronic Notes in Theoretical Computer Science, 2003, No 4, vol. 82.
32 Multi-agent and Data Mining Technologies for Situation Assessment in Security-related Applications
Vladimir Gorodetsky, Oleg Karsaev, and Vladimir Samoilov
SPIIRAS, 39, 14-th Liniya, St. Petersburg, 199178, Russia
{gor,ok,samovl}@mail.iias.spb.su
Summary. The paper considers one of the topmost security-related problems, that is, situation assessment. Specific classification and data mining issues associated with this task and methods of their solution are the subjects of the paper. In particular, the paper discusses the situation assessment data model specifying a situation, an approach to learning of situation assessment, a generic architecture of multi-agent situation assessment systems, and software engineering issues. Detection of abnormal use of a computer network is the case study used for demonstration of the main research results.
32.1 Introduction
Security-related problems, which have recently become of great concern for human society, constitute a new class of applications within the information technology scope. Among such applications, the most important ones are those associated with security of critical state infrastructures including computer networks and information systems assurance, safeguard and restoration of critical enterprises like nuclear power plants, electrical power grids, etc. Another important class of such applications covers assessment of threat and prognosis of the development of situations associated with large-scale natural and man-made disasters and mitigation of their negative impact on the environment. A very specific class of security-related applications is caused by the necessity to predict terrorist intents and counteract terrorist attacks. The list of such security-related applications of topmost concern can be continued. From the information technology point of view, security-related applications possess a number of common, very specific properties making the development of the corresponding decision making and control systems extremely difficult. Among such properties, the most specific ones are multiplicity of distributed data sources, heterogeneity, incompleteness, uncertainty and temporal nature of input data to be fused for decision making, large scale and distributed nature of the decision making problem, etc. The above gives impulse to new research in the area of distributed intelligent information systems ([11], [12], [1], [8], [15], etc.) whose main objective is a so-called
situational awareness task that is understood as the in-depth comprehension, prediction, and management of what is going on within the system and environment of interest. The experience accumulated with regard to the situational awareness problem allowed creating a general model of data processing within the respective applications, the so-called JDL model [14]. It considers a hierarchy of tasks associated with the situation awareness-related applications (Fig. 32.1). In the commonly accepted view, situational awareness is a situation-centric problem, whose most significant subtasks are Object Assessment, often referred to as Data Fusion, and Situation Assessment, referred to as Information Fusion. Both these tasks are currently the subjects of intensive research ([1], [8], [15], etc.).
(Fig. 32.1 shows Levels 0-4 of the JDL model - Level 0: preprocessing of sensor data; Level 1: object assessment; Level 2: situation assessment; Level 3: impact and/or threat assessment; Level 4: process refinement with sensor and resource management - fed by Sensors 1..N and supported by a database management system with fusion and support databases.)
Fig. 32.1. JDL model of data and information fusion [Salemo-01] Certain important aspects of the situation assessment task constitute the main subjects of this paper. On the one hand, distributed nature of situation assessment system input data necessitates the use of distributed architecture. In this respect the paper takes advantage of multi-agent architecture for systems in question. On the other hand, incomplete and temporal nature of input data makes the decision making problem rather specific, and this issue is also a subject of the paper. It is further shown that due to the aforementioned properties of input data the classification has to be produced based on multiple asynchronous data streams. Unfortunately both learning of classification and classification itself for such kind of data are poorly investigated. The paper proposes an approach that allows coping with the respective learning and classification problems. The subsequent part of the paper is organized as follows. Section 32.2 introduces the basic notions associated with the situation assessment task and outlines specific features of input data model used for situation assessment. Section 32.3 presents briefly the developed and completely implemented methodology of situation assessment based on information fusion and outlines the multi-agent architecture of a particular security-related application that is a detection of the security status of computer networks. Section 32.4 describes an approach intending training of classifiers ^JDL model was developed by Joint Directories Research Laboratories of the US Air Force within the framework of Information Fusion Initiative.
destined for on-line situation assessment update based on asynchronous inputs from multiple sources. Section 32.5 considers implementation issues of multi-agent situation assessment systems and demonstrates this aspect by the example of such a system developed by the authors. The Conclusion summarizes the research results and outlines future work.
32.2 On-line Situation Assessment Update: Peculiarities of Input Data
Situation assessment is the topmost task within the security-related scope. A situation is a characteristic of a system constituted by semi-autonomous objects (situation objects) having particular goals and operating in a coordinated mode to achieve a certain goal of the system as a whole. A situation object can be physical (e.g., technical means participating in a rescue operation) or abstract (e.g., components of software where traces of an attack against a computer are manifested). The situation and its objects are characterized by their "states", which take values from finite sets of labels. The situation assessment task (or, rather, the "situation state assessment" task) is a classification task aiming to determine the current state of the situation; its essence is that at each given time instant a label is mapped to the situation. Situation and object states are of a dynamic nature and therefore situation assessment is a real time task. Situation-related information arrives continuously from multiple distributed sensors. As a rule, the outputs of these sensors come into the situation assessment system with different frequencies and in an irregular mode, jointly constituting asynchronous input data streams that have to be processed by the situation assessment system in order to update the current situation state on-line. The example given below, from computer network security, demonstrates the peculiarities of situation assessment system input; it considers the anomaly detection task. It is assumed that the security status of a computer network can take values from the binary set {"Normal", "Abnormal"}. It is also assumed for simplicity that four data sources resulting from the preprocessing of network traffic constitute the input of the security status assessment system (the whole case study from the intrusion detection scope that is used for validation of the situation assessment technology under development includes multiple data sources at the traffic, operating system, and application levels); they are:
1. connection-related vectors of binary sequences specifying the six-component stream of IP packet headers;
2. statistical attributes of connections manifested in traffic (like duration, status, total number of connection packets, etc.);
3. statistical attributes of traffic during short time (5 sec) intervals, presented by four features specifying integral characteristics of the input traffic, like the total numbers of connections and services of various types during the last 5 sec; and
4. statistical attributes of traffic for long time intervals, composed of the same statistics as the previous ones but averaged over a chosen number of connections.
Fig. 32.2 illustrates part of these data streams graphically. The datasets of the above kinds, used below for demonstrating the properties of the developed approach to mining of asynchronous data streams, resulted from Tcpdump/Windump data processed by the TCPtrace utility and also by some other ad-hoc developed programs.
Fig. 32.2. Multiplicity of input data streams used for anomaly detection based on data of the network traffic level
Sensor data are collected continuously, and one of their peculiarities is that they are time-stamped; the particular data streams enter the situation assessment system with different frequencies and possess finite "life times", which can vary considerably for data of different streams. The finiteness of the life time means that after a definite time has elapsed a part of the data becomes useless for situation assessment. Therefore, at the time of a situation assessment update some attributes may not be assigned a value and, thus, the input data vector to be used for the situation assessment update has missing values. Fig. 32.3 demonstrates this fact. Indeed, let us assume that new data ("events") arrive at the times T1, T2, T3 and T4 and that, according to the necessity to update the situation assessment in real time, at the same times T1, T2, T3 and T4 the situation assessment system has to make a decision about the current computer network security status. The decision at time T1 is initiated by the arrival of the data denoted as Z1 about the most recently completed connection. At that moment, the life times of the most recently received data corresponding to the traffic statistics aggregated for 5 sec, Z2, and to the traffic statistics aggregated for 100 connections, Z3, have not yet elapsed, and that is why these data, together with the newly arrived ones, Z1, constitute the fully instantiated input Z(T1) = <Z1, Z2, Z3>. One can verify that the same takes place at the decision making time T2. At the times T3 and T4 the situation looks different. Indeed, at time T3 the decision is initiated by the arrival of data Z2 corresponding to the traffic statistics aggregated for 5 sec. At that moment the life time of the most recently received data Z3, corresponding to the traffic statistics aggregated for 100 connections, has not yet elapsed and can therefore be used for on-line decision making, whereas the life time of the data corresponding to the most recently completed connection, Z1, has already elapsed (and a new connection is still in progress), and that is why the data corresponding to Z1 are useless. Therefore the input at time T3 contains a missing value for the data Z1 (see Fig. 32.3). A similar situation takes place
at time T4, when, due to the elapsing of the life time of the data Z3, the input of the system assessing the computer network security status contains a missing value in the last position.
Fig. 32.3. Explanation of missingness nature in input of situation assessment system (streams: (1) connections flow, (2) aggregation for 5 sec, (3) aggregation for 100 connections; the legend marks life times of events, expiration of life times, arrival of new events, and events initiating on-line decision update; * denotes a missing value)
As a conclusion for the above example, it can be stated that the asynchronous nature of the situation assessment system inputs and the finite life times of these inputs result in the necessity to make decisions on the basis of data with missing values. In the general case some kind of prognosis of the missing values can be used. Unfortunately, in the example in question the latter is not possible at all because, for instance, at the moment of decision making a new connection can correspond to the activity of a new user, and in such a case there are no correlations between the previous and the next portions of connection-related data.
Fig. 32.4. Decision Fusion Methodology (asynchronous streams of inputs feed data sources 1, 2, ..., k; each data source has its own classifier producing a decision, and these decisions are combined into an on-line update of combined decisions)
It is important to note that, for various reasons, in the general case of situation assessment systems certain attributes of the input data streams can also be missing; e.g., if airborne data are used then data can
be missing due to meteorological factors, object masking, etc. Thus, missingness of data is a specific property of the input of situation assessment systems, and in many cases it is impossible to impute the missing values based on some statistical properties of the input. The above example from the computer network security scope demonstrates this fact. Thus, a specific problem posed by situation assessment tasks is that the latter are classification tasks with missing values. Respectively, training and testing of situation assessment systems destined for on-line classification update is reduced to data mining and knowledge discovery from data sets containing missing values. The respective approach and techniques are briefly considered in this paper as well.
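To make the mechanism of Fig. 32.3 concrete, the following sketch (an illustration only, not the authors' implementation; the stream names and life-time values are assumed for the example) shows how an input vector acquires missing values when every decision is triggered by the newest event while older events are used only until their life times expire.

```python
# Assumed life times (seconds) of the three traffic-level streams of the example:
# Z1: connection-related data, Z2: 5-sec aggregates, Z3: 100-connection aggregates.
LIFE_TIMES = {"Z1": 5.0, "Z2": 10.0, "Z3": 600.0}

latest = {}  # stream name -> (timestamp, value) of the most recently received event


def on_event(stream, value, now):
    """Store the newest event of a stream and trigger a decision at time `now`."""
    latest[stream] = (now, value)
    return assemble_input(now)


def assemble_input(now):
    """Return <Z1, Z2, Z3>; a component is None (missing) once its life time elapsed."""
    vector = []
    for stream in ("Z1", "Z2", "Z3"):
        entry = latest.get(stream)
        if entry is not None and now - entry[0] <= LIFE_TIMES[stream]:
            vector.append(entry[1])
        else:
            vector.append(None)  # missing value, as at times T3 and T4 in Fig. 32.3
    return vector
```

At every event arrival the classifier is applied to the returned vector, so some decisions have to be made with one or more components missing, which is exactly the situation addressed in section 32.4.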
32.3 On-line Situation Assessment Update: Methodology and Multi-agent Architecture
Let us outline the methodology of situation assessment that is based on the ideas of information fusion corresponding to level 2 of the JDL model (see Fig. 32.1). This methodology determines how to allocate data and information processing functions to the data source-based level and to the meta-level. There exist several approaches to the fusion of data and information ([3]). In the methodology used here, an information fusion architecture with at least two levels is considered. In this architecture, local classification mechanisms produce decisions regarding the object states based on particular data sources, and these decisions are then combined at the meta-level. This methodology is advantageous in many respects; in particular, it (1) considerably decreases the communication overhead; (2) is applicable to applications where the data structures of particular sources are heterogeneous, since only local decisions, represented in binary or categorical scales, are forwarded to the upper level; (3) can rely on a number of effective and efficient algorithms for combining such decisions at the upper level; and (4) preserves the privacy of the source data. A generalized structure demonstrating such a methodology of information fusion is depicted in Fig. 32.4.
Fig. 32.5. Anomaly detection system architecture
As concerns the above example considering the anomaly detection task, at its first level four classifiers
producing decisions based on particular data sources are used. At the second level, dealing with asynchronous binary data streams, these decisions are combined. The respective multi-agent architecture of the anomaly detection system is presented in Fig. 32.5. This architecture consists of two agents of different classes:
• Agent of Network Traffic-based Alerts (NTA-agent), which is responsible for detection of abnormal user activity based on particular data streams of output data generated by the Network Traffic Sensor (NTS), and
• Alert Correlation agent (Information Fusion, AC-agent), which is responsible for combining the alerts generated by the classifiers of the NTA-agent.
The basic functionalities of the NTA-agent are (1) the transformation of raw data structures resulting from traffic data preprocessing (performed by the NTS component indicated in Fig. 32.5) into feature structures (this function is realized by the NT-F component of the NTA-agent), and (2) producing classifications, Normal or Alert, for each particular data stream of the feature structures. The latter functions are realized by the classifiers QA(Cnc), QB(Cnc), QB(Ssn5) and QB(Ssn100); here Cnc stands for Connection and Ssn stands for Session. Accordingly, the architecture of the NTA-agent comprises (1) the component performing computation of the feature structures over the raw data and (2) the components performing classification (alert generation) based on the particular data streams represented in terms of the feature structures. The architecture of the AC-agent includes two components, of which the QB classifier is the main one. It is responsible for on-line combining of the decisions produced by the classifiers of the NTA-agent, thus producing the on-line assessment of the host security status, Normal or Abnormal. In intrusion detection, this procedure is referred to as "alert correlation". The Syn component (Syn is an abbreviation for "Synchronization") is responsible for detecting "too old" data. This component carries out the "synchronization" of data in the following manner. Up to the time of receiving a new message, the AC-agent (more exactly, its Syn component) keeps the previous decisions of all first-level classifiers (in our case these form 4 attributes, the labels of the host security status produced by the classifiers of the NTA-agent) together with their time stamps. When a new message is received, the Syn component changes the value of the respective attribute and deletes the values of attributes that are "out of date". The updated data vector, which can contain missing values, is forwarded to the QB classifier responsible for "alert correlation". The simplified architecture of the anomaly detection system described above presents, in general features, a generic architecture of many situation assessment systems intended for on-line update of the situation status. The differences between particular cases of such systems mainly concern the number of data sources used, the number of agents, and the number of decision making levels. Nevertheless, in most cases such a multi-agent system has to comprise more than one level of data processing and decision making, a component providing "synchronization", as well as a component responsible for first-level decision combining.
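The interplay between the Syn component and the QB classifier can be sketched as follows (an illustrative toy only: the attribute names, the life times and, in particular, the combining rule are assumptions and do not reproduce the trained QB classifier).

```python
FIRST_LEVEL = ("QA_Cnc", "QB_Cnc", "QB_Ssn5", "QB_Ssn100")   # assumed classifier names
LIFE_TIME = {"QA_Cnc": 5.0, "QB_Cnc": 5.0, "QB_Ssn5": 10.0, "QB_Ssn100": 600.0}


class SynComponent:
    """Keeps the last decision of every first-level classifier with its time stamp."""

    def __init__(self):
        self.last = {}  # classifier name -> (timestamp, "Alert" or "Normal")

    def receive(self, name, decision, now):
        """Update one attribute on a new message and return the current vector."""
        self.last[name] = (now, decision)
        return self.vector(now)

    def vector(self, now):
        """Decision vector forwarded to QB; out-of-date attributes become None."""
        out = {}
        for name in FIRST_LEVEL:
            stamped = self.last.get(name)
            fresh = stamped is not None and now - stamped[0] <= LIFE_TIME[name]
            out[name] = stamped[1] if fresh else None
        return out


def combine(decisions):
    """Toy stand-in for the QB meta-classifier: alert if any fresh source alerts."""
    fresh = [d for d in decisions.values() if d is not None]
    return "Abnormal" if "Alert" in fresh else "Normal"
```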
32.4 Data Mining with Missing Values for Situation Assessment
Data mining with missing values is a special problem that has been investigated for a long time. Most researchers mainly dwell upon methods based on a reasonable assignment ("imputation") of the missing values, exploiting mostly statistical ideas, but such approaches are not applicable to situation assessment due to the substantial variety of the dynamics of the input data streams. Unfortunately, an approach based on the imputation idea is not relevant to many situation assessment applications. This is the reason why direct mining of data with missing values has to be used for this application. An approach to direct mining of data with missing values that does not assume an imputation was proposed in [6]. The idea exploited in it is conceptually simple: if we arbitrarily assigned the missing values of the training dataset, we would be able to extract the set of maximally general rules, MGR [10], using existing techniques like AQ [9], RIPPER [2], GK2 [7], etc. It is important to note that different assignments would lead to different MGR sets. It was shown in [6] that among the different assignments of the missing values of the training dataset two specific variants exist that lead to sets of MGR serving as lower (R_low) and upper (R_upper) bounds for any set of MGR R* corresponding to an arbitrary assignment:
$R_{low} \subseteq R^{*} \subseteq R_{upper}$   (32.1)
where $\subseteq$ is the deducibility relation. Let us outline how these assignments can be found and how the bounds R_low and R_upper can be computed. Let t(i) be an arbitrary i-th instance of the training dataset, let k be the index of the chosen seed [10], and let I^k be the index set of the seed attributes with assigned values. While searching for the MGR for the seed t(k), the columns of the training dataset whose indexes lie outside the set I^k are ignored. Let us denote the index set of missing values in a negative example t(l) by I_-^l and the corresponding set in a positive example t(r), r ≠ k, by I_+^r. Let us consider two variants of missing value assignment in the sets of negative (NE) and positive (PE) examples:
$\bar{t}_i^{\,l} = \neg t_i^k,\ i \in I_-^l,\ l \in NE; \quad \bar{t}_i^{\,r} = t_i^k,\ i \in I_+^r,\ r \in PE$   (32.2)
$\bar{t}_i^{\,l} = t_i^k,\ i \in I_-^l,\ l \in NE; \quad \bar{t}_i^{\,r} = \neg t_i^k,\ i \in I_+^r,\ r \in PE$   (32.3)
The assignment (32.2) maximally increases both the distinctions between the seed and the negative examples and the similarities between the seed and the other positive examples. On the contrary, the assignment (32.3) maximally increases both the similarities between the seed and the negative examples and the distinctions between the seed and the other positive examples. The assignment (32.2) can be called "optimistic": it cannot decrease either the generality or the coverage factor of any rule of the MGR extracted from any arbitrarily assigned source dataset. Under the assignment (32.3), which can be called "pessimistic", neither the generality nor the coverage factor of the rules extracted from any arbitrarily assigned source dataset can be increased.
The above statements provide a general framework for direct mining of data with missing values. It is obvious that the rule set R_low belongs to the set of MGR under search. The other rules have to be selected from the rule set R_upper. Let us explain how this can be done. Let the alternative classes of situations be denoted as Q and Q̄. The selection algorithm used in this research consists of the following steps, applied to each seed:
1. Assign the missing values of the training dataset "optimistically" as in (32.2) and mine the rule sets R_upper for the classes Q and Q̄: R_upper(Q) and R_upper(Q̄).
2. Assess the quality of the extracted rule sets R_upper(Q), R_upper(Q̄) using certain evaluation criteria based on the testing dataset and select the best rules from these sets for use in the reasoning mechanism.
3. Design the classification mechanism and assess its performance quality.
4. If the above quality is not satisfactory, go to step 2 to repeat the rule selection.
The other procedures are the same as those used in ordinary cases [10]. An experiment simulating the training of the anomaly detection system case study allows an optimistic evaluation of the developed approach to on-line situation assessment update. Indeed, as applied to the anomaly detection task, which uses data of the network traffic level, data of the operating system log and data of the application level, this approach showed an estimated probability of correct classification of about 0.99 when tested on testing and training samples with about 20% of missing values.
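The two boundary assignments (32.2) and (32.3) can be illustrated with the following sketch for a single seed (a simplified illustration with binary attributes; the encoding and helper names are assumptions, not the authors' implementation). Missing attributes of negative examples receive the negation of the seed's value under the optimistic assignment and the seed's value itself under the pessimistic one, and vice versa for positive examples; the rules mined from the optimistic completion form R_upper, those from the pessimistic completion form R_low.

```python
MISSING = None


def negate(value):
    return 1 - value  # binary attributes are assumed for this illustration


def boundary_assignment(dataset, labels, k, optimistic=True):
    """Complete missing values relative to the seed t(k), as in (32.2) or (32.3).

    dataset: list of attribute vectors with 0/1 or MISSING entries; labels: class labels.
    Columns where the seed itself has a missing value are ignored, as in the MGR search.
    """
    seed, seed_class = dataset[k], labels[k]
    completed = []
    for row, label in zip(dataset, labels):
        new_row = list(row)
        for i, value in enumerate(new_row):
            if seed[i] is MISSING or value is not MISSING:
                continue
            positive = (label == seed_class)  # PE vs. NE with respect to the seed
            if optimistic:
                new_row[i] = seed[i] if positive else negate(seed[i])   # assignment (32.2)
            else:
                new_row[i] = negate(seed[i]) if positive else seed[i]   # assignment (32.3)
        completed.append(new_row)
    return completed
```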
32.5 Implementation Issues
An important issue for multi-agent situation assessment systems is the technology of their analysis, design, implementation and deployment. In this research the MASDK 3.0 software tool, supporting all the stages of the multi-agent technology, is used [5].
Fig. 32.6. MASDK software tool components and their interaction
This software tool, implementing the Gaia methodology [16], consists of the following components (Fig. 32.6):
1. the system kernel, which is a data structure for XML-based representation and storage of the specification of the target applied MAS;
2. an integrated set of user friendly editors supporting the user's activity in specifying the applied MAS;
3. a library of C++ classes implementing what is usually called the Generic agent, integrating the reusable components of agents;
4. the communication platform to be installed on particular computers of a network; and
5. the generator of software agent instances, which produces the source C++ code and the executable code of the software agent instances, as well as the software needed for MAS deployment over the already installed communication platform.
The specification of an applied MAS in the system kernel is carried out using editors structured in three levels. The editors of the first level, used for the description of the applied MAS at the analysis stage, are as follows:
1. the application ontology editor;
2. the editor describing roles, the names of agent classes, and high-level schemes of the roles' interactions; and
3. the editor describing the roles' interaction protocols.
The editors of the second level, supporting specification of the agent classes at the design stage, are as follows:
1. the editor specifying the model of the meta-level behavior of an agent (while analyzing input messages and interacting with the user and the environment);
2. the editor specifying particular agent functions and behavior scenarios in terms of state machines;
3. the editor specifying the software agent's private ontology; and
4. the editor specifying the initial state of the agent class mental model.
The editors of the third level support specification of the MAS components needed for its deployment. The applied MAS specification produced by the designers making use of the above editors is stored as an XML file in the system kernel. Generation of the source (C++) and executable codes is performed in automatic mode. The case study on the anomaly detection system described in section 32.2 and used throughout the paper for demonstration of the proposed solutions was implemented with the MASDK 3.0 software tool. All the classifiers were trained with the Distributed Data Mining tool also developed by the authors [4]. One of its components implements the algorithm for direct mining of data with missing values described in the previous section. It was used for training the meta-level classifier, denoted in Fig. 32.5 as QB. This system was trained and tested based on the well known DARPA data. Fig. 32.7 demonstrates graphically the performance of this multi-agent system for anomaly detection over a certain time period lasting about one hour. In the bottom part, the time intervals where an intrusion takes place are presented in black, whereas the intervals without intrusions are given in white. The top part of Fig. 32.7 presents the performance results of the developed anomaly detection system, where the same colors are used for the corresponding decisions produced by the anomaly detection system. The decisions corresponding to false alarms and missed signals are presented below
the time axis, whereas the decisions corresponding to correct anomaly detections are given above this axis in black. The correct detections of normal users' activity are given in white above the time axis.
Fig. 32.7. Visualization component of the implemented anomaly detection system
It can be seen that, although only the traffic-based data source was used and the training dataset contains a rather high percentage of missing values (about 20%), the results are not too bad, although they are far from ideal. It should also be noted that the purpose of the above experiments at the current stage of research was not to evaluate the algorithm developed for direct mining of data with missing values but to validate the architecture as well as the developed design and implementation technology destined for engineering situation assessment systems supporting on-line update of situation assessment.
32.6 Conclusion
The paper is devoted to certain key issues of the situation assessment problem. It considers a situation assessment task statement that accounts for the fact that the input of any situation assessment system is composed of asynchronous data streams possessing various life times, and that the input can contain missing values. Another important peculiarity of the situation assessment task statement that is very significant in practice is that the situation assessment has to be updated on-line, i.e. such systems operate in real-time mode. The novel results presented in the paper are as follows:
1. A new sound approach to direct mining of data with missing values based on the computation of upper and lower bounds of the sets of maximally general rules that can be extracted from arbitrarily assigned training data with missing values.
2. A two-level multi-agent architecture for situation assessment systems making decisions based on asynchronous data streams arriving from multiple sources.
The main results of the paper were used in the design and implementation of a software prototype of a multi-agent anomaly detection system operating on the basis of multiple data sources. Future research will aim at further validation of the paper's results via the design and implementation of multi-agent software prototypes for other security-related applications.
Acknowledgement
We wish to thank the European Office of Aerospace Research and Development of the USAF (Project 1993P) and the Russian Foundation for Basic Research (grant # 04-01-00494) for support of this research.
References
1. Ben-Bassat, M., Freedy, A.: Knowledge Requirements and Management in Expert Decision Support Systems for (Military) Situation Assessment. IEEE Transactions on Systems, Man and Cybernetics, vol. 12 (2002) pp. 479-490
2. Cohen, W.: Fast Effective Rule Induction. Machine Learning: 12th International Conference, CA, Morgan Kaufmann (1995)
3. Goodman, I., Mahler, R., and Nguen, H.: Mathematics of Data Fusion. Kluwer Academic Publishers (1997)
4. Gorodetsky, V., Karsaev, O., and Samoilov, V.: Software Tool for Agent-Based Distributed Data Mining. Proceedings of the IEEE Conference "Knowledge Intensive Multiagent Systems" (KIMAS 03), Boston, USA (2003)
5. Gorodetski, V., Karsaev, O., Kotenko, I., and Khabalov, A.: Software Development Kit for Multi-agent Systems Design and Implementation. In B. Dunin-Keplicz, E. Nawarecki (Eds.), From Theory to Practice in Multi-agent Systems. Lecture Notes in Artificial Intelligence, Vol. 2296 (2002) 121-130
6. Gorodetsky, V., Karsaev, O.: Mining of Data with Missing Values: A Lattice-based Approach. In Proceedings of the International Workshop on the Foundation of Data Mining and Discovery, Japan (2002) 151-156
7. Gorodetsky, V., Karsaev, O.: Algorithm of Rule Extraction from Learning Data. Proceedings of the 8th International Conference "Expert Systems & Artificial Intelligence" (EXPERSYS-96) (1996) 133-138
8. Greenhill, S., Venkatesh, S., Pearce, A., Ly, T.C.: Representations and Processes in Decision Modeling. DSTO Aeronautical and Maritime Research Laboratory, Australia, DSTO-GD-0318 (2002)
9. Michalski, R.: A Theory and Methodology of Inductive Learning. Machine Learning, vol. 1, Carbonell, J.G., Michalski, R.S. and Mitchell, T.M. (Eds.). Tioga, Palo Alto (1983) 83-134
10. Michalski, R. and Kaufman, A.: Data Mining and Knowledge Discovery: A Review of Issues and Multistrategy Approach. Machine Learning and Data Mining: Methods and Applications, John Wiley and Sons (1997)
11. Proceedings of the Fifth International Conference on Information Fusion (IF-2002). Annapolis, MD, July 7-11 (2002)
12. Proceedings of the Sixth International Conference on Information Fusion (IF-2003). Melbourne, Australia, July 13-17 (2003)
13. Salerno, J., Hinman, M., Boulware, D.: Building a Framework for Situation Assessment. Proceedings of the 7th International Conference on Information Fusion. Sweden (2004)
14. Salerno, J.: Information Fusion: A High-level Architecture Overview. In CD Proceedings of the Fusion-2002, Annapolis, MD (2002) 680-686
15. Than, C. L., Greenhill, S., Venkatesh, S., Pearce, A.: Multiple Hypotheses Situation Assessment. Proceedings of the 6th International Conference on Information Fusion. Australia (2003) 972-978
16. Wooldridge, M., Jennings, N.R., Kinny, D.: The Gaia Methodology for Agent-Oriented Analysis and Design. Journal of Autonomous Agents and Multi-Agent Systems, vol. 3 (2000) 285-312
33 Virtual City Simulator for Education, Training, and Guidance
Hideyuki Nakanishi
Department of Social Informatics, Kyoto University, Kyoto 606-8501, Japan
[email protected]
http://www.lab7.kuis.kyoto-u.ac.jp/~nuka/
33.1 Introduction
Since smooth evacuation is important to safeguard our lives, we are taught how to evacuate in preparation for a disaster. For example, fire drills are conducted in schools. However, such sporadic and pre-established real-world trainings can give us only very limited experience. Also, it is rare to conduct fire drills in large-scale public spaces such as central railway stations, even though they are places where vast numbers of people gather. Crowd simulations should be used to compensate for this lack of opportunities for experience. Even if we have learned how to evacuate before an emergency happens, we need appropriate guidance during such an emergency. For example, large buildings have to be equipped with many emergency exits and their signs. However, these architectural guidance objects are not very flexible, and human guidance is necessary to help an escaping crowd while adapting to the needs of the moment. Crowd simulations should also be used to assist such guidance tasks. Crowd simulations serving the purposes of learning about evacuation and of guiding escaping people would therefore be very beneficial. Multi-agent simulations are already known as a technology that deals effectively with the complex behavior of escaping crowds [9, 28]. However, conventional simulations are not tailored for learning about or guiding an evacuation. Since they are only designed for analyzing crowd behavior, they do not take human involvement much into account. For example, crowds are usually represented as moving particles. It is not easy to interpret such a symbolic representation. Moreover, it is almost impossible for users to become a part of the simulated crowds and experience a virtual evacuation. So, in order to compensate for this incapability, and as part of the Digital City Project [12], we have developed "FreeWalk" [25], a virtual city simulator that allows human involvement. In FreeWalk, multi-agent simulations that include human beings can be designed. In this paper, I describe FreeWalk's design and how it is used for learning and guidance purposes. First, the capability to involve humans is described. Next, an
experiment to evaluate its effectiveness for learning is explained. Finally, the first prototype of a guidance system is introduced.
33.2 FreeWalk
Virtual training environments have already been used for single-person tasks (e.g. driving vehicles) and are becoming popular for multi-party tasks because they can significantly decrease the cost of group training. In these environments, it is easier to gather many trainees since they participate in the training as avatars by entering the virtual space through a computer network. In addition, it is possible to practice a dangerous task repeatedly since trainees are inherently safe. In these virtual environments, "social agents", the software agents that have social interaction with people [24], play an important role in the following two ways: 1) By sharing the same group behavior, social agents become colleagues of human trainees and can decrease the number of human participants necessary to carry out a large-scale group training. 2) Social agents can play a predefined role within the training. Scripted training scenarios enable social agents to perform their assigned roles [6]. Human participants can then learn something through the interaction with the social agents, which behave according to the training scenario. We have developed a platform for simulating social interaction in virtual space called FreeWalk, its primary application being virtual trainings. We integrated diverse technologies related to virtual social interaction, e.g. virtual environments, visual simulations, and lifelike characters [30]. In FreeWalk, lifelike characters enable virtual collaborative events such as virtual meetings, trainings, and shopping in distributed virtual environments. You can conduct distributed virtual trainings [8, 21, 38] where you can use lifelike characters as the colleagues of human trainees [32]. FreeWalk is not only for training but also for communication and collaboration [7, 19, 33]. You can use lifelike characters as the facilitators of virtual communities [11]. These characters and the human participants can use verbal and nonverbal communication skills to talk with one another [5]. And it can also be a browser of 3D geographical contents [20] in which lifelike characters guide your navigation [16] and populate the contents [36]. To allow users to be involved in a multi-agent simulation, each virtual human in FreeWalk can be either an avatar or an agent. 'Avatar' means a virtual human manipulated by a user through the keyboard, mouse, and other devices. 'Agent' means a virtual human controlled by an outside program connected to FreeWalk. FreeWalk has a common interaction model for both agents and avatars while at the same time having different interfaces for them, so they can interact with each other based on the same model. Agents are controlled through the application program interface (API). Avatars are controlled through the user interface (UI). FreeWalk does not distinguish agents from avatars. Figure 33.1 roughly shows the distributed architecture of FreeWalk. An agent is controlled through the platform's API. A human participant enters the virtual space as an avatar, which he/she controls through the UI devices connected to the platform. Each character can be controlled from any client. FreeWalk
uses a hybrid architecture in which the server administrates only the list of current members existing in the virtual space and each client administrates the current states of all characters.
Fig. 33.1. Architecture of FreeWalk
This architecture enables agents and people to socially interact with each other based on the same interaction model in a distributed virtual space. Currently, FreeWalk is connected with the scenario description language "Q" [13]. It took a long time to construct agents that can socially interact with people, since such agents need to play various roles and each role needs its specific behavioral repertory. We thought it should be easier to design the external role of an agent instead of its internal mechanism, and previous studies had focused on the internal mechanism rather than on the external role [16]. So Q is a language for describing an agent's external role through an "interaction scenario", which is an extended finite state machine whose input is the perceived cues, whose output is an action, and where each state corresponds to a scene. Each scene includes a set of interaction rules, each of which is a pair made up of a conditional cue and the consequent series of actions. Each rule is of the form: "if the agent perceives the event A, then the agent executes the actions B and C." FreeWalk agents behave according to the assigned scenario. FreeWalk and the language processor of Q are connected by a shared memory, through which the Q processor calls FreeWalk's API functions to evaluate the cues and actions described in the current scene. FreeWalk's virtual city makes a multi-agent simulation more intuitive and understandable. The spatial structure and the crowd behavior of the simulation are represented as 3D photo-based models. Since the camera viewpoint can be changed freely, users can observe the simulation through a bird's-eye view of the virtual city and also experience it by controlling their avatars using first-person views. FreeWalk uses neither prepared gait animations nor simplified collision models, in order to keep
the correspondence between crowd behavior and the graphical representation of the virtual city. The VRML model that is basically used for drawing the virtual city is also used as a geometric model to detect collisions with the spatial structure and to generate gait animations. Animations are generated based on a hybrid algorithm of kinematics and dynamics [37]. To reduce the building cost, each VRML model was constructed as the combination of pictures taken by digital cameras and a simple geometric model based on the floor plan. A simple model also helps to reduce the workload of collision detection. It is also possible to represent the actual state of an existing real-world crowd in real time. To achieve this, it is necessary to synchronize the events simulated in FreeWalk with those occurring in the real world. FreeWalk provides an interface used to connect with a sensor network through the Internet. FreeWalk uses physical and social rules to robustly synchronize the movements of human figures with those of the real-world crowds. Based on the positions captured by the sensors, FreeWalk determines the next position of the corresponding human figure. This next position is modified according to the social rules described in the Q language. (Examples of rules are flocking behaviors such as following others and keeping a fixed distance from them [31], and such cultural behaviors as forming into a line to go through a ticket gate or forming a circle to have a conversation [17].) Then, the next position is modified again based on the pedestrian model to avoid collision with others, walls, or pillars [28]. Finally, the gait animation is generated.
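The per-character synchronization step described above can be sketched as follows (a self-contained toy illustration, not FreeWalk's actual API: the rule and collision functions merely stand in for the Q scenario rules and the pedestrian model).

```python
from dataclasses import dataclass


@dataclass
class Character:
    name: str
    pos: tuple  # (x, y) position in the virtual city


def keep_distance(pos, others, min_dist=0.6):
    """Toy social rule (would come from the Q scenario): step away from close neighbors."""
    x, y = pos
    for other in others:
        dx, dy = x - other.pos[0], y - other.pos[1]
        dist = (dx * dx + dy * dy) ** 0.5
        if 0 < dist < min_dist:
            x = other.pos[0] + dx / dist * min_dist
            y = other.pos[1] + dy / dist * min_dist
    return (x, y)


def clamp_to_walkable(pos, walkable):
    """Toy pedestrian constraint: keep the position inside the walkable rectangle."""
    xmin, ymin, xmax, ymax = walkable
    return (min(max(pos[0], xmin), xmax), min(max(pos[1], ymin), ymax))


def update_character(character, sensed_pos, others, walkable):
    """Sensor position -> social rules -> collision model; gait animation would follow."""
    target = keep_distance(sensed_pos, others)
    target = clamp_to_walkable(target, walkable)
    character.pos = target
    return character.pos
```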
33.3 Learning Evacuation
FreeWalk enables users to experience crowd behavior from their first-person view (FP in Figure 33.2). With this viewpoint they can practice decision making, since they control their avatars based on what their personal view tells them. But in FreeWalk users can also observe the overall crowd behavior from a bird's-eye view (BE in Figure 33.2). This view is more effective for understanding an overall crowd behavior than first-person views. Since both views have different advantages, we conducted an experiment to compare them, and also to derive their synergic effects, in order to find out the best way to learn evacuation and the kinds of superiority of FP. In the experiment, we tested each view and the effect of viewing a combination of them both in different orders. We compared four groups: experiencing a first-person view (FP group); observing a bird's-eye view (BE group); experiencing a first-person view before observing a bird's-eye view (FP-BE group); and observing a bird's-eye view before experiencing a first-person view (BE-FP group). The subjects were 96 college students; 24 subjects were assigned to each group. A previous real-world experiment [34] had given us a gauge to measure the subjects' own understanding of the resulting crowd behavior. This experiment demonstrated how the following two evacuation methods cause different crowd behaviors: 1) In the follow-direction method, the leaders point their arms at the exit and shout out, "the exit is over there!" to indicate the direction. They don't escape until all evacuees have gone out. 2) In the follow-me method, they do not indicate the direction.
Fig. 33.2. Two views for learning evacuation: first-person view (FP) and bird's-eye view (BE)
To a few of the nearest evacuees, they whisper, "follow me" and proceed to the exit. This behavior creates a flow toward the exit. The evacuation simulation was constructed based on this previous experiment [22]. At the beginning of the simulation, everyone was in the left part of the room, which was divided into left and right parts by the center wall, as shown in the BE view of Figure 33.2. Four leaders had to lead sixteen evacuees to the correct exit on the right side and prevent them from going out through the incorrect exit in the left part. In the FP simulations, six of the evacuees were subjects and the rest were agents. So, four FP simulations were conducted in each of the three groups that include FP simulations. In the BE simulations, both evacuees and leaders were all agents. In the experiment, subjects observed and experienced the two different crowd behaviors caused by the two evacuation methods explained above. We used the resulting behaviors as questions and the causing methods as answers: in a quiz including 17 questions, subjects read the description of each crowd behavior and chose one of the two methods as an answer. They took the quiz before and after the experiment. We used a t-test to find significant differences between the scores of the pre- and post-quizzes. A significant difference meant that the subjects could learn the asked-about property of crowd behavior through their observation and experience. Table 33.1 summarizes the results of the t-test on nine questions. Since no group could answer the other eight questions correctly, they are omitted. Even though the
results depend on the design of the quiz, it seems clear that a bird's-eye observation was necessary to understand the crowd's behavior. The FP group could not answer questions no. 3 to 9, which were related to the overall crowd behavior. However, the first-person experience was not worthless. It is interesting that the BE-FP group could answer questions no. 6 and 7, which the BE and FP-BE groups could not. This result implies that the background knowledge of the overall behavior enabled the subjects to infer individuals' actions (the following behavior) and their outcome (the formation of a group) from their first-person experiences. They could understand how they interacted with other evacuees because they controlled their avatars by themselves. They could also understand that the others interacted with each other in the same way as themselves because they knew the overall behavior beforehand. The ranking of the four ways to learn evacuation is illustrated in Figure 33.3. BE-FP was the best way, BE and FP-BE were next, and FP was the worst. We found that the best way is to observe first and then experience.
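Table 33.1 below reports the resulting statistics; as a minimal illustration of the test itself (the quiz scores here are invented placeholders, not the experimental data), a one-sided paired t-test over 24 subjects can be computed as follows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre = rng.integers(6, 13, size=24)        # placeholder pre-quiz scores of 24 subjects
post = pre + rng.integers(0, 4, size=24)  # placeholder post-quiz scores

# One-sided paired t-test with df = 23, as in Table 33.1
# (the 'alternative' argument requires SciPy >= 1.6).
t_value, p_value = stats.ttest_rel(post, pre, alternative="greater")
print(f"t = {t_value:.2f}, one-sided p = {p_value:.4f}")
```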
Table 33.1. Summary of the results of the quiz (one-sided paired t-test)
No. Question (the answer to all items is the follow-me method); t values for the FP, BE, FP-BE, and BE-FP groups
1. Leaders are the first to escape. FP 4.3***, BE 2.2*, FP-BE 2.3*, BE-FP 4.0***
2. Leaders do not observe evacuees. FP 2.8**, BE 4.4***, FP-BE 4.0***, BE-FP 4.2***
3. Leaders escape like evacuees. FP 1.6, BE 2.2*, FP-BE 1.9*, BE-FP 2.9**
4. One's escape behavior is caused by others' escape behavior. FP 1.2, BE 2.1*, FP-BE 3.3**, BE-FP 2.9**
5. Nobody prevents evacuees from going to the incorrect exit. FP 1.6, BE 4.9***, FP-BE 3.7***, BE-FP 4.5***
6. Evacuees follow other evacuees. FP 1.3, BE 1.0, FP-BE 0.7, BE-FP 2.1*
7. Evacuees form a group. FP 1.6, BE 0.5, FP-BE 1.2, BE-FP 1.9*
8. Leaders and evacuees escape together. FP 0.7, BE 2.0*, FP-BE 0.2, BE-FP 3.4**
9. Evacuees try to behave the same as other evacuees. FP 0.9, BE 2.5*, FP-BE 1.5, BE-FP 0.9
*p<.05, **p<.01, ***p<.001 (df=23)
Fig. 33.3. Ranking of the four ways to learn evacuation (effectiveness)
33.4 Guiding Evacuation
Our living space consists of our home, office and public spaces. Studies on remote communication have predominantly focused on the first two spaces. The primary issue of these studies is how to use computer network technologies to connect distributed spaces. These studies have proposed various designs and technologies but share the same goal, which is the reproduction of face-to-face (FTF) communication environments. For example, media space research tried to connect distributed office spaces [2]. Telepresence and shared workspace research explored a way to integrate distributed deskwork spaces [4, 14]. Spatial workspace collaboration research dealt
with spatially configured workspaces [18]. CVE research proposed using virtual environments as virtual workspaces [1]. And this kind of effort still continues [15]. A recent additional issue is how to use the technologies for enhancing collocated spaces [10]. We tackled a third but increasingly important issue: how to use the technologies to support remote communication in large-scale public spaces such as a central railway station. Those spaces have characteristic participants: staff administrating the space and visitors passing through it. Remote communication between them is important because a vast number of people gathers and appropriate guidance for crowd control is critical. Currently, surveillance cameras and announcement speakers connect the staff in a control room with the visitors in a public space. The staff can observe the visitors thanks to the cameras and talk to them through the speakers. This traditional communication system is not enough for individual guidance. The off-site staff in a control room can give overall guidance for all the visitors, but on-site staff working in the public space are necessary to give location-based guidance to each visitor. We devised a new way to guide each visitor remotely and a new communication environment for it, since conventional environments that aimed at the reproduction of FTF communication cannot be adapted to the case where every visitor is a candidate for on-demand guidance. The results of the evacuation learning experiment described in the previous section have several implications for the design of the communication environment. The surveillance cameras enable the staff to watch many fragmentary views of the public space. However, the results showed that bird's-eye views were better than first-person views. Thus, a single global view is better than a collection of fragmentary views. The announcement speakers can convey only uniform information to all the visitors.
However, the results showed that the group experiencing a first-person view could understand the situation much better if they observed the bird's-eye view after their first-person experience. This means that announced information should teach the visitors the overall situation surrounding them. Thus, the visitors need not only overall information, e.g. a fire breaks out, but site-specific information like "too many people are rushing into the stairs in front of you." Another limitation of the announcement speakers is that they cannot support two-way communication. The most interesting result was that the best way to learn the crowd behavior was to observe it first and experience it afterward. This result implies that the staff can derive useful information from the visitors. Thus, two-way communication is better than one-way communication. We built FreeWalk into an evacuation guidance system in which a public space is monitored by vision sensors instead of surveillance cameras and information is transmitted by mobile phones instead of announcement speakers. Figure 33.4 is a snapshot of our guidance system and the escaping passengers on a station's platform, with their mobile phones at hand. You can see a pointing person who stands in front of a large-scale touch screen. Suppose that this person is a station staff officer working in a control room. The screen displays the bird's-eye view of the simulated station visualized thanks to FreeWalk. The walking behavior of the human figures in the virtual station is generated according to the positional data transmitted from the real station, which is equipped with vision sensors that track the movements of the real-world escaping passengers. In the snapshot, the man is pointing at a human figure, which represents one of the passengers. When the touch screen detects this pointing operation, the system immediately activates the connection between the officer's headset and the passenger's mobile phone. This trick is possible because the headset is connected to a PC equipped with a special interface card which can control audio connections between the PC and several telephone lines. This simple coupling between the pointing operation and audio activation makes it easy for the staff to begin and end an instruction. As described above, the system provides the staff with a single global view of the public space and two-way communication channels with particular visitors, so that the staff can supply the visitors with site-specific information. Kyoto Station in Kyoto City is a central railway station where the number of visitors per day is more than 300,000. To install our evacuation guidance system in the station, we attached a vision sensor network to the station. We attached 12 sensors to the concourse area and 16 sensors to the platform. Figure 33.5(a) is the floor plan, on which the black dots show the sensors' positions, and Figure 33.5(b) shows how they have been installed. The vision sensor network can track passengers between the platform and the ticket gate. In Figure 33.5(c), you can see a CCD camera and a reflector with a special shape [23]. If we could expand the field of view (FOV) of each camera, we could reduce the number of required cameras. However, a widened FOV causes minus (barrel) distortion in the images taken by conventional cameras. The reflector of our vision sensor can eliminate such distortion. The shape of the reflector allows a plane that perpendicularly intersects the optical axis of the camera to be projected perspectively onto the camera plane.
As shown in Figure 33.5(d), this optical contrivance makes it possible to have a large FOV without distortion. From
Fig. 33.4. Evacuation guidance system the images taken by the cameras, the regions of moving objects are extracted using the background subtraction technique. The position of each moving object is determined based on geographical knowledge, including the position of the cameras, the occlusion edges in the views of the cameras, and the boundaries of walkable areas.
Fig. 33.5. Virtual Kyoto Station
Figure 33.5(e) is a screenshot of the simulated passengers synchronized with their retrieved positions. We named the communication form supported by our guidance system "transcendent communication" [26]. In transcendent communication, a user watches the bird's-eye view of the real world to grasp its situation and points at a particular location within the view to select a person or group of people to talk to. Figure 33.6 explains the difference between distributed communication and transcendent communication. In distributed communication, a virtual space is used for connecting real spaces, each of which contains its own participant. Therefore, the virtual space is a synthetic space, which transmits nonverbal cues such as proxemics and eye contact [27]. The goal of distributed communication is the reproduction of collocated communication. In transcendent communication, a virtual space is used for visualizing the bird's-eye view of the real space. Therefore, the virtual space is a projective space, which represents the real world situation. The goal of transcendent communication is not the reproduction of collocated communication but the production of asymmetric communication. Collocated communication is symmetric, since everyone has his/her first-person view to observe the others and can control the conversation floor. In distributed communication, participants should be basically reciprocal, since privacy is an important issue and intrusiveness should be avoided [3, 35]. On the contrary, in transcendent communication, the bird's-eye view helps a transcendent participant, e.g. someone from the station staff, become an intrusive observer who can administrate the immanent participants, in this case, the passengers. Transcendent participants need to observe immanent participants but immanent participants
do not need to observe transcendent participants. And only transcendent participants can control communication channels.
Fig. 33.6. Transcendent communication (real and virtual spaces in collocated, distributed, and transcendent configurations)
Figure 33.7 presents an example of "transcendent guidance", that is, guidance via transcendent communication. In Figure 33.7(a), the staff is watching a virtual station platform where too many people are rushing into the stairs at the right, while the left stairs are not crowded at all. The staff determines that a group of people following behind in the right crowd is not safe and should be guided towards the left stairs. In Figure 33.7(b), the staff connects communication channels with the group to begin guidance, instructing them to switch their destination from the right stairs to the left stairs, as shown in Figure 33.7(c). The current implementation in Kyoto Station does not allow us to do exactly the same thing as this example due to technological limitations. The image processing function of the vision sensor network does not work if the platform is crowded. And there is no mechanism to automatically track the phone numbers of the passengers. However, technological advances in perceptual user interfaces (PUI) [29] may soon be able to eliminate these implementation issues. Communication channel control should be very efficient in order to give good guidance. To explore its interaction design, we implemented two different kinds of user interfaces. In the GUI version shown in Figure 33.4, touching characters and talking are coupled. In the PUI version shown in Figure 33.8, we used an eye-tracking device instead of a touch screen to couple gazing at characters and talking. The PUI version gives a much more seamless feeling than the GUI version, since a vocal channel is established immediately when a user looks at the character to talk to. However, gaze is a single pointing device while a touch screen enables a user to use at least two devices, i.e. his or her two hands. Even if the screen can only detect a single spot being touched at a given time, the two hands are more efficient than a single hand or a gaze.
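As a schematic illustration of this coupling (the class and method names and the way a passenger's phone line is reached are assumptions, not the system's real interface), selecting a human figure, whether by touching or by gazing, simply opens the audio channel to the corresponding passenger, and deselecting closes it.

```python
class ChannelController:
    """Couples figure selection (touch or gaze) with the staff-to-passenger audio line."""

    def __init__(self, telephony):
        self.telephony = telephony  # wrapper around the telephone interface card
        self.active = None          # phone number of the currently guided passenger

    def on_select(self, figure):
        """Called when the officer touches or gazes at a human figure on the screen."""
        if self.active is not None:
            self.telephony.hang_up(self.active)
        self.active = figure.phone_number
        self.telephony.connect(self.active)  # headset <-> passenger's mobile phone

    def on_deselect(self):
        """Called when the pointer or gaze leaves the figure: end the instruction."""
        if self.active is not None:
            self.telephony.hang_up(self.active)
            self.active = None
```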
Fig. 33.7. Transcendent guidance ((a) watch, (b) connect, (c) guide)
Fig. 33.8. Gaze-and-talk interaction
33.5 Conclusion
We presented two examples of crisis management applications of our virtual city simulator, FreeWalk. The first application is virtual evacuation simulation, where
learners can observe multi-agent crowd behavior simulations described in the Q language and also take part in the simulation as avatars. The second application is the transcendent guidance system, which visualizes real-world pedestrians in the virtual city and enables location-based remote guidance. The key feature of our simulator is this inclusion of humans in crowd behavior simulations of urban spaces. In the simulations, each person can be simulated as an agent, an avatar, or a projective agent that visualizes context information retrieved from a real-world person walking around a smart environment. The development of the two applications showed that the design principles of real-world systems could be derived from virtual simulations. We designed the transcendent guidance system based on the result of the evacuation simulation experiment. The transfer of the design principles was made possible by the correspondence between the two different viewpoints (first-person and bird's-eye views) and the two different kinds of users (transcendent and immanent users). We think this study implies a new method of software design.
Acknowledgements. This work was conducted as part of the Digital City Project supported by the Japan Science and Technology Agency. I express my special gratitude to Toru Ishida, the project leader. This work would also have been impossible without the contribution of Satoshi Koizumi and Hideaki Ito. I express my thanks to the Municipal Transportation Bureau and General Planning Bureau of Kyoto city for their cooperation. I received a lot of support in the construction of the simulation environment from Toshio Sugiman, Shigeyuki Okazaki, and Ken Tsutsuguchi. Thanks to Hiroshi Ishiguro, Reiko Hishiyama, Shinya Shimizu, Tomoyuki Kawasoe, Toyokazu Itakura, CRC Solutions, Mathematical Systems, and CAD Center for their efforts in the development and experiment. The source code of FreeWalk and Q is available at http://www.lab7.kuis.kyoto-u.ac.jp/freewalk/ and http://www.digitalcity.jst.go.jp/Q/.
References
1. Benford, S., Greenhalgh, C., Rodden, T. and Pycock, J. Collaborative Virtual Environments. Communications of the ACM, 44(7), 79-85, 2001.
2. Bly, S. A., Harrison, S. R. and Irwin, S. Media Spaces: Bringing People Together in a Video, Audio, and Computing Environment. Communications of the ACM, 36(1), 28-47, 1993.
3. Borning, A. and Travers, M. Two Approaches to Casual Interaction over Computer and Video Networks. International Conference on Human Factors in Computing Systems (CHI91), 13-19, 1991.
4. Buxton, W. Telepresence: Integrating Shared Task and Person Spaces. Canadian Conference on Graphics Interface (GI92), 123-129, 1992.
5. Cassell, J., Bickmore, T., Billinghurst, M., Campbell, L., Chang, K., Vilhjalmsson, H. and Yan, H. Embodiment in Conversational Interfaces: Rea. International Conference on Human Factors in Computing Systems (CHI99), 520-527, 1999.
6. Cavazza, M., Charles, F. and Mead, S. J. Interacting with Virtual Characters in Interactive Storytelling. International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS2002), 318-325, 2002.
7. Greenhalgh, C. and Benford, S. Massive: A Collaborative Virtual Environment for Teleconferencing. ACM Transactions on Computer-Human Interaction, 2(3), 239-261, 1995.
8. Hagsand, O. Interactive Multiuser VEs in the DIVE System. IEEE MultiMedia, 3(1), 30-39, 1996.
9. Helbing, D., Farkas, I.J. and Vicsek, T. Simulating Dynamical Features of Escape Panic. Nature, 407(6803), 487-490, 2000.
10. Huang, E. M. and Mynatt, E. D. Semi-Public Displays for Small, Co-located Groups. International Conference on Human Factors in Computing Systems (CHI2003), 49-56, 2003.
11. Isbell, C. L., Kearns, M., Kormann, D., Singh, S. and Stone, P. Cobot in LambdaMoo: A Social Statistics Agent. National Conference on Artificial Intelligence (AAAI2000), 36-41, 2000.
12. Ishida, T. Digital City Kyoto: Social Information Infrastructure for Everyday Life. Communications of the ACM, 45(7), 76-81, 2002.
13. Ishida, T. Q: A Scenario Description Language for Interactive Agents. IEEE Computer, 35(11), 54-59, 2002.
14. Ishii, H., Kobayashi, M. and Arita, K. Iterative Design of Seamless Collaboration Media. Communications of the ACM, 37(8), 83-97, 1994.
15. Jancke, G., Venolia, G. D., Grudin, J., Cadiz, J. J. and Gupta, A. Linking Public Spaces: Technical and Social Issues. International Conference on Human Factors in Computing Systems (CHI2001), 530-537, 2001.
16. Johnson, W.L., Rickel, J.W. and Lester, J.C. Animated Pedagogical Agents: Face-to-Face Interaction in Interactive Learning Environments. International Journal of Artificial Intelligence in Education, 11, 47-78, 2000.
17. Kendon, A. Spatial Organization in Social Encounters: the F-formation System. A. Kendon, Ed., Conducting Interaction: Patterns of Behavior in Focused Encounters, Cambridge University Press, 209-237, 1990.
18. Kuzuoka, H. Spatial Workspace Collaboration: a SharedView Video Support System for Remote Collaboration Capability. International Conference on Human Factors in Computing Systems (CHI92), 533-540, 1992.
19. Lea, R., Honda, Y., Matsuda, K. and Matsuda, S. Community Place: Architecture and Performance. Symposium on Virtual Reality Modeling Language (VRML97), 41-50, 1997.
20. Linturi, R., Koivunen, M. and Sulkanen, J. Helsinki Arena 2000 - Augmenting a Real City to a Virtual One. T. Ishida, K. Isbister, Ed., Digital Cities, Technologies, Experiences, and Future Perspectives. Lecture Notes in Computer Science 1765, Springer-Verlag, New York, 83-96, 2000.
21. Macedonia, M. R., Zyda, M. J., Pratt, D. R., Barham, P. T. and Zeswitz, S. NPSNET: A Network Software Architecture for Large-Scale Virtual Environments. Presence, 3(4), 265-287, 1994.
22. Murakami, Y., Ishida, T., Kawasoe, T. and Hishiyama, R. Scenario Description for Multiagent Simulation. International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS2003), 369-376, 2003.
23. Nakamura, T. and Ishiguro, H. Automatic 2D Map Construction using a Special Catadioptric Sensor. International Conference on Intelligent Robots and Systems (IROS2002), 196-201, 2002.
24. Nakanishi, H., Nakazawa, S., Ishida, T., Takanashi, K. and Isbister, K. Can Software Agents Influence Human Relations? - Balance Theory in Agent-mediated Communities
33 Virtual City Simulator
437
-. International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS2003), 717-724. 2003. 25. Nakanishi, H. FreeWalk: A Social Interaction Platform for Group Behavior in a Virtual Space. International Journal of Human Computer Studies, 60(4), 421-454, 2004. 26. Nakanishi, H., Koizumi, S., Ishida, T. and Ito, H. Transcendent Communication: Location-Based Guidance for Large-Scale Public Spaces. International Conference on Human Factors in Computing Systems (CHI2004), 655-662, 2004. 27. Okada, K., Maeda, R, Ichikawa, Y. and Matsushita, Y. Multiparty Videoconferencing at Virtual Social Distance: MAJIC Design. International Conference on Computer Supported Cooperative Work (CSCW94), 385-393, 1994. 28. Okazaki, S. and Matsushita, S. A Study of Simulation Model for Pedestrian Movement with Evacuation and Queuing. International Conference on Engineering for Crowd Safety, 271-280, 1993. 29. Pentland, A. Perceptual Intelligence. Communications of the ACM, 43(3), 35-44, 2000. 30. Prendinger, H. and Ishizuka, M. Life-Like Characters: Tools, Affective Functions, and Applications. Springer Verlag, 2004. 31. Reynolds, C. W. Flocks, Herds, and Schools: A Distributed Behavioral Model. International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH87), 2534, 1987. 32. Rickel, J. and Johnson, W. L. Animated Agents for Procedural Training in Virtual Reality: Perception, Cognition, and Motor Control. Applied Artificial Intelligence, 13, 343-382, 1999. 33. Sugawara, S., Suzuki, G., Nagashima, Y, Matsuura, M., Tanigawa H. and Moriuchi, M. Interspace: Networked Virtual World for Visual Communication. lEICE Transactions on Information and Systems, E77-D(12), 1344-1349, 1994. 34. Sugiman T. and Misumi J. Development of a New Evacuation Method for Emergencies: Control of Collective Behavior by Emergent Small Groups. Journal of Applied Psychology, 73(1), 3-10, 1988. 35. Tang, J. C. and Rua, M. Montage: Providing Teleproximity for Distributed Groups. International Conference on Human Factors in Computing Systems (CHI94), 37-43, 1994. 36. Tecchia, R, Loscos, C. and Chrysanthou, Y Image-Based Crowd Rendering. IEEE Computer Graphics and Applications. 22(2), 36-43, 2002. 37. Tsutsuguchi, K., Shimada, S., Suenaga, Y Sonehara, N. and Ohtsuka, S. Human Walking Animation based on Foot Reaction Force in the Three-dimensional Virtual World. Journal of Visualization and Computer Animation, 11(1), 3-16, 2000. 38. Waters, R. C. and Barrus, J. W. The Rise of Shared Virtual Environments. IEEE Spectrum, 34(3), 20-25, 1997.
34 Neurocomputing for Certain Bioinformatics Tasks
Shubhra Sankar Ray, Sanghamitra Bandyopadhyay, Pabitra Mitra, and Sankar K. Pal
Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India
{shubhra_r, sanghami, pabitra_r, sankar}@isical.ac.in
Summary. Different bioinformatics tasks like gene sequence analysis, gene finding, protein structure prediction and analysis, gene expression with microarray analysis and gene regulatory network analysis are described along with some classical approaches. The relevance of intelligent systems and neural networks to these problems is mentioned. Different neural network based algorithms to address the aforesaid tasks are then presented. Finally, some limitations of the current research activity are provided. An extensive bibliography is included. Key words: biological data mining, soft computing, computational biology, genomics, proteomics, multilayer perceptron, self organizing map
34.1 Introduction
Bioinformatics can be viewed as the use of computational methods to make biological discoveries [1]. It is an interdisciplinary field involving biology, computer science, mathematics and statistics to analyze biological sequence data, genome content and arrangement, and to predict the function and structure of macromolecules. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be derived [2]. There are three important sub-disciplines within bioinformatics: a) development of new algorithms and models to assess relationships among the members of a large biological data set in a way that allows researchers to access existing information and to submit new information as it is produced; b) analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and c) development and implementation of tools that enable efficient access and management of different types of information. Artificial neural networks (ANN), a biologically inspired technology, provide machinery for adaptation and curve fitting, guided by the principles of biological neural networks.
ANN have been studied for many years with the hope of achieving human-like performance, particularly in the field of pattern recognition. They are efficient, adaptive and robust classifiers, producing near-optimal solutions and achieving high speed via massive parallelism. Therefore, the application of ANN for solving certain problems of bioinformatics, which need optimized computation and robust, fast, close-to-optimal solutions, appears to be appropriate and natural. Moreover, the errors generated in experiments with bioinformatics data can be handled by the robust characteristics of ANN and minimized during the training process. The problem of integrating ANN and bioinformatics constitutes a new research area. This article provides a survey of the various neural network based techniques that have been developed over the past few years for different bioinformatics tasks. In Section 34.2 we describe the elements of bioinformatics along with their biological basis. In Section 34.3 different bioinformatics tasks are explained. Then we explain the relevance of ANN in bioinformatics in Section 34.4. Different ANN based methods available to address the bioinformatics tasks are explained in Section 34.5 and Section 34.6. Finally, conclusions and some future research directions are presented.
34.2 Elements of Bioinformatics
DNA (deoxyribonucleic acid) and proteins are biological macromolecules built as long linear chains of chemical components. A DNA strand consists of a large sequence of nucleotides, or bases; for example, there are more than 3 billion bases in the human DNA sequence. DNA plays a fundamental role in different bio-chemical processes of living organisms in two respects. First, it contains the templates for the synthesis of proteins, which are essential molecules for any organism [3]. The second role in which DNA is essential to life is as a medium to transmit hereditary information (namely the building plans for proteins) from generation to generation. The units of DNA are called nucleotides. One nucleotide consists of one nitrogen base, one sugar molecule (deoxyribose) and one phosphate. The four nitrogen bases are denoted by the letters A (adenine), C (cytosine), G (guanine) and T (thymine). A linear chain of DNA is paired to a complementary strand. The complementary property stems from the ability of the nucleotides to establish specific pairs (A-T and G-C). The pair of complementary strands then forms the double helix that was first suggested by Watson and Crick in 1953. Each strand therefore carries the entire information, and the biochemical machinery guarantees that the information can be copied over and over again even when the "original" molecule has long since vanished. A gene is primarily made up of a sequence of triplets of nucleotides (exons). Introns (non-coding sequences) may also be present within a gene. Not all portions of the DNA sequence are coding; a coding zone indicates that it is a template for a protein. As an example, for the human genome only 3%-5% of the sequence is coding, i.e., constitutes the genes. There are sequences of nucleotides within the
DNA that are spliced out progressively in the process of transcription and translation. In brief, the DNA consists of three types of non-coding sequences.
1. Intergenic regions: regions between genes that are ignored during the process of transcription.
2. Intragenic regions (or introns): regions within the genes that are spliced out from the transcribed RNA to yield the building blocks of the genes, referred to as exons.
3. Pseudogenes: genes that are transcribed into the RNA and stay there, without being translated, due to the action of a nucleotide sequence.
Fig. 34.1. Various parts of a DNA: intergenic regions (IR), genes, and exons and introns within a gene
Proteins are made up of 20 different amino acids (or "residues"), which are denoted by 20 different letters of the alphabet. Each of the 20 amino acids is coded by one or more triplets (or codons) of the nucleotides making up the DNA. Based on the genetic code, the linear string of DNA is translated, via mRNA, into a linear string of amino acids, i.e., a protein [3].
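As a concrete (and highly simplified) illustration of the complementarity and codon-based coding just described, the following Python sketch computes the reverse complement of a DNA string and translates one reading frame. The sequence and the deliberately partial codon table are invented for illustration only.

```python
# Illustrative sketch: Watson-Crick complementarity and codon translation.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Deliberately partial codon table (DNA alphabet); '*' marks a stop codon.
CODON_TABLE = {
    "ATG": "M", "TTT": "F", "GGC": "G", "GAA": "E",
    "TGC": "C", "TAA": "*", "TAG": "*", "TGA": "*",
}

def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A-T, G-C pairing)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def translate(dna: str) -> str:
    """Translate successive codons until a stop codon or an unknown triplet."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "?")  # '?' = codon missing from the toy table
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

if __name__ == "__main__":
    seq = "ATGTTTGGCGAATAA"          # toy sequence, not a real gene
    print(reverse_complement(seq))   # TTATTCGCCAAACAT
    print(translate(seq))            # MFGE
```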
34.3 Bioinformatics Tasks
The different biological problems studied within the scope of bioinformatics can be broadly classified into two categories, genomics and proteomics, which include genes, proteins, and amino acids. We describe below the different tasks involved in their analysis along with their utilities.
34.3.1 Gene Sequence Analysis
The evolutionary basis of sequence alignment is based on the principles of similarity and homology [4]. Similarity is a quantitative measure of the fraction of two genes that is identical in terms of observable quantities. Homology is the conclusion drawn from data that two genes share a common evolutionary history; no metric is associated with this. The tasks of sequence analysis are as follows:
Sequence Alignment
An alignment is a mutual arrangement of two or more sequences that exhibits where the sequences are similar and where they differ. An optimal alignment is one that exhibits the most correspondences and the least differences. It is the alignment with the highest score, but it may or may not be biologically meaningful. Basically there are two types of alignment methods: global alignment and local alignment. Global alignment [5] maximizes the number of matches between the sequences along the entire length of the sequences. Local alignment [6] gives the highest-scoring local match between two sequences.
Gene Finding and Promoter Identification
In general, a DNA strand consists of a large sequence of nucleotides, or bases. Due to the huge size of the databases, manual searching for genes, which code for proteins, is not practical. Therefore, identification of the genes from large DNA sequences is an important problem in bioinformatics [7]. A cell mechanism recognizes the beginning of a gene or gene cluster with the help of a promoter. The promoter is a key regulatory sequence before each gene in the DNA that serves as an indication to the cellular mechanism (transcription) that a gene is ahead. For example, the codon AUG (which codes for methionine) also signals the start of a gene. Recognition of regulatory sites in DNA fragments has become particularly popular because of the increasing number of completely sequenced genomes and the mass application of DNA chips.
34.3.2 Protein Analysis
Proteins are polypeptides, formed within cells as a linear chain of amino acids [8]. Within and outside of cells, proteins serve a myriad of functions, including structural roles (cytoskeleton), catalysts (enzymes), transporters to ferry ions and molecules across membranes, and hormones, to name just a few. There are twenty different amino acids that make up essentially all proteins on earth. The different tasks involved in protein analysis are as follows:
Multiple Sequence Alignment
In order to characterize protein families, one needs to identify shared regions of homology in a multiple sequence alignment (this generally happens when a sequence search has revealed homologies in several sequences). Clustering methods can do alignments automatically but are subject to some restrictions; manual and eye validations are necessary in some difficult cases. The most practical and widely used method in multiple sequence alignment is the hierarchical extension of pairwise alignment methods, where the principle is that a multiple alignment is achieved by successive application of pairwise methods. Multiple amino acid sequence alignment techniques [1] are usually performed to fit one of the following scopes:
(a) finding the consensus sequence of several aligned sequences; (b) helping in the prediction of the secondary and tertiary structures of new sequences; and (c) providing a preliminary step in molecular evolution analysis using phylogenetic methods for constructing phylogenetic trees.
Protein Motif Search
Protein motif search [7] allows searching for a personal protein pattern in a sequence (a personal sequence or an entry of GenBank). Proteins are derived from a limited number of basic building blocks (domains). Evolution has shuffled these modules, giving rise to a diverse repertoire of protein sequences; as a result, proteins can share a global or local relationship. Protein motif search is a technique for searching sequence databases to uncover common domains/motifs of biological significance that categorize a protein into a family.
Structural Genomics
Structural genomics is the prediction of the 3-dimensional structure of a protein from the primary amino acid sequence [9]. This is one of the most challenging tasks in bioinformatics. The four levels of protein structure (Figure 34.2) are (a) primary structure: the sequence of amino acids that compose the protein; (b) secondary structure: the spatial arrangement of the atoms constituting the main protein backbone, such as alpha helices and beta strands; (c) tertiary structure: formed by packing secondary structural elements into one or several compact globular units called domains; and (d) quaternary structure: the final protein may contain several polypeptide chains arranged together.
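The motif search task described above boils down to scanning sequences for a short pattern. A minimal sketch is shown below; the PROSITE-like pattern and the toy sequences are invented for illustration, not taken from a real motif database.

```python
import re

# Hypothetical PROSITE-style pattern: N, anything except P, then S or T, then anything except P
# (the classic N-glycosylation-like pattern shape, used here purely as an example).
MOTIF = re.compile(r"N[^P][ST][^P]")

def find_motifs(sequence: str):
    """Return (start position, matched substring) pairs for every motif occurrence."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(sequence)]

# Toy protein sequences (single-letter amino acid codes), invented for illustration.
proteins = {
    "protA": "MKNHTACWNPSVL",
    "protB": "GGGAAATTTCCC",
}
for name, seq in proteins.items():
    print(name, find_motifs(seq))   # protA matches 'NHTA' at position 2; protB has no match
```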
Fig. 34.2. Different levels of protein structures
Sequence similarity methods can predict the secondary and tertiary structures based on homology to known proteins. Secondary structure prediction can be made
using Chou-Fasman [9], GOR, neural network, and nearest neighbor methods. Methods of tertiary structure prediction involve energy minimization, molecular dynamics, and stochastic searches of conformational space.
34.3.3 Gene Expression and Microarrays
Gene expression is the process by which a gene's coded information is converted into the structures present and operating in the cell. Expressed genes include those that are transcribed into mRNA and then translated into protein and those that are transcribed into RNA but not translated into protein (e.g., transfer and ribosomal RNAs). Not all genes are expressed, and gene expression involves the study of the expression level of genes in the cells under different conditions. Conventional wisdom is that gene products that interact with each other are more likely to have similar expression profiles than if they do not [10]. Microarray technology [11] allows expression levels of thousands of genes to be measured at the same time. Comparison of gene expression between normal and diseased (e.g., cancerous) cells is also done by microarray. There are several names for this technology: DNA microarrays, DNA arrays, DNA chips, gene chips, and others. A microarray is typically a glass (or some other material) slide, on to which DNA molecules are attached at fixed locations (spots). There may be tens of thousands of spots on an array, each containing a huge number of identical DNA molecules (or fragments of identical molecules), of lengths from twenty to hundreds of nucleotides. For gene expression studies, each of these molecules ideally should identify one gene or one exon in the genome; however, in practice this is not always so simple due to families of similar genes in a genome. The spots are either printed on the microarrays by a robot, synthesized by photo-lithography (similar to computer chip production), or deposited by ink-jet printing. Many unanswered, and important, questions could potentially be answered by correctly selecting, assembling, analyzing, and interpreting microarray data. Clustering is commonly used in microarray experiments to identify groups of genes that share similar expression. It may also help in identifying promoter sequence elements that are shared among genes.
34.3.4 Gene Regulatory Network Analysis
Since almost all cells in a particular organism have an identical genome, differences in gene expression, and not the genome content, are responsible for cell differentiation (how different cell types develop from a fertilized egg) during the life of the organism. In gene regulation (how gene expression is switched on and off) an important role is played by a type of proteins called transcription factors. The transcription factors can attach (bind) to specific parts of the DNA, called transcription factor binding sites (i.e., specific, relatively short combinations of A, T, C or G), which are located in so-called promoter regions. Specific promoters are associated with particular genes
and are generally not too far from the respective genes, though some regulatory effects can be located as far as 30,000 bases away, which makes the definition of the promoter difficult. Transcription factors control gene expression by binding the gene's promoter and either activating (switching on) the gene's transcription, or repressing it (switching it off). Transcription factors are gene products themselves, and therefore in turn can be controlled by other transcription factors. Transcription factors can control many genes, and some (probably most) genes are controlled by combinations of transcription factors. Feedback loops are also possible. Therefore we can talk about gene regulation networks.
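A toy numerical sketch of such a regulatory network can make the idea concrete: three hypothetical genes, a hand-made weight matrix in which positive entries activate and negative entries repress, and a feedback loop. The numbers are invented and do not model any real system.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# w[i, j] = influence of gene j's product on gene i's expression
# (positive = activation, negative = repression); all values are invented.
w = np.array([
    [0.0,  2.0, -1.5],   # gene 0 activated by gene 1, repressed by gene 2
    [1.0,  0.0,  0.0],   # gene 1 activated by gene 0 (feedback loop between 0 and 1)
    [0.0, -2.0,  0.0],   # gene 2 repressed by gene 1
])
bias = np.array([-0.5, -0.5, 0.5])

x = np.array([0.2, 0.8, 0.1])        # initial expression levels in [0, 1]
for t in range(5):
    x = sigmoid(w @ x + bias)        # one regulatory update step
    print(t, np.round(x, 3))
```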
34.4 Relevance of Neural Network in Bioinformatics
Artificial neural network (ANN) models try to emulate the biological neural network with electronic circuitry. Recently, ANN have found widespread use for classification tasks and function approximation in bioinformatics. For bioinformatics data analysis, mainly two types of networks are employed: "supervised" neural networks (SNN) and "unsupervised" neural networks (UNN). The main applications of SNN (e.g. the multilayer perceptron) are function approximation, classification, pattern recognition and feature extraction, and prediction. Moreover, they are able to detect second and higher order correlations in patterns. This is specially important in biological systems, which frequently display a nonlinear behavior. These networks are able to determine the relevant features in unknown data after training (with known data). This principle coined the term "supervised" networks. Correspondingly, "unsupervised" networks (e.g. Kohonen self organizing maps) can be applied to clustering and feature extraction tasks with new (unknown) data. The main characteristics of ANN are:
a) adaptability to new data/environments
b) robustness/ruggedness to failure of components
c) speed via massive parallelism
d) optimality w.r.t. error
Let us now explain the functioning of ANN in bioinformatics with an example of protein secondary structure prediction from a linear sequence of amino acids (Figure 34.4).
Step 1: In the ANN, usually a certain number of input "nodes" are each connected to every node in a hidden layer.
Step 2: Every residue in a PDB (Protein Data Bank) entry can be associated with one of three secondary structures (HELIX, SHEET or neither: COIL). ANN are designed with 21 input nodes (one for each residue including a null residue) and three output nodes coding for each of the three possible secondary structure assignments (HELIX, SHEET and COIL).
Step 3: Each node in the hidden layer is then connected to every node in the final output layer.
Step 4: The input and output nodes are restricted to binary values (1 or 0) when loading the data onto the network during training, and the weights are then modified by the program itself during the training process.
Step 5: HELIX can be coded as 0,0,1 on the three output nodes; SHEET can be coded as 0,1,0 and COIL as 1,0,0. A similar binary coding scheme can be used for the 20 input nodes for the 20 amino acids.
Step 6: To consider a moving window of n residues at a time, the input layer should contain 20 x n nodes plus one node at each position for a null residue.
Step 7: Each node will "decide" to send a signal to the nodes it is connected to, based on evaluating its transfer function after all of its inputs and connection weights have been summed.
Step 8: Over 100 protein structures were used to train the network.
Step 9: Training proceeds by holding particular data constant onto both the input and output nodes and iterating the network in a process that modifies the connection weights until the changes made to them approach zero.
Step 10: When such convergence is reached, the network is said to be trained and is ready to receive new (unknown) experimental data.
Step 11: Now the connection weights are not changed, and the values of the hidden and output nodes are calculated in order to determine the structure of the input sequence of proteins.
Selection of unbiased and normalized training data, however, is probably just as important as the network architecture in the design of a successful NN.
Fig. 34.3. A linear chain of amino acids is applied as input to the ANN
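Steps 2, 5 and 6 above amount to a sliding-window, one-hot encoding. The following sketch builds such an input vector for a window of n = 3 residues (21 positions per residue, the 21st being the null residue used to pad the chain ends) together with the three-node output coding. It follows the description above only schematically and is not the original implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"        # 20 residues; index 20 = null residue
STRUCT_CODES = {"HELIX": (0, 0, 1), "SHEET": (0, 1, 0), "COIL": (1, 0, 0)}

def encode_window(sequence: str, center: int, n: int = 3):
    """One-hot encode a window of n residues centred at `center` (21 * n bits)."""
    half = n // 2
    bits = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0] * 21
        if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
            one_hot[AMINO_ACIDS.index(sequence[pos])] = 1
        else:
            one_hot[20] = 1                  # null residue at the ends of the chain
        bits.extend(one_hot)
    return bits

seq = "MKVLAW"                               # toy sequence
x = encode_window(seq, center=0)             # 63-bit input vector for the first residue
y = STRUCT_CODES["HELIX"]                    # target coding 0,0,1 as in Step 5
print(len(x), sum(x), y)                     # 63 3 (0, 0, 1)
```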
34.5 ANN in Bioinformatics
Let us now describe the different attempts made using ANNs in certain tasks of bioinformatics in the broad domains of sequence analysis, structure prediction, and gene analysis described in Section 34.3.
34.5.1 Sequence Alignment
Given inputs extracted from an aligned column of DNA bases and the underlying Perkin Elmer Applied Biosystems (ABI) fluorescent traces, Allex et al. [12] trained a neural network to determine correctly the consensus base for the column. They empirically compared five representations; one uses only base calls and the others include trace information. The networks that incorporate trace information into their input representations attained the most accurate consensus sequences. Consensus accuracies ranging from 99.26% to 99.98% are achieved for coverages from two to six aligned sequences. In contrast, the network that only uses base calls in its input representation has over double that error rate. GenTHREADER is a neural network architecture that predicts similarity between gene sequences [13]. The effects of sequence alignment score and pairwise potential are the network outputs. GenTHREADER was successfully used for structure prediction in two cases: case 1, ORF MG276 from Mycoplasma genitalium was predicted to share structural similarity with 1HGX; case 2, MG276 shares a low sequence similarity (10% sequence identity) with 1HGX. In [14] a molecular alignment method with the Hopfield Neural Network (HNN) is discussed. Other related investigations in sequence analysis are available in [15, 16].
34.5.2 Gene Finding and Promoter Identification
The application of artificial neural networks for discriminating the coding regions of eukaryotic genes is investigated in [17]. Using over 300 genes, different discrimination models are built which are relevant to gene promoter regions, poly(A) signals, splice site locations of introns and noose structures. The results showed that, as long as the coding length is definite, the only correct coding region can be chosen from the large number of possible solutions discriminated by the neural networks. In [18] the quantitative similarity among tRNA gene sequences was acquired by analysis with an artificial neural network. The evolutionary relationship derived from the ANN results was consistent with those from other methods. A new sequence was recognized to be a tRNA-like gene by a neural network on the analysis of similarity. The work of Lukashin et al. [19] is one of the earlier investigations that discussed the problem of recognition of promoter sites in DNA sequences in a neural network framework. The learning process involves a small (of the order of 10%) part of the total set of promoter sequences. During this procedure the neural network develops a system of distinctive features (key words) to be used as a reference in identifying promoters against the background of random sequences. The learning quality is then tested with the whole set. The efficiency of promoter recognition has been reported as 94 to 99%, and the probability of an arbitrary sequence being identified as a promoter is 2 to 6%.
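The consensus-calling task tackled in [12] can be pictured with a much simpler baseline: a per-column majority vote over the aligned base calls, ignoring the trace information that the neural network additionally exploits. The aligned fragments below are invented.

```python
from collections import Counter

def consensus(aligned_reads):
    """Majority-vote consensus over the columns of equal-length aligned reads.
    '-' marks a gap; ties are broken arbitrarily by Counter ordering."""
    calls = []
    for column in zip(*aligned_reads):
        bases = [b for b in column if b != "-"]
        calls.append(Counter(bases).most_common(1)[0][0] if bases else "-")
    return "".join(calls)

reads = [          # toy aligned fragments (coverage 3)
    "ACGT-ACGA",
    "ACGTTACGA",
    "ACGTTACTA",
]
print(consensus(reads))   # ACGTTACGA
```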
In [20] a multilayered feed-forward ANN architecture is trained for predicting whether a given nucleotide sequence is a mycobacterial promoter sequence. The ANN is used in conjunction with the caliper randomization (CR) approach for determining the structurally/functionally important regions in the promoter sequences. This work shows that ANN are an efficient tool for predicting mycobacterial promoter sequences and determining structurally/functionally important sub-regions therein. Other related investigations in this field are available in [21, 22, 23].
34.5.3 Protein Analysis
The most successful techniques for prediction of the three-dimensional structure of a protein rely on aligning the sequence of a protein of unknown structure to a homologue of known structure. Such methods fail if there is no homologue in the structural database, or if the technique for searching the structural database is unable to identify homologues that are present. The work of Qian et al. [24] is one of the earlier investigations that discussed the protein structure prediction problem in a neural network framework. After training the ANN with more than 100 X-ray crystal structures of globular proteins, a prediction accuracy of 64% was obtained for the secondary structure of non-homologous proteins. Rost et al. [25, 26] took advantage of the fact that a multiple sequence alignment contains more information about a protein than the primary sequence alone. Instead of using a single sequence as input into the network, they used a sequence profile that resulted from the multiple alignments. Design of ANN with bi-directional training and the use of the entire protein sequence as simultaneous input, instead of a shifting window of fixed length, has led to prediction accuracy above 71%. In [27] a method has been developed using ANN for the prediction of beta-turn types I, II, IV and VIII. For each turn type, two consecutive feed-forward back-propagation networks with a single hidden layer have been used. The first sequence-to-structure network has been trained on single sequences as well as on PSI-BLAST PSSM. The output from the first network, along with PSIPRED [28] predicted secondary structure, has been used as input for the second-level structure-to-structure network. The networks have been trained and tested on a non-homologous data set of 426 protein chains by seven-fold cross-validation. The prediction performance for each turn type is improved by using multiple sequence alignment, the second-level structure-to-structure network and PSIPRED predicted secondary structure information. Wood et al. [29] compared the cascade-correlation ANN architecture [30] with back-propagation ANN using a constructive algorithm and found that cascade-correlation achieves predictive accuracies comparable to those obtained by back-propagation, in shorter time. Ding et al. [31] used Support Vector Machine (SVM) and Neural Network (NN) learning methods as base classifiers for protein fold recognition, without relying on sequence similarity. Other related investigations in protein structure prediction are available in [32, 33, 34, 35, 36, 37, 38].
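The general idea in [31] of combining an SVM and a neural network as base classifiers can be sketched generically with scikit-learn. The feature vectors and fold labels below are random placeholders, and the snippet is not the authors' pipeline; it only illustrates the ensemble-of-base-classifiers pattern.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))          # placeholder feature vectors (e.g. composition features)
y = rng.integers(0, 4, size=200)        # placeholder fold labels (4 classes)

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),                                   # SVM base classifier
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)),   # NN base classifier
    ],
    voting="soft",                       # combine the base classifiers' class probabilities
)
scores = cross_val_score(ensemble, X, y, cv=5)
print(scores.mean())                     # near chance level on random data, as expected
```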
34.5.4 Gene Expression and Microarray
Most of the analysis of the enormous amount of information provided on microarray chips with regard to cancer patient prognosis has relied on clustering techniques and other standard statistical procedures. These methods are inadequate in providing the reduced gene subsets required for perfect classification. ANN trained on microarray data from DLBCL lymphoma patients have, for the first time, been able to predict the long-term survival of individual patients with 100% accuracy [39]. Here it is shown that differentiating the trained network can narrow the gene profile to less than three dozen genes for each classification, and that artificial neural networks are a superior tool for digesting microarray data. Bicciato et al. [40] described a computational procedure for pattern identification, feature extraction, and classification of gene expression data through the analysis of an autoassociative neural network model. The identified patterns and features contain critical information about gene-phenotype relationships observed during changes in cell physiology. The methodology has been tested on two different microarray datasets, acute human leukemia and human colon adenocarcinoma. Vohradsky [41] used artificial neural networks as a model of the dynamics of gene expression. The significance of the regulatory effect of one gene product on the expression of other genes of the system is defined by a weight matrix. The model considers multigenic regulation including positive and/or negative feedback. The process of gene expression is described by a single network and by two linked networks where transcription and translation are modeled independently; each of these processes is described by a different network controlled by a different weight matrix. Methods for computing the parameters of the model from experimental data are also shown. On the basis of published microarray data, Ando et al. [42] described a fuzzy neural network (FNN) model to analyze gene expression profiling data for the precise and simple prediction of the survival of DLBCL patients. From data on 5857 genes, this model identified four genes (CD10, AA807551, AA805611 and IRF-4) that could be used to predict prognosis with 93% accuracy. Relevant investigations for gene expression and microarrays are also available in [43, 44, 45, 46, 47].
34.5.5 Gene Regulatory Network
Adaptive double self-organizing map (ADSOM) [48] provides a clustering technique for identifying gene regulatory networks. It has a flexible topology and it performs clustering and cluster visualization simultaneously, thereby requiring no a-priori knowledge about the number of clusters. ADSOM is developed from a technique known as the double self-organizing map (DSOM). DSOM combines features of the popular self-organizing map (SOM) with two-dimensional position vectors to provide a visualization tool to decide how many clusters are needed, but its free parameters are difficult to control to guarantee correct results and convergence. ADSOM updates its free parameters during training and it allows convergence of its position vectors to a fairly consistent number of clusters provided that its initial number
of nodes is greater than the expected number of clusters. The number of clusters can be identified by visually counting the clusters formed by the position vectors after training. The reliability of ADSOM in identifying the number of clusters is proven by applying it to publicly available gene expression data from multiple biological systems such as yeast, human, and mouse. It may be noted that gene regulatory network analysis is a very recent research area, and neural network applications to it are scarce. Using simulated data, Ritchie et al. [49] optimized back-propagation neural network architectures using genetic programming to improve the ability of neural networks to model, identify, characterize and detect nonlinear gene-gene interactions in studies of common human diseases. They showed that the genetic programming optimized neural network is superior to the traditional back-propagation neural network approach in terms of predictive ability and power to detect gene-gene interactions when non-functional polymorphisms are present.
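The SOM machinery that DSOM and ADSOM build on can be illustrated with a bare-bones self-organizing map trained on synthetic expression profiles. The data are random and the implementation is a minimal textbook SOM; the real methods add position vectors and adaptive free parameters on top of this.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a tiny rectangular SOM; returns the codebook of shape (gx, gy, dim)."""
    rng = np.random.default_rng(seed)
    gx, gy = grid
    codebook = rng.normal(size=(gx, gy, data.shape[1]))
    coords = np.array([[i, j] for i in range(gx) for j in range(gy)]).reshape(gx, gy, 2)
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)
        sigma = sigma0 * (1 - epoch / epochs) + 0.1
        for x in rng.permutation(data):
            dists = np.linalg.norm(codebook - x, axis=2)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)   # best matching unit
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=2)
            h = np.exp(-grid_d2 / (2 * sigma ** 2))                 # neighborhood function
            codebook += lr * h[..., None] * (x - codebook)
    return codebook

# Synthetic "expression profiles": 60 genes x 8 conditions, two noisy clusters.
rng = np.random.default_rng(1)
profiles = np.vstack([
    rng.normal(loc=+1.0, size=(30, 8)),
    rng.normal(loc=-1.0, size=(30, 8)),
])
som = train_som(profiles)
winners = [np.unravel_index(np.argmin(np.linalg.norm(som - p, axis=2)), som.shape[:2])
           for p in profiles]
print(winners[:5], winners[-5:])   # genes from the two clusters map to different map regions
```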
34.6 Other Bioinformatics Tasks Using ANN
Dopazo et al. [50] described an unsupervised growing self-organizing neural network that expands itself following the taxonomic relationships existing among the sequences being classified. The binary tree topology of this neural network, in contrast to other more classical neural network topologies, permits an efficient classification of sequences. The growing nature of this procedure allows it to be stopped at the desired taxonomic level without the necessity of waiting until a complete phylogenetic tree is produced. The time to convergence is approximately a linear function of the number of sequences. This neural network methodology is an excellent tool for the phylogenetic analysis of a large number of sequences. Parbhane et al. [51] utilize an artificial neural network (ANN) for the prediction of DNA curvature in terms of retardation anomaly. The ANN captured the phase information and increased helix flexibility, and the effects of base pairs in determining the extent of DNA curvature were modeled. The network predictions validate the known experimental results and also explain how the base pairs affect the curvature. The results suggest that ANN can be used as a model-free tool for studying DNA curvature. Drug resistance is a very important factor influencing the failure of current HIV therapies. The ability to predict the drug resistance of HIV protease mutants may be useful in developing more effective and longer lasting treatment regimens. In [52] a classifier was constructed based on the sequence data of various drug resistant mutants. Self-organizing maps were first used to extract the important features and cluster the patterns in an unsupervised manner. This was followed by subsequent labelling based on the known patterns in the training set. The classifier using the structure information is able to correctly recognize previously unseen mutants with an accuracy of between 60 and 70%. The method is superior to a random classifier. Neural network computations on DNA and RNA sequences are used in [53] to demonstrate that data compression is possible in these sequences. The result implies
that a certain discrimination should be achievable between structured vs random regions. The technique is illustrated by computing the compressibility of short RNA sequences such as tRNA.
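The compressibility idea in [53] can be crudely imitated without a neural network by comparing how well a general-purpose compressor shrinks a repetitive sequence versus a shuffled one. zlib here is only a stand-in for the networks used in the paper, and the sequences are synthetic.

```python
import random
import zlib

def compression_ratio(seq: str) -> float:
    """Compressed size divided by original size; lower means more structure."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, level=9)) / len(raw)

random.seed(0)
structured = "ACGUACGUACGU" * 20                               # highly repetitive toy RNA
shuffled = "".join(random.sample(structured, len(structured))) # same composition, no order
print(round(compression_ratio(structured), 3))   # small ratio: highly compressible
print(round(compression_ratio(shuffled), 3))     # noticeably larger ratio: less compressible
```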
34.7 Conclusions and Discussion
The rationale for applying computational approaches to facilitate the understanding of various biological processes is mainly a) to provide a more global perspective in experimental design, and b) to capitalize on the emerging technology of database mining - the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better characterized organisms. Neural networks appear to be a very powerful artificial intelligence (AI) paradigm to handle these issues [54]. Other soft computing tools, like fuzzy set theory and genetic algorithms, integrated with ANN [55], may also be used, based on the principles of Case Based Reasoning [56]. Even though the current approaches in biocomputing are very helpful in identifying patterns and functions of proteins and genes, they are still far from being perfect. They are not only time-consuming, requiring Unix workstations to run on, but might also lead to false interpretations and assumptions due to necessary simplifications. It is therefore still mandatory to use biological reasoning and common sense in evaluating the results delivered by a biocomputing program. Also, for evaluating the trustworthiness of the output of a program, it is necessary to understand its mathematical/theoretical background in order to finally come up with a useful and sensible analysis.
References
1. Baldi, P., Brunak, S.: Bioinformatics: The Machine Learning Approach. MIT Press, Cambridge, MA (1998)
2. Altman, R.B., Valencia, A., Miyano, S., Ranganathan, S.: Challenges for intelligent systems in biology. IEEE Intelligent Systems 16 (2001) 14-20
3. Setubal, J., Meidanis, J.: Introduction to Computational Molecular Biology. International Thomson Publishing, 20 Park Plaza, Boston, MA 02116 (1999)
4. Nash, H., Blair, D., Grefenstette, J.: Comparing algorithms for large-scale sequence analysis. Proc. 2nd IEEE International Symposium on Bioinformatics and Bioengineering (BIBE'01) (2001) 89-96
5. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48 (1970) 443-453
6. Smith, T.F., Waterman, M.S.: Identification of common molecular sequences. Journal of Molecular Biology 147 (1981) 195-197
7. Fickett, J.W.: Finding genes by computer: The state of the art. Trends in Genetics 12 (1996) 316-320
8. Salzberg, S.L., Searls, D.B., Kasif, S.: Computational Methods in Molecular Biology. Elsevier Science, Amsterdam (1998)
9. Chou, P., Fasman, G.: Prediction of the secondary structure of proteins from their amino acid sequence. Advances in Enzymology 47 (1978) 145-148
10. Luscombe, N.M., Greenbaum, D., Gerstein, M.: What is Bioinformatics? A Proposed Definition and Overview of the Field. Yearbook of Medical Informatics (2001) 83-100
11. Quackenbush, J.: Computational analysis of microarray data. Nature Reviews Genetics 2 (2001) 418-427
12. Allex, C.F., Shavlik, J.W., Blattner, F.R.: Neural network input representations that produce accurate consensus sequences from DNA fragment assemblies. Bioinformatics 15 (1999) 723-728
13. Jones, D.T.: GenTHREADER: An Efficient and Reliable Protein Fold Recognition. Journal of Molecular Biology 287 (1999) 797-815
14. Arakawa, M., Hasegawa, K., Funatsu, K.: Application of the novel molecular alignment method using the Hopfield Neural Network to 3D-QSAR. J Chem Inf Comput Sci. 43 (2003) 1396-1402
15. Hirst, J.D., Sternberg, M.J.: Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks. Biochemistry 31 (1992) 7211-7218
16. Petersen, S.B., Bohr, H., Bohr, J., Brunak, S., Cotterill, R.M., Fredholm, H., Lautrup, B.: Training neural networks to analyse biological sequences. Trends Biotechnol. 8 (1990) 304-308
17. Cai, Y., Chen, C.: Artificial neural network method for discriminating coding regions of eukaryotic genes. Comput Appl Biosci. 11 (1995) 497-501
18. Sun, J., Song, W.Y., Zhu, L.H., Chen, R.S.: Analysis of tRNA gene sequences by neural network. J Comput Biol. 2 (1995) 409-416
19. Lukashin, A.V., Anshelevich, V.V., Amirikyan, B.R., Gragerov, A.I., Frank-Kamenetskii, M.D.: Neural network models for promoter recognition. J Biomol Struct Dyn. 6 (1989) 1123-1133
20. Kalate, R.N., Tambe, S.S., Kulkarni, B.D.: Artificial neural networks for prediction of mycobacterial promoter sequences. Comput Biol Chem. 27 (2003) 555-564
21. Sherriff, A., Ott, J.: Applications of neural networks for gene finding. Adv Genet. 42 (2001) 287-297
22. Reese, M.G.: Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Comput Chem. 26 (2001) 51-56
23. Mahadevan, I., Ghosh, I.: Analysis of E.coli promoter structures using neural networks. Nucleic Acids Res. 22 (1994) 2158-2165
24. Qian, N., Sejnowski, T.J.: Predicting the secondary structure of globular proteins using neural network models. Journal of Molecular Biology 202 (1988) 865-884
25. Rost, B., Sander, C.: Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. National Academy of Sciences USA 90 (1993) 7558-7562
26. Rost, B., Sander, C.: Prediction of protein secondary structure at better than 70% accuracy. Journal of Molecular Biology 232 (1993) 584-599
27. Kaur, H., Raghava, G.P.: A neural network method for prediction of beta-turn types in proteins using evolutionary information. Bioinformatics (2004), accepted
28. McGuffin, L.J., Bryson, K., Jones, D.T.: The PSIPRED protein structure prediction server. Bioinformatics 16 (2000) 404-405
29. Wood, M.J., Hirst, J.D.: Predicting protein secondary structure by cascade-correlation neural networks. Bioinformatics 20 (2004) 419-420
30. Pasquier, C., Promponas, V.J., Hamodrakas, S.J.: PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications. Proteins 44 (2001) 361-369
31. Ding, C.H., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17 (2001) 349-358
32. Berry, E.A., Dalby, A.R., Yang, Z.R.: Reduced bio basis function neural network for identification of protein phosphorylation sites: comparison with pattern recognition algorithms. Comput Biol Chem. 28 (2004) 75-85
33. Shepherd, A.J., Gorse, D., Thornton, J.M.: A novel approach to the recognition of protein architecture from sequence using Fourier analysis and neural networks. Proteins 50 (2003) 290-302
34. Pollastri, G., Baldi, P., Fariselli, P., Casadio, R.: Improved prediction of the number of residue contacts in proteins by recurrent neural networks. Bioinformatics 17 (2001) 234-242
35. Lin, K., May, A.C., Taylor, W.R.: Threading using neural network (TUNE): the measure of protein sequence-structure compatibility. Bioinformatics 18 (2002) 1350-1357
36. Cai, Y.D., Liu, X.J., Chou, K.C.: Prediction of protein secondary structure content by artificial neural network. J Comput Chem. 24 (2003) 727-731
37. Dietmann, S., Frommel, C.: Prediction of 3D neighbours of molecular surface patches in proteins by artificial neural networks. Bioinformatics 18 (2002) 167-174
38. Riis, S., Krogh, A.: Improving Prediction of Protein Secondary Structure using Structured Neural Networks and Multiple Sequence Alignments. Journal of Computational Biology 3 (1996) 163-183
39. O'Neill, M.C., Song, L.: Neural network analysis of lymphoma microarray data: prognosis and diagnosis near-perfect. BMC Bioinformatics 4 (2003) 13-20
40. Bicciato, S., Pandin, M., Didone, G., Bello, C.D.: Pattern identification and classification in gene expression data using an autoassociative neural network model. Biotechnol Bioeng. 81 (2003) 594-606
41. Vohradsky, J.: Neural network model of gene expression. FASEB J. 15 (2001) 846-854
42. Ando, T., Suguro, M., Hanai, T., Kobayashi, T., Honda, H., Seto, M.: Fuzzy neural network applied to gene expression profiling for predicting the prognosis of diffuse large B-cell lymphoma. Jpn J Cancer Res. 93 (2002) 1207-1212
43. Sawa, T., Ohno-Machado, L.: A neural network-based similarity index for clustering DNA microarray data. Comput Biol Med. 33 (2003) 1-15
44. Spicker, J.S., Wikman, F., Lu, M.L., Cordon-Cardo, C., Workman, C., Ørntoft, T.F., Brunak, S., Knudsen, S.: Neural network predicts sequence of TP53 gene based on DNA chip. Bioinformatics 18 (2002) 1133-1134
45. Herrero, J., Valencia, A., Dopazo, J.: A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics 17 (2001) 126-136
46. Software, P.: PNN Technologies. (Pasadena, CA)
47. Liang, Y., George, E.O., Kelemen, A.: Bayesian Neural Network for Microarray Data. Technical Report, Department of Mathematical Sciences, University of Memphis, Memphis, TN 38152, U.S.A.
48. Ressom, H., Wang, D., Natarajan, P.: Clustering gene expression data using adaptive double self-organizing map. Physiol. Genomics 14 (2003) 35-46
49. Ritchie, M.D., White, B.C., Parker, J.S., Hahn, L., Moore, J.H.: Optimization of neural network architecture using genetic programming improves detection and modeling of gene-gene interactions in studies of human diseases. BMC Bioinformatics 4 (2003) 28-36
50. Dopazo, J., Carazo, J.M.: Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. J. Mol. Evol. 44 (1997) 226-233
51. Parbhane, R.V., Tambe, S., Kulkarni, B.D.: Analysis of DNA curvature using artificial neural networks. Bioinformatics 14 (1998) 131-138
52. Draghici, S., Potter, R.B.: Predicting HIV drug resistance with neural networks. Bioinformatics 19 (2003) 98-107
53. Alvager, T., Graham, G., Hutchison, D., Westgard, J.: Neural network method to analyze data compression in DNA and RNA sequences. J Chem Inf Comput Sci. 37 (1997) 335-337
54. Pal, S.K., Polkowski, L., Skowron, A.: Rough-neuro Computing: A Way of Computing with Words. Springer, Berlin (2003)
55. Pal, S.K., Mitra, S.: Neuro-fuzzy Pattern Recognition: Methods in Soft Computing Paradigm. John Wiley, NY (1999)
56. Pal, S.K., Shiu, S.C.K.: Foundations of Soft Case Based Reasoning. John Wiley, NY (2004)
35 Rough Set Based Solutions for Network Security*
Guoyin Wang, Long Chen, and Yu Wu
Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, P.R. China
[email protected]
*This paper is partially supported by the National Natural Science Foundation of P.R. China (No. 60373111), the Key Science and Technology Research Foundation of the State Education Ministry of P.R. China, the Application Science Foundation of Chongqing of P.R. China, and the Science and Technology Research Program of the Municipal Education Committee of Chongqing of P.R. China.
Summary. The problem of network security and intrusion detection is discussed at first, and then some data-mining-based methods are presented to solve these problems. The problems, possibilities, and methods of data mining solutions for intrusion detection are further analyzed. The state of the art of rough-set-based solutions for network security and an application framework are also discussed. It is shown that rough-set-based methods are promising in terms of detection accuracy, requirements on the training data set, and efficiency. Rough-set-based new techniques such as data reduction, incremental mining, uncertain data mining, and initiative data mining are suggested for intrusion detection systems.
Key words: rough set, network security, intrusion detection system, data mining
35.1 Introduction
With the rapid growth of interconnections among computer systems, network-based computer systems are playing increasingly vital roles in modern society. They have become the target of intrusions by potential intruders, and network security is becoming a major challenge. In order to meet this challenge, intrusion detection systems (IDS) are designed to protect network information systems. Intrusion detection is important in the network security framework. An IDS evaluates suspected intrusions and signals an alarm once a suspected intrusion happens. An IDS also watches for attacks that originate from within a system. Intrusion detection techniques are generally categorized into anomaly detection and misuse detection. Misuse detection systems use patterns of well-known attacks or weak spots of the system to match and identify known intrusion patterns or signatures. Anomaly detection systems attempt to quantify the usual or acceptable behaviors and flag irregular activities that deviate significantly from the established normal usage profiles as anomalies (i.e. potential intrusions).
Another useful classification for intrusion detection systems is according to their data source [1]. Normally, the data source determines the types of intrusions that can be detected. The two general categories are host-based detection and network-based detection. For host-based systems, the data source is an individual host on the network. In particular, these systems employ their host's operating system audit trail as the main source of input. Because host-based systems directly monitor the host data files and operating system processes, they can determine exactly which host resources are the targets of a particular attack. Recent research in intrusion detection techniques has shifted from user based intrusion detection to process based intrusion detection. Process based intrusion detection tools analyze the behavior of executing processes for possible intrusive activity. The premise of process based intrusion detection is that most computer security violations are made by misusing programs. When a program is misused, its behavior will differ from its normal usage. With the rapid development of computer networks, some traditional single-host based intrusion detection systems have been modified to monitor a number of hosts on a network. They transfer the monitored information from multiple monitored hosts to a central site for processing. These are termed distributed intrusion detection systems, such as IDES [2, 3], NSTAT [4], and AAFID [5]. Network-based intrusion detection systems employ network traffic as the main source of input. A set of traffic sensors is deployed within the network; these sensors perform local analysis, detect suspicious events and report to a central location. Recent developments in network oriented intrusion detection systems have moved the focus from network traffic to the computational infrastructure (the hosts and their operating systems) and the communication infrastructure (the network and its protocols). They use the network as just a source of security-relevant information. Network-based intrusion detection systems have been widened to address large, complex network environments. Examples of this trend include GrIDS (Graph based Intrusion Detection System) [6], EMERALD [7], NetStat [8], and CARDS (Coordinated Attack Response and Detection System) [9]. Intrusion detection systems can also be divided into passive and reactive intrusion detection systems. In a passive system, the IDS detects a potential security breach, logs the information and signals an alert. In a reactive system, the IDS responds to the suspicious activity by logging off a user or reprogramming the firewall to block network traffic from the suspected malicious source. There are several open questions for intrusion detection techniques [10]:
• Soundness of approach: Does the approach actually detect intrusions? Is it possible to distinguish anomalies related to intrusions from those related to other factors?
• Completeness of approach: Does the approach detect most, if not all, intrusions, or is a significant proportion of intrusions undetectable by this method?
• Timeliness of approach: Can we detect most intrusions before significant damage is done?
• Choice of metrics, statistical models, and profiles: What metrics, models, and profiles provide the best discriminating power? Which are cost-effective? What are the relationships between certain types of anomalies and different methods of intrusion?
• System design: How should a system based on the model be designed and implemented?
• Feedback: What effect should a detection of an intrusion have on the target system? Should an IDS system automatically direct the system to take certain actions?
• Social implications: How will an intrusion detection system affect the user community it monitors? Will it deter intrusion? Will the users feel their data are better protected? Will it be regarded as a step towards "big brother"? Will its capabilities be misused to that end?
The following criteria should be the goal of intrusion detection systems [11]:
• Completeness: All operations in a valid trace should be classified as valid (or normal).
• Consistency: For every invalid trace (or intrusion trace), a valid access specification should classify at least one operation as bad (or invalid).
• Compactness: The specification should be concise so that it can be inspected by a human and be suitable for real-time detection. One simple compactness measure is the number of rules (or clauses) in a specification.
In the following, problems and possible solutions of data mining in network security are discussed in section 35.2. Then, rough set based incremental rule generation algorithm is introduced in section 35.3. Several rough-set-based methods for network security are further discussed in section 35.4 in detail. Finally, in section 35.5, we conclude the paper and give some suggestions for future work.
35.2 Data Mining in Network Security Some commonly used techniques in data mining are: artificial neural networks, fuzzy sets, rough sets, decision trees, genetic algorithms, nearest neighbor method, statistics based rule induction, linear regression and linear predictive coding, et al. Most existing information systems have security flaws that render them susceptible to intrusions, penetrations, and other forms of abuse. Finding and fixing all these deficiencies is not feasible for technical and economic reasons. It is not easy to replace existing systems with known flaws with more secure systems because these
458
Guoyin Wang, Long Chen, and Yu Wu
systems have attractive features that are missing in the more-secure systems, or for economic reasons. Developing secure systems is extremely difficult, if not generally impossible. Even the most secure systems are vulnerable to abuses by insiders who misuse their privileges. Thus, the development of real-time intrusion detection systems is needed and important. However, real-time detection of previously unseen attacks with high accuracy and a low false alarm rate is still a challenge. Many recent approaches for intrusion detection have applied data mining techniques, which have been empirically proven effective [10,12]. One major drawback of data mining based approaches is that the data required for training is very expensive to produce [13]. Data mining based IDSs collect data from sensors that monitor some aspect of a system. Sensors may monitor network activity, system calls used by user processes, or file system access. They extract predictive features from the raw data stream being monitored to produce formatted data that can be used for detection. A detection model determines whether the data is intrusive. Algorithms for building detection models are also usually classified into two categories: misuse detection and anomaly detection. Misuse detection models are typically obtained by training on a large set of data in which the attacks have been manually labeled. It is very expensive to produce these data because each piece of data must be labeled as either normal or some particular attack. Anomaly detection models compare sensor data to normal patterns learned from a large amount of training data. The data used for training is required to be purely normal and does not contain any attacks. This data can be very expensive because the process of manually cleaning the data is quite time consuming. Models trained on data gathered from one environment may not perform well in some other environment. This means that in order to obtain the best intrusion detection models, data must be collected from all possible environments in which the intrusion detection system will be deployed. The cost of generating data sets can be very expensive and the cost incurred is a significant barrier to IDS deployment. Because possible malicious behaviors and intruder actions are potentially infinite, it is difficult and impossible to demonstrate all of them from a finite training corpus. Furthermore, the previously unseen attack is often the greatest threat. Finally, for reasons of privacy, it is desirable that a user-based anomaly-detection agent only employs data that originate from the profiled user or are publicly available. Releasing traces of one's own normal behaviors, even to assist the training of someone else's anomaly detector, runs the risk that the data will be abused to subvert the original user's security mechanisms. Thus, in many cases, only positive instances are available for training. Learning from only positive examples presents a challenge for classification, since it can easily lead to overgeneralization. The accuracy of data mining based detection models depends on sufficient training data and suitable feature set. According to the above analysis of intrusion detection systems, we can find that there are many unsolved problems in intrusion detection systems, such as,
• How to design a model describing the characteristics of normal and abnormal behaviors?
• How to design a model describing the characteristics of specific misuse intrusion behaviors?
• How to reduce the cost of collecting, storing, and processing the huge amounts of source data needed for mining knowledge of intrusion behaviors?
• How to adjust intrusion detection systems at low cost as the source data grows and users' behaviors change?
• How to implement intrusion detection efficiently, so as to reduce the damage caused by intrusions?
• How to increase the detection rate while decreasing the false positive rate at the same time?
It should be possible to solve the above problems using specific data mining techniques. Many data mining techniques have been used to model normal and abnormal behaviors, as well as many kinds of misuse intrusions. The critical issue is how to combine multiple simple detection models into one integrated intrusion detection system. Since it is very expensive to produce the training data required by data mining based intrusion detection systems, rough set theory [14] might be a suitable method for reducing the cost of collecting, storing, and processing the huge amounts of source data. Rough sets have a unique advantage in data reduction. They can be used to process the data available at present and to estimate the importance of the different data generated by each sensor for distinguishing each kind of intrusion behavior from normal behavior. Thus, we can learn which features are important for intrusion detection and which data are sufficient for each intrusion detection system, and we need not obtain, store, or process unnecessary sensor data.

The data obtained from IDS monitors grows quickly, and the behaviors and methods of intruders often vary as well. In order to detect new intrusion behaviors, data mining based intrusion detection systems have to be updated frequently by mining all collected data again, including both previously mined data and newly arriving data. Data mining processes are generally time consuming and very expensive, so it would be very helpful if incremental data mining methods were available. Fortunately, several data mining algorithms with incremental learning ability have been developed, such as the rough set and rule tree based incremental knowledge acquisition algorithm [15]. With such algorithms, previously mined data need not be mined again when processing newly arriving data: new detection models and knowledge can be extracted from the new data and added to the knowledge mined from the old data. Thus, the detection ability of an intrusion detection system can grow over time.
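As a small illustration of the data reduction idea, the sketch below computes a rough-set style attribute significance score over a toy decision table: an attribute is important if dropping it shrinks the positive region (the rows whose condition values determine the decision unambiguously). The attribute names and records are invented for illustration and are not taken from any real audit data.

from collections import defaultdict

def positive_region_size(rows, cond_attrs, dec_attr):
    """Count rows whose condition-attribute equivalence class is consistent
    (all rows in the class share the same decision value)."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        classes[key].append(row[dec_attr])
    return sum(len(v) for v in classes.values() if len(set(v)) == 1)

def dependency(rows, cond_attrs, dec_attr):
    return positive_region_size(rows, cond_attrs, dec_attr) / len(rows)

def attribute_significance(rows, cond_attrs, dec_attr):
    """Drop each attribute in turn and measure how much the dependency degree
    decreases; a larger drop means a more important sensor feature."""
    base = dependency(rows, cond_attrs, dec_attr)
    return {a: base - dependency(rows, [b for b in cond_attrs if b != a], dec_attr)
            for a in cond_attrs}

# Toy audit records (attribute names and values are hypothetical).
records = [
    {"duration": "short", "service": "http", "flag": "SF", "label": "normal"},
    {"duration": "short", "service": "http", "flag": "S0", "label": "attack"},
    {"duration": "long",  "service": "ftp",  "flag": "SF", "label": "normal"},
    {"duration": "long",  "service": "ftp",  "flag": "S0", "label": "attack"},
]
print(attribute_significance(records, ["duration", "service", "flag"], "label"))
# -> {'duration': 0.0, 'service': 0.0, 'flag': 1.0}: only "flag" is needed here.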
35.3 Rough Set Based Incremental Rule Generation Algorithm
There are thousands of variations of known attacks, and new attacks emerge in networks every day. How to update the rule set is a serious problem for every intrusion detection system. One technical solution in classical rough set theory is to
combine new attack records with the original training data to form a new training set, and then repeat the procedure of rule generation with this new training data. Nevertheless, as the training data grows quickly, it becomes more and more difficult and time-consuming to maintain and update the rule set. To some extent, the knowledge acquired before seems useless, since all knowledge has to be re-studied even when only one new attack appears. In addition, most of the work is repetitive and time-consuming. To solve this problem, incremental learning is an appropriate choice. The human brain is a typical example of incremental learning: when an undergraduate is learning new knowledge, he need not learn again the knowledge he has already learned in elementary school and high school; he can update his knowledge structure with the new knowledge.

An incremental knowledge acquisition algorithm (RRIA) [15] based on rough sets and a rule tree is introduced to update the rule set of an intrusion detection system. When a new attack type or a new variation of an attack appears, we need not repeat the procedure of rule generation with the whole training data. The only thing we have to do is to generate new rules from the new attack records and then combine them with the original rule set. The original rule set is stored in the following rule tree format [15]:
1) A rule tree is composed of one root node, some leaf nodes and some internal nodes.
2) The root node represents the whole rule set.
3) Each path from the root node to a leaf node represents a rule.
4) Each internal node represents an attribute test. Each branch represents a possible value of an attribute in the rule set. If an attribute is reduced in some rules, a special branch is needed to represent it, and the value of the attribute in such a rule is taken to be "*". "*" is different from any possible value of the attribute.

When a new rule needs to be added to the rule tree, the following incremental learning steps are performed. Processing more than one record simply repeats this process.
1) Begin with the root node of the rule tree.
2) Scan the rule tree and find a path that matches the new rule. If there are several such paths, some strategy is used to select one. A matched rule is thus obtained.
3) If the decision value of the matched rule is different from that of the new rule, re-study the data (rules) related to the matched rule.
4) If the new rule is not matched, insert it into the rule tree.

Experimental results show that the RRIA algorithm achieves higher speed and an equal (or slightly higher) recognition rate compared with classical rough set algorithms. Because the RRIA algorithm can reduce the data records required for rule acquisition and shorten the runtime of updating the rule set while maintaining satisfactory performance, it is appropriate for processing huge data online.
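A minimal sketch of the rule tree and its incremental update step may make the procedure above more concrete. The class and method names below are illustrative only; they do not reproduce the actual data structures of RRIA [15].

class RuleTree:
    """Illustrative rule tree: one level per condition attribute, '*' marks a
    reduced (don't-care) attribute; a leaf stores the decision value."""
    def __init__(self, attributes):
        self.attributes = attributes   # fixed attribute ordering
        self.root = {}

    def insert(self, rule, decision):
        node = self.root
        for attr in self.attributes:
            node = node.setdefault(rule.get(attr, "*"), {})
        node["decision"] = decision

    def match(self, rule):
        """Return the decision of the first path matching the rule (a branch
        labelled '*' matches any value), or None if no path matches."""
        def walk(node, depth):
            if depth == len(self.attributes):
                return node.get("decision")
            value = rule.get(self.attributes[depth], "*")
            for branch in (value, "*"):
                if branch in node:
                    found = walk(node[branch], depth + 1)
                    if found is not None:
                        return found
            return None
        return walk(self.root, 0)

    def update(self, rule, decision):
        """Incremental step: insert unmatched rules; a matched rule with a
        conflicting decision triggers re-study (only flagged in this sketch)."""
        matched = self.match(rule)
        if matched is None:
            self.insert(rule, decision)
            return "inserted"
        if matched != decision:
            return "conflict: re-study related rules"
        return "already covered"

tree = RuleTree(["protocol", "flag"])
tree.insert({"protocol": "tcp", "flag": "S0"}, "attack")
print(tree.update({"protocol": "tcp", "flag": "SF"}, "normal"))   # inserted
print(tree.update({"protocol": "tcp", "flag": "S0"}, "normal"))   # conflict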
35.4 Rough Set Based Methods for Network Security

35.4.1 Junk Email Detection Based on Rough Set
Email is very familiar to us, and it is important to eliminate unsolicited or junk emails. Rough set based filters can be used to detect junk emails on the Internet. One major technique is to build filters along the email transfer route. We noticed that many junk email filters do not make use of all the security information in an email, much of which exists in the email header rather than in the text and attachments. Let us look at our decision table for email headers, which helps to judge whether an email is junk or not [16]. Eleven condition attributes and one decision attribute are defined as follows.

Condition attributes:
A. Number of "Received" fields, that is, the number of times the email was relayed (one "Received" per relay).
B. Number of addressees.
C. Number of email route interruptions. For example, it is a route interruption when the receiver's domain name and IP address in the former "Received" field differ from those in the latter "Received" field.
D. Number of mismatches between a domain name and its corresponding IP address. This attribute is rather important, but owing to the dynamics of domain names and limits on network resources it was difficult to obtain in our tests; therefore its default value is zero in this paper.
E. Number of missing sending-host domain names after "from" in a "Received" field.
F. Number of missing receiving-host domain names after "by" in a "Received" field.
G. Number of missing sending-host IP addresses after "from" in a "Received" field.
H. Whether the original sender address in the "From" field is accordant with that in the "Received" fields. The original sender address is given in the last "Received" field after "from" or "by".
I. Whether the destination address in the "To" field is accordant with that in the "Received" fields. The latter is the actual receiver.
J. If the "Delivered-To" field exists, whether it is accordant with the "To" field. Its default value is 1 (yes).
K. If the "Return-Path" field exists, whether it is accordant with the "From" field. Its default value is 1 (yes).

Decision attribute:
L. Whether the email is junk or not. Legitimate emails take the value 1 and junk emails the value 2.

There are several processes involved in mining knowledge from such a decision table (a small illustrative encoding of one table row is sketched after this list):
• Preprocessing of the data, including dealing with missing attribute values and data discretization.
• Attribute reduction.
• Value reduction.
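To make the decision table concrete, the fragment below encodes one email header as a row over the condition attributes A-K and the decision L. All header values, and the toy rule at the end, are invented examples; they are not taken from the mined rule set in [16].

# One row of the email-header decision table (values are invented examples).
# Condition attributes A-K follow the definitions above; decision L is
# 1 = legitimate, 2 = junk.
header_row = {
    "A": 3,   # number of "Received" fields (relay hops)
    "B": 25,  # number of addressees
    "C": 1,   # number of route interruptions
    "D": 0,   # domain-name / IP mismatches (default 0 here)
    "E": 1,   # missing sender domain after "from"
    "F": 0,   # missing receiver domain after "by"
    "G": 1,   # missing sender IP after "from"
    "H": 0,   # "From" accordant with last "Received" (0 = no)
    "I": 0,   # "To" accordant with actual receiver (0 = no)
    "J": 1,   # "Delivered-To" accordant with "To" (default 1)
    "K": 1,   # "Return-Path" accordant with "From" (default 1)
    "L": 2,   # decision: junk
}

# A rule mined from such a table might look like this (made-up) example:
# many addressees, a route interruption and a mismatched "From" field => junk.
def toy_rule(row):
    return 2 if row["B"] > 10 and row["C"] >= 1 and row["H"] == 0 else 1

assert toy_rule(header_row) == header_row["L"]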
We obtained some useful junk email detection knowledge from email headers. Our simulation results demonstrate that, when mining a selected corpus of malicious emails, the filter achieves high efficiency and a high identification rate.

35.4.2 IDS Architecture Based on Rough Set
A network-based IDS based on rough set theory was recently developed [17]. The system architecture is shown in Fig. 35.1. Since the detection efficiency of a single host would be very low in a high-speed network, a distributed framework is adopted. It is mainly composed of the Sensor, Protocol Decoder, Rule Generation Component, Intrusion Detection Module, Alarm/Response Plug-in, and Administrator. Their functions are as follows.
• Sensor is responsible for collecting network data. It is actually an interface to capture the information flowing through a network card on a machine or the scan port of a switch. Evidently, its location determines the scope of intrusion detection: for example, intrusion detection can be done on a single machine, a network segment, or a gateway.
• Protocol Decoder analyzes the raw data collected by the Sensor. It first reassembles the data according to their protocol, then converts raw packet and connection data into a format that the Rule Generation Component and Intrusion Detection Module can use.
• Rule Generation Component integrates rough set theory with rule generation. The whole procedure of rule generation was discussed in section 35.3. In this component, a rule model is generated in the form of a rule tree; each path from the root node of the rule tree to a leaf node represents a rule.
• Intrusion Detection Module examines the rule tree generated by the Rule Generation Component and matches the incoming raw data against the rules. The results of rule matching are transferred to the Alarm/Response Plug-in, which handles them using a strategy defined in advance.
• Alarm/Response Plug-in is responsible for dealing with the results from the Intrusion Detection Module. When external attacks occur, it notifies the administrator by means of e-mails, console alerts, log entries, or a visualization tool.
• Administrator is the interface between the intrusion detection system and its users. Users can update the rule set manually by checking the log file, define the detection strategy, and so on.
As shown in Fig. 35.1, the system performs its task in two phases: the rule training phase and the detection phase. In the rule training phase, audit data labeled with attacks are used as training data for rule generation; the output of this phase is a rule tree. Notably, this phase is executed offline, before intrusion detection. In the detection phase, actual detection is carried out by matching incoming network data against the rule tree, and the incoming data are labeled as normal behavior or as a certain attack. Based on the detection results, the Alarm/Response Plug-in takes action.
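A schematic sketch of these two phases is given below. The function names and the tiny stand-in rule model are ours, chosen only to show the data flow; they are not the implementation of [17].

def rule_training_phase(labelled_records, learn_rules):
    """Offline phase: learn detection rules (e.g. a rule tree) from audit data
    that has been labelled with attack types."""
    return learn_rules(labelled_records)

def detection_phase(rule_model, decoded_packets, alarm):
    """Online phase: match each decoded packet against the rule model and pass
    anything that is not normal to the alarm/response plug-in."""
    for packet in decoded_packets:
        verdict = rule_model(packet)
        if verdict != "normal":
            alarm(packet, verdict)

# Tiny stand-in model and alarm, just to show the end-to-end data flow
# (the stand-in learner ignores its training input).
model = rule_training_phase(
    [{"flag": "S0", "label": "syn-flood"}, {"flag": "SF", "label": "normal"}],
    learn_rules=lambda rows: (
        lambda pkt: "syn-flood" if pkt.get("flag") == "S0" else "normal"),
)
detection_phase(model, [{"flag": "SF"}, {"flag": "S0"}],
                alarm=lambda pkt, v: print("ALERT:", v, pkt))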
Fig. 35.1. Architecture of IDS (components shown: Sensor, Protocol Decoder, Intrusion Detection Module, Rule Set, Training Data, Alarm/Response Module, Alarm Log, Administrator; data flows: Rule Training and Detection)
The system is capable of extracting a set of detection rules from network packet features. It is effective and suitable for online intrusion detection with low cost and high efficiency. Simulations on the KDDCUP data [18] show that rough set theory is very appropriate for intrusion detection. The incremental rule generation algorithm RRIA introduced in the previous section is used to detect new attacks; it can update the rule set quickly and conveniently. Compared with other methods, our method requires a smaller training data set and less effort to collect training data.

35.4.3 Other Rough Set Related Methods for Network Security
Another method was presented for anomaly intrusion detection with low cost and high efficiency [19]. It extracts detection rules, using a rough set algorithm, from the system call sequences generated during the normal execution of a process, which are taken as the normal behavior model. It is capable of detecting the abnormal operating status of a process and thus reporting a possible intrusion. Compared with other methods, it requires a smaller training data set and less effort to collect training data, and is more suitable for real-time detection. Empirical results show that this method is promising in terms of detection accuracy, required training data, and efficiency. Not only can rough set based rule generation be used for network security; other rough set concepts may also be useful. For example, rough inclusion has been used for matching normal behaviors against abnormal behaviors [20]. All the above methods reach a common conclusion: rough set methods are suitable and promising for network security.
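The exact immune model of [20] is not reproduced here, but the standard rough inclusion of one finite set in another, which underlies this kind of behavior matching, can be sketched in a few lines (the system call sequences used below are invented).

def rough_inclusion(x, y):
    """Degree to which set x is included in set y: |x ∩ y| / |x| (1.0 for an
    empty x, by convention). Values close to 1 indicate that an observed
    behavior pattern x is largely covered by the normal-behavior model y."""
    x, y = set(x), set(y)
    if not x:
        return 1.0
    return len(x & y) / len(x)

normal_calls = {("open", "read", "close"), ("open", "write", "close")}
observed = {("open", "read", "close"), ("open", "exec", "socket")}
print(rough_inclusion(observed, normal_calls))  # 0.5 -> partially abnormal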
35.5 Conclusions and Future Work
We have introduced the current status of intrusion detection systems (IDS) and data mining research, discussed some problems in the technologies, methods and models of intrusion detection, and presented some possible data mining based ways of solving these problems. Rough set based methods for network security, together with new techniques such as data reduction and incremental mining, were discussed. Simulation results show that rough set methods are suitable and promising for network security.

It is also very expensive to obtain information from all possible sensors; it would save time and money if we could mine partial or incomplete data. In addition, the behaviors of normal computer users and network users differ greatly, and inconsistent data are often generated by detection sensors. Thus, it is also very important for data mining based intrusion detection systems to deal with uncertain data containing incomplete or inconsistent records. Besides traditional techniques for processing uncertain data, such as statistical methods, new techniques based on rough set theory have been developed in recent years [21,22]; they could help to cope with these problems. Some intrusion actions might go unnoticed even though they have occurred, owing to the limitations of our knowledge about intrusion behaviors, and new intrusion techniques are continually developed and used by intruders. Automatic (self, initiative) data mining algorithms driven by the data itself [23] would be useful to mine knowledge about this kind of intrusion action and improve the detection rate.

Our long-term goal is to design and build an intelligent, accurate, and flexible intrusion detection system with distributed and real-time characteristics. The system will have low false negative and false positive rates and will not be easily fooled by small variations in intrusion patterns. Scalable knowledge acquisition, uncertain data mining, and initiative data mining may be integrated into one system to improve its performance in the future.
References
1. Noel, S., Wijesekera, D. and Youman, C. (2002). Modern Intrusion Detection, Data Mining, and Degrees of Attack Guilt, in: Applications of Data Mining in Computer Security, Daniel Barbara and Sushil Jajodia (eds.), Kluwer Academic Publishers.
2. Denning, D. E. (1987). An Intrusion-Detection Model, IEEE Transactions on Software Engineering, vol. 13, pp. 222-232.
3. Lunt, T. F. (1993). A Survey of Intrusion Detection Techniques, Computers and Security, vol. 12(4), pp. 405-418.
4. Kemmerer, R. A. (1997). NSTAT: A Model-based Real-time Network Intrusion Detection System, University of California Santa Barbara, Department of Computer Science, Santa Barbara, CA, Technical Report TR 1997-18.
5. Spafford, E. H. and Zamboni, D. (2000). Intrusion Detection Using Autonomous Agents, Computer Networks, vol. 34, pp. 547-570.
6. Staniford-Chen, S., Cheung, S., Crawford, R., Dilger, M., Frank, J., Hoagland, J., Levitt, K., Wee, C., Yip, R. and Zerkle, D. (1996). GrIDS - A Graph Based Intrusion Detection System for Large Networks, Proceedings of the 19th National Information Systems Security Conference, Baltimore, MD, pp. 361-370.
7. Neumann, P. G. and Porras, P. A. (1999). Experience with EMERALD to Date, Proceedings of the First Usenix Workshop on Intrusion Detection and Network Monitoring, Santa Clara, CA, pp. 73-80.
8. Vigna, G. and Kemmerer, R. A. (1998). NetSTAT: A Network-based Intrusion Detection Approach, Proceedings of the 14th Annual Computer Security Applications Conference, Phoenix, AZ, pp. 25-34.
9. Yang, J., Ning, P., Wang, X. S. and Jajodia, S. (2000). CARDS: A Distributed System for Detecting Coordinated Attacks, Proceedings of the IFIP TC11 16th Annual Working Conference on Information Security, pp. 171-180.
10. Warrender, C., Forrest, S. and Pearlmutter, B. (1999). Detecting Intrusions Using System Calls: Alternative Data Models, 1999 IEEE Symposium on Security and Privacy, IEEE Computer Society, pp. 133-145.
11. Ko, C. (2000). Logic Induction of Valid Behavior Specifications for Intrusion Detection, 2000 IEEE Symposium on Security and Privacy, Berkeley, California, USA, pp. 142-153.
12. Lee, W., Stolfo, S. J. and Mok, K. (1999). A Data Mining Framework for Building Intrusion Detection Models, 1999 IEEE Symposium on Security and Privacy, pp. 120-132.
13. Eskin, E., Miller, M., Zhong, Z. D., Yi, G., Lee, W. A. and Stolfo, S. (2000). Adaptive Model Generation for Intrusion Detection Systems, Proceedings of the ACM CCS Workshop on Intrusion Detection and Prevention, Athens, Greece.
14. Wang, G. Y. (2001). Rough Set Theory and Knowledge Acquisition, Xi'an: Xi'an Jiaotong University Press.
15. Zheng, Z., Wang, G. Y. and Wu, Y. (2003). A Rough Set and Rule Tree Based Incremental Knowledge Acquisition Algorithm, LNAI 2639, Springer-Verlag, pp. 122-129.
16. Wu, Y., Li, Z. J., Luo, P. and Wang, G. Y. (2003). A New Anti-Spam Filter Based on Data Mining and Analysis of Email Security, Data Mining and Knowledge Discovery: Theory, Tools, and Technology V, pp. 147-154.
17. Li, Z. J., Wu, Y., Wang, G. Y., Hai, Y. J. and He, Y. R. (2004). A New Framework for Intrusion Detection Based on Rough Set Theory, SPIE Defense and Security Symposium, Orlando, Florida, USA, accepted and to appear.
18. http://kdd.ics.uci.edu/databases/kddcup99/
19. Cai, Z. M., Guan, X. H., Shao, R., Peng, Q. K. and Sun, G. J. (2003). A Rough Set Theory Based Method for Anomaly Intrusion Detection in Computer Network Systems, Expert Systems, vol. 20(5), pp. 251-259.
20. Li, X. J., Huang, Y. and Huang, H. K. (2003). A Computing Immune Model Based on Poisson Procedure and Rough Inclusion, Chinese Journal of Computers, vol. 26(1), pp. 71-76.
21. Wang, G. Y. (2002). Extension of Rough Set under Incomplete Information Systems, 2002 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1098-1103.
22. Wang, G. Y. and Liu, F. (2000). The Inconsistency in Rough Set Based Rule Generation, The Second International Conference on Rough Sets and Current Trends in Computing (RSCTC'2000), Canada, pp. 370-377.
23. Wang, G. Y. and He, X. (2003). A Self-learning Model under Uncertain Condition, Journal of Software, vol. 14(6), pp. 1096-1102.
36 Task Assignment with Dynamic Token Generation
Alessandro Farinelli, Luca Iocchi, Daniele Nardi, and Fabio Patrizi
University of Rome La Sapienza, Dipartimento di Informatica e Sistemistica, Via Salaria 113, Rome, Italy
[email protected]
Summary. The problem of assigning tasks to a group of agents acting in a dynamic environment is a fundamental issue for a MAS and is relevant to several real world applications. Several techniques have been studied to address this problem; however, when the system needs to scale up in size, communication quickly becomes an important issue to address. Moreover, in several applications the tasks to be assigned evolve dynamically and are perceived by the agents during mission execution. In this paper we present a distributed task assignment approach that ensures very low communication overhead and is able to manage dynamic task creation. The basic idea of our approach is to use tokens to represent tasks to be executed; each team member creates, executes and propagates tokens based on its current knowledge of the situation. We test and evaluate our approach by means of experiments using the RoboCup Rescue simulator.
36.1 Introduction
The problem of assigning tasks to a group of agents or robots acting in a dynamic environment is a fundamental issue for Multi Agent Systems (MAS) and Multi Robot Systems (MRS) and is relevant to several real world applications. Many techniques have been studied to address this problem in different scenarios, providing solutions that in different ways approximate the optimal solution of the Generalized Assignment Problem (GAP), which consists in assigning a predefined set of tasks (or roles) to a set of agents, maximizing an overall utility function that takes into account the capabilities of all the agents in the team. While GAP requires the definition of a static set of tasks, which must thus be known in advance, in many application domains the tasks to be accomplished are not known a priori, but are discovered dynamically during the execution of the mission. Furthermore, when the system needs to scale up in size, communication quickly becomes an important issue to address. The problem of dynamic task assignment has been studied and experimented with by many researchers in both the MRS (e.g. [3, 16, 10]) and MAS (e.g. [6, 4, 7, 13]) communities. Several different aspects of the problem have been investigated and several approaches proposed. However, the growing complexity of missions in which
robots and agents are involved pushes toward the development of novel solutions for task assignment which are able to address the more challenging issues posed by the applications. For example, auction based approaches to task assignment have been shown to fail in the RoboCup Rescue domain due to high communication requirements [8].

In this paper we present a distributed task assignment approach that is able to dynamically discover new tasks to be accomplished, according to the situation perceived by the agents during the execution of their activities, and to ensure very low communication overhead. We focus on task assignment for teams operating in environments that need to meet (soft) real time constraints in their mission execution, where the agents involved have similar functionalities but possibly varied capabilities. The reference scenario we are interested in has the following characteristics: i) the domain and the number of agents involved pose strict constraints on communication; ii) agents may perform one or more tasks, but within resource limits; iii) too many agents fulfilling the same task leads to conflicts that need to be avoided; iv) tasks are discovered during mission execution.

The basic idea of our approach is derived from previous work based on token passing [12]. Tokens are used to represent tasks that must be executed by the agents, and each team member creates, executes and propagates these tokens based on its knowledge of the environment. The basic approach assumes that one token is associated with every task to be executed and that the token is kept by the agent that is performing the task, or passed to another agent if the agent holding the token is not in a position to perform it. In the case of dynamic discovery of the tasks to be performed, and thus of dynamic token generation, the token passing approach must be appropriately extended in order to limit the number of tokens associated with the same task. Indeed, in our reference scenario optimal performance is obtained when a limited number of agents cooperate to execute the same task; when too many agents operate on a single task the overall performance decreases, since they ignore other tasks that evolve in the dynamic environment. The algorithm presented in this paper allows every agent to generate tokens dynamically whenever a task to be accomplished is perceived, while limiting the number of tokens associated with the same task and minimizing the bandwidth (i.e. communication messages among agents) required.

We test and evaluate our approach by means of experiments on a simulated scenario that models a team of fire-fighters engaged in fighting fires in a city. To this end, we use the RoboCup Rescue simulator, which models the evolution of fires in the buildings of a city, city traffic, the fire-fighters' actions of extinguishing fires, and the communication among them. In this scenario, the locations of the fires are not known a priori and the fire-fighter agents find them during their activities; in addition, fires may unpredictably spread over adjacent buildings if not extinguished in time. Moreover, communication constraints are very strict, since messages are both limited and costly (in terms of simulation time steps). The results reported in this paper show that the proposed extension of the token passing approach provides good performance in this scenario, while maintaining a very low communication bandwidth and thus significantly increasing the
scalability of the system. Therefore, the proposed approach is especially well-suited for large scale teams operating in dynamic environments, as compared to other dynamic task assignment methods that require a wider communication bandwidth.
36.2 Problem Definition
The definition of the problem considered in this paper is derived from the GAP problem [14], which consists in assigning a set of tasks (or roles) R = {r_1, ..., r_m} to a set of agents (or entities) E = {e_1, ..., e_n}, with a different capability Cap(e_i, r_j) ∈ [0,1] for each agent-task pair (i.e. the reward for the team when agent e_i performs task r_j), different resources Resources(e_i, r_j) needed by agent e_i to perform task r_j, and the resources e_i.resources available to each agent. An allocation matrix A is used for establishing the task assignment: a_{i,j} = 1 if and only if agent e_i is assigned to task r_j. The goal of the GAP problem is to find an allocation matrix that maximizes the overall capability function

f(A) = \sum_{i} \sum_{j} Cap(e_i, r_j) \times a_{i,j}

subject to:

\forall i : \sum_{j} Resources(e_i, r_j) \times a_{i,j} \le e_i.resources

\forall j : \sum_{i} a_{i,j} \le 1
For example, in the rescue scenario considered in our experiments, tasks are fires to be extinguished and agents are fire fighter brigades. The capability of a fire fighter to extinguish a fire may depend on several parameters, but a good approximation is to consider the capability as a function of the distance from the fire; clearly, if the nearest fire fighter is allocated to each fire, the team gains a reward in terms of total traveled distance and time to extinguish all the fires. Resources are represented by the amount of water needed to put out fires. The above formulation is well defined for a static environment, where agents and tasks are fixed and capabilities and resources do not depend on time. However, in several applications it is useful or even necessary to solve a similar problem where the defined parameters change with time. For example, in the above mentioned rescue scenario, all the defined parameters clearly depend on time (e.g. fire fighters' capabilities are strongly dependent on the environment evolution). Indeed, several methods for dynamic task assignment implicitly take this aspect into consideration, providing solutions that consider the dynamics of the world and derive a task allocation that approximates the solution of the GAP problem at each time step (see for example [3, 16, 10, 8]).
The method described in this paper follows the line described above, and aims at solving the GAP problem when the set of tasks R is not known a priori when the mission starts, but is discovered and dynamically updated during task execution. To describe our method we will use the following notation. We denote that the set R depends on time with R(t) = {r_1, ..., r_{m(t)}}, where m(t) is the number of tasks considered at time t, and we express the capabilities and the resources depending on time with Cap(e_i, r_j, t), Resources(e_i, r_j, t), and e_i.resources(t). The dynamic allocation matrix is denoted by A_t, in which a_{i,j,t} = 1 if and only if agent e_i is assigned to task r_j at time t. Consequently, the problem is to find a dynamic allocation matrix that maximizes the following function

f(A_t) = \sum_{t} \sum_{i} \sum_{j=1}^{m(t)} Cap(e_i, r_j, t) \times a_{i,j,t}

subject to:

\forall t \, \forall i : \sum_{j=1}^{m(t)} Resources(e_i, r_j, t) \times a_{i,j,t} \le e_i.resources(t)

\forall t \, \forall j \in \{1, ..., m(t)\} : \sum_{i} a_{i,j,t} \le 1
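As a concrete reading of the formulation, the sketch below evaluates the static objective f(A) and checks the two constraints for a candidate allocation. All capability and resource numbers are invented for illustration.

def gap_value(cap, res, budget, alloc):
    """Evaluate f(A) = sum_i sum_j Cap(e_i, r_j) * a_ij and check that every
    agent stays within its resources and every task has at most one agent."""
    agents, tasks = len(cap), len(cap[0])
    for i in range(agents):
        used = sum(res[i][j] * alloc[i][j] for j in range(tasks))
        assert used <= budget[i], f"agent {i} exceeds its resources"
    for j in range(tasks):
        assert sum(alloc[i][j] for i in range(agents)) <= 1, f"task {j} over-assigned"
    return sum(cap[i][j] * alloc[i][j] for i in range(agents) for j in range(tasks))

# Two agents, three tasks (all numbers are made up for illustration).
cap = [[0.9, 0.2, 0.4],
       [0.3, 0.8, 0.6]]
res = [[1, 1, 1],
       [1, 1, 1]]
budget = [2, 1]
alloc = [[1, 0, 1],   # agent 0 takes tasks 0 and 2
         [0, 1, 0]]   # agent 1 takes task 1
print(gap_value(cap, res, budget, alloc))  # 0.9 + 0.4 + 0.8 = 2.1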
36.3 Token Generation for Task Allocation
The main idea of the token passing approach is to regulate access to task execution through the use of tokens, i.e. only the agent currently holding a token can execute the corresponding task. Following this approach, the communication needed to guarantee that each task is performed by one agent at a time is dramatically reduced (see [2]). If a task can benefit from the simultaneous execution of several agents, we can decide to create several tokens referring to the same task. However, when tokens are generated and perceived by agents during mission execution, conflicts on tasks may arise. In this paper we deal with two kinds of conflicts. The first is due to the fact that the same task can be perceived by several agents during the mission; if no explicit procedure is used, the allocation process has no control over the maximum number of agents operating on such a task, which can lead to a considerable waste of resources and result in poor performance. The second type of conflict arises when an agent accomplishes a task and other tokens referring to the same task are still active, causing agents to waste precious time trying to accomplish terminated tasks. We explicitly address these problems by proposing an extension to the algorithm presented in [2]. In the following, a Task refers to the physical object or event that the agent perceives and that implies an activity to be executed (e.g. a fire to be extinguished); given a perceived object o, we define the related task T(o). A Token comprises the physical object related to the task and an identification number that distinguishes different tokens for the same task; therefore, given a task T(o), we
may have a number s of tokens TK(o, 1), ..., TK(o, s). The main idea of the proposed algorithm is that when an agent perceives a task, it records this information in a local structure and announces the presence of the task to all its team mates. Only the agent that first perceives a new task (e.g. a fire) creates one or more tokens for it; conflicts that might arise due to simultaneous perception are addressed and solved as explained later. Whenever an agent accomplishes a task, it announces the task termination to the entire team, and each team member removes the tokens referring to the accomplished task from its local structures. Using this approach, conflicting tokens can still be created for two main reasons: i) Contemporary task discovery: two agents e_1 and e_2 perceive a new task t, creating a set of tokens Tk(t, 1), ..., Tk(t, s) at exactly the same time, such that both agents will have different tokens referring to the same task. ii) Message asynchrony: assume we have three agents e_1, e_2, e_3; if e_1, immediately after the creation of a new set of tokens Tk(t, 1), ..., Tk(t, s), decides to send one of them, say Tk(t, j), to agent e_3, this token will not be found in the local structure of e_1 when the announce message of e_2 arrives and therefore will not be deleted; for e_3 we can have two situations: a) the message referring to token Tk(t, j) arrives before the announce message of e_2; b) the announce message of e_2 arrives before the message referring to token Tk(t, j). In both situations the token Tk(t, j) will not be deleted, and the conflict will not be solved. Both problems are addressed and solved in our approach as explained later in this section.

In the algorithm the following data structures are used: i) the Known Tasks Set (KTS) contains, at each time step, all the tasks that have been perceived by the agents; ii) the Token Set (TkS) is the set of tokens the agent currently holds; iii) the Temporary Token Set (TmpTkS) contains the tokens created by the agent in the current time step; iv) the Accomplished Tasks Set (ATS) contains, at each time step, all the tasks that have been accomplished by the agents. Each of these data structures is local to one agent. Finally, a message has three fields: type ∈ {announce, accomplishedTask, token}; task, which contains information about the perceived task (e.g. the fire position) and is valid when type is announce or accomplishedTask; and token, which is valid only when the message is of type token and contains information about the token (e.g. task position, Id number, visited agents, etc.). Whenever an agent detects a new task through its perception, it adds the new task to the KTS, creates s tokens referring to the task, adds them to the TmpTkS, and then announces the new task to all its team members (Algorithm 1, OnPercReceived). When a team member accomplishes a task, it sends an accomplished message to all its team mates and updates its ATS (Algorithm 1, OnTaskAccomplishment). When receiving a message, each team member updates its local structures as explained in Algorithm 1, OnMsgReceived. Whenever a task is perceived, a new token is generated only if that task is not present in the KTS. After tokens have been processed (Algorithm 1, TokenManagement), the TmpTkS is copied into the TkS. Assuming that messages cannot get lost, Algorithm 1 guarantees that when an agent a perceives a task T that has already been discovered before (i.e.
that is present in the KTS), it will not create new tokens for it, correctly assuming that someone else already has the token(s) for T.
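Before presenting Algorithm 1, the sketch below gives one possible concrete shape for the data structures and the message format just listed (KTS, TkS, TmpTkS, ATS and the three message types). The field names are chosen for readability and are not the authors' implementation.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Token:
    """One token TK(o, k) for the task associated with perceived object o."""
    task: str          # e.g. an identifier of the burning building
    copy_id: int       # k in TK(o, k); several copies may exist per task

@dataclass
class Message:
    type: str            # "announce", "accomplishedTask" or "token"
    sender_id: int
    task: str = ""       # valid for announce / accomplishedTask messages
    token: Token = None  # valid only for token messages

@dataclass
class AgentState:
    my_id: int
    kts: set = field(default_factory=set)      # known tasks
    tks: set = field(default_factory=set)      # tokens currently held
    tmp_tks: set = field(default_factory=set)  # tokens created this step
    ats: set = field(default_factory=set)      # accomplished tasks
    current_task: str = None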
Algorithm 1: Procedures for on-line token generation

OnPercReceived(task)
  if (task ∉ KTS)
    KTS = KTS ∪ {task}
    TmpTkS = TmpTkS ∪ {TK(task, 1), ..., TK(task, s)}
    Send(Msg(Announce, task))

OnMsgReceived(Msg)
  if Msg.type == AccomplishedTask
    ATS = ATS ∪ {Msg.task}
  if Msg.type == Announce
    if (Msg.task ∉ KTS)
      KTS = KTS ∪ {Msg.task}
    else if Msg.senderId > MyId
      TmpTkS = TmpTkS \ {TK(Msg.task, j) | j = 1, ..., s}
      if CurrentTask == Msg.task
        StopCurrentTask()
  if Msg.type == Token
    TkS = TkS ∪ {Msg.token}

OnTaskAccomplishment(task)
  ATS = ATS ∪ {task}
  Send(Msg(AccomplishedTask, task))

TokenManagement()
  TkS = TkS \ ATS          (remove tokens referring to accomplished tasks)
  TokenSet = ChooseTokenSet(TkS)
  SendTokenSet = TkS \ TokenSet
  Send(Msg(Token, SendTokenSet))
  TkS = TkS ∪ TmpTkS
  StartTask(ChooseTask(TokenSet))
Notice that OnPercReceived, OnMsgReceived and OnTaskAccomplishment are asynchronous procedures, triggered by particular events; theoretically, all possible interleavings of their execution could occur. However, if we assume that each procedure is atomic (which is a reasonable assumption since no synchronization among agents is involved), we can guarantee that there will never be two tokens referring to the same task in the system for longer than the time required for the announce messages to reach all the team members. In fact, as explained above, conflicting tokens may be created in case of contemporary task discovery or due to message asynchrony. The problem of contemporary task discovery is considered and solved by the procedure OnMsgReceived: when the agents receive the announce messages, the one with the lower static priority, represented in the procedure by the lower Id number, deletes the token for task t from TmpTkS, solving the conflict; if t is already being
executed by the agent with lower static priority, it will stop its execution, yielding to the higher priority agent the possibility to execute the task. The problem arising due to message asynchrony is avoided thanks to the distinction between temporary tokens (stored in TmpTkS) and normal tokens (stored in TkS). In fact, assuming that the time needed for an announce message to reach all the agents is less than one simulation step (i.e. assuming that messages are synchronized with agent execution), the use of a Temporary Token Set guarantees that the conflicts will be detected and avoided. Otherwise, a higher communication overhead is needed in order to recover from such conflicts. Setting a static fixed priority among agents can obviously result in non optimal behavior of the team: for example, if Cap(e_1, r_j, t_1) > Cap(e_2, r_j, t_1) but the static priority favors e_2, we yield access to task r_j to the less capable agent. However, while theoretically the difference between capabilities can be unbounded, in general, when tasks are discovered using perception, agents perceive tasks when they are close to the object location (e.g. if two fire fighters perceive the same fire, their distances from the fire are comparable), and therefore the loss of performance due to the use of a fixed priority is limited.

Once a token has been created and added to the TkS, the token-based access to tasks requires that each agent decides whether to execute the tasks represented by the tokens it currently has or to pass the tokens on. The TokenManagement procedure of Algorithm 1 describes how tokens are processed: each agent removes from its TkS the tokens for tasks in the accomplished task set ATS, then it chooses a set of tokens it can execute (ChooseTokenSet(TkS)). Each agent follows a greedy policy in this decision process, i.e. it tries to maximize its utility given the tokens it can currently access and its resource constraints. However, each agent also considers in its decision whether it is in the best interest of the team for it to execute the tasks represented by its tokens. The key question is whether passing a token on will lead to a more capable team member taking it on. Using probabilistic models of the members of the team and of the tasks that need to be assigned, the team member can choose the minimum capability an agent should have in order to take on a token. Each agent sends the remaining tokens to its team mates, following a round robin policy, and copies the TmpTkS into the TkS. Finally, each agent chooses the best task (e.g. for fire fighters this could be the nearest fire) among the TokenSet it currently has (ChooseTask(TokenSet)) and starts the task execution.
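A hedged sketch of the TokenManagement decision just described is given below, reusing the illustrative Token, Message and AgentState classes from the earlier sketch: keep the tokens the agent is most capable of executing within its resource budget, pass the rest on, and start the best retained task. The capability threshold and the way recipients are chosen are simplified placeholders, not the probabilistic models used by the authors.

def token_management(state, capability, resource_cost, resources, threshold, send):
    """One TokenManagement step (simplified). `capability(token)` estimates how
    well this agent can execute the token's task; `threshold` is the minimum
    capability below which the token is passed on to a team mate."""
    # Drop tokens whose task has already been accomplished.
    state.tks = {tk for tk in state.tks if tk.task not in state.ats}

    # Greedily keep the best tokens within the resource budget.
    kept, used = set(), 0.0
    for tk in sorted(state.tks, key=capability, reverse=True):
        cost = resource_cost(tk)
        if capability(tk) >= threshold and used + cost <= resources:
            kept.add(tk)
            used += cost

    # Pass the remaining tokens on (recipient selection, e.g. round robin,
    # is hidden inside the `send` callable here).
    for tk in state.tks - kept:
        send(Message(type="token", sender_id=state.my_id, token=tk))

    # Temporary tokens created this step become normal tokens.
    state.tks = kept | state.tmp_tks
    state.tmp_tks = set()
    # Return the retained token whose task should be started next.
    return max(kept, key=capability, default=None)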
36.4 Experiments and Results
We tested our task assignment approach in the RoboCup Rescue environment [5]. RoboCup Rescue provides an ideal simulation environment to test allocation strategies for teams of rescue agents. We focus on the real city map of Foligno, Italy [9], so as to test the performance of our approach in a realistic disaster rescue environment where agents must navigate narrow streets and passages. Here, a team of fire brigades must fight fires in real-time, while facing the uncertainty of fire spreading and the dynamism that arises due to several factors: (i) agents have a limited
view of the world, and do not know the initial positions of fires (ignition points) in advance; (ii) the way fires spread cannot be precisely predicted; (iii) agents can be blocked in narrow passages. To show that the algorithm presented in section 36.3 does actually avoid conflicts of both types, we implemented three different allocation strategies. The first strategy, referred to as Token Passing (TP), is a plain implementation of the token based algorithm: no announce procedure is used, but agents record the fires they perceive in a Known Fire List to prevent different agents from creating two tokens for the same fire. This strategy does not enforce any constraint on the maximum number of agents simultaneously fighting the same fire. The second strategy, referred to as TP with Announce (TPA-n), uses the announce procedure to enforce that no more than n agents simultaneously fight the same fire; however, this strategy does not address the second kind of conflict, so situations may arise in which agents try to fight already extinguished fires. The third strategy, referred to as TPA-n with AccomplishedTask (TPAA-n), uses both the announce and the AccomplishedTask messages, avoiding both types of conflicts. In all the strategies the token processing procedure is the same, and the capability to execute a task is computed considering the distance between the fire fighting agent and the fire, and whether the agent is blocked in a narrow passage. If an agent is blocked, it sends out the task it is currently executing and chooses a different task from its set. The set of tasks to be executed is computed by choosing the nearest fire f and keeping up to K fires whose distance from f is lower than a fixed threshold T (a sketch of this selection is given below). The threshold T and the number of tokens each agent can retain are statically defined and are computed considering global information, such as the number of agents involved in the simulation and their distribution on the map. For a detailed discussion of how these static values can be computed we refer to [11].

We tested each strategy in different operative conditions, changing the extinguishing power of the fire fighting agents. We start each simulation from the same initial configuration, comprised of 10 fire fighting agents and 18 ignition points distributed as shown in Figure 36.1. In these experiments we assume that messages cannot be lost and that their delay is not higher than one simulation step (i.e. agent execution is synchronized with message passing); moreover, we set the number of tokens created for each task to a fixed value (three in the performed experiments). While it is possible to dynamically change this number during mission execution depending on the environment situation, in these experiments we focus on studying how conflicts influence the performance of the fire fighting agents, leaving the problems of how many tokens are needed for each task and how to deal with lost messages and unpredictable delays to later investigation. From the performed experiments we extracted the extinguish time, i.e. the time needed to put out all the fires; the number of point to point messages exchanged among agents per time step; the number of broadcast messages sent by agents per time step; the total traveled distance per agent; and, finally, the total number of conflicts, i.e. the number of times during the entire simulation that more than three agents have the same fire as target.
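For the task-selection step mentioned above (the nearest fire f plus up to K fires within threshold T of f), a minimal sketch could look as follows; the geometry and the parameter values are placeholders.

import math

def choose_task_set(agent_pos, fire_positions, k, threshold):
    """Pick the nearest fire f, then keep up to k fires whose distance from f
    is below the fixed threshold T (f itself is always included)."""
    if not fire_positions:
        return []
    nearest = min(fire_positions, key=lambda p: math.dist(agent_pos, p))
    close_to_f = [f for f in fire_positions
                  if f != nearest and math.dist(nearest, f) < threshold]
    return [nearest] + close_to_f[:k - 1]

print(choose_task_set((0, 0), [(1, 1), (5, 5), (30, 40)], k=2, threshold=10))
# -> [(1, 1), (5, 5)]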
Fig. 36.1. Foligno Map used in the experiments
Table 36.1. Results obtained averaging 10 simulations (standard deviations in brackets)

                            TP              TPA-3            TPAA-3
Ext. time                   67    [0.7]     59.63 [16.6]     50.62 [2.5]
Ptp msg per time step       1.4   [0.04]    1.8   [0.56]     1.7   [0.053]
Bcast msg per time step     0     [0]       0.68  [0.63]     1.63  [0.13]
Trav. dist. per agent       2495  [198]     3201  [527]      2221  [195]
Conflicts                   26.62 [1.92]    0     [0]        0     [0]
In Table 36.1 we report the results obtained from the simulations performed. Each reported value is the average over ten repetitions of the simulation under the same operative conditions, along with the computed standard deviation (reported between brackets). From the table it is possible to see that the TPAA-3 strategy consistently outperforms the TP strategy, with a higher but still acceptable number of messages. Moreover, the traveled distance for each agent is smaller on average, showing that better results are reached with a smaller waste of resources. The performance of the TPA-3 strategy is on average between that of TP and TPAA-3; however, this strategy is characterized by a very high variance, especially regarding the extinguish time and traveled distance. The high variance is due to the fact that the
strategy does not avoid the second type of conflict, possibly generating considerable resource waste. In the performed experiments we used values of extinguishing power ranging from 6000 (water units per minute) upwards, to model situations where it is useful for the agent allocation to be balanced among the different tasks. Indeed, we found that similar relationships among the strategies hold when increasing the extinguishing power from 6000 (results reported in Table 36.1) to 8000 and 10000.
36.5 Conclusions and Future Work
Task allocation is a very widely studied area, and several approaches have been presented in the literature addressing different issues, with techniques ranging from forward looking optimal models [8], to market or auction based techniques [16, 4], to symbolic matching [15] and Distributed Constraint Optimization Problem based algorithms [6]. However, the growing complexity of applications for MAS and MRS requires novel solutions for task assignment, which are able to address specific features posed by the domain, such as dynamic task evolution, strict constraints on communication, and soft real time constraints to be met. Token based approaches have been proved to be well suited for task allocation in such scenarios [13, 1]; however, the specific problems of dynamic token generation and conflict resolution have not been considered yet. In this paper we take a step in this direction, proposing an extension to the token approach able to address this issue while keeping a reasonably low communication overhead. Moreover, we present first experimental results obtained for our approach, showing that it is actually applicable in a rescue scenario and is able to resolve conflicts, improving the performance of the rescue teams.

Several other issues need to be further addressed. In particular, we intend to test our algorithm with different types of rescue teams, such as ambulances or police forces. The ambulance case is particularly interesting because it is important to enforce the constraint that only one agent takes care of a civilian, since no further benefit is given to the team by having more than one ambulance trying to pick up a civilian; therefore we plan to further test our approach with ambulances. When dealing with different force types, constrained tasks come into play: for example, an ambulance agent could need to have a blocked road freed by a police agent in order to pick up a civilian, and an evaluation of our approach in such situations is particularly interesting. Finally, in our working scenario we assumed that no messages can be lost; this is quite a strong assumption, which can easily be violated in real world applications, therefore an interesting extension of our method will be devoted to explicitly dealing with such situations.

Acknowledgment
This effort was partially funded by the U.S. Air Force European Office of Scientific Research under grant number 033065 and by the project "Simulation and Robotic
Systems for intervention in emergency scenarios" within program COFIN03 of the Italian MIUR, grant number 2003097252.
References
1. A. Farinelli, P. Scerri, and M. Tambe. Allocating and reallocating roles in very large scale teams. In First Int. Workshop on Synthetic Simulation and Robotics to Mitigate Earthquake Disaster, Padua, Italy, July 2003.
2. A. Farinelli, P. Scerri, and M. Tambe. Building large-scale robot systems: Distributed role assignment in dynamic, uncertain domains. In Representation and Approaches for Time-Critical Decentralized Resources/Role/Task Allocation (AAMAS Workshop), 2003.
3. B. Gerkey and M. J. Mataric. Multi-robot task allocation: Analyzing the complexity and optimality of key architectures. In Proc. of the Int. Conf. on Robotics and Automation (ICRA'03), Taipei, Taiwan, Sep 14-19, 2003.
4. L. Hunsberger and B. Grosz. A combinatorial auction for collaborative planning. In Proceedings of the Fourth International Conference on Multi-Agent Systems (ICMAS-2000), pages 151-158, 2000.
5. H. Kitano, M. Asada, Y. Kuniyoshi, I. Noda, E. Osawa, and H. Matsubara. RoboCup: A challenge problem for AI. AI Magazine, 18(1):73-85, Spring 1997.
6. R. Mailler, V. Lesser, and B. Horling. Cooperative negotiation for soft real-time distributed resource allocation. In Proceedings of AAMAS'03, 2003.
7. P. J. Modi, H. Jung, W. Shen, M. Tambe, and S. Kulkarni. A dynamic distributed constraint satisfaction approach to resource allocation. In Proc. of Constraint Programming, 2001.
8. R. Nair, T. Ito, M. Tambe, and S. Marsella. Task allocation in the RoboCup Rescue simulation domain. In Proceedings of the International Symposium on RoboCup, 2002.
9. D. Nardi, A. Biagetti, G. Colombo, L. Iocchi, and R. Zaccaria. Real-time planning and monitoring for search and rescue operations in large-scale disasters. Technical report, University "La Sapienza", Rome, 2002. http://www.dis.uniroma1.it/~rescue/.
10. L. E. Parker. ALLIANCE: An architecture for fault tolerant multirobot cooperation. IEEE Transactions on Robotics and Automation, 14(2):220-240, 1998.
11. P. Scerri, A. Farinelli, S. Okamoto, and M. Tambe. Allocating roles in extreme teams. In AAMAS 2004 (Poster), New York, USA, 2004.
12. P. Scerri, A. Farinelli, S. Okamoto, and M. Tambe. Token approach for role allocation in extreme teams: analysis and experimental evaluation. In 13th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE-2004), Modena, Italy, 2004.
13. P. Scerri, D. V. Pynadath, L. Johnson, P. Rosenbloom, N. Schurr, M. Si, and M. Tambe. A prototype infrastructure for distributed robot-agent-person teams. In Proceedings of AAMAS, 2003.
14. D. Shmoys and E. Tardos. An approximation algorithm for the generalized assignment problem. Mathematical Programming, 62:461-474, 1993.
15. G. Tidhar, A. S. Rao, and E. A. Sonenberg. Guided team selection. In Proceedings of the Second International Conference on Multi-Agent Systems, 1996.
16. R. Zlot, A. Stentz, M. B. Dias, and S. Thayer. Multi robot exploration controlled by a market economy. In Proc. of the Int. Conf. on Robotics and Automation (ICRA'02), pages 3016-3023, Washington DC, May 2002.
37 DyKnow: A Framework for Processing Dynamic Knowledge and Object Structures in Autonomous Systems
Fredrik Heintz and Patrick Doherty*
Department of Computer and Information Science, Linkoping University, 581 83 Linkoping, Sweden
* Both authors are supported by grants from the Wallenberg Foundation, Sweden and NFFP 539 COMPAS.

Summary. Any autonomous system embedded in a dynamic and changing environment must be able to create qualitative knowledge and object structures representing aspects of its environment on the fly from raw or preprocessed sensor data, in order to reason qualitatively about the environment. These structures must be managed and made accessible to deliberative and reactive functionalities which depend on being situationally aware of the changes in both the robotic agent's embedding and internal environment. DyKnow is a software framework which provides a set of functionalities for contextually accessing, storing, creating and processing such structures. The system is implemented and has been deployed in a deliberative/reactive architecture for an autonomous unmanned aerial vehicle. The architecture itself is distributed and uses real-time CORBA as a communications infrastructure. We describe the system and show how it can be used in execution monitoring and chronicle recognition scenarios for UAV applications.
37.1 Introduction
Research in cognitive robotics is concerned with endowing robots and software agents with higher level cognitive functions that enable them to reason, act and perceive in a goal-directed manner in changing, incompletely known, and unpredictable environments. Research in robotics has traditionally emphasized low-level sensing, sensor processing, control and manipulative tasks. One of the open challenges in cognitive robotics is to integrate techniques from both disciplines and develop architectures which support the seamless integration of low-level sensing and sensor processing with the generation and maintenance of higher level knowledge structures grounded in the sensor data.
Knowledge about the internal and external environments of a robotic agent is often both static and dynamic. A great amount of background or deep knowledge is required by the agent in understanding its world and in understanding the dynamics
in the embedding environment where objects of interest are cognized, hypothesized as being of a particular type or types and whose dynamics must be continuously reasoned about in a timely manner. This implies signal-to-symbol transformations at many levels of abstraction with different and varying constraints on real-time processing. Much of the reasoning involved with dynamic objects and the dynamic knowledge related to such objects involves issues of situation awareness. How can a robotics architecture support the task of getting the right information in the right form to the right functionalities in the architecture at the right time in order to support decision making and goal-directed behavior? Another important aspect of the problem is the fact that this is an on-going process. Data and knowledge about dynamic objects has to be provided continuously and on-the-fly at the rate and in the form most efficient for the receiving cognitive or reactive robotics functionality in a particular context. Context is important because the most optimal rates and forms in which a robotic functionality receives data are often task and environmentally dependent. Consequently, autonomous agents must be able to declaratively specify and re-configure the character of the data received. How to define a change, how to approximate values at time-points where no value is given and how to synchronize collections of values are examples of properties that can be set in the context. By robotic functionalities, we mean control, reactive and deliberative functionalities ranging from sensor manipulation and navigation to high-level functionalities such as chronicle recognition, trajectory planning, and execution monitoring. The paper is structured as follows. We start with section 37.2 where a larger scenario using the proposed framework is described. In section 37.3, the UAV platform used in the project is briefly described. In section 37.4, DARA, a Distributed Autonomous Robotics Architecture for UAVs is briefly described. DyKnow is an essential module in this architecture. In sections 37.5 and 37.6, the basic structure of the DyKnow framework and the dynamic knowledge and object structures is described. In sections 37.7.2 and 37.7.3, two deliberative functionalities which use the DyKnow framework are considered, chronicle recognition and execution monitoring, in addition to the dynamic object repository (DOR) described in section 37.7.1. We conclude in section 37.8 with a discussion of the role of the DyKnow framework and some related work.
37.2 An Identification and Track Scenario In order to make these ideas more precise, we will begin with a scenario from an unmanned aerial vehicle project the authors are involved in which requires many of the capabilities discussed so far. Picture the following scenario. An autonomous unmanned aerial vehicle (UAV), in our case, a helicopter, is given a mission to identify and track a vehicle with a particular signature in a region of a small city. The signature is provided in terms of color and size (and possibly 3D shape). Assume that the UAV has a 3D model of
the region in addition to information about building structures and the road system. These models can be provided or may have been generated by the UAV itself. Additionally, assume the UAV is equipped with a GPS and INS^ for navigating purposes and that its main sensor is a camera on a pan/tilt mount. Let's consider the processing from the bottom up, even though in reality, there will be many feedback loops in the UAV architecture. One way for the UAV to achieve its task would be to initiate a reactive task procedure (parent procedure) which calls the systems image processing module with the vehicle signature as a parameter. The image processing module might then try to identify colored blobs in the region of the right size, shape and color as a first step. These object descriptions would have to be sent to a module in the architecture called the dynamic object repository (DOR) which is responsible for the dynamic management of such objects. Each of these vision objects would contain features related to the image processing task such as RGB values with uncertainty bounds, length and width in pixels, position in the image, a sub-image of the object which can be used as a template for tracking, an estimate of velocity, etc. From the perspective of the UAV, these objects are only cognized to the extent that they are moving colored blobs of interest and the feature data being collected should continue to be collected while tracking those objects perceived to be of interest. What objects are of interest? The parent procedure might identify that or those objects which are of interest based on a similarity measure according to size, color and movement. In order to do this, the DOR would be instructed to create one or more world objects and link them to their respective vision objects. At this point the object is cognized at a more qualitative level of abstraction, yet its description in terms of its linkage structure contains both cognitive and pre-cognitive information which must be continuously managed and processed due to the interdependencies of the features at various levels. A world object could contain additional features such as position in a geographic coordinate system rather than the low-level image coordinate. Generating a geographic coordinate from an image coordinate continuously, called co-location is a complex process that involves combining dynamic data about features from several different objects such as the camera object, helicopter object and world objects, together with data from an onboard geographical information system (GIS) module which is also part of the architecture. One would require a computational unit of sorts that takes streamed data as input and outputs a new stream at a higher level of abstraction representing the current geographical coordinate of the object. This colocation process must occur in real-time and continually occur as the world object is tracked. This implies that all features for all dynamic objects linked to the world object in focus have to be continually updated and managed. At this point, the parent task may want to make a comparison between the geographical coordinate and the position of that coordinate in terms of the road system for the region, information of which is stored in the onboard GIS. This indexing ^GPS and INS are acronyms for global positioning system and inertial navigation system, respectively.
mechanism is important since it allows the UAV to reason qualitatively about its spatial surroundings. Let's assume this is done and after some period of tracking and monitoring the stream of coordinates, the parent procedure decides that this looks like a vehicle that is following the road. On-road objects might then be created for each of the world objects that pass the test and linked to their respective world objects. An on-road object could contain more abstract and qualitative features such as position in a road segment which would allow the parent procedure to reason qualitatively about its position in the world relative to the road, other vehicles on the road, and other building structures in the vicinity of the road. At this point, streams of data are being generated and computed for many of the features in the linked object structures at many levels of abstraction as the helicopter tracks the on-road objects. The parent procedure could now use static knowledge stored in onboard knowledge bases and the GIS together with this dynamic knowledge to hypothesize as to the type of vehicle. The hypothesis would of course be based on the linkage structure for an on-road object and various features at different levels of abstraction. Assume the parent procedure hypothesizes that the on-road object is a car. A car object could then be created and linked to the existing linkage structure with additional high-level feature information about the car. Whether or not the sum of streamed data which makes up the linkage structure represents a particular type of conceptual entity will only ever remain a hypothesis which could very well change, based on changes in the character of the streams of data. Monitors, users of these structures, would have to be set up to observe such changes and alert the parent procedure if the changes become too abnormal relative to some criteria determined by the parent procedure. Abnormality is a concept that is well-suited for being reasoned about at a logical level and the streamed data would have to be put into a form amenable to this type of processing. How then can an architecture be set up to support the processes described in the UAV scenario above? This is the main topic of this paper and in it we propose a software system called the DyKnow Framework.^
37.3 The WITAS UAV Platform The WITAS"* Unmanned Aerial Vehicle Project [1, 2] is a long-term basic research project whose main objectives are the development of an integrated hardware/software VTOL (Vertical Take-Off and Landing) platform for fully-autonomous missions and its future deployment in applications such as traffic monitoring and surveillance, emergency services assistance, photogrammetry and surveying. The WITAS Project UAV platform we use is a slightly modified Yamaha RMAX (Fig. 37.1). It has a total length of 3.6 m (including main rotor), a maximum take-off ^"DyKnow" is pronounced as "Dino" in "Dinosaur" and stands for Dynamic Knowledge and Object Structure Processing. "^WITAS (pronounced vee-tas) is an acronym for the Wallenberg Information Technology and Autonomous Systems Laboratory at Linkoping University, Sweden.
weight of 95 kg, and is powered by a 21 hp two-stroke engine. Yamaha equipped the radio controlled RMAX with an attitude sensor (YAS) and an attitude control system (YACS).
Fig. 37.1. The WITAS RMAX Helicopter
The hardware platform consists of three PC 104 embedded computers (Fig. 37.2). The primary flight control (PFC) system consists of a PIII (700 MHz) processor, a wireless Ethernet bridge and the following sensors: an RTK GPS (serial) and a barometric altitude sensor (analog). It is connected to the YAS and YACS (serial), the image processing computer (serial) and the deliberative computer (Ethernet). The image processing (IP) system consists of a second PC 104 embedded computer (PIII 700 MHz), a color CCD camera (S-VIDEO, serial interface for control) mounted on a pan/tilt unit (serial), a video transmitter (composite video) and a recorder (miniDV). The deliberative/reactive (D/R) system runs on a third PC 104 embedded computer (PIII 700 MHz) which is connected to the PFC system with Ethernet using CORBA event channels. The D/R system is described in more detail in the next section. For further discussion, it is important to note that computational processes are executed concurrently on distributed hardware. Data flow is both synchronous and asynchronous, and the concurrent distributed nature of the hardware platform contributes to diverse latencies in data flow throughout the system.
37.4 DARA: A Distributed Autonomous Robotics Architecture The DARA system [3] consists of both deliberative and reactive components which interface to the control architecture of the primary flight controller (PFC). Current flight modes include autonomous take-off and landing, pre-defined and dynamic trajectory following, vehicle tracking and hovering. We have chosen real-time
Fig. 37.2. DARA Hardware Schematic
CORBA [4]^ as a basis for the design and implementation of a loosely coupled distributed software architecture for our aerial robotic system. The communication infrastructure for the architectures is provided by CORBA facilities and services. Fig. 37.3 depicts an (incomplete) high-level schematic of some of the software components used in the architecture. Each of these may be viewed as a CORBA server/client providing or requesting services from each other and receiving data and events through both real-time and standard event channels. The modular task architecture (MTA), which is part of DARA, is a reactive system design in the procedure-based paradigm developed for loosely coupled heterogeneous systems such as the WITAS aerial robotic system. Reactive behaviors are implemented as task procedures (TPs), which are executed concurrently and are essentially event-driven. A TP may open its own (CORBA) event channels and call its own services (both CORBA and application-oriented services such as path planners), including functionalities in DyKnow.
^We are currently using TAO/ACE. The ACE ORB (TAO) is an open source implementation of CORBA 2.6.
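To give a feel for what an event-driven task procedure amounts to, the following is a small, purely illustrative Python sketch; the EventChannel and TaskProcedure classes and their methods are assumptions of this sketch and do not correspond to the MTA's or CORBA's actual APIs.

```python
import queue
import threading
from typing import Callable, List


class EventChannel:
    """Stand-in for a CORBA event channel: simple publish/subscribe."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self._subscribers.append(callback)

    def push(self, event: dict) -> None:
        # Deliver the event to every subscriber (no real-time guarantees here).
        for callback in self._subscribers:
            callback(event)


class TaskProcedure(threading.Thread):
    """A task procedure that runs concurrently and reacts to incoming events."""

    def __init__(self, name: str, channel: EventChannel) -> None:
        super().__init__(name=name, daemon=True)
        self._events: "queue.Queue[dict]" = queue.Queue()
        channel.subscribe(self._events.put)   # open the channel

    def run(self) -> None:
        while True:
            event = self._events.get()
            if event.get("type") == "stop":
                break
            self.handle(event)

    def handle(self, event: dict) -> None:
        # A real TP would call planners, controllers or DyKnow services here.
        print(f"{self.name}: reacting to {event}")
```

In the real system the channel would be a (real-time) CORBA event channel and the procedure would additionally call planning, control and DyKnow services.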
Fig. 37.3. DARA Software Schematic
37.5 DyKnow Given the distributed nature of both the hardware and software architectures in addition to their complexity, one of the main issues is getting data to the right place at the right time in the right form and to be able to transform the data to the proper levels of abstraction for use by high-level deliberative functionalities and middle level reactive functionalities. DyKnow is designed to contribute to achieving this. Ontologically, we view the external and internal environment of the agent as consisting of entities representing physical and non-physical objects, properties associated with these entities, and relations between entities. We will call such entities objects and those properties or relations associated with objects will be called features. Features may be static or dynamic and parameterized with objects. Due to the potentially dynamic nature of a feature, that is, its ability to change value through time, a fluent is associated with each feature. A fluent is a function of time whose range is the feature's type. For a dynamic feature, the fluent values will vary through time, whereas for a static feature the fluent will remain constant through time. Some examples of features would be the estimated velocity of a world object, the current road segment of an on-road object, and the distance between two car objects. Each fluent associated with these examples implicitly generates a continuous stream of time tagged values of the appropriate type.
Additionally, we introduce locations, policies, computational units and fluent streams which refer to aspects of fluent representations in the actual software architecture. A location is intended to denote any pre-defined physical or software location that generates feature data in the DARA architecture. Some examples would be onboard or offboard databases, CORBA event channels, physical sensors or their device interfaces, etc. In fact, a location will be used as an index to reference a representational structure associated with a feature. This structure denotes the process which implements the fluent associated with the feature. A fluent implicitly represents a stream of data, a fluent stream. The stream is continuous, but can only ever be approximated in an architecture. A policy is intended to represent a particular contextual window or filter used to access a fluent. Particular functionalities in the architecture may need to sample the stream at a particular rate or interpolate values
in the stream in a certain manner. Policies will denote such collections of constraints. Computational units are intended to denote processes which take fluent streams as input, perform operations on these streams and generate new fluent streams as output. Each of these entities are represented either syntactically or in the form of a data structure within the architecture and many of these data structures are grounded through sensor data perceived through the robotic agent's sensors. In addition, since declarative specifications of both features and policies that determine views of fluent streams are Ist-class citizens in DyKnow, a language for referring to features, locations, computational units and policies is provided, see [5] for details. One can view DyKnow as implementing a distributed qualitative signal processing tool where the system is given the functionality to generate dynamic representations of parts of its internal and external environment in a contextual manner through the use of policy descriptors and feature representation structures. The dynamic representations can be viewed as collections of time series data at various levels of abstraction, each time series representing a particular feature and each bundle representing a particular history or progression. Another view of such dynamic representations and one which is actually put to good use is to interpret the fluent stream bundles as partial temporal models in the logical sense. These partial temporal models can then be used on the fly to interpret temporal logical formulas in TAL (temporal action logic) or other temporal formalisms. Such a functionality can be put to good use in constructing execution monitors, predictive modules, diagnostic modules, etc. The net result is a very powerful mechanism for dealing with a plethora of issues associated with focus of attention and situational awareness.
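To make these notions concrete, the following is a minimal Python sketch of fluent streams, policies and computational units as they are described above; all class names, fields and the index-wise synchronization are illustrative assumptions, not the actual DyKnow representation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class FluentStream:
    """An approximation of a fluent: time-tagged samples of a feature's value."""
    feature: str                                   # e.g. "velocity(world-object#3)"
    samples: List[Tuple[float, object]] = field(default_factory=list)

    def add(self, t: float, value: object) -> None:
        self.samples.append((t, value))


@dataclass
class SamplePolicy:
    """A policy as a contextual filter: here, sampling at a fixed period."""
    period: float

    def apply(self, stream: FluentStream) -> FluentStream:
        out = FluentStream(stream.feature + " [sampled]")
        next_t = None
        for t, value in stream.samples:
            if next_t is None or t >= next_t:
                out.add(t, value)
                next_t = t + self.period
        return out


@dataclass
class ComputationalUnit:
    """Takes fluent streams as input and produces a more abstract output stream."""
    name: str
    fn: Callable[..., object]

    def process(self, *streams: FluentStream) -> FluentStream:
        out = FluentStream(self.name)
        # Naive index-wise synchronization, purely for illustration.
        for points in zip(*(s.samples for s in streams)):
            t = max(p[0] for p in points)
            out.add(t, self.fn(*(p[1] for p in points)))
        return out
```

Under these assumptions, a co-location unit could be instantiated as ComputationalUnit("geo_position(obj1)", fn=co_locate) applied to the camera-state, helicopter-state and image-coordinate streams, where co_locate stands in for whatever estimation procedure the architecture actually provides.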
37.6 Dynamic Object Structure in DyKnow An ontologically difficult issue involves the meaning of an object. In a distributed architecture such as DARA, information about a specific object is often distributed throughout the system, some of this information may be redundant and it may often even be inconsistent due to issues of precision and approximation. For example, given a car object, it can be part of a linkage structure which may contain other objects such as on-road, world and vision objects. For an example of a linkage structure see Fig. 37.4. In addition, many of the features associated with these objects are computed in different manners in different parts of the architecture with different latencies. One candidate definition for an object could be the aggregate of all features which take the object as a parameter for each feature. But an object only represents some aspects of an entity in the world. To represent that several different objects actually represent the same entity in the world, links are created between those objects. It is these linkage structures that represent all the aspects of an entity which are known to the UAV agent. It can be the case that two linkage structures in fact represent the same entity in the world but the UAV agent is unable to determine this. Two objects may even be of the same type but have different linkage structures associated with them. For example, given two car objects, one may not have an on-road
object, but an off-road object, as part of its linkage structure. It is important to point out that objects as intended here have some similarities with OOP objects, but many differences.
[Fig. 37.4 shows a linkage structure in which VisionObject #2 is linked to WorldObject #3, which is linked to OnRoadObject #5, which in turn is linked to CarObject #7.]
Fig. 37.4. An example object linkage structure
To create and maintain these object linkage structures we use hypothesis generation and validation. Each object is associated with a set of possible hypotheses. Each possible hypothesis is a relation between two objects associated with constraints between the objects. To generate a hypothesis, the constraints of a possible hypothesis must be satisfied. Two different types of hypotheses can be made depending on the types of the objects. If the objects have different types then a hypothesis between them is represented by a link. If they have the same type then a hypothesis is represented by a codesignation between the objects. Codesignations hypothesize that two objects representing the same aspect of the world are actually identical, while a link hypothesizes that two objects represent different aspects of the same entity. A link can be hypothesized when a reestablish constraint between two existing objects is satisfied or an establish constraint between an object and a newly created object is satisfied. In the anchoring literature these two processes are called reacquire and find [6]. Since the UAV agent can never be sure its hypotheses are true, it has to continually verify and validate them against its current knowledge of the world. To do this, each hypothesis is associated with maintenance constraints which should be satisfied as long as the hypothesis holds. If the constraints are violated then the hypothesis is removed. The maintenance and hypothesis generation constraints are represented using the linear temporal logic (LTL) with intervals [7] and are checked using the execution monitoring module which is part of the DyKnow framework. For a more detailed description see [8].
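As a rough illustration of the bookkeeping this involves, the sketch below represents objects, links and their establish/maintenance constraints as plain Python values; the constraint callables stand in for the LTL formulas checked by the execution monitoring module, and none of the names are taken from the actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Obj:
    oid: str                              # unique symbol, e.g. "vision-object#2"
    otype: str                            # domain, e.g. "vision-object"
    linked_to: Optional["Obj"] = None     # next object in the linkage structure


@dataclass
class LinkHypothesis:
    src: Obj
    dst: Obj
    establish: Callable[[Obj, Obj], bool]   # establish/reestablish constraint
    maintain: Callable[[Obj, Obj], bool]    # maintenance constraint
    active: bool = False

    def try_establish(self) -> bool:
        # Hypothesize the link only when the establish constraint is satisfied.
        if self.establish(self.src, self.dst):
            self.src.linked_to = self.dst
            self.active = True
        return self.active

    def revalidate(self) -> bool:
        # Called periodically; a violated maintenance constraint removes the link.
        if self.active and not self.maintain(self.src, self.dst):
            self.src.linked_to = None
            self.active = False
        return self.active
```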
37.7 Applications using DyKnow In the following subsections, we will show how the DyKnow framework can be used to generate fluent streams for further processing by two important deliberative functionalities in the DARA system, chronicle recognition and execution monitoring. Both are implemented in the UAV system. Before doing this, we provide a short description of the Dynamic Object Repository (DOR), an essential part of the DARA which uses the DyKnow framework to provide other functionalities in the system with information about the properties of dynamic objects most often constructed from sensor data streams.
37.7.1 The Dynamic Object Repository The Dynamic Object Repository (DOR) is essentially a soft real-time database used to construct and manage the object linkage structures described in section 37.6. The DOR is implemented as a CORBA server; the image processing module interfaces to the DOR and supplies vision objects. Task procedures in the MTA access feature information about these objects via the DyKnow framework, creating descriptors on-the-fly and constructing linkages. Computational units are used to provide values for more abstract feature properties associated with these objects. For example, the co-location process involving features from the vision, helicopter and camera objects, in addition to information from the GIS, uses computational units to output geographical coordinates. These are then used to update the positional features in world objects linked to the specific vision objects in question. Objects are referenced via unique symbols which are created by the symbol generation module which is part of the DOR. Each symbol is typed using pre-defined domains such as car, world-object, vision-object, vehicle, etc. Symbols can be members of more than one domain and are used to instantiate feature representations and as indexes for collecting information about features which take these symbols as arguments. Since domains collect symbols which reference a certain type of object, one can also conveniently ask for information about collections or aggregates of objects. For example, "take all vision objects and process a particular feature for each in a certain manner".
37.7.2 An Application to Chronicle Recognition Chronicles are used to represent complex occurrences of activity described in terms of temporally constrained event structures. In this context, an event is defined as a change in the value of a feature. For example, in a traffic monitoring application, a UAV might fly to an intersection and try to identify how many vehicles turn left, turn right or drive straight through a specific intersection. In another scenario, the UAV may be interested in identifying vehicle overtaking. Each of these complex activities can be defined in terms of one or more chronicles. In the WITAS UAV, we use the CRS chronicle recognition system developed by France Telecom. CRS is an extension of IxTeT [9]. Our chronicle recognition module is wrapped as a CORBA server. As an example, suppose we would like to recognize vehicles passing through an intersection. Assume cars are being identified and tracked through the UAV's camera as it hovers over a particular intersection. Recall that the DOR generates and maintains linkage structures for vehicles as they are identified and tracked. It can be assumed that the following structured features exist:
pos = position(DOR, policy1, car1)
roadseg = road_segment(DOR, roadSegment(pos), policy2, car1)
incross = in_crossing(DOR, inCrossing(roadseg), policy3, car1)
pos is a feature of a car object and its fluent stream can be accessed via the DOR as part of its linkage structure. roadseg is a complex feature whose value
is calculated via a computational unit roadSegment which takes the geographical position of a world object associated with the car object as argument and uses it as an index into the GIS to return the road segment that the vehicle is in. Similarly, incross is a complex feature whose value is produced using a computational unit that takes the roadseg fluent stream as input and returns a boolean output stream, representing whether the car is in a crossing or not, calculated via a lookup in the GIS. For the sake of brevity, a car is defined to pass through an intersection if it is first in a road segment that is not a crossing, then it eventually is in a road segment that is a crossing, and then it is again in a road segment that is not a crossing. In this case, if the fluent stream generated by incross produces samples going from false to true and then eventually from true to false within a certain time frame, then the car is recognized as passing through a crossing. The chronicle recognition system would receive such streams and recognize two change events which match its chronicle definition and thereby recognize that the car has passed through the crossing. The stream itself requires some modification and policy3 specifies this via a monotonic time constraint and a change constraint. The monotonic time constraint makes sure the stream is ordered, i.e. the time stamps of events increase monotonically. The change constraint specifies how change is defined for this stream. There are several alternatives which can be used (a small code sketch of these policies follows the list):
• any change policy - any difference between the previous and current value is a change;
• absolute change policy - an absolute difference between the previous and current value larger than a parameter delta is a change;
• relative change policy - a normalized difference between the previous and current value larger than a parameter delta is a change.
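A minimal sketch of the three change policies, assuming a fluent stream is given as a sequence of (timestamp, value) pairs; the function and parameter names are illustrative rather than DyKnow's own.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[float, float]  # (timestamp, value)

def detect_changes(stream: Iterable[Sample], policy: str = "any",
                   delta: float = 0.0) -> List[Sample]:
    """Return the samples at which a change event would be generated."""
    events: List[Sample] = []
    prev = None
    for t, value in stream:
        if prev is not None:
            diff = abs(value - prev)
            if policy == "any":
                changed = diff > 0
            elif policy == "absolute":
                changed = diff > delta
            elif policy == "relative":
                # Normalize by the previous value (guarding against zero).
                changed = diff / max(abs(prev), 1e-9) > delta
            else:
                raise ValueError(f"unknown policy: {policy}")
            if changed:
                events.append((t, value))
        prev = value
    return events

# Example: an incross-like boolean stream encoded as 0/1 values.
incross = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1), (4.0, 0)]
print(detect_changes(incross, policy="any"))  # [(2.0, 1), (4.0, 0)]
```

The two events recovered in the example correspond to the false-to-true and true-to-false transitions that the chronicle definition for passing through a crossing looks for.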
There are obvious variations on these policies for different types of signal behavior. For example, one might want to deal with oscillatory values due to uncertainty of data, etc. The example used above is only intended to provide an overview as to how DyKnow is used by other modules and is therefore simplified. 37.7.3 An Application to Execution Monitoring The WITAS UAV architecture has an execution monitoring module which is based on the use of a temporal logic, LTL (linear temporal logic with intervals [7]), which provides a succinct syntax for expressing highly complex temporal constraints on activity in the UAV's internal environment and even aspects of its embedding environment. For example, safety and liveness conditions can easily be expressed. Due to page limitations we can only briefly describe this functionality. Essentially, we appeal to the intuitions about viewing bundles of fluent streams as partial models for a temporal logic and evaluating formulas relative to this model. In this case though, the model is fed piecewise (state-wise) to the execution monitor via a state extraction mechanism associated with the execution monitor. A special progression algorithm [7] is used which evaluates formulas in a current state and returns a new formula which, if true on the future states, would imply that the formula is true for the complete time-line being generated.
The DyKnow system is ideal for generating such streams and feeds these to the execution monitor. Suppose we would like to make sure that two task procedures (all invocations) in the reactive layer of the DARA, called A and B, can never execute in parallel. For example, A and B may both want to use the camera resource. This safety condition can be expressed in LTL as the temporal formula always ¬(∃x ∃y tp_name[x]="A" ∧ tp_running[x]=true ∧ tp_name[y]="B" ∧ tp_running[y]=true), where "always" in the formula is the modal operator for "at all times". To monitor this condition the execution monitor requires fluent streams for each of the possible instantiations of the parameterized features tp_name and tp_running, which can be generated by the reactive layer of the DARA. These are fed to the instantiated execution monitor, which applies the progression algorithm to the temporal formula above relative to the fluent streams generated via the DyKnow framework. This algorithm is run continuously. If the formula evaluates to false at some point, an alert message is sent to a monitor set up by the functionality interested in this information and modifications in the system configuration can be made.
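The following is a small, self-contained sketch of the progression idea for the safety condition above, written directly in Python rather than in the monitor's LTL syntax; the state representation and the function names are assumptions made for this illustration.

```python
from typing import Dict, List

State = Dict[str, Dict[str, object]]  # tp id -> {"name": ..., "running": ...}

def violates_now(state: State) -> bool:
    """True if some invocation of A and some invocation of B run in parallel."""
    a_running = any(tp["name"] == "A" and tp["running"] for tp in state.values())
    b_running = any(tp["name"] == "B" and tp["running"] for tp in state.values())
    return a_running and b_running

def progress_always_not(states: List[State]) -> bool:
    """Progress 'always not violates_now' over a finite prefix of states.

    Returns False as soon as the safety condition is violated; returning True
    only means the formula still has to hold on future states.
    """
    for state in states:
        if violates_now(state):
            return False  # here an alert would be sent to the interested monitor
    return True

# Example prefix of states extracted from the tp_name/tp_running fluent streams.
prefix = [
    {"tp1": {"name": "A", "running": True}, "tp2": {"name": "B", "running": False}},
    {"tp1": {"name": "A", "running": True}, "tp2": {"name": "B", "running": True}},
]
print(progress_always_not(prefix))  # False: A and B ran in parallel
```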
37.8 Related Work The DyKnow framework is designed for a distributed, real-time and embedded environment [10, 11] and is developed on top of an existing middleware platform, realtime CORBA [12], using the real-time event channel [13], the notification [14] and the forthcoming real-time notification [15] services. One of the purposes for this work is in the creation of a knowledge processing middleware capability, i.e. a framework for interconnecting different knowledge representation and reasoning services, grounding knowledge in sensor data and providing uniform interfaces for processing and management of generated knowledge and object structures. The framework is quite general and is intended to serve as a platform for investigating a number of pressing issues associated with the processing and use of knowledge on robotic platforms with soft and hard real-time constraints. These issues include anchoring, or more generally symbol grounding, signal to symbol transformations, information fusion, contextual reasoning, and focus of attention. Examples of application services which use the middleware capabilities are execution monitoring services, anchoring services and chronicle recognition services. We are not aware of any similar frameworks, but the framework itself uses ideas from many diverse research areas mainly related to real-time, active, temporal, and time-series database [16, 17, 18], data stream management [19, 20, 21], and work in the area of knowledge representation and reasoning. The main differences between DyKnow and the database and data stream approaches are that we have a different data model based on the concepts of features and fluents and we have many views or representations of the same feature data in the system each with different properties depending on the context where the feature is used as described by a policy.
References
1. Doherty, P.: Advanced research with autonomous unmanned aerial vehicles. In: Proceedings of the 9th International Conference on Principles of Knowledge Representation and Reasoning. (2004)
2. Doherty, P., Granlund, G., Kuchcinski, K., Sandewall, E., Nordberg, K., Skarman, E., Wiklund, J.: The WITAS unmanned aerial vehicle project. In: Proceedings of the 14th European Conference on Artificial Intelligence. (2000) 747-755
3. Doherty, P., Haslum, P., Heintz, F., Merz, T., Nyblom, P., Persson, T., Wingman, B.: A distributed architecture for autonomous unmanned aerial vehicle experimentation. In: Proceedings of the 7th International Symposium on Distributed Autonomous Robotic Systems. (2004)
4. Object Computing, Inc.: TAO Developer's Guide, Version 1.3a (2003) See also http://www.cs.wustl.edu/~schmidt/TAO.html.
5. Heintz, F., Doherty, P.: DyKnow: An approach to middleware for knowledge processing. Journal of Intelligent and Fuzzy Systems (2004)
6. Coradeschi, S., Saffiotti, A.: An introduction to the anchoring problem. Robotics and Autonomous Systems 43 (2003) 85-96
7. Lamine, K.B., Kabanza, F.: Reasoning about robot actions: A model checking approach. In: Advances in Plan-Based Control of Robotic Agents. LNAI (2002) 123-139
8. Heintz, F., Doherty, P.: Managing dynamic object structures using hypothesis generation and validation. In: Proceedings of the AAAI Workshop on Anchoring Symbols to Sensor Data. (2004)
9. Ghallab, M.: On chronicles: Representation, on-line recognition and learning. In: Proceedings of the International Conference on Knowledge Representation and Reasoning (KR-96). (1996)
10. Schmidt, D.C.: Adaptive and reflective middleware for distributed real-time and embedded systems. Lecture Notes in Computer Science 2491 (2002) 282-??
11. Schmidt, D.C.: Middleware for real-time and embedded systems. Communications of the ACM 45 (2002) 43-48
12. Schmidt, D.C., Kuhns, F.: An overview of the real-time CORBA specification. IEEE Computer 33 (2000) 56-63
13. Harrison, T., Levine, D., Schmidt, D.C.: The design and performance of a real-time CORBA event service. In: Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA-97). Volume 32, 10 of ACM SIGPLAN Notices, New York, ACM Press (1997) 184-200
14. Gruber, R., Krishnamurthy, B., Panagos, E.: CORBA notification service: Design challenges and scalable solutions. In: 17th International Conference on Data Engineering. (2001) 13-20
15. Gore, P., Schmidt, D.C., Gill, C., Pyarali, I.: The design and performance of a real-time notification service. In: Proc. of the 10th IEEE Real-time Technology and Application Symposium. (2004)
16. Eriksson, J.: Real-time and active databases: A survey. In: Proc. of 2nd International Workshop on Active, Real-Time, and Temporal Database Systems. (1997)
17. Ozsoyoglu, G., Snodgrass, R.T.: Temporal and real-time databases: A survey. IEEE Trans. Knowl. Data Eng. 7 (1995) 513-532
18. Schmidt, D., Dittrich, A.K., Dreyer, W., Marti, R.W.: Time series, a neglected issue in temporal database research? In: Proceedings of the International Workshop on Temporal Databases, Springer-Verlag (1995) 214-232
19. Abadi, D., Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., Zdonik, S.: Aurora: A new model and architecture for data stream management. VLDB Journal (2003)
20. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Proceedings of 21st ACM Symposium on Principles of Database Systems (PODS 2002). (2002)
21. The STREAM Group: STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin, 26(1) (2003)
38 Classifier Monitoring using Statistical Tests Rafal Latkowski^'^ and Cezary Glowinski^ ^ SAS Institute ul. Gdanska 27/31,01-633 Warszawa, Poland [email protected] ^ Warsaw University, Institute of Computer Science ul. Banacha 2, 02-097 Warszawa, Poland [email protected]
Summary. This paper addresses methods for early detection of the classifier fall-down phenomenon, which makes it possible to react in advance and avoid making incorrect decisions. For many applications it is essential that decisions made by machine learning algorithms are as accurate as possible. The proposed approach consists in applying a monitoring mechanism only to the results of classification, which does not cause additional computational overhead. An empirical evaluation of the monitoring method is presented, based on data extracted from simulated robotic soccer as an example of the autonomous agent domain and on synthetic data that stands for a standard industrial application.
38.1 Introduction The achievements of machine learning make it possible to apply it to many areas. Predictive models and built with their help classifiers not only enable us to create autonomous agents, but are commonly used also in business and industry. It is very essential that decisions made by machine learning algorithms were as accurate as it is possible. In other case they cannot achieve the expected targets, wherever applied: to marketing, to industry or in autonomous systems. Generally speaking the correctness of the decision making strictly depends on the accuracy of applied classifier. Obviously, the accuracy of the classifier is measured during the training phase. While creating the predictive model we select for deployment the model that achieves the highest accuracy and stability measured over prepared test data sets. Such verification is not possible during the productive life cycle of classifier, when it is applied to the real data gathered in dynamic and nondeterministic environment. The question that arises from such a situation is how we can trust the results of classifier? The first phenomenon that makes it doubtful to trust the classifier is that every natural process is evolving in time, e.g., customers are learning other offer and products, machines are changing their physical parameters and autonomous agents learn new strategies, what is frequently described as "concept drift" (see, e.g., [7]). It is known fact, that the classification results are continuously getting weaker and such a process is called ageing of the model. Usually the process of model ageing is slow and the reporting
is employed to identify it in a posteriori process, when the actual decision is known. The actual value of the decision is known not exactly at the same point of time when the classification is made, but dependently on the application, from fraction of second up to several months after the classification. The second phenomenon is sudden change of process of the revolutionary character, e.g., introduction of completely new product on market, machine failure or reprogramming autonomous agent with new meta-strategy of learning. The sudden classifier ageing or classifier fall-down phenomenon can be a consequence of many circumstances, even errors or changes in data preprocessing. It is a very dangerous phenomenon because it result in making wrong decision for a period of time (a couple of months in worst case), what can result in severe losses. To better express the necessity of the classifier monitoring let take some examples. The first example is related to autonomous agents. The open research community concentrated on the robotic soccer and RoboCup world championships has an aim to compete by the 2050 a human team of soccer players with a team of autonomous humanoid robots (see, e.g., [4]). Many research groups build software simulators or hardware robots for achieve this goal. Such an artificial soccer player should have special classifier that recognizes the strategy of opponent. This classifier can be misled by opponent that is completely reprogrammed or comes from newly created team. In such a situation classifier fall-down phenomenon can result in losing the game. The second example comes from business application. The telecommunication operators collect a lot of data on their customers. This data is used, e.g., to avoid the customer resignations by predicting them in advance. Such systems for customer retention are suffering from classifier fall-down phenomenon, e.g., when completely new categories of products are introduced. With false prediction the marketing campaigns are directed not to the desired target group. In this case reduced accuracy results in measurable losses even comparing to the case without classifier at all. This paper is addressed to methods for early detection of classifier fall-down phenomenon, what gives a possibility to react in advance and avoid making incorrect decisions. The proposed method consists in applying a monitoring mechanism only to results of classification, what not cause an additional computational overhead. The paper is organized as follows. In next Section the classifier monitoring method is described. Section 38.3 provides empirical evaluation with detailed description of the data sets and experiments. Section 38.4 contain final conclusions and remarks.
38.2 Method description 38.2.1 Motivation The initial idea on how to monitor a classifier could be checking the distributions of variables that are used to make the decision (predictors). In such an approach all variables are independently tested before classification is performed. This approach can be applied only to the cases, when the distribution of one variable is significantly different in training and test set. If the distribution is changing on more than one variable than even insignificant changes on one variable can result in classifier fall-down. The
approach proposed here is free from such deficiencies because it consists in testing the classifier answer. There is also another common situation that results in classifier fall-down. The training data used to build the model does not cover the full scope of the universe because, even when the universe is finite, it is enormously large. We believe that inductive learning finds the proper generalization of the presented facts. However, in real applications the classification of objects very far from the ones presented in the training phase results in poor accuracy. The one-variable test can easily fail to capture such a situation. Some solutions to this problem have been proposed (see, e.g., [6]), but they assume monitoring the object space by nearest neighbor methods or neural networks. These algorithms require additional computational effort comparable to the cost of creating the classifier itself. Our approach requires only time linearly proportional to the number of objects in the test and training sets.
38.2.2 Classifier Monitoring The proposed approach consists in applying a monitoring mechanism only to the results of classification. The classifier monitoring compares the distribution of answers on the data set used for training with the distribution of answers on the data set currently being classified. If the applied test shows a significant difference, then it is a signal to perform a detailed check of the classifier and, e.g., build a new model. There are a number of statistical tests for comparing different properties of one, two or several distributions. In this research we utilize nonparametric statistical tests and we do not assume any particular distribution. Only a few statistical tests satisfy such conditions, in particular the Wilcoxon rank sum test (equivalent to the Mann-Whitney test) and the Kolmogorov-Smirnov test (see, e.g., [2, 3, 5]). These tests detect differences in the location and shape of two distributions. The Wilcoxon and Kolmogorov-Smirnov tests have the advantage of making no assumption about the distribution of the data, i.e., they are non-parametric and distribution free. The result of the classification process can usually be of two types. The simpler type is a one-valued decision that assigns the classified object to a particular decision class. The more expressive result of classification is a probability vector that assigns to each possible decision a predicted probability that the classified object belongs to the considered decision class. For our research we use the second type of answer, which gives more detailed information on how the model works on the provided data. The classification or prediction process frequently proceeds in batches or in data streams, where not a single object but a whole set of objects is classified. Such a situation occurs when we are performing stand-alone tests on previously prepared data or when classification (prediction) is performed for, e.g., the total base of customers. The result of classification is then a set of answers, i.e., probability assignments. In this paper we are limited to the binary decision (yes or no), which corresponds to classifying that an object belongs to a concept versus classifying that it does not belong to the concept. The procedure of classifier monitoring is as follows (cf. Fig. 38.1):
1. Let C be a classifier, T = {t_1, ..., t_n} the data set used for training and P = {p_1, ..., p_m} a new data set, currently being classified.
2. Select one decision class d for which the probability assignments will be considered. From now on we will assume that C|d : U → [0,1] gives the
probability that an object x belongs to decision class d, i.e., C|d(x) = s.
3. Prepare the set of probability assignments S_T, called the scoring, for the data set T used for training. The set S_T = {s_1^T, ..., s_n^T} consists of all answers of the classifier C such that C|d(t_i) = s_i^T.
4. Prepare the set of probability assignments (scoring) S_P for the new data set P being classified. The set S_P = {s_1^P, ..., s_m^P} consists of all answers of the classifier C such that C|d(p_i) = s_i^P. The scoring S_P can be computed without knowing the actual decision values, so also before gathering the data on the decision.
5. Perform a statistical test on S_T and S_P that determines whether the changes in the classification process are significant or not. If the test value exceeds a specified threshold, then notify of a potential classifier fall-down.
Fig. 38.1. The procedure of classifier monitoring applies a statistical test to results of classification.
38.2.3 Classifier Fall-Down Identification The proposed approach to classifier monitoring consists in comparing two scorings: one for the training data and one for the currently classified data. There are several issues concerning proper classifier fall-down identification using this approach. The empirical evaluation presented further shows that not all statistical tests are applicable to this problem, even in spite of satisfying the requirements, e.g., that a test is model free. Besides the method of classifier monitoring presented here, we also evaluated another approach that compares not the scorings but the distributions of tested objects over the final leaves of the decision tree. However, in that approach we found no test or measure that correctly recognizes the classifier fall-down phenomenon. The Wilcoxon signed rank test, the cosine measure, the Kullback-Leibler divergence measure and the six-sigma rule either do not capture the classifier fall-down or notify of a nonexistent one. We suspect that the problem with those measures comes from the fact that they do not consider the actual score value s that is assigned to each decision tree leaf. If we consider the Kolmogorov-Smirnov test on two scorings, then this test depends not only on the distribution of objects to decision tree leaves but also on the actual score value in each leaf. The empirical distribution function (EDF) of a scoring, which is used to calculate the KS test, can be fully determined from the distribution of objects to leaves combined with the leaf score values. Perhaps other measures that also take into consideration the score values of the leaves can be successfully applied to this problem. In fact the transformation of the Kolmogorov-Smirnov test from the EDF to the
distribution of objects to leaves combining with leaf score values results in reduction of computational complexity of testing and in great compression of the classifier control data that has to be stored. The unresolved issue is how to estimate the optimal threshold value that delimitate predicted acceptable classifier accuracy from accuracy fall-down. Even if we precise the border between acceptable and unacceptable classifier accuracy it is unknown how to estimate this threshold. In our research we are familiar with considered data and classifier properties, so the threshold can be determined based on an expert experience. However, we do not have a general answer on how to estimate the threshold for proposed statistical tests. The proposed classifier monitoring is able to detect the accuracy fall-down only if there are some differences in description of classified objects. We can imagine another situation, where all object descriptions are untouched, but the concept is changing itself. In spite of that such a case is unobserved in real applications, it is possible to, e.g., generate the same synthetic data but with other concept labeling, where differences are only in decision attribute (target variable). There is no method at all to identify that prior knowing the actual decision (concept), while it touches the problem of learning the proper concept itself. In particular the proposed method of classifier monitoring is not able to recognize such a situation.
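As a concrete illustration of the monitoring procedure of Section 38.2.2, the sketch below compares the training scoring with the scoring of a newly classified batch using SciPy's two-sample Kolmogorov-Smirnov test; the default threshold is only a placeholder, since, as discussed above, the choice of threshold is left open.

```python
import numpy as np
from scipy.stats import ks_2samp

def monitor_classifier(scores_train: np.ndarray, scores_new: np.ndarray,
                       threshold: float = 0.2) -> bool:
    """Return True if a potential classifier fall-down should be reported.

    scores_train: probability assignments C|d(t_i) on the training set.
    scores_new:   probability assignments C|d(p_i) on the batch being classified.
    threshold:    assumed KS-statistic threshold (0.2 is the level quoted for the
                  synthetic data in Section 38.3, but it must be tuned per application).
    """
    statistic, _p_value = ks_2samp(scores_train, scores_new)
    return statistic > threshold

# Toy usage: a shifted score distribution triggers the alert.
rng = np.random.default_rng(0)
s_train = rng.beta(2, 5, size=10000)
s_new = rng.beta(5, 2, size=10000)
print(monitor_classifier(s_train, s_new))  # True
```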
38.3 Empirical Evaluation 38.3.1 Data Description We used two groups of data sets for the experimental evaluation of the proposed method. The first group is synthesized in such a way that it simulates an industrial data mining application. The second group is extracted from the RoboCup World Championship 2003 in the soccer simulation league.
Table 38.1. The results of experiments with synthetic data, where the decision tree classifier was induced for the first data set.
Data Set | Accuracy | Error rate | Standardized Wilcoxon Statistic | P-value Wilcoxon Test | Kolmogorov-Smirnov Statistic
1        | 83.83%   | 16.17%     | 0                               | 1                     | 0
2        | 70.83%   | 29.17%     | 0.409571                        | 0.682121              | 0.017434
3        | 57.20%   | 42.80%     | -0.3174                         | 0.75094               | 0.031917
4        | 43.71%   | 56.29%     | 0.200541                        | 0.841057              | 0.037072
The data sets for simulating an industrial application are synthesized. They contain samples from two multinormal distributions in the eight-dimensional space [0,1]^8. There are four data sets, where the standard deviations are constant, but the locations are getting closer in consecutive data sets. Each data set contains about 10000 observations (objects). The data sets from the RoboCup domain are extracted from log files of soccer simulator games that were held at the finals of the RoboCup World Championship 2003. The data contain the overall information about the playfield, like the positions of players or the number of already executed actions of each type. Each simulated player on the playfield was
manually marked according to whether it plays using an offensive strategy (attacker) or a defensive strategy (defender or goalie). The data was desymmetrized and transformed to a special form, where each record describes one player at a given time point of the game. The finally transformed data contains 46 conditional attributes and one decision (target) attribute, namely the strategy. There are eight data sets collected from four games with four participating teams, so each team is represented in two data sets. Each data set contains about 70000 observations (objects). 38.3.2 Experiments We carried out experiments separately for the RoboCup domain data sets and the synthetic data sets. The experiments were performed using an algorithm for decision tree induction implemented in SAS Enterprise Miner (see, e.g., [1]). The automatically generated scoring code allows storing both the scoring and the distribution of leaves. The first group of experiments was carried out for the synthetic data sets. The decision tree model was induced for the first data set, where the centers of the two normal distributions are distant. Then the classifier was applied to all four data sets. The classification results were gathered and tested as described in the previous section. The results of this experiment are presented in Table 38.1. The first data set was used in both training and testing. In the case of the first data set we can observe the highest classification accuracy and obviously no differences detected by the statistical tests at all. The consecutive data sets, which contain samples from closer distributions, are classified worse by the model induced for the first data set. The Wilcoxon statistic does not capture the essential classifier fall-down that occurs for the third and fourth data sets. In the case of the Kolmogorov-Smirnov statistic we can easily observe that the first and second data sets receive values less than 0.2, while the third and fourth ones more than 0.2. If we put a threshold at level 0.2, then the Kolmogorov-Smirnov statistic perfectly detects the classifier fall-down. The experiments for the data sets from the RoboCup domain were performed differently. A model for predicting the strategy was built for each data set. Each classifier was applied to all data sets. There are eight data sets, so also eight models were induced. In total 8 × 8 = 64 experiments were carried out to cover all combinations. Such a procedure simulates a strategy detection classifier that is faced with an unknown team or a known team but in another game. The classification accuracy results are presented in Table 38.2. As we expect, the diagonal elements, which correspond to classifying the data set on which the model was built, present fully accurate or almost fully accurate classification. A similar observation holds for classifying the same team, for which the model was built, but from the other game. The weakest classification accuracy in this category is 97.07% for the model built on team TsinghuAeolus in game 4 (the final) and tested on game 1 (a third level group game). The classification accuracy for other teams varies from 36.36% up to 100%. The results of the Kolmogorov-Smirnov test are presented in Table 38.3. The results presented in this table are almost perfectly correlated to the accuracy results. The diagonal elements are obviously equal to zero and classification of the same team
Table 38.2. The accuracy results of experiments with data sets from RoboCup domain.
Training data set       | Test data set
                        | TsinghuAeolus     | UvA_Trilearn      | Everest           | Brainstormers03
                        | Game 1  | Game 4  | Game 2  | Game 4  | Game 2  | Game 3  | Game 1  | Game 3
TsinghuAeolus Game 1    | 100%    | 98.56%  | 99.03%  | 97.13%  | 91.44%  | 96.53%  | 94.48%  | 99.25%
TsinghuAeolus Game 4    | 97.07%  | 100%    | 89.69%  | 87.96%  | 99.39%  | 99.07%  | 99.34%  | 95.26%
UvA_Trilearn Game 2     | 98.26%  | 99.86%  | 99.99%  | 99.59%  | 99.47%  | 97.37%  | 98.81%  | 98.93%
UvA_Trilearn Game 4     | 97.14%  | 90.19%  | 98.13%  | 100%    | 76.84%  | 76.4%   | 78.28%  | 96.11%
Everest Game 2          | 97.61%  | 100%    | 89.91%  | 89.28%  | 100%    | 98.66%  | 98.63%  | 96.12%
Everest Game 3          | 99.36%  | 98.98%  | 88.32%  | 88.1%   | 99.26%  | 99.99%  | 99.25%  | 93.27%
Brainstormers03 Game 1  | 36.36%  | 63.64%  | 63.64%  | 72.73%  | 72.73%  | 45.45%  | 100%    | 100%
Brainstormers03 Game 3  | 36.36%  | 63.64%  | 63.64%  | 72.73%  | 72.73%  | 45.45%  | 100%    | 100%
Table 38.3. The Kolmogorov-Smirnov statistic results of experiments with data sets from RoboCup domain. The table has the same layout as Table 38.2: rows are the training data sets (TsinghuAeolus Game 1 and 4, UvA_Trilearn Game 2 and 4, Everest Game 2 and 3, Brainstormers03 Game 1 and 3) and columns are the corresponding test data sets. The reported entries for the TsinghuAeolus, UvA_Trilearn and Everest test columns are: 0.0007 0.0042 0.0143 0.0393 0.0135 0 0.0147 0 0.0515 0.0602 0.0010 0.0017 0.0090 0.0007 0 0.0017 0.0027 0.0112 0.1158 0.1180 0.0143 0.0490 0.0076 0 0 0.0015 0.0120 0.0001 0.0504 0.0536 0.0016 0.0039 0.0583 0.0595 0.0018 0 0.0455 0.0909 0.1818 0.1364 0.1364 0.0909 0.0454 0.0909 0.1817 0.1363 0.1363 0.0909; for the Brainstormers03 test columns: 0.0037 0.0030 0.0027 0.0237 0.0060 0.0056 0.1086 0.0194 0.0013 0.0194 0.0030 0.0329 0 0 0 0.
Fig. 38.2. The classification accuracy and statistical test results (classification accuracy, KS statistic and Wilcoxon p-value) on data from RoboCup domain. The results are sorted by classification accuracy.
gives KS-test value below 0.015. Figure 38.2 presents the same results in graphical form, where experiments are sorted with respect to classification accuracy. It is easy to observe that while the accuracy is decreasing the KS-test value is almost al-
ways increasing. If we set the threshold between 0.04 and 0.045 then all 22 worst classification results in range from 36% to 90% are recognized as doubtful. If we set the threshold between 0.061 and 0.09 then the classification accuracy fall-down from level 88% to 78% is correctly recognized except two the worst experiments. It means that 12 out of 14 cases are correctly recognized. The p-value of Wilcoxon rank sum test, presented on Figure 38.2, does not manifest similar properties. The p-value for experiments with 100% classification accuracy is 1.0. However, for other experiments the p-value is extremely variable and is almost zero also for tests with classification accuracy above 90%.
38.4 Conclusions The empirical evaluation shows that the application of a proper statistical test makes it possible to detect classifier malfunctioning. The experimental results showed that the Kolmogorov-Smirnov test is recommended for detecting the classifier fall-down phenomenon. The proposed method can be applied to monitor any type of classifier under the assumption that it generates a scoring in the form of a probability estimation, e.g., the probability of belonging to a decision class. The proposed approach is suitable for detection of classification accuracy fall-down in the case of binary classifiers. For other purposes it is necessary to extend the scoring definition in order to apply similar statistical tests, or to replace the testing technique. The other deficiency of the proposed method is the lack of strict guidelines on how to determine the proper threshold value and its confidence interval. In our further research we will try to overcome this problem by providing strict estimations of the possible classification accuracy fall-down with respect to the KS-test value. Although the presented experiments were carried out using a decision tree induction algorithm, there is no obstacle to applying this method to other classifiers, e.g., those based on decision rules or artificial neural networks. The proposed method of classifier monitoring is applicable to classifiers induced by any algorithm. The only requirement is the availability of a scoring or similar probability-like values produced by the classifier.
References
1. Data Mining Using SAS Enterprise Miner: A Case Study Approach, Second Edition. SAS Publishing (2003)
2. Conover W.J.: Practical Nonparametric Statistics, Second Edition. John Wiley & Sons (1980)
3. Hollander M., Wolfe D.A.: Nonparametric statistical inference. John Wiley & Sons (1973)
4. Kaminka G.A., Lima P.U., Rojas R.: RoboCup 2002: Robot Soccer World Cup VI. LNCS 2752. Springer (2003)
5. Koronacki J., Mielniczuk J.: Statystyka dla studentow kierunkow technicznych i przyrodniczych. WNT (2001)
6. Liu Y., Menzies T., Cukic B.: Data Sniffing — Monitoring of Machine Learning for Online Adaptive Systems. In: 14th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'02). IEEE (2002)
7. Freund Y., Mansour Y.: Learning under persistent drift. In S. Ben-David, editor, Proceedings of the EuroCOLT'97. LNCS 1208, 94-108. Springer (1997)
39 What Do We Learn When We Learn by Doing? Toward a Model of Dorsal Vision*
Ewa Rauch
Linkoping University, Sweden (on leave) [email protected]
Summary. Much effort in computer science is currently focused on developing architectures for multi-agent adaptive systems capable of monitoring the environment and detecting security threats. I present here one such architecture developed by evolution and implemented in the neural mechanisms of the human brain - it is the dorsal visual system. I claim that the dorsal visual system in the human brain can be modeled as two cooperating rough agents which monitor the environment and guide other systems. The two agents' adaptation capabilities can be modeled on the basis of research in neuroscience related to the processes of implicit learning from experience. In the paper I first present arguments behind my claim. Next, I show how studying the dorsal visual system may help to improve human-machine interaction. Finally, I suggest how the conjectures presented here can be tested experimentally.
Key words: categorization, dorsal vision, finite state automata, guidance, knowledge discovery, patterns, proportional monitoring, rough set theory, templates
39.1 Introduction Building monitoring software systems capable of operating in "out-of-human-reach" environments, like deep seas or earthquake areas, presents still a serious challenge. Such systems need to be quite autonomous and adapt to new environments with minimal involvement of human expertise. They have to provide informative guidance for other software systems and to alert human operators, when a threat is detected. Ease of communication with human operators is of highest importance. The cognitive load that is put on people operating existing monitoring systems is much too high: "Instead of, how it is today, having 4 people occupied with interpreting information from a large sky telescope, we want to have a single operator working in front of screens with information from 4 such telescopes" [22] How can this goal be achieved? Probably, not by improving existing artifacts. Despite many important achievements, traditional Artificial Intelligence (AI) has not *I would like to express my gratitude to Prof. Andrzej Skowron for his advice. His comments are invaluable help to me. The research has been supported by the grant 3 Tl IC 002 26 from Ministry of Scientific Research and Information Technology of the Republic of Poland.
provided efficient means for building such software. Complex environments, changing rapidly and largely unpredictably, cannot be monitored using approaches based on prespecified knowledge, like statistics, cause-effect relationships, or domain models. These approaches require a lot of human expertise and are relatively inflexible. As Waltz and Kasif say about statistical decision theory, "the models we can devise effectively are rarely accurate," while methods of traditional AI "rely on laborious hand-coding, have difficulty coping with uncertainty and change" [23]. Moreover, monitoring systems are oriented toward predictions, while traditional AI approaches usually provide rich diagnostic knowledge, which is necessary in breakdown situations, but often not useful for prediction. As Cobos et al, point, in diagnosis and in prediction "the temporal structure of the information provided to a reasoner may vary (e.g., multiple events followed by a single event vs. a single event followed by a multiple events)" [2]. Traditional AI approaches are organised around global, inflexible, usually hand-coded, models. Waltz and Kasif stress that it is not a proper way of building monitoring systems required to act in dynamic and uncertain environments. They posit that local models, learned over small neighborhoods from the most relevant instances, should be used instead [23]. It might seem that we need to invent a new architecture, but is it really true? In this paper I show that an architecture composed of two autonomous cooperating subsystems capable of monitoring a complex environment and adapting to its changing character has already been invented and is actually realized in many implementations. This architecture provides for building reliable monitoring systems which learn local models without supervision and generate efficient guidance information without increasing cognitive load unless absolutely necessary. This architecture underlies the dorsal visual system in the human brain, invented and perfected over ages by Evolution. Previously relatively unknown, because of its impenetrability by conscious access, the dorsal visual system is now amenable to investigation in the healthy human brain using neuroimaging procedures. It may be observed, studied, and computationally modeled using new, so called "soft", AI technologies. This paper is organized as follows. First, I present arguments behind my claim that the dorsal visual system in the human brain can be modeled as two monitoring rough agents. Next, I discuss how studying the dorsal visual system may help to improve human-machine interaction. Finally, I suggest how the conjectures presented here can be tested experimentally.
39.2 Modeling the dorsal system as two monitoring rough agents In order to show that the dorsal visual system can be modeled as two monitoring rough agents, I first present evidence that it is a system which monitors the visual environment and provides guidance to other systems. Later, I refer to research which suggests that the dorsal visual system adapts to changing conditions by implicit learning from experiences of interactions with objects in the new environment. Then, I present a patient case which demonstrates the visual performance that is possible when visual information from the environment is provided exclusively by a single, retino-cortical dorsal pathway. It shows how powerful and autonomous each of the two dorsal visual pathways in the healthy human brain is. Having shown that
the dorsal visual system is an adaptive system which has two, relatively autonomous, subsystems, I draw on the most often used definition of an agent as a software system which is adaptive, cooperative, and autonomous, and conceptualize the dorsal visual system as two agents. Finally, I present some conjectures underlying the work on modeling the dorsal visual system and explain why its computational model has to be built using tools based on rough set theory [19], one of the new technologies in AI. 39.2.1 Monitoring and the dorsal visual system The world around us is an ever-changing, complex environment. In order to act advantageously we need to monitor it. Gruber and Goschke present recent behavioral and functional neuroimaging studies suggesting that mechanisms of background-monitoring operate continuously in the human brain. These mechanisms, which are located in the fronto-parietal regions, interpret information that selective attention dismisses as task-irrelevant. Though potentially distracting for the ongoing action, such information cannot be ignored completely because in a changing environment task-irrelevant information may signal serious threats or important chances. The background-monitoring mechanisms, working for the most part silently and in hiding, interrupt an ongoing action and update working memory when necessary. This way the executive control centres in the human brain may be "influenced by the occurrence of unexpected or significant stimuli outside the current focus of attention. [...] This may enable the individual to interrupt and adapt behavior in response to significant and/or unexpected events." [10] People gather information from the world via senses. The most often used is, probably, vision. Two anatomically and functionally separate visual systems operate in the human brain: the ventral visual system and the dorsal visual system. In the ventral visual system there are two retino-cortical pathways, each one leading from the eye to the temporal lobe in the same hemisphere. In the dorsal visual system each of the two retino-cortical pathways crosses the midline and goes to the contralateral parietal lobe. The ventral visual system, also called the 'what'-system, getting its input from the central visual field, is very well understood, since it is accessible to conscious access and to verbal report [9]. It was extensively studied over the years, mainly as a consequence of interest in human-computer interaction (HCI). Because the ventral visual system processes only stimuli coming from the central visual field around the current focus of attention, the background-monitoring processes take their input from the stimuli carried by the dorsal visual system. It means that the dorsal visual system in the human brain is responsible for monitoring events in the visual environment and gathering information which may signal threats or unexpected chances. 39.2.2 Adaptivity and the dorsal visual system In order to act advantageously in the complex and changing world, Evolution equipped us with adaptation capabilities. "The ability to detect unusual, significant, or possibly dangerous events is fundamental for adapting to a rapidly changing environment and insuring survival of the organism" [16]. When a behavior, understood here as a sequence of actions, successful in the old environment is no longer advantageous, it must be adapted to the new environment. The dorsal visual system, which
is responsible for visual guidance of actions [18], must also quickly adapt to the new conditions. Since the dorsal visual system is impenetrable to conscious access, it cannot be provided with explicit knowledge from the cognitive mechanisms and must learn autonomously from experiences of interactions with the new environment. Recent results in the area of sequential learning, which is closely related to learning action sequences, improve our understanding of how the dorsal visual system learns. Keele et al. present a theory of sequence learning - a single theory which has grown out of years of independent studies in several laboratories with different origins. This theory suggests that there are two learning systems and that they represent sequential regularities in different ways. The first system, called multidimensional, builds cross-dimensional associations between events. The second system, unidimensional, is strongly modular. Each module is capable of automatically extracting regularities from its input events. Learning in such modules is implicit and dual-task conditions do not disrupt it. The dichotomy unidimensional/multidimensional closely resembles the dorsal/ventral dichotomy in the human visual system, but with respect to the dorsal visual system the theory suggests "a more general role for this system, proposing modules that provide the representations for organized sequences of actions" [12]. Hence, the dorsal visual system is an adaptive system. It implicitly learns regularities in the environment as well as sequences of actions. 39.2.3 Autonomy/cooperation and the dorsal visual system Though the ventral visual system has been well understood for quite a long time, the dorsal visual system is still largely a mystery. We know that it is very sensitive to luminance changes and motion, but we have a very poor understanding of what is really happening with the low-spatial-resolution stimuli gathered outside the central visual fields and carried via the magnocellular channels to the parietal lobes. Before the advent of brain imaging tools, the dorsal visual system was difficult to study in healthy humans as it is not available for conscious access. To study it in animals was impossible because it is silenced by anesthetics. Since human interaction with computers was assumed to be channeled exclusively via the ventral visual system, the dorsal visual system as a medium of communication between a human operator and a machine was never really studied. Despite some lesion studies providing evidence that it is actually a powerful channel for gathering visual information, there was no strong pressure to investigate how the dorsal visual system works. Now, with modern tools allowing us to see a healthy brain in action, the situation is different and the interest in studying the dorsal visual system grows very quickly. The case of a 30-year-old man who at the age of 3 years completely lost his ventral visual system and one retino-cortical pathway in his dorsal visual system is reported in [14]. All visual information that his brain uses is carried by the retino-cortical magnocellular-dorsal pathway leading from his right eye to the left parietal lobe. Despite the fact that he is legally blind - has no conscious vision whatsoever - he is capable of driving a motorcycle and playing fast ball games. He is capable of moving in unfamiliar environments and with poor lighting, relying only on landmarks to direct his movements toward a goal. His visuomotor performance is very good - he easily catches two table tennis balls and juggles with them. At the same time, because of his perceptual
limitations, he cannot choose his food at the cafeteria. He is able to grasp an object precisely and name it, though he has no conscious perception of its size, color or orientation. He can neither read nor write, but is fluent in Braille. He "offers the quite unusual opportunity of evaluating what it is to see [...] through a dorsal system on the basis of magnocellular information alone" [14]. This patient case proves that we strongly underestimate the dorsal visual system and its importance for human performance. The architecture of a computational model of the dorsal visual system has to be composed of two agents, because, as we can see from the above, the dorsal visual system has two, relatively autonomous, adaptive subsystems. 39.2.4 The dorsal visual system as a multi-agent monitoring architecture From what has been presented it follows that the dorsal visual system may be seen as a monitoring multi-agent architecture implemented in the neural structures of the human brain. A symbolic computational model developed on the basis of this architecture would allow testing hypotheses related to the dorsal visual system as well as to multi-agent monitoring systems in general. Though nowadays neuroscience is studying the dorsal visual system very actively and many researchers have attempted to model some aspects of it, so far I know of no symbolic model of the dorsal visual system which could provide for computational testing of its monitoring capabilities. Aristotle once said that what we have to learn to do, we learn by doing. His main interest was the phenomenon of change and its importance for human cognitive functioning in the world. Recent results in attention research provide evidence that changes are important not only for human cognition, but also for human body performance. Changes which we are not aware of may be detected by the dorsal visual system and affect our actions [8]. There is some evidence that, while performing interactive actions, people, unconsciously and unintentionally, learn regularities in environmental changes reflecting effects of the actions [13]. These learned regularities, temporal and spatial, in sequences of changes, may guide attention as well as anticipatory action preparation mechanisms in order to improve performance [6]. Generally speaking, performance can be based on endogenous cues (top-down, memory based, cognitive expectations) or on exogenous cues (bottom-up, sensory events) [6]. Performance based on endogenous cues requires deliberate effort and awareness, while performance based on exogenous cues, detected with or without awareness, is fast and almost effortless. The dorsal visual system learns regularities in sequences of changes in a stream of spatio-temporal information [12]. On this basis it may sometimes predict the time delay between two changes and find the next location of interest. These two kinds of information, approximate spatial locations and temporal intervals, are, respectively, spatial and temporal exogenous cues and may influence performance [4]. Spatial exogenous cues provide benefits only if they reliably predict the location of interest. Because of costs inherent in attentional shifts, spatial exogenous cues which are not predictive and require reorienting of attention are not advantageous. Today, very little is known about the mechanisms employing temporal exogenous cues. I suggest that temporal exogenous cues are employed by the background-
monitoring mechanisms and, while predictive temporal cues provide for efficient monitoring strategies based on proportional monitoring [1], unpredictive temporal cues make it necessary to use inefficient strategies based on periodic monitoring. I suggest that "learning-by-doing" may be seen as unconsciously extracting more and more predictive spatial or temporal exogenous cues. It is accompanied by a decrease in performance guided by the ventral visual system and an increase in performance guided by the dorsal visual system. This paper is a part of a work toward a computational model of the dorsal visual system in the human brain. The model is aimed at explaining how unconscious processes of action guidance, which operate in the dorsal visual system in the human brain [18], can extract regularities from visual stimuli and how these regularities can be used in environment monitoring and in the category-invention process of unsupervised learning [3]. The model will also show how finite state machines, suggested as a computational analogue of the neurobiological mechanisms [20], can be induced from the extracted regularities and used by the dorsal visual system in the human brain for representing knowledge learned from experience. Following the ideas presented by Metzinger and Gallese [15], the model will explain how action ontologies may emerge in the process of unconscious "learning-by-doing". The work is in a very early phase. Currently, the focus is on modeling the mechanisms which provide for autonomous discovery of predictive temporal knowledge in categorical time-series generated from timestamped snapshots and using it for temporal prediction, as well as on modeling the mechanisms for autonomous discovery of approximate spatial knowledge from sequences of snapshots and using it for spatial prediction. The dorsal visual system in the human brain is proposed to be modeled as two agents. The agents extract information from streams of imperfect data and use it on-line for task guidance. The extracted information is also used off-line for learning from experience. Since the dorsal visual system is not accessible to conscious access, each agent extracts information solely from its input stream. The dorsal visual system is proposed to be modeled using rough agents, i.e., agents performing approximate reasoning based on rough sets [19, 17]. According to what results in neuroscience tell us about the dorsal visual system and its monitoring capabilities, it should be modeled as an unsupervised learning system discovering knowledge from an evolving stream of imperfect data. Actually, the tools based on rough set theory provide the best means for modeling such systems [7].
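As a purely illustrative aside (not part of the cited work or of the proposed model; all constants and names below are made up), the difference between periodic and proportional monitoring can be sketched as two interval policies:

#include <math.h>

/* Periodic monitoring: check at a fixed interval regardless of any temporal cue. */
double periodic_next_check(double now, double fixed_interval) {
    return now + fixed_interval;
}

/* Proportional monitoring: when a predictive temporal cue gives an expected
 * event time, space the checks in proportion to the time still remaining,
 * so far fewer checks are wasted than with the periodic policy. */
double proportional_next_check(double now, double predicted_event_time, double fraction) {
    double remaining = predicted_event_time - now;
    if (remaining <= 0.0) return now;                /* event is due: check immediately */
    return now + fmax(0.01, fraction * remaining);   /* e.g. re-check after 30% of the remaining time */
}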
39.3 The dorsal visual system and human-machine interaction The work on modeling the dorsal visual system presented here is aimed toward a better understanding of what we unconsciously learn when we "learn-by-doing". In the area of human-machine interaction it may help to reduce the attentional load on the human operator if designers of graphical user interfaces (GUIs) take into consideration the fact that the dorsal visual system is a channel through which the human brain may effortlessly gather action guidance information. Instead of designing GUIs based on endogenous cues, which require deliberate effort and awareness, GUIs may generate exogenous cues whenever not a decision, but a visually guided action, is required.
Getting some unnecessary attentional load off the human operator in this way is, surely, a step toward having a single operator working with 4 telescopes at once. Another potential advantage of the work on modeling the dorsal visual system is preventing health problems related to using computers at work. The recent rapid development of brain imaging techniques has opened ways for studying unconscious processes in the brain of a person performing a task. We know today for sure that unconsciously registered spatial as well as temporal information may strongly affect the performance of the musculoskeletal system in the human body [11]. A graphical user interface (GUI) almost continuously generates an evolving stream of graphical stimuli. Since designers take into consideration only the spatial organisation of visual information, the temporal regularities, if any, are created accidentally and may provide the dorsal visual system with some unpredictive temporal exogenous cues. In sensitive people this erroneously activates anticipatory action preparation mechanisms, increases the level of neuromotor noise together with the accompanying limb stiffness [24], and eventually may lead to neurological injuries. In order to prevent such problems, GUIs for sensitive computer users must control the temporal aspects of the information generated on the screen. I believe that some musculoskeletal and neurological complaints related to using computers at work, like carpal tunnel syndrome, may be avoided if designers pay attention to the temporal regularities generated on the computer screen. The computational model proposed here may provide means for testing individual differences in sensitivity to temporal exogenous cues as well as the temporal regularity of a GUI.
39.4 Computational experiments related to the model Conjectures underlying the proposed model of the dorsal visual system can be seen as an attempt to explain grounding action words in perception. Computational experiments can be designed following the approach described by Regier and Carlson who construct spatial templates for visualization of some linguistic spatial categories and use computational models for predicting acceptability ratings. The predicted ratings are later compared with human behavioral data. Similarly, temporal templates can be created with some linguistic action categories related to these templates, and computational predictions can be compared with behavioral data.
39.5 Final comment I hope that by taking into consideration the processes operating in the dorsal visual system we may build interactive tools which do not harm people using them in everyday work.
References 1. M.S. Atkin, P.R. Cohen: Monitoring in Embedded Agents, Computer Science Technical Report 95-66, University of Massachusetts, Amherst, MA, 1995. 2. P.L. Cobos, F.L. Lopez, A. Cano, J. Almaraz, D.R. Shanks: Mechanisms of Predictive and Diagnostic Causal Induction, Journal of Experimental Psychology: Animal Behavior Processes, 2002, 28(4), 331-346. 3. J.P. Clapper, G.H. Bower: Adaptive Categorization in Unsupervised Learning, Journal of Experimental Psychology: Learning, Memory, and Cognition, 2002, 28(5), 908-923.
4. J.T. Coull, A.C. Nobre: Where and when to pay attention: The neural systems for directing attention to spatial locations and to time intervals as revealed by both PET and fMRI, Journal of Neuroscience, 1998, 18(18), 7426-7435. 5. J.T. Coull, C.D. Frith, C. Büchel, A.C. Nobre: Orienting attention in time: behavioural and neuroanatomical distinction between exogenous and endogenous shifts, Neuropsychologia, 2000, 38(6), 808-819. 6. J.T. Coull: fMRI studies of temporal attention: allocating attention within, or towards, time. Cognitive Brain Research, 2004, In Press. 7. I. Düntsch, G. Gediga: Rough set data analysis. In Encyclopedia of Computer Science and Technology, Marcel Dekker, 2000. 8. D. Fernandez-Duque, I.M. Thornton: Explicit Mechanisms Do Not Account for Implicit Localization and Identification of Change: An Empirical Reply to Mitroff et al. (2002), Journal of Experimental Psychology: Human Perception and Performance, 2003, 29(5), 846-858. 9. M.A. Goodale, D.A. Westwood: An evolving view of duplex vision: separate but interacting cortical pathways for perception and action. Current Opinion in Neurobiology, 2004, 14, 1-9. 10. O. Gruber, T. Goschke: Executive control emerging from dynamic interactions between brain systems mediating language, working memory and attentional processes. Acta Psychologica, 2004, 115, 105-121. 11. K. Imanaka, I. Kita, K. Suzuki: Effects of nonconscious perception on motor response. Human Movement Science, 2002, 21, 541-561. 12. S. Keele, R. Ivry, U. Mayr, E. Hazeltine, H. Heuer: The Cognitive and Neural Architecture of Sequence Representation, Psychological Review, 2003, 110(2), 316-339. 13. W. Kunde, J. Hoffmann, P. Zellmann: The impact of anticipated action effects on action planning. Acta Psychologica, 2002, 109, 137-155. 14. S. Le, D. Cardebat, K. Boulanouar, M-A. Henaff, F. Michel, D. Milner, C. Dijkerman, M. Puel, J-F. Demonet: Seeing, since childhood, without ventral stream: a behavioural study. Brain, 2002, 125, 58-74. 15. T. Metzinger, V. Gallese: The emergence of a shared action ontology: Building blocks for a theory. Consciousness and Cognition, 2003, 12, 549-571. 16. E. Nagy, G.F. Potts, K. Loveland: Sex-related ERP differences in deviance detection. International Journal of Psychophysiology, 2003, 48(3), 285-292. 17. S.K. Pal, L. Polkowski, A. Skowron: Rough-Neural Computing: Techniques for Computing with Words, Series in Cognitive Technologies, Springer-Verlag, Heidelberg, 2004. 18. R. Passingham, I. Toni: Contrasting the Dorsal and Ventral Visual Systems: Guidance of Movement versus Decision Making, NeuroImage, 2001, 14, S125-S131. 19. Z. Pawlak: Rough Sets. Theoretical aspects of reasoning about data. Kluwer Academic Publishers, Dordrecht, 1991. 20. K.M. Petersson, C. Forkstam, M. Ingvar: Artificial syntactic violations activate Broca's region. Cognitive Science, 2004, 28, 383-407. 21. T. Regier, L. Carlson: Grounding Spatial Language in Perception: An Empirical and Computational Investigation, Journal of Experimental Psychology: General, 2002, 130(2), 273-298. 22. K. Sycara: personal communication, MSRAS, Plock, 2004. 23. D. Waltz, S. Kasif: On Reasoning from Data, ACM Computing Surveys, 1995, 27(3), 356-359. 24. A.W.A. Van Gemmert, G.P. Van Galen: Auditory stress effects on preparation and execution of graphical aiming: A test of the neuromotor noise concept. Acta Psychologica, 1998, 98, 81-101.
40 Rough Mereology as a Language for a Minimalist Mobile Robot's Environment Description
Lech Polkowski and Adam Szmigielski
Polish-Japanese Institute of Information Technology, Warsaw, Poland
Department of Mathematics and Computer Science, Univ. of Warmia and Mazury, Olsztyn, Poland
{polkow,aszmigie}@pjwstk.edu.pl
Summary. Rough Mereology is a paradigm for approximate reasoning proposed by Polkowski and Skowron in 1994. The basic primitive notion of Rough Mereology is the notion of a part to a degree and the functor returning this value is called the rough inclusion. Rough Mereology has been shown to be a very flexible tool in problems of Approximate Reasoning including reasoning in many-agent systems, spatial reasoning, granulation of knowledge problems, the computing with words paradigm, among others. Here, it is applied to provide a language aimed at describing the mobile robot environment in the case of a minimalist mobile robot equipped with an omnidirectional sonar and navigating in the environment of a sonar GPS. Mereological constructs are applied toward definitions of geometric notions and navigation algorithms are based on these geometric notions. Some results of experiments with the real robot Pioneer 2DX are reported, showing the applicability of rough mereology in mobile robot navigation tasks.
40.1 Introduction Space considerations play a very important role in tasks of mobile robot navigation and control. It stems from the rather obvious fact that in order to effectively navigate in an environment the mobile robot should either possess knowledge about the environment, which may be encoded within the robot as a map of the environment [1] or as a database of objects to be recognized - facilitating a transition from perception to action [7]. Another possibility is to engage the robot in interaction with the environment and to build spatial constructs from sensory readings directly, without resorting to a dedicated planner or reasoning center. In this way, it seems, one implements the embodied intelligence idea of Brooks [3]. *The simulation and real-experiment results presented here formed a part of this author's PhD dissertation, of which the first author was the supervisor, defended at the Institute of Automation and Computer Science of the Department of Electronics and Information Techniques of the Warsaw University of Technology in May, 2004.
Spatial models of a mobile robot environment should be effective in the sense of robot control complexity but they should also have sufficient expressive power in the sense of human understanding. While standard quantitative models are very well developed and powerful, a precise description cannot by itself illuminate expressions like "near", "in front of", etc. Qualitative models are very useful in mobile robot navigation, but they may fail when we try to locate the robot with linguistic expressions like "near", "between", or when we want to express the intention of robot movement like "move further away from" or "turn a little left". In implementing a program that would lead to a robot controlled by natural language phrases as mentioned above, one has first to translate phrases of language into constructs related to spatial properties of the robot environment and then to relate those properties to specific configurations of the sensory readings. For these reasons, we have selected Rough Mereology as a language for approximate reasoning to be used in constructing a set of notions that a robot may use to relate itself to its environment. In order to stress the method and its relative strength, we have decided on a minimalist robot, equipped with an omnidirectional sonar as its only sensor for detecting environment objects, and interacting with the environment by communicating with a network of sonar receivers via a sonar emitter. As a result of the interaction and sensory abilities, two circles are created: one centered at the robot, of radius equal to the distance to the nearest obstacle, a boundary of a region called the collision-free region, and the other centered at a receiver, delimiting a region called the receiver region. The two regions intersect, allowing for a degree of partial containment measured as the ratio of the area of the intersection of the regions to the area of the larger region. The numerical value obtained in this way is used in definitions of geometric notions like "between", "near to", etc., in terms of which algorithms for robot navigation are written down. In what follows, we outline the basic notions of mereology and rough mereology, we describe the idea of the reasoning process devised for our robot, we present the description of the physical system of the robot, and finally we present some results of experiments.
40.2 Mereology Mereology is a theory of sets that - in contrast to the classical, or naive, set theory based on the primitive notion of an element - assumes the relation of being a part as its primitive notion, cf. [9]. The primitive notion of mereology is a predicate (relation) π of being a part. Given a universe of entities, π does satisfy the following requirements: (PRT1) it is not true that there exists an entity x with the property that xπx; (PRT2) if xπy and yπz then xπz.
meaning that the relation π is non-reflexive and transitive, i.e., it strictly orders the given universe. (!) In order to relate this non-standard notion to the familiar notions of naive set theory, we may observe that the relation ⊂ of strict containment is a part relation on any family of sets. The notion of an element, el, can be defined as x el y ⇔ xπy ∨ x = y. Then the el relation is a partial order on the universe of entities; in particular, x el x for any entity x.
40.3 Rough Mereology Lesniewski's mereology can be extended by adding a predicate of being a part to a degree r, denoted by μ_r, where r ∈ [0,1], see [10]. The formula xμ_r y can be read: "x is a part of y to a degree r". We recall the set of axioms the predicate μ_r should satisfy. These axioms stem from our intuitive understanding of the nature of partial containment.
• RM1: xμ_1 y ⇔ x el y; this means that being a part to the degree 1 is equivalent to being an element in the mereological sense. (!) In the model where the part relation is strict containment of sets ⊂, (RM1) does mean that the region (set) x is included in the region y either in a strict way or both regions are equal.
• RM2: xμ_1 y ⇒ ∀z (zμ_r x ⇒ zμ_r y); the monotonicity property: any region z is a part of the region y to a degree not smaller than that to which z is a part of x, if only x is an element of y.
• RM3: (x = y ∧ xμ_r z) ⇒ yμ_r z; this means that the identity of individuals is a congruence with respect to μ.
• RM4: (xμ_r y ∧ s < r) ⇒ xμ_s y; establishes the meaning of "a part to degree at least r". It means that if a region is a part to some degree, then it is also a part to any smaller degree.
Rough mereology can make the bridge between the qualitative and quantitative spatial description by the numerical determination of region-based relations. We recall the idea of Pawlak and Skowron [8] of a rough membership function. For two non-empty sets of objects in an information table, x, y, let

μ_x(y) = |x ∩ y| / |y|,   (40.1)

where |x| stands for the cardinality of x. Obviously, the formula extends to the continuous case with |x| the area of x.
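A minimal sketch of (40.1) for finite sets (our own illustration; the data representation - sorted arrays of object identifiers - and the function name are not part of the cited framework):

#include <stddef.h>

/* Rough membership (40.1): mu_x(y) = |x intersect y| / |y|.
 * x and y are sorted arrays of object identifiers; the intersection is
 * counted with a merge scan. Returns 0 for an empty y (undefined case). */
double rough_membership(const int *x, size_t nx, const int *y, size_t ny) {
    size_t i = 0, j = 0, common = 0;
    while (i < nx && j < ny) {
        if (x[i] == y[j]) { ++common; ++i; ++j; }
        else if (x[i] < y[j]) ++i;
        else ++j;
    }
    return ny ? (double)common / (double)ny : 0.0;
}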
40.4 Rough-Mereological Geometry System Predicates μ_r may be regarded as weak distance functions in the context of geometry. From this point of view we may apply μ in order to define basic notions of rough-mereological geometry. We recall a notion of distance κ in the rough mereological universe [9],

κ(x, y) = r ⇔ r = min{max{u : xμ_u y}, max{w : yμ_w x}}.   (40.2)
Rough mereological distance, or shorter mereo-distance, is defined in an opposite way to the traditional distance - the smaller r, the greater the distance. Using formula (40.2), the predicate Nr, to be nearer, proposed by van Benthem [2] can be described as

zNr(x, y) ⇔ (κ(z, x) > κ(z, y)).   (40.3)
zNr(x, y) means that the region z is closer to the region x than to the region y. If neither the region z is closer to the region x than to the region y, nor the region z is closer to the region y than to the region x, then both regions x, y are at the same mereo-distance to the region z. Formally we can define equi-distance as a predicate Eq(x, y),

zEq(x, y) ⇔ (¬(zNr(x, y)) ∧ ¬(zNr(y, x))).   (40.4)

In our system, in which regions are disks, equidistance is the collection of disks whose centers lie on the straight line perpendicular to the segment linking the centers of the disks x and y and crossing it at the middle. We may also adopt other predicates, e.g., the predicate to be between, Btw [2], defined as

zBtw(x, y) ⇔ ∀u (u = z ∨ xNr(z, u) ∨ yNr(z, u)).   (40.5)
Elementary calculations show that the mereo-distance of two disks x, y is minimal (i.e., maximal as to the value) in the case when the radii of both disks are equal and either center lies on the boundary of the other disk; then,

κ(x, y) = 2/3 − √3/(2π) = max.   (40.6)

With the use of this minimal distance, we define the predicate near that signals that the object (robot) x is reaching its target (receiver) y,

x near y ⇔ ∃u (u el x ∧ κ(u, y) = max).   (40.7)
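To make the disk-based use of these notions concrete, the following sketch (our own code; the names and the numerical tolerance are assumptions) computes the intersection area of two disks, the mereo-distance as the smaller of the two containment ratios, as in (40.8) below, and the near predicate of (40.7) against the maximal value 2/3 − √3/(2π):

#include <math.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Area of the intersection of two disks with radii r1, r2 and center distance dc. */
static double disk_intersection_area(double r1, double r2, double dc) {
    if (dc >= r1 + r2) return 0.0;                       /* disjoint disks */
    double rmin = r1 < r2 ? r1 : r2;
    if (dc <= fabs(r1 - r2)) return M_PI * rmin * rmin;  /* one disk inside the other */
    double a1 = r1 * r1 * acos((dc * dc + r1 * r1 - r2 * r2) / (2.0 * dc * r1));
    double a2 = r2 * r2 * acos((dc * dc + r2 * r2 - r1 * r1) / (2.0 * dc * r2));
    double t  = 0.5 * sqrt((-dc + r1 + r2) * (dc + r1 - r2) * (dc - r1 + r2) * (dc + r1 + r2));
    return a1 + a2 - t;
}

/* Mereo-distance of two disks: the smaller of the ratios of the intersection
 * area to either disk area, i.e. the intersection over the larger disk. */
double mereo_distance(double r1, double r2, double dc) {
    double inter = disk_intersection_area(r1, r2, dc);
    double k1 = inter / (M_PI * r1 * r1);
    double k2 = inter / (M_PI * r2 * r2);
    return k1 < k2 ? k1 : k2;
}

/* Predicate near (40.7): the mereo-distance reaches its maximal value
 * 2/3 - sqrt(3)/(2*pi), attained for equal radii with either center lying on
 * the other circle; EPS absorbs numerical error. */
int is_near(double r1, double r2, double dc) {
    const double KAPPA_MAX = 2.0 / 3.0 - sqrt(3.0) / (2.0 * M_PI);
    const double EPS = 1e-6;
    return mereo_distance(r1, r2, dc) >= KAPPA_MAX - EPS;
}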
40.5 System description Our navigation system consists of one omnidirectional ultrasonic sensor, located on the robot, to measure the distance to the nearest obstacle, and a set of receivers placed in the robot environment. Correct placement of the receivers is important for a proper description of the topology of the environment. The robot also has an ultrasonic emitter to dialogue with the receiver(s). As the moving platform, the mobile robot Pioneer P2DX with a differential drive was used. Our robot has two degrees of freedom - it can go forward and rotate. Consequently, movement control is limited to two control parameters - the rotation angle α and the length of movement s.
Measurement interpretation. The set of measurements consists of the radius d of the non-collision region and the radii l_1, l_2, ... of the receiver regions. The configuration of the non-collision region and the receiver region is shown in Figure 40.1.
Fig. 40.1. Configuration of non-collision region and receiver region
Let us denote the non-collision region as R_S - the disk(S, d), and the receiver region as R_O - the disk(O, l). The point S is always located on circle(O, l), so the mereo-distance κ_SO of these two regions can be calculated using formula (40.2) as

κ_SO = min{ |R_S ∩ R_O| / |R_O| , |R_S ∩ R_O| / |R_S| }.   (40.8)

The rough mereological distance depends not only on the distance l from the transmitter to the receiver but also on the radius d of the non-collision region.
From linguistic to spatial description. If we define spatial relations within mereology we can also treat them semantically [5]. In this way complementary interpretations of the defined relations are introduced. From one point of view we can understand them as semantical (psychological), from the other as a description of the physical space. Rough mereology can deal with real world data to specify the defined relations numerically. This duality can make a bridge between qualitative and quantitative spatial description.
natural language spatial expressions (e.g. near, far) → rough-mereological spatial relations (e.g. near, far) → physical space (numerically described)
Fig. 40.2. Translation scheme from linguistic to physical space description
The conversion scheme of language expressions to low level description via rough mereology is shown in Fig. 40.2. The major difficulty is to distinguish the set of spatial relations, formalizing some spatial intuitions, useful for a specific task.
Navigation system. The architecture of our navigation system is shown in Fig. 40.3. The system works in a feedback control loop. The controller uses the rough mereological space description in its reasoning.
Fig. 40.3. Architecture of the navigation system: a control goal given in natural language is passed to the controller as a mereological relation; sonar data, interpreted in the mereology context, feed the spatial reasoning (mereo-geometry description), which outputs the rotation angle and path for the robot moving platform with its omnidirectional sonar.
The set value of the controller (the control goal), in opposition to classical regulators, is an expression of natural language.
40.6 Realization of robot tasks In all robot navigation tasks, we will use the scheme presented in Fig. 40.2. Following that scheme, the first step is to express our intention of robot movement in
natural language. After that we should find the mereological relation to describe this intention. The last step is a numerical description, necessary in robot control. The main problem is to specify the relation that should express our intention.
40.6.1 Coming near to the receiver region In robot navigation, we may want the robot to come closer to the receiver region. It is easy to observe that for the region configuration presented in Fig. 40.1 both regions are at the smallest mereo-distance (i.e., the maximal value 2/3 − √3/(2π) ≈ 0.391) if and only if they have the same area. From the control point of view, every region included in the non-collision region is also a non-collision region. If there exists a non-collision region x whose distance to a receiver region y is minimal, then the non-collision region x is near to the receiver region y, in symbols x near y. The intention of movement,
• GOAL OF CONTROL: move the robot near to the receiver region
can be replaced by a formula
• MEREOLOGICAL DESCRIPTION: x near y.
The goal of control can be obtained by finding the proper robot orientation and continuing the movement. The algorithm of control is based on a set of rules that have been induced from an analysis of the robot movements. We denote by κ(O,P), κ(O,S), κ(O,J), respectively, the mereo-distances between regions (disks) of radius d centered, respectively, at the points P, S, J in Fig. 40.1, and the receiver region at O. Clearly, κ(O,J) < κ(O,S) < κ(O,P). The symbol κ will denote the distance between the non-collision and the receiver regions after the movement is made. Table 40.1 presents basic rules determining the rotation angle α of the robot as a function of the distance κ.

Table 40.1. Rotation angle as a function of mereo-distance
mereo-distance | center distance | angle of rotation
κ(O,P) | l − d | 0
κ(O,S) | l | π/2
κ(O,J) | l + d | π

The form of the function κ → α, of which Table 40.1 is an approximation, is set as

α = arctan((κ(O,P) − κ(O,J)) · …).   (40.9)

The direction of rotation is set by the decision rule
if α_present > α_previous then change the direction of rotation,   (40.10)

where α_present, α_previous denote the current, respectively, previous, value of the rotation angle. The algorithm stops when the non-collision region is greater than the receiver region. This criterion depends on the robot's environment. When the robot moves in wide space it can stop at a much greater metric distance to the receiver in comparison to the case when the robot moves in a bounded space, limited by walls (e.g., a corner).
40.6.2 Driving the robot in a straight line The intention of movement:
• GOAL OF CONTROL: move the robot in a straight line,
can be replaced by a mereological formula
• MEREOLOGICAL DESCRIPTION: x π Eql,
where Eql is the straight line determined by equation (40.4) - all regions (points) meeting the requirements of equation (40.4) lie on this straight line.
40.6.3 Turning the robot Turning the robot is a more advanced task of robot navigation, which can be reduced to simpler tasks. That process can be executed in three stages: 1. to go straight, 2. to reach the point of changing the navigated pair of receivers, 3. to go following the new navigating points. The first and third stages are tasks of 'going along the straight line'. The problem is to identify the moment of switching the pair of receivers. Line segments of the navigating trajectory do not have to be perpendicular.
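A schematic version of the coming-near procedure of Sect. 40.6.1 (our own sketch; the platform calls are hypothetical placeholders, and the rotation angle is obtained by linearly interpolating Table 40.1 instead of the arctan form (40.9)):

/* Hypothetical platform interfaces - not part of the paper. */
extern double measure_kappa(void);               /* mereo-distance after the last move          */
extern double kappa_at(double center_distance);  /* kappa(O, .) for a disk at a given distance  */
extern void   rotate(double angle);              /* signed rotation in radians                  */
extern void   move_forward(double step);         /* forward step of fixed length                */
extern int    noncollision_larger_than_receiver(void);  /* stopping criterion of Sect. 40.6.1   */

/* Coming near: interpolate the rotation angle between kappa(O,P) (angle 0)
 * and kappa(O,J) (angle pi) from Table 40.1, and flip the rotation direction
 * according to rule (40.10). */
void come_near(double l, double d, double step) {
    const double PI = 3.14159265358979323846;
    double kP = kappa_at(l - d), kJ = kappa_at(l + d);
    double prev_alpha = PI, dir = 1.0;
    while (!noncollision_larger_than_receiver()) {
        double t = (kP - measure_kappa()) / (kP - kJ);   /* 0: heading at the receiver, 1: heading away */
        if (t < 0.0) t = 0.0;
        if (t > 1.0) t = 1.0;
        double alpha = t * PI;
        if (alpha > prev_alpha) dir = -dir;              /* decision rule (40.10) */
        rotate(dir * alpha);
        move_forward(step);
        prev_alpha = alpha;
    }
}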
40.7 Experimental results The tasks of robot navigation described previously were simulated and tested in real world conditions. In the real world experiments we used the Pioneer P2DX robot, produced by ActivMedia Corporation. In Fig. 40.4 an exemplary trajectory of turning the robot and coming near to the receiver region is presented, in the case when the robot was initially oriented at 90°.
Fig. 40.4. Task of turning and coming near to the receiver region at initial orientation 90°
40.8 Conclusion In this paper we have presented a method for describing the robot's environment based on a rough mereological approach, using disks as primitive objects best suited to the robot's sensory abilities (i.e., an omnidirectional sonar and a system of sonar receivers). A scheme for converting natural language spatial expressions into low-level physical descriptions and a dynamical control system using rough mereological spatial reasoning have been proposed and tested in real world conditions.
References
1. Arkin R (1998) Behavior Based Robotics. MIT Press, Cambridge MA
2. van Benthem JFAK (1983) The Logic of Time. Reidel, Dordrecht
3. Brooks RA (2002) Robot: The Future of Flesh & Machines. ePenguin
4. Coventry KR (1996) Spatial preposition, functional relation and lexical specification. In: Olivier P, Gapp (eds) Representation and Processing of Spatial Expression. Lawrence Erlbaum Associates
5. Srzednicki JTJ, Rickey VF (eds) (1984) Lesniewski's Systems: Ontology and Mereology. Ossolineum, Wroclaw
6. Lukasiewicz J, O zasadzie sprzecznosci u Arystotelesa. PWN, Warszawa
7. Ming Xie (2003) Fundamentals of Robotics. Linking Perception to Action. World Scientific, New Jersey
8. Pawlak Z, Skowron A (1994) Rough membership function. In: Yager RR, Fedrizzi M, Kacprzyk J (eds) Advances in the Dempster-Shafer Theory of Evidence. Wiley, New York
9. Polkowski L (200x) Mereological foundations to approximate reasoning. This volume
10. Polkowski L, Skowron A (1997) Rough mereology: A new paradigm for approximate reasoning. Intern. J. Approx. Reas. 15(4): 333-365
11. Szmigielski A (2002) Rough mereological localization and navigation. In: Proceedings RSCTC 2002, Rough Sets and Current Trends in Computing. Lecture Notes in AI 2475, Springer, Berlin
12. Szmigielski A, Polkowski L (2003) Computing from words via rough mereology in mobile robot navigation. In: Proceedings IROS 2003, IEEE/RSJ International Conference on Intelligent Robots and Systems. Las Vegas, USA
41 Data Acquisition in Robotics
Krzysztof Luks
Polish-Japanese Institute of Information Technology, ul. Koszykowa 86, 02-008 Warszawa, Poland
[email protected]
Summary. This paper presents a data acquisition system for a humanoid head robot built at the PJIIT Robotics and Multiagent Systems Laboratory. The real time operating system QNX Neutrino was used together with a dedicated hardware driver to provide a low overhead, high throughput data acquisition system. The driver architecture was determined by the environment it was operating in and the software it interfaced with.
41.1 Introduction In many applications in robotics one needs to gather data from sensors probing the outside environment. In typical cases sensors will range from CCD cameras to sonars and gyroscopes to simple temperature sensors. The need for data obtained by means of sensors arises from the necessity of converting sensory readings ("perceptions") to actions taken by robotic systems; this conforms to the metaphor of "linking perceptions to actions" [1]. In all cases the results of a sensor's measurement have to be converted into digital form and transferred via some kind of interface to the computer's memory. Only then can the processing software gain access to it. Some sensors have the analog to digital converter integrated or may even be the converter itself (e.g. CCD matrix cameras) while others need a separate converting element. The system may also need to provide a feedback link to enable applications to send control commands to hardware components. Possible commands include controlling camera focus, picture resolution or data sampling frequency. Some of the sensors are more "intelligent" than others and require less sophisticated controlling software. E.g. cameras can implement picture adjustment algorithms, such as white balancing or auto-focus, in hardware, significantly reducing the processing power required to acquire satisfying quality images. The difficulty in getting sensor data to the computer depends mainly on the nature of the data - its amount and transfer speed. In the case of streaming high resolution video special care has to be taken to reduce the transfer overhead to a minimum so that it won't use up all the system resources. A single camera with a 512 by 512 pixel image with
Fig. 41.1. Generic sensing information flow model, based on [1] (world, sensor, converter, interface, driver, memory, application)
256 intensity levels per pixel generating 30 images a second produces 24 megabytes of data every second [2]. Also the system environment in which the data processing application will run has to be taken into account. With many processes running on the same computer system proper care needs to be taken to ensure uninterrupted and timely execution of all programs. Failure to meet this requirement may lead to data loss which in consequence can cause damage to the robot, its malfunction, or pose a threat to people or property. In the most basic case a data acquisition system will consist of a simple sensor integrated with an analog to digital converter connected via a serial port to a, possibly embedded, dedicated computer system. A good example of such a setup is a temperature sensor from which only a few bytes of data are read every few seconds. These issues can be addressed by using a hard realtime operating system which will ensure that time constraints are met.
41.2 Concepts of realtime computing There are many definitions of realtime computing. Quoting the comp.realtime newsgroup FAQ, one can define a realtime system as one in which the correctness of the computations not only depends upon the logical correctness of the computation but also upon the time at which the result is produced. If the timing constraints of the system are not met, system failure is said to have occurred. Traditionally, realtime operating systems have been used in "mission-critical" environments requiring realtime capability, where failure to perform computations in a certain time frame can result in harm to persons or property. Such systems include for example medical equipment or industrial process monitoring. Recently, however, another field of application of realtime computing has become popular: systems where failure to meet time constraints results in a financial penalty or a considerable loss of quality of service. Such systems include consumer
multimedia devices, where dropped video frames exceeding a certain amount make such a device unacceptable to the customer. 41.2.1 Non realtime operating systems The key characteristic that separates an RTOS from a conventional OS is the predictability needed to meet the requirements above. A conventional (monolithic) OS uses "fair" process and thread scheduling algorithms. This doesn't guarantee that realtime threads finish their processing on time. Also, priority information is, in most cases, lost during kernel calls being performed on behalf of a client thread. This results in unpredictable delays preventing an activity from completing on time. 41.2.2 Realtime microkernel architecture The microkernel architecture used in the QNX RTOS leaves only the most basic tasks to be performed by the OS kernel. These tasks include managing threads and processes and passing messages between them. Scheduling for execution is done on a per-thread basis and high-priority threads can preempt lower-priority ones when they become ready for execution.
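As a small illustration (standard POSIX calls, which QNX Neutrino supports; the chosen priority value is an arbitrary example, not taken from the project), a data acquisition thread can request a fixed realtime priority so that it preempts non-critical work:

#include <pthread.h>
#include <sched.h>

/* Give the calling thread a fixed-priority (FIFO) realtime scheduling class,
 * so the scheduler runs it in preference to lower-priority threads.
 * The priority value 50 is only an example. */
int make_me_realtime(void) {
    struct sched_param param;
    param.sched_priority = 50;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}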
Fig. 41.2. Microkernel message bus (applications and device drivers exchange messages over the bus provided by the microkernel)
All other functionality, such as device drivers and OS services, exists as separate processes and doesn't run within the kernel. In QNX such processes are called resource managers and use the IPC interface provided by the microkernel as a message
bus to exchange data (fig. 41.2). This provides complete network transparency as the microkernel automatically recognises whether a data transfer can be accomplished by a simple memory copy operation or by use of the local area network. Separating device drivers from the core kernel has the additional benefit of protecting the system from accidental memory corruption caused by badly written code. Also, all processing is done at a priority determined by the thread on whose behalf they are operating [3]. 41.2.3 QNX Resource Managers Resource managers register a pseudo-file element in the pathname space (e.g. /dev/nudaq). Such pseudo-files can be accessed by standard POSIX I/O functions like open(), read(), write(). When this happens, the resource manager receives an open request, followed by read and write requests. Resource managers not only deal with hardware devices, but can also provide functionality like filesystem interfaces. They are not restricted to handling just open(), read() and write() calls but can support any functions that are based on a file descriptor or file pointer, as well as other forms of IPC. In QNX Neutrino, resource managers are responsible for presenting an interface to various types of devices. In other operating systems, the managing of actual hardware devices (e.g. serial ports, parallel ports, network cards, and disk drives) or virtual devices (e.g. /dev/null, a network filesystem, and pseudo-ttys) is associated with device drivers. But unlike device drivers, the Neutrino resource managers execute as processes separate from the kernel.
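For example (a client-side sketch only; the device name /dev/nudaq follows the example given above, while the buffer size and error handling are our own), a client reads from a resource manager exactly as from an ordinary file:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal client of a resource manager: open the pseudo-file registered in the
 * pathname space and read raw samples from it with plain POSIX calls. */
int main(void) {
    int fd = open("/dev/nudaq", O_RDONLY);
    if (fd == -1) { perror("open /dev/nudaq"); return 1; }

    int16_t samples[512];
    ssize_t n = read(fd, samples, sizeof(samples));  /* the resource manager receives a read request */
    if (n > 0)
        printf("got %zd bytes of sample data\n", n);

    close(fd);
    return 0;
}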
41.3 Data acquisition subsystem in project Paladyn
Fig. 41.3. Data acquisition scheme in project Paladyn: the hardware layer (gyroscope, microphones, NuDaq 9112 converter) feeds, through the resource manager, the software layer (image stabilisation module, sound sources separation module)
41.4 NuDaq 9112 The analog to digital converter used for data acquisition purposes in project Paladyn was the NuDaq 9112 32-bit PCI card. It has 16 single-ended or 8 differential analog inputs and its own FIFO buffer. The card supports sampling with a frequency of up to 110 kHz. Conversion can be initiated by one of three sources: Software Trigger A single value conversion is performed when a 1 is written into NuDaq's STR register. This mode is suitable for low frequency conversion because of the big CPU overhead imposed by subsequent writes to the STR register. Timer Pacer The NuDaq card is equipped with 3 programmable counters that can be used to trigger conversion at a fixed frequency. When used together with DMA data transfer this mode is suitable for high speed conversion with very low CPU usage. External Trigger An external frequency generator can be used to synchronise NuDaq's conversion speed with external devices. 41.4.1 Data transfer modes When the conversion is complete and data is stored in the card's internal buffer, one of the following modes is used to transfer data to the computer's memory: Polling Used in conjunction with the software trigger. The software must check the state of the DRDY bit, which is set to 1 when data become available. Then it can read the converted value from the data register. Interrupt driven transfer The NuDaq 9112 can use hardware interrupts to send data to the PC. In this mode the card generates an interrupt each time a conversion is completed. One can set up an interrupt handler that will copy the data from the card's buffer to PC memory. This mode is asynchronous. Direct Memory Access The card "pushes" data to a pre-allocated buffer in the computer's memory and notifies it by signalling an interrupt when the transfer is complete. This method uses a double buffering technique that, combined with DMA transfer, reduces CPU usage to almost 0%. 41.4.2 Driver architecture The core element of the NuDaq driver is a double buffer holding data transferred by the card and making it available to other processes via shared memory. One half of the double buffer holds samples that are being transferred from the A/D converter while the other half holds samples that can be read by other processes. When the first half becomes full their roles are exchanged. During driver initialisation the PCI bus is programmed with the physical address of an intermediate continuous buffer that is used as temporary storage for data copied from NuDaq's FIFO buffer to the main driver buffer (fig. 41.4). Each time the FIFO buffer fills, the NuDaq card orders the PCI controller to copy data to the continuous buffer.
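The double-buffering idea described above can be sketched as follows (a simplified illustration written by us; the structure, sizes and function names are hypothetical, not the actual driver code): the DMA-completion handler appends each block to the half being filled and swaps the halves when it becomes full, so that readers always see a completed, stable half.

#include <stdint.h>
#include <string.h>

#define HALF_SAMPLES 4096           /* size of one half of the double buffer (example value) */

struct double_buffer {
    int16_t half[2][HALF_SAMPLES];  /* one half is filled by the driver, the other is read by clients */
    volatile int filling;           /* index of the half currently being filled */
    volatile int ready;             /* index of the last completed half         */
    size_t fill_pos;                /* write position inside the filling half   */
};

/* Called after the PCI controller has deposited 'count' samples in the
 * intermediate (physically continuous) buffer: copy them into the filling
 * half and exchange the roles of the halves when that half is full. */
void on_dma_block(struct double_buffer *db, const int16_t *intermediate, size_t count) {
    if (count > HALF_SAMPLES - db->fill_pos)
        count = HALF_SAMPLES - db->fill_pos;         /* truncate to keep the sketch simple */
    memcpy(&db->half[db->filling][db->fill_pos], intermediate, count * sizeof(int16_t));
    db->fill_pos += count;
    if (db->fill_pos == HALF_SAMPLES) {
        db->ready = db->filling;                     /* expose the completed half to readers */
        db->filling ^= 1;
        db->fill_pos = 0;
    }
}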
A/D converter (nudaq) → continuous buffer → double buffer; the resource manager exposes /dev/nudaq, which clients access with open() and read() to retrieve data. The resource manager interface (/dev/nudaq) is used both for configuring the hardware and for retrieving data.
Fig. 41.4. Driver architecture using only the resource manager interface
The DMA architecture requires that the memory area to which data is copied by the PCI controller is continuous and located in the first 16 megabytes of computer memory. Additionally, the buffer address passed to the PCI controller must be a physical memory address. Therefore an additional intermediate buffer was added to comply with the above requirements. 41.4.3 Driver operation During startup the driver performs initialisation of the PCI bus and the NuDaq card. Next all buffers are allocated and the interrupt handler thread is spawned. This handler is responsible for copying data from the intermediate buffer to the double buffer. Then two files are registered in the local namespace: /dev/PCI9112W0 and /dev/shmem/nudaq. The former is used by clients to send control commands, the latter is an access point to the shared memory area containing sampled data. The driver then awaits incoming events. The main thread responds to opening of the /dev/PCI9112W0 file, while the interrupt handler thread sleeps and is awakened when NuDaq's FIFO fills up and the PCI controller finishes transferring this data to the intermediate buffer. It then copies the data to the appropriate half of the double buffer and checks if the active half needs to be changed. In such an event the driver changes the beginning of the shared memory area address to the address of the currently active half of the double buffer.
A/D converter (nudaq) → continuous buffer → double buffer (shared memory); configuration commands go through the /dev/nudaq resource manager, while sampled data is copied (memcpy()) out of the shared memory segment. Only configuration requests are passed through /dev/nudaq. Data is read from a shared memory segment pointing to the filled half of the driver's double buffer.
Fig. 41.5. Driver architecture using shared memory
This way clients only deal with a single buffer available through the shared memory access point /dev/shmem/nudaq. The client library offers the function AI_AsyncDblBufferHalfReady, which checks if the current half of the double buffer is ready to be read. It uses the resource manager /dev/PCI9112W0 interface to complete this task. If this function returns a non-zero value the client can copy data from the shared memory region to its local buffer.
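A client-side sketch of this access pattern (our own code; AI_AsyncDblBufferHalfReady is the library function named above, but its exact prototype, the half size, and the mapping details are assumptions - on QNX, shm_open("/nudaq") corresponds to the /dev/shmem/nudaq access point):

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HALF_SAMPLES 4096                /* assumed size of one buffer half */

/* Assumed prototype of the client-library call mentioned in the text. */
extern int AI_AsyncDblBufferHalfReady(void);

int read_half(int16_t *local) {
    /* Map the shared memory access point exposed by the driver. */
    int fd = shm_open("/nudaq", O_RDONLY, 0);
    if (fd == -1) return -1;
    const int16_t *shared = mmap(NULL, HALF_SAMPLES * sizeof(int16_t),
                                 PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (shared == MAP_FAILED) return -1;

    /* Wait until the driver reports a completed half, then copy it out. */
    while (!AI_AsyncDblBufferHalfReady())
        usleep(1000);
    memcpy(local, shared, HALF_SAMPLES * sizeof(int16_t));

    munmap((void *)shared, HALF_SAMPLES * sizeof(int16_t));
    return 0;
}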
41.5 Applications The NuDaq board controlled by the described driver was successfully used in project Paladyn for reading data from microphones and gyroscopes. The board was set up to sample data from two microphones at a 44 kHz frequency and from a piezoelectric gyroscope at a 2 kHz frequency. The sampled data was read by two concurrently running applications: the image stabilisation module and the sound separation engine. They both wait for data to become available for them and copy samples from the shared memory buffer to their local address space.
41.6 Conclusions Thanks to the flexible nature of the QNX microkernel architecture it was possible to build a stable and reliable driver. Retaining source-level compatibility with the Linux client library was also possible. The driver was tested in project Paladyn, where it was used to obtain sound samples from microphones and data from gyroscopes and accelerometers.
References 1. Xie M (2003) Fundamentals of robotics. Linking perception to action. World Scientific, New Jersey London Singapore Hong Kong 2. Arkin RC (1998) Behavior-based robotics. MIT Press, Cambridge London 3. QNX Software Systems 2003 QNX system documentation 4. Luks K (2003) System stabilizacji obrazu i akwizycji danych dla glowy robota humanoidalnego. MSc Thesis, Polish-Japanese Institute of Information Technology, Warsaw (in Polish)
42 Spatial Sound Localization for Humanoid Lech Blazejewski Polish-Japanese Institute of Information Technology PJIIT Robotics and Multiagent Systems Lab.,Koszykowa 86, 02-008 Warsaw, Poland [email protected]
Summary. The problem of sound source separation and localization is a challenging task not commonly addressed in today's humanoid robotics projects. Additional (prior to visual) spatial auditory information is required to control the humanoid head's attention more accurately. This is done in order to augment human-robot interaction. The humanoid head robot operates in a noisy, dynamic environment. A system capable of handling random noise in order to separate sound sources and localize them in the robot's coordinate system is presented. Key words: humanoid robot, auditory localization, spectral analysis
42.1 Introduction The idea of embodiment lies beneath most of today's humanoid robotics projects. In order to achieve the robot's intelligent behaviour and make it very human-like, it is necessary to make available to the robot the same sensory modalities that are available to humans. Embodiment could be the answer to the problem. It allows emulation of human senses as artificial sensoric systems. It is a good idea to start the embodiment process when constructing a humanoid head, as it contains the greatest number of human senses. The robot's intelligent behaviour relates to its interaction with the surrounding world. In this paper we discuss the problem of sound localization. Sound source separation and localization is an important task on the way to attention control. Some events (such as a talking person) happening in the robot's surroundings could possibly be out of sight. With detection of a significant sound source and the ability to localize it, the robot could turn its attention to the auditory event, saccading into a new area. With the help of an auditory localization system, the robot's attention would be much more flexible, allowing a new modality of interaction. A humanoid head named „Paladyn" has been constructed in order to sustain human-like sensoric systems - optokinetic and auditory modules. The auditory hardware consists of two microphones 18 cm apart, mounted on the head's corpus. The model is assumed to be linear, as there is no spherical casing - the sound can easily pass through
the head interior, where the shortest possible way between the ears is approximated by a straight line.
Fig. 42.1. The "Paladyn" humanoid head
We will first discuss localization methods for the head's spatial configuration. Then we'll focus on the application of the whole auditory module, explaining technical issues of the process of sound localization for the "Paladyn" head.
42.2 Binaural localization methods In the late 19th century John William Strutt, also known as Lord Rayleigh, formulated the so-called duplex theory. In this model he explained the process of human sound localization using classical wave physics. Duplex theory considers a sinusoidal sound source located on one side of the human head. The sound reaching the farther ear is delayed in time and has lower amplitude. Strutt's theory defines two major localization cues: the Interaural Time Difference (ITD) and the Interaural Intensity Difference (IID). A typical human head is about 18 cm in diameter (the ear-to-ear distance, reflected in the robotic head construction). The model of the head is simplified to two receivers separated by a linear or spherical distance. The Interaural Time Difference is derived straightforwardly from the phase difference of the waves perceived at the same time instant. When the sound source is located to the far side of the head, the signal received in the closer ear has a different phase than the signal perceived in the other ear at the same time. This cue is valid for low-frequency sounds, where the wavelength is longer than the interaural distance. For higher-frequency sounds there is growing ambiguity, because the measured phase differences may span different wave cycles.
For sounds with wavelengths shorter than the interaural distance, diffraction is very small. The waves are scattered and suppressed, creating a significant acoustical shadow on the opposite side of the head. The difference in amplitude of the sound received by the two ears can reach up to 20 dB. This kind of shadowing does not occur for long sound waves, for which diffraction is strong. The sound localization cues are therefore limited in accuracy as a function of frequency. For the head of linear structure, where the ears - microphones - are separated only by the linear distance of 18 cm, the ITD cue is useful up to frequencies of about 1911 Hz. Humans perceive sounds in the 20 Hz to 20 kHz range. For the linear head, sound wave diffraction (and thus the validity of the IID cue) starts at about 1911 Hz and becomes stronger with higher frequencies. The conclusion is that the ITD and IID cues are complementary and, used together, enable localization of sounds over the whole audible spectrum. The localization methods presented in the duplex theory explain horizontal localization only. Moreover, the IID and ITD cues produce uncertain data that can be falsely interpreted; a geometric analysis explains the problem.
42.2.1 ITD and sound localization
ITD can be understood as the time of flight of the wave over the extra distance between the ears (microphones). That distance can be expressed as the difference (Interaural Distance Difference, IDD) of the paths (D_{left}, D_{right}) travelled by the sound to the two ears:

IDD = D_{left} - D_{right}    (42.1)
Fig. 42.2. Geometry of linear head model
Assuming a constant speed of sound in air, v_{sound} \approx 344 m/s, the IDD can be expressed as the path travelled by a sound wave during the time ITD:
IDD = ITD \cdot v_{sound}    (42.2)
The remaining problem is to define the relationship between the IDD and the sound source position in the head's surroundings. The construction of our humanoid head „Paladyn" can be approximated as linear in terms of interaural space, which considerably simplifies the model: L and R are the left and right ear (microphone), separated by the robot's head diameter B, and S represents the sound source. When all these parameters are known, the IDD can be expressed as the subtraction:

IDD = |LS| - |RS| = \sqrt{\left(x + \frac{B}{2}\right)^2 + y^2} - \sqrt{\left(x - \frac{B}{2}\right)^2 + y^2}    (42.3)
The ITD cue by itself is not enough to localize the source precisely. There is an infinite number of pairs of paths D_{left}, D_{right} resulting in an identical ITD value. In fact, all the possible sources are located on the surface of a hyperboloid, also called a cone of confusion.
Fig. 42.3. Hyperbola asymptotes approximate the sound source location
As the distance from the head grows, the direction towards the sound source converges to the asymptote of the hyperbola. We need to rewrite the IDD equation in order to find the asymptote formula:

\frac{x^2}{\alpha} - \frac{y^2}{\beta} = 0    (42.4)

where

\alpha = \frac{IDD^2}{4}, \qquad \beta = \frac{B^2 - IDD^2}{4}    (42.5)

The asymptotes can be written as functions of the variables \alpha and \beta:

A_1: y = \sqrt{\frac{\beta}{\alpha}}\, x, \qquad A_2: y = -\sqrt{\frac{\beta}{\alpha}}\, x    (42.6)

The inclination of the asymptote with respect to the x (binaural) axis corresponds unambiguously to the location of the sound source and is expressed as:

\phi_{A_1} = \arctan\sqrt{\frac{\beta}{\alpha}}, \qquad \phi_{A_2} = \arctan\left(-\sqrt{\frac{\beta}{\alpha}}\right)    (42.7)
This localization approach is unable to differentiate the front-back location of the sound source.
42.2.2 IID and sound localization
When the sound source emits wavelengths smaller than the diameter of the head, scattering of the wave occurs, which produces significant sound suppression and acoustical shadowing. This results in a noticeable amplitude difference (IID) between the signals recorded by the two ears. The IID cue is directionally dependent. With the sound located in front of the head the IID is approximately zero; as the source moves to the side of the head, one receiver acquires a sound of weaker amplitude. IID is the measure of the difference of acoustical power dissipated on the receivers, expressed in decibels:

IID = 20 \log_{10} \frac{S_{right}}{S_{left}}    (42.8)
where S_{right} and S_{left} are the signals (or signal groups) acquired by the right and left ear. The data processing is done on discrete samples. The IID estimate can be computed for each individual pair of samples from the left and right ear, or from the mean value of all the samples in the left and right sampling windows. IID-based localization only assists the ITD localization in resolving the left-right ambiguity.
Fig. 42.4. Interaural Intensity Differences as a function of sound source location and intensity. Courtesy of C. J. Moore [2]
42.3 Binaural cues estimation The practical problem of implementing the sound localization system reduces to the estimation of the binaural cues: ITD and IID. The system solves the problem in two separate ways: full spectral analysis and correlation-based processing. The resulting ITDs and IIDs are then combined in the DPF (see the diagram), giving enough knowledge to estimate the sound source position.
Fig. 42.5. Auditory localization module diagram
42.3.1 A/D Conversion
First the analog signals from the microphones are A/D converted. We use the NuDAQ PCI-9112 converter, allowing conversion speeds of up to 55 kHz per channel (when converting two channels) at 12-bit resolution. Thanks to the card's DMA capability, hard real-time conversion and analysis became possible. The choice of the conversion speed was a tradeoff between the perceived frequency range, the time available for analysis and the achievable precision of the angular estimation of the source. The human audible range lies between 20 Hz and 20 kHz. According to the Nyquist criterion the sampling frequency should be 40 kHz or higher; the practical choice was 44 kHz. That choice fixed the sample period at 22.7 µs, which leads to a maximum theoretical localization precision (for the linear head model) of 3.9°.¹ Another problem was the length of the sampling window. Again, it was a tradeoff between its size and the hard real-time requirements of the auditory localization system. A window of 2048 samples was chosen, lasting 46 ms and providing plausible quality. The auditory localization system works in cycles.
¹ When calculating the ITD through the correlation of two signals, the minimal step of the correlation is also the minimal computable difference. The size of the minimal step (identical to the sample period) can be understood as the resolution of the correlation process. With the linear „Paladyn" head, 18 cm in diameter, the time that sound needs to travel from one ear to the other equals 523 µs. Taking the sample period T = 22.7 µs, we obtain 523/22.7 ≈ 23 sections (distinguishable localization slices in the 90-degree localization area), which leads to 3.9-degree localization precision.
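As a quick check of these numbers, a minimal Python sketch (an illustration written for this text, not part of the original driver code) reproduces the resolution estimate from the interaural distance, speed of sound and sampling rate:

# Sketch: theoretical ITD-based angular resolution of the linear head model.
# Assumed values follow the text: 18 cm interaural distance, 344 m/s, 44 kHz.

B = 0.18          # interaural distance [m]
V_SOUND = 344.0   # speed of sound in air [m/s]
FS = 44_000       # sampling frequency [Hz]

max_itd = B / V_SOUND               # ~523 us: sound travel time ear-to-ear
sample_period = 1.0 / FS            # ~22.7 us: minimal measurable ITD step
slices = max_itd / sample_period    # ~23 distinguishable ITD values per side
resolution_deg = 90.0 / slices      # ~3.9 degrees over a 90-degree quadrant

print(f"max ITD: {max_itd*1e6:.0f} us, slices: {slices:.0f}, "
      f"resolution: {resolution_deg:.1f} deg")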
Each cycle starts with an A/D conversion which lasts 46 ms; during that time all the computation (the estimation of ITD/IID and the localization) has to be finished, before the new pack of sample data arrives. This lets the auditory system work at 1/0.046 s ≈ 21 Hz, which complies with the real-time strategy.
42.3.2 Discrete Fourier Transform
All the localization analysis is done in the frequency domain. The next step of data processing is the time-to-frequency domain shift using the FFTW (Fastest Fourier Transform in the West) implementation of the Fourier Transform. This process is biologically plausible, as an analogous process of frequency coding is found in the human cochlea. To improve the resolution of the spectrogram we use the zero-padding technique.² The resulting spectrogram resolution is 44000/4096 ≈ 10 Hz.
42.3.3 Spectral processing
Spectral localization processing occurs in the frequency domain. All the frequencies representing pure tones are analyzed.
Tone extraction
Tone extraction helps to avoid analysis of the spectral-leakage peaks. It significantly lowers the computation time, as processing occurs only for the peaks f_n satisfying the condition:³

f_{peak} := f_n \;\text{ such that }\; f_{n-1} < f_n > f_{n+1} \;\text{ where }\; f_n > threshold    (42.9)
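To make the tone-extraction step concrete, the following Python sketch (illustrative only; the function name, array layout and threshold value are assumptions, not the project's code) selects the local spectral maxima that exceed a threshold, in the spirit of condition (42.9):

import numpy as np

def extract_tonal_peaks(window, fs=44_000, n_fft=4096, threshold=1.0):
    """Return (frequency, bin index) pairs of tonal peaks in one window.

    A bin counts as a tonal peak when its magnitude exceeds both
    neighbours and the given threshold, mirroring condition (42.9).
    """
    spectrum = np.fft.rfft(window, n=n_fft)          # zero-padded FFT
    mag = np.abs(spectrum)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    peaks = []
    for n in range(1, len(mag) - 1):
        if mag[n] > mag[n - 1] and mag[n] > mag[n + 1] and mag[n] > threshold:
            peaks.append((freqs[n], n))
    return peaks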
Spectral ITD calculation
The Interaural Time Difference is proportional to the phase difference (IPD, Interaural Phase Difference) of any given subband pair.⁴ Using the Fourier transform of the interaural signal⁵ we calculate the IPD for every peak frequency in the spectrum:

IPD_{f_{peak}} = \arctan\left(\frac{\Im(S_L(f_{peak}))}{\Re(S_L(f_{peak}))}\right) - \arctan\left(\frac{\Im(S_R(f_{peak}))}{\Re(S_R(f_{peak}))}\right)    (42.10)

The IPD is expressed in radians. The evaluation of the ITD⁶ is then straightforward, given the frequency of the peak:

ITD_{f_{peak}} = \frac{IPD_{f_{peak}}}{2\pi f_{peak}}    (42.11)
² Supplementing the 2048 window samples with additional 2048 zeros.
³ A peak is a local maximum whose amplitude exceeds the given threshold.
⁴ By a subband pair we understand any pair of peaks of the interaural signals that satisfy the tone condition.
⁵ By the interaural signal we mean the two signals received at the two spatially separated ears.
⁶ Which is measured in seconds of delay.
Spectral IID calculation
For every peak frequency in the analyzed time slice there is also a slight intensity difference (varying with the frequency and the source position). The IID is expressed as a measure of the momentary power difference:

IID_{f_{peak}} = \frac{P_R(f_{peak})}{P_L(f_{peak})}    (42.12)

where

P_L(f_{peak}) = (\Re(S_L(f_{peak})))^2 + (\Im(S_L(f_{peak})))^2    (42.13)

P_R(f_{peak}) = (\Re(S_R(f_{peak})))^2 + (\Im(S_R(f_{peak})))^2    (42.14)
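A minimal sketch of this per-peak cue computation, Eqs. (42.10)-(42.14), written only for illustration (the helper name and the left/right FFT arrays are assumptions of this sketch):

import numpy as np

def spectral_cues(S_L, S_R, peak_bins, freqs):
    """Compute per-peak IPD, ITD and IID from the left/right FFT spectra.

    S_L, S_R  : complex rFFT arrays of the left/right windows
    peak_bins : bin indices of the tonal peaks
    freqs     : frequency [Hz] of each bin
    """
    cues = []
    for n in peak_bins:
        ipd = np.angle(S_L[n]) - np.angle(S_R[n])        # Eq. (42.10)
        itd = ipd / (2.0 * np.pi * freqs[n])             # Eq. (42.11), seconds
        p_left = S_L[n].real ** 2 + S_L[n].imag ** 2     # Eq. (42.13)
        p_right = S_R[n].real ** 2 + S_R[n].imag ** 2    # Eq. (42.14)
        iid = p_right / p_left                           # Eq. (42.12), ratio form
        cues.append((freqs[n], itd, iid))
    return cues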
IID-enforced ITD localization
The IID cue is measured in decibels and, in the case of a real-time system, can only be used for left-right sound source differentiation.⁷ For frequencies up to ≈1911 Hz it is worth calculating the ITD (in the case of the linear model of the „Paladyn" head). For higher frequencies a growing ambiguity arises, and only the IID remains as a localization cue. Because calculating the IID and ITD is computationally cheap, the processing is done for all the tonal peaks in the spectrum. With the known ITD, IID, interaural distance⁸ and the speed of sound in air, v_{sound} = 344 m/s, the localization formula for the peaks is derived from the hyperbola asymptote equation:
\phi_{A_1,A_2} = \pm \arctan\left(\frac{\sqrt{B^2 - (ITD \cdot v_{sound})^2}}{ITD \cdot v_{sound}}\right)    (42.15)
A left-right and front-back localization ambiguity arises, because a single ITD value reflects four possible positions of the sound source. This can be resolved using the IID cue. The IID is a measure of the proportionality between the left and right signal powers (as shown in equation 42.12). If P_{left} > P_{right} (the signal is stronger from the left side), the value of the IID lies in [0, 1]; in the opposite situation, P_{right} > P_{left}, the values lie in [1, ∞).
⁷ It is possible, however, to use the IID information to create a directional filtering function of the head. Such localization methods are related to HRTFs (Head Related Transfer Functions) and, being computationally expensive and not resistant to random noise, are of no use in real-time, real-world reactive auditory systems.
⁸ 18 cm for the „Paladyn" head.
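The following Python sketch illustrates how Eq. (42.15) and the IID ratio could be combined to obtain a left/right-disambiguated angle for one tonal peak. It is a simplified illustration under the linear-head assumptions above, not the system's actual decision logic:

import math

B = 0.18          # interaural distance [m]
V_SOUND = 344.0   # speed of sound [m/s]

def peak_azimuth(itd, iid):
    """Angle estimate [deg] for one tonal peak, measured from the interaural axis.

    itd : interaural time difference [s] (magnitude used here)
    iid : power ratio P_right / P_left, Eq. (42.12)
    Returns a signed angle: positive to the right, negative to the left.
    """
    idd = abs(itd) * V_SOUND                      # Eq. (42.2)
    idd = min(idd, B - 1e-9)                      # clamp to the physical range
    if idd <= 0.0:
        angle = 90.0                              # source on the median plane
    else:
        # Eq. (42.15): inclination of the hyperbola asymptote
        angle = math.degrees(math.atan(math.sqrt(B**2 - idd**2) / idd))
    side = 1.0 if iid > 1.0 else -1.0             # IID resolves left vs. right
    return side * angle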
42.3.4 Correlation processing
Even with the tonal frequencies localized, we still need to distinguish the sound sources: each sound source consists of one or more tonal frequencies. A good way to extract the major sources is to correlate the two signals in the time domain. The correlation of the two (left- and right-ear) signals can be understood as their similarity as a function of the time delay τ. Every peak in the correlatogram means a higher similarity between the signals at a given delay (interpreted as an ITD) and, in fact, a possible angular position of a sound source.⁹ To simplify the detection of local maxima in the correlatogram, the input signals are half-wave rectified. This means treating all negative values of the time-domain signals as zero (so the redundant information enclosed in the signal is lost). Because the aim of the correlation process is to extract the ITD, represented by τ seconds as read in the correlatogram, it is appropriate to apply low-pass filtering (LPF) to the 20 Hz-20 kHz signal. As stated above, the ITD cue is valid - in the case of the „Paladyn" humanoid head - up to 1911 Hz. The work of J. Blauert and W. Cobben indicated 800 Hz as an optimal cutoff frequency. Because the major aim of the humanoid head is to interact with its environment, especially with humans, it should be able to localize sounds at the frequencies of typical speech. The compromise value for the cut-off frequency was therefore set to 1500 Hz.
Weighted cross-correlation
The perceived sound has been digitized, transformed, half-wave rectified and low-pass filtered. Still, there is a need to enhance the SNR (Signal-to-Noise Ratio) of the signal. To achieve a more reliable ITD estimate through the correlation of the two signals, an additional weighting is performed. The correlation known as SCOT (Smoothed Coherence Transform) is applied, using the cross-correlated and autocorrelated signals:
w_{corr}(\tau) = F^{-1}\left[\frac{G_{LR}(m)}{\sqrt{G_{LL}(m) \cdot G_{RR}(m)}}\right]    (42.16)
where G_{LR} and G_{LL}, G_{RR} are, respectively, the cross-correlated and autocorrelated signals in the frequency domain:

G_{LR}(m) : \; S_L(t) \otimes_\tau S_R(t) = FS_L^*(m) \cdot FS_R(m)    (42.17)
The problem of ITD estimation is thus reduced to finding the maximum value of the correlatogram.
⁹ The accuracy of this method is limited by the resolution of the given discrete signal window.
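For illustration, a compact Python sketch of the SCOT-weighted cross-correlation of Eqs. (42.16)-(42.17) and the resulting ITD estimate; the half-wave rectification and low-pass filtering described above are assumed to have been applied already, and the function name is ours:

import numpy as np

def scot_itd(left, right, fs=44_000, eps=1e-12):
    """Estimate the ITD [s] of the dominant source via SCOT-weighted correlation."""
    n = len(left) + len(right)                 # linear (not circular) correlation
    L = np.fft.rfft(left, n)
    R = np.fft.rfft(right, n)

    g_lr = np.conj(L) * R                      # cross-power spectrum, Eq. (42.17)
    weight = np.sqrt((np.abs(L) ** 2) * (np.abs(R) ** 2)) + eps
    wcorr = np.fft.irfft(g_lr / weight, n)     # weighted correlatogram, Eq. (42.16)
    wcorr = np.fft.fftshift(wcorr)             # centre zero lag

    lag = np.argmax(wcorr) - n // 2            # lag of the strongest peak
    return lag / fs                            # ITD in seconds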
Direction-pass filtering
The last filtering process, which leads to distinguishing separate sound sources and their spatial localization, is DPF (Direction-Pass Filtering). In essence, the DPF compares the sound sources detected by correlation with the localization of the separate spectral tones.
Fig. 42.6. Direction Pass Filter
First the DPF extracts the local maxima in the weighted correlatogram (up to three, as laboratory testing shows). The area between each local maximum and the surrounding minima is then treated as a separate source. The source has strong directionality at the ITD represented by the maximum and weaker directionality at the falloffs. Next, all the localized frequencies (found in the spectral processing procedure) are compared with the refined sources. If at least one of the localized frequencies agrees with a localized source, that source is treated as valid and representative. Such information is then stored and used in the process of driving the humanoid head's behaviour.
42.4 Conclusion The system of binaural localization proved functional and reliable, localizing sound sources with up to 30-degree accuracy (and up to 2 sources at a time) in a noisy, everyday environment. Although it is not as accurate as human sound localization (which is fully spatial and much more accurate, down to a few degrees in the frontal plane), its main goal - the control of visual attention - has been achieved. The auditory module itself, because of its modular design, is fully scalable and provides all the low-level audio data. This information can be used by any other auditory module of the humanoid head, without any redundant work. The first goal of the „Paladyn" project - to create low-level sensoric behaviours - has been achieved.
References 1. Blauert J (2001) Spatial Hearing. The Psychoacoustics of Human Sound Localization. MIT Press, Cambridge MA 2. Moore B (1999) Wprowadzenie do psychologii slyszenia. PWN, Warszawa 3. Hartmann W (1998) Signals, sound and sensation. Springer, Berlin Heidelberg New York 4. Lindsay H, Norman A (1984) Procesy przetwarzania informacji u czlowieka. Wprowadzenie do psychologii. PWN, Warszawa 5. Knapp C, Carter C (1976) The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-24, No. 4:320-327 6. Natale L, Metta G, Sandini G (2002) Development of auditory-evoked reflexes: Visuo-acoustic cues integration in a binocular head. Robotics and Autonomous Systems 39:87-106 7. Irie R (1995) Robust sound localization: an application of an auditory perception system for humanoid robot. MIT, Cambridge MA 8. Martin K (1995) A computational model of spatial hearing. MIT, Cambridge MA 9. Wasson G (1995) Using acoustic information to control visual attention. University of Virginia 10. Nakadai K, Okuno H, Kitano H (1998) A method of peak extraction and its evaluation for humanoid. Japan Science and Technology Corp, Tokyo 11. Blauert J, Cobben W (1978) Some consideration of binaural cross correlation analysis. Acustica 39:96-104 12. Yost W, Gourevitch G (1987) Physical acoustics and measurements pertaining to directional hearing. Directional Hearing 2:3-33, Springer-Verlag, Berlin Heidelberg New York 13. Feddersen W, Sandel T, Teas D, Jeffress L (1957) Localization of high frequency tones. J. Acoust. Soc. Am. 29:988-991 14. Scassellati B (1999) A binocular, foveated active vision system. MIT, Cambridge MA 15. Murray D (1993) Design of stereo heads. Active Vision, MIT Press, Cambridge MA 16. Nakadai K, Lourens T, Okuno H (2000) Active audition for humanoid. Japan Science and Technology Corp, Tokyo 17. Nakadai K, Lourens T, Kitano H (2000) Exploiting auditory fovea in humanoid-human interaction. Japan Science and Technology Corp, Tokyo 18. Okuno H, Nakadai K, Kitano H (2002) Non-verbal eliza-like human behaviours in human-robot interaction through real time auditory and visual multiple-talker tracking. Japan Science and Technology Corp, Tokyo 19. Nakadai K, Hidai K, Okuno H (2001) Real-time multiple speaker tracking by multi-modal integration for mobile robots. Japan Science and Technology Corp, Tokyo 20. Okuno H, Nakadai K, Lourens T (2001) Separating three simultaneous speeches with two microphones by integrating auditory and visual processing. Japan Science and Technology Corp, Tokyo 21. Okuno H, Nakadai K, Lourens T (2000) Humanoid active audition system. Japan Science and Technology Corp, Tokyo 22. Nakadai K, Hidai K, Okuno H (2001) Real-time active human tracking by hierarchical integration of audition and vision. Japan Science and Technology Corp, Tokyo 23. Nakadai K, Matsui T, Okuno H (2001) Active audition system and humanoid exterior design. Japan Science and Technology Corp, Tokyo
43 Oculomotor Humanoid Active Vision System Piotr Kazmierczak Polish-Japanese Institute of Information Technology PJIIT Robotics and Multiagent Systems Laboratory, Koszykowa 86, 02-008 Warsaw, Poland [email protected]
Summary. This paper presents the stereo active vision system of a humanoid head developed at the PJIIT Robotics and Multiagent Systems Lab. The system is able to foveate on a salient object and then maintain that object in the focus of attention. The task decomposition was based on the developmental approach of human infants, thus allowing step-by-step acquisition of skills. Visual behaviors matching human eye movements were implemented on a network of computers running a real-time system. Key words: humanoid robot, active vision, visual behaviors, tracking
43.1 Introduction The main motivation for creating humanoid robots is that human-like intelligence needs human-like interaction. Biologically inspired robots seem to be an ideal platform for active vision and visual attention experiments. Interaction with objects or humans is very important for learning behaviors. Vision plays an important role in directing attention to the object and gathering information [9]. Because we believe that the world cannot be represented in symbolic form, we have adopted an alternative methodology [2] which emphasises embodiment and a developmental approach to task decomposition, thus allowing step-by-step acquisition of skills. The developmental approach gives a structured decomposition and provides a gradual increase in complexity [7]. 43.1.1 Robot platform specification The humanoid head PALADYN consists of a visual, an auditory and an inertial system. This paper outlines only the visual characteristics of our robot. Readers interested in the binaural localization and inertial systems are referred to other work. Our hardware configuration consists of a group of Pentium-IV class computers running the QNX RTOS (controller, visual and auditory modules) as well as GNU/Linux (speech synthesis system). All of the machines communicate via 100 Mbps Ethernet.
Front and side view of the 5 DoF robotic active vision platform capable of mimicking human eye movements. Each eye consists of a pair of cameras (wide and narrow field of view) corresponding to the peripheral (low res) and foveal (high res) areas of the retina.
Fig. 43.1. PALADYN, humanoid active vision head. Peripheral accessories include a Galil DMC-1850 motion control card (mechanical DoF control), two AdLink NuDAQ-9112 data acquisition cards (sound sampling, accelerometer/gyro communication) and four Imagenation PXC-200AL framegrabbers (visual data acquisition).
Mechanical overview
The mechanical system of our robot was inspired by human physiology. It is believed that millions of years of evolution resulted in a visual system well adapted to the environment where people live. The oculomotor control system of our robot mimics three basic voluntary movements (described in more detail in section 43.2). In order to express those movements, our mechanical implementation provides 5 degrees of freedom (right-left separately for each eyeball, up-down for both eyes, up-down for the neck and right-left rotation for the whole head).
Camera system
Each eyeball consists of a pair of color cameras: a peripheral camera used primarily for motion detection in the field of view and a high-resolution foveal camera used for detailed analysis of the characteristics of the object. This configuration gives us enough ability to imitate the oculomotor control and voluntary visual behaviors expressed by humans.
43.1.2 Software environment The whole system runs on a network of QNX computers. QNX was chosen because of its scalability, portability, memory protection and QNET transparent network communication. Visual behaviors and attention modules make extensive use of Intel's Open Computer Vision Library. OpenCV proved to be very useful in many areas (human-computer interaction systems, object identification, segmentation, face/gesture recognition and motion tracking).
43.2 Eye movements
The ability to mimic human eye movements is important to keep an interesting object in the central field of view of the robot (fovea). This requires a constant translation of visual information into appropriate motor commands, such as a change in position or velocity of a motor. Human eye movements are a combination of three basic voluntary movements [6] (saccade, smooth pursuit and vergence):
1. Saccade. A ballistic eye movement in response to a stimulus or position error. As a result of its execution the target is placed in the fovea. Operates in open loop (no visual feedback during execution).
2. Smooth pursuit. A following movement in response to a target velocity error on the retina. Maintains a moving object on the fovea. Operates in closed loop (continuous visual feedback).
3. Vergence. A movement activated in response to the relative image disparity on both retinas. Adjusts the eyes for viewing objects at varying depth.
and two involuntary movements [6] (VOR and OKN):
4. Vestibulo-ocular Reflex. Stabilises the eyes during rapid head motions. Compensates head rotation through the stimulation of the semi-circular canals and the otolith organs located in the inner ear. Maintains the direction of gaze by counter-rotating the eyes.
5. Opto-kinetic Nystagmus. Stabilises the image on the retina. Measures the optical flow on the retina (for motions slower than VOR). The eyes follow any large moving pattern.
All of the voluntary eye movements were implemented on the humanoid robot PALADYN. Movements of the eyes consist of saccades and smooth pursuit operating alternately. Vergence and smooth pursuit operate in parallel. Visual behaviors performed by the voluntary movements include locating, fixating, wandering and tracking of targets. One of the most important algorithms used by the components of the visual system is normalized cross-correlation. The idea is to find the image region of I that is most similar to (best matches) the given template T. Normalization of the images additionally makes it less prone to errors resulting from luminance change
during camera movements. Given the input image I (raw camera image) of size W × H and a template image T of size w × h, the resulting image has dimensions (W − w + 1) × (H − h + 1), where the value at each location (x, y) characterises the similarity between T and the part of the input image [the rectangle with the top-left corner at (x, y) and the bottom-right corner at (x + w − 1, y + h − 1)]. The similarity is calculated as follows:

C(x,y) = \frac{\sum_{y'=0}^{h-1}\sum_{x'=0}^{w-1} T(x',y')\, I(x+x', y+y')}{\sqrt{\sum_{y'=0}^{h-1}\sum_{x'=0}^{w-1} T(x',y')^2 \;\cdot\; \sum_{y'=0}^{h-1}\sum_{x'=0}^{w-1} I(x+x', y+y')^2}}    (43.1)
where I(x, y) is the pixel value of the searched image at location (x, y) and T(x, y) is the value of the template pixel at location (x, y). The last step is to find the location for which the resulting image value is the highest. The distance between this maximum location and the center of the image I gives the displacement.
43.2.1 Saccade to stimuli
Saccades are extremely fast eye movements that focus the selected target on the fovea (the high-resolution area of the retina). One of the characteristic features of this type of movement is that the precise distance from the center of the image to the location of the target is known before saccade execution. In the case of an artificial system, executing a saccade means moving the eye motors so that the resulting distance from the target to the FoV center is 0. One method would require translation of pixel values (distances x, y) into the respective offset values for each axis. Because of nonlinearities resulting from the distortion of the wide-angle lens (peripheral cameras), a simple mapping between the distance on the image plane and the motor offsets necessary to foveate precisely is not satisfactory. Marjanovic [5] presented a solution to this problem. Our system adapted his method and learned to saccade to salient stimuli, decreasing the error displacement resulting from the nonlinearities of the lens. The initial step requires the construction of interpolated saccade maps that contain precalculated offset values for the corresponding areas of the image. Each part of the input image of size 8 × 8 corresponds to one cell in the double-channel saccade matrix. Each channel contains motor offset values for x and y respectively. The active area of the input image is delimited by a rectangle expressed in terms of T, I, w, h - the template, the input image, and their width and height, respectively.
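As an illustration of the normalized cross-correlation step of Eq. (43.1), used here both for saccade verification and for tracking, a minimal Python/OpenCV sketch is shown below. It is a sketch only: the original implementation used Intel's OpenCV library from C, and the function and variable names here are ours:

import cv2

def ncc_displacement(image, template):
    """Displacement of the best template match from the image center.

    Returns ((dx, dy), score); score is the maximum of the normalized
    cross-correlation map, comparable to Eq. (43.1).
    """
    result = cv2.matchTemplate(image, template, cv2.TM_CCORR_NORMED)
    _, score, _, max_loc = cv2.minMaxLoc(result)

    th, tw = template.shape[:2]
    ih, iw = image.shape[:2]
    # center of the matched patch vs. center of the searched image
    match_cx = max_loc[0] + tw // 2
    match_cy = max_loc[1] + th // 2
    dx = match_cx - iw // 2
    dy = match_cy - ih // 2
    return (dx, dy), score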
The standard size of images used by all of the visual modules is 384 × 288 (which is exactly half of the PAL resolution of 768 × 576 in each dimension). The interpolated saccade map for that resolution is 48 × 36. At first the system defines a constant factor for a single pixel (corresponding to the encoder offset that moves the eye by one pixel). Knowing the factors for each axis (x and y, determined during self-calibration), each matrix cell is initialized with the distance from the center of the matrix (24 × 18) to the cell coordinates (x, y), expressed in encoder offset values. Once, at the beginning, the saccade matrix is initialized with default values determined during self-calibration. For each column y in each row x of the matrix, every point p_{x,y} holds the distance from the center of the matrix:

p_{x,y} = \left(x - \frac{w}{2},\; y - \frac{h}{2}\right)    (43.3)

where w, h are respectively the number of columns and rows in the matrix. For each cell x, y a new value is defined:

o_x = (p_x F_m)\, F_p, \qquad o_y = (p_y F_m)\, F_t    (43.4)
where o is the new value of the cell, p are the coordinates of the new point (after conversion), F_m is the interpolation factor (here 8), and F_p, F_t are the pan and tilt constant factors for a single pixel respectively. For each learning trial or target point, the visual target coordinates are converted as follows:

p_{x,y} = \left(\frac{x}{F_m},\; \frac{y}{F_m}\right)    (43.5)

The image patch surrounding the target point is converted to grayscale and saved in T for future use. The robot attempts to saccade to the target location using the map estimate in p_{x,y}. The central patch of the new image is converted to grayscale and saved in C for future use. Normalized cross-correlation is performed on T and C. If the maximum value of the resulting image is beyond a given threshold, the object is declared lost. If the error value (the displacement of T with respect to C) is greater than 0 for any axis, the offset value corresponding to the target point in the matrix is updated:

o_x = o_x - (e_x F_p), \qquad o_y = o_y - (e_y F_t)    (43.6)
where o is the offset, e the error, and F_p, F_t the pan and tilt factors respectively.
43.2.2 Smooth pursuit tracking
Smooth pursuit is one of the slowest voluntary eye movements. The purpose of this movement is to keep a moving object in the center of the fovea. In an artificial visual system such as the humanoid head PALADYN, smooth pursuit means constantly keeping the moving object in the field of view of the narrow (high-resolution) camera, which enables the system to process the image without the risk of losing valuable information. Smooth pursuit is activated automatically after a saccade to the object. Pursuit continues for slow motions and as long as the object stays in the field
LEFT: saccade internals: before and after execution of the move (cross-correlation). RIGHT: interactive training program (user defined or randomly chosen target location). Fig. 43.2. Saccade mechanism
LEFT: saccade map after initialization with values obtained during self-calibration. RIGHT: saccade map being dynamically updated during learning trials. Fig. 43.3. Saccade map viewer
of view. Losing the tracked object triggers a saccade to the most salient object in the activation map (see section 43.3.5). At first (right after the saccade) the central patch (50 × 50) of the current image is converted to grayscale and saved for future use. For each consecutive frame, normalized cross-correlation is run on the saved template and the central patch of the current image (100 × 100). The distance x, y from the maximum value in the resulting image to the center of the current image, multiplied by constant factors (acc_{LR|T}, vel_{LR|T}), gives the new values for the motor velocities and accelerations (acc: acceleration, vel: velocity, LR, T: left-right and tilt DoF respectively). Smooth pursuit stops as soon as the maximum value in the resulting image drops below the constant 0.95. This is treated as if the object was lost, which naturally triggers a saccade to a new target (possibly the lost one, if it is moving). Because of natural distortions of the tracked object (scale changes, occlusions etc.), in some cases the tracking template must be updated. Each time the maximum value of the resulting image drops below 0.98, the template is updated with the current central
patch of the image (50 × 50). Other, less computationally intensive tracking methods based on optical flow are currently under investigation. In the near future a Kalman-like predictor for non-linear systems will be integrated.
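A simplified Python sketch of this pursuit loop follows; the thresholds 0.95 and 0.98 come from the text, while the gain constants and the camera and motor interfaces are placeholders of this sketch, not the actual PALADYN code:

import cv2

ACC_GAIN, VEL_GAIN = 0.5, 2.0     # placeholder gains per pixel of error

def pursue(grab_frame, drive_motors, lost_thr=0.95, refresh_thr=0.98):
    """Track the central object by repeated template correlation."""
    frame = cv2.cvtColor(grab_frame(), cv2.COLOR_BGR2GRAY)
    cy, cx = frame.shape[0] // 2, frame.shape[1] // 2
    template = frame[cy - 25:cy + 25, cx - 25:cx + 25]       # 50 x 50 patch

    while True:
        frame = cv2.cvtColor(grab_frame(), cv2.COLOR_BGR2GRAY)
        search = frame[cy - 50:cy + 50, cx - 50:cx + 50]      # 100 x 100 patch
        result = cv2.matchTemplate(search, template, cv2.TM_CCORR_NORMED)
        _, score, _, loc = cv2.minMaxLoc(result)
        if score < lost_thr:
            return "lost"                                     # triggers a saccade
        ex = loc[0] + 25 - 50                                 # error vs. patch center
        ey = loc[1] + 25 - 50
        drive_motors(vel=(ex * VEL_GAIN, ey * VEL_GAIN),
                     acc=(abs(ex) * ACC_GAIN, abs(ey) * ACC_GAIN))
        if score < refresh_thr:                               # refresh the template
            template = frame[cy - 25:cy + 25, cx - 25:cx + 25]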
LEFT: following of the moving object (motor velocity update based on disparity). RIGHT: smooth pursuit at work (real time visualisation of the pursuit algorithm).
Fig. 43.4. Smooth pursuit mechanism
43.2.3 Binocular vergence
Vergence is the slowest voluntary eye movement. By measuring the image disparity between both retinas it adjusts the eyes for viewing objects at varying depth. Other depth perception methods include stereovision, but these have much higher computational requirements and rarely work in real-time configurations. Vergence is important for object manipulation, figure-ground separation and collision avoidance. The adopted approach to disparity measurement is very simple. The vergence module uses the normalized cross-correlation algorithm to find the regions of the image that match the template. The idea behind the vergence is the assumption that, by utilising saccades and smooth pursuit, the tracked object always stays in the center of the dominant eye. Thus measuring the horizontal disparity between both images (from both peripheral cameras) is realised by measuring the displacement between the center of the best-matching template in the non-dominant eye and the center of the image. The resulting value (x axis), multiplied by constant factors (acc_R, vel_R), drives the velocity and acceleration of the non-dominant eye. Vergence and smooth pursuit running in parallel allow real-time tracking of moving objects in three-dimensional space. Future plans include re-implementation and integration of a zero-disparity filter [4].
LEFT: disparity error drives acceleration and velocity of the non-dominant eye motor. RIGHT: estimating horizontal disparity between image templates from both cameras. Fig. 43.5. Vergence mechanism
43.3 Pre-attentive selection of stimuli It is impossible to process every single piece of information received from the environment. Attention acts as a mechanism of early selection of relevant stimuli, thus reducing the computational requirements of the system. The most common view of human attention is two-level processing: pre-attentive and post-attentive. Pre-attentive (parallel) processing works beyond the conscious mind and is fully automatic. This level of attention was implemented on our robot. Post-attentive (serial) processing is available to conscious inspection and planning. This level of processing is beyond the scope of the project.
-f 1;
OH
+ 1.
For each new frame O motion saliency map M is constructed as follows:
(43.7)
43 Oculomotor Humanoid Active Vision System
547
M{x,y) = \Ot{x,y) - Ot-i{x,y)\,
(43.8)
where Ot and Ot-i is current and previous down-sampled image respectively. 43.3.2 Habituation One of the distinguishing features of human visual system is the ability to dynamically change the object of attention. Infants respond strongly to novel stimuli, but soon habituate and respond less as familiarity increases [3]. Saccade places object of interest at the central area of the image plane (at the hires narrow FoV camera corresponding to fovea) thus we can assume that the object on which robot is fixating is located in the fovea. Simple mechanism of habituation was adopted based on the previous work of Scassellati (2001) and Breazeal (2000) [8, 1]. Following saccade central areas of the habituation map receive maximum values. The purpose of this is to keep interesting object in the center of the image right after the movement. If the object remains static and none of the motors are running, saliency values begin to decrease over time approaching minimum after a few seconds. This prevents fixation on the central object eventually resulting in ballistic saccade to new salient stimuli located at the periphery of the image. Figure 43.6 shows habituation mechanism at work.
\ ^^^^^^^B^ / lljgjgiich
_
II ___{i 0 — . Babitttatlon ua£> «:han
]
|s*SWl)>«4,j
ZikolKt
glfc»>«i S Ji-!V«'y<«ia«» p
0 |j09M"«
C \'iff»if*
D1
i. )
LEFT: internals of the habituation mechanism, changes of the map values over time. RIGHT: building activation map combining habituation, motion and sound saliency. Fig. 43.6. Habituation mechanism
43.3.3 Sound localization Although not directly connected with visual attention, information from binaural sound localization system can influence robot's actions by providing additional information about the environment surrounding the robot.
548
Piotr Kazmierczak
Often moving objects generate sound which combined with other salient stimuli can greatly improve accuracy of saccade activation. Saliency map for sound source localization uses information such as signal strength S (value in range of 0 — 255), signal location L (value in range of 0 — 383 - signal must come from the source that is actually visible) and the accuracy level P (where lower value means higher accuracy). PALADYN is capable of localizing sound source in one dimensional space thus each sound information represents a bar on a saliency map. A bar of width 2P and value S is placed in each location L. Values decrease gradually from the center of the bar at S to each border where they approach 0. Example saliency map for sound localization is presented in Figure 43.7.
I Two sound gow:c«8 W i ^ varying atgeiigtfa
| | thya* »Q«nd sources {one possibly moving)
LEFT: bars on the image represent sound source locations and strength of the signal. RIGHT: activation map combining motion and sound saliency.
Fig. 43.7. Sound saliency map
43.3.4 Other saliency maps There were attempts to integrate robotic system with other types of saliency maps such as depth perception (stereovision algorithm), color (including skin color feature map), and face recognition. Due to very high computational requirements those maps were not fully integrated at the current stage. 43.3.5 Building activation map Each saliency map represents distribution of specific stimuli across the visual field (motion, sound, habituation, other cues). Attention process combines saliency maps to produce activation map. Each saliency map is represented as a grayscale image where values represent the amount of saliency (from no stimulus to maximum stimuli: 0 — 255). Constructing activation map consists of summation of every single saliency map S (with respect to the weight a) with the actual activation map A:
43 Oculomotor Humanoid Active Vision System
A = S-a + A'{l-a)-\-l,
549
(43.9)
After summation, the resulting activation map is segmented and binary thresholded. Area and centroid for each segment in the resulting image is calculated. Centroid of the biggest segment becomes the new saccade target.
43.4 Conclusions and future work System presented in this paper makes use of biologically inspired voluntary oculomotor behaviors such as saccades, smooth pursuit and vergence. Visual behaviors operating cooperatively perform real time 3D tracking of objects. Future plans include full integration of non-voluntary movements (gaze stabilisation), optical flow, Kalman-like estimator and zero-disparity filter as well as additional saliency maps for depth (stereovision), skin and color.
43.5 Acknowledgements This work was based on the MSc thesis of the author. The author wishes to thank prof, dr hab. Lech Polkowski (project supervisor) and other members of The PALADYN Group for their great support, enthusiasm and contribution: | Piotr Ciesielski Lech Blazejewski, Sebastian Pawlak and Krzysztof Luks.
References 1. Breazeal, C.L. (2000) Sociable Machines: Expressive Social Exchange Between Humans and Robots. PhD thesis, Massachusetts Institute of Technology 2. Brooks, R.A. (1991) Intelligence without representation. Artificial Intelligence, 47:139160 3. Carey, S. and Gelman, R. (1991) The Epigenesis of Mind. Lawrence Erlbaum Associates, Hillsdale, NJ 4. Coombs, D. and Brown, CM. (1993) Real-Time Binocular Smooth-Pursuit. /JCV, 11 (2): 147-165 5. Marjanovic, M., Scassellati, B. and Williamson, M. (1996) Self-taught Visually-Guided Pointing for a Humanoid Robot. In Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, pages 35-44 6. Robinson, D. (1968) The Oculomotor Control System: A Review. Proceedings of the IEEE, 56 (6) 7. Scassellati, B. (1998) Building Behaviors Developmentally: A New Formalism. AAAI Sprint Symposium'98, AAAI Press 8. Scassellati, B. (2001) Foundations for a Theory of Mind for a Humanoid Robot. PhD thesis, Massachusetts Institute of Technology 9. Yamato, J. (1998) Tracking moving object by Stereo Vision Head with Vergence for Humanoid Robot. Master's thesis, Massachusetts Institute of Technology
44 Crisis Management via Agent-based Simulation Grzegorz Dobrowolski and Edward Nawarecki Institute of Computer Science, AGH University of Science and Technology, Krakow, Poland [email protected]
Summary. Based on original definitions of: agent, system, activity, resources a model of critical situations in multi-agent systems is proposed. Next, the model is used to introduce an idea of monitoring sub-system which plays an important role in two sketched architectures of critical situations management for virtual and real cases of multi-agent systems applications.
44.1 Introduction The contribution deals with a class of intelligent decentralized systems that meet the agent paradigm [1,2]. There can be systems that both are designed from scratch as multi-agent ones (operating in the virtual world, e.g. network information services, virtual enterprises) and function in the reality as a set of cooperating autonomous subsystems of whatever origin (e.g. transportation systems, industrial complexes). Such systems (virtual as well as real) are marked by possibility of arising critical situations that can be caused by both outer (e.g. undesirable interference or the forces of nature) and inner (e.g. resource deficit, local damages) factors. Generally, crisis is interpreted here as threat of loss (partial or complete) of the system functionality. Difficulties of crisis identification, evaluation of possible effects and prevention (anti-crisis) actions come from general features of multi-agent systems (autonomy of the agent's decisions, lack of global information) as well as their dynamics (consequences appear after an operation in unpredictable manner) [3, 4]. Intention of the reported work is to propose a formalization of an agent and multiagent system description that can serve as a base for analysis of the system operation, especially in critical situations. The formalization allows, among others, for specification of how the analyzed system can be monitored and, in consequence, for creation of a simulation model [5, 6] of its behavior in the face of a particular crisis. Results of the simulation studies are the scenarios of the crisis progress. Investigation of the scenarios leads to finding a strategy of avoiding the crisis or, at least, reducing its effects. Considerations are carried out at the appropriate level of generality so as it could be possible to adjust them to specificity of the particular application.
552
Grzegorz Dobrowolski and Edward Nawarecki
The article is organized as follows. The first part (sections 44.2-44.4) is devoted for presentation of basic issues of multi-agent systems. They comprise definitions of: agent, system, activity, resources, and so on. Section 44.5 gives a proposed model of the critical situation with discussion of its most important aspects, e.g. a need and function of a monitoring sub-system. Then, Section 44.6 introduces some assumptions and architecture solutions of how to manage with the defined critical situations in two general types of multi-agent systems mentioned above.
44.2 Models of an Agent and System A presented beneath model of an agent (see also [7]) is constructed according to the black-box schema — a part of a domain is taken out and constitutes a system through specification of all interaction observed. In this way the model effectively reflects mainly the agent's ability to interact with his neighborhood and other agents (the features that are a base for a multi-agent system creation) but also in general defines his internal (abstract) architecture. The approach leads to adoption of the following assumptions crucial to the model. 1. Excluding physical impact on an agent, the rest of agent-neighborhood interface is fully controlled by the agent himself. 2. An agent operates in a discrete manner. His activity is a finite sequence of actions (elementary) performed by him. 3. An agent decides on the sequence in the sense of both actions to do and moments of time of their initiation. 4. The basic mechanism of the agent's model is sequential initiation of actions, called further the mechanism of choice. The assumptions seem to be enough general to encompass possible algorithms and implementation techniques of artificial agents as well as to describe presence of human beings in multi-agent systems. Definition 1 Agent A is a three-tuple of the form: A = {A,
S , FcSxAxS}
(44.1)
where: A finite set of actions (elementary) of agent A; S finite set of internal states of agent A; F three-element relation describing permitted succession of states and actions of agent A — in the given state the agent can perform an action (the second element) that leads him to a new state (the third element). Sustaining of cause-effect conjunction requires relation F to have the following feature: (5,a,5i) eF ^ (5,a,S2) G F => si = S2 (44.2)
44 Crisis Management via Agent-based Simulation
553
Relation F reflects the possible combinations of states and actions. Each state implies a subset of actions allowed. Description 1 Let As describe a set of actions of agent A admissible in state s As = {aeA
: ( s , a , # ) € F } , A.FeA
(44.3)
no matter which states they lead to. Assigning and performing of a particular action from As is an effect of the choice mechanism of an agent (an elementary action can be performed several times). Now let us focus our attention on the external characteristics of an agent i.e., how he behaves, not decides about actions. Definition 2 Let mapping f of the form: Si-^i = faM)
= 3 (si.ai.Si^i)
eF
: F eA
(44A)
denote performing action ai G A and changing the agent's state. To emphasize cause-effect conjunction, the appropriate states are indexed with natural numbers. Definition 3 Manifold but finite application of mapping / , represented in the formula beneath by operator 0, is called activity of agent A: ifaj
<8) faj+i
<8) • • • 0 / a j = /{a,-,a, + i,...,afe}
•' Oj, ^ ^ + 1 , • • • , Ofc G ^ G TI (44.5)
The agent's activity can be denoted also as a sequence of chosen and performed elementary actions of him. Having the above definition of an agent, let us show how such agents can be linked to form a multi-agent system. Definition 4 Multi-agent system Q is a two-tuple of the form: n
= {K}i=i,...,n;/}
(44.6)
It consists of a finite set of agents and relation called the interaction relation:
I =
A' X A^ D P^ 3 {a\a^)
\J
P^
; A' £ A' ; A^ € A^ ; A\A^
e H
Relation I is symmetric: {a^^a^) G I -^ {a^^a^) G / . The interaction relation describes potential interactions (cooperation) among the agents of the system. Interactions are realized when the appropriate actions are performed by the agents. None of the action can be performed independently (separately). Consequently, activity of a multi-agent system can be defined as a composition of activities of its members.
554
Grzegorz Dobrowolski and Edward Nawarecki
Definition 5 Activity of multi-agent system Q is a two-tuple of the form: {X,
^n}
(44.7)
where X is a quotient set of the union of agent's activities by the extended interaction relation:
X ={ U
{ai:aieA^})/I
i=l,...,iV
A^
eA^]A^en
and >:Q C X X X is a quotient relation of the union ofpreceding relations ^AJ in the agent's activity by the extended interaction relation: tr2 = (
U
^A^)/I
: A^ eQ
Definition 5 indicates also possibile graphic representation of the activity of a multi-agent system. It may be represented by a directed graph {X ^ "^Q } of the nodes given by the equivalence classes in X — the simultaneous agents' actions and the arcs that join, conformably to relation >:Q^ actions directly following each other. An example of such a graph^ is shown in figure 44.1.
^8
^«-i
c
Fig. 44.1. Activity of a multi-agent system. Ovals indicate interactions between agents.
44.3 Resources in Multi-agent Systems Introduction of resources into the model of a multi-agent system allows for description of phenomena of consumption of various materials or goods coming from outside of the system, production and deposition of such resources in the neighborhood (environment) as well as internal exchange of them among agents of the system. ^For the sake of simplicity some labels and indexes are neglected.
44 Crisis Management via Agent-based Simulation
555
The basic assumption is here that changes of a resource scheme are caused by the agents as a consequence of performing actions. Incorporation of the resources into the model is simultaneously done at three levels: the level of a single agent that consumes and produces them, of a multi-agent system where production and consumption accompany cooperation actions and of a joined sub-system (environment) that is to model balance of the resources in order to reflect their interchange among the agents during cooperation and flows to and from the neighborhood. Definition 6 Agent with resources AR is an extension of tuple A of the agent from definition 1: AR = {A, RA^TAC AX RA XJR} (44.8) where: RA finite set of resource names relating to the agent's actions A; TA three-element relation describing for each action which resources (second element) and of what quantity (third element, IR - real numbers) are consumed (negative values) or produced (positive values) as an effect of its performing. In the same way we can define a multi-agent system with resources. Aggregate QR is augmented with a finite set of resource names relating to the system RQ and analogous relation TQ. A minimal structure of the environment sub-system for modelling problems with resources is a set of balanced equations reflecting their exchange. For a given activity of an agent or multi-agent system, exploiting respective definitions, a system of balance equations for the considered resources can be constructed. In the case of a single agent AR3 A and its activity {a^ : a^ G A} the equations are: ^ z ^ - h 2 ; ° = 0 : {ai,r,Zr)eTAeAR,
WreR
(44.9)
where components {z^ e TRjreR completing the equations reflect the agent's neighborhood. The balance equations for activity of a multi-agent system {X , >:Q} encompass all actions of both the agents and their cooperations. The balance equations are written under assumptions, generally justified, that the resources intervened in a multi-agent system are additive (their quantities can be added) and proposed balances have physical interpretation.
44.4 Goals and Strategies of a Multi-agent System Due to the essence of existence and creation of multi-agent systems for almost each of them there are: •
the global goal and, possibly, some secondary goals (connected with the global one), i.e. while the particular system has been built;
556 •
Grzegorz Dobrowolski and Edward Nawarecki local (individual) goals of the agents representing their autonomy, that can be contradictory mutually or with respect to the global one.
Respectively, there are strategies that applied in the system or by its agents are to cause attaining the goals. Attaining a local goal (and thus realization of a local strategy) means the particular activity (sequence of actions) appropriately chosen by the choice mechanism of an agent. It can be described in the following way. Definition 7 Agent with strategy Au is an extension of aggregate A (def. 1) of the form: Au = {A, {us}ses} (44.10) where set {us}ses called the agent's strategy is a family of mappings (choice functions) such that: \/seS
3us:ADAs-^aeAs;
A,SeA
(44.11)
If all sets of admissible actions in the agent's states are of single element (y s e S card(i4s) = 1; A,S e A), then such agent is called reactive. Its strategy is obviously trivial. The agent's decision making process is only sketched here. Deeper consideration can be carried out if detailed description of the agent's state and the rest of elements from definition 7 is done. For the sake of explanation, let us assume that information possessed by an agent is a component of his state. During activity this information is modified. Based on performed actions, it can be observed from the outside as evolution of the choice mechanism. If the agent's state can be evaluated, the utility theory may be a core of the mechanism. Because the global goal is imposed externally on a system, its description as well as the corresponding strategy ought to be introduced independently of realization of the local strategies. The only way to do it is exploiting the notion of system activity. Definition 8 A strategy of multi-agent system Q is choice function UQ of the form: UQ : V —^ V{V)
(44.12)
where V is a set of all possible activities of the system (V{V) -family of subsets of V), The corresponding (chosen) subset V is called the global goal. Although it is not assumed in the above definition, function UQ ought to choose not empty and not too big subsets of activities. The strategy is often defined in the form of assumptions with respect to the given evaluation of the system activity. Two possibilities can be mentioned here as examples. The first one is a request the system to attain the specified final state (strategy is formed by all activities that lead to that state). The second possibility comes from description of the system activities via balance equations of resources. Then, formulation of a strategy is similar to an optimization problem, e.g. operate with minimal consumption of energy.
44 Crisis Management via Agent-based Simulation
557
Let us discuss now the question of implementation of both kinds of strategies. Contrary to the global strategy local ones are directly built into the agents. The general assumption about rationality of agents guarantees realization of their strategies. Implementation of the global strategy is very difficult mainly because of autonomy of agents. Definition 8 just characterizes the goal not showing how to achieve it. This is yet another formulation of the general problem of multi-agent systems, namely how to organize them.
44.5 Modeling a Critical Situation Let us take the following general description of critical situations in multi-agent systems as a point of departure. A critical situation is recognized as a particular state or sequence of states that violate or lead to violation of the global as well as local (the agents') goals of a system. Thus critical situations can be local (concerning a single agent) and global (involving not only all but also a group of agents). Arising of a local crisis may entail a global one in the future, but functional abilities of a system very often allow avoiding the consequences at the global level. Such phenomenon straight results from the basic features of multi-agent systems. One may say that some anti-crisis mechanisms (in the above sense) are already incorporated. On the contrary, threat of a global crisis usually requires especially invented mechanisms. A crisis among a group of agents is treated here as a global one because of similar way of the state description and the obvious fact that such crisis must emerge with respect to a partial or side- goal of a system. The above characteristics allow to define general conditions of management of critical situations: • • •
• possibility of observation (monitoring) of the system state based on observation of the agents' states individually,
• adoption of adequate ways of evaluation of a state in order to achieve operational criteria of critical situation recognition,
• availability of appropriate anti-crisis mechanisms.
The degree of realization of the above postulates can be regarded as a determinant of the system's immunity against a crisis. As has been signalled, a multi-agent system, flexible by nature, has some elements of these mechanisms implemented either as parts of the agents' algorithms or in the way of communication or organization of the system (or a sub-system). Let us discuss the conditions first for the case of local critical situations. In an obvious way, an agent monitors his state as well as evaluates it on his own. In state s he determines the set A_s. A significant reduction of this set can be an indication of a crisis. If the agent must consider actions like "do nothing" or self-destruction, it is not only an indication but also a kind of remedy. Although the application of the agent's strategy u_s in a state is oriented towards a decision, it is also an evaluation of the state. If some ranking of the actions is
prepared according to the utility coefficients, its values can be used for the formulation of a crisis criterion. Then a decline in utility can be regarded as a sign of a crisis. Finally, if both mechanisms turn out to be insufficient, the choice function u_s can be augmented with an element intended for monitoring of crises and for activating specially built-in anti-crisis actions. A similar analysis with respect to global critical situations is a bit harder. This is because of the problem of determining the state of a multi-agent system. The state can easily be defined as a composition of the agents' states, but its calculation is usually operationally impossible. Such a situation comes from the following features of a system:
• There are no synchronization mechanisms strong enough to determine the simultaneity of the agents' states.
• The system state is highly multi-dimensional, so the high cost of information acquisition should be taken into account.
• The agents are autonomous. They usually intend to disclose only as much information as is necessary for the system operation.
In the general case it is assumed that agent j reveals just a sub-space of his state, s*^j, or some evaluation of his state, v^j(s^j). The restriction of the state is accepted as a report, while the evaluation is regarded as subjective. Of course, the interpretation of the above information is known throughout the system. It is worthwhile to mention here that a state also comprises information about history, so the evaluation can have a dynamic character. Putting all descriptions of the agents' states together, possibly in a single place, and regarding them as simultaneous is the only way to construct a description of the state of the whole system. Let us assume for further discussion that a monitoring sub-system operates and the following specific evaluation of the state of a multi-agent system is possible:

v_Q = v_0(s*^1, ..., v^j(s^j), ...)    (44.13)
The evaluation, similarly as in the case of a single agent, can be oriented towards critical situations. The adoption of a special shape of the evaluation functions and an appropriate definition of subsets of their values opens the possibility of specialized tracking of the system states. For example, the values can be given as linguistic values: normal, preferred, danger, crisis. In its simplest form, tracking can be just the memorization of monitoring data and their introduction into an evaluation procedure. Following the ways of defining the global goal pointed out earlier, two kinds of critical situations can be introduced: direct and indirect. The direct one means the threat of losing operability of the system as a consequence of the unavailability of some agents' actions. The primary cause of an indirect critical situation is a lack of resources (violation of the appropriate balance) that, in turn, causes a deficit of functionality. Detection of both kinds can be done by the monitoring sub-system, based on individual evaluations indicating a loss of functionality or on observation of the distribution of some resource crucial to the agents' or the system's activity.
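A minimal sketch of such an evaluation-based tracking is given below. The numeric aggregation of the agents' evaluations into v_Q and the thresholds separating the linguistic values are assumptions made only for illustration.

import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: a monitoring sub-system aggregates the agents' evaluations into a single
// value v_Q, memorises it, and maps it onto the linguistic values mentioned above.
// The averaging and the threshold values are illustrative assumptions.
enum SystemCondition { PREFERRED, NORMAL, DANGER, CRISIS }

final class CrisisTracker {
    private final Deque<Double> history = new ArrayDeque<>();  // memorised monitoring data

    SystemCondition evaluate(double[] agentEvaluations) {
        double sum = 0.0;
        for (double v : agentEvaluations) sum += v;
        double vQ = agentEvaluations.length == 0 ? 0.0 : sum / agentEvaluations.length;
        history.addLast(vQ);                                    // history allows a dynamic evaluation
        if (vQ >= 0.8) return SystemCondition.PREFERRED;
        if (vQ >= 0.5) return SystemCondition.NORMAL;
        if (vQ >= 0.2) return SystemCondition.DANGER;
        return SystemCondition.CRISIS;
    }
}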
44.6 Structure of Management of Critical Situations

The particular causes of crises are various. They depend both on the application of the multi-agent system and on the technical solutions used, so it seems hard or even pointless to search for a manner of management that is universal and at the same time detailed. Let us focus our attention on two characteristic cases that can be applied in several variants. They will be deepened with some implementation solutions. As was already said, the problem of management of critical situations encompasses two important sub-problems: monitoring of the system state and management itself. When agents operate in a virtual environment (a computer system), the problem of gathering information about the system state is a standard problem of computer science. Only the amount of data can be essential for the chosen implementation solutions. However, the open question is how to recognize critical situations and deal with them. An algorithm of the management block may follow different ideas. Two of them are worth pointing out:
• management based on previously elaborated patterns of critical situations,
• on-line management via forcing particular reactions (changes) in a multi-agent system.
Fig. 44.2. Management Structure for the Case of Virtual System

A general diagram for the discussed case is given in figure 44.2. A multi-agent system (MAS) is observed by a monitoring sub-system (Monit) that feeds data to a management block which uses patterns. Such a structure is proper when, among other cases, the system is designed for computation (softcomputing), or when an agent-based simulation is a model of some real system applied in the off-line mode (e.g. a transportation net, a production system). The problem of monitoring becomes much more complicated when the agents of the observed system have strong autonomy, i.e. they may not want to be observed, or when direct observation is difficult for various reasons. Such a situation arises in open heterogeneous information systems (access to information may be limited because of the lack of a shared ontology) or in real-life systems monitored in the on-line mode (e.g. direct observation of traffic, trade or production complexes, monitoring of the environment).
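The first of the two management ideas, the one based on previously elaborated patterns, can be sketched as follows. The vector representation of the monitored state, the Euclidean distance and the threshold are assumptions introduced only for illustration.

import java.util.List;

// Sketch of pattern-based crisis recognition: the monitored state is compared with
// previously elaborated crisis patterns; the representation and threshold are assumed.
final class CrisisPattern {
    final String name;
    final double[] referenceState;
    CrisisPattern(String name, double[] referenceState) {
        this.name = name;
        this.referenceState = referenceState;
    }
}

final class PatternMatcher {
    private final List<CrisisPattern> patterns;
    private final double threshold;
    PatternMatcher(List<CrisisPattern> patterns, double threshold) {
        this.patterns = patterns;
        this.threshold = threshold;
    }

    // returns the name of the recognised crisis pattern, or null if none is close enough
    String recognise(double[] observedState) {
        for (CrisisPattern p : patterns) {
            double d = 0.0;
            int n = Math.min(observedState.length, p.referenceState.length);
            for (int i = 0; i < n; i++) {
                double diff = observedState[i] - p.referenceState[i];
                d += diff * diff;
            }
            if (Math.sqrt(d) < threshold) return p.name;
        }
        return null;
    }
}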
In the presence of difficulties in reproducing the agents' states and of restricted means of interference in their activities, it is justified to extend the diagram of figure 44.2 with a new block that is designed to fill the multi-agent model with real data (see fig. 44.3). Then there are two monitoring sub-systems: rMonitor and vMonitor.
Fig. 44.3. Management Structure for the Case of Real System

The former serves as a deliverer of information about the real multi-agent system (rMAS), while the latter is mainly used for the aggregation of data generated by the model (vMAS). The functions of the management block are also enriched. Besides the discussed control functions with respect to vMAS, oriented towards studying variants of anti-crisis reactions, there are also means for enforcing (or initializing) the application of the elaborated variants in real-life conditions (rMAS). It is necessary to mention that the structures of figures 44.2 and 44.3 depict only the functionalities proposed in the approach. They do not have to be separated in the shape of blocks (parts) in a concrete implementation. In particular, the monitoring subsystem vMonitor may be a sort of overlay on the agent model. It may even be regarded as an environment of vMAS. Another implementation solution may be proposed that consists in an extension of the society of agents. Groups of agents with specially oriented functions are introduced into rMAS (sometimes into vMAS as well):
• monitoring agents that observe behavior of the original agents (e.g. by tapping them),
• preventing agents that realize anti-crisis policy by interference in the environment or activity of the original agents (e.g. by prohibition of some of their actions).
Current research activities of the Institute of Computer Science AGH are directed towards the realization of the blocks of figures 44.2 and 44.3. In particular, agent-based simulators of production complexes and transport services, as well as pilot versions of monitoring systems, are being realized. The reader is referred here to the INFOCAST system [8] as the most mature example. Simultaneously, studies on management algorithms in critical situations are also being carried out.
44.7 Summary

The considerations of this article have been concerned with the application of the agent approach to the problem of management of critical situations. The approach has been discussed in two aspects:

• the conduct in the case of a menace to the operation of a multi-agent system that is, in this case, of a virtual nature (e.g. information or softcomputing systems realized using agent technology);
• the use of a multi-agent system for decision making (or decision support) in critical situations that (can) happen in the real world (e.g. in communication or power systems, or the [ecological] environment).
Although the procedures for both cases are different, the approach to analysis and the tools (especially software ones) used for the management of a crisis turn out to be similar. They encompass:
• a formal description of a multi-agent system (virtual or real) allowing the simulation model to be built;
• a system (or systems) that monitors the operation of the multi-agent system under consideration;
• a decision block that uses the results of monitoring or simulation and, moreover, is able to enforce the elaborated anti-crisis strategies.
The proposed variants of utilization of the above elements do not take into consideration all nuances of possible applications but are a good point of departure for further research in the field that seems to be of great theoretical importance as well.
References

1. Weiss G (ed) (1999) Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. The MIT Press
2. Ferber J (1999) Multi-Agent Systems. Addison-Wesley
3. Wu S, Soo V (1999) Risk control in multi-agent coordination by negotiation with a trusted third party. In: Thomas D (ed) Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99-Vol1). Morgan Kaufmann Publishers 500-505
4. Collins J, Tsvetovas M, Sundareswara R, van Tonder J, Gini M, Mobasher B (1999) Evaluating risk: flexibility and feasibility in multi-agent contracting. In: Etzioni O, Muller J P, Bradshaw J M (eds) Proceedings of the Third International Conference on Autonomous Agents (Agents'99). Seattle, WA, USA, ACM Press 350-351
5. Uhrmacher A M, Gugler K (2000) Distributed, parallel simulation of multiple, deliberative agents. In: Bruce D, Donatiello L, Turner S (eds) Proceedings of the 14th Workshop on Parallel and Distributed Simulation (PADS00). Los Alamitos, CA, IEEE Press 101-110
6. Zhao Z, Belleman R G, van Albada G D, Sloot P M A (2002) Scenario switches and state updates in an agent-based solution to constructing interactive simulation systems. In: Proceedings of the Communication Networks and Distributed Systems Modeling and Simulation Conference (CNDS 2002) 3-10
7. Dobrowolski G, Nawarecki E (2001) Multi-agent system in a decentralized control problem. In: Binder Z (ed) Management and Control of Production and Logistics 2000. PERGAMON 445-450
8. Kluska-Nawarecka S, Dobrowolski G, Marcjan R (2001) INFOCAST - A system for quality control procedures and diagnosis of casting defects. Acta Metallurgica Slovaca 7:441-446
45
Monitoring in Multi-Agent Systems: Two Perspectives*

Marek Kisiel-Dorohinicki

Institute of Computer Science, AGH University of Science and Technology, Krakow, Poland
[email protected]

Summary. The subject of the paper is the discussion of monitoring issues in multi-agent systems, and particularly, the infrastructure that should be built into an agent platform, so that its use would not bring much trouble to the designer. In the course of the paper two cases are considered depending on the kind of core MAS infrastructure applied: information-intensive and computational systems. For each case some general remarks on how to build a monitoring subsystem are presented as well as a more detailed description of a particular prototype realisation is given.
*This work was partially sponsored by the State Committee for Scientific Research (KBN) grant no. 4 T11C 027 22.

45.1 Introduction

It seems obvious that the implementation and deployment of computer systems must be supported by tools that allow checking whether they are working properly. Of course, this is also true for multi-agent systems, yet in this case the problem is of vast importance not only for the (human) designer or even a user, but also for the agents, the components of the system. Since the cooperation of autonomous agents depends on their proper behaviour, this should be verified as much as possible by the partners before proceeding with the identified strategy of interaction. In both cases we need some mechanisms that allow for the detection and diagnosis of possible failures (critical situations) with respect to the goal to be achieved by the whole MAS or by a group of cooperating agents (a team). This is only possible assuming on-line monitoring of the states and behaviour of the interacting agents and drawing immediate conclusions on the possibility of the emergence of a crisis. It depends on the particular application and design policy applied whether these conclusions are to be drawn by the user, by some external software tools, or by the agents themselves, but this problem is out of the scope of this paper. In complex multi-agent environments the problem of efficient monitoring of often distributed and heterogeneous agents is not trivial, and many concepts of the monitoring policy were already proposed (e.g. [5, 3]). Yet most papers leave out the
implementation problems, or show only some preliminary solutions strongly related to a particular platform, which gives no indication of how to deal with the problem in a more general way. Furthermore, some tools provide system monitoring support for (human) users, but not for the agents of the system (e.g. [4]). This paper aims at filling this gap by introducing a general model of a monitoring subsystem for MAS based on the concept of monitoring services. The essence of the approach is to allow for on-line processing of required (ordered) information via autonomous monitoring agents, coordinated by a local authority (manager) that also plays the role of a directory of monitoring resources for both agents and external clients (and thus is called the monitoring services provider). The paper is organised as follows. After a short review of the considered subtypes of MAS and their infrastructure in section 45.2, the concept of monitoring services and a general structure of a monitoring subsystem are presented in section 45.3. Sections 45.4 and 45.5 discuss in more detail the monitoring infrastructure for the cases of information-intensive and computational MAS respectively, together with their prototype implementations.
45.2 The infrastructure of multi-agent systems

Today the term multi-agent system covers a variety of different systems. In the course of the paper we will concentrate on two extreme cases:

information-intensive systems consist of cognitive (reasoning), socially aware, and service-oriented (client-server model) agents, often designed to work in a distributed, heterogeneous and open environment (the global network), and utilising
Fig. 45.1. The structure of information-intensive MAS
rich (knowledge-oriented) inter-agent (point-to-point) communication with sophisticated collaboration schemata (fig. 45.1);

computational systems consist of a relatively large number of reactive, lightweight (computationally simple) agents, designed to work in parallel but rather not network-aware, thus often utilising simple broadcasting communication (fig. 45.2).

Typical applications of the so-called information-intensive MAS include: information searching and gathering, knowledge acquisition and management, user assistance, etc. The systems in the second group differ essentially from these. Thus it was proposed to call such systems mass (or massively) multi-agent systems [7]. Their field of application can mainly be:
• simulation of problems that are distinguished due to their granularity, in the sense of the existence of a large number of similar objects that manifest some kind of autonomy, and
• soft computing restricted to problems that can be reduced to searching some space for elements with the given features; objects similar to those mentioned above are used as a means for the search.
Fig. 45.2. The structure of computational MAS

The main difference between these types of multi-agent systems is the implementation technology and, in consequence, the relative load of the infrastructure. Infrastructure may be understood here as everything in the system that is not an agent (e.g. communication facilities or registration and directory services). In an information-intensive MAS there is a relatively lightweight infrastructure, which means that most of the software (code) is concentrated in the agents. This also means that the infrastructure knows little about the states of the agents. On the contrary, in a computational MAS agents have a very simple and often the same structure, so the infrastructure is relatively heavy and, which is of vast importance in this context, knows much
about the agents' states. Nevertheless, it must be stressed that the absolute load of the infrastructure may be much higher in an information-intensive MAS, because of the complexity of the implementation environment and the needs of the technology used.
45.3 Monitoring infrastructure and services

For both information-intensive and computational MAS, building effective and efficient monitoring mechanisms is not an easy task. This is mainly because of the number and variety of agents, which produce a huge amount of data that quickly becomes out-of-date. What is more, information concerning groups of interacting agents or resulting from a longer observation of the agents' activity is often indispensable. Also, problems of distribution and heterogeneity of agents are of vast importance, especially in open environments. In this context the proposed solution assumes local on-line processing of required information via monitoring services, available both to the agents and to external software tools via dedicated interaction protocols. Monitoring services should allow for obtaining information about the following (a minimal interface sketch is given after the list):
• physical structure of MAS (agents present in a given location), if not provided by the platform,
• logical structure of MAS (classes of agents, their properties), if available in a particular application,
• actual state of a given agent (node, team),
• changes of the state of a given agent (node, team); this is a subscription rather than a single inquiry.
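The sketch below illustrates how these four kinds of monitoring services could be exposed as one interface. All type and method names are illustrative assumptions and do not describe the API of any concrete platform.

import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative interface gathering the monitoring services listed above.
interface MonitoringService {
    List<String> agentsAtLocation(String location);               // physical structure of MAS
    Map<String, List<String>> agentClasses();                     // logical structure, if available
    Map<String, Object> currentState(String agentId);             // actual state of a given agent
    void subscribe(String agentId,
                   Consumer<Map<String, Object>> onChange);       // state changes: a subscription
}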
Fig. 45.3. A general structure of a monitoring subsystem for MAS

A general structure of a monitoring subsystem for MAS is presented in fig. 45.3. The acquisition and processing of required information is actually realized by monitoring agents of two kinds:

1. domain-specific agents with built-in monitoring functionality,
2. dedicated agents which obtain required information e.g. by observing the behaviour or overhearing the communication of other agents.
The creation and activity of various monitoring entities is coordinated by the monitoring services provider, which is a local authority responsible for the management of all monitoring resources in a particular location (host). Since some directory of monitoring resources is indispensable to support the processing and delegation of monitoring services, the monitoring services provider also delivers appropriate interfaces for agents of the system and for external clients to facilitate identification of agents, their properties, and actual state. Cooperation between providers in different locations allows for remote monitoring of agents (e.g. distributed teams or mobile agents). Even though at this level of abstraction the structure of the monitoring subsystem looks similar for both information-intensive and computational systems, the implementations differ a lot. Monitoring in the former case should be based mainly on autonomous monitoring agents, which may be built on top of monitored agents, or acquire required information from other (monitored) agents. In the latter case much information about monitored agents is available in the infrastructure, and thus monitoring agents should be tightly integrated with the platform.
45.4 Monitoring infrastructure for information-intensive MAS

In an information-intensive MAS only some data may be acquired via the platform's directory services, and most information must be acquired from the agents: directly, if possible, or otherwise by observing the behaviour or overhearing the communication of other agents. In the first case domain-specific agents must be monitoring-aware, i.e. equipped with some monitoring module, which often assumes a specific agent architecture with built-in monitoring functionality. In the second case dedicated monitoring agents may be delegated to observe (overhear) and draw conclusions on the domain-specific agents' states. This is only possible if supported by the core infrastructure of the platform, and it obviously violates the agents' autonomy. It may also be conceptually difficult if the internal agent architecture is not known. Nevertheless, it does not require any mechanism built into the agent structure.
Fig. 45.4. Monitoring infrastructure for information-intensive MAS
As described in section 45.2, information-intensive multi-agent systems should be applicable in open heterogeneous environments. This is because contemporary distributed software architectures require efficient access to and exchange of resources and services, which assumes cooperation between systems supplied by different vendors. Yet interoperability between a variety of different environments for agent technology is not possible until a sufficient set of standards is available and widely used by developers [2, 8]. This is why the prototype implementation of a monitoring subsystem was realised in conformance to FIPA (Foundation for Intelligent Physical Agents, see http://www.fipa.org) specifications. The agent platform that was used for the implementation facilitates agent identification and localisation (directory services), as well as communication, i.e. the exchange of ACL messages (message router), according to FIPA specifications. It also provides a basic agent structure that supports sending and receiving messages (protocol stack and ACL message parser). This structure was extended with a monitoring module that provides the values of any agent attributes using the standard Java reflection API (a minimal sketch of such a module is given after the following list). This way every agent in the platform may easily be monitored directly without any additional effort of the designer of the particular application. The monitoring services provider was realised as a regular agent registered in the directory facility, and thus every agent of the system is able to discover and use its monitoring functionality. The most important part of the realisation was the design of appropriate communication protocols: definition of ACL messages with a specific content language exchanged between:
• agents (clients) and the monitoring services provider,
• the monitoring services provider and monitoring agents,
• monitoring services providers in different locations.
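The reflection-based monitoring module mentioned above can be sketched as follows; this is only an illustration of the idea and not the code of the described prototype.

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Sketch of a monitoring module that exposes the values of an agent's attributes
// by name via the Java reflection API, so that any agent object can be monitored
// directly without extra effort of the application designer.
final class ReflectionMonitor {
    private final Object monitoredAgent;
    ReflectionMonitor(Object monitoredAgent) { this.monitoredAgent = monitoredAgent; }

    Map<String, Object> readAttributes() {
        Map<String, Object> values = new HashMap<>();
        for (Field field : monitoredAgent.getClass().getDeclaredFields()) {
            try {
                field.setAccessible(true);                 // also read non-public attributes
                values.put(field.getName(), field.get(monitoredAgent));
            } catch (ReflectiveOperationException | RuntimeException e) {
                values.put(field.getName(), "<unreadable>");
            }
        }
        return values;
    }
}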
To facilitate effective communication the protocols allow for the aggregation of requests and obtained data. Further work will firstly concentrate on an implementation on a well-established FIPA-compliant platform (such as JADE). Also a more general content language (e.g. based on KIF) will be proposed. Because of our main application area (distributed expert systems), monitoring agents and processing of acquired knowledge for rule-based representation will be considered.
45.5 Monitoring infrastructure for computational MAS

In a computational MAS the acquisition of required information may be realised mostly by the core infrastructure, since it "knows" a lot about the agents' states. Thus the monitoring subsystem should be tightly integrated with the platform, as fig. 45.5 shows. Due to the huge number of agents in the system, caching and statistical processing of the acquired (numerical) data may be of vast importance for efficient communication between the distributed components of the monitoring infrastructure. The prototype implementation of the monitoring subsystem for a computational MAS was realised for the AgWorld platform [1], a software framework facilitating
agent-based implementations of distributed evolutionary computation systems (both evolutionary MAS and flock-based MAS models [6]).

Fig. 45.5. Monitoring infrastructure for computational MAS

AgWorld is a PVM-based library of C++ (or Java) classes^. The main components of its structure are:

Resource is a passive entity consumed by Agents (used as a means of management of evolutionary processes).
Agent represents an individual or a flock (a unit of evolution and migration).
Place constitutes an abstraction of the local environment (also a unit of distribution).
Path is a directed connection between Places (facilitates communication and migration).
World is responsible for the management of Places and Paths.

Every active object in the system (e.g. Agent, Place or World) is a specialization of the SimulationObject class, where the core monitoring functionality is implemented. This way all these objects become monitoring-aware and the monitoring infrastructure is automatically integrated with the whole platform (a sketch of such a monitoring-aware base class is given after the following list). The difference is only in the communication protocols used between particular elements of the structure:
• for the sake of efficiency, direct method calls are used between Agents in one Place (which is a single PVM process),
• PVM library functions are used to send messages between possibly distributed Places and the World,
• XML-based socket communication is dedicated to external clients.
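A monitoring-aware base class in the spirit of SimulationObject can be sketched as below. The local statistical aggregation (count, mean, min, max) reflects the earlier remark that numerical data should be cached and processed before being passed to external clients; the class and method names are assumptions made for illustration.

// Sketch of a monitoring-aware base class: subclasses (Agent, Place, World) report
// numerical observations, which are aggregated locally before being sent on, so that
// the traffic towards the monitoring services provider and external clients stays small.
abstract class MonitoredSimulationObject {
    private long count = 0;
    private double sum = 0.0;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    protected void report(double value) {        // called from the subclass whenever a value changes
        count++;
        sum += value;
        if (value < min) min = value;
        if (value > max) max = value;
    }

    // aggregated snapshot handed over to the monitoring subsystem
    String snapshot() {
        double mean = count == 0 ? 0.0 : sum / count;
        return String.format("count=%d mean=%.3f min=%.3f max=%.3f", count, mean, min, max);
    }
}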
Further work on the monitoring infrastructure for computational systems will focus on extending the communication protocols (e.g. to support management functionality). Other transport protocols for external clients will be proposed; a CORBA IDL interface is nearly ready. Simultaneously, a common monitoring infrastructure for PVM and MPI-based platforms is being prepared; integration with grid services is also considered.
^For further reference see http://agworld.sourceforge.net
45.6 Concluding remarks

Because of its limited length this paper is surely not a complete guide to the design and implementation of monitoring mechanisms in multi-agent systems. Yet the author tried to identify the most important problems and to propose possibly general solutions for two different classes of MAS implementations, based on experience gained during numerous projects in various application domains. To make the considerations more comprehensible, they were illustrated by concrete realisations of general-purpose monitoring subsystems for information-intensive and computational MAS respectively; of course only selected aspects of their design were discussed. The material presented is obviously work in progress, and thus the descriptions of both prototypes were completed with the expected strategies of further development. It seems that interaction protocols dedicated to monitoring could be described more formally to allow for interoperability between monitoring subsystems of different platforms and for the implementation of common external tools supporting not only analysis and visualisation of the obtained data but also management of the monitored agents.
References

1. A. Byrski, L. Siwik, and M. Kisiel-Dorohinicki. Designing population-structured evolutionary computation systems. In T. Burczyński, W. Cholewa, and W. Moczulski, editors, Methods of Artificial Intelligence (AI-METH 2003). Silesian Univ. of Technology, Gliwice, Poland, 2003.
2. J. Dale and E. Mamdani. Open standards for interoperating agent-based systems. Software Focus, 1(2), 2001.
3. J. Dix, T. Eiter, M. Fink, A. Polleres, and Y. Zhang. Monitoring Agents using Declarative Planning. Fundamenta Informaticae, 57(2-4):345-370, 2003. Short version appeared in: Günther, Kruse, Neumann, editors, Proceedings of KI2003, LNAI 2821, Springer Verlag, 2003.
4. J. R. Graham, D. McHugh, M. Mersic, F. McGeary, M. V. Windley, D. Cleaver, and K. S. Decker. Tools for developing and monitoring agents in distributed multi-agent systems. In T. Wagner and O. Rana, editors, Infrastructure for Agents, Multi-Agent Systems, and Scalable Multi-Agent Systems, LNAI 1887. Springer Verlag, 2001.
5. G. A. Kaminka, D. V. Pynadath, and M. Tambe. Monitoring teams by overhearing: A multi-agent plan-recognition approach. Journal of Artificial Intelligence Research, 17:83-135, 2002.
6. M. Kisiel-Dorohinicki. Agent-based models and platforms for parallel evolutionary algorithms. In M. Bubak, G. D. van Albada, P. M. A. Sloot, and J. Dongarra, editors, Computational Science - ICCS 2004, Part III, LNCS 3038. Springer Verlag, 2004.
7. E. Nawarecki, M. Kisiel-Dorohinicki, and G. Dobrowolski. Organisations in the particular class of multi-agent systems. In B. Dunin-Keplicz and E. Nawarecki, editors, From Theory to Practice in Multi Agent Systems, LNAI 2296. Springer-Verlag, 2002.
8. P. O'Brien and R. Nicol. FIPA - Towards a standard for software agents. BT Technology Journal, 16(3):51-59, 1998.
46

Multi-Agent Environment for Management of Crisis in an Enterprises-Markets Complex

Jaroslaw Kozlak

Department of Computer Science, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland

Summary. A very important research problem is the elaboration of a methodology for an efficient reaction to crisis events. The research goal presented in this work is the creation of a simulation environment for analysing and estimating methods of reaction to crisis events, such as foreseeing potential crisis events and taking them into consideration while making plans, or creating plans which may be modified after a crisis event has occurred, and that with limited disadvantageous consequences. An illustration of this research is a system composed of an environment containing renewable common resources, as well as a set of enterprises and markets.

Key words: crisis situations, multi-agent systems, supply chains
46.1 Introduction

Many problems of importance in today's world are characterised by a high degree of complexity and are influenced by different heterogeneous entities with different autonomy levels. Exemplary problems of this kind are production planning or transport planning. For the realization of such planning processes it may be useful to apply the multi-agent approach, which offers methodologies for dealing with such situations. The multi-agent approach is based on the realisation of systems composed of interacting, intelligent and autonomous entities, called agents, which construct local plans; as a consequence of the interactions occurring between them, global plans emerge that are affected by the whole complex system. The multi-agent approach has proved its usefulness for solving standard production planning problems (job shop, open shop, flow shop), especially in their dynamic versions (e.g. [11]). Research is also performed on the application of the multi-agent approach to more complex problems which better reflect reality. One can mention applications for supply-chain management [2, 6, 8]. For example, in [8] an environment for creating systems for supply-chain management was presented, with two distinguished classes of component agents: structural elements (representing physically existing production enterprises, storehouses, brokers, sellers etc.), and control elements responsible for the resource flow in the system.
A market approach is also used [9, 10] for the development of efficient supply chains with multi-agent systems. A very important research problem is the elaboration of a methodology for an efficient reaction to crisis events. It is assumed that crisis situations occur after those events which entail a reaction, even at the cost of a deterioration in the quality of realisation of the requests adopted in the hitherto existing plans. The research goal presented in this work is the creation of a simulation environment for analysing and estimating methods of reaction to crisis events, such as foreseeing potential crisis events and taking them into consideration while making plans, or creating plans which may be modified after a crisis event has occurred, and that with limited disadvantageous consequences. An illustration of this research is a system composed of an environment containing renewable common resources, as well as a set of enterprises and markets [4]. Enterprises carry out a production process (they obtain some output resources as a result of a transformation of a suitable configuration of input resources); they may also obtain resources from the environment or exchange resources with other enterprises, paying the price set by the market. The enterprises' plans have the form of workflows representing a series of resource transformations; one of their main evaluation criteria is the profit obtained from them. The markets determine the market price for rendering particular services and mediate in the resource transfer between the enterprises. We assume here that a crisis situation is the arrival of a high-priority production order which cannot be rejected. The experiments performed are to demonstrate some abilities of pro-active planning, which can foresee the incoming requests and tries to take them into consideration while constructing the plan.
46.2 Overview of Agent Planning Methods

Particular agents make plans which best comply with their goals. On the basis of the private goals of the agents, it is possible to establish global plans in which different agents participate. It is possible to distinguish the following kinds of distributed planning (DP) [3]:

Cooperative Distributed Planning (CDP) - the objective is to establish a good common global plan. Particular entities participate in the construction of the plan; agents exchange information about their plans, which are constructed and adjusted to the needs of the global common plan realization. Full cooperation of the participating agents is assumed.

Negotiated Distributed Planning (NDP) - the reference point is the successful realization of the private, local goal of each agent. The development of a plan embracing a group of agents results from negotiations held among the agents and from the coordination of the actions performed by particular agents.

Distributed Continual Planning (DCP) [3] - the planning process is distributed among the agents and the development of plans is carried out together with the execution of the actions intended for their realization. For such kinds of planning problems, one can distinguish two approaches: reactive (based on the idea of the best possible reaction
to the events that have occurred, leading to plan modification) and pro-active (a more or less precise model of potential events which will have an influence on the plans is used, and one tries to predict them during plan construction). In the case of pro-active planning, it is possible to differentiate methods aimed at constructing robust solutions, which give the highest chance that the solutions obtained remain valid after future dynamic events, from methods aimed at creating flexible solutions, which can be easily modified and adapted in case of need [1].
46.3 Multi-Agent Systems and Crisis Situations

According to the main ideas of the multi-agent approach, it is assumed that the system functionality should increase thanks to the cooperation among the agents, which gives the possibility to perform additional, often more complex actions [7]. The autonomy of agents should support the adjustment of the system to the changes which appear after crisis situations take place. In the case of a crisis situation, agents become aware of the changes and modify their goals in response to them.
46.4 Model Features

The objective of this work is the creation of a universal multi-agent environment with the possibility of testing different techniques of agent planning. It should also be possible to generate crisis situations and to observe the modification of the agents' plans aimed at their adjustment to the changed situation. The environment should possess the following features:
•
a possibility of testing different methods of construction of agent plans, a possibility of modelling (in simple form) various kinds of crisis situations associated to agent functionality as well as free resources in the environment or being possessed by agents, a possibility of examination of different methods of reactions to crisis situations.
46.5 Model Description

The model is composed of an environment and four types of agents: enterprises, producers, customers and markets.

46.5.1 Environment

The environment constitutes a place where agents are located. It is also a space containing resources, in particular renewable ones (whose quantity increases in each given period of time; the value of the increase depends on the current quantity of the resource).
46.5.2 Agent-Enterprise

The role of the agent-enterprise is the coordination of the functioning of the agent-producers subordinated to it. An agent-enterprise AE may be described as (EP, gr, g), where:
EP - set of agent-producers subordinated to the agent-decision module,
gr - set of co-ordination rules (resources, actions, negotiated rules),
g - goal function, whose value depends on the configuration of resources.

46.5.3 Agent-Producer

An agent-producer transforms configurations of resources. It is described as a 5-tuple AP_i = (AE, g_i, P_i, R_i, K_i), where:
AE - identifier of the enterprise which owns the agent-producer,
g_i - goal function, whose value depends on the configuration of resources,
P_i - plan of the agent (a series of actions executed successively),
R_i - resource configuration: R = (ro_1, ro_2, ..., ro_n, rf_1, rf_2, ..., rf_n), where ro_1, ro_2, ..., ro_n are the quantities of resources owned by the agent and rf_1, rf_2, ..., rf_n are the quantities of free resources owned by the agent,
K_i - knowledge about the environment, expressed as a list of triplets (a_i, o_j, p_ij), where a_i is an action performed by the agent, o_j is an operation describing the results of the action execution and its influence on the agent configuration and its neighborhood (environment and other agents), and p_ij is the probability that the execution of action a_i(ro_1, ro_2, ..., ro_n, rf_1, rf_2, ..., rf_n) causes the result o_j.

An action a_i(j) is described by its type i, force of execution j (which expresses the quantity of resources used and the anticipated quantity of resources obtained as results of the action) and execution time. There are the following kinds of actions: action on the environment, transformation of resources, and resource exchange between agents.

46.5.4 Agent-Customer

An agent-customer AC = (g, R) is described by a utility function g (preferences concerning the resource configuration) and the current configuration of owned resources R.

46.5.5 Agent-Market

In the system there is a set of agent-markets. The role of the agent-market is mediation in the exchange of resources between agents, the definition of equilibrium prices of action execution, and the coupling of an agent-producer with an agent-customer for goods exchange. A market M_i is described as the following tuple (a_i, pr_i, qo_i, qd_i, lp_i), where:
a_i - type of action of resource exchange;
pr_i - current price for action execution;
qd_i - number of demands for action execution;
qo_i - number of offers of action execution;
lp_i - list of agent couples (producer-consumer) performing an exchange of resources.

The algorithm of market functioning is described as follows:

begin cycle
  wait for offers and demands during n steps
  if |demand - offer| < epsilon then keep service price
  if demand - offer > epsilon then price += correction(price, demand - offer)
  if offer - demand > epsilon then price -= correction(price, offer - demand)
end cycle

46.5.6 Agent-Disturber

The role of the agent-disturber is the introduction of disturbances into the system, leading to the occurrence of crisis situations. For this goal, it is necessary to generate the particular events which are described in 46.5.8. The agent-disturber AD is represented as (car, caa, cr), where:
car - vector describing the probabilities that an agent will be deprived of one resource of a particular type,
caa - vector describing the probabilities that an agent will be deprived of the ability to perform a particular type of operation (by removing the capability of performing certain actions),
cr - vector describing the probabilities that a particular type of critical request, which the agents have to execute, arrives.

46.5.7 Planning Algorithms

The plan consists of a series of actions performed in given time periods. The plan development is based on a random generation of a set of action series that are possible to perform according to the knowledge possessed by the agent. Plans are estimated on the basis of the calculation of the goal function values for the given configurations of the agent's resources at the end of the plan realization. An additional criterion may be the probability of a successful realization of the plan defined in this way. Then it is possible either to choose the best plan or to perform a random selection, e.g. based on the roulette rule (a sketch of such a selection step is given after the following list). The constructed planning algorithms are adapted for use in a dynamically changing environment. The plan may be modified when (see fig. 46.1):
• The given action was not successfully performed (the obtained result diverged from the assumed result by more than the tolerance limit) or became impossible to realize (as a result of changes in the agent's entourage, for example associated with a crisis situation).
• A better plan is found which leads to a higher value of the goal function in the given time horizon.
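The random plan selection "based on the roulette rule" mentioned in Sect. 46.5.7 can be sketched as follows; plan evaluations are assumed here to be non-negative goal-function values.

import java.util.List;
import java.util.Random;

// Sketch of roulette-wheel selection: a plan with a higher goal-function value is
// chosen with a proportionally higher probability; evaluations are assumed non-negative.
final class RoulettePlanSelector {
    private final Random random = new Random();

    int select(List<Double> planEvaluations) {
        double total = 0.0;
        for (double v : planEvaluations) total += v;
        if (total <= 0.0) return random.nextInt(planEvaluations.size());
        double r = random.nextDouble() * total;
        double running = 0.0;
        for (int i = 0; i < planEvaluations.size(); i++) {
            running += planEvaluations.get(i);
            if (r <= running) return i;
        }
        return planEvaluations.size() - 1;        // guard against rounding errors
    }
}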
Fig. 46.1. Plan construction and realization
46.5.8 Crisis Situations

It is possible to distinguish the following kinds of crisis situations leading to the necessity of plan modification by the agents:
• overexploitation of resources accessible in the environment,
• deprivation of resources possessed by an agent; in this case the agent has to adapt its goals to the changed configuration,
• deprivation of an agent by removing the capability of performing certain actions,
• introduction of certain critical requests which the enterprise has to execute.
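Such situations are injected by the agent-disturber of Sect. 46.5.6. A minimal sketch of its event generation is given below; the event representation (plain text messages) and the per-type probability vectors are simplifying assumptions.

import java.util.Random;

// Sketch of the agent-disturber: in every simulation step each kind of disturbance
// is injected with the probability read from the corresponding vector (car, caa, cr).
final class Disturber {
    private final double[] car;   // probability of depriving an agent of a resource of a given type
    private final double[] caa;   // probability of removing the capability of a given action type
    private final double[] cr;    // probability of a critical request of a given type arriving
    private final Random random = new Random();

    Disturber(double[] car, double[] caa, double[] cr) {
        this.car = car;
        this.caa = caa;
        this.cr = cr;
    }

    void step() {
        for (int i = 0; i < car.length; i++)
            if (random.nextDouble() < car[i]) System.out.println("deprive an agent of resource type " + i);
        for (int i = 0; i < caa.length; i++)
            if (random.nextDouble() < caa[i]) System.out.println("remove capability of action type " + i);
        for (int i = 0; i < cr.length; i++)
            if (random.nextDouble() < cr[i]) System.out.println("inject critical request of type " + i);
    }
}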
Pro-active planning may make it possible to predict some disadvantageous events (especially if the probabilities of their occurrence can be calculated).
46.6 Realization

A prototype version of the system was implemented in Java. The research also includes work on two environments for multi-agent planning in environments composed of virtual enterprises and markets, based on the agent platforms JADE [12] and MadKit [13].
46.7 Experimental Results

Preliminary experiments concerning the reaction of the system to disturbances and the applicability of pro-active planning were performed. The percentage of agents' plans successfully realized was analyzed. The goals of the agents and the system configuration were selected so that the probability of the successful realization of a plan was 100 percent. Then disturbances based on depriving agents of the resources possessed by them were introduced into the system. Depending on the strength of the disturbances, the probability of the successful realization of the agents' plans decreases (fig. 46.2). When the quantity of deprived resources was about 30, it was impossible for the agents to realize their plans because of the lack of resources. The average value assumed to be obtained after the realization of a plan was 5056.68. The applied mechanisms of pro-active planning, based on taking into consideration the appearance of disturbances during plan realization and on the creation of plans with slightly lower requirements concerning the resources, limited the average value of the plans constructed by the agents (equal to 5041.8), but slightly increased the possibility of correct realization despite disturbances (the results are especially promising for disturbances of limited strength).
Fig. 46.2. Plans successfully realized with or without pro-active planning mechanisms (percentage of realized plans vs. number of disturbances)

In the future it is necessary to perform more experimental research for different configurations of the analysed agents and different values of the goal functions.
46.8 Conclusions and Future Works

In the paper the idea of an environment composed of virtual enterprises and markets was presented. The main goal of the realization of this system is its application in
different experiments concerning the planning and scheduling performed by groups of autonomous agents. Of special importance for us is the construction of plans that take into consideration the crisis situations described in section 46.5.8. The following works are intended next:
• further works on environment development;
• modelling of crisis situations and analysis of results obtained after application of different planning approaches;
• introduction of uncertain events and taking them into consideration in planning algorithms.
References

1. Davenport A, Beck J Survey of Techniques for Scheduling with Uncertainty. Unpublished manuscript. http://www.eil.utoronto.ca/EIL/profiles/chris/zip/uncertainty-survey.ps
2. Fox M, Barbuceanu M, Teigen R (2000) Agent-Oriented Supply-Chain Management. The International Journal of Flexible Manufacturing Systems, 12:165-188. Kluwer Academic Publishers, Boston.
3. desJardins M, Durfee E, Ortiz C, Wolverton M (1999) A survey of research in distributed, continual planning. AI Magazine, 4:3-22
4. Kozlak J (2001) Management of the renewable resources in the open multi-agent system. In: Binder Z (ed) Management and Control of Production and Logistics. Pergamon/Elsevier Science
5. Liu J-S, Sycara K (1995) Multiagent coordination in tightly coupled real-time environments. In: Lesser V (ed) Proceedings of the International Conference on Multi-Agent Systems. MIT Press
6. Parunak V, VanderBok R (1998) Modeling The Extended Supply Network, Industrial Technology Institute, working paper
7. Steels L (1990) Cooperation between distributed agents through self-organisation. In: Demazeau Y, Muller J-P (eds) Decentralized AI. Elsevier Science Publishers B.V., North-Holland
8. Swaminathan J, Smith S, Sadeh N (1998) Modeling Supply Chain Dynamics: A Multiagent Approach. Decision Sciences. Volume 29, Number 3
9. Walsh S, Wellman M (1999) Modeling Supply Chain Formation in Multiagent systems. In: IJCAI-99 Workshop on Agent Mediated Electronic Commerce
10. Wellman M, Walsh W, Wurman P, MacKie-Mason J (2001) Auction protocols for decentralized scheduling. In: Games and Economic Behavior 35:271-303.
11. Yoo M-J, Muller J-P (2002) Using Multi-Agent System for Dynamic Job Shop Scheduling. In: Proceedings of ICEIS 2002
12. JADE (2003) Java Agent Development Framework. http://sharon.cselt.it/projects/jade/
13. The MadKit Project (a Multi-Agent Development Kit). http://www.madkit.org/
47
Behavior Based Detection of Unfavorable Events Using the Multiagent System

Krzysztof Cetnarowicz, Edward Nawarecki, and Gabriel Rojek

Institute of Computer Science, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland
[email protected]

Department of Computer Science in Industry, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland
[email protected]

Summary. This article presents an attempt at the creation of a security system which makes possible the automatic detection of danger in a protected area. The considered protected areas are real world systems, e.g. airports, shops or city centers. Automatic processing of the behavior of persons acting in the considered area should make it possible to indicate some actors whose behavior effects can be or will be unfavorable for the considered system - a danger for other actors in the secured area.
47.1 Introduction

The danger of destructive attacks seems to be more and more frequent in the contemporary world. The main problem is to detect the attack risk as soon as possible. The detection of such dangers must comply with the following requirements:
• detection of dangers early enough in order to undertake security-protection actions,
• detection of new kinds of danger that are unknown a priori,
• constant monitoring of the area which should be protected.
Analyzing the historical development of human societies, we find that a society can meet a new kind of danger and protect itself against it. A society can create and update moral principles and penal codes, and using them it may detect unusual behavior and identify ill deeds. This is a very flexible and efficient way of detecting new kinds of danger, but it is not practical against contemporary menaces because it takes a lot of time. Using computer simulation we can accelerate the way mentioned above and obtain a method that is efficient in both detection and time.
47.2 Real world system

A surveyed real world system (e.g. an airport) may in general be considered as an environment with resources and actors. Resources (e.g. cars, rubbish bins) have characteristic properties and create an environment. Actors acting in the environment change resources. The actions of actors modifying resources may be monitored and then analyzed and used to determine unusual behavior. However, in general, most systems are so-called open systems. In an open system there is a migration of actors and new actors (new kinds of actors) may come into a given system. So in an open system we have to deal with new types of actors and new kinds of behavior of new-type actors, and the existing, stale models of behavior become useless. In general we can say that the main goal is to find a common property of the good behavior of good actors and take the opposite behavior as the wrong one. Then we can use negative selection to identify ill deeds and unfavorable behavior and in turn find their authors.

47.2.1 Real world system simulation

We can build a computer model of the real system that simulates events that take place in the real system. Then we can realize a real-time simulation of the real system where simulated events are driven by real events of the real system. In the model:
• the real resources are represented by simulated resources,
• actors of the real system are represented by agents.
We build a multiagent model of the real system. The multiagent system may be provided with social procedures that fulfill the needed tasks.
47.3 Actors, actions, behavior and estimation of behavior

An actor of the real world undertakes actions. The actions collected in a given period of time form a sequence that defines the actor's behavior. If more actions of the actor under consideration are known, we can say more about the behavior of the actor. The order of the actions (undertaken by an agent) is essential to evaluate its behavior. The same actions in a different order may indicate behaviors that can be judged in different ways. The behavior of an individual is a sequence of actions. The length of this sequence is meaningful. A longer sequence enables behavior to be estimated with higher confidence. It is impossible to define simply what a long sequence or a short sequence is. The length of the sequence depends on the features of the individual (actor) which is observed. For example, taking into consideration people observed in an airport, we are unable to say anything about a given passenger who has just entered the monitored area. When we notice that he left some packages in five rubbish bins, we can estimate his behavior as bad and we should start an alarm. The essence of this article is to discuss how
an alarm can be started automatically, which leads us to the problem of automatic behavior evaluation.
47.4 Estimation of behavior in multiagent systems

An agent that represents a given actor of the real world undertakes actions that may be considered as objects. The objects create a sequence which is registered by agents observing another acting agent. In a real world society all agents (all individuals in the society) observe (and estimate) all individuals whose actions should be noticed. In the simulated model of the real world society (real world system) we can create one agent which estimates the behavior of all individuals in the monitored area. Actions of agents, as registered objects, may be processed in order to decide whether the behavior of the agent is good or bad. Then it may enable an evaluation of the agent as good or bad and imply an adequate action. It should be mentioned that the quoted notions of good and bad do not have an absolute meaning. Good behavior or a good agent is a desirable individual for the given system (monitored area) in which the evaluation takes place. A bad agent is an undesirable agent in a given system, although it is possible that it is good in another one.

47.4.1 Estimation inspired by immunological mechanisms

Particular mechanisms, inspired by immunological mechanisms (as found in [3, 4, 5, 6]), are applied to estimate the behavior of an agent. The proposed mechanisms operate on actions committed by the agents under observation. The structures used by the immunological mechanism have the form of sequences (chains) of actions performed by the agents. The length of a chain is defined as l (i.e. every chain contains l objects). Every object represents one action undertaken by an agent under observation. The algorithm of the proposed mechanism is the following:
• observe and store (permanently) actions (corresponding objects) undertaken by every agent visible in the environment (system);
• once, after a given number of observed actions (undertaken by every agent), generate corresponding detectors;
• when detectors are generated - evaluate behavior of every agent in the system.
The generated detectors have the form of fragments of sequences (subsequences) composed of objects representing actions. A subsequence may be considered as a detector if it does not appear in the sequence of actions of any agent which is considered as good (represents good behavior). The process of creation of detectors takes place after a given number of memorized actions - we need to gather knowledge about the undertaken actions. Behavior estimation consists of the verification of the sequences of actions (of all agents in the monitored area). A sequence of actions is considered as bad if it contains a detector as a subsequence. The behavior of an agent is evaluated as bad if its sequence of actions is similar (within a given level) to detectors.
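The detector generation by negative selection and the matching step described above can be sketched as follows. The encoding of an action chain as a joined string is an assumption made only to keep the example short.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the immunological mechanism: detectors are action chains of length l that
// never occur in the "good" (own) sequences; a monitored sequence is suspicious when
// it contains any detector as a subsequence.
final class NegativeSelection {
    static Set<String> chainsOfLength(List<String> actions, int l) {
        Set<String> chains = new HashSet<>();
        for (int i = 0; i + l <= actions.size(); i++)
            chains.add(String.join("|", actions.subList(i, i + l)));
        return chains;
    }

    // keep only the candidate chains that do not react with (are not equal to) any own chain
    static Set<String> generateDetectors(Set<String> candidateChains, List<String> ownActions, int l) {
        Set<String> detectors = new HashSet<>(candidateChains);
        detectors.removeAll(chainsOfLength(ownActions, l));
        return detectors;
    }

    // number of subsequences of the observed sequence that match a detector
    static int countMatches(List<String> observedActions, Set<String> detectors, int l) {
        int matches = 0;
        for (int i = 0; i + l <= observedActions.size(); i++)
            if (detectors.contains(String.join("|", observedActions.subList(i, i + l)))) matches++;
        return matches;
    }
}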
47.5 Division profile

Our aim is to obtain a class of agent activity whose goal is to observe other agents in the society and possibly other elements of the environment. Those observations should be made in order to distinguish individuals whose behavior is unfavorable or incorrect (bad) for the observer. Such distinguished bad individuals should be adequately treated (e.g. convicted, avoided, liquidated), which should also be formed by the division profile. In the case of a multiagent system which is a simulation of a real world system, it is desirable to equip every agent in the system with the mentioned mechanisms, so that the security is assured by all agents existing in the system. These mechanisms built into an exemplary agent create the division profile of this agent in the interpretation of M-agent theory (presented e.g. in [1, 2]). Division profile mechanisms are built into all agents existing in the simulated system, but every agent possesses his own instance of the division profile. In the environment of the simulated system two structures common to all agents are added:
•
board of actions F - the structure in which actions (objects representing actions) of all agents in environment are stored, there are stored only h last actions, every action is accompanied by the notion by whom this action has been undertaken; board of removal O - the structure which collects results of functioning of division profiles of all agents in the system.
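A possible representation of these two shared structures, assuming that actions are simple hashable symbols; the class and method names below are illustrative and not part of the original system:

from collections import defaultdict, deque

class Board:
    """Shared environment structures used by the division profile:
    the board of actions F (last h actions per agent) and the
    board of removal O (accumulated deletion demands per agent)."""

    def __init__(self, h):
        self.h = h
        self.F = defaultdict(lambda: deque(maxlen=h))  # agent id -> last h actions
        self.O = defaultdict(int)                      # agent id -> removal coefficient

    def register_action(self, agent_id, action):
        self.F[agent_id].append(action)                # the oldest action drops out at h

    def demand_removal(self, agent_id, matches):
        self.O[agent_id] += matches                    # o_i := o_i + m_i

    def reset_removal(self):
        self.O.clear()                                 # start of every period Δt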
Referring to the synopsis of the immunologically inspired algorithm of behavior evaluation presented in Sect. 47.4.1, the functioning of the division profile mechanisms can be split into three stages, as shown in Fig. 47.1.
Fig. 47.1. Stages of division profile functioning (over time): creation of collection W, generation of detector set R, and behavior evaluation of neighboring agents (qualifying particular agents as "bad")
47.5.1 Creation of collection W

Collection W is called the collection of own actions. It contains correct, "good" sequences of actions: the action-object sequences of length l undertaken by the agent-observer (the agent that owns the considered instance of the division profile). This is correct under the assumption that the actions which the agent itself undertakes are evaluated by it as good.
Presuming that the last h actions undertaken by every agent are stored, the own collection W will contain h - l + 1 elements.

47.5.2 Generation of detector set R

The algorithm of detector generation refers to negative selection, i.e. the method of T-lymphocyte generation (presented in [3, 4, 5, 6]). From the set R_0 of generated sequences of length l, those reacting with any sequence from collection W are rejected. The sequences in R_0 represent every possible action in every possible order (the possible actions are the actions that can be undertaken by any agent in the system). Reaction of sequences means that the elements of those sequences are the same. The sequences from R_0 which pass such a negative selection form the set of detectors R.

47.5.3 Algorithm of actions evaluation

Once the detector set of the agent-observer is generated, it is used to find bad sequences among the action-object sequences of the other agents (stored in the board of actions F). The evaluation of actions is performed by the agent-observer (the agent whose division profile is considered) for every agent in the environment separately; assuming there are j agents in the system, the evaluation is made by the agent-observer j times. An exemplary agent a_i has an attributed sequence N_i stored in board F, where i indicates the number of the agent. A coefficient m_i is also attributed to agent a_i; at the beginning of the evaluation the coefficients m are set to zero for all agents. For an agent a_i, every subsequence of length l of the sequence N_i is compared with every detector from set R, as shown in Fig. 47.2. If any element of the detector set matches any subsequence of N_i, the coefficient m_i is incremented. Matching of sequences means that the elements of the compared sequences are the same. At the end of the evaluation, every agent has an attributed coefficient m_i indicating the number of counted matches. The larger the number of matches, the stronger the indication that the agent should be considered bad. The agent-observer chooses the agent (or agents) with the greatest coefficient and sends a demand to delete that agent (or agents); the demand carries a coefficient equal to the number of matches of the agent it concerns. A sketch combining this evaluation with the board of removal is given after the rules listed below.

47.5.4 Board of removal O

The board of removal O is a structure in the environment with mechanisms which aggregate all deletion demands sent by the agents evaluating behavior. Board O is not part of the division profile of any agent; it is a part of the evaluation mechanisms shared by all agents which evaluate behavior. The board of removal is an array O = (o_1, o_2, ..., o_i, ..., o_j), where o_i is the coefficient attributed to agent a_i and j is the number of agents in the system. The rules of board of removal exploitation are as follows:
Fig. 47.2. Algorithm of actions evaluation presented for agent a_1 (j is the number of all agents in the environment, N_i is the sequence of actions undertaken by agent a_i; identity of symbols between a subsequence of N_i and a detector from set R increments m_i)
1. board O is reset (all coefficients are set to zero) at the beginning of every constant time period Δt;
2. on receiving a removal demand concerning an exemplary agent a_i with the attributed coefficient m_i:
   • o_i := o_i + m_i;
3. at the end of every constant time period Δt the agent (or agents) a_d is removed which satisfies:
   • o_d = max(o_1, o_2, ..., o_j),
   • o_d > OU, where OU is a constant (named the sensitivity of recognition).
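Reusing the helper functions and the Board structure sketched above, one evaluation cycle for a single observer and one period Δt might look as follows; the parameter names l and OU follow the paper, while the function names and the split between observer and environment code are illustrative assumptions:

def evaluate_neighbors(board, observer_id, actions, l):
    """Steps 47.5.1-47.5.3 for a single observer: collection W of the
    observer's own length-l subsequences is built inside generate_detectors,
    which applies negative selection to obtain the detector set R; then the
    neighbors with the largest match count m_i are reported for removal."""
    own = list(board.F[observer_id])
    R = generate_detectors([own], actions, l)
    m = {a: len(subsequences(list(seq), l) & R)
         for a, seq in board.F.items() if a != observer_id}
    if m and max(m.values()) > 0:
        worst = max(m.values())
        for a, mi in m.items():
            if mi == worst:
                board.demand_removal(a, mi)      # o_a := o_a + m_a

def remove_agents(board, OU):
    """Step 47.5.4, performed by the environment at the end of each period Δt:
    delete the agent(s) with the maximal coefficient o_d, provided o_d > OU."""
    if not board.O:
        return []
    o_max = max(board.O.values())
    doomed = [a for a, o in board.O.items() if o == o_max and o > OU]
    board.reset_removal()                        # board O is reset for the next Δt
    return doomed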
47.6 Experiment

In order to confirm the effectiveness and utility of the proposed solutions, a multiagent system was implemented. In the environment of the multiagent system there exist two types of resources:

• resources of type A,
• resources of type B.
Resources are used by agents, but refilling of the resources is only possible when both types of resources are used simultaneously. Two types of agents acting in the environment of the research system can be distinguished:
• type g=0 - agents which in every constant time period Δt take one unit of a randomly selected resource (A with probability 50%, B with probability 50%); type g=0 agents need units of resources to refill their energy, and if the energy level of a type g=0 agent runs out, the agent is eliminated;
• type g=1 - agents which in every constant time period Δt take one unit of resource A; their existence does not depend on their energy level; type g=1 agents are also called intruders.
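The two agent types could be modeled, for example, as in the sketch below; the energy bookkeeping (initial level, cost and gain per period) is an assumption added only to make the sketch runnable, since the paper does not give these constants:

import random

class Agent:
    """Simplified model of the two agent types used in the experiment."""

    def __init__(self, g, energy=10):
        self.g = g               # g=0: ordinary agent, g=1: intruder
        self.energy = energy
        self.alive = True

    def act(self, resources):
        """One action per constant time period Δt; returns the action taken."""
        if self.g == 0:
            kind = random.choice("AB")     # A with 50%, B with 50%
            if resources[kind] > 0:
                resources[kind] -= 1
                self.energy += 1           # resources refill the agent's energy
            self.energy -= 1               # living costs energy (assumption)
            self.alive = self.energy > 0   # g=0 agents die when energy runs out
        else:
            kind = "A"                     # intruders always take resource A
            if resources[kind] > 0:
                resources[kind] -= 1       # their existence does not need energy
        return kind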
There are some similarities between the actions of the agents in the research system (taking resource A / taking resource B) and the actions of real-world actors, e.g. leaving baggage / taking baggage (at a repository).

47.6.1 Results: intruders inside the environment

In this part of the research three cases were studied:

• a case with only type g=0 agents in the system, without division profile mechanisms - initially there are 50 type g=0 agents in the system, which do not have any security mechanisms;
• a case with type g=0 agents and type g=1 agents, without division profile mechanisms - initially there are 35 type g=0 agents and 15 type g=1 agents, and none of the agents has any security mechanisms;
• a case with type g=0 agents and type g=1 agents, with division profile mechanisms - initially there are 35 type g=0 agents and 15 type g=1 agents, and all agents in the system are equipped with the division profile mechanisms with parameters h = 18, l = 5, OU = 300.
In each of these three cases the system was simulated for 300 time periods, and 10 simulations were performed. The diagram in Fig. 47.3 shows the average numbers of agents in the consecutive time periods. In the two cases of the system with agents without division profile mechanisms: if there are no intruders in the simulated system, all type g=0 agents can exist without any disturbance; the presence of intruders causes problems with executing the tasks of the type g=0 agents, which die after some time periods, while the bad agents remain in the system, which becomes blocked by them. In the case of the system with agents with division profile mechanisms: the last 18 actions undertaken by every agent are stored in the environment. After 18 actions have been undertaken by every agent, detectors of length l = 5 are constructed. The agents use their division profile mechanisms to decide which neighboring agents they want to eliminate: each agent demands the elimination of those neighbors which have the maximum number of detector matches and presents its demands, together with the number of matches, to the environment. The environment counts the matches in the presented demands and eliminates agents as described above for the division profile mechanisms. The constant OU is set to 300. As the results presented in Fig. 47.3 verify, after the detectors were constructed the intruders were distinguished thanks to the division profile mechanisms.
Fig. 47.3. The system without intruders, intruders inside the system, and intruders inside the system with agents with built-in division profile (average number of agents vs. time for: initially 50 type g=0 agents; initially 35 type g=0 agents and 15 type g=1 agents (intruders), all agents without division profile mechanisms; initially 35 type g=0 agents and 15 type g=1 agents (intruders), all agents with division profile mechanisms)
At the same time the distinguished agents were deleted, which made it possible for the type g=0 agents to function freely.

47.6.2 Results: intruders penetrating the environment

A case with mobile intruders was also simulated. Initially there are 40 type g=0 agents. After 20 constant time periods, new agents start entering the system: in every time period one intruder enters, and this process continues until the 80th time period, so 60 type g=1 agents enter in total. Two cases were simulated:
• agents without any security mechanisms;
• all agents equipped with division profile mechanisms with parameters h = 18, l = 5, OU = 300 (the mobile agents are also equipped with division profile mechanisms) - after 18 actions have been undertaken by every agent the detectors are constructed, so from that moment the agents can distinguish bad agents from good ones.
The system was simulated for 300 time periods and 10 simulations were performed. The diagram in Fig. 47.4 shows the average numbers of agents in the consecutive time periods. Knowing that all mobile agents are bad, it might seem that all entering agents should be eliminated immediately.
Fig. 47.4. The system with mobile intruders penetrating the system, agents without or with built-in division profile (average number of agents vs. time for agents using division profile mechanisms and agents without division profile)
This is not the case, because agents distinguish bad agents on the basis of behavior estimation: it is only possible to evaluate agents which have already presented their behavior, i.e. have undertaken the 18 actions required by the division profile mechanisms. Every new bad agent entering the system is destroyed after it has presented its behavior, i.e. has undertaken 18 actions. Because agents undertake one action per constant time period Δt, every new bad agent is killed after 18 constant time periods Δt of its functioning.
47.7 Conclusion

This paper presents a discussion of automated security assurance for a real-world system such as an airport or a shopping center. The real-world system can be simulated in order to monitor and analyze the actions undertaken by the actors in the real world. A multiagent model can be used in which the environment of the multiagent system corresponds to the protected area: resources represent real-world resources (e.g. cars, baggage) and agents represent real-world actors (persons). Using a multiagent system it is possible to equip every agent with mechanisms whose goal is to distinguish bad and good agents; in our work these mechanisms are called the division profile. The presented mechanisms, following the immunological approach, operate on objects which represent observed actions. The environment of the multiagent system is designed with some additional mechanisms (mechanisms of action storing and agent removal) in order to support the security mechanisms with which all agents are equipped. In order to confirm the effectiveness of our conception of unfavorable event detection, a system was simulated which corresponds to a simple real-world system (leaving / taking baggage at a repository). The obtained results indicate that it is possible to detect threats unknown a priori (i.e. unknown at the time the security system is created). The presented mechanisms also enable constant monitoring of the protected area
and, as verified in the tests, make it possible to evaluate behavior before the unfavorable consequences of this behavior occur.
References

1. Cetnarowicz K.: M-agent architecture based method of development of multiagent systems. In: Proc. of the 8th Joint EPS-APS International Conference on Physics Computing, ACC Cyfronet, Kraków (1996)
2. Cetnarowicz K., Nawarecki E., Zabińska M.: M-agent Architecture and its Application to the Agent Oriented Technology. In: Proc. of the DAIMAS'97, St. Petersburg (1997)
3. Forrest S., Perelson A. S., Allen L., Cherukuri R.: Self-nonself Discrimination in a Computer. In: Proc. of the 1994 IEEE Symposium on Research in Security and Privacy, IEEE Computer Society Press, Los Alamitos (1994) 202-212
4. Forrest S., Perelson A. S., Allen L., Cherukuri R.: A Change-detection Algorithm Inspired by the Immune System. IEEE Transactions on Software Engineering, IEEE Computer Society Press, Los Alamitos (1995)
5. Hofmeyr S. A., Forrest S.: Architecture for an Artificial Immune System. Evolutionary Computation, vol. 7, No. 1 (2002) 45-68
6. Wierzchoń S. T.: Sztuczne systemy immunologiczne: teoria i zastosowania. Akademicka Oficyna Wydawnicza Exit, Warszawa (2001)
48 Intelligent Medical Systems on Internet Technologies Platform

Beata Zielosko and Andrzej Dyszkiewicz
Institute of Computer Science, University of Silesia, Będzińska 39, 41-200 Sosnowiec, Poland
[email protected]
[email protected]
48.1 Introduction

The amount of accumulated medical data creates the problem of processing them efficiently in order to obtain cross-sectional knowledge from the research. Thanks to computer science, such problems can be successfully approached with intelligent techniques such as genetic algorithms, neural networks, fuzzy sets or rough sets. The last of these give good results in classifying patients at subsequent treatment and rehabilitation stages [7]. Considering the peculiarity of medical data (i.e. the amount and diversity of parameters and the many stages of diagnosis and treatment), the usage of rough sets in creating advisory medical systems can shorten the time needed for a diagnosis. It can minimize treatment expenses, raise the quality of doctors' work and decrease the risk of making a mistake during diagnosis. Placing these solutions on the .Net platform and implementing them as Web Services gives new possibilities for the usage of artificial intelligence in intelligent medical systems.
48.2 A role of data unification in medical metrology

The usage of different types of diagnostics within the scope of motion functions, spontaneous emission and metabolism, across many hospital units, creates communication problems. These include the transmission and assessment of the temporary condition of a patient and of the clinical results achieved with various kinds of therapy. This is important especially when a patient is transferred from one hospital unit to another, or discharged from hospital to home with the intention of re-admission for further therapy in the future. In such situations it is sometimes difficult to characterize objectively the exit state of the patient and compare it with the state after a stay at home or in another hospital unit; doctors from other units are forced to duplicate many initial descriptive actions. This influences the peculiarity of medical data [1]. There exist diseases with many attributes, and sometimes it is difficult to indicate the most important symptoms.
Some features are more important than others in the process of diagnosis, and some features are typical only for individual stages of a disease. Moreover, the symptoms of one disease may be caused by another. It is not always easy to indicate all attributes of a disease, and we obtain incomplete data. In such situations the unification of methodology and the use of measurement methods which register discrete symptoms of life could have a good influence on communication between various medical structures [2]. The uniformity of measuring methods, applied e.g. in an orthopedics unit, a neurology unit or a rehabilitation unit, will help to keep to one measuring procedure started in another organizational unit and will help to describe the patient's state precisely [4]. It will then enable an objective assessment of the quality of medical services in order to apply the most effective methods of therapy, which at the same time generate the lowest economic expenses. To achieve unification and the possibility of data exchange between hospital units or individual healthcare centers, there are international standards that allow the transmission of medical data in electronic form. The most popular ones are:

— HL7 (Health Level 7): a standard of data exchange in text form, applied in the USA. The latest version, HL7 3.0, introduces an object information model for health services, described in UML (Unified Modeling Language).
— DICOM (Digital Imaging and Communication in Medicine): a standard for the exchange of medical images. In the case of the DICOM norm, every application which claims compatibility must have a "Conformance Statement" document, in which the author of the application characterizes which services are compatible with DICOM. The "Conformance Statement" is sometimes selective, which means that two applications could both be compatible with the norm but not interact with each other, because they are compatible in different scopes.
— UN/EDIFACT (United Nations/Electronic Data Interchange For Administration, Commerce and Transport): a standard used in Europe and in Poland. The UN/EDIFACT norm is applied for administrative and reporting purposes; it is mainly used to implement external data exchange, e.g. between a provider of medical services and the NFZ (the Polish national health fund).

The above standards of data transmission are often used only in big health centers. It sometimes happens that each medical unit has its own standard of accumulating, processing and transmitting data, depending on the applied system, which makes communication between medical units difficult. One solution to this problem is to apply XML (eXtensible Markup Language) within the above standards as a unification method for transmitting medical data.
48.3 Web Services and XML—elements of the .Net platform

The main aim of the .Net platform is to reduce the building and operation of distributed systems to a simple form and to ensure their efficiency, scalability and security [8]. .Net offers modern technologies for creating software that cooperates with the internet; Web Services are one example. Web Services can be characterized as independent software components available in a network, used for specified functions and services [10].
This means that a Web Service written in Java and available on a Linux system can be called by an application written in Visual Basic and available on a Windows system. The communication between Web Services and their clients relies on the standard internet protocols HTTP (HyperText Transfer Protocol) and SOAP (Simple Object Access Protocol). HTTP is used to send requests to Web Services and to return the reply messages from them; SOAP defines the way of calling services and transferring return values, and SOAP messages are formed in the XML language. We can design an application with many modules (a multi-modular application), with the possibility of passing data parameters, and also use individual modules as Web Services. These Web Services can be modernized and scaled depending on the needs of the client application or on the development of the implemented algorithms. An additional advantage of this solution is the layered structure (presentation layer, business rules layer and data access layer), which makes the application independent of the operating system or device platform used on the client's side [3]. Web Services allow programmers to access individual functions made available by an application, so there is no need to create a whole application from scratch every time we want to create a new one or add new functions to an existing one. Owing to the continuous development of intelligent techniques of data processing and the economic aspects of the designed information systems, the choice of the .Net platform as a technology for creating diagnosis support systems, and of the XML language as a standard for transmitting medical data, seems to be an interesting solution. XML is an extensible markup language used as a universal document format. It defines the way of publishing contents in databases and determines the format of data exchange between applications and systems from different producers. XML is also the basic data description language used by Web Services [6]. When creating an XML document, we do not have to use a narrow, predefined set of markups (as in the HTML, HyperText Markup Language); we can create our own elements for describing the document. The descriptive text included in a document, i.e. the set of information about the format and origin of the data, is called metadata. In this way a standard document defines an XML application (a so-called XML vocabulary) [11]. Such an application creates the document's framework and can be used to describe a specified type of document, e.g. MathML is an XML application for formatting equations. The large flexibility in creating one's own elements in XML documents requires precise rules of syntax. Usually an XML application is defined by a DTD (Document Type Definition), which is an optional component of an XML document. Other ways to declare the contents of XML documents are CSS (Cascading Style Sheets), XSL (eXtensible Stylesheet Language) and XML Schemas. These files include instructions for formatting the elements of XML documents and are joined to them by a proper declaration, e.g. <?xml-stylesheet type="text/xsl" href="default.xsl"?>, where "default.xsl" is the name of the XSL file. An XML document can also be presented by means of the DOM (Document Object Model): the XML file is described by means of a tree of nodes, and these nodes are objects with various methods and properties.
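As a small illustration of the DOM view of an XML document, the following Python sketch parses a hypothetical fragment of measurement data and walks the resulting node tree; the element and attribute names are invented for this example and do not come from any of the standards discussed above:

from xml.dom.minidom import parseString

# A hypothetical fragment of medical measurement data (names are invented).
doc = parseString("""
<examination patient="P-001">
  <pulse unit="bpm">72</pulse>
  <temperature unit="C">36.6</temperature>
</examination>
""")

# DOM represents the document as a tree of nodes with methods and properties.
for node in doc.documentElement.childNodes:
    if node.nodeType == node.ELEMENT_NODE:
        print(node.tagName, node.getAttribute("unit"), node.firstChild.data)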
48.4 Multi-modular medical system of patients' diagnosis

As an example, we would like to present the possibility of adapting rough set elements to the .Net technology in order to create a medical system for patients' diagnosis. The usage of rough sets helps to solve problems connected with patients' classification [9].
Fig. 48.1. Multi-modular medical system of patients' diagnosis
The proposed solution is a multi-channel acquisition of measurement data, synchronized with a common time base. The data will be measured by a four-channel photoplethysmograph coupled with a four-channel spirometer and a four-channel thermometer [5]. An important issue in this system is the synchronous collection of data from the sensors. On the basis of this information and the disease symptoms, the system can help a doctor in deciding on a patient's diagnosis. Knowledge about correlations between the data obtained from the research can be an indirect result of the work of this system. The system will assess the reactions of the human body and emotional conditions; those conditions will be registered as a change of breathing frequency which modifies the pulse and, as a consequence, influences the temperature of the organs of the human body. Because of the synchronous data collection from the individual modules included in the system, it could be used e.g. in intensive care, for monitoring a patient's condition when he is away from hospital, and in telemedicine. In further research this system will be used as one of the elements of a system for diagnosing patients with scoliosis. On the basis of expert knowledge and the analysis of the results obtained from the research, the decision table is constructed and sent in an XML file format. This file is a parameter for a function implemented in a Web Service, and such a function makes further computation by other Web Services possible. For example, the function of one Web Service returns data in the form of abstraction classes; the abstraction classes are passed as parameters to the next Web Service, which generates e.g. a core; other Web Services with other implemented functions allow e.g. decision rules to be generated. The business rules of the application layer placed on the server include the algorithms used to process the data.
This assures that we do not have to modify the functions implemented in other Web Services when we want to change one of those algorithms. Features such as scaling, multi-modular building, distributed structure, device independence, built-in standards of data exchange and operating system independence permit this technology to be aggregated with rough sets. This solution could give interesting results in the form of new services available in a global network.
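To make the idea of chained service functions concrete, here is a minimal sketch in which a decision table arrives as XML and one function computes the abstraction (indiscernibility) classes that could then be passed on to a further service; the XML format, the names and the simplified computation are illustrative assumptions, not the actual system:

from collections import defaultdict
from xml.etree import ElementTree

def parse_decision_table(xml_text):
    """Read a decision table sent as XML: one <object> per patient, with
    condition attributes and the decision stored as XML attributes."""
    root = ElementTree.fromstring(xml_text)
    return [dict(obj.attrib) for obj in root.iter("object")]

def abstraction_classes(table, attributes):
    """Group objects that are indiscernible on the given condition
    attributes; the result could be passed to a further service that
    computes e.g. a core or decision rules."""
    classes = defaultdict(list)
    for row in table:
        key = tuple(row[a] for a in attributes)
        classes[key].append(row)
    return list(classes.values())

table = parse_decision_table(
    '<table>'
    '<object pulse="high" temp="normal" decision="ill"/>'
    '<object pulse="high" temp="normal" decision="ill"/>'
    '<object pulse="low"  temp="normal" decision="healthy"/>'
    '</table>')
print(abstraction_classes(table, ["pulse", "temp"]))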
48.5 Summary

The era of the static WWW (World Wide Web) is coming to an end. The internet is now a platform for ever newer services and standards, such as XML or script languages. Electronic shops, banks, institutions and schools are more and more popular. The .Net platform can be an efficient environment for the exchange of data and services between applications working in individual medical units. The above examples show that implementing intelligent data processing techniques as Web Services on the .Net platform opens new possibilities in designing distributed decision support systems and autonomic computing.
Fig. 48.2. Diagnosis support system based on Web Services and XML standard
References

1. Brodziak A. (1974) Formalizacja naturalnego wnioskowania diagnostycznego. Psychonika-teoria struktur i procesów informatycznych centralnego systemu nerwowego człowieka i jej wykorzystanie w informatyce. PAN, Warszawa
2. Doroszewski J. (1990) Komputerowe wspomaganie diagnostyki medycznej. In: Nałęcz M., Problemy Biocybernetyki i Inżynierii Biomedycznej. WKŁ, Warszawa
3. Dunway R. (2003) Visual Studio .NET. Mikom, Warszawa
4. Dyszkiewicz A., Wróbel Z. (2001) Elektromechaniczne procedury diagnostyki i terapii w rehabilitacji. Problemy Biocybernetyki i Inżynierii Biomedycznej pod redakcją Macieja Nałęcza, Warszawa
5. Dyszkiewicz A., Zielosko B., Wakulicz-Deja A., Wróbel Z. (2004) Jednoczesna akwizycja wielopoziomowo sprzężonych parametrów organizmu człowieka krokiem do wyższej swoistości wnioskowania diagnostycznego. MPM Krynica
6. Esposito D. (2002) Building Web Solutions with ASP .NET and ADO .NET. MS Press, Redmond
7. Komorowski J., Pawlak Z., Polkowski L., Skowron A.: Rough Sets: A Tutorial
8. Mackenzie D., Sharkey K. (2002) Visual Basic .NET dla każdego. Helion, Gliwice
9. Pawlak Z. (1991) Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht
10. Panowicz L. (2004) Software 2.0: Darmowa platforma .NET 1: 18-26
11. Young J. Michael (2000) XML krok po kroku. Wydawnictwo RM, Warszawa
Author Index
Bandyopadhyay, Sanghamitra, 439
Bazan, Jan, 191
Bell, David, 227
Bieniawski, Stefan, 31
Blazejewski, Lech, 527
Burkhard, Hans-Dieter, 347
Burns, Tom R., 363
Castro Caldas, Jose, 363
Cetnarowicz, Krzysztof, 579
Chen, Long, 455
Chilov, Nikolai, 385
Czyzewski, Andrzej, 397
Dardzińska, Agnieszka, 133
Dobrowolski, Grzegorz, 551
Doherty, Patrick, 479
Dunin-Kęplicz, Barbara, 69
Düntsch, Ivo, 179
Dyszkiewicz, Andrzej, 589
El Fallah-Seghrouchni, Amal, 53
Farinelli, Alessandro, 467
Fioravanti, Fabio, 99
Gediga, Günther, 179
Glowinski, Cezary, 493
Gomolinska, Anna, 203
Gorodetsky, Vladimir, 411
Grabowski, Adam, 215
Guo, Gongde, 179, 227
Heintz, Fredrik, 479
Iocchi, Luca, 467
Johnson, Rodney W., 85
Karsaev, Oleg, 411
Kazmierczak, Piotr, 539
Kisiel-Dorohinicki, Marek, 563
Kostek, Bozena, 397
Kozlak, Jaroslaw, 571
Krizhanovsky, Andrew, 385
Latkowski, Rafal, 493
Levashova, Tatiana, 385
Liao, Zhining, 227
Luks, Krzysztof, 519
Marszal-Paszek, Barbara, 339
Melich, Michael E., 85
Michalewicz, Zbigniew, 85
Mitra, Pabitra, 439
Moshkov, Mikhail Ju., 239
Nakanishi, Hideyuki, 423
Nardi, Daniele, 467
Nawarecki, Edward, 551, 579
Nguyen Hung Son, 249
Nguyen Sinh Hoa, 249
Nowak, Agnieszka, 333
Pal, Sankar K., 439
Pashkin, Michael, 385
Paszek, Piotr, 339
Patrizi, Fabio, 467
Pawlak, Zdzislaw, 3
Peters, James F., 13
Pettorossi, Alberto, 99
Polkowski, Lech, 117, 509
Proietti, Maurizio, 99
Raś, Zbigniew W., 133, 261
Rauch, Ewa, 501
Ray, Shubhra Sankar, 439
Rojek, Gabriel, 579
Roszkowska, Ewa, 363
Ryjov, Alexander, 147
Samoilov, Vladimir, 411
Schmidt, Martin, 85
Sergot, Marek, 161
Simiński, Roman, 273
Skarzynski, Henryk, 397
Skowron, Andrzej, 191
Smirnov, Alexander, 385
Staruch, Bozena, 293
Stepaniuk, Jaroslaw, 305
Szczuka, Marcin, 281
Szmigielski, Adam, 509
Ślęzak, Dominik, 281
Tzacheva, Angelina A., 261
Verbrugge, Rineke, 69
Wakulicz-Deja, Alicja, 273, 333
Wang, Guoyin, 455
Wang, Hui, 179, 227
Wei, Ling, 317
Wolpert, David H., 31
Wróblewski, Jakub, 281
Wu, Yu, 455
Zhang, Wenxiu, 317
Zielosko, Beata, 589