1 = (κ • μ)(w2), which violates the RecR condition.

5 Conclusion

In this paper, we presented a general revision model for epistemic states using plausibility measures; this model generalizes Spohn's and Dubois and Prade's results on revision in ordinal conditional functions.
ECAI 2008, M. Ghallab et al. (Eds.), IOS Press, 2008. © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-361
Structure Learning of Markov Logic Networks through Iterated Local Search

Marenglen Biba, Stefano Ferilli, and Floriana Esposito
Department of Computer Science, University of Bari, Italy; email: {biba,ferilli,esposito}@di.uniba.it

Abstract. Many real-world applications of AI require both probability and first-order logic to deal with uncertainty and structural complexity. Logical AI has focused mainly on handling complexity, and statistical AI on handling uncertainty. Markov Logic Networks (MLNs) are a powerful representation that combines Markov Networks (MNs) and first-order logic by attaching weights to first-order formulas and viewing these as templates for features of MNs. State-of-the-art structure learning algorithms for MLNs maximize the likelihood of a relational database by performing a greedy search in the space of candidates. This can lead to suboptimal results, because these approaches cannot escape local optima. Moreover, due to the combinatorially explosive space of potential candidates, these methods are computationally prohibitive. We propose a novel algorithm for learning the structure of MLNs, based on the Iterated Local Search (ILS) metaheuristic, that explores the space of structures through a biased sampling of the set of local optima. The algorithm focuses the search not on the full space of solutions but on a smaller subspace defined by the solutions that are locally optimal for the optimization engine. We show through experiments in two real-world domains that the proposed approach improves accuracy and learning time over the existing state-of-the-art algorithms.
1 Introduction
Traditionally, AI research has fallen into two separate subfields: one that has focused on logical representations, and one on statistical ones. Logical AI approaches such as logic programming, description logics, classical planning, symbolic parsing, and rule induction tend to emphasize handling complexity. Statistical AI approaches such as Bayesian networks, hidden Markov models, Markov decision processes, statistical parsing, and neural networks tend to emphasize handling uncertainty. However, intelligent agents must be able to handle both in real-world applications. The first attempts to integrate logic and probability in AI date back to the works in [1, 8, 19]. Later, several authors began using logic programs to compactly specify Bayesian networks, an approach known as knowledge-based model construction [26]. Recently, in the burgeoning field of statistical relational learning [7], several approaches for combining logic and probability have been proposed, such as probabilistic relational models [17], Bayesian logic programs [10], relational dependency networks [18], and others. All these approaches combine probabilistic graphical models with subsets of first-order logic (e.g., Horn clauses). In this paper we focus on Markov logic [22], a powerful representation that has finite first-order logic and probabilistic graphical models as special cases. It extends first-order logic by attaching weights to formulas, providing the full expressiveness of graphical models and first-order logic in finite domains, and remaining well defined in many infinite domains [22, 25]. Weighted formulas are viewed as templates for constructing MNs, and in the infinite-weight limit Markov logic reduces to standard first-order logic. Markov logic avoids the assumption of i.i.d. (independent and identically distributed) data made by most statistical learners, by using the power of first-order logic to compactly represent dependencies among objects and relations.

Learning an MLN consists of structure learning (learning the logical clauses) and weight learning (setting the weight of each clause). In [22] structure learning was performed through ILP methods [13], followed by a weight learning phase in which maximum pseudo-likelihood [2] weights were learned for each learned clause. State-of-the-art algorithms for structure learning are those in [11, 16], where learning of MLNs is performed in a single step using weighted pseudo-likelihood as the evaluation measure during structure search. However, these algorithms follow systematic search strategies that can lead to local optima and prohibitive learning times. The algorithm in [11] performs a beam search in a greedy fashion, which makes it very susceptible to local optima, while the algorithm in [16] works in a bottom-up fashion, trying to consider fewer candidates for evaluation. Even though it considers fewer candidates, after initially scoring all candidates this algorithm attempts to add them one by one to the MLN, thus changing the MLN at almost every step, which greatly slows down the computation of the optimal weights. Moreover, neither of these algorithms can benefit from parallel architectures.

We propose an approach based on the Iterated Local Search (ILS) metaheuristic that samples the set of local optima and performs a search in the sampled space. We show that, through a simple parallelism model such as independent multiple walks, ILS achieves improvements over the state-of-the-art algorithms. The paper is organized as follows: Section 2 introduces MNs and MLNs, Section 3 describes learning approaches for MLNs, Section 4 introduces stochastic local search methods, and Section 5 presents the ILS metaheuristic for MLN structure learning. We present the experiments in Section 6 and conclude in Section 7.
2 Markov Networks and Markov Logic Networks
An MN (also known as a Markov random field) is a model for the joint distribution of a set of variables X = (X1, X2, ..., Xn) ∈ χ [5]. It is composed of an undirected graph G and a set of potential functions. The graph has a node for each variable, and the model has a potential function φk for each clique in the graph. A potential function is a non-negative real-valued function of the state of the corresponding clique. The joint distribution represented by an MN is given by

P(X = x) = \frac{1}{Z} \prod_{k} \phi_k(x_{\{k\}})

where x_{k} is the state of the kth clique (i.e., the state of the variables that appear in that clique). Z, known as the partition function, is given by

Z = \sum_{x \in \chi} \prod_{k} \phi_k(x_{\{k\}})

MNs are often conveniently represented as log-linear models, with each clique potential replaced by an exponentiated weighted sum of features of the state, leading to

P(X = x) = \frac{1}{Z} \exp\left( \sum_{j} w_j f_j(x) \right)
A feature may be any real-valued function of the state. We will focus on binary features, fj(x) ∈ {0, 1}. In the most direct translation from the potential-function form, there is one feature corresponding to each possible state x_{k} of each clique, with its weight being log φk(x_{k}). This representation is exponential in the size of the cliques. However, a much smaller number of features (logical functions of the state of the clique) can be specified, allowing for a more compact representation than the potential-function form, particularly when large cliques are present. MLNs take advantage of this.

A first-order knowledge base (KB) can be seen as a set of hard constraints on the set of possible worlds: if a world violates even one formula, it has zero probability. The basic idea in Markov logic is to soften these constraints: when a world violates one formula in the KB it is less probable, but not impossible. The fewer formulas a world violates, the more probable it is. Each formula has an associated weight that reflects how strong a constraint it is: the higher the weight, the greater the difference in log probability between a world that satisfies the formula and one that does not, other things being equal.

An MLN [22] L is a set of pairs (Fi, wi), where Fi is a formula in first-order logic and wi is a real number. Together with a finite set of constants C = {c1, c2, ..., cp}, it defines an MN M_{L,C} as follows:

1. M_{L,C} contains one binary node for each possible grounding of each predicate appearing in L. The value of the node is 1 if the ground predicate is true, and 0 otherwise.
2. M_{L,C} contains one feature for each possible grounding of each formula Fi in L. The value of this feature is 1 if the ground formula is true, and 0 otherwise. The weight of the feature is the wi associated with Fi in L.

Thus there is an edge between two nodes of M_{L,C} iff the corresponding ground predicates appear together in at least one grounding of one formula in L. An MLN can be viewed as a template for constructing MNs. The probability distribution over possible worlds x specified by the ground MN M_{L,C} is given by
P(X = x) = \frac{1}{Z} \exp\left( \sum_{i=1}^{F} w_i n_i(x) \right)

where F is the number of formulas in the MLN and n_i(x) is the number of true groundings of Fi in x. As formula weights increase, an MLN increasingly resembles a purely logical KB, becoming equivalent to one in the limit of all infinite weights.

In this paper we focus on MLNs whose formulas are function-free clauses and assume domain closure (it has been proven that no expressiveness is lost), ensuring that the generated MNs are finite. In this case, the groundings of a formula are formed simply by replacing its variables with constants in all possible ways.
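To make the template view concrete, the following minimal Python sketch computes the distribution above for a hypothetical two-atom domain. The bit-tuple encoding of worlds and the single hand-grounded formula are illustrative assumptions of this sketch, not Alchemy's actual representation.

```python
import itertools, math

def world_distribution(weighted_features):
    """P(X = x) = exp(sum_i w_i * n_i(x)) / Z over all worlds x."""
    worlds = list(itertools.product([0, 1], repeat=2))  # (Smokes(A), Cancer(A))
    score = lambda x: math.exp(sum(w * f(x) for w, f in weighted_features))
    Z = sum(score(x) for x in worlds)                   # partition function
    return {x: score(x) / Z for x in worlds}

# n_1(x): number of true groundings of Smokes(A) => Cancer(A), here 0 or 1
formula = lambda x: 1 if (not x[0] or x[1]) else 0
for world, p in world_distribution([(1.5, formula)]).items():
    print(world, round(p, 3))   # the violating world (1, 0) is least probable
```

Note how the world violating the soft constraint keeps nonzero probability, unlike in a purely logical KB.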
3 Structure and Parameter Learning of MLNs
A first-order knowledge base (KB) is a set of sentences or formulas in first-order logic [6]. Formulas are constructed using four types of symbols: constants, variables, functions, and predicates. Constant symbols represent objects in the domain of interest. Variable symbols range over the objects in the domain. Function symbols represent mappings from tuples of objects to objects. Predicate symbols represent relations among objects in the domain, or attributes of objects. A term is any expression representing an object in the domain; it can be a constant, a variable, or a function applied to a tuple of terms. An atomic formula or atom is a predicate symbol applied to a tuple of terms. A ground term is a term containing no variables. A ground atom or ground predicate is an atomic formula all of whose arguments are ground terms. Formulas are recursively constructed from atomic formulas using logical connectives and quantifiers. A positive literal is an atomic formula; a negative literal is a negated atomic formula. A KB in clausal form is a conjunction of clauses, a clause being a disjunction of literals. A definite clause is a clause with exactly one positive literal (the head, with the negative literals constituting the body). A possible world or Herbrand interpretation assigns a truth value to each possible ground predicate.

Inductive Logic Programming (ILP) systems learn clausal KBs from relational databases, or refine existing KBs [13]. Hypotheses are constructed through refinement operators that add or remove literals from clauses. In the learning from interpretations setting of ILP, the examples are databases, and the system searches for clauses that are true in them. For example, CLAUDIEN [4], starting with a trivially false clause, repeatedly forms all possible refinements of the current clauses by adding literals, and adds to the KB those that satisfy a minimum accuracy and coverage criterion. In the learning from entailment setting, the system searches for clauses that entail all positive examples of some relation and no negative ones. For example, FOIL [21] learns each definite clause by starting with the target relation as the head and greedily adding literals to the body.

MN weights have traditionally been learned using iterative scaling [5]. However, maximizing the likelihood (or posterior) using a quasi-Newton optimization method like L-BFGS has recently been found to be much faster [23]. Regarding structure learning, the authors in [5] induce conjunctive features by starting with a set of atomic features (the original variables), conjoining each current feature with each atomic feature, adding to the network the conjunction that most increases likelihood, and repeating. The work in [15] extends this to the case of conditional random fields, which are MNs trained to maximize the conditional likelihood of a set of outputs given a set of inputs. The first attempt to learn MLNs was that of [22], where the authors used the CLAUDIEN system to learn the clauses of MLNs and then learned the weights by maximizing pseudo-likelihood. In [11] another method was proposed that combines ideas from ILP and feature induction of MNs. This algorithm, which performs a beam or shortest-first search in the space of clauses guided by a weighted pseudo-likelihood (WPLL) measure [2], outperformed that of [22].
Recently, in [16] a bottom-up approach was proposed in order to reduce the search space. This algorithm uses a propositional MN learning method to construct template networks that guide the construction of candidate clauses. In this way, it generates fewer candidates for evaluation. Even though it evaluates fewer candidates, after initially scoring all candidates the algorithm attempts to add them one by one to the MLN, thus changing the MLN at almost every step, which greatly slows down the computation of the WPLL. For every candidate structure, in both [11, 16] the parameters that optimize the WPLL are set through L-BFGS, which approximates the second derivative of the WPLL by keeping a running finite-sized window of previous first derivatives.

Regarding weight learning, as pointed out in [11], a potentially serious problem that arises when evaluating candidate clauses using the WPLL is that the optimal (maximum WPLL) weights need to be computed for each candidate. Since this involves numerical optimization, and needs to be done millions of times, it could easily make the algorithm too slow. In [15, 5] the problem is addressed by assuming that the weights of previous features do not change when testing a new one. Surprisingly, the authors in [11] found this to be unnecessary when using the very simple approach of initializing L-BFGS with the current weights (and a zero weight for the new clause). Although in principle all weights could change as the result of introducing or modifying a clause, in practice this is very rare. Second-order, quadratic-convergence methods like L-BFGS are known to be very fast if started near the optimum [23]. This is what happened in [11]: L-BFGS typically converges in just a few iterations, sometimes one. We use the same approach for setting the parameters that optimize the WPLL.
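The warm-start trick is easy to reproduce with any quasi-Newton library. The sketch below uses SciPy's L-BFGS-B; neg_wpll and neg_wpll_grad are hypothetical callables standing in for the (negated) WPLL of the candidate MLN and its gradient, which the papers above compute from the relational database.

```python
import numpy as np
from scipy.optimize import minimize

def refit_weights(neg_wpll, neg_wpll_grad, current_weights):
    """Re-optimize the WPLL after a new clause is added, initializing
    L-BFGS with the current weights and a zero weight for the new clause.
    Started near the optimum, it typically converges in a few iterations."""
    w0 = np.append(current_weights, 0.0)   # zero weight for the new clause
    result = minimize(neg_wpll, w0, jac=neg_wpll_grad, method="L-BFGS-B")
    return result.x
```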
4 Iterated Local Search
Many widely known and high-performance local search algorithms make use of randomized choice in generating or selecting candidate solutions for a given combinatorial problem instance. These algorithms are called stochastic local search (SLS) algorithms [9] and represent one of the most successful and widely used approaches for solving hard combinatorial problems. Many "simple" SLS methods are obtained from other search methods by just randomizing the selection of candidates during search, such as Randomized Iterative Improvement (RII) and Uninformed Random Walk. Many other SLS methods combine "simple" SLS methods in order to exploit the strengths of each during search; these are known as hybrid SLS methods [9]. ILS is one such metaheuristic, because it can easily be combined with other SLS methods.

One of the simplest and most intuitive ideas for addressing the fundamental issue of escaping local optima is to use two types of SLS steps: one for reaching local optima as efficiently as possible, and the other for effectively escaping local optima. ILS methods [9, 14] exploit this key idea, and essentially use these two types of search steps alternately to perform a walk in the space of local optima w.r.t. the given evaluation function. The search process starts from a randomly selected element of the search space. From this initial candidate solution, a locally optimal solution is obtained by applying a subsidiary local search procedure. Then each iteration of the algorithm consists of three major steps: first, a perturbation method is applied to the current candidate solution s, yielding a modified candidate solution s'; next, a subsidiary local search is performed from s' until a local optimum s'' is obtained; in the last step, an acceptance criterion is used to decide from which of the two local optima, s or s'', the search process is continued.
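The loop just described can be written as a generic, domain-independent skeleton. The following sketch is our own illustration (not code from [9, 14]); the subsidiary procedures are passed in as functions, mirroring the perturbation / local search / acceptance cycle.

```python
def iterated_local_search(initial, local_search, perturb, accept,
                          evaluate, max_no_improve=2):
    # reach a first local optimum from the starting candidate
    s = local_search(initial)
    best, no_improve = s, 0
    while no_improve < max_no_improve:
        s_perturbed = perturb(s)            # jump away from the current optimum
        s_new = local_search(s_perturbed)   # descend to another local optimum
        if evaluate(s_new) > evaluate(best):
            best, no_improve = s_new, 0
        else:
            no_improve += 1
        s = accept(s, s_new)                # e.g. keep the better of the two
    return best
```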
Algorithm 1 Structure Learning
Input: P: set of predicates, MLN: Markov Logic Network, RDB: Relational Database
  CLS = all clauses in MLN ∪ P
  LearnWeights(MLN, RDB)
  BestScore = WPLL(MLN, RDB)
  repeat
    BestClause = SearchBestClause(P, MLN, BestScore, CLS, RDB)
    if BestClause ≠ null then
      add BestClause to MLN
      Score = WPLL(MLN, RDB)
      Gain = Score − BestScore
      BestScore = Score
    end if
  until BestClause = null or Gain ≤ minGain for two consecutive steps
  return MLN
The algorithm can terminate after a number of steps without improvement, or simply after a fixed number of steps. The choice of the components of the ILS has a great impact on the performance of the algorithm.

As pointed out in [9], there are three good reasons to consider applying SLS algorithms instead of systematic algorithms. The first is that many problems are of a constructive nature and their instances are known to be solvable. In these situations, the goal of any search algorithm is to find a solution, rather than just to decide whether a solution exists. This holds in particular for optimization problems, where the actual problem is to find a solution of sufficiently high quality. Therefore, the main advantage of a complete systematic algorithm (the ability to detect that a given problem instance has no solution) is not relevant for finding solutions to solvable instances. Secondly, in most application scenarios the time to find a solution is limited. In these situations, systematic algorithms often have to be aborted after the given time has been exhausted, which renders them incomplete. This is problematic for the many systematic optimization algorithms that search through spaces of partial solutions without computing complete solutions early in the search: if such an algorithm is aborted prematurely, usually no solution candidate is available, while in the same situation SLS algorithms typically return the best solution found so far. Thirdly, algorithms for real-time problems should be able to deliver reasonably good solutions at any point during their execution. For optimization problems this typically means that run time and solution quality should be positively correlated; for decision problems, one could guess a solution when a timeout occurs, where the accuracy of the guess should increase with the run time of the algorithm. This so-called anytime property is usually very difficult to achieve, but in many situations the SLS paradigm is naturally suited to devising anytime algorithms.

In general, it is not straightforward to decide whether to use a systematic or an SLS algorithm for a given task. Systematic and SLS algorithms can be considered complementary to each other. SLS algorithms are advantageous in many situations, particularly if reasonably good solutions are required within a short time, if parallel processing is used, and if knowledge about the problem domain is rather limited. In other cases, when time constraints are less important and some knowledge about the problem domain can be exploited, systematic search may be a better choice.
Structure learning of MLNs is a hard optimization problem due to the large space to be explored; SLS methods are thus suitable for finding high-quality solutions in a short time. Moreover, one of the key advantages of SLS methods is that they can greatly speed up learning through parallel processing, where speedups proportional to the number of CPUs can be achieved [9]. We exploit this feature in our ILS algorithm by running multiple independent ILS walks on separate CPUs.
5 Generative Structure Learning of MLNs through ILS
In this section we describe the ILS metaheuristic tailored to the problem of learning the structure of MLNs. Algorithm 1 iteratively adds the best clause to the current MLN until two consecutive steps have not produced improvement (other stopping criteria could also be applied). Algorithm 2 performs an iterated local search to find the best clause to add to the MLN. It starts by randomly choosing a unit clause CL_C in the search space. Then it performs a greedy local search to efficiently reach a local optimum CL_S. At this point, a perturbation method is applied, leading to a neighbor CL'_C of CL_S, and then a greedy local search is applied to CL'_C to reach another local optimum CL'_S. The accept function decides whether the search must continue from the previous local optimum CL_S or from the newly found local optimum CL'_S (accept can perform a random walk or iterative improvement in the space of local optima).

Careful choice of the various components of Algorithm 2 is important to achieve high performance. The clause perturbation operator (flipping the sign of literals, removing literals, or adding literals) has the goal of jumping to a different region of the search space, from which the search restarts at the next iteration. Perturbations can be strong or weak: if the jump lands near the current local optimum, the subsidiary local search procedure LocalSearchII may fall back into the same local optimum or enter a region with the same value of the objective function (a plateau); if the jump is too far, LocalSearchII may take too many steps to reach another good solution. In our algorithm we use only strong perturbations, i.e., we always restart from unit clauses (in future work we intend to adapt the nature of the perturbation dynamically). For the procedure LocalSearchII we chose an iterative improvement approach, in order to balance intensification (greedily increasing solution quality by exploiting the evaluation function) and diversification (the randomness induced by strong perturbations, which avoids search stagnation). The accept function always accepts the best solution found so far.
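For illustration, the perturbation operators named above could look as follows. The string encoding of literals and the uniform choice among the three weak moves are assumptions of this sketch; recall that the algorithm evaluated in this paper actually uses only the strong perturbation, i.e., restarting from a unit clause.

```python
import random

def weak_perturbation(clause, predicates):
    """Flip the sign of a literal, remove one, or add one.
    Literals are hypothetical strings such as "Smokes(x)" or "!Cancer(x)"."""
    literals = list(clause)
    move = random.choice(["flip", "remove", "add"])
    if move == "flip":
        i = random.randrange(len(literals))
        lit = literals[i]
        literals[i] = lit[1:] if lit.startswith("!") else "!" + lit
    elif move == "remove" and len(literals) > 1:
        literals.pop(random.randrange(len(literals)))
    else:
        literals.append(random.choice(predicates))
    return tuple(literals)

def strong_perturbation(predicates):
    """The strong perturbation used here: restart from a random unit clause."""
    return (random.choice(predicates),)
```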
6 Experiments

6.1 Datasets
We carried out experiments on two publicly available databases: the UW-CSE database used by [11, 22, 16] (available at http://alchemy.cs.washington.edu/data/uw-cse) and the Cora dataset originally labeled by Andrew McCallum. Both are standard relational datasets and are used for two important relational tasks: Cora for entity resolution and UW-CSE for social network analysis. For Cora we used a cleaned version from [24], with five splits for cross-validation. The published UW-CSE dataset consists of 15 predicates divided into 10 types. Types include publication, person, course, etc. Predicates include Student(person), Professor(person), AdvisedBy(person1, person2), TaughtBy(course, person, quarter), Publication(paper, person), etc. The dataset contains a total of 2673 tuples (true ground atoms, with the remainder assumed false). The Cora dataset consists of 1295 citations of 132 different computer science papers, drawn from the Cora Computer Science Research Paper Engine. The task is to predict which citations refer to the same paper, given the words in their author, title, and venue fields. The labeled data also specify which pairs of author, title, and venue fields refer to the same entities. We performed experiments for each field in order to evaluate the ability of the model to deduplicate fields as well as citations. Since the number of possible equivalences is very large, we used the canopies found in [24] to make this problem tractable.

Algorithm 2 SearchBestClause
Input: P: set of predicates, MLN: Markov Logic Network, BestScore: current best score, CLS: list of clauses, RDB: Relational Database
  CL_C = a randomly picked clause in CLS ∪ P
  CL_S = LocalSearchII(CL_C)
  BestClause = CL_S
  repeat
    CL'_C = Perturb(CL_S)
    CL'_S = LocalSearchII(CL'_C, MLN, BestScore)
    if WPLL(BestClause, MLN, RDB) ≤ WPLL(CL'_S, MLN, RDB) then
      BestClause = CL'_S
      add BestClause to MLN
      BestScore = WPLL(CL'_S, MLN, RDB)
    end if
    CL_S = accept(CL_S, CL'_S)
  until two consecutive steps have not produced improvement
  return BestClause
6.2 Systems and Methodology
We implemented Algorithm 1 (ILS) in the Alchemy package [12]. We used the implementation of L-BFGS in Alchemy to learn maximum-WPLL weights. We compared the performance of our algorithm with the state-of-the-art algorithms for generative structure learning of MLNs: BS (Beam Search) of [11] and BUSL (Bottom-Up Structure Learning) of [16]. In the UW-CSE domain, we used the same leave-one-area-out methodology as in [22]. In the Cora domain, we performed cross-validation. For each system on each test set, we measured the conditional log-likelihood (CLL) and the area under the precision-recall curve (AUC) for all the predicates. The advantage of the CLL is that it directly measures the quality of the probability estimates produced. The advantage of the AUC is that it is insensitive to the large number of true negatives (i.e., ground atoms that are false and predicted to be false). The CLL of a query predicate is the average, over all its groundings, of the log-probability of the ground atom given the evidence. The precision-recall curve for a predicate is computed by varying the CLL threshold above which a ground atom is predicted to be true; i.e., the ground atoms whose probability of being true is greater than the threshold are taken as positive and the rest as negative. For all algorithms, we used the default parameters of Alchemy, changing only the following ones: maximum variables per clause = 5 for UW-CSE and 6 for Cora; penalization of the WPLL = 0.01 for UW-CSE and 0.001 for Cora. For L-BFGS: convergence threshold = 10^-5 (tight) and 10^-4 (loose); minWeight = 0.5 for UW-CSE for BUSL as in [16], 1 for BS as in [11], and 1 for ILS; minGain = 0.05 for ILS. For ILS we used multiple-independent-walk parallelism, assigning each instance of the algorithm to a separate CPU on a cluster of Intel Core2 Duo 2.13 GHz CPUs.
6.3 Results
After learning the structure, we performed inference on the test fold for both datasets using MC-SAT [20] with number of steps = 10000 and simulated annealing temperature = 0.5. For each experiment, all the groundings of the query predicates on the test fold were commented out (i.e., treated as unknown). MC-SAT produces probability outputs for every grounding of the query predicate on the test fold. We used these values to compute the average CLL over all the groundings and the corresponding AUC (for AUC we used the method proposed in [3]). For ILS we report the best performance in terms of CLL among ten parallel independent walks. Both CLL and AUC results (Table 1) are averaged over all predicates of the domain. Learning times are reported in Table 2. For BS in the Cora domain we are not able to report results, since structure learning with this algorithm did not finish in 45 days. BS is heavily slowed by its systematic top-down nature, which tends to evaluate a very large number of candidates. In the UW-CSE domain, BS easily gets stuck in local optima due to its greedy strategy.

Table 1. Accuracy results for all algorithms

              UW-CSE                          CORA
  Algorithm   CLL              AUC            CLL              AUC
  BS          -0.312±0.046     0.320          -                -
  BUSL        -0.074±0.014     0.431          -0.196±0.003     0.201
  ILS         -0.069±0.016     0.432          -0.102±0.003     0.225
In both domains, ILS gives the best overall results in terms of CLL and AUC. BUSL is competitive with ILS in terms of accuracy, but is much slower. Even though BUSL evaluates fewer candidates than ILS, it changes the MLN completely at each step, so calculating the WPLL becomes very expensive. In ILS this does not happen because, as in [11], at each step L-BFGS is initialized with the current weights (and a zero weight for the new clause), and it converges in a few iterations. We empirically observed that ILS is very effective in escaping local optima, and that further improvements can be achieved by dynamically adapting the strength of the perturbation operator.

Table 2. Average learning times for all algorithms (in minutes)

  Algorithm   UW-CSE   CORA
  BS          335      -
  BUSL        618      9350
  ILS         148      1597

7 Conclusion and Future Work
Markov logic networks are a powerful representation that combines first-order logic and probability. We have introduced an iterated local search algorithm for learning the structure of Markov Logic Networks. The approach is based on a biased sampling of the set of local optima, focusing the search not on the full space of solutions but on a smaller subspace defined by the solutions that are locally optimal for the optimization engine. We have shown through experiments in two real-world domains that the proposed algorithm performs better than state-of-the-art structure learning algorithms for MLNs. Future work includes implementing more sophisticated parallel models such as MPI (Message Passing Interface) or PVM (Parallel Virtual Machine), dynamically adapting the nature of perturbations in ILS, and using a Metropolis criterion in the acceptance function of ILS.
ACKNOWLEDGEMENTS

We thank Pedro Domingos and Stanley Kok for helpful discussions, Marc Sumner for help on using Alchemy, and Lilyana Mihalkova for help on BUSL.
REFERENCES

[1] F. Bacchus, Representing and Reasoning with Probabilistic Knowledge, Cambridge, MA: MIT Press, 1990.
[2] J. Besag, 'Statistical analysis of non-lattice data', The Statistician, 24, 179–195, (1975).
[3] J. Davis and M. Goadrich, 'The relationship between precision-recall and ROC curves', in Proc. 23rd ICML, pp. 233–240, (2006).
[4] L. De Raedt and L. Dehaspe, 'Clausal discovery', Machine Learning, 26, 99–146, (1997).
[5] S. Della Pietra, V. Della Pietra, and J. Lafferty, 'Inducing features of random fields', IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 380–392, (1997).
[6] M. R. Genesereth and N. J. Nilsson, Logical Foundations of Artificial Intelligence, San Mateo, CA: Morgan Kaufmann, 1987.
[7] L. Getoor and B. Taskar, Introduction to Statistical Relational Learning, MIT Press, 2007.
[8] J. Halpern, 'An analysis of first-order logics of probability', Artificial Intelligence, 46, 311–350, (1990).
[9] H. H. Hoos and T. Stützle, Stochastic Local Search: Foundations and Applications, Morgan Kaufmann, San Francisco, 2005.
[10] K. Kersting and L. De Raedt, 'Towards combining inductive logic programming with Bayesian networks', in Proc. 11th Int'l Conf. on Inductive Logic Programming, pp. 118–131. Springer, (2001).
[11] S. Kok and P. Domingos, 'Learning the structure of Markov logic networks', in Proc. 22nd Int'l Conf. on Machine Learning, pp. 441–448, (2005).
[12] S. Kok, P. Singla, M. Richardson, and P. Domingos, 'The Alchemy system for statistical relational AI', Technical report, Department of Computer Science and Engineering, University of Washington, Seattle, WA, http://alchemy.cs.washington.edu/, (2005).
[13] N. Lavrac and S. Dzeroski, Inductive Logic Programming: Techniques and Applications, Ellis Horwood, Chichester, UK, 1994.
[14] H. R. Lourenço, O. Martin, and T. Stützle, 'Iterated local search', in Handbook of Metaheuristics, F. Glover and G. Kochenberger (Eds.), 321–353, Kluwer Academic Publishers, Norwell, MA, USA, (2002).
[15] A. McCallum, 'Efficiently inducing features of conditional random fields', in Proc. UAI-03, pp. 403–410, (2003).
[16] L. Mihalkova and R. J. Mooney, 'Bottom-up learning of Markov logic network structure', in Proc. 24th Int'l Conf. on Machine Learning, pp. 625–632, (2007).
[17] N. Friedman, L. Getoor, D. Koller, and A. Pfeffer, 'Learning probabilistic relational models', in Proc. 16th Int'l Joint Conf. on AI (IJCAI), pp. 1300–1307. Morgan Kaufmann, (1999).
[18] J. Neville and D. Jensen, 'Dependency networks for relational data', in Proc. 4th IEEE Int'l Conf. on Data Mining, pp. 170–177. IEEE Computer Society Press, (2004).
[19] N. Nilsson, 'Probabilistic logic', Artificial Intelligence, 28, 71–87, (1986).
[20] H. Poon and P. Domingos, 'Sound and efficient inference with probabilistic and deterministic dependencies', in Proc. 21st Nat'l Conf. on AI (AAAI), pp. 458–463. AAAI Press, (2006).
[21] J. R. Quinlan, 'Learning logical definitions from relations', Machine Learning, 5, 239–266, (1990).
[22] M. Richardson and P. Domingos, 'Markov logic networks', Machine Learning, 62, 107–136, (2006).
[23] F. Sha and F. Pereira, 'Shallow parsing with conditional random fields', in Proc. HLT-NAACL-03, pp. 134–141, (2003).
[24] P. Singla and P. Domingos, 'Entity resolution with Markov logic', in Proc. ICDM-2006, pp. 572–582. IEEE Computer Society Press, (2006).
[25] P. Singla and P. Domingos, 'Markov logic in infinite domains', in Proc. 23rd UAI, pp. 368–375. AUAI Press, (2007).
[26] M. P. Wellman, J. S. Breese, and R. P. Goldman, 'From knowledge bases to decision models', Knowledge Engineering Review, (1992).
ECAI 2008, M. Ghallab et al. (Eds.), IOS Press, 2008. © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-366
Single-peaked consistency and its complexity

Bruno Escoffier (LAMSADE, Université Paris Dauphine and CNRS; escoffier@lamsade.dauphine.fr), Jérôme Lang (IRIT, Université Paul Sabatier, Toulouse; lang@irit.fr), and Meltem Öztürk (CRIL, Université d'Artois; ozturk@cril.fr)

Abstract. A common way of dealing with the paradoxes of preference aggregation consists in restricting the domain of admissible preferences. The most well-known such restriction is single-peakedness. In this paper we focus on the problem of determining whether a given profile is single-peaked with respect to some axis, and on the computation of such an axis. This problem has already been considered in [2]; we give here a more efficient algorithm and address some related issues, such as the number of orders that may be compatible with a given profile, or the communication complexity of preference aggregation under the single-peakedness assumption.
1 Introduction
Aggregating preferences for finding a consensus between several agents is an important topic at the border between social choice and artificial intelligence. Given the preferences of a set of agents (or voters) over a set of alternatives (or candidates), preference aggregation aims at determining a collective preference relation representing as much as possible the individual preferences, whereas voting rules consist in finding a socially preferred candidate. Among the paradoxes and impossibility theorems of preference aggregation, the most famous may be the following three (in all three cases we assume that there are at least 3 alternatives):

• the Condorcet paradox [3]: a Condorcet cycle is a sequence of candidates x1, ..., xk such that for all i ≤ k−1, a majority of voters prefers xi to xi+1, and a majority of voters prefers xk to x1. Such cycles make it impossible to build a collective preference relation compatible with pairwise majority comparisons between candidates.
• Arrow's theorem [1]: any unanimous aggregation function for which the pairwise comparison between two alternatives is independent of irrelevant alternatives is dictatorial.
• Gibbard and Satterthwaite's theorem [7, 8]: any surjective and nondictatorial voting rule is manipulable.

A profile consists of a collection of preference relations over the candidates (one per voter). In the above results, any profile is admissible. However, in some contexts, voters' preferences may have a special structure restricting the domain of admissible profiles. The most well-known such restriction is single-peakedness. It assumes that there is a natural linear axis, independent of the voters, on which alternatives are positioned: one may for instance think of a left-right axis as in politics, or a numerical axis (when the voters have to decide, for instance, about an amount of money to spend). A voter has single-peaked preferences with respect to such an axis if, on each side of the "peak" (that is, the preferred candidate), his preference grows with the proximity to the peak. It is well known that Condorcet cycles cannot occur when preferences are single-peaked; therefore, one escapes from the Condorcet paradox as well as from Arrow's and Gibbard-Satterthwaite's theorems.

However, this way of escaping the paradoxes and impossibility theorems assumes that the axis on which the candidates are positioned is known in advance. In contexts where it is partially or fully unknown, one should identify it before any aggregation process is started. Therefore, we consider the problem of determining whether, given the preferences of some agents on a set of alternatives, these preferences are single-peaked with respect to some axis (which we refer to as single-peaked consistency), and if so, how one of the possible axes can be determined. This problem has been considered in [2] (as well as the problem of determining whether a profile is single-peaked w.r.t. a tree [9], which is weaker than single-peakedness w.r.t. an axis). They give an algorithm in O(mn²), where n (resp. m) is the number of candidates (resp. voters), based on a matrix representation. We give here a different algorithm, both more intuitive and more efficient, since it works in time O(mn). While the difference between O(mn) and O(mn²) is practically not very significant for standard political elections, where n is typically small, this is no longer the case when the set of alternatives (or "candidates") has a combinatorial structure, which is often the case in AI applications. A related problem is addressed by Conitzer [4]: without prior knowledge of the axis, but knowing the preference relation of one agent (which gives some incomplete information about the axis), how can we elicit as efficiently as possible the preferences of a second agent?

Single-peaked consistency is important in at least two contexts. First, some domains tend to have a single-peaked structure, but for some reason we may not know the axis: in this case, from a few votes (for instance obtained from a sample of votes), we may learn this axis. Second, in some domains it is unclear whether it is reasonable to assume single-peakedness: then, checking the single-peaked consistency of the preference relations of a few voters gives a good hint as to whether single-peakedness is reasonable.(4)

In Section 2 we define single-peaked preferences, and in Section 3 we give a constructive algorithm that checks whether a profile is single-peaked consistent and, if so, returns a compatible axis; this algorithm works in time O(mn), where m is the number of voters and n the number of candidates. In Section 4 we study a few combinatorial aspects of single-peaked preferences; in particular, we give a result on the number of axes that are compatible with a tuple of single-peaked preferences. In Section 5 we give a simple additional result on the communication complexity of preference aggregation of single-peaked preferences. Finally, we point to interesting extensions of our work.

(4) This is for instance of particular interest when alternatives are evaluated on several criteria; here, the hidden axis may be some (a priori unknown) combination of the different criteria (a projection from a multidimensional to a one-dimensional representation).
2 Single-peaked preferences
Let V = {1, ..., m} be a finite set of voters and X = {x1, ..., xn} a finite set of candidates (or alternatives), with n ≥ 3.

Definition 1 A preference relation ≻ on X is a linear order on X. The peak of a preference relation ≻ is the candidate x* = peak(≻) such that x* ≻ x for all x ∈ X \ {x*}. A profile is an m-tuple P = ⟨≻1, ..., ≻m⟩ of preference relations on X.

Definition 2 An axis O (noted by >) is a linear order on X. Given two candidates xi, xj ∈ X, a preference relation ≻ on X whose peak is x*, and an axis O, we say that xi and xj are on the same side of the peak of ≻ iff one of the following two conditions is satisfied: (1) xi > x* and xj > x*; (2) x* > xi and x* > xj. A preference relation ≻ is single-peaked with respect to an axis O if and only if for all xi, xj ∈ X such that xi and xj are on the same side of the peak x* of ≻, one has xi ≻ xj if and only if xi is closer to the peak than xj, that is, if x* > xi > xj or xj > xi > x*.

For simplicity, we sometimes write (as in Example 1) x1 x2 ... xn instead of x1 ≻ x2 ≻ ... ≻ xn or of x1 > x2 > ... > xn.

Example 1 Let X = {x1, x2, x3, x4, x5, x6} and O = (x1 > x2 > x3 > x4 > x5 > x6). The preferences x2 x3 x4 x1 x5 x6; x4 x3 x2 x5 x6 x1; and x6 x5 x4 x3 x2 x1 are single-peaked with respect to O, but not x4 x3 x5 x1 x6 x2. Indeed, x1 and x2 are on the same side of the peak (x4), but x2 is not preferred to x1 although it is closer to the peak than x1.

An interesting question is the existence of a common axis for all voters, such that the preferences of these voters are single-peaked with respect to this common axis.

Definition 3 A profile ⟨≻1, ..., ≻m⟩ is single-peaked with respect to O iff for each voter i, ≻i is single-peaked with respect to O.

Whether single-peakedness seems justified or not strongly depends on the nature of X. It is often deemed reasonable if the axis represents an objective left-right political axis such that voters' preferences are determined only from the position of the candidates on the axis, or else if X is a set of numerical values or, more generally, a set equipped with a natural ordering. Conitzer [4] considers the elicitation of single-peaked preferences. The elicitation process is all the more efficient as the amount of communication required by the process is low. This amount of communication can be measured in terms of the number of elementary queries of the form "between the candidates x and y, which one do you prefer?"
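Definition 2 translates directly into a small checking procedure. The Python sketch below is our own illustration (candidates encoded as strings, preference relations as lists from best to worst); it tests whether a preference relation is single-peaked with respect to an axis and reproduces Example 1.

```python
def is_single_peaked(pref, axis):
    """pref: candidates from most to least preferred; axis: the order O.
    Checks that, on each side of the peak, preference decreases with
    the distance to the peak along the axis (Definition 2)."""
    pos = {c: i for i, c in enumerate(axis)}
    rank = {c: i for i, c in enumerate(pref)}   # smaller rank = more preferred
    peak = pos[pref[0]]
    for x in axis:
        for y in axis:
            same_side = (pos[x] < peak and pos[y] < peak) or \
                        (pos[x] > peak and pos[y] > peak)
            x_closer = abs(pos[x] - peak) < abs(pos[y] - peak)
            if same_side and x_closer and rank[x] > rank[y]:
                return False
    return True

axis = ["x1", "x2", "x3", "x4", "x5", "x6"]
assert is_single_peaked(["x2", "x3", "x4", "x1", "x5", "x6"], axis)
assert not is_single_peaked(["x4", "x3", "x5", "x1", "x6", "x2"], axis)
```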
3 Single-peaked consistency
A very natural question is the following: given an m-voter profile, is it single-peaked with respect to some (unknown) axis? This is defined formally as follows:

Definition 4 (single-peaked consistency) A preference profile P = ⟨≻1, ..., ≻m⟩ on X is single-peaked consistent if there exists an axis O such that for all i, ≻i is single-peaked with respect to O. When P is single-peaked with respect to the axis O, we say that O is compatible with P. For every axis O, we denote by SP(O) the set of preference relations on X that are single-peaked with respect to O. For instance, if n = 3 and O = x1 > x2 > x3, then SP(O) = {x1 x2 x3, x2 x1 x3, x2 x3 x1, x3 x2 x1}.

The main problem associated with this definition is to determine whether a given profile is single-peaked consistent. We now present the main result of this article, i.e., the resolution of this problem. More precisely, we propose an algorithm working in time O(mn) which, given a profile, outputs an axis compatible with this profile if one exists, and finds a contradiction otherwise. The axis is built recursively, starting from the candidates ranked in last position by one or more voters. Indeed, we have the following easy lemma.

Lemma 1 Let x be a candidate ranked in last position by a voter i. If the axis O is compatible with ≻i, then x is either in the leftmost or in the rightmost position in O.

Proof. If x is neither in the leftmost nor in the rightmost position, then there exist a candidate y on the left of x and a candidate z on the right of x (in O). But y ≻i x and z ≻i x, a contradiction with the fact that ≻i is single-peaked with respect to O.

As a consequence of Lemma 1, in a single-peaked consistent profile at most two candidates are ranked last by at least one voter. Before giving the algorithm, we first explain in detail the first (and easiest) iteration. Let L be the set of all candidates ranked last by at least one voter. We consider the three (exhaustive) possible cases:

• |L| ≥ 3: then P is not single-peaked consistent, due to Lemma 1.
• L = {x}: we place x indifferently either in the leftmost or in the rightmost position of the axis; this choice does not create any constraint in the remainder of the construction of the axis. Indeed, the problem is equivalent to first finding an axis compatible with the profile restricted to the other candidates, and then adding x.
• L = {x1, x2}: we place x1 and x2 in the leftmost and the rightmost positions of the axis. P is compatible with an axis O if and only if it is compatible with the inverse of O; as a consequence, the choice (x1 in the leftmost or rightmost position) does not matter.

Then, the candidates of L being positioned, we iterate the process, considering the restriction of the preference relations to the other candidates. Of course, this first iteration is simple because no other candidate is already positioned on the axis. More generally, at each step of the algorithm we have a set T of candidates already positioned at the extremal positions of the axis. Without loss of generality, let T = {x1, x2, ..., xi, xj, xj+1, ..., xn} be the set of candidates already positioned on the axis under construction: we have x1 > x2 > ... > xi in the leftmost positions of the axis O, and xj > xj+1 > ... > xn in the rightmost positions. The other candidates, forming T̄ = X \ T, will be positioned between xi and xj. Then, at this iteration:

• either we find a full compatible axis and P is single-peaked consistent;
• or we find a contradiction and P is not single-peaked consistent;
• or we position one or two new candidates to the right of xi and/or to the left of xj.

The soundness of the algorithm will follow from the recursive proof of the following hypothesis: at each iteration, the axis under construction satisfies the two following properties:

• There exists a compatible axis for P if and only if there exists a compatible axis which extends the axis under construction.
• For any voter k, x1 ≺k x2 ≺k ... ≺k xi and xj ≻k xj+1 ≻k ... ≻k xn.
In particular, from the second item we deduce that the candidates in T, xi and xj excepted, are not the peak of any voter. Let us now analyze the different possible configurations. Let L be the set of candidates ranked last by at least one voter (restricted to the candidates in T̄). Based on Lemma 1, we have 3 possible cases:

1. |L| ≥ 3: contradiction, since 3 candidates would have to be either in position i+1 or in position j−1.

2. L = {x, y}: either x is in position i+1 and y in position j−1, or vice versa, or we will find a contradiction. Let us consider a voter k who ranked x last (among the candidates in T̄):

(a) x ≺k xi and x ≺k xj: this is not possible, since necessarily xi or xj is ranked worse than x by k (xi or xj was the candidate ranked last by k at the previous iteration).

(b) xi ≺k x and xj ≺k x: x being the last candidate in T̄, and since x1 ≺k x2 ≺k ... ≺k xi and xj ≻k xj+1 ≻k ... ≻k xn, any axis compatible with voter k on T̄ will be compatible on all the candidates. Having positioned the first candidates does not create any constraint: indeed, all the candidates in T̄ are ranked better than all the candidates in T by voter k. As a consequence, for voter k, having x in position i+1 and y in position j−1, or vice versa, does not matter.

(c) xi ≺k x ≺k xj ≺k y: x is necessarily in position i+1. Indeed, having x in position j−1 leads to a contradiction: x would be positioned between y and xj on the axis, but x ≺k y and x ≺k xj. Then necessarily x is in position i+1 and y in position j−1. Symmetrically, if xj ≺k x ≺k xi ≺k y, then x is necessarily in position j−1.

(d) xi ≺k x ≺k y ≺k xj (or the symmetrical case): xj is necessarily the peak of voter k (the candidate positioned immediately to its left is worse, and the candidate xj+1 (if any) positioned immediately to its right is also worse, by our recursive hypothesis); hence the candidates in T̄ are necessarily positioned between positions i and j, following the increasing order of voter k. We test whether this axis is compatible with the preferences of the other voters. If so, we have a compatible axis; otherwise we conclude that P is not single-peaked consistent.

We repeat step 2 for all voters. If case 2d occurs (for at least one voter), then the algorithm ends (either we found an axis, or a contradiction). Otherwise, either we find a contradiction (x has to be placed in two different positions) and the algorithm stops, or we position candidates x and y on the axis. To conclude, note that if we are not in case 2d, the induction hypothesis x1 ≺k x2 ≺k ... ≺k xi and xj ≻k xj+1 ≻k ... ≻k xn remains true after positioning x and y (otherwise, in case 2d, the algorithm stops).

3. L = {x}, i.e., each voter ranked x last (in T̄). Several cases may occur for voter k:

(a) x ≺k xi and x ≺k xj: as previously, this case is impossible.
(b) xi ≺k x and xj ≺k x: no constraint.
(c) xi ≺k x ≺k xj (or the inverse): x is necessarily in position i+1.

Hence, if no contradiction is obtained and no compatible axis is found, we position one or two new candidates. Steps 2 and 3 are repeated until all the candidates are positioned or a contradiction occurs. The previous analysis enables us to state the following result:

Proposition 1 Let P be a preference profile. The previous algorithm outputs an axis compatible with P if one exists, and finds a contradiction otherwise.
Example 2 Let X = {x1, x2, x3, x4, x5, x6} and consider two voters with the following preferences: x6 ≺1 x5 ≺1 x4 ≺1 x1 ≺1 x3 ≺1 x2 and x1 ≺2 x6 ≺2 x5 ≺2 x2 ≺2 x3 ≺2 x4.

• Iteration 1: the set L of worst candidates is L = {x1, x6}. T being empty, we can choose the positions of x1 and x6, for instance respectively in the leftmost and rightmost positions. Partial axis: x1 > .... > x6.
• Iteration 2: T̄ = {x2, x3, x4, x5} and L = {x5}. For voter 1, x6 ≺1 x5 ≺1 x1, hence x5 is necessarily in the fifth position of the axis. For voter 2, x1 ≺2 x5 and x6 ≺2 x5, hence for voter 2 the positioning does not matter. Partial axis: x1 > ... > x5 > x6.
• Iteration 3: T̄ = {x2, x3, x4} and L = {x2, x4}. For voter 1, x5 ≺1 x4 ≺1 x1 ≺1 x2, hence x4 is necessarily in the fourth position, and therefore x2 in the second. For voter 2, x1 ≺2 x5 ≺2 x2 ≺2 x4, hence for her the positioning does not matter. Partial axis: x1 > x2 > . > x4 > x5 > x6.
• Iteration 4: T̄ = {x3}. We verify that, with x3 in the third position, the partial axis x2 > x3 > x4 is compatible with the two votes. Then, the axis x1 > x2 > x3 > x4 > x5 > x6 is compatible with the profile constituted by the preference relations of the two voters.

Example 3 Let us consider five candidates and two voters, with x1 ≺1 x2 ≺1 x3 ≺1 x4 ≺1 x5 and x4 ≺2 x3 ≺2 x2 ≺2 x1 ≺2 x5.

• Iteration 1: L = {x1, x4}: we choose x1 > ... > x4.
• Iteration 2: T̄ = {x2, x3, x5} with L = {x2, x3}. For voter 1, x1 ≺1 x2 ≺1 x3 ≺1 x4, hence x4 is necessarily the peak of voter 1. The only possible axis is consequently x1 > x2 > x3 > x5 > x4; it is not compatible with the preference relation of the second voter. This profile is not single-peaked consistent.

Example 4 Let us consider five candidates and two voters, with x1 ≺1 x2 ≺1 x3 ≺1 x4 ≺1 x5 and x4 ≺2 x2 ≺2 x3 ≺2 x1 ≺2 x5. Iteration 1 is as in Example 3. For iteration 2: T̄ = {x2, x3, x5} with L = {x2}. For voter 1, x1 ≺1 x2 ≺1 x4, hence x2 must be immediately to the right of x1. For voter 2, x4 ≺2 x2 ≺2 x1, hence x2 must be immediately to the left of x4. Contradiction: this profile is not single-peaked consistent. Example 4 thus shows that even a 2-voter profile may fail to be consistent.

We now analyze the running time of the algorithm. At each iteration, either we find a compatible axis, or a contradiction, or we position at least one new element. Assuming that each preference relation is given in decreasing order, we find the set L of worst candidates in time O(m). Then, for each voter we do O(1) comparisons. Step 2d can possibly take longer, since we test the compatibility of an axis with the preference relations of all voters. This step is done in time O(nm) (O(n) for each voter), but it occurs at most once during the algorithm. Then, as long as this step does not occur, we have T(n, m) ≤ T(n−1, m) + O(m). This sums up to T(n, m) = O(nm), and the possible execution of step 2d still leads to T(n, m) = O(nm). Therefore:

Proposition 2 The single-peaked consistency problem can be solved in time O(nm).

Proposition 2 improves on the O(mn²) algorithm given in [2] and is established by a completely different method. Interestingly, the algorithm in [9] for computing a tree with respect to which the profile is single-peaked has similarities with ours. However, not only does it work in O(mn²), but it is designed to find a tree and does not guarantee to output an axis when one exists.
Of course, there may exist several axes compatible with a given profile (the number of such axes is the topic of the next section), and given a profile, one might be interested in finding all the axes compatible with it. It is easy to see that the method we proposed can be adapted to find all the axes compatible with a profile P: indeed, it suffices to keep, in steps 2b and 3b, all the different possibilities when several choices are possible. As we will see in the next section, there can be an exponential number of compatible axes, hence of course the running time cannot be polynomially bounded.
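As a companion to the O(mn) algorithm, the following exhaustive sketch (our own illustration, built on the is_single_peaked function given in Section 2; it enumerates all n! axes and is therefore only usable as a reference oracle on small instances) computes all compatible axes and reproduces the verdict of Example 3.

```python
from itertools import permutations

def compatible_axes(profile, candidates):
    """All axes O such that every relation in the profile is single-peaked
    w.r.t. O. Brute force: a testing oracle, not the O(mn) algorithm."""
    return [axis for axis in permutations(candidates)
            if all(is_single_peaked(pref, list(axis)) for pref in profile)]

# Example 3: x1 <1 x2 <1 x3 <1 x4 <1 x5 and x4 <2 x3 <2 x2 <2 x1 <2 x5
cands = ["x1", "x2", "x3", "x4", "x5"]
pref1 = ["x5", "x4", "x3", "x2", "x1"]   # voter 1, best to worst
pref2 = ["x5", "x1", "x2", "x3", "x4"]   # voter 2, best to worst
print(len(compatible_axes([pref1, pref2], cands)))   # 0: not consistent
```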
Example 5 Let us consider 7 candidates and two voters, with:
x4 ≺1 x3 ≺1 x5 ≺1 x6 ≺1 x2 ≺1 x1 ≺1 x7 and x5 ≺2 x6 ≺2 x4 ≺2 x3 ≺2 x2 ≺2 x7 ≺2 x1.
The modified algorithm gives the 8 compatible axes:
x4 > x3 > x2 > x1 > x7 > x6 > x5 and its reverse x5 > x6 > x7 > x1 > x2 > x3 > x4
x4 > x3 > x2 > x7 > x1 > x6 > x5 and its reverse x5 > x6 > x1 > x7 > x2 > x3 > x4
x4 > x3 > x1 > x7 > x2 > x6 > x5 and its reverse x5 > x6 > x2 > x7 > x1 > x3 > x4
x4 > x3 > x7 > x1 > x2 > x6 > x5 and its reverse x5 > x6 > x2 > x1 > x7 > x3 > x4

4 On the number of axes compatible with a profile

In Section 3, we proposed an algorithm for computing an axis compatible with a given profile, but such an axis is not necessarily unique. It is now worth giving bounds on the number of axes compatible with a given profile, as well as the prior probability that a profile is single-peaked consistent. As mentioned earlier, this set of compatible axes may be of some interest when new voters give their preferences. Obviously, the more compatible axes we have, the more likely the new profile is single-peaked consistent. On the other hand, the existence of several compatible axes may be considered as a drawback, for instance if our goal is to learn some structural information about the candidates.

In this section, we focus on the minimum and maximum numbers of axes that are compatible with a set of k distinct votes on n candidates. Let q(k, n) and Q(k, n) be these respective numbers. To begin with, remark that if P is compatible with O then P is compatible with the inverse of O (denoted by O⁻¹). Moreover, of course, the more voters (or candidates) there are, the fewer compatible axes. Hence, q and Q are even and non-increasing with k and n. First, let us deal with the case of a single axis.

Lemma 2 |SP(O)| = 2^(n−1)

Proof. Let O = x1 > x2 > ... > xn and ≻ ∈ SP(O). ≻ is fully determined by (a) its peak x_i and (b) the positions of x1, ..., x_(i−1) in the remaining n − 1 positions. Indeed, we know that xj ≻ xk for xk < xj < x* and for x* < xj < xk (where < refers to the axis O and x* denotes the peak), hence (a) and (b) suffice to describe ≻. There are C(n−1, i−1) possible positionings for x1, ..., x_(i−1) (where C denotes the binomial coefficient), therefore C(n−1, i−1) preference relations in SP(O) whose peak is x_i. To get the cardinality of SP(O), we sum C(n−1, i−1) over i, which gives 2^(n−1).

By symmetry considerations, we obtain that there exist 2^(n−1) axes compatible with a given preference relation. Hence, q(1, n) = Q(1, n) = 2^(n−1). We also know (cf. Example 4 without x5) that q(2, 4) = 0; therefore, for every k ≥ 2 and n ≥ 4 we have q(k, n) = 0. The only missing case is q(2, 3), which can easily be shown to be equal to 2.

The case of Q(k, n) is more interesting. We already know that Q(1, n) = 2^(n−1), and, by Lemma 2, Q(k, n) = 0 for k > 2^(n−1).

We now show that the maximum number of compatible axes is globally inversely proportional to the number of distinct votes. More precisely, Q(k, n) = 2^n/k when k = 2^j, 1 ≤ j ≤ n − 1 (Proposition 3). This gives bounds on Q(k, n) for the other values of k. We first show this result for k = 2^(n−1) (Lemma 3), and then some relations between the values of Q(k, n) when n and/or k change (Lemmas 4 and 5).

Lemma 3 Q(2^(n−1), n) = 2

Proof (sketch). Let O = x1 > x2 > ... > xn. Let us focus on the set of axes compatible with the 2^(n−1) preference relations (see Lemma 2) in SP(O). Let xi, xj with xi >O xj. The relation R: xj ≻ x_(j+1) ≻ ... ≻ xn ≻ x_(j−1) ≻ ... ≻ xi ≻ ... ≻ x1 is compatible with O. Any axis O′ such that xj >O′ xi >O′ xn is not compatible with R. Therefore, O is the only axis compatible with SP(O) whose rightmost element is xn. By symmetry, O⁻¹ is the only one whose rightmost element is x1. The result follows from Lemma 1.

Lemma 4 For all k, n ≥ 1, Q(k, n + 1) ≥ 2Q(k, n)

Proof. Consider a profile P of k preference relations on n candidates that are compatible with Q(k, n) axes. We extend these k relations to n + 1 candidates by positioning the new candidate x_(n+1) last in all relations. For each of the Q(k, n) axes compatible with the initial k relations, we can add x_(n+1) either as the leftmost or as the rightmost element. Therefore we obtain 2Q(k, n) distinct axes compatible with k distinct preference relations. Thus, Q(k, n + 1) ≥ 2Q(k, n).

Lemma 5 (Proof omitted) For all n ≥ 2 and all k: Q(k, n + 1) ≤ max{Q(⌈k/2⌉, n), 2Q(k, n)}.

Proposition 3 For all n ≥ 2 and all j ∈ [1, n − 1]: Q(2^j, n) = 2^(n−j)

Proof (sketch). Let j be between 1 and n − 1. By Lemma 3, Q(2^j, j + 1) = 2. Thanks to Lemma 4, we get Q(2^j, n) ≥ 2^(n−j). Using Lemma 5, we can show that it is in fact an equality.

In particular, we get that for each k between 2 and 2^(n−1), 2^(n−1)/k < Q(k, n) < 2^(n+1)/k (or, if we want tighter bounds: 2^(n−log2(k)−1) < Q(k, n) ≤ 2^(n−log2(k))).

Lemma 2 enables us to give an approximation of the probability that a randomly generated k-voter, n-candidate profile is single-peaked consistent. Suppose P is drawn randomly with uniform probability: for each voter i, the probability that a given preference relation R is the preference relation of voter i is 1/n!, the preference relations of two different voters being independent; therefore each possible profile has a probability of (1/n!)^k. From Lemma 2 we get that, given an axis O and a preference relation R, the probability that R ∈ SP(O) is 2^(n−1)/n!. Now, the probability that a k-voter profile is compatible with a fixed axis O is (2^(n−1)/n!)^k = 2^(k(n−1))/n!^k. This implies that the probability that a k-voter profile on n candidates is single-peaked consistent is smaller than (n!/2) · 2^(k(n−1))/n!^k = 2^(k(n−1))/(2 · n!^(k−1)). (The exact probability is of course lower than that, but gets asymptotically close to this upper bound when the number of voters grows.) Therefore, the probability of single-peaked consistency decreases exponentially both with the number of voters and with the number of candidates. Of course, this computation relies on the assumption that the preference relations of the voters are independent, which is arguably not very realistic; positive correlations between preference relations make the probability of single-peaked consistency decrease more slowly. Finally, note that the probability of single-peaked consistency is lower than the probability of non-occurrence of the Condorcet paradox, which has received much more attention (see e.g. [6]).
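As a quick empirical sanity check of Lemmas 2 and 3, the brute-force helpers sketched at the end of Section 3 can enumerate SP(O) and the axes compatible with it for small n. The snippet below is a toy verification under those assumptions, not part of the proofs.

```python
from itertools import permutations
# reuses is_single_peaked and compatible_axes from the sketch in Section 3

def SP(axis):
    """All preference relations (least preferred first) single-peaked w.r.t. axis."""
    return [list(p) for p in permutations(axis)
            if is_single_peaked(list(p), list(axis))]

for n in range(2, 6):
    sp = SP(tuple(range(n)))
    assert len(sp) == 2 ** (n - 1)          # Lemma 2: |SP(O)| = 2^(n-1)
    assert len(compatible_axes(sp)) == 2    # Lemma 3: only O and its reverse
```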
5 Communication complexity of the aggregation of single-peaked preferences
We end this paper with a short additional result on the communication complexity of the aggregation of single-peaked preferences. As said in Section 1, the restriction to single-peaked profiles allows for escaping the usual impossibility theorems, which means that there exist natural and satisfactory voting rules and aggregation functions under single-peakedness. First, it is well-known that, if the number of voters is odd (which we will now assume for the sake of simplicity), then the median of the peaks is the Condorcet winner, and the pairwise majority aggregation of a profile P, defined, for all x, y ∈ X, by x ≻*P y if and only if |{k | x ≻k y}| > m/2, is a linear order.

We are now interested in the communication complexity of the median voting rule and of pairwise majority aggregation for single-peaked profiles. The deterministic communication complexity of a function is the minimal quantity of information (measured in number of bits) used by a protocol that computes it. One can find a study of the communication complexity of several voting rules (without the single-peakedness restriction) in [5]. In this section, we assume that the axis O is given (and is common knowledge to all voters).

Obviously, the deterministic communication complexity of the median of peaks for single-peaked profiles is at most m⌈log n⌉, since the median of peaks can simply be computed by asking the voters to name their peak, which needs ⌈log n⌉ bits per voter. The lower bound is less obvious. It can be obtained by taking the same fooling set as in the proof of Theorem 3 in [5], and taking an axis whose median is a. This leads to the following result:

Proposition 4 The deterministic communication complexity of the median of peaks is O(m log n) and Ω(m log n). (Actually, the same bounds would hold for the nondeterministic communication complexity; see [5].)

The (deterministic) communication complexity of pairwise majority aggregation is a little less obvious but still very simple:

Proposition 5 The deterministic communication complexity of pairwise majority aggregation for single-peaked profiles is at most 2m⌈log n⌉ + 2m(n − 2).

The proof uses a protocol very similar to the one used in [4] for the elicitation of the single-peaked preferences of a voter. We start by determining the median of peaks, which needs m⌈log n⌉ bits (see above). Then we communicate the result to each voter (which requires again m⌈log n⌉ bits). After this, the voters are asked n − 2 successive pairwise comparisons, according to the following protocol, presented informally on an example. Suppose the median of peaks is x3 (the axis being x1 < x2 < x3 < x4 < ...). We set rank(x3) = 1, and we ask each voter her preference between x2 and x4. If there is a majority for x2, then x2 is the second "socially preferred candidate" and we set rank(x2) = 2. Then, we ask each voter her preference between x1 and x4, and so on. Each of these steps requires the central authority (CE) to send to each voter the information enabling her to know the two candidates she has to compare. For this, CE does not have to send the identities of the two candidates (which would require 2⌈log n⌉ bits) but only one bit, indicating whether the winner of the previous step is the "right" candidate or the "left" one (for instance, after the voters have been asked their preferences between x2 and x4, if there is a majority for x4 then CE sends the information "right" to the voters, who now know that the next comparison is between x2 and x5). Each voter sends her answer back to CE, which requires one bit per voter. Hence each iteration requires 2m bits. There are exactly n − 2 iterations, hence the protocol requires the communication of 2m⌈log n⌉ + 2m(n − 2) bits. Finally, we see easily that x ≻*P y if and only if rank(x) < rank(y), hence the protocol computes ≻*P.
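To make the bookkeeping concrete, here is a toy, centralized simulation of this protocol. The function name, the representation of voters as full rankings (most preferred first), and the uniform 2m-bit charge per comparison are our own illustrative choices; iterations where one side of the axis is exhausted are trivial and need no communication, so the tally stays within the stated bound.

```python
from math import ceil, log2

def pairwise_majority(axis, voters):
    """axis: candidates left to right; voters: rankings (most preferred first),
    each single-peaked w.r.t. axis; the number of voters m is assumed odd.
    Returns the social ranks and the number of bits exchanged."""
    n, m = len(axis), len(voters)
    bits = 2 * m * ceil(log2(n))          # peaks sent in, median broadcast back
    peaks = sorted(axis.index(v[0]) for v in voters)
    p = peaks[m // 2]                     # median peak = Condorcet winner
    rank = {axis[p]: 1}
    left, right, nxt = p - 1, p + 1, 2
    for _ in range(n - 2):                # n-2 successive pairwise comparisons
        if left < 0:                      # one side exhausted: no vote needed
            w = right
        elif right >= n:
            w = left
        else:                             # 1 direction bit + 1 answer bit per voter
            bits += 2 * m
            pro_left = sum(v.index(axis[left]) < v.index(axis[right]) for v in voters)
            w = left if pro_left > m // 2 else right
        rank[axis[w]] = nxt
        nxt += 1
        if w == left:
            left -= 1
        else:
            right += 1
    rank[axis[left] if left >= 0 else axis[right]] = n   # the last candidate
    return rank, bits

axis = ['x1', 'x2', 'x3', 'x4', 'x5']
voters = [['x3', 'x2', 'x4', 'x1', 'x5'],
          ['x4', 'x3', 'x5', 'x2', 'x1'],
          ['x2', 'x3', 'x1', 'x4', 'x5']]
print(pairwise_majority(axis, voters))   # ({'x3':1,'x2':2,'x4':3,'x1':4,'x5':5}, 36)
```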
6 Discussion
In this article we have studied some combinatorial and algorithmic aspects of reasoning with single-peaked preferences. The main contribution is an algorithm that outputs an axis compatible with a profile (when there is one) in time O(mn). We have identified the minimal and maximal numbers of axes that are simultaneously compatible with a profile (which, as a byproduct, gives an approximation of the probability of single-peaked consistency of a randomly generated profile). As a side result, we have given some simple results on the communication complexity of the aggregation of single-peaked preferences.

This work calls for further research in several directions. In particular, as said in Section 4, the probability that a profile is single-peaked decreases dramatically with the number of voters and the number of candidates. However, in many practical cases, even if not stricto sensu single-peaked, the profile can be close (with respect to some metric) to being so. For instance, in a nation-wide political election, given the very high number of voters, the profile is surely not single-peaked; it may nevertheless be approximately single-peaked. To make this precise, we need to define formal notions of "approximate single-peakedness", which are meant to measure how far a profile is from being single-peaked. Several definitions seem natural, such as (1) the minimum number of voters whose deletion gives a single-peaked profile, (2) the minimum number of candidates whose deletion gives a single-peaked profile, or (3) the minimum number of axes such that each preference relation of the profile is single-peaked with respect to at least one of them. Computing these measures of single-peakedness leads to very interesting computational problems, for which our algorithm of Section 3 can be the starting point. For instance, for (1) and (2), we can design a branch-and-bound algorithm that generalizes our algorithm. As for (3), we can modify our algorithm to produce a set of axes which covers the whole profile (i.e. such that each preference relation of the profile is compatible with at least one axis).
ACKNOWLEDGEMENTS The authors are grateful to the Project ANR-05-BLAN-0384 for its financial support.
REFERENCES
[1] K.J. Arrow, Social choice and individual values, J. Wiley, New York, 1951. 2nd edition, 1963.
[2] J. Bartholdi and M. Trick, 'Stable matching with preferences derived from a psychological model', Operations Research Letters, 5(4), 165–169, (1986).
[3] Marquis de Condorcet, Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix, Imprimerie Royale, Paris, 1785.
[4] V. Conitzer, 'Eliciting single-peaked preferences using comparison queries', in Proceedings of AAMAS-07, pp. 408–415, (2007).
[5] V. Conitzer and T. Sandholm, 'Communication complexity of common voting rules', in Proceedings of EC-05, pp. 78–87, (2005).
[6] W. Gehrlein, 'Condorcet's paradox and the likelihood of its occurrence: different perspectives on balanced preferences', Theory and Decision, 52(2), 171–199, (2002).
[7] A. Gibbard, 'Manipulation of voting schemes: A general result', Econometrica, 41, 587–601, (1973).
[8] M.A. Satterthwaite, 'Strategy proofness and Arrow's conditions: Existence and correspondence theorems for voting procedures and social welfare functions', Journal of Economic Theory, 10, 187–217, (1975).
[9] M. Trick, 'Recognizing single-peaked preferences on a tree', Mathematical Social Sciences, 17(1), 329–334, (1989).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-371
Belief Revision through Forgetting Conditionals in Conditional Probabilistic Logic Programs Anbu Yue 1 and Weiru Liu 1 Abstract. In this paper, we present a strategy for revising a conditional probabilistic logic program (PLP) when new information is received (in the form of a probabilistic formula), through the technique of variable forgetting. We first extend the traditional forgetting method to forget a conditional event in PLPs. We then propose two revision operators to revise a PLP based on our forgetting method. By revision through forgetting, the irrelevant knowledge in the original PLP is retained according to the minimal change principle. We prove that our revision operators satisfy most of the postulates for probabilistic belief revision. A main advantage of our revision operators is that a new PLP is explicitly obtained after revision, since our revision operators perform forgetting of a conditional event at the syntax level.
1 Introduction
Belief revision is concerned with how to revise an agent's current beliefs when new evidence is received, where this new evidence is assumed to have the highest priority. Any belief in the current belief set that is inconsistent with the evidence has to be weakened or omitted in order to get a revised consistent set of beliefs.

In the literature on probabilistic belief revision, most research focuses on revising a single probability distribution [5, 1, 8, 4, 3]. However, a single probability distribution is not suitable for representing imprecise probabilistic beliefs, as is the case for a conditional probabilistic logic program (PLP), where a set of probability distributions is usually associated with a PLP [13, 14]. Research on revising a set of probability distributions is reported in [16, 7], but these methods (as well as methods for revising single probability distributions) can only revise probability distributions by a certain kind of evidence, i.e., evidence that is consistent with the original distributions. Therefore, any evidence that is not fully consistent with current knowledge (beliefs) cannot be used.

The notion of forgetting (facts) (also referred to as variable forgetting) proposed in [12] has been applied (or adapted) in many logic-based reasoning techniques. For example, forgetting is used for belief merging in [11], and the relationship between forgetting and belief change is studied in [15]. Traditionally, the main focus has been on forgetting a fact in classical logics. The issue of forgetting conditional knowledge has not been investigated, whilst conditional knowledge is very important, especially in research on (logical) reasoning with conditionals [13, 14]. In this paper, we extend the method of forgetting to forget conditional events in conditional probabilistic logic programs (PLPs).
School of Electronics, Electrical Engineering and Computer Science, Queen’s University of Belfast, Belfast BT7 1NN, UK {a.yue, w.liu}@qub.ac.uk
Given a PLP P, forgetting a conditional event (ψ|φ) in P means that the fact ψ is forgotten only in the domain defined by φ. Assume that (ψ′|φ′)[l′, u′] ∈ P; the challenge is how to retain part or all of the knowledge (ψ′|φ′)[l′, u′] when φ and φ′ are inequivalent. To achieve this, we define a notion of irrelevance for conditional events, so that forgetting a conditional event will retain any irrelevant knowledge. Since any classical theory T can be represented by a PLP [13], we prove that forgetting a fact ψ in a classical theory T is equivalent to forgetting the conditional event (ψ|⊤) in the PLP that represents T.

Based on the technique of forgetting a conditional event, we propose two operators for revising PLPs by a probabilistic formula of the form (ψ|φ)[l, u]. Our revision operators satisfy most of the postulates for imprecise probabilistic belief revision. These postulates were proposed in [18] and were proved to be an extension of the Darwiche and Pearl postulates [2], Bayesian conditioning and Jeffrey's rule. Since any conditional event can be forgotten in a PLP, our revision operators do not require new evidence (information) to be consistent with the original PLP. Another advantage of these revision operators is that a new PLP is explicitly obtained as the result of revision, since forgetting a conditional event is defined at the syntax level. This is in contrast to the traditional probabilistic revision mentioned above, where a revision result is a single or a set of probability distributions (which can be seen as the models of a probabilistic knowledge base, e.g., a PLP).

This paper is organized as follows. In the next section, we briefly review probabilistic logic programming, postulates for probabilistic belief revision, and forgetting. In Section 3, we propose an approach to forgetting a conditional event in a PLP, and in Section 4, we propose two belief revision operators and give their properties. After comparing with related work in Section 5, we conclude this paper.
2 Preliminaries

2.1 Probabilistic logic programs (PLPs)
We briefly review conditional probabilistic logic programs here; see [13, 14] for details. Let Φ be a finite set of predicate symbols and constant symbols, V be a set of object variables, and B be a set of bound constants, which are values in [0,1] describing bounds of probabilities. It is required that Φ contains at least one constant symbol. We use lowercase letters a, b, ... for constants from Φ, uppercase letters X, Y for object variables, and l, u for bound constants. In Φ, there are two predicate symbols ⊤ and ⊥ which represent true and false respectively. An object term is a constant from Φ or an object variable from V. An atom is of the form p(t1, ..., tk), where p is a predicate symbol and ti is an object term. An event or formula is constructed from
a set of atoms by the logical connectives ∧, ∨, ¬ as usual, and a conditional event is of the form ψ|ϕ with events ψ and ϕ. We use Greek letters φ, ψ, ϕ for events, and α, β for conditional events. A probabilistic formula is of the form (ψ|ϕ)[l, u], which means that the probability bounds for the conditional event ψ|ϕ are l and u. We call ψ its consequent and ϕ its antecedent. A conditional probabilistic logic program (PLP) P is a set of probabilistic formulae. We use PL to denote the set of all PLPs, and F to denote the set of all probabilistic formulas. An object term, event, conditional event, probabilistic formula, or PLP is called ground iff it does not contain any object variables from V. The Herbrand universe (denoted HBΦ's counterpart HUΦ) is the set of all constants from Φ, and the Herbrand base HBΦ is a finite nonempty set of all ground atoms constructed from the predicate symbols in Φ and the constants in HUΦ. A possible world I is a subset of HBΦ s.t. ⊤ ∈ I and ⊥ ∉ I, and IΦ is the set of all possible worlds over Φ. An assignment σ maps each object variable to an element of HUΦ. It is extended to object terms by σ(c) = c for all constant symbols from Φ. That an event ϕ is satisfied by I under σ, denoted I |=σ ϕ, is defined inductively as:
• I |=σ p(t1, ..., tn) iff p(σ(t1), ..., σ(tn)) ∈ I;
• I |=σ φ1 ∧ φ2 iff I |=σ φ1 and I |=σ φ2;
• I |=σ φ1 ∨ φ2 iff I |=σ φ1 or I |=σ φ2;
• I |=σ ¬φ iff I ⊭σ φ.
An event ϕ is satisfied by a possible world I, or I is a model of ϕ, denoted I |=cl ϕ, iff I |=σ ϕ for all assignments σ. In this paper, we call the set of the models of ϕ the domain of ϕ. An event ϕ is a logical consequence of an event φ, denoted φ |=cl ϕ, iff all possible worlds that satisfy φ also satisfy ϕ.

A probabilistic interpretation Pr is a probability distribution on IΦ (i.e., as IΦ is finite, Pr is a mapping from IΦ to the unit interval [0,1] such that Σ_{I ∈ IΦ} Pr(I) = 1). The probability of an event ϕ in Pr under an assignment σ is defined as Prσ(ϕ) = Σ_{I ∈ IΦ, I |=σ ϕ} Pr(I). If ϕ is ground, we simply write Pr(ϕ). A probabilistic formula (ψ|ϕ)[l, u] is satisfied by a probabilistic interpretation Pr under an assignment σ, denoted Pr |=σ (ψ|ϕ)[l, u], iff Prσ(ϕ) = 0 or Prσ(ψ|ϕ) ∈ [l, u]. A probabilistic formula μ is satisfied by a probabilistic interpretation Pr, or Pr is a probabilistic model of μ, denoted Pr |= μ, iff Pr |=σ μ for all assignments σ. A probabilistic interpretation is a probabilistic model of a PLP P, denoted Pr |= P, iff Pr is a probabilistic model of all μ ∈ P. A PLP P is satisfiable or consistent iff a model of P exists.

A probabilistic formula (ψ|ϕ)[l, u] is a consequence of the PLP P, denoted P |= (ψ|ϕ)[l, u], iff all probabilistic models of P are also probabilistic models of (ψ|ϕ)[l, u]. A probabilistic formula (ψ|ϕ)[l, u] is a tight consequence of P, denoted P |=tight (ψ|ϕ)[l, u], iff P |= (ψ|ϕ)[l, u], and P ⊭ (ψ|ϕ)[l′, u] and P ⊭ (ψ|ϕ)[l, u′] for all l′ > l and u′ < u (l′, u′ ∈ [0, 1]). Notice that, if P |= (φ|⊤)[0, 0], then it is canonically defined that P |=tight (ψ|φ)[1, 0], where [1, 0] stands for the empty set.
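To make the semantics above concrete, here is a toy evaluator, with possible worlds encoded as frozensets of ground atoms and events as Python predicates; this encoding and all names are illustrative assumptions, not part of the formalism.

```python
def prob(Pr, event):
    """Probability of a ground event; Pr maps possible worlds to masses."""
    return sum(mass for world, mass in Pr.items() if event(world))

def satisfies(Pr, psi, phi, l, u):
    """Pr |= (psi|phi)[l, u]: vacuous if Pr(phi) = 0, else l <= Pr(psi|phi) <= u."""
    p_phi = prob(Pr, phi)
    if p_phi == 0:
        return True
    return l <= prob(Pr, lambda w: psi(w) and phi(w)) / p_phi <= u

# two possible worlds: a flying bird and a non-flying bird
Pr = {frozenset({'bird', 'fly'}): 0.99, frozenset({'bird'}): 0.01}
bird = lambda w: 'bird' in w
fly = lambda w: 'fly' in w
print(satisfies(Pr, fly, bird, 0.98, 1))   # True: Pr(fly|bird) = 0.99
```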
2.2 Probabilistic belief revision
We briefly review the postulates for revising PLPs here; see [18] for details. Given a PLP P, we define the set Bel0(P) as Bel0(P) = {(ψ|φ)[l, u] | P |= (ψ|φ)[l, u], P ⊭ (φ|⊤)[0, 0]} and call it the belief set of P. The condition P ⊭ (φ|⊤)[0, 0] is required because when P |= (φ|⊤)[0, 0], P |= (ψ|φ)[l, u] for all ψ and all [l, u] ⊆ [0, 1]. Without this condition, some counterintuitive conclusions could be inferred; for instance, (ψ|φ)[0, 0.3] and (ψ|φ)[0.9, 1] could simultaneously be beliefs of an agent if P |= (φ|⊤)[0, 0].
Each probabilistic epistemic state Ψ has a unique belief set, denoted Bel0(Ψ), which is a set of probabilistic formulae. Bel0(Ψ) is closed, i.e. Bel0(Bel0(Ψ)) = Bel0(Ψ). We call Ψ a probabilistic epistemic state of a PLP P iff Bel0(Ψ) = Bel0(P). In general, there exist many ways to define a probabilistic epistemic state; e.g., we can define a probabilistic epistemic state as the set of probability distributions that satisfy the PLP, see [18] for details. Furthermore, we have the following inference relations: Ψ |= (ψ|φ)[l, u] iff (ψ|φ)[l, u] ∈ Bel0(Ψ), and Ψ |=tight (ψ|φ)[l, u] iff Ψ |= (ψ|φ)[l, u] and for all [l′, u′] ⊂ [l, u], Ψ ⊭ (ψ|φ)[l′, u′]. We write Ψ ∧ (ψ|φ)[l, u] to represent Bel0(Ψ) ∪ {(ψ|φ)[l, u]}. Also, Ψ |= (ψ|φ)[l, u] iff P |= (ψ|φ)[l, u] when P ⊭ (φ|⊤)[0, 0].

Definition 1 A conditional event (ψ|φ) is more specific than another conditional event (ψ′|φ′), denoted (ψ|φ) ⊑ (ψ′|φ′), iff
• φ |=cl φ′ ∧ ψ′, or
• φ |=cl φ′ ∧ ¬ψ′.

A conditional event (ψ|φ) affects only the relationship (probability distributions) between φ ∧ ψ and φ ∧ ¬ψ. When (ψ|φ) ⊑ (ψ′|φ′) holds, (ψ|φ) provides detailed information about φ, which is a sub-event of φ′ ∧ ψ′ or of φ′ ∧ ¬ψ′. Therefore, (ψ|φ) is more specific than (ψ′|φ′).

Definition 2 (perpendicular) A conditional event (ψ|φ) is perpendicular to another conditional event (ψ′|φ′), denoted (ψ|φ) ⫫ (ψ′|φ′), iff (ψ|φ) ⊑ (ψ′|φ′), or (ψ′|φ′) ⊑ (ψ|φ), or |=cl ¬(φ′ ∧ φ).

The perpendicularity relation formalizes a kind of irrelevance between two conditional events. The above definition is an extension of the definition of perpendicularity in [9], in which the first condition is not required. If (ψ|φ) ⊑ (ψ′|φ′), then (ψ|φ) is more specific than (ψ′|φ′) and thus (ψ|φ) will not affect (ψ′|φ′). We know that (ψ|φ) cannot affect the probability distributions within the domain of (ψ ∧ φ) or the domain of (¬ψ ∧ φ), so if (ψ′|φ′) ⊑ (ψ|φ), then φ′ is a sub-event of (ψ ∧ φ) or of (¬ψ ∧ φ), and therefore (ψ|φ) cannot affect (ψ′|φ′). If |=cl ¬(φ′ ∧ φ), then φ and φ′ have disjoint domains, so (ψ|φ) and (ψ′|φ′) are irrelevant to each other.

Definition 3 ([18]) Let P be a PLP with epistemic state Ψ and μ = (ψ|φ)[l, u] be a probabilistic formula. The result of revising P by μ is another probabilistic epistemic state, denoted Ψ ◦ μ, where ◦ is a revision operator. Operator ◦ is required to satisfy the following postulates:
R*1 Ψ ◦ μ |= μ
R*2 Ψ ∧ μ |= Ψ ◦ μ
R*3 if Ψ ∧ μ is satisfiable, then Ψ ◦ μ |= Ψ ∧ μ
R*4 Ψ ◦ μ is unsatisfiable only if μ is unsatisfiable
R*5 Ψ ◦ μ ≡ Ψ ◦ μ′ if μ ≡ μ′
R*6 Let μ = (ψ|φ)[l, u] and Ψ ◦ μ |=tight (ψ|φ)[l′, u′]. Let μ1 = (ψ|φ)[l1, u1] and Ψ ◦ μ1 |=tight (ψ|φ)[l1′, u1′]. For any ε > 0, if |u1 − u| + |l1 − l| < ε, and both (ψ|φ)[l, u] and (ψ|φ)[l′, u′] are satisfiable, then |u1′ − u′| + |l1′ − l′| < ε.
R*7 if Ψ |= (φ|⊤)[l, u], then (Ψ ◦ μ) |= (φ|⊤)[l, u]
R*8 for all ψ′ and φ′, if (ψ|φ) ⫫ (ψ′|φ′) and Ψ ◦ μ |= (ψ′|φ′)[l, u] then Ψ |= (ψ′|φ′)[l, u].
R*1–R*5 are analogues of postulates R1–R4 in [2]. We do not have postulates corresponding to R5 and R6 in [2], since revision with conjunctions of conditional events is more complicated and is beyond the scope of this paper. R*6 is a sensitivity requirement, which says that a slight modification of the bounds of μ = (ψ|φ)[l, u] (i.e., μ1 = (ψ|φ)[l1, u1]) shall not affect the result of revision significantly. R*7 says that revising by μ = (ψ|φ)[l, u] should not affect the statements about φ (but the impreciseness of φ may be decreased). Recall that the perpendicularity condition characterizes a kind of irrelevance; R*8 says that any knowledge irrelevant to the new evidence should not be affected by the revision with this evidence. It is proved that these postulates are an extension of the modified AGM postulates and the Darwiche and Pearl postulates for iterated revision [2]. It is also proved that these postulates lead to Jeffrey's rule and Bayesian conditioning when the original PLP (probabilistic epistemic state) defines a single probability distribution.
2.3 Forgetting a fact
Given a set of ground formulas T and an atom p, forgetting p in T means obtaining another set of formulas which is weaker than T, but which retains the same conclusions that are irrelevant to p.

Let p(t) be a ground atom, and I1, I2 be two possible worlds. Define I1 ≈p(t) I2 iff I1 and I2 agree on everything except possibly on the truth value of p(t):
1. I1 and I2 have the same domain, i.e. I1 and I2 are defined on the same Herbrand base.
2. for every predicate symbol q that differs from p, and for every ground term t′, q(t′) ∈ I1 iff q(t′) ∈ I2.

Definition 4 ([12]) Let T be a set of formulae and p(t) be a ground atom. The result of forgetting p(t), denoted T′ = forget_cl(T, p(t)), is a set of formulae such that, for any possible world I′, I′ is a model of T′ iff there is a model I of T such that I ≈p(t) I′.

Proposition 1 ([12]) For any theory T and ground atom p(t), T |= forget_cl(T, p(t)).

Let ϕ be a ground formula and p(t) be a ground atom. We use ϕ⁺p(t) (resp. ϕ⁻p(t)) to denote the result of replacing every occurrence of p(t) in ϕ by ⊤ (resp. ⊥).

Proposition 2 ([12]) Let ϕ be a ground formula and p(t) be a ground atom. Suppose that the theory T = {ϕ}; then forget_cl(T, p(t)) ≡ {ϕ⁺p(t) ∨ ϕ⁻p(t)}.

Let p1(t1), ..., pn(tn) be a sequence of ground atoms. The result of forgetting p1(t1), ..., pn(tn) in T, denoted forget_cl(T, p1(t1), ..., pn(tn)), is inductively defined as forget_cl(forget_cl(T, p1(t1), ..., p(n−1)(t(n−1))), pn(tn)).

Proposition 3 ([12]) For any theory T and any ground atoms p1(t1), p2(t2), forget_cl(forget_cl(T, p1(t1)), p2(t2)) and forget_cl(forget_cl(T, p2(t2)), p1(t1)) are logically equivalent.

The above proposition indicates that the order of the sequence p1(t1), ..., pn(tn) is not important in forget_cl(T, p1(t1), ..., pn(tn)). In this paper, we write forget_cl(T, A) to represent forget_cl(T, p1(t1), ..., pn(tn)), where A = {p1(t1), ..., pn(tn)}. We also write forget_cl(T, φ) to represent forget_cl(T, Aφ), where Aφ is the set of atoms that appear in φ.
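Proposition 2 makes classical forgetting directly executable in the propositional case: forget(T, p) is equivalent to T[p/⊤] ∨ T[p/⊥]. The following sketch, assuming sympy is available, illustrates this; the function name and example formula are our own.

```python
from sympy import symbols, Or, And, Not, simplify_logic, true, false

def forget(formula, atom):
    # replace every occurrence of `atom` by true, resp. false, and disjoin
    return simplify_logic(Or(formula.subs(atom, true),
                             formula.subs(atom, false)))

p, q, r = symbols('p q r')
T = And(Or(Not(p), q), Or(p, r))     # (¬p ∨ q) ∧ (p ∨ r)
print(forget(T, p))                  # q | r : the p-free consequences survive
```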
3 Forgetting a Conditional Event
Sometimes, forgetting a fact under certain conditions is useful, for example, forgetting fact ψ when φ is given. To achieve this, we provide an approach to forgetting a conditional event (ψ|φ), which means forgetting ψ only in the domain of φ, and keeping the original knowledge that is outside the domain of φ unchanged.

Definition 5 Let [l, u] and [l′, u′] be two intervals. The closest subinterval of [l′, u′] to [l, u], denoted clb([l′, u′], [l, u]), is defined by clb([l′, u′], [l, u]) = [lb, ub], where
• if u′ < l then lb = ub = u′,
• if l′ > u then lb = ub = l′,
• otherwise, lb = max{l, l′}, ub = min{u, u′}.

Definition 6 Let P be a PLP and μ ∈ P where μ = (ψ1|φ1)[l, u]. Assume that ν = (ψ2|φ2) is a conditional event. We define forget_P(μ, ν) as:
forget_P(μ, ν) = { (φ2|φ1)[la, ua], (φ1|φ2)[lb, ub], (ψ1|φ1 ∧ ¬φ2)[l1, u1], (forget_cl(ψ1, ψ2)|φ1 ∧ φ2)[l2, u2] }
where P |=tight (φ2|φ1)[la, ua], P |=tight (φ1|φ2)[lb, ub], P |=tight (ψ1|φ1 ∧ ¬φ2)[l′, u′], P |=tight (ψ1|φ1)[l″, u″], clb([l′, u′], [l″, u″]) = [l1, u1], and P |=tight (forget_cl(ψ1, ψ2)|φ1 ∧ φ2)[l2, u2]. We define forget(P, ν) = ⋃_{μ ∈ P} forget_P(μ, ν).

When forgetting a conditional event (ψ2|φ2), the domain of the original beliefs should be divided into two parts: within the domain of φ2 and outside the domain of φ2. That is, if (ψ1|φ1)[l, u] ∈ P, then the knowledge about (ψ1|φ1) in P is implicitly contained in (ψ1|φ2 ∧ φ1) and (ψ1|φ1 ∧ ¬φ2). Intuitively, the former may be affected and the latter should be retained. Also, the knowledge about (ψ1|φ1) should be changed as little as possible. To achieve this, the knowledge about (ψ1|φ1) must be retained through the knowledge about (ψ1|φ1 ∧ ¬φ2) in the resulting PLP. In addition, the relationships (subsumption, overlap, disjointness, etc.) between the domains of φ1 and φ2 should not be affected.

Proposition 4 Let P be a PLP, and ν = (ψ|φ) be a conditional event. If P ⊭ (φ|⊤)[0, 0] then forget(P, ν) |=tight (ψ|φ)[0, 1]. If P |= (φ|⊤)[0, 0] then forget(P, ν) ≡ P, and we have that ν does not appear in forget(P, ν).

In the above proposition, P |= (φ|⊤)[0, 0] indicates that any conditional event with φ as the antecedent has no effect on the semantics of P; however, at the syntax level, ν does not appear in forget(P, ν).

Proposition 5 Let P = {(ψ1|φ1)[l1, u1], ..., (ψn|φn)[ln, un]} be a PLP, and ν = (ψ|φ) be a conditional event. Suppose that (ψ|φ) ⊑ (ψi|φi) for all i ∈ {1, ..., n}; then forget(P, ν) ≡ P.

However, if P = {(ψ1|φ1)[l1, u1], ..., (ψn|φn)[ln, un]} and (ψi|φi) ⊑ (ψ|φ) holds for i = 1, ..., n, then forget(P, (ψ|φ)) ≡ P does not hold in general. This is because forgetting a conditional event (ψ|φ) will forget not only the relationship between (φ ∧ ψ) and (φ ∧ ¬ψ), but also all statements about ψ in the domain of φ.
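The clb operation of Definition 5 is directly executable; here is a minimal sketch, with intervals represented as Python pairs (the function name is kept from the paper, the encoding is ours):

```python
def clb(lu_prime, lu):
    """Closest subinterval of [l', u'] to [l, u] (Definition 5)."""
    (lp, up), (l, u) = lu_prime, lu
    if up < l:                       # [l', u'] lies entirely below [l, u]
        return (up, up)
    if lp > u:                       # [l', u'] lies entirely above [l, u]
        return (lp, lp)
    return (max(l, lp), min(u, up))  # the intervals overlap

print(clb((0.0, 0.3), (0.5, 0.9)))   # (0.3, 0.3)
print(clb((0.2, 0.8), (0.5, 0.9)))   # (0.5, 0.8)
```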
Proposition 6 Let P = {(φ1 ∧ ··· ∧ φn|⊤)[1, 1]} and ν = (ϕ|⊤). Then for any event ψ, forget(P, ν) |= (ψ|⊤)[1, 1] iff forget_cl({φ1 ∧ ··· ∧ φn}, ϕ) |=cl ψ.

Let two theories be T1 = {φ1, ..., φn} and T2 = {φ1 ∧ ··· ∧ φn}; then T1 ≡ T2, T2 is logically equivalent to the PLP P = {(φ1 ∧ ··· ∧ φn|⊤)[1, 1]}, and forget(P, ν) is equivalent to forget_cl(T1, ϕ), where ν = (ϕ|⊤). As a consequence, forgetting facts is a special case of forgetting conditional events.

Definition 7 Let P be a PLP and its set of probabilistic models be Pr, and let ν = (ψ|φ) be a conditional event. We let Pr^ν_P be the set of probability distributions s.t. Pr′ ∈ Pr^ν_P iff there exists a Pr ∈ Pr such that
(1) Pr′(I) = Pr(I), if I ⊭ φ;
(2) Σ_{J |= φ, J ≈ψ I} Pr′(J) = Σ_{J |= φ, J ≈ψ I} Pr(J), if I |= φ;
(3) Pr′(φ ∧ φ′) = Pr(φ ∧ φ′), if there exists (ψ′|φ′)[l, u] ∈ P.
In the above definition, condition (1) means that when φ is not satisfied, nothing should be forgotten; condition (2) says that even when φ is satisfied, only the beliefs that are relevant to ψ are forgotten; condition (3) says that within the domain of φ, the probabilities of the antecedents of the probabilistic formulae in P should not be affected. Obviously, Pr ⊆ Pr^ν_P and therefore Pr^ν_P is not empty iff P is satisfiable.

Proposition 7 Let P be a PLP and ν = (ψ|φ) be a conditional event. Then Pr^ν_P is the set of probabilistic models of forget(P, ν).

Forgetting a conditional event will not introduce new knowledge.

Proposition 8 Let P be a PLP and ν = (ψ|φ) be a conditional event. Then forget(P, ν) |= P.

Example 1 Let P be given as:
P = { (fly(t)|bird(t))[0.98, 1], (bird(t)|penguin(t))[1, 1], (penguin(t)|bird(t))[0.1, 1] }
From P, it can be inferred that P |= (fly(t)|penguin(t))[0.8, 1]. When we are informed that this conclusion may be wrong, we want to revise P by forgetting ν = (fly(t)|penguin(t)). After forgetting ν from P we can get the PLP forget(P, ν). It is worth noting that, for any PLP P, any events φ and ψ, and any l, u ∈ [0, 1], the statements P |= (⊤|φ)[1, 1], P |= (φ|⊥)[l, u] and P |= (ψ|φ ∧ ψ)[1, 1] always hold. By omitting such probabilistic formulae, forget(P, ν) can be simplified as:
forget(P, ν) = { (penguin(t)|bird(t))[0.1, 1], (bird(t)|penguin(t))[1, 1], (fly(t)|bird(t) ∧ ¬penguin(t))[0.98, 1] }
In the original P, P |=tight (fly(t)|bird(t) ∧ ¬penguin(t))[0, 1]. The lower bound comes from the assumption that it is possible that all birds are penguins and all penguins cannot fly. In other words, this conclusion depends on the knowledge about (fly(t)|penguin(t)), which should be forgotten, and thus this bound is not suitable. On the contrary, it is stated in forget(P, ν) that (fly(t)|bird(t) ∧ ¬penguin(t))[0.98, 1], which retains the knowledge that a bird (which is not a penguin) can very likely fly. Letting P′ = forget(P, ν), we have P′ |= (fly(t)|penguin(t))[0, 1], which means that in P′ the knowledge about whether penguins can fly is indeed totally forgotten.
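The entailment P |= (fly(t)|penguin(t))[0.8, 1] claimed in Example 1 can be checked numerically by linear programming over the eight truth assignments to (bird, penguin, fly), assuming scipy is available; the helper names and world encoding are our own. Minimizing Pr(fly ∧ penguin) − 0.8·Pr(penguin) over all models of P and finding the minimum to be 0 establishes exactly that Pr(fly|penguin) ≥ 0.8 whenever Pr(penguin) > 0, and that the bound is tight.

```python
from itertools import product
from scipy.optimize import linprog

worlds = list(product([0, 1], repeat=3))   # (bird, penguin, fly)

def mass(pred):                            # indicator row over the 8 worlds
    return [1.0 if pred(b, p, f) else 0.0 for (b, p, f) in worlds]

A_ub, b_ub = [], []
# (fly|bird)[0.98, 1]:  Pr(fly ∧ bird) >= 0.98 Pr(bird)
A_ub.append([0.98 * x - y for x, y in zip(mass(lambda b, p, f: b),
                                          mass(lambda b, p, f: b and f))])
b_ub.append(0.0)
# (penguin|bird)[0.1, 1]:  Pr(penguin ∧ bird) >= 0.1 Pr(bird)
A_ub.append([0.1 * x - y for x, y in zip(mass(lambda b, p, f: b),
                                         mass(lambda b, p, f: b and p))])
b_ub.append(0.0)
# (bird|penguin)[1, 1]:  Pr(penguin ∧ ¬bird) = 0, and masses sum to 1
A_eq = [mass(lambda b, p, f: p and not b), [1.0] * 8]
b_eq = [0.0, 1.0]
# objective: Pr(fly ∧ penguin) - 0.8 Pr(penguin)
c = [y - 0.8 * x for x, y in zip(mass(lambda b, p, f: p),
                                 mass(lambda b, p, f: p and f))]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print(res.fun)   # ~0: the lower bound Pr(fly|penguin) >= 0.8 is tight
```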
4 Belief Revision by Forgetting
In this section, we define two specific revision operators for revising a PLP P with a probabilistic formula.

Definition 8 Let P be a PLP in PL, and μ = (ψ|φ)[l, u] be a probabilistic formula in F. Let ν = (ψ|φ). We define the operator ◦₀ : PL × F → PL such that P ◦₀ μ = forget(P, ν) ∪ {μ}.

Example 2 Let P be given as in Example 1, and μ = (fly(t)|penguin(t))[0, 0] be a probabilistic formula.
P ◦₀ μ = { (fly(t)|bird(t) ∧ ¬penguin(t))[0.98, 1], (bird(t)|penguin(t))[1, 1], (penguin(t)|bird(t))[0.1, 1], (fly(t)|penguin(t))[0, 0] }
Now we can infer that P ◦₀ μ |=tight (fly(t)|bird(t))[0, 0.9], which is intuitively correct: the upper bound on the probability that a bird can fly is changed to 0.9, following the fact that some birds (penguins) cannot fly.

In the above example, we have P ◦₀ μ |=tight (fly(t)|bird(t))[0, 0.9]. This lower bound (0) means that it is possible that no birds can fly. The lower bound comes from the possibility that all birds are penguins, since P |= (penguin(t)|bird(t))[0.1, 1]. Using operator ◦₀ to revise P with (fly(t)|penguin(t))[0, 0] does not eliminate this possibility. On the other hand, since the new information that penguins cannot fly contradicts the original general knowledge that most birds can fly, it implicitly suggests that penguins are very different from typical birds. Formally, the probability of (penguin(t)|bird(t)) should be low. In fact, if we had (penguin(t)|bird(t))[0.1, 0.1] in P ◦₀ μ in the above example, we would get P′ = (P ◦₀ μ) ∪ {(penguin(t)|bird(t))[0.1, 0.1]} and P′ |=tight (fly(t)|bird(t))[0.882, 0.9], which gives much tighter and more intuitive bounds for (fly(t)|bird(t)).

This discussion suggests that sometimes the contradiction between new information (ψ|φ)[l, u] and an original PLP P implies that the antecedent φ is a special case of φ′, for any φ′ such that (ψ′|φ′)[l′, u′] ∈ P and φ′ is relevant to (ψ|φ). Here, φ′ being relevant to (ψ|φ) means that a tighter probability bound for (ψ|φ) can be inferred from P only when more knowledge about the relationship between φ and φ′ (i.e. a tighter bound for (φ′|φ) or (φ|φ′)) is provided. The above discussion leads us to define another revision operator ◦. When revising with this operator, the impreciseness of the antecedent of the new information may be decreased.

Definition 9 Let P be a PLP in PL, and μ = (ψ|φ)[l, u] be a probabilistic formula in F. Let ν = (ψ|φ). We define the operator ◦ : PL × F → PL which satisfies
(1) μ ∈ P ◦ μ,
(2) forget(P, ν) ⊆ P ◦ μ,
(3) for all (ψ′|φ′)[l′, u′] ∈ P, (φ′|φ)[la, ua] ∈ P ◦ μ and (φ|φ′)[lb, ub] ∈ P ◦ μ, where
P |=tight (ψ|φ)[l0, u0], clb([l0, u0], [l, u]) = [l1, u1],
P ∪ {(ψ|φ)[l1, u1]} |=tight (φ′|φ)[la, ua],
P ∪ {(ψ|φ)[l1, u1]} |=tight (φ|φ′)[lb, ub],
and P ◦ μ is the smallest set (with respect to set inclusion) satisfying the above conditions. Obviously, P ◦ μ |= P ◦₀ μ.
Example 3 Let P be given as in Example 1, and μ = (fly(t)|penguin(t))[0, 0] be a probabilistic formula.
P ◦ μ = { (fly(t)|bird(t) ∧ ¬penguin(t))[0.98, 1], (bird(t)|penguin(t))[1, 1], (penguin(t)|bird(t))[0.1, 0.1], (fly(t)|penguin(t))[0, 0] }
Now we have that most birds can fly, since P ◦ μ |=tight (fly(t)|bird(t))[0.882, 0.9], and this knowledge is still imprecise.

Proposition 9 Both operators ◦₀ and ◦ satisfy the postulates R*1, R*2, and R*4–R*7.

Neither operator satisfies R*3 in general. This comes from the fact that our operators retain the impreciseness of the original knowledge, whilst Ψ ∧ μ decreases the impreciseness of the original knowledge. The two operators also do not satisfy R*8 in general. Let ◦′ be any revision operator; then R*8 is equivalent to the following three separate postulates:
R*8′.1 for all ψ′ and φ′, if (ψ|φ) ⊑ (ψ′|φ′) and P ◦′ (ψ|φ)[l, u] |= (ψ′|φ′)[l′, u′] then P |= (ψ′|φ′)[l′, u′].
R*8′.2 for all ψ′ and φ′, if (ψ′|φ′) ⊑ (ψ|φ) and P ◦′ (ψ|φ)[l, u] |= (ψ′|φ′)[l′, u′] then P |= (ψ′|φ′)[l′, u′].
R*8′.3 for all ψ′ and φ′, if |=cl ¬(φ ∧ φ′) and P ◦′ (ψ|φ)[l, u] |= (ψ′|φ′)[l′, u′] then P |= (ψ′|φ′)[l′, u′].

Proposition 10 The operators ◦₀ and ◦ satisfy R*8′.1 and R*8′.3.

R*8′.2 is not satisfied by ◦₀ and ◦ because forgetting a conditional event (ψ|φ) may affect the knowledge about (ψ′|φ′) if (ψ′|φ′) ⊑ (ψ|φ).
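The tight bounds [0.882, 0.9] in Example 3 can be checked by a total-probability decomposition over the penguin and non-penguin cases; this is a verification sketch, not part of the paper's proof machinery:

Pr(fly|bird) = Pr(fly|bird ∧ ¬penguin) · Pr(¬penguin|bird) + Pr(fly|bird ∧ penguin) · Pr(penguin|bird).

With Pr(penguin|bird) = 0.1, Pr(fly|bird ∧ penguin) = Pr(fly|penguin) = 0 (since penguins are certainly birds), and Pr(fly|bird ∧ ¬penguin) ∈ [0.98, 1], this gives Pr(fly|bird) ∈ [0.98 × 0.9 + 0, 1 × 0.9 + 0] = [0.882, 0.9].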
5 Related Work and Conclusion
Related work: Traditionally, forgetting deletes some concepts (atoms or facts) from a given theory in a classical logic-based language. In this paper, we extended the concept of forgetting to forgetting conditional events rather than facts, in the framework of conditional probabilistic logic programming. Since facts can be represented as a special kind of conditional events, i.e., conditional events that have tautologies as their antecedents, it is not surprising that our forgetting method subsumes the original approach to forgetting facts. In [15], forgetting facts is deployed in belief change in propositional logic. When reducing the forgetting of conditional events to the forgetting of facts in our operator ◦₀ (since, when the bounds of every probabilistic formula are either [0,0] or [1,1], a PLP actually contains a set of propositional formulae), we obtain the update operator defined in [15]. However, there is no counterpart of our ◦ in [15].

In the literature on probabilistic belief revision, most revision operators are model-based, that is, a revision operator revises a single or a set of probability distributions, and the result is also a single or a set of probability distributions. This kind of revision keeps the probabilistic knowledge implicit, especially when this knowledge is in the form of a PLP. On the contrary, our operators are defined at the syntax level, and a revised PLP is obtained as the result. Many probabilistic belief revision operators require that new knowledge be consistent with the original knowledge [1, 8, 3, 4, 10, 17]. In contrast, since any conditional event can be forgotten from a PLP, we do not require that new knowledge be consistent with a given PLP. Furthermore, our revision results can still be imprecise (see Example 3), while some other revision operators [1, 8, 3, 4, 5, 6, 10] produce single probability distributions as the result of revision.
Conclusions: In this paper, we extended the concept of forgetting to forgetting conditional events in PLPs and proposed two revision operators based on our approach to forgetting conditional events. Our revision operators forget inconsistent knowledge and retain knowledge irrelevant to the new information. Among the two operators we have defined, the second operator (◦) is particularly designed for situations where the antecedent of a conditional event (the new information) is imprecise in the original PLP. The first revision operator does not change anything (bounds of probabilities) about the antecedent after revision, whilst the second operator decreases the imprecision of the antecedent (in terms of probability bounds). The rationale of operator ◦ comes from the assumption that if new information contradicts the original PLP, then this suggests that the antecedent may be a special case of a general concept defined in this PLP (such as penguin being a special, but not common, type of bird). Our operators satisfy most of the postulates for probabilistic belief revision and operate at the syntax level of a PLP, so that a new PLP is explicitly returned as the result of revision.
REFERENCES
[1] Hei Chan and Adnan Darwiche, 'On the revision of probabilistic beliefs using uncertain evidence', Artif. Intell., 163(1), 67–90, (2005).
[2] Adnan Darwiche and Judea Pearl, 'On the logic of iterated belief revision', Artif. Intell., 89(1-2), 1–29, (1997).
[3] Didier Dubois and Henri Prade, 'Focusing vs. belief revision: A fundamental distinction when dealing with generic knowledge', in Proc. of ECSQARU-FAPR'97, pp. 96–107, (1997).
[4] B. Van Fraasen, 'Probabilities of conditionals', in Proc. of Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, pp. 261–300, (1976).
[5] I. R. Goodman and Hung T. Nguyen, 'Probability updating using second order probabilities and conditional event algebra', Inf. Sci., 121(3-4), 295–347, (1999).
[6] Adam J. Grove and Joseph Y. Halpern, 'Probability update: Conditioning vs. cross-entropy', in Proc. of UAI'97, pp. 208–214, (1997).
[7] Adam J. Grove and Joseph Y. Halpern, 'Updating sets of probabilities', in Proc. of UAI'98, pp. 173–182, (1998).
[8] Peter Grünwald and Joseph Y. Halpern, 'Updating probabilities', J. Artif. Intell. Res. (JAIR), 19, 243–278, (2003).
[9] Gabriele Kern-Isberner, 'Postulates for conditional belief revision', in Proc. of IJCAI'99, pp. 186–191, (1999).
[10] Gabriele Kern-Isberner and Wilhelm Rödder, 'Belief revision and information fusion on optimum entropy', Int. J. Intell. Syst., 19(9), 837–857, (2004).
[11] Jérôme Lang, Paolo Liberatore, and Pierre Marquis, 'Propositional independence: Formula-variable independence and forgetting', J. Artif. Intell. Res. (JAIR), 18, 391–443, (2003).
[12] Fangzhen Lin and Raymond Reiter, 'Forget it!', in Working Notes, AAAI Fall Symposium on Relevance, eds., Russell Greiner and Devika Subramanian, pp. 154–159, Menlo Park, California, (1994). American Association for Artificial Intelligence.
[13] Thomas Lukasiewicz, 'Probabilistic logic programming', in Proc. of ECAI'98, pp. 388–392, (1998).
[14] Thomas Lukasiewicz, 'Probabilistic logic programming with conditional constraints', ACM Trans. Comput. Log., 2(3), 289–339, (2001).
[15] Abhaya C. Nayak, Yin Chen, and Fangzhen Lin, 'Forgetting and knowledge update', in Proc. of Australian Conference on Artificial Intelligence, pp. 131–140, (2006).
[16] Damjan Skulj, 'Jeffrey's conditioning rule in neighbourhood models', Int. J. Approx. Reasoning, 42(3), 192–211, (2006).
[17] Frans Voorbraak, 'Probabilistic belief change: Expansion, conditioning and constraining', in Proc. of UAI'99, pp. 655–662, (1999).
[18] Anbu Yue and Weiru Liu, 'Revising imprecise probabilistic beliefs in the framework of probabilistic logic programming', in Proc. of AAAI'08, (2008).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-376
Mastering the Processing of Preferences by Using Symbolic Priorities in Possibilistic Logic Souhila Kaci and Henri Prade 1 Abstract. The paper proposes a new approach to the handling of preferences expressed in a compact way under the form of conditional statements. These conditional statements are translated into classical logic formulas associated with symbolic levels. Ranking two alternatives then leads to comparing their respective amounts of violation with respect to the set of formulas expressing the preferences. These symbolic violation amounts, which can be computed in a possibilistic logic manner, can be partially ordered lexicographically once put in a vector form. This approach is compared to the ceteris paribus-based CP-net approach, which is the main existing artificial intelligence approach to the compact processing of preferences. It is shown that the partial order obtained with the CP-net approach fully agrees with the one obtained with the proposed approach, but generally includes further strict preferences between alternatives (considered as being not comparable by the symbolic-level logic-based approach). These additional strict preferences are in fact debatable, since they are not the reflection of explicit user preferences but the result of the application of the ceteris paribus principle, which implicitly, and quite arbitrarily, favors father-node preferences in the graphical structure associated with the conditional preferences. By adding constraints between symbolic levels expressing that the violation of father nodes is less permitted than that of children nodes, it is shown that it is possible to recover the CP-net-induced partial order. Due to existing results in possibilistic logic with symbolic levels, the proposed approach is computationally tractable. Key words: preference, priority, partial order, CP-net, possibilistic logic.
1 Introduction
The compact representation of preferences has raised a vast interest in artificial intelligence in the last decade [5, 9, 18, 14, 10]. Indeed, it has been recognized early on that, since value functions cannot be explicitly defined in the case of a great number of alternatives described by means of attributes, preferences should be handled in a compact way, starting from not completely explicit preferences expressed by a user. In particular, conditional statements are often used for describing preferences in a local, contextualized manner. Moreover, some generic principle is often used for completing the preferences [5, 14].

The CP-net approach [6] has emerged in the last decade as the preeminent and prominent method for processing preferences in artificial intelligence, due to its intuitive appeal. The CP-net approach directly exploits sets of conditional preferences and their associated graphical structures, assuming an apparently natural and innocuous ceteris paribus principle which expresses that conditional preferences, which in general refer to two incompletely described alternatives, still hold when the specifications of the two alternatives are completed in the same way. However, the CP-net approach may be computationally costly for dominance queries, which ask whether a ranking of two alternatives holds in any preference ordering that satisfies the CP-net requirements, rather than just asking whether it holds in at least one of these preference orderings. This has led to a search for tractable approximations of CP-nets [10, 18, 16].

Generally speaking, conditional statements express, in a given context, preferences about what the most plausible states of the world are according to pieces of default knowledge, or what the most satisfactory states are when expressing desires. It has been shown that conditional statements can be expressed under the form of constraints that may be turned into sets of prioritized logical formulas [17, 2, 1]. However, although the case might be encountered in practice, the available approaches for handling preferences do not usually allow for the simultaneous expression of general preferences and of more specific ones that are reversed with respect to the general tendency. In this latter case, the various levels of specificity of the conditionals induce a complete preorder on the logical formulas encoding the defaults. But, in the case of a set of (monotonic) conditional preference statements, we do not necessarily have indications about their respective levels of importance. This is why, in the following, we encode the conditional preference statements by means of classical logical formulas associated with symbolic priorities (since no a priori ordering between them is known), as already done in the approximation of CP-nets recently proposed [16]. Then the respective amounts of preference violation of an alternative, with respect to the set of formulas encoding the preferences, can be computed in a possibilistic logic manner [13, 3], and result in a conjunctive combination of symbolic levels. Such combinations of symbolic levels can be partially ordered lexicographically, once they are put in a vector form.

After introducing the basic definitions in Section 2, this is explained in Section 3 on a motivating example taken from the CP-net literature. In Section 4, after a refresher on the CP-net approach, it is shown that the partial order obtained with the CP-net approach fully agrees with the one obtained with the symbolic priorities approach, but generally includes further strict preferences. A discussion shows that this is due to a debatable use of the ceteris paribus principle on pairs of alternatives for which there is no inclusion relation between the two sets of preferences that they violate. Section 5 shows how the CP-net partial order can be recovered by adding constraints between symbolic levels expressing that the violation of father nodes is less permitted than that of children nodes. Such a representation framework, where logical formulas are associated with symbolic priority levels between which further constraints may be added, is akin to the one presented in [3] (for handling multiple-source information), for which tractable computational procedures exist.

1 Souhila Kaci, Université Lille-Nord de France, Artois, F-62307 Lens, CRIL, CNRS UMR 8188, F-62307 - IUT de Lens, kaci@cril.univ-artois.fr; Henri Prade, IRIT, Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse Cedex 9, France, prade@irit.fr
2 Definitions and notations
Let V = {X1, ..., Xl} be a set of l variables. Each variable Xi takes its values in a domain denoted Dom(Xi) = {xi1, ..., ximi}. Let V′ be a subset of V. An assignment of V′ is the result of giving a value in Dom(Xi) to each variable Xi in V′. Asst(V′) is the set of all possible assignments to the variables in V′. In particular, Asst(V), denoted Ω, is the set of all possible assignments of the variables in V. Each element of Ω, denoted ω, is called an alternative. When dealing with binary variables, formulas of propositional logic are denoted a, b, c, ....

Let ⪰ (resp. ≻) be a binary relation on a finite set A = {x, y, z, ...} such that x ⪰ y (resp. x ≻ y) means that x is at least as preferred as (resp. strictly preferred to) y. x = y means that both x ⪰ y and y ⪰ x hold, i.e. x and y are equally preferred. Lastly, x ∼ y means that neither x ⪰ y nor y ⪰ x holds, i.e. x and y are incomparable. ⪰ is a partial preorder on A if and only if ⪰ is reflexive (x ⪰ x) and transitive (if x ⪰ y and y ⪰ z then x ⪰ z). ≻ is a partial order on A if and only if ≻ is irreflexive (x ≻ x does not hold) and transitive. A partial order ≻ may be defined from a partial preorder ⪰ by: x ≻ y if x ⪰ y holds but y ⪰ x does not. A (pre-)order is asymmetric if and only if ∀x, y ∈ A, if x ≻ y holds then y ≻ x does not. A preorder ⪰ on A is complete if and only if all pairs are comparable, i.e. ∀x, y ∈ A, we have x ⪰ y or y ⪰ x.
3 Motivating example and preference encoding
We first motivate the proposed approach on an example inspired from [11] about how to be dressed for an evening party.

Example 1 Let V (vest), P (pants), S (shirt) and C (shoes) be four binary variables taking their values in {Vb, Vw}, {Pb, Pw}, {Sr, Sw} and {Cr, Cw} respectively, where b, w and r stand for black, white and red respectively. Clearly there are sixteen possible evening dresses: Ω = {Vb Pb Sr Cr, Vb Pb Sw Cr, Vb Pw Sr Cr, Vb Pw Sw Cr, Vw Pb Sr Cr, Vw Pb Sw Cr, Vw Pw Sr Cr, Vw Pw Sw Cr, Vb Pb Sr Cw, Vb Pb Sw Cw, Vb Pw Sr Cw, Vb Pw Sw Cw, Vw Pb Sr Cw, Vw Pb Sw Cw, Vw Pw Sr Cw, Vw Pw Sw Cw}. Assume that when choosing his evening dress, Peter is not able to compare the sixteen possible choices but expresses the following partial preferences: (P1): he prefers a black vest to a white vest; (P2): he prefers black pants to white pants; (P3): when the vest and the pants have the same color, he prefers a red shirt to a white shirt, otherwise he prefers a white shirt; and (P4): when the shirt is red he prefers red shoes, otherwise he prefers white shoes. The problem now is how to rank-order the sixteen possible choices according to Peter's preferences.

The above preferences are conditionals of the form "in context c, a is preferred to b", where c may be a tautology. Such a preference can be modelled as a pair of prioritized goals {(¬c ∨ a ∨ b, 1), (¬c ∨ a, 1 − α)}, which stand for "when c is true, one should have a or b (the choice is only between a and b), and in context c, it is somewhat imperative to have a true". These pairs of propositional formulas associated with a level are known as
possibilistic formulas [13]. Indeed, e.g. (¬c ∨ a, 1 − α) encodes a constraint of the form Π(c ∧ ¬a) ≤ α (≡ N(¬c ∨ a) ≥ 1 − α), where Π, N are dual possibilistic measures (1 − Π(¬p) = N(p)). This expresses that the satisfaction level when the constraint is violated is upper bounded by α. Note that when b ≡ ¬a, the clause (¬c ∨ a ∨ b, 1) becomes a tautology, and thus does not need to be written. Indeed, the clause (¬c ∨ a, 1 − α) expresses a preference for a over ¬a in context c. The clause (¬c ∨ a ∨ b, 1) is only needed if a ∨ b does not cover all the possible choices. Assume a ∨ b ≡ ¬d (where ¬d is not a tautology); then it makes sense to understand the preference for a over b in context c as the fact that, in context c, b is a default choice if a is not available. If one wants to open the door to the remaining choices, it is always possible to use (¬c ∨ a ∨ b, 1 − α′) with 1 − α′ > 1 − α, instead of (¬c ∨ a ∨ b, 1). Thus, the approach would easily extend to non-binary choices.

Example 2 (Example 1 continued) Thus P1 and P2 are encoded by means of (i): {(Vb, 1 − α)} and (ii): {(Pb, 1 − β)} respectively. P3 is encoded by (iii): {(¬Vb ∨ ¬Pb ∨ Sr, 1 − γ)}, (iv): {(¬Vw ∨ ¬Pw ∨ Sr, 1 − η)}, (v): {(¬Vw ∨ ¬Pb ∨ Sw, 1 − δ)} and (vi): {(¬Vb ∨ ¬Pw ∨ Sw, 1 − ε)}. Lastly, P4 is encoded by (vii): {(¬Sr ∨ Cr, 1 − θ)} and (viii): {(¬Sw ∨ Cw, 1 − ρ)}. Note that we have chosen here, in order to be as general as possible, to give distinct symbolic priority levels to the formulas associated with the different contexts covered by a preference Pi.

Since one does not know precisely how imperative the preferences are, the weights will be handled in a symbolic manner. However, they are assumed to belong to a linearly ordered scale (the strict order on this scale will be denoted by >), with a top element (denoted 1) and a bottom element (denoted 0). Thus, 1 − (.) should be regarded here just as denoting an order-reversing map on this scale (without necessarily having a numerical flavor), with 1 − (0) = 1 and 1 − (1) = 0. On this scale, one has 1 > 1 − α as soon as α ≠ 0. The order-reversing map exchanges two scales: the one graded in terms of necessity degrees, or if we prefer here in terms of imperativeness, and the one graded in terms of possibility degrees, i.e. here, in terms of satisfaction levels. Thus, the priority level 1 − α for satisfying a preference is changed by the involutive mapping 1 − (.) into a satisfaction level α < 1 when this preference is violated. Since in the example the values of the weights 1 − α, 1 − β, 1 − γ, 1 − η, 1 − δ, 1 − ε, 1 − θ and 1 − ρ are unknown, no particular ordering is assumed between them.

Table 1 gives the satisfaction levels of the above clauses for the sixteen possible choices. The last column gives the vector of global satisfaction, exhibiting a symbolic satisfaction level different from 1 each time a formula is violated. In practice, these violation amounts can be syntactically computed using the approach proposed in [3]. Even if the values of the weights are unknown, a partial order between the sixteen choices can be naturally induced. For example, Vb Pb Sr Cr is preferred to all remaining alternatives since it is the only alternative that satisfies all of Peter's preferences. Also, Vw Pb Sw Cw is preferred to Vw Pw Sr Cr since the former falsifies (Vb, 1 − α) while the latter falsifies both (Vb, 1 − α) and (Pb, 1 − β). This partial order is depicted in Figure 1. An edge from ω to ω′ means that ω is preferred to ω′.
Indeed an alternative ω is naturally preferred to an alternative ω′ when the set of clauses falsified by ω is (strictly) included in the set of clauses falsified by ω′.
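To make this concrete, here is a minimal Java sketch – the class and method names are ours, hypothetical, not from the paper – that evaluates the eight clauses (i)–(viii) of Example 2 on an alternative and records, for each clause, either 1 or the symbolic level reached when the clause is violated; its output reproduces the rows of Table 1 below.

```java
import java.util.*;
import java.util.function.Predicate;

/** Minimal sketch (hypothetical names): computing the symbolic
 *  satisfaction vector of an alternative, as in Table 1. */
public class SatisfactionVector {

    /** An alternative assigns one value to each of V, P, S, C,
     *  e.g. {V=b, P=b, S=r, C=r} for Vb Pb Sr Cr. */
    record Alt(char v, char p, char s, char c) {}

    /** A clause is a propositional condition plus the symbolic
     *  satisfaction level reached when the clause is violated. */
    record Clause(Predicate<Alt> holds, String levelIfViolated) {}

    static List<Clause> clauses() {
        return List.of(
            new Clause(a -> a.v() == 'b', "alpha"),                              // (i)   Vb
            new Clause(a -> a.p() == 'b', "beta"),                               // (ii)  Pb
            new Clause(a -> !(a.v()=='b' && a.p()=='b') || a.s()=='r', "gamma"), // (iii)
            new Clause(a -> !(a.v()=='w' && a.p()=='w') || a.s()=='r', "eta"),   // (iv)
            new Clause(a -> !(a.v()=='w' && a.p()=='b') || a.s()=='w', "delta"), // (v)
            new Clause(a -> !(a.v()=='b' && a.p()=='w') || a.s()=='w', "epsilon"),// (vi)
            new Clause(a -> a.s() != 'r' || a.c() == 'r', "theta"),              // (vii)
            new Clause(a -> a.s() != 'w' || a.c() == 'w', "rho"));               // (viii)
    }

    /** Each component is "1" when the clause is satisfied, or its
     *  symbolic level when it is violated. */
    static List<String> vector(Alt a) {
        List<String> v = new ArrayList<>();
        for (Clause cl : clauses())
            v.add(cl.holds().test(a) ? "1" : cl.levelIfViolated());
        return v;
    }

    public static void main(String[] args) {
        // Vw Pw Sr Cr -> [alpha, beta, 1, 1, 1, 1, 1, 1], as in Table 1
        System.out.println(vector(new Alt('w', 'w', 'r', 'r')));
    }
}
```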
Table 1. Satisfaction levels.

              (i)  (ii) (iii) (iv)  (v) (vi) (vii) (viii)
Vb Pb Sr Cr    1    1    1    1     1    1    1     1
Vb Pb Sw Cr    1    1    γ    1     1    1    1     ρ
Vb Pw Sr Cr    1    β    1    1     1    ε    1     1
Vb Pw Sw Cr    1    β    1    1     1    1    1     ρ
Vw Pb Sr Cr    α    1    1    1     δ    1    1     1
Vw Pb Sw Cr    α    1    1    1     1    1    1     ρ
Vw Pw Sr Cr    α    β    1    1     1    1    1     1
Vw Pw Sw Cr    α    β    1    η     1    1    1     ρ
Vb Pb Sr Cw    1    1    1    1     1    1    θ     1
Vb Pb Sw Cw    1    1    γ    1     1    1    1     1
Vb Pw Sr Cw    1    β    1    1     1    ε    θ     1
Vb Pw Sw Cw    1    β    1    1     1    1    1     1
Vw Pb Sr Cw    α    1    1    1     δ    1    θ     1
Vw Pb Sw Cw    α    1    1    1     1    1    1     1
Vw Pw Sr Cw    α    β    1    1     1    1    θ     1
Vw Pw Sw Cw    α    β    1    η     1    1    1     1

Figure 1. Basic partial order.
Definition 1 (Basic preference relation) Let Σ = {(ai, αi)} be a set of formulas associated with symbolic weights. Let ω and ω′ be two alternatives and Fω and Fω′ be the sets of formulas of Σ falsified by ω and ω′ respectively. ω is basically preferred to ω′, denoted ω ≻b,Σ ω′, iff Fω ⊂ Fω′.

Thus ω is preferred to ω′ only when the components of its associated satisfaction vector are equal to 1 for those components that are different in the two satisfaction vectors associated with ω and ω′. Formally, we describe the basic preference relation as follows. Let v = (v1, ..., vn) and v′ = (v′1, ..., v′n) be two vectors of satisfaction levels, ordered according to the order in which we consider the formulas (in our example, from (i) to (viii)). The discrimin criterion [12] ignores the values that are the same in both v and v′ for a given vector component pertaining to the same formula. For example, the two vectors v = (α, β, 1, 1, 1, 1, 1, 1) and v′ = (1, β, 1, 1, 1, ε, 1, 1) reduce to d(v) = (α, 1) and d(v′) = (1, ε) respectively. For further comparing the reduced vectors, we define the following preference
relation (called "ordered Pareto" and denoted ≻OP) that exploits the available information about the relative values of the symbolic levels. Then v is preferred to v′, denoted v ≻OP v′, if there is a reordering of each vector of symbolic levels such that ∀i, do(v)i ≥ do(v′)i and ∃j, do(v)j > do(v′)j according to the current knowledge about the ordering between symbolic levels, where do(v) is the reordered vector associated with d(v). Initially the only available knowledge about the ordering between the symbolic levels is α < 1 when α ≠ 1, and 1 ≤ 1. Then, for example, d(v) = (α, 1) and d(v′) = (1, ε) are incomparable. Now if we also know that α > ε then v ≻OP v′ (i.e. (α, β, 1, 1, 1, 1, 1, 1) ≻OP (1, β, 1, 1, 1, ε, 1, 1)) since do(v) = (α, 1) and do(v′) = (ε, 1) are now Pareto comparable.

Proposition 1 Let Σ = {(ai, αi)} be a set of formulas. Let ω and ω′ be two alternatives. Let Fω and Fω′ be the sets of formulas of Σ falsified by ω and ω′ respectively. Let v and v′ be the satisfaction vectors of ω and ω′ respectively. Then ω ≻b,Σ ω′ iff v ≻OP v′.
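A small Java sketch – hypothetical names, with a brute-force permutation test that is reasonable only for short reduced vectors like these – of the discrimin reduction and the ordered-Pareto comparison of Proposition 1:

```java
import java.util.*;

/** Sketch (hypothetical names) of the discrimin reduction and the
 *  ordered-Pareto test on symbolic satisfaction vectors. */
public class OrderedPareto {

    /** Known strict facts "x > y"; "1" dominates every other symbol. */
    static boolean knownGreater(String x, String y, Set<String> facts) {
        return (x.equals("1") && !y.equals("1")) || facts.contains(x + ">" + y);
    }

    static boolean geq(String x, String y, Set<String> facts) {
        return x.equals(y) || knownGreater(x, y, facts);
    }

    /** Discrimin: keep only the positions where the two vectors differ. */
    static List<List<String>> discrimin(List<String> v, List<String> w) {
        List<String> dv = new ArrayList<>(), dw = new ArrayList<>();
        for (int i = 0; i < v.size(); i++)
            if (!v.get(i).equals(w.get(i))) { dv.add(v.get(i)); dw.add(w.get(i)); }
        return List.of(dv, dw);
    }

    /** v >_OP w: some reordering of d(v) dominates d(w) componentwise,
     *  strictly on at least one component, under the known facts. */
    static boolean strictlyPreferred(List<String> v, List<String> w, Set<String> facts) {
        List<List<String>> d = discrimin(v, w);
        return dominates(d.get(0), new ArrayList<>(), d.get(1), facts);
    }

    // Brute force over permutations of the reduced vector.
    static boolean dominates(List<String> rest, List<String> prefix,
                             List<String> target, Set<String> facts) {
        if (rest.isEmpty()) {
            boolean strict = false;
            for (int i = 0; i < target.size(); i++) {
                if (!geq(prefix.get(i), target.get(i), facts)) return false;
                if (knownGreater(prefix.get(i), target.get(i), facts)) strict = true;
            }
            return strict;
        }
        for (int i = 0; i < rest.size(); i++) {
            List<String> r = new ArrayList<>(rest);
            String s = r.remove(i);
            prefix.add(s);
            if (dominates(r, prefix, target, facts)) return true;
            prefix.remove(prefix.size() - 1);
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> v = List.of("alpha","beta","1","1","1","1","1","1");
        List<String> w = List.of("1","beta","1","1","1","epsilon","1","1");
        System.out.println(strictlyPreferred(v, w, Set.of()));                // false: incomparable
        System.out.println(strictlyPreferred(v, w, Set.of("alpha>epsilon"))); // true, as in the text
    }
}
```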
Each additional preference between two alternatives should be the consequence of an explicit constraint between symbolic weights. For example Vb Pb Sw Cw and Vb Pb Sr Cw are incomparable since γ and θ are incomparable. Now if we state that θ > γ then Vb Pb Sr Cw would be preferred to Vb Pb Sw Cw .
4 Conditional Preference Networks (CP-nets)
Conditional preference networks (CP-nets for short) [5] encode comparative conditional statements and are based on the ceteris paribus principle. More precisely, a CP-net is a directed graphical representation of conditional preferences, where nodes represent variables and edges express preference links between variables. When there exists a link from X to Y, X is called a parent of Y. Pa(X) denotes the set of parents of a given node X; it determines the user's preferences over the possible values of X. For the sake of simplicity, we suppose that variables are binary. Preferences are expressed at each node by means of a conditional preference table (CPT for short) such that:
• For root nodes Xi, the conditional preference table, denoted CPT(Xi), provides the strict preference (we restrict ourselves to a complete order over xi and ¬xi, as is generally the case with CP-nets; this can easily be extended to a preorder) over xi and its negation ¬xi, other things being equal, i.e. ∀y ∈ Asst(Y), xi y ≻ ¬xi y where Y = V \ {Xi}. This is the ceteris paribus principle.
• For other nodes Xj, CPT(Xj) describes the preferences over xj and ¬xj, other things being equal, given any assignment of Pa(Xj), i.e. xj z y ≻ ¬xj z y, ∀z ∈ Asst(Pa(Xj)) and ∀y ∈ Asst(Y) where Y = V \ ({Xj} ∪ Pa(Xj)). For each assignment z of Pa(Xj) we write for short a statement of the form z : xj ≻ ¬xj. Note that this is a parent-dependent specification.
Definition 2 A complete preorder ⪰ on Ω, also called a preference ranking, satisfies a CP-net N if and only if it satisfies each conditional preference expressed in N. In this case, we say that the preference ranking ⪰ is consistent with N. A CP-net N is consistent when there exists an asymmetric preference ranking that is consistent with N. We focus in this paper on acyclic CP-nets in order to ensure their consistency.

Definition 3 (Preference entailment) Let N be a CP-net over a set of variables V, and ω, ω′ ∈ Ω. N entails that ω is strictly preferred to ω′, denoted ω ≻N ω′, if and only if ω ≻ ω′ holds in every preference ranking ⪰ that satisfies N.

Indeed ≻N is the intersection of all preference rankings consistent with N. When ω ≻N ω′ holds, we say that ω dominates ω′. The preferential comparison in CP-nets is based on the notion of worsening flip. A worsening flip is a change of the assignment of a variable to an assignment that is less preferred following the conditional preference table of that variable, under the ceteris paribus assumption, w.r.t. the CP-net N. Then ω is preferred to ω′ w.r.t. N iff there is a chain of worsening flips from ω to ω′.

Example 3 (Example 1 continued) Peter's preferences can be represented by the CP-net depicted in Figure 2. As one would expect, the CP-net fully agrees with the basic preference relation.

Figure 2. A CP-net and its associated order. [Nodes V, P, S, C, with V and P parents of S, and S parent of C; CPTs: Vb ≻ Vw; Pb ≻ Pw; Vb Pb : Sr ≻ Sw; Vw Pw : Sr ≻ Sw; Vw Pb : Sw ≻ Sr; Vb Pw : Sw ≻ Sr; Sr : Cr ≻ Cw; Sw : Cw ≻ Cr.]
Proposition 2 Let N be a CP-net. Let Σ = {(¬ui ∨ x, αi)} where ui : x ≻ ¬x are the unconditional/conditional local preferences expressed in N. Then, ∀ω, ω′ ∈ Ω, if ω ≻b,Σ ω′ then ω ≻N ω′.

For example Vw Pw Sr Cr falsifies (Vb, 1−α), (Pb, 1−β) and Vw Pw Sw Cr falsifies (Vb, 1−α), (Pb, 1−β), (¬Vw ∨ ¬Pw ∨ Sr, 1−η), and we have Vw Pw Sr Cr ≻N Vw Pw Sw Cr. However, as we can check in Figure 2, the partial order associated to the CP-net is more refined than the basic preference relation, i.e. some incomparabilities in the latter have been turned into strict comparabilities in the former. For example Vw Pb Sr Cw is preferred to Vw Pw Sr Cw w.r.t. the CP-net while they are incomparable w.r.t. ≻b,Σ, since Vw Pb Sr Cw falsifies (Vb, 1−α), (¬Vw ∨ ¬Pb ∨ Sw, 1−δ) and (¬Sr ∨ Cr, 1−θ) while Vw Pw Sr Cw falsifies (Vb, 1−α), (Pb, 1−β) and (¬Sr ∨ Cr, 1−θ).

These additional strict preferences are due to the fact that preferences in CP-nets depend on the structure of the graph. More precisely, since preferences over the values of a variable are conditioned on the values of its parents, the application of the ceteris paribus principle implicitly gives priority to father nodes. For example Vw Pb Sr Cw ≻N Vw Pw Sr Cw due to Pb ≻ Pw. Indeed Vw Pw Sr Cw is less preferred than Vw Pb Sr Cw since the former falsifies (Pb, 1−β) while the latter falsifies (¬Vw ∨ ¬Pb ∨ Sw, 1−δ) (they both falsify (Vb, 1−α) and (¬Sr ∨ Cr, 1−θ)). Indeed when two alternatives ω and ω′ differ on the value of one variable only, ω is preferred to ω′ w.r.t. a CP-net if and only if
• either Fω ⊂ Fω′ (cf. Definition 1),
• or ω falsifies a father node preference while ω′ falsifies a child node preference.

Figure 3. A CP-net and its associated order.
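Since CP-net dominance reduces to chains of worsening flips, the elementary operation is the test of whether one flip worsens an alternative w.r.t. the flipped variable's CPT. A minimal Java sketch (our own hypothetical names, hard-coding CPT(S) of Figure 2):

```java
import java.util.*;

/** Sketch (hypothetical names): a worsening flip on variable S of the
 *  CP-net of Figure 2, i.e. moving from the value preferred under the
 *  current parent context to the other one, all else being equal. */
public class WorseningFlip {

    // CPT(S): preferred shirt value given (vest, pants), as in Figure 2.
    static final Map<String, Character> CPT_S = Map.of(
        "bb", 'r', "ww", 'r',   // same colour      -> red shirt preferred
        "wb", 'w', "bw", 'w');  // different colour -> white shirt preferred

    /** True iff changing S from sBefore to sAfter, with vest v and
     *  pants p unchanged, is a worsening flip. */
    static boolean worseningFlipOnS(char v, char p, char sBefore, char sAfter) {
        char preferred = CPT_S.get("" + v + p);
        return sBefore == preferred && sAfter != preferred;
    }

    public static void main(String[] args) {
        // Vw Pw Sr ... -> Vw Pw Sw ... is worsening (Sr preferred when Vw Pw)
        System.out.println(worseningFlipOnS('w', 'w', 'r', 'w')); // true
    }
}
```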
5 Encoding CP-nets
We show in this section that the partial order associated with a CP-net can be retrieved in our approach using additional constraints on symbolic levels. This encoding follows three steps:
• Let X be a node in the CP-net N and CPT(X) be its associated conditional preference table. For each local preference
ui : x ≻ ¬x in CPT(X), we associate a base made of one formula ¬ui ∨ x, as follows: ΣX,ui = {(¬ui ∨ x, 1 − αi)}. We do not add (¬ui ∨ x ∨ ¬x, 1) since we are dealing with binary variables.
• For each node X in the CP-net N, build ΣX = ∪i ΣX,ui, where the bases ΣX,ui have been obtained at the previous step. Then Σ = ∪X ΣX is the partially ordered base associated with N.
• For each formula (¬ui ∨ x, 1 − αi) in ΣX and each formula (¬uj ∨ y, 1 − αj) in ΣY such that X is a father of Y and we are in the same context, i.e. ¬uj = ¬x ∨ ¬uk, we put 1 − αi > 1 − αj.

Example 4 (Example 1 cont'd) We have Σ = {(Vb, 1 − α), (Pb, 1 − β), (¬Vb ∨ ¬Pb ∨ Sr, 1 − γ), (¬Vw ∨ ¬Pw ∨ Sr, 1 − η), (¬Vw ∨ ¬Pb ∨ Sw, 1 − δ), (¬Vb ∨ ¬Pw ∨ Sw, 1 − ε), (¬Sr ∨ Cr, 1 − θ), (¬Sw ∨ Cw, 1 − ρ)}. We define the following constraints between symbolic weights, which express that constraints associated with father nodes have priority over the ones associated with their child nodes: 1 − α > 1 − γ, 1 − α > 1 − ε, 1 − β > 1 − γ, 1 − β > 1 − δ, 1 − γ > 1 − θ, 1 − η > 1 − θ, 1 − δ > 1 − ρ, 1 − ε > 1 − ρ, which are equivalent to α < γ < θ, α < ε < ρ, β < δ < ρ, β < γ and η < θ.

Then we have the following general result:

Proposition 3 Let N be a CP-net, Σ be its associated formula base as described above, and > be its associated partial order on symbolic levels. Then ∀ω, ω′ ∈ Ω, v ≻OP v′ iff ω ≻N ω′, where v (resp. v′) is the vector of satisfaction levels associated with ω (resp. ω′).

Example 5 (Example 1 continued) Let us consider again the two alternatives ω : Vw Pb Sr Cw and ω′ : Vw Pw Sr Cw. We have vω = (α, 1, 1, 1, δ, 1, θ, 1) and vω′ = (α, β, 1, 1, 1, 1, θ, 1). Then vω ≻OP vω′ since vω and vω′ reduce to (1, δ) and (β, 1) following the discrimin criterion, i.e. d(vω) = (1, δ) and d(vω′) = (β, 1). Now since δ > β, (1, δ) and (β, 1) can be reordered into do(vω) = (δ, 1) and do(vω′) = (β, 1) such that we have δ > β and 1 ≥ 1. We can check that Vw Pb Sr Cw ≻N Vw Pw Sr Cw.

Generally speaking, the proposed approach allows us to add any further constraint between priority levels, which may privilege a particular child node if desirable, or express, as in TCP-nets [7], a conditional relative importance of the satisfaction of a particular requirement over another. Indeed a contextual preference in favor of a variable attached to a node can be expressed in our framework by means of additional constraints between symbolic levels.
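As an illustration of these three steps, the following Java sketch (hypothetical names; the clause and constraint lists are simply the data of Example 4 above rather than being derived from an arbitrary CP-net) shows the shape of the encoding:

```java
import java.util.*;

/** Sketch (hypothetical names) of the Section 5 encoding: one weighted
 *  clause per CPT line (step 1), gathered into Sigma (step 2), plus
 *  "father > child" priority constraints in matching contexts (step 3). */
public class CpNetEncoding {

    record WeightedClause(String clause, String level) {}

    public static void main(String[] args) {
        List<WeightedClause> sigma = List.of(
            new WeightedClause("Vb", "1-alpha"),
            new WeightedClause("Pb", "1-beta"),
            new WeightedClause("¬Vb ∨ ¬Pb ∨ Sr", "1-gamma"),
            new WeightedClause("¬Vw ∨ ¬Pw ∨ Sr", "1-eta"),
            new WeightedClause("¬Vw ∨ ¬Pb ∨ Sw", "1-delta"),
            new WeightedClause("¬Vb ∨ ¬Pw ∨ Sw", "1-epsilon"),
            new WeightedClause("¬Sr ∨ Cr", "1-theta"),
            new WeightedClause("¬Sw ∨ Cw", "1-rho"));

        // Step 3: a father's clause dominates a child's clause whose
        // context mentions one of the father's values.
        List<String> constraints = List.of(
            "1-alpha > 1-gamma", "1-alpha > 1-epsilon",   // V over S
            "1-beta > 1-gamma",  "1-beta > 1-delta",      // P over S
            "1-gamma > 1-theta", "1-eta > 1-theta",       // S over C
            "1-delta > 1-rho",   "1-epsilon > 1-rho");
        System.out.println(sigma.size() + " clauses, "
                           + constraints.size() + " constraints");
    }
}
```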
6 Conclusion
The paper has proposed an encoding of conditional preferences by means of classical logic formulas associated with symbolic priority levels, in a possibilistic logic manner. It has led to the definition of a natural partial order that is always more cautious than the corresponding partial order obtained with a CP-net approach. Moreover, adding constraints between symbolic priority levels has enabled us to recover the CP-net partial order exactly (although, as explained in the paper, the strict preferences found with the CP-net approach but not with ours are debatable). The approach can benefit from the
existence of a computationally tractable inference procedure in possibilistic logic with partially ordered symbolic levels [3]. Besides, it is worth noticing that the representation obtained looks similar to a hybrid possibilistic Bayesian-like network [4], since each node of the graphical structure reflecting the conditional preferences is associated with a set of constraints encoded by possibilistic logic-like formulas. The precise linkage between the representation presented in this paper and hybrid possibilistic networks is a topic for further research. Lastly, the proposed approach might be applied to the management of preference queries addressed to a database, for rank-ordering the answers according to their amounts of violation of the conditional preferences associated with the queries, and would thus contribute to an active database research trend [8, 15].
REFERENCES
[1] S. Benferhat, D. Dubois, S. Kaci, and H. Prade, 'Bridging logical, comparative, and graphical possibilistic representation frameworks', in ECSQARU, pp. 422–431, (2001).
[2] S. Benferhat, D. Dubois, and H. Prade, 'Representing default rules in possibilistic logic', in KR, pp. 673–684, (1992).
[3] S. Benferhat and H. Prade, 'Encoding formulas with partially constrained weights in a possibilistic-like many-sorted propositional logic', in IJCAI, pp. 1281–1286, (2005).
[4] S. Benferhat and S. Smaoui, 'Hybrid possibilistic networks', International Journal of Approximate Reasoning, 44(3), 224–243, (2007).
[5] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, and D. Poole, 'CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements', Journal of Artificial Intelligence Research, 21, 135–191, (2004).
[6] C. Boutilier, R.I. Brafman, H.H. Hoos, and D. Poole, 'Reasoning with conditional ceteris paribus preference statements', in UAI, pp. 71–80, (1999).
[7] R.I. Brafman and C. Domshlak, 'Introducing variable importance tradeoffs into CP-nets', in UAI, pp. 69–76, (2002).
[8] J. Chomicki, 'Database querying under changing preferences', Annals of Mathematics and Artificial Intelligence, 50(1-2), 79–109, (2007).
[9] S. Coste-Marquis, J. Lang, P. Liberatore, and P. Marquis, 'Expressive power and succinctness of propositional languages for preference representation', in KR, pp. 203–212, (2004).
[10] C. Domshlak, F. Rossi, K.B. Venable, and T. Walsh, 'Reasoning about soft constraints and conditional preferences: complexity results and approximation techniques', in IJCAI, pp. 215–220, (2003).
[11] C. Domshlak, F. Rossi, K.B. Venable, and T. Walsh, 'Reasoning about soft constraints and conditional preferences: complexity results and approximation techniques', in IJCAI, pp. 215–220, (2003).
[12] D. Dubois, H. Fargier, and H. Prade, 'Beyond min aggregation in multicriteria decision: (ordered) weighted min, discri-min, leximin', in The Ordered Weighted Averaging Operators – Theory and Applications (R.R. Yager, J. Kacprzyk, eds.), pp. 181–192, Kluwer Acad. Publ., (1997).
[13] D. Dubois, J. Lang, and H. Prade, 'Possibilistic logic', in Handbook of Logic in Artificial Intelligence and Logic Programming, 439–513, (1994).
[14] R. Gérard, S. Kaci, and H. Prade, 'Ranking alternatives on the basis of generic constraints and examples – a possibilistic approach', in IJCAI, pp. 393–398, (2007).
[15] A. HadjAli, S. Kaci, and H. Prade, 'Database preferences queries – a possibilistic logic approach with symbolic priorities', in FoIKS, pp. 291–310, (2008).
[16] S. Kaci and H. Prade, 'Relaxing ceteris paribus preferences with partially ordered priorities', in ECSQARU, pp. 660–671, (2007).
[17] J. Pearl, 'System Z: A natural ordering of defaults with tractable applications to default reasoning', in Proceedings of the 3rd Conference on Theoretical Aspects of Reasoning about Knowledge (TARK'90), pp. 121–135, (1990).
[18] N. Wilson, 'An efficient upper approximation for conditional preference', in ECAI, pp. 472–476, (2006).
7. Distributed and Multi-Agents Systems
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-383
Interaction-Oriented Agent Simulations: From Theory to Implementation

Yoann Kubera and Philippe Mathieu and Sébastien Picault1

Abstract. This paper deals with the software architecture of individual-centered simulations, i.e. simulations involving many entities interacting together. Many software architectures have been developed in this context, especially many advanced – but domain-specific – frameworks. Yet those frameworks imply tight software dependencies between agents, behaviors and action selection mechanisms, which leads to many difficulties in modelling and programming. We propose a method and an architecture where interactions are reified regardless of agents, in order to obtain a complete interaction-oriented design process for simulations. An agent is then only an entity that can perform or undergo a set of interactions, even ones not specifically developed for it. Thus most interactions can be re-used in many contexts. In addition, our method clearly separates knowledge about behaviors from its processing, and thus makes the design of simulations easier. Moreover, this new and user-friendly approach helps programmers to build simulations with a large number of different behaviors at the same time, especially in the context of large-scale simulations.
1 Introduction
In recent years, agent-based simulation has become preponderant among tools for simulating living beings, either to understand their mechanisms or to copy them for leisure purposes (video games, animation in films, etc.). It links up experts from specific domains (biology, sociology, etc.) and from computer science. Its multidisciplinary nature has given birth to more-or-less domain-specific platforms. A large subset of those – like Swarm [4], Madkit [6] or Magique [2] – are open, and thus enable the user to freely implement agents, behaviors and environments. They offer different levels of software refinement and allow the use of many engineering tools – design patterns, components, inheritance, etc. Moreover, the platforms cited above are not only dedicated to simulations, but can also be used to build agent-based applications. Others – like Netlogo [12] – are based on a simple programming paradigm designed for non-computer-scientists. The generic aspect of all those open platforms is obtained at the expense of a formal way to guide the design of behaviors. Data is indeed mixed with its processing – i.e. the action selection mechanism is mixed with the behavior representation – which implies a complete reimplementation of the agent when adding or deleting an interaction in which it is involved. On the opposite, many formalisms – like Petri nets, subsumption, rule sets, artificial neural networks – may strongly
University of Lille, France, email: name.surname@lifl.fr
guide the agent architecture, at the expense of reusability in other formalisms. Some of the rare ones that make behavior reuse possible are cognitive architectures with plans, like Act-R [1], where knowledge is separated from its processing. However, they are often fitted neither to build multiagent simulations, because of their poor performance, nor to design reactive agents. In order to build reusable and generic behaviors, we promote in this paper the Interaction-Oriented Design of Agent simulations (IODA) formal method and architecture, based on the works of [9, 8]. It consists in abstracting from the agents the actions they participate in, by reifying them into the notion of interaction. An agent may perform or undergo a set of interactions which are not specifically developed for it; thus most interactions can be re-used in many contexts. In addition, this architecture clearly separates data from processing, and thus makes the design of simulations easier. We also describe the Java Environment for the Design of agent Interactions (JEDI) platform, which is a Java implementation of IODA for simulations with reactive situated agents. The second section contains a brief introduction to related work on generic agent behavior architectures. The third section describes the IODA methodology and its advantages – like the separation between data and its processing, interaction libraries, or large-scale simulation construction. The fourth section presents the generic features of IODA concepts, through an easy-to-customize simulation platform called JEDI. Eventually, the last section concludes about IODA and JEDI.
2 Additional Related Work
Research on multiagent systems and on agent design is very active, and many generic agent description models do exist. Formal description methods and generic architectures for agent behavior can be examined from two points of view. The first one is about interaction design: the ways agents communicate with each other are extracted from their model into abstract communication patterns and protocols. Generally, this abstraction is limited to the model design step, and the interaction protocol and the agent's behavior are mixed together during implementation – as in JADE, AgenTalk, Swarm, etc. This leads to decreased maintainability due to the dispersal of the protocol's implementation. As proposed in [5], one solution is to abstract the interaction protocol from the agents, and then reify it as a single entity defined by roles and message sequences, which uses functionalities that agents implement on their own according to their role.
The second one is about the agents' behavior itself. Many generic methodologies stop at the formal specification of a simulation, giving place at worst to implementation errors and at best to mixing data (i.e. the actions an agent can perform) and its processing (i.e. the selection of an action given a particular valuation of the global state of the simulation).

Definition 1 The global state of the simulation is the union of the set of all states of the environment and the states of all agents in the environment.

Formal methods and architectures allow keeping the separation between data and processing with agent-independent actions, as in [3], where actions are agent-independent components, so that the behavior of an agent is defined by a set of interconnected components. This kind of solution is well suited to complex action scheduling, but the connectivity of these components decreases the maintainability of the agents, especially if their behavior changes during the simulation, or if the simulation uses a large-scale knowledge representation.

Definition 2 A simulation is called a large scale simulation if its environment contains a great number of agents (namely, simulation with large scale computations) or if it contains a large number of agents with different behaviors and a large number of actions per agent (namely, simulation with large scale knowledge representation).

In the following sections, we propose a formal method and an architecture providing the advantages of both interaction reification and separation between knowledge and processing, fitting large scale knowledge representation requirements with a homogeneous design of agents and interactions.
3 The IODA Methodology
In general, a communication protocol is used to describe a particular abstract process involving many agents, for instance "to exchange goods". In order to build reusable and generic behaviors, we present in this paper the IODA formal method and architecture. It relies on a homogeneous representation of the actions performed by agents, called Interaction, close to Norman's concept of design/perceived affordance [11]. This formal representation is suited to represent actions involving only one agent as well as complex actions involving many communicating agents.
3.1 An Interaction-centered Methodology
The behavior of an agent is defined by a specific arrangement of semantic blocks called interactions (see § 3.5). An interaction is itself a set of primitives simultaneously involving a fixed number of agents, which describes how and under what kind of conditions agents may interact with each other or with the environment. An agent owns a set of perception primitives – used to get information from the global state of the simulation – and a set of action primitives – used to change this global state (change the environment's, another agent's or its own local state). These are the atomic elements of interactions.

Definition 3 An Interaction is a structured set of action primitives involving simultaneously a fixed number of agents.
An interaction can occur only if the activation conditions – a boolean expression of perception primitives – are met.

Definition 4 Agents involved in an interaction generally do not play the same role. We make a difference between Source agents, that may perform the interaction, and Target agents, that may undergo it.

As described in Def. 3, an interaction sets the logical sequence of primitives required to make agents interact. These primitives may be implemented differently according to the agents' specificities. As a consequence, this leads to a more enhanced and easier-to-use polymorphism in agent behavior compared to other agent architectures like [3], where close behaviors cannot be expressed without complex means. An interaction is not agent-dependent and may be re-used in other simulations. Thus, building simulations leads to the construction of interaction and agent libraries, and facilitates further simulation design.
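To fix ideas, here is a rough Java sketch of what such a reified interaction might look like; the class and primitive names are ours, not the actual JEDI API, and the Eat example only assumes that the participating agents expose the needed perception and action primitives:

```java
import java.util.List;

/** Hypothetical sketch of a reified interaction (not the actual JEDI
 *  classes): an activation condition built from perception primitives,
 *  and a sequence of action primitives, both independent of any
 *  particular agent family. */
abstract class Interaction {
    /** Activation conditions: a boolean expression of perception primitives. */
    abstract boolean condition(Primitives source, List<Primitives> targets);
    /** The structured set of action primitives performed when triggered. */
    abstract void actions(Primitives source, List<Primitives> targets);
}

/** Hypothetical primitives that each agent family may implement in its own way. */
interface Primitives {
    boolean isHungry();        // perception primitive
    boolean isEdible();        // perception primitive
    void gainEnergy(double e); // action primitive
    void die();                // action primitive
}

/** A reusable Eat interaction: any source/target exposing the
 *  primitives above can participate, whatever their families. */
class Eat extends Interaction {
    @Override boolean condition(Primitives src, List<Primitives> tgt) {
        return src.isHungry() && tgt.get(0).isEdible();
    }
    @Override void actions(Primitives src, List<Primitives> tgt) {
        src.gainEnergy(1.0); // implementation-dependent amount
        tgt.get(0).die();
    }
}
```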
3.2 IODA Agents
In IODA, agents follow a simple architecture which makes it possible to design homogeneously agents with different specificities in the same simulation.

Definition 5 An agent x is an autonomous entity of a simulation. Its minimal specification:
• has properties;
• has a local state, which is a valuation of its properties;
• implements a set of action and perception primitives;
• perceives other agents and the state of the environment only in a subset of the environment H(x) called halo; the set N(x) of agents present in H(x) is called its neighborhood;
• is assigned a set of interactions it can perform or undergo (see § 3.5);
• implements an interaction selection process (see § 3.7).
Definition 6 An agent family (or agents equivalence class, or agent class) is an abstract set of agents, in which all agents share all or part of their properties, action or interaction primitives, or behavior.

From this point on, if S ∈ F, x ∈ S means that x is an agent from the S agent family. An IODA agent is not restricted to a particular kind of agent. Programmers may freely define a cognitive or reactive interaction selection process, reactive or cognitive perception primitives, and more or less complex neighborhood computations. Besides, the neighborhood computation apart, the interaction selection process is independent of the environment's topology, and needs only a notion of distance between agents.
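The minimal specification of Definition 5 can be sketched as follows in Java (hypothetical classes, not the JEDI ones); only the halo/neighborhood part is made explicit here:

```java
import java.util.*;

/** Hypothetical sketch of Definition 5 (not the actual JEDI classes):
 *  an agent with a local state, a halo, and a neighborhood derived
 *  from it. */
abstract class IodaAgent {
    // Local state: a valuation of the agent's properties.
    final Map<String, Object> properties = new HashMap<>();

    /** Halo H(x): the part of the environment the agent perceives. */
    abstract Set<Position> halo();

    abstract Position position();

    /** Neighborhood N(x): the agents currently inside the halo. */
    List<IodaAgent> neighborhood(Collection<IodaAgent> all) {
        List<IodaAgent> n = new ArrayList<>();
        for (IodaAgent a : all)
            if (a != this && halo().contains(a.position())) n.add(a);
        return n;
    }
}

record Position(int x, int y) {}
```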
3.3 Interactions and cardinality
As its name implies, an interaction may occur between a source agent and a target agent. However, complex problems need to define other situations, like the interaction of an agent with itself (to sleep, to think) or with the environment (to move, to die). Even more complex situations may occur, where interactions involve more than one source or target (for instance to burst, involving many casualties). Cardinality (see Def. 7) unifies those notions.
Y. Kubera et al. / Interaction-Oriented Agent Simulations: From Theory to Implementation
385
Definition 7 The cardinality of an interaction I is the pair (cardS(I), cardT(I)), where cardS(I) (resp. cardT(I)) is the number of source agents (resp. target agents) involved in the interaction. Particular interactions where an agent interacts with itself or with the environment, i.e. with T = ∅, are called degenerate interactions.
Definition 8 An interaction I is in normal form if and only if cardS(I) = 1.

It has been shown that any interaction can be expressed in normal form [8]. Thus, in the following sections of this paper, interactions are supposed to be in normal form, mainly for complexity reasons [8].

3.4 Problem analysis

In addition to the formal specification of simulations, IODA provides a set of algorithms to go from model analysis to concrete implementation. Those algorithms are demonstrated in the JEDI platform (see § 4) in the context of reactive and situated agents, but could also be implemented for any other kind of multi-agent system. According to our methodology, the design of a simulation follows 5 steps:
1. Identify all agent families as well as all interactions of the simulation. This leads to the definition of a matrix between source agents and target agents containing interactions. This step is called "assignation of interactions to source and target agents".
2. Define all primitives needed to write the activation conditions and the action sequence of the interactions.
3. Identify the action and perception primitives that will be implemented by each agent family, and how they will be implemented.
4. Define for each assigned interaction I a priority p(I) and a limit distance d(I) (see § 3.7). This implies refining the initial matrix.
5. Define how the matrix evolves during the simulation, i.e. whether agents can change their own or others' behavior by changing a line or a column of the matrix.

To help the design of simulations, the assignation of interactions to source and target agents is summarized into a matrix called the Interaction Matrix.
3.5 The Interaction Matrix
Agents may interact only if target agents are present in the neighborhood of the source agent, but interaction is also constrained by a limit distance. Indeed, seeing a target does not mean that a source agent may perform the interaction to slap with it: it has to be close enough to the target, and this distance depends on the source agent's properties. This notion is independent of grid-like environments: it may be a Minkowski distance as well as a social distance, etc. Additionally, every assigned interaction is endowed with a priority, so as to build a hierarchy between them from the viewpoint of the source agent, which is used in the interaction selection process (see § 3.7). These priorities may be constant or dynamic, depending on the nature of the source agent.

Definition 9 The assignation aS/T of an interaction set (Ij)j∈[1,n] between a source agent family S and a set of target agent families T describes the set of interactions that agents belonging to S may perform as sources together with sets of agents from T as targets. It is defined by a set of tuples (Ij, pj, cj, dj)j∈[1,n], named assignation elements, where:
• Ij is an interaction that S can perform and all x ∈ T can undergo;
• pj is the priority of this assignation of interaction Ij;
• cj is the interaction's cardinality (i.e. the number of awaited targets);
• dj is the limit distance allowed between S and all x ∈ T so that S may perform the interaction with T.
N.B.: Elements of the assignation aS/∅ of degenerate interactions are (Ij, pj) pairs.
Definition 10 If F is the set of all agent families in a simulation, then the interaction matrix of the simulation is the set M = (aS/T)S∈F,T⊆F of all assignations between all relevant source agent families S and target agent family sets T, according to the behaviors to be modeled (see Fig. 1).
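A minimal Java sketch of Definitions 9 and 10 (our own hypothetical types, not JEDI's): an assignation element is just the tuple (Ij, pj, cj, dj), and the matrix maps a (source family, target families) cell to its assignation:

```java
import java.util.*;

/** Hypothetical sketch of Definitions 9-10 (not the actual JEDI API). */
record AssignationElement(String interaction, int priority,
                          int cardinality, double limitDistance) {}

class InteractionMatrix {
    // One cell a_{S/T} per (source family, set of target families),
    // keyed here by a string such as "Wolf/Sheep" or "Grass/∅".
    private final Map<String, List<AssignationElement>> cells = new HashMap<>();

    void add(String cell, AssignationElement e) {
        cells.computeIfAbsent(cell, k -> new ArrayList<>()).add(e);
    }

    /** The assignation a_{S/T}: empty if the cell was never filled. */
    List<AssignationElement> assignation(String cell) {
        return cells.getOrDefault(cell, List.of());
    }
}
```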
3.6 Agent libraries
Because agents from different families may have some similar behaviors, agents from an A agent family may be a particular subset of a B agent family. Thus, if M = (aS/T)S∈F,T⊆F is the interaction matrix of a simulation, S and T may be abstract sets of agent families like groups, teams, etc. We define a particular algebra to specify the relations between agent families, especially how they share their assignation elements, through 3 matrix modification operators:

Definition 11 Let F be the set of all agent families.
• The specialization of an agent family X by an agent family Y is noted Y : X. It means that agents of the Y family inherit all assignation elements, perception processes, primitives and properties of the X family.
• The addition of an assignation element e with source agent family S ∈ F and target agent families T ⊆ F to the interaction matrix is noted +(aS/T, e).
• The suppression of an inherited assignation element e with source agent family S ∈ F and target agent families T ⊆ F is noted −(aS/T, e).
• The modification of an inherited assignation element e = (I, p, c, d) with source agent family S ∈ F and target agent families T ⊆ F is noted ∗(aS/T, e, I′, p′, c′, d′).
• The modification of an inherited assignation element e = (I, p) with source agent family S ∈ F and target agent families T ⊆ F is noted ∗(aS/T, e, I′, p′).

Property 1 Let F be the set of all agent families, X, S, Y ∈ F, T ⊆ F, e an assignation element, I, I′ two interactions, d, d′ ∈ R and c, c′, p, p′ ∈ N.
• Generally, (Y : X) ⇒ ∀T ⊆ F, aX/T ⊆ aY/T
• +(aS/T, e) ⇒ e ∈ aS/T
• −(aS/T, e) ⇒ e ∉ aS/T
• ∗(aS/T, (I, p, c, d), I′, p′, c′, d′) ⇒ ((I, p, c, d) ∉ aS/T ∧ (I′, p′, c′, d′) ∈ aS/T)
source \ target         ∅                   Grass          Sheep           Goat            Wolf
Grass                   +(Grow;0)
Animal                  +(Die;3) +(Move;0)
Herbivore                                   +(Eat;2;1;0)
Sheep:Animal,Herbivore                                     +(Breed;1;1;1)
Goat:Animal,Herbivore                                                      +(Breed;1;1;1)
Wolf:Animal             *((Die;3),Die,4)                   +(Eat;2;1;0)    +(Eat;3;1;0)    +(Breed;1;1;1)

Figure 1. Example of an interaction matrix for a predator/prey simulation with 4 species. The '∅' column contains degenerate interactions. In this example, the '+' operator uses either one integer representing the degenerate assignation element's priority, or three integers representing the assignation element's priority, its cardinality and its limit distance. The '∗' operator, in this case, is used to modify the priority of the inherited "Die" interaction for wolves.
• ∗(aS/T, (I, p), I′, p′) ⇒ ((I, p) ∉ aS/T ∧ (I′, p′) ∈ aS/T)

In the interaction matrix, a cell is the intersection of a line, corresponding to the interactions that an agent of the S family can perform, and a column, corresponding to the interactions that a set of agents of T families can undergo. Thus aS/T is implicit in the operators used in the matrix of Fig. 1. Such a formalism is platform-independent, especially the specialization notion, whose meaning changes along with the programming language: inheritance for an object language, kind-of in a frame language, etc.
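A sketch of how these operators can act on one line of the matrix (hypothetical Java, with elements reduced to (name, priority) pairs as for degenerate interactions):

```java
import java.util.*;

/** Hypothetical sketch of the Definition 11 operators on one matrix
 *  line: specialization copies the parent family's elements, '-' and
 *  '*' then suppress or modify inherited ones. */
class FamilyLine {
    // target families -> assignation elements, here just (name, priority)
    final Map<String, List<String[]>> line = new HashMap<>();

    /** Y : X — start from a copy of the parent family's line. */
    static FamilyLine specialize(FamilyLine parent) {
        FamilyLine y = new FamilyLine();
        parent.line.forEach((t, es) -> y.line.put(t, new ArrayList<>(es)));
        return y;
    }

    void add(String target, String name, String priority) {       // +
        line.computeIfAbsent(target, k -> new ArrayList<>())
            .add(new String[]{name, priority});
    }

    void suppress(String target, String name) {                    // -
        line.getOrDefault(target, new ArrayList<>())
            .removeIf(e -> e[0].equals(name));
    }

    void modify(String target, String name, String newPriority) {  // *
        suppress(target, name);
        add(target, name, newPriority);
    }
}
```

With such a structure, the Wolf row of Fig. 1 would be obtained by specializing the Animal row and then calling modify("∅", "Die", "4").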
3.7 Interaction Selection Basics
The core of an agent's behavior is the interaction selection process (see Def. 12). This process checks whether activation conditions are met, finds targets to interact with, selects a particular set of targets, considers interactions with the correct priorities, and finally performs the sequence of actions.

Definition 12 Interaction selection is the process an agent uses in order to select an interaction to perform (i.e. as a source) on particular targets, given a particular valuation of the global state of the simulation.

Both the eligibility (syntactic) criterion and the realizability (semantic) criterion, as well as the interaction potential set, are defined in this section to help the census of all possible interactions for a source agent x.

Definition 13 Let dist(x, y) be the distance between two agents x and y. The assignation element e = (Ij, pj, cj, dj) is said to be eligible for the source agent x and the set Targ of target agents – written eligible(e, x, Targ) – if and only if e ∈ ax/Targ and (cardT(Ij) ≠ 0 ⇒ ∀y ∈ Targ, y ∈ N(x) ∧ dist(x, y) ≤ dj).
Definition 14 Let cond(I, x, Targ) be the activation conditions of the interaction I applied to the source agent x and the set of target agents Targ. The assignation element e = (Ij, pj, cj, dj) is said to be realizable for the source x and the set Targ of targets – written realizable(e, x, Targ) – if and only if: eligible(e, x, Targ) ∧ cond(Ij, x, Targ).

Definition 15 The "p-level interaction potential" of an agent x – written Pp(x) – is the set of all realizable assignation elements with x as a source, for any target set: Pp(x) = {(e, T), e = (Ie, pe, ce, de) | T ⊆ N(x) ∧ p = pe ∧ realizable(e, x, T)}.

As a consequence, interaction selection is the process where an agent x selects an element from Ppmax(x), where pmax is the highest priority such that Ppmax(x) ≠ ∅. All the definitions and properties given in this section are platform-independent. Their implementation in a specific programming language implies many choices. We propose in the following section a possible implementation of IODA concepts in the Java language.
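The selection mechanism itself can be sketched as the search for the highest-priority non-empty potential (hypothetical Java; the realizable callback stands for the conjunction of Definitions 13 and 14):

```java
import java.util.*;

/** Hypothetical sketch of Definitions 13-15 (not the actual JEDI code):
 *  collect the realizable (element, target set) pairs of the highest
 *  priority p such that P_p(x) is non-empty. */
class InteractionSelection {

    record Elem(String interaction, int priority, int cardinality, double limitDistance) {}

    /** Stands for eligible(e, x, Targ) ∧ cond(e, x, Targ). */
    interface Realizable { boolean test(Elem e, List<String> targets); }

    static List<Map.Entry<Elem, List<String>>> highestPotential(
            List<Elem> assigned, List<List<String>> targetSets, Realizable realizable) {
        List<Map.Entry<Elem, List<String>>> best = new ArrayList<>();
        int bestP = Integer.MIN_VALUE;
        for (Elem e : assigned)
            for (List<String> t : targetSets)
                if (t.size() == e.cardinality() && realizable.test(e, t)) {
                    if (e.priority() > bestP) { best.clear(); bestP = e.priority(); }
                    if (e.priority() == bestP) best.add(Map.entry(e, t));
                }
        return best; // the agent then picks one pair, e.g. at random
    }
}
```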
4 From Methodology to Implementation
The JEDI platform implements the formal concepts defined in IODA, which means there is a univocal path from problem analysis in IODA to implementation in JEDI. Besides, this transition between model and implementation is automated by a generator called JEDI-Builder. Note that JEDI is more a proof of the usefulness of IODA concepts than a regular simulation platform: the IODA methodology may be implemented in other languages, as we did in Netlogo.
4.1 Implementation Choices
Implementation choices define the scope of simulation models supported by a platform. Their consequences are discussed in [7]; therefore this section does not argue in detail about the reasons for those choices. In JEDI, these are:
• Interaction cardinality is restricted.
• Simulation is in discrete time.
• Situated: simulation takes place in a two-dimensional grid.
• Everything is an agent, which allows a uniform treatment of things (called artifacts, objects, tools, patches, etc.) and "true" agents at implementation.
Definition 16 An agent is said to be active if it can perform at least one interaction. Otherwise it is said to be passive.

In JEDI, the only difference between passive and active agents lies in the interaction matrix. This homogeneous representation of agents makes the transition of agents between passive and active easier. Interactions are reified in a Java abstract class called Interaction. Each agent family S ∈ F – represented by a class inheriting from Agent – contains a set canPerform which is a part of the interaction matrix. It is defined such that ∀x ∈ S, canPerform(x) = {aS/T, ∀T ⊆ F}. Thus each line of the interaction matrix is defined in an agent family. The abstract class Moteur is the core of the simulation, where the run() method executes the main algorithm of the simulation, i.e. performs every step of the simulation (see Fig. 2).
Let A be the set of agents in the environment and Aact ⊆ A the set of active agents.
1. Reorder Aact according to a particular criterion (see Sect. 4.2), for instance a random order (equitable choice);
2. Set all agents in A operative;
3. For each operative agent a ∈ Aact do:
(a) Define the part of the environment H(a) perceived by a;
(b) Define the set of all neighboring agents N(a), and remove from it all non-operative agents;
(c) Let p = maximal priority in canPerform(a);
(d) Compute Pp(a); while Pp(a) = ∅, decrement p and compute again;
(e) If p = 0 and P0(a) = ∅, then a cannot perform any interaction. It remains operative but ends its simulation step;
(f) Else, select an element from Pp(a), i.e. an element ((I, p, c, d), Targ) containing an assignation element and a set of target agents, using the interaction selection process of the agent; for instance a random choice;
(g) Perform the interaction I with a as source and Targ as targets;
(h) Deactivate a and all agents in Targ.

Figure 2. Algorithm of a simulation step.
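A condensed Java sketch of this loop (our own hypothetical generic engine, not the actual Moteur class; the abstract hooks stand for steps (a)-(g)):

```java
import java.util.*;

/** Hypothetical sketch of the step of Figure 2 (not the actual JEDI
 *  engine). The hooks neighborhood(), select() and perform() are
 *  assumptions standing for steps (a)-(g) of the figure. */
abstract class EngineSketch<A> {

    record Choice<A>(String interaction, A source, List<A> targets) {}

    abstract List<A> neighborhood(A agent, Set<A> operative);        // (a)-(b)
    abstract Optional<Choice<A>> select(A agent, List<A> neighbors); // (c)-(f)
    abstract void perform(Choice<A> choice);                         // (g)

    void step(List<A> active, Set<A> all, Random rng) {
        Collections.shuffle(active, rng);      // 1. equitable random order
        Set<A> operative = new HashSet<>(all); // 2. all agents operative
        for (A a : active) {
            if (!operative.contains(a)) continue;
            Optional<Choice<A>> c = select(a, neighborhood(a, operative));
            if (c.isEmpty()) continue;              // (e) nothing realizable
            perform(c.get());                       // (g)
            operative.remove(a);                    // (h) deactivate source...
            operative.removeAll(c.get().targets()); // ...and its targets
        }
    }
}
```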
4.2 JEDI Tuning
In order to build simulations with large-scale computations, the programmer has to control the complexity of many parts of the simulation platform in order to find a tradeoff between performance and implementation bias. JEDI's modular decomposition defines a set of parameters for this purpose:
• An agent's halo H(x) may be defined at will as a set of cells.
• The "p-level interaction potential" computing complexity (3d in Fig. 2) may be reduced if needed, though this may introduce a bias in the evaluation order of assignation elements and target sets; for instance, a census of only one target set Targ per assignation element e.
• The interaction selection process may easily be customized by writing how to select an element from Pp(x).
• Pseudo-parallelism may be tuned by the order according to which agents are evaluated (1 in Fig. 2), knowing what kinds of bias are introduced [10].
• The interaction matrix is an object shared between agents when it is not modified during the simulation.
5 Conclusion
Designing a simulation is the art of finding a tradeoff between model precision – in order to implement the model without any ambiguity – and model universality – in order to easily implement it on any simulation platform. Most simulation platforms neglect one of those points and sometimes do not even clearly define the model they use. In this paper we have presented a formal method and an architecture for the design of multiagent simulations, called IODA, which uses a homogeneous representation of the actions performed by agents, named Interaction. Actions involving a single agent and complex actions involving many communicating agents are both represented with the same formalism. As a consequence, the interaction selection process is also
the same for all agents, and can be defined independently from both agents and interactions. Knowledge and processing are not mixed; therefore the user is able to build reusable agent and interaction libraries along with simulations. Moreover, the interaction matrix helps to design simulations with large-scale knowledge representation, and to build automatically the corresponding implementation through a code generator. The JEDI simulation platform provides a simple implementation tool for IODA models, and defines an interaction selection process suitable to reactive, cognitive or any other kind of agents. In addition, it points up a set of parameters that can be tuned at will. This aims at controlling implementation bias when adapting the complexity of the platform to match large-scale computation requirements.
Acknowledgements This research is supported by the FEDER and the "Contrat Plan État Région TAC" of Nord-Pas de Calais.
REFERENCES
[1] J. R. Anderson, D. Bothell, M. D. Byrne, S. Douglass, C. Lebiere, and Y. Qin, 'An integrated theory of the mind', Psychological Review, 111(4), (2004).
[2] Nourredine Bensaid and Philippe Mathieu, 'A hybrid and hierarchical multi-agent architecture model', in Proceedings of the Second International Conference and Exhibition on the Practical Application of Intelligent Agents and Multi-Agent Technology, London, UK, (April 1997).
[3] Jean-Pierre Briot, Thomas Meurisse, and Frédéric Peschanski, 'Une expérience de conception et de composition de comportements d'agents à l'aide de composants', L'Objet, Revue des Sciences et Technologies de l'Information, 12(4), (2006).
[4] R. Burkhart, 'The Swarm multi-agent simulation system', in Position Paper for OOPSLA'94 Workshop on 'The Object Engine', (1994).
[5] Takuo Doi, Yasuyuki Tahara, and Shinichi Honiden, 'IOM/T: an interaction description language for multi-agent systems', in AAMAS'05: Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, ACM, (2005).
[6] Olivier Gutknecht, Jacques Ferber, and Fabien Michel, 'Integrating tools and infrastructures for generic multi-agent systems', in Proceedings of the Fifth International Conference on Autonomous Agents, eds., Jörg P. Müller, Elisabeth Andre, Sandip Sen, and Claude Frasson, Montreal, Canada, (2001). ACM Press.
[7] Yoann Kubera, Philippe Mathieu, and Sébastien Picault, 'La complexité dans les simulations multi-agents', in Actes des Journées Francophones sur les Systèmes Multi-Agents (JFSMA'07), Cépaduès-Editions, Carcassonne, France, (2007).
[8] Philippe Mathieu, Sébastien Picault, and Jean-Christophe Routier, 'Donner corps aux interactions (l'interaction enfin concrétisée)', in Actes de la conférence MFI'07, Paris, France, (2007).
[9] Philippe Mathieu, Jean-Christophe Routier, and Pascal Urro, 'Un modèle de simulation agent basé sur les interactions', in Actes des Premières Journées Francophones sur les Modèles Formels de l'Interaction (MFI'01), Toulouse, France, (2001).
[10] Fabien Michel, Jacques Ferber, and Olivier Gutknecht, 'Generic simulation tools based on MAS organization', in Proceedings of the 10th European Workshop on Modelling Autonomous Agents in a Multi Agent World MAMAAW'2001, Annecy, France, (2001).
[11] Donald A. Norman, The Psychology of Everyday Things, Basic Books, 1988.
[12] Uri Wilensky, 'Netlogo', Technical report, Center for Connected Learning and Computer-Based Modeling, (1999).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-388
Optimal Coalition Structure Generation In Partition Function Games

Tomasz Michalak, Andrew Dowell, Peter McBurney and Michael Wooldridge
Department of Computer Science, The University of Liverpool, L69 3BX
Email: {tomasz, adowell, mcburney, mjw}@liv.ac.uk

Abstract. In multi-agent systems (MAS), coalition formation is typically studied using characteristic function game (CFG) representations, where the performance of any coalition is independent of co-existing coalitions in the system. However, in a number of environments, there are significant externalities from coalition formation, where the effectiveness of one coalition may be affected by the formation of other distinct coalitions. In such cases, coalition formation can be modeled using partition function game (PFG) representations. In PFGs, to accurately generate an optimal division of agents into coalitions (the so-called CSG problem), one would have to search through the entire space of coalition structures since, in the general case, one cannot predict the values of the coalitions affected by the externalities a priori. In this paper we consider four distinct PFG settings and prove that in such environments one can bound the values of every coalition. From this insight, which bridges the gap between PFG and CFG environments, we modify the existing state-of-the-art anytime CSG algorithm for the CFG setting and show how this approach can be used to generate the optimal CS in the PFG settings.
1 Introduction & Motivation
In multi-agent systems (MAS), coalition formation occurs when distinct autonomous agents group together to achieve something more efficiently than they could accomplish individually. One of the main research issues in co-operative MAS is to determine which division of agents into disjoint coalitions (i.e. a coalition structure (CS)) maximizes the total payoff of the system [12, 10]. To this end, coalition formation is typically studied using characteristic function game (CFG) representations, which consist of a set of agents A and a characteristic function v which takes, as input, all feasible coalitions C ⊆ A and outputs numerical values reflecting how these coalitions perform. Furthermore, it is assumed that the performance of any coalition is independent of co-existing coalitions in the system. In other words, a coalition C has the same value in a structure CS as it does in another distinct structure CS′. Based on this characteristic of CFGs, Rahwan et al. [10] proposed an algorithm that usually generates an optimal CS without searching through the entire space of CSs.
1. The authors are grateful for financial support received from the UK EPSRC through the project Market-Based Control of Complex Computational Systems (GR/T10657/01). The authors are also thankful to Jennifer Mcmanus, School of English, University of Liverpool, for excellent editorial assistance.
In many real-life MAS environments, CFG representations are sufficient to model coalition formation, as the coalitions either do not interact with each other while pursuing their own goals or such interactions are small enough to be neglected. However, in a number of other environments, there are significant externalities from coalition formation (henceforth externalities), where the performance of one coalition may be affected by the formation of another distinct coalition. For example, as more commercial activity moves to the internet, we can expect online economies to become increasingly sophisticated, as is happening, for instance, with real-time electronic purchase of wholesale telecommunications bandwidth or computer processor resources. In such contexts, ad hoc coalition formation will need to allow for coalition externalities, thus rendering the CFG representation inadequate to model coalition formation. In contrast, externalities are accounted for in the partition function game (PFG) representation. A PFG consists of a set of agents A and a partition function which takes, as input, every feasible coalition structure (CS) and, for each coalition in each structure, outputs a numerical value that reflects the performance of the coalition in that structure. Now, the value of a coalition C in a structure CS may not be the same in another distinct structure CS′. This means that it is not generally possible to pre-determine the value of a coalition in a certain CS without actually computing it in this specific CS. Consequently, one must search through the entire space of CSs to guarantee an optimal solution. This presents a major computational challenge as, even for a moderate number of agents, there are billions of structures to search through (for example, for 14 agents there are 190,899,322 CSs and for 15 agents there are 1,382,958,545 CSs). In this paper we contribute to the literature as follows:
• We prove that it is possible to bound the coalition values in two commonly used PFG settings, thus bridging the gap between PFG and CFG environments;
• We show that our theorems regarding bounded values can be used to modify the existing state-of-the-art CSG algorithm for the CFG settings. Consequently, our new algorithm can be applied to generate the optimal CS in these PFG settings;
• Using numerical simulations we demonstrate the effectiveness of our approach which, in a number of cases, is comparable to results obtained for the CFG setting.

Much research effort has been directed at optimal CS generation in the CFG setting. Sandholm et al. [12] proposed a new way to
represent the entire set of CSs in the form of a coalition structure tree. For this representation, they developed an algorithm which generates CS values within a finite bound from the optimal value for the entire system. It initially searches the two lowest rows of the tree and then searches from the top downwards, either until the whole space has been searched or until the running time of the algorithm has expired. Based on this representation, Dang and Jennings proposed a much faster algorithm which, after performing the same initial step as that of Sandholm et al., searches exclusively through particular coalition structures in the remaining space [3]. Nevertheless, both solutions have drawbacks; notably, the worst-case bounds they provide are relatively low, and both algorithms must always search the whole space in order to guarantee an optimal solution. To circumvent these problems, Rahwan et al. recently proposed a more efficient anytime CSG algorithm for the CFG setting [10]. Using a novel representation of the search space, this algorithm is significantly faster than its existing counterparts. The input to the algorithm are coalition lists structured according to the distributed coalitional value calculation (DCVC) algorithm presented in [9]. In contrast, in the field of economics, much research has been directed at coalition formation in PFG settings. Particular efforts have been made towards computing both the Shapley value and the core solution in such settings [7, 4]. Furthermore, PFGs have been used to represent coalition formation in many practical applications, such as fisheries on the high seas [8], fuel emission reduction [5] or Research & Development (R&D) cooperation between firms [2]. Both of the former settings are examples of games with positive externalities, where the decision by one group of countries to reduce fishing activities or fuel emissions may have a positive impact on other countries. In contrast, an R&D cooperation between a group of companies could be modeled as a game with negative externalities, since the market positions of some companies could be hindered by the increased competitiveness resulting from a collusion of other companies. An excellent overview of both CFG and PFG approaches in economics is provided in [1].
2 Partition Function Games
For a set of agents A = {a1, . . . , an} and a coalition C ⊆ A, a PFG generates a non-negative integer value v(C; CS), where CS is a coalition structure of A and C ∈ CS. Following Hafalir [6], a PFG is said to have weak positive externalities if for every three pairwise disjoint subsets C, S, T ⊆ A and for any structure CS′ of A \ (S ∪ T ∪ C):

v(C; {S ∪ T, C} ∪ CS′) ≥ v(C; {S, T, C} ∪ CS′).

In the case where the inequality is ≤, the PFG is said to exhibit weak negative externalities. Intuitively, this property means that a game has positive (respectively, negative) externalities if a merger between two coalitions makes every other coalition better (worse) off. Furthermore, a PFG is weakly super-additive (sub-additive) if for any S, T ⊆ A with S ∩ T = ∅ and any structure CS′ of A \ (S ∪ T):

v(S ∪ T; {S ∪ T} ∪ CS′) ≥ (≤) v(S; {S, T} ∪ CS′) + v(T; {S, T} ∪ CS′).

Intuitively, this means that a PFG is super-additive (sub-additive) if, when two coalitions Ci and Cj in a structure, say CS = {C1, . . . , Ci, . . . , Cj, . . . , Ck}, join together to form a coalition C′ = Ci ∪ Cj, then the value of C′ in the structure CS′ = {C′, C1, . . . , Ci−1, Ci+1, . . . , Cj−1, Cj+1, . . . , Ck} is at least (at most) as large as the sum of the values of Ci and Cj in CS.

Classic results in game theory tell us that for super-additive CFGs (where for any two disjoint coalitions S, T: v(S ∪ T) ≥ v(S) + v(T)) the optimal CS is the grand coalition (i.e. the coalition containing every agent in the system), whereas in sub-additive CFGs (where for any two disjoint coalitions S, T: v(S ∪ T) ≤ v(S) + v(T)) the optimal structure is the CS of singletons, i.e. the structure where all the agents act as individuals.² We now show, with the aid of an example (taken from [6]), that this does not necessarily hold in a super- (sub-) additive PFG setting. Consider the following super-additive PFG for A = {1, 2, 3}, where, in addition, there are negative externalities:
• v((i); {(1), (2), (3)}) = 4 for i = 1, 2, 3;
• v((j, k); {(i), (j, k)}) = 9 and v((i); {(i), (j, k)}) = 1 for all i, j, k ∈ A with i ≠ j ≠ k; and
• v(A; {A}) = 11.

Clearly, the super-additivity requirement is met but the grand coalition is not the optimal structure, since v(A; {A}) = 11 < Σ_{i=1}^{3} v((i); {(1), (2), (3)}) = 12. Thus, this example shows that the grand coalition is not always the optimal structure in a super-additive PFG with negative externalities. Equally, for the same A, suppose that the values of the partition function are as follows:
• v((i); {(1), (2), (3)}) = 3 for i = 1, 2, 3;
• v((j, k); {(i), (j, k)}) = 2 and v((i); {(i), (j, k)}) = 7 for all i, j, k ∈ A with i ≠ j ≠ k; and
• v(A; {A}) = 4.

In this game, the sub-additivity property is met but the CS of singletons is not the optimal CS, due to the positive externalities. This shows that this structure is not always optimal in sub-additive PFGs with positive externalities. Thus, the classic results from the CFG setting do not always hold in the PFG one. Consequently, in this paper, we shall study four classes of PFG:
1. super-additive games with positive externalities (PF^+_sup);
2. super-additive games with negative externalities (PF^−_sup);
3. sub-additive games with positive externalities (PF^+_sub);
4. sub-additive games with negative externalities (PF^−_sub).

² There also exist similar definitions for the strong positive and negative externalities and strong super- and sub-additivity, in which the signs ≤ and ≥ are replaced with < and >. In the remainder of this paper, whenever we refer to externalities and additivity, we mean their weak forms. Note that strong relationships are a subset of weak ones.
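The first counterexample above is small enough to check exhaustively. Here is a Java sketch (names are ours) that enumerates the five coalition structures over three agents with the stated values and confirms that the singleton structure, not the grand coalition, is optimal:

```java
import java.util.*;

/** Sketch verifying the 3-agent counterexample: a super-additive PFG
 *  with negative externalities whose optimal CS is not the grand
 *  coalition. The values are exactly those given in the text. */
public class PfgCounterexample {
    public static void main(String[] args) {
        // Every CS over {1,2,3}, mapped to the sum of its coalition values.
        Map<String, Integer> csValues = new LinkedHashMap<>();
        csValues.put("{(1),(2),(3)}", 4 + 4 + 4); // 12
        csValues.put("{(1),(2,3)}",   1 + 9);     // 10
        csValues.put("{(2),(1,3)}",   1 + 9);     // 10
        csValues.put("{(3),(1,2)}",   1 + 9);     // 10
        csValues.put("{(1,2,3)}",     11);        // grand coalition

        String best = Collections.max(csValues.entrySet(),
                Map.Entry.comparingByValue()).getKey();
        System.out.println("optimal CS: " + best); // {(1),(2),(3)}, value 12
    }
}
```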
v(C; {S ∪ T, C} ∪ CS ) ≥ v(C; {S, T, C} ∪ CS ). In the case where the inequality is ≤ the PFG is said to exhibit weak negative externalities. Intuitively, this property means that a game has positive (respectively, negative) externalities if a merger between two coalitions makes every other coalition better (worse) off. Furthermore, a PFG is weakly super-additive (sub-additive) if for any S, T ⊆ A with S∩T = ∅ and structure CS of A\S∪T then:
Figure 1: Paths for a six agent setting
v(S ∪ T ; {S ∪ T } ∪ CS ) ≥ (≤)v(S; {S, T } ∪ CS ) + v(T ; {S, T } ∪ CS ). Intuitively, this means that a PFG is super-additive (subadditive) if two coalitions Ci and Cj in a structure, say CS = C1 , . . . Ci , Cj , . . . , Ck , join together to form coalition C = Ci ∪ Cj then the value of C in the structure CS =
The Sandholm et al. tree representation of the CS space, briefly described in Section 2, is very useful in solving the CSG prob2
There also exists similar definitions for the strong positive and negative externalities and strong super- and sub-additivity, in which signs ≤ and ≥ are replaced with < and >. In the remainder of this paper, whenever we refer to externalities and additivity, we mean their weak forms. Note that strong relationships are a subset of weak ones.
Figure 1: Paths for a six agent setting

The Sandholm et al. tree representation of the CS space, briefly described in Section 2, is very useful in solving the CSG problem for PFGs. Figure 1 displays a modified version of the Sandholm et al. tree for six agents, where nodes (hereafter configurations) represent subspaces of CSs containing coalitions of the particular sizes indicated (cf. [11]). For instance, the configuration {5, 1} denotes the subspace of all CSs containing exactly two coalitions, of sizes 5 and 1, for 6 agents, i.e. {(12345), (6)}, {(12346), (5)}, {(12356), (4)}, {(12456), (3)}, {(13456), (2)} and {(1), (23456)}. The arrows between the subspaces show how a merger of two coalitions converts one CS to the other. For example, the arrow {4, 1, 1} → {4, 2} shows how the merger of the two coalitions of size 1 converts the configuration {4, 1, 1} to {4, 2}.

The notion of weakness implies that there can be many CSs with the optimal value. Therefore, in actual fact, we should speak about a set of optimal coalition structures which, in a special case, might contain every feasible CS; this could occur, for example, when all weak externalities are zero and weak super- (sub-) additivity does not increase (decrease) the combined value of merging coalitions.

Theorem 1 In PF^+_sup (PF^-_sub) the grand coalition (the coalition structure of singletons) always belongs to the set of optimal coalition structures. Furthermore, assuming that super- (sub-) additivity is not weak and both the positive and negative externalities are not weak, then in PF^+_sup (PF^-_sub) the grand coalition (the coalition structure of singletons) is the only optimal structure.

Proof: Consider PF^+_sup (PF^-_sub). Beginning with configuration {1, 1, 1, . . . , 1}, it is possible to reach configuration {n} by a variety of paths. Assume that we move from a coalition structure CS in configuration G of size k to a structure CS′ in configuration G′ of size k − 1, ∀k = n, . . . , 2. In such a case, CS′ must contain one coalition which is the union of exactly two coalitions in CS ∈ G, and k − 2 'other' coalitions of CS which were not involved in the merge. Due to the super-additivity (sub-additivity) property, the value of the merged coalition in CS′ must be greater than (less than) or equal to the sum of the component coalitions in CS. Furthermore, as a result of the positive (negative) externalities, the value of the other coalitions in CS′ must not be smaller (bigger) than in CS. Consequently, the value of CS′ ∈ G′ is not smaller (not bigger) than the value of CS ∈ G. Without loss of generality, this is applicable to every path; thus the configuration {n} ({1, 1, 1, . . . , 1}) must contain a structure whose value is not smaller than the values of the CSs in every other configuration. Hence, the grand coalition (structure of singletons) always belongs to the set of optimal coalition structures in PF^+_sup (PF^-_sub).

Waiving the assumption of weakness (where the '≤' and '≥' signs are replaced with '<' and '>', respectively, in both super- and sub-additivity as well as positive and negative externality), the above proof remains valid, and it is not difficult to show that in the PF^+_sup (PF^-_sub) setting the grand coalition (structure of singletons) is the only optimal structure.
It immediately follows that for both these PFGs, it is not necessary to search the entire CS space to find the optimal CS.
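The one-step argument in the proof can be phrased as an executable check. The sketch below uses our own encoding (structures as tuples of sorted tuples) and assumes v is a Python function implementing the partition function; iterating the assertion along any path from the singletons to {n} yields Theorem 1 for PF^+_sup.

    def merge(cs, i, j):
        """Merge the i-th and j-th coalitions of structure `cs`."""
        merged = tuple(sorted(cs[i] + cs[j]))
        rest = tuple(c for k, c in enumerate(cs) if k not in (i, j))
        return tuple(sorted(rest + (merged,)))

    def check_merge_monotone(v, cs):
        """In a PF^+_sup game, no single merge may decrease the total
        value of a structure: super-additivity covers the merged pair,
        positive externalities cover the bystanders."""
        total = sum(v(c, cs) for c in cs)
        for i in range(len(cs)):
            for j in range(i + 1, len(cs)):
                cs2 = merge(cs, i, j)
                assert sum(v(c, cs2) for c in cs2) >= total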
3 Bounded Coalition Values in PF^-_sup and PF^+_sub
In the PFG setting, each coalition (with the exception of the grand coalition and some coalitions in the second level of the Sandholm et al. tree) may have many values, depending on which CS it belongs to. This means that we cannot determine an exact value of a coalition in a particular structure without actually searching it. However, we will now show that, by searching only certain paths in the Sandholm et al. representation, it is possible to bound the value of every coalition in the entire tree. As the PF^-_sup problem is dual to the PF^+_sub problem, our result can be presented for both classes of games simultaneously.

Figure 2: An extract from the Sandholm et al. tree for 6 agents

Theorem 2 Consider the PF^-_sup (PF^+_sub) setting and the coalition Cx in the structures CS′ = {Cx, (i1), . . . , (i_{n−|Cx|})} and CS″ = {Cx, Cy}, where (i1), . . . , (i_{n−|Cx|}) ∉ Cx and Cy = A \ Cx. The value of Cx in CS′ is the greatest (smallest) value of Cx in every coalition structure it belongs to, i.e. for every CS containing Cx, v(Cx; CS′) ≥ (≤) v(Cx; CS). The value of Cx in CS″ is the smallest (greatest) value of Cx in every coalition structure it belongs to, i.e. for every CS containing Cx, v(Cx; CS″) ≤ (≥) v(Cx; CS).
Proof: First consider the value of Cx in CS′ (i.e. v(Cx; CS′)). In Figure 1, CS′ can belong to any of the configurations on the following path: {1, 1, 1, 1, 1, 1} → {2, 1, 1, 1, 1} → {3, 1, 1, 1} → {4, 1, 1} → {5, 1}. Every coalition Cx with |Cx| > 1 which appears in any configuration on this path is the only non-trivial coalition that has been formed. This guarantees that v(Cx; CS′) has never been affected by a negative (positive) externality. Conversely, in all the other configurations where Cx appears, other non-trivial coalitions co-exist whose creation, by definition, has induced a negative (positive) externality on Cx. In such configurations the values of Cx will be at most (least) equal to v(Cx; CS′) since, as is visible in Figure 1, one can always reach any other configuration containing CSs with Cx starting from CS′.³ Since, on such a path, Cx is only subject to negative (positive) externalities, v(Cx; CS′) must be at least as big (small) as in any other CS. Therefore, v(Cx; CS′) is the greatest (smallest) value of Cx in every CS that it belongs to.

Now consider the value of Cx in CS″ (i.e. v(Cx; CS″)). Cx is a part of both CS′ and CS″; therefore, it is always possible to find a path which starts from CS′ and leads to CS″, i.e. CS′ → . . . → CS″. Since Cx is only subject to consecutive negative (positive) externalities, the value of Cx will decrease (increase) or, at most (least), remain the same every time one traverses this path, moving from one configuration to another. Consequently, v(Cx; CS″) will not be greater (smaller) than v(Cx; CS′), or than the value of Cx in any other configuration on this path. Similarly, starting from any other configuration containing Cx, it is always possible to find a path leading to CS″. Since Cx is subject to consecutive negative (positive) externalities along such paths, the above argument is equally compelling. Therefore, the value of Cx in CS″ is the smallest (greatest) value of Cx in every coalition structure it belongs to.

Consider a few elements of the original Sandholm et al. tree in Figure 2. Theorem 2 says that under PF^-_sup, for every CS containing (123), v((123); CS_a) ≥ v((123); CS). Initially, it may seem possible for v((123); CS_d) to be higher than v((123); CS_a), because the former structure emerged after agent 3 joined coalition (12) in CS_b and, due to the super-additivity property, v((123); CS_d) could become much higher than v((123); CS_a). However, in actual fact, this cannot happen because of the assumed negative externalities. It is always possible to find a path from {(123), (4), (5), (6)} to any other CS that contains (123), and on such a path the value of (123) is only subject to negative externalities. Consequently, v((123); CS_d) cannot be higher than v((123); CS_a). Such reasoning can also show that v((123); CS_e) is the smallest value of (123) in Figure 2, and similar reasoning can be used to back up our claims for the PF^+_sub setting.

³ With the exception of CS′, CS″ and the grand coalition, any coalition Cx might have a number of different values in one configuration, as it belongs to a number of distinct CSs. Thus, we use the plural for "values" and "coalition structures".
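In code, the two bounding structures of Theorem 2 are easy to materialize. The helper below (names ours) returns CS′ and CS″ for a given coalition, so that its maximum and minimum values under PF^-_sup can be read off with two partition-function evaluations.

    def bounding_structures(cx, agents):
        """Return (CS', CS'') for coalition Cx over `agents`:
        CS'  = {Cx} plus a singleton for every outside agent,
        CS'' = {Cx, A \ Cx}.  Under PF^-_sup, v(Cx; CS') is Cx's
        greatest and v(Cx; CS'') its smallest value; under PF^+_sub
        the roles are reversed.  Assumes Cx is a proper, non-empty
        subset of `agents`."""
        cx = tuple(sorted(cx))
        outsiders = tuple(a for a in agents if a not in cx)
        cs_prime = tuple(sorted((cx,) + tuple((a,) for a in outsiders)))
        cs_second = tuple(sorted((cx, outsiders)))
        return cs_prime, cs_second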
4 CSG Algorithm For The PFG Setting
The Rahwan et al. CSG algorithm relies on the fact that coalition values are always constant in the CFG setting. This makes it possible to collect a number of basic statistics at the very beginning to assess which configurations are most promising and which are not. In the PFG setting, coalition values depend on the CS they belong to, so such a technique is not generally feasible. However, for both PF^-_sup and PF^+_sub, Theorem 2 allows us to construct bounds on the values of every coalition in every CS. Subsequently, we can use these bounds to construct upper and lower bounds for each configuration. In other words, our theorem bridges the gap between the two settings, making it possible to modify the existing state-of-the-art CSG algorithm so that it can generate a set of optimal CSs in the PF^-_sup (PF^+_sub) setting, often without searching the entire CS space. Let L_s denote the (structured) list containing all coalitions of size s.⁴ Our CSG algorithm can be summarized as follows:

Step 1. Compute the value of the grand coalition. For every coalition C in list L_s, 1 ≤ s = |C| < n, compute its value in the CSs where: (i) all the other agents not in C form the coalition C′ = A \ C, and (ii) every other agent not in C acts alone. These are the maximum and minimum (minimum and maximum) values of each coalition in the entire CS space, and are stored in lists L_s^max and L_s^min, which are structured as in the DCVC algorithm (see [9]);

Step 2. Partition the search space into configurations. Prune those which were searched in Step 1;

Step 3. Compute the upper bound of every remaining configuration G, denoted UB_G, using the lists of maximum values from Step 1, i.e. UB_G = Σ_{s∈G} max L_s^max. Set the upper bound of the entire system, UB, to be the value of the highest upper bound, i.e. UB = max_G UB_G, and set the lower bound LB to be max{v(CS∗_N), max_G Avg_G}, where Avg_G = Σ_{s∈G} avg L_s^min is the lower bound on the average value of each configuration G and CS∗_N is the CS with the highest value found thus far. Order the configurations w.r.t. the value of UB_G;

Step 4. Prune away those subspaces which cannot deliver a CS with value greater than LB, i.e. those with UB_G < LB;

Step 5. Search the configuration with the highest upper bound, updating LB to be the value of the highest-valued structure found thus far (CS∗_N). During the search process, a refined branch-and-bound technique should be used;

Step 6. Once the search of the configuration in Step 5 is completed, check whether v(CS∗_N) = UB or whether all configurations have been searched or pruned. If either of these conditions holds, then the optimal CS has been found. Otherwise, go to Step 4.

⁴ See [9] for more details.

In Step 1 we compute the maximum (minimum) and the minimum (maximum) values of each coalition C in the entire tree. Storing both numbers per coalition requires twice as much memory as in the CFG setting, but ensures that the highest and lowest values of each list L_s can be computed. This makes it possible to determine upper and lower bounds for each configuration, as well as the upper bound of the entire system. Furthermore, in contrast to Rahwan et al., we cannot compute an exact average value of all the coalitions of size s_i, ∀i = 1, . . . , m, for a given configuration G = {g_{s1}, . . . , g_{sm}}. However, it is possible to compute a lower bound for such an average value using L_s^min, as no average value can be smaller than the one computed from the lists containing minimum values. In addition, the upper bound for each configuration G can be defined as the sum of the maximal values that every coalition of size s in CS ∈ G can take, i.e. Σ_{s∈G} max L_s^max.

In the PFG setting, partitioning and pruning of the search space is done as in the Rahwan et al. algorithm for the CFG setting. Also, the process of searching through the promising subspaces is similar. In particular, certain techniques ensure that no redundant calculations are performed, i.e. no CS is considered twice. However, the branch-and-bound rule needs to be modified for the PFG setting. This rule prevents traversing hopeless paths while constructing CSs in the considered configuration.

Branch and Bound Rule. Suppose that G∗ = {g_{s1}, g_{s2}, g_{s3}, g_{s4}} is the configuration with the highest upper bound which has not yet been searched. In the CFG setting, the branch-and-bound rule of Rahwan et al. goes as follows. Suppose the algorithm has already added coalitions C_{gs1}, C_{gs2} to the CS under construction. When adding the next coalition from list L_{gs3}, the rule ignores those cases which, together with max L_{gs4}, would render the value of the CS less than the current LB of the entire system. From Theorem 2, instead of the exact values of coalitions, which we do not know beforehand, we can use the maximum values as computed in Step 1 and incorporate this rule into both the PF^-_sup and PF^+_sub settings. However, with only maximum values, such a branch-and-bound rule is likely to be less effective than in the original setting.

Anytime properties. When the arguments of Sandholm et al. [12] are applied to the upper bounds of the values of coalitions, it can be proven that after Step 1 of our algorithm, where levels 1, 2 and n have already been searched, the value of CS∗_N is no smaller than 2/n of the value of the optimal CS.
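A skeleton of Steps 3 to 6 is sketched below, under the assumption that Steps 1 and 2 have already produced the per-configuration bounds. Here configurations are hashable keys (e.g. sorted tuples of part sizes), search_configuration stands in for the branch-and-bound subspace search, and all names are ours.

    def csg_pfg(configurations, ub, lb_avg, initial_best, search_configuration):
        """Steps 3-6 of the CSG algorithm, given:
          configurations  -- configurations left after Step 2 (non-empty),
          ub[G], lb_avg[G] -- UB_G and Avg_G bounds from the Step 1 lists,
          initial_best     -- (CS*_N, value) found while searching in Step 1,
          search_configuration(G, LB) -- searches subspace G with branch
                              and bound, returning its best structure/value."""
        best_cs, LB = initial_best
        UB = max(ub[G] for G in configurations)                 # Step 3
        LB = max(LB, max(lb_avg[G] for G in configurations))
        queue = sorted(configurations, key=lambda G: ub[G], reverse=True)
        while queue and LB < UB:                                # Step 6 check
            queue = [G for G in queue if ub[G] >= LB]           # Step 4 pruning
            if not queue:
                break
            G = queue.pop(0)                                    # Step 5
            cs, value = search_configuration(G, LB)
            if value > LB:
                best_cs, LB = cs, value
        return best_cs, LB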
5 Numerical Simulations
To the best of our knowledge, the CSG algorithm for the PFG setting proposed in this paper is the only one in the literature; thus there is no benchmark algorithm that can be used for a numerical comparison. Although it would be possible to adapt the CFG dynamic-programming techniques to the PFG setting, due to lack of space we will compare our results to the CSG algorithm for the CFG setting instead. As noted at the beginning of the paper, this solution has already been proven to be significantly superior to the dynamic-programming alternatives, because it does not need to search all the feasible CSs. We will show that, in many cases, our modification of this algorithm for the PFG setting also searches through only a fraction of the CS space, thus saving a vast amount of calculation time.

Simulations are performed for the PF^-_sup setting. When a new coalition is formed, the 'gain' from super-additivity is accounted for by adding a factor α^a to its value. In addition, the 'loss' from the externality imposed on the other coalitions in the structure is accounted for by multiplying their values by factors 1 − β^b, where α, β ∈ [0, 1) are randomly-generated uniform variables and a, b ≥ 1 are constants. We assume that in the system there are 10 agents, from which 115,975 CSs can be formed.⁵ In Step 1, 2028 CSs are searched, i.e. the grand coalition, the CS of singletons and 2C(10,2) + 2C(10,3) + . . . + 2C(10,8) + C(10,9) other CSs. This amount accounts for 1.75% of the search space.

The vertical axis of Figure 3 represents the proportion of the CS space searched, whereas a and b are indicated on the x and y axes, respectively. As the values of a and b increase, the 'gain' from super-additivity and the 'loss' from externalities decrease. We performed our simulations 25 times for each combination of a and b. The surface shown in Figure 3 is the average proportion of space searched by our algorithm. Furthermore, as the original CSG algorithm for the CFG setting, under the uniform distribution of coalition payoffs, searches on average about 2.5%, and as this result is independent of a and b, we do not report it in Figure 3.

We observe that when the 'gain' from super-additivity is high and the 'loss' from the negative externality is low, only a minimal proportion (under 4%) of the space need be searched in order to compute the optimal structure. In fact, in such cases, the grand coalition or a CS in the first few levels of the Sandholm et al. tree is usually the optimal structure. Consequently, it would seem that the smaller the externality, the more the PF^-_sup setting becomes like the super-additive CFG setting, thus explaining why so little of the space is searched. Conversely, when the 'gain' from super-additivity is low and the 'loss' from the negative externality is high, again only a fraction of the search space was searched. This time, the PF^-_sup setting becomes more akin to the sub-additive CFG setting, so that the CS of singletons or a CS with a relatively small number of cooperating agents tends to be optimal. However, in situations where the 'loss' from the externality and the 'gain' from the super-additivity are both either high or low, pruning appears to be ineffective, since nearly all of the search space has to be searched in order to guarantee an optimal outcome (more than 98% in many cases). This is due to an inherent characteristic of the PF^-_sup setting: namely, that the values of the structures in each configuration depend on the values of the structures in the configuration in the previous level (see Figure 1). Consequently, when the gain from super-additivity and the loss from externalities are of a similar magnitude, the extreme values of CSs in different configurations are more likely to be akin, making pruning techniques less effective. This effect is magnified by the use of the uniform distribution, since the CSs' values in all configurations tend to be relatively dispersed.

⁵ The particular challenge of simulations in the PFG setting is that (in contrast to the CFG setting) one must generate the values of all CSs beforehand. Furthermore, during the random generation of coalition values, it is important to ensure that all the CSs meet the PF^-_sup (PF^+_sub) properties. Consequently, we restrict our simulations to 10 agents and 115,975 CSs. Although this is less than the system of 27 agents considered for the CFG setting (cf. [11]), such a system in the PFG setting would require generating a CS space with more than 5.24 × 10^20 CSs.

Figure 3: Simulation results for the PF^-_sup setting
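The value-generation scheme can be sketched as follows; note that the exact functional forms α^a and 1 − β^b are our reading of the partly garbled description above, not a verbatim specification from the authors.

    import random

    def apply_merge_effects(merged_value, bystander_values, a, b):
        """When a merge creates a new coalition: add a super-additive
        'gain' of alpha**a to its value, and multiply every other
        coalition's value by the 'loss' factor (1 - beta**b), with
        alpha, beta ~ U[0, 1) drawn per coalition.  Larger a and b
        make both effects smaller, matching Figure 3's axes.
        (Functional forms are our reconstruction.)"""
        new_merged = merged_value + random.random() ** a
        new_bystanders = [v * (1 - random.random() ** b)
                          for v in bystander_values]
        return new_merged, new_bystanders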
6 Conclusion & Future Work
In this paper, we considered coalition structure formation in the presence of coalition externalities, a novel topic in the multi-agent systems literature. We modeled coalition formation with a partition function game (PFG), and considered four cases: (1) super-additive games with positive externalities (PF^+_sup) or (2) negative externalities (PF^-_sup); and (3) sub-additive games with positive externalities (PF^+_sub) or (4) negative externalities (PF^-_sub). For cases (1) and (4), we proved that computing the optimal structure is straightforward, because either the grand coalition or the CS of singletons belongs to the set of optimal CSs. In contrast, this is not true for cases (2) and (3), where any CS can belong to the set of optimal coalition structures. Therefore, for these two cases we proved that it is possible to bound the value of each coalition. From this insight, we modified the existing state-of-the-art anytime CSG algorithm for the CFG setting and showed how it can be used to generate the optimal CS in these two PFG settings. In future work, we plan to study the numerical performance of the new algorithm under different distributional assumptions regarding coalition values, and also to develop a distributed version of our approach.
REFERENCES
[1] F. Bloch, 'Non-cooperative models of coalition formation in games with spillovers', in Carraro, C. (ed.), Endogenous Formation of Economic Coalitions, ch. 2, pp. 35–79, (2003).
[2] E. Catilina and R. Feinberg, 'Market power and incentives to form research consortia', Review of Industrial Organization, 28(2), 129–144, (2006).
[3] V. Dang and N. Jennings, 'Generating coalition structures with finite bound from the optimal guarantees', in AAMAS, New York, USA, (2004).
[4] K. Do and H. Norde, 'The Shapley value for partition function form games', Discussion Paper, Tilburg University, Center for Economic Research, (2002).
[5] M. Finus and B. Rundshagen, 'Endogenous coalition formation in global pollution control: a partition function approach', Working Paper No. 307, University of Hagen, (2001).
[6] I. Hafalir, 'Efficiency in coalition games with externalities', Games and Economic Behavior, 61(2), 209–238, (2007).
[7] L. Kóczy, 'A recursive core for partition function form games', Theory and Decision, 63(1), 41–51, (2007).
[8] P. Pintassilgo, 'A coalition approach to the management of high seas fisheries in the presence of externalities', Natural Resource Modeling, 16(2), 175–197, (2003).
[9] T. Rahwan and N. Jennings, 'Distributing coalitional value calculations among cooperating agents', in AAAI, pp. 152–157, Pittsburgh, USA, (2005).
[10] T. Rahwan, S. Ramchurn, V. Dang, A. Giovannucci, and N. Jennings, 'Anytime optimal coalition structure generation', in Proceedings of AAAI 2007, (2007).
[11] T. Rahwan, S. Ramchurn, V. Dang, A. Giovannucci, and N. Jennings, 'Near-optimal anytime coalition structure generation', in IJCAI, Hyderabad, India, (2007).
[12] T. Sandholm, K. Larson, M. Andersson, O. Shehory, and F. Tohme, 'Coalition structure generation with worst case guarantees', Artificial Intelligence, 111(1–2), 209–238, (1999).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-393
Coalition Structures in Weighted Voting Games

Edith Elkind and Georgios Chalkiadakis and Nicholas R. Jennings¹

Abstract. Weighted voting games are a popular model of collaboration in multiagent systems. In such games, each agent has a weight (intuitively corresponding to resources he can contribute), and a coalition of agents wins if its total weight meets or exceeds a given threshold. Even though coalitional stability in such games is important, existing research has nonetheless only considered the stability of the grand coalition. In this paper, we introduce a model for weighted voting games with coalition structures. This is a natural extension in the context of multiagent systems, as several groups of agents may be simultaneously at work, each serving a different task. We then proceed to study stability in this context. First, we define the CS-core, a notion of the core for such settings, discuss its non-emptiness, and relate it to the traditional notion of the core in weighted voting games. We then investigate its computational properties. We show that, in contrast with the traditional setting, it is computationally hard to decide whether a game has a non-empty CS-core, or whether a given outcome is in the CS-core. However, we then provide an efficient algorithm that verifies whether an outcome is in the CS-core if all weights are small (polynomially bounded). Finally, we also suggest heuristic algorithms for checking the non-emptiness of the CS-core.
1 Introduction
Coalitional games [8] provide a rich framework for the study of cooperation both in economics and politics, and have been successfully used to model collaboration in multiagent systems [11, 3]. In such games, teams (or coalitions) of agents come together to achieve a common goal, and derive individual benefits from this activity. A particularly simple, yet expressive, class of coalitional games is that of weighted voting games (WVGs) [13]. In a weighted voting game, each player (or agent) has a weight, and a coalition wins if its members' total weight meets or exceeds a certain threshold, and loses otherwise. Weighted voting has straightforward applications in a plethora of societal and computer science settings, ranging from real-life elections to computer operating systems, as well as in a variety of settings involving multiagent coordination. In particular, an agent's weight can be thought of as the amount of resources available to this agent, and the threshold indicates the amount of resources necessary to achieve a task. A winning coalition then corresponds to a team of agents that can successfully complete this task.

Originally, research in weighted voting games was motivated by a desire to model decision-making in governmental bodies. In such settings, the threshold is usually at least 50% of the total weight, and the issues of interest relate to the distribution of payoffs within the grand coalition, i.e., the coalition of all agents. Perhaps for this reason, to date, all research on weighted voting games tacitly assumes that the grand coalition will form. However, in multiagent settings such as those described above, the threshold can be significantly smaller than 50% of the total weight, and several winning coalitions may be able to form simultaneously. Moreover, in this situation the formation of the grand coalition may not, in fact, be a desirable outcome: instead of completing several tasks, forming the grand coalition concentrates all agent resources on finishing a single task. In contrast, the overall efficiency will be higher if the agents form a coalition structure (CS), i.e., a collection of several disjoint coalitions.

To model such scenarios, in this paper we introduce a model for WVGs with coalition structures. We then focus on the issue of stability in this setting. A structure is stable when rational agents are not motivated to depart from it, and thus they can concentrate on performing their task, rather than looking for ways to improve their payoffs. Therefore, stability provides a useful balance between individual goals and overall performance. To study it, we extend the notion of the core—a classic notion of stability for coalitional games—to our setting, by defining the CS-core for WVGs. We then provide a detailed study of this concept, comparing it with the classic core and analyzing its computational properties.

Our main contributions are as follows: (1) we define a new model that allows weighted voting games to admit coalition structures (Sec. 3); (2) we define the CS-core for such games, relate it to the classic core, and describe sufficient conditions for its non-emptiness (Sec. 4); (3) we show that several natural CS-core-related problems are intractable—namely, it is NP-hard to decide the non-emptiness of the CS-core and coNP-complete to check whether a given outcome is in the CS-core (Sec. 5); interestingly, this contrasts with what holds in weighted voting games without coalition structures, where both of these problems are polynomial-time solvable; (4) we provide a polynomial-time algorithm to check if a given outcome is in the CS-core in the important special case of polynomially-bounded weights. We then show how to use this algorithm to efficiently check if a given coalition structure admits a stable payoff distribution, and suggest a heuristic algorithm to find an allocation in the core (Sec. 6). We begin with some background and a brief review of related work.

¹ School of Electronics and Computer Science, University of Southampton, UK; email: {ee, gc2, nrj}@ecs.soton.ac.uk
2 Background and Related Work
In this section, we provide an overview of the basic concepts in coalitional game theory. Let I, |I| = n, be a set of players. A subset C ⊆ I is called a coalition. A coalitional game with transferable utility is defined by its characteristic function v : 2^I → R, which specifies the value v(C) of each coalition C [14]. Intuitively, v(C) represents the maximal payoff the members of C can jointly receive by cooperating, and it is assumed that the agents can distribute this payoff between themselves in any way.

While the characteristic function describes the payoffs available to coalitions, it does not prescribe a way of distributing these payoffs. We say that an allocation is a vector of payoffs x = (x1, . . . , xn) assigning some payoff to each i ∈ I. We write x(S) to denote Σ_{i∈S} xi. An allocation is feasible for the grand coalition if x(I) ≤ v(I). An imputation is a feasible allocation that is also efficient, i.e., x(I) = v(I).

A weighted voting game (WVG) is a coalitional game G given by a set of agents I = {1, . . . , n}, their weights w = {w1, . . . , wn}, wi ∈ R+, and a threshold T ∈ R; we write G = (I; w; T). We use w(S) to denote Σ_{i∈S} wi. For a coalition S ⊆ I, its value v(S) is 1 if w(S) ≥ T; otherwise, v(S) = 0. Without loss of generality, the value of the grand coalition I is 1 (i.e., w(I) ≥ T). One of the best-known solution concepts describing coalitional stability is the core [8].
Definition 1. An allocation x is in the core of G iff x(I) = v(I) and for any S ⊆ I we have x(S) ≥ v(S).

If an allocation x is in the core, then no subgroup of agents can guarantee all of its members a higher payoff than the one they receive in the grand coalition under x. This definition of the core can therefore be used to characterize the stability of the grand coalition.

The setting where several coalitions can form at the same time can be modeled using coalition structures. Formally, a coalition structure (CS) is an exhaustive partition of the set of agents. CS(G) denotes the set of all coalition structures for G. Given a structure CS = {C1, . . . , Ck}, an allocation x is feasible for CS if x(Ci) ≤ v(Ci) for i = 1, . . . , k, and efficient for CS if this holds with equality.

Games with coalition structures were introduced by Aumann and Dreze [2], and are obviously of interest from an AI/multiagent-systems point of view, as illustrated in Section 1. Indeed, in this context, dealing with coalition structures other than the grand coalition is of utmost importance: simply put, there is a plethora of realistic application scenarios where the emergence of the grand coalition is either not guaranteed, is plainly impossible, or might be perceivably harmful (for instance, it usually makes little sense to allocate all available robots to a single task). In particular, in the context of WVGs, by forming several disjoint winning coalitions, the agents generate more payoff than in the grand coalition. Additional motivation from an economics perspective is given in [2], which contains a thorough and insightful discussion of why coalition structures arise.

Now, there exists a handful of approaches in the multiagent literature that do take coalition structures explicitly into account. Sandholm and Lesser [11] discuss the stability of coalition structures when examining the problem of allocating computational resources to coalitions. Apt and Radzik [1] also do not restrict themselves to problems where the outcome is the grand coalition only. Instead, they introduce various stability notions for abstract games whose outcomes can be coalition structures, and discuss simple transformations by which stable partitions of the set of players may emerge. Dieckmann and Schwalbe [5] also propose a version of the core with coalition structures when studying dynamic coalition formation, and so do Chalkiadakis and Boutilier when tackling coalition formation under uncertainty [4]. None of these papers studies WVGs, however.

A thorough discussion of weighted voting games can be found in [13]. The stability-related solution concepts for WVGs (without coalition structures) have recently been studied by Elkind et al. [6], who also investigate them from a computational perspective. However, there is no existing work in the literature studying WVGs with coalition structures—a class of games that we now proceed to define.
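The background definitions translate directly into code. The sketch below uses 0-based agent indices and names of our own choosing.

    from math import isclose

    def wvg_value(weights, T, coalition):
        """Characteristic function of the WVG (I; w; T): a coalition is
        winning (value 1) iff its total weight meets or exceeds T."""
        return 1 if sum(weights[i] for i in coalition) >= T else 0

    def is_imputation(weights, T, x):
        """x is an imputation iff it is feasible and efficient for the
        grand coalition I = {0, ..., n-1}, i.e. x(I) = v(I)."""
        return isclose(sum(x), wvg_value(weights, T, range(len(weights))))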
3 Coalition structures in WVGs
We now extend the traditional model for WVGs to allow for coalition structures. First, an outcome of a game is now a pair of the form (coalition structure, allocation) rather than just an allocation. Furthermore, in the traditional model, any allocation of payoffs among the participating agents is required to be an exhaustive partition of the value of the grand coalition. In other words, it is always an imputation, i.e., an allocation of payoffs that is feasible and efficient for the grand coalition I. As we now allow WVGs to admit coalition structures, we replace the aforementioned requirement with similar requirements with respect to a coalition structure. First, we no longer require an allocation to be an imputation in the classic sense. Instead, we demand that, for a given outcome (CS, x), the allocation x of payoffs for I is feasible for CS. In this way, CS may contain zero or more winning coalitions. Furthermore, we define an imputation for a coalition structure CS as a vector p of nonnegative numbers (p1, . . . , pn) (one for each agent in I), such that for every C ∈ CS it holds that p(C) = v(C) ≤ 1; we write p ∈ I(CS). That is, an imputation is now a feasible and efficient allocation of the payoff of any coalition C ∈ CS.
4 Core and CS-core of weighted voting games
In this section we define the core of WVGs with coalition structures, relate it to the "classic" core of WVGs without coalition structures, and obtain some core characterization results for a few interesting classes of WVGs. The definition of the core (Def. 1) takes the following simple form in the traditional WVG setting (see, e.g., [6]):

Definition 2. The core of a WVG G = (I; w; T) is the set of imputations p such that, ∀S ⊆ I, w(S) ≥ T ⇒ p(S) ≥ 1.

Intuitively, an imputation p is in the core whenever the payoffs defined by p are such that any winning coalition already receives a collective payoff of 1 (and therefore no coalition can improve its payoff by breaking away from the grand coalition). This notion of the core cannot be directly used for coalition structures: indeed, it demands that an allocation is an imputation in the traditional sense, and therefore no imputation for a coalition structure with more than one winning coalition can ever be in the core. We will now extend this definition to the setting with coalition structures. Namely, we define the core of weighted voting games with coalition structures, or CS-core, as follows:

Definition 3. The CS-core of a WVG G = (I; w; T) with coalition structures is the set of outcomes (CS, p) such that ∀S ⊆ I, w(S) ≥ T ⇒ p(S) ≥ 1, and ∀C ∈ CS it holds that p(C) = v(C).

Intuitively, given an outcome that is in the CS-core, no coalition has an incentive to break away from the coalition structure. Now, it is well known (see, e.g., [6]) that in weighted voting games the core is non-empty if and only if there exists a veto player, i.e., a player that belongs to all winning coalitions, and an imputation is in the core if and only if it distributes the payoff in some way between the veto players. This directly implies the following result.

Observation 1 (An imputation in the core induces an outcome in the CS-core). Let G = (I; w; T). If the core of G is non-empty, then, for any p in the core, the outcome ({I}, p) is in the CS-core of G.

However, it turns out that the CS-core may be non-empty even when the core is empty.

Example 1. Consider a weighted voting game G = (I; w; T), where I = {1, 2, 3}, w = (1, 1, 2) and T = 2. It is easy to see that none of the players in G is a veto player, so G has an empty core. On the other hand, the outcome (CS, p), where CS = {{1, 2}, {3}}, p = (1/2, 1/2, 1), is in the CS-core of G. Indeed, agent 3 is getting a payoff of 1 under this outcome, so his payoff cannot improve. Therefore, the only deviation available to the other two players is to form singleton coalitions, and this is clearly not beneficial.

We now show that if the threshold T is strictly greater than 50% of the total weight, the CS-core and the core coincide.

Proposition 1 (In absolute majority games, the cores coincide). Let G = (I; w; T) be a WVG with T > w(I)/2. Then there is an outcome (CS, p) in the CS-core of G if and only if p is in the core of G. Consequently, G has a non-empty core if and only if it has a non-empty CS-core.

Proof. Suppose that an outcome (CS, p) is in the CS-core of G. As T > w(I)/2, CS can contain at most one winning coalition C, and hence p(I) = 1. Consider any player i ∈ C such that pi > 0. If i is not a veto player, we have w(I \ {i}) ≥ T, p(I \ {i}) < 1, so (CS, p) is not in the CS-core of G, a contradiction. Hence, under p only the veto players get any payoff, which implies that p is in the core of G. Conversely, if p is in the core of G, it is easy to see that ({I}, p) is in the CS-core of G.

We can also prove the following sufficient condition for non-emptiness of the CS-core.

Theorem 1. Any WVG G = (I; w; T) that admits a partition of players into coalitions of weight T has a non-empty CS-core.

Proof. Let CS = {C1, . . . , Ck} be the corresponding partition, such that w(Ci) = T for all i = 1, . . . , k. Define p by setting pj = wj/T for all j = 1, . . . , n. Consider any winning coalition S. We have w(S) ≥ T, so p(S) = w(S)/T ≥ 1, and hence S does not want to deviate. As this holds for any S with v(S) = 1, the outcome (CS, p) is in the CS-core of G.

However, it is not the case that the CS-core of a weighted voting game is always non-empty. In particular, this follows from the fact that the CS-core coincides with the core in games with T > w(I)/2, and such games may have an empty core. We now show that the CS-core can be empty also if T < w(I)/2:

Example 2. Consider a WVG G = (I; w; T), where I = {1, 2, 3, 4, 5}, w = (1, 1, 1, 1, 1) and T = 2. We now show that this game has an empty CS-core. Indeed, consider any CS ∈ CS(G) and any p ∈ I(CS). Clearly, CS can contain at most two winning coalitions, so p(I) ≤ 2. Now, if there is a coalition C ∈ CS, |C| ≥ 3, such that pi > 0 for all i ∈ C, then any two players i, j ∈ C can deviate by forming a winning coalition and splitting the surplus p(C \ {i, j}). If all coalitions have size at most 2, then there is a player i that forms a singleton coalition (and hence pi = 0). There also exists another player j ≠ i such that pj < 1 (otherwise p(I) ≥ 4). But then S = {i, j} satisfies w(S) ≥ T, p(S) < 1, so it is a successful deviation.
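For games as small as those in the examples, Definition 3 can be checked directly by brute force. The sketch below uses 0-based indices and our own naming; the tolerance constants and the exponential subset enumeration make it suitable only for tiny games.

    from itertools import chain, combinations

    def in_cs_core(weights, T, cs, p):
        """Brute-force Definition 3: every coalition of cs must be paid
        exactly its value, and every winning subset S must already
        collect payoff at least 1 (exponential in the number of agents)."""
        n = len(weights)
        w = lambda S: sum(weights[i] for i in S)
        v = lambda S: 1 if w(S) >= T else 0
        if any(abs(sum(p[i] for i in C) - v(C)) > 1e-9 for C in cs):
            return False
        subsets = chain.from_iterable(
            combinations(range(n), k) for k in range(1, n + 1))
        return all(sum(p[i] for i in S) >= 1 - 1e-9
                   for S in subsets if w(S) >= T)

    # Example 1, with agents renumbered 0-based:
    print(in_cs_core((1, 1, 2), 2, [(0, 1), (2,)], (0.5, 0.5, 1.0)))  # True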
5 Non-emptiness of the CS-core: hardness results
In the rest of the paper, we deal with computational questions related to the notion of the CS-core. This topic is important, since in practical applications agents have limited computational resources, and may not be able to find a stable outcome if this requires excessive computation. To provide a formal treatment of complexity issues in our setting, we assume that all weights and the threshold are integers given in binary. As any rational weights can be scaled up to integers, this can be done without loss of generality.

In the previous section, we explained how to verify whether the core is non-empty or whether a given outcome is in the core. It is not hard to see that this verification can be done in polynomial time: e.g., to check the non-emptiness of the core, we simply check if w(I \ {i}) ≥ T for all i ∈ I. In WVGs with coalition structures, the situation is very different. Namely, we will show that it is NP-hard to decide whether a given WVG has a non-empty CS-core. Moreover, even if we are given an imputation, it is coNP-complete to decide whether it is in the CS-core of a given WVG. We now state these computational problems more formally.

Name: NonEmptyCSCore. Instance: Weighted voting game G = (I; w; T). Question: Does G have a non-empty CS-core?

Name: InCSCore. Instance: Weighted voting game G = (I; w; T), a coalition structure CS ∈ CS(G) and an imputation p ∈ I(CS). Question: Is (CS, p) in the CS-core of G?

Both of our reductions rely on the well-known NP-complete PARTITION problem. An input to this problem is a pair (A; K), where A is a list of positive integers A = {a1, . . . , an} such that Σ_{i=1}^{n} ai = 2K. It is a "yes"-instance if there is a subset of indices J such that Σ_{i∈J} ai = K, and a "no"-instance otherwise [7, p. 223].
Theorem 2. The problem NonEmptyCSCore is NP-hard.

Proof. We will describe a polynomial-time procedure that maps a "yes"-instance of PARTITION to a "yes"-instance of NonEmptyCSCore and a "no"-instance of PARTITION to a "no"-instance of NonEmptyCSCore. Suppose that we are given an instance (a1, . . . , an; K) of PARTITION. If there is an i such that ai > K, then obviously it is a "no"-instance of PARTITION, so we map it to a fixed "no"-instance of NonEmptyCSCore, e.g., by setting G = ({1, 2, 3, 4, 5}; (1, 1, 1, 1, 1); 2) as in Example 2. Otherwise, we construct a game G = (I; w; T) by setting I = {1, . . . , n}, wi = ai for i = 1, . . . , n, and T = K. Note that in this case we have w(I \ {i}) ≥ T for any i, so there are no veto players in G.

Suppose that we have started with a "yes"-instance of PARTITION, and let J be such that Σ_{i∈J} ai = K. Consider the coalition structure CS = {J, I \ J} and the imputation p given by pi = wi/K for i = 1, . . . , n. Note that w(J) = w(I \ J) = K, so p(J) = p(I \ J) = 1, i.e., p is a valid imputation. It is easy to see that (CS, p) is in the CS-core of G. Indeed, for any winning coalition S we have w(S) ≥ K, so p(S) ≥ 1, i.e., the members of S would not want to deviate.

On the other hand, suppose that we have started with a "no"-instance of PARTITION. Consider any outcome (CS, p) in the resulting game. Clearly, CS can contain at most one winning coalition: if there were two disjoint winning coalitions, each of them would have weight exactly K, i.e., each could be used as a "yes"-certificate for PARTITION. If CS contains no winning coalitions, then it is clearly unstable, as w(I) ≥ T, p(I) = 0. Now, suppose that CS contains exactly one winning coalition S. In this case we have p(S) = p(I) = 1 and pi = 0 for all i ∉ S. We have pi > 0 for some i ∈ S, so p(I \ {i}) < 1. Moreover, by construction, w(I \ {i}) ≥ T. Hence, I \ {i} can deviate, so (CS, p) is not in the CS-core of G.

Theorem 3. The problem InCSCore is coNP-complete.

Proof. We will show that the complementary problem, i.e. checking that a given outcome is not in the CS-core, is NP-complete.
First, it is easy to see that this problem is in NP: we can guess a coalition S such that w(S) ≥ T but p(S) < 1; this coalition can successfully deviate from (CS, p). Now, to show that this problem is NP-hard, we construct a reduction from PARTITION as follows. Given an instance (a1, . . . , an; K) of PARTITION, we set I = {1, . . . , n, n + 1, n + 2} and wi = 2ai for i = 1, . . . , n. Define also I′ = {1, . . . , n}. The weights wn+1 and wn+2 and the threshold T are determined as follows. We construct a coalition S by adding agents 1, 2, . . . to it one by one until the weight of S is at least 2K.

If the weight of S is exactly 2K, this means that we have started with a "yes"-instance of PARTITION. In this case, we set wn+1 = wn+2 = 0, T = 2K, CS = {I}, and pi = wi/w(I) for all i ∈ I. It is easy to see that the outcome (CS, p) is not stable: the agents in S can deviate and increase their total payoff from 1/2 to 1. Hence, in this case we have mapped a "yes"-instance of PARTITION to a "no"-instance of InCSCore.

Now, suppose that w(S) > 2K. As all weights are even, we have w(S) = 2Q for some integer Q > K. Also, we have w(I′ \ S) = 4K − 2Q. Set T = 2Q, and let wn+1 = wn+2 = 2Q − 2K. Now we have w(I \ S) = 4K − 2Q + 4Q − 4K = 2Q, i.e., both S and I \ S are winning coalitions. Set CS = {S, I \ S}. Now, p is defined as follows: for all i ∈ I′ set pi = wi/T, set pn+1 = wn+1/(T + 1), and set pn+2 = 1 − p(I′ \ S) − pn+1. We have p(S) = w(S)/T = 1 and p(I \ S) = p(I′ \ S) + pn+1 + pn+2 = 1, so p is an imputation. Note also that we have pn+1 + pn+2 = 1 − p(I′ \ S) = 1 − w(I′ \ S)/T = (wn+1 + wn+2)/T. Moreover, we have pn+1 < wn+1/T and p(I′ \ S) = w(I′ \ S)/T, and hence pn+2 > wn+2/T.

We now show that if (a1, . . . , an; K) is a "yes"-instance of PARTITION, then ((I; w; T), CS, p) is a "no"-instance of InCSCore. Indeed, suppose there is a set J such that Σ_{i∈J} ai = K. Consider the coalition J′ = J ∪ {n + 1}. We have w(J′) = 2K + 2Q − 2K = 2Q, so it is a winning coalition. On the other hand, p(J′) = p(J) + pn+1 = w(J)/T + wn+1/(T + 1) < w(J′)/T = 1. Hence, J′ can benefit from deviating, i.e., (CS, p) is not in the CS-core.

On the other hand, suppose that ((I; w; T), CS, p) is a "no"-instance of InCSCore, i.e., there is a set J′ such that w(J′) ≥ T, p(J′) < 1. Suppose that w(J′) > T, i.e., w(J′) ≥ T + 1. We have pi ≥ wi/(T + 1) for all i ∈ I (indeed, we have pi ≥ wi/T for i ≠ n + 1 and pi = wi/(T + 1) for i = n + 1), so p(J′) ≥ w(J′)/(T + 1) ≥ 1, a contradiction. Hence, we have w(J′) = T. Moreover, if n + 1 ∉ J′, we have p(J′) ≥ w(J′)/T = 1, a contradiction again. Therefore, n + 1 ∈ J′. Finally, if n + 2 ∈ J′, we have p(J′) = p(J′ ∩ I′) + pn+1 + pn+2 = w(J′ ∩ I′)/T + (wn+1 + wn+2)/T = w(J′)/T = 1, also a contradiction. We conclude that w(J′) = T, n + 1 ∈ J′, n + 2 ∉ J′, and hence w(J′ ∩ I′) = 2Q − (2Q − 2K) = 2K, which means that Σ_{i∈J′∩I′} ai = K, i.e., J′ ∩ I′ is a witness that we have a "yes"-instance of PARTITION.
6 Algorithms for the CS-core
The hardness results presented in the previous section rely on all weights being given in binary. However, in practical applications it is often the case that the weights are not too large, or can be rounded down so that the weights of all agents are drawn from a small range of values. In such cases, we can assume that the weights are given in unary, or, alternatively, are at most polynomial in n. It is therefore natural to ask if our problems can be solved efficiently in such settings. It turns out that for InCSCore this is indeed the case.

Theorem 4. There exists a pseudopolynomial² algorithm A_InCsCore for InCSCore, i.e., an algorithm that correctly decides whether a given outcome (CS, p) is in the CS-core of a weighted voting game (I; w; T) and runs in time poly(n, w(I), |p|), where |p| is the number of bits in the binary representation of p.

² An algorithm whose running time is polynomial if all numbers in the input are given in unary is called pseudopolynomial.

Proof. The input to our algorithm is an instance of InCSCore, i.e., a weighted voting game G = (I; w; T), a coalition structure CS ∈ CS(G) and an imputation p ∈ I(CS). The outcome (CS, p) is not stable if and only if there exists a set S such that w(S) ≥ T but p(S) < 1. This means that our problem is essentially reducible to the classic KNAPSACK problem [7], which is known to have a pseudopolynomial-time algorithm based on dynamic programming. In what follows, we present this algorithm for completeness.

Let W = w(I). For j = 1, . . . , n and w = 1, . . . , W, let P(j, w) be the smallest total payoff of a coalition with total weight w all of whose members appear in {1, . . . , j}: P(j, w) = min{p(J) | J ⊆ {1, . . . , j}, w(J) = w}. Now, if min_{w=T,...,W} P(n, w) < 1, it means that there is a winning coalition whose total payoff is less than 1. Obviously, this coalition would like to deviate from (CS, p), i.e., in this case (CS, p) is not in the CS-core. Otherwise, the payoff to any winning coalition (not necessarily in CS) is at least 1, so no group wants to deviate from CS, and thus (CS, p) is in the CS-core.

It remains to show how to compute P(j, w) for all j = 1, . . . , n, w = 1, . . . , W. For j = 1, we have P(1, w) = p1 if w = w1 and P(1, w) = +∞ otherwise. Now, suppose we have computed P(j, w) for all w = 1, . . . , W. Then we can compute P(j + 1, w) as min{P(j, w), pj+1 + P(j, w − wj+1)}. The running time of this algorithm is polynomial in n, W and |p|, i.e., in the input size.
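A direct transcription of this dynamic program is sketched below, under the assumption that the imputation conditions p(C) = v(C) for C ∈ CS have been checked separately; names and the space-saving one-dimensional table are ours.

    def no_profitable_deviation(weights, T, p):
        """The Theorem 4 dynamic program as a 0/1-knapsack variant:
        P[w] is the smallest total payoff of a coalition of total weight
        exactly w.  The outcome is stable against deviations iff every
        weight w >= T has P[w] >= 1 (unreachable weights stay +inf and
        are vacuously stable).  Integer weights, T <= w(I) assumed.
        Runs in O(n * w(I)) time, i.e. pseudopolynomial."""
        W = sum(weights)
        INF = float("inf")
        P = [INF] * (W + 1)
        P[0] = 0.0                           # the empty coalition
        for wi, pi in zip(weights, p):
            for w in range(W, wi - 1, -1):   # downwards: use each agent once
                if P[w - wi] + pi < P[w]:
                    P[w] = P[w - wi] + pi
        return min(P[T:]) >= 1

For Example 1 above, no_profitable_deviation((1, 1, 2), 2, (0.5, 0.5, 1.0)) returns True, matching the brute-force check.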
We now show how to use the algorithm A_InCsCore to check whether, for a given coalition structure CS, there exists an imputation p such that the outcome (CS, p) is in the CS-core. Our algorithm for this problem also runs in pseudopolynomial time.

Theorem 5. There exists a pseudopolynomial algorithm A_p that, given a weighted voting game G = (I; w; T) and a coalition structure CS ∈ CS(G), correctly decides whether there exists an imputation p ∈ I(CS) such that the outcome (CS, p) is in the CS-core of G, and runs in time poly(n, w(I)).

Proof. Suppose CS = {C1, . . . , Ck}. Consider the following linear feasibility program (LFP) with variables p1, . . . , pn:

    pi ≥ 0                 for all i = 1, . . . , n
    Σ_{i∈Cj} pi = 1        for all j such that w(Cj) ≥ T
    Σ_{i∈Cj} pi = 0        for all j such that w(Cj) < T          (1)
    Σ_{i∈J} pi ≥ 1         for all J ⊆ I such that w(J) ≥ T
The first three groups of constraints require that p is an imputation for CS: all payments are non-negative, the sum of payments to the members of each winning coalition in CS is 1, and the sum of payments to the members of each losing coalition in CS is 0. The last group of constraints states that there is no profitable deviation: the payoff to each winning coalition (not necessarily in CS) is at least 1. Clearly, we can implement the algorithm A_p by solving this LFP, as follows. The size of this LFP may be exponential in n, as there is a constraint for each winning coalition. Nevertheless, it is well known that such LFPs can be solved in polynomial time by the ellipsoid method, provided that they have a polynomial-time separation oracle.
A separation oracle is an algorithm that, given an alleged feasible solution, checks whether it is indeed feasible and, if not, outputs a violated constraint [12]. In our case, such an oracle has to verify whether a given vector p violates one of the constraints in (1). It is straightforward to verify whether all pi are non-negative, whether the payment to each winning coalition in CS is 1, and whether the payment to each losing coalition in CS is 0. If any of these constraints is violated, our separation oracle outputs the violated constraint. If this is not the case, we can use the algorithm A_InCsCore described in the proof of Theorem 4 to decide whether there exists a winning coalition J such that w(J) ≥ T, p(J) < 1; this algorithm can easily be adapted to return such a coalition if one exists. If A_InCsCore produces such a coalition, our separation oracle outputs the corresponding violated constraint. If A_InCsCore reports that no such coalition exists, then (CS, p) is in the CS-core of G, so we can output p and stop.

The algorithm A_p described in the proof of Theorem 5 allows us to check whether a given weighted voting game G has a non-empty CS-core: we can enumerate all coalition structures in CS(G), and for each of them check whether there is an imputation p which, combined with the coalition structure under consideration, results in a stable outcome. However, the number of coalition structures in CS(G) is exponential in n, and solving a linear feasibility problem for each of them using the ellipsoid method is prohibitively expensive. We now describe heuristics that can be used to speed up this process.

First, observe that we can exclude from consideration coalition structures that contain more than one losing coalition. Indeed, if any such coalition structure is stable, the coalition structure obtained from it by merging all losing coalitions will also be stable. Moreover, we can assume that each winning coalition C in our coalition structure is minimal, i.e., if we delete any element from C, it becomes a losing coalition. The argument is similar to the previous case: if any coalition structure with a non-minimal coalition C is stable, the coalition structure obtained by moving the extraneous element from C to the (unique) losing coalition is also stable.

Now, suppose that we have a coalition structure CS = {C0, C1, . . . , Ck} such that v(C0) = 0 (C0 can be empty), v(Ci) = 1 for i = 1, . . . , k, and all Ci, i > 0, are minimal. Consider an agent j ∈ Ci, i > 0. If pj > 0 and w(C0) ≥ wj, then CS is not stable: the players in (C0 ∪ Ci) \ {j} can deviate by forming a winning coalition and redistributing the extra payoff of pj between themselves. Set Ci′ = {j ∈ Ci | wj ≤ w(C0)}. The argument above shows that the members of the sets Ci′ get paid 0 under any imputation p such that (CS, p) is stable. Now, set C′ = ∪_{i>0} Ci′. If w(C′) + w(C0) ≥ T, there is no imputation p such that (CS, p) is stable: any such imputation would have to pay 0 to the players in C0 and in each Ci′, but then the players in these sets can jointly deviate and form a winning coalition.

Therefore, we can speed up the algorithm in the proof of Theorem 5 as follows: given a coalition structure CS = {C0, C1, . . . , Ck}, compute the sets Ci′, i = 1, . . . , k, and check whether w(C′) + w(C0) ≥ T. If this is indeed the case, there is no imputation p such that (CS, p) is stable. Otherwise, run the algorithm A_p.
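The preprocessing filter admits a compact implementation. This is a sketch with our own naming; cs is assumed to be ordered with the (possibly empty) losing coalition first, followed by minimal winning coalitions.

    def rejects_structure(weights, T, cs):
        """Return True if the structure cs = [C0, C1, ..., Ck] can be
        discarded without solving the LFP: members of Ci whose weight
        fits inside C0 must be paid 0 in any stable outcome, and if
        those agents together with C0 can win on their own, no stable
        imputation exists for cs."""
        w = lambda S: sum(weights[i] for i in S)
        c0 = cs[0]
        zero_paid = [j for Ci in cs[1:] for j in Ci if weights[j] <= w(c0)]
        return w(zero_paid) + w(c0) >= T

A structure that passes this O(n) test still needs the LFP/ellipsoid check; one that fails it can be discarded immediately.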
Clearly, this preprocessing step is very fast (in particular, unlike A_p, it runs in polynomial time even if the weights are large, i.e., given in binary), and in many cases we will be able to reject a candidate coalition structure without having to solve the LFP (which is computationally expensive). We can also try to optimize the order in which we consider the candidate coalition structures. Heuristics for social-welfare-maximizing coalition structure generation might be of use here [10, 9].
7 Conclusions
In this paper, we extended the model of weighted voting games (WVGs) to allow for the formation of coalition structures, thus permitting more than one coalition to be winning at the same time. We then studied the problem of stability of the resulting structure in such games. Specifically, we introduced the CS-core (the core with coalition structures), and discussed its properties by relating it to the traditional concept of the core for WVGs and proving sufficient conditions for its non-emptiness. Following that, we showed that deciding CS-core non-emptiness or checking whether an outcome is in the CS-core are computationally hard problems (unlike what holds in the traditional WVG setting). However, for specific classes of games, we presented polynomial-time algorithms for checking if a given outcome is in the CS-core, and for discovering a CS-core element given a coalition structure. We then suggested heuristics that, combined with these algorithms, can be used to generate an outcome in the CS-core.

We believe that the line of work presented here is important: weighted voting games are well understood, and the addition of coalition structures increases the usability of this intuitive framework in multiagent settings (where weights can represent resources and thresholds do not necessarily exceed 50%). In terms of future work, we intend, first of all, to come up with new heuristics to speed up our algorithms. In addition, notice that the algorithms and heuristics of Sec. 6 provide essentially centralized solutions to their respective problems. Therefore, we are interested in studying decentralized approaches; to begin, we intend to speed up, in the WVG context, the exponential decentralized coalition formation algorithm of [5]. Finally, studying other solution concepts in this context, such as the Shapley value [8], is also within our intentions.

Acknowledgements. This research was undertaken as part of the ALADDIN (Autonomous Learning Agents for Decentralised Data and Information Networks) project. ALADDIN is jointly funded by a BAE Systems and EPSRC strategic partnership (EP/C548051/1).
REFERENCES
[1] K. Apt and T. Radzik, Stable Partitions in Coalitional Games, 2006. Working Paper, available at http://arxiv.org/abs/cs.GT/0605132.
[2] R.J. Aumann and J.H. Dreze, 'Cooperative Games with Coalition Structures', International Journal of Game Theory, 3(4), 217–237, (1974).
[3] P. Caillou, S. Aknine, and S. Pinson, 'Multi-agent models for searching Pareto optimal solutions to the problem of forming and dynamic restructuring of coalitions', in Proc. of ECAI'02, pp. 13–17, (2002).
[4] G. Chalkiadakis and C. Boutilier, 'Bayesian Reinforcement Learning for Coalition Formation Under Uncertainty', in Proc. of AAMAS'04.
[5] T. Dieckmann and U. Schwalbe, Dynamic Coalition Formation and the Core, 1998. Economics Dept. Working Paper Series, National University of Ireland, Maynooth.
[6] E. Elkind, L.A. Goldberg, P.W. Goldberg, and M. Wooldridge, 'Computational complexity of weighted threshold games', in Proc. of AAAI'07.
[7] M. Garey and D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co., New York, 1990.
[8] R. Myerson, Game Theory: Analysis of Conflict, 1991.
[9] T. Rahwan, S. Ramchurn, A. Giovannucci, V. Dang, and N. R. Jennings, 'Anytime optimal coalition structure generation', in Proc. of AAAI'07.
[10] T. Sandholm, K. Larson, M. Andersson, O. Shehory, and F. Tohme, 'Anytime coalition structure generation with worst case guarantees', in Proc. of AAAI'98, (1998).
[11] T. Sandholm and V.R. Lesser, 'Coalitions Among Computationally Bounded Agents', Artificial Intelligence, 94(1), 99–137, (1997).
[12] A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency, Springer, 2003.
[13] A. Taylor and W. Zwicker, Simple Games: Desirability Relations, Trading, Pseudoweightings, Princeton University Press, Princeton, 1999.
[14] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton, 1944.
Agents Preferences in Decentralized Task Allocation
Mark Hoogendoorn 1,2 and Maria L. Gini 2

Abstract. The ability to express preferences for specific tasks in multi-agent auctions is an important element for potential users who are considering using such auctioning systems. This paper presents an approach to make such preferences explicit and to use these preferences in bids for reverse combinatorial auctions. Three different types of preference are considered: (1) preferences for particular durations of tasks, (2) preferences for certain time points, and (3) preferences for specific types of tasks. We study empirically the trade-offs between the quality of the solutions obtained and the use of preferences in the bidding process, focusing on effects such as increased execution time. We use both synthetic data as well as real data from a logistics company.
1 Introduction

Auctions are used in multi-agent systems, among other things, to perform allocation of tasks (see e.g. [13] and [14]). Such reverse auctions, where the buyer is the auctioneer, can be of a combinatorial type, allowing for bidding on bundles of tasks. Sandholm [12] notes that reverse auctions are not economically efficient because optimal bundling depends on suppliers' preferences, which traditionally cannot be expressed. Enabling the agents to express the preferences of their users is an important requirement for actual companies and people to use agents for bidding. In this paper we propose a concrete preference function to be used by an agent to express preferences over tasks. This function expresses preferences for specific properties of tasks, and it is used in a decentralized task allocation setting. We introduce a bidding algorithm, where an agent bids on its most preferred tasks that are feasible given its current commitments. This algorithm uses a pricing mechanism which depends on the actual cost to perform the tasks and on the preference for the task. The influence of preferences on the price can be varied by setting a parameter (see the role of the parameter p in the algorithm in Section 3.5). Using this algorithm, we investigate the impact of preferences upon other aspects of task execution, such as execution time. We use both synthetic as well as real data from a logistics company.

This paper is organized as follows. First, the auctioning system used throughout the paper is introduced in Section 2. Section 3 introduces a function to express preferences and a bidding algorithm based upon such preferences. Experiments to evaluate the bidding algorithm and to study the trade-off between preferences and efficiency of task execution are presented in Section 4. Section 5 discusses related work, and finally, Section 6 concludes the paper.
1 Vrije Universiteit Amsterdam, Department of Artificial Intelligence, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands, email: mhoogen@cs.vu.nl
2 University of Minnesota, Department of Computer Science and Engineering, 200 Union Street SE, Minneapolis, MN 55455, United States of America, email: gini@cs.umn.edu
2 The MAGNET System

The approach we present exploits some unique features of the MAGNET [4] system, which allows autonomous agents to negotiate over coordinated tasks with precedence and time constraints. The MAGNET system consists of: (1) a customer agent, which puts tasks up for auction; the tasks have time constraints and other restrictions; (2) supplier agents, which bid on the tasks and execute them if awarded; and (3) the MAGNET market server, which keeps track of the activities of the agents and of the auctions. The main interactions between agents in the MAGNET system are as follows:

• A customer agent issues a Request for Quotes (RFQ) which specifies the tasks, their precedence relations, and a time line for the bidding process. For each task, a time window specifies the earliest time the task can start and the latest time the task can end.
• Supplier agents submit bids. A bid includes one or more tasks, a price, the portion of the price to be paid as a non-refundable deposit, and the estimated duration and time window for task execution. Bids reflect supplier resource availability and constrain the customer's scheduling process.
• The customer agent decides which bids to accept. Each task needs to be mapped to one bid, and the constraints of all awarded bids must be satisfied in the final schedule. In MAGNET the customer can choose from a collection of winner-determination algorithms (A*, IDA* [2], simulated annealing, and integer programming [3]).
• The customer agent awards bids and specifies the work schedule.
3 Preference Algorithm

In the bidding algorithm we propose, price is used as a mechanism to express preferences for tasks. Preferences in our case can be a combination of the following: (1) a preference for tasks of a particular duration (e.g. I hate performing very short tasks), (2) a preference for tasks at particular times during the day (e.g. I love getting up early in the morning, so give me tasks that ought to start early in the morning), and (3) a preference for particular types of tasks (e.g. I really hate to perform a task like that). We show how to express these preferences and how to combine them. The preference for a task is referred to as φ_task, which we express using a real number in the interval [0, 1]. Hereby, 1/2 indicates a neutral preference, 0 is not preferred, and 1 is fully preferred. Since humans typically do not think in terms of a number when specifying preferences, we provide for each of the preference types covered a more intuitive formulation, as explained next. The specifics of how preferences are computed could be adapted for different domains, while keeping the overall approach.
3.1 Preferences for Duration

Let the preference to perform tasks of a certain duration be an integer. Such an integer can indicate either a minimum or a maximum duration (i.e. d_min, d_max). Let d_min be the minimum duration you want a task to last, i.e. you want the task to last longer than d_min. Durations below d_min are not preferred. If the duration is precisely d_min your preference is 1/2, i.e. neutral. Let d_close be an integer that indicates how much longer than d_min you want the task to last for it to be fully preferred. Tasks with duration in the range [d_min, d_min + d_close] are more preferred than neutral, but not fully preferred. Any duration longer than d_min + d_close is fully preferred. Then the preference φ_duration of a task with duration d_task can be calculated as follows:

• if there is a preference for minimum duration d_min:
  d_task ≥ d_min: φ_duration,task = 1/2 + min(1/2 × (d_task − d_min)/d_close, 1/2)
  d_task < d_min: φ_duration,task = max(d_task/d_min − 1/2, 0)
• if there is a preference for maximum duration d_max:
  d_task ≤ d_max: φ_duration,task = 1/2 + min(1/2 × (d_max − d_task)/d_close, 1/2)
  d_task > d_max: φ_duration,task = max(d_max/d_task − 1/2, 0)

3.2 Preferences for Time Points

Let the preference for particular time points be indicated by a time of the day (e.g. 6.30 a.m.). Such a preference can indicate that the time needs to be before a particular time point t_before, or after a time point t_after. Let t_close indicate a time which is considered close to a particular time point. Again, the preference for a task which is precisely at the specified time point t_before or t_after is 1/2, i.e. neutral. The preference for a given start time t_task can now be calculated as follows (note that for calculations using time points these are represented in seconds of the day):

• if a preference has been set for a task time before t_before:
  t_task ≤ t_before: φ_time,task = 1/2 + min(1/2 × (t_before − t_task)/t_close, 1/2)
  t_task > t_before: φ_time,task = max(t_before/t_task − 1/2, 0)
• if a preference has been set for a task time after t_after:
  t_task ≥ t_after: φ_time,task = 1/2 + min(1/2 × (t_task − t_after)/t_close, 1/2)
  t_task < t_after: φ_time,task = max(t_task/t_after − 1/2, 0)

3.3 Preferences for Tasks

The last way to express preferences is for particular types of tasks. Let type_task be the type of a given task. The type of a task is specified by means of a certain range of integers, whereby integers are ordered based upon similarity of the tasks. For example, if the tasks are represented on the interval [0, 100], then the task identified with 1 is completely different from the task identified with 100, but has great similarity with the task identified with 2. Let the preferred tasks include a certain range of tasks [type_lower, type_upper]. Furthermore, let type_close be an integer that expresses when a task is close to another task. The preference is calculated as follows:

• if (type_lower ≤ type_task) ∧ (type_upper ≥ type_task): φ_type,task = 1
• if type_lower > type_task: φ_type,task = max(type_close/(type_lower − type_task), 0)
• if type_upper < type_task: φ_type,task = max(type_close/(type_task − type_upper), 0)

3.4 Combining Preferences

The preferences specified above are usually combined. We use a weighted sum of the preferences, setting the weight to 0 if a preference is not expressed:

φ_task = w_duration × φ_duration,task + w_time × φ_time,task + w_type × φ_type,task, where w_duration + w_time + w_type = 1
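To make the preference functions and their combination concrete, here is a minimal sketch in Python (our own illustration; the paper gives formulas, not code, and all identifiers below are ours). It implements the minimum-duration and before-time variants; the maximum-duration and after-time variants are symmetric:

```python
def pref_duration(d_task, d_min, d_close):
    """Minimum-duration preference: 0.5 (neutral) at d_min, 1.0 beyond
    d_min + d_close, decaying towards 0 for durations below d_min."""
    if d_task >= d_min:
        return 0.5 + min(0.5 * (d_task - d_min) / d_close, 0.5)
    return max(d_task / d_min - 0.5, 0.0)

def pref_time(t_task, t_before, t_close):
    """Before-time preference; time points are in seconds of the day."""
    if t_task <= t_before:
        return 0.5 + min(0.5 * (t_before - t_task) / t_close, 0.5)
    return max(t_before / t_task - 0.5, 0.0)

def pref_type(type_task, type_lower, type_upper, type_close):
    """Task-type preference over an integer similarity scale."""
    if type_lower <= type_task <= type_upper:
        return 1.0
    gap = type_lower - type_task if type_task < type_lower else type_task - type_upper
    return max(type_close / gap, 0.0)

def phi_task(prefs, weights):
    """Weighted combination; unexpressed preferences get weight 0 and
    the weights sum to 1."""
    return sum(w * p for w, p in zip(weights, prefs))

# A 90-minute task starting at 6:00 (21600 s) with task type 40:
prefs = [pref_duration(90, d_min=60, d_close=60),
         pref_time(21600, t_before=25200, t_close=7200),
         pref_type(40, type_lower=30, type_upper=50, type_close=10)]
print(phi_task(prefs, [0.4, 0.3, 0.3]))  # 0.825
```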
3.5 Bidding Algorithm with Preference for Tasks

We assume that the supplier agent owns a single resource with a particular capability (with which, of course, a number of different task types can be performed, as explained earlier). Furthermore, the resource has an availability slot (i.e. a begin and end time), a particular type_begin with which the resource is initially set up, and a type_end at which the use of the resource needs to end. The supplier agent maintains a schedule of the tasks planned for its resource.

We now present a bidding algorithm that takes the preference values φ_task into account. The algorithm is greedy: supplier agents try to bid upon as many tasks as feasible to maximize the usage of their resource. The algorithm uses a parameter, p, to vary the influence of the preference upon the eventual price bid. The tasks within an RFQ are first ordered based upon their preference. If some tasks have identical preferences, they are ordered according to the earliest start time specified in the RFQ for the tasks included. We assume that there exists a function switch_time: TASKTYPE × TASKTYPE → DURATION that calculates the switching time from one task type to another (when it can be performed on the resource). Furthermore, performance_time: TASKTYPE → DURATION expresses the time needed to perform the task.

Bidding Algorithm. Let latest_end_time_previous be the latest end time of the previous task in the current schedule of the resource (or the schedule start time in case no such task exists), and type_previous the type of the previous task (or the start type in case of no prior task). Let latest_start_time_next be the latest start time of the next task (or the schedule end time in case no such task exists), and type_next the type of the next task (or the end type in case no such task exists).

For each preference-ordered task:
  Check if the task (current) can be done using the resource. If yes, see if it fits in the current schedule (see below).
  From the beginning of the schedule, for each empty slot in the schedule do:
    if the task fits in the current empty slot in the schedule, then insert the task in the bid, add its time parameters to the schedule, and compute the price of the bid (see below);
    else if latest_end_time_current > latest_end_time_next, then continue with the next slot;
    else continue with the next task.

To see if the task fits in the schedule, check whether the following holds:

[(latest_end_time_previous + switch_time(type_previous, type_current)) ≤ latest_start_time_current] ∧
[(latest_start_time_next − switch_time(type_current, type_next) − performance_time(type_current)) ≥ earliest_start_time_current] ∧
[(latest_end_time_previous + switch_time(type_previous, type_current)) ≤ (latest_start_time_next − switch_time(type_current, type_next) − performance_time(type_current))]

The price of the bid is computed as follows (note the parameter p):

price_task = (1 + (p × (1 − φ_task))) × [switch_time(type_previous, type_current) + switch_time(type_current, type_next) + performance_time(type_current)]

We have shown earlier how to calculate the value of φ_task for the different types of preferences. This price equation assumes a certain standard price for each minute of time spent. In case these costs vary, the cost per minute can be included as an additional parameter.
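The slot-fit test and the price computation above translate into a short sketch (again our own illustration with hypothetical types; the MAGNET implementation itself is not shown in the paper):

```python
from dataclasses import dataclass

@dataclass
class Slot:
    """Minimal stand-in for a task or a schedule boundary."""
    type: int
    earliest_start: float = 0.0
    latest_start: float = 0.0
    latest_end: float = 0.0

def fits(task, prev, nxt, switch_time, performance_time):
    """The three conjuncts of the fit condition above."""
    ready = prev.latest_end + switch_time(prev.type, task.type)
    must_leave = (nxt.latest_start - switch_time(task.type, nxt.type)
                  - performance_time(task.type))
    return (ready <= task.latest_start
            and must_leave >= task.earliest_start
            and ready <= must_leave)

def bid_price(task, prev, nxt, phi, p, switch_time, performance_time):
    """price_task = (1 + p*(1 - phi)) * time occupied by the task; a
    per-minute cost factor could multiply the result if costs vary."""
    occupied = (switch_time(prev.type, task.type)
                + switch_time(task.type, nxt.type)
                + performance_time(task.type))
    return (1 + p * (1 - phi)) * occupied
```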
4 Experimental Setup

We now describe the effect of adjusting the parameter p in the bidding algorithm defined above. Furthermore, we study the effect of the preferences on the duration of task execution, which is an indicator of how efficiently the tasks are being performed. Together these form the utility function of the suppliers. Of course it is expected that having more preferences awarded will result in a less efficient execution. We are interested in assessing the severity of these effects. We performed experiments using synthetic data, and experiments using a real dataset obtained from a trucking company.

4.1 Experimental Setup with Synthetic Data

We start by describing the parameters in the setup with synthetic data, and specify the actual settings used. There are many parameters that can influence the results. Many of them influence the difficulty of the task allocation problem in general. These include:

1. The number of tasks to be allocated.
2. The number of resources available.
3. The ratio between the resources required to perform the tasks and the availability of those resources (e.g. one resource might be more scarce than another). This also includes the specification of duration of tasks, switching time, and initial resource settings.
4. The tightness of the time windows specified in the tasks. Wider time windows allow more flexible scheduling of tasks, therefore finding a solution is easier.

The preference value itself is influenced by other parameters, including the following:

1. The parameter setting for the preference functions (e.g. what is considered to be a close-by task; the stricter this norm is, the more easily preferences can be met).
2. The variation of tasks that exist (i.e. more variation means that it will be easier to get your preferences met).

Finally, other parameter settings can be varied, such as the number of iterations, and the value of the parameter p, which is used in the bidding algorithm to determine the influence of preferences on price.

4.1.1 Parameter Settings Used

We set the parameters of the preference functions and the variety of tasks to fixed values. This means that the preference function itself remains constant over time, so that the influence of the parameter p is the only variation regarding the preference function.

We used several variations of the difficulty of the task allocation throughout the experiments. In particular we considered a market where more than sufficient resources are available (overflow) versus a market where resources are insufficient (shortage). Furthermore, the tightness of the time windows was varied by either setting them very tight or setting them wide. More precisely, the following parameters have been used to affect the tightness of tasks:

1. The number of tasks was fixed to 10.
2. The number of resources available varied between 12 (tight market) and 50 (plenty of resources available).
3. The ratio between the required resources to perform tasks and the availability of those resources was fixed. We had three types of resources, each generated with an equal probability. The number of different tasks per resource was set to 9999. The maximum time to change from one task to another was set to 100 minutes. Task types were generated in a random fashion with an equal probability as well.
4. The tightness of the time windows specified in the tasks was varied between just sufficient time to perform the task and twice the time needed plus two full hours.

The parameter settings for the preferences are set so that initially the preference for tasks is around 60%, equally divided over the different preferences. Each of the agents is assigned one type of preference at random. The parameter p varied between 0 and 5.

4.1.2 Results

[Figure 1. Preferences met for varying values of p]
Figure 1 shows the average preferences for tasks with varying p for the different market types and time window settings. As can be seen, the easiest way to get the preferences met is the overflow market with wide time windows. The most difficult is the shortage market with tight time windows. The curves of the shortage market are less steep compared to the overflow market. The influence of the time windows on the average preference value is that the curve is basically lower by a certain constant value. The shape of the curve does not change for varying time window settings (i.e. in both the shortage market and
the overflow market, the shape of the curve is the same for narrow and wide time windows).

[Figure 2. Preferences met versus increase in duration]

Figure 2 shows the average preference for tasks on the x-axis and the increase in the average duration to perform the tasks on the y-axis. This clearly shows the trade-off between preferences awarded and the efficiency of task execution. All curves look similar (an x^n type shape) except for the point where the huge increase starts, which varies for the different types of markets. The only exception is the curve of the overflow market with narrow time windows. In this case the results are less stable compared to the other results. The curve with the lowest preference value, after which a steep increase is observed, is the one in the shortage market with narrow time windows. This makes sense because there is hardly any room for allocating tasks to other agents. The curve with the highest point is the overflow market with wide time windows, in which there is plenty of space to express preferences and get them awarded.

4.2 Trucking Data

Besides synthetic data, we tested our approach using a real company dataset from the trucking domain. The dataset consists of a number of container transports that need to take place. Tasks require a certain transportation from one zip code (the pickup location) to an intermediate location (the delivery location), ending at a third location (the return location). Therefore, a task description does not consist of one integer specifying the task, as before, but of three integers. Furthermore, each task is associated with a certain early start time and a particular deadline at which the container needs to be returned at the return location. In addition to the containers that require transportation, the dataset also specifies which trucks are available. These can carry one container at a time (so only one type of resource is available), and have a certain availability slot specifying when the truck becomes available and when the truck needs to be returned. A location is also specified where the truck starts, and where it has to end. This nicely maps to the algorithm specified. The performance time is now defined as the time to go from the pickup to the delivery location, plus the time to go from the delivery location to the return location. The switching time is no longer an artificial time, but the actual driving time from one zip code to another. The only artificial data we have generated are the preferences of the various trucks. This is done according to the method mentioned for the synthetic data. Finally, the preference for type of tasks is the average of the three different integers included in the task description (i.e. pickup, delivery, and return location).

4.2.1 Results

[Figure 3. Preferences met for trucking dataset, for varying values of p]

Figure 3 shows how the value of p affects the average preference for tasks. It can be seen that the value of p required to increase the average preference significantly is much lower than for the random dataset. Furthermore, the limit seems to be comparable with the overflow market with wide time windows.

[Figure 4. Average task preference versus duration of performing the tasks]

Figure 4 shows the preference value versus the average duration increase, i.e. the trade-off between preferences met and the efficiency of execution. It can be seen that there is hardly any correlation between the average preference value of the trucks and the average increase in duration. This is of course very good news for the trucking company, because it means they can award drivers their preferences without increasing the total driving time. This is assuming that preferences are equally divided amongst the truckers, as in the experimental setup.
5 Related Work

In the field of combinatorial auctions, a lot of attention has been devoted to finding out the exact preference for particular bundles of tasks (see e.g. [5] and [11]). In general a certain preference for each of the bundles is assumed, but no detail is given on how the bidder arrives at such a preference value. In this paper we introduce a preference function that allows for a more intuitive specification of preferences, taking multiple aspects of the tasks into account. Preferences for different aspects of the tasks are combined using a weighted average to produce a single preference value. In research on preference elicitation, typically the impact on selling is addressed, but not the precise influence of preferences upon the quality of the solution. In this paper, we show how the allocation of tasks in a decentralized fashion directly influences the quality of the solution, and we explore the relationship between the average preference of tasks and the solution quality.

In [7] an approach for scheduling a meeting between agents is proposed, which takes into account the preferences of the agents. The relationship between such preferences and the quality of the solution is addressed, but the problem is not studied from the perspective of combinatorial auctions. Task allocation can also be performed from a centralized perspective, using preferences as soft constraints. See, for example, [9] for an approach to consider preferences in decision making. There are decentralized variants of constraint optimization, but the agents in our case are not necessarily cooperative. In the field of planning and scheduling, preferences have been considered as well. Languages have been developed that allow for the specification of preferences and soft constraints (see e.g. [8]).

The logistics domain we use for our experiments has been researched for quite some time (see e.g. [10]), mainly focusing on calculating optimal solutions from a centralized perspective. For instance, in [6] the problem addressed is to find optimal routes for transportation orders of a large set of users. Orders have to be picked up and delivered at specific locations, within a given time window, and using a limited number of trucks. The solution proposed is centralized, and it is used to support a human dispatcher. The current trend in logistics requires an even more distributed setting because of the use of fourth party logistics (4PL) [1]. 4PL companies sign contracts with large companies to arrange their entire transportation demand. These companies, however, do not have sufficient resources of their own to arrange all these transports and therefore distribute many of those tasks to other (partner) companies. Centralized calculation might no longer be feasible due to lack of complete information (availability of resources is too sensitive for a company to communicate) as well as the complexity of calculating an optimal solution within a short period (time is crucial in this business).
6 Conclusions

We have presented an approach to specify preferences for tasks in a combinatorial auction setting. Allowing users to specify such preferences is essential for them to use auctions and to increase the economic efficiency of reverse auctions, as reported, for instance, in [12]. We propose a preference function and use it in a bidding algorithm where bids on non-preferred tasks have a higher price.

We evaluated our approach in two ways, first by rigorously testing it with synthetic data. Several parameters have been varied, namely the tightness of the time windows within a certain schedule and the relative availability of resources. It was shown that it was easiest to get preferences awarded in markets with wide time windows. The trade-off between meeting preferences and overall execution time has been studied in depth. We have shown that the overall execution time is influenced most in the case of the overflow market, due to the fact that in the shortage market there are hardly any alternatives at hand and therefore, although the agent might not prefer a task, it will still get its bid awarded. The curves observed tend to have the same shape when the time window setting changes but the market type remains the same. For different market types, the curves vary in steepness.

Besides testing with synthetic data, we have also used a real company dataset from the trucking domain. We have shown that the bidding algorithm is effective in awarding suppliers more preferred tasks. The influence of this preference on the overall solution quality was not observed using the real dataset. Hence, in this setting the preferences being met have much less influence on the efficiency of the solution found.

For future work, it would be interesting to find out whether other real datasets would show the same results as the dataset used in this paper. Furthermore, exploring how well companies can express their preferences using these functions would be interesting as well.

Acknowledgments: Partial support is gratefully acknowledged from NSF under grant IIS-0414466.
REFERENCES
[1] P. Briggs. The hand-off: the future of outsourced logistics may be found in the latest buzzword [fourth party logistics]. Canadian Transportation Logistics, 102(5):18, 1999.
[2] J. Collins, G. Demir, and M. Gini. Bidtree ordering in IDA* combinatorial auction winner-determination with side constraints. In J. Padget, O. Shehory, D. Parkes, N. Sadeh, and W. Walsh, editors, Agent Mediated Electronic Commerce IV, volume LNAI 2531, pages 17–33. Springer-Verlag, 2002.
[3] J. Collins and M. Gini. An integer programming formulation of the bid evaluation problem for coordinated tasks. In B. Dietrich and R. V. Vohra, editors, Mathematics of the Internet: E-Auction and Markets, volume 127 of IMA Volumes in Mathematics and its Applications, pages 59–74. Springer-Verlag, New York, 2001.
[4] J. Collins, W. Ketter, and M. Gini. A multi-agent negotiation testbed for contracting tasks with temporal and precedence constraints. Int'l Journal of Electronic Commerce, 7(1):35–57, 2002.
[5] W. Conen and T. Sandholm. Preference elicitation in combinatorial auctions. In Proc. First Int'l Conf. on Autonomous Agents and Multi-Agent Systems, volume 1, pages 168–169, Bologna, Italy, July 2002.
[6] K. Dorer and M. Calisti. An adaptive solution to dynamic transport optimization. In Proc. Fourth Int'l Conf. on Autonomous Agents and Multi-Agent Systems, pages 45–51, 2005.
[7] M. Franzin, E. Freuder, F. Rossi, and R. Wallace. Multi-agent meeting scheduling with preferences: efficiency, privacy loss, and solution quality. In Proc. of AAAI Workshop on Preference in AI and CP, 2002.
[8] A. Gerevini and D. Long. Preferences and soft constraints in PDDL3. In Proc. ICAPS Workshop on Planning with Preferences and Soft Constraints, 2006.
[9] R. L. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley, 1976.
[10] T. Magnanti. Combinatorial optimization and vehicle fleet planning: Perspectives and prospects. Networks, 11:179–214, 1981.
[11] D. C. Parkes. Auction design with costly preference elicitation. Annals of Mathematics and Artificial Intelligence, 44:269–302, 2005.
[12] T. Sandholm. Expressive commerce and its application to sourcing: How we conducted $35 billion of generalized combinatorial auctions. AI Magazine, 28(3):45–58, Fall 2007.
[13] R. G. Smith. The contract net protocol: High level communication and control in a distributed problem solver. IEEE Trans. Computers, 29(12):1104–1113, December 1980.
[14] W. Walsh and M. Wellman. A market protocol for decentralized task allocation and scheduling with hierarchical dependencies. In Proc. of 3rd Int'l Conf. on Multi-Agent Systems, 1998.
Game Theoretical Insights in Strategic Patrolling: Model and Algorithm in Normal-Form
Nicola Gatti
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy, ngatti@elet.polimi.it

Abstract. In the artificial intelligence literature there is a rising interest in studying strategic interaction situations. In these situations a number of rational agents act strategically, being in competition, and their analysis is carried out by employing game theoretical tools. One of the most challenging strategic interaction situations is strategic patrolling: a guard patrols a number of houses in the attempt to catch a robber, which, in its turn, chooses a house to rob in the attempt not to be caught by the guard. Our contribution in this paper is twofold. Firstly, we provide a critique of the models presented in the literature and we propose a model that is game theoretically satisfactory. Secondly, we exploit the game theoretical analysis to design a solving algorithm more efficient than state-of-the-art ones.
1 Introduction
The study of strategic interaction situations, commonly named non-cooperative games, has been receiving more and more attention in the artificial intelligence literature [7]. For instance, the problem of automating agents in negotiations [4] and in auctions [7] is usually modeled as a strategic interaction problem. Commonly, strategic interaction situations are tackled by employing game theoretical tools [3], in which one distinguishes the mechanism (i.e., the rules according to which agents interact) from the strategies (i.e., the behaviors of the agents in the game). Given a mechanism, rational agents should behave in order to maximize their revenue. An interesting open strategic interaction problem is strategic patrolling [8, 9]. This problem is characterized by a guard that decides which houses to patrol and how often, and by a robber that decides which house to strike. Obviously, the guard will not know in advance exactly where the robber will choose to strike. Moreover, the guard does not know with certainty what adversary it is facing. A common approach for choosing a strategy for agents in such a scenario is to model the scenario as a Bayesian game [3]. A Bayesian game is a game in which agents may belong to one or more types; the type of an agent determines its payoffs. The probability distribution over agents' types is common knowledge. The appropriate solution concept for these games is the Bayes-Nash equilibrium [3]. In [8] the authors propose a model for strategic patrolling and an algorithm to solve it. Specifically, they model the situation as a Bayesian game. The guard's actions are all the possible routes of houses, while the robber's action is the choice of a single house to rob. The robber can be of several types with a given probability distribution. Moreover, the robber can observe the actions undertaken by the guard and choose its optimal action on the basis of this observation. We show in this paper that the model proposed in [8] is not game
theoretically satisfactory. Indeed, we show that such a model does not effectively capture the possibility available to the robber to observe the actions undertaken by the guard. A further issue raised by [8] concerns the time required to compute solutions. Although the algorithm proposed by the authors finds solutions that are computationally less hard than Bayes-Nash, the computation of a solution is not affordable even in very simple settings. This paper provides two original contributions. The first contribution concerns the design of a strategic interaction model for strategic patrolling that is game theoretically satisfactory. Precisely, we provide a critique of the model presented in [8], showing why it is not satisfactory in real-world settings. Subsequently, we provide a satisfactory Bayesian game model. The second contribution concerns the design of an efficient solving algorithm. The algorithmic game theory literature provides a number of off-the-shelf algorithms able to solve a large class of games [7]. However, these algorithms have exponential complexity in the worst case and cannot address real-world settings. The exploitation of game theoretical analysis can lead to improving the efficiency of the solving algorithms and therefore to addressing real-world problems. This approach, although very preliminary, has been successfully followed in [2, 4], where the authors provide efficient algorithms for bargaining situations. The contribution of game theoretical analysis in the design of efficient algorithms can be twofold. Firstly, game theoretical analysis can be employed to reduce the space of search, e.g. by excluding all the strategy profiles that can be assured not to be of equilibrium independently of the parameters of the game. Secondly, it can be employed to "guide" the searching algorithm, e.g. by choosing specific orders over the strategy profiles according to which the algorithm searches for the equilibrium [10]. In this paper we exploit game theoretical analysis to the limited extent of the first issue: the reduction of the space of search. We propose an algorithm much more efficient than off-the-shelf ones, its space of search being dramatically reduced with respect to the one considered by these algorithms. However, the space of search of the proposed algorithm grows exponentially in the size of the problem, and therefore the algorithm needs to be improved by considering also the second issue: the exploitation of information to efficiently guide the search. This second issue will be considered in future work. This paper is structured as follows. The next section reviews the strategic patrolling model presented in [8] and provides a critique of it. Section 3 proposes a satisfactory game model for the considered situation. Section 4 provides some game theoretical insights concerning the proposed model and Section 5 exploits these to design a solving algorithm. Section 6 closes the paper.
2 Basic Strategic Patrolling Model and Critique
We briefly review the model proposed in [8]. The strategic situation to be considered is constituted by m houses, denoted by 1, . . . , m, and two agents: a guard, denoted by g, and a robber, denoted by r. Essentially, g chooses a patrolling strategy, i.e. a route of houses, in the attempt to catch r, which, in its turn, chooses the house to rob in the attempt not to be caught by g. For the sake of simplicity, the following assumptions are commonly made:

• time is discretized in turns;
• g takes one turn to patrol one house, independently of the patrolled house;
• r takes d turns to rob one house, independently of the robbed house;
• the time needed by g to move between two houses is negligible.

Agents act simultaneously and their available actions are:

g: it can choose a route of d houses to patrol, e.g. 1, 2, 3, . . .;
r: it can choose one house to rob, e.g. 1.

Possible outcomes are the following: if the house chosen by r is within the route chosen by g, then g catches r; otherwise r robs the house. Players' preferences over the outcomes are expressed by the following payoffs:

g: it assigns the outcome wherein r is caught an evaluation x^0 and assigns each outcome wherein house i is robbed an evaluation x^i. If g catches r, then g's payoff is x^0, otherwise its payoff is x^i where i is the robbed house. Customarily, it is assumed that x^0 > max{x^i} with i > 0;
r: it assigns each house i an evaluation y^i and assigns its being caught an evaluation y^0. If r is caught by g, then r's payoff is y^0, otherwise its payoff is y^i where i is the robbed house. Customarily, it is assumed that y^0 < min{y^i} with i > 0.

Finally, it is assumed that g's preferences are common knowledge, while r's are not. Precisely, it is assumed that r can be of n types with a given probability distribution. We denote type i of r by r_i. According to Harsanyi, such a game is cast into an imperfect-information game wherein nature, denoted by N, initially chooses the type of r and g does not perfectly know which game it is playing [3]. An example with m = 2, n = 2, and d = 1 is depicted in Fig. 1.
[Figure 1. Game tree with two houses, denoted by 1 and 2, and with d = 1.]
The appropriate solution concept for a game such as the one we are dealing with is the Bayes-Nash equilibrium [3]. It prescribes one strategy σ_g* for g and one, generally different, strategy σ_{r_i}* for each r_i. The peculiarity of this solution concept is that g maximizes its
expected payoff according to its beliefs, i.e. the probability distribution over r's types. It can be shown – we omit the pertinent proof for reasons of space – that agents' equilibrium strategies prescribe that g randomizes over all the possible routes wherein houses are patrolled only one time, e.g. with d = 2 all routes i, j such that i ≠ j. It can also be shown that, in order for a strategy profile to be an equilibrium, at least one of r's types must randomize. The above model is satisfactory when g and r act simultaneously. However, in real-world applications it is unreasonable to assume that r always acts at the turn where g starts to patrol. This is essentially due to two reasons. Firstly, g cannot synchronize the beginning of its patrolling route with r's action, since g cannot observe r. Secondly, r could wait for one or more turns before choosing the house to rob in order to observe g's strategy and take advantage of this observation. Thus, there is a discrepancy between the situation captured by the above model, i.e. r cannot do anything but choose the house to rob, and the real-world situation, i.e. r can wait for some turns observing g's strategy. This discrepancy must be carefully studied in order to evaluate the effectiveness of the above model. Exactly, we need to verify whether in real-world situations r violates the protocol prescribed by the above model. Technically speaking, we need to verify whether r can improve its revenue by waiting. In the affirmative case, r will wait, thus violating the protocol, and then the above model will not be satisfactory. In what follows we show that on the equilibrium path r waits. At first, if r waits for one or more turns, the game could close after d turns. However, the above model captures a strategic situation d turns long and does not prescribe how g behaves after t = d. Since we are limiting our analysis to the above model, we can only assume that g repeats its equilibrium strategy every d turns. With this extended model it can be shown that r can improve its expected utility by waiting for one or more turns in order to partially observe the route of g, exploiting this information to choose its strategy. (This is essentially because the model does not perfectly capture the situation we are considering: it implicitly assumes that r can enter the house to rob only every d turns, whereas r could enter it at every turn.) We report an example. Consider a setting with three houses, d = 2, one r type, and y^i = y^j = y^H for all i, j > 0. Call α_{ij} the probability prescribed by σ_g* to make the route i, j. It can be easily shown that α_{ij} = 1/6 for all i ≠ j with i, j > 0. If r immediately enters the house to rob, playing at the initial turn, its optimal strategy is to randomize with probability 1/3 among the three houses and its expected utility is (2/6)·y^H + (4/6)·y^0. If r waits for one turn to observe g's action, it can improve its expected utility. Precisely, if r has observed that g has patrolled house i in the first turn, then r's expected utility of choosing house i in the second turn is (4/6)·y^H + (2/6)·y^0 which, being y^H > y^0, is strictly greater than r's expected utility of robbing at the initial turn. Therefore, r will wait for one turn rather than rob a house immediately.
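The waiting argument can be checked by brute-force enumeration. The sketch below (a verification of the example only, not code from the paper) enumerates the six equiprobable routes for three houses and d = 2, and compares the robber's expected utility when robbing immediately versus after observing the first patrolled house, assuming g restarts an independent equilibrium route every d turns:

```python
import itertools

Y_H, Y_0 = 1.0, 0.0          # any values with Y_H > Y_0 give the same conclusion
houses = [0, 1, 2]
routes = list(itertools.permutations(houses, 2))   # 6 routes, probability 1/6 each

# Rob immediately: r occupies house h during turns 1-2; caught iff h is on the route.
eu_now = sum(Y_H if h not in route else Y_0
             for route in routes for h in houses) / (len(routes) * len(houses))

# Wait one turn: after seeing g at house i, rob house i during turns 2-3.
# Turn 2 is safe (a route never repeats a house); turn 3 starts a fresh route.
eu_wait = sum(Y_0 if route2[0] == route1[0] else Y_H
              for route1 in routes for route2 in routes) / len(routes) ** 2

print(eu_now)   # 2/6 * Y_H + 4/6 * Y_0 = 0.333...
print(eu_wait)  # 4/6 * Y_H + 2/6 * Y_0 = 0.666...
```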
3 The Proposed Normal-Form Model
The failure of the model previously described is due to neglecting the possibility that r can wait: since in real-world situations r can improve its revenue by waiting, it will violate the protocol. To overcome the drawbacks of this model, we must take into account the real-world possibility that r waits for one or more turns. Two routes can be followed:

1. we cast the game into an extensive-form game and explicitly take into account the action wait for r by introducing it at every decision node of r;
2. we develop a normal-form game wherein the action wait is not explicitly taken into account, but, when the game is played in real-world situations, r cannot improve its expected utility by waiting.

In this paper we limit our study to normal-form models for patrolling and therefore we follow the second route. The development of an appropriate extensive-form game will be explored in future work. The model we propose is simple. We initially describe it and subsequently discuss why it is satisfactory. The model prescribes that g and r act simultaneously. The actions available to r are the same as in the model presented in the previous section. The actions available to g are the following:
g: it chooses a house to patrol among all the possible ones. The strategy of g will be repeated at every turn. For instance, if the strategy chosen by g is to patrol house 1, then g will always patrol it. Practically, on the equilibrium path g's strategy will be fully mixed and therefore g will randomize over all the houses at every turn with the same probability distribution.

Agents' payoffs are exactly the same ones we defined in the previous section. We provide agents' expected utilities since they will be fundamental in the analysis we carry out in the next section. The expected utility of r can be easily calculated. Precisely, calling α^i the probability prescribed by σ_g* to patrol house i, the expected utility for r_j of robbing house i is

EU_{r_j}(i) = (1 − α^i)^d · y^i_{r_j} + (1 − (1 − α^i)^d) · y^0_{r_j}.

Essentially, it is the convex combination between y^i_{r_j}, i.e. r_j's evaluation of house i, and y^0_{r_j}, i.e. the evaluation of r_j's being caught, where the parameter of the convex combination is (1 − α^i)^d, i.e. the probability that g will never patrol house i for d turns.

The calculation of g's expected utility is more complicated. We give it by degrees. Suppose initially that r can be only of one type. Calling β^i the probability prescribed by σ_r* to rob house i, and supposing that g will follow a mixed strategy based on probabilities α^l for the next d − 1 turns, the expected utility for g of patrolling house j at the current turn is:

EU_g(j) = Σ_{i=1, i≠j}^m [x^i · β^i · (1 − α^i)^{d−1}] + x^0 · (β^j + Σ_{i=1, i≠j}^m [β^i · (1 − (1 − α^i)^{d−1})])

Essentially, EU_g(j) gives the expected utility of choosing house j at the current turn given that g will employ a mixed strategy from turn t = 1 to turn t = d. Suppose now that r can be of different types. The formula of EU_g(j) is defined as a weighted sum of a number of terms. The weights are the types' probabilities. The terms to sum are defined exactly as in the previous formula of EU_g(j) and refer to the single types. The formula of EU_g(j) is:

EU_g(j) = Σ_{k=1}^n ω_{r_k} · ( Σ_{i=1, i≠j}^m [x^i · β^i_{r_k} · (1 − α^i)^{d−1}] + x^0 · (β^j_{r_k} + Σ_{i=1, i≠j}^m [β^i_{r_k} · (1 − (1 − α^i)^{d−1})]) )

Now we produce some considerations concerning the proposed model. Precisely, we need to verify whether in real-world situations agents will violate the protocol prescribed by the proposed model. Consider r. The proposed model does not take into account the possibility that r can wait for some turns, but in real-world settings it can. Anyway, if r waits for one or more turns, it can be easily observed that r's expected utility and g's do not change. Hence this possibility does not affect the employment of the model in real-world situations. Consider g. The proposed model requires that g employs the same strategy at every turn, but in real-world situations it could employ different strategies at different turns. Anyway, g cannot do anything better than employing the same strategy at every turn, since it has no information concerning when r acts. It can be shown – we omit the pertinent proof for reasons of space – that g's optimal strategy in the proposed model is consistent with g's optimal strategy in the extensive-form game wherein r's action wait is explicitly taken into account.
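The two expected-utility formulas transcribe directly into code; the following sketch (our illustration, with function names of our own choosing) evaluates EU_{r_j}(i) and the multi-type EU_g(j) for given strategy vectors:

```python
def eu_robber(i, alpha, d, y, y0):
    """EU_{r_j}(i): convex combination of the house value y[i] and the
    capture value y0, weighted by the escape probability (1 - alpha[i])^d."""
    escape = (1.0 - alpha[i]) ** d
    return escape * y[i] + (1.0 - escape) * y0

def eu_guard(j, alpha, d, x, x0, omega, beta):
    """Multi-type EU_g(j); omega[k] is the probability of type r_k and
    beta[k][i] the probability that type r_k robs house i."""
    m, n = len(alpha), len(omega)
    total = 0.0
    for k in range(n):
        robbed = sum(x[i] * beta[k][i] * (1.0 - alpha[i]) ** (d - 1)
                     for i in range(m) if i != j)
        caught = beta[k][j] + sum(beta[k][i] * (1.0 - (1.0 - alpha[i]) ** (d - 1))
                                  for i in range(m) if i != j)
        total += omega[k] * (robbed + x0 * caught)
    return total
```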
4 Game Theoretical Insights
A game such as the one we are dealing with can be solved by employing off-the-shelf algorithms. Specifically, such a problem can be cast into a linear-complementarity problem and then solved by employing the Lemke-Howson algorithm [6]. However, the computational complexity of the Lemke-Howson algorithm is exponential in the size of the problem, i.e. the number m of houses and the number n of r's types. Practically, the production of exact solutions in real-world situations is not affordable, and the computation of approximate solutions for very simple problems requires a long time, e.g. more than 30 minutes with m = 3 and n = 7 [8]. The drawbacks related to off-the-shelf algorithms are due to the principle on which they are based: they search for an equilibrium strategy profile among all the possible ones, neglecting any information concerning the specific problem to solve. Since the space composed of all the possible strategy profiles grows exponentially in the size of the problem, the search is inefficient even for very simple problems. This makes the study of real-world strategic situations by employing off-the-shelf algorithms unaffordable. A route to follow to solve strategic situations more efficiently is to exploit game theoretical analysis. Precisely, game theoretical analysis allows one to derive insights concerning regularities and singularities of the problem that can be employed to reduce the space of strategy profiles among which the algorithm searches for the equilibrium. Examples of similar works can be found in [1, 2, 4]. In what follows we analyze the proposed game game theoretically in the attempt to produce several insights to employ in the design of a solving algorithm more efficient than the state of the art.

Considering g's strategies, we can state the following lemma.

Lemma 4.1 On the equilibrium path g's strategy cannot be pure.

Proof. The proof is by contradiction. Assume σ* to be an equilibrium strategy profile wherein g's strategy is pure. On the equilibrium path every one of r's types believes that g employs a pure strategy choosing a specific house to patrol (say house i). On the basis of these beliefs, since any r type strictly prefers not to be caught rather than to be caught, no r type will choose house i. On the basis of this fact, g can improve its expected utility by patrolling a house different from i. We reach a contradiction and therefore σ* is not an equilibrium. □

We can state the following lemma, whose proof is omitted, being similar to the proof of Lemma 4.1 but much longer.

Lemma 4.2 On the equilibrium path g's strategy prescribes that every house can be patrolled with a strictly positive probability.

Considering the strategies of r's types, we can state the following lemma.

Lemma 4.3 On the equilibrium path at least one of r's types employs a mixed strategy.
Proof. The proof is by contradiction. Assume σ* to be an equilibrium strategy profile wherein the strategy of all r's types is pure. It can be easily shown that g's optimal strategy is a unique action, except for a null-measure subset [5] of the space of the parameters. However, by Lemma 4.1, there is no equilibrium wherein g's strategy is pure. We reach a contradiction and therefore σ* is not an equilibrium. □
5 Improving Solving Algorithm Efficiency
In this section we show how the previous three lemmas can be employed to reduce the space of the strategy profiles among which one can search for an equilibrium. Precisely, we can exclude a large number of strategy profiles that we can assure not to be of equilibrium independently of the values of the agents' parameters, e.g. x^0 and x^1. Although the proposed algorithm searches within a space of strategy profiles dramatically reduced with respect to the state-of-the-art one, this space grows exponentially in the size of the problem. Therefore, in order to tackle real-world problems, the proposed algorithm must be improved by introducing heuristics that efficiently guide the search. We will discuss this topic in future work. On the basis of Lemma 4.2, every equilibrium strategy profile for the game we are dealing with is characterized by α^i ∈ (0, 1) for any i ∈ {1, . . . , m}. Since these variables are bound by the equation Σ_{i=1}^m α^i = 1, the number of free variables related to g's strategy is m − 1. Furthermore, on the basis of Lemma 4.3 we know that every equilibrium strategy profile is characterized by at least one r type that randomizes. The exact number of r's types that randomize, and the number of actions over which each specific randomizing type randomizes in an equilibrium strategy profile, can be determined by studying the pertinent solving equation sets and by excluding their singularities. For the sake of clarity, we study the possible randomizations of r's types by degrees: at first when the number of r's types is one, and subsequently when r's types are more than one.
5.1 The Base Case: One Robber's Type
We consider the situation in which the number of r's types is one. As is customary in game theory, in a two-player game the randomization probabilities related to each player are computed in such a way that the other player can effectively randomize, i.e. every action over which a player randomizes gives it the same expected utility and no other action gives it more than randomizing does. In the game we are studying, the randomization probabilities of g will be computed in such a way that the actions over which r randomizes give r the same expected utility, and vice versa. Technically speaking, we have two equation sets: the first one, say Φ_g, wherein the variables are the randomization probabilities of g, i.e. the α^i s, and the equations are of the form EU_r(i) = EU_r(j) for all actions i, j over which r randomizes; the second one, say Φ_r, wherein the variables are the randomization probabilities of r, i.e. the β^i s, and the equations are of the form EU_g(i) = EU_g(j) for all actions i, j over which g randomizes. On the basis of Lemma 4.2, we know that equation set Φ_g is characterized by m − 1 variables and that equation set Φ_r is characterized by m − 1 independent equations. We need to find the number of actions over which r randomizes in order to have two well-defined equation sets. Since, when r randomizes over m actions, m − 1 variables are introduced in equation set Φ_r and m − 1 equations are introduced in equation set Φ_g, the appropriate number of actions over which r randomizes on the equilibrium path is m. Notice that, if r randomizes over a number of actions lower than m, then equation set Φ_r would
present a number of variables lower than the number of equations and therefore it does not admit any solution. Easily, since at the equilibrium both g and r randomize over all the possible actions and since Φ_g and Φ_r admit a unique solution, the game admits a unique equilibrium strategy. In this equilibrium all α^i, β^j ∈ (0, 1). Since agents' equilibrium strategies can be provided in closed form, no search is needed.

By imposing EU_r(i) = EU_r(j) for any i, j ∈ {1, . . . , m}, we can calculate the values of the α^i s. Exactly, calling

γ(i, j) = ((y^i − y^0) / (y^j − y^0))^{1/d},

by trivial mathematics we obtain:

α^i = (1 + Σ_{j=1}^m [γ(i, j) − 1]) / Σ_{j=1}^m [γ(i, j)].

By imposing EU_g(i) = EU_g(j) for any i, j ∈ {1, . . . , m}, we can calculate the values of the β^i s. Exactly, calling

ε(i, j) = ((x^0 − x^i) · (1 − α^i)^{d−1}) / ((x^0 − x^j) · (1 − α^j)^{d−1}),

by trivial mathematics we obtain:

β^i = 1 / Σ_{j=1}^m [ε(i, j)].
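These closed forms are immediate to evaluate; the sketch below (our transcription of the two formulas above, not the paper's C implementation) computes the unique single-type equilibrium:

```python
def single_type_equilibrium(x0, x, y0, y, d):
    """Closed-form equilibrium for one robber type; x0, y0 are the capture
    evaluations, x[i] and y[i] the house evaluations, d the robbery length."""
    m = len(x)
    gamma = [[((y[i] - y0) / (y[j] - y0)) ** (1.0 / d) for j in range(m)]
             for i in range(m)]
    alpha = [(1 + sum(g - 1 for g in gamma[i])) / sum(gamma[i]) for i in range(m)]
    eps = [[((x0 - x[i]) * (1 - alpha[i]) ** (d - 1)) /
            ((x0 - x[j]) * (1 - alpha[j]) ** (d - 1)) for j in range(m)]
           for i in range(m)]
    beta = [1.0 / sum(eps[i]) for i in range(m)]
    return alpha, beta

# Sanity check: with three identical houses both players randomize uniformly.
alpha, beta = single_type_equilibrium(x0=1.0, x=[0.0, 0.0, 0.0],
                                      y0=0.0, y=[1.0, 1.0, 1.0], d=2)
print(alpha, beta)  # [1/3, 1/3, 1/3] and [1/3, 1/3, 1/3]
```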
5.2 The General Case: More Robber's Types
We consider the situation in which the number of r's types can be any. The analysis is similar to the base case, but it is more complicated. While with a unique r type there is a unique possible set of actions over which r can randomize that makes the above equation sets well-defined, i.e. all the m houses, with more r types this is not so; e.g. r_1 could randomize over m − 3 houses and r_2 over 3 houses. Furthermore, among all the possible ways in which r's types can randomize that make the pertinent equation sets well-defined, only one leads to an equilibrium. We therefore need to search for it.

At first, we characterize the strategy profiles with respect to the actions over which r's randomizing types randomize (we exclude all r's types that do not randomize). We use an n × m binary matrix R_r where the rows denote r's types and the columns denote the houses, e.g.

    R_r = [ 1 1 ... 0 0 ]
          [ 0 0 ... 1 0 ]    (n rows, one per type; m columns, one per house)
          [ 0 0 ... 0 0 ]

Precisely, the meaning of R_r is the following: R_r(i, j) = 1 means that r_i randomizes over house j, while R_r(i, j) = 0 means that r_i does not randomize over house j. Notice that, in order for R_r to be well-defined, the following constraint must hold: Σ_{j=1}^m R_r(i, j) ≠ 1 for any i ∈ {1, . . . , n} (i.e., a randomizing agent must randomize over at least two actions). We call this constraint C1.

Given a matrix R_r, we can build equation set Φ_g for the calculation of g's randomization probabilities. Trivially, in order for Φ_g to be well-defined, two properties must hold: Φ_g must be composed of m − 1 independent equations, and all the α^i s must be present in Φ_g. These two properties can be translated into the following two constraints over R_r:

C2: for any j ∈ {1, . . . , m} it holds Σ_{i=1}^n R_r(i, j) > 0 (i.e., each variable α^j must be present in Φ_g);
C3: Σ_{i=1}^n [max{Σ_{j=1}^m R_r(i, j) − 1, 0}] = m − 1 (i.e., the number of independent equations must be m − 1).

Similarly, given a matrix R_r we can also build equation set Φ_r for the calculation of the randomization probabilities of r's types. It can be shown that, in order for Φ_r to be well-defined, no further constraint over R_r is needed.
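Constraints C1–C3 are easy to encode, and for small m and n the feasible matrices can simply be enumerated; the sketch below (our illustration, not the paper's implementation) reproduces the eight feasible matrices listed next:

```python
from itertools import product

def feasible(Rr, m):
    """Check C1-C3 for a candidate binary matrix Rr given as a tuple of rows."""
    c1 = all(sum(row) != 1 for row in Rr)                  # no type randomizes over exactly one house
    c2 = all(any(row[j] for row in Rr) for j in range(m))  # every alpha^j appears in Phi_g
    c3 = sum(max(sum(row) - 1, 0) for row in Rr) == m - 1  # m - 1 independent equations
    return c1 and c2 and c3

def all_feasible(n, m):
    rows = list(product((0, 1), repeat=m))
    return [Rr for Rr in product(rows, repeat=n) if feasible(Rr, m)]

print(len(all_feasible(2, 3)))  # 8
```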
With m = 3 and n = 2, all the feasible R_r s, i.e. all those that satisfy C1, C2, and C3, are:

    [1 1 0]  [1 1 0]  [1 0 1]  [1 0 1]  [0 1 1]  [0 1 1]  [1 1 1]  [0 0 0]
    [1 0 1]  [0 1 1]  [1 1 0]  [0 1 1]  [1 1 0]  [1 0 1]  [0 0 0]  [1 1 1]
RAM. Experimental results are reported in Tab. 5.3. Although the proposed algorithm is a prominent step ahead with respect to the state-of-the-art, it cannot address real-world settings. For instance, it requires more than one day computation for settings with m = 20 and n = 10. The efficiency of the algorithm can be improved by employing heuristics to order dynamically the feasible Rr s.
Given a matrix R_r, it is possible to find univocally the values of the α_i's and β_{r_j}^i's by employing equations similar to the ones employed in the previous section and, subsequently, it is possible to verify whether the agents' strategies computed on the basis of R_r lead to an equilibrium or not. Precisely, we need to verify that: • all α_i ∈ (0, 1); • all the β_{r_j}^i's prescribed by R_r belong to (0, 1); • no randomizing r's type can do anything better than randomizing. Therefore, we can limit the search for an equilibrium to the search for a feasible R_r that leads to an equilibrium. This dramatically reduces the search space and thus the time needed to compute a solution. Consider for instance the setting with m = 3, n = 2, and d = 2. The space over which off-the-shelf algorithms search is the set of vertices of a complex 9-polytope, while the space of all the feasible R_r's is composed of eight elements. We report our algorithm in Algorithm 1. Currently, all the feasible R_r's are statically ordered in lexicographic order.

Algorithm 1: EQUILIBRIUM FINDER
1  for all feasible R_r do
2    solve Φ_g
3    if all α_i ∈ (0, 1) then
4      calculate the optimal strategies of the randomizing r's types on the basis of the α_i's
5      if no randomizing type deviates from the actions in R_r then
6        calculate the optimal strategies of the non-randomizing r's types on the basis of the α_i's
7        solve Φ_r
8        if all β_{r_j}^i ∈ (0, 1) then
9          return R_r, the α_i's, and the β_{r_j}^i's
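A minimal sketch of Algorithm 1's search skeleton follows. The three callables stand in for the equation-set solvers and the deviation test described above; they are assumptions of this sketch, not part of the paper.

    def equilibrium_finder(feasible_Rrs, solve_phi_g, no_deviation, solve_phi_r):
        for Rr in feasible_Rrs:                     # statically ordered, e.g. lexicographically
            alpha = solve_phi_g(Rr)                 # g's randomization probabilities from Phi_g
            if not all(0.0 < a < 1.0 for a in alpha):
                continue                            # line 3 fails: try the next Rr
            if not no_deviation(Rr, alpha):         # some randomizing type deviates: try the next Rr
                continue
            beta = solve_phi_r(Rr, alpha)           # randomization probabilities of r's types from Phi_r
            if all(0.0 < b < 1.0 for b in beta):
                return Rr, alpha, beta              # equilibrium found (line 9)
        return None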
                     types (n)
    houses (m)     6      7      8      9      10     11     12
        3        0.007  0.011  0.017  0.024  0.033  0.043  0.055
        4        0.190  0.352  0.720  1.015  1.532  1.852  2.231

Table 1. Average time (in seconds) required by Algorithm 1 for the computation of the equilibrium.
5.3
Experimental Considerations
We provide a preliminary experimental evaluation of the proposed algorithm. In order to evaluate it, we compare the average time it requires for the computation of an equilibrium with respect to the time required by the algorithm proposed in [8]. Since our algorithm considers a different model and is thus not directly comparable with the algorithm presented in [8], we have modified our algorithm to solve the model of [8]. The experimental results reported below refer to this modified version of the algorithm. No significant difference in terms of computational time (< 5%) has been found between the application of our algorithm to the model proposed in [8] and to the one we present in Section 3. The algorithm proposed in [8], implemented in CPLEX, requires more than 30 minutes to compute approximate solutions for settings with m = 3, n = 7 and for settings with m = 4, n = 6. We have implemented our algorithm in C and we have considered all the settings with m = 3, 4 and n ∈ {6, ..., 13}. For each setting we have considered 10^3 different agents' payoffs drawn at random in (0, 1). We have used a 1.4GHz CPU with 500MB of RAM. Experimental results are reported in Table 1. Although the proposed algorithm is a prominent step ahead with respect to the state-of-the-art, it cannot yet address real-world settings. For instance, it requires more than one day of computation for settings with m = 20 and n = 10. The efficiency of the algorithm can be improved by employing heuristics to order the feasible R_r's dynamically.

6
Conclusions and Future Works

Strategic patrolling is a challenging problem that has received a lot of attention in the artificial intelligence literature. In this paper we considered the principal strategic patrolling model presented in the literature and provided two main contributions. First, we showed that the model proposed in the state-of-the-art presents some unsatisfactory issues from a game-theoretic point of view, and we provided a model that is game-theoretically satisfactory. Then, we analyzed the considered game in order to produce some insights concerning regularities and singularities of the corresponding solving equation sets. These insights were subsequently employed in the design of a solving algorithm, which was shown to be much more efficient than state-of-the-art ones. We intend to develop this work along two main directions. The first concerns the provision of an appropriate extensive-form model for the considered strategic interaction; we will furthermore study leadership with commitment to mixed strategies in our model. The second is more general and concerns the development of a general approach that exploits game-theoretical analysis to enable algorithms to tackle real-world game settings, e.g. by employing genetic algorithms.

REFERENCES
[1] F. Di Giunta and N. Gatti, 'Alternating-offers bargaining under one-sided uncertainty on deadlines', in Proceedings of ECAI, pp. 225-229, Riva del Garda, Italy, (2006).
[2] F. Di Giunta and N. Gatti, 'Bargaining over multiple issues in finite horizon alternating-offers protocol', Annals of Mathematics and Artificial Intelligence, 47(3-4), 251-271, (2006).
[3] D. Fudenberg and J. Tirole, Game Theory, The MIT Press, Cambridge, MA, USA, 1991.
[4] N. Gatti, F. Di Giunta, and S. Marino, 'Alternating-offers bargaining with one-sided uncertain deadlines: an efficient algorithm', Artificial Intelligence, 172(8-9), 1119-1157, (2008).
[5] P. R. Halmos, Measure Theory, Springer, Berlin, Germany, 1974.
[6] C. Lemke, 'Some pivot schemes for the linear complementarity problem', Mathematical Programming Study, 7, 15-35, (1978).
[7] N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani, Algorithmic Game Theory, Cambridge University Press, New York, USA, 2007.
[8] P. Paruchuri, J. P. Pearce, M. Tambe, F. Ordonez, and S. Kraus, 'An efficient heuristic approach for security against multiple adversaries', in Proceedings of AAMAS, pp. 311-318, Honolulu, USA, (2007).
[9] P. Paruchuri, M. Tambe, F. Ordonez, and S. Kraus, 'Security in multiagent systems by policy randomization', in Proceedings of AAMAS, pp. 273-280, Hakodate, Japan, (2006).
[10] R. Porter, E. Nudelman, and Y. Shoham, 'Simple search methods for finding a Nash equilibrium', in Proceedings of AAAI, pp. 664-669, San Jose, USA, (2004).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-408
Monitoring the Execution of a Multi-Agent Plan: Dealing with Partial Observability Roberto Micalizio and Pietro Torasso1 Abstract. The paper addresses the task of monitoring and diagnosing the execution of a multi-agent plan (MAP) which involves actions concurrently executed by a team of cooperating agents. The paper describes a weak commitment strategy to deal with cases where observability is only partial and it is not sufficient for inferring the outcome of all the actions executed so far. The paper discusses the role of target actions in providing sufficient conditions for inferring the pending outcomes in a finite time window. The action outcome provides the basis for computing plan diagnosis and for singling out the goals which will not be achieved because of an action failure.
1 Introduction
The problem of diagnosing the execution of a single-agent plan was investigated long ago (see the pioneering work by Birnbaum et al. [1], where the concept of plan threat is introduced). However, only recently have a number of Model-Based approaches (see [4, 8, 5]) started to address the complex problem of diagnosing the execution of a multi-agent plan (MAP), i.e. a plan involving a team of cooperating agents which execute actions concurrently. These works are essentially based on a distributed approach where each agent is responsible for supervising (monitoring and diagnosing) the actions it executes. Typically these approaches assume that action failures are not consequences of plan flaws, but are due to the occurrence of unexpected events (such as discrepancies in the shared assumptions or the occurrence of faults in some agents' functionalities). Thus, the plan execution needs to be supervised in order to detect and explain an action failure as soon as possible. As discussed in [8], the plan diagnosis consists in a subset of actions whose failure is consistent with the anomalous observed behavior of the system. However, this notion of plan diagnosis can be complemented with a notion of threatened actions, which estimates the impact of the failure, since the harmful effects of an action failure may propagate to the whole MAP. In this paper, similarly to the previous approaches, a distributed approach for supervising the MAP execution is adopted. However, we address the problem of diagnosing plans characterized by the presence of joint actions, which introduce further dependencies among the agents as they need to synchronize and to communicate among themselves. Moreover, we have to deal with actions whose faulty behavior may be non-deterministic. In the paper we show that the nominal plan execution imposes some requirement on observability (we will call it the minimal observability requirement) in order to guarantee the inter-agent communication, and we introduce a weak commitment strategy to deal with cases where observability is only partial and is not sufficient for inferring the outcome of all the actions executed so far. We will show how the minimal observability requirement combined with the weak commitment strategy guarantees that the outcome of each action can be inferred within a finite time window. The paper is organized as follows. In the following sections we introduce the basic notions of global and local plans, then we formalize the processes of monitoring and diagnosis of a MAP and discuss the role of the minimal observability requirement and the weak commitment strategy in inferring the action outcomes which cannot be directly observed; finally we discuss some computational issues and conclude.

2 Distributed Plan Execution and Supervision
In this paper we consider a specific class of MAS where a team T of agents cooperates to reach a common complex goal G. In particular, the global goal G is decomposed into a set of (easier) sub-goals, each of which is assigned to an agent in the team. In most cases, however, the sub-goals are not independent of one another, as the agents have to cooperate by exchanging services or by executing joint actions; this cooperative behavior introduces causal dependencies among activities, hence when an unexpected event causes the failure of an agent activity, this failure may propagate through the whole system, affecting the activities of the other agents in the team.
Global plan. The notion of multi-agent plan (MAP), as formalized by Cox et al. in [2], is well suited for modeling both the agents' activities and the causal dependencies existing among them. According to [2], given a team T of agents, the MAP is the tuple ⟨A, E, CL, CC, NC⟩ such that: A is the set of the action instances the agents have to execute; each action a is assigned to a specific agent i of the team T and is modeled in terms of preconditions and direct effects. Within the set A there are two special actions, a_0 and a_∞: a_0 is the starting action, it has no preconditions and its effects specify which propositions are true in the initial state; a_∞ is the ending action, it has no effects and its preconditions specify the propositions which must hold in the final state, i.e., the preconditions of a_∞ specify the MAP's goal G. E is a set of precedence links between actions; CL is a set of causal links of the form l : a →_q a', where the link l states that the action a provides the action a' with the service q, and q is an atom occurring in the preconditions of a'; finally, CC and NC are respectively the concurrency and non-concurrency symmetric relations over the action instances in A. In particular, a pair ⟨a, a'⟩ in CC models a joint action, whereas constraints in NC prevent conflicts in accessing the resources; this is equivalent to the concurrency requirement introduced in [8].
Plan Distribution. The execution of the MAP P is a critical step, as the agents have to concurrently execute the actions assigned to them without violating the constraints introduced in the planning phase.
1 Dipartimento di Informatica - Università di Torino, Italy, email: {micalizio, torasso}@di.unito.it
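For concreteness, the MAP tuple ⟨A, E, CL, CC, NC⟩ can be rendered as a small data structure; all the field types below are illustrative choices of this sketch, not the authors' encoding.

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Action:
        name: str
        agent: str                      # executing agent (a0 and a_inf are special actions)
        pre: frozenset = frozenset()    # preconditions (atoms)
        eff: frozenset = frozenset()    # direct effects (atoms)

    @dataclass
    class MAP:
        A: set                          # action instances, including a0 and a_inf
        E: set                          # precedence links (a, a2)
        CL: set                         # causal links (a, q, a2): a provides a2 with service q
        CC: set                         # concurrency pairs, modeling joint actions
        NC: set = field(default_factory=set)   # non-concurrency (resource) constraints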
[Figure: the MAP P as a DAG from a_0 to a_∞, with the actions 1-13 of agents A1, A2, A3 (Move, Push, PutOn) as nodes, solid arrows for causal links, dashed arrows for precedence links, and bidirectional arrows labeled CC/NC for the concurrency and non-concurrency constraints.]
Figure 1. The MAP P to be monitored.
It is therefore quite natural to conceive a distributed approach to the supervision of the plan execution. In this paper we adopt a distributed approach to the supervision (similar to the ones discussed in [8, 5]) where each agent performs a (local) control loop over the actions it executes.
Local Plans. The MAP P under consideration is decomposed into as many sub-plans P_i as there are agents in T, and each sub-plan P_i is assigned to agent i. The decomposition can be easily done by selecting from P all the actions an agent i has to execute. Formally, the sub-plan for agent i is the tuple P_i = ⟨A_i, E_i, CL_i, CC_i, NC_i⟩, where A_i, E_i, CL_i, CC_i and NC_i are the same as in P restricted to the actions agent i has to execute (i.e., at least one action belongs to A_i). We consider time as a discrete sequence of instants; the actions are executed synchronously by the agents in the team and each action in P takes one time unit to be executed (this common assumption is also made in [8, 6]). At a given time t, an agent i can execute just one action a (in the following the notation a_t^i will denote the action executed by agent i at time t). After the execution of action a_t^i the agent i may receive a set of observations, denoted as obs_{t+1}^i, relevant for the status of i itself.
Minimal Observability Requirement and Agent Communication. Since the agents need to communicate to achieve coordination during the plan execution, a minimal observability requirement must be satisfied. To figure out which events are included in the minimal observability requirement, consider that coordination is required in three cases during the nominal plan execution. First, when an agent i has to provide a service q to another agent j; technically, this case is encoded by a causal link l : a →_q a' in the MAP P (where a ∈ A_i and a' ∈ A_j). After the execution of a, the agent i must be able to observe the achievement (or the absence) of service q and must notify agent j whether the service has been provided or not; in fact, because of partial observability, the agent j cannot directly observe the service q and has to wait for a message from i. The second situation which requires explicit coordination during the execution regards the joint actions. Every pair of actions ⟨a, a'⟩ included in the set of concurrency constraints CC models a joint action, where a and a' are actions assigned to agents i and j, respectively. In order to execute the joint action ⟨a, a'⟩ in a synchronized way, the two agents i and j need to observe whether the preconditions of the actions a and a' are satisfied; in fact the joint action can be performed only when the preconditions of both actions are satisfied and both agents have to be aware of this. Explicit coordination is also required for executing actions bound by non-concurrency constraints: in this case coordination is ruled by the set of non-concurrency constraints NC in P, which prevents the simultaneous execution of the constrained actions. Given
a pair of actions ⟨a, a'⟩ ∈ NC (where a ∈ A_i and a' ∈ A_j respectively), when agent i intends to execute action a it must inform agent j, and in case of conflict the two agents have to negotiate. As we will see in section 4, agents communicate even in case of action failures: an agent must notify the other agents when a service will not be provided as a consequence of a failure.
Running Example. In the paper we will use a simple example from the blocks world to illustrate the concepts and the techniques we propose. Let us consider three agents that cooperate to achieve a global goal G where two blocks O1 and O2 are moved to a target position T and O1 is put on top of block O2; initially, the blocks are located in position P4. In its nominal behavior an agent can move a block by pushing it; however, in some cases a block may be too heavy and two agents need to join their efforts to push it. Figure 1 shows a possible MAP which achieves the goal G; in particular, the agents A2 and A3 cooperate to move the (heavy) block O2 to position T (see the joint actions ⟨6,11⟩ and ⟨7,12⟩); the agent A1 moves the block O1 to position T, then it puts O1 on top of O2 (see action PutOn). The MAP is a DAG where nodes are actions, solid and dashed arrows are causal and precedence links respectively, while concurrency and non-concurrency constraints are solid, bidirectional arrows labeled CC and NC respectively. The dashed rectangles specify which actions are included in the sub-plans assigned to the three agents. The operations within the target position are constrained: at each time instant only one block can be moved in, so there are non-concurrency constraints between the joint action ⟨7,12⟩ and the simple action 3; moreover, since the block O2 must be positioned in T earlier than O1, precedence links exist between the actions ⟨7,12⟩ and 3.
3
Monitoring with uncertain action outcomes
The monitoring performed by agent i over the execution of its sub-plan provides two important services: 1) estimating the state of agent i after the execution of an action a; 2) detecting the outcome of the action a. However, before describing the monitoring process we need to introduce some important concepts.
Agent state. Intuitively, the system status can be expressed in terms of the status variables of the agents in the team T and of the status of the system resources RES. However, the distributed approach to the supervision prevents the adoption of a global notion of status, while it allows a local view based on a single agent. The status of agent i is expressed in terms of a set of status variables VAR_i, which is partitioned into three subsets END_i, ENV_i and HLT_i. END_i and ENV_i denote the sets of endogenous (e.g., the agent's position) and environment (e.g., the resources' state) status variables, respectively. Note that, because of the partitioning, each
agent i has to maintain a private copy of the resource status variables; more precisely, for each resource res_k ∈ RES (k : 1..|RES|) the private variable res_{k,i} is included in the set ENV_i. Since we are interested in monitoring the plan execution even when action failures occur, we introduce a further set of variables in order to model the agent faults which may cause action failures. HLT_i denotes the set of variables concerning the health status of an agent's functionalities (e.g., mobility and power); in particular, for each agent functionality f, a variable v_f ∈ HLT_i represents the health status of f; the domain of variable v_f is the set {ok, abn_1, ..., abn_n}, where ok denotes the nominal mode while abn_1, ..., abn_n denote non-nominal modes. It is worth noting that the observations obs_t^i agent i may receive convey information about just a subset of the variables in VAR_i. First of all, an agent can directly observe just the status of the resources it is actively exploiting: the status of other resources is not directly observable, but an agent can communicate with other agents to determine it. Moreover, the observations obs_t^i provide in general the value of just a subset of the variables in END_i, whereas the variables in HLT_i are not directly observable and their actual value can only be inferred. Given this partial observability, at each time t the agent i can just estimate a set of alternative states which are consistent with the received observations obs_t^i; in the literature this set is known as a belief state and will be denoted as B_t^i.
Action models. The model of a simple action a_t^i (assigned to agent i at time t) is the tuple ⟨var(a_t^i), pre(a_t^i), eff(a_t^i), event(a_t^i), Δ(a_t^i)⟩, where var(a_t^i) ⊆ VAR_i is the subset of status variables over which the set pre(a_t^i) of preconditions and the set eff(a_t^i) of effects are defined; event(a_t^i) is the set of exogenous events (e.g., faults) which may occur during the execution of action a_t^i and which may possibly affect its outcome; finally, Δ(a_t^i) is a transition relation where every tuple d ∈ Δ(a_t^i) models a possible state transition which may occur while i is executing a_t^i. Each tuple d has the form d = ⟨s_t, event, s_{t+1}⟩, where s_t and s_{t+1} represent two agent states at time t and t+1 respectively (each state is a complete assignment of values to the status variables in var(a_t^i)) and event (possibly empty) represents the occurrence of an unexpected event in event(a_t^i). Since Δ(a_t^i) is a relation, the action model can represent non-deterministic, anomalous action effects. The healthy formula healthy(a_t^i) of action a_t^i is computed by restricting each variable v_f ∈ healthVar(a_t^i) to the nominal behavioral mode ok, and represents the nominal health status of agent i required to successfully complete the action itself. Therefore, the (expected) nominal effects of a_t^i are nominalEff(a_t^i) = {q ∈ eff(a_t^i) | pre(a_t^i) ∪ healthy(a_t^i) ⊢ q}. On the contrary, when the healthy formula does not hold, the behavior of the action may be non-deterministic and some (even all) of the expected effects may be missing.
Joint actions. The notion of simple action can be extended to cover the notion of joint action which, as discussed in [3], can be seen as the simultaneous execution of a subset of simple actions.
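The action model can likewise be sketched as a record whose transition relation Δ is an explicit set of ⟨s_t, event, s_{t+1}⟩ tuples; several successors for the same (state, event) pair encode the non-deterministic anomalous behavior. Types and names below are illustrative assumptions.

    from dataclasses import dataclass, field
    from typing import FrozenSet, Optional, Set, Tuple

    State = FrozenSet[Tuple[str, str]]   # complete assignment, e.g. frozenset({("pos", "P4"), ("mobility", "ok")})

    @dataclass
    class ActionModel:
        var: FrozenSet[str]              # status variables over which pre/eff are defined
        pre: FrozenSet[Tuple[str, str]]  # preconditions
        eff: FrozenSet[Tuple[str, str]]  # direct (possibly anomalous) effects
        event: FrozenSet[str]            # exogenous events (faults) that may occur during execution
        delta: Set[Tuple[State, Optional[str], State]] = field(default_factory=set)  # {(s_t, event, s_t+1)}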
In this paper we consider a stronger notion of joint action: two simple actions a_t^i and a_t^j are part of a joint action a_t^{i,j} not only because they are executed at the same time, but also because the agents i and j actively cooperate to reach an effect. The notion of dependency set, introduced in [5], is exploited to homogeneously represent both simple and joint actions. Intuitively, a dependency set I(t) highlights the subset of agents whose strict cooperation is required at a specific time instant t and can be easily determined from the concurrency constraints defined in the MAP. The agents within the same dependency set I(t) synchronize in order to build a joint belief state B_t^I (resulting from the conjunction of their local belief states) and the joint model of the action a_t^{I(t)} (see details in [5]). Thus, given the dependency set I(t), a_t^{I(t)} may denote a simple or a joint action.
The prediction process. In [5] Micalizio et al. have proposed a distributed strategy for monitoring the execution of a MAP. Their approach can be summarized as follows: let a_t^{I(t)} denote the (joint) action executed by the agent(s) in the dependency set I(t) at time t (for the sake of readability we will write a_t^I whenever the time of the dependency set is obvious from the context), let B_t^I be the (joint) belief state of the agents in I, and let Δ(a_t^I) be the model of the (joint) action the agents in I have to execute at time t; the joint belief state at time t+1 (i.e., after the action execution) can be inferred as:

    B_{t+1}^I = π_{VAR^I}( σ_{obs_{t+1}^I}( B_t^I ⋈ Δ(a_t^I) ) ).

The join operation B_t^I ⋈ Δ(a_t^I) represents the prediction step, as it estimates the set of possible states of the agents in I at time t+1. However, the set of predictions resulting from the join operation is in general spurious, as it predicts all possible evolutions. The selection operation σ_{obs_{t+1}^I} has the effect of pruning off all those predictions which are inconsistent with the observations received by the agents in I at time t+1, where obs_{t+1}^I = ⋃_{i∈I} obs_{t+1}^i. Of course, the precision of the estimated joint belief strongly depends on the amount of available observations: in the worst case obs_{t+1}^I is empty and the selection operator cannot discard any of the predicted states; in the best (unrealistic) case obs_{t+1}^I is so complete as to reduce the estimated states to the actual agent state. Finally, the (joint) belief state B_{t+1}^I is inferred by projecting the set of refined predictions over the agent status variables at time t+1.
Strong Commitment. Intuitively, the outcome of action a_t^I is succeeded when all the nominal, expected effects are achieved after its execution; the action outcome is failed otherwise. However, since the belief state B_{t+1}^I may be highly ambiguous, in [5] the authors adopt a strong commitment policy by considering a_t^I as successfully completed iff all its nominal effects nominalEff(a_t^I) have been achieved in every possible state s included in B_{t+1}^I; formally:

    a_t^I succeeded ↔ ∀q ∈ nominalEff(a_t^I), ∀s ∈ B_{t+1}^I : s ⊨ q.    (1)

Moreover, [5] requires that the outcome of action a_t^I must be immediately assessed after the execution of the action, at time t+1. Therefore, when the effects in nominalEff(a_t^I) are not satisfied in at least one state included in the joint belief B_{t+1}^I, the outcome of action a_t^I is assumed to be failed. This strong commitment policy is based on the assumption that, whenever the action a_t^I is successfully completed, the amount of observations available at time t+1 is sufficient for pruning off from B_{t+1}^I any state s where the nominal effects do not hold. Under this assumption it is sufficient that each agent maintains just the last belief state B_{t+1}^I, as it represents a synthesis of the past history up to time t+1. Unfortunately this assumption may not hold in many domains and, as a consequence of the partial observability, it may happen that even when an action is successfully completed an agent concludes a failure because it cannot univocally assert a success.
Weak Commitment. In order to avoid this problem we propose a more flexible strategy where the outcome of an action a_t^I can be inferred within a time window rather than at the precise time instant t+1. In particular we assume that the system observability satisfies just the minimal observability requirement, and we propose a methodology for monitoring the plan execution which is able to cope with this constraint. This means that, when an agent is unable to determine the outcome of action a_t^I, the agent does not conclude the failure of a_t^I, but postpones the assessment of the action outcome. In fact, although the outcome of a_t^I cannot be precisely determined
at the current time t+1, it may be determined at a future time instant by exploiting the observations that each agent i in I will receive. For this reason, each agent i has to maintain a list pO^i(t) of actions whose outcome has not been determined yet at time t, i.e., a list of pending outcomes. Moreover, the agent i has to maintain a trajectory Tr^i[0, t+1], which relates all the belief states agent i has inferred so far. In particular, since the belief states are ambiguous and, in general, include a number of alternative states, Tr^i[0, t+1] is a set of trajectories.
Refining the agent trajectory. Given the action a_t^I, such that agent i ∈ I, the process for estimating the belief state of agent i at time t+1 consists in extending the agent trajectory Tr^i[0, t] to cover the time instant t+1. Also in this case we adopt the Relational Algebra operators to formalize this process:

    Tr^i[0, t+1] = σ_{obs_{t+1}^I}( Tr^i[0, t] ⋈ Δ(a_t^I) ).

The join operator represents the step which extends the agent trajectory; in fact any of the transitions modeled in Δ(a_t^I) is appended at the end of one (or more) traces in Tr^i[0, t]. Observe that the join operator implicitly refines also the agent trajectory: all the traces in Tr^i[0, t] which do not participate in the join are discarded. The selection operator further refines the agent trajectory, as it filters out all those traces which are inconsistent with the observations available at time t+1. Therefore, Tr^i[0, t+1] maintains all the possible agent trajectories which are consistent with the observations received so far, given the sequence of actions executed by agent i in the interval [0, t+1].
Inferring action outcomes. Since the extension of the agent trajectory refines the trajectory itself, it may reduce the ambiguity in some of the previous belief states. Thus, agent i can try to infer the outcome of some of the actions in pO^i(t+1); in fact, for each action a_k^{I(k)} ∈ pO^i(t+1) (where I(k) represents the dependency set including i at time k ∈ [0, t+1]), it is possible to determine the belief state inferred by the agent at time k+1 from the agent trajectory Tr^i[0, t+1] as follows:

    B_{k+1}^I = π_{k+1}( Tr^i[0, t+1] ).

Observe that this B_{k+1}^I is potentially different from the belief state inferred by the agent i at time k; in fact it results from the progressive extension of the agent trajectory from time k to time t+1, and at each step B_{k+1}^I may have been refined. The nominal outcome of action a_k^I is therefore inferred similarly to the definition in formula (1); i.e., if the nominal effects of the action a_k^I hold in every state s ∈ B_{k+1}^I, the action outcome is succeeded. However, the achievement of the nominal effects of action a_k^I is a consequence not only of the nominal execution of this action but also of the previous actions which are causally related to a_k^I. The relation between the nominal outcome of a_k^I and the previous actions is formalized in the following property:
Property 1 Given the agent i and its dependency set I at time k, let a_k^I be an action with outcome succeeded; then all the actions a_h in pO^i(k) ∩ dependsOn(a_k^I) have outcome succeeded too, where dependsOn(a_k^I) denotes the subset of actions {a_1, ..., a_n} in A_i which directly or indirectly provide a_k^I with a service (i.e., through a sequence of causal links).
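The trajectory-extension step is a join followed by a selection over sets of traces. The sketch below spells it out with plain Python sets in place of the OBDD encoding used by the authors; the outcome test mirrors formula (1) together with the pending and non-nominal cases discussed next.

    def extend_trajectories(trajs, delta, obs):
        # join: append every applicable Delta-transition to each trace;
        # selection: keep only extensions consistent with the observations obs_{t+1}
        extended = set()
        for tr in trajs:                             # tr is a tuple of states
            for (s, ev, s2) in delta:
                if s == tr[-1] and obs <= s2:
                    extended.add(tr + (s2,))
        return extended                              # traces that cannot be extended are discarded

    def belief_at(trajs, k):
        return {tr[k] for tr in trajs}               # B_k as the projection on time k

    def outcome(nominal_eff, belief):
        holds = [all(q in s for q in nominal_eff) for s in belief]
        if all(holds):
            return "succeeded"                       # formula (1)
        return "non-nominal" if not any(holds) else "pending"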
It is also possible that the extension of the trajectory does not sufficiently refine the belief state B_{k+1}^I, in such a way that the nominal effects of the action a_k^I hold just in a subset of the states included in B_{k+1}^I; in this case the outcome of a_k^I remains pending. Finally, if the nominal effects do not hold in any state included in B_{k+1}^I, we conclude that the outcome of a_k^I is non-nominal. This does not necessarily imply that the action a_k^I has failed, since the non-achievement of the nominal effects may depend on the failure of previous actions causally related to a_k^I.
4 Plan Diagnosis
As soon as the outcome of an action a_t^I is determined to be non-nominal, a diagnostic process is activated in order to provide a possible explanation for such a non-nominal outcome. In this paper we adopt the same notion of plan diagnosis introduced by Roos et al. in [8]: once a non-nominal outcome of action a_t^I has been observed, the plan diagnosis PD(a_t^I) singles out a subset of actions executed by the agents in I whose failure is consistent with the anomalous, observed behavior of the system. Given an agent i ∈ I, every action a in EXP^i(a_t^I) = (pO^i(t) ∩ dependsOn(a_t^I)) ∪ {a_t^I} is a minimal explanation of the non-nominal outcome of a_t^I. Therefore, the plan diagnosis for agent i is PD^i(a_t^I) = ⋁_{a ∈ EXP^i(a_t^I)} a; in fact, due to the causal dependencies, it is sufficient to assume the failure of at least one of these actions to explain the observed non-nominal outcome of a_t^I. It is easy to extend the plan diagnosis to the dependency set I as PD(a_t^I) = ⋃_{i∈I} PD^i(a_t^I). Essentially, the plan diagnosis explains the observed, non-nominal outcome of a_t^I by singling out a subset of actions whose failure may be the root cause of that observation.
Missing Goals. The plan diagnosis can be refined by determining the set of missing goals. A missing goal is a service which cannot be provided by agent i as a consequence of the failure of action a_t^I (where i belongs to the dependency set I). To formally characterize the concept of missing goal we introduce the notion of primary effect: given an action a^I, the nominal effect q ∈ nominalEff(a^I) is a primary effect if at least one of the following conditions holds:
1. q ∈ pre(a_∞), i.e., q belongs to the global goal;
2. q is a service that a^I provides to a subset J of agents, i.e., there exists a causal link l : a^I →_q a^J where I ≠ J. Observe that a^I and a^J can be joint or simple actions.
In general, given an action a^I, primary(a^I) denotes the (possibly empty) set of primary effects provided by a^I. To determine the set of missing goals we adopt a conservative policy and assume that all the actions included in the plan diagnosis have actually failed. Therefore, the subset of missing goals that the agents in I can no longer achieve is missingGoals(a_t^I) = ⋃_{a ∈ PD(a_t^I)} primary(a). In principle, it is sufficient to achieve all the missing goals in an alternative way in order to reach the MAP's global goal G despite the occurrence of the failure. Therefore the missing goals may be the starting point for any plan recovery strategy.
Propagating the Plan Diagnosis. As said above, the failure of a_t^I may propagate through the plan, preventing the execution of actions assigned to different agents in the team (not limited to the dependency set I) and possibly causing the stop of the whole system. For this reason, we complement the notion of plan diagnosis with the set of threatened actions ThrActs(a_t^I), which could be indirectly affected by the failure of a_t^I (through a sequence of causal links). Intuitively, an action a' is threatened through a causal link l : a →_q a' when it is no longer guaranteed that the action a provides the service q; this may happen either because a has failed (i.e., it is included in the plan diagnosis PD(a_t^I)) or because a is in turn threatened.
Formally, the set ThrActs(a_t^I) is defined as: ThrActs(a_t^I) = {a' ∈ A | a_t^I ≺ a' and ∃ a causal link l : a →_q a', l ∈ CL, with a ∈ PD(a_t^I) or a ∈ ThrActs(a_t^I)}. Observe that the propagation is a form of communication among agents which conveys negative information; hence an agent does not wait indefinitely for services which will never be provided.
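Plan diagnosis, missing goals and failure propagation then reduce to simple set computations over the causal links; the function and argument names below are our own illustrative choices.

    def plan_diagnosis(a_failed, pending, depends_on):
        # EXP(a) = (pO(t) intersected with dependsOn(a)), plus a itself
        return (pending & depends_on(a_failed)) | {a_failed}

    def missing_goals(diagnosis, primary):
        return set().union(*(primary(a) for a in diagnosis))

    def threatened(diagnosis, causal_links):
        # least fixpoint: a2 is threatened if some provider a of a2 is failed or threatened
        thr, changed = set(), True
        while changed:
            changed = False
            for (a, q, a2) in causal_links:
                if (a in diagnosis or a in thr) and a2 not in thr and a2 not in diagnosis:
                    thr.add(a2)
                    changed = True
        return thr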
Running Example. Let us consider the blocks world example and assume that at time 4 the failure of the joint action ⟨7,12⟩ (whose dependency set is {A2, A3}) is detected. To determine whether this failure may be the consequence of a previous failure, one has to single out which actions in pO^{A2}(4) (pO^{A3}(4)) directly or indirectly provide ⟨7,12⟩ with a service. According to the definition of primary effect, the outcome of actions 5 and 10 must be observable, whereas the outcome of action ⟨6,11⟩ may not be. Let us suppose that the observations available at time 3 are not sufficient for inferring the outcome of action ⟨6,11⟩; thereby both the agents A2 and A3 include the joint action ⟨6,11⟩ in their sets of pending outcomes, i.e., pO^{A2}(4) = pO^{A3}(4) = {⟨6,11⟩}. Since ⟨6,11⟩ directly provides a service to action ⟨7,12⟩, the failure of the second action may be a consequence of the failure of the first one; thus the plan diagnosis includes both actions: PD(⟨7,12⟩) = {⟨6,11⟩, ⟨7,12⟩}. Given the plan diagnosis, the set of missing goals is missingGoals(⟨7,12⟩) = {AT(O2,T)}; moreover the propagation of the plan diagnosis highlights that the failure of action ⟨7,12⟩ affects not only the actions 8 and 13 of the agents A2 and A3 respectively, but also the action 4 of the agent A1. Note that, were the missing service AT(O2,T) provided in an alternative way, the agent A1 would be able to accomplish its task without any adjustment to its sub-plan.
5 Computational Issues
So far we have discussed in a declarative way a methodology for supervising a MAP; in this section we analyze some computational issues which may arise while implementing the approach.
Agent Trajectories. Since maintaining a set of trajectories from the initial time instant may be computationally expensive, we can limit the length of the agent trajectories by considering that the primary effects of an action a_t^I must always be observable at time t+1 (according to the minimal observability requirement). In order to make evident in the MAP which actions provide primary effects we introduce the notion of target action; in particular, an action a_t^I is said to be a target action iff primary(a_t^I) is not empty. Since the outcome of a target action is always observable, target actions can be considered as milestones in the plan and exploited for determining the temporal windows the agent trajectories must cover. In particular, under some requirements on the causal dependencies in the MAP, the following property holds.
Property 2 Given the agent i, the target action a_t^I, where i ∈ I, and the set pO^i(t), if each action in pO^i(t) provides (directly or indirectly) a_t^I with a service, the detection of the outcome of a_t^I allows one to infer the outcome of each action included in pO^i(t).
Property 2 states that, after the execution of a target action a_t^I, every agent i ∈ I can determine the outcome of all the actions in its set of pending outcomes pO^i(t). Moreover, in case no failure has been detected, every agent i can replace the trajectory with the belief state B_{t+1}^I, as it represents a synthesis of the past history up to time t+1. For example, in the MAP of Figure 1, the simple actions 4, 5, 8 and 13 are target actions, as well as the joint action ⟨7,12⟩.
Implementation and preliminary results. From a computational point of view, managing relations such as belief states and action models, which may have a huge dimension, may be very expensive. In order to implement both the monitoring and the diagnostic processes in an effective way, we have encoded the relations by means of the symbolic formalism of Ordered Binary Decision Diagrams (OBDDs); the relational operations have been mapped into standard operations on OBDDs. A prototype has been implemented in Java JDK 1.6 and exploits the JavaBDD package (http://sourceforge.net/projects/javabdd) for manipulating OBDDs. The approach has been tested in an office domain, and the robotic agents, simulated in a software environment, are implemented as threads running on the same Intel Pentium (1.86 GHz, 1 GB RAM, Windows XP OS). The preliminary results collected so far are encouraging: given MAPs involving up to 6 agents and 60 actions on average, the plan supervision (monitoring, agent diagnosis and failure propagation) performed by each agent requires on average 5 msec per time instant (the maximum absolute CPU time per instant being 30 msec); exploiting the target actions, an agent maintains a trajectory whose length is 5 instants in the worst case (3 on average).
6 Discussion and Conclusion
The problem of diagnosing a multi-agent plan has recently been addressed by exploiting methods and techniques developed within the MBD community, in particular for the diagnosis of distributed systems (see e.g., [7]). In [4] the authors consider multi-agent systems where, at each time instant, every agent chooses the most appropriate behavior to assume according to its beliefs; the authors introduce the notion of social diagnosis to explain the disagreements among cooperating agents. The approach presented in this paper has some resemblance to [8], where a distributed approach to monitoring and diagnosing the execution of a MAP is proposed. It assumes that each agent monitors and diagnoses the actions it is responsible for, where actions are atomic and are modeled as functions of their nominal behavior only. Since the anomalous behavior of the actions is not explicitly modeled, the monitoring cannot estimate faulty system states. In this paper, we have proposed a distributed approach for monitoring and diagnosing the execution of a multi-agent plan in a system which is only partially observable. Differently from the approach in [8], we adopt extended action models for capturing both nominal and anomalous execution; thereby the monitoring process we propose is able to estimate system states even after the occurrence of faults. Moreover, by exploiting the notion of dependency set introduced in [5], the approach uniformly deals with simple as well as joint actions. Finally, the paper has discussed a methodology based on the weak commitment strategy which is able to infer a plan diagnosis and to determine two important pieces of knowledge about the system status: the set of missing goals and the set of actions threatened by the plan diagnosis. These two sets play a critical role in any plan recovery strategy, since one has to find (if possible) an alternative way for reaching the global goal which is not achievable because of the action failure. In general such a recovery step requires a re-planning phase, where the set of missing goals contributes to reduce the search space since it clearly points out what must be achieved.
REFERENCES
[1] L. Birnbaum, G. Collins, M. Freed, and B. Krulwich, 'Model-based diagnosis of planning failures', in Proc. AAAI'90, pp. 318-323, (1990).
[2] J. S. Cox, E. H. Durfee, and T. Bartold, 'A distributed framework for solving the multiagent plan coordination problem', in Proc. AAMAS'05, pp. 821-827, (2005).
[3] R. M. Jensen and M. M. Veloso, 'OBDD-based universal planning for synchronized agents in non-deterministic domains', JAIR, 13, 189-226, (2000).
[4] M. Kalech and G. A. Kaminka, 'Towards model-based diagnosis of coordination failures', in Proc. AAAI'05, pp. 102-107, (2005).
[5] R. Micalizio and P. Torasso, 'On-line monitoring of plan execution: a distributed approach', Knowledge-Based Systems, 20(2), 134-142, (2007).
[6] R. Micalizio and P. Torasso, 'Plan diagnosis and agent diagnosis in multi-agent systems', volume 4733 of LNCS, pp. 434-446, (2007).
[7] Y. Pencolé and M.-O. Cordier, 'A formal framework for the decentralised diagnosis of large scale discrete event systems and its application to telecommunication networks', AI, 164, 121-170, (2005).
[8] C. Witteveen, N. Roos, R. van der Krogt, and M. de Weerdt, 'Diagnosis of single and multi-agent plans', in Proc. AAMAS'05, pp. 805-812, (2005).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-413
A hybrid approach to multi-agent decision-making Paulo Trigo 1 and Helder Coelho 2 Abstract. In the aftermath of a large-scale disaster, agents' decisions derive from self-interested (e.g. survival), common-good (e.g. victims' rescue) and teamwork (e.g. fire extinction) motivations. However, current decision-theoretic models are either purely individual or purely collective and find it difficult to deal with motivational attitudes; on the other hand, mental-state based models find it difficult to deal with uncertainty. We propose a hybrid, CvI-JI, approach that combines: i) collective 'versus' individual (CvI) decisions, founded on the Markov decision process (MDP) quantitative evaluation of joint-actions, and ii) the joint-intentions (JI) formulation of teamwork, founded on the belief-desire-intention (BDI) architecture of general mental-state based reasoning. The CvI-JI evaluation explores the performance improvement during the process of learning a coordination policy in a partially observable stochastic domain.
1
INTRODUCTION
The agents that cooperate to mitigate the effects of a large-scale disaster, e.g. an earthquake or a terrorist incident, make decisions that fall into two broad behavioral classes: the individual (ground) activity and the collective (institutional) coordination of such activity. Additionally, agents are motivated to form teams and jointly commit to goals that supersede their individual capabilities [8]. Despite such motivation, communication is usually insufficient to ensure that decision-making is supported by a single and coherent world perspective. The communication constraint causes the decision-making process to evolve simultaneously, both at the collective (common-good) and at the individual (self-interested) strata, sometimes in a conflicting manner. For instance, an ambulance searches for a policy to rescue a perceived civilian, while the ambulance command center, when faced with a global view of multiple injured civilians, searches for a policy to decide which ambulance should rescue which civilian. However, despite the intuition of a 2-strata decision process, research on multi-agent coordination often proposes a single model that amalgamates both strata and searches for optimality within that model. The approaches based on the multi-agent Markov decision process (MMDP) [1] are purely collective and centralized, thus too complex to coordinate while requiring unconstrained communication. The multi-agent semi-Markov decision process (MSMDP) [7], although decentralized, requires each individual agent to represent the whole decision space (states and actions), which may become very large, thus causing the individual policy learning to be slow and highly dependent on up-to-date information about the decisions of all other agents. The game-theoretic approach requires an agent to compute the utility of all combinations of actions executed by all other
1 GuIAA/LabMAg; DEETC, ISEL - Instituto Superior de Engenharia de Lisboa, Lisbon, Portugal, email: ptrigo@deetc.isel.ipl.pt
2 LabMAg; DI, FCUL - Faculdade de Ciências da Universidade de Lisboa, Lisbon, Portugal, email: hcoelho@di.fc.ul.pt
agents (payoff matrix), which is then used to search for Nash equilibria (where no agent increases his payoff by unilaterally changing his policy); thus, if several equilibria exist, agents may adhere to purely individual policies, never being pulled by a collective perspective. The multi-agent collective 'versus' individual (CvI) decision model [15], which is founded on the semi-Markov decision process (SMDP) framework, is neither purely collective nor purely individual and explores the explicit separation of concerns between both (collective and individual) decision strata while aiming to conciliate their reciprocal influence. Despite that, the CvI misses the agents' intentional stance toward team activity. On the other hand, the joint-intentions (JI) formulation of teamwork [5], based on the belief-desire-intention (BDI) mental-state architecture [9, 16], captures the agents' intentional stance, but misses the MDP domain-independent support for sequential decision-making in stochastic environments. Research on single-agent MDP-BDI hybrids formulates the correspondence between the BDI plan and the MDP policy concepts [11] and empirically compares each model's performance [10]. Multi-agent MDP-BDI hybrid models often exploit BDI plans to improve MDP tractability, and use MDP to improve BDI plan selection [13]. In this paper, instead of exploring the MDP-BDI policy-plan relation, we focus on the link between the BDI intention concept and the MDP temporally abstract action concept [12]. We see an intention as an action that executes for variable time periods and, when terminated, yields a reward to the agent. We extend this view to the joint-intentions concept and integrate the resulting formulation in the 2-strata multilevel hierarchical CvI decision model. Thus, the CvI-JI is a hybrid approach that combines the MDP temporally abstract action concept and the BDI mental-state architecture. The motivation for the hybrid CvI-JI model is to use the JI as a heuristic constraint that reduces the space of admissible MDP joint-actions, thus enabling the approach to scale to larger problems. The experiments show the CvI-JI learning improvement in a partially observable environment.
2
THE CvI DECISION MODEL
The premise of the CvI decision model is that the individual choice coexists with the collective choice and that coordinated behavior happens (is learned) from the prolonged relation (in time) of the choices exercised at both of those strata (individual and collective). Coordination is exercised on high level, hierarchically organized cooperation tasks, founded on the framework of Options [12], which extends the MDP theory to include temporally abstract actions (variable time duration tasks, whose execution resorts to primitive actions).
2.1
The framework of Options
Formally, an MDP is a tuple M ≡ ⟨S, A, Ψ, P, R⟩ modeling stochastic sequential decision problems, where S is a set of states, A
is a set of actions, Ψ ⊆ S × A is the set of admissible state-action pairs, R(s, a) is the expected reward when action a is executed at s, and P(s' | s, a) is the probability of being at state s' after executing a at state s. Given an MDP, an option o ≡ ⟨I, π, β⟩ consists of a set of states I ⊆ S from which the option can be initiated, a policy π for the choice of actions, and a termination condition β which, for each state, gives the probability that the option terminates when that state is reached. The computation of optimal value functions and optimal policies, π*, resorts to the relation between options and actions in a semi-Markov decision process (SMDP): 'any MDP with a fixed set of options is a SMDP' [12]. Thus, all the SMDP learning methods can be applied to the case where temporally extended options are used in an MDP. The options define a multilevel hierarchy where the policy of an option chooses among other lower-level options. At each time, the agent's decision is entirely among options; some persist for a single time step (primitive actions or one-step options), others are temporally extended (multi-step options).
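As a concrete but purely illustrative rendering, an option ⟨I, π, β⟩ and the standard SMDP Q-learning update over options might look as follows; this sketches the framework of [12] under our own naming, not the authors' code.

    from dataclasses import dataclass
    from typing import Callable, Set

    @dataclass(eq=False)                     # identity-based hash, so options can key the Q-table
    class Option:
        initiation: Set[int]                 # I: states from which the option may start
        policy: Callable[[int], int]         # pi: maps a state to a lower-level option/action
        beta: Callable[[int], float]         # termination probability for each state

    def smdp_q_update(Q, s, o, r, k, s_next, options, lr=0.1, gamma=0.95):
        # r is the cumulative discounted reward collected while o ran for k steps
        admissible = [o2 for o2 in options if s_next in o2.initiation]
        target = r + (gamma ** k) * max((Q.get((s_next, o2), 0.0) for o2 in admissible), default=0.0)
        q = Q.get((s, o), 0.0)
        Q[(s, o)] = q + lr * (target - q)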
2.2
The CvI collective and individual strata
The individual stratum is simply a set of agents Υ, each agent j ∈ Υ with its capabilities described as a hierarchy of options. The collective stratum is an agent (e.g. institutional) that cannot act on its own (its actions are executed by the individual stratum agents) and whose purpose is to coordinate the individual stratum. Formally, at the collective stratum, each action is defined as a collective option o⃗ = ⟨I_o⃗, π_o⃗, β_o⃗⟩, where o⃗ = ⟨o^1, ..., o^|Υ|⟩ represents the simultaneous execution of an option o^j ≡ ⟨I^j, π^j, β^j⟩ by each agent j ∈ Υ. The set of agents Υ defines an option space O⃗ ⊆ O^1 × ... × O^|Υ|, where O^j is the set of agent j's options and each o⃗ ∈ O⃗ is a collective option. The O⃗ decomposes into disjoint subsets O⃗_d, each with the collective options available at the d-th hierarchical level, where 0 < d ≤ D − 1, level 0 is the root and D is the hierarchy depth. A level-d policy π_d is implicitly defined by the SMDP M_d with state set S and action set O⃗_d. The M_d solution is the optimal way to choose the level-d individual policies which, in the long run, gathers the highest collective reward.
The CvI structure. Figure 1 illustrates the CvI structure, where the individual stratum (each agent j) is a 3-level hierarchy and thus the collective stratum (the two collective option instances o⃗_1 and o⃗_2) is a 2-level hierarchy; at each level, the set of diamond-ended arcs links the collective option to each of its individual policies.
[Figure 1 here: the CvI structure and inter-strata links, showing the 2-level collective stratum (collective options o⃗_1 = ⟨o^1_{p-2}, o^2_{p-3}⟩ and o⃗_2 = ⟨o^1_{p-2.1}, o^2_{p-3.2}⟩) linked by diamond-ended arcs to the 3-level option hierarchies of agent 1 and agent 2; superscript j refers to agent j, subscripts k and p-k refer to the k-th hierarchical level and the k tree path.]
The CvI dynamics. At each decision epoch, agent j gets the partial perception ω^j and decides-who-decides (d-w-d), i.e., agent j either: i) chooses an option o^j ∈ O^j, or ii) requests a decision from the collective stratum, which replies with an option o^j. The d-w-d process represents the importance that an agent credits to each stratum, defined as the ratio between the maximum expected benefits of choosing a collective and an individual decision. The expected benefit is given, at each hierarchical level d, by the value functions of the corresponding SMDP M_d. A threshold κ ∈ [0, 1] grades the focus between the collective and individual strata, thus enabling the (human) designer to specify diverse social attitudes, ranging from common-good (κ = 0) to self-interested (κ = 1) motivated agents. The CvI is a decentralized model, as each agent decides whether to make a decision by itself or to ask the collective stratum for a decision. For a comprehensive description of the CvI model see [15].
2.3
The design of CvI agents

Given the individual stratum set of agents Υ and a collective stratum agent υ, the design of a CvI instance is a 3-step process:
i. For each j ∈ Υ, specify O^j: the set of options and its hierarchical organization.
ii. For each j ∈ Υ, and from the agent υ perspective, identify the subset of cooperation tasks C^j ⊆ O^j: the options most effective for achieving coordination skills; the remaining options, J^j = O^j − C^j, represent purely individual tasks.
iii. For each j ∈ Υ, assign κ its regulatory value: κ = 0 is a common-good motivated agent, κ = 1 is a self-interested attitude, and κ ∈ ]0, 1[ embraces the whole spectrum between those two extreme decision motivations.
A simple, domain-independent design defines C^j (item ii above) as the multi-step options, hence J^j as the one-step options. Also, the highest hierarchical level(s) are usually effective for achieving coordination skills, as they escape from getting lost in lower-level details.
3
THE JOINT-INTENTIONS (JI) MODEL
The precise semantics of the intention concept varies across the literature. An intention is often taken to represent an agent's internal commitment to perform an action, where a commitment is specified as a goal that persists over time, and a goal (often named a desire) is a proposition that the agent wants to get satisfied; an intention can also represent a plan that an agent has adopted, or a state that the agent is committed to bring about [3, 4, 9, 16]. The framework of joint-intentions (JI) adopts the semantics of the 'intention as a commitment to perform an action' and extends it to describe the concept of teamwork. A team is described as a set of two or more agents collectively committed to achieve a certain goal [5]. The teamwork agents (those acting within a team) are expected to first form future-directed joint-intentions to act, keep those joint-intentions over time, and then jointly act. Formally, given a set of agents Υ, a team is described as a 2-tuple T ≡ ⟨α, g⟩, where the team members are represented by α ⊆ Υ and the team goal is g. In a team all members α are jointly committed to achieve the goal g while mutually believing that they are all acting towards that same goal. The teamwork terminates as soon as all members mutually believe that there exists at least one member that considers g finished (achieved, impossible to achieve, or irrelevant).
4
THE HYBRID CvI-JI DECISION MODEL
Given the CvI (cf. section 2) decision-theoretic model we regard the JI approach as a way to reduce the collective option space exponentially in the number of team members. For example, given Υ
P. Trigo and H. Coelho / A Hybrid Approach to Multi-Agent Decision-Making
agents, all with the same cooperation tasks C, there are at most |C|^|Υ| admissible options to choose from; during ⟨α, g⟩ teamwork, that number reduces to |C|^(|Υ|−|α|), and such a reduction motivates the formulation of the hybrid CvI-JI decision model. The next sections address two questions: i) how to specify, at design time, the JI using the CvI components, and ii) how to integrate, at execution time, the JI specification in the CvI decision process.
4.1
Specify JI using the CvI components
The teamwork goal. The JI describes teamwork in terms of goals which, in general, take multiple time periods until satisfaction. The CvI specifies decisions in terms of options, which are temporally abstract actions. Therefore, a (team) goal corresponds to a (team) option. Given a goal g described as a proposition ϕ, we formulate the corresponding option as ⟨I, π, β⟩, where I is the set of states where ¬ϕ is satisfied, β(s) = 1 if s ∈ (S − I) and β(s) = 0 otherwise, and π is any policy to satisfy ϕ (i.e., to terminate the option).
The teamwork commitment. The JI only requires agents to 'keep the joint-intentions commitment over time, and then jointly act'. It is up to the agent to decide when to terminate an ongoing task and effectively start acting to achieve the team goal. Thus, being jointly committed to a goal g does not imply immediate action toward that same goal g. For example, two ambulances may jointly commit to the same disaster while one of them is executing an action (e.g., delivering an injured civilian); as soon as the ongoing task is terminated, the ambulance starts acting towards the team goal. Therefore, our CvI-JI formulation assumes that, at each decision epoch, an agent may establish a JI while still acting to satisfy another intention (either individual or joint). Thus, at each instant, an agent may have an ongoing activity and also (at most) one established JI. Our approach enables teamwork decisions to be asynchronous; agents do not need to wait for each other's option termination before committing to a JI. Our hybrid CvI-JI option selection function distinguishes two teamwork stages: i) the 'ongoing task continue' stage, when an agent decides to establish a JI (becomes a team member) even though it is still executing some other task, and ii) the 'team option startup' stage, when a team member decides to start executing the team option. Given a team member j, a team option o and its initiation set I, we define the ongoing states I^{ongo:j} ⊂ I, where j is allowed to continue executing an ongoing task while jointly committed to achieve the team option o.
The teamwork reconsideration. The JI assumes that once an agent commits to a team goal he will fulfil that commitment. The CvI is a stochastic model, so we account for the possibility that an agent drops a previous commitment before actually starting to act as a team member. Given agent j, we define the commitment probability p^{commit:j} that j meets his engagement.
The teamwork design component. The CvI-JI combines all the above (team option, ongoing set and commitment probability) into a 'teamwork design component' tdc^j ≡ ⟨o^j, I^{ongo:j}, p^{commit:j}⟩, which describes, for agent j ∈ Υ and team option o^j ∈ O^j, the set of states I^{ongo:j} where the agent may continue an ongoing task before starting to execute o^j, and the probability p^{commit:j} of effectively committing to o^j. The design of the tdc structure assumes that: i) a team option is always represented in more than one agent, ii) a tdc^j is specified for each team option that j may get committed to, and iii) the I^{ongo:j} specification considers j's local view of the environment. The CvI-JI model describes, via tdc, the domain-dependent teamwork knowledge which contributes to reduce the collective option space. Thus, the CvI integrates the JI as a heuristic filter (at the collective stratum) that reifies the (human) designer's domain knowledge. The next section integrates the heuristic filter in the decision process.
415
tum) that reifies the (human) designer domain knowledge. The next section integrates the heuristic filter in the decision process.
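To make these definitions concrete, the sketch below encodes the option triple ⟨I, π, β⟩ and the teamwork design component as plain Python data structures. This is a minimal illustration under our own naming conventions; the paper prescribes no particular implementation.

from dataclasses import dataclass
from typing import Any, Callable, Set

State = Any  # the concrete state encoding is domain-dependent (hypothetical)

@dataclass
class Option:
    # A temporally abstract action <I, pi, beta>.
    initiation: Set[State]                 # I: states where the option may start
    policy: Callable[[State], Any]         # pi: maps a state to a primitive action
    termination: Callable[[State], float]  # beta: probability of terminating in s

def goal_to_option(holds, states, policy):
    # Option for a goal proposition phi (passed as the predicate `holds`):
    # I = states where phi is NOT satisfied; beta(s) = 1 iff phi holds in s.
    return Option(initiation={s for s in states if not holds(s)},
                  policy=policy,
                  termination=lambda s: 1.0 if holds(s) else 0.0)

@dataclass
class TDC:
    # Teamwork design component tdc_j = <o_j, I_ongo:j, p_commit:j>.
    agent: str            # j, the team member this component refers to
    team_option: Option   # o_j
    ongoing: Set[State]   # I_ongo:j: states where j may keep its ongoing task
    p_commit: float       # probability that j honours the commitment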
4.2
Integrate JI in the CvI decision process
The integration of the JI in the CvI decision process is designed, at the collective stratum, by modifying the CvI option selection process, which chooses, at each decision epoch, a level d collective option, o^d, given the perceived state, s, and the set of agents, B, that request a collective stratum decision. Algorithm 1 shows the option selection function, CHOOSEOPTION, and the inclusion of the two subroutines, APPLYFILTER-JI (cf. line 3) and UPDATEFILTER-JI (cf. line 5), that implement the CvI-JI integration.

Algorithm 1 Choose option at level d of CvI collective stratum.
1 function CHOOSEOPTION( s, O^d, π^d, B )
2   Õ^d ← getAdmissibleOptionSet( s, O^d, B )
3   Õ^d ← APPLYFILTER-JI( s, Õ^d, B )
4   o^d ← applyPolicy( s, Õ^d, π^d )
5   UPDATEFILTER-JI( o^d, B )
6   return o^d
7 end function

The getAdmissibleOptionSet function (cf. algorithm 1, line 2) is the same as in CvI; it evaluates I_o of each collective option, o^d, and returns the set, Õ^d, of admissible options (given the perceived state s and the set of agents, B, that requested a level d collective stratum decision). The applyPolicy function (cf. algorithm 1, line 4) chooses the next collective option to execute; the policy, π^d, is either predefined or follows some explore-and-exploit reinforcement learning method. We followed the learning approach and implemented an ε-greedy policy, which picks: i) a random admissible collective option, o^d ∈ Õ^d, with probability ε, and ii) otherwise, the collective option with the highest estimated action value at the current state, s, already considering the JI commitments (i.e., picks arg max_{o^d ∈ Õ^d} Q(s, o^d)).

Algorithm 2, the APPLYFILTER-JI function, shows the integration of the JI commitments through the manipulation of the tdc instances. The set of goals that call for teamwork effort is represented by the global TDC set (cf. line 3), which is initially empty. The first part (cf. lines 2 to 10, algorithm 2) determines the TDC′ set of admissible tdc from agents that requested a level d collective stratum decision. The teamwork reconsideration concept (cf. section 4.1) is represented by the possibility of discarding a previously established and currently admissible JI (cf. algorithm 2, line 5). The second part (cf. lines 11 to 16, algorithm 2) restricts the collective options to those that are compatible (all o^d components match) with the team options of all tdc ∈ TDC′; the remaining collective options are discarded.

Algorithm 2 Apply JI to reduce collective options' admissible set.
1 function APPLYFILTER-JI( s, Õ^d, B )
2   TDC′ ← ∅
3   for each tdc ∈ TDC do
4     if ( s[ tdc.j ] ∉ tdc.I_ongo:j ) ∧ ( tdc.j ∈ B ) then
5       if random ≤ tdc.p_commit:j then
6         TDC′ ← TDC′ ∪ { tdc }
7       end if
8       TDC ← TDC − { tdc }
9     end if
10  end for
11  Õ′^d ← ∅
12  for each o^d ∈ Õ^d do
13    if o^d is compatible with TDC′ then
14      Õ′^d ← Õ′^d ∪ { o^d }
15    end if
16  end for
17  return Õ′^d        ! Õ′^d = Õ^d when TDC′ = ∅
18 end function

Algorithm 3, the UPDATEFILTER-JI function, describes the strategy used, at each decision epoch, to select a team goal and to find the set of agents that are available to commit to that team goal (i.e., select a goal, g, and find the set, α ⊆ Υ, of agents available to form a team T ≡ ⟨α, g⟩). The implemented strategy simply selects the first admissible team goal and assumes that each agent "is available to commit to a team goal as long as he is not already a team member". The TDC set is updated (cf. algorithm 3) according to that strategy, for all agents, at each decision epoch.

Algorithm 3 Strategy to update the set, TDC, containing the selected team goal and the agents available for a JI.
1 function UPDATEFILTER-JI( o^d, B )
2   teamOption ← false
3   for each tdc ∈ DTDC do        ! DTDC ≡ designed tdc elements
4     if ¬ teamOption then
5       o ← tdc.o                 ! o ≡ a team option
6     end if
7     for each ag ∈ Υ do
8       if ( o^d[ ag ] = o ) ∧ ( o^d[ tdc.j ] = o ) ∧
9          ( ag ∈ B ) ∧ ( ag ≠ tdc.j ) then
10        TDC ← TDC ∪ { tdc }
11        if ¬ teamOption then
12          teamOption ← true
13        end if
14      end if
15    end for
16  end for
17 end function
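As an illustration, a minimal Python rendering of Algorithm 1 combined with the ε-greedy applyPolicy follows. The Q-table and the filter subroutines are passed in as callables; their concrete forms are not fixed by the paper.

import random

def choose_option(s, options, Q, epsilon, B,
                  get_admissible, apply_filter_ji, update_filter_ji):
    # Algorithm 1: pick a level-d collective option for the requesting agents B.
    admissible = get_admissible(s, options, B)      # line 2
    admissible = apply_filter_ji(s, admissible, B)  # line 3: JI heuristic filter
    admissible = list(admissible)                   # assumed non-empty here
    if random.random() < epsilon:                   # explore with probability eps
        o = random.choice(admissible)
    else:                                           # exploit: highest action value
        o = max(admissible, key=lambda opt: Q.get((s, opt), 0.0))
    update_filter_ji(o, B)                          # line 5
    return o                                        # line 6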
5
EXPERIMENTS AND RESULTS
We propose the teamwork taxi coordination problem, which extends the previous taxi coordination problem [6, 15] and enforces teamwork
behavior, as follows: "passengers appear at an origin site and want to be transported to a destination site; at some predefined sites, passengers only accept to be transported all together (as in a family)"; those sites are named teamwork sites, as taxis must work as a team to transport all the passengers at the same time. The experimental setup is given by: i) a 5 × 5 grid, ii) 4 sites, S_b = { b1, b2, b3, b4 }, iii) 2 taxis, S_t = { t1, t2 }, iv) 3 passengers, S_psg = { psg_1, psg_2, psg_3 }, and v) a single teamwork site, b_tw ∈ S_b. The primitive actions available to each taxi are pick, put and move( m ), where m ∈ { N, E, S, W } are the cardinal directions; the wait action supports the agents' synchronization (at teamwork sites). The problem is partially observable, as a taxi does not perceive the other taxis' locations; it is collectively observable, as the combination of all individual observations determines a single world state. We defined 3 different CvI-JI configurations, each assigning to all j ∈ Υ the same p_commit:j ∈ { 0, 1/2, 1 } value. Therefore, we define: i) never JI, when p_commit:j = 0, ii) sometimes JI, when p_commit:j = 1/2, and iii) always JI, when p_commit:j = 1. The goal of the individual stratum is to learn how to execute tasks (e.g., how to navigate to a site and when to pick up a passenger). The
goal of the collective stratum is to learn to coordinate the individual tasks so as to minimize the resources (time) needed to satisfy the passengers' needs. The learning of the policy at the collective stratum occurs simultaneously with the learning of each agent's policy at the individual stratum. The results of the experiments (cf. section 5.4) show the performance improvement that the hybrid CvI-JI brings to the collective stratum learning process, when compared with the pure CvI (i.e., never JI) approach.
5.1
JI specification
The JI is specified as a set of predefined tdc instances. The tdc instance is defined, for each taxi (agent) t_j ∈ S_t, as ⟨b_tw, I_ongo:tj, p_commit:tj⟩. The b_tw is the teamwork site. The I_ongo:tj specifies the following ongoing state set: i) the taxi, t_j, already transports a passenger, or ii) there is a passenger to pick up at t_j's current location. The p_commit:tj is assigned the value 0, 1/2 or 1, respectively, for the never JI, sometimes JI or always JI experiment configuration.
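In code, the JI specification of this experiment amounts to one tdc instance per taxi. The sketch below represents I_ongo:tj as a predicate over the taxi's local observation rather than an explicit state set; the observation attribute names are hypothetical stand-ins for the two informal conditions above.

def make_taxi_tdcs(taxis, team_option, p_commit):
    # One tdc per taxi t_j: <b_tw team option, I_ongo:tj, p_commit:tj>.
    # p_commit is 0, 0.5 or 1 for the never/sometimes/always JI configurations.
    def ongoing(obs):
        # (i) already transporting a passenger, or
        # (ii) there is a passenger to pick up at the current location.
        return obs.carrying_passenger or obs.passenger_at_location
    return {t: {"team_option": team_option,
                "ongoing": ongoing,
                "p_commit": p_commit}
            for t in taxis}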
5.2
Individual stratum specification
Each taxi's observation, ω = ⟨x, y, psg_1, psg_2, psg_3⟩, is its (x, y)-position plus the passengers' state, psg_i = ⟨loc_i, dest_i, orig_i, status⟩, where loc_i ∈ S_b ∪ S_t ∪ { t1_acc, t2_acc } (tj_acc means that taxi t_j accomplished the delivery), dest_i ∈ S_b, and orig_i ∈ S_b. Therefore, the state space perceived by each taxi is described by a total of 5 × 5 × (8 × 4 × 4)^3 = 52,428,800 states. The taxi capability is a 3-level hierarchy, where root is the multi-step level-zero option, navigate( b ) is the multi-step level-one option, pick, put and wait are the one-step level-one options, and move( m ) are the level-two one-step options (one for each navigate( b )); a total of 5 multi-step options and 7 one-step actions. The taxi is not equipped with any explicit definition of its goal; also, it does not hold any internal representation of the maze grid. The taxi j decision is based solely on the information available at each decision epoch: i) its perception, ω_j, and ii) the immediate reward provided by the last executed one-step action. The immediate taxi rewards are: i) 20 for delivering a passenger, ii) −10 for an illegal pick or put, iii) −12 for any illegal move action in a teamwork site, and iv) −1 for any other action, including moving into walls and picking more than one passenger in a teamwork site.
5.3
Collective stratum specification
The collective stratum perceives s = ⟨t_1, t_2, psg_1, psg_2, psg_3⟩, which combines all the individual stratum partial observations, where t_j is the (x, y)-position of agent j. Therefore, the collective stratum state space is described by (5 × 5)^2 × (8 × 4 × 4)^3 = 1,310,720,000 states. The collective stratum chooses mainly among multi-step options, so we specify: i) C = { navigate( b ) for all b ∈ S_b } ∪ { wait } ∪ { indOp }, and ii) J = { pick, put }, where indOp is an implicit option representing J at the collective stratum. The indOp option gives rise to a ping-pong decision scenario between strata, whenever an agent chooses to "request a collective stratum decision" and the collective stratum replies: "decide yourself, but consider only your purely individual tasks". Hence, the decision is forwarded back to the agent (via indOp), raising a second opportunity for the agent "to choose an option in J". The ping-pong effect, while giving a second decision opportunity, does not increase the communication between strata and reduces the individual decision space to |J|. We assume that agents contribute equitably to the current state. Thus, the collective reward is the sum of the rewards provided to each agent; our purpose is to maximize the long-run collective reward.
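The two state-space counts follow directly from the component sizes (8 passenger locations, 4 destinations and 4 origins per passenger, per the paper's count); a quick arithmetic check:

passenger = 8 * 4 * 4                        # configurations per passenger
individual = 5 * 5 * passenger ** 3          # one taxi (x, y)-position
collective = (5 * 5) ** 2 * passenger ** 3   # both taxi positions combined
assert individual == 52_428_800
assert collective == 1_310_720_000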
5.4
The CvI-JI experimental evaluation
Our experiments evaluate the influence of the JI integration in the CvI model by measuring the learning process performance (quantified as the collective stratum cumulative reward). An episode starts with 2 passengers in the teamwork site and the third passenger in another site; the episode terminates as soon as all passengers reach their destinations; each experiment executes for 700 episodes. Policy learning follows the SMDP Q-learning [2, 12] approach with the ε-greedy strategy (cf. section 4.2). Each experiment starts with ε = 0.15 and, after the first 100 episodes, ε decays by 0.004 every 50 episodes. We ran 3 experiments, one for each CvI-JI configuration. Figure 2 shows that the never JI configuration exhibits the worst performance: about 6.5% worse than always JI and about 12% worse than sometimes JI; the difference remains almost uniform throughout the whole experiment. The sometimes JI configuration reveals an unexpected behavior: around episode 300, it starts to outperform always JI.
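A sketch of the SMDP Q-learning update used at the collective stratum, with the ε schedule just described; the learning rate and discount factor are placeholders of our own, and the exact timing of the decay steps is our reading of the schedule.

def smdp_q_update(Q, s, o, reward, tau, s_next, admissible_next,
                  alpha=0.1, gamma=0.95):
    # SMDP Q-learning: `reward` is the reward accumulated over the tau time
    # steps the option o lasted; the discount is raised to the duration tau.
    best_next = max((Q.get((s_next, o2), 0.0) for o2 in admissible_next),
                    default=0.0)
    q = Q.get((s, o), 0.0)
    Q[(s, o)] = q + alpha * (reward + gamma ** tau * best_next - q)

def epsilon_schedule(episode):
    # eps = 0.15 for the first 100 episodes, then decays by 0.004
    # every 50 episodes, floored at 0.
    if episode < 100:
        return 0.15
    return max(0.0, 0.15 - 0.004 * ((episode - 100) // 50 + 1))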
Figure 2. The influence of JI on the performance of the learning process (collective stratum cumulative reward per episode, over 700 episodes, for the never JI, always JI and sometimes JI configurations).
An insight from these results is that the JI teamwork heuristic is exploited by the collective stratum without compromising the exploration (search for novelty) that is required by the learning process. Somewhat unexpectedly, being able not to fulfil a previous teamwork commitment (cf. sometimes JI) enables the agents to find improvements over the fully reliable commitment attitude (cf. always JI). The CvI-JI enables a continuous (non-interrupted) flow of decision-making and task execution activities. Such an asynchronous process opens a time window between the instant the agent establishes a JI and the instant the agent actually begins acting to achieve the JI. The possibility of reconsidering a commitment, just before actually starting to act, explores alternatives to teamwork. The ability to drop a pre-established JI makes it possible to pursue individual activity in states where the heuristic approach (JI) would suggest a teamwork approach. The results (cf. figure 2) show that the exploration of individual policies, combined with the heuristic teamwork approach, improves the process of learning a coordination policy.

The experiment's dimension. In this experiment, an agent perceives 52,428,800 states, and the collective stratum contains 1,310,720,000 states. Each decision considers 6 individual options and 36 collective options. Hence, this experimental world captures some of the complexity of the decision-making process that aims to achieve coordinated behavior in a disaster response environment.
6
CONCLUSIONS AND FUTURE WORK
In this paper, we identified a series of relations between the 2-strata decision-theoretic CvI approach and the joint-intentions (JI) mental-state-based reasoning. We extended CvI by exploring the algorithmic aspects of the CvI-JI integration. Such integration represents our novel contribution to a multi-agent hybrid decision model within a reinforcement learning framework. The initial experimental results of the CvI-JI model sustain the hypothesis that the JI heuristic reduction of the action space improves the process of learning a policy to coordinate multiple agents. An interesting conclusion is that, taking into account our preliminary results, the teamwork reconsideration concept suggests investigating the hypothesis that not fulfilling a commitment (at a specific state) is an opportunity to find an alternative path that, in the long run, is globally better than teamwork. This work describes the ongoing research steps to construct agents that participate in the decision-making process that occurs in the response to a large-scale disaster. Future work will apply the CvI-JI in a simulated disaster response environment [8] and will explore teamwork (re)formation strategies [14] at the collective stratum.

ACKNOWLEDGEMENTS This research was partially supported by LabMAg FCT R&D unit.

REFERENCES
[1] Craig Boutilier, 'Sequential optimality and coordination in multiagent systems', in Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), pp. 478–485, (1999).
[2] Steven Bradtke and Michael Duff, 'Reinforcement learning methods for continuous-time Markov decision problems', in Proceedings of Advances in Neural Information Processing Systems, volume 7, pp. 393–400. The MIT Press, (1995).
[3] Michael Bratman, 'What is intention?', in Intentions in Communication, 15–31, MIT Press, Cambridge, MA, (1990).
[4] Philip Cohen and Hector Levesque, 'Intention is choice with commitment', Artificial Intelligence, 42(2–3), 213–261, (1990).
[5] Philip Cohen and Hector Levesque, 'Teamwork', Noûs, Cognitive Science and Artificial Intelligence, 25(4), 487–512, (1991).
[6] Thomas Dietterich, 'Hierarchical reinforcement learning with the MAXQ value function decomposition', Journal of Artificial Intelligence Research, 13, 227–303, (2000).
[7] Mohammad Ghavamzadeh, Sridhar Mahadevan, and Rajbala Makar, 'Hierarchical multi-agent reinforcement learning', Autonomous Agents and Multi-Agent Systems, 13(2), 197–229, (2006).
[8] Hiroaki Kitano and Satoshi Tadokoro, 'RoboCup Rescue: A grand challenge for multi-agent systems', AI Magazine, 22(1), 39–52, (2001).
[9] Anand Rao and Michael Georgeff, 'BDI agents: From theory to practice', in Proceedings of the First International Conference on Multiagent Systems, pp. 312–319, San Francisco, USA, (1995).
[10] Martijn Schut, Michael Wooldridge, and Simon Parsons, 'On partially observable MDPs and BDI models', in Foundations and Applications of Multi-Agent Systems, volume 2403 of LNCS, 243–260, Springer-Verlag, (2002).
[11] Gerardo Simari and Simon Parsons, 'On the relationship between MDPs and the BDI architecture', in Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-06), pp. 1041–1048, Hakodate, Japan, (2006). ACM Press.
[12] Richard Sutton, Doina Precup, and Satinder Singh, 'Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning', Artificial Intelligence, 112(1–2), 181–211, (1999).
[13] Milind Tambe, E. Bowring, H. Jung, Gal Kaminka, R. Maheswaran, J. Marecki, P. Modi, Ranjit Nair, S. Okamoto, J. Pearce, P. Paruchuri, David Pynadath, P. Scerri, N. Schurr, and Pradeep Varakantham, 'Conflicts in teamwork: Hybrids to the rescue', in Proceedings of the 4th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-05), pp. 3–10. ACM Press, (2005).
[14] Paulo Trigo and Helder Coelho, 'The multi-team formation precursor of teamwork', in Progress in Artificial Intelligence, EPIA-05, volume 3808 of LNAI, 560–571, Springer-Verlag, (2005).
[15] Paulo Trigo, Anders Jonsson, and Helder Coelho, 'Coordination with collective and individual decisions', in Advances in Artificial Intelligence, IBERAMIA/SBIA 2006, volume 4140 of LNAI, 37–47, Springer-Verlag, (2006).
[16] Michael Wooldridge, Reasoning About Rational Agents, chapter Implementing Rational Agents, The MIT Press, 2000.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-418
Coalition Formation Strategies for Self-Interested Agents
Thomas Génin and Samir Aknine 1
Abstract. Coalition formation is a major research issue in multiagent systems in which the agents are self-interested. In these systems, agents have to form groups in order to achieve common goals which they are not able to achieve individually. A coalition formation mechanism requires two definition levels: first, agents need a common protocol to reach an agreement; second, individual strategies are required to make efficient proposals. Both issues are addressed in this paper. First, we propose a two-phase decentralized protocol that allows agents to interact directly through message passing. Second, we propose some strategies which allow agents to make clever proposals using the information that has already been collected from other agents. The experimental evaluation shows that the proposed mechanism allows agents to efficiently form coalitions and that the strategies bring real improvements to the coalition search process.
1
INTRODUCTION
In a multi-agent system where independent agents evolve autonomously, guided by their specific objectives, it is often difficult, or even impossible, for an agent to reach these objectives individually. This is mainly due to a lack of resources or expertise. One way to overcome this difficulty is to share the agents' resources and capabilities, which enables the agents to jointly reach their individual objectives. Such a temporary grouping dedicated to the achievement of common goals is called a coalition, and the process that allows agents to build these coalitions is coalition formation. In this article, we assume that agents are self-interested. These agents have several tasks to perform, and it is often impossible for a single agent to perform them alone. In addition, these tasks can be combined together. The preferences of each agent for the achievement of task combinations are represented by a utility function that is, naturally, different for each agent. An agent does not know the preferences, i.e., the utility functions, of the other agents of the system. This means that, initially, agents are not able to estimate which agents are likely to form a coalition for the achievement of a specific combination of tasks. In this article we propose a decentralized coalition formation protocol for self-interested agents. This protocol allows agents to interact directly through message passing. Then we propose some strategies that allow agents to select combinations of tasks cleverly, targeting potentially interested agents. These strategies are based on the computation of characteristic vectors and on logic rule generation. An experimental evaluation of the mechanism has been carried out. The results obtained show that our mechanisms allow agents to efficiently form their coalitions. Moreover, using information about the potential preferences of other agents in the proposed strategies speeds up the coalition formation process.
1 LIP6, Université Paris 6, Paris, France. Email: thomas.genin@lip6.fr, samir.aknine@lip6.fr
This article is structured as follows. Section 2 presents the context of this work. Section 3 presents the protocol and the different strategies we propose. Section 4 describes the experimental evaluation of this mechanism and discusses the obtained results. Section 5 analyzes the main related works. Finally, section 6 concludes our work.
2
CONTEXT AND PROBLEM DESCRIPTION
We consider a set of n agents and a set of m tasks. Agents want to perform combinations of tasks. Each agent a_i has a utility function u_i defined on the combinations of tasks and does not know the utility functions of the other agents a_j. The aim of each agent is to maximize its own utility. As agents are not able to perform their tasks alone, they try to form coalitions with other agents. These coalitions will then perform the required tasks. In the following sections, N = {a_1, . . . , a_n} is defined as the set of agents in the system and M = {T_1, . . . , T_m} is the set of all tasks. A is a subset of N and comb is a subset of M: A ⊂ N and comb ⊂ M. To form coalitions, agents have to negotiate and exchange several messages before reaching agreements, so a shared protocol is required to perform the interaction. Moreover, agents do not initially know the preferences of the other agents. Consequently, these agents need some strategies to select and interpret relevant proposals of task combinations and to target potentially interested agents.

Definition 1 (Coalition) A coalition C is a pair comprised of a set of agents A and a combination of tasks comb performed by A: ⟨A, comb⟩, with A ⊂ N and comb ⊂ M.

A combination of tasks is represented by a binary vector of size m, where m is the total number of tasks in the system. A task included in the combination is represented by 1 and a task that is not included is represented by 0.
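For instance, the binary encoding of Definition 1 can be written as a one-line helper (0-based task indices are our own convention):

def encode(comb, m):
    # Binary vector of size m: 1 iff the task (by index) is in the combination.
    return [1 if t in comb else 0 for t in range(m)]

# tasks {T1, T3} among m = 6 tasks, using 0-based indices 0 and 2:
assert encode({0, 2}, 6) == [1, 0, 1, 0, 0, 0]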
Application example. To clarify our problem, we consider an example of buyers connected to a virtual bookshop in order to buy books. In this example, group-buying allows buyers to reduce the unit price of the books [12]. However, the buyers do not have the same preferences about the books they want to buy. The interest of coalition formation in this problem is to allow agents to organize themselves to benefit from wholesale prices. Let us assume a set of 3 buyers, a_1, a_2 and a_3, interested in getting books and connected to a virtual bookshop. Six different items are available in this bookshop: a comic book o_1, a historic book o_2, a travel book o_3, a science fiction novel o_4, a philosophical essay o_5 and a cookery book o_6. Each potential buyer has $10. Each book is worth a unit price of $6, but there is also a wholesale price: the unit price is $5 when two copies of the same book are bought. Therefore buyers try to
form groups when they purchase the same books. We assume that a_1 is interested in o_6 but not in o_1, a_2 is interested in o_2 but not in o_3, and a_3 is interested in o_5 but refuses o_6. The j-th preferred combination of agent i is noted comb_i^j. The 5 most preferred combinations of the 3 agents are comb_i^j, i ∈ [1, 3] and j ∈ [1, 5]. For instance, for a_1: (o_5, o_6) ≻ (o_4, o_6) ≻ (o_3, o_6) ≻ (o_2, o_6) ≻ (o_6); for a_2: (o_1, o_2) ≻ (o_1, o_6) ≻ (o_1, o_4) ≻ (o_1, o_5) ≻ (o_1); and for a_3: (o_4, o_5) ≻ (o_3, o_5) ≻ (o_2, o_5) ≻ (o_1, o_5) ≻ (o_5). Given these preferences, one example of a potential coalition for group book-buying could be the coalition formed by a_2 and a_3 for purchasing o_1 and o_5, together with the coalition of a_1 alone for purchasing o_6. We notice here that the main difficulty for the agents is to find other agents potentially interested in a particular combination of books when they do not have any information about their preferences at this stage of the interaction.
3
PROTOCOL AND STRATEGIES
In this section, we propose a coalition formation protocol and different strategies for making proposals, and we describe the decision making method used by agents.
3.1
Protocol
The coalition formation protocol is based on message passing between the initiator agent of a proposal and its solicited agents. The initiator agent manages the negotiation concerning its proposal. Any agent in the system can make a proposal. When an agent chooses an interesting task combination and a group of solicited agents, the interaction follows the protocol below:
1. The initiator agent sends its proposal to the solicited agents.
2. The solicited agents reply to the initiator agent, indicating whether they are interested in the proposal or not.
3. (a) If a solicited agent refuses the proposal, the initiator sends a withdrawal of its proposal to the solicited agents and the interaction is cancelled. (b) Otherwise the initiator sends a confirmation request to the solicited agents.
4. The solicited agents either confirm or do not confirm their involvement in the coalition.
5. (a) If a solicited agent refuses to confirm, the initiator sends a withdrawal of its proposal to the solicited agents and the interaction is cancelled. (b) Otherwise the initiator confirms the formation of the coalition by sending a message to the solicited agents.
The protocol is divided into two phases: the acceptance phase (1, 2, 3) and the confirmation phase (3, 4, 5). The confirmation phase acts as a contract between the initiator agent and the solicited agents. Any agent should always be able to achieve all the proposals that it has confirmed. Moreover, the first answer to a proposal (acceptance phase) is only informative. To sum up, an agent may accept as many proposals as it wants, but it confirms only the proposals it can comply with, depending on its resources or expertise.
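The initiator's side of the protocol can be summarised in a few lines; the send/receive primitives below are hypothetical placeholders for the underlying message-passing layer.

def run_negotiation(proposal, solicited, send, receive_all):
    # Two-phase protocol from the initiator's point of view.
    send(solicited, ("PROPOSE", proposal))              # step 1
    answers = receive_all(solicited)                    # step 2
    if not all(a == "ACCEPT" for a in answers):         # step 3(a)
        send(solicited, ("WITHDRAW", proposal))
        return False
    send(solicited, ("CONFIRM_REQUEST", proposal))      # step 3(b)
    confirmations = receive_all(solicited)              # step 4
    if not all(c == "CONFIRM" for c in confirmations):  # step 5(a)
        send(solicited, ("WITHDRAW", proposal))
        return False
    send(solicited, ("FORMED", proposal))               # step 5(b)
    return True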
3.2
Strategies
In this section we propose two different strategies for the selection of task combinations and solicited agents when making proposals. Our strategies are based on two vectors: a characteristic vector of acceptances and a characteristic vector of refusals, which represent, respectively, all the accepted proposals and all the refused proposals.

Definition 2 (Acceptance vector) For each agent a_i, the acceptance vector VA_i^j of another agent a_j is a vector of size m, representing all the task combinations proposed by a_i and accepted by a_j, and all the task combinations proposed by a_j to a_i. For each task t, VA_i^j(t) represents the ratio of these task combinations containing t.

In the same way, the vector of refusals VR_i^j represents all the task combinations proposed by a_i and refused by a_j. The distance d(comb, V) between a combination comb and a characteristic vector V is defined as d(comb, V) = (Σ_{i=1}^{m} (comb(i) − V(i))²)^{1/2}.
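Definition 2 and the distance translate directly into code; the check at the end reproduces a refusal vector from the example further below.

def characteristic_vector(combinations, m):
    # Per-task ratio of the given combinations that contain the task.
    n = len(combinations)
    if n == 0:
        return [0.0] * m
    return [sum(c[t] for c in combinations) / n for t in range(m)]

def distance(comb, v):
    # d(comb, V) = (sum_i (comb(i) - V(i))^2)^(1/2)
    return sum((b - x) ** 2 for b, x in zip(comb, v)) ** 0.5

# a_2's two proposals refused by a_1 and a_3 (example 1):
assert characteristic_vector([[1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]], 6) \
       == [1.0, 0.5, 0.0, 0.0, 0.0, 0.5]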
3.2.1
Characteristic vectors selection strategy (CV Strategy)
For each agent a_i, all the combinations accepted or refused by an agent a_j, and all the proposals of a_j to a_i, are represented by the two vectors VA_i^j and VR_i^j. In order to generate a proposal, an agent proceeds as follows:
• a_i generates a list of task combinations and first selects the q preferred ones, {comb_i^1, . . . , comb_i^q}.
• a_i computes the distances d(comb_i^k, VA_i^j) and d(comb_i^k, VR_i^j) between each combination comb_i^k and the acceptance and refusal vectors of each agent a_j. Then it computes the ratio of these distances, d(comb_i^k, VA_i^j) / d(comb_i^k, VR_i^j).
• For each comb_i^k, a_i looks for the set of agents that minimizes the mean (moy) of these ratios.
• At the end, a_i selects the combination that minimizes this mean:
c_i = arg min_{k ∈ [1,q]} moy_{j ∈ [1,n]} [ d(comb_i^k, VA_i^j) / d(comb_i^k, VR_i^j) ]

Example 1 (continued) Let us illustrate these strategies using the example presented in section 2. We assume that a_2 has proposed the combinations comb_2^1 = [1, 1, 0, 0, 0, 0] and comb_2^2 = [1, 0, 0, 0, 0, 1] to a_1 and a_3. We also assume that a_1 and a_3 have refused them. At the same time, a_2 received the proposals comb_1^1 = [0, 0, 0, 0, 1, 1] and comb_1^2 = [0, 0, 0, 1, 0, 1] from a_1, and comb_3^1 = [0, 0, 0, 1, 1, 0] and comb_3^2 = [0, 0, 1, 0, 1, 0] from a_3, which it refused. From the point of view of a_2, VR_2^1 and VR_2^3 are computed using comb_2^1 and comb_2^2, which have been refused by both a_1 and a_3: VR_2^1 = VR_2^3 = [1, 0.5, 0, 0, 0, 0.5]. Using the received proposals, a_2 computes VA_2^1 = [0, 0, 0, 0.5, 0.5, 1] and VA_2^3 = [0, 0, 0.5, 0.5, 1, 0]. When a_2 intends to make a new proposal, it pre-selects the two combinations comb_2^3 = [1, 0, 0, 1, 0, 0] and comb_2^4 = [1, 0, 0, 0, 1, 0]. It calculates the distances between these combinations and the four characteristic vectors VA_2^1, VA_2^3, VR_2^1 and VR_2^3 (cf. table 1). Finally, agent a_2 selects the combination and the agent that minimize the ratio of distances (cf. table 2). In this example, these are combination comb_2^4 and agent a_3.

Table 1. Distances between proposed combinations and characteristic vectors (example 1).
           VA_2^1  VA_2^3  VR_2^1  VR_2^3
comb_2^3    1.58    1.58    1.22    1.22
comb_2^4    1.58    1.22    1.22    1.22

Table 2. Ratio of distances between proposed combinations and characteristic vectors (example 1).
           a_1    a_3
comb_2^3   1.29   1.29
comb_2^4   1.29   1.00
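Reusing the helpers above, the CV selection of example 1, picking the (combination, agent) pair with the smallest acceptance/refusal distance ratio, can be sketched as follows; the epsilon guard against a zero refusal distance is our own safeguard.

def cv_select(candidates, VA, VR, eps=1e-9):
    # candidates: the q preferred combinations of a_i.
    # VA, VR: per-agent acceptance and refusal characteristic vectors.
    def ratio(c, j):
        return distance(c, VA[j]) / max(distance(c, VR[j]), eps)
    return min(((c, j) for c in candidates for j in VA),
               key=lambda pair: ratio(*pair))

On the data of example 1, this returns comb_2^4 paired with a_3, matching table 2.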
3.2.2
Rule based Selection Strategy (RG Strategy)
The strategy presented in this subsection is based on binary rule generation algorithms. Rule generation consists of building a logic formula from an incomplete logic table. This formula has to correctly match the table. To do so, we use the OCAT algorithm, which generates formulas in disjunctive normal form (DNF) (for details, see [6]). This rule generation algorithm processes the combinations of tasks and the expected predicate results. These combinations are used as an input table represented in a first-order logic formalism. The logic variables are the tasks: the presence of a task in a combination is labeled true and its absence is labeled false. Moreover, refused combinations are labeled false, while accepted ones and received proposals are labeled true. As an output, we obtain a logic formula in which the variables are the tasks (present or not). Each conjunction of the DNF rejects all the refused combinations of tasks and accepts a subset of the accepted combinations and received proposals.

After processing these formulas, each agent a_i computes, for every other agent a_j, several acceptance vectors VA_i^j(1), VA_i^j(2), . . ., one from each different subset of accepted combinations and received proposals. First, a_i gathers the combinations of tasks which are compatible with each conjunction of the DNF. From each set of combinations, a_i calculates an acceptance vector. A DNF formed of p conjunctions gives p acceptance vectors. Finally, for each acceptance vector, a_i calculates the distance ratios and chooses the minimum.

Logic formulas are represented by vectors. A positive variable which is present in the formula is represented by 1, the presence of a negative variable is represented by 0, and the absence of a variable by -1. For instance, the conjunction T_1 ∧ ¬T_3 is represented by the vector [1, −1, 0].

Example 2 (continued) Now we assume that agents a_1, a_2 and a_3 have the following preferences. For a_1: (o_5, o_6) ≻ (o_4, o_6) ≻ (o_3, o_6) ≻ (o_2, o_6) ≻ (o_6); for a_2: (o_1, o_2) ≻ (o_1, o_6) ≻ (o_1, o_4) ≻ (o_1, o_5) ≻ (o_1); and for a_3: (o_4, o_5) ≻ (o_3, o_5) ≻ (o_1, o_3) ≻ (o_1, o_5) ≻ (o_5). We also assume that a_3 has proposed comb_3^1 = [0, 0, 0, 1, 1, 0], comb_3^2 = [0, 0, 1, 0, 1, 0] and comb_3^3 = [1, 0, 1, 0, 0, 0] to a_1 and a_2, which they have refused. Agent a_3 has also received comb_1^1 = [0, 0, 0, 0, 1, 1] and comb_1^2 = [0, 0, 0, 1, 0, 1] from a_1, and comb_2^1 = [1, 1, 0, 0, 0, 0] and comb_2^2 = [1, 0, 0, 0, 0, 1] from a_2, which it has refused. Agent a_3 can compute the refusal vectors of a_1 and a_2: VR_3^1 = VR_3^2 = [0.33, 0, 0.66, 0.33, 0.66, 0]. a_3 applies its rule generation algorithm to the accepted and refused proposals of a_1 and a_2. Using this algorithm on the proposals of a_2, a_3 obtains a formula composed of two conjunctions, conj_3^2(1) = [−1, 1, −1, −1, −1, −1] and conj_3^2(2) = [−1, −1, −1, −1, −1, 1]. Then a_3 collects the combinations of tasks that are compatible with these conjunctions. From each set of combinations, a_3 computes an acceptance vector. Consequently, there are as many acceptance vectors as conjunctions in the generated formula: VA_3^2(1) = [1, 1, 0, 0, 0, 0] and VA_3^2(2) = [1, 0, 0, 0, 0, 1]. For a_1, the algorithm generates only one conjunction, conj_3^1(1) = [−1, −1, −1, −1, −1, 1], and the corresponding acceptance vector VA_3^1(1) = [0, 0, 0, 0.5, 0.5, 1]. a_3 computes the distances between comb_3^4 = [1, 0, 0, 0, 1, 0] and the three vectors VA_3^1(1), VA_3^2(1) and VA_3^2(2) (cf. table 3). The distance ratios are shown in table 4. a_3 selects the agent which minimizes these ratios, i.e., a_2, with a ratio of 1.34 for the two acceptance vectors VA_3^2(1) and VA_3^2(2).

Table 3. Distances between proposed combinations and characteristic vectors (example 2).
           VA_3^1(1)  VA_3^2(1)  VA_3^2(2)  VR_3^1  VR_3^2
comb_3^4     1.58       1.41       1.41      1.05    1.05

Table 4. Ratio of distances between proposed combinations and characteristic vectors (example 2).
           a_1, VA_3^1(1)  a_2, VA_3^2(1)  a_2, VA_3^2(2)
comb_3^4        1.5             1.34            1.34
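The two RG-specific operations, testing a combination against a DNF conjunction in the [1/0/−1] encoding and building one acceptance vector per conjunction, look as follows (characteristic_vector is the helper defined earlier):

def compatible(comb, conj):
    # comb satisfies a conjunction iff it agrees on every variable present
    # (1 = positive literal, 0 = negated literal, -1 = variable absent).
    return all(lit == -1 or bit == lit for bit, lit in zip(comb, conj))

def acceptance_vectors(accepted, dnf, m):
    # One acceptance vector per conjunction, from the accepted/received
    # combinations compatible with it (a DNF of p conjunctions gives p vectors).
    return [characteristic_vector([c for c in accepted if compatible(c, conj)], m)
            for conj in dnf]

# example 2: the conjunction "o6 present" matches both proposals of a_1
assert compatible([0, 0, 0, 0, 1, 1], [-1, -1, -1, -1, -1, 1])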
3.3
Agent Decision Making
Agents face two levels of decision making during the coalition formation mechanism: one in the acceptance phase (step 2 of the protocol, section 3.1) and a second in the confirmation phase (step 4). To make these decisions, agents use two utility thresholds: the acceptance utility u_a and the confirmation utility u_c, with u_a ≤ u_c. The acceptance utility u_a is the threshold beyond which a proposal is accepted in step 2. The confirmation utility u_c is the threshold beyond which a proposal is confirmed in step 4.
4
EXPERIMENTAL STUDY
4.1
Experimental settings
In the following experiments, the multi-agent system is composed of 30 agents and each coalition is formed for 1 to 4 tasks. The maximum number of proposals that an agent can formulate is set to 200. At the end of each experiment, agents that did not form a coalition get the u_alone utility, which corresponds to the utility of the best combination of tasks the agent can perform alone (without a coalition). The results given in this section are averaged over 5 experiments. We have used an additive utility function to represent the preferences of agents; to generate these preferences, agents assign to each task a random score between -10 and 10. The utility of a combination is the sum of the utilities of all the tasks forming this combination. In the second part of the experiment, we have kept the same preferences, but we have applied logic formulas to the tasks: agents do not consider the combinations of tasks that do not validate them. The thresholds u_a and u_c are set from u_alone and u_max, which is the utility of the most preferred combination:

u_x = u_alone + λ_x · (u_max − u_alone), x ∈ {a, c}, λ_x ∈ [0, 1]    (1)
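Equation (1) and the resulting accept/confirm decisions in code:

def thresholds(u_alone, u_max, lam_a, lam_c):
    # u_x = u_alone + lambda_x * (u_max - u_alone); lam_a <= lam_c gives u_a <= u_c.
    assert 0.0 <= lam_a <= lam_c <= 1.0
    u_a = u_alone + lam_a * (u_max - u_alone)
    u_c = u_alone + lam_c * (u_max - u_alone)
    return u_a, u_c

def decide(utility, u_a, u_c):
    # Acceptance-phase and confirmation-phase answers for a proposal.
    return utility >= u_a, utility >= u_c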
4.2
Experimental Results
Firstly, we keep u_c = u_a and let λ_x vary between 0 and 1 (equation 1). We then measure the number of agents which have joined a coalition at the end of the experiment (the execution stops when every agent has made its 200 proposals or has joined a coalition), and we compare the strategies presented in the previous section. We use a third, basic strategy where the agents propose each combination in decreasing order of utility, starting with the most preferred one. Each combination is proposed to disjoint groups of agents. The initiator agent maintains a list of the agents which have confirmed, accepted or refused the combination; using this list, the combination is proposed to the agents which have already confirmed or accepted it. These results are shown in figure 1. We can observe that for values of λ_x greater than 0.6, agents are very selective and it is quite hard for the three strategies to form coalitions. For values of λ_x lower than 0.3, agents are less selective and we observe the same behavior: most agents are able to join a coalition. Finally, for values between 0.3 and 0.7, we notice that the CV and RG strategies allow more agents to form coalitions than the basic strategy.
Figure 3. Evolution of the number of formed coalitions with respect to the number of proposals for λ_a = λ_c = 0.45
Next we show the results of other experiments, performed by adding to the same system agents implementing different strategies. We used 10 agents implementing the CV strategy, 10 agents implementing the RG strategy and 10 agents implementing the basic strategy. We observe the distribution of the strategies implemented by the agents which are the initiators of the formed coalitions. These results are shown in table 5. We observe that more than half of the coalitions are initiated by agents implementing the CV strategy and that only 12% of the coalitions are initiated by agents implementing the basic strategy.

Table 5. Distribution of the strategies of the initiator agents of effectively formed coalitions, for a simulation of 30 agents (each strategy is implemented by 10 agents).
CV Strategy  RG Strategy  Basic Strategy
   54%          34%           12%
Figure 1. Evolution of the number of agents in a coalition with respect to λ when u_c = u_a
In figures 2 and 3 we can observe the evolution of the number of coalitions formed according to the number of proposals. We notice in figure 2 that, for less selective agents, the three strategies allow agents to form coalitions quickly: on average, by the 30th proposal almost all coalitions are formed for the CV and RG strategies, and most of them for the basic strategy. When agents have a higher selection threshold (figure 3), the CV and RG strategies still allow agents to form coalitions quickly, but with the basic strategy it is harder to form coalitions, and at the end of the experiment only a few coalitions are formed.
We notice in figure 4 that, for u_a strictly lower than u_c, the number of agents that finally join a coalition decreases. In fact, agents accept combinations of tasks that are not confirmed afterwards; strategies that rely on these combinations to guide the selection are then less efficient.
Figure 4. Number of agents in a coalition w.r.t. λ_a for a fixed λ_c = 0.5
Figure 2. Evolution of the number of formed coalitions with respect to the number of proposals for λ_a = λ_c = 0
In other experiments, we have modified the utility functions by adding logic reasoning on the tasks. We applied an exclusive OR (XOR) to the preferred tasks of each agent: we keep only the combinations of tasks including either the most preferred task or the second preferred task; the combinations including both of them are simply rejected. Next, we made other experiments with logical implication on the
preferred tasks. Intuitively, the RG strategy should allow agents to find the inherent logic formulas, and should be more efficient than the CV strategy. These results are shown in table 6; we observe that the CV and RG strategies are better than the basic strategy. However, the RG strategy is not as efficient on logic preferences as expected. This inefficiency is mainly due to the limited amount of data available to the rule generation algorithm. Some of the generated rules match the set of rejected, accepted, confirmed and received proposals but are not equivalent to the initial logic formulas (XOR or logical implication), which biases the results.

Table 6. Additive utility and logic. Average number of agents in a coalition, λ_a = λ_c = 0.6.
              CV Strategy  RG Strategy  Basic Strategy
XOR              17.2         16.2          10.4
IMPLICATION       7.6          8             1.2
5
RELATED WORK
Several coalition formation methods have been proposed, but only a few of them deal with self-interested agents and are really decentralized. Kraus, Shehory and Taase [4] proposed a coalition formation protocol for the request for proposal (RFP) domain. This protocol is used by Shehory in [9] and by Westwood and Allan in [13]. The process is based on a central manager (CM) entity, which manages communications and makes proposals to the agents. The process is performed in several rounds. At each negotiation round, the CM sorts the agents randomly. Each agent, in its turn, can either send a proposal for forming a coalition or accept a previous proposal received from another agent. Each agent has only one turn in each round and proposals are valid for one round. This protocol allows a decentralization of the decision process even though communications are still centralized at the CM level. Other works are based on defining a common utility function for the coalitions [7, 10, 8]. In game theory this problem is similar to coalitional function games (CFG). For example, the utility of a coalition can be a net income that the members of a coalition gain together and have to share. Zlotkin and Rosenschein [14] proposed a coalition formation mechanism that uses cryptographic techniques for the subadditive task-oriented domain. Sandholm et al. [7] developed an anytime algorithm for finding the optimal coalition structure, establishing a worst-case bound on the quality of the solution. Rahwan et al. [5] proposed an efficient algorithm for distributing the coalitional value calculations among agents in cooperative environments. Several other works used a common utility function, as in the request for proposal (RFP) problems [9, 4, 13]. In our work, the values of coalitions are utilities which are different for each agent, as in [2]. Additionally, we focus on the study of the strategies that agents should use to come to agreements on the coalitions. Some works use learning techniques in their coalition formation process. For example, Aknine and Shehory [2] propose a coalition formation mechanism based on task relationship analysis and the derivation of intentions. Chalkiadakis and Boutilier [3] implement Bayesian reinforcement learning in a way that enables coalition participants to reduce their uncertainty regarding coalitional values and the capabilities of others. Soh and Li [11] use learning mechanisms (reinforcement learning based on past helpful cooperation between agents, and case-based reasoning); their aim is to improve the quality of the coalition formation process. Finally, Aknine
et al. [1] propose a coalition formation method based on a preference model for cooperative and self-interested multi-agent systems.
6
CONCLUSION
In this article we have addressed the problem of coalition formation in multi-agent systems. Our work focuses especially on self-interested agents. Several works have underlined the difficulty of solving a coalition formation problem in such a context, mostly due to the intrinsic autonomy of the agents and to the difficulty these systems have in converging to acceptable solutions. In this paper, we have addressed this problem by considering agents which have different utility functions. This assumption is useful since it enlarges the application scope of the proposed mechanism. To solve this problem, we have proposed an original mechanism taking into account the main features of competitive agents. We have proposed a two-phase protocol which only requires information about the preferred task combinations that agents want to perform. This mechanism has then been enhanced with several strategies based on the analysis of the agents' proposals. The proposed mechanism has been implemented and tested. The results of the experiments have shown that our strategies allow agents to get their preferred coalitions and improve the coalition formation process. In future work, we intend to analyse the scalability of our mechanism and adapt it to large-scale systems.
REFERENCES [1] S. Aknine, S. Pinson, and M. Shakun, ‘A multi-agent coalition formation method based on preference models’, Group Decision and Negotiation, 13, 513–538(26), (2004). [2] S. Aknine and O. Shehory, ‘Reaching agreements for coalition formation through derivation of agents’ intentions’, in ECAI, pp. 180–184, (2006). [3] G. Chalkiadakis and C. Boutilier, ‘Bayesian reinforcement learning for coalition formation under uncertainty’, in AAMAS ’04, pp. 1090–1097, Washington, (2004). IEEE. [4] S. Kraus, O. Shehory, and G. Taase, ‘Coalition formation with uncertain heterogeneous information’, in AAMAS ’03, pp. 1–8, New York, (2003). ACM Press. [5] T. Rahwan, S.D. Ramchurn, V.D. Dang, and N.R. Jennings, ‘Nearoptimal anytime coalition structure generation’, in IJCAI ’07, pp. 2365– 2371, (2007). [6] S. N. Sanchez, E. Triantaphyllou, C. Jianhua, and T. W. Liao, ‘An incremental learning algorithm for constructing boolean functions from positive and negative examples’, Oper. Res., 29(12), 1677–1700, (2002). [7] T. Sandholm, K. Larson, M. Andersson, O. Shehory, and F. Tohme, ‘Coalition structure generation with worst case guarantees’, Artif. Intell., 111(1-2), 209–238, (1999). [8] T. Sandholm and V. R. Lesser, ‘Coalitions among computationally bounded agents’, Artificial Intelligence, 94(1-2), 99–137, (1997). [9] O. Shehory, ‘Coalition formation: Towards feasible solutions’, Fundam. Inf., 63(2-3), 107–124, (2004). [10] O. Shehory and S. Kraus, ‘Methods for task allocation via agent coalition formation’, Artificial Intelligence, 101(1–2), 165–200, (1998). [11] L.K. Soh and X. Li, ‘An integrated multilevel learning approach to multiagent coalition formation’, in IJCAI ’03, pp. 619–624, (2003). [12] M. Tsvetovat and K. Sycara, ‘Customer coalitions in the electronic marketplace’, in AGENTS ’00, pp. 263–264, New York, NY, USA, (2000). ACM Press. [13] K. Westwood and V.H. Allan, ‘Heuristics for dealing with a shrinking pie in agent coalition formation’, in IAT ’06, pp. 537–546, Washington, (2006). IEEE. [14] G. Zlotkin and J.S. Rosenschein, ‘Coalition, cryptography, and stability: Mechanisms for coalition formation in task oriented domains’, in National Conference on Artificial Intelligence, pp. 432–437, (1994).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-423
Of Mechanism Design and Multiagent Planning
R. van der Krogt et al.
[Only scattered formula fragments of this paper (pages 423-427) survived the text extraction; the body is not recoverable.]
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-428
IAMwildCAT: The Winning Strategy for the TAC Market Design Competition
Perukrishnen Vytelingum and Ioannis A. Vetsikas and Bing Shi and Nicholas R. Jennings 1
Abstract. In this paper we describe the IAMwildCAT agent, designed for the TAC Market Design game, which is part of the International Trading Agent Competition. The objective of an agent in this competition is to effectively manage and operate a market that attracts traders to compete for resources in it. This market, in turn, competes against markets operated by other competition entrants, and the aim is to maximise the market and profit share of the agent, as well as its transaction success rate. To do this, the agent needs to continually monitor, and adapt in response to the competing marketplaces, the rules it uses to accept offers, clear the market, price the transactions and charge the traders. Given this context, this paper details IAMwildCAT's strategic behaviour and describes the wide range of techniques we developed to operationalise it. Finally, we empirically analyse our agent in different environments, including the 2007 competition, where it ranked first.
1
Introduction
Continuous Double Auctions (CDAs) have traditionally been used in stock markets in order to trade securities and other financial commodities. Their attraction lies in the fact that any trader (buyer or seller) can come into the market, at any point, and place a shout for buying (resp. selling) at some desired price, and a trade will take place almost instantly if there is a matching offer to sell (resp. buy) at that or a better price. Given this, most of the existing work on CDAs addresses ways of designing effective strategies that maximize a trader's profit. However, there is considerably less literature on the design of market protocols for such auctions in order to promote desirable properties (such as improved efficiency or reduced market volatility). Moreover, in today's globalised economy, stocks are often traded simultaneously in different (competing) markets around the world. Thus, the different markets need to differentiate themselves and appeal to traders to conduct their business under their jurisdiction (e.g. by offering attractive prices for participation and trading). To rectify this shortcoming, the TAC Market Design Competition (CAT) provides a test-bed for exploring the problem of designing competitive and efficient markets (see [3] for the competition rules). Each CAT game lasts a number of days, and each day consists of a number of trading rounds, each of which lasts for a known, constant length of time. A number of traders and a number of markets participate (the former are determined by the competition organisers, while the latter are the competition entrants). Each trader is given a finite set of goods to trade and is assigned a private value (also referred to as a limit price) for each good. The difference between this price and the transaction price represents the profit of the agent in the transaction; their total profit in the market is the sum of these transaction prof-
University of Southampton, UK, email: {pv,iv,bs07r,nrj}@ecs.soton.ac.uk
its minus any fees that they incur in participating in the market. The traders use various well-known strategies from the CDA literature: ZI, ZIP, GD, RE (Roth-Erev) [5, 1, 4, 6], and are allowed to register with a different market at the beginning of each day. They also have a memory of the profit they achieved historically in each market, such that they are more likely to register with the market where they made the highest profit. Thus, the markets must compete for traders by clearing transactions efficiently and not charging excessive fees. The different competing markets are represented by specialists, each of which is an agent entered by a separate competitor. These specialists set the rules for their respective market; they determine which shouts are accepted in the market (quote-accepting rule), which shouts will be matched for transactions (clearing rule) and at what price (pricing rule), as well as the fees to charge for various services (charging policy). The score of each agent is a combination of three different metrics: the profit obtained as a percentage of the total profit obtained by all specialists, the market share of the agent (i.e. the percentage of traders who register with the specialist), and the transaction success rate (TSR) (i.e. the percentage of shouts accepted by the market that resulted in a transaction). To be successful, therefore, an agent needs to be competitive in making profit, attracting traders and ensuring that shouts placed in the market result in transactions. While these goals are not necessarily contrary to each other, there are a number of trade-offs to be resolved here. For example, charging larger fees will increase the profit but decrease the market share, while improving the TSR by accepting fewer shouts will result in fewer total transactions and thus less profit both for the specialist and the traders. In order to design an effective specialist agent, we decided to break the agent down into multiple components, where each one deals with a particular trade-off. Then, looking at each component, we designed it in such a way as to balance that trade-off. For example, we designed a clearing rule that allows us to maximize the TSR with a minimal drop in the efficiency of the transactions, and a pricing policy that manages to extract enough profit without compromising the agent's market share. Similar methods, of breaking down a complex problem into multiple parts and then selecting strategies for and optimizing each one separately, have successfully been used in other complex trading domains [9]. Drawing inspiration from this approach, we also started testing the various individual components using experimental comparisons. The goals of these experiments are two-fold: to determine the best possible agent design, and to examine the behaviour of the market and how it is affected by the different strategies. Against this background, in this paper we make the following contributions. First, we describe, for the first time, the various policies of our agent. We explain how the various trade-offs guided the design of the agent and how each one was addressed, in order to generate the most competent and successful agent that participated in the
competition. Thus, we designed a number of novel strategies, e.g. clearing shouts, in some rounds, to maximise the number of transactions cleared rather than the profit. Second, we experimentally evaluate the performance of our agent. We compare the efficiency and performance of our agent against that of the other competitors in the competition. Here we show that our agent achieved the best and most stable performance, both in the score and across other metrics (i.e. attracting “good” traders and maintaining a high market efficiency). This paper is organized as follows. In Section 2, we give a complete description of our agent and all its components. In Section 3, we present the experiments we conducted. Then, we conclude.
2
The IAMwildCAT Strategy
Given this background on the CAT game and its goals, our objective is to design an agent that maximises the scoring function. Specifically, our strategy consists of a set of different market rules and the charging policy (see Figure 1). Each of these policies involves a particular trade-off; in the rest of this section, we detail how we designed the agent in order to resolve each trade-off.
Figure 1. Structure of the IAMwildCAT Strategy

2.1
The Quote-Accepting Rule
We first consider the quote-accepting rule which selects the bids and asks that are accepted into the market (i.e. not all bids submitted by the traders will necessarily be accepted into the marketplace). Such rules are typically employed to speed up the bidding process (e.g. the NYSE quote-accepting rule [10] specifies that any new quote must improve upon the currently outstanding quote), as well as to improve the properties of the auction (e.g. reducing price fluctuations [8]). In the CAT platform, because TSR is a measure of success, it is important to reject the “poor” bids and asks that the market does not expect to clear. Now, we could maximise the TSR by accepting only a few really “good” shouts. However, the fewer shouts that are accepted, the smaller the number of transactions and thus the smaller the profit of both agents and traders; it also makes the market less attractive to traders, which impacts the market share. Thus, we need to select just the right shouts in order to balance this trade-off. The micro-economic theory of competitive equilibrium states that transaction prices are expected to converge to the competitive equilibrium price p∗ where demand meets supply [2]. Thus, we expect the bids (resp. asks) that will be cleared in the market to be at least as high (resp. low) as the competitive equilibrium price. The aim, then, is to accept these bids and asks, rejecting those bids below and those asks above this price. Now, because we can only estimate the equilibrium based on the convergence of transaction prices, we assume some error in our estimation and provide some slack, αr and αa , when deciding the minimum bid, bidmin = (1 − αr )p∗ − αa or maximum ask, askmax = (1 + αr )p∗ + αa to accept. We estimate the competitive equilibrium price using a weighted moving
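To make this concrete, the following minimal Python sketch (our illustration, not the competition code; the linear recency weights and the accept-everything behaviour before any transaction has been observed are assumptions) shows one way such a quote-accepting rule could be implemented:

    def estimate_equilibrium(prices):
        """Weighted moving average of the day's transaction prices;
        later (more recent) transactions receive linearly larger weights."""
        if not prices:
            return None
        weights = range(1, len(prices) + 1)
        return sum(p * w for p, w in zip(prices, weights)) / sum(weights)

    def accept_shout(kind, price, p_star, alpha_r, alpha_a):
        """Accept a bid above (1 - alpha_r) * p* - alpha_a,
        or an ask below (1 + alpha_r) * p* + alpha_a."""
        if p_star is None:
            return True  # no equilibrium estimate yet
        if kind == "bid":
            return price >= (1 - alpha_r) * p_star - alpha_a
        return price <= (1 + alpha_r) * p_star + alpha_a

On the last few rounds of a day, the rule above would additionally be restricted to shouts that can be cleared immediately, as described in the text.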
2.2 The Clearing Rule

The clearing rule defines when and how to clear the market. There are two parts to this rule. The first is when to clear. One approach is to collect all bids and asks and clear the market at the end of the trading day to maximise profits. However, because traders bid for single units at a time, this approach would imply that traders have the opportunity to trade only a single unit (unable to trade the rest of their multi-unit endowment). An alternative approach is to maximise the number of transactions (instead of profits) by clearing continuously whenever a bid or an ask is accepted in the market (e.g. the Continuous Double Auction clearing [10]). Given this, our strategy adopts a rule in between these two approaches, with the market clearing at the end of each round. In this way, we can be almost as efficient as clearing at the end of the day, while allowing the traders to still trade multiple times. By so doing, we get most of the benefits from both approaches without the drawbacks. The second part is how to match bids. At the end of each round, our agent has a list of shouts to clear. It can try to maximize the number of transactions by matching "bad" shouts with "good" shouts, but in so doing, it will reduce the efficiency of the market and give less average profit to the traders (which will primarily have an impact on the market share). On the other hand, it can match the shouts efficiently and maximise profits to the traders, but it will generate fewer transactions (and a lower TSR). As mentioned earlier, intra-marginal traders are expected to trade earlier than marginal (and extra-marginal) traders, such that the amount of profit to be extracted in the market is higher earlier during the trading day, with less profit to be made at the end of the trading day. Thus we chose the following strategy to deal with this trade-off: our agent clears the market for maximum profits at the end of the earlier rounds of the trading day, while, on the following rounds, with less profit to be made in the market, our agent clears to maximise the number of transactions. By so doing, some extra-marginal traders are allowed to transact while increasing the number
of transactions and hence the TSR, at the expense of some profits (though these are generally low at this point).
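A minimal sketch of the two matching modes follows (our reconstruction, not the competition code): profit-maximising clearing pairs the best bids with the best asks, while transaction-maximising clearing greedily gives each ask the cheapest bid that still covers it, so that the "good" shouts are spread over as many matches as possible:

    def clear_round(bids, asks, maximise_transactions):
        """Return a list of (bid, ask) matches with bid >= ask."""
        matches = []
        if maximise_transactions:
            bids, asks = sorted(bids), sorted(asks)
            i = 0
            for ask in asks:                      # cheapest asks first
                while i < len(bids) and bids[i] < ask:
                    i += 1                        # skip bids that cannot cover this ask
                if i == len(bids):
                    break
                matches.append((bids[i], ask))    # cheapest bid that still covers it
                i += 1
        else:
            for bid, ask in zip(sorted(bids, reverse=True), sorted(asks)):
                if bid < ask:                     # no further profitable pairs
                    break
                matches.append((bid, ask))
        return matches

Switching from the second branch to the first in the later rounds of the trading day implements the strategy described above.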
2.3 The Pricing Rule
The pricing rule determines the price at which a transaction occurs when a buyer and a seller are chosen to transact (by the clearing rule). This price can have any value between the ask and bid prices. Initially, we used primarily discriminatory k-pricing with k = 0.5 (the value of k determines the difference of the transaction price from the ask and bid prices); this means that the mean of the ask and bid prices is chosen as the transaction price. In the competition, we used a variation of this policy, called side-biased pricing, which varies k depending on the number of buyers and sellers participating in the market. Specifically, we looked at a window of the latest 10 trading days for the average number of buyers and sellers our agent attracted, and if the difference between the number of buyers and sellers is bigger than 10% of the total number of traders, we adjust k (proportionately to this difference) in order to give more profit to the side which is under-represented. We do this in an attempt to attract more of them. However, as we wanted to be somewhat conservative, we only allow k to vary in k ∈ [0.3, 0.7]. (In the CAT game, because traders consider their entire history of profits and there is some randomness in the trader's selection of the market to trade in, the effect of giving more profit to one side could be delayed; if we are too aggressive, it might lead us to overshoot our goal of balancing the populations of buyers and sellers and thus cause the behaviour to oscillate.) In Section 3.3, we discuss this issue in more detail and examine the performance of the two policies.
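The following sketch (ours; the proportionality constant that maps the imbalance onto k is an assumption, as the paper only states that k is adjusted proportionately) illustrates side-biased pricing:

    def side_biased_k(avg_buyers, avg_sellers, k_min=0.3, k_max=0.7):
        """Shift k towards the under-represented side, based on the average
        numbers of buyers and sellers over the last 10 trading days."""
        total = avg_buyers + avg_sellers
        imbalance = (avg_buyers - avg_sellers) / total if total else 0.0
        if abs(imbalance) <= 0.1:        # difference below 10% of all traders
            return 0.5                   # plain discriminatory k-pricing
        return max(k_min, min(k_max, 0.5 + 0.5 * imbalance))

    def transaction_price(bid, ask, k):
        """Discriminatory k-pricing: k = 0.5 gives the midpoint."""
        return k * bid + (1 - k) * ask

With more buyers than sellers, k rises above 0.5, moving the transaction price towards the bid and thus giving more profit to the (under-represented) sellers.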
2.4 The Charging Policy
The charging policy determines the specific charges that are levied from the traders in the system. A registration fee is paid by traders in order to register with the market agent at the beginning of the day, irrespective of whether they transact or not. An information fee is paid if transaction history information is obtained. A shout fee and a transaction fee are the amounts paid, respectively, when a shout is placed and when a transaction occurs. The profit fee is the percentage of the difference between the accepted shout and the transaction price that is paid by the traders to the market. (Note that if this is 100%, then the pricing rule does not matter at all, since all the difference between the ask and bid prices is levied by the market.) Before we describe our policy in detail, it is necessary to note the ways in which an agent's charging policy changes its score:

• the score is increased each day by the percentage of the profit that the agent achieved compared to all agents; this means that extracting profit is most efficient for small absolute values of the profit (compared to the total profit extracted by everyone else).
• the market share is decreased by an amount which is roughly proportional to the absolute value of the profit that any agent extracts in total.

These two facts led us to design a charging policy that mainly tries to maintain a minimum amount of target market share, while at the same time extracting the best possible score from the profit, without compromising the market share. More specifically, we use a target profit percentage charging policy that, during each day, aims to extract a predetermined profit score. This target score depends on the agent's current market share MS. Specifically, our agent aims to maintain a target market share MS_target which takes a value in MS_target ∈ [1/M, 1.25/M], where M is the total number of competing markets. Thus it tries to obtain a market share slightly higher than the average market share that all markets have. We regulate our market share by getting more profit than our opponents when our market share is high, and less when our market share is below our target. We thus distinguish between two states in this strategy:

• If MS < MS_target, then the market is in trader attraction mode and we aim to extract a small profit percentage equal to P% = 50%/M; as this percentage is about half that of the average profit made by other agents, it will lead (all other things being equal) to an increase of market share within some trading days. (To avoid thrashing, we also count the number of trading days since we last switched modes in the strategy; there is a minimum number of days since the last switch before the next switch is allowed.)
• If MS > MS_target, then the market is in trader exploitation mode and we aim to extract a larger profit percentage equal to P% = 200%/M; as this percentage is about twice that of the average profit made by other agents, it will lead (all other things being equal) to a reasonable score, but at a cost of some market share loss within the next trading days. (In fact we use an additional rule before we switch to this mode: we aim to exploit when the total profit made by the opponents drops below its historical average (by a certain discount), as this will allow us to get more score with less penalty to the market share. This discount is adjusted depending on the number of times that this rule succeeds or fails.)

The target share MS_target is gradually decreased if trader attraction mode lasts for more than 10 days and is increased for every day that the agent is in trader exploitation mode. In more detail, let Π, σ, τ and φ be, respectively, the total opponent profit, the number of traders in our market, the number of transactions and the average transaction profit (measured as the difference between the ask and bid prices in each transaction), averaged over the last few days. These average values are reasonable expectations for these variables during the following day. Our agent's target profit π_target is set to π_target = P% · Π. Therefore the average fee paid by each trader must be π_target/σ. In trader attraction mode, we set the registration fee equal to 75% of this value, while, in exploitation mode, this is set to 50%. The remaining profit is extracted through the profit fee by dividing the remainder by φ. If this value is more than 100%, then we set the profit fee to 100% and gain the remaining profit by additionally setting a transaction fee equal to the remaining profit divided by τ. We set neither an information fee nor a shout fee. The reason for choosing to extract most of the profit through the registration fee is that all traders, whether intra- or extra-marginal, pay this, while only successful (i.e. intra-marginal) traders pay the other two. In this way, we also achieve the effect of attracting the desirable, intra-marginal traders and driving away the undesirable, extra-marginal traders. A final adjustment to this strategy is made to account for the beginning and end of the game. As market share is more important at the beginning and becomes progressively less so towards the end, we try to build market share at the beginning, by not extracting any profit for a set number of days (set to 80), and increasing the target percentage during the last 100 days of the game, and in particular during the last 40, when the increase becomes quite pronounced. (It should be noted that the length of the CAT game, during the competition, was set to 500 days and each day had 10 rounds; these facts were common knowledge and this allowed us to use this end-game strategy.)
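The daily fee computation can be sketched as follows (our simplification; the text does not fully specify whether the remainder is normalised per transaction or in total, so the sketch assumes the total remainder is spread over the expected tau * phi of per-transaction margins):

    def daily_fees(Pi, sigma, tau, phi, M, attraction_mode):
        """Pi: average total opponent profit, sigma: traders in our market,
        tau: transactions, phi: average per-transaction margin (all averaged
        over the last few days); M: number of competing markets."""
        P = 0.5 / M if attraction_mode else 2.0 / M       # target profit percentage
        pi_target = P * Pi                                # target profit for the day
        per_trader = pi_target / max(sigma, 1)
        reg_share = 0.75 if attraction_mode else 0.5
        registration_fee = reg_share * per_trader
        remainder = pi_target - registration_fee * sigma
        profit_fee = remainder / max(tau * phi, 1e-9)     # fraction of the margin
        transaction_fee = 0.0
        if profit_fee > 1.0:                              # cap at 100%
            transaction_fee = (profit_fee - 1.0) * phi    # shift the excess
            profit_fee = 1.0
        return registration_fee, profit_fee, transaction_fee  # no shout/info fees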
3 Evaluation
In this section, we analyse the performance of our specialist against other competitors entered in the CAT competition. To this end, we adopt a similar experimental setup as in the competition (note that, in our experiments, we used all the available binaries of competition entrants, with the exception of Havana, because of the unavailability of the CPLEX optimisation library they employ, and PSUCAT, because of their unstable implementation), with a game running over 500 trading days, each lasting 10 trading rounds. (We repeat each game for 15 runs to improve our estimate of performance.) The trader population comprises 180 ZIP traders, 180 RE traders, 20 ZI traders and 20 GD traders, equally split between buyers and sellers. Each trader is endowed with 10 goods to buy or sell at a limit price that is independently drawn from a uniform distribution between 50 and 150, such that the theoretical equilibrium price is 100. (Because the limit prices are drawn from a uniform distribution, the demand and supply curves are expected to be linear, intersecting at 100.) In particular, we first analyse the competition results reported by Niu et al. [7] in Subsection 3.1. Then, we analyse in detail the performance of IAMwildCAT. Specifically, we consider the following aspects that Niu et al. do not analyse. First, we look at how the number of globally intra-marginal buyers and sellers compares over the trading days (to analyse its effectiveness in attracting "good" traders) in Subsection 3.2. (A trader is globally intra-marginal if it is intra-marginal when we consider all traders in the system. In our experiments, buyers (resp. sellers) are expected to be intra-marginal if their limit prices are higher (resp. lower) than the theoretical equilibrium price at 100.) Second, we look at our policy for side-biased pricing in Subsection 3.3 and how it improved our market share and, finally, we look at some more general experiments on the efficiency of our strategy in a homogeneous environment in Subsection 3.4. The purpose of this exercise is to observe its efficiency if all the agents adopt the IAMwildCAT strategy. Note that in Figures 2, 3 and 4 we plot only the 5 best strategies for clarity.
3.1 The CAT Competition

Niu et al. reported the results of the 2007 CAT competition, which was won by IAMwildCAT with the highest score (at 240.2), outperforming the second-placed entrant by 13% and the third by 25% [7]. They also empirically evaluated all strategies to identify how they perform in different cases. They showed that IAMwildCAT had the lowest standard deviation (at 2.8), which suggests consistent behaviour over all the runs. Furthermore, they showed that IAMwildCAT had the highest market share and the highest TSR throughout most of the games. We attribute the former to our strategic choice of maximising market share at the beginning, sacrificing all profits. After 80 trading days, our agent starts charging the traders (see Subsection 2.4), which gradually increases our profit share. We typically expect its market share to decrease (as traders are less profitable in its market). However, by adapting its charging policy effectively, IAMwildCAT does not compromise its market share and, indeed, it is able to increase its profit share while sustaining its market share. Furthermore, our quote-accepting and clearing strategies (see Subsections 2.1 and 2.2) proved to be very effective, with the TSR increasing from 0.92 at the beginning of the game to over 0.99 after 150 days, outperforming that of all the other agents.

3.2 Intra-Marginal and Extra-marginal Traders

Figure 2. Percentage of intra-marginal buyers.

Figure 3. Percentage of intra-marginal sellers.

We observe in Figures 2 and 3 that the ratio of intra-marginal traders registered with IAMwildCAT converges to 0.9 (which is considerably higher than that of the other agents). This suggests that our agent successfully incentivises intra-marginal traders to join its market, driving away extra-marginal ones. This is done through setting the fees appropriately (see the charging policy in Subsection 2.4) such that extra-marginal traders, which are not expected to trade, would make negative profit by being charged a registration fee. A market with more intra-marginal traders implies better bids and asks that can be cleared, which improves our TSR in the process. Now, the intuition behind this ratio capping at around 0.9 is that, given the trader's selection strategy, there is a probability of 0.1 that a trader, whether it is intra-marginal or extra-marginal, randomly selects a specialist. Thus, there is always a chance that extra-marginal traders will register with a specialist, such that the ratio can never be 1.
3.3 Discriminatory Versus Side-Biased Pricing
We next evaluate our side-biased pricing policy (where we vary the k parameter); we considered an experiment with 7 different agents, including IAMwildCAT (with this policy) and a modified version of IAMwildCAT which used the fixed discriminatory k-pricing policy. We believe it is necessary to vary k because intra-marginal traders in a specialist's market might not necessarily be globally intra-marginal. Thus, given our aim to incentivise only intra-marginal traders to join our market, we vary k to give more profit to globally intra-marginal traders than to globally extra-marginal ones. Here, we analyse the effect of side-biased pricing on our strategy. Now, from Figure 4, we observe that our side-biased pricing policy does increase our ratio of intra-marginal sellers to intra-marginal buyers in the market. However, it introduces a small bias for sellers, with more intra-marginal sellers than intra-marginal buyers. It is also interesting to note that IAMwildCAT has a ratio of globally intra-marginal sellers to buyers that is stable around 1, compared to the hugely varying one of the other agents. This is indeed effective behaviour, as a ratio that deviates from 1 implies an equilibrium price that is higher or lower than the theoretical equilibrium in the global market, such that some of the profits are distributed to globally extra-marginal traders at the expense of globally intra-marginal ones. While the pricing does not affect the specialist's profit share (but rather the distribution of profits among buyers and sellers) or its TSR, we can see from Figure 5 that our side-biased pricing is an improvement over the fixed discriminatory pricing, since it does increase the market share.

Figure 4. Ratio of intra-marginal buyers to sellers.

Figure 5. Market share with discriminatory and side-biased pricing.

3.4 Homogeneous and Heterogeneous Markets

Finally, as per previous evaluation methodologies of double auctions [10, 2], we analyse the global efficiency (and the convergence of the daily market efficiency) of the strategies in both homogeneous and heterogeneous settings. Now, if agents were allowed to select their strategy, they would all choose the most efficient one, i.e. IAMwildCAT, and it would then be very insightful to see how the market efficiency changes if all agents use the same strategy. In particular, in a homogeneous setting, IAMwildCAT does better than in the heterogeneous setting, with a global efficiency of 90.6% (see Figure 6). While PersianCAT has the highest global efficiency (slightly higher than IAMwildCAT, at 90.9%), it does poorly in the heterogeneous environment, where it scores 128.8, i.e. 47% less than IAMwildCAT. PersianCAT performs well in the homogeneous case because its strategy favours profit-maximisation (sacrificing its TSR), which contributes to the high efficiency. Thus, overall, IAMwildCAT performs well in both a homogeneous (with a high global market efficiency) and a heterogeneous environment (with a high score).

Figure 6. Efficiency of homogeneous and heterogeneous markets.

Experiment                Global Efficiency    Convergence Coefficient
6 PersianCATS             90.9%                8.1
6 IAMwildCATS             90.6%                6.2
6 Heterogeneous CATS      88.7%                6.4
6 CrocodileAgents         79.8%                6.1

4 Conclusions

This paper details the IAMwildCAT agent, winner of the 2007 TAC Market Design Competition. In particular, we presented the trade-offs present in the design of the agent and gave our strategic rules for quote-accepting, clearing, pricing and charging. We analysed the competition results and, in particular, the IAMwildCAT agent's market share, profit share and transaction success rate compared to the other agents. We then looked at how IAMwildCAT is very successful at incentivising intra-marginal traders to join its market, driving away extra-marginal ones. Furthermore, we examined experimentally the advantage of our side-biased pricing over the standard fixed discriminatory pricing and showed that our agent is able to balance the number of globally intra-marginal buyers and sellers, which avoids distributing profits to undesirable, extra-marginal traders. Finally, we analysed the strategies outside the scope of the competition by looking at the market efficiency in homogeneous and heterogeneous environments. As discussed in Subsection 3.4, such insights are particularly important if agents are allowed to change strategies and they all choose the most efficient one. We empirically demonstrated that a market with only IAMwildCAT agents does reasonably well, at only 0.3% less than the most efficient one, PersianCAT, while outperforming the heterogeneous market in terms of market efficiency. As future work, we intend to improve on all the policies we currently have. For example, we intend to improve our charging policy by better understanding how the different fees individually affect the market share and profit share. This would allow us to experiment with various combinations of strategies (as in [9]) and select the best combination, so as to improve our agent even more. As such strategies are designed to be more and more effective, they will be the foundations for automating real markets in a global economy.

ACKNOWLEDGEMENTS
We would like to thank Rajdeep K. Dash who participated in the initial design of IAMwildCAT. Part of this research was undertaken under ALADDIN (joint EPSRC and BAE project EP/C548051/1).
REFERENCES
[1] D. Cliff and J. Bruten, 'Minimal-intelligence agents for bargaining behaviors in market-based environments', Tech Report HPL-97-91, (1997).
[2] D. Friedman and J. Rust, The Double Auction Market: Institutions, Theories and Evidence, Addison-Wesley, New York, 1992.
[3] E. Gerding, P. McBurney, J. Niu, S. Parsons, and S. Phelps, 'Overview of CAT: A market design competition', Tech Report ULCS-07-006, Dept. of Computer Science, University of Liverpool, Liverpool, UK, (2007).
[4] S. Gjerstad and J. Dickhaut, 'Price formation in double auctions', Games and Economic Behavior, 22, 1–29, (1998).
[5] D. K. Gode and S. Sunder, 'Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality', Journal of Political Economy, 101(1), 119–137, (1993).
[6] J. Nicolaisen, V. Petrov, and L. Tesfatsion, 'Market power and efficiency in a computational electricity market with discriminatory double-auction pricing', IEEE Trans. Evolutionary Computation, 5(5), 504–523, (2001).
[7] J. Niu, K. Cai, E. Gerding, P. McBurney, and S. Parsons, 'Characterizing effective auction mechanisms: Insights from the 2007 TAC market design competition', in AAMAS-08, 1079–1086, (2008).
[8] S. Parsons, J. Niu, K. Cai, and E. Sklar, 'Reducing price fluctuation in continuous double auctions through pricing policy and shout improvement', in AAMAS-06, 1143–1150, (2006).
[9] I. A. Vetsikas and B. Selman, 'A principled study of the design tradeoffs for autonomous trading agents', in AAMAS-03, pp. 473–480, (2003).
[10] P. Vytelingum, The Structure and Behaviour of the Continuous Double Auction, Ph.D. dissertation, School of ECS, University of Southampton, 2006.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-433
Multi-Agent Reinforcement Learning Algorithm with Variable Optimistic-Pessimistic Criterion
Natalia Akchurina (International Graduate School of Dynamic Intelligent Systems, University of Paderborn, Germany, email: anatalia@mail.uni-paderborn.de)
Abstract. A reinforcement learning algorithm for multi-agent systems based on a variable Hurwicz optimistic-pessimistic criterion is proposed. A formal proof of its convergence is given. Hurwicz's criterion allows us to embed initial knowledge of how friendly the environment in which the agent is supposed to function will be. Thorough testing of the developed algorithm against well-known reinforcement learning algorithms has shown that in many cases its successful performance can be explained by its tendency to force the other agents to follow the policy which is more profitable for it. In addition, the variability of Hurwicz's criterion allowed it to converge to best-response against opponents with stationary policies.
1 Introduction
In the middle of the 1950s, a new field of research with the fascinating name "Artificial Intelligence" (AI) could not help arousing a lot of questions. One of them was: will computers ever be intelligent enough to compete with humans in chess? In 1997 the supercomputer Deep Blue did win the match against the world chess champion Garry Kasparov, with the only question left being whether the supercomputer was really intelligent. Around this time reinforcement learning was an AI technique that didn't need a supercomputer to play on the level of human world masters in backgammon. Nowadays a challenge to AI is multi-agent: to create a team of robots that will beat humans in football. Reinforcement learning, which provides a way of programming agents without specifying how the task is to be achieved, could again be of use here, but the convergence of reinforcement learning algorithms is only guaranteed under the condition of stationarity of the environment, which is violated in multi-agent systems. Several algorithms [5], [4], [6], [3], [2] were proposed to extend this approach to multi-agent systems. The convergence was proved either for a very restricted class of environments (strictly competitive or strictly cooperative) and/or against a very restricted class of opponents. In this paper we propose an algorithm based on Hurwicz's optimistic-pessimistic criterion that allows it to function effectively in a wider range of environments, and we prove its convergence. The variability of Hurwicz's criterion allows the proposed algorithm to be rational: to play best-response against stationary opponents. In self play, for all types (according to Rapoport's classification [8]) of repeated 2 × 2 games, the proposed algorithm has converged to a pure Nash equilibrium when the latter existed. Section 2 is devoted to the formal definition of stochastic games, the framework for multi-agent reinforcement learning, and presents the theorems that we will use in the proof of the convergence of our method in Section 3. Section 4 is devoted to the analysis of the results of thorough testing of our algorithm against other reinforcement learning algorithms.
2 Preliminary Definitions and Theorems
Definition 2.1 A 2-player stochastic game Γ is a 6-tuple ⟨S, A^1, A^2, r^1, r^2, p⟩, where S is the discrete state space (|S| = m), A^k is the discrete action space of player k for k = 1, 2, r^k : S × A^1 × A^2 → ℝ is the payoff function for player k, and p : S × A^1 × A^2 → Δ is the transition probability map, where Δ is the set of probability distributions over the state space S. It is assumed that for every s, s′ ∈ S and for every action a^1 ∈ A^1 and a^2 ∈ A^2, the transition probabilities p(s′|s, a^1, a^2) are stationary for all t = 0, 1, 2, . . . and Σ_{s′=1}^{m} p(s′|s, a^1, a^2) = 1. Each player k (k = 1, 2) strives to maximize its expected discounted cumulative reward:

v^k(s, π^1, π^2) = Σ_{t=0}^{∞} γ^t E(r^k_t | π^1, π^2, s_0 = s)

where γ ∈ [0, 1) is the discount factor, π^1 = (π^1(s_0), . . . , π^1(s_m)) and π^2 = (π^2(s_0), . . . , π^2(s_m)) are the policies of players 1 and 2 respectively, and s is the initial state. π^k(s) is a mixed policy in state s.
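As a concrete reading of this definition, the following sketch (ours; the game is given as explicit reward and transition tables) estimates the discounted value by Monte Carlo rollout:

    import random

    def estimate_value(s0, pi1, pi2, reward, trans, gamma=0.9, horizon=200, runs=1000):
        """Estimate v^1(s0, pi1, pi2). pi1[s], pi2[s] map actions to probabilities;
        reward[(s, a1, a2)] is player 1's payoff; trans[(s, a1, a2)] maps
        successor states to probabilities."""
        def draw(dist):
            return random.choices(list(dist), weights=list(dist.values()))[0]
        total = 0.0
        for _ in range(runs):
            s, discount = s0, 1.0
            for _ in range(horizon):        # truncated horizon: gamma^t decays fast
                a1, a2 = draw(pi1[s]), draw(pi2[s])
                total += discount * reward[(s, a1, a2)]
                discount *= gamma
                s = draw(trans[(s, a1, a2)])
        return total / runs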
Definition 2.2 A Nash equilibrium point is a pair of policies (π^1_*, π^2_*) such that for all s ∈ S and for all policies π^1 and π^2:

v^1(s, π^1_*, π^2_*) ≥ v^1(s, π^1, π^2_*)
v^2(s, π^1_*, π^2_*) ≥ v^2(s, π^1_*, π^2)

Repeated games are a special case of stochastic games where the same state recurs at each time period. In [2], two properties that any learning algorithm for multi-agent systems should satisfy were formulated:

Definition 2.3 (Rationality): If the other players' policies converge to stationary policies, then the learning algorithm will converge to a policy that is a best-response to the other players' policies.

Definition 2.4 (Convergence): The learner will necessarily converge to a stationary policy against agents using an algorithm from some class of learning algorithms.
2.1 Convergence Theorem

Theorem 1 [10] Let X be an arbitrary set and assume that B(X) is the space of bounded functions over X and T : B(X) → B(X) is an arbitrary mapping with fixed point v*. Let U_0 ∈ B(X) be an arbitrary value function and T = (T_0, T_1, . . .) be a sequence of random operators T_t : B(X) × B(X) → B(X) such that U_{t+1} = T_t(U_t, v*) converges to T v* uniformly over X. Let V_0 be an arbitrary value function, and define V_{t+1} = T_t(V_t, V_t). If there exist random functions 0 ≤ F_t(x) ≤ 1 and 0 ≤ G_t(x) ≤ 1 satisfying the conditions below with probability 1, then V_t converges to v* with probability 1 uniformly over X:

1. for all U_1 and U_2 ∈ B(X), and all x ∈ X, |T_t(U_1, v*)(x) − T_t(U_2, v*)(x)| ≤ G_t(x) |U_1(x) − U_2(x)|
2. for all U and V ∈ B(X), and all x ∈ X, |T_t(U, v*)(x) − T_t(U, V)(x)| ≤ F_t(x) sup_{x′} |v*(x′) − V(x′)|
3. Σ_{t=1}^{n} (1 − G_t(x)) converges to infinity uniformly in x as n → ∞
4. there exists 0 ≤ γ < 1 such that for all x ∈ X and large enough t, F_t(x) ≤ γ (1 − G_t(x))
2.2 Stochastic Approximation

Let M(x) denote the expected value at level x of the response to a certain experiment. It is assumed that to each value x corresponds a random variable Y = Y(x) with distribution function Pr[Y(x) ≤ y] = H(y|x), such that M(x) = ∫_{−∞}^{∞} y dH(y|x) is the expected value of Y for the given x. Neither the exact nature of H(y|x) nor that of M(x) is known to the experimenter. It is desired to estimate the solution x = θ of the equation M(x) = α, where α is a given constant, by making successive observations on Y at levels x_1, x_2, . . . We define a (nonstationary) Markov chain {x_n} by taking x_1 to be an arbitrary constant and defining

x_{n+1} − x_n = α_n (α − y_n)

where y_n is a random variable such that Pr[y_n ≤ y | x_n] = H(y|x_n).

Theorem 2 [9] If {α_n} is a fixed sequence of positive constants such that 0 < Σ_{n=1}^{∞} α_n² = A < ∞ and Σ_{n=1}^{∞} α_n = ∞, if ∃C > 0 : Pr[|Y(x)| ≤ C] = ∫_{−C}^{C} dH(y|x) = 1 for all x, and M(x) is nondecreasing, M(θ) = α, M′(θ) > 0, then lim_{n→∞} E(x_n − θ)² = 0.
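The iteration of Theorem 2 is short enough to state in code; the following toy sketch (ours) uses the step sizes alpha_n = 1/n, which satisfy the conditions of the theorem:

    import random

    def robbins_monro(observe, alpha_level, x1=0.0, steps=100000):
        """Estimate the root theta of M(x) = alpha_level from noisy samples:
        observe(x) returns Y(x) with E[Y(x)] = M(x)."""
        x = x1
        for n in range(1, steps + 1):
            x = x + (1.0 / n) * (alpha_level - observe(x))
        return x

    # Example: M(x) = 2x with bounded uniform noise; the root of M(x) = 1 is 0.5.
    print(robbins_monro(lambda x: 2 * x + random.uniform(-1, 1), alpha_level=1.0))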
3 Optimistic-Pessimistic Q-learning Algorithm with Variable Criterion (OPVar-Q)
Competitive or cooperative environments are just extreme cases. In most cases the environment where our agent will function is competitive / cooperative to some degree. In this section we propose a reinforcement learning algorithm (OPVar-Q) based on Hurwicz's optimistic-pessimistic criterion [1] that allows us to embed preliminary knowledge of how friendly the environment will be. For example, the parameter λ = 0.3 means that we believe that with 30% probability the circumstances will be favourable and the agents will act so as to maximize OPVar-Q's reward, while with 70% probability they will force it to achieve the minimum value; we choose the strategy in each state that maximizes our gain under the above described circumstances (OPVar-Q with λ = 0.3 tries more often to avoid low rewards than to get high rewards, in comparison with OPVar-Q(0.5)). The algorithm is presented for a 2-player stochastic game but can be extended without difficulty to an arbitrary number of players.

Algorithm 1 OPVar-Q (for player 1)
Input: parameters λ, ε, α (see Theorem 3)
for all s ∈ S, a^1 ∈ A^1, and a^2 ∈ A^2 do
    Q(s, a^1, a^2) ← 0
    V(s) ← 0
    π(s, a^1) ← 1/|A^1|
end for
loop
    In state s, choose action a^1 using policy π(s) with probability 1 − ε, and with probability ε select an action at random
    Take action a^1; observe the opponent's action a^2, reward r^1 and succeeding state s′ provided by the environment
    Q(s, a^1, a^2) ← (1 − α) Q(s, a^1, a^2) + α (r^1 + γ V(s′))
    if the opponent's policy π^2 has become stationary then
        π(s, a^1) ← 1 if a^1 = argmax_{a^1} Σ_{a^2} π^2(s, a^2) Q(s, a^1, a^2), and 0 otherwise
        V(s) ← max_{a^1} Σ_{a^2} π^2(s, a^2) Q(s, a^1, a^2)
    else
        π(s, a^1) ← 1 if a^1 = argmax_{a^1} [(1 − λ) min_{a^2} Q(s, a^1, a^2) + λ max_{a^2} Q(s, a^1, a^2)], and 0 otherwise
        V(s) ← max_{a^1} [(1 − λ) min_{a^2} Q(s, a^1, a^2) + λ max_{a^2} Q(s, a^1, a^2)]
    end if
end loop
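A compact Python rendering of the backup step of Algorithm 1 (our sketch; the detection of a stationary opponent and the data layout are simplifications) may clarify the two branches:

    import numpy as np

    def opvar_q_backup(Q, V, s, a1, a2, r1, s_next, alpha, gamma, lam, pi2=None):
        """One OPVar-Q update. Q[s] is an |A1| x |A2| array; pi2[s] is the
        opponent's empirical policy if detected as stationary, else None."""
        Q[s][a1, a2] = (1 - alpha) * Q[s][a1, a2] + alpha * (r1 + gamma * V[s_next])
        if pi2 is not None:                  # best response to a stationary opponent
            values = Q[s] @ pi2[s]           # expected payoff of each own action
        else:                                # Hurwicz optimistic-pessimistic criterion
            values = (1 - lam) * Q[s].min(axis=1) + lam * Q[s].max(axis=1)
        greedy = int(np.argmax(values))
        V[s] = float(values[greedy])
        return greedy                        # action the deterministic policy picks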
Lemma 3.1 Let Q : S × A^1 × A^2 → ℝ. Then for Hurwicz's criterion

H(Q(s)) = max_{a^1} [(1 − λ) min_{a^2} Q(s, a^1, a^2) + λ max_{a^2} Q(s, a^1, a^2)]

where 0 ≤ λ ≤ 1, the following inequality holds:

|H(Q_1(s)) − H(Q_2(s))| ≤ max_{a^1,a^2} |Q_1(s, a^1, a^2) − Q_2(s, a^1, a^2)|

Proof.
|H(Q_1(s)) − H(Q_2(s))|
= |max_{a^1} [(1 − λ) min_{a^2} Q_1(s, a^1, a^2) + λ max_{a^2} Q_1(s, a^1, a^2)] − max_{a^1} [(1 − λ) min_{a^2} Q_2(s, a^1, a^2) + λ max_{a^2} Q_2(s, a^1, a^2)]|
≤ max_{a^1} |(1 − λ)(min_{a^2} Q_1(s, a^1, a^2) − min_{a^2} Q_2(s, a^1, a^2)) + λ (max_{a^2} Q_1(s, a^1, a^2) − max_{a^2} Q_2(s, a^1, a^2))|
≤ max_{a^1} [|(1 − λ)(min_{a^2} Q_1(s, a^1, a^2) − min_{a^2} Q_2(s, a^1, a^2))| + |λ (max_{a^2} Q_1(s, a^1, a^2) − max_{a^2} Q_2(s, a^1, a^2))|]
≤ max_{a^1} [(1 − λ) max_{a^2} |Q_1(s, a^1, a^2) − Q_2(s, a^1, a^2)| + λ max_{a^2} |Q_1(s, a^1, a^2) − Q_2(s, a^1, a^2)|]
= max_{a^1} max_{a^2} |Q_1(s, a^1, a^2) − Q_2(s, a^1, a^2)|

The above holds due to the triangle inequality and the following inequalities [10]:

|max_{a^k} Q_1(s, a^1, a^2) − max_{a^k} Q_2(s, a^1, a^2)| ≤ max_{a^k} |Q_1(s, a^1, a^2) − Q_2(s, a^1, a^2)|
|min_{a^k} Q_1(s, a^1, a^2) − min_{a^k} Q_2(s, a^1, a^2)| ≤ max_{a^k} |Q_1(s, a^1, a^2) − Q_2(s, a^1, a^2)|

where k = 1, 2.

Lemma 3.2 Let Q : S × A^1 × A^2 → ℝ and let π^2 be the policy of player 2. Then for

BR(Q(s), π^2(s)) = max_{a^1} Σ_{a^2} π^2(s, a^2) Q(s, a^1, a^2)

the following inequality holds:

|BR(Q_1(s), π^2(s)) − BR(Q_2(s), π^2(s))| ≤ max_{a^1,a^2} |Q_1(s, a^1, a^2) − Q_2(s, a^1, a^2)|

Proof.
|BR(Q_1(s), π^2(s)) − BR(Q_2(s), π^2(s))|
= |max_{a^1} Σ_{a^2} π^2(s, a^2) Q_1(s, a^1, a^2) − max_{a^1} Σ_{a^2} π^2(s, a^2) Q_2(s, a^1, a^2)|
≤ max_{a^1} |Σ_{a^2} π^2(s, a^2) Q_1(s, a^1, a^2) − Σ_{a^2} π^2(s, a^2) Q_2(s, a^1, a^2)|
= max_{a^1} |Σ_{a^2} π^2(s, a^2) [Q_1(s, a^1, a^2) − Q_2(s, a^1, a^2)]|
≤ max_{a^1} |Σ_{a^2} π^2(s, a^2) max_{a^2} |Q_1(s, a^1, a^2) − Q_2(s, a^1, a^2)||
= max_{a^1} max_{a^2} |Q_1(s, a^1, a^2) − Q_2(s, a^1, a^2)|

The above holds due to the inequalities that we used for proving Lemma 3.1. Now we are ready to prove the convergence of our algorithm in the usual way [10], [5], [6], [4].

Theorem 3 If {α_t} is a sequence such that α_t > 0, Σ_{t=1}^{∞} χ(s_t = s, a^1_t = a^1, a^2_t = a^2) α_t = ∞ and Σ_{t=1}^{∞} χ(s_t = s, a^1_t = a^1, a^2_t = a^2) α_t² < ∞ with probability 1 uniformly over S × A^1 × A^2 (χ denotes the characteristic function; we assume here that OPVar-Q plays for the first agent), then the OPVar-Q algorithm converges to the stationary policy defined by the fixed point of the operator

[T Q](s, a^1, a^2) = r^1(s, a^1, a^2) + γ Σ_{s′} p(s′|s, a^1, a^2) BR(Q(s′), π^2(s′))

against an opponent with a stationary policy π^2, and to the stationary policy defined by the fixed point of the operator

[T Q](s, a^1, a^2) = r^1(s, a^1, a^2) + γ Σ_{s′} p(s′|s, a^1, a^2) H(Q(s′))

against other classes of opponents.

Proof. Let further V(Q(s)) = BR(Q(s), π^2(s)) when the opponent follows a stationary policy π^2, and V(Q(s)) = H(Q(s)) otherwise. Let Q* be the fixed point of the operator T and

M(x) = x − r^1(s, a^1, a^2) − γ Σ_{s′} p(s′|s, a^1, a^2) V(Q*(s′))

It is evident that the conditions of Theorem 2 on M are fulfilled, and M(Q*) = α = 0. The random approximating operator is

T_t(Q_t, Q*)(s, a^1, a^2) = (1 − α_t) Q_t(s_t, a^1_t, a^2_t) + α_t (r^1(s_t, a^1_t, a^2_t) + γ V(Q*(s′_t)))   if s = s_t and a^1 = a^1_t and a^2 = a^2_t
T_t(Q_t, Q*)(s, a^1, a^2) = Q_t(s, a^1, a^2)   otherwise

where y_t(s, a^1, a^2) = Q_t(s_t, a^1_t, a^2_t) − r^1(s_t, a^1_t, a^2_t) − γ V(Q*(s′_t)) if s = s_t and a^1 = a^1_t and a^2 = a^2_t. It is evident that the other conditions will be satisfied if s′_t is randomly selected according to the probability distribution defined by p(·|s_t, a^1_t, a^2_t). Then according to Theorem 2, T_t approximates the solution of the equation M(x) = 0 uniformly over X = S × A^1 × A^2. In other words, T_t(Q_t, Q*) converges to T Q* uniformly over X.

Let G_t(s, a^1, a^2) = 1 − α_t if s = s_t and a^1 = a^1_t and a^2 = a^2_t, and 1 otherwise. Let F_t(s, a^1, a^2) = γ α_t if s = s_t and a^1 = a^1_t and a^2 = a^2_t, and 0 otherwise. Let us check the conditions of Theorem 1:

1. When s = s_t and a^1 = a^1_t and a^2 = a^2_t:
|T_t(Q_1, Q*)(s, a^1, a^2) − T_t(Q_2, Q*)(s, a^1, a^2)|
= |(1 − α_t) Q_1(s_t, a^1_t, a^2_t) + α_t (r^1(s_t, a^1_t, a^2_t) + γ V(Q*(s′_t))) − (1 − α_t) Q_2(s_t, a^1_t, a^2_t) − α_t (r^1(s_t, a^1_t, a^2_t) + γ V(Q*(s′_t)))|
= G_t(s, a^1, a^2) |Q_1(s, a^1, a^2) − Q_2(s, a^1, a^2)|
When s ≠ s_t or a^1 ≠ a^1_t or a^2 ≠ a^2_t, it is evident that the condition holds.

2. When s = s_t and a^1 = a^1_t and a^2 = a^2_t:
|T_t(Q_1, Q*)(s, a^1, a^2) − T_t(Q_1, Q_2)(s, a^1, a^2)|
= |(1 − α_t) Q_1(s_t, a^1_t, a^2_t) + α_t (r^1(s_t, a^1_t, a^2_t) + γ V(Q*(s′_t))) − (1 − α_t) Q_1(s_t, a^1_t, a^2_t) − α_t (r^1(s_t, a^1_t, a^2_t) + γ V(Q_2(s′_t)))|
= F_t(s_t, a^1_t, a^2_t) |V(Q*(s′_t)) − V(Q_2(s′_t))|
≤ F_t(s, a^1, a^2) max_{a^1,a^2} |Q*(s′, a^1, a^2) − Q_2(s′, a^1, a^2)|
The last inequality holds due to Lemmas 3.1 and 3.2. When s ≠ s_t or a^1 ≠ a^1_t or a^2 ≠ a^2_t, it is evident that the condition holds.

3. Σ_{t=1}^{n} (1 − G_t(x)) converges to infinity uniformly in x as n → ∞ (see the assumption of the theorem).

4. The fourth condition evidently holds.
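The non-expansiveness established by Lemma 3.1 is also easy to check numerically; the following sketch (ours) samples random Q-tables and verifies the inequality:

    import numpy as np

    def hurwicz(Q, lam):
        """H(Q(s)) = max_a1 [(1 - lam) min_a2 Q + lam max_a2 Q] for one state."""
        return np.max((1 - lam) * Q.min(axis=1) + lam * Q.max(axis=1))

    rng = np.random.default_rng(0)
    for _ in range(10000):
        Q1 = rng.uniform(-100, 100, (3, 4))
        Q2 = rng.uniform(-100, 100, (3, 4))
        lam = rng.uniform()
        lhs = abs(hurwicz(Q1, lam) - hurwicz(Q2, lam))
        assert lhs <= np.abs(Q1 - Q2).max() + 1e-9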
4 Experiments
We tested the OPVar-Q algorithm on 14 classes of 10-state 2×2 stochastic games (with uniformly distributed transition probabilities) and 1000 random 10-state 6-agent 2-action stochastic games (with uniformly distributed payoffs and transition probabilities) derived with the use of Gamut [7]. For the sake of reliability we derived 100 instances of each game class and ran 20000 iterations. The agent plays as both the row agent and the column agent. Below in this section we present the average rewards (including the exploration stage) of the developed OPVar-Q algorithm against the following well-known algorithms for multi-agent reinforcement learning:

• Stationary opponent plays the first action in 75% of cases and the second action in 25% of cases.
• Q [11] was initially developed for single-agent environments. It learns from immediate rewards a tabular function Q(s, a) that returns the largest value for the action a that should be taken in each particular state s so as to maximize the expected discounted cumulative reward. When applied to multi-agent systems, the Q-learning algorithm totally ignores the presence of other agents, though the latter naturally influence its immediate rewards.
• MinimaxQ [5] was developed for strictly competitive games and chooses the policy that maximizes its notion of the expected discounted cumulative reward, believing that the circumstances will be against it.
• FriendQ [6] was developed for strictly cooperative games and chooses the action that will bring the highest possible expected discounted cumulative reward, believing that the circumstances will favor it.
• NashQ [4] believes that the opponent will play its part of a Nash equilibrium and is proved to converge against itself in games with only one Nash equilibrium.
• JAL [3] believes that the average opponent strategy approximates the opponent's future policy very well and takes it into account while choosing the action that maximizes its expected discounted cumulative reward.
• PHC [2], in contrast to the Q-learning algorithm, changes its policy gradually in the direction of the highest Q values.
• WoLF [2] differs from PHC only in that it changes its policy faster when losing and more slowly when winning.

The results of the experiments showed that the developed algorithm can function on the level of (sometimes better than) its opponents, which do not possess both properties: rationality (convergence to best-response against opponents with stationary policies) and convergence to stationary policies against all types of opponents. Because of the limitation on space we present only the analysis of
a few game classes, which should be sufficient to understand the general notion of interaction between the developed OPVar-Q and the above presented multi-agent reinforcement learning algorithms. The test classes are presented in general form, where A, B, C, D are payoffs uniformly distributed in the interval [−100, 100] and A > B > C > D. We analyze the results as though OPVar-Q played for the row agent. For all games we chose the neutral parameter λ = 0.5 for OPVar-Q. To illustrate the gain of the variable Hurwicz optimistic-pessimistic criterion against stationary opponents, we compare our algorithm with the algorithm OP-Q(0.5), which is based on the same principle as OPVar-Q but does not distinguish between opponents with stationary and non-stationary policies. The horizontal line in the figures is the average reward of the Nash equilibrium. Q, PHC, WoLF and JAL turned out to have very similar final behavior. The small differences in the performance of these algorithms are due to slightly different manners of tuning the policy and underlying mechanisms.
4.1 Battle of the Sexes

4.1.1 Type 1

Table 1. Battle of the sexes: type 1

A,B   C,C
C,C   B,A
After a short exploration phase, OP-Q (and OPVar-Q at first) chooses the first strategy in battle of the sexes type 1. Indeed, Hurwicz's criteria for the first and the second strategies are:

H_1 = 0.5 · (A + V) + 0.5 · (C + V)
H_2 = 0.5 · (C + V) + 0.5 · (B + V)

where V is OP-Q's (OPVar-Q's) notion of the expected discounted cumulative reward that it will get starting from the next step.

• Stationary opponent gets 0.75 · B + 0.25 · C as OP-Q (OPVar-Q) plays the first strategy. OP-Q gets on average 0.75 · A + 0.25 · C. After noticing that its opponent is stationary, OPVar-Q also plays the first strategy, since 0.75 · A + 0.25 · C > 0.75 · C + 0.25 · B, and gets on average 0.75 · A + 0.25 · C.
• Q, PHC, WoLF get the impression that in their environment (where the OP-Q (OPVar-Q) agent is constantly playing the first strategy) the first strategy is much more profitable than the second one (B against C, where B > C) and play it. As a result OP-Q gets A as average reward after the exploration stage and Q, PHC, WoLF get only B. On realizing that the opponent's strategy has become stationary (1, 0), OPVar-Q also plays the first strategy (A > C) and gets A as average reward.
• MinimaxQ strives to maximize its expected discounted cumulative reward in the worst case. But battle of the sexes is not strictly competitive. That's why OP-Q and OPVar-Q show better results.
• FriendQ, developed for cooperative environments, believes that when it gets the best reward so do the other agents in the environment, and that it is therefore most profitable for them to play the other part of the joint action that results in the largest reward to FriendQ. In battle of the sexes it constantly plays the second action. As a result OP-Q and FriendQ both get the very low reward C.
After realizing that its opponent plays the second strategy, OPVar-Q also plays the second strategy, since B > C, and this results in average rewards of A to FriendQ and B to OPVar-Q.
• NashQ plays the first and the second strategy alternately, since there are two Nash equilibria, (a^1_1, a^2_1) and (a^1_2, a^2_2). OP-Q gets (A + C)/2 and NashQ (B + C)/2. On getting to know that 0.5 is exactly the right optimistic-pessimistic parameter, OPVar-Q also chooses the first strategy and gets the same reward as OP-Q.
• JAL, taking into account OP-Q's (OPVar-Q's) stationary (1, 0) policy, also chooses the first, more profitable for it, action (B > C). OP-Q and JAL respectively get A and B as average rewards. As JAL's policy becomes stationary, OPVar-Q also plays the first strategy (A > C) and gets A as average reward.
4.1.2 Type 2

Table 2. Battle of the sexes: type 2

B,A   C,C
C,C   A,B
After a short exploration phase, OP-Q (and OPVar-Q at first) chooses the second strategy in battle of the sexes type 2.

• Stationary opponent gets 0.75 · C + 0.25 · B as OP-Q (OPVar-Q) plays the second strategy. OP-Q gets on average 0.75 · C + 0.25 · A. OPVar-Q's results are higher because it chooses the action that maximizes its cumulative reward against a stationary opponent.
• Q, PHC, WoLF, JAL and OP-Q (OPVar-Q) play the second strategies and get B and A as average rewards correspondingly.
• MinimaxQ: the same as for type 1.
• FriendQ plays the first strategy while OP-Q chooses the second action. They both get the low average reward C. On getting to know that the opponent permanently plays policy (1, 0), OPVar-Q chooses the first action and gets B as average reward while FriendQ gets A.
• NashQ: the same as for type 1.
Figure 1. Battle of the sexes.
As Figure 1 shows, OPVar-Q managed to get far higher rewards than it would have got playing the Nash equilibrium policy, by forcing the opponent to play the strategy that is more profitable for OPVar-Q and at the same time tuning its policy when facing a stationary opponent.
4.2 Self Play
In self play, OPVar-Q converged to one of the pure Nash equilibria for every class of 2 × 2 repeated games (out of 78 according to Rapoport's classification [8]) where the latter exist.
5 Discussion and Conclusion
This paper addresses the topical problem of extending the reinforcement learning approach to multi-agent systems. An algorithm based on Hurwicz's optimistic-pessimistic criterion is developed. Hurwicz's criterion allows us to embed initial knowledge of how friendly the environment in which the agent is supposed to function will be. A formal proof of the algorithm's convergence is given. Thorough testing of the developed algorithm against Q, PHC, WoLF, MinimaxQ, FriendQ, NashQ and JAL showed that OPVar-Q functions effectively in environments of different levels of amicability by making its opponents follow the policy which is more profitable for it. The variability of Hurwicz's criterion allowed it to converge to best-response against opponents with stationary policies. In self play, for all types (according to Rapoport's classification) of repeated 2 × 2 games, the proposed algorithm converged to a pure Nash equilibrium when the latter existed.
REFERENCES
[1] Kenneth Arrow, 'Hurwicz's optimality criterion for decision making under ignorance', Technical Report 6, Stanford University, (1953).
[2] Michael H. Bowling and Manuela M. Veloso, 'Multiagent learning using a variable learning rate', Artificial Intelligence, 136(2), 215–250, (2002).
[3] Caroline Claus and Craig Boutilier, 'The dynamics of reinforcement learning in cooperative multiagent systems', in AAAI '98/IAAI '98: Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence, pp. 746–752, Menlo Park, CA, USA, (1998). American Association for Artificial Intelligence.
[4] Junling Hu and Michael P. Wellman, 'Multiagent reinforcement learning: Theoretical framework and an algorithm', in ICML '98: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 242–250, San Francisco, CA, USA, (1998). Morgan Kaufmann Publishers Inc.
[5] Michael L. Littman, 'Markov games as a framework for multi-agent reinforcement learning', in ICML, pp. 157–163, (1994).
[6] Michael L. Littman, 'Friend-or-foe Q-learning in general-sum games', in ICML, eds. Carla E. Brodley and Andrea Pohoreckyj Danyluk, pp. 322–328, Morgan Kaufmann, (2001).
[7] Eugene Nudelman, Jennifer Wortman, Yoav Shoham, and Kevin Leyton-Brown, 'Run the gamut: A comprehensive approach to evaluating game-theoretic algorithms', in AAMAS '04: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 880–887, Washington, DC, USA, (2004). IEEE Computer Society.
[8] Anatol Rapoport, Melvin J. Guyer, and David G. Gordon, The 2 × 2 Game, Ann Arbor: The University of Michigan Press, 1976.
[9] Herbert Robbins and Sutton Monro, 'A stochastic approximation method', Annals of Mathematical Statistics, 22(3), 400–407, (1951).
[10] Csaba Szepesvári and Michael L. Littman, 'Generalized Markov decision processes: Dynamic-programming and reinforcement-learning algorithms', Technical report, Providence, RI, USA, (1996).
[11] Chris J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. dissertation, King's College, Cambridge, England, 1989.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-438
As Safe As It Gets: Near-Optimal Learning in Multi-Stage Games with Imperfect Monitoring
Danny Kuminov (Technion – Israel Institute of Technology, Haifa, Israel 32000, email: dannykv@tx.technion.ac.il) and Moshe Tennenholtz (Technion – Israel Institute of Technology, Haifa, Israel 32000, email: moshet@ie.technion.ac.il)
Keywords: DMA::Game Theoretic Foundations; PS::Planning with Incomplete Information
Abstract. We introduce the first near-optimal polynomial algorithm for obtaining the mixed safety-level value of an initially unknown multi-stage game, played in a hostile environment, under imperfect monitoring. In an imperfect monitoring setting, all that an agent can observe is the current state and its own actions and payoffs; it cannot observe other agents' actions. Our result holds for any generic multi-stage game with a "reset" action.
1 Introduction
Decision making in adversarial settings is a central topic in both AI and game theory. Assuming a purely adversarial setting, playing the (mixed) safety-level strategy is the best one can hope for. Such a strategy maximizes the expected worst-case payoff of the player, and it can also be computed efficiently. This leaves us with two complementary central problems. One problem is the need to deal with settings which are not purely adversarial. Another challenging issue is the need to deal with incomplete information about the environment. In particular, the game played might be unknown, and therefore guaranteeing the safety-level value may be problematic. This paper deals with the latter issue. Consider a multi-stage game. A multi-stage game consists of finitely many states, each of which is associated with a strategic form game. The actions selected by the agents in a given state determine the payoffs of the agents according to the payoff matrix of the corresponding game. Moreover, as a function of the current state and the selected actions we reach a new state. We will consider the situation where we have two agents, and we care about the payoffs that can be guaranteed by player 1 (which we refer to as the agent), when playing against player 2 (which we refer to as the opponent). If the multi-stage game starts from a given initial state, and is played along T stages, then the agent can guarantee itself a particular, optimal safety-level value. This is a common and highly natural solution for this general class of games. However, when the multi-stage game is unknown it is no longer clear what the best possibility for the agent is. A clever algorithm should attempt to learn the structure of the game, in order to obtain a value which is close to the safety-level value. A central issue in this regard is the type of information available to the agent. In particular, the literature distinguishes between perfect monitoring and imperfect monitoring. In the perfect monitoring setting the agent can recognize the state, and observe both its payoff and the opponent's action after the state-game is played. In the imperfect monitoring setting the agent can only recognize the state and observe its own payoff; it cannot observe the opponent's actions. In both settings the idea is to come up with an algorithm that will guarantee that the average payoff will be close to the safety-level value of the underlying game. Moreover, an important objective is that convergence to this value will be obtained in a polynomial number of iterations. The above challenge fits into the so-called agent-centric approach to learning in games (see e.g. [10, 6]). We consider multi-stage games with incomplete information, where there is strict initial uncertainty about the game being played [8, 1]. In the context of repeated games with incomplete information [3], where the multi-stage game consists of only a single state, Banos [5] and Megiddo [9] proved the existence of an algorithm that converges to the safety-level value in any repeated game, even under imperfect monitoring. The algorithm they present, however, is highly inefficient; an efficient algorithm addressing this problem in repeated games can be found in [2]. These results, however, do not apply to general multi-stage games. (We have considered several ways to reduce a multi-stage game to a representation acceptable by the algorithm in [2], but all of them result in learning time that is exponential in the number of states.) On the other hand, if we allow perfect monitoring, then the R-max algorithm, introduced in [7], provides a near-optimal polynomial algorithm. However, the problem of obtaining the (mixed) safety-level value in multi-stage games with imperfect monitoring was left open. In this paper (we omitted many proofs in this version of the paper, due to lack of space; a full version, with all the proofs and additional discussion, can be found at http://www.technion.ac.il/∼dannykv/ecai08.pdf) we address the above challenge, by presenting the first near-optimal polynomial algorithm for obtaining the safety-level value in generic multi-stage games, with strict initial uncertainty about the game. In a generic multi-stage game, for any given state s, action a of the agent, and actions b1, b2 of the opponent, the agent's payoff for (a, b1) in s is different from its payoff for (a, b2) in s. Although somewhat limiting, this assumption captures many interesting situations, and is quite common in the literature. Namely, given an initially unknown generic multi-stage game, we show an efficient algorithm that after polynomially many iterations (in the game size and accuracy parameters) guarantees (almost) the safety-level value with overwhelming probability. A major challenge that this algorithm addresses is that, given imperfect monitoring, the agent cannot know whether two payoffs it obtains in a given state, when playing actions a1 and a2 respectively, are associated with the same action by the opponent.
2 The setting
In a multi-stage game (MSG) the players play a (possibly infinite) sequence of games from some given set of finite games (in strategic form). After playing each game, the players receive the appropriate payoff, as dictated by that game’s matrix, and move to a new game. In the model we consider here, the identity of this new game is uniquely determined by the previous game and by the players’ actions in it.5 Formally: Definition 1. A fixed-sum, two player, multi-stage game (MSG) M on finite set of states S and finite sets of actions X1 , X2 consists of: • Stage Games: each state s ∈ S is associated with a two-player, fixed-sum game in strategic form, where the action set of each player is X1 , X2 accordingly, and the utility function of player 1 is Us : X1 × X2 → . For brevity, we denote X = X1 × X2 . • Transition Function ftr : S × X1 × X2 → S: ftr (s, x1 , x2 ) is the state to which the game transfers from state s given that the first player (the agent) plays x1 and the second player (the opponent) plays x2 . • Designated initial state. W.l.o.g let us denote it by start. In this work, we assume that player 1 (the row player of the stage games) does not know a priori what the payoff matrices are, neither he is informed after each stage about the action taken by player 2 (but he observes what his payoff at that stage was, and he knows what the current state is before playing the respective game). Player 2 (the column player), however, is fully informed about both the payoff matrices and the history of the game. We use the following definitions: • The set of histories of length t of M isH t = ti=1 (S × X). ∞ • The set of all finite histories is H = k=0 H k (here we use the simplifying notation that H 0 = {e}, where e denotes the empty history). • H∞ = ∞ i=1 (S × X) is the set of all infinite histories. • Given a history h, we will slightly abuse notation and denote by Ui (ht ) the payoff to player i in round t of the history h. • A behavioral policy of the informed player (player 2) in the multistage game is a function p2 : H × S → Δ(X2 ). • The set of histories of the game which is available to the unint t formed player (player 1) at stage t is H = l=1 (S × X1 × ). k • We denote H = ∞ H . k=0 • A behavioral policy of the uninformed player in the multi-stage game is a function p1 : H × S → Δ(X1 ). The S in the function parameters represents the current state of the game. • We denote the set of possible behavioral policies of player i by Pi . A utility function in this setting is a function U˜i : H ∞ → . There are several possible definitions of this function based on the utilities in the one-shot game; we will use the function U˜i (h) = lim inf t→∞ 1t tk=1 Ui (hk ). The true game M together with the players’ behavioral policies generate a probability measure over H ∞ , which can be described uniquely by its values for finite cylinder sets. Given this measure, the expected utility of player i given policies p1 , p2 is defined as U˜i (p1 , p2 ) = lim inf t→∞ h∈H t P rh|(p1 , p2 ) 1t tk=1 Ui (hk ), where P rh|(p1 , p2 ) denotes the probability that a finite history h ∈ H t occurs in the first t stages of the game. In this work, we will assume that all payoffs in the stage games are positive and bounded from above by Umax , that the stage games are 5
⁵ This model is a special case of the well-known stochastic game model.
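To make Definition 1 concrete, here is a minimal sketch of the game as a data structure. The class and field names are our own illustration, not from the paper:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Tuple

@dataclass
class MultiStageGame:
    """A fixed-sum, two-player multi-stage game (Definition 1)."""
    states: list                 # finite state set S
    actions1: list               # X1, the agent's actions
    actions2: list               # X2, the opponent's actions
    # payoff[s][(x1, x2)]: player 1's payoff U_s(x1, x2) at state s
    payoff: Dict[Hashable, Dict[Tuple[Hashable, Hashable], float]]
    # f_tr(s, x1, x2) -> next state (deterministic in this model)
    f_tr: Callable[[Hashable, Hashable, Hashable], Hashable]
    start: Hashable              # designated initial state
```

A play of the game then alternates: look up the stage game at the current state, collect the payoff for the chosen action pair, and move to the state returned by f_tr.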
In this work, we will assume that all payoffs in the stage games are positive and bounded from above by $U_{max}$, that the stage games are generic, and that there is a designated action (w.l.o.g. let us denote it by reset) that, from any state and given any opponent action, transfers the game to the initial state and gives payoff 0 to the agent.⁶ In a generic game, for any given action $a$ of the agent, and actions $b_1, b_2$ of the opponent, the agent's payoff for $(a, b_1)$ is different from its payoff for $(a, b_2)$. We also assume that the above is the only prior information available to the agent. He does not know a priori the exact payoff matrices of the stage games, nor does he know a priori what the transition function is for actions other than reset.

In this work we assume that we are given a time limit $T$, and we limit ourselves only to policies that play for $T$ steps and always play reset on step $T+1$ (and then repeat themselves ad infinitum), both in our learning algorithm and in the optimal policy to which we compare.⁷ Let $V_{p_1,p_2}(T) = E_{p_1,p_2}\left[\frac{1}{T+1}\sum_{t=1}^{T} U_1(h^t)\right]$ denote the expected average payoff guaranteed by such a policy $p_1$ against opponent policy $p_2$, and let $V(T) = \max_{p_1}\min_{p_2} V_{p_1,p_2}(T)$ denote the maximal expected average payoff that can be guaranteed by such a policy. Given this definition, our goal is to develop a policy $p_1$ for player 1 which, given confidence $\delta$, accuracy $\epsilon$ and finite time horizon $T$, guarantees after $\hat{l} = \mathrm{poly}(|X_1|, |X_2|, |S|, \frac{1}{\delta}, \frac{1}{\epsilon}, T)$ rounds an expected average payoff of at least $(1-\epsilon)V(T)$ with probability at least $1-\delta$.⁸ Formally, for any game for which the above assumptions hold and for any policy $p_2$ of the opponent:

$$\Pr{}_{p_1,p_2}\left[\forall l \geq \hat{l} : \frac{1}{l}\sum_{t=1}^{l} U_1(h^t) \geq (1-\epsilon)V(T)\right] \geq 1-\delta.$$

Note that the optimal policy under this criterion can be described as a mapping $S \times \{1, \ldots, T\} \to \Delta(X_1)$. Informally, the policy only has to take into account the current state and the number of steps remaining in the current $T$-step sequence when determining the next action – the specific previous history does not matter. This means that the $T$-step min-max policy can be described concisely (i.e., the size of its representation is polynomial in $T$ and the problem parameters) and that it can be computed efficiently, by combining backward induction with the usual techniques for computing mixed min-max strategies in strategic-form games. In fact, this observation holds for any stochastic game, which is a more general model.

⁶ The reset action ensures that there are no irreversible actions. It can be easily verified that learning is impossible otherwise, since, by trying an unknown action, the agent might trap himself in an inferior subgame, without any possibility for going back.
⁷ This choice is justified in the full version of the paper.
⁸ Note that the average is taken over all stages of the game, including the initial learning period $\hat{l}$.
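To make the final observation above concrete, the following sketch computes a $T$-step min-max policy for a known game of this form by backward induction, solving the zero-sum matrix game at every (state, step) pair with a linear program. It is our own illustration under the stated model (using the MultiStageGame sketch above); the function names and the use of scipy are our assumptions, not the authors' code:

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Max-min value and mixed strategy for the row player of matrix A:
    max_p min_j sum_i p_i * A[i, j]."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # maximize v  <=>  minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])    # v - p^T A[:, j] <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # sum_i p_i = 1
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

def t_step_minmax(game, T):
    """Backward induction over steps T..1; W[s] is the guaranteed value of
    the remaining game, policy[(s, t)] the mixed action at state s, step t."""
    W = {s: 0.0 for s in game.states}            # value with zero steps left
    policy = {}
    for t in range(T, 0, -1):
        W_next, W = W, {}
        for s in game.states:
            # Stage payoff plus continuation value under the known transition.
            A = np.array([[game.payoff[s][(x1, x2)] + W_next[game.f_tr(s, x1, x2)]
                           for x2 in game.actions2]
                          for x1 in game.actions1])
            W[s], policy[(s, t)] = solve_matrix_game(A)
    return policy, W
```

The representation is one mixed action per (state, step) pair, so its size is polynomial in $T$ and the problem parameters, exactly as the text claims.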
3 The algorithm
The basic idea of the algorithm can be summarized as follows:
• In each iteration, the algorithm constructs an approximate (optimistic) model of the multi-stage game, computes the $T$-step optimal strategy for it and executes it.
• The agent represents its knowledge about the game matrix of each stage game by a partition of the set of opponent's actions. For each element of the partition and each action of the agent, it keeps the set of payoffs associated with that subset of the opponent's actions.
• With some small probability the algorithm will explore in the current iteration – that is, it will draw a number $i \in \{1, 2, \ldots, T\}$ (distributed uniformly and independently) and in round $i$ of the iteration, it will play a random action (distributed uniformly and independently) and count the number of times each distinct payoff
was encountered for each action in each state during the sampling. After sampling, it will play reset and start the next iteration.
• When updating its model, for each stage game, the algorithm tries to find a refinement of the partition of the opponent's actions so that payoffs with sufficiently different counts⁹ are in different groups, and payoffs with similar counts are in the same group (and the new partition is the same for all rows).
• We prove that, with high probability, if there are two groups of actions that the opponent used a sufficiently different number of times, we will be able to separate the respective payoffs correctly in all rows – we will learn something about the game matrix.
• Otherwise, the difference between the number of times that the opponent used actions that are in a given element of the partition is small. Note that when constructing the tentative model, the algorithm treats each element as a single meta-action, takes the payoff for the agent when the opponent plays this meta-action to be the average of the distinct payoffs associated with it, and takes the transition function for this meta-action to be uniformly distributed over the successive states for the payoff values that are associated with it.¹⁰ Given that the above difference is small, the algorithm obtains a sufficiently high payoff when using this model.

We assume that the agent knows a priori the following parameters of the problem:
• $|S|$ – the number of states in the multi-stage game.
• $|X_1|, |X_2|$ – the sizes of the strategy sets (of the agent and the opponent respectively).
• $U_{max}$ – the largest possible payoff for the agent in the game.
• $\epsilon, \delta$ – accuracy parameters.

We will also use the following notation:
• $\beta, \gamma$ – two parameters that control the behavior of the algorithm (to be determined later).
• $S' = (S \times \{1, \ldots, T\}) \cup \{0\}$ is the extended set of states. Note that we add a fictitious state 0 to the model, and we treat being in the same state at different times of the $T$-step sequence as being in different states.¹¹
• Let $C_s$ be a variable that holds the counters that the algorithm maintains for a stage game $s \in S'$. Specifically, $C_s : X_1 \times [0, U_{max}] \to \mathbb{N}$ is a function that maps the distinct payoff values for each row to the number of times they were encountered while sampling that row. We denote by $C$ the set of all such variables.
• Let $\Omega_s$ and $\phi_s$, for $s \in S'$, be two variables that represent the partial knowledge that the algorithm has regarding the game matrix (of stage game $s$). Specifically, $\Omega_s$ is a partition of the opponent's action set and $\phi_s : X_1 \times \Omega_s \to 2^{[0, U_{max}]}$ is a function that maps (for each row) elements of the partition to associated groups of payoff values. Note that the initial state of "no knowledge" is represented by $\forall s \in S' : \Omega_s = \{X_2\}$ and $\forall s \in S', i \in X_1 : \phi_s(i, X_2) = \emptyset$, and a state of complete and accurate knowledge is represented by $\Omega_s = \{\{j\} \mid j \in X_2\}$ and $\phi_s(i, \{j\}) = \{U_{ij}\}$.
• Let $f_{tr} : S' \times X_1 \times [0, U_{max}] \to S'$ be the transition function that the algorithm maintains (since the game is generic, utility values can be used in place of opponent actions).
⁹ We use the word "count" to denote the number of times the algorithm encountered a specific payoff value while sampling, as opposed to the actual number of times the respective action was used by the opponent.
¹⁰ Note that although the real transition function is deterministic, the algorithm uses a stochastic game as a tentative model.
¹¹ This distinction is required, since the optimal policy must be able to treat these states differently.
• Let $l_s(\omega)$, for $\omega \in \Omega_s$, be a variable that holds the number of times a stage game $s \in S'$ has been played with the opponent using an action in $\omega$ since the last time the partition $\Omega_s$ was refined. We denote by $l$ the set of all such variables.

Now we define the algorithm:

Procedure RecordAndReset(s, s', x, u, Ωs, φs, ftr, C, l)
    Let ω' ∈ argmin_{ω ∈ Ωs} min_{u' ∈ φs(x, ω)} Cs(x, u')
    Let φs(x, ω') := φs(x, ω') ∪ {u}
    Let ftr(s, x, u) := s'
    For all s'' ∈ S':
        ∀x ∈ X1, u ∈ [0, Umax]: Cs''(x, u) := 0
        ∀ω ∈ Ωs'': ls''(ω) := 0
    End for
End procedure

// Initialization
For all s ∈ S', x1 ∈ X1 \ {reset}:
    Let ftr(s, x1, Umax) := 0
For all s ∈ S':
    Let ftr(s, reset, Umax) := (start, 1)
For all s ∈ S':
    Ωs := {X2}; ls(X2) := 0
    For all x ∈ X1: φs(x, X2) := ∅
    For all x ∈ X1, u ∈ [0, Umax]: Cs(x, u) := 0
End for

While true   // Endless loop
    // Model update
    For all s ∈ S':
        For each ω ∈ Ωs:
            For each row i ∈ X1:
                Let (u_{i1}, ..., u_{i|ω|}) be the elements of φs(i, ω), ordered in non-decreasing order of Cs(i, u).
                If |φs(i, ω)| < |ω| then
                    // the number of observed payoffs for ω is less than the number of actions in ω
                    Add (|ω| − |φs(i, ω)|) entries of Umax + 1 to φs(i, ω) (with count 0).¹²
                End if
            End for
            If there exists 1 < k ≤ |ω| such that ∀i ∈ X1: Cs(i, u_{ik}) − Cs(i, u_{i(k−1)}) > 2γ ls(ω)^{3/4} then
                // Here we refine the partition
                Split ω = {y1, ..., y_{|ω|}} into ω1 = {y1, ..., y_{k−1}} and ω2 = {y_k, ..., y_{|ω|}}.
                Replace ω with ω1, ω2 in Ωs.
                Modify φs so that ∀i ∈ X1: φs(i, ω1) = {u_{i1}, ..., u_{i(k−1)}} \ {Umax + 1} and φs(i, ω2) = {u_{ik}, ..., u_{i|ω|}} \ {Umax + 1}
            End if
        End for
        Repeat the previous loop until no more splits are made.
        If any split was made:
            ∀x ∈ X1, u ∈ [0, Umax]: Cs(x, u) := 0
            ∀ω ∈ Ωs: ls(ω) := 0
    End for
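As a small illustration of the split test in the model-update loop above (our own sketch; `find_split` is a name we introduce): the payoffs of a partition element are ordered by their counters in every row, and the element is split at the first index where the counter gap exceeds $2\gamma\, l_s(\omega)^{3/4}$ in all rows simultaneously.

```python
def find_split(row_counts, gamma, l_omega):
    """row_counts[i] holds the counters of row i's payoffs in one partition
    element, sorted in non-decreasing order.  Returns the split index k
    (1 <= k < size), or None when no gap clears the threshold in every row."""
    threshold = 2.0 * gamma * l_omega ** 0.75
    size = len(row_counts[0])
    for k in range(1, size):
        if all(row[k] - row[k - 1] > threshold for row in row_counts):
            return k
    return None
```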
    Build a stochastic game in which:
        S' is the set of states.
        The game matrix U's ∈ ℝ^{|X1|×|Ωs|} for each state s ∈ S' is:
            For each i ∈ X1, ω ∈ Ωs:
                Let U'_{iω} = (1/|ω|) ( Σ_{u_i ∈ φs(i, ω)} u_i + (|ω| − |φs(i, ω)|) Umax )   // See ¹³
        The game matrix for the state 0 is U'_0 ∈ ℝ^{|X1|×|X2|}:
            For all x1 ∈ X1, x2 ∈ X2: U'_{x1, x2} = Umax
        For all states s ∈ S', t ∈ S' \ {0}, agent action x ∈ X1 and opponent meta-action ω ∈ Ωs, the transition probability is:
            Pr(s, x, ω, t) = (1/|ω|) · |{u ∈ φs(x, ω) : ftr(s, x, u) = t}|
        For s ∈ S', x ∈ X1, ω ∈ Ωs, the probability of transition to state 0 is:   // See ¹⁴
            Pr(s, x, ω, 0) = (|ω| − |φs(x, ω)|) / |ω|
        For the state 0, the transition function is:
            ∀x1 ∈ X1, ω ∈ Ω0: Pr(0, x1, ω, 0) = 1
            ∀x1 ∈ X1, ω ∈ Ω0, s ∈ S': Pr(0, x1, ω, s) = 0
    Compute the T-step mixed safety-level strategy for this stochastic game.
    Let explore be a random boolean value with P(explore = true) = β
    Let i be an integer selected from [1, T] with uniform probability
    Repeat for t from 1 to T:
        Let s denote the current state.
        If explore = true and t = i:
            Let x ∈ X1 be an action selected at random with uniform probability
            Execute action x → let u be the observed payoff and s' the new state.
            Let Cs(x, u) := Cs(x, u) + 1
            If there is no ω ∈ Ωs with u ∈ φs(x, ω):
                Call RecordAndReset(s, s', x, u, Ωs, φs, ftr, C, l)
                Break   // T-step Repeat
            End if
        Else:
            Let x be the action prescribed by the safety-level strategy for the current state and step.
            Execute action x → let u be the observed payoff and s' the new state.
            If there is no ω ∈ Ωs with u ∈ φs(x, ω) then
                Call RecordAndReset(s, s', x, u, Ωs, φs, ftr, C, l)
                Break   // T-step Repeat
            End if
        End if
    End   // T-step Repeat
    Play reset
End while

¹² Those values are just placeholders for unknown values – we could use any impossible value here.
¹³ Here we again make an optimistic assumption that payoffs yet unobserved are equal to Umax.
¹⁴ Here we make an optimistic assumption that transitions yet unobserved lead to the "heaven" state 0.
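The optimistic estimates used when building the tentative stochastic game can be sketched as follows (our own code, with invented names; cf. footnotes 13 and 14): unobserved payoffs count as $U_{max}$ and unobserved transitions send their probability mass to the "heaven" state 0.

```python
def meta_action_payoff(observed_payoffs, omega_size, u_max):
    """Optimistic payoff of a meta-action: observed payoffs averaged with
    U_max standing in for each payoff not yet observed (footnote 13)."""
    missing = omega_size - len(observed_payoffs)
    return (sum(observed_payoffs) + missing * u_max) / omega_size

def meta_action_transitions(observed_payoffs, f_tr, s, x, omega_size):
    """Optimistic successor distribution of a meta-action: each recorded
    payoff votes for its known successor; the unobserved remainder goes
    to the fictitious state 0 (footnote 14)."""
    probs = {0: (omega_size - len(observed_payoffs)) / omega_size}
    for u in observed_payoffs:
        t = f_tr(s, x, u)
        probs[t] = probs.get(t, 0.0) + 1.0 / omega_size
    return probs
```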
4 The analysis
Let $\tilde{H}^\infty = (X \times \{\mathrm{true}, \mathrm{false}\} \times X_1)^{\mathbb{N}}$ be the set of all infinite histories of the game that includes information about the realization of the random decision variables used by the algorithm (whether an exploration has been done in a specific round and the row chosen for exploration). The true multi-stage game $M$ together with both players' policies generates a probability measure over $\tilde{H}^\infty$, which can be described uniquely by its values for finite cylinder sets. All random variables that we use in this analysis are derived from this probability measure. We show:

Theorem 1. Given a multi-stage game that conforms to the requirements set in Section 2, and $\epsilon > 0$, $\delta > 0$, there exists $\hat{l} = \mathrm{poly}(|X_1|, |X_2|, T, |S|, U_{max}, \frac{1}{\delta}, \frac{1}{\epsilon})$ such that for any policy of the opponent, the above algorithm achieves in every round $l \geq \hat{l}$ an expected average (over all rounds since the start of the game) payoff of at least $(1-\epsilon)V(T)$ with probability at least $1-\delta$, where $V(T)$ is the maximal expected average payoff that can be guaranteed after playing $T$ steps.

To prove the theorem, we need the following notation:
1. Let $(l_1, l_2, \ldots)$ be the indices of the rounds of the multi-stage game at which the algorithm updates the partition and/or records a new payoff value for one of the states (note that there can be at most $|X_2||S|T + |X_1||X_2||S|T$ such rounds), and let us divide the rounds of the game into epochs $((0, \ldots, l_1-1), (l_1, \ldots, l_2-1), (l_2, \ldots, l_3-1), \ldots)$.
2. For brevity, we will denote by $Q_1, Q_2, \ldots$ values that are polynomial with respect to the problem parameters and are constant throughout the execution of the algorithm. In particular, we will denote by $Q_1 = (|X_1|+1)|X_2||S|T + 1$ the maximal number of epochs.
3. For a given stage game $s \in S'$ and epoch $e = (l_i, \ldots, l_{i+1}-1)$, let $C_{s,e}^{l}$ be the counter function that the algorithm maintains at round $l$ of the epoch (i.e., at round $l_i + l$ of the game) for the stage game $s$.
4. For an epoch $e$ and for each $j \in X_2$, let $F_{s,e}^{l}(j)$ be the number of times that the stage game $s$ was played in the first $l$ rounds of the epoch and the opponent played $j$. Note that, by definition, the value of $l_s(\omega)$ at round $l$ of the epoch equals $\sum_{j \in \omega} F_s^{l}(j)$.
5. When the epoch under consideration is clear from context, we will omit the subscript $e$.
6. Note that all of the above are random variables.
7. Note that the probability that sampling occurs in a given round of the multi-stage game is $\frac{\beta}{T}$ and the probability that a specific action is sampled is $\frac{\beta}{T|X_1|}$, independent of any other random variables or the actions of the opponent.
8. Therefore, for any stage game $s$ and actions $i \in X_1$, $j \in X_2$, the expected value of the counter maintained by the algorithm, $C_s^{l}(i, U_{ij})$, given the value of $F_s^{l}(j)$, is $\frac{\beta F_s^{l}(j)}{T|X_1|}$.
The following lemma shows that the counters maintained by our algorithm represent in an adequate manner the frequency with which actions are used by the opponent.

Lemma 1 (Counter accuracy). Let us examine a specific epoch $e = (l_k, \ldots, l_{k+1}-1)$. There exists $\gamma = \mathrm{poly}(|S|, T, \frac{1}{\delta}, |X_1|, |X_2|)$ such that for any policy of the opponent and any $0 < \beta < 1$:

$$\Pr\left[\exists l \in \mathbb{N},\ \exists i \in X_1,\ \exists s \in S',\ \exists j \in \omega \in \Omega_s :\ \left|C_s^{l}(i, U_{ij}) - \frac{\beta F_s^{l}(j)}{T|X_1|}\right| \geq \gamma\, l_s(\omega)^{3/4}\right] \leq \frac{\delta}{Q_1}$$
The intuition here is that, since the sampling is independent of any action of the adversary and any other action of the algorithm, the counters collected by the algorithm result from a representative sample of the opponent's actions, and therefore yield a reliable estimate of the number of times the opponent used the respective action. Technically, it is proved using the Azuma bound ([4]). The proof is omitted due to lack of space, and appears in the full version.

Given the above, from now on, we will assume as given that for all epochs in which the game is not yet fully known:

$$\forall l \in \mathbb{N},\ \forall i \in X_1,\ \forall s \in S',\ \forall j \in \omega \in \Omega_s :\ \left|C_s^{l}(i, U_{ij}) - \frac{\beta F_s^{l}(j)}{T|X_1|}\right| < \gamma\, l_s(\omega)^{3/4} \qquad (1)$$

(that is, the negation of the inequality in the above lemma holds for all states and rounds of play in the epoch). Using this assumption, we show that the algorithm achieves the required expected average payoff against any policy of the opponent with probability 1. The following pair of lemmas shows that (under the above assumption) the algorithm refines the information structure appropriately.

Lemma 2 (Sufficient condition for split). Given Eq. (1), if at any round $l \in \mathbb{N}$ in a given epoch there exist a stage game $s$ and two actions $y, y' \in \omega \in \Omega_s$ of the opponent such that $\left|F_s^{l}(y) - F_s^{l}(y')\right| > 4\gamma\, l_s(\omega)^{3/4}\, \frac{|X_1||X_2|}{\beta}$, then the algorithm must split $\omega$ in this round.

The intuition here is that since the sampling process is representative, the counters collected in different rows for payoffs that result from the same (hidden) action by the adversary must have similar values. In particular, if the frequencies with which the opponent used two of his actions are sufficiently different, the counters for the respective payoff values will have significantly different values – in all rows. Therefore, the algorithm can safely conclude that those payoff values result from distinct actions by the opponent. The proof is omitted due to lack of space, and appears in the full version.

Lemma 3 (Split correctness). Given Eq. (1), the algorithm never makes a mistake in assigning the payoffs in the "split" phase. Formally, a mistake would mean that in partitioning $\omega$ into $\omega_1$ and $\omega_2$ at round $l$, there are two payoff values $u_1 \in \phi_s^{l}(i_1, \omega)$ and $u_2 \in \phi_s^{l}(i_2, \omega)$ that belong to the same column in the true game matrix (i.e. $\exists j \in X_2 : u_1 = U_{i_1 j}^{s},\ u_2 = U_{i_2 j}^{s}$) and the algorithm assigns $u_1$ to $\phi_s^{l}(i_1, \omega_1)$ and $u_2$ to $\phi_s^{l}(i_2, \omega_2)$.

The intuition here is that given the error margin asserted by Eq. (1), the algorithm cannot, while refining the partition, mistakenly assign a payoff value to a partition element that does not contain the respective opponent action. This is so since the algorithm relies on the counter values when assigning the payoffs, and the counter values are representative of the actual opponent actions so far. The proof is omitted due to lack of space, and appears in the full version.

Lemma 4. Suppose that in a given epoch of length $l$, in all stage games, for any two opponent strategies $j_1, j_2 \in \omega \in \Omega_s$ (which are in the same part of the partition $\Omega_s$) in a given stage game $s \in S'$ it holds that $|F_s^{l}(j_1) - F_s^{l}(j_2)| \leq \frac{\epsilon}{4T|X_2|}\, l_s(\omega)$. Then the expected average payoff of the algorithm in this epoch is at least $\left(1 - \frac{\epsilon}{4}\right)V(T)$.

The intuition here is that as long as the opponent uses some of his actions (roughly) the same number of times, the fact that the algorithm cannot distinguish which payoff belongs to which action (in this set of actions) does not decrease its payoff – the assumption that the payoff for each of those actions is the numerical average of the set of payoffs works well enough. The proof is omitted due to lack of space, and appears in the full version.

The following lemmas deal with the situation where nothing is learned for a "long time", and show that in this case the agent will get a high payoff. The proofs of these lemmas are omitted due to lack of space, and appear in the full paper. Let us denote $Q_2 = \left(\frac{4\gamma \cdot 4T|X_1||X_2|^2}{\beta\epsilon}\right)^{4}$.

Lemma 5. If, during some epoch, there is a stage game $s \in S'$ and two strategies $j_1, j_2 \in \omega \in \Omega_s$ (which are in the same part of the partition $\Omega_s$) such that $l_s(\omega) > Q_2$ and $|F_s^{l}(j_1) - F_s^{l}(j_2)| > \frac{\epsilon}{4T|X_2|}\, l_s(\omega)$, then a split will occur (and the epoch will end).

Lemma 6. Let $l$ denote the length (in rounds) of an epoch. Suppose that $l \geq \frac{1}{1-\beta}\, \frac{4U_{max}}{\epsilon}\, Q_2 |S| T^2 |X_2|$; then the expected average payoff in this epoch is at least $(1-\beta)\left(1 - \frac{\epsilon}{2}\right)V(T)$.

The intuition here is that, if the algorithm did not refine any of the partitions for a long time, then, for each partition element, the opponent must have used the different actions in this partition element a similar number of times. The key observation is that the bound on the difference in frequency of use of the actions that is implied by Lemma 2 is $O(l_s(\omega)^{3/4}) < O(l_s(\omega))$, and therefore, if an epoch is longer than some polynomial, the relative difference in frequency of use (relative to the overall length of the epoch) will become small enough for Lemma 4 to hold. The proof is omitted due to lack of space, and appears in the full version.

Let us denote $Q_3 = \frac{1}{1-\beta}\, \frac{4U_{max}}{\epsilon}\, Q_2 |S| T^2 |X_2|$. Combining the above, we can now prove our main theorem:

Proof. Let us select $\beta = \epsilon/4$. It follows from the previous lemmas that the expected average payoff of any epoch that is longer than $Q_3$ is at least $\left(1 - \frac{3\epsilon}{4}\right)V(T)$. Recall that there are at most $Q_1$ epochs and therefore the maximal total length of epochs that contain fewer than $Q_3$ rounds is $Q_1 Q_3$. This means that if the algorithm runs for at least $\hat{l} = \frac{4}{\epsilon} Q_1 Q_3$ rounds, the expected average payoff is at least $\frac{\hat{l} - Q_1 Q_3}{\hat{l}}\left(1 - \frac{3\epsilon}{4}\right)V(T) = \left(1 - \frac{\epsilon}{4}\right)\left(1 - \frac{3\epsilon}{4}\right)V(T) \geq (1-\epsilon)V(T)$.
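As a quick check of the closing arithmetic (our own expansion, with $\hat{l} = \frac{4}{\epsilon} Q_1 Q_3$):

```latex
\frac{\hat{l}-Q_1Q_3}{\hat{l}}\left(1-\frac{3\epsilon}{4}\right)V(T)
  = \left(1-\frac{\epsilon}{4}\right)\left(1-\frac{3\epsilon}{4}\right)V(T)
  = \left(1-\epsilon+\frac{3\epsilon^{2}}{16}\right)V(T)
  \;\geq\; (1-\epsilon)\,V(T).
```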
REFERENCES
[1] I. Ashlagi, D. Monderer, and M. Tennenholtz, 'Robust learning equilibrium', in Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI 2006), (2006).
[2] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire, 'The non-stochastic multi-armed bandit problem', SIAM J. Comput., 32, 48–77, (2002).
[3] R. Aumann and M. Maschler, Repeated Games with Incomplete Information, MIT Press, 1995.
[4] K. Azuma, 'Weighted sums of certain dependent random variables', Tôhoku Math. Journal, 19, 357–367, (1967).
[5] A. Banos, 'On pseudo games', The Annals of Mathematical Statistics, 39, 1932–1945, (1968).
[6] M. Bowling and M. Veloso, 'Rational and convergent learning in stochastic games', in Proc. 17th IJCAI, pp. 1021–1026, (2001).
[7] R. I. Brafman and M. Tennenholtz, 'R-max – a general polynomial time algorithm for near-optimal reinforcement learning', Journal of Machine Learning Research, 3, 213–231, (2002).
[8] N. Hyafil and C. Boutilier, 'Regret minimizing equilibria and mechanisms for games with strict type uncertainty', in Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI-04), pp. 268–277, Arlington, Virginia, (2004). AUAI Press.
[9] N. Megiddo, 'On repeated games with incomplete information played by non-bayesian players', Int. J. of Game Theory, 9, 157–167, (1980).
[10] R. Powers and Y. Shoham, 'New Criteria and a New Algorithm for Learning in Multi-Agent Systems', in NIPS 2004, (2004).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-443
A Heuristic Based Seller Agent for Simultaneous English Auctions
Patricia Anthony¹ and Edwin Law²

Abstract. The popularity of online auctions stems from the flexibility and convenience that they offer to consumers. Sellers offer a variety of items for sale with the aim of obtaining more profit. To obtain a reasonable profit, the reserve price for an item must be determined before the item is put up for sale. However, setting the price too high may result in no sale, while setting the price too low may yield a lower profit. In real auctions, this is the main selling problem, since most sellers fail to place a strategic reserve price for a given item, which results in a lower profit. In this work, we develop a seller agent that proposes the item's reserve price based upon several selling constraints: the number of competitors, the number of bidders, the duration of the auction and the degree of profit that the seller desires when disposing of the item. This paper describes the detailed design and implementation of our agent's strategy. The seller strategy is evaluated across a diverse and varying selling environment using a simulated auction marketplace.
1 INTRODUCTION
Online auctions have been increasingly used as a medium to sell a variety of items via the Internet. Among the most popular auction houses are eBay, UBid, Bidbay, Yahoo, and Amazon.com. However, eBay has become the most popular auction site and emerged as the online market leader in 2007, reporting the highest growth rate with sales totalling USD$16.2 billion.³ An online auction is the process of buying and selling goods by offering them up for bid, taking bids and then selling the item to the winning bidder.⁴ There are four main types of auction used for single-object auctions, where a single unit of an item is offered. These are the English auction, the Dutch auction, the first-price sealed-bid auction and the second-price sealed-bid auction (also known as the Vickrey auction [9]). Among the frequent selling formats used for these auctions are the no reserve price auction (the item being auctioned does not have any reserve price), the public reserve price auction (the reserve price for the item is publicly announced), the private reserve price auction (the item's reserve price is known only to the seller) and the buy-it-now auction (the user can purchase the item at a fixed price while the auction is progressing) [3]. Since there can be multiple auctions trading the same item, the pricing of the good is the most essential factor that must be considered if the seller wishes to get more profit. Many sellers have found that setting a high price for an item may not result in a sale. However, setting a low price may result in the item being disposed of at
¹ Universiti Malaysia Sabah, Malaysia, email: panthony@ums.edu.my
² Universiti Malaysia Sabah, Malaysia, email: jazedwin@gmail.com
³ http://news.ebay.com/fastfacts ebay marketplace.cfm
⁴ http://en.wikipedia.org/wiki/Auction
a low price, and there is a possibility that the item is sold at a value below the market price. Since the auction environment is highly complex, dynamic and unpredictable, setting a strategic reserve price is not a straightforward process. There are several factors that need to be considered when deciding on the single optimal reserve price [1]. Firstly, we are uncertain of the number of competitors who are competing in selling identical objects. Secondly, we cannot know the number of bidders who may participate in each auction, since there are a number of auctions that run simultaneously in which the bidders can choose to participate. Thirdly, each ongoing auction has a different duration. Fourthly, each auction imposes a different reserve price for the item being offered. To date, researchers have worked on generating the seller's reserve price, since the role of the reserve price has received much attention both in single isolated auctions and in cases where sellers compete against each other. Gerding et al. worked on optimizing sellers' profit using the strategy of shill bidding, where sellers submit a shill bid to increase their profit and to ensure that they do not sell the item at a low price [2]. However, shill bidding is undesirable and has become a common form of Internet auction fraud that undermines the trust and sales revenues of institutions. Morris et al. developed a reserve price strategy and a seat releasing strategy based on past history in a sealed-bid auction [7]. These two strategies work simultaneously: the reserve price for the seats is determined based on the number of seats available, and today's reserve price is computed based on yesterday's reserve price. However, the strategy only works well if demand remains constant, since it does not cater for demand prediction and movements. Moreover, poor sales may occur if the first day's reserve price was inaccurately computed. Min et al. proposed an agent-based system to generate the reserve price based on the case similarity of information retrieval theory and the moving average of time series analysis [6]. This technique has a tendency to produce an unreasonable reserve price that fails to reflect the recent trend of auction prices, which may harm sellers. Most of the previous work is targeted at obtaining a high winning price, while other selling aspects such as the selling rate and the true market value of the item being auctioned have been ignored. To solve these problems, we developed an intelligent selling agent that is able to generate a reserve price for the seller by taking into account the selling constraints. In this work, we focus on the English auction using the private reserve price format, since this kind of auction is commonly practiced on the eBay auction site [5, 8]. The main purpose of our work is to generate a reserve price that guarantees a reasonable profit within a given time frame. The remainder of the paper is structured as follows. Section 2 explains the simulated marketplace used in our experiment. Section 3
describes the implementation and the design of the selling strategy that generates the reserve price. In Section 4, we describe the experimental evaluations and finally, Section 5 concludes.
2 THE SIMULATED ONLINE AUCTION MARKETPLACE
The electronic marketplace simulation supports three types of protocol: English, Dutch and Vickrey. However, for this particular work, it is configured to run multiple auctions using the English protocol only. The simulated marketplace serves as a platform to simulate and replicate the real online auction environment, in which there are multiple buyers and sellers participating in the marketplace. This platform is also used to measure and evaluate the appropriateness and suitability of our seller agent's strategy. In this work, the market is set up to run in continuous selling rounds where, in each round, there are a number of auctions running simultaneously until the global time for the market is reached. It is assumed that each auction is offering the same identical item and that only one single unit is being offered. The number of auctions is generated randomly, based on a standard probability distribution, between 2 and 30. This number constitutes the number of competing sellers offering the same identical item for sale. For each auction, there are between 2 and 15 bidders. Each auction has a selling duration, randomly generated between 1 and 30. The reserve price (the minimum price at which the seller is willing to sell the item) is randomly generated between 50 and 90. Each bidder has their own private valuation, which is the maximum price they are willing to pay for the desired item; this private valuation is randomly generated between 50 and 90. All these values are randomly drawn from a standard probability distribution. Each auction has a finite start time and a finite end time. The auction starts with an opening price and each bidder bids for the item by raising the bid price. The bidders follow the standard dominant bidding strategy, in which they will only bid slightly higher than the current price, as described in auction theory [4]. It is also assumed that the English auction is a private reserve auction, in which the bidders are informed when the current bid has exceeded the reserve price. When this information is announced, the bidders change their tactics by bidding the smallest possible price to avoid overpaying for the item. This scenario is similar to an eBay auction. The marketplace remains active until the global time is reached and all auctions are closed. There are several events that can happen once an auction is closed. If the closing price is less than the reserve price of the item, the auction is closed with no trade. Otherwise, the winner and the auction closing price are announced and a trade takes place between the seller and the winning bidder.
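A minimal sketch of one selling round under the ranges above (our own code; we read "standard probability distribution" as uniform draws, which is an assumption):

```python
import random

def new_selling_round():
    """Draw the parameters of one selling round of the simulated marketplace."""
    num_auctions = random.randint(2, 30)   # competing sellers, identical item
    auctions = []
    for _ in range(num_auctions):
        num_bidders = random.randint(2, 15)
        auctions.append({
            "duration": random.randint(1, 30),
            "reserve_price": random.uniform(50, 90),   # private reserve
            "valuations": [random.uniform(50, 90)      # bidders' private values
                           for _ in range(num_bidders)],
        })
    return auctions
```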
3 DESIGNING THE AGENT'S SELLING STRATEGY

There are several factors that need to be considered when deciding the reserve price for the item. Rationally, the price should be determined based on the supply and demand in the market. The first factor is the number of competitors offering the same item at the same time. The key determinant of what price to offer is how many competitors are selling the same item in the marketplace. In reality, a high price must be set when there are only a few competitors (low supply) in the market. However, a low price must be considered when there are many competitors (high supply). The second factor that needs to be considered is the number of bidders participating in each auction. In economic terms, a low price is imposed when there are few bidders (low demand) and a high price is set when there are many bidders (high demand) in the market. The third factor that affects the price setting is the selling duration of each auction. An auction with a longer duration allows the possibility of eliciting a higher bid price. As such, the seller should impose a higher reserve price for the item being auctioned. However, a low price must be considered for an auction that has a shorter duration, since this limits the chances of selling the item at a higher bid value. The last factor being considered is the level of profit that the seller desires. If the seller's intention is to get rid of the item, a lower pricing strategy is inevitable to optimize the chance of selling. However, a high pricing strategy must be considered if the seller intends to obtain a higher profit and the best price for the item being sold.

The set of considerations comprising the number of competitors, the number of bidders, the level of profit the seller desires, and the auction duration is referred to as the selling constraints. More formally, let $C$ be the set of considerations that the agent takes into account when generating the reserve price, and let $j$ represent an individual selling constraint, where $j \in 1..|C|$. For each constraint $j \in 1..|C|$, there is a corresponding function $f_j$ which suggests a value of the reserve price based on that particular constraint. At a given time, the agent may consider any of the selling constraints individually, or it may combine them depending on the situation. If the agent combines multiple selling constraints, it allocates weights to denote their relative importance. Here, the weights are rated on a scale $0 \le w_j \le 1$ with $\sum_j w_j = 1$, where $w_j$ is the weight allocated to constraint $j$. Given the set of constraints $C$, the reserve price $v$ is calculated as

$$v = \sum_{j \in C} w_j f_j \qquad (1)$$

3.1 The Competitor Function

Assume that $n$ is the number of competitors at a given selling round $r$. Let $f_c$ be the function that determines a single price based on the number of sellers, where $p$ is the mean price for a given number of competitors. $f_c$ is then defined as

$$f_c(n) = p(n) \qquad (2)$$

3.2 The Bidder Function

Similarly, assume that $n$ is the number of bidders in a given auction. Let $f_b$ be the function that determines a single price based on the number of participating bidders, where $p$ is the mean price for a given number of bidders. $f_b$ is then defined as

$$f_b(n) = p(n) \qquad (3)$$

3.3 The Time Function

Assume that $t$ is the auction length for a given auction. Let $f_t$ be the function that determines a single price based on the duration for which the auction will be held. In a real auction, this information is stated by the seller. The parameter $p$ is the mean price for a given auction length, and hence $f_t$ is defined as

$$f_t(t) = p(t) \qquad (4)$$
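Equations (1)-(4) combine mechanically; a small sketch (with names of our own choosing) of the weighted combination in Eq. (1):

```python
def reserve_price(weights, functions, inputs):
    """Eq. (1): v = sum_j w_j * f_j, with the weights summing to one.
    weights, functions and inputs are keyed by constraint name."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(w * functions[j](inputs[j]) for j, w in weights.items())

# Example: equal weighting of all four constraints.
# weights = {"competitor": 0.25, "bidder": 0.25, "time": 0.25, "profit": 0.25}
```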
3.4 The Profit Function
Assume that $n$ is the percentage of profit that the seller desires. Let $f_p$ be the function that determines a single price based on a given percentage between 0% and 20%. This percentage is kept small to capture the range of profit percentages under a more realistic assumption. The single price generated using this function is formed in two stages: the reserve price is estimated from the past auction history, and the price is then inflated according to the percentage of profit that the seller desires. Here, we define $f_p$ as the function that determines a single price based on the profit that the seller desires, $r$ is the reserve price, and $n$ is the percentage of profit desired. In order to generate the reserve price $r$, we analyzed the bidding history of all the successful auctions that were completed with a sale. For all closed auctions with a sale, the lower bound price, which is the first minimum bid that has met the reserve price, is defined as the Lowest Traded Price $\lambda$, whereas the closing price is termed the Highest Traded Price $\rho$. We then calculate the mean prices $\chi$ and $\delta$ by accumulating the $\lambda$ and $\rho$ values and dividing by the total number of successful auctions $M$:

$$\chi = \frac{1}{M}\sum_{i=1}^{M} \lambda_i \qquad (5)$$

$$\delta = \frac{1}{M}\sum_{i=1}^{M} \rho_i \qquad (6)$$

Assume that $N \le M$ is the number of auctions which recorded $\lambda$ and $\rho$ that exceeded $\chi$ and $\delta$. Here, $\alpha$ (defined as the estimated minimum price once the reserve price is met) is calculated by accumulating all the $\lambda$s among these outstanding auctions $N$ where $\forall \lambda \ge \chi$. To calculate $\beta$ (the fraction of price that lies between the maximum price and the minimum price once the reserve price is met), the price difference between the $\lambda$ and $\rho$ of each auction is calculated and divided by the $\lambda$, and this is accumulated over all $N$ auctions where $\forall \rho \ge \delta$:

$$\alpha = \sum_{i=1}^{N} \lambda_i \qquad (7)$$

$$\beta = \sum_{i=1}^{N} \frac{\rho_i - \lambda_i}{\lambda_i} \qquad (8)$$

Finally, the reserve price $r$ is calculated as

$$r = \frac{1}{N^2} \times \alpha \times (N - \beta) \qquad (9)$$

The reserve price is then inflated based on the percentage of profit desired, $n$:

$$f_p(n) = r \times (1.00 + n) \qquad (10)$$
In each round, a different reserve price r is generated and this information is updated with each successful auction.
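Putting Eqs. (5)-(10) together (a sketch of ours; how exactly the $N$ outstanding auctions are selected for Eqs. (7) and (8) is our reading of the text):

```python
def profit_function(history, n):
    """Eqs. (5)-(10): reserve price from past successful auctions, inflated
    by the desired profit fraction n (0 <= n <= 0.20).
    history is a list of (lowest_traded, highest_traded) price pairs."""
    M = len(history)
    chi = sum(lam for lam, _ in history) / M            # Eq. (5)
    delta = sum(rho for _, rho in history) / M          # Eq. (6)
    outstanding = [(lam, rho) for lam, rho in history
                   if lam >= chi and rho >= delta]
    N = len(outstanding)
    if N == 0:
        return 0.0                                      # no outstanding auctions yet
    alpha = sum(lam for lam, _ in outstanding)                     # Eq. (7)
    beta = sum((rho - lam) / lam for lam, rho in outstanding)      # Eq. (8)
    r = alpha * (N - beta) / N ** 2                                # Eq. (9)
    return r * (1.0 + n)                                           # Eq. (10)
```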
4 EXPERIMENTAL EVALUATION
To evaluate the performance of our agent using the selling strategy described above, we performed several experimental evaluations. The objective of these experimental evaluations is to examine the efficiency and effectiveness of our seller’s strategy in achieving and
delivering the desired selling aims. In order to evaluate the effectiveness of the selling strategy, four different measures are used. Firstly, the success rate, which is the number of times the seller agent is able to sell the item. Secondly, the total profit made by the agent, measured as a percentage. The third measure is the average winning price over all the auctions won by the seller agent. The last measure is the percentage of gain/loss with respect to the market price for the item being sold by the seller agent. The gain/loss is calculated by taking the closing price of a given auction minus the average closing price.

In this experiment, the performance of the seller agent across diverse and varied selling environments is also evaluated, in order to investigate the suitability of our strategy in various environments. The seller agent is subdivided into six individual agents. The difference from one seller agent to another is in the distribution of weights over each of the four functions. The purpose of configuring the experimental setup this way is to identify which strategy is best suited to a given environment.

The environment is classified into seven different settings, categorized as LC, MC, LB, MB, ST, LT and Rand, based on the number of competitors, the number of bidders and the duration of the auction (summarized in the sketch below). The first environment is categorized as less competitors (LC) and has between 2 and 15 competitors; in other words, there are between 2 and 15 auctions selling the identical item in the marketplace. For this environment, the total number of bidders is drawn randomly between 2 and 15, while the auction duration is drawn randomly between 1 and 30. The second environment is defined as many competitors (MC), in which there are between 16 and 30 competitors running concurrently in the marketplace. As in the first environment, the other parameters are generated randomly. The third environment is defined as less bidders (LB), where the number of bidders for each auction in the marketplace is between 2 and 8. The remaining parameters are drawn randomly, between 1 and 30 for the number of competitors and between 1 and 30 for the auction duration. The fourth environment is defined as many bidders (MB), with 9 to 15 bidders. The fifth environment is defined as short time (ST), in which the duration of each auction is between 1 and 15. Similarly, all other parameters are drawn randomly. The sixth environment is defined as long time (LT), where the duration of each auction in the marketplace is between 16 and 30 and all other parameters are generated randomly. Lastly, in the random environment (Rand), the number of competitors, the number of bidders, and the timing are generated randomly between 2 and 30, 1 and 15, and 1 and 30 respectively.
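The seven settings can be summarized as parameter ranges (our own tabulation; where the text says a parameter is "drawn randomly" without giving a range, we fill in the Section 2 defaults as an assumption):

```python
# (low, high) inclusive ranges for each environment.
ENVIRONMENTS = {
    "LC":   {"competitors": (2, 15),  "bidders": (2, 15), "duration": (1, 30)},
    "MC":   {"competitors": (16, 30), "bidders": (2, 15), "duration": (1, 30)},
    "LB":   {"competitors": (1, 30),  "bidders": (2, 8),  "duration": (1, 30)},
    "MB":   {"competitors": (2, 30),  "bidders": (9, 15), "duration": (1, 30)},
    "ST":   {"competitors": (2, 30),  "bidders": (2, 15), "duration": (1, 15)},
    "LT":   {"competitors": (2, 30),  "bidders": (2, 15), "duration": (16, 30)},
    "Rand": {"competitors": (2, 30),  "bidders": (1, 15), "duration": (1, 30)},
}
```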
We defined six different strategies for our seller agents. These six strategies use a combination of varying weights among the four constraints for each environment. The agents' strategies are categorized as One Constraint, Two Constraints, Three Constraints, Four Constraints, Equal Constraints and Unequal Constraints. Agent I uses the One Constraint strategy, where only a single function (with a weighting of 1.0) is used. For example, if the environment is LC/MC, the competitor function is selected; for the LB/MB setting, the bidder function is picked; and similarly the timing function is used for the ST/LT environment. For Agent II, which uses the Two Constraints strategy, a combination of two functions is picked as the strategy. Similar to Agent I, Agent II deploys the strategy that matches the current environment. Agent III (Three Constraints) and Agent IV (Four Constraints) use combinations of three functions and four functions respectively. The Equal Constraints strategy of Agent V uses equal weights for the four constraints (in this case each function is assigned
a weight of 0.25). Agent VI uses the Unequal Constraints strategy, in which it uses all four constraints but the weight for each constraint is varied according to the current environment. The performance of these agents is compared against the performance of a control agent that deploys the No Constraint strategy: it generates a random price between 50 and 90 without considering any selling constraint. Our experiment consists of 2000 runs for the seller agents and the control agent. Running the marketplace 2000 times means that the agents have 2000 chances of selling the item. The performance of each agent is then summed and averaged over these 2000 runs. Figure 1 shows the success rate achieved by the agents. It can be seen that the seller agents (Agent I, Agent II, Agent III, Agent IV, Agent V, Agent VI) outperformed the control agent. All the seller agents produced a high success rate in all environments, achieving a success rate 15% higher than the control agent in every case. This is because the seller agents consider the selling constraints when generating the reserve price for the item, and they are able to auction off the item 80% of the time. However, there is no difference in the performance of the seller agents, indicating that varying the weight of each constraint does not have a significant effect on the success rate of the agents. On the other hand, the control agent failed to achieve a satisfactory success rate, with approximately 60% success in all cases, because it does not take the selling environment (constraints) into account at all.
Figure 1. The Agents’ Success Rate
The profit obtained each time the agent is able to sell is shown in Figure 2. Again, the seller agents were superior in delivering a greater profit compared to the control agent. The seller agents in the MB and LT settings recorded higher profits of 18% and 16% respectively, because a higher price can be elicited in an auction held over a longer duration. In addition, a market with higher demand (MB, in this case more bidders in a given auction) tends to generate stiffer competition, resulting in a higher closing price, which in turn leads to a higher profit. The LB and ST environments recorded lower profits of 10% and 12%, while the LC, MC and Rand settings recorded an average profit of approximately 13%. The strategy of Agent I, which uses a single function, recorded the highest profit in all situations except Rand. This is partly due to the tuning of the strategy to the auction environment. In the Rand setting, the highest profit was recorded by Agent VI, using unequal weights for all four constraints: because the environment is randomized (unknown), the best strategy to deploy is a combination of the strategies with varying weights. In contrast, the control agent recorded the lowest profit in all cases, since it is not sensitive to what is going on around it.
Figure 2. The Agents’ Selling Profit
Figure 3 shows the average winning price (closing price) obtained by all the sellers. The seller agents are able to obtain a higher winning price than the control agent. In this experiment, the seller agents performed best in the LC, MB and LT environments, with winning prices of 79, 81 and 80, compared to the MC, LB and ST settings with only 78, 76 and 78. With decreasing market supply (LC), there is a possibility that the bidders will bid higher, and thus a higher winning price is obtained. Similarly, bidders tend to bid higher due to greater competition when market demand is increasing (MB), resulting in a higher bid price as well. As claimed, an auction with a longer time (LT) is able to elicit a higher bidding price, and thus a higher closing price is observed. This result also implies that a seller agent with a high success rate will most probably obtain a higher profit and a higher winning price. As expected, the control agent failed to obtain a high winning price.

Figure 3. The Agents' Average Winning Price

We measure the average gain/loss of the closing price with respect to the market price, as shown in Figure 4. Agent V, which deploys the Equal Constraints strategy, recorded the highest gain overall across all settings when compared to the other strategies. This implies that, if a seller wishes to sell above the market price, all constraints must be considered with equal weighting. The seller agents in the MC, MB, ST and Rand environments recorded a gain above 1.5%. With increased market supply (MC), bidders tend to bid more slowly, and as a result this may lower the market value. As expected, the seller agents in the Rand setting recorded a high average gain across all the environments. The seller agents in LC and LB recorded the lowest gain, below 0.5%. This could be due to the tendency of the bidders to bid higher when only a few auctions (LC) are available, which raises the market price; the winning price obtained is then very close to the market price, resulting in a low gain. The result obtained in the LB environment could be due to the possibility that only a few bidders are participating and they are not able to raise the price due to low competition, thus lowering the gain. The control agent recorded a
significant loss for all environments except for the MB setting. This indicates that the control agent’s item is always sold at a price below the market value.
Figure 4. The Agents' Average Gain/Loss

In summary, we can conclude that all our seller agents outperformed the control agent in all experiments. The four constraints that we have identified should be considered when deciding the reserve price in order to achieve the selling goals. The results obtained show that in order to achieve optimal performance of the agents' strategies, the weights should be tuned according to the auction environment. Our findings also illustrate that our seller agents were able to perform with satisfactory results in all environments.

5 CONCLUSION AND FUTURE WORK

This paper proposes the design and development of a seller agent that tackles the problems encountered when offering an item for sale by suggesting a strategic reserve price. We propose and establish a pricing strategy based on four selling constraints that involve the competitors, the bidders, the timing and the profit. Each constraint is converted into a function that generates an individual reserve price, and the combination of these values forms the strategy that the seller agent utilizes when auctioning an item. In addition, the strategies that the agents deploy are evaluated under various environments to investigate the appropriateness and suitability of our strategy in broad situations towards wide applicability. The main concern of this work lies in minimizing the tradeoff between delivering an auction with a sale and obtaining profit. There is no direct comparison that can be made between this work and other previous work, in that the measurements we used to evaluate the performance of the agents are entirely different. The performance of the agents was evaluated based on the success rate, the selling profit, the average winning price and the average gain/loss. Based on the results obtained in the experimental evaluation, our seller agents outperformed the control agent in all measurements. This shows that the seller agent's strategy is effective and efficient in generating a reserve price that will guarantee a sale with some profit within a fixed duration. The experimental evaluation clearly demonstrated that all six seller agents produced a higher success rate, a higher profit, a higher winning price and a higher market gain across all the environments when compared to the control agent. Therefore, our selling strategy could be considered as a model for a single-object auction that utilizes the private reserve price under the English auction protocol. In this work, we assume that the seller agent knows the number of bidders and the number of sellers that enter the marketplace. In reality, this information is not known, and this complicates the decision process in generating the reserve price. For future work, a prediction model will be used to estimate and predict the number of bidders and sellers that participate in the market, since this information (supply and demand) is required in computing the reserve price.

ACKNOWLEDGEMENTS

We wish to acknowledge the Ministry of Science, Technology and Innovation Malaysia (MOSTI) for funding this research.

REFERENCES
[1] P. Anthony and J. Dargham, ‘Seller agent for online auctions’, in Proceedings of the Second International Conference on Innovations in Information Technology (IIT’05), (2005). [2] E. H. Gerding, A. Rogers, R. K. Dash, and N. R. Jennings, ‘Sellers competing for buyers in online markets: Reserve prices, shill bids, and auction fees’, in Proceedings of Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), pp. 1287–1293, (2007). [3] R. Katkar and D. Lucking-Reiley, ‘Public versus secret reserve prices in ebay auctions: Results from a pokemon field experiment’, NBER Working Papers 8183, National Bureau of Economic Research, Inc, (March 2001). available at http://ideas.repec.org/p/nbr/nberwo/8183.html. [4] P. Klemperer, ‘Auction theory: a guide to the literature’, Journal of Economic Surveys, 13(3), 227–286, (1999). [5] D. Lucking-Reiley, ‘Auctions on the internet: What’s being auctioned, and how?’, Journal of Industrial Economics, 48(3), 227–252, (2000). [6] J. K. Min and K. L. Yong, ‘Reserve price recommendation by similaritybased time series analysis for internet auction systems’, LNAI, 4251, 292–299, (2006). [7] J. Morris, P. Ree, and P. Maes, ‘Sardine: dynamic seller strategies in an auction marketplace’, in Proceedings of the 2nd ACM Conference on Electronic Commerce (EC-00), pp. 128–134, (2000). [8] E. J. Pinker, A. Seidman, and Y. Vakrat, ‘Managing online auctions: Current business and research issues’, Management Science, 49(11), 1457– 1484, (2003). [9] W. Vickrey, ‘Counterspeculation, auctions, and competitive sealed tenders’, The Journal of Finance, 16(1), 8–37, (1961).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-448
A Truthful Two-Stage Mechanism for Eliciting Probabilistic Estimates with Unknown Costs Athanasios Papakonstantinou and Alex Rogers and Enrico H. Gerding and Nicholas R. Jennings1 Abstract. This paper reports on the design of a novel two-stage mechanism, based on strictly proper scoring rules, that motivates selfish rational agents to make a costly probabilistic estimate or forecast of a specified precision and report it truthfully to a centre. Our mechanism is applied in a setting where the centre is faced with multiple agents, and has no knowledge about their costs. Thus, in the first stage of the mechanism, the centre uses a reverse second price auction to allocate the estimation task to the agent who reveals the lowest cost. While, in the second stage, the centre issues a payment based on a strictly proper scoring rule. When taken together, the two stages motivate agents to reveal their true costs, and then to truthfully reveal their estimate. We prove that this mechanism is incentive compatible and individually rational, and then present empirical results comparing the performance of the well known quadratic, spherical and logarithmic scoring rules. We show that the quadratic and the logarithmic rules result in the centre making the highest and the lowest expected payment to agents respectively. At the same time, however, the payments of the latter rule are unbounded, and thus the spherical rule proves to be the best candidate in this setting.
1 INTRODUCTION

In a world where information can be distributed over systems owned by different stakeholders and accessed by multiple users, it is important to develop processes that will evaluate this information and give some guarantees about its quality. This is particularly important in cases where the information in question is a probabilistic estimate or forecast whose generation involves some cost. Examples include estimates of quality of service within a reputation system, or forecasts of future events such as weather conditions, where such costs could represent the computational task of accessing and evaluating previous interaction records, or that of running a large-scale weather prediction model. Now, when the provider of such information is a rational selfish agent, it may have an incentive to misreport its estimate, or to allocate less costly resources to its generation, if it can increase its own utility by doing so (e.g. by being rewarded for a more precise estimate than it actually provides). Thus, a centre attempting to elicit such information is presented with three challenges. First, it must identify the agent who can provide an estimate of the required precision at the lowest cost. Second, it must incentivise this agent to allocate sufficient costly resources in order to provide an estimate of the required precision. Finally, it must incentivise this agent to truthfully report the estimate that has been generated. Against this background, a number of researchers have proposed the use of 'strictly proper scoring rules' to address these challenges
¹ School of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, UK, email: {ap06r,acr,eg,nrj}@ecs.soton.ac.uk
[1, 5]. Mechanisms using these rules reward accurate estimates or forecasts by making a payment to agents based on the difference between an event's predicted and actual outcome (observed at some later stage). Such mechanisms have been shown to incentivise agents to truthfully report their estimates in order to maximise their expected payment [6]. More recently, strictly proper scoring rules have been used in computer science to promote the honest exchange of beliefs between agents [7], and within reputation systems to promote truthful reporting of feedback regarding the quality of a service experienced [2]. Furthermore, Miller et al. have shown that when the agents' costs are known, it is possible to use an appropriately scaled strictly proper scoring rule to induce agents to commit costly resources to generate estimates of any required precision [4]. While these approaches are effective in the specific cases that they consider, they all rely on the fact that the cost of the agent providing the estimate or forecast is known by the centre. This is not the case in our scenario, where these costs represent private information known only to each individual agent (since they are dependent on the specific computational resources available to the agent). Thus, in addressing this shortcoming, we contribute to the state of the art by presenting a novel two-stage mechanism which relaxes this assumption. The first stage of the mechanism incentivises agents to truthfully reveal their costs to the centre, thus allowing it to select the agent with the lowest cost. The second stage then incentivises this agent to generate an estimate with a minimum required precision, and to truthfully report this estimate to the centre. In more detail, in this paper we extend the state of the art in the following ways:
• We describe a novel two-stage mechanism in which a centre uses a reverse second-price auction in the first stage to elicit the true costs of agents, and hence identify the agent that can provide an estimate with a specified precision at the lowest cost. An appropriately scaled strictly proper scoring rule is then used in the second stage of the mechanism to incentivise this agent to generate and truthfully report the estimate.
• We formally prove that this mechanism is incentive compatible in both the costs and the estimates revealed, and that it is individually rational. That is, agents will truthfully report both costs and estimates to the centre, and willingly participate in the mechanism.
• We empirically evaluate our mechanism by comparing the quadratic, spherical and logarithmic scoring rules in a setting where costs depend linearly on precision. We show that while the logarithmic rule results in the centre making the lowest expected payment to the agent, this payment is unbounded. The other rules are bounded, but result in higher expected payments. Hence, we find that the spherical rule is preferred in our setting.
The rest of this paper is organised as follows: In section 2 we describe our model, and in section 3 we present background on strictly
proper scoring rules. In section 4 we detail our mechanism and formally prove its economic properties, before empirically evaluating it in section 5. We conclude and discuss future work in section 6.

Table 1. Comparison of Quadratic, Spherical and Logarithmic Scoring Rules

Quadratic:
  S(x0; x, θ) = 2N(x0; x, 1/θ) − (1/2)√(θ/π)
  S(θ) = (1/2)√(θ/π)
  S′(θ) = 1/(4√(πθ))
  α = 4c′(θ0)√(πθ0)
  β = c(θ0) − 2θ0 c′(θ0)

Spherical:
  S(x0; x, θ) = (4π/θ)^(1/4) N(x0; x, 1/θ)
  S(θ) = (θ/(4π))^(1/4)
  S′(θ) = (1/4)(4πθ³)^(−1/4)
  α = 4c′(θ0)(4πθ0³)^(1/4)
  β = c(θ0) − 4θ0 c′(θ0)

Logarithmic:
  S(x0; x, θ) = log(N(x0; x, 1/θ))
  S(θ) = (1/2) log(θ/(2π)) − 1/2
  S′(θ) = 1/(2θ)
  α = 2θ0 c′(θ0)
  β = c(θ0) − 2θ0 c′(θ0) ((1/2) log(θ0/(2π)) − 1/2)

2 INFORMATION ELICITATION PROBLEM

We now describe our model in more detail. Specifically, we assume that there is a centre interested in acquiring a probabilistic estimate or forecast (such as an expected quality of service within a reputation system, or a forecast temperature in a weather prediction setting) with a minimum precision θ0, henceforth referred to as the required precision.2 We assume that there are N ≥ 2 rational, risk-neutral agents who can provide the centre with an unbiased but noisy estimate or forecast, x, of precision θ. We model the agents' private estimates as Gaussian random variables such that x ∼ N(x0, 1/θ), where x0 is the true state of the parameter being estimated. Note that this true state is unknown to both the centre and the agents at the time that the estimate is requested, but becomes available to the centre at some time in the future. For example, in a reputation system the actual quality of service received is only known once the service has been procured, and in a weather forecasting setting the actual weather that occurs is observed by the centre at some later date. The agents incur a cost in producing their estimate, and we assume that this cost is a function of the precision of the estimate, c(θ). While the centre has no information regarding the agents' cost functions, we assume that all cost functions are convex (i.e. c″i(θ) ≥ 0), and we note that this is a realistic assumption in all cases where there are diminishing returns as the precision increases. We do not assume that all agents use the same cost function, but we do demand that the costs of different agents do not cross (i.e. the cost ordering of the agents is the same over all precisions). Given this model, the challenge is to design a mechanism that enables the centre to identify the agent that can provide the estimate or forecast at the lowest cost, and to provide a payment to this agent such that it is incentivised to generate the estimate or forecast with a precision at least equal to the required one, and to report it truthfully.

2 Note that we assume that the centre derives no additional benefit if the estimate is of precision greater than θ0.

3 STRICTLY PROPER SCORING RULES

As discussed in the introduction, the problem described above has previously been addressed through the use of strictly proper scoring rules as payments in the case that the agents' cost functions are known to the centre [2, 4]. Before we proceed to the analysis of our mechanism, which is designed for cases where the centre has no knowledge about the costs, we give a brief description of strictly proper scoring rules. As described earlier, such rules are used to calculate a payment to an agent depending on the difference
between an event's predicted and actual outcome. Much of the literature on strictly proper scoring rules concerns three specific rules, the quadratic, spherical and logarithmic rules, given by:

1. Quadratic: S(x0|r(x0)) = 2r(x0) − ∫_{−∞}^{∞} r²(x) dx
2. Spherical: S(x0|r(x0)) = r(x0) / (∫_{−∞}^{∞} r²(x) dx)^(1/2)
3. Logarithmic: S(x0|r(x0)) = log r(x0)

In each case, S(x0|r(x0)) is the payment given to an agent after it has reported its estimate (represented as a probability density function r(x)) and x0 is the actual outcome observed.
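As a concrete illustration, the following Python sketch (our own, with hypothetical function names) evaluates the three rules for a reported Gaussian density N(x; 1/θ) against an observed outcome x0, using the closed form ∫r²(x)dx = √(θ/(4π)) for a Gaussian with precision θ.

```python
import math

def gaussian_pdf(x0, x, theta):
    # density of a Gaussian with mean x and precision theta (variance 1/theta)
    return math.sqrt(theta / (2 * math.pi)) * math.exp(-theta * (x0 - x) ** 2 / 2)

def quadratic_score(x0, x, theta):
    # 2 r(x0) - integral of r^2, where the integral equals sqrt(theta/(4*pi))
    return 2 * gaussian_pdf(x0, x, theta) - math.sqrt(theta / (4 * math.pi))

def spherical_score(x0, x, theta):
    # r(x0) divided by the square root of the integral of r^2
    return gaussian_pdf(x0, x, theta) / (theta / (4 * math.pi)) ** 0.25

def logarithmic_score(x0, x, theta):
    # log r(x0)
    return math.log(gaussian_pdf(x0, x, theta))
```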
3.1 An Incentive Compatible Mechanism
It is a standard property of strictly proper scoring rules that an agent will maximise its expected score (and hence the payment it receives) by reporting its true probabilistic estimate to the centre [1, 3]. Thus, mechanisms based upon them are incentive compatible. Using this result, we can calculate the score that the agent expects to receive, given that it has generated an estimate of precision θ and has truthfully reported it to the centre (as it is incentivised to do). To do so, we first note that, in our case, where estimates are represented by Gaussian distributions, we can replace r(x0) with N(x0; x, 1/θ), and derive new expressions for each of the three scoring rules shown above (these are presented in the first row of table 1). We can then simply integrate over the expected outcome to derive the agent's expected score, S(θ). These results are shown in the second row of table 1, and form the basis of the calculations and proofs that we present in the following sections.
3.2 Eliciting Effort with Known Costs
It should now be noted that the above scoring rules will still be incentive compatible if they undergo an affine transformation. Indeed, Miller et al. show that by using appropriate scaling parameters, and given knowledge of an agent's costs, it is possible to induce an agent to make and truthfully report an estimate with a specified precision, θ0 [4]. In this case, an agent's expected payment, P(θ), is given by:

P(θ) = αS(θ) + β  (1)

and the expected utility of the agent is given by:

U(θ) = αS(θ) + β − c(θ)  (2)

The centre can now choose the value of α such that the agent's utility (its payment minus its costs) is maximised when it produces and truthfully reports an estimate of the required precision, θ0. To do so, it solves dU/dθ|θ0 = 0 to give:

α = c′(θ0) / S′(θ0)  (3)
In rows three and four of table 1 we present this result, and the derivative of the expected score that is required to calculate it, for each of the three strictly proper scoring rules presented earlier.
3.3 An Individually Rational Mechanism
Finally, we now note that in order for an agent to incur the cost of producing an estimate, it must expect to derive positive utility from doing so. Thus, the centre can use the constant β to ensure that it makes the minimum payment to the agent, while still ensuring that the mechanism is individually rational. When costs are known, the centre can do so by making the agents indifferent between producing the estimate or not, by ensuring that U(θ0) = 0, thus giving:

β = c(θ0) − [c′(θ0) / S′(θ0)] S(θ0)  (4)
Again, row five of table 1 shows this result for each scoring rule.
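As an illustration, a small sketch (ours; the function name is hypothetical) computes α and β for the logarithmic rule with a linear cost c(θ) = cθ, using S(θ) = (1/2)log(θ/(2π)) − 1/2 and S′(θ) = 1/(2θ) from Table 1.

```python
import math

def log_rule_scaling(c, theta0):
    # alpha = c'(theta0) / S'(theta0); for c(theta) = c*theta and the
    # logarithmic rule, S'(theta0) = 1/(2*theta0), so alpha = 2*c*theta0
    alpha = 2 * c * theta0
    # beta = c(theta0) - alpha * S(theta0),
    # with S(theta0) = 0.5*log(theta0/(2*pi)) - 0.5
    s_theta0 = 0.5 * math.log(theta0 / (2 * math.pi)) - 0.5
    beta = c * theta0 - alpha * s_theta0
    return alpha, beta
```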
4 TRUTH ELICITATION MECHANISM FOR UNKNOWN COSTS

In the previous section we discussed how the centre can motivate agents to make a probabilistic estimate or a measurement of a specific precision. However, this analysis assumed the agents' costs are known. In this section we relax this assumption and present a novel two-stage mechanism which first incentivises the agents to reveal their true costs to the centre, and then, based on this information, induces an agent to produce an estimate of at least the required precision. In more detail, in the first stage the centre asks the agents to submit their cost functions and then it assigns the estimation task to the agent with the lowest cost. Then, in the second stage, the centre uses a strictly proper scoring rule as before, but now uses the second-lowest cost reported by the agents to scale the scoring rule (i.e., to set α and β). This is akin to a reverse second-price or Vickrey auction, where the agents' rewards are equal to the second-lowest reported costs. However, in this case the reward is determined by the scoring rule, and hence depends on the actual estimate produced. In particular, this requires the scaling parameters α and β to be chosen carefully in order to incentivise the agents to reveal their true costs in the first stage. In more detail, our mechanism proceeds as follows:

1. First Stage
• The centre announces that it needs an estimate of required precision θ0, and asks all agents i ∈ {1, . . . , N}, where N ≥ 2, to report their cost functions ci(θ).3
• The centre assigns the forecast or estimate to the agent who reported the lowest cost at the required precision, i.e., agent i such that ci(θ0) = min_{k∈{1,...,N}} ck(θ0).

2. Second Stage
• The centre announces a scoring rule αS(x0; x, θ) + β, where: (1) S(x0; x, θ) is a strictly proper scoring rule, (2) S(θ) is strictly concave as a function of the precision θ,4 and (3) α and β are determined using equations 3 and 4 respectively, but now based on the second-lowest reported cost function (i.e. cj(θ) such that cj(θ0) = min_{k≠i} ck(θ0)).
• The agent selected in the first stage produces an estimate x with precision θ and reports x and θ to the centre.
• Once the actual outcome has been observed, the centre then gives the following payment to the agent:

P(x0; x, θ) = αS(x0; x, θ) + β  (5)

3 We note that in practice the centre only requires ci(θ0) and c′i(θ0). However, for notational convenience we request the agents to reveal their entire cost functions.
4 We note that the quadratic, spherical, and logarithmic scoring rules satisfy both of these properties (see row 2 of table 1).
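A minimal sketch of the first stage follows (our illustration; names such as run_first_stage are hypothetical). The winner is the agent with the lowest reported cost at θ0, while the scoring rule is scaled using the second-lowest reported cost function, as in a reverse Vickrey auction.

```python
def run_first_stage(reported_costs, theta0):
    # reported_costs: list of callables c_i(theta), assumed non-crossing;
    # requires N >= 2 agents, as in the mechanism above
    order = sorted(range(len(reported_costs)),
                   key=lambda i: reported_costs[i](theta0))
    winner = order[0]                         # lowest reported cost at theta0
    scaling_cost = reported_costs[order[1]]   # second-lowest, used to set alpha, beta
    return winner, scaling_cost
```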
4.1 Economic Properties of the Mechanism
Having detailed the two stages of the mechanism, we now identify and prove its economic properties. Specifically, we show that:
1. The mechanism is incentive compatible in the first stage w.r.t. the costs. Specifically, truthful revelation of agents' cost functions is a weakly dominant strategy.
2. The mechanism is incentive compatible w.r.t. the selected agent's reported measurement and precision in the second stage.
3. The mechanism is individually rational.
4. The centre motivates the selected agent to make an estimate with a precision which is at least as high as θ0, the precision required by the centre. We refer to the actual precision produced as the 'optimal precision' (from the perspective of the agent), θ∗.

We now formally prove these properties. To do so, we first derive two lemmas which are then used in the proofs that follow. The first lemma shows that, if the true costs of the agent performing the measurement are less than the costs which are used to scale the scoring rule, the optimal precision θ∗ will be greater than θ0. Let these cost functions be denoted by ct(θ) and cs(θ) respectively. More formally:

Lemma 1. If ct(θ0) < cs(θ0), where ct(θ) is the agent's true cost function, and cs(θ) is the cost function used to scale the scoring function, then θ∗ > θ0.

Proof. By scaling the scoring function using equations 3 and 4 and cs(θ), the agent's expected utility becomes:

U(θ) = [c′s(θ0) / S′(θ0)] (S(θ) − S(θ0)) + (cs(θ0) − ct(θ))  (6)
Now, the optimal precision θ∗ which maximises the agent's expected utility is formally denoted by θ∗ = argmax_θ U(θ). Therefore, U′(θ∗) = 0, and thus we have:

S′(θ∗) / S′(θ0) = c′t(θ∗) / c′s(θ0).  (7)
Let f(θ) = S′(θ)/S′(θ0) and g(θ) = c′t(θ)/c′s(θ0). Since S(θ) is (strictly) concave, it is easy to show that f′(θ) ≤ 0 for θ ≥ θ0 and f′(θ) < 0 for θ > θ0. Furthermore, since ct(θ) is convex, g′(θ) ≥ 0 for θ ≥ θ0. Now, since f is decreasing and g is increasing, when ct(θ0) = cs(θ0) clearly the only point which satisfies equation 7 is where θ∗ = θ0. If ct(θ0) < cs(θ0), on the other hand, it is easy to verify that g(θ0) < 1, since we assumed the cost functions to be non-crossing. Hence, since f(θ0) = 1, the only solution where the two functions meet is where θ > θ0, and thus θ∗ > θ0. □

The next lemma shows that, if the true costs of the agent doing the measurement are higher than the costs used for the scaling of the scoring function, then the agent's utility will always be negative.

Lemma 2. If ct(θ) > cs(θ) then U(θ) < 0 for any θ.
Table 2. Comparison of Quadratic, Spherical and Logarithmic Scoring Rules

Quadratic:   θ∗ = (c2/c1)² θ0;      P(θ0) = c2θ0 (2(c2/c1) − 1)
Spherical:   θ∗ = (c2/c1)^(4/3) θ0; P(θ0) = c2θ0 (4(c2/c1)^(1/3) − 3)
Logarithmic: θ∗ = (c2/c1) θ0;       P(θ0) = c2θ0 (1 + log(c2/c1))

Note that costs are given by linear functions, c(θ) = cθ, and c1 and c2 are the lowest and second lowest costs.
Proof. Concavity of the expected score S(θ) implies:

S′(θ0)(θ − θ0) ≥ S(θ) − S(θ0)

Similarly, convexity of the cost function cs(θ) gives:

c′s(θ0)(θ − θ0) ≤ cs(θ) − cs(θ0).

By performing basic manipulations this results in:

[c′s(θ0) / S′(θ0)] (S(θ) − S(θ0)) + cs(θ0) − cs(θ) ≤ 0

Furthermore, since ct(θ) > cs(θ), the following holds for any θ:

U(θ) = [c′s(θ0) / S′(θ0)] (S(θ) − S(θ0)) + cs(θ0) − ct(θ) < 0  □
Having presented these two key lemmas, we now proceed to prove the four economic properties of our mechanism.

Theorem 1. Truthful revelation of agents' cost functions in the first stage of the mechanism is a weakly dominant strategy.

Proof. We prove this by contradiction. Let ct(θ) and c̃(θ) denote an agent's true and reported cost functions respectively. Furthermore, let cs(θ) denote the cost function used to scale the scoring function if the agent wins (i.e. if c̃(θ0) < cs(θ0)). Now, suppose that the agent misreports, but this does not affect whether the agent wins or not. If the agent loses then the payoff is always zero. If the agent wins the payoff is unaffected, since it is calculated from the second-lowest cost. Therefore, there is no incentive to misreport. Suppose that the agent misreports, and now it does affect whether the agent wins or not. There are now two cases: (1) ct(θ0) > cs(θ0) and c̃(θ0) < cs(θ0) (the agent wins by misreporting but would have lost when truthful), and (2) ct(θ0) < cs(θ0) and c̃(θ0) > cs(θ0) (the agent loses by misreporting but would have won when truthful). Case (1). Since the true cost ct(θ0) > cs(θ0), it follows directly from lemma 2 that the expected utility U(θ) is strictly negative, irrespective of θ. Therefore, the agent could do strictly better by reporting truthfully, in which case the expected utility is zero. Case (2). In this case the agent would have won by being truthful, but now receives a utility of zero. To show that this type of misreporting is suboptimal, we need to show that, when ct(θ0) < cs(θ0), an agent benefits from being selected and generating the (optimal) estimate (i.e. U(θ∗) > 0 when ct(θ0) < cs(θ0)). Now, since θ∗ is optimal by definition, U(θ∗) ≥ U(θ0). From the expected utility in equation 6 we have U(θ0) = cs(θ0) − ct(θ0) > 0 when ct(θ0) < cs(θ0), and hence U(θ∗) > 0 when reporting true costs. □

Theorem 2. The mechanism is incentive compatible w.r.t. the agent's reported measurement and precision in the second stage.
Proof. The proof for this theorem follows directly from the definition of the strictly proper scoring rules (see section 3).

Theorem 3. The two-stage mechanism is individually rational.

Proof. From theorem 1 we can assume that agents report their true cost functions in the first stage. Since agents who do not win in the first stage receive zero utility, we only need to consider the case of the selected agent with cost function ct(θ) ≤ cs(θ). From equation 6, it follows that U(θ0) = cs(θ0) − ct(θ0) ≥ 0. Lemma 1 shows that the agent may produce an estimate of precision θ∗ > θ0. Since θ∗ is optimal by definition, U(θ∗) ≥ U(θ0), and thus U(θ∗) ≥ 0.

Theorem 4. For the agent selected in the first stage of the mechanism, it is optimal to produce an estimate with a precision equal to or higher than the precision required by the centre, i.e., θ∗ ≥ θ0.

Proof. This proof follows directly from Lemma 1. In more detail, given that the agents reveal their true cost functions, we have ct(θ) ≤ cs(θ). Therefore, from lemma 1 it follows that θ∗ ≥ θ0.

Note that these proofs indicate that the two stages of the mechanism are inextricably linked and cannot be considered in isolation from one another. Indeed, apparently small changes to the second stage of the mechanism can destroy the incentive-compatibility property of the first stage. For example, it is important to note that our mechanism is more precisely known as interim individually rational, since the utility is positive in expectation. In any specific instance, the payment could actually be negative if the prediction turns out to be far from the actual outcome. An alternative choice for the second stage of the mechanism would be to set β such that the payments are always positive, thus making the mechanism ex-post individually rational. However, this would then violate the incentive-compatibility property, since the agents could then receive positive payoffs by misreporting their cost functions. Likewise, it might be tempting to imagine that the centre could use the revealed costs of the agents in order to request a lower precision, confident in the knowledge that the selected agent will actually produce an estimate of the required precision. However, by effectively using the lowest revealed cost within the payment rule in this way, the incentive-compatibility property of the mechanism would again be destroyed.
5 EMPIRICAL EVALUATION Having proved the economic properties of the mechanism in the general case with any convex cost function, we now present empirical results for a specific scenario in which costs are linear functions, given by ci(θ) = ciθ, where the value of ci is drawn from a uniform distribution ci ∼ U(1, 2) and θ0 = 1. Within this scenario our intention is to compare the performance of the three scoring rules presented earlier. To this end, for a range from 2 to 20 agents participating in the mechanism, we simulate the mechanism 10^6 times and, for each iteration, record the payment made to the agent who provided the estimate
Figure 1. The mean payment made by the centre (mean payment P versus number of agents N, for the quadratic, spherical and logarithmic rules; the levels c1θ0 and c2θ0 are shown for reference).
Figure 2. The mean optimal precision of agents' estimates (mean optimal precision θ∗ versus number of agents N, for the quadratic, spherical and logarithmic rules).
and the precision of this estimate. In figures 1 and 2 we present the means of these results (and note that the standard error in both means is much smaller than the symbol size). Consider first figure 1, which shows the mean payment made by the centre. We note that, as expected, as the number of agents increases, the mean payment decreases toward the lower limit of the uniform distribution from which the costs were drawn. Furthermore, note that there is a fixed ordering over the entire range, with the payment resulting from the quadratic scoring rule being the highest, and that of the logarithmic scoring rule being the lowest. In this figure, we also show the mean of the lowest and second lowest costs evaluated at the required precision θ0 (denoted by c1θ0 and c2θ0 respectively). The first cost represents the minimum payment that could have been made if the costs of the agents were known to the centre. The second represents the payment that would have been made had the agent produced an estimate of the required precision rather than its own optimal precision. The gap between c1θ0 and c2θ0 represents the 'information rent' that must be paid in the case that costs are unknown. The gap between c2θ0 and the mean payment of any particular scoring rule represents the loss that the centre has to cover due to the agent making a more precise estimate than required. The goal in selecting scoring rules is clearly to minimise this gap, and it can be seen that the logarithmic scoring rule is closest to achieving this goal. The reason for this can be seen in figure 2, where the precision of the estimates that were actually made is shown. Note that in this figure the logarithmic scoring rule is shown to induce agents to produce estimates closer to the required precision than both the spherical and the quadratic scoring rules. The same ordering as observed in these figures (when averaged over costs drawn from a uniform distribution) is also seen in analytical results for any
specific values of the lowest and second lowest costs (see table 2). Based solely on these results, the logarithmic scoring rule would appear to be the best choice for the centre in this case. However, it is important to note that the logarithmic scoring rule is unbounded. That is, in the event that the agent's estimate is far from the actual outcome, a payment based on the logarithmic scoring rule will go to −∞, since the agent's probability density function goes to 0 in this case (see row 1 of table 1). Thus, given this additional observation, it is clear that the spherical scoring rule represents a better choice, since its payments are only slightly greater than those of the logarithmic rule, but it has finite bounds.
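For reference, the experiment can be reproduced with a short Monte Carlo sketch (ours, not the authors' code) that draws the linear costs and applies the closed-form payments of Table 2.

```python
import math
import random

# Expected payment given lowest cost c1 and second-lowest c2 (Table 2)
PAYMENTS = {
    "quadratic":   lambda c1, c2, t0: c2 * t0 * (2 * c2 / c1 - 1),
    "spherical":   lambda c1, c2, t0: c2 * t0 * (4 * (c2 / c1) ** (1 / 3) - 3),
    "logarithmic": lambda c1, c2, t0: c2 * t0 * (1 + math.log(c2 / c1)),
}

def mean_payment(rule, n_agents, iters=100_000, theta0=1.0):
    # costs are linear, c_i(theta) = c_i * theta, with c_i ~ U(1, 2)
    total = 0.0
    for _ in range(iters):
        draws = sorted(random.uniform(1, 2) for _ in range(n_agents))
        total += PAYMENTS[rule](draws[0], draws[1], theta0)
    return total / iters
```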
6 CONCLUSIONS In this paper we introduced a novel two-stage mechanism based on strictly proper scoring rules that motivates selfish rational agents to make a costly probabilistic estimate or forecast of a specified precision and report it truthfully to a centre. We applied the mechanism in a setting in which the centre is faced with multiple agents but has no knowledge about their costs, and we proved that it was incentive compatible and individually rational. We also empirically evaluated our mechanism, and in comparing the quadratic, spherical and logarithmic scoring rules, showed that the logarithmic one minimises the centre's expected payment, but is unbounded. Thus, we proposed the use of the spherical rule as the best compromise between minimising payments and keeping them bounded. Our future work consists of two main tracks. First, we would like to explore the design of alternative strictly proper scoring rules, with the intention of minimising the loss that the centre has to cover as a result of agents making an estimate of precision higher than the required one. In this respect, the value of c2θ0, shown in figure 1, represents a bound on the ultimate performance of the mechanism. Second, we would like to extend our mechanism to the case where the centre procures estimates from more than one agent, and then fuses them together. When costs are convex, procuring several low precision estimates may be more cost effective than procuring a single high precision estimate. Indeed, Miller et al. have shown how scoring rules can be used to score one agent's estimate against another's, and thus in this case there is no need to wait until the actual event's outcome is revealed before making payments to agents [4]. However, in such a case, it is an open question as to whether it is possible to design a mechanism that incentivises multiple agents to truthfully reveal their costs and estimates.
ACKNOWLEDGEMENTS This research was undertaken as part of the EPSRC funded project on Market-Based Control (GR/T10664/01). This is a collaborative project involving the Universities of Birmingham, Liverpool and Southampton and BAE Systems, BT and HP.
REFERENCES
[1] A. D. Hendrickson and R. J. Buehler, 'Proper scores for probability forecasters', The Annals of Mathematical Statistics, 42(6), 1916–1921, (1971).
[2] R. Jurca and B. Faltings, 'Reputation-based service level agreements for web services', in Proceedings of the International Conference on Service Oriented Computing (ICSOC), pp. 396–409, (2005).
[3] J. E. Matheson and R. L. Winkler, 'Scoring rules for continuous probability distributions', Management Science, 22(10), 1087–1096, (1976).
[4] N. Miller, P. Resnick, and R. Zeckhauser, 'Eliciting honest feedback: The peer prediction method', Management Science, 51(9), 1359–1373, (2005).
[5] L. J. Savage, 'Elicitation of personal probabilities and expectations', Journal of the American Statistical Association, 66(336), 783–801, (1971).
[6] R. Selten, 'Axiomatic characterization of the quadratic scoring rule', Experimental Economics, 1(1), 43–61, (1998).
[7] A. Zohar and J. S. Rosenschein, 'Robust mechanisms for information elicitation', in Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 1202–1204, (2006).
Goal Generation and Adoption from Partially Trusted Beliefs
Célia da Costa Pereira and Andrea G. B. Tettamanzi1
Abstract. A rational agent adopts (or changes) its goals when new information (beliefs) becomes available or its desires (e.g., tasks it is supposed to carry out) change. In this paper we propose a non-conventional approach to goal adoption which takes the degree of trust in the sources of information into account. Beliefs, desires, and goals, as a consequence, are gradual. Incoming information may be any propositional formula. Two algorithms for updating the mental state of an agent in this new setting are proposed. The first algorithm handles the update when a new piece of information arrives; the second handles the update when a new desire arises.
1 Introduction and Motivation
Changes in the mental attitudes of a BDI agent [14] may influence its behavior when deciding which information to believe, which desires/goals to generate/adopt, which action to perform, and so on. The goals to be adopted in a given situation may depend on the agent's beliefs, desires and obligations. However, most works on goal change and generation do not build on results on belief change, e.g., [2, 16, 15]. One of the first approaches in this line is Thomason's [17], whose objective is to describe a formalism designed to integrate reasoning about desires and planning. The work by Broersen and colleagues [3] introduces the BOID architecture, in which goals are generated from conditional beliefs, obligations, intentions, and desires. Also the approach by Dignum and colleagues [9] and, more recently, the one proposed in [6] are very much in this line. However, these works consider the notion of belief as an all-or-nothing concept: either the agent believes something, or it does not. Parsons and Giorgini [13] proposed to treat beliefs as degrees of evidence. Hansson [11] pointed out that there are two notions of degree of belief. The first is the static concept of degree of confidence. In this sense, the higher an agent's degree of belief in a sentence, the more confidently it entertains that belief. The other notion is the dynamic concept of degree of resistance to change. In that sense, the higher an agent's degree of belief in a sentence, the more difficult it is to change that belief. In this work, we just consider beliefs, desires, and goals, not intentions, but, in addition, we allow an agent to believe a piece of information to a degree. Such a degree depends on the trust the agent has in the source of information. That way, we make it possible to represent the fact that if a piece of information comes from a completely trusted source, like in traditional approaches, the degree of resistance to change of the agent is null and, therefore, the agent revises its beliefs and (completely) adopts the new belief. Instead, if the

1 Università degli Studi di Milano, Italy, email: {pereira,tettamanzi}@dti.unimi.it
agent does not trust the source at all, its belief will not change. The interesting case is when information comes from a partially trusted source. In this case, we will show that the relative shift of an agent's belief degrees depends only on the degree of trust of the source. Our aim is not to compute such trust degrees; we are just interested in how they influence the agent's beliefs and, as a consequence, the choice of which set of goals, among the possible ones, it will adopt. This work is an extension of one of the first attempts to study the impact of trusted beliefs on desires and goals [7]. The extension consists in allowing all kinds of information, including disjunctive information. To explain the kind of issues we want to address, let us consider the following example, which we will refer to in the rest of the paper. You go for dinner to a new restaurant. You like to have meat (hm) but, if you find fresh fish (ff), you'd rather have fish (hf). Also, you like to have red wine (rw) with meat and white wine (ww) with fish. When you go to a new place, you assume fish is not fresh, unless you find evidence to the contrary; however, you leave some room for doubt. Now, your friend, who already knows the place and whom you trust pretty much, albeit not completely, tells you they usually have fresh fish or, when they don't, their escargots are great (ge), which you would be curious to try (he) in case you decided not to have fish. In this paper, we attempt to take into account this kind of considerations on beliefs and desires in goal generation/adoption. The paper is organized as follows. Section 2 presents the fuzzy logic-based formalism which will be used throughout the paper. Section 3 illustrates how changes due to the arrival of new information and/or a new desire influence the agent's beliefs and desires. In Section 4, the notion of goal set is defined and requirements for goal set adoption are underlined. Section 5 concludes.
2 The Formalism
The formalism which will be used throughout the paper is inspired by the one used in [18]. However, unlike [18], the objective of our formalism is to analyze, not to develop, agent systems. Precisely, our agent must single out an optimal set of goals to be adopted.
2.1 Basic Considerations
Fuzzy sets, introduced by Zadeh [19], are a generalization of classical sets obtained by replacing the characteristic function of a set A with a membership function μA, which can take any value in [0, 1]. The value μA(x) or, more simply, A(x) is the membership degree of element x in A, i.e., the degree to which x belongs in A. The support of A, supp(A), is the set of all x such that A(x) > 0. The usual set-theoretic operations of union, intersection, and complement can be defined as a generalization of their counterparts
on classical sets by introducing two families of operators, called triangular norms and co-norms. In practice, it is usual to employ the min norm for intersection and the max co-norm for union. Given two fuzzy sets A and B, and an element x, (A ∪ B)(x) = max{A(x), B(x)}, (A ∩ B)(x) = min{A(x), B(x)}, and Ā(x) = 1 − A(x).

Definition 1 (Fuzzy Interpretation) A fuzzy interpretation is an assignment of truth degrees in [0, 1] to all atomic propositions (or atoms, for short) defined in the problem domain. Given a set of atoms A, a fuzzy interpretation is a function I : A → [0, 1], which assigns a truth degree I(p) ∈ [0, 1] to all atoms p ∈ A.

Note that a fuzzy interpretation is, in all respects, a fuzzy set of atoms.
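A fuzzy interpretation can thus be represented as a mapping from atoms to degrees; the sketch below (ours, not the authors' code) implements the min/max/complement operations just described.

```python
def fuzzy_union(A, B):
    # (A ∪ B)(x) = max{A(x), B(x)}
    return {x: max(A.get(x, 0.0), B.get(x, 0.0)) for x in set(A) | set(B)}

def fuzzy_intersection(A, B):
    # (A ∩ B)(x) = min{A(x), B(x)}
    return {x: min(A.get(x, 0.0), B.get(x, 0.0)) for x in set(A) | set(B)}

def fuzzy_complement(A, atoms):
    # complement(A)(x) = 1 - A(x), over a fixed universe of atoms
    return {x: 1.0 - A.get(x, 0.0) for x in atoms}
```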
2.2 Formalism's Components
An agent's belief is a piece of information that the agent believes in. An agent's desire is something (not always material) that the agent would like to possess or perform. Desires (or motivations) are necessary but not sufficient conditions for action. When a desire is met by other conditions that make it possible for an agent to act, that desire becomes a goal. Therefore, given this technical definition of a desire, all goals are desires, but not all desires are goals. The main distinction we make here between desires and goals is in line with that made by Thomason [17] and other authors: goals are required to be consistent whereas desires need not be.

Definition 2 (Language) Let A be a set of atomic propositions and let L be the propositional language such that A ∪ {⊤, ⊥} ⊆ L, and, ∀φ, ψ ∈ L, ¬φ ∈ L, φ ∧ ψ ∈ L, φ ∨ ψ ∈ L.

Our formalism accounts for formulas (beliefs and desires), and the trust degree of information sources. The formalism is a fuzzy extension of that proposed in [6] in two ways: (i) incoming information may be any propositional formula, not only atoms or literals, and (ii) the degree of trust in the sources of information is taken into account. The first extension makes the formalism more general with respect to previous proposals, in that it allows one to express all kinds of incoming information, including disjunctive information. Thanks to the second extension, it is possible to represent how strongly the agent believes in a given piece of information. We suppose that this trust degree depends on how reliable the source of the piece of information is. Here, we are not interested in the computation of such reliabilities; we merely assume that, for an agent, a belief has a trust degree in [0, 1]. An approach to the problem of assigning fuzzy trust degrees to information sources can be found, for example, in previous work by Castelfranchi and colleagues [4]. Consequently, if we take into account the fact that here the notion of belief is not conceived as an all-or-nothing concept but as a "fuzzy concept", the relations among beliefs and desires are also fuzzy. The fuzzy counterpart of a desire-generation rule defined in [6] is defined as follows:

Definition 3 (Desire-Generation Rule) A desire-generation rule is an expression of the form βR, ψR ⇒+D d, where βR, ψR ∈ L and d ∈ {a, ¬a} with a ∈ A.2
2 The unconditional counterpart of this rule is δ ⇒+D d, which means that the agent (unconditionally) desires d to degree δ.
Intuitively this means that "an agent desires d as much as it believes βR and desires ψR". Unlike most conventional approaches, e.g. [3], in which the authors do not consider disjunctive information at all, here, for the sake of simplicity, we make this restriction only for generated desires, i.e. those in the right-hand side of the rules: a generated desire is then represented by a literal. Given a desire-generation rule R, we shall denote by rhs(R) the literal on the right-hand side of R. The preferences and habits of the gourmet in the example may be described by means of the following rules:

R1: ff, ⊤ ⇒+D hf      R4: ⊤, hf ⇒+D ww
R2: ge, ¬hf ⇒+D he    R5: ⊤, hf ⇒+D ¬hm
R3: ⊤, hm ⇒+D rw      R6: 0.7 ⇒+D hm

2.2.1 Agent's State
In this section, we define the mental state of an agent and the semantics of belief and desire formulas. The state of an agent is completely described by a triple S = ⟨B, RJ, J⟩, where
• B is a fuzzy interpretation on A;
• RJ is a set of desire-generation rules, such that, for each desire d, RJ contains at most one rule of the form δ ⇒+D d;
• J is a fuzzy set of literals.
B is the fuzzy interpretation which defines the degree to which the agent believes each atom in A. Representing the beliefs as a fuzzy interpretation on A guarantees by construction that the agent's beliefs are consistent, i.e., for all atoms a we have B(a) = 1 − B(¬a). RJ contains the rules which generate desires from beliefs and other desires (subdesires). J contains all literals (positive and negative forms of atoms in A) representing desires which may be deduced from the agent's desire-generation rules. We suppose that an agent can have inconsistent desires, i.e., for a desire d we can have J(d) + J(¬d) > 1. In the gourmet example, your initial state when you step into the restaurant might be described by B(ff) = 0.2, B(ge) = 0, and J(rw) = J(hm) = 0.7, J(ww) = J(¬hm) = J(hf) = 0.2, J(he) = 0. By extension, we can compute the truth degree of any belief and desire formula in L.

Definition 4 (Degree of fuzzy belief and desire formulas) Let S = ⟨B, RJ, J⟩ be the state of the agent and φ, ψ ∈ L be formulas. We can extend B to arbitrary formulas in L by defining:

B(⊤) = 1,  (1)
B(⊥) = 0,  (2)
B(¬φ) = 1 − B(φ),  (3)
B(φ ∧ ψ) = min{B(φ), B(ψ)},  (4)
B(φ ∨ ψ) = max{B(φ), B(ψ)}.  (5)
The extension of J is obtained in the same way, except that Equations 2 and 3 do not hold for J because J may be inconsistent. They are replaced by J(⊥) = δ ∈ [0, 1]. Besides, if φ is a literal, J(φ) is directly given by the state of the agent. Note that since J need not be consistent, the De Morgan laws do not hold, in general, for desire formulas.
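Definition 4 amounts to a straightforward recursive evaluator; a possible sketch (ours, with formulas encoded as nested tuples) is:

```python
def eval_belief(B, phi):
    # B maps atoms to degrees; phi is "T", "F", an atom, or a tuple
    # like ("and", ("not", "ff"), ("or", "ge", "hm"))
    if phi == "T":
        return 1.0
    if phi == "F":
        return 0.0
    if isinstance(phi, str):
        return B[phi]
    op = phi[0]
    if op == "not":
        return 1.0 - eval_belief(B, phi[1])
    if op == "and":
        return min(eval_belief(B, phi[1]), eval_belief(B, phi[2]))
    if op == "or":
        return max(eval_belief(B, phi[1]), eval_belief(B, phi[2]))
    raise ValueError("unknown connective: %s" % op)
```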
Definition 5 (Degree of Activation of a Rule) Let R be a desire-generation rule. The degree of activation of R, Deg(R), is given by Deg(R) = min(B(βR), J(ψR)), and for its unconditional counterpart R = δ ⇒+D d: Deg(R) = δ.

Definition 6 (Degree of Justification) The degree of justification of desire d is defined as J(d) = max_{R∈RJ : rhs(R)=d} Deg(R). This represents how rational it is for an agent to desire d.
3 Changes in the Agent's State

The acquisition of a new consistent piece of information with a given degree of trust in state S may cause changes in both the degrees of belief and the justification degrees in the agent's belief and desire sets respectively. Likewise, the arising of a new desire with a given degree may also cause changes in the desire set J.

3.1 Changes Caused by a New Belief

3.1.1 Changes in the Agent's Belief Set

To account for changes in the belief set B caused by the acquisition of a new piece of information, we define a new operator for belief change, noted ∗, which is an adaptation of the well-known AGM operator for belief revision [1] to the fuzzy belief setting. We consider the disjunctive normal form (DNF) of the new piece of information β (this is not a restriction, because any well-formed formula of propositional logic has an equivalent DNF expression), i.e., β = K1 ∨ K2 ∨ . . . ∨ Kn, with, for all i, Ki = l1^i ∧ l2^i ∧ . . . ∧ lm^i and lj^i ∈ {aj^i, ¬aj^i}, where aj^i ∈ A. We suppose that β is consistent. This allows us to dispense with dealing with cases in which inconsistent beliefs make it possible to deduce all formulas and, therefore, to believe everything. If a new piece of information β arrives with degree of trust α, n alternatives are possible: either the agent trusts K1 with degree α, or K2 with degree α, and so on. The value α corresponds to how strongly the agent trusts β. Here, we make a choice, motivated by the Minimal Change Principle [12]: we suppose that the agent chooses the alternative which produces the smallest change in its beliefs. We measure such changes for each disjunct of the incoming formula, thanks to the belief change operator defined below.

Definition 7 (Belief Change Alternatives) Let a ∈ A and let Ki be one of the disjuncts of β, the incoming information, whose degree of trust is α. Let B be the agent's fuzzy belief set. The ith alternative new fuzzy set of beliefs Bi = B ∗ (α/Ki) is such that, for all a ∈ A,

Bi(a) = B(a) · (1 − α) + α, if Ki ⊨ a;  B(a) · (1 − α), if Ki ⊨ ¬a;  B(a), otherwise.  (6)

This operator allows us to update the degree of the agent's belief in each atom a ∈ A, with respect to both the disjunct Ki of the incoming information and the trust degree of its source.

Observation 1 If the agent completely trusts (α = 1) a source which provides a piece of information confirming (contradicting) a, then the agent will believe a completely (will not believe a at all anymore), no matter what its previous degree of belief in a was.

This observation underlines the fact that, in case of a completely trusted source, our operator obeys the Primacy of New Information Principle [8]. We measure the amount of change in beliefs by means of a fuzzy version dH of the Hamming distance between interpretations: given two fuzzy interpretations I1 and I2,

dH(I1, I2) = Σ_{a∈A} |I1(a) − I2(a)|.  (7)

As explained previously, based on the minimal change principle, we suppose that the agent chooses the disjunct (or one of the disjuncts in case of a tie) with the smallest total amount of change. More formally:

Definition 8 (Belief Change Operator) Let β = K1 ∨ . . . ∨ Kn be the incoming information with trust degree α. The new set of beliefs is given by B ∗ (α/β) = B_{i∗}, with

i∗ = argmin_i dH(Bi, B),

where Bi is the ith alternative revision as per Definition 7. If there is more than one i such that dH(Bi, B) is minimal, one is chosen arbitrarily.

In the gourmet example, when your friend, whom you trust to a degree α = 0.8, tells you "ff ∨ ge", you would change your beliefs B as follows: K1 = ff, K2 = ge; B1(ff) = B(ff) · 0.2 + 0.8 = 0.84; B1(ge) = B(ge) = 0; B2(ff) = B(ff) = 0.2; B2(ge) = B(ge) · 0.2 + 0.8 = 0.8; dH(B1, B) = 0.64; dH(B2, B) = 0.8; therefore, i∗ = 1 and B ∗ (0.8/(ff ∨ ge)) = B1.

Proposition 1 If Ki∗ ⊨ a, i.e., Ki∗ confirms a to a certain degree, then applying the operator ∗ never causes the belief degree of a to decrease, i.e., B′(a) ≥ B(a).

Proof: If B(a) = 0 the result is obvious. Otherwise, if B(a) > 0, we have B′(a) − B(a) = α · (1 − B(a)) ≥ 0. □

Proposition 2 If Ki∗ ⊨ ¬a, i.e., Ki∗ contradicts a to a certain degree, then applying the operator ∗ never causes the belief degree of a to increase, i.e., B′(a) ≤ B(a).

Proof: We have B′(a) − B(a) = −α · B(a) ≤ 0. □

The semantics of our belief change operator is defined by the following properties. Here B represents a fuzzy belief set, β the incoming trusted information with degree of trust α, supp is the support of a fuzzy set, and ∪, ∩, ⊆ and ⊇ are fuzzy operators.

• (P∗1) (Stability) The result of applying ∗ to B with β is always a fuzzy set of beliefs: B ∗ (α/β) is a fuzzy set of beliefs.
• (P∗2) (Expansion) If Ki∗ contains only positive atoms, then the fuzzy set B expands: supp(B ∗ (α/β)) ⊇ supp(B).
• (P∗3) (Shrinkage) If Ki∗ contains only negated atoms, then the fuzzy set B shrinks: supp(B ∗ (α/β)) ⊆ supp(B).
• (P∗4) (Invariance) If the new information is completely untrusted, i.e., α = 0, invariance holds: (α = 0) ⇒ (B ∗ (α/β) = B).
• (P∗5) (Predictability) The result of applying ∗ contains all belief atoms in supp(B ∪ {α/β}): supp(B ∗ (α/β)) ⊇ supp(B ∪ {α/β}).
• (P∗6) (Identity) The result of applying ∗ does not depend on the particular form of the information: if β1 ≡ β2 and α1 = α2, then B ∗ (α1/β1) = B ∗ (α2/β2).
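The following sketch (ours; literals are encoded as (atom, polarity) pairs) implements Definitions 7 and 8 and reproduces the gourmet example above.

```python
def revise_alternative(B, K, alpha):
    # Definition 7: revise B by one disjunct K (a set of literals) with trust alpha
    Bi = dict(B)
    for atom, positive in K:
        if positive:
            Bi[atom] = B.get(atom, 0.0) * (1 - alpha) + alpha  # K entails atom
        else:
            Bi[atom] = B.get(atom, 0.0) * (1 - alpha)          # K entails not-atom
    return Bi

def revise(B, disjuncts, alpha):
    # Definition 8: keep the alternative at minimal Hamming distance from B
    def d_hamming(B1, B2):
        return sum(abs(B1.get(a, 0.0) - B2.get(a, 0.0)) for a in set(B1) | set(B2))
    return min((revise_alternative(B, K, alpha) for K in disjuncts),
               key=lambda Bi: d_hamming(Bi, B))

# Gourmet example: revising B(ff) = 0.2, B(ge) = 0 by "ff or ge" with alpha = 0.8
B = {"ff": 0.2, "ge": 0.0}
B_new = revise(B, [{("ff", True)}, {("ge", True)}], 0.8)
# B_new == {"ff": 0.84, "ge": 0.0}: the first disjunct is chosen (dH = 0.64 < 0.8)
```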
When all beliefs are crisp and the trust in the new information is complete (α = 1), our fuzzy belief-change operator satisfies the six basic AGM revision rationality postulates K∗1–K∗6 [10]. In order to show that, let us consider the standard definition of expansion of a crisp set of formulas B with a formula φ ∈ L as B + φ = {ψ : B ∪ {φ} ⊢ ψ}.

Proposition 3 If B is crisp and φ is new information whose trust is α = 1, the following hold:
1. B′ = B ∗ φ is a crisp interpretation (K∗1);
2. B′(φ) = 1 (K∗2);
3. B′ ⊆ B + φ (K∗3);
4. if B(¬φ) = 0, then B + φ ⊆ B′ (K∗4);
5. if φ ≡ ψ, then B ∗ φ = B ∗ ψ (K∗6).
For the convenience of the reader, the corresponding AGM rationality postulate has been indicated in parentheses for each thesis. Note that Postulate K∗5, which in our formalism would be "B ∗ φ = L iff φ = ⊥", is not relevant to our discussion, since we have made the assumption that new information is never inconsistent; therefore, it has not been considered.

Proof: To prove Thesis 1, we observe that, when α = 1, for all atoms a, B′(a) ∈ {0, B(a), 1}; but B(a) ∈ {0, 1}, since B is crisp; therefore, B′(a) ∈ {0, 1} as well. To prove Thesis 2, let us consider Ki∗, the chosen alternative disjunct of φ: a sufficient condition for B′(φ) = 1 is that B′(Ki∗) = 1; now, Ki∗ = l1∗ ∧ . . . ∧ lm∗; it is easy to verify that, according to Definition 7, B′(li∗) = 1 for all i = 1, . . . , m; therefore, B′(Ki∗) = min_i{B′(li∗)} = 1 and the thesis follows. As for Thesis 3, it follows trivially if B + φ = L, i.e., if B(¬φ) = 1. In all other cases, i.e., when B(¬φ) = 0, we have B(φ) = 1, and, because of the minimal change principle, B′ = B, which proves the thesis. The proof of Thesis 4 is similar: B(¬φ) = 0 implies B(φ) = 1, whence one concludes B′ = B; furthermore, B + φ = B, which verifies the thesis. Finally, to prove Thesis 5, we recall that, if φ ≡ ψ, their DNFs are identical; therefore, B ∗ φ = B ∗ ψ by definition. □
3.1.2 Changes in the Agent's Desire Set
The acquisition of a new belief may induce changes in the justification degrees of some desires. More generally, the acquisition of a new belief may induce changes in the belief set of an agent which, in turn, may induce changes in its desire set. Let β be a new belief trusted to degree α, denoted (α/β). To account for the changes in the desire set caused by this new acquisition, we have to, recursively: (i) calculate the new activation degree of each rule R ∈ RJ by considering B′, and (ii) update the justification degree of all desires in their right-hand sides (rhs(R)).
The new desire set J′ is obtained by executing the algorithm in Figure 1 with the following inputs: B′ = B ∗ (α/β), RJ, and A. The algorithm propagates changes until a fixpoint is reached; Ck is the set of desires whose justification degree changes in step k, i.e., ∀d ∈ {a, ¬a}, with a ∈ A, d ∈ Ck ⇒ Jk(d) ≠ Jk−1(d). Step 1 updates B with respect to the incoming information (α/β), and initializes to empty the set C0 of desires whose justification degrees directly change with the arrival of (α/β). Step 2 updates C0. Steps 3 and 4
1. B′ ← B ∗ (α/β); k ← 1; C0 ← ∅;
2. for each d ∈ {a, ¬a} with a ∈ A do
   (a) consider all Ri ∈ RJ such that rhs(Ri) = d;
   (b) calculate Deg(Ri) by considering B′;
   (c) J0(d) ← max_{Ri} Deg(Ri);
   (d) if J0(d) ≠ J(d) then C0 ← C0 ∪ {d}.
3. repeat
   (a) Ck ← ∅;
   (b) for each d ∈ Ck−1 do
       i. for all Rj ∈ RJ such that ψRj ⊨ d do
          A. calculate Deg(Rj) considering Jk−1(d);
          B. Jk(rhs(Rj)) ← max_{Ri : rhs(Ri) = rhs(Rj)} Deg(Ri);
          C. if Jk(rhs(Rj)) ≠ Jk−1(rhs(Rj)) then Ck ← Ck ∪ {rhs(Rj)}.
       ii. k ← k + 1.
   until Ck−1 = ∅.
4. for all d, J′(d) is given by the following equation:

J′(d) = J(d), if d ∉ C; Ji(d), otherwise,  (8)

where i is such that d ∈ Ci and, ∀j ≠ i, if d ∈ Cj then j ≤ i; i.e., the justification degree of a "changed" desire is the last degree it takes, and C = ∪_{k=0}^{∞} Ck is the set of "changed" desires.

Figure 1. An algorithm to compute the new desire set upon arrival of a new belief.
update desire degrees which are indirectly changed by the incoming information. Of course, the set RJ does not change. In the gourmet example, learning β = ff ∨ ge with α = 0.8, which has you change your beliefs to B′ such that B′(ff) = 0.84 and B′(ge) = 0, makes J change to J′ such that J′(ww) = J′(hf) = 0.84, J′(he) = 0, J′(rw) = J′(hm) = 0.7, and J′(¬hm) = 0.84.

Proposition 4 If the chosen disjunct Ki∗ does not contain negated atoms, then J′ = ∪_{k=0}^{∞} Jk.

Proof: According to Proposition 1, for all a we have B′(a) ≥ B(a). Therefore, the degree of all desires d in the new desire set J′ may not decrease, i.e., for all k, Jk(d) ≥ Jk−1(d). □

Proposition 5 If the chosen disjunct Ki∗ only contains negated atoms, then J′ = ∩_{k=0}^{∞} Jk.

Proof: According to Proposition 2, for all a we have B′(a) ≤ B(a). Therefore, the degree of all desires d in the new desire set J′ may not increase, i.e., for all k, Jk(d) ≤ Jk−1(d). □
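Our reading of the propagation loop of Figure 1 can be sketched as follows (a hypothetical encoding: each rule is a pair of an activation function and its right-hand-side literal, and negation is written with a leading "~"):

```python
def update_desires(rules, B, atoms, max_iter=1000):
    lits = [l for a in atoms for l in (a, "~" + a)]
    J = {d: 0.0 for d in lits}
    for _ in range(max_iter):
        changed = False
        for d in lits:
            # J(d) is the maximal activation degree over rules concluding d
            deg = max((act(B, J) for act, rhs in rules if rhs == d), default=0.0)
            if deg != J[d]:
                J[d], changed = deg, True
        if not changed:          # fixpoint reached
            break
    return J

# The gourmet rules R1-R6, with B(ff) = 0.84 and B(ge) = 0 after revision:
rules = [
    (lambda B, J: B["ff"], "hf"),                  # R1
    (lambda B, J: min(B["ge"], J["~hf"]), "he"),   # R2
    (lambda B, J: J["hm"], "rw"),                  # R3
    (lambda B, J: J["hf"], "ww"),                  # R4
    (lambda B, J: J["hf"], "~hm"),                 # R5
    (lambda B, J: 0.7, "hm"),                      # R6
]
J = update_desires(rules, {"ff": 0.84, "ge": 0.0},
                   ["hf", "he", "rw", "ww", "hm"])
# yields J(hf) = J(ww) = J(~hm) = 0.84 and J(hm) = J(rw) = 0.7, as in the text
```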
3.2 Changes Caused by a New Desire
The acquisition of a new desire may cause changes in the fuzzy desire set and in the desire-generation rule base. In this work, for the sake of simplicity, we consider only new desires which do not depend on beliefs and/or other desires. A new desire d′, justified with degree δ′, implies the addition of the desire-generation rule δ′ ⇒+D d′ into RJ, resulting in the new base R′J. By definition of a desire-generation rule base, R′J must not contain a δ ⇒+D d′ with δ ≠ δ′. How does S change with the arising of the new desire d′?
• Any rule δ ⇒+D d′ with δ ≠ δ′ is retracted from RJ;
• δ′ ⇒+D d′ is added to RJ.

It is clear that the arising of a new desire does not change the belief set of the agent. The new fuzzy set of desires, J′, is computed by the algorithm in Figure 2:

1. if {δ ⇒+D d′} ∈ RJ then R′J ← (RJ \ {δ ⇒+D d′}) ∪ {δ′ ⇒+D d′}; else R′J ← RJ ∪ {δ′ ⇒+D d′};
2. k ← 1; C0 ← {d′}; J0(d′) ← δ′;
3. repeat
   (a) Ck ← ∅;
   (b) for each d ∈ Ck−1 do
       i. for all Rj ∈ R′J such that ψRj ⊨ d do
          A. calculate their respective degrees Deg(Rj) considering Jk−1(d);
          B. Jk(rhs(Rj)) ← max_{Ri : rhs(Ri) = rhs(Rj)} Deg(Ri);
          C. if Jk(rhs(Rj)) ≠ Jk−1(rhs(Rj)) then Ck ← Ck ∪ {rhs(Rj)}.
       ii. k ← k + 1.
   until Ck−1 = ∅.
4. for all d, J′(d) is given by Equation 8.

Figure 2. An algorithm to compute the new desire set upon the arising of a new desire.
4 Goal Adoption
Goals serve a dual role in the deliberation process, capturing aspects of both intentions and desires. Besides expressing desirability, when an agent adopts a goal, it also makes a commitment to pursue the goal. Here, we concentrate exclusively on the second role served by a goal. For more information about intentions see for example Cohen and Levesque [5]. The main point about desires is that we expect a rational agent to try to manipulate its surrounding environment to fulfil them. In general, considering a problem P to solve, not all generated desires can be adopted at the same time, especially when they are not feasible at the same time. We assume we dispose of a P-dependent function FP which, given a fuzzy set of beliefs B and a fuzzy set of desires J, returns a degree γ which corresponds to the certainty degree of the most certain feasible solution found. We may call γ the degree of feasibility of J given B, i.e., FP(B, J) = γ.

Definition 9 (γ-Goal Set) A γ-goal set, with γ ∈ [0, 1], in state S is a fuzzy set of desires G such that:
1. G is justified: G ⊆ J, i.e., ∀d ∈ {a, ¬a}, a ∈ A, G(d) ≤ J(d);
2. G is γ-feasible: FP(B, G) ≥ γ;
3. G is consistent: ∀d ∈ {a, ¬a}, a ∈ A, G(d) + G(¬d) ≤ 1.

In the gourmet example, J is inconsistent, in that J(hm) + J(¬hm) = 1.54 > 1; on the other hand, consistency requires that G(hm) + G(¬hm) ≤ 1; therefore, one possible choice for G could be such that G(hm) = 0.45 and G(¬hm) = 0.55, or even G(hm) = 0 and G(¬hm) = 0.84. In general, given a fuzzy set of desires J, there may be more than one possible γ-goal set G. However, a rational agent in state S = ⟨B, RJ, J⟩, for practical reasons, may need to elect one precise set of goals, G∗, to pursue, which depends on S. The choice of one γ-goal set over the others may be based on a preference relation ≽ on
desire sets, as proposed in [7], where it is required that a goal election function Gγ is such that:
• ∀S, Gγ(S) is a γ-goal set, i.e., it does indeed return a γ-goal set; and
• ∀S, if G is a γ-goal set, then Gγ(S) ≽ G, i.e., the γ-goal set returned by the function Gγ and then adopted by the agent is "optimal".
The issue of defining a specific goal election function is a critical part of constructing a rational agent framework; this issue falls outside the scope of this work.
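For illustration, the three conditions of Definition 9 can be checked mechanically; the sketch below (ours) treats F_P as an opaque feasibility function supplied by the problem at hand.

```python
def is_gamma_goal_set(G, J, B, gamma, feasibility, atoms):
    # 1. justified: G(d) <= J(d) for every literal d
    justified = all(G.get(d, 0.0) <= J.get(d, 0.0) for d in G)
    # 2. gamma-feasible: F_P(B, G) >= gamma
    feasible = feasibility(B, G) >= gamma
    # 3. consistent: G(a) + G(~a) <= 1 for every atom a
    consistent = all(G.get(a, 0.0) + G.get("~" + a, 0.0) <= 1.0 for a in atoms)
    return justified and feasible and consistent
```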
5 Summary
We have investigated how trust in a source of information can influence the degree of an agent's beliefs, and how these graded beliefs influence the agent's generated desires and then its adopted goals. We proposed a new fuzzy belief change operator to deal with this new kind of information, and two algorithms for updating the agent's desire set: one after the arrival of a new, possibly only partially trusted, piece of information, and one after the arising of a new unconditional desire. Finally, requirements for goal adoption have been stated.
REFERENCES
[1] C. E. Alchourrón, P. Gärdenfors, and D. Makinson, 'On the logic of theory change: Partial meet contraction and revision functions', J. Symb. Log., 50(2), 510–530, (1985).
[2] J. Bell and Z. Huang, 'Dynamic goal hierarchies', in PRICAI '96: Proceedings from the Workshop on Intelligent Agent Systems, Theoretical and Practical Issues, pp. 88–103, London, UK, (1997). Springer-Verlag.
[3] J. Broersen, M. Dastani, J. Hulstijn, and L. van der Torre, 'Goal generation in the BOID architecture', Cognitive Science Quarterly Journal, 2(3–4), 428–447, (2002).
[4] C. Castelfranchi, R. Falcone, and G. Pezzulo, 'Trust in information sources as a source for trust: a fuzzy approach', in Proceedings of AAMAS'03, pp. 89–96, (2003).
[5] P. R. Cohen and H. J. Levesque, 'Intention is choice with commitment', Artif. Intell., 42(2–3), 213–261, (1990).
[6] C. da Costa Pereira and A. Tettamanzi, 'Towards a framework for goal revision', in Proceedings of BNAIC'06, pp. 99–106, (2006).
[7] C. da Costa Pereira and A. Tettamanzi, 'Goal generation with relevant and trusted beliefs', in Proceedings of AAMAS'08, pp. 397–404, (2008).
[8] M. Dalal, 'Investigations into a theory of knowledge base revision', in AAAI, pp. 475–479, (1988).
[9] F. Dignum, D. N. Kinny, and E. A. Sonenberg, 'From desires, obligations and norms to goals', Cognitive Science Quarterly, 2(3–4), 407–427, (2002).
[10] P. Gärdenfors, 'Belief revision: A vademecum', in Meta-Programming in Logic, 1–10, Springer, Berlin, (1992).
[11] S. O. Hansson, 'Ten philosophical problems in belief revision', Journal of Logic and Computation, 13(1), 37–49, (February 2003).
[12] H. Katsuno and A. O. Mendelzon, 'Propositional knowledge base revision and minimal change', Artif. Intell., 52(3), 263–294, (1991).
[13] S. Parsons and P. Giorgini, 'An approach to using degrees of belief in BDI agents', in Information, Uncertainty, Fusion, eds., B. Bouchon-Meunier, R. R. Yager, and L. A. Zadeh, Kluwer, Dordrecht, (1999).
[14] A. S. Rao and M. P. Georgeff, 'Modeling rational agents within a BDI-architecture', in Proceedings of KR'91, pp. 473–484, (1991).
[15] S. Shapiro, Y. Lespérance, and H. J. Levesque, 'Goal change', in Proceedings of IJCAI'05, pp. 582–588, (2005).
[16] J. Thangarajah, L. Padgham, and J. Harland, 'Representation and reasoning for goals in BDI agents', in Proceedings of CRPITS'02, pp. 259–265, (2002).
[17] R. H. Thomason, 'Desires and defaults: A framework for planning with inferred goals', in Proceedings of KR'00, pp. 702–713, (2000).
[18] M. Birna van Riemsdijk, Cognitive Agent Programming: A Semantic Approach, Ph.D. dissertation, University of Utrecht, 2006.
[19] L. A. Zadeh, 'Fuzzy sets', Information and Control, 8, 338–353, (1965).
Adaptive play in Texas Hold'em Poker
Raphaël Maîtrepierre, Jérémie Mary and Rémi Munos1
Abstract. We present a Texas Hold'em poker player for limit heads-up games. Our bot is designed to adapt automatically to the strategy of the opponent and is not based on Nash equilibrium computation. The main idea is to design a bot that builds beliefs on its opponent's hand. A forest of game trees is generated according to those beliefs and the solutions of the trees are combined to make the best decision. The beliefs are updated during the game according to several methods, each of which corresponds to a basic strategy. We then use an exploration-exploitation bandit algorithm, namely UCB (Upper Confidence Bound), to select a strategy to follow. This results in a global play that takes into account the opponent's strategy, and which turns out to be rather unpredictable. Indeed, if a given strategy is exploited by an opponent, the UCB algorithm will detect it using change point detection, and will choose another one. The resulting program, called Brennus, participated in the AAAI'07 Computer Poker Competition in both the online and equilibrium competitions and ranked eighth out of seventeen competitors.
1 INTRODUCTION
Games are an interesting domain for Artificial Intelligence research. Computer programs are better than humans in many games, including Othello, chess [10] and checkers [14]. Those games are perfect information games, in the sense that all information useful for predicting the outcome of the game is common knowledge of all players. Moreover, those games are deterministic. On the contrary, Poker is an incomplete-information and stochastic game: players know neither the cards their opponents are holding nor the community cards remaining to come. These aspects make Poker a challenging domain for AI research [7]. Thanks to Nash's work [13] we know that an equilibrium strategy (Nash equilibrium) exists for the two-player game of Poker. Now, since this is a zero-sum game, if a player plays according to this strategy, on average he will not lose. But a good Poker player should also be able to adapt to his opponent's game in order to exploit possible weaknesses, since the goal of Poker is to win the maximum of chips. Studies have been done in the domain of opponent exploitation for the game of RoShamBo [3, 4]. For this game, a Nash equilibrium consists in playing all three actions (Rock, Paper, Scissors) uniformly at random; this strategy does not lose (on average) since it is unpredictable, but it does not win either! Indeed, against a player who always plays "Paper", for example, the expected payoff of the Nash strategy is null (1/3 lose, 1/3 win, 1/3 draw), whereas a player who could exploit his opponent would quickly find out that playing "Scissors" all the time is the best decision.
1 INRIA Lille Nord Europe, France, email: {raphael.maitrepierre,jeremie.mary,remi.munos}@inria.fr
In the game of poker, this idea remains true to some extent: if an opponent bluffs too often we should call him more often and, on the other hand, if he rarely bluffs, we should be very careful. It therefore appears necessary to model our opponent's strategy if we want to exploit his weaknesses and maximize our income. In the last few years, Poker research has received a large amount of interest. One of the first approaches to building a Poker-playing bot was based on simulations [8]. The logical next step was to compute Nash equilibria for Poker. In a zero-sum game, it is possible to compute an equilibrium of the sequence form of the game using linear programming methods. But in Poker the state space is so huge that one needs to use abstraction methods (which gather similar states) before solving the sequence-form game. Such powerful methods have been used in [5, 11, 12, 16] to reduce the size of the game and compute a near-optimal Nash equilibrium. Some research has also been conducted on opponent modeling for game-tree search [6], and the resulting program, named Vexbot, is available in the Poker-Academy software. In this paper, we present a new method for building an adaptive Poker bot based on belief updates and strategy selection. Our contribution is two-fold. First, for opponent modeling we consider a belief update on the opponent's hand based on Bayes' rule, which, combined with different opponent models, yields different basic strategies. Second, we consider a strategy selection procedure based on a bandit algorithm (namely the Upper Confidence Bounds algorithm introduced in [2]) which performs a good trade-off between exploitation (choosing a strategy that has performed well against the opponent) and exploration (trying another, apparently sub-optimal, strategy in order to learn more about the opponent). The paper is organized as follows: after briefly recalling the rules of Hold'em Poker, we present our contributions in Section 3, with the description of the forest of game trees, the belief update rule for opponent modeling, and the bandit algorithm for strategy selection. We conclude with experimental results.
2
Rules of the game
In this paper we consider the two-player version of Texas Hold'em Poker called heads-up. A good introduction to the rules can be found in [7]. The betting structure used is limit poker; this is the structure used in the AAAI Computer Poker Competition. A hand of Hold'em consists of four stages, each one followed by a betting round. The game begins with two forced bets called the blinds. The first player puts in the small blind (half a small bet) and the other player puts in the big blind (one small bet). Each player is dealt two hidden cards and a first betting round occurs; this is the preflop stage. Then, three community cards (called the board) are dealt face up; this is the flop stage. In the third stage, the turn, a fourth card is added to
the board. A last card is added to the board in the river stage. After the last betting round the showdown occurs: the remaining players compare their hands (their hole cards), and the player with the best five-card combination formed with his two cards and the community cards wins the pot (the amount of chips bet by all players). In limit Poker, two sizes of bet are used: in the first two stages the bet is called the small bet, and in the last two it is called the big bet and is worth two small bets. In the betting rounds, the player who acts has three possibilities:

• He may fold, in which case he loses the game and the chips he has put in.
• He may call, in which case he puts in the same amount of chips as his opponent; if no chips have been bet in the round, this action is called check.
• He may raise, in which case he puts in one bet more than his opponent has bet; if no chips have been bet, this action is called bet.

3
OUR APPROACH
The approach studied in this paper is close to the human way of playing poker: our bot tries to guess which hands his opponent may hold, based on the opponent's previous decisions in the game. For that purpose, we assign to each possible hand of the opponent the probability that he holds this hand given what he has played before. This association of hands with probabilities represents the beliefs of our bot, and is saved in a table. These probabilities are updated after each action taken by the opponent using a simple Bayes rule (see subsection 3.2). Then, given those beliefs, we compute a "forest" of Min-Max trees (where each tree corresponds to a possible hand assignment to both players, based on the current beliefs of our bot about his opponent) to evaluate the current situation, and we make our decision based on a weighted combination of the solutions returned by the Min-Max trees. We describe this step in the next subsection. This method is used to make decisions after the flop; for preflop play we use precalculated tables from [15].
3.1
Forest of Game Trees
A forest of trees is composed of a set of Min-Max game trees where each game tree is defined by two couples of hands, one for each player. A couple of hands represents a player's point of view: his real hand and his belief about his opponent's hidden cards. For the AI player, the real hand is the two actual hidden cards dealt to him, and the opponent's hands are chosen randomly according to the current belief table of probabilities about his opponent's cards. For the opponent player, his real hand is chosen (randomly) according to the belief table (independently of the choice of the AI opponent's hand) and his belief about our bot's hand is generated uniformly at random (i.e., currently, there is no model of the opponent's belief about our bot's cards). The beliefs about the opponent's hands are fixed within a tree. To each leaf and node of each game tree, 2 values are assigned, one for each player (Vp1 and Vp2). One represents the expected outcome from the point of view of the AI bot: the result of the game between his hand and the current belief about his opponent (Vp1). The other value is the expected outcome from the point of view of the opponent (Vp2). Since the possible hands of our opponent have different probabilities, we build a "forest" of such trees in order to evaluate the current situation. Once all the game trees have been solved, the value of each possible action is given by the convex combination of the values of all trees, weighted by the probability of the hands used in the trees (the belief that each tree corresponds to the true situation). Those probabilities are given by the beliefs table. Each game tree is solved as follows. There are 3 kinds of nodes:

• Action nodes: nodes representing actions of players.
• Chance nodes: nodes representing chance events (cards dealing).
• Leaves: nodes representing the end of the game. The value of a leaf depends on the last selected action:

• If it is a "fold", the value corresponding to the player who made this action is 0, while his opponent's value equals the amount of chips in the pot at this point of the tree.
• If it is a "call", the value of each player is the amount of chips won (or lost) against his opponent's hand.

Now, concerning the computation of action node values: the value of the active player (the one who takes an action in that node) is defined as the maximal value (for the same player) of the 3 children nodes (corresponding to the 3 possible actions) minus the cost of the action (the amount of chips added to the pot by the action). His opponent's value is the value of the child corresponding to the action chosen by the active player. Figures 1 and 2 illustrate the action node value update. In case of equality, when choosing the max value for the active player, we choose the most aggressive action (i.e. "raise" rather than "call", "call" rather than "fold"), the reason being that in heads-up poker, playing aggressively is usually better than playing passively.
Figure 1. Update of action nodes: here the active player is player 1 (black nodes) and his value is Vp1. The values shown on the edges are the children node values minus the corresponding action cost. The active player chooses the action corresponding to the maximum edge value; here the rightmost action is chosen (Vp1 = 40). His opponent's value (Vp2) is the value corresponding to the action chosen by player 1: Vp2 = 10.
The players' values at a chance node are the means over the children of the node. For example, at the "turn" stage in a two-player game, there are 45 remaining possible cards to be dealt, so the values for the players are:

Vp1 = (1/45) Σi=1..45 Vp1^i ;  Vp2 = (1/45) Σi=1..45 Vp2^i
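To make the two-value solving procedure concrete, here is a minimal Python sketch of it (ours, not the authors' code); the node structure (kind, children, active player, action costs) is an illustrative assumption.

def solve(node):
    """Return (Vp1, Vp2) for a game-tree node."""
    if node.kind == "leaf":
        return node.values                       # fixed by the fold/call rules above
    if node.kind == "chance":
        vals = [solve(child) for child in node.children]
        n = len(vals)                            # e.g. 45 possible turn cards
        return (sum(v[0] for v in vals) / n,
                sum(v[1] for v in vals) / n)
    # action node: children maps 'raise'/'call'/'fold' to subtrees; iterating
    # in this aggressive-first order with a strict '>' breaks ties as described
    me = node.active - 1                         # index of the active player
    best_val, best_pair = None, None
    for action in ("raise", "call", "fold"):
        if action not in node.children:
            continue
        pair = solve(node.children[action])
        mine = pair[me] - node.cost[action]      # subtract the chips this action adds
        if best_val is None or mine > best_val:
            best_val, best_pair = mine, pair
    result = list(best_pair)
    result[me] = best_val                        # the opponent keeps the child value
    return tuple(result)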
Since computing whole trees takes too long for online play, we use an approximation for computing tree values at chance nodes: instead
of computing chance node values nine times at each stage of the game (once for each sequence of actions leading to a next stage), we compute the values for the first chance node encountered, and for chance nodes in the same subtree (resulting from the same card dealt) we use the value of the first chance node, modified to account for the change of pot size.

Figure 2. Here the active player is player 2 (white nodes); he chooses the max value among the Vp2 (this corresponds to the second action). The corresponding Vp1 and Vp2 are updated.
3.2
Belief Update
The AI's beliefs about his opponent's hidden cards (H) are the probabilities that he really holds these cards given his past actions. At the beginning of a hand2, each possible couple of cards is assigned a uniform probability, since no information has been revealed by our opponent yet. After each action of our opponent, we update those beliefs according to a model of play of the opponent, expressed in terms of the probabilities of choosing an action given his game. Actually, we consider several possible such models, each of which defines a specific style of play. A model of play of our opponent is defined by the probabilities P(a|H, It) of choosing an action a given his hidden cards H and the information set It, where It represents all the information available to both players at time t (e.g. the flop, the bets of the players up to time t, ...). Now, once the opponent has chosen an action a at time t, the beliefs P(H|It) about his hidden cards H are updated according to Bayes' rule: P(H|It) = P(H|It−1)P(a|H, It−1), where It = It−1 ∪ {a}.
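A minimal sketch (ours) of this belief-table update follows; the function model approximates P(a|H, It), and we renormalize the table explicitly, which the formula above leaves implicit.

def update_beliefs(beliefs, action, info, model):
    """beliefs: dict mapping each candidate hand H to P(H | I_{t-1})."""
    for hand in beliefs:
        beliefs[hand] *= model(action, hand, info)   # Bayes' rule, unnormalized
    total = sum(beliefs.values())
    if total > 0:                                    # renormalize to a distribution
        for hand in beliefs:
            beliefs[hand] /= total
    return beliefs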
2 Here, hand means the game from preflop to showdown, if it occurs.
Thus a simple belief update is performed after each action, based on a model of play P(a|H, It) of the opponent. We now explain our choice of such models. To define the style (or model) of play, Poker theorists [9] usually consider two attributes:
• whether the player plays tight (plays very few hands) or loose (plays a lot of hands);
• whether the player is aggressive (raises and bluffs) or passive (calls other players' bets).
We selected three features to define the relevant properties of a game state. The first one is the stage S of the game (Flop, Turn, River), since it determines the size of the bets and the number of community cards still to come. The second one is the hand strength F (probability of winning) of the hand. The third one is the size of the pot C, since it greatly influences the way of playing. We thus model the strategy of the opponent P(a|H, It) using these three features (F, C, S) of H and It, and write P(a|F, C, S) for the corresponding model (an approximation of P(a|H, It)). In our implementation, we model two basis strategies: one is tight/aggressive and the other is loose/passive. A model is defined as follows: for each possible stage of the game (Flop, Turn, River) we have a table that gives the probability of choosing each action as a function of the hand strength and the pot size. Hand strength is discretized into 5 possible values and pot size is discretized in steps of 2 big blinds. For example, at the Flop stage the table is composed of 5 × 4 = 20 values. Tables at the other stages are bigger since the maximum size of the pot is bigger. Those tables were generated using expert knowledge. They are not detailed in the paper for space reasons but are available at http://sequel.futurs.inria.fr/maitrepierre/basis-strategies-tables. At the beginning we only consider one strategy (tight/aggressive); after several hands against an opponent, we are able to identify some weaknesses in our strategy, so we add new strategies. A new strategy is a convex combination of the two basis strategies (see the sketch below). For example, since the initial strategy is very tight, adding a looser strategy and selecting which one to use (by the method described in the next section) improves the global behavior. Figure 3 shows the improvement brought by adding new strategies in games against Vexbot [6]. In the version that participated in the AAAI'07 competition we considered 5 different strategies built from the 2 basis strategies.
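A sketch (ours) of building such a combined strategy; the table layout, with keys (S, F, C) and values that are action distributions, is an assumption for illustration.

def mix_strategies(model1, model2, alpha):
    """Return P(a | F, C, S) = alpha * model1 + (1 - alpha) * model2."""
    mixed = {}
    for key, probs1 in model1.items():           # key = (stage, strength, pot)
        probs2 = model2[key]
        mixed[key] = {a: alpha * probs1[a] + (1 - alpha) * probs2[a]
                      for a in probs1}           # actions: fold / call / raise
    return mixed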
3.3
Strategy Selection
We have seen in the previous section that the different styles of play attributed to the opponent yield different belief updates, which in turn define different basic strategies. We now have to select a good one. To do that we use a bandit algorithm called UCB (for Upper Confidence Bounds), see [2]. This algorithm allows us to find a good trade-off between exploitation (using what is believed to be the best strategy) and exploration (selecting another, apparently sub-optimal, strategy in order to get additional information about the opponent). The UCB algorithm works by defining a confidence bound for each possible strategy and selecting the strategy that has the highest upper bound. In our version we use a slightly modified variant of the algorithm, named UCB-tuned [1], which takes into account the empirical variance of the obtained rewards. For strategy i, the bound is defined as:

Bi(n) = σi √(2 ln n / ni)

where:
• n is the number of hands played;
• ni is the number of times strategy i was played;
• σi is the empirical standard deviation of the rewards of strategy i.

The UCB(-tuned) algorithm consists in selecting the strategy i which has the highest upper bound x̄i(n) + Bi(n), where x̄i(n) is the average reward won by strategy i up to time n. This version of UCB assumes that the rewards corresponding to each strategy are independent and identically distributed samples of fixed random variables. However, in Poker, our opponent may change his style of play and search for a counter-strategy which adapts to ours. In order to detect possible changes in the opponent's strategy, we combine the UCB selection policy with a change-point detection technique, which should detect an abrupt decrease in the rewards when using the best strategy (this would correspond to an adaptation of the opponent to our strategy). For this purpose, we define a lower bound on each strategy:

Li(n) = x̄i(n) − Bi(n),

and we compute the moving average of the rewards, written x̄i(n − 200 : n), on a window corresponding to the last 200 hands played with each strategy. We say that there is a change-point detection if, for the current best strategy i, it happens that x̄i(n − 200 : n) ≤ Li(n) (i.e. the average reward obtained over a certain time period is actually worse than the current lower bound on the expected reward). In that case we give the interpretation that this strategy is starting to be less effective against the opponent (the opponent adapts to it), and we decide to forget the period when strategy i was the best, recomputing the bounds and the average rewards for each strategy over the last 200 hands only. Change-point detection is illustrated in Figure 4: near the 370th hand, the average income of strategy 1 has decreased below the lower confidence bound, so we recompute new averages and bounds.

Figure 3. Performance of one, four, and five strategies against Vexbot (which is an adaptive bot). We observe that the resulting meta-strategy is stronger because it adapts automatically to the opponent and is less predictable.

Figure 4. Change-point detection. After hand 370 the average reward of strategy 1 goes under UCB's bound, so the history of hands is reset and we recompute new bounds for each strategy.
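A compact sketch (ours) of this selection-and-reset loop; the stats bookkeeping and variable names are our assumptions, not the authors' implementation, and every strategy is assumed to have been played at least once.

import math

def ucb_bound(sigma_i, n, n_i):
    return sigma_i * math.sqrt(2 * math.log(n) / n_i)

def select_strategy(stats):
    """stats[i]: dict with count 'n', running 'mean', std 'std', 'rewards' list."""
    n = sum(s["n"] for s in stats)
    return max(range(len(stats)),
               key=lambda i: stats[i]["mean"]
                             + ucb_bound(stats[i]["std"], n, stats[i]["n"]))

def change_point(stats, i):
    """True if the 200-hand moving average of strategy i fell below L_i(n)."""
    recent = stats[i]["rewards"][-200:]
    if len(recent) < 200:
        return False
    n = sum(s["n"] for s in stats)
    lower = stats[i]["mean"] - ucb_bound(stats[i]["std"], n, stats[i]["n"])
    return sum(recent) / 200 <= lower   # if so: reset history to the last 200 hands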
4
Numerical results
We tested our bot against Sparbot [5] and Vexbot [6], which are the current best bots in limit heads-up Poker, as well as against an AlwaysCall bot and an AlwaysRaise bot (two deterministic bots which always play the same action). The tests were sessions of 1000 hands, and we tested our bot against each opponent over 10 sessions. Vexbot's and our bot's memories were reset after each session. Results are presented in Table 1.

              Our bot  Vexbot  Sparbot  AlwCall  AlwRaise
Our bot          —     +0.05    +0.02    +1.01    +1.87
Vexbot        -0.05      —     +0.056    +1.04    +2.98
Sparbot       -0.02   -0.056      —      +0.47    +1.34
AlwaysCall    -1.01    -1.04    -0.47      —       0.00
AlwaysRaise   -1.87    -2.98    -1.34     0.00      —

Table 1. Matches against different bots, over 10 sessions of 1000 hands. Results are expressed in small blinds won per hand for the row player against the column player.

We have studied UCB's behavior over the course of a match against Vexbot; studying this match seems the most interesting to us since Vexbot is the only bot which adapts to his opponent's behavior. Figure 5 shows the use of the different strategies over the match. We can see that some strategies are favored over others during certain periods: between hands 1500 and 2500, strategies 1 and 2 are used very often; afterwards, strategies 3 and 4 are used over the following 1000 hands. This shows our opponent's capacity for adaptation, and the fact that UCB, thanks to change-point detection, detects this adaptation and changes the current strategy. We can see that strategy 5 is rarely used during the match, but its addition improves the performance of our bot: Figure 3 shows the performance difference before and after the addition of strategy 5. We must keep in mind that UCB not only performs a choice over the strategies but also gives us a strategy which is a mix of the basic ones. So Vexbot defeats all our basic strategies but is defeated by the meta-strategy. Also note that Sparbot, which plays a pseudo-equilibrium, is defeated. This is very interesting because equilibrium players, since they have no weaknesses to exploit, are a nightmare for adaptive play. It means that, even without taking care to compute an equilibrium play over our basis strategies, the meta-strategy can adapt to end up not so far from an equilibrium.
Figure 5. These curves show the number of uses of each strategy over the hands played. The reference curve represents a uniform use of each strategy. Plateaus represent periods during which a strategy is not used, whereas slopes show heavy use of it.

We registered our bot for the AAAI'07 Computer Poker Competition. It took part in two competitions, the online learning competition and the equilibrium one. Results can be viewed at http://www.cs.ualberta.ca/~pokert. Even though our approach is able to defeat all the AAAI'06 bots, we did not perform very well in this competition (not in the top 5 bots). We see several reasons for this. Firstly, our approach requires a lot of computer time during the match, so we had to limit the Monte Carlo exploration in order to comply with the time limit. Secondly, the strategies of the top competitors are really very close to a Nash equilibrium, and as our different strategies are not computed to be Nash equilibria, our aggressive play is defeated. In fact, in a future version we think that the meta-strategy obtained by uniformly choosing one of our basis strategies should be made to approach a Nash equilibrium. Doing so would ensure that we do not lose chips during the exploration stage, because at the very beginning UCB performs a near-uniform exploration of all strategies. Moreover, it would offer a good response to a Nash equilibrium player: another Nash equilibrium. Another future improvement will be to update the expectations of several arms of the UCB at the same time. This is possible because there are correlations between the rewards of the arms: if a slightly aggressive style does not work, a very aggressive one will probably fail too. This will allow us to add more basic strategies and to make more subtle attempts at exploitation.

5
CONCLUSIONS
We presented a Texas Hold'em limit poker player which adapts its playing style to its opponents. It combines belief update methods to obtain different strategies. The use of the UCB algorithm enables fast adaptation to modifications of the opponent's playing style. For human players, the resulting bot seems more pleasant to play against than equilibrium ones, since it tries different strategies against its opponent. Moreover, due to the UCB selection, the style of play varies very quickly, which sometimes gives the illusion that the computer tried to trap the opponent. Using different strategies and choosing the right one depending on the opponent's playing style seems to be a promising idea and should be adapted to multi-player games.

REFERENCES
[1] J.-Y. Audibert, R. Munos, and C. Szepesvári. Use of variance estimation in the multi-armed bandit problem. NIPS Workshop on On-line Trading of Exploration and Exploitation, Vancouver, 2006.
[2] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer, ‘Finite-time analysis of the multiarmed bandit problem’, Machine Learning, 47(2/3), 235–256, (2002).
[3] D. Billings, ‘The first international RoShamBo programming competition’, The International Computer Games Association Journal, 1(23), 42–50, (2000).
[4] D. Billings, ‘Thoughts on RoShamBo’, The International Computer Games Association Journal, 1(23), 3–8, (2000).
[5] D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker, 2003.
[6] D. Billings, A. Davidson, T. Schauenberg, N. Burch, M. Bowling, R. Holte, J. Schaeffer, and D. Szafron, ‘Game-tree search with adaptation in stochastic imperfect-information games’, Computers and Games: 4th International Conference, 21–34, (2004).
[7] Darse Billings, Aaron Davidson, Jonathan Schaeffer, and Duane Szafron, ‘The challenge of poker’, Artificial Intelligence, 134(1-2), 201–240, (2002).
[8] Darse Billings, Lourdes Pena, Jonathan Schaeffer, and Duane Szafron, ‘Using probabilistic knowledge and simulation to play poker’, in AAAI/IAAI, pp. 697–703, (1999).
[9] Doyle Brunson, Super System: A Course in Power Poker, Cardoza Publishing, 1979.
[10] Murray Campbell, A. Joseph Hoane Jr., and Feng-hsiung Hsu, ‘Deep Blue’, Artificial Intelligence, 134, 57–83, (2002).
[11] Andrew Gilpin and Tuomas Sandholm, ‘Finding equilibria in large sequential games of imperfect information’, in EC ’06: Proceedings of the 7th ACM Conference on Electronic Commerce, pp. 160–169, New York, NY, USA, (2006). ACM.
[12] Andrew Gilpin and Tuomas Sandholm, ‘Better automated abstraction techniques for imperfect information games, with application to Texas Hold’em poker’, in AAMAS ’07: Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 1–8, New York, NY, USA, (2007). ACM.
[13] J. F. Nash. Equilibrium points in n-person games, 1950.
[14] J. Schaeffer and R. Lake. Solving the game of checkers, 1996.
[15] A. Selby. Optimal strategy for heads-up limit hold’em.
[16] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione, ‘Regret minimization in games with incomplete information’, in Advances in Neural Information Processing Systems 20, eds., J.C. Platt, D. Koller, Y. Singer, and S. Roweis, MIT Press, Cambridge, MA, (2008).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-463
Theoretical and Computational Properties of Preference-based Argumentation Yannis Dimopoulos1 and Pavlos Moraitis2 and Leila Amgoud3 Abstract. In recent years, argumentation has been gaining increasing interest for modeling different reasoning tasks of an agent. Many recent works have acknowledged the importance of incorporating preferences or priorities in argumentation. However, relatively little is known about the theoretical and computational implications of preferences in argumentation. In this paper we introduce and study an abstract preference-based argumentation framework that extends Dung's formalism by imposing a preference relation over the arguments. Under some reasonable assumptions about the preference relation, we show that the new framework enjoys desirable properties, such as coherence. We also present theoretical results that shed some light on the role that preferences play in argumentation. Moreover, we show that although some reasoning problems are intractable in the new framework, the preference relation appears to have a positive impact on the complexity of reasoning.
1
Introduction
Argumentation has been an Artificial Intelligence keyword for the last fifteen years, especially in sub-fields such as nonmonotonic reasoning [8] and agent technology (e.g. [4]). Argumentation is a promising reasoning model based on the interaction of different arguments for and against some statement. This interaction between arguments is typically based on a notion of attack, which can take different forms according to the form the arguments have. For example, when an argument takes the form of a logical proof, arguments for and against a statement can be put across, and in this case the attack relation expresses logical inconsistency. Argumentation can therefore be considered as a reasoning process involving the construction and evaluation of interacting arguments. Several interesting argumentation frameworks have been proposed in the literature (see e.g. [3, 14, 12]). The majority of these systems are based on the abstract argumentation framework of Dung [8], where no assumption is made about the nature of arguments or the properties of the attack relation (i.e. the attack relation can be any binary relation on the set of arguments). Some recent works have proposed argumentation systems (see e.g. [2, 1, 5]) that are based on a defeat relation (corresponding to the attack relation in Dung's framework) that is composed from a conflict relation on the set of arguments and a preference relation between arguments, reflecting the fact that arguments may not have equal strengths. However, till now, relatively little is known about the
1 University of Cyprus, 75 Kallipoleos Str. 1678, Nicosia, Cyprus
2 Paris Descartes University, 45 rue des Saints-Pères, 75270 Paris, France
3 Paul Sabatier University, 118 route de Narbonne, 31062 Toulouse, France
theoretical and computational properties of abstract preference-based argumentation systems. This paper is an attempt towards understanding the effects of a preference relation on an argumentation system. More precisely, it investigates the impact of the preference relation between arguments within a new abstract argumentation framework. The attack relation is the composition of a conflict relation with the preference relation, both defined on the set of arguments. The framework is abstract and general in the sense that the only assumptions made are that the conflict relation is symmetric and irreflexive, and the preference relation is a partial pre-order (i.e. reflexive and transitive). Under these reasonable and general assumptions, we show that the new framework enjoys desirable properties for an argumentation system, such as coherence. It turns out that the preference relation on the arguments translates into a preference relation on the powerset of these arguments. Moreover, the stable extensions of the preference-based argumentation theories correspond to the most preferred sets of arguments that are conflict-free. We also investigate the computational properties of the new framework and demonstrate that a transitive preference relation on the set of arguments can mitigate the computational burden of some reasoning tasks. Indeed, computing a stable extension of a preference-based argumentation theory can be performed in polynomial time. Furthermore, enumerating all stable extensions of such a theory without incomparability between arguments can be carried out with polynomial delay. Moreover, if in addition the theory does not contain indifferent arguments, finding its unique stable extension is also a polynomial computation. On the negative side, some other reasoning tasks are intractable. More specifically, deciding whether an argument is a credulous conclusion of a preference-based argumentation theory is NP-hard, while deciding whether it is a skeptical one is coNP-hard. The paper is organized as follows. We first review the basics of argumentation as introduced in [8]. Then, we present the abstract preference-based argumentation framework we propose, and investigate some of its properties. We then present algorithms for reasoning in the new framework, along with some complexity results. The last section concludes with some remarks and perspectives.
2
Basics of argumentation
Argumentation is a reasoning model based on the following main steps: i) constructing arguments and counter-arguments, ii) defining the strengths of those arguments, and iii) concluding or defining the justified conclusions. Argumentation systems are built around an underlying logical language and an associated notion of logical consequence, defining the notion of argument. The argument construction is a monotonic process: new knowledge cannot rule out an argument
464
Y. Dimopoulos et al. / Theoretical and Computational Properties of Preference-Based Argumentation
but only gives rise to new arguments which may interact with the first argument. Arguments may be conflicting for different reasons.

Definition 1 (Argumentation system [8]) An argumentation system is a pair T = (A, R), where A is a set of arguments and R ⊆ A × A is an attack relation. We say that an argument a attacks an argument b iff (a, b) ∈ R.

Among all the arguments, it is important to know which arguments to keep for inferring conclusions. In [8], different acceptability semantics have been proposed. The basic idea behind these semantics is the following: for a rational agent, an argument ai is acceptable if he can defend ai against all attacks. All the arguments acceptable to a rational agent are gathered in a so-called extension. An extension must satisfy a consistency requirement and must defend all its elements.

Definition 2 (Conflict-free, Defence [8]) Let B ⊆ A, and ai ∈ A.
• B is conflict-free iff there are no ai, aj ∈ B s.t. (ai, aj) ∈ R.
• B defends ai iff ∀aj ∈ A, if (aj, ai) ∈ R, then ∃ak ∈ B s.t. (ak, aj) ∈ R.

The main semantics introduced by Dung are summarized in the following definition.

Definition 3 (Acceptability semantics [8]) Let B be a conflict-free set of arguments.
• B is admissible iff it defends every argument in B.
• B is a preferred extension iff it is a maximal (w.r.t. ⊆) admissible extension.
• B is a stable extension iff it is a preferred extension that attacks every argument in A \ B.

Now that the acceptability semantics are defined, we are ready to define the status of any argument.

Definition 4 (Argument status) Let T = (A, R) be an argumentation system, and E1, . . . , Ex its stable extensions, with x ≥ 1. Let a ∈ A.
• a is a skeptical conclusion of T iff a ∈ Ei for every i = 1, . . . , x.
• a is a credulous conclusion of T iff ∃Ei such that a ∈ Ei.
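For concreteness, these definitions can be checked directly on a finite system; the sketch below is ours, with A and B as Python sets and R as a set of attack pairs, and it uses the standard characterization of stable extensions as conflict-free sets that attack every outside argument.

def conflict_free(B, R):
    return not any((a, b) in R for a in B for b in B)

def defends(B, a, A, R):
    # every attacker of a is in turn attacked by some member of B
    return all(any((c, b) in R for c in B)
               for b in A if (b, a) in R)

def is_admissible(B, A, R):
    return conflict_free(B, R) and all(defends(B, a, A, R) for a in B)

def is_stable(B, A, R):
    # conflict-free and attacking every argument outside B
    return conflict_free(B, R) and all(
        any((a, b) in R for a in B) for b in A - B)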
3
A Preference-based Argumentation Framework
In [1] the basic argumentation framework of Dung was extended into preference-based argumentation theory (PBAT). The basic idea of a PBAT is to consider two binary relations between arguments:
1. A conflict relation, denoted by C, that is based on the logical links between arguments.
2. A preference relation, denoted by ⪰, that captures the idea that some arguments are stronger than others. Indeed, for two arguments a, b ∈ A, a ⪰ b means that a is at least as good as b.
The relation ⪰ is assumed to be a partial pre-order (that is, reflexive and transitive). The relation ≻ denotes the corresponding strict relation: a ≻ b iff a ⪰ b and not b ⪰ a. The two relations are combined into a unique attack relation, denoted by R, and Dung's semantics are applied to the resulting framework. In what follows, we will study a particular class of PBATs, where the conflict relation C is irreflexive and symmetric.
Definition 5 (Preference-based Argumentation Theory (PBAT)) Given an irreflexive and symmetric conflict relation C and a preference relation ⪰ on a set of arguments A, a preference-based argumentation theory (PBAT) on A is an argumentation system T = (A, R), where (a, b) ∈ R iff (a, b) ∈ C and not b ≻ a.

It follows directly from the definition that if (a, b) ∈ C, a ⪰ b and not b ⪰ a, then (a, b) ∈ R. Moreover, if (a, b) ∈ C and a, b are either indifferent or incomparable in ⪰, then (a, b) ∈ R and (b, a) ∈ R. Also note that if (a, b) ∈ C, then either (a, b) ∈ R or (b, a) ∈ R. Finally, if (a, b) ∈ R and (b, a) ∉ R, then a ≻ b. The following example illustrates some features of PBATs.

Example 1 Let A = {a, b, c, d} be a set of arguments, and C the conflict relation on A defined as C = {(a, b), (b, a), (b, c), (c, b), (c, d), (d, c)}. Moreover, let the preference relation ⪰ contain the transitive closure of the set of pairs a ⪰ b, b ⪰ c, c ⪰ d, and d ⪰ c. The corresponding PBAT is T = (A, R), where R = {(a, b), (b, c), (c, d), (d, c)}. Theory T has two stable extensions, E1 = {a, c} and E2 = {a, d}.

We note here that, although it seems that combining the conflict and preference relations can be done in many different ways other than the one proposed in Definition 5, all of these combinations lead to counterintuitive results and properties. A detailed analysis of these possibilities will appear in an extended version of this paper.
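A small sketch (ours) of Definition 5 and Example 1 follows, where the pre-order is represented as the set of pairs (x, y) with x ⪰ y; the closure helper, which adds reflexivity and transitivity, is our own.

def closure(pairs, elems):
    geq = set(pairs) | {(x, x) for x in elems}
    changed = True
    while changed:
        changed = False
        for (x, y) in list(geq):
            for (u, v) in list(geq):
                if y == u and (x, v) not in geq:
                    geq.add((x, v))
                    changed = True
    return geq

def strictly_preferred(x, y, geq):
    return (x, y) in geq and (y, x) not in geq          # x ≻ y

def build_attacks(C, geq):
    # Definition 5: a attacks b iff they conflict and b is not strictly better
    return {(a, b) for (a, b) in C if not strictly_preferred(b, a, geq)}

A = {"a", "b", "c", "d"}
geq = closure({("a", "b"), ("b", "c"), ("c", "d"), ("d", "c")}, A)
C = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "d"), ("d", "c")}
print(build_attacks(C, geq))   # {('a','b'), ('b','c'), ('c','d'), ('d','c')}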
4
Basic Properties of PBATs
In this section we present some basic properties of PBATs. To facilitate the discussion and the presentation of the results of this section, as well as those of other parts in the remainder of this paper, we use some basic notions from graph theory. Indeed, as with every binary relation on a set, an argumentation system T is associated with a directed graph (digraph) GT whose nodes are the different arguments, and whose edges represent the attack relation defined on them. The identification of graph-theoretical structures has led to useful results regarding the properties of argumentation systems (e.g. [9]). Let G = (N, E) be a digraph and n ∈ N a node of G. The in-degree of n in G is the number of nodes n′ of G such that (n′, n) ∈ E. A (strongly connected) component C of a digraph G is a maximal subgraph C of G such that for every pair of nodes x, y ∈ C, there is a path from x to y in C. If each component of a digraph G is contracted to a single node, the resulting graph is a directed acyclic one, called the components graph of G. A top component of a digraph G is one that has in-degree 0 in the components graph of G. Our first result characterizes the cycles of the graph of a PBAT.

Proposition 1 Let GT be the graph associated with a PBAT T = (A, R). Every cycle of GT has at least two symmetric edges.

Proof We prove by case analysis that a cycle of GT can have neither zero nor one symmetric edges. Let a1, a2, . . . , an be a cycle of GT. This means that ∀i < n, (ai, ai+1) ∈ R and (an, a1) ∈ R. Let us assume that this cycle has no symmetric edges, i.e. ∀i < n, (ai+1, ai) ∉ R and (a1, an) ∉ R. Since ∀i < n, (ai, ai+1) ∈ R and (ai+1, ai) ∉ R, it holds that ∀i < n, ai ≻ ai+1. By transitivity, a1 ≻ an, meaning (a1, an) ∈ R, a contradiction. Assume now that a1, a2, . . . , an is a cycle of GT such that (an, a1) is the only symmetric edge of the cycle. Assume first that the two arguments an, a1 are incomparable w.r.t. the underlying preference relation ⪰. The transitivity of the preference relation requires that
a1 ⪰ an, which contradicts the incomparability of the two arguments. Assume now that a1 ⪰ an and an ⪰ a1. Since an ⪰ a1 and a1 ⪰ a2, by transitivity an ⪰ a2. On the other hand, we have a2 ⪰ a3, . . ., an−1 ⪰ an, and by transitivity a2 ⪰ an. Hence the cycle must also contain a symmetric edge between a2 and an. Therefore every cycle of GT has at least two symmetric edges.

Doutre [6] has shown that the kernels of the associated graph of an argumentation theory correspond exactly to its stable extensions. A kernel of a directed graph G = (N, E) is a set of nodes K ⊆ N such that (a) K is an independent set, that is, there is no pair of nodes ni, nj ∈ K s.t. (ni, nj) ∈ E or (nj, ni) ∈ E, and (b) for all n ∈ N \ K there is a node n′ ∈ K s.t. (n′, n) ∈ E. Moreover, Duchet [7] proved that every digraph whose cycles all have at least two symmetric edges has a kernel. By combining these two results we obtain the following theorem.

Theorem 1 Every PBAT has a stable extension.

We show now that the graph associated with a PBAT has no elementary cycles of length greater than 2. The notion of elementary cycle is defined as follows.

Definition 6 (Elementary cycle) Let T = (A, R) be a PBAT and X = {a1, . . ., an} be a set of arguments of A. X is an elementary cycle of T iff:
1. ∀i ≤ n − 1, (ai, ai+1) ∈ R and (an, a1) ∈ R;
2. there is no X′ ⊂ X such that X′ satisfies condition 1.

Proposition 2 Let T = (A, R) be a PBAT on an underlying pre-order ⪰. Then, R has no elementary cycle of length greater than 2.

Proof Let a1, . . . , an be arguments of A, with n > 2, and assume that they form an elementary cycle, i.e. ∀i < n, (ai, ai+1) ∈ R, and (an, a1) ∈ R. Since the cycle is elementary, there are no ai, ai+1 such that (ai, ai+1) ∈ R and (ai+1, ai) ∈ R. Thus, ai ≻ ai+1, ∀i < n. Therefore, a1 ≻ a2 ≻ . . . ≻ an ≻ a1, a contradiction.

A direct consequence of the above property is that PBATs have no elementary odd-length cycles. By the results of [10], this implies that PBATs are coherent, i.e., their preferred and stable extensions coincide.

Theorem 2 Every PBAT is coherent.

In the remainder of this section we investigate the impact of the preference relation on an argumentation system. We first define a relation on the powerset of the arguments of a PBAT T = (A, R) (we denote by P(A) the powerset of A), and then show that the stable extensions of T correspond to the most preferred elements of P(A) w.r.t. this relation.

Definition 7 Let T = (A, R) be a PBAT built on an underlying pre-order ⪰. If A1, A2 ∈ P(A), with A1 ≠ A2, then A1 ⊒ A2 iff one of the following holds:
• A1 ⊃ A2;
• for all a, b such that a ∈ A1 \ A2 and b ∈ A2 \ A1, it holds that a ≻ b.

The following result states the relation between ⊒ and stable extensions, and hence sheds some light on the connection between preference and argumentation.
Theorem 3 Let T = (A, R) be a PBAT built on an underlying pre-order ⪰ and a conflict relation C. E is a stable extension of T iff there are no arguments a, b ∈ E s.t. (a, b) ∈ C, and for all A ∈ P(A) such that A ⊒ E, there are a1, a2 ∈ A such that (a1, a2) ∈ C.

Proof Let E be a stable extension of T. Then, by definition, it contains no pair of arguments a, b s.t. (a, b) ∈ R. Hence, E cannot contain arguments a, b s.t. (a, b) ∈ C. We prove by case analysis that for all A ∈ P(A) such that A ⊒ E there exists a pair of arguments a1, a2 ∈ A s.t. (a1, a2) ∈ C. Assume first a set A with A ⊃ E. Since E is a stable extension, for all a ∈ A \ E there is b ∈ E, and because A ⊃ E also b ∈ A, s.t. (b, a) ∈ R. Therefore there exist a, b ∈ A s.t. (a, b) ∈ C. Assume now that A ⊒ E and A ⊅ E. Again, for all a ∈ A \ E there is b ∈ E s.t. (b, a) ∈ R. Since A ⊒ E, by Definition 7 it follows that for all a ∈ A \ E and c ∈ E \ A, it holds that a ≻ c and hence (c, a) ∉ R. Therefore, it must be the case that b ∈ E ∩ A, which means that A contains a pair a, b such that (b, a) ∈ R, and therefore (a, b) ∈ C.

Let now E be a set of arguments that contains no pair of elements a, b s.t. (a, b) ∈ C, and such that for all A ∈ P(A) with A ⊒ E, there are a1, a2 ∈ A such that (a1, a2) ∈ C. We prove that E is a stable extension. We show first that E is admissible. Observe that since E contains no pair of elements a, b s.t. (a, b) ∈ C, it cannot contain a pair a, b s.t. (a, b) ∈ R. Assume that there exist a ∈ E and b ∈ A \ E s.t. (b, a) ∈ R and there is no c ∈ E such that (c, b) ∈ R. Hence b ≻ a. Then define D(b) = {d | (b, d) ∈ R and d ∈ E}, and construct the set E′ = (E \ D(b)) ∪ {b}. Then it is the case that E′ ⊒ E, and furthermore there is no pair a1, a2 ∈ E′ such that (a1, a2) ∈ R, and therefore no pair such that (a1, a2) ∈ C, a contradiction. Assume now that there exists b ∈ A \ E s.t. for all a ∈ E it holds that (a, b) ∉ R. Clearly, (b, a) ∉ R, because otherwise E is not admissible. Then again, E ∪ {b} ⊒ E, and furthermore there is no pair a1, a2 ∈ E ∪ {b} such that (a1, a2) ∈ C, a contradiction.

The example below highlights the link between the relation ⊒ and the stable extensions.

Example 2 Let T = (A, R) be a PBAT with A = {a, b, c} and R composed from the conflict relation C = {(a, b), (b, a), (a, c), (c, a)} and a preference relation that contains the pairs a ⪰ b and a ⪰ c. The relation ⊒ on P(A) induced by ⪰ is depicted in Figure 1. Since the sets {a, b, c}, {a, b}, {a, c} are ruled out by C, the set E = {a} is the stable extension of T.
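Definition 7 translates directly into code; the sketch below (ours) ranks two sets of arguments under ⊒, with the pre-order again given as the set of pairs (x, y) such that x ⪰ y.

def ranks_above(A1, A2, geq):
    """True iff A1 ⊒ A2 in the sense of Definition 7."""
    strict = lambda x, y: (x, y) in geq and (y, x) not in geq   # x ≻ y
    if A1 == A2:
        return False
    if A1 > A2:                      # A1 is a proper superset of A2
        return True
    return all(strict(a, b) for a in A1 - A2 for b in A2 - A1)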
5
Reasoning in PBATs
This section contains a preliminary investigation of the computational properties of the new argumentation framework. We start by presenting the algorithm stable_extension below, which computes a stable extension of a PBAT in polynomial time. Recall that finding a stable extension of a general argumentation system is an intractable task (see e.g. [9]).

stable_extension(A, R)
  A′ = A; E = ∅
  While (A′ ≠ ∅) do
    Compute a top component C of the theory (A′, R)
    Select a node n ∈ C such that for all n′ ∈ A′ with (n′, n) ∈ R it holds that (n, n′) ∈ R
    E = E ∪ {n}
    A′ = A′ − ({n} ∪ {n′ | (n, n′) ∈ R})
  end do
  Return E
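A Python rendering of this algorithm is sketched below; using networkx for the strongly connected components and the components graph is our own choice for brevity, not part of the paper.

import networkx as nx

def stable_extension(A, R):
    G = nx.DiGraph()
    G.add_nodes_from(A)
    G.add_edges_from(R)
    E = set()
    while G.number_of_nodes() > 0:
        comps = nx.condensation(G)                     # components graph (a DAG)
        top = next(c for c in comps.nodes
                   if comps.in_degree(c) == 0)         # a top component
        members = comps.nodes[top]["members"]
        # pick n whose every remaining attacker is counter-attacked
        # (Proposition 1 guarantees such a node exists in a top component)
        n = next(x for x in members
                 if all(G.has_edge(x, y) for y in G.predecessors(x)))
        E.add(n)
        G.remove_nodes_from({n} | set(G.successors(n)))
    return E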
Figure 1. Ranking relation of Example 2, where an edge from A to B means that A ⊒ B.
Notice that by construction the set E returned by the above algorithm does not contain two elements x, y such that (x, y) ∈ R. Moreover, again by construction, for each element x ∈ A that is not included in E, there must be some element y ∈ E such that (y, x) ∈ R. Therefore, the set E returned by the algorithm is a stable extension of the input theory (A, R). The key point of the stable_extension algorithm is that at each iteration it finds a node n in a top component of the input theory such that for all n′ ∈ A′ for which (n′, n) ∈ R, it holds that (n, n′) ∈ R. An informal justification of the existence of such elements is the following. Assume that the algorithm reaches a point where there is a top component C of the theory that contains no node with the above property. This means that for every node n ∈ C there exists some other node n′ ∈ C such that (n′, n) ∈ R and (n, n′) ∉ R. Remove from C all symmetric edges (the edge (x, y) ∈ R is symmetric if (y, x) ∈ R also holds). Then, in the resulting graph all nodes of C must have an incoming edge, which means that C contains a cycle with no symmetric edges, contradicting Proposition 1. Although computing a stable extension of a PBAT can be performed in polynomial time, we prove below that credulous and skeptical reasoning in the new framework are intractable.

Theorem 4 Let T = (A, R) be a PBAT and a ∈ A. Deciding whether a is a credulous conclusion of T is NP-hard.

Proof We prove the claim by a reduction from 3SAT. Let S = {c1, . . ., cn} be a 3SAT theory on a set of clauses c1, . . ., cn. From S we construct a PBAT ST = (A, R). The set of arguments A of ST contains the following elements:
• An argument li for each literal li that appears in S.
• An argument cj for each clause cj of S, 1 ≤ j ≤ n.
• An additional argument t that corresponds to the whole theory S.
The underlying conflict relation C of ST contains the following (symmetric) pairs:
• (li, ¬li), for each argument li that corresponds to a literal li of S;
• (li, cj), if literal li appears in clause cj;
• (ci, t), for 1 ≤ i ≤ n.
Finally, the underlying preference relation ⪰ of ST is defined as ⪰ = {(a, b) | a, b ∈ A, a ≠ b} − {(t, ci) | ci is the argument that corresponds to clause ci}; that is, each argument that corresponds to a clause is preferred to the argument that corresponds to the theory, whereas all other arguments are indifferent to each other. Therefore, R coincides with its underlying conflict relation, with the only difference that it does not contain the pairs (t, ci), for 1 ≤ i ≤ n. We now prove that S is satisfiable iff ST has a stable (admissible) extension that contains argument t. Let M be a satisfying truth assignment of S. We show that the set of arguments E = M ∪ {t} is an extension of ST. First note that for any pair of arguments ai, aj ∈ E, it holds that (ai, aj) ∉ R. Furthermore, for each ci ∈ A that corresponds to a clause of S, there must be some argument lj ∈ E that corresponds to some literal of S such that (lj, ci) ∈ R (otherwise M is not satisfying). Therefore, E is a stable extension of ST. Let now E be a stable extension of ST such that t ∈ E. We prove that the assignment that corresponds to the arguments of E is a satisfying one for S. This assignment does not contain any pair of complementary literals, because these pairs of literals belong to R. Furthermore, since t ∈ E, it must be the case that ci ∉ E for 1 ≤ i ≤ n. Therefore, for each clause ci of S at least one of its literals must belong to E, and hence the assignment that corresponds to E is satisfying.

Proposition 3 Let T = (A, R) be a PBAT and a ∈ A. Deciding whether a is a skeptical conclusion of T is coNP-hard.

Proof Given a propositional theory S, we construct a PBAT TS = (A, R) in a way similar to that of the previous proof, with the difference that A contains an additional argument t′ such that (t, t′) ∈ C, (t′, t) ∈ C, and t ⪰ t′, t′ ⪰ t. It is not difficult to prove that t′ is a skeptical conclusion of TS iff S is unsatisfiable.
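The construction of the Theorem 4 proof can be sketched as follows; the encoding of literals as signed integers and of arguments as tagged tuples is ours, chosen only for illustration.

def pbat_from_3sat(clauses):
    """clauses: list of tuples of nonzero ints, negation written as -x."""
    lits = {l for c in clauses for l in c} | {-l for c in clauses for l in c}
    A = ({("lit", l) for l in lits}
         | {("cl", j) for j in range(len(clauses))}
         | {("t",)})
    C = set()
    for l in lits:                                   # complementary literals conflict
        C |= {(("lit", l), ("lit", -l)), (("lit", -l), ("lit", l))}
    for j, clause in enumerate(clauses):
        for l in clause:                             # literal vs clause it appears in
            C |= {(("lit", l), ("cl", j)), (("cl", j), ("lit", l))}
        C |= {(("cl", j), ("t",)), (("t",), ("cl", j))}
    # all arguments indifferent except each clause argument preferred to t,
    # so R is C without the pairs (t, c_j)
    R = C - {(("t",), ("cl", j)) for j in range(len(clauses))}
    return A, R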
6
Theories without incomparability
In this section we turn our attention to PBATs without incomparability, i.e. theories T = (A, R) such that for each pair of arguments ai, aj ∈ A, either ai ⪰ aj or aj ⪰ ai. More specifically, we present an algorithm that enumerates all stable extensions of a theory in this class with polynomial delay. An algorithm that enumerates the elements of a set S is said to run with polynomial delay if it computes the first element of the set within time polynomial in the size of the input, and furthermore the time taken by the algorithm between computing two consecutive elements of the set is also bounded by some polynomial in the size of the input. The key property of PBATs without incomparability that is exploited by the stable extension computation algorithm is that the strongly connected components of the graph GT of such a theory T contain only symmetric edges, and therefore these components are essentially undirected (sub)graphs. This useful property is proved in the following result.

Proposition 4 Let T = (A, R) be a PBAT without incomparability, and GT its associated digraph. If a, b ∈ A are arguments that belong to the same component of GT and (a, b) ∈ R, then (b, a) ∈ R.

Proof Let a, b ∈ A be arguments that belong to the same component of GT with (a, b) ∈ R. Therefore (b, a) ∈ C, and a ⪰ b. Since
a, b belong to the same component, there must be a path from b to a. Since there is no incomparability, by transitivity we get that b ⪰ a. From this and the fact that (b, a) ∈ C, we conclude that (b, a) ∈ R.

The kernels (recall that kernels correspond to stable extensions) of a graph that contains only symmetric edges are exactly its maximal (w.r.t. set inclusion) independent sets (MISs). To see this, note that it follows from the definition that every kernel is an MIS. On the other hand, since in this case all edges are symmetric, an MIS is also a kernel. This connection between stable extensions, kernels and MISs allows us to employ well-known procedures that enumerate all maximal independent sets of a graph with polynomial delay [11]. The algorithm all_stable_extensions, presented below, enumerates the stable extensions of the input theory by traversing the theory from its top components downwards. Singleton components are handled separately by the first iteration of the algorithm. To enumerate the elements that belong to stable extensions and at the same time to components with more than one node, the algorithm uses a procedure that performs MIS computation with polynomial delay.

all_stable_extensions(A, R)
  A′ = A; E = ∅
  While (A′ ≠ ∅) do
    While (A′ has nodes with in-degree 0) do
      E = E ∪ {a | a ∈ A′ and a has in-degree 0}
      A′ = A′ − (E ∪ {a′ | a ∈ E and (a, a′) ∈ R})
    end do
    Select a top component C of (A′, R)
    For each MIS M of C, computed with polynomial delay, do
      E = E ∪ M
      A′ = A′ − (M ∪ {a′ | a ∈ M and (a, a′) ∈ R})
      call stable_extension(A′, R)
    end do
  end do
  Return E

It is known [13] that the number of MISs of a graph with n nodes is at most 3^(n/3). Therefore, if a PBAT has m components, each of which has at least 2 and at most k nodes, then the theory has at most 3^(mk/3) stable extensions. Hence, the run time of the algorithm is exponential in mk. For "small" values of m and k, the above algorithm can also be used to perform credulous and skeptical reasoning. The idea is to simply enumerate all stable extensions of the input theory, and terminate as soon as the given argument belongs (credulous reasoning) or does not belong (skeptical reasoning) to one of the stable extensions. Consider now a PBAT T = (A, R) where the underlying preference relation ⪰ contains neither incomparability nor indifference. Then, for all pairs of arguments ai, aj ∈ A, either ai ≻ aj or aj ≻ ai holds, but not both. In this case the graph of T is acyclic and T has exactly one stable extension. The first iteration of the algorithm all_stable_extensions above computes this unique stable extension in polynomial time. Obviously, the same procedure can be used for credulous and skeptical reasoning in this restricted class of PBATs.
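For small components, the MIS computation can be illustrated with the naive enumerator below (ours, with edges given as symmetric pairs); the paper instead relies on the polynomial-delay procedure of [11], which this exponential sketch does not reproduce.

from itertools import combinations

def maximal_independent_sets(nodes, edges):
    nodes = list(nodes)
    def independent(S):
        return not any((u, v) in edges for u in S for v in S)
    ind = [set(s) for r in range(len(nodes) + 1)
           for s in combinations(nodes, r) if independent(s)]
    # keep only the sets with no proper independent superset
    return [S for S in ind if not any(S < T for T in ind)]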
7
Conclusion and Future Work
In this paper we presented an abstract preference-based argumentation framework. Although other works in the literature (see e.g. [2, 1, 5]) have also acknowledged the importance of incorporating preferences in argumentation systems, very little has been said about the theoretical and computational properties of such systems. This paper is a step towards filling this gap, proposing a new preference-based argumentation framework and studying its basic properties. We have shown that the theories of the new framework always have stable extensions and are coherent. We also characterized the structure of preference-based argumentation theories, extending previous works that link argumentation and graph theory (see e.g. [9] for a recent example). Moreover, it seems that the transitivity of the underlying preference relation imposes a strong structure on preference-based argumentation theories that can be exploited computationally. Indeed, some computational problems become easier in the new framework, whereas others remain intractable. There are many directions for future research. We plan to investigate more deeply the structural properties of PBATs and further extend the link with graph theory. Moreover, we intend to study the properties of the relation ⊒ and identify its effects on argumentation. Finally, the computational properties of the new framework will be explored more fully in the future.
ACKNOWLEDGEMENTS We thank one of the reviewers for many helpful comments.
REFERENCES
[1] L. Amgoud and C. Cayrol, ‘On the acceptability of arguments in preference-based argumentation framework’, in Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 1–7, (1998).
[2] L. Amgoud, Y. Dimopoulos, and P. Moraitis, ‘A unified and general framework for argumentation-based negotiation’, in Proc. 7th International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 963–970, ACM Press, (2007).
[3] L. Amgoud and H. Prade, ‘Explaining qualitative decision under uncertainty by argumentation’, in 21st National Conference on Artificial Intelligence, AAAI’06, pp. 16–20, (2006).
[4] T. Bench-Capon and P. Dunne, ‘Argumentation in artificial intelligence’, Artif. Intell., 171(10-15), 619–641, (2007).
[5] T. J. M. Bench-Capon, ‘Persuasion in practical argument using value-based argumentation frameworks’, Journal of Logic and Computation, 13(3), 429–448, (2003).
[6] S. Doutre, Autour de la sémantique préférée des systèmes d’argumentation, PhD thesis, Université Paul Sabatier, Toulouse, France, 2002.
[7] P. Duchet, Représentations, Noyaux en Théorie des Graphes et Hypergraphes, PhD thesis, 1979.
[8] P. M. Dung, ‘On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games’, Artificial Intelligence, 77, 321–357, (1995).
[9] P. Dunne, ‘Computational properties of argument systems satisfying graph-theoretic constraints’, Artif. Intell., 171(10-15), 701–729, (2007).
[10] P. Dunne and T. Bench-Capon, ‘Coherence in finite argument systems’, Artificial Intelligence, 141(1–2), 187–203, (2002).
[11] D. S. Johnson, C. H. Papadimitriou, and M. Yannakakis, ‘On generating all maximal independent sets’, Inf. Process. Lett., 27(3), 119–123, (1988).
[12] A. Kakas and P. Moraitis, ‘Argumentation based decision making for autonomous agents’, in Proc. 2nd International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 883–890, (2003).
[13] J. Moon and L. Moser, ‘On cliques in graphs’, Israel Journal of Mathematics, 3, 23–28, (1965).
[14] H. Prakken and G. Sartor, ‘Argument-based extended logic programming with defeasible priorities’, Journal of Applied Non-Classical Logics, 7, 25–75, (1997).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-468
Norm Defeasibility in an Institutional Normative Framework Henrique Lopes Cardoso and Eugénio Oliveira1 Abstract. Normative environments have been proposed to regulate agent interaction in open multi-agent systems. However, most approaches rely on pre-imposed regulations that agents are subject to. Taking a different stance, we focus on a normative framework that assists agents in establishing their own commitment norms by themselves. With that aim in mind, a model of norm defeasibility is presented that enables exploiting and adapting a normative background to different extents. We formalize the normative state using first-order logic and define rules and norms operating on that state. A suitable semantics regarding the use of norms within a hierarchical context structure is given, based on norm activation conflict and defeasibility.
1
INTRODUCTION
Most approaches regarding the use of norms in multi-agent systems (MAS) have addressed one of two ends of a spectrum. On one end, there are systems where norms are pre-imposed on agents, either with no possible deviation [5] or admitting violations [4, 1]. On the other end, the emergence of social norms from agent interaction is also being addressed [19]. In this paper we consider a midway approach to the use of norms in MAS, where norms are consciously adopted by a group of agents. The Electronic Institution (EI) concept has also been studied with this aim in mind [14, 3]. In particular, in [13] a normative framework has been suggested as a core component of an EI. In the present paper, a normative environment assisting agent-based automated contract establishment is formalized. Agents can exploit a supportive normative framework in order to establish their mutual contracts in a more straightforward fashion: contracts [13] can be underspecified, relying on a structured normative environment that fills in any omissions. We define the notion of normative context and context hierarchies, characterize the normative state, and give a representation of norms. We then formalize normative conflicts in our approach and their resolution based on norm activation defeasibility. From the field of law, three normative conflict resolution principles have been defined and traditionally used. The lex superior is a hierarchical criterion and indicates that a norm issued by a more important legal entity prevails when in conflict with another norm (e.g. the Constitution prevails over any other legal body). The lex posterior is a chronological criterion indicating that the most recent norm prevails. The lex specialis is a specificity criterion establishing that the most specific norm prevails. While not firmly adopting any of these options, our approach resembles most the lex specialis principle, because, broadly speaking, a norm defined at a more specific context will typically prevail.
LIACC, DEI / Faculdade de Engenharia, Universidade do Porto, R. Dr. Roberto Frias, 4200-465 Porto, Portugal, email: {hlc, eco}@fe.up.pt
The paper is organized as follows. Section 2 deals with the normative environment, defines the notion of context and sub-context, describes the normative state and gives a representation for rules and norms. Section 3 is devoted to norm semantics and to the norm activation defeasibility approach. In Section 4 we provide some examples that exploit the usage of the normative environment. Finally, Section 5 discusses related work and Section 6 concludes.
2 AN INSTITUTIONAL NORMATIVE ENVIRONMENT
In this section we define the normative environment and present a context-based normative framework. This framework forms the basis for the norm defeasibility model described in Section 3.

Def. 1: Normative Environment NE = ⟨NS, IR, N⟩
The normative environment NE of an EI is composed of a normative state NS, a set IR of institutional rules (see Def. 7) that manipulate that normative state, and a set N of norms, which can be seen as a special kind of rules (see Def. 8).

The role of institutional rules is to maintain the normative state of the system. While norms define the normative positions of each agent, the main purpose of those rules is to relate the normative state with the standing normative positions (see [12] for the use of rules in monitoring those normative positions).
2.1 Contexts
Our model is based on a contextualization of both the normative state and norms. In this subsection we introduce the notion of context and context organization.

Def. 2: Context C = ⟨PC, CA, CI, CN⟩
A context C is an organizational structure within which a set CA of agents commits to a joint activity partially regulated by a set CN ⊆ N of appropriate norms. A context includes a set CI of contextual info that makes up a kind of background knowledge for that context (see Def. 4). PC is the parent context within which context C is formed. Let PCA be the set of agents in context PC: we have that CA ⊆ PCA.

Contexts allow us to organize norms according to a hierarchical normative structure. Norm set N is partitioned among the several contexts that may exist, that is, the sets CN of the different contexts are mutually disjoint. A norm inheritance mechanism (explained later) justifies why set CN only partially regulates the activity of agents in CA. We identify a top-level context within which all other contexts are (directly or indirectly) formed. We now introduce the notion of sub-context.
Def. 3: Sub-context C′ ⊏ C
Context C′ = ⟨PC′, CA′, CI′, CN′⟩ is a sub-context of context C = ⟨PC, CA, CI, CN⟩, denoted C′ ⊏ C, if PC′ = C or if PC′ ⊏ C. When C′ is either a sub-context of C or C itself, we write C′ ⊑ C. From Def. 2 we also have that CA′ ⊆ CA.

A sub-context defines an organizational structure committed to by a subset of the parent context's agents. Notice that the sub-context relationship is an explicit one. Every context is a sub-context of the top context. We now define contextual information as a foundational component of a context.

Def. 4: Contextual info Info^C
Contextual info Info^C is a fully-grounded atomic formula in first-order logic, which comprises founding information regarding a context C = ⟨PC, CA, CI, CN⟩. Info^C ∈ CI.

The CI component in a context definition is therefore composed of first-order logic formulae that provide background information for that context.
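To make Defs. 2 and 3 concrete, here is a minimal sketch in Python (ours, not from the paper); the class and function names are our own illustration of the tuple ⟨PC, CA, CI, CN⟩ and of the sub-context tests.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    """A context C = <PC, CA, CI, CN> (Def. 2); parent is None for the top context."""
    name: str
    parent: "Context | None" = None       # PC
    agents: frozenset = frozenset()       # CA
    info: frozenset = frozenset()         # CI: fully-grounded atomic formulae
    norms: tuple = ()                     # CN

def is_subcontext(c1: Context, c2: Context) -> bool:
    """C1 strict sub-context of C2 (Def. 3): follow parent links upwards."""
    ctx = c1.parent
    while ctx is not None:
        if ctx is c2:
            return True
        ctx = ctx.parent
    return False

def subcontext_or_self(c1: Context, c2: Context) -> bool:
    return c1 is c2 or is_subcontext(c1, c2)

top = Context("top")
sa3 = Context("sa3:sa", parent=top, agents=frozenset({"jim", "sam", "tom"}))
assert is_subcontext(sa3, top) and subcontext_or_self(sa3, sa3)
```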
2.2 Normative State
The normative state is organized through contexts, and concerns the description of what is taken for granted in a model of institutional reality. Therefore, we call every formula in NS an institutional reality element, or IRE. Each institutional reality element refers to a specific context within which it is relevant.

Def. 5: Contextual institutional reality element IRE^C
A contextual institutional reality element IRE^C is an IRE regarding context C. We distinguish the following kinds of IRE^C, with the following meanings:
• ifact^C(f, t): institutional fact f has occurred at time t
• time^C(t): instant t has elapsed
• obl^C(a, f, t): agent a is obliged to bring about fact f until deadline t
• fulf^C(a, f, t): agent a has fulfilled, at time t, his obligation to bring about f
• viol^C(a, f, t): agent a has violated, at time t, his obligation to bring about f

Note that the use of context C as a superscript is only a syntactical convenience – both contextual info and institutional reality elements are first-order formulae (C could be used as the first argument of each of these formulae). While contextual info is confined to background information that is part of the context definition, contextual institutional reality elements represent occurrences taking place after the context's creation, during its lifetime.

We consider institutional facts as agent-originated, since they are obtained as a consequence of some agent action. The remaining elements are environment events, asserted in the process of norm activation and monitoring [13]. Our model of institutional reality is based on a discrete model of time. The time elements are used to signal instants that are relevant to the context at hand. Obligations are deontic statements, and we admit both their fulfillment and violation.

Def. 6: Normative State NS = {IRE1^C1, IRE2^C2, ..., IREn^Cm}
The normative state NS is a set of fully-grounded atomic formulae IREi^Cj, 1 ≤ i ≤ n, in first-order logic.
The normative state will contain, at each moment, all elements that characterize the current state of affairs in every context. In that sense, NS could be seen as being partitioned among the several contexts, as is the case with norms; however, IRE's are not part of a context's definition, since they are obtained at a later stage, during the context's operation. Some of the IRE's are interrelated: for instance, a fulfillment connects an obligation to bring about a fact with its achievement as an institutional fact. These interrelations are captured with institutional rules.
2.3 Rules and Norms
Given the "contextualization" of the normative state, we are now able to define rules and norms. Institutional rules allow us to maintain the normative state of the system. They are not contextualized, but they operate on contextual IRE's.

Def. 7: Institutional rule R ::= Antecedent → Consequent
An institutional rule R defines, for a given set of conditions, what other elements should be added to the normative state. The rule's Antecedent is a conjunction of patterns of IRE^C (see Def. 5), which may contain variables; restrictions may be imposed on such variables through relational conditions. We also allow the use of negation (as failure):
Antecedent ::= IRE^C | ¬Antecedent | RelCondition | Antecedent ∧ Antecedent
The rule's Consequent is a conjunction of IRE^C which are not deontic statements (IRE^–C), and which are allowed to contain bounded variables:
Consequent ::= IRE^–C | IRE^–C ∧ Consequent

When the antecedent matches the normative state using a first-order logic substitution Θ, and all the relational conditions over variables hold, the atomic formulae obtained by applying Θ to the consequent of the rule are added to the normative state as fully-grounded elements. Besides institutional reality elements, the norms themselves are also contextual.
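To illustrate Def. 7, here is a sketch (ours, under our own encoding, not the authors' implementation): IREs are nested tuples, strings beginning with an upper-case letter act as variables, and a purely conjunctive rule is applied by matching its antecedent against the normative state; negation as failure and relational conditions are left out.

```python
def is_var(t):
    """Variables are strings starting with an upper-case letter, as in the paper's examples."""
    return isinstance(t, str) and t[:1].isupper()

def unify(pattern, fact, theta):
    """Extend substitution theta so that pattern matches fact, or return None."""
    if is_var(pattern):
        if pattern in theta:
            return theta if theta[pattern] == fact else None
        return {**theta, pattern: fact}
    if isinstance(pattern, tuple):
        if not isinstance(fact, tuple) or len(pattern) != len(fact):
            return None
        for p, f in zip(pattern, fact):
            theta = unify(p, f, theta)
            if theta is None:
                return None
        return theta
    return theta if pattern == fact else None

def substitute(pattern, theta):
    """Apply a substitution, yielding a fully-grounded element."""
    if is_var(pattern):
        return theta[pattern]
    if isinstance(pattern, tuple):
        return tuple(substitute(p, theta) for p in pattern)
    return pattern

def apply_rule(antecedent, consequent, ns):
    """One forward-chaining step: add the grounded consequents of every antecedent match."""
    matches = [{}]
    for pat in antecedent:
        matches = [t2 for t in matches for f in ns
                   for t2 in [unify(pat, f, t)] if t2 is not None]
    return ns | {substitute(c, t) for t in matches for c in consequent}

ns = {("ifact", ("order", "tom", "r1", 5, "jim"), 1)}
ant = [("ifact", ("order", "A1", "Res", "Qt", "A2"), "T")]
con = [("time-of-order", "A2", "T")]   # a made-up, non-deontic consequent
print(apply_rule(ant, con, ns))        # adds ("time-of-order", "jim", 1)
```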
Def. 8: Norm N^C ::= Situation^C′ → Prescription^C′
A norm N^C is a rule with a deontic consequent, defined in a specific context C. The norm is applicable to a context C′ ⊑ C. The norm's Situation^C′ is a conjunction of patterns of Info^C′ and IRE^–C′ (no deontic statements). Both kinds of patterns are allowed to contain variables; restrictions may be imposed on such variables through relational conditions:
Situation^C′ ::= Info^C′ | IRE^–C′ | RelCondition | Situation^C′ ∧ Situation^C′
The norm's Prescription^C′ is a (possibly empty) conjunction of deontic statements (obligations) which are allowed to contain bounded variables and are assigned to the same context C′:
Prescription^C′ ::= ε | OblConj^C′
OblConj^C′ ::= obl^C′(...) ∧ OblConj^C′ | obl^C′(...)
Conceptually, the norm's Situation^C′ can be seen as being based on two sets of elements: background (Sb) and contingent (Sc). Background elements are those that exist at context creation (the founding contextual info), while contingent elements are those that are added to the normative state at a later stage. This distinction will be helpful when describing norm semantics.
Observe the distinction between the context where the norm is defined, and the context to which the norm applies. While, in order to make the model as simple as we can, we define a norm as being applicable to a specific context, in Section 3.1 we relax this assumption, which will in part clarify the usefulness of the model.
3 NORM SEMANTICS
In this section we define the semantics of norms and formalize a model for norm defeasibility in the ambit of a supportive normative framework. We start by exploring norm applicability according to the normative state. For that, we make use of the notion of substitution in first-order logic. We denote by f·Θ the result of applying substitution Θ to atomic formula f.

Def. 9: Norm activation
Norm N^C = S^C → P^C, applicable to a context C′ = ⟨PC′, CA′, CI′, CN′⟩, is activated if there is a substitution Θ such that:
• ∀c ∈ Sc: c·Θ ∈ NS, where Sc is the set of contingent conjuncts (IRE^–C′ patterns) in S^C; and
• ∀b ∈ Sb: b·Θ ∈ CI′, where Sb is the set of background conjuncts (Info^C′ patterns) in S^C; and
• all the relational conditions in S^C over variables hold.

We are now able to define the notion of conflicting norm activations, as follows.

Def. 10: Norm activation conflict
Let Act1 be the activation of norm N1^C1 = S1^C1 → P1^C1 obtained with substitution Θ1, and Act2 the activation of norm N2^C2 = S2^C2 → P2^C2 obtained with substitution Θ2. Let NS1 = {c·Θ1 | c ∈ Sc1} and NS2 = {c·Θ2 | c ∈ Sc2}, where Sc1 and Sc2 are the sets of contingent conjuncts of S1^C1 and S2^C2, respectively. Both NS1 and NS2 represent fractions of the whole normative state NS. Norm activations Act1 and Act2 are in conflict if NS1 = NS2 and either C1 ⊏ C2 or C2 ⊏ C1.

Succinctly, we say there is a norm activation conflict if we have two applicable norms activated with the same fraction of the normative state and defined in different contexts. Notice that the fact that both norms are activated with the same contextual IRE's already dictates that the norm contexts, if different, have a sub-context relationship (there is no multiple inheritance mechanism in our normative structure). This becomes clearer when taking into account the sub-context (Def. 3) and norm (Def. 8) definitions: a context has a single parent context, and a norm N^C applies to a context C′ ⊑ C. In principle, all norm activations are defeasible, according to the following definition.

Def. 11: Norm activation defeasance
Norm activation Act1 for norm N1^C1 defeats norm activation Act2 for norm N2^C2 if Act1 and Act2 are in conflict and C1 ⊏ C2.

A defeated norm activation is discarded, that is, the defeated activation is not applied to the normative state fraction used for activating the norm. Only undefeated norm activations will be applied: the substitution that activated the norm is applied to its prescription part and the resulting fully-grounded deontic statements are added to the normative state (recall that there are no free variables in the prescription part of norms). Observe that we do not talk about norm defeasance, but rather norm activation defeasance. Thus, the defeasance relationship may only materialize on actual norm applicability.
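The following sketch (ours) shows how Defs. 9–11 can fit together, assuming the unify/substitute helpers from the rule sketch in Section 2.3 and a sub-context test such as the one sketched earlier. As a simplification, the normative-state fraction grounds the whole situation rather than only its contingent conjuncts, and relational conditions are ignored.

```python
def activations(ctx, situation, prescription, ns, ci):
    """Def. 9: all activations of a norm defined in context ctx, as
    (ctx, grounded fraction, grounded prescription) triples."""
    matches = [{}]
    pool = ns | ci                       # contingent conjuncts match NS, background ones CI'
    for pat in situation:
        matches = [t2 for t in matches for f in pool
                   for t2 in [unify(pat, f, t)] if t2 is not None]
    return [(ctx,
             frozenset(substitute(p, theta) for p in situation),
             frozenset(substitute(p, theta) for p in prescription))
            for theta in matches]

def undefeated(acts, is_subcontext):
    """Defs. 10-11: drop any activation whose fraction is also matched by an
    activation of a norm defined in a strictly more specific context."""
    return [(ctx, frac, presc) for (ctx, frac, presc) in acts
            if not any(frac == f2 and is_subcontext(c2, ctx)
                       for (c2, f2, _) in acts if c2 is not ctx)]
```

With the norms of Section 4, a 100-unit order from tom to jim activates both N1^top and N1^sa3:sa on the same fraction of the normative state; undefeated then keeps only the activation of N1^sa3:sa, mirroring the second example of Table 1.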
3.1 Norm Contextual Target

A question that may arise when going through the previous definitions could jeopardize the purpose of having defeasible norms such as those in the model presented. Why should there be norms that, while being applicable to the same context, are defined in different contexts that have a sub-context relationship? Why not have all norms applicable to context C defined inside context C itself? The reason for our approach becomes apparent when considering the stated aim of a supportive normative environment: to have a normative background that can fill in details of sub-contexts that are created later and that can benefit from this setup by being underspecified. This leads us to the subject of "default rules" in the law field [2]. Thus, part of the normative environment's norms will typically be predefined, in the sense that they pre-exist the applicable contexts themselves (which correspond to and result from contracts as they are signed).

What we need is to typify contexts, in order to be able to say that a norm applies to a certain type of context. This way, a norm might be defined at a super-context and applicable to a range of sub-contexts (of a certain type) to be subsequently created. We make this adaptation by considering a context identifier C as a pair id:type, where id is a context identifier and type is a predefined context type. In a norm N^C = S^C → P^C (see Def. 8), patterns of Info and IRE within S^C and P^C are rewritten to accommodate this kind of context reference, eventually using a variable in place of the context id. For instance, an IRE^X:t pattern, where X is a variable, would match IRE's of any sub-context of type t. When activating a norm with this kind of pattern, the substitution Θ (as used in Def. 9) has to bind X to a specific sub-context identifier; every further occurrence of X is thus a bounded variable. This approach allows us to maintain our definitions of norm activation conflict and defeasance, with minor syntactical changes.

4 EXAMPLES
In this section we sketch some examples towards the exploitation of the normative environment. The examples focus on the important aspects of our approach; in the following we adopt the convention that variables begin with an upper-case letter.

Our scenario is based on the following: each of a group of companies (agents) provides different resources that may need to be combined in order to present a value-added offering to third parties. For that, they agree to form a virtual organization (VO). This organization will define a supply agreement that translates into a context sa3:sa in the normative environment, where sa3 is the context id and sa is the context type (see Section 3.1). Notice that sa3:sa ⊑ top, where top is the top context. Suppose we have, at the top context, the following norm:

N1^top = ifact^X:sa(order(A1, Res, Qt, A2), T) ∧ supply-info^X:sa(A2, Res, Pr)
    → obl^X:sa(A2, delivery(A2, Res, Qt, A1), T+2) ∧ obl^X:sa(A1, payment(A1, Qt∗Pr, A2), T+2)
The norm states that, for any supply agreement, when an order is made that corresponds to the supply-info (which is an Info^C element for this type of context) of the receiver, the receiver is obliged to deliver the requested goods and the sender is obliged to make the associated payment. Now, suppose context sa3:sa includes the following norms.

N1^sa3:sa = ifact^sa3:sa(order(A1, Res, Qt, jim), T) ∧ supply-info^sa3:sa(jim, Res, Pr) ∧ Qt > 99
    → obl^sa3:sa(jim, delivery(jim, Res, Qt, A1), T+5) ∧ obl^sa3:sa(A1, payment(A1, Qt∗Pr, jim), T+2)

This norm expresses the fact that agent jim, when receiving orders of more than 99 units, has an extended delivery deadline.

N2^sa3:sa = ifact^sa3:sa(order(sam, Res, Qt, A2), T) ∧ supply-info^sa3:sa(A2, Res, _)
    → obl^sa3:sa(A2, delivery(A2, Res, Qt, sam), T+2)

N3^sa3:sa = fulf^sa3:sa(A2, delivery(A2, Res, Qt, sam), T) ∧ supply-info^sa3:sa(A2, Res, Pr)
    → obl^sa3:sa(sam, payment(sam, Qt∗Pr, A2), T+2)

These two norms express the privileged position of agent sam who, unlike other agents, only pays after receiving the merchandise. Suppose we have the following founding contextual info for context sa3:sa:

supply-info^sa3:sa(jim, r1, 1)
supply-info^sa3:sa(sam, r2, 1)
supply-info^sa3:sa(tom, r3, 1)

Table 1 shows what might happen in different normative states. The second entry of each example shows which norm activation conflicts come about (and how they are resolved) when the institutional reality elements of the first entry are present. Notice that in the first example there is no conflict, since norm N1^sa3:sa is not activated because of a variable restriction. The third entry shows the normative state after applying the defeating norm activation. For instance, in the second example NS′ contains NS together with the prescriptions of norm N1^sa3:sa (after applying the substitution that activated the norm). The third and fourth examples illustrate sam's advantage in being obliged to pay only after the delivery has been fulfilled. In each case we rely on refraction (a principle used in rule-based systems) to avoid firing a defeating norm more than once on the same activation (which would otherwise happen, since our normative state is monotonic).

Table 1. Different normative states and norm activation conflicts.

NS: ifact^sa3:sa(order(tom, r1, 5, jim), 1)
Conflict: none, N1^top applies
NS′: ifact^sa3:sa(order(tom, r1, 5, jim), 1), obl^sa3:sa(jim, delivery(jim, r1, 5, tom), 3), obl^sa3:sa(tom, payment(tom, 5, jim), 3)

NS: ifact^sa3:sa(order(tom, r1, 100, jim), 1)
Conflict: N1^sa3:sa defeats N1^top
NS′: ifact^sa3:sa(order(tom, r1, 100, jim), 1), obl^sa3:sa(jim, delivery(jim, r1, 100, tom), 6), obl^sa3:sa(tom, payment(tom, 100, jim), 3)

NS: ifact^sa3:sa(order(sam, r3, 5, tom), 1)
Conflict: N2^sa3:sa defeats N1^top
NS′: ifact^sa3:sa(order(sam, r3, 5, tom), 1), obl^sa3:sa(tom, delivery(tom, r3, 5, sam), 3)

NS: ifact^sa3:sa(order(sam, r3, 5, tom), 1), obl^sa3:sa(tom, delivery(tom, r3, 5, sam), 3), fulf^sa3:sa(tom, delivery(tom, r3, 5, sam), 2)
Conflict: none, N3^sa3:sa applies
NS′: the above together with obl^sa3:sa(sam, payment(sam, 5, tom), 4)

The norm activation defeasibility model is very flexible, allowing us to easily specify different contracting situations that exploit and adapt the normative background to different extents. Also, although the examples do not show this, it may be the case that a VO created by a group of agents defines norms to be applied in sub-contexts of a certain type. This would make up a three-level norm inheritance structure, where a subset of the VO's agents could make further contracts that are covered by the VO's agreement.

5 RELATED WORK
From a theoretical logical stance, norm defeasibility has mainly been guided by deontic reasoning [16], where conflicts regard the deontic operators themselves. Our approach is centered instead on the applicability of norms, not on their prescriptions. More practical approaches (e.g. in the B2B domain) to normative conflict resolution have also been developed. The application of business rules in e-commerce has been studied in [11], where courteous logic programs allow for an explicit definition of priorities among rules. An extension based on defeasible logic [15] has been advanced in [10]. Also, [9] addresses defeasible reasoning in the e-contracts domain, based on default logic and on the definition of dynamic priorities among rules.

The work in [7] addresses the issue of conflict resolution in a structured setup of compound activities. These resemble our context and sub-context relationships. However, they model deontic conflicts (e.g. an action being obliged and prohibited at the same time), while we model norm (activation) conflicts. They study the inheritance of normative positions (obligations, permissions, prohibitions), based on an explicit stamping of each of them with a priority value and a timestamp; the specificity criterion is based on the compound activities' structure. We address the inheritance of norms and provide a means to override norm activations based on their defeasibility.

Our approach of context and sub-context definitions, together with the presented norm defeasibility model, is similar to the notion of supererogatory defeasibility in [18], which models defeasibility in terms of role and sub-role definitions. In fact, [18] also considers express defeasibility, which is based on the specificity of conditions for norm applicability, an approach that has been followed by several others. We should also point out that [8] presents a grammar for rules that combines both our rule and norm definitions. However, our concern is to distinguish a priori rule definition, as a normative state maintenance concern, from norm definition, as a contracting activity. Furthermore, in [8] there is no attempt to solve any disputes related with possibly conflicting norms.
6 CONCLUSIONS
In this paper we formalized a normative environment with a hierarchical normative framework, including norm inheritance as a mechanism to facilitate contract establishment. Contexts were used as a means to organize norms and, more importantly, to guide their inheritance to new contexts. For that, we distinguished the context where a norm is defined from the context(s) to which it can be applied. In order to allow the system to be extended, and applied in different contracting scenarios, a model of norm activation defeasibility was designed, allowing an exploitation of the normative framework to different extents. Each signed contract generates a new context. A contract can include norms that defeat some of the norms of its super-contexts (which would otherwise be inherited), thus adapting the normative background to a specific situation.

Considering normative conflict resolution from the law field, as discussed in the introduction, our approach has some similarities with the lex specialis principle. However, the defeating norms are more specific in the sense that they are defined at (as opposed to applied to) a more specific context (a kind of "lex inferior"). The lex specialis flavor comes from the fact that in most cases a defeating norm also applies to a narrower context set. These properties of our norm defeasance approach result from the fact that the original aim is not to impose predefined regulations on
agents, but instead to help them in building contractual relationships by providing a normative background (which can be exploited in a partial way through adaptation). A feature of our approach that exposes this aim is that all norms are defeasible. In this respect we follow the notion from law theory of "default rules" [2]. We leave for future work the possibility of defining non-defeasible norms, that is, norms that are not to be overridden.

The notion of "default rules" might be misleading: it has no direct correspondence with default logic formalizations [17]. We do not handle the defeasibility of conclusions of default rules in that sense, but instead model defeasibility of the application of the rules themselves (which are called norms).

Although we are primarily concerned with deadline obligations, the inclusion of permissions or prohibitions as possible deontic statements prescribed by norms demands no changes in our norm activation defeasibility approach. We do not rely on conflicts between the contents of deontic statements (which are deontic conflicts), but instead on norm activation conflicts. These are closely related to the notion of conflict set (or agenda) in rule-based forward-chaining systems (e.g. [6]). In those systems, a conflict is a possible application of more than one rule at the same time, and a conflict resolution strategy decides which rule to apply in each step of the process.

Some open issues in our research include, as already mentioned, the possibility of defining non-defeasible norms, which might be important in certain contracting domains. The development of multiple-inheritance mechanisms within our contextual framework is also an interesting issue, although it poses additional problems regarding norm defeasibility.
ACKNOWLEDGEMENTS The first author is supported by FCT (Fundação para a Ciência e a Tecnologia) under grant SFRH/BD/29773/2006.
REFERENCES
[1] A. Artikis, J. Pitt, and M. Sergot, 'Animated specifications of computational societies', in Int. J. Conf. on Autonomous Agents and Multi-Agent Systems, eds., C. Castelfranchi and W. L. Johnson, pp. 1053–1062, Bologna, Italy, (2002). ACM, New York, USA.
[2] R. Craswell, 'Contract law: General theories', in Encyclopedia of Law and Economics, eds., B. Bouckaert and G. De Geest, volume III: The Regulation of Contracts, 1–24, Edward Elgar, Cheltenham, (2000).
[3] F. Dignum, 'Autonomous agents with norms', Artificial Intelligence and Law, 7(1), 69–79, (1999).
[4] F. Dignum, 'Abstract norms and electronic institutions', in International Workshop on Regulated Agent-Based Social Systems: Theories and Applications (RASTA'02), Bologna, Italy, (2002).
[5] M. Esteva, B. Rosell, J. A. Rodríguez-Aguilar, and J. L. Arcos, 'AMELI: An agent-based middleware for electronic institutions', in Third Int. J. Conf. on Autonomous Agents and Multi-Agent Systems, volume 1, pp. 236–243. IEEE Computer Society, (2004).
[6] E. Friedman-Hill, Jess in Action, Manning Publications Co., 2003.
[7] A. García-Camino, P. Noriega, and J. A. Rodríguez-Aguilar, 'An algorithm for conflict resolution in regulated compound activities', in Seventh Int. Workshop Engineering Societies in the Agents World (ESAW'06), (2006).
[8] A. García-Camino, J. A. Rodríguez-Aguilar, C. Sierra, and W. Vasconcelos, 'Norm-oriented programming of electronic institutions: A rule-based approach', in Coordination, Organizations, Institutions, and Norms in Agent Systems II, 177–193, Springer, (2007).
[9] G. K. Giannikis and A. Daskalopulu, 'Defeasible reasoning with e-contracts', in IEEE/WIC/ACM International Conference on Intelligent Agent Technology, pp. 690–694, (2006).
[10] G. Governatori, 'Representing business contracts in RuleML', International Journal of Cooperative Information Systems, 14(2-3), 181–216, (2005).
[11] B. N. Grosof, 'Representing e-commerce rules via situated courteous logic programs in RuleML', Electronic Commerce Research and Applications, 3(1), 2–20, (2004).
[12] H. Lopes Cardoso and E. Oliveira, 'A context-based institutional normative environment', in AAMAS'08 Workshop on Coordination, Organization, Institutions and Norms in agent systems (COIN), pp. 119–133, Estoril, Portugal, (2008).
[13] H. Lopes Cardoso and E. Oliveira, 'A contract model for electronic institutions', in Coordination, Organizations, Institutions, and Norms in Agent Systems III, LNAI 4870, 27–40, Springer, (2008).
[14] H. Lopes Cardoso and E. Oliveira, 'Electronic institutions for B2B: Dynamic normative environments', Artificial Intelligence and Law, 16(1), 107–128, (2008).
[15] D. Nute, 'Defeasible logic', in Handbook of Logic in Artificial Intelligence and Logic Programming, eds., D.M. Gabbay, C.J. Hogger, and J.A. Robinson, volume 3, 353–395, Oxford University Press, (1994).
[16] D. Nute, Defeasible Deontic Logic, volume 263 of Synthese Library, Kluwer Academic Publishers, 1997.
[17] R. Reiter, 'A logic for default reasoning', Artificial Intelligence, 13(1/2), 81–132, (1980).
[18] Y. U. Ryu, 'Relativized deontic modalities for contractual obligations in formal business communication', in 30th Hawaii International Conference on System Sciences (HICSS), pp. 485–493, Hawaii, USA, (1997).
[19] S. Sen and S. Airiau, 'Emergence of norms through social learning', in Twentieth International Joint Conference on Artificial Intelligence, pp. 1507–1512, Hyderabad, India, (2007).
8. Constraints and Search
SLIDE: A Useful Special Case of the CARDPATH Constraint

Christian Bessiere1 and Emmanuel Hebrard2 and Brahim Hnich3 and Zeynep Kiziltan4 and Toby Walsh5

Abstract. We study the CARDPATH constraint. This ensures a given constraint holds a number of times down a sequence of variables. We show that SLIDE, a special case of CARDPATH where the slid constraint must hold always, can be used to encode a wide range of sliding sequence constraints, including CARDPATH itself. We consider how to propagate SLIDE and provide a complete propagator for CARDPATH. Since propagation is NP-hard in general, we identify special cases where propagation takes polynomial time. Our experiments demonstrate that using SLIDE to encode global constraints can be as efficient and effective as specialised propagators.
1 INTRODUCTION

In many scheduling problems, we have a sequence of decision variables and a constraint which applies down the sequence. For example, in the car sequencing problem, we need to decide the sequence of cars on a production line. We might have a constraint on how often a particular option is met (e.g. 1 out of 3 cars can have a sun-roof). As a second example, in a nurse rostering problem, we need to decide the sequence of shifts worked by nurses. We might have a constraint on how many consecutive night shifts any nurse can work. Such constraints have been classified as sliding sequence constraints [7]. To model such constraints, we can use the CARDPATH constraint. This ensures that a given constraint holds a number of times down a sequence of variables [5].

We identify a special case of CARDPATH, which we call SLIDE, that is interesting for several reasons. First, many sliding sequence constraints, including CARDPATH, can easily be encoded using this special case. SLIDE is therefore a "general-purpose" constraint for encoding sliding sequence constraints. This is an especially easy way to provide propagators for such global constraints within a constraint toolkit. Second, we give a propagator for enforcing generalised arc consistency on SLIDE. By comparison, the previous propagator for CARDPATH given in [5] does not prune all possible values. Third, SLIDE can be as efficient and effective as specialised propagators in solving sequencing problems.
1 LIRMM (CNRS / U. Montpellier), France, email: bessiere@lirmm.fr. Supported by the ANR project ANR-06-BLAN-0383-02.
2 4C, UCC, Ireland, email: ehebrard@4c.ucc.ie.
3 Izmir Uni. of Economics, Turkey, email: brahim.hnich@ieu.edu.tr. Supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grant No. SOBAG-108K027.
4 CS Department, Uni. of Bologna, Italy, email: zeynep@cs.unibo.it.
5 NICTA and UNSW, Sydney, Australia, email: toby.walsh@nicta.com.au. Funded by the Australian Government's Department of Broadband, Communications and the Digital Economy, and the ARC.
2 CARDPATH AND SLIDE CONSTRAINTS

A constraint satisfaction problem consists of a set of variables, each with a finite domain of values, and a set of constraints specifying allowed combinations of values for given sets of variables. We use capital letters for variables (e.g. X), and lower case for values (e.g. d). We write D(X) for the domain of variable X. Constraint solvers typically explore partial assignments enforcing a local consistency property. A constraint is generalised arc consistent (GAC) iff, when a variable is assigned any value in its domain, there exist compatible values in the domains of all the other variables of the constraint.

The CARDPATH constraint was introduced in [5]. If C is a constraint of arity k then CARDPATH(N, [X1, ..., Xn], C) holds iff C(Xi, ..., X_{i+k−1}) holds N times for 1 ≤ i ≤ n−k+1. For example, we can count the number of changes in the type of shift with CARDPATH(N, [X1, ..., Xn], ≠). Note that CARDPATH can be used to encode a range of Boolean connectives, since N ≥ 1 gives disjunction, N = 1 gives exclusive or, and N = 0 gives negation. We shall focus on a special case of the CARDPATH constraint where the slid constraint holds always. SLIDE(C, [X1, ..., Xn]) holds iff C(Xi, ..., X_{i+k−1}) holds for all 1 ≤ i ≤ n−k+1; that is, it is a CARDPATH constraint in which N = n−k+1. We also consider a more complex form of SLIDE that applies only every j variables. More precisely, SLIDEj(C, [X1, ..., Xn]) holds iff C(X_{ij+1}, ..., X_{ij+k}) holds for 0 ≤ i ≤ ⌊(n−k)/j⌋. By definition, SLIDEj for j = 1 is equivalent to SLIDE.

Beldiceanu and Carlsson have shown that CARDPATH can encode a wide range of constraints like CHANGE, SMOOTH, AMONGSEQ and SLIDINGSUM [5]. As we discuss later, SLIDE provides a simple way to encode such sliding sequence constraints. It can also encode many other more complex sliding sequence constraints like REGULAR [16], STRETCH [13], and LEX [7], as well as many types of channelling constraints like ELEMENT [19] and optimisation constraints like the soft forms of REGULAR [20]. More interestingly, CARDPATH can itself be encoded into a SLIDE constraint. In [5], a propagator for CARDPATH is proposed that greedily constructs upper and lower bounds on the number of (un)satisfied constraints by posting and retracting (the negation of) each of the constraints. This propagator does not achieve GAC. We propose here a complete propagator for enforcing GAC on SLIDE. SLIDE thus provides a GAC propagator for CARDPATH. In addition, SLIDE provides a GAC propagator for any of the other global constraints it can encode. As our experimental results reveal, SLIDE can be as efficient and effective as specialised propagators.

We illustrate the usefulness of SLIDE with the AMONGSEQ constraint, which ensures that values occur with some given frequency. For instance, we might want that no more than 3 out of every sequence of 7 shift variables are a "night shift". More
precisely, AMONGSEQ(l, u, k, [X1, ..., Xn], v) holds iff between l and u variables in every sequence of k variables take a value in the ground set v [8]. We can encode this using SLIDE. More precisely, AMONGSEQ(l, u, k, [X1, ..., Xn], v) can be encoded as SLIDE(D_{l,u}^{k,v}, [X1, ..., Xn]) where D_{l,u}^{k,v} is an instance of the AMONG constraint [8]: D_{l,u}^{k,v}(Xi, ..., X_{i+k−1}) holds iff between l and u of its variables take values in the set v. For example, suppose 2 of every 3 variables along a sequence X1...X5 should take the value a, where X1 = a and X2, ..., X5 ∈ {a, b}. This can be encoded as SLIDE(E, [X1, X2, X3, X4, X5]) where E(Xi, X_{i+1}, X_{i+2}) ensures two of its three variables take a. This SLIDE constraint ensures that E(X1, X2, X3), E(X2, X3, X4) and E(X3, X4, X5) all hold. Note that each ternary constraint is GAC. However, enforcing GAC on the SLIDE constraint sets X4 = a, as there are only two satisfying assignments and neither has X4 = b.
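The extra pruning can be checked by brute force. The sketch below (ours, purely illustrative) enumerates the supports of the SLIDE constraint in the example above and collects the values that survive in each domain.

```python
from itertools import product

def among_e(window, value="a", count=2):
    """E(Xi, Xi+1, Xi+2): exactly two of the three variables take the value a."""
    return sum(v == value for v in window) == count

def slide_supports(domains, constraint, k):
    """All assignments satisfying SLIDE(constraint, [X1..Xn]) with window size k."""
    n = len(domains)
    return [tup for tup in product(*domains)
            if all(constraint(tup[i:i + k]) for i in range(n - k + 1))]

domains = [["a"], ["a", "b"], ["a", "b"], ["a", "b"], ["a", "b"]]
sols = slide_supports(domains, among_e, 3)
print(sols)                                      # two supports survive
print([{t[i] for t in sols} for i in range(5)])  # X4 takes only 'a' in both
```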
3 SLIDE WITH MULTIPLE SEQUENCES

We often wish to slide a constraint down two or more sequences of variables at once. For example, suppose we want to ensure that two vectors of variables, X1 to Xn and Y1 to Yn, differ at every index. We can encode such a constraint by interleaving the two sequences and sliding a constraint down the single sequence with a suitable offset. In our example, we simply post SLIDE2(≠, [X1, Y1, ..., Xn, Yn]).

As a second example of sliding down multiple sequences of variables, consider the constraint REGULAR(A, [X1, ..., Xn]). This ensures that the values taken by a sequence of variables form a string accepted by a deterministic finite automaton A [16]. This global constraint is useful in scheduling, rostering and sequencing problems to ensure certain patterns do (or do not) occur over time. It can be used to encode a wide range of other global constraints including AMONG [8], CONTIGUITY [15], LEX and PRECEDENCE [14]. To encode the REGULAR constraint with SLIDE, we introduce variables Qi to record the state of the automaton. We then post SLIDE2(F, [Q0, X1, Q1, ..., Xn, Qn]) where Q0 is set to the starting state, Qn is restricted to accepting states, and F(Qi, X_{i+1}, Q_{i+1}) holds iff Q_{i+1} = δ(Qi, X_{i+1}), where δ is the transition function of the automaton. If we decompose this encoding into the conjunction of slid constraints, we get a set of constraints similar to [6]. Enforcing GAC on this encoding ensures GAC on REGULAR and, by exploiting the functionality of F, takes O(ndq) time, where d is the number of values for Xi and q is the number of states of the automaton. This is asymptotically identical to the specialised REGULAR propagator [16]. This encoding is highly competitive in practice with the specialised propagator [2].

One advantage of this encoding is that it gives explicit access to the states of the automaton. Consider, for example, a rostering problem where workers are allowed to work for up to three consecutive shifts. This can be specified with a simple REGULAR constraint. Suppose now we want to minimise the number of times a worker has to work for three consecutive shifts. To encode this, we can post an AMONG constraint on the state variables to count the number of times we visit the state representing three consecutive shifts, and minimise the value taken by this variable. As we shall see later in the experiments, the encoding also gives an efficient incremental propagator. In fact, the complexity of repeatedly enforcing GAC on this encoding of the REGULAR constraint down a whole branch of a backtracking search tree is just O(ndq) time.
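A minimal sketch (ours) of the encoding just described, with a toy automaton standing in for A: the ternary constraint F checks one transition, and threading the state variables through F accepts a full assignment exactly when the DFA does.

```python
# Toy DFA over shifts "n"/"d": state q counts consecutive nights, at most 2 in a row.
delta = {(q, s): (q + 1 if s == "n" else 0)
         for q in range(3) for s in ("n", "d")
         if not (q == 2 and s == "n")}   # missing transition = rejection

def F(q, x, q_next):
    """Ternary slid constraint F(Qi, Xi+1, Qi+1): Qi+1 = delta(Qi, Xi+1)."""
    return delta.get((q, x)) == q_next

def regular_as_slide(xs, q0=0, accepting=frozenset({0, 1, 2})):
    """REGULAR(A, xs) via SLIDE2(F, [Q0, X1, Q1, ..., Xn, Qn]) on a full assignment."""
    q = q0
    for x in xs:
        q_next = delta.get((q, x))
        if q_next is None or not F(q, x, q_next):   # some F(Qi, Xi+1, Qi+1) is violated
            return False
        q = q_next
    return q in accepting

assert regular_as_slide(list("dnnd"))
assert not regular_as_slide(list("nnnd"))   # three consecutive night shifts
```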
4 SLIDE WITH COUNTERS

We may want to slide a constraint down a sequence of variables computing a count. We can use SLIDE to encode such constraints by incrementally computing the count in an additional sequence of variables. Consider, for example, CARDPATH(N, [X1, ..., Xn], C). For simplicity, we consider k = 2 (i.e., C is binary); the generalisation to other k is straightforward. We introduce a sequence of integer variables Mi in which to accumulate the count. We encode CARDPATH as SLIDE2(G, [M1, X1, ..., Mn, Xn]) where M1 = 0, Mn = N, and G(Mi, Xi, M_{i+1}, X_{i+1}) is defined as: if C(Xi, X_{i+1}) holds then M_{i+1} = Mi + 1, otherwise M_{i+1} = Mi. GAC on SLIDE ensures GAC on CARDPATH.

As a second example, consider the STRETCH constraint [13]. Given variables X1 to Xn taking values from a set of shift types τ, a set π of ordered pairs from τ × τ, and functions shortest(t) and longest(t) giving the minimum and maximum length of a stretch of type t, STRETCH([X1, ..., Xn]) holds iff each stretch of type t has length between shortest(t) and longest(t), and consecutive types of stretches are in π. We can encode STRETCH as SLIDE2(H, [X1, Q1, ..., Xn, Qn]) where Q1 = 1 and H(Xi, X_{i+1}, Qi, Q_{i+1}) holds iff (1) Xi = X_{i+1}, Q_{i+1} = 1 + Qi, and Q_{i+1} ≤ longest(Xi); or (2) Xi ≠ X_{i+1}, ⟨Xi, X_{i+1}⟩ ∈ π, Qi ≥ shortest(Xi) and Q_{i+1} = 1. GAC on SLIDE ensures GAC on STRETCH.
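A sketch (ours) of the counter construction for a binary slid constraint C: the quaternary constraint G either increments or copies the counter, so fixing M1 = 0 and Mn = N pins down the number of times C holds along the sequence.

```python
def make_G(C):
    """G(Mi, Xi, Mi+1, Xi+1): the counter increments exactly when C(Xi, Xi+1) holds."""
    def G(m_i, x_i, m_next, x_next):
        return m_next == m_i + (1 if C(x_i, x_next) else 0)
    return G

def cardpath_check(xs, C, N):
    """CARDPATH(N, xs, C) on a full assignment, via the counter chain."""
    G, m = make_G(C), 0
    for x, x_next in zip(xs, xs[1:]):
        m_next = m + (1 if C(x, x_next) else 0)
        assert G(m, x, m_next, x_next)   # each slid constraint holds by construction
        m = m_next
    return m == N

# Counting shift changes: C is "different", so N counts where the shift type changes.
assert cardpath_check(["d", "d", "n", "n", "e"], lambda a, b: a != b, 2)
```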
5 OTHER EXAMPLES OF SLIDE

There are many other examples of global constraints which we can encode using SLIDE. For example, we can encode LEX [7] using SLIDE. LEX holds iff a vector of variables [X1..Xn] is lexicographically smaller than another vector of variables [Y1..Yn]. We introduce a sequence of Boolean variables Bi to indicate whether the vectors have been ordered by position i−1; hence B1 = 0. We then encode LEX as SLIDE3(I, [B1, X1, Y1, ..., Bn, Xn, Yn]) where I(Bi, Xi, Yi, B_{i+1}) holds iff (Bi = B_{i+1} = 0 ∧ Xi = Yi) or (Bi = 0 ∧ B_{i+1} = 1 ∧ Xi < Yi) or (Bi = B_{i+1} = 1). This gives us a linear-time propagator as efficient and incremental as the specialised algorithm in [12] (a sketch follows at the end of this section). As a second example, we can encode many types of channelling constraints using SLIDE, like DOMAIN [17], LINKSET2BOOLEANS [7] and ELEMENT [19]. As a final example, we can encode "optimisation" constraints like the soft form of the REGULAR constraint, which measures the Hamming or edit distance to a regular string [20]. There are, however, constraints that can be encoded using SLIDE which do not give as efficient and effective propagators as specialised algorithms (e.g. the global ALLDIFFERENT constraint [18]).
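A sketch (ours) of the quaternary constraint I and of how threading the Boolean flags through it decides the ordering; written this way it accepts equal vectors, i.e. it checks the non-strict order.

```python
def I(b_i, x_i, y_i, b_next):
    """I(Bi, Xi, Yi, Bi+1) from the LEX encoding."""
    return ((b_i == 0 and b_next == 0 and x_i == y_i) or
            (b_i == 0 and b_next == 1 and x_i < y_i) or
            (b_i == 1 and b_next == 1))

def lex_leq(xs, ys):
    """Check [X1..Xn] <=_lex [Y1..Yn] by threading the flags through I (B1 = 0)."""
    b = 0
    for x, y in zip(xs, ys):
        b_next = 1 if (b == 1 or x < y) else 0
        if not I(b, x, y, b_next):
            return False
        b = b_next
    return True

assert lex_leq([1, 2, 3], [1, 3, 0])
assert not lex_leq([2, 0], [1, 9])
```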
6 PROPAGATING SLIDE

A constraint like SLIDE is only really useful if we can propagate it efficiently and effectively. The simplest possible way to propagate SLIDEj(C, [X1, ..., Xn]) is to decompose it into a sequence of constraints C(X_{ij+1}, ..., X_{ij+k}) for 0 ≤ i ≤ ⌊(n−k)/j⌋ and let the constraint solver propagate the decomposition. Surprisingly, this is enough to achieve GAC in many cases. For example, we can achieve GAC in this way on the SLIDE encoding of the REGULAR constraint. If the constraints in the decomposition overlap on just one variable, then the constraint graph is Berge acyclic [4], and enforcing GAC on the decomposition of SLIDEj achieves GAC on SLIDEj. Similarly, enforcing GAC on the decomposition achieves GAC on SLIDEj if
the constraint being slid is monotone. A constraint C is monotone iff there exists a total ordering ≺ of the values such that for any two values v, w, if v ≺ w then v can replace w in any support for C. For instance, the constraints AMONG and SUM are monotone if either no upper bound or no lower bound is given.

Theorem 1 Enforcing GAC over each constraint in the decomposition of SLIDEj achieves GAC on SLIDEj if the constraint being slid is monotone.

Proof: For an arbitrary value v ∈ D(X), we show that if every constraint is GAC, then we can build a support for X = v on SLIDEj. For any variable other than X, we choose the smallest value in the total order. This is the value that can be substituted for any other value in the same domain. A tuple built this way satisfies all the constraints being slid since we know that there exists a support for each (they are GAC), and the values we chose can be substituted into this support. □

In the general case, when constraints overlap on more than one variable (e.g. in the SLIDE encoding of AMONGSEQ), we need to do more work to achieve GAC. We distinguish two cases: when the arity of the constraint being slid is not fixed, and when it is fixed. We show that enforcing GAC in the former case is NP-hard.

Theorem 2 Enforcing GAC on SLIDE(C, [X1, ..., Xn]) is NP-hard when the arity of C is not fixed, even if enforcing GAC on C is itself polynomial.

Proof: We give a reduction from 3-SAT with N variables and M clauses. We introduce variables X_i^j for 1 ≤ i ≤ N+1 and 1 ≤ j ≤ M. For each clause j, if the clause is xa ∨ ¬xb ∨ xc, then we set X_1^j ∈ {xa, ¬xb, xc} to represent the values that make this clause true. For each clause j, we set X_{i+1}^j ∈ {0, 1} for 1 ≤ i ≤ N to represent a truth assignment. Hence, we duplicate the truth assignment for each clause. We now build the following constraint SLIDE(C, [X_1^1, .., X_{N+1}^1, .., X_1^j, .., X_{N+1}^j, .., X_1^M, .., X_{N+1}^M]) where C has arity N+1. We construct C(Y1, ..., Y_{N+1}) to hold iff Y1 = xd and Y_{1+d} = 1, or Y1 = ¬xd and Y_{1+d} = 0 (in these two cases, the value assigned to Y1 represents the literal that makes clause j true), or Yi ∈ {0, 1} and Yi = Y_{i+N+1} (in this case, the truth assignment is passed down the sequence). Enforcing GAC on C is polynomial, and an assignment satisfying the SLIDE constraint corresponds to a satisfying assignment of the original 3-SAT problem. □

When the arity of the constraint being slid is small, we can enforce GAC on SLIDE using dynamic programming (DP), in a similar way to the DP-based propagators for the REGULAR and STRETCH constraints [16, 13]. A much simpler method, however, which is just as efficient and effective as dynamic programming, is to exploit a variation of the dual encoding into binary constraints [10] based on tuples of support. Such an encoding was proposed in [1] for a particular sliding constraint; here we show that this method is more general and can be used for arbitrary SLIDE constraints. Using such an encoding, SLIDE can be easily added to any constraint solver. We illustrate the intersection encoding by means of an example. Consider again the AMONGSEQ example in which 2 of every 3 variables of X1...X5 should take the value a, where X1 = a and X2, ..., X5 ∈ {a, b}. We can encode this as SLIDE(E, [X1, X2, X3, X4, X5]) where E(Xi, X_{i+1}, X_{i+2}) is an instance of the AMONG constraint that ensures two of its three variables take a.
If the sliding constraint has arity k, we introduce an intersection variable for each subsequence of k−1 variables of the SLIDE. The first intersection variable V1 has a domain containing all tuples from D(X1) × ... × D(X_{k−1}). The jth intersection variable Vj has a domain containing D(Xj) × ... × D(X_{j+k−2}), and so on until V_{n−k+2}. In our example in Figure 1, this gives D(V1) = D(X1) × D(X2), ..., D(V4) = D(X4) × D(X5).

[Figure 1. Intersection encoding: the intersection variables V1..V4 (each holding pairs of values for two consecutive original variables X1..X5), the channelling constraints linking them to the original variables, and the allowed tuples of the compatibility constraints between consecutive intersection variables.]

We then post binary compatibility constraints between consecutive intersection variables. These constraints ensure that the two intersection variables assign (k−1)-tuples that agree on the values of their k−2 common original variables (like the constraints in the dual encoding). They also ensure that the k-tuple formed by the two (k−1)-tuples satisfies the corresponding instance of the slid constraint. For instance, in Figure 1, the binary constraint between V1 and V2 does not allow the pair ⟨ab, aa⟩ because the second argument of ab for V1 (value b for X2) is in conflict with the first argument of aa for V2 (value a for X2). That same constraint between V1 and V2 does not allow the pair ⟨ab, bb⟩ because the tuple abb is not allowed by E(X1, X2, X3). Enforcing AC on such compatibility constraints prunes aa and bb from V2, ab and bb from V3, and ba and bb from V4.

Finally, we post binary channelling constraints to link the tuples to the original variables. One such constraint for each original variable is sufficient. For example, we can have a channelling constraint between V4 and X4 which ensures that the first argument of the tuple assigned to V4 equals the value assigned to X4. Enforcing AC on this channelling constraint prunes b from the domain of X4. We could instead post a channelling constraint between V3 and X4 ensuring that the second argument in V3 equals X4. The AMONGSEQ constraint is now GAC.

Theorem 3 Enforcing AC on the intersection encoding of SLIDE achieves GAC in O(nd^k) time and O(nd^{k−1}) space, where k is the arity of the constraint to slide and d is the maximum domain size.

Proof: The constraint graph associated with the intersection encoding is a tree. Enforcing AC on it therefore achieves GAC. Enforcing AC on the channelling constraints then ensures that the domains of the original variables are pruned appropriately. As we introduce O(n) intersection variables, and each can contain O(d^{k−1}) tuples, the intersection encoding requires O(nd^{k−1}) space. Enforcing AC on a compatibility constraint between two intersection variables Vi and V_{i+1} takes O(d^k) time, as each tuple in the intersection variable Vi has at most d supports, which are the tuples of V_{i+1} that are equal to Vi on their k−2 common arguments. Enforcing AC on O(n) such constraints therefore takes O(nd^k) time. Finally, enforcing AC on each of the O(n) channelling constraints takes O(d^{k−1}) time, as they are functional. Hence, the total time complexity is O(nd^k). □

Arc consistency on the intersection encoding simulates pairwise consistency on the decomposition. It does this efficiently as intersection variables represent in extension 'only' the intersections. This is sufficient because the constraint graph is acyclic. This encoding is also very easy to implement in any constraint solver (a code sketch is given at the end of this section). It has good
incremental properties: only those constraints associated with a variable whose domain changes need to wake up.

The intersection encoding of SLIDEj for j > 1 is less expensive to build than for j = 1, as we need intersection variables for subsequences of fewer than k−1 variables. For 1 ≤ j ≤ k/2, we introduce intersection variables for subsequences of variables of length k−j starting at indices 1, j+1, 2j+1, ..., whose domains contain (k−j)-tuples of assignments. Compatibility and channelling constraints are defined as with j = 1. If j > k/2, two consecutive intersection variables (for two subsequences of k−j variables) involve fewer than k variables of the SLIDEj; the compatibility constraint between them thus cannot ensure the satisfaction of the slid constraint. We therefore introduce intersection variables for subsequences of length k/2 starting at indices 1, j+1, 2j+1, ..., and for subsequences of length k/2 finishing at indices k, j+k, 2j+k, .... The compatibility constraint between two consecutive intersection variables representing the subsequence starting at index pj+1 and the subsequence finishing at index pj+k ensures satisfaction of the (p+1)th instance of the slid constraint. The compatibility constraint between two consecutive intersection variables representing the subsequence finishing at index pj+k and the subsequence starting at index (p+1)j+1 ensures the consistency of the arguments in the intersection of two instances of the slid constraint.
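The sketch below (ours, for the j = 1 case) builds the intersection variables, runs AC on the chain of compatibility constraints until a fixpoint, and channels the surviving tuples back to the original domains; on the running AMONG example it prunes X4 to {a}.

```python
from itertools import product

def slide_gac(domains, C, k):
    """GAC on SLIDE(C, [X1..Xn]) via the intersection encoding with (k-1)-tuples."""
    n = len(domains)
    # Intersection variables V1..V_{n-k+2}: all (k-1)-tuples over consecutive domains.
    V = [set(product(*domains[i:i + k - 1])) for i in range(n - k + 2)]

    def compatible(t, u):
        # Agree on the k-2 shared variables, and the joined k-tuple satisfies C.
        return t[1:] == u[:-1] and C(t + u[-1:])

    changed = True
    while changed:                       # AC on the chain (a tree, so AC gives GAC)
        changed = False
        for i in range(len(V) - 1):
            left = {t for t in V[i] if any(compatible(t, u) for u in V[i + 1])}
            right = {u for u in V[i + 1] if any(compatible(t, u) for t in V[i])}
            if left != V[i] or right != V[i + 1]:
                V[i], V[i + 1], changed = left, right, True

    # Channelling: X_j keeps the values appearing in some surviving tuple.
    pruned = []
    for j in range(n):
        i = min(j, len(V) - 1)           # an intersection variable covering X_j
        pruned.append({t[j - i] for t in V[i]})
    return pruned

among2of3 = lambda w: sum(v == "a" for v in w) == 2
print(slide_gac([["a"], ["a", "b"], ["a", "b"], ["a", "b"], ["a", "b"]], among2of3, 3))
# -> [{'a'}, {'a', 'b'}, {'a', 'b'}, {'a'}, {'a', 'b'}]: X4 is pruned to {'a'}
```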
7 EXPERIMENTS

We now demonstrate the practical value of SLIDE. Due to space limits, we only report detailed results on a nurse scheduling problem, and summarise the results on balanced incomplete block design generation and car sequencing problems. Experiments are performed with ILOG Solver 6.2 on a 2.8GHz Intel computer running Linux.

We consider a Nurse Scheduling Problem [9] in which we generate a schedule of shift duties for a short-term planning period. There are three types of shifts (day, evening, and night). We ensure that (1) each nurse takes a day off or is assigned to an available shift; (2) each shift has a minimum required number of nurses; (3) each nurse's work load is between specific lower and upper bounds; (4) each nurse works at most 5 consecutive days; (5) each nurse has at least 12 hours of break between two shifts; (6) the shift assigned to a nurse does not change more than once every three days.

We construct four different models, all with variables indicating what type of shift, if any, each nurse is working on each day. We break symmetry between the nurses with LEX constraints. The constraints (1)-(3) are enforced using global cardinality constraints. Constraints (4), (5) and (6) form sequences of respectively 6-ary, binary and ternary constraints. Since (4) is monotone, we simply post the decomposition in the first three models. This achieves GAC by Theorem 1. The models differ in how (5) and (6) are propagated. In decomp, they are decomposed into a conjunction of slid constraints. In amongseq, (5) is decomposed and (6) is enforced using the AMONGSEQ constraint of ILOG Solver (called IloSequence). The combination of (5) and (6) is enforced by SLIDE in slide. Finally, in slidec, we use SLIDE for the combination of (4), (5), and (6).

We test the models using the instances available at http://www.projectmanagement.ugent.be/nsp.php in which nurses have no maximum workload, but a set of preferences to optimise. We ignore these preferences and post a constraint bounding the maximum workload to at most 5 day shifts, 4 evening shifts and 2 night shifts per nurse and per week. Similarly, each nurse must have at least 2 rest days per week. We solve three samples of instances involving 25, 30 and 60 nurses to schedule over 28 days.

We use the same variable ordering for all models so that heuristic choices do not affect results. We schedule the days in chronological order and within each day we allocate a shift to every nurse in lexicographical order. Initial experiments show that this is more efficient than the minimum domain heuristic. However, it restricts the variety of domains passed to the propagators, and thus hinders any demonstration of differences in pruning. We therefore also use a more random heuristic: within each day, we allocate a shift to every nurse randomly with 20% frequency and lexicographically otherwise.

Table 1. Nurse scheduling with lexicographical variable ordering (^1: on instances solved by all methods; ^2: on instances solved by the method).

              #solved   bts^1    time^1   bts^2    time^2
25 nurses, 28 days (99 instances)
decomp        99        301      0.13     301      0.13
amongseq      99        301      0.19     301      0.19
slide         99        301      0.19     301      0.19
slidec        99        295      0.68     295      0.68
30 nurses, 28 days (99 instances)
decomp        68        7101     2.80     15185    5.29
amongseq      67        7101     4.31     7150     4.33
slide         70        3303     1.99     4319     2.53
slidec        75        1047     2.13     11014    10.02
60 nurses, 28 days (100 instances)
decomp        51        5999     4.38     5999     4.38
amongseq      51        5999     7.10     5999     7.10
slide         52        5300     5.61     8479     7.21
slidec        58        2157     7.52     4501     12.07

Table 2. Nurse scheduling with random variable ordering (^1: on instances solved by all methods; ^2: on instances solved by the method).

              #solved   bts^1    time^1   bts^2    time^2
25 nurses, 28 days (99 instances)
decomp        86        35084    7.69     41892    10.06
amongseq      85        35401    14.43    35401    14.43
slide         97        1699     1.00     1547     0.92
slidec        97        457      0.58     438      0.56
30 nurses, 28 days (99 instances)
decomp        20        68834    11.94    69550    12.75
amongseq      20        68834    18.89    69550    19.83
slide         42        378      0.18     8770     7.29
slidec        43        365      0.95     12857    6.76
60 nurses, 28 days (100 instances)
decomp        3         122406   71.06    250427   142.90
amongseq      2         122406   119.40   122406   119.40
slide         27        562      0.65     2367     2.19
slidec        34        542      3.96     1368     6.38
Tables 1 and 2 report the mean runtime and fails to solve the instances, with a 5-minute cutoff. Among the first three models, the best results are due to slide. We solve more instances with slide, and we explore a smaller search tree. By developing a propagator for a generic constraint like SLIDE, we can increase pruning without hurting efficiency. Note that slide always performs better than amongseq. A possible reason is that AMONGSEQ cannot encode constraint (6) as directly as SLIDE: as in previous models, we need to channel into Boolean variables and post AMONGSEQ on them, which may not give as effective and efficient pruning. SLIDE thus offers both modelling and solving advantages over existing sequencing constraints. Note also that slidec solves additional instances within the time limit. This is not surprising, as the model slides the combination of the constraints (4), (5), and (6). Recall that the sliding constraint of (4) is 6-ary. It is pleasing to note that the intersection encoding performs well even in the presence of such a high-arity constraint.

We also ran experiments on Balanced Incomplete Block Designs (BIBDs) and car sequencing. For BIBD, we use the model in [12], which contains LEX constraints. We propagate these either using the specialised algorithm of [12] or the SLIDE encoding. As both propagators maintain GAC, we only compare runtimes. Results on large instances show that the SLIDE model is as efficient as the LEX
model. For car sequencing, we test the scalability of SLIDE on large-arity constraints and large domains using 80 instances from CSPLib. Unlike a model using IloSequence, our SLIDE model does not combine reasoning about the overall cardinality of a configuration with the sequence of AMONG constraints. Hence, it is not as efficient: 26 instances were solved with SLIDE within the five-minute cutoff, compared to 39 with IloSequence. However, 9 of the instances solved with SLIDE were not solved by IloSequence. The memory overhead of the SLIDE propagator was not excessive despite the slid constraints having arity 5 and domains of size 30. The SLIDE model used on average 22Mb of space, compared to 5Mb for IloSequence.
8 RELATED WORK

Pesant introduced the REGULAR constraint, and gave a propagator based on dynamic programming to enforce GAC [16]. As we saw, the REGULAR constraint can be encoded using a simple SLIDE constraint. In this simple case, the dynamic programming machinery of Pesant's propagator is unnecessary, as the decomposition into ternary constraints does not hinder propagation. We have found that SLIDE is as efficient as REGULAR in practice [2]. Furthermore, our encoding introduces variables representing the states. Access to the state variables may be useful (e.g. for expressing objective functions). Although an objective function can be represented with the COSTREGULAR constraint [11], this is limited to the sum of the variable-value assignment costs. Our encoding is more flexible, allowing different objective functions like the min function used in the example in Section 3.

Beldiceanu, Carlsson, Debruyne and Petit have proposed specifying global constraints by means of deterministic finite automata augmented with counters [6]. They automatically construct propagators for such automata by decomposing the specification into a sequence of signature and transition constraints. This gives an encoding similar to our SLIDE encoding of the REGULAR constraint. There are, however, a number of advantages of SLIDE over using an automaton. If the automaton uses counters, pairwise consistency is needed to guarantee GAC (and most constraint toolkits do not support pairwise consistency). We can encode such automata using a SLIDE where we introduce an additional sequence of variables for each counter. SLIDE thus provides a GAC propagator for such automata. Moreover, SLIDE has a better complexity than a brute-force pairwise consistency algorithm based on the dual encoding, as it considers only the intersection variables, reducing the space complexity by a factor of d.

Hellsten, Pesant and van Beek developed a GAC propagator for the STRETCH constraint based on dynamic programming similar to that for the REGULAR constraint [13]. As we have shown, we can encode the STRETCH constraint and maintain GAC using SLIDE. Several propagators for AMONGSEQ are proposed and compared in [21, 3]. Among these propagators, those based on the REGULAR constraint do the most pruning and are often fastest. Finally, Bartak has proposed a similar intersection encoding for propagating a sliding scheduling constraint [1]. We have shown that this method is more general and can be used for arbitrary SLIDE constraints.
9 CONCLUSIONS
We have studied the CARDPATH constraint. This slides a constraint down a sequence of variables. We considered SLIDE, a special case of CARDPATH in which the slid constraint holds at every position. We demonstrated that this special case can encode many global sequencing constraints, including AMONGSEQ, CARDPATH and REGULAR, in a
simple way. SLIDE can therefore serve as a "general-purpose" constraint for decomposing a wide range of global constraints, facilitating their integration into constraint toolkits. We proved that enforcing GAC on SLIDE is NP-hard in general. Nevertheless, we identified several useful and common cases where it is polynomial. For instance, when the constraint being slid overlaps on just one variable or is monotone, decomposition does not hinder propagation. Dynamic programming or a variation of the dual encoding can be used to propagate SLIDE when the constraint being slid overlaps on more than one variable and is not monotone. Unlike the previously proposed propagator for CARDPATH, this achieves GAC. Our experiments demonstrated that using SLIDE to encode constraints can be as efficient and effective as specialised propagators. There are many directions for future work. One promising direction is to use binary decision diagrams to store the supports for the constraints being slid when they have many satisfying tuples. We believe this could improve the efficiency of our propagator in many cases.
REFERENCES
[1] R. Bartak, 'Modelling resource transitions in constraint-based scheduling', in Proc. of SOFSEM 2002: Theory and Practice of Informatics, (2002).
[2] C. Bessiere, E. Hebrard, B. Hnich, Z. Kiziltan, C.-G. Quimper, and T. Walsh, 'Reformulating global constraints: the SLIDE and REGULAR constraints', in Proc. of SARA'07, (2007).
[3] S. Brand, N. Narodytska, C.-G. Quimper, P. Stuckey, and T. Walsh, 'Encodings of the SEQUENCE constraint', in Proc. of CP'07, (2007).
[4] C. Beeri, R. Fagin, D. Maier, and M. Yannakakis, 'On the desirability of acyclic database schemes', Journal of the ACM, 30, 479–513, (1983).
[5] N. Beldiceanu and M. Carlsson, 'Revisiting the cardinality operator and introducing the cardinality-path constraint family', in Proc. of ICLP'01, (2001).
[6] N. Beldiceanu, M. Carlsson, R. Debruyne, and T. Petit, 'Reformulation of global constraints based on constraints checkers', Constraints, 10(4), 339–362, (2005).
[7] N. Beldiceanu, M. Carlsson, and J-X. Rampon, 'Global constraints catalog', Technical report, SICS, (2005).
[8] N. Beldiceanu and E. Contejean, 'Introducing global constraints in CHIP', Mathl. Comput. Modelling, 20(12), 97–123, (1994).
[9] E.K. Burke, P.D. Causmaecker, G.V. Berghe, and H.V. Landeghem, 'The state of the art of nurse rostering', Journal of Scheduling, 7(6), 441–499, (2004).
[10] R. Dechter and J. Pearl, 'Tree clustering for constraint networks', Artificial Intelligence, 38, 353–366, (1989).
[11] S. Demassey, G. Pesant, and L.-M. Rousseau, 'A cost-regular based hybrid column generation approach', Constraints, 11(4), 315–333, (2006).
[12] A. Frisch, B. Hnich, Z. Kiziltan, I. Miguel, and T. Walsh, 'Global constraints for lexicographic orderings', in Proc. of CP'02, (2002).
[13] L. Hellsten, G. Pesant, and P. van Beek, 'A domain consistency algorithm for the stretch constraint', in Proc. of CP'04, (2004).
[14] Y.C. Law and J.H.M. Lee, 'Global constraints for integer and set value precedence', in Proc. of CP'04, (2004).
[15] M. Maher, 'Analysis of a global contiguity constraint', in Proc. of the CP'02 Workshop on Rule Based Constraint Reasoning and Programming, (2002).
[16] G. Pesant, 'A regular language membership constraint for finite sequences of variables', in Proc. of CP'04, (2004).
[17] P. Refalo, 'Linear formulation of constraint programming models and hybrid solvers', in Proc. of CP'00, (2000).
[18] J-C. Régin, 'A filtering algorithm for constraints of difference in CSPs', in Proc. of AAAI'94, (1994).
[19] P. Van Hentenryck and J.-P. Carillon, 'Generality versus specificity: An experience with AI and OR techniques', in Proc. of AAAI'88, (1988).
[20] W-J. van Hoeve, G. Pesant, and L-M. Rousseau, 'On global warming: Flow-based soft global constraints', Journal of Heuristics, 12(4-5), 347–373, (2006).
[21] W-J. van Hoeve, G. Pesant, L-M. Rousseau, and A. Sabharwal, 'Revisiting the sequence constraint', in Proc. of CP'06, (2006).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-480
L. Mandow and J.L. Pérez de la Cruz / Frontier Search for Bicriterion Shortest Path Problems
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-485
Heuristics for Dynamically Adapting Propagation
Kostas Stergiou¹
Abstract. Building adaptive constraint solvers is a major challenge in constraint programming. An important line of research towards this goal is concerned with ways to dynamically adapt the level of local consistency applied during search. A related problem that is receiving a lot of attention is the design of adaptive branching heuristics. The recently proposed adaptive variable ordering heuristics of Boussemart et al. use information derived from domain wipeouts to identify highly active constraints and to focus search on hard parts of the problem, resulting in important savings in search effort. In this paper we show how information about domain wipeouts and value deletions gathered during search can be exploited, not only to perform variable selection, but also to dynamically adapt the level of constraint propagation achieved on the constraints of the problem. First we demonstrate that, when an adaptive heuristic is used, value deletions and domain wipeouts caused by individual constraints largely occur in clusters of consecutive or nearby constraint revisions. Based on this observation, we develop a number of simple heuristics that allow us to dynamically switch between enforcing a weak and cheap local consistency and a strong but more expensive one, depending on the activity of individual constraints. As a case study we experiment with binary problems using AC as the weak consistency and maxRPC as the strong one. Results from various domains demonstrate the usefulness of the proposed heuristics.
1 INTRODUCTION
Building adaptive constraint solvers is a major challenge in constraint programming. One aspect of this goal is concerned with ways to dynamically adapt the level of local consistency applied on constraints during search. Constraint solvers typically maintain (generalized) arc consistency (G)AC, or a weaker consistency property like bounds consistency, during search. Although many stronger local consistencies have been proposed, their practical usage is limited as they are mostly applied during preprocessing, if at all. The main obstacle is the high time and, in some cases, space complexity of the algorithms that can achieve these consistencies. This, coupled with the implicit general assumption that constraints should be propagated with a predetermined local consistency throughout search, makes maintaining strong consistencies an infeasible option, except for some specific CSPs. One way to overcome the high complexity of maintaining a strong consistency while retaining its benefits is to dynamically invoke it during search only when certain conditions are met. There have been some works along this line in the literature, mainly focusing on methods to switch between AC and weaker consistencies [8, 10, 17, 14]. Here we consider methods to selectively apply stronger local consistencies than AC during search.
¹ Department of Information & Communication Systems Engineering, University of the Aegean, Greece (konsterg@aegean.gr).
Recently, Boussemart et al. proposed two adaptive conflict-driven variable ordering heuristics for CSPs called wdeg and dom/wdeg [2]. These heuristics use information derived from conflicts, in the form of domain wipeouts (DWOs) and stored as constraint weights, to guide search. These heuristics, and especially dom/wdeg, are among the most efficient, if not the most efficient, general-purpose heuristics for CSPs. Grimes and Wallace proposed alternative conflict-driven heuristics that consider value deletions as the basic propagation events associated with constraint weights [11]. The efficiency of all the proposed conflict-directed heuristics is due to their ability to learn through conflicts encountered during search. As a result they can guide search towards hard parts of the problem and identify contentious constraints [11]. It has been recognized, for example in [14], that in many problems only a few of the constraint revisions that occur during search are fruitful (i.e. delete values) while, as an extreme case, some constraints do not cause any value deletions at all despite being revised many times. Hence it would be desirable to apply a strong consistency only when it is likely to prune many values, and to avoid using such a consistency when the expected pruning is non-existent or very low. Through weight recording, conflict-driven heuristics are able to identify highly active constraints and focus search on variables involved in such constraints. Given that highly active constraints usually reside in hard parts of the problem, can one take advantage of this information to adapt the level of constraint propagation accordingly? In this paper we show how information about conflicts and value deletions can be exploited, not only to perform variable selection, but also to dynamically adapt the level of local consistency achieved on the constraints of the problem. First we demonstrate that when a conflict-driven heuristic is used on structured problems, constraint activity during search is not uniformly distributed among the revisions of the constraints. On the contrary, it is highly clustered, as value deletions and domain wipeouts caused by individual constraints largely occur in clusters of nearby revisions. Based on this observation, we develop simple heuristics that allow us to dynamically switch between enforcing a weak and cheap local consistency and a strong but more expensive one. The proposed heuristics achieve this by monitoring the activity of the constraints in the problem and triggering a switch between different propagation methods on individual constraints once certain conditions are met. For example, one of the heuristics works as follows. It applies a weak consistency on each constraint c until a revision of c results in a DWO. Then it switches to a strong consistency and applies it on c for the next few revisions. If no further weight update occurs during these revisions, it switches back to the weaker consistency. As a case study we experiment with binary problems using AC as the weak consistency and maxRPC as the strong one. Experimental results from various domains demonstrate the usefulness of the proposed heuristics.
2 BACKGROUND
A Constraint Satisfaction Problem (CSP) is a tuple (X, D, C) where X is a set of n variables, D is a set of domains, one for each variable, and C is a set of e constraints. Each constraint c is a pair (var(c), rel(c)), where var(c) = {x1, ..., xk} is an ordered subset of X, and rel(c) is a subset of the Cartesian product D(x1) × ... × D(xk). In a binary CSP, a directed constraint c, with var(c) = {xi, xj}, is arc consistent (AC) iff for every value ai ∈ D(xi) there exists a value aj ∈ D(xj) s.t. the 2-tuple <(xi, ai), (xj, aj)> satisfies c. In this case (xj, aj) is called an AC-support of (xi, ai) on c. A problem is AC iff there is no empty domain in D and all the constraints in C are AC. A directed constraint c, with var(c) = {xi, xj}, is max restricted path consistent (maxRPC) iff it is AC and for each value (xi, ai) there exists a value aj ∈ D(xj) that is an AC-support of (xi, ai) s.t. the 2-tuple <(xi, ai), (xj, aj)> is path consistent (PC) [5]. A tuple <(xi, ai), (xj, aj)> is PC iff for any third variable xm there exists a value am ∈ D(xm) s.t. (xm, am) is an AC-support of both (xi, ai) and (xj, aj). In this case we say that (xj, aj) is a maxRPC-support of (xi, ai) on c. The revision of a constraint c, with var(c) = {xi, xj}, using a local consistency A is the process of checking whether the values of xi verify the property of A. We say that a revision is fruitful if it deletes at least one value, while it is redundant if it achieves no pruning. A DWO-revision is one that causes a DWO. We will say that a constraint is DWO-active during a run of a search algorithm if it caused at least one DWO. Accordingly, we will call a constraint deletion-active if it deleted at least one value from a domain, and deletion-inactive if it caused no pruning at all.

3 CONSTRAINT ACTIVITY DURING SEARCH
In many, mainly structured, problems some constraints do not cause any DWOs, or are even deletion-inactive, during the run of a search algorithm. For example, when solving the scen11 RLFA problem with MAC+dom/wdeg, only 27 of the 4103 constraints in the problem were DWO-active while 2182 were deletion-active. The activity of the constraints in a problem depends on the structure of the problem, since constraints in difficult local subproblems are more likely to cause deletions and DWOs, especially if a heuristic like dom/wdeg that can identify such subproblems is used. Due to the complex interactions that may exist between constraints, the activity also depends on the search algorithm, the propagation method, the variable ordering heuristic, and on the order in which constraints are propagated. For example, when solving scen11 with an algorithm that maintains maxRPC (MmaxRPC) + dom/wdeg, 29 constraints were DWO-active, with 13 of these identified as DWO-active by both MAC and MmaxRPC. Importantly, many revisions of the constraints that are DWO-active and deletion-active are redundant or achieve very little pruning. Figure 1 demonstrates how DWOs (y-axis) caused by 4 sample constraints are detected as constraint revisions (x-axis) occur throughout search. That is, each data point gives the weight of the constraint at its i-th DWO-revision. The algorithm used is MAC + dom/wdeg and the sample constraints are taken from three structured and one random problem. As we can see, DWOs in structured problems form clusters of successive or very close calls to the revision procedure, with the exception of a few outliers. The same pattern occurs with respect to value deletions (not shown due to lack of space). In contrast, DWOs in the random instance are distributed in a much more uniform way along the line of revisions. Similar results were obtained when MmaxRPC was used in place of MAC. Note that in the structured problems the percentage of DWO-revisions to total revisions is low. There were also many redundant revisions. For example, in the RLFAP instance the sample constraint, which was the most active one in terms of DWOs caused, was revised 3386 times during search, but only 407 of these revisions were fruitful, while only 265 were DWO-revisions. To further investigate these observations we ran the Expectation Maximization (EM) clustering algorithm [7] on the data of Figure 1 (top left). This revealed 20 clusters of DWO-revisions with an average size of 13.25. The mean and median standard deviation (SD) for the DWO-revisions (x-axis) across the clusters were 21.67 and 7.41 respectively. The SD in a cluster is an important piece of information as it represents the average distance of any member of the cluster from the cluster's centroid. That is, it is a measure of the cluster's density. The median SD over the 20 clusters is quite low, which indicates that DWO-revisions are closely grouped together. The mean is higher because it is affected by the presence of outliers. That is, some of the clusters formed by EM may include outliers which increase the cluster's SD.

Figure 1. DWOs caused by sample constraints from the RLFAP instance scen11 (top left), the driver instance driver-08c (top right), the quasigroup completion instance qcp15-120-0 (bottom left), and the forced random instance frb35-17-0 (bottom right). (Each panel plots constraint weight against constraint revisions, with one point per weight update.)

Table 1. Clustering results from benchmark instances.

instance     | #constraints | avg #clusters | avg size | mean SD | median SD
scen11       | 27/4103      | 6.66          | 10.82    | 41.09   | 16.12
driver-08c   | 87/9321      | 2.44          | 12.62    | 38.50   | 25.11
qcp15-120-0  | 554/3150     | 12.87         | 15.26    | 226.12  | 129.28
frb35-17-0   | 233/262      | 7.20          | 19.38    | 1856.70 | 1649.05
Table 1 shows clustering results from the four benchmark instances of Figure 1. For each instance we report the ratio of DWO-active constraints over the total number of constraints, the average number of clusters, the average cluster size, and the mean and median SD for the clusters of DWO-revisions. Averages are taken over 20 sample DWO-active constraints from each problem. The mean and median SD are much lower in structured problems compared to the random one, verifying the observation that in the presence of structure DWO-revisions largely occur in clusters, while in its absence they tend to be uniformly distributed. The question we try to
answer in the following is whether we can take advantage of this to discover dead-ends sooner through strong propagation while keeping cpu times manageable.
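For readers who want to reproduce this kind of cluster-density measurement, here is a rough sketch under our own assumptions: scikit-learn's GaussianMixture stands in for the EM implementation, and the DWO-revision indices below are synthetic, not taken from scen11.

import numpy as np
from sklearn.mixture import GaussianMixture

np.random.seed(0)
# Synthetic indices of DWO-revisions: three tight clusters plus two outliers.
dwo_revisions = np.concatenate([
    np.random.normal(300, 5, 15),
    np.random.normal(1200, 8, 12),
    np.random.normal(2500, 6, 18),
    [700.0, 3300.0],
]).reshape(-1, 1)

em = GaussianMixture(n_components=3).fit(dwo_revisions)
labels = em.predict(dwo_revisions)

# The SD within each cluster measures its density: the average distance
# of members from the centroid. A low median SD means tight clusters.
sds = [dwo_revisions[labels == k].std() for k in range(3)]
print("per-cluster SD:", sds, "median SD:", np.median(sds))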
4 HEURISTICALLY ADAPTING PROPAGATION
We now present four simple heuristics that can be used to dynamically adapt the level of consistency enforced on individual constraints. These heuristics exploit information regarding domain reductions and wipeouts gathered during search. We limit ourselves to the case where dynamic adaptation involves switching between a weak and cheap local consistency and a stronger but more expensive one. In general, it may be desirable to utilize a suite of local consistencies with varying power and properties. The intuition behind the heuristics is twofold: first, to target the application of the strong consistency on areas of the search space where a constraint is highly active, so that domain pruning is maximized and dead-ends are encountered faster; and second, to avoid using an expensive propagation method when pruning is unlikely. The first three heuristics try to take advantage of the clustering that fruitful revisions display in structured problems, while the fourth heuristic simply reacts to any deletions caused by a constraint. Importantly, any heuristic, be it for branching or for adapting the local consistency enforced, must be lightweight, i.e. cheap to compute. As will become clear, the heuristics proposed here are indeed lightweight, as they affect the complexity of the propagation procedure only by a constant factor. The heuristics can be distinguished according to the propagation events they monitor (deletions or DWOs) and also according to the extent of user involvement in their tuning (fully or semi automated). Heuristics based on DWOs (value deletions) may change or maintain the level of local consistency employed on a given constraint by monitoring the DWOs (value deletions) caused by this constraint. There are also hybrid heuristics that may react to both types of propagation events. Fully automated heuristics do not require any tuning, while semi automated ones are parameterized by a bound. This bound specifies the desired number of revisions during which a strong consistency is enforced after a propagation event has been detected. The greater the bound, the longer the strong consistency is applied. In our experiments we have used AC and maxRPC as the weak and strong local consistency respectively. As proved in [5], maxRPC is strictly stronger than AC. That is, it will always delete at least the same values as AC. Also, maxRPC displays a good cpu time to value deletions ratio compared to other strong local consistencies [6]. However, since our approach is generic, when describing the heuristics we will avoid naming specific consistencies and instead refer to switching between a weak (W) and a strong (S) local consistency, where S is strictly stronger than W. For each c ∈ C, the heuristics record the following information:
1) rev[c] is a counter holding the number of times c has been revised, incremented by one each time c is revised.
2) dwo[c] is an integer denoting the revision in which the most recent DWO caused by c occurred.
3) del[c] is a Boolean flag denoting whether the most recent revision of c resulted in at least one value deletion (del[c]=T) or not (del[c]=F).
4) del_S[c] is a Boolean flag denoting whether the most recent revision of c identified and deleted at least one value that is W but not S. The flag becomes T only if a value that is W but not S is deleted. Otherwise, it is set to F.
5) del_W[c] is a Boolean flag denoting whether the current revision of c resulted in at least one value deletion (del_W[c]=T) or not (del_W[c]=F).
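As an illustration, the recorded information might be held in a per-constraint record such as the following sketch (the class and field spellings are ours; del is renamed delete because del is a Python keyword).

from dataclasses import dataclass

@dataclass
class Activity:
    rev: int = 0          # 1) number of times c has been revised
    dwo: int = 0          # 2) revision at which the most recent DWO caused by c occurred
    delete: bool = False  # 3) del[c]: most recent revision deleted at least one value
    del_S: bool = False   # 4) most recent revision deleted a value that is W but not S
    del_W: bool = False   # 5) current revision deleted at least one value under W

activity = {}             # maps each constraint c to its Activity record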
H1(l): semi automated - DWO monitoring. Heuristic H1 monitors and counts the revisions and DWOs of the constraints in the problem. A constraint c is made S if the number of calls to Revise(c) since the last time it caused a DWO is less than or equal to a (user defined) threshold l, that is, if rev[c] - dwo[c] ≤ l. Otherwise, it is made W.

H2: fully or semi automated - deletion monitoring. H2 monitors revisions and value deletions. A constraint c is made S as long as del[c]=T. Otherwise, it is made W. H2 can be semi automated in a similar way to H1 by allowing for a (user defined) number l of redundant revisions after the last fruitful revision. If l is set to 0 we get the fully automated version of H2.

H3: fully or semi automated - hybrid. H3 is a refinement of H2. It monitors revisions, value deletions, and DWOs. A constraint c is made S as long as del_S[c]=T. Otherwise, it is made W. Once the constraint causes a DWO, del_S[c] is set to T and the monitoring of S's effects starts again. If this is not done, then once del_S[c] is set to F the constraint will thereafter be propagated using W. H3 can be semi automated in a similar way to H1 and H2 by allowing for a (user defined) number l of revisions that only delete W-inconsistent values, or no values at all, after the last revision that deleted values that were W but not S.

H4: fully or semi automated - deletion monitoring. H4 monitors value deletions. For any constraint c, H4 applies W until del_W[c] becomes T. In this case c is made S. In other words, if at least one value is deleted from the domain of a variable x ∈ var(c) by W, then S is applied on the remaining available values in D(x). H4 can be semi automated by insisting that S is applied only if a (user defined) proportion p of x's values have been deleted by W during the current revision of c. With high values of p, S will be applied only when it is likely to cause a DWO.

Importantly, the heuristics defined above can be combined either disjunctively or conjunctively in various ways. For example, heuristic H∨_124 applies S on a constraint whenever the condition specified by either H1, H2, or H4 holds. Heuristic H∧_24 applies S when both the conditions of H2 and H4 hold. We can choose a disjunctive or conjunctive combination depending on whether we want S applied to a greater or lesser extent respectively.

Figure 2 describes the implementation of the functions Revise for applying a weak or a strong consistency using the proposed heuristics. They are based on the corresponding functions of coarse-grained algorithms like AC-3. Once a constraint is selected for revision, a function we call Decide (not shown for space reasons) is called to determine how it will be propagated. This function is parameterized by the adaptive propagation heuristic h and the data structures required for the computation of the heuristics. The appropriate function w.r.t. h is called to compute the heuristic and decide on the local consistency to be applied. Thereafter, depending on the selected consistency, the appropriate version of function Revise is called to perform the propagation. The two versions of Revise shown, one for W and one for S, implement H∨_124 or H∨_134. As values are deleted and DWOs are detected, the data structures used by the heuristics are updated. Initially, i.e. before the first revision of c, del[c], del_W[c] and del_S[c] are set to F and rev[c], dwo[c] are set to 0.
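The code of Decide is not given in the paper; the following sketch (ours) shows one plausible reading of how the four conditions, and a disjunctive combination such as H∨_124, could be evaluated from the records above. The semi automated refinements of H2-H4 are omitted, and the del_W flag is in fact set during the current revision itself, inside Revise.

STRONG, WEAK = "S", "W"

def decide(h, a, l=100):
    # a is the Activity record of the constraint under revision.
    if h == "H1":    # semi automated, DWO monitoring
        return STRONG if a.rev - a.dwo <= l else WEAK
    if h == "H2":    # deletion monitoring
        return STRONG if a.delete else WEAK
    if h == "H3":    # hybrid: deletions specific to S
        return STRONG if a.del_S else WEAK
    if h == "H4":    # react to deletions made by W
        return STRONG if a.del_W else WEAK
    raise ValueError("unknown heuristic: " + h)

def decide_disjunctive(hs, a, l=100):
    # Combined heuristic, e.g. hs = ("H1", "H2", "H4") for H∨_124.
    return STRONG if any(decide(h, a, l) == STRONG for h in hs) else WEAK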
function Revise(c, xi, S)
rev[c]++;
for each a ∈ D(xi)
  if a is not W-supported in c then
    delete a from D(xi);
2:  del[c] ← T;
  else if a is not S-supported in c then
    delete a from D(xi);
2:  del[c] ← T;
3:  del_S[c] ← T;
if D(xi) = ∅ then
  dwo[c] ← rev[c];
3: del_S[c] ← T;
2: if no value is deleted then del[c] ← F;
3: if no value that is W is deleted by S then del_S[c] ← F;

function Revise(c, xi, W)
rev[c]++;
for each a ∈ D(xi)
  if a is not W-supported in c then
    delete a from D(xi);
    del_W[c] ← T;
2:  del[c] ← T;
if del_W[c] = T then
  for each a ∈ D(xi)
    if a is not S-supported in c then
      delete a from D(xi);
3:    del_S[c] ← T;
if D(xi) = ∅ then
  dwo[c] ← rev[c];
3: del_S[c] ← T;
2: if no value is deleted then del[c] ← F;
3: if no value that is W is deleted by S then del_S[c] ← F;

Figure 2. The versions of Revise given can apply H∨_124 or H∨_134. Removing the lines labelled with 3 (2) gives H∨_124 (H∨_134).

5 EXPERIMENTS
We implemented and tested the heuristics described in Section 4, as well as a number of combined heuristics. We used d-way branching, dom/wdeg for variable ordering, and lexicographic value ordering. We experimented with the following classes of benchmarks, taken from C. Lecoutre's web page where details about them can be found: radio links frequency assignment (RLFAP), langford, black hole, driver, hanoi, quasigroup completion, quasigroup with holes, graph coloring, composed random, forced random, geometric random. Some classes and many specific instances are very easy (e.g. composed) or very hard (e.g. black hole) for all methods. The results presented below demonstrate that the heuristics retain the efficiency of maxRPC where it is better than AC and improve it where it is worse. Also, we need to point out that for many of the tested classes there exist specialized methods that can solve the specific problems much faster than the generic methods we use. Our aim is only to demonstrate the efficiency of the proposed heuristics in dynamically switching between different local consistencies.

Table 2 shows results from some real-world RLFAP instances. We compare adaptive algorithms that use the heuristics of Section 4, where each algorithm is denoted by the corresponding heuristic, to MAC and MmaxRPC, simply denoted by AC and maxRPC respectively. For H1, and any combined heuristic that includes H1, l was set to 100, while for H2 l was set to 10. These values were chosen empirically and display a good performance across a number of instances.² In these problems maxRPC is too expensive to maintain compared to AC. The adaptive heuristics cut down the size of the explored search space and reduce the run times in most cases. This is more visible in problems where maxRPC visits considerably fewer nodes than AC (e.g. graph08-f11). Importantly, in easy problems, or in problems where maxRPC does not have a considerable effect compared to AC, the heuristics do not slow the search process in a significant way.

² The fully automated version of H2 is competitive but less robust.

Table 2. Nodes (n) and cpu times (t) in seconds from RLFAP instances. The s and g prefixes stand for scen and graph respectively. The best cpu time for each instance is highlighted with bold.

instance |   | AC     | maxRPC | H1    | H2    | H3    | H4    | H∨_14 | H∨_124
s11      | n | 2864   | 1334   | 1175  | 1842  | 1432  | 1678  | 1358  | 1360
         | t | 6.9    | 24.2   | 3.7   | 6.7   | 5.5   | 6.0   | 4.9   | 4.9
s11-f9   | n | 108184 | 37663  | 35102 | 47552 | 39312 | 53338 | 38202 | 37743
         | t | 539.6  | 3478.3 | 170.4 | 335.4 | 183.3 | 274.8 | 205.2 | 212.7
s11-f10  | n | 8576   | 2098   | 2197  | 2675  | 1938  | 3849  | 2462  | 2467
         | t | 30.2   | 93.8   | 11.6  | 18.8  | 10.2  | 13.9  | 11.4  | 11.3
s11-f12  | n | 6678   | 1923   | 1750  | 2804  | 1763  | 3095  | 1953  | 1921
         | t | 19.7   | 101.7  | 8.6   | 14.5  | 9.4   | 14.7  | 11.0  | 10.6
s02-f25  | n | 11998  | 5262   | 3114  | 10802 | 2938  | 12961 | 4367  | 4922
         | t | 9.3    | 65.1   | 5.6   | 16.0  | 5.5   | 15.2  | 9.3   | 10.3
s03-f11  | n | 8314   | 880    | 1047  | 4830  | 2762  | 4518  | 2068  | 1489
         | t | 26.4   | 24.7   | 5.6   | 20.2  | 11.8  | 17.2  | 12.5  | 9.5
g08-f10  | n | 11948  | 6342   | 6650  | 6423  | 9540  | 4863  | 4474  | 4119
         | t | 34.5   | 147.1  | 21.9  | 19.4  | 26.8  | 13.9  | 16.3  | 16.2
g08-f11  | n | 9996   | 629    | 753   | 960   | 748   | 713   | 608   | 619
         | t | 35.9   | 18.7   | 4.3   | 4.5   | 4.8   | 3.6   | 3.6   | 3.6
g14-f27  | n | 11602  | 926    | 10759 | 2237  | 9698  | 2877  | 2750  | 2750
         | t | 13.0   | 2.5    | 15.3  | 3.1   | 17.2  | 3.3   | 3.1   | 3.1

Table 3 shows results, including only some of the heuristics, from instances belonging to the following classes of benchmarks: graph coloring (1st, 2nd), driver (3rd, 4th), quasigroup completion (5th-7th), quasigroups with holes (8th, 9th). In some of these problems maxRPC is much more efficient than AC. The heuristics, except H4, can further improve on the performance of maxRPC, making the adaptive algorithms considerably more efficient than MAC.

Table 3. Nodes (n) and cpu times (t) in seconds from structured instances.

instance      |   | AC      | maxRPC  | H2      | H4      | H∨_24   | H∨_124
queen8-8-8    | n | -       | 1458    | 2807    | -       | 5863    | 4244
              | t | >1h     | 3.15    | 2.9     | >1h     | 5.1     | 2.7
games120-9    | n | 3208852 | 1392922 | 5511126 | 2265133 | 1604133 | 1452449
              | t | 403.7   | 432.3   | 834.3   | 293.7   | 216.1   | 195.9
driverlogw-08 | n | 3814    | 785     | 1003    | 3417    | 855     | 903
              | t | 13.2    | 25.5    | 6.9     | 9.2     | 6.1     | 6.2
driverlogw-09 | n | 14786   | 8342    | 10802   | 10627   | 8859    | 8895
              | t | 239.2   | 265.8   | 152.9   | 167.1   | 137.8   | 141.2
qcp-15-120-0  | n | 108336  | 21926   | 35394   | 101901  | 29990   | 27167
              | t | 98.4    | 43.3    | 39.9    | 83.9    | 33.4    | 28.3
qcp-15-120-5  | n | 387742  | 80424   | 84193   | 370461  | 81269   | 112290
              | t | 422.0   | 201.0   | 118.2   | 369.4   | 117.7   | 147.0
qcp-15-120-10 | n | 1136801 | 52112   | 58325   | 152497  | 76399   | 68046
              | t | 1178.0  | 113.6   | 65.1    | 145.1   | 88.6    | 71.2
qwh-20-166-0  | n | 104288  | 20236   | 15550   | 62993   | 15591   | 24725
              | t | 269.1   | 86.9    | 42.3    | 140.0   | 46.0    | 78.2
qwh-20-166-1  | n | 132842  | 22688   | 29681   | 66775   | 25147   | 39435
              | t | 355.4   | 111.4   | 88.2    | 151.1   | 78.5    | 116.7
The results given in Tables 2 and 3 show that individual heuristics can display considerable variance in their performance from instance to instance. On the contrary, combined heuristics are quite robust. Comparing the heuristics, H2 and the combined ones that include H2 display good performance on a variety of problems. It has to be noted that H∨_24 and H∨_124 were faster than AC in all instances we tried from the classes mentioned at the start of this section, except for some easy instances where they were slightly slower. H1 and H3 are effective on RLFAPs but worse than H2 on quasigroup problems. The fully automated version of H4 displays the worst performance among the individual heuristics, but we have not yet tried semi automated versions of H4. Overall the heuristics offer a good balance between AC and maxRPC. In problems where maxRPC offers significant savings in nodes, they retain this advantage and translate it into considerable savings in run times. In problems where maxRPC offers moderate savings in nodes, the heuristics significantly reduce the run times of maxRPC and are competitive with, and often faster than, AC.
Finally, Table 4 gives results from forced and geometric random problems. As is clear, in such problems that lack structure the heuristics do not reduce the node visits in a significant way and are outperformed by AC. The best heuristic is by far H4. This is because H4 does not target clusters of activity to apply maxRPC but reacts to value deletions wherever they occur. Hence, it is not significantly handicapped by the absence of clusters.

Table 4. Nodes (n) and cpu times (t) in seconds from random instances.

instance    |   | AC     | maxRPC | H2     | H4     | H∨_24  | H∨_124
frb35-17    | n | 23782  | 14920  | 15022  | 21182  | 15064  | 14642
            | t | 13.5   | 107.5  | 47.5   | 16.1   | 48.4   | 46.8
frb40-19    | n | 40058  | 20073  | 24446  | 32393  | 19722  | 22752
            | t | 24.9   | 151.6  | 76.8   | 27.9   | 63.4   | 76.1
geo50-20-75 | n | 227535 | 112785 | 148853 | 221211 | 142416 | 141726
            | t | 218.9  | 2089.4 | 765.7  | 247.1  | 748.3  | 750.1
A final interesting observation is that sometimes the heuristics result in fewer node visits than maxRPC, or in more than AC. This is explained by the interaction between constraint propagation and the variable ordering heuristic. Different propagation methods can lead to different weight increases for the constraints, which in turn can guide dom/wdeg to different variable selections, and hence to different parts of the search space.
6 RELATED WORK
Building adaptive constraint solvers is a topic that has attracted considerable interest in the literature (see for example [1, 15, 9, 12]). Part of this interest has been directed to the dynamic adaptation of constraint propagation during search. The most common manifestation of this idea is the use of different propagators for different types of domain reductions in arithmetic constraints. When handling arithmetic constraints, most solvers differentiate between events such as removing a value from the middle of a domain, removing a value from a bound of a domain, or reducing a domain to a singleton, and apply suitable propagators accordingly. Works on adaptive propagation for general constraints include the following. El Sakkout et al. proposed a scheme called adaptive arc propagation for dynamically deciding whether to process individual constraints using AC or forward checking [8]. Freuder and Wallace proposed a technique, called selective relaxation, which can be used to restrict AC propagation based on two criteria: the distance in the constraint graph of any variable from the currently instantiated one, and the proportion of values deleted [10]. Chmeiss and Sais presented a backtrack search algorithm, MAC(dist k), that also uses a distance parameter k as a bound to maintain a partial form of AC [4]. Schulte and Stuckey proposed a technique for selecting which propagator to apply to a given constraint, among an array of available constraint propagators, using priorities that are dynamically updated [17]. Similar ideas are also implemented in constraint solvers such as Choco [13]. Probabilistic arc consistency is a scheme that can help avoid some consistency checks and constraint revisions that are unlikely to cause any domain pruning [14]. As in [8], the scheme is based on information gathered by examining the supports of values in constraints, which can be very expensive for non-binary constraints. Our work is most closely related to [8], as the aim is to dynamically adapt the level of local consistency achieved on individual constraints. However, neither [8] nor any of the other works use information about failures, captured in the form of constraint weights, to achieve this. Besides, to the best of our knowledge, although many levels of consistency stronger than AC have been proposed, they have not been studied in this context (i.e. invoked dynamically) before.
7 CONCLUSION
We have proposed a number of simple heuristics for dynamically switching between different local consistencies applied on individual constraints during search. These heuristics monitor propagation events like DWOs and value deletions caused by the constraints, and react by changing the propagation method when certain conditions are met. The development of the heuristics was inspired by observing the activity of the constraints when conflict-driven search heuristics are used. As we demonstrated, DWOs and value deletions in structured problems mostly occur in clusters of consecutive or nearby revisions. This can be taken advantage of to increase or decrease the level of consistency applied when a constraint is highly active or inactive respectively. Experimental results from various domains displayed the usefulness of the heuristics. The work presented here is only a first step towards designing heuristics for adaptive constraint propagation using information gathered during search. There are several directions for future work. First of all, we need to further evaluate the heuristics, including their conjunctive combinations. We can also investigate different local consistencies for binary and non-binary problems, try to devise more sophisticated heuristics, and integrate with existing related works (e.g. [14]). Also, it would be interesting to study the interaction of adaptive propagation with adaptive branching heuristics other than dom/wdeg, for example the impact-based heuristics of [16] and the explanation-based heuristics of [3].
REFERENCES
[1] J. Borrett, E. Tsang, and N. Walsh, 'Adaptive Constraint Satisfaction: The Quickest First Principle', in ECAI-96, pp. 160–164, (1996).
[2] F. Boussemart, F. Hemery, C. Lecoutre, and L. Sais, 'Boosting systematic search by weighting constraints', in ECAI-2004, pp. 482–486, (2004).
[3] H. Cambazard and N. Jussien, 'Identifying and Exploiting Problem Structures Using Explanation-based Constraint Programming', Constraints, 11, 295–313, (2006).
[4] A. Chmeiss and L. Sais, 'Constraint Satisfaction Problems: Backtrack Search Revisited', in ICTAI-2004, pp. 252–257, (2004).
[5] R. Debruyne and C. Bessière, 'From restricted path consistency to max-restricted path consistency', in CP-97, pp. 312–326, (1997).
[6] R. Debruyne and C. Bessière, 'Domain Filtering Consistencies', Journal of Artificial Intelligence Research, 14, 205–230, (2001).
[7] A. Dempster, N. Laird, and D. Rubin, 'Maximum Likelihood from Incomplete Data via the EM Algorithm', Journal of the Royal Statistical Society, 39(1), 1–38, (1977).
[8] H. El Sakkout, M. Wallace, and B. Richards, 'An Instance of Adaptive Constraint Propagation', in CP-96, pp. 164–178, (1996).
[9] S. Epstein, E. Freuder, R. Wallace, A. Morozov, and B. Samuels, 'The Adaptive Constraint Engine', in CP-2002, pp. 525–540, (2002).
[10] E. Freuder and R.J. Wallace, 'Selective relaxation for constraint satisfaction problems', in ICTAI-96, (1996).
[11] D. Grimes and R.J. Wallace, 'Sampling Strategies and Variable Selection in Weighted Degree Heuristics', in CP-2007, pp. 831–838, (2007).
[12] 1st International Workshop on Autonomous Search (in conjunction with CP-07), eds., Y. Hamadi, E. Monfroy, and F. Saubion, 2007.
[13] F. Laburthe and Ocre, 'Choco: implémentation du noyau d'un système de contraintes', in JNPC-00, pp. 151–165, (2000).
[14] D. Mehta and M.R.C. van Dongen, 'Probabilistic Consistency Boosts MAC and SAC', in IJCAI-2007, pp. 143–148, (2007).
[15] S. Minton, 'Automatically Configuring Constraint Satisfaction Programs: A Case Study', Constraints, 1(1/2), 7–43, (1996).
[16] P. Refalo, 'Impact-based search strategies for constraint programming', in CP-2004, pp. 556–571, (2004).
[17] C. Schulte and P.J. Stuckey, 'Speeding Up Constraint Propagation', in CP-2004, pp. 619–633, (2004).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-490
Near Admissible Algorithms for Multiobjective Search
Patrice Perny and Olivier Spanjaard¹
Abstract. In this paper, we propose near admissible multiobjective search algorithms to approximate, with performance guarantee, the set of Pareto optimal solution paths in a state space graph. The approximation of Pareto optimality relies on the use of an ε-dominance relation between vectors, significantly narrowing the set of non-dominated solutions. We establish the correctness of the proposed algorithms, and discuss computational complexity issues. We present numerical experiments showing that approximation significantly improves resolution times in multiobjective search problems.
1 INTRODUCTION
Heuristic search in state space graphs was initially considered in the framework of single objective optimization. The value of a path is defined as the sum of the costs of its arcs, and the problem amounts to finding one path with minimum cost among all paths from a source node to the goal. This problem is solved by constructive search algorithms like A* [6], which provide the optimal solution-path. In this case preferences are measured by a scalar cost function inducing a complete weak order over sub-paths. However, preferences are not always representable by a single criterion function. For example, in path planning problems for autonomous agents, the action allowing a transition from one state to another might have an impact in terms of time, distance, energy consumption etc., leading to different points of view, not necessarily reducible to a single overall cost [3]. More generally, multiobjective search is very useful in many applications requiring computer-aided problem solving (e.g., engineering design, preference-based configuration). This justifies the interest in search algorithms like MOA* [12], the multiobjective extension of A*, and its recent refinement by Mandow and Pérez-de-la-Cruz [8]. Besides these works on exact algorithms, several ε-admissible variations of the A* algorithm have been proposed in the literature (e.g. [11, 4]). These algorithms guarantee to find a solution that is within a factor of (1 + ε) of the best solution. They realize a compromise between time and space requirements on the one hand, and optimality of the returned solution on the other hand. These variations have proved to perform well, achieving a significant reduction of the number of iterations (up to 90% for ε = 0.1) on instances of the traveling salesman problem [11, 4]. Near admissible algorithms might also prove their efficiency in multiobjective search. At least, the introduction of tolerance thresholds in dominance concepts is worth investigating, with possibly a twofold benefit: not only might it simplify the search by increasing pruning possibilities, but it might also reduce the size of the potential output (the set of non-dominated elements). This latter point is crucial, as can be seen in the following example derived from Hansen [5].
¹ LIP6, Univ. Pierre and Marie Curie, 104 av. du Président Kennedy, 75016 Paris, France, email: firstname.lastname@lip6.fr. This work has been supported by the ANR project PHAC, which is gratefully acknowledged.
Example 1. Consider a simple biobjective state-space graph with a set N = {0, . . . , q} of nodes, 0 being the initial node and q being the goal node. At each node n ∈ N \ {q}, two actions a1, a2 are feasible: action a1 leads to node n + 1 with cost (2^n, 0), whereas action a2 leads to the same node with cost (0, 2^n). By construction, there exist 2^q distinct solution-paths from 0 to q in this graph, with costs (k, 2^q - 1 - k) for k = 0, . . . , 2^q - 1. For example, the sequence of q times action a1 yields a solution path with cost (2^q - 1, 0), whereas the sequence of q times action a2 yields a solution path with cost (0, 2^q - 1). In that graph, all the paths from 0 to q have the same sum of costs but distinct costs on the first objective (due to the uniqueness of the binary representation of an integer). The images of all these paths in the space of objectives are on the same line (orthogonal to vector (1, 1)), and therefore they are all Pareto-optimal. In such a family of instances, with q nodes and only 2 actions and 2 objectives, we can see that the number of Pareto-optimal paths grows exponentially with q. For instance, if q = 16 we have 65536 Pareto-optimal solution paths.

This example shows that the exact determination of the Pareto set might induce prohibitive computation times. Moreover, producing the entire list of Pareto optimal solutions is probably useless for the Decision Maker. In such cases, two approaches might be of interest: 1) focusing the search on a specific compromise solution; 2) approximating the Pareto set while keeping a good representation of the various possible tradeoffs in the Pareto set. The first approach requires additional preference information from the Decision Maker concerning, for example, the relative importance of criteria, the compensations allowed, and the type of compromise sought. When this information is not available, the second approach is particularly relevant. In this direction, several studies have been proposed, relying on the concept of ε-dominance introduced as an approximation of Pareto dominance in various multiobjective problems [14, 10, 2, 7, 1]. Despite the growing interest for these concepts, the potential of ε-relaxation of dominance concepts has not been investigated, to the best of our knowledge, in the framework of multiobjective search on implicit state space graphs. This is precisely the aim of this paper, which is organized as follows. In the first two sections, we recall some useful results. In Section 2 we introduce formal material for the approximation of the Pareto set. In Section 3 we provide a simple reformulation of a multiobjective search algorithm to determine the exact Pareto set, and we prove its pseudopolynomiality. Then, we show how to modify this algorithm to get more efficient and near admissible versions. Finally, we provide numerical experiments in the last section.
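A few lines of code (ours) verify the combinatorics of Example 1 for a small q:

from itertools import product

q = 4
costs = set()
for actions in product((1, 2), repeat=q):   # choose a1 or a2 at each node n
    c1 = sum(2**n for n, a in enumerate(actions) if a == 1)
    c2 = sum(2**n for n, a in enumerate(actions) if a == 2)
    costs.add((c1, c2))

assert len(costs) == 2**q                        # 2^q distinct cost vectors
assert all(x + y == 2**q - 1 for x, y in costs)  # equal sums: all images lie
                                                 # on one line orthogonal to
                                                 # (1,1), so all Pareto-optimal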
2 PARETO SET AND ITS APPROXIMATION
Considering a finite set of objectives {1, . . . , m}, any solution-path can be characterized by a cost-vector (c1, . . . , cm) ∈ Z^m_+ where ci represents the cost of the path with respect to objective i. Hence, the comparison of paths reduces to the comparison of their cost-vectors. The set of all cost-vectors attached to solution-paths is denoted X. We recall now some definitions linked to dominance concepts:

Definition 1. The weak Pareto dominance relation (≼p-dominance for short) on cost-vectors of Z^m_+ is defined by: x ≼p y ⟺ [∀i ∈ {1, . . . , m}, xi ≤ yi].

Thus x dominates y, which is denoted by x ≼p y, when x is at least as good as y with respect to all objectives. For any dominance relation ≼ defined on a set X, we will use the following definitions:

Definition 2. Any element x ∈ X is said to be ≼-optimal in X if, for all y ∈ X, y ≼ x ⇒ x ≼ y. If x is not ≼-optimal then it is said to be ≼-dominated.

Definition 3. A subset Y ⊆ X is said to be a ≼-covering of X if for all x ∈ X there exists y ∈ Y such that y ≼ x. Whenever no proper subset of Y is a ≼-covering of X, then Y is said to be a minimal ≼-covering of X.

The aim of multiobjective search is to find a ≼p-covering of the set of solution-paths. As shown in Example 1, such a set can be very large. This difficulty can be overcome by resorting to an approximate dominance concept called the ε-dominance relation [5, 14]:

Definition 4. The ε-dominance relation on cost-vectors of Z^m_+ is defined by: x ≼ε y ⟺ x ≼p (1 + ε)y.

As an illustration, consider the left part of Figure 1, concerning a bi-objective problem where every feasible solution is represented by a point x = (x1, x2) in the bi-objective space. Within this space, point p1 (resp. p2) ≼ε-dominates all the points within cone C1 (resp. C2). The notion of ≼ε-covering then arises naturally. Indeed, the set {p1, p2} is a 2-point ≼ε-covering of X since X ⊆ C1 ∪ C2. Note that a smaller ε yields a finer ≼ε-covering of X, as illustrated on the right part of Figure 1, where a 5-point ≼ε-covering of the same set X is given.

Figure 1. ≼ε-coverings for two values of ε. (Both panels plot the points of X in the (x1, x2) objective space; the left panel shows the cones C1 and C2 dominated by p1 and p2.)

Note that, for a given ε, several minimal ≼ε-covering subsets of different sizes exist. For example, consider X = {x, y, z} with x = (800, 950), y = (880, 880) and z = (950, 800), and set ε = 0.1. The set {x, z} is a ≼ε-covering subset of X since 800 ≤ 968 = (1 + 0.1) × 880 and 950 ≤ 968, and thereby x ≼ε y. Furthermore, neither x ≼ε z nor z ≼ε x, and therefore {x, z} is minimal. Note that {y} is also a minimal ≼ε-covering subset. On the one hand we have indeed y ≼ε x since 880 ≤ 880 = (1 + 0.1) × 800 and 880 ≤ 1045 = (1 + 0.1) × 950; on the other hand y ≼ε z for the same reasons. The very interest of ε-dominance lies in the following property: for any fixed number m > 1 of objectives, for any finite ε > 0 and any set X of bounded vectors x such that 1 ≤ xi ≤ M for all i ∈ {1, . . . , m}, there exists a ≼ε-covering subset of X the size of which is polynomial in log M and 1/(log(1 + ε)), see [10, 7]. This can simply be explained by considering a logarithmic scaling function ϕ : Z^m_+ → Z^m_+ on the objective space, defined as follows: ϕ(x) = (ϕ1(x), . . . , ϕm(x)) with ϕi(x) = ⌊log xi / log(1 + ε)⌋. For every component xi, it returns an integer k such that (1 + ε)^k ≤ xi < (1 + ε)^(k+1). Using ϕ we can define a ϕ-dominance relation:

Definition 5. The ϕ-dominance relation on cost-vectors of Z^m_+ is defined by: x ≼ϕ y ⟺ ϕ(x) ≼p ϕ(y).

This relation satisfies the following properties:

Proposition 1. For all vectors x, y, z ∈ Z^m_+, we have: (i) x ≼ϕ y and y ≼ϕ z ⇒ x ≼ϕ z (transitivity); (ii) x ≼ϕ y ⇒ x ≼ε y.

The symmetric part of ≼ϕ, defined by x ≡ϕ y if and only if ϕ(x) = ϕ(y), is therefore an equivalence relation (by transitivity). Clearly, by keeping one element of X for each equivalence class of ≡ϕ, one obtains a ≼ϕ-covering of X [10]. The left part of Figure 2 illustrates this point on the bi-objective example introduced for Figure 1. The dotted lines form a logarithmic grid in which each square represents an equivalence class for ≡ϕ. Hence the set of black points (one per non-empty square) represents a ≼ϕ-covering of all points. Interestingly enough, the resulting ≼ϕ-covering is also a ≼ε-covering by Proposition 1 (ii). Moreover, the size of this ≼ε-covering is upper bounded by the number of equivalence classes of relation ≡ϕ, which is not greater than (1 + ⌈log M / log(1 + ε)⌉)^m [10]. A refined ≼ϕ-covering (which is also a ≼ε-covering) can easily be derived by removing ≼ϕ-dominated elements (we keep only the black points on the right part of Figure 2), which improves the bound to (1 + ⌈log M / log(1 + ε)⌉)^(m-1), see [7]. Coming back to Example 1 with q = 16, a ≼p-covering requires 65536 solution-paths, whereas a ≼ε-covering of this set constructed with ϕ as indicated above (for ε = 0.1) contains at most ⌊log 65536 / log 1.1⌋ + 1 = 117 elements. More generally, it is important to note that, for fixed values of ε and m, the size of the ≼ε-covering grows only polynomially with the size of the instance, even when the Pareto set grows exponentially. In addition, if a set Y ⊆ X is a ≼ε-covering of X, we know (by Definition 3) that any feasible tradeoff achieved in X is approximated with performance guarantee, i.e. it is ≼ε-dominated by at least one element in Y. This enables a more concise and yet representative description of the possible tradeoffs in the Pareto set. The question is whether a ≼ε-covering is computable in polynomial time or not.

Figure 2. Logarithmic grid. (Both panels plot the points of X in the (x1, x2) objective space, with grid lines at 1, (1 + ε), (1 + ε)^2, (1 + ε)^3, (1 + ε)^4 on each axis.)
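The definitions above translate directly into code; the following sketch (ours, assuming integer cost vectors with components in [1, M]) implements the dominance tests and the grid-based covering:

import math

def pareto_dom(x, y):    # x ≼p y
    return all(xi <= yi for xi, yi in zip(x, y))

def eps_dom(x, y, eps):  # x ≼ε y  iff  x ≼p (1 + eps)·y
    return all(xi <= (1 + eps) * yi for xi, yi in zip(x, y))

def phi(x, eps):         # logarithmic scaling: the grid cell of x
    return tuple(math.floor(math.log(xi) / math.log(1 + eps)) for xi in x)

def phi_covering(X, eps):
    # Keep one representative per non-empty grid cell; by Proposition 1 (ii)
    # the result is also a ≼ε-covering of X.
    cells = {}
    for x in X:
        cells.setdefault(phi(x, eps), x)
    return list(cells.values())

# The minimal-covering example above: {y} alone ≼ε-covers X for ε = 0.1.
X = [(800, 950), (880, 880), (950, 800)]
assert all(eps_dom((880, 880), x, 0.1) for x in X)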
3 MULTIOBJECTIVE SEARCH ALGORITHM
We now present a multiobjective extension of A* (a reformulation of the label-expanding version of MOA* [8]), and we prove its pseudopolynomiality, which is directly related to that of the size of the Pareto set. In the following sections, we will then use a logarithmic grid to derive near-admissible algorithms: a first one whose complexity is polynomial in the number of states, and a second one more efficient in practice in spite of a higher theoretical complexity. To our knowledge, this is the first attempt to devise near admissible algorithms for multiobjective search in implicit graphs (the existing near admissible multiobjective algorithms work on explicit graphs). The A* algorithm and its multiobjective extensions explore a state space graph G = (N, A) where N is a finite set of nodes (possible states), and A is a set of arcs representing transitions. Formally, we have A = {(n, n') : n ∈ N, n' ∈ S(n)}, where S(n) ⊆ N is the set of all successors of node n. A cost-vector v(n, n') is attached to each arc (n, n') ∈ A, and the cost-vector of a path P is defined by v(P) = Σ_{(n,n')∈P} v(n, n'). In the sequel, we assume that v(P) ∈ [1, M] for every solution path, where M is a known constant. Then s ∈ N denotes the source of the graph (the initial state), Γ ⊆ N the subset of goal nodes, P(s, Γ) the set of all paths from s to a goal node γ ∈ Γ (solution-paths), and P(n, n') the set of all paths linking n to n', characterized by a list <n, . . . , n'> of nodes. Unlike the scalar case, there possibly exist several ≼p-optimal paths with distinct cost-vectors to reach a given node in a multiobjective problem. Hence, one expands labels ℓ = [nℓ, Pℓ, gℓ] (attached to subpaths) rather than nodes, where nℓ indicates the labeled node, Pℓ the corresponding subpath in P(s, nℓ), and gℓ the cost-vector of Pℓ. As in A*, the set of generated labels is divided into two disjoint sets: a set OPEN of not yet expanded labels and a set CLOSED of already expanded labels. Besides, the ≼p-optimal expanded labels in {ℓ : nℓ ∈ Γ} are stored in a set SOL. Since a node n may be on the path of more than one ≼p-optimal solution, a set H(n) of heuristic cost-vectors is given for each node n, estimating the set {v(P) : P ∈ P(n, Γ)}. For each generated label ℓ0, a set F(ℓ0) of evaluation vectors is computed from all possible combinations {gℓ0 + h, h ∈ H(nℓ0)}. It estimates the set of ≼p-optimal values of solution-paths extending Pℓ0. Initially, OPEN contains only label [s, <s>, 0], while CLOSED and SOL are empty. At each subsequent step, one expands a label ℓ* in OPEN such that F(ℓ*) contains at least one ≼p-optimal vector in ∪_{ℓ∈OPEN} F(ℓ). The process is kept running until OPEN becomes empty. Two pruning rules are used:

Rule R1: discard label ℓ when there exists ℓ' ∈ OPEN ∪ CLOSED s.t. nℓ' = nℓ and gℓ' ≼p gℓ.
Rule R2: discard label ℓ when ∀f ∈ F(ℓ), ∃ℓ' ∈ SOL s.t. gℓ' ≼p f.

These rules ensure the generation of all ≼p-optimal paths in P(s, Γ) provided heuristic H is admissible, i.e. ∀n ∈ N, ∀P ∈ P(n, Γ), ∃h ∈ H(n) s.t. h ≼p v(P). The algorithm is outlined below:

MULTIOBJECTIVE SEARCH ALGORITHM (MOA*)
Input: G, OPEN, CLOSED, SOL
while OPEN ≠ ∅
01   move a label ℓ* from OPEN to CLOSED
02   if nℓ* ∈ Γ
03     then UPDATE(SOL, ∅, ℓ*)
04     else for each node n' ∈ S(nℓ*) do
05       create ℓ0 = [n', <Pℓ*, n'>, gℓ* + v(nℓ*, n')]
06       if ∃f0 ∈ F(ℓ0) s.t. ∀ℓ ∈ SOL not(gℓ ≼p f0)
07         then UPDATE(OPEN(n'), CLOSED(n'), ℓ0)
08         else discard ℓ0
Output: SOL
This algorithm calls procedure UPDATE, which applies to L1, a list of open labels, and L2, a list of closed labels. It possibly updates list L1 with label ℓ as follows:

UPDATE(L1, L2, ℓ)
01   if ∀ℓ' ∈ L1 ∪ L2 not(gℓ' ≼p gℓ) then L1 ← L1 ∪ {ℓ}
02   remove ≼p-dominated labels from L1

We now show that this multiobjective search algorithm is pseudopolynomial for integer costs (for a fixed number m of objectives), with the following worst case complexity analysis. The "while" loop in the main procedure is iterated at most |N|(M + 1)^m times, since this is the maximum number of distinct labels. Indeed there are |N| nodes, and for each of them the number of different cost vectors is upper bounded by (M + 1)^m. Furthermore, at each iteration of the loop the main computational cost is due to line 06, which requires binary comparisons of labels from F(ℓ0) and SOL. With a naive method, this represents (M + 1)^(2m) comparisons. Hence the algorithm executes less than |N|(M + 1)^m loops of cost (M + 1)^(2m). Therefore the overall complexity is within O(|N|^2 M^(3m)).
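As a companion sketch (ours; it reuses pareto_dom from the earlier sketch and stores labels as [node, path, g] lists, following the text), procedure UPDATE can be written as:

def update(open_labels, closed_labels, label):
    g = label[2]
    # Line 01: insert only if no open or closed label ≼p-dominates it.
    if any(pareto_dom(l[2], g) for l in open_labels + closed_labels):
        return
    # Line 02: remove open labels that the new label ≼p-dominates.
    open_labels[:] = [l for l in open_labels if not pareto_dom(g, l[2])]
    open_labels.append(label)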
4 APPROXIMATION ALGORITHMS
We consider now two ways of relaxing the exact version of the multiobjective search algorithm so as to get better efficiency, either by modification of R1 or of R2 (both modifications cannot be performed together without losing the performance guarantee).
4.1 An FPTAS for multiobjective search
In this subsection, we assume that a finite upper bound L on the lengths (numbers of arcs) of all solution-paths in P(n, Γ) is known. Under this assumption, we provide a Fully Polynomial Time Approximation Scheme (FPTAS) for computing an approximation of the Pareto set. For simplicity, we assume throughout this section that the input is a finite graph on |N| nodes. Several FPTASs to compute ≼ε-coverings in multiobjective shortest path problems (MSP) have been proposed in the literature; that is, algorithms that, given an encoding of the graph and an accuracy level ε > 0, yield a ≼ε-covering in time and space bounded by a polynomial in |N| and 1/ε. Hansen [5] and Warburton [14] have proposed methods combining rounding and scaling techniques (i.e., approximating data elements before the execution of an algorithm) with pseudopolynomial exact algorithms (i.e., algorithms that operate in time and space bounded by a polynomial in |N| and the largest data element), in order to keep the size of the auxiliary data computed during the execution polynomially bounded. These methods are particular to biobjective problems and acyclic graphs respectively. Another algorithm is due to Papadimitriou and Yannakakis [10]. It is less specific to MSP, and its interest resides mainly in its generality: it proceeds by computing one solution (if it exists) inside every box of the logarithmic grid of Section 2. The authors show that this can be performed polynomially for a problem A if there is a pseudopolynomial algorithm for the exact version of A (given an instance of A and an integer B, is there a feasible solution with cost exactly B?). Finally, Tsaggouris and Zaroliagis [13] have recently proposed an FPTAS based on a generalized Bellman-Ford algorithm. Except for [10], all the other approaches rely on dynamic programming. We now show how to obtain an FPTAS by applying trimming techniques to the multiobjective search algorithm. The idea is to keep the number of possible labels at each node polynomially bounded, by using a logarithmic grid. Nevertheless, it is not possible to work directly with ≼ϕ in place of ≼p within procedure UPDATE, because we might exceed the desired error threshold (1 + ε) due to error propagation, as shown in the following example.
Example 2. Consider the graph with nodes {s, n, n′, γ} and costs: v(s, n) = (2, 2), v(s, n′) = (1, 1.1), v(n′, n) = (0.9, 1), v(n, γ) = (1, 1), v(n′, γ) = (2.3, 1.8). We set ε = 0.1. We get two labels at node n, ℓ1 = [n, ⟨s, n⟩, (2, 2)] and ℓ2 = [n, ⟨s, n′, n⟩, (1.9, 2.1)]. Since ℓ1 ≼_ε ℓ2, assume that ℓ2 is discarded. We get two labels ℓ3 = [γ, ⟨s, n′, γ⟩, (3.3, 2.9)] and ℓ4 = [γ, ⟨s, n, γ⟩, (3, 3)] at γ. At this point ℓ4 might be discarded since ℓ3 ≼_ε ℓ4. In this case the unique returned solution-path would be ⟨s, n′, γ⟩ with cost (3.3, 2.9). However, it is clear that path ⟨s, n′, n, γ⟩ with cost (2.9, 3.1) is not ε-covered by (3.3, 2.9). Actually we have (3.3, 2.9) ≼_p 1.1·(3, 3) and (3, 3) ≼_p 1.1·(2.9, 3.1), but not (3.3, 2.9) ≼_p 1.1·(2.9, 3.1); we only have (3.3, 2.9) ≼_p 1.1²·(2.9, 3.1).

This example suggests a possible solution relying on the assumption that solution-paths contain at most L arcs: we might replace (1 + ε) by (1 + ε)^(1/L) so as to remain below (1 + ε) under propagation of errors. This idea is implemented in the following revised pruning rule.

Rule R1′: discard label ℓ when there exists ℓ′ ∈ OPEN ∪ CLOSED s.t. n_ℓ′ = n_ℓ and ψ(g_ℓ′) ≼_p ψ(g_ℓ), where ψ : Z₊^m → Z₊^m is a logarithmic scaling function on the objective space, defined as follows:
ψ(x) = (ψ1(x), ..., ψm(x)), with ψi(x) = ⌊log x_i / log (1 + ε)^(1/L)⌋

This leads to replacing procedure UPDATE by:

ψ-UPDATE(L1, L2, ℓ)
01 If ∀ℓ′ ∈ L1 ∪ L2, not(ψ(g_ℓ′) ≼_p ψ(g_ℓ))
02 then L1 ← L1 ∪ {ℓ}
03 Remove ψ-dominated labels from L1

With ψ-UPDATE the multiobjective search algorithm becomes polynomial in |N| and 1/ε, provided 1 ≤ M ≤ 2^p(|N|) where p denotes some polynomial. Indeed, the cost of every solution-path on the logarithmic scale is upper bounded by ⌊log M / log (1 + ε)^(1/L)⌋ ∈ O(L log M / ε). Hence, the global complexity of the algorithm becomes O(|N|² (L log M / ε)^(3m)). Since L ≤ |N| and log M ∈ O(p(|N|)), it is within O((p′(|N|)/ε)^(3m)) for some polynomial p′, and is therefore polynomial in 1/ε and |N|. Now, it remains to show that this version of the algorithm yields an ε-covering subset of the solution-paths. To this end we state the following propositions:

Proposition 2. For all i ∈ {1, ..., L}, ∀x, y, z ∈ X, the following monotonicity property holds: x ≼_p (1 + ε)^(i/L) y ⇒ (x + z) ≼_p (1 + ε)^(i/L) (y + z).

Note that this monotonicity property does not hold for the ψ-dominance relation induced by ψ(x) ≼_p ψ(y).

Proposition 3. Let P ∈ P(s, Γ). At any time before termination, if ∀ℓ ∈ SOL not(g_ℓ ≼_ε v(P)), then there exists ℓ ∈ OPEN and a solution-path P′ extending P_ℓ such that v(P′) ≼_ε v(P).

Proof. Consider a solution-path P = ⟨s, n1, ..., nk⟩ ∈ Γ. By contraposition, assuming that for all ℓ ∈ OPEN no solution-path P′ extending P_ℓ is such that v(P′) ≼_ε v(P), we show that there exists a label ℓ ∈ SOL for which g_ℓ ≼_ε v(P). For that purpose, we exhibit a finite sequence (ℓ_i) of closed labels generated during the search such that g_ℓi ≼_p (1 + ε)^(i/L) v(P_i) (1), where P_i = ⟨s, n1, ..., n_i⟩. We proceed as follows: for i = 0, we set ℓ_0 = [s, ⟨s⟩, 0] and we clearly have g_ℓ0 ≼_p (1 + ε)^(0/L) v(P_0). Inductively, assume now that labels ℓ_0, ..., ℓ_j have been generated and closed (j < k), such that Equation (1) holds for i = 0, ..., j. Let ℓ = [n_(j+1), ⟨P_ℓj, n_(j+1)⟩, g_ℓj + v(n_j, n_(j+1))] be the label of the path from s to n_(j+1) extending P_ℓj. This label has been generated since ℓ_j has been expanded and n_(j+1) ∈ S(n_j). There are two cases:

Case 1. If ℓ ∈ CLOSED, then we set ℓ_(j+1) = ℓ.

Case 2. If ℓ ∉ CLOSED, we cannot have ℓ ∈ OPEN since it would contradict the initial assumption. Indeed, consider solution-path P′ = ⟨P_ℓj, n_(j+1), ..., n_k⟩. We would have: v(P′) = g_ℓj + v(n_j, ..., n_k) ≼_p (1 + ε)^(j/L) v(P_j) + v(n_j, ..., n_k) ≼_p (1 + ε) v(P_j) + (1 + ε) v(n_j, ..., n_k) ≼_p (1 + ε) v(P). Hence, ℓ has been generated, but ℓ ∉ OPEN ∪ CLOSED. Therefore ℓ has been discarded using pruning rule R1′ or R2′:

Case 2.1. If ℓ is discarded by R1′, then there exists ℓ′ ∈ OPEN ∪ CLOSED such that n_ℓ′ = n_ℓ and ψ(g_ℓ′) ≼_p ψ(g_ℓ), which implies g_ℓ′ ≼_p (1 + ε)^(1/L) g_ℓ. We have g_ℓ′ ≼_p (1 + ε)^(1/L) g_ℓ = (1 + ε)^(1/L) (g_ℓj + v(n_j, n_(j+1))) ≼_p (1 + ε)^(1/L) ((1 + ε)^(j/L) v(P_j) + v(n_j, n_(j+1))) ≼_p (1 + ε)^((j+1)/L) v(P_(j+1)). Moreover, by the same reasoning as above with P′ = ⟨P_ℓ′, n_(j+2), ..., n_k⟩, we have v(P′) ≼_ε v(P), and therefore ℓ′ cannot be in OPEN. Hence, ℓ′ ∈ CLOSED and we set ℓ_(j+1) = ℓ′.

Case 2.2. If R2′ prunes ℓ, then the sequence (ℓ_i) is stopped. Whenever case 2.2 stops the sequence ℓ_0, ..., ℓ_j by discarding label ℓ, then for all f ∈ F(ℓ) there exists ℓ′ ∈ SOL s.t. g_ℓ′ ≼_p f (2). Moreover, there exists f ∈ F(ℓ) such that f ≼_ε v(P), as we now show. By admissibility of H, there exists h ∈ H(n_(j+1)) such that h ≼_p v(n_(j+1), ..., n_k). Then, there exists f = g_ℓ + h ∈ F(ℓ) such that f ≼_p g_ℓ + v(n_(j+1), ..., n_k) = g_ℓj + v(n_j, ..., n_k) ≼_p (1 + ε)^(j/L) v(P_j) + v(n_j, ..., n_k) ≼_p (1 + ε) v(P_j) + (1 + ε) v(n_j, ..., n_k) = (1 + ε) v(P). Hence f ≼_p (1 + ε) v(P) (3). From (2) and (3), we get g_ℓ′ ≼_ε v(P) (by transitivity of ≼_p). Whenever case 2.2 does not occur, the sequence continues until j = k. Once label ℓ_k has been expanded at n_k, solution-path P_k has been discovered and SOL includes a label ℓ such that g_ℓ ≼_ε v(P). In all cases, the existence of ℓ ∈ SOL with g_ℓ ≼_ε v(P) is proved. □

From this proposition, it follows that the algorithm cannot terminate as long as the solution-paths stored in SOL do not constitute an ε-covering of P(s, Γ): otherwise OPEN would be nonempty, which contradicts the termination of the algorithm. We can therefore conclude that the algorithm returns an ε-covering subset of solution-paths. Note that this technique is mainly of theoretical interest, since the complexity is quadratic in the number |N| of states, and |N| is usually exponential in the depth of the search. We therefore propose below a simpler technique that also guarantees the approximation and is more efficient in practice, despite a possibly higher worst-case complexity.
4.2
A near admissible version of MOA∗
We now present the MOA*_ε algorithm, which returns an ε-covering of solution-paths without requiring knowledge of an upper bound on the number of nodes that can be expanded. The basic features of the algorithm are essentially the same as in MOA*. The main difference lies in the following pruning rule, which uses ε-dominance:

Rule R2′: discard label ℓ when ∀f ∈ F(ℓ), ∃ℓ′ ∈ SOL s.t. g_ℓ′ ≼_ε f.

This rule allows an early elimination of uninteresting labels while keeping near admissibility of the algorithm, provided heuristic H is admissible. Indeed, if H is admissible, then for all f* ∈ F*(ℓ) there exists f ∈ F(ℓ) such that f = g_ℓ + h ≼_p g_ℓ + h* = f*. Hence g_ℓ′ ≼_ε f implies that g_ℓ′ ≼_ε f*. This pruning rule can be inserted in the multiobjective search algorithm by substituting line 06 with:

06 if ∃f0 ∈ F(ℓ0) s.t. not(f(ℓ) ≼_ε f0) ∀ℓ ∈ SOL

Although MOA*_ε does not provide complexity guarantees, it significantly outperforms the exact version.
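A minimal sketch of the ε-dominance test behind rule R2′, assuming cost vectors are plain tuples and SOL is a collection of solution costs (names are ours):

```python
from typing import Iterable, Sequence, Tuple

Cost = Tuple[float, ...]

def eps_dominates(g: Cost, f: Cost, eps: float) -> bool:
    """g <=_eps f iff g <=_P (1+eps)*f componentwise."""
    return all(gi <= (1.0 + eps) * fi for gi, fi in zip(g, f))

def r2_prunes(f_values: Iterable[Cost], sol_costs: Sequence[Cost],
              eps: float) -> bool:
    """Rule R2': discard a label when every f in F(l) is eps-dominated
    by the cost vector of some solution already stored in SOL."""
    return all(any(eps_dominates(g, f, eps) for g in sol_costs)
               for f in f_values)
```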
Remark 1. The following weaker relaxation of the pruning condition in R2′ can be used in the FPTAS:

06 if ∃f0 ∈ F(ℓ0) s.t. not(f(ℓ) ≼_p (1 + ε)^(k/L) f0) ∀ℓ ∈ SOL

where k is an upper bound on the length of the longest path from n0 to a goal. This is the case in the implemented version.
5
NUMERICAL EXPERIMENTS
To investigate the potential of approximation, we tested our algorithms on two multiobjective combinatorial problems.

Biobjective binary knapsack problem. Given a set {1, ..., n} of items j, each item having a weight w_j and a profit p_ij according to every objective i, one searches for a minimal ≼_p-covering of the combinations of items that can be put into a knapsack of capacity b (i.e., the total weight of the items cannot exceed b):

max Σ_(j=1)^n p_1j x_j,  max Σ_(j=1)^n p_2j x_j
subject to Σ_(j=1)^n w_j x_j ≤ b,  x_j ∈ {0, 1} ∀j ∈ {1, ..., n}

where x_j = 1 iff one chooses to put item j in the knapsack. The state space has been defined so that all solution-paths share the same length n. This makes it possible to apply the FPTAS with L = n. The heuristic evaluations used to order and prune the search derive from the upper bound of Martello and Toth [9] for the single-objective version. The MOA*, FPTAS and MOA*_ε algorithms have been implemented in JAVA and were run on a Pentium 4 3.60GHz PC. Table 1 shows the computation times (in sec) obtained on 35 random instances of size n, with profits and weights randomly drawn in [1, 100], and a capacity b set to 50% of the total weight of the items. These results show that the relaxation of the optimality condition significantly speeds up the search, with faster results when using MOA*_ε. We have also studied the behavior of MOA*_ε when setting, for all j, p_1j = 2j, p_2j = 2n − 2j and w_j = 1, which yields instances where all combinations of b items are non-dominated, with distinct profits on the first objective. In the first line of Table 2, we indicate the execution times of MOA*, and in the second line the number #sol of non-dominated solutions (which grows exponentially with n). Both approximation algorithms return an ε-covering in less than one second for all ε in {0.005, 0.01, 0.05}. For each value of ε we give the size of the returned ε-covering. It shows that the choice of ε allows the size of the output set to be controlled, as well as the computation times.

n              30      40      50      60      70      80
MOA* time      0.397   1.879   11.31   43.66   215.2   457.7
FPTAS
  ε = 0.005    0.353   1.514   7.922   29.90   127.8   226.5
  ε = 0.01     0.297   1.077   4.842   18.29   65.91   97.26
  ε = 0.05     0.046   0.036   0.065   0.331   0.555   0.393
  ε = 0.1      0.003   0.001   0.001   0.002   0.001   0.001
MOA*_ε
  ε = 0.005    0.315   0.940   4.225   18.58   62.37   110.4
  ε = 0.01     0.179   0.364   1.389   9.294   19.75   35.11
  ε = 0.05     0.008   0.007   0.013   0.064   0.065   0.075
  ε = 0.1      0.001   0.001   0.001   0.001   0.001   0.001

Table 1. Numerical results on the biobjective knapsack.

n              15      16      17      18      19      20
MOA* time      0.495   3.328   1.895   44.08   17.55   711.9
#sol           6·10³   1·10⁴   2·10⁴   5·10⁴   9·10⁴   2·10⁵
ε = 0.005      31      28      28      25      25      22
ε = 0.01       16      14      15      13      13      11
ε = 0.05       3       3       4       3       3       3

Table 2. Pareto approximation on pathological instances.
Multiobjective shortest path problem. In order to study the interest of approximation when the number of objectives grows, we performed experiments with MOA*_ε on the multiobjective shortest path problem. We generated different classes of instances by controlling the number of nodes |N| = 1000, 2000, 3000 and the number of objectives m = 2, 5, 10. Costs of arcs are randomly generated within [1, 100]. The approximations have been computed with ε = 0.1. Table 3 gives, for each class of instances, the average execution time (in sec) obtained on 20 different instances. These performances illustrate that approximation remains powerful when the number of objectives grows. As a comparison, the exact determination (with MOA*) of the Pareto set on instances with 1000 nodes and 10 criteria required more than one hour on the same computer.
|N|       1000    2000    3000
2 obj     0.078   0.295   0.751
5 obj     0.175   0.761   1.901
10 obj    0.447   2.474   7.268

Table 3. Times for MOA*_0.1 on the shortest path problem.

6
CONCLUSION
We have proposed two approximation algorithms for multiobjective search. The first one is an FPTAS, which requires that an upper bound on the length of solution-paths is known, while the second one provides no guarantee on the worst-case complexity but performs better in practice, without requiring any information on the length of solution-paths. Both algorithms outperform exact multiobjective search in computation time. Note that the approximate Pareto set can include dominated solutions (although close to optimality). An interesting research direction is therefore to look for algorithms able to compute approximate Pareto sets including only non-dominated solutions. Another possible extension of this work is to study the use of ε-dominance to approximate more involved preference models.
REFERENCES
[1] E. Angel, E. Bampis, and A. Kononov, 'On the approximate tradeoff for bicriteria batching and parallel machine scheduling problems', Theor. Comput. Sci., 306(1-3), 319–338, (2003).
[2] T. Erlebach, H. Kellerer, and U. Pferschy, 'Approximating multiobjective knapsack problems', Manag. Science, 48(12), 1603–1612, (2002).
[3] K. Fujimura, 'Path planning with multiple objectives', IEEE Robotics and Automation Magazine, 3(1), 33–38, (1996).
[4] M. Ghallab, 'Aε: an efficient near admissible heuristic search algorithm', in Proc. of the 8th IJCAI, pp. 789–791, (1983).
[5] P. Hansen, 'Bicriterion path problems', in Multicriteria Decision Making, eds., G. Fandel and T. Gal, (1980).
[6] P.E. Hart, N.J. Nilsson, and B. Raphael, 'A formal basis for the heuristic determination of minimum cost paths', IEEE Trans. Syst. and Cyb., SSC-4(2), 100–107, (1968).
[7] M. Laumanns, L. Thiele, K. Deb, and E. Zitzler, 'Combining convergence and diversity in evolutionary multiobjective optimization', Evolutionary Computation, 10(3), 263–282, (2002).
[8] L. Mandow and J.-L. Pérez-de-la-Cruz, 'A new approach to multiobjective A* search', in Proc. of the 19th IJCAI, pp. 218–223, (2005).
[9] S. Martello and P. Toth, 'An upper bound for the zero-one knapsack problem and a branch and bound algorithm', European J. of Operational Research, 1, 169–175, (1975).
[10] C.H. Papadimitriou and M. Yannakakis, 'On the approximability of trade-offs and optimal access of web sources', in Proc. of the 41st IEEE Symp. on FOCS, pp. 86–92, (2000).
[11] J. Pearl and J.H. Kim, 'Studies in semi-admissible heuristics', IEEE Trans. on PAMI, 4(4), 392–400, (1982).
[12] B.S. Stewart and C.C. White III, 'Multiobjective A*', J. of the Association for Computing Machinery, 38(4), 775–814, (1991).
[13] G. Tsaggouris and C. Zaroliagis, 'Multiobjective optimization: Improved FPTAS for shortest paths and non-linear objectives with applications', in Proc. of the 17th ISAAC, pp. 389–398, (2006).
[14] A. Warburton, 'Approximation of Pareto optima in multiple-objective shortest-path problems', Operations Research, 35(1), 70–79, (1987).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-495
495
Compressing Pattern Databases with Learning

Mehdi Samadi¹ and Maryam Siabani² and Ariel Felner³ and Robert Holte¹

Abstract. A pattern database (PDB) is a heuristic function implemented as a lookup table. It stores the lengths of optimal solutions for instances of subproblems. Most previous PDBs had a distinct entry in the table for each subproblem instance. In this paper we apply learning techniques to compress PDBs using neural networks and decision trees, thereby reducing the amount of memory needed. Experiments on the sliding tile puzzles and the TopSpin puzzle show that our compressed PDBs significantly outperform both uncompressed PDBs and previous compression methods. Our full compression system reduced the amount of memory needed by a factor of up to 63 at a cost of no more than a factor of 2 in the search effort.
1 Introduction and Overview

States in a search space are often represented using a set of state variables. An abstraction of the search space, called the pattern space, can be defined by considering only a subset of the state variables (called the pattern variables). A pattern is a state of the pattern space, i.e., an assignment of values to the pattern variables. A state s in the original space is mapped to a pattern by ignoring the state variables in s that are not pattern variables. A pattern database (PDB) stores the distance of each pattern to the goal pattern. The value stored in the PDB for the pattern of a state s is a lower bound on the distance from s to the goal state, and thus serves as an admissible heuristic for searching in the original search space. A PDB contains one entry for each pattern in pattern space. In general, the more entries a PDB contains, the more accurate it is as a heuristic, and the more efficient the search that uses the PDB as a heuristic. The drawback of large PDBs is the amount of memory they consume. One approach to mitigating this problem is to compress the PDB. For example, Felner et al. [3] compress a PDB by simply merging several highly correlated (usually adjacent) entries into one. They achieved a significant improvement on the 4-peg Towers of Hanoi and the TopSpin problems, but only limited success for the sliding tile puzzles. The main drawback of that work is that the rule for deciding which PDB entries to merge was fixed throughout the entire compression process, and higher degrees of compression significantly degrade performance. We introduce a new, general and flexible compression method for PDBs that is experimentally shown to improve on uncompressed PDBs as well as on the compression methods reported in [3]. Improvement takes the form of either reducing the amount of memory required for the PDB without substantially increasing the number of generated nodes, or reducing both the memory required and the number of generated nodes.
¹ Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E8, {msamadi,holte}@cs.ualberta.ca
² Electrical and Computer Engineering Department, Isfahan University of Technology, Isfahan, Iran, siabani@ec.iut.ac.ir
³ Information Systems Engineering Dept., Deutsche Telekom Labs, Ben Gurion University, Beer-Sheva, Israel, felner@bgu.ac.il
The main idea underlying our work is to use techniques from the machine learning literature to compress PDBs. In particular, we train an artificial neural network (ANN) so that it can be used instead of the PDB. The neural network requires almost no memory. However, since the ANN's output is not guaranteed to be less than or equal to the PDB value (i.e., admissible), we use additional storage (in the form of a hash table) for all the patterns whose value is overestimated by the ANN. This basic idea is then improved in two steps: decision trees and a PDB-partitioning method are used to separate the PDB entries into smaller subgroups with similar characteristics, and separate ANNs are then trained for each subgroup. We tested our compression system on three search spaces: the 15-puzzle, the 24-puzzle and TopSpin. Our results show that our full compression system requires up to 63 times less memory than the original PDB while increasing the number of nodes generated by no more than a factor of two. The modest increase in search effort is not a concern because the freed-up memory can be used in ways that are known to substantially speed up search, e.g., for additional PDBs [6], and/or for memory-based search algorithms such as A*, perimeter search or memory-enhanced IDA*. We do not actually implement any of these techniques in this paper, but are confident that they would more than compensate for the small increase in search effort caused by our compression technique.
2 Related Work

Symbolic PDBs [1] use binary decision diagrams (BDDs) to store a PDB, and have been shown, for some search spaces, to significantly reduce the memory needed to store the PDB entries compared to traditional PDB tables. However, a recent unpublished study of symbolic PDBs on a wide range of search spaces has shown that symbolic PDBs do not always result in compression; they sometimes require more memory than a table. In particular, symbolic PDBs for the 15-puzzle require more memory than the traditional PDB representation, whereas the experiments below show that our method greatly reduces the memory required. The idea of using learning and classification techniques in heuristic search has been suggested before. In [9], a feature vector was used to partition the state space into a number of classes, and learning techniques were applied to each class. In [10] the state space was partitioned based on a feature vector, and "generalized heuristic information" was then learned for each class. These ideas were only applied to small domains and, in contrast to our approach, did not find the optimal solution. Recently, a multi-layer ANN was used to represent heuristics for the 15-puzzle [2]. Given a training set of state descriptions together with their optimal solutions, a learning system that predicts the length of the optimal solution for an arbitrary state was built. They biased their predicted values towards admissibility but, unlike our approach, their system returned suboptimal solutions in about 50% of the cases.
Figure 1. The TopSpin and sliding tile puzzles.
3 Search domains

The sliding tile puzzles, such as the 15- and 24-puzzles (shown in Figure 1), have been used as benchmark problems in many previous papers. For clarity, we describe all our methods in the context of the sliding tile puzzle, but our ideas are general and can be applied to other problems as well. In our representation of the sliding tile puzzle, the variables are the tiles and the values are their locations. The best existing method for solving the sliding tile puzzles optimally uses additive PDBs [4]. The tiles are partitioned into disjoint sets, and a PDB is built for each set. The PDB stores the cost of moving the tiles in the pattern set from any given arrangement to their goal positions, counting only the moves of the pattern tiles. Under such circumstances the sum of the values from different disjoint PDBs is an admissible heuristic [4]. We use the notation x-y-z to denote a partitioning of the tiles into three disjoint groups with x, y and z tiles in each group, respectively. The N-TopSpin puzzle has N tokens arranged in a ring. Any set of 4 consecutive tokens can be reversed (rotated 180 degrees in the physical puzzle). Our encoding of this puzzle has N operators, one for each possible reversal. In TopSpin more than one object is moved in each move, so simple additive PDBs are not applicable here. The standard way to build a PDB for this domain is to specify a set of pattern tokens and to treat the remaining tokens as if they were indistinguishable from one another⁴.
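To make the encoding concrete, here is a minimal Python sketch of the TopSpin successor function (the function name is ours; a sketch under our assumptions, not the paper's implementation):

```python
def topspin_successors(state, k=4):
    """All successors of an N-TopSpin state: one operator per position,
    reversing a window of k consecutive tokens (the ring wraps around)."""
    n = len(state)
    succs = []
    for i in range(n):
        idx = [(i + j) % n for j in range(k)]   # window positions, cyclic
        s = list(state)
        vals = [state[j] for j in idx]
        for j, v in zip(idx, reversed(vals)):   # write the window reversed
            s[j] = v
        succs.append(tuple(s))
    return succs

# Example: reversing the first window of (1,2,3,4,5) yields (4,3,2,1,5).
assert (4, 3, 2, 1, 5) in topspin_successors((1, 2, 3, 4, 5))
```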
4 Augmented compression

We introduce a method that compresses PDBs using learning techniques while preserving the admissibility property. Our system includes three independent steps, ANN learning, decision tree classification, and pattern partitioning, and we describe each of them in turn.
4.1 Compression with ANN

Our first idea is to build an ANN that learns the PDB. Assume a PDB for the tile puzzle built over a set of tiles T. The different patterns are the different ways to place the tiles of T in the state space. Each pattern p has an entry PDB(p) which stores its heuristic value. We want to build a learning system that is able to predict PDB(p) for each given pattern p. For this we use Artificial Neural Networks (ANNs) [8], a well-known learning technique. A Multi-Layer Perceptron (MLP) network with the standard modified back-propagation algorithm [8] is used for the prediction. This system is called the basic ANN compression in this paper.
4.1.1 Feature selection

We use two types of features for the ANN:
1) Description of the pattern: each tile in T is a feature, and its position in pattern p is the value for that feature.

⁴ Since this puzzle is cyclic, we can assume that token number 1 is always in a fixed position. Thus, for implementation, the total number of states can be reduced by a factor of N.
2) Heuristic vector: we also construct K smaller PDBs, each for a subset of tiles T_i ⊂ T. We denote the corresponding PDB heuristic for a given pattern p by h_i(p). Note that each h_i(p) is admissible for p. We define the heuristic vector for pattern p as H(p) = ⟨h_1(p), h_2(p), ..., h_K(p)⟩. Each member h_i of the heuristic vector is also used as a feature, and h_i(p) is the value of that feature. For example, for a 6-tile PDB we used two different 2-4 partitionings, for a total of 4 smaller PDBs that are used in the heuristic vector. The heuristic value of p in the original PDB is the target function.
4.1.2 Training and using the ANN

The ANN is trained by iterating over all the entries of the original PDB. For each pattern p we construct its different features and feed them to the ANN together with PDB(p) (the PDB heuristic for p). Once the training process ends, we can delete the original PDB from memory. Only the smaller PDBs which make up the heuristic vector are left in memory; similarly, the ANN itself is also kept in memory. Then, during the search, given a pattern p, we calculate its features (e.g., by looking up the smaller PDBs). The values of these features are given as input to the ANN, and the output, denoted ANN(p), is used as the heuristic value for pattern p.
4.1.3 Correcting overestimations

Training an ANN to predict the exact desired value is NP-complete [7]. Thus, learning systems in general, and ANNs in particular, are not completely accurate by nature, as they can deviate from the real value on many of the instances. With heuristics, downward deviations are not a problem, as an admissible heuristic should be a lower bound. However, if the ANN overestimates, the heuristic is no longer admissible and non-optimal solutions might be returned. We solved this problem as follows. After the ANN was built, we iterated again over the entire set of patterns (as a test set). Each pattern p whose ANN(p) > PDB(p) is inserted into a hash table HT together with its correct heuristic value PDB(p). During the search, we first check whether p ∈ HT. If indeed p ∈ HT, we use its heuristic value stored in HT and do not even consult the ANN. As shown below, for well trained ANNs the set of overestimating patterns is small, and so are the memory requirements of HT. Traditionally, the training phase is stopped when the mean squared error (MSE) of the training data is below a predefined small threshold. For our case it is defined as MSE = Σ_(t∈TR) E(t)² / |TR|, where TR is the training set, E(t) = ANN(t) − PDB(t), PDB(t) is the original PDB value, and ANN(t) is the learned function. E(t)² is symmetric, so overestimation and underestimation have the same cost. Using this function with an ANN results in a heuristic that tries to be close to the optimal value without discriminating between being under (acceptable) or over (undesirable). We modified the error function to penalize positive values of E(t) (overestimation), biasing the ANN towards producing admissible values. We used E′(t) = (a + 1/(1 + exp(−b·E(t)))) · E(t) instead of E(t) in the MSE calculations. The constants a and b were determined experimentally. E′(t) reduces the number of overestimating instances by a factor of 4 (over E(t)) and was used in our experiments.
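As an illustration, here is a minimal NumPy sketch of this asymmetric error function (the values of a and b below are placeholders, not the experimentally tuned constants of the paper):

```python
import numpy as np

def penalized_error(ann_value: np.ndarray, pdb_value: np.ndarray,
                    a: float = 0.5, b: float = 5.0) -> np.ndarray:
    """E'(t) = (a + sigmoid(b * E(t))) * E(t), with E(t) = ANN(t) - PDB(t).
    The sigmoid factor grows when E(t) > 0, so overestimation
    (inadmissible predictions) is penalized more than underestimation."""
    e = ann_value - pdb_value
    return (a + 1.0 / (1.0 + np.exp(-b * e))) * e

def penalized_mse(ann_value, pdb_value, a=0.5, b=5.0) -> float:
    """Mean squared error computed on the penalized residuals E'(t)."""
    return float(np.mean(penalized_error(ann_value, pdb_value, a, b) ** 2))
```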
4.1.4 Experimental results

In this section, we evaluate our compression system on the 6-6-3 additive PDB (of tiles 4-9, 10-15 and 1-3) for the 15-puzzle; additional evaluation of the final system is given in Section 5. The compression technique is applied to the two 6-tile PDBs individually;
Heuristic           AvH     Nodes        Time    Mem     Hash
6-6-3               40.06   6,323,187    2.39    11.00   -
(4-2)²-(4-2)²-3     37.84   50,818,284   19.71   0.18    -
DIV 2               38.88   19,204,184   7.92    5.50    -
basic ANN           39.20   11,676,726   10.44   1.30    8%
ANN+DT              39.75   9,550,754    5.42    0.84    4%
ADP                 39.90   7,285,207    4.62    0.50    2%

Table 1. Results for the 6-6-3 PDBs of the 15-puzzle.
the 3-tile PDB is very small and is left uncompressed. The heuristic vector for each 6-tile PDB contains four values, which are created by using two sets of additive 4- and 2-tile PDBs. Table 1 shows the results. All values shown are averages over the first 100 random initial states used in [4]. The first column is the heuristic used. The next four columns present the average initial heuristic value, the number of nodes generated by IDA*, the average time (in seconds), and the amount of memory used (in Megabytes). The time needed to precompute the PDB and train the ANN is not included in the times reported. This is standard, since these operations are done just once, no matter how many problems are solved. The final column shows the percentage of entries in the original PDB that were stored in the hash tables because the ANNs overestimated their value. The first row presents the results of using the normal 6-6-3 PDB. The second row shows the results of directly using the PDBs that make up the heuristic vector inside our ANN system. The superscript 2 in the heuristic description indicates the use of two sets of 4-2 additive PDBs for each 6-tile PDB in the 6-6-3. The maximum value of the two sets is used instead of the 6-tile PDB value. The third row shows the results of using the DIV 2 method of [3] for compressing the 6-tile PDBs. In this method, adjacent PDB entries are replaced by a single entry. The fourth row (basic ANN) is for the ANN system just described. The total memory for this system is dominated by the memory needed for the hash table; the memory needed for the small PDBs used in the heuristic vector is small (reported in row 2), and the memory needed for the ANN itself is negligible. The last two rows are for the enhanced ANN systems described below. Again, the total memory they need is mostly taken by the hash tables. The direct use of the smaller PDBs that make up the heuristic vector of our ANN (row 2) dramatically reduces the memory but increases the number of generated nodes by an order of magnitude. By contrast, our basic ANN technique reduces the amount of memory by a factor of 9 while increasing the number of generated nodes by a factor of only 1.84. This is a significant improvement over the 2-fold memory reduction of the DIV 2 compression technique [3], which was achieved at the cost of 3 times more generated nodes⁵. In all the results of this paper the constant CPU time per node favors simple PDB construction. While we implemented all our learning techniques efficiently, they could probably be made more efficient and better optimized. We decided to also report the CPU time, but it should be taken with care.
4.2 Using a decision tree to classify data

A major problem of using an ANN for predicting PDB values is the size of the hash table used to store the patterns with overestimating ANN values. To address this, we first construct a decision tree

⁵ In [3] a sparse (multi-dimensional array) mapping was used for the PDBs, and thus the DIV method compressed cliques. Here, we used their more realistic compact mapping (a single-dimensional array). The DIV method does not compress cliques here, and its performance is worse than DIV for sparse mapping. See [3] for more details.
(DT) which classifies the patterns into two types. The ANN is only used for one type, while the other type consults smaller PDBs. As described above, each PDB is partitioned into smaller disjoint PDBs. For example, a 6-tile PDB h6 is partitioned into two disjoint PDBs h2 and h4. We want to classify the 6-tile patterns into two classes: equal and larger. A pattern p is classified as equal if h6(p) = h2(p) + h4(p). It is classified as larger if h6(p) > h2(p) + h4(p). Patterns in the equal class need only consult the smaller 2- and 4-tile PDBs and add their values. For patterns in the larger class, h6 has knowledge about additional moves (over the sum of the two smaller PDBs) that are needed. Thus, the ANN is built to learn these additional moves. The benefit of using the DT before the ANN is twofold. First, it is sufficient to train and use the ANN for patterns in the larger group only. Thus, the ANN can be made more accurate, as it needs to learn the behavior of a special class of patterns only: the ones whose PDB values are larger than the sum of the smaller PDBs. Second, for the equal group, there is no need to pass through the complex network of the ANN; consulting the smaller PDBs is enough. Note that descending a decision tree is rather cheap, as it is usually implemented as a series of nested if-then-else statements. Adding the DT proved useful. For example, for the 6-6-3 PDBs of the 15-puzzle, nearly 58% of the patterns were classified as equal⁶ and only 42% are larger patterns that trained the ANN.
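A minimal sketch of this labeling step, assuming the PDBs are exposed as lookup functions (the names h6, h2 and h4 are ours):

```python
from typing import Callable, Dict, Iterable, Tuple

Pattern = Tuple[int, ...]

def label_patterns(patterns: Iterable[Pattern],
                   h6: Callable[[Pattern], int],
                   h2: Callable[[Pattern], int],
                   h4: Callable[[Pattern], int]) -> Dict[Pattern, str]:
    """Split patterns into the 'equal' and 'larger' classes used to
    train the decision tree: 'equal' when the 6-tile PDB value equals
    the sum of the disjoint 2- and 4-tile PDB values."""
    labels = {}
    for p in patterns:
        labels[p] = "equal" if h6(p) == h2(p) + h4(p) else "larger"
    return labels
```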
4.2.1 Building the Decision Tree

A decision tree is built by examining various attributes of the training data. The entire set of features used by the ANN (described above) was used as attributes for the DT, and the entire set of patterns was used to train and build the DT. We used ID3 [8], a common algorithm for building DTs. Classic ID3 stops growing the DT when each leaf contains items that belong to one class only. Since we had a very large set of patterns, we stopped growing the tree as soon as the percentage of patterns of one of the groups (larger or equal) in the given tree node exceeded a predefined threshold t1 (classic ID3 uses t1 = 100%). The exact value of t1 was determined experimentally for the various domains. Similarly, once the number of patterns in a node was smaller than another threshold t2, we stopped growing the DT. In nodes with mixed patterns we used the majority function to determine the class of the node.
4.2.2 Misclassification of the decision tree

Because of the early stopping condition, some of the patterns can be misclassified by the decision tree. There is no problem if a pattern of the larger group is misclassified as equal: in this case, we use the sum of the smaller PDBs, which is admissible but might be smaller than the real value of the larger PDB. The other direction is more problematic: here equal patterns are misclassified as larger, which causes the ANN to have such patterns in its training set. But recall that, to preserve admissibility, all patterns with overestimated ANN values are stored in a hash table, so admissibility is kept.
4.2.3 Experimental results

Line 5 in Table 1 shows the results of using the ANN+DT to compress the 6-6-3 additive PDB of the 15-puzzle. It shows that augmenting the basic ANN with the DT technique reduces the number

⁶ In fact, as described earlier, we had two sets of smaller PDBs. We classified a pattern as equal if its heuristic was equal to the maximum of the sums of the two sets of smaller disjoint partitionings.
of nodes generated by roughly 20% (from 11,676,726 to 9,550,754) and reduces the memory requirements by 35% (from 1.3 to 0.84 Megabytes). The ANN now only handles the larger patterns. Not only does it have fewer patterns to classify, but these patterns have similar attributes. This allows it to be more accurate for the same amount of training. Consequently, the hash tables can be smaller because fewer patterns have their values overestimated by the ANN. Indeed, the hash table percentage dropped from 8% to 4%.
4.3 Partitioning the patterns into groups (PART)

To properly train the ANN to have a reasonable error range, it is necessary to feed it the entire set of training instances at least 500 times. This can increase the total training time, especially if very large PDBs are used whose data is stored on disk. To address this, we add another step before building the DT. The basic idea is to partition the patterns into smaller groups (for very large PDBs this can be done on disk) and then load each group into memory and build a separate DT+ANN system for it. In order to classify these groups we use smaller heuristics (e.g., members of the heuristic vector), which we call the pivot heuristics. We then classify the patterns according to the values of the pivot heuristics. For example, assume that two members of the heuristic vector, h1 and h2, are used. A pattern p with h1(p) = x and h2(p) = y will belong to the group labeled (x, y). Each such group contains patterns with similar attributes, as they have similar values for the pivot heuristics. Each group has a separate DT+ANN, and the prediction will be more accurate due to the similarity of the patterns inside each group. Another advantage is that a very large PDB, which cannot be stored in memory, can be partitioned into smaller groups which fit in memory; we then build a DT+ANN for each group. Our full system of ANN+DT+Partitioning is referred to as ADP in the remainder of this paper. Line 6 in Table 1 shows the results of using the full ADP system to compress the 6-6-3 PDBs. We used exactly the same heuristic vector as in the previous lines. Partitioning is done based on two heuristic values, each the sum of 2- and 4-tile PDBs. Augmenting the ANN+DT system with the partitioning technique reduces the number of nodes generated by roughly 25% (from 9,550,754 to 7,285,207) and reduces the memory requirements by 40% (from 0.84 to 0.5 Megabytes). Compared to the original 6-6-3 PDB (line 1 in Table 1), ADP compression reduces the memory required by over 95%, while increasing the number of nodes generated by only 15%. It also significantly outperforms the DIV 2 compression method of [3] in all aspects: nodes, time and memory.
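A small sketch of this grouping step, assuming two pivot heuristics given as functions (the names are ours):

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pattern = Tuple[int, ...]

def partition_by_pivots(patterns: Iterable[Pattern],
                        h1: Callable[[Pattern], int],
                        h2: Callable[[Pattern], int]
                        ) -> Dict[Tuple[int, int], List[Pattern]]:
    """Group patterns by the values of the two pivot heuristics;
    a separate DT+ANN is then trained for each group."""
    groups: Dict[Tuple[int, int], List[Pattern]] = defaultdict(list)
    for p in patterns:
        groups[(h1(p), h2(p))].append(p)
    return groups
```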
4.4 The general framework for ADP

To summarize, the following preprocessing steps should be taken to build the full three-step ADP learning system:
• Create the original PDB.
• Create small PDBs for the heuristic vector and choose the pivot heuristics.
• Partition the patterns of the original PDB into small groups according to the values of the pivot heuristics.
• Create a DT for each group of the partition, classifying patterns as equal or larger.
• Train an ANN on the patterns that were classified as larger.
• Test the ANN and build the hash table for overestimating patterns.
during the search we do
• Extract the values of the heuristic vector for s and find the appropriate group according to the pivot heuristics.
• Traverse the relevant DT and determine whether s reaches a larger or an equal node.
• If it is an equal node, add up the smaller PDB heuristics. If it is a larger node, consult the relevant hash table and the relevant ANN and retrieve the heuristic value.
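A condensed sketch of this lookup, under the assumption that the trained components are available as Python objects (all names are ours; the DT is reduced to a predicate for brevity):

```python
def adp_heuristic(s, group_of, dt_is_larger, ann_of, hash_table, small_pdb_sum):
    """ADP lookup at search time (simplified sketch): the hash table is
    consulted first to preserve admissibility; the group's decision tree
    then routes the pattern either to the sum of the small PDBs ('equal')
    or to the group's trained ANN ('larger')."""
    if s in hash_table:                 # ANN overestimates this pattern
        return hash_table[s]
    g = group_of(s)                     # group given by the pivot heuristics
    if dt_is_larger[g](s):              # 'larger' node
        return ann_of[g](s)
    return small_pdb_sum(s)             # 'equal' node: small PDBs suffice
```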
5 Experimental results

We now present additional experimental results for the full ADP system on the 15- and 24-puzzles and the TopSpin puzzle.
5.1 15-Puzzle

ADP was used to compress the 7- and 8-tile PDBs of the 7-8 additive PDB for the 15-puzzle (used in [4]). The two PDBs were compressed individually. The heuristic vector for each consisted of four values, based on two 6-2 additive PDBs for the 8-tile PDB and on two 6-1 additive PDBs for the 7-tile PDB. These heuristics are also used as the pivot heuristics.

Heuristic           AvH     Nodes       Time   Mem   Hash
7-8                 44.08   157,553     0.07   549   0
7-6-2               41.70   1,486,038   0.54   61    0
DIV 2               42.43   950,473     0.33   274   0
ADP (6-1)²-(6-2)²   43.03   307,332     0.21   46    2.9%
ADP (4-3)²-(4-4)²   41.96   899,516     0.57   16    2.2%

Table 2. ADP compression of the 7-8 additive PDB.
Table 2 presents the results in the same format as Table 1. The first row presents the results when using the normal, uncompressed additive 7-8 PDB. The second row is for an uncompressed 7-6-2 additive PDB. The next row is for the DIV 2 compression of [3]. The next row is for ADP using heuristic vectors containing two 6-1 additive PDBs for the 7-tile PDB and two 6-2 additive PDBs for the 8-tile PDB. The final row is for ADP using heuristic vectors containing two 4-3 additive PDBs for the 7-tile PDB and two 4-4 additive PDBs for the 8-tile PDB. The last two lines show that varying the PDBs used in ADP's heuristic vector produces an interesting time-space tradeoff. However, both of these systems use less memory than the uncompressed 7-6-2 additive PDB and the DIV 2 compression method, and generate significantly fewer nodes. The ADP with (6-1)²-(6-2)² was even faster in CPU time. Compared to the state-of-the-art uncompressed 7-8 PDB, this ADP reduces the memory required by over 90%, at the cost of less than doubling the number of nodes generated.
Figure 2. Nodes generated (in millions, log scale) as a function of memory (in Megabytes) for regular PDBs and PDBs compressed with ANN+DT+PART.
Figure 2 brings together the data for uncompressed PDBs (solid line) and PDBs compressed using ADP (dashed line) from Tables 1 and 2, in order to compare the number of generated nodes as a function of the memory used. It also includes two data points, for 7-7-1 additive PDBs, not shown in those tables. This figure clearly
shows that for any given amount of memory it is far better to use a compressed PDB than a regular uncompressed PDB.
5.2 24-puzzle

The best existing heuristic for the 24-puzzle uses a 6-6-6-6 additive PDB and takes the maximum of the normal PDB lookup (r), its reflection about the main diagonal (r*), the dual lookup (d), and the reflection of the dual (d*) [5]. All values for the regular lookup can be extracted from two 6-tile PDBs. For the dual lookup, we need six additional PDBs [5]. ADP is applied to all these 6-tile PDBs. As in the 15-puzzle, the heuristic vector for each 6-tile PDB contains two additive 4-2 PDBs, which were also used for the partitioning step.

PDB Lookups        Nodes            Time    Mem   Hash
r,r*               43,454,810,045   15,861  244   0
r,r*,d,d*          13,549,943,868   8,441   972   0
r,r* (ADP)         69,527,696,072   31,843  4     1.6%
r,r*,d,d* (ADP)    19,781,408,283   15,971  37    1.9%

Table 3. Results for the 24-puzzle.
Table 3 shows the experimental results. The values are averages over the first 25 random instances used in [5]. Lines 1-2 are for the uncompressed PDBs, lines 3-4 for the compressed PDBs. The first line in each group shows the results when only the regular lookup and its reflection are performed. The second line in each group shows the results when the dual lookup and its reflection are performed in addition to the regular lookups. With two lookups, ADP decreased the size of the PDB by a factor of 63 while increasing the number of nodes generated by only a factor of 1.6. With four lookups, ADP decreased the size of the PDB by a factor of 27 while increasing the number of generated nodes by only a factor of 1.45.
5.3 Top-spin We also applied the ADP system on the -TopSpin puzzle. A PDB of tokens has actually N different ways of being used. A PDB of tokens [1 ... ] can also be used as a PDB of [2... +1], [3... +2], etc. with the appropriate mapping of tokens. Thus, a single PDB allows up to different lookups. In separate experiments we applied ADP to a 9-token PDB and a 10-token PDB for the 17-TopSpin. The heuristic vector in each case contained 3 values corresponding to 3 different lookups in a PDB based on 7 tokens, for the 9-token PDB, and based on 8 tokens for the 10-token PDB. The partitioning of the 9- and 10-token PDBs useed all the PDBs from their heuristic vectors. PDB
AvH
9 8 9 MOD 9 ADP
10.61 9.58 9.30 9.97
9 8 9 MOD 9 ADP 10 10 ADP
10.96 10.01 9.68 10.20 11.94 11.32
Nodes Time 1 Lookup 43,496,120 74.18 394,922,925 589.10 61,709,097 104.38 48,335,470 97.44 2 Lookups 664,966 1.62 5,777,064 11.29 6,489,343 14.71 1,475,642 4.29 84,772 0.21 194,252 0.92
Table 4.
Mem
Hash
494 54 54 48
0 0 0 2.6%
494 54 54 48 3,959 484
0 0 0 2.6% 0 2.4%
Results for (17,4)-TopSpin.
The experimental results are shown in Table 4, where each value is an average over a set of 100 random instances. Lines 1-4 show
the results of solving 17-TopSpin if just one lookup is made in the PDB, while rows 5-8 show the results if two lookups are made. The first two lines in each group show the results of using an uncompressed 9-token or 8-token PDB. The third line shows the results of the best compression technique used in [3] for 17-TopSpin, which compresses the table for the 9-token PDB using the MOD operator. The final row in each group is for our ADP compression technique. For both one and two lookups, ADP clearly generates fewer nodes than the other techniques with a similar amount of memory (the 8-token PDB and the 9-token PDB compressed by the MOD operator). With two lookups it was even faster in CPU time. In fact, when two lookups are made, the MOD method actually generates more nodes than an uncompressed PDB of the same size, the 8-token PDB. The last two rows show the results of using the regular and compressed 10-token PDB with two lookups. ADP reduces the memory required by 87% while increasing the number of generated nodes by a factor of 2.3. The compressed version of the 10-token PDB requires slightly less memory than the uncompressed 9-token PDB but generates only 30% of the nodes and takes 56% of the time.
6 Summary and Conclusions

We presented a new technique that better utilizes memory by compressing PDBs with learning techniques, and we applied it to different domains. A three-step mechanism to construct the system was introduced, but any subset of the steps can be used separately. A significant reduction in memory was achieved over the uncompressed PDB at the cost of a small increase in search effort. Furthermore, our compression idea usually outperforms previous compression techniques in both memory and number of nodes, and often in CPU time as well. For a given amount of memory it is beneficial to use our compression technique over an uncompressed PDB of the same size. An advantage of our system is that PDBs much larger than the available memory can be generated on disk and compressed to fit in memory. In fact, we used this method to compress the 10-token PDB for 17-TopSpin. Future work will continue these ideas as follows. First, we would like to compress much larger PDBs and to try to solve larger versions of these puzzles. Second, other classifier techniques (like oblique trees and SVMs [8]) might perform better than those in the ADP system. Finally, this approach can be applied to compressing PDBs in planning domains.
REFERENCES
[1] S. Edelkamp, 'Symbolic pattern databases in heuristic search planning', AIPS, 274–293, (2002).
[2] M. Ernandes and M. Gori, 'Likely-admissible and sub-symbolic heuristics', in ECAI, pp. 613–617, (2004).
[3] A. Felner, R. Korf, R. Meshulam, and R. Holte, 'Compressed pattern databases', JAIR, 30, 213–247, (2007).
[4] A. Felner, R.E. Korf, and S. Hanan, 'Additive pattern database heuristics', JAIR, 22, 279–318, (2004).
[5] A. Felner, U. Zahavi, R. Holte, and J. Schaeffer, 'Dual lookups in pattern databases', in Proc. IJCAI, pp. 103–108, (2005).
[6] R.C. Holte, A. Felner, J. Newton, R. Meshulam, and D. Furcy, 'Maximizing over multiple pattern databases speeds up heuristic search', Artificial Intelligence, 170, 1123–1136, (2006).
[7] J.S. Judd, Neural network design and the complexity of learning, MIT Press, Cambridge, MA, USA, 1990.
[8] T. Mitchell, 'Machine learning and data mining', Communications of the ACM, 42(11), 30–36, (1999).
[9] G. Politowski, On the construction of heuristic functions, Ph.D. dissertation, University of California at Santa Cruz, 1986.
[10] S. Sarkar, P. Chakrabarti, and S. Ghose, 'A framework for learning in search-based systems', IEEE Transactions on Knowledge and Data Engineering, 10(4), 563–575, (1998).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-500
A Decomposition Technique for Max-CSP

Hachémi Bennaceur, Christophe Lecoutre, Olivier Roussel¹

Abstract. The objective of the Maximal Constraint Satisfaction Problem (Max-CSP) is to find an instantiation which minimizes the number of constraint violations in a constraint network. In this paper, inspired by the concept of inferred disjunctive constraints introduced by Freuder and Hubbe, we show that it is possible to exploit the arc-inconsistency counts associated with each value of a network in order to avoid exploring useless portions of the search space. The principle is to reason from the distance between the two best values in the domain of a variable, according to such counts. From this reasoning, we can build a decomposition technique which can be used throughout search in order to decompose the current problem into easier sub-problems. Interestingly, this approach does not depend on the structure of the constraint graph, contrary to what is usually proposed. Alternatively, we can dynamically post hard constraints that can be used locally to prune the search space. The practical interest of our approach is illustrated, using this alternative, with an experimentation based on a classical branch and bound algorithm, namely PFC-MRDAC.
1
Introduction
The Constraint Satisfaction Problem (CSP) is the task of determining whether a given constraint network is satisfiable or not, i.e. whether it is possible to assign a value to all variables in order to satisfy all constraints. When no solution can be found, it may be interesting to identify a complete instantiation which satisfies the greatest number of constraints (or, equivalently, which minimizes the number of violated constraints). This is called the Maximal Constraint Satisfaction Problem (Max-CSP). During the last decade, much work has been carried out to solve this problem (and its direct extension, WCSP). The basic (complete) approach is to employ a branch and bound mechanism, traversing the search space in a depth-first manner while maintaining an upper bound, the best solution cost found so far, and a lower bound on the best possible extension of the current partial instantiation. When the lower bound is greater than or equal to the upper bound, backtracking (or filtering) occurs. Lower bound computations of constraint violations have been improved repeatedly, over the years, by exploiting inconsistency counts [7, 11, 1, 10], disjoint conflicting sets of constraints [13], or cost transfers between constraints [3, 4, 2]. Alternative approaches (usually) combine branch and bound search with dynamic programming or structure exploitation. On the one hand, Russian Doll Search [14] and variable elimination [9] can be considered as dynamic programming methods, whose principle is to solve successive sub-problems, one per variable of the initial problem. On the other hand, structural decomposition methods [8, 12, 5]
¹ Université Lille-Nord de France, Artois, F-62307 Lens – CRIL, F-62307 Lens – CNRS UMR 8188, F-62307 Lens – IUT de Lens – {bennaceur,lecoutre,roussel}@cril.univ-artois.fr
exploit the structure of the problems in order to establish conditions under which decompositions are possible. Such methods are based on tree decomposition, provide interesting theoretical time complexities which depend on the width of the decomposition (tree-width), and are becoming increasingly successful. In [6], Freuder and Hubbe proposed to exploit, for constraint satisfaction, the principle of inferred disjunctive constraints: given a satisfiable binary constraint network P, for any pair (X, a) where X is a variable of P and a a value in the domain of X, if there is no solution containing a for X, then there is a solution containing a value (for another variable) which is not compatible with (X, a). Using this principle, the authors show that it is possible to dynamically and iteratively decompose a problem. In this paper, we generalize this approach to Max-CSP (including the non-binary case) by exploiting the arc-inconsistency counts associated with each value of the problem. The arc-inconsistency count (aic for short) of a pair (X, a) corresponds to the number of constraints that do not support (X, a). The aic gap associated with the variable X is the absolute difference between the two lowest arc-inconsistency counts of values of X (plus 1). We show that it is possible to reason from the aic gap to obtain a condition under which we have the guarantee of obtaining an optimal solution while avoiding the exploration of some portions of the search space. From this reasoning, we can build a decomposition technique which can be used throughout search to decompose the current problem into simpler sub-problems, generalizing for Max-CSP the approach of [6]. It is important to remark that, unlike usual decomposition methods, this approach does not depend on the structure of the constraint graph, since the decomposition can always be applied, whatever the structure of the constraint graph is. Alternatively, we can dynamically post hard constraints that can be used locally to prune the search space. Depending on the implementation, these hard constraints can participate in constraint propagation, or just impose backtracking. The paper is organized as follows. After some technical background, we introduce the central result of this paper. Then, we present its two main exploitations: decomposition and pruning. After the presentation of some experimental results, we conclude.
2
Background
In this paper, we are dealing with the discrete CSP (Constraint Satisfaction Problem) framework. Each CSP instance P corresponds to a constraint network which is defined by a finite set of n variables {X1 , X2 , . . . , Xn } and a finite set of e constraints {C1 , C2 , . . . , Ce }. Each variable X must be assigned a value from its associated discrete domain dom(X), and each constraint C involves an ordered subset scp(C) of variables of P , called its scope, and specifies the set rel(C) of combinations of values allowed for
the variables of its scope. |scp(C)| is called the arity of C, and C is binary if its arity is 2. A CSP instance is binary if it only contains binary constraints, and normalized if it does not contain two constraints with the same scope. Two variables are neighbours iff they both belong to the scope of a constraint. A complete instantiation is the assignment of a value to each variable. Let s denote a complete instantiation; s(X, a) is the complete instantiation obtained from s by replacing the value assigned to X in s by a. A constraint C is violated (or unsatisfied) by a complete instantiation s iff the projection of s over scp(C) does not belong to rel(C). A solution is a complete instantiation that satisfies every constraint. In some cases, the CSP instance may be over-constrained, and thus admits no such solution. We can then be interested in finding a complete instantiation that best respects the set of constraints. In this presentation, we consider the Max-CSP problem, where the goal is to find an optimal solution, i.e. a complete instantiation satisfying as many constraints as possible. A Max-CSP instance is also represented by a constraint network. Given a constraint C with scp(C) = {Xi1, ..., Xir}, any tuple in dom(Xi1) × ... × dom(Xir) is called a valid tuple on C. A value a for the variable X is often denoted by (X, a). A constraint C supports the value (X, a) (equivalently, a value (X, a) has a support on C) iff either X ∉ scp(C) or there exists a valid tuple on C which belongs to rel(C) and which contains the value a for X. When every value is supported by a constraint, this constraint is said to be (generalized) arc-consistent. For the binary normalized case, we say that a variable Y supports the value (X, a) iff either no constraint involves both X and Y, or such a constraint supports (X, a). For a binary constraint C such that scp(C) = {X, Y}, a value (X, a) is compatible with a value (Y, b) iff (a, b) belongs to rel(C). The arc-inconsistency count of a value (X, a), denoted by aic(X, a), is the number of constraints (variables for the binary normalized case) which do not support (X, a).
3
Main Theorem
In this section, we present the main result of this paper, generalizing the approach of [6] developed in the context of binary CSP.

Definition 3.1 Let P be a Max-CSP instance and X be a variable of P. An aic best value of X is a value a ∈ dom(X) such that aic(X, a) is minimal, i.e. ∀c ∈ dom(X), aic(X, a) ≤ aic(X, c). An aic second best value of X is a value b ∈ dom(X) such that b ≠ a and ∀c ∈ dom(X) \ {a, b}, aic(X, b) ≤ aic(X, c). The aic gap of X is defined as δ = aic(X, b) − aic(X, a) + 1.

Theorem 3.1 Let P be a Max-CSP instance, X be a variable of P, a be an aic best value of X, δ be the aic gap of X and C1, ..., Cm be the m constraints involving X which support (X, a). There always exists an optimal solution s* of P such that:
• either X is assigned the value a in s*,
• or X is assigned a value different from a in s*, and at least δ constraints among C1, ..., Cm are violated by s*(X, a).

Proof: When P has an optimal solution where X is assigned a, the first condition is obviously satisfied and the theorem is verified. Otherwise, if there is no optimal solution where X = a, let s* = (v1, ..., v, ..., vn) be an optimal solution of P, and let v be the value of X in s*. Let C_X be the set of constraints of P involving the variable X (C_X is a superset of {C1, ..., Cm}).
Assume that s* violates p constraints of C_X and s*(X, a) violates q constraints of C_X. Since there is no optimal solution with X = a, s*(X, a) necessarily violates more constraints of C_X than s*, and therefore q > p. Necessarily, we have p ≥ aic(X, v) and q ≥ aic(X, a), since arc-inconsistency counts computed with respect to P represent lower bounds on the aic counts obtained after assigning all variables of P. Therefore, ∃t ≥ 0, r ≥ 0 s.t. p = aic(X, v) + r and q = aic(X, a) + t. Since q > p, aic(X, a) + t > aic(X, v) + r, or equivalently t > aic(X, v) − aic(X, a) + r. Since v ≠ a, we have aic(X, v) ≥ aic(X, b) (b is an aic second best value of X) and therefore t > aic(X, b) − aic(X, a) + r. As r ≥ 0 and δ = aic(X, b) − aic(X, a) + 1, we obtain t ≥ δ. This means that at least δ constraints of P which support (X, a) involve variables whose values given by s* are not compatible with (X, a). Therefore, the theorem is also verified. □

This theorem can be used in two different ways: it can be used to generate a decomposition of the Max-CSP instance, or it can be exploited as a pruning rule.
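A minimal sketch of the quantities used in Theorem 3.1, assuming the aic counts are available through a function (the names are ours):

```python
from typing import Callable, Sequence, Tuple

def aic_best_and_gap(domain: Sequence[int],
                     aic: Callable[[int], int]) -> Tuple[int, int]:
    """Return (a, delta): an aic best value of variable X and the aic gap
    delta = aic(X, b) - aic(X, a) + 1, where b is an aic second best value.
    Assumes the domain contains at least two values."""
    ordered = sorted(domain, key=aic)
    a, b = ordered[0], ordered[1]
    return a, aic(b) - aic(a) + 1
```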
4 The Decomposition Approach
The decomposition of a Max-CSP instance P around a variable X is defined as follows.

Definition 4.1 Under the hypotheses and with the notations of Theorem 3.1, the decomposition of a Max-CSP instance P around the value a of variable X generates the sub-problems P0, P1, ..., Pk (with k = C(m, δ), where C(m, δ) denotes the binomial coefficient "m choose δ") defined by:
• P0 is derived from P by assigning a to variable X;
• Pi (with i ∈ 1..k) is derived from P by removing a from the domain of X and restricting the assignments of the neighbours of X so that at least δ of the constraints supporting (X, a) in P no longer support (X, a) in Pi.

These sub-problems may be solved independently, and Theorem 3.1 guarantees that at least one of them contains an optimal solution of P. It should be noticed that this decomposition may prune some (equivalent) optimal solutions of P. As described in the definition, the sub-problems are not disjoint, which means that an assignment may be a solution of several sub-problems simultaneously. It is however easy to generate disjoint sub-problems, as will be shown in Section 4.2. With m denoting the number of constraints that support (X, a), this decomposition generates 1 + C(m, δ) sub-problems (when δ = 1, this number is equal to 1 + m and is bounded by n − aic(X, a), with n the number of variables). Although the number of sub-problems is exponential in δ, Section 4.3 proves that the search space of the different sub-problems P0, ..., Pk is exponentially smaller than the search space of the initial problem P, provided that we generate disjoint sub-problems. This means that the decomposition is always beneficial because, even if it may generate many sub-problems, they are always easier to solve globally than the initial problem.
4.1 Example
To illustrate the decomposition technique, let us consider the binary constraint network P built on {X1 , X2 , X3 } and containing the constraints {C12 , C13 , C23 }. We have dom(Xi ) = {1, 2, 3} for i ∈ 1..3, and the constraints are defined by the following tables (allowed tuples):
rel(C12) = {(1, 1), (1, 2), (3, 1)} (pairs (X1, X2))
rel(C13) = {(1, 1), (1, 2), (2, 1), (2, 3), (3, 2)} (pairs (X1, X3))
rel(C23) = {(1, 3), (3, 1)} (pairs (X2, X3))
An optimal solution of this Max-CSP instance violates one constraint. For example, X1 = 1, X2 = 1, X3 = 2 is an optimal solution which violates the constraint C23. To perform the decomposition strategy, we have to select one variable and one of its aic best values. For example, (X1, 1) is an aic best value of X1 since aic(X1, 1) = 0, aic(X1, 2) = 1 and aic(X1, 3) = 0. Here, we have δ = 1. The decomposition around (X1, 1) leads to the following independent sub-problems. P0 is derived from P by assigning X1 = 1: in P0, dom(X1) = {1} and dom(X2) = dom(X3) = {1, 2, 3}. P1 is derived from P by asserting X1 ≠ 1 and restricting the domain of X2 to the values incompatible with (X1, 1): in P1, dom(X1) = {2, 3}, dom(X2) = {3}, dom(X3) = {1, 2, 3}. P2 is derived from P by asserting X1 ≠ 1, restricting the domain of X2 to the values compatible with (X1, 1) (this restriction is enforced to obtain disjoint sub-problems, see Section 4.2) and restricting the domain of X3 to the values incompatible with (X1, 1): in P2, dom(X1) = {2, 3}, dom(X2) = {1, 2}, dom(X3) = {3}. Notice that the sub-problem where dom(X1) = {2, 3}, dom(X2) = {1, 2} and dom(X3) = {1, 2} is pruned, and this sub-problem contains an optimal solution of the whole problem, namely X1 = 3, X2 = 1 and X3 = 2. Now, let us modify slightly the initial problem and assume that the value 3 of X1 is incompatible with all values of X3; then we have:
rel(C13) = {(1, 1), (1, 2), (2, 1), (2, 3)} (pairs (X1, X3))
In this case, for X1 there is only one aic best value (since aic(X1, 1) = 0, aic(X1, 2) = 1 and aic(X1, 3) = 1), and so δ = 2. Thus, the decomposition leads to only two sub-problems P0 and P1. P0 is unchanged and P1 is obtained from P by asserting X1 ≠ 1 and restricting the domains of X2 and X3 to the values incompatible with (X1, 1): in P1, dom(X1) = {2, 3}, dom(X2) = {3}, dom(X3) = {3}. In this case we have discarded the following two sub-problems: P2, where dom(X1) = {2, 3}, dom(X2) = {3} and dom(X3) = {1, 2}, and P3, where dom(X1) = {2, 3}, dom(X2) = {1, 2} and dom(X3) = {1, 2, 3}. The sub-problem P3 contains an optimal solution of P: X1 = 3, X2 = 1 and X3 = 3. For the initial problem, the decomposition prunes 2^3 out of the 3^3 possible complete instantiations, while in the modified problem it prunes 16 of them (more than a half).
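The counts in this example are easy to check mechanically. The following brute-force sketch (ours) enumerates all 27 complete instantiations of the initial network and confirms that an optimal solution violates exactly one constraint:

    from itertools import product

    doms = {"X1": {1, 2, 3}, "X2": {1, 2, 3}, "X3": {1, 2, 3}}
    rels = {("X1", "X2"): {(1, 1), (1, 2), (3, 1)},
            ("X1", "X3"): {(1, 1), (1, 2), (2, 1), (2, 3), (3, 2)},
            ("X2", "X3"): {(1, 3), (3, 1)}}

    def violations(s):
        # number of constraints whose projection of s is disallowed
        return sum((s[u], s[v]) not in allowed for (u, v), allowed in rels.items())

    best = min(violations(dict(zip(doms, vals))) for vals in product(*doms.values()))
    assert best == 1
    assert violations({"X1": 1, "X2": 1, "X3": 2}) == 1   # the solution cited above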
4.2 Enumeration of Sub-problems
For the sake of simplicity, we now assume that constraints are binary and normalized (i.e. they all have different scopes), but the method is easy to generalize (this restriction just ensures that reducing the domain of a neighbour of X will affect only one constraint on X; otherwise, some variables would have to be taken into account more than once). When constraints are binary, ensuring that a constraint C with scp(C) = {X, Y} does not support (X, a) simply amounts to reducing the domain of Y to the values incompatible with (X, a).

Enumerating all the sub-problems in the decomposition and ensuring that these problems are disjoint is as simple as enumerating the values of a binary counter under the constraint that at least δ of its bits must be 0. Let I_Y^{X=a} be the values of dom(Y) which are incompatible with (X, a) and C_Y^{X=a} be the values of dom(Y) which are compatible with (X, a). By definition, dom(Y) = I_Y^{X=a} ∪ C_Y^{X=a} and I_Y^{X=a} ∩ C_Y^{X=a} = ∅. Clearly, the sub-domains I and C form a partition of each domain, and this can be used to decompose the search in a systematic way. Exhaustive search on all values of a variable Y can be performed by first restricting the domain to I_Y^{X=a} and then to C_Y^{X=a}. This is a binary branching. Since this can be done recursively, each branch can be represented by a binary word b_{Y1}, ..., b_{Ym} where b_{Yi} = 0 indicates that the domain of Yi is restricted to I_{Yi}^{X=a} and b_{Yi} = 1 indicates that the domain of Yi is restricted to C_{Yi}^{X=a}. Exhaustive search on all values of all variables Y will enumerate the 2^m binary words (from all 0 to all 1). When X = a is chosen for the decomposition of a problem P, the first sub-problem is P0 where X = a, and the other sub-problems are the ones where X ≠ a and where at least δ variables among the m variables Yi which support (X, a) have their domain reduced to I_{Yi}^{X=a}. A simple solution to avoid any redundant or useless search is to use the binary branching scheme presented above. The restriction that at least δ variables among the m variables Yi have their domain reduced to I_{Yi}^{X=a} translates to the condition "at least δ bits in the binary word representing the branch must be 0". This condition is trivial to enforce in a binary branching.
Y1 Y2 Y3 Y4              Y1 Y2 Y3 Y4
0  ∗  ∗  ∗               0  0  0  ∗
1  0  ∗  ∗               0  0  1  0
1  1  0  ∗               0  1  0  0
1  1  1  0               1  0  0  0
(a) search with δ = 1    (b) search with δ = 3

Figure 1. List of branches to explore for n = 4 and different values of δ
As an example, Figure 1 represents the branches that must be explored for two different values of δ and for n = 4 variables. For clarity, ∗ is used as a joker to represent any 0/1 value.
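The branch lists of Figure 1 can be generated by a short recursion (our sketch): branch on each bit, stop with jokers once the "at least δ zeros" condition is already met, and cut when it can no longer be met:

    def branches(m, delta, prefix=""):
        """Disjoint branches over m binary variables with at least `delta`
        zeros; '*' is the joker of Figure 1 (any 0/1 value)."""
        remaining = m - len(prefix)
        if delta <= 0:                 # condition satisfied: rest is free
            yield prefix + "*" * remaining
            return
        if delta > remaining:          # cannot place enough zeros any more
            return
        yield from branches(m, delta - 1, prefix + "0")   # restrict to I
        yield from branches(m, delta, prefix + "1")       # restrict to C

    assert list(branches(4, 1)) == ["0***", "10**", "110*", "1110"]
    assert list(branches(4, 3)) == ["000*", "0010", "0100", "1000"]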
4.3 Some Complexity Results
Interestingly, this binary branching scheme allows us to draw immediate complexity results. Assume that Y1, ..., Ym are the variables which support (X, a) and that Z1, ..., Zr are the other unassigned variables. Without applying the decomposition, an exhaustive search of the sub-problem where X ≠ a will have to explore the Cartesian product of the domains, which amounts to ∏_{i=1}^{m} |dom(Yi)| · ∏_{i=1}^{r} |dom(Zi)| complete instantiations. When the decomposition is used, at least δ variables Y must have their domain reduced to I_Y^{X=a}. This means that the number of complete instantiations which are not explored amounts to:

∑_{S ⊆ {Y1,...,Ym}, card(S) < δ} ( ∏_{Yi ∈ S} |I_{Yi}^{X=a}| · ∏_{Yi ∉ S} |C_{Yi}^{X=a}| · ∏_{i=1}^{r} |dom(Zi)| )
As an illustration, if all C_{Yi}^{X=a} have the same size c and all I_{Yi}^{X=a} have the same size i, the number of pruned complete instantiations simplifies to:

∑_{j < δ} C(m, j) · i^j · c^{m−j} · ∏_{i=1}^{r} |dom(Zi)|
When δ = 1, the number of pruned complete instantiations is just c^m · ∏_{i=1}^{r} |dom(Zi)| (the single discarded branch is the one where every Yi keeps only values compatible with (X, a)). It roughly corresponds to the size of the so-called consistent sub-problem identified in [6] for the CSP case. In any case, the number of complete instantiations that are explored when the decomposition is applied is smaller than the initial number of complete instantiations to explore (by an exponential factor in the general case).
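For the uniform case the simplified sum is a one-liner; the sketch below (ours) evaluates it and checks it against the counts of the example in Section 4.1 (the extra factor 2 is our reading: it accounts for the two remaining values of X1 once 1 is removed):

    from math import comb

    def pruned(i, c, m, delta):
        """Pruned complete instantiations per the simplified sum above,
        for uniform |I| = i, |C| = c, and taking prod |dom(Zi)| = 1."""
        return sum(comb(m, j) * i**j * c**(m - j) for j in range(delta))

    assert 2 * pruned(1, 2, 2, 1) == 8    # initial problem: 2^3 of the 3^3
    assert 2 * pruned(1, 2, 2, 2) == 16   # modified problem: 16 of the 27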
4.4
Related Work
Classical structural decomposition methods combine tree decomposition of graphs with branch and bound search [8, 12, 5]. A tree decomposition involves computing a pseudo-tree which covers the set of variables by clusters. Two clusters are adjacent in this tree if they share some variables. An important property of tree decomposition is that the sub-problems associated with clusters may be solved independently after assigning values to the shared variables. In practice, the efficiency of decomposition methods highly depends on the structure of the constraint graph. The decomposition approach presented here, inspired by [6], proceeds differently from classical ones, since the principle is to directly decompose the whole problem into independent sub-problems without computing any pseudo-tree or assigning any variable of the problem. Each sub-problem can be solved independently while, at the same time, a portion of the search space of the whole problem is pruned. The downside of this method is that the number of generated sub-problems may be large. However, the decomposition does not rely on the structure of the constraint graph.
5 The Pruning Approach
Another way to exploit Theorem 3.1 is to interpret it as a pruning rule which can be integrated into any tree-search-based method for solving the Max-CSP problem. Assuming here a tree search algorithm employing a binary branching scheme, at each node ν a value (X, a) is selected, and two branches are built from ν: a left one labelled with the variable assignment X = a, and a right one labelled with the value refutation X ≠ a. Considering the current instance at node ν, let a, δ and {Ci} be the aic best value of X, the aic gap of X and the set of constraints supporting (X, a), respectively. As soon as the left branch has been explored, one can post a hard constraint atLeastUnsatisfied(δ, {Ci}, (X, a)) before exploring the right branch of ν. This constraint is violated as soon as it is no longer possible to find, among {Ci}, at least δ constraints which do not support (X, a) any more. Of course, a constraint posted with respect to the right branch of node ν must be removed when the algorithm backtracks from ν. These hard constraints, dynamically added to the instance, can be used to impose backtracking and, consequently, to avoid exploring useless portions of the search space. After each propagation phase, one can simply check that all currently posted hard constraints are still satisfied. If this is not the case, backtracking occurs. We will denote by A-PC (Pruning Constraints) any tree search algorithm A exploiting this approach. Interestingly, except for some particular search heuristics (such as the ones based on constraint weighting), we have the guarantee that A-PC will always visit a tree which is included in the one built by A. On the other hand, the additional hard constraints can also participate in constraint propagation. When, for a constraint atLeastUnsatisfied(δ, {Ci}, (X, a)), we can determine that at
most δ constraints of {Ci} can still be in a position of not supporting (X, a), we can impose that these δ constraints do not support (X, a), thus making new inferences. For example, for a binary constraint of {Ci} among these δ ones, involving X and another variable Y, any value of Y compatible with (X, a) can be removed. Here, we can imagine sophisticated mechanisms to manage propagation, such as the use of lazy structures (e.g. watched literals). Importantly, notice that this pruning approach can be integrated into many search algorithms solving the Max-CSP problem, including hybrid ones that combine tree decomposition with enumeration.
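A minimal sketch (ours) of the satisfiability test for a posted constraint in the binary case; `neighbours` maps the second variable of each Ci to the values compatible with (X, a), and both the name and the encoding are assumptions:

    def at_least_unsatisfied_holds(delta, neighbours, dom):
        """atLeastUnsatisfied(delta, {Ci}, (X, a)) can still be satisfied iff
        at least delta constraints Ci may end up not supporting (X, a), i.e.
        their variable Y keeps a value incompatible with (X, a)."""
        can_fail = sum(any(v not in compat for v in dom[Y])
                       for Y, compat in neighbours.items())
        return can_fail >= delta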
6 Experimental Results
In order to show the practical interest of the approach described in this paper, we have conducted an experimentation on a cluster of 3.0GHz Xeon machines with 1GiB of RAM under Linux, using the benchmark suite of the 2006 competition of Max-CSP solvers (see http://www.cril.univ-artois.fr/CPAI06/). We have used the classical branch and bound PFC-MRDAC algorithm [11], which maintains reversible directed arc-inconsistency counts in order to compute lower bounds at each node of the search tree, and have been interested in the impact of using the PC (Pruning Constraints) approach (see Section 5). We have used here the variant that just imposes backtracking, and have not yet implemented the one that makes inferences. We have not yet implemented the decomposition approach either. Two variable ordering heuristics have been considered. The first one is dom/ddeg, usually considered for Max-CSP, which selects at each node the variable with the lowest ratio of domain size to dynamic degree. The second one, denoted by dom*gap/ddeg, involves the aic gap of the variables. More precisely, the ratio dom/ddeg is multiplied by the aic gap in order to favour variables for which there is a large gap between the best value and the following one. We believe that it may help to quickly find good solutions and, more specifically, to increase the efficiency of our approach. Finally, the value with the lowest aic is always selected. Notice that this can be seen as a refinement of the ic + dac counters usually used to select values. The protocol used for our experimentation is the following: for each instance, we start with an initial upper bound set to infinity (in the experimentation, Max-CSP was considered as the problem of minimizing the number of violated constraints), and record the (cost of the) best solution found (and time-stamp it) within a given time limit (here, 1,500 seconds). Even if this protocol prevents us from getting some useful results for some instances (for example, if the same best solution is found by the different algorithms after a few seconds), it benefits from being easily reproducible and exploitable, whether the optimum value is known or not. First of all, recall that we have the guarantee that PFC-MRDAC-PC always visits a tree which is smaller than the one built by PFC-MRDAC. This makes our experimental comparisons easier. We can then make a first general observation about the results of our experimentation. The overhead of managing PC hard constraints is usually between 5% and 10% of the overall cpu time. Since on random instances our approach only saves a limited number of nodes (as expected), we obtain a similar behaviour with PFC-MRDAC and PFC-MRDAC-PC. This is not shown here, due to lack of space. On the other hand, on structured instances, Table 1 presents the results on representative instances and clearly demonstrates the interest of our approach. These instances belong to the academic and patterned series maxclique (brock, p-hat, san), kbtree (introduced in [5]), dimacs (ssa) and composed, and also to the real-world
series celar (scen, graph) and spot. The ratio introduced in the table corresponds to the cpu time of PFC-MRDAC divided by the cpu time of PFC-MRDAC-PC. It is either an exact value (when both methods have found the same upper bound) or an approximate one (in this case, we use the time limit 1,500 as a lower bound). For example, on instance spot5-404, we obtain 74 as upper bound with PFC-MRDAC and 73 with PFC-MRDAC-PC. Since any node visited by PFC-MRDAC-PC is necessarily visited by PFC-MRDAC, we know that at least 1,500 seconds are required by PFC-MRDAC to find the upper bound 73. We then obtain a speedup ratio which is greater than 1,500/99 = 15.1. Remark that, as expected, the results are more impressive when using the heuristic dom*gap/ddeg (more than two orders of magnitude on some instances), which besides often allows us to find better upper bounds.

                                dom/ddeg                              dom*gap/ddeg
                       ¬PC         PC          ratio         ¬PC         PC          ratio
Academic and Patterned instances
brock-200-1            184/1,490   183/706     > 2.1         184/3       183/57      > 26.3
brock-200-2            191/638     191/92      > 6.9         191/85      190/201     > 7.4
composed-25-1-2-1      3/613       3/332       = 1.8         6/19        3/846       > 1.7
composed-25-1-25-1     4/92        4/72        = 1.2         6/14        3/1,407     > 1
kbtree-9-2-3-5-20-01   6/996       0/1,333     > 1.1         3/0         0/15        > 100
kbtree-9-2-3-5-30-01   13/1,037    13/1,009    = 1.0         14/1,177    4/392       > 3
keller-4               162/36      160/303     > 4.9         162/1       160/149     > 10.0
p-hat300-1             293/396     293/76      = 5.2         293/1,481   293/224     = 6.6
p-hat500-1             493/1,357   493/652     = 2.0         493/33      492/717     > 2.0
san-200-0.9-1          174/1,425   173/1,287   > 1.1         157/0       155/888     > 1.6
sanr-200-0.7           185/426     185/94      = 4.5         185/808     184/324     > 4.6
ssa-0432-003           82/0        73/175      > 8.5         11/46       2/19        > 78.9
ssa-2670-130            392/1      390/56      > 26.7        52/55       49/1,126    > 1.3
Real-world instances
graph6                 342/216     341/935     > 1.6         366/7       365/406     > 1.0
graph8-f11             161/5       159/1,299   > 1.1         160/644     160/56      = 11.5
graph11                576/5       576/5       = 1           620/677     620/70      = 9.6
scen6                  269/69      269/27      = 2.5         211/20      211/14      = 1.4
scen10                 744/34      744/34      = 1           741/623     741/56      = 11.1
scen11-f12             81/729      81/395      = 1.8         66/146      66/35       = 4.1
scenw-06-18            215/8       214/934     > 1.6         133/231     131/442     > 3.3
scenw-06-24            98/686      98/244      = 2.8         121/741     117/400     > 3.7
scenw-07               353/1,239   353/471     = 2.6         525/25      524/8       > 187.5
spot5-28               207/0       206/31      > 48.3        196/1       196/1       = 1
spot5-29               52/25       51/305      > 4.9         49/807      48/29       > 51.7
spot5-42               124/900     124/64      = 14.0        122/1,157   122/6       = 192.8
spot5-404              74/85       73/99       > 15.1        76/0        73/331      > 4.5
Table 1. Best upper bound (ub, number of violated constraints) and cpu time (to reach it) obtained with PFC-MRDAC on structured instances, with (PC) and without (¬PC) the Pruning Constraints method. Each cell reads ub/cpu. The timeout was set to 1,500 seconds per instance.
Finally, for a very limited number of these instances, we succeeded in finding an optimal value and proving optimality, given 20 hours of cpu time per instance. For example, for brock-200-2, optimality is proved when using PC in 13,394 and 29,217 seconds with dom/ddeg and dom*gap/ddeg respectively, while optimality is not proved within 72,000 seconds when PC is not employed. As another example, the instance scenw-06-24 is solved in 18,858 seconds with PFC-MRDAC-PC-dom*gap/ddeg and in 37,405 seconds when PC is not used.
7 Conclusion
In this paper, we have generalized to Max-CSP the principle of inferred disjunctive constraints introduced in [6] for CSP. Using the so-called aic (arc-inconsistency count) gap, we have shown that it is possible to obtain a guarantee about obtaining an optimal solution, while pruning some portions of the search space. Interestingly, this result can be exploited both in terms of decomposition (already addressed for CSP in [6]) and backtracking/filtering (by posting hard constraints). We have shown that our approach, grafted onto a classical branch and bound algorithm, really boosts search when solving structured instances. Indeed, using PFC-MRDAC, we have noticed a speedup that sometimes exceeds one order of magnitude with the heuristic dom/ddeg and two orders of magnitude with the original dom*gap/ddeg. We want to recall that dynamic programming and decomposition methods, which have recently received a lot of attention, still rely on branch and bound search. It means that all these methods may benefit from the approach developed in this paper. Finally, one perspective of this work is to extend it with respect to the Weighted CSP and Valued CSP frameworks.
Acknowledgments. This paper has been supported by the IUT de Lens, the CNRS and the ANR "Planevo" project no. JC05 41940.
REFERENCES
[1] M.S. Affane and H. Bennaceur, 'A weighted arc-consistency technique for Max-CSP', in Proceedings of ECAI'98, pp. 209-213, (1998).
[2] M.C. Cooper, S. de Givry, and T. Schiex, 'Optimal Soft Arc Consistency', in Proceedings of IJCAI'07, pp. 68-73, (2007).
[3] M.C. Cooper and T. Schiex, 'Arc consistency for soft constraints', Artificial Intelligence, 154(1-2), 199-227, (2004).
[4] S. de Givry, F. Heras, M. Zytnicki, and J. Larrosa, 'Existential arc consistency: Getting closer to full arc consistency in weighted CSPs', in Proceedings of IJCAI'05, pp. 84-89, (2005).
[5] S. de Givry, T. Schiex, and G. Verfaillie, 'Exploiting Tree Decomposition and Soft Local Consistency In Weighted CSP', in Proceedings of AAAI'06, (2006).
[6] E.C. Freuder and P.D. Hubbe, 'Using inferred disjunctive constraints to decompose constraint satisfaction problems', in Proceedings of IJCAI'93, pp. 254-261, (1993).
[7] E.C. Freuder and R.J. Wallace, 'Partial constraint satisfaction', Artificial Intelligence, 58(1-3), 21-70, (1992).
[8] P. Jégou and C. Terrioux, 'Hybrid backtracking bounded by tree-decomposition of constraint networks', Artificial Intelligence, 146(1), 43-75, (2003).
[9] J. Larrosa and R. Dechter, 'Boosting search with variable elimination in constraint optimization and constraint satisfaction problems', Constraints, 8(3), 303-326, (2003).
[10] J. Larrosa and P. Meseguer, 'Partition-Based lower bound for Max-CSP', Constraints, 7, 407-419, (2002).
[11] J. Larrosa, P. Meseguer, and T. Schiex, 'Maintaining reversible DAC for Max-CSP', Artificial Intelligence, 107(1), 149-163, (1999).
[12] R. Marinescu and R. Dechter, 'AND/OR Branch-and-Bound for Graphical Models', in Proceedings of IJCAI'05, pp. 224-229, (2005).
[13] J.C. Régin, T. Petit, C. Bessière, and J.F. Puget, 'New lower bounds of constraint violations for over-constrained problems', in Proceedings of CP'01, pp. 332-345, (2001).
[14] G. Verfaillie, M. Lemaitre, and T. Schiex, 'Russian doll search for solving constraint optimization problems', in Proceedings of AAAI'96, pp. 181-187, (1996).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-505
Fast Set Bounds Propagation using BDDs

Graeme Gange and Vitaly Lagoon, Department of Computer Science and Software Engineering, The University of Melbourne, Vic. 3010, Australia
Peter J. Stuckey, NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Vic. 3010, Australia

Abstract. Set bounds propagation is the most popular approach to solving constraint satisfaction problems (CSPs) involving set variables. The use of reduced ordered Binary Decision Diagrams (BDDs) to represent and solve set CSPs is well understood and brings the advantage that propagators for arbitrary set constraints can be built. This can substantially improve solving. The disadvantage of BDDs is that creating and manipulating BDDs can be expensive. In this paper we show how we can perform set bounds propagation using BDDs in a much more efficient manner by generically creating set constraint predicates, and using a marking approach to propagation. The resulting system can be significantly faster than competing approaches to set bounds propagation.
1 Introduction
It is often convenient to model a constraint satisfaction problem (CSP) using finite set variables and set relationships between them. A common approach to solving finite domain CSPs is using a combination of a global backtracking search and a local constraint propagation algorithm. The local propagation algorithm attempts to enforce consistency on the values in the domains of the constraint variables by removing values from the domains of variables that cannot form part of a complete solution to the system of constraints. The most common level of consistency is set bounds consistency [4] where the solver keeps track for each set of which elements are definitely in or out of the set. Many solvers use set bounds consistency including ECLiPSe, Gecode, and ILOG SOLVER. Set bounds propagation is supported by solvers since stronger notions of propagation such as domain propagation require representing exponentially large domains of possible values. However, [8] demonstrated that it is possible to use reduced ordered binary decision diagrams (BDDs) as a compact representation of both set domains and of set constraints, thus permitting set domain propagation. A domain propagator ensures that every value in the domain of a set variable can be extended to a complete assignment of all of the variables in a constraint. The use of the BDD representation comes with several additional benefits. The ability to easily conjoin and existentially quantify BDDs allows the removal of intermediate variables, thus strengthening propagation, and also makes the construction of
propagators for global constraints straightforward. Given the natural way in which BDDs can be used to model set constraint problems, it is therefore worthwhile utilising BDDs to construct other types of set solver. Indeed, it has been previously demonstrated [5, 6] that set bounds propagation can be efficiently implemented using BDDs to represent constraints and domains of variables. A major benefit of the BDD-based approach is that it frees us from the need to laboriously construct set bounds propagators for each new constraint by hand. Moreover, correctness and optimality of such BDD-based propagators follow by construction. The other advantages of the BDD-based representation identified above still apply, and the resulting solver performs very favourably when compared with existing set bounds solvers. But set bounds propagation using BDDs still constructs BDDs during propagation, which is a considerable overhead. In this paper we show how we can perform BDD-based set bounds propagation using a marking algorithm that performs linear scans of the BDD representation of the constraint without constructing new BDDs. The resulting set bounds propagators are substantially faster than those using BDDs. We can use the same linear pass to detect elements of the set which can make further difference in propagation, and construct a filter on the propagator to prevent invoking it unless one of the variables that can make a difference changes. To summarize, the benefits of the approach of this paper are:
• efficiency: no new BDDs are constructed during propagation, so it is very fast;
• reuse: we can reuse a single BDD for multiple copies of the same constraint, and hence handle larger problems;
• ordering: we are not restricted to a single global ordering of Booleans for constructing BDDs; and
• filtering: we can keep track of which parts of the set variable can really make a difference, and reduce the amount of propagation.
We illustrate a prototype solver using the approach on well-known set problems, comparing against the state of the art Gecode set bounds propagation solver.
2 Preliminaries
Propagation based approaches to solving set constraint problems represent the problem using a domain storing the possible values of each set variable, and propagators for each constraint, that remove values
from the domain of a variable that are inconsistent with values for other variables. Propagation is combined with backtracking search to find solutions. A domain D is a complete mapping from the fixed finite set of variables V to finite collections of finite sets of integers. The domain of a variable v is the set D(v). A domain D1 is said to be stronger than a domain D2, written D1 ⊑ D2, if D1(v) ⊆ D2(v) for all v ∈ V. A domain D1 is equal to a domain D2, written D1 = D2, if D1(v) = D2(v) for all variables v ∈ V. A domain D can be interpreted as the constraint ⋀_{v∈V} v ∈ D(v). For set constraints we will often be interested in restricting variables to take on convex domains. A set of sets K is convex if a, b ∈ K and a ⊆ c ⊆ b implies c ∈ K. We use interval notation [a, b] where a ⊆ b to represent the (minimal) convex set K including a and b. For any finite collection of sets K = {a1, a2, ..., an}, we define the convex closure of K: conv(K) = [∩_{a∈K} a, ∪_{a∈K} a]. We extend the concept of convex closure to domains by defining ran(D) to be the domain such that ran(D)(x) = conv(D(x)) for all x ∈ V. A valuation θ is a set of mappings from the set of variables V to sets of integer values, written {x1 → d1, ..., xn → dn}. A valuation can be extended to apply to constraints involving the variables in the obvious way. Let vars be the function that returns the set of variables appearing in an expression, constraint or valuation. In an abuse of notation, we say a valuation is an element of a domain D, written θ ∈ D, if θ(vi) ∈ D(vi) for all vi ∈ vars(θ).

Constraints, Propagators and Propagation Solvers. A constraint is a restriction placed on the allowable values for a set of variables. We shall use primitive set constraints such as (membership) k ∈ v, (equality) u = v, (subset) u ⊆ w, (union) u = v ∪ w, (intersection) u = v ∩ w, (cardinality) |v| = k, (upper cardinality bound) |v| ≤ k, (lexicographic order) u < v, where u, v, w are set variables and k is an integer. We can also construct more complicated constraints which are (possibly existentially quantified) conjunctions of primitive set constraints. We define the solutions of a constraint c to be the set of valuations θ on vars(c) that make the constraint true. We associate a propagator with every constraint. A propagator f is a monotonically decreasing function from domains to domains, so D1 ⊑ D2 implies that f(D1) ⊑ f(D2), and f(D) ⊑ D. A propagator f is correct for a constraint c if and only if for all domains D:

{θ | θ ∈ D} ∩ solns(c) = {θ | θ ∈ f(D)} ∩ solns(c)

A propagation solver solv(F, D) for a set of propagators F and a domain D repeatedly applies the propagators in F starting from the domain D until a fixpoint is reached. solv(F, D) is the weakest domain D′ ⊑ D where f(D′) = D′ for all f ∈ F.

Domain and Bounds Consistency. A domain D is domain consistent for a constraint c if D is the smallest domain containing all solutions θ ∈ D of c. We define the domain propagator for a constraint c as

dom(c)(D)(v) = {θ(v) | θ ∈ solns(D ∧ c)} if v ∈ vars(c), and dom(c)(D)(v) = D(v) otherwise.
Then dom(c)(D) is always domain consistent with c. A domain D is (set) bounds consistent for a constraint c if for every variable v ∈ vars(c) the upper bound of D(v) is the union of the values of v in all solutions of c in D, and the lower bound of D(v) is the intersection of the values of v in all solutions of c in D.
We define the set bounds propagator for a constraint c as

sb(c)(D)(v) = conv(dom(c)(ran(D))(v)) if v ∈ vars(c), and sb(c)(D)(v) = D(v) otherwise.

Then sb(c)(D) is always bounds consistent with c.

BDDs. We assume a set B of Boolean variables with a total ordering ≺. We make use of the following Boolean operations: ∧ (conjunction), ∨ (disjunction), ¬ (negation), → (implication), ↔ (bi-implication) and ∃ (existential quantification). We denote by ∃V F the formula ∃x1 ··· ∃xn F where V = {x1, ..., xn}, and by ∃̄V F we mean ∃V′ F where V′ = vars(F) \ V. Reduced Ordered Binary Decision Diagrams are a well-known method of representing Boolean functions on Boolean variables using directed acyclic graphs with a single root. Every internal node n(v, f, t) in a BDD r is labelled with a Boolean variable v ∈ B, and has two outgoing arcs: the 'false' arc (to BDD f) and the 'true' arc (to BDD t). Leaf nodes are either F (false) or T (true). Each node represents a single test of the labelled variable; when traversing the tree the appropriate arc is followed depending on the value of the variable. Define the size |r| as the number of internal nodes in a BDD r, and VAR(r) as the set of variables v ∈ B appearing in some internal node of r. Reduced Ordered Binary Decision Diagrams (BDDs) [1] require that the BDD is: reduced, that is, it contains no identical nodes (nodes with the same variable label and identical 'then' and 'else' arcs) and no redundant tests (no node has both arcs leading to the same node); and ordered: if there is an arc from a node labelled v1 to a node labelled v2, then v1 ≺ v2. A BDD has the nice property that the function representation is canonical up to variable reordering. This permits efficient implementations of many Boolean operations. BDDs can represent an arbitrary Boolean formula over variables B. We shall be interested in stick BDDs, where for every internal node n(v, f, t) exactly one of f or t is the constant F node. Stick BDDs represent exactly the formulae of the form ⋀_{v∈T} v ∧ ⋀_{v∈F} ¬v where T and F are disjoint subsets of B. A Boolean variable v is said to be fixed in a BDD r if either for every node n(v, f, t) ∈ r, t is the constant F node, or for every node n(v, f, t), f is the constant F node. Such variables can be identified in a linear time scan over the domain BDD. For convenience, if φ is a BDD, we write ⌊φ⌋ to denote the BDD representing the conjunction of the fixed variables of φ. Note ⌊φ⌋ is a stick BDD.
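As a small illustration of the convex-closure operation used above (our sketch; sets are plain Python sets):

    def conv(K):
        """conv(K) = [intersection of K, union of K] for a non-empty
        collection of sets, as defined in Section 2."""
        K = [set(a) for a in K]
        return set.intersection(*K), set.union(*K)

    # the convex closure of {{1}, {1,2}, {1,3}} is the interval [{1}, {1,2,3}]
    assert conv([{1}, {1, 2}, {1, 3}]) == ({1}, {1, 2, 3})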
3 Set Propagation using BDDs
The key step in building set propagation using BDDs is to realize that we can represent a finite set domain using a BDD.

Representing domains. If v is a set variable ranging over subsets of {1, ..., N}, then we can represent v using the Boolean variables V(v) = {v1, ..., vN} ⊆ B, where vi is true iff i ∈ v. We will order the variables v1 ≺ v2 ≺ ··· ≺ vN. We can represent a valuation θ using the formula

R(θ) = ⋀_{v∈vars(θ)} ( ⋀_{i∈θ(v)} v_i ∧ ⋀_{i∈{1,...,N}−θ(v)} ¬v_i ).

Then the domain D(v) of a variable v can be represented as ⋁_{a∈D(v)} R({v → a}). This formula can be represented by a BDD.
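A sketch of the Boolean encoding R(θ) (ours; variables are modelled as (v, i) pairs rather than BDD nodes):

    def R(theta, N):
        """Truth values of the Booleans v_i for a set valuation theta:
        (v, i) is true iff i is in theta[v]."""
        return {(v, i): i in s for v, s in theta.items() for i in range(1, N + 1)}

    # {x -> {1}} over N = 2 fixes x1 = true and x2 = false
    assert R({"x": {1}}, 2) == {("x", 1): True, ("x", 2): False}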
Representing constraints. We can similarly model any set constraint c as a BDD B(c) using the Boolean variable representation V(v) of its set variables v. By ordering the variables in each BDD carefully we can build small representations of the formulae. The pointwise order of Boolean variables is defined as follows. Given set variables u ≺ v ≺ w ranging over sets from {1, ..., N}, we order the Boolean variables as u1 ≺ v1 ≺ w1 ≺ u2 ≺ v2 ≺ w2 ≺ ··· ≺ uN ≺ vN ≺ wN. The representation B(c) is simply ⋁_{θ∈solns(c)} R(θ). For primitive set constraints (using the pointwise order) this size is linear in N. For more details see [6]. The BDD representation of x = y ∪ z is shown in Figure 2(a).

BDD-based Set Bounds Propagation. We can build a set bounds propagator, more or less from the definition, since we have BDDs to represent domains and constraints:

φ = B(c) ∧ ⋀_{v′∈vars(c)} D(v′)

sb(c)(D)(v) = ∃̄_{V(v)} ⌊φ⌋
We simply conjoin the domains to the constraint obtaining φ, then extract the fixed variables from the result, and then project out the relevant part for each variable v. The set bounds propagation can be improved by removing the fixed variables as soon as possible. The improved definition is given in [5]. Overall, the complexity can be made O(|B(c)|). The updated set bounds can be used to simplify the BDD representing the propagator. Since fixed variables will never interact further with propagation, they can be projected out of B(c), so we can replace B(c) by ∃_{VAR(⌊φ⌋)} φ.
4 Faster Set Bounds Propagation
While set bounds propagation using BDDs is much faster than set domain propagation (or other variations of propagation for sets), it still creates new BDDs. This is not necessary, as long as we are prepared to give up the simplification of BDDs that is possible in set bounds propagation. We do not represent domains of variables as BDD sticks, but rather as arrays of integer values. A domain D is an array where, for a variable v ranging over subsets of {1, ..., N}: D[vi] = 0 indicates i ∉ v, D[vi] = 1 indicates i ∈ v, and D[vi] = 2 means we do not yet know whether i is in v or not. Hence D(v) = [{i | D[vi] = 1}, {i | D[vi] ≠ 0}]. The BDD representation of a constraint B(c) is built as before. A significant difference is that, since constraints only communicate through the set bounds of variables, we do not need them to share a global variable order; hence we can if necessary modify the variable order used to construct B(c) for each c, or use automatic variable reordering (which is available in most BDD packages) to construct B(c). Another advantage is that we can reuse the BDD for a constraint c(x̄) on variables x̄ for the constraint c(ȳ) on variables ȳ (as long as they range over the same initial sets), that is, the same constraint on different variables. Hence we only have to build one such BDD, rather than one for each instance of the constraint. The set bounds propagator sb(c(x̄)) for constraint c(x̄) is now implemented as follows. A generic BDD representation r of the constraint c(ȳ) is constructed. The propagator copies the domain description of the actual parameters x1, ..., xn onto a domain description E for the formal parameters y1, ..., yn. It constructs an array E
where E[y_i^j] = D[x_i^j]. Let V = {y_i^j | 1 ≤ j ≤ n, 1 ≤ i ≤ N} be the set of Boolean variables occurring in the constraint c(ȳ). The propagator executes the code bddprop(r, V, E) shown in Figure 1, which returns (r′, V′, E′). If r′ = F the propagator returns a false domain; otherwise the propagator copies back the domains of the formal parameters to the actual parameters, so that D[x_i^j] = E′[y_i^j]. We will come back to the V′ argument in the next subsection. The procedure bddprop(r, V, E) traverses the BDD r as follows. We visit each node n(v, f, t) in the BDD in a top-down memoing manner. We record whether, under the current domain, the node can reach the F node, and whether it can reach the T node. If the f child can reach the T node we add support for the variable v taking value 0. Similarly, if the t child can reach T we add support for the variable v taking value 1. If the node can reach both F and T we record that the variable v matters to the computation of the BDD. After the visit we reduce the variable set for the propagator to those variables that matter, and remove values with no support from the domain. The procedure assumes a global time variable which is incremented between propagations, and which is used to memo the marking phase. The top(n, V) function returns the variable in the root node of n, or the largest variable (under ≺) in V if n = T or n = F.

Example 1. Consider the BDD for the constraint x = y ∪ z when N = 2, shown in Figure 2(a). Assuming a domain E where E[y1] = 1 (1 ∈ y) and E[z2] = 1 (2 ∈ z), and the remaining variables take value 2, the algorithm traverses the edges shown with double lines in Figure 2(b). No path from x1 or x2 following the f arc reaches T, hence alive[x1,0] and alive[x2,0] are not marked with the current time. As a result E[x1] and E[x2] are set to 1. Hence we have determined 1 ∈ x and 2 ∈ x. Also, no nodes for z1 are actually visited, the left node for y2 only reaches F and the right node only reaches T. Hence matters[z1] and matters[y2] are not marked with the current time. The set of vars collected by bddprop is empty, since the remaining variables are fixed. □
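The 0/1/2 domain array is just a flat encoding of the set bounds; a small sketch (ours) of the two directions of the translation:

    def bounds_to_array(lb, ub, N):
        """Encode bounds [lb, ub]: 1 = definitely in, 0 = definitely out,
        2 = unknown (i in ub but not in lb)."""
        return [1 if i in lb else (2 if i in ub else 0) for i in range(1, N + 1)]

    def array_to_bounds(D):
        return ({i + 1 for i, d in enumerate(D) if d == 1},
                {i + 1 for i, d in enumerate(D) if d != 0})

    assert bounds_to_array({1}, {1, 2}, 3) == [1, 2, 0]
    assert array_to_bounds([1, 2, 0]) == ({1}, {1, 2})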
4.1 Waking up less often
In practice, a bounds propagation solver does not blindly apply each propagator until fixpoint, but keeps track of which propagators must still be at fixpoint, and only executes those that may not be. For set bounds this is usually managed as follows. To each set variable v is attached a list of the propagators c that involve v. Whenever v changes, these propagators are rescheduled for execution. We can do better than this with the BDD-based propagators. The algorithm bddprop collects the set of Boolean variables that matter to the BDD, that is, that can change the result. If a variable that does not matter is fixed, then set bounds propagation cannot learn any new information. We modify the wakeup process as follows. Each variable x^j stores a list of pairs (f, S) of a propagator f with the subset S of the variables x_i^j which matter to the propagator under the current domain. When the variable changes, we traverse the list of propagators and wake those propagators where the change intersects with S. On executing a propagator, we revise the set S stored in the list for variable x^j to be {x_i^j | y_i^j ∈ vars}, where vars is the set of "interesting" variables returned by bddprop. Note that the same optimization could be applied to the standard approach, but it would require the overhead of computing VAR(r′), which here is folded into bddprop.
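The wakeup filter itself is a one-line intersection test; a sketch (ours), under the assumption that changes and watch sets are sets of Boolean variables:

    def should_wake(changed, watched):
        """Reschedule propagator f only if some changed Boolean variable is
        in its current 'matters' set S."""
        return bool(changed & watched)

    # a change to x2 alone does not wake a propagator watching {x1, x3}
    assert not should_wake({"x2"}, {"x1", "x3"})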
bddprop(r, V, E) {
  (reachf, reacht) = bddp(r, V, E);
  if (¬reacht) return (F, ∅, E);
  vars = ∅;
  for (v ∈ V) {
    for (d ∈ {0, 1})
      if (alive[v, d] < time) E[v] = 1 − d;   // value d has no support
    if (E[v] = 2 ∧ matters[v] ≥ time) vars = vars ∪ {v};
  }
  return (r, vars, E);
}

bddp(node, V, E) {
  switch node {
    F: return (1, 0);
    T: return (0, 1);
    n(v, f, t):
      if (visit[node] ≥ time) return save[node];   // memoed this round
      reachf = 0; reacht = 0;
      if (E[v] ≠ 1) {                              // v may still take value 0
        (rf0, rt0) = bddp(f, V, E);
        reachf = reachf ∨ rf0; reacht = reacht ∨ rt0;
        if (rt0) {
          for (v′ ∈ V, v ≺ v′ ≺ top(f, V))         // skipped variables: both values supported
            alive[v′, 0] = alive[v′, 1] = time;
          alive[v, 0] = time;
        }
      }
      if (E[v] ≠ 0) {                              // v may still take value 1
        (rf1, rt1) = bddp(t, V, E);
        reachf = reachf ∨ rf1; reacht = reacht ∨ rt1;
        if (rt1) {
          for (v′ ∈ V, v ≺ v′ ≺ top(t, V))
            alive[v′, 0] = alive[v′, 1] = time;
          alive[v, 1] = time;
        }
      }
      if (reachf ∧ reacht) matters[v] = time;
      save[node] = (reachf, reacht);
      visit[node] = time;
      return (reachf, reacht);
  }
}
Figure 1. Pseudo-code for BDD propagation.

Figure 2. (a) The BDD representing x = y ∪ z where N = 2. A node n(v, f, t) is shown as a circle around v with a dashed arrow to f and a full arrow to t. (b) The edges traversed by bddprop, when E[y1] = 1, E[z2] = 1 and E[v] = 2 otherwise, are shown doubled. (Diagrams not reproducible in this text.)

5 Experimental Results
We have built a prototype set bounds solver implementing the algorithms described. Currently a Prolog engine takes the definition of the problem and uses an interface to the BDD package CUDD [10] to construct the generic BDDs. It then creates a C file for a backtracking solver with data structures for the BDDs. This prototype is very expensive in terms of compilation time, ranging from 0.36–4.65s for Steiner and 0.52–2.42s for golfers, but the actual BDD creation time is a tiny proportion of this, at most 30ms and usually unmeasurable (0ms). In a direct implementation the compilation time will effectively shrink to the BDD creation time. Experiments were conducted on a 2.66GHz Core2 Duo with 2 GB of RAM running Ubuntu GNU/Linux 7.04. We compare against the state of the art set bounds propagators of Gecode 2.0 [3].

Steiner Systems. A commonly used benchmark for set constraint solvers is the calculation of small Steiner systems. A Steiner system S(t, k, N) is a set X of cardinality N and a collection C of subsets of X of cardinality k (called 'blocks'), such that any t elements of
X are in exactly one block. Any Steiner system must have exactly m = C(N, t)/C(k, t) blocks (Theorem 19.2 of [9]), where C(·, ·) denotes the binomial coefficient. We use the same modelling of the problem as [8], extended for the case of more general Steiner systems. We model each block as a set variable s1, ..., sm, with the constraints:
⋀_{i=1}^{m} (|s_i| = k)  ∧  ⋀_{i=1}^{m−1} ⋀_{j=i+1}^{m} ( (∃u_{ij}. u_{ij} = s_i ∩ s_j ∧ |u_{ij}| ≤ t − 1) ∧ (s_i < s_j) )
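The number of blocks and the quadratic number of pairwise constraints follow directly from this model; a symbolic sketch (ours; constraints are returned as tuples, not BDDs):

    from itertools import combinations
    from math import comb

    def steiner_model(t, k, N):
        m = comb(N, t) // comb(k, t)            # Theorem 19.2 of [9]
        blocks = [f"s{i}" for i in range(1, m + 1)]
        cards = [(b, k) for b in blocks]        # |si| = k
        pairs = list(combinations(blocks, 2))   # |si ∩ sj| <= t-1 and si < sj
        return blocks, cards, pairs

    blocks, cards, pairs = steiner_model(2, 3, 7)   # S(2,3,7)
    assert len(blocks) == 7 and len(pairs) == 21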
To compare the raw performance of the bounds propagators, we performed experiments using a model of the problem with primitive constraints and intermediate variables u_{ij} directly as shown above, equivalent to the Gecode model. The results are shown in the "Split Constraints" section of Table 1. Gecode has slightly better search behaviour than our solver because its set bounds propagators take into account cardinality information. Clearly, the raw propagation speed of the BDD solver is better than Gecode's, except for the cases where N is large. Note that the BDD solver of [6] cannot handle the largest four Steiner problems with split constraints, because there are too many Boolean variables for the BDD package. Of course, the BDD representation permits us to merge primitive constraints and remove intermediate variables, allowing us to model the problem as C(m, 2) binary constraints (containing no intermediate variables u_{ij}) corresponding to the second line above, conjoined with the cardinality constraints for s_i and s_j. Results for this improved model are shown in the "Merged Constraints" section of Table 1. Here the search is reduced and the propagation speed usually significantly increased, though filtering is less beneficial.

Social Golfers. Another common set benchmark is the "Social Golfers" problem, which consists of arranging N = g × s golfers into g groups of s players for each of w weeks, such that no two players play together more than once. Again, we use the same model as [8], using a w × g matrix of set variables v_{ij} where 1 ≤ i ≤ w and 1 ≤ j ≤ g. Gecode is restricted to use separate constraints, while the BDD solver uses merged constraints.
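Since the golfers model of [8] is only referenced here, the following is our reconstruction of its usual shape (variable names and encoding are assumptions): a w × g matrix of group variables with cardinality, weekly-partition and pairwise-overlap constraints:

    from itertools import combinations

    def golfers_model(w, g, s):
        v = [[f"v_{i}_{j}" for j in range(g)] for i in range(w)]
        cards = [(x, s) for row in v for x in row]     # |v_ij| = s
        weeks = [(row, g * s) for row in v]            # each week partitions all g*s players
        groups = [x for row in v for x in row]
        overlaps = list(combinations(groups, 2))       # any two groups share <= 1 player
        return v, cards, weeks, overlaps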
Table 1. Performance results on Steiner Systems: first solution (F) and all solutions (A). Time in seconds for 1000 runs (first solution problems) and one run (all solutions) and number of failures are given for Gecode and the BDD solver for split constraints and the BDD solver for merged constraints. Two times for the BDD set bounds solver are shown: time without filtering and time+f with filtering. A first-fail “element-in-set” labelling strategy is used in all cases. “—” denotes failure to complete a test case within 240 minutes. × denotes a case where our naive trailing implementation for filtering runs out of space.
Problem                       Gecode                Split Constraints                 Merged Constraints
                        time       fails       time       time+f     fails       time      time+f    fails
S(2,3,7)    F           0.41       2           0.30       0.24       2           0.12      0.11      0
S(3,4,8)    F           5.43       14          1.70       1.56       14          0.96      0.90      2
S(2,3,9)    F           47.98      395         37.04      14.50      542         5.05      5.69      121
S(2,4,13)   F           3.33       4           3.58       1.98       4           2.24      2.17      2
S(2,3,15)   F           19.86      6           29.01      16.54      6           15.61     15.66     3
S(3,4,16)   F           1688.81    90          431.53     ×          90          474.22    530.52    58
S(2,5,21)   F           14.97      4           20.50      19.68      4           19.38     19.68     3
S(3,6,22)   F           495.9      118         271.14     243.27     142         554.44    668.509   96
S(2,3,31)   F           1098.82    14          1659.71    1891.04    14          1198.95   1301.64   11
S(2,3,7)    A           0.27       6.10×10^3   0.29       0.22       1.17×10^4   0.01      0.02      1.07×10^3
S(3,4,8)    A           1018.84    6.36×10^6   10108.89   7875.12    1.44×10^7   58.93     58.95     4.32×10^5
S(2,3,9)    A           2593.03    3.15×10^7   —          —          —           287.05    324.10    8.81×10^6
Table 2. First-solution performance results on the Social Golfers problem. Time in seconds for 100 runs and number of failures are given for both solvers. A first-fail "element-in-set" labelling strategy is used in all cases.

            Gecode                  Merged Constraints
Problem     time      fails         time      time+f    fails
2-5-4       0.33      14            0.21      0.14      30
2-6-4       7.71      860           5.77      2.55      2036
2-7-4       34.30     2935          19.58     8.9       4447
3-5-4       0.81      14            0.46      0.44      30
3-6-4       18.57     863           22.91     11.82     2039
3-7-4       93.42     2974          64.06     41.77     4492
4-5-4       0.65      1388          0.30      0.26      2886
4-6-5       225.92    5355          298.43    209.22    12747
4-7-4       142.58    2979          137.80    103.25    4498
4-9-4       10.52     54            7.8       5.49      71
5-5-4       149.73    2495          50.73     28.29     2758
5-7-4       308.61    3062          218.58    190.9     4582
5-8-3       5.07      10            3.29      2.36      14
6-5-3       102.84    1621          35.93     17.05     1615
6-6-3       3.06      4             1.74      1.23      5
Experimental results are shown in Table 2. Interestingly, the merged constraints here are not enough to match the pruning of the set bounds propagators of Gecode, which include cardinality considerations. Notwithstanding the greater search space, the BDD set solver is still substantially faster than Gecode. For these examples filtering is always beneficial, sometimes making the solver twice as fast. If we compare against the BDD solver of [6] on these examples, our new solver is around 30 times faster (although the machines used are not identical).
6 Related Work
BDD based set solvers were introduced by [8], originally for domain propagation, and then extended to bounds, split, lex and cardinality propagation [6]. The combination of BDD based set bounds propagation with nogoods was introduced in [7]. Another approach to automatically constructing set bounds propagators is defined in [12]. A similar approach to using BDDs in propagation was previously defined for solving SAT problems in [2]. This approach informally defines a marking approach to BDD propagation, but does not consider sets, generic constraints, or filtering.
7 Conclusion
In this paper we have improved the BDD-based technique of set bounds propagation. The traversal approach to propagation we presented is at least an order of magnitude faster than the previous technique utilizing BDD operations. The prototype implementation of our method is significantly faster than the state of the art set constraint solver of Gecode. As demonstrated by [7], further improvements in the solver performance can be straightforwardly achieved by incorporating nogoods generation [11].
REFERENCES
[1] Randal E. Bryant, 'Graph-based algorithms for Boolean function manipulation', IEEE Trans. Comput., 35(8), 677-691, (1986).
[2] R.F. Damiano and J.H. Kukula, 'Checking satisfiability of a conjunction of BDDs', in Proceedings of the Design Automation Conference, pp. 818-823, (2003).
[3] Gecode. www.gecode.org. Accessed Jan 2008.
[4] Carmen Gervet, 'Interval propagation to reason about sets: Definition and implementation of a practical language', Constraints, 1(3), 191-246, (1997).
[5] P. Hawkins, V. Lagoon, and P.J. Stuckey, 'Set bounds and (split) set domain propagation using ROBDDs', in 17th Australian Joint Conference on Artificial Intelligence, volume 3339 of LNCS, pp. 706-717, (2004).
[6] P. Hawkins, V. Lagoon, and P.J. Stuckey, 'Solving set constraint satisfaction problems using ROBDDs', Journal of Artificial Intelligence Research, 24, 106-156, (2005).
[7] P. Hawkins and P.J. Stuckey, 'A hybrid BDD and SAT finite domain constraint solver', in Proceedings of the 8th International Symposium on Practical Aspects of Declarative Languages, volume 3819 of LNCS, pp. 103-117, (2006).
[8] V. Lagoon and P.J. Stuckey, 'Set domain propagation using ROBDDs', in Proceedings of the 10th International Conference on Principles and Practice of Constraint Programming, volume 3258 of LNCS, pp. 347-361, (2004).
[9] J.H. van Lint and R.M. Wilson, A Course in Combinatorics, Cambridge University Press, 2nd edn., 2001.
[10] Fabio Somenzi. CUDD: Colorado University Decision Diagram package. http://vlsi.colorado.edu/~fabio/CUDD/. Accessed May 2004.
[11] S. Subbarayan, 'Efficient reasoning for nogoods in constraint solvers with BDDs', in Proceedings of the Tenth International Symposium on Practical Aspects of Declarative Languages, volume 4902 of LNCS, pp. 53-57, (2008).
[12] G. Tack, C. Schulte, and G. Smolka, 'Generating propagators for finite set constraints', in Twelfth International Conference on Principles and Practice of Constraint Programming, volume 4204 of LNCS, pp. 575-589, (2006).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-510
A New Approach for Solving Satisfiability Problems with Qualitative Preferences

Emanuele Di Rosa, Enrico Giunchiglia and Marco Maratea (DIST, Università di Genova, Italy, email: {emanuele,enrico,marco}@dist.unige.it)

Abstract. The problem of expressing and solving satisfiability problems (SAT) with qualitative preferences is central in many areas of Computer Science and Artificial Intelligence. In previous papers, it has been shown that qualitative preferences on literals allow for capturing qualitative/quantitative preferences on literals/formulas, and that an optimal model for a satisfiability problem with qualitative preferences on literals can be computed via a simple modification of the Davis-Logemann-Loveland procedure (DLL): given a SAT formula, an optimal solution is computed by simply imposing that DLL branches according to the partial order on the preferences. Unfortunately, it is well known that introducing an ordering on the branching heuristic of DLL may cause an exponential degradation in its performance. The experimental analysis reported in these papers highlights that such degradation can indeed show up in the presence of a significant number of preferences. In this paper we propose an alternative solution which does not require any modification of the DLL heuristic: once a solution is computed, a constraint is added to the input formula imposing that the new solution (if any) has to be better than the last computed. We implemented this idea, and the resulting system can lead to significant improvements wrt the original proposal when dealing with MIN-ONE/MAX-SAT problems corresponding to qualitative preferences on structured instances.
1 Introduction
The problem of expressing and solving satisfiability problems with qualitative preferences is central in many areas of Computer Science and Artificial Intelligence. For instance, in planning, besides the goals that have to be achieved, it is common to have other "soft" goals that it would be desirable to satisfy: a plan is a solution which achieves all the goals, and an "optimal" plan is one which also achieves as many soft goals as possible. In planning as satisfiability [16] with soft goals [13], the task of finding an optimal plan is reduced to a satisfiability problem with qualitative preferences. Here, for simplicity, we consider qualitative preferences on literals, in which preferences are modeled as a set S of literals, and the relative importance of satisfying each literal in the set S is captured with a partial order on S. In [12, 13], it has been shown that
1. qualitative preferences on formulas and quantitative preferences on literals/formulas can be reduced to qualitative preferences on literals; and
2. it is possible to compute an optimal solution (wrt the expressed preferences) via a simple modification of the Davis-Logemann-Loveland procedure (DLL): in more detail, an optimal solution
is computed by imposing that branching occurs according to the partial order on the literals in the set of preferences. This method for computing an optimal solution has the advantage that it only requires a simple modification of existing state-of-the-art SAT solvers, all of which are based on DLL. However, it is well known that introducing an ordering on the branching heuristic of DLL may cause an exponential degradation in its performance [15]. OPTSAT is the name given to the related system built on top of MINISAT [10]. The experimental analysis reported in [12, 13] highlights that such degradation can show up in the presence of a significant number of preferences. In this paper we propose an alternative solution which does not require any modification of the DLL heuristic, and which thus does not have the above mentioned disadvantage. In a few words, once a solution is computed, a blocking formula is added to the input formula imposing that the new solution (if any) will be better than the last computed wrt the expressed qualitative preference on literals (a sketch of this loop is given at the end of this section). Our approach works with any qualitative preference on literals, and thus (via the reductions described in [12, 13]) with any qualitative/quantitative preference on literals/formulas. We extended OPTSAT in order to incorporate this new method. In the following, we use OPTSAT-HS to refer to OPTSAT when using the method described in [12], and OPTSAT-BF to refer to OPTSAT when using the method described here. To comparatively test the effectiveness of the approach, we consider MAX-SAT and MIN-ONE problems, in their non-partial/partial (in the partial MIN-ONE (resp. MAX-SAT) problem, the optimization has to be performed on a subset of the variables (resp. clauses) of the problem) and qualitative/quantitative versions, as in [12]. Our selection of benchmarks includes problems from the last MAX-SAT evaluation (http://www.maxsat07.udl.es/) and well-known satisfiability planning problems, and does not include problems with a (pseudo-)random structure. Indeed, OPTSAT is based on MINISAT, and MINISAT has been designed to solve large but relatively easy industrial SAT problems (and not small but relatively difficult randomly generated problems). In the qualitative case of (partial) MIN-ONE and MAX-SAT problems, the experimental results show that OPTSAT-BF performs better than OPTSAT-HS. The reasons for the good performance of OPTSAT-BF are: 1. the good quality of the first computed solution, and 2. the few iterations required to reach the optimal solution. In the quantitative case, OPTSAT-BF is also competitive with respect to the other state-of-the-art systems for MAX-SAT, including the best-performing systems in the recent PB and MAX-SAT evaluations. Summing up, the main contributions of the paper are:
• We define a new approach for solving satisfiability problems with qualitative preferences.
• We formally state some properties of our algorithm.
• We extend OPTSAT in order to implement this new approach.
• On (partial) MAX-SAT and MIN-ONE non-(pseudo-)random problems, we show that OPTSAT-BF performs better than OPTSAT-HS in the qualitative case, and that it is competitive wrt other state-of-the-art systems in the quantitative case.
The paper is structured as follows. In Section 2 we review our formalism for expressing preferences. Section 3 is dedicated to the presentation of the algorithm behind OPTSAT-BF and its formal properties. Section 4 presents the experimental analysis we conducted. Section 5 ends the paper with some final remarks.
2 Satisfiability and Qualitative Preferences
Consider a finite set P of variables. A literal is a variable x or its negation ¬x. We assume ¬¬x = x. A clause is a finite disjunction of literals and a formula is a finite conjunction of clauses. As customary in SAT, we also represent clauses as sets of literals and formulas as sets of clauses, and we use ⊤ and ⊥ to denote the empty set of clauses and the empty clause, respectively. For example, given the 4 variables Fish, Meat, RedWine, WhiteWine, the formula

  {{¬Fish, ¬Meat}, {¬RedWine, ¬WhiteWine}}   (1)
models the fact that we cannot have both fish (Fish) and meat (Meat), nor both red (RedWine) and white (WhiteWine) wine. An assignment is a consistent set of literals. If l ∈ μ, we say that both l and ¬l are assigned by μ. An assignment μ is total if each literal l is assigned by μ. A total assignment μ satisfies a formula ϕ if for each clause C ∈ ϕ, C ∩ μ ≠ ∅. A model μ of a formula ϕ is an assignment satisfying ϕ. A formula ϕ entails a formula ψ if the models of ϕ are a subset of the models of ψ. For instance, (1) has 9 models. In the following, we abbreviate a total assignment with the set of variables assigned to true, and we write μ |= ψ to indicate that μ is a model of ψ. For instance, we write {Fish, WhiteWine} as an abbreviation for the total assignment {Fish, ¬Meat, WhiteWine, ¬RedWine}, in which the only variables assigned to true are Fish and WhiteWine, i.e., the situation in which we have fish and white wine. A qualitative preference on literals is a partially ordered set of literals, i.e., a pair ⟨S, ≺⟩ where S is a set of literals (also called the set of preferences), and ≺ is a partial order on S. Intuitively, S represents the set of literals that we would like to have satisfied, and ≺ models the relative importance of our preferences. For example,

  ⟨{Fish, RedWine, WhiteWine}, {WhiteWine ≺ RedWine}⟩   (2)
models the case in which we prefer to have fish and both red and white wine; in the case in which it is not possible to have both red and white wine, we prefer white to red wine. A qualitative preference ⟨S, ≺⟩ on literals can be extended to the set of total assignments as follows: Given two total assignments μ and μ′, μ is preferred to μ′ (μ ≺ μ′) if and only if
1. there exists a literal l ∈ S with l ∈ μ and l ∉ μ′; and
2. for each literal l′ ∈ S ∩ (μ′ \ μ), there exists a literal l ∈ S ∩ (μ \ μ′) such that l ≺ l′.
A model μ of a formula ϕ is optimal if it is a minimal element of the partially ordered set of models of ϕ. For instance, considering the qualitative preference (2), the formula (1) has only one optimal model, i.e., {Fish, WhiteWine}.
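To make the definition concrete, here is a minimal Python sketch of the preference check on total assignments; the encoding (literals as (variable, polarity) pairs, total assignments as their sets of true variables) and all names are ours, not part of the paper.

    # A total assignment is given as the set of variables assigned to true
    # (the abbreviation used in the text); S is a set of preferred literals,
    # and 'order' is the strict partial order as a set of pairs (l, l2)
    # meaning l precedes (is more important than) l2.

    def holds(lit, mu):
        var, positive = lit
        return (var in mu) == positive

    def preferred(mu, mu2, S, order):
        """True iff mu is preferred to mu2 wrt (S, order)."""
        # Condition 1: some preference holds in mu but not in mu2.
        if not any(holds(l, mu) and not holds(l, mu2) for l in S):
            return False
        # Condition 2: every preference gained by mu2 is outweighed by a
        # more important preference gained by mu.
        gained_by_mu2 = [l for l in S if holds(l, mu2) and not holds(l, mu)]
        gained_by_mu = [l for l in S if holds(l, mu) and not holds(l, mu2)]
        return all(any((l, l2) in order for l in gained_by_mu)
                   for l2 in gained_by_mu2)

    # Preference (2): S = {Fish, RedWine, WhiteWine}, WhiteWine before RedWine.
    S = {("Fish", True), ("RedWine", True), ("WhiteWine", True)}
    order = {(("WhiteWine", True), ("RedWine", True))}
    assert preferred({"Fish", "WhiteWine"}, {"Fish", "RedWine"}, S, order)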
We recall that qualitative preferences on formulas can be reduced to qualitative preferences on literals (see [13]), and that, by propositionally encoding the objective function to maximize/minimize, it is also possible to reduce quantitative preferences to qualitative ones, see [12].
3 Solving satisfiability problems with preferences
Consider a formula ϕ and a qualitative preference on literals ⟨S, ≺⟩. The problem of computing an optimal model of ϕ wrt ⟨S, ≺⟩ can be solved by
1. computing a (not necessarily optimal) model μ of ϕ,
2. adding a formula which restricts the subsequent search for models to those which are preferred to μ, and
3. iterating the above two steps up to the point where the last assignment found can no longer be improved.
Crucial for the above procedure is a condition which enables us to say which models are preferred (wrt ⟨S, ≺⟩) to an assignment μ. The preference formula for μ wrt ⟨S, ≺⟩ is

  (∨_{l∈S, ¬l∈μ} l) ∧ (∧_{l′∈S, l′∈μ} ((∨_{l∈S, ¬l∈μ, l≺l′} l) ∨ l′)).   (3)
An assignment μ′ is preferred to μ wrt ⟨S, ≺⟩ iff μ′ satisfies (3), as stated by the following theorem.
Theorem 1 Let μ and μ′ be two total assignments. Let ⟨S, ≺⟩ be a qualitative preference. μ′ is preferred to μ wrt ⟨S, ≺⟩ if and only if μ′ satisfies the preference formula for μ wrt ⟨S, ≺⟩.
As an example of the application of the theorem above, consider the following particular cases:
1. S ⊆ μ (e.g., because there are no preferences, S = ∅): In this case (3) is equivalent to ⊥, meaning that there is no assignment which is preferred to μ, i.e., that μ is already optimal;
2. ⟨S, ≺⟩ = ⟨{l1, ..., ln}, ∅⟩: In this case (3) becomes (∨_{l∈S, ¬l∈μ} l) ∧ (∧_{l′∈S, l′∈μ} l′), meaning that any assignment μ′ with μ′ ≺ μ must be such that μ ∩ S ⊂ μ′ ∩ S.
Considering the preference (2),
1. if μ1 = {Meat, RedWine}, then (3) is ψ1: (Fish ∨ WhiteWine) ∧ (WhiteWine ∨ RedWine);
2. if μ2 = {Meat, WhiteWine}, then (3) is ψ2: (Fish ∨ RedWine) ∧ WhiteWine;
3. if μ3 = {Fish, WhiteWine}, then (3) is ψ3: RedWine ∧ Fish ∧ WhiteWine.
Notice that μ2 ≺ μ1 and μ3 ≺ μ2: As a consequence ψ2 entails ψ1 and ψ3 entails ψ2. Further, as the last example makes clear, it is indeed possible that the preference formula for an assignment is inconsistent with the given set of constraints; this is an obvious consequence of the fact that the definition of (3) does not take the input formula into account. In the case in which the preference formula for an assignment μ is inconsistent with the input set of clauses, μ is optimal.
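Read operationally, Theorem 1 yields a solve-and-block loop. The sketch below (ours, reusing holds from the previous sketch) builds the clauses of (3) and iterates a black-box complete SAT solver; the callable solve is an assumption of ours, not part of OPTSAT.

    def preference_clauses(mu, S, order):
        """Clauses of the preference formula (3) for the total assignment mu
        (given as its set of true variables): any total assignment satisfying
        them is preferred to mu wrt (S, order)."""
        falsified = [l for l in S if not holds(l, mu)]
        clauses = [list(falsified)]                   # left conjunct of (3)
        for l2 in (l for l in S if holds(l, mu)):     # right conjunct of (3)
            clauses.append([l for l in falsified if (l, l2) in order] + [l2])
        return clauses

    def optimal_model(phi, S, order, solve):
        """'solve' is any black-box complete SAT solver taking a clause list
        and returning a total assignment (set of true variables) or None."""
        mu_opt = None
        clauses = list(phi)
        while True:
            mu = solve(clauses)
            if mu is None:
                return mu_opt              # None iff phi is unsatisfiable
            mu_opt = mu
            # By Theorem 3 below, the previous blocking clauses are entailed
            # by the new ones, so they are discarded rather than accumulated.
            clauses = list(phi) + preference_clauses(mu, S, order)

Note that when every preference in S holds in mu, the first blocking clause is empty, so the next call to solve fails and mu is reported optimal, matching particular case 1 above.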
⟨S, ≺⟩ := a qualitative preference on literals; ϕ := the input formula;
ψ := ⊤; μopt := ∅

function PREF-DLL(ϕ ∪ ψ, μ)
1  if (⊥ ∈ (ϕ ∪ ψ)μ) return FALSE;
2  if (μ is total) μopt := μ; ψ := Reason(μ, ⟨S, ≺⟩); return FALSE;
3  if ({l} ∈ (ϕ ∪ ψ)μ) return PREF-DLL(ϕ ∪ ψ, μ ∪ {l});
4  l := ChooseLiteral(ϕ ∪ ψ, μ);
5  return PREF-DLL(ϕ ∪ ψ, μ ∪ {l}) or PREF-DLL(ϕ ∪ ψ, μ ∪ {¬l}).
Figure 1. The algorithm of PREF-DLL.
As we have already said at the beginning of the section, Theorem 1 allows us to use any complete SAT solver as a black box for computing an optimal assignment. Once a model μ of a formula ϕ is found, the formula (3) is computed and added to ϕ, and then the SAT solver can be invoked again: The returned model is ensured to be preferred to μ. However, given that all the state-of-the-art systems are based on DLL, it is possible, following what has been successfully done in various areas of automated deduction (see, e.g., [2]), to add the formula (3) as soon as μ is determined, i.e., during the search. The resulting procedure is represented in Figure 1. In the figure:
• ϕ is the input set of clauses, ⟨S, ≺⟩ is a qualitative preference on literals, μopt is the (current) optimal assignment, ψ is the set of clauses corresponding to the preference formula for μopt wrt ⟨S, ≺⟩, and μ is an assignment;
• (ϕ ∪ ψ)μ is the set of clauses obtained from ϕ ∪ ψ by (i) deleting the clauses C ∈ ϕ ∪ ψ with μ ∩ C ≠ ∅, and (ii) substituting the other clauses C ∈ ϕ ∪ ψ with C \ {l : ¬l ∈ μ};
• Reason(μ, ⟨S, ≺⟩) returns the set of clauses corresponding to the preference formula for μ wrt ⟨S, ≺⟩;
• ChooseLiteral(ϕ ∪ ψ, μ) returns a literal in ϕ ∪ ψ which is unassigned by μ.
It is easy to see that PREF-DLL is exactly the same as DLL, except that once a model μ is determined (see line 2),
1. μ is stored in μopt,
2. the preference formula for μ wrt ⟨S, ≺⟩ is stored in ψ, and
3. FALSE is returned.
Notice that PREF-DLL generalizes DLL in the sense that if there are no preferences (i.e., if S = ∅), PREF-DLL behaves as DLL: Indeed, if S = ∅ then any model is optimal, and as soon as one model μ is found, the preference formula for μ wrt ⟨S, ≺⟩ (i.e., ⊥) determines the termination of PREF-DLL.
Theorem 2 Let ϕ be a formula and ⟨S, ≺⟩ a qualitative preference on literals. PREF-DLL(ϕ, ∅) terminates, and then μopt is empty if ϕ is unsatisfiable, and an optimal model of ϕ wrt ⟨S, ≺⟩ otherwise.
Besides the above, one interesting property of PREF-DLL is its "anytime" property: The sequence of models μ1, μ2, ..., μn computed by PREF-DLL is ensured to be such that μi+1 is preferred to μi, i.e., μi+1 ≺ μi (0 < i < n). Thus, PREF-DLL is as fast as DLL in computing the first model of the input set of clauses, and, time permitting, from that point on it can only improve the quality of the model found.
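Figure 1 can be rendered directly as a recursive Python sketch (ours): no unit-propagation data structures and no learning, with clauses and literals encoded as in the sketches above, and ψ/μopt kept in a mutable state dict to mimic the global updates of line 2.

    def neg(l):
        return (l[0], not l[1])

    def reduce_clauses(clauses, mu):
        """(phi u psi)_mu: drop clauses satisfied by mu and delete from the
        remaining clauses the literals falsified by mu."""
        out = []
        for c in clauses:
            if any(l in mu for l in c):
                continue
            out.append([l for l in c if neg(l) not in mu])
        return out

    def pref_dll(phi, mu, variables, S, order, state):
        clauses = reduce_clauses(phi + state["psi"], mu)
        if any(len(c) == 0 for c in clauses):              # line 1
            return False
        if len(mu) == len(variables):                      # line 2: model found
            state["mu_opt"] = set(mu)
            true_vars = {v for (v, positive) in mu if positive}
            state["psi"] = preference_clauses(true_vars, S, order)  # Reason
            return False
        units = [c[0] for c in clauses if len(c) == 1]     # line 3
        if units:
            return pref_dll(phi, mu | {units[0]}, variables, S, order, state)
        free = [v for v in variables
                if (v, True) not in mu and (v, False) not in mu]
        l = clauses[0][0] if clauses else (free[0], True)  # line 4
        return (pref_dll(phi, mu | {l}, variables, S, order, state)      # line 5
                or pref_dll(phi, mu | {neg(l)}, variables, S, order, state))

    def optimise(phi, variables, S, order):
        state = {"psi": [], "mu_opt": set()}    # psi starts as the empty set
        pref_dll(phi, set(), variables, S, order, state)
        return state["mu_opt"]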
Also notice that in Figure 1 we called Reason the procedure for computing the preference formula (3). Indeed, most of the current SAT solvers (at least those meant for applications) are based on learning: As soon as a clause C becomes empty, C is returned and then used by the learning mechanism of the solver to backjump over irrelevant nodes while backtracking and, with learning, to prune the subsequent search of the solver. Such a clause C is often called a "reason" or conflict clause, and it has the property that it is falsified by the assignment μ which caused C to become empty (i.e., for each literal l ∈ C, ¬l ∈ μ). In our case, with solvers based on learning, as soon as the assignment μ is total and no empty clause is detected, we can return the clause C corresponding to the left conjunct of (3) as conflict clause: Indeed, ∨_{l∈S, ¬l∈μ} l is falsified by μ. However, we must also add the other clauses corresponding to (3) to the input set of clauses, since these are needed to ensure that the search will continue looking for another model μ′ of the input formula with μ′ ≺ μ. Fortunately, the clauses added to the input set of clauses do not need to be retained indefinitely (otherwise PREF-DLL could have an exponential blow-up in space): Once a new model μ′ with μ′ ≺ μ is found, we can discard the clauses added because of μ, since they are entailed by the new clauses added because of μ′, as stated by the following theorem.
Theorem 3 Let ⟨S, ≺⟩ be a qualitative preference. Let μ1, μ2, ..., μn be the sequence of models computed by PREF-DLL, and ψ1, ψ2, ..., ψn be the corresponding preference formulas. For each i, 0 < i < n, ψi+1 entails ψi.
In PREF-DLL (see Figure 1), the preference formula ψi for μi is overwritten as soon as a new model μi+1 is determined (line 2). PREF-DLL is thus guaranteed to work in polynomial space in the size of the input formula and qualitative preference.
4 Implementation and experimental analysis
We extended OPTSAT [12] in order to incorporate these ideas. OPTSAT is built on top of MINISAT [10], the 2005 version, winner of the SAT 2005 competition in the industrial benchmarks category (together with the SAT/CNF minimizer SATELITE [9]): This choice was motivated by our interest in solving, in particular, large structured problems coming from applications. The two versions of OPTSAT, OPTSAT-HS and OPTSAT-BF, are the ones that we consider in the case of qualitative preferences. In the case of quantitative preferences, OPTSAT encodes the objective function using the methods described in [23, 3]: Here we used the one based on [23]. Table 1 shows the results for OPTSAT-HS and OPTSAT-BF on a variety of problems detailed below. The table also shows the results for various other state-of-the-art solvers, included for completeness. In particular we considered both
• dedicated solvers for MAX-SAT problems, like BF [6]; MAXSOLVER [24]; TOOLBAR [21, 17] ver. 3.0; MAXSATZ, version submitted to the 2007 Evaluation [18]; and MINIMAXSAT ver. 1.0 [14], abbreviated MMSAT in the table; and
• generic Pseudo-Boolean solvers, like OPBDP ver. 1.1.1 [4]; PBS ver. 2.1 and ver. 4 [1]; MINISAT+ ver. 1.13 [11], abbreviated MSAT+ in the table; GLPPB ver. 0.2, by the same authors of PUEBLO [22], as submitted to the 2007 Evaluation (http://www.eecs.umich.edu/~hsheini/pueblo/); and BSOLO ver. 3.0.17 [19].
MAXSATZ and MINIMAXSAT were the winners of the recent Max-SAT Evaluation 2007 in the "Max-SAT" and "Partial Max-SAT" categories, respectively. MINISAT+ was the solver able to prove
class                     #I   OPTSAT-HS  OPTSAT-BF | OPBDP       PBS4        MSAT+       BSOLO       MAXSATZ     MMSAT       OPTSAT-HS   OPTSAT-BF
1  Partial MINONE         21   77.99(19)  2.7(21)   | -           223.14(15)  43.32(18)   433.21(16)  -           391.21(12)  74.28(21)   69.89(21)
2  MINONE                 26   0.69(26)   0.2(26)   | 85.37(7)    17.56(19)   7.33(24)    115.73(22)  -           87.21(24)   93.24(24)   23.99(25)
3  MAXSAT                 35   26.68(34)  11.25(35) | 20.89(3)    98.55(10)   130.37(31)  192.56(23)  274.38(22)  229.73(21)  218.86(31)  175.12(31)
4  MAXCUT/spinglass        5   0.01(5)    0.01(5)   | 0.99(1)     66.67(1)    0.86(1)     76.57(1)    33.19(3)    1.09(3)     7.56(1)     7.52(1)
5  MAXCUT/dimacs mod      62   0.01(62)   0.01(62)  | 230.33(5)   0.01(2)     247.54(7)   0.01(2)     59.27(52)   194.52(52)  66.86(4)    21.61(3)
6  PSEUDO/garden           7   0.02(7)    0.01(7)   | 2.2(4)      147.58(4)   0.25(5)     30.18(4)    -           4.75(5)     22.8(5)     36.66(5)
7  PSEUDO/logic-synthesis 17   0.03(17)   0.01(17)  | -           85.88(1)    490.36(5)   -           -           81.93(2)    90.36(3)    338.26(3)
8  PSEUDO/primes         148   4.81(130)  0.19(131) | 16.65(85)   18.08(90)   11.52(104)  22.23(94)   -           62.08(107)  31.8(103)   60.59(109)
9  PSEUDO/routing         15   11.69(15)  3.12(15)  | 81.83(5)    102.75(9)   43.74(15)   373.73(8)   -           109.49(14)  41.49(15)   36.1(15)
10 MAXONE/structured      60   0.96(60)   0.13(60)  | 296.26(35)  11.48(60)   2.02(58)    40.96(60)   -           22.5(60)    293(56)     7.87(58)
11 MAXCLIQUE/structured   62   0.01(62)   0.06(62)  | 70.37(16)   23.79(13)   154.39(22)  248.26(14)  -           61.97(36)   54.14(19)   178.04(23)
Table 1. Results for solving satisfiability problems with qualitative (columns 4-5) and quantitative (columns 6-13) preferences. Problems are: partial MIN-ONE (row 1), MIN-ONE (row 2), MAX-SAT (rows 3-5), and partial MAX-SAT (rows 6-11).
unsatisfiability and optimality for a larger number of instances than all the other solvers that entered the Pseudo-Boolean Evaluation 2005 [20], and the best-performing solver (together with BSOLO) also in the Pseudo-Boolean Evaluation 2006, category OPT-SMALLINT-LIN. BSOLO and GLPPB were the best-performing PB solvers in the OPT-SMALLINT-LIN category of the recent Pseudo-Boolean Evaluation 2007. Considering the dedicated solvers for MAX-SAT, we discarded BF, MAXSOLVER and TOOLBAR after an initial analysis because they seem to be tailored for randomly generated problems, and are thus not suited to the problems we consider here. Among the Pseudo-Boolean solvers, we do not show the results for PBS ver. 2.1 and GLPPB because they are almost always slower than PBS ver. 4.0 and BSOLO, respectively, and, ultimately, they manage to solve only a few of the instances we consider. About the benchmarks, we considered a wide set of instances, mainly coming from real-world applications. In particular, we used SATPLAN 2004, release of 10 Feb. 2006, to generate the partial MIN-ONE problems of row 1: In more detail, we considered several domains from previous International Planning Competitions (IPCs), generated the first satisfiable instances with SATPLAN, and, for each such instance, we considered the partial MIN-ONE problem of minimizing the set of action variables set to true. For the MIN-ONE and MAX-SAT problems, we selected well-known satisfiable and unsatisfiable SAT instances from several domains, i.e., Formal Verification instances from the Beijing'96 competition, planning problems from SATPLAN contributed by Kautz and Selman, Data Encryption Standard (DES) instances, quasi-group instances, bounded model checking (BMC) problems used in the original BMC paper [5], and miter-based circuit equivalence benchmarks by Joao Marques-Silva: Each of the satisfiable instances corresponds to a MIN-ONE problem and the results are presented in row 2, while the unsatisfiable instances correspond to the MAX-SAT problems whose results are in row 3. Finally, we also included in our analysis (partial) MAX-SAT problems from the recent MAX-SAT evaluation, rows 4-11: As emerges from the results of this evaluation (see the slides at http://www.maxsat07.udl.es/ms07-pre.pdf), these benchmarks are hard; the performances of the best solvers differ only by a factor; no solver clearly wins; and it is difficult to solve even a single instance more than the other solvers. Each solver has been run using its default settings. All the experiments have been run on a Linux box equipped with a Pentium IV 3.2GHz processor and 1GB of RAM. CPU time is measured in seconds; the timeout has been set to 1800 seconds. In Table 1,
• column 2 is the class of the problems;
• column 3 is the number of instances in the class;
• columns 4-5 are dedicated to qualitative preferences; and
• columns 6-13 are for the quantitative case.
Results for solvers are cumulatively presented as in the reports of the MAX-SAT Evaluations: Given a set of instances, we show the mean CPU time over the solved instances, and the number of solved instances (in parentheses). MAXSATZ can only deal with MAX-SAT problems, and this is why the corresponding results for MIN-ONE and partial MIN-ONE/MAX-SAT are missing. In the qualitative case we can see that OPTSAT-BF (column 5) is consistently better than OPTSAT-HS (column 4), both in mean CPU time and in solved instances: OPTSAT-BF solves the same number of instances as OPTSAT-HS, or more, and in less time, sometimes dramatically (see, e.g., rows 1 and 8), except for row 11, which is nonetheless solved very easily by both solvers. In the quantitative case, OPTSAT-BF performs well also on these benchmarks. We have to remind the reader that these benchmarks do not include many problems from the last evaluations, because of their (pseudo-)random structure, which is not suited to our solver. For fairness, this also implies that it is not clear whether the problems we selected are suited to the other solvers in our analysis. Indeed, we conducted a preliminary analysis on the (pseudo-)random problems we excluded, and we got a different picture, in which other solvers (and in particular MMSAT) emerge.

class                     T1    Q1      #Sols  Tf     Qf
1  Partial MINONE         2.68  45.5    2.5    2.7    44.1
2  MINONE                 0.19  751.6   2      0.2    751.6
3  MAXSAT                 0.05  8605.2  21.2   11.25  8847.6
4  MAXCUT/spinglass       0.01  770.4   2      0.01   770.4
5  MAXCUT/dimacs mod      0.01  695.9   2.2    0.01   701.9
6  PSEUDO/garden          0.01  496     2      0.01   496
7  PSEUDO/logic-synthesis 0.01  152.2   2      0.01   152.2
8  PSEUDO/primes          0.18  368.4   2      0.19   368.4
9  PSEUDO/routing         3.12  58.7    2      3.12   58.7
10 MAXONE/structured      0.12  240.5   8.4    0.13   249.8
11 MAXCLIQUE/structured   0.06  430.4   2      0.06   430.4
Table 2. CPU time for finding the first (column T1) and the optimal (column Tf) solution; 1 + the number of models computed by OPTSAT-BF (column #Sols); quality of the first (column Q1) and of the optimal (column Qf) solution.
In order to understand the good behavior of our algorithm, Table 2 shows, for each class, the average of the CPU times for finding the first (even if not optimal) solution (column T1) and the optimal solution (column Tf); the average quality of the first (column Q1) and of the optimal (column Qf) solution, where quality is measured as the number of variables assigned to true for (partial) MIN-ONE problems, and as the number of satisfied clauses for (partial) MAX-SAT problems; and the average of 1 + the number of models computed by OPTSAT-BF (column #Sols). Looking at the table, we see that the good performance of OPTSAT-BF can be explained by the following factors:
1. the relative quality of the first solution (i.e., Qf/Q1 for rows 1-2 and Q1/Qf for rows 3-11) is usually very high, greater than 0.96; and
2. the number of intermediate solutions generated before the optimal one is low: For 9 classes out of 11, the number in column #Sols is less than or equal to 2.5. Considering that 2 indicates that the first computed model is already optimal, this means that the algorithm converges to an optimal model very quickly.
Finally, note how, for the two classes in which the first solution is of low quality, i.e., rows 3 and 10 in Table 2, the convergence is very different: For the MAXSAT class in row 3, T1 is negligible, and all CPU time is spent in "filling the gap" to the optimal result; while for the MAXONE/structured class, most of the time is spent looking for the first solution. As a consequence, in MAX-SAT (resp. MAXONE/structured) the optimal solution is reached by a series of relatively difficult (resp. easy) intermediate steps.
5 Conclusions
We have defined and implemented a new approach, based on DLL, for solving satisfiability problems with preferences which does not need any modification to the DLL heuristic. The basic idea is that whenever a solution is found, a formula is added to the input set of clauses ensuring that the new model (if any) will be better than the last computed one. The experimental analysis performed on a wide set of, mainly structured, (partial) MAX-SAT and MIN-ONE benchmarks has shown that it leads in most cases to significant improvements when dealing with qualitative preferences, and that it is also competitive with other state-of-the-art systems in the quantitative case. There is a huge literature on expressing and reasoning with preferences, see, e.g., [8] and the various events on preferences taking place every year. If we do not take into account [12, 13], the closest work to ours seems to be the one on CP-nets [7]: In that paper, the authors show that, by exploring the search space according to the partial order on the values of the variables, the first solution determined is guaranteed to be optimal. CP-nets allow for non-Boolean variables, but on the other hand they only allow preferences between values of the same variable to be expressed: Thus, a preference such as "I prefer a to b", where a and b are distinct propositional variables, cannot be directly captured.
REFERENCES
[1] Fadi A. Aloul, Arathi Ramani, Igor L. Markov, and Karem A. Sakallah, 'PBS: A backtrack search pseudo-Boolean solver', in Proc. SAT, (2002).
[2] Alessandro Armando, Claudio Castellini, Enrico Giunchiglia, Fausto Giunchiglia, and Armando Tacchella, 'SAT-based decision procedures for automated reasoning: a unifying perspective', in Mechanizing Mathematical Reasoning: Essays in Honor of Jörg H. Siekmann on the Occasion of His 60th Birthday, volume 2605 of LNCS, Springer Verlag, (2005).
[3] Olivier Bailleux and Yacine Boufkhad, 'Efficient CNF encoding of Boolean cardinality constraints', in Proc. CP, pp. 108-122, (2003).
[4] P. Barth, 'A Davis-Putnam enumeration algorithm for linear pseudo-Boolean optimization', Technical Report MPI-I-95-2-2003, Max Planck Institute for Computer Science, (1995).
[5] A. Biere, A. Cimatti, E. M. Clarke, M. Fujita, and Y. Zhu, 'Symbolic model checking using SAT procedures instead of BDDs', in Proceedings of the 36th Design Automation Conference (DAC'99), pp. 317-320. Association for Computing Machinery, (1999).
[6] Brian Borchers and Judith Furman, 'A two-phase exact algorithm for MAX-SAT and weighted MAX-SAT problems', J. Comb. Optim., 2(4), 299-306, (1998).
[7] Craig Boutilier, Ronen I. Brafman, Carmel Domshlak, Holger H. Hoos, and David Poole, 'CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements', J. Artif. Intell. Res. (JAIR), 21, 135-191, (2004).
[8] Jon Doyle, 'Prospects for preferences', Computational Intelligence, 20(2), 111-136, (2004).
[9] Niklas Eén and Armin Biere, 'Effective preprocessing in SAT through variable and clause elimination', in Theory and Applications of Satisfiability Testing, 8th International Conference, SAT 2005, volume 3569 of Lecture Notes in Computer Science, pp. 61-75. Springer, (2005).
[10] Niklas Eén and Niklas Sörensson, 'An extensible SAT-solver', in Theory and Applications of Satisfiability Testing, 6th International Conference, SAT 2003, Selected Revised Papers, pp. 502-518, (2003).
[11] Niklas Eén and Niklas Sörensson, 'Translating pseudo-Boolean constraints into SAT', Journal on Satisfiability, Boolean Modeling and Computation, 2, 1-26, (2006).
[12] E. Giunchiglia and M. Maratea, 'Solving optimization problems with DLL', in Proc. of 17th European Conference on Artificial Intelligence (ECAI), pp. 377-381, (2006).
[13] Enrico Giunchiglia and Marco Maratea, 'Planning as satisfiability with preferences', in Proc. of 22nd AAAI Conference on Artificial Intelligence, pp. 987-992. AAAI Press, (2007).
[14] Federico Heras, Javier Larrosa, and Albert Oliveras, 'MiniMaxSat: A new weighted Max-SAT solver', in Proc. of Theory and Applications of Satisfiability Testing - SAT 2007, 10th International Conference, volume 4501 of LNCS, pp. 41-55. Springer, (2007).
[15] Matti Järvisalo, Tommi Junttila, and Ilkka Niemelä, 'Unrestricted vs restricted cut in a tableau method for Boolean circuits', Annals of Mathematics and Artificial Intelligence, 44(4), 373-399, (August 2005).
[16] Henry Kautz and Bart Selman, 'Planning as satisfiability', in Proc. ECAI, pp. 359-363, (1992).
[17] Javier Larrosa, Federico Heras, and Simon de Givry, 'A logical approach to efficient Max-SAT solving', Artificial Intelligence, 172, 204-233, (2008).
[18] Chu Min Li, Felip Manyà, and Jordi Planes, 'New inference rules for Max-SAT', Journal of Artificial Intelligence Research (JAIR). To appear, (2007).
[19] V. M. Manquinho and J. P. Marques-Silva, 'On using cutting planes in pseudo-Boolean optimization', Journal on Satisfiability, Boolean Modeling and Computation (JSAT), 2, 209-219, (2006).
[20] Vasco Miguel Manquinho and Olivier Roussel, 'The first evaluation of pseudo-Boolean solvers (PB'05)', Journal on Satisfiability, Boolean Modeling and Computation, 2, 103-143, (2006).
[21] S. de Givry, J. Larrosa, P. Meseguer, and T. Schiex, 'Solving Max-SAT as weighted CSP', in Proc. of 9th International Conference on Principles and Practice of Constraint Programming (CP 2003), volume 2833 of Lecture Notes in Computer Science, pp. 363-376, (2003).
[22] Hossein M. Sheini and Karem A. Sakallah, 'Pueblo: A modern pseudo-Boolean SAT solver', in 2005 Design, Automation and Test in Europe Conference and Exposition (DATE 2005), 7-11 March 2005, Munich, Germany, pp. 684-685. IEEE Computer Society, (2005).
[23] Joost P. Warners, 'A linear-time transformation of linear inequalities into conjunctive normal form', Information Processing Letters, 68(2), 63-69, (1998).
[24] Z. Xing and W. Zhang, 'MaxSolver: An efficient exact algorithm for (weighted) maximum satisfiability', Artificial Intelligence, 164(1-2), 47-80, (2005).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-515
Combining binary constraint networks in qualitative reasoning

Jason Jingshi Li 1,2 and Tomasz Kowalski 1 and Jochen Renz 1,2 and Sanjiang Li 3

Abstract. Constraint networks in qualitative spatial and temporal reasoning are always complete graphs. When one adds an extra element to a given network, previously unknown constraints are derived by intersections and compositions of other constraints, and this may introduce inconsistency to the overall network. Likewise, when combining two consistent networks that share a common part, the combined network may become inconsistent. In this paper, we analyse the problem of combining such binary constraint networks and develop conditions which ensure that combining two networks can never introduce an inconsistency, for a given spatial or temporal calculus. This enables us to maintain a consistent world-view while acquiring new information related to some part of it. In addition, our results enable us to prove other important properties of qualitative spatial and temporal calculi in areas such as representability and complexity.
1 INTRODUCTION
An important ability of intelligent systems is to handle spatial and temporal information. Qualitative calculi such as the Region Connection Calculus (RCC8) [10] or Allen's Interval Algebra (IA) [1] intend to capture such information by representing relationships between entities in space and time. Such calculi have different advantages compared to quantitative spatial and temporal representations such as coordinate systems. They are closer to everyday human cognition, deal well with incomplete knowledge, and can be computationally more efficient than, say, the full machinery of metric spaces. Defining a qualitative calculus is very intuitive. What is required is a domain of spatial or temporal entities, a set of jointly exhaustive and pairwise disjoint (JEPD) relations between the entities of the domain, and a (weak) composition between the relations. These properties are essential for enabling constraint-based reasoning techniques for qualitative calculi [13]. However, not all qualitative calculi that can be defined in this way are equally well suited for representing and reasoning about spatial and temporal information. Consider two consistent sets Θ1, Θ2 of spatial or temporal information. It is clear that if both sets refer to different entities, then combining the two sets will also lead to a consistent set, as there are no potentially conflicting constraints. If the two sets contain information about the same entities, then it is clear that combining the two sets might lead to inconsistencies, as your favourite crime story will amply demonstrate. Here we are interested in a particular kind of combination of sets, namely, combining sets that share only a very small number of entities and where the relationships between the shared entities are identical in both sets. Assume, for example, that Θ1 and Θ2 contain consistent information about the spatial relationships of entities in two adjacent rooms, Θ1 for room 1 and Θ2 for room 2. Assume further that the two rooms are connected by n closed doors such that the relationships between the n doors are exactly the same in Θ1 and Θ2, and the doors are the only entities contained in both sets. Without considering any additional information (e.g. that there is only one computer in total, but both room 1 and room 2 contain the computer according to Θ1 and Θ2), it is common sense that combining both sets Θ1 and Θ2 into Θ = Θ1 ∪ Θ2 cannot lead to an inconsistency. However, as several examples in the literature show [6], there are qualitative calculi where this property is not satisfied and where inconsistencies are introduced when combining two sets that share a small number of entities with identical relations. Such calculi are counterintuitive, and it is questionable whether they should be used for spatial or temporal representation and reasoning at all, as they introduce inconsistencies where there should not be any. Apart from this problem, there are some practically very important advantages of using a qualitative calculus that allows the consistent combination of two consistent sets of information: (1) It opens up the possibility to use divide-and-conquer techniques and to split a large set of qualitative constraints into smaller sets that can be processed independently. This is an essential requirement for hierarchical reasoning and may also speed up reasoning. (2) It becomes possible to ignore or filter additional information if it is clear that it will not affect the information important to us. Unfortunately, there is currently no general way of determining for which qualitative calculi consistent sets can be consistently combined and for which calculi unnecessary inconsistencies are introduced. Some initial results were obtained by Li and Wang [6], where a special case of this problem called one-shot extensibility was analysed. Li and Wang considered the case of consistently extending a consistent atomic set of RCC8 constraints by one additional entity, and showed manually, by an extensive case analysis, that this is always possible for RCC8. Li and Wang also showed that one-shot extensibility is an essential requirement for other important computational properties of a qualitative calculus. In this paper we analyse combinations where two sets share at most two entities and identify a method for automatically testing if this is always possible for a given qualitative calculus. This case is particularly important for different reasons: (1) It provides a purely algebraic and very general proof for one-shot extensibility [6]. (2) It (partially) solves some fundamental questions related to algebraic closure, consistency and (weak) composition, and (3) it provides a purely symbolic test for when a relation algebra is representable.
1 RSISE, The Australian National University, Canberra ACT 0200, Australia, email: jason.li|tomasz.kowalski|jochen.renz@anu.edu.au
2 NICTA Canberra Research Laboratory, Canberra ACT 2601, Australia
3 State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China, email: lisanjiang@tsinghua.edu.cn
2 PRELIMINARIES
A qualitative calculus such as RCC8 or the Interval Algebra defines relationships over a given set of spatial or temporal entities, the domain D. The basic relations B form a partition of D × D which is jointly exhaustive and pairwise disjoint, i.e., between any two elements of the domain exactly one basic relation holds [7]. RCC8, for example, uses a topological space of extended regions as the domain and defines eight basic relations DC, EC, PO, EQ, TPP, NTPP, TPPi, NTPPi, which are verbalised as disconnected, externally connected, partial overlap, equal, tangential proper part, non-tangential proper part, and the converse relations of the latter two [10]. In this paper we intensively use the precise mathematical definitions of relations, algebras and different algebraic operators, which we summarize in the following. A more detailed overview can be found in [3, 7, 13]. A nonassociative relation algebra (NA) is an algebraic structure A = (A, ∧, ∨, ;, −, ˘, 1', 0, 1), such that
• (A, ∧, ∨, −, 0, 1) is a Boolean algebra;
• (A, ;, ˘, 1') is an involutive groupoid with unit, that is, a groupoid satisfying the following equations:
  (a) x ; 1' = x = 1' ; x   (b) x˘˘ = x   (c) (x ; y)˘ = y˘ ; x˘
• the operations ; and ˘ are normal operators, that is, they satisfy the following equations:
  - x ; 0 = 0 = 0 ; x
  - 0˘ = 0
  - x ; (y ∨ z) = (x ; y) ∨ (x ; z)
  - (x ∨ y) ; z = (x ; z) ∨ (y ; z)
  - (x ∨ y)˘ = x˘ ∨ y˘
• the following equivalences hold: (x ; y) ∧ z = 0 iff (z ; y˘) ∧ x = 0 iff (x˘ ; z) ∧ y = 0. These are called Peircean laws or triangle identities.
A nonassociative relation algebra is a relation algebra (RA) if the multiplication operation (;) is associative. For more on relation algebras and nonassociative relation algebras see [4] and [8]. Let A be a NA. For any set U, called a domain, let R(U) be the algebra (℘(U × U); ∪, ∩, ◦, −, ⁻¹, Δ, ∅, U × U), where the operations are union, intersection, composition, complement, converse, the identity relation, the empty relation and the universal relation (all with their standard set-theoretical meaning). Notice that since ◦ is associative, R(U) is a RA. We say that A is weakly represented over U if there is a map μ : A → ℘(U × U) such that μ commutes with all operations except ;, for which we require only

  μ(a ; b) ⊇ μ(a) ◦ μ(b).

This property of weak representation gives rise to a notion of weak composition of relations, namely, for R, S ∈ μ[A], we define R ⋄ S to be the smallest relation Q ∈ μ[A] containing R ◦ S. Every NA has a weak representation, for example a trivial one, with U = ∅. Of course, interesting weak representations are nontrivial, and typically injective. A weak representation is a representation if μ is injective and the inclusion above is in fact an equality, that is, if μ is an embedding of relation algebras. In such a case weak composition equals composition [12], and this is expressed by saying that weak composition is extensional. Not every NA, indeed not every RA, is representable.
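To illustrate weak composition, the following Python sketch (ours, not from the paper) computes R ⋄ S as the smallest union of atoms containing R ∘ S, for atoms given as concrete binary relations over a finite domain; the point-algebra example shows that the result can strictly exceed the set-theoretic composition.

    def compose(R, S):
        """Set-theoretic composition of two binary relations."""
        return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

    def weak_compose(R, S, atoms):
        """R <> S: the smallest union of atoms containing R o S."""
        rs = compose(R, S)
        out = set()
        for a in atoms:
            if a & rs:      # the atom meets R o S, so it must be included
                out |= a
        return out

    # The point algebra over the three-element chain 0 < 1 < 2:
    U = range(3)
    lt = {(x, y) for x in U for y in U if x < y}
    eq = {(x, x) for x in U}
    gt = {(x, y) for x in U for y in U if x > y}
    # Weak composition can strictly exceed composition:
    assert weak_compose(lt, gt, [lt, eq, gt]) == lt | eq | gt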
Although weak representations are not as interesting as representations, curiously, it is the former that gave rise to the notion of a qualitative calculus, which is a triple (A, U, μ) where A is a NA, U is a set, and μ : A → ℘(U × U) is a weak representation of A over U. Since (A, U, μ) is notationally cumbersome, we will later write A for both a NA and a corresponding calculus (A, U, μ), if U and μ are clear from context or their precise form is not important. A calculus (A, U, μ) is extensional if μ is a representation of A. Notice that if (A, U, μ) is extensional, then A is a RA, indeed a representable one. The converse need not hold, as the example of RCC8 demonstrates. All NAs considered in this paper are assumed to be finite (hence atomic) and such that 1' is an atom. These are severe restrictions on the class of NAs, but natural from a qualitative-calculi point of view.
A network N over a NA A is a pair (V, ℓ) where V is a set of vertices (nodes) and ℓ : V² → A is any function. Thus, a network is a complete directed graph labelled by elements of A. Abusing notation a little, we will often write N for the set of vertices of N, if it does not cause confusion. Where double precision is important, we will write VN and ℓN for the set of vertices of N and its labelling function, respectively. For convenience we assume that the set V of nodes is always a set of natural numbers. We will also frequently refer to the label on the edge (i, j) as Rij. A network M is a subnetwork of N if all nodes and labels of M are also nodes and labels of N. We will write M ≤ N in such case. A network M is a refinement of N if VM = VN and ℓM(i, j) ≤ ℓN(i, j) for any i, j ∈ V (where ≤ is the natural ordering among the labels as elements of A). A network is atomic if all the labels are basic relations (atoms) of A. To indicate atomicity we will sometimes use lower-case labels rij. A network N is algebraically closed (a-closed) if the following hold:
1. Rii is the equality relation (identity element of A);
2. Rij ⋄ Rjk ≥ Rik for any i, j, k ∈ N.
Networks may be viewed as approximations to (weak) representations; indeed, if μ is a weak representation of A over a domain U, then μ[A] is an a-closed network over A. An arbitrary network N over A is consistent with respect to a weak representation μ if N is a subnetwork of μ[A]. This paper is mostly concerned with combining a-closed networks without introducing inconsistencies. Let N0, N1, N2 be a-closed networks, such that N0 ≤ N1 and N0 ≤ N2. The triple (N0, N1, N2) is called a V-formation. A V-formation (N0, N1, N2) can be amalgamated if there is an a-closed network M such that N1 ≤ M and N2 ≤ M. Such an M is called an amalgam of N1 and N2 over N0, or just an amalgam if the rest is clear from the context. Notice that we do not formally require VM = VN1 ∪ VN2. However, if an amalgam M exists, its restriction to M′ ≤ M with VM′ = VN1 ∪ VN2 is an amalgam as well, so we can always assume that M only has nodes from N1 and N2.
Definition 1 (Network Amalgamation Property) Let A be a qualitative calculus (NA). A has the Network Amalgamation Property (NAP) if any V-formation (N0, N1, N2) of networks over A can be amalgamated by a network M over A.
Clearly NAP is a hard property to come by, so some restrictions are necessary. One such restriction calls for the common subnetwork N0 to be small in the following sense.
Definition 2 (k-Amalgamation Property) Let A be a qualitative calculus (NA).
A has the k-Amalgamation Property (k-AP) if any V-formation (N0, N1, N2) of networks over A, such that |N0| ≤ k, can be amalgamated by a network M over A.
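The a-closedness conditions translate directly into code. In this sketch (ours), a network is a dict from ordered node pairs to labels (sets of atoms), wc is any weak-composition function lifted to labels (it could, for instance, be tabulated with weak_compose above), and identity is the label of 1'; all names are our assumptions.

    def is_a_closed(nodes, ell, wc, identity):
        """Direct rendering of the two a-closedness conditions."""
        if any(ell[i, i] != identity for i in nodes):       # condition 1
            return False
        return all(ell[i, k] <= wc(ell[i, j], ell[j, k])    # condition 2
                   for i in nodes for j in nodes for k in nodes)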
Figure 1. (a) 3-extensibility and (b) 4-extensibility. Both amalgamate over the edge (1,2); the dashed arrows represent the new edges.
It is obvious that n-AP implies m-AP for n ≥ m. The smallest interesting case for a qualitative calculus is that of 2-AP. We will approach it step by step, beginning with |N1| = |N2| = 3, i.e., the amalgamation of two triangles over a common edge. We will show that this follows from the associativity of A. The next case, namely |N1| = 4 and |N2| = 3 (adding a triangle to a square), turns out to be crucial. We will analyse it in some detail and then show that a certain strong version of this case implies 2-AP for atomic networks.
3 EXTENSIBILITY
In this section we deal with 2-AP for the case with |N2| = 3, which can be seen as extending an a-closed network N1 by a triangle N2 over a common edge. We refer to this as a one-shot extension [6].
Definition 3 ((generic) k-extensibility) Let A be a qualitative calculus (NA) and k a natural number. A is k-extensible if any atomic V-formation (N0, N1, N2) of networks over A, such that |N0| = 2, |N1| = k and |N2| = 3, can be amalgamated by a network M over A. If the Ni (i ∈ {0, 1, 2}) are non-atomic, then A is generically k-extensible (see Figure 1).
Lemma 1 Let A be a RA. If A is associative, then A is generically 3-extensible.
Proof sketch. Let N0 = {1, 2}, N1 = {0, 1, 2} and N2 = {1, 2, 3}. Put R03 = (R01 ⋄ R13) ∩ (R02 ⋄ R23). By associativity, R03 ≠ ∅. We need to show that the triangles {0, 1, 3} and {0, 2, 3} are a-closed. By symmetry it suffices to prove it for {0, 1, 3}, so we need to show three inclusions:

  (R01 ⋄ R13) ∩ (R02 ⋄ R23) ≤ R01 ⋄ R13   (1)
  R13 ≤ R10 ⋄ [(R01 ⋄ R13) ∩ (R02 ⋄ R23)]   (2)
  R01 ≤ [(R01 ⋄ R13) ∩ (R02 ⋄ R23)] ⋄ R31   (3)
The first of these is trivial; the other two follow from relation algebra identities. To show 3-extensibility, put R03 = (r01 ⋄ r13) ∩ (r02 ⋄ r23), where the rij are atoms. Then any refinement of R03 satisfies the inclusions above, so any atomic refinement r03 satisfies them as well.
Since algebras that fail associativity are somewhat pathological, the above lemma is widely applicable. Unlike 3-extensibility, 4-extensibility may fail in associative algebras, indeed even in representable ones. Consider the group Z7 (the integers under addition modulo 7) and for x, y ∈ Z7 define
• xIy if x = y
• xGy if x = y ± 1 (mod 7)
• xBy if x = y ± 2 (mod 7)
• xRy if x = y ± 3 (mod 7)
Figure 2. (a) The RA B9 and (b) the network S that is not 4-extensible.
Then {I, R, G, B} are the atoms of a representable relation algebra. Its representation, using red for R, green for G and blue for B, is shown in Figure 2. This algebra is known as B9 (cf. [5]). Consider the network S = {0, 1, 2, 3} with ℓ(0, 1) = R = ℓ(2, 3), ℓ(0, 3) = B = ℓ(1, 2), ℓ(0, 2) = G = ℓ(1, 3), and ℓ(i, i) = I, ℓ(i, j) = ℓ(j, i). Verifying that S is a-closed but not extensible is left to the reader as an exercise. Since S is atomic, B9 is not 4-extensible. We will return to S twice more in this paper, hence the fancy font. We did not find any equations that would imply 4-extensibility in a manner similar to the role of associativity in Lemma 1. Checking generic 4-extensibility exhaustively takes too long even for a relatively small calculus such as RCC8. However, we could construct RCC8 networks for which generic 4-extensibility fails. Interestingly, all such networks we managed to construct contained relations that are known to be NP-hard (cf. [11]). On the other hand, 4-extensibility can be exhaustively tested by a program that performs checks on all atomic a-closed networks with four nodes.
Theorem 1 If a qualitative calculus (A, U, μ) is extensional and A is not 4-extensible, then a-closure does not decide consistency for networks of atomic relations.
Proof sketch. In an extensional calculus, consistent networks can always be extended one-shot [6]. However, if A is not 4-extensible, then there exists an atomic network N on four nodes that has no a-closed one-shot extension. Therefore N is not consistent. One example of such an algebra is B9, with S in place of N.
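Both claims about S can be machine-checked. The sketch below (ours) derives the atom composition table of B9 from its Z7 representation, verifies that S is a-closed, and searches for an atomic a-closed triangle over the edge (0, 1) that admits no a-closed five-node completion; finding one confirms non-4-extensibility.

    from itertools import combinations, product

    # Atoms of B9 as difference sets modulo 7; I is the identity 1'.
    diff = {"I": {0}, "G": {1, 6}, "B": {2, 5}, "R": {3, 4}}
    atoms = list(diff)

    def wc(a, b):
        """Composition of two atoms, read off the Z7 representation."""
        sums = {(x + y) % 7 for x in diff[a] for y in diff[b]}
        return frozenset(c for c in atoms if diff[c] & sums)

    def wc_label(R, S):
        """Composition lifted to labels (non-empty sets of atoms)."""
        return frozenset().union(*(wc(a, b) for a in R for b in S))

    def key(i, j):
        # All B9 atoms are self-converse, so edges can be stored unordered.
        return (min(i, j), max(i, j))

    def a_closed(n, ell):
        """Condition 2 of a-closedness on all triangles; the diagonal
        (condition 1) is left implicit."""
        return all(ell[key(i, k)] <= wc_label(ell[key(i, j)], ell[key(j, k)])
                   for i in range(n) for j in range(n) for k in range(n)
                   if len({i, j, k}) == 3)

    S = {key(0, 1): "R", key(2, 3): "R", key(0, 3): "B",
         key(1, 2): "B", key(0, 2): "G", key(1, 3): "G"}
    S = {e: frozenset([a]) for e, a in S.items()}
    assert a_closed(4, S)                       # S is a-closed

    labels = [frozenset(c) for m in range(1, 5)
              for c in combinations(atoms, m)]  # all non-empty labels

    def extensible(r04, r14):
        """Does S plus a node 4, with atomic edges r04, r14 to nodes 0
        and 1, admit an a-closed completion (any labels to nodes 2, 3)?"""
        for R24, R34 in product(labels, repeat=2):
            ext = dict(S)
            ext.update({key(0, 4): frozenset([r04]),
                        key(1, 4): frozenset([r14]),
                        key(2, 4): R24, key(3, 4): R34})
            if a_closed(5, ext):
                return True
        return False

    # Atomic a-closed triangles over the edge (0, 1): by the Peircean
    # laws, R in wc(r04, r14) suffices since all atoms are self-converse.
    triangles = [(a, b) for a, b in product(atoms, repeat=2)
                 if "R" in wc(a, b)]
    assert any(not extensible(a, b) for a, b in triangles)

One failing triangle is r04 = r14 = B: the constraints force the edge to node 2 to be labelled R, after which the label of the edge to node 3 is forced to be empty.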
3.1 Strong 4-extensibility
4-extensibility allows two networks of size 3 and 4 respectively to be combined over one edge without introducing inconsistencies. In this section, we show a special case of 4-extensibility that allows us to combine any two atomic networks of arbitrary size over one edge.
Definition 4 (Strong 4-extensibility) Let A be a qualitative calculus (NA). A is strongly 4-extensible if any V-formation (N0, N1, N2) of atomic networks over A, with N0 = {1, 2}, N1 = {0, 1, 2} and N2 = {1, 2, 3, 4}, can be amalgamated by a network M over A such that for all i ∈ N2 \ N0
  Ri0 = (ri1 ⋄ r10) ∩ (ri2 ⋄ r20).
It follows easily by the triangle identities that strong 4-extensibility implies 4-extensibility. The beauty of strong 4-extensibility is that, for a given one-shot extension, the labels for new edges are precisely the
intersections of compositions of labels on existing edges. This property is in fact possessed by both RCC8 and IA, and it can be checked even more efficiently than simple 4-extensibility.
Theorem 2 If a NA A is strongly 4-extensible, then A has the 2-Amalgamation Property if N1, N2 are atomic.
Proof sketch. Let (N0, N1, N2) be a V-formation of atomic networks, with N0 = {0, 1}. Let M = N1 ∪ N2 be the network retaining all the labels from N1 and N2 and with the new labels for edges (x, y), with x ∈ Ni \ Nj and y ∈ Nj \ Ni ({i, j} = {1, 2}), defined by ℓ(x, y) = (rx0 ⋄ r0y) ∩ (rx1 ⋄ r1y). We will show that M is a-closed. Suppose the contrary. Then there is a triangle in M with edges labelled by A, B, C, such that C ≰ A ⋄ B. Now, A, B and C cannot all be edges from Ni (i ∈ {1, 2}), for Ni is a-closed. So at least one of A, B, C is of the form ℓ(x, y) with x ∈ Ni \ Nj and y ∈ Nj \ Ni ({i, j} = {1, 2}). Notice also that at most two of A, B, C can be such (three such edges do not form a triangle). We then have two cases. If there is exactly one such edge among A, B, C, it violates the assumption of 3-extensibility; if there are exactly two such edges, then it violates the assumption of strong 4-extensibility. Thus, M is a-closed as claimed.
The above theorem shows that if the calculus is strongly 4-extensible, then we can amalgamate any two atomic networks over one edge. In the following we show additional benefits of strong 4-extensibility for a qualitative calculus or relation algebra.
Definition 5 (One-Shot Extensibility [6]) A qualitative calculus (A, U, μ) is one-shot extensible if any consistent atomic V-formation (N0, N1, N2) with |N0| = 2 and |N2| = 3 can be amalgamated by a consistent atomic network M.
Corollary 1 If a qualitative calculus A is strongly 4-extensible, and a-closure decides consistency for networks of atomic and universal relations, then A is one-shot extensible.
One-shot extensibility was used in [6] to prove (for certain A) that tractability of a set of relations S is equivalent to tractability of its closure Ŝ under weak composition, intersection and converse. The method from [6] involves numerous manual calculations in the semantics of A. However, if we know that a-closure happens to decide consistency for networks of atomic and universal relations in a qualitative calculus, as it for example does in RCC8 [2], then a simple check of the composition table for strong 4-extensibility is sufficient to prove one-shot extensibility.
Definition 6 (One-Shot Proto-Extensibility) A qualitative calculus (NA) A is one-shot proto-extensible if any atomic V-formation (N0, N1, N2) with |N0| = 2 and |N2| = 3 can be amalgamated by an atomic network M.
One-shot proto-extensibility ensures that the amalgam has an a-closed atomic refinement. Its advantage over one-shot extensibility is that it is a syntactic notion, independent of any (weak) representation. Any representable algebra is trivially one-shot extensible relative to its representation.
Theorem 3 Any one-shot proto-extensible RA is representable.
Proof sketch. Let A be a RA with the required property. We build a representation of A inductively, beginning with any atomic a-closed triangle. At any given stage i, we have constructed an atomic a-closed network Ni. By one-shot proto-extensibility, we can pick any
atomic a-closed triangle T and add it to Ni, in effect amalgamating Ni and T over an edge that they share, obtaining an atomic a-closed network Ni+1. Let N = ∪_{i∈ω} Ni. Define μ by putting μ(a) = {(x, y) : ℓN(x, y) = a} for each atom a ∈ A. By the finiteness of A, each u ∈ A is a join of finitely many atoms. Thus, we can extend μ onto the whole universe of A by setting μ(u) = μ(a1) ∪ · · · ∪ μ(an), where a1, ..., an are the atoms with u = a1 ∨ · · · ∨ an. It can be verified that the μ so defined is a representation of A.
It is not the case that one-shot extensibility implies one-shot proto-extensibility, even for representable algebras. This is connected to the existence of atomic a-closed networks that are not consistent. A counterexample is again provided by B9, which is representable, hence one-shot extensible, but not one-shot proto-extensible, as the network S in Figure 2 shows.
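The construction in the proof of Theorem 2 is mechanical enough to sketch. The following illustration (ours) reuses wc_label and key from the B9 sketch above and assumes, as there, that all atoms are self-converse so that edges can be stored unordered.

    def amalgamate(nodes1, nodes2, ell, shared=(0, 1)):
        """Amalgam of two atomic networks over a shared edge: each new
        cross edge (x, y) is labelled (r_x0 <> r_0y) & (r_x1 <> r_1y)."""
        p, q = shared
        M = dict(ell)
        for x in nodes1 - nodes2:
            for y in nodes2 - nodes1:
                M[key(x, y)] = (wc_label(ell[key(x, p)], ell[key(p, y)])
                                & wc_label(ell[key(x, q)], ell[key(q, y)]))
        return M

By Theorem 2, whenever the algebra is strongly 4-extensible and both input networks are atomic and a-closed, the returned network passes the a-closedness check.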
4 ATOMIC REFINEMENT OF AMALGAMATED NETWORKS
In the previous section we showed that a NA A has the 2-Amalgamation Property over atomic networks if it is strongly 4-extensible. Then, if a calculus (A, U, μ) has the property that a-closure decides consistency for networks of atomic and universal relations, there is always an atomic amalgam of the two networks, hence the calculus is one-shot proto-extensible. However, this is not a satisfactory result, as one-shot proto-extensibility is a purely syntactic concept based on the relation algebra, and we want to be able to prove it without resorting to the semantics of the qualitative calculus. We want a procedure that ensures that the amalgam always has an a-closed atomic refinement. Such a procedure would provide a purely syntactic way to prove one-shot proto-extensibility, and hence representability.
4.1 Flexibility Ordering
Under strong 4-extensibility, each non-atomic relation in the amalgam of two networks over a common edge is precisely the intersection of the two paths from nodes in one network to the other. One way to ensure that there is always an atomic refinement of these relations such that the entire network is a-closed is to have a flexible atom (cf. [9]). A relation algebra with a set of atoms B has a flexible atom a if the following condition holds:

  ∃a ∈ B : ∀b, c ∈ B \ {1'}, a ≤ b ⋄ c

A flexible atom is contained in any composition of two atomic relations, so to make an amalgam atomic and a-closed one would just need to replace all the non-atomic relations in it by the flexible atom. However, requiring a flexible atom is a very strong condition, and we do not know of a qualitative calculus whose associated algebra has this property. Instead, we propose to construct an ordering of atoms that emulates this property when refining amalgams, given that the algebra is strongly 4-extensible. That is, we create a sequence of atomic relations such that for any non-atomic edge R in the amalgam, we can refine it to the first element in the sequence that is contained in R, and the network remains a-closed. Formally, let A be a relation algebra with a set of atoms B and S be a sequence of its atoms. A choice refinement of a non-atomic relation R over S is the first member of S that is a refinement of R.
Definition 7 (Flexibility Ordering) For a strongly 4-extensible relation algebra A, its Flexibility Ordering is a sequence S of atomic relations such that, for any amalgam M of an atomic V-formation
(N0, N1, N2) with |N0| = 2, the non-atomic relations from M can be replaced by their respective choice refinements over S and the resulting network is a-closed.
The idea is that we define a sequence S of atomic relations such that, in any M, when we replace a non-atomic edge R by its choice refinement r over S, it will never be inconsistent with the atomic edges of M, or with atomic edges which arise as choice refinements of other non-atomic relations in M that are prior or equal to r in S. To construct such a sequence, we propose an automated procedure that consists of two parts. First, for a given sequence S that may not cover all cases, we test a new atomic relation r that is not in S to see whether it is compatible with S. That is, for an amalgam M of any two atomic a-closed networks {0, 1, 2} and {1, 2, 3, 4}, in the case that no current member of S is contained in the new edge R03 but r is, we check whether the following hold:
1. If R04 is already atomic, then when we replace R03 with r, the triangle {0, 3, 4} is a-closed.
2. Else, if there exists a choice refinement r04 of R04 over S, then when we replace R03 with r and R04 with r04, the triangle {0, 3, 4} is a-closed.
3. Else, if R04 contains r, then when we replace both R03 and R04 by r, the triangle {0, 3, 4} is a-closed.
If the above hold for all such amalgams M, then r is compatible with S. The second part is the construction of the list itself. Starting from an empty list, we incrementally add atoms that pass the compatibility test with the list, and backtrack when no further candidates can pass the test. It is worth noting that each branch of the search tree may terminate early: e.g., if an atom a is not compatible with the empty ordering, then we do not have to test any orderings with a at their head.
Theorem 4 If a NA is strongly 4-extensible, and it has a Flexibility Ordering, then it is one-shot proto-extensible.
Proof sketch. From Theorem 2 we get a network M that is a-closed, but the new edges between N1 and N2 may not be atomic. However, with a Flexibility Ordering we can refine each of these edges to atomic relations, knowing that similar atomic refinements of other new edges will not introduce an inconsistent triple, since we have checked all possible cases in the construction of the Flexibility Ordering. Therefore the entire network is refined to an atomic and a-closed one, and thus the relation algebra is one-shot proto-extensible.
This general result, together with Theorem 3, allows us to prove the representability of a RA A from its composition table. This means that A can be part of an extensional qualitative calculus (A, U, μ). It also implies that consistency is preserved when amalgamating two atomic a-closed networks over two nodes if we know that a-closure decides consistency for atomic relations.
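A compact rendering (ours) of choice refinement and of the backtracking construction; the predicates compatible (the three-case test above, quantified over all relevant amalgams) and covers (does the ordering refine every non-atomic label that can arise?) are assumed callables left abstract here.

    def choice_refinement(R, ordering):
        """The first atom in the ordering that refines the label R."""
        for a in ordering:
            if a in R:
                return a
        return None

    def find_flexibility_ordering(atoms, compatible, covers, ordering=()):
        """Backtracking construction of a Flexibility Ordering: grow the
        sequence one compatible atom at a time, abandoning a branch as
        soon as no candidate passes the compatibility test."""
        if covers(ordering):
            return list(ordering)
        for a in atoms:
            if a in ordering or not compatible(a, ordering):
                continue
            found = find_flexibility_ordering(atoms, compatible, covers,
                                              ordering + (a,))
            if found is not None:
                return found
        return None     # backtrack: this branch cannot be extended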
4.2 Empirical Evaluations of Flexibility Ordering on RCC8 and Interval Algebra
Both RCC8 and IA are prime candidates on which to test for Flexibility Orderings, as they are well-known and non-trivial calculi in the spatial-temporal domain, and their respective relation algebras are both strongly 4-extensible. For RCC8, the procedure found the Flexibility Ordering (DC, EC, PO, TPP, TPPi), whereas for IA the procedure found (<, di, o, s, oi, f). Hence we have proved from their composition tables that their relation algebras are representable.
Computationally, the worst case of the procedure is O(|B|!). However, this would be extremely rare, as most branches of the search tree terminate earlier than under exhaustive search, trimming away the majority of the potential search space. For IA, with 13 atoms, the procedure found an ordering in 4 seconds on an Intel Core2Duo 2.4GHz processor with 2GB of RAM, and for RCC8 it found a solution in less than a second. Therefore, our procedure is widely applicable.
5 CONCLUSION AND FUTURE WORK
We provided sufficient conditions for amalgamating two atomic networks of any size over a common edge. Hence, for a calculus where a-closure decides consistency for networks that contain only atomic and universal relations, two atomic networks can always be consistently amalgamated. The property of strong 4-extensibility, together with other known results, also tells us when a-closure does not decide consistency for atomic networks. It provides an efficient computational test to check, for a non-extensional calculus, whether complexity results for a set of relations can be transferred to its closure. More importantly, we have provided a procedure that proves that the resulting amalgamated network has an a-closed atomic refinement, independent of any information about the domain of the calculus. This allows us to prove the representability of a relation algebra from its composition table. It also preserves consistency under the amalgamation of two atomic networks over two nodes, if a-closure decides consistency for networks of atomic relations. The first obvious future step is to see whether two atomic a-closed networks can be amalgamated over n nodes for n > 2, and then to extend this to non-atomic networks. It is also interesting to see under what conditions a calculus has the Network Amalgamation Property, that is, under what conditions networks can be combined regardless of the number of shared nodes. Secondly, our proposed notion of one-shot proto-extensibility is a sufficient, but not necessary, condition for the representability of a relation algebra: there are representable relation algebras which are not one-shot proto-extensible. It would be interesting to see if there are any connections between one-shot proto-extensibility and Hirsch-Hodkinson type games [4], and whether Hirsch-Hodkinson games can be interpreted as a sequence of one-shot extensions.
REFERENCES
[1] J.F. Allen, Maintaining knowledge about temporal intervals, Comm. ACM, 26, 832-843, 1983.
[2] B. Bennett, Determining consistency of topological relations, Constraints, 3, 213-225, 1998.
[3] I. Düntsch, Relation algebras and their application in temporal and spatial reasoning, Artif. Intell. Rev., 23-4, 315-357, 2005.
[4] R. Hirsch, I. Hodkinson, Relation algebras by games, Elsevier, 2002.
[5] P. Jipsen, E. Lukács, Minimal relation algebras, Algebra Universalis, 32, 189-203, 1994.
[6] S. Li and H. Wang, RCC8 binary constraint network can be consistently extended, Artificial Intelligence, 170, 1-18, 2006.
[7] G. Ligozat, J. Renz, What is a qualitative calculus? A general framework, Proc. PRICAI'04, 53-64, 2004.
[8] R. Maddux, Relation algebras, Elsevier, 2006.
[9] R. Maddux, Some varieties containing relation algebras, Trans. Amer. Math. Soc., 2, 501-526, 1982.
[10] D.A. Randell, Z. Cui, and A.G. Cohn, A spatial logic based on regions and connection, Proc. KR'92, 165-176, 1992.
[11] J. Renz, Maximal tractable fragments of the Region Connection Calculus: A complete analysis, Proc. IJCAI'99, 448-455, 1999.
[12] J. Renz, G. Ligozat, Weak composition for qualitative spatial and temporal reasoning, Proc. CP'05, 534-548, 2005.
[13] J. Renz, B. Nebel, Qualitative spatial reasoning using constraint calculi, Handbook of Spatial Logics, Springer, 161-215, 2007.
Solving Necklace Constraint Problems
Pierre Flener and Justin Pearson 1
Abstract. Some constraint problems have a combinatorial structure where the constraints allow the sequence of variables to be rotated (necklaces), if not also the domain values to be permuted (unlabelled necklaces), without getting an essentially different solution. We bring together the fields of combinatorial enumeration, where efficient algorithms have been designed for (special cases of) some of these combinatorial objects, and constraint programming, where the requisite symmetry breaking has at best been done statically so far. We design the first search procedure and identify the first symmetry-breaking constraints for the general case of unlabelled necklaces. Further, we compare dynamic and static symmetry breaking on real-life scheduling problems featuring (unlabelled) necklaces.
1 INTRODUCTION
In combinatorics, a necklace of n beads over k colours is the lexicographically smallest element in an equivalence class of the set of k-ary n-tuples under rotations; the underlying symmetry group is the cyclic group Cn acting on the indices. For example, the binary triple 001 is the representative necklace of {001, 010, 100}. Combinatorial objects are enumerated under some chosen total order. For example, under the lexicographic order, the binary 3-bead necklaces are 000, 001, 011, and 111. If the values (colours) of a tuple are interchangeable, then we speak of unlabelled tuples (symmetric group Sk acting on the values) and unlabelled necklaces (product group Cn × Sk). For example, under the lexicographic order, the unlabelled binary 3-tuples are 000, 001, 010, and 011, while the unlabelled binary 3-bead necklaces are 000 (representing the necklaces 000 and 111) and 001 (representing the necklaces 001 and 011). The generating functions for counting (unlabelled) necklaces are given in [6], and the sequences of their counts (for k ≤ 6) can be found in [16].
A constraint satisfaction problem (CSP) is a triple ⟨X, D, C⟩, where X is a sequence of n variables, D is a set of k possible values for these variables and is called their domain, and C is the set of constraints specifying which assignments of values to the variables are solutions. If the constraint set C allows the variable sequence X to be rotated, then a necklace is a combinatorial sub-structure of the CSP and we say that the CSP has rotation variable symmetry. If the domain D contains elements that are interchangeable with respect to C, then we say that the CSP has full value symmetry. Exploiting such symmetry is important in order to solve a CSP efficiently: for example, compare the ternary object counts in Table 1 with 3ⁿ.
CSPs with an (unlabelled) necklace as a combinatorial sub-structure are not unusual. For example, Gusfield [9, page 12] states that “circular DNA is common and important. [sample organisms omitted.] Consequently, tools for handling circular strings may someday be of use in those organisms”. One such problem is studied
1 Department of Information Technology, Uppsala University, Box 337, SE-751 05 Uppsala, Sweden. Email: Firstname.Surname@it.uu.se
in [3]. Necklaces occur in coding theory [7], genetics [7], and music [6], while unlabelled necklaces occur in switching theory [6]. We study a real-life problem with (unlabelled) necklaces in scheduling, different from the one in [8].
In this paper, we propose to bring together combinatorial enumeration and constraint programming (CP). Very efficient combinatorial enumeration algorithms exist for some of the mentioned combinatorial objects, but not for unlabelled necklaces (except over two colours [2]). These algorithms can be used as CP search procedures for CSPs having those objects as combinatorial sub-structures, thereby breaking a lot of symmetry dynamically. This has also been advocated in [13], say, where a generic CP search procedure is proposed for an arbitrary symmetry group acting on the values; however, except for [15], not much dynamic symmetry breaking seems to have been done for groups acting on the variables. Conversely, CP principles can be used for devising enumeration algorithms for the combinatorial objects where efficient algorithms have remained elusive to date.
The contributions of this paper can be summarised as follows:
• Design of an enumeration algorithm, and hence a CP search procedure, for (partially) unlabelled k-ary necklaces (Sections 2 and 4).
• Identification of symmetry-breaking constraints for (partially) unlabelled k-ary necklaces, including filtering algorithms for the identified new global constraints (Sections 3 and 4).
• Experiments on real-world problems validating the usefulness of the proposed dynamic and static symmetry-breaking methods for (partially unlabelled) k-ary necklaces (Section 4).
Finally, in Section 5, we conclude and discuss future research.
In the following, consider a CSP ⟨X, D, C⟩ where X is a sequence of n ≥ 2 variables and D is a set of k ≥ 1 domain values. We assume that D = {0, . . . , k − 1}; this also has the advantage that the order is obvious whenever we require D to be totally ordered.
2 DYNAMIC SYMMETRY BREAKING
Unlabelled Tuples. If the domain values of D are interchangeable, then we impose a total order on D, and the enumeration algorithm of [5], say, can be used to generate all unlabelled tuples (modulo the full value symmetry). We present it as Algorithm 1 in the style of a search procedure in constraint programming (CP), so that it can interact with any problem constraints. The initial call is utuple(1, −1). At any time, j is the index of the next variable to be assigned (and j = n + 1 when none remains) while u is the largest value used so far (and u = −1 when none was used yet). The idea is to try for each variable all the values used so far plus one unused value, since all unused values are still interchangeable at that point. Upon backtracking, the try all construct non-deterministically tries all the alternatives, in the given value order (line 6). Each alternative contains the assignment of the chosen value i to the chosen variable X[j] (line 7) and a recursive call for the next variable (line 8).
1:  procedure utuple(j, u : integer)
2:    var i : integer
3:    if j > n then
4:      return true
5:    else
6:      try all i = 0 to min(u + 1, k − 1) do
7:        X[j] ← i;
8:        utuple(j + 1, max(i, u))
9:      end try
10:   end if

Algorithm 1: Search procedure for unlabelled tuples [5]

1:  procedure necklace(j, p : integer)
2:    var i : integer
3:    if j > n then
4:      return n mod p = 0
5:    else
6:      try all i = X[j − p] to k − 1 do
7:        X[j] ← i;
8:        necklace(j + 1, if i = X[j − p] then p else j)
9:      end try
10:   end if

Algorithm 2: Search procedure for necklaces [2]
1:  procedure uneck(j, p, u : integer)
2:    var i : integer
3:    if j > n then
4:      return n mod p = 0
5:    else
6:      try all i = X[j − p] to min(u + 1, k − 1) do
7:        if probe(j, i, p) then
8:          X[j] ← i;
9:          uneck(j + 1, if i = X[j − p] then p else j, max(i, u))
10:       end if
11:     end try
12:   end if
13: function probe(j, i, p : integer) : boolean
14:   X[j] ← i;
15:   if j = n ∧ n mod (if i = X[j − p] then p else j) = 0 then
16:     return $\bigwedge_{q=2}^{n} \overline{X[q, \ldots, n, 1, \ldots, q-1]} \ge_{lex} X[1, \ldots, n]$
17:   else if j < n then
18:     return $\bigwedge_{q=2}^{j-1} \overline{X[j-q+1, \ldots, j]} \ge_{lex} X[1, \ldots, q]$
19:   else
20:     return false
21:   end if

Algorithm 3: Probing search procedure for unlabelled necklaces ($\overline{Y}$ denotes the minimal renaming of Y, defined in the text)
Note that we have fixed the variable order to be from left to right across X, and the tuples are thus generated in lexicographic order; this is an unnecessary restriction, but the reason for this choice will become clear in a few lines. This algorithm takes constant amortised time and space, and the number of objects generated is actually equal to the number of unlabelled tuples.
Necklaces. If the variable sequence X is circular, then the enumeration algorithm of [2], say, can be used to generate all necklaces (modulo the rotation variable symmetry). We present it as a CP search procedure in Algorithm 2. The initial call is X[0] ← 0; necklace(1, 1), where X[0] is a dummy element. At any time, j is the index of the next variable to be assigned (and j = n + 1 when none remains) while p is the period, explained next. The idea is either to try and keep replicating the values at the previous p positions, or to try all larger values with a new period of j. At any time, the prefix X[1, . . . , j] is a pre-necklace, that is, a prefix of some necklace, which may however be longer than n. The variable order is necessarily from left to right across X, due to the role of p, and the necklaces are thus generated in lexicographic order. This algorithm takes constant amortised time and space, and the number of objects generated is proportional by a constant factor (tending down to (k/(k − 1))² as n → ∞) to the number of necklaces: note that only n-tuples where the period p divides n actually are necklaces (line 4). In other words, not all symmetry is broken at every node of the search tree, and some backtracking is forced (by a constant-time test on p) only at leaf level; at present, loopless necklace enumeration remains elusive. A small executable rendering of this procedure is sketched below.
Unlabelled Necklaces. If the variable sequence X is circular and the domain values of D are interchangeable, then a constant-amortised-time enumeration algorithm [2] only exists for generating all binary (k = 2) unlabelled necklaces (modulo the symmetries). We do not present it here, but instead construct a novel enumeration algorithm for any number of colours. Noting that unlabelled necklaces are a subset of the necklaces (Algorithm 2) that are unlabelled tuples (Algorithm 1), and observing that the control flows of those two algorithms match line by line, the skeleton of an enumeration algorithm for unlabelled necklaces can be obtained simply by “intersecting” those two algorithms, which yields all but lines 7 and 10 of the CP search procedure uneck in Algorithm 3.
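For concreteness, the following Python transcription of Algorithm 2 (our own rendering, with an eager list-collecting style instead of the non-deterministic try all construct) enumerates the k-ary necklaces of length n in lexicographic order:

def necklaces(n, k):
    X = [0] * (n + 1)          # X[0] is the dummy element of the initial call
    found = []
    def rec(j, p):
        if j > n:
            if n % p == 0:     # line 4: only periods dividing n give necklaces
                found.append(tuple(X[1:]))
        else:
            for i in range(X[j - p], k):              # try all i = X[j-p] .. k-1
                X[j] = i
                rec(j + 1, p if i == X[j - p] else j)
    rec(1, 1)
    return found

For example, necklaces(3, 2) returns [(0,0,0), (0,0,1), (0,1,1), (1,1,1)], the four binary 3-bead necklaces listed in the introduction.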
The initial call is X[0] ← 0; uneck(1, 1, −1), where X[0] is a dummy element. We now gradually refine the probe(j, i, p) function (called in line 7), guarding the non-deterministic assignment of value i to the current variable X[j] followed by the continued enumeration.
Leaf Probing. If probe always returns true, then uneck will enumerate a superset of the unlabelled necklaces, as their symmetry group is the product rather than just the union of the symmetry groups for necklaces and unlabelled tuples. For example, the binary 3-necklace 011 will erroneously be returned, even though it can be transformed into the unlabelled necklace 001 (by first rotating the second position of the circular sequence 011 into first position, giving 110, and then minimally renaming its colours, giving $\overline{110} = 001$); however, the necklace 111 will correctly not be returned, since it is not an unlabelled tuple. Consider the left half of Table 1, giving the numbers of various combinatorial objects of length n over 3 colours: column 7 counts the unlabelled tuples (sequence A124302 in [16]); column 6 counts the necklaces (fewer than the unlabelled tuples for n ≥ 7; sequence A1867); column 5 counts the necklaces that are unlabelled tuples, that is, the number of pre-necklaces when probe always returns true; and column 2 counts the unlabelled necklaces (sequence A2076). The difference between columns 5 and 6 (or 7) shows the gain obtained so far for free by Algorithm 3 over Algorithm 2 (or Algorithm 1), but the difference between columns 5 and 2 shows the amount of pruning that leaf probing has to do.
The least that probe(j, i, p) should thus do is to make sure only unlabelled necklaces are enumerated. This is done at the latest when trying to assign the last variable (when j = n) of the CSP: at that moment, the entire circular sequence X is known, so probe must return true if X cannot be transformed (by position rotation and colour renaming) into an object that has already been tried in the enumeration. Since objects are enumerated in lexicographic order (as an inherited feature of the two underlying algorithms), this can be done by checking whether the minimal renaming of every (non-unit) rotation of X is lexicographically larger than or equal to X. Computing the minimal renaming $\overline{Y}$ of an n-tuple Y takes Θ(n) time, and can be merged into the O(n)-time lexicographic comparison; at most n − 1 such renamings and comparisons are done, hence this probing takes O(n²) time at worst.
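A direct, non-incremental Python rendering of this leaf test may help; min_renaming and leaf_probe are our own names, and no attempt is made at the merged Θ(n) renaming-plus-comparison mentioned above:

def min_renaming(t):
    # Relabel the values of tuple t in order of first occurrence.
    ren = {}
    return [ren.setdefault(v, len(ren)) for v in t]

def leaf_probe(X):
    # X is canonical iff the minimal renaming of every non-unit rotation
    # of X is lexicographically >= X itself.
    X = list(X)
    n = len(X)
    return all(min_renaming(X[q:] + X[:q]) >= X for q in range(1, n))

On the example above, leaf_probe([0, 1, 1]) is False, because rotating 011 gives 110, whose minimal renaming 001 is lexicographically smaller than 011.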
n  | unecks (A2076) | n mod p = 0 (int.+leaf) | leaves (int.+leaf) | leaves (leaf only) | necks (A1867) | utuples (A124302) | Algo. 2 time | Cons. (3) time | Algo. 3 time (leaf) | Algo. 3 time (all) | Cons. (1)+(4) time | fails
1  | 1      | 1      | 1      | 1      | 3      | 1       | 0.00  | 0.00  | 0.00  | 0.00  | 0.00   | 0
2  | 2      | 2      | 2      | 2      | 6      | 2       | 0.00  | 0.00  | 0.00  | 0.00  | 0.00   | 0
3  | 3      | 4      | 5      | 5      | 11     | 5       | 0.00  | 0.00  | 0.00  | 0.00  | 0.00   | 0
4  | 6      | 8      | 10     | 13     | 24     | 14      | 0.00  | 0.00  | 0.00  | 0.00  | 0.01   | 2
5  | 9      | 15     | 22     | 36     | 51     | 41      | 0.00  | 0.00  | 0.00  | 0.00  | 0.01   | 6
6  | 26     | 34     | 48     | 97     | 130    | 122     | 0.00  | 0.00  | 0.00  | 0.00  | 0.03   | 9
7  | 53     | 80     | 121    | 268    | 315    | 365     | 0.01  | 0.01  | 0.00  | 0.01  | 0.07   | 29
8  | 146    | 196    | 293    | 732    | 834    | 1094    | 0.01  | 0.02  | 0.02  | 0.02  | 0.18   | 69
9  | 369    | 490    | 744    | 2017   | 2195   | 3281    | 0.04  | 0.04  | 0.06  | 0.06  | 0.50   | 181
10 | 1002   | 1267   | 1920   | 5552   | 5934   | 9842    | 0.11  | 0.11  | 0.20  | 0.16  | 1.48   | 469
11 | 2685   | 3357   | 5104   | 15371  | 16107  | 29525   | 0.24  | 0.30  | 0.63  | 0.49  | 4.54   | 1240
12 | 7434   | 8996   | 13635  | 42624  | 44368  | 88574   | 0.78  | 0.81  | 1.95  | 1.58  | 13.33  | 3298
13 | 20441  | 24403  | 37030  | 118731 | 122643 | 265721  | 2.12  | 2.22  | 6.06  | 4.65  | 41.04  | 8919
14 | 57046  | 66886  | 101354 | 331664 | 341802 | 797162  | 5.91  | 6.24  | 18.82 | 14.50 | 122.46 | 24328
15 | 159451 | 184770 | 279895 | 929883 | 956635 | 2391485 | 16.54 | 17.25 | 58.56 | 44.89 | 374.12 | 66865

Table 1. Numbers of objects of length n over 3 colours, and their enumeration times (in seconds) via dynamic & static (constraint-based) symmetry breaking
Note that a successful probe incurs the highest cost. The algorithmic details are trivial, so we just write a specification into line 16. Lazy evaluation of the conjunction should be made, returning false as soon as one conjunct is false. Also, experiments have revealed that failure is detected earlier on the average if the starting positions of the rotations recede from right to left across X.
An improvement of this leaf probing comes from observing what happens when the lowest value, namely X[j − p], is tried for X[j] when j = n: the recursive call (line 9) then is uneck(n + 1, p, u) and everything hinges on whether n mod p = 0 or not. But the latter check can already be done before probing (in O(n²) time, recall) whether X[j − p] actually is a suitable value for X[n]. For any other tried value i > X[j − p] for X[n], the recursive call (line 9) is uneck(n + 1, n, max(i, u)) and we then know that n mod n = 0. Hence the test in line 15, as well as lines 19 and 20.
Internal Probing. The leaf probing discussed so far assumes that line 18 is replaced by return true. This is unsatisfactory, as no pruning (other than via the p and u parameters) takes place at the internal nodes of the search tree, so that many more leaves are generated than necessary (recall the difference between columns 5 and 2 in Table 1). In the spirit of constraint programming, we ought to perform more pruning when j < n. The idea is the same as for leaves (where j = n) except that only a strict prefix X[1, . . . , j] of the circular sequence X is known, so that we can only check whether the minimal renaming of every suffix of X[1, . . . , j] is lexicographically larger than or equal to the prefix of X[1, . . . , j] of the same length. For example, when searching for a ternary 6-bead unlabelled necklace, assume we have already constructed the pre-necklace 010 and probe(4, 2, 4) is now called to check whether at position j = 4 < 6 = n the variable X[4] can be assigned the (so far unused) value i = 2 = u + 1 = k − 1 under period p = 4, so the following comparisons must be made:

$\overline{2} = 0 \ge_{lex} 0$      (4)
$\overline{02} = 01 \ge_{lex} 01$      (3)
$\overline{102} = 012 \ge_{lex} 010$      (2)
$\overline{0102} = 0102 \ge_{lex} 0102$      (1)
The first and last comparisons will always succeed and can be omitted. Exactly j − 2 such renamings and comparisons of tuples of length O(j − 1) are thus to be done, hence this internal probing also takes O(n²) time at worst, since j = O(n). The algorithmic details are trivial, so we just write a specification into line 18. Again, lazy evaluation of the conjunction should be made. Also, experiments have revealed that failure is detected earlier on the average if the starting positions of the suffixes recede from right to left across X[1, . . . , j], as in the top-down order of the sample comparisons above.
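Under the same assumptions as the leaf-probing sketch above, the internal check can be rendered in Python as follows (min_renaming is repeated to keep the sketch self-contained):

def min_renaming(t):
    ren = {}
    return [ren.setdefault(v, len(ren)) for v in t]

def internal_probe(prefix):
    # Line 18: the minimal renaming of every proper suffix of the known
    # prefix must be >=lex the prefix of X of the same length; the
    # comparisons for q = 1 and q = j always hold and are omitted.
    prefix = list(prefix)
    j = len(prefix)
    return all(min_renaming(prefix[j - q:]) >= prefix[:q] for q in range(2, j))

On the running example, internal_probe([0, 1, 0, 2]) evaluates exactly the non-trivial comparisons (2) and (3) above and returns True.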
To assess the impact of internal probing, consider again the left half of Table 1: column 4 gives the new numbers of pre-necklaces (much lower than in column 5), and column 3 counts the pre-necklaces that are accepted by the test on the period p. The difference between columns 3 and 2 is the amount of pruning that leaf probing now has to do, and the difference between columns 4 and 3 is the amount of pruning done by the period test. Note that the constant-time period test prunes much more than the quadratic-time probing.
Incremental Internal Probing. Empirically, on average, the internal probing just proposed is much more efficient than its O(n²) worst time suggests, due to the nature of unlabelled necklaces. We now optimise this internal probing into an algorithm taking O(n) time at worst, leading to an enumeration that is systematically faster by a constant factor (namely 17% faster in our implementation). The idea is to trade time for space and make the comparisons incremental. Continuing our previous example, having so far constructed the pre-necklace 0102 of a ternary 6-bead unlabelled necklace, probe(5, 1, 5) is eventually called at the next iteration to check whether at position j = 5 < 6 = n the variable X[5] can be assigned the value i = 1 under period p = 5, so the following comparisons must be made:

$\overline{\mathbf{1}} = \mathbf{0} \ge_{lex} \mathbf{0}$      (5′)
$\overline{2\mathbf{1}} = 0\mathbf{1} \ge_{lex} 0\mathbf{1}$      (4′)
$\overline{02\mathbf{1}} = 01\mathbf{2} \ge_{lex} 01\mathbf{0}$      (3′)
$\overline{102\mathbf{1}} = 012\mathbf{0} \ge_{lex} 010\mathbf{2}$      (2′)
$\overline{0102\mathbf{1}} = 0102\mathbf{1} \ge_{lex} 0102\mathbf{1}$      (1′)
Note that the last four comparisons correspond to the ones given earlier, that the considered suffixes of X[1, . . . , j] got longer at the end by the new (boldfaced) value i = 1, and that the minimal renamings of the (non-boldfaced) prefixes remained the same. In other words, only the scalar comparisons of the (boldfaced) last values matter, since the lexicographic ≥lex comparisons of the (non-boldfaced) prefixes have already been made until the previous iteration. If the lexicographic comparison until the previous iteration is =lex , as in formulas (1), (3), and (4), then the scalar comparison operator is ≥ at the current iteration; if the lexicographic comparison until the previous iteration is >lex , as in formula (2), then no scalar comparison need be made at the current iteration. We incrementally maintain a global k × n matrix m, where m[i, j] gives the minimal renaming of value i if the renaming starts at position j. We also incrementally maintain locally to every search-tree node an n-tuple c of Booleans, where c[j] = true if the lexicographic comparison from position j until the previous iteration is =lex , that is if the comparison from j is to continue at the current iteration. For example, since the scalar
comparison in formula (3′) gives 2 > 0, we set c[3] ← false for the next iteration. Using these incremental data structures, the internal probing in line 18 can be replaced by the following specification (the algorithmic details, including the incremental maintenance of c and m, are omitted for space reasons):

return $\bigwedge_{q=2}^{j-1}$ (if c[q] then m[i, q] ≥ X[j + 1 − q] else true)
At most j − 2 scalar comparisons are to be done, hence this incremental internal probing takes O(n) time at worst, since j = O(n) and the incremental maintenance of c[1 . . . j] and m[i, 1 . . . j] takes O(n) time at worst. Lazy evaluation of the conjunction should be made. Failure is detected earlier on the average if the starting positions of the suffixes recede from right to left across X[1, . . . , j], as in the top-down order of the sample comparisons above.
Discussion. An analysis of the amortised complexity of Algorithm 3 is beyond the scope of this paper. Its correctness follows from line 16 capturing the essence of unlabelled necklaces and the correctness of Algorithms 1 and 2. To assess the runtime impact of internal probing, consider the right half of Table 1: the fourth-last and third-last columns give the enumeration times (in seconds) with only leaf probing and with internal probing as well, respectively. (All experiments in this paper were performed under SICStus Prolog v4.0.2 on a 2.53 GHz Pentium 4 machine with 512 MB running Linux 2.6.20.)
3 STATIC SYMMETRY BREAKING
Unlabelled Tuples. To break full value symmetry, it suffices to order the positions of the first occurrences, if any, of each value. Letting firstPos(i) denote the first position, if any, of value 0 ≤ i < k in X under the current assignment, and n + 1 + i otherwise, the following k − 1 binary constraints break full value symmetry [11]: firstPos(0) < firstPos(1) < · · · < firstPos(k − 1). A more efficient filtering algorithm can be designed for the conjunction of these constraints, giving a new global constraint, called

orderedFirstOccurrences(X, D)      (1)
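Purely as a ground-level illustration of what this constraint requires (the paper's filtering works on partial assignments via a DFA, which is not reproduced here), a naive Python checker for fully assigned sequences could read:

def ordered_first_occurrences(X, k):
    # firstPos(i) is the first position of value i in X, or n+1+i if absent.
    n = len(X)
    def first_pos(v):
        return X.index(v) if v in X else n + 1 + v
    return all(first_pos(v) < first_pos(v + 1) for v in range(k - 1))

For instance, ordered_first_occurrences([0, 1, 0, 2], 3) is True, while ordered_first_occurrences([1, 0], 2) is False since value 1 occurs before value 0.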
A checker for this global constraint can be specified as a deterministic finite automaton (DFA) (omitted for space reasons), so that we get a filtering algorithm using the automaton global constraint [1].
Necklaces. To break rotation variable symmetry, we apply the so-called lex-leader scheme [4], which says that any variant of a wanted solution under all the symmetries of the considered symmetry group must be lexicographically larger than or equal to that solution. For necklaces, this means that all the rotations of the sequence X must be lexicographically larger than or equal to X itself:

$\bigwedge_{q=2}^{n} X[q, \ldots, n, 1, \ldots, q-1] \ge_{lex} X[1, \ldots, n]$      (2)
These n − 1 constraints over sequences of exactly n elements have been logically minimised in [8] to the following n − 1 constraints over sequences of at most n − 1 elements:

$\bigwedge_{q=2}^{n} X[q, \ldots, (2q-3) \bmod n + 1] \ge_{lex} X[1, \ldots, q-1]$      (3)
Reading from right to left, this constrains the first q − 1 elements of X to be lexicographically smaller than or equal to the cyclically next q − 1 elements of X, for 2 ≤ q ≤ n. Future work includes designing a more efficient filtering algorithm for the conjunction of these global lexicographic constraints.
Unlabelled Necklaces. The conjunction of the constraints (1) and (3) accepts all necklaces that are unlabelled tuples (just like Algorithm 3 without probing). In fact, the rotation variable symmetry and full value symmetry can be broken by the constraints (1) together with the probing tests in line 16 of Algorithm 3 seen as constraints:

$\bigwedge_{q=2}^{n} \overline{X[q, \ldots, n, 1, \ldots, q-1]} \ge_{lex} X[1, \ldots, n]$      (4)
The difference with (2) and (3) lies in the minimal renaming of the left-hand sides. The logic minimisation of (2) into (3) does not apply to (4). A checker for the required $\overline{A} \ge_{lex} B$ global constraint can be specified as a DFA (omitted for space reasons), so that we get a filtering algorithm using the automaton global constraint [1]. The idea is to augment the classical DFA for ≥lex [1] with variables representing the smallest value used so far and the minimal-renaming bijection on D (encoded by an allDifferent constraint).
Discussion. The proof of correctness and completeness of the introduced symmetry-breaking constraints is omitted for space reasons. To assess the runtimes (in seconds) of dynamic and static symmetry breaking, consider the right half of Table 1; unmentioned numbers of backtracks are zero. For necklaces, columns 8 and 9 reveal a slight advantage of Algorithm 2 over constraints (3). For unlabelled necklaces, the last three columns reveal a huge advantage of Algorithm 3 over constraints (1) and (4). However, these runtimes were obtained in the absence of any problem-specific constraints, and static symmetry breaking usually performs better than dynamic symmetry breaking in the presence of problem-specific constraints. We address this issue in the next section.
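As an executable sanity check on these static constraints, the following self-contained brute-force sketch (all names ours) filters all ternary n-tuples by ground versions of (1) and (4), and reproduces the unlabelled-necklace counts of column 2 of Table 1:

from itertools import product

def min_renaming(t):
    ren = {}
    return tuple(ren.setdefault(v, len(ren)) for v in t)

def sat_1(X, k):   # constraint (1): first occurrences of values in order
    n = len(X)
    pos = lambda v: X.index(v) if v in X else n + 1 + v
    return all(pos(v) < pos(v + 1) for v in range(k - 1))

def sat_4(X):      # constraints (4): minimally renamed rotations >=lex X
    n = len(X)
    return all(min_renaming(X[q:] + X[:q]) >= X for q in range(1, n))

def count_unlabelled_necklaces(n, k=3):
    return sum(1 for X in product(range(k), repeat=n)
               if sat_1(X, k) and sat_4(X))

assert [count_unlabelled_necklaces(n) for n in (3, 4, 5)] == [3, 6, 9]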
4 EXPERIMENTS
We now experimentally compare the proposed dynamic and static symmetry-breaking (SB) methods on real-life scheduling problems containing an (unlabelled) necklace as a combinatorial sub-structure.
Example: Rotating Schedules. Many industries and services need to function 24/7. Rotating schedules, such as the one in Figure 1 (a real-life example taken from [10]), are a popular way of guaranteeing a maximum of equity to the involved work teams. In our example, there are day (d), evening (e), and night (n) shifts of work, as well as days off (x). Each team works at most one shift per day. The scheduling horizon has as many weeks as there are teams. In the first week, team i is assigned to the schedule in row i. For any next week, each team moves down to the next row, while the team on the last row moves up to the first row. Note how this gives almost full equity to the teams, except, for instance, that team 1 does not enjoy the six consecutive days off that the other teams have, but rather three consecutive days off at the beginning of week 1 and another three at the end of week 5. We here assume that the daily workload is uniform. In our example, each day has exactly one team on duty for each work shift, and hence two teams entirely off duty; assuming the work shifts average 8h, each employee will work 7 · 3 · 8 = 168h over the five-week cycle, or 33.6h per week. Daily workload can be enforced by global cardinality (gcc) constraints on the columns. Further, any number of consecutive workdays must be between two and seven, and any change in work shift can only occur after two to seven days off. This can be enforced by stretch constraints [12] on the table flattened row-wise into a sequence. (A filtering algorithm for the stretch constraint, which is not a built-in of SICStus Prolog, was automatically obtained from a DFA model of a constraint checker using
the (built-in) automaton global constraint [1].) We assume that soft constraints, such as full weekends off as numerous and well-spaced as possible, are enforced by manual selection among schedules satisfying the hard constraints. In our example, there are two full weekends off, in the optimally spaced rows 2 and 5.

Week | Mon Tue Wed Thu Fri Sat Sun
  1  |  x   x   x   d   d   d   d
  2  |  x   x   e   e   e   x   x
  3  |  d   d   d   x   x   e   e
  4  |  e   e   x   x   n   n   n
  5  |  n   n   n   n   x   x   x

Figure 1. A five-week rotating schedule with uniform workload

Necklaces. Under the given assumption (uniform workload) and constraints (gcc and stretch), any rotating schedule has the symmetries of necklaces, when we view it flattened row-wise into a sequence. In addition to the classical instance in Figure 1, here denoted 1d, 1e, 1n, 2x, we ran experiments over other instances. For example, instance 2d, 2e, 1n, 2x has the uniform daily workload of 2 teams each on the day and evening shifts, 1 team on the night shift, and 2 teams off-duty. Figure 2 gives the obtained runtimes (in seconds) and numbers of backtracks (fails) over all solutions. The time ratio to all solutions between SB and no-SB is a good indicator of that time ratio to the first optimal solution (say, with the maximum number of full weekends off), as branch-and-bound essentially iterates over many solutions in order to pick the best. On average, when breaking the symmetries statically, the default variable ordering (trying the leftmost variable) is better than first-fail (trying the leftmost variable with the smallest domain) and most-constrained (trying the leftmost variable with the smallest domain that has the most constraints suspended), with the default bottom-up value ordering; hence the runtimes for static symmetry breaking are given for the default orderings. Static symmetry breaking, in the presence of the problem-specific constraints, is now faster than dynamic symmetry breaking.

instance       | unique sol's | Algorithm 2     | Constraints (3) | no SB
               |              | time   fails    | time   fails    | time
1d, 1e, 1n, 2x | 2274         | 7      228823   | 4      9140     | 21
2d, 1e, 1n, 2x | 4115         | 50     959970   | 26     69704    | 158
2d, 2e, 1n, 2x | 4950         | 199    2922846  | 147    408669   | 751
2d, 2e, 2n, 2x | 3444         | 603    7526564  | 558    1587889  | 2581

Figure 2. Performance comparison on necklace schedules

Partially Unlabelled Necklaces. Under the uniform workload assumption, some rotating schedules even have many of the symmetries of unlabelled necklaces. In our instances for 5 and 8 weeks, the constraints do not distinguish between the d, e, n work shifts, so that those values are interchangeable. To break such partial value symmetry dynamically, it suffices to replace line 6 of Algorithm 3 by try all i ∈ {X[j − p], . . . , min(u + 1, k − 2)} ∪ {k − 1} and to make the minimal renamings $\overline{Y}$ in lines 16 and 18 respect the subsets D′ ⊆ D of interchangeable values; in our case the subsets are {d, e, n} and {x}. We denote the resulting search procedure by Algorithm 3′. To break this partial value symmetry statically, it suffices to post one orderedFirstOccurrences(X, D′) for each subset D′:

firstPos(d) < firstPos(e) < firstPos(n)      (5)

Together with an adaptation, denoted (4′), of the constraints (4) where $\overline{Y}$ respects the D′, we have a static symmetry-breaking method for such partially unlabelled necklaces. Figure 3 gives the obtained runtimes (in seconds) and numbers of backtracks (fails) over all solutions. Static symmetry breaking, in the presence of the problem-specific constraints, is still a lot slower than dynamic symmetry breaking.

instance       | unique sol's | Algorithm 3′    | Cons. (5) and (4′)
               |              | time   fails    | time    fails
1d, 1e, 1n, 2x | 402          | 13     35969    | 205     2964
2d, 2e, 2n, 2x | 274          | 703    1380876  | 31193   313587

Figure 3. Comparison on partially unlabelled necklace schedules
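To make the gcc and stretch requirements concrete, here is a small self-contained Python check of the Figure 1 schedule (the string encoding and all names are ours; the run-length test is a simplified rendering of the stretch conditions):

from collections import Counter
from itertools import groupby

# The five-week schedule of Figure 1, one string per week (row).
schedule = ["xxxdddd", "xxeeexx", "dddxxee", "eexxnnn", "nnnnxxx"]

# gcc on the columns: each day has exactly one team on each of the
# d, e, n shifts, and hence two teams off-duty.
for day in range(7):
    assert Counter(week[day] for week in schedule) == \
           Counter({"x": 2, "d": 1, "e": 1, "n": 1})

# Stretch-like check on the row-wise flattened, circular sequence:
# every maximal run of one value (a work stretch in a single shift,
# or a stretch of days off) must have length between 2 and 7.
flat = "".join(schedule)
runs = [(ch, len(list(g))) for ch, g in groupby(flat)]
if len(runs) > 1 and runs[0][0] == runs[-1][0]:   # merge the wrap-around run
    runs = [(runs[0][0], runs[0][1] + runs[-1][1])] + runs[1:-1]
assert all(2 <= length <= 7 for _, length in runs)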
5 CONCLUSIONS
By bringing together the fields of combinatorial enumeration and constraint programming, we have extended existing results for dynamically and statically breaking the rotation variable symmetry of necklaces into new symmetry-breaking methods dealing also with the additional full value symmetry of unlabelled necklaces. On an example, we have also shown how to specialise these methods when the value symmetry of unlabelled necklaces is only partial. In the absence of problem-specific constraints, the dynamic symmetry-breaking methods outperform the static ones, narrowly for necklaces but largely for unlabelled necklaces. On a real-life scheduling problem we have shown that, in the presence of problem-specific constraints, the static method becomes faster for necklaces, but not for partially unlabelled necklaces.
One should be aware of existing enumeration algorithms for special cases, such as the constant-amortised-time algorithms for unlabelled binary necklaces [2], or for necklaces with fixed content [14]. For instance, under the given assumption (uniform workload) and constraints, rotating schedules are necklaces with fixed content, so the algorithm of [14] should be tried instead of Algorithm 2. Future work includes the quest for a constant-amortised-time enumeration algorithm for unlabelled k-ary necklaces.
Acknowledgements. We are supported by grant IG2001-67 of the Swedish Foundation for International Cooperation in Research and Higher Education, and by grant 70644501 of the Swedish Research Council. We thank J. Sawada and V. Vajnovszki for discussions.
REFERENCES
[1] N. Beldiceanu, M. Carlsson, and T. Petit, ‘Deriving filtering algorithms from constraint checkers’, CP'04, LNCS 3258:107-122, Springer.
[2] K. Cattell, F. Ruskey, J. Sawada, M. Serra, and C. R. Miers, ‘Fast algorithms to generate necklaces, unlabeled necklaces, and irreducible polynomials over GF(2)’, Journal of Algorithms 37(2):267-282, (2000).
[3] W. Y. C. Chen and J. D. Louck, ‘Necklaces, MSS sequences, and DNA sequences’, Advances in Applied Mathematics 18(1):18-32, (1997).
[4] J. M. Crawford et al., ‘Symmetry-breaking predicates for search problems’, KR'96, pp. 148-159, Morgan Kaufmann, (1996).
[5] M. C. Er, ‘A fast algorithm for generating set partitions’, The Computer Journal 31(3):283-284, (1988).
[6] E. N. Gilbert and J. Riordan, ‘Symmetry types of periodic sequences’, Illinois Journal of Mathematics 5:657-665, (1961).
[7] S. W. Golomb, B. Gordon, and L. R. Welch, ‘Comma-free codes’, Canadian Journal of Mathematics 10(5):202-209, (1958).
[8] A. Grayland, I. Miguel, and C. Roney-Dougal, ‘Minimal ordering constraints for some families of variable symmetries’, SymCon'07, (2007).
[9] D. Gusfield, Algorithms on Strings, Trees, and Sequences, CUP, 1997.
[10] G. Laporte, ‘The art and science of designing rotating schedules’, Journal of the Operational Research Society 50(10):1011-1017, (1999).
[11] Y. C. Law and J. Lee, ‘Symmetry breaking constraints for value symmetries in constraint satisfaction’, Constraints 11(2-3):221-267, (2006).
[12] G. Pesant, ‘A filtering algorithm for the stretch constraint’, CP'01, LNCS 2239:183-195, Springer, (2001).
[13] C. M. Roney-Dougal et al., ‘Tractable symmetry breaking using restricted search trees’, ECAI'04, pp. 211-215, (2004).
[14] J. Sawada, ‘A fast algorithm to generate necklaces with fixed content’, Theoretical Computer Science 301(1-3):477-489, (2003).
[16] N. Sloane, The on-line encyclopedia of integer sequences, at http://www.research.att.com/~njas/sequences/, 2008.
[15] M. Sellmann and P. Van Hentenryck, ‘Structural symmetry breaking’, IJCAI'05, pp. 298-303, IJCAI, (2005).
Vivifying Propositional Clausal Formulae
Cédric PIETTE 1 and Youssef HAMADI 2 and Lakhdar SAÏS 1
Abstract. In this paper, we present a new way to preprocess Boolean formulae in Conjunctive Normal Form (CNF). In contrast to most of the current preprocessing techniques, our approach aims at improving the filtering power of the original clauses while producing a small number of additional and relevant clauses. More precisely, an incomplete redundancy check is performed on each original clause through unit propagation, leading either to a sub-clause or to a new relevant clause generated by the clause learning scheme. This preprocessor is empirically compared to the best existing one in terms of size reduction and the ability to improve a state-of-the-art satisfiability solver.
1 INTRODUCTION
In recent years, preliminary computations on CNF formulae have been increasingly studied by the SAT community. This renewed interest can be explained by different factors. First, reducing the huge size of the SAT instances encoding real-world problems increases the robustness of SAT solvers. Secondly, these instances contain different kinds of structures that can be handled more efficiently before search. One of the most effective preprocessing techniques (SatElite) is currently integrated in state-of-the-art SAT solvers such as Minisat and Rsat. It is now well acknowledged that the performance of these solvers is usually greatly improved by this particular preprocessing, up to the point where SatElite is often used by SAT competitors. Thus, preprocessing a formula before solving is now known to be an important step, and many preprocessors have already been proposed.
One of the first efficient preprocessing algorithms, called 3-Resolution, was incorporated into the Satz solver [10]. It consists of adding to the formula all resolvent clauses of size less than or equal to 3, until saturation. 2-SIMPLIFY, a less computationally heavy preprocessor, was proposed in [2]. It was developed to better manage real-world benchmarks, which often contain a lot of binary clauses. Roughly, the idea is to use those binary clauses to construct an implication graph, from which unit clauses can be deduced by computing the transitive closure. If unit clauses have been obtained, they are propagated, and this process is iterated until a fixpoint is reached. Later on, HyPre generalized 2-SIMPLIFY by computing hyper-binary resolution to deduce new binary clauses [1]. Moreover, HyPre is able to detect and substitute equivalent literals incrementally. The classical DP
1 Université Lille-Nord de France, Artois, CRIL-CNRS UMR 8188, F-62307 Lens, email: {piette,sais}@cril.fr
2 Microsoft Research, 7 J J Thomson Avenue, Cambridge, United Kingdom, email: youssefh@microsoft.com
procedure, based on variable elimination through resolution, has also been used in a limited way as a preprocessing step. A weaker schema has been adopted by the NiVER procedure [13], which eliminates variables by resolution if this computation does not increase the number of literals of the CNF formula. NiVER was later improved by a so-called substitution rule, together with the use of clause signatures and touched lists, to define the recent SatElite preprocessor [6]. However, only preprocessors that eliminate variables by a limited application of resolution are now grafted onto modern SAT solvers. Indeed, the other kinds of preprocessors aim at modifying the CNF formula through the addition and/or removal of clauses, generally keeping the same set of variables. The main problem of these preprocessors is that it is difficult to measure the relevance of each added or eliminated clause with respect to the resolution step. One can eliminate clauses and derive a harder sub-formula; similarly, adding new clauses might lead to an increase in space complexity without reducing the search space, since the added clauses may only clutter the solver with redundant information.
In this paper, we revisit this kind of preprocessing, using only forms of resolution that aim at substituting existing clauses by more constrained ones. In other words, our main goal is to strengthen, or to vivify, the redundant clauses of the original formula. To this end, we apply a limited redundancy check to each clause of the CNF formula in order to derive or approximate one of its minimally redundant sub-clauses. Interestingly, our proposed approach can also take advantage of the modern learning scheme to produce new resolvents that are conditionally added to the formula.
This paper is organized as follows: in the next section, basic notations and definitions about propositional clausal formulae and SAT are provided. In Section 3, different simplification techniques and their practical usefulness are discussed. Next, particular forms of resolution hidden by unit propagation are presented, together with an incomplete method that can produce them. The resulting preprocessor is detailed and evaluated in Section 4. Finally, we conclude the paper with some perspectives and further work.
2 DEFINITIONS AND NOTATIONS
We briefly state here some definitions and notations used in the rest of this paper. A propositional formula is in conjunctive normal form (CNF for short) if it can be represented using a set (interpreted as a conjunction) of clauses, where a clause is a set (interpreted as a disjunction) of literals, a literal being a propositional variable or its negation. The set of variables that appear in a CNF formula Σ will be denoted
Var(Σ). Lit(Σ) is defined as the set {x, ¬x | x ∈ Var(Σ)}. For a set of literals L, $\bar{L}$ is defined as $\{\bar{l} \mid l \in L\}$. An interpretation ρ of a CNF formula Σ is an application from Var(Σ) to the set of truth values {true, false}. It is called a model iff it gives the value true to Σ (in short ρ |= Σ). SAT is the problem of deciding whether a given CNF formula admits a model or not.
Let $c_a = \{l_{a_1}, \ldots, l_{a_n}, l\}$ and $c_b = \{l_{b_1}, \ldots, l_{b_m}, \neg l\}$ be two clauses. The clause $c = \{l_{a_1}, \ldots, l_{a_n}, l_{b_1}, \ldots, l_{b_m}\}$ is a logical consequence (called resolvent) of $c_a$ and $c_b$. This production rule is called resolution and is denoted ⊗R; we note the resolvent c as $c_a \otimes_R c_b$. Most of the techniques used for solving SAT (e.g. DP-like procedures, unit propagation, learning schemes, etc.) are based on implicit or explicit application of resolution. This is clearly the case for most preprocessors, including the one presented in this paper.
Let c and c′ be two clauses of Σ. We say that a clause c′ (resp. c) subsumes (resp. is subsumed by) c (resp. c′) iff c′ ⊂ c. Subsumed clauses can be removed from Σ while preserving satisfiability. Given x ∈ Lit(Σ), we define Σ|x as the formula simplified by the assignment of x to true. We recursively define UP(Σ) as follows: (1) UP(Σ) = Σ if Σ does not contain unit clauses; (2) UP(Σ) = ⊥ if Σ contains two unit clauses {x} and {¬x}; (3) otherwise, UP(Σ) = UP(Σ|x) with x a literal appearing in a unit clause of Σ. A clause c is implied by unit propagation from Σ, denoted Σ |=UP c, if $UP(\Sigma|_{\bar{c}}) = \perp$.
In the next section, the main preprocessing strategies are discussed, and a limited form of resolution that produces more constrained clauses than the original ones is presented.
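As a tiny executable rendering of the resolution rule just defined (DIMACS-style integer literals, with -v denoting the negation of v; the function name is ours):

def resolve(ca, cb, l):
    # Resolvent of ca and cb on literal l, with l in ca and -l in cb.
    assert l in ca and -l in cb
    return (ca - {l}) | (cb - {-l})

# {1, 2, 5} resolved with {3, -5} on literal 5 gives {1, 2, 3}:
assert resolve(frozenset({1, 2, 5}), frozenset({3, -5}), 5) == frozenset({1, 2, 3})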
3 PREPROCESSING CNF FORMULAE
3.1 Adding and/or removing clauses?
Two main categories of preprocessors have been proposed. The first one aims at eliminating variables through a partial application of the DP procedure [5]: only variables which can be eliminated while keeping the formula within a “reasonable” size (w.r.t. the original size) are exhaustively processed by resolution. SatElite belongs to this category of preprocessors. The principle of the second category is to modify the original formula by adding and/or removing clauses, usually keeping the whole set of variables. Most of the time, the production of new clauses is made by resolution. For instance, HyPre performs hyper resolution to produce binary clauses [1] that are added to the formula. These new clauses represent redundant information with respect to the original CNF formula, and this information seems to generally help solvers. Recently, a new approach introduced in [7] aims at removing from a formula some of the redundant clauses, namely clauses c of Σ s.t. Σ\{c} |= c. Obviously enough, performing such a test is computationally intractable. Therefore, this redundancy is only checked through unit propagation. As a consequence, this approach is incomplete, but it is able to remove some redundant clauses in polynomial time. As with other clause-filtering techniques, the resulting preprocessor can sometimes slow down the whole resolution process because of the removal of some important redundant clauses. The main problem with those techniques is that it is hard to characterize which redundant clauses are useful. Indeed, a trade-off has to be made between the management of a large
number of clauses, which slows down DPLL implementations, and their relevance, namely their ability to trigger propagations. Indeed, it is well known that redundant information can actually help SAT solvers; for instance, the powerful learning scheme, which produces a particular resolvent clause after each conflict, can be viewed as a dynamic addition of redundant clauses during search. This learning strategy is now known to be one of the key features of modern solvers, which proves the interest of redundant information with respect to practical SAT resolution. Nevertheless, a simple experiment which consists of adding all learnt clauses to a CNF formula after its resolution shows that this new redundant information generally makes the formula more difficult to solve. Hence, how can we ensure that a particular clause-adding approach can effectively boost a given SAT solver?
A priori, one interesting option is the efficient generation of sub-clauses from the original CNF. In this way, there is neither addition nor removal of any clause, but the substitution of existing clauses by more constrained ones. In current solvers, this computation would have great advantages: it would not only increase the number of unit propagations with no more clauses to manage, but would also lead to shorter learnt clauses during the search by reducing the reasons of the propagated literals. Several techniques have already been proposed to generate sub-clauses. For instance, it is proposed in [4] to explore the implication graph to generate resolvent clauses and to only take into account the ones which subsume at least one original clause of the CNF formula, in order to substitute this latter clause by the shorter produced one. Actually, this computation is exponential in the worst case, and a weaker polynomial version restricted to a single literal assignment is proposed. In the next section, a new approach that aims at checking more systematically whether a clause can be shortened is presented.
3.2 One answer: shorten existing clauses
The way a problem is encoded in CNF is crucial for its practical resolution, and can lead to exponential differences in resource requirements. Analyzing the different kinds of modelling is now an active path of research (see e.g. [8]). However, even with “good” modelling, some clauses might be redundant. A clause is redundant if it can be inferred from the remaining part of the CNF formula. In our approach, the redundancy check is only used to shorten clauses by eliminating some redundant literals. However, checking whether a clause is redundant is coNP-complete [11]. Hence, an incomplete but linear-time deduction strategy has been adopted: this check is performed with respect to unit propagation only. More formally, a clause c of Σ is redundant modulo unit propagation (in short RedUP(Σ, c)) iff Σ\{c} |=UP c. Obviously, if RedUP(Σ, c′) and c′ ⊂ c, then we also have RedUP(Σ, c); the converse is not true. This observation leads us to a new definition of minimal redundancy of clauses: we say that a clause c of Σ is minimally redundant modulo UP iff there is no c′ ⊂ c s.t. RedUP(Σ, c′). One of the main goals behind our vivification process is to find, for each redundant clause, one of its minimally redundant sub-clauses. In practice, a clause to be checked for shortening is removed from the CNF formula, and the opposites of its literals are assigned one by one according to their lexicographic
ordering. Let Σ be a CNF formula and c = {l1 , l2 , . . . , ln } a clause of Σ, and assume that the order in which the literals are assigned is (¬l1 , . . . , ¬ln ). Two possible cases may occur:
1. ∃i ∈ {1, . . . , n − 1} s.t. Σ\{c} ∪ {¬l1 , . . . , ¬li } |=UP ⊥. In this case, we have Σ\{c} |=UP c′ with c′ = (l1 ∨ . . . ∨ li ). This new clause c′ strictly subsumes c; hence, the original clause can be substituted by the newly deduced one. Obviously, c′ is not necessarily minimally redundant modulo UP: another ordering on the literals {l1 , l2 , . . . , li } might lead to an even shorter sub-clause. Thanks to conflict analysis, the deduced sub-clause c′ can be shortened further: a new clause η can be generated by a complete traversal of the implication graph associated with Σ and the assignments of the literals {¬l1 , . . . , ¬li }. The complete traversal of the implication graph ensures that the clause η contains only literals from c′; thereby, η is a sub-clause of (l1 ∨ . . . ∨ li ).
2. Otherwise, as unit propagation is performed after each assignment, if one of the remaining literals is assigned by this filtering operation, then a sub-clause is produced. Trivially, when this phenomenon occurs, the propagated literal is either assigned positively (it satisfies the removed clause of the CNF formula) or negatively (it is falsified in this clause). Considering i and j with 1 ≤ i < j ≤ n, the two possible cases are:
• Σ\{c} ∪ {¬l1 , . . . , ¬li } |=UP ¬lj . In this case, we can deduce Σ\{c} |=UP (l1 ∨ . . . ∨ li ∨ ¬lj ). Applying resolution between this new clause and c (on the variable lj ), we obtain (l1 ∨ . . . ∨ lj ∨ . . . ∨ ln ) ⊗R (l1 ∨ . . . ∨ li ∨ ¬lj ) = (l1 ∨ . . . ∨ lj−1 ∨ lj+1 ∨ . . . ∨ ln ). This new clause clearly subsumes c; hence, the original clause can be substituted by the newly deduced one.
• Σ\{c} ∪ {¬l1 , . . . , ¬li } |=UP lj . In this case, we can deduce Σ\{c} |=UP (l1 ∨ . . . ∨ li ∨ lj ). Here too, the produced clause subsumes c and makes it possible to “remove” literals from it.
Accordingly, from the iterative assignments of the opposite literals of a clause, a reduced clause can be produced. This computation can clearly be integrated into a modern SAT solver, and can benefit from lazy data structures. Moreover, during such a search, some assignments may lead to a conflict. As explained above, when this occurs, the procedure can use the conflict analysis implemented in current solvers to produce smaller sub-clauses in polynomial time. Using the previous rules and the learning feature of SAT solvers, a CNF formula can be vivified, namely made easier to solve. In the next section, we present the practical implementation that has been made, based on the previous ideas.
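The case analysis above translates almost directly into code. The following self-contained Python sketch (all names ours; the conflict-analysis refinement via η is omitted) vivifies one clause with a naive unit propagator:

def unit_propagate(clauses, lits):
    # Close the literal set under unit propagation; literals are non-zero
    # ints, -v negating v. Returns (ok, lits), ok = False on a conflict.
    lits = set(lits)
    changed = True
    while changed:
        changed = False
        for cla in clauses:
            if any(l in lits for l in cla):
                continue                    # clause already satisfied
            free = [l for l in cla if -l not in lits]
            if not free:
                return False, lits          # conflict: clause falsified
            if len(free) == 1:
                lits.add(free[0])           # unit clause: propagate
                changed = True
    return True, lits

def vivify_clause(clauses, c):
    # Assign the opposite of c's literals one by one and watch what
    # unit propagation produces, as in the two cases above.
    rest = [cl for cl in clauses if cl != c]
    for i in range(1, len(c) + 1):
        ok, lits = unit_propagate(rest, [-x for x in c[:i]])
        if not ok:                          # case 1: UP derives a conflict
            return c[:i]                    # (l1 or ... or li) subsumes c
        for lj in c[i:]:
            if lj in lits:                  # case 2: lj propagated to true
                return c[:i] + (lj,)
            if -lj in lits:                 # case 2: lj propagated to false
                return tuple(x for x in c if x != lj)
    return c                                # no reduction found

For instance, with clauses = [(1, 2), (-1, 3), (2, 3, 4)], vivifying c = (2, 3, 4) assigns ¬2, unit-propagates 1 and then 3, and returns the sub-clause (2, 3): the literal 4 was redundant.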
4 CNF FORMULAE VIVIFICATION
4.1 Technical choices
In this section, different practical parameters are discussed, some of them resulting from extensive experiments. First, the ideas proposed in the previous section imply testing the clauses of a formula in order to shorten some of them. However, if a literal is actually removed from a clause, new propagations can be performed using this clause, meaning that all
the failed tests made on previous clauses could then succeed with this shortened clause. Hence, whenever a test succeeds in producing a sub-clause, all other clauses are checked again in a new iteration of the procedure.
Second, with the presented sub-clause production technique, the order in which the literals are assigned matters. Clearly, to ensure a maximal clause reduction, one would have to check all possible orders of literals. However, this could lead to a pretty heavy computation; hence, an incomplete strategy that consists of trying only one particular order has been adopted. Actually, a variant of the MOMS branching heuristic [9] is used to sort the literals in order to maximize the number of literals implied by unit propagation. Yet, using only this heuristic makes the order very similar from one iteration to the next. As said previously, a clause is tested again only if at least one other clause has been shortened; keeping only the MOMS ordering does not appear to be a good solution, because the procedure would not benefit from the potential multiple iterations made on each clause. To diversify the search, some randomization is used as follows: assuming that the literals of a clause are sorted with respect to MOMS, two of them are selected randomly and exchanged in this ordering.
Finally, when a conflict occurs, the tested clause c = (l1 ∨ . . . ∨ ln ) is substituted by its sub-clause c′ = (l1 ∨ . . . ∨ li ). As mentioned above, a complete traversal of the implication graph could lead to an even more reduced clause, but for efficiency purposes, this computation is not performed. In our implementation, the classical learning scheme is used to generate a nogood η corresponding to the first UIP. If this new clause η subsumes the sub-clause c′, then c is substituted by η; otherwise, η is only added to the formula if its size (in terms of number of literals) is strictly smaller than the size of the original clause. As the results show, this strategy only adds a small number of nogoods (< 5% of the number of original clauses), which prove useful for the future exhaustive search.
Considering these choices, a new polynomial preprocessor called ReVivAl (for pReprocessing based on Vivification Algorithm) has been developed. This method is described in Algorithm 1. Roughly, for each clause c of an input CNF Σ, c is removed from Σ and the opposite of each of its literals is assigned in turn, with unit propagation (loop from line 5 to 29). Moreover, different checks on the remaining literals (which “should” be unassigned) and on the presence of a conflict are performed, as presented in Section 3.2 (tests on lines 11, 13, 17, 19, 23 and 27). The order in which the literals are selected for assignment is given by the function select_a_literal, which selects the highest literal with respect to our randomized MOMS-like score, where two randomly chosen literals have their scores exchanged. As long as one of the clauses has been reduced (change set to true), the process continues with all the other clauses.
Let us note that our implementation has been integrated into a modern SAT solver, which enables the use of the most recent data structures and mechanisms designed for SAT resolution. Hence, the redundancy test of each clause, performed by a series of assignments, takes advantage of the efficiency of watched literals. In the same way, the conditional addition of clauses is achieved through the “classical” learning functions, usually called by the solver after each conflict.
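A sketch of this randomised ordering (moms_score is an assumed scoring function; the exact MOMS variant used by ReVivAl is not detailed in the paper):

import random

def order_literals(clause, moms_score):
    # Sort by decreasing MOMS-like score, then swap two random positions
    # so that successive iterations explore slightly different orders.
    lits = sorted(clause, key=moms_score, reverse=True)
    if len(lits) > 1:
        i, j = random.sample(range(len(lits)), 2)
        lits[i], lits[j] = lits[j], lits[i]
    return lits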
Exploiting those structures and techniques implemented in exhaustive methods not only leads to an easy implementation of our method within most current solvers, but also provides our approach with their effectiveness for the different performed tests. Our approach is thoroughly evaluated in the following section.

Algorithm 1: Vivification of a CNF formula
Input: Σ : a CNF formula
Output: a vivified CNF formula
1   begin
2     change ←− true ;
3     while change do
4       change ←− false ;
5       foreach c ∈ Σ do
6         Σ ←− Σ\{c} ; Σb ←− Σ ;
7         cb ←− ∅ ; shortened ←− false ;
8         while (Not(shortened) And (c ≠ cb)) do
9           l ←− select_a_literal(c\cb) ;
10          cb ←− cb ∪ {l} ; Σb ←− (Σb ∪ {¬l}) ;
11          if ⊥ ∈ UP(Σb) then
12            cl ←− conflict_analyze_and_learn() ;
13            if cl ⊂ c then
14              Σ ←− Σ ∪ {cl} ;
15              shortened ←− true ;
16            else
17              if |cl| < |c| then Σ ←− Σ ∪ {cl} ; cb ←− c ;
18              end if
19              if c ≠ cb then
20                Σ ←− Σ ∪ {cb} ;
21                shortened ←− true ;
22              end if
23          else if ∃(ls ∈ (c\cb)) s.t. ls ∈ UP(Σb) then
24            if (c\cb) = {ls} then
25              Σ ←− Σ ∪ {cb ∪ {ls}} ;
26              shortened ←− true ;
27          if ∃(ls ∈ (c\cb)) s.t. ¬ls ∈ UP(Σb) then
28            Σ ←− Σ ∪ {c\{ls}} ;
29            shortened ←− true ;
30        if Not(shortened) then Σ ←− Σ ∪ {c} ;
31        else change ←− true ;
32    return Σ ;
33  end
4.2 Empirical Evaluation
We have compared ReVivAl against SatElite, the preprocessor currently considered the best approach. The state-of-the-art SAT solver RSAT [12] has been selected, since it was recognized in the last competition as very well suited to structured problems. All our experiments have been conducted on an Intel Xeon 3GHz under Linux CentOS 4.1 (kernel 2.6.9) with a RAM limit of 2GB. For all experiments, a timeout of 3 hours was enforced. We have compared the preprocessors both on their size reduction and on their impact on the efficiency of RSAT. This comparison has been conducted on a very large set of
benchmarks from the SAT competitions, SAT Race, SATLIB and other sources; more than 5000 instances have been used for these experiments, which needed about 600 days of CPU time. A sample of the experiments, whose examples are referred to in the following, is given in Table 1, but the exhaustive results are available at: http://www.cril.fr/~piette/preprocessor.html. The first main part of Table 1 provides the name of the tested problem together with the number of clauses (#cla) and literals (#lit) it contains. The two other parts of the table are similar (one for each preprocessor), and contain the time of preprocessing in seconds, the size of the resulting formula in terms of number of literals and clauses after the corresponding preliminary computation, and the solving time (in seconds) needed to solve the CNF formula after simplification. In addition, for ReVivAl, the numbers of performed iterations and learnt clauses are provided in the columns “#ite” and “#learnt”, respectively. The best preprocessing on a given instance is the one with the best cumulated preprocessing and solving time.
First, let us focus on benchmarks that can actually be solved by preprocessing alone. Such CNF formulae do exist, including some instances proposed for the SAT competitions and/or the SAT Races. Given the features of the presented preprocessing approaches, when one of them (or both) succeeds in proving the (un)satisfiability of a CNF formula, this clearly means that the CNF formula is solvable in polytime (indicated Polynomial in the table). The interest of such formulae for the empirical evaluation of solvers can be questioned, because they do not exhibit any computational difficulty, which should be the key point of comparison between exhaustive procedures. Among the tested formulae, SatElite (resp. ReVivAl) proves 35 (resp. 167) instances polynomial. Moreover, note that for both preprocessors, those computations are most often performed within a few seconds (see e.g. SAT_dat.k1, ezfact16_3).
Second, let us consider the size of CNF formulae after being preprocessed. Some differences can be observed between both approaches. On the one hand, the purpose of SatElite is to eliminate variables without increasing the size of the CNF formula; thus, the resulting CNF formulae can have about the same number of clauses, but they can exhibit a higher number of literals. On the other hand, ReVivAl tries to minimize the size of clauses and to add a limited number of relevant ones. As a consequence, the simplified formulae sometimes contain slightly more clauses than the original ones, but in general the average number of literals per clause is reduced, making them more exploitable by the solver's unit-propagation mechanism. As an example, on the benchmark alu4mul.miter, which exhibits 30465 clauses and 103040 literals (ratio #lit/#cla = 3.38), SatElite eliminates variables keeping about the same number of clauses and literals, whereas ReVivAl returns a CNF formula with fewer clauses (28992) and a ratio equal to 3.11. Cases where SatElite provides a formula with a much bigger ratio can occur (see e.g. 3pipe_3_ooo and 3bitadd_31), but not with ReVivAl. More generally, discarding the instances that cannot be solved using either of the preprocessors in conjunction with RSAT, a time gap of 18.8% can be observed in favour of ReVivAl. Furthermore, using SatElite, RSAT cannot decide the satisfiability of 2508 instances within 3 hours of CPU time (preprocessing and solving).
Table 1. SatElite VS ReVivAl. For each instance, the table gives its name and size (#cla, #lit) and then, for each preprocessor, the preprocessing time in seconds, the size (#cla, #lit) of the simplified formula, and the time in seconds needed by RSAT to solve the formula after simplification; for ReVivAl, the numbers of performed iterations and of learnt clauses are also given in the columns “#ite” and “#learnt”. “Polynomial” marks instances decided by preprocessing alone; “time out” marks instances not solved within 3 hours.
This difference of 51 instances may not look large, but SAT competitions and races are usually settled by even smaller gaps. However, even though ReVivAl generally has a better effect on CNF formulae than SatElite, counter-examples can obviously be exhibited (see e.g. hanoi5u and abb313GPIA-8-cn). Nevertheless, many classes of SAT instances are typically more sensitive to the ReVivAl process, which is better than SatElite at improving RSAT. For example, on the ezfact-* instances, which encode circuit factorization, on the Composite-*BitPrimes instances, which encode composite numbers (suggested as a challenge to SAT solvers in 1997 by Cook and Mitchell [3]), and on the gripper* planning instances, our approach clearly outperforms SatElite.
5 CONCLUSION
In this paper, ReVivAl, a new preprocessor based on limited forms of resolution and conflict analysis, has been proposed. Our approach, called vivification, makes original use of clause-redundancy checking to produce sub-clauses and to add new relevant clauses obtained through the clause-learning scheme. Its efficiency is illustrated through extensive experiments with a state-of-the-art DPLL solver. A comparison with the best known preprocessing technique shows that ReVivAl achieves interesting improvements, especially on circuit factorization, composite number and planning instances. Our results open many interesting directions for future research. It appears that combining several preprocessors often enables even better improvements. Indeed, a combination of SatElite and ReVivAl obtained particularly interesting results at the SAT-Race 2008 (6th of 19 submitted solvers). A dynamic selection of preprocessors based on automated-tuning approaches is thus a path that should be explored. The periodical use of ReVivAl, for example during restarts, is also a promising future direction.
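To make the vivification idea concrete, the following is a minimal sketch of the core loop, assuming a clause set represented as lists of signed integer literals. It captures only the sub-clause derivation by unit propagation, not the conflict analysis and learnt-clause harvesting that ReVivAl adds on top, and all function names are illustrative.

    def unit_propagate(clauses, assumptions):
        """Close the given assumptions under unit propagation.

        Returns the final set of assigned literals, or None if some clause
        is falsified (a conflict). Literals are signed integers."""
        assigned = set(assumptions)
        changed = True
        while changed:
            changed = False
            for clause in clauses:
                if any(lit in assigned for lit in clause):
                    continue  # clause already satisfied
                pending = [lit for lit in clause if -lit not in assigned]
                if not pending:
                    return None  # every literal falsified: conflict
                if len(pending) == 1 and pending[0] not in assigned:
                    assigned.add(pending[0])  # unit clause: force the literal
                    changed = True
        return assigned

    def vivify(clause, other_clauses):
        """Try to shrink `clause` to a sub-clause (simplified vivification).

        The literals of the clause are assumed false one at a time; if unit
        propagation over the remaining clauses already yields a conflict,
        the literals assumed so far form an implied sub-clause."""
        kept = []
        for lit in clause:
            kept.append(lit)
            if unit_propagate(other_clauses, [-l for l in kept]) is None:
                return kept  # sub-clause found: drop the remaining literals
        return kept  # no shrinking possible

For example, vivify([1, 2, 3], [[1, 2]]) returns [1, 2], since assuming ¬x1 and ¬x2 already falsifies the clause (x1 ∨ x2) by propagation.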
REFERENCES
[1] F. Bacchus and J. Winter, ‘Effective preprocessing with hyper-resolution and equality reduction’, in SAT’03, pp. 341–355, (2003).
[2] Ronen I. Brafman, ‘A simplifier for propositional formulas with many binary clauses’, in IJCAI’01, pp. 515–522, (2001).
[3] S.A. Cook and D.G. Mitchell, ‘Finding hard instances of the satisfiability problem: A survey’, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 35, (1997).
[4] S. Darras, G. Dequen, L. Devendeville, B. Mazure, R. Ostrowski, and L. Sais, ‘Using boolean constraint propagation for sub-clause deduction’, in CP’05, pp. 757–761, (2005).
[5] M. Davis and H. Putnam, ‘A computing procedure for quantification theory’, Journal of the ACM, 7(3), 201–215, (1960).
[6] N. Eén and A. Biere, ‘Effective preprocessing in SAT through variable and clause elimination’, in SAT’05, pp. 61–75, (2005).
[7] O. Fourdrinoy, E. Grégoire, B. Mazure, and L. Sais, ‘Eliminating redundant clauses in SAT instances’, in CP-AI-OR’07, pp. 71–83, (2007).
[8] A. Hertel, P. Hertel, and A. Urquhart, ‘Formalizing dangerous SAT encodings’, in SAT’07, pp. 159–172, (2007).
[9] R. G. Jeroslow and J. Wang, ‘Solving propositional satisfiability problems’, Annals of Mathematics and Artificial Intelligence, 1, 167–187, (1990).
[10] C. Li and Anbulagan, ‘Look-ahead versus look-back for satisfiability problems’, in CP’97, pp. 341–355, (1997).
[11] Paolo Liberatore, ‘Redundancy in logic I: CNF propositional formulae’, Artif. Intell., 163(2), 203–232, (2005).
[12] K. Pipatsrisawat and A. Darwiche, ‘RSAT 2.0: SAT solver description’, Technical Report D–153, Automated Reasoning Group, Computer Science Department, UCLA, (2007).
[13] S. Subbarayan and D. Pradhan, ‘NiVER: Non increasing variable elimination resolution for preprocessing SAT instances’, SAT’04, 276–291, (2004).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-530
Hybrid tractable CSPs which generalize tree structure
Martin C. Cooper¹ and Peter G. Jeavons² and András Z. Salamon³
Abstract. The constraint satisfaction problem (CSP) is a central generic problem in artificial intelligence. Considerable progress has been made in identifying properties which ensure tractability in such problems, such as the property of being tree-structured. In this paper we introduce the broken-triangle property, which allows us to define a hybrid tractable class for this problem which significantly generalizes the class of problems with tree structure. We show that the broken-triangle property is conservative (i.e., it is preserved under domain reduction and hence under arc consistency operations) and that there is a polynomial-time algorithm to determine an ordering of the variables for which the broken-triangle property holds (or to determine that no such ordering exists). We also present a non-conservative extension of the broken-triangle property which is also sufficient to ensure tractability and can be detected in polynomial time. Keywords: constraint satisfaction, tractability, computational complexity, arc consistency.
1 INTRODUCTION
Constraint satisfaction problems with tree structure have been widely studied, and are known to have efficient algorithms [8]. However, tree structure is quite restricted. It is therefore worthwhile exploring more general problem classes, to identify more widely-applicable properties which still allow efficient solution algorithms. A subclass of the general CSP which can be solved in polynomial time, and also identified in polynomial time, is called a tractable subclass. There has been a considerable research effort in identifying tractable subclasses of the CSP over the past decade. Most of this work has focused on one of two general approaches: either identifying forms of constraint which are sufficiently restrictive to ensure tractability no matter how they are combined [2, 9], or else identifying structural properties of constraint networks which ensure tractability no matter what forms of constraint are imposed [7, 5]. The first approach has had considerable success in characterizing precisely which forms of constraint ensure tractability no matter how they are combined. A set of constraint types with this property is called a tractable constraint language. In general it has been shown that any tractable constraint language must have certain algebraic properties known as polymorphisms [13]. A complete characterization of all possible tractable constraint languages has been established in the following cases: conservative constraint languages
IRIT, University of Toulouse III, 31062 Toulouse, France, email: cooper@irit.fr Computing Laboratory, University of Oxford, Oxford, OX1 3QD, UK, email: Peter.Jeavons@comlab.ox.ac.uk Computing Laboratory, University of Oxford, Oxford, OX1 3QD, UK, and The Oxford-Man Institute of Quantitative Finance, 9 Alfred Street, Oxford, OX1 4EH, UK, email: Andras.Salamon@comlab.ox.ac.uk
(i.e. constraint languages containing all unary constraints) [3], and constraint languages over a 2-element domain [17] or a 3-element domain [4]. The second approach has also had considerable success in characterizing precisely which structures of constraint network ensure tractability no matter what constraints are imposed. For the class of problems where the arity of the constraints is bounded by some fixed constant (such as binary constraint problems) it has been shown that (subject to certain technical assumptions) the only class of structures which ensure tractability are structures of bounded tree-width [12]. However, many constraint satisfaction problems do not possess a sufficiently restricted structure or use a sufficiently restricted constraint language to fall into any of these tractable classes. They may still have properties which ensure they can be solved efficiently, but these properties concern both the structure and the form of the constraints. Such properties have sometimes been called hybrid reasons for tractability [16], and they are much less widely-studied and much less well-understood than the language properties and structural properties described above. In this paper we introduce a new hybrid property which we call the broken-triangle property. We show that this property is sufficient to ensure that a CSP instance is tractable, and also show that checking whether an instance has the broken-triangle property can be done in polynomial time. Moreover, we show that all tree-structured CSP instances have this property, as well as many other instances that are not tree-structured (including some with unbounded tree-width). The broken triangle property can be thought of as a kind of transitivity condition. By processing the variables in an appropriate order, an algorithm akin to those used for solving tree-structured CSP instances can be applied to find a solution. Moreover, a suitable such ordering of variables can be found efficiently. The general technique for finding a suitable ordering, and then exploiting it to generate a solution, is discussed in Section 3. Sections 4 to 6 extend these ideas.
2 THE BROKEN TRIANGLE PROPERTY
In this paper we focus on binary constraint satisfaction problems. A binary relation over domains Di and Dj is a subset of Di × Dj. For a binary relation R, the relation rev(R) is defined as {(v, u) | (u, v) ∈ R}. A binary CSP instance consists of a set of variables (where each variable is denoted by a number i ∈ {1, 2, . . . , n}); for each variable i, a domain Di containing possible values for variable i; and a set of constraints, each of the form ⟨(i, j), R⟩, where i and j are variables and R is a relation such that R ⊆ Di × Dj. To simplify the notation we introduce the notion of a canonical constraint relation which combines all of the specified information about a pair of variables i, j.

Definition 1 Suppose i and j are variables of a CSP instance. Denote by Uij the set of constraint relations specified for the (ordered) pair of variables (i, j). The canonical constraint relation between variables i and j will be denoted Rij and is defined as

Rij = ⋂ ( Uij ∪ {rev(R) | R ∈ Uji} ).
The canonical constraint relation Rij contains precisely the pairs of values that are allowed for the variables i and j by all the constraints on i and j. Note that Rij = rev(Rji). If there are no constraints involving i and j, then Rij is the intersection of an empty set, and is defined to be the complete relation Di × Dj. If relation Rij is neither empty nor the complete relation, we say it is proper.

Definition 2 A binary CSP instance satisfies the broken-triangle property (BTP) with respect to (w.r.t.) the variable ordering <, if, for all triples of variables i, j, k such that i < j < k, if (u, v) ∈ Rij, (u, a) ∈ Rik and (v, b) ∈ Rjk, then either (u, b) ∈ Rik or (v, a) ∈ Rjk.

The broken-triangle property can be understood by the implication shown in Figure 1. In this figure, each oval represents the domain of an associated variable, and each line represents a consistent assignment of values for a pair of variables. A line joins element u ∈ Di and element v ∈ Dj if (u, v) ∈ Rij. The BTP on i, j, k simply says that for any “broken triangle” a − u − v − b, as illustrated in Figure 1, there is always a true triangle u − v − c (where c is either a or b). The BTP is similar to but stronger than directional path consistency [18]. It is important to note that the BTP must be satisfied for all triples i < j < k, even if the description of the instance does not specify a constraint between variables i and j. If there is no specified constraint between i and j, then Rij allows all pairs of values. A set of CSP instances may satisfy the broken-triangle property due to the structure of the constraint graph, due to the language of the constraint relations, or due to a combination of these.

Lemma 3 A binary CSP instance satisfies the broken-triangle property with respect to the variable ordering < if and only if, for all triples of variables i < j < k and for all (u, v) ∈ Rij,

(Rik(u) ⊆ Rjk(v)) ∨ (Rjk(v) ⊆ Rik(u)).    (1)

Proof: The condition that either Rik(u) ⊆ Rjk(v) or Rjk(v) ⊆ Rik(u) is equivalent to stating that there do not exist elements a of Rik(u) and b of Rjk(v) such that a ∉ Rjk(v) and b ∉ Rik(u). By the definition of the image of an element in a relation, this in turn is equivalent to the statement that there do not exist a, b ∈ Dk such that (u, a) ∈ Rik, (v, b) ∈ Rjk, (u, b) ∉ Rik and (v, a) ∉ Rjk. Condition (1) therefore exactly forbids the presence of a configuration that would prevent the instance from satisfying the BTP.
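Definition 2 (equivalently, condition (1)) is directly checkable. The sketch below is our own illustration, with a CSP instance encoded as domain and relation dictionaries; a missing relation is read as the complete relation, as stipulated above.

    from itertools import combinations

    def btp_holds(domains, relations, order):
        """Check the broken-triangle property w.r.t. a variable order.

        domains:   dict mapping variable -> iterable of values
        relations: dict mapping (i, j) -> set of allowed value pairs
        order:     list of variables, earliest first"""
        def R(i, j):
            if (i, j) in relations:
                return relations[(i, j)]
            if (j, i) in relations:
                return {(v, u) for (u, v) in relations[(j, i)]}  # rev(R)
            return {(u, v) for u in domains[i] for v in domains[j]}

        for i, j, k in combinations(order, 3):
            Rij, Rik, Rjk = R(i, j), R(i, k), R(j, k)
            for (u, v) in Rij:
                for a in domains[k]:
                    for b in domains[k]:
                        if ((u, a) in Rik and (v, b) in Rjk
                                and (u, b) not in Rik and (v, a) not in Rjk):
                            return False  # broken triangle on (i, j, k)
        return True

The four nested value loops per triple mirror the O(n³d⁴) cost that the proof of Theorem 8 below attributes to this check.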
Using this result we can obtain the following simple sufficient condition for the broken-triangle property.

Lemma 4 A binary CSP instance satisfies the broken-triangle property with respect to a variable ordering < if, for all triples of variables i < j < k, either Rik or Rjk is a complete relation.

Proof: If Rik is a complete relation, then Rik(u) = Dk, while if Rjk is a complete relation, then Rjk(v) = Dk. In either case, by Lemma 3, the instance satisfies the BTP.

Definition 5 A class of CSP instances is called conservative if it is closed under domain restrictions (i.e., the addition of arbitrary unary constraints).

It is easy to verify from the definition that the broken-triangle property is conservative. This has two important benefits. First, the broken-triangle property is invariant under arc consistency operations: if a binary CSP instance satisfies the broken-triangle property, then so does its arc consistency closure. Second, if the broken-triangle property is satisfied on all triples of variables i, j, k belonging to some subset of variables W, then the CSP instance which results when all of the variables not in W have been assigned will satisfy the broken-triangle property, and hence be efficiently solvable.
Figure 1. The broken-triangle property on variables i, j, k.
For an element a ∈ Di , we write Rij (a) to represent {b ∈ Dj : (a, b) ∈ Rij }, the image of a in relation Rij .
3 TRACTABILITY OF BTP INSTANCES
In this section we show that if a CSP instance has the broken-triangle property with respect to some fixed variable ordering, then finding a solution is tractable. Moreover, the problem of finding a suitable ordering if it exists is also tractable. For a binary CSP instance with n variables, let d = max{|D1|, . . . , |Dn|} and let q be the number of constraints.

Definition 6 An assignment of values (u1, . . . , uk) to the first k variables of a binary CSP instance is called consistent if ui ∈ Di whenever 1 ≤ i ≤ k, and (ui, uj) ∈ Rij whenever 1 ≤ i < j ≤ k.

Theorem 7 For any binary CSP instance which satisfies the BTP with respect to some known variable ordering <, it is possible to find a solution in O(d²q) time (or determine that no solution exists).

Proof: By the discussion above, if an instance has the BTP with respect to <, then establishing arc consistency preserves the BTP. Furthermore, it is known that arc consistency can be established in O(d²q) time [1]. If this results in an empty domain, then the instance has no solutions. Therefore, we assume in the following that the CSP instance is arc consistent and has non-empty domains.
We can assign some value u1 ∈ D1 to the first variable, since D1 ≠ ∅. To prove the result it is sufficient to show, for all k = 2, . . . , n, that any consistent assignment (u1, . . . , uk−1) for the first k − 1 variables can be extended to a consistent assignment (u1, . . . , uk) for the first k variables. The case k = 2 follows from arc consistency. By Lemma 3, if i < j < k then either Rik(ui) ⊆ Rjk(uj) or Rjk(uj) ⊆ Rik(ui). Thus the set {Rik(ui) | i < k} is totally ordered by subset inclusion, and hence has a minimal element

Ri0k(ui0) = ⋂i<k Rik(ui)    (2)

for some i0 < k. Since the instance is arc consistent, Ri0k(ui0) ≠ ∅. By the definition of Rik(ui), it follows that (u1, . . . , uk) is a consistent assignment for the first k variables, for any choice of uk ∈ Ri0k(ui0). The time taken to calculate the intersections in (2) is at most O(d²q) overall, since each pair of values must be checked against each relevant constraint.
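The proof is constructive, and under its assumptions (arc consistency, non-empty domains, and the BTP for the given order) the solution-building step can be sketched directly; the encoding is the same illustrative one as in the previous sketch.

    def solve_btp(domains, relations, order):
        """Extend a partial solution variable by variable (cf. Theorem 7)."""
        def R(i, j):
            if (i, j) in relations:
                return relations[(i, j)]
            if (j, i) in relations:
                return {(v, u) for (u, v) in relations[(j, i)]}
            return {(u, v) for u in domains[i] for v in domains[j]}

        solution = {}
        for pos, k in enumerate(order):
            candidates = set(domains[k])
            for i in order[:pos]:
                # Intersect with R_ik(solution[i]), the values of k
                # compatible with the choice already made at i
                candidates &= {b for b in domains[k]
                               if (solution[i], b) in R(i, k)}
            # Arc consistency plus the BTP guarantee non-emptiness here
            solution[k] = next(iter(candidates))
        return solution

Since, by Lemma 3, the images Rik(ui) form a chain under inclusion, the running intersection is just the smallest image and is guaranteed non-empty.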
Theorem 8 The problem of finding a variable ordering for a binary CSP instance such that it satisfies the broken-triangle property with respect to that ordering (or determining that no such ordering exists) is solvable in polynomial time.

Proof: Given a CSP instance P, we define a new CSP instance P′ that has a solution precisely when there exists a suitable variable ordering for P. To construct P′, let O1, . . . , On be variables taking values in {1, . . . , n} representing positions in the ordering. We impose the ternary constraint

Ok < max{Oi, Oj}    (3)

for all triples of variables i < j < k in P such that the broken-triangle property fails to hold for some u ∈ Di, v ∈ Dj, and a, b ∈ Dk. The instance P′ then has a solution precisely if there is an ordering of the variables 1, . . . , n of P which satisfies the broken-triangle property. Note that if the solution obtained represents a partial order (for instance, if Oi and Oj are assigned the same value for some i ≠ j), then it can be extended to a total order which still satisfies all the constraints by using a linear-time topological sort. For each triple of variables in P, the construction of the corresponding constraints in P′ requires O(d⁴) steps to check which constraints to add. There are O(n³) such triples, so constructing instance P′ takes O(n³d⁴) steps, which is polynomial in the size of P. The constraints in P′ are all of the form (3), and such constraints are max-closed [14] (if p1 < max{q1, r1} and p2 < max{q2, r2} then max(p1, p2) < max{max(q1, q2), max(r1, r2)}). Max-closed constraints are a tractable constraint language [14]: any CSP instance with max-closed constraints can be solved by establishing generalized arc consistency [15] and then choosing the maximum element which remains in each variable domain. Since the size of P′ is polynomial in the size of P, it follows that the instance P′ can be solved in time polynomial in the size of P.

Because the BTP is conservative, any pre-processing operations which only perform domain reductions, such as arc consistency, path-inverse consistency [11], or neighbourhood substitution [10, 6], can be applied before looking for a variable ordering for which the broken-triangle property is satisfied; these reduction operations cannot destroy the broken-triangle property, but they can make it more likely to hold (and easier to check).

4 RELATED CLASSES

In this section we will show that the broken-triangle property generalizes two other known tractable classes: one based on language restrictions and one based on structural restrictions. Throughout this section we suppose that the values in the variable domains are totally ordered.

Definition 9 A binary relation Rij is right monotone if ∀b, c ∈ Dj, (a, b) ∈ Rij ∧ b < c ⇒ (a, c) ∈ Rij.

A commonly-used right monotone constraint is the inequality constraint Xi ≤ Xj. The complete relation is also right monotone.

Lemma 10 If the relations Rik, Rjk are both right monotone, then the broken-triangle property is satisfied on the triple of variables i < j < k, whatever the relation Rij.

Proof: Suppose that Rik, Rjk are both right monotone and that (u, v) ∈ Rij, (u, a) ∈ Rik and (v, b) ∈ Rjk. If a < b, then (u, b) ∈ Rik (since Rik is right monotone), and if a = b then (u, b) = (u, a) ∈ Rik trivially; if a > b, then (v, a) ∈ Rjk (since Rjk is right monotone).

Definition 11 Consider a binary CSP instance P. For a given variable ordering <, denote by parents<(k) the set of variables i < k such that Rik is proper.

Definition 12 A binary CSP instance is renamable right monotone with respect to a variable ordering < if, for each k ∈ {2, . . . , n}, there is an ordering of Dk such that Rik is right monotone for every i ∈ parents<(k).

Lemma 13 If a binary CSP instance is renamable right monotone with respect to a variable ordering <, then it satisfies the broken-triangle property with respect to <.

Proof: Suppose the CSP instance is renamable right monotone with respect to variable ordering <, and let k be any variable. Since the instance is renamable right monotone with respect to <, there is an ordering of Dk such that whenever i ∈ parents<(k) then Rik is right monotone. Now suppose i < j < k are variables in this ordering. Then each of Rik and Rjk is either the complete relation (and hence right monotone), or right monotone in its own right. By Lemma 10, the broken-triangle property is satisfied for i, j, k. Since the choice of k was arbitrary, it follows that the instance satisfies the BTP.

Lemma 14 If a CSP instance has a tree structure, then it satisfies the broken-triangle property with respect to any variable ordering in which each node occurs before its children.

Proof: If a CSP instance has tree structure, then any variable ordering < from any designated root to the leaves is such that |parents<(k)| ≤ 1 for every variable k. Hence, by Lemma 4, it satisfies the BTP with respect to that ordering.

Let TREE be the constraint satisfaction problem consisting of all instances that have tree structure, RRM be the CSP consisting of all instances that are renamable right monotone w.r.t. some variable ordering, and BTP be the CSP consisting of all instances which have the broken-triangle property w.r.t. some variable ordering. Note that the class RRM contains instances of arbitrary tree-width, for instance some CSPs where the constraint structure is a grid.
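Definition 9 is likewise easy to operationalize. Below is a small sketch, again with relations as sets of pairs, assuming the domain list is already sorted in the intended order; the encoding is ours, for illustration only.

    def right_monotone(relation, domain_j):
        """Check Definition 9: (a, b) in R and b < c implies (a, c) in R.

        relation: set of (a, b) pairs; domain_j: values of j in order."""
        position = {b: n for n, b in enumerate(domain_j)}
        for (a, b) in relation:
            for c in domain_j[position[b] + 1:]:
                if (a, c) not in relation:
                    return False
        return True

Combined with Lemma 10 and Lemma 13, showing that all proper relations into each variable are right monotone for some reordering of its domain is one cheap way to certify the BTP.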
Figure 2. An instance in BTP that is not in RRM or TREE.
Theorem 15 TREE ⊊ BTP and RRM ⊊ BTP.

Proof: The inclusions follow from Lemma 14 and Lemma 13; the instance shown in Figure 2 establishes the strict separations.
5 ALTERNATIVE CHARACTERIZATION
In this section we consider properties which are both conservative and preserved by taking subproblems. We show that the broken-triangle property is the only such property which ensures that the following desirable behaviour can be guaranteed simply by achieving a certain level of arc-consistency:

Definition 16 A CSP instance is universally backtrack-free with respect to an ordering < of its n variables if ∀k ∈ {2, . . . , n}, any consistent assignment for the first k − 1 variables can be extended to a consistent assignment for the first k variables.

Definition 17 Given a CSP instance I on variables 1, . . . , n, the subproblem I({i1, . . . , im}), where 1 ≤ i1 < i2 < . . . < im ≤ n, is the m-variable CSP instance with domains Di1, . . . , Dim and exactly those constraints of I whose scopes are subsets of {i1, . . . , im}.

Definition 18 A set Σ of CSP instances is inclusion-closed if ∀I ∈ Σ, all subproblems I(M) on subsets M of the variables of I also belong to Σ.

Definition 19 A binary CSP instance is directional arc-consistent with respect to a variable ordering <, if for all pairs of variables i < j, ∀a ∈ Di, ∃b ∈ Dj such that (a, b) ∈ Rij.

Proposition 20 A conservative inclusion-closed set Σ of CSP instances is such that the directional arc-consistency closure DAC(I) of every I ∈ Σ with respect to a variable ordering < is universally backtrack-free with respect to < if and only if ∀I ∈ Σ, DAC(I) satisfies the broken-triangle property with respect to <.

Proof: The argument used in the proof of Theorem 7 shows that if any binary CSP instance satisfies the broken-triangle property then its directional arc-consistency closure is universally backtrack-free. To prove the converse, suppose that Σ is a conservative inclusion-closed set of CSP instances and consider any I ∈ Σ. Since Σ is conservative, DAC(I) also belongs to Σ, since it is obtained from I by a sequence of domain reductions. In the following, we let Di denote the domain of variable i in DAC(I). Consider three variables i < j < k and four domain values u ∈ Di, v ∈ Dj, a, b ∈ Dk such that (u, v) ∈ Rij, (u, a) ∈ Rik and (v, b) ∈ Rjk. Denote by I′ the subproblem of DAC(I) on variables i, j, k and with reduced domain {a, b} for variable k. Establishing directional arc consistency in I′ may reduce the domains of variables i and j, but cannot delete v from the domain of variable j (since it has a support, namely b, at k) nor can it delete u from the domain of variable i (since it has supports at variables j and k). If DAC(I′) is universally backtrack-free, then the consistent assignment (u, v) for the variables (i, j) can be extended to a consistent assignment for (i, j, k), which must be either (u, v, a) or (u, v, b). This corresponds exactly to the definition of the broken-triangle property, and so DAC(I) satisfies the BTP.
6 GENERALIZING THE BTP
In this section we show that a weaker form of the broken-triangle property also implies backtrack-free search. This leads to a larger, but non-conservative, tractable class of CSP instances. Throughout this section, we assume that domains are totally ordered.

Definition 21 A binary CSP instance is min-of-max extendable with respect to the variable ordering <, if for all triples of variables i, j, k such that i < j < k, if (u, v) ∈ Rij, then (u, v, c) is a consistent assignment for (i, j, k), where c = min(max(Rik(u)), max(Rjk(v))). The symmetrically equivalent property max-of-min extendability is defined similarly, with c = max(min(Rik(u)), min(Rjk(v))).

Lemma 22 A binary CSP instance satisfies the broken-triangle property w.r.t. a variable ordering < if and only if it is min-of-max extendable w.r.t. < for all possible domain orderings.

Proof: Suppose that a CSP instance satisfies the broken-triangle property with respect to <, and consider an arbitrary ordering of each of the domains. To prove min-of-max extendability, it suffices to apply the broken-triangle property to a = max(Rik(u)) and b = max(Rjk(v)). Since a and b are maximal, it must be (u, v, min(a, b)) which is the consistent extension of (u, v). To prove the converse, suppose that a CSP instance is min-of-max extendable for all possible domain orderings. For any a, b ∈ Dk, consider an ordering of Dk for which a, b are the two maximal elements. The broken-triangle property then follows from the definition of min-of-max extendability.
Theorem 23 If a binary CSP instance is min-of-max extendable w.r.t. some known variable ordering < and some (possibly unknown) domain orderings, and is also directional arc-consistent with respect to <, then it is universally backtrack-free w.r.t. <, and hence can be solved in polynomial time.

Proof: Suppose that (u1, . . . , uk−1) is a consistent assignment for the variables (1, . . . , k − 1). By directional arc consistency, ∀i < k, Rik(ui) ≠ ∅. This means that c = min{max(Rik(ui)) : 1 ≤ i ≤ k − 1} is well-defined. Let j ∈ {1, . . . , k − 1} be such that c = max(Rjk(uj)). Let i be any variable in {1, . . . , k − 1} − {j}. Applying the definition of min-of-max extendability to variables i, j, k allows us to deduce that (ui, c) ∈ Rik. It follows that ∃uk ∈ Dk
(namely uk = c) such that (u1, . . . , uk) is a consistent assignment for the variables (1, . . . , k). Note that we used the ordering of domain Dk only to prove the existence of a consistent extension (u1, . . . , uk) of (u1, . . . , uk−1). A backtrack-free search algorithm need not necessarily choose uk = c and hence does not need to know the domain orderings.

Theorem 24 The problem of finding a variable ordering for a binary CSP instance with ordered domains such that it is min-of-max extendable w.r.t. that ordering (or determining that no such ordering exists) is solvable in polynomial time.

Proof: The requirements for the ordering are a subset of the requirements for establishing the broken-triangle property. Hence the result can be proved exactly as in the proof of Theorem 8.

We can use Theorem 24 in the following way: given a CSP instance with ordered domains, compute its arc consistency closure, and then test (in polynomial time) whether this reduced instance is min-of-max extendable for some ordering of its variables. If we find such an ordering, then the instance can be solved in polynomial time, by Theorem 23. However, this approach is not guaranteed to find all possible useful variable orderings achieving min-of-max extendability. Since min-of-max extendability is not a conservative property, it may be that, for some variable orderings, the directional arc-consistency closure is min-of-max extendable but the full arc-consistency closure is not (or vice versa). In fact we conjecture that, for a given binary CSP instance with fixed domain orderings, determining whether there exists some variable ordering such that the directional arc-consistency closure is min-of-max extendable with respect to that ordering is NP-complete. We also conjecture that determining whether a CSP instance is min-of-max extendable for some unknown domain orderings, even for a fixed variable ordering, is NP-complete. Finally, we show that min-of-max extendability is a generalization of a previously-identified hybrid tractable class based on row-convex constraints [18].

Definition 25 A CSP instance is row-convex (w.r.t. a fixed variable ordering and fixed domain orderings) if for all pairs of variables i < j, ∀u ∈ Di, Rij(u) is the interval [a, b] for some a, b ∈ Dj.

It is known that a directional path-consistent row-convex binary CSP instance is universally backtrack-free and hence tractable [18]. (However, it should be noted that establishing directional path consistency may destroy row-convexity.) Our interest in this hybrid tractable class is simply to demonstrate that it is a special case of min-of-max extendability.

Proposition 26 If a binary CSP instance is directional path-consistent and row-convex, then it is min-of-max extendable (and also max-of-min extendable).

Proof: Consider the triple of variables i < j < k and suppose that (u, v) ∈ Rij. By directional path consistency, ∃c ∈ Dk such that (u, c) ∈ Rik and (v, c) ∈ Rjk. By row-convexity, Rik(u) and Rjk(v) are intervals in the ordered domain Dk. The existence of c means that these intervals overlap. Both end-points of this overlap provide extensions of (u, v) to a consistent assignment for the variables (i, j, k). One end-point is given by min(max(Rik(u)), max(Rjk(v))), which ensures min-of-max extendability. (The other ensures max-of-min extendability.)
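The extension step in the proof of Theorem 23 is equally mechanical. Here is a sketch under the assumptions of the theorem (ordered domains, directional arc consistency, min-of-max extendability), with R(i, k) a caller-supplied function returning the canonical relation as a set of pairs; names are illustrative.

    def extend_min_of_max(solution, k, earlier, domains, R):
        """Extend a consistent assignment to variable k (cf. Theorem 23).

        solution: dict of values for the variables in `earlier`."""
        images = []
        for i in earlier:
            # R_ik(u_i): values of k compatible with the choice at i
            images.append({b for b in domains[k]
                           if (solution[i], b) in R(i, k)})
        # c = minimum, over the earlier variables, of the maximal
        # compatible value; directional arc consistency makes each
        # image non-empty, so c is well-defined
        c = min(max(img) for img in images)
        return c

Returning c gives one consistent choice; as the proof observes, a backtrack-free solver need not actually know the domain orderings, since c merely witnesses that some consistent value exists.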
7 CONCLUSION
We have described new hybrid tractable classes of binary CSP instances which significantly generalize tree-structured problems as well as previously-identified language-based and hybrid tractable classes. The new classes are based on local properties of ordered triples of variables. Moreover, we have shown that the problem of determining a variable ordering for which these properties hold is solvable in polynomial time. We see this work as a first step towards a complete characterization of all hybrid tractable classes of constraint satisfaction problems.
REFERENCES
[1] C. Bessière and J.-C. Régin, ‘Refining the basic constraint propagation algorithm’, in Proc. IJCAI’01, Seattle, WA, pp. 309–315, (2001).
[2] Andrei Bulatov, Peter Jeavons, and Andrei Krokhin, ‘Classifying the complexity of constraints using finite algebras’, SIAM Journal on Computing, 34(3), 720–742, (2005).
[3] Andrei A. Bulatov, ‘Tractable conservative constraint satisfaction problems’, in Proceedings of 18th IEEE Symposium on Logic in Computer Science (LICS 2003), 22–25 June 2003, Ottawa, Canada, pp. 321–330. IEEE Computer Society, (2003).
[4] Andrei A. Bulatov, ‘A dichotomy theorem for constraint satisfaction problems on a 3-element set’, Journal of the ACM, 53(1), 66–120, (2006).
[5] David Cohen, Peter Jeavons, and Marc Gyssens, ‘A unified theory of structural tractability for constraint satisfaction problems’, Journal of Computer and System Sciences, 74(5), 721–743, (2008).
[6] Martin C. Cooper, ‘Fundamental properties of neighbourhood substitution in constraint satisfaction problems’, Artificial Intelligence, 90(1–2), 1–24, (1997).
[7] R. Dechter and J. Pearl, ‘Network-based heuristics for constraint satisfaction problems’, Artificial Intelligence, 34(1), 1–38, (1987).
[8] Rina Dechter, ‘Tractable structures for constraint satisfaction problems’, in Handbook of Constraint Programming, eds., Francesca Rossi, Peter van Beek, and Toby Walsh, 209–244, Elsevier, (2006).
[9] Tomás Feder and Moshe Y. Vardi, ‘The computational structure of monotone monadic SNP and constraint satisfaction: A study through Datalog and group theory’, SIAM Journal of Computing, 28(1), 57–104, (1998).
[10] Eugene C. Freuder, ‘Eliminating interchangeable values in constraint satisfaction problems’, in Proc. AAAI-91, Anaheim, CA, pp. 227–233, (1991).
[11] Eugene C. Freuder and Charles D. Elfe, ‘Neighborhood inverse consistency preprocessing’, in Proc. AAAI/IAAI-96, Portland, OR, Vol. 1, pp. 202–208, (1996).
[12] Martin Grohe, ‘The structure of tractable constraint satisfaction problems’, in Proceedings of the 31st Symposium on Mathematical Foundations of Computer Science, volume 4162 of Lecture Notes in Computer Science, pp. 58–72. Springer-Verlag, (2006).
[13] P.G. Jeavons, ‘On the algebraic structure of combinatorial problems’, Theoretical Computer Science, 200, 185–204, (1998).
[14] P.G. Jeavons and M.C. Cooper, ‘Tractable constraints on ordered domains’, Artificial Intelligence, 79(2), 327–339, (1995).
[15] R. Mohr and G. Masini, ‘Good old discrete relaxation’, in Proceedings 8th European Conference on Artificial Intelligence (ECAI’88), ed., Y. Kodratoff, pp. 651–656. Pitman, (1988).
[16] J.K. Pearson and P.G. Jeavons, ‘A survey of tractable constraint satisfaction problems’, Technical Report CSD-TR-97-15, Royal Holloway, University of London, (July 1997).
[17] T.J. Schaefer, ‘The complexity of satisfiability problems’, in Proceedings 10th ACM Symposium on Theory of Computing, STOC’78, pp. 216–226, (1978).
[18] Peter van Beek and Rina Dechter, ‘On the minimality and decomposability of row-convex constraint networks’, Journal of the ACM, 42(3), 543–561, (1995).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-535
Justification-Based Non-Clausal Local Search for SAT
Matti Järvisalo and Tommi Junttila and Ilkka Niemelä¹
Abstract. While stochastic local search (SLS) techniques are very efficient in solving hard randomly generated propositional satisfiability (SAT) problem instances, a major challenge is to improve SLS on structured problems. Motivated by heuristics applied in complete circuit-level SAT solvers in electronic design automation, we develop novel SLS techniques by harnessing the concept of justification frontiers. This leads to SLS heuristics which concentrate the search into relevant parts of instances, exploit observability don’t cares and allow for an early stopping criterion. Experiments with a prototype implementation of the framework presented in this paper show up to a four orders of magnitude decrease in the number of moves on real-world bounded model checking instances when compared to WalkSAT on the standard CNF encodings of the instances.
1 INTRODUCTION

Advances in propositional satisfiability (SAT) testing have established SAT based methods as a competitive way of solving combinatorial problems in various domains. Stochastic local search (SLS) methods, such as [16, 15, 10, 3], are very efficient especially in solving randomly generated SAT instances. However, for structural real-world SAT instances complete DPLL based SAT solvers seem to dominate SLS solvers (see, e.g., results of the latest SAT competitions at http://www.satcompetition.org/). Further work on improving SLS techniques for structural problems is needed and, in particular, developing techniques for handling variable dependencies efficiently has been identified as a major challenge [7]. One problem in developing efficient techniques for handling variable dependencies is that typically the most efficient SLS solvers work on the flat CNF input format. Some techniques for CNF level SLS solvers have been developed to utilize propagation during search [2]. However, there seems to be room for novel structure-based SLS techniques exploiting variable dependencies more directly. Indeed, in SAT based approaches, direct CNF encodings of a problem domain are rarely used: the problem at hand is typically encoded with a structure-preserving general propositional formula φ which can then be translated into an equi-satisfiable CNF formula by introducing additional variables for the subformulas of φ. There are also SAT solvers which—instead of demanding CNF translation before solving—work directly on general formulas. Such solvers use Boolean circuits [11] as the compact representation for a general propositional formula in a DAG-like structure. However, such solvers are typically complete DPLL style non-clausal algorithms [5, 8, 9, 17]. Only a few SLS methods have been proposed for
Helsinki University of Technology, Dept. Information and Comp. Sci., Finland. Emails: {matti.jarvisalo,tommi.junttila,ilkka.niemela}@tkk.fi. Research supported by Academy of Finland (#122399 (MJ,IN) and #112016 (TJ)). MJ additionally acknowledges support from the HeCSE graduate school, Emil Aaltonen Foundation, Jenny and Antti Wihuri Foundation, Nokia Foundation, and Foundation for Technology Promotion TES.
general propositional formulas [14, 6, 12]. Common to these SLS approaches is that they attempt to explicitly exploit variable dependencies through independent (or input) variables, i.e., sets of variables such that a truth value assignment for them uniquely determines the truth values of all other variables, by focusing the search on truth assignments of input variables. In this paper we develop a novel non-clausal SLS method for structural SAT problems from a different starting point. Our aim is to bring structure-exploiting techniques into local search for SAT in order to lift the performance of local search SAT solving especially on structural real-world problem domains. We employ Boolean circuits as the representation of general propositional formulas. Motivated by justification frontier heuristics (see e.g. [9]) applied in complete circuit-level SAT solvers in electronic design automation, our search technique looks for a justification for the Boolean circuit instead of focusing on finding a satisfying truth assignment. The idea is to be able to drive local search more top-down in the overall structure of the circuit rather than in a bottom-up mode as is done in local search techniques focusing on input variables. This is achieved by guiding the search using justification frontiers, which enable exploiting observability don’t cares (see e.g. [13]), drive the search to relevant parts of the circuit, and offer an early stopping criterion which allows the search to be ended as soon as the circuit is de facto satisfied, even if no concrete satisfying truth assignment has been found. Experiments with a prototype implementation of the framework presented in this paper show up to a four orders of magnitude decrease in the number of moves on real-world bounded model checking instances when compared to WalkSAT on the standard CNF encodings of the instances. The rest of this paper is organized as follows. First, Boolean circuits and related central concepts are defined (Sect. 2). The proposed justification-based non-clausal SLS method is then described (Sect. 3) and analyzed w.r.t. both CNF level and previous non-clausal methods (Sect. 4). Initial experiments are presented in Sect. 5.
2 CONSTRAINED BOOLEAN CIRCUITS

Boolean circuits offer a natural non-clausal representation for propositional formulas in a compact DAG-like structure with subformula sharing. Rather than translating circuits to CNF for solving the resulting SAT instance by local search, in this work we will work directly on the Boolean circuit representation. A Boolean circuit over a finite set G of gates is a set C of equations of form g := f(g1, . . . , gn), where g, g1, . . . , gn ∈ G and f : {f, t}ⁿ → {f, t} is a Boolean function, with the additional requirements that (i) each g ∈ G appears at most once as the left hand side in the equations in C, and (ii) the underlying directed graph ⟨G(C), E(C)⟩, where E(C) = {⟨g′, g⟩ ∈ G × G | g := f(. . . , g′, . . .) ∈ C}, is acyclic. The set of gates in a circuit C is denoted by G(C). If ⟨g′, g⟩ ∈ E(C), then g′ is a child of g and g is a parent of g′. The
descendant and ancestor relations are defined in the usual way as the transitive closures of the child and parent relations, respectively. If g := f(g1, . . . , gn) is in C, then g is an f-gate (or of type f), otherwise it is an input gate. The set of input gates in C is denoted by inputs(C). A gate with no parents is an output gate. An assignment for C is a (possibly partial) function τ : G → {f, t}. A total assignment τ is consistent with C if τ(g) = f(τ(g1), . . . , τ(gn)) for each g := f(g1, . . . , gn) in C. A constrained Boolean circuit Cα is a pair ⟨C, α⟩, where C is a circuit and α is an assignment for C. Each ⟨g, v⟩ ∈ α is called a constraint where g is constrained to v (typically used for setting an output gate to a truth value). A total assignment τ for C satisfies Cα if (i) τ is consistent with C, and (ii) respects the constraints: τ ⊇ α. If some total assignment satisfies Cα, then Cα is satisfiable and otherwise unsatisfiable. In this work we consider Boolean circuits in which the following Boolean functions are available as gate types.

• NOT(v) is t iff v is f.
• OR(v1, . . . , vn) is t iff at least one of v1, . . . , vn is t.
• AND(v1, . . . , vn) is t iff all v1, . . . , vn are t.
• XOR(v1, v2) is t iff exactly one of v1, v2 is t.
However, notice that the techniques developed in this paper can be adapted for a wider range of types such as equivalence and cardinality gates. In order to keep the presentation and algorithms simpler, we assume that constraints only appear in the output gates of constrained circuits. Any circuit can be rewritten into such a normal form by using the rules in [5].

Example 1 Figure 1 shows a Boolean circuit for a full-adder with the constraint that the carry-out bit c1 is t. One satisfying total assignment for the circuit is

{⟨c1, t⟩, ⟨t1, t⟩, ⟨o0, f⟩, ⟨t2, f⟩, ⟨t3, t⟩, ⟨a0, t⟩, ⟨b0, f⟩, ⟨c0, t⟩}.    (1)
Figure 1. A constrained Boolean circuit Cα, where C = {c1 := OR(t1, t2), t1 := AND(t3, c0), o0 := XOR(t3, c0), t2 := AND(a0, b0), t3 := XOR(a0, b0)} and α = {⟨c1, t⟩}.
The restriction of an assignment τ to a set G′ ⊆ G of gates is defined as usual: τ|G′ = {⟨g, v⟩ ∈ τ | g ∈ G′}. Given a non-input gate g := f(g1, . . . , gn) and a value v ∈ {f, t}, a justification for the pair ⟨g, v⟩ is a partial assignment σ : {g1, . . . , gn} → {f, t} to the children of g such that f(τ(g1), . . . , τ(gn)) = v holds for all extensions τ ⊃ σ. That is, the values assigned by σ to the children of g are enough to force g to have the value v. A gate g is justified in an assignment τ if it is assigned, i.e. τ(g) is defined, and (i) it is an input gate, or (ii) g := f(g1, . . . , gn) ∈ C and τ|{g1,...,gn} is a justification for ⟨g, τ(g)⟩. For example, consider the gate t1 in Fig. 1. The possible justifications for ⟨t1, f⟩ are {⟨t3, f⟩}, {⟨t3, f⟩, ⟨c0, t⟩}, {⟨t3, f⟩, ⟨c0, f⟩}, {⟨c0, f⟩}, and {⟨t3, t⟩, ⟨c0, f⟩}; the first and fourth are the subset minimal ones. Gate t1 is justified in the assignment (1). Given a constrained circuit Cα and an assignment τ ⊇ α for C, the justification cone of Cα under τ, denoted by jcone(Cα, τ), is the minimal set of gates satisfying the following requirements.
1. All constrained gates belong to the cone. That is, if ⟨g, v⟩ ∈ α, then g ∈ jcone(Cα, τ).
2. If a justified gate belongs to the cone, then all the gates that participate in some subset minimal justification for the gate are also in the cone. Formally, if g ∈ jcone(Cα, τ) and (i) g is a non-input gate, (ii) g is justified in τ, and (iii) ⟨gi, vi⟩ ∈ σ for some subset minimal justification σ for ⟨g, τ(g)⟩, then gi ∈ jcone(Cα, τ).

In principle it would be sufficient to consider only one, arbitrarily chosen subset minimal justification. However, such a formalization would make jcone(Cα, τ) ambiguously defined. The justification frontier of Cα under τ is the “bottom edge” of the justification cone, i.e. those gates in the cone that are not justified: jfront(Cα, τ) = {g ∈ jcone(Cα, τ) | g is not justified in τ}. A gate g is interesting in τ if it belongs to the frontier jfront(Cα, τ) or is a descendant of a gate in it; the set of all gates interesting in τ is denoted by interest(Cα, τ). A gate g is an (observability) don’t care if it is neither interesting nor in the justification cone jcone(Cα, τ). For instance, consider the constrained circuit Cα in Fig. 1. Under the assignment τ = {⟨c1, t⟩, ⟨t1, t⟩, ⟨o0, f⟩, ⟨t2, f⟩, ⟨t3, t⟩, ⟨a0, f⟩, ⟨b0, f⟩, ⟨c0, t⟩}, the justification cone jcone(Cα, τ) is {c1, t1, t3, c0}, the justification frontier jfront(Cα, τ) is {t3}, interest(Cα, τ) = {t3, a0, b0}, and the gates t2 and o0 are don’t cares.

Proposition 1 If the justification frontier jfront(Cα, τ) is empty for some total assignment τ, then the constrained circuit Cα is satisfiable.

When jfront(Cα, τ) is empty, a satisfying assignment can be obtained by (i) restricting τ to the input gates appearing in the justification cone, i.e. to the gate set jcone(Cα, τ) ∩ inputs(C), (ii) assigning other input gates arbitrary values, and (iii) recursively evaluating the values of non-input gates. Thus, whenever jfront(Cα, τ) is empty, we say that τ de facto satisfies Cα. As an example, the assignment τ = {⟨c1, t⟩, ⟨t1, f⟩, ⟨o0, f⟩, ⟨t2, t⟩, ⟨t3, t⟩, ⟨a0, t⟩, ⟨b0, t⟩, ⟨c0, t⟩} de facto satisfies the constrained circuit Cα in Fig. 1. Also note that if a total truth assignment τ satisfies Cα, then jfront(Cα, τ) is empty.

Translating Circuits to CNF. Each constrained Boolean circuit Cα can be translated into an equi-satisfiable CNF formula cnf(Cα) by applying the standard “Tseitin translation”. In order to obtain a small CNF formula, the idea is to introduce a variable g̃ for each gate g in the circuit, and then to describe the functionality of each gate with a set of clauses. For instance, an AND-gate g := AND(g1, . . . , gn) is translated into the clauses (¬g̃ ∨ g̃1), . . . , (¬g̃ ∨ g̃n), and (g̃ ∨ ¬g̃1 ∨ . . . ∨ ¬g̃n). The constraints are translated into unit clauses: ⟨g, t⟩ ∈ α introduces the unit clause (g̃) and ⟨g, f⟩ ∈ α the negated unit clause (¬g̃).

A Note on Negations. As usual in many SAT algorithms, we will implicitly ignore NOT-gates of form g := NOT(g1); g and g1 are always assumed to have the opposite values. Thus NOT-gates are, for instance, (i) “inlined” in the cnf translation by substituting ¬g̃1 for g̃, and (ii) never counted in an interest set interest(Cα, τ).
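The cone and frontier computations are straightforward to prototype. In the sketch below a circuit is a dict mapping each non-input gate to its type and child list; since the assignment is total, a gate is justified exactly when its value agrees with the evaluation of its children, which for the gate types above coincides with the definition. The encoding and names are ours, for illustration only.

    def gate_value(op, vals):
        """Evaluate one gate type on a list of child values."""
        if op == "AND": return all(vals)
        if op == "OR":  return any(vals)
        if op == "XOR": return vals[0] != vals[1]
        if op == "NOT": return not vals[0]

    def justified(g, circuit, tau):
        """Input gates are always justified; under a total assignment a
        non-input gate is justified iff its value matches its evaluation."""
        if g not in circuit:
            return True
        op, children = circuit[g]
        return gate_value(op, [tau[c] for c in children]) == tau[g]

    def minimal_support(op, children, tau, v):
        """Children occurring in some subset-minimal justification of v."""
        if (op == "AND" and v) or (op == "OR" and not v):
            return list(children)                       # all children needed
        if op == "AND":
            return [c for c in children if not tau[c]]  # any false child
        if op == "OR":
            return [c for c in children if tau[c]]      # any true child
        return list(children)                           # XOR, NOT

    def jfront(circuit, alpha, tau):
        """Justification frontier: unjustified gates in the cone."""
        cone, stack = set(), list(alpha)
        while stack:
            g = stack.pop()
            if g in cone:
                continue
            cone.add(g)
            if g in circuit and justified(g, circuit, tau):
                op, children = circuit[g]
                stack.extend(minimal_support(op, children, tau, tau[g]))
        return {g for g in cone if not justified(g, circuit, tau)}

On the full-adder of Fig. 1 with the assignment displayed above, this returns the frontier {t3}, matching the example.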
3 JUSTIFICATION-BASED NON-CLAUSAL SLS

In contrast to typical local search algorithms for SAT, which work on CNF formulas, we develop justification-based non-clausal stochastic
local search techniques. As typical in clausal SLS, a configuration is described by a total truth assignment. However, our method works directly on general propositional formulas represented as Boolean circuits, and hence a configuration is a total assignment on the gates of the Boolean circuit at hand. In contrast to typical local search for SAT, we exploit—motivated by successful implementations of complete circuit SAT solving techniques (see, e.g., [9])—techniques for detecting justification-based don’t cares within our Boolean circuit SAT local search (BC SLS) framework. This is based on justification frontiers, which guide the search heuristics to concentrate on relevant parts of the instance and, moreover, provide an alternative, early stopping criterion for the search. We demonstrate the novel approach by developing a WalkSAT type algorithm [15] that exploits justification frontiers in guiding search. In the clausal WalkSAT, local moves are based on randomly selecting a clause falsified by the current truth assignment. In our algorithm the role of the falsified clauses is played by the gates in the justification front, i.e., the gates in the justification cone not justified by the current assignment. WalkSAT flips one of the variables in the chosen clause in the greedy move to maximize the decrease in the number of falsified clauses. In our case a greedy move selects a justification for the chosen gate to minimize the number of interesting gates. The resulting method is presented as Algorithm 1. Given a constrained circuit Cα and a noise parameter p ∈ [0, 1] (with p = 0 only greedy moves are made), the algorithm performs local search over the assignment space of all the gates in C (inner loop on lines 3–13).

Algorithm 1 BC SLS
Input: constrained Boolean circuit Cα, parameter p ∈ [0, 1]
Output: a de facto satisfying assignment for Cα or “don’t know”
Explanations: τ: current truth assignment on all gates with τ ⊇ α; δ: next move (a partial assignment)
1: for try := 1 to MAXTRIES(Cα) do
2:   τ := pick an assignment over all gates in C s.t. τ ⊇ α
3:   for move := 1 to MAXMOVES(Cα) do
4:     if jfront(Cα, τ) = ∅ then return τ
5:     Select a random gate g ∈ jfront(Cα, τ)
6:     with probability (1 − p) do  %greedy move
7:       δ := a random justification from those justifications for ⟨g, v⟩ ∈ τ that minimize cost(τ, ·)
8:     otherwise  %non-greedy move (with probability p)
9:       if g is unconstrained in α
10:        δ := {⟨g, ¬v⟩} where ⟨g, v⟩ ∈ τ
11:      else
12:        δ := a random justification for ⟨g, v⟩ ∈ τ
13:    τ := (τ \ {⟨g, ¬w⟩ | ⟨g, w⟩ ∈ δ}) ∪ δ
14: return “don’t know”
3.1 Stopping Criterion Similar to typical CNF level SLS methods, one could terminate the search in BC SLS by applying the standard stopping criterion: when all gates are justified in the current configuration τ , then τ is in itself a satisfying truth assignment for the circuit. However, the justification frontier allows for an early stopping criterion by Proposition 1: when the current front jfront(C α , τ ) is empty (line 4), the current
537
configuration τ de facto satisfies C α . Thus we can obtain from τ a satisfying assignment after the search is terminated by simply evaluating the unconstrained gates in C α by using the values for input gates in τ . This is a stronger stopping criterion than the standard one, since the front is empty whenever the standard one holds, but the opposite does not necessarily hold: the front can be empty even if there are gates in the circuit which are not justified in τ .
3.2 Making Moves For each of the M AX T RIES(C α ) runs of the inner loop, M AX M OVES(C α ) moves are made. The moves exploit structural information and semantics of individual gates for finding a justification for the currently assigned value of a chosen gate (lines 6-12). Given the current configuration τ , we concentrate on making moves on gates in jfront(C α , τ ) by randomly picking a gate g from this set. For a gate g and its current value v in τ , the possible greedy moves are induced by the justifications for g, v. The idea is to minimize the size of the interest set. In other words, the value of the cost function for a move (justification) δ is ˛ ˛ cost(τ, δ) = ˛interest(C α , τ )˛, where τ = (τ \ {g, ¬w | g, w ∈ δ}) ∪ δ. That is, the cost of a move δ is given by the size of the interest set in the configuration τ where for the gates mentioned in δ we use the values in δ instead of those in τ . The move is then selected randomly from those justifications δ for g, v for which the value cost(τ, δ) is smallest over all justifications for g, v. During a non-greedy move (lines 9-12, executed with probability p), we invert the value of the gate g itself whenever this is possible, i.e., when g is not constrained in α. The idea here is to try to escape from possible local minima by more radically changing the justification front, most likely upwards in the circuit structure. In the case that we may not invert the value of g (since it is constrained), the move is chosen randomly from the set of all justifications for g, v ∈ τ .
4 ANALYSIS 4.1 Interest Set Size Driven Greedy Moves Considering greedy moves, the objective function under minimization in BC SLS is cost(τ, ·). Alternatively, one could use the objective of minimizing |jfront(C α , τ )|, since (i) flipping is concentrated on gates in jfront(C α , τ ) and (ii) the stopping criterion jfront(C α , τ ) = ∅ is used. The reasoning behind choosing to minimize the number of gates in interest(C α , τ ) is that it gives a better progress measure than minimizing the number of gates in the justification front. First, notice that the justification front cannot become empty before it reaches a subset of the input gates, since only input gates are justified by default. Now, the size of the interest set gives an upper bound on the number of gates that still need to be justified (the descendants of the gates in the front). Following this intuition, by minimizing the size of the interest set the greedy moves drive the search towards the input gates.
4.2 Comparison with Clausal Methods One of the main advantages of the proposed BC SLS method over clausal local search methods is that BC SLS can exploit observability don’t cares. As an example, consider the circuit in Fig. 2(a), where the gate g1 is constrained to true and the other t and f symbols depict
538
M. Järvisalo et al. / Justification-Based Non-Clausal Local Search for SAT
the current configuration τ. All the gates in the complex subcircuit rooted at the gate g2, except g6, are don't cares under τ. Therefore BC SLS can ignore the subcircuit and terminate after flipping the input gate g5, as the justification front becomes empty. However, assume that we translate the circuit into a CNF formula by using the Tseitin translation cnf given in Sect. 2. If we apply a clausal SLS algorithm such as WalkSAT on the CNF formula, observability don't cares are no longer available, in the sense that the algorithm must find a total truth assignment that simultaneously satisfies all the clauses originating from the subcircuit. This can be a very complex task.

Figure 2. Example circuits: (a) exploiting don't cares; (b) a CNF circuit.
We can also analyze how BC SLS behaves on flat clausal input. To do this, we associate a CNF formula F = C1 ∧ . . . ∧ Ck with a constrained CNF circuit ccirc(F) = ⟨C, α⟩ as follows. Take an input gate g_x for each variable x occurring in F. Now

    C = { g_Ci := OR(g_l1, . . . , g_lm) | Ci = (l1 ∨ . . . ∨ lm) } ∪ { g_¬x := NOT(g_x) | ¬x ∈ ∪_{i=1..k} Ci },

and the constraints force each "clause gate" g_Ci to true: α = { ⟨g_Ci, t⟩ | 1 ≤ i ≤ k }. This is illustrated in Fig. 2(b) for F = (x1 ∨ ¬x2) ∧ (¬x2 ∨ x3 ∨ x4). When BC SLS is run on a CNF circuit, it can only flip input variables. If input gates were excluded from the set interest(C^α, τ) of interesting gates, then |interest(C^α, τ)| would equal the number of unjustified clause gates in the configuration τ. Thus the greedy move cost function cost(τ, ·) would equal the one applied in WalkSAT, measuring the number of clauses fixed/broken by a flip. Since input gates are included in interest(C^α, τ), the BC SLS cost function also measures, in CNF terms, the number of variables occurring in unsatisfied clauses.
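This construction is straightforward to express in code. The sketch below (an illustration, not the authors' implementation) builds the gate definitions and constraints from a DIMACS-style clause list:

    def ccirc(cnf):
        """cnf: list of clauses, each a list of ints (DIMACS-style literals)."""
        gates, constraints = {}, {}
        for clause in cnf:                       # g_{~x} := NOT(g_x) per negative literal
            for lit in clause:
                if lit < 0 and ('n', -lit) not in gates:
                    gates[('n', -lit)] = ('NOT', [('x', -lit)])
        for i, clause in enumerate(cnf):         # g_{C_i} := OR(...) per clause
            children = [('n', -l) if l < 0 else ('x', l) for l in clause]
            gates[('c', i)] = ('OR', children)
            constraints[('c', i)] = True         # alpha forces each clause gate to true
        return gates, constraints

    # Example: F = (x1 v ~x2) & (~x2 v x3 v x4), as in Fig. 2(b)
    gates, alpha = ccirc([[1, -2], [-2, 3, 4]])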
4.3 Comparison with Non-Clausal Methods

The SLS techniques working directly on non-clausal problems closest to our work include [14, 6, 12]. They are all based on the idea of limiting flipping to input (independent) variables, whereas we allow flipping all gates (subformulas) of the problem instance. Moreover, in these approaches the greedy part of the search is driven by a cost function which is substantially different from the justification-based cost function that we employ. Sebastiani [14] generalizes the GSAT heuristic to general propositional formulas and defines the cost function by (implicitly) considering the CNF form cnf(φ) of the general formula φ: the cost of a truth assignment is the number of clauses in cnf(φ) falsified by the assignment. The approaches of Kautz and Selman [6] and Pham et al. [12] both use a Boolean circuit representation of the problem and employ a cost function which, given a truth assignment for the input gates, counts the number of constrained output gates falsified by the assignment. This cost function provides limited guidance to greedy moves in cases where there are few constrained output gates or they are far from the input gates. A worst-case scenario occurs when the Boolean circuit given as input has a single output gate, implying that the cost function can only take the values 0 or 1 for
any flip under any configuration. Such a cost function does not offer much direction for the greedy flips towards a satisfying truth assignment. Our cost function appears to be less sensitive to the number of output gates or their distance from the input gates. This is because the search is based on the concept of a justification frontier which is able to distribute the requirements implied by the constrained output gates deeper in the circuit.
5 EXPERIMENTS

In order to evaluate the ideas behind the BC SLS framework, we have implemented a prototype on top of the bc2cnf Boolean circuit simplifier/CNF translator [4]. The computation of the justification cone is implemented directly by the definition. When making greedy and random moves, justifications are selected from the set of subset-minimal justifications for the gate value; for a true OR-gate and a false AND-gate, the value of a single child is inverted, and for a false OR-gate and a true AND-gate the values of all children are inverted.

As structural benchmarks we use a set of Boolean circuits encoding bounded model checking of asynchronous systems for deadlocks [1], available at http://www.tcs.hut.fi/~mjj/benchmarks/. Although rather easy for current DPLL solvers, these benchmarks are challenging for typical SLS methods. Since our implementation is at present a very preliminary non-incremental one, we compare the number of moves made by WalkSAT and our prototype (see footnote 2). We use WalkSAT since the current prototype—as explained also in Sect. 3—can basically be seen as a justification-based variation of WalkSAT. For running WalkSAT, we apply exactly the same Boolean circuit level simplification in bc2cnf to the circuits as in our prototype (including, e.g., circuit level propagation that is equivalent to unit propagation), and then translate the simplified circuit to CNF with the Tseitin-style translation implemented in bc2cnf. We run both WalkSAT and our prototype implementation with the default noise value p = 0.5 (that is, 50%). To make the evaluation fair (not favoring our prototype), we allow WalkSAT 10^8 moves and limit our implementation to a maximum of 10^6 moves. Each instance is run 9 times without restarts.

The number of gates in the simplified circuits (column #gates), and the number of variables (#vars) and clauses (#clauses) resulting from the standard CNF translation, are given in Table 1. Furthermore, the minimum (min), median (med), and maximum (max) number of moves for each instance is presented. The number of runs without a satisfying truth assignment is given in the column max in parentheses. Additionally, we give the ratio of the number of moves made by our prototype and WalkSAT for the minimum, median, and maximum number of moves done by the solvers. For example, the max/max ratio of 533.43 for the instance speed 1.fsa-b10-s means that the maximum number of moves made by WalkSAT over the nine runs was 533.43 times as large as the maximum number of moves done by our implementation on that instance.

To sum up, the experiments demonstrate the potential of our novel approach for solving structural (non-clausal) SAT instances. A promising observation is that our justification frontier based technique seems to keep the search rather focused as the size of the instance grows, as witnessed by the modestly increasing number of moves. In particular, this compares favorably to WalkSAT, which typically exceeds the cutoff of 10^8 moves as the instance sizes grow.
Footnote 2: The prototype computes the justification front and cone repeatedly in a global, non-incremental way. This naive implementation makes around 80–250 times fewer flips per second (fps) than WalkSAT on instances with 1000–2500 gates. By a careful re-implementation that incrementally computes the front and cone, a very substantial increase in the fps rate is expected.
Table 1. Comparison of a prototype implementation of BC SLS with WalkSAT on the speed 1.fsa, dp 12.fsa, elevator and mmgt benchmark families (39 instances). For each instance the table lists the number of gates in the simplified circuit (#gates), the number of variables (#vars) and clauses (#clauses) of the standard CNF translation, the min/med/max number of moves for BC SLS and for WalkSAT (the number of runs without a satisfying truth assignment in parentheses in the max column), and the relative gain in #moves as min/min, med/med and max/max ratios.
Considering the input-flipping SLS methods in the literature (recall Sect. 4.3), we were unfortunately unable at the moment to obtain implementations of these methods for comparison. Comparing input-flipping methods to our current framework thus remains an important aspect of future work. We also investigated the performance of AdaptNovelty+ [3] on the benchmarks; we omit the precise results here due to space reasons. On the whole, although AdaptNovelty+ does find satisfying truth assignments for more instances than WalkSAT using the cutoff of 10^8 moves, our prototype typically shows a one-to-three orders of magnitude reduction in the number of moves compared to AdaptNovelty+—rather similarly as when compared to WalkSAT.
6 CONCLUSIONS

Motivated by techniques applied in circuit-level SAT solvers in electronic design automation, we present a novel approach to solving structural SAT problems with local search on the non-clausal level. By incorporating justification frontiers, we develop SLS heuristics which concentrate the search into relevant parts of instances, exploit observability don't cares and allow for an early stopping criterion. Encouraged by the potential witnessed by the low move counts of a prototype implementation, we see various directions for further work. We plan to replace the prototype with a proper solver implementation with specialized data structures. For achieving self-tuning of the greediness parameter for effectively escaping from local minima, developing adaptive noise mechanisms [3] for non-clausal SLS is a topic for further work. Another aspect is to investigate the effect of adding local consistency checking (on the circuit level, extending studies on adding propagation to CNF-level SLS [2]) into the framework, and possibly even conflict learning.
REFERENCES
[1] K. Heljanko, 'Bounded reachability checking with process semantics', in CONCUR, volume 2154 of LNCS, pp. 218–232. Springer, (2001).
[2] E.A. Hirsch and A. Kojevnikov, 'UnitWalk: A new SAT solver that uses local search guided by unit clause elimination', Annals of Mathematics and Artificial Intelligence, 43(1), 91–111, (2005).
[3] H.H. Hoos, 'An adaptive noise mechanism for WalkSAT', in AAAI, pp. 655–660, (2002).
[4] T. Junttila. The BC package and a file format for constrained Boolean circuits. http://www.tcs.hut.fi/~tjunttil/bcsat/.
[5] T. Junttila and I. Niemelä, 'Towards an efficient tableau method for Boolean circuit satisfiability checking', in CL 2000, volume 1861 of LNAI, pp. 553–567. Springer, (2000).
[6] H. Kautz, D. McAllester, and B. Selman, 'Exploiting variable dependency in local search', in IJCAI poster session, (1997). http://www.cs.rochester.edu/u/kautz/papers/dagsat.ps.
[7] H.A. Kautz and B. Selman, 'The state of SAT', Discrete Applied Mathematics, 155(12), 1514–1524, (2007).
[8] A. Kuehlmann, M.K. Ganai, and V. Paruthi, 'Circuit-based Boolean reasoning', in DAC, pp. 232–237. ACM, (2001).
[9] A. Kuehlmann, V. Paruthi, F. Krohm, and M.K. Ganai, 'Robust Boolean reasoning for equivalence checking and functional property verification', IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 21(12), 1377–1394, (2002).
[10] D. McAllester, B. Selman, and H. Kautz, 'Evidence for invariants in local search', in AAAI, pp. 321–326, (1997).
[11] C. Papadimitriou, Computational Complexity, Addison-Wesley, 1995.
[12] D.N. Pham, J. Thornton, and A. Sattar, 'Building structure into local search for SAT', in IJCAI, pp. 2359–2364, (2007).
[13] S. Safarpour, A. Veneris, R. Drechsler, and J. Lee, 'Managing don't cares in Boolean satisfiability', in DATE'04. IEEE, (2004).
[14] R. Sebastiani, 'Applying GSAT to non-clausal formulas', Journal of Artificial Intelligence Research, 1, 309–314, (1994).
[15] B. Selman, H.A. Kautz, and B. Cohen, 'Noise strategies for improving local search', in AAAI, pp. 337–343, (1994).
[16] B. Selman, H. Levesque, and D. Mitchell, 'A new method for solving hard satisfiability problems', in AAAI, pp. 440–446, (1992).
[17] C. Thiffault, F. Bacchus, and T. Walsh, 'Solving non-clausal formulas with DPLL search', in CP, volume 3258 of LNCS, pp. 663–678. Springer, (2004).
Multi-valued Pattern Databases

Carlos Linares López¹

¹ Planning and Learning Group, Universidad Carlos III de Madrid. Avda. de la Universidad, 30 - 28911 Leganés, Madrid (Spain), email: carlos.linares@uc3m.es

Abstract. Pattern databases were a major breakthrough in heuristic search, solving hard combinatorial problems various orders of magnitude faster than the state-of-the-art techniques at that time. Since then, they have received a lot of attention. Moreover, pattern databases are also researched in conjunction with other domain-independent techniques for solving planning tasks. However, they are not the only technique for improving heuristic estimates. Although more modest, perimeter search can also lead to significant improvements in the number of generated nodes and the overall running time. Therefore, whether they can be combined or not is a natural and interesting issue. While other researchers have recently proven that a joint application of both ideas (termed multiple goal) leads to no progress at all, it is shown here that there are other alternatives for putting both techniques together—denoted here as multi-valued. This paper shows that multi-valued pattern databases can still improve the performance of standard (or single-valued) pattern databases in practice. It also examines how to enhance memory usage when comparing multi-valued pattern databases in contraposition to various single-valued standard pattern databases.
1 Introduction
Heuristics play a central role in problem-solving by guiding search algorithms towards the goal state from an arbitrary state, anywhere in the state space. Before the conception of pattern databases [1], heuristics were usually either handcrafted or directly derived by relaxing the original constraints of the problem at hand. In other words, pattern databases are an automatic means for deriving heuristic functions which are usually far better informed than others, thus leading to large improvements in the number of nodes generated and the overall running time. However, since pattern databases can take large chunks of main memory, various alternatives have been explored to efficiently use the available memory. On the one hand, it has been shown that pattern databases can be successfully compressed, at least in some domains like the Towers of Hanoi [5]. Also, it has been shown that pattern databases can be mapped re-using the same symbol instead of using a one-to-one mapping [9], as originally suggested. Although pattern databases can lead to further improvements by exploiting some domain-specific properties (e.g. reflections in the definition of the state or intrinsic characterizations of permutation state spaces [7]), they have also been used for solving planning tasks in conjunction with other domain-independent techniques [3], with very good results. In contrast to pattern databases, perimeter search [2, 12] aims at improving an existing heuristic function, instead of automatically
generating a new one. Although this technique has usually been employed for solving large sets of instances with respect to the same target, it could be broadly used when solving problems for distinct goal nodes. Altogether, pattern databases can be used for automatically deriving heuristic functions, and perimeter search serves for improving their estimates. Hence, whether they can be combined or not is an interesting issue which has already been addressed [6]. However, the first results in this regard showed that perimeter search leads to no benefit at all. In this paper, a different technique for combining both ideas is discussed.
2 Background
This section succinctly reviews the main concepts underlying both perimeter search and pattern databases. The interested reader should refer to the cited papers for further information.
2.1 Perimeter Search

Perimeter search was independently and simultaneously introduced in the specialized bibliography by Giovanni Manzini [12] and by John F. Dillenburg and Peter C. Nelson [2]. The key observation of these researchers is that the main problems in bidirectional search come from the fact that both searches progress simultaneously. They proposed, instead, to generate a set of nodes (known as perimeter nodes) around the target node whose descendants exceed a given threshold d (known as the perimeter depth), and only after it has been generated, to start a unidirectional search from the source state until a collision with a perimeter node is detected. From this point of view, perimeter search might be seen as a simpler form of bidirectional search. However, the most prominent feature of this contribution is that it provides a means for automatically improving an existing heuristic function h(·), since the unidirectional search from an arbitrary state n uses the following, better informed, heuristic function h_d:

    h_d(n, t) = min_{m ∈ P_d} { h(n, m) + h*(m, t) }        (1)
where Pd is the perimeter set comprising all nodes generated at depth d from the target, t, and h∗ (m, t) is the optimal cost of reaching the goal from the perimeter node m. Although it can be argued that using perimeter search involves “as many heuristic calculations as there are perimeter nodes” [11], the truth is that this number decreases with the depth of the forward search [12].
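A minimal sketch of equation (1), assuming the perimeter set and its exact costs h*(m, t) were precomputed when the perimeter was generated (the helper names are hypothetical):

    def h_d(n, perimeter, h):
        """perimeter: list of (m, h_star) pairs with h_star = h*(m, t);
        h: the base heuristic function h(n, m)."""
        return min(h(n, m) + h_star for m, h_star in perimeter)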
2.2 Pattern Databases
In their original work [1], Joseph C. Culberson and Jonathan Schaeffer defined patterns as abstractions of the original state space where
each constant appearing in the state space gets replaced by either a dedicated symbol or a special "don't care" symbol. The granularity of the abstraction is defined as the number of constants in the original state being replaced by the same symbol [8]. For example, γ = ⟨3, 3, 2, 1⟩ denotes an abstraction where three constants are replaced by one symbol (say x1); another three are replaced by a new symbol x2; another two constants by a third symbol, x3; and the last constant by a unique symbol, x4. Although it has not been mentioned before in the related literature, it can be easily proven that the number of patterns generated with a given granularity γ is:

    ∏_{i=1}^{|γ|} C_{N − Σ_{j=1}^{i−1} γ_j , γ_i}        (2)
where C_{n,m} is the number of combinations of n elements choose m, and N is the total number of constants in the original state space, so that N = Σ_i γ_i. Thus, the previous granularity gives rise to:
    C_{9,3} × C_{6,3} × C_{3,2} × C_{1,1} = 5,040

different entries. Pattern databases are simply hash tables which store, for every pattern (or arrangement of symbols in the abstracted state), the minimum number of moves required to place the symbols of the abstracted state space in their goal location—also known as the goal pattern. This value can be easily computed with a backwards brute-force breadth-first search from the goal pattern. As such, pattern databases are admissible heuristic functions. The index into the pattern database assigned to each pattern results from a ranking function, which converts each item in a collection into a scalar and is (usually by far) the most expensive operation in searching with pattern databases. Originally, all moves were counted in, so that when comparing the values retrieved from different pattern databases (for a collection of different patterns), the only way of getting an admissible heuristic is just to take the MAX of all values. However, when the constants appearing in the original state space can be split into disjoint sets (as in the N-puzzle or the Towers of Hanoi, but not in the Rubik's cube or the TopSpin puzzle), a far better informed heuristic function can be built by computing the summation of all values [10]. This idea is known as disjoint, or just ADD, pattern databases.
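Equation (2) is easy to check mechanically. The following sketch (an illustration in Python, not code from the paper) counts the patterns induced by a granularity and reproduces the ⟨3, 3, 2, 1⟩ example above:

    from math import comb

    def num_patterns(gamma):
        n = sum(gamma)                     # N: total number of constants
        total, remaining = 1, n
        for g in gamma:
            total *= comb(remaining, g)    # C_{N - sum of previous gamma_j, gamma_i}
            remaining -= g
        return total

    assert num_patterns([3, 3, 2, 1]) == 5040
    assert num_patterns([1]*7 + [9]) == 57657600   # 7 pattern tiles + 9 "don't care" constants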
3 Combining Perimeter Search and Pattern Databases

As mentioned in the introduction, the main contribution of this work consists of discussing a different way than that previously proposed in [6] for putting together both perimeter search and pattern databases.

3.1 Multiple Goal Pattern Databases

The first approach consists of addressing the combination as a multiple goal problem, i.e. a special case of heuristic search where the problem consists of hitting any of the perimeter nodes generated at depth d. A simple, yet beautiful, way of solving this sort of problem with the aid of pattern databases consists of seeding the queue used in the backward breadth-first search with all the perimeter nodes [11]. This way, the pattern database will store a unique value per entry: the minimum distance to all perimeter nodes.

The apparent advantage of this approach is that while standard pattern databases explore the abstracted search space around the goal node, the perimeter generation starts by considering the original state space up to a pre-defined perimeter depth. Nevertheless, the same state can be mapped, in different pattern databases, to different entries which contain the minimum distance to different perimeter nodes, so that comparisons become more difficult. In other words, this idea is likely to produce very poor estimates by comparing the minimum distance to different perimeter nodes. (Indeed, though not explicitly mentioned in [6], the termination condition might also become more difficult now, since it is not strictly true that when various pattern databases return zero a collision with a perimeter node has been detected: maybe they are all referring to different nodes!) Besides, Ariel Felner and Nir Ofek [6] experimentally showed, and empirically proved, that this approach leads to no improvement at all, i.e., it generates the same number of nodes. Their explanation can be intuitively depicted as follows: the only expected benefit under this scheme is that patterns occurring within the perimeter are now assigned better heuristic estimates, since patterns appearing beyond the perimeter set still get the same minimum distance. Comparing the number of patterns within the perimeter with all the plausible patterns gives a very small ratio in favour of this approach.

3.2 Multi-valued Pattern Databases

Instead, it is suggested herein to store separately the distance to each perimeter node in the pattern database, as shown in Figure 1. Thereby, comparisons with respect to the same perimeter node become feasible, leading to a better informed heuristic function, as discussed in Section 2.1. This is, indeed, the most natural way to implement perimeter search. Since every entry contains a vector of values instead of a scalar, this technique is denoted as multi-valued pattern databases, in contraposition to standard single-valued pattern databases, which consist of a unique value per entry.
Figure 1. Seeding a different queue with every perimeter node: for the perimeter nodes m_1, m_2, ..., m_j of target t, entry i of the multi-valued pattern database stores the vector of distances ⟨h_i[1], h_i[2], ..., h_i[j]⟩, one component per perimeter node.
At first glance, it might seem that this approach wastes a lot of space in main memory. However, this is not the case at all in the vast majority of cases. Consider, for example, the 15-Puzzle and a single-valued pattern database consisting of 7 different symbols, i.e. ⟨1, 1, 1, 1, 1, 1, 1, 9⟩. According to equation (2), this yields 57,657,600 different entries. What is the next bigger pattern database that can be built?
• One option consists of augmenting the original pattern database with an additional symbol, that is, taking 8 different constants,
whose granularity is ⟨1, 1, 1, 1, 1, 1, 1, 1, 8⟩. This new, bigger pattern database consists of 518,918,400 entries and is 9 times bigger than the original one.
• Another option consists of mapping an additional constant of the original state space to some symbol already in use. This case is represented by the granularity ⟨1, 1, 1, 1, 1, 1, 2, 8⟩ and originates 259,459,200 entries, 4.5 times more than the original pattern database.
However, the number of perimeter nodes generated in the 15-Puzzle at depth d = 1 and 2 is |P_d| = 2 and 4, respectively. This means that the resulting multi-valued pattern databases are smaller than the single-valued pattern databases created in both cases.

Another consideration tightly related to the size of the resulting pattern databases is the number of ranking operations performed in each case. While the number of nodes to consider simultaneously in multi-valued pattern databases imposes an overhead, they are all retrieved in a row, i.e. with a single ranking operation. This is true because the distances to each perimeter node are stored in contiguous memory locations. However, if different single-valued pattern databases are employed (which altogether take the same space as a multi-valued pattern database), each value must be retrieved separately, so that various ranking operations must be performed. Since ranking is the most expensive operation in pattern databases, this overhead must be taken into account as well.
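As a quick check, the hypothetical num_patterns helper sketched in Sect. 2.2 reproduces the entry counts quoted in the two options above:

    assert num_patterns([1]*8 + [8]) == 518918400      # 9 x 57,657,600
    assert num_patterns([1]*6 + [2, 8]) == 259459200   # 4.5 x 57,657,600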
3.3 Results
Although ADD pattern databases are known to provide more accurate heuristic values, it is not always possible to apply them. Therefore, experiments have been conducted with both ADD and MAX pattern databases. In both cases, the perimeter is generated using a brute-force depth-first search algorithm from the goal which generates all nodes whose descendants have a cost that exceeds the specified perimeter depth. Once the perimeter set is generated, a different queue is seeded with each perimeter node and a backward breadth-first search is issued from every one of them for a given pattern specification. As a result, multi-valued pattern databases (which are as many times bigger than a single-valued pattern database as perimeter nodes were found) are generated. For ease of comparison, the same mapping functions have been programmed. Since sparse mapping incurs prohibitive wastes of space in some cases, a compact mapping has been chosen (for a thorough discussion on the topic, see [4], Section 4.2, page 289). Because IDA* explores the state space in a depth-first fashion, an incremental implementation of the Myrvold and Ruskey ranking algorithm [13] has been developed. The current implementation runs about 20%–30% faster and has no additional memory requirements. Unfortunately, due to space constraints, no further details regarding this algorithm are provided. It is worth mentioning that single-valued pattern databases are expected to be more sensitive to this improvement than multi-valued pattern databases. The reason is that multi-valued pattern databases, being more informed than single-valued ones (according to Section 2.1), will generate and rank fewer nodes.
3.3.1 Multi-valued ADD pattern databases

The domains chosen for experimenting with multi-valued ADD pattern databases are the 15-Puzzle and the 24-Puzzle. In all cases, pattern databases are "blank-preserving" [8], i.e., the blank tile is always mapped to a unique symbol, instead of "blank-increasing"—which consists of mapping the blank tile to the same symbol used by other tiles, such as the "don't care" symbol. Also, reflections about the main diagonal are computed for single-valued pattern databases only if the regular lookup did not exceed the current threshold. Since no domain-dependent feature is exploited for multi-valued pattern databases, results with reflections are provided only for the sake of completeness.

Table 1 shows the mean time elapsed (in seconds) and the total number of generated nodes for solving Korf's test suite, which consists of 100 problems, when using single-valued and multi-valued pattern databases. In the experiments, six different arrangements of pattern databases have been used, where each pattern consists of 5 pattern tiles—pattern database #6 is the one suggested in [4]. In all tables, sPDB denotes single-valued pattern databases; mPDBi stands for multi-valued pattern databases generated with perimeter depth d = i; and, finally, rPDB stands for the same pattern databases as in sPDB but taking advantage of the reflections about the main diagonal.

    5-5-5               #1          #2          #3          #4          #5          #6
    sPDB  (0.0014Gb)    1.28/0.851  0.56/0.324  2.17/1.561  0.79/0.456  2.32/1.634  0.47/0.254
    mPDB2 (0.0058Gb)    0.49/0.188  0.56/0.232  0.92/0.417  0.81/0.364  1.18/0.570  0.64/0.253
    rPDB  (0.0014Gb)    1.01/0.372  0.38/0.111  1.48/0.581  0.49/0.140  1.49/0.585  0.44/0.134

    6-6-3               #1          #2          #3          #4          #5          #6
    sPDB  (0.0107Gb)    0.83/0.509  0.98/0.576  0.42/0.187  0.77/0.452  1.77/1.183  0.39/0.181
    mPDB2 (0.0429Gb)    0.37/0.113  0.45/0.144  0.30/0.086  0.33/0.108  1.00/0.434  0.50/0.181
    rPDB  (0.0107Gb)    0.78/0.248  0.63/0.186  0.36/0.092  0.44/0.119  1.79/0.638  0.24/0.056

Table 1. Experimental results in the 15-Puzzle with 5-5-5 and 6-6-3 PDBs: each cell shows the mean run-time in seconds / total number of generated nodes in thousands of millions (10^9).

Table 1 also shows the same statistics for another six different arrangements of 6-6-3 pattern databases. Pattern database #6 is the one suggested in [4]. Next, Table 2 depicts the same statistics for four different arrangements of 7-8 pattern databases. In this case, pattern database #1 is the one widely suggested in the specialized bibliography and also cited in [4].

                        #1             #2             #3             #4
    sPDB  (0.5369Gb)    0.0368/13.721  0.1338/60.347  0.1374/54.323  0.1687/78.370
    mPDB2 (2.1479Gb)    0.0355/10.262  0.0434/15.374  0.0435/15.502  0.0580/18.349
    rPDB  (0.5369Gb)    0.0088/3.832   0.0993/21.646  0.0846/19.379  0.0939/23.594

Table 2. Experimental results in the 15-Puzzle with 7-8 PDBs: mean run-time in seconds / total number of generated nodes in millions (10^6).
Table 3 shows the same statistics in the 24-Puzzle using two different arrangements of 6-6-6-6 pattern databases at depth 3. Pattern database #1 is the usual reference in this domain, as suggested in [4]. The test set employed consists of the 25 easiest instances of the test
suite detailed in [10], with solution lengths ranging from 81 to 106 moves.

                        #1              #2
    sPDB  (0.4750Gb)    10622.39/3.08   24746.44/7.02
    mPDB3 (4.7501Gb)    6162.81/1.00    12227.45/1.93
    rPDB  (0.4750Gb)    2390.09/0.41    10170.26/1.72

Table 3. Experimental results in the 24-Puzzle with 6-6-6-6 PDBs: mean run-time in seconds / total number of generated nodes in millions of millions (10^12).

When comparing the performance of various single-valued pattern databases versus their multi-valued counterparts, it turns out that the latter usually outperform the former, most remarkably in the 7-8 and 6-6-6-6 cases. But this is not always true—see, for example, PDB #6 in the 6-6-3 case. However, when comparing all running times, the pattern database which resulted in the fastest performance is always a multi-valued pattern database (for example, in the 6-6-3 case, the fastest algorithm uses multi-valued pattern databases arranged as in #3), except in the 5-5-5 case. The fact that for some arrangements multi-valued pattern databases do not outperform their single-valued counterparts while others do can be explained as an effect of the diversity induced by the perimeter nodes. It has been observed that for some arrangements of pattern databases, the blank tile only reaches a few patterns when computing the perimeter nodes. The more pattern databases are affected, the better the heuristic. For example, in the 7-8 PDB #1 of the 15-Puzzle (see footnote 5), allowing the blank to move twice affects both pattern databases. Thus, the resulting multi-valued pattern database outperformed its single-valued counterpart, even though the latter is very accurate for solving this problem. Correspondingly, when computing the multi-valued pattern database of 5-5-5 #6, only one PDB out of three gets updated, thus not leading to any improvement in either the number of nodes generated or the running time.

Footnote 5: In this case, the 15-Puzzle is split into two halves, one above the other. The lower half contains 8 pattern tiles whereas the upper one contains 7, because the blank tile is omitted.

3.3.2 Multi-valued MAX pattern databases

The domain chosen for these experiments is the (N, K)-TopSpin. Max'ing is far less efficient than taking the summation of a few values from different disjoint pattern databases; thus, the sizes of the instances considered here are smaller than the ones shown in the previous paragraphs. The number of pattern databases and the number of tiles they contain in each case is clearly identified in the tables. For example, 6-6 stands for two pattern databases with 6 tiles each. Besides, they always consist of contiguous locations arranged in such a way that the pattern databases are all equidistant, thus minimizing the overlap among them. In all the subsequent experiments, the test suites employed consisted of 100 solvable instances generated by the random application of between 100 and 500 operators.

Table 4 shows the results in the (9, 2)-TopSpin. This puzzle can be solved so fast that in most cases the time spent falls below 0.00 seconds. The number of perimeter nodes generated at depth d = 1 and 2 is |P_d| = 3 and 6, respectively, so that mPDB1 and mPDB2 are 3 and 6 times larger than the corresponding sPDB, whose size is shown below every arrangement. As it can be seen, the overhead imposed by perimeter search clearly pays off through the reduction in the number of nodes generated.

             4-4-4 (0.0086Mb)   5-5-5 (0.0432Mb)   6-6-6 (0.1730Mb)
    sPDB     0.01/5.952         ≤0.00/0.458        ≤0.00/0.104
    mPDB1    0.01/3.964         ≤0.00/0.347        ≤0.00/0.077
    mPDB2    ≤0.00/3.044        ≤0.00/0.263        ≤0.00/0.060

Table 4. Experimental results in the (9, 2)-TopSpin: mean run-time in seconds / total number of generated nodes in tenths of millions (10^5).

Table 5 summarizes the results for both the (12, 2)-TopSpin and the (15, 2)-TopSpin. As can be seen, multi-valued pattern databases solved the problems faster and generated fewer nodes in all cases, without exception.

             (12, 2)-TopSpin                      (15, 2)-TopSpin
             6-6 (1.2689Mb)   8-8 (38.0676Mb)     7-7-7-7-7 (154.6497Mb)
    sPDB     9.94/1.813       0.21/0.021          22.48/2.776
    mPDB1    6.54/1.066       0.16/0.013          20.16/2.114
    mPDB2    6.07/0.795       0.11/0.008          16.95/1.572

Table 5. Experimental results in the (12, 2)-TopSpin and the (15, 2)-TopSpin: mean run-time in seconds / total number of generated nodes in thousands of millions (10^9).

4 Compressing Multi-valued Pattern Databases

From equation (2) it becomes clear that the number of patterns grows rapidly for any granularity. Thus, techniques have been developed for efficiently compressing pattern databases in both lossy and lossless ways [5]. In this section, some preliminary ideas for compressing multi-valued pattern databases are discussed. It should be highlighted that the techniques discussed herein are not incompatible with those introduced in [5].

In spite of the discussion in Section 3.2, the truth is that disjoint (or ADD) multi-valued pattern databases take even less space than it might seem. Consider the 7-8 PDB #1 for the 15-Puzzle generated with perimeter depth d = 1—see footnote 5. It is easy to realize that in the two perimeter nodes generated so far, the inferior half (i.e., the pattern database with 8 tiles) looks exactly the same as in the goal state. Since ADD pattern databases do count all moves of the blank tile, the values stored in the inferior multi-valued pattern database are likely to be the same. Thus, it is only necessary to store two values per entry in the superior pattern database, but only one in the inferior database. This way, the resulting multi-valued pattern databases take twice the space of the smaller single-valued database (the one with 7 tiles) but only once the space of the inferior, larger, single-valued database. This stands for a marginal increase in size of 10%. Even considering larger perimeter depths (say d = 2), it is still possible to apply other compression schemes to multi-valued pattern databases, as discussed below.
This is not true, however, for MAX pattern databases, because in this case only moves of the pattern tiles are taken into account. Nevertheless, it is still possible to compress the resulting multi-valued pattern database by statistically relating the distribution of values for each perimeter node to the distance to the first perimeter node. Let δ_i(j) denote the difference h_i(j) − h_i(1), where h_i(j) is the j-th component of the vector in the i-th entry of a multi-valued pattern database. In other words, δ_i(j) is the difference between the distance to the j-th perimeter node and the distance to the first perimeter node from pattern i. This way, it is possible to compute the vector of differences δ_i(·) for every entry i in a given multi-valued pattern database. Also, it is assumed that P perimeter nodes have been generated. Now, there are two different ways to compress data in a lossy way without sacrificing admissibility:

Traversal compression consists of forcing all h_i(j) values from the same entry i to be equal to the minimum of them all, so that each component j takes a new value ĥ_i(j) computed as follows:

    ĥ_i(j) = h_i(1) + min_{2 ≤ k ≤ P} { δ_i(k) },  for all 2 ≤ j ≤ P.

The expected loss in the accuracy of the resulting heuristic values due to the traversal compression, L_t, can be computed as:

    L_t(δ_i(·)) = Σ_{j=2}^{P} ( h_i(j) − ĥ_i(j) ) · p(δ_i),

where p(δ_i) stands for the probability of occurrence of the vector of differences δ_i. Note that the same vector of differences δ_i can occur in an arbitrary number of entries of the multi-valued pattern database other than the i-th entry. Applying this compression scheme repeatedly, the resulting pattern database becomes exactly the same as the one generated under the multiple goal approach of Section 3.1.
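As a rough illustration of traversal compression (a sketch under the paper's definitions, not the author's implementation; entries are modeled as plain Python lists and P ≥ 2 is assumed):

    def traversal_compress(entry):
        """entry: [h_i(1), ..., h_i(P)] -- distances to the P perimeter nodes."""
        base = entry[0]
        m = min(h - base for h in entry[1:])   # min over delta_i(k), 2 <= k <= P
        # components 2..P are lowered to the same value, preserving admissibility
        return [base] + [base + m] * (len(entry) - 1)

    assert traversal_compress([4, 6, 5]) == [4, 5, 5]   # P = 3 perimeter nodes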
Longitudinal compression merges two different entries, u and v, by forcing their difference vectors, δ_u(·) and δ_v(·), to be the same, so that h_u(i) and h_v(i) take new values, ĥ_u(i) and ĥ_v(i), according to:

    ĥ_u(i) = h_u(1) + min{ δ_u(i), δ_v(i) },

and similarly for ĥ_v. As in the previous case, it is possible to compute the expected loss in the accuracy of the heuristic values that results after a longitudinal compression, L_l, as follows:

    L_l(δ_u(·), δ_v(·)) = Σ_{j=2}^{P} [ ( h_u(j) − ĥ_u(j) ) · p(δ_u) + ( h_v(j) − ĥ_v(j) ) · p(δ_v) ].

Since the preceding expressions allow the measurement of the loss in the accuracy of the heuristic function, they serve for compressing any multi-valued pattern database to any desired ratio of compression degree versus loss of accuracy. In particular, for any upper bound U on the average loss of the heuristic function, an algorithm for efficiently compressing a multi-valued pattern database proceeds in the following fashion: while the average loss is still below U, compute the expected loss of all the traversal compressions, and also the expected loss of all the longitudinal compressions for each pair of entries in the pattern database; next, pick the compression with the minimum expected loss and update the pattern database. Proceeding in this manner, the number of distinct difference vectors δ(·) decreases monotonically at each step. If there are n different vectors of differences when the expected loss reaches the upper bound U, code each entry in the pattern database with one of the indexes in the range [1, n], so that ⌈log2 n⌉ bits are used instead of 8, which is the usual choice. In other words, instead of storing a vector of heuristic estimations to each perimeter node in every entry of the multi-valued pattern database, an index into a small number of δ(·) vectors is attached. Then, when solving a problem, one retrieves the index from the pattern database and applies its δ(·) vector to get the heuristic estimations to all the perimeter nodes. Preliminary experiments in the (N, K)-TopSpin suggest that it is feasible to significantly compress multi-valued pattern databases while still running faster than various single-valued pattern databases and generating far fewer nodes.

5 Summary

Although it might be contrary to intuition, storing various values per entry in a pattern database can outperform the standard, single-valued pattern databases, either ADD or MAX. Furthermore, these databases can be compressed with the techniques outlined in the last section, which are not incompatible with existing techniques for compressing single-valued pattern databases.

Acknowledgements

This work has been partially supported by the Spanish MEC project TIN2005-08945-C06-05 and UC3M-CAM project CCG06-UC3M/TIC-0831.
REFERENCES
[1] Joseph C. Culberson and Jonathan Schaeffer, 'Pattern databases', Computational Intelligence, 14(3), 318–334, (1998).
[2] John F. Dillenburg and Peter C. Nelson, 'Perimeter search', Artificial Intelligence, 65, 165–178, (1994).
[3] Stefan Edelkamp, 'External symbolic heuristic search with pattern databases', in Proceedings of the Fifteenth International Conference on Automated Planning and Scheduling (ICAPS-05), pp. 51–60, Monterey, California, United States, (June 2005).
[4] Ariel Felner, Richard E. Korf, and Sarit Hanan, 'Additive pattern database heuristics', Journal of Artificial Intelligence Research, 22, 279–318, (November 2004).
[5] Ariel Felner, Richard E. Korf, Ram Meshulam, and Robert Holte, 'Compressed pattern databases', Journal of Artificial Intelligence Research, 30, 213–247, (October 2007).
[6] Ariel Felner and Nir Ofek, 'Combining perimeter search and pattern database abstractions', in Proceedings of the Seventh Symposium on Abstraction, Reformulation and Approximation (SARA-07), pp. 155–168, Whistler, Canada, (July 2007).
[7] Ariel Felner, Uzi Zahavi, Jonathan Schaeffer, and Robert C. Holte, 'Dual lookups in pattern databases', in Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), pp. 103–108, Edinburgh, Scotland, (July 2005).
[8] Robert Holte, Jack Newton, Ariel Felner, Ram Meshulam, and David Furcy, 'Multiple pattern databases', in Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (ICAPS-04), pp. 122–131, Whistler, British Columbia, Canada, (June 2004).
[9] Robert C. Holte, Ariel Felner, Jack Newton, Ram Meshulam, and David Furcy, 'Maximizing over multiple pattern databases speeds up heuristic search', Artificial Intelligence, 170(16–17), 1123–1136, (November 2006).
[10] Richard E. Korf and Ariel Felner, 'Disjoint pattern database heuristics', Artificial Intelligence, 134(1–2), 9–22, (2002).
[11] Richard E. Korf and Ariel Felner, 'Recent progress in heuristic search: A case study of the four-peg towers of hanoi problem', in Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), pp. 2324–2329, Hyderabad, India, (January 2007).
[12] Giovanni Manzini, 'BIDA*: an improved perimeter search algorithm', Artificial Intelligence, 75, 347–360, (1995).
[13] W. Myrvold and F. Ruskey, 'Ranking and unranking permutations in linear time', Information Processing Letters, 79, 281–284, (2001).
Using Abstraction in Two-Player Games

Mehdi Samadi, Jonathan Schaeffer¹, Fatemeh Torabi Asr, Majid Samar, Zohreh Azimifar²

¹ Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2E8, email: {msamadi,jonathan}@cs.ualberta.ca
² Department of Computer Science and Engineering, Shiraz University, Shiraz, Iran, email: {torabi,samar,azimifar}@cs.shirazu.ac.ir

Abstract. For most high-performance two-player game programs, a significant amount of time is devoted to developing the evaluation function. An important issue in this regard is how to take advantage of a large memory. For some two-player games, endgame databases have been an effective way of reducing search effort and introducing accurate values into the search. For some one-player games (puzzles), pattern databases have been effective at improving the quality of the heuristic values used in a search. This paper presents a new approach to using endgame and pattern databases to assist in constructing an evaluation function for two-player games. Via abstraction, single-agent pattern databases are applied to two-player games. Positions in endgame databases are viewed as an abstraction of more complicated positions; database lookups are used as evaluation function features. These ideas are illustrated using Chinese checkers and chess. For each domain, even small databases can be used to produce strong game play. This research has relevance to the recent interest in building general game-playing programs. For two-player applications where pattern and/or endgame databases can be built, abstraction can be used to automatically construct an evaluation function.
1 Introduction and Overview

Almost half a century of AI research into developing high-performance game-playing programs has led to impressive successes, including DEEP BLUE (chess), CHINOOK (checkers), TD-GAMMON (backgammon), LOGISTELLO (Othello), and MAVEN (Scrabble). Research into two-player games is one of the most visible accomplishments in artificial intelligence to date. The success of these programs relied heavily on their ability to search and to use application-specific knowledge. The search component is largely well-understood for two-player games (whether perfect or imperfect information; stochastic or not); usually the effort goes into building a high-performance search engine. The knowledge component varies significantly from domain to domain. Various techniques have been used, including linear regression (as in LOGISTELLO) and temporal difference learning (as in TD-GAMMON). All of them required expert input, especially the DEEP BLUE [10] and CHINOOK [16] programs. Developing these high-performance programs required substantial effort over many years. In all cases a major commitment had to be made to developing the program's evaluation function. The standard way to do this is by hand, using domain experts if available. Typically, the developer (in consultation with the experts) designs multiple evaluation function features and then decides on an appropriate
weighting for them. Usually the weighted features are summed to form the assessment. This technique has proven to be effective, albeit labour intensive. However, this method fails in the case of a new game or one for which there is no expert information available (or no experts). The advent of the annual General Game Playing (GGP) competition at AAAI has made the community more aware of the need for general-purpose solutions rather than custom solutions.

Most high-performance game-playing programs are compute intensive and benefit from faster and/or more CPUs. An important issue is how to take advantage of a large memory. Transposition tables have proven effective for improving search efficiency by eliminating redundancy in the search. However, these tables provide diminishing returns as their size increases [3]. For some two-player games, endgame databases (sometimes called tablebases) have been an effective way of reducing search effort and introducing accurate values into the search. These databases enumerate all positions with a few pieces on the board and compute whether each position is a provable win, loss or draw. Each database position, however, is applicable to only one position.

The single-agent (one-player) world has also wrestled with the memory issue. Pattern databases have been effective for improving the performance of programs that solve numerous optimization problems, including the sliding-tile puzzles and Rubik's Cube [8]. They are similar to endgame databases in that they enumerate a subset of possible piece placings and compute a metric for each (e.g., minimum number of moves to a solution). The databases are effective for two reasons. First, they can be used to provide an improved lower bound on the solution quality. Second, using abstraction, multiple states can be mapped to a single database value, increasing the utility of the databases.

The main theme of this paper is to investigate and propose a new approach to using endgame and pattern databases to assist in automating the construction of an evaluation function for two-player games. The research also carries over to multi-player games, but this is not addressed in this paper. The key idea is to extend the benefits of endgame and pattern databases by using abstraction. Evaluation of a position with N pieces on the board is done by looking up a subset of M < N pieces in the appropriate database. The evaluation function is built by combining the results of multiple lookups and by learning an appropriate weighting of the different lookups. The algorithm is simple and produces surprisingly strong results. Of greater importance is that this is a new, general way to use the databases. The contributions of this research are as follows:

1. Abstraction is used to extend pattern databases (even additive pattern databases) for constructing evaluation functions for a class of two-player games.
2. Pattern-database-based evaluation functions are shown to produce state-of-the-art play in Chinese checkers (10 pieces a side). Against a baseline program containing the latest evaluation
function enhancements, the pattern-database-based program scores 68% to 79% of the possible points.
3. Abstraction is used to extend endgame databases for constructing evaluation functions for a class of two-player games.
4. Chess evaluation functions based on four- and five-piece endgame databases are shown to outplay CRAFTY, the strongest freeware chess program available. On seven- and eight-piece chess endgames, the endgame-database program scores 54% to 80% of the possible points.

Abstraction is a key to extending the utility of the endgame and pattern databases. For domains for which these databases can be constructed, they can be used to build an evaluation function automatically. As the experimental results show, even small databases can be used to produce strong game play.
2 Related Work

Endgame databases have been in use for two-player perfect information games for almost thirty years. They are constructed using retrograde analysis [18]. Chess was the original application domain, where databases for all positions with six or fewer pieces have been built. Endgame databases were essential for solving the game of checkers, where all positions with ten or fewer pieces have been computed [6]. The databases are important because they reduce the search tree and introduce accurate values into the search. Instead of using a heuristic to evaluate these positions (with the associated error), a game-playing program can use the database value (perfect information). The limitation, however, is that each position in the database is applicable to a single position in the search space.

Pattern databases also use retrograde analysis to optimally solve simplified versions of a state space [4]. A single-agent state space is abstracted by simplifying the domain (e.g., only considering a subset of the features) and solving that problem. The solutions to the abstract state are used as lower bounds for solutions to a set of positions in the original search space. For some domains, pattern databases can be constructed so that two or more database lookups can be added together while still preserving the optimality of the combined heuristic [13]. Abstraction means that many states in the original space can use a single state in the pattern database. Pattern databases have been used to improve the quality of the heuristic estimate of the distance to the goal, resulting in many orders of magnitude reduction in the effort required to solve the sliding-tile puzzles and Rubik's Cube [8].

The ideas presented in this paper have great potential for General Game Playing (GGP) programs [9]. A GGP program, given only the rules of the game/puzzle, has to learn to play that game/puzzle well. A major bottleneck to producing strong play is the discovery of an effective evaluation function. Although there is an interesting literature on feature discovery applied to games, to date the successes are small [7]. It is still early days for developing GGP programs, but the state of the art is hard coding into the program several well-known heuristics that have been proven to be effective in a variety of games, and then testing them to see if they are applicable to the current domain [15]. It remains an open problem how to automate the discovery and learning of an effective evaluation function for an arbitrary game.
3 Using Abstraction in Two-Player Games

Abstraction is a mapping from a state in the original search space into a simplified representation of that state. The abstraction is often a relaxation of the state space or a subset of the state. In effect, abstraction maps multiple states in the original state space to a single state in the abstract search space. Information about the abstract state (e.g., solution cost) can be used as a heuristic for the original state (e.g., a bound on the solution cost). Here we give the background notation and definitions using chess as the illustrative domain. Let S be the original search space and S′ be the abstract search space.

Figure 1. Original states and edges mapped to an abstract space: states u, v in the original space S map to φ(u), φ(v) in the abstract space S′, and an action sequence a*(u, v) maps to φ(a)*(φ(u), φ(v)).
Definition 1 (Abstraction Transformation): An abstraction transformation φ : S → S′ maps 1) states u ∈ S to states φ(u) ∈ S′, and 2) actions a in S to actions φ(a) in S′. This is illustrated in Figure 1. Consider the chess endgame of white king, rook, and pawn versus black king and rook (KRPKR). The original space S consists of all valid states where these five pieces can be placed on the board. Any valid subset of the original space can be considered as an abstraction. For example, king and rook versus king (KRK) and king, rook, and pawn versus king (KRPK) are abstractions (simplifications) of KRPKR. For any particular abstraction S′, the search space contains all valid states in the abstract domain (all piece location combinations). The new space S′ is much smaller than the original space S, meaning that a large number of states in S are being mapped to a single state in S′. For instance, for every state in the abstracted KRK space, all board positions in S where the white king, white rook and black king are on the same squares as in S′ are mapped onto a single abstract state (i.e., the white pawn and black rook locations are abstracted away). Actions in S′ contain all valid moves for the pieces that are in the abstracted state.

Definition 2 (Homomorphism): An abstraction transformation φ is a homomorphism if, for every series of actions that transforms state u into state v in S, there is a corresponding transformation from φ(u) to φ(v) in S′. This is illustrated in Figure 1, where a* represents zero or more actions. If there is a solution for a state in the original space S, then the homomorphism property guarantees the existence of a solution in the abstracted space S′. Experimental results indicate that this characteristic can be used to improve search performance in S.

Various abstractions can be generated for a given search problem. The set of relaxing functions is defined as φ = {φ1, φ2, . . . , φn}, where each φi is an abstraction. Define the distance between any two states u and v in the relaxed environment φi as h_φi(u, v). For example, for an endgame or pattern database, v is usually set to a goal state, meaning that h_φi(u, v) is the minimal number of moves needed to achieve the goal. Using off-line processing, the distance from each state in φi to the nearest goal can be computed and saved in a database (using retrograde analysis). For a pattern database (one-player search), the minimal distance to the goal is stored. For an endgame database (two-
player search), the minimal number of moves to win (maximal moves to postpone losing) is recorded. This is the standard way that these databases are constructed. Given a problem instance to solve, all values from those lookup tables are retrieved during the search for further processing. To evaluate a position p from the original space, the relaxed state, φi(p), is computed and the corresponding hφi(p) is retrieved from the database. The abstract values are saved in a heuristic vector h = ⟨hφ1, hφ2, . . . , hφn⟩. The evaluation function value for state p is calculated as a function of h. For example, popular techniques used for two-player evaluation functions include temporal difference learning to linearly combine the hφi values [1], and neural nets to achieve non-linear relations [17].

For example, let us evaluate a position p in the KRPKR chess endgame. In this case, the abstracted states could come from the databases KRPK, KRKR, KRK and KPK. First, for each abstraction, the abstract state is computed and the heuristic value hφi(p) is retrieved from the database. In this case, the black rook is removed and the resulting position is looked up in the KRPK database; the white pawn is removed and the position looked up in the KRKR database; etc. The heuristic value for p could be, for example, the sum of the four abstraction scores.
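As a small, hedged illustration of this scheme, the sketch below computes the heuristic vector h by abstraction lookups and combines it linearly; the weights stand in for coefficients that might be obtained by temporal-difference learning [1], and all names are illustrative rather than the authors' implementation.

def evaluate(p, abstractions, databases, weights):
    # abstractions: list of mappings phi_i from original to abstract states
    # databases:    databases[i] maps phi_i(p) to its stored value h_phi_i(p)
    # weights:      assumed linear-combination coefficients
    h = [databases[i][phi(p)] for i, phi in enumerate(abstractions)]
    # A linear combination of h; a neural net could produce a
    # non-linear combination instead [17].
    return sum(w * hv for w, hv in zip(weights, h))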
4 Experimental Results

In this section, we explore using abstraction to apply pattern database technology to two-player Chinese checkers, and chess endgame database technology to playing more complicated chess endgames. Unlike chess, Chinese checkers has the homomorphism property (the proof is simple, but not shown here for reasons of space).
4.1 Chinese Checkers

Chinese checkers is a 2-6 player game played on a star-shaped board with the squares hexagonally connected. The objective is to move all of one's pieces (or marbles, typically 10) from the player's home zone to the opposite side of the board (the opponent's home zone). Each player moves one marble each turn. A marble can move by rolling to an adjacent position (one of six) or by repeatedly jumping over an adjacent marble, of any color, to an adjacent empty location (the same as jumps in 8 × 8 checkers/draughts). In general, to reach the goal in the shortest possible time, the player should jump his pieces towards the opponent's home zone. Here we limit ourselves to two-player results, although the results presented here scale well to more players (not reported here).

Due to the characteristics of Chinese checkers, three different kinds of abstractions might be considered. Given N pieces on each side of the original game:

1. Playing K ≤ N white pieces against L ≤ N black pieces;
2. Playing K ≤ N white pieces to take them to the opponent's home zone (a pattern database including no opponent's marble); and
3. Playing K ≤ N white pieces against L ≤ N black pieces, but with a constraint that the play concentrates on a partition of the board.

For any given search space, the more position characteristics that are exploited by the set of abstractions, the more likely it is that the combination of abstraction heuristics will be useful for the original problem space. The first two abstractions above have the homomorphism property, and the empirical results indicate that they better approximate the original problem space. In the first abstraction, a subset of
pieces for both players (e.g., the three-piece versus two-piece game) is considered and the minimal number of moves to win (most moves to lose) is used. The second abstraction ignores all the opponent's pieces. This abstraction gives the number of moves required to get all of one's pieces into the opponent's zone. This value is just a heuristic estimate (not a bound), since it does not take into account the possibility of jumping over the opponent's pieces (which precludes it from being a lower bound) and does not take into account interference from the opponent's pieces (precluding it from being an upper bound). Clearly, the first abstraction is a better representation of the original problem space. The third abstraction considers only a part of the board to build a pattern database. For example, the goal of the abstraction can be changed so that the pieces only have to enter the goal area (without caring about where they end up).

The state space for the first abstraction is large; the endgame database of three versus two pieces requires roughly 256MB. The second relaxation strategy makes the search space simpler, allowing for pattern databases that include more pieces on the board. The database for five pieces of the same side requires roughly 25MB, 10% of the size of the first abstraction's database. Our experience with Chinese checkers shows that during the game five cooperating pieces will result in more (and longer) jump moves (hence, fewer moves to reach the goal) than five adversarial pieces. Although the first abstraction looks more natural and seems to better reflect the domain, the second abstraction gives better heuristic values. Thus, here we present only the second and third abstractions.

The baseline for comparison is a Chinese checkers program (10 pieces a side) with all the current state-of-the-art enhancements. The evaluation function is based on the Manhattan distance for each side's pieces to reach the goal area. Recent research has improved on this simple heuristic by adding additional evaluation terms: 1) curved board model, incremental evaluation, left-behind marbles [19]; and 2) learning [11]. All of these features have been implemented in our baseline program.

Experiments consisted of the baseline program playing against a program using a PDB- or endgame-based evaluation function. Each experimental data point consists of a pair of games (switching sides) for each of 25 opening positions (after five random moves have been made). Experiments are reported for search depths of three to five ply (other search-depth results are similar). The branching factor in the middlegame of Chinese checkers is roughly 60-80. Move generation can be expensive because of the combination of jumps for each side. This slows the program down, limiting the search depth that can be achieved in a reasonable amount of time. The average response time for a search depth of six in the middlegame is more than thirty seconds per move (1.5 hours per game). Our reported experiments are limited to depths three through five because of the wide range of experiments performed.

In this paper, we report the results for three interesting heuristic evaluation functions. Numerous functions were experimented with and achieved similar performance to those reported here. For the following abstractions, the pieces were labeled 1 to 10 in a right-to-left, bottom-up manner.
The abstractions used were: PDB(4): four-piece pattern database (second abstraction) with the goal defined as the top four squares in the opponent’s home zone. Three abstractions (three lookups) were used to cover all available ten pieces: pieces 1-4, 4-7, and 7-10. We also tested other lookups on this domain. Obviously increasing the number of lookups can increase the total amount of time to evaluate each node. On the other hand, the overlap of using pieces four and seven in the evaluation function does not have a severe effect on the cost of an
evaluation function. PDB(6): six-piece pattern database (second abstraction) with the goal defined as the top six squares in the opponent's home zone. Two abstractions (two lookups) were used to cover all 10 pieces: pieces 1-6 and 5-10. Again, two pieces are counted twice in an evaluation (pieces 5 and 6), as a consequence of minimizing the execution overhead. PDB(6+4): a probe from the six-piece PDB is added to a probe from the four-piece PDB (a combination of the second and third abstractions). Two abstractions (two lookups) were used to cover all 10 pieces: pieces 1-6 from the PDB(6) and 7-10 from the PDB(4), with its goal defined as passing all pieces from the opponent's front line (third abstraction). In other words, for the four-piece abstraction we delete the top six squares of the board such that the new board setup introduces our new goal. The weighting of each probe is a simplistic linear combination of the abstraction heuristic values.

Table 1. Experiments in Chinese checkers.

Abstraction (Pieces)   Search Depth   Win %
PDB (4)                3              79
PDB (6)                3              68
PDB (6+4)              3              74
PDB (4)                4              69
PDB (6)                4              68
PDB (6+4)              4              80
PDB (4)                5              78
PDB (6)                5              70
PDB (6+4)              5              78
Table 1 presents the results achieved using these abstractions. The three rows of results are given for each of search depths three, four and five. The win percent reflects two points for a win, one for a tie and zero for a loss. Evidently, PDB(6+4) has the best performance, winning about 80% of the games against the baseline program. Perhaps surprisingly, PDB(4) performs very well, even better than PDB(6) does. One would expect PDB(6) to perform better since it implicitly contains more knowledge of the pieces' interactions. However, note that the more pieces on the board, the more frequently long jump sequences will occur. The longer the jump sequence, the smaller the probability that it can be realized, given that there are other pieces on the board. Hence, we conclude that a larger PDB may not be as accurate as a smaller PDB.

The additive evaluation function (using PDB(6+4)) gives the best empirical results. Not only is it comparable to the PDB(4), but it achieves its good performance with one fewer database lookup per evaluation. Although the experiments were done to a fixed search depth (to ensure a uniform comparison between program versions), because of the relative simplicity of the evaluation function an extra database lookup represented a significant increase in execution cost. In part this is due to the pseudo-random nature of accessing the PDB, possibly incurring cache overhead. Our implementation takes advantage of obvious optimizations to eliminate redundant database lookups (e.g., reusing a previous lookup if still applicable). By employing these optimizations, we observed that the times for both heuristic functions are very close, which does not change the results.

Several experiments were also performed using the first abstraction, with three-against-two-piece endgame databases. A program based on the second abstraction (pattern databases with five pieces) significantly outperformed the first abstraction. The values obtained
from five cooperating pieces were a better heuristic predictor than those obtained from the adversarial three-versus-two-piece database. The results reported here do not necessarily represent the best possible. There are numerous combinations of various databases that one can try. The point here is that simple abstraction can be used to build an effective evaluation function. In this example, single-agent pattern databases are used in a new way for two-player heuristic scores.
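The additive lookup of PDB(6+4) is straightforward to implement. Below is a minimal sketch, under the assumption that a position exposes its ten piece locations as a tuple; the attribute name pieces and the dictionary-based databases are illustrative only, not the authors' implementation.

def pdb_6_plus_4(position, pdb6, pdb4):
    # Add a six-piece PDB probe (pieces 1-6) to a four-piece PDB probe
    # (pieces 7-10), as in the PDB(6+4) evaluation described above.
    key6 = tuple(position.pieces[i] for i in range(0, 6))   # pieces 1-6
    key4 = tuple(position.pieces[i] for i in range(6, 10))  # pieces 7-10
    return pdb6[key6] + pdb4[key4]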
4.2 Chess

This section presents experimental results for using four- and five-piece chess endgame databases to play seven- and eight-piece chess endgames. The abstracted state space is constructed using a subset of the available pieces. For example, for the KRPKR endgame one can use the KRK, KRPK, and KRKR subsets of pieces as abstractions of the original position. All the abstractions are looked up in their appropriate database. The endgame databases are publicly available at numerous sites on the Internet. For each position, they contain one of the following values: win (the minimum number of moves to mate the opponent), loss (the longest sequence of moves to be mated by the opponent) or draw (zero). The values retrieved from the abstractions are used as evaluation function features. They are linearly combined; no attempt at learning proper weights has been made yet.

In chess, as opposed to Chinese checkers, ignoring all the opponent's pieces does not improve the performance, given the tight mutual influence the pieces have on each other (i.e., piece captures are possible). Hence pattern databases are unlikely to be effective. One could still use pattern databases for chess, although we expect a learning algorithm would discover a weight of zero for such abstractions. The chess abstraction does not have the homomorphism property because of the mutual interactions among the pieces. In other words, it is possible to win in the original position while not achieving this result in the abstract position. For example, there are many winning positions in the KRPKR endgame, but in the abstraction KRKR almost all states lead to a draw.

Our experiments used the four- and five-piece endgame databases. Note that the state of the art in endgame database construction is six pieces [2]. These databases are too large to fit into RAM, making their access cost prohibitively high. Evaluation functions must be fast, otherwise they can dramatically reduce the search speed. Hence we restrict ourselves to databases that can comfortably fit into less than 1GB of RAM. This work will show that even the small databases can be used to improve the quality of play for complex seven- and eight-piece endgames.

In our experiments the proposed engine (a program consisting solely of an endgame-database-based evaluation) played against the baseline program (as the opponent). Each experimental data point consisted of a pair of games (switching sides) for each of 25 endgame positions. The programs searched to depths of seven and nine ply. Results are reported using four- and five-piece abstractions of seven- and eight-piece endgames. Because of the variety of experiments performed, the search depth was limited to nine.

The baseline considered here is CRAFTY, the strongest freeware chess program available [12]. It has competed in numerous World Computer Chess Championships, often placing near the top of the standings.

Table 2 shows the impact of two parameters on performance: the endgame database size and the search depth. The table gives results for three representative seven-piece endgames. The first column gives the endgame, the second gives the win percentage (as stated before, a win is counted as two, a draw as one and a loss as zero), and
Table 2. Experiments in chess (four-piece and five-piece abstractions).

Endgame      Search Depth   Win %   Abstractions Used
KRPP–KBN     7              60      KPPK, KKBN, KRK
KRPP–KNN     7              68      KRK, KRKP, KRPK, KNKP
KRP–KNPP     7              72      KKPP, KBKP, KPKN, KRK
KRPP–KBN     7              68      KPKBN, KRPKB, KRPKN
KRPP–KNN     7              76      KRPKN, KPPKN, KPKNN
KRP–KNPP     7              80      KRPKN, KRKNP, KPKNP, KPKPP
KRPP–KBN     9              54      KPPK, KKBN, KRK
KRPP–KNN     9              64      KRK, KRKP, KRPK, KNKP
KRP–KNPP     9              70      KKPP, KBKP, KPKN, KRK
KRPP–KBN     9              56      KPKBN, KRPKB, KRPKN
KRPP–KNN     9              68      KRPKN, KPPKN, KPKNN
KRP–KNPP     9              76      KRPKN, KRKNP, KPKNP, KPKPP
the last column shows the abstractions used. The first six lines are for a search depth of seven; the remaining six for a search depth of nine. For each depth, the first three lines show the results for using three- and four-piece databases as abstractions; the last three rows show the results when five-piece databases are used.

CRAFTY was used unchanged. It had access to the same endgame databases as our program, but it only used them when the current position was in the database. For all positions with more pieces, it used its standard endgame evaluation function. In contrast, our program, using abstraction, queried the databases every time a node in the search required evaluation. By eliminating redundant database lookups, the cost of an endgame-database evaluation can be made comparable to that of CRAFTY's evaluation.

Not surprisingly, the five-piece databases had superior performance to the four-piece databases (roughly 8% better at depth seven and 4% better at depth nine). Clearly, these databases are closer to the original position (i.e., less abstract) and hence are more likely to contain relevant information. Further, a significant drawback of small-size abstraction models is the large number of draw states in the database (e.g. KRKR), allowing little opportunity to differentiate between states. The five-piece databases contain fewer draw positions, giving the evaluation function greater ability to discriminate between states. As the search depth is increased, the benefits of the superior evaluation function slightly decrease. This is indeed expected, as the deeper search allows more potential errors by both sides to be avoided. This benefits the weaker program.

Table 3. Experiments for chess.

Position      Search Depth   Win %   Abstractions Used
KQP–KRNP      7              64      KQKRP, KQKNP, KPKRN, KQKRN
KRRPP–KQR     7              76      KQKRP, KQKNP, KPKRN
KRPP–KRN      7              60      KRPKN, KPPKR, KPKRN
KQP–KNNPP     7              76      KPKNN, KQKNN, KQKNP
KQP–KRBPP     7              64      KPKNN, KQKNN, KQKNP
KQP–KRNP      9              64      KQKRP, KQKNP, KPKRN, KQKRN
KRRPP–KQR     9              76      KQKRP, KQKNP, KPKRN
KRPP–KRN      9              64      KRPKN, KPPKR, KPKRN
KQP–KNNPP     9              72      KPKNN, KQKNN, KQKNP
KQP–KRBPP     9              62      KQKRB, KQPKR, KQKRP, KQKBP
Table 3 shows the results for some interesting (and complicated) seven- and eight-piece endgames, all using five-piece abstractions. These represent difficult endgames for humans and computers to play. Again, the endgame-database-based evaluation function is superior to CRAFTY, winning 60% to 76% of the games. This performance is achieved using three or four abstraction lookups, in contrast to CRAFTY's hand-designed rule-based system.

Why is the endgame database abstraction effective? The abstrac-
tion used for chess is, in part, adding heuristic knowledge to the evaluation function about exchanging pieces. In effect, the smaller databases are giving information about the result when pieces come off the board. This biases the program towards lines which result in favorable piece exchanges, and avoids unfavorable ones.
5 Conclusion and Future Work

The research presented in this paper is a step towards increasing the advantages of pre-computed lookup tables for the larger class of multi-agent problem domains. The main contribution of this research was to show that the idea of abstraction can be used to extend the benefits of pre-computed databases for use in new ways in building an accurate evaluation function. For domains for which pattern and/or endgame databases can be constructed, the use of this data can be extended beyond its traditional usage and be used to build an evaluation function automatically. As the experimental results show, even small databases can be used to produce strong game play.

Since 2005, there has been interest in the AI community in building a general game-playing (GGP) program. The application-specific research in building high-performance games is being generalized to handle a wide class of games. Research has already been done in identifying GGP domains for which databases can be built [14]. For those domains, abstraction is a promising way to automatically build an evaluation function. An automated system has been developed to build pattern databases for planning domains, using a bin-packing algorithm to select the appropriate symbolic variables for the pattern database [5]. A similar approach could be used to automatically select variables in GGP to build endgame/pattern databases.
REFERENCES

[1] J. Baxter, A. Tridgell, and L. Weaver, 'Learning to play chess using temporal differences', Machine Learning, 40(3), 243–263, (2000).
[2] E. Bleicher, 2008. http://k4it.de/index.php?topic=egtb&lang=en.
[3] D. Breuker, Memory Versus Search in Games, Ph.D. dissertation, University of Maastricht, 1998.
[4] J. Culberson and J. Schaeffer, 'Searching with pattern databases', in Canadian Conference on AI, pp. 402–416, (1996).
[5] S. Edelkamp, 'Planning with pattern databases', in Proceedings of the 6th European Conference on Planning (ECP-01), pp. 13–34, (2001).
[6] J. Schaeffer et al., 'Checkers is solved', Science, 317(5844), 1518–1522, (2007).
[7] T. Fawcett and P. Utgoff, 'Automatic feature generation for problem solving systems', in ICML, pp. 144–153, (1992).
[8] A. Felner, U. Zahavi, J. Schaeffer, and R. Holte, 'Dual lookups in pattern databases', in IJCAI, pp. 103–108, (2005).
[9] M. Genesereth, N. Love, and B. Pell, 'General game playing: Overview of the AAAI competition', AI Magazine, 26, 62–72, (2005).
[10] F.-h. Hsu, Behind Deep Blue, Princeton University Press, 2002.
[11] Alistair Hutton, Developing Computer Opponents for Chinese Checkers, Master's thesis, University of Glasgow, 2001.
[12] R. Hyatt, 2008. http://www.craftychess.com/.
[13] R. Korf and A. Felner, 'Disjoint pattern database heuristics', Artificial Intelligence, 134, 9–22, (2002).
[14] Arsen Kostenko, Calculating End Game Databases for General Game Playing, Master's thesis, Fakultät Informatik, Technische Universität Dresden, 2007.
[15] G. Kuhlmann and P. Stone, 'Automatic heuristic construction for general game playing', in AAAI, pp. 1457–1462, (2006).
[16] J. Schaeffer, One Jump Ahead, Springer-Verlag, 1997.
[17] G. Tesauro, 'Temporal difference learning and TD-Gammon', CACM, 38(3), 58–68, (1995).
[18] K. Thompson, 'Retrograde analysis of certain endgames', Journal of the International Computer Chess Association, 9(3), 131–139, (1986).
[19] Paula Ulfhake, A Chinese Checkers-playing program, Master's thesis, Department of Information Technology, Lund University, 2000.
9. Planning and Scheduling
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-553
A Practical Temporal Constraint Management System for Real-Time Applications

Luke Hunsberger 1

Abstract. A temporal constraint management system (TCMS) is a temporal network together with algorithms for managing the constraints in that network over time. This paper presents a practical TCMS, called MYSYSTEM, that efficiently handles the propagation of the kinds of temporal constraints commonly found in real-time applications, while providing constant-time access to "all-pairs, shortest-path" information that is extremely useful in many applications. The temporal network in MYSYSTEM includes special time-points for dealing with the passage of time and eliminating the need for certain common forms of constraint propagation. The constraint propagation algorithm in MYSYSTEM maintains a restricted set of entries in the associated all-pairs, shortest-path matrix by incrementally propagating changes to the network arising from adding a new constraint or strengthening, weakening or deleting an existing constraint. The paper presents empirical evidence to support the claim that MYSYSTEM is scalable to real-time planning, scheduling and acting applications.

1 Vassar College, Poughkeepsie, NY, USA, hunsberg@cs.vassar.edu
1 Introduction

A Simple Temporal Network (STN) is a pair, (T , C), where T is a set of time-point variables (or time-points) and C is a set of temporal constraints, each having the form tj − ti ≤ δ, for some ti, tj ∈ T and some real number δ [3]. In this paper, we let n = |T | and m = |C|. A solution to an STN is a set of real-valued assignments to the variables in T that satisfy all of the constraints in C. An STN is called consistent if it has at least one solution.

Each STN, (T , C), has a corresponding graph, G = (T , E), where the nodes of the graph are the time-points in T , and the edges of the graph correspond one-to-one with the constraints in C. In particular, for each constraint, tj − ti ≤ δ, in C, there is an edge from ti to tj with weight δ in E. In this paper, we let k be the maximum number of edges incident to any node in the graph. An STN is consistent if and only if its corresponding graph has no negative cycles (i.e., loops with negative path-length) [3].

Most STNs include a special time-point—called the zero time-point (or Z)—whose value is fixed at 0. Temporal constraints involving Z are equivalent to unary constraints. For example, Z − ti ≤ δ1 is equivalent to the lower-bound constraint, −δ1 ≤ ti; and tj − Z ≤ δ2 is equivalent to the upper-bound constraint, tj ≤ δ2.

The distance matrix for an STN is an n-by-n matrix, D, such that D(ti, tj) equals the length of the shortest path from ti to tj in the corresponding graph, G. Thus, D is the all-pairs, shortest-path (APSP) matrix for G. If there is no path from ti to tj, then D(ti, tj) = ∞.

Changing an STN over Time. An STN typically acquires new time-points and constraints over time. Algorithms that incrementally
propagate changes to the STN in response to adding a new constraint or strengthening an existing constraint are called incremental algorithms. Algorithms that propagate changes to the STN in response to weakening or deleting a constraint already in the network are called decremental algorithms. Algorithms that are both incremental and decremental are called fully dynamic. Decremental algorithms have higher time complexity than their incremental counterparts [16, 12].

Executing Time-Points. In most applications, the starting and ending times of tasks are represented by time-points in a temporal network. When the task is begun—say, at time K—its starting time-point, ts, is fixed to the value K, by inserting the constraints, K ≤ ts ≤ K (i.e., Z − ts ≤ −K and ts − Z ≤ K). We say that ts has been executed at time K. Similarly, when the task is completed—say, at time L—its ending point, te, is fixed to the value L.

Cesta and Oddi's Algorithm. Cesta and Oddi [2] presented a fully dynamic algorithm for propagating changes to an STN. The algorithm does not maintain the entire distance matrix; instead, it maintains only enough entries to verify the consistency of the network. In particular, for each time-point t ∈ T , it only maintains entries of the form, D(Z, t) and D(t, Z). Thus, the space requirements are O(n). The incremental portion of the algorithm, which is a variation of the Bellman-Ford algorithm, has time complexity O(nm). The decremental portion of the algorithm first determines which entries might be affected by the change to the network and then runs the incremental portion on that part of the network. Since their algorithm does not maintain the full distance matrix, it can only discover negative cycles during the process of constraint propagation. Furthermore, answering distance matrix queries for entries other than those involving Z requires O(kn) time, instead of the constant look-up time that is afforded by having the full distance matrix.

Maintaining the Full Distance Matrix. Maintaining an up-to-date distance matrix requires O(n²) space and additional constraint propagation; however, it has the following important advantages. First, it provides constant-time lookup for distance-matrix entries, which facilitates the use of multi-agent coordination algorithms (e.g., temporal decoupling algorithms [9]). Second, before adding a new constraint (or strengthening an existing constraint), the consistency of the resulting network can be determined by constant-time lookup—in advance of any constraint propagation [11]. Researchers have developed fully dynamic algorithms for maintaining distance matrices [7, 5, 16, 4, 12]. Although these algorithms have attractive time complexities, they restrict the kinds of constraints that can populate a network and, thus, are inappropriate for many applications. Others have presented algorithms making fewer restrictions, but exhibiting poorer performance [13, 6].

The INCR 2004 Algorithm. The author recently presented a practical incremental algorithm for maintaining the full distance ma-
Figure 1. The PropFwd phase of the incremental algorithm
Figure 2. The PropBkwd phase of the incremental algorithm
trix [8]. For ease of exposition, we shall refer to that algorithm as the INCR 2004 algorithm. That algorithm reduces the size of the network by collapsing all rigid components down to a single time-point.2 The INCR 2004 algorithm also reduces constraint propagation by propagating only along undominated edges.3 The undominated edges are stored in hash tables. In particular, for each time-point t, Precs(t) is a hash table containing the undominated edges coming in to t; and Succs(t) contains the undominated edges going out from t. The high-level structure of the algorithm, which is based on work by several others [12, 13, 6], has two phases, called PropFwd and PropBkwd. The algorithm has time complexity O(kΔ), where Δ is the number of entries of D that actually need to be changed [12].

The PropFwd Phase. Suppose a new (or stronger) constraint, tj − ti ≤ δ, is added to the network. Fig. 1 illustrates the PropFwd phase, in which changes to distance matrix entries of the form, D(ti, t), are propagated by following the successors of tj. In the figure, decreasing the weight of the edge, ti tj, from 5 to 2 requires decreasing D(ti, tk) from 9 to 6, and decreasing D(ti, tm) from 17 to 14. Since D(ti, tp) does not need to be changed, forward propagation stops at that point.4 During the PropFwd phase, each time-point, t, for which D(ti, t) changed is collected in a hash-table, AffectedTPs.

The PropBkwd Phase. Fig. 2 illustrates the PropBkwd phase of the INCR 2004 algorithm. For each tm in AffectedTPs collected during the PropFwd phase, the predecessors of ti are followed, potentially leading to changes in entries of the form, D(t, tm). For example, in the figure, the entry D(ti, tm) had been reduced from 17 to 14 during the first phase. Its new value requires reducing D(th, tm) from 18 to 15. However, since D(tg, tm) does not need to be changed, backward propagation stops at that point.

Augmented STNs. An Augmented STN (ASTN) is an STN that has been augmented to include a special time-point, N, which represents the current time (i.e., "now") [11]. Representing the now time-point enables the network to explicitly handle the passage of time

2 A rigid component is a set of time-points in which the temporal distance between each pair of time-points is constrained to be some fixed value. Other researchers have described collapsing rigid components [17, 7].
3 A constraint is called undominated if removing it from the network would necessarily require updating the distance matrix. In contrast, removing a dominated constraint from the network would leave the distance matrix unchanged. The algorithm takes advantage of the fact that dominated constraints are easy to detect in networks with no rigid components [10].
4 For expositional simplicity, Fig. 1 shows only one branch of the sub-tree rooted at tj. The PropFwd phase normally explores multiple branches of that sub-tree. Similar remarks apply to the PropBkwd phase.
Figure 3. The now time-point in an ASTN

Figure 4. The execution of the time-point t at time 2
and the execution of time-points. The passage of time is handled by including a single edge from N to Z, with weight −d, representing the lower-bound constraint, d ≤ N. This edge, as illustrated in Fig. 3, is the only outgoing edge from the now time-point. As time passes, the value of d increases (i.e., the constraint involving Z and N grows stronger). Since the time-complexity of strengthening a constraint is lower than that of weakening or deleting constraints, this way of dealing with the passage of time is computationally attractive. In an ASTN, each unexecuted time-point, t, is constrained to occur at or after now—represented by an edge from t to N with weight 0. Fig. 3 illustrates these kinds of edges, which are the only incoming edges to the now time-point. When t is executed, the edge from t to N is deleted, and two edges between t and Z are inserted to fix t’s value. Fig. 4 provides “before”, “during” and “after” snapshots of a network in which t is executed at time 2. In the “before” snapshot, the current time is 1, and t is constrained to occur at or after that time. In the middle snapshot, t has been executed at time 2 (i.e., the edge from t to N has been deleted, and a pair of edges between t and Z have been inserted, fixing the value of t to 2). In the bottom snapshot, the current time has advanced to 3, but that has no effect on t. For an ASTN, the distance matrix entry, D(Z, N), can be interpreted as a kind of deadline [11]. In particular, if some time-point is not executed at or before this deadline, then the network is certain to become inconsistent—because the passage of time (i.e., the increased value of d on the edge from N to Z) will eventually generate a negative cycle. The potential inconsistency can be averted by executing one or more time-points, thereby deleting constraints involving N and increasing the value of D(Z, N).
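To summarize the edge bookkeeping of Figs. 3 and 4, here is a minimal Python sketch of an ASTN; it illustrates the constraint encoding only and is not the MYSYSTEM implementation.

INF = float('inf')

class ASTN:
    # self.edges[(u, v)] = w encodes the constraint v - u <= w.
    # 'Z' is the zero time-point; 'N' is the now time-point.
    def __init__(self):
        self.edges = {('N', 'Z'): 0}   # lower bound 0 <= N

    def add(self, u, v, w):
        # Keep only the tightest (smallest) bound for each edge.
        if w < self.edges.get((u, v), INF):
            self.edges[(u, v)] = w

    def new_time_point(self, t):
        self.add(t, 'N', 0)            # t must occur at or after now

    def advance_now(self, d):
        self.add('N', 'Z', -d)         # d <= N; strengthens as time passes

    def execute(self, t, K):
        # As in Fig. 4: delete the edge from t to N, then fix t to K.
        del self.edges[(t, 'N')]
        self.add('Z', t, K)            # t - Z <= K
        self.add(t, 'Z', -K)           # Z - t <= -K, i.e., K <= t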
2 Desiderata

The main goal for the work described in this paper is to provide a temporal constraint management system that can serve as the basis for a temporal reasoning module in real-time planning, scheduling and acting applications, including multi-agent systems involving the coordination of temporally dependent, inter-agent activities. This high-level goal consists of the following subsidiary goals:

• To maintain constant-time access to all distance-matrix entries
• To reduce space requirements for the distance matrix (or any other auxiliary data structures)
• To reduce the need for constraint propagation
• To include a fully dynamic constraint propagation algorithm that is scalable to real-time applications

Constant-time access to distance-matrix entries facilitates multi-agent coordination algorithms (e.g., temporal decoupling [9]). Reducing space requirements for the distance matrix implies not explicitly representing every distance-matrix entry, while maintaining constant-time access. Reducing the need for constraint propagation makes the fully dynamic algorithm computationally palatable. "Scalable" means that the resulting TCMS is practical for applications involving hundreds, or even thousands, of time-points.
3 Approach

This paper presents a TCMS called MYSYSTEM that meets the desiderata listed above. In MYSYSTEM:

• The now time-point, N, is explicitly represented (as in ASTNs).
• The zero time-point, Z, is replaced by a pair of time-points, Zin and Zout, thereby eliminating propagation through Z, and reducing the number of distance-matrix entries needing to be computed.
• Since the portion of the distance matrix that is actually computed is typically quite small, the values are stored in a hash table, instead of a two-dimensional array.
• The incremental algorithm is essentially the same as the INCR 2004 algorithm, except that rigid components and dominated constraints are handled differently.
• A new decremental algorithm is provided that manipulates the same data structures as the incremental algorithm. The algorithm, which draws on ideas from other researchers [4, 13], is not the fastest possible, but requires only minor auxiliary data structures.
• Executed time-points are effectively removed from the network.

Replacing the Zero Time-Point by a Pair of Time-Points. In real-world applications, the starting and ending times of tasks are typically subject to a variety of unary constraints—that is, constraints involving the zero time-point, Z. As a result, while the maximum number of edges incident on any other time-point might be, say, ten, the number of edges incident on Z can be O(n). Thus, a great deal of the constraint propagation needed to fully populate the distance matrix is due to constraints involving Z.

Figure 5. Replacing the zero time-point by a pair of time-points

To eliminate constraint propagation through Z, the temporal network in MYSYSTEM replaces Z by a pair of time-points, Zin and Zout.5 In particular, as illustrated in Fig. 5, Zin is the destination for all edges that would normally point to Z, and Zout is the source of all edges that would normally emanate from Z. Now, adding an edge from Zin to Zout with weight 0 (shown as a dashed arrow in the figure) would make the two networks in Fig. 5 equivalent; however, such an edge is purposely left out of the network in MYSYSTEM.

This seemingly minor change eliminates propagation through Z; thus, it dramatically reduces the amount of computation required to maintain the distance matrix. At the same time,
5 This treatment of the zero time-point is somewhat similar to Cesta and Oddi's treatment of the zero time-point as both a source and a sink [2].
MYSYSTEM retains the property of having constant-time access to all distance-matrix entries. To see this, suppose A is a standard ASTN and A′ is the same as A, except that the zero time-point has been replaced by Zin and Zout, as described above. Because the edge from Zin to Zout is left out of A′, the distance matrices, D and D′, are typically quite different. However, the relationship between their corresponding entries is simple. In particular, for any ti, tj ∈ T \{Z}:6
• D(ti, Z) = D′(ti, Zin)
• D(Z, tj) = D′(Zout, tj)
• D(ti, tj) = min{D′(ti, Zin) + D′(Zout, tj), D′(ti, tj)}

The last equality can be glossed as: "The shortest path from ti to tj either involves the zero time-point or it doesn't." In this way, although D′ typically contains far fewer finite entries than D, it can be used to fetch the value of any entry D(ti, tj) in constant time.

The Distance-Matrix Hash Table. Due to the use of Zin and Zout, the constraint propagation algorithms in MYSYSTEM typically need to compute only a small fraction of the O(n²) entries in the distance matrix, D′. Thus, to save space, a hash table is used to store only those entries that are actually computed. Any entry, D′(ti, tj), that has not been stored in the hash table is taken to be infinity, representing that there is no path from ti to tj. Hash-table keys are integers of the form N·i + j, where N is an upper bound on the number of time-points in the network.7 For example, if N = 2^14 = 16384, then 28-bit values can be used for hash-table keys—which can be quickly computed using left-shift and addition operations.

A Note about Rigid Components and Undominated Edges. In a purely incremental context, constraints are never weakened or deleted. Thus, rigid components, once created, can never become non-rigid, and it is safe to collapse each rigid component down to a single point as soon as it is created. In so doing, the network remains free from rigidities, which simplifies the detection of dominated constraints. In contrast, a fully dynamic algorithm must handle the weakening or deleting of constraints and, thus, cannot afford to collapse all rigid components—because undoing such transformations can be too computationally costly. Hence, the fully dynamic algorithm in MYSYSTEM does not typically collapse rigid components, and the network in MYSYSTEM may contain rigidities, thereby complicating the detection of dominated edges. For this reason, the detection of dominated edges in MYSYSTEM is restricted to cases where a strictly shorter alternative pathway is found.8 In addition, the decremental algorithm can sometimes insert dominated edges into the Precs and Succs hash tables—because avoiding doing so would be too computationally costly. However, when the incremental algorithm detects these dominated edges, they are immediately removed from the Precs and Succs hash tables. Thus, in this sense, the fully dynamic algorithm in MYSYSTEM can be said to propagate along "mostly" undominated edges.

The Decremental Algorithm in MYSYSTEM. The decremental algorithm is used when an existing constraint, tj − ti ≤ δ, is either weakened or deleted. The algorithm has the following three phases:

(1) In a hash-table called Changelings, collect all pairs, (tx, ty), such that D′(tx, ty) might need updating.
(2) For each (tx, ty) in Changelings, check for shorter alternative pathways from tx to ty; collect the shortest alternatives in a hash-table called AltPaths.
6 T \{Z} denotes the set of time-points in A other than Z.
7 Demetrescu and Italiano [4] encode pairs in this way.
8 In contrast, the INCR 2004 algorithm also detects edges that are dominated by a path whose length is the same as that of the edge being dominated.
(3) Incrementally propagate the constraints in AltPaths.

Phase 1. Consider the path from tx to ty shown below, where the wavy arrows represent shortest paths and δ is the original weight of the edge being weakened/deleted.
tx ~~> ti --(δ)--> tj ~~> ty
The pair, (tx, ty), is collected during Phase 1 if and only if:

D′(tx, ty) = D′(tx, ti) + δ + D′(tj, ty)

All such pairs are collected using a two-pass algorithm that has the same structure as the PropFwd and PropBkwd phases of the incremental algorithm. Thus, Phase 1 takes time O(kΔ), where Δ is the number of pairs in Changelings.

After the Changelings hash-table has been populated, the corresponding distance-matrix entries are assigned new values, as follows. If the edge, ti tj, has been deleted, then each D′(tx, ty) is set to ∞, because the deletion of ti tj might mean there no longer is any path from tx to ty. On the other hand, if ti tj was simply weakened—say by an amount α—then each D′(tx, ty) is set to the value D′(tx, ty) + α + 1. Using this value, which is necessarily greater than the eventual updated value, forces D′(tx, ty) to be updated during Phase 2 or 3.

Since MYSYSTEM does not maintain any pointers to first or last steps of shortest paths (e.g., as done by Rohnert [13]), the Changelings hash table may end up containing some pairs whose distance-matrix entries do not need to be updated. Instead of maintaining complex auxiliary data structures to avoid this, the decremental algorithm discovers alternative paths during Phases 2 and 3 to ensure that the corresponding distance-matrix entries are restored.

Phase 2. For each (tx, ty) in Changelings, alternative pathways of the forms given below are collected in a hash-table called AltPaths:9
• a single edge, tx → ty;
• an edge, tx → tk, followed by a shortest path from tk to ty; or
• a shortest path from tx to tv, followed by an edge, tv → ty.
For some (tx, ty) in Changelings, it may be that no alternative paths exist. For other pairs, more than one such path may exist; however, only the shortest such paths are kept in AltPaths. The hash-key for the AltPaths hash table is the pair, (tx, ty); the value is the length of the alternative path. (Interior time-points on the path are not needed.) Notice that the alternative pathways collected during Phase 2 may well have been dominated prior to the weakening (or deleting) of the edge ti tj, as illustrated below in the case of an alternative edge.
tx --(16)--> ty;   tx --(3)--> ti --(5)--> tj --(4)--> ty
Prior to weakening ti tj from 5 to 10, the edge tx ty was not a shortest path; afterward, however, it becomes a shorter (and possibly shortest) path. For this reason, the edges considered during Phase 2 are drawn from the set C—which contains all of the edges in the network—not just those in the Precs and Succs hash tables.
9 Demetrescu and Italiano [4] refer to such pathways as locally shortest.
Phase 3. During Phase 3, the alternative paths found in Phase 2 are incrementally propagated. There are several options for doing this. Each alternative path could, in turn, be completely propagated using the incremental algorithm. However, this sort of depth-first approach might result in a large amount of redundant propagation. Another option, analogous to A* search, would be to sort the alternative paths according to how close their path-lengths were to the original value of D′(tx, ty) and apply the incremental algorithm to those alternative paths in their sorted order. The decremental algorithm in MYSYSTEM takes an iterative, breadth-first approach. In the first iteration, each path in AltPaths is propagated only one step along the predecessors of tx and the successors of ty. Each one-step propagation generates a new update, which is stored in a hash-table called newAltPaths. During the second iteration, each update in newAltPaths is propagated only one step, generating new updates for the third iteration. This iterative process terminates when no more updates are generated. Empirical evidence suggests that this form of incremental propagation is quite practical.

Removing Executed Time-Points. As discussed earlier, the fully dynamic algorithm does not typically collapse rigid components, because undoing such transformations in response to constraint relaxations can be too computationally costly. However, when a time-point, t, is executed, it forms a rigid component with Zin and Zout that is guaranteed to persist. Thus, it is safe to collapse this kind of rigid component. Doing so effectively removes t from the network by reorienting constraints involving t toward Zin and Zout.
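Pulling together the Zin/Zout lookup equations and the N·i + j key encoding from this section, the sketch below shows how a constant-time D(ti, tj) query can be answered from the sparse D′ hash table. The class and method names are illustrative assumptions, not the MYSYSTEM code.

INF = float('inf')

class DistanceMatrix:
    # Sparse D' table.  Time-points are integers below 2**14; an entry
    # D'(i, j) is stored under the key (i << 14) + j, computed with a
    # left-shift and an addition.  Missing entries are infinite (no path).
    def __init__(self, z_in, z_out):
        self.table = {}
        self.z_in, self.z_out = z_in, z_out

    def get_dprime(self, i, j):
        return self.table.get((i << 14) + j, INF)

    def set_dprime(self, i, j, w):
        self.table[(i << 14) + j] = w

    def get_d(self, i, j):
        # Constant-time D(i, j): the shortest path from i to j either
        # involves the zero time-point or it doesn't.
        via_zero = self.get_dprime(i, self.z_in) + self.get_dprime(self.z_out, j)
        return min(via_zero, self.get_dprime(i, j))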
4 Empirical Evaluation

The MYSYSTEM TCMS was tested on a set of thirty 25-agent scheduling problems drawn from the Phase 2 Evaluation for the DARPA Coordinators Project [15]. These kinds of problems are represented in the cTAEMS language, the details of which are described elsewhere [1]. The important characteristics of the test problems are shown in the top plot in Fig. 6. Each problem involved between 1507 and 3273 time-points (plotted on the horizontal axis) and between 803 and 1686 activities (ACTS).10 For each problem, a centralized scheduler [14] was used to generate a set of agent schedules seeking to optimize the cTAEMS quality metric. In the process, the scheduler invoked the incremental algorithm of MYSYSTEM between 3461 and 7353 times (INCRS), and the decremental algorithm between 254 and 1185 times (DECRS). The resulting schedules included a total of between 139 and 243 activities (SCHEDS), and resulted in networks with between 3326 and 6797 edges (EDGES).

The middle plot of Fig. 6 shows the CPU time used by MYSYSTEM to do all of the temporal computations for each scheduling problem. The CPU time ranged from 2 seconds to 2 minutes per problem. In the worst case, the 2 minutes of computation, spread over 8000 invocations of the incremental or decremental algorithms, averaged to about 15 msec per invocation.

The bottom plot of Fig. 6 shows the memory usage of MYSYSTEM. The number of finite distance-matrix cells (i.e., those that were actually stored in a hash table) ranged from about 77,000 to about 850,000 per problem. In contrast, the full distance matrix would have required between 2.2 and 10.7 million cells. Given that typical entries are four bytes, such a matrix could have required over 40 megabytes of memory. In contrast, the total memory used by MYSYSTEM during the course of each scheduling problem, most of which was dynamically allocated and freed, ranged from about 8 to 92 megabytes.
10 Some activities share time-points; hence the number of time-points is somewhat less than double the number of activities.
Figure 6. Results of experiments on 25-agent scheduling problems. (Top: problem characteristics (INCRS, EDGES, ACTS, DECRS, SCHEDS) plotted against the number of time-points. Middle: CPU seconds per problem, on a log scale. Bottom: total memory used (bytes), the potential size of the distance matrix, and the number of finite distance-matrix entries, on a log scale.)

All experiments were run on an IBM Thinkpad laptop with a 2.4GHz Intel processor using Allegro Common Lisp, version 8.1.
5 Conclusion

This paper presented a new temporal constraint management system, called MYSYSTEM, that combines novel STN representations with a fully dynamic propagation algorithm that is practical for real-world, real-time applications. The temporal network in MYSYSTEM includes special time-points to eliminate a common form of constraint propagation and reduce the number of distance-matrix entries that typically need to be computed. The fully dynamic algorithm extends an earlier incremental algorithm. It limits propagation to "mostly" undominated edges. The paper provided empirical results on temporal networks derived from a centralized scheduler applied to a variety of 25-agent scheduling problems involving thousands of time-points.

Acknowledgments

The research presented in this paper was supported in part by subcontract 55-000723 between Vassar College and SRI International as part of the DARPA Coordinators Project (Contract FA8750-05-C-0033). Any opinions, findings and conclusions or recommendations expressed in this paper are those of the author and do not necessarily reflect the views of DARPA. The author thanks Stephen Smith, Zachary Rubinstein, Terry Zimmerman, Laura Barbulescu and Anthony Gallagher from Carnegie Mellon University for providing access to their scheduler.

REFERENCES
[1] M. Boddy, B. Horling, J. Phelps, R. Goldman, R. Vincent, C. Long, and B. Kohout, 'cTAEMS language specification, version 1.06'.
[2] Amedeo Cesta and Angelo Oddi, 'Gaining efficiency and flexibility in the simple temporal problem', in Proceedings of the Third International Workshop on Temporal Representation and Reasoning (TIME-96), pp. 45–50. IEEE, (1996).
[3] Rina Dechter, Itay Meiri, and Judea Pearl, 'Temporal constraint networks', Artificial Intelligence, 49, 61–95, (1991).
[4] C. Demetrescu and G. Italiano, 'A new approach to dynamic all pairs shortest paths', in Proceedings of the 35th STOC, pp. 159–166, (2003).
[5] Camil Demetrescu and Giuseppe F. Italiano, 'Improved bounds and new trade-offs for dynamic all pairs shortest paths', Technical Report ALCOMFT-TR-02-1, ALCOM, (2002).
[6] Shimon Even and Hillel Gazit, 'Updating distances in dynamic graphs', Methods of Operations Research, 49, 371–387, (1985).
[7] Alfonso Gerevini, Anna Perini, and Francesco Ricci, 'Incremental algorithms for managing temporal constraints', Technical Report IRST-9605-07, IRST.
[8] Luke Hunsberger, 'Quantitative temporal reasoning in planning problems'. AAAI-2004 Tutorial MP-2, slides available at: http://www.cs.vassar.edu/~hunsberg.
[9] Luke Hunsberger, 'Algorithms for a temporal decoupling problem in multi-agent planning', in Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-2002), (2002).
[10] Luke Hunsberger, Group Decision Making and Temporal Reasoning, Ph.D. dissertation, Harvard University, 2002. Available as Harvard Technical Report TR-05-02.
[11] Luke Hunsberger, 'Distributing the control of a temporal network among multiple agents', in Proc. of the 2nd Int'l. Joint Conference on Autonomous Agents and MultiAgent Systems (AAMAS-03), (2003).
[12] G. Ramalingam and Thomas Reps, 'On the computational complexity of dynamic graph problems', Theoretical Computer Science, 158, 233–277, (1996).
[13] Hans Rohnert, 'A dynamization of the all pairs least cost path problem', in 2nd Symposium of Theoretical Aspects of Computer Science (STACS 85), ed., Kurt Mehlhorn, volume 182 of Lecture Notes in Computer Science, 279–286, Springer, (1985).
[14] S. Smith, A.T. Gallagher, T.L. Zimmerman, L. Barbulescu, and Z. Rubinstein, 'Distributed management of flexible times schedules', in Intl. Conf. on Autonomous Agents and Multiagent Systems, (2007).
[15] Thomas Wagner, Valerie Guralnik, John Phelps, and Ryan VanRiper, 'COORDINATORS: Coordination managers for first responders', in Proc. of the 3rd Intl. Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-2004). IEEE Computer Society, (2004).
[16] Mikkel Thorup, 'Worst-case update times for fully-dynamic all-pairs shortest paths', in Annual ACM Symposium on Theory of Computing, pp. 112–119, (2005).
[17] Ioannis Tsamardinos, Reformulating Temporal Plans for Efficient Execution, Master's thesis, University of Pittsburgh, 2000.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-558
Towards Efficient Belief Update for Planning-Based Web Service Composition

Jörg Hoffmann 1

Abstract. At the "functional level", Semantic Web Services (SWS) are described akin to planning operators, with preconditions and effects relative to an ontology; the ontology provides the formal vocabulary and an axiomatisation of the underlying domain. Composing such SWS is similar to planning. A key obstacle in doing so effectively is handling the ontology axioms, which act as state constraints. Computing the outcome of an action involves the frame and ramification problems, and corresponds to belief update. The complexity of such updates motivates the search for tractable classes. Herein we investigate a class that is of practical relevance because it deals with many commonly used ontology axioms, in particular with attribute cardinality upper bounds, which are not handled by other known tractable classes. We present an update computation that is exponential only in a comparatively uncritical parameter; we present an approximate update which is polynomial in that parameter as well.

1 SAP Research, CEC Karlsruhe, Germany, joe.hoffmann@sap.com
1 Introduction
Semantic Web Services (SWS) are pieces of software advertised with a formal description of what they do; Web Service Composition (WSC) means to link them together in a way satisfying a complex user requirement. WSC is widely recognized for its economic potential. In the wide-spread OWL-S [3] and WSMO [5] frameworks, at the so-called “functional level” (which abstracts from interaction details and specifies only overall functionality), SWS are described akin to planning operators, with preconditions and effects relative to an ontology. Hence planning – planning under uncertainty, since information in the web context cannot be expected to be complete – is a prime candidate for realizing this form of WSC. In our work, we pursue a kind of conformant planning [17]. The tool we develop performs a forward search as per Figure 1. Each s represents (partial) knowledge about the corresponding belief b, where as usual b is the set of all situations possible at the given point in time. Maintaining the states s is challenging because it involves a belief update problem. Namely, the main difference to most work in conformant planning is that we consider state constraints, e.g. [8, 2, 16]: the domain axiomatization given in the ontology. Such axioms are state constraints in the sense that any state that can be encountered, in the given domain, is known to satisfy them. In the presence of such axioms, computing the outcome of an action involves the frame and ramification problems: How do the axioms affect the previous world, and what are their side effects? Following various authors, e.g. [10, 15], we define action outcomes as belief updates, where the “update” is the action effect conjoined with the axioms. Belief update has been shown to be hard even in tractable logics (e.g. Horn [4]). Since update is a frequently solved sub-problem in 1
s0 := initialise(); open-list := {s0}
while TRUE do
    s := choose(open-list)
    if is-solution(s) then return path leading to s
    for all calls a of SWS applicable in s do
        s′ := update(s, a); insert(open-list, s′)

Figure 1. The main loop of our planner.
planning as per Figure 1, the need for tractable classes is tantalising. In this context, it is of particular interest that practical WSC problems, e.g. the widely used Virtual Travel Agency (VTA) scenario, often come with fairly simple domain axiomatizations. Some of the most typically used axioms are: subsumption relations, which herein we write as clauses of the form ∀x : train(x) ⇒ vehicle(x); attribute range type restrictions, ∀x, y : ticketfor(x, y) ⇒ person(y); mutual exclusion, ∀x : ¬train(x) ∨ ¬car(x); and bounds on the number of distinct attribute values, such as the axiom

∀x, y1, y2, y3 : (ticketfor(x, y1) ∧ ticketfor(x, y2) ∧ ticketfor(x, y3)) ⇒ (y1 = y2 ∨ y1 = y3 ∨ y2 = y3)

which is a cardinality upper bound saying that at most two persons may travel on the same ticket.

The above raises the question of which classes of axioms allow a polynomial-time belief update. To our knowledge, the only existing work exploring this question is DL-Lite [6, 7], a fragment of DL for which belief update can be done efficiently, and the new belief can be represented in terms of a single ABox. The latter is necessary since the updated belief will be visible to the user. DL-Lite does not allow cardinality upper bounds. In this paper, we identify a tractable fragment which includes such bounds. A key difference to DL-Lite is that we don't require beliefs to be understandable to a user: the representation is internal to the planner, and so we are completely free in how to define the search states s. We show that this enables us to deal with cardinality upper bounds, in time exponential only in the maximum bound k imposed by any such bound. The belief update algorithm we present also deals with binary clauses, i.e., clauses of at most two literals, such as subsumption relations, attribute range type restrictions, and mutual exclusion. One would usually expect k to be 1 or 2 (rather than, say, 17). However, in large tasks the complexity of the update can become critical even for small k. We hence also pursue the idea of sacrificing either soundness or completeness for tractability. We present an approximate update algorithm that is polynomial also in k.

A few words are in order regarding our planning formalism. In contrast to DL-Lite, and in line with the usual planning formalisms, we make a closed-world assumption where a finite set of constants is fixed. The motivation for this is simply that it is closer to existing planning tools, and hence is expected to make it easier to eventually build on that work. The other main design decision regards the semantics of belief update. We adopt the possible models approach
(PMA) [18], which addresses the frame and ramification problems via a set-based notion of minimal change. Alternative semantics should be considered in the future: from an application perspective, at the time of writing there is not sufficient material on concrete use cases to tell whether one or the other semantics is more practical. The PMA has been used in many recent works related to formal semantics for WSC, e.g. [15, 1, 6], and is hence somewhat canonical.2 Section 2 introduces our planning formalism. Section 3 establishes some core observations. Sections 4 and 5 present our exact and approximate update algorithms, respectively. Section 6 discusses closely related work and Section 7 concludes. For lack of space, we omit all proofs and many other details, such as notions of, and algorithms for, output constants and a construct for more flexible updates of attribute values. The full paper is available as a TR [12].
2 WSC Formalism
Our formalism follows standard notions from conformant planning, extended by modelling constructs for axioms. Our terminology is as used in the WSC area; it should be obvious how this corresponds to planning terminology. We denote predicates with G, H, I, variables with x, y, and constants with c, d, e. We treat equality as a "built-in" predicate. Literals are possibly negated predicates whose arguments are variables or constants; if all arguments are constants, the literal is ground. Given a set X of variables, we denote by LX the set of all literals which use only variables from X. If l is a literal, we write l[X] to indicate that l uses variables X. If X = {x1, . . . , xk} and C = {c1, . . . , ck}, then by l[c1, . . . , ck/x1, . . . , xk] we denote the substitution, abbreviated l[C]. In the same way, we use substitution for any construct involving variables. By ¬l, we denote the inverse of l. If L is a set of literals, then ¬L := {¬l | l ∈ L} and ⋀L := ⋀_{l∈L} l. An ontology Ω is a pair (P, Φ) where P is a set of predicates and Φ is a conjunction of closed first-order formulas. We call Φ a theory. A clause is a disjunction of literals with universal quantification on the outside, e.g. ∀x.¬G(x) ∨ H(x) ∨ I(x). A clause is binary if it contains at most two literals. Φ is binary if it is a conjunction of binary clauses. The only non-binary clauses we will consider are cardinality upper bounds, taking the form ∀x, y1, . . . , yk+1.(G(x, y1) ∧ . . . ∧ G(x, yk+1)) ⇒ (y1 = y2 ∨ y1 = y3 ∨ · · · ∨ yk = yk+1); to simplify notation, we will refer to such a clause as image(G) ≤ k. A theory is binary with cardinality upper bounds if it consists entirely of binary clauses and cardinality upper bounds. We will consider the special case where every predicate G with a bound image(G) ≤ k does not appear positively in any binary clause; we refer to such Φ as binary with consequence-independent cardinality upper bounds. Note that this includes subsumption relations, attribute range type restrictions, mutual exclusion, and cardinality upper bounds. A web service w is a tuple (Xw, prew, effw), where Xw is a set of variables (the inputs), prew is a conjunction of literals from LXw (the precondition), and effw is a conjunction of literals from LXw (the effect).3 Before a web service can be applied, its inputs must be instantiated with constants, yielding a service; to avoid confusion with the search states s, we refer to services as actions a (which is
2 Notably, one of the main arguments made against the PMA, e.g. by [2, 16, 11], is that it lacks a notion of causality. However, ontology languages such as OWL do not model causality; all we are given is a set of axioms. Hence this criticism does not apply for WSC (unless one proposes an entirely new framework for modelling web services, which is not our focus here).
3 Note that this definition of preconditions and effects (conjunctions of literals) is quite restrictive. This is intended, since we are looking for tractable classes here. It remains to be verified in future work whether and to what extent this restriction can be relaxed without losing our tractability results.
in accordance with the usual planning terminology). Formally, for a web service (X, pre, eff) and a tuple of constants Ca, an action a is given by (prea, effa) = (pre, eff)[Ca/X]. By convention, given an arbitrary action a, we will use Ca to denote a's input instantiation. WSC tasks are tuples (Ω, W, C, U). Ω is an ontology, W is a set of web services, and C is a set of constants. U is the user requirement, a pair (preU, effU) of precondition and effect. For complexity considerations, we will restrict WSC tasks to have fixed arity, meaning a constant upper bound on predicate arity, the number of parameters of any web service, and the depth of quantifier nesting in Φ. Further, we will sometimes assume fixed maximum cardinality, meaning a constant upper bound on k in any axiom image(G) ≤ k. The semantics of our formalism relies on a notion of beliefs, where each belief is a set of models. Each model is an interpretation of all propositions formed from P and C. The initial belief b0 is undefined if Φ ∧ preU is not satisfiable; else, b0 := {m | m |= Φ ∧ preU}. A solved belief is a belief b s.t., for all m ∈ b, m |= effU. It remains to define how actions affect models and beliefs. Say m is a model and a is an action; as stated, we define the outcome Res(m, a) following [18]. We say that a is applicable in m if m |= prea. If a is not applicable in m, then Res(m, a) is undefined. Otherwise, Res(m, a) := {m' | m' ∈ min(m, Φ ∧ effa)}. Here, min(m, φ) is the set of all m' that satisfy φ and that are minimal with respect to the partial order defined by m1 ≤ m2 iff, for all propositions p, if m2(p) = m(p) then m1(p) = m(p). That is, m' differs from m on a set-inclusion minimal set of values. Say b is a belief. Res(b, a) is undefined if there exists m ∈ b so that Res(m, a) is undefined, or so that Res(m, a) = ∅. Else, Res(b, a) := ⋃_{m∈b} Res(m, a). The Res function is extended to sequences a1, . . . , an in the obvious way. A solution is a sequence a1, . . . , an s.t. Res(b0, a1, . . . , an) is a solved belief.

Example 1 Given predicate ticketfor with image(ticketfor) ≤ 2, and constants t, Peter, Bob, Mary. Initially, ticketfor(t, Peter) ∧ ticketfor(t, Bob). Say we apply a1 with effect ticketfor(t, Mary). We get two resulting states, one with ticketfor(t, Peter) ∧ ticketfor(t, Mary) and one with ticketfor(t, Bob) ∧ ticketfor(t, Mary) (but none with only ticketfor(t, Mary), since that would not be a minimal change). Say we now apply a2 with effect ticketfor(t, Peter). We get two states, with ticketfor(t, Peter) ∧ ticketfor(t, Mary) and ticketfor(t, Peter) ∧ ticketfor(t, Bob), respectively.
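For intuition, here is a small Python sketch (our illustration, not part of the paper) that enumerates Res(m, a) by brute force over a tiny propositional encoding and reproduces the first step of Example 1; the proposition names tf_* are made up:

from itertools import product

def models(props, phi):
    # All truth assignments over props satisfying the predicate phi.
    for bits in product([False, True], repeat=len(props)):
        m = dict(zip(props, bits))
        if phi(m):
            yield m

def diff(m1, m):
    # Propositions on which m1 and m differ.
    return {p for p in m if m1[p] != m[p]}

def res(m, phi_and_eff, props):
    # PMA outcome Res(m, a): models of Phi AND eff_a minimally differing from m.
    cands = list(models(props, phi_and_eff))
    return [m2 for m2 in cands
            if not any(diff(m1, m) < diff(m2, m) for m1 in cands)]

# Example 1, first step: image(ticketfor) <= 2, initially {Peter, Bob},
# action a1 with effect ticketfor(t, Mary).
props = ["tf_Peter", "tf_Bob", "tf_Mary"]
m0 = {"tf_Peter": True, "tf_Bob": True, "tf_Mary": False}
phi_eff = lambda m: sum(m.values()) <= 2 and m["tf_Mary"]
for m2 in res(m0, phi_eff, props):
    print(sorted(p for p in m2 if m2[p]))   # the two minimal-change outcomes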
3 Basic Observations
We make a number of basic observations: lemmas used in our update computations, and negative results supporting our design decisions. We first make some general observations about belief intersections, then we consider binary clauses and cardinality upper bounds. Before thinking about how to update beliefs, one needs to think about how to represent beliefs, and, even, which aspects of beliefs to represent. Every belief may contain an exponential number of different models, and hence symbolic representations should be utilized, and/or only partial knowledge should be maintained. Herein, we focus on the latter. Inspired by recent techniques from conformant planning [13] (with no state constraints), we aim at maintaining only belief intersections: the set of literals that are true in all models of a belief b, ⋂_{m∈b} {l | m |= l} =: ⋂b. Based on ⋂b, we can determine whether an action a is applicable to b, namely iff prea ⊆ ⋂b, and whether b is solved, namely iff effU ⊆ ⋂b. So, ideally, we wish to define the search states s from Figure 1 as sets Ls of literals: if b is a belief and s the corresponding search state, then we want to have Ls = ⋂b. The question is, how do we maintain those s?
First, one piece of bad news is that computing ⋂Res(m, a) is very hard in general, and is hard even if Φ is Horn. This follows directly from earlier results in the area of belief update [4]:

Proposition 1 Assume a WSC task (Ω, W, C, U) with fixed arity. Assume a model m, an action a, and a literal l such that m |= l. It is Πp2-complete to decide whether l ∈ ⋂Res(m, a). If Φ is Horn, then the same decision is coNP-complete.

This shows in particular that it is not necessarily enough to restrict ourselves to a tractable logic for Φ – at least in the case of Horn logic, that does not make the update problem tractable. The question arises whether the same is the case for binary clauses. As one might suspect, the answer is "no". The following two technical observations can be used to prove this fact; they are also used further below to prove the correctness of our update computations. First, literals l ∈ ⋂Res(b, a) do not appear "out of thin air":

Lemma 1 Assume a WSC task (Ω, W, C, U). Assume a belief b and an action a. Then ⋂Res(b, a) ⊆ {l | Φ ∧ effa |= l} ∪ ⋂b.

This is due to the PMA which, if l ∉ ⋂b and Φ ∧ effa ⊭ l, generates m' ∈ Res(b, a) so that m' ⊭ l. Lemma 1 means that, in general, ⋂Res(b, a) can be computed in two steps: (A) determine {l | Φ ∧ effa |= l}; (B) determine which l ∈ ⋂b do not disappear, i.e., l ∈ ⋂Res(b, a). Obviously, (A) is just deduction in Φ. The more tricky part is (B). The following observation characterizes exactly when l ∈ ⋂b disappears:

Lemma 2 Assume a WSC task (Ω, W, C, U). Assume a belief b, an action a, and a literal l ∈ ⋂b. Then, l ∉ ⋂Res(b, a) iff there exists a set L0 of literals satisfied by a model m ∈ b, such that Φ ∧ effa ∧ ⋀L0 is satisfiable and Φ ∧ effa ∧ ⋀L0 ∧ l is unsatisfiable.

Intuitively, L0 is the "reason" why l disappears: it is consistent with the effect and hence true in a model of Res(b, a); but it excludes l. We can conclude that, for binary clauses, a literal disappears only if its opposite is necessarily true:

Lemma 3 Assume a WSC task (Ω, W, C, U) where Φ is binary. Assume a belief b, an action a, and a literal l ∈ ⋂b. If l ∉ ⋂Res(b, a), then Φ ∧ effa ∧ l is unsatisfiable.

Namely, by Lemma 2 there exists L0 so that Φ ∧ effa ∧ ⋀L0 is satisfiable, but Φ ∧ effa ∧ ⋀L0 ∧ l is unsatisfiable; with binary Φ, this implies that Φ ∧ effa ∧ l is unsatisfiable. By Lemmas 1 and 3, and since reasoning in grounded binary Φ is polynomial, we get:

Corollary 1 Assume a WSC task (Ω, W, C, U) with fixed arity, where Φ is binary. Assume a belief b, and an action a; let L := {l | Φ ∧ effa |= l}. Then ⋂Res(b, a) = L ∪ (⋂b \ ¬L). Given ⋂b, this can be computed in time polynomial in the size of (Ω, W, C, U).

Corollary 1 is a moderately interesting result since binary clauses are somewhat complementary to DL-Lite. The more important use of Lemmas 1, 2, and 3 will be below, where we consider the combination of binary clauses with cardinality upper bounds. Our first observation regarding that combination is:

Proposition 2 Assume a WSC task (Ω, W, C, U) with fixed arity, where Φ is binary with cardinality upper bounds. Deciding whether Φ is satisfiable is NP-complete.

By a straightforward reduction from VERTEX COVER. We sidestep this source of intractability by restricting ourselves to Φ that are binary with consequence-independent cardinality upper bounds (cf. Section 2): any predicate G with a bound image(G) ≤ k does not appear positively in the binary clauses. Note that G appears only negatively in the clause image(G) ≤ k. This removes the problem:

Lemma 4 Let φ be a propositional CNF, with φ = φ1 ∧ φ2 where there exists no literal l s.t. l appears in φ1 and ¬l appears in φ2. Let l be a literal s.t. φ |= l. Then either φ1 |= l or φ2 |= l.

This is easy to see based on the lack of conflicts between φ1 and φ2. A more subtle point is that even dealing with cardinality upper bounds in isolation is tricky. Namely, it is not possible to compute ⋂Res(b, a) based only on ⋂b:

Proposition 3 There exist a WSC task (Ω, W, C, U) where Φ consists entirely of cardinality upper bounds, an action a, and two reachable beliefs b and b' s.t. ⋂b = ⋂b', but ⋂Res(b, a) ≠ ⋂Res(b', a).

A model m may disappear when applying an action a', and not be re-created when a' is inverted. This leads to beliefs b where b ≠ {m | m |= Φ ∧ ⋀⋂b},4 and further to b, b' s.t. ⋂b = ⋂b' but b ≠ b'. This means that it is not possible to, as envisioned, define the search states s simply as sets Ls – at least not if we want to ensure that Ls is exactly the intersection of the corresponding belief. We need to augment s with additional information. We have experimented for some time with methods augmenting s with the min and max number of attribute values present in any model of the belief. The intuition behind such an approach would be that cardinality upper bounds affect only how many, not which, attribute values there are. However, this is not true, since the cardinality upper bounds are intermingled with action effects; this makes capturing the precise distribution of attribute value tuples a surprisingly tricky task. It remains an open question whether beliefs in the presence of cardinality upper bounds can be represented concisely. Herein, we present two alternative options. The first option, Section 4, takes time and space that is exponential (only) in the maximum k of any upper bound image(G) ≤ k. The second option, Section 5, takes polynomial time also in k, but sacrifices precision and guarantees only one of soundness or completeness (the user may choose which one).
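To illustrate Corollary 1's two-step computation, here is a Python sketch (ours, with an assumed (atom, polarity) literal encoding). It closes the effect literals under the binary clauses by unit propagation, which is sound but, unlike full 2-SAT reasoning, not complete for entailment, and then keeps the uncontradicted old intersection literals:

def neg(l):
    atom, pol = l
    return (atom, not pol)

def closure(lits, clauses2):
    # Literals derivable from lits over clauses of at most two literals.
    L = set(lits)
    changed = True
    while changed:
        changed = False
        for c in clauses2:
            live = [l for l in c if neg(l) not in L]
            if not live:
                forced = set(c)        # clause falsified: surface the conflict
            elif len(live) == 1:
                forced = {live[0]}     # all other literals are falsified
            else:
                forced = set()
            if forced - L:
                L |= forced
                changed = True
    return L

def update_binary(int_b, eff_lits, clauses2):
    # Corollary 1 for purely binary Phi: the new belief intersection.
    L = closure(eff_lits, clauses2)
    if any(neg(l) in L for l in L):
        return None                    # contradiction: update undefined
    return L | {l for l in int_b if neg(l) not in L}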
4 Exact Belief Update
We now specify search states s and associated initialise and update procedures that enable us to maintain precise belief intersections. We need three notations. First, by Φ|2, we denote the subset of binary clauses of Φ. Second, if L is a set of literals, G is a predicate with arity 2, and c is a constant, then we denote L|G,c := {d | G(c, d) ∈ L}. That is, L|G,c selects from L the values of attribute G for c. Similarly, L|¬G,c := {d | ¬G(c, d) ∈ L}. Third, say b is a belief; we introduce a formal notation for the precise distribution, denoted Db, of attribute value tuples. Our search states will explicitly keep track of that distribution, and hence contain sufficient information for precise belief update (this is not possible based only on ⋂b, cf. Proposition 3). Db maps any G where image(G) ≤ k in Φ, and any c ∈ C, onto a set of subsets of C. Namely, for each m ∈ b, Db(G, c) contains the set {d | m |= G(c, d)}. Hence, for every G and c, Db(G, c) specifies which combinations of attribute values occur. Our search states s are pairs (Ls, Ds). Consider Figures 2 and 3. In lines (1) to (3), Figure 2 determines all logical consequences, L, of the initial literals and the binary part of Φ, and checks whether L is contradictory. Thereafter, cardinality upper bounds are handled; note that this can be done separately because of Lemma 4. Line (5) detects any violated upper bounds. Line (6) says that, for any cardinality upper bound where we already have the maximum number
4 This relates to [14], who show that DL updates can often not be represented in terms of a single changed ABox.
procedure initialise()
(1) LpreU := {l | l appears in preU}
(2) L := {l | Φ|2 ∧ ⋀LpreU |= l}
(3) if ex. l s.t. l ∈ L and ¬l ∈ L then return (undefined)
(4) for all image(G) ≤ k in Φ, c ∈ C do
(5)   if |LpreU|G,c| > k then return (undefined)
(6)   if |LpreU|G,c| = k then L := L ∪ {¬G(c, d) | d ∈ C, d ∉ LpreU|G,c}
(7)   D(G, c) := {D | D ⊆ C, LpreU|G,c ⊆ D, D ∩ LpreU|¬G,c = ∅, |D| ≤ k}
(8) return (L, D)

Figure 2. The initialise procedure for exact search states.
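Line (7) can be pictured with a short Python fragment (our sketch; the constant names are made up). It enumerates all value sets consistent with the known positive and negative information:

from itertools import combinations

def init_D(pos, negs, C, k):
    # Line (7) of Figure 2: all D with pos a subset of D,
    # D disjoint from negs, and |D| <= k.
    free = [d for d in C if d not in pos and d not in negs]
    return [frozenset(pos) | frozenset(comb)
            for size in range(k - len(pos) + 1)
            for comb in combinations(free, size)]

# Example 2's s0: the initial literals fix {Peter, Bob} and k = 2,
# so the only combination is {Peter, Bob}.
print(init_D({"Peter", "Bob"}, set(), {"Peter", "Bob", "Mary"}, 2))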
of allowed attribute values, all other values are disallowed. Line (7) sets the D(G, c) value combination sets as appropriate, taking every combination that adheres to all constraints.

procedure update(s, a)
(1) if prea ⊈ Ls then return (undefined)
(2) LA := {l | l appears in effa}
(3) L := {l | Φ|2 ∧ ⋀LA |= l}
(4) if ex. l s.t. l ∈ L and ¬l ∈ L then return (undefined)
(5) for all image(G) ≤ k in Φ, c ∈ C do
(6)   if |LA|G,c| > k then return (undefined)
(7)   if |LA|G,c| = k then L := L ∪ {¬G(c, d) | d ∈ C, d ∉ LA|G,c}
(8)   D(G, c) := ∅
(9) LAT := L; L := L ∪ {l | l ∈ Ls, ¬l ∉ L}
(10) for all image(G) ≤ k in Φ, c ∈ C, D ∈ Ds(G, c) do
(11)   if |(D ∪ LA|G,c) \ LAT|¬G,c| > k then
(12)     L := L \ {G(c, d) | G(c, d) ∈ Ls \ LA}
(13)     D(G, c) := D(G, c) ∪ {D' ∪ LA|G,c | D' ⊆ D \ (LA|G,c ∪ LAT|¬G,c), |D'| = k − |LA|G,c|}
(14)   else D(G, c) := D(G, c) ∪ {(D ∪ LA|G,c) \ LAT|¬G,c}
(15) return (L, D)

Figure 3. The update procedure for exact search states.
The update procedure, Figure 3, is more complicated. Line (1) tests whether a is applicable. Lines (2) to (7) are analogous to lines (1) to (6) of Figure 2. Line (8) initialises the D structures. Line (9) extends L with all literals from Ls, except those that are contradicted by L. By Lemma 1, the resulting L is a superset of ⋂Res(b, a). By Lemma 3, as far as binary clauses are concerned, the resulting L is equal to ⋂Res(b, a). For cardinality upper bounds, Lemma 3 does not apply, which necessitates lines (10) to (12) to check if further "old" belief intersection literals disappear. Namely, applying Lemma 2, an old attribute value (even if it is not contradicted) survives only if there exists no model m ∈ b so that, after the effects and their direct consequences have been applied, m contains too many attribute values. To figure out whether or not the latter is the case, the information given by Ds is exploited, in a straightforward way. (Note that this information is indeed required here. Assume that all we know is the maximum number of attribute values in any model m ∈ b. Then we would not know whether or not these are the same values as set by the action effects, and hence we could not decide whether or not an overflow occurs.) Lines (13) and (14), finally, make sure that D is updated correctly. If an overflow occurs, then all possible ways of minimally repairing the overflow are generated. If no overflow occurs, then D(G, c)
simply changes according to the effect and its implications. We have:

Theorem 1 Assume a WSC task (Ω, W, C, U) where Φ is binary with consequence-independent cardinality upper bounds. Assume b is a reachable belief, and s is the corresponding search state. Then: (1) b is defined iff s is defined; (2) if b is defined, then ⋂b = Ls; (3) if b is defined, then Db ≡ Ds.

The formal proof of Theorem 1 is quite lengthy, and involves various (sometimes rather tedious) case distinctions. The proof essentially spells out the intuitive arguments given above. Our main result here is that, provided a maximum cardinality is fixed, maintaining belief intersections is tractable:

Corollary 2 Assume a WSC task (Ω, W, C, U) with fixed arity and fixed maximum cardinality, where Φ is binary with consequence-independent cardinality upper bounds. Assume b is reached by action sequence a. Then the corresponding search state s is computed in time polynomial in the size of (Ω, W, C, U) and a, and ⋂b = Ls.

Note that it is indeed a non-trivial consequence of our particular setting that the behavior is exponential only in the maximum k of any image(G) ≤ k. The enabling properties are: (1) image(G) ≤ k does not interfere in any way with image(H) ≤ k, if G ≠ H; (2) similarly, the bound on the number of y in G(c, y) does not interfere with the bound on y in G(c', y) if c ≠ c'; (3) due to consequence-independence, no interferences arise from the binary clauses.

Example 2 Re-consider Example 1. Running initialise, we get the state s0 where L = {ticketfor(t, Peter), ticketfor(t, Bob)} and D(ticketfor, t) = {{Peter, Bob}}. Applying a1, we get s1 = update(s0, a1) where L = {ticketfor(t, Mary)} and D(ticketfor, t) = {{Peter, Mary}, {Bob, Mary}}. Applying a2, we get s2 = update(s1, a2) where L = {ticketfor(t, Peter)} and D(ticketfor, t) = {{Peter, Mary}, {Bob, Peter}}.
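The overflow repair of lines (10) to (14) can likewise be sketched in Python for a single pair (G, c) (our illustration; the real procedure interleaves this with the updates of L). It assumes |eff_vals| <= k, which line (6) guarantees:

from itertools import combinations

def update_D_Gc(Ds_Gc, eff_vals, forbidden, k):
    # Ds_Gc: old value combinations; eff_vals: values set by the effect;
    # forbidden: values whose absence follows from the effect's consequences.
    out, overflow = set(), False
    for D in Ds_Gc:
        merged = (D | eff_vals) - forbidden
        if len(merged) > k:                        # line (11)
            overflow = True                        # triggers line (12)
            keep = D - eff_vals - forbidden
            for Dp in combinations(keep, k - len(eff_vals)):
                out.add(frozenset(Dp) | eff_vals)  # line (13): minimal repairs
        else:
            out.add(frozenset(merged))             # line (14)
    return out, overflow

# Example 2, applying a1 (effect ticketfor(t, Mary)) to D(ticketfor, t):
print(update_D_Gc({frozenset({"Peter", "Bob"})}, frozenset({"Mary"}),
                  frozenset(), 2))
# yields {Peter, Mary} and {Bob, Mary}, with an overflow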
5 Approximate Belief Update
Even though it seems likely that k will be small in practice, it is advisable to look for more efficient methods. The size of D(G, c) is bounded only by the binomial coefficient (|C| choose k). If there are many constants, then enumerating D will become critical even for, say, k > 2. We now tackle this complexity by approximation methods. The search states s are pairs (L−s, L+s), where L−s and L+s respectively under-approximate and over-approximate the belief intersection. Both approximations are maintained simultaneously because they are interlinked. Depending on how one tests action applicability and solutions, one obtains a pessimistic/sound (but incomplete) planning procedure, or an optimistic/complete (but unsound) planning procedure. We show here only the former; the latter can be obtained by minor modifications. The initialise procedure changes only slightly because, there, no update is performed. In fact the procedure is exactly as shown in Figure 2, except that the returned s takes the form (L, L) where L – the precise belief intersection – serves both as L− and as L+. Consider Figure 4. Line (1) tests pessimistically whether a is not applicable: the preconditions are tested against L−s. Thereafter, lines (2) and (3) determine the effects and their implications over the binary clauses. Line (4) tests for contradictions in the latter. Similarly, line (6) aborts the algorithm in case of a conflict with a cardinality upper bound (separate treatment of the two kinds of conflicts is justified by Lemma 4). Line (7) adds the consequences of the upper bounds to the implied literals.
procedure update(s, a)
(1) if prea ⊈ L−s then return (undefined)
(2) LA := {l | l appears in effa}
(3) L := {l | Φ|2 ∧ ⋀LA |= l}
(4) if ex. l s.t. l ∈ L and ¬l ∈ L then return (undefined)
(5) for all image(G) ≤ k in Φ, c ∈ C do
(6)   if |LA|G,c| > k then return (undefined)
(7)   if |LA|G,c| = k then L := L ∪ {¬G(c, d) | d ∈ C, d ∉ LA|G,c}
(8) L− := L ∪ {l | l ∈ L−s, ¬l ∉ L}
(9) L+ := L ∪ {l | l ∈ L+s, ¬l ∉ L}
(10) for all image(G) ≤ k in Φ, c ∈ C do
(11)   if |L−|G,c| > k then L+ := L+ \ {G(c, d) | G(c, d) ∈ L+s \ LA}
(12)   if |(C \ L|¬G,c) ∪ L|G,c| > k then L− := L− \ {G(c, d) | G(c, d) ∈ L−s \ LA}
(13) return (L−, L+)

Figure 4. The update procedure for approximate search states.
Lines (8) and (9) initialise the consideration of old intersection literals. All of those which are not contradicted are taken into a respective approximate set (cf. Lemmas 1 and 3). Line (11) says that, if even the under-approximation violates a bound, then certainly the old attribute values get lost unless they are protected by the effect (cf. Lemma 2). Line (12) says that, if the number of constants that could potentially be attribute values violates a bound, then it may happen that the old attribute values get lost, unless they are protected by the effect (cf. Lemma 2). Note that the order of lines (11) and (12) is important because line (12) changes L−. If one executes (12) before (11), then the condition of (11) is always false, and L+ is still an over-approximation but an unnecessarily generous one. We get:

Theorem 2 Assume a WSC task (Ω, W, C, U) where Φ is binary with consequence-independent cardinality upper bounds. Assume b is a reachable belief, and s is the corresponding approximate search state. Then: (1) if b is undefined, then s is undefined; (2) if s is defined, then L−s ⊆ ⋂b ⊆ L+s.

As for Theorem 1, the proof of Theorem 2 is lengthy and involves various case distinctions. Our main result of this section is:

Corollary 3 Assume a WSC task (Ω, W, C, U) with fixed arity, where Φ is binary with consequence-independent cardinality upper bounds. Assume b is reached by action sequence a. If the corresponding approximate search state s is defined, then s is computed in time polynomial in the size of (Ω, W, C, U) and a, and ⋂b ⊇ L−s.

Example 3 Re-consider Example 1. Running initialise, we get the state s0 where L− = L+ = {ticketfor(t, Peter), ticketfor(t, Bob)}. Applying a1, both lines (11) and (12) fire and so we get s1 = update(s0, a1) where L− = L+ = {ticketfor(t, Mary)}. Applying a2, only line (12) fires and so we get L− = {ticketfor(t, Peter)} and L+ = {ticketfor(t, Mary), ticketfor(t, Peter)}.
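Restricted to the values of a single G(c, ·), lines (8) to (12) admit a compact Python sketch (ours; the parameter names are assumptions) that reproduces Example 3's first step:

def approx_update_Gc(old_minus, old_plus, eff_vals, known_absent, C, k):
    # known_absent: values d whose negated literal is among the effect's
    # consequences; C: the candidate attribute values.
    lm = eff_vals | (old_minus - known_absent)          # line (8)
    lp = eff_vals | (old_plus - known_absent)           # line (9)
    if len(lm) > k:                                     # line (11)
        lp -= (old_plus - eff_vals)
    if len((set(C) - known_absent) | eff_vals) > k:     # line (12)
        lm -= (old_minus - eff_vals)
    return lm, lp

# Example 3, applying a1: both tests fire and only Mary survives.
print(approx_update_Gc({"Peter", "Bob"}, {"Peter", "Bob"}, {"Mary"},
                       set(), {"Peter", "Bob", "Mary"}, 2))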
6 Related Work
[6] introduces DL-Lite, where the updated belief can be represented in terms of a new ABox computed in polynomial time. DL-Lite is somewhat complementary to binary clauses. Disjunction is allowed only in the form of subsumption rules in the TBox, and is binary in that sense. However, [6] allow unqualified existential quantification, membership assertions (ABox literals) using variables, and updates
involving general (constructed) DL concepts. On the other hand, DL-Lite does not allow clauses with two positive literals, and DL-Lite (like any DL) does not allow predicates of arity greater than 2. Most importantly, DL-Lite does not allow cardinality upper bounds. [7] considers a variant of DL-Lite where ABox assertions do not allow variables, and hence updates cannot be represented in terms of a new ABox. [7] show that the update from [6] can be re-used to compute the exact set of (restricted) ABox assertions after the update; this approximates the update in the sense that this set of assertions does not suffice to characterize the exact set of models. This is quite different from our approximation techniques as per Section 5, where we use approximation (without exactness guarantees) not to handle a different language, but to obtain efficiency. [9, 10] address planning with belief update semantics (other than the PMA); they do not identify tractable classes.
7 Conclusion
In planning-based WSC, one of the fundamental difficulties is the complexity of computing the outcome of actions. Since practical domain axiomatizations for WSC are often simple, there is hope to tackle this complexity by identifying tractable fragments. We make a first step in this direction, showing how cardinality upper bounds can be handled, in combination with binary clauses. Many questions are left open. For example: Are our algorithms here the best possible ones, or is there an exact update algorithm that is polynomial also in k? Can one efficiently deal with cardinality lower bounds? We hope that some of these questions will be clarified in future work.
REFERENCES
[1] F. Baader, C. Lutz, M. Milicic, U. Sattler, and F. Wolter, 'Integrating description logics and action formalisms: First results', in AAAI, (2005).
[2] G. Brewka and J. Hertzberg, 'How to do things with worlds: On formalizing actions and plans', JLC, 3, 517–532, (1993).
[3] The OWL Services Coalition. OWL-S: Semantic Markup for Web Services, 2003.
[4] T. Eiter and G. Gottlob, 'On the complexity of propositional knowledge base revision, updates, and counterfactuals', AI, 57, 227–270, (1992).
[5] D. Fensel, H. Lausen, A. Polleres, J. de Bruijn, M. Stollberg, D. Roman, and J. Domingue, Enabling Semantic Web Services – The Web Service Modeling Ontology, Springer-Verlag, 2006.
[6] G. De Giacomo, M. Lenzerini, A. Poggi, and R. Rosati, 'On the update of description logic ontologies at the instance level', in AAAI, (2006).
[7] G. De Giacomo, M. Lenzerini, A. Poggi, and R. Rosati, 'On the approximation of instance level update and erasure in DL', in AAAI, (2007).
[8] M. Ginsberg and D. Smith, 'Reasoning about action I: A possible worlds approach', AI, 35, 165–195, (1988).
[9] J. Hertzberg and S. Thiebaux, 'Turning an action formalism into a planner – a case study', JLC, 4, 617–654, (1994).
[10] A. Herzig, J. Lang, P. Marquis, and T. Polacsek, 'Updates, actions, and planning', in IJCAI, (2001).
[11] A. Herzig and O. Rifi, 'Propositional belief base update and minimal change', AI, 115, 107–138, (1999).
[12] J. Hoffmann. Towards efficient belief update for planning-based web service composition, 2008. Available at http://members.deri.at/joergh/papers/tr-ecai08.pdf.
[13] J. Hoffmann and R. Brafman, 'Conformant planning via heuristic forward search: A new approach', AI, 170(6–7), 507–541, (2006).
[14] H. Liu, C. Lutz, M. Milicic, and F. Wolter, 'Updating description logic ABoxes', in KR, (2006).
[15] C. Lutz and U. Sattler, 'A proposal for describing services with DLs', in DL, (2002).
[16] N. McCain and H. Turner, 'A causal theory of ramifications and qualifications', in IJCAI, (1995).
[17] D. E. Smith and D. Weld, 'Conformant Graphplan', in AAAI, (1998).
[18] M. Winslett, 'Reasoning about actions using a possible models approach', in AAAI, (1988).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-563
Genetic Optimization of the Multi-Location Transshipment Problem with Limited Storage Capacity Nabil Belgasmi and Lamjed Ben Saïd and Khaled Ghédira1 Abstract. Lateral Transshipments afford a valuable mechanism for compensating unmet demands using only on-hand inventory. In this paper, we investigate the case where locations have a limited storage capacity. The problem is to determine how much to replenish each period to minimize the expected global cost while satisfying storage capacity constraints. We propose a Real-Coded Genetic Algorithm (RCGA) with a new crossover operator to approximate the optimal solution. We analyze the impact of different structures of storage capacities on the system behaviour. We find that Transshipments are able to correct the discrepancies between the constrained and the unconstrained locations while keeping costs and system-wide inventories low. Our genetic algorithm proves its ability to solve instances of the problem with high accuracy.
1 INTRODUCTION
Practical optimization problems, especially supply chain optimization problems, usually have a complex structure. The same holds in many transport- or production-related fields [1]. Physical pooling of inventories has been widely used in practice to reduce cost and improve customer service [2]. Transshipments are recognized as the monitored movement of material among locations at the same echelon. They afford a valuable mechanism for correcting the discrepancies between the locations' observed demand and their on-hand inventory. Subsequently, Transshipments may reduce costs and improve service without increasing the system-wide inventories. The study of multi-location models with Transshipments is an important contribution for mathematical inventory theory as well as for inventory practice. The idea of lateral Transshipments is not new: the first study dates back to the sixties. The two-location-one-period case with linear cost functions was considered by [3]. [4] studied an N-location-one-period model where the cost parameters are the same for all locations. [5] incorporated non-negligible replenishment lead times and Transshipment lead times among stocking locations into the multi-location model. The effect of lateral Transshipment on the service levels in a two-location-one-period model was studied by [6]. The common problem tackled by these models is the determination of the optimal replenishment decision which minimizes the aggregate cost of the system. Most of the studies lead to optimal solutions since they investigate simple models easily solved by mathematical techniques (see [4], [7]). However, an optimal replenishment decision for a general multi-location inventory system cannot be computed in an analytical way. Few operational research methods were applied to find near-optimal solutions. The gradient-based IPA method was successfully used for both capacitated Transshipment and production problems [8]. The use of IPA to solve real-world problems is not always possible since
1 Ecole Nationale des Sciences de l'Informatique. Email: khaled.ghedira@isg.rnu.tn
many conditions should be satisfied to ensure the unbiasedness of its estimator [9]. Evolutionary optimization may provide a powerful methodology for solving such complex problems without the need for prior knowledge about their analytical properties. The contribution of this paper is twofold. We first incorporate storage capacity constraints into the traditional Transshipment model, which leads to a better modelling of real-world situations. Second, we investigate the applicability of real-coded evolutionary algorithms to the optimization of inventory levels and costs. This provides insights to tackle other extensions of the basic Transshipment problem with evolutionary optimization methods. The remainder of this paper is organized as follows. In Section 2, we formulate the proposed Transshipment model. In Section 3, we present the main concepts of evolutionary optimization; we describe the new crossover operator and our evolutionary modelling of the problem. In Section 4, we show our experimental results. In Section 5, we state our concluding remarks.
2 THE PROBLEM

2.1 Model description
We consider the following real-life problem where we have n stores selling a single product. The stores may differ in their cost and demand parameters. The system inventory is reviewed periodically. At the beginning of the period and long before the demands realization, replenishments take place in store i to increase the stock level up to Si. The storage capacity of each location is limited to Smax,i. In other words, the replenishment quantities should not exceed Smax,i inventory units. This may be due to expensive fixed holding costs, or to the limited physical space of the stores. Thus, the inventory level of store i will always be less than or equal to min(Si, Smax,i). After the replenishment, the observed demands Di, which represent the only uncertain event in the period, are totally or partially satisfied depending on the on-hand inventory of local stores. However, some stores may run out of stock while others still have unsold goods. In such a situation, it is possible to move these goods from stores with surplus inventory to stores with still unmet demands. This is called lateral Transshipment within the same echelon level. It means that stores in some sense share the stocks. The set of stores holding inventory, I+, can be considered as temporary suppliers since they may provide other stores at the same echelon level with stock units. Let tij be the Transshipment cost of each unit sent by store i to satisfy a one-unit unmet demand at store j. In this paper, the Transshipment lead time is considered negligible. After the end of the Transshipment process, if store i still has a surplus inventory, it will be penalized by a per-unit holding cost of hi. If store j still has unmet demands, it will be penalized by a per-unit
shortage cost of pj. Fixed Transshipment costs are assumed to be negligible in our model. [2] proved that, in the absence of fixed costs, if Transshipments are made to compensate for an actual shortage and not to build up inventory at another store, there exists an optimal base stock policy S* among all possible stationary policies. To see the effect of fixed costs on a two-location model formulation, see [10]. The following notation is used in our model formulation:

n       Number of stores
Si      Order quantity for store i
S       Vector of order quantities, S = (S1, S2, ..., Sn) (decision variable)
Smax,i  Maximum storage capacity of store i
Smax    Vector of storage capacities, Smax = (Smax,1, Smax,2, ..., Smax,n)
Di      Demand realized at i
D       Vector of demands, D = (D1, D2, ..., Dn)
hi      Unit inventory holding cost at i
pj      Unit penalty cost for shortage at j
tij     Unit cost of Transshipment from i to j
Tij     Amount transshipped from i to j
I+      Set of stores with surplus inventory (before Transshipment)
I−      Set of stores with unmet demands (before Transshipment)

2.2 Modelling assumptions

Several assumptions are made in this study to avoid pathological cases.

Assumption 1 (Transshipment policy): The Transshipment policy is stationary, that is, the Transshipment quantities are independent of the period in which they are made; they depend only on the available inventory after demand observation. In this study, we will employ a Transshipment policy known as complete pooling. This Transshipment policy is described as follows [11]: "the amount transshipped from one location to another will be the minimum between (a) the surplus inventory of the sending location and (b) the shortage inventory at the receiving location". The optimality of the complete pooling policy is ensured under some reasonable assumptions [6].

Assumption 2 (Lead time): Transshipment lead times are negligible. At the end of every period, optimal Transshipment quantities are computed. We assume that they are immediately shipped to their destination without making customers wait for a long time.

Assumption 3 (Replenishment policy): At the beginning of every period, replenishments take place to increase the inventory position of store i up to min(Si, Smax,i), taking into account the remaining inventory of the previous period. The optimality of the order-up-to policy in the absence of fixed costs is proven in [2].

2.3 Model formulation

Cost function: Since inventory choices in each store are centrally coordinated, it would be a common interest among the stores to minimize the aggregate cost. At the end of the period, the system cost is given by:

C(S, D) = Σ_{i∈I+} hi (Si − Di) + Σ_{j∈I−} pj (Dj − Sj) − K(S, D)    (1)

The first and the second term on the right hand side of (1) can be respectively recognized as the total holding cost and shortage cost before the Transshipment. The third term is recognized as the aggregate Transshipment profit, since every unit shipped from i to j decreases the holding cost at i by hi and the shortage cost at j by pj, while the total cost is increased by tij because of the Transshipment cost. Due to the complete pooling policy, the optimal Transshipment quantities Tij can be determined by solving the following linear programming problem:

K(S, D) = max_{Tij} Σ_{i∈I+} Σ_{j∈I−} (hi + pj − tij) Tij    (2)

Subject to

Σ_{j∈I−} Tij ≤ Si − Di,  ∀i ∈ I+    (3)
Σ_{i∈I+} Tij ≤ Dj − Sj,  ∀j ∈ I−    (4)
Tij ≥ 0    (5)
In (2), problem K can be recognized as the maximum aggregate income due to the Transshipment. Tij denotes the optimal quantity that should be shipped from i to fill unmet demands at j. Constraints (3) and (4) say that the shipped quantities cannot exceed the available quantities at store i and the unmet demand at store j. Since demand is stochastic, the aggregate cost function is built as a stochastic programming model, which is formulated in (6). The objective is to minimize the expected aggregate cost per period.

min_S E[C(S, D)] = min_S E[ Σ_{i∈I+} hi (Si − Di) + Σ_{j∈I−} pj (Dj − Sj) − K(S, D) ]    (6)

Subject to

Si ≤ Smax,i,  i = 1...n    (7)
where the first two terms denote the expected cost before the Transshipment, called the Newsvendor2 cost, and the third term denotes the expected aggregate income due to the Transshipment. This proves the important relationship between the newsvendor and the Transshipment problem. By setting very high Transshipment costs, i.e. tij > hi + pj, no Transshipments will occur; problem K will then return zero. Thus, our model can deal with both the Transshipment and the newsvendor case.

2 The newsvendor model is the basis of most existing Transshipment literature. It addresses the case where Transshipments are not allowed.

Cost function properties: The cost function is stochastic because of the demand randomness modelled by the continuous random variables Di with known joint distributions. Thus we must compute the expected value of the cost function. An analytically tractable expression for problem K given in (2) exists only in the case of a generalized two-location problem or N locations with identical cost structures [4]. In both cases, the open linear programming problem K has an analytical solution. But in the general case (many locations with different cost structures), we can use any linear programming method to solve problem K. In this study, we used the Simplex Method. The mentioned properties of our problem are sufficient to conclude that it is not possible to compute the exact expected values of the stochastic function given in (6). The most common method to deal with noise or randomness is re-sampling, or re-evaluation of objective values [12]. With the re-sampling method, if we evaluate a solution S for N times, the estimated objective value is obtained as in equation (8) and the noise is reduced by a factor of √N. For this purpose, draw N random scenarios D1, ..., DN independently from each other (in our problem, a scenario Dk is equivalent to a demand vector Dk = (Dk1, ..., Dkn)). A sample estimate of f(S), noted f̄(S) ≈ E(f(S, D)), is given by

f̄(S) = (1/N) Σ_{k=1}^{N} f(S, Dk),  with  Var[f̄(S)] = Var[f(S, D)] / N    (8)
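To make (1)-(8) concrete, here is a minimal Python sketch (our own, not the authors' implementation) that estimates the expected cost of a decision S by sampling demand scenarios and solving problem K with scipy's linprog; any LP solver can stand in for the Simplex Method here:

import numpy as np
from scipy.optimize import linprog

def K(surplus, shortage, h, p, t):
    # Problem K, (2)-(5), for one scenario; surplus[i] = max(Si - Di, 0).
    I_plus = [i for i, v in enumerate(surplus) if v > 0]
    I_minus = [j for j, v in enumerate(shortage) if v > 0]
    if not I_plus or not I_minus:
        return 0.0
    pairs = [(i, j) for i in I_plus for j in I_minus]   # variables T_ij
    c = [-(h[i] + p[j] - t[i][j]) for i, j in pairs]    # maximize => negate
    A_ub, b_ub = [], []
    for i in I_plus:                                    # constraint (3)
        A_ub.append([1.0 if ii == i else 0.0 for ii, jj in pairs])
        b_ub.append(surplus[i])
    for j in I_minus:                                   # constraint (4)
        A_ub.append([1.0 if jj == j else 0.0 for ii, jj in pairs])
        b_ub.append(shortage[j])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))  # (5): T_ij >= 0
    return -res.fun

def sampled_cost(S, h, p, t, draw_D, N):
    # Sample estimate (8) of the expected aggregate cost (6).
    total = 0.0
    for _ in range(N):
        D = draw_D()
        surplus = [max(s - d, 0.0) for s, d in zip(S, D)]
        shortage = [max(d - s, 0.0) for s, d in zip(S, D)]
        total += (sum(hi * v for hi, v in zip(h, surplus))
                  + sum(pj * v for pj, v in zip(p, shortage))
                  - K(surplus, shortage, h, p, t))
    return total / N

# Two locations with h = $1, p = $4, t = $0.5 and D ~ N(100, 20):
rng = np.random.default_rng(0)
print(sampled_cost([110, 110], [1, 1], [4, 4], [[0, 0.5], [0.5, 0]],
                   lambda: rng.normal(100, 20, size=2), N=200))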
3 EVOLUTIONARY OPTIMIZATION

3.1 Main concepts

We refer to evolutionary algorithms as methods that handle a population of solutions, iteratively evolve the population by applying phases of self-adaptation and co-operation, and employ a coded representation of the solutions. The evolutionary algorithms most suitable for solving optimization problems in continuous domains are Evolution Strategies (ES) [13], Genetic Algorithms (GA) [14] with real coding, and evolutionary programming [15]. GAs are a search methodology invented by Holland [15], inspired by natural genetic theory. They are regarded as methods that are suited for exploring large solution spaces, and they are a very effective method for solving real-world problems; the reason for this success is their simplicity and performance. The main idea of this technique is to generate diverse chromosomes and select the most appropriate ones to continue. We have an initial population of chromosomes which are produced randomly or by a particular scheme. Then, iteratively, we generate new generations of the population out of the previous ones using mutation, crossover and selection. Mutation is designed to generate a new chromosome out of an existing one by randomly changing it. In crossover, two existing chromosomes are combined to generate new chromosomes. Selection ensures the formation of the new population from the previous population. By applying the mentioned operations, the average fitness of the population will tend to increase over the algorithm lifetime. In many practical problems, chromosomes are coded as real numbers. We call a GA working with real parameters in its chromosome an RCGA (Real-Coded Genetic Algorithm). The general structure of a GA is:

Genetic algorithm
Begin
  t := 0
  Initialize P(t)
  Evaluate P(t)
  while (not Stop-criterion) do
    t := t + 1
    Select P(t) from P(t-1)
    Crossover P(t)
    Mutate P(t)
    Evaluate P(t)
  End-While
End.

where t is the current generation and P(t) is the current population.

3.2 Solution methodology

In our study, a real-coded GA is used to search for optimal replenishment decisions S*, with respect to the storage capacity constraints. In this section, we describe our evolutionary modelling of the constrained multi-location Transshipment problem.

Structure of the individual and population size: Each individual consists of a vector of n genes. It encodes a replenishment decision S. A gene is a positive real parameter representing an order quantity Si. It is easy to see that a population represents a set of replenishment decisions that moves toward regions of the search space that have better fitness values (lower costs). The population size is less than 30 individuals.
Fitness evaluation: With respect to the re-sampling method given in (8), we should evaluate each individual N times in order to compute its fitness value. However, this may lead to individuals with different variances, which makes the selection of good individuals inaccurate. Thus, in order to get a population with a common estimation error rate ER, we repeat the evaluation of each individual until its error estimation rate is less than ER. We define the error estimation rate as the ratio of the estimated standard deviation to the estimated mean of the sampled function at the given design S:
ER(S) = σ̂(S) / f̄(S)    (9)
Recall that ER(S) is null when the estimated standard deviation is null, which is the case when the sample size is very large (cf. (9)). Using the ER measure facilitates the supervision of the accuracy of explored regions of the search space, since neither the standard deviation nor the expected cost is known in advance. We will use ER values varying between 0.01 and 2.

Initialization: In most search algorithms, the initialization method is very important. We have opted for two initialization procedures. The first consists of generating uniformly distributed values for each gene within the domain [0, min(Si, Smax,i)]. The second consists of analytically solving the newsvendor version of our problem; we then initialize each gene with a random value close to the computed optimal solution, with respect to the storage capacities.
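A minimal sketch of the ER-controlled evaluation follows (our reading of (9), taking the standard deviation of the mean estimate; the batch size and evaluation cap are assumptions):

def evaluate_with_er(S, sample_cost, er_target=0.01, batch=50, max_evals=500000):
    # Re-evaluate S until the estimated std. dev. of the mean estimate,
    # divided by the mean itself, drops below er_target (cf. (9)).
    vals = []
    while True:
        vals += [sample_cost(S) for _ in range(batch)]
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / (n - 1)
        se = (var / n) ** 0.5
        if (mean != 0 and se / abs(mean) <= er_target) or n >= max_evals:
            return mean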
Crossover: Mating is performed using crossover to combine genes from different parents to produce new children. We have chosen binary tournament selection to pick out parents for reproduction: tournament selection runs a tournament between two randomly chosen individuals and selects the winner (the individual with the best fitness value). Many crossover techniques have been studied in evolutionary optimization. We tested 3 existing crossover operators. Let A and B be two selected parents, and a a real number uniformly generated between 0 and 1.
Single-point crossover: the chromosomes of the parents are cut at a randomly chosen point and the resulting fragments are swapped.
Uniform crossover: each gene of the offspring X is selected randomly from the corresponding genes of the parents.
Convex crossover: offspring X = a.A + (1−a).B.
Moreover, we propose a new crossover operator called Gradient-descent crossover (GRD-Crossover), since it creates an offspring following a quasi-descent direction. The first new offspring X is obtained by applying a convex crossover (X is inside the segment [AB]). The second offspring Y depends on the fitness values of the parents. Let CA and CB be the fitness values of A and B, and assume that CB ≥ CA. We can suppose that if Y lies in the direction of the path linking solution B to A, then it may be better than its parents. More properly, X and Y are created as below:

X = a.A + (1 − a).B
Y = B − β.(B − A)

where a is a real number uniformly generated between 0 and 1, and β is a uniform random variable that has the same sign as (CB − CA). We implemented all these crossovers and show that the GRD-Crossover performs well in terms of convergence and accuracy.

Mutation: Mutation is realized by adding to each gene Si a normally distributed random number centred on 0. This operator alters genes of the selected individuals with a given mutation probability. Because we are dealing with real-valued definition domains (e.g. [0, min(Si, Smax,i)]), all offspring genes that are out of their domains are scaled down as follows: Si := min(Si, Smax,i).

Selection: After evaluating the fitness of each individual, we must select the fittest ones to reproduce and form the population of the next generation. In our case, the best individuals represent the set of replenishment decisions {S*} that ensure low aggregate costs. Many selection methods have been studied and used for solving problems. We have chosen a deterministic selection procedure which consists of sorting the individuals and copying the best 10% of them to the mating pool. This protects the best individuals and lets them survive until the birth of stronger offspring.
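A compact Python sketch of the GRD-Crossover and the mutation operator as we read them (β's magnitude distribution and the mutation spread sigma are assumptions):

import random

def grd_crossover(A, B, cA, cB):
    # X is a convex combination of the parents; Y steps from B toward, and
    # possibly past, the better parent, i.e. along a quasi-descent direction.
    a = random.random()
    X = [a * x + (1 - a) * y for x, y in zip(A, B)]
    beta = random.random()                  # magnitude: assumed uniform in (0, 1)
    if cB < cA:                             # sign of beta follows (CB - CA)
        beta = -beta
    Y = [y - beta * (y - x) for x, y in zip(A, B)]
    return X, Y

def mutate(S, S_max, sigma=5.0, p_mut=0.15):
    # Gaussian mutation; genes leaving [0, Smax_i] are clamped as in the text.
    out = []
    for s, smx in zip(S, S_max):
        if random.random() < p_mut:
            s = s + random.gauss(0.0, sigma)
        out.append(min(max(s, 0.0), smx))
    return out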
4 OPTIMIZATION RESULTS
In this section, we report on our numerical study. We first analyze the shape of the constrained cost function for a given system setting, and illustrate the spread of the individuals in the first and the tenth generations of the GA. We compare our GRD-Crossover with other crossovers and show its ability to perform well and to provide near-optimal solutions. Finally, we analyze the impact of the incorporation of storage capacity in the basic Transshipment model.
4.1 Case study
Our first exemplary inventory model consists of 2 locations with the following parameters: hi = $1, pi = $4, tij = $0.5 and Di ~ N(100, 20). Location (2) has no storage capacity constraint (Smax,2 = ∞). However, location (1)'s storage capacity is limited to Smax,1 = 80. We generated 30,000 samples of the cost function with a fixed error rate ER = 1%. The average number of evaluations is 450,000. Obviously, an individual consists of 2 genes only, one per location. The evolutionary optimization process was started with the following parameters:
Population size = 30
Number of generations = 40
Crossover rate = 85%
Mutation rate = 15%
Error rate = 1%
4.2 Experimental design
To show the flexibility of our model, we have studied a 4-location Transshipment system with 7 storage capacity settings. In all designs, holding costs are equal to $1, shortage costs are equal to $4, Transshipment costs are equal to $0.5 and demands are normally distributed: N(100, 20). Table 1 summarizes the designs' characteristics.

Table 1. Designs characteristics (storage capacity of location 1)
Sys     C-0  C-1  C-2  C-3  C-4  C-5  C-6  C-7
Smax,1  ∞    ∞    100  80   60   40   20   0
In system C-0, no material movement is allowed among locations. It represents 4 independent newsvendor problems. System C-1 refers to the basic Transshipment problem with no storage limits. In systems C-2 to C-7, only location (1) faces different storage constraints; all the other locations have no such storage constraints. System-wide inventories considerably decrease in comparison to the independent newsvendors system. Figure 1 also reveals an important property of multi-location systems with storage capacity constraints, namely the ability of the locations to face heavy storage constraints (Smax,1 = 0). Solidarity and cooperation of some system locations significantly fix the aggregate cost. When analyzing the optimal costs of all settings, we remark that whatever the hardness of the storage capacity (varying from ∞ to 0), costs and system-wide inventories in the systems where Transshipments are allowed (C-1 to C-7) are lower than in the newsvendor case.

Figure 1. Cost under different systems (total cost and total inventory for settings C-0 through C-7, with Transshipments (TR) and for the newsvendor case (NB)).
4.3 Validation with a benchmark
We validate our RCGA using an illustrative example from [4] where optimal solutions are available. Recall that the system consists of 4 locations having identical cost structures, with a holding cost of $1 per unit, a shortage cost of $4 per unit, and a Transshipment cost of $0.10 per unit. There are no storage capacity constraints. Thus, our purpose is to compare the solution given by our RCGA using different crossovers to the optimal solution computed analytically. This can be done by setting infinite storage capacity limits (Smax,i = ∞). In Figure 2, we find that GRD-Crossover is better than all the other experimented crossovers. It has an important role in fine-tuning the individuals in the last generations. It performs better than the Convex crossover even though it is partially based on a convex exploitation of the selected parents. Figure 3 shows that the best solutions given by the RCGA have a large variance (94 < S1 < 130, 209 < S2 < 274, 155 < S3 < 198 and 148 < S4 < 218), whereas the resulting costs are approximately equal (C = {113.51, 113.90, 113.80, 115.42}). Recall that the optimal solution is S* = (109, 222.5, 163.5, 192.5) with a minimal cost of C* = 113.49. This leads to the conclusion that the approximation of the optimal cost value with our RCGA is satisfactory even though the approximation of the optimal order quantities has a great variance.
Figure 2. Best fitness of the last-generation individuals under multiple crossovers (x-axis: individuals 1-15; y-axis: fitness; series: GRD-X, CVX-X, UNIFORM-X, 1-POINT-X, OPT-SOL).
Figure 3. Optimal and near-optimal solutions under multiple crossovers (x-axis: S1-S4; y-axis: inventory level; series: GRD-X, CVX-X, UNIFORM-X, 1-POINT-X, OPT.SOL).
5 CONCLUSION
In this paper, we considered a multi-location Transshipment model with limited storage capacity. The objective is to minimize the aggregate cost function where decision variables are the
constrained order-up-to quantities. We modelled the optimal redistribution of inventory in an arbitrary period as a linear programming problem based on the complete pooling policy. We employed a real-coded GA to solve the problem. A new crossover operator based on a simple approximation of the gradient descent was proposed and tested on multiple problem instances. Experiments showed that it outperforms many existing crossovers. An interesting conclusion is that Transshipments offer an important flexibility to systems that face restrictive storage capacity limits. The observed results confirm the success of evolutionary algorithms in solving inventory problems. Future studies will concentrate on two directions: the multi-objective optimization of multi-location systems with storage capacity, where costs, lead times and service level should be optimized; and the improvement of real-coded evolutionary algorithms by incorporating effective search and sensitivity estimation techniques in crossover or mutation operators.
REFERENCES
[1] J. Arnold and P. Köchel, Evolutionary Optimization of the Multi-location Inventory Model with Lateral Transshipments, 1997.
[2] Y. T. Herer, M. Tzur, and E. Yücesan, 'The multi-location Transshipment problem', forthcoming in IIE Transactions, 2005.
[3] S. P. Aggarwal, 'Inventory control aspect in warehouses', Symposium on Operations Research, Indian National Science Academy, New Delhi, 1967.
[4] K. S. Krishnan and V. R. K. Rao, 'Inventory control in N warehouses', Journal of Industrial Engineering, Vol. 16, No. 3, pp. 212–215, 1965.
[5] H. Jonsson and E. A. Silver, 'Analysis of a Two-Echelon Inventory Control System With Complete Redistribution', Management Science 33, 215–227, 1987.
[6] G. Tagaras, 'Effects of pooling on the optimization and service levels of two-location inventory systems', IIE Trans., Vol. 21, No. 3, pp. 250–257, 1989.
[7] N. Rudi, S. Kapur, and D. Pyke, 'A Two-Location Inventory Model with Transshipment and Local Decision Making', 1998.
[8] D. Özdemir, E. Yücesan, and Y. T. Herer, 'Multi-Location Transshipment Problem with Capacitated Transportation', Technology Management Area, INSEAD, Proceedings of the 2003 Winter Simulation Conference, 2003.
[9] P. Glasserman, Gradient Estimation via Perturbation Analysis, Kluwer Academic Publishers, Hingham, 1991.
[10] Y. Herer and A. Rashit, 'Lateral Stock Transshipments in a Two-location Inventory System with Fixed Replenishment Costs', Department of Industrial Engineering, Tel Aviv University, 1999a.
[11] Y. Herer and A. Rashit, 'Policies in a general two-location infinite horizon inventory system with lateral stock Transshipments', Department of Industrial Engineering, Tel Aviv University, 1999b.
[12] H.-G. Beyer, 'Evolutionary algorithms in noisy environments: Theoretical issues and guidelines for practice', Computer Methods in Applied Mechanics and Engineering, 186(2–4), 239–267, 2000.
[13] I. Rechenberg, 'Evolution Strategy', in Zurada et al., 1994, pp. 147–159.
[14] D. B. Fogel, Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, IEEE Press, Piscataway, New Jersey, 1995.
[15] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-568
Regression for Classical and Nondeterministic Planning Jussi Rintanen NICTA & the Australian National University Canberra, Australia Abstract. Many forms of reasoning about actions and planning can be reduced to regression, the computation of the weakest precondition a state has to satisfy to guarantee the satisfaction of another condition in the successor state. In this work we formalize a general syntactic regression operation for ground PDDL operators, show its correctness, and define a composition operation based on regression. As applications we present a very simple yet powerful algorithm for computing invariants, as well as a generalization of the hn heuristic of Haslum and Geffner to PDDL.
1 Introduction
Although it is well known that the expressivity of PDDL [13] is required for the efficient modeling of many planning problems [14], most planner implementations still restrict themselves to the STRIPS language, in which action preconditions are conjunctions of (positive) literals and all effects are unconditional. Anecdotal evidence suggests that this is due to the difficulty of reasoning about actions more general than STRIPS. PDDL can often be efficiently reduced to STRIPS, but certain classes of operators that have disjunctive preconditions or several conditional effects with logically independent antecedents lead to an exponential number of STRIPS operators. Furthermore, reduction to STRIPS is impossible for many generalizations of classical planning: in the presence of partial observability, splitting one operator into several is in general incorrect, because at execution time it may not be possible to choose which operator to execute. This provides a strong motivation for the generalization of STRIPS-based algorithms and other planning techniques to more general languages such as PDDL. Our work defines a regression operation for ground PDDL operators and demonstrates its applications to planning. Pednault [15] defines regression for his ADL class of operators, but his definition skips over the concrete syntax of what is today known as ADL/PDDL. The key component of the regression operation we define for ground PDDL is Definition 3, which maps a PDDL operator and a state variable to formulae describing the conditions under which the variable becomes true and false. The basis of regression operations is the substitution of a variable by an expression that describes its new value. This was used in the assignment axioms of the Hoare calculus [9] and later by Dijkstra for computing weakest preconditions [4]. The structure of the paper is as follows. Section 2 defines the classical planning problem for ground PDDL, the regression operation and the composition operation, and discusses their formal properties. Section 3 gives applications to invariants and heuristics. Section 4 defines regression for nondeterministic operators, Section 5 discusses related work, and Section 6 concludes the paper.
2 Definitions
Definition 1 Let A be a set of state variables. An operator is a pair ⟨p, e⟩ where p is a propositional formula over A describing the precondition, and e is an effect, defined recursively as follows.
1. a and ¬a for state variables a ∈ A are effects.
2. e₁ ∧ ··· ∧ eₙ is an effect if e₁, . . . , eₙ are effects.
3. c ▷ e is an effect if c is a formula and e is an effect.
The meaning of a conditional effect c ▷ e is that the effect e takes place if the condition c is true.

Definition 2 (Execution) Let ⟨p, e⟩ be an operator over A. Let s : A → {0, 1} be a state. The operator is executable in s if s |= p and the set [e]_s is consistent. This set is recursively defined as follows.
1. [a]_s = {a} and [¬a]_s = {¬a} for a ∈ A.
2. [e₁ ∧ ··· ∧ eₙ]_s = ⋃_{i=1}^{n} [eᵢ]_s.
3. [c ▷ e]_s = [e]_s if s |= c and [c ▷ e]_s = ∅ otherwise.
An operator ⟨p, e⟩ induces a partial function R(⟨p, e⟩) on states: states s and s′ are related by R(⟨p, e⟩) if s |= p and s′ is obtained from s by making the literals in [e]_s true and retaining the truth-values of state variables not occurring in [e]_s. Define exc_o(s) = s′ by s R(o) s′, and exc_{o₁;...;oₙ}(s) = exc_{oₙ}(. . . exc_{o₁}(s) . . .).

The main application of regression is in backward search, in which the basic step, computing a formula that represents the predecessor states (the new subgoal), is regression. The key component of regression for PDDL-style operators is given next.

Definition 3 We recursively define the condition E_l(e) of literal l being made true by an operator with the effect e as follows.
E_l(l) = ⊤
E_l(l′) = ⊥ when l ≠ l′ (for literals l′)
E_l(e₁ ∧ ··· ∧ eₙ) = E_l(e₁) ∨ ··· ∨ E_l(eₙ)
E_l(c ▷ e) = c ∧ E_l(e)
The symbols ⊤ and ⊥ denote true and false, respectively. The case E_l(e₁ ∧ ··· ∧ eₙ) = E_l(e₁) ∨ ··· ∨ E_l(eₙ) is defined as a disjunction because it is sufficient that at least one effect makes l true.

Definition 4 Let A be the set of state variables. We define the condition E_l(o) of operator o = ⟨p, e⟩ being executable so that literal l is made true as p ∧ E_l(e) ∧ ⋀_{a∈A} ¬(E_a(e) ∧ E_¬a(e)). The third conjunct in the formula requires that no state variable is made both true and false.

The formula E_l(e) indicates in which states the literal l is made true by e. It is closely related to [e]_s.
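To make Definitions 2 and 3 concrete, here is a minimal executable sketch (our illustration, not code from the paper). The tuple encoding of formulas and effects is an assumption made for the example; by Lemma 5 below, the set [e]_s can then be read off by evaluating E_l(e) in s.

```python
# Formulas: ("true",), ("false",), ("atom", x), ("not", f), ("and", f, g), ("or", f, g)
# Effects:  ("lit", x, sign) for the literals x / not-x,
#           ("conj", [e1, ..., en]) for e1 and ... and en,
#           ("when", c, e) for a conditional effect c |> e
TRUE, FALSE = ("true",), ("false",)

def E(lit, eff):
    """Definition 3: a formula true in s iff effect eff makes literal lit true."""
    if eff[0] == "lit":
        return TRUE if (eff[1], eff[2]) == lit else FALSE   # E_l(l)=T, E_l(l')=F
    if eff[0] == "conj":                                    # disjunction over conjuncts
        out = FALSE
        for e in eff[1]:
            out = ("or", out, E(lit, e))
        return out
    if eff[0] == "when":                                    # E_l(c |> e) = c and E_l(e)
        return ("and", eff[1], E(lit, eff[2]))
    raise ValueError(eff[0])

def holds(f, s):
    """Evaluate a formula in state s, given as the set of true atoms."""
    k = f[0]
    if k == "true":  return True
    if k == "false": return False
    if k == "atom":  return f[1] in s
    if k == "not":   return not holds(f[1], s)
    if k == "and":   return holds(f[1], s) and holds(f[2], s)
    return holds(f[1], s) or holds(f[2], s)                 # "or"

def active_literals(eff, s, atoms):
    """[e]_s via Lemma 5: literal l is in [e]_s iff s |= E_l(e)."""
    return {(a, v) for a in atoms for v in (True, False) if holds(E((a, v), eff), s)}
```

For example, with eff = ("when", ("atom", "b"), ("lit", "a", True)) and s = {"b"}, active_literals(eff, s, {"a", "b"}) yields {("a", True)}, matching [b ▷ a]_s = {a}.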
Lemma 5 Let A be the set of state variables, s a state on A, l a literal on A, and o an operator with effect e. Then
1. l ∈ [e]_s if and only if s |= E_l(e), and
2. exc_o(s) is defined and l ∈ [e]_s if and only if s |= E_l(o).

The formula E_a(e) ∨ (a ∧ ¬E_¬a(e)) expresses the truth of a ∈ A after the execution of e in terms of truth-values of state variables before the execution: either a becomes true, or a is true before and does not become false.

Lemma 6 Let a ∈ A be a state variable, o = ⟨p, e⟩ ∈ O an operator, and s and s′ = exc_o(s) states. Then s |= E_a(e) ∨ (a ∧ ¬E_¬a(e)) if and only if s′ |= a.

Definition 7 (Regression) Let φ be a propositional formula and o = ⟨p, e⟩ an operator. The regression of φ with respect to o is rg_o(φ) = φ_r ∧ p ∧ χ, where χ = ⋀_{a∈A} ¬(E_a(e) ∧ E_¬a(e)) and φ_r is obtained from φ by replacing every a ∈ A by E_a(e) ∨ (a ∧ ¬E_¬a(e)). Define rg_e(φ) = φ_r ∧ χ and rg_{o₁;...;oₙ}(φ) = rg_{o₁}(··· rg_{oₙ}(φ) ···).

The formula χ corresponds to the requirement that [e]_s is consistent for an operator to be executable. The reason why regression is useful is that it allows computing the predecessor states by simple formula manipulation. Next we formalize this important property of regression.

Theorem 8 Let φ be a formula over A, o an operator over A, and S the set of all states, i.e. valuations of A. Then {s ∈ S | s |= rg_o(φ)} = {s ∈ S | exc_o(s) |= φ}.

Proof: We show that for any state s, s |= rg_o(φ) if and only if exc_o(s) is defined and exc_o(s) |= φ. By definition rg_o(φ) = φ_r ∧ p ∧ χ for o = ⟨p, e⟩, where φ_r is obtained from φ by replacing each a ∈ A by E_a(e) ∨ (a ∧ ¬E_¬a(e)) and χ = ⋀_{a∈A} ¬(E_a(e) ∧ E_¬a(e)).
First we show that s |= p ∧ χ if and only if exc_o(s) is defined.
s |= p ∧ χ
iff s |= p and {a, ¬a} ⊈ [e]_s for all a ∈ A
iff exc_o(s) is defined
The two equivalences are respectively by Lemma 5 and Definition 2.
Then we show that s |= φ_r if and only if exc_o(s) |= φ. This is by structural induction over subformulae φ′ of φ and formulae φ′_r obtained from φ′ by replacing a ∈ A by E_a(e) ∨ (a ∧ ¬E_¬a(e)).
Induction hypothesis: s |= φ′_r if and only if exc_o(s) |= φ′.
Base case 1, φ′ = ⊤: Now φ′_r = ⊤ and both are true in the respective states.
Base case 2, φ′ = ⊥: Now φ′_r = ⊥ and both are false in the respective states.
Base case 3, φ′ = a for some a ∈ A: Now φ′_r = E_a(e) ∨ (a ∧ ¬E_¬a(e)). By Lemma 6, s |= φ′_r if and only if exc_o(s) |= φ′.
Inductive case 1, φ′ = ¬θ: By the induction hypothesis s |= θ_r iff exc_o(s) |= θ. Hence s |= φ′_r iff exc_o(s) |= φ′ by the truth-definition of ¬.
Inductive case 2, φ′ = θ ∨ θ′: By the induction hypothesis s |= θ_r iff exc_o(s) |= θ, and s |= θ′_r iff exc_o(s) |= θ′. Hence s |= φ′_r iff exc_o(s) |= φ′ by the truth-definition of ∨.
Inductive case 3 for φ′ = θ ∧ θ′ goes like the previous case.
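Regression per Definition 7 is then a straightforward formula substitution. The sketch below builds on the tuple encoding and the function E from the previous snippet and is again only our illustration, not the paper's code:

```python
def substitute(f, repl):
    """Replace every ("atom", a) occurring in f by the formula repl[a]."""
    k = f[0]
    if k == "atom":
        return repl.get(f[1], f)
    if k in ("true", "false"):
        return f
    if k == "not":
        return ("not", substitute(f[1], repl))
    return (k, substitute(f[1], repl), substitute(f[2], repl))

def regress(phi, op, atoms):
    """rg_o(phi) = phi_r and p and chi (Definition 7) for o = (p, e)."""
    p, eff = op
    # each atom a is replaced by E_a(e) or (a and not E_not_a(e))
    repl = {a: ("or", E((a, True), eff),
                ("and", ("atom", a), ("not", E((a, False), eff))))
            for a in atoms}
    chi = TRUE
    for a in atoms:  # chi: no variable is made both true and false
        chi = ("and", chi, ("not", ("and", E((a, True), eff), E((a, False), eff))))
    return ("and", ("and", substitute(phi, repl), p), chi)
```

By Theorem 8, holds(regress(phi, op, atoms), s) is true exactly for the states s whose successor under op satisfies phi.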
It may appear that for n consecutive regression steps the size of the formula grows exponentially, as each variable occurrence may be replaced by a bigger formula containing several variables. However, if the formula is represented in circuit form instead of as a tree-like formula, each variable occurs at most once. Hence a sequence of regression steps only leads to a worst-case polynomial increase in size. The circuits can often be simplified to keep them small, and in special cases, like STRIPS operators, there is a constant upper bound on the size of the formulae/circuits.

In addition to being the basis of backward search, regression has many other applications in reasoning about sequences of actions. Central questions concern the relation between a given action and a given sequence of actions: whether they are executable in exactly the same states and whether they have the same effects. This is the basis of computing macro-actions [10] and the elimination of redundant actions [8]. Answering this question requires the composition of a sequence of two or more operators. The composition o₁ ∘ o₂ of o₁ = ⟨p₁, e₁⟩ and o₂ = ⟨p₂, e₂⟩ is an operator that behaves like applying o₁ followed by o₂. For a to be true after o₂ we can regress a with respect to o₂, obtaining E_a(e₂) ∨ (a ∧ ¬E_¬a(e₂)). The condition for this formula to be true after o₁ is obtained by regressing with e₁, leading to

rg_{e₁}(E_a(e₂) ∨ (a ∧ ¬E_¬a(e₂)))
= rg_{e₁}(E_a(e₂)) ∨ (rg_{e₁}(a) ∧ ¬rg_{e₁}(E_¬a(e₂)))
= rg_{e₁}(E_a(e₂)) ∨ ((E_a(e₁) ∨ (a ∧ ¬E_¬a(e₁))) ∧ ¬rg_{e₁}(E_¬a(e₂))).

Since we want to define an effect φ ▷ a of o₁ ∘ o₂ so that a becomes true whenever o₁ followed by o₂ would make it true, the formula φ does not have to represent the case in which a is true already before the execution of o₁ ∘ o₂. Hence we can simplify the above formula to rg_{e₁}(E_a(e₂)) ∨ (E_a(e₁) ∧ ¬rg_{e₁}(E_¬a(e₂))). An analogous formula is needed for making ¬a true. This leads to the following definition.

Definition 9 (Composition) Let o₁ = ⟨p₁, e₁⟩ and o₂ = ⟨p₂, e₂⟩ be two operators on A. Then their composition o₁ ∘ o₂ is defined as

⟨p, ⋀_{a∈A} (((rg_{e₁}(E_a(e₂)) ∨ (E_a(e₁) ∧ ¬rg_{e₁}(E_¬a(e₂)))) ▷ a) ∧ ((rg_{e₁}(E_¬a(e₂)) ∨ (E_¬a(e₁) ∧ ¬rg_{e₁}(E_a(e₂)))) ▷ ¬a))⟩

where p = rg_{o₁}(p₂) ∧ ⋀_{a∈A} ¬(E_a(e₁) ∧ E_¬a(e₁)).
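Definition 9 can be transcribed almost literally on top of the earlier sketches; this is again our illustration under the same assumed encoding. Note that rg_{o₁}(p₂) already conjoins p₁ and the consistency condition for e₁, so the composed precondition needs no extra conjunct in the code.

```python
def rg_e(phi, eff, atoms):
    """rg_e(phi) = phi_r and chi: regression through an effect alone
    (a trivially true precondition folds Definition 7 down to rg_e)."""
    return regress(phi, (TRUE, eff), atoms)

def compose(o1, o2, atoms):
    """Definition 9: an operator behaving like o1 followed by o2."""
    (p1, e1), (p2, e2) = o1, o2
    effects = []
    for a in atoms:
        for v in (True, False):
            # the literal (a, v) is made true iff o1;o2 would make it true
            made = ("or", rg_e(E((a, v), e2), e1, atoms),
                    ("and", E((a, v), e1),
                     ("not", rg_e(E((a, not v), e2), e1, atoms))))
            effects.append(("when", made, ("lit", a, v)))
    # regress(p2, o1, ...) = rg_o1(p2), which already includes p1 and chi(e1)
    return (regress(p2, o1, atoms), ("conj", effects))
```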
Example 10 Consider
o = ⟨⊤, (¬b₀ ▷ b₀) ∧ ((¬b₁ ∧ b₀) ▷ (b₁ ∧ ¬b₀)) ∧ ((¬b₂ ∧ b₁ ∧ b₀) ▷ (b₂ ∧ ¬b₁ ∧ ¬b₀))⟩,
which increments a 3-bit binary number by 1.¹ The composition of o with itself, representing increment by 2, is (after applying the De Morgan laws)
⟨⊤, ((((¬b₂ ∨ ¬b₁) ∧ b₀) ∨ (¬b₀ ∧ b₂ ∧ b₁)) ▷ b₀) ∧
(((¬b₀ ∧ b₁ ∧ ¬b₂) ∨ (((b₂ ∧ b₁) ∨ ¬b₀) ∧ ((¬b₁ ∨ (b₀ ∧ ¬b₂)) ∧ (¬b₀ ∨ b₁)))) ▷ ¬b₀) ∧
(((((b₂ ∧ b₁) ∨ ¬b₀) ∧ ((¬b₁ ∨ (b₀ ∧ ¬b₂)) ∧ (¬b₀ ∨ b₁))) ∨ (b₀ ∧ ¬b₁)) ▷ b₁) ∧
(((¬b₀ ∧ b₁ ∧ ¬b₂) ∨ (b₀ ∧ ¬b₂ ∧ b₁)) ▷ ¬b₁) ∧
(((¬b₀ ∧ b₁ ∧ ¬b₂) ∨ (b₀ ∧ ¬b₂ ∧ b₁)) ▷ b₂)⟩.
Further logical simplification, elimination of redundant conditional effects, and simplification of unnecessary conditions yields
⟨⊤, ((¬b₀ ∧ b₂ ∧ b₁) ▷ b₀) ∧ (¬b₁ ▷ b₁) ∧ ((¬b₂ ∧ b₁) ▷ ¬b₁) ∧ ((¬b₂ ∧ b₁) ▷ b₂)⟩.
Theorem 11 Let o₁ and o₂ be operators and s a state. Then exc_{o₁∘o₂}(s) is defined if and only if exc_{o₁;o₂}(s) is defined, and exc_{o₁∘o₂}(s) = exc_{o₁;o₂}(s).

¹ Notice that 111 is not incremented further.
3 Applications
3.1 Invariants
Very interestingly, the regression operation can be used as the main component of a powerful and intuitive algorithm for computing invariants. An invariant property of a planning problem is satisfied by every state that is reachable from the initial state(s). An equivalent inductive definition states that a property is invariant if the initial states satisfy it and every action preserves it. The main applications of invariants are planning by SAT and CSPs [11], in which invariants help to prune the search space, the validation of domain models, in which invariants give information about dependencies between state variables, inexpensive incomplete tests for unreachability, and the computation of heuristics.

We generalize the inductive algorithm [16] to general operators. The novelty is the extremely simple structure of the algorithm given the generality of the operator definition. The algorithm invariants(A, I, O, n) in Figure 1 computes invariants with at most n literals for operators O and an initial state I over state variables A. The runtimes increase quickly as n is increased, and in practice one can use n = 2 or n = 3. We define lits(l₁ ∨ ··· ∨ lₙ) = {l₁, . . . , lₙ}. The loop on line 5 is repeated until there are no o ∈ O and clauses c ∈ C such that C ∪ {rg_o(¬c)} is satisfiable.

Lemma 12 Let C be a set of clauses, φ a formula, and o an operator. If C ∪ {rg_o(¬φ)} is unsatisfiable, then exc_o(s) |= φ for all states s such that s |= C and o is executable in s.
Proof: Easy corollary of Theorem 8.

1: procedure invariants(A, I, O, n);
2: C := {a ∈ A | I |= a} ∪ {¬a | a ∈ A, I ⊭ a};
3: repeat
4:   C′ := C;
5:   for each o ∈ O and c ∈ C s.t. C ∪ {rg_o(¬c)} ∈ SAT do
6:     C := C\{c};
7:     if |lits(c)| < n then
8:     begin (* Add weaker clauses. *)
9:       C := C ∪ {c ∨ a | a ∈ A} ∪ {c ∨ ¬a | a ∈ A};
10:    end
11:  end do
12: until C = C′;
13: return C;

Figure 1. Algorithm for computing a set of invariant clauses
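Below is a compact sketch of Figure 1 (our transcription, not the authors' code). Clauses are frozensets of literals (atom, sign); the tests sat_with(C, f), deciding whether clause set C plus formula f is satisfiable, and rg_not(o, c), building rg_o(¬c), are assumed to be supplied, e.g. by a SAT solver or by the incomplete unit-resolution test discussed below. For simplicity the sketch tests satisfiability against the snapshot taken at the start of each sweep.

```python
def invariants(atoms, init, ops, n, sat_with, rg_not):
    # line 2: exactly the literals true/false in the initial state
    C = {frozenset([(a, a in init)]) for a in atoms}
    while True:
        C_prev = set(C)                                   # line 4: snapshot
        for o in ops:                                     # line 5
            for c in list(C_prev):
                if c in C and sat_with(C_prev, rg_not(o, c)):
                    C.discard(c)                          # line 6: c not preserved
                    if len(c) < n:                        # lines 7-10: weaken c
                        # skip literals on variables already mentioned in c
                        # (they would only recreate c or build a tautology)
                        C |= {frozenset(c | {(a, v)})
                              for a in atoms for v in (True, False)
                              if (a, True) not in c and (a, False) not in c}
        if C == C_prev:                                   # line 12: fixpoint
            return C                                      # line 13
```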
On lines 7 and 9, when a clause c is not guaranteed to hold, weaker clauses c ∨ l may be, so we replace c by all clauses that are weaker by having one more literal. If these clauses do not hold either, they will be similarly removed and replaced by weaker ones.

Theorem 13 Let A be a set of state variables, I a state, O a set of operators, and n ≥ 1 an integer. Then the procedure invariants(A, I, O, n) returns a set C of clauses with at most n literals so that exc_{o₁;...;oₘ}(I) |= C for any sequence o₁; . . . ; oₘ of operators from O.
Proof: Let C₀ be the value assigned to the variable C on line 2 of the procedure and C₁, C₂, . . . be the values of the variable at the end of each iteration of the outermost repeat loop.
Induction hypothesis: for every {o₁, . . . , oᵢ} ⊆ O and c ∈ Cᵢ, exc_{o₁;...;oᵢ}(I) |= c.
Base case i = 0: The result of executing the empty sequence in I is by definition I itself, and by construction C₀ consists of only formulae that are true in the initial state.
Inductive case i ≥ 1: Take any {o₁, . . . , oᵢ} ⊆ O and c ∈ Cᵢ. We analyze two cases.
1. If c ∈ Cᵢ₋₁, then by the induction hypothesis exc_{o₁;...;oᵢ₋₁}(I) |= c. Since c ∈ Cᵢ it must be that Cᵢ₋₁ ∪ {rg_{oᵢ}(¬c)} is unsatisfiable. Hence by Lemma 12 exc_{o₁;...;oᵢ}(I) |= c.
2. If c ∉ Cᵢ₋₁, it must be because Cᵢ₋₁ ∪ {rg_o(¬c′)} is satisfiable for some o ∈ O and c′ ∈ Cᵢ₋₁ such that c is obtained from c′ by adding some literals to it, and hence c′ |= c. Since c′ ∈ Cᵢ₋₁, by the induction hypothesis exc_{o₁;...;oᵢ₋₁}(I) |= c′. Since c′ |= c, also exc_{o₁;...;oᵢ₋₁}(I) |= c. Since Cᵢ₋₁ ∪ {rg_{oᵢ}(¬c)} is unsatisfiable, exc_{o₁;...;oᵢ}(I) |= c by Lemma 12.
This finishes the induction proof. The iteration of the procedure stops when Cᵢ = Cᵢ₋₁, meaning that the claim of the theorem holds for arbitrarily long sequences o₁; . . . ; oₘ.

To make the algorithm run in polynomial time, the satisfiability and logical consequence tests should be performed by algorithms that approximate these tests in polynomial time. If restricted to STRIPS operators, the inductive invariant computation [16] is obtained by implementing the satisfiability test on C ∪ {rg_o(¬c)} as an incomplete test by unit resolution. More generally, it may be useful to have a stronger tractable satisfiability test. The proof of Theorem 13 remains valid as long as the incomplete satisfiability test does not falsely indicate unsatisfiability for a satisfiable set.

Inference of facts that hold at given time points was first considered in the GraphPlan algorithm of Blum and Furst in the form of mutexes [1]. This planning graph construction, similarly to early algorithms for computing invariants [5, 16], restricts to STRIPS operators. Later works have considered more general classes of operators [6, 12], adopting the inductive definition of invariants first used in [1, 16]. Gerevini and Schubert [6] consider conditional effects but no disjunctions. Lin [12] tries to find invariants for a class of problems by looking at problem instances with a small state space and eliminating candidate invariants if they are falsified by the chosen problem instances.
3.2 Haslum and Geffner's hⁿ
Our invariant algorithm computes a generalization of Haslum and Geffner's hⁿ heuristic [7], which is defined for STRIPS only. An estimate for the distance of any formula φ (precondition or goal) is k if φ is satisfiable with Cₖ but not with Cₖ₋₁ (an incomplete satisfiability test can be used without sacrificing the admissibility of the heuristic). Haslum and Geffner's estimate Gⁿ(V) for the distance of a set V of variables from the initial state can be expressed in terms of our sets Cᵢ when our parameter n equals m: for V = {a₁, . . . , aₘ}, Gⁿ(V) = k iff there is ¬b₁ ∨ ··· ∨ ¬bⱼ ∈ Cₖ₋₁ such that {b₁, . . . , bⱼ} ⊆ V and there is no such clause in Cₖ, and Gⁿ(V) = 0 if ¬a ∉ C₀ for all a ∈ V. Haslum and Geffner define states as subsets of the set A of all state variables. We will call states of this kind h-states to distinguish them from our definition of states. Haslum and Geffner define R(V) as the set of pairs (B, o) such that the operator o reaches an h-state V from an h-state B. This is essentially a simple regression operation for STRIPS. We ignore the operator o (because we do not need it for costs,
unlike Haslum and Geffner, who consider non-unitary costs) and define R(V) simply as all the minimal sets of variables that have to be true for the variables V to be true after executing one of the operators. Now R(V) has the following property.

Lemma 14 For all B ∈ R(V), ⋀_{a∈B} a |= rg_o(⋀_{a∈V} a).

The definition of the heuristic is as follows. For V ⊆ A let
Gⁿ(V) = 0 if V ⊆ I
Gⁿ(V) = min_{B∈R(V)} (1 + Gⁿ(B)) if |V| ≤ n and V ⊈ I
Gⁿ(V) = max_{B⊂V, |B|=n} Gⁿ(B) if |V| > n.

Theorem 15 For a STRIPS problem, let Cᵢ be the sets computed by the algorithm in Figure 1 as explained in the proof of Theorem 13. Let V ⊆ A be a set of variables. If Gⁿ(V) = k for any k ≥ 1, then Cₖ₋₁ ∪ V is unsatisfiable and Cₖ ∪ V is satisfiable, and Gⁿ(V) = 0 iff C₀ ∪ V is satisfiable.
Proof: We give a proof sketch. Induction hypothesis: for every i ≥ 0, for any V ⊆ A,
1. if Gⁿ(V) = i then Cᵢ ∪ V is satisfiable,
2. if Gⁿ(V) = i then Cⱼ ∪ V is unsatisfiable for j ∈ {0, . . . , i − 1}.
Base case i = 0: Let V ⊆ A be any set of variables.
1. If Gⁿ(V) = 0 then V ⊆ C₀. Since C₀ is satisfiable, also C₀ ∪ V is satisfiable.
2. Holds trivially because {0, . . . , i − 1} = ∅.
Inductive case i ≥ 1:
Remark A. If Cᵢ |= ¬a₁ ∨ ··· ∨ ¬aₖ, then ¬b₁ ∨ ··· ∨ ¬bₘ ∈ Cᵢ for some {b₁, . . . , bₘ} ⊆ {a₁, . . . , aₖ}.
1. Assume Gⁿ(V) = i. Then there is an operator o that reaches the h-state V from an h-state B such that Gⁿ(B) = i − 1. Since Gⁿ(B) = i − 1, by the induction hypothesis Cᵢ₋₁ ∪ B is satisfiable. By Lemma 14, ⋀_{a∈B} a |= rg_o(⋀_{a∈V} a). Hence also Cᵢ₋₁ ∪ {rg_o(⋀_{a∈V} a)} is satisfiable. Hence when constructing Cᵢ the algorithm removes all clauses ¬b₁ ∨ ··· ∨ ¬bⱼ such that {b₁, . . . , bⱼ} ⊆ V. Hence by Remark A, Cᵢ ∪ V is satisfiable.
2. Assume Gⁿ(V) = i ≥ 1. Then Gⁿ(B) ≥ i − 1 for all h-states B and operators that reach V from B. If i > 1, then by the induction hypothesis Cᵢ₋₂ ∪ B is unsatisfiable for any such B, and there is a clause ¬b₁ ∨ ··· ∨ ¬bⱼ ∈ Cᵢ₋₂ such that {b₁, . . . , bⱼ} ⊆ B. Hence Cᵢ₋₂ ∪ {rg_o(⋀_{a∈V} a)} is unsatisfiable for every o ∈ O. Therefore the clauses in Cᵢ₋₁ that contradict V are not removed, and hence Cᵢ₋₁ ∪ V is unsatisfiable. If i = 1, then Gⁿ(V) > 0 because V ⊈ I, and hence C₀ ∪ V is unsatisfiable. Hence Cᵢ₋₁ ∪ V is unsatisfiable in both cases.
4 Regression for Non-Deterministic Operators
Based on the regression operation for deterministic operators in Definition 7, regression for a class of nondeterministic operators can be defined. The operators' effects have a nondeterministic choice e₁ | ··· | eₙ between two or more deterministic effects e₁, . . . , eₙ.

Definition 16 Let φ be a formula and o = ⟨p, e₁ | ··· | eₙ⟩ an operator where e₁, . . . , eₙ are deterministic. Define
rg^nd_o(φ) = rg_{⟨p,e₁⟩}(φ) ∧ ··· ∧ rg_{⟨p,eₙ⟩}(φ).
Theorem 17 Let φ be a formula over A, o a nondeterministic operator over A, and S the set of all states over A. Then for all s ∈ S, s |= rg^nd_o(φ) if and only if all possible successor states s′ of s satisfy φ.
Proof: This follows from the fact that each ⟨p, eᵢ⟩ represents one possible outcome the nondeterministic action may have, rg_{⟨p,eᵢ⟩}(φ) represents all the states from which φ is reached by ⟨p, eᵢ⟩, and the intersection of these sets is exactly the set of states from which φ is reached no matter which outcome is the actual one.
Example 18 Let o = ⟨d, b | ¬c⟩. Then
rg^nd_o(b ↔ c) = rg_{⟨d,b⟩}(b ↔ c) ∧ rg_{⟨d,¬c⟩}(b ↔ c) = (d ∧ (⊤ ↔ c)) ∧ (d ∧ (b ↔ ⊥)) ≡ d ∧ c ∧ ¬b.

Applications of the nondeterministic regression operation are similar to those of the deterministic one. Most notably, backward-search algorithms for planning with partial observability can be based on it.
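Continuing the earlier sketches, Definition 16 amounts to conjoining the deterministic regressions through each outcome (our illustration, not the paper's code):

```python
def regress_nd(phi, p, outcomes, atoms):
    """rg^nd for o = (p, e1 | ... | en): the conjunction over all outcomes,
    so the result holds exactly where every successor state satisfies phi."""
    out = TRUE
    for e in outcomes:
        out = ("and", out, regress(phi, (p, e), atoms))
    return out
```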
5 Related Work
Regression is closely related to other forms of manipulation of formulae for computing the images or preimages of sets of states. We discuss some of the most closely related work, including some very recent work, and contrast it with regression.
5.1 Symbolic Pre-Images
General forms of reasoning about actions by the computation of images and preimages, leading to logic-based algorithms for computing sets of reachable states, have many applications; this approach was originally introduced in the context of computer-aided verification as a technique for model-checking [3, 2]. Preimage computation is essentially regression, whereas images are successors of sets of states.

Let A = {a₁, . . . , aₙ} and A′ = {a₁′, . . . , aₙ′}. The variables in A refer to the values of state variables in a state and the variables in A′ to the values in a successor state. Formulae φ over A ∪ A′ can represent arbitrary binary relations on the set of all states. The translation of a deterministic operator o = ⟨p, e⟩ into a formula is
τ_A(o) = p ∧ ⋀_{a∈A} ¬(E_a(e) ∧ E_¬a(e)) ∧ ⋀_{a∈A} (a′ ↔ (E_a(e) ∨ (a ∧ ¬E_¬a(e)))).
The first two conjuncts express the conditions for the executability of the operator (the truth of the precondition and the consistency of the effects) and the third conjunct expresses the new value of each state variable in terms of the old values of the state variables. With respect to an operator o, the successor or predecessor states of a set of states, represented as a formula φ, can be computed by syntactic manipulation of φ and τ_A(o). The basic logical step in this computation is that of existential abstraction, which eliminates the occurrences of one variable in a formula. It is defined by ∃x.φ = φ[⊤/x] ∨ φ[⊥/x], where φ[θ/x] means replacing all occurrences of x in φ by θ.

Definition 19 Let o be an operator and φ a formula. Define
img_o(φ) = (∃A.(φ ∧ τ_A(o)))[A/A′]
preimg_o(φ) = ∃A′.(τ_A(o) ∧ φ[A′/A])
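In the tuple encoding of the earlier sketches, existential abstraction is a two-way substitution (our illustration only); eliminating every primed successor-state variable of τ_A(o) ∧ φ[A′/A] in this way yields preimg_o(φ).

```python
def subst_atom(f, x, val):
    """phi[val/x]: replace every occurrence of atom x in f by formula val."""
    k = f[0]
    if k == "atom":
        return val if f[1] == x else f
    if k in ("true", "false"):
        return f
    if k == "not":
        return ("not", subst_atom(f[1], x, val))
    return (k, subst_atom(f[1], x, val), subst_atom(f[2], x, val))

def exists(x, f):
    """Existential abstraction of one variable: phi[T/x] or phi[F/x]."""
    return ("or", subst_atom(f, x, TRUE), subst_atom(f, x, FALSE))

def exists_all(variables, f):
    """Eliminate a set of variables, e.g. all primed successor variables."""
    for x in variables:
        f = exists(x, f)
    return f
```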
In Definition 19, φ[A′/A] denotes the substitution of each a ∈ A in φ by the corresponding variable a′ ∈ A′. Not surprisingly, there is a close connection between preimages and regression.
Theorem 20 rg_o(φ) ≡ preimg_o(φ).
Example 21 Let A = {a, b, c}. Let o = ⟨c, a ∧ (a ▷ b)⟩. Then rg_o(a ∧ b) = c ∧ (⊤ ∧ (b ∨ a)) ≡ c ∧ (b ∨ a). The formula corresponding to o is
τ_A(o) = c ∧ a′ ∧ ((b ∨ a) ↔ b′) ∧ (c ↔ c′).
The preimage of a ∧ b with respect to o is represented by
∃a′b′c′.(τ_A(o) ∧ (a′ ∧ b′))
≡ ∃a′b′c′.(c ∧ a′ ∧ ((b ∨ a) ↔ b′) ∧ (c ↔ c′) ∧ a′ ∧ b′)
≡ ∃a′b′c′.(a′ ∧ b′ ∧ c ∧ (b ∨ a) ∧ c′)
≡ ∃b′c′.(b′ ∧ c ∧ (b ∨ a) ∧ c′)
≡ ∃c′.(c ∧ (b ∨ a) ∧ c′)
≡ c ∧ (b ∨ a)

This connection between preimages and regression is best understood based on the equivalence a′ ↔ (E_a(e) ∨ (a ∧ ¬E_¬a(e))) in the definition of τ_A(o): it corresponds to the substitution in the definition of regression. The advantage of regression is that no existential abstraction is needed; the disadvantage is that it is restricted to operators/relations that can be represented as a conjunction of equivalences ⋀_{a∈A} (a′ ↔ φ_a).
5.2 C-Filter of Shahaf and Amir
Shahaf and Amir [17] present C-Filtering for computing (an implicit representation of) the image of a set of states with respect to a sequence of actions. Shahaf and Amir hint at a connection between C-Filtering and regression but do not clarify it. The C-Filter is simply the use of regression to test facts about a belief state B reached from an initial belief state I by a sequence of actions o₁, . . . , oₙ. Instead of explicitly constructing B by image computation, facts relating to B are queried by regressing them to queries about the initial state. For example, to test whether B ∩ B′ ≠ ∅ for some belief state B′ expressed as a formula φ, one tests the non-emptiness of the intersection by a satisfiability test on I ∧ rg_{o₁;...;oₙ}(φ). Shahaf and Amir claim as the novelty of C-Filtering the incremental construction of the substitutions rg_{o₁;...;oₙ}(a)/a as the action sequence o₁, . . . , oₙ, . . . progresses, as well as the representation of the required formulae as Boolean circuits.
6 Conclusions
We have defined regression and composition operations for PDDL operators and a regression operation for nondeterministic actions. We have also discussed applications of general regression operations in connection with macro-actions, the elimination of redundant operators, invariants and heuristics. In particular, we gave an algorithm for computing invariants for a general definition of actions that includes disjunctive preconditions and conditional effects. The algorithm is powerful yet conceptually extremely simple, and its power can be traded for efficiency by controlling the accuracy and asymptotic runtime of approximate satisfiability tests. The algorithm also yields a generalization of the hⁿ heuristic [7].

Acknowledgements. The research was funded by the Australian Government's Department of Broadband, Communications and the Digital Economy and by the Australian Research Council through NICTA.
REFERENCES
[1] Avrim L. Blum and Merrick L. Furst, 'Fast planning through planning graph analysis', Artificial Intelligence, 90(1-2), 281–300, (1997). [2] J. R. Burch, E. M. Clarke, D. E. Long, K. L. McMillan, and D. L. Dill, 'Symbolic model checking for sequential circuit verification', IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 13(4), 401–424, (1994). [3] Olivier Coudert, Christian Berthet, and Jean Christophe Madre, 'Verification of synchronous sequential machines based on symbolic execution', in Automatic Verification Methods for Finite State Systems, International Workshop, Grenoble, France, June 12-14, 1989, Proceedings, ed., Joseph Sifakis, volume 407 of Lecture Notes in Computer Science, pp. 365–373. Springer-Verlag, (1990). [4] Edsger W. Dijkstra, 'Guarded commands, nondeterminacy and formal derivation of programs', Communications of the ACM, 18(8), 453–457, (1975). [5] Alfonso Gerevini and Lenhart Schubert, 'Inferring state constraints for domain-independent planning', in Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98) and the 10th Conference on Innovative Applications of Artificial Intelligence (IAAI-98), pp. 905–912. AAAI Press, (1998). [6] Alfonso Gerevini and Lenhart K. Schubert, 'Discovering state constraints in DISCOPLAN: Some new results', in Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-2000) and the 12th Conference on Innovative Applications of Artificial Intelligence (IAAI-2000), pp. 761–767. AAAI Press, (2000). [7] Patrik Haslum and Héctor Geffner, 'Admissible heuristics for optimal planning', in Proceedings of the Fifth International Conference on Artificial Intelligence Planning Systems, eds., Steve Chien, Subbarao Kambhampati, and Craig A. Knoblock, pp. 140–149. AAAI Press, (2000). [8] Patrik Haslum and Peter Jonsson, 'Planning with reduced operator sets', in Proceedings of the Fifth International Conference on Artificial Intelligence Planning Systems, eds., Steve Chien, Subbarao Kambhampati, and Craig A. Knoblock, pp. 150–158. AAAI Press, (2000). [9] C. A. R. Hoare, 'An axiomatic basis for computer programming', Communications of the ACM, 12(10), 576–580, (1969). [10] Glenn A. Iba, 'A heuristic approach to the discovery of macro-operators', Machine Learning, 3(4), 285–317, (1989). [11] Henry Kautz and Bart Selman, 'Planning as satisfiability', in Proceedings of the 10th European Conference on Artificial Intelligence, ed., Bernd Neumann, pp. 359–363. John Wiley & Sons, (1992). [12] Fangzhen Lin, 'Discovering state invariants', in Principles of Knowledge Representation and Reasoning: Proceedings of the Ninth International Conference (KR 2004), eds., Didier Dubois, Christopher A. Welty, and Mary-Anne Williams, pp. 536–544. AAAI Press, (2004). [13] Drew McDermott, 'The Planning Domain Definition Language', Technical Report CVC TR-98-003/DCS TR-1165, Yale Center for Computational Vision and Control, Yale University, (October 1998). [14] Bernhard Nebel, 'On the compilability and expressive power of propositional planning formalisms', Journal of Artificial Intelligence Research, 12, 271–315, (2000). [15] Edwin P. D. Pednault, 'ADL and the state-transition model of action', Journal of Logic and Computation, 4(5), 467–512, (1994). [16] Jussi Rintanen, 'A planning algorithm not based on directional search', in Principles of Knowledge Representation and Reasoning: Proceedings of the Sixth International Conference (KR '98), eds., A. G. Cohn, L. K. Schubert, and S. C. Shapiro, pp. 617–624. Morgan Kaufmann Publishers, (June 1998). [17] Dafna Shahaf and Eyal Amir, 'Logical circuit filtering', in Proceedings of the 20th International Joint Conference on Artificial Intelligence, ed., Manuela Veloso, pp. 2611–2618. AAAI Press, (2007).
Combining Domain-Independent Planning and HTN Planning: The Duet Planner Alfonso Gerevini† and Ugur Kuter‡ and Dana Nau‡ and Alessandro Saetti† and Nathaniel Waisbrot‡∗ Abstract. Despite the recent advances in planning for classical domains, the question of how to use domain knowledge in planning is yet to be completely and clearly answered. Some of the existing planners use domain-independent search heuristics, and others depend on intensively-engineered domain-specific knowledge to guide the planning process. In this paper, we describe an approach that combines ideas from both of the above schools of thought. We present Duet, our planning system that incorporates the ability to use hierarchical domain knowledge in the form of Hierarchical Task Networks (HTNs), as in SHOP2 [14], and to use domain-independent local search techniques, as in LPG [8]. In our experiments, Duet was able to solve much larger problems than LPG could solve, with only minimal domain knowledge encoded in HTNs (much less domain knowledge than SHOP2 needed to solve those problems by itself).
1 Introduction
Most classical planners fall into one of two categories: planners that use domain-independent knowledge, i.e., that work in any classical planning domain, and planners that can exploit domain-specific knowledge. It has been shown, both theoretically and experimentally, that each approach has its own advantages and disadvantages:

• A planner that can exploit domain-specific knowledge in order to guide its planning can solve much larger planning problems, and can generally solve them much faster, than planners that don't use such knowledge. The biggest downside of such planning systems, however, is that they require an expert human to give them extensive knowledge about how to solve planning problems in the planning domain at hand. Usually this knowledge is expressed using either temporal logic (e.g., TLPlan [1] and TALplanner [13]) or task decomposition (e.g., SHOP2 [14], SIPE-2 [17], and O-Plan [6]), and might not be easy for the general user to specify.

• A planner that uses domain-independent heuristic information (e.g., FF [11], AltAlt [15], SGPlan [5], HSP [3], Fast Downward [10], and LPG [8]) usually does not need expert-provided domain knowledge, since the planner itself computes a heuristic for each domain. This makes the domain formalization simpler and the planner easier to use; but the planner may often perform much worse than a planner that exploits specific domain knowledge.

† Dipartimento di Elettronica per l'Automazione, Università degli Studi di Brescia, Via Branze 38, I-25123 Brescia, Italy.
‡ Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, Maryland 20742, USA.
∗ Corresponding author (email: waisbrot@cs.umd.edu)

In this paper, we describe Duet, a new planning system that combines the advantages of using domain-independent heuristics
and domain-specific knowledge, while avoiding their drawbacks.¹ To accomplish this, Duet incorporates adaptations of two well-known planners: LPG, which uses domain-independent heuristics in a stochastic local search engine [8], and SHOP2, which uses domain-specific Hierarchical Task Networks (HTNs) to organize its search space [14]. We extended the SHOP2 and LPG formalisms to allow the planners to communicate in Duet by generating subgoals of a planning problem for each other. Duet organizes the planning process by passing these subgoals to the individual planners until no subgoals are left to achieve. We present our experiments with Duet on a new planning domain, called Museums. The Museums domain was inspired by the real-world operations of acquiring and relocating art objects among a set of museums around the world. The domain combines aspects of the well-known Logistics and Tower of Hanoi (ToH) problems. The objective is to use trucks to move various art objects from museums to other museums. When a truck comes to a museum to load or unload objects, there are three places to put the objects: the truck, and two pallets at the museum's loading dock. An object's placement depends on its fragility: fragile art objects must be placed on less-fragile ones. Thus, loading and unloading correspond to solving ToH problems. The rationale for using the Museums domain in the evaluation of Duet was our observation that it is challenging for state-of-the-art planners and that it includes two kinds of subproblems: domain-specific knowledge isn't needed to plan the truck movements, but is needed to plan the loading and unloading operations, since the ToH problem is hard for many domain-independent planners, including LPG. In our experiments, we varied the amount of HTN-based domain-specific knowledge available to Duet and compared its performance with LPG's and SHOP2's performance as stand-alone planners. Even with just a small amount of domain-specific knowledge (e.g., "choose the least-fragile object and move it to the target museum"), Duet usually generated solutions faster than LPG. With more domain-specific problem-solving knowledge (e.g., how to properly stack art objects on top of each other), Duet ran faster and solved more problems than both LPG and SHOP2. Although SHOP2's performance could have been improved, this would have required much more time for hand-crafting its knowledge base.
2 Preliminaries
Our definitions of classical states, planning operators, planning domains and problems are based on those in [9]. Below we'll summarize the definitions at the semantic level; for syntactic details see [9].

¹ In that sense, it is closely related to the recently-proposed "Model-Lite Planning" approach [12, 16], which aims to develop techniques that do not require intensive domain knowledge but still are practical.
In addition to classical planning operators and actions (i.e., ground instances of planning operators), we define an abstract planning operator as a triple (t, Pre, Eff), where Pre and Eff are the preconditions and the effects of the abstract operator (described as logical formulas over literals), and t is an expression (name, arg₁, . . . , argₙ), where name is the abstract operator's name and arg₁, . . . , argₙ are the arguments (variables and/or constant symbols). An abstract action is a ground instance of an abstract planning operator. A plan is a sequence of actions that are either classical or abstract. A planning domain is a triple Σ = (S, A, γ) where S and A are the sets of states and actions (classical and abstract), and γ : S × A → S is the state-transition function, with γ(s, a) defined iff a is applicable to s. Γ(s, π) = γ(γ(. . . γ(s, a₁), a₂), . . . , aₙ) is the state generated by applying the plan π = a₁, . . . , aₙ in the state s. If some action aᵢ is inapplicable in Γ(s, a₁, . . . , aᵢ₋₁), then π is inapplicable in s and Γ(s, π) is not defined. A planning problem is a pair P = (s₀, g) in the planning domain Σ = (S, A, γ), where s₀ ∈ S is the initial state and g is the goal, represented as a conjunction of logical atoms (i.e., g represents a set of goal states G ⊆ S). A solution for a classical planning problem P is a plan π = a₁, . . . , aₖ such that each aᵢ in π is a classical action and the state s = Γ(s₀, π) satisfies the goals g. (A minimal executable sketch of this state-transition semantics is given at the end of this section.)

LPG's plan representation is based on linear action graphs [8], which are variants of the well-known planning graphs [2]. A linear action graph [8] is a directed acyclic leveled graph alternating between a proposition level, i.e., a set of domain propositions, and an action level, i.e., one ground domain action and a set of special dummy actions, called "no-ops", each of which propagates a proposition of the previous level to the next one. If an action is in the graph, then its preconditions and positive effects appear in the corresponding proposition levels of the graph. Moreover, a pair of propositions or actions can be marked as mutually exclusive at every graph level where the pair appears (for a detailed description, see [8]). While in the original definition action levels contain only classical actions [8], here we use an extended representation where an action level contains either a classical action or an abstract action. An (extended) action graph can have two types of flaws: unsatisfied action preconditions and abstract actions. LPG uses a stochastic local search process that iteratively modifies the current graph until there is no flaw or a certain search limit is exceeded [8]. LPG deals with an unsatisfied precondition by inserting into or removing from the graph a new or existing action, respectively. We modified LPG in order to recognize abstract actions as flaws resolvable by running an HTN planner, as described below. An action graph with no flaws represents a solution for the input planning problem.

An HTN planner formulates a plan by decomposing tasks (i.e., symbolic representations of problem-solving activities to be performed) into smaller and smaller subtasks until tasks are reached that can be performed directly. An HTN is a pair (T, C), where T is a set of tasks and C is a set of partial ordering constraints on the tasks. The empty HTN is the pair (T, C) such that T = ∅ and C = ∅. An HTN planner uses an HTN domain description that contains three kinds of knowledge artifacts: axioms, operators, and methods.
The axioms are similar to logical Horn-clause statements; the planner uses them to infer conditions about the current state. The operators are like the planning operators used in any classical planner. The names of these operators are designated as primitive tasks. Each method in an HTN domain description is a prescription for how to accomplish a nonprimitive task by decomposing it into subtasks (which may be either primitive or nonprimitive tasks). A method consists of (1) the task that the method can be used to accomplish, (2) the set of preconditions which must be satisfied for the method to be applicable, and (3) the subtasks to accomplish, along with some constraints over those tasks that must be satisfied. For example, consider the task of moving a collection of items from one location to another. One method might be to move them by truck. For such a method, the preconditions might be that the truck is in working order and is present at the first location. The subtasks might be to open the door, put the items onto the truck, drive the truck to the other location, and unload the items.

We assume that each abstract action in a planning domain corresponds to a nonprimitive task, which must be decomposed into smaller tasks using HTN methods (if available).² In addition to primitive and nonprimitive tasks, we also define a class of special-purpose tasks called achieve-goals tasks. An achieve-goals task specifies a set of goals, as in a classical planning problem, that need to be achieved in the world before the task-decomposition process can progress during HTN planning. An HTN planner would not have any methods to decompose an achieve-goals task t. Instead, an achieve-goals task triggers the invocation of a classical planner to generate a plan π such that the state Γ(s, π) satisfies the specified goals of t, which we denote as GoalsOf(s, t), given the input set of actions. The achieve-goals task is an important component of our planning system Duet, which incorporates LPG and SHOP2 in a unified planning process, as described in the next section.
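The state-transition semantics of Section 2 is defined only semantically above; as a minimal illustration (our sketch, not Duet's code), here it is with classical actions simplified to STRIPS-style (precondition, add, delete) triples rather than the full operator language of [9]:

```python
def gamma(s, a):
    """One step of the state-transition function: None when a is inapplicable in s."""
    pre, add, delete = a                 # a classical action as sets of atoms
    if not pre <= s:                     # precondition not satisfied
        return None
    return (s - delete) | add

def Gamma(s, plan):
    """Gamma(s, pi): fold gamma over the plan; undefined (None) if any step fails."""
    for a in plan:
        s = gamma(s, a)
        if s is None:
            return None
    return s

# Toy usage with a single hypothetical action:
drive = ({"at-home"}, {"at-work"}, {"at-home"})
print(Gamma({"at-home"}, [drive]))       # {'at-work'}
```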
3 Duet = LPG + SHOP2
This section describes our planning procedure, called Duet, which incorporates local-search planning as in LPG [8] and HTN planning as in SHOP2 [14]. The LPG and SHOP2 planning procedures that we use in Duet are slightly modified versions of the originals reported in [8] and [14], respectively. Below, we first describe the Duet planning procedure, and subsequently we briefly describe our modifications to LPG and SHOP2 to adapt them to work within Duet.

Figure 1 shows a high-level description of the Duet planning procedure. Duet's input includes the initial state s₀ and the goal condition g of a classical planning problem, as well as a possibly empty initial task network specified for achieving the goals g and a possibly empty set M of HTN methods. Duet first initializes the current state s to s₀ and the current partial plan to the empty plan. At Line 1, n is a counter for the number of search steps performed by the planner; that is, n is the total number of graph modifications performed by LPG to fix flaws plus the number of task decompositions done by SHOP2. Duet also uses a tabu list, τ, that keeps the abstract actions that cannot be decomposed into smaller tasks given the HTN methods in M and that therefore must be avoided during local search in LPG. The tabu list τ is initialized to the empty list at Line 1.

Duet successively generates and resolves subgoals for the input planning problem until it generates a solution plan. A subgoal of the planning problem is either a goal to achieve using domain-independent search heuristics via LPG, or an abstract action (i.e., a task) that needs to be decomposed into smaller tasks via SHOP2. Duet performs this iterative procedure for a maximum predefined number of search steps. If a solution cannot be found during these iterations, the procedure returns failure.

² Note that a macro-action [4, 7] is a special case of an abstract action: a macro-action decomposes directly into a sequence of primitive actions, whereas an abstract action may be decomposed into a combination of both primitive actions and other nonprimitive tasks that need to be decomposed further. This allows us, for example, to write HTNs that perform the standard recursive decomposition of a Towers of Hanoi task in the Museum domain.
Procedure Duet(s₀, g, w₀, M)
Input: the problem initial state s₀, the set of problem goals g, the initial task network w₀ and a set M of HTN methods.
Output: a solution plan or failure.
1. n ← 0; s ← s₀; w ← w₀; π ← τ ← gSHOP2 ← gLPG ← ∅;
2. while n does not exceed a predefined number of steps
3.   if π is a solution (all subgoals satisfied) then return π;
4.   else if there exists an abstract action gSHOP2 then
5.     ⟨π′, s′, gLPG, w′, n⟩ ← SHOP2(s, gSHOP2, πnil, n, M);
6.     if π′ = failure
7.     then τ ← τ ∪ {⟨gSHOP2, n⟩}; w ← (w − gSHOP2);
8.     else π ← π + π′; w ← w′ + (w − gSHOP2); s ← s′;
9.     gSHOP2 ← ∅;
10.  else if there exists an achieve-goals task gLPG then
11.    ⟨π′, gSHOP2, n⟩ ← LPG(s, gLPG, πnil, n, τ);
12.    π ← π + prefix of π′ up to the first abstract action;
13.    w ← the rest of π′ + (w − gLPG);
14.    s ← Γ(s₀, π);
15.    gLPG ← ∅;
16.  else if w ≠ ∅ then
17.    ⟨π, s, gLPG, w, n⟩ ← SHOP2(s, w, π, n, M);
18.    if π = failure then return failure;
19.  else
20.    ⟨π′, gSHOP2, n⟩ ← LPG(s₀, g, π, n, τ);
21.    π ← prefix of π′ up to the first abstract action;
22.    w ← the rest of π′;
23.    s ← Γ(s₀, π);
24. return failure.

Figure 1. Pseudocode of the Duet planning algorithm. "+" is the operator concatenating two plans, πnil is the empty plan, s is the world state, w is the task network, τ is the tabu list, gLPG represents the goals specified in an achieve-goals task, and gSHOP2 is an abstract action.
If Duet returns failure, we restart it from the beginning with the same input for a predefined number of times, in order to search for possible solutions again. The rationale behind these restarts is that, since LPG, and therefore Duet, is a randomized search algorithm, different restarts of the planner may produce different search paths in the search space, so the planner may generate a solution plan.

At each iteration of the while loop (Lines 2–23), Duet first checks whether the current partial plan π is a solution for the input planning problem. If so, Duet returns this plan and terminates successfully. Otherwise, if there is an abstract action (or an HTN of abstract actions) to be accomplished, Duet invokes SHOP2 on this HTN, which is called gSHOP2 in Line 5. Using the input HTN methods M, SHOP2 attempts to generate a solution plan for the HTN gSHOP2. Figure 2 shows the modified version of SHOP2 [14] that Duet uses. The planning procedure is the same as in [14], except for Lines 10–12. In Line 10, if the current task to be decomposed is an achieve-goals task, then our adaptation of SHOP2 returns the goals GoalsOf(s, t) in the current state s. As described above, Duet then invokes LPG on these goals to achieve them and updates the current partial plan. When SHOP2 returns, there are three cases:

• SHOP2 generates a plan π′ for gSHOP2 successfully using the methods in M. In this case, the returned successor HTN w′ is the empty HTN and there are no successor goals for LPG (i.e., gLPG is the empty set in Line 5).

• SHOP2 generates an achieve-goals task tLPG for Duet to invoke LPG in the next iteration. In this case, π′ is the partial plan that SHOP2 generated up to the task tLPG in the decomposition process,
Procedure SHOP2(s, w, π, n, M)
Input: a world state s, a task network w, a (partial) plan π, a number of search steps n and a set M of HTN methods.
Output: a plan, its final state, a task that has no method, a task network and a number of search steps.
1. while w is not empty do
2.   nondeterministically choose a task t from w that has no predecessors and remove it;
3.   n ← n + 1;
4.   if t is primitive then
5.     π ← π + t; s ← γ(s, t);
6.   else if t is nonprimitive then
7.     choose an applicable method m for t (or if there is
8.       no such method then return failure);
9.     add m's decomposition of t to the front of w;
10.  else if t is an achieve-goals task then
11.    return ⟨π, s, GoalsOf(s, t), w, n⟩;
12. return ⟨π, s, nil, nil, n⟩;

Procedure LPG(s, g, π, ninit, τ)
Input: an initial world state s, a set of goals g, a (partial) plan π, a number of search steps ninit and a tabu list τ.
Output: a plan, the first abstract action in the plan and a number of search steps.
1. A ← an action graph with the first fact level defined by s, the action levels by π and the last fact level by g;
2. for n = ninit to a predefined number of steps do
3.   π ← the plan represented by A;
4.   if A is a solution graph then return ⟨π, nil, n⟩;
5.   σ ← the flaw at the lowest level of A;
6.   if σ is an abstract action then return ⟨π, σ, n⟩;
7.   else
8.     N ← set of actions that are not in τ and whose insertion into/removal from A fixes σ;
9.     select an element from N and modify A with it;
10. return ⟨nil, nil, n⟩.

Figure 2. Pseudocode of Duet's modified SHOP2 and LPG procedures.
s′ is the state in which LPG must be called, gLPG is the goals for LPG specified by tLPG, w′ is the HTN that still needs to be accomplished once Duet generates a plan that achieves the goals gLPG, and n is the updated number of search steps.

• SHOP2 returns failure. SHOP2's failure on gSHOP2 means that there are no possible ways to decompose gSHOP2 given the current domain knowledge and the input initial state, and therefore LPG should not consider the particular abstract action gSHOP2 in its later planning invocations. In this case, Duet inserts gSHOP2, along with the number of search steps generated so far, into the tabu list, and removes gSHOP2 from the current task network (Line 7).

If SHOP2 returns a plan π′, Duet inserts it into the current plan π and updates the HTN w that still needs to be accomplished. Note that at Line 8, if SHOP2 could successfully accomplish gSHOP2 without returning any goals to LPG, the returned HTN w′ would be the empty HTN, and there would be no update to the HTN w. If there is a goal gLPG for LPG (see Lines 10–15), Duet invokes LPG with this goal, the current state, the empty plan, and the current values of the tabu list and number of search steps. The modified LPG procedure (Figure 2) is essentially the same stochastic local search procedure of [8], with the following differences: the action graph is initialized using a (possibly non-empty) plan; the initial number of
search steps is an input number instead of zero; the action graphs can contain a new type of flaw (an abstract action), which is handled by just returning it to Duet together with the current plan and number of search steps (Line 6); and the search neighborhood is restricted to forbid the insertion of any abstract action in the input tabu list (Line 8). Note that at Line 5 the unsupported preconditions of an abstract action are selected before the action, and that, as in [8], the neighborhood selection at Line 9 is randomized and uses a heuristic function. There are three possible cases when LPG terminates:

• LPG tries to fix a flaw corresponding to an abstract action during its search and needs SHOP2 to decompose this abstract action into smaller tasks. In this case, LPG returns the current partial plan it has (π′), the abstract action for SHOP2 (gSHOP2), and the updated number n of performed search steps.

• LPG generates a solution plan with no abstract actions for the input goals gLPG. In this case, LPG's gSHOP2 output is empty.

• LPG fails because the search increases the input number of search steps n above the predefined maximum. In this case, Duet will return failure and can be restarted.

After the run of LPG, Duet updates the current plan π, the current task network w and the current world state s (Lines 12–14). If there are no immediate goals for SHOP2 or LPG (i.e., if both gSHOP2 = ∅ and gLPG = ∅), then Duet checks whether there are more tasks that need to be decomposed by SHOP2 (Lines 16–18) or any remaining flaws in the current plan that need to be fixed by LPG (Lines 19–23). In the former case, Duet invokes SHOP2 to plan for the HTN w that still needs to be accomplished. Note that, in this case, Duet gives SHOP2 the current partial plan as input (instead of the empty plan as in the above case). This is because if SHOP2 generates a plan for the input abstract action, then that plan must be a part of the solution. If the task network becomes empty and the current plan contains a flaw, Duet invokes LPG in its next iteration (see Line 20) with the initial planning problem, except that this time LPG starts with the current partial plan and attempts to generate a solution based on it, rather than starting from the empty plan.

The following theorem establishes Duet's soundness (we omit the proof due to space limitations).

Theorem 1 Let P = (s₀, g) be a classical planning problem, w₀ be a (possibly empty) HTN to accomplish the goals g, and M be a set of HTN methods. Suppose Duet(s₀, g, w₀, M) returns a plan π. Then π is a solution for the planning problem P.

Duet is not a complete planner (i.e., it may not find a solution to an input planning problem although one exists) for two reasons: (1) LPG, as a stochastic local search procedure, may return failure without finding any solution given the number of restarts and the bound parameter on the number of search steps; and (2) the HTNs provided as input for SHOP2 may not be complete, and even if they are, they may prune the solution away.
4 Experimental Evaluation
We compared LPG and SHOP2 with two versions of Duet, one supplied with extremely sparse domain knowledge, and the other with more detailed knowledge of one facet of the Museums domain. The planning operators for LPG in this domain are DRIVE-TRUCK, MOVE-TO-TRUCK, MOVE-FROM-TRUCK, and MOVE. The three move operators define a ToH subdomain where the pegs are the truck area and the two museum pallets.
[Figure 3: two plots over the number of objects (4–9) for lpg-solo, duet-simple, duet-specialist, and shop2-solo: average running time in seconds, and the number of problems left unsolved.]

Figure 3. In the first graph, each data point is the average running time on 50 randomly generated problems. The second graph shows how many times the planners failed to return plans within our 500-second deadline; each such failure was scored at 500 seconds in the first graph.
Duet with sparse domain knowledge, denoted DuetSimple, used SHOP2 to choose the order in which to relocate the objects, and LPG to plan how to move each object. Duet with rich domain knowledge of object-stacking, denoted DuetSpecialist, provided LPG with abstract actions LOAD and UNLOAD in place of the three primitive move operators. In this version, LPG controls the trucks and chooses which objects to pick up and drop off, where each pick-up/drop-off request is an abstract action handled by SHOP2.

Table 1. Sizes of the human-generated Museum domain descriptions for LPG, DuetSimple, and DuetSpecialist, and a SHOP2 HTN.

Planner          Total lines   Total characters   Total no. of tokens
LPG                       34               1658                   426
DuetSimple                70               2893                   694
DuetSpecialist           157               6573                  1534
SHOP2                    238               9549                  2254
To measure the complexity of the domain knowledge needed by the various planners, Table 1 gives several different measures of the sizes of the domain descriptions used by them. LPG requires only a description of the operators, while SHOP2 requires the operators and HTN methods to solve the Museum planning problems. DuetSimple and DuetSpecialist use a partial set of HTN methods: these methods can be used to generate plans for parts of a Museums planning problem, but they cannot solve the problem entirely.

There are three parameters affecting problem difficulty in the Museums domain: the number of museums, the connectivity of the museums, and the number of art objects to transport. We performed experiments for each case in which we fixed two of the parameters above and varied the other. In the cases where we varied the first two parameters, we did not observe a significant change in the relative performance of the planners, since these two cases emphasized the truck-movement subproblems in the Museums domain and all of our planners were able to solve truck-movement subproblems easily. All of our operator and HTN descriptions and other input files regarding our experimental setup are available online.³

³ See http://www.cs.umd.edu/~waisbrot/Duet

Figure 3 shows the results of our experiments with a varying number of objects, where we fixed the number of museums at 3 and generated complete graphs of museums. Each data point in this figure is the average of 50 randomly-generated planning problems. We set a time limit of 500 seconds for the planners, and we scored those runs that did not return a plan within the limit at 500 seconds. With increasing numbers of objects, LPG's local search frequently became trapped in local minima and was unable to produce any
http://www.cs.umd.edu/∼waisbrot/Duet
A. Gerevini et al. / Combining Domain-Independent Planning and HTN Planning: The Duet Planner
plan within the given CPU-time limit. For example, LPG began to struggle when the number of objects at any one museum went beyond 4, and out of the 50 9-object problems, it failed on 37. DuetSimple outperformed LPG slightly when they both solved a problem, but generally failed on most of the same problems as LPG, for the same reasons. One advantage of DuetSimple over LPG was an increase in reliability. Some of the plans produced by LPG included repetition of actions: picking an object up and then putting it back in the same place multiple times. LPG can be configured to do more planning iterations and produce an improved plan, but DuetSimple was able to produce a more directed plan in a single pass, saving time. DuetSpecialist dramatically outperformed both DuetSimple and LPG because it used domain-specific HTNs to solve the parts of the problem that involve object-stacking. While the object-stacking HTNs required human authoring, we did not give DuetSpecialist any HTNs for navigating between museums, choosing when objects should be picked up, or choosing where to place objects. DuetSpecialist solved all of the problems, and in most cases solved them faster than LPG. To run SHOP2 by itself, we needed to give it HTN methods both for stacking art objects and navigating the truck. It suffered from two major failings, due to the inexperience of the domain writer. First, the HTN methods focused on moving one art object at a time, rather than loading multiple objects onto the truck before attempting delivery. Second, the HTN methods were deeply recursive, so large problems caused the stack to overflow. Although the SHOP2 methods could be improved with additional time and experience, Duet produces good results with less effort on the part of the domain writer. One exception to Duet’s performance was that LPG outperformed it in the easiest problems. This is because of Duet’s loose coupling between SHOP2 and LPG, which made Duet easy to implement but made the communication from SHOP2 to LPG very expensive. Duet and SHOP2 are both written in LISP, so calls to SHOP2 to decompose a task were inexpensive, but calls to LPG, which is written in C, required spawning and later destroying a separate shell and process. Because of this expense, the easiest problems were completely solved by LPG before Duet was able to complete the necessary calls between planners. If both planners were packaged as libraries, this inter-planner communication cost would be significantly decreased.
5 Conclusions
We have described Duet, a new planner that incorporates adaptations of two well-known planners, LPG [8] and SHOP2 [14]. Duet combines LPG's domain-independent local search techniques with hierarchical domain knowledge in the form of SHOP2's Hierarchical Task Networks (HTNs). Duet starts with a planning problem consisting of an initial state, a goal condition, and a possibly empty set of tasks. During planning, Duet uses SHOP2 to decompose tasks into smaller subtasks, and LPG to satisfy goal conditions.
Our experiments with Duet in the Museums domain showed that even when Duet had only a small amount of domain-specific knowledge (e.g., "choose the least-fragile object and move it to the target museum first"), it still solved planning problems faster, on average, than LPG. With more problem-solving knowledge (e.g., how to properly manipulate stacks of art objects), Duet outperformed both LPG and SHOP2, in terms of both speed and the number of successfully solved problems. Getting SHOP2 to perform better would have required significantly more human effort to improve its knowledge base.
We are currently carrying out a further experimental evaluation of Duet. So far, we have run experiments using the Storage domain from the 2006 International Planning Competition and obtained similar results to those shown here. Although the Duet planning procedure described in this paper is based on SHOP2 and LPG, the ideas could easily be generalized to combine any planner that uses domain-specific knowledge with any domain-independent classical planner. Thus, a possible future direction is to extend Duet to work with planners such as FF [11], Fast Downward [10], and SGPlan [5]. Another direction is a tighter integration of SHOP2 and LPG, which would probably yield more efficient planning in Duet. Not only would this reduce the communication overhead between the planners, it would also allow Duet to provide a richer form of "knowledge transfer": the decisions that one planner makes during its planning time would depend more closely on the domain knowledge that the other one could provide.
Acknowledgments. This work was supported in part by DARPA's Transfer Learning and Integrated Learning programs and NSF grant IIS0412812. The opinions in this paper are those of the authors and do not necessarily reflect the opinions of the funders.
REFERENCES
[1] F. Bacchus and F. Kabanza, 'Using temporal logics to express search control knowledge for planning', Artificial Intelligence, 116(1-2), 123–191, (2000).
[2] A. L. Blum and M. L. Furst, 'Fast planning through planning graph analysis', Artificial Intelligence, 90(1-2), 281–300, (1997).
[3] B. Bonet and H. Geffner, 'Planning as heuristic search: New results', in ECP, Durham, UK, (1999).
[4] Adi Botea, Markus Enzenberger, Martin Müller, and Jonathan Schaeffer, 'Macro-FF: Improving AI planning with automatically learned macro-operators', JAIR, 24, 581–621, (2005).
[5] Y. Chen, C. Hsu, and B. Wah, 'Temporal planning using subgoal partitioning and resolution in SGPlan', JAIR, 26, 323–369, (2006).
[6] K. Currie and A. Tate, 'O-Plan: The open planning architecture', Artificial Intelligence, 52(1), 49–86, (1991).
[7] R. E. Fikes and N. Nilsson, 'STRIPS: A new approach to the application of theorem proving to problem solving', Artificial Intelligence, 2(3-4), 189–208, (1971).
[8] A. Gerevini, A. Saetti, and I. Serina, 'Planning through Stochastic Local Search and Temporal Action Graphs', JAIR, 20, 239–290, (2003).
[9] M. Ghallab, D. Nau, and P. Traverso, Automated Planning: Theory and Practice, Morgan Kaufmann, 2004.
[10] M. Helmert, 'The Fast Downward planning system', JAIR, 26, 191–246, (2006).
[11] J. Hoffmann and B. Nebel, 'The FF planning system: Fast plan generation through heuristic search', JAIR, 14, 253–302, (2001).
[12] S. Kambhampati, 'Model-lite planning for the web age masses: The challenges of planning with incomplete and evolving domain theories', in AAAI, Vancouver, Canada, (2007).
[13] J. Kvarnström and P. Doherty, 'TALplanner: A temporal logic based forward chaining planner', Annals of Mathematics and Artificial Intelligence, 30, 119–169, (2001).
[14] D. Nau, T. Au, O. Ilghami, U. Kuter, W. Murdock, D. Wu, and F. Yaman, 'SHOP2: An HTN planning system', JAIR, 20, 379–404, (2003).
[15] N. Nguyen, S. Kambhampati, and R. Nigenda, 'Planning graph as the basis for deriving heuristics for plan synthesis by state space and CSP search', Artificial Intelligence, 135(1-2), 73–124, (2002).
[16] S. Yoon and S. Kambhampati, 'Towards Model-lite Planning: A Proposal For Learning & Planning with Incomplete Domain Models', in Proc. ICAPS-07 Workshop on AI Planning and Learning, Providence, RI, (2007).
[17] D. E. Wilkins, Practical Planning: Extending the Classical AI Planning Paradigm, Morgan Kaufmann, San Mateo, CA, 1988.
Learning in Planning with Temporally Extended Goals and Uncontrollable Events

André A. Ciré1 and Adi Botea2

Abstract. Recent contributions to advancing planning from the classical model to more realistic problems include using temporal logic such as LTL to express desired properties of a solution plan. This paper introduces a planning model that combines temporally extended goals and uncontrollable events. The planning task is to reach a state such that all event sequences generated from that state satisfy the problem's temporally extended goal. A real-life application that motivates this work is to use planning to configure a system in such a way that its subsequent, non-deterministic internal evolution (nominal behavior) is guaranteed to satisfy a condition expressed in temporal logic. A solving architecture is presented that combines planning, model checking and learning. An online learning process incrementally discovers information about the problem instance at hand. The learned information is useful both to guide the search in planning and to safely avoid unnecessary calls to the model checking module. A detailed experimental analysis of the approach presented in this paper is included. The new method for online learning is shown to greatly improve the system performance.
1 Introduction
Recent years have seen an increased interest in advancing planning from the classical model to extensions such as using temporal logic to express desired features of a correct plan. Search in a classical planning problem can be guided with control rules expressed in temporal logic [1]. The international planning competition IPC-5 [6] has introduced hard and soft constraints, expressed in temporal logic, that finite plans should satisfy. Computing cyclic solutions to problems with temporally extended goals is presented in [10]. Previous contributions to planning such as these apply temporal logic reasoning along a (candidate) solution plan that is either a finite or a cyclic sequence of actions. In contrast, this paper addresses a problem where temporal logic is applied to the future behavior of a system after a goal state is reached. Specifically, the temporal goal of a problem must be satisfied by all sequences of events that originate in a goal state. Events are transitions in the problem state space that are not under the control of the planning agent.
A real-life application that motivates this research is the automated configuration of a composite system such as a power grid or a network of water pipes. A composite system is a collection of interacting components. Assume it has a nominal behavior, a non-deterministic evolution in the state space where all transitions are uncontrollable events. Even though planning cannot control the events directly, it can impact the nominal behavior by configuring elements of the system structure such as the connections between components. (1 Institute of Computing, University of Campinas, Brazil. 2 NICTA and Australian National University, Canberra, ACT.) Configuring the system in a specific way does not necessarily imply that the subsequent nominal behavior is fully determined. Generally, many event trajectories can originate from a given configuration. The planning task is to configure the system in such a way that its subsequent nominal behavior satisfies the goal condition on every possible event sequence. The configuration step is useful in a number of scenarios, such as the initial configuration of a system, a reconfiguration to recover from a failure, a reconfiguration to grow or reduce the size of a system, and a reconfiguration to adapt to a new goal condition. As soon as a solution is found, the planning agent no longer interferes with the system unless a reconfiguration becomes necessary at some point in the future.
Contributions. This paper introduces a new planning model that combines temporally extended goals and uncontrollable events. A solving approach is presented that incrementally learns new information about a problem instance and uses it to improve performance. The architecture contains a planning component, a model checking component and an online learning component. Planning explores the problem space where transitions are actions and enumerates candidate goal states. A model checking round tests whether all event sequences that originate in a candidate goal state satisfy the temporally extended goal. If the test succeeds, a solution has been found. Otherwise, at least one event sequence exists for which the goal formula does not hold. The learning step analyzes such event sequences. New information is extracted, which is used both to guide the planning and to avoid unnecessary model checking rounds. The performance of a system that implements the ideas presented in this paper is analyzed empirically in detail. The new method for incrementally learning information about a problem instance is shown to greatly improve both the planning effort and the total number of model checking rounds.
2 Related Work
Planning systems such as TLPlan [1] and TALplanner [13] are capable of handling a large problem space by using search control rules formulated in temporal logic. MIPS [16], SGPlan [9] and HPlan-P [2] are examples of systems that can handle hard and soft constraints (preferences) related to a planning goal. This research direction was mainly encouraged by a track added to the 2006 International Planning Competition (IPC-5), in conjunction with PDDL3 [6]. A method able to generate cyclic plans that satisfy a temporally extended goal can be found in [10]. In path planning, temporal logic can encode constraints that a trajectory computed for a mobile unit (e.g., a robot) should satisfy [5]. As in previous work
such as [7, 10, 16], we convert LTL formulas into Büchi automata. Two major features distinguish our work from all the contributions mentioned earlier: (1) our system is capable of learning from trajectories where an extended goal does not hold; and (2) we apply our ideas to a new planning problem, where a deterministic planning component is followed by a non-deterministic evolution generated by uncontrollable events. In particular, we reason about LTL goals in the presence of events, whereas the IPC-5 domains with extended goals and preferences are deterministic.
In reactive planning, actions are executed in response to event occurrences. Reactive planning in problems with extended goals expressed in Metric Temporal Logic (MTL) is the topic of [3, 4]. There is an important distinction between the problem that we address and fields such as reactive planning and controller synthesis. In the latter cases no goal state is defined, whereas we need to reach a goal state where the planning (configuration) is completed and the subsequent system evolution (nominal behavior) respects the temporal goal. Generating a control strategy consistent with an LTL formula in a non-deterministic environment is the topic of [12]. The value of that contribution seems to be mainly theoretical. It provides a translation of the original problem into an LTL game but indicates no heuristics or other enhancements that would be necessary to scale up the performance of a solver. It reports neither experiments nor an actual implementation of the theoretical ideas.
A high-level theme that our learning approach shares with explanation-based learning (EBL) is learning from counterexamples. Our work differs significantly from previous work on EBL in the planning problem addressed and in the ways that new information is extracted and subsequently used. For example, the topic of [11] is learning from Graphplan dead-ends in classical planning, whilst we focus on learning from bad event sequences in planning with temporal goals and uncontrollable events. Model-based self-configuration, a problem related to our work, is addressed in [17]. That work does not consider temporally extended goals. It can be seen as a form of EBL, since it attempts to make a search more informed as more conditions conflicting with goal states are discovered.
3 Problem Definition and Background
The planning model addressed in this work is a structure ⟨S, s0, ϕ, γ, A, E⟩ with S a finite state space, s0 ∈ S an initial state, and ϕ a temporal logic formula that describes the goal. The function γ : S × (A ∪ E) → S models deterministic transitions in the state space. The transitions are partitioned into a set of actions A (i.e., transitions under the control of the planner), and a set of uncontrollable events E that define the nominal behavior of a system. The search space that has the initial problem state as a root node and uses only actions as transitions is called the problem planning space. The space that is rooted in a given state s and uses only events as transitions is called the event space of state s.
The state space associated with a problem is defined using a fixed collection of boolean variables called atoms. Each state is a complete assignment to the atoms defined for that problem. Equivalently, a state s can be defined as the set of all atoms that are true in s (closed world assumption). Following the STRIPS representation, each action (event) a has a set of preconditions pre(a), a set of positive effects add(a) and a set of negative effects del(a). An action (or event) a is applicable in a state s if s |= pre(a). In such a case, γ(s, a) = (s \ del(a)) ∪ add(a). Otherwise, γ(s, a) is undefined. A sequence of actions (events) a1, a2, ..., ak is applicable in a state s if a1 is applicable in s, a2 is applicable in γ(s, a1) and so on. For a sequence of actions
(events) π = a1, ..., ak that is applicable in a state, the precondition of the entire sequence pre(π) is the union of all atoms p such that (∃i ∈ {1 ... k}) : (p ∈ pre(ai) ∧ (∀j < i) p ∉ add(aj)). The planning task is to find a finite sequence of actions that can be applied in s0 and that reaches a goal state. A state s ∈ S is a goal if every event sequence applicable in s satisfies the temporal goal ϕ. A sequence that does not satisfy ϕ is called a bad event sequence.
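To make the STRIPS-style definitions above concrete, here is a minimal Python sketch of γ, applicability, and pre(π); the class and function names are our own illustration, not code from the paper.

from dataclasses import dataclass

# Hypothetical encoding: a state is a frozenset of atoms, and the same
# structure serves for both actions and events.
@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset    # pre(a)
    add: frozenset    # add(a)
    dele: frozenset   # del(a)

def applicable(state, a):
    # s |= pre(a): every precondition atom is true in s
    return a.pre <= state

def apply_action(state, a):
    # gamma(s, a) = (s \ del(a)) U add(a); undefined otherwise
    assert applicable(state, a)
    return (state - a.dele) | a.add

def sequence_precondition(seq):
    # pre(pi): atoms p with p in pre(a_i) for some i and p not in add(a_j) for all j < i
    pre, added = set(), set()
    for a in seq:
        pre |= (a.pre - added)
        added |= a.add
    return pre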
4 Solving Approach
The architecture outlined in Algorithm 1 contains three main modules. Planning explores the planning space and enumerates candidate goal states. Model checking explores the event space of a candidate goal state s to check whether it satisfies the temporally extended goal of the problem ϕ. If the test returns a positive answer, a solution has been found. Otherwise, the online learning component attempts to extract a sufficient condition that explains the negative result of the most recent model checking round. The system incrementally learns information about a problem instance that is used to speed up the solving process.
The learned information I is represented as an atemporal boolean formula. A state s with the property s |= I is guaranteed not to satisfy the goal formula ϕ. The boolean formula I is used in two parts of the algorithm, each with a great contribution to the system performance. Firstly, no model checking rounds need to be performed in states s with s |= I. Secondly, ¬I can be used as a reachability goal in the planning component, allowing the computation of relaxed plans that steer the search away from states that are guaranteed not to be goals. As a problem definition contains no explicit reachability goals, no other information besides ¬I is used as a goal when building relaxed plans. Standard algorithms that compute relaxed plans, such as the one implemented in the FF planning system [8], work only with conjunctive reachability goals. As in Rintanen's work [15], FF's method is extended to handle goals such as ¬I, which can be an arbitrary boolean formula.
In general, a relaxed plan could be used to compute a heuristic distance from a current state to a goal state, and to partition the successors of a node into helpful nodes (i.e., nodes obtained from applicable actions that are also part of the parent's relaxed plan) and rescue nodes (all other valid successors). In this paper, two open queues are used, one for helpful and another for rescue nodes; a rescue node is expanded only when the helpful open queue is empty (see the sketch below). No heuristic values are associated with nodes. The reason is that, in this problem, the reachability goal ¬I varies in time. Nodes evaluated early might have better heuristic values just because they were computed when the reachability goal was more relaxed.
When ¬I is used as a reachability goal in planning, lines 6 and 7 in Algorithm 1 are redundant, since sg |= ¬I holds for every candidate goal state sg. The lines are added to the pseudocode to emphasize more clearly that model checking is triggered only for a small fraction of the states visited in planning.
The next discussion assumes that Linear Temporal Logic (LTL) goals are used. Model checking is implemented as a breadth-first search in order to discover bad event sequences of minimal length. Shorter bad event sequences can allow the system to learn information that has fewer conjunctive conditions and hence is more generally applicable (see the details about learning later in this section). For the sake of clarity, assume that each application of an event in the model checking search is performed together with both a normal (usual) progression of ϕ and a progression in the Büchi automaton corresponding to ϕ. Büchi progression is a standard approach also adopted, for example, in [10]. Other model checking methods (e.g., SAT-based [14]) can be used, but the actual choice is not a major point of this research.
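The sketch below illustrates the two-queue expansion strategy just described; it is our own minimal rendering, and assumes the planner supplies successors and an is_helpful test (membership of the generating action in the parent's relaxed plan).

from collections import deque

def select_and_expand(helpful_q, rescue_q, successors, is_helpful):
    # A rescue node is expanded only when the helpful queue is empty.
    node = helpful_q.popleft() if helpful_q else rescue_q.popleft()
    for child in successors(node):
        # Helpful: obtained from an applicable action that is also part of
        # the parent's relaxed plan; everything else is a rescue node.
        (helpful_q if is_helpful(node, child) else rescue_q).append(child)
    return node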
Algorithm 1 Architecture overview.
1: I ← false {initialize learned info}
2: while true do
3:   (sg, π) ← SearchForNextCandidateGoalState() {planning; π is the action sequence from s0 to sg}
4:   if no state sg is found then
5:     return no solution
6:   if sg |= I then
7:     continue {no need for a costly model checking round}
8:   ModelChecking(sg) {run a model checking round}
9:   if model checking succeeds then
10:    return π
11:  else
12:    I ← I ∨ ExtractInfo() {learning}
As explained in this section and demonstrated empirically in the next section, we improve the model checking component of the algorithm by dramatically reducing the total number of model checking rounds, not the effort spent in one individual round.
In the model checking component, the event sequences that originate in a candidate goal state sg are split into four categories, one corresponding to paths that satisfy ϕ and three corresponding to bad event sequences. Bad event sequences are: L-paths, sequences that end with a leaf node (i.e., a node where no events can be applied) before the normal progression reduces ϕ to either true or false; F-paths, sequences along which the normal progression reduces ϕ to false; and C-paths, where a cycle is created and ϕ is never satisfied. As soon as one bad event sequence is discovered, the corresponding round of model checking returns. If desired, the procedure could attempt to discover several bad event sequences, allowing more information to be learned from one round.
The rest of this section focuses on the learning method. Learning is triggered each time model checking discovers an event sequence πe that is either an F-path or a C-path. No information is extracted from L-paths: information extracted from an L-path might be too specific to sg, since it would have to explain why none out of potentially many events is applicable in the leaf node. The information extraction aims at detecting a boolean formula c such that sg |= c and c is sufficient to explain the failure of ϕ along the sequence πe. More specifically, c should imply both of the following conditions: (1) πe is applicable in sg; and (2) ϕ does not hold along the sequence πe. As indicated in Algorithm 2, the formula c is initialized to pre(πe) to ensure that c implies condition (1). To imply condition (2), c is extended with zero or more conjunctive literals l. It is desirable to minimize the number of added literals, as a smaller formula c is more generally applicable and thus more model checking rounds can be avoided in the future.
To compute a set of literals to be added to c, a variation of progression called event-specific progression is introduced. Consider a state si obtained after applying the first i ≥ 1 steps of πe. The event-specific progression to si from the previous step is equivalent to the normal progression, except that it postpones the instantiation of certain atoms, as explained next. The normal progression can be defined recursively, starting from atoms and moving to more and more complicated formulas. For the complete set of rules, see for example [1]. Only the case of atomic formulas needs to be discussed here. At the atomic level,
prog(p, si) = true if si |= p and prog(p, si) = false if si |= ¬p. In other words, all occurrences of atoms in the progressed formula that are not inside a temporal operator are replaced by their actual truth values in the corresponding state.
The event-specific formula progression applies different rules at the atomic level. For each atom p in the initial problem definition, define a new variable p0. Define a set of atoms Zi as pre(πe) ∪ eff(e1) ∪ del(e1) ∪ ... ∪ eff(ei) ∪ del(ei). Being independent from the first i steps of πe, atoms q ∉ Zi preserve their value all the way from sg to si. For an atomic formula p, the event-specific progression is defined as eprog(p, si) = p0 if p ∉ Zi, and eprog(p, si) = prog(p, si) ∈ {true, false} if p ∈ Zi. The progression rules for more complicated, non-atomic formulas are the same as in normal progression. Usual simplifications such as true ∨ α = true are useful to eliminate irrelevant occurrences of new variables p0 that might exist in α.
Event-specific progression of ϕ along πe is performed step by step for t times, the same number of steps that normal progression was performed before detecting that πe was a bad event sequence. The resulting formula is denoted by eprog(ϕ, πe, t). Consider that P is the set of all new boolean variables p0 added during event-specific progression. Each element p0 ∈ P generates one literal to be added to c as a new conjunction. If p is true in sg, then p is added to c. Otherwise, ¬p is the newly created literal.
It can be shown that the condition c computed as before implies both conditions (1) and (2). Implying condition (1) is obvious from the way c is initialized. A formal proof for condition (2) is skipped to save space. The intuition is that the only atoms that could possibly impact the normal formula progression of ϕ along πe are those determined by pre(πe) (i.e., atoms in Zt) and atoms p with p0 ∈ P. The condition c is the assignment in sg of the atoms in pre(πe) ∪ {p | p0 ∈ P}.
Before creating the literals to be added to c, P can be reduced with a greedy procedure that is linear in the size of P. The correctness of the extracted information c is preserved in the sense that it still implies conditions (1) and (2). A formula β is initialized to eprog(ϕ, πe, t). The procedure iteratively selects one variable p0 from P and instantiates it in β with the value of p in sg. This is repeated until β becomes equivalent to prog(sg, πe, ϕ, t), the formula obtained by normal progression from sg along πe for t steps. The variables in P that were not instantiated in this loop can safely be skipped when the literals are generated. The condition on line 7 of Algorithm 2 is easy to check for F-paths, since prog(sg, πe, ϕ, t) = false. The implemented system skips the greedy reduction of P for C-paths. It would be possible to address this, but the experiments reported next did not indicate a performance bottleneck caused by this choice.
As a simple example, if eprog(ϕ, πe, t) is false, then no additional information is added to c besides the existing part pre(πe). In such a case, regardless of the values of other variables in sg, the preconditions and the effects of the event sequence alone are enough to progress ϕ to false.
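As an illustration of the atomic-level rules, the following sketch contrasts normal and event-specific progression of an atom; the representation (atoms as strings, a ('p0', p) pair for a postponed variable) is our assumption, not the paper's.

def prog_atom(p, s_i):
    # normal progression: replace the atom by its truth value in s_i
    return p in s_i

def eprog_atom(p, s_i, Z_i):
    # event-specific progression: atoms outside Z_i are postponed as a
    # fresh variable p0; atoms in Z_i are instantiated as usual
    return ('p0', p) if p not in Z_i else prog_atom(p, s_i)

def Z(prefix, pre_pi):
    # Z_i = pre(pi) U eff(e1) U del(e1) U ... U eff(ei) U del(ei)
    z = set(pre_pi)
    for e in prefix:          # the first i events of pi_e
        z |= e.add | e.dele   # positive and negative effects
    return z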
5 Experimental Results
The first part of this section introduces a new benchmark domain. Our experiments are described next. The last part of the section contains the results and their analysis.
Benchmark and Setup. Among the many available planning benchmarks, we are not aware of the existence of an encoding that is suitable for the model presented in Section 3, which includes
a deterministic planning stage (configuration) followed by a non-deterministic evolution in the event space (nominal behavior). A new domain has been designed to carry out the experimental evaluation presented in this section. Because of lack of space, only a brief description is included here. The website http://abotea.rsise.anu.edu.au/factory-benchmark/ contains a detailed presentation and the source code of a problem generator.
Each problem instance contains a collection of components split into two categories: machines and repositories. At most two repositories can be connected to a machine at a time. A repository cannot simultaneously be connected to more than one machine. Each repository stores raw material of a certain type and can transfer batches of it to a connected machine. A machine can combine two types of raw material to generate a final product. Planning actions consist of both changing connections between repositories and machines, and component-specific operations such as cleaning a machine. The nominal behavior of a system includes transferring raw products from a repository to a machine, and creating final products from combinations of raw materials. Furthermore, certain combinations of raw products can break a machine that is not clean. In experiments, a temporally extended goal, expressed in LTL, is a conjunction of conditions such as never break a machine and eventually generate certain products.
The code is implemented in Java 1.6. Büchi automata are built using the LTL2BA package, available at http://www-i2.informatik.rwth-aachen.de/Research/RV/ltl2ba4j/index.html. The experiments are carried out on a 3.4 GHz machine, with 1.8 GB allocated to the heap memory and 1.8 GB assigned to the stack memory. The time limit is 15 minutes per problem. We are not aware of any existing system designed for the problem addressed in this paper. In the current experiments, the new solver is compared against a basic version where the learning component is switched off.
A set of 350 problem instances is created as follows. The number of repositories r is fixed to 4 and the number of machines m varies from 4 to 10. For each combination (r, m), 50 problems are generated. The LTL goal formulae range in size from 5 to 15 conjunctive conditions. The parameters r and m are chosen in such a way that the problems gradually scale up until the basic solver reaches its limits within the given time and memory constraints. The problem collection contains both instances with solutions and instances that can be proven unsolvable within the allocated resource limits. The latter category is useful to evaluate the impact of learning on reducing the number of model checking rounds. When no goal state exists,
both system versions have to visit all states in the planning space and the difference in overall performance is mostly explained by the number of model checking rounds.
Results. Figure 1 shows the total running time for instances that are proven unsolvable. Each data point in a curve corresponds to one problem instance. The problems are ordered to obtain a monotonically increasing curve for the basic solver. Learning improves the number of model checking rounds. As explained before, the number of nodes in planning search is not affected in such problems. Processing one node in informed planning (i.e., in the system with learning enabled) is more expensive, since a relaxed plan has to be computed. The overall improvement achieved by learning in this subset appears to be almost constant across the problem range.
In instances where a solution is found (Figure 2), learning improves not only the number of model checking rounds but also the number of nodes expanded in planning. As compared to Figure 1, the speed-up factor increases as the problems get larger. The largest improvement in this set reaches two orders of magnitude.
Given a problem instance, assume that (P, M, L) gives the percentage that each system module (i.e., planning, model checking, learning) contributes to the total running time. (P(m), M(m), L(m)) is the average over the problems with m machines. When m varies from 4 to 10, L(m) is stable around a value of 3 to 4%. P(m) slightly increases from 70% to 80%. When learning is switched off, the only modules that contribute to the total running time are planning and model checking. The average weight of the planning time slightly increases from 55% when m = 4 to 60% when m = 10.
Algorithm 2 Learning step in pseudocode.
1: c ← pre(πe)
2: P ← all new variables p0 in eprog(ϕ, πe, t)
3: if perform greedy reduction of P (optional) then
4:   β ← eprog(ϕ, πe, t)
5:   PN ← P
6:   P ← ∅
7:   while not (β ≡ prog(sg, πe, ϕ, t)) do
8:     select p0 ∈ PN
9:     instantiate p0 in β with p's value in sg
10:    remove p0 from PN and add it to P
11: for each p0 ∈ P do
12:   l ← (sg |= p) ? (p) : (¬p)
13:   c ← c ∧ l
14: return c
Figure 1. Time for instances with no solution. Note the logarithmic scale. (Time in seconds per instance; curves: Basic and Learning.)
Learning keeps the number of model checking rounds at very small values, whereas the basic system faces an exponential growth as problems increase in difficulty. Figure 3 illustrates this for problems with solutions. The situation is very similar for problems with no solution; the corresponding chart is skipped to save space.
When learning is switched off, planning search is equivalent to breadth-first search, which is guaranteed to find solutions of optimal length. Figure 4 presents the quality of solutions computed by the system with learning enabled. The problems with solutions solved by both systems are included in this summary. The sub-optimality of a solution is measured as ((l − o)/o) × 100, where l is the actual length and o is the optimal length found with breadth-first search. In Figure 4, each bar counts how many problems fit into the corresponding sub-optimality range. The data indicate that a majority of the solutions found by the learning system are optimal.
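As a quick illustration of the measure (ours, not an example from the paper): a 12-step plan against a 10-step optimum is 20% sub-optimal.

def suboptimality(l, o):
    # ((l - o) / o) * 100, with o the optimal length from breadth-first search
    return (l - o) / o * 100.0

assert suboptimality(12, 10) == 20.0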
Figure 2. Time for instances with solutions on a logarithmic scale. (Time in seconds per instance; curves: Basic and Learning.)
Figure 3. Model checking rounds for instances with solution. (Rounds per instance; curves: Basic and Learning.)

Figure 4. Solution quality when learning is used.

6 Conclusion and Future Work

Advancing recent contributions that extend classical planning with temporal logic, this paper focuses on a planning model that combines temporally extended goals with uncontrollable events. The model is a generic encoding of a real-life application where a system should automatically be configured such that its future nominal behavior respects a given condition expressed in temporal logic. A solving architecture that combines elements of planning, model checking and learning is presented and analyzed in detail. An online learning procedure builds up information that is used both as a reachability goal in planning search and as a condition to safely skip unnecessary model checking rounds. In experiments, the incrementally learned information makes a great contribution to speeding up the solving process.
Future work includes integrating the planning method presented in this paper with monitoring and diagnosis algorithms. The latter monitor a system to decide whether the nominal behavior is the desired one. When faults are detected, the planning method changes the system into a correct configuration.

7 Acknowledgment

NICTA is funded by the Australian Government's Department of Communications, Information Technology, and the Arts and the Australian Research Council through Backing Australia's Ability and the ICT Research Centre of Excellence programs. This work was initiated when the first author was a visiting student at NICTA. We thank Patrik Haslum, Sophie Pinchinat, Jussi Rintanen and Sylvie Thiébaux for useful discussions on this topic.
REFERENCES
[1] F. Bacchus and F. Kabanza, 'Using Temporal Logics to Express Search Control Knowledge for Planning', Artificial Intelligence, 116(1-2), 123–191, (2000).
[2] J. Baier, F. Bacchus, and S. McIlraith, 'A Heuristic Search Approach to Planning with Temporally Extended Preferences', in Proceedings of IJCAI-07, pp. 1808–1815, (2007).
[3] M. Barbeau, F. Kabanza, and R. St-Denis, 'Synthesizing Plant Controllers Using Real-Time Goals', in IJCAI-95, pp. 791–798, (1995).
[4] M. Barbeau, F. Kabanza, and R. St-Denis, 'A Method for the Synthesis of Controllers to Handle Safety, Liveness, and Real-Time Constraints', IEEE Transactions on Automatic Control, 43(11), 1453–1559, (1998).
[5] G. E. Fainekos, H. Kress-Gazit, and G. J. Pappas, 'Hybrid Controllers for Path Planning: A Temporal Logic Approach', in Decision and Control, and European Control Conference CDC-ECC-05, 4885–4890, (2005).
[6] A. Gerevini and D. Long, 'Plan Constraints and Preferences for PDDL3', Technical report, University of Brescia, (2005).
[7] G. De Giacomo and M. Y. Vardi, 'Automata-Theoretic Approach to Planning for Temporally Extended Goals', in Proceedings of ECP-99, pp. 226–238, (1999).
[8] J. Hoffmann and B. Nebel, 'The FF Planning System: Fast Plan Generation Through Heuristic Search', JAIR, 14, 253–302, (2001).
[9] C. W. Hsu, B. W. Wah, R. Huang, and Y. X. Chen, 'Handling Soft Constraints and Preferences in SGPlan', in ICAPS Workshop on Preferences and Soft Constraints in Planning, pp. 54–57, (2006).
[10] F. Kabanza and S. Thiébaux, 'Search Control in Planning for Temporally Extended Goals', in Proceedings of ICAPS-05, pp. 130–139, (2005).
[11] S. Kambhampati, 'Improving Graphplan's Search with EBL and DDB Techniques', in Proceedings of IJCAI, pp. 982–987, (1999).
[12] M. Kloetzer and C. Belta, 'Managing non-determinism in symbolic robot motion planning and control', in Robotics and Automation-07, pp. 3110–3115, (2007).
[13] J. Kvarnström and M. Magnusson, 'TALplanner in IPC-2002: Extensions and Control Rules', JAIR, 20, 343–377, (2002).
[14] T. Latvala, A. Biere, K. Heljanko, and T. Junttila, 'Simple Bounded LTL Model Checking', in Proceedings of Formal Methods in Computer-Aided Design (FMCAD'2004), pp. 186–200, (2004).
[15] J. Rintanen, 'Unified Definition of Heuristics for Classical Planning', in Proceedings of ECAI-06, pp. 600–604, (2006).
[16] S. Edelkamp, S. Jabbar, and M. Nazih, 'Large-Scale Optimal PDDL3 Planning with MIPS-XXL', in Proceedings of the International Planning Competition IPC-05, (2006).
[17] B. C. Williams and P. P. Nayak, 'A Model-based Approach to Reactive Self-Configuring Systems', in Proceedings of AAAI-96, pp. 971–978, (1996).
A Simulation-based Approach for Solving Generalized Semi-Markov Decision Processes

Emmanuel Rachelson1 and Gauthier Quesnel and Frédérick Garcia and Patrick Fabiani

1 ONERA, France, email: emmanuel.rachelson@onera.fr

Abstract. Time is a crucial variable in planning and often requires special attention since it introduces a specific structure along with additional complexity, especially in the case of decision under uncertainty. In this paper, after reviewing and comparing MDP frameworks designed to deal with temporal problems, we focus on Generalized Semi-Markov Decision Processes (GSMDP) with observable time. We highlight the inherent structure and complexity of these problems and present the differences with classical reinforcement learning problems. Finally, we introduce a new simulation-based reinforcement learning method for solving GSMDP, bringing together results from simulation-based policy iteration, regression techniques and simulation theory. We illustrate our approach on a subway network control example.
1 Introduction

Many problems in planning present both the features of decision under uncertainty and time-dependency. Imagine, for instance, having to plan the exploitation of a subway network, where available actions only consist in introducing or removing trains from service. In this problem, the goal is to maximize the number of passengers going through the network while minimizing the exploitation cost of the subway. Passenger arrival times, movements going in and out of the trains and possible delays in the system make the outcome of every action uncertain with regard to the next state and the date of the next decision epoch. On top of that, the flow of passengers and their destinations depend greatly on the time of day. All this defines the kind of problems we try to capture as Temporal Markov Problems. These problems cover a wide variety of other applications, such as onboard UAV coordination or airport taxiway management.
Problems of decision under uncertainty are commonly modelled as Markov Decision Processes (MDP). Recent work on solving large state-space MDP includes, for example, factored MDP methods, approximate linear programming, hierarchical approaches, reinforcement learning, etc. Temporal Markov Problems, however, have received little attention from the planning and machine learning communities, even though simulation seems a promising approach to tackling these problems. This paper presents formalisation and algorithmic issues about Temporal Markov Problems and proposes a simulation-based algorithm designed to solve them. In section 2, we review the models adapted from Markov Processes and designed to include time-dependency and decision making. Building on this first section's conclusions, we focus on controlling Generalized Semi-MDP (GSMDP). Section 4 presents our algorithm and discusses the issues and interests of simulation-based approaches for GSMDP. We illustrate our approach on the subway control example in section 4.3 and conclude in section 5.
2 Temporal Markov Problems

MDP have become a popular model for describing problems of planning under uncertainty. Formally, an MDP is composed of a 4-tuple ⟨S, A, P, r⟩, where S is a countable set of states for the system, A is the countable set of possible actions, P(s′|s, a) is a probability distribution function providing the transition model between states (as in a Markov Process, but conditioned on the action a) and r(s, a) is a reward value associated with the (s, a) transition, used to build criteria and to evaluate actions and policies. Solutions to MDP problems are often given as Markovian policies π, namely functions that map current states to actions. One can introduce criteria to evaluate these policies, such as the discounted reward criterion given in equation 1. Criteria permit the definition of the value function V^π associated with a policy. An important result concerning MDP is that for any history-dependent policy, there exists a Markovian policy which is at least as good with regard to a given criterion. Consequently, one can safely search for optimal control policies in the restricted space of Markovian policies without loss in optimality. Finally, algorithms such as value iteration or policy iteration are based on the fact that the optimal policy's value function V* obeys Bellman's optimality equation 2 [1].

V_γ^π(s) = E[ Σ_{δ=0}^{∞} γ^δ r(s_δ, π(s_δ)) ]    (1)

V*(s) = max_{a∈A} [ r(s, a) + γ Σ_{s′∈S} P(s′|s, a) V*(s′) ]    (2)
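For illustration, here is a minimal value-iteration sketch built directly on equation (2); the dictionary-based encoding of P and r is our assumption, not a representation from the paper.

def value_iteration(S, A, P, r, gamma=0.95, eps=1e-6):
    # P[s, a] maps successor states s2 to P(s2|s, a); r[s, a] is the reward
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            # Bellman backup: max_a [ r(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
            v = max(r[s, a] + gamma * sum(p * V[s2] for s2, p in P[s, a].items())
                    for a in A)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < eps:
            return V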
2.1 Including continuous time in the MDP framework

Introducing time in Markov Process (MP) models — and in their decisional counterparts, MDP — can be done by defining stochastic durations between decision epochs. In a standard MP or MDP, the sojourn time in a given state is one and decision epochs occur at integer time values (thus yielding the γ^δ in the discounted criterion). Allowing the sojourn time in a given state to be continuous and stochastic defines the Semi-MP, or Semi-MDP, formalism. In an SMDP [11], state sojourn time is described through a distribution F(τ|s, a) indicating the time before transition, provided that we undertake action a in state s. Therefore, an SMDP is a 5-tuple ⟨S, A, P, F, r⟩ which corresponds to a Markov Process but with stochastic state sojourn times. Policies for the control of SMDP can be computed using standard MDP algorithms since solving a discounted reward SMDP turns
out to be equivalent to performing an integration over expected transition durations and solving a total reward MDP. This is mainly due to the independence between the state sojourn time τ and the arrival state s′. This very strong assumption was lifted in the Time-dependent MDP (TMDP) model of [2] and generalized recently in the XMDP model of [13]. Formally, an XMDP is described by a 4-tuple ⟨S, A, p, r⟩ where the state space S can be composed of discrete and continuous variables and may include the process' time, A is a continuous or discrete parametric action space and p and r correspond to transition and reward models for states of S and actions of A. [13] proved that XMDP obey an optimality equation similar to equation 2, thus proving that standard algorithms such as value iteration can safely be used to solve XMDP. Using the XMDP representation, one can model any stochastic decision process with continuous observable time and hybrid state and action spaces. This seems to suit our Temporal Markov Problems well, and some recent techniques for solving hybrid state space MDP ([6, 4]) could be applied here. However, writing transition and duration functions for Temporal Markov Problems is often a very complex task and requires a lot of engineering. For instance, the effect of a RemoveTrain action on the global state of the subway problem is the result of several concurrent processes: the passenger arrivals, the train movements, the removal of one train, etc. All of these compete to change the system's state, and it is a complex task to summarize all these processes' concurrent stochastic influence in the transition and duration functions.
2.2 Concurrency and MDP

In the stochastic processes literature, concurrent Markov processes are modelled as Generalized Semi-Markov Processes (GSMP) [5]. A GSMP is a natural representation of several concurrent SMP affecting the same state space. [16] introduced Generalized Semi-Markov Decision Processes (GSMDP) in order to model the problem of decision under uncertainty where actions compete with concurrent uncontrollable stochastic events. A GSMDP describes a problem by factoring the global transition function of the process into the different stochastic contributions of concurrent events. This makes GSMDP an elegant and efficient way of describing the complexity of Temporal Markov Problems. We will therefore focus on solving time-dependent GSMDP from now on and will give a more formal definition of GSMDP in section 3.

Figure 1. From MP to GSMDP: adding continuous sojourn time takes MP to SMP (and MDP to SMDP); adding concurrency takes SMP to GSMP (and SMDP to GSMDP); adding actions takes MP, SMP and GSMP to MDP, SMDP and GSMDP respectively.
3 GSMP and GSMDP

The previous section illustrated how Temporal Markov Problems need both continuous observable time models and an efficient representation of concurrency in order to represent the complexity of the phenomena at stake. In this section, we focus on the GSMDP formalism with observable time. We define control policies and the associated state variable issues, and present resolution methods.

3.1 Concurrent processes

We start from the stochastic process point of view, with no decision making. Formally, a GSMP [5] is described by a set S of states and a set E of events. At any time, the process is in a state s and there exists a subset Es of events that are called active or enabled. These events represent the different concurrent processes that compete for the next transition. To each active event e, we associate a clock ce representing the duration before this event triggers a transition. This duration would be the sojourn time in state s if event e were the only active event. The event e* with the smallest clock ce* (the first to trigger) is the one that takes the process to a new state. The transition is then described by the transition model of the triggering event: the next state s′ is picked according to the probability distribution Pe*(s′|s). In the new state s′, events that are not in Es′ are disabled (which actually implies setting their clocks to +∞). For the events of Es′, clocks are updated the following way (see the sketch at the end of this subsection):

• If e ∈ Es \ {e*}, then ce ← ce − ce*
• If e ∉ Es or if e = e*, pick ce according to Fe(τ|s′)

The first active event to trigger then takes the process to a new state, where the above operations are repeated. A first important remark concerning GSMP is that the overall process no longer retains Markov's property: knowing the current state s is not sufficient to predict the distribution over the next state of the process. [9] showed that by augmenting the state space with the events' clocks, one can retain the semi-Markov behaviour of a GSMP; we discuss this issue in the next section.
Introducing action choice in a GSMP yields a GSMDP as defined by [16]. In a GSMDP, we identify a subset A of controllable events or actions; the remaining ones are called uncontrollable or exogenous events. Actions can be enabled or disabled at will, and the subset As = A ∩ Es of activable actions is never empty since it always contains at least the "idle" action a∞ (whose clock is always set to +∞) which, in fact, does nothing and lets the first exogenous event take the process to a new state. As in the MDP case, searching for control strategies on GSMDP implies defining rewards r(s, e) or r(s, e, s′) associated with transitions and introducing policies and criteria.
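The clock bookkeeping above can be sketched as follows; this is our own rendering, and active_events, sample_clock (for Fe) and sample_next (for Pe) are assumed to be supplied by the model.

def gsmp_step(s, clocks, active_events, sample_clock, sample_next):
    # the active event with the smallest clock triggers the transition
    e_star = min(clocks, key=clocks.get)
    t = clocks[e_star]
    s_next = sample_next(e_star, s)      # s' ~ P_e*(.|s)
    new_clocks = {}
    for e in active_events(s_next):      # events enabled in s'
        if e in clocks and e != e_star:
            new_clocks[e] = clocks[e] - t            # surviving event: residual time
        else:
            new_clocks[e] = sample_clock(e, s_next)  # new or triggering: c_e ~ F_e(.|s')
    # events not active in s' are dropped (clock set to +infinity)
    return s_next, t, new_clocks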
3.2 Controlling GSMDP
As mentioned before, the transition function of the global semi-Markov process does not retain the Markov property without augmenting the state space. In the classical MDP framework, one can make use of the Markov property of the transition function to prove that there exists a Markovian policy (which only depends on the current state) which is at least as good as any history-dependent policy [11]. In the GSMDP case, however, this is no longer possible and, in order to define criteria and to find optimal policies, we need, in the general case, to allow the policy to depend on the whole execution path of the process. An execution path [16] of length n from state s0 to state sn is a sequence σ = (s0, t0, e0, s1, ..., sn−1, tn−1, en−1, sn) where ti is the sojourn time in state si before event ei triggers. As in [16], we define the discounted value of an execution path by:

V_γ^π(σ) = Σ_{i=0}^{n−1} γ^{T_i} ( γ^{t_i} k(s_i, e_i, s_{i+1}) + ∫_0^{t_i} γ^t c(s_i, e_i) dt )    (3)
where k and c are the traditional SMDP lump sum reward and reward rate functions, and T_i = Σ_{j=0}^{i−1} t_j. One can then define the expected value of policy π in state s as the expectation over all execution paths starting in s: V_γ^π(s) = E_s^π[V_γ^π(σ)]. This provides a criterion for evaluating policies; the goal is now to find policies that maximize this criterion.
The main problem here is that it is hard to search the space of history-dependent policies. On the other hand, the supplementary variable technique is often used to transform non-Markovian processes into Markovian ones. It consists in augmenting the state space with just enough variables so that the distribution over future states only depends on the current value of these variables. In [9], Nielsen augments the natural state s of the process with all the clock readings and shows that this operation brings Markov behavior back to the GSMP process. We will note this augmented state space (s, c) for convenience. Unfortunately, it is unrealistic to define policies over this augmented state space since clock readings contain information about the future of the system. From here, several options are possible:

• One could decide to sacrifice optimality and search for "good" policies among a restricted set of policies, say the policies defined on the current natural state only.
• One could also search for representation hypotheses that simplify the GSMDP model and make the natural state Markovian again.
• One could compute optimal policies on the augmented state space (s, c) and then derive a policy on observable variables only.
• Finally, one could search for a set of observable variables which retains the Markov property for the process, for example the set composed of the natural state of the process s, the duration τi for which each active event ei has been active, and its activation state si. We will note this augmented state (s, τ, sa).

[16] is based on the second option listed above. In the next paragraph, we briefly present this approach and introduce our reinforcement learning method, designed to deal with very large state spaces for GSMDP with continuous observable time, which can be adapted to the three other options.
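The path-value criterion in equation (3) can be transcribed directly, using the closed form ∫_0^t γ^u du = (γ^t − 1)/ln γ; the encoding of an execution path as a list of (s_i, t_i, e_i) steps is our choice, not the paper's.

import math

def path_value(sigma, s_n, k, c, gamma):
    # sigma: [(s_0, t_0, e_0), ..., (s_{n-1}, t_{n-1}, e_{n-1})], final state s_n
    V, T = 0.0, 0.0                        # T accumulates T_i = t_0 + ... + t_{i-1}
    states = [s for s, _, _ in sigma] + [s_n]
    for i, (s, t, e) in enumerate(sigma):
        lump = gamma**t * k(s, e, states[i + 1])
        rate = t if gamma == 1.0 else (gamma**t - 1.0) / math.log(gamma)
        V += gamma**T * (lump + rate * c(s, e))
        T += t
    return V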
3.3 Resolution methods

The resolution method for GSMDP proposed by [16] relies on the memoryless property of the exponential distribution. If one approximates all duration functions F by phase-type distributions (which are combinations of exponential distributions), then augmenting the state space with the distribution phases brings the overall behaviour of the GSMDP back to a Continuous Time MDP, which can, in turn, be transformed into a standard discrete time MDP by the method of uniformization [11]. We refer the reader to [16] for more details.
We do not wish to make hypotheses on the distributions that describe the dynamics of our system. On top of that, many problems we want to consider present other characteristics, such as very large, and sometimes continuous, state spaces. Therefore, we need to consider methods for policy search that can cope with large hybrid state spaces (yielding large hybrid trajectory spaces) and observable time. Finally, for some aspects of the problems, the stochastic behaviour might still be very complex to model formally while simulators might be readily available (for instance, in the airport taxiway management problem, the weather model is not given as probability distribution functions but as a simulator). In order to deal with such problems we turn towards reinforcement learning methods. More specifically, in order to avoid complete state space exploration, we introduce a version of approximate policy iteration where policies are defined and evaluated
on a subset of states and then generalized by regression to the whole state space. The choice of the subset of states used for evaluation is guided by the simulation of the current policy. We present our algorithm in section 4.1 and then illustrate why simulation-based policy iteration is particularly adapted to temporal problems in section 4.2.
4 Simulation-based approaches

4.1 Algorithm

Our algorithm belongs to the Approximate Policy Iteration (API) family of algorithms. Policy Iteration is an algorithm for solving MDP which searches the policy space in a two-step fashion, as illustrated in figure 2. Given a policy πn at step n, the first step consists in computing the value of πn. The second step then performs a Bellman backup in every state of the state space, thus improving the policy. An important property of policy iteration is its good anytime behaviour: at any step n, policy πn will be at least as good as any previous policy. Policy Iteration usually converges in fewer iterations than the standard Value Iteration algorithm but takes longer, since the evaluation step is very time consuming. To deal with real problems, one needs to allow for approximate policy evaluation (as in [7]) since exact computation is often infeasible. There are few theoretical guarantees on the convergence and optimality of API, as explained in [8].
Figure 2. Policy Iteration: alternating policy evaluation (compute V^πn) and one-step improvement (compute πn+1).
The version of simulation-based policy iteration we use performs simulations of the current policy πn starting from the current state of the process and stores the triplets of states, times and rewards (sδ, tδ, rδ) obtained. Thus, one execution path yields a value function over the discrete set of states explored during simulation (equation 3). All the value functions issued from simulation form a training set {(s, v)}, s ∈ S, v ∈ R, from which we wish to generalize a value function Ṽ over all states. The average value of state s in the training set tends to V^πn(s) as the number of simulations tends to +∞. One major advantage of policy-driven simulation is that the policy guides the exploration of the state space towards the states most likely to be visited, thus refining the training set over the states that have the largest probability of being reached by the policy. A second advantage is that this technique is adapted to large dimension state spaces.
Once simulation has provided the set of samples in the space of trajectories, we want to use it as a training set for a regression method that will generalize it to the entire state space. Several approaches to regression-based reinforcement learning have been proposed in the machine learning community (methods based on trees [3], evolutionary functions [15], kernel methods [10], etc.) but few have been coupled with policy simulation. We chose to focus on support vector machines (SVM) because of their ability to handle the large dimension spaces over which our samples are defined. SVM belong to the family of kernel methods and can be used for both regression and classification. Training a standard SVM over a given training set corresponds to looking for a hyperplane interpolating the samples in a
higher dimensional space called the feature space. Practically, SVM take advantage of the kernel trick to avoid expressing the feature space explicitly. For more details on SVM, we refer the reader to [14]. In our case, we call Ṽn(s) the interpolated value function of policy πn.
Finally, while simulation-based exploration and SVM generalization of the value function are techniques dedicated to improving the evaluation step of approximate policy iteration, the third specificity of our algorithm deals with improving the optimization step. For large and possibly continuous state spaces, it might be very long or impracticable to compute the one-step improvement of the policy. Indeed, most of the time, computing a complete policy is irrelevant since most of this policy will never be used for the simulation-based evaluation step. Instead, it might be easier to compute online the one-step lookahead best action in the current state with respect to the stored value function. More precisely, in a standard MDP, the optimization step consists in solving equation 4 in every state:

π_{n+1}(s) ← arg max_{a∈A} Q̃_{n+1}(s, a)
with: Q̃_{n+1}(s, a) = r(s, a) + γ Σ_{s′∈S} P(s′|s, a) Ṽ_n(s′)    (4)
For continuous state spaces, computing πn+1 implies being able to compute integrals over P and Ṽn. We do not wish to make hypotheses on the model used, and will therefore perform a discretization for the evaluation of the integral. Finally, since the model of P is not necessarily known to the decision maker, and since we have a simulator of our system, we will make a second use of this simulator for the purpose of evaluating the expected reward Q̃n+1(s, a) associated with performing action a in state s with respect to the value function Ṽn (equation 5). At the end of the evaluation phase, the value function Ṽn is stored and no policy is computed from it. Instead, we immediately enter a new simulation phase, but whenever the policy πn+1 is asked for the action to perform in the current state s, it performs online the estimation of all Q-values for state s and then chooses the best action to perform. The speed-up in the execution of the policy iteration algorithm is easy to illustrate for discrete state space problems, since we replace |S| evaluations of the Q-values for the policy update by the number of states visited during one simulation. This is especially interesting in the case of Temporal Markov Problems since (as we will explain in section 4.2) a state is never visited twice. Consequently, Q̃n+1(s, a) is calculated by simply simulating N times the application of a in s and observing the set {(ri, s′i)} as in equation 5. Then the policy returns the action which corresponds to the largest Q-value. We call this online instantiation of the policy "online approximate policy iteration".

Q̃n+1(s, a) = (1/N) Σ_{i=1}^{N} [ ri + Ṽn(s′i) ]    (5)
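To make the online lookahead concrete, the following is a minimal Python sketch of the estimator of equation 5. The simulator interface gsmdp_step(state, action), returning a successor state, a reward, and the elapsed time, is an assumption of ours and not part of the paper.

def compute_policy(state, actions, v_tilde, gsmdp_step, n_samples=15, gamma=1.0):
    """One-step lookahead of equation 5: estimate each Q-value by
    n_samples simulator calls, then return the greedy action."""
    best_action, best_q = None, float("-inf")
    for a in actions:
        q = 0.0
        for _ in range(n_samples):
            # gsmdp_step is a hypothetical simulator interface:
            # it returns (next_state, reward, elapsed_time).
            s_next, reward, dt = gsmdp_step(state, a)
            q += reward + (gamma ** dt) * v_tilde(s_next)
        q /= n_samples
        if q > best_q:
            best_action, best_q = a, q
    return best_action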
Our algorithm, called online Approximate Temporal Policy Iteration (online-ATPI), is summarized in algorithm 1. Note that in algorithm 1, s actually denotes the part of the state that is observable to the policy. This makes online-ATPI adaptable to any of the sets of policy variables presented in section 3.2. We tested a version of online-ATPI on the natural state of the process.
Algorithm 1 Online-ATPI

main:
  Input: π0 or Ṽ0, s0
  loop
    TrainingSet ← ∅
    for i = 1 to Nsim do
      {(s, v)} ← simulate(Ṽ, s0)
      TrainingSet ← TrainingSet ∪ {(s, v)}
    end for
    Ṽ ← TrainApproximator(TrainingSet)
  end loop

simulate(Ṽ, s0):
  ExecutionPath ← ∅
  s ← s0
  while horizon not reached do
    action ← ComputePolicy(s, Ṽ)
    (s′, r) ← GSMDPstep(s, action)
    ExecutionPath ← ExecutionPath ∪ (s′, r)
  end while
  convert execution path to value function {(s, v)} (eqn 3)
  return {(s, v)}

ComputePolicy(s, Ṽ):
  for a ∈ A do
    Q̃(s, a) ← 0
    for j = 1 to Nsamples do
      (s′, r) ← GSMDPstep(s, a)
      Q̃(s, a) ← Q̃(s, a) + r + γ^(t′−t) Ṽ(s′)
    end for
    Q̃(s, a) ← Q̃(s, a) / Nsamples
  end for
  action ← argmax_{a∈A} Q̃(s, a)
  return action
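The paper leaves TrainApproximator abstract beyond naming SVM regression. The sketch below shows one plausible realization using scikit-learn's SVR; this is our own illustrative assumption, not the authors' implementation, and the hyperparameter values are arbitrary.

from sklearn.svm import SVR

def train_approximator(training_set):
    """Fit a kernel regressor to the (state, value) samples gathered by
    simulation; the returned callable plays the role of V-tilde."""
    states = [list(s) for s, _ in training_set]   # states as feature vectors
    values = [v for _, v in training_set]
    model = SVR(kernel="rbf", C=100.0, epsilon=0.1)
    model.fit(states, values)
    return lambda s: float(model.predict([list(s)])[0])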
4.2 Simulating GSMDP and learning

Simulation is a key aspect of ATPI. The Discrete EVents Simulation theory (DEVS) of [17] provides a general framework for specifying discrete event dynamic systems. We implemented GSMP and GSMDP extensions in the VLE multi-modeling platform [12] based on the DEVS specification; by doing so, we take advantage of the DEVS framework's properties, which fit our simulation requirements, namely:
• Event-driven simulation and time-oriented output.
• The simulation engine deals with simultaneity issues and with simulation consistency and reproducibility.
• Simulation engines such as the VLE platform [12] are readily available and built on the same discrete events simulation theory.
• Multi-modelling possibilities open the algorithm to formalisms other than MPs.
On top of that, the DEVS formalism allows for the definition of experimental frames, which would permit integration of the whole simulation and planning loop in a DEVS specification. We have not used experimental frames yet but plan to do so in future versions. Finally, we have claimed that Temporal Markov Problems present a specific structure that makes the problem both hard to deal with for classical reinforcement learning algorithms and particularly adapted to online approximate policy iteration. More specifically:
• Most reinforcement learning algorithms deal with discrete state spaces. Some approaches have been proposed ([10, 3, 6]) for dealing with continuous or hybrid states, but the topic is still very new. Often, continuous state resolution methods depend strongly on the
representation used and on the ability to calculate integrals over the probability functions. Simulation-based sampling approaches offer a different way of addressing this issue.
• When time is observable, the causality principle ensures that the process never goes back in time. This avoids loops and ensures that online policy instantiation performs fewer operations than a complete offline policy improvement step.
4.3 Example

Table 1 presents optimization results for the first four iterations of online-ATPI on the subway problem, initialized with a policy π0 that sets trains to run all day long.2 Nsim was set to 20 and Nsamples to 15, with γ = 1 (finite horizon). This simple instance of the subway problem involved 4 trains and 6 stations. The problem's specification took time-dependency and stochastic behaviour into account; for example, passenger arrival periods were represented using Gaussian distributions with means and standard deviations depending on the time of day. The state space for this problem included 22 discrete, boolean or continuous variables (including time), thus yielding a sample space of dimension 22 for the training set. In Table 1, tsim is the training set building time (which corresponds to performing the Nsim simulations) while tlearn is the SVM training time (in seconds). Ṽstat(s0) is the statistical evaluation of Ṽ(s0), while ṼSVM(s0) is the value provided by the trained SVM. Lastly, #SV is the number of support vectors in the SVM. The expected value of the initial state increases with iterations; this confirms that policy quality improves with each iteration. This increase is not necessarily linear and depends on the problem's structure. If the policy takes the simulation to states that are "far" from explored states (states for which the interpolated value might be erroneous) and that provide very bad rewards, it can happen that the initial state's expected value drops for one iteration. This is the drawback of partial exploration of the state space and interpolation: very good or very bad regions of the state space might be discovered late in the iterations. One can notice that simulation time increases with iterations. This is mainly due to the number of support vectors in the SVM. Depending on the iteration step, the SVM can be much simpler and simulation time can drop again. On the other hand, online-ATPI is still very sensitive to the initial policy, and we are currently working on other possibilities to improve solution quality (such as roll-out techniques and estimator refinement during optimization by simulation-optimization interweaving).

Table 1. Subway control policy

             π0        π1        π2        π3        π4
tsim         47.1      203.43    206.45    446.15    1504.41
tlearn       2.28      2.7       12.18     56.08     229.45
Ṽstat(s0)   -3261.31  -3188.11  -2074.74  -1850.12  -887.076
ṼSVM(s0)    -2980.29  -2962.46  -2020.22  -1837.41  -875.417
#SV          55        61        439       3588      13596
Since Nsim = 20 simulations per iteration always provide a training set of around 45,000 points for the SVM in the subway example, the number of support vectors for the SVM (and therefore the iteration duration) is bounded. Longer runs on the subway problem show that the number of support vectors and the learning time in column π4 are a good estimate of the worst-case values.

2. Experiments were run on a 1.7 GHz single-core processor with 1 GB of RAM.
5 Conclusion

This paper introduces a new reinforcement learning method for solving Generalized Semi-Markov Decision Processes. These processes are a natural and elegant way of representing the complexity of concurrent stochastic processes. In the framework of time-dependent GSMDPs with explicit time, simulation seems to be an efficient way of exploring the state space and evaluating strategies. Drawing on this idea, we introduced a simulation-based version of Approximate Policy Iteration (API), which we called online-ATPI. This algorithm incrementally improves the quality of an initial policy by making use of simulation-based evaluation, SVM regression and online policy instantiation. Although there are few theoretical results concerning the convergence and optimality of API, online-ATPI seems to perform well on an example of subway network control. Future work will deal with making online-ATPI more robust to initialization; in fact, if the initial policy does not guide the simulation towards relevant areas of the state space, the error in policy evaluation can greatly penalize the algorithm. To avoid this drawback, we plan to use incremental refining methods for simulation initialization. This could result in building a denser training set, thereby minimizing the risk of not exploring relevant parts of the state space.
REFERENCES
[1] R. E. Bellman, Dynamic Programming, Princeton University Press, Princeton, New Jersey, 1957.
[2] J. Boyan and M. Littman, 'Exact solutions to time dependent MDPs', Advances in Neural Information Processing Systems, 13, 1026-1032, (2001).
[3] D. Ernst, P. Geurts, and L. Wehenkel, 'Tree-based batch mode reinforcement learning', JMLR, 6, 503-556, (2005).
[4] Z. Feng, R. Dearden, N. Meuleau, and R. Washington, 'Dynamic programming for structured continuous Markov decision problems', in 20th Conference on Uncertainty in AI, pp. 154-161, (2004).
[5] P. Glynn, 'A GSMP formalism for discrete event systems', Proc. of the IEEE, 77, (1989).
[6] M. Hauskrecht and B. Kveton, 'Approximate linear programming for solving hybrid factored MDPs', in 9th Int. Symp. on AI and Math., (2006).
[7] M. Lagoudakis and R. Parr, 'Least-squares policy iteration', JMLR, 4, 1107-1149, (2003).
[8] R. Munos, 'Error bounds for approximate policy iteration', in Int. Conf. on Machine Learning, (2003).
[9] F. Nielsen, 'GMSim: a tool for compositional GSMP modeling', in Winter Simulation Conference, (1998).
[10] D. Ormoneit and S. Sen, 'Kernel-based reinforcement learning', Machine Learning, 49, 161-178, (2002).
[11] M. Puterman, Markov Decision Processes, John Wiley & Sons, Inc., 1994.
[12] G. Quesnel, R. Duboz, É. Ramat, and M.K. Traore, 'VLE - A Multi-Modeling and Simulation Environment', in Moving Towards the Unified Simulation Approach, Proc. of the 2007 Summer Simulation Conf., pp. 367-374, (2007).
[13] E. Rachelson, F. Garcia, and P. Fabiani, 'Extending the Bellman equation for MDP to continuous actions and continuous time in the discounted case', in 10th Int. Symp. on AI and Math., (2008).
[14] V. Vapnik, S. Golowich, and A. Smola, 'Support vector method for function approximation, regression estimation and signal processing', Advances in Neural Information Processing Systems, 9, 281-287, (1996).
[15] S. Whiteson and P. Stone, 'Evolutionary function approximation for reinforcement learning', JMLR, 7, 877-917, (2006).
[16] H. Younes and R. Simmons, 'Solving Generalized Semi-Markov Decision Processes using Continuous Phase-Type Distributions', in AAAI, (2004).
[17] B. P. Zeigler, D. Kim, and H. Praehofer, Theory of Modeling and Simulation: Integrating Discrete Event and Continuous Complex Dynamic Systems, Academic Press, 2000.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-588
Heuristics for Planning with Action Costs Revisited

Emil Keyder1 and Héctor Geffner2

Abstract. We introduce a simple variation of the additive heuristic used in the HSP planner that combines the benefits of the original additive heuristic, namely its mathematical formulation and its ability to handle non-uniform action costs, with the benefits of the relaxed planning graph heuristic used in FF, namely its compatibility with the highly effective enforced hill climbing search along with its ability to identify helpful actions. We implement a planner similar to FF except that it uses relaxed plans obtained from the additive heuristic rather than those obtained from the relaxed planning graph. We then evaluate the resulting planner in problems where action costs are not uniform and plans with smaller overall cost (as opposed to length) are preferred, where it is shown to compare well with cost-sensitive planners such as SGPlan, Sapa, and LPG. We also consider a further variation of the additive heuristic, where symbolic labels representing action sets are propagated rather than numbers, and show that this scheme can be further developed to construct heuristics that can take delete-information into account.
1 PLANNING MODEL AND HEURISTICS
We consider planning problems P = ⟨F, I, O, G⟩ expressed in Strips, where F is the set of relevant atoms or fluents, I ⊆ F and G ⊆ F are the initial and goal situations, and O is a set of (grounded) actions a with precondition, add, and delete lists Pre(a), Add(a), and Del(a) respectively, all of which are subsets of F. For each action a ∈ O, we assume that there is a non-negative cost(a), so that the cost of a plan π = a1, . . . , an is

cost(π) = Σ_{i=1}^{n} cost(ai)    (1)
This cost model is a generalization of the classical model where the cost of a plan is given by its length. Two of the heuristics used to guide the search for plans in the classical setting are the additive heuristic ha used in HSP [2], and the relaxed plan heuristic hFF used in FF [11]. Both are based on the delete relaxation P+ of the problem, and both attempt to approximate the optimal delete-relaxation heuristic h+, which is well-informed but intractable. We review these heuristics below. In order to simplify the definition of some of the heuristics, we introduce a new dummy End action with zero cost, whose preconditions G1, . . . , Gn are the goals of the problem, and whose effect is a dummy atom G. The heuristics h(s) then estimate the cost of achieving this 'dummy' goal G from s.

1. Universitat Pompeu Fabra, Passeig de Circumval·lació 8, 08003 Barcelona, Spain. email: emil.keyder@upf.edu
2. ICREA & Universitat Pompeu Fabra, Passeig de Circumval·lació 8, 08003 Barcelona, Spain. email: hector.geffner@upf.edu
1.1 The Additive Heuristic
Since the computation of the optimal delete-free heuristic h+ is intractable, HSP introduces a polynomial approximation in which subgoals are assumed to be independent in the sense that they are achieved with no 'side effects' [2]. This assumption is normally false, but results in a simple heuristic function

ha(s) =def h(G; s)    (2)

that can be computed quite efficiently in every state s visited in the search from the recursive equation:

h(p; s) =def 0 if p ∈ s, and h(ap; s) otherwise    (3)

where h(p; s) stands for an estimate of the cost of achieving the atom p from s, h(a; s) stands for an estimate of the cost of applying action a in s, and ap is a best support of fluent p in s. These two expressions are defined in turn as

h(a; s) =def cost(a) + Σ_{q ∈ Pre(a)} h(q; s)    (4)

and

ap = argmin_{a ∈ O(p)} h(a; s)    (5)
where O(p) stands for the actions in the problem that add p. Versions of the additive heuristic appear also in [6, 16, 17], where the cost of joint conditions in action preconditions or goals is set to the sum of the costs of each condition in isolation. When the ’sum’ in (4) is replaced by ’max’, the heuristic hmax is obtained [2]. The heuristic hmax , unlike the additive heuristic ha , is admissible, but less informed. The heuristics coincide and are equivalent to the optimal delete-relaxation heuristic h+ when all the actions involve a single precondition and the goal involves a single atom.
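As a concrete illustration, the fixpoint defined by equations (2)-(5) can be computed by repeated relaxation over atoms. The Python sketch below is our own minimal rendering under an assumed Strips encoding (actions as tuples of name, preconditions, add list, and cost); it is not the HSP or FF implementation.

def additive_heuristic(state, goal, actions):
    """Compute h_a and the best supporter a_p for every atom.
    `actions` is an assumed encoding: a list of (name, pre, add, cost).
    Returns (h_add, h, best_support), where h maps atoms to cost estimates."""
    INF = float("inf")
    h = {p: 0.0 for p in state}          # equation (3), base case
    best_support = {}
    changed = True
    while changed:                        # Bellman-Ford-style fixpoint
        changed = False
        for name, pre, add, cost in actions:
            h_a = cost + sum(h.get(q, INF) for q in pre)   # equation (4)
            for p in add:
                if h_a < h.get(p, INF):                    # equation (5)
                    h[p], best_support[p] = h_a, (name, pre)
                    changed = True
    h_add = sum(h.get(g, INF) for g in goal)   # dummy End action with zero cost
    return h_add, h, best_support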
1.2 The Relaxed Planning Graph Heuristic
The planner FF modifies HSP along two dimensions: the heuristic and the search algorithm. Unlike ha, the heuristic hFF used in FF makes no independence assumption for approximating h+, computing instead one plan for P+ which is not guaranteed to be optimal. This is done by a Graphplan-like procedure [1], which due to the absence of deletes constructs a planning graph with no mutexes, from which a plan πFF(s) is extracted backtrack-free [11]. The heuristic hFF(s) is then set to |πFF(s)|. The basic search procedure in FF is not best-first as in HSP but (enforced) hill-climbing (EHC), in which the search moves from a state s to a neighboring state s′ with smaller heuristic value by performing a breadth-first search. This breadth-first search is carried out with a reduced branching factor, ignoring actions a that are not found to be 'helpful'. The 'helpful actions' in
a state s are the actions applicable in s that add the precondition p of an action in πFF(s) for p ∉ s. The use of EHC search, along with the pruning of non-helpful actions, are the key factors that make FF scale up better than HSP in general [11], but due to its construction, the heuristic hFF cannot be extended easily to take action costs into account (yet see [7]).
1.3 Relaxed Plans without Planning Graphs
A simple variation of the additive heuristic can be defined that is cost sensitive and results in relaxed plans compatible with helpful action pruning and EHC search. For this, the best support ap of each atom p in the state s, calculated as part of the computation of the heuristic ha(s) in Equation 5, is stored.3 The definition of the set of actions πa(s) that make up a relaxed plan then simply collects these best supports backwards from the goal:

πa(s) =def π(G; s)
π(p; s) =def {} if p ∈ s, and {ap} ∪ ⋃_{q ∈ pre(ap)} π(q; s) otherwise
Intuitively, the relaxed plan π(p; s) is empty if p ∈ s, and is the union of the best supporter ap for p with the relaxed plans for each of its preconditions q ∈ pre(ap) otherwise. Note that πa(s), being a set of actions, can contain an action at most once. The same construction, captured by Equation 6, underlies the construction of the relaxed plan πFF(s) computed by FF from the relaxed planning graph. For this, however, the best supports ap that encode the 'best' actions for achieving the atom p in the relaxation must be obtained from the hmax heuristic and not from ha; a modification that just involves replacing the sum operator in (4) with the max operator. The hmax heuristic is known to encode the first level of the relaxed planning graph that contains a given action or fact. It is simple to prove that the collection of actions in πa(s) represents a plan from s in the delete relaxation P+. This relaxed plan, unlike the relaxed plan πFF(s), is sensitive to action costs, and can be used in FF in place of πFF(s). We call the resulting planner FF(ha).
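Continuing the sketch started above, the backward collection of best supporters can be written directly from the recursive definition. This is again our own illustrative rendering, reusing the hypothetical best_support table from the previous snippet and assuming the relaxation is solvable for every goal atom.

def extract_relaxed_plan(state, goal, best_support):
    """Collect best supporters backwards from the goal; the result is a
    set of action names forming a relaxed plan (each action at most once)."""
    plan, seen = set(), set()

    def collect(p):
        if p in state or p in seen:
            return
        seen.add(p)
        name, pre = best_support[p]   # best supporter a_p of atom p
        plan.add(name)
        for q in pre:                 # recurse on the preconditions of a_p
            collect(q)

    for g in goal:
        collect(g)
    return plan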
2 THE FF(ha) PLANNER
In FF(ha), the relaxed plans πa(s) are produced by computing the additive heuristic using a Bellman-Ford algorithm while keeping track of the chosen lowest-cost supporter for each atom, and then recursively collecting the best supporters starting from the goal. The heuristic h(s) used for measuring progress in FF(ha) is defined as the relaxed plan cost Σ_{a ∈ πa(s)} cost(a) and not as its length |πa(s)|. This heuristic, which is obtained from the computation of the additive heuristic ha, is almost equivalent to ha(s) but does not count the cost of an action more than once. The EHC search used in FF(ha) is a slightly modified version of that used in FF. While a single step of EHC in FF ends as soon as a state s′ is found by breadth-first search from s such that h(s′) < h(s), in FF(ha), all states s′ resulting from applying a helpful action a in s are evaluated, and among those for which h(s′) < h(s) holds, the action minimizing the expression cost(a) + h(s′) is selected. Like in FF, the helpful actions in s are the actions applicable in s that add the precondition p of an action in πa(s) such that p ∉ s.
3. We assume that ties in the selection of the best supports ap are broken arbitrarily. The way ties are broken does not affect the value of the additive heuristic ha(s) in a state s but may affect the value of the heuristic defined below. The same is true for FF's heuristic.
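The modified EHC step just described can be summarized in a few lines. The sketch below is our reading of it, with hypothetical helpers helpful_actions, apply, h, and cost standing in for the planner's internals.

def ehc_step(s, helpful_actions, apply, h, cost):
    """One enforced hill-climbing step of FF(h_a): evaluate all helpful
    successors and pick the improving action minimizing cost(a) + h(s')."""
    best, best_score = None, float("inf")
    for a in helpful_actions(s):
        s_next = apply(s, a)
        if h(s_next) < h(s):                  # improving successors only
            score = cost(a) + h(s_next)
            if score < best_score:
                best, best_score = (a, s_next), score
    return best   # None triggers a deeper breadth-first search, as in FF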
FF(ha) is implemented on top of the Metric-FF planner [10] because of its ability to handle numeric fluents, through which non-uniform action costs are currently expressed in PDDL. FF(ha) does not make use of numeric fluents for any other purpose besides representing action costs.
3 EXPERIMENTAL RESULTS
We evaluated the performance of FF(ha) in comparison with other cost-sensitive planners, namely SGPlan [5], LPG-quality [8] and Sapa [6]4 on 11 domains.5 For reference, the curves also show the plan times and costs obtained by running FF, which ignores cost information, and FF-quality, an option in Metric-FF that optimizes a given plan metric by using an FF-like heuristic in a Weighted A* search [10]. Experiments were performed with eleven domains, five of them taken from the numeric track of the Third International Planning Competition (IPC3). Of these 5 domains, the Depots, Rovers, Satellite, and Zenotravel domains were modified by removing all occurrences of numeric variables from action preconditions and goals, once the action cost information was extracted from the PDDL. Also, as a reference, all planners except LPG were evaluated on the STRIPS (uniform cost) versions of these domains, and all planners were evaluated on 6 new domains introduced here, which were constructed with the aim that the length of solutions not correlate with their cost. Indeed, in two of these domains, the Minimum Spanning Tree and Assignment domains, all valid solutions contain the same number of actions. The other domains are: Shortest Path (shortest-path problems), Colored Blocksworld (blocks have colors and colors must be stacked in certain ways in the goal, with costs associated with the different blocks), Delivery (a variation of the IPC5 domain TPP), and Simplified Rovers (a domain adapted from [17], in which a robot must collect samples from rocks in a grid). Moreover, for S. Rovers, both hard goal and soft goal versions were used, with the soft goals being compiled away into action costs, following the procedure described in [13].6 The experiments were run on a CPU running at 2.33 GHz with 8 GB of RAM. Execution time was limited to 1,800 seconds. The results, including plan costs and planning times for the various planners, are reported in the figures. Some observations about the results follow.

Quality of Plans: In almost all of the domains, FF(ha) produces the best plans, with the exception of the hard-goal version of S. Rovers (Fig. 3c), where it does particularly badly, and of the soft-goal version (Fig. 3b). In both cases, LPG does better, although the opposite occurs in several domains, such as Delivery (Fig. 1a), Satellite (Fig. 2a), and the Assignment Problem (Fig. 2c). Sapa produces plans that are close to the best quality plans in all the domains for which it can be executed, yet is usually able to solve only the smallest instances in each domain. FF-quality suffers from a similar problem, solving a significant proportion of the instances in a few domains only.7 Overall, SGPlan does not appear to produce better plans than FF, even if FF ignores costs completely, and both produce plans that are often much worse than FF(ha).

4. Sapa was compiled from Java to native machine code with the GNU compiler. We were later informed by the authors that this results in a slowdown of approximately 50% compared to the version running on the Java virtual machine.
5. LPG and Sapa could not be run on some of the domains due to bugs.
6. We cannot provide further details on these domains due to lack of space, but the PDDL files are available from the authors.
7. For clarity, FF-quality's results are shown only for domains in which it was able to solve a significant number of instances.
Figure 1. (a) Plan costs - Delivery domain. (b) Plan costs - Shortest Path domain. (c) Plan costs - Minimum Spanning Tree domain. (d) Planning times - Minimum Spanning Tree domain.

Figure 2. (a) Plan costs - Satellites domain. (b) Length of the plans above in Satellites. (c) Plan costs - Assignment Problem domain. (d) Plan costs - Zenotravel domain.
In the STRIPS versions of the five IPC3 domains (unit costs), all planners produce plans of roughly equal quality.

Planning Times: FF(ha) is somewhat slower than FF on most problems, though the difference is usually a constant factor (Fig. 1d).8 There are two main reasons for this. The first is that computing ha and extracting the associated relaxed plan πa is somewhat more costly than the equivalent operation on the relaxed planning graph, so FF(ha) takes longer to perform the same number of heuristic evaluations as FF. In general, hFF evaluates states 2-10 times faster than ha. The second is that while FF minimizes the number of actions in the plan, FF(ha) minimizes the cost of the plan, which in some cases leads to longer plans, requiring more search nodes and more heuristic evaluations (Fig. 2b). SGPlan takes roughly the same amount of time as FF on almost all domains considered, while LPG is roughly an order of magnitude slower than the other planners except Sapa, but appears to have better scaling behaviour. Sapa is slower than LPG by roughly one order of magnitude.

8. We omit further data on planning time due to space considerations.
4 FURTHER VARIATIONS OF THE ADDITIVE HEURISTIC
We consider briefly two further variations of the additive heuristic: the set-additive heuristic and the TSP heuristic, both analyzed in more detail in [12, 13].
4.1 The Set Additive Heuristic

In the additive heuristic, the value h(ap; s) of the best supporter ap of p in s is propagated to obtain the heuristic value h(p; s) of p. In contrast, in the set-additive heuristic, the best supporter ap of p is itself propagated, with supports combined by set-union rather than by sum, resulting in a recursive function π(p; s) that represents the set of actions in a relaxed plan for p in s, which can be defined similarly to h(p; s) as:

π(p; s) = {} if p ∈ s, and π(ap; s) otherwise    (6)

where

ap = argmin_{a ∈ O(p)} Cost(π(a; s))    (7)
π(a; s) = {a} ∪ ⋃_{q ∈ Pre(a)} π(q; s)    (8)
Cost(π(a; s)) = Σ_{a′ ∈ π(a;s)} cost(a′)    (9)

The set-additive heuristic hsa(s) for a state s is then defined as

hsa(s) = Cost(π(G; s))    (10)

It is easy to show that the collections of actions π(p; s) represent, for each atom p, a plan for achieving p in the delete-relaxation P+; in the set-additive heuristic these plans are computed recursively, starting with the trivial (empty) plan for the atoms p ∈ s. From a practical point of view, this recursive computation does not appear to be cost-effective in general, as the relaxed plans πa(p; s) obtained from the normal additive heuristic are normally as good and can be computed faster. Yet the planner FF(hsa), obtained from FF by replacing the relaxed plans πFF(s) by π(G; s) above, compares well with existing cost-sensitive planners (see [12]), and the formulation of the set-additive heuristic opens the door to the formulation of a broader family of heuristics.

4.2 The TSP Heuristic

The set-additive heuristic can be generalized by replacing the plans π(p; s) with more generic labels L(p; s) that can be numeric, symbolic, or a suitable combination, provided that there is a function Cost(L(p; s)) mapping labels L(p; s) to numbers. Here we consider labels L(p; s) that result from treating one designated multivalued variable X in the problem in a special way. A multivalued variable X is a set of atoms x1, . . . , xn such that exactly one xi holds in every reachable state. For example, in a task where there are n rocks r1, . . . , rn to be picked up at locations l1, . . . , ln, the set of atoms at(l0), at(l1), . . . , at(ln), where at(l0) is the initial agent location, represents one such variable, encoding the possible locations of the agent. If the cost of going from location li to location lk is c(li, lk), then the cost of picking up all the rocks is the cost of the best (min cost) path that visits all the locations, added to the costs of the pickups. This problem is a TSP and therefore intractable, but its cost can be approximated by various fast suboptimal TSP algorithms.9 By comparison, the delete relaxation approximates the cost of the problem as the cost of the best tree rooted at l0 that spans all of the locations. The modification of the labels π(p; s) in the set-additive heuristic allows us to move from the approximate model captured by the delete relaxation to approximate TSP algorithms over a more accurate model (see [15] for other uses of OR models in planning heuristics). For this, we assume that the actions that affect the selected multivalued variable X do not affect other variables in the problem, and maintain in each label π(p; s) two disjoint sets: a set of actions that do not affect X, and the set of X-atoms required as preconditions by these actions. The heuristic hX(s) is then defined as

hX(s) = CostX(π(G; s))    (11)

where CostX(π) is the sum of the action costs for the actions in π that do not affect X, plus the estimated cost of the 'local plan' [4] that generates all the X-atoms in π, expressed as

CostX(π) = Cost(π ∩ X̄) + CostTSP(π ∩ X)    (12)

where

π(p; s) = {} if p ∈ s; {p} if p ∈ X; π(ap; s) otherwise
ap = argmin_{a ∈ O(p)} CostX(π(a; s))
π(a; s) = {a} ∪ ⋃_{q ∈ Pre(a)} π(q; s)

and CostTSP(R) is the cost of the best path spanning the set of atoms R, starting from the value of X in s, in a directed graph whose nodes stand for the different values x of X, and whose edges (x, x′) have costs that encode approximations of the cost of achieving x′ from x in s (see [13] for details). We have implemented the planner FF(hX), in which hX, rather than ha, is used to derive the relaxed plan, with the X variables being automatically chosen as the root variables of the causal graph [3, 9]. This planner produces plans of much lower cost than any other planner tested in the soft goals version of the Simplified Rovers domain (Fig. 3b), and plans of much lower cost than any other planner except LPG in the hard goals version (Fig. 3c), where LPG produces plans of only slightly worse quality.

9. In our planner we have implemented the 2-opt algorithm discussed in [14].
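CostTSP must be estimated by a fast suboptimal method; the paper mentions a 2-opt algorithm [14]. The following sketch is our own simplified stand-in, not the authors' implementation: it estimates an open path cost with a nearest-neighbor tour improved by 2-opt moves over a given distance matrix.

def tsp_path_cost(dist, start, targets):
    """Approximate the cost of the cheapest path from `start` through all
    `targets`: nearest-neighbor construction followed by 2-opt improvement."""
    # Greedy nearest-neighbor tour
    tour, current, remaining = [start], start, set(targets)
    while remaining:
        nxt = min(remaining, key=lambda x: dist[current][x])
        tour.append(nxt)
        remaining.remove(nxt)
        current = nxt

    def cost(t):
        return sum(dist[a][b] for a, b in zip(t, t[1:]))

    # 2-opt: reverse segments while doing so shortens the path
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if cost(candidate) < cost(tour):
                    tour, improved = candidate, True
    return cost(tour)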
Figure 3. (a) Plan costs - Colored Blocksworld domain. (b) Plan costs - soft goals version of S. Rovers domain. (c) Plan costs - hard goals version of S. Rovers domain. (d) Planning times - hard goals version of S. Rovers domain.

5 DISCUSSION
We have shown that relaxed plans, and therefore helpful actions, can be computed without the use of a relaxed planning graph, meaning that other heuristics can be used in conjunction with FF's powerful EHC search. Our method of relaxed plan extraction using the additive heuristic is cost-sensitive and does not impose a large overhead over that of FF. Furthermore, a simple planner that combines the relaxed plan extracted in this way with the EHC search algorithm compares favourably to the state of the art in planning with action costs. Two other variations of the additive heuristic were also presented: the set-additive heuristic, in which the relaxed plans are computed recursively, and the TSP heuristic, which takes delete-information into account. In both cases, labels are propagated rather than numbers in the equation characterizing the additive heuristic. Used together with EHC search, the TSP heuristic produces plans of much lower cost than any other planner tested in navigation problems where finding good paths going through a set of locations is critical. Our implementation of the TSP heuristic, however, is preliminary, and is suited only for problems where these locations correspond to the values of a single root variable in the causal graph.
ACKNOWLEDGEMENTS
We thank the reviewers for useful comments and J. Hoffmann for making the sources of Metric-FF available. H. Geffner is partially supported by grant TIN2006-15387-C03-03 from MEC/Spain.
REFERENCES
[1] A. Blum and M. Furst, 'Fast planning through planning graph analysis', in Proc. IJCAI-95, pp. 1636-1642, (1995).
[2] B. Bonet and H. Geffner, 'Planning as heuristic search', Artificial Intelligence, 129(1-2), 5-33, (2001).
[3] R. Brafman and C. Domshlak, 'Structure and complexity of planning with unary operators', JAIR, 18, 315-349, (2003).
[4] R. Brafman and C. Domshlak, 'Factored planning: How, when, and when not', in Proc. AAAI-06, (2006).
[5] Y. Chen, B. W. Wah, and C. Hsu, 'Temporal planning using subgoal partitioning and resolution in SGPlan', JAIR, 26, 323-369, (2006).
[6] M. Do and S. Kambhampati, 'Sapa: A domain-independent heuristic metric temporal planner', in Proc. ECP 2001, pp. 82-91, (2001).
[7] R. Fuentetaja, D. Borrajo, and C. Linares, 'Improving relaxed planning graph heuristics for metric optimization', in Proc. 2006 AAAI Workshop on Heuristic Search, (2006).
[8] A. Gerevini, A. Saetti, and I. Serina, 'Planning through stochastic local search and temporal action graphs in LPG', JAIR, 20, 239-290, (2003).
[9] M. Helmert, 'A planning heuristic based on causal graph analysis', in Proc. ICAPS-04, pp. 161-170, (2004).
[10] J. Hoffmann, 'The Metric-FF planning system: Translating "ignoring delete lists" to numeric state variables', JAIR, 20, 291-341, (2003).
[11] J. Hoffmann and B. Nebel, 'The FF planning system: Fast plan generation through heuristic search', JAIR, 14, 253-302, (2001).
[12] E. Keyder and H. Geffner, 'Heuristics for planning with action costs', in Proc. Spanish AI Conference (CAEPIA 2007), volume 4788 of Lecture Notes in Computer Science, pp. 140-149. Springer, (2007).
[13] E. Keyder and H. Geffner, 'Set-additive and TSP heuristics for planning with action costs and soft goals', in ICAPS-07 Workshop on Heuristics for Domain Independent Planning, (2007).
[14] S. Lin and B. W. Kernighan, 'An effective heuristic algorithm for the TSP', Operations Research, 21, 498-516, (1973).
[15] D. Long and M. Fox, 'Automatic synthesis and use of generic types in planning', in Proc. AIPS-2000, pp. 196-205, (2000).
[16] O. Sapena and E. Onaindia, 'Handling numeric criteria in relaxed planning graphs', in Advances in Artificial Intelligence: Proc. IBERAMIA 2004, LNAI 3315, pp. 114-123. Springer, (2004).
[17] D. E. Smith, 'Choosing objectives in over-subscription planning', in Proc. ICAPS-04, pp. 393-401, (2004).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-593
Diagnosis of Simple Temporal Networks

Nico Roos1 and Cees Witteveen2

Abstract. In many domains successful execution of plans requires careful monitoring and repair. Diagnosis of plan execution supports this process by identifying causes of plan failure. Most plans have to satisfy temporal constraints, and an important and commonly occurring problem during plan execution is the violation of temporal plan constraints. This paper addresses diagnosis of such temporal constraint violations by modeling the temporal aspects of a plan as a Simple Temporal Network (STN). We investigate the computational properties of standard diagnostic concepts, but we also argue that traditional notions of preferred diagnoses, such as minimum diagnosis, are not adequate. A new notion of maximum confirmation diagnosis is introduced.
1 Introduction
A Simple Temporal Network (STN) [8] provides a way to describe (i) a plan, (ii) temporal aspects of plan steps, and (iii) temporal relations between plan steps. It also enables the description of schedule constraints, and of observations about the temporal execution of the plan, using the same formalism. The observations may violate the temporal constraints of the plan or its schedule, giving rise to a Simple Temporal Diagnosis (STD) problem. Diagnosis should identify the plan and scheduling constraints that have been violated during plan execution. A Simple Temporal Diagnosis problem is related to a Simple Temporal Problem (STP) [8]. An STP addresses the identification of an allowable schedule for an STN. The STD problem extends this by identifying where the actual execution schedule starts to deviate from the allowable schedule. Note that STD may also be used prior to plan execution if an STP does not have an allowable schedule.
2 Running example

To illustrate the ideas presented in the following sections, we will use a problem from the domain of Air Traffic Control as a running example. Flight KL 123 has a delayed departure: a delayed takeoff at 16:30 instead of the scheduled takeoff at 15:55-16:00. The taxiing time of 15-20 minutes incurred no delays; in fact, flight KL 123 had a delayed off-block time. At the gate, flight KL 123 incurred a delay because of the catering. Catering was scheduled to start the delivery of food between 15:15 and 15:30, and the food must arrive at the airplane 10 to 30 minutes before the off-block. The actual delivery time was 16:00. Flight KL 123 was also delayed because the flight had to wait for transfer passengers from flight NW 456. At least 30 minutes are required for the transfer of passengers between flights. Flight NW 456 arrived at the gate at 16:05, while it was scheduled to arrive at the gate between 14:55 and 15:00. The cause of its delay was a delayed departure of 15 minutes at JFK and an additional delay during the flight caused by unexpected head-winds.

Figure 1 shows the schedule of the plan and the actual execution of the plan. The figure shows the time lines for the flights KL 123 and NW 456, and the time line for the catering. The blocks drawn on the time lines represent the plan steps; the length of a block roughly indicates the duration of the plan step. The uncertainty about the start or finish of a plan step is indicated by the time intervals below each time line. Also note the white blocks that are placed above instead of on the time line. These blocks indicate (i) the 'waiting time' between the on-block time of flight NW 456 and the off-block time of flight KL 123, in which passengers are transferred between the two flights, and (ii) the 'waiting time' between the finish of the catering service and the off-block time of flight KL 123.

Figure 1. The schedule and the execution of two flights and the catering.

1. Maastricht ICT Competence Center, Universiteit Maastricht, P.O. Box 616, NL-6200 MD Maastricht, email: roos@micc.unimaas.nl
2. Faculty EEMCS, Delft University of Technology, P.O. Box 5031, NL-2600 GA Delft, email: C.Witteveen@tudelft.nl
The goal of diagnosis is to determine, using partial observations of the plan execution, to what extent the plan constraints describing plan step durations and time restrictions on successive plan steps, as well as the schedule, are satisfied.
3 Preliminaries
Simple Temporal Networks An STN (E, C) describes a plan and its schedule by a set of events E and a set of constraints C over the events. Events denote such things as the start start(s) of a plan step
s and the finish finish(s) of s. The constraints are used to specify the durations of plan steps, the precedence relations between plan steps, and the plan's schedule. It is also possible to specify requirements such as the requirement that a plan step must start within δ minutes after the finish of its preceding plan step. To describe a constraint, we associate a variable t_e with each event e ∈ E. These variables take values in some dense time domain Time. We assume Time to be a total order with element 0 and maximum element ∞. A constraint c ∈ C specifies the allowed temporal difference between two events: lb ≤ t_e′ − t_e ≤ ub, where e and e′ are events in E, lb, ub ∈ Time and 0 ≤ lb ≤ ub. Constraints define a strict precedence relation ≺ on the set E. We say that e directly precedes e′ iff lb ≤ t_e′ − t_e ≤ ub ∈ C and ub > 0. The transitive closure of the direct precedences defines the precedence relation; i.e., e precedes e′ iff e ≺+ e′. Relating an STN to a traditional plan description P = (S, ≺), the duration of a plan step s is described by 0 < lb ≤ t_finish(s) − t_start(s) ≤ ub. A precedence constraint s ≺ s′ is described by lb ≤ t_start(s′) − t_finish(s) ≤ ub. Note that in the standard interpretation of a precedence constraint, lb = 0 and ub = ∞. A schedule is a placement of events on the timeline. To describe a schedule we need a special event '0' marking the start of the timeline; i.e., t_0 = 0. This enables us to schedule the period in which an event e ∈ E should take place: lb ≤ t_e − t_0 ≤ ub; i.e., lb ≤ t_e ≤ ub. Figures 2 and 3 show the plan of our running example and the corresponding STN, respectively. Since there are no gaps between plan steps such as 'flight', 'landing', 'taxiing', and so on (i.e., precedence constraints of the form 0 ≤ t_start(s′) − t_finish(s) ≤ 0 hold between successive plan steps), and since these constraints cannot be violated, in Figure 3 we have chosen to represent the finish and start of successive plan steps of a flight by a single event.
Figure 2. The plan.
Figure 3. The Simple Temporal Network.
Semantics The constraints of an STN place restrictions on the way a plan may be executed: the execution schedule. An execution schedule for the set of events E of an STN (E, C) is a function σ : E → Time. We say that an execution schedule σ satisfies the constraints C, denoted by σ |= C, iff lb ≤ σ(e′) − σ(e) ≤ ub holds for every constraint lb ≤ t_e′ − t_e ≤ ub ∈ C. An execution schedule satisfying every constraint in C is called an allowable schedule. The identification of an allowable schedule for an STN is called a Simple Temporal Problem (STP) [8]. It is well known that an STN has an allowable execution schedule iff its underlying labeled graph contains no negative cycles.3

We say that a constraint c : a ≤ t_e′ − t_e ≤ b is entailed by a set of constraints C, denoted by C |= c, iff every allowable schedule for C satisfies c. Given a constraint c : a ≤ t_e′ − t_e ≤ b, we say that c′ : a′ ≤ t_e′ − t_e ≤ b′ is a tightening of c, denoted by c′ |= c, iff a ≤ a′ ≤ b′ ≤ b. There is a sound and complete derivation procedure (|−) for determining the most tightened constraint c : a ≤ t_e′ − t_e ≤ b entailed by a set of constraints C: C |− c iff C |= c.

Observations During the execution of a plan, observations can be made. These observations may pertain to the time difference observed between two events e and e′ as specified in the plan, or may pertain to the time at which a certain event e ∈ E takes place. We assume that the first type of observation is described by some constraint a ≤ t_e − t_e′ ≤ b indicating that we have observed that event e occurred at least a time steps, but within b time steps, after e′. The second type of observation is given by a constraint a ≤ t_e − t_0 ≤ b indicating that e occurred after a time units but before b time units have passed (after the occurrence of the time reference event '0'). The set of observations containing these constraints is denoted by Obs. In the running example, we have the following observations: the delayed takeoff of flight KL 123 at 16:30, the catering starting at 16:00, and the delayed arrival at the gate of flight NW 456 at 16:05. These observations are described by constraints of the form 16:30 ≤ t_e − t_0 ≤ 16:30, 16:00 ≤ t_e − t_0 ≤ 16:00, and 16:05 ≤ t_e − t_0 ≤ 16:05 on the corresponding events e, respectively.

Compatibility An important notion is the compatibility between the STN specification (E, C) and the set of observations Obs. We say that the set of observations is compatible with the plan specification if we can find an execution schedule σ that satisfies the original set of constraints C as well as the set Obs; i.e., the STN (E, C ∪ Obs) has an allowable schedule.

Qualifications If an STN (E, C) is not compatible with a set Obs of observations, some constraints in C must have been violated directly or indirectly by some of the observations. To restore the compatibility between plan and observations we need to indicate which constraints have been violated. Clearly, if a plan constraint c is violated, some part of the plan is executed in an abnormal way. To indicate such an abnormal execution we introduce a qualification Q of constraints. Given an STN (E, C), a qualification Q is a function Q : C → H assigning a health mode to every constraint in C. We distinguish the following health modes:
1. We use the mode Q(c) = nor to denote the normal execution of a constraint c ∈ C; i.e., c has not been violated;
2. we use Q(c) = ab to denote the abnormal execution of a constraint c without exactly specifying how it is violated; and
3. we use a real number δ ∈ R to denote the degree to which a constraint is violated: Q(c) = δ.
Note that the last health mode describes how much shorter or longer the temporal difference between two events is with respect to what is specified by the constraint.
3. A negative cycle refers to the fact that, first of all, a constraint c : lb ≤ t_e′ − t_e ≤ ub is equivalent to the following two inequalities: t_e − t_e′ ≤ −lb and t_e′ − t_e ≤ ub. Next, such inequalities can be composed: if t_e′ − t_e ≤ ub and t_e′′ − t_e′ ≤ ub′, then t_e′′ − t_e = (t_e′′ − t_e′) + (t_e′ − t_e) ≤ ub + ub′. If we can derive an inequality t_e − t_e < 0, a clear inconsistency (a negative cycle) has been detected [8].
Qualifications will be used to restore the compatibility between observations and plan executions as follows:
1. For any constraint c : lb ≤ t_e′ − t_e ≤ ub, if Q(c) = ab, we assume that the constraint is not respected anymore. So c will be replaced by its weakest implicate −∞ ≤ t_e′ − t_e ≤ ∞, which in fact comes down to removing c from C.
2. If Q(c) = δ ∈ R, then c : lb ≤ t_e′ − t_e ≤ ub will be replaced by the constraint lb + δ ≤ t_e′ − t_e ≤ ub + δ. Since the durations of plan steps and the waiting times between successive plan steps cannot be negative, we require that Q(c) ≥ −1 · lb.
We will use the update function upd(C, Q) to denote the modification of the constraints C using qualification Q:

upd(C, Q) = {c ∈ C | Q(c) = nor} ∪ {lb + δ ≤ t_e′ − t_e ≤ ub + δ | c : lb ≤ t_e′ − t_e ≤ ub ∈ C, Q(c) = δ}

Note that qualifying a constraint with the health mode 'ab' increases the uncertainty expressed by the constraint, i.e., the difference between its upper and lower bound. Qualifying with a health mode δ ∈ R does not change the expressed uncertainty, since (ub + δ) − (lb + δ) = ub − lb.
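To make the negative-cycle test of footnote 3 concrete: compatibility of an STN can be checked by running Bellman-Ford on the distance graph of its constraints. The sketch below is a minimal illustration we added, not the authors' code; constraints are assumed to be encoded as tuples (e, e2, lb, ub) meaning lb ≤ t_e2 − t_e ≤ ub.

def compatible(events, constraints):
    """Check whether an STN has an allowable schedule, i.e. whether its
    distance graph (edge e->e2 of weight ub, edge e2->e of weight -lb)
    contains no negative cycle (Bellman-Ford)."""
    edges = []
    for e, e2, lb, ub in constraints:
        edges.append((e, e2, ub))    # t_e2 - t_e <= ub
        edges.append((e2, e, -lb))   # t_e - t_e2 <= -lb
    dist = {e: 0.0 for e in events}  # simultaneous relaxation from all nodes
    for _ in range(len(events) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # one more pass: any further relaxation implies a negative cycle
    return all(dist[u] + w >= dist[v] for u, v, w in edges)

In these terms, a qualification Q restores compatibility exactly when compatible succeeds on the events together with upd(C, Q) ∪ Obs.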
4 Diagnosis
Classical Model-Based Diagnosis (MBD) addresses the identification of failing components in some system. In MBD, two types of diagnosis are distinguished: abductive and consistency based diagnosis. Abductive diagnosis can be viewed as a special case of consistency based diagnosis in which we have complete knowledge of both the way components may fail and how failing components behave.4 Since, in abstraction, diagnosis of constraint violations in an STN is most closely related to MBD, we will use the terminology of classical MBD. Note, however, that unlike in MBD, we do not have components to be diagnosed. Instead we diagnose temporal constraints. We distinguish two types of diagnosis: diagnosis without fault models, where only the health modes nor and ab are used, and diagnosis with fault models.
4.1 Diagnosis without fault models
We consider consistency based diagnosis without fault models. That is, we try to make the STN compatible with the observations by identifying the constraints that could have been violated, without considering how the constraints are violated. We therefore restrict ourselves to qualifications Q that map constraints to nor or ab. As we remarked before, constraints qualified as abnormal will be removed from the set of constraints C defined by the STN (E, C).

Definition 1 Let (E, C) be an STN and let Obs be the constraints describing the observations made. Moreover, let Q : C → H be a qualification such that for every c ∈ C, Q(c) ∈ {nor, ab}. The qualification Q is a consistency based diagnosis without fault models iff the STN (E, {c ∈ C | Q(c) = nor} ∪ Obs) has an allowable schedule.4
4. Diagnosis of Discrete Event Systems (DESs) is another form of model-based diagnosis that addresses the identification of failure events that change the states of components in dynamic systems.
In general, there may not be a unique diagnosis given the observations made. In fact, the number of diagnoses can be quite large. For instance, in the absence of fault models, if Q is a diagnosis, then every Q′ such that {c ∈ C | Q′(c) = nor} ⊆ {c ∈ C | Q(c) = nor} is also a diagnosis.

Dependencies between constraint violations Among the set of diagnoses, some diagnoses are considered to be better than others. To select the most likely diagnosis, preference orders are defined in MBD on the set of diagnoses. These preference orders are all based on the underlying assumption that fault probabilities are independent of each other. This assumption does not hold for the constraints of an STN. In particular, the schedule constraints are not independent of other schedule constraints, plan duration constraints and precedence constraints. For instance, a delay in the boarding of passengers may imply a violation of the scheduled takeoff time. To illustrate the problem of dependencies more clearly, consider the plan depicted in Figure 4. Suppose that we make the observations 11:00 ≤ t5 − t0 ≤ 11:05 and 10:55 ≤ t6 − t0 ≤ 11:00. Clearly, the schedule constraints c0-5 : 10:40 ≤ t5 − t0 ≤ 10:50 and c0-6 : 10:45 ≤ t6 − t0 ≤ 10:50 are violated, and a minimum diagnosis qualifies these constraints as abnormal (ab) while qualifying all other constraints as normal (nor). A diagnosis in which the schedule constraints c0-5, c0-6 and the plan constraint c1-2 : 14 ≤ t2 − t1 ≤ 23 are qualified as abnormal (ab) is not a minimum diagnosis. Since a violation of the plan constraint c1-2 (e.g., its execution taking at least 15 minutes longer) implies the violations of the schedule constraints c0-5 and c0-6, we should only count c1-2 when determining a minimum diagnosis. The violations of c0-5 and c0-6 are not independent of the violation of c1-2.
Figure 4. Dependencies between constraints.
The above example shows us that notions such as minimum diagnosis cannot be defined considering all the violated constraints. Instead, we should consider an "independent core" of a diagnosis Q. To identify the independent core, we first define a causal dependency between a constraint c and a set of constraints D. The idea is that the upper and lower bound of c cannot be chosen independently of the constraints in D. Moreover, for the constraint c : lb ≤ t_e′ − t_e ≤ ub to causally depend on D, no event of a constraint c′ in D may occur after the event e′.

Definition 2 A constraint c : lb ≤ t_e′ − t_e ≤ ub ∈ C depends on a set of constraints D not containing c iff
• D is a minimal subset of C such that for some choices of lb and ub, D ∪ {c} has no allowable schedule;
• for no event e′′ specified in a constraint in D, e′ precedes e′′.

The independent core of a diagnosis Q can now be determined by identifying the constraints in Q that (i) are qualified with the health mode ab, and (ii) do not depend on other constraints that are qualified with the health mode ab in Q.
Definition 3 Let Q be a diagnosis and let C′ ⊆ C be a set of constraints. C′ is an independent core of Q iff C′ consists of all constraints c ∈ C such that Q(c) = ab and, for all sets of constraints D ⊆ C on which c depends, there is no c′ ∈ D such that Q(c′) = ab.

Minimum diagnoses In MBD, one usually prefers minimum diagnoses. The rationale behind this preference is that the probability of n faults is usually much smaller than the probability of m faults for n > m, provided that fault probabilities are independent. In an STN the independence requirement does not hold. Therefore, minimum diagnoses must be defined with respect to the independent core of a diagnosis Q. In the running example, we observe a delayed takeoff of flight KL 123, a delayed on-block of flight NW 456 and a delay in the finish of the catering service. One possible diagnosis Q qualifies as abnormal (ab) the scheduled takeoff time of flight KL 123, the flying time of flight NW 456 and its scheduled on-block time, and the scheduled starting time of the catering service. All other constraints are qualified as normal (nor). The independent causal core of Q consists of the constraints specifying the flying time of flight NW 456 and the scheduled starting time of the catering service. Since the number of constraints in the independent causal core is minimal, Q is a minimum diagnosis.

Theorem 1 Finding a diagnosis with a minimum independent core for an STD problem is an NP-hard problem.

We prove NP-hardness by reducing the well-known NP-complete Feedback Arc Set problem [9] to the problem of finding a diagnosis with a minimum independent core. Consider an instance I = (G(V, A), K) of the Feedback Arc Set problem. We construct an instance f(I) = (P(E, C), Obs) of the temporal diagnosis problem by specifying the plan P(E, C) as follows:
• For every node v ∈ V we create two events e1_v and e2_v in E;
• For every arc (v, w) ∈ A we add a temporal constraint 1 ≤ t_{e2_w} − t_{e1_v} ≤ ∞ to the temporal network.
Note that the source of an arc (v, w) in the graph G(V, A) is always an e1_v-event in the temporal network, while the target is always an e2_w-event; see Figure 5.
Figure 5. Reduction of a Feedback Arc Set problem to STD.
It is easy to see that this plan has an allowable execution schedule: for every event ei_v ∈ E, let σ(ei_v) = i. This assignment satisfies all constraints. Moreover, since there is at most one path of constraints between each pair of events in the STN, the independent core consists of all constraints that are qualified as abnormal in a diagnosis. The set of observations Obs restores the structure of the graph G by containing, for every node v ∈ V, the constraint 0 ≤ t_{e1_v} − t_{e2_v} ≤ ∞. It is not hard to see that the observations are incompatible with the STN (i.e., the STN contains a negative cycle) iff the graph G contains a cycle. Moreover, a diagnosis in which K constraints are qualified as abnormal (ab) corresponds one-to-one with a directed feedback arc set of size K of the graph G(V, A).
4.2 Diagnosis with fault models
An important difference with diagnosis in other domains is that in diagnosis of STNs fault models are always available. In an STN, a fault model of a temporal constraint describes the degree to which the constraint is violated. In the qualification Q we denote this by a shift in the bounds of the temporal constraint. So, if Q(c) = δ ∈ R, then c : lb ≤ t_e′ − t_e ≤ ub will be replaced by the constraint lb + δ ≤ t_e′ − t_e ≤ ub + δ. Hence, diagnosis with fault models is defined as:

Definition 4 Let (E, C) be an STN and let Obs be the constraints describing the observations made. Moreover, let Q : C → H be a qualification. The qualification Q is a consistency based diagnosis with fault models iff the STN (E, upd(C, Q) ∪ Obs) has an allowable schedule.

Preferred diagnoses Definition 4 does not give us a unique diagnosis given the observations. Some diagnoses may be better than others. Generalizing the preference for minimum diagnoses in the absence of fault models, we could prefer minimum-fault diagnoses that minimize Σ_{c∈C} |Q(c)|, where Q(c) = nor and Q(c) = ab are interpreted as Q(c) = 0 and Q(c) = ω, respectively. Clearly, minimum diagnoses are a special case of minimum-fault diagnoses. A minimum-fault diagnosis, however, minimizes the number of execution schedules σ that satisfy the updated STN (E, upd(C, Q)) and the observations Obs. To give an illustration, consider a plan with two events e and e′ and one constraint c : lb ≤ t_e′ − t_e ≤ ub. If we observe a ≤ t_e′ − t_e ≤ b with a > ub, then Q(c) = a − ub is a minimum-fault diagnosis. Since there is only one execution schedule satisfying the updated plan and the observation, the probability that the diagnosis is correct is minimal. Therefore, a different notion of preference is desirable. We should prefer diagnoses that have a high probability of being correct, i.e., that maximize the number of execution schedules. The number of execution schedules is maximal if we can predict the observations made, i.e., abductive diagnosis. To illustrate this point, consider the plan in Figure 6 together with the observations 10:40 ≤ t5 − t0 ≤ 11:15 and 10:30 ≤ t6 − t0 ≤ ∞. If all constraints are qualified as normal (nor), then the constraints entail 10:35 ≤ t5 − t0 ≤ 11:03 and 10:27 ≤ t6 − t0 ≤ ∞, which do not explain our observations. A diagnosis Q qualifying all plan constraints as normal (nor) except c1−2 : 14 ≤ t2 − t1 ≤ 23, which is qualified as Q(c1−2) = 5, does explain the observations. This diagnosis enables us to predict 10:40 ≤ t5 − t0 ≤ 11:08 and 10:32 ≤ t6 − t0 ≤ ∞. Since these predictions are a tightening of the observations, the diagnosis Q is an abductive diagnosis.
Figure 6. Abduction versus confirmation.
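To make the minimum-fault preference tangible, the following fragment (our own illustration, not the authors' code) computes the minimal shift δ for a single constraint lb ≤ t_e′ − t_e ≤ ub against an observed interval [a, b]; for a > ub it returns a − ub, the value discussed in the example above.

```python
# A small illustration of the minimum-fault shift for one constraint
# lb <= t_e' - t_e <= ub and an observation a <= t_e' - t_e <= b:
# the smallest |delta| with [lb+delta, ub+delta] intersecting [a, b].

def minimum_fault_shift(lb, ub, a, b):
    if a > ub:        # observed difference is larger than allowed
        return a - ub
    if b < lb:        # observed difference is smaller than allowed
        return b - lb
    return 0.0        # intervals already overlap: constraint not violated

# With a > ub, the minimum-fault diagnosis shifts the constraint just far
# enough that only the single schedule t_e' - t_e = a remains.
print(minimum_fault_shift(lb=10, ub=20, a=25, b=30))  # 5
```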
Maximum confirmation diagnosis In the above example the two observations are not very accurate. A more accurate observation, such as 10:40 ≤ t5 − t0 ≤ 10:48, cannot be explained by the normal execution of the plan, Q(c) = nor for all constraints in C. Nevertheless, the most tightened constraint 10:35 ≤ t5 − t0 ≤ 11:03 entailed by the plan constraints is confirmed by the observation. This also indicates the absence of violations of the constraints that are used to
make the prediction for the pair of events '0' and e5. Therefore, we propose a new notion of diagnosis, namely maximum-confirmation diagnoses. The idea of maximum-confirmation diagnosis is to identify the qualification Q for which the number of execution schedules is maximal. To measure the number of execution schedules, we introduce a confirmation percentage.

Definition 5 Let Q be a qualification, let $o : lb \le t_{e'} - t_e \le ub$ be an observation and let $a \le t_{e'} - t_e \le b$ be the most tightened constraint implied by a qualified plan (E, upd(C, Q)). The confirmation percentage of the observation o, denoted by $cp_Q(o)$, is defined as:

$$cp_Q(o) = \begin{cases} \dfrac{\min(ub, b) - \max(lb, a)}{ub - lb} & \text{if } \min(ub, b) - \max(lb, a) \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

The sum of the confirmation percentages gives us a measure for comparing diagnoses.

Definition 6 Let (E, C) be an STN, and let Obs be the constraints describing the observations made. A diagnosis Q of the STN and the observations Obs is a maximum-confirmation diagnosis iff $\sum_{o \in Obs} cp_Q(o)$ is maximal.

Note that a maximum-confirmation diagnosis need not be unique. From the set of maximum-confirmation diagnoses we can derive intervals of violation degrees for the constraints. In our running example, the maximum-confirmation diagnoses assign delays of 15 to 35 minutes to the catering process given the observed finish at 16:00. An important question concerns the worst-case time complexity of determining a maximum-confirmation diagnosis.

Theorem 2 A maximum-confirmation diagnosis can be determined in polynomial time.

To see why, note that each observation $o : lb \le t_{e'} - t_e \le ub \in Obs$ has one or more causal chains of events between the two events e and e′ of the observation constraint. Starting from the earliest observation, we qualify the last plan constraint of each causal chain of events between the two events e and e′ as being violated. The qualification is chosen such that it maximizes the confirmation percentage of the observation o. Before continuing with the next constraint, we have to propagate the effect of the qualifications made. All steps can be carried out in polynomial time.
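Definitions 5 and 6 translate directly into code. The sketch below is our reading of those definitions; the variable names are ours, not the paper's.

```python
# A direct transcription (under our reading of Definition 5) of the
# confirmation percentage and the maximum-confirmation objective.

def confirmation_percentage(lb, ub, a, b):
    """Observation o: lb <= t_e' - t_e <= ub; tightest entailed bound [a, b]."""
    overlap = min(ub, b) - max(lb, a)
    return overlap / (ub - lb) if overlap >= 0 else 0.0

def total_confirmation(observations, entailed):
    """Sum of cp_Q(o) over all observations: the quantity a
    maximum-confirmation diagnosis maximizes (Definition 6)."""
    return sum(confirmation_percentage(lb, ub, a, b)
               for (lb, ub), (a, b) in zip(observations, entailed))

# Example: entailed bound [635, 663] (10:35-11:03 in minutes) against the
# observation [640, 675] (10:40-11:15) is confirmed to (663-640)/(675-640).
print(confirmation_percentage(640, 675, 635, 663))  # ~0.657
```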
5 Related work
Several authors have addressed aspects of plan diagnosis.
• Diagnosis of an agent's planning assumptions: Birnbaum et al. [1].
• Diagnosis of a single task execution: Lesser et al. [2, 10].
• Social diagnosis of behavior selection in teams: Kalech and Kaminka [11, 13].
• Diagnosis of the abnormal effects of a plan execution: Roos et al. [19, 16, 7, 18].
• Diagnosis of coordination errors of agents executing a plan: Kalech and Kaminka [12] and Roos and Witteveen [17].
• Diagnosis of multi-agent plan interactions: de Jonge et al. [3, 6].
• Diagnosis and repair of plan execution where agents share resources and provide services: Micalizio and Torasso [15, 14].
• Diagnosis of temporal constraint violations: de Jonge et al. [4, 5].
None of these approaches addresses diagnosis of Simple Temporal Networks. The approach of de Jonge et al. [4, 5] comes closest to ours. However, they can only deal with abstract states, such as delayed or early, for plan steps.
6 Conclusion
Identifying the causes of violations of a plan's temporal constraints is an important issue in plan execution. To enable such diagnosis, the temporal aspects of a plan are described by a Simple Temporal Network (STN). Based on observations of the plan's execution, diagnosis has to identify the temporal constraints that are violated. The notion of classical Model-Based Diagnosis (MBD) has been adapted to STNs. Two important issues had to be dealt with: (i) we cannot assume that temporal constraints are violated independently, and (ii) the notions of consistency-based and abductive diagnosis are not adequate for STNs. A new notion of a maximum-confirmation diagnosis has been proposed. In future work we will investigate whether the model for diagnosis of STNs presented here can be combined with models for diagnosing other aspects of plan-execution failures.
REFERENCES
[1] L. Birnbaum, G. Collins, M. Freed, and B. Krulwich. Model-based diagnosis of planning failures. In AAAI 90, pages 318–323, 1990.
[2] N. Carver and V. Lesser. Domain monotonicity and the performance of local solutions strategies for CDPS-based distributed sensor interpretation and distributed diagnosis. Autonomous Agents and Multi-Agent Systems, 6(1):35–76, 2003.
[3] F. de Jonge and N. Roos. Plan-execution health repair in a multi-agent system. In PlanSIG 2004, 2004.
[4] F. de Jonge, N. Roos, and H. Aldewereld. Multiagent system technologies. In Multiagent System Technologies, 2007.
[5] F. de Jonge, N. Roos, and H. Aldewereld. Temporal diagnosis of multiagent plan execution without an explicit representation of time. In BNAIC-07, 2007.
[6] F. de Jonge, N. Roos, and H. van den Herik. Keeping plan execution healthy. In Multi-Agent Systems and Applications IV: CEEMAS 2005, LNCS 3690, pages 377–387, 2005.
[7] F. de Jonge, N. Roos, and C. Witteveen. Diagnosis of multi-agent plan execution. In Multiagent System Technologies: MATES 2006, LNCS 4196, pages 86–97, 2006.
[8] R. Dechter, I. Meiri, and J. Pearl. Temporal constraint networks. Artificial Intelligence, 49:61–95, 1991.
[9] P. Festa, P. Pardalos, and M. Resende. Feedback set problems. In Handbook of Combinatorial Optimization, volume 4. Kluwer Academic Publishers, 1999.
[10] B. Horling, B. Benyo, and V. Lesser. Using self-diagnosis to adapt organizational structures. In Proceedings of the 5th International Conference on Autonomous Agents, pages 529–536. ACM Press, 2001.
[11] M. Kalech and G. A. Kaminka. Diagnosing a team of agents: Scaling-up. In AAMAS 2005, pages 249–255, 2005.
[12] M. Kalech and G. A. Kaminka. Towards model-based diagnosis of coordination failures. In AAAI 2005, pages 102–107, 2005.
[13] M. Kalech and G. A. Kaminka. On the design of coordination diagnosis algorithms for teams of situated agents. Artificial Intelligence, 171:491–513, 2007.
[14] R. Micalizio and P. Torasso. On-line monitoring of plan execution: A distributed approach. Knowledge-Based Systems, 20:134–142.
[15] R. Micalizio and P. Torasso. Team cooperation for plan recovery in multi-agent systems. In Multiagent System Technologies, LNCS 4687, pages 170–181, 2007.
[16] N. Roos and C. Witteveen. Diagnosis of plans and agents. In Multi-Agent Systems and Applications IV: CEEMAS 2005, LNCS 3690, pages 357–366, 2005.
[17] N. Roos and C. Witteveen. Diagnosis of plan structure violations. In Multiagent System Technologies, 2007.
[18] N. Roos and C. Witteveen. Models and methods for plan diagnosis. Journal of Autonomous Agents and Multi-Agent Systems, DOI: 10.1007/s10458-007-9017-6, 2008.
[19] C. Witteveen, N. Roos, R. van der Krogt, and M. de Weerdt. Diagnosis of single and multi-agent plans. In AAMAS 2005, pages 805–812, 2005.
10. Perception, Sensing and Cognitive Robotics
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-601
An Attentive Machine Interface Using Geo-Contextual Awareness for Mobile Vision Tasks

Katrin Amlacher and Lucas Paletta 1

Abstract. The presented work situates attention in the architecture of ambient intelligence, in particular for the application of mobile vision tasks in multimodal interfaces. A major issue for the performance of these services is uncertainty in the visual information, which is rooted in the requirement to index into a huge amount of reference images. We propose a system implementation – the Attentive Machine Interface (AMI) – that enables contextual processing of multi-sensor information in a probabilistic framework, for example to exploit contextual information from geo-services with the purpose of cutting down the visual search space to a subset of relevant object hypotheses. We present a proof of concept with results from bottom-up information processing on experimental tracks and image captures in an urban scenario, extracting object hypotheses in the local context from both (i) mobile image based appearance and (ii) GPS based positioning, and verify an improvement in recognition accuracy (> 10%) using Bayesian decision fusion. Finally, we demonstrate that top-down information processing – geo-information priming the recognition method in feature space – can yield even better results (> 13%) and more economic computing, verifying the advantage of multi-sensor attentive processing in multimodal interfaces.
1 INTRODUCTION
Attention as a methodology of selecting detail of relevance is ubiquitous in biological systems and has increasingly received consideration in the design of artificial cognitive systems. Mobile multimodal interfaces – devices that receive a multitude and diversity of data with the purpose of assisting the user with relevant detail at a suitable level of abstraction – are an obvious place to investigate how concepts for the appropriate selection of information might contribute to solving a task. In this paper we approach attention from the viewpoint of a nomadic urban user who is equipped with a camera phone and interested in receiving appropriate information about objects of interest within a local environment. We describe the embedding of the problem in a general system implementation of an Attentive Machine Interface (AMI) that enables contextual processing of multi-sensor information in a probabilistic framework. The system is prepared to support, in general, bottom-up information processing in terms of selecting and processing information within specific modalities and according to a pre-defined – be it learned or heuristically determined – methodology. A particularly novel functionality presented in this work is to enable top-down information processing by cross-modal
1
JOANNEUM RESEARCH Forschungsgesellschaft mbH, Institute of Digital Image Processing, Wastiangasse 6, 8010 Graz, Austria, email: {katrin.amlacher,lucas.paletta}@joanneum.at
priming of early processing in the manner of a multi-sensor framework for attentive – and finally superior – performance. Mobile object recognition and visual positioning have recently been proposed in terms of mobile vision services for the support of urban nomadic users. A major issue for the performance of these services is uncertainty in the visual information; covering large urban areas with naive approaches would require referring to a huge amount of reference images and consequently to highly ambiguous features. We propose to exploit contextual information from geo-services with the purpose of cutting down the visual search space to a subset of all available object hypotheses in the large urban area. Geo-information in association with visual features makes it possible to restrict the search to a local context. We extract object hypotheses in the local context from (i) mobile image based appearance and (ii) GPS based positioning, and investigate the performance of Bayesian information fusion with respect to a reference database (TSG-20). The results from experimental tracks and image captures in an urban scenario prove a significant increase in recognition accuracy (Sec. 4) and a more economic use of computational resources when using, in contrast to omitting, geo-contextual information. Finally, we demonstrate that cross-modal top-down information processing – geo-information priming the recognition method in visual feature space – can yield even better results and more economic computing, verifying the advantage of using attentive processing in multimodal interfaces.
2 THE ATTENTIVE MACHINE INTERFACE

2.1 Related Work
In ubiquitous computing, several frameworks have been proposed in the area of attentive interfaces and context awareness. [14, 1] proposed Attentive User Interfaces (AUI) that capture the attention of the user, e.g. from eye gaze estimation, and consequently adapt interaction systems for better communication with the user. [3] proposed that context is a description of a real-world situation on an abstract level that is derived from available cues. [2] described the role of perceptual components in a context-aware system for interaction. Finally, [11] proposed a context processing system with blackboard functionality, where components can subscribe to receive messages matching specific patterns and various cues are integrated into a multimodal description of a situation. While the concept of the AMI is directly inspired by [11], it performs processing in a probabilistic framework and enables top-down, i.e., attentive cross-modal information processing. Previous work on mobile vision services primarily advanced the state of the art in computer vision methodology for application in urban scenarios. [13] provided a first innovative attempt at building identification, proposing local affine features for object matching.
[15] introduced image retrieval methodology for the indexing of visually relevant information from the web for mobile location recognition. Subsequent attempts [8, 10, 4] advanced the methodology further towards highly robust building recognition; however, the contribution of geo-information to the performance of the vision service has so far not been investigated.
2.2 Concept and Architecture
The context framework used in the AMI defines a cue as an abstraction of logical and physical sensors which may represent a context itself, yielding a recursive definition of context. Sensor data, cues and context descriptions are defined in a framework of uncertainty. Attention is the act of selecting and enabling detail – in response to situation-specific data – within a choice of given information sources, with the purpose of operating exclusively on it. Attention enabled by the AMI is therefore focusing operations on a specific detail of a situation that is described by the context. The architecture of the AMI reflects the enabling of both bottom-up and top-down information processing and supports snapshot-based (e.g., image) as well as continuous operation on a stream of input data (e.g., video). Fig. 1 outlines the embedding of the AMI within a client-server system architecture for mobile vision services with support from multi-sensor information. A user interface generates task information (mobile vision service) that is fed into the AMI. The user request for context information is handled by a Master Control (MC) component that schedules the processing (multiple users can start several tasks) and associates with each task corresponding system monitoring (SM) procedures. A concrete task is then performed by the Task Processor (TP), which, firstly, requests a hierarchical description of services, i.e. context-generating modules (a context subgraph), and, secondly, executes the services in the order given by the subgraph description. Since such a subgraph can provide several ways of processing, the appropriate part can be selected by means of, e.g., time constraints, the confidence of the expected result and the quality of the context-generating services. If a service goes offline, the TP can switch to a similar service or to another processing chain in which already processed data is reused. The Context Graph Manager (CGM) maintains and manages context-generating modules in a graph structure (Context Graph). These context-generating modules are services that receive input cues (an image, a GPS signal, etc.) from the Data Control (DC) module and generate a specific context abstraction from an integration of the input cues. The CGM assembles the subgraph according to several constraints, such as task information and the availability of context-generating modules and data, and ensures that the subgraph is processable. The AMI functionality ensures the possibility of arbitrarily combining services and implements a process-flow regulation mechanism, e.g. switching to another service when one goes offline. It is also possible to invoke an additional processing chain if the confidence of the result is too low. Multiple users can concurrently request context information, and the services are targeted towards fast and accurate (robust) responses.
2.3 Context Processing
For high-level context generation various services are required to combine information, services may exist only temporarily, and outputs may be combined in an arbitrary manner. The Context Graph – a directed acyclic graph with nodes representing individual context processing units and edges describing the information flow – is a flexible and extensible data structure that correspondingly connects
Figure 1. Concept of a client-server system architecture with attentive machine interface.
individual functionalities. Each context node provides a context-generating service that derives context information from its input data; context nodes are linked together depending on the input and output data of the wrapped services; and context nodes represent context information at different levels, with high-level context information being what the user demands. For the generation of high-level context information only parts of the Context Graph need to be processed, namely those that contribute to the corresponding (top-level) context node. Depending on the available input data and services, a subgraph of the Context Graph is derived, which ensures smooth processing by the Task Processor. The subgraph is processed starting with those leaf context nodes that take data only from the Data Control. The calculated results are passed to the next context nodes along the outgoing edges until the top-level context node is reached. The resulting high-level context information is presented to the user via a visualization component and is stored in the Data Control or Diary Manager.
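As a purely hypothetical illustration of this evaluation order, the following sketch processes a context subgraph in topological order, with leaf nodes reading cues from the Data Control; all function and structure names here are ours, not the AMI's actual API.

```python
# A hypothetical sketch of how a Task Processor might evaluate a context
# subgraph: nodes are context-generating services, edges carry data flow,
# and evaluation proceeds in topological order from the leaves (which read
# from the Data Control) up to the top-level context node.
from graphlib import TopologicalSorter

def process_subgraph(nodes, inputs, data_control):
    """nodes: {name: service_fn}; inputs: {name: [predecessor names]};
    data_control: {leaf name: raw cue (image, GPS signal, ...)}."""
    results = {}
    for name in TopologicalSorter(inputs).static_order():
        service = nodes[name]
        if not inputs.get(name):           # leaf: raw cue from the Data Control
            results[name] = service(data_control[name])
        else:                              # inner node: fuse predecessor outputs
            results[name] = service(*(results[p] for p in inputs[name]))
    return results
```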
2.4 Bottom-Up and Top-Down Processing
The AMI supports two different modes of information processing, i.e., bottom-up and top-down processing. The choice of mode can be made by the Task Processor according to demands on computational resources, quality of service (e.g., recognition accuracy) and availability of data. Figure 2 provides a schematic sketch of the two modes for the service of geo-indexed object recognition (Sec. 3). In bottom-up processing mode (a), services for the computation of (i) visual objects (object recognition) and (ii) geo-features (positioning) determine hypotheses about the occurrence of objects (i) in the image and (ii) within a local environment. In top-down processing mode (b), there is a cross-modal dependency of (i) object recognition on the object hypotheses provided by (ii) the geo-service. While individually processed distributions over object hypotheses can simply be integrated in (a) using Bayesian decision fusion, (b) actually models an impact of geo-information on visual feature extraction and integration, as outlined in more detail in Sec. 3.
Figure 2. Two different processing modes visualised by their associated context subgraphs for "Geo-Indexed Building Recognition" (Sec. 3). (a) Bottom-up information processing of visual object recognition and geo-features. (b) Top-down information processing by using geo-features to prime visual object recognition (Sec. 2.4).
3 GEO-INDEXED OBJECT RECOGNITION
Urban image based recognition provides the technology for both object awareness and positioning. Outdoor geo-referencing still mainly relies on satellite based signals, and problems arise when the user enters urban canyons, where the availability of satellite signals dramatically decreases due to various shadowing effects [5]. Cell identification is not treated here due to its large positioning error. Alternative concepts for localization, such as INS or markers that would need to be massively distributed across the urban area, are economically not affordable. For image based urban object recognition, we briefly describe how we make use of the methodology presented in [12, 4]. The user captures an image of an object of interest in the field of view, and a software client initiates wireless data submission to the server. Assuming that a GPS receiver is available, the mobile device reads the actual position estimate and sends this together with the image to the server. In the second stage, the web service reads the message and analyzes the geo-referenced image. Based on the current quality of service and the given decision for object detection and identification, the server prepares the associated annotation information from the content database and sends it back to the client for visualization.

Attentive Object Recognition Research on visual object detection has recently focused on the development of local interest operators [9, 6] and the integration of local information into object recognition. The SIFT (Scale Invariant Feature Transform) descriptor [6] is widely used for its capability for robust matching despite viewpoint, illumination and scale changes in the object image captures, which is mandatory for mobile vision services. The Informative Features Approach (i-SIFT [4]), applied to mobile imagery in our experiments, uses local density estimations to determine the posterior entropy, making local information content explicit with respect to object discrimination. The information content of a posterior distribution is determined with respect to given task-specific hypotheses. In contrast to costly global optimization, one expects that it is sufficiently accurate to estimate a local information content from the posterior distribution within a sample test point's local neighborhood in descriptor space. One is primarily interested in the information content of any sample local descriptor $d_i$ in descriptor space $D$, $d_i \in \mathbb{R}^{|D|}$, with respect to the task of object recognition, where $o_i$ denotes an object hypothesis from a given object set $S_O$. For this,
Figure 3. Concept for recognition from informative local descriptors. (I) SIFT descriptors are extracted within the test image. (II) Decision making analyzes the descriptor voting for MAP decision. (III) In i-SIFT attentive processing, a decision tree estimates the SIFT specific entropy; informative descriptors are then attended for decision making (II).
one needs to estimate the entropy $H(O|d_i)$ of the posterior distribution $P(o_k|d_i)$, $k = 1 \ldots \Omega$, where Ω is the number of instantiations of the object class variable O. The Shannon conditional entropy is $H(O|d_i) \equiv -\sum_k P(o_k|d_i) \log P(o_k|d_i)$. One approximates the posteriors at $d_i$ using only samples $g_j$ inside a Parzen window of a local neighborhood $\epsilon$, $\|d_i - d_j\| \le \epsilon$, $j = 1 \ldots J$. Fig. 3 depicts discriminative descriptors in an entropy-coded representation of local SIFT features $d_i$. From discriminative local descriptors one proceeds to entropy-thresholded object representations, providing increasingly sparse representations with increasing recognition accuracy, in terms of storing only the selected descriptor information that is relevant for classification purposes, i.e., those $d_i$ with $\hat{H}(O|d_i) \le H_\Theta$. For the rejection of images whenever they do not contain any objects of interest, one estimates the entropy of the posterior distribution – obtained from a normalized histogram of the object votes – and rejects images with posterior entropies above a predefined threshold. The proposed recognition process is thus characterized by an entropy-driven selection of image regions for classification, and a voting operation.
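A minimal sketch of this entropy-based descriptor selection is given below. It is our paraphrase of the i-SIFT idea, not the authors' implementation: posteriors are approximated from labelled sample descriptors inside an ε-neighborhood, and only descriptors with posterior entropy below H_Θ are kept.

```python
# A minimal sketch of entropy-thresholded descriptor selection: approximate
# P(o_k | d_i) from labelled samples inside an epsilon-neighborhood of d_i,
# compute the Shannon entropy H(O | d_i), and keep descriptors below H_theta.
import numpy as np

def posterior_entropy(d_i, samples, labels, n_objects, eps):
    """samples: (J, |D|) array of sample descriptors; labels: (J,) int array."""
    dists = np.linalg.norm(samples - d_i, axis=1)
    near = labels[dists <= eps]
    if near.size == 0:
        return np.log(n_objects)            # uninformative: maximal entropy
    p = np.bincount(near, minlength=n_objects) / near.size
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def informative_descriptors(descs, samples, labels, n_objects, eps, h_theta):
    """Return only those descriptors whose posterior entropy is <= h_theta."""
    return [d for d in descs
            if posterior_entropy(d, samples, labels, n_objects, eps) <= h_theta]
```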
Figure 4. Extraction of object hypotheses from geo-services. (Left to right) Within a local spatial neighborhood (geo-focus), distances to the points of interest are determined, weighted by an exponential function and normalised to result in a distribution on object hypotheses.
Geo-Contextual Computing of Object Recognition Geo-services provide access to information about a local context that is stored in a digital city map. Map information in terms of map features is indexed via a current estimate of the user position, which can be derived from satellite based signals (GPS), dead-reckoning devices and so on. The map features can provide geo-contextual information in terms of, e.g., the location of points of interest. In previous work [7], the general relevance of geo-services for the application of mobile object recognition was already emphasised; however, the contribution of the geo-services to the performance of geo-indexed object recognition was not quantitatively assessed, and top-down processing was not considered. Fig. 4 depicts a novel methodology to introduce geo-service based object hypotheses. (i) A geo-focus is first defined with respect to a radius of expected position accuracy relative to the city map. (ii) Distances between the user position and points of interest (e.g., tourist sight buildings) that are within the geo-focus are estimated. (iii) The distances are then weighted according to a normal density function $p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\}$. By investigating different values for σ, assuming $(\Sigma_{ij}) = \delta_{ij} \sigma_j^2$, one can tune the impact of distances on the weighting of object hypotheses. (iv) Finally, the weighted distances are normalised and determine the confidence values of the individual object hypotheses.

Bottom-Up Geo-Indexed Object Recognition Distributions over object hypotheses from vision and geo-services are then integrated via Bayesian decision fusion. Although an analytic investigation of both visual and position-signal based information should prove statistical dependency between the corresponding random variables, we assume that it is here sufficient to pursue a naive Bayes approach for the integration of the hypotheses (in order to get a rapid estimate of the contribution of geo-services to mobile vision services) by $P(o_k | y_{i,v}, x_{i,g}) = p(o_k | y_{i,v}) \, p(o_k | x_{i,g})$, where the indices v and g mark information from the image (y) and positioning (x), respectively.

Top-Down Geo-Indexed Object Recognition Here, we first process the geo-service in order to receive a distribution over object hypotheses that is input to attentive object recognition. The recognition method is then primed to reject from consideration all those local (i-SIFT; see above) descriptors that are labelled with hypotheses of negligible confidence in the output of the geo-service. Hence the feature space underlying the nearest-neighbor voting procedure contains only pre-selected prototypes, which are preferred unless they lie outside a pre-determined distance threshold in feature space. The resulting distribution over object hypotheses can again be fused with the distribution from geo-services in order to receive a distance-based weighting of object hypotheses.
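The geo-hypothesis extraction (steps i–iv) and the naive-Bayes fusion can be summarised in a few lines. The sketch below is our own; σ and the isotropic covariance are the assumptions to tune, and the Gaussian normalization constant is dropped since it cancels in the final normalization.

```python
# A sketch of the geo-hypothesis extraction and naive-Bayes decision fusion
# described above, assuming an isotropic covariance Sigma = sigma^2 * I.
import numpy as np

def geo_hypotheses(user_pos, poi_positions, sigma):
    """Normal-density weighting of the distances to the points of interest
    inside the geo-focus, normalised to a distribution over hypotheses."""
    d2 = ((poi_positions - user_pos) ** 2).sum(axis=1)   # squared distances
    w = np.exp(-0.5 * d2 / sigma ** 2)
    return w / w.sum()

def fuse(p_vision, p_geo):
    """Naive-Bayes fusion: P(o_k | y, x) proportional to p(o_k|y) p(o_k|x)."""
    joint = p_vision * p_geo
    return joint / joint.sum()

# Example: three candidate buildings, vision slightly prefers the wrong one,
# the geo-service corrects the decision.
p_geo = geo_hypotheses(np.array([0.0, 0.0]),
                       np.array([[10.0, 0.0], [200.0, 50.0], [30.0, 40.0]]),
                       sigma=100.0)
print(fuse(np.array([0.35, 0.40, 0.25]), p_geo))
```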
4 EXPERIMENTS
The overall goal of the experiments was to determine and quantify the contribution of geo-services to object recognition in urban environments and to compare the bottom-up and top-down approaches in the AMI. The performance in the detection and recognition of objects of interest in the query images with respect to a given reference image database and a given methodology (TSG-20 [4]) was compared to the identical processing but using geo-information and information fusion for the integration of object hypotheses.

User Scenario and Constraints In the application scenario, we imagine a tourist equipped with a mobile device with built-in GPS. He can send image based queries to a server using UMTS or WLAN based connectivity. The server performs geo-indexed object recognition and is expected to respond with tourist-relevant annotation if a point of interest is identified. In the experiments we used an ultra-mobile PC (Sony Vaio UMPC VGN-UX1XN) with 1.3 Mpixel image captures. Reference imagery [4] with 640 × 480 resolution of the building objects of the TSG-20 database2 was captured with a camera-equipped mobile phone (Nokia 6230), containing changes in 3D viewpoint, partial occlusions, scale changes by varying distances of exposure, and various illumination changes. For each object we selected 2 training images, taken with a viewpoint change of ≈ ±30° and at a similar distance to the object, to determine the i-SIFT based object representation. Two additional views
2
http://dib.joanneum.at/cape/TSG-20/
Figure 5. Comparison between bottom-up (blue/dark bars) and top-down approach (green/light bars) from (a) sample input images. Integration of object hypotheses from (b) vision and (c) geo-services into a (d) fused distribution demonstrates clear increases in the confidences of the correct object hypothesis and therefore a significant improvement in the performance of the mobile vision service (Fig. 6).
were taken for test purposes, giving 40 test images in total. For the evaluation of background detection we used a dataset of 120 query images containing only buildings and street sides without TSG-20 objects. Another dataset was acquired with the UMPC, consisting of seven images per TSG-20 object from different viewpoints; these images were captured on different days under different weather conditions.

Attentive Object Recognition In the first evaluation stage, each individual image query was evaluated for vision based object detection and recognition, then regarding the extraction of geo-service based object hypotheses, and finally with respect to Bayesian decision fusion of the individual probability distributions (Sec. 3). Detection is an important pre-processing step for recognition, e.g., to prevent geo-services from supporting confidences for objects that are not in the query image. Experiments on imagery including background data resulted in a TP rate of 89.2% and an FP rate of 20.1%, probably due to the bad sensor quality. However, once a query image is attributed to the object category, geo-indexed object recognition boosts the performance in finding correct hypotheses compared to using vision alone. Fig. 5 depicts sample query images associated with the corresponding distributions over object hypotheses from vision, geo-services, and information fusion. The results demonstrate significant increases in the confidences of the correct object hypotheses. The evaluation of the complete database of image queries about TSG-20 objects (Fig. 6) proves a decisive advantage of taking geo-service based information into account over purely vision based object recognition, in particular using the top-down approach. While vision based
recognition is on a low level (≈ 84%), an exponentially weighted spatial enlargement of the scope of object hypotheses with geo-services increased the recognition accuracy up to ≈ 96%. With increasing σ an increasing number of object hypotheses are taken into account for information fusion, and the performance finally drops to the vision based recognition performance (uniform distribution of the geo-service based object hypotheses).

Figure 6. (a) Performance comparison between geo-service based hypotheses (Geo), purely vision based recognition (OR), bottom-up processing with information fusion (OR+GEO), top-down processing of attentive recognition without (R+OR) and with post-processing using Bayesian decision fusion (R+OR+GEO). (b) Geo-indexed object recognition involves only a fraction of hypotheses and reduces computing time.

5 CONCLUSION

In this work we propose the AMI, which enables bottom-up and top-down cross-modal information processing. We take advantage of geo-contextual information for the improvement of mobile vision services in urban scenarios, such as visual object recognition of tourist sights. We argued that geo-information provides a focus on the local object context that enables a meaningful selection of expected object hypotheses and therefore improves the overall performance of urban object recognition. We proposed a methodology based on Bayesian decision fusion that integrates distributions over object hypotheses from both cues, i.e., visual information and position estimate. We performed experiments on a representative image data set and proved a significant improvement in performance when using geo-services. In future work we will further exploit the concept of the AMI by integrating different context information, such as visual context or semantic segmentation, in a probabilistic framework.

ACKNOWLEDGEMENTS

This work is supported in part by the European Commission funded project MOBVIS under grant number FP6-511051 and by the FWF Austrian National Research Network on Cognitive Vision under subproject S9104-N04.

REFERENCES
[1] Leonardo Bonanni, Chia-Hsun Lee, and Ted Selker, 'Attention-based design of augmented reality interfaces', in CHI '05: CHI '05 Extended Abstracts on Human Factors in Computing Systems, pp. 1228–1231, New York, NY, USA, (2005). ACM.
[2] James L. Crowley, Joëlle Coutaz, Gaëtan Rey, and Patrick Reignier, 'Perceptual Components for Context Aware Computing', in UBICOMP 2002, International Conference on Ubiquitous Computing, Göteborg, Sweden, (September 2002).
[3] Anind K. Dey and Gregory D. Abowd, 'Towards a Better Understanding of Context and Context-Awareness', in Proceedings of the CHI 2000 Workshop on "The What, Who, Where, When, Why and How of Context-Awareness", (2000).
[4] Gerald Fritz, Christin Seifert, and Lucas Paletta, 'A Mobile Vision System for Urban Object Detection with Informative Local Descriptors', in Proc. IEEE 4th International Conference on Computer Vision Systems, ICVS, New York, NY, (January 2006).
[5] B. Hofmann-Wellenhof, H. Lichtenegger, and J. Collins, Global Positioning System: Theory and Practice, Springer-Verlag, Vienna, Austria, 2001.
[6] D. Lowe, 'Distinctive image features from scale-invariant keypoints', International Journal of Computer Vision, 60(2), 91–110, (2004).
[7] P. Luley, L. Paletta, A. Almer, M. Schardt, and J. Ringert, 'Geo-services and computer vision for object awareness in mobile system applications', in Proc. 3rd Symposium on LBS and Cartography, pp. 61–64. Springer, (2005).
[8] Raphaël Marée, Pierre Geurts, Justus Piater, and Louis Wehenkel, 'Decision trees and random subwindows for object recognition', in ICML Workshop on Machine Learning Techniques for Processing Multimedia Content (MLMM 2005), (2005).
[9] K. Mikolajczyk and C. Schmid, 'A performance evaluation of local descriptors', in Proc. Computer Vision and Pattern Recognition, CVPR 2003, Madison, WI, (2003).
[10] Stepan Obdrzalek and Jiri Matas, 'Sub-linear indexing for large scale object recognition', in Proceedings of the British Machine Vision Conference, volume 1, pp. 1–10, (2005).
[11] Albrecht Schmidt and Kristof Van Laerhoven, 'How to build smart appliances', IEEE Personal Communications, 66–71, (2001).
[12] C. Seifert, G. Fritz, L. Paletta, and H. Bischof, 'Learning to focus attention on discriminative regions for object detection', in Proc. European Conference on Artificial Intelligence, ECAI 2004, pp. 932–936, (2004).
[13] H. Shao, T. Svoboda, and L. van Gool, 'HPAT indexing for fast object/scene recognition based on local appearance', in Proc. International Conference on Image and Video Retrieval, CIVR 2003, pp. 71–80, Chicago, IL, (2003).
[14] Roel Vertegaal, 'Attentive User Interfaces', Communications of the ACM, 46(3), 30–33, (2003).
[15] T. Yeh, K. Tollmar, and T. Darrell, 'Searching the web with mobile images for location recognition', in Proc. IEEE Computer Vision and Pattern Recognition, CVPR 2004, pp. 76–81, Washington, DC, (2004).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-606
Learning Functional Object-Categories from a Relational Spatio-Temporal Representation

Muralikrishna Sridhar and Anthony G Cohn and David C Hogg 1

Abstract. We propose a framework that learns functional object-categories from spatio-temporal data sets such as those abstracted from video. The data is represented as one activity graph that encodes qualitative spatio-temporal patterns of interaction between objects. Event classes are induced by statistical generalization, the instances of which encode similar patterns of spatio-temporal relationships between objects. Equivalence classes of objects are discovered on the basis of their similar role in multiple event instantiations. Objects are represented in a multidimensional space that captures their role in all the events. Unsupervised learning in this space results in functional object-categories. Experiments in the domain of food preparation suggest that our techniques represent a significant step in unsupervised learning of functional object categories from spatio-temporal patterns of object interaction.
1 Introduction

Children learn about the world around them by observing and participating in activities that engage them in the course of everyday life. One aspect of learning activity models involves acquiring notions of what objects mean based on the function they fulfill in activities. Functional categories and taxonomies of objects are naturally acquired by humans in the process of observing object behaviour and using objects accordingly. An important step toward unsupervised learning of activity models is to learn an analogous model of functional object categories purely by observing behaviour. In this work, we represent the behaviour of objects involved in an activity in terms of an activity graph, which captures qualitative spatio-temporal patterns of interaction between these objects. We search for frequent similar subgraph instances and generalize these by variablizing. These are our event classes, the instances of each event class encoding a similar pattern of spatio-temporal relationships between their respective object instances. We then learn object categories by clustering in an object space, where the similarity between objects is measured by whether they play a similar role across the event instances of each event class; e.g., a set of objects, even though different in appearance, may tend to play a similar role in events such as washing, cutting and cooking, as opposed to others that do not play such a role in these events. By observing multiple instances of such event classes that have the same event role for this set of objects, it is natural to form a category corresponding to what we refer to as vegetables. Through our experiments we demonstrate that, using our framework, it is possible to learn semantically meaningful functional object categories and a taxonomy purely by observing object behaviour.
1
School of Computing, University of Leeds, Leeds, UK, email:{krishna,agc,dch}@comp.leeds.ac.uk. This work was funded under EPSRC grant EP/D061334/1.
In section 3 we show how functional object categories can be learned from event classes. The rest of the paper describes a novel procedure for inducing event classes from video input.
2 Related Work

Much previous work has focused on supervised learning of object classes, either based on the appearance of the object itself [9] or by recognizing contextual cues such as activities associated with objects [8] in order to locate and recognize objects. By contrast, unsupervised learning of objects can be divided into two stages, the first being object discovery, e.g. the discovery of blobs that are candidates for objects in video. The second stage is object class learning, which involves automatically categorizing these blobs into object classes. Early work on object discovery [6] formed candidate objects by grouping pixels with similar temporal signatures, constructed by recording colour (RGB) values for stable intervals during which objects arrive, stay and depart from a region. In [7], candidate objects are obtained by first over-segmenting the images in a video and, after extracting image features for these segments, grouping rigidly moving features into potential objects. Both object discovery and class learning are performed simultaneously from a collection of static images in [5] in two steps. First, multiple segmentations of each image are produced by varying the parameters of the normalized cut technique, under the assumption that each object instance is correctly segmented by at least one segmentation. Then object classes – groups of correctly segmented objects that are coherent in a large set of candidate segments – are learned. Another approach [1] obtains a hierarchy of object classes for static scenes by grouping image features which spatially co-occur across images of the same scene under the same leaf of the hierarchy. In this manner, the technique learns to identify candidate objects such as keyboards, while also learning higher-level object classes such as a desk area (consisting of a computer, desk, etc.). In this work we perform object discovery by first over-segmenting the video in terms of colour patches and then grouping spatially cohesive and continuous coloured blobs to discover a candidate set of objects. We perform object class learning by clustering in an object space, where the similarity between objects is based on similar spatio-temporal behaviour (specifically object interactions) in scenes. Recent work on event learning [3, 4] aims at learning activity/event classes given a sequence of primitive events, where the primitive events are defined and recognized a priori. In [2] a relational representation language is introduced for defining temporal events, and algorithms for learning these definitions from video output are described. In this work, we introduce a generic definition of events, in terms of graphs, that captures changing spatio-temporal
Figure 1. Lattice for general to specific object learning.
relationships between discovered objects. We show how this representation enables event mining and object learning.
3 Object Learning

Assume the existence of a set of event classes $F(\bar{X})$, where $\bar{X}$ is a sequence of object variables in some canonical ordering, between which some set of spatio-temporal relationships hold and which, when instantiated, yields a set of event instances. The event classes $F_i(\bar{X}) = F_i(X_1, ..., X_k, ..., X_m)$ in general have multiple event instances in the corpus, so that all these instances encode the same set (or, more generally, a similar set) of spatio-temporal relationships between their objects. This induces a natural mapping between objects corresponding to each object variable $X_k$ for the event instances of an event class. Given a corpus of such instances, we show, using an example, how to induce functional object categories for the set of objects present in these instances. The event classes could be handcrafted manually through knowledge engineering techniques, or, as we describe in later sections, could be induced from a video by an event learning procedure. Let $F(X_1, X_2, X_3)$ be an example event class that represents events such as "$X_2$ being lifted away from $X_3$ by $X_1$". The example in fig. 2(c) is one such event instance ($F(h_1, b_1, p_1)$) of the event class F, with object instances $h_1, b_1, p_1$ having IDs 3, 4 and 6 respectively. Let us suppose that two other instances $F(h_1, b_2, p_1)$, $F(h_1, b_3, p_2)$ of the same class F had been observed in the scene. A lattice, as shown in fig. 1, is grown from event instances at the bottom level (level 3) by generalizing exactly one argument position to a variable at each successive level. We then search for equivalence classes of objects from general to specific by traversing down this lattice, using the following procedure. For every node of each level l in the lattice, the procedure involves searching for sets of nodes at level l + 1, where each set is formed by substituting more than one object instance for the same variable $X_k$ of that node at level l. Applying this procedure at level 0 of the lattice, we get two such sets at level 1 (shaded with two colours): $\{F(X_1, b_1, X_3), F(X_1, b_2, X_3), F(X_1, b_3, X_3)\}$, obtained by substituting for $X_2$ with $b_1, b_2, b_3$, and
$\{F(X_1, X_2, p_1), F(X_1, X_2, p_2)\}$, obtained by substituting for $X_3$ with $p_1, p_2$ respectively. As the substituted constants $\{b_1, b_2, b_3\}$ and $\{p_1, p_2\}$ play the same roles (as the variables $X_2$ and $X_3$ respectively) for the event class F, we say that F has induced event roles for the instances of the variables $X_2$ and $X_3$, resulting in the equivalence classes $\{b_1, b_2, b_3\}$ and $\{p_1, p_2\}$ respectively. We now show that, by applying the same procedure one level below (level 1) in the lattice, we obtain a more specific event role for the specific event of objects being placed on a certain plate ($p_1$). The procedure applied at level 1 results in a set of nodes $\{F(X_1, b_1, p_1), F(X_1, b_2, p_1)\}$ at level 2 (as shaded in fig. 1), obtained by substituting for $X_2$ in $F(X_1, X_2, p_1)$ with $b_1, b_2$ respectively. We say that the more specific event class $F(X_1, X_2, p_1)$ has induced a more specific event role for the variable $X_2$, resulting in an equivalence class of objects $\{b_1, b_2\}$, i.e. objects being put on plate $p_1$. By progressively traversing down the lattice using this procedure, it becomes possible to create event roles and corresponding equivalence classes $C_1 ... C_n$, from general to specific. Applying this idea, we produce a matrix of objects by equivalence classes, O, in which $O_{i,j}$ equals 1 if object i occurs in the equivalence class $C_j$ and 0 otherwise. As each equivalence class corresponds to an event role, the row vectors of this matrix summarize each object in terms of the role it plays in all the event-roles and thus induce a multidimensional object space. In this space, objects that have a similar role with respect to similar sets of events are expected to have a high similarity measure. We therefore perform k-means clustering, using a cluster partition index to determine k. Hierarchical clustering on these categories then yields an object taxonomy. In the next section, we show how event classes can be learned from video input; in section 6 the results of applying our object learning procedure are discussed.
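A small sketch of this categorisation step is given below; it builds the binary object-by-equivalence-class matrix O and clusters its rows. We use scikit-learn's KMeans for brevity (the paper determines k with a cluster partition index, which is omitted here), and the example roles are the ones induced above.

```python
# A sketch of the object-categorisation step: build the binary
# object-by-equivalence-class matrix O and cluster its row vectors.
import numpy as np
from sklearn.cluster import KMeans

def object_categories(objects, equivalence_classes, k):
    # O[i, j] = 1 iff object i occurs in equivalence class C_j.
    O = np.array([[1 if obj in cls else 0 for cls in equivalence_classes]
                  for obj in objects])
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(O)
    return {obj: int(lab) for obj, lab in zip(objects, labels)}

# Example with the roles induced above: {b1, b2, b3}, {p1, p2} and the more
# specific role {b1, b2} (objects put on plate p1).
cats = object_categories(
    ['h1', 'b1', 'b2', 'b3', 'p1', 'p2'],
    [{'b1', 'b2', 'b3'}, {'p1', 'p2'}, {'b1', 'b2'}],
    k=3)
print(cats)  # the b's, the p's, and h1 fall into separate clusters
```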
4 Activity Graphs from Video

Object discovery is performed by first over-segmenting the video in terms of colour patches and then grouping these into spatially continuous and cohesive blobs, which are a mix of noisy patches and potential objects. These blobs are given IDs, and their position and extent are recorded from the video. The spatio-temporal patterns in the entire video are represented using an activity graph. The spatial relationships between the bounding boxes of each pair of objects in every frame are mapped to a set of spatial primitives $\mathcal{S}$ = {D, S, T}: two objects are either spatially Disconnected (D) or connected through the Surrounds (S) or Touches (T) relationship2, illustrated in fig. 2(b). For each pair of objects, these spatial relationships hold during a time interval. In general, if $\{o_1, o_2, ..., o_n\}$ is the set of all the objects observed in the video, then for each pair $o_i, o_j$ a particular spatial relationship $r \in \mathcal{S}$ holds in each frame f, i.e. $holds(r(o_i, o_j), f)$. We are interested in maximal one-piece time intervals during which r holds between $o_i$ and $o_j$, which we refer to as episodes. We represent such episodes by a quadruple $E = \langle o_i, o_j, \tau, r \rangle$, where $|\{r : \exists f \in \tau \,.\, holds(r(o_i, o_j), f)\}| = 1$ and τ is a consecutive sequence of frames such that $\forall \tau' \, (\tau \subset \tau' \rightarrow |\{r : \exists f \in \tau' \,.\, holds(r(o_i, o_j), f)\}| > 1)$. We thus obtain the set of all episodes $\Delta = \{E_1, E_2, ..., E_m\}$ for all pairs of objects. The episodes labelled E1 − E20 in fig. 2(a) correspond to this set for the activity considered in this example.
This approach could clearly be applied with any set of spatial relations. Our simplified approach to video analysis is 2D; thus, using this set of spatial relations means, e.g., that an object o1 placed on an object o2 is represented as S(o1, o2) – these 3 relations have sufficed for our experiments.
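Episode extraction itself is a simple scan over the per-frame relations. The fragment below is our own illustration of the definition above: for one object pair it emits the maximal runs during which a single relation from {D, S, T} holds.

```python
# A sketch of episode extraction: scan the per-frame relation in {D, S, T}
# for one object pair and emit maximal runs of a constant relation.

def episodes(oi, oj, rel_per_frame):
    """rel_per_frame: list of 'D'/'S'/'T', indexed by frame number."""
    out, start = [], 0
    for f in range(1, len(rel_per_frame) + 1):
        # Close the current episode when the relation changes or frames end.
        if f == len(rel_per_frame) or rel_per_frame[f] != rel_per_frame[start]:
            out.append((oi, oj, (start, f - 1), rel_per_frame[start]))
            start = f
    return out

print(episodes('knife', 'butter', ['D', 'D', 'T', 'T', 'T', 'D']))
# [('knife', 'butter', (0, 1), 'D'), ('knife', 'butter', (2, 4), 'T'),
#  ('knife', 'butter', (5, 5), 'D')]
```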
Figure 2. (a) An activity. (b) Spatial and temporal primitives. (c) A subactivity of the activity in (a). (d) Level-0 activity graph for episodes E5 − E12 in (c). (e) Level-1 activity graph for episodes E1 − E20 in (a).
Having obtained all the episodes, we obtain a complete graph – which we call an activity graph – whose vertices represent the episodes and whose edges relate the time intervals corresponding to their respective episodes using Allen's temporal primitives, a set we denote $\mathcal{A}$. We call the complete graph encoding all temporal relationships between the intervals of E1 − E20 a level-0 activity graph for the activity in fig. 2(a). More formally, we have the activity graph $(V, E, \eta, \rho, \Delta, \mathcal{A})$, where the function $\eta : V \rightarrow \Delta$ maps the vertices $V = \{v_i\}$ to episodes in Δ and $\rho : E \rightarrow \mathcal{A}$ maps the directed edges between all pairs of vertices, $E : e_{ij} = \langle v_i, v_j \rangle$, to temporal relationships in $\mathcal{A}$. We require that η is a bijective mapping from the vertices to the set of episodes in the activity graph. The complete activity graph is too large to display here, and a typical activity graph is too complex to search in order to find event
classes3. Fig. 2(d) shows a subgraph of the level-0 activity graph for the episodes E5 − E12 depicted in fig. 2(c). Therefore, prior to searching for event classes, we use an attention mechanism to structure and simplify the level-0 activity graph, producing a level-1 activity graph. This is achieved by using a foreground attention mechanism (described below) to cluster episodes and forming a new graph structure over these clusters. Each cluster represents an atomic event, and we call the clusters of episodes together with their Allen relationships a unary event graph (unary EG). The graph whose nodes are unary event graphs and whose edges are Allen's temporal relationships between these nodes is the level-1 activity graph.
If we consider n = 10 objects and k as the average number of episodes per object pair in the video, which is usually $10^2$ even for scenes that last a minute, the activity graph results in a search space of $O(k^2 n^4)$, i.e. $O(10^8)$.
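The edge labels of the activity graph are Allen's interval relations. The following compact sketch (standard definitions, not code from the paper) computes the relation between two frame intervals:

```python
# Allen's temporal relations between two intervals, used here to label
# activity-graph edges; the standard 13 relations, exhaustively cased.

def allen(a, b):
    """a, b: (start, end) intervals with start < end."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:  return 'before'
    if e2 < s1:  return 'after'
    if e1 == s2: return 'meets'
    if e2 == s1: return 'met-by'
    if s1 == s2 and e1 == e2: return 'equal'
    if s1 == s2: return 'starts' if e1 < e2 else 'started-by'
    if e1 == e2: return 'finishes' if s1 > s2 else 'finished-by'
    if s2 < s1 and e1 < e2: return 'during'
    if s1 < s2 and e2 < e1: return 'contains'
    # Remaining cases: proper interior overlap with distinct endpoints.
    return 'overlaps' if s1 < s2 else 'overlapped-by'

print(allen((26, 49), (54, 75)))  # 'before'
print(allen((26, 60), (54, 75)))  # 'overlaps'
```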
Foreground Attention Mechanism: We hypothesize that many activities can be conceived in terms of different foreground events, each of which involves interactions only between a subset of objects – the foreground objects – during different time periods. This idea can be intuitively explained using fig. 2(a), where the entire activity shown can be conceived in terms of three foreground events: (1) the left hand scooping some butter with a knife, (2) the right hand removing the bread from the plate, and (3) the left hand spreading butter on the bread with a knife, while the right hand holds the bread. As long as {left hand, knife, butter} and {right hand, plate, bread} are disconnected, we have two sets of foreground objects, {1, 2, 5} and {3, 4, 6}, between frames 26 and 49. When the knife and the bread start to interact, the foreground set changes to the set of IDs {1, 2, 3, 4}, in which the butter and plate with IDs 5 and 6 are not included (frames 54-75). Three periods and their corresponding sets of episodes {E1 − E4}, {E5 − E12}, {E13 − E20} (as shown in the parallel lines below the frames) for the three foreground events are thus obtained. The next two paragraphs describe how foreground events are detected in general and may be omitted on a first reading.

We look for spatial changes between a pair of objects. For each such pair of primary foreground objects o1, o2 at some frame f, we find the set Ω of all moving objects which are connected (i.e. T or S) to o1 or o2, or which are connected to o1 or o2 indirectly via another moving object which is connected to o1 or o2 (directly or indirectly). The set Ω is propagated forwards to some frame f2 and backwards to some frame f1 from f until such time that one of the objects in Ω − {o1, o2} (the secondary foreground objects) changes its spatial relation to some other object in Ω to D (unless o1 and o2 are connected at that time). The entire time from f1 to f2 is termed a period, during which a foreground event involving o1 and o2 occurs, involving all the foreground objects Ω. The intuition behind this definition is that a spatial change focuses attention on a pair of objects (at least one of which must be moving, since a change has occurred) and on all the objects which are intimately connected to the two, and groups all the interactions involving the primary objects together until such time as one of the secondary objects becomes fully disconnected from the group of objects (which then terminates this particular set of foreground objects). Note that it is possible, depending on the choice of primary objects o1 and o2, for there to be multiple temporally overlapping foreground events involving shared objects (though this has not occurred in the videos we have analysed so far).

For each foreground event, we create a unary event graph (unary EG) restricted to the foreground objects of the foreground event and to the temporal extent of the foreground event. Each unary EG endures for a period P and can be represented by the unary EG $(V, E, \eta, \rho, \Delta_P, \mathcal{A})$ over the episodes of the time period P. The three unary EGs for the activity in fig. 2(a) are shown as the nodes of the level-1 activity graph in fig. 2(e). Unary EGs (which are single nodes of the level-1 activity graph) typically capture simple events, such as removing a slice of bread from a plate. In the next section we show how to generalize unary events to unary event classes, and then how to form n-ary event classes, which are compound event classes composed of unary event classes.
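The transitive grouping of connected moving objects described above can be sketched as a graph traversal; the fragment below is our simplified illustration for a single frame, with all names our own.

```python
# A sketch of forming the foreground set Omega for one frame: starting from
# the primary pair, collect all moving objects connected to it, directly or
# transitively, through T/S relations.

def foreground_set(o1, o2, moving, connected):
    """connected: dict object -> set of objects it Touches or Surrounds."""
    omega, frontier = {o1, o2}, [o1, o2]
    while frontier:
        o = frontier.pop()
        for n in connected.get(o, ()):
            if n in moving and n not in omega:
                omega.add(n)
                frontier.append(n)
    return omega

# Example (object IDs as in fig. 2(a)): left hand 1, knife 2, butter 5.
print(foreground_set(1, 2, moving={1, 2, 5}, connected={2: {5}}))  # {1, 2, 5}
```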
Instances of n-ary event classes are n-ary events which are composed of n unary EGs of the level-1 activity graph and which represent complex events such as the entire activity depicted in fig. 2(a,c).
5 Event Learning

The activity graph consists of many individual events; these can be similar in that they have similar spatio-temporal relationships between their constituent objects. In order to formalize the idea of an event class that captures these regularities, independent of the actual objects involved, we first introduce a generalized version of a unary event graph. We then show how n-ary event classes can be formed, consisting of individual unary event classes. To generalize events to event classes, we first consider a unary EG $\phi = (V, E, \eta, \rho, \Delta_P, \mathcal{A})$ for a time period P. Instead of object instances $o_i \in \Omega$ and intervals $\tau \in \Lambda$, consider sets of object and interval variables $X = \langle X_O, X_T \rangle$, so that $O_i \in X_O$ and $T \in X_T$.4 We can now generalize the set of episodes $E \in \Delta_P$ to $E_X \in \Delta_X$, where $\Delta_X$ is a set such that $E_X \in \Delta_X$ if and only if $E_X = \langle O_1, O_2, T, r \rangle$ where $O_1 \in X_O$, $O_2 \in X_O$, $T \in X_T$ and $r \in \mathcal{S}$. We use the generalized set of episodes to formalize event classes by first defining a unary event class graph (unary ECG), which captures a common pattern of spatio-temporal relationships amongst a set of similar unary EGs (instances) in a generic form.

Definition Let $\phi = (V, E, \eta, \rho, \Delta_P, \mathcal{A})$ be a unary EG of the transformed activity graph. Then $\gamma = (V', E', \eta', \rho', \Delta_X, \mathcal{A})$ is a unary event class graph (unary ECG) of φ, or we say that γ θ-generalizes φ, if $\exists \theta = \theta_O \cdot \theta_T$, where $\theta_O : X_O \rightarrow \Omega$ and $\theta_T : X_T \rightarrow \Lambda$, such that γ is isomorphic to φ under the substitution θ, i.e.
1. $\{\eta'(v')\theta : v' \in V'\} = \{\eta(v) : v \in V\}$.
2. $\{\rho'(e'_{ij}) : e'_{ij} = (v'_i, v'_j) \in E'\} = \{\rho(e_{ij}) : e_{ij} = (v'_i\theta, v'_j\theta) \in E\}$.

We require that a unary ECG generalizes at least λ unary EGs, i.e. instances must occur frequently. We now extend the idea of a unary event class graph to an n-ary event class graph (n-ary ECG) composed of unary ECGs. An n-ary ECG is simply a graph with unary ECGs $\gamma_1, ..., \gamma_n$, n ≥ 2, as its vertices, whose edges relate the time periods $P_i$ and $P_j$ corresponding to $\gamma_i$ and $\gamma_j$ by Allen's temporal primitives $\mathcal{A}$. An n-ary ECG Γ whose vertices are the set $\{\gamma_1, ..., \gamma_n\}$ θ-generalizes an n-ary EG Φ with vertices $\{\phi_1, ..., \phi_n\}$ if each $\gamma_i$ θ-generalizes the corresponding $\phi_i$ and the temporal relationship between any $\phi_i, \phi_j \in \Phi$ is the same as for the corresponding $\gamma_i, \gamma_j \in \Gamma$. An n-ary ECG represents an n-ary event class if it generalizes at least λ n-ary EGs. We model λ as an exponentially decreasing function of n in order to allow larger n-ary ECGs to θ-generalize fewer n-ary EGs. Using these definitions, we finally formalize event classes as maximal event class graphs. We define a maximal event class graph (MECG) as an event class graph which generalizes some set of EGs such that no other ECG which contains it generalizes this set, i.e. every MECG generalizes a set of EGs which are not generalized by some larger ECG. The procedure for computing MECGs involves two stages. In the first stage, unary ECGs with a statistically significant number of EG instantiations are found. In the second stage, these unary ECGs are iteratively used to build larger and larger ECGs (with a statistically significant number of instantiations), until a final set of MECGs is obtained. In this manner we discover event classes as MECGs from the level-1 activity graph. Having found all the MECGs, we give them names $F_1(\bar{X}), ..., F_k(\bar{X})$, where $\bar{X}$ is a sequence of variables in the
4 Note that we use capitalized/bold letters for variables and small letters for instances.
Figure 3. A hierarchy of object categories.
MECGs, in some canonical ordering of nodes in each MECG. In Section 3, where we were purely concerned with inducing an object taxonomy from the event definitions, we ignored the internal structure of an MECG and used just these F̄_i(X̄), which can be defined as predicates from each of the MECGs.
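To make the θ-generalization test concrete, the following is a minimal Python sketch under an illustrative encoding that is not from the paper: an event graph is a pair (labels, rel) mapping vertices to their labels η(v) and ordered vertex pairs to their relation ρ(e), and a substitution θ is searched for by brute force.

```python
from itertools import permutations

def theta_generalizes(ecg, eg):
    """Return a substitution theta (ECG variable -> EG instance) under which
    the ECG is isomorphic to the EG, or None if no such substitution exists."""
    ecg_labels, ecg_rel = ecg
    eg_labels, eg_rel = eg
    variables, instances = list(ecg_labels), list(eg_labels)
    if len(variables) != len(instances):
        return None
    for image in permutations(instances):        # brute-force search over bindings
        theta = dict(zip(variables, image))
        labels_ok = all(ecg_labels[v] == eg_labels[theta[v]] for v in variables)
        edges_ok = {(theta[u], theta[v]) for (u, v) in ecg_rel} == set(eg_rel)
        rels_ok = edges_ok and all(
            ecg_rel[(u, v)] == eg_rel[(theta[u], theta[v])] for (u, v) in ecg_rel
        )
        if labels_ok and rels_ok:
            return theta
    return None
```

Exact isomorphism search of this kind is exponential in the number of vertices; the similarity metrics mentioned as future work in Section 7 would relax exactly this test.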
6 Experiments
We demonstrate our framework using a video taken with a toy (plastic) kitchen setup. We have chosen a constrained environment for the moment, in order to minimize the complexities arising in a real kitchen as a result of cluttered backgrounds, flickering lights, shiny surfaces, multiple shadows, etc. We have further simplified the problem by focusing only on the hand (not the entire person) along with the other objects in the kitchen scene, and by taking care that the actions of the cook do not create complications arising, for instance, from full occlusion of the objects involved. However, despite such simplifications, a large number of noisy patches are produced by the object discovery module, making the learning problem challenging. The video is taken with a static overhead camera that focuses on the scene. The scene consists of hands simulating the preparation of sandwiches, hot drinks, cutting vegetables and cooking vegetable dishes, lasting around 10 minutes. The video contains exactly one instance of each of these preparations. After applying event and object learning, we obtain the object hierarchy in fig. 3. While our procedure outputs a hierarchy of object IDs, we replace these labels with the corresponding objects from the video in order to visualize the results. It can be observed that the proposed framework has been able to differentiate between broader categories such as food items and containers and, interestingly, to separate noisy patches from all other objects. Finer levels of granularity are captured in the grouping which separates a slice of white bread from another group consisting of vegetables. A distinction between plates, pans and spoons is also clear from the hierarchy. It can therefore be concluded that the learned categories and taxonomy are intuitive and correspond to a functional classification of objects.
7 Summary and Future Work A framework for learning object and event categories from video has been introduced. This framework offers a general way of representing activities in terms of spatio-temporal graphs. Techniques for
mining events from this graph and then learning functional object categories from these events have been proposed in this work. Our experiments show that our framework offers a promising approach toward learning functional categories. In the future, we plan to extend this framework in several directions. At present, event generalisation requires exact graph isomorphism. We plan to extend event classes to generalize a larger set of event instances by experimenting with similarity metrics between our event graphs. This will allow our approach to exploit a greater variety of video input to learn event and object taxonomies, and to cope better with noise (which might also intervene during an event instance). In contrast to almost all work in object recognition, which learns categories from perceptual features, we have tackled the little-researched problem of learning categories from function. However, there is clearly scope to use the learned functional categories to supervise visual-appearance-based object learning.
REFERENCES
[1] D. Parikh and T. Chen, ‘Unsupervised learning of hierarchical semantics of objects (hSOs)’, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR ’07), 1–8, (2007).
[2] A.P. Fern, R.L. Givan, and J.M. Siskind, ‘Specific-to-general learning for temporal events with application to learning event definitions from video’, Journal of Artificial Intelligence Research (JAIR), 17, 379–449, (2002).
[3] S. Hongeng, ‘Unsupervised learning of multi-object event classes’, in Proc. 15th British Machine Vision Conference (BMVC ’04), London, UK, 487–496, (2004).
[4] R. Hamid, S. Maddi, A. Bobick, and I. Essa, ‘Structure from statistics: unsupervised activity analysis using suffix trees’, in Proc. Int’l Conf. on Computer Vision, (2007).
[5] B.C. Russell, W.T. Freeman, A.A. Efros, J. Sivic, and A. Zisserman, ‘Using multiple segmentations to discover objects and their extent in image collections’, in CVPR ’06: Proc. of Comp. Soc. Conf. on Computer Vision and Pattern Recognition, 1605–1614, (2006).
[6] B.C.S. Sanders, R.C. Nelson, and R. Sukthankar, ‘A theory of the quasi-static world’, in Proc. 16th Int’l Conf. on Pattern Recognition (ICPR ’02), (2002).
[7] T. Southey and J.J. Little, ‘Object discovery using motion, appearance and shape’, in Cognitive Robotics Workshop, AAAI, (2006).
[8] M. Veloso, P. Rybski, and F. von Hundelshausen, ‘Focus: A generalized method for object discovery for robots that observe and interact with humans’, in Proc. Conf. on Human-Robot Interaction, (2006).
[9] W. Zhao, R. Chellappa, P.J. Phillips, and A. Rosenfeld, ‘Face recognition: A literature survey’, ACM Computing Surveys, 399–458, (2003).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-611
Sequential spatial reasoning in images based on pre-attention mechanisms and fuzzy attribute graphs Geoffroy Fouquier1 and Jamal Atif2 and Isabelle Bloch1 Abstract. Spatial relations play a crucial role in model-based image recognition and interpretation due to their stability compared to many other image appearance characteristics, and graphs are well adapted to represent such information. Sequential methods for knowledge-based recognition of structures require defining the order in which the structures have to be recognized, which can be expressed as the optimization of a path in the representation graph. We propose to integrate pre-attention mechanisms into the optimization criterion, in the form of a saliency map, by reasoning on the saliency of spatial areas defined by spatial relations. Such mechanisms extract knowledge from an image without prior object recognition and do not require any a priori knowledge about the image. Therefore, pre-attentional mechanisms provide useful knowledge for object segmentation and recognition. The derived algorithms are applied to brain image understanding.
1 Introduction
Sequential segmentation is a useful approach for knowledge-based object recognition in which objects are segmented in a predefined order, starting from the simplest object to segment and ending with the most difficult one. The segmentation and recognition of each object is then based on a generic model of the scene and relies on the previously recognized objects. This approach, as developed e.g. in [3], requires defining the order according to which the objects have to be recognized, and the choice of the most appropriate order is one of the difficulties raised by this approach. Here, the recognition and the segmentation of the objects of the scene are performed at the same time. The sequence of objects may be expressed as a path in a graph, where each node of the graph represents an object. In this paper, we propose a new approach to this problem, integrating information extracted from the data based on the notion of saliency. The visual system is usually modeled using pre-attentional and attentional mechanisms. Basically, the purpose of the pre-attentional step is to guide the attentional step to select salient parts of the scene. This selection allows the attentional process to focus only on the salient part (object or region) and thus reduces the computational cost of this mechanism. We can easily draw some similarities between the iterative segmentation scheme and the visual system: the pre-attentional mechanism could correspond to the selection of the next object to segment, and the attentional mechanism to the segmentation of an object of the scene (and its interpretation). Thus the iterative segmentation framework is viewed as a scene exploration and analysis process.
1 TELECOM ParisTech (ENST), CNRS-LTCI UMR 5141, Paris, France, email: {geoffroy.fouquier, isabelle.bloch}@enst.fr
2 IRD-Cayenne/UAG, email: atif@cayenne.ird.fr
Our contribution is to introduce a pre-attentional mechanism into the optimization of the segmentation path for a sequential image segmentation process. This article is organized as follows. We first present in Section 2 how to represent the knowledge composing the generic model of the scene. In Section 3, a brief overview of the modeling of the visual system is given, as well as a presentation of the pre-attentional mechanism used in the following sections. We then present in Section 4 a way to evaluate the information provided by this pre-attentional mechanism, and Section 5 presents a way to integrate the saliency map into the segmentation process. Experiments and results on an example of brain image understanding are presented in Section 6, and Section 7 draws some conclusions.
2 Knowledge representation
Graphs are well adapted to represent generic knowledge, such as spatial relations between the objects of a scene. In the sequential segmentation framework, the generic model of the scene is modeled as a graph where each vertex represents an object of the scene and each edge represents one or more spatial relations between two objects. We introduce the following notations. Let Σ_V, Σ_E be the sets of vertex labels and edge labels, respectively. Let V be a finite nonempty set of vertices, L_v be a vertex interpreter L_v : V → Σ_V, E be a set of ordered pairs of vertices called edges, and L_e be an edge interpreter L_e : E → Σ_E. Then G = (V, L_v, E, L_e) is a labeled graph with directed edges. For v ∈ V and e ∈ V × V, δ(v, e) is a transition function that returns the vertex v′ such that e = (v, v′). For v ∈ V, A(v) returns the set of edges adjacent to v. Finally, p = (v_1, v_2, ..., v_n) is a path of length n, labeled as l_p = (v_1, e(v_1, v_2), v_2, ..., v_n). A knowledge base KB defines all the spatial relations existing between vertices in the graph: KB = {v_i R v_j : v_i, v_j ∈ V, R ∈ R} and e = (v_1, v_2) ∈ E ⇐⇒ ∃R ∈ R, (v_1 R v_2) ∈ KB, where R is the set of relations. In the following, we use fuzzy representations of the spatial relations, since they are appropriate to model the intrinsic imprecision of several relations (such as “close to”, “behind”, etc.), the potential variability (even if it is reduced in normal cases) and the flexibility required for spatial reasoning [2]. Here, the representation of a spatial relation is computed as the region of space in which the relation R to an object A is satisfied. The membership degree of each point corresponds to the satisfaction degree of the relation at this point. Figure 2 (b,c) presents an example of a structure and the region of space corresponding to “to the right of” this structure. A directed edge between two vertices v_1 and v_2 carries at least one spatial relation between these objects. An edge interpreter associates to each edge a fuzzy set μ_Rel, defined in the spatial domain S,
representing the conjunctive merging of all the representations of the spatial relations carried by this edge to a reference structure. Since there is at least one spatial relation carried by an edge, μ_Rel cannot be empty. Let μ_Ri^e, i = 1, ..., n_e, be the n_e relations carried by an edge e. Then μ_Rel^e is expressed as: μ_Rel^e = ⊤_{i=1..n_e}(μ_Ri^e), with ⊤ a t-norm (fuzzy conjunction) [4]. Since objects are sequentially segmented, we propose to focus attention by using the known spatial relations with previously segmented objects. The set of target objects is filtered as the set of unsegmented objects which have a spatial relation with a previously segmented object. The set of segmented objects is likewise filtered as the set of objects which have a spatial relation with an unsegmented object of interest. The “search area” is thus defined by merging the representations of the known spatial relations between previously segmented objects which have an edge in the graph with the target object. We now describe the modeling of the main relations that we use: distances and directional relative positions. A distance relation can be defined as a fuzzy interval f of trapezoidal shape on R+. A fuzzy subset μ_d of the image space S can then be derived by combining f with a distance map d_A to the reference object A: ∀x ∈ S, μ_d(x) = f(d_A(x)), where d_A(x) = inf_{y∈A} d(x, y). The relation “close to” can be defined as a function of the distance between two sets: μ_close(A, B) = h(d(A, B)), where d(A, B) denotes the minimal distance between points of A and B: d(A, B) = inf_{x∈A, y∈B} d(x, y), and h is a decreasing function of d, from R+ into [0, 1]. We assume that A ∩ B = ∅. The relation of adjacency can likewise be defined as a “very close to” relation, leading to a degree of adjacency instead of a Boolean value, making it more robust to small errors. Directional relations are represented using the “fuzzy landscape approach” [1]. A morphological dilation δ_να by a fuzzy structuring element ν_α representing the semantics of the relation “in direction α” is applied to the reference object A: μ_α = δ_να(A), where ν_α is defined, for x in S given in polar coordinates (ρ, θ), as: ν_α(x) = g(|θ − α|), where g is a decreasing function from [0, π] to [0, 1], and |θ − α| is defined modulo π. This definition extends to 3D by using two angles to define a direction. The example in Figure 2 (b,c) has been obtained using this definition. Other relations can be modeled in a similar way [2]. These models are generic, but the membership functions depend on a few parameters that have to be tuned for each application domain according to the semantics of the relations in that domain.
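As an illustration, here is a minimal sketch of the distance-based relation in Python/SciPy; the trapezoid parameters and the function name are placeholder choices, since the paper tunes the membership functions per application domain.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fuzzy_close_to(mask_A, d_plateau=5.0, d_max=20.0):
    """Fuzzy region 'close to A': mu_d(x) = f(d_A(x)), with f a decreasing
    trapezoidal profile (1 up to d_plateau, falling linearly to 0 at d_max)."""
    # d_A: for each point, the distance to the nearest point of A; A's points
    # are set to 0 so the Euclidean distance transform measures distance to A.
    d_A = distance_transform_edt(~mask_A.astype(bool))
    ramp = (d_max - d_A) / (d_max - d_plateau)
    return np.clip(np.where(d_A <= d_plateau, 1.0, ramp), 0.0, 1.0)
```

The directional relation “in direction α” can be sketched analogously by replacing the distance map with the angle |θ − α| between each point and the reference object, mapped through the decreasing function g.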
3 Saliency Maps
Among the pre-attentional mechanisms, we focus on the saliency map as defined by Koch and Ullman [6]. This mechanism allows selecting areas using some basic features easily computable on every type of image. Figure 2 presents a saliency map and its restriction around an object, which allows exploring the area of the image around the object. This approach uses three basic features: intensity, color and orientation. For each feature, the difference between a location and its immediate surroundings is computed. For intensity, this is the difference of contrast. For color, two color oppositions are studied: between red and green on the one hand, and between blue and yellow on the other hand. For orientation, four directions are studied with Gabor filters. Overall, seven features are considered. Nine scale spaces are created with dyadic Gaussian pyramids for each feature, and six maps are derived by center-surround difference between a fine scale c in {2, 3, 4} and a coarse scale of the pyramid
s = c+d, with d in {3, 4}. Finally, all maps corresponding to a same feature are normalized, and a conspicuity map per feature (the sum of all corresponding maps) is computed. Then the three conspicuity maps are merged with a weighted mean to produce the saliency map. Figure 1 presents an example of a saliency map.
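A rough sketch of the intensity channel of this computation follows; it is a simplification of the published scheme (the normalization step and the color and orientation channels are omitted), and the helper names are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def dyadic_pyramid(img, levels=9):
    """Gaussian pyramid: blur, then downsample by 2 at each level."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(zoom(gaussian_filter(pyr[-1], sigma=1.0), 0.5))
    return pyr

def center_surround(pyr, c, s):
    """Feature map: |fine level c minus coarse level s upsampled to c's shape|."""
    fine, coarse = pyr[c], pyr[s]
    up = zoom(coarse, [a / b for a, b in zip(fine.shape, coarse.shape)])
    h, w = min(fine.shape[0], up.shape[0]), min(fine.shape[1], up.shape[1])
    return np.abs(fine[:h, :w] - up[:h, :w])

# The six intensity maps: c in {2, 3, 4} and s = c + d with d in {3, 4};
# after normalization, their sum gives the intensity conspicuity map.
pyramid = dyadic_pyramid(np.random.rand(256, 256))   # stand-in image
intensity_maps = [center_surround(pyramid, c, c + d)
                  for c in (2, 3, 4) for d in (3, 4)]
```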
Figure 1. Lena and the corresponding saliency map (dark: not salient; bright: most salient parts).
This approach is a data-driven, bottom-up approach, and the only top-down connection is the occlusion (inhibition) of the most salient location. But more top-down connections are required to define proto-objects [7], a recently presented extension of the original method. In this case, the saliency map is computed as in the original method, but once the most salient location is detected, a feedback connection allows finding which conspicuity map, and then which feature map, produces (or contributes most to) this salient location. A proto-object is then defined as the connected component at the location of the highest value of the saliency map, on the map which produces it (a pixel belongs to the component if one of its neighbors is in the component and its value is higher than a threshold).
4 Evaluating saliency on manually segmented structures
The sequential segmentation framework with the optimized segmentation path described in [5] uses generic knowledge and a segmented database, and therefore cannot take into account the intrinsic segmentation difficulty of each object. These difficulties vary with the object features: shape, homogeneity, texture or boundaries, and image noise. Some generic rules could be constructed, e.g., that one object is more difficult to segment than another, but this kind of rule is not necessarily true for each image, even in a restricted application domain. We consider that saliency information is directly related to segmentation difficulty, because an object with a salient border will be much simpler to segment than an object with a less salient border. Therefore, we propose a methodology to derive the difficulty of segmentation from saliency information and to compare the areas of saliency corresponding to the previously segmented objects. The area of saliency for an object corresponds to the saliency map masked by the segmentation (a binary map) of this object and possibly its surroundings. Depending on the class of segmentation algorithms, we may not be interested in the same parts of the objects. If we consider an edge-based segmentation algorithm, then the most important area to take into account for the image segmentation is the border of the object. In this case, the interesting part of the object should be extracted, for example, as the dilated segmentation of the object, in order to take into account the surroundings of the border. In a region-based segmentation, the whole object is extracted depending on a homogeneity criterion. The saliency map is masked, in this case, by the extracted object.
Figure 2. (a) A slice of a 3D Magnetic Resonance Image. (b) Right lateral ventricle. (c) Fuzzy subset corresponding to the spatial relation “right of” (b). (d) A slice of the saliency map of (a). (e) Saliency around the ventricle (dark: not salient; bright: most salient parts).
Once the saliency for the surroundings of each object has been extracted, a histogram of the saliency map is computed for each object. Once normalized, this gives a distribution of the saliency for each object. We therefore propose to estimate the difficulty of segmentation by comparing the histograms of saliency. In our experiments, we use the energy of the histogram as the comparison criterion. The energy of a histogram H with N bins is computed as: energy(H) = Σ_{n=1}^{N} h(n)², where h is the function that counts the number of occurrences of value n in the saliency map. Figure 5 presents two histograms of several objects from two images. This methodology is not used for segmentation itself (here we are trying to get rid of the need for a previously segmented database), but only to study the saliency of the different objects and to exhibit the potential interest of this type of measure.
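For instance, a minimal sketch of this criterion in Python; the bin count and the masking convention are our illustrative choices.

```python
import numpy as np

def histogram_energy(saliency_map, region_mask, bins=256):
    """energy(H) = sum_n h(n)^2 over the normalized histogram of the saliency
    values inside the region (e.g., an object's dilated segmentation)."""
    values = saliency_map[region_mask.astype(bool)]
    h, _ = np.histogram(values, bins=bins, range=(0, 256))
    p = h / max(h.sum(), 1)          # normalize to a distribution
    return float(np.sum(p ** 2))
```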
5 Using saliency for image interpretation
Approaches relying on the shape of the target object, like in [5], make the assumption that the generic model is always valid, i.e., that all objects from the generic model are always present and no new object can be taken into account. Here, the exploration relies only on the previously recognized objects and not on the shape of the target object, which allows dealing with changes in the model. Image segmentation is seen as a scene exploration process, where only a small region of space is analyzed at a given time, i.e., objects are segmented individually. Moreover, the exploration of a new area of space uses the previously explored areas: the segmented objects are used to segment the remaining parts of the scene. The process is guided by a pre-attentional mechanism, here a saliency map, which indicates the most salient area of space in the search domain. This area is computed using the already known part of the scene and the spatial relations existing between these objects and the objects that are still to be found. Figure 3 presents the general scheme of the method. We first present how the graph is filtered to compute the search area, then the process of selecting the next object to segment. In the following, the original image is denoted by I. The vertices of the graph are divided into two disjoint groups: V = V_seg ∪ V_tar. At the beginning of the process, a first object is considered as known and segmented: V_seg = {v_init}. This object can be detected using saliency in the image, or other information (in brain imaging, for example, the lateral ventricle can be segmented using a completely different scheme). The recognition of an object thus amounts to moving a vertex from the set of target vertices to the set of segmented vertices, and the vertex to segment must be directly connected to the set of already segmented vertices. An iteration of the sequential segmentation is expressed as a function of the previously segmented objects V_seg, the chosen next object to segment v̂, the saliency map of the image sal_I, the original image I, and E_f, the spatial relations between the two sets of objects, already segmented and to be segmented, respectively:
V_seg^i = seqseg(V_seg^{i−1}, v̂, sal_I, I, E_f^{i−1})
where the superscript i denotes the iteration. Accordingly, the set of target vertices is filtered so as to keep only the vertices connected to the already segmented set of vertices. Likewise, the latter set is filtered to the subset of vertices connected by an edge to the set of target objects. The set of edges is filtered accordingly. The obtained subgraph forms a bipartite graph composed of both sets of known and target objects, and of the set of edges representing the spatial relations between the two groups of vertices:
V_fs = {v_1 ∈ V_seg | ∃v_2 ∈ V_tar, (v_1, v_2) ∈ E}
V_ft = {v_2 ∈ V_tar | ∃v_1 ∈ V_seg, (v_1, v_2) ∈ E}
E_f = {(v_t, v_s) | v_t ∈ V_ft, v_s ∈ V_fs}
For each edge e in E_f, the edge interpreter produces μ_Rel^e. The area of space of the search domain is defined as the merging of the supports of all edge representations given by the edge interpreter:
μ_sd = ⊥_{e∈E_f}(μ_Rel^e)
with ⊥ a t-conorm (fuzzy disjunction) [4]. The binary map corresponding to the search domain gives an area of space which includes the spatial locations of all the target objects (hence a disjunctive combination). Note that this spatial location could cover a large part of the image space, particularly if the only spatial relation between two objects is a directional relation. The search domain sd is simply defined as:
sd = support(μ_sd)
We now present the process of selecting a target vertex by analyzing the saliency in the search domain. The filtering of the graph gives two groups of vertices, V_fs and V_ft, and we have to choose in V_ft the next vertex (and so the object that the vertex represents) to recognize. For each candidate vertex v, its estimated spatial location is defined by the merging of the spatial relations connecting this vertex to the previously recognized vertices:
loc_v = ⊤_{e∈(A(v)∩E_f)}(μ_Rel^e)
with ⊤ a t-norm. This estimated spatial location of a vertex is then combined with the search domain, to extract the saliency in the area of the estimated location of the target object and its surroundings:
saliency_v = ⊤(loc_v, sd, sal_I)
A histogram H_v of this area is then computed, and the next object to segment is selected by analyzing this histogram. Among other measures, the energy of the histogram (defined previously) is kept as the selection criterion; it allows selecting the most salient area and thus the next object to segment:
v̂ = arg max_{v∈V_ft} energy(H_v)
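Putting the filtering and the selection together, a compact sketch of one selection step follows; max and min stand in for the t-conorm and t-norm, the edge orientation is simplified to (segmented, target) pairs, and all names are illustrative rather than from the authors' implementation.

```python
import numpy as np

def select_next_object(V_seg, V_tar, E, mu_rel, sal_I):
    """One selection step: build the search domain sd from the filtered
    bipartite subgraph, then pick the target vertex whose estimated
    location is most salient (histogram energy criterion)."""
    Ef = [(v1, v2) for (v1, v2) in E if v1 in V_seg and v2 in V_tar]
    if not Ef:
        return None
    mu_sd = np.maximum.reduce([mu_rel[e] for e in Ef])   # t-conorm: max
    sd = (mu_sd > 0).astype(float)                       # support(mu_sd)
    best_v, best_energy = None, -1.0
    for v in {v2 for (_, v2) in Ef}:
        loc_v = np.minimum.reduce([mu_rel[e] for e in Ef if e[1] == v])  # t-norm: min
        area = np.minimum(loc_v, sd) * sal_I             # saliency at the estimated location
        h, _ = np.histogram(area[area > 0], bins=256)
        p = h / max(h.sum(), 1)                          # normalized histogram
        energy = float(np.sum(p ** 2))                   # energy(H) = sum h(n)^2
        if energy > best_energy:
            best_v, best_energy = v, energy
    return best_v
```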
Figure 3. Block diagram of the proposed method to include a pre-attentional mechanism into sequential segmentation.
The exploration of the scene then consists in moving a vertex from the set of target vertices to the set of known vertices; the selection of the moved vertex is realized by comparing the saliency of each object area of the search domain, which corresponds to a model-driven exploration of the scene. This method allows us to directly take into account the knowledge given by the current image and does not rely on a representation of the target objects during the process. The segmentation of the object is expressed as a function of the selected object to segment v̂ (chosen with the saliency-based criterion), its spatial relations with the previously segmented objects, and the original image:
seg_v̂ = segment(v̂, loc_v̂, I)
Finally, the set of segmented objects is updated:
V_seg^i = V_seg^{i−1} ∪ {v̂} and V_target^i = V_target^{i−1} \ {v̂}
6 Application to human brain structure recognition
Saliency maps on 3D MRI. Saliency maps, especially as defined by Koch and Ullman, are usually computed on 2D natural images with a resolution sufficient to produce the requested scales of the dyadic pyramid. In the case of 3D magnetic resonance images (MRI), the resolution is often small. The IBSR database3 images used in our experiments have size 256 × 256 × 128. We limit our pyramid to 7 scales (including the original scale). The fine scales used to compute maps are 1, 2 and 3. The coarse scales are the fine scales plus a δ ∈ {2, 3}, i.e., 1 + 2, 1 + 3, 2 + 2, 2 + 3, etc. Finally, the saliency map is computed at the size of the third level of the dyadic pyramid. 3D MRI provides only one channel, which is treated as intensity in the computation. Since there is no color channel, the color features are simply removed. For orientation, we use a similar approach as in 2D, but on 3 different planes, defined by the axes x and y for the first plane, x and z for the second, and y and z for the last one. We considered 4 directions per plane and removed the duplicates. Finally, 9 maps are extracted. Note that we could extract more planes, allowing more directions to be taken into account and thus better isotropy. Experiments have been conducted using a manually segmented database of human brain 3D MRI (the IBSR database), composed of 18 brain images with their segmentations. The parameters of the membership functions used to compute the representations of the spatial relations are learned on a database of healthy cases (IBSR) and pathological cases (5 different cases so far, corresponding to different types of brain tumor). Table 1 presents some relations used in our experiments.
Table 1. Some relations used in our experiments. LLV: left lateral ventricle, LCN: left caudate nucleus, LTH: left thalamus, LPU: left putamen.
v1    R          v2
LLV   RightOf    LCN
LLV   CloseTo    LCN
LLV   DownOf     LTH
LCN   RightOf    LPU
LCN   InFrontOf  LTH
LCN   UpOf       LTH
LTH   BehindOf   LCN
LTH   DownOf     LCN
LTH   RightOf    LPU
3 Internet Brain Segmentation Repository. The MR brain data sets and their manual segmentations were provided by the Center for Morphometric Analysis at Massachusetts General Hospital and are available at http://www.cma.mgh.harvard.edu/ibsr/
Saliency on manually segmented structures. In our experiments, the area of saliency taken into account for each structure corresponds to the 3D binary map of the segmentation of the object, dilated by an elementary structuring element in 6-connectivity. The saliency map is normalized between 0 and 255. The histogram in Figure 4 presents the saliency for each of the three structures on all images, and it shows the variation of saliency, although the IBSR data set is quite uniform. This variation shows that the measure of saliency captures specific information about each image. Table 2 presents saliency measures for three anatomical structures of the human brain, plus the same measure for the white matter and the gray matter. These measures (energy of the histogram) are always higher for the three anatomical structures. Figure 5 presents some histograms of saliency for these structures. Histograms of saliency for gray and white matter are in most cases wider and lower than histograms for the other structures, particularly the histograms of the caudate nucleus and putamen. Thus, there is more saliency in the area of the anatomical structures than in areas of gray or white matter, which do not carry much information. Comparing structures, it appears that the thalamus generally has lower values (it has less well defined boundaries). Hence it can be expected that its segmentation
will be more difficult.
Table 2. Saliency measures (energy of the saliency histogram) for 3 anatomical structures, white matter (LWM) and gray matter (LGM), for all images of the IBSR database. LCN: left caudate nucleus, LTH: left thalamus, LPU: left putamen.
LCN     LTH     LPU     LWM     LGM
0.065   0.057   0.068   0.026   0.015
0.097   0.064   0.095   0.041   0.020
0.039   0.033   0.042   0.027   0.017
0.050   0.031   0.054   0.026   0.017
0.038   0.028   0.107   0.027   0.018
0.054   0.038   0.099   0.038   0.025
0.039   0.024   0.046   0.023   0.018
0.040   0.026   0.046   0.020   0.014
0.039   0.026   0.061   0.026   0.020
0.045   0.030   0.060   0.027   0.014
0.037   0.025   0.048   0.019   0.011
0.033   0.029   0.032   0.026   0.017
0.037   0.033   0.069   0.031   0.020
0.046   0.030   0.061   0.025   0.017
0.033   0.026   0.044   0.017   0.014
0.032   0.025   0.044   0.022   0.015
0.045   0.032   0.049   0.022   0.020
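A sketch of how the saliency area of a structure can be extracted as described above, assuming SciPy; the function name is ours.

```python
import numpy as np
from scipy.ndimage import binary_dilation, generate_binary_structure

def structure_saliency_values(saliency, segmentation):
    """Saliency values in a structure's surrounding: the 3D binary segmentation
    dilated by an elementary structuring element in 6-connectivity."""
    se = generate_binary_structure(rank=3, connectivity=1)   # 6-connectivity in 3D
    region = binary_dilation(segmentation.astype(bool), structure=se)
    return saliency[region]
```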
Figure 4. The histograms of the saliency of each structure for all images in the database.
Sequential segmentation. Starting from the lateral ventricle, we look for the next structure to segment. Table 3 presents the measures of saliency for the two structures connected to the lateral ventricle in the graph, the caudate nucleus and the thalamus, and the same measure after the segmentation of the first structure. For all the images of the IBSR database, the same path is selected, with some variation in the criterion values. The resulting path corresponds to the path used in [3], defined intuitively, in a supervised way, thus with visual hints. It is hence very satisfactory to find the same path automatically using a saliency feature. The IBSR base is also a quite homogeneous database, and all images have been registered, lowering the differences between the images. Experiments on images with a higher variability, including pathological ones, are currently being conducted. Figure 6 presents a typical segmentation using the resulting path.
Figure 5. Histograms of saliency for 4 anatomical structures, white matter and gray matter of the left hemisphere in a 3D MRI (IBSR 02). In this case, the saliency is high for all structures; the ventricle and caudate nucleus saliency histograms are clearly distinct from the putamen and thalamus ones. The saliency of white matter and gray matter is lower than the saliency of the internal structures.
Table 3. Measure of saliency for two successive selections, for each image in the IBSR database. The initial structure is the left lateral ventricle.
1st selection: LLV →     2nd selection: (LCN, LLV) →
LCN     LTH              LTH     LPU
0.035   0.016            0.015   0.012
0.048   0.023            0.022   0.017
0.018   0.011            0.011   0.009
0.018   0.011            0.011   0.010
0.017   0.011            0.011   0.009
0.022   0.013            0.013   0.012
0.017   0.011            0.011   0.010
0.016   0.011            0.011   0.010
0.021   0.014            0.014   0.013
0.018   0.013            0.012   0.010
0.017   0.010            0.010   0.009
0.017   0.010            0.010   0.009
0.019   0.012            0.012   0.011
0.017   0.011            0.010   0.009
0.017   0.010            0.010   0.009
0.014   0.010            0.010   0.010
0.019   0.014            0.014   0.013
7 Conclusion
We have presented a sequential segmentation framework viewed as a scene exploration process and guided by a pre-attentional mechanism, here a saliency map. Preliminary results show that saliency provides intrinsic information about the image, usable for its segmentation. Further work will be done on a larger graph with more structures and relations between them.
Figure 6. Typical segmentation using the path found in our experiments.
REFERENCES
[1] I. Bloch, ‘Fuzzy Relative Position between Objects in Image Processing: a Morphological Approach’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(7), 657–664, (1999).
[2] I. Bloch, ‘Fuzzy Spatial Relationships for Image Processing and Interpretation: A Review’, Image and Vision Computing, 23(2), 89–110, (2005).
[3] O. Colliot, O. Camara, and I. Bloch, ‘Integration of Fuzzy Spatial Relations in Deformable Models - Application to Brain MRI Segmentation’, Pattern Recognition, 39, 1401–1414, (2006).
[4] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.
[5] G. Fouquier, J. Atif, and I. Bloch, ‘Local Reasoning in Fuzzy Attribute Graphs for Optimizing Sequential Segmentation’, in 6th IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, GbR’07, volume 4538 of LNCS, pp. 138–147, Springer, Alicante, Spain, (Jun 2007).
[6] L. Itti, C. Koch, and E. Niebur, ‘A model of saliency-based visual attention for rapid scene analysis’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254–1259, (Nov. 1998).
[7] D. Walther and C. Koch, ‘Modeling attention to salient proto-objects’, Neural Networks, 19(9), 1395–1407, (Nov. 2006).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-616
Automatic Configuration of Multi-Robot Systems: Planning for Multiple Steps Robert Lundh and Lars Karlsson and Alessandro Saffiotti 1 Abstract. We consider multi-robot systems where robots need to cooperate tightly by sharing functionalities with each other. There are methods for automatically configuring a multi-robot system for tight cooperation, but they only produce a single configuration. In this paper, we show how methods for automatic configuration can be integrated with methods for task planning in order to produce a complete plan where each step is a configuration. We also consider the issues of monitoring and replanning in this context, and we demonstrate our approach on a real multi-robot system, the PEIS-Ecology.
1 Introduction
One essential property of planning is that it is possible to detect, beforehand, whether a task can be executed or not. It is common sense in traditional action planning that if there is a plan that achieves the goal, we know that it is possible to accomplish the task; if we cannot find a plan, we cannot accomplish the task. For the planning problem we address in this paper, it is not enough to find an action plan for a task. Why this is the case will be explained later in this section. We assume that in a multi-robot system, each robot has a set of capabilities. If the robots are heterogeneous, they have different capabilities, and can thus provide a number of different functionalities which can be used to cooperate in different ways. For example, in a navigation task a robot must know its own position relative to the goal position, i.e., it needs to be localized. This position information can either be provided by the robot's own sensors, or another robot can provide it by tracking the first robot and sending the estimated position. We call configuration any way to instantiate and connect the different functionalities available in the multi-robot system. An action like move the robot Pippi to the living room can be implemented as a configuration. Different actions typically correspond to different configurations. Moreover, the same action can often be performed using different configurations, depending on the availability and cost of the functional resources. The ability to automatically configure a multi-robot system in different ways for an action is the key to its flexibility and robustness. Configurations typically implement one action at a time, and if a task requires several steps (actions) to be achieved, there is also a need for several configurations to be executed one after the other. This can be considered as two different problems: (1) finding a sequence of steps (a plan), and (2) finding configurations for the individual steps. However, the two problems are not independent of each other. When problem 1 is considered, it is not possible to know, beforehand, that the generated plan can achieve the goal of the task unless problem 2 is also considered. That is, there might be steps in the plan for
1 Center for Applied Autonomous Sensor Systems, Örebro University, Sweden. email: {robert.lundh, lars.karlsson, alessandro.saffiotti}@aass.oru.se
which no configuration can be found, and that would make the plan non-executable. This could happen in our earlier approach [7], where these two problems were considered in sequence: an action plan was first generated, and then configurations were generated online as they were needed. This paper proposes a better integration of problems 1 and 2. The approach considers both problems at planning time and can tell beforehand whether a task is executable or not. The single-step configuration problem has been studied in several research areas, e.g., single-robot task performance [8, 5], network robot systems [1, 3], and cooperative robotics [10]. However, none of these approaches consider sequences of configurations. There are also some works on integrating task planning with more detailed types of reasoning, such as the aSyMov planner [2], which combines symbolic and geometric reasoning. The rest of the paper is organized as follows. In Section 2 we give a reminder about the notion of functional configurations. In Section 3 we describe different solutions to integrated action planning and configuration generation. Section 4 details the different parts of the approach, and Section 5 presents an illustrative experiment.
2 Functional Configurations
For the configuration part of our approach we use the approach proposed in [6, 7]. We here give a brief description of the concept of functional configurations. We assume that the world can be in a number of different states. The set of all states is denoted S. There is a number of robots r1, . . . , rn. The properties of the robots, such as what sensors they are equipped with and their current positions, are considered to be part of the current state s0. Robots are assumed to have modular functionalities that can be accessed and used independently, across and within the robots. A functionality f is an operator that performs computation, sensing or actuation. It is characterized by the following elements: • A specification of inputs I to be provided by other functionalities, including information about domain (e.g., video images), timing (e.g., 25 fps), etc. • A specification of outputs O provided to other functionalities, also containing domain and timing information. • A set of causal preconditions Pr: conditions in the environment that have to hold in order for the functionality to be operational. • A set of causal postconditions Po: conditions in the environment which the functionality is expected to achieve. • A specification of costs Cost, e.g., computation and energy. • A body Φ, containing the code to be executed. This is typically a continuous loop, getting input and producing output.
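A minimal sketch of these elements as data structures (Python; the field names are illustrative and not taken from the PEIS implementation):

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

@dataclass(frozen=True)
class Functionality:
    name: str
    inputs: FrozenSet[str]           # I: input signatures expected from others
    outputs: FrozenSet[str]          # O: output signatures offered to others
    preconditions: FrozenSet[str]    # Pr: causal conditions that must hold to operate
    postconditions: FrozenSet[str]   # Po: causal conditions expected to be achieved
    cost: float = 0.0                # Cost: e.g., computation and energy
    body: Optional[Callable] = None  # Phi: the continuous input/output loop

@dataclass
class Configuration:
    functionalities: List[Functionality]
    # A channel routes an output of one functionality to an input of another.
    channels: List[Tuple[str, str, str, str]]   # (src_func, output, dst_func, input)
```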
Figure 1. An example configuration.
A channel ch transfers data from an output of a functionality to an input of another functionality. A configuration C is a pair ⟨F, Ch⟩, where F is a set of functionalities and Ch is a set of channels. An important property of a configuration is that all the components in it are connected “in the right way”. We call this property admissibility, and distinguish two kinds: a configuration is information admissible if each input of each functionality is connected to a compatible output of another functionality; it is causally admissible if all preconditions of all functionalities hold in the current world state. A precise definition of these properties can be found in [6]. As an example of a configuration (see Fig. 1), consider a task in which robot B helps robot A to navigate. Robot A has two functionalities: a navigation controller and a wheel actuator. Robot B has four functionalities: a camera, an object tracker, a laser, and a scan-match localization algorithm. The camera is connected to the tracker to obtain the position of A relative to B. The laser is connected to the localization to obtain the absolute position of B. These positions are combined to get the absolute position of A, which is sent to the controller on A that provides motion commands to the wheels. Configuration problem. Let Σ be a multi-robot system, and let D be a domain describing, in some formalism, all the functionalities that exist in Σ. D implicitly defines the set C(D) of all the configurations that can be built in Σ (both admissible and not admissible). Let A denote an action (or task), and s denote the current state. A configuration problem ⟨A, D, s⟩ for Σ is the problem of finding a configuration c ∈ C(D), admissible in state s, to perform A. Configuration planning. To find a solution to a configuration problem we use a configuration planner [6]. The configuration planner uses techniques inspired by hierarchical planning, in particular the SHOP planner [9], in order to combine functionalities into admissible configurations that solve specific tasks. This is done by searching the space of configurations to find one which is admissible in the current state and which has the lowest cost. The configuration planner takes as input a domain that describes the existing functionalities, a state of the available functionalities, and a goal (action). The configuration planner returns a configuration description, which essentially consists of a set of functionality names and a set of channels describing how the functionalities can be connected. It also returns the pre- and postconditions and the total cost of the configuration.
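Under the structures sketched above, the two admissibility properties can be expressed roughly as follows; this is a simplification, since real input/output compatibility also covers domains and timing.

```python
def information_admissible(conf: Configuration) -> bool:
    """Each input of each functionality is fed by a channel from a
    compatible output of another functionality."""
    fed = {(dst, inp) for (_, _, dst, inp) in conf.channels}
    return all((f.name, i) in fed for f in conf.functionalities for i in f.inputs)

def causally_admissible(conf: Configuration, state) -> bool:
    """All preconditions of all functionalities hold in the current world state."""
    return all(p in state for f in conf.functionalities for p in f.preconditions)
```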
3 Integrated action and configuration planning
The configuration problem above is concerned only with finding a configuration for one action. However, in practice most tasks require more than one action to be completed. For instance, if the robot wants to wake up a person, the robot must first reach the bedroom, then move close to the bed, before it can wake the person up.
Figure 2. Different ways to combine action and configuration planning. (a) Independent. (b) Fully integrated. (c) Loosely coupled.
Different configurations may be required for each of these actions. We call such a plan, where each action is a configuration, a configuration plan. A configuration plan is a sequence of configurations CP = c1, . . . , ck, where k ≥ 0. Note that from now on we reserve the term task to denote the top-level task, and use the term action to denote each individual sub-task achieved by each configuration. A configuration plan is admissible if and only if each ci is admissible in the state si−1 it will be executed in. Note that each configuration can also change the state according to its postconditions. Thus, a domain D can be considered to define a state-transition system ⟨S, C, γ⟩ with states S, configurations C = C(D), and a transition function γ : S × C → S defined according to the pre- and postconditions of the configurations. The state-transition function defines the states s1, . . . , sk in which configurations are executed. Thus, the domain D implicitly defines the set CP(D) of all the configuration plans that can be built in Σ (both admissible and not admissible). Let then T denote a task (or goal state), and s0 denote the initial state. A configuration plan problem ⟨T, D, s0⟩ is the problem of finding a configuration plan CP ∈ CP(D) to perform T which is admissible from the starting state s0. In the remaining part we detail and discuss solutions to the configuration plan problem. The job of an action planner is typically to find a sequence of atomic actions a1, . . . , ak that achieves a goal or task T. From a configuration perspective, each action ai can then be seen as an abstraction of the set of configurations {ci1, . . . , cin} that can implement it. Hence, combining an action planner with a configuration planner lets the robots deal with tasks that require more than one configuration/action to be performed. There are several ways this combination could be done. These ways can be described with the following variables: (1) whether the decisions about what actions to perform (i.e., the action planning) should be taken at planning time or execution time; (2) whether the actions should be expanded into configurations (i.e., the configuration planning) at planning time or at execution time; (3) whether the action and configuration planning should be done independently of each other or not. We here present three different settings for the variables above. Independent action and configuration planning. In [7], a simple approach to combining an action planner and a configuration planner is presented. It works by first calling the action planner to find an action plan a1, . . . , ak for solving a particular task. That is, the decision about which actions to perform (1) is taken at planning time. This plan is then executed action by action. For each action ai that is performed, a suitable configuration ci is generated by the configuration planner at the time when the action must be ex-
ecuted. Thus, for (2), the decision about when to expand actions into configurations is taken at execution time. The action planning decisions and the configuration planning decisions are taken independently of each other. Fully integrated action and configuration planning. The second way is to have the planners fully integrated. Both the decisions about what actions to perform (1) and the expansion of the actions into configurations (2) are taken at planning time. The decisions for 1 and 2 are fully interdependent, i.e., the configuration planner is called immediately to generate configurations for each action that is considered during search, so the system works directly with configuration plans c1, . . . , ck. In this way it is possible to prune parts of the search space based on the availability of configurations and to create only admissible configuration plans. Loosely coupled action and configuration planning. In this paper, we present an approach based on the idea of generating an action plan and configurations for this plan before starting to execute it. First a complete action plan a1, . . . , ak is generated, and then, for that plan, a configuration is generated for each action: c1, . . . , ck. That is, both the decision on the actions to perform (1) and the expansion of actions into configurations (2) are done at planning time, as in the fully integrated approach above. However, configuration generation is only done once a complete action plan has been found, in order to validate that plan. If the action plan is not valid (i.e., there are no configurations for all actions), control returns to action planning to generate an alternative action plan, taking into account information about the failed action and its state, and so on. In this way, it is possible to know if there is an admissible configuration plan for the generated action plan. In Fig. 2, the three different cases are shown side by side for comparison. The independent approach (Fig. 2a) assumes that the two planning problems can be addressed independently of each other. This approach has problems when an action cannot be expanded into a configuration at execution time. If this happens, a new action plan must be generated that fulfills the goal. Since some actions may be irreversible, there may be situations in which this solution would not be able to complete the task. Even if a new plan can be found, the fact that the actions in the failed plan were executed leads to suboptimal performance. The fully integrated approach (Fig. 2b) considers both planning problems simultaneously. It is possible to guarantee that the generated configuration plans are admissible and optimal. However, since configurations are generated for all actions in the search space (even those that do not lead to the goal), the complexity of the problem makes it unusable in most practical cases. The loosely coupled approach (Fig. 2c) can, like the fully integrated approach, guarantee that the generated configuration plan is admissible. It avoids the complexity problems of the integrated approach by only trying to generate configurations for actions that are on a path to the goal. Compared to the independent approach, the loosely coupled approach can reject bad action plans before they are actually executed, and find better alternatives. The price to pay is that global optimality of the configuration plan cannot be guaranteed in general.
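The admissibility of a whole configuration plan can be sketched directly from these definitions, reusing the admissibility checks above; the transition γ is approximated here by adding postconditions (deletions are omitted).

```python
def admissible_configuration_plan(plan, s0):
    """A configuration plan c_1..c_k is admissible iff each c_i is admissible
    in the state s_{i-1} produced by its predecessors."""
    state = set(s0)
    for c in plan:
        if not (information_admissible(c) and causally_admissible(c, state)):
            return False
        for f in c.functionalities:
            state |= set(f.postconditions)   # gamma(s, c): apply the effects
    return True
```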
4 Implementation
The loosely coupled action planning and configuration generation approach has been implemented and tested on a special case of a multi-robot system, called the PEIS-Ecology.
4.1 The PEIS-Ecology testbed The concept of PEIS-Ecology was originally proposed by Saffiotti and Broxvall [13]. The main constituent of a PEIS-Ecology is a physically embedded intelligent system, or PEIS. This is any computerized system interacting with the environment through sensors and/or actuators and including some degree of “intelligence”. A PEIS generalizes the notion of robot, and it can be as simple as a toaster or as complex as a humanoid robot. A PEIS-Ecology consists of a number of PEIS embedded in the same physical environment and endowed with a common communication and cooperation model. Communication relies on a shared tuple space: PEIS exchange information by publishing tuples and subscribing to tuples. Cooperation relies on the notion of linking functional components: each PEIS can use functionalities from other PEIS in the ecology to complement its own. The PEIS-Ecology model has been implemented in an open-source middleware, called the PEIS-kernel [11].
4.2 Top-level process
To be used in a practical multi-robot system, such as the PEIS-Ecology, action planning and configuration planning must be embedded in a larger process. This process must implement the integration of the two planners, and it must also consider the following aspects. First, both action planning and configuration planning depend on the current state of the environment and the system. Hence, this state should be dynamically acquired before planning is started. Second, when the action plan is executed, each generated configuration should be instantiated in the actual PEIS-Ecology, and the configuration execution should be monitored in order to decide when to switch to the next action and to detect possible failures. Fig. 3 gives an overall view of the top-level process. In this process, there are several “paths” for different situations. The solid arrows constitute the normal path, in which all the different steps (1-8) are completed without any discrepancies. The dotted arrows represent different recovery paths. The rest of this section details the different steps and paths of the top-level process. The top-level process is run by one single robot that configures the PEIS-Ecology to help it solve the top-level task. The steps concerning state acquisition, configuration deployment, and configuration execution and monitoring are also reported in [7].
4.3 Planning
As noted above, both action and configuration planning use state information to ensure that both action plans and configurations are admissible. This state consists of two parts: the system state and the world state. The system state contains information relative to the system itself, e.g., which functionalities are currently available and what their current cost is. The world state is a representation of the facts that currently hold in the environment, e.g., information about rooms and places, how they are connected, etc. To acquire the current (system and world) state from the PEIS-Ecology, we use the mechanisms provided by the PEIS-kernel. In order to generate action plans, we employ a state-of-the-art action planner called PTLplan [4]. It requires as input a domain and a world state. The domain describes all the actions potentially available, and it is hand-coded. The state, acquired right before planning is done, determines which actions are actually available in the current situation. An action plan consists of actions like “move(Pippi, bedroom)”, “dock(Pippi, bed)”, and “wakeup(Pippi, Johanna)”. This plan is
Figure 4. Left: A sketch of the PEIS-Home. Right: Astrid with the newspaper in the gripper.
Figure 3. Flow chart of the top-level process.
given to the configuration planner (step 3 in Fig. 3). In this step, a configuration is generated for each action in the action plan, thus creating a configuration plan. If there is a problem finding a configuration for an action, information about that action and its state is stored (step A in Fig. 3). The action planner is then called again; it removes that particular action in that particular state and tries to find an alternative sequence of actions that can achieve the task. If an alternative action plan is found, it is given to the configuration generator, which again tries to turn it into a configuration plan.
4.4 Execution
When a configuration plan is found, it is given to a sequencer (step 4 in Fig. 3) that is responsible for taking the next configuration in the configuration plan. When an action/configuration is reported to be completed (step 8), the sequencer takes the next configuration in the plan to deploy. Since all configurations are generated before the execution of the action plan, it is very important to verify that they are still admissible when it is time to execute them. To guarantee this, the state is dynamically acquired before the execution of each action (step 5). The preconditions of the configuration are then checked in the state (step 6). If they still hold, the configuration can be deployed (step 7). If they do not hold, an alternative configuration must be generated (step 3). If there is an alternative configuration, the postconditions of the alternative configuration must be compared with the postconditions of the initial configuration. If they are equal, the configuration can safely be added to the configuration plan and deployed. If they differ, the remaining part of the configuration plan must be regenerated to comply with the new configuration. In this case, the sequencer does not take the next action in step 4, but retries the same action. If in step 3 no alternative configuration was found, the information about
this action is stored in step A, and the action planner tries to find a new action plan (step 2) as described in the previous section. Once a configuration description is generated, it must be deployed on the PEIS-Ecology. This involves activating functionalities, setting up the channels between the functionalities, and subscribing to the appropriate signals from the functionalities to know when a configuration is completed or has failed. After a configuration has been deployed, execution (step 8 in Fig. 3) continues until the action is completed or fails. When a configuration is completed, the next one is selected (step 4). If a configuration fails during execution, the top-level process tries to generate an alternative configuration.
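The loosely coupled planning loop (steps 1-3 with the step-A fallback) can be sketched as follows; action_planner, config_planner and apply_postconditions are stand-ins for PTLplan, the configuration planner and the state transition, and states are assumed hashable (e.g., frozensets of facts).

```python
def plan_and_configure(task, state, action_planner, config_planner, apply_postconditions):
    """Generate an action plan (step 2), expand every action into a
    configuration (step 3), and replan with the failed (action, state)
    pair excluded (step A) until a full configuration plan is found
    or no action plan remains."""
    excluded = set()
    while True:
        action_plan = action_planner(task, state, excluded)   # step 2
        if action_plan is None:
            return None                      # no plan: the task is not executable
        config_plan, s = [], state
        for action in action_plan:                            # step 3
            c = config_planner(action, s)
            if c is None:                    # no configuration for this action
                excluded.add((action, s))    # step A
                break
            config_plan.append(c)
            s = apply_postconditions(s, c)
        else:
            return config_plan               # every action expanded: plan validated
```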
5 An illustrative experiment
We have performed an experiment to show that (and how) the combined planner can handle situations where there are actions for which there is no configuration (this may occur both at planning and at execution time). To facilitate comparisons with our previous approach [7], we repeat the scenario presented in that paper, where a robot wakes up a person. For the experimental part, we have used a physical test-bed facility, called the PEIS-Home, which looks like a typical apartment of about 25 m². It consists of a living-room, a bedroom and a small kitchen. The PEIS-Home is equipped with a communication and computation infrastructure, and with a number of PEIS. The following PEIS are of particular importance for our experiments. Pippi and Astrid: two PeopleBot indoor robots from ActivMedia Robotics (see Fig. 4, right). Each one runs an instance of the Thinking Cap (TC), an architecture for autonomous robot control based on fuzzy logic [14], and an instance of the Player program [12], which provides a low-level interface to the robot's sensors and actuators. The two robots are identical except that Astrid is equipped with a laser range finder and Pippi is not. The Home Security Monitor (HSM): a stationary computer connected to a set of web-cameras mounted in the ceiling. In addition to other monitoring tasks, not relevant here, the HSM provides a PEIS-component that is able to track a robot and localize it in the PEIS-Home. The HSM also hosts an action planner and a configuration planner, and the reconfigurations of the PEIS-Ecology in these experiments are done from here. Note however that they could just as well be done elsewhere, e.g., in Pippi.
The experiment unfolds as follows:
a. At start-up, Pippi is located in the living-room and Astrid in the kitchen. When the morning paper arrives, the HSM wants to wake up Johanna, who is sleeping in the bedroom, and give it to her.
b. With this task, the configuration process acquires the current state (step 1 in Fig. 3). For this state and task, an action plan is generated (step 2). This plan has the actions: dock-to(Pippi, entrance), take(Pippi, newspaper), move-to(Pippi, bedroom), dock-to(Pippi, bed), wake-up(Pippi, Johanna).
c. In step 3, the search for a configuration for each action is started. For the first three actions, configurations are found. The first and third actions (dock-to(entrance), move-to(bedroom)) use a camera mounted in the ceiling for localization. For the fourth action (dock-to(bed)), the search fails, since no configuration can be found: the ceiling camera used in the other actions can only track robots in the living-room and kitchen, and Pippi has no other means of localization. The information about the failed action is stored (step A in Fig. 3), and the action planner is called again.
d. The action planner finds an alternative plan with the following actions: move-to(Astrid, living-room), dock-to(Astrid, entrance), take(Astrid, newspaper), move-to(Astrid, bedroom), dock-to(Astrid, bed), wake-up(Astrid, Johanna). When revisiting step 3 with the new action plan, it is possible to find a configuration for each action. Unlike Pippi, Astrid is able to localize on its own using a laser range finder and scan-matching techniques.
e. The first action/configuration, in which Astrid uses the ceiling camera to localize, is taken by the sequencer (step 4); it has a lower cost than using the laser. This configuration is then verified (step 6), deployed (step 7), and executed (step 8). When arriving at the living-room, the navigation module signals completion of the action and the next action is prepared for execution.
f. The state is dynamically acquired (step 5). To demonstrate the behavior of the system under dynamically changing conditions, we manually made the ceiling camera unavailable. Thus, in the verification step, the configuration preconditions for docking to the entrance using the ceiling camera do not hold. An alternative configuration is generated (step 3) in which the laser is used for localization instead. Since this new configuration has the same postconditions as the original configuration, it can be deployed and executed without regenerating the configuration plan.
g. The remaining actions for taking the newspaper, docking to the bed, waking up Johanna and delivering the newspaper proceed without any complications, and the task is completed.
The critical point of this experiment is step c, where a configuration cannot be found for action dock-to(Pippi, bed) and the action planner is called again to find a new action plan. In the approach with independent action planning and configuration [7], Pippi would have started to execute the action plan and would not have discovered that it cannot achieve the goal until it reached the bedroom. The HSM would then try to find a new plan to reach the goal. It cannot simply generate the same plan as above, where Astrid gets the newspaper at the entrance and delivers it to Johanna, since Pippi is now holding the newspaper in the bedroom. If there is an action for giving a newspaper between robots, the HSM may find an alternative plan; otherwise it will fail.
6 Conclusions
We have presented an approach that, by combining different planning techniques, is able to find a solution for tasks that require sequences
of configurations to be completed. For this purpose we employ two different planners: one for action planning [4] and another for configuration planning [6]. The planners are loosely integrated, i.e., configuration planning is used to validate and correct action plans. With this integration, it is possible to guarantee that the execution of a task is not started if there is no admissible configuration plan. In other words, it is possible to know beforehand whether a plan is executable or not. Using a loose integration also makes it easy to replace the current planners with other generation techniques if this is desirable. We have demonstrated the approach in the PEIS-Ecology framework, but it applies to generic multi-robot systems as long as the robots are able to share their functionalities with each other. An important limitation of the current implementation is that we only consider the execution of a single top-level task. In general, several tasks might be performed concurrently, and new tasks might dynamically appear. A natural extension of the current framework would be to use task allocation techniques to assign different tasks to different configuration processes. With such an extension, issues such as resource handling, conflict resolution and deadlocks must also be considered. Our next step is to consider multiple top-level tasks.
ACKNOWLEDGEMENTS This work was supported by the Swedish National Graduate School in Computer Science (CUGS).
REFERENCES
[1] D. Baker, G. McKee, and P. Schenker, 'Network robotics, a framework for dynamic distributed architectures', in Proc of the IEEE/RSJ Int Conf on Intelligent Robots and Systems, pp. 1768–1773, (2004).
[2] S. Cambon, F. Gravot, and R. Alami, 'A robot task planner that merges symbolic and geometric reasoning', in Proc of the European Conf on AI, pp. 895–899, (2004).
[3] M. Gritti, M. Broxvall, and A. Saffiotti, 'Reactive self-configuration of an ecology of robots', in ICRA Workshop on Network Robot Systems, (2007).
[4] L. Karlsson, 'Conditional progressive planning under uncertainty', in Proc of the Int Joint Conf on Artificial Intelligence (IJCAI), pp. 431–438, (2001).
[5] D. Kim, S. Park, Y. Jin, H. Chang, Y.-S. Park, I.-Y. Ko, K. Lee, J. Lee, Y.-C. Park, and S. Lee, 'SHAGE: a framework for self-managed robot software', in Proc of the Int Workshop on Self-Adaptation and Self-Managing Systems, (2006).
[6] R. Lundh, L. Karlsson, and A. Saffiotti, 'Plan-based configuration of a group of robots', in Proc of the 17th European Conf on Artificial Intelligence (ECAI), pp. 683–687, (2006).
[7] R. Lundh, L. Karlsson, and A. Saffiotti, 'Dynamic self-configuration of an ecology of robots', in Proc of the IEEE/RSJ Int Conf on Intelligent Robots and Systems, pp. 3403–3409, (2007).
[8] B. Morisset and M. Ghallab, 'Learning how to combine sensory-motor functions into a robust behavior', Artificial Intelligence, 172(4-5), 392–412, (2008).
[9] D. Nau, Y. Cao, A. Lotem, and H. Munoz-Avila, 'SHOP: simple hierarchical ordered planner', in Proc of the Int Joint Conf on Artificial Intelligence (IJCAI), pp. 968–973, (1999).
[10] L. E. Parker and F. Tang, 'Building multi-robot coalitions through automated task solution synthesis', Proc of the IEEE, special issue on Multi-Robot Systems, 94(7), 1289–1305, (2006).
[11] The PEIS Ecology Project. Official web site. www.aass.oru.se/~peis/.
[12] Player/Stage Project. playerstage.sourceforge.net/.
[13] A. Saffiotti and M. Broxvall, 'PEIS ecologies: Ambient intelligence meets autonomous robotics', in Proc of the Int Conf on Smart Objects and Ambient Intelligence (sOc-EUSAI), pp. 275–280, (2005).
[14] A. Saffiotti, K. Konolige, and E. H. Ruspini, 'A multivalued-logic approach to integrating planning and control', Artificial Intelligence, 76(1-2), 481–526, (1995).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-621
Structure segmentation and recognition in images guided by structural constraint propagation
Olivier Nempont¹ and Jamal Atif² and Elsa Angelini¹ and Isabelle Bloch¹
Abstract. In some application domains, such as medical imaging, the objects that compose the scene are known, as well as some of their properties and their spatial arrangement. We can take advantage of this knowledge to perform the segmentation and recognition of structures in medical images. We propose here to formalize this problem as a constraint network, and we perform the segmentation and recognition by iterative domain reductions, the domains being sets of regions. For computational purposes we represent the domains by their upper and lower bounds, and we iteratively reduce the domains by updating their bounds. We show some preliminary results on normal and pathological brain images.
1 INTRODUCTION
Image segmentation and recognition is a key problem in scene interpretation. In some application domains, such as medical imaging, the objects that compose the scene are known, as well as some of their properties and their spatial arrangement. This knowledge may be properly encoded as a symbolic graph, and two main approaches can then be derived. The first one consists in matching this graph representation with image regions obtained from a preliminary segmentation (e.g. [5]). Since it is usually difficult to segment the image into semantically meaningful entities, this type of approach often relies on an over-segmentation, which makes the matching more complex (no isomorphism can be expected). The second type of approach uses the graph as a guide in a sequential process. In [4], the structures are sequentially segmented using a deformable model, which is constrained to fulfill some spatial relations with previously segmented structures. However, the result is highly dependent on the segmentation order, and the segmentation of one structure cannot benefit from partial information available about structures that have not been segmented yet. In this paper, we propose a new method to overcome these limitations. The idea is to express the problem as a constraint propagation process, exploiting the capability of constraint networks to solve combinatorial problems [18]. The propagation can be performed either by adding or simplifying constraints, or by reducing the domains of variables. In the scope of qualitative spatial reasoning, the first option has been investigated in particular to solve satisfiability problems, for instance with RCC-8 relations [16] or qualitative relative positions [10]. We propose here to investigate the second option, i.e. the iterative reduction of the variable domains. We first recall in Section 2 some definitions on structural representations. Section 3 is the core of the paper: we define the constraint network, domains, domain bounds, the structural constraints and the contracting operators for several types of structural knowledge. In Section 4 we present a propagation process and a decision process based on minimal surface extraction. In Section 5 some preliminary segmentation and recognition results are presented on brain magnetic resonance images (MRI).
¹ Telecom ParisTech, CNRS UMR 5141 LTCI, Paris, email: {olivier.nempont, isabelle.bloch, elsa.angelini}@telecom-paristech.fr
² Unité ESPACE S140, IRD-Cayenne/UAG, Guyane Française, email: jamal.atif@gmail.com
2 PRELIMINARIES
Structural Knowledge Representation – The structural arrangement of anatomical structures is known and almost stable, even in the presence of a pathology. This knowledge, supposed to be consistent, can be appropriately encoded by a hypergraph [7] where the vertices correspond to spatial objects and the edges (between one or several nodes) may represent:
• known properties of objects, such as connectivity or an a priori range of volumes,
• relative positions between structures,
• appearance properties, such as homogeneity or contrast.
Such characteristics depend on the imaging modality (MRI in our example). Since such knowledge is usually expressed in linguistic terms (in anatomical textbooks for instance [19]), fuzzy sets constitute an appealing framework for its formal modeling: to represent spatial relations, and to account for different types of imprecision, related to the imperfections of the image and to the intrinsic vagueness of some relations [1]. Membership functions defining these fuzzy sets can be learned from a database of examples.
Fuzzy Sets [6] – Let X be a bounded subset of $\mathbb{Z}^n$. A fuzzy set on X will be denoted by its membership function $\mu : X \to [0,1]$. We denote α-cuts by $\mu_\alpha$ and by $\mathcal{F}$ the set of fuzzy sets defined on X. $(\mathcal{F}, \le)$ is a complete lattice for the usual order on fuzzy sets. The supremum ∨ and infimum ∧ are the max and min respectively. The smallest element is denoted by $0_\mathcal{F}$ and the largest by $1_\mathcal{F}$. We denote the fuzzy complementation by $c(\mu)(x) = 1 - \mu(x)$, the Lukasiewicz t-norm by $\top(x,y) = \max(0, x+y-1)$ and t-conorm by $\bot(x,y) = \min(1, x+y)$, for x, y in [0, 1].
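As a small illustration of these definitions, the following snippet implements the complementation, the Lukasiewicz t-norm and t-conorm, and α-cuts for membership functions stored as arrays; it is a toy sketch, not taken from the paper.

```python
import numpy as np

def complement(mu):          # c(mu)(x) = 1 - mu(x)
    return 1.0 - mu

def t_norm(x, y):            # Lukasiewicz t-norm: max(0, x + y - 1)
    return np.maximum(0.0, x + y - 1.0)

def t_conorm(x, y):          # Lukasiewicz t-conorm: min(1, x + y)
    return np.minimum(1.0, x + y)

def alpha_cut(mu, alpha):    # crisp set {x | mu(x) >= alpha}
    return mu >= alpha

mu1 = np.array([0.2, 0.7, 1.0, 0.4])
mu2 = np.array([0.5, 0.9, 0.3, 0.0])
sup = np.maximum(mu1, mu2)   # supremum in the lattice (F, <=)
inf = np.minimum(mu1, mu2)   # infimum
```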
3 STRUCTURAL RECOGNITION PROBLEM AS A CONSTRAINT NETWORK
3.1 Structural segmentation and recognition problem
Let $I : X \to \mathbb{R}^+$ be a grey level image. We want to extract a set of N structures $\chi = \{O_i \mid i \in [1..N]\}$ present in that image. Each of these
variables $O_i$ is represented as a fuzzy subset $\mu_i \in \mathcal{F}$ of X and takes values in a domain $D_i \subseteq \mathcal{F}$. The set of domains associated with χ is denoted by D. This recognition problem is constrained by the prior knowledge described in Section 2. Let us assume for instance that the knowledge base contains the relation "A is to the right of B". The recognition then amounts to finding two fuzzy sets $\mu_1$ and $\mu_2$ satisfying the binary constraint $C^{dir}_{A,B}(\mu_1, \mu_2) = 1$. The formal expression of these constraints is described in Section 3.3 for several types of relations. We will denote by C the set of constraints. Our segmentation and recognition problem can thus be associated with a constraint network $\langle \chi, D, C \rangle$. A solution $\{\mu_i \mid \mu_i \in D_i, i \in [1..N]\}$ of our problem has to fulfill all constraints. Ideally this problem would have a unique solution; however, it is generally under-constrained and different solutions are possible. Through contracting operators we will simplify our problem to obtain domains as close as possible to the set of solutions. In the following we always assume that the problem is satisfiable.
3.2 Domain definition
The definition above involves the representation and the manipulation of domains which are subsets of $\mathcal{F}$. In practice, membership values are discretized, and if k is the cardinality of the current discretization of [0, 1] and n the cardinality of X, the cardinality of $\mathcal{F}$ is then $k^n$ ($10^{131072}$ for the 2D examples presented in Section 5). Handling such a set is generally not computationally tractable and we have to consider a simplified version of it. In [15], the authors represent this subset by its Minimum Bounding Rectangle (MBR), i.e. the smallest rectangle in 2D that includes all elements of the domain. This very compact representation is nevertheless not able to capture the geometry of objects and provides a poor representation (consider for instance a diagonal line) that will limit the efficiency of the constraint propagation process. Considering the lattice structure of $\mathcal{F}$, we propose here to define the domain bounds as the supremum and infimum of fuzzy sets over the domain. Let $D_A \subseteq \mathcal{F}$ be the domain associated with an object A. We define the upper bound $\overline{A}$ of $D_A$ as $\overline{A} = \vee\{\nu \in D_A\}$; it can also be interpreted as an over-estimation of $\mu_A$. The lower bound $\underline{A}$ is defined as $\underline{A} = \wedge\{\nu \in D_A\}$ and is an under-estimation of $\mu_A$. We can notice that $\forall \nu \in D_A, \underline{A} \le \nu \le \overline{A}$. For instance, a tiny domain for the left lateral ventricle LVl (delineated in Figure 1(a)) is defined as the six fuzzy sets in (b); note that the third one is $\mu_{LVl}$. The lower and upper bounds $(\underline{LVl}, \overline{LVl})$ of this domain are presented in (c). Based on these notations, we represent the domain associated with a structure A by its bounds: $(\underline{A}, \overline{A}) = \{\nu \in \mathcal{F} \mid \underline{A} \le \nu \le \overline{A}\}$. Note that if $\underline{A} \not\le \overline{A}$, the domain $(\underline{A}, \overline{A})$ is empty and the problem is unsatisfiable.

Figure 1. A cropped axial view of a brain MRI. (a) Contour of the left lateral ventricle (LVl). (b) A domain for LVl that contains six fuzzy sets. (c) Lower bound $\underline{LVl}$ and upper bound $\overline{LVl}$.

Let $(\underline{A}^1, \overline{A}^1)$ and $(\underline{A}^2, \overline{A}^2)$ be two non-empty domains for the structure A. We consider the following partial order: $(\underline{A}^1, \overline{A}^1) \preceq (\underline{A}^2, \overline{A}^2)$ if $\forall x \in X, \underline{A}^1(x) \ge \underline{A}^2(x)$ and $\overline{A}^1(x) \le \overline{A}^2(x)$. The associated supremum and infimum operators are respectively defined as $(\underline{A}^1, \overline{A}^1) \vee (\underline{A}^2, \overline{A}^2) = (\underline{A}^1 \wedge \underline{A}^2, \overline{A}^1 \vee \overline{A}^2)$ and $(\underline{A}^1, \overline{A}^1) \wedge (\underline{A}^2, \overline{A}^2) = (\underline{A}^1 \vee \underline{A}^2, \overline{A}^1 \wedge \overline{A}^2)$.
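The bound representation translates directly into code: a domain is stored as a pair of arrays, and the emptiness test and the lattice operations on domains reduce to pointwise comparisons. The sketch below is an assumption-level illustration of this idea, not the authors' implementation.

```python
import numpy as np

class Domain:
    """A domain (lower, upper) standing for {nu | lower <= nu <= upper}."""

    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper

    def is_empty(self):
        # empty as soon as the lower bound exceeds the upper bound somewhere
        return bool(np.any(self.lower > self.upper))

    def meet(self, other):
        # infimum for the partial order on domains: tightest common domain
        return Domain(np.maximum(self.lower, other.lower),
                      np.minimum(self.upper, other.upper))

    def join(self, other):
        # supremum: loosest domain containing both
        return Domain(np.minimum(self.lower, other.lower),
                      np.maximum(self.upper, other.upper))
```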
3.3 Contracting operators
3.3.1 General issues
The constraints involved in the knowledge base are expressed as symbolic relations. Each constraint is defined as a function $C : \mathcal{F}^k \to \{0,1\}$ if k objects are involved in the relation. As detailed below, it will be expressed in terms of the fuzzy sets representing the objects and the spatial or appearance relations. Due to the size of the domains, contracting operators that exhaustively browse the domains (to achieve arc consistency for instance) cannot be applied. We thus define weaker contracting operators that compute new domain bounds from the initial domain bounds. A contracting operator is written as
$$\frac{\langle \psi; D; C \rangle}{\langle \psi; D'; C \rangle},$$
where ψ is the set of variables involved in the set of constraints C, and D and D' are the associated domains represented by their bounds, with $D' \preceq D$. Notice that the contracting operators will generally achieve neither arc consistency nor 2B-consistency [9]: the domain may contain two values that fulfill all constraints, whereas their supremum or infimum does not necessarily.
3.3.2 Directional relative position
In [1] a method to characterize the directional relative position between objects using mathematical morphology was proposed. Suppose for instance that the caudate nucleus CNl (delineated in Figure 2(a)) is located on the right of the left ventricle LVl (delineated by a dashed line). The relation "on the right" can be characterized by a structuring element ν. The fuzzy dilation $\delta_\nu(\mu_{LVl})$ of $\mu_{LVl}$ by ν (displayed in (b)) defines a fuzzy set that corresponds to the points on the right of LVl. We consider that such a relation from an object A to an object B is satisfied if it is satisfied for all points of B, and we also impose that B is included in the complement of A. The associated constraint can be defined as:
$$C^{dir}_{A,B}(\mu_1, \mu_2) = \begin{cases} 1 & \text{if } \mu_2 \le \top(\delta_\nu(\mu_1), c(\mu_1)), \\ 0 & \text{otherwise.} \end{cases}$$
Suppose that the objects A and B are respectively defined over the domains $(\underline{A}, \overline{A})$ and $(\underline{B}, \overline{B})$. The elements μ of $(\underline{B}, \overline{B})$ that satisfy $C^{dir}_{A,B}$ according to the current domain of A are such that $\exists \zeta \in (\underline{A}, \overline{A}), \mu \le \top(\delta_\nu(\zeta), c(\zeta))$, hence $\mu \le \top(\delta_\nu(\overline{A}), c(\underline{A}))$, since the dilation and ⊤ are increasing and the complementation is decreasing. The contracting operator associated with the constraint $C^{dir}_{A,B}$ is derived from this inequality.
DIRECTION CONTRACTING OPERATOR:
$$\frac{\langle A, B; (\underline{A}, \overline{A}), (\underline{B}, \overline{B}); C^{dir}_{A,B} \rangle}{\langle A, B; (\underline{A}, \overline{A}), (\underline{B}, \overline{B} \wedge \top(\delta_\nu(\overline{A}), c(\underline{A}))); C^{dir}_{A,B} \rangle}$$
Considering the same example, Figure 2 shows the upper bounds $\overline{LVl}$ (c) and $\overline{CNl}$ (d) of the domains of LVl and CNl (the lower bound is here the empty set). The dilation $\delta_\nu(\overline{LVl})$ is displayed in (e), and we can see in (f) the updated upper bound $\overline{CNl}$. The definition of the initial bounds will be addressed in Section 4.
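To make the operator concrete, here is a naive 1-D sketch: a fuzzy dilation of the upper bound of A by the structuring element ν, combined by the Lukasiewicz t-norm with the complement of the lower bound of A, shrinks the upper bound of B. The dilation below is a simplistic stand-in for the morphological operations actually used in [1], given only as an assumption-level illustration.

```python
import numpy as np

def fuzzy_dilate(mu, nu):
    # naive fuzzy dilation: out(x) = max_dx min(mu(x - dx), nu(dx))
    out = np.zeros(len(mu))
    for x in range(len(mu)):
        for dx in range(len(nu)):
            y = x - dx
            if 0 <= y < len(mu):
                out[x] = max(out[x], min(mu[y], nu[dx]))
    return out

def lukasiewicz(x, y):
    return np.maximum(0.0, x + y - 1.0)

def contract_direction(A_low, A_up, B_low, B_up, nu):
    # B's upper bound is intersected with T(dilate(upper(A)), c(lower(A)))
    allowed = lukasiewicz(fuzzy_dilate(A_up, nu), 1.0 - A_low)
    return B_low, np.minimum(B_up, allowed)  # only the upper bound shrinks
```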
Figure 2. A cropped axial view of a brain MRI. (a) Contours of the left lateral ventricle (LVl) and the left caudate nucleus (CNl). (b) Fuzzy set that represents the points on the right of LVl. (c) $\overline{LVl}$. (d) $\overline{CNl}$. (e) On the right of $\overline{LVl}$. (f) $\overline{CNl}$ updated.

3.3.3 Distances
Distances from fuzzy objects may be computed using mathematical morphology [1]. Let us assume that we have some knowledge about the distance between two objects A and B, which can be modeled as a fuzzy interval. The region of space satisfying such a relation to a reference object $\mu_1$ is defined as the set difference between two dilations, using two structuring elements $\nu_1$ and $\nu_2$ defined in the spatial domain and derived from the fuzzy interval: $\top(c(\delta_{\nu_1}(\mu_1)), \delta_{\nu_2}(\mu_1))$. Two fuzzy sets $\mu_1$ and $\mu_2$ satisfy the distance constraint between A and B if:
$$C^{dist}_{A,B}(\mu_1, \mu_2) = \begin{cases} 1 & \text{if } \mu_2 \le \top(c(\delta_{\nu_1}(\mu_1)), \delta_{\nu_2}(\mu_1)), \\ 0 & \text{otherwise.} \end{cases}$$
DISTANCE CONTRACTING OPERATOR:
$$\frac{\langle A, B; (\underline{A}, \overline{A}), (\underline{B}, \overline{B}); C^{dist}_{A,B} \rangle}{\langle A, B; (\underline{A}, \overline{A}), (\underline{B}, \overline{B} \wedge \top(c(\delta_{\nu_1}(\underline{A})), \delta_{\nu_2}(\overline{A}))); C^{dist}_{A,B} \rangle}$$

3.3.4 Inclusion
Consider now two objects A and B with A included in B. The associated constraint can be expressed as:
$$C^{in}_{A,B}(\mu_1, \mu_2) = \begin{cases} 1 & \text{if } \mu_1 \le \mu_2, \\ 0 & \text{otherwise.} \end{cases}$$
INCLUSION CONTRACTING OPERATOR:
$$\frac{\langle A, B; (\underline{A}, \overline{A}), (\underline{B}, \overline{B}); C^{in}_{A,B} \rangle}{\langle A, B; (\underline{A}, \overline{A} \wedge \overline{B}), (\underline{B} \vee \underline{A}, \overline{B}); C^{in}_{A,B} \rangle}$$
The inclusion prior can be extended to a partition prior, for instance if an object A can be decomposed into subparts $\{B_i\}$.

3.3.5 Connectivity
If A is a connected object, its domain can be restricted to connected fuzzy sets (definitions of fuzzy connectivity can be found in [17, 14]). We denote by $\mathcal{H}$, $\mathcal{H} \subseteq \mathcal{F}$, the set of connected fuzzy sets.
$$C^{conn}_{A}(\mu_1) = \begin{cases} 1 & \text{if } \mu_1 \in \mathcal{H}, \\ 0 & \text{otherwise.} \end{cases}$$
A new upper bound can be obtained as $\xi^1_A(\overline{A}) = \bigvee\{\nu \in \mathcal{H} \mid \underline{A} \le \nu \le \overline{A}\}$. However, it can be shown that this filter is not robust (a small error on $\overline{A}$ may cause a large error on the result). As discussed in [14], we prefer the following formulation: $\xi^2_A(\overline{A}) = \bigvee\{\nu \in \mathcal{H} \mid \nu \le \overline{A} \text{ and } \max_{x \in X} \nu(x) \le \mu_\le(\underline{A}, \nu)\}$, where $\mu_\le$ stands for the Lukasiewicz implicator, i.e. $\mu_\le(\underline{A}, \nu) = \min_{x \in X} \min(1, 1 - \underline{A}(x) + \nu(x))$.
CONNECTIVITY CONTRACTING OPERATOR:
$$\frac{\langle A; (\underline{A}, \overline{A}); C^{conn}_{A} \rangle}{\langle A; (\underline{A}, \xi^2_A(\overline{A})); C^{conn}_{A} \rangle}$$

3.3.6 Volume
A volume prior is represented as a membership function $\mu_{V_{min}} : \mathbb{R}^+ \to [0,1]$. The constraint is formulated as (see [14] for details):
$$C^{vol}_{A}(\mu) = \begin{cases} 1 & \text{if } \max_{x \in X} \mu(x) \le \max_{v \in \mathbb{R}^+} \min(\mu_V(\mu)(v), \mu_{V_{min}}(v)), \\ 0 & \text{otherwise,} \end{cases}$$
where $\mu_V(\mu)(v) = \sup\{\alpha \mid |\mu_\alpha| \ge v\}$, $|\mu_\alpha|$ denoting the cardinality (i.e. the volume) of the α-cut $\mu_\alpha$. The reduction of the domain to the fuzzy sets that satisfy this prior will generally not change the bounds. However, if we also suppose that the object is connected, the upper bound can be filtered according to $\xi_{\mu_{V_{min}}}(\overline{A}) = \bigvee\{\nu \in \mathcal{H} \mid \nu \le \overline{A} \text{ and } C^{vol}_A(\nu) = 1\}$.
VOLUME AND CONNECTIVITY CONTRACTING OPERATOR:
$$\frac{\langle A; (\underline{A}, \overline{A}); C^{conn}_{A} \wedge C^{vol}_{A} \rangle}{\langle A; (\underline{A}, \xi_{\mu_{V_{min}}}(\overline{A})); C^{conn}_{A} \wedge C^{vol}_{A} \rangle}$$

3.3.7 Adjacency
A degree of adjacency between A and B can be defined as [1]: $\mu_{adj}(\mu_A, \mu_B) = \sup_{x,y \in X} \min(\mu_A(x), \mu_B(y), n(x,y))$, where $n(x,y)$ stands for a connectivity degree between two points x and y of X. We define the following constraint:
$$C^{adj}_{A,B}(\mu_1, \mu_2) = \begin{cases} 1 & \text{if } \min(\max_{x \in X} \mu_1(x), \max_{x \in X} \mu_2(x)) = \mu_{adj}(\mu_1, \mu_2), \\ 0 & \text{otherwise.} \end{cases}$$
As in the volume case, a domain reduction by an adjacency constraint does not affect its bounds. Therefore, we also consider adjacency jointly with a connectivity prior, and define the following filter: $\xi^{adj}_A(\overline{B}) = \bigvee\{\nu \in \mathcal{H} \mid \nu \le \overline{B} \text{ and } C^{adj}_{A,B}(\overline{A}, \nu) = 1\}$.
ADJACENCY AND CONNECTIVITY CONTRACTING OPERATOR:
$$\frac{\langle A, B; (\underline{A}, \overline{A}), (\underline{B}, \overline{B}); C^{adj}_{A,B} \wedge C^{conn}_{A} \wedge C^{conn}_{B} \rangle}{\langle A, B; (\underline{A}, \xi^{adj}_B(\overline{A})), (\underline{B}, \xi^{adj}_A(\overline{B})); C^{adj}_{A,B} \wedge C^{conn}_{A} \wedge C^{conn}_{B} \rangle}$$

3.3.8 Contrast
The following constraint will play a key role in the propagation process, since it will be computed from image data. We suppose here
that the contrast between the structures is roughly known and stable, which is the case in MRI (the lateral ventricles are for instance hypo-intense compared with the white matter on T1-weighted MRI). We first define the grey level membership function associated with a spatial object as $\mu^I_A(v) = \sup_{x \in X, I(x)=v} \mu_A(x)$, where I is the intensity function and v a grey level value (conversely, a spatial membership function μ can be obtained from a grey level one $\mu^I$ as $\mu(x) = \mu^I \circ I(x)$). We rely on Michelson's definition of contrast [12]: $c = \frac{v_1 - v_2}{v_1 + v_2}$, where $v_1$ and $v_2$ are two grey levels. According to the extension principle [20], we obtain the following membership function for the contrast between two fuzzy objects A and B, with grey level membership functions $\mu^I_A$ and $\mu^I_B$:
$$\mu^c_{A,B}(c) = \sup_{(v_1,v_2) \in \mathbb{R}^{+2},\, c = \frac{v_1 - v_2}{v_1 + v_2}} \min(\mu^I_A(v_1), \mu^I_B(v_2)).$$
Conversely, if we consider a contrast prior $\mu^c_{A,B}$, we can obtain the set of grey levels that satisfy this contrast prior from object A as $\mu^I(v) = \sup_{(v_1,v_2) \in \mathbb{R}^{+2},\, v = v_1 v_2} \min(\mu^I_A(v_1), \mu^{k^{-1}}_{A,B}(v_2))$ with $\mu^{k^{-1}}_{A,B}(v) = \sup_{c \in [-1,1],\, v = \frac{1-c}{1+c}} \mu^c_{A,B}(c)$, and from object B as $\mu^I(v) = \sup_{(v_1,v_2) \in \mathbb{R}^{+2},\, v = v_1 v_2} \min(\mu^I_B(v_1), \mu^{k}_{A,B}(v_2))$ with $\mu^k_{A,B}(v) = \sup_{c \in [-1,1],\, v = \frac{1+c}{1-c}} \mu^c_{A,B}(c)$.
$$C^{cont}_{A,B}(\mu_1, \mu_2) = \begin{cases} 1 & \text{if } \forall v \in \mathbb{R}^+, \ \mu^I_1(v) \le \sup_{(v_1,v_2) \in \mathbb{R}^{+2},\, v = v_1 v_2} \min(\mu^I_2(v_1), \mu^k_{A,B}(v_2)) \\ & \text{and } \forall v \in \mathbb{R}^+, \ \mu^I_2(v) \le \sup_{(v_1,v_2) \in \mathbb{R}^{+2},\, v = v_1 v_2} \min(\mu^I_1(v_1), \mu^{k^{-1}}_{A,B}(v_2)), \\ 0 & \text{otherwise.} \end{cases}$$
CONTRAST CONTRACTING OPERATOR:
$$\frac{\langle A, B; (\underline{A}, \overline{A}), (\underline{B}, \overline{B}); C^{cont}_{A,B} \rangle}{\langle A, B; (\underline{A}, \overline{A} \wedge (\sup_{v = v_1 v_2} \min(\mu^I_{\overline{B}}(v_1), \mu^k_{A,B}(v_2)) \circ I)), (\underline{B}, \overline{B} \wedge (\sup_{v = v_1 v_2} \min(\mu^I_{\overline{A}}(v_1), \mu^{k^{-1}}_{A,B}(v_2)) \circ I)); C^{cont}_{A,B} \rangle}$$
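As an illustration of the extension-principle construction above, the contrast membership function can be computed by brute force from two grey-level membership arrays. The binning of the contrast axis is an implementation choice made here for the example, not a detail from the paper.

```python
import numpy as np

def contrast_membership(mu_I_A, mu_I_B, n_bins=41):
    # mu^c_{A,B}(c) = sup over (v1, v2) with c = (v1 - v2) / (v1 + v2)
    #                 of min(mu^I_A(v1), mu^I_B(v2))
    bins = np.linspace(-1.0, 1.0, n_bins)
    mu_c = np.zeros(n_bins)
    for v1, a in enumerate(mu_I_A):
        for v2, b in enumerate(mu_I_B):
            if v1 + v2 == 0:
                continue  # contrast undefined for v1 = v2 = 0
            c = (v1 - v2) / (v1 + v2)             # Michelson contrast
            k = int(np.argmin(np.abs(bins - c)))  # nearest bin
            mu_c[k] = max(mu_c[k], min(a, b))
    return bins, mu_c
```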
4 CONSTRAINT PROPAGATION
We describe here a simple propagation algorithm to perform the segmentation and recognition of a set of structures χ. First we initialize the domains of these structures to $(0_\mathcal{F}, 1_\mathcal{F})$ and we restrict the set of constraints to those that involve only variables in χ. The constraints are then sequentially applied to reduce the variable domains, i.e. to reduce the upper bound and increase the lower one. The constraints could be applied sequentially without any ordering; however, in most cases the constraint computation would be useless and time-consuming. Different factors may influence the benefit of computing a constraint. Among them we consider the amount of change (since the last computation) of the bounds of the variables involved in the constraint³ and the computation cost CC of the constraint (a function of the complexity of each involved operation, such as dilations). We define a priority P for each constraint, initialized to $P(0) = \frac{card(X)}{CC}$. At each step of the propagation process the highest-priority constraint is selected and the associated contracting operator is computed. The priority of the constraint is then set to 0. The application of this contracting operator may induce changes on the domains of its variables. When this occurs, the priority P of each constraint that depends on one of the changed variables is updated as follows:
$$P(i+1) = P(i) + \frac{\sum_{x \in X} (\overline{A}^1(x) - \overline{A}^2(x)) + (\underline{A}^2(x) - \underline{A}^1(x))}{CC},$$
where $(\underline{A}^1, \overline{A}^1)$ and $(\underline{A}^2, \overline{A}^2)$ are respectively the domains before and after a change on the variable, and P(i) is the priority value at step i. The process stops when the priority of all constraints is equal to 0.
³ In the AC-3 algorithm [11], the list of constraints to update would correspond to those with a non-zero amount of change.
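The scheduling policy can be implemented with a max-priority queue over constraints, as in the following sketch; the constraint interface (a contract method returning the amount of change per variable, size and cost attributes) is a hypothetical abstraction of the operators in Section 3.3, not the authors' code.

```python
import heapq

def propagate(constraints, domains):
    # P(0) = card(X) / CC, here abstracted as size / cost
    prio = {c: c.size / c.cost for c in constraints}
    heap = [(-p, id(c), c) for c, p in prio.items()]
    heapq.heapify(heap)
    while heap:
        neg_p, _, c = heapq.heappop(heap)
        if -neg_p != prio[c] or prio[c] == 0.0:
            continue                        # stale or exhausted entry
        prio[c] = 0.0                       # applied: priority reset to 0
        changes = c.contract(domains)       # apply the contracting operator
        for var, amount in changes.items():
            for d in var.constraints:       # constraints sharing this variable
                if d is not c:
                    prio[d] += amount / d.cost   # P(i+1) = P(i) + change / CC
                    heapq.heappush(heap, (-prio[d], id(d), d))
    return domains                          # stops when all priorities are 0
```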
Figure 3. (a) $\overline{LVl}$. (b) $\mu^I_{\overline{LVl}}$. (c) Original $\mu^I_{\overline{WMl}}$ (plain) and updated one (dashed). $\overline{WMl}$ before (d) and after (e) application of the contrast contracting operator.
This is illustrated in Figure 3. Suppose for instance that the fuzzy set displayed in (a) is the upper bound $\overline{LVl}$ of the domain of the left lateral ventricle. The associated grey level membership function $\mu^I_{\overline{LVl}}$ is shown in (b). An upper bound $\overline{WMl}$ for the left white matter structures is displayed in (d) (the contour of WMl is also shown) and $\mu^I_{\overline{WMl}}$ in (c). The application of the contrast contracting operator restricts $\mu^I_{\overline{WMl}}$ to the membership function in (c) (dashed), which corresponds to the updated $\overline{WMl}$ (e).
Figure 4. $\overline{LVl}$ (left) and $\underline{LVl}$ (right) at steps 0 (a), 500 (b), 1000 (c), 2500 (d), 10000 (e) and 20000 (f) of the propagation process. The target object LVl is delineated.
Ideally the upper and lower bounds of the different domains will converge to the same fuzzy set. However this will generally not occur and there remains some indecision at least on object boundaries. Even if the propagation significantly reduces the search space, it is still time consuming to apply a backtracking algorithm to extract an optimal solution according to some cost function. Therefore we propose to refine the segmentation of each structure by using the method proposed in [13], based on minimal surface optimization [3]. The segmentation problem consists in finding the closed curve that minimizes a metric based on the obtained bounds. This can efficiently be solved using a graph-cuts based method [2] for instance.
5 PRELIMINARY RESULTS ON NORMAL AND PATHOLOGICAL BRAIN
We illustrate here some preliminary results on 2D brain MRI. Our knowledge base contains about 3000 relations involving 34 variables that correspond to structures visible on MRI. The left caudate nucleus, for instance, is strictly on the right of the left lateral ventricle, fairly on the left of the putamen, much brighter than the lateral ventricle, darker than the white matter and somewhat darker than the putamen. We now describe the recognition process for a few structures of the 2D brain MRI presented in Figure 5(a). We suppose that the brain was previously extracted. The associated domain is defined as a singleton; its lower and upper bounds are thus equal. We initialized all other domains to $(0_\mathcal{F}, 1_\mathcal{F})$. The propagation is then performed, completing in about 5 hours on a 3.0 GHz Pentium 4 CPU. We show in Figure 4 the upper and lower bounds of the left lateral ventricle at different steps of the propagation process. The prior information provides a good discrimination from other structures, and the upper and lower bounds are close to the solution at the end of the propagation. The extraction of a crisp segmentation can then easily be performed using the method in [13]. We show in Figure 5(b) the segmentation results for the internal structures. We also show a result on a case affected by a brain tumor in Figure 5(c-d). The tumor induces various degrees of deformation and may also involve structural modifications. The case presented here is affected by a cortical tumor, which was previously extracted [8]. We modify the knowledge base only to state that the tumor is a subpart of the brain; we do not modify the other relations. The segmentation results for internal structures are shown in Figure 5(d). We can observe that the result remains correct, despite the shape modification induced on some structures by the tumor.
6 CONCLUSION
We have proposed in this paper a new formulation of the segmentation and recognition task, in the case of a known structural arrangement, as the resolution of a constraint network. Preliminary results were shown on 2D brain MRI. They illustrate that the constraint propagation is very efficient in providing domain bounds close to the objects, thus considerably reducing the search space. Future work aims at improving the efficiency of the propagation process to make it applicable in 3D cases. A deeper study of pathological cases will also be performed, in particular to account for strong structural changes in the internal structures potentially induced by subcortical tumors.
ACKNOWLEDGEMENTS This work has been partly supported by a grant from INCA.
Figure 5. (a) 2D T1 weighted brain MRI. (b) Cropped view of segmentation results for the internal structures. (c) 2D MRI of a brain affected by a tumor. (d) Segmentation results for internal structures.
REFERENCES
[1] I. Bloch, 'Spatial Reasoning under Imprecision using Fuzzy Set Theory, Formal Logics and Mathematical Morphology', International Journal of Approximate Reasoning, 41, 77–95, (2006).
[2] Y. Boykov and V. Kolmogorov, 'Computing geodesics and minimal surfaces via graph cuts', in IEEE International Conference on Computer Vision, ICCV, pp. 26–33, Nice, France, (jun 2003).
[3] V. Caselles, R. Kimmel, and G. Sapiro, 'Geodesic active contours', in IEEE International Conference on Computer Vision, ICCV, pp. 694–699, Boston, MA, USA, (1995).
[4] O. Colliot, O. Camara, and I. Bloch, 'Integration of Fuzzy Spatial Relations in Deformable Models - Application to Brain MRI Segmentation', Pattern Recognition, 39, 1401–1414, (2006).
[5] A. Deruyver, 'Adaptive pyramid and semantic graph: knowledge driven segmentation', in Graph-based Representations in Pattern Recognition, GbR, volume LNCS 3434, pp. 213–223, Poitiers, France, (apr 2005).
[6] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.
[7] C. Hudelot, J. Atif, O. Nempont, B. Batrancourt, E. Angelini, and I. Bloch, 'GRAFIP: a Framework for the Representation of Healthy and Pathological Anatomical and Functional Cerebral Information', in Human Brain Mapping, HBM, Florence, Italy, (jun 2006).
[8] H. Khotanlou, O. Colliot, J. Atif, and I. Bloch, '3D Brain Tumor Segmentation in MRI Using Fuzzy Classification, Symmetry Analysis and Spatially Constrained Deformable Models', to appear in Fuzzy Sets and Systems.
[9] O. Lhomme, 'Consistency Techniques for Numeric CSPs', in International Joint Conference on Artificial Intelligence, IJCAI, pp. 232–238, Chambéry, France, (1993).
[10] G. Ligozat, 'Reasoning about Cardinal Directions', Journal of Visual Languages and Computing, 9(1), 23–44, (1998).
[11] A.K. Mackworth, 'Consistency in networks of relations', Artificial Intelligence, 8(1), 99–118, (feb 1977).
[12] A. Michelson, Studies in Optics, Chicago University Press, 1927.
[13] O. Nempont, J. Atif, E. Angelini, and I. Bloch, 'Combining Radiometric and Spatial Structural Information in a New Metric for Minimal Surface Segmentation', in Information Processing in Medical Imaging, IPMI, volume LNCS 4584, pp. 283–295, Kerkrade, The Netherlands, (jul 2007).
[14] O. Nempont, J. Atif, E. Angelini, and I. Bloch, 'A New Fuzzy Connectivity Class. Application to Structural Recognition in Images', in Discrete Geometry for Computer Imagery, DGCI, volume LNCS 4992, pp. 446–457, Lyon, France, (2008).
[15] D. Papadias, T. Sellis, Y. Theodoridis, and M.J. Egenhofer, Topological relations in the world of minimum bounding rectangles: a study with R-trees, ACM Press, New York, NY, USA, 1995.
[16] J. Renz and B. Nebel, 'On the complexity of qualitative spatial reasoning: A maximal tractable fragment of the Region Connection Calculus', Artificial Intelligence, 108(1-2), 69–123, (1999).
[17] A. Rosenfeld, 'Fuzzy Digital Topology', Information and Control, 40, 76–87, (1979).
[18] F. Rossi, P. Van Beek, and T. Walsh, Handbook of Constraint Programming, Elsevier Science, 2006.
[19] S.G. Waxman, Correlative Neuroanatomy, McGraw-Hill, New York, 24th edn., 2000.
[20] L. A. Zadeh, 'The Concept of a Linguistic Variable and its Application to Approximate Reasoning', Information Sciences, 8, 199–249, (1975).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-626
Theoretical Study of Ant-based Algorithms for Multi-Agent Patrolling
Arnaud Glad and Olivier Simonin and Olivier Buffet and François Charpillet¹
Abstract. This paper addresses the multi-agent patrolling problem, which consists, for a set of autonomous agents, in visiting all the places of an unknown environment as regularly as possible. The proposed approach is based on the ant paradigm: each agent can only mark and move according to its local perception of the environment. We study EVAW, a pheromone-based variant of the EVAP [3] and VAW [12] algorithms. The main novelty of the paper is the proof of some emergent spatial properties of the proposed algorithm. In particular, we show that the obtained cycles are necessarily of the same length, which ensures an efficient spatial distribution of the agents. We also report some experimental results and discuss open questions concerning the proposed algorithm.
¹ INRIA/Nancy University, Loria Lab., MAIA project, Nancy, France, email: firstname.lastname@loria.fr
1 INTRODUCTION
Deploying autonomous agents or robots in unknown or dynamic environments is a challenging problem for a growing number of tasks (e.g. military surveillance, rescue after natural disasters, etc.). In this paper we address an important task: the patrolling of an unknown environment. It consists of several agents in charge of the surveillance of a limited area. We suppose that this area is not known in advance and that the number of agents can change dynamically, so we are looking for a patrolling approach that provides adaptability and robustness. To address such a challenge we study a bio-inspired algorithm that mimics ant mechanisms. Ants provide decentralized algorithms relying on very simple individual behaviors [6]. A particularity of ants is their ability to use the environment as a shared memory by dropping and sensing pheromones, defining temporary information (due to the evaporation process). Such a paradigm has been used to define several pheromone-based algorithms and meta-heuristics to deal with spatial or, more generally, distributed problems [5, 2, 4, 9, 10]. The patrolling problem can be defined, for a group of agents, as the problem of visiting a set of places while minimizing the time between two consecutive visits. This time is called idleness. For about ten years, several models have been proposed to deal with patrolling. Most of these approaches search offline for a policy and consider a priori known environments represented as graphs [8, 1, 7]. On the contrary, few models have been proposed to deal with unknown and dynamic environments and online computation. We can mention Wagner et al. [13, 11], who proposed ant-based algorithms (ant-walks) for the covering problem. In these papers they explored the capabilities of self-organized systems in which each agent can only read and write integers on the edges of a graph. In this paper we study such systems when the environment is a grid. So we present
the EVAP algorithm, introduced in [3], which just uses the pheromone evaporation process, and we compare it to a variant of the VAW algorithm [12]. These algorithms exhibit interesting properties: after an exploration phase, agents self-organize into stable partial cycles of equal length that completely cover the environment. As a consequence, cells are visited at a very regular frequency. As this property is desirable in the patrolling problem, our main objective is to demonstrate it formally. The paper is organized as follows. In Section 2 we introduce the multi-agent patrolling problem. Section 3 presents the EVAP and VAW ant-based algorithms for dealing with covering and patrolling problems, and we show that they have similar behaviors. In Section 4 we study emergent spatial properties of EVAW, a combination of these two algorithms, by focusing on the emergence of optimal cycles. Before concluding, Section 5 discusses some open questions about the proposed approach.
2 THE PATROLLING PROBLEM
2.1 Definition
Patrolling consists in deploying several agents in order to visit at regular time intervals some defined places of an area. It aims at gathering reliable information, seeking objects and watching over places in order to defend them against any intrusion, etc. An efficient patrol in an environment requires that the delay between two consecutive visits of a given place is minimal. Related work on multi-agent patrolling generally considers that the environment is known, two-dimensional and that it can be reduced to a graph G(V, E) (V the nodes to be visited, E the arcs defining the valid paths between nodes).
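Since the evaluation of a patrol revolves around idleness, a small helper computing the worst-case and average idleness from visit logs may be useful; the input format below is a hypothetical convenience, not part of any cited system.

```python
def idleness(visits, horizon):
    """visits: dict node -> sorted list of visit times within [0, horizon]."""
    worst, total, count = 0.0, 0.0, 0
    for times in visits.values():
        gaps = [t2 - t1 for t1, t2 in zip(times, times[1:])]
        gaps.append(horizon - times[-1])   # time elapsed since last visit
        worst = max(worst, max(gaps))
        total += sum(gaps)
        count += len(gaps)
    return worst, total / count            # worst-case and average idleness

print(idleness({"a": [0, 4, 8], "b": [1, 5, 9]}, horizon=10))
```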
2.2 Covering vs. Patrolling

Figure 1. Optimal covering is not necessarily optimal patrolling.
Covering aims, for one or multiple agents, at visiting each place of the environment once within the shortest possible time. Patrolling can then be intuitively considered as the process of repeatedly covering an environment. But a simple example shows that repeating an optimal solution to cover the environment is not necessarily
optimal for patrolling. Indeed, in the case of Figure 1 we have two optimal covers, but only the second one is an optimal patrol, since the last visited cell is adjacent to the first one. Covering approaches may thus not be relevant in the scope of the patrolling problem. In the next sections we address the patrolling problem using simple agents that cannot communicate directly.
3 ANT-INSPIRED ALGORITHMS
3.1 Presentation of the Algorithms
3.1.1 The EVAP Algorithm
The EVAP algorithm has been introduced in [3]. This algorithm solves the multi-agent patrolling problem even when the environment is unknown. It is based on a digital pheromone model in which pheromones are represented as numbers whose value decreases over time (simulating the evaporation process of biological pheromones). Agents evolve in a 2D grid. They can perceive and move to the four adjacent cells representing their neighborhood (noted N (x), x being the current cell). Algorithm 1 describes the individual behavior of each agent. When an agent visits a cell, it drops a quantity Qmax of pheromone, then moves according to the negative gradient of pheromone. As the environment evaporates pheromones, with rate ρ (see Algorithm 2), the remaining quantity in a cell x (noted q(x)) represents the time elapsed since its last visit. So, an agent’s local behavior is defined by moving to the cell of its neighborhood which has not been visited for the longest time.
Figure 2. 3D illustration of the EVAP algorithm (with one agent)
Algorithm 1 EVAP Agent (situated on cell x)
A) Find a cell y in N(x) such that q(y) = min_{w∈N(x)} q(w); in case of multiple choices, make a random choice
B) Move to cell y
C) Set q(y) ← Qmax (drop the maximum quantity of pheromone)
Algorithm 2 EVAP Environment
For every cell x of the environment:
If q(x) ≠ 0 then q(x) ← ρ·q(x) (ρ ∈ ]0, 1[)
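A compact, executable sketch of both rules together (the agent rule of Algorithm 1 and the evaporation rule of Algorithm 2) on a small grid might look as follows; the grid shape and constants are arbitrary choices for the example.

```python
import random

Q_MAX, RHO = 1.0, 0.9

def neighbors(cell, grid):
    x, y = cell
    cand = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [c for c in cand if c in grid]

def evap_step(grid, agents):
    for i, cell in enumerate(agents):
        nbrs = neighbors(cell, grid)
        q_min = min(grid[c] for c in nbrs)
        dest = random.choice([c for c in nbrs if grid[c] == q_min])  # step A
        agents[i] = dest                                             # step B
        grid[dest] = Q_MAX                                           # step C
    for c in grid:               # environment: pheromone evaporation
        grid[c] *= RHO           # a no-op on cells where q(c) = 0

grid = {(x, y): 0.0 for x in range(5) for y in range(5)}
agents = [(0, 0)]
for _ in range(200):
    evap_step(grid, agents)
```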
3.1.2 The Vertex-Ant-Walk (VAW) Algorithm
In this section, we present an earlier version of the VAW algorithm (noted VAW0 in the rest of the paper), introduced by Wagner and coauthors in an appendix of [12]. The local behavior of the agents is the same as in the EVAP algorithm (gradient descent), but the dropped information is the date s(x) of the visit instead of a quantity of pheromone. So, in the VAW0 algorithm, agents must have synchronised time counters (same frequency) and start at the same time with counter t = 0.

Algorithm 3 Vertex-ant-walk0 (ant situated on cell x)
A) Find a cell y in N(x) such that s(y) = min_{w∈N(x)} s(w); in case of multiple choices, make a random choice
B) Set s(x) ← t
C) Move to cell y
D) t = t + 1

3.2 Comparison of the EVAP and VAW0 Algorithms
Let us compare both algorithms. One can see that the next cell selected by an agent is the same in both algorithms (step A). Indeed, agents follow the numerical gradient, choosing in the surrounding neighborhood the cell with the minimum value, so they necessarily choose the one which has not been visited for the longest time. Concerning the numerical fields q and s built by the algorithms, both allow one to express the elapsed time δt(x) since the last visit of a cell x:
$$\delta t(x) = \log(q(x)/Q_{max})/\log(\rho) \text{ in EVAP}, \qquad \delta t(x) = t - s(x) \text{ in VAW}_0.$$
It is then possible to express q(x) as a function of s(x) and reciprocally: there is clearly a bijection between the EVAP evaporation function and the VAW0 time function. So, we can freely swap the time computation functions of these two algorithms. However, it is important to note that, in the multi-agent case, EVAP and VAW0 are not strictly equivalent, as steps B and C are not performed in the same order: EVAP agents move and then drop pheromone, whereas VAW0 agents drop pheromone and then move to the next cell. As a consequence, two EVAP agents may only meet on the same cell in very particular topologies. On the contrary, VAW0 agents may find themselves on the same cell more often and then follow each other until some random choice has to be made. This subtle difference leads to a more efficient exploration with EVAP. We prefer EVAP because it favors exploration, yet VAW0's time computation function is easier to manipulate. As a result, we propose, and will study, the EVAW algorithm (Exploring VAW), which uses EVAP's order of operations with VAW0's time function (see Algorithm 4). Note that EVAP and EVAW exhibit identical behaviors for the same initial conditions and the same random seed.

Algorithm 4 EVAW Agent (situated on cell x)
A) Find a cell y in N(x) such that s(y) = min_{w∈N(x)} s(w); in case of multiple choices, make a random choice
B) Move to cell y
C) Set s(y) ← t
D) t = t + 1
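The bijection between the two time fields is easy to check numerically: if the pheromone field is generated from visit dates by q(x) = Qmax · ρ^(t − s(x)), then EVAP's elapsed-time formula recovers exactly VAW0's t − s(x). A short verification:

```python
import math

Q_MAX, RHO = 1.0, 0.9
t, s_x = 57, 42
q_x = Q_MAX * RHO ** (t - s_x)          # pheromone left since the last visit

dt_evap = math.log(q_x / Q_MAX) / math.log(RHO)
dt_vaw = t - s_x
assert abs(dt_evap - dt_vaw) < 1e-9     # both equal 15
```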
3.3 Known Properties
In [12], Wagner et al. proved that k VAW0 agents cover the environment in bounded time tk . This proof can be extended to show
that the algorithm performs the patrolling task (each cell will be visited at most every $t_k$ time steps). These results are also valid for the EVAW algorithm. Like Wagner et al., we have experimentally observed that the agents self-organize, so that each of them reaches a stable cycle. A cycle ζ is a finite sequence of adjacent cells that the agent repeatedly covers, some cells possibly appearing several times in the sequence. We are interested in formally studying those cycles. Before considering the multi-agent case in the next section, we start by giving a result in the single-agent case. In [11], Wagner et al. present a VAW variant (which we call VAW1) in which ants smell traces made up of a pair (μ, τ), where μ is the number of visits to the cell so far and τ the last time the cell was visited. Considering a single agent, they proved that, once a Hamiltonian cycle² has been reached, the ant repeats it forever. Using the same proof schema, we now show the same result for the EVAW algorithm. We note $s_t(x)$ the value of cell x at time t.
Proof: Assume that ζ is a Hamiltonian cycle and denote by $\zeta(t) = (x_t, x_{t+1}, \ldots, x_{t+n})$ the sequence of n+1 consecutive vertices in the tour, starting at $x_t$. The next tour starts at time t+n+1 and only depends on the gradient values along the vertices. So, to prove that the cycle is stable, we have to prove that, for vertices u, v, if $s_t(u) > s_t(v)$ then $s_{t+n}(u) > s_{t+n}(v)$. This is true since, for all u, $s_{t+n}(u) = s_t(u) + n$. So if a single Hamiltonian cycle is obtained, it remains stable forever. In the next section we study the stability of cycles (Hamiltonian or not) when several agents interact in the same environment.
² A cycle is Hamiltonian when each cell is visited exactly once.

4 STUDY OF THE MULTI-AGENT CASE

4.1 Introduction
In the multi-agent setting, cycles only interact in pairs, so we will focus on the two-agent case. We suppose for now that both agents (agt1 and agt2) remain on their own cycles (ζ1 and ζ2, of respective lengths l1 and l2). These cycles are neighbors by at least two adjacent cells. We note (c1, c2) a couple of adjacent cells such that c1 ∈ ζ1 and c2 ∈ ζ2 (see Fig. 3).

Figure 3. Two cycles of different lengths connecting in cells (c1, c2).

We will now show that the obtained cycles cannot be stable if they have different lengths, and then study the stability of equal-length cycles.

4.2 Instability of Cycles of Different Lengths
We suppose l1 < l2. Each time agt1 visits c1, it continues its cycle on cell c′1 (see Fig. 3). We make the assumption that c′1 appears only once in the cycle (which is in particular verified in Hamiltonian cycles). As a result, at time t, when agt1 is in c1, we have:
$$s_t(c'_1) = s_t(c_1) - l_1 + 1 = t - l_1 + 1. \tag{1}$$

Lemma. Under these conditions, two distinct cycles, patrolled each by one EVAW agent, will not be maintained if they have different lengths.

Proof. If agt2 breaks its cycle first, the problem is solved. Let us therefore consider that this is not the case and observe what happens for agt1. Agent agt1 goes to cycle ζ2 (on cell c2) if and only if it is in cell c1 at time t and
$$s_t(c'_1) \ge s_t(c_2). \tag{2}$$
This inequality relies on the EVAW agent behavior, which ensures that an agent always moves to its minimal neighbor cell. We therefore have to show that inequality (2) becomes true in finite time. The property that both agents visit c1 and c2 alternately infinitely often would be written $t_2 \le t_1 \le t_2 + l_2 \le t_1 + l_1 \le \cdots \le t_2 + k \cdot l_2 \le t_1 + k \cdot l_1$, where $t_2$ and $t_1$ are two reference visit dates (agt2 visiting c2 just before agt1 visits c1). This inequality obviously holds only if l1 = l2. Thus, there exist two dates $t_1$ of a visit of agt1 in c1 ($s_{t_1}(c_1) = t_1$) and $t_2$ of a visit of agt2 in c2 ($s_{t_2}(c_2) = t_2$) such that $t_2 \le t_1 < t_1 + l_1 < t_2 + l_2$. We can then write (using Equation 1):
$$s_{t_1}(c'_1) = t_1 - l_1 + 1,$$
$$s_{t_1+l_1}(c'_1) = (t_1 + l_1) - l_1 + 1 = t_1 + 1,$$
$$s_{t_1}(c_2) = s_{t_2}(c_2) = t_2 \text{ (because } t_1 < t_2 + l_2\text{), and}$$
$$s_{t_1+l_1}(c_2) = s_{t_2}(c_2) = t_2 \text{ (because } t_1 + l_1 < t_2 + l_2\text{).}$$
Then, at $t_1 + l_1$, we have (using Eq. 2):
$$s_{t_1+l_1}(c'_1) = t_1 + 1 > t_2 = s_{t_1+l_1}(c_2).$$
So, agt1 changes to cycle ζ2. ∎

Note that, as we take into account only cell c2, the previous result does not depend on the direction of agt2's walk. Another remark concerns the stability of n cycles created by n agents: the stability of the system can only be obtained if all cycles have the same length.
4.3 Stability of Equal Length Cycles
From now on we consider that l1 = l2. Will cycles ζ1 and ζ2 be maintained? We show that some patterns are fixed points and others are not. Let us start with an illustrated example. Figure 4 presents an environment in which two cycles have emerged, and that will persist, i.e. a fixed point was attained. Such a solution illustrates the emergence of an optimal patrolling with two agents. Fig. 4-b shows step 7 and Fig. 4-c shows step 15 (i.e. after one more turn). One can see that the difference of values between adjacent cells from one cycle to the next remains the same.

Figure 4. A fixed point composed of two cycles of equal length.

We show below that, under defined conditions, when agents converge to distinct cycles of equal length, the cycles will be stable.

Remark – When both cycles have the same length, an agent has a choice between two options (see Fig. 5) if and only if it sees not only the tail of its own cycle, but also the tail of the other agent's cycle. We will try to find out in which situations such a choice is possible by first studying a special case where both cycles are contiguous on half of their length, as depicted in Figures 6-a and 7-a. In this setting, we distinguish two cases depending on whether both agents run along their boundary in opposite or similar directions.

Figure 5. Two cycles of equal length that cannot be maintained.

Agents Going in Opposite Directions – Because the length of the boundary is half the length of their cycles, agt1 and agt2 meet each other at some point along this boundary. Then, they can either always end up on a couple of neighbouring cells (c1, c2), so that each remains on its own cycle (see Fig. 6-b), or they always "miss" each other, so that they both see each other's tail and have the choice to switch cycles or not (see Fig. 6-c). As a consequence, the agents have one chance out of two to have stable cycles.

Figure 6. Agents going in opposite directions along their boundary: a) general view, b) agents meeting, c) agents missing.

Agents Going in Similar Directions – Both agents "follow each other". In most cases the distance between agt1 and agt2 is different from 1, so that they never see each other's tail (Fig. 7-b) and remain stable. Otherwise, one agent (say agt1) is in front of the other (agt2) and may switch to agt2's cycle, which then has to find another path to follow (Fig. 7-c).

Figure 7. Agents going in similar directions along their boundary: a) general view, b) stable cycles, c) unstable cycles.

Non-Continuous Boundary – The same reasoning can be extended to more complex settings where the boundary is not made of a single segment as in the previous examples. Fig. 8-a shows two agents which have reached stable cycles whose boundary is made of five segments.

Beyond Two Agents – The same reasoning can also be extended to more than two agents by considering boundaries in pairs, as illustrated in Fig. 8-b.

Figure 8. Solutions with a) complex boundaries and b) more than two agents.

4.4 Shared Cycles
Up to now, cycles were distinct, meaning that each cell belonged to a single cycle. However, EVAW agents can also reach cycles where some cells are visited by different agents.

Common Cycle – We distinguish a first case where several agents cover a common cycle. Figure 9-a illustrates such a situation. Trivially, both agents describe a cycle with the same length as the other.

Figure 9. Solutions with a) a common cycle and b) two overlapping cycles.

Overlapping Cycles – A second case is that of agents whose cycles share only a subset of their cells. Experimentally, this case seems to appear more frequently than common cycles. Fig. 9-b gives an example of cycles overlapping on the central cell of the environment.
5 DISCUSSION
We have demonstrated that the obtained cycles can only stabilize if they have the same length. As a consequence, the EVAP algorithm ensures a balanced spatial distribution of agents in the environment. Indeed, the average and worst-case idlenesses are minimized, which is a desired property in the context of patrolling.
Wagner et al. [11] asked whether VAW1, when used with a single agent and in an environment allowing Hamiltonian cycles, can converge to a non-Hamiltonian cycle. Our experiments with EVAW raise the same question, as we never found a counterexample. It is interesting to note that, in a multi-agent setting, EVAW may reach suboptimal solutions when the environment is Hamiltonian (i.e. when it can be covered by a set of non-overlapping Hamiltonian sub-cycles). Yet the length of the resulting cycles is always close to the Hamiltonian one. We also observed the formation of optimal or close-to-optimal cycles in non-Hamiltonian environments. In this last case, some agents follow a path that crosses itself in order to extend it and ensure that all cycles have the same length.
Although we have proved that EVAW achieves the patrolling task (agents repeatedly visiting all cells), a theoretical proof that cycles are necessarily obtained is still missing. Furthermore, we plan in future work to study the mechanism leading systematically to an organization in cycles, even if the time to converge to a stable solution is huge. The objective is to possibly improve the algorithm so as to find better solutions, or to find good solutions faster.
Concerning a real implementation of EVAP and VAW0, both require that some computational entities be synchronized:
• the "smart cells" in the case of EVAP, and
• the mobile robots for VAW0.
Even though computations take place in different entities in each algorithm, both rely on digital marks (possibly based on sensor networks or future dust sensors) as a shared memory. Patrolling algorithms and pervasive technologies will have to evolve jointly so as to provide a real-world solution to the patrolling problem. Real-world settings will also add constraints such as limited resources, robot avoidance and human-robot interaction.
These algorithms should also be considered for offline path-planning: they are known to compare with state-of-the-art algorithms for finding Hamiltonian cycles in a graph [11]. It has also been shown experimentally in [3] that the number of agents asymptotically increases the performance up to a limit value. However, the robustness of the algorithms still needs to be demonstrated in the face of perturbations such as:
• dynamic changes in the graph, as studied in [13],
• asynchronicity between the cells or the robots' clocks,
• noisy observations and uncertain actions.
Under such perturbations, some theoretical questions remain open:
• Will EVAW always self-organize into a set of cycles?
• Could we compute a complexity bound for cycle formation?
• If EVAW does not converge to a set of cycles, is the patrolling still guaranteed?
• Could we bound the average/maximum idleness?
6
CONCLUSION
In this paper we investigated emergent behaviors occurring in ant-based algorithms defined for the multi-agent patrolling problem. Such theoretical results are still rare in the reactive MAS community. We presented and compared two similar algorithms: EVAP [3] and VAW0 [12]. We then introduced EVAW for practical reasons, using it both for theoretical and experimental studies. The main novelty of the paper is the theoretical study of the stability of the cycles generated by the algorithm. Whereas Wagner et al. only considered Hamiltonian cycles in a single-agent setting, we proved that, in the multi-agent case, only cycles of the same length can persist as limit cycles. We then identified patterns ensuring that several cycles of the same length will remain stable forever. We also presented and discussed different spatial self-organizations. In future work, we plan to generalize our results and continue the theoretical study of the emergent behaviors of EVAW. In particular, we want to go deeper into the analysis of the mechanisms underlying cycle formation. We also plan to work on experimental and theoretical bounds of the algorithm’s complexity. Concerning applications, we are currently experimenting with this algorithm on simulated drones involved in military base surveillance (SMAART DGA project).
REFERENCES
[1] A. L. Almeida, P. M. Castro, T. R. Menezes, and G. L. Ramalho, ‘Combining idleness and distance to design heuristic agents for the patrolling task’, in II Brazilian Workshop in Games and Digital Entertainment, pp. 33–40, (2003).
[2] R. Beckers, O. E. Holland, and J.-L. Deneubourg, ‘From local actions to global tasks: stigmergy and collective robotics’, in Artificial Life IV: Proc. of the 4th Int. Workshop on the Synthesis and the Simulation of Living Systems, MIT Press, (1994).
[3] H. Chu, A. Glad, O. Simonin, F. Sempe, A. Drogoul, and F. Charpillet, ‘Swarm approaches for the patrolling problem, information propagation vs. pheromone evaporation’, in ICTAI’07, IEEE International Conference on Tools with Artificial Intelligence, pp. 442–449, (2007).
[4] A. Colorni, M. Dorigo, and V. Maniezzo, ‘Distributed optimization by ant colonies’, in Proceedings of ECAL’91, European Conference on Artificial Life, pp. 134–142, Paris, (1991). Elsevier.
[5] A. Drogoul and J. Ferber, ‘From Tom Thumb to the dockers: some experiments with foraging robots’, in 2nd Int. Conf. on Simulation of Adaptive Behaviors, pp. 451–459, Honolulu, (1992).
[6] T. H. Labella, M. Dorigo, and J.-L. Deneubourg, ‘Division of labor in a group of robots inspired by ants’ foraging behavior’, ACM Transactions on Autonomous and Adaptive Systems, 1, 4–25, (2006).
[7] F. Lauri and F. Charpillet, ‘Ant colony optimization applied to the multi-agent patrolling problem’, in IEEE Swarm Intelligence Symposium, (2006).
[8] A. Machado, G. Ramalho, J.-D. Zucker, and A. Drogoul, ‘Multi-agent patrolling: an empirical analysis of alternative architectures’, in Third International Workshop on Multi-Agent Based Simulation, pp. 155–170, (2002).
[9] J. A. Sauter, R. Matthews, H. V. D. Parunak, and S. Brueckner, ‘Evolving adaptive pheromone path planning mechanisms’, in Proc. of AAMAS’02, pp. 434–440, (2002).
[10] J. A. Sauter, R. Matthews, H. V. D. Parunak, and S. Brueckner, ‘Performance of digital pheromones for swarming vehicle control’, in Proc. of AAMAS’05, pp. 903–910, (2005).
[11] I. Wagner and A. Bruckstein, ‘Hamiltonian(t) - an ant-inspired heuristic for recognizing hamiltonian graphs’, in Ant-Algorithms Session, CEC’99 / International Joint Conference on Neural Networks, (1999).
[12] I. Wagner, M. Lindenbaum, and A. Bruckstein, ‘Distributed covering by ant-robots using evaporating traces’, IEEE Transactions on Robotics and Automation, 15, 918–933, (1999).
[13] I. Wagner, M. Lindenbaum, and A. Bruckstein, ‘Ants: agents, networks, trees and subgraphs’, Future Generation Computer Systems Journal, 16(8), 915–926, (2000).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-631
631
Incremental Component-Based Construction and Verification of a Robotic System Ananda Basu1 and Matthieu Gallien2 and Charles Lesire2 and Thanh-Hung Nguyen1 and Saddek Bensalem1 and Félix Ingrand2 and Joseph Sifakis1 Abstract. Autonomous robots are complex systems that require the interaction/cooperation of numerous heterogeneous software components. Nowadays, robots are critical systems and must meet safety properties, including in particular temporal and real-time constraints. We present a methodology for modeling and analyzing a robotic system using the BIP component framework, integrated with an existing framework and architecture: the LAAS Architecture for Autonomous Systems, based on GenoM. The BIP componentization approach has been successfully used in other domains. In this study, we show how it can be seamlessly integrated into the preexisting methodology. We present the componentization of the functional level of a robot, the synthesis of an execution controller, as well as validation techniques for checking essential “safety” properties.
1
Introduction
A central idea in systems engineering is that complex systems are built by assembling components (building blocks). Components are systems characterized by an abstraction that is adequate for composition and re-use. It is possible to obtain large components by composing simpler ones. Component-based design confers many advantages, such as reuse of solutions, modular analysis and validation, reconfigurability, controllability, etc. Autonomous robots are complex systems that require the interaction/cooperation of numerous heterogeneous software components. They are critical systems, as they must meet safety properties including, in particular, temporal and real-time constraints. Component-based design relies on the separation between coordination and computation. Systems are built from units processing sequential code, insulated from concurrent execution issues. The isolation of coordination mechanisms allows a global treatment and analysis. One of the main limitations of the current state of the art is the lack of a unified paradigm for describing and analyzing the information flow between components. Such a paradigm would allow system designers and implementers to formulate their solutions in terms of tangible, well-founded and organized concepts, instead of using dispersed coordination mechanisms such as semaphores, monitors, message passing, remote calls, protocols, etc. It would in particular allow a comparison of otherwise unrelated architectural solutions, and could be a basis for evaluating them and deriving implementations in terms of specific coordination mechanisms. The designers of complex systems such as autonomous robots need scalable analysis techniques to guarantee essential properties
1 VERIMAG, CNRS/University Joseph Fourier, Grenoble, France
2 LAAS/CNRS, University of Toulouse, Toulouse, France
such as those mentioned above. To cope with complexity, these techniques are applied to component-based descriptions of the system. Global properties are enforced by construction or can be inferred from component properties. Furthermore, componentized descriptions provide a basis for reconfiguration and evolutivity. We present an incremental componentization methodology and technique which seamlessly integrate with the already existing LAAS architecture for autonomous robots. The methodology considers that the global system architecture can be obtained as the hierarchical composition of larger components from a small set of classes of atomic components. Atomic components are units processing sequential code that offer interactions through their interface. The technique is based on the use of the Behavior-Interaction-Priority (BIP) [2] component framework, which encompasses incremental composition of heterogeneous real-time components. The main contributions of the paper include:
• A methodology for componentizing and architecting autonomous robot systems, applied to the existing LAAS architecture.
• Composition techniques for organizing and enforcing complex event-based interactions using the BIP framework.
• Validation techniques for checking essential properties, including scalable compositional techniques relying on the analysis of the interactions between components.
The paper is structured as follows. In Section 2 we illustrate, with a real example, the preexisting architecture (based on GenoM [6]) of autonomous robotic software developed at LAAS. From this architecture, we identify the atomic components used for the componentization of the robot software in BIP. Section 3 provides a succinct description of the BIP component framework. Section 4 presents a methodology for building the BIP model of existing GenoM functional modules and their integration with the rest of the software. Controller synthesis results as well as “safety” property analyses are also presented. Section 5 concludes the paper with a state of the art, an analysis of the current results and future work directions.
2
Modular Architecture for Autonomous Systems
At LAAS, researchers have developed a framework, a global architecture, that enables the integration of processes with different temporal properties and different representations. This architecture decomposes the robot system into three main levels, having different temporal constraints and manipulating different data representations [1]. It is used on a number of robots (e.g. DALA, an iRobot ATRV) and is shown in Fig. 1. The levels in this architecture are:
Figure 1. An instance of the LAAS architecture for the DALA Robot.
• a functional level: it includes all the basic built-in robot action and perception capacities. These processing functions and control loops (e.g., image processing, obstacle avoidance, motion control, etc.) are encapsulated into controllable communicating modules developed using GenoM3. Each module provides services, which can be activated by the decisional level according to the current tasks, and posters, containing data produced by the module for others (modules or the decisional level) to use.
• a decisional level: this level includes the capacities of producing the task plan and supervising its execution, while being at the same time reactive to events from the functional level.
• an execution control level, at the interface between the decisional and the functional levels, which controls the proper execution of the services according to safety constraints and rules, and prevents functional modules from unforeseen interactions leading to catastrophic outcomes. In recent years, we have used the R2C [14] to play this role, yet it was programmed on top of the existing functional modules, controlling their service execution and interactions, but not the internal execution of the modules themselves.
The organization of the overall system in layers, and of the functional level in modules, is definitely a plus with respect to ease of integration and reusability. Yet an architecture and some tools are not “enough” to warrant the sound and safe behavior of the overall system. In this paper, the componentization method we propose allows us to synthesize a controller for the overall execution of all the functional modules, and enforces by construction the constraints and rules between the various functional modules. Hence, the ultimate goal of this work is to implement both the current functional level and the execution control level with BIP.
Figure 2. A GenoM module organization.
2.1
GenoM Functional Modules
Each module of the LAAS architecture functional level is responsible for a function of the robot. Complex modalities (such as navigation) can be obtained by having modules “work” together. For example, in Fig. 1 (which only shows the data flow of the functional level), there is an explicit periodic processing loop. The module Laser RF acquires the laser range finder data and stores it in the poster Scan, from which Aspect builds the obstacle map Obs. The module NDD (responsible for the navigation) avoids these obstacles while periodically producing a Speed reference to reach a given target from the current position Pos produced by POM. Finally, this Speed reference is used by RFLEX, which controls the speed of the robot’s wheels and also produces the odometry position used by POM to generate the current position.4
All these modules are built using a unique generic canvas (Fig. 2), which is then instantiated for a particular robot function. Each module can execute several services started upon upper-level requests. The module can send information relative to the executed requests to the client (such as the final report), or share data with other modules using posters. E.g., the NDD module provides six services corresponding to initializations of the navigation algorithm (SetParams, SetDataSource and SetSpeed), launching and stopping the path computation toward a given goal (Stop and GoTo), and a permanent service (Permanent). To execute this path, NDD exports the Speed poster, which contains the speed reference.
The services are managed by a control task responsible for launching corresponding activities within execution tasks. Control and execution tasks share data using internal data structures (IDS). Moreover, execution tasks have periods in which the several associated activities are scheduled. Periods need not have a fixed length if some services are aperiodic.
Fig. 3 presents the automaton of an activity. Activity states correspond to the execution of particular elementary code (codels) available through libraries, dedicated either to initializing some parameters (START state), to executing the activity (EXEC state), or to safely ending the activity, leading to resetting parameters, sending error signals, etc.
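As a reading aid, the module organization just described can be summarized by a small data model; the class and field names below are illustrative assumptions of ours, not GenoM’s actual API.

```python
# Hedged sketch of the GenoM module organization described above
# (control task, services, posters, IDS). Names are illustrative.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Poster:
    name: str                      # e.g. "Speed", readable by other modules
    data: object = None

@dataclass
class Service:
    name: str                      # e.g. "GoTo", started upon a request
    activity: Callable[[], None]   # the codels run within an execution task

@dataclass
class Module:
    name: str
    services: Dict[str, Service] = field(default_factory=dict)
    posters: Dict[str, Poster] = field(default_factory=dict)
    ids: dict = field(default_factory=dict)   # internal data structures

    def request(self, service_name: str) -> None:
        """Control task: launch the activity of the requested service."""
        self.services[service_name].activity()

ndd = Module("NDD")
ndd.posters["Speed"] = Poster("Speed")

def goto_activity() -> None:
    # would periodically compute a speed reference and export it
    ndd.posters["Speed"].data = 0.5

ndd.services["GoTo"] = Service("GoTo", goto_activity)
ndd.request("GoTo")   # the decisional level activates a service
```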
3 The GenoM tool can be freely downloaded from: http://softs.laas.fr/openrobots/wiki/genom

3
The BIP Component Framework
BIP [2] is a software framework for modeling heterogeneous real-time components. The BIP component model is the superposition of three layers: the lower layer describes the behavior of a component as a set of transitions (i.e. a finite state automaton extended with
4 This particular setup will serve as an example throughout the rest of the paper.
5 The BIP tool-set can be downloaded from: http://www-verimag.imag.fr/~async/BIP/bip.html
Figure 3. Execution automaton of an activity.

Figure 4. An example of an atomic component in BIP.
data); the intermediate layer includes connectors describing the interactions between transitions of the layer underneath; the upper layer consists of a set of priority rules used to describe scheduling policies for interactions. Such a layering offers a clear separation between component behavior and the structure of a system (interactions and priorities).
BIP allows the hierarchical construction of compound components from atomic ones by using connectors and priorities. An atomic component consists of a set of ports used for synchronization with other components, a set of transitions and a set of local variables. Transitions describe the behavior of the component. They are represented as a labeled relation between control states. Fig. 4 shows an example of an atomic component with two ports in, out, variables x, y, and control states empty, full. At control state empty, the transition labeled in is possible if 0 < x. When an interaction through in takes place, the variable x is eventually modified and a new value for y is computed. From control state full, the transition labeled out can occur.
Connectors specify the interactions between the atomic components. A connector consists of a set of ports of the atomic components which may interact. If all the ports of a connector are incomplete, then synchronization is by rendezvous: only one interaction is possible, the interaction including all the ports of the connector. If a connector has one complete port, then synchronization is by broadcast: the complete port may synchronize with the other ports of the connector, and the possible interactions are the non-empty sublists containing this complete port. Connectors thus define the feasible interactions and in particular model the two basic modes of synchronization, rendezvous and broadcast. Priorities in BIP are a set of rules used to filter interactions amongst the feasible ones.
The model of a system is represented as a BIP compound component which defines new components from existing components (atoms or compounds) by creating their instances, specifying the connectors between them and the priorities. The BIP framework consists of a language and a toolset, including a front-end for editing and parsing BIP programs and a dedicated platform for model validation. The platform consists of an engine and a software infrastructure for executing simulation traces of models. It also allows state-space exploration and provides access to model-checking tools like Evaluator [10]. This makes it possible to validate BIP models and ensure that they meet properties such as deadlock-freedom, state invariants and schedulability. The back-end, which is the BIP engine, has been entirely implemented in C++ on Linux to allow a smooth integration of components with behavior expressed using plain C/C++ code.

Figure 5. BIP model of a service.
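To fix ideas, the following is a minimal sketch of the Fig. 4 atomic component as a guarded transition system, together with a rendezvous check; the encoding (dictionaries for transitions, a simple all-ports-enabled test) is our own illustrative assumption, not the BIP language itself.

```python
# Hedged sketch of the Fig. 4 atomic component as a guarded transition
# system. This encoding is illustrative; it is not the BIP language.

class Atom:
    def __init__(self):
        self.state = "empty"
        self.x, self.y = 1, 0
        # port -> (source state, guard, update, target state)
        self.transitions = {
            "in":  ("empty", lambda: 0 < self.x, self._do_in,  "full"),
            "out": ("full",  lambda: True,       lambda: None, "empty"),
        }

    def _do_in(self):
        self.y = self.x + 1          # y := f(x); f chosen arbitrarily here

    def enabled(self, port):
        src, guard, _, _ = self.transitions[port]
        return self.state == src and guard()

    def fire(self, port):
        src, guard, update, dst = self.transitions[port]
        assert self.state == src and guard()
        update()
        self.state = dst

def rendezvous(connector_ports, atoms):
    """A connector of incomplete ports: the interaction fires only if
    every involved port is enabled (strong synchronization)."""
    if all(a.enabled(p) for a, p in zip(atoms, connector_ports)):
        for a, p in zip(atoms, connector_ports):
            a.fire(p)

a = Atom()
rendezvous(["in"], [a])    # fires: empty -> full, y computed
rendezvous(["out"], [a])   # fires: full -> empty
```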
4
The Functional Layer in BIP
The LAAS architecture makes use of a generic module for its functional layer. If we model this generic module and its components in BIP, and if we then instantiate it and connect the existing “codels” to the resulting component, then we have a BIP model of the GenoM modules. Adding the BIP model of the interactions between the modules then gives us a BIP model of the overall functional layer. In order to formalize the componentization approach, we propose the following mapping (+ for one component or more, and . for composing components):
functional level ::= (module)+
module ::= (service)+ . (execution task) . (poster)+
service ::= (service controller) . (activity)
execution task ::= (timer) . (scheduler activity)
As shown in Fig. 5, a component modeling a generic Service is obtained by composing the atomic components service controller and activity; a sketch of this composition is given below. The left sub-component represents the execution task of a service. It is launched by synchronization through port trigger. The service controller then controls the validity of the parameters of the request (if available) and will either reject the request or start the activity by synchronizing with the activity component (right sub-component). In each state, the status of the execution task is available by synchronizing through port status. The activity will then wait for execution (i.e. synchronization on the exec port with the control task) and will either safely end, fail, or abort. Each of the transitions control, start, exec, fail, finish and inter may call an external function. The service components are further composed with execution task and poster components to obtain a module component, as shown in Fig. 6.
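As an illustration of the mapping above, here is a minimal sketch of a Service compound built from a controller and an activity synchronizing on a shared start port; the states, port names and the stepping loop are simplifying assumptions of ours, not the exact BIP model of Fig. 5.

```python
# Hedged sketch: a Service as the composition of a service controller and
# an activity, synchronized on "start" (rendezvous). Simplified on purpose.

class Controller:
    def __init__(self):
        self.state = "idle"
    def trigger(self, params_ok=True):
        # check the request parameters, then either reject or offer "start"
        self.state = "starting" if params_ok else "idle"
    def offers_start(self):
        return self.state == "starting"
    def started(self):
        self.state = "running"

class Activity:
    def __init__(self):
        self.state = "idle"
    def offers_start(self):
        return self.state == "idle"
    def start(self):
        self.state = "exec"
    def finish(self):
        self.state = "ended"

class Service:
    """Compound component: the start rendezvous fires only when both
    sub-components offer their start port."""
    def __init__(self):
        self.controller, self.activity = Controller(), Activity()
    def step(self):
        if self.controller.offers_start() and self.activity.offers_start():
            self.controller.started()
            self.activity.start()

svc = Service()
svc.controller.trigger(params_ok=True)   # a request arrives
svc.step()                               # the start rendezvous fires
svc.activity.finish()
```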
4.1
A Functional Module in BIP
The full BIP description of the functional level of the robot, which consists of several modules, is beyond the scope of this paper. We rather focus on the modeling of the NDD module. The NDD module contains six services, a poster and a control task as sub-components, together with the connectors between them, as shown in Fig. 8. The control task wakes up periodically (managed by the bottom-left component with alternating sleep and trigger transitions) and always triggers the Permanent service at the beginning of each period.
Figure 6. A componentized GenoM module.
4 EXPERIMENTAL RESULTS
We used the CLIPS [3] production rule engine in order to apply thirteen entailments over the LUBM [11] university ontology. Five extensional datasets Di were generated, each one of approximately 12,000 triples. Table 2 depicts the time needed to apply the dynamic and the generic rules over different dataset sizes. The dynamic approach generates about 300 rules and, despite the great number of rules, the ABOX reasoning procedure terminates considerably faster than with the generic approach, where only 13 rules are applied.

Table 2. Dynamic and generic ABOX reasoning times.

Triples   Dynamic (sec)   Generic (sec)
12,000    36.750          86.063
24,000    61.078          167.000
36,000    84.797          255.859
48,000    107.406         393.109
60,000    129.719         512.312
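To illustrate why the dynamic rules are faster, consider a hedged Python rendering of the rdfs7 entailment (subproperty propagation): the generic rule must join the ABOX against the TBOX on every match, whereas a rule generated from a known TBOX fact has a single condition. The triple encoding and function names are our own assumptions, not the paper’s CLIPS rules.

```python
# Hedged sketch: generic vs. dynamically generated ABOX rule for the
# rdfs7 entailment (if p subPropertyOf q and p(x,y) then q(x,y)).
# Triple encoding is illustrative.

tbox = {("hasHead", "rdfs:subPropertyOf", "worksFor")}
abox = {("alice", "hasHead", "bob")}

def generic_rdfs7(tbox, abox):
    # two conditional elements: joins ABOX triples against the TBOX
    return {(x, q, y)
            for (x, p, y) in abox
            for (p2, _, q) in tbox if p2 == p}

def make_dynamic_rdfs7(tbox):
    # the TBOX is compiled away: one single-condition rule per
    # subproperty fact, so a rule engine avoids the join at run time
    rules = [(p, q) for (p, _, q) in tbox]
    def apply_rules(abox):
        return {(x, q, y)
                for (x, p, y) in abox
                for (p2, q) in rules if p2 == p}
    return apply_rules

# Both derive ("alice", "worksFor", "bob"):
assert generic_rdfs7(tbox, abox) == make_dynamic_rdfs7(tbox)(abox)
```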
5 RELATED WORK
To the best of our knowledge, the existing rule-based reasoners that use entailments follow the generic methodology, that is, both the TBOX and the ABOX entailments are generic and ontology-independent. SweetProlog [9], Jena [12] and OWLIM [8] are some example systems that are based on general-purpose rule engines, e.g. Prolog, or on rule engines built from scratch, such as the TRREE engine of OWLIM. Notice that the default Jena rule engine for OWL reasoning is a hybrid implementation, using forward-chaining rules in order to generate backward-chaining rules.
6 CONCLUSIONS
In this paper we presented a methodology for performing rule-based OWL reasoning based on generic TBOX and dynamic ABOX entailment rules. In this way, we are able to use the TBOX rules as the basis for generating domain-dependent ABOX inferencing rules. The main characteristic of these rules is that they join fewer conditional elements in their body, achieving better activation times in rule engines than their corresponding generic entailments. Currently we are working on combining a rule engine with a DL reasoner in order to dynamically generate ABOX inferencing rules based on the inferencing capabilities of the DL paradigm.
ACKNOWLEDGEMENTS
This work was partially supported by a PENED program (EPAN M.8.3.1, No. 03ΕΔ73), jointly funded by the European Union and the Greek Government (General Secretariat of Research and Technology/GSRT).
REFERENCES [1] G. Antoniou, C.V. Damasio, B. Grosof, I. Horrocks, M. Kifer, J. Maluszynski, P.F. Patel-Schneider, Combining Rules and Ontologies. A Survey, Reasoning on the Web with Rules and Semantics, REWERSE Deliverables, 2005. [2] F. Baader, U. Sattler, An Overview of Tableau Algorithms for Description Logics, Studia Logica, vol. 69, pp. 5-40, 2001 [3] CLIPS, http://www.ghg.net/clips [4] B. Grosof, I. Horrocks, R. Volz, S. Decker, Description logic programs: Combining logic programs with description logics, WWW 2003, pp. 48–57. ACM, 2003. [5] P. Hitzler, J. Angele, B. Motik, R. Studer, Bridging the Paradigm Gap with Rules for OWL. In Proc. of the W3C Workshop on Rule Languages for Interoperability, Washington, USA, 2005 [6] I. Horrocks, P.F. Patel-Schneider, A Proposal for an OWL Rules Language, 13th Int. WWW Conf., ACM, New York (2004) [7] H.J. Horst, Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary, Journal of Web Semantics, vol. 3, pp. 79-115, 2005 [8] A. Kiryakov, D. Ognyanov, D. Manov, OWLIM - a Pragmatic Semantic Repository for OWL, Proc. Workshop Scalable Semantic Web Knowledge Base Systems, USA, 2005 [9] L. Laera, V. Tamma, T.B. Capon, G. Semeraro, SweetProlog: A System to Integrate Ontologies and Rules, Rules and Rule Markup Languages for the Semantic Web, 2004. [10] A.Y. Levy, M.-C. Rousset, Combining Horn rules and description logics in CARIN, Artificial Intelligence, 104(1-2), 165–209 (1998). [11] Y. Guo, Z. Pan, J. Heflin, LUBM: A Benchmark for OWL Knowledge Base Systems, Journal of Web Semantics, 3(2), pp. 158-182, 2005 [12] B. McBride, Jena, Implementing the RDF Model and Syntax Specification, 2nd International Workshop on the Semantic Web, Hong Kong, China, 2001 [13] B. Motik, I. Horrocks, R. Rosati, U. Sattler, Can OWL and Logic Live Together Happily Ever After?, Proc. 5th ISWC, Athens, USA, 2006 [14] R. Rosati, On the decidability and complexity of integrating ontologies and rules, Web Semantics: Science, Services and Agents on the World Wide Web, vol. 3(1), pp. 61-73, July 2005. [15] D. Tsarkov, A. Riazanov, S. Bechhofer, I. Horrocks, Using Vampire to reason with OWL, International Semantic Web Conference, pp. 471-485, 2004. [16] Web Ontology Language - OWL, http://www.w3.org/2004/OWL/
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-733
Computability and Complexity Issues of Extended RDF
Anastasia Analyti1 and Grigoris Antoniou1,2 and Carlos Viegas Damásio3 and Gerd Wagner4
Abstract. ERDF stable model semantics is a recently proposed semantics for ERDF ontologies and a faithful extension of RDFS semantics on RDF graphs. Unfortunately, ERDF stable model semantics is in general undecidable. In this paper, we elaborate on the computability and complexity issues of the ERDF stable model semantics.
1
Introduction
Rules constitute the next layer over the ontology languages of the Semantic Web, allowing arbitrary interaction of variables in the head and body of the rules. In [1], the Semantic Web language RDFS [4] is extended to accommodate the two negations of Partial Logic, namely weak negation ∼ (expressing negation-as-failure or non-truth) and strong negation ¬ (expressing explicit negative information or falsity), as well as derivation rules. The new language is called Extended RDF (ERDF). In [1], the stable model semantics of ERDF ontologies is developed, based on Partial Logic, extending the model-theoretic semantics of RDFS. Intuitively, an ERDF ontology is the combination of (i) an ERDF graph G containing (implicitly existentially quantified) positive and negative information, and (ii) an ERDF program P containing derivation rules, with possibly all connectives ∼, ¬, ⊃, ∧, ∨, ∀, ∃ in the body of a rule, and strong negation ¬ in the head of a rule. ERDF enables the combination of closed-world (non-monotonic) and open-world (monotonic) reasoning in the same framework, through the presence of weak negation (in the body of the rules) and the new metaclasses erdf:TotalProperty and erdf:TotalClass, respectively. In [1], it is shown that stable model entailment conservatively extends RDFS entailment from RDF graphs to ERDF ontologies. Unfortunately, satisfiability and entailment under the ERDF stable model semantics are in general undecidable. In this paper, we elaborate on the computability and complexity issues of the ERDF stable model semantics. Additionally, we propose a slightly modified semantics on ERDF ontologies, called the ERDF #n-stable model semantics, that is also a faithful extension of RDFS semantics on RDF graphs and achieves decidability.
2
Stable Model Semantics of ERDF Ontologies
In this Section, we briefly review ERDF ontologies and their stable model semantics. Details and examples can be found in [1]. A (Web) vocabulary V is a set of URI references and/or literals (plain or typed). We denote the set of all URI references by URI.
1 Institute of Computer Science, FORTH-ICS, Crete, Greece, e-mail: analyti@ics.forth.gr
2 Department of Computer Science, University of Crete, Greece
3 CENTRIA, Departamento de Informatica, Faculdade de Ciencias e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
4 Inst. of Informatics, Brandenburg Univ. of Technology at Cottbus, Germany
We consider a set of variable symbols Var such that URI, Var, and the set of literals are pairwise disjoint. In our examples, variable symbols are prefixed by “?”. Let V be a vocabulary. An ERDF triple over V is an expression of the form p(s, o) or ¬p(s, o), where s, o ∈ V ∪ Var are called subject and object, respectively, and p ∈ V ∩ URI is called property. An ERDF graph G is a set of ERDF triples over some vocabulary V. We denote the variables appearing in G by Var(G), and the set of URI references and literals appearing in G by VG. Let V be a vocabulary. We denote by L(V) the smallest set that contains the ERDF triples over V and is closed with respect to the following conditions: if F, G ∈ L(V) then {∼F, F ∧ G, F ∨ G, F ⊃ G, ∃xF, ∀xF} ⊆ L(V), where x ∈ Var. An ERDF formula over V is an element of L(V). Intuitively, an ERDF graph G represents an existentially quantified conjunction of ERDF triples. Specifically, let G = {t1, ..., tm} be an ERDF graph, and let Var(G) = {x1, ..., xk}. Then, G represents the ERDF formula formula(G) = ∃?x1, ..., ∃?xk t1 ∧ ... ∧ tm. Existentially quantified variables in ERDF graphs are handled by skolemization. Let G be an ERDF graph. The skolemization function of G is a 1:1 mapping skG : Var(G) → URI, where for each x ∈ Var(G), skG(x) is an artificial URI, denoted by G:x. The skolemization of G, denoted by sk(G), is the ground ERDF graph derived from G after replacing each x ∈ Var(G) by skG(x). An ERDF rule r over a vocabulary V is an expression of the form Concl(r) ← Cond(r), where Cond(r) ∈ L(V) ∪ {true} and Concl(r) is an ERDF triple or false. An ERDF program is a set of ERDF rules. We denote the set of URI references and literals appearing in P by VP. An ERDF ontology is a pair O = ⟨G, P⟩, where G is an ERDF graph and P is an ERDF program. The vocabulary of RDF, VRDF, is a set of URI references in the rdf: namespace [4]. The vocabulary of RDFS, VRDFS, is a set of URI references in the rdfs: namespace [4]. The vocabulary of ERDF is defined as VERDF = {erdf:TotalClass, erdf:TotalProperty}. Intuitively, instances of the metaclass erdf:TotalClass are classes c that satisfy totalness, meaning that, at the interpretation level, each statement rdf:type(x, c) is either true or explicitly false. Similarly, instances of the metaclass erdf:TotalProperty are properties p that satisfy totalness, meaning that, at the interpretation level, each statement p(x, y) is either true or explicitly false. Let O = ⟨G, P⟩ be an ERDF ontology. The vocabulary of O is defined as VO = Vsk(G) ∪ VP ∪ VRDF ∪ VRDFS ∪ VERDF. In [1], the set of (ERDF) stable models of O is defined, denoted by Mst(O). Each stable model M of O (i) interprets the terms in VO, and (ii) assigns intended truth and falsity extensions to the classes and properties in VO (satisfying all semantic conditions of an RDFS interpretation [4] on VO, as well as new semantic conditions particular to ERDF). M is generated through a sequence of steps. Intuitively,
734
A. Analyti et al. / Computability and Complexity Issues of Extended RDF
starting from an intended interpretation of sk(G), a stratified sequence of rule applications is produced, where all applied rules remain applicable throughout the generation of the stable model M. Let M ∈ Mst(O) and let F be an ERDF formula or an ERDF graph. In [1], the model relation M |= F is defined. We say that O entails F under the (ERDF) stable model semantics, denoted O |=st F, iff for all M ∈ Mst(O), M |= F.
As an example, consider a class ex:Wine whose instances are wines and a property ex:likes(X, Y) indicating that person X likes wine Y. Assume now that we want to select wines for a dinner such that, for each guest, there is on the table exactly one wine that she/he likes. Let the class ex:Guest indicate the persons that will be invited to the dinner and let the class ex:SelectedWine indicate the wines chosen to be served. An ERDF program P that describes this wine selection problem is the following5,6:
id(?x, ?x) ← true.
rdf:type(?y, SelectedWine) ← rdf:type(?x, Guest), rdf:type(?y, Wine), likes(?x, ?y), ∀?z (rdf:type(?z, SelectedWine), ∼id(?y, ?z) ⊃ ∼likes(?x, ?z)).
Consider now the ERDF graph G, containing the factual information:
G = {rdf:type(Carlos, Guest), rdf:type(Gerd, Guest), rdf:type(Riesling, Wine), rdf:type(Retsina, Wine), likes(Gerd, Riesling), likes(Gerd, Retsina), likes(Carlos, Retsina)}.
Then, the ERDF ontology O = ⟨G, P⟩ has only one stable model M, for which it holds that M |= rdf:type(Retsina, SelectedWine) ∧ ∼rdf:type(Riesling, SelectedWine). This is because (i) both Gerd and Carlos like Retsina and (ii) Carlos likes only Retsina. Obviously, O |=st rdf:type(Retsina, SelectedWine) ∧ ∼rdf:type(Riesling, SelectedWine).
Proposition 2.1 Let G, G′ be RDF graphs such that VG ∩ VERDF = ∅, VG′ ∩ VERDF = ∅, and VG′ ∩ skG(Var(G)) = ∅. It holds: G |=RDFS G′ iff ⟨G, ∅⟩ |=st G′.
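The unique stable model of this example can be reproduced with a small brute-force check; the encoding below (guests mapped to the wines they like, and the “exactly one selected liked wine per guest” test taken from the informal problem statement) is our own approximation, not the stable-model construction of [1].

```python
# Hedged sketch: enumerate wine selections and keep those satisfying the
# wine-selection requirement stated above (exactly one liked selected
# wine per guest). Encoding ours.
from itertools import combinations

guests = {"carlos": {"retsina"}, "gerd": {"riesling", "retsina"}}
wines = {"riesling", "retsina"}

def acceptable(sel):
    # every guest must like exactly one wine of the selection
    return all(len(liked & sel) == 1 for liked in guests.values())

models = [set(c) for r in range(len(wines) + 1)
          for c in combinations(sorted(wines), r) if acceptable(set(c))]
print(models)   # [{'retsina'}] -- matching the unique stable model above
```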
3
Computability and Complexity Issues
In [1], it is shown that satisfiability and entailment under the ERDF stable model semantics are in general undecidable. The proof of undecidability exploits a reduction from the unbounded tiling problem, the existence of a solution to which is known to be undecidable [2]. Note that since each constraint false ← F that appears in an ERDF ontology O can be replaced by the rule ¬t ← F, where t is an RDF, RDFS, or ERDF axiomatic triple, the presence of constraints in O does not affect decidability. An ERDF formula F is called simple if it has the form t1 ∧ ... ∧ tk ∧ ∼tk+1 ∧ ... ∧ ∼tm, where each ti, i = 1, ..., m, is an ERDF triple. An ERDF program P is called simple if for all r ∈ P, Cond(r) is a simple ERDF formula or true. An ERDF ontology O = ⟨G, P⟩ is called simple if P is a simple ERDF program. A simple ERDF ontology O (resp. ERDF program P) is called objective if no weak negation appears in O (resp. P). The reduction in [1] shows that ERDF stable model satisfiability and entailment remain undecidable, even if (i) O = ⟨G, P⟩ is a simple ERDF ontology, and (ii) the terms erdf:TotalClass and erdf:TotalProperty do not appear in O (i.e., (VG ∪ VP) ∩ VERDF = ∅). However, we will show that satisfiability and entailment under the ERDF stable model semantics are decidable if (i) O is an objective ERDF ontology, and (ii) the entailed formula is an ERDF d-formula.
5 To improve readability, we ignore the example namespace ex:.
6 Commas “,” in the body of the rules indicate conjunction ∧.
Let F be an ERDF formula. We say that F is an ERDF d-formula iff (i) F is a disjunction of existentially quantified conjunctions of ERDF triples, and (ii) FVar(F) = ∅. For example, let F = (∃?x rdf:type(?x, Vertex) ∧ rdf:type(?x, Red)) ∨ (∃?x rdf:type(?x, Vertex) ∧ ¬rdf:type(?x, Blue)). Then, F is an ERDF d-formula. It is easy to see that if G is an ERDF graph then formula(G) is an ERDF d-formula.
Proposition 3.1 Let G, G′ be ERDF graphs, let P be an objective ERDF program, let Fd be an ERDF d-formula, and let F be an ERDF formula.
1. The problem of establishing whether O = ⟨G, P⟩ has a stable model is NP-complete w.r.t. (|P| + 1) ∗ (|Vsk(G)| + |VP|).
2. The problems of establishing whether: (i) ⟨G, P⟩ |=st G′, (ii) ⟨G, P⟩ |=st Fd, and (iii) ⟨G, P⟩ |=st F, where P = ∅, are co-NP-complete w.r.t. (|P| + 1) ∗ (|Vsk(G)| + |VP|).
The hardness part of the above complexity results can be proved by a reduction from the Graph 3-Colorability problem, a classical NP-complete problem. Moreover, membership of the above problems in NP or co-NP can be proved by showing that, of the infinite set of rdf:_i terms (i ∈ IN), only a finite subset needs to be considered for solving the corresponding problem. The following proposition shows that even if O = ⟨G, P⟩ is an objective ERDF ontology, entailment of a general ERDF formula F under the ERDF stable model semantics is still undecidable. This result can also be proved by a reduction from the unbounded tiling problem [2].
Proposition 3.2 Let G be an ERDF graph, let P be an objective program, and let F be an ERDF formula. The problem of establishing whether ⟨G, P⟩ |=st F is in general undecidable.
Let O be an ERDF ontology (with weak negation possibly appearing in the program rules). The source of undecidability of the ERDF stable model semantics of O is the fact that VRDF is infinite. Thus, the vocabulary of O is also infinite (note that {rdf:_i | i ≥ 1} ⊆ VRDF ⊆ VO). Therefore, we slightly modify the definition of the ERDF stable model semantics, based on a redefinition of the vocabulary of an ERDF ontology, which now becomes finite. We call the modified semantics the ERDF #n-stable model semantics (for n ∈ IN). Let n ∈ IN and VO#n = VO − {rdf:_i | i > n}. We define the ERDF #n-stable model semantics of O similarly to the ERDF stable model semantics of O, but now only the interpretation of the terms in VO#n is considered. The ERDF #n-stable model semantics also extends RDFS entailment from RDF graphs to ERDF ontologies. Query answering under the ERDF #n-stable model semantics is decidable. Moreover, if O is a simple ERDF ontology then query answering under the ERDF #n-stable model semantics reduces to query answering under the answer set semantics [3] for an extended logic program Π^{#n}_O. Finally, we would like to mention that the complexity results of Proposition 3.1 also hold for the ERDF #n-stable model semantics.
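The decidability device is just a truncation of the infinite container-membership vocabulary; a minimal sketch of VO#n, with an illustrative URI encoding of the rdf:_i terms:

```python
# Hedged sketch of the #n vocabulary truncation described above: V_O is
# infinite only because of the rdf:_i terms, so keeping rdf:_i for i <= n
# yields the finite vocabulary V_O^{#n}. Encoding is illustrative.

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def is_container_membership(term):
    return term.startswith(RDF_NS + "_")

def truncate(vocabulary, n):
    """Drop rdf:_i terms with i > n; everything else is kept."""
    kept = set()
    for t in vocabulary:
        if is_container_membership(t):
            i = int(t[len(RDF_NS) + 1:])
            if i > n:
                continue
        kept.add(t)
    return kept

v = {RDF_NS + "type", RDF_NS + "_1", RDF_NS + "_2", RDF_NS + "_42"}
print(sorted(truncate(v, 2)))   # rdf:type, rdf:_1 and rdf:_2 survive
```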
REFERENCES
[1] A. Analyti, G. Antoniou, C. V. Damásio, and G. Wagner. Extended RDF as a Semantic Foundation of Rule Markup Languages. Journal of Artificial Intelligence Research (JAIR), 32:37–94, 2008.
[2] R. Berger. The Undecidability of the Domino Problem. Memoirs of the American Mathematical Society, 66:1–72, 1966.
[3] M. Gelfond and V. Lifschitz. Logic Programs with Classical Negation. In ICLP’90, pages 579–597, 1990.
[4] P. Hayes. RDF Semantics. W3C Recommendation, 10 February 2004. Available at http://www.w3.org/TR/2004/REC-rdf-mt-20040210/.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-735
Automated Web Services Composition Using Extended Representation of Planning Domain
Mohamad El Falou1 and Maroua Bouzid1 and Abdel-Illah Mouaddib1 and Thierry Vidal2
1
INTRODUCTION
Web services (WS) are distributed software components that can be exposed and invoked over the Internet using standard protocols. They communicate with their clients and with other WS by sending XML-based messages over the Internet. Artificial Intelligence planning techniques can help solve the WS composition problem: services can be modelled as actions, and the business process as a planning problem connecting the WS. The main contribution of this paper is the extension of the model of actions to handle the creation or elimination of objects as effects of actions. This contribution allows us to answer new and more expressive requests, called implicit requests, in which goals may contain objects that have been generated by the plan.
2
Related Works
The work on web service composition at the University of Trento presented in [5] translates WS into state transition systems (STSs). After translating the WS, the system constructs a parallel product which combines the n STSs; this parallel product allows the n services to evolve concurrently. They use the Model Based Planner MBP [1], based on model checking techniques [6]. The drawback of this approach is that the parallel product must be recalculated whenever we add or remove a service from the domain. In [3] an approach called GOLOG, based on the situation calculus, is presented. GOLOG composes web services by applying logical inference techniques on pre-defined plan templates. Finally, in [7] the authors define a translation from DAML-S process models to SHOP2 domains, and from DAML-S composition tasks to SHOP2 planning problems. SHOP2 is a planner well suited to working with the Process Model in a Hierarchical Task Network (HTN) setting. HTN planning builds plans through task decomposition. All the approaches cited above suppose that the domain objects are static: there is no way to create or eliminate objects. Furthermore, all defined requirements for the composite Web Services are defined as explicit queries.
3
Motivating Example
Let us consider a set of WS which are intended to deal with files, images and tracks, as follows:
1. WS1 translates file languages. It has two services: fr2en (en2ar), which translates files from French (English) to English (Arabic).
1 University of Caen, France, email: melfalou, bouzid, mouaddib@info.unicaen.fr
2 IRISA - INRIA Rennes, France, email: thierry.vidal@irisa.fr
2. WS2 transforms text file formats. It has two services: latex2doc (doc2pdf), which transforms files from latex (doc) to doc (pdf) format.
3. WS3 merges files. It has two services: mergepdf (mergedoc), which merges two pdf (doc) files into a third one.
As an example, suppose that we have two files: the first in doc format written in English, the second in latex format written in French, and we want to obtain a file which contains the content of the two files translated into Arabic. The existing approaches dedicated to WS composition cannot express or deal with this kind of problem. To overcome this limitation, we propose an approach where the specification language of the domain is an extension of the specification language PDDL [4], and the WS composition mechanism is based on two planning mechanisms: Tree-search and GraphPlan.
4
Formal Framework
Our formal framework is based on extended Planning-Graph techniques [2] allowing the creation and elimination of objects when executing services (actions). Contrary to classical approaches, where a state is defined as a set of predicates, a state in our domain is defined by a set of objects together with properties of and relations between these objects; we extend the definition of actions to allow the generation and elimination of objects in the environment, the assignment of new predicates to objects and the definition of new relations between them.
4.1
Preliminaries and Definitions
The domain D = (C, P) is defined by a set of WS C = (WS1, WS2, ..., WSn), which we call a community of Web Services, and a set of predicate types P = {p1, p2, ..., pn} specifying the possible properties of objects and relations between them. A state q = (V, P) of the plan execution is defined by a set of objects V with their types, and a set of predicates P specifying the properties of these objects and the relationships between them. In Section 3, the initial state is specified as: q0 = ({(F1: file), (F2: file)}, {(doc F1), (en F1), (latex F2), (fr F2)}), where F1, F2 are objects (files), file is a type, and doc, en, latex and fr are properties. A Web Service WSi is defined by WSi = (Ti, Ai, Si), which are respectively the type, attributes and services of the WS. A service in WSi is defined by S_i^k = (Pin_i^k, Pout_i^k, Pinout_i^k, Prec_i^k, Effects_i^k), which are respectively the input, output and input-output objects, and the preconditions and effects of the service execution. The service mergepdf of WS3 is defined as follows:
• Pin3 = { (F1: file), (F2: file) }
• Pout3 = { (F: file) }
• Pinout3 = { }
• Prec3 = { (pdf F1), (pdf F2) }
• Effect−3 = { (pdf F1), (pdf F2) }
• Effect+3 = { (pdf F), (merge F F1 F2) }
A plan is defined as a sequence of sets of services, where every set is called a partial plan. More formally, Π = ⟨π1, π2, ..., πn⟩ is a plan such that ∀i ∈ [1..n], πi = (si1, ..., sin) is a partial plan of independent services, and each sik is instantiated with real objects of the domain. One solution plan for the problem introduced in Section 3 is Π = ⟨π1, π2, π3, π4, π5⟩ where:
• π1 = (fr2en[F1], en2ar[F2])
• π2 = (en2ar[F1], doc2pdf[F2])
• π3 = (latex2doc[F1])
• π4 = (doc2pdf[F1])
• π5 = ([#F0] = mergepdf[F1, F2])
A request R = (D, q0, g) is defined, for a domain D of WS, by the initial state q0 and the goal state g. The initial and final states are defined by a set of objects and a set of associated predicates (V, P). In the previous example, q0 = [{(F1: file), (F2: file)}, {(doc F1), (en F1), (latex F2), (fr F2)}] and g = [{(#F0: file)}, {(pdf #F0), (ar #F0), (merge F1 F2 #F0)}]. The aim of the symbol # before the name of an object is to state that it is a generated object (in the output set of the executed service); any other object of the same type whose name begins with # can replace it in the domain.
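A hedged sketch of this extended state/service model, showing how applying mergepdf creates a fresh object; the frozenset encoding and the name counter are our own illustrative assumptions, not the authors’ implementation.

```python
# Hedged sketch of the extended planning model: states carry objects, and
# service effects may create new ones. Encoding is illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    objects: frozenset        # {("F1", "file"), ...}
    predicates: frozenset     # {("pdf", "F1"), ...}

counter = 0
def fresh(prefix="#F"):
    """Generate a new object name, mirroring the # convention above."""
    global counter
    name = f"{prefix}{counter}"
    counter += 1
    return name

def apply_mergepdf(state, f1, f2):
    """mergepdf: preconditions (pdf f1), (pdf f2); creates a new file F
    with effects -{(pdf f1),(pdf f2)} and +{(pdf F),(merge F f1 f2)}."""
    assert ("pdf", f1) in state.predicates and ("pdf", f2) in state.predicates
    f = fresh()
    return State(
        objects=state.objects | {(f, "file")},
        predicates=(state.predicates - {("pdf", f1), ("pdf", f2)})
                   | {("pdf", f), ("merge", f, f1, f2)},
    )

s = State(frozenset({("F1", "file"), ("F2", "file")}),
          frozenset({("pdf", "F1"), ("pdf", "F2")}))
s2 = apply_mergepdf(s, "F1", "F2")   # s2 now contains the new object #F0
```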
5
Planning Algorithm
We have implemented two algorithms to build a solution to our problem. The first is based on the classical Tree-search algorithm, and the second on the GraphPlan method. The basic idea behind the Tree-search algorithm is to apply, from the initial state, all executable services. By doing this (expanding a state) we obtain a set of new states S; if the goal is in S, a solution is found. If not, based on the Tree-search strategy, we select one of the unexpanded states. If all states have been expanded, we report failure. Using this algorithm, we obtain a sequential plan of services, which we then transform into a sequence of partial plans (sets of independent services) Π = ⟨π1, π2, ..., πn⟩.
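A minimal sketch of this forward search, reusing the State/apply style above; goal_satisfied, the service triples and the depth bound stand in for the paper’s goal test and precondition checks, and are assumptions of ours. A FIFO queue gives the width strategy of Table 1; replacing it with a stack (popping from the right) gives the depth strategy.

```python
# Hedged sketch of the Tree-search composition algorithm: forward search
# over states, applying every executable service at each expansion.
from collections import deque

def tree_search(initial, services, goal_satisfied, max_depth=10):
    """services: iterable of (name, is_executable, apply_fn) triples.
    Returns the sequence of service names reaching the goal, or None."""
    frontier = deque([(initial, [])])
    while frontier:
        state, plan = frontier.popleft()   # FIFO = width strategy
        if goal_satisfied(state):
            return plan
        if len(plan) >= max_depth:
            continue
        for name, is_executable, apply_fn in services:
            if is_executable(state):       # precondition check
                frontier.append((apply_fn(state), plan + [name]))
    return None
```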
5.1
GraphPlan algorithm
The GraphPlan algorithm performs a procedure close to iterative deepening, discovering a new part of the search space at each iteration. It iteratively expands the planning graph by one level, then searches backward from the last level of the graph for a solution. The first expansion, however, proceeds to a level Pi in which all the goal propositions are included, no pair of them is mutex, and the set of services executed for reaching g is not mutex; and so on, until reaching P0 (in which case a plan is found) or until failure (Pi = Pi+1 and no plan is found).
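The outer loop of this procedure can be sketched as follows; expand, goals_reachable, backward_extract and last_level are placeholders for the standard GraphPlan operations, an illustrative interface rather than the authors’ code.

```python
# Hedged sketch of the GraphPlan outer loop: expand one level at a time,
# try backward extraction, and stop at a level fixpoint (no solution).

def graphplan(graph, goals):
    """graph: a planning-graph object exposing the operations named
    below (illustrative interface, not a concrete library)."""
    while True:
        if graph.goals_reachable(goals):       # goals present, non-mutex
            plan = graph.backward_extract(goals)
            if plan is not None:
                return plan                    # sequence of partial plans
        previous = graph.last_level()
        graph.expand()                         # add one proposition level
        if graph.last_level() == previous:     # fixpoint: P_i = P_{i+1}
            return None                        # no plan exists
```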
6
Implementation and Results
By implementing the Tree-search and GraphPlan algorithms, we show that our new approach to WS composition under implicit requests is feasible. In our implementation we use a part of the PDDL language and extend it to fit our model.
Table 1. Results of the Tree-search algorithm.

Problem  Objects  Depth strategy:       Width strategy:
                  nodes    plan size    nodes    plan size
P1       1        6        4            9        4
P2       1        4        0            4        0
P3       2        8        7            232      6
P4       2        9        8            2585     8
P5       3        13       12           > 4200   (*)
P6       3        14       13           583      7
P7       4        18       17           > 2900   (*)
(*) solution not found
We tested our algorithms on 7 examples that contain many objects and many types of variables (files, tracks and images). P2 is the problem given in Section 3. Table 1 gives the number of initial objects of the different problems, and the number of expanded nodes and the plan size under the depth and width strategies. We have 16 available services in the domain (illustrated in Section 3). From these results we observe that the depth strategy is very effective: in a few seconds we obtain a plan by expanding a small number of nodes. Using the GraphPlan algorithm, we obtain solutions for simple problems, but not for complex problems that contain a high number of objects: applying the extended Graph-plan techniques leads to a combinatorial explosion, due to the execution of services that create new objects at each level.
7
Conclusion and Perspective
In this paper, we give an extended view of the Web Service composition problem by modelling it as a planning problem. We propose an extended model of services in order to answer composition problems that require the creation and elimination of objects as effects of the execution of a service. We also overcome the limitations of other approaches by giving a dynamic and distributed definition of our domain: this allows us to add, remove and/or replace services without recalculating other parts of the domain. Finally, our model overcomes the limitation of pre-defined plans by defining the implicit request only through an initial and a goal state.
REFERENCES
[1] P. Bertoli, A. Cimatti, M. Pistore, M. Roveri, and P. Traverso, ‘MBP: a model based planner’, in Proc. of the IJCAI’01 Workshop on Planning under Uncertainty and Incomplete Information, Seattle, (2001).
[2] M. Ghallab, D. Nau, and P. Traverso, Automated Planning: Theory and Practice, Morgan Kaufmann Publishers, (2005).
[3] S. McIlraith and T. Son, ‘Adapting Golog for composition of semantic web services’, (2002).
[4] M. Ghallab, A. Howe, C. Knoblock, D. McDermott, A. Ram, M. Veloso, D. Weld, and D. Wilkins, ‘PDDL — the planning domain definition language’, (1998).
[5] M. Pistore, P. Bertoli, F. Barbon, D. Shaparau, and P. Traverso, ‘Planning and monitoring web service composition’, ICAPS 2004.
[6] M. Pistore and P. Traverso, ‘Planning as model checking for extended goals in non-deterministic domains’, pp. 479–486, (2001).
[7] D. Wu, E. Sirin, J. Hendler, D. Nau, and B. Parsia, ‘Automating DAML-S web services composition using SHOP2’, (2003).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-737
Propositional merging operators based on set-theoretic closeness
Patricia Everaere1 and Sébastien Konieczny2 and Pierre Marquis3
Abstract. In the propositional setting, a well-studied family of merging operators are the distance-based ones: the models of the merged base are the interpretations closest to the given profile. Closeness is, in this context, measured as a number resulting from the aggregation of the distances to each base of the profile. In this work we define a new family of propositional merging operators, close to such distance-based merging operators, but relying on a set-theoretic definition of closeness, already at work in several revision/update operators from the literature. We study a specific merging operator of this family, obtained by considering set-product as the aggregation function.
1
Introduction
Information merging is a very important task in artificial intelligence: the issue is to determine the beliefs, or the goals, of a group of agents from their individual points of view. Much work has been devoted to the definition of merging operators in the propositional case [11, 9, 1, 8, 10]. In [8] a set of postulates is proposed to characterize different families of merging operators, and several families of operators satisfying those postulates are defined. Such operators are called model-based merging operators because, basically, they select the models of a given integrity constraint (i.e. a formula encoding laws, norms, etc., used for constraining the result of the merging) that are the closest ones to the given profile of belief/goal bases of the group. Often, those operators are defined from a distance between interpretations, which intuitively indicates how conflicting they are. This distance between interpretations induces a distance between an interpretation and a base, which indicates how plausible/satisfactory the interpretation is with respect to the base. Once such distances are computed, an aggregation function is used to define the overall distance of each model (of the integrity constraints) to the profile. Semantically, the models of the result of the merging are the models of the integrity constraints closest to the profile. A commonly-used distance between interpretations is the Hamming distance (also called Dalal distance [3]). The Hamming distance between two interpretations is the number of propositional variables the two interpretations disagree on. The amount of conflict between two interpretations is thus assessed as the number of atoms whose truth values must be flipped in one interpretation in order to
1 Université Lille-Nord de France, LIFL, CNRS UMR 8022, France, email: patricia.everaere@univ-lille1.fr
2 CNRS UMR 8188, CRIL, Université Lille-Nord de France, Artois, France, email: konieczny@cril.fr
3 Université Lille-Nord de France, Artois, CRIL, CNRS UMR 8188, France, email: marquis@cril.fr
make it identical to the second one. Such a distance is very meaningful when no extra information on the epistemic states of the agents is available. The major problem with distance-based merging operators is that evaluating the closeness between two interpretations as a number may lead to losing too much information. The conflicting variables themselves (and not only how many there are) can prove significant. In particular, when variables express real-world properties, it can be the case that some variables are more important than others, or that some variables are logically connected. In those cases, distances are not fully satisfactory. As an alternative to distances, an interesting measure for evaluating the closeness of two interpretations is diff, the symmetrical difference between them. Instead of evaluating the degree of conflict between two interpretations as the number of variables on which they differ (as is the case with the Hamming distance), the diff measure assesses it as the set of such variables. In this work, we consider the family of propositional merging operators based on the diff measure. We specifically focus on the operator Δdiff,⊕ from this family, obtained by considering set-product as the aggregation function. We evaluate it with respect to three criteria: logical properties, strategy-proofness and complexity.
2
A Diff-Based Merging Operator: Δdiff,⊕
The key idea underlying our approach consists in evaluating the degree of conflict between two interpretations ω and ω′ as the set of variables on which they differ: diff(ω, ω′) = {p ∈ P | ω(p) ≠ ω′(p)}. This definition has already been used in the belief revision/update literature to define a number of operators [6, 13, 12, 2, 14]. As for distances, we can straightforwardly define, using diff, a notion of closeness between an interpretation and a base, as the minimal closeness between the interpretation and the models of the base. Of course, since diff outputs a set instead of a number, set-inclusion has to be used as the minimality criterion: diff(ω, K) = min({diff(ω, ω′) | ω′ |= K}, ⊆). So the closeness between an interpretation ω and a base K is measured as the set of the minimal sets (for set inclusion) of propositional variables which have to be flipped in ω to make it a model of K. Now, we need to aggregate these measures in order to define a global notion of closeness between an interpretation and a profile. This is the aim of the aggregation functions. Of course, the usual functions at work for distance-based operators cannot be used here, simply because we do not deal with numbers but with sets. Several aggregation functions can be considered in our setting. For space reasons, we focus on a single one in this paper. We consider
set-product ⊕ as an aggregation function: for two sets of sets E and E′, E ⊕ E′ = {c ∪ c′ | c ∈ E and c′ ∈ E′}.
Definition 1 Let E = {K1, . . . , Kn} be a profile and ω an interpretation. The closeness between ω and E is given by: diff(ω, E) = min({⊕Ki∈E diff(ω, Ki)}, ⊆).
By construction, each element of diff(ω, E) is a minimal set c of variables (a conflict set) such that, for each base Ki, ω can be transformed into a model of Ki by flipping in ω the variables of c. Finally, we define a merging operator Δdiff,⊕ which picks up the models of the integrity constraints whose closeness to the profile E contains at least one of the minimal (w.r.t. ⊆) conflict sets:
Definition 2 Let E = {K1, K2, . . . , Kn} be a profile and μ an integrity constraint. Then diffμ(E) = min({diff(ω, E) | ω |= μ}, ⊆) and [Δ^{diff,⊕}_μ(E)] = {ω |= μ | ∃c ∈ diff(ω, E) s.t. c ∈ diffμ(E)}.
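A hedged executable reading of Definitions 1 and 2, with interpretations as variable-assignment dicts and bases given by their model sets; the encoding is ours, not the authors’ implementation.

```python
# Hedged sketch of diff-based merging. Interpretations are dicts over a
# fixed variable set; a base is represented by its set of models.

def diff(w1, w2):
    return frozenset(p for p in w1 if w1[p] != w2[p])

def min_incl(sets):
    """Keep the sets that are minimal w.r.t. set inclusion."""
    return {s for s in sets if not any(t < s for t in sets)}

def diff_base(w, models_K):
    return min_incl({diff(w, wp) for wp in models_K})

def set_product(E1, E2):
    return {c1 | c2 for c1 in E1 for c2 in E2}

def closeness(w, profile):
    """diff(w, E): set-product over the bases, then inclusion-minimal."""
    acc = {frozenset()}
    for models_K in profile:
        acc = set_product(acc, diff_base(w, models_K))
    return min_incl(acc)

def merge(models_mu, profile):
    """Models of the merged base: those models of mu whose closeness
    contains a globally inclusion-minimal conflict set (Definition 2)."""
    global_min = min_incl({c for w in models_mu
                           for c in closeness(w, profile)})
    return [w for w in models_mu if closeness(w, profile) & global_min]
```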
3
Properties of Δdiff,⊕
Δdiff,⊕ satisfies most of the logical properties proposed in [8]:
Proposition 1 Δdiff,⊕ satisfies (IC0), (IC1), (IC2), (IC3), (IC4) and (IC7). It does not satisfy (IC5), (IC6) and (IC8).
Δdiff,⊕ does not satisfy (IC5) and (IC6), which are postulates capturing aggregation properties. This is not surprising since, unlike distance-based operators (such as the ones based on the Hamming distance), Δdiff,⊕ keeps a justification of the minimality of an interpretation (as a conflict set). Beyond the IC postulates, Δdiff,⊕ also satisfies an interesting additional logical property:
Definition 3 A merging operator Δ satisfies the temperance property iff for every profile {K1, . . . , Kn}: Δ({K1, . . . , Kn}) is consistent with each Ki (temperance).
Proposition 2 Δdiff,⊕ satisfies (temperance).
This proposition shows that the merged base obtained using Δdiff,⊕ is consistent with every base of the profile (when there is no integrity constraint). This proposition also gives an additional explanation for the fact that Δdiff,⊕ does not satisfy (IC6), since temperance is not compatible with this postulate.
Proposition 3 There is no merging operator satisfying both (IC2), (IC6) and (temperance).
It is worth noting that the temperance property is not satisfied by many merging operators. In particular, as implied by the previous proposition, none of the IC merging operators satisfies (temperance). Interestingly, the temperance property shows that Δdiff,⊕ can be viewed as a kind of negotiation operator, which can be used for determining the most consensual parts of the bases of all agents. Let us now investigate how robust Δdiff,⊕ is with respect to manipulation. Intuitively, a merging operator is strategy-proof if and only if, given the beliefs/goals of the other agents, reporting untruthful beliefs/goals does not enable an agent to improve her satisfaction. A formal counterpart of this idea is given in [4, 5].
Proposition 4 In the general case, Δdiff,⊕ is not strategy-proof for any of the three indexes idw, ids and ip. When there is no integrity constraint (i.e., μ ≡ ⊤), Δdiff,⊕ is strategy-proof for idw, but still not strategy-proof for ids or ip.
Most of the model-based operators are not strategy-proof, even in very restricted situations [5]. For example, Δ^{dH,Σ} and Δ^{dH,Gmin}, which are the best model-based operators with respect to strategy-proofness, are not strategy-proof for i_dw, even if μ ≡ ⊤. Δ^{diff,⊕} performs better than any of them in this respect. Let us now consider the complexity issue for the inference problem from a Δ^{diff,⊕}-merged base.

Proposition 5 MERGE(Δ^{diff,⊕}) is Π^p_2-complete. Hardness still holds under the restriction where E contains a single base K consisting of a conjunction of propositional variables, and α is a propositional variable.

This result shows that Δ^{diff,⊕} is computationally harder than the usual distance-based operators, but is at the same complexity level as many formula-based operators [7].
4 Conclusion
In this work we have introduced a family of model-based merging operators relying on a set-theoretic measure of conflict. We focused on set-product as an aggregation function and considered the corresponding operator Δ^{diff,⊕}. A feature of this operator, typically not shared by existing model-based operators, is that it satisfies the temperance property and, as a consequence, is strategy-proof for the weak drastic index when there are no integrity constraints. The price to be paid is a higher complexity than for the usual model-based operators (though similar to that of formula-based merging operators [5]).
ACKNOWLEDGEMENTS

The authors have been partly supported by the ANR project PHAC (ANR-05-BLAN-0384).
REFERENCES

[1] C. Baral, S. Kraus, J. Minker, and V. S. Subrahmanian, 'Combining knowledge bases consisting of first-order theories', Computational Intelligence, 8(1), 45–71, (1992).
[2] A. Borgida, 'Language features for flexible handling of exceptions in information systems', ACM Trans. on Database Syst., 10, 563–603, (1985).
[3] M. Dalal, 'Investigations into a theory of knowledge base revision: preliminary report', in Proc. of AAAI'88, pp. 475–479, (1988).
[4] P. Everaere, S. Konieczny, and P. Marquis, 'On merging strategy-proofness', in Proc. of KR'04, pp. 357–367, (2004).
[5] P. Everaere, S. Konieczny, and P. Marquis, 'The strategy-proofness landscape of merging', Journal of Artificial Intelligence Research, 28, 49–105, (2007).
[6] H. Katsuno and A. O. Mendelzon, 'Propositional knowledge base revision and minimal change', Artificial Intelligence, 52, 263–294, (1991).
[7] S. Konieczny, J. Lang, and P. Marquis, 'DA2 merging operators', Artificial Intelligence, 157, 49–79, (2004).
[8] S. Konieczny and R. Pino Pérez, 'Merging information under constraints: a logical framework', Journal of Logic and Computation, 12(5), 773–808, (2002).
[9] P. Liberatore and M. Schaerf, 'Arbitration (or how to merge knowledge bases)', IEEE Transactions on Knowledge and Data Engineering, 10(1), 76–90, (1998).
[10] T. Meyer, P. Pozos Parra, and L. Perrussel, 'Mediation using m-states', in Proc. of ECSQARU'05, pp. 489–500, (2005).
[11] P. Z. Revesz, 'On the semantics of arbitration', International Journal of Algebra and Computation, 7(2), 133–160, (1997).
[12] K. Satoh, 'Non-monotonic reasoning by minimal belief revision', in Proceedings of the International Conference on Fifth Generation Computer Systems, pp. 455–462, (1988).
[13] A. Weber, 'Updating propositional formulas', in Proceedings of the First Conference on Expert Database Systems, pp. 487–500, (1986).
[14] M. Winslett, 'Reasoning about action using a possible models approach', in Proc. of AAAI'88, pp. 89–93, (1988).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-739
Partial and Informative Common Subsumers in Description Logics
Simona Colucci 1,2, Eugenio Di Sciascio 1, Francesco Maria Donini 3 and Eufemia Tinelli 4

Abstract. Least Common Subsumers in Description Logics have shown their usefulness for discovering commonalities among all concepts of a collection. Several applications are nevertheless focused on searching for properties shared by significant portions of a collection rather than by the collection as a whole. Actually, this is an issue we faced in a real case scenario that provided the initial motivation for this study, namely the process of Core Competence extraction in knowledge-intensive companies. The paper defines four reasoning services for the identification of meaningful common subsumers describing partial commonalities in a collection. In particular, Common Subsumers adding informative content to the Least Common Subsumer are investigated, with reference to different DLs.
1 Introduction
Least Common Subsumers (LCSs) were originally proposed by Cohen, Borgida and Hirsh [5] as a novel reasoning service for the Description Logic underlying Classic [4]. By definition, for a collection of concept descriptions, their LCS represents the most specific concept description subsuming all of the elements of the collection. The usefulness of such an inference task has been shown in several application classes, varying from learning from examples [6, 7, 10], to similarity-based Information Retrieval [12, 13] and bottom-up construction of knowledge bases [1]. Nevertheless, there are some problems for which the computation of the LCS does not provide solutions. The LCS in fact intuitively represents properties shared by all the elements of a given collection. In several applications, instead, such sharing is not required to be full: in other words, we could be interested in finding a concept description subsuming only a portion of the elements in the collection. Different perspectives on the introduced problem may be taken: if the LCS of the collection is the universal concept, we can determine the concept description subsuming a number m of concept descriptions in the collection, where m is the maximum cardinality of subsets of the collection for which a common subsumer non-equivalent to the universal concept exists. We give the name Best Common Subsumer to such a concept description, in analogy with the LCS. Alternatively, we could be interested in determining a concept description subsuming at least k elements in the collection, where k is a threshold value established a priori on the basis of a decisional process dependent on the application domain. We give such a concept description the name k-Common Subsumer (k-CS). In particular, the search should focus on those k-CSs adding informative content to the LCS: we
1 SisInfLab–Politecnico di Bari, Bari, Italy
2 D.O.O.M. s.r.l., Matera, Italy
3 Università della Tuscia, Viterbo, Italy
4 Università di Bari, Bari, Italy
call Informative k-Common Subsumer (IkCS) a k-CS more specific than the LCS of the collection. Here we define the k-CS, the IkCS, the BCS and one further service (the Best Informative Common Subsumer), and give some computation results for different DLs, namely ALN, EL and ALE.
2 Definitions
The definition of the four novel services relies on the Least Common Subsumer definition, which we recall in the following.

Definition 1 (LCS, [7]) Let C1, . . . , Cn be n concepts in a DL L. An LCS of C1, . . . , Cn, denoted by LCS(C1, . . . , Cn), is a concept E in L such that the following conditions hold: (i) Ci ⊑ E for i = 1, . . . , n; (ii) E is the least L-concept satisfying (i), i.e., if E′ is an L-concept satisfying Ci ⊑ E′ for all i = 1, . . . , n, then E ⊑ E′.

We define in the following a new concept, which represents the commonalities of k concepts out of the n in a collection of DL concepts.

Definition 2 (k-CS) Let C1, . . . , Cn be n concepts in a DL L, and let k < n. A k-Common Subsumer (k-CS) of C1, . . . , Cn is a concept D such that D is an LCS of k concepts among C1, . . . , Cn.

Among k-Common Subsumers we distinguish concepts adding informative content to the LCS of the investigated collection.

Definition 3 (IkCS) Let C1, . . . , Cn be n concepts in a DL L, and let k < n. An Informative k-Common Subsumer (IkCS) of C1, . . . , Cn is a k-CS E such that E is strictly subsumed by LCS(C1, . . . , Cn).

Some Informative k-Common Subsumers are peculiar in subsuming the maximum number of concepts in the collection, with such a maximum less than the cardinality n of the collection. We therefore define in what follows:

Definition 4 (BICS) Let C1, . . . , Cn be n concepts in a DL L. A Best Informative Common Subsumer (BICS) of C1, . . . , Cn is a concept B such that B is an Informative k-CS of C1, . . . , Cn, and for every k < j ≤ n no j-CS is informative.

For collections whose LCS is equivalent to the universal concept, the following definition also makes sense:

Definition 5 (BCS) Let C1, . . . , Cn be n concepts in a DL L. A Best Common Subsumer (BCS) of C1, . . . , Cn is a concept S such that S is a k-CS of C1, . . . , Cn, and for every k < j ≤ n every j-CS ≡ ⊤.

Proposition 1 If LCS(C1, . . . , Cn) ≡ ⊤, every BCS is a BICS.
Even though the services defined above may appear quite similar to each other at first sight, it has to be underlined that they deal with different problems:

k-CS: can be computed for every collection of elements; it finds least common subsumers of k elements among the n belonging to the collection;

IkCS: describes those k-CSs adding informative content to the one provided by the LCS, i.e., more specific than the LCS. Observe that an IkCS does not exist when every subset of k concepts has the same LCS as the one of all C1, . . . , Cn;

BICS: describes IkCSs subsuming h concepts, such that h is the maximum cardinality of subsets of the collection for which an IkCS exists. A BICS does not exist if and only if Ci ≡ Cj for all i, j = 1, . . . , n;

BCS: may be computed only for collections admitting only an LCS equivalent to the universal concept; it finds k-CSs such that k is the maximum cardinality of subsets of the collection for which an LCS not equivalent to ⊤ exists.
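Before turning to complexity, the relationships among the four sets can be made concrete with a naive enumeration sketch (Python). The DL-specific tests `lcs`, `equivalent` and `is_top` are assumed oracles here, so this is hypothetical glue code mirroring the definitions, not the dedicated algorithm of [8]; the subset enumeration also makes the exponential behaviour stated in Theorem 1 below directly visible.

```python
from itertools import combinations

def k_cs(concepts, k, lcs):
    """Lk: the LCSs of all k-element subsets of the collection."""
    return [lcs(sub) for sub in combinations(concepts, k)]

def informative_k_cs(concepts, k, lcs, equivalent):
    """Ik: k-CSs strictly more specific than the LCS of the whole collection."""
    full_lcs = lcs(tuple(concepts))
    return [d for d in k_cs(concepts, k, lcs) if not equivalent(d, full_lcs)]

def bcs(concepts, lcs, is_top):
    """B: k-CSs for the largest k < n admitting a non-trivial LCS; meaningful
    when the LCS of the whole collection is the universal concept."""
    for k in range(len(concepts) - 1, 1, -1):
        found = [d for d in k_cs(concepts, k, lcs) if not is_top(d)]
        if found:
            return k, found
    return None
```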
3 Computation

The complexity of computing the common subsumers defined in Section 2 depends on the specific DL in which the collection is represented. We will therefore separate the results for three different DLs in the following. Nevertheless, some results are common to every DL, like the following theorem, which deals with the cardinality of the set of k-CSs, given a collection of concepts in a DL L.

Theorem 1 For some sets of n concepts C1, . . . , Cn in a DL L, and for some k < n, there are exponentially many k-CSs of C1, . . . , Cn.

The following theorem, instead, focuses on the complexity of finding a BCS w.r.t. that of computing an LCS.

Theorem 2 Let m be the sum of the sizes of C1, . . . , Cn. Then finding a BCS of C1, . . . , Cn amounts to the computation of O(m^2) subsumption tests in L, plus the computation of one LCS.

Both theorems are proved in [8]. Hereafter, regardless of the DL employed for the representation of concepts, we will refer to the solution sets for the introduced reasoning services by the names: B for the set of BCSs, BI for the set of BICSs, Ik for the set of IkCSs, given k < n, and Lk for the set of k-CSs, given k < n. For a collection of concept descriptions in ALN, an algorithm can be defined computing the solution sets [8]. Complexity results for this algorithm are claimed in the following theorem.

Theorem 3 Let C1, . . . , Cn, T be n concepts and a simple TBox in ALN, let m be the sum of the sizes of C1, . . . , Cn, and let S(s) be a monotone function bounding the cost of deciding C ⊑_T D in ALN, whose argument s is |C| + |D| + |T|. The computation of the solution sets B, BI, Lk, Ik for a collection of concept descriptions in ALN is then a problem in O(m^2 + (S(m))^2).

Baader et al. [2] showed that, by taking into account existential restrictions, the n-ary LCS operation is exponential, even for the small DL EL, and even when shortening possible repetitions by using a TBox [3]. The computation results for the determination of the solution sets of a concept collection in EL and ALE are affected by the results for LCS:

Theorem 4 The computation of the solution sets B, BI, Lk, Ik for a collection of concept descriptions in EL or ALE may be reduced to the problem of computing the LCS of the subsets of the collection, and may then grow exponentially in the size of the collection.

For computing Lk it is sufficient to compute, for every subset {i1, . . . , ik} ⊆ {1, . . . , n}, the concept LCS(Ci1, . . . , Cik). The same holds for Ik, excluding those LCS(Ci1, . . . , Cik) which are equivalent to LCS(C1, . . . , Cn). For the computation of the sets B and BI, instead, an algorithm can be defined [8], based on the one proposed by Küsters and Molitor [11] for LCS computation.

4 Conclusions

Motivated by a real-world application need (finding Core Competence in knowledge-intensive companies), we defined and investigated novel reasoning services finding commonalities among portions of a collection of concepts in ALN, EL and ALE. For all three studied languages a computation algorithm has been designed. The computation algorithm for ALN has also been implemented in the framework of IMPAKT, a novel and optimized knowledge-based system for competence and skill management [9], which will be released late this year by D.O.O.M. s.r.l.

Acknowledgment

We thank Franz Baader for helpful discussions. This work has been supported in part by projects EU-FP6-IST-26896, PE 013 Innovative models for customer profiling and PS 092 DIPIS.

REFERENCES

[1] F. Baader and R. Küsters. Computing the least common subsumer and the most specific concept in the presence of cyclic ALN-concept descriptions. In Proc. of KI-98, volume 1504 of LNCS, pages 129–140, Bremen, Germany, 1998. Springer-Verlag.
[2] F. Baader, R. Küsters, and R. Molitor. Computing least common subsumers in description logics with existential restrictions. Technical Report LTCS-Report 98-09, RWTH Aachen, 1998.
[3] F. Baader and A.-Y. Turhan. On the problem of computing small representations of least common subsumers. In Proc. of KI 2002, volume 2479 of LNAI, Aachen, Germany, 2002. Springer-Verlag.
[4] A. Borgida, R. J. Brachman, D. L. McGuinness, and L. Alperin Resnick. CLASSIC: A structural data model for objects. In Proc. of ACM SIGMOD, pages 59–67, 1989.
[5] W. Cohen, A. Borgida, and H. Hirsh. Computing least common subsumers in description logics. In Proc. of AAAI-92, pages 754–760. AAAI Press, 1992.
[6] W. Cohen and H. Hirsh. The learnability of description logics with equality constraints. Machine Learning, 17(2-3):169–199, 1994.
[7] W. Cohen and H. Hirsh. Learning the CLASSIC description logic: Theoretical and experimental results. In Proc. of KR'94, pages 121–133, 1994.
[8] S. Colucci, E. Di Sciascio, and F. M. Donini. Partial and informative common subsumers of concept collections in description logics. In Proc. of DL 2008, 2008.
[9] S. Colucci, T. Di Noia, E. Di Sciascio, F. M. Donini, and A. Ragone. Semantic-based skill management for automated task assignment and courseware composition. Journal of Universal Computer Science, 13(9):1184–1212, 2007.
[10] M. Frazier and L. Pitt. CLASSIC learning. In Proc. of the 7th Annual ACM Conference on Computational Learning Theory, pages 23–34, New Brunswick, New Jersey, 1994. ACM Press and Addison Wesley.
[11] R. Küsters and R. Molitor. Structural subsumption and least common subsumers in a description logic with existential and number restrictions. Studia Logica, 81:227–259, 2005.
[12] T. Mantay, R. Möller, and A. Kaplunova. Computing probabilistic least common subsumers in description logics. In KI - Künstliche Intelligenz, volume 1701 of LNCS, pages 89–100. Springer-Verlag, 1999.
[13] R. Möller, V. Haarslev, and B. Neumann. Semantics-based information retrieval. In Proc. of IT&KNOWS-98, Vienna, Budapest, 1998.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-741
Prime Implicate-based Belief Revision Operators 1
Meghyn Bienvenu 2, Andreas Herzig 3 and Guilin Qi 4

1 INTRODUCTION
A belief revision operator can be seen as a function which takes as input a set of beliefs K and an input formula ϕ and outputs a new set of beliefs K ⋆ ϕ. Many of the belief revision schemes that have been defined in the literature require additional input. The extra information they need comes in various forms: relations over subsets of the sets of beliefs [2], epistemic entrenchment relations [1], systems of spheres [6], faithful orderings [7], etc. In many applications we do not have such background information, which is why there is a need for revision operators which give good results without it. Unfortunately, the above approaches appear ill-suited to cases where we do not have any information regarding the relative importance of different beliefs or models. For example, if we accord equal importance to each of the beliefs (or each model of the beliefs or non-beliefs), which seems the most reasonable thing to do if we have no preference information, then these approaches result in the infamous drastic revision operator which gives up all old beliefs whenever the incoming information contradicts them. All of the above belief revision schemes are insensitive to syntax: logically equivalent sets of beliefs are revised in the same way, and logically equivalent input formulas lead to the same result. The so-called formula-based approaches, like the full meet [5, 4] and cardinality-maximizing base revision operators [5, 9], abandon the postulate of insensitivity to syntax, and allow e.g. the set of beliefs K1 = {a, b} to be revised differently from K2 = {a ∧ b}. Such approaches can do without extra information: they do not collapse into the drastic revision operator. There are only very few belief revision operators that are both insensitive to syntax and independent of extra information. The most prominent one is Dalal's [3]. It is often called model-based: revision is identified with a move from the models of K to those models of ϕ that are closest in terms of the Hamming distance. Two other model-based revision operators exist: Weber's [12] and Satoh's [11]. In this paper we propose two revision operators which are formula-based yet syntax-insensitive, and do not rely on background information. Our operators are obtained by first replacing the belief base by its set of prime implicates and then applying either the full meet or the cardinality-maximizing base revision operators. The prime implicates of a belief base, defined as its logically strongest clausal consequences, can be seen as the primitive semantic components of the belief base, from which all other beliefs can be derived. We argue that when no extra information is available, prime implicates provide a natural and interesting way of representing a set of beliefs. Moreover, the fact that equivalent sets of formulae have the same sets of
1 The third author acknowledges partial support by the EU under the IST project NeOn (IST-2006-027595, http://www.neon-project.org/).
2 IRIT-Université Paul Sabatier, France, bienvenu@irit.fr
3 IRIT-CNRS, France, herzig@irit.fr
4 Institute AIFB, Universität Karlsruhe, Germany, gqi@aifb.uni-karlsruhe.de
prime implicates guarantees the syntax-insensitivity of our operators.
2 FORMAL PRELIMINARIES
We consider a propositional language built out of a finite set of atoms and the usual Boolean connectives. We suppose the latter includes the 0-ary connective ⊥. We will use V(ϕ) to refer to the set of atoms occurring in ϕ. A belief base is a finite set of propositional formulae. Where convenient, we will identify a belief base with the conjunction of its elements. We will use ⋁K to denote the disjunction of the elements in the belief base K. A literal is either an atom or the negation of an atom, and a clause is a disjunction of literals. Prime implicates (cf. [8]) are defined as the logically strongest clausal consequences of a formula. By definition, if π is a prime implicate of ϕ, then so too are all clauses equivalent to π. To simplify the presentation, we will choose a representative for each equivalence class of clauses, and we let Π(ϕ) denote the set of representatives of equivalence classes of prime implicates of ϕ. We define the minimal language of a formula ϕ, written V0(ϕ), to be the set of atoms occurring in every formula ϕ′ which is equivalent to ϕ. A set {A1, . . . , An} of sets of atoms is a splitting of a belief base K if and only if the Ai partition V0(K) and there exist formulae ϕ1, . . . , ϕn such that K ≡ ⋀_{i=1}^{n} ϕi and V(ϕi) ⊆ Ai for all i. A splitting {A1, . . . , An} of K is a finest splitting of K just in case, whenever {A′1, . . . , A′p} is another splitting of K, for every Ai there is some A′j such that Ai ⊆ A′j. It was shown in [10] that every belief base has a unique finest splitting. We will use K⊥ϕ and K⊥_Card ϕ to denote respectively the set of inclusion- and cardinality-maximal subsets of K consistent with ¬ϕ.
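As an aside on how a splitting can be found in practice, the following Python sketch groups atoms that co-occur in some prime implicate of K (prime implicates are assumed given, as iterables of (atom, polarity) literals). Under the usual connection between prime implicates and language splitting, the resulting blocks yield a splitting of V0(K); we take it here, as a working assumption for illustration only, that this gives the finest one.

```python
def splitting_from_prime_implicates(prime_imps):
    """Union-find over atoms: two atoms land in the same block whenever
    they co-occur in some prime implicate of K."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    for clause in prime_imps:
        roots = [find(atom) for atom, _polarity in clause]
        for r in roots[1:]:
            parent[find(roots[0])] = r

    blocks = {}
    for atom in parent:
        blocks.setdefault(find(atom), set()).add(atom)
    return list(blocks.values())

# K = {a ∨ b, c}: prime implicates a ∨ b and c give blocks {a, b} and {c}
print(splitting_from_prime_implicates([[("a", True), ("b", True)],
                                        [("c", True)]]))
```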
3 PROPOSED REVISION OPERATORS
Our first revision operator ⋆_Π conjoins the input ϕ with the disjunction of the maximal subsets of Π(K) consistent with ϕ. It is essentially the same as the syntactic full meet base revision operator [5, 4], except that instead of dealing directly with the formulae in the belief base we deal with the prime implicates of the belief base.

Definition 1. Let K be a belief base and ϕ be a formula. Then the prime implicate-based full meet revision operator, written ⋆_Π, is defined as follows:

K ⋆_Π ϕ = ϕ ∧ ⋁(Π(K)⊥¬ϕ)

We illustrate the functioning of our operator on some examples:

Example 2. Let K = {a ∨ b, a ∨ c} and ϕ = ¬a ∧ ¬b. We have Π(K) = K, and Π(K)⊥¬ϕ = {{a ∨ c}}, so the result of revising K by ϕ is ¬a ∧ ¬b ∧ (a ∨ c) ≡ ¬a ∧ ¬b ∧ c.
Example 3. Let K = {a ∨ c, ¬b ∨ d, ¬a ∨ b} and let ϕ = ¬c ∧ ¬d. Then Π(K) = {a ∨ c, ¬b ∨ d, ¬a ∨ b, b ∨ c, ¬a ∨ d, c ∨ d}. The maximal subsets of Π(K) consistent with ϕ are P1 = {a ∨ c, ¬b ∨ d}, P2 = {a ∨ c, ¬a ∨ b, b ∨ c}, P3 = {¬b ∨ d, ¬a ∨ b, ¬a ∨ d}, and P4 = {¬a ∨ b, b ∨ c, ¬a ∨ d}. Now P1 ∧ ¬c ∧ ¬d ≡ a ∧ ¬b ∧ ¬c ∧ ¬d, P2 ∧ ¬c ∧ ¬d ≡ a ∧ b ∧ ¬c ∧ ¬d, P3 ∧ ¬c ∧ ¬d ≡ ¬a ∧ ¬b ∧ ¬c ∧ ¬d, and P4 ∧ ¬c ∧ ¬d ≡ ¬a ∧ b ∧ ¬c ∧ ¬d, so K ⋆_Π ϕ ≡ ¬c ∧ ¬d.

In the last example, none of the prime implicates from K can be inferred from the revised base K ⋆_Π ϕ. This is because our operator takes the disjunction of all the inclusion-maximal subsets consistent with the revision formula, which means that those prime implicates which do not appear in every inclusion-maximal subset can be lost when we take the disjunction. The solution lies in selecting only some of the inclusion-maximal subsets. If we have no information regarding the importance of different beliefs, as we assume here, there is no sure way of choosing among the subsets. One reasonable heuristic is to accord equal importance to each of the prime implicates, and hence to prefer those subsets which contain the most prime implicates. This leads us to propose a second revision operator which selects only the cardinality-maximal subsets consistent with the revision formula.

Definition 4. Let K be a belief base and ϕ be a formula. Then the prime implicate-based cardinality-maximizing revision operator, written ⋆_{Π,Card}, is defined as follows:

K ⋆_{Π,Card} ϕ = ϕ ∧ ⋁(Π(K)⊥_Card ¬ϕ)

The operator ⋆_{Π,Card} can be seen as a syntax-insensitive version of the cardinality-maximizing base revision operator [5, 9].

Example 5. Let K and ϕ be as in Example 3. P2, P3, and P4 are the cardinality-maximal subsets that are consistent with ϕ. So we have K ⋆_{Π,Card} ϕ ≡ (¬a ∨ b) ∧ ¬c ∧ ¬d, which is logically stronger than the ¬c ∧ ¬d obtained using ⋆_Π.
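Both definitions can be checked on toy instances such as Examples 2-5 with a brute-force sketch (Python; all names are ours). Formulas are represented by their sets of models over a small atom set, given as frozensets of true atoms; the code enumerates clauses and subsets explicitly, so it mirrors the definitions rather than any practical procedure. On Example 2's inputs it returns the single model {c}, i.e., ¬a ∧ ¬b ∧ c.

```python
from itertools import combinations

def prime_implicates(models, atoms):
    """Subset-minimal clauses (sets of (atom, polarity) literals) entailed
    by the formula whose models are given."""
    literals = [(a, True) for a in atoms] + [(a, False) for a in atoms]
    def sat(m, clause):
        return any((a in m) == pos for a, pos in clause)
    entailed = [frozenset(c) for r in range(1, len(atoms) + 1)
                for c in combinations(literals, r)
                if len({a for a, _ in c}) == r       # skip tautologies
                and all(sat(m, c) for m in models)]
    return [c for c in entailed if not any(d < c for d in entailed)]

def sat_all(m, clauses):
    return all(any((a in m) == pos for a, pos in c) for c in clauses)

def full_meet_pi_revision(pi, phi_models):
    """Models of K ⋆Π ϕ = ϕ ∧ ⋁(Π(K)⊥¬ϕ), per Definition 1."""
    consistent = [set(s) for r in range(len(pi) + 1)
                  for s in combinations(pi, r)
                  if any(sat_all(m, s) for m in phi_models)]
    maximal = [s for s in consistent if not any(s < t for t in consistent)]
    return {m for m in phi_models if any(sat_all(m, s) for s in maximal)}

def card_pi_revision(pi, phi_models):
    """Models of K ⋆Π,Card ϕ, per Definition 4: cardinality-maximal subsets."""
    consistent = [set(s) for r in range(len(pi) + 1)
                  for s in combinations(pi, r)
                  if any(sat_all(m, s) for m in phi_models)]
    best = max(len(s) for s in consistent)
    return {m for m in phi_models
            if any(sat_all(m, s) for s in consistent if len(s) == best)}
```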
3.1 Properties of Our Operators
Revision operators are often judged based on whether they satisfy the well-known AGM postulates [2]. These postulates are formulated for logically closed sets of formulae (belief sets), but they can be modified so as to apply to belief bases. The modified postulates (omitted for lack of space) are known as the KM postulates [7]. Our first operator satisfies the first five KM postulates but fails to satisfy the last one.

Proposition 6. ⋆_Π satisfies KM1-KM5, but falsifies KM6.

This proposition is not surprising since Katsuno and Mendelzon showed in [7] that KM6 ensures that the faithful assignment corresponding to the revision operator is a total pre-order.5 As our prime implicate-based full meet operator uses inclusion to compare subsets of prime implicates, it induces a partial and not a total pre-order over the set of interpretations. Katsuno and Mendelzon argued however in [7] that requiring the faithful assignment to be total may be too strong in practice, and they proposed to replace KM6 with weaker postulates KM7 and KM8. Since they are less well-known, we recall them here:

KM7 If K ⋆ ϕ1 |= ϕ2 and K ⋆ ϕ2 |= ϕ1 then K ⋆ ϕ1 ≡ K ⋆ ϕ2.
5 A faithful assignment maps a belief base K to a pre-order ≤K over the set of all interpretations of the language.
KM8 (K ⋆ ϕ1) ∧ (K ⋆ ϕ2) |= K ⋆ (ϕ1 ∨ ϕ2).

We show that both of these postulates are satisfied by our operator.

Proposition 7. ⋆_Π satisfies KM7 and KM8.

Our cardinality-based operator satisfies all KM postulates.

Proposition 8. ⋆_{Π,Card} satisfies KM1-KM6.

The AGM/KM postulates have been criticized for admitting revision operators that discard beliefs that have no real connection with the incoming information. For instance, there are AGM/KM operators for which (a ∧ b) ⋆ ¬a ⊭ b, even though intuitively we expect b to survive the revision. In an attempt to remedy this, Parikh [10] proposed an additional postulate which can be formulated as follows:

Relevance If K is satisfiable, K |= ϕ, and K ⋆ ψ ⊭ ϕ, then there is some set of atoms A in the finest splitting of K such that both V0(ϕ) ∩ A ≠ ∅ and V0(ψ) ∩ A ≠ ∅.

We can show that our revision operators satisfy this postulate:

Proposition 9. ⋆_Π and ⋆_{Π,Card} satisfy Relevance.
3.2 Comparison With Other Operators
The following proposition concerns the relation between our operators and the model-based operators mentioned in the introduction.

Proposition 10.
1. Our operators sometimes yield logically stronger revised bases than the Dalal, Weber, and Satoh operators.
2. Our revision operators sometimes yield logically weaker revised bases than the Dalal and Satoh operators.

Proof. For (1), consider Example 2. For (2), consider Example 3.
REFERENCES

[1] P. Gärdenfors, Knowledge in Flux: Modeling the Dynamics of Epistemic States, MIT Press, 1988.
[2] C. Alchourrón, P. Gärdenfors, and D. Makinson, 'On the logic of theory change: Partial meet contraction and revision functions', Journal of Symbolic Logic, 50(2), 510–530, (1985).
[3] M. Dalal, 'Investigations into a theory of knowledge base revision', in Proceedings of the Seventh National Conference on Artificial Intelligence, pp. 475–479, (1988).
[4] R. Fagin, J. Ullman, and M. Vardi, 'On the semantics of updates in databases', in Proceedings of the Second ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (PODS-83), pp. 352–365, (1983).
[5] M. L. Ginsberg, 'Counterfactuals', Artificial Intelligence, 30(1), 35–79, (1986).
[6] A. Grove, 'Two modelings for theory change', Journal of Philosophical Logic, (1988).
[7] H. Katsuno and A. Mendelzon, 'Propositional knowledge base revision and minimal change', Artificial Intelligence, 52(3), 263–294, (1991).
[8] P. Marquis, Handbook on Defeasible Reasoning and Uncertainty Management Systems, volume 5, chapter Consequence Finding Algorithms, 41–145, Kluwer, 2000.
[9] B. Nebel, Handbook of Defeasible Reasoning and Uncertainty Management Systems, Volume 3: Belief Change, chapter How Hard is it to Revise a Belief Base?, Kluwer, 1998.
[10] R. Parikh, Logic, Language, and Computation, volume 2, chapter Beliefs, belief revision, and splitting languages, CSLI Publications, 1999.
[11] K. Satoh, 'Nonmonotonic reasoning by minimal belief revision', in Proceedings of the International Conference on Fifth Generation Computer Systems (FGCS-88), pp. 455–462, (1988).
[12] A. Weber, 'Updating propositional formulas', in Proceedings of the First Conference on Expert Database Systems, pp. 487–500, (1986).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-743
Approximate structure preserving semantic matching
Fausto Giunchiglia 1, Mikalai Yatskevich 1, Fiona McNeill 2, Pavel Shvaiko 1, Juan Pane 1 and Paolo Besana 2

Abstract. Typical ontology matching applications, such as ontology integration, focus on the computation of correspondences holding between the nodes of two graph-like structures, e.g., between concepts in two ontologies. However, there are applications, such as web service integration, where we may need to establish whether full graph structures correspond to one another globally, preserving certain structural properties of the graphs being considered. The goal of this paper is to provide a new matching operation, called structure preserving matching. This operation takes two graph-like structures and produces a set of correspondences between those nodes of the graphs that correspond semantically to one another, (i) still preserving a set of structural properties of the graphs being matched, and (ii) only if the graphs are globally similar to one another. We present a novel approximate structure preserving matching approach that implements this operation. It is based on a formal theory of abstraction and on a tree edit distance measure. We have evaluated our solution with encouraging results.
1 INTRODUCTION
Many varied solutions of matching have been proposed so far [1].3 In this paper we focus on a particular type of matching, namely structure preserving matching. Similarly to conventional ontology matching, structure preserving matching finds correspondences between semantically related nodes of the graphs. Differently from it, it preserves a set of structural properties (e.g., the vertical ordering of nodes) and establishes whether two graphs are globally similar. These characteristics of matching are required in web service integration applications, see, e.g., [5]. Let us consider an example of approximate structure preserving matching between two web services: get wine(Region, Country, Color, Price, Number of bottles) and get wine(Region(Country, Area), Colour, Cost, Year, Quantity), see Figure 1. In this case the first web service description requires the fourth argument of the get wine function (Color) to be matched to the second argument (Colour) of the get wine function in the second description. Also, Region in T2 is defined as a function with two arguments (Country and Area), while in T1, Region is an argument of get wine. Thus, Region in T1 must be passed to T2 as the value of the Area argument of the Region function. Moreover, Year in T2 has no corresponding term in T1. Notice that detecting these correspondences would not have been possible in the case of exact matching, by its definition. In order to guarantee a successful web service integration, we are only interested in the correspondences holding among the nodes of the trees underlying the given web services in the case when the web
1 University of Trento, Italy, email: {fausto,yatskevi,pavel,pane}@dit.unitn.it
2 University of Edinburgh, Scotland, email: {f.j.mcneill,p.besana}@ed.ac.uk
3 See http://www.ontologymatching.org for complete information on the topic.
[Figure 1: Two approximately matched web services represented as trees — T1: get wine(Region, Country, Price, Color, Number of bottles); T2: get wine(Region(Country, Area), Colour, Cost, Year, Quantity). Functions are in rectangles with rounded corners; they are connected to their arguments by dashed lines. Node correspondences are indicated by arrows.]
services themselves are similar enough. At the same time, the correspondences have to preserve two structural properties of the descriptions being matched: (i) functions have to be matched to functions and (ii) variables to variables. Thus, for example, Region in T1 is not linked to Region in T2. Finally, let us suppose that the correspondences in the example of Figure 1 are aggregated into a single similarity measure between the trees under consideration, e.g., 0.62. If this global similarity measure is higher than an empirically established threshold (e.g., 0.5), the web services under scrutiny are considered to be similar enough, and the set of correspondences shown in Figure 1 is further used for the actual web service integration.
2 THE APPROACH
The matching process is organized in two steps: (i) node matching and (ii) tree matching. Node matching solves the semantic heterogeneity problem by considering only labels at nodes and contextual information of the trees. We use here the S-Match system [4]. Technically, two nodes n1 ∈ T1 and n2 ∈ T2 match iff c@n1 R c@n2 holds, where c@n1 and c@n2 are the concepts at nodes n1 and n2, and R ∈ {=, ⊑, ⊒}. In semantic matching [2] as implemented in the S-Match system [4], the key idea is that the relations, e.g., equivalence and subsumption, between nodes are determined by (i) expressing the entities of the ontologies as logical formulas and (ii) reducing the matching problem to a logical validity problem. Specifically, the entities are translated into logical formulas which explicitly express the concept descriptions as encoded in the ontology structure and in external resources, such as WordNet. This allows for a translation of the matching problem into a logical validity problem, which can then be efficiently resolved using sound and complete state-of-the-art satisfiability solvers. Notice that the result of this stage is a set of one-to-many correspondences holding between the nodes of the trees. For example, initially Region in T1 is matched to both Region and Area in T2. Tree matching exploits the results of the node matching and the structure of the trees to find if these globally match each other as
follows.

Matching via abstraction. Given the correspondences produced by the node matching and based on the work in [3], the following abstraction operations are used in order to select only those correspondences that preserve the desired properties, namely that functions are matched to functions and variables to variables:

Predicate: Two or more predicates are merged, typically to the least general generalization in the predicate type hierarchy, e.g., Bottle(X) + Container(X) → Container(X). We call Container(X) a predicate abstraction of Bottle(X), written Container(X) ≽Pd Bottle(X). Conversely, we call Bottle(X) a predicate refinement of Container(X), written Bottle(X) ⪯Pd Container(X).

Domain: Two or more terms are merged, typically by moving the functions or constants to the least general generalization in the domain type hierarchy, e.g., Acura + Nissan → Nissan. Similarly to the previous item, we call Nissan a domain abstraction of Acura, written Nissan ≽D Acura.

Propositional: One or more arguments are dropped, e.g., Bottle(A) → Bottle. We call Bottle a propositional abstraction of Bottle(A), written Bottle ≽P Bottle(A).

For example, predicate and domain abstraction/refinement operations do not convert a function into a variable. Therefore, the one-to-many correspondences returned by the node matching should be further filtered based on the allowed abstraction/refinement operations. For instance, the correspondence that binds Region in T1 and Region in T2 should be discarded, while the correspondence that binds Region in T1 and Area in T2 should be preserved.

Tree edit distance via abstraction operations. We look for a composition of the abstraction/refinement operations allowed for the given relation R that is necessary to convert one tree into the other. We represent abstraction/refinement operations as tree edit distance operations applied to the term trees. The tree edit distance problem involves three operations: (i) vertex deletion (υ → λ), (ii) vertex insertion (λ → υ), and (iii) vertex replacement (υ → ω) [6]. Our proposal is to restrict the formulation of the tree edit distance problem in order to reflect the semantics of first-order terms. In particular, we redefine the tree edit distance operations in a way that allows them to have a one-to-one correspondence to the abstraction/refinement operations, see Table 1.

Global similarity between trees. Since we compute the composition of the abstraction/refinement operations that are necessary to convert one term tree into the other, we are interested in the minimal cost of this composition. The global similarity between two trees is computed as shown in Eq. 1, where S stands for the set of the allowed tree edit operations, ki stands for the number of operations of type i necessary to convert one tree into the other, and Costi defines the cost of operation type i, see Table 1:

TreeSim(T1, T2) = 1 − min(Σ_{i∈S} ki × Costi) / max(sizeof(T1), sizeof(T2))   (1)
Table 1: The correspondence between abstraction operations, tree edit operations and costs.

| Abstraction operation | Tree edit operation | Precondition | Cost(T1 = T2) | Cost(T1 ⪯ T2) | Cost(T1 ≽ T2) |
| t1 ≽Pd t2 | a → b | a ≽ b; a and b correspond to predicates | 1 | ∞ | 1 |
| t1 ≽D t2 | a → b | a ≽ b; a and b correspond to functions or constants | 1 | ∞ | 1 |
| t1 ≽P t2 | a → λ | a corresponds to predicates, functions or constants | 1 | ∞ | 1 |
| t1 ⪯Pd t2 | a → b | a ⪯ b; a and b correspond to predicates | 1 | 1 | ∞ |
| t1 ⪯D t2 | a → b | a ⪯ b; a and b correspond to functions or constants | 1 | 1 | ∞ |
| t1 ⪯P t2 | a → λ | a corresponds to predicates, functions or constants | 1 | 1 | ∞ |
| t1 = t2 | a = b | a = b; a and b correspond to predicates, functions or constants | 0 | 0 | 0 |
The highest value of TreeSim computed for Cost(T1 = T2), Cost(T1 ⪯ T2) and Cost(T1 ≽ T2) is selected as the one ultimately returned. In the case of the example of Figure 1, when we match T1 with T2, TreeSim is 0.62 for both Cost(T1 = T2) and Cost(T1 ⪯ T2).
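As a sketch of Eq. (1), assuming the minimal-cost composition of operations has already been found, the aggregation itself is a one-liner (Python; the function name, input format, and the example's tree sizes and operation counts are ours, chosen only for illustration):

```python
def tree_sim(op_counts, size_t1, size_t2):
    """Eq. (1): 1 minus the minimal total edit cost, normalized by the
    larger tree size; op_counts maps operation type -> (k_i, Cost_i)."""
    total_cost = sum(k * cost for k, cost in op_counts.values())
    return 1 - total_cost / max(size_t1, size_t2)

# e.g., five replacements of cost 1 between trees of 13 and 12 nodes:
print(round(tree_sim({"replace": (5, 1)}, 13, 12), 2))  # 0.62
```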
3 EVALUATION
We have evaluated our approach on different versions of the SUMO and AKT ontologies.4 These are both first-order ontologies, out of which 132 pairs of trees (first-order logic terms) were used. The matching quality results are shown in Figure 2. Note that F-measure values exceed 70% for the given range of cut-off thresholds. The average execution time per matching task on a standard laptop was 93 ms.
Figure 2: Evaluation results.
4 CONCLUSIONS
We have presented an approximate structure preserving semantic matching approach that implements the structure preserving matching operation. It is based on a theory of abstraction and a tree edit distance. We have evaluated our solution with encouraging results. Future work includes conducting extensive and comparative testing in real-world scenarios.

Acknowledgements. We appreciate support from the OpenKnowledge European STREP (FP6-027253).
REFERENCES

[1] J. Euzenat and P. Shvaiko, Ontology Matching, Springer, 2007.
[2] F. Giunchiglia and P. Shvaiko, 'Semantic matching', The Knowledge Engineering Review, 18(3), (2003).
[3] F. Giunchiglia and T. Walsh, 'A theory of abstraction', Artificial Intelligence, 57(2-3), (1992).
[4] F. Giunchiglia, M. Yatskevich, and P. Shvaiko, 'Semantic matching: Algorithms and implementation', Journal on Data Semantics, IX, (2007).
[5] M. Klusch, B. Fries, and K. Sycara, 'Automated semantic web service discovery with OWLS-MX', in Proceedings of AAMAS, (2006).
[6] K.-C. Tai, 'The tree-to-tree correction problem', Journal of the ACM, 26(3), (1979).
4 See http://dream.inf.ed.ac.uk/projects/dor/ for full versions of these ontologies and an analysis of their differences.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-745
Discovering Temporal Knowledge from a Crisscross of Timed Observations
Nabil Benayadi and Marc Le Goc 1

Abstract. This paper is concerned with discovering temporal knowledge from a sequence of timed observations provided by a system monitoring a dynamic process. The discovery process is based on the Stochastic Approach framework, where a series of timed observations is represented with a Markov chain. From this representation, a set of timed sequential binary relations between discrete event classes is discovered with abductive reasoning and represented as abstract chronicle models. To reduce the search space as close as possible to the potential relations between the process variables, we propose to characterize a set of series of timed observations with a unique measure of the homogeneity of the crisscross of class occurrences, and to use this measure to prune abstract chronicle models.

1 LSIS, University Aix-Marseille III, France. e-mail: {nabil.benayadi, marc.legoc}@lsis.org
1 INTRODUCTION
When supervising and monitoring dynamic processes, very large amounts of timed messages (alarms or simple records) are generated and collected in databases. Mining these databases allows one to discover the underlying relations between the variables that govern the dynamics of the process. This paper addresses this problem in the framework of the Stochastic Approach [2], where a timed message is considered as a timed observation that is represented with an occurrence of a discrete event class Ci = {(xi, δi)} linking a variable xi and a constant δi. The BJT4T algorithm represents a set of sequences Ω of discrete event class occurrences with a first-order Markov chain and uses abductive reasoning to identify the set of the most probable timed sequential binary relations between classes. A timed sequential binary relation R(Ci, Cj, [τ−ij, τ+ij]) is an oriented relation Ci → Cj between two classes Ci and Cj that is time-constrained with the interval [τ−ij, τ+ij]. A set M = {(Ci → Cj)} of timed sequential binary relations constitutes an abstract chronicle model, which is used by the BJT4S algorithm (BJT for Signatures) to look for the n-ary relations in Ω. Since the search space is generally very large, measures of the "interestingness" of a timed relation are required to focus on the minimal set of hypotheses. To this aim, we define a measure, called the BJ-measure, of the homogeneity of the crisscross (i.e. interlacing) of series of timed observations, a temporal version of the J-measure of [3].
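As a minimal illustration of the counting that this representation requires (a hypothetical sketch; the actual BJT4T algorithm builds a full first-order Markov chain), the occurrence numbers N(Ci) and N(Ci, Co) used in the next section can be tallied from consecutive class occurrences of a sequence:

```python
from collections import Counter

def occurrence_counts(omega):
    """Tally N(Ci) and, over consecutive occurrences, N(Ci, Co)."""
    singles = Counter(omega)
    pairs = Counter(zip(omega, omega[1:]))
    return singles, pairs

singles, pairs = occurrence_counts(["C1", "C2", "C1", "C2", "C3"])
print(singles["C1"], pairs[("C1", "C2")])  # 2 2
```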
2 The BJ-Measure
Let Ω be a sequence of |Ω| occurrences of a set of classes Ci ∈ Cω, let Ci and Co be two classes in Cω, and let N(Ci), N(Co) and N(Ci, Co) be respectively the number of occurrences in Ω of the classes Ci and Co and of the couple (Ci, Co). According to the memoryless property of a Markov chain, a timed sequential binary relation Ci → Co is associated with a discrete memoryless channel [1] that links the values of two random binary variables X = {Ci, ¬Ci} and Y = {Co, ¬Co}, where ¬Ci ≡ Cω − {Ci} and ¬Co ≡ Cω − {Co}, so that: p(Ci) = N(Ci)/|Ω|, p(Co) = N(Co)/|Ω|, p(Co|Ci) = N(Ci, Co)/N(Ci), and p(¬Co|Ci) = 1 − p(Co|Ci). The "j" function of the J-measure can be adapted to define a BJL-measure that evaluates the homogeneity of the crisscross towards the future (i.e. from the Ci class to the Co class) and a BJW-measure that evaluates the homogeneity of the crisscross towards the past (i.e. from the Co class to the Ci class). These two measures will then be combined to define the BJ-measure of a timed sequential binary relation Ci → Co.

Definition 1 Considering a timed sequential binary relation Ci → Co such that p(Co|Ci) > p(Co), the BJL(Ci → Co) measure is given by the following formula:

BJL(Ci → Co) = p(Co|Ci) × log2(p(Co|Ci) / p(Co)) + ((1 − p(Co|Ci)) / (|Cω| − 1)) × log2((1 − p(Co|Ci)) / (1 − p(Co)))   (1)

where |Cω| is the number of event classes in ω. The BJL-measure has the following properties:
• if p(Co|Ci) ≤ p(Co) then BJL(Ci → Co) = 0;
• if the sequence ω consists only of occurrences of the two classes Ci and Co (|Cω| = 2), BJL(Ci → Co) behaves like the j-measure;
• for p(Co|Ci) > p(Co), BJL(Ci → Co) increases when N(Cω) increases;
• for p(Co|Ci) = 1, BJL(Ci → Co) is maximal (= log2(1/p(Co))).

Definition 2 Considering a timed sequential binary relation Ci → Co such that p(Ci|Co) > p(Ci), the BJW(Ci → Co) measure is given by the following formula:

BJW(Ci → Co) = p(Ci|Co) × log2(p(Ci|Co) / p(Ci)) + ((1 − p(Ci|Co)) / (|Cω| − 1)) × log2((1 − p(Ci|Co)) / (1 − p(Ci)))   (2)

A noticeable property is that BJW(Ci → Co) is null at the same point as BJL(Ci → Co). This property of symmetry is a consequence of Bayes' rule: p(Co|Ci)/p(Co) = p(Ci|Co)/p(Ci).

Figure 1 shows the BJW(Ci → Co) (abscissa) and the corresponding BJL(Ci → Co) (ordinate) for different ratios θ = N(Ci)/N(Co). When the numbers of occurrences of the classes Ci and Co are equal (i.e. θ = 1), BJL(Ci → Co) = BJW(Ci → Co) and the corresponding curve is the diagonal. The maximum point of the diagonal corresponds to a perfectly homogeneous crisscross of occurrences with N(Ci, Co) = N(Ci) = N(Co): each occurrence of the Ci class is followed by an occurrence of the Co class, and each occurrence of the Co class is preceded by an occurrence of the Ci class. The minimum point of the diagonal (i.e. the origin) corresponds to BJL(Ci → Co) = BJW(Ci → Co) = 0: the occurrences of the Ci and the Co classes are not interlaced. Note also that the curves of Figure 1 corresponding to θ and 1/θ are symmetric with respect to the diagonal.

[Figure 1: BJL (ordinate) versus BJW (abscissa) for different ratios θ = N(Ci)/N(Co).]

The BJ-measure aims to provide a general means to evaluate and to represent the homogeneity of the crisscross of any series of class occurrences.

Definition 3 The BJ-measure of a timed sequential binary relation Ci → Co is the norm of the vector (BJL(Ci → Co), BJW(Ci → Co)):

BJM(Ci → Co) = √(BJL(Ci → Co)² + BJW(Ci → Co)²)   (3)

The BJ-measure depends on the ratio θ = N(Ci)/N(Co), which makes the comparison between two crisscrosses difficult. This is the aim of the α(Ci → Co) function.

Definition 4 The α(Ci → Co) function provides the value, projected into the interval [0.5, 1], that BJM(Ci → Co) would take if θ = N(Ci)/N(Co) were equal to 1:

α(Ci → Co) = BJM(Ci → Co) / (2 × max(BJM(Ci → Co))) + 0.5   (4)

where max(BJM(Ci → Co)) is the maximal value of the BJ-measure for a given θ (i.e. when N(Ci, Co) = min(N(Ci), N(Co)) for any N(Ci) and N(Co)). The α(Ci → Co) function is illustrated by the red squares along the diagonal of Figure 1 when N(Ci) = N(Co) = 100:
• α(Ci → Co) = 1 when each of the 100 occurrences of the class Ci is followed by one and only one of the 100 occurrences of the class Co, and inversely (perfect crisscross);
• α(Ci → Co) = 0.99 when 99 of the 100 occurrences of the class Ci are followed by one of the 100 occurrences of the class Co;
• α(Ci → Co) = 0.75 when 75 of the 100 occurrences of the class Ci are followed by one of the 100 occurrences of the class Co;
• α(Ci → Co) = 0.5 when 50 of the 100 occurrences of the class Ci are followed by one of the 100 occurrences of the class Co.

The α function thus provides a simple means to interpret the BJ-measure of a crisscross of a series of timed observations.
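Definitions 1-4 transcribe directly into code. The following is a sketch (function and argument names are ours); max(BJM) is obtained by plugging in N(Ci, Co) = min(N(Ci), N(Co)), as stated after Definition 4, and the second term of Eq. (1) is dropped at p(Co|Ci) = 1, where its limit is zero.

```python
from math import log2, sqrt

def bjl(n_i, n_o, n_io, n_seq, n_classes):
    """BJL(Ci -> Co), Eq. (1); zero when p(Co|Ci) <= p(Co)."""
    p_o = n_o / n_seq          # p(Co)
    p_oi = n_io / n_i          # p(Co|Ci)
    if p_oi <= p_o:
        return 0.0
    value = p_oi * log2(p_oi / p_o)
    if p_oi < 1.0:             # second term vanishes in the limit p(Co|Ci)=1
        value += ((1 - p_oi) / (n_classes - 1)) * log2((1 - p_oi) / (1 - p_o))
    return value

def bjw(n_i, n_o, n_io, n_seq, n_classes):
    """BJW(Ci -> Co), Eq. (2): BJL with the roles of Ci and Co swapped."""
    return bjl(n_o, n_i, n_io, n_seq, n_classes)

def bjm(n_i, n_o, n_io, n_seq, n_classes):
    """BJ-measure, Eq. (3): norm of the (BJL, BJW) vector."""
    return sqrt(bjl(n_i, n_o, n_io, n_seq, n_classes) ** 2 +
                bjw(n_i, n_o, n_io, n_seq, n_classes) ** 2)

def alpha(n_i, n_o, n_io, n_seq, n_classes):
    """Alpha, Eq. (4): BJM rescaled into [0.5, 1] for the given theta."""
    best = bjm(n_i, n_o, min(n_i, n_o), n_seq, n_classes)
    return bjm(n_i, n_o, n_io, n_seq, n_classes) / (2 * best) + 0.5
```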
3 Application to the SACHEM system

Our approach has been applied to sequences generated by the SACHEM knowledge-based system, developed at the end of the 20th century to help operators monitor, diagnose and control the blast furnace [2]. We are interested in the omega variable, which reveals the quality of the management of the whole blast furnace. The studied sequence contains 7682 occurrences of 45 discrete event classes of the SACHEM system at Fos-Sur-Mer (France), from 08/01/2001 to 31/12/2001. For the 1463 class linked to the omega variable, the BJT4T algorithm provides a chronicle model with 20^5 = 3,200,000 sequential binary relations. Applying the BJ-measure to prune this tree, the BJT4P algorithm produces a tree with 195 nodes (the pruning method is given in [2]). The reduction factor is greater than 16,000, and the pruned tree can then be used by the BJT4S algorithm to look for the set of n-ary relations observed in the sequence. When substituting each class with the corresponding variable, this set becomes graph (b) of Figure 2. The only difference with the expert's knowledge formulated in 1995 (graph a) is the direction of the relation between the variables FT and BD. This result shows that the branches with a high BJ-measure have a strong potential to reveal knowledge about the relations between the variables of a process. Note that the same result is observed with the Apache system, a clone of SACHEM designed to monitor and diagnose a galvanization bath.

[Figure 2: The expert's relations of 1995 (a) and the discovered relations of 2007 (b) between the variables TGS, FT, SS, BD and ω.]
REFERENCES

[1] C. E. Shannon, 'A mathematical theory of communication', Bell System Technical Journal, 27, 379–423, (1948); reprinted in C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, (1949).
[2] M. Le Goc and N. Benayadi, 'Discovering experts' knowledge from sequences of discrete event class occurrences', in Proceedings of the 10th International Conference on Enterprise Information Systems (ICEIS 2008), (June 12-16, 2008).
[3] P. Smyth and R. M. Goodman, 'An information theoretic approach to rule induction from databases', IEEE Transactions on Knowledge and Data Engineering, 4, 301–316, (1992).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-747
Fred meets Tweety
Antonis Kakas 1, Loizos Michael 2 and Rob Miller 3

Abstract. We propose a framework that brings together two major forms of default reasoning in Artificial Intelligence: applying default property classification rules in static domains, and default persistence of properties in temporal domains. Particular attention is paid to the central problem of qualification. We illustrate how previous semantics developed independently for the two separate forms of default reasoning naturally lead to the integration that we propose, and how this gives rise to domains where different types of knowledge interact and qualify each other while preserving elaboration tolerance.
1 Introduction
Tweety is watching as we load the gun, wait, and then shoot Fred. Should we conclude that Tweety will fly away as birds normally do when they hear a loud noise as that normally produced by shooting a loaded gun? It depends on whether Tweety can fly! This belief, in turn, depends on whether Tweety is only known to be a bird, or also known to be a penguin. What can we conclude about Fred if after the act of shooting Fred we observe that Tweety is still on the ground? In this problem of “Fred meets Tweety” we need to bring together two major forms of default reasoning that have been extensively studied on their own in A.I., but have rarely been addressed in the same formalism. These are default property classification as applied to inheritance systems [4, 10], and default persistence central to temporal reasoning in theories of Reasoning about Action and Change (RAC) [3, 9, 11]. How can a formalism synthesize the reasoning encompassed within each of these two forms of default reasoning? Central to these two (and indeed all) forms of default reasoning is the qualification problem: default conclusions are qualified by information that can block the application of the default inference. Recent work has shown the importance for RAC theories to properly account for different forms of qualification [5, 12]. In our problem of integrating the default reasoning of property classification into RAC, we study how a static default theory expressing known default relationships between fluents can endogenously qualify the reasoning about actions and change, so that the application of causal laws and default persistence is properly adjusted by this static theory.
2 Knowledge Qualification
One of the first knowledge qualification problems formally studied in A.I. relates to the Frame Problem (see, e.g., [11]) of how the causal change properly qualifies the default persistence. In the archetypical Yale Shooting Problem domain [3], a turkey named Fred is initially alive, and one asks whether it is still alive after loading a gun, waiting, and then shooting Fred. The lapse of time cannot cause the gun
1 University of Cyprus, P. O. Box 20537, CY-1678, Cyprus. e-mail: antonis@ucy.ac.cy
2 Harvard University, Cambridge, MA 02138, U.S.A. e-mail: loizos@eecs.harvard.edu
3 University College London, London WC1E 6BT, U.K. e-mail: rsm@ucl.ac.uk
to become unloaded. Default persistence is qualified only by known events and known causal laws linked to these events. The consideration of indirect action effects gave rise to the Ramification Problem (see, e.g., [7]) of how these effects are generated and qualify persistence. Static knowledge expressing domain constraints was introduced to encode such indirect action effects. In early solutions to the Ramification Problem a direct action effect would cause this static knowledge to be violated, unless a minimal set of indirect action effects were also assumed so as to maintain consistency [7, 8]. Thus, given the static knowledge that "dead birds do not walk", the shooting action causing Fred to be dead would also indirectly cause Fred to stop walking, qualifying the persistence of the latter property. Subsequent work examined default causal knowledge, bringing into focus the Qualification Problem (see, e.g., [12]; this is not to be confused with the broader sense of the term qualification that we use) of how such default causal knowledge is qualified by domain constraints. In some solutions to the Qualification Problem, the role of static knowledge within the domain description was identified as that of endogenously qualifying causal knowledge, as opposed to aiding causal knowledge in qualifying persistence [5]. Observations after action occurrences also qualify causal change when the two are in conflict, a problem known as the Exogenous Qualification Problem (see, e.g., [5]). Independently of the above, another qualification problem was examined in the context of Default Static Theories [10], which consider how observed facts qualify default static knowledge. In the typical domain one asks whether Tweety is able to fly, when it is only known to be a bird. In the absence of any explicit information on whether Tweety is able to fly, the theory predicts that it is, but retracts this prediction once the extra fact that Tweety is a penguin is added. In this paper we investigate temporal domains that incorporate (possibly) default static theories. The technical challenge lies in understanding how the four types of knowledge in a domain, three of which may now be default, interact and qualify each other. To illustrate some of these interactions we employ the syntax of the action description language ME [5]. Strict static knowledge is encoded in propositional logic. Default static knowledge is encoded in terms of default rules of the form "φ ⇝ ψ", where φ, ψ are propositional formulas; an informal reading of such default rules suffices for this section. Formulas with variables are used as a shorthand notation for the set of all of their groundings over a finite domain of constants.

ClapHands causes Noise
Noise causes Fly(x)
Noise causes ¬Noise
Penguin(Tweety) holds-at 1
ClapHands occurs-at 3
ClapHands occurs-at 7
static theory:
(1) Penguin(x) ⇝ ¬CanFly(x)
(2) Penguin(x) → Bird(x)
(3) Bird(x) ⇝ CanFly(x)   (rule (1) overrides rule (3))
(4) ¬CanFly(x) → ¬Fly(x)
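To make the interplay concrete, here is a toy rendering of the static theory above in Python. The names and the qualification policy are our own simplification (not the semantics of Section 3); it shows how rule (1) overrides rule (3), and how an observation of Fly exogenously qualifies rule (1) itself.

```python
def static_extension(facts):
    """facts: set of literals such as {"Penguin"} or {"Bird", "Fly"}."""
    s = set(facts)
    if "Penguin" in s:
        s.add("Bird")                          # strict rule (2)
    # default rule (1): Penguin ~> not CanFly, unless an observation of
    # Fly qualifies the default (exogenous qualification):
    if "Penguin" in s and "Fly" not in s:
        s.add("~CanFly")
    # default rule (3): Bird ~> CanFly, overridden by rule (1):
    elif "Bird" in s:
        s.add("CanFly")
    # strict rule (4): ~CanFly -> ~Fly, which is what blocks the causal
    # generation of Fly by the ClapHands action in the temporal domain:
    if "~CanFly" in s:
        s.add("~Fly")
    return s

print(static_extension({"Bird"}))            # adds 'CanFly'
print(static_extension({"Penguin"}))         # adds '~CanFly' and '~Fly'
print(static_extension({"Penguin", "Fly"}))  # observation qualifies rule (1)
```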
The default persistence of “Penguin(Tweety) holds-at 1” implies, through the static theory, that “¬CanFly(Tweety)” holds everywhere. This, then, qualifies the causal generation of “Fly(Tweety)” by the 4
action “ClapHands” at time-points 3 and 7. If, on the other hand, the observation “Fly(Tweety) holds-at 5” is added, then the static theory is itself qualified, and no longer qualifies the causal generation of “Fly(Tweety)”. Note, however, that Tweety flies for an exogenous reason. If an action at time-point 6 were to cause Tweety to stop flying, this would release the static theory's default conclusion that penguins do not fly. The action “ClapHands occurs-at 7” would then be qualified and would not cause Tweety to fly again. What would happen if “Noise” was caused at time-point 3 because Fred, a turkey that is initially alive, was shot, and we only knew that Tweety is a bird? Then we would conclude that Fred is dead from time-point 3 onwards, and also that Tweety is flying. If, however, one observes “¬Fly(Tweety) holds-at 4”, then whether Fred is dead depends on why Tweety did not fly after Fred was shot! The observation by itself does not explain why the causal laws that would normally cause Tweety to fly were qualified. An endogenous explanation would be that Tweety is a penguin, and “Fly(Tweety)” is qualified from being caused. An exogenous explanation would be that Tweety could not fly due to exceptional circumstances (e.g., an injury). However, Tweety might not have flown because the shooting action failed to cause a noise, or because it failed altogether. Different conclusions on Fred's status might be appropriate depending on the explanation.
3
Formal Semantics of Integration
Four different types of information present in a framework of RAC interact and qualify each other: (i) information generated by default persistence, (ii) action laws that qualify default persistence, (iii) static default laws of fluent relationships that can qualify these action laws, and (iv) observations that can qualify any of these. This hierarchy of information comes full circle, as the bottom layer of default persistence of observations (which carry the primary role of qualification) can also qualify the static theory. Due to the cyclical nature of the qualifications, we develop the formal semantics in two steps. For the temporal semantics we follow the semantics of ME [5], which accounts for the qualification of causal knowledge by a given strict static theory. Causal knowledge in ME is qualified so as to ensure that the static theory is never violated at the observable time scale. We extend that semantics by proposing that the qualification comes from an external set α(T) of admissible states that might depend on the time-point T. Thus, we end up with a semantics that, given an externally provided admissibility requirement α, computes the temporal evolution of states so as to ensure that the state of the world at time-point T always lies within the set of admissible states α(T). The details of the temporal semantics of ME are largely orthogonal to the next step of determining how α is computed. An externally qualified model of a domain description D given an admissibility requirement α is any mapping of time-points to states such that (1) the world is initially in an admissible state; (2) it changes in an admissible manner; and it holds that (3.i) literals not caused to change persist, and (3.ii) caused change is realized. The admissibility requirement is determined by the static theory after being qualified by the combined effect of observations and persistence. We model this effect by considering virtual extensions of a domain D that contain additional virtual observations. Virtual observations are not meant to capture abnormal situations, but rather persistence of known observations from other time-points. The minimization of virtual observations that we impose later guarantees that known observations persist only as needed to achieve this effect. At every time-point T, we consider the static theory and the observations (including virtual ones) at T. The extensions of this default theory determine a particular set of admissible states α(T).
An internally qualified model of a domain description D is an externally qualified model of D given this admissibility requirement α. Given a domain description D, we consider its virtual extensions that have internally qualified models. Among those, we choose the ones with a minimal set of virtual observations. The internally qualified models of these virtual extensions of D are the models of D. Observations in our semantics act as the knowledge that bootstraps reasoning. Since every other type of knowledge is amenable to qualification, a strong elaboration tolerance result can be established. Theorem 1 (Elaboration Tolerance Theorem) Let D be a consistent domain, D′ a domain with no observations, and D ∪ D′ their union, where the static theories of D and D′ are merged together to form the single static theory of D ∪ D′. We assume that the static theory of D ∪ D′ is consistent. Then, D ∪ D′ is a consistent domain.
4
Concluding Remarks
We have presented an integrated formalism for reasoning with both default static and default causal knowledge, two problems that have been extensively studied in isolation from each other. The proposed solution applies to domains where the static knowledge is “stronger” than the causal knowledge, and qualifies excessive change caused by the latter. A more detailed exposition of our developed formalism, including a tentative solution of how to encode causal laws that are “stronger” than the static knowledge, appears in [6]. Our future research agenda includes further investigation of such “strong” causal knowledge, and of how “strong” static knowledge can generate extra (rather than block) causal change. We also plan to develop computational models corresponding to the presented theoretical framework, using, for example, ideas from argumentation. Although we are unaware of any previous work explicitly introducing Fred to Tweety, much work has been done on the use of default reasoning in inferring causal change. In the context of the Qualification Problem see [2, 12]. For distinguishing between default and non-default causal rules in the context of the Language C+ see [1].
REFERENCES
[1] S. Chintabathina, M. Gelfond, and R. Watson, ‘Defeasible Laws, Parallel Actions, and Reasoning about Resources’, in Proc. of Commonsense’07, pp. 35–40, (2007).
[2] P. Doherty, J. Gustafsson, L. Karlsson, and J. Kvarnström, ‘TAL: Temporal Action Logics Language Specification and Tutorial’, ETAI, 2(3–4), 273–306, (1998).
[3] S. Hanks and D. McDermott, ‘Nonmonotonic Logic and Temporal Projection’, AIJ, 33(3), 379–412, (1987).
[4] J. Horty, R. Thomason, and D. Touretzky, ‘A Skeptical Theory of Inheritance in Nonmonotonic Semantic Networks’, AIJ, 42(2–3), 311–348, (1990).
[5] A. Kakas, L. Michael, and R. Miller, ‘Modular-E: An Elaboration Tolerant Approach to the Ramification and Qualification Problems’, in Proc. of LPNMR’05, pp. 211–226, (2005).
[6] A. Kakas, L. Michael, and R. Miller, ‘Fred meets Tweety’, in Proc. of CogRob’08, (2008).
[7] F. Lin, ‘Embracing Causality in Specifying the Indirect Effects of Actions’, in Proc. of IJCAI’95, pp. 1985–1991, (1995).
[8] F. Lin and R. Reiter, ‘State Constraints Revisited’, J. of Logic and Comp., 4(5), 655–678, (1994).
[9] J. McCarthy and P. Hayes, ‘Some Philosophical Problems from the Standpoint of Artificial Intelligence’, Mach. Intel., 4, 463–502, (1969).
[10] R. Reiter, ‘A Logic for Default Reasoning’, AIJ, 13(1–2), 81–132, (1980).
[11] M. Shanahan, Solving the Frame Problem: A Mathematical Investigation of the Common Sense Law of Inertia, MIT Press, 1997.
[12] M. Thielscher, ‘The Qualification Problem: A Solution to the Problem of Anomalous Models’, AIJ, 131(1–2), 1–37, (2001).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-749
Definability in Logic and Rough Set Theory 1 Tuan-Fang Fan2 and Churn-Jung Liau3 and Duen-Ren Liu4 Abstract. Rough set theory is an effective tool for data mining. According to the theory, a concept is definable if it can be written as a Boolean combination of equivalence classes induced from classification attributes. On the other hand, definability in logic has been explicated by Beth’s theorem. In this paper, we propose two data representation formalisms, called first-order data logic (FODL) and attribute value-sorted logic (AVSL), respectively. Based on these logics, we explore the relationship between logical definability and rough set definability.
1 This work was partially supported by NSC (Taiwan) under grant 95-2221-E-001-029-MY3.
2 Department of Computer Science and Information Engineering, National Penghu University, Penghu 880, Taiwan, email: dffan@npu.edu.tw, and Institute of Information Management, National Chiao-Tung University, Hsinchu 300, Taiwan, email: tffan.iim92g@nctu.edu.tw
3 Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, email: liaucj@iis.sinica.edu.tw
4 Institute of Information Management, National Chiao-Tung University, Hsinchu 300, Taiwan, email: dliu@iim.nctu.edu.tw
1
Introduction
In recent years, knowledge discovery in databases (KDD) and data mining have received more and more attention because of their practical applications. The rough set theory proposed by Pawlak provides an effective tool for extracting knowledge from data tables [3]. To represent and reason about the extracted knowledge, a decision logic (DL) is also proposed in [3]. The semantics of the logic is defined in a Tarskian style through the notions of models and satisfaction. While DL is an instance of propositional logic, we can also represent knowledge in data tables by using first-order logic (FOL) [2] or many-sorted first-order logic (MSFOL). In this paper, we investigate the definability of concepts in the context of these alternative logical descriptions of data tables. In the next section, we review rough set theory, with the emphasis on the notion of definability. Then, in Sections 3 and 4, we propose first-order data logic and attribute value-sorted logic for the description of data tables respectively, and discuss the relationship between logical definability and rough set definability in the context of these logics. We conclude the paper in Section 5.
2
Rough Set Theory—A Review
The basic construct of rough set theory is an approximation space, which is defined as a pair (U, R), where U is the universe and R ⊆ U × U is an equivalence relation on U. We can write an equivalence class of R as [x]_R if it contains the element x. Note that [x]_R = [y]_R iff (x, y) ∈ R. In philosophy, the extension of a concept is defined as the objects that are instances of the concept. Following this terminology, a subset of the universe is called a concept or a category in rough set theory. Given an approximation space (U, R), each equivalence class of R is called an R-basic category or R-basic concept, and any union of R-basic categories is called an R-category. Now, for an arbitrary concept X ⊆ U, we are interested in the definability of X by using R-basic categories. We say that X is R-definable if X is an R-category; otherwise X is R-undefinable. The R-definable concepts are also called R-exact sets, whereas R-undefinable concepts are said to be R-inexact or R-rough. A rough set can be approximated by two exact sets, called the lower approximation and the upper approximation of X, respectively, defined as follows:
$\underline{R}X = \{x \in U \mid [x]_R \subseteq X\}$,
$\overline{R}X = \{x \in U \mid [x]_R \cap X \neq \emptyset\}$.
Obviously, a set X is R-definable iff $\underline{R}X = \overline{R}X$. In data mining problems, the equivalence relation is determined by the attributes (features) used to classify objects. Two objects are equivalent if they have the same values in every such attribute. Thus, intuitively, a concept is definable in rough set theory if it can be precisely described by such attributes.
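As a quick illustration (our own toy table, not from the paper), the following Python sketch computes the two approximations and the definability test just stated:

```python
def partition(universe, attrs, value):
    """Equivalence classes of indiscernibility: two objects fall in the
    same class iff they agree on every attribute in attrs."""
    classes = {}
    for x in universe:
        key = tuple(value[x][a] for a in attrs)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def approximations(universe, attrs, value, X):
    lower, upper = set(), set()
    for c in partition(universe, attrs, value):
        if c <= X:      # [x]_R contained in X
            lower |= c
        if c & X:       # [x]_R meets X
            upper |= c
    return lower, upper

value = {1: {"colour": "red"}, 2: {"colour": "red"},
         3: {"colour": "blue"}, 4: {"colour": "blue"}}
lo, up = approximations({1, 2, 3, 4}, ["colour"], value, X={1, 2, 3})
print(lo, up, lo == up)   # {1, 2} {1, 2, 3, 4} False: X is R-rough
```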
3
Definability in First-order Data Logic
To describe data tables by (a fragment of) FOL, we use an instance of function-free monadic predicate logic, called first-order data logic (FODL). The alphabet (or vocabulary) of FODL consists of a set of constant symbols, a finite set of monadic predicate symbols, a set of variables, Boolean connectives (¬, ∧, ∨, ⊃, ≡), and the quantifiers (∀, ∃). The syntax and semantics of FODL are the same as those of ordinary FOL [2]. Based on FODL, we can formulate the definability of a concept in rough set theory precisely. In the language of FODL, a concept corresponds to a predicate, and the equivalence relation in an approximation space can be determined by a set of predicates. Let S be a subset of predicates. Then the following formula defines an indiscernibility relation (with respect to S):
$\eta_S(x, y) = \bigwedge_{P \in S} (P(x) \equiv P(y))$.
Given an arbitrary predicate P, we can define two formulas corresponding to the lower and upper approximations of P:
$\underline{P}_S(x) = \forall y(\eta_S(x, y) \supset P(y))$,
$\overline{P}_S(x) = \exists y(\eta_S(x, y) \wedge P(y))$.
Let Γ be an FODL theory that contains only predicate symbols in S ∪ {P}. Then we say that P is S-definable with respect to Γ if Γ |= $\forall x(\underline{P}_S(x) \equiv \overline{P}_S(x))$, where |= denotes the semantic consequence relation in FODL.
In classical logic, the definability of a predicate is explicated by the well-known Beth definability theorem [1]. The theorem states that explicit definability is equivalent to implicit definability. Let Γ be an FODL theory that contains only predicate symbols in S ∪ {P}. Then Γ explicitly defines P if there exists a wff ϕ(x) that contains only predicate symbols in S such that
Γ |= ∀x(ϕ(x) ≡ P(x)).
We say that Γ implicitly defines P if for any A, B ∈ Mod(Γ) such that Q^A = Q^B for all Q ∈ S, we have P^A = P^B, where Mod(Γ) is the set of models of Γ. In effect, the implicit definability of a predicate P means the possibility of uniquely characterizing P. The primary objective of this paper is to establish the relationship between logical definability and rough set definability.
Theorem 1 Let Γ be an FODL theory that contains only predicate symbols in S ∪ {P}. Then the explicit (or implicit) definability of P in Γ implies that P is S-definable with respect to Γ.
4
Definability in Attribute Value-sorted Logic
In FODL, a monadic predicate intuitively corresponds to an attribute-value pair. However, in many cases, the number of possible values for an attribute may be infinite. In such infinite-domain cases, an infinite number of predicates must be available in FODL; but since the indiscernibility wff $\eta_S$ can only be defined with respect to a finite subset of predicates S, this is sometimes inadequate. To circumvent such difficulties, we can use many-sorted first-order logic (MSFOL) as the data representation formalism.
4.1
Syntax and semantics
We use a special instance of MSFOL, called attribute value-sorted logic (AVSL), to describe data tables. The set of sorts for AVSL is Σ = {σi | i ∈ I} ∪ {σu}, where I is an index set. The sort σu is called the object sort and each σi is called an attribute value sort. As in the case of FODL, the alphabet (or vocabulary) of AVSL consists of constant symbols, predicate symbols, variables, and logical symbols. The only difference is that, in AVSL, a rank function is used to assign a rank to constant symbols, predicate symbols, and variables. The rank of a constant symbol or a variable is an element of Σ, and the rank of a predicate symbol is in Σ^k if its arity is k. A constant (resp. variable) of rank σu is called an object constant (resp. variable); otherwise, it is called an attribute domain constant (resp. variable). We assume that the set of predicate symbols is the union of a set of monadic predicates and the set of dyadic predicates {Ri | i ∈ I}. For each i ∈ I, Ri is of rank (σu, σi), and called an attribute predicate. Also, a monadic predicate of rank σu is called a concept predicate; and for each i ∈ I, a monadic predicate of rank σi is called a value predicate. Now, a term is either a constant or a variable, and the rank of the term is that of the constant or variable. If P is a predicate of rank (σ1, · · · , σk) and t1, t2, · · · , tk are terms of ranks σ1, σ2, · · · , σk respectively, then P(t1, t2, · · · , tk) is an atomic formula (k = 1, 2). The formation rules for compound wffs are the same as those for ordinary FOL [2].
4.2
Logical definability
Analogous to the case of FODL, we can formulate the definability of a rough concept in AVSL. Let x and y be object variables, v be an attribute domain variable, and S be a subset of the index set I. Then we can define the indiscernibility formula (with respect to S) as:
$\varepsilon_S(x, y) = \bigwedge_{i \in S} \forall v (R_i(x, v) \equiv R_i(y, v))$.
Again, given an arbitrary concept predicate P, we can define two formulas corresponding to its lower and upper approximations:
$\underline{\varepsilon P}_S(x) = \forall y(\varepsilon_S(x, y) \supset P(y))$,
$\overline{\varepsilon P}_S(x) = \exists y(\varepsilon_S(x, y) \wedge P(y))$.
Let Γ be an AVSL theory that contains only predicate symbols in {Ri | i ∈ S} ∪ {P}. Then we say that P is indiscernibly S-definable with respect to Γ if Γ |= $\forall x(\underline{\varepsilon P}_S(x) \equiv \overline{\varepsilon P}_S(x))$. The definition of the explicit and implicit definability of P in Γ is the same as that in the FODL case and, analogously, we have the following theorem.
Theorem 2 Let Γ be an AVSL theory that contains only predicate symbols in {Ri | i ∈ S} ∪ {P}. Then the explicit definability of P in Γ implies that P is indiscernibly S-definable with respect to Γ.
In addition to Pawlak's approximation space, the notion of tolerance approximation spaces has been proposed in [4] to cope with the problem of imprecise boundary regions in rough set theory. The definability of a concept in a tolerance approximation space can also be formulated in AVSL. First, let x, y, v and S be defined as above. Then the tolerance formula (with respect to S) is
$\tau_S(x, y) = \bigwedge_{i \in S} \exists v (R_i(x, v) \wedge R_i(y, v))$.
Second, the lower and upper approximations of a concept predicate P are defined as follows:
$\underline{\tau P}_S(x) = \forall y(\tau_S(x, y) \supset P(y))$,
$\overline{\tau P}_S(x) = \exists y(\tau_S(x, y) \wedge P(y))$.
Finally, let Γ be an AVSL theory that contains only predicate symbols in S ∪ {P} such that $\{\bigwedge_{i \in S} \forall x \exists v R_i(x, v)\} \subseteq \Gamma$. Then we say that P is tolerantly S-definable with respect to Γ if Γ |= $\forall x(\underline{\tau P}_S(x) \equiv \overline{\tau P}_S(x))$. Note that, to ensure the reflexivity of the tolerance relation, ∀x∃v Ri(x, v) is included in Γ for each i ∈ S. However, logical definability no longer implies rough set definability in terms of the tolerance approximation space.
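For contrast with the equivalence-based case, here is a sketch of the tolerance-based approximations (our toy example; set-valued attributes are just one natural way a reflexive, symmetric but non-transitive relation arises):

```python
def tolerant(x, y, attrs, value):
    """x and y are tolerant iff they share at least one value on every
    attribute in attrs (attribute values are sets here)."""
    return all(value[x][a] & value[y][a] for a in attrs)

def tolerance_approximations(universe, attrs, value, X):
    lower, upper = set(), set()
    for x in universe:
        nbhd = {y for y in universe if tolerant(x, y, attrs, value)}
        if nbhd <= X:
            lower.add(x)
        if nbhd & X:
            upper.add(x)
    return lower, upper

value = {1: {"lang": {"en"}}, 2: {"lang": {"en", "fr"}}, 3: {"lang": {"fr"}}}
print(tolerance_approximations({1, 2, 3}, ["lang"], value, X={1, 2}))
# ({1}, {1, 2, 3}): the relation is not transitive, so boundaries widen.
```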
5
Conclusion
In this paper, we propose using FODL and AVSL for logical descriptions of data tables. Based on these logics, we precisely formulate the notion of definability in rough set theory and discuss its relationship to explicit and implicit definability in classical logic.
REFERENCES
[1] E.W. Beth. On Padoa's method in the theory of definition. Indagationes Math., 15:330–339, 1953.
[2] E. Mendelson. Introduction to Mathematical Logic. Chapman & Hall/CRC, fourth edition, 1997.
[3] Z. Pawlak. Rough Sets–Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, 1991.
[4] A. Skowron and J. Stepaniuk. Tolerance approximation spaces. Fundamenta Informaticae, 27(2/3):245–253, 1996.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-751
WikiTaxonomy: A Large Scale Knowledge Resource Simone Paolo Ponzetto1 and Michael Strube1 Abstract. We present a taxonomy automatically generated from the system of categories in Wikipedia. Categories in the resource are identified as either classes or instances and included in a large subsumption, i.e. isa, hierarchy. The taxonomy is made available in RDFS format to the research community, e.g. for direct use within AI applications or to bootstrap the process of manual ontology creation.
1
INTRODUCTION
Advances in the development of knowledge intensive AI systems crucially depend on the availability of large coverage, machine readable knowledge sources. While tremendous progress in AI has been made in the last decades by investigating data-driven inference methods, we believe that further advancement ultimately depends also on the free access to large repositories of structured knowledge on which these inference techniques can be applied. In this article we approach the problem by using Wikipedia. We present methods for deriving a large coverage taxonomy of classes and instances from the network of categories in Wikipedia and present the RDF Schema we make freely available to the research community.
2
METHODS
We apply in sequence the methods described in Ponzetto & Strube [8] and Zirn et al. [13] in order to generate a semantic network from the system of categories in Wikipedia.
1. We label the relations between category pairs as isa and notisa. This way the category network, which per se is merely a hierarchical thematic categorization of the topics of articles, is transformed into a subsumption hierarchy with a well-defined semantics.
2. We classify categories as either classes or instances in order to distinguish between isa subsumption and instance-of relations.
2.1
Deriving a taxonomy from Wikipedia
In [8] we presented a set of lightweight heuristics for distinguishing between isa and notisa links in the Wikipedia category network. Syntax-based methods label category links based on string matching of syntactic components of the category labels. They use a full syntactic parse of the category labels to check whether category label pairs share the same lexical head2 (head matching) or the head of a category label occurs as a modifier in another one (modifier matching).
1 EML Research gGmbH, Schloss-Wolfsbrunnenweg 33, 69118 Heidelberg, Germany. Website: http://www.eml-research.de/nlp
2 The head of a phrase is the word that determines the syntactic type of the overall phrase of which it is a member. In the case of category labels, it is the main noun of the label, e.g. the noun Scientists for the category label SCIENTISTS WHO COMMITTED SUICIDE.
Connectivity-based methods reason on the structure and connectivity of the categorization network. Instance categorization applies the method from [10], which identifies instances from Wikipedia pages, to those categories referring to the same entities as the pages. Redundant categorization labels category pairs as being in an isa relation by looking for directly connected categories that redundantly have a page in common. Lexico-syntactic based methods use lexico-syntactic patterns applied to large text corpora (e.g. Wikipedia itself) to identify isa [4] and part-of relations [2], the latter providing evidence that the relation is not an isa relation. A majority voting scheme based on the number of hits for each set of patterns is used to decide whether the relation is isa or not. Inference-based methods propagate the previously found relations based on the properties of multiple inheritance and transitivity of the isa relation. These methods generate 105,418 isa links from a network of 127,325 categories and 267,707 links. We achieve a score of 87.9 balanced F-measure when evaluating the taxonomy against the subset of ResearchCyc [6] to which the categories can be mapped.
2.2
Distinguishing between classes and instances
Zirn et al. [13] go one step further than [8] and classify categories as instances or classes. This step yields a taxonomy with finer grained semantics, and it is necessary since the network contains many categories whose reference is an entity, e.g. the MICROSOFT category3, rather than a property of a set of individuals, e.g. MULTINATIONAL COMPANIES. Similarly to [8], they devise a set of heuristics on which to decide the reference type of a category label and combine the best performing methods for each class into a voting scheme. Given a category c with label l, c is classified as either an instance or a class by the first satisfied criterion.
1. Page & Plural: if no page titled l exists and the lexical head of l is plural, then c is a class.
2. Capitalization & NER: else if l is capitalized and has been recognized by a Named Entity Recognizer as a named entity, then c is an instance.
3. Page: else if no page titled l exists, then c is a class.
4. Plural: else if the head of l is plural, then c is a class.
5. Structure: else if c has no sub-category, then it is a class.
6. Capitalization: else if l is capitalized, then c is an instance.
7. Default: else c is a class.
Using the same category network from [8], this pipeline of heuristics is shown to classify 111,652 class and 15,472 instance categories with an accuracy of 84.5% when evaluated against ResearchCyc.
3 We use Sans Serif for words and queries, CAPITALS for Wikipedia pages and SMALL CAPS for Wikipedia categories.
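Read operationally, the cascade amounts to the following sketch; the five predicate functions are assumed helpers standing in for the page lookup, the syntactic parser and the named entity recognizer used by Zirn et al.:

```python
def reference_type(label, has_page, is_plural, is_capitalized,
                   is_named_entity, has_subcategory):
    """Classify a category label as 'class' or 'instance' by the first
    satisfied criterion of the cascade described above."""
    if not has_page(label) and is_plural(label):
        return "class"       # 1. Page & Plural
    if is_capitalized(label) and is_named_entity(label):
        return "instance"    # 2. Capitalization & NER
    if not has_page(label):
        return "class"       # 3. Page
    if is_plural(label):
        return "class"       # 4. Plural
    if not has_subcategory(label):
        return "class"       # 5. Structure
    if is_capitalized(label):
        return "instance"    # 6. Capitalization
    return "class"           # 7. Default
```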
3
WIKITAXONOMY
We applied the methods from [8] and [13] using the English Wikipedia database dump from 25 September 2006. The extracted taxonomy was converted into RDF Schema [3, RDFS] using the Jena Semantic Web Framework4. RDFS has a very limited semantics and serves mostly as foundation for other Semantic Web languages. Nevertheless it suffices in the present scenario of data exchange where we have only a set of classes in a hierarchical relation. RDFS in addition provides compatibility with free ontology editors such as Protégé [5] for visualization, additional manual editing or conversion to richer knowledge representation languages such as OWL [7]. Figure 1 shows a sample fragment of the WikiTaxonomy in RDFS format. In the RDFS data model Wikipedia categories are represented as resources (i.e. a list of rdf:Description elements) and the subsumption relation is modeled straightforwardly using the rdfs:subClassOf property. A human readable version of the name of the category is given via the rdfs:label property and a link to the on-line version of the corresponding page is provided using the rdfs:comment property. In order to distinguish between categories which are instances or classes we use the rdf:type predicate to state whether a resource is a class or an individual of a class. In addition, the distinction is also given in the resource identifier, i.e. the URI-reference.
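A fragment of the same shape can be produced in a few lines; the sketch below uses Python's rdflib instead of Jena, and the base URI and category names are invented for illustration:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

WT = Namespace("http://example.org/wikitaxonomy/")  # hypothetical base URI
g = Graph()

companies = WT["class/Multinational_companies"]
microsoft = WT["instance/Microsoft"]

g.add((companies, RDF.type, RDFS.Class))
g.add((companies, RDFS.label, Literal("Multinational companies")))
g.add((companies, RDFS.subClassOf, WT["class/Companies"]))
g.add((microsoft, RDF.type, companies))       # instance-of, not subClassOf
g.add((microsoft, RDFS.label, Literal("Microsoft")))
g.add((microsoft, RDFS.comment,
       Literal("http://en.wikipedia.org/wiki/Category:Microsoft")))

print(g.serialize(format="xml"))
```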
4
RELATED WORK
Researchers working in information extraction have recently begun to use Wikipedia as a resource for automatically deriving structured semantic content. Suchanek et al. build the YAGO system [10] by merging WordNet and Wikipedia: the isa hierarchy of WordNet is populated with instances taken from Wikipedia pages. Auer et al. present the DBpedia system [1] which generates RDF statements by extracting the attribute-value pairs contained in the infoboxes of the Wikipedia pages (i.e. the tables summarizing the most important attributes of the entity referred to by the page), e.g. the pair capital=[[Berlin]] from the GERMANY page. Wu & Weld show in [11] how to augment Wikipedia with automatically extracted information. They propose to ‘autonomously semantify’ Wikipedia by (1) extracting new facts from its text via a cascade of Conditional Random Field models; (2) adding new hyperlinks to the articles’ text by finding the target articles that nouns refer to. Wu & Weld’s Kylin Ontology Generator (KOG) [12] is the work closest to ours. Their system builds a subsumption hierarchy of classes by combining Wikipedia infoboxes with WordNet using statistical-relational learning. Each infobox template, e.g. Infobox Country for countries,
represents a class and the slots of the template are considered as the attributes of the class. KOG uses Markov Logic Networks [9] in order to jointly predict both the subsumption relation between classes and their mapping to WordNet. While KOG represents a theoretically sounder methodology than [8] and [13], the lightweight heuristics from the latter are straightforward to implement and show that, when given high quality semi-structured input as in the case of Wikipedia, large coverage semantic networks can be generated by using simple heuristics which capture the conventions governing its public editorial base.
4 http://jena.sourceforge.net
ACKNOWLEDGEMENTS This work has been funded by the Klaus Tschira Foundation, Heidelberg, Germany. The first author has been supported by a KTF grant (09.003.2004).
REFERENCES
[1] Sören Auer, Christian Bizer, Jens Lehmann, Georgi Kobilarov, Richard Cyganiak, and Zachary Ives, ‘DBpedia: A nucleus for a Web of open data’, in Proc. of ISWC 2007 + ASWC 2007, pp. 722–735, (2007).
[2] Matthew Berland and Eugene Charniak, ‘Finding parts in very large corpora’, in Proc. of ACL-99, pp. 57–64, (1999).
[3] Dan Brickley and Ramanathan V. Guha, ‘RDF vocabulary description language 1.0: RDF schema’, Technical report, W3C, (2004). http://www.w3.org/TR/rdf-schema.
[4] Marti A. Hearst, ‘Automatic acquisition of hyponyms from large text corpora’, in Proc. of COLING-92, pp. 539–545, (1992).
[5] Holger Knublauch, Ray W. Fergerson, Natalya Fridman Noy, and Mark A. Musen, ‘The Protégé OWL plugin: an open development environment for semantic web applications’, in Proc. of ISWC 2004, pp. 229–243, (2004).
[6] Douglas B. Lenat and R. V. Guha, Building Large Knowledge-Based Systems: Representation and Inference in the CYC Project, Addison-Wesley, Reading, Mass., 1990.
[7] Peter F. Patel-Schneider, Patrick Hayes, and Ian Horrocks, ‘OWL Web Ontology Language semantics and abstract syntax’, Technical report, W3C, (2004). http://www.w3.org/TR/owl-semantics.
[8] Simone Paolo Ponzetto and Michael Strube, ‘Deriving a large scale taxonomy from Wikipedia’, in Proc. of AAAI-07, pp. 1440–1445, (2007).
[9] Matthew Richardson and Pedro Domingos, ‘Markov logic networks’, Machine Learning, 62, 107–136, (2006).
[10] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum, ‘YAGO: A core of semantic knowledge’, in Proc. of WWW-07, pp. 697–706, (2007).
[11] Fei Wu and Daniel Weld, ‘Automatically semantifying Wikipedia’, in Proc. of CIKM-07, pp. 41–50, (2007).
[12] Fei Wu and Daniel Weld, ‘Automatically refining the Wikipedia infobox ontology’, in Proc. of WWW-08, (2008).
[13] Cäcilia Zirn, Vivi Nastase, and Michael Strube, ‘Distinguishing between instances and classes in the Wikipedia taxonomy’, in Proc. of ESWC-08, pp. 376–387, (2008).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-753
Computing ε-Optimal Strategies in Bridge and Other Games of Sequential Outcome Pavel Cejnar1 1 INTRODUCTION Bridge is a card game for 4 players in 2 teams, line NS against line WE, consisting of 2 phases. In the first phase players try to agree on a game contract; in the second, one of the players tries to fulfil the contract against the other line, playing cards in rounds and counting won tricks. A Nash equilibrium, or a profile of optimal strategies for each player with no incentive to deviate, exists in Bridge; however, it seems to be difficult to compute for such a large game. We therefore often search for at least an ε-Nash equilibrium, or ε-optimal strategies, bringing an outcome worse than the optimum by no more than ε. More references to ε-Nash equilibria and the rare results about equilibrium strategies in Bridge or a similar card game can be found in [2]. Promising results were published for another card game, Poker, in [3]. Studying the differences between Bridge and Poker, the ratio of symmetries seems to be worse [1], and because of this Bridge seems to fall outside the class of games to which the GameShrink algorithm is applicable. In Bridge in the second phase players do not get an immediate result as in Poker, but they play cards sequentially in rounds, uncovering more information to other players and trying to win tricks, which form the outcome of the game, a sequential outcome. Based on this fact and on the minimax and alpha-beta techniques (originally known from games of perfect information) we present two algorithms, 2.1 and 2.2, which together with the Brown-Robinson method [4] are suitable to compute ε-optimal strategies in a defined class of games containing Bridge. We also present a test-bed for such a class, a reduced variant of Bridge, and the effect of imperfect information on the strategy of players.
2 ALGORITHMS 2.1 AND 2.2 As a test-bed for the later defined class, and to simplify the presented algorithms, we first construct a reduced variant of Bridge. The reduction includes: a game only for 2 players (A and B, representing the lines NS and WE and sharing information between their hands), the card count (4 or 5 cards in each of 4 suits), the fixed first phase and the different outcome functions (either as a mean value of won tricks or a probability to fulfil a contract). Given a distribution of cards between A and B, for a 16 card game there exist 70 different configurations of cards between the N and S hands and another 70 between W and E. The details can be found in [1]. Trying to find ε-optimal strategies for such a game we will use the Brown-Robinson method [4]: constructing the sequences of pure strategies a_1, a_2, ..., a_i and b_1, b_2, ..., b_i for players A and B, where a_i and b_i are best strategies against avg(b_1, ..., b_{i−1}) and avg(a_1, ..., a_{i−1}); then
1 Department of Theoretical Computer Science, Charles University in Prague, Czech Republic, email: cejnar@kti.mff.cuni.cz
avg(a_1, ..., a_i) and avg(b_1, ..., b_i) converge to equilibrium strategies for i → ∞. To use this method we need an algorithm to construct an optimal strategy (say for player A) against a given opponent strategy. We use a game tree where each node has its index S_i, where S_i lists all cards m_j m_k m_l ... played up to this node and S_0 means the root node. In each node where player A is on turn, he has to select a card to play in each of his acceptable configurations (those that do not imply breaking the rules). Let C_{A,i}, C_{B,j} be acceptable configurations of player A and player B of index i and j in a given situation, let m_k be an acceptable move in a given situation, and let P(X) and P(X|Y) be the probability of X and the conditional probability of X given Y. Then algorithm 2.1 (based on the minimax algorithm) runs as follows:
1. At the beginning player B is on turn and P(C_{B,i}|S_0) is known for each i. Each time player B is on turn (in S_k) we know P(C_{B,i}|S_k) and we know P(m_m|C_{B,i} & S_k) for each m and i; we then compute P(C_{B,i}|S_k + m_m) using the Bayes rule. If player A is on turn and plays m_n, then we set P(C_{B,i}|S_k + m_n) = P(C_{B,i}|S_k). Thus we know P(C_{B,i}) during the whole game.
2. After the end of the game, there is only one C_{A,i} and C_{B,j}. The outcome for them is obvious: we set outcome(C_{A,i}, S_l), assuming the cards S_l were played.
3. Knowing the outcome for each node n moves from the end of the game, for each node S_k that is n + 1 moves from the end of the game: If player A is on turn, we select the action bringing the best outcome, and thus we set outcome(C_{A,j}, S_k) = max_n outcome(C_{A,j}, S_k + m_n). In case of equality we choose at random among all moves with the best outcome. If player B is on turn, we know P(C_{B,l}|S_k) for each l and we know his strategy, so we simply compute outcome(C_{A,j}, S_k) = Σ_n outcome(C_{A,j}, S_k + m_n) P(m_n|S_k), where P(m_n|S_k) = Σ_q P(m_n|C_{B,q} & S_k) P(C_{B,q}|S_k).
4. To obtain a valid strategy we have to define player A's actions in acceptable nodes where P(C_{B,i}|S_k) = 0 for each i. In such a situation we assume player B made a mistake. We then assume player B still plays his strategy and made the minimal possible number of mistakes. We are then able to compute P(C_{B,i}|S_k) and apply item 3.
Algorithm 2.1 runs in time linear in the size of the game tree [1] and finds an optimal strategy for player A against a given strategy of player B [1]. The construction of the strategy for player B is similar. Traversing the whole game tree in each iteration of the Brown-Robinson method has several weaknesses. It evaluates subtrees of nodes that we can prove to have the same outcome each time. It also evaluates subtrees of nodes which are dominated in every C_{A,i} or C_{B,j}.
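Step 1 is an ordinary Bayesian filtering update over the opponent's configurations; a minimal sketch (with invented numbers standing in for P(m|C_{B,i} & S_k), which would come from B's known strategy) is:

```python
def bayes_update(p_config, p_move_given_config, move):
    """P(C_i | S+m) is proportional to P(m | C_i & S) * P(C_i | S)."""
    post = {c: p_move_given_config[c].get(move, 0.0) * p
            for c, p in p_config.items()}
    z = sum(post.values())
    return {c: v / z for c, v in post.items()} if z else post

p_config = {"C1": 0.5, "C2": 0.5}                    # priors P(C_i | S_k)
p_move = {"C1": {"spade_ace": 0.8}, "C2": {"spade_ace": 0.2}}
print(bayes_update(p_config, p_move, "spade_ace"))   # {'C1': 0.8, 'C2': 0.2}
```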
To handle this we construct algorithm 2.2, which modifies the alpha-beta technique (of games of perfect information) and can effectively reduce the whole game tree, and thus the time for each iteration of the Brown-Robinson method and algorithm 2.1. Treating Bridge as a zero-sum game, for each node S_i that will not be deleted let α(S_i) = min_j outcome(C_{A,j}, S_i), the minimal guaranteed outcome of player A, and let β(S_i) = max_j outcome(C_{A,j}, S_i), the maximal possible outcome. For each node in general let α_o(S_i) be the minimal public (known to all players) guaranteed outcome of player A regardless of his future play, and let β_o(S_i) be the maximal possible one; let α_p(S_i) and β_p(S_i) be the values propagated to node S_i. Let α_h(S_i) be an estimate of α(S_i) satisfying α_o(S_i) ≤ α_h(S_i) ≤ α(S_i); using the property of sequential outcome, we can see it as the number of tricks we can take right away regardless of player B's play. Let β_h(S_i) be an estimate of β(S_i) satisfying β_o(S_i) ≥ β_h(S_i) ≥ β(S_i). Then algorithm 2.2 runs as follows:
1. Set α_p(S_0) = α_o(S_0) and β_p(S_0) = β_o(S_0).
2. If player A is on turn in node S_i, compute α_h(S_i) = min_{C_{A,j}} α_h(S_i & C_{A,j}), i.e. the estimate of the α(S_i) value. If α_h(S_i) > min(β_p(S_i), β_o(S_i)), mark this node as CUT, estimate α(S_i) as α_h(S_i) and β(S_i) as β_o(S_i), and do not traverse the subtree. If α_h(S_i) = β_o(S_i), mark this node as SAME, save the value and do not traverse the subtree (we are able to compute the strategy fast). Otherwise set α_p(S_i + m_k) = max(α_p(S_i), α_h(S_i)) and β_p(S_i + m_k) = min(β_p(S_i), β_o(S_i)) for all m_k in S_i. If player B is on turn in node S_i, compute β_h(S_i) = max_{C_{B,j}} β_h(S_i & C_{B,j}). If β_h(S_i) < max(α_p(S_i), α_o(S_i)), mark this node as CUT, estimate β(S_i) as β_h(S_i) and α(S_i) as α_o(S_i), and do not traverse the subtree. If β_h(S_i) = α_o(S_i), mark this node as SAME, save the value and do not traverse the subtree. Otherwise set β_p(S_i + m_k) = min(β_p(S_i), β_h(S_i)) and α_p(S_i + m_k) = max(α_p(S_i), α_o(S_i)) for all m_k in S_i.
3. In terminal nodes set α(S_i) = β(S_i) = outcome(C_{A,j}, S_i) for the remaining C_{A,j}.
4. If player A is on turn in node S_i and α(S_i + m_k) and β(S_i + m_k) are evaluated for all m_k, compute α(S_i) = min_{C_{A,j}} max_{m_{l,C_{A,j}}} α(S_i + m_{l,C_{A,j}}), where m_{l,C_{A,j}} runs over all acceptable moves in C_{A,j}. If there exists a move m_m where β(S_i + m_m) < α(S_i), then delete this node and its subtree. Then compute β(S_i) = max_{m_n} β(S_i + m_n), where m_n runs over all acceptable direct subnodes (excluding deleted ones). If α(S_i) = β(S_i), mark this node as SAME, save the value and delete its subtree. If player B is on turn in node S_i and α(S_i + m_k) and β(S_i + m_k) are evaluated for all m_k, compute β(S_i) = max_{C_{B,o}} min_{m_{l,C_{B,o}}} β(S_i + m_{l,C_{B,o}}), where m_{l,C_{B,o}} runs over all acceptable moves in C_{B,o}. If there exists a move m_m where α(S_i + m_m) > β(S_i), then delete this node and its subtree. Then compute α(S_i) = min_{m_n} α(S_i + m_n), where m_n runs over all acceptable direct subnodes (excluding deleted ones). If β(S_i) = α(S_i), mark this node as SAME, save the value and delete its subtree.
We could go further and delete configurations in nodes where a better move exists; however, it would add overhead to remember them. Each equilibrium we find on a game tree reduced by algorithm 2.2 can be transformed to an equilibrium of the original game tree [2]. The running time of this algorithm is no worse than linear in the size of the game tree (when items 1 and 2 are skipped).
However, the computation of α_h(S_i) and β_h(S_i) in games of sequential outcome seems to be very fast (counting tricks won without a loss of initiative) and saves additional time. The following lemma allows us to stop the iterative process and find an ε-optimal strategy (proof in [1]): Let s_A, s_B be strategies of player A and player B, let o_A, o_B be optimal strategies of player A and player B against s_B and s_A, and let |outcome(s_A, o_B) − outcome(s_A, s_B)| < ε_A and |outcome(o_A, s_B) − outcome(s_A, s_B)| < ε_B; then the outcome of player A (and player B) will differ by no more than ε_A + ε_B compared to equilibrium strategies. A more detailed view and a C++ implementation of the method and algorithms are also presented in [1]. The reader can find there a detailed discussion of data structures and an extension of the algorithms to a game with more independent players.
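The lemma is precisely the stopping rule one would wrap around the Brown-Robinson iteration; the following sketch runs it on a toy zero-sum matrix game (matching pennies, not Bridge) and stops once both best responses improve on the current averages by less than ε:

```python
import numpy as np

M = np.array([[1.0, -1.0], [-1.0, 1.0]])   # row player's payoffs

def fictitious_play(M, eps=1e-2, max_iter=100000):
    n, m = M.shape
    a_counts, b_counts = np.zeros(n), np.zeros(m)
    a_counts[0] += 1
    b_counts[0] += 1
    for _ in range(max_iter):
        a_avg = a_counts / a_counts.sum()
        b_avg = b_counts / b_counts.sum()
        val = a_avg @ M @ b_avg
        eps_a = (M @ b_avg).max() - val      # gain of A's best response
        eps_b = val - (a_avg @ M).min()      # gain of B's best response
        if eps_a < eps and eps_b < eps:      # the lemma's stopping rule
            break
        a_counts[np.argmax(M @ b_avg)] += 1  # best pure reply of A
        b_counts[np.argmin(a_avg @ M)] += 1  # best pure reply of B
    return a_avg, b_avg, val

print(fictitious_play(M))   # averages approach ((0.5, 0.5), (0.5, 0.5), 0)
```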
3 GAMES OF SEQUENTIAL OUTCOME
Depending on the construction of the algorithms, we define the class of games that are most suitable for the presented method as games meeting all of the following: It is a finite zero-sum game. It is a sequential game of perfect recall. It consists of three phases. In the first phase each player receives a finite set of private signals from a finite set of signals, and it is given which players share information about their private signals. The other phases consist of a number of rounds. Each player plays once per round, and the players take turns in a known order. The player on turn announces a public signal (to all players) and private signals (to given players). The set of available signals depends on all (public and private) signals received before. In the third phase, in each round, each player also announces at least one public signal bringing new information about the private signals received at the beginning. The guaranteed outcome of the players depends on all signals announced before, and in the third phase of the game it is a monotonic function which rises for at least one player after each round. After the end of the game no private information remains.
Figure 1. We computed 100 iterations on a 6.27% sample of all distributions in a 16 card reduced variant of Bridge, which took 487 hours on a Pentium 2GHz with 1GB RAM, with ε between 0 and 0.09. The figure shows the ordered difference in the trick outcome of strategies played with imperfect information of player B's configurations against the outcome of strategies with perfect knowledge (logarithmic scale on the y axis). Other results and examples of precomputed strategies can be found in [2, 1].
REFERENCES
[1] P. Cejnar, Bridge - Computing Optimal Strategies, Master's thesis, Faculty of Mathematics and Physics, Charles University, Prague, 2008. Supervisor V. Majerech.
[2] P. Cejnar, ‘Computing ε-Optimal Strategies in Bridge and Other Games of Sequential Outcome (extended version)’, http://kti.mff.cuni.cz/~cejnar/papers/ECAI2008extended.pdf, 2008.
[3] A. Gilpin and T. Sandholm, ‘Finding Equilibria in Large Sequential Games of Imperfect Information’, in Proc. of the ACM Conference on Electronic Commerce (EC’06), Ann Arbor, MI, (2006).
[4] J. Robinson, ‘An Iterative Method of Solving a Game’, Annals of Mathematics, 54, (1951).
2. Machine Learning
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-757
Classifier Combination Using a Class-indifferent Method Yaxin Bi1 and Shenli Wu 1 and Pang Xiong2 and Xuhui Shen 2 Abstract. In this paper we present a novel approach to combining classifiers in the Dempster-Shafer theory framework. This approach models each output given by classifiers as a list of ranked decisions (classes), which is partitioned into a new evidence structure called a triplet. Resulting triplets are then combined by Dempster’s rule. With a triplet, its first subset contains a decision corresponding to the largest numeric value of classes, the second subset corresponds to the second largest numeric value and the third subset represents uncertainty information in determining the support for the former two decisions. We carry out a comparative analysis with the combination methods of majority voting, stacking and boosting on the UCI benchmark data to demonstrate the advantage of our approach.
1
INTRODUCTION
The choice or design of a method for combining classifier decisions is a challenging task in ensemble learning, and various methods have been developed in the past decades. Kuncheva in [2] roughly characterizes combination methods, based on the forms of classifier outputs, into two categories. The first category is that in which the combination of decisions is performed on single class labels, such as majority voting [1] and Bayesian probability [8]. The second category is concerned with the utilization of continuous values (probabilities) corresponding to class labels. One typical method, often called a class-aligned method, is based on the same classes from different classifiers in calculating the support for classes. This group includes meta-learning, i.e. stacking, where combining functions are learnt from continuous values of class labels [4], as well as the linear sum and order statistics (mean, minimum and maximum) [6]. An alternative group of methods, called class-indifferent methods, make use of as much information as possible obtained from single classes and sets of classes in calculating the support for each class [2]. Formally, suppose we are given a classifier ϕ and a new instance d; the classification task is to decide, using ϕ, whether instance d belongs to class ci ∈ C. Instead of a single-class assignment, the classifier output can be denoted by ϕ(d) = {s1, · · · , s|C|}, where si is a numeric value that can be regarded as a class-conditional probability (posterior probability). Given an ensemble of classifiers ϕ1, ϕ2, · · · , ϕM, all classifier outputs can be organized into a matrix called a decision profile, depicted in Figure 1. Based on the decision profile, class-aligned methods calculate the support for class cj using only the DP(d)'s jth column, i.e. s1j, s2j, · · · , sMj, regardless of what the support for the other classes is. In contrast, class-indifferent methods use an entire decision
1 School of Computing and Mathematics, University of Ulster, Co. Antrim, BT37 0QB, UK, email: {y.bi, s.wu1}@ulster.ac.uk
2 Institute of Earthquake Science, China Earthquake Administration, Beijing, 100036, China
Figure 1. A decision profile for instance d generated by ϕ1 (d), ϕ2 (d), · · · , ϕM (d)
profile as a set of intermediate feature vectors to constrain a class decision, such as computing a covariance matrix for some classes [2]. In this study, we consider a class-indifferent method based on the Dempster-Shafer theory of evidence [5], which is slightly different from the one above. We do not use an entire decision profile to compute the degrees of support for every class. Instead we select 2 classes from each ϕi(d) according to their numeric values and restructure them into a new list composed of three subsets of C, represented by the novel evidence structure of a triplet. For each triplet, its first subset contains the class with the largest value, the second contains the class with the second largest value, and the third one is the whole set C. In this way, the decision profile in Figure 1 is restructured into a triplet decision profile, shown in Figure 2, where each column no longer corresponds to the same class. The degree of support for each class is computed through combining all triplets in a decision profile by Dempster's rule of combination [5].
Figure 2. A triplet decision profile for instance d derived from DP (d) in Figure 1
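The following Python sketch illustrates the triplet construction and Dempster's rule over such triplets; turning the two largest outputs directly into masses is a simplification of the paper's evidence structure, and the scores are invented:

```python
from itertools import product

FRAME = frozenset({"c1", "c2", "c3"})

def triplet(scores):
    """Keep the two top-scored classes as focal singletons; the
    remaining mass goes to the whole frame, modelling uncertainty."""
    first, second = sorted(scores, key=scores.get, reverse=True)[:2]
    return {frozenset({first}): scores[first],
            frozenset({second}): scores[second],
            FRAME: 1.0 - scores[first] - scores[second]}

def dempster(m1, m2):
    """Dempster's rule: combine intersecting focal elements and
    renormalize by the total non-conflicting mass."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        if a & b:
            combined[a & b] = combined.get(a & b, 0.0) + x * y
        else:
            conflict += x * y
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

m1 = triplet({"c1": 0.6, "c2": 0.3, "c3": 0.1})
m2 = triplet({"c1": 0.5, "c2": 0.1, "c3": 0.4})
print(dempster(m1, m2))   # mass concentrates on frozenset({'c1'})
```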
As an example, we consider a case where there is a five classifier ensemble for a three class problem as shown in Figure 3. This figure presents the classifier outputs for a given input d from the ensemble and combined results using different rules. The winning class for each combination rule is shown in bold. It can be seen that different classes win for different combination rules. For example, class
1 wins four of the combination rules, class 2 wins three and class 3 wins only one rule. In particular, when the class-aligned methods, the sum rule and the mean rule, cannot distinguish between classes 1 and 2, Dempster's rule is able to make a distinction between them by taking account of the support of the other classes. This demonstrates the advantage of the class-indifferent method.
Figure 3. Example of class-aligned methods and class-indifferent methods
2 EXPERIMENTAL EVALUATION
To evaluate our method, we used thirteen data sets downloaded from the UCI machine learning repository, including anneal, audiology, balance, car, glass, autos, iris, letter, heart, segment, soybean, wine and zoo. For individual classifiers, we used thirteen learning algorithms, including AOD, NaiveBayes, SMO, IB1, IBk, KStar, DecisionStump, J48, RandomForest, DecisionTable, JRip, NNge and PART, all of which were taken from the Waikato Environment for Knowledge Analysis (Weka) version 3.4. For the meta classifiers (stacking), we chose multi-response linear regression (MLR), and we also chose AdaBoostingM1 to compare with our method. Parameters used for each algorithm were at the Weka default settings [7]. Six groups of experiments are reported here. These include 1) assessing all the algorithms; 2) combining the individual classifiers using DS; 3) combining the individual classifiers using MV; 4) combining J48, NaiveBayes, MLR and KStar by MLR [4]; 5) combining the best, the second best and the third best individual classifiers (SMO, IBk and NNge) by MLR; and 6) experimenting with AdaBoostingM1 where the best individual classifier SMO is used as the base classifier.
To compare the classification accuracies between the individual classifiers and the combined classifiers across all the data sets, we employed the ranking statistics in terms of the win/draw/loose record [3]. The win/draw/loose record presents three values: the number of data sets for which classifier A obtained better, equal, or worse results than classifier B with respect to classification accuracy. Classification accuracies were measured by the averaged F-measure [7]. Six groups of experimental results are summarized in Table 1.

Table 1. Accuracies of the best INDIVIDUAL classifier, best combined classifiers based on TRIPLET using DS and MV, along with MLRs (STACK1, 2 correspond to the settings (5) and (6)) and AdaBoostingM1 (BOOSTING corresponds to setting (7)) over the thirteen data sets

Dataset          Individual  Triplet  MV      Boosting  Stack1  Stack2
Anneal           80.23       81.57    81.14   77.35     72.77   75.34
Audiology        48.67       57.44    54.30   45.16     32.89   32.19
Balance          65.67       63.17    62.72   93.17     62.73   68.49
Car              89.62       94.29    91.75   92.60     86.18   90.03
Glass            65.36       66.81    66.69   65.97     58.41   57.77
Autos            77.59       79.28    77.94   77.32     75.34   77.32
Iris             95.33       96.67    96.67   98.00     94.67   94.00
Letter           92.05       92.91    92.77   92.53     92.03   92.53
Cleveland        35.48       37.09    34.37   31.91     35.13   31.87
Segment          96.69       97.35    96.55   96.57     96.59   95.85
Soybean          95.89       96.88    96.17   95.50     95.25   95.20
Wine             98.90       100.00   98.97   98.38     98.90   98.32
Zoo              90.62       93.61    93.61   89.43     82.57   83.64
Average          79.39       81.30    80.28   81.07     75.65   76.35
Win/Draw/Loose   -           12/0/1   10/0/3  5/0/8     0/1/12  3/0/10
Significant win  -           7        4       3         0       0

The bottom of the table provides summary statistics comparing the performance of the best individual classifiers with the best combined classifiers across the data sets. It can be observed that the accuracy of the combined classifiers based on the triplet structure using DS is better than the five others on average. It has more wins relative to losses than the best combined classifiers using MV, boosting and stacking, compared with the best individual classifiers. This observation is further supported by the statistically significant wins, in which the triplet has three more wins than MV, four more wins than AdaBoostingM1, and seven more wins than MLR.
REFERENCES
[1] R.P.W. Duin and D.M.J. Tax, ‘Experiments with classifier combining rules’, in Multiple Classifier Systems, J. Kittler and F. Roli, eds, pp. 16–29, (2000).
[2] L. Kuncheva, ‘Combining classifiers: Soft computing solutions’, in Pattern Recognition: From Classical to Modern Approaches, S.K. Pal and A. Pal (eds), pp. 427–451, (2001).
[3] P. Melville and R.J. Mooney, ‘Constructing diverse classifier ensembles using artificial training examples’, in Proc. of IJCAI-2003, pp. 405–510, (2003).
[4] A.K. Seewald, ‘How to make stacking better and faster while also taking care of an unknown weakness’, in Proceedings of ICML’02, pp. 554–561, (2002).
[5] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, New Jersey, 1st edition, 1976.
[6] K. Tumer and J. Ghosh, ‘On combining classifiers’, Pattern Analysis and Applications, 6(1), 41–46, (2002).
[7] I.H. Witten and E. Frank, Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann, 2nd edition, 2005.
[8] L. Xu, A. Krzyzak, and C.Y. Suen, ‘Several methods for combining multiple classifiers and their applications in handwritten character recognition’, IEEE Trans. on System, Man and Cybernetics, 2(3), 418–435, (1992).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-759
Reinforcement Learning with Classifier Selection for Focused Crawling Ioannis Partalas1, Georgios Paliouras2, Ioannis Vlahavas1 Abstract. Focused crawlers are programs that wander in the Web, using its graph structure, and gather pages that belong to a specific topic. The most critical task in Focused Crawling is the scoring of the URLs as it designates the path that the crawler will follow, and thus its effectiveness. In this paper we propose a novel scheme for assigning scores to the URLs, based on the Reinforcement Learning (RL) framework. The proposed approach learns to select the best classifier for ordering the URLs. This formulation reduces the size of the search space for the RL method and makes the problem tractable. We evaluate the proposed approach on-line on a number of topics, which offers a realistic view of its performance, comparing it also with a RL method and a simple but effective classifier-based crawler. The results demonstrate the strength of the proposed approach.
1 Introduction In this paper we propose a novel adaptive focused crawler that is based on the RL framework [5]. More specifically, RL is employed for selecting an appropriate classifier that will in turn evaluate the links that the crawler must follow. The introduction of link classifiers reduces the size of the search space for the RL method and makes the problem tractable. We evaluate the proposed approach on a number of topics, comparing it with an RL approach from the literature and a classifier-based crawler. The results demonstrate the robustness and the efficiency of the proposed approach.
2 Reinforcement Learning with Classifier Selection In this work we propose an adaptive approach, dubbed Reinforcement Learning with Classifier Selection (RLwCS), to evaluate URLs, based on the RL framework. RLwCS maintains a pool of classifiers, H = {h1 , . . . , hk }, that can be used for URL evaluation, and seeks a policy for selecting the best classifier, ht , for a page to perform the evaluation task. In other words, the crawler must select dynamically a classifier for each page, according to the characteristics of the page. We solve this problem using an RL approach. In our case, there are just two classes, as a URL or page can be relevant or not to a specific topic. We represent the problem of selecting a classifier for evaluating the URLs, as an RL process. The state is defined as the page that is currently retrieved by the agent, on the basis that the perception of the environment arises mainly by the pages retrieved at any given time. Actions are the different classifiers, ht ∈ H. We add an extra 1 2
action which is denoted as S and combines the classifiers in a majority scheme. The set of actions is thus H ∪ {S}. The state transitions are deterministic, as the probability of moving to a page when selecting a classifier for evaluation is equal to 1. The selected classifier is the one that scores the URLs of a visited page. More specifically, each URL receives the classifier's score that it belongs to the relevant class. The reward for selecting a classifier depends on the relevance of the page that the crawler visits. If the page is relevant, the reward is 1; otherwise the reward is 0. Thus, we seek to find an optimal policy for mapping pages to classifiers in order to maximize the accumulated reward received over time. The mechanism that is used for training the RL module is the Q-learning algorithm [6]. Q-learning finds an optimal policy based on the action-value function Q(s, a). The Q function expresses the benefit of following the action a when in state s. In our case the value of selecting a classifier in a specific page is associated with the expected relevance of the next page (state) that the crawler will fetch. Next, we need to define the features that will be used to represent both the states and the actions. Based on the literature of focused crawling we chose the following features to represent a state-action pair:
• Relevance score of a page with respect to the specific domain.
• Relevance score of the page, computed by the selected classifier (action).
• Average relevance score of the parents of the page that is crawled.
• Hub score.
We employ function approximation to tackle the problem of the large state-action space. A well-known method is the combination of Q-learning with eligibility traces, Q(λ), and gradient descent function approximation [5]. Additionally, linear methods are used to approximate and represent the value function. Further details about the function approximation algorithm that we used can be found in [5].
1 Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece, email: {partalas,vlahavas}@csd.auth.gr
2 Institute of Informatics and Telecommunications, National Centre for Scientific Research ”Demokritos”, email: paliourg@iit.demokritos.gr
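A minimal sketch of this machinery, i.e. linear Q-learning over the four features above (omitting the eligibility traces of Q(λ); feature values and constants are placeholders, not the authors' code), is:

```python
import numpy as np

N_FEATURES = 4   # page relevance, classifier score, parent relevance, hub

def q_value(w, phi):
    return float(w @ phi)

def q_update(w, phi, reward, next_best_q, alpha=0.1, gamma=0.9):
    """Gradient-descent TD update: w += alpha * delta * grad_w Q(s, a),
    where grad_w Q is simply phi for a linear approximator."""
    delta = reward + gamma * next_best_q - q_value(w, phi)
    return w + alpha * delta * phi

w = np.zeros(N_FEATURES)
phi = np.array([0.7, 0.9, 0.4, 0.2])   # features of one (page, classifier)
w = q_update(w, phi, reward=1.0, next_best_q=0.0)  # fetched page relevant
print(w)
```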
3 Experimental Setup We constructed a number of topic-specific datasets following the procedure that is described in [2]. Table 1 shows the topics that we selected for experimentation3 . For each topic URL we downloaded the corresponding page and constructed the instances based on the textual information. More specifically, for each document downloaded we produced the TF-IDF vectors using the weighted scheme proposed by Salton and Buckley [3]. Each instance of the on-topic and off-topic documents is named relevant or irrelevant respectively.4 3 4
3 http://dmoz.org
4 The datasets created are available at http://mlkd.csd.auth.gr/fcrawling.html
Table 1. ODP topics.
Topic                                                          Number of URLs
Shopping/Auctions/Antiques and Collectibles/                               62
Health/Medicine/Osteopathy/                                               166
Games/Video Games/Puzzle/Tetris-like/                                      72
News/Weather/Air Quality/                                                 114
Science/Astronomy/Amateur/Astrophotography and CCD Imaging/               196
Health/Medicine/Informatics/Telemedicine/                                  64
Sports/Winter Sports/Snowboarding/                                        179
Sports/Hockey/Ice Hockey/                                                 239
Arts/Literature/Periods and Movements/                                    275
Health/Alternative/Aromatherapy/                                          103
After creating the set of relevant and irrelevant instances we train the classifiers for each topic; these will form the action set for RLwCS, with the addition of the extra action that combines the opinions of the classifiers using the majority scheme. For an instance x the output of the majority scheme is $S(x) = \arg\max_{c_j} \sum_{m=1}^{k} h_m(x, c_j)$, where $h_m$ outputs a probability for each class $c_j$, $j = 1, \ldots, n$. We trained four classifiers using the WEKA machine learning library [8]:
• Neural network (NN): 16 hidden nodes and learning rate 0.4.
• Support vector machine (SVM): polynomial kernel with degree 1.
• Naive Bayes (NB): with kernel estimation.
• Decision tree (DT): with Laplace smoothing and reduced error pruning.
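The extra majority action S can be sketched directly from its definition above. The snippet assumes each trained classifier exposes a predict_proba-style method returning a class probability distribution (the WEKA counterpart would be distributionForInstance); that interface is an assumption for illustration, not the authors' code.

```python
import numpy as np

def majority_action(classifiers, x):
    # S(x) = argmax_{c_j} sum_{m=1..k} h_m(x, c_j): sum the per-class
    # probability distributions of all classifiers and pick the best class.
    summed = sum(np.asarray(clf.predict_proba([x])[0]) for clf in classifiers)
    return int(np.argmax(summed))
```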
The proposed approach, RLwCS, is compared with a base crawler that uses an SVM to assign scores to the URLs, and with the Temporal Difference Focused Crawling (TD-FC) method [1]. The experiments for the crawlers are performed on-line in order to obtain a realistic estimate of their performance. We must note here that the majority of the approaches reported in the literature conducted their experiments offline in a managed environment. The online evaluation on a variety of topics allows us to perform more accurate statistical tests in order to detect significant differences in the performances of the crawlers. For the purposes of evaluation, we used two metrics analogous to the well-known precision and recall, namely harvest rate and target recall [4].
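A minimal sketch of the two metrics follows, assuming that the set of relevant pages (for harvest rate) and the set of known on-topic target pages (for target recall) are available for each topic, as in the evaluation framework of [4].

```python
def harvest_rate(crawled, relevant):
    # Fraction of crawled pages that are relevant to the topic.
    # crawled: list of URLs in crawl order; relevant: set of relevant URLs.
    return sum(url in relevant for url in crawled) / len(crawled)

def target_recall(crawled, targets):
    # Fraction of the known target set that has been retrieved so far.
    return len(set(crawled) & targets) / len(targets)
```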
Figure 1. Average harvest rate (a) and average target recall (b) for BFS, RLwCS and TD-FC, plotted against the number of crawled pages.
4 Results and Discussion
Figure 1(a) presents the average harvest rate of each algorithm over all topics, against the number of crawled pages. We first notice that RLwCS clearly outperforms both BFS and TD-FC, as it manages to collect more relevant pages. In order to investigate whether the performance differences between RLwCS and the other two algorithms are significant, we use the Wilcoxon signed rank test [7]. We performed two tests, one for each paired comparison of RLwCS with each of the other algorithms on each topic, at a confidence level of 95%. The test was performed at various points during the crawling process, more specifically every 200 crawled pages. The test found that RLwCS is significantly better than all the other algorithms during the whole crawling process (200 to 3000 pages) on all topics. Another interesting observation is the fact that the proposed approach achieves a high harvest rate in the first 200 pages, which is a strong advantage in on-line crawling tasks where the crawler must gather relevant pages in a small time frame and with a small number of visited pages. Figure 1(b) shows the target recall curves for the competing algorithms, averaged across all topics. We notice again that the proposed approach obtains the highest values during the crawling process and outperforms the other two methods. Wilcoxon tests at a confidence level of 95% report significant differences only after the first 600 pages have been crawled. This is again a very encouraging result for the proposed approach.

5 Conclusions
In this paper we presented a novel Focused Crawling approach, named RLwCS, which is based on the Reinforcement Learning framework. The crawler learns to select an appropriate classifier for ordering the URLs of each Web page that it visits. We compared the proposed approach with the well-known Best-First Search crawler and a pure RL approach, on a number of topic-specific datasets. The crawlers were tested on-line, in order to obtain realistic measurements of their performance. The analysis of the results led to several interesting conclusions. The proposed approach manages to achieve good performance, outperforming BFS, which is considered in the literature to be a very effective crawler.

Acknowledgments
We would like to thank Ioannis Katakis for providing us with the source code for text processing and Michalis Lagoudakis for interesting discussions that led to this work. This work is partly funded by the Greek General Secretariat for Research and Technology, project Regional Innovation Pole of Central Macedonia.
REFERENCES
[1] A. Grigoriadis and G. Paliouras, 'Focused crawling using temporal difference-learning', in Proc. 3rd Hellenic Conference on Artificial Intelligence, pp. 142-153, (2004).
[2] Gautam Pant and Padmini Srinivasan, 'Learning to crawl: Comparing classification schemes', ACM Transactions on Information Systems, 23(4), 430-462, (2005).
[3] Gerard Salton and Christopher Buckley, 'Term-weighting approaches in automatic text retrieval', Information Processing and Management, 24(5), 513-523, (1988).
[4] P. Srinivasan, F. Menczer, and G. Pant, 'A general evaluation framework for topical crawlers', Information Retrieval, 8(3), 417-447, (2005).
[5] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1999.
[6] C.J. Watkins and P. Dayan, 'Q-learning', Machine Learning, 8, 279-292, (1992).
[7] F. Wilcoxon, 'Individual comparisons by ranking methods', Biometrics, 1, 80-83, (1945).
[8] I. H. Witten and E. Frank, Data Mining: Practical machine learning tools and techniques, 2nd Edition, Morgan Kaufmann, 2005.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-761
Intuitive Action Set Formation in Learning Classifier Systems with Memory Registers
L. Simões and M.C. Schut and E. Haasdijk1
Abstract. An important design goal in Learning Classifier Systems (LCS) is to equally reinforce those classifiers which cause the level of reward supplied by the environment. In this paper, we propose a new method for action set formation in LCS. When applied to a Zeroth Level Classifier System with Memory registers (ZCSM), our method allows the distribution of rewards among classifiers which result in the same memory state, rather than those encoding the same memory update action.
1 INTRODUCTION
This paper introduces a new method for action set formation (asf) in Learning Classifier Systems, and tests it in partially observable environments requiring memory. The operation of asf is responsible for choosing the classifiers that will receive the reward supplied by the environment for some performed action. When new classifiers are generated, the system has no way of knowing how good they are. Their strengths depend on the actions they take in the contexts under which they trigger, and on the other classifiers in the population with which they interact. As classifiers are added to the population, they are assigned an initial strength value. Then, by repeated usage, the strength update component gradually converges towards a better estimate of their qualities. But since the system has to perform at the same time as it is building its rule base, it is forced to act despite its uncertainty about the environment, selecting from among an ever-changing population of insufficiently tested classifiers. The method introduced here, iasf, eliminates some of the noise to which the quality estimation component is subjected, with the goal of improving system performance.
2 BACKGROUND
In the mid-1990s, Wilson [7] proposed ZCS as a simplification of Holland's original LCS [3]. Most importantly, he left out the message list which acted as memory in the original system. Thus, Wilson's models had no way of remembering previously encountered states and could not perform optimally in partially observable environments, where an agent can find itself in a state that is indistinguishable from another state even though the best action to undertake is not necessarily the same in both states. Wilson proposed [7] a solution for this problem in the form of memory registers to extend the classifiers. Cliff & Ross [2] follow this suggestion and implement ZCSM, extending ZCS with a memory mechanism. In their experiments they observed that ZCSM can efficiently exploit memory in partially observable environments.
Stone & Bull extensively compared ZCS to the more popular XCS in noisy, continuous-valued environments [6] and found that what makes XCS so good in deterministic environments (namely, its attempt to build a complete, maximally accurate and maximally general map of the payoff landscape) becomes a disadvantage as the level of noise in the environment increases. ZCS's partial map, focusing on high-rewarding niches in the payoff landscape, then becomes an advantage. This suggests ZCS as an adaptive control mechanism in multi-step, partially observable, stochastic real-world problems.
1 Department of Computer Science, Faculty of Sciences, VU University, Amsterdam, The Netherlands, email: {lfms, mc.schut, e.haasdijk}@few.vu.nl
3 INTUITIVE ACTION SET FORMATION
ZCS works on a population P of rules which together present a solution to the problem with which the system is faced. As it interacts with the environment, the system is triggered on reception of a sensory input. A match set M is then formed with all the rules in the population matching that input. From this set, a classifier is chosen by proportionate selection based on its strength, and its action is executed. With memory added as described in [2], rules prescribe an external action as well as a modification of the memory bits. It can be argued that the core of ZCS lies in the next stage, reinforcement, as it is responsible for incrementally learning the quality of the rules in the population, which will in turn determine the system's behaviour. The action set A includes those rules in M that advocated the same action as the chosen classifier. The rules in this action set share in the reward that results from the selected action (with the rationale that choosing any of those rules would have had the same effect). Rules in M that advocate a different action are penalised. Traditionally, A consists of those rules in M that match on a bitwise comparison with the action-part of the chosen classifier. Now, consider ZCSM, where operators on the memory state are added to the action part of the rules. Suppose, then, a situation where the memory state was 01, and remains the same after execution of some chosen classifier c, which advocated2 [0#]. Traditional action set formation would then have A include only those classifiers from M advocating this same memory operation ("set the first memory register to 0") as well as the same external action as the chosen classifier. However, all of the internal actions {##,#1,01} would result in exactly the same internal state. Not only would the system not reward any classifier in M having one of those internal actions (and the same external action) as the chosen classifier, it would actually penalise them. This seems to conflict with ZCS's goal of equally rewarding those classifiers which would cause the same level of reward supplied by the environment.
2 Disregarding the external output for simplicity.
Figure 1. Performance comparison in woods101 with 1 memory bit (steps to food vs. number of trials; ZCSM1, ZCSM1iasf, optimum).
This realisation prompted us to introduce a new variant of Cliff & Ross’ classifier system, ZCSMiasf , which compares classifiers based on the memory state which would result from their activation, rather than based on the memory operation. In this more intuitive scheme, any rule in M that prescribes the same external action as c and an internal action that leads to the same memory state (i.e., one of {##,#1,0#,01}) is included in A.
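A small sketch of the iasf rule: rules are grouped by the memory state their internal action would produce, rather than by a bitwise comparison of the memory-action strings. The rule representation below (external action, memory action) and the function names are illustrative assumptions.

```python
def apply_memory_action(mem_state, mem_action):
    # '#' leaves a memory register unchanged; '0'/'1' overwrite it.
    # apply_memory_action("01", "0#") == "01", as in the example above.
    return "".join(a if a != "#" else m
                   for m, a in zip(mem_state, mem_action))

def form_action_set(match_set, chosen, mem_state):
    # match_set: rules as (external_action, memory_action) pairs.
    # iasf: include every rule with the same external action whose
    # internal action leads to the same resulting memory state.
    target = apply_memory_action(mem_state, chosen[1])
    return [rule for rule in match_set
            if rule[0] == chosen[0]
            and apply_memory_action(mem_state, rule[1]) == target]
```

With mem_state "01" and chosen memory action "0#", this admits all of {##, #1, 0#, 01}, exactly the set discussed above.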
4 EXPERIMENTAL ANALYSIS
Experimental Design and Setup – To compare the performance of iasf against regular action set formation, we conducted a series of experiments in the well-known woods101 and woods102 environments [2, 5]. These are mazes where paths towards food locations must be learned; both mazes contain indistinguishable locations where the sensory information (i.e., the layout of the perceivable cells) is identical but the appropriate action differs. To tackle such situations, the agent's controller requires memory to be able to choose the correct action; merely reacting to sensory information cannot suffice. An experiment consists of 10,000 trials where, starting from a random location in the maze, the agent must reach the food. If the agent moves into the cell with food, it receives a reward from the environment and the next trial commences: the food is replaced and the agent is randomly relocated. The agent can see the directly adjacent cells and uses that information to decide on an action: where to move next. Following Bull & Hurst's suggestion, the system is then further tested for an additional 2,000 trials where "the Genetic Algorithm is switched off, reinforcement occurs as usual, and an action selection scheme is used which deterministically picks the action with the largest total fitness in M" [1]. Performance is measured as the moving average over the previous 50 trials of the number of steps it took to reach the food on each trial. See [7, 2] for more detailed descriptions of the experimental setup. We performed experiments with a memory size of 1 in woods101 and 8 in woods102, with Wilson's default parameter set for ZCS [7]. Given the more demanding characteristics of woods102, we used a larger population size (N = 2000) there. Results – Figures 1 and 2 show the results of experiments averaged over 30 runs; the lighter horizontal line shows the optimal average performance for each environment (2.9 steps for woods101 and 3.23 for woods102 [5]). The horizontal axes show the number of trials into the experiment. Analysis – Although the change in asf technique is an intuitive one, and one that fulfils the LCS design goal of equal credit assignment to the classifiers producing the level of reward coming from the environment, no benefit in performance can be gleaned from the results of our experiments. In both cases, ZCSMiasf performed at substantially the same level as traditional ZCSM; only in woods102 can we see some slight (not statistically significant) improvement.
Figure 2. Performance comparison in woods102 with 8 memory bits (steps to food vs. number of trials; ZCSM8, ZCSM8iasf, optimum).
Because this is the more challenging of the two environments [5], this may indicate that performance in more complex environments and tasks can benefit from iasf, but this remains an issue for further investigation.
5 CONCLUSIONS
We have extended the way action sets are formed in classifier systems with memory registers, taking them closer to the design goal of equal credit assignment to the classifiers whose actions cause the level of reward supplied by the environment. We have validated our extension experimentally in partially observable environments using the Zeroth Level Classifier System. The environments on which experiments were performed are well known in the existing literature on the subject. The experiments showed no significant improvement in performance; further investigation is required to see whether such improvement does occur in more complex environments. Still, the current results can be considered valuable, since the new method is more in line with the general design goal of equal credit assignment than the traditional method. In stochastic environments, where the ZCS algorithm has previously been shown to outperform the more widely known XCS [6], rule quality estimation can be expected to take on a more significant role, which leads us to think that our extension will provide more significant benefits in partially observable instances of those problems. Again, further investigations are required to validate this assumption.
REFERENCES [1] Larry Bull and Jacob Hurst, ‘ZCS redux’, Evolutionary Computation, 10(2), 185–205, (2002). [2] Dave Cliff and Susi Ross, ‘Adding temporary memory to zcs’, Adaptive Behavior, 3(2), 101–150, (1994). [3] John H. Holland, ‘Escaping brittleness: the possibilities of generalpurpose learning algorithms applied to parallel rule-based systems’, in Machine learning, an artificial intelligence approach, eds., R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, volume 2, Morgan Kaufmann, (1986). [4] Pier Luca Lanzi, ‘An analysis of the memory mechanism of XCSM’, in Genetic Programming 1998: Proceedings of the Third Annual Conference, eds., John R. Koza, Wolfgang Banzhaf, Kumar Chellapilla, Kalyanmoy Deb, Marco Dorigo, David B. Fogel, Max H. Garzon, David E. Goldberg, Hitoshi Iba, and Rick Riolo, pp. 643–651, San Francisco, CA, USA, (22-25 July 1998). Morgan Kaufmann. [5] Pier Luca Lanzi and Stewart W. Wilson, ‘Toward optimal classifier system performance in non-markov environments’, Evolutionary Computation, 8(4), 393–418, (2000). [6] Christopher Stone and Larry Bull, ‘Comparing XCS and ZCS on noisy continuous-valued environments’, Technical Report UWELCSG05-002, Learning Classifier Systems Group, University of the West of England, Bristol, UK, (2005). [7] Stewart W. Wilson, ‘ZCS: A zeroth level classifier system’, Evolutionary Computation, 2(1), 1–18, (1994).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-763
An Ensemble of Classifiers for Coping with Recurring Contexts in Data Streams
Ioannis Katakis, Grigorios Tsoumakas and Ioannis Vlahavas1
Abstract. This paper proposes a general framework for classifying data streams by exploiting incremental clustering in order to dynamically build and update an ensemble of incremental classifiers. To achieve this, a transformation function that maps batches of examples into a new conceptual feature space is proposed. The clustering algorithm is then applied in order to group different concepts and identify recurring contexts. The ensemble is produced by maintaining a classifier for every concept discovered in the stream2.
1 INTRODUCTION
Recent advances in sensor, storage, processing and communication technologies have enabled the automated recording of data, leading to fast and continuous flows of information, referred to as data streams. The dynamic nature of data streams requires continuous or at least periodic updates of the current knowledge in order to ensure that it always includes the information content of the latest batch of data. This is important in applications where the concept of a target class and/or the data distribution changes over time. This phenomenon is commonly known as concept drift. A very special type of concept drift is that of recurring contexts [5]. In this case, concepts that appeared in the past may recur in the future. Although the phenomenon of reappearing concepts is very common in real-world problems (weather changes, buyer habits, etc.), only a few methods take it into consideration [3-5]. In this paper we propose an ensemble of classifiers that utilizes a new representation model for data streams suitable for problems with recurring contexts.
2 TRANSFORMATION FUNCTION
First, the data stream is separated into a number of small batches of examples. Each batch is transformed into a conceptual vector that is constructed out of a number of conceptual feature sets. Each feature set corresponds to a feature from the initial feature space. Let's assume that unlabeled (U) and labeled (L) examples are represented as vectors $\vec{x}_U = (x_1, x_2, \ldots, x_n)$ and $\vec{x}_L = (x_1, x_2, \ldots, x_n, c_j)$, where $x_i$ is the value of the feature $f_i$, and $c_j \in C$, with $C$ being the set of available classes. Let $B_U$ and $B_L$ be a batch of unlabeled and labeled instances of size $b$:
$$B_U = \{\vec{x}_{U(k)}, \vec{x}_{U(k+1)}, \ldots, \vec{x}_{U(k+b-1)}\}, \qquad B_L = \{\vec{x}_{L(k)}, \vec{x}_{L(k+1)}, \ldots, \vec{x}_{L(k+b-1)}\}$$
1 Department of Informatics, Aristotle University of Thessaloniki, 54124 Greece, email: {katak, greg, vlahavas}@csd.auth.gr
2 The full version of this paper as well as the datasets used for evaluation can be found at: http://mlkd.csd.auth.gr/concept_drift.html
Every batch of examples ($B_L$) is transformed into a conceptual vector $\vec{Z} = (z_1, z_2, \ldots, z_n)$, where the $z_i$ are the conceptual feature sets. For every batch $B_L$ and feature $f_i$ of the original feature space the conceptual feature sets are calculated as follows:
$$z_i = \begin{cases} \{P^v_{i,j} : j = 1..m,\ v \in V_i\} & \text{if } f_i \text{ is nominal} \\ \{(\mu_{i,j}, \sigma_{i,j}) : j = 1..m\} & \text{if } f_i \text{ is numeric} \end{cases}$$
where $P^v_{i,j} = P(f_i = v \mid c_j)$ and $i \in [1, n]$, $j \in [1, m]$, $v \in V_i$, with $V_i$ the set of values of the nominal attribute $f_i$. $P^v_{i,j}$ is taken to be equal to $n_{v,j} / n_j$, where $n_{v,j}$ is the number of samples of class $c_j$ having the value $v$ at attribute $i$ in batch $B_L$ and $n_j$ is the number of samples belonging to $c_j$ in batch $B_L$. For numeric attributes we use the mean ($\mu_{i,j}$) and standard deviation ($\sigma_{i,j}$) of attribute $f_i$ for samples of class $c_j$ in batch $B_L$. The notion behind this representation is that every element of the conceptual vectors expresses to what degree a feature characterizes a certain class. Consequently, the conceptual distance between two batches $B_L^{(\mu)}$ and $B_L^{(\nu)}$ can be defined as the Euclidean distance of the corresponding conceptual vectors:
$$\mathrm{ConDis}(B_L^{(\mu)}, B_L^{(\nu)}) = \mathrm{Euclidean}(\vec{Z}^{(\mu)}, \vec{Z}^{(\nu)}) = \left( \mathrm{dis}(z_1^{(\mu)}, z_1^{(\nu)})^2 + \ldots + \mathrm{dis}(z_n^{(\mu)}, z_n^{(\nu)})^2 \right)^{1/2}$$
where $\mathrm{dis}(z_i^{(\mu)}, z_i^{(\nu)})^2 = (\zeta_{i1}^{(\mu)} - \zeta_{i1}^{(\nu)})^2 + \ldots + (\zeta_{il}^{(\mu)} - \zeta_{il}^{(\nu)})^2$, $\zeta_{ij}^{(\mu)}$ is the $j$-th element of the $i$-th conceptual feature set of the vector $\mu$, and $l$ is the length of the feature set. This mapping procedure tries to ensure that the more conceptually similar two batches are, the closer in distance their corresponding conceptual vectors will be. The definition of this distance will also be beneficial for the clustering algorithm of the framework we present in the following section.
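To make the mapping concrete, here is a minimal sketch for nominal features only; the numeric case (per-class means and standard deviations) is analogous and omitted for brevity. The batch representation and function names are assumptions, not the authors' code.

```python
import math
from collections import defaultdict

def conceptual_vector(batch, n_features, classes, values):
    # batch: iterable of (x, c) pairs with x a tuple of nominal values.
    counts = defaultdict(int)        # (i, v, c) -> n_{v,c}
    class_counts = defaultdict(int)  # c -> n_c
    for x, c in batch:
        class_counts[c] += 1
        for i in range(n_features):
            counts[(i, x[i], c)] += 1
    # One entry P^v_{i,j} = n_{v,j} / n_j per (feature, value, class).
    return [counts[(i, v, c)] / class_counts[c] if class_counts[c] else 0.0
            for i in range(n_features)
            for v in values[i]
            for c in classes]

def con_dis(z_mu, z_nu):
    # ConDis: Euclidean distance between two conceptual vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_mu, z_nu)))
```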
3 THE CCP FRAMEWORK
The main components of the CCP (Conceptual Clustering and Prediction) framework (Fig. 1) are: a) a mapping function (M), that transforms data into conceptual vectors, b) an incremental clustering algorithm (R), that groups conceptual vectors into clusters, and c) an incremental classifier (h) for every concept discovered. The pseudocode of the framework can be seen in Fig. 2. What is maintained as time (t) passes is a set of clusters $G_t = \{g_1, g_2, \ldots, g_q\}$ and a set of corresponding classifiers $H_t = \{h_1, h_2, \ldots, h_q\}$. Classifier $h_i$ is trained from batches that belong conceptually to cluster $g_i$. Initially, $G_0 = \emptyset$, $H_0 = \emptyset$. By classifying the current batch according to the classifier built from the cluster of the previous batch we make a kind of locality assumption: we assume that successive batches (of small size) will most of the time belong to the same concept.
Fig. 1. Clustering conceptual vectors into concepts.

CCP Framework:
begin
  for i = 1 to ∞ do
    Z_{i-1} = M.getConceptualVectorOf(B_L(i-1))
    g′ = R.getClusterOf(Z_{i-1})
    R.update(Z_{i-1})
    h_{g′}.update(B_L(i-1))
    h_{g′}.classify(B_U(i))
end

Fig. 2. The main operation of the CCP framework.
4 EVALUATION
Datasets. The first two datasets (usenet1, usenet2) are based on the 20 newsgroups collection [1]. They simulate a stream of messages from different newsgroups that are sequentially presented to a user, who then labels them as interesting or junk, according to his/her personal interests. Table 1 shows which messages are considered interesting (+) or junk (−) in each time period. The third dataset is based on the Spam Assassin collection and contains both spam and legitimate messages.
Table 1. Datasets Usenet1 and Usenet2: each of the topics medicine, space and baseball is marked as interesting (+) or junk (−) in each time period (0-300, 301-600, 601-900, 901-1200, 1201-1500); the labelling alternates between periods so that earlier concepts recur later in the stream.

Methods. Evaluation involves the following methods. Simple Incremental Classifier (SIC): maintains only one classifier, which incrementally updates its knowledge. Time Window (TW): classifies incoming instances based on the knowledge of the latest N examples. Weighted Examples (WE): consists of an incremental classifier that supports weighted learning; bigger weights are assigned to more recent examples in order to focus on new concepts. An incremental naive Bayes classifier is used as the base classifier for the above methods. Our implementation of the CCP framework includes the mapping function discussed in Section 2, the Leader-Follower algorithm described in [2] as the clustering component, and an incremental naive Bayes classifier. Preliminary experiments showed that a batch size of around 50 instances is appropriate: larger batches invalidate the locality assumption, whereas smaller batches do not suffice for calculating the summary probabilistic statistics. The experiments also include a benchmark version of our framework (dubbed Oracle), where perfect clustering assignments are manually provided to the system. This allows the study of the maximum performance that can be achieved using the CCP framework.
Results. Table 2 shows the results of the experiments on the three datasets. We notice that even a basic implementation of CCP achieves better performance than all the other methods. Fig. 3 shows the average accuracy over fifty instances for the CCP and WE methods on the Usenet1 dataset. Note the sudden dives of WE's accuracy at drift time-points. In all cases, CCP manages to recover much faster from the drift. Most notably, at the last two drift points, CCP recognizes the recurrent theme and remains accurate. Finally, the performance of Oracle strongly underlines the fact that there is room for improvement by using more advanced incremental clustering algorithms.

Table 2. Accuracy of the four methods in the three datasets.
                        Usenet1   Usenet2   spam
Simple Incremental        0.59      0.73     0.75
Time Window (w=100)       0.56      0.60     0.60
Time Window (w=150)       0.59      0.62     0.64
Time Window (w=300)       0.58      0.70     0.62
CCP (Oracle)              0.81      0.80      -
CCP (Leader-Follower)     0.75      0.77     0.93
Weighted Examples         0.67      0.75     0.91
Fig. 3. Average accuracy over 50 instances for WE and CCP.
5 ACKNOWLEDGMENTS
This work was partially supported by a PENED program (EPAN M.8.3.1, No. 03ΕΔ73), jointly funded by the European Union and the Greek Government (General Secretariat of Research and Technology).
REFERENCES
[1] Asuncion, A. and Newman, D.J., UCI Machine Learning Repository. 2007, University of California, School of Information and Computer Science [www.ics.uci.edu/~mlearn/MLRepository.html]: Irvine, CA.
[2] Duda, R.O., Hart, P.E., and Stork, D.G., Pattern Classification. 2000: Wiley-Interscience.
[3] Forman, G., Tackling Concept Drift by Temporal Inductive Transfer. In 29th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2006. Washington, USA: p. 252-259.
[4] Harries, M.B., Sammut, C., and Horn, K., Extracting Hidden Context. Machine Learning, 1998. 32(2): p. 101-126.
[5] Widmer, G. and Kubat, M., Learning in the Presence of Concept Drift and Hidden Contexts. Machine Learning, 1996. 23(1): p. 69-101.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-765
Content-Based Social Network Analysis Paola Velardi† and Roberto Navigli† and Alessandro Cucchiarelli‡ and Mirco Curzi‡ Abstract. Relationships among actors in traditional social network analysis are modelled as a function of the quantity of relations (co-authorships, business relations, friendship, etc.). In contrast, within a business, social or research community, network analysts are interested in the communicative content exchanged by the community members, not merely in the number of relationships. In order to meet this need, this paper presents a novel social network model, in which the actors are not simply represented through the intensity of their mutual relationships, but also through the analysis and evolution of their shared interests. Text mining and clustering techniques are used to capture the content of communication and to identify the most popular topics.
1 SYSTEM DESCRIPTION
This paper presents a model for social network analysis in which, besides analyzing the quantity of relationships (co-authorships, business relations, friendship, etc.), we also analyze their communicative content. Text mining and clustering techniques are used to capture the content of communication and to identify the most popular themes. The social analyst is then able to perform a study of the network evolution in terms of the relevant themes of collaboration, the detection of new concepts gaining popularity, and the existence of popular themes that could benefit from better cooperation. The idea of modeling the content of social relationships is not entirely new. In [1] a method is proposed to discover "semantic clusters" in Google News, i.e. groups of people sharing the same topics of discussion. In [2] the authors propose the Author-Recipient-Topic model, a Bayesian network that captures topics and the directed social network of senders and recipients in a message-exchange context. In both cases the major weakness lies in the rather naive bag-of-words model used for extracting content from documents. For example, in [1] one of the topics around which groups of people are created is "said, bomb, police, london, attack", and in [2] an example is: "section, party, language, contract, …". This problem is common to many existing papers on topic clustering, where the focus is more on the clustering algorithm than on the selection of textual features. In our view, the usefulness of a content-based social analysis (CB-SA) is strongly related to the informative level and semantic cohesion of the learned topics. A simple bag-of-words model seems rather inadequate for capturing the meaning of social communications. Instead, we use a combination of natural language processing and machine learning techniques to obtain very informative clusters, representing the central "topics" of a community.
† Department of Computer Science, University of Roma "La Sapienza", Italy. e-mail: {velardi, navigli}@di.uniroma1.it.
‡ Department of Computer Science, Management and Automation (DIIGA), Polytechnic University of Marche, Italy. e-mail: {cucchiarelli, curzi}@diiga.univpm.it.
In summary (more details on the CB-SA model can be found in [3]), the analysis steps are the following:
1 Concept identification: The objective of this phase is to identify the emergent semantics of a community, i.e. the concepts that best characterize the content of the actors' communications. Concepts are extracted from available texts (hereafter referred to as the domain corpus) exchanged among the members of the community. We use our TermExtractor system [4], a freely available1 high-performing tool to extract the relevant terminology from single documents and entire corpora.
2 Computation of semantic similarity: First, a graph G=(V,E) is built, where V is the set of nodes representing terminological strings (hereafter also denoted as domain concepts) extracted as described in the previous phase, and E is the set of edges. An edge (tj, ti) is added to E if any of the following three conditions holds2: i) a relation holds between the concepts expressed by tj and ti in a domain ontology or thesaurus (e.g. ontology representation is a kind-of knowledge representation); ii) the term ti occurs in a textual definition of tj from a domain glossary (e.g. we add (ontology representation, ontology) to E, as ontology representation is defined as "the description of an ontology in a well-defined language"); iii) the two terms co-occur in the document corpus (we use the Dice coefficient). Given the graph G, for each pair of concepts tj and ti, we compute the set of chains in the graph, i.e. edge paths of length l (l = 1, ..., L, where L is the maximum path length) which connect the two concepts: $LC_l(t_j, t_i) = \{t_j \to t_1 \to t_2 \to \ldots \to t_{l-1} \to t_l \equiv t_i\}$. Finally, we compute the semantic similarity between tj and ti as a function of the corresponding lexical chains between the two concepts:
$$sim(t_j, t_i) = \sum_{l=1}^{L} e^{-l} \, \frac{|LC_l(t_j, t_i)|}{|LC_l(t_j)|} \qquad (1)$$
where $LC_l(t_j)$ denotes the set of all the lexical chains connecting $t_j$ to any other node (i.e. the union of the sets $LC_l(t_j, t_m)$ for all $t_m \in V \setminus \{t_j\}$). According to the above formula, the contribution of the lexical chains of length $l$ is given by the inverse of the exponential of $l$, weighted by the ratio of the number of lexical chains of length $l$ which connect $t_j$ to $t_i$ to the number which connect $t_j$ to any node in the graph. Each domain concept $t_j$ is then associated with an $n$-dimensional vector $x_j$, where $n$ is the total number of extracted concepts, and the $k$-th component of $x_j$ is $x_{jk} = sim(t_j, t_k)$. In the following, we denote with $X$ the space of instance vectors, where $|X| = |V| = n$.
1 http://lcl.uniroma1.it/termextractor
2 The availability of an ontology and glossary is not strictly required. However, we developed tools to facilitate their automatic acquisition [5].
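A brute-force sketch of formula (1): enumerate the simple paths of length at most L starting from t_j, bucket their endpoints by length, and combine the counts with the e^(-l) weighting. This is suitable only for small graphs, and the adjacency-dictionary representation of G is an assumption.

```python
import math
from collections import defaultdict

def chains_by_length(graph, start, L):
    # graph: dict mapping a term to the set of its neighbours in G=(V,E).
    ends = defaultdict(list)  # chain length l -> list of chain end nodes
    def dfs(node, path):
        l = len(path) - 1
        if l >= 1:
            ends[l].append(node)
        if l == L:
            return
        for nxt in graph.get(node, ()):
            if nxt not in path:  # simple paths only
                dfs(nxt, path + [nxt])
    dfs(start, [start])
    return ends

def sim(graph, t_j, t_i, L=3):
    # sim(t_j, t_i) = sum_l e^{-l} * |LC_l(t_j, t_i)| / |LC_l(t_j)|
    ends = chains_by_length(graph, t_j, L)
    return sum(math.exp(-l) * ends[l].count(t_i) / len(ends[l])
               for l in range(1, L + 1) if ends[l])
```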
3 Topic detection. The subsequent step, topic detection, is a clustering task: the objective is to organize concepts in groups, or clusters, so that concepts within a group are more similar to each other than concepts belonging to different clusters. We cluster the concept vectors in X using an empowered version of the k-means algorithm, the k-means++ method for optimal selection of the initial seeds [6]. The best clustering $\mathcal{C}$ is identified using the Silhouette Coefficient validity measure.
4 Social Network Analysis. Social network analysis is applied to the case of a research network, but the approach is fully general. Given the set G of research groups, the set D of members' publications and the collection V of domain concepts, pattern matching is used to tag each publication $d_i$ in D with a subset of domain concepts $V_i \subseteq V$. For any document $d_i$, we compute a vector $v_i$ of $k$ elements $y_{ih}$ (with $k = |\mathcal{C}|$) such that:
$$y_{ih} = \frac{l_{h,i}}{|C_h|} \sum_{j : x_j \in C_h} tf \cdot idf(t_j, d_i)$$
where $x_j$ is the similarity vector associated with concept $t_j$ (as defined in Section 2.2), $l_{h,i}$ is the number of concepts of $C_h$ found in $d_i$, and $tf \cdot idf(\cdot)$ is a standard measure for computing the relevance of a term $t_j$ in a document $d_i$ of a collection D. Therefore each $y_{ih}$ in $v_i$ measures the overlap of $d_i$ with the topic $C_h \in \mathcal{C}$. We finally define a vector $I_{g_i}$, which is the centroid of all the publication vectors of $g_i$. The Content-Based Social Network is then modelled through an undirected graph with:
• the nodes representing the groups $g_i$;
• the edges representing the similarity between nodes, measured by the cosine function:
$$\mathrm{cos\_sim}(g_i, g_j) = \cos(I_{g_i}, I_{g_j}) = \frac{I_{g_i} \cdot I_{g_j}}{\|I_{g_i}\| \, \|I_{g_j}\|} \qquad (2)$$

This formula models the semantic similarity between groups. Traditional and ad-hoc Social Network measures can then be used to support a thorough analysis of the community, as briefly discussed in the experimental section. We also implemented a graphical interface to assist the social analyst in the study of the network: the analyst can perform several tasks, e.g. select a topic and display the intensity of interest and the intensity of collaborations on this topic, show partners with common interests that do not cooperate, identify "central" topics and their shift in time, etc.
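Formula (2) is a plain cosine between group centroids; in code form (a sketch, with groups represented by their centroid vectors):

```python
import numpy as np

def cos_sim(I_gi, I_gj):
    # Cosine similarity between the centroid vectors of two groups.
    a, b = np.asarray(I_gi, float), np.asarray(I_gj, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```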
2 EXPERIMENTS
We applied our method to the study of the INTEROP NoE community, a research network now continuing within the European V_Lab on Enterprise Interoperability3. We collected 1452 full papers or abstracts authored by the INTEROP project members, belonging to 46 organizations. We automatically extracted 728 domain terms and then generated lexical chains, deriving semantic relations from the INTEROP ontology and co-occurrences from the domain papers and glossary. An excerpt of a similarity vector (the arguments are ordered by formula (1)) is:
activity_diagram = (class_diagram (1), process_analysis (0.630), software_engineering (0.493), enterprise_software (0.488), deployment_diagram (0.468), bpms_paradigm (0.467), workflow_model (0.444), model-driven_architecture (0.442), workflow_management (0.418), ...)

3 http://www.interop-vlab.eu/

Finally, the concept vectors built from concept chains were used to feed the k-means++ algorithm. The cluster validity measure was computed for increasing values of k, 50 ≤ k ≤ 300. Clustering results in the range 140 ≤ k ≤ 170 show the best Silhouette values. Figure 1 shows the best-rated cluster (according to its Silhouette) for k=150:

cluster 19 = { common_ontology, core_domain_ontology, core_ontology, domain_ontology, enterprise_ontology, federated_ontology, ontology_alignment, ontology_analysis, ontology_application, ontology_architecture, ontology_maintenance, ontology_mediation, ontology_merging, ontology_representation, ontology_validation, ontology_versioning, reference_ontology }

Figure 1. The best clusters obtained with k=150.

The model described in this paper allows the social analyst to extract information that is not available with standard social analysis tools. For the sake of brevity, we show only the example of Figure 2, in which nodes represent the groups (the node dimension is proportional to the number of publications of the associated group, in turn related to the dimension of the NoE research groups), bent edges represent the similarity of interests (formula (2)) and curved edges the co-authorship. In the figure, only edges above a user-defined threshold are shown. It is also possible to focus the analysis on a selectable subset of topics. The visualization is very useful for discovering groups that could potentially cooperate but do not actually have common activities, thus allowing a better coordination of the network. A lot of other relevant information can be extracted from the CB-SA model; the interested reader is referred to [3].

Figure 2. Groups with highest co-authorship and strongest common interests.
REFERENCES [1] J. Dhiraj and D. Gatica-Perez: ‘Discovering Groups of people in Google News’, Proc. Of HCM’06 , Santa Barbara, CA, USA, (2006) [2] A. McCallum, A. Corrada-Emmanuel and X. Wang: ‘Topic and Role Discovery in Social Networks’. Proc. Int. Joint Conf. on Artificial Intelligence, (2005). [3] P. Velardi, R. Navigli, A. Cucchiarelli and F. D’Antonio, and ‘A New content-based model for social network analysys’, Proc. of IEEE Int. Conf. on Semantic Computing, S. Clara, USA, August 2008 [4] F. Sclano and P. Velardi, ‘TermExtractor: a Web Application to Learn the Common Terminology of Interest Groups and Research Communities’, Proc. of 9th Conf. on Terminology and Artificial Intelligence (TIA 2007), Sophia Antinopolis, 2007. [5] P. Velardi, A. Cucchiarelli and M. Petit, ‘A Taxonomy learning Method and its Application to Characterize a Scientific Web Community’, IEEE Transaction on Data and Knowledge Engineering (TDKE), Vol. 19, N. 2, 180-191, (2007). [6] D. Arthur and S. Vassilivitski, ‘k-means++: The Advantages of Careful Seeding’, Proc. of the 18th ACM-SIAM Symp. on Discrete Algorithms, New Orleans, Louisiana, 1027-1035, 2007.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-767
Efficient Data Clustering by Local Density Approximation
Marc-Ismaël Akodjènou and Patrick Gallinari 1
Abstract. The clustering task is a key part of the data mining process. In today's context of massive data, methods with a computational complexity that is more than linear are unlikely to be applied practically. In this paper, we begin with a simple assumption: local projections of the data should allow one to distinguish local cluster structures. From there, we describe how to obtain "pure" local sub-groupings of points from projections on randomly chosen lines. The clustering of the data is obtained from the clustering of these sub-groupings. Our method has linear complexity in the dataset size, and requires only one pass over the original dataset. Being local in essence, it can handle the twisted geometries typical of many high-dimensional datasets. We describe the steps of our method and report encouraging results.
1 INTRODUCTION
Clustering is a well-known basic building block of many data mining processes. It consists in automatically identifying natural groupings of points in a dataset. In the past forty years, an abundant literature has flourished on the subject. Many methods, and variants of methods, have been proposed, each with their qualities and weaknesses. A recent survey can be found in [3]. The unceasingly increasing size of today's information-generating processes causes datasets to often have a large volume: a large number of instances and high dimensionality. These two problems have often been tackled separately in the literature. On one hand, handling high dimensions is tricky. High-dimensional data often contain non-convex, twisted cluster shapes and noise. It is very common that the clusters are not full-dimensional but have a low intrinsic dimensionality, and lie on non-linear manifolds. The concentration-of-measure phenomenon adds even more difficulty in that distances, Euclidean included, tend to lose their meaning. Many approaches based either on density or on distance suffer greatly from this phenomenon. Popular techniques which give good results, like Spectral Clustering, are unfortunately quadratically complex either in the dimension or in the number of instances. On the other hand, to cope with a massive number of instances, grid-based clustering methods rely on a partitioning of the data space into cells and then aggregate those cells to form the final clustering. The implicit assumption is that a particular cell is "pure", in that it contains only points from the same cluster. With a proper grid resolution the assumption is quite reasonable, and the time complexity is usually linear in the number of instances. Unfortunately, because of the density-based aggregation process, in general the performance degrades quickly as the dimension increases.
1 LIP6 - Université Paris 6 Pierre et Marie Curie, France, email: {Marc-Ismael.Akodjenou, Patrick.Gallinari}@lip6.fr
How could one have the best of both worlds: keeping linear complexity in N, but overcoming the curse of dimensionality linearly in d too? In this paper, we propose to keep, but relax, the notion of a cell in grid-based clustering into the notion of a "sub-grouping". A sub-grouping is a "pure" subset of points obtained with cheap projections of the data on local lines. Why a set of local lines? First, a line projection is computationally cheap. Second, the use of locality to overcome bad dimensionality effects is transversal to many clustering and dimensionality-reduction approaches. For example in the approach of [2], or in Subspace Clustering, local Euclidean distances prove to be pertinent even when they are globally inadequate. After the projection step, the clustering of the data is obtained through the clustering of the sub-groupings. The key assumption is that sub-groupings coming from the same cluster have common points. The method is designed to be of linear complexity in the dataset size. Moreover, aware of the access costs of today's databases, it requires only one pass over the original dataset. Throughout this paper, we will use the following notation: the dataset $X \subset \mathbb{R}^d$ is a matrix of N datapoints. M is the number of lines under consideration and m is the number of closest lines for each point. Sub-groupings are sets of indices $S_1, \ldots, S_P \subset [1, N]$. K is the number of clusters. The dot product is noted $\langle \cdot, \cdot \rangle$, the Euclidean distance $\|\cdot\|$, and the cardinality of a set $|\cdot|$.
2 CLUSTERING FROM SUB-GROUPINGS
Figure 1. Projections on nearest lines leave small dense zones on the lines.
The idea of our method is depicted in Figure 1. We pierce the dataset with M randomly-oriented lines. We then "shatter" the dataset by orthogonally projecting the points on the lines. As we will see, each particular point is close to only a small number of lines; we project each point only on its m closest lines (in the sense of the orthogonal distance). The small dense zones left by the projection of the data are likely to be "pure" in terms of cluster memberships. It is such a dense zone (precisely, the indices of the datapoints in it) that we call a sub-grouping S of points. As two sub-groupings issued from the same cluster are likely to have points in common, we propose to
cluster the subgroupings Sj first, and to deduce the clustering of the original data from the clustering of the Sj .
2.1 Projections and Sub-Groupings
What would be a set of lines likely to yield good sub-groupings? Under the linear-complexity constraint, all that is left is to choose lines at random. However, as most of the space is empty in high dimensions, it is reasonable to require that each line is close to at least one point. For this, we take a random datapoint in X to be the "origin" of the line. Precisely, we choose M lines $L_k$ defined by their origin-orientation pairs $(y_k, u_k)$. The $y_k \in X$ are taken randomly, and the vectors $u_k$ are taken randomly on the unit sphere. For a point $x$, its projection on the line $L_k$ is $\mathrm{proj}_k(x) = \langle x - y_k, u_k \rangle$ and its orthogonal distance to it is $\sqrt{\|x - y_k\|^2 - \langle x - y_k, u_k \rangle^2}$.
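In code, the line-sampling and projection step might look as follows. This is a vectorized NumPy sketch that materializes an N×M×d array, trading memory for clarity; a streaming implementation would process lines one at a time.

```python
import numpy as np

def random_lines(X, M, rng):
    # Each line L_k is an (origin y_k, unit orientation u_k) pair;
    # origins are drawn from the data so every line is close to a point.
    origins = X[rng.integers(len(X), size=M)]
    dirs = rng.normal(size=(M, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return origins, dirs

def project_on_closest_lines(X, origins, dirs, m):
    diff = X[:, None, :] - origins[None, :, :]      # (N, M, d)
    proj = np.einsum('nmd,md->nm', diff, dirs)      # <x - y_k, u_k>
    orth = np.sqrt(np.maximum((diff ** 2).sum(-1) - proj ** 2, 0.0))
    nearest = np.argsort(orth, axis=1)[:, :m]       # m closest lines per point
    return proj, nearest
```

Each point is then attached only to its m closest lines; for each line L_k, the set I_k collects the indices of the points it received.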
Figure 2. Typical distance-to-lines histogram for a point (proportion of lines vs. distance).

The sub-groupings we wish to find are small dense zones on the lines. Projecting all points on all lines (that is, ignoring the distances of points to lines) would not allow us to distinguish anything. Only the closest lines of a point are likely to give good dense zones. A typical histogram of the distances of a point to the lines can be seen in Figure 2. The normally distributed component, often met in high-dimensional settings, is always present, but it is preceded by a small number of "close" lines. It is those lines that we select as candidates to obtain meaningful sub-groupings for the point. For this, we sort the distances to the lines for each point, and we keep its m closest lines. Our experiments show that a very small m (around 5 or 10) suffices to yield an efficient clustering. More sophisticated selection techniques based on a thresholding of the distance histograms could be used. Note that at the end of this step, each line $L_k$ is associated with a set $I_k$ of points. We now have to identify the sub-groupings on each line. This is done by finding the modes of the projection of the points of $I_k$ on the line $L_k$. We do this the classical way: we use a Gaussian kernel density estimate $\hat{f}_k(t) = \sum_{x \in I_k} K\left(\frac{t - \mathrm{proj}_{L_k}(x)}{h}\right)$, where $K(\cdot)$ is a Gaussian kernel, to model the density on the line, and find the modes by identifying the valleys of this density, which is very fast (there are only $|I_k|$ points in each kernel estimate). Each mode of this density yields a sub-grouping $S \subset [1, N]$. The collection of all sub-groupings $S_1, \ldots, S_P$ obtained from the M lines is the representation forwarded to the hierarchical clustering step.
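A sketch of this mode-finding step: an unnormalized Gaussian kernel density estimate is evaluated on a grid, and the sub-groupings are read off between consecutive valleys. The bandwidth h and the grid resolution are assumptions.

```python
import numpy as np

def subgroupings_on_line(proj, point_ids, h=0.1, grid_points=200):
    # proj: 1-d array of projections of the points in point_ids on one line.
    # Unnormalized Gaussian KDE evaluated on a regular grid; sub-groupings
    # are the index sets falling between consecutive valleys of the density.
    grid = np.linspace(proj.min(), proj.max(), grid_points)
    dens = np.exp(-0.5 * ((grid[:, None] - proj[None, :]) / h) ** 2).sum(axis=1)
    # valleys = local minima of the density along the grid
    valleys = grid[1:-1][(dens[1:-1] < dens[:-2]) & (dens[1:-1] < dens[2:])]
    cuts = np.concatenate(([-np.inf], valleys, [np.inf]))
    ids = np.asarray(point_ids)
    return [set(ids[(proj > lo) & (proj <= hi)])
            for lo, hi in zip(cuts[:-1], cuts[1:])
            if ((proj > lo) & (proj <= hi)).any()]
```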
2.2 Hierarchical Clustering of Sub-Groupings

It now remains to cluster the sub-groupings. Each sub-grouping $S_j$ is a subset of $[1, N]$. As mentioned above, sub-groupings coming from the same cluster are likely to have points in common. Following this assumption, it is natural to use the Jaccard distance in the hierarchical merging procedure. The distance $d_{Jacc}(S, S') = 1 - \frac{|S \cap S'|}{|S \cup S'|}$ measures the affinity of two sets by means of the ratio of the points they have in common to the total number of points in the two sets. At the end of this clustering step the sub-groupings $S_k$ are clustered into K groups $C_1, \ldots, C_K$. The clustering of the original datapoints is directly deduced: the cluster of the point $x_i \in X$ is the $C_k$ in which $x_i$ is most present, "presence" being measured by the number of times $x_i$ appears in the sub-groupings of $C_k$: $\mathrm{clusterid}(x_i) = \arg\max_k \sum_{S \subset C_k} \mathbf{1}(x_i \in S)$.

3 COMPLEXITY

The complexity of the approach is linear in the dataset size. The projection/distance calculation step consists of M projections of the dataset, that is $O(NMd)$. Sorting the distances has worst-case complexity $O(NM \log M)$. Identifying the sub-groupings on the lines means finding the modes of the kernel estimates, that is $O(M\bar{N})$, where $\bar{N}$ is the mean size of the $I_k$'s. The hierarchical clustering step is $O(M^2 \log M)$, which yields an overall linear computational complexity of $O(NMd + (NM + M^2)\log M)$.

4 EMPIRICAL EVALUATION

We evaluate our approach on three popular datasets: USPS2 (handwritten digits 3, 5, 6 and 8), Coil203 and Umist4 (image datasets). The characteristics (N, d, c), c being the number of true classes, and the parameters (M, m) used by our method are shown in the table. We compare our method with k-means and Spectral Clustering [1]. In the results table we use two popular criteria: the first is the NMI criterion, which measures the structural agreement between the clustering and the classes, while the Purity criterion expresses the homogeneity of the clusters with respect to the classes. Both take values in [0, 1]. Results shown are an average over 10 runs for k-means and our method. Spectral Clustering has been tuned to give its best results.
                       USPS-3568       Coil20           Umist
(N, d, c)              4400, 256, 4    1440, 1024, 20   564, 10304, 20
(M, m)                 400, 4          500, 4           1000, 10
NMI     K-means        0.36            0.76             0.65
        Spectral       0.31            0.91             0.73
        Our method     0.59            0.87             0.83
Purity  K-means        0.52            0.65             0.50
        Spectral       0.52            0.87             0.61
        Our method     0.49            0.81             0.72
The results show that our method exhibits performance similar, and sometimes superior, to Spectral Clustering. These results encourage us to think that the method, though simple and of linear complexity, performs well; the locality of the approach seems to be appropriate. In future work, we will take a closer look at the selection of M and m. We will also examine whether or not the use of distances other than the Euclidean for constituting the local sub-groupings can give better performance (the cosine distance, for example).
REFERENCES [1] I. Fischer and J. Poland, ‘Amplifying the block matrix structure for spectral clustering’, in Proceedings of the 14th Annual Machine Learning Conference of Belgium and the Netherlands, pp. 21–28, (2005). [2] Wanli Min, Ke Lu, and Xiaofei He, ‘Locality pursuit embedding’, Pattern Recognition Journal, 37(4), 781–788, (2004). [3] Xu Rui and D. Wunsch, ‘Survey of clustering algorithms’, in IEEE Transactions on Neural Networks, volume 16, pp. 645–678, (2005). 2 3 4
2 http://cervisia.org/machine learning data.php
3 http://www1.cs.columbia.edu/CAVE/software/softlib/coil-20.php
4 http://www.cs.toronto.edu/~roweis/data.html
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-769
Gas Turbine Fault Diagnosis using Random Forests
Manolis Maragoudakis 1 and Euripides Loukis 1 and Panayotis-Prodromos Pantelides 1
Abstract. In the present paper, Random Forests are used in a critical and at the same time non-trivial problem concerning the diagnosis of Gas Turbine blading faults, showing promising results. Random forests-based fault diagnosis is treated as a Pattern Recognition problem, based on measurements and feature selection. Two different ways of inserting randomness into the trees are studied, based on different theoretical assumptions. The classifier is compared against other Machine Learning algorithms such as Neural Networks, Classification and Regression Trees, Naive Bayes and K-Nearest Neighbor. The performance of the prediction model reaches a level of 97% in terms of precision and recall, improving on the existing state-of-the-art levels achieved by Neural Networks by 1.5%-2%.
1 INTRODUCTION
Development of effective Gas Turbine Condition Monitoring and Fault Diagnosis methods has been the target of considerable research in recent years. This is due to the high cost, sensitivity and importance of these engines for most industrial companies. Most of this research is directed towards the diagnosis of Gas Turbine blading faults, because of the catastrophic consequences that these faults can have if they are not diagnosed in time. Even very small blading faults can grow very rapidly and result in huge damage ([1], [2], [3]). Blading fault diagnosis is regarded to be a very difficult problem, because of the high levels of noise in all relevant measurements and the high interaction between the numerous Gas Turbine blading rows. Therefore, it is very important to take advantage of the processing power of modern computers, in order to provide a fast and reliable engine condition diagnosis from the available measurements and to develop the highest possible level of intelligence and assistance for the operation and maintenance personnel. The Gas Turbine Blading Fault Diagnosis problem was originally addressed in [4] and [5], based on classical pattern recognition methods. Our contribution to the domain is the introduction of an ensemble classifier, namely Random Forests, for the first time for the task at hand, which outperforms all previous attempts at Gas Turbine Blading Fault Diagnosis. Furthermore, Random Forests can provide some insight into the interrelationships between input features, unlike Neural nets, thus directing domain experts in selecting which measurement tools to use in real-world applications.
2 PROBLEM & DATA DESCRIPTION

The present work is based on data acquired from dynamic measurements on an industrial Gas Turbine into which different faults were artificially introduced. During the experimental phase four categories of measurements were performed simultaneously:
1. Unsteady internal wall pressure (using fast response transducers P2 to P5).
2. Casing vibration (using accelerometers A1 to A6 mounted to the outside compressor casing).
3. Shaft displacement at compressor bearings (using transducer B).
4. Sound pressure levels (using double-layer microphone M).
Five experiments were performed, testing the datum healthy engine and a similar engine with the following four typical small (but quite rapidly growing, as mentioned in the introductory section) and also not straightforwardly diagnosable faults:
1. Fault-1: Rotor fouling.
2. Fault-2: Individual rotor blade fouling.
3. Fault-3: Individual rotor blade twisted (by approx. 8 degrees).
4. Fault-4: Stator blade restaggering.
Tests were performed at four different engine loads (full load, half load, quarter load and no load), both for the healthy engine and for the engine with each of the above four faults. At each load, four series of time-domain data were acquired for each instrument (two series in each of the two sampling frequencies, l = 13 kHz and m = 32 kHz). 12 different measuring instruments were used, and measurements were taken for every possible combination of the engine's 5 operational conditions (healthy engine and 4 faulty conditions), 4 engine loads (full load, half load, quarter load and no load) and 2 sampling frequencies (low and high). More precisely, for the engine's healthy condition, measurements were taken for every combination of engine load and sampling frequency (8 different combinations in total). For the faulty conditions, there was one additional measurement series for all the above combinations. Consequently, for every instrument we have 72 different measurements in total: 8 healthy-engine measurements and 64 faulty-engine measurements. For every instrument, each of the above measurements consists of 27 values that are forms of the spectral difference of the first 27 harmonics of the rotor shaft's rotational frequency. If we were to present the entirety of the data in a database, it would be composed of 864 instances described by 27 distinct attributes, corresponding to the 27 harmonics.
1 University of the Aegean, Department of Information and Communication Systems Engineering, Samos, Greece
3 RANDOM FORESTS
Despite the fact that Random Forests have been quite successful in classification and regression tasks, to the best of our knowledge there has been no research on using this algorithm for Gas
Turbine Fault Diagnosis. Random Forests are a combination of tree classifiers such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. A Random Forest multi-way classifier Θ(x) consists of a number of trees, each grown using some form of randomization, where x is an input instance [8]. The leaf nodes of each tree are labeled by estimates of the posterior distribution over the data class labels. Each internal node contains a test that best splits the space of data to be classified. A new, unseen instance is classified by sending it down every tree and aggregating the reached leaf distributions. To make the classification process more formal, suppose that the joint classifier Θ(x) contains K individual classifiers Θ1(x), Θ2(x), ..., ΘK(x). Let us also assume that each data instance is a pair (x, y), where x denotes the input attributes, taken from a set Ai, i = 1, ..., M, and y belongs to the set of class labels Lj, j = 1, ..., c (c is the number of class values). For simplicity, the correct class will be denoted as y, without any indices. Each discrete attribute Ai takes values vk from a set Vi, k = 1, ..., mi (mi is the number of values attribute Ai has). Finally, the probability that attribute Ai has value vk is denoted by p(vi,k), the probability of a class value yj is denoted by p(yj), and the probability of class yj given that attribute Ai has value vk is denoted by p(yj | vi,k). Each training example is picked from a set of N instances at random with replacement. Through this procedure, called bootstrap replication, about 36.8% of the training examples are not used in the construction of each tree. These out-of-bag (oob) instances allow for computing the degree of strength and correlation of the forest structure. Suppose that Ok(x) is the set of oob instances of classifier Θk(x). Furthermore, let Q(x, yj) denote the proportion of oob votes for class yj at input example x. An estimate of p(Θ(x) = yj) is given by the following equation:
$$Q(x, y_j) = \frac{\sum_{k=1}^{K} I\big(\Theta_k(x) = y_j;\ (x, y) \in O_k\big)}{\sum_{k=1}^{K} I\big(\Theta_k(x);\ (x, y) \in O_k\big)} \qquad (1)$$
where I(·) is the indicator function. The margin function, which measures the extent to which the average vote for the right class y exceeds the average vote for any other class label, is computed by:

$$\mathrm{margin}(x, y) = P(\Theta(x) = y) - \max_{j=1,\dots,c;\ y_j \neq y} P(\Theta(x) = y_j) \qquad (2)$$

Since strength is defined as the expected margin, it is computed as the average over the training set:

$$s = \frac{1}{n} \sum_{i=1}^{n} \Big( Q(x_i, y) - \max_{j=1,\dots,c;\ y_j \neq y} Q(x_i, y_j) \Big) \qquad (3)$$

The average correlation is given by the variance of the margin over the square of the standard deviation of the forest:

$$\rho = \frac{\mathrm{Var}(\mathrm{margin})}{\sigma(\Theta(\cdot))^{2}} \qquad (4)$$

where Q(x, y_j) is estimated for every input example x in the training set.
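To make these quantities concrete, the following is a minimal numpy sketch of Equations (1)-(3), assuming the per-tree votes and out-of-bag membership are available as arrays; the array layout and names are our own illustration, not the authors' implementation.

```python
import numpy as np

def oob_estimates(votes, oob_mask, y_true):
    """Sketch of Eqs. (1)-(3): oob class-vote shares Q(x, y_j),
    per-instance margins, and the forest strength s.

    votes    : (K, n) array, votes[k, i] = class voted by tree k for instance i
    oob_mask : (K, n) boolean array, True where instance i is oob for tree k
    y_true   : (n,) array of true class labels
    """
    K, n = votes.shape
    classes = np.unique(y_true)
    Q = np.zeros((n, len(classes)))
    den = np.maximum(oob_mask.sum(axis=0), 1)                 # oob votes per instance
    for j, c in enumerate(classes):
        Q[:, j] = ((votes == c) & oob_mask).sum(axis=0) / den  # Eq. (1)
    true_idx = np.searchsorted(classes, y_true)
    q_true = Q[np.arange(n), true_idx]
    Q_wrong = Q.copy()
    Q_wrong[np.arange(n), true_idx] = -np.inf                 # mask out the true class
    margin = q_true - Q_wrong.max(axis=1)                     # Eq. (2), estimated via Q
    strength = margin.mean()                                  # Eq. (3)
    return Q, margin, strength
```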
4 EXPERIMENTAL RESULTS
We applied two versions of Random Forests (Random Input (RI) Forests and Random Combination (RC) Forests) to the Gas Turbine data set, using oob estimates. As evaluation metrics, we considered per-class precision and recall. Accuracy is not actually a good metric in domains such as the one at hand, because a classifier may achieve high accuracy by simply always predicting the non-faulty class. This problem is particularly acute in the present task, where more than 2/5 of the data set belongs to that class. A set of well-known machine learning techniques constituted the benchmark to which our results were compared: Multi-layer Perceptron Neural Networks, Naive Bayes, Classification and Regression Trees (CART), and k-Nearest Neighbor (kNN) instance-based learning. Cross-validation was performed with kNN in order to determine the best k. Regarding the Random Forests implementation, the best results were obtained using 500 trees and 6 features. Due to lack of space, the evaluation outcome is depicted only in the following figure, for the precision metric (F1 to F4 denote the fault categories and OK denotes the non-faulty state).

Figure 1. Evaluation results in terms of precision for all methodologies.
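For reference, a forest with the settings reported above (500 trees, 6 features per split) and oob estimation can be reproduced in a few lines; this is a minimal sketch using scikit-learn on placeholder data shaped like the described data set (864 instances, 27 harmonic attributes, 5 condition labels), not the authors' original code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(864, 27))    # placeholder for the 27 harmonic attributes
y = rng.integers(0, 5, size=864)  # placeholder labels: OK, F1, F2, F3, F4

rf = RandomForestClassifier(n_estimators=500, max_features=6,
                            oob_score=True, random_state=0)
rf.fit(X, y)
print("oob accuracy:", rf.oob_score_)
# Unlike neural nets, the forest exposes feature interrelationships,
# e.g. which harmonics matter most:
print("feature importances:", rf.feature_importances_)
```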
REFERENCES

[1] E. Loukis, P. Wetta, K. Mathioudakis, A. Papathanasiou, K. Papailiou, Combination of Different Unsteady Quantity Measurements for Gas Turbine Blade Fault Diagnosis, 36th ASME International Gas Turbine and Aeroengine Congress, Orlando, 1991, ASME paper 91-GT-201.
[2] E. Loukis, Contribution to Gas Turbine Fault Diagnosis Using Methods of Fast Response Measurement Analysis, Doctoral Thesis, National Technical University of Athens, Athens, 1993.
[3] G. Merrington, O. K. Kwon, G. Godwin, B. Carlsson, Fault Detection and Diagnosis in Gas Turbines, ASME Journal of Engineering for Gas Turbines and Power, 113, 1991, 11-19.
[4] E. Loukis, K. Mathioudakis, K. Papailiou, A Procedure for Automated Gas Turbine Blade Fault Identification Based on Spectral Pattern Analysis, Journal of Engineering for Gas Turbines and Power, 114, 1992, 201-208.
[5] E. Loukis, K. Mathioudakis, K. Papailiou, Optimizing Automated Gas Turbine Fault Detection Using Statistical Pattern Recognition, Journal of Engineering for Gas Turbines and Power, 116, 1994, 165-171.
[6] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth Inc., Belmont, California, 1984.
[7] L. Breiman, Bagging Predictors, Machine Learning Journal, 26(2), 1996, 123-140.
[8] I. Kononenko, Estimating Attributes: Analysis and Extensions of Relief, in L. De Raedt and F. Bergadano (eds.), Machine Learning: ECML-94, pp. 171-182, Springer Verlag, Berlin, 1994.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-771
771
How Many Objects?: Determining the Number of Clusters with a Skewed Distribution

Satoshi Oyama¹ and Katsumi Tanaka²

Abstract. We propose a supervised approach to enable accurate determination of the number of clusters in object identification. We use the aggregated attribute values of the data set to be clustered as explanatory variables in the prediction model. Attribute aggregation can be done in linear time with respect to the number of data items, so our method can be used to predict the number of clusters with a low computational burden. To deal with skewed target values, we introduce a two-stage method as well as a method using a higher-order combination of explanatory variables. Experiments demonstrate that our methods enable more accurate prediction than existing methods.
1 INTRODUCTION

Object-identification problems, in which it is necessary to determine whether names appearing in documents or database records correspond to the same real-world object, are important in information retrieval and integration. Typical examples of object-identification problems include disambiguating namesakes in Web search results and establishing correspondence between an abbreviated author name in bibliographic databases and a particular person. Object-identification problems are generally solved by clustering data that contain an ambiguous name and by regarding data in the same cluster as corresponding to the same object. Among the various clustering algorithms, the most widely used are k-means and hierarchical algorithms including single-linkage. One problem in using the k-means clustering algorithm, though, is that a user has to specify the number of clusters as a parameter before starting the clustering procedure. If we use a hierarchical clustering algorithm for object identification, we must specify the number of clusters or a stopping condition so that the algorithm stops the clustering and outputs the results after a certain number of clusters have been found. Determining the number of clusters as a parameter in an object-identification problem is not easy. One reason for this difficulty is that the number of corresponding objects varies considerably from name to name. For example, in the DBLP computer science bibliography³, which is commonly used as a test collection for object identification, we observed that the number of corresponding full names (clusters) k and the frequency f of abbreviated names obey a power-law distribution: f(k) = αk^{-γ} (α and γ are parameters). In a power-law distribution, a very large number of data items with low values coexist with a few data items with very high values. Thus the average value of the data is meaningless, and there are no “typical” data values. For example, in the data set we used, the average number of full names per abbreviated name is 1.5, but setting the parameter of the number of clusters to 1 (which means doing no clustering) or 2 for all names is not meaningful, because that results in very poor performance for names with very many clusters. Therefore, we need to use a different number of clusters for each clustering problem with a distinct ambiguous name.

1 Kyoto University, Japan, email: oyama@i.kyoto-u.ac.jp
2 Kyoto University, Japan, email: ktanaka@i.kyoto-u.ac.jp
3 http://dblp.uni-trier.de/
2 SUPERVISED-LEARNING APPROACH

Previous methods to determine the number of clusters take an “unsupervised” approach and treat each clustering problem independently [1, 2, 3]. In contrast, we take a supervised approach that uses other clustering problems, for which we know the true numbers of clusters, to predict the number of clusters for an unknown problem. We think this is a reasonable approach for object identification, where we solve many similar clustering problems for different names in the same domain. Our approach avoids unnecessary clustering for data sets with one cluster, because model-based prediction of the number of clusters is used. This is especially effective for object identification when the numbers of clusters follow a power-law distribution and one-cluster problems (problems with no need for clustering) are a large proportion of the problems. Assume we have pairs of a data set S^j to cluster and the true number of clusters in it, y^j, where the pairs are denoted as T = {(S^1, y^1), (S^2, y^2), ..., (S^{|T|}, y^{|T|})}. Using T as training data, we construct a function f_T that gives a prediction y of the number of clusters for an unknown data set S. We can consider various forms of the function f_T. Among them, one of the simplest models is a linear model, y = Σ_i w_i x_i + b, where {x_i} are explanatory variables that characterize the data set to be clustered, and {w_i} and b are parameters determined from the training data T. The number of clusters should be predicted efficiently. The computational cost of k-means is O(kn) and that of a hierarchical clustering method is O(n²). Therefore, in practice, the prediction of the number of clusters should be done in linear time with respect to the number of data items. Our model should return the number of clusters for a given data set to be clustered, so we need explanatory variables that characterize the statistics of the set of data rather than each individual datum. In addition, the explanatory variables themselves must be computed efficiently. Aggregations of attribute values of the data items to be clustered are good candidates for such explanatory variables. We devised several types of variables that might be correlated with the number of clusters. The explanatory variables we will introduce can be computed in linear time with respect to the number of data items. We can easily compute the values of the aggregated variables by using aggregate functions such as count(), max(), min(), and avg(), which are available in most database systems.
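As a concrete illustration of such linear-time aggregation, the sketch below computes a feature vector for one clustering problem (one ambiguous author name); the record fields mirror some of the explanatory variables listed in the experiments section, but the function and field names are our own assumptions.

```python
from statistics import pstdev

def aggregate_features(records):
    """One pass over the bibliographic records of a single abbreviated name;
    each record is assumed to be a dict with keys 'coauthors' (list of str),
    'title' (str), 'venue' (str) and 'year' (int)."""
    coauthors, words, venues, years = set(), set(), set(), []
    for r in records:                      # single linear scan
        coauthors.update(r["coauthors"])
        words.update(r["title"].lower().split())
        venues.add(r["venue"])
        years.append(r["year"])
    return [
        len(records),             # (1) number of papers
        len(coauthors),           # (2) distinct coauthors
        len(words),               # (3) distinct title words
        len(venues),              # (4) distinct venues
        max(years) - min(years),  # (5) publication-year span
        pstdev(years),            # (6) std. deviation of publication years
    ]

# Example: a tiny data set for the ambiguous name "J. Smith".
papers = [
    {"coauthors": ["A. Jones"], "title": "Clustering names", "venue": "ICML", "year": 2001},
    {"coauthors": ["B. Kim"], "title": "Entity resolution", "venue": "KDD", "year": 2004},
]
print(aggregate_features(papers))
```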
We use support vector regression [4] to determine the parameters in the linear model. One difficulty in building a model to predict values from a skewed distribution like a power-law distribution is that there is a large imbalance in the numbers of available training data for different target values. A large portion of the training data consists of data items with a target value of 1, and there are relatively few data items with large target values. If we use such training data directly, there is a risk of obtaining a model that underestimates the target values. To overcome this imbalance between the numbers of training data for different target values, we introduce a method that successively applies two different models when predicting the number of clusters: (1) one model determines whether a given data set is composed of one cluster or multiple clusters; (2) the other model determines the number of clusters for a data set predicted to be composed of multiple clusters by the first model. In ecology, a similar two-stage method is used to build a model to predict the abundance of rare species [5], although the learning methods used in each stage are different from ours. Another extension is that we use a model that is nonlinear in the explanatory variables rather than a linear model. Specifically, we consider a model using combinations of the explanatory variables. Using a higher-order model with large expressive power helps avoid the risk of under-fitting the training data, which sometimes occurs when applying a simple linear model to skewed data. In our implementation, we adopt a kernel trick and use a quadratic polynomial kernel: k(x, z) = (⟨x, z⟩ + 1)². By using this kernel in support vector learning, we can virtually use the conjunctions of explanatory variables in the model without actually computing their values.
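The two-stage model with a quadratic polynomial kernel can be sketched as follows with scikit-learn's support vector implementations; the synthetic skewed data and all parameter choices are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))  # 8 aggregated explanatory variables per name
y = np.maximum(1, np.round(rng.pareto(2.0, 2000))).astype(int)  # skewed cluster counts

# Stage 1: does the data set contain one cluster or several?
stage1 = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1).fit(X, (y > 1).astype(int))
# Stage 2: regression on the multi-cluster problems only,
# with the quadratic kernel k(x, z) = (<x, z> + 1)^2.
multi = y > 1
stage2 = SVR(kernel="poly", degree=2, gamma=1.0, coef0=1).fit(X[multi], y[multi])

def predict_clusters(x):
    x = np.asarray(x).reshape(1, -1)
    if stage1.predict(x)[0] == 0:   # predicted single-cluster problem
        return 1
    return max(2, int(round(stage2.predict(x)[0])))

print(predict_clusters(X[0]))
```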
3 EXPERIMENTS

We took the disambiguation of abbreviated author names in a bibliographic database as an example task. From the DBLP data, we randomly selected 2,000 abbreviated names corresponding to more than one paper. We did not use abbreviated names that corresponded to only one paper, because there is obviously only one cluster (full name) for them. For each selected abbreviated name, we collected the bibliographic data containing the name as an author and computed the values of the following explanatory variables:

(1) number of papers with the target abbreviated author name;
(2) number of different coauthors in the data set;
(3) number of different words appearing in the paper titles;
(4) number of different journals or conference proceedings in which the papers are published;
(5) difference between the publication years of the newest and oldest papers;
(6) standard deviation of the publication years of the papers in the data set;
(7) frequency of the last names used in abbreviated names in the database;
(8) percentage of abbreviated names with a particular letter among the abbreviated names.

We applied 10-fold cross validation. We used SVMlight⁴, which implements support vector regression, to build the regression models, as well as the binary support vector machines used in building the two-stage models. As the metric, we used the root mean square error (RMSE) between the true number of clusters (full names) and the predicted number of clusters given by a model. We compared the Caliński and Harabasz (C&H) method [1], the Hartigan method [2], a method using an average threshold, x-means [3], the basic learning-based method (Linear (1 stage)), a two-stage method (Linear (2 stages)), nonlinear regression using a polynomial kernel (Polynomial (1 stage)), and a two-stage method using a polynomial kernel in each stage (Polynomial (2 stages)). For C&H, Hartigan, and x-means, we simply applied the methods to the clustering problems in the test sets and did not use the training sets. For the method using an average threshold, we applied the single-linkage method to each clustering problem in the training set and calculated the average of the thresholds that resulted in the true numbers of clusters. We then applied the single-linkage method to each clustering problem in the test sets and determined the number of clusters by using the average threshold as the clustering-stopping condition. The overall RMSE for each method is shown in Table 1. The four learning-based methods outperformed the other methods. Among the four learning-based methods, the two-stage model and the model with the polynomial kernel outperformed the basic model, and their combination gave the results with the smallest errors.

Table 1. RMSE for each method

C&H                    3.063
Hartigan               2.279
Threshold              2.231
X-means                2.585
Linear (1 stage)       1.819
Linear (2 stages)      1.490
Polynomial (1 stage)   1.145
Polynomial (2 stages)  1.114

4 http://svmlight.joachims.org/
4 CONCLUSION

We described a supervised, model-based approach to predicting the number of clusters in a data set, which is more efficient and accurate than existing approaches. In addition, it enables us to avoid unnecessary clustering for one-cluster problems, which are a large proportion of the problems. As explanatory variables used in the prediction model, we used aggregated attribute values of the data set to be clustered, which can be computed efficiently. We described a basic learning-based method using a linear model as well as two extended methods: a two-stage method and a method using combinations of explanatory variables. Experimental results in author disambiguation showed that our learning-based methods outperformed existing methods and that the two extensions improved the performance of the basic linear model.
ACKNOWLEDGMENTS

This work was supported in part by Grants-in-Aid for Scientific Research (Nos. 18049041 and 19700091) from MEXT of Japan, a MEXT project entitled “Software Technologies for Search and Integration across Heterogeneous-Media Archives,” a Kyoto University GCOE Program entitled “Informatics Education and Research for Knowledge-Circulating Society,” and a Microsoft IJARC CORE4 project entitled “Toward Spatio-Temporal Object Search from the Web.”
REFERENCES

[1] T. Caliński and J. Harabasz, ‘A dendrite method for cluster analysis’, Communications in Statistics, 3(1), 1–27, (1974).
[2] J. A. Hartigan, Clustering Algorithms, Wiley, 1975.
[3] D. Pelleg and A. Moore, ‘X-means: Extending K-means with efficient estimation of the number of clusters’, in Proceedings of ICML 2000, pp. 727–734, (2000).
[4] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, 1998.
[5] A. H. Welsh, R. B. Cunningham, C. F. Donnelly, and D. B. Lindenmayer, ‘Modelling the abundance of rare species: Statistical models for counts with extra zeros’, Ecological Modelling, 88, 297–308, (1996).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-773
773
Active Concept Learning For Ontology Evolution

Murat Şensoy¹ and Pınar Yolum²

Abstract. This paper proposes an approach that enables agents to teach each other concepts from their ontologies using examples. Unlike other concept learning approaches, our approach enables the learner to elicit the most informative examples interactively from the teacher. Hence, the learner participates in the learning process actively. We empirically compare the proposed approach with previous concept learning approaches. Our experiments show that, using the proposed approach, agents can learn new concepts successfully and with fewer examples.
1 Introduction
In current approaches to concept learning, the learner is passive. That is, the training examples are chosen solely by the teacher. However, this assumes that the teacher has an accurate view of what the learner knows, which concepts are confusing for it, and so on. We propose to involve the learner in the learning process by enabling it to interact with the teacher to elicit the most useful examples for its understanding of the concept to be learned. In our approach, each agent represents its domain knowledge using an ontology and manages this ontology using a network of experts. Each expert is a stand-alone learner composed of one or more classifiers. The main task of an expert is to learn how to discriminate between the sub-concepts of a specific concept. An agent learns a new concept from another agent using our approach as follows:

1. The learner agent asks for positive examples of the concept from the teacher agent.
2. After receiving the positive examples, the learner determines the new concept's parent in its ontology using those positive examples. Then, the expert related to the parent concept is entitled to learn the new concept.
3. This expert determines some negative examples of the new concept using the positive examples and a semi-supervised learning approach. Hence, it first learns the new concept roughly without receiving any negative examples from the teacher.
4. The expert iteratively enhances its knowledge of the new concept by eliciting the most useful negative examples from the teacher.
5. After the new concept has been learned sufficiently well, it is placed into the learner's ontology and the ontology is modified accordingly.
2 Representing Knowledge
In current instance-based concept learning approaches, one classifier is trained to learn each concept independently [3]. Although the concepts are related through parent-child relationships, their classifiers are regarded as independent of one another. Such approaches require each classifier to learn how to discriminate instances of one concept from those of every other concept in the ontology. Therefore, in order to learn a single concept, the agent uses the whole domain knowledge. In this paper, we envision that the domain knowledge related to an ontology is managed by a set of experts, each of which is knowledgeable in a certain concept. By knowledgeable in a concept, we mean that the expert can correctly report which of the concept's subclasses an instance belongs to. Hence, each expert is trained with examples of the concept and nothing else. For example, an expert on motorcycles can tell us correctly that a Burgman 400 is a scooter.

1 This research has been partially supported by Boğaziçi University Research Fund under grant BAP07A102 and by The Scientific and Technological Research Council of Turkey through a CAREER Award under grant 105E073.
2 Department of Computer Engineering, Boğaziçi University, Bebek, 34342, Istanbul, Turkey, email: {murat.sensoy,pinar.yolum}@boun.edu.tr
3 Actively Learning A Concept
While the teacher teaches a new concept to the learner, it first selects a set of positive examples of the concept. This is relatively easier than selecting negative examples, which are chosen among the instances of every other concept. Then, the teacher gives the selected positive examples to the learner. In our approach, negative examples are not directly given by the teacher, because the teacher cannot estimate which examples are more useful or informative for the learner. The given positive examples are classified using the experts of the learner, and the most specific concept in the learner's ontology that subsumes all of the positive examples is determined. Assume that the teacher wants to teach the Motorcycles concept to the learner, so it first provides examples of motorcycles. The learner realizes that all of the provided examples are instances of the Car&Motorsports concept in its ontology. Hence, the learning task is delegated to the expert for the Car&Motorsports concept. The expert examines the other instances of Car&Motorsports to differentiate the given motorcycles from them as far as possible. Motorcycle instances should have some features in common that set them apart from the other instances of the Car&Motorsports concept. In order to determine which features are more important for the Motorcycles concept, the differences of the feature distributions between the positive examples and the unlabeled examples can be used [2, 5]. We can estimate how significant an instance I is as a motorcycle example using the significance of its features. After computing the significance value for each known instance of Car&Motorsports, the obvious negative examples of Motorcycles are chosen among the instances that have the lowest significance values. Using these negative examples and the positive examples provided by the teacher, the expert tries to learn the new concept roughly. Note that until now, the teacher has not provided any negative examples. Using the positive examples of Motorcycles and the obvious negative examples, the expert trains a classifier. This classifier can
roughly discriminate instances of Motorcycles from the other instances of Car&Motorsports. However, the boundary between these two classes is not yet learned precisely, because only the obvious negative examples are used for training. Moreover, some of these negative examples may be wrongly chosen, which can seriously affect the performance of the trained classifier. Therefore, the expert iteratively elicits more useful negative examples from the teacher and learns this boundary more precisely and correctly. Specifically, at each iteration, the expert samples instances of Car&Motorsports and then, using the classifier, labels these sampled instances as instances of Motorcycles or not. Then, the teacher instructs the expert about the correct labels of these examples. The feedback from the teacher is used to refine and improve the knowledge of the expert about the new concept Motorcycles. This iterative active-learning phase continues until the teacher is satisfied that the learner has correctly learned the concept. Then, the new concept is placed into the learner's ontology as a new subconcept of Car&Motorsports. Lastly, we test whether the Motorcycles concept subsumes some subconcepts of Car&Motorsports; if this is the case, the concept-subconcept relationships are rearranged.
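A minimal sketch of this loop is given below; the significance score (a simple similarity to the positive centroid) and the classifier choice are stand-ins for the feature-distribution techniques of [2, 5] and the C4.5 trees used in the experiments, and `teacher_label` is a hypothetical callable representing the teacher's feedback.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def learn_concept(positives, unlabeled, teacher_label, n_rounds=5, n_query=20):
    """positives, unlabeled: (n, d) feature matrices (e.g. word counts);
    teacher_label(x) -> 1 if x is a positive example of the concept, else 0."""
    # Step 1: score unlabeled instances with a crude significance measure.
    significance = unlabeled @ positives.mean(axis=0)
    # Step 2: take the least significant instances as obvious negatives.
    negatives = unlabeled[np.argsort(significance)[:len(positives)]]
    X = np.vstack([positives, negatives])
    y = np.array([1] * len(positives) + [0] * len(negatives))
    clf = DecisionTreeClassifier(random_state=0).fit(X, y)  # rough concept boundary
    # Steps 3-4: iteratively elicit the teacher's labels for sampled instances.
    rng = np.random.default_rng(0)
    for _ in range(n_rounds):
        idx = rng.choice(len(unlabeled), size=n_query, replace=False)
        queried = unlabeled[idx]
        labels = np.array([teacher_label(x) for x in queried])  # teacher feedback
        X = np.vstack([X, queried])
        y = np.concatenate([y, labels])
        clf = DecisionTreeClassifier(random_state=0).fit(X, y)  # refined boundary
    return clf
```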
4 Evaluation
In order to evaluate our approach, we conduct several experiments in the online shopping domain. For this purpose, we derive domain knowledge from Epinions³. In our experiments, there is one teacher agent and one learner agent. In the implementation of the agents and the experts, we use Java and the C4.5 decision tree classifier of the WEKA data mining project [4]. In our experiments, an instance refers to a product item such as IBM ThinkPad T60, which is an instance of the PCLaptops concept. Each product item has a web page on the Epinions website, and this page contains the specification of the product item in English. We derive a core vocabulary from these specifications automatically, and each word in this vocabulary is used as a feature [2]. Figure 1 shows the performance of our approach at each iteration in terms of the probability of misclassification. After the first iteration, the expert has learned the new concept only roughly (with 12% error). This error rate is not acceptable for the teacher, so the expert continues with the next iteration. The second iteration results in considerable progress in the learning performance (the error drops to 4%). The classification error drops to zero at the fifth iteration, which means that the teacher and the learner have exactly the same understanding of this concept.

Figure 1. Probability of misclassification at different iterations.

We compare our approach with a teacher-driven concept learning approach. This approach represents the current concept learning approaches in the literature. Contrary to the proposed approach, in those approaches the learner is inactive during the selection of the negative examples [3, 1]. The teacher selects the negative examples using its own ontology and viewpoint. Then, the learner is given positive and negative examples of the concept to be taught. In order to measure how successful our approach is in learning a new concept for different numbers of negative examples, we set up experiments where the teacher is allowed to give or label only a predefined number of negative examples. Then, these examples are given to the learner (as feedback in our approach). After training the learner with these examples, the probability of misclassification is computed. Figure 2 compares the results for the teacher-driven approach and the proposed approach.

3 http://www.epinions.com
Figure 2. Probability of misclassification with different numbers of negative examples.
As seen in Figure 2, the teacher-driven approach requires more negative examples than the proposed approach in order to achieve acceptable performance. With only five negative examples, the learner that uses the proposed approach fails on only 12% of its classifications. In the same setting, the learner using the teacher-driven approach misclassifies an instance with a probability slightly higher than 0.4. Similarly, with only 35 negative examples on average, the proposed approach can learn a concept perfectly, while the teacher-driven approach requires approximately 150 negative examples for the same quality of learning.
5 Discussion
This paper develops a framework for instance-based concept learning in which a learner can estimate some negative examples of the concept to be learned and obtain feedback about these negative examples from the teacher, in order to learn the concept accurately. Our experiments show that our approach significantly outperforms a teacher-driven approach, representative of other instance-based concept learning approaches in the literature, by enabling learners to learn a concept with fewer examples.
REFERENCES
[1] A. Doan, J. Madhavan, R. Dhamankar, P. Domingos, and A. Halevy. Learning to match ontologies on the semantic web. VLDB Journal, pages 303–319, 2003.
[2] G. P. C. Fung, J. X. Yu, H. Lu, and P. S. Yu. Text classification without negative examples revisit. IEEE TKDE, 18(1):6–20, 2006.
[3] S. Sen and P. Kar. Sharing a concept. In Working Notes of the AAAI-02 Spring Symposium, pages 55–60, 2002.
[4] I. H. Witten and E. Frank. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, San Francisco, 2005.
[5] H. Yu, J. Han, and K. C.-C. Chang. PEBL: Web page classification without negative examples. IEEE TKDE, 16(1):70–81, 2004.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-775
775
Determining Automatically the Size of Learned Ontologies

Elias Zavitsanos¹,² and Sergios Petridis¹ and Georgios Paliouras¹ and George A. Vouros²

Abstract. Determining the size of an ontology that is automatically learned from texts is an open issue. In this paper, we study the similarity between ontology concepts at different levels of a taxonomy, quantifying in a natural manner the quality of the ontology attained. Our approach is integrated in a method for language-neutral learning of ontologies from texts, which relies on conditional independence tests over thematic topics that are discovered using LDA.
1 INTRODUCTION
Ontology learning is commonly viewed [1, 3] as the task of extending or enriching a seed ontology with new ontology elements mined from text corpora. While much work concentrates on enriching existing ontologies, in this paper we propose an automated statistical approach to ontology learning that does not presuppose the existence of a seed ontology. The proposed method tackles both the task of concept identification and that of taxonomy construction. Among the difficulties of such an endeavor is the determination of the appropriate depth of the subsumption hierarchy, given the text collection at hand. The benefit of being able to determine the depth of a taxonomy is that the hierarchy captures accurately the domain knowledge provided by the texts, reducing the extent of overlap among concepts and providing a coherent representation of the domain. In the proposed method, concepts are identified and represented as multinomial distributions over terms in documents, using the Markov Chain Monte Carlo (MCMC) process of Gibbs sampling [4], following the Latent Dirichlet Allocation (LDA) [2] model. To discover the subsumption relations between the identified concepts, conditional independence tests among these concepts are performed. Finally, statistical measures between the discovered concepts at different levels of the hierarchy are used to optimize the size of the ontology.
2 THE PROPOSED METHOD
Given a corpus of documents, treating each document as a bag of words, we remove the stop-words. The remaining words form the term space for the application of the topic generation model (LDA). The next step creates a Document-Term matrix, each entry of which records the frequency of each term in each document. This matrix is used as input to LDA. Next, the iterative task of the learning method is initiated. Sets of topics, which we call layers, are generated by the iterative application of LDA. Starting with one topic and incrementing the number of topics in each iteration, layers with more topics are generated. A layer comprising few topics attempts to capture all the knowledge of the corpus through generic topics. As the number of topics increases, the topics become more focused, capturing more detailed domain knowledge. Thus, the method starts from “general” topics, iterates, and converges to more “specific” ones. In each iteration, the method identifies the subsumption relations that hold between topics of different layers according to their conditional independencies. Since the generated topics are random variables, e.g. A and B, by measuring their mutual information we obtain an estimate of their mutual dependence. Given a third variable C that makes A and B conditionally independent, the mutual information of topics A and B is reduced and is captured by topic C, i.e., C is a broader topic than the others. Thus, we may safely assume that C subsumes both A and B, and the corresponding relations are added to the ontology. Moreover, C has been generated before A and B. Thus, it belongs in a layer that contains topics that are broader in meaning than the ones in the layer of A and B. A significant contribution is the determination of the appropriate depth of the hierarchy from the given corpus of documents. We use a criterion based on the similarity of topic distributions that indicates convergence towards the appropriate depth. We thus improve on our recent work [5] by fitting this criterion, which controls the iterative process of topic discovery. This stopping criterion is based on the symmetric KL divergence between concepts of different levels that participate in subsumption relations. The intuition is that the KL divergence between concepts that belong in the top levels of the hierarchy should be higher than the KL divergence between concepts that belong in the lower levels. This is because the top concepts are broader in scope than lower ones, and the “semantic distance” between them and their children is expected to be higher than that between more specific concepts and their children. To validate this assumption, we have experimented with the Genia³ and the Lonely Planet gold ontologies and the corresponding corpora⁴. In order to measure the similarity of the concepts in the ontologies using statistical measures, we represented the concepts of each gold-standard ontology as probability distributions over the term space of the corresponding corpus. To create such a representation, we have to measure the frequency of the terms that appear in the context of each concept. In both corpora, the concept instances are annotated in the texts, providing direct population of the concepts in the gold-standard ontologies with their instances. Therefore, it is possible to associate each document with the concept(s) that it refers to, by counting the concept instances that appear in the document.

1 Inst. of Informatics and Telecommunications, NCSR “Demokritos”, Greece, email: {izavits | petridis | paliourg}@iit.demokritos.gr
2 University of Aegean, Dpt. of Information and Communication Systems Engineering, Greece, email: georgev@aegean.gr
3 The GENIA project, http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA
4 The Lonely Planet travel advice and information, http://www.lonelyplanet.com/
Thus, we create feature vectors based on the documents in which each concept appears. These feature vectors form a two-dimensional matrix that records the frequency of each term in the context of each concept. That is, we have a representation of each concept as a distribution over the term space of the text collection. For each concept, the frequencies are normalized, giving a probability distribution over the term space. Figure 1 depicts the results obtained by measuring the similarity between concepts that participate in subsumption relations, in the case of the Genia and the Lonely Planet gold standard ontologies. Small values of KL divergence indicate high similarity between concepts. Figure 1 also confirms our assumption that concepts at the lower levels of the hierarchy are more similar to their children than concepts at higher levels of the hierarchy.

Figure 1. Average KL divergence of subsumed concepts in the Genia and the Lonely Planet gold standard ontologies.
Based on this approach, we define a relative criterion that indicates how deep the hierarchy should be according to the information provided by the corpus of documents. This criterion, which controls the iterative task of the proposed method, is defined as:

$$1 - \frac{KL_{\mathrm{bottom}}}{KL_{\mathrm{top}}} < \varepsilon$$

KL_top corresponds to the average symmetric KL divergence between the concepts of level l and the concepts of level l+1. KL_bottom is the average symmetric KL divergence between the concepts at level l+1 and the concepts at level l+2. Values close to 0 indicate that the newly added level of concepts does not differ much from the parent concepts. Thus we are reaching maximum “specificity” and therefore the optimal depth. In practice, the parameter ε is set to a very small value close to zero to avoid small rounding errors during the computations.
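A direct transcription of this stopping test, under our own assumptions about how the layers are stored, is sketched below.

```python
import numpy as np

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two term distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def avg_level_kl(child_level, parent_level):
    """Average symmetric KL between each child topic and its parent.
    A level is assumed to be a list of (distribution, parent_index) pairs;
    this data layout is our illustration, not the paper's."""
    return float(np.mean([sym_kl(parent_level[pa][0], dist)
                          for dist, pa in child_level]))

def should_stop(levels, epsilon=1e-3):
    """Stopping test 1 - KL_bottom / KL_top < epsilon over the three most
    recently generated levels l, l+1, l+2."""
    kl_top = avg_level_kl(levels[-2], levels[-3])
    kl_bottom = avg_level_kl(levels[-1], levels[-2])
    return 1.0 - kl_bottom / kl_top < epsilon
```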
3 EVALUATION
We have evaluated the proposed method on both corpora introduced in section 2. Our evaluation procedure uses the representation of the gold-standard concepts as probability distributions over the term space of the documents, as explained in section 2. In addition, the concepts of the produced hierarchy have exactly the same representation: they are probability distributions over the same term space. We can thus perform a one-to-one comparison of the gold-standard concepts and the produced topics. Specifically, a topic is matched to a concept if their corresponding distributions are the closest among all candidates and their KL divergence is below a fixed threshold th_KL. The quantitative results have been produced using the metrics of Precision and Recall. The choice of the threshold th_KL affects the quantitative results, since a strict choice would force few topics to be matched with gold-standard concepts, while a loose choice would cause many topics to be matched with gold-standard concepts. We chose a value of th_KL = 0.2 for the purposes of our evaluation, as we observed relative insensitivity of the results for values between 0.2 and 0.4, and we opted for the more conservative value in this plateau. Table 1 depicts the results.

Table 1. Evaluation results for the Genia and the Lonely Planet corpora.

                                                     Precision  Recall  F-measure
Concept Identification (Genia)                          94%       76%      84%
Subsumption Hierarchy Construction (Genia)              93%       75%      83%
Concept Identification (Lonely Planet)                  62%       36%      44%
Subsumption Hierarchy Construction (Lonely Planet)      53%       35%      42%

To obtain a more detailed picture of the performance of the method, we replaced the stopping criterion with predefined depths for the learned hierarchy and experimented on both corpora. Figure 2 presents the evaluation results in terms of the F-measure for various depths of the hierarchy, using the same evaluation style. Figure 2 shows that for a predefined depth of 8 levels in the case of Genia, or 10 levels in the case of Lonely Planet, the F-measure is maximized, reaching the values of Table 1. Therefore, the method determined correctly the appropriate depth in both corpora.
Figure 2. F-measures for Concept Identification and Subsumption Hierarchy Construction for the Genia (left) and the Lonely Planet (right) corpora.
ACKNOWLEDGEMENTS

The presented work was supported by the research and development project ONTOSUM⁵, funded by the Greek General Secretariat for Research and Technology.
REFERENCES

[1] E. Agirre, O. Ansa, E. Hovy, and D. Martinez, ‘Enriching very large ontologies using the www’, in Ontology Construction Workshop, (2000).
[2] D. M. Blei, A. Y. Ng, and M. I. Jordan, ‘Latent dirichlet allocation’, Journal of Machine Learning Research, (2003).
[3] A. Faatz and R. Steinmetz, ‘Ontology enrichment with texts from the www’, in Semantic Web Mining Workshop ECML/PKDD, (2002).
[4] T. Griffiths and M. Steyvers, ‘A probabilistic approach to semantic representation’, in Conference of the Cognitive Science Society, (2002).
[5] E. Zavitsanos, G. Paliouras, G. A. Vouros, and S. Petridis, ‘Discovering subsumption hierarchies of ontology concepts from text corpora’, in Proceedings of the International Conference on Web Intelligence, (2007).
5 See also http://www.ontosum.org/
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-777
777
Dynamic Multi-Armed Bandit with Covariates

Nicos G. Pavlidis¹, Dimitris K. Tasoulis¹, Niall M. Adams² and David J. Hand¹,²

1 Multi-Armed Bandit with Covariates
In numerous real-world problems an agent selects repeatedly among a number of actions whose rewards are uncertain, aiming to maximise the overall reward. Each action-selection generates a reward and provides new information that reduces the agent's uncertainty about the selected action. Ideally, one would like to maximise both immediate reward and the reduction of uncertainty, but the two objectives are often conflicting. The action with the highest expected reward is typically selected frequently, and therefore there is relatively little uncertainty about its reward. On the other hand, actions whose rewards are more uncertain are characterised by a low expected reward. In this class of problems, the central issue is how to resolve the dilemma between discovering new knowledge (exploration) and maximising reward based on existing knowledge (exploitation). The multi-armed bandit problem [3] is the simplest model to study this trade-off. In this problem an agent chooses repeatedly among n different actions, A = {α1, α2, ..., αn} [4]. Each action selection, called a play, generates a numerical reward derived from the probability distribution associated with the chosen action. The learning problem that the agent faces is that of sampling sequentially from n populations with different probability densities, in order to maximise the cumulative reward. We study an extension of the standard problem in which the agent at each play observes a covariate, x(t) ∼ N(μ, σ²), prior to selecting an action, and the reward is a linear function of the covariate:

$$r_{\alpha_i}(x(t)) = \beta_0^i + \beta_1^i x(t) + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_i^2). \qquad (1)$$
Such problems are encountered in real-world applications like scheduling, resource allocation and disaster management. A policy is a rule that associates states, x(t), with actions, αi. In a static environment, the learning problem that the agent faces is to identify the optimal policy, i.e. to select the action with the highest expected reward based on all possible realisations of x(t). Yang and Zhu [6] have shown that, for a variety of nonparametric regression estimators, the ε-greedy strategy [5] with ε decreasing towards zero converges asymptotically to the optimal policy. The performance of a number of action-selection methods for the static multi-armed bandit problem with covariates was evaluated in [2]. In this work we investigate a dynamic environment in which the coefficients of all the reward functions change over time, according to a random walk. The agent does not learn the underlying dynamics of the environment. In this setting there is no optimal policy, in the sense that there can be no fixed rule that associates states with optimal actions, since the best action for a given state changes over time. The agent instead attempts to formulate accurate estimates of the coefficients of the reward functions, which are used by the ε-greedy action-selection strategy. The estimation is performed using the Adaptive Recursive Least Squares algorithm (ARLS) [1]. This algorithm handles dynamics by incorporating a forgetting factor, λ(t) ∈ (0, 1], that is optimised at each iteration with respect to the estimation error using a stochastic gradient descent process. As λ tends to zero, the extent of forgetting increases, and vice versa as λ tends to unity.

1 Institute for Mathematical Sciences, 2 Department of Mathematics, Imperial College London, South Kensington Campus, London SW7 2AZ, United Kingdom, email: {n.pavlidis, d.tasoulis, n.adams, d.j.hand}@imperial.ac.uk
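As an illustration of this setup, here is a minimal simulation sketch. For simplicity it uses a fixed forgetting factor rather than the gradient-adapted λ(t) of ARLS, and all constants are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, lam, eps, sigma_rw = 10, 0.98, 0.1, 0.1    # lam fixed here; ARLS adapts it
true_beta = rng.normal(size=(n_arms, 2))           # drifting [beta0, beta1] per arm
beta_hat = np.zeros((n_arms, 2))                   # agent's estimates
P = np.stack([np.eye(2) * 100.0 for _ in range(n_arms)])  # RLS inverse covariances

for t in range(2000):
    x = rng.normal()                               # covariate x(t) ~ N(0, 1)
    phi = np.array([1.0, x])
    # epsilon-greedy selection on the predicted rewards beta_hat @ phi
    if rng.random() < eps:
        a = int(rng.integers(n_arms))
    else:
        a = int(np.argmax(beta_hat @ phi))
    r = true_beta[a] @ phi + rng.normal(0.0, 0.5)  # observed reward, Eq. (1)
    # recursive least squares update with forgetting factor lam
    k = P[a] @ phi / (lam + phi @ P[a] @ phi)      # gain vector
    beta_hat[a] += k * (r - beta_hat[a] @ phi)     # correct by prediction error
    P[a] = (P[a] - np.outer(k, phi @ P[a])) / lam
    # random-walk drift of the true coefficients
    true_beta += rng.normal(0.0, sigma_rw, size=true_beta.shape)
```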
2 Experiments and Results
We resort to simulations because theoretical guarantees of convergence for bandit algorithms in the static environment may not hold in dynamic environments. In all our numerical experiments we consider a 10-armed bandit problem with a one-dimensional covariate, x(t) ∼ N(0, 1). The actions are oriented so that each is optimal in a region of the domain of x with probability 1/10, and σi ∼ N(0, 0.5). The coefficients of all the arms change at each play by following a random walk with constant variance: β_j^i(t+1) = β_j^i(t) + ε, ε ∼ N(0, σ²rw), for all i = 1, 2, ..., 10, and j = 0, 1. The agent employs a separate ARLS algorithm for the estimation of the coefficients of each of the 10 actions. At each play, the update of the estimated coefficients of the selected action also updates the value of the forgetting factor. For simplicity, a common forgetting factor, λ(t), is used by all the ARLS algorithms. At present we consider only the ε-greedy action-selection strategy [5]: with probability (1 − ε) it selects the action with the highest expected reward based on the current parameter estimates, and with probability ε a random action is selected. In a static environment, ε determines the balance between exploration and exploitation. In a dynamic environment this distinction is not so clear. Since all the reward functions change at each play, selecting the greedy action based on current estimates can be seen as exploring the change of this action. We first investigate the relation between the variance of the random walk of the coefficients, σ²rw, and the evolution of the optimal forgetting factor for temporal prediction, λ(t). Fig. 1 illustrates the average λ(t) over 100 simulations using the 0.1-greedy strategy for values of σ²rw ∈ [0, 1] with stepsize 0.1. Since σ²rw is constant throughout a simulation, for a particular setting of σ²rw and ε, λ(t) tends to oscillate around a fixed value over a simulation. Increasing the volatility of the coefficients of each arm results in a decline of the forgetting factor, indicating that the estimation becomes progressively more sensitive to recent observations. However, the forgetting factor is related not only to the variance of the random walk, σ²rw, but also to the degree of exploration, ε. Fig. 2 illustrates the mean value of λ over an entire simulation with respect to σ²rw and ε. The reported results are averages over 100 simulations.
Figure 1. Evolution of λ(t) over 2000 plays using the 0.1-greedy strategy for different values of the variance.
Figure 2.
2 and ε. Mean value of λ(t) for different values of σrw
to zero the action-selection strategy becomes equivalent to greedy and typically chooses among very few (usually two) of the available actions. Increasing ε, on the other hand, tends to make all the available actions equally likely to be selected. Consider the case when action αi is one of the actions the agent chooses. If the probability of choosing αi is one half, then this action is selected on average once every two plays. Therefore, each time the estimated coefficients of this action are updated the random walk dynamics have been applied twice on average to the true coefficients. If on the contrary, the ten actions are equiprobable, then action αi will be chosen on average once every ten plays, and hence the random walk dynamics will have been applied ten times between two consecutive updates. Increasing the mean number of plays between two consecutive updates of the estimated regression coefficients for an action renders the estimates less accurate at the time of the update. Thus, increasing the degree of exploration has an impact similar to that of increasing the speed at which the underlying environment is changing, by decreasing the sampling frequency of the actions that are chosen. Next, we investigate the relationship between the degree of exploration and the variance of the random walk. For different values 2 of σrw ∈ [0, 1], 1000 simulations were performed for each value of ε ∈ [0, 0.5] with stepsize 0.01. Performance is measured in terms of the mean proportion of times the best action is selected over a simulation. We do not consider cumulative reward because the re-
Conclusions
We study a dynamic version of the multi-armed bandit problem with covariates, in which the coefficients of the reward functions follow a random walk. The agent employs the adaptive recursive least squares algorithm, which is capable of handling a changing environment by endogenously adapting the degree of forgetting. Hence the agent attempts to perform optimal temporal prediction and does not model explicitly the dynamics of the underlying environment. We consider the ε-greedy action-selection strategy. Experimental results indicate that the degree of forgetting is related not only to the magnitude of the variance of the random walk, but also to the extent of exploration. Indeed in this problem increasing exploration has the same impact on the forgetting factor as increasing the speed of change. This can be justified by the fact that more exploration decreases the sampling frequency of the actions that the agent actually performs. The results for different values of the variance of the random walk suggest that this type of dynamics always decreases the difficulty of the problem by making some actions globally suboptimal. This renders the optimal degree of exploration very small. It also suggests that challenging dynamic real-world problems that can be formulated as a multi-armed bandit problem with covariates are unlikely to be governed by this type of dynamics.
ACKNOWLEDGEMENTS This research was undertaken as part of the ALADDIN (Autonomous Learning Agents for Decentralised Data and Information Systems) project and is jointly funded by a BAE Systems and EPSRC (Engineering and Physical Research Council) strategic partnership, under EPSRC grant EP/C548051/1. David J. Hand was partially supported by a Royal Society Wolfson Research Merit Award.
REFERENCES [1] S. Haykin, Adaptive Filter Theory, Prentice-Hall International, 1996. [2] N. G. Pavlidis, D. K. Tasoulis, and D. J. Hand, ‘Simulation studies of multi-armed bandits with covariates’, in Proceedings of the EUROSIM/UKSim 2008. IEEE, (2008). [3] H. Robbins, ‘Some aspects of the sequential design of experiments’, Bulletin of the American Mathematical Society, 55, 527–535, (1952). [4] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998. [5] C. J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. dissertation, Cambridge University, 1989. [6] Y. Yang and D. Zhu, ‘Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates’, The Annals of Statistics, 30(1), 100–121, (2002).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-779
779
Reinforcement Learning with the Use of Costly Features

Robby Goetschalckx², Scott Sanner¹ and Kurt Driessens²

Abstract. A common solution approach to reinforcement learning problems with large state spaces (where value functions cannot be represented exactly) is to compute an approximation of the value function in terms of state features. However, little attention has been paid to the cost of computing these state features (e.g., search-based features). To this end, we introduce a cost-sensitive sparse linear-value function approximation algorithm — FOVEA — and demonstrate its performance on an experimental domain with a range of feature costs.
1 Introduction
Reinforcement learning problems with large state spaces often preclude the representation of a fully enumerated value function. In this case, a common solution approach is to compute an approximation of the value function in terms of state features. While value function approximation is well-addressed in the reinforcement learning literature (cf. Chapter 8 of [4]), the cost of feature computation is often ignored. Yet in the presence of costly features, a feature's cost must be traded off against its impact on value-prediction accuracy. While reinforcement learning is often modelled as a Markov decision process (MDP) [3], one might consider modelling the function approximation setting with costly features as a partially observable MDP (POMDP) [2] by using information-gathering actions to represent the computation of costly features. In theory, an optimal policy for this POMDP would select the features to compute at any decision stage so as to optimally trade off feature cost w.r.t. its impact on future reward. However, such a framework requires embedding an already difficult-to-solve MDP inside a POMDP; in general, solutions to such a POMDP will not be feasible in practice. Here we propose a more pragmatic approach where we learn the relative value of features in an explicit way. To do this, we approximate the value function using cost-sensitive sparse linear regression techniques, directly trading off prediction errors with feature costs.
2 MDPs and Reinforcement Learning
We briefly review Markov decision processes (MDPs) [3] and reinforcement learning (RL) [4]. Formally, an MDP can be defined as a tuple ⟨S, A, T, R, γ⟩. S = {s1, ..., sn} is a finite set of fully observable states. A = {a1, ..., am} is a finite set of actions. T : S × A × S → [0, 1] is a stationary, Markovian transition function. A reward R : S × A → ℝ is associated with every state and action. γ is a discount factor s.t. 0 ≤ γ < 1, used to specify that a reward obtained t timesteps into the future is discounted by γ^t. γ = 1 is permitted if the total accumulated reward is finite.
1 National ICT Australia, email: Scott.Sanner@nicta.com.au
2 Declarative Languages and Artificial Intelligence, Katholieke Universiteit Leuven, Leuven, Belgium, email: {robby,kurtd}@cs.kuleuven.be
A policy π : S → A specifies the action a = π(s) to take in each state s. The value Q^π(s, a) of taking an action a in state s and then following the policy π thereafter can be defined using the infinite horizon, expected discounted reward criterion:

$$Q^{\pi}(s,a) = E_{\pi}\!\left[\left.\sum_{t=0}^{\infty} \gamma^{t} \cdot r_{t}\,\right|\, s_{0}=s,\ a_{0}=a\right] \qquad (1)$$
where r_t is the reward obtained at time t (assuming s_0 and a_0 respectively represent the state and action at t = 0). The objective in an MDP is to find a policy π* such that ∀π, s. Q^{π*}(s, π*(s)) ≥ Q^{π}(s, π(s)). An optimal policy π* is guaranteed to exist. In the RL setting, the transition and reward model may not be explicitly known to the agent although both can be sampled from experience. Here, we use the generalized policy iteration (GPI) framework known to capture most reinforcement learning algorithms [4]. GPI interleaves policy evaluation and update stages as follows:

Generalized Policy Iteration (GPI)
1. Start with an arbitrary initial policy π_0 and set i = 0.
2. Estimate Q^{π_i}(s, a) (e.g., from samples using Equation 1).
3. Let π_{i+1}(s) = arg max_{a ∈ A} Q^{π_i}(s, a).
4. If termination criteria are not met, let i = i + 1 and go to step 2.
Every RL algorithm that is an instance of GPI may prescribe its own method for performing each step, and many GPI instances guarantee convergence to π* or an approximation thereof. We keep our treatment of reinforcement learning with costly features as general as possible. Specifically, this means that in the context of GPI, we can restrict our discussion of RL with costly features to that of cost-efficient Q-value approximation in step 2 of GPI.
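To make the GPI loop concrete, the following is a minimal sketch (ours, not the authors' code) of step 2 as Monte-Carlo evaluation followed by the greedy update of step 3; the environment interface env.reset()/env.step(s, a) and all hyper-parameters are illustrative assumptions.

import random
from collections import defaultdict

def gpi(env, actions, gamma=0.9, episodes=200, horizon=50, iters=20):
    """Generalized Policy Iteration: Monte-Carlo estimation of Q^pi (step 2)
    alternated with greedy policy improvement (step 3)."""
    policy = defaultdict(lambda: random.choice(actions))   # arbitrary pi_0
    q = {}
    for _ in range(iters):
        returns = defaultdict(list)                        # (s, a) -> sampled returns
        for _ in range(episodes):
            s, a = env.reset(), random.choice(actions)     # exploring start
            trajectory = []
            for _ in range(horizon):
                s2, r = env.step(s, a)                     # assumed interface
                trajectory.append((s, a, r))
                s, a = s2, policy[s2]
            g = 0.0                                        # discounted return, Equation 1
            for s_t, a_t, r_t in reversed(trajectory):
                g = r_t + gamma * g
                returns[(s_t, a_t)].append(g)
        q = {sa: sum(v) / len(v) for sa, v in returns.items()}
        for (s, _) in q:                                   # step 3: greedy in Q^{pi_i}
            policy[s] = max(actions, key=lambda a: q.get((s, a), float("-inf")))
    return policy, q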
3 Cost-efficient Value-approximation
We represent a Q-value approximation Q̂^π_w(s, a) w.r.t. policy π as a linear combination of a feature set F = {f1, . . . , fk} with weights w = ⟨w0, . . . , wk⟩, where each fi : S × A → R and each wi ∈ R:

$$\hat{Q}^{\pi}_{w}(s,a) = w_{0} + \sum_{f_{i} \in F} w_{i}\, f_{i}(s,a) \qquad (2)$$
We assume each feature fi is associated with a cost c_{fi} ∈ R expressed in the same units as prediction error. Our task will be to find feature weights w that trade off Q-value accuracy with feature cost. At step 2 of GPI, we assume that we are given data D = {Q^π_{s,a}} consisting of sampled Q-values to approximate. Then we define the optimal cost-efficient value approximation w* as follows:

$$w^{*} = \operatorname*{argmin}_{w}\ \frac{1}{|D|}\sum_{Q^{\pi}_{s,a}\in D}\big[Q^{\pi}_{s,a} - \hat{Q}^{\pi}_{w}(s,a)\big]^{2} \;+\; \sum_{f_{i}\in F}\frac{c_{f_{i}}}{1-\gamma}\, \mathbb{I}[w_{i} \neq 0]$$

Here, I[·] is 1 when its argument is true and 0 otherwise. We see
that the optimal setting of w* directly trades off prediction error with feature cost (divided by (1 − γ) to account for the future discounted cost of feature evaluation at every time step). Unfortunately, this optimization objective is not convex due to step discontinuities where any wi = 0, and it is thus not easily amenable to finding a global optimum. However, observing that weight sparsity encourages low feature cost, we can modify sparse linear regression approaches to encourage sparsity for a feature weight in a manner proportional to its cost. To do this, we focus on a class of sparse linear regression techniques collectively referred to as least-angle regression (LAR) methods, such as the lasso and forward stepwise regression [1]. Fortunately, a simple modification of forward-stepwise regression provides us with an efficient algorithm, FOVEA, for approximating the solution to our optimization problem. We present FOVEA below and refer the reader to the detailed discussion in [1] for the original algorithm.
Figure 1. Prediction error (RMSE) vs. the feature cost of the prediction (RMSE and the paid prediction cost plotted against c2 over [0, 0.5]).
Forward-stepwise Value Approximation (FOVEA)
1. Input: Q-value samples D = {Q^π_{s,a}} for policy π (e.g., computed from sample trajectories for a policy π using Equation 1).
2. Initialize the active feature set F = ∅.
3. Initialize wi = 0 for i ≥ 1 and w0 with the average value of Q_{s,a} ∈ D (this gives the residuals a mean of 0).
4. Normalize all feature predictions to have 0 mean and a standard deviation of 1.
5. Initialize the step-size η to some small positive value.
6. Repeat the following:
(a) Compute a vector of residuals r and a vector of feature values fi with entries for each data sample Q^π_{s,a} ∈ D, where the residual is Q^π_{s,a} − Q̂^π_w(s, a) and the feature value is fi(s, a).
(b) Calculate the cost-penalized correlation score for all candidate features fi:

$$score_{i} = \frac{1}{\sqrt{|D|}}\,\big|f_{i}\cdot r\big| \;-\; \mathbb{I}[f_{i}\notin F]\, c_{f_{i}}$$

(c) Find the feature fi with the highest score_i ≥ η; if no such feature is found then halt and Output: w.
(d) If fi ∉ F, let F = F ∪ {fi}; wi = wi + sgn(fi · r)·√(c_{fi}).3
(e) Else let wi = wi + sgn(fi · r)·η.
It is important to note that the forward-stepwise approach is a greedy selection approach and thus the result obtained might not be the optimal one in all cases. However, we can still prove a form of local optimality during the progression of the FOVEA algorithm:
Theorem 1 Every feature fi which is introduced in step 6d of the FOVEA algorithm immediately reduces the mean squared error of the prediction by the value of its cost c_{fi}.
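Read alongside the numbered steps, the following compact sketch (ours) mirrors the loop; the data layout and the exact cost penalty in the score follow our reconstruction of the partially garbled formulas above, so treat those details as assumptions.

import numpy as np

def fovea(Phi, q, costs, eta=0.01, max_steps=100000):
    """Phi: (|D|, k) matrix of feature values f_i(s, a), assumed already
    normalized to zero mean and unit deviation (step 4); q: sampled Q-values;
    costs: per-feature costs c_{f_i}."""
    n, k = Phi.shape
    w0, w = q.mean(), np.zeros(k)          # step 3: intercept = average Q-value
    active = set()                         # step 2: no feature selected yet
    for _ in range(max_steps):
        r = q - (w0 + Phi @ w)             # step 6a: residuals
        corr = Phi.T @ r                   # f_i . r for every feature
        penalty = np.array([0.0 if i in active else costs[i] for i in range(k)])
        score = np.abs(corr) / np.sqrt(n) - penalty        # step 6b
        i = int(np.argmax(score))
        if score[i] < eta:                 # step 6c: halt and output w
            break
        if i not in active:                # step 6d: introduce feature, pay its cost
            active.add(i)
            w[i] += np.sign(corr[i]) * np.sqrt(costs[i])
        else:                              # step 6e: small step along the correlation
            w[i] += np.sign(corr[i]) * eta
    return w0, w, active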
4 Experiments
We evaluated GPI using FOVEA on a simple deterministic corridor domain. The state space consists of five rooms, labeled s1, . . . , s5. From each state two actions, +1 and −1, can be performed. Performing action +1 in state si for i < 5 leads to s_{i+1} and performing −1 in state si for i > 1 leads to s_{i−1}. All other actions take the agent to the center s3. A reward of 1 is assigned for taking +1 in s5 and −1 is assigned for taking action −1 in s1. All other rewards are equal to 0. We used a discount factor γ = 0.9.
3 sgn(·) is +1 if its argument is non-negative and −1 otherwise.
We provided seven state-action indicator features fi for 0 ≤ i ≤ 6 to the agent, where taking action a ∈ {+1, −1} in state si results in f_{i+a} = 1 with all remaining indicator features set to 0. f0, f2, f3, f5 and f6 are free and are assigned cost c1 = 0, while f1 and f4 have a cost c2. Furthermore, two random number generators were provided to the agent, one of which was free and another which had a cost c3 > c2. Finally, the state-action feature indicators f0, . . . , f6 were copied, but now with the higher cost c3. We used forward-stepwise value approximation to approximate Q-values using the state-action features defined above. We used 100 samples for each forward-stepwise update. All results shown are averages over 10 runs. We varied the value of c2 over a range of 0 to 0.5. If the agent does not pay c2 for f1 or f4, it cannot distinguish between s1 and s4 (if it pays the cost of only one of f1 or f4, it can still infer the other by absence). The results in Figure 1 demonstrate the effectiveness of FOVEA. Initially the agent pays 2c2 for both f1 and f4 (illustrating slight sub-optimality by paying for both features due to inherent statistical noise in the estimation process, but still avoiding the useless features that cost c3) until it realizes, for c2 > 0.05, that it can just pay c2 for one of these features and still obtain low prediction error. However, for c2 > 0.185, the agent refuses to pay the cost for either f1 or f4 since the cost exceeds the future expected reward. As such, there is a clear phase transition near c2 = 0.185, where the paid feature cost decreases rapidly while the prediction error likewise increases.
5 Future Work
Perhaps the most important area of future work is to explore efficient extensions to handle state- and action-dependent feature selection.
Acknowledgements This research was sponsored by the fund for scientific research (FWO) of Flanders, of which Kurt Driessens is a postdoctoral fellow, and by National ICT Australia.
REFERENCES [1] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, ‘Least angle regression’, Tech. report, Statistics Department, Stanford University, (2002). [2] Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra, ‘Planning and acting in partially observable stochastic domains’, Artificial Intelligence, 101, 99–134, (1998). [3] Martin L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, New York, 1994. [4] R. Sutton and A. Barto, Reinforcement Learning: An Introduction, The MIT Press, Cambridge, MA, 1998.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-781
Data-driven Induction of Functional Programs
Emanuel Kitzelmann1
Abstract. We present a new method and system, called IGOR2, for the induction of recursive functional programs from few non-recursive, possibly non-ground example equations describing a subset of the input-output behaviour of a function to be implemented.
1 Introduction
Classical attempts to construct functional LISP programs from input/output examples [10, 4] are analytical, i.e., a LISP program belonging to a strongly restricted program class is algorithmically derived from examples. This is done by identifying repetitive syntactical patterns in traces. More recent approaches, e.g. [3, 7], generate and test programs until a program consistent with the examples is found. Theoretically, large program classes can be induced generate-and-test based. Yet although these latter systems use type information, some of them higher-order functions, and further techniques for pruning the search space, they strongly suffer from combinatorial explosion. Also Inductive Logic Programming (ILP) [6] has originated some methods capable of inducing recursive programs on inductive types, though ILP in general has a focus on classification. General purpose systems capable of recursive program induction, like FOIL [9], are suitable to only a limited extent for program induction since they use greedy search methods with inappropriate heuristics. Special purpose systems [1] have problems similar to those described for functional approaches. The IGOR2 [5] method described here combines classical analytical methods with an enumerative approach in order to put their relative strengths into effect. Induction is based on search in order to avoid strong a priori restrictions as imposed by purely analytical methods. But in contrast to the generate-and-test approach, IGOR2 constructs successor programs during search using analytical methods. IGOR2 represents functional programs as sets of typed recursive first-order equations. The effect of constructing these equations analytically is that only equation sets entailing the example equations are enumerated. In contrast to greedy search methods, the search is complete: only programs known to be inconsistent are ruled out. Compared to purely analytical systems, IGOR2 is a substantial extension since the class of inducible programs is much larger. E.g., all sample programs from [4, page 448] can be induced by IGOR2, but only a fraction of the sample problems in [5, Sect. 5] can be induced by the system described in [4]. Compared to ILP systems capable of inducing recursive functions and recent enumerative functional methods like FOIL [9] and MAGICHASKELLER [3], IGOR2 mostly performs better regarding inducibility of programs and/or induction times [2].
1 University of Bamberg, Germany, email: emanuel.kitzelmann@uni-bamberg.de
2 General Method
Given a set E of example equations of the form F(a) = r, for any number of target functions F to be implemented as well as for already implemented background functions which may be used by the induced program, IGOR2 returns a set of recursive equations P constituting a functional program which is correct w.r.t. the example equations in that it evaluates the left-hand sides (lhss) of the example equations to their right-hand sides (rhss). Even if example equations may contain variables, we call lhs arguments example input and rhss example output in the following. There are infinitely many correct solutions P, one of them E itself. In order to select one, or at least a finite subset of the possible solutions, and a "good" solution in particular, IGOR2, like almost all inductive inference methods, is committed to a preference bias. IGOR2 prefers solutions P which partition the examples in fewer subsets, i.e., programs with fewer case distinctions. Case distinctions are realised by disjoint patterns in the equation lhss. This concept is known as pattern matching in functional programming. Additionally, simple forms of conditions to restrict the applicability of an equation, like equality of pattern variables, are used but not described in this paper. The search for solutions is complete, i.e., programs with the least number of case distinctions are found. This preference bias assures that the recursive structure in the examples, as well as the computability by predefined functions, is covered as well as possible.
Example. From appropriate type declarations and the examples2

Rev([ ]) = [ ], Rev([X]) = [X], Rev([X, Y]) = [Y, X],
Rev([X, Y, Z]) = [Z, Y, X], Rev([X, Y, Z, V]) = [V, Z, Y, X]    (1)

and the background equations

Last([X]) = X, Last([X, Y]) = Y, Last([X, Y, Z]) = Z, Last([X, Y, Z, V]) = V

IGOR2 induces the following equations for Rev and an auxiliary function Init:

Rev([ ]) = [ ]
Rev([X|Xs]) = [Last([X|Xs]) | Rev(Init([X|Xs]))]
Init([X]) = [ ]
Init([X1, X2|Xs]) = [X1 | Init([X2|Xs])]
The induction of a program is organised as a kind of best-first search. During search, a hypothesis is a set of equations entailing the example equations and constituting a terminating program, but potentially with unbound variables in the rhss, i.e., with variables in the rhss not occurring in the lhss. We call such equations, and hypotheses containing them, unfinished equations and hypotheses. A goal state is reached if at least one of the best hypotheses (according to the preference bias described above) is finished. Such a finished hypothesis is terminating by construction and, since its equations entail the example equations, it is also correct. The initial hypothesis is a program with one equation per target function, namely the least general generalisation [8] of the example equations. In most cases (e.g., for all recursive functions) one equation is not enough and the rhss remain unfinished. Then for one unfinished equation successors are computed, which leads to new hypotheses. Now repeatedly unfinished equations of currently best hypotheses are replaced until a currently best hypothesis is finished.

2 We use a syntax for lists as known from PROLOG.
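This search regime can be rendered as a generic best-first skeleton (a sketch; the cost function, finished-test and successor computation are left abstract, and all names are ours rather than IGOR2's internals).

import heapq
import itertools

def best_first_induction(initial_hypothesis, cost, is_finished, successors):
    """Best-first search over hypotheses (equation sets). cost encodes the
    preference bias (e.g. number of case distinctions); successors replaces
    one unfinished equation, yielding new hypotheses."""
    tie = itertools.count()                    # tie-breaker for equal costs
    frontier = [(cost(initial_hypothesis), next(tie), initial_hypothesis)]
    while frontier:
        _, _, hyp = heapq.heappop(frontier)    # currently best hypothesis
        if is_finished(hyp):                   # no unbound rhs variables left
            return hyp                         # terminating and correct by construction
        for succ in successors(hyp):           # analytically constructed successors
            heapq.heappush(frontier, (cost(succ), next(tie), succ))
    return None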
3 Computing Successor Sets of Equations
Three operations are applied to compute successor equations: (i) Partitioning of the inputs by replacing the pattern p of the equation by a set of disjoint more specific patterns; (ii) replacing the rhs by a (recursive) call of a defined function; and (iii) replacing the rhs subterms in which unbound variables occur by calls to new subprograms.
3.0.1 Refining a Pattern
Computing a set of more specific patterns, case (i), in order to introduce a case distinction, is done as follows: A position in the pattern p with a variable resulting from generalising the corresponding subterms in the subsumed example inputs is identified. The inputs are partitioned such that those with the same symbol at this position belong to the same subset. This yields a partition of the example equations. Now for each subset a new initial hypothesis is computed, leading to a set of successor equations. E.g., consider the examples (1) for Rev . The pattern of the initial equation is simply a single variable Q, since the example inputs have no common root symbol. The first example input consists of only the constant [ ]. All remaining example inputs have the list constructor cons as root. I.e., two subsets are induced, one containing the first example, the other containing the remaining examples. The lggs of the example inputs of these two subsets are [ ] and [Q|Qs] resp. which are the (more specific) patterns of the two successor equations.
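Both the initial hypotheses and the refined patterns rest on least general generalisations [8]; a minimal anti-unification sketch may help (ours, with terms encoded as nested ('functor', arg, ...) tuples, an assumed representation).

def lgg(s, t, table=None):
    """Least general generalisation (anti-unification) of two terms."""
    if table is None:
        table = {}                         # (subterm, subterm) -> shared variable
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:                # same disagreement -> same variable
        table[(s, t)] = "Q%d" % len(table)
    return table[(s, t)]

# lgg of the list inputs [X] and [Y, Z]: yields ('cons', 'Q0', 'Q1'), i.e. [Q|Qs]
print(lgg(("cons", "X", "nil"), ("cons", "Y", ("cons", "Z", "nil"))))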
3.0.2 Introducing (Recursive) Function Calls and Help Functions
In cases (ii) and (iii) help functions are invented. This includes the generation of examples from which they are induced. For case (ii) this is done as follows: Function calls are introduced by matching the currently considered outputs, i.e., those outputs whose inputs match the pattern of the currently considered equation, with the outputs of any defined function. If all current outputs match, then the rhs of the current unfinished equation can be set to a call of the matched defined function. The argument of the call must map the currently considered inputs to the inputs of the matched defined function. For case (iii), the example inputs of the new defined function also equal the currently considered inputs. The outputs are the corresponding subterms of the currently considered outputs.
For an example of case (iii), consider the Rev examples except the first one, as they have been put into one subset in the previous section. The initial equation for these is:

Rev([Q|Qs]) = [Q2|Qs2]    (2)

It is unfinished due to the two unbound variables in the rhs. Now the two unfinished subterms (consisting of exactly the two variables) are taken as new subproblems. This leads to two new example sets for two new help functions Sub1 and Sub2: Sub1([X]) = X, Sub1([X, Y]) = Y, . . ., Sub2([X]) = [ ], Sub2([X, Y]) = [X], . . .. The successor equation-set for the unfinished equation contains three equations determined as follows: The original unfinished equation (2) is replaced by the finished equation

Rev([Q|Qs]) = [Sub1([Q|Qs]) | Sub2([Q|Qs])]

and from the new example sets initial equations are derived. Finally, as an example for case (ii), consider the examples for the help function Sub2 and the unfinished initial equation:

Sub2([Q|Qs]) = Qs2    (3)

The example outputs [ ], [X], . . . of Sub2 match the example outputs for Rev. That is, the unfinished rhs Qs2 can be replaced by a (recursive) call to the Rev function. The argument of the call must map the inputs [X], [X, Y], . . . of Sub2 to the corresponding inputs [ ], [X], . . . of Rev, i.e., a new help function, Sub3, is needed. This leads to the new example set Sub3([X]) = [ ], Sub3([X, Y]) = [X], . . .. The successor equation-set for the unfinished equation (3) contains the finished equation

Sub2([Q|Qs]) = Rev(Sub3([Q|Qs]))

and the initial equation for Sub3.
4 Conclusion and Future Research
IGOR2 integrates classical data-driven program induction techniques with search. Comparisons show that this approach is competitive with existing program induction methods regarding solvable problems and mostly solves problems faster [2]. In future work we will extend IGOR2 to higher-order functions such that well-known higher-order functions like Map can be used in induced programs.
REFERENCES
[1] P. Flener and S. Yilmaz, ‘Inductive synthesis of recursive logic programs: Achievements and prospects’, Journal of Logic Programming, 41(2–3), 141–195, (1999).
[2] Martin Hofmann, Emanuel Kitzelmann, and Ute Schmid, ‘Analysis and evaluation of inductive programming systems in a higher-order framework’. Submitted to ECML’08, http://www.cogsys.wiai.uni-bamberg.de/publications/ecml08submission.pdf, 2008.
[3] Susumu Katayama, ‘Systematic search for lambda expressions’, in Revised Selected Papers from the Sixth Symposium on Trends in Functional Programming, TFP 2005, ed., Marko C. J. D. van Eekelen, volume 6, pp. 111–126. Intellect, (2007).
[4] E. Kitzelmann and U. Schmid, ‘Inductive synthesis of functional programs: An explanation based generalization approach’, Journal of Machine Learning Research, 7, 429–454, (2006).
[5] Emanuel Kitzelmann, ‘Data-driven induction of recursive functions from input/output-examples’, in Proceedings of the ECML/PKDD 2007 Workshop on Approaches and Applications of Inductive Programming (AAIP’07), pp. 15–26, (2007).
[6] S. Muggleton and L. De Raedt, ‘Inductive logic programming: Theory and methods’, Journal of Logic Programming, Special Issue on 10 Years of Logic Programming, 19–20, 629–679, (1994).
[7] Roland Olsson, ‘Inductive functional programming using incremental program transformation’, Artificial Intelligence, 74(1), 55–83, (1995).
[8] G. D. Plotkin, ‘A note on inductive generalization’, in Machine Intelligence, volume 5, 153–163, Edinburgh University Press, (1969).
[9] J. R. Quinlan and R. M. Cameron-Jones, ‘FOIL: A midterm report’, in Proceedings of the 6th European Conference on Machine Learning, ed., P. Brazdil, LNCS, pp. 3–20, London, UK, (1993). Springer-Verlag.
[10] D. R. Smith, ‘The synthesis of LISP programs from examples: A survey’, in Automatic Program Construction Techniques, eds., A. W. Biermann, G. Guiho, and Y. Kodratoff, 307–324, Macmillan, (1984).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-783
CTRNN Parameter Learning using Differential Evolution Ivanoe De Falco1 and Antonio Della Cioppa2 and Francesco Donnarumma3 and Domenico Maisto1 and Roberto Prevete3 and Ernesto Tarantino1 Abstract. Target behaviours can be achieved by finding suitable parameters for Continuous Time Recurrent Neural Networks (CTRNNs) used as agent control systems. Differential Evolution (DE) has been deployed to search parameter space of CTRNNs and overcome granularity, boundedness and blocking limitations. In this paper we provide initial support for DE in the context of two sample learning problems. Key words: CTRNN, Differential Evolution, Dynamical Systems, Genetic Algorithms
1 INTRODUCTION
Insofar as Continuous Time Recurrent Neural Networks (CTRNNs) are universal dynamics approximators [1], the problem of achieving target agent behaviours can be redefined as the problem of identifying suitable network parameters. Although a variety of different learning algorithms exists, evolutionary approaches like Genetic Algorithms (GAs) are usually deployed to perform searches in the parameter space of CTRNNs [5]. However, GAs require some kind of network encoding, which may greatly influence parameter searches. In fact, the resolution of the parameters is limited by the bit resolution of the encoding (granularity), and the parameters cannot assume values falling outside an a priori fixed encoding interval (boundedness). Yamauchi and Beer [5] proposed a real-valued encoding for CTRNNs, which improves the learning process by allowing parameter values to be in R. However, problems arise that, in not rare cases, prevent real-valued GAs (rvGAs) from finding global optima (blocking) [2]. Here we propose an approach based on a Differential Evolution (DE) algorithm [4] which combines fast learning with the possibility of overcoming the limitations mentioned above. Section 2 introduces the DE algorithm. In Section 3 two sample CTRNN parameter search problems are solved with DE. Finally, in Section 4, the obtained results are discussed and future developments of this approach are proposed.
2 DIFFERENTIAL EVOLUTION
DE is a stochastic, population-based evolutionary algorithm [4] which addresses a generic optimization problem with m real parameters by starting with a randomly initialized population consisting of n individuals, each made up of m real values, and, subsequently, by updating the population from one generation to the next by means of many different transformation schemes commonly named strategies [4]. In all of these strategies DE generates new individuals by adding to an individual a number of weighted difference vectors made up of couples of population individuals. In the strategy chosen here, starting from xi, the i-th individual, a new trial individual x′i is generated by perturbing the best individual xbest by means of two difference vectors. The generic j-th candidate component is:

x′_{i,j} = x_{best,j} + F · [(x_{r1,j} − x_{r2,j}) + (x_{r3,j} − x_{r4,j})]

with four randomly generated integer numbers r1, r2, r3, r4 in {1, . . . , n}, differing from one another, and F the parameter which controls the magnitude of the differential variation. So in DE new candidate solutions are created by using vector differences, whereas traditional rvGAs rely on probabilistic selection, random perturbation (mutation) and on mixing (recombination) of individuals. The three phases of a standard rvGA (selection, recombination and mutation) are combined in DE in one operation which is carried out for each individual. According to this, in rvGAs not all the elements are involved in each phase of the generation of the new population, while, by contrast, the DE algorithm iterates through the entire population and generates a candidate for each individual.

1 ICAR-CNR, Naples, Italy – {ivanoe.defalco, domenico.maisto, ernesto.tarantino}@na.icar.cnr.it
2 DIIIE, Università di Salerno – adellacioppa@unisa.it
3 Università di Napoli Federico II – {donnarumma, prevete}@na.infn.it
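A minimal sketch of one generation under this strategy follows (ours; the greedy one-to-one replacement used as the combined selection step, and the minimisation convention, are assumptions about details not spelled out above).

import random

def de_generation(pop, fitness, F=0.5):
    """One DE generation with the chosen strategy:
    x'_{i,j} = x_{best,j} + F*((x_{r1,j} - x_{r2,j}) + (x_{r3,j} - x_{r4,j}))."""
    n, m = len(pop), len(pop[0])            # n individuals, m real parameters (n >= 5)
    best = min(pop, key=fitness)            # assumes a minimisation problem
    new_pop = []
    for i in range(n):
        r1, r2, r3, r4 = random.sample([j for j in range(n) if j != i], 4)
        trial = [best[j] + F * ((pop[r1][j] - pop[r2][j])
                                + (pop[r3][j] - pop[r4][j]))
                 for j in range(m)]
        # keep the better of trial and x_i: selection, recombination and
        # mutation combined in one operation, as described above
        new_pop.append(trial if fitness(trial) <= fitness(pop[i]) else pop[i])
    return new_pop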
3 EXPERIMENTS
We tested the efficacy of CTRNN training by DE on two sample experiments where the approach appears to overcome the problems outlined in Section 1. The parameters ruling the DE algorithm were assigned experimentally via a set of training trials.
3.1 Cusp point learning
Let us consider a CTRNN made up of a single self-connected neuron. The equation of the system is given by

$$\tau \cdot \dot{y} = -y + w\,\sigma(y + \theta) + I \qquad (1)$$
where for simplicity we set the time constant τ = 1 and the bias θ = 0. Notice that no elementary expression for the solution of (1) exists. Such a system has a cusp point, that is, the only bifurcation point in which the system undergoes a pitchfork bifurcation [3]. The goal of the experiment is to find this cusp point. To evaluate each network candidate (I, w) we let it evolve for a sufficiently long time T so that we can consider y(T) ≈ ȳ. Then we choose as fitness function F_CP(y(I, w)) = f_fixed + f_tan + f_cusp, with terms rewarding respectively the fixed point, non-hyperbolicity and cusp curve intersection conditions. Average and standard deviation values found for (I, w) in 10 runs using the DE algorithm are Ī = −2.00015 with a standard deviation equal to 1.6 · 10^−4 and w̄ = 4.0003 with standard deviation 3.1 · 10^−4. These values are remarkably close to the coordinates (Ĩ, w̃) = (−2, 4) of the cusp point, which can be formally inferred. Figure 1 shows the fitness trend as a function of the generation number for the average, best and worst case. The constant and smooth decrease suggests a gradual and continuous learning improvement as the generation number grows. In addition, the evident increasing resolution of the parameter values observable during DE runs demonstrates the possibility of tackling the granularity problem, theoretically having the machine precision as the only limit.
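For illustration, the fixed-point evaluation y(T) ≈ ȳ for a candidate (I, w) can be obtained by forward-Euler integration of equation (1), as sketched below (ours; step size, horizon and initial state are assumptions, and the three fitness terms are not reproduced).

import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

def settle(I, w, tau=1.0, theta=0.0, y0=0.0, dt=0.01, T=100.0):
    """Euler integration of tau*dy/dt = -y + w*sigma(y + theta) + I."""
    y = y0
    for _ in range(int(T / dt)):
        y += (dt / tau) * (-y + w * sigma(y + theta) + I)
    return y                                # approximates a fixed point y-bar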
Figure 1. Cusp point learning: fitness plots of runs corresponding to the average, worst and best solutions as a function of the generation number.

3.2 Sequence generator task

The goal of this task is to train a control network able to switch between two different behaviours (fixed points 0 and 1) anytime a signal trigger is detected [5]. Focusing on a network of three neurons, we generate a random sequence for each generation, I = [bit1, . . . , bitM], where M is the length of the sequence and biti ∈ {0, 1} ∀i ∈ M. The length of every subsequence of 0 (no signal) or 1 (trigger) has been extracted from a Gaussian distribution. For every sequence generation we generate the desired target t = ⟨t1, . . . , tM⟩. We measure the output candidate y = ⟨ȳ3^1, . . . , ȳ3^M⟩ with a fitness function F_SG(y(w)) = F_HM(y(w)) + k · F_HD(y(w)), with the first term (the Hamming distance) and the second term respectively measuring how many times and how different the fixed point values are from the desired targets. We set k = 0.01 so as to weight the first contribution more than the second. In each of the 10 runs DE is able to find optimal solutions, even reaching the global minima. It is worth remarking that the weights found are very sparse (e.g., w ≈ 21.19 and w ≈ 1.86 · 10^18), so that by fixing a priori intervals many good solutions would become inaccessible. This sparseness suggests that DE is almost able to investigate the entire parameter space, allowing the surmounting of the boundedness problem. Moreover, each run passes through a different sequence of local minima, from which the DE algorithm has to escape. So the descent of the function towards the global minimum occurs in "steps" (see left of Figure 2). The right of Figure 2 illustrates how the search of the parameters continues even in the very proximity of optimal values, finding better and better solutions. Moving by vector differences in the parameter space, it is "as if" DE were capable of calibrating the magnitude and the direction of its steps towards reaching the minima. The result is that every run is able to overcome the blocking problem.

Figure 2. Sequence generator task. Left: fitness of three different runs plotted as a function of the generation number. Right: fitness of run 1 from the 1000-th to the 2500-th generation. The figures show DE avoiding blocking by escaping from local minima.

4 CONCLUSIONS
We showed two experiments solved by means of DE, which provides a simple and "physical" way to perform CTRNN parameter space search. The first experiment provides an example of how the granularity problem can be overcome. DE showed a high precision in determining the parameter values, which can still be improved by letting the execution run longer. The second experiment points to ways in which boundedness and blocking can be overcome, too, by a DE approach. Using only three neurons we solve the sequence generator task. The parameter values found are sparse, so fixing a priori intervals would have cut off many possible solutions. Furthermore, although each run passes through a sequence of local minima, the DE algorithm can escape from them, jumping step by step towards a better approximation of a global minimum. After these encouraging results, future studies will concern a direct comparison with rvGAs, particularly on the local minima trapping issue, and a deeper investigation of the theoretical details of the DE approach for CTRNN learning.
References
[1] Ken-ichi Funahashi and Yuichi Nakamura, ‘Approximation of dynamical systems by continuous time recurrent neural networks’, Neural Networks, 6(6), 801–806, (1993).
[2] David E. Goldberg, ‘Real-coded genetic algorithms, virtual alphabets, and blocking’, Complex Systems, 5, 139–167, (1991).
[3] J. K. Hale and H. Koçak, Dynamics and Bifurcations, Springer-Verlag, 1991.
[4] K. Price, R. Storn, and J. Lampinen, Differential Evolution: A Practical Approach to Global Optimization, Natural Computing Series, Springer-Verlag, 2005.
[5] Brian M. Yamauchi and Randall D. Beer, ‘Sequential behavior and learning in evolved dynamical neural networks’, Adaptive Behavior, 2(3), 219–246, (1994).
3. Model-Based Diagnosis and Reasoning
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-787
Incremental Diagnosis of DES by Satisfiability1
Alban Grastien and Anbulagan
NICTA and Australian National University
Abstract. We propose a SAT-based algorithm for incremental diagnosis of discrete-event systems. The monotonicity is ensured by a prediction window that uses the future observations to lead the current diagnosis. Experiments stress the impact of parameter tuning on the correctness and the efficiency of the approach.
1 Diagnosis by SAT
Diagnosis is the AI problem of determining whether a system is running correctly during a time window, and of identifying any failure otherwise. Consider a system which is completely modeled by a DES (basically a finite state machine) denoted Mod. This system is running and generates observations. The goal of the diagnosis is to determine from the model and the observations whether faulty events occurred on the system. The problem can be reduced to finding particular paths on the DES consistent with the observations [4]. Since failures are rare events, we can consider paths that minimize the number of faults. In [2], we proposed to solve the DES diagnosis problem with satisfiability (SAT) algorithms. SAT is the problem of finding an assignment of the variables of a given Boolean formula in such a way as to make the formula evaluate to true. Given an upper bound on the number of transitions in the paths that are considered, a diagnosis problem – finding a particular path – can be encoded as a SAT problem. The SAT-based algorithm then simply uses a SAT solver to look for a path with an increasing number of faults until a diagnosis is found.
2 Incremental Diagnosis by SAT
Incremental diagnosis (ID) consists in computing the diagnosis for a temporal window, and then updating this diagnosis to consider a larger temporal window. The incremental diagnosis can serve two purposes. First, it is used when the observations for the later temporal window are not immediately available: a diagnosis for the first temporal window is computed, and then must be updated as the other observations are provided. This is typically the case for on-line diagnosis, where the system is monitored while it is running. Second, an incremental approach can be used to simplify a non-ID problem. Given a diagnosis task on a large temporal window, the window is sliced into small windows to obtain simpler diagnosis problems. In both cases, the complexity of the ID must be independent of the previous diagnoses. This paper considers the second approach, where all the observations are available. The on-line problem contains additional issues mostly independent from ID.
1 This research was supported by NICTA in the framework of the SuperCom project. NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.
Consider that the observations on a window are denoted Obs, and denote ⊕ the concatenation of two windows. Note that the concatenation may be non-trivial in case of uncertain observations [3]. Given the diagnosis of Obs1 and the observations Obs2, the incremental diagnosis is the computation of the diagnosis of Obs1 ⊕ Obs2. The complexity of the incremental diagnosis of Obs1 ⊕ Obs2 given the diagnosis of Obs1 must not depend on the size of Obs1. It is wise to perform the diagnosis of Obs1 in such a way as to ease the ID of Obs1 ⊕ Obs2. In this case, the complexity of the diagnosis of Obs1 must not depend on the size of Obs2. Rather than diagnosing the whole period (t0, tn), we do an ID and diagnose n windows of size λ (ti+1 = ti + λ). The n diagnoses must be consistent with each other: the path computed for (t1, t2) must be a continuation of the path computed for (t0, t1). However, when the diagnosis of (t0, t1) is computed, we cannot be sure that the extracted path will be consistent with the next observations. In diagnosis, there is usually a delay between the occurrence of an event and the reception of observations proving this occurrence. However, this delay is generally bounded: it is unlikely that an observation will explain what happened several days or weeks ago. Thus, we ensure that the path of (t0, t1) is consistent not only with the observations (t0, t1) but also with the observations (t1, t1 + μ). This way, the diagnosis for this window should be globally consistent. The period of time (t1, t1 + μ) is called the prediction window of the diagnosis window (t0, t1). Note that the diagnosis is approximate, as the best global path may be lost if it includes the early occurrence of many faults.

Algorithm 1 Incremental Diagnosis(Mod, I, Obs, Que, λ, μ)
1: S(0) := I(0); // I represents the initial states
2: for i := 0; i < n; i++ do // diagnoses the window (ti, ti+1)
3:   while no solution found for (ti, ti + λ) do
4:     for (k := 0; k < K and no solution found; k++) do
5:       F := Mod(ti, ti + λ + μ) ∪ Obs(ti, ti + λ + μ) ∪ Que_k(ti, ti + λ) ∪ S(i);
6:       if SAT(F) is satisfiable then
7:         extract_path(SAT(F));
8:         S(i + 1) := extract_state(SAT(F));
9:   if no solution found for (ti, ti + λ) then // path reset
10:    S(i) = ∅

We propose Algorithm 1 for the ID of (t0, tn). Let K be the maximum number of faults that can occur during λ time steps. For each window (ti, ti + λ), the SAT solver tries to find a path starting from state S(i), consistent with the observations (ti, ti + λ + μ), by increasing the number of faults (lines 4–8). F is the CNF that models the set of constraints on the path we are looking for. When the path is found, the function extract_path extracts the path computed during (ti, ti + λ). The function extract_state computes S(i + 1)
in order to force the next path to be a prolongation of the current path. If no path is found starting from S(i) for (ti, ti + λ), the path for the previous window is not consistent with the new observations. For complexity reasons, backtracking is not allowed. The algorithm simply tries to find a new path that does not start from the previous path (line 10). We call this a path reset. When a path reset is performed, the path of (t0, tn) is not globally consistent. However, for most systems, it can be expected that the misinterpretation of the observations will be localised on only a small time frame.

3 Empirical Validation

The experiments are conducted on an Intel Pentium 4 PC running at 3 GHz CPU, under Linux, using MINISAT v2.0 [1]. For this study, we use the system presented in [2]. The maximum number of faults K is set to 1 + λ/2 in the experiment.

Table 1. Runtime in seconds of the MINISAT solver on nID satisfiable problem instances with n observations.

n  100  200  299  300  400  500  599  600  699  700  799  800  899  900  999  1000
t  116  19   >2d  >2d  268  127  >2d  153  832  669  185  574  151  370  >2d  3204

Table 1 shows the runtime required by MINISAT to find a scenario consistent with the n observations and containing k(n) ≈ n/8 faults. In this table, t > 2d means that the instance cannot be solved in 2 days. Note that this computation is not a diagnosis, in the sense that it should first be proved that there is no path with k faults where k < k(n), which is usually more expensive as these problems are unsatisfiable. Note also that the runtime does not increase linearly but in a chaotic way, witness the difference between n = 999 and n = 1000. We now run Algorithm 1 on the scenario of 1 000 observations, varying the parameter λ in the range {2, 5, 10, 20, 40} and the parameter μ in the range {0, 10, 20, 30, 40, 50, 100}.

Figure 1. Results of our incremental algorithm on nID problems: (a) number of path resets; (b) number of diagnosed faults; (c) number of calls to MINISAT; (d) total runtime of MINISAT; (e) total runtime; (f) runtime of ID solving. Panels (a)–(e) plot against the size μ of the prediction window for λ ∈ {2, 5, 10, 20, 40}; panel (f) plots runtime against the number of observations for pairs λ–μ in {2–100, 5–100, 10–100, 20–100, 40–100, 40–0}.

Quality of the diagnosis. Figure 1a presents the percentage of path resets, and Figure 1b gives the number of faults computed for each pair of parameters. These measure the quality of the diagnosis. An accurate diagnosis should have no reset and the smallest number d(0, 1000) of faults consistent with the observations (this value is unknown but less than 128). As expected, the number of resets decreases when the size of the prediction window increases. In this example, a value μ = 100 is sufficient to avoid any reset. Figure 1b also shows that a large diagnosis window partially avoids the bad-quality results of small prediction windows, though it generates a large number of path resets. This is simply because enlarging the size of the diagnosis windows makes the incremental diagnosis look more and more like non-incremental diagnosis.

Runtime. Figure 1c gives the number of calls to MINISAT, Figure 1d presents the total runtime of MINISAT, and Figure 1e presents the total runtime including the preprocessing time. All the computations are done in less than one hour, which is better than the incomplete computations of Table 1. The runtime generally increases when μ increases. Thus, a trade-off might be required here between quality and efficiency. Note that the tendency is inverted when λ is large because the number of path resets decreases; for large diagnosis windows, large prediction windows increase both quality and efficiency. Finally, note that the smallest runtime is not achieved with the smallest diagnosis windows but with medium-large diagnosis windows.

Incremental runtime. Figure 1f shows the evolution of the SAT runtime during the incremental diagnosis for some pairs λ, μ (other pairs lead to similar results). The experiments clearly show a linear runtime for most pairs of parameters. Note however that small prediction windows potentially generate peaks of computation. These results validate our approach. The incremental diagnosis of DES can be performed using SAT algorithms, and the runtime is lower than in a non-incremental approach. The results stress the importance of the parameters λ and μ, both for efficiency and for diagnosis correctness. These parameters should be tested off-line before running the diagnosis to address the quality of diagnosis required and the resources available.
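For clarity, the control flow of Algorithm 1 can also be sketched as follows (ours; encode, sat_solve, extract_path and extract_state are placeholders for the paper's CNF construction and the MiniSat calls, and a window that stays unsolvable even after a reset is not handled).

def incremental_diagnosis(encode, sat_solve, extract_path, extract_state,
                          initial_states, n, K):
    """Skeleton of Algorithm 1: n diagnosis windows, at most K faults each."""
    state = initial_states                  # S(0) := I(0)
    paths = []
    for i in range(n):                      # diagnose window (t_i, t_{i+1})
        model = None
        for start in (state, None):         # second pass = path reset, S(i) := {}
            for k in range(K):              # lines 4-8: increasing number of faults
                model = sat_solve(encode(i, k, start))
                if model is not None:
                    break
            if model is not None:
                break
        paths.append(extract_path(model))   # path over (t_i, t_i + lambda)
        state = extract_state(model)        # S(i+1): next path must continue this one
    return paths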
REFERENCES
[1] N. Eén and N. Sörensson, ‘An extensible SAT-solver’, in Sixth International Conference on Theory and Applications of Satisfiability Testing (SAT-03), (2003).
[2] A. Grastien, Anbulagan, J. Rintanen, and E. Kelareva, ‘Diagnosis of discrete-event systems using satisfiability algorithms’, in Proc. of 19th AAAI, pp. 305–310, (2007).
[3] A. Grastien, M.-O. Cordier, and Ch. Largouët, ‘Incremental diagnosis of discrete-event systems’, in Sixteenth International Workshop on Principles of Diagnosis (DX-05), pp. 119–124, (2005).
[4] G. Lamperti and M. Zanella, Diagnosis of Active Systems, Kluwer Academic Publishers, 2003.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-789
Characterizing and checking self-healability
Marie-Odile Cordier1 and Yannick Pencolé2 and Louise Travé-Massuyès2 and Thierry Vidal1
1 INTRODUCTION
Real-life complex systems are often required to offer high reliability and quality of service and must be provided with self-management abilities, even in faulty situations. They are expected to be self-aware of their current state and to survive autonomously the occurrence of faults, still managing to provide the desired functionality. In other words, such systems must be self-healing [2]. Designing self-healing systems requires being able to evaluate the joint degree of self-awareness and reactiveness. In the artificial intelligence community, these two properties are better known as diagnosability [3, 1], i.e. the capability of a system to exhibit different observables for different anticipated faulty situations, and repairability, i.e. the ability of a system and its repair actions to cope with any unexpected situation. Checking diagnosability and repairability separately leads to a conservative assessment of self-healability. In this paper, we show that neither standard diagnosability nor repairability of every anticipated fault is necessary to achieve self-healability. Our main contribution consists of defining self-healability as a joint property bridging diagnosability and repairability, which requires a new definition of diagnosability that allows diagnosable subsets of faults to overlap, as opposed to the standard definitions, which rely on a partition.
2 MAIN CONCEPTS
The presented framework, which is relevant for state-based or event-based systems, adopting the generic viewpoint defined in [1], is illustrated with discrete-event systems3, as our current objective is to apply it to service-oriented architectures like Web Services in the framework of the WS-DIAMOND European project [4].
Observations and Faults: The set of observable events is O = {o1, . . . , ono}. Complementing O with the set of unobservable events U = {u1, . . . , unu} determines the whole set of events of the system, E = O ∪ U. The occurrences of basic faults that might occur on the system are represented as specific unobservable events noted fi. In the following, we restrict ourselves to the single fault assumption (i.e. only one fault can be present in the system at a given time). The system can then be either in a nominal mode (absence of fault) or in one of the nf fault modes. The set of all possible system modes is hence given by F = {f0, f1, . . . , fnf}, where f0 = ok. T denotes the set of (infinite) possible trajectories (i.e. sequences of events) occurring in the system, while OBS is the set of all possible sequences of observable events. A trajectory τ ∈ T corresponds to only one observable σ, while one σ may correspond to several distinct trajectories.

1 IRISA/INRIA/Université de Rennes 1; Campus de Beaulieu, F-35042 Rennes cedex, France, email: marie-odile.cordier, thierry.vidal@irisa.fr
2 LAAS-CNRS; Université de Toulouse; 7, Avenue du Colonel Roche, F-31077 Toulouse cedex, France, email: yannick.pencole, louise@laas.fr
3 We assume the liveness of the observations [3].
(Figure: the global model of the discrete-event system used as a running example, an automaton over the observable events o1, . . . , o6 and the unobservable fault events f1, . . . , f4.)
The above figure represents the global model of a discrete-event system. The set of fault modes is F = {ok, f1, f2, f3, f4}. Fault events are not observable, the other events being observable. o1 o5^∞ is both a trajectory and the observable obtained over that trajectory, including an infinite sequence of o5. o2 o2 f2 o2^∞ is another trajectory, yielding the observable o2^∞. f1 o2^∞ is yet another trajectory which, interestingly enough, yields the same observable o2^∞, which means these two trajectories cannot be discriminated from the observations.
Macrofaults: It is not always possible to know with certainty in which mode a system is. It is often not even necessary with respect to repairability. This is why we define the concept of macrofault, which represents the belief state referring to the system mode. A macrofault can be seen as an abstraction of system modes. For instance, if a pipe can be in the two basic fault modes leaking or blocked, it can also be said to be in an abnormal macrofault mode, where abnormal corresponds to leaking or blocked. A macrofault Fj is described by a non-empty set of fault modes. With our single fault assumption, an 'occurrence' of Fj means that exactly one of the faults fi ∈ Fj has occurred in the system. For instance, the macrofault {f1, f2} represents the fact that either f1 or f2 has occurred. A macrofault may be a singleton (Fj = {fi}). If all basic faults appear in a set of macrofaults E(F) ⊆ 2^F, then it is called a covering set.
Repairs: A repair plan is defined in a simplified way as, for our purpose, only the existence of such repair plans and their matching to (basic) faults is relevant. The set of available repair plans is denoted R = {r1, . . . , rnr}. The predicate Repair relates repair plans to (macro)faults: Repair(rk, Fi) means that applying the repair plan rk brings back the system into a nominal state, under the condition that the system is in one of the modes described by the macrofault Fi.4
4 rok, the (void) repair plan such that Repair(rok, ok), is assumed to exist.
For instance, the repair plan r1 such that Repair(r1, {f1, f2}) can be executed only if either f1 or f2 has occurred. Having a repair plan for a macrofault is equivalent to having a repair plan for all the basic faults belonging to the macrofault, hence the following property: Repair(rk, Fj) ≡ ∀fi ∈ Fj, Repair(rk, {fi}).
3 SELF-HEALABILITY

Self-healability is intuitively defined by "A system is self-healing if, and only if, after the occurrence of any basic fault, a diagnosis is issued that automatically raises a repair plan fitted to the fault." Behind this intuitive definition, two properties of the system are hidden: diagnosability and repairability.
Diagnosability: Diagnosability relies on the notion of fault signatures [1]. Intuitively, a fault signature is the association between a fault and a set of possible observables. We use the following notations:
• The predicate yields(fi, σ) means that there exists at least one trajectory in which fi ∈ F is present and that yields the observable σ ∈ OBS. The predicate yields can be generalized to macrofaults: yields(Fj, σ) means that ∃fi ∈ Fj such that yields(fi, σ). σ is then called an elementary signature, or e-signature, of the fault Fj.
• MF(σ) is the (unique) macrofault containing all faults that may yield σ, i.e. MF(σ) = {fi such that yields(fi, σ)}. MF can be generalized to sets of e-signatures: MF(Σ) = ⋃_{σ∈Σ} MF(σ).
In this work, we are not interested in checking that any basic fault can be diagnosed, but we are interested in finding the level of diagnosability of a system. This is why the partition of faults classically used is replaced by a set of macrofaults possibly sharing common faults. Still, each macrofault must be associated to distinct observables, and the corresponding sets of observables need to form a partition. Hence the following new definition of diagnosability, which extends the classical definition and is suitable for self-healability.
Definition 1 (Diagnosability of a set of macrofaults) The covering set E(F) is diagnosable, noted Diagnosable(E(F)), iff there exists a partition π = {Σ1, . . . , Σm} of the observables OBS such that: E(F) = {MF(Σj), Σj ∈ π}.
Example: A first straightforward set of macrofaults is E(F) = {F} = {{ok, f1, f2, f3, f4}}, in which faults are indistinguishable: obviously it is diagnosable, the partition being π = {{o1 o5^∞, o1^∞, o2^∞, o3 o2^∞, o3^∞, o4^∞, o6^∞}} = {OBS}. The set of macrofaults E1(F) = {{ok}, {f1, f2}, {f1, f3}, {f4}} is diagnosable with π1 = {{o1 o5^∞}, {o2^∞, o3 o2^∞}, {o1^∞, o3^∞, o6^∞}, {o4^∞}}. Note that E1(F) also corresponds to another partition π2 = {{o1 o5^∞}, {o2^∞, o3 o2^∞, o6^∞}, {o1^∞, o3^∞}, {o4^∞}}. E2(F) = {{ok}, {f1}, {f2}, {f3}, {f4}} is not diagnosable because there are some cases in which f1 and f2 cannot be discriminated; there is no partition of observables associated with it (the same holds for f1 and f3).
Repairability: A macrofault Fj is repairable if and only if there exists a repair plan that repairs it: Repairable(Fj) ≡ ∃rk such that Repair(rk, Fj). The repairability of a set of macrofaults is then defined as the repairability of all the macrofaults in the set.
Definition 2 (Repairability) A set of macrofaults E(F) is repairable, noted Repairable(E(F)), iff ∀Fj ∈ E(F), Repairable(Fj).
Example: If the only repair plan is r, with Repair(r, {f1, f3}), we indeed get Repairable({f1, f3}), and also Repairable({f1}) and Repairable({f3}). However, the system is not repairable since the faults f2 and f4 are not repairable.
Self-healability: Our definition of self-healability directly derives from the definitions of diagnosability and repairability.
Definition 3 (Self-healing set of macrofaults) A set E(F) is self-healing iff it is diagnosable and repairable, i.e. SelfHealing(E(F)) ≡ Diagnosable(E(F)) and Repairable(E(F)).
Definition 4 (Self-healing system) A system is self-healing iff there exists a self-healing covering set E(F).
Example: If Repairable(ok), Repairable({f1, f3}), Repairable({f1, f2}) and Repairable({f4}), then the set E1(F) = {{ok}, {f1, f2}, {f1, f3}, {f4}} is diagnosable and repairable. The system is self-healing. If Repairable(ok), Repairable({f1, f3}), Repairable({f2}) and Repairable({f4}), then the system is not self-healing, as there does not exist a repair plan for {f1, f2}. Due to lack of space, the algorithm to check whether a system is self-healing is not given.
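On a finite abstraction the definitions suggest a simple check (a sketch under two assumptions: the dictionary encoding is hypothetical, and every basic fault yields some observable). Grouping observables with identical MF(σ) gives the finest diagnosable covering set, so the system is then self-healing exactly when each of these macrofaults is repairable.

def self_healing(yields, repair_plans):
    """yields: observable -> set of basic faults that may yield it, i.e. MF(sigma);
    repair_plans: the macrofaults Fj for which some Repair(r, Fj) holds."""
    macrofaults = {frozenset(mf) for mf in yields.values()}
    plans = [frozenset(p) for p in repair_plans]
    # Repair(r, Fj) repairs each f_i in Fj, hence any macrofault included in Fj
    return all(any(mf <= plan for plan in plans) for mf in macrofaults)

# Toy encoding consistent with the running example: E1(F) with plans
# for {ok}, {f1, f2}, {f1, f3} and {f4}
yields = {"o1o5^inf": {"ok"}, "o2^inf": {"f1", "f2"}, "o3o2^inf": {"f1", "f2"},
          "o1^inf": {"f1", "f3"}, "o3^inf": {"f1", "f3"}, "o6^inf": {"f1", "f3"},
          "o4^inf": {"f4"}}
print(self_healing(yields, [{"ok"}, {"f1", "f2"}, {"f1", "f3"}, {"f4"}]))  # True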
4 CONCLUSION AND PERSPECTIVES
The main contributions of this paper are, first, a new and original definition of diagnosability which allows diagnosing possibly overlapping sets of non-discriminated faults, and then, using that definition, a thorough and integrated definition of the self-healability of a dynamic system. Interestingly enough, diagnosability of each basic fault is not required; what is needed is a diagnosability level that can be matched to the existing repairs. As far as we know, it is the first time that such a definition has been issued. We are currently applying our work to web services in the framework of the WS-DIAMOND European project [4], in which we investigate a number of extensions to address more sophisticated and realistic cases, mostly in terms of the characterization of repair plans, their properties and conditions of applicability. One of the problems is how to deal with multiple faults that may appear sequentially. Another interesting issue refers to temporal conditions that may restrict the applicability of repairs and be in conflict with the time needed to diagnose a fault.
REFERENCES
[1] M.-O. Cordier, L. Travé-Massuyès, and X. Pucel, ‘Comparing diagnosability in continuous and discrete-event systems’, in 17th International Workshop on Principles of Diagnosis, eds., C. A. González, T. Escobet, and B. Pulido, pp. 55–60, (June 2006).
[2] D. Ghosh, R. Sharman, H. R. Rao, and S. Upadhyaya, ‘Self-healing systems – survey and synthesis’, Decision Support Systems, 42(4), 2164–2185, (2007).
[3] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis, ‘Diagnosability of discrete event systems’, IEEE Transactions on Automatic Control, 40(9), 1555–1575, (1995).
[4] The WS-DIAMOND team, ‘WS-DIAMOND: Web services DIAgnosability, MOnitoring and DIagnosis’, in 18th International Workshop on Principles of Diagnosis, DX’07, pp. 243–250, Nashville (TN, USA), (May 2007).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-791
Improving robustness in consistency-based diagnosis using possible conflicts
Belarmino Pulido, Anibal Bregon, Carlos Alonso-González
Intelligent Systems Group (GSI), Department of Computer Science, University of Valladolid, Spain. email: {belar,anibal,calonso}@infor.uva.es
Abstract. Behaviour simulation in Consistency-based Diagnosis requires knowing the initial value. This assumption is not easily fulfilled in real systems, even in the presence of measurements related to state variables, due to noise and parameter uncertainties. This work proposes the integration of state observers to estimate initial states for simulation in consistency-based diagnosis with possible conflicts, using the BRIDGE framework, and proposes an extension for a class of dynamic systems. Suitable state-observer structural models are obtained through the same algorithms used to find possible conflicts – minimal subsystems with analytical redundancy – without additional knowledge in the models.
1 Introduction
Two research communities have traditionally approached the problem of Model-Based Diagnosis (MBD) in different but complementary ways: the Fault Detection and Isolation (FDI) community, born in the Automatic Control world, and the Diagnosis (DX) community, rooted in the Artificial Intelligence field. The FDI community uses control and statistical decision theories to carry out the fault detection and isolation stages. The main issue in this approach is fault detection robustness. This field has solid theoretical results for linear systems [4, 1], while the analysis of non-linear systems is a major research issue. The DX approach has a solid theoretical foundation for static systems, fault localization and identification being its main research issues. Consistency-based diagnosis (CBD) is the most used approach, and GDE is its computational paradigm [5]. Recently, the BRIDGE community [3] established a common framework for sharing results and techniques; it is based on the comparison between CBD via conflicts [5] and FDI via analytical redundancy relations (ARRs) obtained through structural analysis [1]. This work is based on the Possible Conflicts approach [9], PCs for short, an off-line dependency compilation technique from the DX community. CBD using PCs is based on the on-line simulation of subsystems equivalent to conflicts. The approach needs the initial value of state variables to re-start the simulation. The main goal of this work is to improve the robustness of the method through a more precise estimation of the initial state, without modifying its fault isolation capabilities or its consistency-based approach. Based on the similarity between PCs and ARRs in the BRIDGE framework [9], this work uses PCs to design state observers, which are used to estimate the initial states for simulation. This article is organized as follows. First, assumptions, techniques, and working principles are shown. Second, a new way to derive the structure of state observers from a dynamic system using possible conflicts, and a method for integrating simulation and state observers, are introduced. Finally, results and a discussion of related works are provided.
2 PCs, ARRs, and conflicts in the BRIDGE framework
Possible conflicts are those sub-systems capable of becoming conflicts in CBD, i.e., minimal subsets of equations containing enough analytical redundancy to perform fault diagnosis. Computation of PCs is performed on an abstract model of the set of equations in the system description, and PCs are obtained off-line via two core concepts: minimal evaluation chains (MECs) and minimal evaluation models (MEMs). MECs are minimal over-constrained sets of relations, and they represent a necessary condition for a conflict to exist. MEMs are local propagation paths that describe how to use the relations of a MEC to predict behavior and to provide redundancy. Each MEM describes an executable model, which can be used to perform fault detection. PCs fit in the BRIDGE framework, which was defined for static systems [3]. It was demonstrated that PCs are equivalent to potential conflicts and to the support for minimal ARRs [9]. This work will provide a specific extension for a class of dynamic systems. First, the influence of temporal information on the calculation of PCs and ARRs must be analyzed. Concepts will be illustrated using the following system.
3 Case study
The system (Figure 1) is made up of a water tank, T, a valve, V, and a PID controller that acts on the valve through the command uc to keep the level of the tank, h, close to its reference, href. Other elements are the input flow sensor, Qi, and the output flow sensor, Qo.
[Figure 1. Our system is made up of a tank, a valve, and a controller (signals: href, Qi, h, uc, Qo).]
4 Using PCs to design and integrate state observers
While DX approaches have opted for simulation techniques – known as the integral approach to behavior estimation – relying mainly on qualitative models, the FDI community has traditionally opted for numerical models and has rejected simulation. Most FDI methods rely upon derivative estimation [1] – known as the derivative approach – which has problems related to disturbances and uncertainties. It is known that the integral and derivative approaches can provide equivalent results for behavior estimation with numerical models [2]. Moreover, in the FDI community, simulation, estimation, and state observers are equivalent for linear models [4]. In fact, parity- and observer-based approaches provide residuals with similar structures.
Comparing the structure of state observers and parity equations, several authors have already proved that they can be equivalent [6], according to the general system description, which can be seen as:

  dX̂(t)/dt = A · X̂(t) + B · U(t) + K · (Y(t) − Ŷ(t))    (1)
  Ŷ(t) = C · X̂(t)    (2)
Depending on the selected value of the gain K, we obtain a range of estimators: simulation for K = 0, and prediction for A = K · C. Other values of K provide a state observer.

Dependency compilation and state-observer design. Models are made up of instantaneous (static) and differential (dynamic) constraints. The algorithms used to find PCs provide an interesting side result if integral causality – the integral approach – is used: the minimal evaluable model can be implemented as a simulator or as a state observer. Proposition: Those MEMs containing a state variable can provide the minimal structural description for a state observer, if there exists one instantaneous constraint between the estimated state variable and its observed value.¹

Integration proposal: increasing robustness with state observers. State observers generate a state-variable estimation in the absence of faults, with noise in the sensors and small parameter disturbances, and this estimation can be compared against observations for fault detection. Their main drawbacks are the small persistence of activated residuals, short activation times (noise), and some fault masking. On the other hand, simulation over an interval, Δt, using a dissimilarity comparison (DTW) in the interval, has different detection capabilities, being less sensitive to noise in the measurements. Semi-closed loop simulation iteratively introduces observations as initial conditions when the simulation interval has elapsed. Our proposal is the integration of state observers within the CBD framework with possible conflicts, because observers will improve the estimations of the different states of the possible conflicts in the absence of faults, and they will not interfere with the behaviour of the possible conflicts in faulty situations. Running both MEMs in parallel, and assuming there is no fault detection, the state estimation given by the state observer can be used as the initial state for the possible-conflict simulation. This simple integration scheme shows the power of the proposal. The decision step in fault detection can be tuned, giving more weight to speed or to the false alarm ratio, the level of noise or parameter uncertainty, etc.
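To make this relationship concrete, the following minimal sketch (our illustration, not the authors' implementation) runs the estimator of Eqs. (1)–(2) in discrete time, assuming a forward-Euler discretisation with unit sampling period. Setting K to zero makes the correction term vanish, so the loop degenerates to pure open-loop simulation, as noted above.

```python
import numpy as np

def estimate_states(A, B, C, K, x0, inputs, outputs, dt=1.0):
    """State estimation following Eqs. (1)-(2), discretised with a
    forward-Euler step of length dt (an assumed discretisation).
    With K = 0 this is open-loop simulation of the model."""
    x = np.asarray(x0, dtype=float)
    history = []
    for u, y in zip(inputs, outputs):
        y_hat = C @ x                           # Eq. (2): predicted output
        dx = A @ x + B @ u + K @ (y - y_hat)    # Eq. (1): observer dynamics
        x = x + dt * dx
        history.append(x.copy())
    return history
```

In the integration scheme above, such an observer would run in parallel with the semi-closed loop simulation of a possible conflict and, as long as no fault is detected, its estimate provides the initial state for the next simulation interval.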
5 Results on the case study
The study was made on a data set comprising several simulation scenarios for each fault mode in the plant. We introduced noise in the measurements (5%) and model uncertainty (5%). Each simulation lasted 1000 seconds and contained several changes in the reference level of the tank. We randomly generated fault magnitudes at different time instants within the interval [420, 480]. We have reduced the mean and maximum values of the thresholds for fault detection by integrating state observers within the PCs computation in nominal situations (see Table 1, upper part). For faulty situations, detection times for fault occurrences at different times are shown for PCs, state observers, and the integration of both. Faults are pipe blockages of 10% and 30%, with 5% sensor noise and 5% parameter disturbances (see Table 1, middle and lower parts). Due to space limitations we do not provide results for other fault modes, which were similar.
1 Proof and examples of this proposition can be found in [8].
Nominal situation (detection thresholds):
            PC1+EST1        PC2+EST2        PC3+EST3
            Med.    Max.    Med.    Max.    Med.    Max.
Δ = 30      9.16    32.92   49.56   58.48   22.93   52.42
Δ = 60      5.82    23.78   40.15   55.99   21.54   51.82

Faulty situation: 10% pipe blockage
Fault arises at t =   420   430   440   450   460   470
PC1                   no    no    no    540   no    no
EST1                  fp    646   fp    fp    647   660
INT1                  540   540   480   540   540   540
PC2                   no    no    no    no    fp    540
EST2                  no    no    no    no    no    no
INT2                  480   540   540   fp    540   480

Faulty situation: 30% pipe blockage
Fault arises at t =   420   430   440   450   460   470
PC1                   480   480   480   540   540   540
EST1                  437   447   452   fp    457   572
INT1                  480   480   480   480   480   480
PC2                   540   480   540   540   480   480
EST2                  422   427   432   436   442   446
INT2                  480   480   480   480   480   480

Table 1. Results for the case study. fp is a false positive; no is a false negative. PC3 is not shown for the faulty situations because it is not affected by these faults.
6 Conclusions
Based on the identical fault isolation capabilities of FDI approaches and of PCs and ARRs in the BRIDGE framework, our proposal is to use the algorithms for computing PCs as a tool for state-observer design. These algorithms can provide the structure of MEMs, which can be implemented as state observers without including additional constraints in the model. Our work proposes a simple integration of those two expressions of the same MEM in a PC, where possible: use a state observer for initial state estimation, then use the estimation for a semi-closed loop simulation. The decision logic for fault detection can be tailored for each system to achieve the desired detection or false alarm rates. Results on a simulated plant are promising. We are testing on more demanding scenarios. The combination of state observers and CBD has been done before [7], but the integration – fault detection only with state observers – and the isolation stage – propagation back and forward in a temporal-causal graph – were different from those proposed in this work.

Acknowledgments: This work was supported by the Spanish Ministry of Education and Culture (MEC 2005-08498).

REFERENCES
[1] M. Blanke, M. Kinnaert, J. Lunze, and M. Staroswiecki, Diagnosis and Fault Tolerant Control, Springer, 2003.
[2] M.J. Chantler, T. Daus, S. Vikatos, and G.M. Coghill, 'The use of quantitative dynamic models and dependency recording engines', in Procs. of DX'96, pp. 59–68, Val Morin, Quebec, Canada, (1996).
[3] M.O. Cordier, P. Dague, F. Lévy, J. Montmain, M. Staroswiecki, and L. Travé-Massuyès, 'Conflicts versus analytical redundancy relations: a comparative analysis of the model-based diagnosis approach from the artificial intelligence and automatic control perspectives', IEEE Trans. Syst. Man Cy. B, 34(5), 2163–2177, (2004).
[4] J.J. Gertler, Fault Detection and Diagnosis in Engineering Systems, Marcel Dekker, Inc., Basel, 1998.
[5] W. Hamscher, L. Console, and J. de Kleer (Eds.), Readings in Model-Based Diagnosis, Morgan Kaufmann Pub., San Mateo, 1992.
[6] R. Isermann, Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance, Springer-Verlag, 2006.
[7] P. Mosterman and G. Biswas, 'Diagnosis of continuous valued systems in transient operating regions', IEEE Trans. Syst. Man Cy. B, 29(6), 554–565, (1999).
[8] B. Pulido, C. Alonso, A. Bregón, V. Puig, and T. Escobet, 'Analyzing the influence of temporal constraints in possible conflicts calculation for model-based diagnosis', in Procs. of DX'07, USA, (2007).
[9] B. Pulido and C. Alonso-González, 'Possible conflicts: a compilation technique for consistency-based diagnosis', IEEE Trans. Syst. Man Cy. B, 34(5), 2192–2206, (2004).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-793
Dependable Monitoring of Discrete-Event Systems with Uncertain Temporal Observations

Gianfranco Lamperti and Marina Zanella1

Abstract. In discrete-event system monitoring, a set of candidate diagnoses is output at the reception of each observation fragment. However, when the observation is uncertain, this result may not be dependable: the sets of diagnoses relevant to consecutive observation fragments may be unrelated to one another and, even worse, they may be unrelated to the actual diagnosis. To cope with this problem, the notion of monotonic monitoring is introduced, which is supported by specific constraints on the fragmentation of the uncertain observation, leading to the notion of stratification.
1 INTRODUCTION
Model-based diagnosis of discrete-event systems (DESs) has aroused great interest [3, 2, 8]. A DES consists of several components, where the behavior of each component is represented by an automaton. Interconnections between components can be modeled explicitly [1] and/or implicitly, that is, a communication buffer may be indistinguishable from a component [8]. Several state changes of distinct components can occur simultaneously [9, 10], or not [1, 7]. Two diagnostic tasks inherent to DESs can be singled out, a-posteriori diagnosis [1] and monitoring-based diagnosis [6], both requiring an observation as input. Therefore, observation features and models have been investigated [5, 4]. This paper defines monotonicity, a property that consists in producing as output, at each monitoring step, a set of diagnoses that includes the actual diagnosis, and discusses the granularity with which a temporally uncertain observation has to be processed by any sound and complete problem-solving method so that diagnostic results are monotonic, whatever the DES at hand.
2 MONITORING
Given a system Σ and an initial state Σ0, each evolution of Σ is confined within the behavior space, Bhv(Σ, Σ0). The latter is a directed graph rooted in Σ0, where each node is a state of Σ and each arc is a transition. Each (possibly empty) sequence of transitions rooted in Σ0 is a history h of Σ. Let T be the domain of transitions in Σ and Lo a domain of observable labels. A viewer V of Σ is a function from T to (Lo ∪ {ε}), where ε is the null label. If (T, ε) ∈ V then T is silent, else T is visible. The signature h[V] is the sequence of observable labels relevant to h, h[V] = ⟨ℓ | T ∈ h, (T, ℓ) ∈ V, ℓ ≠ ε⟩. Ideally, the signature should represent how h is observed outside Σ. However, what is actually perceived is the observation of Σ, O = (N, A), which is a directed acyclic graph (DAG) where N is the set of nodes and A the set of arcs, with the following uncertainty properties:

(Logical uncertainty) Each label ℓ in the signature corresponds to a node in O; such a label is perceived as a subset of (Lo ∪ {ε}) of candidate labels, necessarily including ℓ;

(Node uncertainty) Additional (spurious) nodes are possibly inserted into O, each of which is associated with a subset of candidate labels necessarily including ε;
1 Università di Brescia, Italy, e-mail: {lamperti,zanella}@ing.unibs.it
(Temporal uncertainty) The absolute temporal ordering of the signature is relaxed to a partial ordering (with the latter being consistent with the former).

The extension of a node N in N, written ‖N‖, is the set of labels embodied in N. A candidate signature of O is a sequence of labels obtained by first picking a label from each ‖N‖, N ∈ N, without violating the ordering imposed by A, and then removing the ε labels from the sequence. The extension of O, ‖O‖, is the whole set of candidate signatures of O.

Proposition 1. h[V] ∈ ‖O‖.

The ruler R of Σ is a mapping from T to (Lf ∪ {ε}), where Lf is a set of fault labels. If (T, ε) ∈ R then T is normal, else T is faulty. The diagnosis h ⊗ R is the set of fault labels {ℓ | T ∈ h, (T, ℓ) ∈ R, ℓ ≠ ε}. A diagnosis is empty when all transitions in h are normal. The uncertain observation taken as input by a monitoring task is not given as a whole but, rather, as a list of several fragments, where each fragment is composed of one or several nodes in N along with the relevant temporal constraints (arcs) in A. Formally, a fragmentation of O = (N, A) is a sequence 𝕆 = ⟨F1, …, Fn⟩ where each fragment Fi = (Ni, Ai), i ∈ [1..n], is such that {N1, …, Nn} and {A1, …, An} are partitions of N and A, respectively. Each fragment Fi represents a set of observable events received in the current time interval. Each node in Ni is an event, and Ai includes all and only the temporal relationships linking the nodes in Ni with their parent nodes. Without loss of generality, the parents of nodes in a new fragment are required to be in the fragments received up to now. Each nonempty prefix ⟨F1, …, Fi⟩ of 𝕆 corresponds to a sub-observation O[i] = (N[i], A[i]), where

  N[i] = ⋃_{j=1..i} Nj,   A[i] = ⋃_{j=1..i} Aj    (1)
The empty sub-observation is O[0] = (∅, ∅). If O is known, 𝕆 is univocally defined by the sequence of the Ni, as Ai necessarily includes all (and only) the arcs entering nodes in Ni. For each i ∈ [0..n], we can define a sub-problem ℘[i](Σ) = (Σ0, V, O[i], R), where the solution of ℘[i](Σ), written Δ(℘[i](Σ)), consists of a sound and complete set of candidate diagnoses, with each diagnosis being entailed by a history h whose signature conforms to O[i]. As such, Δ(℘[i](Σ)) = {δ | δ = h ⊗ R, h ∈ Bhv(Σ, Σ0), h[V] ∈ ‖O[i]‖}.

A monitoring problem ℳ(Σ) = (Σ0, V, 𝕆, R) is a 4-tuple involving an initial state, a viewer, a fragmented observation, and a ruler. Its solution, written Δ(ℳ(Σ)), is the sequence of the solutions of the diagnosis sub-problems ℘[i](Σ), i ∈ [0..n], that is, Δ(ℳ(Σ)) = ⟨Δ(℘[0](Σ)), …, Δ(℘[n](Σ))⟩.

Example 1. Let ℳ(Σ̄) = (Σ̄0, V̄, 𝕆̄, R̄) be a monitoring problem inherent to a DES called Σ̄ whose behavior space, rooted in Σ̄0 (node 0), is displayed in Fig. 1 along with its viewer V̄ (white matrix cells) and ruler R̄ (gray cells). Suppose that the actual history is h̄ = ⟨X1, X2, Y2, Z4, Z3, Y4, W4, Z2, X1⟩, and the relevant observation is Ō, depicted in Fig. 2. A possible fragmentation 𝕆̄ of Ō is defined by the following sequence of sets of nodes: ⟨{N1, N3}, {N2}, {N4}, {N5}⟩. Then, Δ(ℳ(Σ̄)) = ⟨Δ0, Δ1, Δ2, Δ3, Δ4⟩, where Δ0 = {∅}, Δ1 = {{w}}, Δ2 = {{w}, {x}, {x, y}}, Δ3 = {{w}, {x}, {x, y}, {x, y, z}}, and Δ4 = {{w}, {x, y, z}}.

[Figure 1. Behavior space (a), and viewer and ruler matrix (b) for Σ̄.]
[Figure 2. Observation Ō for Σ̄.]

Example 1 shows that the solution of a monitoring problem can be disappointing. In fact, at monitoring step 1, one is induced to believe that w is a quite certain fault but, from iteration 2 on, fault w is not certain any more. The rationale behind this deceitful behavior is that any sound and complete set of outputs complies with the whole observation received so far as if it were a complete observation, while it is not. Therefore, the extension of the observation may change non-monotonically from one step to another, thus producing the highlighted negative effect.

Let ℳ(Σ) = (Σ0, V, 𝕆, R), where 𝕆 = ⟨F1, …, Fn⟩. Let ⟨Δ0, Δ1, …, Δn⟩ be the solution of ℳ(Σ) and δ the actual (unknown) diagnosis of the actual (unknown) history of Σ. We say that ℳ(Σ) is monotonic iff ∀i ∈ [0..(n−1)] there exists δi ∈ Δi such that δ0 ⊆ δ1 ⊆ … ⊆ δn−1 ⊆ δ.

Example 2. The monitoring problem ℳ(Σ̄) in Example 1 is not monotonic: the actual diagnosis is δ̄ = {x, y, z}, for which the monotonicity condition does not hold, as Δ1 = {{w}} includes no diagnosis that is a subset of δ̄.

The monotonicity of a monitoring problem ℳ(Σ) depends on the nature of the fragmentation of O. The trivial fragmentation, involving the whole observation O as the unique fragment, supports monotonicity, but this is in fact a-posteriori diagnosis, not monitoring. Thus, we are interested in nontrivial fragmentations that guarantee monotonicity, independently of the specific system at hand, namely nontrivial stratified observations. A fragmentation 𝕆 = ⟨F1, …, Fn⟩ is stratified iff for each fragment Fi = (Ni, Ai), i ∈ [1..n], we have

  ∀N ∈ Ni (Unrl(N) ⊆ Ni)    (2)

where Unrl(N) is the set of all the nodes in N whose reciprocal emission order with respect to N is unknown. A stratified fragmentation is called a stratification and each fragment a stratum. Condition (2) requires that all nodes that are neither ancestors nor descendants (namely, unrelated) of nodes in the i-th stratum be in the i-th stratum themselves.

Proposition 2. Let 𝕆 = ⟨F1, …, Fn⟩ be a stratified observation. Then, for each i ∈ [1..n], ‖O[i]‖ is composed of all the signatures in ‖O[i−1]‖ (possibly) extended with further observable labels.

Proposition 3. A monitoring problem involving a stratified observation is monotonic.

Example 3. Consider a variant ℳ′(Σ̄) of the monitoring problem defined in Example 1, where 𝕆̄ is replaced by the stratification ⟨{N1}, {N2, N3}, {N4}, {N5}⟩. Then, Δ(ℳ′(Σ̄)) = ⟨Δ′0, Δ′1, Δ′2, Δ′3, Δ′4⟩, where Δ′0 = {∅}, Δ′1 = {∅, {w}, {x}, {y}}, Δ′2 = {{w}, {x}, {x, y}}, Δ′3 = {{w}, {x}, {x, y}, {x, y, z}}, and Δ′4 = {{w}, {x, y, z}}. As expected, ℳ′(Σ̄) is monotonic.

Property (2) is conserved when several contiguous strata are grouped together to form coarser-grained fragments. The contrary does not hold: when two or more contiguous fragments are obtained by splitting a single stratum, stratification may be lost. In the finest stratification, strata cannot be further split without losing stratification.

Proposition 4. The finest stratification is unique.

Proposition 5. The finest stratification of an observation represented by a disconnected DAG is the trivial fragmentation.

Proposition 6. Let Δi and Δi+1, i ∈ [0..(n−1)], be two consecutive elements in the solution of a monitoring problem involving a stratified observation. Then, ∀δ′ ∈ Δi+1 ∃δ ∈ Δi (δ′ ⊇ δ). In other words, monotonic monitoring is a shrink-and-expand operation, where Δi is first shrunk and then the remaining candidates are possibly extended with additional faults, to make up Δi+1.
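As an illustration of condition (2), here is a minimal sketch (ours, not the authors' code) that checks whether a given fragmentation of an observation DAG is stratified. Since Fig. 2 is not reproduced here, the arc structure of Ō used below is an assumption, chosen so that N2 and N3 are temporally unrelated, consistent with Examples 1 and 3.

```python
def is_stratified(nodes, arcs, fragments):
    """Condition (2): each fragment must contain Unrl(N) for every node N
    it holds, where Unrl(N) is the set of nodes that are neither ancestors
    nor descendants of N in the observation DAG."""
    succ = {n: set() for n in nodes}
    for u, v in arcs:
        succ[u].add(v)

    def descendants(start):
        seen, stack = set(), [start]
        while stack:
            for m in succ[stack.pop()]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    desc = {n: descendants(n) for n in nodes}
    unrl = {n: {m for m in nodes
                if m != n and m not in desc[n] and n not in desc[m]}
            for n in nodes}
    return all(unrl[n] <= set(f) for f in fragments for n in f)

nodes = ["N1", "N2", "N3", "N4", "N5"]
arcs = [("N1", "N2"), ("N1", "N3"), ("N2", "N4"), ("N3", "N4"), ("N4", "N5")]
print(is_stratified(nodes, arcs, [["N1", "N3"], ["N2"], ["N4"], ["N5"]]))  # False
print(is_stratified(nodes, arcs, [["N1"], ["N2", "N3"], ["N4"], ["N5"]]))  # True
```

Under this assumed DAG, the fragmentation of Example 1 fails the check because Unrl(N3) = {N2} is not contained in the first fragment, while the stratification of Example 3 passes.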
REFERENCES
[1] P. Baroni, G. Lamperti, P. Pogliano, and M. Zanella, 'Diagnosis of large active systems', Artificial Intelligence, 110(1), 135–183, (1999).
[2] L. Console, C. Picardi, and M. Ribaudo, 'Process algebras for systems diagnosis', Artificial Intelligence, 142(1), 19–51, (2002).
[3] R. Debouk, S. Lafortune, and D. Teneketzis, 'Coordinated decentralized protocols for failure diagnosis of discrete-event systems', Journal of Discrete Event Dynamic Systems: Theory and Application, 10, 33–86, (2000).
[4] A. Grastien, M.O. Cordier, and C. Largouët, 'Incremental diagnosis of discrete-event systems', in Sixteenth International Workshop on Principles of Diagnosis – DX'05, pp. 119–124, Monterey, CA, (2005).
[5] G. Lamperti and M. Zanella, 'Diagnosis of discrete-event systems from uncertain temporal observations', Artificial Intelligence, 137(1–2), 91–163, (2002).
[6] G. Lamperti and M. Zanella, 'A bridged diagnostic method for the monitoring of polymorphic discrete-event systems', IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, 34(5), 2222–2244, (2004).
[7] G. Lamperti and M. Zanella, 'Flexible diagnosis of discrete-event systems by similarity-based reasoning techniques', Artificial Intelligence, 170(3), 232–297, (2006).
[8] Y. Pencolé and M.O. Cordier, 'A formal framework for the decentralized diagnosis of large scale discrete event systems and its application to telecommunication networks', Artificial Intelligence, 164, 121–170, (2005).
[9] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D.C. Teneketzis, 'Failure diagnosis using discrete-event models', IEEE Transactions on Control Systems Technology, 4(2), 105–124, (1996).
[10] R. Su and W.M. Wonham, 'Global and local consistencies in distributed fault diagnosis for discrete-event systems', IEEE Transactions on Automatic Control, 50(12), 1923–1935, (2005).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-795
Distributed Repair of Nondiagnosability

Anika Schumann and Wolfgang Mayer and Markus Stumptner1

1 Introduction
Automated fault diagnosis has significant practical impact by improving reliability and facilitating maintenance of systems [1]. Given a monitor continuously receiving observations from a dynamic event-driven system, diagnosis algorithms infer possible fault events that explain the observations. For many applications, it is not sufficient to identify what faults could have occurred; rather, one wishes to know what faults have definitely occurred. Computing the latter requires diagnosability of the system, that is, the guarantee that the occurrence of a fault can be detected with certainty after a finite number of subsequent observations [2]. This paper defines a distributed framework that assists in assessing and improving the diagnosability of discrete-event systems. In this context, a system is diagnosable iff the presence or absence of each unobservable fault event can always be deduced once sufficiently many subsequent observable events have occurred. Otherwise, the system must be altered, for example by adding sensors, to make the ambiguous system behaviours distinguishable. Several past approaches deal with the problem of selecting sensor placements to ensure diagnosability of a system. However, the problem of computing an optimal sensor set with minimal size has a complexity exponential in the number of possible sensor placements [6]. Existing sensor placement algorithms are based on a global representation of the system, which may not be computable for large systems. In this paper we address the diagnosability problem in a distributed way by identifying those system behaviours that require modification to restore diagnosability. In fact, we show how to determine those subsystems whose modification is guaranteed to make the entire system diagnosable.
2 Diagnosability of discrete event systems
As in [2], we consider a discrete-event system G composed of components G1, . . . , Gn that are each modelled as a finite state machine (FSM). Here the transitions are partitioned into fault transitions and other locally unobservable transitions, transitions representing shared events that occur simultaneously in all concerned components, and observable transitions. A fault of the system is diagnosable iff its (unobservable) occurrence can always be deduced after a finite delay [2]. To decide diagnosability we use the twin plant approach presented in [3]. It computes for each component the interactive diagnoser G̃i, which gives the set of faults that can possibly have occurred for each sequence of observable and shared events. A local twin plant Ĝi is obtained by synchronising two instances of the diagnoser based on the observable events. Each path represents two indistinguishable system behaviours (i.e., two behaviours that emit the same sequence of observations). The twin plant states are partitioned into diagnosable and possibly nondiagnosable states [3]. A subset of the latter are the nondiagnosable states. A fault F is diagnosable in system G iff its global twin plant (GTP) Sync(Ĝ1, . . . , Ĝn) has no path with a cycle containing at least one observable event and one F-nondiagnosable state.² Such a path is called a critical path. Unfortunately, computing the GTP is prohibitively expensive for large systems. Our algorithm avoids scalability issues by computing nondiagnosable behaviours iteratively in a distributed approach, such that the global model need not be derived in many cases. We start with a set of twin plants of individual subsystems that characterise all paths that (may possibly) admit nondiagnosable behaviour (i.e., paths with a (possibly) nondiagnosable state). By composing individual models, behaviours that are infeasible or distinguishable in a larger subsystem are eliminated incrementally until (non)diagnosability can be decided or resource limits are reached. This work is an extension of the one presented in [3]. However, the latter can only verify diagnosability in a distributed way. In contrast, our approach allows one to confirm diagnosability and nondiagnosability given partial models of a system.

1 University of South Australia, Adelaide, Australia. Email: {schumann,mayer,mst}@cs.unisa.edu.au. This work was partially supported by the Australian Research Council under Discovery Grant DP0560183.
3 Distributed (non)diagnosability assessment
Our framework is based on the two properties below. Assume a set of twin plants Ĝ is created from a partition of a discrete-event system G. Then,

1. G is diagnosable if Ĝ is free of cycles that include an observable transition and a possibly nondiagnosable state, or if there is a twin plant in Ĝ where all states are diagnosable.
2. G is nondiagnosable if each plant in Ĝ includes a path to a possibly nondiagnosable state that does not have events shared with any other plant, and at least one of these paths has a cycle with a possibly nondiagnosable state and an observable transition.

The first property is derived from previous results on diagnosability [3]; a sketch of checking its cycle condition is given below. The correctness of the second one follows directly from the Sync operation: the synchronisation of the above paths from all twin plants results in a set of paths in the GTP, each containing an observable cycle with a possibly nondiagnosable state. Since every possibly nondiagnosable state in the GTP is nondiagnosable [3], such a synchronisation must contain a critical path and thus establishes the nondiagnosability of F.²
2 The result of Sync is an FSM whose state space is the Cartesian product of the state spaces of the components, and whose transitions are synchronised in that any shared event always occurs simultaneously in all components that define it.
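The following minimal sketch (ours; the data structures are hypothetical) checks the cycle condition of property 1 on one twin plant. It relies on the fact that a closed walk containing both an observable transition and a possibly nondiagnosable state exists iff some strongly connected component (SCC) of the plant contains both, so SCCs (computed here with Kosaraju's algorithm) suffice for the test.

```python
def free_of_critical_cycles(states, edges, possibly_nondiag):
    """'edges' are triples (src, dst, observable). Returns True if no cycle
    contains an observable transition and a possibly nondiagnosable state."""
    adj = {s: [] for s in states}
    radj = {s: [] for s in states}
    for u, v, _ in edges:
        adj[u].append(v)
        radj[v].append(u)

    seen = set()

    def dfs(u, graph, out):
        # Iterative DFS appending nodes in post-order.
        stack = [u]
        seen.add(u)
        while stack:
            node = stack[-1]
            nxt = next((w for w in graph[node] if w not in seen), None)
            if nxt is None:
                stack.pop()
                out.append(node)
            else:
                seen.add(nxt)
                stack.append(nxt)

    order = []
    for s in states:
        if s not in seen:
            dfs(s, adj, order)
    seen = set()
    comp = {}
    for s in reversed(order):  # second pass on the reversed graph
        if s not in seen:
            members = []
            dfs(s, radj, members)
            for m in members:
                comp[m] = s
    bad_sccs = {comp[s] for s in possibly_nondiag}
    # A critical cycle needs an observable edge inside an SCC that also
    # contains a possibly nondiagnosable state.
    return not any(obs and comp[u] == comp[v] and comp[u] in bad_sccs
                   for u, v, obs in edges)
```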
4 Algorithm
We use the above results to decide whether a system is (non)diagnosable. Starting with twin plants representing individual subsystems, our algorithm iteratively removes locally nondiagnosable paths by synchronising twin plants to form larger subsystems. In case formerly indistinguishable system behaviours become discriminable through observable events of the larger subsystem, the path is removed. Otherwise, the path remains dependent (i.e., it has events shared with other subsystems), but may become independent after further synchronisation. The aggregation of subsystems continues until either condition (1) or (2) is met, or until resources are exhausted. In the latter case, paths exist for which it is not known whether the system is indeed (non)diagnosable, and we return the locally nondiagnosable paths (a superset of the truly nondiagnosable subsystem) as an approximation. Hence, our approach exhibits certain anytime characteristics. Since the diagnosability problem is NP-hard, some systems may require the computation of the GTP to assess diagnosability. While we cannot avoid this intrinsic complexity, we stop with an approximate solution in case resource limits are insufficient to obtain the exact solution.
5 Inferring repair alternatives
If a system cannot be proved diagnosable, an over-approximation of possibly nondiagnosable subsystems is obtained, represented by their twin plants that contain possibly nondiagnosable paths. To ensure the overall system is diagnosable, certain transitions must be modified such that the potentially nondiagnosable paths cannot manifest in the revised model. We identify the relevant transitions using the following labelling scheme: every twin plant transition t is labelled with the set of transition identifiers comprising all those transitions that have been synchronised to obtain t. Every component transition (except fault transitions) is assigned a unique identifier label; the identifier label is propagated to the corresponding transition of the interactive diagnoser G̃i and, subsequently, to the corresponding transition in the twin plant Ĝi.

The FSMs in Figure 1 illustrate the labelling. Every transition t of the interactive diagnoser G̃i is labelled with the set of transition identifiers obtained from the transitions in Gi represented by t. In the twin plant, every shared transition corresponds to exactly one transition in the interactive diagnoser, and every observable transition refers to two transitions (one from the left and one from the right diagnoser). For shared transitions, the labelling is kept. For observable transitions, the identifier labels are obtained from the union of the two corresponding diagnoser transition labels.

Since the algorithm described above requires the synchronisation of twin plants, the transition identifiers for every twin plant Ĝ = Sync(Ĝ′, Ĝ″) must be determined. This label propagation is similar to the propagation described previously: every transition labelled by an event that only occurs in one of the twin plants Ĝ* ∈ {Ĝ′, Ĝ″} carries the same identifier as the unique corresponding transition in Ĝ*. Otherwise, the identifier for a transition in Ĝ is obtained as the union of the identifiers of the two corresponding transitions.

Through transition labels, those components where a behavioural modification would remove a cause of nondiagnosability can be identified. For instance, the critical paths shown in Figure 1(c) can be eliminated by changing the transition from x5 to x7, which may be accomplished by modifying either component transition t3 or t5 in Figure 1(a). A system designer might choose to do this by replacing one of the sensors emitting event o1 by one emitting a different event, thus changing the component's behaviour. Then the behaviours represented by the two transition sequences from a0 to a4 become distinguishable.

[Figure 1. Assignment of transition identifiers: (a) labelled component model; (b) labelled diagnoser; (c) labelled twin plant, where grey states denote nondiagnosable ones and white states denote diagnosable ones. Solid, dashed, and dotted lines denote observable, shared, and failure transitions, respectively.]
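The union rule can be made concrete with a small sketch (ours; the one-transition-per-event encoding is a simplifying assumption). It is the same rule that yields the combined label {t3, t5} on the observable o1 transition of Figure 1(c).

```python
def sync_labels(labels1, labels2, shared_events):
    """Identifier-label propagation for the Sync of two labelled plants:
    a transition on an event private to one plant keeps that plant's
    label set; a transition on a shared event takes the union of the
    two synchronised label sets. Each map sends event -> identifier set."""
    out = {}
    for labels in (labels1, labels2):
        for event, ids in labels.items():
            if event not in shared_events:
                out[event] = set(ids)
    for event in shared_events:
        out[event] = set(labels1[event]) | set(labels2[event])
    return out

print(sync_labels({"o1": {"t3"}}, {"o1": {"t5"}}, {"o1"}))  # {'o1': {'t3', 't5'}}
```

The labels surviving on a critical path then point directly to the component transitions (here t3 or t5) that a designer may modify.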
6 Conclusion and future work
We have outlined a distributed algorithm that ascertains (non)diagnosability of distributed event-driven systems. We have shown how to identify component behaviours and transitions that, if modified, render a system diagnosable. Our approach has two distinct features: first, our algorithm can find solutions of a whole system by operating on partitions thereof, and, second, an approximation is returned if computational resources to construct the entire system are not available. Diagnosability assessment and repair can be used to analyse physical and abstract systems such as distributed computing processes. Our work is particularly relevant for the latter, since assessing and designing monitoring capabilities of a system that are sufficient to allow compensation and reconfiguration to take place are active areas of research [4, 5]. As part of future work we intend to extend our approach to incorporate the costs for modifying the system and to explore a richer model of possible transition modifications, tailored to the analysis of distributed software systems.
REFERENCES
[1] G. Lamperti and M. Zanella, Diagnosis of Active Systems, Kluwer Academic Publishers, 2003.
[2] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis, 'Diagnosability of discrete event systems', IEEE Transactions on Automatic Control, 40(9), 1555–1575, (1995).
[3] A. Schumann and Y. Pencolé, 'Scalable diagnosability checking of event-driven systems', in IJCAI-07, pp. 575–580, (2007).
[4] Rajesh Thiagarajan, Markus Stumptner, and Wolfgang Mayer, 'Semantic web service composition by consistency-based model refinement', in The 2nd IEEE Asia-Pacific Service Computing Conference (APSCC 2007), pp. 336–343, Tsukuba, Japan, (December 2007).
[5] WSDIAMOND, 'WS-Diamond deliverable D5.1: Characterization of diagnosability and repairability for self-healing web services', Technical Report IST-516933, University of Torino and others, (April 2007).
[6] T. Yoo and S. Lafortune, 'On the computational complexity of some problems arising in partially-observed discrete-event systems', in American Control Conference, volume 1, pp. 307–312, (2001).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-797
From constraint representations of sequential code and program annotations to their use in debugging1

Mihai Nica and Franz Wotawa2

1 Introduction
Debugging, i.e., the detection, localization, and correction of bugs, has long been considered an important task in software engineering. A lot of research has been devoted to debugging, but mainly to fault detection. In this paper we focus on fault localization based on the constraint representation of programs. For this purpose, programs are converted into their equivalent constraint satisfaction problem (CSP). A solution of the corresponding CSP is a diagnosis candidate. Besides the source code, a failure-revealing test case has to be given. For more information regarding CSPs we refer to [2]. The work described in this paper is most closely related to the work of Ceballos et al. [9], where constraint programming is used for fault localization. Their approach requires that the programmer provides contracts, i.e., pre- and post-conditions, for every function. However, the authors do not investigate the complexity of solving the resulting problem or the scalability to larger programs. In particular, they do not consider structural decomposition or other methods for improving constraint solving, which would make the approach feasible. In order to complement previous research, we investigate the complexity of solving the CSP corresponding to a debugging problem that comprises the source code and the test case. In the past, in order to find problem classes which are tractable, much work has been done on the structural decomposition of CSPs. Gottlob et al. proposed the hypertree decomposition and showed that this decomposition method generalizes other important methods [3, 4]. The hypertree width, a characteristic of the structure of the constraint system, is a measure of the complexity of solving a CSP and, therefore, a measure of the complexity of the debugging problem. In other words, by performing a hypertree decomposition we can obtain a metric for the complexity of debugging.
2 Example

   { x ≥ 0 ∧ y ≥ 0 }     // PRE-CONDITION
1. i = 0;
2. r = 0;
3. while (i < x) {
      { r == i · y }     // INVARIANT
4.    r = r + y;
5.    i = i + 1;
   }
   { r == x · y }        // POST-CONDITION

Figure 1. A program for computing the product of two natural numbers
In this section we use a small example program to motivate fault localization using constraint-based reasoning with integrated annotations. For the program in Figure 1, assume that Line 3 is changed to 'while (i <= x) {', which leads to an obviously wrong implementation. If we are only interested in finding single faults at the statement level, we use the following process. Statement by statement, we go through the program and assume the current statement to be faulty. All other statements are considered to work as expected.
1 This research has been funded in part by the Austrian Science Fund (FWF) under grant P20199-N15 and by the FIT-IT research project Self Properties in Autonomous Systems (SEPIAS), which is funded by BMVIT and the FFG.
2 Technische Universität Graz, Institute for Software Technology, 8010 Graz, Inffeldgasse 16b/2, Austria, {mihai.nica,wotawa}@ist.tugraz.at. Authors are listed in alphabetical order.
When assuming a statement to be faulty, we cannot derive a value for the variables defined in that statement. A variable is said to be defined within a statement if a value is assigned to the variable. Such a semantics for faulty statements is implemented in previous model-based diagnosis approaches to debugging, e.g., in [5]. We now assume that Line 1 of the multiplication program behaves faultily. In this case the variable i in Line 1 is assigned the undefined value ?, that is: 1. i = ?;. Because of this change, we are not able to decide whether the condition in Line 3 evaluates to true or false. Hence, no values for r or i can be determined and, finally, we cannot contradict the expected value. As a consequence, Line 1 is a valid diagnosis according to model-based diagnosis [6]. The same happens when assuming Line 2 to be faulty. In this case r has no value assigned. From the other information we know that the sub-block of the while is executed once. Hence, we obtain the following equation, where the available information is given in parentheses:

4.  { r = ? ∧ y = 2 }
    r = r + y;
    { r = 0 }
This equation can be solved by setting the value of r (before executing the statement) to -2, which does not contradict the value ?. A similar situation occurs for the other statements and, hence, there is no way of excluding even a single statement from the list of possible bug candidates. This inability to exclude statements is due to missing information. In order to overcome this problem, we have to combine verification information and debugging. For this purpose we consider program annotations, which can also be used for verification based on Hoare's calculus, like the ones given in Figure 1. When now using the same procedure for finding single faults, only Lines 1 and 3 remain as diagnosis results. We now prove that Line 2 can be excluded; it is easy to see that the same argument applies to Lines 4 and 5 as well. If assuming Line 2 to be faulty, we obtain the following equation:

4.  { r == i · y ∧ i == 1 ∧ x == 0 ∧ y == 2 }
    r = r + y;
    { r == 0 }
From i == 1 and, consequently, i == 0 before the increment, we derive r == 0 before executing the statement. Hence, we obtain r to be 2 after the execution, which contradicts the expected value of r. Statement 2 is no single-fault diagnosis anymore. This simple example shows that the integration of verification information based on program annotations really improves debugging. Hence, a representation of programs and their annotations as constraints, together with a constraint solver, can be used to check the correctness assumptions of program statements.
3 Debugging process
The whole conversion algorithm of programs into their equivalent CSP representation and its use in debugging is described in [10]. We only briefly discuss the overall diagnosis process, which comprises the following steps:

1. Remove loops: The first step is to remove all while statements and recursive function calls by 'unrolling'. For this purpose a while statement is converted into a nested if-statement. A similar procedure is applied to recursive functions. Since the maximum number of iterations is known for a given test case, the resulting loop-free program behaves in the same way as the original program.
2. SSA conversion: In the second step, the loop-free program is converted into its static single assignment (SSA) form, in which every variable is defined exactly once (see the sketch below). For more information regarding the SSA form we refer to [1]. In this step the assertions are also converted.
3. The CSP's hyper-tree: From the SSA form we build the constraint system and its corresponding hyper-tree. This is done by mapping every program variable to its corresponding constraint variable. Every assignment is mapped directly to a constraint. The behavior of the constraints is given by the semantics of the corresponding statements.
4. Diagnosis: In the diagnosis step, we use the resulting CSP and the given test case directly for solving the obtained debugging problem. For this purpose we use the TREE* algorithm [7]. The algorithm requires an acyclic CSP, which can be obtained by applying, for example, hyper-graph decomposition [3, 4] or other decomposition methods. The combination of TREE* and a decomposition method is described in [8].

When using this debugging process, the complexity of debugging is equivalent to the complexity of solving a CSP. [4] states that the complexity of solving a CSP is related to the hyper-tree width of the CSP as follows: the time needed to find a solution for a CSP with n variables as input and a corresponding hyper-tree width of k is in the worst case O(n^k log n). Hence, knowing the hyper-tree width of the CSPs of programs is important in practice. In Figure 2 we report first results regarding the hyper-tree width of some small programs comprising while- and if-statements. The figure lists the lines of code (LOC), the lines of code of the corresponding SSA form (LOC2), the number of while-statements (#W), the number of if-statements (#I), the number of considered iterations (#IS), and the hyper-tree width (HW) for each program. The hyper-tree width of the programs varies from 3 to more than 30, which indicates that computing diagnosis candidates is a complex task when relying on the CSP representation of programs. Another important issue is that the hyper-tree width increases when the number of considered iterations (during the unrolling step) increases. Whether there is an upper bound or not is still an open issue.
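As an illustration of step 2, here is a minimal sketch (ours; it handles straight-line code only, omitting the if-conversion needed for unrolled loops) that renames variables so that each is defined exactly once:

```python
def to_ssa(statements):
    """Convert straight-line assignments like ('r', 'r + y') into SSA form,
    where every variable is defined exactly once."""
    version = {}  # current SSA version of each program variable

    def rename(expr):
        # Replace each variable occurrence by its current SSA name.
        return " ".join(f"{tok}_{version[tok]}" if tok in version else tok
                        for tok in expr.split())

    ssa = []
    for var, expr in statements:
        rhs = rename(expr)                     # uses refer to old versions
        version[var] = version.get(var, -1) + 1
        ssa.append((f"{var}_{version[var]}", rhs))
    return ssa

# One unrolled iteration of the multiplication program:
for lhs, rhs in to_ssa([("i", "0"), ("r", "0"), ("r", "r + y"), ("i", "i + 1")]):
    print(f"{lhs} == {rhs}")
# i_0 == 0, r_0 == 0, r_1 == r_0 + y, i_1 == i_0 + 1
```

Each resulting equation then maps directly to one constraint in step 3.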
Name              LOC   LOC2   #W   #I   #IS    HW
BinSearch          27     40    1    3     1     3
BinSearch          27    112    1    3     4     8
Binomial           76     82    5    1     1     3
Binomial           76   1155    5    1    30   ≥30
Hamming            27     62    5    1     1     2
Hamming            27    989    5    1    10   ≥14
Huffman            64     78    4    1     1     2
Huffman            64    342    4    1    20   ≥12
whileTest          60     88    4    0     1     2
whileTest          60    376    4    0     9     8
Permutation        24     41    3    1     1     3
Permutation        24    119    3    1     7     6
Permutation        24   1231    3    1   100     6
Adder              63     70    0    5     –     3
SumPowers          21     33    2    1     1     2
SumPowers          21    173    2    1    15    10
SumPowers          21   1376    2    1   100    10
IscasC432         162    162    0    0     –     9
ComplexHypertree   12     30    1    0     1     3
ComplexHypertree   12    370    1    0    30    17
ComplexHypertree   12   1076    1    0   100    17

Figure 2. The hyper-tree width for different sequential programs
4 Conclusions
In this paper we discussed the compilation of programs into their equivalent CSP representation and its use for fault localization. Assertions like pre- and post-conditions or invariants can be easily integrated. Moreover, CSP solvers can be used directly for debugging. Solving CSPs also depends on their structural properties; the structural properties of the CSP corresponding to a given problem are an indicator of the complexity of program debugging. In this paper, we give first results on debugging complexity in terms of hyper-tree width. The results show that debugging requires a lot of computational resources.
REFERENCES
[1] Marc M. Brandis and H. Mössenböck. Single-pass generation of static single assignment form for structured languages. ACM TOPLAS, 16(6):1684–1698, 1994.
[2] Rina Dechter. Constraint Processing. Morgan Kaufmann, 2003.
[3] Georg Gottlob, Nicola Leone, and Francesco Scarcello. On tractable queries and constraints. In Proc. DEXA 1999, Florence, Italy, 1999.
[4] G. Gottlob, N. Leone, and F. Scarcello. A comparison of structural CSP decomposition methods. AI, 124(2):243–282, 2000.
[5] Wolfgang Mayer, Markus Stumptner, Dominik Wieland, and Franz Wotawa. Can AI help to improve debugging substantially? Debugging experiences with value-based models. In ECAI, pages 417–421, Lyon, France, 2002.
[6] Raymond Reiter. A theory of diagnosis from first principles. AI, 32(1):57–95, 1987.
[7] Markus Stumptner and Franz Wotawa. Diagnosing tree-structured systems. AI, 127(1):1–29, 2001.
[8] M. Stumptner and F. Wotawa. Coupling CSP decomposition methods and diagnosis algorithms for tree-structured systems. In Proc. 18th IJCAI, pages 388–393, Acapulco, Mexico, 2003.
[9] R. Ceballos, R. M. Gasca, C. Del Valle, and D. Borrego. Diagnosing errors in DbC programs using constraint programming. Lecture Notes in Computer Science, vol. 4177, pages 200–210, 2006.
[10] Paper waiting to be reviewed by Informatica. http://www.informatica.si/
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-799
Compressing Binary Decision Diagrams

Esben Rune Hansen1 and S. Srinivasa Rao2 and Peter Tiedemann3

Abstract. The paper introduces a new technique for compressing Binary Decision Diagrams in those cases where random access is not required. Using this technique, compression and decompression can be done in linear time in the size of the BDD, and compression will in many cases reduce the size of the BDD to 1–2 bits per node. Empirical results for our compression technique are presented, including comparisons with previously introduced techniques, showing that the new technique dominates on all tested instances.
1 Introduction
In this paper we introduce a technique for compressing binary decision diagrams for those cases where random access to the compressed representation is not needed. The two primary areas in which decision diagrams are used in practice are verification and configuration. In both of these areas it is sometimes important to store binary decision diagrams using as little space as possible, but without the need for random access. Primarily, the need for such compression arises when it is necessary to transmit binary decision diagrams across communication channels with limited bandwidth. In the area of verification this need arises, for example, when using a networked cluster of computers to perform a distributed compilation of a binary decision diagram [1]. A similar exchange of BDD data takes place in distributed configuration, as described in [11]. In such approaches the fact that the network bandwidth is much lower than the memory bandwidth can become a major bottleneck, as computers stall waiting to receive data to process. Transmitting the binary decision diagrams in a compressed representation can help alleviate this problem. A full version of this paper is available at [4].

Related work. The only previous work we are aware of for compressing BDDs for offline storage is the work by Starkey and Bryant [9] and by Mateu and Prades-Nebot [7], which describes techniques for image compression using BDDs. The latter includes a nontrivial encoding algorithm for storing the BDD. Kieffer et al. [5] give theoretical results for using BDDs for general data compression, including a technique for storing BDDs.

Preliminaries. For a definition of BDDs please see [2]. We denote a given BDD as G(V, E) and use Elow and Ehigh to denote the sets of low and high edges, respectively. We use l(u) to denote the layer in which a node u is located. An edge (u, v) such that l(u) + 1 < l(v) is called a long edge and is said to skip layers l(u)+1 to l(v)−1. The length of an edge (u, v) is defined as l(v) − l(u). A layer ordering idl : V → {1, . . . , |V|} of the nodes in a layered DAG G(V, E) rooted in r is the ordering of V layer by layer in increasing order of the layer.
1 IT-University of Copenhagen
2 MADALGO, Aarhus University, Denmark
3 IT-University of Copenhagen
Nodes at the same layer are ordered as they are visited by a DFS in the DAG starting at r and traversing left edges prior to right edges. We refer to idb(v) and idl(v) as "the BFS id of v" and "the layer id of v", respectively.

Lemma 1. Every binary tree can be unambiguously encoded using 2 bits per node.

To achieve such an encoding, each node v is encoded using two bits: the first bit is true iff v contains a left child, and the second bit is true iff v contains a right child. In order to make decoding possible, the order in which the children of already decoded nodes appear in the encoded data must be known.
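The following minimal sketch (ours) illustrates Lemma 1, fixing BFS order as the assumed emission order for the two-bit records; any agreed-upon order would do:

```python
from collections import deque

def encode_tree(root, left, right):
    """Lemma 1: two bits per node, emitted in BFS order;
    bit 1 = 'has left child', bit 2 = 'has right child'."""
    bits, q = [], deque([root])
    while q:
        v = q.popleft()
        for child in (left.get(v), right.get(v)):
            bits.append(child is not None)
            if child is not None:
                q.append(child)
    return bits

def decode_tree(bits):
    """Rebuild the tree shape; nodes are renumbered in BFS order."""
    left, right, q, nxt, i = {}, {}, deque([0]), 1, 0
    while q:
        v = q.popleft()
        if bits[i]:
            left[v] = nxt; q.append(nxt); nxt += 1
        if bits[i + 1]:
            right[v] = nxt; q.append(nxt); nxt += 1
        i += 2
    return left, right
```

Since the decoder consumes exactly two bits per dequeued node, the tree shape is rebuilt unambiguously.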
2 The compression technique
Our compression technique can be summarized by the following steps:

1. Build a spanning tree on the BDD (Section 2.1).
2. Encode the edges in the spanning tree, using Lemma 1.
3. Encode by one bit the order in which the two terminals appear in the spanning tree.
4. Encode the lengths of the edges in the spanning tree where necessary (Section 2.1).
5. Encode the edges that are not in the spanning tree (Section 2.2).
6. Compress the resulting data using standard compression techniques.
2.1 The spanning tree
We will construct a spanning tree with a minimum number of long edges. For each node v in the BDD with parents u1, . . . , uk, we add the edge (uj, v) that minimizes l(v) − l(uj) to the spanning tree; see the sketch below. This ensures a spanning tree with a minimal number of long edges. In the following, an edge is called a tree edge if it is contained in the spanning tree and a nontree edge otherwise.

Encoding the lengths of the tree edges. The spanning tree is stored as a binary tree in which all edges have the same length. Since some of the edges in the spanning tree may correspond to long edges in the BDD, the binary tree itself is not sufficient to reconstruct the layer information during decoding. We therefore encode the location and the length of each long edge that is included in the spanning tree. The location of a long edge (u, v) is uniquely specified by the BFS order of the end point of the edge, that is, idb(v). To encode the locations of the long edges (u1, v1), . . . , (uk, vk), we output a bitvector of length |V| in which entries idb(v1), . . . , idb(vk) are true and all other entries are false.
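A minimal sketch of the spanning-tree construction (ours; the parent lists and layer map are hypothetical inputs):

```python
def spanning_tree_min_long_edges(parents, layer):
    """For each non-root node v with parent list parents[v], keep the
    incoming edge minimising l(v) - l(u). Exactly one parent edge is
    kept per node, so the result is a spanning tree, and choosing the
    shortest edge per node minimises the number of long edges."""
    return {v: min(us, key=lambda u: layer[v] - layer[u])
            for v, us in parents.items()}

# Hypothetical 4-node example: layer of each node and its parent list.
layer = {"r": 1, "a": 2, "b": 3, "t": 4}
parents = {"a": ["r"], "b": ["r", "a"], "t": ["a", "b"]}
print(spanning_tree_min_long_edges(parents, layer))
# {'a': 'r', 'b': 'a', 't': 'b'} - the long edges r->b and a->t are left out
```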
2.2 Encoding nontree edges
When the spanning tree and the layer information are encoded, we only need to encode the nontree edges, that is, those edges in the BDD that are not contained in the spanning tree. It is easy to see that there are |E|/2 + 1 tree edges (when |V| > 3), leaving |E|/2 − 1 nontree edges. With access to the spanning tree with restored layer information, and given the fact that every BDD node except the terminals has two children, the starting points of the nontree edges are known. The end-point of a nontree edge is called an incomplete child. We define S as the sequence of incomplete children appearing in layer order of their parents, and idl(S) as the corresponding sequence of layer ids. Below we describe three encodings of nontree edges which combine to encode all the nontree edges.

Incomplete children with large in-degree. Standard compression techniques excel at compressing sequences with high redundancy. We note that nodes with in-degree d will appear d − 1 times in the sequence of nontree edges. Hence standard compression will efficiently compress those nontree children that have a high in-degree, if they are separated from the nodes that have a low in-degree. We split S into two disjoint subsequences, H and L, the first containing those incomplete children that have an in-degree larger than a specified threshold, the latter containing the rest. Based on H we construct the sequence of integers S^H over the sequence of nodes v1, . . . , v|V| in S by encoding vi ∈ H as idl(vi) and vi ∈ L as 0. By 0s we indicate the incomplete children that are not among the incomplete children with high in-degree. The remaining incomplete children, L, we encode separately, as described in the next two paragraphs.

Incomplete children with small in-degree. To encode L we will exploit the fact that the sequence of integers in idl(L) will in most instances tend to be increasing (this behavior is analysed in more detail in the full version [4]). We exploit this fact by encoding the sequence idl(L) using delta coding:

Definition 2 (Delta Coding). Consider any sequence of integers (i1, . . . , ik) ∈ ℤ^k for any k ∈ ℕ. We define the delta coding of (i1, . . . , ik) by Δ(i1, . . . , ik) = (i1, i2 − i1, i3 − i2, . . . , ik − ik−1).

Long forward edges. A nontree edge (u, v) is a forward edge if u is an ancestor of v in the spanning tree. Any forward edge (u, v) in the graph with length k can be unambiguously decoded from idl(v) and k. We label each node v with the number of long edges that end in v. We then write the lengths of the long edges, ordered by their end-points. We introduce a threshold on the number of long forward edges to control the use of this approach. If the threshold is not exceeded, all long forward edges are instead encoded as described above.
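A minimal round-trip sketch of Definition 2 (ours): the encoder stores the first value and then successive differences, which stay small for the nearly increasing layer-id sequences this encoding targets, making them easy prey for the final standard-compression pass.

```python
def delta_encode(seq):
    """Delta coding of Definition 2: first value, then differences."""
    return [seq[0]] + [b - a for a, b in zip(seq, seq[1:])] if seq else []

def delta_decode(deltas):
    """Inverse of delta_encode: running prefix sums."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

layer_ids = [3, 4, 4, 7, 9, 12]
assert delta_decode(delta_encode(layer_ids)) == layer_ids
print(delta_encode(layer_ids))  # [3, 1, 0, 3, 2, 3]
```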
3 Experiments
In this section we provide empirical results from compressing a large set of BDDs from various sources using the new encoder described in this paper and the encoders from [7] and [5]. We also provide results for a naive encoder, which outputs the size of each layer followed by a list of children. Many of the instances we show results for are taken from the configuration library CLib [10]. We apply LZMA [8] to the output of all encoders to produce the final encoding. The Java source code used for these experiments (including a command-line encoder and decoder for BDDs in the BuDDy [6] file format) will be made available, along with all instances used in these experiments, at [3].
Conclusion. From the empirical results (Figure 1) we can see that the naive encoding, being compressed only by LZMA, is outperformed by a factor of up to 20. We also note that the new encoder is consistently able to perform as well as or better than the other encoders on all tested instances. In particular, the largest BDD in our test ("complex-P3") required about twice as much space when using either of the two other dedicated encoders.
Name            |V|       this paper  [7]    [5]    Naive
Product Configuration
renault         455798    0,90        126%   103%    402%
renault-dir     1392863   0,23        198%   214%   1352%
pc-CP           16496     0,76        220%   209%    788%
pc              3467      2,19        224%   211%    436%
Big-PC          356696    0,38        334%   266%   1345%
Big-PC-dir      1291600   0,17        260%   260%   2035%
Power Supply Restoration
complex-P3      2812872   0,44        243%   202%    951%
complex-P2      163432    1,16        181%   167%    541%
1-6+22-32       20937     1,89        136%   154%    413%
1-6+22-32-dir   61944     0,99        135%   161%    606%
Fault Trees
isp9607         228706    0,63        389%   204%    873%
isp9605         4570      3,30        130%   145%    305%
chinese         3590      2,06        214%   160%    450%
Combinatorial
5x27queens      562764    4,33        108%   109%    204%
13x13rook       76808     3,56        210%   165%    311%
8x8rook         1339      6,03        140%   139%    277%
8x8queen-dir    2453      2,17        115%   178%    374%
8x8queen        879       4,29        114%   138%    332%
Multipliers
mult-mix-10     42468     9,92*       114%   107%    169%
mult-apart-10   31260     8,07*       120%   124%    202%

Figure 1. Above are shown the name and node count of each of the instances tested. The result of the new encoder, in bits per node, is then shown, followed by the relative results of the rest of the encoders. The * indicates that delta coding was not used.
References
[1] P. Arunachalam, C. Chase, and D. Moundanos, 'Distributed binary decision diagrams for verification of large circuits', ICCD, 00, 365, (1996).
[2] Randal E. Bryant, 'Graph-based algorithms for boolean function manipulation', IEEE Transactions on Computers, 35(8), 677–691, (1986).
[3] Esben Rune Hansen, Srinivasa Rao, and Peter Tiedemann, 'BDD compression'. http://bddcompression.sourceforge.net.
[4] Esben Rune Hansen, Srinivasa Rao, and Peter Tiedemann. Compressing binary decision diagrams, 2008. http://arxiv.org/abs/0805.3267v1.
[5] J. Kieffer, P. Flajolet, and E.-h. Yang, 'Universal lossless data compression via binary decision diagrams', in Proceedings of ISIT 2000, (2000).
[6] J. Lind-Nielsen, 'BuDDy – A Binary Decision Diagram Package'. http://sourceforge.net/projects/buddy, online.
[7] P. Mateu-Villarroya and J. Prades-Nebot, 'Lossless image compression using ordered binary-decision diagrams', Electronics Letters, 37, 162–163, (2001).
[8] Igor Pavlov. 7z LZMA SDK. http://www.7-zip.org/sdk.html.
[9] M. Starkey and R. Bryant. Using ordered binary-decision diagrams for compressing images and image sequences, 1995.
[10] Sathiamoorthy Subbarayan. CLib: configuration benchmarks library. http://www.itu.dk/research/cla/externals/clib.
[11] Peter Tiedemann, Tarik Hadzic, Stuart Henney, and Henrik Reif Andersen, 'Interactive distributed configuration', in Proceedings of CP 2006, pp. 761–765, Springer-Verlag Berlin Heidelberg, (2006).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-801
Dependent Failures in Consistency-based Diagnosis
Jörg Weber and Franz Wotawa
1 Introduction
Model-based diagnosis (MBD) approaches which follow the consistency-based diagnosis paradigm [3, 2] usually assume that components fail independently; i.e., that any abnormal behavior of a component is the consequence of an internal fault. Although some researchers have acknowledged that components may fail dependently, there are very few works which have addressed this issue. In [5] we presented an approach for diagnosing dependent failures in the hardware-software hybrid system of a mobile autonomous robot, and in [4] we described a formalization of this approach. This paper proposes an improved technique for modeling failure dependencies. We formalize the semantics of our model, and we propose a troubleshooting strategy for systems with dependent failures. With the term dependent failure we denote cascades of failures which happen when a component, the cause of the cascade (CoC), fails due to an internal fault and when this failure causes the failure of other components as well. That is, in systems with dependent failures it may happen that some components suffer from persistent damage after unexpected occurrences at their inputs. In physical systems, phenomena like overvoltages, high pressure, heat, etc., may harm those components which have not been designed to sustain such contingencies. In most existing MBD approaches the independence assumption is reflected in at least two ways: first, the applied focusing criteria rely on it; second, the resulting multiple-fault diagnoses do not indicate any dependencies between the failures. Many approaches compute the (subset-)minimal diagnoses or the minimal cardinality diagnoses. However, in case of dependent failures those focusing criteria may miss failed components. Even though we cannot expect to find all component failures, we should at least seek to determine all possible causes of a cascade of failures (i.e., the CoCs). Furthermore, the obtained results should state the dependencies between the failures, since this information is often essential for a successful recovery of the system.
2 Discussion: Dependent Failures
Figure 2 depicts a circuit with two inputs sc1 and sc2, which are either on or off. They control the state of the switches. The filaments of the bulbs are either ok or broken. The voltage magnitudes uxx are modelled in a qualitative way; in particular, ubi = high indicates that the bulb Bi is exposed to a voltage which exceeds the range it was designed for (e.g., > 230 V). The logical system description SD, which (as usual) captures the nominal behavior of components using the predicate AB, denoting "abnormal", is depicted in Fig. 1.
1 This research has been funded in part by the Austrian Science Fund (FWF) under grant P20199-N15.
2 Institute for Software Technology, Technische Universität Graz, Austria, email: {jweber,wotawa}@ist.tugraz.at
¬AB(V) → (uv = norm)
¬AB(R) ∧ (uv = x) → (us = x), x ∈ {zero, norm, high}
AB(R) → (us = low)
¬AB(Si) ∧ (sci = on) → (ubi = us)
¬AB(Si) ∧ (sci = off) → (ubi = zero)
¬AB(Bi) ↔ (fili = ok)   [fil ... "filament"]
(fili = ok) ∧ (ubi ≠ zero) ↔ (lighti = on)
(us = low) ∧ (ubi = low) → ⊥
(us = norm) ∧ (ubi = high) → ⊥

Figure 1. System Description (SD) for the system in Fig. 2
Figure 2. Circuit with a voltage source V , a resistor R, two switches S1 and S2 , and two bulbs B1 and B2 . The variables uxx denote voltages.
If every component works correctly and both system inputs sc1 and sc2 are on, then all voltages in the model have the value norm, and the two bulbs light. Now suppose that V fails in a way s.t. it produces a voltage significantly higher than expected, i.e., uv = high. Clearly, this will eventually destroy the bulbs, as the lifespan of a filament strongly decreases with higher voltage. Hence, a fault in V may be the cause of fili = broken, and consequently it may be the cause of AB(Bi). In such a case, V is the CoC, the cause of the cascade of failures. Note that, if the ultimate purpose of diagnosis is repair, then a bulb with a broken filament should always be regarded as abnormal. As usual, let SD be the logical system description, COMP the set of components, and OBS a set of observations [3]:

Definition 2.1 A diagnosis for (SD, COMP, OBS) is a set Δ ⊆ COMP s.t. SD ∪ OBS ∪ {AB(c) | c ∈ Δ} ∪ {¬AB(c) | c ∈ COMP \ Δ} is consistent. Δ is (subset-)minimal iff no proper subset of it is a diagnosis.

For OBS = {sc1 = sc2 = on, light1 = light2 = off, fil1 = fil2 = broken}, we obtain the minimal diagnosis Δ = {B1, B2}. If we attempt to repair the system by replacing both bulbs, the new bulbs will soon fail again, as the actual cause V remains faulty. Our approach generates the failure cascade hypothesis HV,Δ = {DF(V, B1), DF(V, B2)}, meaning that V is the CoC of the minimal diagnosis Δ (note that V ∉ Δ) and that B1 and B2 have failed in dependence of V; the DF predicate denotes "dependent failure".
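As an illustration of Definition 2.1, the following Python sketch enumerates the subset-minimal consistency-based diagnoses of the circuit by brute force. The encoding of SD as Python constraints is our own simplification (the bulb lights iff its filament is ok and its voltage is non-zero), not code from the paper.

```python
from itertools import product, combinations

COMPS = ["V", "R", "S1", "S2", "B1", "B2"]
VOLT = ["zero", "low", "norm", "high"]

def consistent(ab, obs):
    """Is SD ∪ OBS ∪ {AB(c) | c in ab} ∪ {¬AB(c) | c not in ab} satisfiable?
    Brute force over the voltages uv, us, ub1, ub2."""
    for uv, us, ub1, ub2 in product(VOLT, repeat=4):
        if "V" not in ab and uv != "norm":
            continue                                    # ¬AB(V) -> uv = norm
        if "R" not in ab and uv in ("zero", "norm", "high") and us != uv:
            continue                                    # ¬AB(R) ∧ uv = x -> us = x
        if "R" in ab and us != "low":
            continue                                    # AB(R) -> us = low
        ok = True
        for i, ub in ((1, ub1), (2, ub2)):
            sw, bulb = f"S{i}", f"B{i}"
            if sw not in ab and obs[f"sc{i}"] == "on" and ub != us:
                ok = False; break                       # ¬AB(S) ∧ on -> ub = us
            if sw not in ab and obs[f"sc{i}"] == "off" and ub != "zero":
                ok = False; break                       # ¬AB(S) ∧ off -> ub = zero
            fil = "ok" if bulb not in ab else "broken"  # ¬AB(B) <-> fil = ok
            light = "on" if fil == "ok" and ub != "zero" else "off"
            if obs.get(f"fil{i}", fil) != fil or obs.get(f"light{i}", light) != light:
                ok = False; break
            if (us == "low" and ub == "low") or (us == "norm" and ub == "high"):
                ok = False; break                       # the two ⊥ sentences
        if ok:
            return True
    return False

def minimal_diagnoses(obs):
    diags = []
    for k in range(len(COMPS) + 1):
        for delta in map(set, combinations(COMPS, k)):
            if not any(d <= delta for d in diags) and consistent(delta, obs):
                diags.append(delta)
    return diags

OBS = {"sc1": "on", "sc2": "on", "light1": "off", "light2": "off",
       "fil1": "broken", "fil2": "broken"}
print(minimal_diagnoses(OBS))   # -> [{'B1', 'B2'}]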
It can be seen that HV,Δ corresponds to a non-minimal diagnosis Δ′ = {V, B1, B2}, which is a superset of Δ. Moreover, HV,Δ also indicates the failure dependencies, i.e., the causal order of the cascade of failures. This is very important, as in many systems this causal order also influences the order in which the components must be repaired; here, V should be repaired before the bulbs are replaced. If we have full observability and OBS = {uv = high, fil1 = fil2 = broken, . . .}, we obtain {V, B1, B2} as the minimal diagnosis. Although this result comprises all components which have actually failed, it still does not indicate the failure dependencies. The discussions above show that, if knowledge about possible failure dependencies exists, the diagnosis can be improved by an approach which takes those dependencies into account and which is also able to provide results which state the causal order of failures. Moreover, it should be possible to logically refute those failure cascade hypotheses which are inconsistent with the observations.
3 Failure Cascade Hypotheses
We propose to capture the possible failure dependencies in a cascading failure graph (CFG), a model separate from the system description SD:

Definition 3.1 A system is a tuple (SD, CFG, COMP).

The intended usage of SD is as usual in MBD, and it may also specify faulty behavior [2]. CFG is a directed acyclic graph (DAG) whose nodes are conjunctions of literals. Each node contains at most one AB literal, and this literal must be positive. Each edge is labelled with an abstracted condition symbol α [1]. The CFG is a partial description of what happens in the course of a cascade of failures. It is a causal model whose edges represent MAY relationships, similar to those in [1]. The abstracted condition symbols abstract from the actual conditions, which may be very complex or even unknown. Figure 3(a) depicts a very abstract CFG for the circuit. Intuitively, the model in this figure indicates that AB(V) may cause AB(B1) and/or AB(B2). A refined model is depicted in Fig. 3(b).
Figure 3. Two cascading failure graphs (CFGs) for the system in Fig. 2: (a) a simple CFG; (b) a refined CFG.
A component ci,1 may directly cause the dependent failure of a component ci,k if there is a direct dependency path from ci,1 to ci,k:

Definition 3.2 A path Si,1 →(αi,1) Si,2 →(αi,2) . . . →(αi,k−1) Si,k in the CFG is a dependency path from ci,1 to ci,k (k > 1) iff Si,1 contains AB(ci,1) and Si,k contains AB(ci,k). Moreover, if no AB literal occurs in any node Si,j with 1 < j < k, then it is a direct dependency path.

Definition 3.3 For every pair (ci,1, ci,k) of components with a direct dependency path from ci,1 to ci,k there is a dependency assumption DF(ci,1, ci,k), denoting that the failure of ci,1 has directly led to the failure of ci,k. The set of all dependency assumptions is Φ.

We assume that there is at most one direct dependency path between two components. In our example we have: Φ = {DF(V, B1), DF(V, B2)}. The cascading failure model (CFM) captures the semantics of a CFG. It is automatically generated:

Definition 3.4 The cascading failure model (CFM) is a set of logical sentences. It is created as follows. For each edge Si →(α) Sj, add Si ∧ α → Sj to CFM. Moreover, for every pair of components (ci,1, ci,k) with a direct dependency path Si,1 →(αi,1) . . . →(αi,k−1) Si,k, add DF(ci,1, ci,k) → Si,1 ∧ αi,1 ∧ . . . ∧ αi,k−1 to CFM.

It follows that CFM ∪ {DF(ci,1, ci,k)} |= AB(ci,1) ∧ AB(ci,k). In our example, CFM contains the following sentences:

AB(V) ∧ α1 → (uv = high)
(uv = high) ∧ α2 → (ub1 = high)
...
DF(V, B1) → AB(V) ∧ α1 ∧ α2 ∧ α4
...

Definition 3.5 A failure cascade hypothesis3 H ∈ 2^Φ is a set of dependency assumptions s.t. the following holds: if DF(c′, c) ∈ H, then there is no other component c′′ with DF(c′′, c) ∈ H.

Definition 3.6 Given a hypothesis H, a component c is a cause of a cascade (CoC) in H iff there is a component c′ s.t. DF(c, c′) ∈ H and there is no component c′′ with DF(c′′, c) ∈ H.

E.g., in H = {DF(V, B1), DF(V, B2)} there is only one CoC, namely V. In general, a hypothesis may have multiple CoCs. We introduce the notation Γ(H) to denote the set Γ(H) = {c | DF(c, ·) ∈ H or DF(·, c) ∈ H}. E.g., for H = {DF(V, B1), DF(V, B2)} we obtain Γ(H) = {V, B1, B2}.

Definition 3.7 A hypothesis H is consistent iff SD ∪ CFM ∪ OBS ∪ H ∪ {¬AB(c) | c ∈ COMP \ Γ(H)} ⊭ ⊥

Proposition 3.1 If a hypothesis H is consistent, then Γ(H) is a diagnosis.

We propose to focus on hypotheses which have a single CoC; i.e., we assume that all multiple failures have a single cause:

Definition 3.8 A σ-hypothesis Hc,Δ, which relates to a component c and a (non-empty) minimal diagnosis Δ, is a hypothesis which has only one CoC, namely c, and Γ(Hc,Δ) ⊇ Δ.
We propose to compute only those minimal diagnoses which may have a single cause, to generate σ-hypotheses for these diagnoses, and to check the consistency of the hypotheses. Our strategy is to seek (at least) one consistent σ-hypothesis for each possible cause of a minimal diagnosis. The reason behind this strategy is the observation that finding the ultimate cause of a cascade of failures is, in many domains, crucial for a successful repair of the system.
REFERENCES
[1] Luca Console, Daniele Theseider Dupré, and Pietro Torasso, 'A theory of diagnosis for incomplete causal models', in Proc. IJCAI, pp. 1311–1317, Detroit, (August 1989). Morgan Kaufmann.
[2] J. de Kleer, A. K. Mackworth, and R. Reiter, 'Characterizing diagnoses and systems', Artificial Intelligence, 56(2–3), 197–222, (1992).
[3] Raymond Reiter, 'A theory of diagnosis from first principles', Artificial Intelligence, 32(1), 57–95, (1987).
[4] Jörg Weber and Franz Wotawa, 'Diagnosing dependent failures - an extension of consistency-based diagnosis', in 18th International Workshop on Principles of Diagnosis (DX-07), Nashville, USA, (2007).
[5] Jörg Weber and Franz Wotawa, 'Diagnosing dependent failures in the hardware and software of mobile autonomous robots', in Proceedings of IEA/AIE 2007, Kyoto, Japan, (June 2007).
3 For brevity we will often simply use the term "hypothesis".
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-803
Cost-sensitive Iterative Abductive Reasoning with Abstractions
Gianluca Torta1 and Daniele Theseider Dupré2 and Luca Anselma1

1 Introduction

Several explanation and interpretation tasks, such as diagnosis, plan recognition and image interpretation, can be formalized as abductive reasoning. A number of approaches, including recent ones [1, 4], address the problem based on a task-independent representation of a domain which includes an ontology or taxonomy of hypotheses. In this paper we adopt a similar representation, but we also deal with abduction as an iterative process where, like in model-based diagnosis, further observations are proposed to discriminate among candidate explanations; in addition, we take into account costs of observations and actions. In fact, discrimination also involves refining hypotheses, but this is performed down to an appropriate level which depends on the cost of actions (e.g. repair actions or therapy) to be taken based on the results of abduction, and on the cost of additional observations, which should be balanced with the benefits, in terms of more suitable actions, of better discrimination. The presence of a domain representation with abstractions has a significant impact on this trade-off. In general, a better assessment of the situation at hand, based on additional observations, leads to a more focused action. However, the cost of observing the same phenomenon at different levels of abstraction may vary significantly; in fact, it could involve more or less costly medical or technical tests, or computationally complex image processing, possibly with additional costs due to the delay before taking an action. Moreover, the knowledge base could have been designed independently of the explanation/action task (e.g. diagnosis and repair), and could therefore include a detailed description of the domain which is not necessary for the task; more generally, the convenience of a detailed discrimination may depend on the specific case at hand. By explicitly considering abstractions in the iterative abduction process, we can often reduce the observation costs significantly, while maintaining the ability to exploit detailed observations and knowledge when convenient (similar advantages have been shown in inductive classification with abstractions, e.g. [6]). In the following, we first describe the knowledge we expect to be available. We then describe a basic iterative abduction loop and, finally, we concentrate on the criterion for selecting the next step in the loop: either performing a next observation at some level of detail, or stopping because the estimated most convenient choice is performing the action(s) associated with the current hypotheses.
2 Domain Representation

The basic elements of the domain model are a set of abducibles (atomic assumptions) A = {A1, . . . , An} and a set of manifestations
1 Università di Torino, Italy, email: {torta,anselma}@di.unito.it
2 Università del Piemonte Orientale, Italy, email: dtd@mfn.unipmn.it
M = {M1, . . . , Mm}. Each abducible Ai is associated with an IS-A hierarchy Λ(Ai) containing abstract values of Ai as well as their refinements at multiple levels; similarly, each manifestation Mj is associated with an IS-A hierarchy Λ(Mj). We assume that the direct refinements v1, . . . , vq of a value V in a hierarchy (either Λ(Ai) or Λ(Mj)) are mutually exclusive, and at most one of the leaf values in a hierarchy is true in each situation, i.e. we allow at most one instance for each abducible and observation; moreover, for each leaf value v of an abducible an a-priori probability p(v) is given. The hypotheses space S(A) for the abduction task is the set of all of the combinations γ of values drawn from one or more distinct hierarchies Λ(Ai), while the manifestations space S(M) is the set of all of the combinations ω of values drawn from distinct hierarchies Λ(Mj). The relationships between the abducibles and the manifestations are defined by the domain knowledge K ⊆ S(A) × S(M). Given an instance of manifestations ω ∈ S(M) and an instance of abducibles γ ∈ S(A), (γ, ω) ∈ K means that ω is a possible observation set corresponding to the hypothesis set γ. We associate costs with values of both abducibles and manifestations. Let C ∈ Λ(Ai) be a value belonging to the IS-A hierarchy of Ai; its cost ac(C) is the cost of the action to be taken when Ai takes value C (e.g. a repair action if Ai represents a component and C denotes one of its fault modes). Let c1, . . . , cq be the children of C in Λ(Ai), i.e. the possible refinements of value C. We assume that:

max({ac(c1), . . . , ac(cq)}) ≤ ac(C) ≤ Σ_{k=1}^{q} ac(ck)
i.e. the action that we take for a value C of Ai costs no less than the most expensive action for its refinements and no more than taking the actions for all of such refinements. As for the manifestations, let O ∈ Λ(Mj) be a value belonging to the IS-A hierarchy of Mj; its cost oc(O) is the cost of making the observation which refines value O into one of its children o1, . . . , oq in Λ(Mj). We can associate an action cost also with any instance γ = {C1, . . . , Cr} ∈ S(A) of abducibles simply as ac(γ) = Σ_{i=1}^{r} ac(Ci), i.e. we assume that independent actions are taken for each of the abducible values that appear in γ. With a slightly more complex computation we can also associate an action cost with a set of instances Γ = {γ1, . . . , γs} representing the cumulative action cost if Γ is the final set of explanations. For each abducible Ai s.t. (a value of) Ai appears at least in one γ ∈ Γ, we compute a new hierarchy Λ(Ai, Γ) by considering the portion of Λ(Ai) up to the least upper bound LUB(Ai, Γ) that covers all of the values of Ai that appear in Γ and by further removing from such a sub-tree all of the values that do not appear in Γ. In this way, it may happen that the cost ac(C) of a value C ∈ Λ(Ai, Γ) is larger than the sum of the costs ac(ck) of its children,
since not all of the children of C defined in Λ(Ai) need to appear in Λ(Ai, Γ). We therefore update (with a bottom-up computation) the ac costs in Λ(Ai, Γ) to new costs ac* in order to reestablish this property. The action cost of Γ is then computed just as:

ac(Γ) = Σ_{Ai ∈ Γ} ac*(LUB(Ai, Γ))
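The bottom-up cost update can be sketched as follows; hierarchies are plain children maps, and the caller is assumed to supply, for each abducible, the LUB and the node set of Λ(Ai, Γ) (all names are illustrative).

```python
# Hierarchies as children maps plus action costs; all names are illustrative.
class Hierarchy:
    def __init__(self, children, ac):
        self.children = children   # node -> list of child nodes
        self.ac = ac               # node -> action cost

def ac_star(h, node, keep):
    """Cost of `node` in the Γ-restricted hierarchy Λ(Ai, Γ): cap ac(node)
    by the sum of the updated costs of its kept children, re-establishing
    bottom-up the property ac(C) <= Σ_k ac(c_k)."""
    kids = [c for c in h.children.get(node, []) if c in keep]
    if not kids:
        return h.ac[node]
    return min(h.ac[node], sum(ac_star(h, c, keep) for c in kids))

def ac_gamma(restricted):
    """ac(Γ) = Σ over abducibles of ac*(LUB(Ai, Γ)); `restricted` lists,
    per abducible, the hierarchy, its LUB and the node set of Λ(Ai, Γ)."""
    return sum(ac_star(h, lub, keep) for h, lub, keep in restricted)

# If Γ mentions only c1 and c2 (c3 was pruned), ac(C) = 5 exceeds 2 + 2 and
# is capped to 4 by the bottom-up update.
h = Hierarchy({"C": ["c1", "c2", "c3"]}, {"C": 5, "c1": 2, "c2": 2, "c3": 2})
print(ac_gamma([(h, "C", {"C", "c1", "c2"})]))   # -> 4
```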
3 Iterative Abduction

We rely on the following generic loop for iterative explanation. Input is a set of values ωI = {O1, . . . , Ot} representing the initial observations, i.e. the values of a set of manifestations {M1, . . . , Mt} ⊆ M. Generate a set Γ of candidates (i.e. explanations of ωI).

loop
  O := NextStep(Γ);
  if O = STOP then exit
  else perform observation to refine O into one of its children ok;
       Γ := Update(Γ, ok)
end

That is, we assume that one or more initial observations are given; that there is a way to generate candidate explanations based on them (see below), and to update candidates based on additional observations; and we proceed with selecting and performing one observation at a time, which, of course, is in general suboptimal, as in [3, 2]. In this paper we aim at providing a general approach to the selection of the next step; we do not provide a general approach to candidate generation and update, which could involve a mix of abduction and consistency reasoning; its formulation would depend on the way K is represented. With hierarchies of abducibles, moreover, abstract as well as detailed assumptions may take part in explanations; a general criterion which is suitable in this setting is the preference for least presumptive explanations [5], which generalize minimal (wrt set inclusion) explanations: an explanation that (also based on the IS-A hierarchy) implies another explanation is not least presumptive. In the following we assume that the candidates computed at each iteration represent the least presumptive explanations of the observations collected so far.
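A minimal Python rendering of this loop, with candidate generation, next-step selection, update and observation supplied as callbacks (the callback API is an assumption of this sketch):

```python
def iterative_abduction(initial_obs, generate, next_step, update, refine):
    """Generic iterative-explanation loop; `generate`, `next_step`, `update`
    and `refine` are callbacks supplied by the concrete abduction engine."""
    candidates = generate(initial_obs)       # explanations of the initial obs
    while True:
        o = next_step(candidates)            # an observation to refine, or "STOP"
        if o == "STOP":
            return candidates                # act on the current hypotheses
        ok = refine(o)                       # perform it: o refined into child ok
        candidates = update(candidates, ok)
```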
4 Choosing the Next Step

Let Γ be the current candidate set and let OBS be the set of possible observations (including refinements of previous observations). In order to decide whether to stop or to proceed with a new observation O ∈ OBS, we select the minimum among:

• the action cost ac(Γ) associated with Γ
• for each O ∈ OBS, the estimated cost c(O), which is the sum of the cost oc(O) of observing O and the expected cost of the candidate set after observing O, i.e.:

c(O) = oc(O) + Σ_{k=1}^{q} p(ok) · c(Γk)

where Γ1, . . . , Γq are the possible candidate sets that would result by observing O and getting values o1, . . . , oq respectively, p(ok) is the probability of getting value ok (computed based on the current candidates Γ as in [3, 2]) and c(Γk) is the estimated cost of Γk as detailed in the following.

If ac(Γ) is the minimum among the costs, we stop; otherwise we observe the O with the smallest c(O). Let Γk = {γ1, . . . , γs} be one of the candidate sets involved in the above formula (note that each candidate γi may contain ground as well as abstract causes) and ac(Γk) be its action cost, i.e. the cost of stopping at Γk, which must be compared with the estimated cost of acting after a further discrimination and refinement. In principle, this estimation step would require simulating all the possible observation sequences and outcomes and, for each of them, assessing the point where it is convenient, on average, to stop and perform the actions; in order to avoid such an intractable search, we assume that the abductive process will continue as follows: first, one of the γi ∈ Γk will be isolated; then, γi is refined level by level, up to a point where performing an action is estimated to be convenient. Therefore the estimated cost of Γk is:

c(Γk) = min(ac(Γk), ic(Γk) + rac(Γk))

where ic(Γk) is the estimated cost of isolating a single γi ∈ Γk and rac(Γk) is the estimated additional refinement and action cost once some γi has been isolated. In this proposal, we estimate the cost ic(Γk) as follows:

ic(Γk) = Σ_{i=1}^{s} −p(γi) · log(p(γi)) · oc(γi)
where −log(p(γi)) is the estimated number of observations needed for isolating γi [3] and oc(γi) is an estimate of the cost of a single observation3. The cost rac(Γk) of refining its members γi = {Ci,1, . . . , Ci,ri} until an action is taken is estimated by:

rac(Γk) = Σ_{i=1}^{s} p(γi) · Σ_{j=1}^{ri} c(Ci,j)
where c(Ci,j ) is the estimated cost associated with Ci,j . In case action costs do not depend on the current context, each cost c(Ci,j ) can be pre-computed offline with a bottom-up visit of the taxonomies of the causes. In this proposal we have adopted a formula similar to the one for c(Γk ), i.e.: c(Ci,j ) = min(ac(Ci,j ), ic(Ci,j ) + rac(Ci,j )) where ic(Ci,j ) is the estimated cost of isolating a single child of Ci,j in the hierarchy and rac(Ci,j ) is the estimated additional refinement and action cost once some child of Ci,j has been isolated4 .
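Under the stated context-independence assumption, the bottom-up pre-computation of c(Ci,j) can be sketched as a simple recursion; the concrete instantiation of ic and rac below mirrors the Γk formulas (with p taken as the probability of a child given its parent) and is our reading, not a formula stated in the paper.

```python
import math

def c_cost(node, children, ac, oc, p):
    """Offline bottom-up estimate c(C) = min(ac(C), ic(C) + rac(C)).
    `p[k]` is the probability of child k given its parent (normalized over
    siblings); the ic/rac instantiation mirrors the Γk formulas."""
    kids = children.get(node, [])
    if not kids:                # leaf: c(C) is just the action cost (footnote 4)
        return ac[node]
    ic = sum(-p[k] * math.log(p[k]) * oc[node] for k in kids if p[k] > 0)
    rac = sum(p[k] * c_cost(k, children, ac, oc, p) for k in kids)
    return min(ac[node], ic + rac)

# Refining "fault" (entropy-weighted observation cost plus expected refined
# action cost, about 4.19) beats acting abstractly at cost 10.
children = {"fault": ["electrical", "mechanical"]}
ac = {"fault": 10.0, "electrical": 3.0, "mechanical": 4.0}
oc = {"fault": 1.0}
p = {"electrical": 0.5, "mechanical": 0.5}
print(c_cost("fault", children, ac, oc, p))
```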
REFERENCES
[1] Ph. Besnard, M.-O. Cordier, and Y. Moinard, 'Ontology-based inference for causal explanation', in Knowledge Science, Engineering and Management, 2nd Int. Conf., LNCS 4798, pp. 153–164, (2007).
[2] L. Console, D. Theseider Dupré, and P. Torasso, 'Introducing test theory into abductive diagnosis', in Proc. 10th Int. Work. on Expert Systems and Their Applications, pp. 111–124, Avignon, (1990).
[3] J. de Kleer and B.C. Williams, 'Diagnosing multiple faults', Artificial Intelligence, 32(1), 97–130, (1987).
[4] B. Neumann and R. Möller, 'On scene interpretation with description logics', in Cognitive Vision Systems, 247–275, Springer, (2006).
[5] D. Poole, 'Explanation, prediction: an architecture for default, abductive reasoning', Computational Intelligence, 5, 97–110, (1989).
[6] J. Zhang, A. Silvescu, and V. Honavar, 'Ontology-driven induction of decision trees at multiple levels of abstraction', LNCS, 2371, 316–323, (2002).
3 We have defined oc as a function of γi to possibly take into account the level of detail of observations related with γi.
4 Note that when Ci,j is a leaf of the hierarchy, c(Ci,j) is the action cost ac(Ci,j).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-805
Computation of Minimal Sensor Sets for Conditional Testability Requirements
Gianluca Torta1 and Pietro Torasso1

1 Introduction

The problem of computing a minimal set of sensors (MSS) that guarantees a desired level of diagnosability of a given system is well known in Model-Based Diagnosis (e.g. [4], [3]). Unfortunately, for many real-world systems, guaranteeing the testability of a given fault in every situation is impossible, no matter how many sensors we place and how many test vectors we apply for identifying the fault. This impossibility is mainly the consequence of masking effects induced by the presence of other faults (e.g. we can't tell whether a bulb is working properly if the power is down). While it is possible to partially address this problem by putting restrictions on the number of faults (in particular, the single-fault assumption as in [3]), this unnecessarily limits the applicability of MSS computation. To overcome these limitations, this paper introduces conditional testability, which requires the testability of a fault to be guaranteed only under some conditions given by the user. In order to be useful, such conditions must be easy to detect, so that during the on-line testing phase it is always known whether the testability guarantee applies or not. Therefore, the conditions are expressed in terms of endogenous variables: this makes it possible to express testability conditions for a fault of component c directly on the endogenous variables local to c (making the specification task significantly easier). Moreover, if we assume that endogenous variables are potentially observable, it is always possible to find at least one Sensor Set (SS) that guarantees detectability of the conditions. In this paper we sketch how to compute MSSs for conditional testability starting from discriminability relations that are parsimoniously encoded using a symbolic representation. Such relations are built from an extended model of the system to be diagnosed which includes a set of switches modeling the inclusion/exclusion of observable variables into the set of actual observations. Minimization of the SSs can be further constrained either by positive information (e.g. we already have some sensors) or negative information (e.g. some endogenous variables can't be sensorized); despite this flexibility, the minimization can be done in linear time w.r.t. the size of the symbolic representation of the set of SSs satisfying the given conditional testability requirements.
2 Conditional Testability and Minimal Sensor Sets

In this section we formally characterize conditional testability and Minimal Sensor Sets, starting with the definition of the system model.

Definition 1 A System Description is a pair SD = (SV, DT) where:

- SV is the set of discrete system variables, partitioned into I (inputs and commands), C (components) and E (endogenous variables). We will denote by dom(v) the finite domain of v ∈ SV; in particular, for each c ∈ C, dom(c) consists of the list of possible behavioral modes for c (an ok mode and one or more fault modes)
- DT (Domain Theory) is a relation over the variables in SV2

The notion of testability we consider in this paper assumes that the system can be tested under different contextual conditions in order to find the set of consistency-based diagnoses (see e.g. [1], [3]). In general we are interested in verifying whether a mode m of a component c is testable and what degree of observability guarantees such testability independently of the status of the other components. Unfortunately, this intuitive notion may be too strong because the status of the other components can mask the effects of the presence of c(m); for this reason we introduce a weaker notion of testability which is required to hold only on a restriction of the possible system behaviors. This weaker notion is useful only when we have the guarantee of recognizing (through suitable observations) whether the behavior of the system to be diagnosed complies with the restriction or not, so that it is always possible to tell whether the conditional testability of c(m) applies. First, we introduce the notion of discriminability between two hypotheses given a specific degree of observability3.

Definition 2 A hypothesis H is any relation obtained by constraining DT through the application of the relational algebra operators σ and ⋈. We say that two hypotheses Hi, Hj are discriminable w.r.t. observability O ⊆ E iff ΠO(Hi) ∩ ΠO(Hj) = ∅.

In the above definition Hi and Hj are any two restrictions of the system model involving the status of the system or the values of E variables, or a combination of both. The possible values of the observable variables O under hypotheses Hi and Hj must be disjoint. We are now ready to formalize the notion of testability of a behavioral mode m of component c, given a specific level of observability, when the global behavior of the system satisfies certain conditions.

Definition 3 Let c ∈ C, m ∈ dom(c), ϕE be a formula over E variables and SE = {C : DT ∧ C ∧ ϕE ⊭ ⊥}, where C denotes an instance of the C variables. We say that c(m) is conditionally testable under conditions ϕE w.r.t. observability O ⊆ E iff:

- there exists an instance X of I s.t. hypotheses Hi = (DT ⋈ X ⋈ SE) and Hj = (DT ⋈ X ⋈ S̄E) are discriminable w.r.t. O
1 Università di Torino, Italy, email: {torta,torasso}@di.unito.it
2 In component-based systems, relation DT is obtained by joining a set of relations DT1, . . ., DTn, each one modeling the behavior of a component.
3 In the following, we use Π, σ and ⋈ to denote the project, select and join operations defined in the relational algebra.
- for each X, hypotheses Hi = σϕE(DT ⋈ X ⋈ SE) and Hj = σ¬ϕE(DT ⋈ X ⋈ SE) are discriminable w.r.t. O
- for each X, hypotheses Hi = σϕE∧c(m)(DT ⋈ X ⋈ SE) and Hj = σϕE∧¬c(m)(DT ⋈ X ⋈ SE) are discriminable w.r.t. O

Set SE represents all of the possible assignments to component variables C (i.e. system states) consistent with ϕE. The first discriminability condition requires that we can find an instance X of the inputs such that, given observability O, it is possible to tell whether the status of the system is in SE or in S̄E. Since ϕE never holds in states C ∈ S̄E, only states in SE must be further considered. The second and third conditions are strongly related. Indeed, even if the system status is in SE, it is possible that ϕE does not hold given the current input vector X; the second condition requires that, provided the system status is in SE, observability O allows us to detect whether ϕE holds or not, regardless of the current input X. Finally, if ϕE holds it must be possible (third condition) to tell which of c(m) and ¬c(m) holds, i.e. c(m) is discriminable from ¬c(m). Conditional testability represents the formal basis for defining the notion of Minimal Sensor Set.

Definition 4 A conditional testability requirement λ is a pair (c(m), ϕE) where c ∈ C, mode m ∈ dom(c) and ϕE is a formula over E variables.

Definition 5 Given SD = (SV, DT), an observability O and λ = (c(m), ϕE), we say that O satisfies λ if c(m) is conditionally testable under ϕE w.r.t. O. A Minimal Sensor Set is an observability O* satisfying λ such that no other O′ with |O′| < |O*| satisfies λ.

From the definition above it is apparent that the preference criterion chosen for selecting MSSs is based on minimum cardinality. The definition of MSS can be straightforwardly extended to apply to a set Λ = {λ1, . . . , λm} of conditional testability requirements.
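Definition 2 reduces discriminability to a projection test, which is easy to render with relations as sets of variable assignments (a plain-Python stand-in for the relational-algebra machinery; variable names are illustrative):

```python
def project(rel, O):
    """Π_O: project a relation, given as a set of rows (dicts), onto O."""
    return {tuple(sorted((v, row[v]) for v in O)) for row in rel}

def discriminable(Hi, Hj, O):
    """Definition 2: Hi, Hj are discriminable w.r.t. O iff
    Π_O(Hi) ∩ Π_O(Hj) = ∅."""
    return not (project(Hi, O) & project(Hj, O))

Hi = [{"e1": 0, "e2": 1}, {"e1": 1, "e2": 1}]
Hj = [{"e1": 0, "e2": 0}]
print(discriminable(Hi, Hj, {"e2"}))   # True: observing e2 separates Hi from Hj
print(discriminable(Hi, Hj, {"e1"}))   # False: e1 = 0 is possible under both
```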
3 Computing the Sensor Sets

The computation of MSSs involves a number of steps. The starting point is represented by the System Description SD and a set of user-provided conditional testability requirements Λ = {λ1, . . . , λm}. Given a requirement λ = (c(m), ϕE), the system computes the set SSλ of all the Sensor Sets that satisfy λ. In particular, according to Definition 5, SSλ includes all the observabilities that make c(m) conditionally testable under ϕE; note that SSλ is empty only when ϕE is too weak and c(m) is not testable even under full observability. The computation of SSλ requires a way for easily representing and manipulating degrees of observability. To this end, we extend SD by adding a set of observation switches which model the inclusion/exclusion of potentially observable variables into the set of actual observations and by adding formulas relating the influence of the switches on the observable variables as described in [3]. Once the sets SSi, i = 1, . . . , m have been computed (one for each conditional testability requirement), the intersection of SSi, i = 1, . . . , m yields a set SS containing all the Sensor Sets (observabilities) that simultaneously satisfy the requirements in Λ. Note that SS is ∅ only if at least one of the SSi is ∅; indeed, if each SSi ≠ ∅, then SS contains at least the sensor set corresponding to full observability.

Minimizing Sensor Sets. At this point, the process executes a minimization step in order to build a set MSS containing all of the Minimal Sensor Sets in SS. During the minimization step, the user has the possibility of specifying a set of constraints Ω on the sensors. In particular, the user may want to constrain some observables o to be
available (e.g. because they are already sensorized) or to be excluded (e.g. because it is impossible to measure o with a sensor). In principle, the minimization step could be very expensive from a computational point of view, but the efficient approach developed for the computation of minimum cardinality diagnoses in [2] can be adopted for the computation of MSSs. The basic idea is to precompute a set of filters CSSi, where CSSi represents all of the possible observabilities involving exactly i observable variables as assignments to the switch variables. In order to compute MSS we intersect SS with the filters CSSi, starting from CSS0 (the lowest level of observability) and stopping as soon as the intersection is not empty.

On-line Use of MSS for Testing. Let us consider the diagnosis of the system after it has been sensorized according to one of the Minimum Sensor Sets in MSS; the possibility of discriminating among different diagnoses depends on the set of requirements used for computing MSS. In particular, according to the first condition of Definition 3, for each λ = (c(m), ϕE) it is possible to apply a single input vector4 in order to figure out whether the actual status C of the system is inconsistent with conditions ϕE; therefore, we have a cheap way of determining which are the modes of each component which are guaranteed to be discriminable with further tests. If it turns out that C is consistent with ϕE, we perform additional tests until we apply an input vector XC that, together with C, induces a behavior that satisfies ϕE. Thanks to the properties of the MSS computed by our approach, we have the guarantee that such an XC exists and that, after we update the set of candidate diagnoses with the readings of the sensors induced by XC, all of the candidate diagnoses either agree on c(m) or on ¬c(m).

Symbolic Implementation. Since the size of the relations involved in the computation of MSS can in general be huge, we have adopted OBDDs (Ordered Binary Decision Diagrams) for encoding and manipulating all of such relations (including the Domain Theory DT). For the minimization step, we also have a theoretical result which guarantees that the potentially very expensive task of computing MSS can be done in linear time with respect to the size of the OBDD OSS encoding the set SS.

Property 1 Let OSS be an OBDD encoding the Sensor Sets for the set of requirements Λ; then, the OBDD OMSS encoding the Minimal Sensor Sets can be computed in time O(|E|^3 · |OSS|).

The OBDD implementation has proved to be effective when applied to the model of a hydraulic system involving 4 commands, 10 components, and 40 endogenous multi-valued variables (similar to the one in [3]). The total CPU time taken by the computation of MSS is very small when we require that three behavioural modes of the 5 pipes are conditionally testable: the total CPU time is about 900 msec on a PC with a CPU at 1.4 GHz and 512 MB RAM. In particular, the time taken by the minimization step is almost negligible.
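The filter-based minimization step described above can be sketched with explicit sets standing in for the OBDD encodings (so this ignores the symbolic representation that makes the real computation feasible):

```python
from itertools import combinations

def minimal_sensor_sets(ss, observables):
    """Intersect SS with the cardinality filters CSS_0, CSS_1, ... and stop
    at the first non-empty intersection; the result is exactly the MSSs."""
    for i in range(len(observables) + 1):
        css_i = {frozenset(c) for c in combinations(observables, i)}
        mss = ss & css_i
        if mss:
            return mss
    return set()

E = ["e1", "e2", "e3"]
SS = {frozenset(s) for s in [("e1",), ("e1", "e2"), ("e2", "e3")]}
print(minimal_sensor_sets(SS, E))   # -> {frozenset({'e1'})}
```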
REFERENCES
[1] M. Esser and P. Struss, 'Fault-model-based test generation for embedded software', in Proc. IJCAI, pp. 342–347, (2007).
[2] P. Torasso and G. Torta, 'Model-based diagnosis through obdd compilation: a complexity analysis', LNCS, 4155, 287–305, (2006).
[3] G. Torta and P. Torasso, 'Computation of minimal sensor sets from precompiled discriminability relations', in Proc. DX, pp. 202–209, (2007).
[4] L. Travé-Massuyès, T. Escobet, and X. Olive, 'Diagnosability analysis based on component-supported analytical redundancy relations', IEEE Trans. on Systems, Man and Cybernetics A, 36(6), 1146–1160, (2006).
4 Such an input vector X is computed during the off-line check of the discriminability condition and it can be saved and associated with λ.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-807
Combining Abduction with Conflict-based Diagnosis
Ildikó Flesch1 and Peter J.F. Lucas2

Abstract. Conflict-based diagnosis is a recently proposed probabilistic method for model-based diagnosis, inspired by consistency-based diagnosis, that uses a measure of data conflict, called the diagnostic conflict measure, to rank diagnoses. In this paper, this method is refined using an abductive method that reuses part of the computation of the diagnostic conflict measure.
1 INTRODUCTION

Conflict-based diagnosis is a recently proposed probabilistic method for model-based diagnosis that is inspired by consistency-based diagnosis, and uses a measure of data conflict, called the diagnostic conflict measure, to rank diagnoses. The probabilistic information that is required to compute the diagnostic conflict measure is represented by means of a Bayesian network. This Bayesian network contains sufficient information to compute abductive diagnoses as well. In this paper, conflict-based diagnosis is augmented with an abductive method, similar in spirit to the probabilistic method employed by GDE [2]. The method reuses part of the computation of the diagnostic conflict measure. In essence, abductive diagnosis is used to rank conflict-based diagnoses with equal conflict-based rankings.
2 PRELIMINARIES

2.1 Model-based Diagnosis

In model-based diagnosis, the structure and behaviour of a system is represented by a logical diagnostic system SL = (SD, COMPS), where (i) SD denotes the system description, which is a finite set of logical formulae, specifying structure and behaviour, and (ii) COMPS is a finite set of constants, corresponding to the components of the system; these components can be faulty. The system description consists of behaviour descriptions specifying normal and abnormal (faulty) functionalities of the components, and of connections of inputs and outputs of components. A logical diagnostic problem is defined as a pair PL = (SL, OBS), where SL is a logical diagnostic system and OBS is a finite set of logical formulae, representing observations. Two types of model-based diagnosis are distinguished: (i) consistency-based diagnosis [2, 6], and (ii) abductive diagnosis [1]. Let ΔC consist of the assignment of abnormal behaviour to the set of components C ⊆ COMPS and normal behaviour to the remaining components COMPS − C; then, adopting the definition from [3], ΔC is a consistency-based diagnosis of the logical diagnostic problem PL iff the observations are consistent with both the system description and the diagnosis; formally: SD ∪ ΔC ∪ OBS ⊭ ⊥.
1 Department of Computer Science, Maastricht University, email: ildiko@micc.unimaas.nl
2 Institute for Computing and Information Sciences, Radboud University Nijmegen, email: peterl@cs.ru.nl
Figure 1. The graphical representation of a Bayesian diagnostic system corresponding to the full-adder in [6] (with input vertices I1–I3, abnormality vertices AbX1, AbX2, AbA1, AbA2, AbR1, and output vertices OX1, OX2, OA1, OA2, OR1).
In the abductive approach, the behavioural assumptions ΔC are called an abductive diagnosis if the system description SD and the behavioural assumptions ΔC imply the set of observations OBS; formally: SD ∪ ΔC ⊨ OBS.
2.2 Bayesian Diagnostic Problems

Let P(X) be a joint probability distribution of the set of discrete binary random variables X, where, for a single variable, x and x̄ denote the values 'true' and 'false', respectively. A Bayesian network B is then defined as a pair B = (G, P), where the acyclic directed graph G = (V, E) represents the relations between the random variables defined in P(X), where each random variable corresponds to a unique vertex. A Bayesian diagnostic system is denoted by SB = (G, P), where P is a joint probability distribution of the vertices of G, interpreted as random variables, and G is obtained by mapping a logical diagnostic system SL to a Bayesian diagnostic system as follows: (i) component c is represented by its input Ic and output Oc, where each arc points from input to output, (ii) to each component c there belongs an abnormality vertex Abc. An example is given in Figure 1. Let the set of values of the abnormality variables Abc, with c ∈ COMPS, be denoted by δC = {abc | c ∈ C} ∪ {ab̄c | c ∈ COMPS − C}, which establishes a link to ΔC in logical diagnostic systems. In this paper, the set of observed input and output variables are referred to as Iω and Oω, whereas the unobserved input and output variables will be referred to as Iu and Ou respectively. Let iω denote the values of the observed inputs, and oω the observed output values.
The set of observations is then denoted as ω = iω ∪ oω . The following assumptions are used in the remainder of this paper: (i) the probabilistic behaviour of a component that is faulty is independent of its inputs, and (ii) normal components behave deterministically. These are realistic assumptions, as it is unlikely that detailed functional behaviour is known for a component that is faulty, whereas when the component is not faulty, it is certain it behaves as intended. A Bayesian diagnostic problem, denoted by PB = (SB , ω), consists of (i) a Bayesian diagnostic system and (ii) a set of observations ω [5, 4].
2.3 Conflict-based Diagnosis

The theory of conflict-based diagnosis uses the diagnostic conflict measure to solve Bayesian diagnostic problems [4], where a numeric value is assigned to each diagnosis to order them. Define ω = iω ∪ oω as the observations; then the diagnostic conflict measure (DCM), denoted by conf_δC(ω), is defined as

conf_δC(ω) = log [ P(iω | δC) P(oω | δC) / P(iω, oω | δC) ].   (1)
Using the independence properties of Bayesian diagnostic problems we obtain:

conf_δC(ω) = log [ Σ_i P(i) Σ_{ou} Π_c P(Oc | π(Oc)) / ( Σ_{iu} P(iu) Σ_{ou} Π_c P(Oc | π(Oc)) ) ].   (2)

Intuitively, if the probability of the individual occurrence of the observations is smaller than that of the joint occurrence (if the numerator is smaller than the denominator), then the observations do 'like' or support each other. Thus, a smaller value of the DCM indicates a better fit between observations and component behaviours. Therefore, the DCM imposes an ordering on diagnoses, where the lower the DCM for a diagnosis is, the better the diagnosis fits the diagnostic problem. A diagnosis is a conflict-based diagnosis if its DCM is non-positive, and it is also called minimal if it has the least DCM value in comparison to the other conflict-based diagnoses.
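A brute-force rendering of Equation (1) for a toy one-component system (a buffer with one input and one output; the numbers and the fault model are illustrative assumptions):

```python
import math
from itertools import product

# Toy system: one buffer with input I and output O; when not abnormal it
# copies its input, when abnormal its output is an unbiased coin flip.
p_i = {True: 0.5, False: 0.5}                    # prior on the input
def p_o(o, i, ab):                               # P(O | I, Ab)
    return (1.0 if o == i else 0.0) if not ab else 0.5

def dcm(i_obs, o_obs, ab):
    """Equation (1): conf_δ(ω) = log [ P(iω|δ) P(oω|δ) / P(iω, oω|δ) ]."""
    def prob(fix_i=None, fix_o=None):
        return sum(p_i[i] * p_o(o, i, ab)
                   for i, o in product([True, False], repeat=2)
                   if (fix_i is None or i == fix_i)
                   and (fix_o is None or o == fix_o))
    return math.log(prob(fix_i=i_obs) * prob(fix_o=o_obs) / prob(i_obs, o_obs))

print(dcm(True, True, ab=False))  # log 0.5 < 0: the observations support each other
print(dcm(True, True, ab=True))   # 0: under the fault model they are unrelated
```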
3 ABDUCTIVE CONFLICT-BASED DIAGNOSIS

In the ranking obtained by conflict-based diagnosis there may be cases where the diagnoses have the same DCM. This has motivated us to develop a method which offers a way to distinguish such diagnoses. This method makes use of abductive computations, for which parts of the computation of the DCM are reused.
3.1 The Relation between Abductive and Consistency-based Reasoning

In our probabilistic setting, the consistency condition requires that the probability of the occurrence of the observations given the diagnosis is non-zero. Formally, in consistency-based reasoning, we are searching for diagnoses δC with P(iω, oω | δC) > 0. Note that the set of abnormality assumptions δC is given knowledge. In abductive reasoning, on the other hand, the observations have to be implied by the system description and the abnormality assumptions δC. This means that we are looking for abnormality assumptions δC that explain the observations; formally: P(δC | iω, oω). Using Bayes' rule the following relationship between consistency-based and abductive reasoning can be established:

P(δC | iω, oω) = P(iω, oω | δC) P(δC) / P(iω, oω),   (3)
where 1/P(iω, oω) is a normalisation constant. The maximum a-posteriori assignment (MAP) diagnosis, defined as δ*C = argmax_δC P(δC | iω, oω), is the natural probabilistic analogue of the concept of subset-minimal abductive diagnosis [7]. According to Equation (3), computation of abductive diagnoses requires the computation of consistency-based diagnoses.
3.2 Abductive Probabilistic Computations

Next, a formula to compute abductive diagnoses of Bayesian diagnostic problems is derived, which is used to distinguish between equally ranked conflict-based diagnoses. Note that the numerator P(iω, oω | δC) in Equation (3) is also the denominator of the DCM in equations (1) and (2); according to [4]:

P(iω, oω | δC) = P(iω) Σ_{iu} P(iu) Σ_{ou} Π_c P(Oc | π(Oc)).
In contrast to Equation (2), the factor P(iω) is not divided out. The denominator of the abductive formula is computed as:

P(iω, oω) = P(iω) Σ_{δc} P(δc) Σ_{iu} P(iu) Σ_{ou} Π_c P(oc | π(Oc)).
It is now possible to derive the abductive computational form:

P(δc | iω, oω) = P(iω, oω | δc) P(δc) / P(iω, oω)
= [ P(iω) Σ_{iu} P(iu) Σ_{ou} Π_c P(oc | π(Oc)) · P(δc) ] / [ P(iω) Σ_{δc} P(δc) Σ_{iu} P(iu) Σ_{ou} Π_c P(oc | π(Oc)) ]
= [ P(δc) Σ_{iu} P(iu) Σ_{ou} Π_c P(oc | π(Oc)) ] / [ Σ_{δc} P(δc) Σ_{iu} P(iu) Σ_{ou} Π_c P(oc | π(Oc)) ].   (4)
At first sight, it seems computationally infeasible to compute P(δc | iω, oω) in this manner. However, the computation can be simplified, as P(δC | ω) is only used to rank diagnoses and thus the denominator need not be used, as it is the same for all diagnoses; only the numerator has to be computed. The computation of the numerator is easy, since the second term Σ_{iu} P(iu) · · · is already computed as part of the denominator of the DCM (see Equation (2)). Only the probability P(δc) needs to be computed, which is a product of the individual probabilities for (ab)normal behaviours of the components.
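The resulting tie-breaking step can be sketched as follows; `shared_term` stands for the cached factor Σ_{iu} P(iu) Σ_{ou} Π_c P(oc | π(Oc)) from the DCM computation, and all names and numbers are illustrative:

```python
def rank_equal_dcm(diagnoses, shared_term, p_ab):
    """Order diagnoses with equal DCM by the numerator of Equation (4):
    P(δ) times the cached `shared_term`, where P(δ) is the product of the
    per-component (ab)normality probabilities."""
    def p_delta(delta):
        prob = 1.0
        for comp, ab in delta.items():
            prob *= p_ab[comp] if ab else 1.0 - p_ab[comp]
        return prob
    score = lambda d: p_delta(d) * shared_term[frozenset(d.items())]
    return sorted(diagnoses, key=score, reverse=True)

# Two diagnoses whose DCMs tie; the abductive score prefers the one whose
# abnormality assumptions are a priori more probable.
p_ab = {"X1": 0.01, "A1": 0.05}
d1, d2 = {"X1": True, "A1": False}, {"X1": False, "A1": True}
shared = {frozenset(d1.items()): 0.2, frozenset(d2.items()): 0.2}
print(rank_equal_dcm([d1, d2], shared, p_ab))   # d2 ranked first
```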
4 CONCLUSIONS

In this paper a method was described to augment conflict-based diagnosis with probabilistic abductive diagnosis. The refinement of conflict-based diagnosis by abduction has the virtue that it reuses part of the computation required for finding conflict-based diagnoses.
REFERENCES
[1] L. Console and P. Torasso. A Spectrum of Logical Definitions of Model-based Diagnosis, Computational Intelligence, 7:133–141, 1991.
[2] J. de Kleer and B. C. Williams. Diagnosing multiple faults, AIJ, 32:97–130, 1987.
[3] J. de Kleer, A. K. Mackworth, and R. Reiter. Characterizing diagnoses and systems. AIJ, 56:197–222, 1992.
[4] I. Flesch, P.J.F. Lucas and Th. van der Weide. Conflict-based diagnosis: Adding uncertainty to model-based diagnosis. IJCAI, 380–388, 2007.
[5] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA, 1988.
[6] R. Reiter. A theory of diagnosis from first principles. AIJ, 32:57–95, 1987.
[7] S.E. Shimony and E. Charniak. A new algorithm for finding MAP assignments to belief networks. AIJ, Volume 6, pp. 185–193, 1991.
4. Cognitive Modeling and Interaction
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-811
An Activity Recognition Model for Alzheimer’s Patients: Extension of the COACH Task Guidance System B. Bouchard1 and P. Roy2 and A. Bouzouane1 and S. Giroux2 and A. Mihailidis3 Abstract. This paper presents a hybrid plan recognition model, based on probabilistic description logic, which addresses the issue of recognizing the activities and the errors of Alzheimer’s patients at an early stage of the disease. This model has been implemented to be a new extension of the COACH system, an emerging prototype of cognitive device for persons with Alzheimer’s disease that offers assistance in task completion. We present an initial experimentation done on this new recognition module for COACH, which is based on the results of two sets of clinical trials.
1 Introduction
For several years, the IATSL laboratory4 has been exploring the process by which cognitive assistance, inside a smart home, can be provided to an occupant suffering from Alzheimer's disease, in the performance of his Activities of Daily Living (ADL). This widespread form of dementia causes a progressive deterioration of thinking (cognitive impairment) and memory, leading to incoherent behavior and limiting the patient's capacity to perform his tasks of everyday life (washing his hands, cooking a meal, etc.) [4]. In this context, the research team led by Mihailidis [7] has developed COACH (Cognitive Orthosis for Assisting aCtivities in the Home), a prototype aiming to actively monitor an Alzheimer's patient attempting a specific task, for instance handwashing, and to offer assistance in the form of guidance (e.g., prompts or reminders) when it is most appropriate. A major limitation of the current COACH prototype is that it presumes the system already knows which activity is in progress, and thus supposes that there can be only one on-going task at a time. In this paper, we begin to address the problem of recognizing an on-going ADL from observed basic actions, which constitutes a key issue inherent in cognitive assistance [3]. The complexity of this recognition process is increased because a memory lapse can lead a patient to perform actions in the wrong order, to skip steps of his activity, or to perform actions that are not even related to his original goal. However, the person is not always making errors; he can simply temporarily stop the execution of a plan to begin another one in the middle of an activity. This way, the patient deviates from the activity originally planned by carrying out multiple interleaved plans. This raises a new recognition dilemma that must be taken into account.
1 Université du Québec à Chicoutimi, (QC) Canada, email: {Bruno.Bouchard, Abdenour.Bouzouane}@uqac.ca
2 Université de Sherbrooke, (QC) Canada, email: {Patrice.C.Roy, Sylvain.Giroux}@usherbrooke.ca
3 University of Toronto, (ON) Canada, email: Alex.Mihailidis@utoronto.ca
4 The IATSL lab is sponsored by the Alzheimer Society of Canada, the American Association of Alzheimer, Intel, the Natural Sciences and Engineering Research Council of Canada (NSERC), and several other partners.
Our contribution follows the traces of hybrid approaches to plan recognition, an emerging alternative that combines two different avenues of research, logical and probabilistic. We distinguish Geib's model [5], which is based on abductive probabilistic logic in order to deduce hypotheses quantified by probabilities, thereby explaining multiple-plan behavior while taking into account uncertainty due to the loss of observations. Another model, developed by Avrahami-Zilberbrand et al. [1], considers that plan recognition is the result of a probabilistic quantification on the hypotheses obtained from a symbolic algorithm. It exploits a decision-tree system based on the properties of the observed actions, similar to the one used in the learning program C4.5, in order to efficiently identify possible hypotheses for interpreting simultaneous plans. These two models suppose that the observed agent is coherent and, consequently, the proposed solutions cannot be applied to the recognition problem that we raised. On the other hand, the team of the Research Center on Intelligent Habitats (CRHI) of the University of Sherbrooke recently proposed a hybrid recognition model, based on lattice theory and a probabilistic action description logic [6, 2], allowing us to formalize the plausible incoherent intentions of the patient resulting from the symptoms of his cognitive impairment. This model corresponds quite well to our needs by focusing on the recognition of erroneous/interleaved activities of Alzheimer's patients.
2 Hybrid recognition model
A plan recognition process consists of interpreting the set of observed actions performed by an agent (patient) with the aim of predicting his future actions that explain his behavior. Let A = {a, b, . . .} be the set of actions that an observed agent is able to perform and let P = {α, β, . . .} be the set of known plans of the observer. Let O be the set of observations such that O = {o | ∃a ∈ A, a(o)}. The assertion a(o) means that observation o corresponds to an instance of action concept a. The set of possible plans that would explain the set of observations O, according to the agent's knowledge, is expressed by Pso = {α ∈ P | ∃(a, o) ∈ α × O, a(o)}. However, his intentions can go beyond the set of possible plans. In order to generate all of the agent's intentions, we enhance Pso by dynamically generating extra-plans (hypotheses) based on the composition operation α ⊕ β between each pair of incomparable possible plans (α, β) ∈ Pso. We define this enhanced set of plans Pho as the union of the composition pairs of possible plans. This set is an interpretation model for O if it forms a lattice structure < Pho, ≺p, Δ, ∇ > ordered by the subsumption relation of plans ≺p and each couple of plans admits an upper bound ∇, corresponding to their least common partial subsumer, which is minimally composed of the observed actions. Also, each couple of plans admits a lower bound Δ, corre-
812
B. Bouchard et al. / An Activity Recognition Model for Alzheimer’s Patients: Extension of the COACH Task Guidance System
sponding to a hypothesis schema (a plan containing action variables) obtained by disunifying the incomparable possible plans using the first-order logic disunification operation. The interest of this schema is to synthesize the predictions concerning future actions. A plan αΔβ, defined as a sequence of actions a1, . . . , x, . . . , an, denoted αΔβ(an ◦ an−1 ◦ · · · ◦ x ◦ · · · ◦ a1) where ◦ is a sequence operator and x is an action variable, is a hypothesis schema if and only if there exists a substitution σ(x) ∈ A+ such that each new extra-plan π(an ◦ · · · ◦ σ(x) ◦ · · · ◦ a1) satisfies the two following properties. The first one is the ⊕-stability of π, which means that each hypothesis plan π ∈ α ⊕ β is formed by: (i) a set of partial plans included in the knowledge base P of the observer, (ii) at least one action common to plan α and to plan β, and (iii) a composition of actions that are components of α or of β. The second criterion is the ⊕-closure property, which expresses that hypothesis π must admit an upper bound α∇β and a lower bound αΔβ. Hence, it must be included in the lattice bounded by αΔβ and α∇β. This algebraic space is not sufficient to disambiguate the relevant hypotheses. Therefore, the addition of a probabilistic quantification on the lattice structure is an interesting alternative. The symbolic recognition agent filters the hypotheses by passing only a bounded space to the probabilistic inference engine. Our proposal consists of characterizing through an interval of probabilities the relative influence (partial subsumption) of a plan on another one. Let α, β be two hypotheses interpreting the observed actions. The plan β partially subsumes the plan α with an interval of probabilities [pmin, pmax], if there exists a supremum plan α∇β such that α ≺p α∇β and β ≺p α∇β, where pmin = 1/Pmin(α|α∇β) · max(0, Pmin(α|α∇β) + Pmin(β|α∇β) − 1), and pmax = min(Pmax(β|α∇β)/Pmin(α|α∇β), 1). For instance, the term Pmin(α|α∇β) corresponds to the minimal conditional probability of observing the realization of a particular plan α, knowing that the sequence of actions α∇β has been observed. This estimation is based on a database given as input, composed of samples of observation frequencies concerning the realization of activities, which are obtained at the end of a training period while the system learns the usual routines of the patient. In other words, it models the minimal probability of implementation of an erroneous/interleaved plan by the patient.
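The interval itself is a small closed-form computation; the following is a direct transcription of the two formulas above (argument names and the example numbers are ours):

```python
def subsumption_interval(pmin_alpha, pmin_beta, pmax_beta):
    """[pmin, pmax] with which plan β partially subsumes plan α, given the
    conditional probabilities w.r.t. the supremum α∇β:
    pmin_alpha = Pmin(α|α∇β), pmin_beta = Pmin(β|α∇β),
    pmax_beta = Pmax(β|α∇β)."""
    pmin = max(0.0, pmin_alpha + pmin_beta - 1.0) / pmin_alpha
    pmax = min(pmax_beta / pmin_alpha, 1.0)
    return pmin, pmax

# E.g., Pmin(α|α∇β) = 0.7, Pmin(β|α∇β) = 0.6, Pmax(β|α∇β) = 0.8:
print(subsumption_interval(0.7, 0.6, 0.8))   # -> (0.428..., 1.0)
```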
3 Validation
The COACH infrastructure consists of an intelligent environment taking the form of a common washroom, as shown in Figure 1. It is equipped with a video camcorder mounted on the wall to record trials, and the researchers can change the faucet for usability studies. The extended new architecture of COACH is divided into three layers. First, inputs are received from hardware sensors (the camera) and are sent to the low-level (first) recognition layer. This layer uses a vision algorithm based on a Bayesian sequential estimation method using flocks of color features, which allows one to identify basic events (observations), such as the patient's hand location, the tap position (open or closed), etc. Thereafter, refined observations are sent to the plan recognition (second) layer, which uses our hybrid recognition model to construct a recognition space (a lattice structure), aiming to identify the possible on-going activities and to anticipate the possible future erroneous deviations of the patient. Finally, the assistance (third) layer receives this structured space and uses it to compute a correct prompting solution. The initial experimentation that we conducted is based on two studies carried out over the last two years in Toronto, at the Lakeside Long Term Care Centre (Toronto Rehabilitation Institute) and at the Harold & Grace Baker Centre, with 20 patients. Both studies lasted approximately 3 months. In these studies, each patient was asked to perform, once a day (for 50 to 60 days), the same HandWashing activity. These trials allowed us to create a database of real case scenarios concerning common erroneous behavior of patients. Based on these data, we selected 30 representative scenarios that cover each type of error and we simulated them, step by step. The objective was to evaluate in what proportion the new module was able to recognize patients' erroneous/interleaved activities. The results of this initial experiment show that the module was able to recognize almost all interleaved multiple activities, 80% of omission type errors, 60% of the substitution errors, and 50% of sequence errors. These results are promising, as all the recognized deviations were dynamically generated according to the initially identified set of possible plans. However, the module is limited by the fact that the first observed action is assumed to be correct (no errors). Also, some unrecognized errors are due to foreign actions that had nothing to do with the on-going activity, for instance a patient washing his face instead of his hands.
Figure 1. Set-up for two activities: toothbrushing and handwashing.
4
Conclusion
The extension of COACH that we proposed allows us to address a major limitation of the former prototype, which presumed that the system had already identified the on-going activity. Therefore, it should be seen as a first step toward the deployment of a complete prototype that could provide assistance for multiple different tasks to a patient at home. An interesting enrichment of the model consists in recognizing repetitive actions induced by the patient's erratic behavior, and in taking into account the temporal relations between actions.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-813
813
Not so new: overblown claims for ‘new’ approaches to emotion Dylan Evans1 Abstract. The non-classical thesis of emotion (NCE) states that the conceptual resources of classical cognitive science cannot adequately account for certain important features of emotion. It also states that these features can be adequately accounted for by employing the conceptual resources of non-classical forms of cognitive science. There is a general problem with all forms of NCE, since they all assume that classical cognitive science is too restrictive, when if anything the reverse is true. In fact, the relationship between classical and nonclassical approaches to the study of emotion is much more fuzzy than NCE suggests. Though the two approaches use different terms of art, this does not grant one group privileged access to cognitive resources inaccessible to the other, but merely directs their attention to different features of the phenomena being studied. Thus the real contribution of non-classical models of emotion is to draw our attention to certain key aspects of emotion requiring explanation that had perhaps been somewhat neglected by classical models.
1
THE NON-CLASSICAL THESIS OF EMOTION (NCE)
During the past decade, some philosophers and psychologists have argued that the conceptual resources of classical cognitive science cannot adequately account for certain important features of emotion. They have further argued that these features can be captured by non-classical forms of cognitive science (e.g. [6]). I will refer to such claims as the non-classical thesis of emotion (NCE). In this paper, I examine one particular form of NCE — namely, that put forward by Giovanna Colombetti. Before proceeding to examine Colombetti's arguments in detail, however, let's spell out the main components of NCE in a bit more detail. NCE may be characterised in terms of three lists and three claims, as follows:
Three lists:
1. A list of the conceptual resources of classical cognitive science
2. A list of the conceptual resources of (some form of) non-classical cognitive science, which includes at least some elements not found in list 1
3. A list of the key aspects of emotion that require explanation
Three claims:
1. The newness of non-classical cognitive science (NNC): Members of list 2 that are not in list 1 cannot be reduced to any combination of the members of list 1
2. The explanatory weakness of classical cognitive science: Some members of list 3 cannot adequately be accounted for by the members of list 1.
Cork Constraint Computation Centre, Department of Computer Science, University College Cork, Ireland, email: devans@4c.ucc.ie
3. The explanatory strength of non-classical cognitive science: The same members of list 3 enumerated in claim 2 can be adequately accounted for by the members of list 2 (either on their own, or in addition to the members of list 1).
The three claims are common to all forms of NCE. Differences between the various forms of NCE lie entirely in the different ways that the three lists are populated. None of the proponents of NCE goes so far as to provide exhaustive specifications of all three lists. Not only are their specifications partial, but they are not usually provided in the form of lists at all. Instead, their lists must be reconstructed from hints and ellipses, which makes criticism difficult. Nevertheless, this is what I will attempt to do with Colombetti's version of NCE in section 2.
2
COLOMBETTI’S SPECIFICATION OF CLASSICAL COGNITIVE SCIENCE
Giovanna Colombetti provides a fairly representative example of NCE in a 2003 paper entitled 'Complexity as a new framework for emotion theories' [2]. Colombetti does not talk explicitly about 'classical cognitive science', but she does set out to criticise 'good old fashioned frameworks'. The latter phrase clearly echoes the term GOFAI ('good old fashioned Artificial Intelligence') popularised by the philosopher John Haugeland, and which is synonymous with classical cognitive science [4]. According to Colombetti, the 'good old fashioned frameworks [are] based on modular and hierarchical perspectives of the mind, which try to explain the elicitation of emotion by positing a strictly sequential causal chain of mental and/or physical events' ([2]; emphasis in original). So, here, at least, is a partial specification of list 1 according to Colombetti:
1. Modular processes
2. Hierarchical processes
3. Strictly sequential causal chains of events
A deep ambiguity affects Colombetti's use of all three terms. In a longer version of this paper, I explain in detail what this ambiguity is in each case [3]. I leave this analysis aside here for reasons of space.
3
COLOMBETTI’S SPECIFICATION OF NON-CLASSICAL COGNITIVE SCIENCE
Colombetti does not use the term ‘non-classical cognitive science’, but she does talk about ‘the dynamical systems approach in cognitive science’ (or the ‘dynamical perspective’), and explicitly contrasts this approach with ‘the good old fashioned frameworks’ that we have already claimed to be coterminous with classical cognitive science.
This is broadly in line with most of the claims made on behalf of non-classical cognitive science, which tend to focus on the 'dynamical hypothesis in cognitive science' [7], or on strong claims about the embodiment and/or situatedness of cognition [1], or both. Thus we may take her characterisation of the dynamical systems approach to be her specification of non-classical cognitive science. Colombetti does not provide a formal list of the conceptual resources of the dynamical systems approach, but she does mention the following two ideas as being distinctive features of this approach:
1. Collective action of micro-components
2. Circular causation
The first of these is explicitly contrasted with the 'hierarchical' processes supposedly invoked by classical cognitive science, and the second with the 'strictly sequential causal chains' that classical cognitive science is supposedly restricted to. Since these terms are problematic, it is hardly surprising that their supposed opposites are similarly problematic.
4
CONCEPTUAL RESOURCES, COGNITIVE ARCHITECTURES, AND COGNITIVE MODELS
It is important to note that NCE is not a claim about the existence of new models or theories of emotion, but a claim about conceptual resources (or, more precisely, a claim about the relationship between models and conceptual resources). If a new model or theory of emotion accounts for hitherto refractory aspects of emotional phenomena, but can be entirely explicated by recourse to the conceptual resources of classical cognitive science, then the existence of the new model provides no support to NCE. This is often overlooked by proponents of NCE. Typically, supporters of NCE get excited about a new model of emotion that is expressed in terms that are not part of the conceptual toolbox of classical cognitive science. The fact that the new model accounts for aspects of emotion that have previously been neglected by models developed in the classical idiom is then taken to show that the classical idiom is incomplete. But this is a non-sequitur, for it neglects the possibility that the new model can also be expressed in terms that are drawn entirely from the classical idiom. This general point undermines all the various forms of NCE. All forms of NCE require the combined conceptual resources of classical cognitive science and non-classical cognitive science to be greater than the conceptual resources of classical cognitive science alone. More formally, if C is the set of conceptual resources of classical cognitive science, and N is the set of resources of non-classical cognitive science, then if NCE is true, the relative complement of C in N (the members of N that are not in C) must be non-empty. However, the problem with classical cognitive science, if there is one, is that its conceptual resources are all-encompassing. As a theory, it is not constrained enough. Take the Soar cognitive architecture developed by John Laird, Allen Newell and Paul Rosenbloom, for example [5]. Soar is about as good an example of classical cognitive science as anyone could hope for. It was designed as a common format for expressing a whole variety of cognitive models. Yet Soar is Turing-complete, so it can be programmed to represent any kind of computational cognitive model at all. So, for most of its critics, the problem with Soar is not that it is too constrained, but that it does not embody enough constraints to act as a good psychological theory.
The reference to Soar is particularly apt, since the current discussion would be better understood by cognitive scientists themselves (rather than by the philosophers of cognitive science who tend to dominate the discussion) if it were couched in the terminology of 'cognitive architectures' rather than that of 'conceptual resources'. A cognitive architecture is, in fact, a specification of the kind of conceptual resources that may be used to construct a set of consistent cognitive models. Classical cognitive science is perhaps best seen as a set of cognitive architectures (comprising Soar, ACT-R, and others), while non-classical cognitive science is a different set (comprising subsumption architectures, neural networks, dynamical models, among others). For any pair of architectures, and any cognitive model, the cognitive model can always be programmed in both, or just the classical architecture — but never in the non-classical architecture alone.
5
CONCLUSION
The classical and non-classical forms of cognitive science certainly sound different. The key terms of the latter are rarely, if ever, to be found in the former. However, these terminological differences do not reflect any deep conceptual rift, since there is nothing in non-classical explanations that cannot be translated into the terms of classical cognitive science. Yet the different terminology employed by classical and non-classical forms of cognitive science does make a difference to the way that the proponents of each go about their research. Terms like 'circular causation' summon up in the researchers' mind a set of studies that have put special emphasis on feedback loops (even though the researcher might explicitly state that they are concerned with something 'more' than mere feedback) and so perhaps lead the researcher who 'feels at home with' this terminology to discover feedback loops that he or she might otherwise have missed. What is needed here is a theory of pragmatics, rather than a theory of deep conceptual structure. The use of different terms by different groups of cognitive scientists does not grant one group privileged access to cognitive resources inaccessible to the other, but rather serves as a heuristic that directs their attention to different features of the phenomena being studied.
ACKNOWLEDGEMENTS The research for this paper was supported by Marie Curie Transfer of Knowledge Action no. MTKD-CT-2006-042563. Thanks are also due to Ric Wallace for his comments on an earlier draft of this paper.
REFERENCES
[1] A. Clark, Being There: Putting Brain, Body, and World Together Again, MIT Press, Cambridge, Mass., 1997.
[2] G. Colombetti, 'Complexity as a new framework for emotion theories', Logic and Philosophy of Science, 1, (2003).
[3] D. Evans, 'The non-classical thesis of emotion', unpublished manuscript.
[4] J. Haugeland, 'What is mind design?', in Mind Design II: Philosophy, Psychology, Artificial Intelligence, ed., J. Haugeland, MIT Press, Cambridge, Mass. and London, England, (1996).
[5] John E. Laird, Allen Newell, and Paul S. Rosenbloom, 'Soar: an architecture for general intelligence', Artif. Intell., 33(1), 1–64, (September 1987).
[6] M. D. Lewis, 'Bridging emotion theory and neurobiology through dynamic systems modeling (with commentary)', Behav Brain Sci, 28(2), (April 2005).
[7] T. van Gelder, 'The dynamical hypothesis in cognitive science', Behav Brain Sci, 21(5), (October 1998).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-815
815
Emergence of Rules in Cell Assemblies of fLIF Neurons Roman V. Belavkin and Christian R. Huyck 1 Abstract. Inspired by biological cognition, the CABOT project explores the ways symbolic processing can emerge in a system of neural cell assemblies (CAs). Here we show how a stochastic meta-control process can regulate learning of associations between the CAs, the neural basis of symbols. An experiment illustrates the learning between CAs representing condition–action pairs, which leads to CA-based representations of 'if–then' rules.
1
INTRODUCTION
Previously, the authors have demonstrated how states in a cell assembly (CA) neural system can be controlled and used to perform a typical symbolic task (counting) [5]. This work has developed into a much more ambitious project called CABOT, where the same principles are applied in a system, based entirely on CAs, that integrates elements of vision, categorisation, natural language processing and learning in a virtual environment. This paper presents a part of this project — learning the connections between different CAs — that combines symbolic representations into logical rules.
2 OVERVIEW OF THE ARCHITECTURE
Our system uses fatiguing, leaky, integrate and fire (fLIF) neurons [4], an extension of LIF neurons [6]:
Integrate and fire — the neuron 'fires' if its action potential, A, exceeds threshold θ, where A = (w, x) = Σ_{i=1}^{k} w_i x_i (integrator), and w, x ∈ R^k are the weights and the stimuli vectors. The weights w_t adapt according to the compensatory learning rule [4], which is an implementation of Hebbian learning [3].
Leak and accumulation of potential — A_{t+1} = A_t / d_t + (w_t, x_t), where d_t = ∞ if the neuron fired at t; d_t ≥ 1 otherwise.
Fatigue — makes the threshold dynamic, θ_{t+1} = θ_t + F_t, where F_t = F+ ≥ 0 if the neuron fired (fatigue); F_t = F− < 0 otherwise (recovery).
Cell assemblies are reverberating groups of neurons [3], and they are believed to be the neural basis of symbols in the human mind. Our system is based on networks of sparsely connected neurons. The topology of the networks is pre-defined by some random pattern, and it can be highly recurrent. When enough neurons fire to start the reverberating circuit, the CA ignites, and its persistence is an important property of CA dynamics. The fatigue and recovery rate parameters affect the persistence. A CA can be extinguished by another CA, which can ignite due to the change of the external pattern. A network with several CAs encoding a set of external patterns is referred to as a module. Several modules can be interconnected to create more complex systems. For example, a system of 7 modules and 40 CAs was used to implement a simple counting task [5]. More
Middlesex University, London NW4 4BT, UK
complex systems have been used to parse natural language and implement finite state automata. The next stage in the development of the project is the ability to learn the connections between different modules, the focus of this paper.
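As an illustration of the update rules above, the following is a minimal Python sketch of a single fLIF neuron. The class name, parameter values and the exact fatigue bounds are our assumptions for illustration, not the CABot implementation.

```python
class FLIFNeuron:
    """Minimal fatiguing, leaky, integrate-and-fire neuron sketch."""

    def __init__(self, theta=1.0, d=2.0, f_plus=0.1, f_minus=-0.05):
        self.A = 0.0            # accumulated action potential
        self.theta = theta      # dynamic firing threshold
        self.d = d              # leak divisor (d >= 1)
        self.f_plus = f_plus    # fatigue increment when firing
        self.f_minus = f_minus  # recovery decrement otherwise
        self.fired = False

    def step(self, w, x):
        # Leak: the potential is divided by d, or reset entirely if the
        # neuron fired on the previous cycle (d_t = infinity).
        leaked = 0.0 if self.fired else self.A / self.d
        # Integrate: add the weighted input (w, x).
        self.A = leaked + sum(wi * xi for wi, xi in zip(w, x))
        # Fire if the potential exceeds the (dynamic) threshold.
        self.fired = self.A > self.theta
        # Fatigue/recovery: raise the threshold after firing, lower it
        # (not below zero) otherwise.
        change = self.f_plus if self.fired else self.f_minus
        self.theta = max(0.0, self.theta + change)
        return self.fired

neuron = FLIFNeuron()
for t in range(5):
    print(t, neuron.step([0.5, 0.7], [1, 1]), round(neuron.A, 2))
```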
3 STOCHASTIC META–CONTROL
Although the connections between the correlated cells are strengthened via Hebbian learning, it is the meta-process that controls which neurons fire and thus which connections are supported. The meta-process is based on stochastic control of action-selection algorithms, implemented earlier by the authors in cognitive architectures [1], and which are based on the following result of information theory. Given utility function u : Ω → R, the goal is to find a probability distribution p on Ω that maximises the expected utility E_p{u} = (p, u) = Σ_ω p(ω) u(ω) under additional constraints. This distribution is

p(ω) = q(ω) e^{βu(ω) − Γ(β)}    (1)
where q(ω) is the reference (prior) distribution, Γ(β) = ln Σ_Ω q(ω) e^{βu(ω)}, and β is the Lagrange multiplier, defined from constraints on information (I(p, q) ≤ I < ∞) or on the expected utility (E_p{u} ≥ U > −∞):

β(U) = dI(U)/dU,    I(U) = sup_β [Uβ − Γ(β)]    (2)
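Computationally, distribution (1) is a Boltzmann-like reweighting of the prior by the utilities. A small sketch (function and variable names are ours):

```python
import math

def optimal_distribution(q, u, beta):
    """p(w) = q(w) * exp(beta*u(w) - Gamma(beta)), i.e. equation (1).
    q: prior probabilities, u: utilities, beta: inverse temperature."""
    weights = [qi * math.exp(beta * ui) for qi, ui in zip(q, u)]
    z = sum(weights)  # e^{Gamma(beta)} = sum over Omega of q * e^{beta u}
    return [w / z for w in weights]

q = [0.25, 0.25, 0.25, 0.25]
u = [0.0, 1.0, 2.0, 3.0]
for beta in (0.0, 1.0, 5.0):  # higher beta -> more deterministic choice
    print(beta, [round(p, 3) for p in optimal_distribution(q, u, beta)])
```

At β = 0 the prior is returned unchanged; as β grows, probability mass concentrates on the highest-utility elements, which is the transition from stochastic to almost deterministic behaviour discussed below.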
Function β(U) is strictly increasing, and for β > 0 the optimal distribution (1) has non-zero values (p(ω) > 0) for all ω ∈ Ω such that u(ω) > −∞. Thus, the optimal distribution describes a stochastic process, where all ω are randomised by the control parameter β > 0, or its inverse T = β^{−1}, called the temperature.
Value–Explore Topology. Problems of optimal control often involve maximisation of utility over a set Ω = X × Y, where X is the set of observations (e.g. goals), and Y is the set of controls (e.g. actions). In our system, these sets are represented by two modules, Goals and Actions, where CAs represent conditions and actions respectively. Thus, ω ∈ Ω are condition–action pairs (x, y) ∈ X × Y.
[Diagram: Goal 1 … Goal m and Act 1 … Act n modules, with the Value and Explore modules mediating their connections.]
Initially, the modules are set up with excitatory connections from every x ∈ X to all y ∈ Y . Thus, given some goal, any action can be triggered. Due to the Hebbian learning, the connections x → y between CAs that have fired together are reinforced, giving the pair a
higher chance to ignite in the future. Thus, due to Hebbian learning, the system can learn some random relation R ⊂ X ×Y (set of rules), which may not be optimal. Learning of only a particular (optimal) relation is supported by the meta–process that involves two additional modules: Value and Explore. The activity of the Value module represents the values of utility (higher activity corresponds to higher utility). The average activity of the module corresponds to constraint U in equation (2). The input of the module can be configured according to the application. For example, it may receive inputs from the sensory system representing agent’s preference on the states of the environment. The purpose of the Explore module is to randomise the activity of the Action module. Cells in this module are spontaneously firing, and the module sends excitatory connections to all CAs in the Action net. Thus, the Explore module can trigger randomly any Action CA, and this process has no memory. The module implements the effect of parameter β > 0 in equation (1) (or the temperature T = β −1 ). The Value module sends inhibitory connections to Explore, so that high activity of Value inhibits the activity of Explore. This implements the monotonic relation between constraint U and β in equation (2), and it allows for a very simple yet effective learning scheme. If a particular goal–action pair (x, y) results in a high utility, then the Value module inhibits Explore, and the (x, y) pair is allowed to persist longer. Since high utility pairs (x, y) on average co–fire longer than low utility pairs, their connections increase relative to others due to the compensatory Hebbian learning rule. This way, the meta–process supports learning of the optimal relation R ⊂ X × Y . As a result, the average activity of the Value module (U ) increases with time, while the activity of the Explore module (T = β −1 ) decreases. The system makes a transition from stochastic to an almost deterministic rule–based system. The biological plausibility of this topology is supported by studies of the reward path and tonically active cholinergic neurons in the basal ganglia and striatal complex [2]. These neurons account for a small proportion of the connections, and they are quite uniform and nontopographic. These neurons may play the role of stochastic noise, and their activation is reduced when the reward path is activated.
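The interplay of Value and Explore can be mimicked at an abstract level. The following toy loop is our simplification for illustration, not the CA-level implementation: inhibiting exploration as average utility rises concentrates Hebbian-style reinforcement on high-utility pairs.

```python
import random

goals, acts = range(2), range(2)
w = {(g, a): 1.0 for g in goals for a in acts}   # connection strengths
utility = lambda g, a: 1.0 if g == a else 0.0    # target relation R

value = 0.0                                      # running average utility
for step in range(3000):
    g = random.choice(goals)
    explore = 1.0 - value                        # Value inhibits Explore
    if random.random() < explore:                # random action (Explore)
        a = random.choice(acts)
    else:                                        # strongest connection wins
        a = max(acts, key=lambda act: w[(g, act)])
    u = utility(g, a)
    w[(g, a)] += 0.01 * u                        # Hebbian-style reinforcement
    value = 0.99 * value + 0.01 * u              # high utility raises Value

print({k: round(v, 2) for k, v in w.items()})    # (0,0) and (1,1) dominate
```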
4 EXPERIMENT: LEARNING DICHOTOMIES
The code of the system and the experiment described is available at http://www.cwa.mdx.ac.uk/CABot/CANT.html. In this simple experiment, there are two CAs in the Goal and two CAs in the Action modules. Each module consisted of 800 cells, with 400 cells in each CA. The modules were set up with low weight excitatory connections from every goal CA to all action CAs, shown by dashed arrows on the left diagram below. The task was to learn two rules, shown by two solid arrows on the right diagram.
[Left diagram: dashed arrows from Goal 1 and Goal 2 to both Act 1 and Act 2. Right diagram: solid arrows Goal 1 → Act 1 and Goal 2 → Act 2.]
The training procedure consisted of a random presentation of an input pattern activating one of the goal CAs every 100 cycles. Figure 1 shows the proportion of the correct actions selected (ordinate) as a function of cycles (abscissa). The chart shows the results of five simulations. Initially the system makes only half of the choices correctly. After 3000 cycles, the proportion of correct choices increases to 70–90%. Figure 2 shows the percentage of neurons firing per cycle
Figure 1. The proportion of correct action choices (ordinate) as a function of cycles (abscissa). The curves represent results of different trials.
Figure 2. Activities of the Value and Explore modules in one experiment.
in the Value and the Explore modules in one of the experiments. As desired, an increase of the Value activity coincides with a decrease of the Explore activity. The implementation of the meta-process for rule acquisition in our system is an important step in its evolution, creating new opportunities and improving our understanding of biological cognition.
ACKNOWLEDGEMENTS This work was supported by EPSRC grant EP/DO59720.
REFERENCES
[1] R. V. Belavkin, 'Acting irrationally to improve performance in stochastic worlds', in Proceedings of AI-2005, the 25th SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, eds., M. Bramer, F. Coenen, and T. Allen, pp. 305–316, Cambridge, (December 2005). Springer.
[2] R. Granger, 'Engines of the brain: The computational instruction set of human cognition', AI Magazine, 27(2), 15–32, (July 2006).
[3] D. O. Hebb, The Organization of Behavior, John Wiley & Sons, New York, 1949.
[4] C. Huyck, 'Hierarchical cell assemblies', Connection Science, (2007).
[5] C. Huyck and R. V. Belavkin, 'Counting with neurons, rule application with nets of fatiguing leaky integrate and fire neurons', in Proceedings of the Seventh International Conference on Cognitive Modeling, eds., D. Fum, F. D. Missier, and A. Stocco, Trieste, Italy, (April 2006). Edizioni Goliardiche.
[6] W. Maas and C. Bishop, Pulsed Neural Networks, MIT Press, 2001.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-817
817
ERS: Evaluating Reputations of Scientific Journals Émilie Samuel and Colin de la Higuera1 Abstract. Current methods for evaluating research are based on counting the number of citations received by publications: the more an article is cited, the more important its impact is considered to be. In this article, we propose a new method for assessing the reputation of scientific journals, based on a Web application that gathers the votes of expert researchers. The voting results indicate degrees of preference for one journal over another. Our system uses, in addition, the publications of an expert in order to quantify his expertise in specific fields. These values are coupled with those of the votes to determine the relevance of each journal in each topic. An iterative process has been implemented that transfers values given to journals by experts to values of the experts themselves, given their publications.
1 Key concepts The system ERS manages bibliographic data formed by journals, researchers and themes. The journals and themes are bound by a relation called relevance. Each journal publishes articles more or less relevant to some research topic. Journals and researchers are directly connected by publications (relation has published). From these we hope to measure the expertise of each researcher in relationship with each theme. Finally, the votes of the researchers, depending on the expertise of the latter, will influence the relevance of the themes for journals. The phenomenon is recursively cyclic: the relevance influences the expertises, which in turn affect the relevances through the votes. The influence of researchers grows with their global confidence. We summarise these relationships through the diagram represented in Figure 1. Entities Experts, Themes and Journals are connected by relations relevance, expertise, has published and vote. Associated attributes are not represented.
1.1 Relevance of a journal for a theme
Each journal is more or less relevant to each theme. The relevance reflects the reputation, for this topic, of the articles published by the journal. Its value can be interpreted as the probability that the community of researchers in the field would advise this journal to someone searching for literature in the given area.
1.2 Global confidence and expertise of a voter in a topic
The computation of the expertise in a theme depends directly on the relevance, for that theme, of the journals in which the expert has published. If, for example, the expert has published several times in journals recognised by the system itself as being relevant to the theme
Laboratoire Hubert Curien - UMR CNRS 5516 - 18 rue Benoˆıt Lauras, ´ 42000 Saint-Etienne - email: {emilie.samuel, cdlh}@univ-st-etienne.fr
Figure 1. The general model: entities Experts, Themes and Journals connected by the relations relevance, expertise, has published and vote.
databases, then the expert will be deemed to be an expert in this area. Thus, the calculation of expertise in each subject, which is based on the journals in which the author has published, depends, for each of them:
• on the number of publications;
• on the sum of all relevances of the journal, which reflects its importance;
• on the likelihood of the theme, given the journal;
• on the belief in the relevance of the theme for this journal.
However, comparisons between different researchers should be avoided. One can, for example, consider that a group of individuals with similar profiles have interests in similar research fields. By contrast, a researcher with an expertise of 10% in information retrieval cannot be considered twice as recognised in this area as a researcher with an expertise of 5%. It may, in fact, be the case that the publications of the first are less diversified than those of the second, which would then generate higher expertise, but in fewer topics.
1.3 Interrogating the experts
For each expert, a list of journals to be evaluated is automatically defined. This list consists of journals in which he has published, and of journals that are judged by the system to be close to his expertise. It can also include journals in which his co-authors have published or journals on which the system has little information. The method of paired comparisons is used, whose application to ranking has been addressed since [2]. This method is intended to indicate a degree of preference, and lets one obtain a partial order by comparing journals two by two. It is then possible, from several partial orders resulting from expert opinions, to establish a total order of all the journals in each theme. Our approach is related to that shown in [3], where the authors propose to build clusters of total orders corresponding to the opinions collected about movies. The expert must answer questions such as 'If you were to choose an article from one of these two journals, which would you choose?'. We call this process between two journals a match. A series of matches (until interruption by the expert) is organised, each match
being randomly drawn, where the journals in which the expert has published have a higher probability to appear. The results of the matches are then analysed following the methodology employed by the Elo classification, used to rank chess players [1]. This classification assigns each player a rating based on his performance in competition. The rating of a player evolves over time with his results. When two players meet, a predicted result for each is calculated, the highest ranked player being supposed to beat his weaker opponent; the greater the difference in rating between the two players, the higher the probability that the better player wins. Following the match between the two players, their ratings are updated according to the following principle: if a player has achieved a better result than expected, it means that he was underestimated, and his rating is therefore increased, and vice versa. A rating can therefore rise or diminish, and the adjustment takes place proportionally to the difference between the true outcome and the presumed outcome.
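A minimal version of the Elo update just described, applied to two journals, might look as follows. The K-factor and function names are illustrative assumptions; the paper does not specify the constants used in ERS.

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """Update the ratings of journals a and b after one 'match'.
    score_a is 1.0 if a was preferred, 0.0 otherwise (0.5 for a tie)."""
    # Predicted outcome for a, from the rating gap (logistic curve).
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    # Adjust proportionally to the gap between true and expected outcome.
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# An underdog journal preferred over a higher-rated one gains a lot:
print(elo_update(1500.0, 1600.0, 1.0))
```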
2 Operational aspects
A beta version of the system ERS has been running since July 2007.2 Its ergonomics and aesthetics are subject to change: we are seeking a more attractive, user-friendly and interactive platform while retaining its ease of use. Initially we used a limited list of 16 themes, to which was added one smaller theme (grammatical inference) for testing purposes. The operationalisation required an initialisation phase, each journal being allocated an initial relevance in each theme. To do this, we chose an initial set of themes, and associated with each theme a list of keywords. For example, words like pattern recognition, classification, or reinforcement can be associated with the theme machine learning. We then computed a frequency (term frequency) for each keyword appearing in the titles of journal articles. Thus, the more a journal publishes articles with these words in their titles, the more its relevance to the corresponding theme increases. The confidence in the relevance is obtained as a computation of the inverse document frequency, which is a function increasing with the specificity of the keywords.
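The initialisation can be sketched as a term-frequency score per (journal, theme) pair, with inverse document frequency as the confidence. The code below is our reading of that description, not the ERS source; the tokenisation and the guard against zero document frequency are simplifying assumptions.

```python
import math
from collections import Counter

def init_relevance(titles_by_journal, keywords_by_theme):
    """Initial relevance(journal, theme) from keyword frequencies in titles."""
    relevance = {}
    all_titles = [t for ts in titles_by_journal.values() for t in ts]
    for journal, titles in titles_by_journal.items():
        words = Counter(w for t in titles for w in t.lower().split())
        total = sum(words.values()) or 1
        for theme, keys in keywords_by_theme.items():
            # Term frequency of the theme's keywords in this journal.
            relevance[(journal, theme)] = sum(words[k] for k in keys) / total
    # Confidence: inverse document frequency of each keyword.
    idf = {}
    for keys in keywords_by_theme.values():
        for k in keys:
            df = sum(1 for t in all_titles if k in t.lower()) or 1
            idf[k] = math.log(len(all_titles) / df)
    return relevance, idf

rel, idf = init_relevance(
    {"JMLR": ["pattern recognition advances", "classification trees"]},
    {"machine learning": ["pattern", "classification", "reinforcement"]})
print(rel, idf)
```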
3 Convergence of the system
The update of the system is done daily, in batch mode. Thus, the data of the system, constituted by relevance, global confidence and expertise, are changing continuously and are recomputed iteratively. The convergence of these values occurs as soon as the relevances remain stable from one iteration to the next. The first phase of the experimental validation of the convergence of relevances consisted in the (random) initialisation and normalisation of the relevances for 570 journals and for 17 themes. In order to constitute a panel of experts, 2000 researchers were then randomly selected from those identified in DBLP. Their global confidence and expertise in each subject were computed according to their publications in journals. Thereafter, a simulation of votes by these experts took place. This consisted of randomly generating 28500 votes, so as to reach an average of 50 per journal. The algorithm was finally run on this repeatedly until convergence of the values of relevance, global confidence and expertise. The convergence results of three experiments respecting this protocol are shown in Figure 2. The variation distance L1 was used to
http://labh-curien.univ-st-etienne.fr/ERS/
measure the difference between the relevances of one iteration and the next. As can be seen, the computation converges in a small number of iterations, each carried out in an average of 2 seconds.
Figure 2. Convergence of the relevances during the batch computations (L1 distance, ordinate, against iterations, abscissa).
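The convergence test amounts to computing the L1 (variation) distance between successive relevance vectors and stopping once it falls below a threshold. A sketch, with a toy update standing in for the actual batch recomputation:

```python
def l1_distance(prev, curr):
    """Variation distance between two relevance assignments (dicts)."""
    keys = prev.keys() | curr.keys()
    return sum(abs(prev.get(k, 0.0) - curr.get(k, 0.0)) for k in keys)

def iterate_until_stable(step, relevance, eps=1e-3, max_iter=50):
    """Repeat the batch update `step` until relevances stop moving."""
    for i in range(max_iter):
        new = step(relevance)
        if l1_distance(relevance, new) < eps:
            return new, i + 1
        relevance = new
    return relevance, max_iter

# Toy update that halves the distance to a fixed point each iteration:
target = {"j1": 0.7, "j2": 0.3}
step = lambda r: {k: (r[k] + target[k]) / 2 for k in r}
print(iterate_until_stable(step, {"j1": 0.0, "j2": 1.0}))
```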
4 Conclusion and perspectives
System ERS permits a different evaluation of scientific journals, directly based on the opinions of scholars. This system, which we hope to render attractive, simple and efficient, offers an assessment protocol for comparing journals two by two. Following the processing of votes, the results indicate, according to the subject, which are the journals best recognised by the community of the area. A number of perspectives are being looked into. In addition to those cited in this article, the first is to work on the currently very reduced list of themes: ideally the list should be dynamic; new communities or sub-communities should be detected by the system, and the corresponding keywords should be automatically computed. The computation of the expertise and confidence of the researchers could involve a more complex analysis, taking into account (again in an automatic way) the date on which articles were published, or other information beyond DBLP obtained by Web mining techniques. The interrogation scenario should also be considered as improvable. Making better use of the results is another possible task: a profile for a journal (as a vector of quantities over themes) can easily be computed, and a similar profile can be computed for a researcher. One can therefore query the system with questions like 'which journal is the closest to my way of doing research?'. In addition, the identification of researchers at registration remains an important point on which further work is necessary. Finally, the evaluation of conferences is a logical evolution of the system, which requires additional attention, as does the even more ambitious task of adapting the system to other fields of research.
Acknowledgements
System ERS was developed in the laboratory Hubert Curien with the help of Fabrice Muhlenbach, Baptiste Jeudy and François Jacquenet. The expertise of Thierry Murgue has solved many engineering problems. Students from the department of computer science at Saint-Étienne have developed, and continue to develop, different modules for the system.
REFERENCES
[1] A. E. Elo, The Rating of Chessplayers, Past and Present, Arco, 1978.
[2] H. Joe, 'Rating systems based on paired comparison models', Statistics & Probability Letters, 11, 343–347, (1991).
[3] A. Ukkonen and H. Mannila, 'Finding outlying items in sets of partial rankings', in Knowledge Discovery in Databases: PKDD 2007, volume 4702 of LNCS, pp. 265–276. Springer, (2007).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-819
819
Personal Experience Acquisition Support from Blogs using Event-Depicting Images Keita SATO1 and Yoko NISHIHARA2 and Wataru SUNAYAMA1 Abstract. Internet users write blogs related to their personal experience, daily news, and so on. We can obtain blogs about personal experience using search engines on the Web. However, the search engines also output blogs about other topics unrelated to personal experience. Therefore, it is necessary for us to read all blogs to obtain those about personal experiences. It takes too much time. This paper proposes a support system for obtaining blogs about personal experiences efficiently. The system extracts three keywords that denote place, object, and action from a blog. The three keywords describe an event that leads a person to write a blog about personal experience. The system expresses the event with three pictures depicting the extracted keywords. The pictures help users to judge whether personal experience is written about in the blog. We experimented with the system, and verified that it supports users in obtaining personal experiences efficiently.
The system extracts events that lead people to write blogs about personal experiences. The system does not need training data for event extraction, and the system can help users to obtain blogs about personal experiences.
1
INTRODUCTION
Internet users write blogs related to their personal experience, daily news, and so on. We can obtain blogs about personal experience using search engines on the Web. However, the search engines also output blogs about other topics. Users need to read all blogs to obtain those about personal experiences, which takes too much time. This paper proposes a support system for obtaining blogs about personal experiences efficiently. The system extracts three keywords from a blog. The keywords express an event related to personal experiences. The system expresses the event using three pictures that depict the extracted keywords. The pictures help users to judge whether personal experience is written about in the blog. Showing images reduces the time needed to understand a blog's contents [4, 5]. Therefore, seeing images requires less time to choose blogs about personal experience than reading blog texts. We define an event as three keywords: a place keyword, an object keyword, and an action keyword. Many studies about information extraction from the Web have been conducted [1]. In the case of extracting information noticed by many people, Glance et al. have proposed a method to extract noticed persons, topic keywords, and topic sentences from blogs [3]. The proposed system also extracts information from the Web; however, we aim to extract information noticed by one person. Blogs about personal experience are reviews posted by Internet users, and review extraction methods from the Web have been studied: [2, 6] separate reviews of commercial items into positive/negative classes to extract characteristic keywords by machine learning. These methods, unlike the proposed system, do not extract blogs about personal experiences.
Hiroshima City University, Japan, email: keita@sys.im.hiroshimacu.ac.jp,sunayama@sys.im.hiroshima-cu.ac.jp The University of Tokyo, Japan, email: nishihara@sys.t.u-tokyo.ac.jp
2 PROPOSED SYSTEM In the proposed system, a user inputs a query related to a personal experience about which the user wants to know. Blogs are downloaded from a blog site using the query. The system chooses blogs with events from those downloaded. The system then separates the chosen blog texts into several blocks. Three keywords are extracted from each block. Then the system sets out three images depicting the extracted keywords and finally outputs the images.
2.1 Blog selection
The system chooses blogs including sentences in the past tense from the downloaded blogs. This is because a sentence is almost always written in the past tense when an event is described in it.
2.2 Blog text separation
A blog is considered to have descriptions about certain places that are different from each other. Therefore, the system separates blog texts into several blocks, where one block has one place keyword. We define that a place keyword is a noun. The system extracts keywords that appear after a preposition often used for a place expression. The prepositions are as follows: at, in, to, for, and so on.
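A rough sketch of this separation step in Python; the preposition list and the word-pair tokenisation are simplified assumptions for illustration, not the paper's implementation.

```python
PLACE_PREPOSITIONS = {"at", "in", "to", "for"}

def split_into_blocks(sentences):
    """Start a new block whenever a place keyword (a noun following a
    place preposition) is found; each block keeps one place keyword."""
    blocks, current, place = [], [], None
    for sentence in sentences:
        words = sentence.rstrip(".").split()
        for prep, noun in zip(words, words[1:]):
            if prep.lower() in PLACE_PREPOSITIONS:
                if current:
                    blocks.append((place, current))
                current, place = [], noun
                break
        current.append(sentence)
    if current:
        blocks.append((place, current))
    return blocks

print(split_into_blocks(["We arrived at Okinawa yesterday.",
                         "The beach was beautiful.",
                         "Then we went to Naha for dinner."]))
```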
2.3 Keyword extraction
Three keywords, related to place, object and action, are extracted from each block. We explain how to extract an object keyword and an action keyword in the following sections.
2.3.1 Object keyword extraction
We define that an object keyword is a noun. If several object keywords are in a sentence, the relation between an object keyword and the extracted place keyword is evaluated using Eq. (1).

relation(p, o) = (hit(p ∧ o) / hit(p)) × (hit(p ∧ o) / hit(o))    (1)
In Eq. (1), o denotes an object keyword and p denotes a place keyword. Eq. (1) calculates the proportion of the number of Web pages in which both keywords are included to the number of Web pages
in which each keyword is included. If the value of Eq. (1) is high, the relation between the keywords is strong. The system extracts a keyword with the highest value of Eq. (1) as an object keyword.
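Equation (1) can be read as the product of two co-occurrence ratios estimated from Web hit counts. A sketch follows; in practice the hit counts would come from a search engine, so here they are simply passed in, and the names are ours.

```python
def relation(hit_p, hit_o, hit_po):
    """relation(p, o) = hit(p and o)/hit(p) * hit(p and o)/hit(o), Eq. (1)."""
    if hit_p == 0 or hit_o == 0:
        return 0.0
    return (hit_po / hit_p) * (hit_po / hit_o)

def best_object(place_hits, pair_hits, object_hits):
    """Pick the object keyword most related to the place keyword."""
    return max(object_hits,
               key=lambda o: relation(place_hits, object_hits[o], pair_hits[o]))

# Hypothetical counts for place "Okinawa" and two candidate objects:
print(best_object(1_000_000,
                  {"beach": 50_000, "snow": 100},
                  {"beach": 400_000, "snow": 90_000}))  # -> "beach"
```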
2.3.2 Action keyword extraction
It is defined that an action keyword is a verb appearing in a sentence where the object keyword has been extracted. This is because an action keyword usually appears near an object keyword.
2.4 Image setting out
The system sets out three images depicting the extracted keywords (Fig. 1). The images are set out transversely (place, object, and action) from left to right. If a blog is divided into several blocks, the system chooses the block in which three keywords are first extracted. The system uses an image database made by the authors for setting out the three images. The database has 1,000 place images, 700 object images, and 200 action images. If there is no image depicting the extracted keyword, the system shows a blank space.
3.1 Experimental Results
Table 1 shows the averages of text-extracted blogs. The averages using the Image System were higher than the averages using the Text System (P<.05). This result means that most of the blogs chosen using the Image System have events that lead Internet users to write blogs about personal experiences. Table 2 shows the proportions of read blogs to text-extracted blogs. Except for Hiroshima, the proportions using the Image System were higher than the proportions using the Text System (P<.05). In the case of Hiroshima, the Image System did not show many images depicting places; therefore, the participants often chose blogs that did not have events. However, for the other queries, the proportions using the Image System were higher than those using the Text System. From these results, it was verified that the Image System helps users to obtain more blogs about personal experiences efficiently.
Table 1. Averages of text-extracted blogs

         Okinawa  Tokyo  Hiroshima  Nigata  Hokkaido  School festival
Image     2.2      3.1     2.9       1.8      3.2          3.4
Text      1.4      2.6     2.6       1.3      2.5          2.1

Table 2. Proportions of read blogs to text-extracted blogs

         Okinawa  Tokyo  Hiroshima  Nigata  Hokkaido  School festival
Image     0.38     0.63    0.57      0.38     0.54         0.63
Text      0.29     0.55    0.60      0.28     0.53         0.45
4 CONCLUSION
This paper proposes a support system for obtaining blogs about personal experiences. The system expresses an event using three images, which depict place, object, and action. We verified that the system helps users to obtain blogs about personal experiences efficiently.
Figure 1. Output of the proposed system: blogs with three images.

3 EXPERIMENT
We experimented with the proposed system (Image System). We asked participants to extract texts about personal experiences from blogs. We used 100 blogs that were downloaded from a blog site and chosen by the Image System. We considered that a system user wants to know about personal experiences that he/she may also experience in the future. Therefore, we used the following six queries: "{Okinawa, Tokyo, Hiroshima, Nigata, and Hokkaido} AND sightseeing," and "school festival AND refreshment shop". Okinawa, Tokyo, Hiroshima, Nigata, and Hokkaido are the names of sightseeing areas in Japan. We prepared another system that shows blog summaries (Text System); the summaries were also shown in the Image System. The 100 blogs were divided into four sets of 25 blogs. Both systems used a web browser whose window size was 1,200 pixels × 1,920 pixels. We instructed participants as follows:
1. Image System: Look at images for each blog. Text System: Read blog summaries for each blog.
2. If you think a blog has events, choose the blog and read it.
3. Extract texts about personal experience from the read blog.
The number of participants was 36. The participants were undergraduate/graduate students majoring in information science. 18 participants were assigned to a set of one query and one system. The time for one set was five minutes; we considered that most people spend about five minutes doing research on a personal experience on the Internet. We compared the number of text-extracted blogs using the Image System and the number using the Text System.
REFERENCES
[1] C.H. Chang, M. Kayed, M.R. Girgis, and K. Shaalan, 'A survey of web information extraction systems', IEEE Transactions on Knowledge and Data Engineering, 18(10), 1411–1428, (2006).
[2] K. Dave, S. Lawrence, and D.M. Pennock, 'Mining the peanut gallery: Opinion extraction and semantic classification of product reviews', in Proc. of the 12th International World Wide Web Conference, 519–528, (2003).
[3] N.S. Glance, M. Hurst, and T. Tomokiyo, 'Blogpulse: Automated trend discovery for weblogs', in WWW2004 Workshop on the Weblogging Ecosystem, (2004).
[4] S. Hulbert, J. Beers, and P. Fowler, 'Motorists' understanding of traffic control devices', AAA Foundation for Traffic Safety, (1979).
[5] M. Pietrucha and R. Knoblauch, 'Motorists' comprehension of regulatory, warning and symbol signs', Technical Report Contract DTFH6183-C-00136, FHWA, 2, (1985).
[6] P.D. Turney, 'Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews', in Proc. of the 40th Annual Meeting of the Association for Computational Linguistics, 417–424, (2002).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-821
821
Object Configuration Reconstruction from Descriptions using Relative and Intrinsic Reference Frames H. Joe Steinhauer 1 Abstract. We provide a technique for reconstructing, into an absolute frame of reference as seen from the survey perspective, an object configuration that has been described on site using only intrinsic and relative frames of reference.
1
In the present work, we assume descriptions consisting of a combination of a route tour [9, 6] and several embedded gaze tours [9, 6], one for each step within the route tour. Furthermore, a momentarily applied absolute frame of reference is used in addition to each gaze tour to report the relative positions between the objects, seen from a particular viewpoint. An activity diagram illustrating the route tour is presented in figure 2. All three description types use on of the two rectangular frame of reference, shown in figure 1. The frame of reference in figure 1a) is similar to many current approaches to qualitative reasoning about orientation. See for example [8, 3, 7, 4, 1]. Considering that people in way-finding or route-description tasks usually distinguish between eight direction classes [5], the eight directions right back (rb), right neutral (rn), right front (rf ), straight front (sf ), left front (lf ), left neutral (ln), and left back (lb) shall be used as possible object orientations. The first time a frame of reference is used during the route tour, it automatically sets a corresponding projection-based global frame of reference that captures the concept of a representation of position in latitude and longitude [2]. 1
LN
RN
LB
LF
SF
RF
Motivation
You probably recognize the following scenario: You invited a friend to your house and gave him extraordinary directions to easily find the place. However, your friend suddenly phones you and tells you that he ’somehow’ cannot find your house. Furthermore, he even had lost his orientation completely. Now it is your task (because you are supposed to know the area) to figure out where he is and guide him from there to your house. You probably let him describe the objects he sees around him and you try to match the described to some mental or external map of the suspected area. Having a survey perspective of his mental or external reconstruction, you (the listener) needs to translate all relative object relationships that your friend (the observer) provides into the global frame of reference. The observer is therefore encouraged to produce an object configuration description that contains the information needed to recognize the object configuration from a survey perspective.
2
RF
Department of Computer and Information Science, Link¨oping University, Sweden, email: joest@ida.liu.se
LB
SB
RB
SB
RB
RN
b)
a)
Figure 1. The frames of reference used.
As described in [9] people are able to change perspectives during a task. Further they are often willing to accept a higher cognitive load if they feel that this may alleviate the cognitive load for their communication partners. Therefore we ask the observer to switch between both reference frames in the continuation of the description process.
3
Reconstruction
Assuming (for readability reasons) that the listener uses the terms north, northeast, east, southeast, south, southwest, west, and northwest as global directions in the reconstruction, he may choose the orientation north for object 1. Accordingly, the other relationships are translated. The information in which direction the observer moved enables the listener to follow the angle that the applied frame of reference has in relation to the underlying global frame of reference. For a smooth reconstruction, it is advantageous if the description is sorted. The position of an object is only described in relation to objects that have been mentioned within the description before. Incorporating an additional object into a configuration is done as follows. In the order the relationships of this object are given to other objects that already are reconstructed, the area of the new object is calculated by intersecting all the qualitative regions of the new object to all its reference objects. For instance is the estimated region for object 5 in figure 3a) the intersection of the regions north 1, northeast 2, northwest 3, and northwest 4 (printed in grey). Sometimes space has to be made between some already placed objects, for instance when the new object happens to be ’in the middle’ of them. For instance consider to insert object 8 into the configuration shown in figure 3b) using the relationships (8 southeast 5), (8 southwest 7), and (8 northeast 1). The intersection of the regions southeast 5, southwest 7 and northeast 1 contains no space. We can solve this problem by dividing all objects in the reconstruction in two groups,
822
H.J. Steinhauer / Object Configuration Reconstruction from Descriptions Using Relative and Intrinsic Reference Frames
or northeast of the moved object and are not moved, the object is southwest, south, or southeast of each of them. These regions are infinite to the south and the object will never leave them by moving southwards. All other objects are moved in the same way as the object itself and therefore its relationships to these objects does not change. Figure 3d) presents the result where object 8 has been inserted into the new obtained space. The procedure to obtain space in the horizontal dimension works accordingly. An object’s orientation is given by an arrow pointing in the object’s front edge or front corner. The representation of all objects aligned with the underlying global frame of reference allows the listener to draw objects into the reconstruction, whose orientation is unknown and to add the orientation later without need to redraw the object, or to change its frame of reference. Furthermore, it is necessary to apply the described reconstruction procedure.
move to next object
object inherits observer’s orientation
[first object]
[not first object] [moving direction rf, lf, rb, lb]
[moving direction sf, sb, ln, rn] establish underlying global FoR
change FoR type
4 build OCD by gaze tour
build OCD using momentarily absolute FoR
[unvisited objects left]
[all objects visited]
Figure 2. The route tour process.
1
4
1
4
6
3
b)
a)
1
2
5
7
5
4
1
3
4
4
Summary
We provide a technique to reconstruct, into an absolute frame of reference as seen from the survey perspective, an object configuration that has been described on site using only intrinsic and relative frames of reference. A set of eight basic relations is sufficient to describe eight positional object relations and allows for eight object orientations. On the one hand, the use of eight orientation classes seems natural for people; on the other hand, the use of eight orientation classes (as opposed to, for instance, four orientation classes) adds a higher cognitive load to the description process by making it necessary for the observer to switch between two different types of frame of reference. Decisions had to be made to what extent to manufacture an easy reconstruction process and to what extent to be responsive to psychological results of typical human behavior in object configuration description. Both components are important in order to develop a representation scheme that is usable by a person on each side of the process. Nevertheless, these two aims are conflicting. However, Tversky et al. [9] experienced that people accommodate the acceptable amount of inconvenience according to the cognitive load that the task requires of their communication partners. Therefore, it seems reasonable to balance the effort on both sides.
REFERENCES
[1] J. Fernyhough, A. G. Cohn, and D. C. Hogg, 'Constructing qualitative event models automatically from video input', in Image and Vision Computing, volume 18, pp. 81–103, (2000).
[2] A. U. Frank, 'Qualitative spatial reasoning with cardinal directions', in Seventh Austrian Conference on Artificial Intelligence, ed., H. Kaindl, Informatik Fachberichte, pp. 157–167, Wien, Austria, (September 1991).
[3] C. Freksa, 'Temporal reasoning based on semi-intervals', in Artificial Intelligence, volume 54, pp. 199–227, (1992).
[4] R. K. Goyal and M. J. Egenhofer, 'Similarity of cardinal directions', in SSTD 2001, ed., C. S. Jensen, LNCS 2121, pp. 36–55. Springer-Verlag Berlin Heidelberg, (2001).
[5] Alexander Klippel, 'Wayfinding choremes', in Spatial Information Theory: Foundations of Geographic Information Science, eds., W. Kuhn, M. F. Worboys, and S. Timpf, pp. 320–334, Berlin, (2003). Springer.
[6] Willem J. M. Levelt, Speech, Place and Action, chapter Cognitive Styles in the Use of Spatial Direction Terms, 251–268, John Wiley & Sons Ltd., 1982.
[7] Gerard Ligozat, 'Reasoning about cardinal directions', Journal of Visual Languages and Computing, 9(1), 23–44, (1998).
[8] A. Mukerjee and G. Joe, 'A qualitative model for space', in Proceedings of the AAAI, pp. 721–727, Boston, (1990).
[9] Barbara Tversky, Paul Lee, and Scott Mainwaring, 'Why do speakers mix perspectives?', Spatial Cognition and Computation, 1, 399–412, (1999).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-823
Probabilistic Reinforcement Rules for Item-Based Recommender Systems
Sylvain Castagnos and Armelle Brun and Anne Boyer1
Abstract. The Internet is constantly growing, proposing more and more services and sources of information. Modeling personal preferences enables recommender systems to identify relevant subsets of items. These systems often rely on filtering techniques based on symbolic or numerical approaches in a stochastic context. In this paper, we focus on item-based collaborative filtering (CF) techniques. We propose a new approach combining a classical CF algorithm with a reinforcement model to achieve better accuracy. We deal with this issue by exploiting probabilistic skewnesses in triplets of items.
1
INTRODUCTION
This paper focuses on recommender systems based on collaborative filtering (CF) techniques. CF algorithms provide personalization by exploiting the knowledge of a similar population and predicting the future interests of a given user (called the "active user") with regard to his/her known preferences. In practical terms, this kind of algorithm is broken down into three parts. Firstly, the system needs to collect data about all users in the form of explicit and/or implicit ratings. Secondly, this data is used to infer predictions, that is to say, to estimate the votes that the active user would have assigned to unrated items. Finally, the recommender system suggests to the active user the items with the highest estimated values. As the highest prediction values are the only ones of interest, we propose a new model that focuses on the prediction of high values, to improve accuracy. As the error on these values may be significant with a usual item-based CF algorithm, we propose to re-evaluate them using reinforcement rules. The latter are automatically inferred by selecting triplets of items in the dataset according to their joint probabilities. After a short state of the art, we propose a model combining a Classical Item-Based Algorithm (CIBA) with reinforcement rules. We call it the "Reinforced Item-Based Algorithm" (RIBA).
2
RELATED WORK
2.1
Notations
To help the reader, we introduce the following notations:
• U = {u1, u2, ..., un} is the set of the n users;
• I = {i1, i2, ..., im} is the set of the m items;
• Uk refers to the set of users who have rated the item ik;
• Ia is the list of items rated by the active user ua;
• v(j, k) is the vote of the user uj on the item ik;
• vmin and vmax are respectively the minimum and maximum values on the rating scale;
• vl and vd are the thresholds for liked and disliked items;
• īk is the average of all users' ratings on ik;
• s(k, t) is the similarity measure between ik and it;
• p(a, k) is the prediction of ua for item ik;
• pr(a, k) is the prediction of ua for ik with reinforcement rules.

1 LORIA - University Nancy 2, email: {sylvain.castagnos, armelle.brun, anne.boyer}@loria.fr
2.2
Classical Item-Based Algorithm
To supply the active user with information that is relevant to his/her concerns, the system first builds his/her profile in the form of a vector of item ratings. Profiles of all users are then aggregated in a user-item rating matrix, where each line corresponds to a user and each column to an item. Item-based CF is based on the observation that the consultation of a given item often leads to the consultation of another one [4]. To translate this idea, the system builds a model that computes the relationships between items. Most of the time, the model is generated by transforming the user-item matrix into an item-item matrix. This conversion requires the computation of similarities between items (i.e. columns of the user-item rating matrix). The active user's predictions are then computed by taking into account his/her known ratings, and the similarities between the rated items and the unrated ones. In this paper, we propose a model that can be plugged into an item-based collaborative filtering algorithm in order to refine some predictions. In this subsection, we present the Classical Item-Based Algorithm (CIBA) used as a base for our model. When implementing an item-based CF algorithm, the designer has to choose a pairwise similarity metric and a prediction formula. We decided to use the Pearson correlation coefficient, as the literature shows this similarity metric works better [4]. Consequently, we fill the item-item similarity matrix by applying equation 1 for each pair of items:

$$s(k,t) = \frac{\sum_{u_j \in U_k \cap U_t} (v(u_j, i_k) - \bar{i}_k)(v(u_j, i_t) - \bar{i}_t)}{\sqrt{\sum_{u_j \in U_k \cap U_t} (v(u_j, i_k) - \bar{i}_k)^2} \sqrt{\sum_{u_j \in U_k \cap U_t} (v(u_j, i_t) - \bar{i}_t)^2}} \quad (1)$$

We also compared different prediction formulas [2, 3]. We chose to adapt the weighted sum of the deviation from the mean, usually used in the user-based framework, to an item-based context (cf. formula 2). This formula leads to the highest accuracy:

$$p(a,k) = \max\left(v_{min},\; \min\left(\bar{i}_k + \frac{\sum_{i_t \in I_a} s(k,t) \times (v(a,t) - \bar{i}_t)}{\sum_{i_t \in I_a} |s(k,t)|},\; v_{max}\right)\right) \quad (2)$$
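As an illustration of equations 1 and 2, the following sketch computes the item means, the item-item Pearson similarities over co-raters, and the clamped prediction on a dense numpy rating matrix in which 0 encodes a missing vote. All names and the co-rater convention in the denominator sums are illustrative assumptions, not part of the original implementation.

import numpy as np

def item_means(R):
    """Mean rating of each item; R is (users x items), 0 = missing vote."""
    counts = np.maximum((R > 0).sum(axis=0), 1)
    return R.sum(axis=0) / counts

def similarity(R, means, k, t):
    """Pearson similarity s(k, t) over the co-raters U_k ∩ U_t (equation 1)."""
    both = (R[:, k] > 0) & (R[:, t] > 0)
    dk, dt = R[both, k] - means[k], R[both, t] - means[t]
    denom = np.sqrt((dk ** 2).sum() * (dt ** 2).sum())
    return float(dk @ dt / denom) if denom > 0 else 0.0

def predict(R, S, means, a, k, v_min=1, v_max=5):
    """Clamped prediction p(a, k) from equation 2."""
    I_a = np.flatnonzero(R[a] > 0)              # items rated by the active user
    den = np.abs(S[k, I_a]).sum()
    dev = (S[k, I_a] * (R[a, I_a] - means[I_a])).sum() / den if den > 0 else 0.0
    return float(max(v_min, min(means[k] + dev, v_max)))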
3
REINFORCED ITEM-BASED ALGORITHM
Our model, called the "Reinforced Item-Based Algorithm" (RIBA), is a combination of the Classical Item-Based Algorithm (CIBA) and probabilistic association rules that reinforce some predictions. This section describes how these two approaches are combined.
3.1
Probabilistic Reinforcement Rules
In standard item-based CF algorithms, similarities are computed between each neighbor item and the target item. We argue that, in some cases, pairwise similarities may be insufficient to explain the interest of a user in an item. We propose here to evaluate similarities of triplets, rather than pairs of items, before the prediction phase. A triplet is an association rule whose premise is made up of two terms. The conclusion is the reinforced item. To illustrate this statement, we can consider three items ik = "Cinderella", it = "Scary Movie", and iw = "Shrek". A user may have liked ik, which is a fairy tale, without appreciating iw. At the same time, a user who enjoys the horror film parody it would probably give iw a low rating. However, a film goer who likes both fairy tales and parodies will have fun watching Shrek. Let us introduce the following additional notations:
• Ik denotes the fact of liking ik, i.e. when v(j, k) ≥ vl;
• Īk is the fact of disliking ik, i.e. when v(j, k) ≤ vd;
• Ïk when ik has not been rated (by convention, the vote is equal to 0 in this case);
• Ĭk when ik has been rated (the vote is between vmin and vmax);
• P(Ik, It, Iw) is the probability of liking the three items ik, it, and iw;
• P(Ik, It | Ïw) is the probability of liking ik and it for users who have not rated iw;
• N(Ik, It, Ïw) is the number of users who have liked ik and it, and not rated iw.
Then a rule <Ik, It> ⇒ Iw means that Ik alone does not explain Iw, It alone does not explain Iw, but <Ik, It> together explain Iw. Let us note that 3 items can lead to up to 8 reinforcement rules, depending on whether each item appears positively or negatively, such as <Ik, It> ⇒ Iw or <Īk, It> ⇒ Īw.
3.2
Determination of the reinforcement rules
A triplet <ik, it, iw> is a candidate to be a reinforcement rule <Ik, It> ⇒ Iw if the similarities between each pair of its items are around the mean similarity. In that case, the resulting reinforcement rule could accurately impact Iw. Thus a triplet is a candidate if the following constraints are satisfied:

0 < tmin ≤ |s(k, t)| ≤ tmax < 1   (3)
0 < tmin ≤ |s(k, w)| ≤ tmax < 1   (4)
0 < tmin ≤ |s(t, w)| ≤ tmax < 1   (5)
where tmin and tmax respectively refer to the minimum and maximum similarity thresholds that will be set experimentally. For each reinforcement rule candidate, we compute the probability of the corresponding triplet. Thus, for each triplet <ik, it, iw>, we compute the joint probabilities P(Ik, It, Iw), P(Ik, Iw | Ït), and P(It, Iw | Ïk):

$$P(I_k, I_t, I_w) = \frac{N(I_k, I_t, I_w)}{N(\breve{I}_k, \breve{I}_t, \breve{I}_w)} \quad (6)$$

$$P(I_k, I_w \mid \ddot{I}_t) = \frac{N(I_k, \ddot{I}_t, I_w)}{N(\breve{I}_k, \ddot{I}_t, \breve{I}_w)} \quad (7)$$
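A minimal sketch of the candidate filtering (constraints 3-5) and of the probability computation (equations 6-7) is given below for the all-positive rule <Ik, It> ⇒ Iw; the seven signed variants are analogous. The `margin` factor, which operationalizes the "significantly higher" requirement of the selection conditions that follow (equations 8-9), is our assumption.

import numpy as np
from itertools import permutations

def mine_rules(R, S, v_l, t_min, t_max, margin=2.0):
    """R: (users x items) rating matrix with 0 = unrated; S: similarities."""
    rated = R > 0                    # Ĭ: the item has been rated
    liked = (R >= v_l) & rated       # I: the item is liked
    rules = []
    for k, t, w in permutations(range(R.shape[1]), 3):
        # Constraints (3)-(5): all pairwise similarities around the mean.
        if not all(t_min <= abs(S[i, j]) <= t_max
                   for i, j in ((k, t), (k, w), (t, w))):
            continue
        n_all = (rated[:, k] & rated[:, t] & rated[:, w]).sum()
        n_no_t = (rated[:, k] & ~rated[:, t] & rated[:, w]).sum()
        n_no_k = (~rated[:, k] & rated[:, t] & rated[:, w]).sum()
        if min(n_all, n_no_t, n_no_k) == 0:
            continue
        # Equations (6)-(7): joint probabilities of the triplet.
        p_ktw = (liked[:, k] & liked[:, t] & liked[:, w]).sum() / n_all
        p_kw = (liked[:, k] & ~rated[:, t] & liked[:, w]).sum() / n_no_t
        p_tw = (~rated[:, k] & liked[:, t] & liked[:, w]).sum() / n_no_k
        # Selection conditions (8)-(9), with `margin` as the cutoff.
        if p_ktw > margin * p_kw and p_ktw > margin * p_tw:
            rules.append(((k, t), w))
    return rules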
If this probability is significantly higher than the probability of each pair of its items, then this triplet is selected as a reinforcement rule. The reinforcement rule <Ik, It> ⇒ Iw is then generated when the following conditions are fulfilled:

$$P(I_k, I_t, I_w) \gg P(I_k, I_w \mid \ddot{I}_t) \quad (8)$$

$$P(I_k, I_t, I_w) \gg P(I_t, I_w \mid \ddot{I}_k) \quad (9)$$

3.3
Rating Refining Process
The generated reinforcement rules allow us to refine some predictions. For each prediction p(a, k), a rule is applicable if ik corresponds to the item in the conclusion and if the premises are valid. Each applicable rule associated with p(a, k) is assigned a weight w(r, a, k). This weight is equal to 1 when the conclusion of the rule is Ik, and w(r, a, k) = -1 if the conclusion of the rule is Īk. We call ARa,k the set of rules that can be applied for the computation of the prediction p(a, k). We refine the vote with the following equation:

$$pr(a,k) = p(a,k) + coef \times \frac{\sum_{r \in AR_{a,k}} w(r,a,k)}{\sum_{r \in AR_{a,k}} |w(r,a,k)|} \quad (10)$$

"coef" is the coefficient of refinement. The greater this coefficient is, the more important the refinement will be.
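Equation 10 reduces to a few lines of code; the sketch below assumes the applicable rules are already given as their weights w(r, a, k).

def refine(p_ak, weights, coef=0.5):
    """pr(a, k) from equation 10; `weights` are w(r, a, k) for r in AR_{a,k}:
    +1 for rules concluding I_k, -1 for rules concluding its negation."""
    if not weights:
        return p_ak
    return p_ak + coef * sum(weights) / sum(abs(w) for w in weights)

# e.g. two rules pushing towards "like", one towards "dislike":
# refine(3.4, [+1, +1, -1], coef=0.9) == 3.4 + 0.9 * (1 / 3) == 3.7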
4
CONCLUSION
In order to increase the quality of suggestions in recommender systems, we proposed a new approach combining an item-based collaborative filtering model with reinforcement rules. These rules are generated automatically by analyzing joint probabilities in triplets, and allow us to refine predictions of items for which pairwise similarities are not sufficient. The experiments show that this approach significantly improves the accuracy of high predictions. We validated our model using the MovieLens dataset (http://www.movielens.org/) and obtained an improvement of 6 to 8% on the High MAE measure [1].
REFERENCES
[1] Linas Baltrunas and Francesco Ricci, 'Dynamic item weighting and selection for collaborative filtering', in Workshop PriCKL07, in conjunction with the 18th European Conference on Machine Learning (ECML) and the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Warsaw, Poland, (September 2007).
[2] Sylvain Castagnos and Anne Boyer, 'A client/server user-based collaborative filtering algorithm: Model and implementation', in 4th Prestigious Applications of Intelligent Systems special section (PAIS 2006), in conjunction with the European Conference on Artificial Intelligence (ECAI 2006), Riva del Garda, Italy, (August 2006).
[3] Bradley N. Miller, Joseph A. Konstan, and John Riedl, 'Pocketlens: Toward a personal recommender system', in ACM Transactions on Information Systems, volume 22, pp. 437-476, (July 2004).
[4] Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John Riedl, 'Item-based collaborative filtering recommendation algorithms', in World Wide Web, pp. 285-295, (2001).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-825
An Efficient Behavior Classifier based on Distributions of Relevant Events Jose Antonio Iglesias and Agapito Ledezma and Araceli Sanchis1 and Gal Kaminka2 1
Introduction
Recognizing the behavior of others is a significant aspect of many different human tasks. In order to make a good decision, humans usually try to predict the behavior of others. We present an approach for automatically creating a model of the behavior of agents (software agents, robots or humans). Because sequence learning is a common form of human and animal learning, the observations of an agent are transformed into a sequence of atomic behaviors, which is statistically analyzed to find its corresponding behavior model. Before a behavior can be recognized, it needs to be modeled. Different techniques have been used for agent modeling in different areas: opponent modeling in soccer simulation [6], intelligent user interfaces [7], and virtual environments for training [8]. However, although much research focuses on agent modeling in a specific environment, it is not clear that these techniques can be used in other environments. The aim of this research is to provide a general framework which can represent and classify different agent behaviors in a wide range of domains. Also, as the actions performed by an agent are usually influenced by his past experiences, automated sequence learning is used for behavior classification.
2
ABCD: Agent Behavior Classifier based on Distributions of relevant events
Any behavior has a sequential aspect and this sequentiality should be considered in the modeling process. Our approach classifies an observed agent behavior into the classes (behaviors) stored previously in a library. Therefore, this process is divided into the following two parts:
2.1
Construction of Behavior Models
1. Obtaining Atomic Behavior Sequences: Useful features are extracted from the stream of observations of the environment and an ordered sequence of events is obtained. An event is an atomic behavior that occurs during a particular interval of time and defines a specific agent act. The type of events is domain-dependent.
2. Creating the behavior model: The temporal dependencies are very significant and, to get the most representative set of sequential events (subsequences) from the acquired sequence, the trie data structure [2] is used as in [3, 4]. The construction of a trie from a single sequence of events proceeds in three steps:
a) Segmentation of the sequence: This segmentation can be done by using some environment characteristic that separates the sequence into several subsequences of uninterrupted events, or by obtaining every possible ordered subsequence of a defined length.
b) Storage of the subsequences in a trie: The subsequences of events are stored in a trie, in which every node represents an event, and the node's children represent the events that have appeared following this event. Each node keeps track of the number of times an event has been inserted into it. The subsequence suffixes (subsequences that extend to the end of the sequence) are also inserted.
c) Creation of the behavior model: The trie is traversed to calculate the relevance of each subsequence. For this purpose, frequency-based methods are used and the relative frequency or support of a subsequence is calculated. Then, an agent behavior model is represented by the distribution of its subsequences.
3. Storing the model in the Library: Once a behavior model (distribution of relevant subsequences) is created, it is stored in the Library of Behavior Models (LibBM) (similar to the plan libraries used in plan recognition). This model is stored (with an identification name) as a trie for good and effective handling (Figure 1a).
1 Carlos III University of Madrid, Spain, {jiglesia, ledezma, masm}@inf.uc3m.es
2 Bar-Ilan University, Israel, galk@cs.biu.ac.il
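The following sketch illustrates steps a)-c) on a plain Python trie: sliding windows of a fixed length are taken from the event sequence, each window and its suffixes are inserted with counts, and the support distribution is read off the trie. Normalizing each count by the total count is our assumption about how the support is computed.

class TrieNode:
    """Trie node; `count` tracks how often its event was inserted here."""
    def __init__(self):
        self.count = 0
        self.children = {}

def insert_subsequences(root, events, length):
    """Steps a) and b): segment `events` into windows of `length` and store
    each window together with its suffixes in the trie."""
    for i in range(len(events) - length + 1):
        window = events[i:i + length]
        for j in range(length):                 # the window and its suffixes
            node = root
            for event in window[j:]:
                node = node.children.setdefault(event, TrieNode())
                node.count += 1

def support_distribution(root):
    """Step c): relative frequency (support) of every stored subsequence."""
    counts, stack = {}, [(root, ())]
    while stack:
        node, prefix = stack.pop()
        for event, child in node.children.items():
            seq = prefix + (event,)
            counts[seq] = child.count
            stack.append((child, seq))
    total = sum(counts.values()) or 1
    return {seq: c / total for seq, c in counts.items()}

# e.g. a UNIX command stream:
# root = TrieNode()
# insert_subsequences(root, ["ls", "cd", "ls", "vi", "ls", "cd"], 3)
# model = support_distribution(root)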
Figure 1. Agent Behavior Classification Process

2.2
Behavior Classification
The observations of the agent to classify are collected and the corresponding behavior model (represented by a distribution of events) is created. Then, it is matched against all the behavior models stored in LibBM. As both models are represented by a distribution of events, a statistical test is applied for matching these distributions. The proposed non-parametric test applied for matching two behaviors is a modification of the Chi-Square Test for two samples. The behavior model to classify is considered as an observed sample and all the behavior models stored in LibBM are considered as expected
samples. This test compares the observed distribution with all the expected distributions objectively and evaluates whether a deviation appears. The proposed test is the comparison of two sets of support values, in which the Chi-Square statistic is the sum of the terms (Exp - Obs)²/Obs (Figure 1b). With this comparison, a value (the comparing value) that indicates the difference (deviation) between the two distributions is obtained. The lower the value, the closer the similarity between the two behaviors. This comparison test is applied once for each behavior model stored in LibBM. The model which obtains the lowest deviation is considered the most similar one. An advantage of the proposed test is its speed, because only the observed subsequences are evaluated. However, there is no penalty for the expected relevant subsequences which do not appear in the observed distribution.
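A sketch of this matching step, assuming behavior models are dictionaries mapping subsequences to support values; the smoothing value used for expected subsequences that are missing is our assumption.

def comparing_value(observed, expected, eps=1e-6):
    """Modified two-sample Chi-Square: sum of (Exp - Obs)^2 / Obs over the
    observed subsequences only; `eps` replaces missing expected supports."""
    return sum((expected.get(seq, eps) - obs) ** 2 / obs
               for seq, obs in observed.items() if obs > 0)

def classify(observed, library):
    """Return the LibBM entry with the lowest deviation (closest behavior)."""
    return min(library, key=lambda name: comparing_value(observed, library[name]))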
3
Experiments
3.1
UNIX User Classification
In this domain, the behavior of a user is represented by the sequence of UNIX commands he/she typed during a period of time. We use 9 sets of preprocessed user data drawn from the command histories of 9 UNIX computer users [1]. Each UNIX user file is divided into: 1. Training Files: created with a small and random part of consecutive commands (100, 250, 500 and 850 commands) taken from the corresponding user file, creating 4 different LibBMs. These results are calculated using subsequences of size 6. 2. Testing Files: obtained from the other part of each given user file. 20 testing files with different amounts of commands (from 15 to 35) are evaluated. To evaluate the results, a value (the Classification Result Value) is calculated from the ranking list obtained for each classification. If the classification is done correctly, this value is the (positive) difference between the lowest and the second lowest value. If the classification is done incorrectly, to evaluate how far the obtained result is from the correct one, this value is calculated by comparing the lowest value with the obtained value (obtaining a negative value).
3.2
RoboCup Soccer Coach Simulation
The goal in this domain is to observe a game and recognize the behavior models (previously analyzed and stored in LibBM) followed by the opponent team members. For these experiments, we have used the rules from the RoboCup 2006 Coach Competition. The construction of models is done considering only the behavior followed by a few players (player behavior). However, the behavior to classify is the sum of several player behaviors (team behavior). The construction of models is done by analyzing several game log files (training files) in which different player behaviors are activated. The procedure to identify high-level events in a soccer game described by Kuhlmann et al. [5] is used. Then, a new game in which several player behaviors are activated at the same time (team behavior) is observed, and the activated player behaviors must be recognized. In these experiments, 17 player behaviors are analyzed (downloaded from the RoboCup 2006 Coach Competition web page) and stored in LibBM. The ranking list obtained (with the most likely player behaviors) is evaluated. Table 1 shows the first 10 elements of the ranking lists obtained for the 3 iterations of the first round. The number of player behaviors activated in each iteration is indicated in square brackets. The player behaviors are identified with a number (from 00 to 16) and the activated player behaviors are marked with an asterisk.
Table 1. Results for the RoboCup Coach Competition, Round 1

Iteration   Ranking list reported (most likely player behaviors)
Iter1 [4]   04(*), 16, 00(*), 12, 15(*), 03, 09, 05, 01, 06
Iter2 [5]   16(*), 01(*), 00, 13(*), 05, 09, 07(*), 03, 10, 08(*)
Iter3 [5]   04(*), 02(*), 13, 05, 12, 00(*), 01, 06(*), 03, 10

4
Conclusions and Future Works
A general approach which can represent and handle different behaviors in a wide range of domains is provided; it generalizes to any behavior that can be represented as a sequence of events. The experiments show that a system based on ABCD is very effective for classifying a UNIX user. For areas such as computer intrusion detection, these results are very encouraging. In the real-time and multi-agent domain, the results depend on the kind of behavior to recognize; however, the obtained results are satisfactory. As many agents change their behavior and their preferences over time, their models should be frequently revised to keep them up to date. This aspect could be addressed by using Evolving Systems. Also, the use of the classification results for carrying out effective actions in the environment is considered in our future work3.
Figure 2. Classification Results - User 5
Figure 2 shows the classification results of 20 different command sequences of a UNIX user. X-axis: length of the sequence to classify (from 15 to 35 commands). Y-axis: classification result value obtained by applying ABCD. The 4 lines show the results obtained using 4 different sizes of training files to create the tries of the LibBM: 100, 250, 500 and 850 commands. Each graph point is the average value of 25 different tests conducted. Although this average indicates that a sequence is correctly classified in most of the tests, the classification of the 25 tests is not always correct. The percentages of the 25 tests correctly classified using testing files of 20 commands are shown in Figure 2.
REFERENCES
[1] C.L. Blake, D.J. Newman, S. Hettich and C.J. Merz. UCI repository of machine learning databases, 1998.
[2] E. Fredkin, 'Trie memory', Comm. A.C.M., 3(9), 490-499, (1960).
[3] José Antonio Iglesias, Agapito Ledezma, and Araceli Sanchis, 'A comparing method of two team behaviours in the simulation coach competition', in MDAI, volume 3885 of LNCS, pp. 117-128. Springer, (2006).
[4] Jose Antonio Iglesias, Agapito Ledezma, and Araceli Sanchis, 'Sequence classification using statistical pattern recognition', in IDA, pp. 207-218, (2007).
[5] Gregory Kuhlmann, Peter Stone, and Justin Lallinger, 'UT Austin Villa 2003 simulator online coach team', in RoboCup 2003, (2004).
[6] Agapito Ledezma, Ricardo Aler, Araceli Sanchis, and Daniel Borrajo, 'Predicting opponent actions by observation', in RoboCup, (2004).
[7] Neal Lesh, Charles Rich, and Candace L. Sidner, 'Using plan recognition in human-computer collaboration', in UM99, pp. 23-32, (1999).
[8] M. Tambe and P. S. Rosenbloom, 'Resc: An approach for dynamic, real-time agent tracking', in IJCAI-95, Montreal, Canada, (1995).
Acknowledgments. This work has been supported by the Spanish Ministry of Education and Science under project TRA-2007-67374-C02-02.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-827
ContextAggregator: A heuristic-based approach for automated feature construction and selection
Robert Lokaiczyk and Manuel Goertz1
Abstract. Our research goal is to work towards personal context-aware assistance and the retrieval of relevant resources for computer users during a certain work task. This paper presents a general-purpose, algorithmic approach for automated context aggregation by heuristic-based feature construction. Our implementation of the context reasoning layer combines lower-level context features into new, aggregated higher-level context features. Our approach allows – in contrast to most other approaches – an automated feature combination to achieve a high prediction accuracy of the user's work task.
Introduction Recent work in personal information and knowledge management systems often focuses on context awareness [5] and task orientation [3, 9]. Which, given a determined work task, provide the user with suitable learning resources relevant for the current learning need in the current work task. A crucial factor for fulfilling the vision of in-place and in-time e-learning systems is the user’s context. Taking the user’s current task into account the systems are able to provide adaptive assistance and learning resources. Forms of workplaceintegrated learning support might be displaying a list of task-relevant documents in an enterprise environment. In [7] and [3] resources are determined by querying the (pre-modelled) semantic network given a description of the current work task. Consequently, our goal should be to determine the current work task of the user automatically only by means of available context information on the desktop and not by manual input by the user.
1
Context
We focus on knowledge-intensive work on the desktop of the computer worker. Therefore, we define Desktop Context – in accordance with [1] – as all measurable environmental settings that surround the user's desktop work. Technically, these settings are monitored by desktop context sensors that collect system events and user interaction with the workbench. The context sensors are implemented as software hooks that operate at operating-system level and log the data continuously. Thereby, the layer of context elicitation is completely transparent and unobtrusive to the user. The collected context events are encoded in a data stream which can be used as a feature stream for further processing. Whole tasks can be seen as slices of the event stream consisting of typical events correlating with a certain work task. The sequence of context events reflects the user's actions during the work process. Context events include keystrokes, application launches, the full text of
SAP Research, Darmstadt, Germany, email: firstname.lastname@sap.com
documents etc. Based on the user’s context information it is possible to predict the user’s work task.
2
Approach
The basic approach of treating the problem of task detection as a machine learning classification task is shown in [6]. Consequently, we only briefly summarize the key idea. First, a reasonable amount of training data is acquired by manual annotation of the work task by the user during his work process. The user selects from a limited set of tasks which are pre-modeled and typical for the work process within the involved organization. The selected task is annotated to the collected training material of work streams recorded with the context monitor. The task prediction algorithm, based on the learned model, automatically classifies the active tasks using continuously recorded event streams. Whenever the classifier detects a change in the user's work task, a new retrieval of task-relevant resources is triggered and our personal information assistant displays a new list of associated learning resources.
2.1
ContextAggregator Algorithm
This paper presents the idea of unsupervised context aggregation. Until now, most approaches to aggregating desktop events into more complex, meaningful units are manually handled by the user or previously modeled by domain experts. We differ by providing an unsupervised algorithm for context aggregation that takes the user out of the loop and does not depend on domain-specific knowledge. The fundamental idea is to combine desktop events into new events that are potentially more valuable features for work task prediction. Thereby, the mutual correlation between features is taken into account to increase the information gain for the prediction.
2.2
Aggregation Functions
The idea of the aggregation functions is basically to build predicates on new combinations of features that are considered potentially more valuable features. As a measurement of the impact, we use information gain [8], a common feature relevance measure from the data mining area. We propose an algorithm and a set of combination functions that appear to be very promising for our particular context aggregation problem. For combining features, we use an extensive set of functions that map a number of features (n) to a new feature (see equation 1):

$$f_i : F^n \rightarrow F \quad (1)$$

For our experiments, the set of functions used already turns out to deliver good results. But the extensibility of the algorithm with more
specific aggregation functions is definitely an advantage in order to receive even better results with domain-specific mapping functions.
2.3
Heuristic
To reduce both the computational complexity and the memory requirements, we apply some heuristic rules that prefer certain feature combinations and reject others. In particular, we use the following heuristics to reduce the complexity of context aggregation (a sketch of the resulting loop follows the list):
I) Filter ill-defined mappings of events. As an example, consider the function max(date, windowname), which is not defined.
II) Keep statistics on transformation functions that usually lead to increased information gain. Thus, the algorithm can prefer rules that are already known to improve the result on the particular domain.
III) Skip feature duplicates. We avoid those features by checking for duplicates within the already existing feature vectors. As an example, consider max(max(event)), which always reduces to max(event).
IV) Limit the stored feature set to a small subset of possible features. We keep only the topmost n features (ranked by information gain).
V) Skip feature combinations with low impact. For a potential improvement, the information gain of the feature combination should be at least above the maximum of the information gains of the involved features.
With this set of rules, the algorithm is quickly able to determine the most valuable feature combinations and will not take unimportant combinations into account.
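The following sketch applies heuristics I and III-V in one aggregation iteration over all feature pairs (the function statistics of heuristic II are omitted); the scoring and function interfaces are illustrative assumptions.

from itertools import combinations

def aggregate_once(features, info_gain, functions, top_n=50, seen=None):
    """features: name -> value vector; functions: name -> (applicable, apply);
    info_gain scores a value vector against the task labels."""
    seen = set() if seen is None else seen
    candidates = []
    for (na, fa), (nb, fb) in combinations(features.items(), 2):
        for fname, (applicable, apply_fn) in functions.items():
            if not applicable(fa, fb):          # I) filter ill-defined mappings
                continue
            name = f"{fname}({na},{nb})"
            if name in seen:                    # III) skip duplicates
                continue
            seen.add(name)
            new = apply_fn(fa, fb)
            gain = info_gain(new)
            # V) require the combination to beat both involved features
            if gain > max(info_gain(fa), info_gain(fb)):
                candidates.append((gain, name, new))
    candidates.sort(key=lambda c: c[0], reverse=True)
    for gain, name, values in candidates[:top_n]:   # IV) keep the top-n only
        features[name] = values
    return features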
3
Analysis of the Algorithms
First, we analyse the convergence of the ContextAggregator algorithm. As shown in Figure 1(a), the ContextAggregator algorithm usually converges very fast, after only a few iterations. In our experiments, there was no further strong improvement after about 5 iterations.
(a) Convergence of the Maximum Information Gain Growth (x-axis: number of iterations; y-axis: maximum IG growth)
(b) Boosted Task Prediction Accuracy (x-axis: number of iterations; y-axis: accuracy)
Figure 1. Evaluation of the Proposed Algorithm
For evaluation purposes (see Section 1), we collected context data together with annotated task labels during a work process (14 unique
users; 18 work hours). To measure the improvement brought by aggregating the context of the collected training material, we apply an n-fold cross-validation where n is the number of distinct users. We calculate the averaged performance metrics from the individual data segmentations. The separation of training data per user is necessary in order to prove that the knowledge learned from the training data is transferable to the held-out user, whose own training material is not in the particular training set. As classification algorithm we use Naive Bayes, since it has the theoretical minimum error rate in comparison to all other classifiers [4] and practical experiments indicate a good accuracy even if the independence assumption is violated [2]. In order to demonstrate the boosted accuracy with automatically derived higher-level context information, we compare the accuracy values of the prediction algorithm with context aggregation to those without. The context aggregation yields an increase in prediction accuracy, as can be seen in Figure 1(b). This result is significant at a confidence of 99%.
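The described per-user separation corresponds to leave-one-group-out cross-validation; a sketch with scikit-learn (a Gaussian Naive Bayes variant is assumed here, and data loading is left out):

from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def per_user_accuracy(X, y, user_ids):
    """Leave-one-user-out evaluation: the evaluated user's own material is
    never part of the training set, as required above."""
    scores = cross_val_score(GaussianNB(), X, y,
                             groups=user_ids, cv=LeaveOneGroupOut())
    return scores.mean()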
4
Summary
In this paper we propose a multi-purpose context aggregation algorithm based on heuristic rules that is able to construct more relevant features out of the large number of possible context events. Furthermore, we evaluate the algorithm on the data of a user study for the purpose of user task prediction and show a significant improvement over the basic non-aggregated version. By using a number of simple heuristics we are able to reduce the computational complexity and memory requirements of the aggregation algorithm.
REFERENCES
[1] Anind K. Dey. Understanding and using context, 2001.
[2] Pedro Domingos and Michael J. Pazzani, 'Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier', in International Conference on Machine Learning, pp. 105-112, (1996).
[3] Olaf Grebner, Uwe V. Riss, Ernie Ong, Marko Brunzel, Thomas Roth-Berghofer, and Ansgar Bernardi. Task management for the Nepomuk social semantic desktop (poster). 4th Conference on Professional Knowledge Management - Experiences and Visions, March 2007.
[4] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
[5] Angela Kessell and Christopher Chan, 'Castaway: a context-aware task management system', in CHI '06 extended abstracts on Human Factors in Computing Systems, pp. 941-946, New York, NY, USA, (2006). ACM.
[6] Robert Lokaiczyk, Andreas Faatz, Arne Beckhaus, and Manuel Görtz, 'Enhancing just-in-time e-learning through machine learning on desktop context sensors', in CONTEXT, eds., Boicho N. Kokinov, Daniel C. Richardson, Thomas Roth-Berghofer, and Laure Vieu, volume 4635 of Lecture Notes in Computer Science, pp. 330-341. Springer, (August 2007).
[7] H. Mayer, W. Haas, G. Thallinger, S. Lindstaedt, and K. Tochtermann. APOSDLE - Advanced Process-oriented Self-directed Learning Environment. Poster presented at the 2nd European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies, 30 November - 1 December 2005.
[8] Thomas M. Mitchell, Machine Learning, McGraw-Hill Higher Education, 1997.
[9] Jianqiang Shen, Lida Li, Thomas G. Dietterich, and Jonathan L. Herlocker, 'A hybrid learning system for recognizing user tasks from desktop activities and email messages', in IUI '06: Proceedings of the 11th International Conference on Intelligent User Interfaces, pp. 86-92, New York, NY, USA, (2006). ACM Press.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-829
A pervasive assistant for nursing and doctoral staff
Alexiei Dingli and Charlie Abela1
Abstract. The goal of health-care institutions is to provide patient-centric health care services. Unfortunately, this goal is frequently undermined due to human-related aspects. The PervasIve Nursing And docToral Assistant (PINATA) provides a patient-centric system powered with Ambient Intelligence techniques and Semantic Web technologies. Through PINATA, the movement of patients and medical staff is tracked via RFID sensors while an automated camera system monitors the interaction of people within their environment. The system reacts to particular situations autonomously by directing medical staff towards emergencies in a timely manner and providing them with just the information they require on their handheld devices. This ensures that patients are given the best care possible on a 24/7 basis, especially when the medical staff is not around.
1
INTRODUCTION
One of the main challenges faced by healthcare institutions is to maximize the available time that doctors and nurses spend with patients and to reduce mundane tasks such as form filling which, though important, inhibit the health worker's efficiency and effectiveness. Ambient Assisted Living (AAL) systems which make use of Ambient Intelligence (AmI) technologies can help to solve these problems and to provide personalized solutions such as in [2] and [4]. These systems can be used for various tasks such as monitoring the patient's stay in a hospital, tracking down medical records, monitoring diet, tracking movement and detecting incidents (such as falls). Back-end intelligent systems are required to analyse the feedback obtained through the different sensors located around the hospital and recommend a plausible course of action for the medical staff.
2
STATE OF THE ART
Ambient Intelligence (AmI) builds on three key technologies: ubiquitous computing, ubiquitous communication and intelligent user interfaces [2]. Ubiquitous computing means the integration of microprocessors into everyday objects like furniture, clothing, white goods, toys and even paint. Ubiquitous communication enables these objects to communicate with each other and with the user by means of ad-hoc wireless networking. Intelligent user interfaces enable people in the AmI environment to control and interact with the environment in a natural (voice, gestures) and personalised way (preferences, context) [1]. In AmI, people are empowered through a context-aware environment that is sensitive, adaptive and responsive to their needs, habits,
Department of Artificial Intelligence, Faculty of ICT, University of Malta, Malta, email: alexiei.dingli@um.edu.mt, charlie.abela@um.edu.mt
gestures and emotions. It is expected that by providing intelligent environments, quality and cost control can be improved and innovative intelligent personal health services can be developed. The five rights of patient care are often given as: right patient, right drug, right dose, right route and right time [8]. Through technologies such as RFID (Radio-Frequency Identification), it is possible to further integrate the digital and healthcare worlds to maintain those five rights and to join up care and processes. In [2], this technology was used to provide personalised visualisation of patients' information (including images) to doctors during a clinical session. In [4], there is an outline of an RFID model for designing a real-time hospital-patient management system. A pilot implementation was done in [3], which consisted of monitoring person and patient logistics in operating theatres, tracking and tracing of operating theatre materials, and tracking and tracing of blood products. In [9], it was predicted that RFID technology would play a very important role in the healthcare sector.
3
METHODOLOGY
PINATA is based upon a Service Oriented Architecture (SOA), similar to [7], and is composed of two main components (as can be seen in Figure 1): a Knowledge Brokering module (KBr) and a Device Manager (DM). The KBr consists of two main components, a Knowledge Base (KB) and an AmI module. The role of the AmI is to integrate the patients' information obtained through various sensors (after storing it inside the KB), analyse it and recommend a way forward. This module makes use of a number of domain-specific ontologies which have been crafted in consultation with various medical entities. The Patient Ontology is one such ontology. It is an electronic representation of the patients' records and describes patients' profiles in terms of various health-related information. The Medical Ontology is based on [5] and [6] and represents conceptual knowledge about clinical situations from three perspectives: clinical problems, investigations and recommendations. A set of rules is used to represent the decision-making logic of PINATA. The SOA approach was adopted to facilitate the integration of the patient-related data which typically resides in different hospitals or clinics. This approach allows the system to query the different organizations, get the data and collate it together, thus providing a unified view of the information for the KBr. Once all the information is inside the KB, the AmI infers new knowledge from the available information and sends it to the medical staff for immediate action. The DM handles the various devices connected to the system. It also serves as a communication gateway between the AmI and the medical staff. In the present hospital scenario, the patient has an
Figure 1. The PINATA Architecture

RFID tag embedded inside the wrist band. The various RFID readers around the hospital detect the movement of the patient and send the information to the DM and eventually to the KBr. This ensures that the patient's whereabouts are continuously known by the medical staff. Handheld devices are used to provide the staff with various types of information, including alerts (related to patients' medication schedules). These alerts are described in the Medical Ontology, and the web service responsible for keeping track of the patient's medications makes use of this knowledge when sending out the alert to the nurse's device. When a nurse is in the proximity of a patient, the handheld device reads the RFID tag and can automatically display the patient's information, again via the appropriate set of web services. PINATA makes use of a camera-based monitoring system similar to [10], which tracks the movement of patients through image processing and, in case of an emergency, alerts the nurse. To ensure that this system in no way presents a threat to the patient's privacy, images are not recorded by the cameras. A typical situation in which this system becomes important is that in which a patient faints and falls in his room. Information captured through the camera is collated and analysed by the KBr, which triggers an alert via the DM that is sent to the nurse. The RFID system is used to track the nurse who is in the closest vicinity to the patient in distress. The system also automatically uploads onto the nurse's handheld device all the information required for that particular context. In a typical situation, such as that in which the patient is suffering from anaphylactic shock due to some allergic reaction, the system is able to recommend to the nurse the best course of action. If the situation is deemed critical by the system (based upon various cues extracted from the environment and upon knowledge accumulated during past events), it will automatically escalate the problem and request reinforcements. Through the DM, PINATA can also interact with the surrounding environment and influence it. The KBr module constantly collates the various inputs from the sensors (obtained through the DM) and manages the status of the environment. This involves switching electrical equipment on/off autonomously or alerting the person about possibly dangerous situations. A typical situation is that in which a patient wakes up in the middle of the night to go to the bathroom. The KBr can distinguish between a movement in the bed (while the person is sleeping) and the actual action of getting out of the bed. In the latter case, the system can switch on the lights of the bathroom automatically and switch them off once the person returns to his/her own bed. When patients return to their homes, a basic version of PINATA can be installed in their homes. This is feasible due to the fact that PINATA is based around a SOA architecture. Thus it is possible to have cameras and sensors installed in the households while the processing and interpretation of the captured data is sent to the main hospital servers for continual monitoring. By doing so, the care provided by the hospitals can be extended to the community, thus making it possible for more patients to spend less time in hospitals and more time recovering in their homes. Once in their homes, PINATA can be further extended to handle other aspects of health care and safety in order to improve the quality of life.

4
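The reactive behavior described above can be pictured as a small condition-action loop inside the KBr. The sketch below is purely illustrative (the real system reasons over ontologies and web services); all names and interfaces are assumptions.

from dataclasses import dataclass
from typing import Callable

@dataclass
class AlertRule:
    condition: Callable      # does the incoming sensor event match?
    action: str              # recommendation pushed to the handheld device
    critical: Callable       # should the problem be escalated?

def on_event(event, rules, nearest_nurse, push):
    """Dispatch one sensor event; `nearest_nurse` stands in for the RFID
    lookup and `push` for the Device Manager gateway."""
    for rule in rules:
        if rule.condition(event):
            push(nearest_nurse(event["location"]), rule.action)
            if rule.critical(event):
                push("all_staff", "reinforcements at " + event["location"])

# e.g. a fall detected by the camera monitoring system:
rules = [AlertRule(condition=lambda e: e["type"] == "fall",
                   action="patient fell: check vitals",
                   critical=lambda e: e.get("unconscious", False))]
on_event({"type": "fall", "location": "room 12", "unconscious": True},
         rules, nearest_nurse=lambda loc: "nurse nearest to " + loc,
         push=lambda who, msg: print(who, "<-", msg))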
CONCLUSION
Even though PINATA is still a prototypical system and more work needs to be done, the results obtained from the system are encouraging. Patients quickly got used to it, and the medical staff understood its potential and are now exploring new possibilities with our help. The beauty of the whole system is that it makes use of rather cheap technology which is readily available, but which is controlled by a powerful brain. The KBr module is capable of integrating information obtained from various sources, reasoning over it and deciding on the best strategy. This has shown us that the time is ripe to fuse intelligent systems with the real world, and this fusion is unleashing possibilities never thought of before in the field of personal health care and safety.
ACKNOWLEDGEMENTS This work was carried out within the PINATA project, funded by the Malta Council for Science and Technology (http://www.mcst.org.mt) and done in collaboration with St.James Hospital Malta (http://stjameshospital.com). The project was also supported by the Ministry of Technology (http://www.miti.gov.mt).
REFERENCES
[1] J. Ahola, 'Ambient intelligence: Plenty of challenges by 2010', in EDBT '02: Proceedings of the 8th International Conference on Extending Database Technology, p. 14, London, UK, (2002). Springer-Verlag.
[2] J. Bravo, R. Hervas, G. Chavira, and S. Nava, 'RFID-sensor fusion: An experience at clinical sessions', PTA2006 Workshop, (2006).
[3] Capgemini, 'Gaining solid results with RFID in healthcare', (2007).
[4] B. Chowdhury and R. Khosla, 'RFID-based hospital real-time patient management system', in ACIS-ICIS, pp. 363-368, (2007).
[5] M. J. Field and K. N. Lohr, 'Clinical practice guidelines: directions for a new program', National Academy Press, Institute of Medicine, Washington, DC, (1990).
[6] S. Hussain and S. Abidi, 'Ontology driven CPG authoring and execution via a semantic web framework', HICSS-40, Hawaii, (2007).
[7] V. Issarny, D. Sacchetti, F. Tartanoglu, F. Sailhan, R. Chibout, N. Levy, and A. Talamona, Developing Ambient Intelligence Systems: A Solution based on Web Services, Springer Netherlands, 2005.
[8] C. Jervis, 'Tag team care: RFID could transform healthcare', e-Health Insider, (2005).
[9] J. Reiner and M. Sullivan, 'RFID in healthcare: a panacea for the regulations and issues affecting the industry?', (2005).
[10] L. Snidaro, C. Micheloni, and C. Chiavedale, 'Video security for ambient intelligence', IEEE Transactions on Systems, Man and Cybernetics, Part A, (2005).
5. Natural Language Processing
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-833
Author Identification Using a Tensor Space Representation
Spyridon Plakias and Efstathios Stamatatos1
Abstract. Author identification is a text categorization task with applications in intelligence, criminal law, computer forensics, etc. Usually, in such cases there is a shortage of training texts. In this paper, we propose the use of second order tensors for representing texts for this problem, in contrast to the traditional vector space model. Based on a generalization of the SVM algorithm that can handle tensors, we explore various methods for filling the matrix of features, taking into account that similar features should be placed in the same neighborhood. To this end, we propose a frequency-based metric. Experiments on a corpus controlled for genre and topic, with a variable amount of training texts, show that the proposed approach is more effective than the traditional vector-based SVM when only a limited amount of training texts is used.
1
INTRODUCTION
Author identification deals with the assignment of a text of unknown authorship to one author, given a set of candidate authors for whom text samples of undisputed authorship are available. The plethora of available electronic texts (e.g., e-mail messages, online forum messages, blogs, source code, etc.) indicates a wide variety of applications in areas such as intelligence, criminal law, computer forensics, etc. [1] From a machine learning point of view, author identification can be viewed as a multi-class single-label text categorization (TC) task. Actually, several studies on TC use this problem as one more testing ground, together with other tasks such as topic identification, language identification, genre detection, etc. [6] However, there are some important characteristics of author identification that distinguish it from other TC tasks. In particular, in style-based TC the most important factor for selecting features is their frequency [4]. On the contrary, in topic-based TC the most frequent words are excluded since they carry no semantic information. Moreover, in the typical applications of author identification there is usually a shortage of training texts for the candidate authors. This holds for both the amount and the length of the training texts. Therefore, it is crucial for author identification methods to be able to handle limited training texts effectively. The vast majority of TC methods use a vector-based representation of texts. Traditionally, a bag-of-words approach provides several thousands of lexical features. Alternatively, character-based features (character n-grams) can be used. The latter have provided very good results in author identification experiments despite the fact that they considerably increase the dimensionality of the representation [5]. Especially in the case of short texts, such a representation will produce very sparse data. Powerful machine learning algorithms such as support vector machines (SVM) can effectively handle such high-dimensional and
Dept. of Information and Communication Systems Eng., University of the Aegean, 83200 – Karlovassi, Greece, email: stamatatos@aegean.gr
sparse data. However, when only a few training instances are available, such algorithms are less effective. In this paper, we propose the use of a tensor space representation for author identification tasks in order to cope with the problem of limited training texts. That is, instead of representing a text as a vector, we represent it as a matrix. Using a tensor of second order, the dimensionality of the text representation remains high, but the classification algorithm has to learn far fewer parameters. As a result, it can better handle cases with very limited training instances. To this end, we use a generalization of the SVM algorithm that can handle tensors instead of vectors [3]. In contrast to the vector model, the position of each feature within the matrix is important, since relevant features should be placed in the same row or column. Therefore, we examine several techniques for filling the representation matrix so that relevant features are placed in the same neighbourhood. A set of experiments on a corpus controlled for genre and topic shows that when multiple short training texts are available the SVM model is the most effective. However, when only a limited amount of short training texts is available, the tensor model produces better results.
2
THE TENSOR-BASED MODEL
In a vector space model, a text is considered as a vector in R^n, where n is the number of features. A second order tensor model considers a text as a matrix in R^(x×y), where x and y are the dimensions of the matrix. A vector x ∈ R^n can be transformed into a second order tensor X ∈ R^(x×y) provided n ≤ x·y. A linear classifier in R^n (e.g., SVM) can be represented as a^T x + b, that is, there are n+1 parameters to be learnt (b, a_i, i=1,...,n). Similarly, a linear classifier in R^(x×y) can be represented as u^T X v + b, that is, there are x+y+1 parameters to be learnt (b, u_i, i=1,...,y, v_j, j=1,...,x). Consequently, the number of parameters is minimized when x=y, and this is much lower than n. Therefore, the tensor space representation is more suitable in cases with limited training sets. To be able to handle tensors instead of vectors, we use a generalization of SVM, called support tensor machines (STM) [3]. This algorithm works iteratively. First, it sets u=(1,...,1)^T. Then, it solves a standard SVM optimization problem to compute an estimation of v. Once v is estimated, it solves another standard SVM optimization problem to estimate a new u. The procedure of calculating new values for u and v is repeated until they converge. It is obvious that the tensor-based model takes into account associations between the features. Each feature is strongly associated with the features that are in the same row and column. It is, therefore, crucial to place relevant features in the same neighbourhood. In conclusion, to suitably transform a vector representation into a second order tensor representation, one has to define what features are considered relevant and how relevant features are placed in the same neighbourhood.
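A sketch of this alternating procedure: with u fixed, u^T X v + b is linear in v, so each half-step is a standard linear SVM fit (and vice versa). The per-iteration rescaling of the regularization parameter used in [3] is omitted here, and the scikit-learn solver is our choice, not the original implementation.

import numpy as np
from sklearn.svm import LinearSVC

def train_stm(X, y, iterations=10, C=0.1):
    """X has shape (n_samples, x, y); labels y are binary."""
    n, x_dim, y_dim = X.shape
    u = np.ones(x_dim)                                       # u = (1, ..., 1)^T
    for _ in range(iterations):
        svm_v = LinearSVC(C=C).fit(X.transpose(0, 2, 1) @ u, y)  # solve for v
        v = svm_v.coef_.ravel()
        svm_u = LinearSVC(C=C).fit(X @ v, y)                     # solve for u
        u = svm_u.coef_.ravel()
    return u, v, svm_u.intercept_[0]

def decision(X_i, u, v, b):
    """Classify one text tensor by the sign of u^T X v + b."""
    return u @ X_i @ v + b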
In this paper, we consider the frequency of occurrence as the factor that determines relevance among features [4]. In a binary classification case, where we want to discriminate author A from author B, the relevance r(x_i) of a feature x_i is:

$$r(x_i) = \frac{f_A(x_i) - f_B(x_i)}{f_A(x_i) + f_B(x_i) + b}$$
where f_A(x_i) and f_B(x_i) are the relative frequencies of occurrence of feature x_i in the texts of author A and B, respectively, and b is a smoothing factor. The higher the r(x_i), the more important the feature x_i is for author A. Similarly, the lower the r(x_i), the more important the feature x_i is for author B. In order to fill the matrix with the features, taking into account the just-defined relevance of features, we examined three techniques (an example of each case is shown in figure 1):
Vertical: the columns of the matrix are filled with decreasing relevance values. Hence, the first columns of the tensor will be strongly associated with author A and the last columns with author B. On the other hand, the rows of the matrix contain features of mixed importance for the two authors.
Diagonal: we start from the upper left corner of the matrix and fill diagonals with decreasing relevance values. Hence, the upper left part of the matrix will be strongly associated with author A and the lower right part with author B. That way, the first rows and columns are mainly associated with author A while the last rows and columns are associated with author B.
Hilbert: we use the Hilbert space filling curve [2]. Examples of such curves are shown in figure 2. This technique produces small neighbourhoods of relevant features, but any row or column contains features of mixed importance.

Figure 1. Three different techniques to transform a vector to a second order tensor. The vector features are sorted with decreasing relevance r. For the vector (1, ..., 9), Vertical yields [1 4 7; 2 5 8; 3 6 9], Diagonal yields [1 3 6; 2 5 8; 4 7 9], and Hilbert yields [4 3 2; 5 6 1; 8 7 9].

Figure 2. Examples of the Hilbert space filling curve.
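A sketch of the relevance metric and of the vertical filling technique; the diagonal and Hilbert variants differ only in the mapping from rank position to matrix cell. Names and the zero-padding of unused cells are assumptions.

import numpy as np

def relevance(freq_a, freq_b, b=1.0):
    """r(x_i) for every feature: positive values favor author A,
    negative values favor author B."""
    return (freq_a - freq_b) / (freq_a + freq_b + b)

def vertical_fill(vector, r, x=50, y=50):
    """Sort features by decreasing relevance and fill the matrix column by
    column, so the first columns are associated with author A and the last
    columns with author B; unused cells stay zero."""
    order = np.argsort(-r)                      # indices by decreasing r
    M = np.zeros((x, y))
    for pos, feat in enumerate(order[: x * y]):
        M[pos % x, pos // x] = vector[feat]     # fill columns top-down
    return M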
4
EXPERIMENTS
The corpora used for evaluation in this study consist of newswire stories in English taken from the publicly available Reuters Corpus Volume 1 (RCV1). The top 10 authors with respect to the amount of texts belonging to the topic class CCAT (about corporate and industrial news) were selected. Therefore, this corpus of short texts is controlled for genre and topic, in the hope that the main factor distinguishing the texts will be authorship. Three versions of this corpus were formed using 50, 10 or 5 training texts per author, respectively. In all cases, the test corpus comprises 50 texts per author, not overlapping with the training texts. To represent the texts we used a character n-gram approach. Thus, the feature set consists of the 2,500 most frequent 3-grams of the training corpus. A standard SVM model was built using the vector of 2,500 features. Moreover, the tensor model was based on a 50x50 matrix. For each space filling technique (vertical, diagonal, and Hilbert) we built a STM model. Note that since we deal with a multi-class author identification task, we followed a one vs. one approach, that is, for each pair of authors a STM model was built and the space filling technique was based on the feature relevance for that pair of authors. Based on preliminary experiments, we set the C parameter of SVM to 1, the corresponding parameter for STM models to 0.1, and the smoothing parameter b equal to 1. The comparison of the performance of the SVM and STM models can be seen in Table 1. Although SVM is superior when multiple training texts are available, the STM model based on vertical space filling provides better results when the training corpus is limited.

Table 1. Performance of SVM and STM models (training texts per author).

Method          50      10      5
SVM             80.8%   64.4%   48.2%
STM-Vertical    78.0%   68.0%   51.2%
STM-Diagonal    75.6%   60.8%   47.6%
STM-Hilbert     76.6%   66.6%   46.0%

5
CONCLUSION
In this paper, we presented a tensor-based model for the author identification problem. The proposed approach is more effective than SVM when only a limited amount of training texts is available. We used frequency as the criterion of feature relevance and examined several space filling techniques to form the feature matrix so that relevant features are placed in the same neighbourhood. The vertical method seems to provide the best results for limited training corpora. This technique produces some subsets of features (columns of the matrix) that are strongly associated with the authors, as well as other subsets (rows) that contain features of mixed importance for the authors. Further experiments should be conducted to verify this promising result. Moreover, more complex space filling techniques can be tested to provide even better results.

REFERENCES
[1] A. Abbasi and H. Chen, 'Applying Authorship Analysis to Extremist-Group Web Forum Messages', IEEE Intelligent Systems, 20(5), 67-75, (2005).
[2] A.R. Butz, 'Alternative Algorithm for Hilbert's Space Filling Curve', IEEE Trans. on Computers, 20, 424-42, (1971).
[3] D. Cai, X. He, J.R. Wen, J. Han, and W.Y. Ma, Support Tensor Machines for Text Categorization, Technical report UIUCDCS-R2006-2714, University of Illinois at Urbana-Champaign, (2006).
[4] M. Koppel, N. Akiva, and I. Dagan, 'Feature Instability as a Criterion for Selecting Potential Style Markers', Journal of the American Society for Information Science and Technology, 57(11), 1519-1525, (2006).
[5] E. Stamatatos, 'Ensemble-based Author Identification Using Character n-grams', Proc. of the 3rd International Workshop on Text-based Information Retrieval, 41-46, (2006).
[6] D. Zhang and W.S. Lee, 'Extracting Key-substring-group Features for Text Classification', Proc. of the 12th Annual SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 474-483, (2006).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-835
835
Categorizing Opinion in Discourse
Nicholas Asher, Farah Benamara1 and Yvette Yannick Mathieu2
1 IRIT-CNRS, France, email: {asher, benamara}@irit.fr
2 LLF-CNRS, France, email: yannick.mathieu@linguist.jussieu.fr

1 Categorizing Opinions
While research in the field of opinion analysis has focused on determining the orientation of opinion words in various lexical categories, almost no work to date has investigated the effects of rhetorical relations on the expression of opinion. We present a preliminary study for a discourse-based opinion categorization and propose a new annotation scheme for a fine-grained contextual opinion analysis using discourse relations. This study uses a lexical semantic analysis of opinion-conveying expressions, based on the research of Wierzbicka [1], Levin [3] and Mathieu [4], coupled with an analysis of how clauses involving these expressions are related to each other within a discourse. Rather than providing a definition of opinion, we study how affective content is explicitly and lexically expressed in written texts. An opinion expression belongs to one of our top-level categories: REPORTING, JUDGEMENT, ADVISE and SENTIMENT.

In the REPORTING group, opinions are often expressed as the objects of verbs used to report the speech and opinions of others. These verbs convey the degree of the holder's commitment to the opinion being presented, and some provide, at least indirectly, a judgment by the author on the opinion expressed. The opinion polarity is given by the verbs' complements. This category contains three subgroups according to the degree of commitment and the degree of veracity concerning the information in their complements. In the first subgroup, we find verbs that introduce information that (a) the author takes as established (the INFORM group) or that (b) the holder is strongly committed to (the ASSERT group). The second subgroup contains (c) the TELL group. Unlike ASSERT verbs, TELL verbs do not convey strong commitments of the subject to the embedded content; unlike INFORM verbs, they do not convey anything about the author's view of the embedded content. Finally, the last subgroup introduces an opinion with a certain degree of subjectivity. It contains (d) the THINK group verbs, which express the fact that the subject has a strong commitment to the complement of the verb, and (e) the GUESS group verbs, which express a weaker commitment on the part of the agent. The veracity of the information from (d) is stronger than that from (e).

The JUDGEMENT group involves words that express a positive or negative assessment of something or someone. It includes verbs, nouns and adjectives. We consider two subgroups: judgments referring to a system of social norms - (f) the BLAME group and (g) the PRAISE group - and judgments referring to personal norms - (h) the APPRECIATION group.

ADVISE expressions urge the reader to adopt a certain course of action or opinion. We find here (i) the RECOMMEND group, which expresses a good/bad opinion and a stronger push for some course of action, (j) the SUGGEST group, used to say what the writer suggests or speculates on without being absolutely certain, and finally (k) the HOPE group, which expresses the wish that some desire will be fulfilled. Expressions in (i) are stronger than those in (j) and (k), and expressions in (k) are the weakest.

Finally, words in the SENTIMENT group express an attitude toward something usually based on feeling or emotion rather than reasoning. They have a polarity as well as a strength. We distinguish here between positive sentiments, expressed by words in the CALM DOWN, ENTERTAIN, JOY, LOVE and FASCINATE groups, and negative sentiments, expressed by words in the ANGER, BORE, OFFENSE, SADNESS, FEAR, HATE and DISAPPOINT groups. Some groups, such as ASTONISHMENT and TOUCH, generally express a neutral polarity, although the polarity and the strength are given by the context.
2 Rhetorical relations between clauses containing opinion expressions
The rhetorical structure (RS) is an important element in understanding the opinions conveyed by a text. Our four opinion categories are used to label opinion expressions within a discourse segment. Using the discourse theory SDRT [2] as our formal framework, we define a basic segment as a clause containing an opinion expression, or a sequence of clauses that together bear a rhetorical relation to a segment expressing an opinion. We have segmented conjoined NPs or APs into separate clauses: for instance, "the film is beautiful and powerful" is taken to express two segments, "the film is beautiful" and "the film is powerful". Segments are then connected to each other using a small subset of "veridical" discourse relations. For example, there are three opinion segments in the following sentence S: [Even if the product is excellent]a, [the design is very basic]b, [which is disappointing in this brand]c. There is a CONTRAST relation between a and b that reinforces the sentiment expressed in segment c.

We use five types of rhetorical relations. CONTRAST and CORRECTION indicate a difference of opinion. CONTRAST(a, b) implies that a and b are both true but there is some defeasible implication of one that is contradicted by the other, whereas CORRECTION(a, b) involves a stronger opposition and implies that b is true while a is false. To find these relations in text, we use specific discourse markers, such as although, but, etc. for CONTRAST and protest, deny, etc. for CORRECTION. EXPLANATION(a, b), marked by because, indicates that b offers a (typically sufficient) reason for a. ELABORATION(a, b), marked by for example or in particular, implies that b gives more details on what was expressed within a. We have merged EXPLANATION and ELABORATION into a single relation called SUPPORT, as both of these relations are used to support opinions. RESULT(a, b), indicated by markers like so and as a result, indicates that b is a consequence or result of a. Finally, CONTINUATION(a, b) means that a and b form part of a larger thematic whole. For example, the RS of S is RESULT(CONTRAST(a, b), c). We also took account of disjunctions, conditionals and negations in evaluating opinions.
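As a rough illustration of the marker-based detection step, the sketch below maps cue words to the relation types used above (with EXPLANATION and ELABORATION already merged into SUPPORT). The cue lists beyond the examples given in the text, the default to CONTINUATION, and the naive substring matching are all assumptions; a real detector would also need syntactic context.

```python
CUE_WORDS = {
    "CONTRAST":   ["although", "but", "even if"],
    "CORRECTION": ["protest", "deny"],
    "SUPPORT":    ["because", "for example", "in particular"],
    "RESULT":     ["as a result", " so "],
}

def detect_relation(segment):
    """Return the rhetorical relation signalled by a discourse marker in
    the segment, defaulting to CONTINUATION when no cue word is found
    (an assumed fallback, not a rule from the paper)."""
    text = segment.lower()
    for relation, cues in CUE_WORDS.items():
        if any(cue in text for cue in cues):
            return relation
    return "CONTINUATION"
```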
3 A Semantic Representation
We represent each opinion word that belongs to a category with a shallow semantic feature structure (FS) that associates with a segment: the category it belongs to, the opinion holder, the opinion topic, the opinion expressions that enable us to identify the segment, and the associated modality. A modality is defined as a degree of preference (Pref) for expressions in the ADVISE category, as a combination of a degree of commitment (C) and a strength for expressions in the REPORTING category, or as a combination of a polarity and a strength for expressions from the JUDGMENT and SENTIMENT categories. For example, the groups (a) and (b) are associated with the modality C1, the group (c) with C2, and the groups (d) and (e) with C3, such that C1 ≥ C2 ≥ C3. Simple scalar dimensions are used to represent strength: the values 2, 1 and 0 mean that the expression has a strong, a medium or a low strength, respectively. When verb arguments contain an opinion expression, we add an attribute to the FS describing the content of the opinion expressions introduced by the verb. This attribute is mainly used for verbs in the REPORTING group. For example, the segment [The French presidency confirmed congratulations sent to Vladimir Putin] is represented as:
[ Category:     [reporting: Assert]
  Modality:     [commitment: C1, strength: 1]
  Holder (1):   The French presidency
  Opinion word: confirmed
  Content (2):  [ Category:     [judgment: praise]
                  Modality:     [polarity: positive, strength: 1]
                  Holder:       (1)
                  Topic:        Vladimir Putin
                  Opinion word: congratulations ]
  Topic:        (2) ]
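For illustration only, the same feature structure can be written as a nested map; the field names and the flattening of the coindexation tags (1) and (2) are our own choices, not the authors' representation:

```python
segment_fs = {
    "category": ("reporting", "assert"),
    "modality": {"commitment": "C1", "strength": 1},
    "holder": "The French presidency",               # tag (1)
    "opinion_word": "confirmed",
    "content": {                                     # tag (2)
        "category": ("judgment", "praise"),
        "modality": {"polarity": "positive", "strength": 1},
        "holder": "The French presidency",           # coindexed with (1)
        "topic": "Vladimir Putin",
        "opinion_word": "congratulations",
    },
    "topic": "content",  # the verb's topic is the embedded opinion (2)
}
```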
Discourse relations tell us how to combine the various opinions, using a set of dedicated combination rules. SUPPORT strengthens the opinion in the first constituent. CONTINUATION strengthens the polarity of the common opinion. RESULT strengthens the polarity of the opinion in the second argument. For CONTRAST, we distinguish two cases. If the two arguments are opinion segments, then the CONTRAST weakens the polarity of the first argument. If one of the arguments bears a rhetorical relation with the other argument, then the CONTRAST strengthens the opinion polarity, as in: [[I am an atheist], but [I totally agree with the priest]].
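A minimal sketch of these combination rules, treating an opinion as a (polarity, strength) pair on the paper's 0-2 strength scale; the clipping behaviour, the unit increments, and the handling of the second CONTRAST case are assumptions:

```python
def strengthen(op):
    polarity, strength = op
    return polarity, min(strength + 1, 2)

def weaken(op):
    polarity, strength = op
    return polarity, max(strength - 1, 0)

def combine(relation, a, b, b_subordinate=False):
    """Apply the combination rule for a rhetorical relation R(a, b),
    returning the adjusted opinions of both arguments."""
    if relation == "SUPPORT":        # strengthens the first constituent
        return strengthen(a), b
    if relation == "CONTINUATION":   # strengthens the common opinion
        return strengthen(a), strengthen(b)
    if relation == "RESULT":         # strengthens the second argument
        return a, strengthen(b)
    if relation == "CONTRAST":
        # Two plain opinion segments: weaken the first argument;
        # otherwise the contrast strengthens the opinion polarity.
        return (a, strengthen(b)) if b_subordinate else (weaken(a), b)
    return a, b
```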
4 Annotation Methodology and Preliminary Results
We annotated three different types of online corpora, written in French and English: movie reviews (M), letters to the editor (L) and news reports (N). M were taken from Telerama, AlloCine and movies.go.com, L from La Depeche du Midi and The San Francisco Chronicle, and N from Le Monde, 20 Minutes and the MUC 6 news corpus. We randomly selected 150 articles for the French corpora (around 50 articles for each genre). Two native French speakers annotated around 546 and 589 segments, respectively. To check the cross-linguistic feasibility of generalisations made about the French data, we also annotated opinion categories for English: around 30 articles from M and L. For N, the annotation in English was considerably helped by using texts from the MUC 6 corpus (186 articles), which were annotated independently with discourse structures by three annotators in the University of Texas's DISCOR project (NSF grant IIS-0535154); the annotation of our opinion expressions involved a collapsing of the structure proposed in DISCOR.
Our lexicon is then extended during the annotation process. So far, we have categorized 200 verbs, 160 nouns and 195 adjectives for French, and 187 verbs, 150 nouns and 170 adjectives for English. For each corpus, annotators annotate elementary discourse segments, define their shallow semantic representation, and then connect the discourse segments using the set of rhetorical relations we have identified. The average distribution of opinion expressions in our corpus across our categories, for French and English, is shown in Table 1.

Table 1. Distribution of categories by each annotator (French / English).

Groups      Movie (%)      Letters (%)    News (%)
Reporting    2.67 /  2.12  14.80 / 13.34  43.91 / 42.85
Judgment    60.53 / 40.52  52.50 / 73.34  39.23 / 33.34
Advise       6.92 / 10.63  10.05 / 13.34   7.27 /  9.52
Sentiment   27.30 / 34.04  33.08 /  2.67  11.35 / 16.67
Opinions in N principally involve reported speech. As we only annotated segments that clearly expressed opinions or were related via one of our rhetorical relations to a segment expressing an opinion, our annotations typically covered only a fraction of each document. The press articles were the hardest to annotate and generally contained many embedded structures introduced by REPORTING-type verbs, as well as negations. To compute the inter-annotator agreement (IAG), we chose to focus, as a first step, only on agreement on opinion categorization, segment identification and rhetorical relations. We computed the IAG only on the French corpus; we obtained a kappa of 95% on opinion categorization.
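For reference, a kappa figure of this kind can be computed as follows (standard Cohen's kappa over two annotators' category labels; whether the authors used exactly this variant is not stated in the paper):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance given each
    annotator's label distribution."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c]
              for c in count_a.keys() | count_b.keys()) / n ** 2
    return (p_o - p_e) / (1 - p_e)
```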
5 Conclusions and Future Work
We think that refined categories are needed to build a more nuanced appraisal of opinion expressions in discourse. The preliminary evaluations of our annotations have shown the validity of the categorization of opinions we proposed. We are able to calculate an overall global opinion on a topic in a principled way, by taking account of logical and discourse structure. In future research, we plan to (1) extend our annotation scheme to other types of corpora and deepen our opinion typology, (2) compute IAG on the opinion holder, topics, modality and polarity, (3) characterize each discourse segment with a deep semantic representation, and (4) compare our annotation scheme to the MPQA one. In terms of automation, we plan first to exploit a syntactic parser to get the argument structure of verbs, and then to use a discourse segmenter like the one developed in the DISCOR project, followed by the detection of discourse relations using cue words. This will allow us to use the deep semantic analysis to provide a classification of texts according to their opinions on various topics, and to compare this approach to the bag-of-words approach.
REFERENCES
[1] A. Wierzbicka, Speech Act Verbs, Academic Press, Sydney, 1987.
[2] N. Asher and A. Lascarides, Logics of Conversation, Cambridge University Press, 2003.
[3] B. Levin, English Verb Classes and Alternations: A Preliminary Investigation, University of Chicago Press, 1993.
[4] Y. Y. Mathieu, 'A Computational Semantic Lexicon of French Verbs of Emotion', in Shanahan, G., Qu, Y., Wiebe, J. (eds.): Computing Attitude and Affect in Text, Springer, Dordrecht, The Netherlands, 2004.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-837
837
A Dynamic Approach for Automatic Error Detection in Generation Grammars
Tim vor der Brück1 and Holger Stenzhorn2
1 FernUniversität in Hagen, Hagen, Germany, tim.vorderbrueck@fernuni-hagen.de
2 Institute for Medical Biometry and Medical Informatics, University Medical Center Freiburg, Freiburg, Germany, holger.stenzhorn@uniklinik-freiburg.de

1 Introduction
In any real-world application scenario, natural language generation (NLG) systems have to employ grammars consisting of very large numbers of rules. Detecting and fixing errors in such grammars is therefore a highly tedious task. In this work we present a data mining algorithm which identifies incorrect grammar rules by abductive reasoning from positive and negative training examples. More specifically, the constituency trees belonging to successful generation processes and the incomplete trees of failed ones are analyzed. From these, a quality score is derived for each grammar rule by analyzing the occurrences of the rules in the trees and by spotting the exact error locations in the incomplete trees. In prior work on automatic error detection, vor der Brück and Busemann [5] proposed a static error detection algorithm for generation grammars. The approach of Cussens and Pulman [1] creates missing grammar rules for parsing using abduction. Zeller [6] introduced a dynamic approach in the related area of detecting errors in computer programs.
2 Error Detection
The basic purpose of NLG, as considered here (we follow the TG/2 formalism [3]), is to convert an input structure, given as feature-value pairs, by means of grammar rules into a constituency tree from which the surface text can be read off as the terminal yield. Each non-leaf node in this tree is associated with a particular input substructure, a category and the applied grammar rule, while the leaf nodes are associated with text segments. The final surface text is created by concatenating the text segments of the leaf nodes. In case of success, the generation system returns not only the generated surface text (or texts, if multiple possible solutions have been found) but also the associated constituency tree (or trees). In the case of failure, however, no surface text is generated and no associated constituency tree exists. Yet it is obvious that, in order to detect the specific spot where the generation process fails, it is highly advantageous to have partial constituency trees for the failed generation attempts as well. For this reason, the employed generation engine has been extended to provide two types of partial trees in case of generation failures: The tree of the first type is the largest tree to result from the generation process; we call it the maximum tree. The other alternative tree, representing a non-successful generation, is the one having the smallest total difference to a complete tree; we call it the minimum tree. Usually both types of trees are incomplete and hence can have non-terminal categories at their leaf nodes. In the following, a complete tree resulting from a successful generation is called a positive tree, while an incomplete tree (either maximum or minimum) is called a negative tree.

The detection of incorrect rules is done in several consecutive steps:
1. First, a global (i.e., independent of any specific input structure) rule quality score (gqs) is derived for each rule.
2. For each input structure leading to a generation failure, the most probable error location in the associated constituency tree is detected.
3. Both pieces of information are put together to derive a local rule quality score (lqs), which is associated with a certain input structure. The rules with the lowest lqs (and a gqs below a given threshold) are considered potentially erroneous.

1. Deriving a Global Rule Quality Score: If a certain rule appears in a positive generation tree, this generally indicates that the rule is correct. Conversely, the fact that a rule appears in a negative tree, or in no constituency tree at all, is an indicator of an incorrect rule. Using this information, a gqs is defined for each rule. The gqs of a rule reflects the probability that a generation fails if this rule appears in the associated constituency tree. More specifically, the gqs is defined as the negated probability that a tree is negative if the rule r occurs in that tree:

gqs := −P(t ∈ T⁻ | (lhs(r), r) ∈ t)

where
• T⁻ is the set of negative trees, and
• lhs(r) is the left-hand side (LHS) category of rule r (see [3]).

As usual, the probability is estimated by the relative frequency of a tree being negative given that a certain rule appears in it. If a rule never appears in either the positive or the negative trees, its gqs is set to −1, since this is a strong indication of a potential error. A rule is assumed to be correct if its gqs exceeds a given threshold h. To account for the fact that the probabilities of rules leading to negative trees (or appearing in no tree at all) are not independent of each other (i.e., a rule might be assigned a low score because of an error in an ancestor rule), a small portion of the gqs is propagated upwards and added to the gqs of each rule which could, according to its LHS, possibly be applied at a superior node in a constituency tree. Note that only scores assigned to rules that are not assumed to be correct (gqs < h) are modified or propagated.
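A minimal sketch of the gqs estimation, representing each constituency tree as the set of (LHS category, rule) pairs occurring in it; the −1 convention for unseen rules follows the text, while the data layout and names are assumptions:

```python
def gqs(rule_occurrence, pos_trees, neg_trees):
    """rule_occurrence is an (lhs_category, rule) pair; each tree is the
    set of such pairs it contains.  Returns the negated relative
    frequency of a tree being negative given that the rule occurs."""
    neg = sum(rule_occurrence in tree for tree in neg_trees)
    pos = sum(rule_occurrence in tree for tree in pos_trees)
    if neg + pos == 0:
        return -1.0   # rule never applied: strong hint of an error
    return -neg / (neg + pos)
```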
2. Spotting the Error in the Generation Tree: The gqs already yields a good approximation for identifying an incorrect rule. However, this method has the drawback that the identified rules are not related to any input structure. This is obviously important information for grammar developers who want to know why no output was generated for a specific input structure; furthermore, this information can be necessary to correct the error automatically (which is planned for future work). Hence, we additionally try to determine, for each input structure leading to a generation failure, the most probable location (node) in the constituency tree where the error occurred, and use this information to calculate a local rule quality score (i.e., a score which relates to a certain input structure). The identified node is supposed to be associated with the LHS category of the erroneous rule.³ Naturally, since positive trees do not lead to a generation error, only the negative trees have to be examined to spot the erroneous nodes. An error is defined for each negative tree separately, which means that different errors can relate to different constituency trees. Analogously to the determination of the gqs, there is again the possibility to employ either the maximum or the minimum trees; both methods have been evaluated.

To spot the error location, each node in a negative tree is assigned a node quality score (nqs). The calculation of the nqs takes into account the following two aspects, relevant to many machine learning approaches:
1. How do the negative examples (i.e., negative trees) differ from the positive ones?
2. What do all negative examples (i.e., negative trees) have in common?

To account for the first aspect, we estimate the probability that a tree is negative if it contains a given node (pair of rule and category): q1 = P(t ∈ T⁻ | (r, c) ∈ t), where the probability is estimated by the relative frequency. A node is assigned the maximum value of 1 if it occurs only in negative and never in positive trees. To account for the second aspect, we estimate the probability that a negative tree contains a given pair of rule and category: q2 = P((r, c) ∈ t | t ∈ T⁻). A node is assigned the highest possible value of 1 if it occurs in all negative trees. The nqs for a tree node (r, c) is then given as nqs(r, c) = −q1·q2. A node is considered to appear in a constituency tree if that tree contains a node associated with an identical category and rule. Note that a leaf node of an incomplete constituency tree might not be associated with any rule; such a node matches all nodes with an identical category. The nodes with the lowest nqs are considered potentially erroneous, i.e., one of them is assumed to carry the LHS category of the erroneous rule.

3. Putting Both Types of Information Together: Finally, the gqs and the expected error location(s) are combined into the lqs. Even if the error location in the constituency tree is not correctly determined by this algorithm, the actual error location is often a child, parent or sibling of one of the indicated locations. Thus, to determine the lqs of a rule, its gqs is weighted depending on the minimum possible distance in a constituency tree between that rule's LHS category and any node representing one of the indicated error locations, using an exponential decay. If this distance cannot be determined because the rule's LHS category is not reachable at all, the distance is set to some large value (e.g., the number of categories in the grammar).

³ Note that this approach is not suitable for detecting rules with an incorrect LHS. In such cases, only the gqs should be used.
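The nqs and the final lqs weighting can be sketched in the same style as the gqs above; the decay constant lam is an assumption, since the paper does not give one:

```python
import math

def nqs(node, pos_trees, neg_trees):
    """node is a (rule, category) pair; each tree is the set of its
    nodes.  nqs = -q1*q2 with q1 = P(t negative | node in t) and
    q2 = P(node in t | t negative), both as relative frequencies."""
    in_neg = sum(node in tree for tree in neg_trees)
    in_pos = sum(node in tree for tree in pos_trees)
    q1 = in_neg / (in_neg + in_pos) if in_neg + in_pos else 0.0
    q2 = in_neg / len(neg_trees) if neg_trees else 0.0
    return -q1 * q2

def lqs(rule_gqs, tree_distance, lam=1.0):
    """Weight a rule's gqs by an exponential decay in the minimum tree
    distance between its LHS category and an indicated error location;
    distant rules drift towards 0, i.e. become less suspicious."""
    return rule_gqs * math.exp(-lam * tree_distance)
```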
3 Evaluation and Conclusion
Table 1. Erroneous rule is among the top 5/3/2 suggestions; for all cases / cases with both positive and negative trees, in percent (%).

Type        Top 5   Top 3   Top 2
Max. tree   54/94   48/82   48/82
Min. tree   44/86   38/74   38/73
For the evaluation, we randomly changed a path expression [5] on a rule's right-hand side in the evaluation grammar and determined how often the erroneous rule appeared among the top five/three/two rules with the lowest lqs. The evaluation shows that the accuracy rises significantly if at least one positive constituency tree exists (see Table 1). The described algorithm has been implemented in a plugin for the grammar workbench eGram [3], which supports the GUI-based development of grammar rules for the grammar formalisms of the TG/2 [2] and XtraGen [4] NLG systems. The automatic detection and correction of grammar errors remains a very difficult task, but it is an important and necessary step towards creating NLG systems that are easy to deploy in real-world application scenarios with large numbers of rules.
ACKNOWLEDGEMENTS We are especially obliged to Stephan Busemann for providing one of the authors with a research license of eGram and XtraGen. Furthermore we thank all members of our departments who contributed to this work.
REFERENCES
[1] J. Cussens and S. Pulman, 'Incorporating linguistics constraints into ILP', in Proc. of CoNLL, Lisbon, Portugal, (2000).
[2] S. Busemann, 'Best-first surface realization', in Proc. of INLG, Herstmonceux, UK, (1996).
[3] S. Busemann, 'eGram — a grammar development environment and its usage for language generation', in Proc. of LREC, Lisbon, Portugal, (2004).
[4] H. Stenzhorn, 'XtraGen: A NLG system using Java and XML technologies', in Proc. of NLPXML, Taipei, Taiwan, (2002).
[5] T. vor der Brück and S. Busemann, 'Suggesting error corrections of path expressions and categories for tree-mapping grammars', Zeitschrift für Sprachwissenschaft, 26(2), (2007).
[6] A. Zeller, 'Locating causes of program failures', in Proc. of ICSE, Saint Louis, Missouri, USA, (2005).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-839
839
Answering Definition Question: Ranking for Top-k
Chao Shen, Xipeng Qiu, Xuanjing Huang and Lide Wu1
1 Fudan University, China, email: {shenchao,xpqiu,xjhuang,ldwu}@fudan.edu.cn

Abstract. As an important form of complex question, the definition question attracts much attention from QA researchers. In many definition question answering systems, a core step is to rank the candidate answer sentences so that the top-k of the ranked list can be extracted. We integrate several sources of evidence as features into a single framework and propose a novel method for learning the weights of these features to rank the candidate answer sentences.
1 Introduction

Definition question answering [10], as an important form of complex question answering, has been attracting more attention recently. A definition question can be interpreted as "Tell me interesting things about X", where "X" is usually called the "target". Most definition question answering systems have a pipeline structure:
Step-1 Extracting the candidate answer sentences from the corpus.
Step-2 Ranking the candidate answer sentences.
Step-3 Removing redundant answer sentences.
Step-1 is IR on the sentence or sub-sentence level: for a target, we obtain a list of sentences through this step. Step-2 is the core step, which ranks the output of Step-1; much research on definition questions focuses on this step, and various methods have been developed. Simple methods, such as checking the overlap of words between two sentences in the answer, are often used in Step-3.
To answer definition questions, pattern-based methods [3] and centroid-vector-based methods [1, 5] are popular for ranking the answer sentences, and various resources, including lexico-syntactic patterns and external resources such as Google, Wikipedia and encyclopedias, have been used as evidence to judge whether a sentence is a definition sentence about a target. However, in previous systems, if multiple resources are used, the importance of each resource in the definition question answering system is fixed manually. Since different patterns and centroid vectors may play different roles, there should be a way to identify their weights automatically. Our work proposes a learning method which 1) yields the optimal top-k sentences instead of the optimal ranking of the whole list, and 2) explicitly slackens the condition that definition sentences should be ranked ahead of the others. Using such a learning method for ranking, we integrate the evidence for a sentence being a definition as features into a single framework and achieve better results.
2 Learning to Rank for Top-k

In this section, we introduce how the weights of the resources are learned. Specifically, we use a modified version of the online learning algorithm MIRA [2] for the task of sentence ranking in definition question answering.

In training, a set of targets X = {x^1, x^2, ..., x^T} is given. Each target x^t is associated with a set of nugget sentences y^t = {y^t_1, y^t_2, ..., y^t_{n_t}}, where y^t_j denotes the j-th sentence and n_t denotes the size of y^t. Each target is also associated with a list of sentences s^t = {s^t_1, s^t_2, ..., s^t_{m_t}}, which is the output of the first step of the pipeline system and is to be ranked. From s^t, we select k sentences as the input of the Step-3 module, or directly as the answer for the target. An arbitrary subset of s^t with size k is denoted s^t(k). To evaluate these sets of sentences, we define score(x^t, s^t(k)) = w · Ψ(x^t, s^t(k)), where Ψ(x^t, s^t(k)) is the feature vector of the target/k-sentences pair <x^t, s^t(k)>, and ŷ^t = argmax_{s^t(k)} score(x^t, s^t(k)) is extracted. We learn w with the goal that as many elements of ŷ^t as possible are in y^t. If we assume each sentence is independent of the others, the score of a set decomposes over its sentences:

score(x^t, s^t(k)) = Σ_{j=1}^{k} score(x^t, s^t_j)

where score(x^t, s^t_j) = w · ψ(x^t, s^t_j). Thus ŷ^t consists of the top k sentences of s^t in the list ranked by decreasing score(x^t, s^t_j).

Algorithm 1 Modified Version of Online MIRA
Training Data: Γ = {(x^t, y^t)}_{t=1}^{T}
1: w^0 = 0; v = 0; i = 0
2: for n : 1 . . . N do
3:   for t : 1 . . . T do
4:     min ||w^{i+1} − w^i||
5:     s.t. score(x^t, s^t_i) − score(x^t, s^t_j) ≥ 1
6:     ∀ s^t_j ∈ Q, ∀ s^t_i ∈ y*^t = (ŷ^t \ Q) ∪ P
7:     v = v + w^{i+1}
8:     i = i + 1
9:   end for
10: end for
11: w = v / (N · T)

MIRA was first proposed for multiclass classification; in [8, 7] it was successfully used for structured learning. The difference between the MIRA of [8] and our version in Algorithm 1 lies in the constraints (lines 5-6 of Algorithm 1) used to update w^i. To circumvent the problems of ranking for definition question answering mentioned in Section 1, we first introduce y*^t, obtained by adding nugget sentences in s^t \ ŷ^t to ŷ^t and excluding non-nugget sentences from ŷ^t, and take it as a slackened supervisor for the learning.
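For concreteness, the scoring and top-k extraction step can be sketched as follows, with the candidate feature vectors stacked into a matrix; the data layout and names are ours, not the authors':

```python
import numpy as np

def top_k(w, feature_matrix, k):
    """Rank candidate sentences by score(x, s_j) = w . psi(x, s_j) and
    return the indices of the k best, i.e. the predicted answer set."""
    scores = feature_matrix @ w
    order = np.argsort(-scores)      # decreasing score
    return order[:k], scores
```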
We define θ^t_0 = min{|ŷ^t \ y^t|, |(s^t \ ŷ^t) ∩ y^t|}, i.e. the minimum of the number of non-nugget sentences in the top-k and the number of nugget sentences outside the top-k. In each iteration of updating w with the input (x^t, y^t), we build y*^t by inserting θ^t = min{θ^t_0, θ} nugget sentences from outside the top-k (the set P) into the top-k sentences, and excluding the same number of non-nugget sentences (the set Q). Then y*^t = (ŷ^t \ Q) ∪ P is a better answer, containing (if possible) θ^t more nugget sentences than ŷ^t, where P and Q are defined as follows:
P: the top-θ^t nugget sentences of s^t \ ŷ^t
Q: the bottom-θ^t non-nugget sentences in ŷ^t
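The construction of the slackened target y*^t can be sketched as below; here scores is a numpy array of model scores over s^t, nuggets is the set of indices of nugget sentences, and all names are our own:

```python
import numpy as np

def slackened_target(scores, nuggets, k, theta):
    """Swap up to theta pairs: the best-scoring nuggets outside the
    top-k (P) replace the worst-scoring non-nuggets inside it (Q)."""
    order = list(np.argsort(-scores))                # decreasing score
    topk = order[:k]
    P = [j for j in order[k:] if j in nuggets]       # missed nuggets
    Q = [j for j in reversed(topk) if j not in nuggets]  # worst non-nuggets
    t = min(theta, len(P), len(Q))                   # theta^t = min(theta0, theta)
    P, Q = P[:t], Q[:t]
    return (set(topk) - set(Q)) | set(P)             # y*^t
```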
3 Experiments

We conduct two experiments on 65 TREC 2004 targets, 75 TREC 2005 targets and 75 TREC 2006 targets to validate our method. The same sentence extraction module as in [9] is used to extract the candidate answer sentences from the corpus, and no redundancy removal module is used. The features are also the same as in [9], including 4 based on language models, 1 based on document retrieval, and several based on syntactic patterns. To build the training corpus, we collect the judgements of TREC on all the answers submitted by participants. If a [string, docid] pair is judged to cover a certain nugget of a target x^t, we extract the original sentence from AQUAINT according to the [string, docid] pair and add it to the set y^t for target x^t.

3.1 Ranking Comparison

To show the effectiveness of our ranking method, we compare our results with those of the following methods.
RankSVM RankSVM has been used to rank definition sentences [11]. As in [11], we only use a linear kernel.
Han-Model If we fix the weights of the 4 features based on language models, we can regard our system as a simple version of the statistical model proposed by [4].
Exact-Answer In our proposed method, we do not require all nugget sentences to be ranked higher than non-nugget sentences. In this baseline, we construct stricter constraints: all nugget sentences of a target should be ranked higher than the current non-nugget sentences in the top-k:

s.t. score(x^t, y^t_i) − score(x^t, s^t_j) ≥ 1, ∀ s^t_j ∈ ŷ^t \ y^t, ∀ y^t_i ∈ y^t    (1)

The comparison is on the TREC 2006 targets, with the TREC 2005 targets used for training. This is because the target sets of TREC 2005 and 2006 both include PERSON, ORGANIZATION, THING and EVENT targets, while TREC 2004 does not contain EVENT targets. θ is decided by 5-fold cross validation on the TREC 2005 targets. Table 1 shows the F3-score for each method. Though RankSVM and Exact-Answer use more features, they still fail to outperform Han-Model. This underlines the importance of the ranking method: if the weights of the features cannot be decided properly, the extra features will not help improve the performance. We can see that our method has an advantage, especially when k is relatively small.

Table 1. Comparison in terms of ranking (F3) on the TREC 2006 question set.

k    Our Method  RankSVM  Han-Model  Exact-Answer
10   0.2401      0.1697   0.2282     0.1842
15   0.2725      0.2068   0.2382     0.2100
20   0.2859      0.2186   0.2592     0.2737
25   0.2801      0.2225   0.2610     0.2643
30   0.2579      0.1944   0.2557     0.2449
35   0.2338      0.1916   0.2502     0.2153

3.2 Comparison with Other Systems

In [5], two state-of-the-art systems, the Soft Pattern model (SP) and the Human Interest Model (HIM), are evaluated on the TREC 2005 targets with the automatic evaluation tool Pourpre v1.0c [6]; [5] gives the results of their experiment with TREC 2005 as test data and TREC 2004 as training data. Following the setting of [5], we select the top 12 highest-ranked sentences (k = 12) as answers. Based on the analysis of the parameter θ, we let θ = 2. From Table 2, we can see that our method clearly outperforms SP and obtains a result comparable to HIM.

Table 2. Performance on the TREC 2005 question set.

System                      F3-Score
Soft-Pattern (SP)           0.2872
Human Interest Model (HIM)  0.3031
Our Method                  0.3095

4 Conclusion

In this paper, we integrate multiple resources to rank candidate answer sentences for definition question answering. Specifically, we propose a learning-to-rank method for this task. Instead of hoping that all definition sentences are at the top of the list of candidate answer sentences, we use a slack parameter θ to let the top-k sentences contain as many definition sentences as possible. Experimental results indicate that our proposed method performs better than several other ranking methods used in definition question answering, and that our system integrating multiple resources obtains results comparable to the state of the art.
REFERENCES
[1] Y. Chen, M. Zhou, and S. Wang, 'Reranking answers for definitional QA using language modeling', Proc. of ACL, (2006).
[2] K. Crammer and Y. Singer, 'Ultraconservative online algorithms for multiclass problems', Journal of Machine Learning Research, 3, 951–991, (2003).
[3] H. Cui and M.Y. Kan, 'Generic soft pattern models for definitional question answering', Proc. of ACL, (2005).
[4] K.S. Han, Y.I. Song, and H.C. Rim, 'Probabilistic model for definitional question answering', Proc. of SIGIR, (2006).
[5] K.W. Kor and T.S. Chua, 'Interesting nuggets and their impact on definitional question answering', Proc. of SIGIR, (2007).
[6] J. Lin and D. Demner-Fushman, 'Automatically evaluating answers to definition questions', Proc. of HLT-EMNLP, (2005).
[7] R. McDonald, 'Discriminative Sentence Compression with Soft Syntactic Evidence', Proc. of EACL, (2006).
[8] R. McDonald, K. Crammer, and F. Pereira, 'Online Large-Margin Training of Dependency Parsers', Proc. of ACL, (2005).
[9] X. Qiu, B. Li, C. Shen, L. Wu, X. Huang, and Y. Zhou, 'FDUQA on TREC2007 QA Track', Proc. of TREC, (2007).
[10] E.M. Voorhees, 'Overview of the TREC 2003 Question Answering Track', Proc. of TREC, (2003).
[11] J. Xu, Y. Cao, H. Li, and M. Zhao, 'Ranking definitions with supervised learning methods', Proc. of WWW, (2005).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-841
841
Ontology-Driven Human Language Technology for Semantic-Based Business Intelligence
Thierry Declerck1, Hans-Ulrich Krieger2, Horacio Saggion3 and Marcus Spies4
1 DFKI GmbH, Germany, email: declerck@dfki.de
2 DFKI GmbH, Germany, email: krieger@dfki.de
3 University of Sheffield, UK, email: H.Saggion@dcs.shef.ac.uk
4 Semantics Technology Institute, Austria, email: marcus.spies@sti2.at

Abstract. In this poster submission, we describe the current state of development of textual analysis and ontology-based information extraction in real-world applications, as defined in the context of the European R&D project MUSING, which deals with Business Intelligence. We present in some detail the current state of ontology development, including the time and domain ontologies that guide information extraction in an ontology population task.
1 INTRODUCTION

MUSING is a European R&D project dedicated to the development of Business Intelligence (BI) tools and modules founded on semantic-based knowledge and content systems. MUSING integrates Semantic Web and Human Language technologies to enhance the technological foundations of knowledge acquisition and reasoning in BI applications. The impact of MUSING on semantic-based BI is being measured in three strategic domains:
• Financial Risk Management (FRM), providing services for the supply of information to build a creditworthiness profile of a subject, from the collection and extraction of data from public and private sources up to the enrichment of these data with (semantic) indices, scores and ratings;
• Internationalization (INT), providing an innovative platform which an enterprise may use to support foreign market access and to benefit from resources originating in other markets;
• IT Operational Risk & Business Continuity (ITOpR), providing services to assess IT operational risks that are central for financial institutions, as a consequence of the Basel II Accord, and to assess risks arising specifically from an enterprise's IT systems, such as software, hardware, telecommunications, or utility outage/disruption.
Across these development streams of MUSING there are some common tasks, such as extracting relevant information from annual reports of companies and mapping this information into XBRL (the eXtensible Business Reporting Language). XBRL is a standardized way of encoding financial information of companies, but also the management structure, location, number of employees, etc. (see www.xbrl.org). This is mostly "quantitative" information, which is typically encoded in structured documents, like financial tables or company profiles. But for many Business Intelligence applications there is also a need to consider "qualitative" information, which is most of the time delivered in the form of unstructured text, as found in textual annexes to the balance sheets in annual reports or in news articles. The problem here is how to accurately integrate information extracted from structured sources, like the periodic reports of companies, with the day-to-day information provided by news agencies, mostly in unstructured text form. The detection and interpretation of temporal information in structured and unstructured documents is also a central focus of our attention in MUSING. We describe in the following the current state of development of the MUSING ontologies, including our proposal for temporal representation. Due to lack of space, we cannot show here examples of the kinds of temporal expressions we encounter in MUSING applications, nor how our IE and ontology population tools deal with those expressions in the light of our representation of temporal information, which also aims at supporting temporal reasoning in various applications. Those examples will, however, be available on the poster.
2 STATE OF MUSING ONTOLOGIES

In MUSING we decided to use the PROTON ontology (http://proton.semanticweb.org) as the upper-level ontology, on the basis of which domain-specific extensions can easily be defined. The species of the model of the PROTON Upper module is OWL Full; the version used in MUSING contains mostly the same information as the original one but is slightly changed to fulfill the OWL Lite criteria. The System module of PROTON, http://proton.semanticweb.org/2005/04/protons, provides a set of high-level system- or meta-primitives. It is the only component in PROTON that is not to be changed for the purposes of ontology extension. The Top-Level classes of PROTON, http://proton.semanticweb.org/2005/04/protont, represent the most common world knowledge concepts. These can directly be used for knowledge discovery, metadata generation and for interfacing intelligent knowledge access tools. PROTON also has an Upper module, http://proton.semanticweb.org/2005/04/protonu, which adds sub-classes and properties of the Top-module super-classes for the concepts other than "Abstract", "Happening" and "Object" from the original PROTON Top ontology. The "Extension" ontology in MUSING has been designed as a single contact point between the upper and the MUSING application-specific ontologies. In MUSING we also developed a general time ontology, which is likewise added to the upper module. Besides the time ontology, there are currently five domain ontologies, which are not assigned to any particular application. They cover the following areas: Company, Industry sector, BACH (a standard for harmonizing the accounts of companies across countries), XBRL (the standard language for business reporting) and Risk. In the time ontology of MUSING, temporally-enriched facts are represented through time slices,
four-dimensional slices of what Sider (1997) calls a spacetime worm (we only focus on the temporal dimension in MUSING). These worms, often referred to as perdurants, are the objects we are talking about. The time ontology itself contains the conceptualization of the temporal objects that are relevant in MUSING; in fact, any time ontology can be combined with the "4D" ontology. The other ontologies are domain- and application-specific. As a concluding remark about the ontologies, we would like to mention that they have been built by hand, most of them on the basis of "competency questions" provided by domain experts. It is also planned in MUSING to investigate (semi-)automatic ontology learning or creation, on the basis of information and knowledge extracted from the analyzed data. The poster presentation will mainly visualize the interconnections of the ontologies, and the integrated reasoning component that has been designed to act on the ontologies and the knowledge bases of MUSING.
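As an illustration only (the MUSING ontologies are expressed in OWL, not code), a time slice of a perdurant can be thought of as a relation instance with possibly underspecified temporal bounds; all names below are our own:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TimeSlice:
    """A temporal slice of a perdurant: a relation instance holding
    over an interval whose bounds may be underspecified (cf. the
    yearDate class for year-granularity dates)."""
    holder: str
    relation: str
    value: str
    start_year: Optional[int] = None   # None = underspecified
    end_year: Optional[int] = None
```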
3 ONTOLOGY-BASED INFORMATION EXTRACTION IN MUSING

In the previous section we presented in some detail the different types of MUSING ontologies and the way they interact (mainly via the "Extension" ontology). This model of the concepts relevant to a set of Business Intelligence applications has to be filled (or populated) with real data, so that the applications can make use of the semantic capabilities of such an ontology infrastructure. We call this task "ontology population"; it is in a sense Information Extraction (IE) guided by ontologies, with the results of IE displayed not in the form of templates but in knowledge representation languages, e.g. OWL in the case of MUSING. The information stored in this way is considered as "instances" of the concepts and relations introduced in the ontology. The set of instances builds the knowledge base for the applications, and this knowledge base supports, for example, credit institutions in their decision-making procedures on credit issuing. As mentioned in the introduction, a substantial amount of the information needed for the development of semantic business intelligence applications is to be found in unstructured textual documents, so that the automatic ontology population task relies on natural language processing in general and Information Extraction in particular.

It is important to note here that all the instances of the ontologies populated by means of the IE tools are automatically "enveloped" within temporal information, which turns every entity or event into a perdurant. In case temporal information is not available, or has not been found, it can be left underspecified in the representation of the instances and filled with information generated from other resources, or by the temporal reasoning engine also implemented in MUSING. As an example we can look at the following sentence, taken from a newspaper: "Ermotti arbeitete frueher kurz fuer den weltgroessten Finanzkonzern Citigroup und danach 17 Jahre lang bis 2004 fuer die Investmentbank Merrill Lynch." (Earlier, Ermotti worked for a short time for the world's largest financial concern, Citigroup, and afterwards for 17 years, until 2004, for the investment bank Merrill Lynch.) This is a quite interesting sentence, since it contains many temporal expressions (actually quite normal in news articles). The first two expressions ("before" and "a short time") are again very vague, so here we assume that the "before" actually means "before the publication date". The next temporal expressions are "for 17 years" and "till 2004". From these two expressions we now get more precise information: the relation "Ermotti works at Merrill Lynch" is first associated with a duration of 17 years, and in a second step we can calculate the starting point of this relationship, since an ending point is given: 2004 (we allow for such underspecification in the time ontology, having introduced a class called "yearDate"). In order to extract this information and to populate the ontology, we need a deeper linguistic analysis. With the help of syntactic analysis (more specifically, dependency analysis) we extract that there is a working relationship between Ermotti (the subject of the first clause of the sentence) and Merrill Lynch. We can associate the time code with this relationship on the basis of the dependency analysis of the two temporal expressions, as linguistic expressions that "modify" the main verb "arbeitete" (worked). The name of the company for which Ermotti is working is included in a prepositional phrase (PP). The linguistic pattern "[NP-SUBJ X] works [PP for [NP-IOBJ Y]]" is a very good candidate for a mapping onto a relation <X is employed by Y>, with the clear constraints that "X" is an instance of a person and "Y" an instance of a company (the domain and range of the relation). In this example, the reader can see how the constituent analysis of text, coupled with named entity detection, some lexical semantics and dependency relations, guides the ontology population. We can also see that there are at least three syntactic ways to express temporal information: as an adverb, an NP or a PP. First the textual analysis gives a linguistic structure to the unstructured text, on the basis of which we define a mapping that associates the name of the person with the person ontology and the name of the company with the company ontology. The relationship <Ermotti, is employed by, Merrill Lynch> can then be associated with the time slice "1987-2004". From the individual news article under consideration we cannot extract information about the activities of Ermotti between 2004 and 2005-12-16 (the publication date), but we assume that he had an activity in the banking domain. We can thus automatically query for documents telling us something about "Ermotti" and the year 2005, in order to "fill the temporal gap" in the information card about Ermotti. The already extracted information and the temporal ontology of MUSING structure the semantic content of the query. On this basis we found, for example, an article published one year later, on 2006-12-06. The poster presentation will visualize in detail the interconnection of the ontologies and the NLP and IE tools used to populate the ontologies.
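The start-point calculation used in the Ermotti example (end point 2004, duration 17 years, hence start 1987) amounts to the following trivial rule; this is a self-contained sketch with our own names, not MUSING's reasoning engine:

```python
def complete_interval(start, end, duration):
    """Derive a missing interval bound from a duration, leaving bounds
    underspecified (None) when they cannot be computed."""
    if start is None and end is not None and duration is not None:
        start = end - duration
    elif end is None and start is not None and duration is not None:
        end = start + duration
    return start, end

# "17 Jahre lang bis 2004" -> the works-for relation holds 1987-2004
print(complete_interval(None, 2004, 17))   # (1987, 2004)
```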
4 Conclusion

In this poster, we show how we combine Semantic Web resources and tools with Language Technologies in order to help create knowledge bases in the field of Business Intelligence applications, thus upgrading the current strategies implemented in this field. Building on quantitative and qualitative information automatically extracted from various types of documents, this work moves towards a new generation of semantically driven Business Intelligence methods and tools.
ACKNOWLEDGEMENTS The research described in this paper has been partially financed by the European Integrated Project MUSING, with contract number FP6-027097.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-843
843
Evaluation Evaluation
David M. W. Powers1
1 AILab, CSEM, Flinders University of South Australia, email: David.Powers@flinders.edu.au

Abstract. Over the last decade there has been increasing concern about the biases embodied in traditional evaluation methods for Natural Language Processing/Learning, particularly methods borrowed from Information Retrieval. Without knowledge of the Bias and Prevalence of the contingency being tested, or equivalently the expectation due to chance, the simple conditional probabilities Recall, Precision and Accuracy are not meaningful as evaluation measures, either individually or in combinations such as F-factor. The existence of bias in NLP measures leads to the 'improvement' of systems by increasing their bias, such as the practice of improving tagging and parsing scores by using the most common value (e.g. water is always a Noun) rather than attempting to discover the correct one. In this paper, we analyze both biased and unbiased measures theoretically, characterizing the precise relationship between all these measures.
1 INTRODUCTION

A common but poorly motivated way of evaluating the results of Language and Learning experiments is to use Recall, Precision and F-factor. These measures are named for their origin in Information Retrieval and embody specific biases: they ignore performance in correctly handling negative examples, they propagate the underlying marginal Prevalences and Biases, and they fail to take account of chance-level performance. In the Medical Sciences, Receiver Operating Characteristics (ROC) analysis has been borrowed from Signal Processing and has become a standard for evaluation and standard setting, comparing the Recall-like True Positive Rate and the False Positive Rate. In the Behavioural Sciences, the related concepts of Specificity and Sensitivity are commonly used. Alternative techniques, such as Rand Accuracy, have some advantages but are nonetheless still biased measures unless explicitly debiased.
2 THE BINARY CASE

It is common to introduce the various measures in the context of a dichotomous binary classification problem, where the labels are by convention + and − and the predictions of a classifier are summarized in a four-cell contingency table. This contingency table may be expressed using raw counts of the number of times each predicted label is associated with each real class, A, B, C, D, summing to N, or we may use acronyms for the generic terms for True and False, Real and Predicted Positives and Negatives, or else relative versions of these: tp, fp, fn, tn and rp, rn, pp, pn refer to the joint and marginal probabilities, and the four contingency cells and the two pairs of marginal probabilities each sum to 1. Both systems are illustrated in Table 1.
We thus make the specific assumptions that we are predicting and assessing a single condition that is either positive or negative (dichotomous), that we have one predicting model, and one gold standard labelling.
2.1 Recall & Precision, Sensitivity & Specificity
Recall or Sensitivity (as it is called in Psychology) is the proportion of Real Positive cases that are correctly Predicted Positive. This measures the Coverage of the Real Positive cases by the +P (Predicted Positive) rule. Its desirable feature is that it reflects how many of the relevant cases the +P rule picks up. It tends not to be very highly valued in Information Retrieval (on the assumptions that there are many relevant documents, that it doesn't really matter which subset we find, and that we can't know anything about the relevance of documents that aren't returned), and tends to be neglected or averaged away in Machine Learning and Computational Linguistics (where the focus is on how confident we can be in the rule or classifier). However, Recall has been shown to have a major weight in predicting success in several contexts, including these areas; in a Medical context Recall is primary, though it is referred to as the True Positive Rate (tpr). Recall is defined, with its various common appellations, by equation (1):

Recall = Sensitivity = tpr = tp/rp    (1)
Conversely, Precision or Confidence (as it is called in Data Mining) denotes the proportion of Predicted Positive cases that are correctly Real Positives. It can also be called True Positive Accuracy (tpa), as a measure of the accuracy of Predicted Positives, in contrast with the rate of discovery of Real Positives (tpr). Precision is defined in (2):

Precision = Confidence = tpa = tp/pp    (2)
These two measures and their combinations focus only on the positive examples and predictions, although between them they capture some information about the rates and kinds of errors made. However, neither of them captures any information about how well the model handles negative cases. Recall relates only to the +R column and Precision only to the +P row. Neither of these takes into account the number of True Negatives. This also applies to their Arithmetic, Geometric and Harmonic Means: A, G and F = G²/A (the F-factor or F-measure).
Table 1. Systematic and traditional notations in a contingency table.

Probabilities (systematic):      Counts (traditional):

        +R    −R    |                  +R     −R    |
  +P    tp    fp    | pp         +P    A      B     | A+B
  −P    fn    tn    | pn         −P    C      D     | C+D
        rp    rn    | 1                A+C    B+D   | N
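To make the relationships concrete, the sketch below computes the measures discussed in this paper from the raw counts of Table 1 (A = TP, B = FP, C = FN, D = TN); Informedness and Markedness anticipate the definitions given in Section 2.4:

```python
def contingency_measures(A, B, C, D):
    """All quantities follow Table 1; the divisions assume
    non-degenerate margins (no empty row or column)."""
    N = A + B + C + D
    recall = A / (A + C)            # tpr / Sensitivity, eq. (1)
    precision = A / (A + B)         # tpa / Confidence, eq. (2)
    inv_recall = D / (B + D)        # tnr / Specificity
    inv_precision = D / (C + D)
    return {
        "recall": recall,
        "precision": precision,
        "inverse_recall": inv_recall,
        "inverse_precision": inv_precision,
        "rand_accuracy": (A + D) / N,
        "informedness": recall + inv_recall - 1,      # tpr - fpr
        "markedness": precision + inv_precision - 1,
    }
```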
Usually there is in principle nothing special about the Positive case, and we can define Inverse statistics in terms of the Inverse problem, in which we interchange positive and negative and predict the opposite case. Inverse Recall or Specificity is thus the proportion of Real Negative cases that are correctly Predicted Negative (Inverse Recall = tnr = tn/rn), and is also known as the True Negative Rate. Rand Accuracy explicitly takes into account the classification of negatives, and is expressible both as a weighted average of Precision and Inverse Precision and as a weighted average of Recall and Inverse Recall. Conversely, the Jaccard or Tanimoto similarity coefficient explicitly ignores the correctly classified negatives (TN). Each of these measures also has a complementary form defining an error rate, of which some have specific names and importance: Fallout or False Positive Rate (fpr) is the proportion of Real Negatives that occur as Predicted Positive (ring-ins); Miss Rate or False Negative Rate (fnr) is the proportion of Real Positives that are Predicted Negative (false-drops).
2.2 Prevalence, Bias, Cost & Skew
We now turn our attention to various forms of bias or skew that detract from the utility of all of the above surface measures [1,2]. We will first note that rp represents the Prevalence of positive cases, RP/N – it is not usually under the control of the experimenter. By contrast, pp represents the (label) Bias of the model [1], the tendency of the model to output positive labels, PP/N, and is directly under the control of the experimenter, who can change the model by changing the theory or algorithm, or some parameter or threshold. A common rule of thumb, and a characteristic of some algorithms, is to parameterize a model so that Prevalence = Bias, viz. rp = pp. Corollaries of this setting are Recall = Precision (= A = G = F), Inverse Recall = Inverse Precision and Fallout = Miss Rate.
2.3 ROC and PN Analyses
Flach [4] has highlighted the utility of ROC analysis to the Machine Learning community, and characterized the skew sensitivity of many measures in that context, utilizing the ROC format to give geometric insights into the nature of the measures and their sensitivity to skew. ROC analysis plots the rate tpr against the rate fpr. A common criterion is to maximize the area under the curve (AUC), which for a single parameterization of a model is defined by a single point and the segments connecting it to (0,0) and (1,1). A particular cost model and/or accuracy measure defines an isocost gradient, which for a skew- and cost-insensitive model will be c = 1, and hence another common approach is to choose a tangent point on the highest isocost line that touches the curve. The area under the simple trapezoid is: AUC = 1 − (fpr + fnr)/2.
2.4 DeltaP, Informedness and Markedness

Powers [2] also derived an unbiased accuracy measure to avoid the bias of Recall, Precision and Accuracy due to population Prevalence and label bias. The Bookmaker algorithm costs wins and losses in the same way a fair bookmaker would set prices based on the odds. Powers then defines the concept of Informedness, which represents the 'edge' a punter has in making his bet, as evidenced and quantified by his winnings. Fair pricing based on correct odds should be zero sum – that is, guessing will leave you with nothing in the long run, whilst a punter with certain knowledge will win every time. Informedness is the probability that a punter is making an informed bet and is explained in terms of the proportion of the time the edge works out versus ends up being pure guesswork. Powers defined 'Bookmaker Informedness' for the general, K-label, case, but we present only the dichotomous formulation of Powers Informedness, as well as the complementary concept of Markedness. In fact, Bookmaker Informedness-based formulae may be averaged over all labels according to the label bias, and Markedness-based formulae over all classes by prevalence.

Definition 1 Informedness quantifies how informed a predictor is for the specified condition, and specifies the probability that a prediction is informed in relation to the condition (versus chance).

Informedness = Recall + Inverse Recall − 1 = tpr − fpr = 1 − fnr − fpr = 2·AUC − 1 = (Recall − Bias) / (1 − Prevalence)   (3)

Definition 2 Markedness quantifies how marked a condition is for the specified predictor, and specifies the probability that a condition is marked by the predictor (versus chance).

Markedness = Precision + Inverse Precision − 1 = tpa − fna = 1 − fpa − fna = (Precision − Prevalence) / (1 − Bias)   (4)

These definitions are aligned with the psychological and linguistic uses of the terms condition and marker. The condition represents the experimental outcome we are trying to determine by indirect means. A marker or predictor (cf. biomarker or neuromarker) represents the indicator we are using to determine the outcome. There is no implication of causality; however, there are two possible directions of implication. Detection of the predictor may reliably predict the outcome, with or without the occurrence of a specific outcome condition reliably evincing the predictor. In the Psychology literature, Markedness is known as DeltaP and is empirically a good (normative) predictor of human associative judgements – that is, it seems we develop associative relationships between a predictor and an outcome when DeltaP is high, and this is true even when multiple predictors are in competition. Conversely, a complementary, backward measure of strength of association, DeltaP′, aka Informedness, has been proposed [5]. Note that we can also estimate significance and confidence [3]:

χ² = N · Informedness · Markedness   (5)
CI = 1 − |Informedness| / √[N−1];   CM = 1 − |Markedness| / √[N−1]
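To make the dichotomous definitions above concrete, the following minimal Python sketch (ours, not the paper's) computes the surface measures and the chance-corrected Informedness and Markedness from the contingency counts A, B, C, D of the table at the start of this section; the example counts are arbitrary and all margins are assumed non-zero.

def evaluation_measures(A, B, C, D):
    # A = tp, B = fp, C = fn, D = tn (see contingency table above)
    N = A + B + C + D
    recall = A / (A + C)               # true positive rate, tpr
    inv_recall = D / (B + D)           # specificity, true negative rate
    precision = A / (A + B)            # tpa
    inv_precision = D / (C + D)
    fallout = B / (B + D)              # false positive rate, fpr
    miss_rate = C / (A + C)            # false negative rate, fnr

    informedness = recall + inv_recall - 1       # = tpr - fpr = 2*AUC - 1  (3)
    markedness = precision + inv_precision - 1   # = DeltaP                 (4)
    auc = 1 - (fallout + miss_rate) / 2          # single-point trapezoid
    chi2 = N * informedness * markedness         # significance estimate    (5)
    return informedness, markedness, auc, chi2

# e.g. a predictor with A=70, B=10, C=30, D=90:
print(evaluation_measures(70, 10, 30, 90))
# informedness = 0.6, markedness = 0.625, auc = 0.8, chi2 = 75.0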
REFERENCES
[1] Lafferty, J., McCallum, A. & Pereira, F. (2001). Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proceedings of the 18th International Conference on Machine Learning (ICML-2001), CA: Morgan Kaufmann, pp. 282-289.
[2] Powers, David M. W. (2003). Recall and Precision versus the Bookmaker. Proceedings of the International Conference on Cognitive Science (ICSC-2003), Sydney, Australia, pp. 529-534. http://david.wardpowers.info/BM/index.htm accessed 22 December 2007.
[3] Powers, David M. W. (2007). Evaluation. Flinders InfoEng Tech Rept SIE07001. http://www.infoeng.flinders.edu.au/research/techreps/SIE07001.pdf
[4] Flach, P.A. (2003). The Geometry of ROC Space: Understanding Machine Learning Metrics through ROC Isometrics. Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC, pp. 226-233.
[5] Perruchet, Pierre and Peereman, R. (2004). The exploitation of distributional information in syllable processing. Journal of Neurolinguistics 17:97-119.
6. Uncertainty and AI
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-847
Using Decision Trees as the Answer Networks in Temporal Difference-Networks
Laura-Andreea Antanas1, Kurt Driessens1, Jan Ramon1 and Tom Croonenborghs2

1 Introduction
State representation for intelligent agents is a continuous challenge as the need for abstraction is unavoidable in large state spaces. Predictive representations offer one way to obtain state abstraction by replacing a state with a set of predictions about future interactions with the world. One such formalism is the Temporal-Difference Networks framework [2]. It splits the representation of knowledge into the question network and the answer network. The question network defines which questions (interactions) about future experience are of interest. It contains nodes, each corresponding to a single scalar prediction about a future observation given a certain sequence of interactions with the environment. The nodes are connected by links, annotated with action-labels, which represent temporal relationships between the predictions made by the nodes, conditioned on the action-labels on the links (more details in [2]). The answer network provides the predictive models to update the answers to the defined questions, which are expected values of the scalar quantities in the nodes. These values can be seen as estimates of probabilities. With each executed action of the agent, the predictions are updated using the answer network models to obtain a description of the new state. In classical TD-networks, logistic regression models are used, whose weight vector is obtained using a gradient learning approach. We propose the use of probability-valued decision trees [1] in the answer network of TD-Nets. We believe that decision trees are a particularly good choice to investigate, as they offer a different yet powerful form of generalization. Moreover, this aids in a better understanding of the strengths and weaknesses of TD-Nets and represents an important first step towards using them in worlds with more extensive observations. Furthermore, decision tree induction can be regarded as a prototypical example of a non-gradient learning approach.
2 Decision Trees as Answer Networks in TD-Nets
The (abstracted) state representation in a TD-Net consists of (1) the predictions made by the TD-Net in the previous timestep, yt−1 = [y(1)t−1, . . . , y(n)t−1] (one prediction for each node), with n the number of nodes in the question network, (2) the action executed during the last time frame, at−1, and (3) the current observation ot, i.e., xt = [yt−1, at−1, ot]. From this vector, the answer network will compute new predictions yt = f(xt). In the original implementation of TD-Nets, this answer function is represented by a logistic regression function, i.e., yt = fW(xt) = σ(W xt).
1 Declarative Languages and Artificial Intelligence, Katholieke Universiteit Leuven, Leuven, Belgium, email: {laura,kurtd,janr}@cs.kuleuven.be
2 Biosciences and Technology Department, KH Kempen University College, Geel, Belgium, email: tom.croonenborghs@khk.be
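For concreteness, here is a minimal Python sketch of such a logistic-regression answer network; the simple squared-error gradient step is our assumption for illustration, not the exact TD(λ) update of [3], and the class and parameter names are ours.

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

class LogisticAnswerNetwork:
    """y_t = sigma(W x_t): one prediction per question-network node."""
    def __init__(self, n_nodes, n_features, alpha=0.5):
        self.W = np.zeros((n_nodes, n_features))
        self.alpha = alpha  # learning rate (the paper uses alpha = 0.5)

    def predict(self, x):
        return sigmoid(self.W @ x)

    def update(self, x, z):
        """Move predictions toward targets z (squared-error gradient step)."""
        y = self.predict(x)
        grad = (z - y) * y * (1 - y)          # sigmoid chain rule
        self.W += self.alpha * np.outer(grad, x)

# x_t = [y_{t-1}, a_{t-1}, o_t] flattened into one feature vector
net = LogisticAnswerNetwork(n_nodes=3, n_features=5)
x = np.array([0.5, 0.5, 0.5, 1.0, 0.0])
print(net.predict(x))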
We investigate the use of probability trees [1] as an alternative to logistic regression for the answer network. The modification of the original TD-Nets framework consists solely in the introduction of a probability tree function fT as the answer network, instead of the logistic regression function fW. Both the semantics of TD-Nets and the temporal learning principle remain unchanged. The difference between the two approaches is that we choose to represent the predictive model with one tree for each node. In the original TD-Nets implementation it is common to learn a set of weight vectors (aggregated in matrices), with one matrix for each action.
Figure 1. Dependencies between the different values used for the generation of learning examples. The structure of a learning example is shown by the shaded box, where zt is the target.
For example, if the actions taken at time t and t+1 are at and at+1 respectively, our version of TD(1) will generate the learning examples as a result of these interactions. If a node n′ in the network is conditioned by actions at and at+1, following this order in time, TD(1) will use the observation ot+2 as the target for the input vector [yt−1, at−1, ot]. For another node n″, conditioned only by action at, it will use [yt−1, at−1, ot] as the input vector and ot+1 as the target. In practice, we implement this approach by using a history of chosen actions and observations with a length equal to the maximum depth of the question network. We choose to use a TD(1) approach because it generates the most informative learning examples for the probability tree. At the start, the predictions y will be mostly noise. This means that both the input
vector as well as the target, when not based directly on an observation, will contain noise. TD(1) will avoid the second source of noise when possible. Experiments reported in [3] show that TD(1) gives the best learning performance for a logistic regression approach too.

Incremental Tree Learning. As stated before, we learn a single probability tree for each node in the question network. We employ binary decision trees, where internal nodes in the tree test attribute-value combinations. The available decision tests are identified before the induction of the probability trees begins, using a language bias defined by the user of the system. It specifies the possible actions, ranges for prediction values and observations for the world that can be considered when building the tree. Since predictions from the question network nodes are needed while still learning the answer network, we need an incremental tree learning algorithm. The incremental tree induction algorithm we used is described in Algorithm 1.

Algorithm 1 Incremental Tree Induction
1: initialize by creating a tree with a single (empty) leaf
2: for each learning example do
3:   sort the example down the tree until it reaches a leaf
4:   update the statistics in the leaf and store the example
5:   if number of examples in leaf > window size then
6:     remove oldest example in the leaf and update the statistics
7:   end if
8:   if a split is needed and # examples in leaf > min ex size then
9:     generate an internal node using the indicated test
10:    grow 2 new empty leaves
11:  end if
12: end for

Each leaf in the tree stores statistical information about the examples it contains. This allows the algorithm to compute standard deviations of the target value for all subsets created by all the available tests. The splitting criterion checks whether the examples in one leaf are sufficiently coherent with respect to their target value. A leaf is split when it contains enough examples for the statistics to be reliable: min ex size = 30. This algorithm is greedy and has no mechanism to undo early mistakes. Since in the TD-Network learning setting both the inputs and the targets of early learning examples can be noisy, we employ a sliding window approach to forget early learning examples. In the experiments, we use a window size = 50.
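The following Python sketch illustrates the leaf bookkeeping behind Algorithm 1 (sliding window plus a variance-based coherence test); the concrete split score below is our assumption, consistent with the description above but not necessarily the authors' exact criterion.

from collections import deque

MIN_EX_SIZE = 30   # examples required before a leaf may split
WINDOW_SIZE = 50   # sliding window: forget the oldest examples

class Leaf:
    def __init__(self, tests):
        self.examples = deque()   # (x, target) pairs, oldest first
        self.tests = tests        # candidate attribute-value tests

    def add(self, x, target):
        self.examples.append((x, target))
        if len(self.examples) > WINDOW_SIZE:
            self.examples.popleft()          # drop the oldest example

    def best_split(self):
        # pick the test whose two subsets have the lowest summed
        # target variance; None means no test improves coherence
        def var(ts):
            if not ts:
                return 0.0
            m = sum(ts) / len(ts)
            return sum((t - m) ** 2 for t in ts)
        best, best_score = None, var([t for _, t in self.examples])
        for test in self.tests:
            left = [t for x, t in self.examples if test(x)]
            right = [t for x, t in self.examples if not test(x)]
            if var(left) + var(right) < best_score:
                best, best_score = test, var(left) + var(right)
        return best

    def maybe_split(self):
        if len(self.examples) < MIN_EX_SIZE:
            return None
        test = self.best_split()
        if test is None:
            return None
        return test, Leaf(self.tests), Leaf(self.tests)  # node + 2 leaves

leaf = Leaf(tests=[lambda x: x[0] > 0.5])
for i in range(40):
    leaf.add([i % 2], float(i % 2))
print(leaf.maybe_split() is not None)   # True: a coherent split exists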
3 Empirical Evaluation
We compare the original logistic regression approach with the probability trees approach. To this end, we implemented our own version of the original TD(λ) learning algorithm as described in [3]. We performed experiments in two different environments: a ring world and a simple grid world. Experimental results are presented only for the 5-state deterministic ring world, as also used in [3]. Our ring world contains 5 interconnected circles. A circle indicates a state in the world. The agent has two different actions, A = {N, P}. N moves the agent to the adjacent state in clockwise rotation; P moves the agent in the counter-clockwise direction. The agent can only observe whether it is in state 1 or not, i.e. the observation bit is on (1) if the agent is in state 1 and off (0) otherwise. As in [3], we used symmetric action-conditional networks of depth 1, 2 and 3 as question networks. For the experiments with the classical TD-networks we used a learning rate α = 0.5, obtaining similar results to the ones presented in [3]. To compare the different learning algorithms, we used the root mean-squared error (RMSE) to determine the quality of the learned
models. This error at time t is calculated by comparing, for each node i, the correct target z*i with the prediction y(i)t of the learned model. The correct targets are computed using full knowledge of the environment. Hence, if the RMSE converges to 0, a correct answer network has been learned. The experiments are performed in an episode-based fashion. All experiments present the average RMSE as a function of the number of episodes over 10 different runs.
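As a concrete rendering of this error measure (assuming the per-node predictions and correct targets are collected as arrays):

import numpy as np

def rmse(y_pred, z_true):
    """Root mean-squared error over the question-network nodes."""
    return np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(z_true)) ** 2))

print(rmse([0.9, 0.1, 0.5], [1.0, 0.0, 0.5]))  # ~0.0816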
Figure 2. RMSE-curves for the symmetric action-conditional networks in the 5-state ring world: average RMSE versus the number of learning episodes, for fW and fT with question networks of depth 1, 2 and 3.
Figure 2 shows the results for the ring world with question networks of depth 1, 2 and 3. f T always converges faster than f W for networks of equal depth. For networks of depth 1 it is not possible to provide a completely accurate answer network, as the question network is too small to represent the full environment, but the probability tree learner quickly learns the best approximation. As expected, networks with larger depth perform better. The results for the grid world also show a faster learning performance for the decision trees.
4 Conclusion
We introduced the use of probability trees as answer networks in TD-Networks. We illustrated how to translate the standard TD(1) learning approach into a training example generator and evaluated the performance of a simple incremental and greedy tree induction algorithm. The experimental evaluation shows consistently that probability trees outperform the original logistic regression approach in learning performance. We consider this an important step towards the wider applicability of TD-Networks. As we only regard this work as a proof of concept, a wide range of future work is possible. The current implementation of the tree induction algorithm could be significantly improved, for example by including tree restructuring operators or by extending the learning algorithm to learn model-trees that combine the advantages of regression trees and logistic regression. Also, other non-parametric learners could be substituted for the probability tree learner. One exciting direction is the use of more elaborate observations, such as those for relational worlds. In this context decision trees offer the advantage of a more flexible parameterisation.
REFERENCES
[1] D. Fierens, J. Ramon, H. Blockeel, and M. Bruynooghe, 'A comparison of approaches for learning probability trees', in Proceedings of the 16th European Conference on Machine Learning (ECML-05), pp. 556–563, (2005).
[2] R.S. Sutton and B. Tanner, 'Temporal-difference networks', in Advances in Neural Information Processing Systems 17, pp. 1377–1384, (2004).
[3] B. Tanner and R.S. Sutton, 'TD(λ) networks: temporal-difference networks with eligibility traces', in Proceedings of the 22nd International Conference on Machine Learning (ICML-05), pp. 888–895, (2005).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-849
An Efficient Deduction Mechanism for Expressive Comparative Preferences Languages
Nic Wilson1

1 INTRODUCTION
Recent years have seen a considerable literature develop in the Artificial Intelligence community on formalisms for reasoning with comparative preferences in combinatorial problems, involving statements which compactly express the relative preference of outcomes (complete assignments to a set of variables). A fundamental task for reasoning with preferences is the following: given input preference information from a user, and outcomes α and β, should we infer that the user will prefer α to β? For CP-nets and related comparative preference formalisms, inferring a preference of α over β using the standard definition of derived preference appears to be extremely hard, and has been proved to be PSPACE-complete in general for CP-nets [5]. Such inference is also rather conservative, only making the assumption of transitivity, and tends to lead to weak preferences with a great deal of incomparability. It is very often desirable to be able to fill out the user's direct preferences in a plausible way by some kind of extrapolation, generating a fuller relation. This paper, generalising the approach in [9], defines a less conservative approach to inference which can be applied for very general forms of input. It is shown to be efficient for rather expressive comparative preference languages, allowing comparisons between arbitrary partial tuples (including complete assignments), and with the preferences being ceteris paribus or not. No acyclicity conditions are required regarding the input statements, and consistency (i.e., acyclicity of the preference relation) is not assumed. This paper is a short version of [10].

Terminology. Let V be a set of n variables. For each X ∈ V let dom(X) be the set of possible values of X; we assume dom(X) has at least two elements. For a subset of variables A ⊆ V let dom(A) = ∏X∈A dom(X) be the set of possible assignments to the set of variables A. (We write the unique assignment to the empty set of variables simply as the empty assignment.) An outcome is an element of dom(V), i.e., an assignment to all the variables. If a ∈ dom(A) is an assignment to A, and b ∈ dom(B), where A ∩ B = ∅, then we may write ab for the assignment to A ∪ B which combines a and b. For partial tuples a ∈ dom(A) and u ∈ dom(U), we may write a |= u, or say a extends u, if A ⊇ U and a(U) = u, i.e., a projected to U gives u. More generally, we say that a is compatible with u if there exists an outcome α ∈ dom(V) extending both a and u, i.e., such that α(A) = a and α(U) = u. This holds if and only if u and a agree on common variables, i.e., u(A ∩ U) = a(A ∩ U). Otherwise, we say that a and u are incompatible.

2 COMPARATIVE PREFERENCE STATEMENTS

In this paper we will focus especially on comparative preference statements ϕ of the form p > q - T, where P, Q and T are subsets of V, and p ∈ dom(P) is an assignment to P, and q ∈ dom(Q) is an assignment to Q. (We can assume, without loss of generality, that P ∩ T = ∅ and Q ∩ T = ∅.) Informally, the statement p > q - T represents the following: p is preferred to q if T is held constant. Formally, the semantics of this statement is given by the relation ϕ*, which is defined to be the set of pairs (α, β) of outcomes such that α extends p, β extends q, and α and β agree on T: α(T) = β(T). Each pair (α, β) in ϕ* represents a preference (e.g., of a single user) for outcome α over outcome β. Many comparative preference statements may be elicited, to form a set Γ. This set Γ thus directly represents the preferences Γ* = ⋃ϕ∈Γ ϕ*. We are particularly interested in such statements ϕ when P = Q. The statement can then be written as us > us′ - T, where U, S and T are disjoint sets of variables, and u ∈ dom(U), and s and s′ are assignments to S which differ on each variable: s(Z) ≠ s′(Z) for all Z ∈ S. Ceteris paribus preferences [other values being equal] are represented by statements with T = V − (U ∪ S); this includes CP-nets [1, 2], TCP-nets [3, 4] statements, and the feature vector rules in [6]. A CP-theory [8, 7] statement u : x > x′ [W] is exactly equivalent to the statement us > us′ - T when we set S = {X}, x = s, x′ = s′ and T = V − (U ∪ {X} ∪ W). A preference of outcome α over outcome β can be expressed by a statement of the form us > us′ - ∅ with S = V − U.
Selection-projections. The computational technique described in this paper is efficient essentially if and only if one can efficiently compute a particular compound operation on the input comparative preference statements: the projection of a selection. Fortunately, this operation is efficient for a broad class of natural comparative preference statements. Let a be an assignment to a set of variables A, and let Y be a set of variables disjoint from A. For a relation R on the set of outcomes, define the a-selection Ra of R to consist of all pairs (α, β) in R such that both α and β extend a. We define, for Y ⊆ V, the projection R↓Y of R to be the set of all pairs (y, y′) ∈ dom(Y) × dom(Y) such that there exist tuples z and z′ with (yz, y′z′) ∈ R. We write R_a^Y for (Ra)↓Y, the projection to Y of the a-selection of R. We call this compound operation a selection-projection. Let y, y′ ∈ dom(Y) be assignments to Y. We have (y, y′) ∈ R_a^Y if and only if there exist assignments z, z′ to V − (A ∪ Y) such that (ayz, ay′z′) ∈ R. We have the following important property:
1 Cork Constraint Computation Centre, Department of Computer Science, University College Cork, Cork, Ireland, n.wilson@4c.ucc.ie
Proposition 1 (Decomposition) For i in some index set I, let Ri be a relation on outcomes, and let R = ⋃i∈I Ri. Let a be an
assignment to a set of variables A, and let Y be a set of variables disjoint from A. Then R_a^Y = ⋃i∈I (Ri)_a^Y.

For a comparative preference statement ϕ and a set of comparative preference statements Γ, we abbreviate (ϕ*)_a^Y to ϕ_a^Y and abbreviate (Γ*)_a^Y to Γ_a^Y. We thus have Γ_a^Y = ⋃ϕ∈Γ ϕ_a^Y. We are interested in sets Y whose associated product set dom(Y) is not large (so, small sets of variables whose domains are fairly small). Then the relations Γ_a^Y are of manageable size, even though Γ* may very well be exponentially large.

Proposition 2 Let P, Q and T be subsets of V, and let p ∈ dom(P) be an assignment to P, and q ∈ dom(Q) be an assignment to Q. Let ϕ be a comparative preference statement of the form p > q - T, as defined above, where (P ∪ Q) ∩ T = ∅. Let a be an assignment to a set of variables A, and let Y be a set of variables disjoint from A. ϕ_a^Y is empty unless a is compatible with both p and q. If a is compatible with both p and q then ϕ_a^Y consists of all pairs (y, y′) such that (i) y and y′ agree on Y ∩ T, i.e., y(Y ∩ T) = y′(Y ∩ T); (ii) y is compatible with p; and (iii) y′ is compatible with q.

Each of these conditions can be checked in time at worst linear in n, the number of variables, and so the relation ϕ_a^Y can be computed in time linear in n, given that the size of dom(Y) is bounded by a constant. Proposition 2 therefore shows that computing selection-projections can be achieved efficiently for statements of the form p > q - T, and hence, by Proposition 1, for sets Γ of such statements.
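As an illustration of Proposition 2, the following brute-force Python sketch enumerates ϕ_a^Y for a statement p > q - T over small domains; it checks the three conditions by direct enumeration rather than in the linear-time fashion of the proposition, and all names in it are ours.

from itertools import product

def compatible(a, u):
    """Two partial assignments (dicts) agree on common variables."""
    return all(a[x] == u[x] for x in a.keys() & u.keys())

def selection_projection(p, q, T, a, Y, domains):
    """phi_a^Y for the statement p > q - T (cf. Proposition 2).
    p, q, a: partial assignments; T, Y: variable collections;
    domains: dict mapping each variable to its list of values."""
    if not (compatible(a, p) and compatible(a, q)):
        return set()
    assignments = [dict(zip(Y, vals))
                   for vals in product(*(domains[x] for x in Y))]
    result = set()
    for y in assignments:
        for y2 in assignments:
            if (all(y[x] == y2[x] for x in set(Y) & set(T))  # agree on Y ∩ T
                    and compatible(y, p) and compatible(y2, q)):
                result.add((tuple(sorted(y.items())),
                            tuple(sorted(y2.items()))))
    return result

# toy example: boolean X1, X2; p: X1=1, q: X1=0, T = {X2}
doms = {'X1': [0, 1], 'X2': [0, 1]}
print(len(selection_projection({'X1': 1}, {'X1': 0}, {'X2'},
                               {}, ['X1', 'X2'], doms)))   # 2 pairs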
3 Y-ENTAILMENT
This section defines a form of entailment, which we call Y-entailment, which is polynomial for a wide range of comparative preference statements; in particular, statements of the form p > q - T as described in Section 2, or any other comparative statements for which computing selection-projections is polynomial. Throughout this section, we consider a fixed family Y of sets of variables, which parameterises the inference relation, and a fixed (and completely arbitrary) input relation R on outcomes. We assume that Y satisfies the following property: if Y ∈ Y and non-empty Y′ is a subset of Y then Y′ ∈ Y. For example, Y might be defined to be all singleton subsets of V (i.e., sets with cardinality one), or, alternatively, all subsets of cardinality at most two, and so on. We also consider a fixed comparative preference statement ψ of the form us > us′ - ∅, where U and S are disjoint sets of variables, and u ∈ dom(U), and s and s′ are assignments to S which differ on each variable in S.

Definition 1 (Pickable and Decisive) Given a set Y ⊆ V and an assignment a to some subset A of V − Y, we define the relation ⪰_a^Y to be the transitive closure of R_a^Y. Suppose that u is compatible with a ∈ dom(A). A set of variables Y is said to be ψ-pickable given a if Y ∩ A = ∅ and either (i) Y ⊆ U and u(Y) is not ⪰_a^Y-equivalent to any other assignment in dom(Y); or (ii) Y ⊈ U and there exist y, y′ ∈ dom(Y) with y ⪰_a^Y y′, where y is compatible with us and y′ is compatible with us′. In this case we say that Y is ψ-decisive given a. (y is said to be ⪰_a^Y-equivalent to y′ if both y ⪰_a^Y y′ and y′ ⪰_a^Y y.)

The following algorithm defines Y-entailment: we say that R Y-entails ψ if and only if the algorithm returns true.
procedure Does R Y-entail ψ?
  for j := 1, . . . , n
    let aj be u restricted to Y1 ∪ · · · ∪ Yj−1 (in particular, a1 is the empty assignment);
    if there exists a set in Y which is ψ-decisive given aj then
      return false and stop;
    if there exists a set in Y which is ψ-pickable given aj then
      let Yj be any such set;
    else
      return true and stop;
  next j;
  return true.
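A direct Python transcription of the procedure is sketched below; the predicates decisive and pickable are assumed to be supplied by the caller (for example, built from selection-projections as in the earlier sketch), and the function names are ours.

def y_entails(family, n, restrict_u, decisive, pickable):
    """family: the sets of variables in Y; restrict_u(vars) returns u
    restricted to vars; decisive(Y, a) / pickable(Y, a) implement the
    psi-decisiveness / psi-pickability tests of Definition 1."""
    chosen = []                                  # Y_1, ..., Y_{j-1}
    for _ in range(n):
        covered = set().union(*chosen) if chosen else set()
        a = restrict_u(covered)
        if any(decisive(Y, a) for Y in family):
            return False
        picks = [Y for Y in family if pickable(Y, a)]
        if not picks:
            return True
        chosen.append(picks[0])                  # any pickable set will do
    return True

# trivial stub demo: nothing decisive, nothing pickable -> entailed
print(y_entails([{'X'}], 1, lambda vs: {},
                lambda Y, a: False, lambda Y, a: False))   # True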
Application to Deduction for Comparative Preference Statements. The relation R will very often be exponentially large, and so will need to be represented compactly, in particular as a set Γ of comparative preference statements (in some language), where Γ represents the relation R = Γ* on outcomes. We infer ψ of the form us > us′ - ∅ from Γ if and only if Γ* Y-entails ψ. Applying the algorithm requires us to compute selection-projections of the form Γ_a^Y, which we can compute as ⋃ϕ∈Γ ϕ_a^Y using Proposition 1.

Complexity: Assume that the domain sizes are bounded above by a constant, and that the elements of Y have cardinality at most k, so that |Y| is less than n^k. The algorithm is then O(m·n^(k+1)), where m = |Γ|.

Semantics: In [10] it is shown how Y-entailment can be given a semantics. (In fact, Y-entailment is defined semantically there, with the correctness of the algorithm then being a theorem.) In the standard entailment, Γ entails ψ if and only if every total pre-order extending Γ* also extends ψ*. (Equivalently, ψ* is a subset of the transitive closure of Γ*.) In contrast, Γ Y-entails ψ if and only if every total pre-order of a particular generalised lexicographic form which extends Γ* also extends ψ*. This implies that Y-entailment is more adventurous than the standard entailment.
REFERENCES
[1] C. Boutilier, R. Brafman, H. Hoos, and D. Poole, 'Reasoning with conditional ceteris paribus preference statements', in Proceedings of UAI-99, pp. 71–80, (1999).
[2] C. Boutilier, R. I. Brafman, C. Domshlak, H. Hoos, and D. Poole, 'CP-nets: A tool for reasoning with conditional ceteris paribus preference statements', Journal of Artificial Intelligence Research, 21, 135–191, (2004).
[3] R. Brafman and C. Domshlak, 'Introducing variable importance tradeoffs into CP-nets', in Proceedings of UAI-02, pp. 69–76, (2002).
[4] R. Brafman, C. Domshlak, and E. Shimony, 'On graphical modeling of preference and importance', Journal of Artificial Intelligence Research, 25, 389–424, (2006).
[5] J. Goldsmith, J. Lang, M. Truszczyński, and N. Wilson, 'The computational complexity of dominance and consistency in CP-nets', in Proceedings of IJCAI-05, pp. 144–149, (2005).
[6] M. McGeachie and J. Doyle, 'Utility functions for ceteris paribus preferences', Computational Intelligence, 20(2), 158–217, (2004).
[7] N. Wilson, 'Consistency and constrained optimisation for conditional preferences', in Proceedings of ECAI-04, pp. 888–892, (2004).
[8] N. Wilson, 'Extending CP-nets with stronger conditional preference statements', in Proceedings of AAAI-04, pp. 735–741, (2004).
[9] N. Wilson, 'An efficient upper approximation for conditional preference', in Proceedings of ECAI-06, pp. 472–476, (2006).
[10] N. Wilson, 'An efficient deduction mechanism for expressive comparative preferences languages', longer version of the current paper, available at the 4C website: http://www.4c.ucc.ie/, (2008).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-851
An Analysis of Bayesian Network Model-Approximation Techniques Adamo Santana1 and Gregory Provan2 Department of Computer Science, University College Cork, Cork, Ireland Abstract. Two approaches have been used to perform approximate inference in Bayesian networks for which exact inference is infeasible: employing an approximation algorithm, or approximating the structure. In this article we compare two structure-approximation techniques, edge-deletion and approximate structure learning based on sub-sampling, in terms of relative accuracy and computational efficiency. Our empirical results indicate that edge-deletion techniques dominate the subsampling/induction strategy, in both accuracy and performance of generating the approximate network. We show, for several large Bayesian networks, how edge-deletion can create approximate networks with order-of-magnitude inference speedups and relatively little loss of accuracy.
1 Introduction
Bayesian networks (BNs) have become an important tool for modeling and probabilistic inference. As the size and complexity of BN models increase, so too do the demands of performing inference. In cases where exact inference is intractable, it is important to use approximation techniques to enable inference to take place. Such approximation may apply to the inference algorithms (e.g., stochastic sampling algorithms [2], or other approaches [3, 8]), or to the BN model B (e.g., edge-reduction [1, 10], or probability-table/state-space approximation approaches [5, 7]). In this article, we focus on generating a space-bounded, approximate model, in cases where we have limitations on the space for embedding a BN model. Our objective is to examine the tradeoff between space and performance of different approximations, i.e., given an approximate model B′, what kinds of inference speedups do we obtain for what levels of inference accuracy, with respect to B? This goal contrasts with the objectives of previous network-approximation (e.g., edge-deletion) approaches [1, 10], where the primary interest was deleting edges while remaining within a certain error bound. Our contributions are the following. First, we compare two BN-approximation approaches, one using BN threshold-based sub-sampling and network induction, and the other using threshold-based edge deletion. We show that the sampling/induction approach is limited by the accuracy of the induction algorithm, and produces networks which are inferior to the edge-deletion approach, due to the network induction. We also show that, on a range of networks, the edge-deletion
1 adamo@ufpa.br; Supported by CAPES CBE/PDEE 0005/2007.
2 g.provan@cs.ucc.ie; Supported by SFI grant 04/IN3/I524.
approach can produce several orders-of-magnitude speedups in inference with small penalties in inference accuracy.
2 Technical Preliminaries
A BN model B is defined as a tuple (G, P), where G is a directed acyclic graph (DAG), and P is a set of probability distributions constructed from the vertices V = {Vi} of G such that Pr{V} = ∏i=1..n Pr{Vi | pa(Vi)}, where pa(Vi) are the parents of Vi in G. We compare two approaches, sub-sampling plus machine learning (SSML), and edge deletion (ED).

SSML Approach: In SSML, we generate from B a training dataset T composed of 10,000 random samples, using the GeNIe tool [4]. We then used a sampling threshold φ to prune from T all cases for which Pr(B) < φ, to create Tφ. For each value of φ examined, we induced an approximate network Bφ from Tφ using the constraint-based PC algorithm [9].

ED Approach: In ED, we generate from B an approximate network Bκ by pruning from B all those edges whose Kullback-Leibler (KL) divergence is below a threshold κ. The KL divergence [6] was chosen as the metric for indicating the importance of the dependence related to each edge of the network, since it is one of the most widely used methods for measuring the distance between distributions.

We adopt several metrics for the "quality" of an approximate network B′ with respect to B: the error on a test set is the difference in posterior probability averaged over the set Vt of target nodes; the complexity reduction factor, CT(B′)/CT(B), is the relative network complexity, based on using the maximum clique table size of B′, CT(B′), as an inference complexity measure; and the network reduction factor, S(B′, B), is a measure of the degree of isomorphism between B′ and B.
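To illustrate the ED approach, here is a small Python sketch; the particular per-edge score (average KL divergence between the child's conditional distributions and their mixture) is one standard choice consistent with the description above, not necessarily the authors' exact formula, and all function names are ours.

import math

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def edge_score(cond_dists, weights):
    """Importance of an edge U -> X: weighted average KL between
    P(X | u) for each parent value u and the mixture distribution P(X)."""
    marginal = [sum(w * d[i] for d, w in zip(cond_dists, weights))
                for i in range(len(cond_dists[0]))]
    return sum(w * kl(d, marginal) for d, w in zip(cond_dists, weights))

def prune_edges(edges, scores, kappa):
    """ED: keep only edges whose KL-based score reaches the threshold."""
    return [e for e in edges if scores[e] >= kappa]

# toy example: binary X with two parent configurations
dists = [[0.9, 0.1], [0.2, 0.8]]       # P(X | u0), P(X | u1)
print(edge_score(dists, [0.5, 0.5]))   # > 0: the edge carries information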
3 Experimental Analysis
We empirically compared the SSML and ED approaches to BN approximation using 7 benchmark networks: C17, Alarm, Hailfinder, Pignet, Barley, Munin and C250 (a circuit with 250 nodes and 500 arcs). In our experiments, we created networks based on sub-sampling thresholds of φ = e−10 , e−5 , 5e−10 , and KL thresholds κ = 0.1, 0.15 and 0.2. To test the error of each approximate network, we sampled to create a testing data set of 500 cases, such that we chose a set of “target” nodes whose posterior distributions we computed during testing. We computed several comparative measures, including the error rate for classification, the KL-divergence between
the distributions, and the maximum clique table size CT of the networks.

Figure 1 shows that, whereas for ED the CT values never increase with increasing threshold κ (meaning that the network never becomes computationally harder to evaluate), with SSML the CT values can increase with φ. This anomalous performance is due to the induction process, in which we cannot guarantee that the network structure learned will monotonically decrease in size and CT values with φ; in contrast, with ED, this is guaranteed as edge pruning occurs. This failure to guarantee that approximate networks will be computationally simpler with SSML means that it may not be possible to use this approach unless structure-based constraints can be applied during the induction phase.

Figure 1. Comparison of SSML and ED for approximation of network complexity: complexity reduction factor versus network reduction factor for the networks C17, Alarm, Hailfinder, Pignet, Barley, Munin and C250 (upper panel: Complexity Reduction Using SSML; lower panel: Complexity Reduction Using ED).

The other major difference is the computational cost of the structure approximation. The ED approach, since it only requires computing divergences and pruning the edges with low KL values, proves very efficient. In contrast, SSML has a high computational cost, since it involves computing posteriors for the original network and inducing an approximate network from a training set; both are expensive tasks for complex BNs. Since exact inference was computationally infeasible for the larger networks, we used several sampling-based inference algorithms [2], all of which generate 10,000 samples to ensure close convergence of the results to the exact value.

Figure 2 displays the results of tradeoffs made over a range of KL threshold values using ED, showing that a significant reduction in relative inference complexity occurs, with little loss of accuracy. For example, our data indicate that for the C250 network, we have O(10^6) faster inference with > 90% accuracy; for Munin, we have O(10^5) faster inference with ~ 80% accuracy.

Figure 2. Tradeoff curves for four larger networks using ED: accuracy versus relative inference complexity.

4 Conclusions

This paper compared two models for BN structure approximation, based on sub-sampling with network induction (SSML) and edge deletion (ED), to identify the types of tradeoff of inference speedup and loss of accuracy possible with each approach. We showed that SSML cannot guarantee monotonically faster inference with increasing network approximations; this arises because the network structure induced from approximate data (sampled from the original network) has high variance. In contrast, with ED, the tradeoffs of accuracy for faster inference are guaranteed to be monotonic. We have shown, for several large BNs, how ED can create approximate networks with order-of-magnitude inference speedups and relatively little loss of accuracy.

REFERENCES
[1] A. Choi and A. Darwiche, 'An Edge Deletion Semantics for Belief Propagation and Its Practical Impact on Approximation Quality', Proceedings AAAI, 21(2), 1107, (2006).
[2] B. D'Ambrosio, 'Inference in Bayesian networks', AI Magazine, 20(2), 21–36, (1999).
[3] R. Dechter and I. Rish, 'Mini-buckets: A general scheme for bounded inference', J. of the ACM, 50(2), 107–153, (2003).
[4] M.J. Druzdzel, 'GeNIe: A development environment for graphical decision-analytic models', Proc. AMIA, (1999).
[5] C.U. Kjaerulff, 'Reduction of computation complexity in Bayesian networks through removal of weak dependencies', Proc. of 10th Conf. on UAI, (1994).
[6] S. Kullback and R.A. Leibler, 'On Information and Sufficiency', Annals of Math. Stat., 22(1), 79–86, (1951).
[7] C.L. Liu and M.P. Wellman, 'Bounding probabilistic relationships in Bayesian networks using qualitative influences: methods and applications', International Journal of Approximate Reasoning, 36(1), 31–73, (2004).
[8] F.T. Ramos and F.G. Cozman, 'Anytime anyspace probabilistic inference', Int. J. of Approx. Reasoning, 38, 53–80, (2005).
[9] P. Spirtes, C.N. Glymour, and R. Scheines, Causation, Prediction, and Search, MIT Press, 2000.
[10] R.A. van Engelen, 'Approximating Bayesian belief networks by arc removal', IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(8), 916–920, (1997).
7. Distributed and Multi-Agents Systems
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-855
Verifying the Conformance of Agents with Multiparty Protocols
Laura Giordano1 and Alberto Martelli2
1 Dipartimento di Informatica, Università del Piemonte Orientale, Alessandria
2 Dipartimento di Informatica, Università di Torino, Torino
Abstract. The paper defines a notion of conformance of a set of k agents with a multiparty protocol with k roles, requiring the agents to be interoperable and to produce correct executions of the protocol. Conditions are introduced that enable each agent to be independently verified with respect to the protocol.
1 Introduction

In an open environment, the interaction of agents is ruled by interaction protocols on which agents commonly agree. An important issue, in this regard, concerns agent conformance with the protocol. Although agent policy may somehow deviate from the behavior dictated by the protocol, in some cases we want, nevertheless, to regard the policy as being compatible with the protocol. In this paper, we define a notion of conformance of a set of agents A1, . . . , Ak with a multiparty protocol P. This notion must assure that agents A1, . . . , Ak interoperate and that their interactions produce correct executions of the protocol. We introduce a notion of interoperability among a set of agents, which guarantees that the agents interact properly. More precisely, each agent can freely choose among its possible emissions without the computation getting stuck. Verifying conformance of a set of agents altogether, however, is not feasible in an open environment, as, in general, the internal behavior of all agents participating in a protocol is not known. The verification of each agent participating in the protocol must be done independently. To this purpose, we introduce a definition of conformance of a single agent Ai (playing role i) with the protocol P. We prove that a set of agents, independently conformant with the protocol P, are guaranteed to be interoperable and to produce correct executions of P.

2 Protocol Specification

The specification of interaction protocols we adopt is based on the Product version of Dynamic Linear Time Temporal Logic (DLTL) [5], a temporal logic which extends LTL by allowing the until operator to be indexed by programs in Propositional Dynamic Logic (PDL). The Product version of DLTL allows one to capture the behavior of a network of sequential agents, which coordinate their activities by performing common actions together. In our proposal, the specification of agents and protocols is given in a temporal action theory [4], by means of temporal constraints, and the communication among agents is synchronous. Protocols are given a declarative specification consisting of: (i) the specification of communicative actions by means of their effects and preconditions on the social state which, in particular, includes commitments; (ii) a set of temporal constraints, which specify the wanted interactions. Protocols with nonterminating computations, modeling reactive services, can also be captured in this framework. We define a multiparty protocol P with k roles by separately specifying the behavior of all roles P1, . . . , Pk in the protocol. Consider the following example.

Example 1 (Purchase protocol) We have three roles: the merchant (mr), the customer (ct) and the bank (bk). ct sends a request to mr; mr replies with an offer or by saying that the requested good is not available. If ct receives the offer, it may either accept the offer and send a payment request to bk, or refuse the offer. If ct accepts the offer, then mr delivers the goods. If ct requires bk to pay mr, bk sends the payment. ct can send the request for payment to bk even before he has received the goods.
The Purchase protocol Pu can be specified by separately defining the protocols of the three participating agents: Pct, Pmr and Pbk. The role Pi in the protocol is specified by a domain description Di, which is a pair (Πi, Ci), where Πi is a set of formulas describing the effects of actions, including action laws and causal laws (the action theory) of agent i, and Ci is a set of constraints that the executions of agent i must satisfy (including precondition laws). A social approach is adopted and an interaction protocol is specified by describing the effects of communicative actions on the social state, including agents' commitments and permissions. The approach is a generalization of the one proposed in [4]. Given Di = (Πi, Ci) as defined above, we let Pi = Πi ∧ Ci. Once the protocols Pct, Pmr and Pbk are defined, the specification Pu of the Purchase protocol can be given as follows: Pu = Pct ∧ Pmr ∧ Pbk. The runs of the protocol are defined to be the linear models of Pu, namely, infinite linear sequences of worlds (propositional interpretations), each one reachable from the initial world by a finite sequence τ of actions. The runs of Pu are all runs that can be obtained by interleaving the actions of the runs of Pct, Pmr and Pbk, while synchronizing on common actions. By projecting the runs of the protocol Pu to the alphabets of the participating roles, we get runs of each role Pct, Pmr and Pbk. The i-th projection of a run σ of a protocol P is an infinite run σ|i of Pi.
3 Interoperability
Let A1, . . . , Ak be a set of agents given through a logical specification, such as the one introduced in Section 2. The executions of A1, . . . , Ak are the runs of A1 ∧ . . . ∧ Ak, obtained by interleaving the executions of the Ai's. As the properties we will consider
in this paper regard only the sequences of communicative actions exchanged between agents, in the following we will consider runs as infinite sequences of actions, and disregard worlds. We want to define interoperability of A1, . . . , Ak so that their interaction cannot get stuck, when each agent is free to choose its emissions at each step. This requirement is stronger than simply requiring absence of deadlock. Let πi be the prefix of a run of Ai. To model the fact that Ai must be able to choose which action to execute after πi, we define a function choice(Ai, πi), whose value is either a send action m(i, j), taken from the set {m1(i, j1), . . . , mn(i, jn)} of all the actions Ai can execute after πi, or the value receive, if Ai can execute a receive action after πi. In the latter case, Ai expects to receive a message from another agent. While we have assumed that agents can choose among the messages they want to send, we have postulated that they cannot decide which message they will receive among those they are able to receive in a given state. This choice is left to the environment.

Definition 1 We say that A1, . . . , Ak are interoperable if, given a function choice and a sequence π of actions such that, for each i, π|i is a prefix of a run of Ai, there exists a run σ of A1 . . . Ak with prefix πm(i, j), such that choice(Ai, π|i) = m(i, j) and choice(Aj, π|j) = receive, for some i and j.

According to the above definition, any prefix obtained by the execution of A1, . . . , Ak can be extended with a new action according to the choice function. In particular, each agent can choose which action it wants to execute at each stage of the computation and, eventually, it can execute such an action.
4 Conformance
Given a protocol P = P1 ∧ . . . ∧ Pk with k roles, we define the conformance of a set of agents A1, . . . , Ak with P by requiring that the interaction of the agents cannot give rise to executions which are not runs of P. Moreover, we require A1, . . . , Ak to be interoperable.

Definition 2 Agents A1, . . . , Ak are conformant with P if: (a) A1, . . . , Ak are interoperable, and (b) the executions of A1, . . . , Ak are runs of the protocol P.

In this section we want to introduce a notion of conformance of a single agent Ai with the protocol P, so that the conformance of each Ai, proved independently, guarantees the conformance of the overall set of agents A1, . . . , Ak with P, according to Definition 2. Most proposals in the literature rely on a notion of conformance based on the policy: less emissions and more receptions [2, 1, 3]. Consider, for instance, a customer agent Act whose behavior differs from that of the role "customer" of protocol Pu as follows: whenever it receives an offer from the merchant, it always accepts it; after accepting the offer it expects to receive from Pmr either sendGoods or cancelDelivery. Although the behavior of Act and that of the corresponding role of the protocol are different, we could nevertheless consider the agent to be conformant with the protocol, since the customer can choose which messages to send, and thus it is not forced to send all the messages required by the protocol. Also, the agent can receive more messages than those required by the protocol, since these receptions will never be executed. Unfortunately, this argument holds only for two-party protocols, as shown in the next example.

Example 2 Consider protocol Pu. Assume that mr has the requirement that he can receive the payment from bk only after he has received the acceptance of the offer from ct. According to the protocols of ct and bk, the message sendPayment can be sent from bk to mr either before or after the message sendAccept is sent from ct to mr. Although ct and bk do not put constraints on the order in which they send the acceptance of the offer and the payment to mr, in the overall protocol Pu they are forced to respect the constraint of the merchant. Only the runs in which sendAccept is executed before sendPayment are accepted as runs of Pu. Let us now consider an agent Amr with the following behavior: either it receives a message sendAccept followed by a message sendPayment, or it receives a message sendPayment followed by a message sendAccept. Although Amr respects the policy "less emissions and more receptions", it cannot be considered to be conformant with Pu, as it may produce an execution in which sendPayment comes before sendAccept.

A similar example is discussed in [6], where the problem of conformance checking is analyzed for models of asynchronous message passing software. There, a very restrictive policy is proposed to solve the problem, namely, the policy less emissions and same receptions. Here, we propose a definition of the conformance of an agent Ai with respect to the overall protocol P, rather than to its role Pi. Besides referring to the runs of a protocol P = P1 ∧ . . . ∧ Pk, we need to refer to the runs obtained by executing an agent Ai in the context of the protocol P. Let: P[Ai] = P1 ∧ . . . ∧ Pi−1 ∧ Ai ∧ Pi+1 ∧ . . . ∧ Pk. The definition of conformance we introduce below, on the one hand, requires (condition C1) that, for each agent Ai, P[Ai] is interoperable. On the other hand, it requires that the executions of Ai are correct for both emissions and receptions (condition C2), and complete for receptions (condition C3), when Ai is interacting with other agents respecting the protocol P.

Definition 3 An agent Ai is conformant with a protocol P = P1 ∧ . . . ∧ Pk when the following conditions are satisfied:
(C1) INTEROPERABILITY – P[Ai] is interoperable;
(C2) CORRECTNESS – all runs of P[Ai] are runs of P;
(C3) COMPLETENESS – whenever there are two runs, σP of P and σP[Ai] of P[Ai], such that π is a prefix of σP and of σP[Ai], if action m(j, i) is executed after the prefix π in σP, then there is a run σ′P[Ai] of P[Ai] with prefix πm(j, i).

We can prove the following result:

Theorem 1 Let P = P1 ∧ . . . ∧ Pk be an interoperable protocol and let, for each i = 1, . . . , k, the agent Ai be conformant with the protocol P according to Definition 3. Then agents A1, . . . , Ak are conformant with P according to Definition 2.

The problem of verifying the conformance of an agent with a protocol can be solved by working on the Büchi automaton which can be extracted from the logical specification of the protocol. For the two-party case, automata-based techniques have been studied in [3].

REFERENCES
[1] M. Baldoni, C. Baroglio, A. Martelli, Patti. Verification of protocol conformance and agent interoperability. CLIMA'06, LNCS 3900, 265–283.
[2] L. Bordeaux, G. Salaün, D. Berardi, M. Mecella. When are two web-agents compatible? VLDB-TES 2004.
[3] L. Giordano and A. Martelli. Verifying Agent Conformance with Protocols Specified in a Temporal Action Logic. AI*IA 2007, LNAI 4733.
[4] L. Giordano, A. Martelli, C. Schwind. Specifying and Verifying Interaction Protocols in a Temporal Action Logic. J. Applied Logic, 5 (2007).
[5] J.G. Henriksen and P.S. Thiagarajan. A product version of Dynamic Linear Time Temporal Logic. CONCUR'97, LNCS 1243, 45–58, 1997.
[6] S.K. Rajamani and J. Rehof. Conformance checking for models of asynchronous message passing software. CAV'02, 166–179, Springer, 2002.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-857
Simulated Annealing for Coalition Formation
Helena Keinänen and Misa Keinänen

1 INTRODUCTION

We study coalition formation in characteristic function games (CFGs) [4, 5]. Consider an n-person cooperative game where A is the set of agents. A coalition C is any non-empty subset of A, i.e., C ⊆ A such that C ≠ ∅. In CFGs a characteristic function v assigns real values (worths) to coalitions, such that the function may be incomplete. A coalition structure CS is a partition of A into mutually disjoint coalitions, in which, for all distinct coalitions Ci, Cj ∈ CS, we have Ci ∩ Cj = ∅, and the union of all coalitions in CS is A. The value of a coalition structure is called social welfare, and it is defined as V(CS) = Σ_{C∈CS} v(C). Given a set of agents A together with a characteristic function v, our aim is to find a coalition structure CS with maximum social welfare. It is shown in [5] that finding a social welfare maximizing coalition structure is an NP-complete problem, and that the number of coalition structures is O(n^n) and ω(n^{n/2}). Motivated by the observations in [6, 7] that genetic algorithms provide a useful tool for searching the maximal sum of the values of coalitions, we show that simulated annealing (SA) [1, 3] also provides a very competitive approach to the problem. We observe that the SA algorithm with a suitable neighbourhood relation often finds better values, or even the optimal coalition structures, well before the state-of-the-art algorithms in [2, 4, 5].
2 SA FOR COALITION FORMATION

Algorithm 1 shows our SA algorithm for optimizing the social welfare of a CFG. The algorithm takes a characteristic function v for an n-agent CFG as its input. Additional inputs are an iteration limit c_max, an initial temperature t_init, and a cooling ratio alpha. The counter c keeps track of the number of iterations. CS_best records the coalition structure with the highest social welfare among the ones seen. At each iteration a random neighbour CS' of the current coalition structure CS is picked according to a specific neighbourhood Neighbour(CS). The search proceeds with the adjacent coalition structure CS' if CS' yields a better social welfare than the original coalition structure CS. Otherwise, the search is continued with CS' with probability e^((V(CS')−V(CS))/t), where t is the current temperature. The temperature decreases after each iteration according to the annealing schedule t = alpha·t, where 0 < alpha < 1. The performance of SA algorithms is very sensitive to parameter adjustments as well as to neighbourhood selection. Given a set of agents together with a characteristic function v, let S denote the set of all coalition structures that can be formed. The neighbourhood is a function which maps coalition structures to the sets of their neighbour coalition structures.

1 Helsinki University of Technology, Finland, helena.keinanen@tkk.fi
2 Sampo Life Insurance Company Ltd., Finland, misa.keinanen@gmail.com
We found that the following two neighbourhoods are particularly appropriate for Algorithm 1. Split/merge neighbourhood, in which CS' ∈ Neighbour(CS) if and only if CS' can be obtained from CS by either (i) splitting one coalition in CS into two disjoint coalitions in CS', or (ii) merging two distinct coalitions of CS into a single coalition in CS'. Shift neighbourhood, in which CS' ∈ Neighbour(CS) if and only if CS' can be obtained from CS by shifting exactly one agent from a coalition to another coalition.

Algorithm 1
Inputs: c_max, t_init, alpha
External: V()
c = 0; t = t_init;
CS = random initial coalition structure;
CS_best = CS;
while c < c_max do
  CS' = random neighbour of CS in Neighbour(CS);
  if V(CS') > V(CS) then
    CS = CS';
    if V(CS) > V(CS_best) then CS_best = CS;
  else
    with probability e^((V(CS')-V(CS))/t) CS = CS';
  c = c+1; t = alpha*t;
return CS_best;
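A compact Python rendering of Algorithm 1 with the shift neighbourhood is sketched below; starting from the grand coalition and the particular parameter values are illustrative assumptions of ours.

import math
import random
from itertools import combinations

def social_welfare(cs, v):
    """V(CS): sum of coalition values; coalitions are frozensets."""
    return sum(v[c] for c in cs)

def shift_neighbour(cs):
    """Shift neighbourhood: move exactly one agent to another coalition
    (possibly a newly created one)."""
    parts = [set(c) for c in cs]
    src = random.choice(parts)
    agent = random.choice(sorted(src))
    parts.append(set())                 # allow shifting into a new coalition
    dest = random.choice([c for c in parts if c is not src])
    src.remove(agent)
    dest.add(agent)
    return frozenset(frozenset(c) for c in parts if c)

def simulated_annealing(agents, v, c_max=2000, t_init=50.0, alpha=0.999):
    cs = frozenset([frozenset(agents)])  # start from the grand coalition
    best, t = cs, t_init
    for _ in range(c_max):
        cand = shift_neighbour(cs)
        delta = social_welfare(cand, v) - social_welfare(cs, v)
        if delta > 0 or random.random() < math.exp(delta / t):
            cs = cand
        if social_welfare(cs, v) > social_welfare(best, v):
            best = cs
        t *= alpha                       # annealing schedule t = alpha*t
    return best

agents = [1, 2, 3]
v = {frozenset(c): random.random()
     for r in range(1, 4) for c in combinations(agents, r)}
print(simulated_annealing(agents, v))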
3 EXPERIMENTAL RESULTS

We have implemented Algorithm 1 in C, and evaluated its performance on CFG problems. As our benchmarks we use problems from [2, 4, 5, 6]. In the following we present experimental results considering in particular solution quality, robustness, and runtime performance of the various algorithms. Fig. 1 (left) shows a robustness comparison of the split/merge and shift neighbourhoods for the SA algorithm on 300 randomly generated 10-agent CFG problem instances. For each problem instance, an incomplete characteristic function was generated by assigning random coalition values drawn from a fixed interval. An exhaustive search was first used to find social welfare maximizing coalition structures, and then the SA algorithm was used to find optimal solutions in the following way. For both neighbourhoods, we executed 11 runs on every problem instance with approximately optimal parameters t_init and alpha. The runtime limit for each run was set to 100000 coalition structures. We plot the minimum execution times of the 11 runs to find an optimal social welfare. The shift neighbourhood is much more robust than the split/merge. SA with the shift neighbourhood is able to find the optimum solution in 298 of the 300 instances. In contrast, SA with the split/merge times out in 136 instances without finding an optimum solution. The SA with the shift neighbourhood is mostly able to find the optimum values with substantially fewer search steps than SA with the split/merge. Irrespective of the parameter variation, the behaviour of the shift neighbourhood was superior. To compare the solution qualities of SA with the two different neighbourhoods, we investigate the behaviours on 100 randomly
Figure 1. Comparisons of neighbourhood relations on random CFGs. Left: minimum runtime (number of seen coalition structures) to find optimal social welfare, split/merge versus shift. Right: maximum social welfares found, split/merge versus shift.
generated 20-agent CFG problem instances, again with random coalition values drawn from a fixed interval. Fig. 1 (right) shows the correlation between the solution quality of SA with the split/merge neighbourhood and SA with the shift neighbourhood. The plot illustrates the maximum social welfares found, measured from 11 runs per neighbourhood. The runtime limit was set to a fixed number of coalition structures, and we used an approximately optimal annealing schedule. These results clearly show that SA with the shift neighbourhood outperforms SA with the split/merge neighbourhood. We have also implemented in C the algorithms presented in [2, 4, 5], and a random search on the graph induced by the neighbourhood relations. We compared the performances of SA, Random search and the anytime algorithms on a set of randomly generated 10-agent CFGs with coalition values picked randomly from a uniform distribution. Fig. 2 shows the cumulative solution qualities over runtime (measured as seen coalition structures) on a representative problem instance. The cooling ratio alpha for SA is fixed to 0.8. Both SA and Random search are run only once. The SA algorithm finds good solutions very quickly. The SA with the shift neighbourhood finds the optimum within a short runtime, and
also SA with the split/merge neighbourhood climbs very close to the optimum. Random search with both neighbourhoods quickly finds relatively good solutions. However, like SA with the split/merge neighbourhood, Random search does not find any maximal social welfare. The anytime algorithm searches for a long time without finding good solutions, but then finally sees a coalition structure with maximal social welfare. Finally, we conducted further experiments on 100 random 20-agent CFGs with random coalition values. For each problem instance, we collected the minimal, median and maximal social welfares measured from 11 runs per algorithm. In these tests, the runtime limit for all algorithms was set to a fixed number of seen coalition structures. We used SA with the shift neighbourhood, and the SA parameters were the approximately optimal ones. The results are consistent with the results of the previous experiments. For all problem instances the SA algorithm substantially outperforms the anytime algorithms of [2, 4, 5]. Notably, every social welfare found with the anytime algorithms [2, 4, 5] is smaller than 2, whereas SA always finds social welfares better than 9. The results with SA thus provide an improvement on the order of a factor of 5.
Figure 2. A comparison of SA, Random search and Anytime algorithms. (Cumulative relative solution quality over run time, measured as the number of seen coalition structures, for the Anytime algorithm, Random search and SA with the split/merge and shift neighbourhoods.)

REFERENCES
[1] V. Černý, ‘Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm’, J. of Optimization Theory and Applications, 45, 41–51, (1985). [2] V.D. Dang and N.R. Jennings, ‘Generating coalition structures with finite bound from the optimal guarantees’, in Proc. 3rd Int. Conf. on Autonomous Agents and Multi-Agent Systems, 546–571, 2004. [3] S. Kirkpatrick, C.D. Gelatt, Jr. and M.P. Vecchi, ‘Optimization by simulated annealing’, Science, 220, 671–680, (1983). [4] K.S. Larson and T.W. Sandholm, ‘Anytime coalition structure generation: An average case study’, J. Expt. Theor. Artif. Intell., 12, 23–42, (2000). [5] T. Sandholm, K. Larson, M. Andersson, O. Shehory and F. Tohmé, ‘Coalition structure generation with worst case guarantees’, Artificial Intelligence, 111, 209–239, (1999). [6] S. Sen and P.S. Dutta, ‘Searching for optimal coalition structures’, in Proc. 4th Int. Conf. on Multi-agent Systems, 286–292, 2000. [7] J. Yang and Z. Luo, ‘Coalition formation mechanism in multi-agent systems based on genetic algorithms’, Applied Soft Computing, 7, 561–568, (2007).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-859
A Default Logic Based Framework for Argumentation Emanuel Santos 1 and João Pavão Martins 2 Abstract. We extend the logic-based framework of Besnard and Hunter for Default Logic. We present structurally sound results that provide a natural extension, and introduce new concepts that enable us to characterize an argument based on its use of incomplete information.
1 Introduction
We extend the logic-based framework of Besnard and Hunter [1], for a non-monotonic logic, namely Default Logic (DL), in order to make possible the construction of arguments based on non-monotonic reasoning. We present structurally sound results that provide a natural extension for the concepts defined in the framework by Besnard and Hunter and for the model-theoretic evaluation introduced by Hunter [4]. We also introduce the concepts of justificative and counter-justificative argument, which enable us to characterize an argument based on the use of incomplete information to support its conclusions.
2 Definitions and Results
We define some basic concepts that are used in the extended definition of argument. In DL [6], a default theory is a pair (R, Δ), composed of a set of default rules, R, and a set of closed wffs, Δ (Δ ⊂ L_FOL). We only consider default theories (R, Δ) such that Δ ⊂ L_PL. Definition 1 Let (R, Δ) be a default theory. (R, Δ) is unique if it has only one extension, Ω, given by Ext1((R, Δ)). (R, Δ) is minimum if it is unique, with an extension Ω, and there is no unique default theory (R′, Δ′) such that R′ ⊂ R, Δ′ ⊆ Δ and Ext1((R′, Δ′)) = Ω. Definition 2 Let (R, Δ) be a default theory. (R, Δ) ⊩ α if (R, Δ) is unique, with an extension Ω, and α ∈ Ω. Theorem 1 (Monotonicity of ⊩) Let (R, Δ) and (R′, Δ′) be minimum default theories. If (R, Δ) ⊩ α, R ⊆ R′ and Δ ⊆ Δ′, then (R′, Δ′) ⊩ α. Definition 3 Let (R, Δ) be a default theory. (R, Δ) is minimum with respect to α if (R, Δ) ⊩ α and there is no default theory (R′, Δ′) such that R′ ⊂ R, Δ′ ⊆ Δ and (R′, Δ′) ⊩ α. We use α, β, γ, ... to denote formulae, Δ, Φ, Ψ, ... to denote sets of formulae, A, B, C, ... to denote arguments and R, T, S, ... to denote sets of default rules. Θ = (R_Θ, Δ_Θ) denotes a default theory
1 PhD Student, Instituto Superior Técnico, Technical University of Lisbon, Portugal, email: esantos@ist.utl.pt. Supported by Fundação para a Ciência e Tecnologia under PhD grant SFRH/BD/27253/2006. 2 Instituto Superior Técnico, Technical University of Lisbon, Portugal.
that represents an information repository from which arguments can be constructed, and which may have no extensions. We assume that for every subset of a repository Θ = (R_Θ, Δ_Θ) there is a unique canonical enumeration φ1, ..., φn, ϕ1, ..., ϕl, ψ1, ..., ψm, such that Δ_Θ = {φ1, ..., φn}, C(R_Θ) = {ϕ1, ..., ϕl} and J(R_Θ) = {ψ1, ..., ψm}.3 To enable the construction of arguments based on unknown information using DL, we extend the definition of argument presented in [1]. Definition 4 An argument is a pair ⟨(R, Δ), α⟩ such that: 1) (R, Δ) ⊮ ⊥; 2) (R, Δ) is minimum wrt α; 3) there is no Δ′ ⊂ Δ such that (R, Δ′) ⊩ α and (Ext1((R, Δ)) − Ext1((R, Δ′))) ∩ J(R) = ∅. We say that ⟨(R, Δ), α⟩ is an argument for α, that α is the consequent (conclusion) of the argument, and that (R, Δ) is the support of the argument. The use of minimum default theories enables us to easily extend the concept of "classical" argument to a non-monotonic context, using DL. We are able to construct arguments based on unknown information and easily extend all the other definitions and results presented in [1] and [4], which are mainly based on Theorem 1. Definition 5 An argument A = ⟨(R, Δ), α⟩ is a sub-argument of an argument B = ⟨(T, Ψ), β⟩ if R ⊆ T and Δ ⊆ Ψ. If also β ⊢ α, A is said to be more conservative than B. From Definition 4, we can construct arguments that depend on unknown information to derive their conclusions. In order to distinguish these arguments from the rest, the definitions of justificative and total justificative argument are introduced. Definition 6 An argument ⟨(R, Δ), α⟩ is justificative if ∀β ∈ J(R): Δ ⊢ β.
Definition 7 (Recursive) An argument ⟨(R, Δ), α⟩ is total justificative if ∀r ∈ R, ∀β ∈ J(r), there exists an argument ⟨(R′, Δ′), β⟩, with R′ ⊆ R − {r} and Δ′ ⊆ Δ, which is justificative or total justificative. Given Definition 7, an argument is total justificative if it doesn't "use" unknown information, through the justifications of its default rules, to derive its conclusion. Definition 8 An argument A = ⟨(R, Δ), α⟩ is justificative wrt β if there exists a total justificative sub-argument of A with consequent β.
3 For a set R of default rules and a default rule r = α : β1, ..., βm / γ, we define P(r) = α, J(r) = {β1, ..., βm}, C(r) = γ, and P(R), J(R) and C(R) as their respective unions.
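For concreteness, a small sketch of this bookkeeping (our own illustrative encoding, not the authors' implementation), with P, J and C lifted to rule sets by union as in the footnote:

    from typing import NamedTuple, FrozenSet

    class Default(NamedTuple):
        prereq: str                 # P(r) = alpha
        justifs: FrozenSet[str]     # J(r) = {beta_1, ..., beta_m}
        conseq: str                 # C(r) = gamma

    def P(rules): return {r.prereq for r in rules}
    def C(rules): return {r.conseq for r in rules}
    def J(rules):
        out = set()
        for r in rules:
            out |= r.justifs
        return out

    # Example: the rule alpha : beta / gamma used by argument A below.
    r = Default("alpha", frozenset({"beta"}), "gamma")
    print(P({r}), J({r}), C({r}))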
Given Definition 8, an argument is justificative wrt β if it is possible to derive β without using unknown information. In the following we extend the notion of undercut, introduced in [4], which is used to represent a counter-argument. Definition 9 An argument ⟨(R′, Δ′), ¬(φ1 ∧ ... ∧ φn ∧ ϕ1 ∧ ... ∧ ϕl) ∧ ¬ψ1 ∧ ... ∧ ¬ψm⟩ is an undercut of an argument ⟨(R, Δ), α⟩ if {φ1, ..., φn} ⊆ Δ, {ψ1, ..., ψm} ⊆ J(R) and {ϕ1, ..., ϕl} ⊆ C(R).
Definition 15 [4] The recursive empathy (r.e.) for an argument tree T with a beliefbase Γ, denoted EPRΓ(T), is given by Fe(Ar), where Ar is the root of T. Example 1 Let Θ = (R_Θ, Δ_Θ) be a repository such that Δ_Θ = {α, β, π → ¬α, π, δ, ω, λ → ¬γ, λ, λ → ω, η} and R_Θ = {α:β/γ, δ:ω/α, η:ρ/β, δ:¬β/¬β}. For this repository, we construct the following argument tree T for γ:6
Definition 10 An undercut A = ⟨(R, Δ), α⟩ for B = ⟨(T, Ψ), γ⟩ is counter-justificative if there is no argument C, that is an undercut of A and a sub-argument of B, such that ∀β ∈ JustArg(C): α ⊬ ¬β.4
[Argument tree T for γ; arrows in the original figure give the undercut relation:
A = ⟨({α:β/γ}, {β, α}), γ⟩ (root);
undercuts of A: B = ⟨(∅, {π → ¬α, π}), $⟩, C = ⟨(∅, {λ → ¬γ, λ}), $⟩ and D = ⟨({δ:¬β/¬β}, {δ}), $ ∧ ¬β⟩;
further down: E = ⟨({δ:ω/α}, {δ}), $⟩ and F = ⟨({δ:ω/α}, {δ, λ → ω, λ}), $⟩ undercut B, and G = ⟨({η:ρ/β}, {η}), $ ∧ ¬¬β⟩ undercuts D.]
An undercut A for an argument B is counter-justificative (wrt B) if B can't "defend" itself from the attack. This often happens when an undercut "attacks" a justification of the target argument with respect to which the target is not justificative. Definition 11 An argument A = ⟨(R, Δ), ¬(φ1 ∧ ... ∧ φn ∧ ϕ1 ∧ ... ∧ ϕl) ∧ ¬ψj ∧ ... ∧ ¬ψk⟩ is a canonical undercut for ⟨(T, Ψ), α⟩ if there is no argument B = ⟨(R, Δ), ¬(φ1 ∧ ... ∧ φn ∧ ϕ1 ∧ ... ∧ ϕl) ∧ ¬ψi ∧ ... ∧ ¬ψp⟩ less conservative than A, where φ1, ..., φn, ϕ1, ..., ϕl, ψ1, ..., ψm is a canonical enumeration of (T, Ψ), 0 ≤ j ≤ k ≤ m and 0 ≤ i ≤ p ≤ m. We extend the definition of canonical undercut, because of the existence of default-rule justifications, in order to let one argument represent an infinite set of "equivalent" arguments. Based on this concept we also extend the definition of argument tree presented in [1]: Definition 12 An argumentation tree for α is a tree whose nodes are arguments such that: 1) the root is an argument for α; 2) no node ⟨(T, Ψ), β⟩ has ancestor nodes ⟨(T1, Ψ1), β1⟩, ..., ⟨(Tn, Ψn), βn⟩ such that Ψ ⊆ Ψ1 ∪ ... ∪ Ψn and T ⊆ T1 ∪ ... ∪ Tn; 3) the children nodes of a node A consist of all canonical undercuts for A that obey 2. The evaluation of an argument is done through the comparison of its support with a consistent set of formulae, called a beliefbase, which denotes the beliefs of the intended audience of the argument. We use the concept of degree of entailment (DE) introduced in [4] to extend the notions of empathy and recursive empathy. Definition 13 Let Γ be a beliefbase and A = ⟨(R, Δ), α⟩ an argument. The empathy for the argument A, EPΓ(A), is defined as EPΓ(A) = DE(Γ, Δ ∪ C(R) ∪ J(R)). An empathy value of one, between a beliefbase and an argument, means that the audience, represented by the beliefbase, completely agrees with that argument. A zero empathy means that the audience disagrees with that argument. Definition 14 Let T be an argument tree. The function Fe is defined for every node Ai of T in the following manner, where eAi = EPΓ(Ai), aAi = Max({−1} ∪ {Fe(Aj) | Aj ∈ ChildrenNCJ(T, Ai)}) and a′Ai = Max({−1} ∪ {Fe(Aj) | Aj ∈ ChildrenCJ(T, Ai)}):5

    Fe(Ai) = −aAi    if aAi > eAi
             −a′Ai   if aAi ≤ eAi and a′Ai > 0
             0       if aAi = eAi and a′Ai ≤ 0
             eAi     if aAi < eAi and a′Ai ≤ 0
4 JustArg(A) = {β | β ∈ J(R) and A is not justificative wrt β}. 5 ChildrenCJ(T, Ai) = {Aj | Aj ∈ Children(T, Ai) and Aj is a counter-justificative undercut of Ai}. ChildrenNCJ(T, Ai) = {Aj | Aj ∈ Children(T, Ai) and there is no sub-argument of Ai that is a counter-justificative undercut of Aj}.
For Γ = {¬λ, δ}, the arguments of T are evaluated below:

    Argument   EPΓ    Fe
    A          1/8    1/8
    B          1/4    0
    C          0      0
    D          1/2    −1/8
    E          1/4    1/4
    F          0      0
    G          1/8    1/8
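A sketch of the recursive computation of Fe (Definition 14) that reproduces this table; which children are counter-justificative, and the primed/unprimed cases, follow our reading of the garbled original and should be checked against [4]:

    def fe(node, ep, ncj, cj):
        # ep[node]: empathy EP_Gamma of the argument at this node;
        # ncj/cj: non-counter-justificative and counter-justificative
        # children, as in footnote 5.
        e = ep[node]
        a = max([-1.0] + [fe(c, ep, ncj, cj) for c in ncj.get(node, [])])
        a2 = max([-1.0] + [fe(c, ep, ncj, cj) for c in cj.get(node, [])])
        if a > e:
            return -a
        if a2 > 0:                       # reached only when a <= e
            return -a2
        return 0.0 if a == e else e      # a2 <= 0

    # Example 1: D is a counter-justificative child of A, G of D;
    # the remaining edges are taken as non-counter-justificative.
    ep = {"A": 1/8, "B": 1/4, "C": 0.0, "D": 1/2,
          "E": 1/4, "F": 0.0, "G": 1/8}
    ncj = {"A": ["B", "C"], "B": ["E", "F"]}
    cj = {"A": ["D"], "D": ["G"]}
    print(fe("A", ep, ncj, cj))          # -> 0.125 = EPR_Gamma(T)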
Given these results we have that EPRΓ(T) = 1/8. This means that T has a low but positive recursive empathy for the beliefbase Γ. Theorem 2 (Extension) If every argument ⟨(R, Δ), α⟩ has R = ∅, then Definitions 4, 5, 9, 11, 12, 13 and 14 are equivalent to the respective definitions introduced in [4].
3 Discussion, Conclusions and Future Work
The goal of this paper has been to extend the framework presented in [1], and some of the techniques introduced in [4], to a non-monotonic logic, namely DL. Theorem 2 shows that our framework is an extension of the framework of [1]. Besides providing an extension, we also introduced the concepts of justificative argument and counter-justificative undercut, which allow us to differentiate arguments based on their use of unknown information through justifications. The choice of DL is justified by the desire to implement this framework in a system which uses this logic. The generalization of the concepts of information repository and beliefbase, the development of further evaluation techniques, and the implementation of such a system in knowledge-based agents are subjects of ongoing research.
REFERENCES [1] P. Besnard and A. Hunter, ‘A logic-based theory of deductive arguments’, Artificial Intelligence, 128, 203–235, (2001). [2] P. Besnard and A. Hunter, ‘Practical first-order argumentation’, AAAI, (2005). [3] P. Dung, R. Kowalski, and F. Toni, ‘Dialectic proof procedures for assumption-based, admissible argumentation’, Artificial Intelligence, 170, 114–159, (2006). [4] A. Hunter, ‘Making argumentation more believable’, AAAI, (2004). [5] H. Prakken and G. Vreeswijk, ‘Logical systems for defeasible argumentation’, Handbook of Philosophical Logic, 1–87, (2002). [6] R. Reiter, ‘A logic for default reasoning’, Artificial Intelligence, 13, 81–132, (1980). [7] E. Santos, ‘Argumentação baseada em lógica não-monótona’, Simpósio Doutoral de Inteligência Artificial - EPIA, 125–135, (2007).
6 As a notational convenience, the symbol $ is used to denote the wff ¬(φ1 ∧ ... ∧ φn ∧ ϕ1 ∧ ... ∧ ϕl) with respect to the canonical enumeration φ1, ..., φn, ϕ1, ..., ϕl, ψ1, ..., ψm.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-861
An Empirical Investigation of the Adversarial Activity Model Inon Zuckerman and Sarit Kraus 1 and Jeffrey S. Rosenschein 2 Abstract. Multiagent research provides an extensive literature on formal Belief-Desire-Intention (BDI) based models describing the notions of teamwork and cooperation, but adversarial and competitive relationships have received very little formal BDI treatment. Moreover, one of the main roles of such models is to serve as design guidelines for the creation of agents, and while there is work illustrating that role in cooperative interaction, there has been no empirical work done to validate competitive BDI models. In this work we use the Adversarial Activity model, a BDI-based model for bounded rational agents that are operating in a general zero-sum environment, as an architectural guideline for building bounded rational agents in two adversarial environments: the Connect-four game (a bilateral environment) and the Risk strategic board game (a multilateral environment). We carry out extensive simulations that illustrate the advantages and limitations of using this model as a design specification.
1 Introduction
Formal Belief-Desire-Intention (BDI) [1] based models of cooperation and teamwork have been extensively explored in multiagent worlds. They provide firm theoretical foundations and guidelines for the design of cooperative automated agents [4, 2]. However, as cooperation and teamwork led the research agenda, little work was done on providing BDI-based models for adversarial or competitive interactions that naturally occur in multiagent environments. The desire to adapt BDI-based models for competitive interactions comes from their successful implementation in teamwork domains [5] and the limitations of classical solutions in complex adversarial interactions. Recently, the Adversarial Activity (AA) model [6] was presented: a formal BDI-based model for bounded rational agents in zero-sum adversarial environments. Alongside the model were also presented several behavioral axioms that should be used when an agent finds itself in an Adversarial Activity. However, the discussion in [6] lacked empirical work to validate the advantages as well as the limitations of those behavioral axioms in adversarial domains. Our aim here is to fill that gap, demonstrate how the AA model can be used as a design specification, and investigate its usefulness for bounded rational agents. We will explore whether AA-based agents can outperform state-of-the-art solutions in various adversarial environments.
2 Overview of the Adversarial Activity Model
The AA model provides the specification of capabilities and mental attitudes of an agent in an adversarial environment from a single adversarial agent’s perspective. The model describes both bilateral 1 2
Bar-Ilan University, Israel, email: {zukermi,sarit}@cs.biu.ac.il The Hebrew University, Israel, email: jeff@cs.huji.ac.il
and multilateral instantiations of zero-sum environments, in which all agents are adversarial (i.e., there are no cooperative or neutral agents). Alongside the model, there exist several behavioral axioms that the agent can follow: A1. Goal Achieving Axiom. This axiom is a simple and intuitive one, stating that if the agent can take an action that will achieve its main goal (or one of its subgoals), it should take it. A2. Preventive Act Axiom. This axiom relies on the fact that the interaction is zero-sum. It says that the agent might take actions that will prevent its adversary from taking highly beneficial future actions, even if they do not explicitly advance the agent towards its goal. A3. Suboptimal Tactical Move Axiom. This axiom relies on the fact that the agent's reasoning resources are bounded, as is the knowledge it has about its adversaries. In such cases the agent might decide to take actions that are suboptimal with respect to its limited search boundary, but that might prove to be highly beneficial in the future, depending on its adversaries' reactions. A4. Profile Manipulation Axiom. This provides the ability to manipulate agents' profiles (the knowledge one agent holds about the other), by taking actions such that the adversary's reactions to them would reveal some of its profile information. A5. Alliance Formation Axiom. This axiom allows the creation of temporary task groups when, during the interaction, several agents have some common interests that they wish to pursue together. A6. Evaluation Maximization Axiom. When all other axioms are inapplicable, the agent will proceed with the action that maximizes the heuristic value as computed in its evaluation function.
3 Empirical Evaluation
We will use two different experimental domains. The first one is the Connect-Four board game, which will allow us to evaluate the model in a bilateral interaction. The second domain is the well-known Risk strategic board game of world domination. The embedding of behavioral axioms into the agent design, in both domains, was done by providing new functions, one for each of the implemented axioms (denoted AxiomNValue(), where N is the number of the axiom in the model). These functions return a possible action if the relevant precondition holds. The preconditions are the required beliefs, as stated in the axiom formalizations, formulated according to the relevant domain. The resulting architecture resembles a rule-based system, where each function returns its value and the final selection among the potential actions is computed in a "Decide" function, whose role is to select among the actions (if there is more than a single possible action) and return its final decision.
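A sketch of this wiring (the dictionary-based state, the axiom subset and the priority-order "Decide" rule are our own illustrative stand-ins; the real AxiomNValue() functions test the axioms' belief preconditions in the concrete domain):

    def axiom1_value(state):
        # A1, goal achieving: take an immediately winning action if any.
        return next((m for m in state["moves"] if m in state["winning"]), None)

    def axiom2_value(state):
        # A2, preventive act: block the adversary's high-benefit action.
        return next((m for m in state["moves"] if m in state["blocks"]), None)

    def axiom6_value(state):
        # A6, evaluation maximization: fall back to the heuristic.
        return max(state["moves"], key=state["heuristic"])

    AXIOMS = (axiom1_value, axiom2_value, axiom6_value)

    def decide(state):
        # Each axiom function returns an action when its precondition
        # holds; the final selection here is simply by axiom priority.
        return next(a(state) for a in AXIOMS if a(state) is not None)

    state = {"moves": [0, 1, 2, 3], "winning": set(), "blocks": {2},
             "heuristic": lambda m: -abs(m - 3)}
    print(decide(state))   # -> 2, the preventive move of axiom A2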
3.1 A Bilateral Domain—Connect4
We built an experimental environment where computer agents play the connect-four game against one another, and we have control over
the search depth, reasoning time, and other variables. We built six different agents, each with a different evaluation function (H1–H6), ranging from a naive function to a reasonable function that can win when playing against an average human adversary. We had 12 different agents: 6 alpha-beta and 6 axiom-augmented agents, each using one of the evaluation functions. We staged a round-robin tournament among all agents, where each agent played with 3 different search depths (3, 5, and 7) against all other agents and possible search depths. The tournament was played twice: once for the agents playing as the first player (yellow), and once for them playing as the second (red) player (i.e., 11 opponents * 3 own depths * 3 opponent depths * 2 disc colors = 198 games). The results of the tournament are summarized in Figure 1. The figure shows the percentage of games won by each of the 12 agents, where R 1 denotes the regular agent using H1, and A 3 the axiom-embedded agent using H3. The results clearly indicate that all agents improved their performance following the integration of axioms. The agents with naive heuristics (A 1 and A 2) showed only a small improvement, which usually reflected additional wins over their "regular" versions (R 1 and R 2), while the mid-ranged functions (H4 and H5) showed the largest improvement, with additional wins over different agents that were not possible prior to the embedding of axioms. Overall, we see that the best two agents were A 4 and A 6, with a single-win advantage for the A 6 player, which in turn led A 5 by 7 wins.
Figure 1. Connect-Four experiment results
3.2 A Multilateral Domain—Risk
Our next domain is a multilateral interaction in the form of the Risk board game, a strategy board game that incorporates probabilistic elements and strategic reasoning in various forms. Risk is too complicated to solve using classical search methods. We used the Lux Delux3 environment, which provides a large number of computer opponents implemented by different programmers and employing varying strategies. We chose to work with exactly the same subset of adversaries that was used in [3], which contains 12 adversaries of different difficulty levels (Easy, Medium, and Hard): (1) Angry (2) Stinky (3) Communist (4) Shaft (5) Yakool (6) Pixie (7) Cluster (8) Bosco (9) EvilPixie (10) KillBot (11) Que (12) Nefarious. The basic agent implementation and evaluation function were based on the one described in [3], as it proved to be a very successful evaluation-function-based agent which does not use expert knowledge about the strategic domain. The next step was to augment the original agent with the implementation of the adversarial axioms (we used continent ownership as a subgoal). Experiment 1: The game map was "Risk classic", card values were set to "5, 5, 5, . . . ", the continent bonus was constant, and starting position and initial army placement were randomized. Each game had 6 players, randomized from the set of 14 agents described above. Figure 2 shows the results of running 1741 such games, with the winning percentage of each of the agents (we use the agent number from the above list instead of their names). The worst agent was Angry (#1) with a 0.44% win percentage, while the best was KillBot (#10) with 32.54%. Looking at our agents, we can see that the basic heuristic agent (denoted "He", whose bar is colored blue) managed to achieve only 11.79%, whereas its axiom-augmented version Ax (colored red on the graph) climbed all the way up to 26.84%, more than doubling the winning percentage of its regular version.
Downloadable from http://sillysoft.net/lux/.
Figure 2. Winning percentage on “Risk classic” map
Experiment 2: In the second experiment we compared the performance of both kinds of agents on randomly generated world maps. The results show approximately the same improvement: from 9.16% with the regular heuristic agent to a total of 21.36% with its axiom-augmented version. Experiment 3: We fixed a five-agent opponent set (agents 1 through 5), and ran a total of 2000 games on the classic map setting: 1000 games with agent He and the opponent set, and 1000 games with agent Ax and the opponent set. The results show that even when playing against very easy opponents, against which the regular heuristic agent already led the group with a winning percentage of 31.8%, the integration of the axioms lifted the agent to an impressive winning percentage of 57.1%.

Figure 3. Winning percentage with fixed opponents
4 Conclusions
We have presented an empirical evaluation of the Adversarial Activity model for bounded rational agents in a zero-sum environment. Our results show that bounded-rational agents can improve their performance when their original architectures are augmented with the model’s behavioral axioms, even as their evaluation functions remained unchanged.
5 Acknowledgments
This work was supported in part by NSF under grant #IS0705587 and ISF under grant #1357/07. Sarit Kraus is also affiliated with UMIACS.
REFERENCES [1] Michael E. Bratman, Intention, Plans and Practical Reason, Harvard University Press, Cambridge, MA, 1987. [2] Barbara J. Grosz and Sarit Kraus, ‘Collaborative plans for complex group action’, AIJ, 86(2), 269–357, (1996). [3] Stefan J. Johansson and Fredrik Olsson, ‘Using multi-agent system technology in risk bots.’, in AIIDE, pp. 42–47, (2006). [4] H. J. Levesque, P. R. Cohen, and J. H. T. Nunes, ‘On acting together’, in Proc. of AAAI-90, pp. 94–99, Boston, MA, (1990). [5] M. Tambe, ‘Agent architectures for flexible, practical teamwork’, in National Conference on Artificial Intelligence (AAAI), (1997). [6] Inon Zuckerman, Sarit Kraus, Jeffrey S. Rosenschein, and Gal A. Kaminka, ‘An adversarial environment model for bounded rational agents in zero-sum interactions’, in AAMAS 2007, pp. 538–546, (2007).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-863
Addressing Temporal Aspects of Privacy-Related Norms Guillaume Piolle1 and Yves Demazeau2 Abstract. Agents interacting in open environments such as Internet are often in charge of personal information. In order to protect the privacy of human users, such agents have to be aware of the normative context regarding personal data protection (applicable laws and other regulations). These privacy-related norms usually refer to deadlines and durations. To represent these regulations, we introduce the Deontic Logic for Privacy; this logic represents privacy-related obligations while providing the required temporal expressiveness.
1 INTRODUCTION
Any personal agent designed to evolve in an environment like the Internet and to assist a human user with her online activities should then be aware of privacy issues and regulations, in order to protect the user's personal information. These regulations appear as laws, contracts, company policies, user requirements... Six dimensions have been identified that can be used to analyze regulations dealing with personal data protection [7, 6]. These are user information, user consent, data update, justification of data collection and usage, data retention and data forwarding. Many privacy-enhancing technologies, protocols and architectures try to address parts of the issue [4]. The Platform for Privacy Preferences (P3P), for instance, aims to deal with the first two dimensions, by providing websites with means to communicate their privacy policies [9]. However, none is able to provide a cognitive agent with means to reason on the regulations themselves, so that it could adapt to the context of a transaction in a dynamic and autonomous fashion. In this paper, we propose a logic designed specifically to represent privacy-related regulations concerned with all six dimensions. This Deontic Logic for Privacy (DLP) is able to deal with obligations regarding personal data processing and its temporal organization. We explain why specific operators are needed to represent dated norms, we identify the requirements for expressing obligations with deadlines, we build such an operator on the basis of existing proposals, and we put it in the context of privacy norms.
2 THE DLP LOGIC
When dealing with privacy management, norms are often linked with notions of delays, deadlines, precedence between actions; an explicit representation of time would then provide valuable reasoning means. Much work has been done on temporal deontic logics in general [1], but to the best of our knowledge none of them deals with privacyrelated norms in a specific way. A prominent temporal feature of 1
Universit´e Joseph Fourier, Laboratoire d’Informatique de Grenoble, France, email: guillaume.piolle@imag.fr 2 CNRS, Laboratoire d’Informatique de Grenoble, France, email: yves.demazeau@imag.fr
privacy regulations is the notion of deadline. We will examine how existing proposals can be of use in privacy-based reasoning, but we must first introduce a common formalism to compare them. This is why, in the light of this background, we present here the DLP language, a temporal deontic logic able to represent specific privacy-related norms, and in particular the deadlines associated with them. DLP is a language where the SDL obligation modality Ob is freely mixed with LTL operators. The well-formed formulae ϕ of the DLP language are defined as follows, where p is a proposition from a language L_DLP to be specified later:

    ϕ ::= p | ϕ ∨ ϕ | ¬ϕ | Ob ϕ | ϕ U ϕ | ϕ S ϕ    (1)
We have chosen the U, S temporal language (U and S being the strict versions of the "until" and "since" connectives) for its expressiveness, but we will use the common abbreviations F, G, H, P. We also define U− and F− as the loose versions of U and F, including the present. The X^i operators, based on a "neXt" operator X and its counterpart in the past X^−1, can be used to travel step-wise along a time flow. The DLP logic is interpreted over bidimensional Kripke-like structures, where a world is defined by its history h (the linear flow of time it belongs to) and a date ti in the time flow. The temporal accessibility relation relates a world in a history to its successor in the same history, and the deontic accessibility relation relates a world to all its acceptable deontic alternatives (in all histories).
3 OBLIGATIONS WITH DEADLINES
We have said that in order to express privacy-related norms, we need the notion of deadlines, to which obligations will be attached. Indeed, it is often argued that obligations without deadlines are void [3]: one can fail to fulfill them and yet never be in violation of a norm (since one can always postpone and pretend the obligation will be fulfilled later). In order to deal with deadlines, we introduce specialized constants in our language, which we call dated propositions. They are noted {δi}i∈N, δi being true only at date ti. Our aim here is to build an operator Ob(ϕ, δ) expressing the obligation for ϕ to be true before the date represented by δ (i.e. before the propositional date δ becomes true). We have identified eight requirements that an operator in our formalism should meet in order to bear the right meaning in privacy-related norms:
1. Failed obligations should be dropped after the deadline;
2. Violations should be made punctual, not persistent in time;
3. Deadlines that are not dated propositions have no meaning;
4. Obligations on ⊥ should be impossible to fulfill;
5. Obligations on ⊤ should be trivially respected;
6. It should be impossible to express obligations with past deadlines;
7. The operator must comply with the propagation principle [2], saying that an obligation must be maintained until the deadline is reached or the obligation is fulfilled;
8. The operator must comply with the monotony principle [2], saying that an obligation with a given deadline implies an obligation with a further deadline.
Some work has already been done on obligations with deadlines; our first six requirements regard choice points already discussed by Dignum et al [5]. However, our conclusions differ slightly from theirs, for instance on the fact that they take violation as a state rather than as an event. In their own work, they introduce an operator that defines an obligation jointly with its violation. Because of their strictly temporal definition, dated obligations can then be derived whenever they seem to be respected, which is a significant drawback for us. From another point of view, it is not monotonic, and deadlines with a value of ⊤ can be defined, resulting in an immediate obligation. Brunel et al [2] extend a temporal deontic logic with explicit quantification over time, in order to reason on delays rather than on deadlines. For that reason, it cannot be directly expressed in DLP. Furthermore, it is not monotonic. The operator proposed by Demolombe et al [3], although not expressed in temporal deontic logic, can be translated. It satisfies a kind of semi-monotony, ensuring the property provided that the obligation is not violated. This key property makes it our best candidate. The operator matches most of our other requirements, but needs to be adapted to dated propositions in order to comply with the third and sixth points. We integrate these conditions into a DLP translation of the original proposition, and end up with the dated operator Ob(ϕ, δ) (2). The authors propose a persistent violation for their operator; we transform it into a punctual one (3) in order to match our second requirement. One can see that semi-monotony has a nice side-effect: it prevents us from deriving multiple violations for the same initial obligation, while still ensuring monotony if the obligation is fulfilled.
    Ob(ϕ, δ) =def F(δ ∧ G¬δ ∧ H¬δ) ∧ [ Ob(F−(ϕ ∧ Fδ)) U− (ϕ ∨ δ) ]    (2)

    viol(ϕ) =def δ ∧ P( Ob(ϕ, δ) ∧ (¬ϕ U− δ) )    (3)
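To make the punctual-violation reading concrete, a small sketch that checks clause (3) on a finite discrete trace (the set-of-atoms trace encoding and the finite-horizon treatment of the temporal operators are our own simplifying assumptions):

    def violation_times(trace, phi, delta):
        # trace[t] is the set of atoms true at step t. An obligation
        # Ob(phi, delta) raised at step 0 is punctually violated at the
        # unique step where delta holds, if phi was false at every step
        # up to and including it (finite-horizon reading of clause (3)).
        deadline = [t for t, atoms in enumerate(trace) if delta in atoms]
        if len(deadline) != 1:      # delta must be a dated proposition
            return []
        d = deadline[0]
        fulfilled = any(phi in trace[t] for t in range(d + 1))
        return [] if fulfilled else [d]

    # Usage: the credit-card rule below with a short 3-step deadline.
    trace = [set(), {"d"}, set()]                 # "forget" never happens
    print(violation_times(trace, "forget", "d"))  # -> [1]
    trace2 = [{"forget"}, {"d"}, set()]
    print(violation_times(trace2, "forget", "d")) # -> []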
4 APPLICATION TO PRIVACY NORMS
Our deontic and temporal formalism can be used to express privacy-related norms by its application on a base language L_DLP. L_DLP is based on predicates related to the six dimensions of personal data protection mentioned in the introduction. Argument domains are finite or countable sets, so we end up with a countable set of propositional terms. L_DLP includes for instance a predicate perform representing the actual process involving personal information, a predicate consent representing the user's authorization, a predicate forget representing data deletion3 ... As an application, let us see how an example regulation about data retention (one must not keep somebody else's credit card number more than one week after a transaction) translates into DLP (4). It is an interesting example since it involves antecedence and a deadline. Formally, it says that whenever an agent A performs a process of type transaction to which is attached an information of type creditCardNum owned by an agent B (and not by agent A), if δ
3 Due to page limitations, we are not able to include the full specifications of L_DLP here.
represents a date one week in the future, then there is a dated obligation that A should forget this information before the deadline δ:

    perform(A, ID) ∧ owner(ID, creditCardNum, B) ∧ ¬owner(ID, creditCardNum, A) ∧ actiontype(ID, transaction) ∧ X^(7∗24) δ → Ob(forget(A, ID, creditCardNum), δ)    (4)

5 CONCLUSION AND FUTURE WORK
We have proposed the DLP language, based on temporal deontic logic, to represent privacy-related norms. DLP is expressive enough to represent obligations with deadlines, as well as other (more classical) temporal notions, in an acceptable way. DLP is based on a propositional language specifically oriented towards personal data processing. Some work remains to be done on this logic, including a better basis for the temporal operators of the language. Currently, it is based on the U, S logic, which is very general but somewhat too expressive. Indeed, we must then question the inclusion of "since" in the logic, since we do not seem to need it, and it has already been argued that adding it to an until-based logic is not trivial from the point of view of complexity [8]. An automated procedure is to be proposed to generate DLP formulae on the basis of information extracted from P3P policies [9]. DLP, along with these associated tools, is then to be integrated in a privacy-aware cognitive agent that should be able to model and reason on its privacy-related normative context.
ACKNOWLEDGEMENTS This research has been supported by the Rhône-Alpes region Web Intelligence project. We would also like to thank Andreas Herzig and Philippe Balbiani for their valuable comments on our work.
REFERENCES [1] Lennart Åqvist, ‘Combinations of tense and deontic modalities’, in 7th International Workshop on Deontic Logic in Computer Science (DEON 2004), eds., Alessio Lomuscio and Donald Nute, volume 3065 of LNCS, pp. 3–28, Madeira, Portugal, (2004). Springer. [2] Julien Brunel, Jean-Paul Bodeveix, and Mamoun Filali, ‘A state/event temporal deontic logic’, in Eighth International Workshop on Deontic Logic in Computer Science (DEON’06), number 4048 in LNCS, (2006). [3] Robert Demolombe, ‘Formalisation de l’obligation de faire avec délais’, in Troisièmes journées francophones des modèles formels de l’interaction (MFI’05), Caen, France, (2005). [4] Yves Deswarte and Carlos Aguilar-Melchor, ‘Current and future privacy enhancing technologies for the internet’, Annales des Télécommunications, 61(3-4), 399–417, (2006). [5] Frank Dignum, Jan Broersen, Virginia Dignum, and John-Jules Meyer, ‘Meeting the deadline: Why, when and how’, in Third International Workshop on Formal Approaches to Agent-Based Systems (FAABS’04), eds., Michael G. Hinchey, James L. Rash, Walt Truszkowski, and Christopher Rouff, pp. 30–40, (2004). Springer Verlag. [6] Guillaume Piolle and Yves Demazeau, ‘Une logique pour raisonner sur la protection des données personnelles’, in 16e congrès francophone AFRIF-AFIA sur la Reconnaissance de Formes et l’Intelligence Artificielle (RFIA’08), Amiens, France, (2008). AFRIF-AFIA. [7] Guillaume Piolle, Yves Demazeau, and Jean Caelen, ‘Privacy management in user-centred multi-agent systems’, in 7th Annual International Workshop “Engineering Societies in the Agents World” (ESAW 2006), eds., Gregory O’Hare, Michael O’Grady, Oguz Dikenelli, and Alessandro Ricci, pp. 354–367, Dublin, Ireland, (2006). Springer Verlag. [8] Mark Reynolds, ‘The complexity of the temporal logic with until over general linear time’, Journal of Computer and System Sciences, 66(2), 393–426, (2003). [9] World Wide Web Consortium. Platform for Privacy Preferences specification 1.1. http://www.w3.org/P3P/.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-865
Evaluation of global system state thanks to local phenomena CONTET Jean-Michel and GECHTER Franck and GRUER Pablo and KOUKAM Abder 1 Abstract. This paper presents a new approach for the evaluation of a system's global state properties. The approach is intended for application to reactive multiagent systems (RMAS) and addresses the evaluation of emergent properties such as global stabilisation. This approach is inspired by statistical physics and thermodynamics, as a way to link the microscopic and the macroscopic points of view. It gives an important role to the partition function Z as defined in statistical physics. From this mathematical function, indicators can be extracted that evaluate the global system state on the basis of local phenomena. In this paper, the approach is put into practice by considering a classical reactive multiagent system: bird flock simulation. The methodology was applied to analyze system stability. Experimental results obtained with a multiagent simulation platform are presented.
1 Introduction
Multiagent systems (MAS) can now be considered a widespread technique for the simulation of complex systems, and they have been applied to a wide range of applications. In order to simulate complex systems, the reactive approach, in which interaction and emerging phenomena prevail in the definition of the agents themselves, is pertinent. It brings relevant properties such as adaptation skills, reliability, and robustness to parameter changes. The main drawback of the reactive approach is the lack of theoretical background for convergence proofs and emergence characterization. The goal of this article is to propose a method based on the partition function Z [2] as it is defined in statistical physics. Statistical physics is generally considered to be one of the first scientific disciplines where statistical methods succeeded in linking the microscopic and the macroscopic points of view. From this mathematical function one can extract indicators whose computation is based on local estimations and which represent global measurements of the system state. This article is structured as follows. After a paragraph dealing with the related works found in the literature, the partition function is defined in relation to physics. Then, we explain through a simple physics-inspired example how to apply partition function theory. The last part deals with the application of the partition function to a classical multiagent model based on Reynolds' Boids [7]. Finally, we conclude by drawing some extensions to the work presented.
2 Related Works
As stated in the introduction, one of the main problems in MAS is the evaluation of the accuracy/efficiency of the system relative to the task to perform and to the local mechanisms involved. Those evaluation methods can be classified in 3 categories: (i) indicators tied to the application field [5]; (ii) indicators based on a global point of view on the system and on its topology [6]; (iii) global indicators based on local estimation [4]. Solutions found in the literature usually take inspiration from biology (fitness functions, etc.), sociology (altruism, etc.), agency theories (utility functions, etc.) or physics (state functions, etc.). Moreover, some of them rest on a strong mathematical background, such as [8], but seem hardly applicable to arbitrary practical MAS. Among these methods, the physics-inspired solutions are the most widespread. For instance, entropy [1, 6] has been widely used in reactive MAS, in particular in order to represent disorder/organisation in the system. Even if this measurement can be useful in many cases, it has two main drawbacks: it depends on the past transformations of the system, and it is a global measurement that does not take into account local mechanisms of the system. In order to overcome these drawbacks, other approaches can be used. One generic solution is the computation of energy as a state function on both agent and system levels [3].

1 University of Technology of Belfort-Montbeliard (UTBM), Systems and Transportation Laboratory (SET), Belfort, France, email: firstname.name@utbm.fr
3 Evaluating global state properties

3.1 Description of the approach
The approach applies to reactive multiagent systems based on interaction models inspired by physics. The environment must be limited and the number of elements fixed. If the system respects these conditions, the following methodology can be applied:
1. All interaction forces are computed.
2. The system energy is computed from the agents' energies at every time step.
3. The partition function Z is computed from the system energy.
4. The thermodynamic potential A is plotted in real time.
5. Studying the evolution of the Helmholtz free energy A explains the system evolution and the time to equilibrium.
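A sketch of steps 2–4 for a single time step (the pure-Python encoding and the convention β = 1 are our own assumptions):

    import math
    import random

    def free_energy(agent_energies, beta=1.0):
        # Step 3: partition function Z = sum_i exp(-beta * E_i);
        # step 4: Helmholtz potential A = -ln(Z), with T, V and N_i
        # held constant.
        Z = sum(math.exp(-beta * e) for e in agent_energies)
        return -math.log(Z)

    # Step 5: track A over the simulation; convergence of A(t) to a
    # constant value signals that the system has reached equilibrium.
    history = [free_energy([random.random() for _ in range(100)])
               for _ in range(50)]
    print(history[-1])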
3.2 Application to a classical reactive multiagent system
Flocking represents an approach to solving some kinds of problems, such as spatial distribution. It is a model [7] for the coordinated motion of groups of entities called boids. Craig Reynolds [7] realized that the motion of a flock of birds could be modeled by applying three simple rules to be followed by each boid: Cohesion: steer to move toward the average position of local flockmates; Separation: steer to avoid crowding local flockmates; Alignment: steer towards the average heading of local flockmates.
3.3 Interaction model
The environment is closed and the number of boids is fixed. Each bird corresponds to an agent. An agent only perceives flockmates inside its perception distance. The interaction model is based on the three forces defined before, with N the number of agents in the neighbourhood, R_i the relative position between the agent and the neighbourhood agent i, and P_agent the agent's position:
    F_Cohesion = [ (Σ_{i=1..N} R_i) / N ] − P_agent    (1)

    F_Separation = − Σ_{i=1..N} R_i ‖R_i‖^(−3)    (2)

    F_Alignment = (Σ_{i=1..N} Ṙ_i) / N    (3)

3.4 System energy

According to the interaction model, the energy measurement can be detailed as follows:
• Kinetic energy: in the following equation, the agent i is represented by its mass m_i and its speed V_i:

    E_K = (1/2) m_i V_i · V_i    (4)
• Potential energy: it is computed, for agent i, using the classical expression of the energy U (U = δW + δQ), where δW represents the work done on the system and δQ the heat flow (here δQ = 0, since no heat is dissipated). The work done on the system, δW, is expressed considering a conservative force (cf. Equation 5), with du a unit vector in the direction of the agent's speed:

    E_p = δW = ∫ F_total · du = ∫ F_C · du + ∫ F_S · du + ∫ F_A · du    (5)

From each boid's energy E_i = E_K + E_p we can then compute the partition function Z and the thermodynamic potential A, with T, V and N_i constant:

    A(T, V, N_i) = −ln(Z),  with  Z = Σ_i e^(−β E_i)    (6)

[Figure 1. Free energy A evolution during simulation; the curve shows an oscillation phase followed by stability.]

3.5 Boids simulation

The simulations run a group of 100 boids. Every simulation begins with a random dispersion of the boids in the environment. The simulation starts (cf. Figure 1, top left) with the lowest free energy A, because of the great agent dispersion. Then, following the Reynolds model, the boids form a moving group similar to a bird flock. During this phase, the system tends towards stability. The free energy oscillations indicate that the system is not yet in a stable state. Finally, the boids form a flock (cf. Figure 1, top right) and the system is in equilibrium. Thus, the free energy tends toward a constant value representing system stability.

4 Conclusion

Reactive multiagent systems are becoming an important field of research within application domains characterized by distributed aspects. In particular, development approaches for reactive MAS should include the possibility to evaluate the quality of emergent phenomena and even, in some cases, the fit to the application objective. The aim of this article was to present a new conceptual frame for the evaluation of the global state of reactive MAS. This evaluation is based on a local-to-global approach, inspired from statistical physics and thermodynamics. Statistical physics is generally considered to be one of the first scientific disciplines where statistical methods succeeded in linking the microscopic and the macroscopic points of view. In this work, we present an approach for the application of statistical physics to RMAS. Great attention has been given to the justifications and conditions of the use of statistical physics. This approach has been put into practice through a classical example: boids flocking. Simulation experiments have shown the relation between the indicator proposed in this paper and the system evolution. Additional research work is needed to extend the applicability of the approach to more complex phenomena.

REFERENCES
[1] Tucker Balch, ‘Hierarchic social entropy: an information theoretic measure of robot group diversity’, Autonomous Robots, 8(3), 209–237, (2000). [2] Roger Balian, From Microphysics to Macrophysics, Springer, 2007. [3] Jean-Michel Contet, Franck Gechter, Pablo Gruer, and Abder Koukam, ‘Multiagent system model for vehicle platooning with merge and split capabilities’, Third International Conference on Autonomous Robots and Agents ICARA, 41–46, (2006). [4] Nicolas Gaud, Franck Gechter, Stéphane Galland, and Abderrafiâa Koukam, ‘Holonic multiagent multilevel simulation: Application to real-time pedestrians simulation in urban environment’, Twentieth International Joint Conference on Artificial Intelligence, IJCAI’07, 1275–1280, (2007). [5] Franck Gechter, Vincent Chevrier, and François Charpillet, ‘A reactive agent-based problem-solving model: Application to localization and tracking’, ACM Transactions on Autonomous and Adaptive Systems, (November 2006). [6] H. Van Dyke Parunak and Sven Brueckner, ‘Entropy and self-organization in multi-agent systems’, in AGENTS ’01: Proceedings of the fifth international conference on Autonomous agents, pp. 124–130. ACM, (2001). [7] Craig W. Reynolds, ‘Flocks, herds, and schools: A distributed behavioral model’, Computer Graphics (ACM), 21(4), 25–34, (1987). [8] Daniel Yamins, ‘The emergence of global properties from local interactions’, volume 2006, pp. 1122–1124, Hakodate, Japan, (2006).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-867
Experience and Trust — A Systems-Theoretic Approach Norman Foo1 and Jochen Renz2 Abstract. An influential model of agent trust and experience is that of Jonker and Treur [Jonker and Treur 99]. In that model an agent uses its experience of the interactions of another agent to assess that agent's trustworthiness. We showed that key properties of that model are subsumed by classical mathematical systems theory. Using the latter theory we also clarify the issue of when two experience sequences may be regarded as equivalent. An intuitive feature of the Jonker and Treur model is that experience sequence orderings are respected by functions that map such sequences to trust orderings. We raise a question about another intuitive property, that of continuity of these functions, viz. that they map experience sequences that resemble each other to trust values that also resemble each other. Using fundamental results in the relationship between partial orders and topologies we also showed that these two intuitive properties are essentially equivalent.
1 INTRODUCTION In electronic internet trading systems like eBay an agent can rank other agents based on its assessment of the behavior of those agents in transactions. For an agent A observing another agent B over time (possibly even B's interactions with agents other than A), such sequential assessments may be said to form A's experience sequence of B, and result in its judgement of the trustworthiness of B. In an influential model of agent trust due to [Jonker and Treur 99], agents assess the quality of their interactions and map such experience sequences into a trust space. They required the experience sequence and trust spaces to be at least partially ordered, and the mapping to be order-preserving. They established properties of their model, including conditions for the updating of trust ranks that depend only on the existing rank and a new assessment of experience. In our paper we showed that the update and a number of other properties are in fact subsumed by classical mathematical systems theory. Space limitations restrict us to merely outlining our results, but a fuller version is in [Foo and Renz 07].
2 SYSTEMS-THEORETIC IMPLICATIONS We took the work of [Jonker and Treur 99] as a starting point, accepting in particular the discrete time framework (modelled as the natural numbers) for all functions. A sequel to that work is that by Treur [Treur 07] on properties of states arising from it. We used systems theory to (i) connect established propositions with their work, (ii) showed constraints on trust structure imposed by experience structure, (iii) suggested a way to topologize these and other derivative structures, and (iv) showed that order-preservation of the map from experience sequences to trust is equivalent to its continuity in the topologies.
The School of Computer Science and Engineering, University of New South Wales, Sydney NSW 2052, Australia 2 Research School of Information Science and Engineering, Australian National University, Canberra ACT 0200, Australia
[Figure 1. Nerode Equivalence: input segments ω1 and ω2, each followed by the same continuation μ, yield equal outputs F(ω1, μ) and F(ω2, μ).]
Conceptually the system we consider is a black box that accepts experience sequences as inputs and produces trust sequences as outputs. A basic result from systems theory (see [Padulo and Arbib 74] and [Zeigler, et.al. 2000]) guarantees that this black box can be endowed with a state space of trust values iff the input-output function F representing it is causal, i.e., for any point k in time the trust output at k depends only on the initial segment of the input sequence. This subsumes a key result of [Jonker and Treur 99]. Denoting the initial segment space of experience sequences by Ω̄, we then showed that the canonical state space is in fact a quotient space (see [Kelley 55]) of Ω̄, with the quotient arising from an equivalence relation known in systems theory as the Nerode equivalence (see [Padulo and Arbib 74]), denoted here by ≡N. Indeed, it follows that the trust space can be most succinctly identified with Ω̄/≡N. To explain ≡N we first make Ω̄ a semigroup, using the concatenation (denoted by ◦) of segments as the binary operation. Next, we use the input-output function F to induce a function F̄ that maps Ω̄ to corresponding-length output segments. For any two input segments ω1 and ω2 we defined ω1 ≡N ω2 if for any arbitrary segment μ, F̄(ω1 ◦ μ) agrees with F̄(ω2 ◦ μ) from the respective times when μ is appended. This is only well-defined if F is causal. See Figure 1 for intuition. It is then intuitive that ω1 and ω2 cannot be distinguished once their end points are reached. Thus, Ω̄/≡N qualifies as a state space of the system. It is a corollary of that result, known as the State Realization Theorem (see [Padulo and Arbib 74]), that there is an update function δ from inputs and current state to the next state as follows: δ([ω], e) = [ω ◦ ê], where ê is the unit-length segment with value e. Figure 2 illustrates the main points. In the figure, γ is the map that "reads" the trust and outputs it into the trust value space Vout, and η is the map induced by the combination of γ and the state update function δ. Also, F is the earlier defined map F̄ restricted to the (value at the) end of its input segment.
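On a finite alphabet and bounded horizon the Nerode classes of a causal F̄ can be computed by brute force; a sketch with a toy running-sum system (our own example):

    from itertools import product

    def nerode_classes(alphabet, F_bar, seg_len, probe_len):
        # Group input segments by their outputs under every probe
        # continuation mu, read from the point where mu is appended.
        def signature(omega):
            return tuple(
                tuple(F_bar(omega + list(mu))[len(omega):])
                for mu in product(alphabet, repeat=probe_len)
            )
        classes = {}
        for omega in product(alphabet, repeat=seg_len):
            classes.setdefault(signature(list(omega)), []).append(omega)
        return list(classes.values())

    # Toy causal F_bar: the output at time k is the running sum so far,
    # so the canonical state is just that sum.
    def F_bar(segment):
        out, s = [], 0
        for e in segment:
            s += e
            out.append(s)
        return out

    print(len(nerode_classes([0, 1], F_bar, seg_len=3, probe_len=2)))  # 4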
[Figure 2. State Realization – The Key Ideas: the input-output map F : Ω̄ → Vout, the quotient map ψ : Ω̄ → Ω̄/≡, the read-out γ, and the induced map η on Ω̄/≡ × Ω̄.]
It can be shown that the state realization above, call it R, is in a strong sense the most economical among all possible state representations. Formally, it is said that this realization is canonical, in that if there is another realization R′ that reproduces the same F̄, then there is a unique homomorphism that maps R′ to R. In particular, a typical assumption (see e.g., Treur [Treur 07]) that input (trust, etc.) sequences and system states are both viable primitives in formalizing temporal dynamics is subject to this canonical constraint.
3 EXPERIENCE AND TRUST ORDERINGS One example ordering considered by [Jonker and Treur 99] was worst < bad < neutral < good < best for experience values. They then used these to partially order, say, experience sequences. We may as well identify these sequences with the Ω̄ above, and the trust space T with Ω̄/≡N, which can be partially ordered by, say, ≤T. The order-preservation postulate of [Jonker and Treur 99] then translates in systems theory to the requirement that the quotient map ψ from Ω̄ to T (= Ω̄/≡N), defined by ψ(ω) = [ω]≡N, be order-preserving. That is an intuitive requirement: good experiences should lead to good trust. If a measure of "nearness" is placed on experience sequences and trust values, we may also desire the property that ψ maps near sequences to near trust. The formalization of this is the continuity of ψ. The most abstract way to do this is via topologies for both Ω̄ and Ω̄/≡N. Fortunately, there is already much classical machinery [Kelley 55] to do this. We switch notation to the near synonyms of Ω̄ (calling it E) and Ω̄/≡N (calling it T) for brevity. If a topology τE is given to E, then since ψ is the quotient map, a natural topology τT is induced by ψ that makes it both continuous and open. We then showed that under a simple topology, the Alexandrov topology (see [Arenas 99] or [Wiki Alexandrov]), the two requirements above, viz., order-preservation and continuity, are equivalent. We now outline how this was done. There is a close connection between partial orders (in fact pre-orders will do) and topologies on a space. Given a partial order ≼ on a space S, the Alexandrov topology defined by it has as open sets the so-called up-sets, viz., subsets θ such that x ∈ θ and x ≼ z implies z ∈ θ. Conversely, given a topology τ on a set S, the specialization pre-order ≤ is defined by x ≤ y iff y is in every open set that contains x. It is easily seen that ≤ so defined is indeed a pre-order. If we had started with some partial order ≼ and used it to define the Alexandrov topology as before, it is natural to ask what is the specialization order that arises from that topology. The answer is that we get back ≼, and although there are other topologies (e.g. the Scott topology [Abramsky and Jung 94] or [Stoy 77]) that have this "reversal" property, the Alexandrov topology is the finest one. In this way the partial order ≤E defines the Alexandrov topology on
the input segment space Ω̄ (which in our context is identified with the space of experience sequences E) and is induced by it. Any topology that is placed on the trust space T will induce a specialization pre-order (partial orders are special cases). So what is a suitable topology for it? If we identify T with the range of ψ, i.e., Ω̄/≡N, then T is the quotient space of Ω̄ and can thus be given the quotient topology. Experience values in the real interval [−1, 1], rather than finite or even discrete values, may alter the character of the results and observations, because the experience and trust spaces can now be infinite and continuous. Continuous values lend themselves to measurements of nearness using the metrics well known in functional analysis, and it is an obvious question whether the nexus between order-preservation for E and T and continuity of the map ψ still holds. Unfortunately, if the space is Hausdorff (which is the most familiar case), its corresponding Alexandrov topology reduces to the discrete topology, which is trivial for convergence. Therefore the requirements for order-preservation and continuity are then distinct.
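The order-topology round trip described above can also be checked mechanically on a finite order. The following Python sketch (our illustration) enumerates the Alexandrov open sets, i.e. the up-sets, of the chain worst < bad < neutral < good < best, and verifies that the specialization pre-order of the resulting topology recovers the original order.

from itertools import chain, combinations

def alexandrov_opens(elements, leq):
    # All up-sets of a finite (pre)order: subsets theta with x in theta and
    # leq(x, z) implying z in theta.
    def is_up_set(theta):
        return all(z in theta for x in theta for z in elements if leq(x, z))
    subsets = chain.from_iterable(combinations(elements, k) for k in range(len(elements) + 1))
    return [set(s) for s in subsets if is_up_set(set(s))]

def specialization(elements, opens):
    # Specialization pre-order of a topology: x <= y iff every open set
    # containing x also contains y.
    return lambda x, y: all(y in o for o in opens if x in o)

vals = ["worst", "bad", "neutral", "good", "best"]
rank = {v: i for i, v in enumerate(vals)}
leq = lambda x, y: rank[x] <= rank[y]
opens = alexandrov_opens(vals, leq)
recovered = specialization(vals, opens)
assert all(recovered(x, y) == leq(x, y) for x in vals for y in vals)  # order recovered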
4 CONCLUSION

We used classical mathematical systems theory to underpin the foundations of an influential model of agent trust and experience. It was shown that many of the properties of that model follow from results in systems theory. Moreover, the latter provides deep insights into the structural interaction between experience and trust sequences, in particular what it means to say that trust is condensed experience. An intuitive feature of that model is that experience sequence orderings are respected by functions that map such sequences to trust orderings. We raised a question about another intuitive property — that of continuity of these functions, viz. that they map experience sequences that resemble each other to trust values that also resemble each other. Using fundamental results on the relationship between partial orders and topologies, we showed that these two intuitive properties are essentially equivalent.
REFERENCES
[Abramsky and Jung 94] S. Abramsky and A. Jung: Domain theory. In S. Abramsky, D. M. Gabbay, T. S. E. Maibaum (eds), Handbook of Logic in Computer Science, vol. III, Oxford University Press, 1994.
[Wiki Alexandrov] The Wikipedia entry on Alexandrov topology: http://en.wikipedia.org/wiki/Alexandrov_topology.
[Arenas 99] F.G. Arenas, Alexandroff spaces, Acta Math. Univ. Comenianae, Vol. LXVIII, 1 (1999), pp. 17–25.
[Foo and Renz 07] N. Foo and J. Renz, Experience and Trust: A Systems-Theoretic Approach, UNSW CSE Tech Report UNSW-CSE-TR-0717, ftp://ftp.cse.unsw.edu.au/pub/doc/papers/UNSW/0717.pdf.
[Jonker and Treur 99] C.M. Jonker and J. Treur, Formal Analysis of Models for the Dynamics of Trust based on Experiences. In: F.J. Garijo, M. Boman (eds.), Multi-Agent System Engineering, Proceedings of the 9th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, MAAMAW'99, Lecture Notes in AI, vol. 1647, Springer Verlag, Berlin, 1999, pp. 221–232.
[Padulo and Arbib 74] L. Padulo and M.A. Arbib, System Theory: A Unified State-Space Approach to Continuous and Discrete Systems, Saunders, Philadelphia, 1974.
[Kelley 55] J.L. Kelley, General Topology, Springer Verlag, 1955 (reprinted).
[Stoy 77] J.E. Stoy, Denotational Semantics: The Scott-Strachey Approach to Programming Language Semantics, MIT Press, Cambridge, Massachusetts, 1977.
[Treur 07] J. Treur, Temporal Factorisation: Realisation of Mediating State Properties for Dynamics, Cognitive Systems Research, Volume 8, Issue 2, June 2007, pp. 75–88.
[Zeigler et al. 2000] B.P. Zeigler, H. Praehofer and T.G. Kim, Theory of Modeling and Simulation: Integrating Discrete Event and Continuous Complex Dynamic Systems, 2nd ed., Academic Press, San Diego, 2000.
Trust-Aided Acquisition Of Unverifiable Information Eugen Staab and Volker Fusenig and Thomas Engel Abstract. We propose a mechanism for the acquisition of information from potentially unreliable sources. Our mechanism addresses the case where the acquired information cannot be verified. The idea is to intersperse questions ("challenges") for which the correct answers are known. By evaluating the answers to these challenges, probabilistic conclusions about the correctness of the unverifiable information can be drawn. Fewer challenges need to be used if an information provider has proven to be trustworthy. Our approach can resist collusion and shows great promise for various application scenarios such as grid computing or peer-to-peer networks.
1 Introduction
Much research addresses trust that is based on direct experiences [8, 6]. These direct experiences result from evaluating the outcomes of interactions with other agents. Such an evaluation, however, is not possible when the outcome of an interaction is information that cannot be verified, or whose verification is too costly. We give an example to illustrate this problem. Example 1 Assume agent Alice needs to know the first 100 digits of π. However, Alice cannot compute these digits because she is either incapable of performing the necessary calculations on her own or she is out of resources. So Alice asks another agent, Bob, for these digits of π. Although Bob knows how to calculate them, he returns three correct digits followed by 97 random digits to Alice in order to save resources. Consequently, Alice, who cannot verify the information, uses the wrong digits of π in her further work. This will cause additional costs for her, and if she is not aware of them, she will not even classify the experience with Bob as a negative one. To overcome this problem, an agent could ask several information providers and compare their answers; if the answers are not the same, they are discarded. However, this so-called redundancy in computation, which is used for example in distributed computing (e.g. [1]), can fail to detect collusion between several malicious information providers. Therefore, we propose an approach which allows for estimating the correctness of acquired information without using redundancy.
2 Mechanism for Information-Acquisition
In the following, we describe one run of the mechanism. An agent wants to get answers to m questions, the real requests.

1 University of Luxembourg, Luxembourg, email: eugen.staab@uni.lu
In our particular case, the agent will not be able to verify the corresponding answers, because he is incapable of doing so or does not want to spend resources on it. Instead, before sending the request, the agent adds n challenges for which the answers are known to him. These challenges are chosen in such a way that another agent is not able to easily distinguish them from the real requests – how this choice can be made depends on the concrete setting (see Sect. 4 for examples). The number of challenges n depends on the number of real requests m and on how trustworthy the selected information provider has proven to be: the more accurate the information acquired from him was in the past (the more trustworthy he seems to be), the fewer challenges are used. However, a minimal number of challenges is always retained to account for the first-time offender problem (see [7]). The agent randomly merges the m real requests and the n challenges into a vector of size m + n. This request-vector is then transferred to the information provider, who is expected to reply with a response-vector of the same size. After having received the response-vector, the agent verifies the answers to the challenges and finds r correct and s incorrect answers, with r + s = n. The agent uses r and s as the basis for the following three computations:
1. Estimate the error rate of the answers to the real requests: probability theory can be applied here because the n challenges and m real requests were randomly distributed, which is, from a probabilistic point of view, tantamount to randomly picking n samples out of the m + n answers.
2. Decide whether the answers to the real requests seem to be accurate enough: if the estimated error rate is too high, the information is requested again from other agents.
3. Assess (or reassess) the trustworthiness of the information provider, based on the current and past response-vectors: the number of challenges used for future requests to the same information provider is decreased if he is now seen to be more trustworthy than before (and vice versa).
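The following Python sketch illustrates one run under assumed data structures (the paper does not prescribe a concrete estimator): requests and challenges are shuffled into a single vector, the challenge outcomes give the point estimate s/n of the error rate, and the random-sampling argument in item 1 yields a hypergeometric likelihood for any hypothesised number k of wrong answers among the real requests.

import random
from math import comb

def build_request_vector(real_requests, challenges):
    # Randomly merge real requests and challenges; remember where the challenges sit.
    tagged = [(q, False) for q in real_requests] + [(q, True) for q in challenges]
    random.shuffle(tagged)
    positions = [i for i, (_, is_challenge) in enumerate(tagged) if is_challenge]
    return [q for q, _ in tagged], positions

def error_rate_estimate(r, s):
    # Point estimate of the provider's error rate from r correct / s incorrect challenges.
    return s / (r + s)

def likelihood_k_errors(k, m, n, s):
    # P(s of the n challenge answers are wrong | k of the m real answers are wrong),
    # treating the challenges as a uniform random sample of n out of the m + n answers.
    wrong_total = k + s
    return comb(wrong_total, s) * comb(m + n - wrong_total, n - s) / comb(m + n, n)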
3 Discussion
In this section we discuss several issues concerning the practical use of the mechanism. Optimal number of challenges Whenever a request-vector is composed, it has to be decided how many challenges are used. The more challenges are used, the better the accuracy of the mentioned error rate estimate will be. At the same time the number of challenges should be kept small: one can assume that certain costs arise when generating a challenge,
requesting the answer (the information provider may get some payment) and evaluating the answer. This optimization problem is subject to future work. In scenarios where a lack of resources is the only reason for not being able to verify acquired information, real requests can be declared to be challenges after a response has been received. This has the advantage that challenges cannot be disclosed by the information provider (there are no challenges beforehand) and that lower costs arise. Moreover, an optimal number of challenges can be determined during verification, based on statistical considerations. This is also possible in scenarios where, for reasons of practicability, real requests and challenges are not bundled in a vector but distributed over time. Here, an agent can decide on the fly whether to intersperse more or fewer challenges. Collusion For the choice of challenges, two important rules have to be obeyed in order to avoid the possibility of collusion: 1. If a request is resent to another agent because it was not answered satisfactorily, the same challenges are to be used. 2. For two requests for differing information, different challenges are to be used. The reader can easily verify that otherwise colluding agents would be able to identify real requests and challenges simply by comparing the different request-vectors and checking what has changed. Malicious providers A malicious information provider might try to answer all challenges correctly and at the same time answer all real requests incorrectly. However, assuming that he cannot distinguish challenges from real requests, the only way for him to achieve this objective would be to guess the number and the positions of the challenges. Note here that our mechanism does not aim at distinguishing between malicious, incompetent or unmotivated information providers. Context-sensitivity The context-sensitivity of trust is important in certain fields of information acquisition. Agents may be competent in some areas ("what is the prime factorization of 12345?") and incompetent in others ("will it rain today?"). To make probability theory applicable, it is necessary to choose all questions in one request-vector from the same context. Apart from that, the approach to context-sensitive trust by Rehák and Pechoucek [7] seems suitable for a combination with our work. Alternatively, techniques such as Latent Semantic Indexing (LSI) [4] or Concept Indexing [5] could be used. These techniques would allow the context space to be defined on the basis of acquired natural-language text.
4 Application Scenarios
As motivated by Ex. 1, where Alice requested some digits of π, the mechanism is intended for cases where calculations are outsourced and the results cannot be verified. Thus, our mechanism can be applied to the scenarios of grid-computing [2] or cloud-computing [10]. In these cases, challenges can either be provided by trusted nodes or be computed whenever the system of the requesting agent is idle. In Wireless Ad Hoc Networks [9], exchanged routing information can only be verified by trial and error. Our mechanism
would help to detect incorrect routing information provided by malicious or incompetent nodes without testing the route. The challenges can be chosen to be questions about routes that are known to exist (e.g. because packets have been sent over these routes in the recent past). In peer-to-peer networks, our mechanism can be used against pollution and poisoning attacks (see [3]). Challenges would consist of requests for files that have already been verified by a human to match their description and to be free of "bad chunks". Note that in these settings, a small number of challenges for a given number of real requests would be essential for the practicability of the mechanism. The verification of a partly downloaded response to a challenge should start as soon as a certain number of packets has been received.
5 Conclusion & Future Work
We presented a mechanism for information acquisition that can be used in cases where the acquired information cannot be verified. The mechanism mixes the real requests with some challenges in a random fashion. This way, an agent can use the evaluated answers to the challenges in order to probabilistically estimate the error rate of the unverifiable answers to the real requests. Currently, we are working out the mechanism by focusing on the following four issues: how to estimate the error rate for the real requests, how to find an "optimal" number of challenges for a given number of real requests, how to use trust to reduce the number of challenges, and how to decide whether some acquired information is accurate enough.
REFERENCES [1] David P. Anderson, Jeff Cobb, Eric Korpela, Matt Lebofsky, and Dan Werthimer, ‘SETI@home: an experiment in publicresource computing’, Commun. ACM, 45(11), 56–61, (2002). [2] Fran Berman, Geoffrey Fox, and Anthony J. G. Hey, Grid Computing: Making the Global Infrastructure a Reality, John Wiley & Sons, Inc., New York, NY, USA, 2003. [3] Nicolas Christin, Andreas S. Weigend, and John Chuang, ‘Content availability, pollution and poisoning in file sharing peer-to-peer networks’, in EC ’05: Proc. of the 6th ACM Conf. on Electronic commerce, pp. 68–77. ACM, (2005). [4] S. T. Dumais, G. W. Furnas, T. K. Landauer, S. Deerwester, and R. Harshman, ‘Using latent semantic analysis to improve access to textual information’, in CHI ’88: Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, pp. 281–285, New York, NY, USA, (1988). ACM. [5] George Karypis and Euihong Han, ‘Concept indexing: A fast dimensionality reduction algorithm with applications to document retrieval and categorization’, Technical Report TR-000016, University of Minnesota, (2000). [6] Sarvapali D. Ramchurn, T. D. Huynh, and Nicholas R. Jennings, ‘Trust in multi-agent systems’, Knowl. Eng. Rev., 19(1), 1–25, (2004). [7] Martin Reh´ ak and Michal Pechoucek, ‘Trust modeling with context representation and generalized identities’, in CIA ’07: Proc. of the 11th Int. Workshop on Cooperative Information Agents, pp. 298–312. Springer Verlag, (2007). [8] Jordi Sabater and Carles Sierra, ‘Review on computational trust and reputation models’, Artif. Intell. Rev., 24(1), 33– 60, (2005). [9] C.-K. Toh, Ad Hoc Wireless Networks: Protocols and Systems, Prentice Hall PTR, Upper Saddle River, NJ, USA, 2001. [10] Aaron Weiss, ‘Computing in the clouds’, netWorker, 11(4), 16–25, (2007).
BIDFLOW: a New Graph-Based Bidding Language for Combinatorial Auctions Madalina Croitoru1 and Cornelius Croitoru2 and Paul Lewis3 Abstract. In this paper we introduce a new graph-based bidding language for combinatorial auctions. In our language, each bidder submits to the arbitrator a generalized flow network (netbid) representing her bids. The interpretation of the winner determination problem as an aggregation of individual preferences represented as flowbids allows building an aggregate netbid for its representation. Labelling the nodes with appropriate procedural functions considerably improves upon the expressivity of previous bidding languages.
1 Introduction
A Combinatorial Auction (CA) is an abstraction of a market-based centralized distributed system for the determination of welfare allocations of heterogeneous indivisible resources. In such a Resource Allocation (RA) system, there is a central node a, the auctioneer, and a set of n nodes, I = {1, . . . , n}, the bidders, which concurrently demand bundles of resources from a common set of available resources, R = {r1, . . . , rm}, held by the auctioneer. The auctioneer broadcasts R to all n bidders, asking them to submit, in a specified common language, the bidding language, their R-valuations over bundles of resources. Bidder i's R-valuation, vi, is a non-negative real function on P(R), expressing for each bundle S ⊆ R the individual interest (value), vi(S), of bidder i in obtaining S. It is assumed that vi(∅) = 0, and vi(S) ≤ vi(T) whenever S ⊆ T. No bidder i knows the valuation of any of the other n − 1 bidders, but all the participants in the system have agreed on a welfare outcome: based on the bidders' R-valuations, the auctioneer will determine a resource allocation O = (O1, . . . , On), specifying for each bidder i her obtained bundle Oi. O is a (weak) n-partition of R, that is, Oi ∩ Oj = ∅ for any different bidders i and j, and ∪i=1,n Oi = R. Furthermore, the global (social) value of the outcome, va(O) = ∑j=1,n vj(Oj), is a maximum value allocation, that is, va(O) = max{va(O′) | O′ is an n-partition of R}. The task of the auctioneer to find a maximum value allocation for a given set of bidder valuations is called the Winner Determination Problem (WDP). This is an NP-hard problem, being equivalent to weighted set-packing ([6]). WDP is expressed as an integer linear program and solved using standard methods. WDP can be parameterized by the set R of resources, considering a fixed set I of bidders and bidders' R-valuations {vi | i ∈ I}. Therefore we can write WDP(R) and its corresponding maximum value va(R). With these notations, WDP(S) and va(S) are well defined for each subset S ⊆ R (by considering the restriction of vi to P(S)). We have obtained a global R-valuation va assigning to each bundle S ⊆ R the maximum value
1 University of Southampton, UK; mc3@ecs.soton.ac.uk; work supported by the OpenKnowledge STREP project IST-FP11V341.
2 Al. I. Cuza University, Iasi, Romania; croitoru@infoiasi.ro
3 University of Southampton, UK; phl@ecs.soton.ac.uk; work supported by the OpenKnowledge STREP project IST-FP11V341.
of an S-allocation to the bidders from I. Therefore WDP can be viewed as the problem of constructing a social aggregation of the R-valuations of the bidders. If we denote by V(R) the set of all R-valuations, it is natural to consider in our RA system the set of superadditive R-valuations, due to the synergies among the resources: SV(R) = {v ∈ V(R) | v(A1 ∪ A2) ≥ v(A1) + v(A2) for all A1, A2 ⊆ R, A1 ∩ A2 = ∅}. It is easy to see that if all vi, i ∈ I, are superadditive then va is superadditive, and the following theorem holds:
Theorem 1 If all bidders' R-valuations are superadditive, then the aggregate R-valuation va satisfies va(A) = max{va(B) + va(A − B) | B ⊆ A} for all A ⊆ R.
Let v ∈ V(R). A v-basis is any B ⊆ P(R) such that for each A ⊆ R we have v(A) = max{v(B) + v(A − B) | B ∈ B, B ⊆ A}. In other words, if B is a v-basis, then the value of v(A) is uniquely determined by the values of v on the elements of the basis contained in A, for each A ⊆ R. The elements of a v-basis, B ∈ B, are called bundles, and the pairs (B, v(B)), B ∈ B, are called bids. It is not difficult to prove that an R-valuation v ∈ V(R) has a v-basis iff v ∈ SV(R) ([5]), and furthermore, the following representational theorem holds:
Theorem 2 If in a RA system the bidders' superadditive R-valuations vi are represented using vi-bases Bi for each i ∈ I, then the aggregate R-valuation va is represented by the va-basis Ba = ∪i∈I Bi, by taking va(B) = max{vi(B) | i ∈ I and B ∈ Bi}, for all B ∈ Ba.
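The recursion in Theorem 1 makes a valuation directly computable from a basis. The Python sketch below (with made-up bundle values, not from the paper) evaluates v(A) = max over basis bundles B ⊆ A of v(B) + v(A − B) by memoized recursion.

from functools import lru_cache

# Hypothetical v-basis given as bids (B, v(B)); the synergy value 7 > 2 + 3 is invented.
BIDS = {frozenset({"r1"}): 2, frozenset({"r2"}): 3, frozenset({"r1", "r2"}): 7}

@lru_cache(maxsize=None)
def valuation(A):
    # v(A) = max over non-empty basis bundles B contained in A of v(B) + v(A - B);
    # 0 when no bundle fits (in particular, v(empty set) = 0).
    best = 0
    for B, value in BIDS.items():
        if B and B <= A:
            best = max(best, value + valuation(A - B))
    return best

print(valuation(frozenset({"r1", "r2"})))  # 7: the synergy bundle beats 2 + 3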
2 Approach
In the new language, each bidder submits to the arbitrator a generalized flow network called a NETBID, which represents the valuation of the bidder by specifying a basis for it. More precisely, if the set of resources is R = {r1, r2, . . . , rm}, then in the NETBID of each agent there is a special starting node s connected to all nodes rj by directed edges with capacity 1. An integer flow in a NETBID will represent an assignment of resources to the agent, by considering the set of resources rj with flow value 1 on the directed edge (s, rj). The node rj is a usual node, i.e. it satisfies the conservation law: the total (sum) of incoming flows equals the total of outgoing flows. In the network there are also bundle nodes, which do not satisfy the conservation law and which are used to combine (via their input flows) different goods into subsets of goods. The combination is governed by the (integer) directed edge flows, together with appropriate lower and capacity bounds. Once the NETBID is constructed, any maximum value flow (in the sense described below) will represent the valuation function of the agent. For example, the NETBID in Figure 1 expresses that the bidder is interested in a bundle consisting of two or three resources of type E, together with the resource M, which adds 10 to the sum of the values of the particular resources of type E. Formally, a NETBID, the bidflows and their values are defined as follows:
Figure 1. The NETBID for an example from [2]

Definition 1 An R-NETBID is a tuple N = (D, s, t, c, l, λ), where:
1. D = (V, E) is a digraph with two distinguished nodes s, t ∈ V; the other nodes, V − {s, t}, are partitioned into R ∪ B ∪ I: R is the set of resource nodes, B is the set of bundle nodes and I is the set of interior nodes. There is a directed edge (s, r) ∈ E for each r ∈ R, and also (b, t) ∈ E for all b ∈ B. There are no other directed edges entering a resource node or leaving a bundle node.
2. c, l are nonnegative integer partial functions defined on the set of edges of D; if (i, j) ∈ E and c is defined on (i, j), then c((i, j)) ∈ Z+, denoted cij, is the capacity of edge (i, j); l((i, j)) ∈ Z+, if defined, is the lower bound on the edge (i, j) and is denoted lij; if (i, j) has been assigned both a capacity and a lower bound, then lij ≤ cij. All edges (s, r) have csr = 1 and lsr = 0. No edge (b, t) has a capacity or a lower bound.
3. λ is a labelling function on V − {s, t} which assigns to a vertex v a pair of rules (λ1(v), λ2(v)) (described in the next definitions).

Definition 2 Let N = (D, s, t, c, l, λ) be an R-NETBID. A bidflow in N is a function f : E → Z+ such that (fij denotes f((i, j))):
1. For each directed edge (i, j) ∈ E: if fij > 0 and cij is defined, then fij ≤ cij; if fij > 0 and lij is defined, then fij ≥ lij.
2. If v ∈ V − {s, t} has λ1(v) = conservation, then ∑(i,v)∈E(D) fiv = ∑(v,i)∈E(D) fvi.
3. For each v ∈ B, fvt ∈ {0, 1}; fvt = 1 if and only if for each w ∈ R ∪ I such that (w, v) ∈ E we have fwv > 0.
The set of all bidflows in N is denoted by FN. In order to simplify our presentation we have assumed here that for each v ∈ V − {s, t}, λ1(v) ∈ {conservation, bundle}, giving rise to the flow rules in 2 and 3 above. In Figure 1, the function λ1(v) is indicated by the colour of the node v: a gray node is a bundle node and a white node is a conservation node.

Definition 3 Let f be a bidflow in the R-NETBID N = (D, s, t, c, l, λ). The value of f, val(f), is defined as val(f) = ∑b∈B val(b) fbt, where val(v) is 0 if v = s, and val(v) = λ2(Df⁻¹(v)) if v ≠ s, t. Here Df⁻¹(v) is the set of all vertices w ∈ V(D) such that (w, v) ∈ E(D) and fwv > 0, and λ2(Df⁻¹(v)) is the rule (specified by the second label associated to vertex v) for computing val(v) from the values of its predecessors which send flows into v.

Definition 4 Let N = (D, s, t, c, l, λ). The R-valuation designated by N is the function vN : P(R) → R+, where for each S ⊆ R, vN(S) = max{val(f) | f ∈ FN, fsr = 0 for all r ∈ R − S}.

By the above two definitions, the value associated by N to a set S of resources is the maximum sum of the values of the (disjoint) bundles which are contained in the set (assignment) S. This is in concordance with the definition of a v-basis given in Section 1 for a superadditive valuation v. However, the NETBID structure defined above is flexible enough to express any valuation. If the bidder desires to express that at most k bundles from some set of bundle nodes may be considered, then these nodes are connected to a new interior node, and this last node is linked to a new superbundle node by a directed edge having lower bound 1 and capacity k. Clearly, any valuation represented in a XOR language can be obtained in such a way, and any R-valuation can be represented [5, 3]. The NETBIDs submitted by the bidders are merged by the arbitrator into a common NETBID sharing only the nodes corresponding to s and R, plus a common t node onto which the corresponding t nodes of the individual NETBIDs are projected. This common NETBID, Na, is a symbolic representation of the aggregate valuation of the society and is illustrated in Figure 2 below.

Figure 2. Aggregate NETBID
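Definition 2 can be checked mechanically. The Python sketch below (an assumed dict-based encoding, not part of the paper) verifies the three bidflow conditions for a NETBID whose non-terminal nodes carry a λ1 label of either 'conservation' or 'bundle'.

def is_bidflow(edges, flow, label, lower, capacity):
    # edges: list of (i, j) pairs; flow/lower/capacity: dicts keyed by edge;
    # label: maps each node in V - {s, t} to 'conservation' or 'bundle'.
    for e in edges:                        # condition 1: bounds where flow is positive
        f = flow.get(e, 0)
        if f > 0 and e in capacity and f > capacity[e]:
            return False
        if f > 0 and e in lower and f < lower[e]:
            return False
    nodes = {v for e in edges for v in e} - {"s", "t"}
    for v in nodes:
        incoming = [(i, j) for (i, j) in edges if j == v]
        outgoing = [(i, j) for (i, j) in edges if i == v]
        if label[v] == "conservation":     # condition 2: flow conservation
            if sum(flow.get(e, 0) for e in incoming) != sum(flow.get(e, 0) for e in outgoing):
                return False
        else:                              # condition 3: bundle fires iff all inputs flow
            f_vt = flow.get((v, "t"), 0)
            if f_vt not in (0, 1) or (f_vt == 1) != all(flow.get(e, 0) > 0 for e in incoming):
                return False
    return True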
From this construction, the following theorem can be proved:
Theorem 3 If each bidder i's R-valuation vi is represented by an R-NETBID Ni (i ∈ I), then the aggregate R-valuation va is designated by the aggregate R-NETBID Na, that is, va = vNa.
3 Conclusion
In this paper we proposed a new visual framework for bidding languages. Several bidding languages for CAs have previously been proposed: "logical bidding languages" [5, 3, 7]; the LGB language [1]; TBBL, a tree-based bidding language that has several novel properties [2]; Petri net formalisms [4]; etc. Our bidflows can be viewed as concise static tools for representing (following the TBBL spirit) dynamic resource allocation processes. Simple NETBID constructions can simulate "phantom variables" ([5]), hence expressing any R-valuation. Future work will investigate the proposed alternative in practice for business intelligence.
REFERENCES [1] C. Boutilier and H. Hoos. Bidding languages for combinatorial auctions. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI)., pages 1211–1217, 2001. [2] R. Cavallo, D. Parkes, A. Juda, A. Kirsch, A. Kulesza, S. Lahaie, B. Lubin, L. Michael, and J. Shneidman. Tbbl: A tree-based bidding language for iterative combinatorial exchanges. In Int’l Joint Conf’s on A.I.: Workshop on Advances in Preference Handling, 2005. [3] Y. Fujisima, K. Leyton-Brown, and Y. Shoham. Taming the computational complexity of combinatorial auctions. In Proceedings of the 16th Int’l Joint Conf on AI, pages 548–553, 1999. [4] A. Giovannucci, J. Rodriguez-Aguilar, J. Cerquides, and U. Endriss. Winner determination for mixed multi-unit combinatorial auctions via petri nets. In Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems. ACM, 2007. [5] N. Nisan. Bidding and allocations in combinatorial auctions. In ACM Conference on Electronic Commerce (EC-2000), 2000. [6] M. Rothkopf, A. Pekec, and R. Harstad. Computationally manageable combinatorial auctions. Management Science, 44:1131–1147, 1998. [7] T. Sandholm. emediator: a next generation electronic commerce server. In Proceedings of the 4th Int’l Conf on Autonomous Agents, pages 341– 348, 2000.
Multi-Agent Reinforcement Learning for Intrusion Detection: A Case Study and Evaluation Arturo Servin and Daniel Kudenko1 Abstract. In this paper we propose a novel approach to train Multi-Agent Reinforcement Learning (MARL) agents to cooperate to detect intrusions in the form of normal and abnormal states in the network. We present an architecture of distributed sensor and decision agents that learn how to identify normal and abnormal states of the network using Reinforcement Learning (RL). Sensor agents extract network-state information using tile coding as a function approximation technique and send communication signals in the form of actions to decision agents. By means of an online process, sensor and decision agents learn the semantics of the communication actions. In this paper we detail the learning process and the operation of the agent architecture. We also present tests and results of our research work in an intrusion detection case study, using a realistic network simulation where sensor and decision agents learn to identify normal and abnormal states of the network.
1 Introduction
Intrusion Detection Systems (IDS) play an important role in the protection of computer networks and information systems from intruders and attacks. Despite previous research efforts, there are still areas where IDS have not satisfied all the requirements of modern computer systems. Specifically, Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks have received significant attention due to the increased security vulnerabilities in end-user software and bot-nets. A special case of DoS are flooding-based DoS and flooding-based DDoS attacks. These are generally based on a flood of packets with the intention of overfilling the network resources of the victim. It is especially difficult to create a flexible hand-coded IDS for such attacks, and machine learning is a promising avenue for tackling the problem. Due to the distributed nature of this type of attack and the complexities that its detection involves, we propose a distributed reinforcement learning (RL) approach. In RL, agents learn to act optimally via observations and feedback from the environment in the form of positive or negative rewards [7]. Multi-Agent RL has been successfully used to solve some challenging problems in various areas. Despite its apparent appeal, MARL needs to deal with problems such as the size of the action-state space, which makes scalability an issue; the partial information that agents have of other agents' observations and actions; a non-stationary environment as a result of the actions of other agents; and the credit assignment problem. To overcome these problems we present an architecture of distributed sensor agents that get information from the environment and share it, in the form of communication signals, with other agents
1 University of York, United Kingdom, email: {aservin, kudenko}@cs.york.ac.uk
higher up the hierarchy. Without any previous semantic knowledge about the signals, higher-level hierarchical agents interpret them and consequently interact with the environment. This results in a learning process where agents with partial observability make decisions and coordinate their own actions to reach a common goal. In order to evaluate our proposal we explore its use in Distributed Intrusion Detection Systems (DIDS).
2 Agent Architecture
We propose an architecture of autonomous agents divided into sensor agents (SA) and decision agents (DA). SAs collect and analyse state information about the environment. Each SA receives only partial information about the global state of the environment, and they map this local state to communication action-signals. These signals are received by the DA, which, without any previous knowledge, learns their semantics and how to interpret their meaning. In this way, the DA tries to model the local state of the cell environment. Then it decides which final action to trigger (in our case study it triggers an alarm to the network operator). When the DA triggers the action and this is appropriate in accordance with the goal pursued, all the agents receive a positive reward. If the action is not correct, all the agents receive a negative reward. The goal is to coordinate the signals sent by the SAs to the DA in order to represent the global state of the environment. To detect the abnormal states that DoS and DDoS attacks generate in a computer network, we have designed an architecture composed of four agents: a Congestion Sensor Agent (CSA), a Delay Sensor Agent (DSA), a Flow Sensor Agent (FSA) and the Decision Agent (DA). We need this diversity of sensor information to develop a more reliable IDS. The idea is that each sensor agent perceives different information depending on its capabilities, its operative task and where it is deployed in the network. Furthermore, not all the features are available at a single point in the network. Flow and congestion information may be measured at a border router between the Internet and the intranet, whilst delay information may only be available from an internal router.
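As a rough illustration of the reward and signal flow (not the authors' exact algorithm, which uses tile coding over continuous network features), the Python sketch below shows a decision agent that ε-greedily Q-learns an alarm policy over the joint vector of, here, symbolic sensor signals, with the shared global reward applied after each decision.

import random
from collections import defaultdict

ALPHA, EPSILON = 0.1, 0.1
ACTIONS = ("alarm", "no_alarm")
q = defaultdict(float)                     # Q-values keyed by (signals, action)

def decide(signals):
    # epsilon-greedy choice of the decision agent's final action
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(signals, a)])

def update(signals, action, reward):
    # single-step update: the global reward follows each alarm decision
    q[(signals, action)] += ALPHA * (reward - q[(signals, action)])

# One interaction: three symbolic sensor signals while the network is under attack.
signals = ("congested", "high_delay", "udp_flood")
a = decide(signals)
update(signals, a, +1.0 if a == "alarm" else -1.0)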
3 Results
We set up several tests to verify the learning capabilities of our agent architecture. We used a control test to train the agents to categorise basic normal and abnormal activity in the network. To simulate the normal traffic we randomly started and stopped connections from node 0 (TCP/FTP) and node 1 (UDP stream). Using another random pattern of connections we used node 4 to simulate the attacks to the network characterised by a flood of UDP traffic. To evaluate the adaptability of the agents we ran tests changing the normal and abnormal traffic patterns. We also ran tests designed to create more
complex scenarios where the attacker changes its attack to mimic authorised or normal traffic. We compared our learning algorithm against two hard-coded approaches. The first hard-coded approach (Hard-Coded 1) emulated a misuse IDS. In this case the IDS looks for the patterns that match an attack. The Hard-Coded 2 approach integrates the same variety of input information as our learning algorithm. We evaluated the learning and hard-coded approaches using test 2 and test 5. Test 2 only changes the traffic pattern of the attack, and it should be very simple to detect. In the attacks of test 5 we changed the packet size and the attack UDP port to be the same as those used by normal applications. This test is the hardest to detect because it emulates some of the signatures of normal traffic. The learning curves of the tests are shown in Fig. 1. Hard-Coded 1 had no problem identifying attacks and had low false negatives for test 2, but it completely failed to detect the attacks of test 5. This is the same problem that misuse IDS have when the signature of the attack changes or when they face unknown attacks. The results for Hard-Coded 2 and our learning approach confirm our argument that for more reliable intrusion detection we need a variety of information sources. Both solutions were capable of detecting the attacks even though one of the sensors was reporting incorrect information. This scenario could also be seen as the emulation of a broken sensor sending bogus information, or of a sensor compromised by the attacker and forced to send misleading signals. Either way it demonstrates that a system using more than one source to detect intrusions can be more reliable than a single-source IDS. Figure 1. Learning Curves (reward plotted against iteration, 0–250, for tests 2 and 5 under Hard-Coded 1, Hard-Coded 2 and the learning approach)
Both the Hard-Coded 2 and learning approaches present very good results regarding the identification of normal and abnormal states in the network. While the learning algorithm requires some time to learn to recognise normal and abnormal activity, it does not require any previous knowledge about the behaviour of the measured variables. Hard-Coded 2 reaches maximum performance from the beginning of the simulation, but it requires in-depth knowledge, on the part of the policy programmer, of the network traffic and the variables measured to detect intrusions.

4 Related Work

Problems such as the curse of dimensionality, partial observability and scalability in MARL have been analysed using a variety of methods and techniques, and they represent the foundation of our research. An application of MARL to networking environments is presented in [2], where cooperative agents learn how to route packets using optimal paths. Using the same approach of flow control and feedback from the environment, other researchers have expanded the use of RL in routing algorithms [6], explored the use of MARL to control congestion in networks [4] and to route using QoS [5], and more recently to control DDoS attacks [8]. The use of RL in the intrusion detection field has not been widely studied, and even less so in distributed intrusion detection. Some research works are [3], where the authors trained a neural network using RL, and [1], where game theory is used to train agents to recognise DoS attacks against routing infrastructure. Other recent research work includes the use of RL to detect host intrusions using sequences of system calls [9] and the previously mentioned [8].

5 Conclusion and Future Work

We have shown how a group of agents can coordinate their actions to reach the common goal of network intrusion detection. During this process, decision agents learn how to interpret the action-signals sent by sensor agents without any previously assigned semantics. These action-signals aggregate the partial information received by the sensor agents, and they are used by the decision agents to reconstruct the global state of the environment. In our case study, we evaluate our learning approach by identifying normal and abnormal states of a realistic network subjected to various DoS attacks. We have also successfully applied RL to a group of network agents under conditions of partial observability, restricted communication and global rewards in a realistic network simulation. Finally, we can conclude that using a variety of network data has generated good results in identifying the state of the network. In some cases the agents can generate good results even when some of this information is missing. Future work includes scaling up our learning approach to a large number of agents using a hierarchical approach. This architecture will allow us to create more complex network topologies and eventually the emulation of real packet streams inside the network environment.

REFERENCES
[1] B. Awerbuch, D. Holmer, and H. Rubens, ‘Provably Secure Competitive Routing against Proactive Byzantine Adversaries via Reinforcement Learning’, John Hopkins University, Tech. Rep., May, (2003). [2] J.A. Boyan and M.L. Littman, ‘Packet routing in dynamically changing networks: A reinforcement learning approach’, Advances in Neural Information Processing Systems, 6(1994), 671–678, (1994). [3] J. Cannady, ‘Next Generation Intrusion Detection: Autonomous Reinforcement Learning of Network Attacks’, NISSC00: Proc. 23rd National Information Systems Security Conference, (2000). [4] J. Dowling, E. Curran, R. Cunningham, and V. Cahill, ‘Using feedback in collaborative reinforcement learning to adaptively optimize MANET routing’, Systems, Man and Cybernetics, Part A, IEEE Transactions on, 35(3), 360–372, (2005). [5] E.G. Gelenbe, M. Lent, and R.P.L.P. Su, ‘Autonomous smart routing for network QoS’, Autonomic Computing, 2004. Proceedings. International Conference on, 232–239, (2004). [6] A. Nowe, K. Steenhaut, M. Fakir, and K. Verbeeck, ‘Q-learning for adaptive load based routing’, Systems, Man, and Cybernetics, 1998. 1998 IEEE International Conference on, 4, (1998). [7] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998. [8] X. Xu, Y. Sun, and Z. Huang, ‘Defending DDoS Attacks Using Hidden Markov Models and Cooperative Reinforcement Learning’, LECTURE NOTES IN COMPUTER SCIENCE, 4430, 196, (2007). [9] X. Xu and T. Xie, ‘A Reinforcement Learning Approach for Host-Based Intrusion Detection Using Sequences of System Calls’, Proceedings of the International Conference on Intelligent Computing, (2005).
GR-MAS: Multi-Agent System for Geriatric Residences Javier Bajo1 and Juan M. Corchado and Sara Rodriguez2 Abstract. This paper presents a multiagent architecture (GR-MAS) developed to facilitate health care in geriatric residences. GR-MAS (Geriatric Residence Multi-Agent System) contains different agent types and takes into account the integration with RFID and Wi-Fi technologies and handheld devices. The core of GR-MAS is an autonomous deliberative case-based planner agent called GerAg (Geriatric Agent for monitoring Alzheimer patients). This agent, which provides adaptation and learning capabilities, has been designed to plan the nurses' working time dynamically, to maintain the standard working reports about the nurses' activities, and to guarantee that the patients assigned to the nurses are given the right care. A description of GerAg, its relationship with the complementary agents, and preliminary results of the multi-agent system prototype in a real environment are presented.
1 INTRODUCTION
There is an ever-growing need to supply constant care and support to the disabled and elderly, and the drive to find more effective ways to provide such care has become a major challenge for the scientific community [3]. During the last three decades the number of Europeans over 60 years old has risen by about 50%. Today they represent more than 25% of the population, and it is estimated that in 20 years this percentage will rise to one third of the population, meaning 100 million citizens [3]. In the USA, people over 65 years old are the fastest growing segment of the population; it is expected that by 2020 they will represent about 1 in 6 citizens, totaling 69 million by 2030. Furthermore, over 20% of people over 85 years old have a limited capacity for independent living, requiring continuous monitoring and daily care. The importance of developing new and more reliable ways to provide care and support to the elderly is underlined by this trend [3], and the creation of mechanisms for monitoring and optimizing health care will become vital. Some authors consider that tomorrow's health care institutions will be equipped with intelligent systems capable of interacting with humans. Multiagent systems and architectures based on intelligent devices have recently been explored as supervision systems for the medical care of elderly patients; these intelligent systems aim to support them in all aspects of daily life, predicting potential hazardous situations and delivering physical and cognitive support. Multiagent systems, together with the use of RFID and Wi-Fi technologies and handheld devices, offer new possibilities and open new fields such as ambient intelligence, which may facilitate the integration of distributed intelligent software applications into our daily life.
1 Pontifical University of Salamanca, Spain, email: jbajope@upsa.es
2 University of Salamanca, Spain, email: {corchado, srg}@usal.es
2 GR-MAS: A MULTIAGENT SYSTEM FOR GERIATRIC RESIDENCES
GR-MAS (Geriatric Residence Multi-Agent System) is a multiagent architecture proposed for improving health care services and their integration with complementary technologies. The GerAg agent, which is a deliberative planning agent, is the core of GR-MAS, and incorporates a planning mechanism that improves medical assistance in geriatric residences by optimizing the visiting schedules. GR-MAS is a dynamic system for the management of different aspects of the geriatric center. This distributed system uses Radio Frequency Identification (RFID) technology for ascertaining patients' locations in order to maximize their safety and to generate medical staff plans. The development of such a multiagent system has been motivated by one of the most distinctive characteristics of geriatric or Alzheimer residences, which is their dynamism, in the sense that the patients change very frequently (new patients arrive and others pass away), while staff rotation is also relatively high and staff normally work in shifts of eight hours. GR-MAS uses mobile devices and Wi-Fi technology to provide the personnel of the residence with updated information about the center and the patients, to provide the working plan and information about alarms or potential problems, and to keep track of their movements and actions within the center. From the user's point of view, the complexity of the solution has been reduced with the help of friendly user interfaces and a robust and easy-to-use multiagent system.
Figure 1. GR-MAS wireless technology organization schema
GR-MAS is composed of four different types of agent, as can be seen in Figure 1: the Patient agent manages the patient's personal data and behaviour (monitoring, location, daily tasks, and anomalies); the Manager agent plays two roles, security control and the management of the medical record database; the Doctor GerAg agent treats patients; and the GerAg agent schedules the nurse's working day, obtaining dynamic plans depending on the tasks needed for each assigned patient.
3 GERAG: AUTONOMOUS PLANNER AGENT FOR GERIATRIC RESIDENCES
GerAg is an autonomous deliberative case-based planner (CBP-BDI) agent [2] developed for integration within the multi-agent system GR-MAS. The goal of this agent is to provide efficient working schedules, at execution time, for geriatric residence staff, and therefore to improve the quality of health care and the supervision of patients in geriatric residences. Each of the GerAg agents is assigned to a nurse or a doctor of a residence, and also provides information about patient locations, historical data and alarms. As the members of the staff carry out their duties (following the plan provided by the agent), the initially proposed plan may need to be modified, due for example to delays or alarms; in this case the agent is capable of re-planning at execution time. The CBP planner used by the GerAg agent identifies a plan, for a given nurse, to provide daily nursing care in the residence. It is very important to maintain a map with the locations of the different patients at the time of planning or re-planning, which is why RFID technology is used to facilitate the location and identification of patients, nurses and doctors. The CBP agent calculates the most re-plan-able intention (MRPI), as shown in [4], which is the plan that can most easily be substituted by another in case the initial plan gets interrupted. In a dynamic environment, having an alternative plan is important for maintaining the efficiency of the system. This agent follows the 4 stages of a CBR system (Retrieval, Reuse, Review and Retain) [1].
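A toy Python sketch of those four stages applied to plan cases is given below; the case structure, similarity measure and adaptation step are illustrative assumptions, not the authors' design.

cases = []  # each case: (problem_features, plan, outcome)

def similarity(p1, p2):
    # fraction of matching features; a deliberately simple retrieval measure
    return sum(1 for k in p1 if p1[k] == p2.get(k)) / max(len(p1), 1)

def cbp_cycle(problem, execute):
    # Retrieval: recover the case whose problem description is most similar.
    best = max(cases, key=lambda case: similarity(problem, case[0]), default=None)
    # Reuse: adapt the retrieved plan (here simply copied; real adaptation would
    # reorder tasks around patient locations, alarms and delays).
    plan = list(best[1]) if best else ["default-round"]
    # Review: execute (or simulate) the plan and evaluate the outcome.
    outcome = execute(plan)
    # Retain: store the revised case for future planning and re-planning.
    cases.append((problem, plan, outcome))
    return plan, outcome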
Figure 2. Case-based planning cycle

Figure 2 shows the steps carried out in each of the stages of the CBP system. When an interruption occurs, the system initiates a new CBP cycle, taking into account the tasks previously accomplished. That is, in the new retrieval stage, plans with a problem description similar to the current situation (after the interruption) will be recovered. The MRPI guarantees that at least some of the plans closest to the initial geodesic plan will be recovered (the rest of the plans are no longer valid because of the restrictions, the tasks that have already been accomplished, etc.), together with new plans.

4 RESULTS OBTAINED

The GR-MAS system has been tested over the last few months. During the testing period the system's usefulness has been evaluated from different points of view. Figure 3 shows the average number of nurses working simultaneously (for each of the 24 hours of the day) at the residence before and after the implantation of the system prototype, with data collected from October 2006 to March 2007. The prototype was adopted on January 15th, 2007. The average number of patients was the same before and after the implementation. To test the system, 30 patient agents, 10 GerAg nurse agents, 2 doctor agents and 1 manager agent were instantiated. The tests focused on the GerAg nurse agents. As can be seen in Figure 3, the dotted line represents the average number of nurses required in the residence during each hour of a day without GR-MAS. The vertical bars represent the same measure after the implementation. As can be seen, the GR-MAS multiagent system helps the nurses to gain time, which can be dedicated to the care of special patients, or to learning or preparing new activities. The time spent on supervision and control tasks has been reduced substantially, as has the time spent attending false alarms, while the time for direct patient care has been increased.

Figure 3. Number of nurses working simultaneously

5 CONCLUSION

In the future, health care for Alzheimer's patients, the elderly and people with other disabilities will require the use of new technologies that allow medical personnel to carry out their tasks more efficiently. One of the possibilities is the use of multiagent systems. We have shown the potential of deliberative GerAg agents in a distributed GR-MAS applied to health care, providing a way to respond to some of the challenges of health care, related for example to identification, control and health care planning. In addition, the use of RFID technology on people provides a high level of interaction among users and patients through the system and is fundamental in the construction of the intelligent environment. Furthermore, the use of mobile devices, when used well, can facilitate social interactions and knowledge transfer.
ACKNOWLEDGEMENTS

This work has been partially supported by the MCYT Spanish Ministry of Science project TIN2006-14630-C03-03.
REFERENCES
[1] A. Aamodt and E. Plaza, 'Case-based reasoning: foundational issues, methodological variations, and system approaches', AI Communications, 7(1), 39–59, (1994).
[2] D.I. Tapia, A. de Luis, S. Rodríguez, J.F. de Paz, J. Bajo, and J.M. Corchado, Hybrid Architecture for a Reasoning Planner Agent, 461–468, Lecture Notes in Artificial Intelligence, 4693, Springer Verlag, Berlin, 2007.
[3] L. Camarinha-Matos and H. Afsarmanesh, Design of a virtual community infrastructure for elderly care, 635, PRO-VE02 3rd IFIP Working Conference on Infrastructures for Virtual Enterprises, Kluwer B.W., Deventer, 2002.
[4] M. Glez-Bedia and J.M. Corchado, 'A planning strategy based on variational calculus for deliberative agents', Computing and Information Systems Journal, 10(1).
Agent-Based and Population-Based Simulation of Displacement of Crime (extended abstract) Tibor Bosse and Charlotte Gerritsen and Mark Hoogendoorn and S. Waqar Jaffry and Jan Treur1 Abstract. Within Criminology, the process of crime displacement is usually explained by referring to the interaction of three types of agents: criminals, passers-by, and guardians. Most existing simulation models of this process are agent-based. However, when the number of agents considered becomes large, population-based simulation has computational advantages over agent-based simulation. This paper presents both an agent-based and a population-based simulation model of crime displacement, and reports a comparative evaluation of the two models. In addition, an approach is put forward to analyse the behaviour of both models by means of formal techniques.
1 INTRODUCTION Within Criminology one of the main research interests is the emergence of so-called criminal hot spots. These hot spots are places where many crimes occur. After a while the criminal activities shift to another location, for example, because the police has changed its policy and increased the numbers of officers at the hot spot. Another reason may be that the passers by move away, when a certain location gets a bad reputation. Such a shift between locations is called the displacement of crime. The reputation of specific locations in a city is an important factor in the spatio-temporal distribution and dynamics of crime. For example, it may be expected that the amount of assaults that take place at a certain location affect the reputation of this location. Similarly, the reputation of a location affects the attractiveness of that location for certain types of individuals. For instance, a location that is known for its high crime rates will attract police officers, whereas most citizens will be more likely to avoid it. As a result, the amount of criminal activity at such a location will decrease, which will affect its reputation again. The classical approaches to simulation of processes in which groups of larger number of agents and their interaction are involved are population-based: a number of groups is distinguished (populations) and each of these populations is represented by a numerical variable indicating their number or density (within a given area or location) at a certain time point. The simulation model takes the form of a system of difference or differential equations expressing temporal relationships for the dynamics of these variables. Well-known classical examples of such population-based models are systems of difference or differential equations for predator-prey dynamics (e.g., [8], [12], [13], [9], [4]) and the dynamics of epidemics (e.g., [10], [7], [4] [1], [6]). Such models can be studied by simulation and by using analysis techniques from mathematics and dynamical systems theory. From the more recently developed agent system area it is 1
1 Vrije Universiteit Amsterdam, Department of Artificial Intelligence, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands, email: {cg, tbosse, mhoogen, swjaffry, treur}@few.vu.nl
often taken as a presupposition that simulations based on individual agents are a more natural or faithful way of modelling, and thus will provide better results (e.g., [5], [11], [2]). Although for larger numbers of agents such agent-based modelling approaches are computationally more expensive than population-based modelling approaches, such a presupposition may provide a justification for preferring their use over population-based modelling approaches, in spite of the computational disadvantages. However, for larger numbers of agents (in the limit), agent-based simulations may equally well approximate population-based simulations. In such cases agent-based simulations can simply be replaced by population-based simulations. In this paper, these considerations are explored in more detail for the application area of crime displacement. Comparative simulation experiments have been conducted based on different simulation models, both agent-based (for different numbers of agents) and population-based. The results are analysed and related to the assumptions discussed above. This paper is organised as follows. First, Section 2 introduces the population-based model which has been defined for this domain, and briefly presents the outcomes of a mathematical analysis of the model and simulations using the model. Thereafter, Section 3 introduces the agent-based model and briefly describes the simulation results using that model. Finally, Section 4 is a discussion.
2 A POPULATION-BASED MODEL

In the population-based model, the densities of the different agent types (i.e. criminals, passers-by, and guardians) are calculated by means of differential equations. An example of an equation to determine the number of criminals at location L is specified as follows:

c(L, t + Δt) = c(L, t) + γ₁ · (β(L, c, t) − c(L, t)/C) · Δt

This expresses that the density c(L, t + Δt) of criminals at location L at time t + Δt is equal to the density of criminals at the location at time point t, plus a constant γ₁ (expressing the rate at which criminals move per time unit) times the movement of criminals from and to location L between t and t + Δt, multiplied by Δt. Here, the movement of criminals is calculated by determining the relative attractiveness β(L, c, t) of the location (compared to the other locations) for criminals. From this, the density of criminals at the location at time point t divided by the total number C of criminals (which is constant) is subtracted, resulting in the change of the number of criminals for this location. For the guardians and the passers-by similar formulae are used. The calculation of the attractiveness of locations has been omitted for the sake of brevity. A mathematical analysis has been conducted to investigate the behaviour of the model, and it was shown that in all cases attraction to the equilibrium will take place. Hence, given the set
of assumptions as described above, the model will eventually stabilise. Besides the mathematical analysis, simulation runs have been conducted as well and the outcomes confirm the results found in the mathematical analysis. The computation time needed to perform the simulations is approximately 1 second.
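To make the dynamics concrete, the following Python sketch iterates this update rule for a small set of locations. It is a minimal illustration, not the authors' implementation: since the attractiveness calculation is omitted in the paper, the fixed per-location weights used for β below are an assumption, as are all parameter values.

def beta(L, weights):
    # Relative attractiveness of location L for criminals (assumed form;
    # the paper's actual attractiveness function is not specified here).
    return weights[L] / sum(weights.values())

def step(c, weights, eta, dt, C_total):
    # One Euler step of c(L, t+dt) = c(L, t) + eta*(beta(L,c,t) - c(L,t)/C)*dt.
    return {L: c[L] + eta * (beta(L, weights) - c[L] / C_total) * dt for L in c}

c = {"A": 10.0, "B": 5.0, "C": 5.0}       # criminal densities per location
weights = {"A": 1.0, "B": 3.0, "C": 1.0}  # assumed static attractiveness
for _ in range(200):
    c = step(c, weights, eta=1.0, dt=0.1, C_total=20.0)
print(c)  # densities approach the equilibrium, illustrating the stabilisation

Consistent with the analysis above, the densities converge to the fixed point where β(L) equals c(L)/C for every location.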
3 AN AGENT-BASED MODEL For the agent-based model, the following algorithm is used:
1. initialise all agents on locations
2. for each time step repeat the following:
   a. each agent calculates the attractiveness of every location, depending on its type (passer-by, criminal, or guardian);
   b. a percentage of the agents of each type is selected at random to decide whether to move to a new location or stay at the old one;
   c. the selected agents move to a location with a probability proportional to the attractiveness of the specific location (i.e. a selected agent has a higher probability of moving to a relatively attractive location than to a non-attractive one).
Using the agent-based model, simulation runs have been performed, and the results are closely correlated to the results using the population-based model. The computation time needed to run the agent-based model (for 100 runs) is 16.39 seconds.
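One step of this procedure can be sketched as follows; the selection fraction p_move and the attractiveness table are illustrative assumptions, since the paper leaves the percentage and the attractiveness values unspecified.

import random

def agent_step(agents, attractiveness, p_move=0.25):
    # agents: dict agent_id -> (type, location);
    # attractiveness: dict type -> dict location -> weight (assumed values).
    locations = list(next(iter(attractiveness.values())))
    for aid, (atype, loc) in agents.items():
        if random.random() < p_move:          # only a selected subset may move
            weights = [attractiveness[atype][L] for L in locations]
            # move with probability proportional to the location's attractiveness
            agents[aid] = (atype, random.choices(locations, weights=weights)[0])

agents = {i: ("criminal" if i % 2 else "guardian", "A") for i in range(100)}
attractiveness = {"criminal": {"A": 1.0, "B": 3.0},
                  "guardian": {"A": 2.0, "B": 1.0}}
for _ in range(50):
    agent_step(agents, attractiveness)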
4 DISCUSSION In this paper two models have been introduced to investigate the criminological phenomenon of the displacement of crime: a population-based model as well as an agent-based model. These models have been presented in a generic format to allow for an investigation of a variety of different functions representing aspects such as the attractiveness of locations. Using mathematical analysis, and confirmed by simulation results, the population-based model was shown to end up in an equilibrium for one variant of the model. The parameter settings for these simulations have been determined in cooperation with criminologists. The simulation results for the agent-based model using the same parameter settings show an identical trend to the population-based model, except for some minor deviations that can be attributed to the fact that the agent-based model is discrete, as confirmed by the formal evaluation. The computation time of the population-based model was shown to be much lower than the computation time of the agent-based model. The results reported in this paper differ at some points from the results reported in [3]. In the results using an agent-based model reported in that paper, cyclic patterns were observed whereby there is a continuous movement of so-called hot spots (i.e. places where a lot of crime takes place). As already stated
before, this paper shows that the population of agents at the various locations stabilises over time. The difference can be attributed to the fact that in [3] all agents decide where to move to based upon the attractiveness of locations, whereas in the case of the models presented in this paper only a subset of the agents move. The results of [3] can however be reproduced using the model presented in this paper as well, by setting η = 1 and Δt = 1. Determining what settings are most realistic in real life is future work. The idea that population-based models approximate agent-based models for larger populations is indeed confirmed by the simulation results reported in this paper. Future work is to introduce a general framework to make a comparison between the models possible.
REFERENCES
[1] R.A. Anderson and R.M. May, Infectious Diseases of Humans: Dynamics and Control, Oxford University Press, Oxford, UK, 1992.
[2] L. Antunes and K. Takadama (eds.), Multi-Agent-Based Simulation VII, Proceedings of the Seventh International Workshop on Multi-Agent-Based Simulation, MABS'06, LNAI, vol. 4442, Springer Verlag, 2007.
[3] T. Bosse and C. Gerritsen, Agent-Based Simulation of the Spatial Dynamics of Crime: On the Interplay between Criminal Hot Spots and Reputation, in: Proceedings of the Seventh International Joint Conference on Autonomous Agents and Multi-Agent Systems, AAMAS'08, ACM Press, to appear, 2008.
[4] D.N. Burghes and M.S. Borrie, Modelling with Differential Equations, John Wiley and Sons, 1981.
[5] P. Davidsson, L. Gasser, B. Logan and K. Takadama (eds.), Multi-Agent and Multi-Agent-Based Simulation, Proceedings of the Joint Workshop on Multi-Agent and Multi-Agent-Based Simulation, MABS'04, LNAI, vol. 3415, Springer Verlag, 2005.
[6] S.P. Ellner and J. Guckenheimer, Dynamic Models in Biology, Princeton University Press, 2006.
[7] W.O. Kermack and W.G. McKendrick, A contribution to the mathematical theory of epidemics, Proceedings of the Royal Society of London, Series A 115, pp. 700-721, 1927.
[8] A.J. Lotka, Elements of Physical Biology, 1924; reprinted by Dover in 1956 as Elements of Mathematical Biology.
[9] J. Maynard Smith, Models in Ecology, Cambridge University Press, Cambridge, 1974.
[10] R. Ross, An application of the theory of probabilities to the study of a priori pathometry, Part I, Proceedings of the Royal Society of London, Series A 92, pp. 204-230, 1916.
[11] J.S. Sichman and L. Antunes (eds.), Multi-Agent-Based Simulation VI, Proceedings of the Sixth International Workshop on Multi-Agent-Based Simulation, MABS'05, LNAI, vol. 3891, Springer Verlag, 2006.
[12] V. Volterra, Fluctuations in the abundance of a species considered mathematically, Nature 118, pp. 558-560, 1926.
[13] V. Volterra, Variations and fluctuations of the number of individuals in animal species living together, in: Animal Ecology, McGraw-Hill, 1931; translated from the 1928 edition by R.N. Chapman.
Organizing Coherent Coalitions Jan Broersen and Rosja Mastop and John-Jules Ch. Meyer and Paolo Turrini 1 Abstract. In this paper we provide and discuss a language to talk about coherence, a property of interaction that ensures that players' abilities do not contradict one another and that the empty coalition does not make active choices. With this property we can model closed-world interaction, such as that of a Coordination Game or of a Prisoner's Dilemma, where all the outcomes are determined only by the choices of the agents that are present.
1 Introduction
Pauly's Coalition Logic has been shown to be a sound formal tool to analyze the properties of strategic interactions and games. One remaining issue is to define in that language what the interesting properties of an interaction are, as is possible for instance with regularity (abilities of coalitions do not contradict each other) or outcome monotonicity (if a coalition can force an outcome to lie in a set X, it can also force an outcome to lie in all supersets of X).
Table 1. Clothing Conformity

Row \ Column   White Dress   Black Dress
White Dress    (3, 3)        (0, 0)
Black Dress    (0, 0)        (3, 3)
In the situation of Table 1, a legislator who wants to achieve the socially optimal state (players coordinate) should declare that a discordant choice is forbidden, thereby labeling the combinations of moves (black, white) and (white, black) as violations. Suppose however that the environment were an active part of the game, and that it could decide to cut off the left side of the matrix, eliminating the possibility for Column to make a proper choice. Then what should a legislator do? It is quite clear that requiring the agents to choose something should depend on the moves that are available to the players. In order to regulate the system we need a proper agent-oriented normative system; in particular, we should avoid deontic statements that concern proper choices to be carried out by nature. This translates into ruling out all those systems in which nature plays an active role, i.e. isolating all closed-world interactions. In this paper we will pursue this idea formally, identifying all such interactions and axiomatizing their logic.
2 Coherence
We introduce the concept of an Effectivity Function, adopted from [6].
1 Universiteit Utrecht, The Netherlands, email: paolo@cs.uu.nl
Definition 1 (Effectivity Function) Given a finite set of agents Agt and a set of states W, an effectivity function is a function E : W → (2^Agt → 2^(2^W)). Any subset of Agt will henceforth be called a coalition. For elements of W we use variables u, v, w, . . .. The elements of W are called 'states' or 'worlds'; the subsets of Agt are called 'coalitions'; the sets of states X ∈ E(w)(C) are called the 'choices' of coalition C in state w. The set E(w)(C) is called the 'choice set' of C in w. The complement of a set is calculated from the obvious domain. Intuitively, if X ∈ E(w)(C) the coalition is said to be able to force that the next state after w will be some member of the set X. For studying closed-world interaction, we consider these minimal properties: (1) coalition monotonicity: for all X, w, C, D, if X ∈ E(w)(C) and C ⊆ D, then X ∈ E(w)(D); (2) regularity: for all X, w, C, if X ∈ E(w)(C), then (W \ X) ∉ E(w)(Agt \ C); (3) outcome monotonicity: for all X, Y, w, C, if X ∈ E(w)(C) and X ⊆ Y, then Y ∈ E(w)(C); (4) inability of the empty coalition (IOEC): for all w, E(w)(∅) = {W}. If an Effectivity Function has these properties, it will be called coherent. As noticed also by [2], with the last property the empty coalition cannot force non-trivial outcomes of a game. One important class of Effectivity Functions are the playable ones, which have been proved to correspond to strategic games ([6], Theorem 2.27). For any world w an Effectivity Function is playable if it has the following properties: (1) ∅ ∉ E(w)(C), for any C; (2) W ∈ E(w)(C), for any C; (3) E is Agt-maximal, that is, for any X ⊆ W, W \ X ∉ E(w)(∅) implies X ∈ E(w)(Agt); (4) E is superadditive, i.e. for C ∩ D = ∅, if X ∈ E(w)(C) and Y ∈ E(w)(D) then X ∩ Y ∈ E(w)(C ∪ D). In order to understand the types of interactions we are isolating, we need to compare coherent and playable effectivity functions. First some definitions: we call Agt-superadditive an Effectivity Function that is superadditive for C, D ⊆ Agt with C ∪ D = Agt, and C-superadditive one that is superadditive for C, D with C ∪ D ≠ Agt. We skip the proofs, for reasons of space. Proposition 1 (1) Not all playable games are coherent, and not all coherent games are playable. (2) Coherent Agt-maximal games are Agt-superadditive. (3) Coherent Agt-maximal C-superadditive games are playable.
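For finite W and Agt the four coherence conditions can be checked mechanically. The sketch below assumes a representation of E as a dictionary mapping each world to a dictionary from coalitions (frozensets) to choice sets; the concrete effectivity function at the end is an invented example, not one from the paper.

from itertools import combinations

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def coherent(E, W, Agt):
    W_fs, Agt_fs = frozenset(W), frozenset(Agt)
    for w in E:
        for C in subsets(Agt_fs):
            for X in E[w][C]:
                # (1) coalition monotonicity: X remains a choice of supersets of C
                if any(X not in E[w][D] for D in subsets(Agt_fs) if C <= D):
                    return False
                # (2) regularity: the complement coalition cannot force W \ X
                if (W_fs - X) in E[w][Agt_fs - C]:
                    return False
                # (3) outcome monotonicity: every superset of X is also a choice
                if any(X <= Y and Y not in E[w][C] for Y in subsets(W_fs)):
                    return False
        # (4) inability of the empty coalition: E(w)(empty) = {W}
        if E[w][frozenset()] != {W_fs}:
            return False
    return True

W, Agt = {"u", "v"}, {"a"}
E = {w: {frozenset(): {frozenset(W)},
         frozenset({"a"}): {frozenset({"u"}), frozenset(W)}} for w in W}
print(coherent(E, W, Agt))  # True: the single agent can force {u} or anything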
3 Axiomatization
Let Agt be a finite set of agents and Prop a countable set of atomic formulas. The syntax of Coherent Coalition Logic is defined as follows: φ ::= p | ¬φ | φ ∨ φ | [C]φ | Eφ, where p ranges over Prop and C ranges over the subsets of Agt. The other Boolean connectives are defined as usual. The informal reading
of the modalities is: “Coalition C can choose φ” and “There is a state that satisfies φ”. The dual Aφ is defined as ¬E¬φ. Notice that we have the syntax of standard Coalition Logic (see [6]) plus a global modality. Definition 2 (Models) A model for our logic is a tuple (W, E, R∃ , V ) W
where W is a nonempty set of states; E : W −→ (2Agt −→ 22 ) is a coherent Effectivity Function; R∃ = W × W is a global relation; V : W −→ 2P rop is a valuation function. The satisfaction relation of modal formulas (the rest is standard) with respect to a pointed model M, w is defined as follows: M, w |= [C]φ M, w |= Eφ
iff iff
[[φ]]M ∈ E(w)(C) ∃v s.t. wR∃ v and M, v |= φ
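On top of the same representation used in the previous sketch, the two modal clauses of Definition 2 can be evaluated directly; this is a minimal sketch, with the propositional machinery (extensions of formulas) assumed to be given.

def sat_coalition(E, w, C, phi_extension):
    # M, w |= [C]phi  iff  [[phi]] ∈ E(w)(C)
    return frozenset(phi_extension) in E[w][frozenset(C)]

def sat_exists(W, sat, phi):
    # M, w |= Ephi iff some world satisfies phi (R∃ is the global relation,
    # so the evaluation point w is irrelevant)
    return any(sat(v, phi) for v in W)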
In this definition, [[φ]]M =def {w ∈ W | M, w |= φ}. The modality for coalitional ability is standard from Coalition Logic [6]. What we look for now is a set of axioms and rules such that the corresponding maximally consistent sets generate a coherent Effectivity Function in the canonical models. However, the Inability of the Empty Coalition is not definable in Coalition Logic. To see this it is important to notice that Coalition Logic is a monotonic multimodal logic, and frame validity of formulas of monotonic modal logics is closed under taking disjoint unions. This is proved for modal satisfaction in [4] (Definition 4.1, Proposition 4.2).
Proposition 2 There is no formula of Coalition Logic that defines Inability of the Empty Coalition.
We can construct two models that have IOEC, while their disjoint union does not (see [4] and [1] for the definitions). We claim that Aφ ↔ [∅]φ defines the Inability of the Empty Coalition.
Proposition 3 |=C Aφ ↔ [∅]φ ⇔ E(w)(∅) = {W} for every w in any frame F in the class of Coalitional Frames C.
Proof (⇒) Assume that |=C Aφ ↔ [∅]φ while not E(w)(∅) = {W} for every w in any frame F in the class of Coalitional Frames C. Then there is an F in which there is a w such that E(w)(∅) ≠ {W}. Notice that both W and E(w)(∅) are nonempty. So there is a W′ ≠ W s.t. W′ ∈ E(w)(∅) and W′ ⊂ W. Take an atom p to be true in all w′ ∈ W′ and false in W \ W′. Now we have a model M based on a coalitional frame in C s.t. M ⊭ Ap ↔ [∅]p. (⇐) Assume E(w)(∅) = {W} for a given w in an arbitrary model M of a coalition frame in C, and that w |= Aφ. Then [[φ]]M = W and w |= [∅]φ follows. Assume now that w |= [∅]φ. It has to be the case that [[φ]]M = W by assumption. So also w |= Aφ, which concludes the proof.
Notice that it is enough to have Aφ ↔ [∅]φ to ensure that the global relation axiomatizes an equivalence relation and preserves modal satisfaction. To see this it is enough to check that taking a generated submodel with respect to the global relation, given this axiom, ensures the condition of taking also a generated submodel with respect to the neighbourhood modality.
Take a canonical model C∗ = ((W∗, E∗), V∗) with φ̂ = {w ∈ W∗ | φ ∈ w} as the truth set of φ in the canonical model. Take now the maximally consistent sets w ∈ W∗, closed under the proof system below. We take the following conditions to describe coherence of the Effectivity Function on the canonical relation:
• wE∗C X iff ∃φ with φ̂ ⊆ X : [C]φ ∈ w, and ∀ψ with ψ̂ ⊆ (W∗ \ X) : [C]ψ ∉ w (for C ≠ ∅)
• E∗C ⊆ E∗D (for C ⊆ D)
• wE∗C X iff X = W∗ (for C = ∅)
• wR∃v iff w, v ∈ W∗

Proof System
A1: [C]φ → [D]φ (for C ⊆ D)
A2: [C]φ → ¬[C]¬φ
A3: Aφ ↔ [∅]φ
A4: φ → Eφ
A5: EEφ → Eφ
A6: φ → AEφ
A7: A(φ → ψ) → (Aφ → Aψ)
R1: from φ and φ → ψ, infer ψ
R2: from φ → ψ, infer [C]φ → [C]ψ
R3: from φ, infer Aφ

Proposition 4 The set of axioms and rules above is sound and complete with respect to the class of Coherent Coalitional Frames.
Proof We need just to check the statement with respect to the generated submodel M∗ through the global relation. We make use of [6] (Theorem 3.10). We omit the detailed proof.
Notice that if we add Agt-maximality to Coherent Games, the following holds: M, w |= [Agt]φ ↔ Eφ. At the level of expressivity, coherent coalition logic is thus powerful enough to reason about global properties of the models.
4 Conclusion and Future Work
In this paper we studied those interactions in which nature does not play an active role and provided an axiomatization of the resulting logic. The work described here allows for several developments, such as the study of stability (Nash-consistency) of normative systems [6] or the efficiency of social procedures [5]. Following this line of reasoning it is possible, given a notion of optimality or efficiency, to construct a deontic language that requires this notion to hold, as done for instance in [3]. We can view Coherent Coalition Logic as a language to talk about those interactions for which it makes sense to construct a deontic language.
REFERENCES
[1] P. Blackburn, M. de Rijke, and Y. Venema. Modal Logic. Cambridge Tracts in Theoretical Computer Science, 2001.
[2] S. Borgo. Coalitions in action logic. In Proc. of IJCAI, pages 1822–1827, 2007.
[3] J. Broersen, R. Mastop, J.-J. Ch. Meyer, and P. Turrini. A deontic logic for socially optimal norms. Forthcoming, 2008.
[4] H.H. Hansen. Monotonic Modal Logics. Master's Thesis, ILLC, 2001.
[5] R. Parikh. Social software. Synthese, 132(3):187–211, 2002.
[6] M. Pauly. Logic for Social Software. ILLC Dissertation Series, 2001.
A probabilistic trust model for semantic peer-to-peer systems Nguyen Gia-Hien1 and Chatalic Philippe2 and Rousset Marie-Christine1
1 Preliminaries and illustrative example
We consider a network of semantic peers P = (Pi)i=1..n. Each peer Pi uses its own ontology, expressed on its own vocabulary Vi, for describing and structuring its knowledge as well as for annotating its resources. A class C ∈ Vi of a peer Pi is referred to as Pi:C, or simply C when no confusion is possible. Peers are connected to each other by means of mappings, corresponding to logical constraints linking classes of different peers. Users ask queries to one of the peers, using the vocabulary of this peer. When processing a query, the reasoning propagates from one peer to other peers thanks to those mappings. The mappings are exploited during information retrieval or query answering for query reformulation between peers. For example, let us consider a semantic P2P system sharing movies based on semantic annotations, where P1 organizes his video resources according to their genres (Suspense, Action, Animation), and P2 organizes his films based on the actors playing in the movies (BruceWillis, Jolie). While having different views for classifying movies, P1 and P2 can establish some mappings between their two classifications. For example, they can agree that the class BruceWillis of P2 (denoted by P2:BruceWillis) is more specific than the class Action of P1 (denoted by P1:Action). This results in the mapping P2:BruceWillis ⊑ P1:Action. Similarly, P1 and another peer P3 can have established the mapping P1:Action ⊓ P1:Suspense ⊑ P3:Thriller between their two classifications, in order to state that the category named Thriller by P3 is more general than what P1 classifies as both Action and Suspense. As a result, the movies that are classified by P1 as Suspense and by P2 as BruceWillis are returned as answers to the query Thriller asked by the user at the peer P3. We assume that each resource r returned as an answer to some query is associated with a label L(r) = {Ci1, . . . , CiL} corresponding to its logical justification. L(r) is a set of classes of the vocabularies of (possibly different) peers known to annotate the resource r and supposed to characterize a sufficient condition for r to be an answer. Any other resource annotated in the same way is thus equally supposed to be an alternative answer to the query. We also assume that the classes used in labels are independent, in the sense that for any two classes of a justification, neither is a subclass of the other. This important assumption means that for a returned answer, the only classes that appear in its justifications are those corresponding to the most specific classes of the network. Finally we assume that the user, when querying a peer Pi, is randomly asked to evaluate some of the returned answers as satisfying
1 University of Grenoble, LIG, France, email: gia-hien.nguyen@imag.fr, marie-christine.rousset@imag.fr
2 Univ. Paris-Sud, LRI, France, email: philippe.chatalic@lri.fr
or not satisfying, and to store the result of this evaluation in a local observation database Oi. Each evaluation is recorded into Oi as a pair S.L or S̄.L, where S (resp. S̄) denotes user satisfaction (resp. unsatisfaction) and L is the label of the evaluated resource.
Definition 1 (Observation relevant to a label L) Let Oi be the set of observations of a peer Pi and L be a label. An observation of Oi is said to be relevant to L if and only if its label contains all classes of L. The numbers of satisfying and unsatisfying observations of Pi that are relevant to L are respectively denoted by:

Oi+(L) = |{S.L′ ∈ Oi | L ⊆ L′}|
Oi−(L) = |{S̄.L′ ∈ Oi | L ⊆ L′}|

These two numbers summarize the past experience of the peer Pi relevant to the label L, i.e. of the evaluated resources justified by at least the classes of L. For instance, suppose that Peter is the user querying the peer P1. After a number of answers have been evaluated, Peter's past experience may be summarized as in Table 1.

Table 1. Summary of Peter's observations at P1

Label (L)                  O1+(L)   O1−(L)
P2:MyActionFilms             30        6
P2:MyCartoons                 3       15
P4:ScienceFiction            14       14
P5:Italian, P5:Western        0        6
P6:AnimalsDocum               8        2
P7:JeanRenoir                22       11
P8:Bollywood                  6       35
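Read as code, Definition 1 is a pair of subset-filtered counts. The sketch below assumes observations stored as (satisfied, label) pairs, which is an illustrative encoding rather than the paper's data structure.

def counts(observations, L):
    # Return (Oi+(L), Oi-(L)): observations whose label contains all classes of L.
    L = frozenset(L)
    plus = sum(1 for sat, lab in observations if sat and L <= lab)
    minus = sum(1 for sat, lab in observations if not sat and L <= lab)
    return plus, minus

O1 = [(True, frozenset({"P2:MyActionFilms"}))] * 30 \
   + [(False, frozenset({"P2:MyActionFilms"}))] * 6
print(counts(O1, {"P2:MyActionFilms"}))  # (30, 6), as in the first row of Table 1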
Among all the resources evaluated by Peter and annotated with the class M yActionF ilms of the peer P2 , 30 have been considered as satisfactory and 6 as not satisfactory. For the same peer P2 , only 3 out of 18 evaluated resources tagged by M yCartoons were positive. Similarly all evaluated resources annotated with both Italian and W estern by P5 , obtained negative feedbacks. Each peer Pi can progressively update its observation database Oi , as new answers are evaluated, and refine the trust it has towards answers justified by the different observed labels. The level of trust can vary according to the justification.
2 Bayesian model and estimation of trust
Given a label L, let XiL be the binary random variable defined on the set of resources annotated by L as follows:

XiL(r) = 1 if the resource r is satisfying for Pi, and 0 otherwise.
We define the trust of a peer Pi towards a label L as the probability that the random variable XiL is equal to 1, given the observations resulting from the past experiences of Pi.
Definition 2 (Trust of a peer towards a label L) Let Oi be the set of observations of a peer Pi and L be a label; the trust T(Pi, L) of Pi towards L is defined as follows: T(Pi, L) = Pr(XiL = 1 | Oi).
The following theorem provides a way to estimate the trust T(Pi, L) of a peer Pi towards a label L, and the associated error of estimation.
Theorem 1 Let Oi be the set of observations of a peer Pi and L be a label. After Oi+(L) satisfying and Oi−(L) unsatisfying observations relevant to L have been performed, T(Pi, L) can be estimated as

(1 + Oi+(L)) / (2 + Oi+(L) + Oi−(L))

with a standard deviation of

√( (1 + Oi+(L)) × (1 + Oi−(L)) / ( (2 + Oi+(L) + Oi−(L))² × (3 + Oi+(L) + Oi−(L)) ) )

This follows from a well-known result in probability theory (e.g., [3], page 336) on the application of Bayes' rule to random variables following a Bernoulli distribution whose parameter is unknown. Table 2 summarizes the estimations, with their associated standard deviations, obtained by applying Theorem 1 to Peter's observations summarized in Table 1.

Table 2. Estimated trust of P1 towards the labels of Table 1

Label (L)                Estimated trust   Standard deviation
P2:MyActionFilms              0.815              0.062
P2:MyCartoons                 0.2                0.087
P4:ScienceFiction             0.5                0.089
P5:Italian, P5:Western        0.125              0.11
P6:AnimalsDocum               0.75               0.12
P7:JeanRenoir                 0.657              0.079
P8:Bollywood                  0.162              0.055
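The estimate of Theorem 1 is the posterior mean of a Beta(1 + Oi+(L), 1 + Oi−(L)) distribution, and the standard deviation is that distribution's; a few lines of Python reproduce Table 2 from the counts of Table 1 (function and variable names are illustrative).

from math import sqrt

def trust(o_plus, o_minus):
    # Posterior mean and standard deviation of a Beta(1+o_plus, 1+o_minus).
    a, b = 1 + o_plus, 1 + o_minus
    mean = a / (a + b)
    std = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, std

print(trust(30, 6))  # ~(0.815, 0.062): P2:MyActionFilms
print(trust(0, 6))   # ~(0.125, 0.110): P5:Italian, P5:Western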
3 Propagation of trust
When the observation database does not contain enough observations relevant to a label for computing trust with a good precision, we have to use some propagation mechanism to compensate for the lack of local relevant observations. Instead of propagating trust between peers, our approach consists in propagating the pairs of numbers used for computing trust. Propagating two numbers instead of one does not represent a significant overhead. Yet, it has the significant advantage of providing a well-founded way to compute a joint trust using the same Bayesian model as the one presented in Section 2. Instead of using an ad-hoc aggregation function for combining local coefficients of trust, the numbers Oi1+(L), . . . , Oil+(L) (respectively Oi1−(L), . . . , Oil−(L)) coming from solicited peers Pi1, . . . , Pil are cumulated to compute the joint trust of the subset Pi1, . . . , Pil towards L, by applying the formula of Theorem 1. Different strategies are possible to gather on the querying peer the relevant information from the solicited peers' observations.
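Concretely, the querying peer sums the propagated count pairs and feeds the totals to the Theorem 1 estimator; the sketch below reuses the trust function from the previous sketch, and the count pairs are invented for illustration.

def joint_trust(count_pairs):
    # count_pairs: iterable of (O+, O-) pairs gathered from the solicited peers.
    o_plus = sum(p for p, _ in count_pairs)
    o_minus = sum(m for _, m in count_pairs)
    return trust(o_plus, o_minus)   # same Bayesian estimator as for local trust

print(joint_trust([(2, 1), (5, 0), (1, 1)]))  # pooled estimate over three peers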
• The lazy strategy consists in waiting until some answer justified by a label L has been obtained, and then asking one or several trusted neighbors for their direct feedbacks about the label L. Since it applies after answers have been obtained, such a strategy can be used as a post-processing step and does not require changing the query evaluation mechanism itself. As a consequence it can be applied to different kinds of semantic P2P systems, provided they are able to justify answers by means of such labels (e.g. sets of independent semantic annotations).
• The greedy strategy consists in collecting the direct feedbacks likely to be relevant (i.e., concerning the classes in the annotation being built) during the query processing. It thus requires some adaptation of the query answering algorithm. In a system like SomeWhere [1], the DeCA algorithm [2] is first used to infer, from the ontologies and mappings, all the possible reformulations (i.e. rewritings) of the initial query into conjunctions of extensional classes (i.e. containers of instances) C1 ⊓ . . . ⊓ Cn. Each instance in C1 ⊓ . . . ⊓ Cn is then produced as an answer, C1 ⊓ . . . ⊓ Cn being the semantic annotation justifying it. The DeCA algorithm can be slightly modified in order to convey, when transmitting back rewritings from a queried peer P to the querying peer P′, those feedbacks likely to be relevant. When a rewriting Cj ⊓ . . . ⊓ Cm is transmitted from P to P′ within a message, P uses that message to convey its direct observations (O+(L), O−(L)) for all labels L containing the classes of the rewriting. By construction, those classes will be part of the annotation of an answer. Therefore, observations relevant to these classes may be relevant for computing (if needed) the joint trust towards the labels annotating answers returned to the peer the initial query was issued from. Note that this strategy leads to combining feedbacks from the very peers that have contributed to obtaining an answer. Those peers may thus be considered as naturally relevant for obtaining appropriate feedbacks. However, such sets of peers are determined at query time and may vary according to the query and the returned answer.
4 Perspectives
One of the objectives of reputation systems is the detection and handling of malicious agents in an electronic environment. In a P2P system, a peer can be malicious by providing other peers with virus-affected resources, or by simply lying when reporting its feedbacks about others. In our model, when a peer has enough direct experiences, it does not have to rely on other peers and thus avoids malicious peers. When it has to rely on observations of other peers for estimating its trust towards a label, it is reasonable to assume that the number of malicious peers is small. Therefore, it is possible either to increase the number of peers solicited for observations (in order to decrease the impact of wrong observations coming from a few peers) or to discard the peers whose observations change the joint trust a lot (they are likely to be malicious).
REFERENCES
[1] P. Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset, and Laurent Simon, 'Somewhere in the semantic web', in PPSWR, pp. 1–16, (2005).
[2] Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset, and Laurent Simon, 'Distributed reasoning in a peer-to-peer setting: Application to the semantic web', Journal of Artificial Intelligence Research, 25, 269–314, (2006).
[3] Morris H. DeGroot and Mark J. Schervish, Probability and Statistics, Addison Wesley, 2002.
Conditional Norms and Dyadic Obligations in Time Jan Broersen1 and Leendert van der Torre2
1 Introduction
Reasoning about norm violation and time is of central concern to the regulation of multi-agent system behavior. Here we continue work in [2] on an approach to reasoning about norms, obligations, time and agents, involving three main ingredients. First, we assume a branching temporal structure representing the change of propositions over time. Second, we use an algorithm that, given the input of the branching temporal structure and a set of norms, produces an 'obligation labeling' of the temporal structure. Finally, we reason about the norms represented by these deontically labeled temporal structures to determine norm redundancy and equivalence of normative systems. We distinguish between conditional norms and conditional obligations. General directives like "if an agent receives a request, it has to accept or reject within five seconds" are conditional norms. We interpret norms by defining which conditional and/or temporal obligations they give rise to. For example, if at any moment in time for which the norm is in force the agent receives a request, then for the following five seconds, if it has not accepted or rejected the request yet, it has the obligation to do so. So, norms 'detach' obligations. The deontic logic literature distinguishes between so-called factual and deontic detachment [5]. The former is based on a match between the condition of the norm and the facts, and the latter is based on a match between the condition and the obligations.
2 Norms and Obligations
For the temporal structures we use trees, i.e., possibly infinite, branching-time temporal structures.
Definition 1 (Temporal structure) Let L be a propositional language built on a set of atoms P. A temporal structure is a tuple T = ⟨N, E, |=⟩ where N is a set of nodes, E ⊆ N × N is a set of edges obeying the tree properties, and |= ⊆ N × L is any minimal satisfaction relation for nodes and propositional formulas of L closed under propositional logic.
We consider only regulative norms like obligations and prohibitions, since they are the most basic and most often used kind of norms. Following input/output logic [8, 9], we write a conditional norm "if i, then o is obligatory" as a pair of propositional formulas (i, o).
Definition 2 (Normative system) A norm "if i, then obligatory o" is represented by a pair of formulas of L, and written as (i, o). It is also read as the norm "if i, then forbidden ¬o." A normative system S is a set of norms {(i1, o1), . . . , (in, on)}.
1 University of Utrecht, The Netherlands, email: broersen@cs.uu.nl
2 Computer Science and Communication, University of Luxembourg, Luxembourg, email: leon.vandertorre@uni.lu
Example 1 (Manuscript) The norms are "if owexy, then obligatory payxy" (owexy, payxy), and "if payxy, then obligatory receiptyx" (payxy, receiptyx). Here x and y are variables ranging over the set of agents, in the sense that each norm is treated as a set of proposition-based norms, one for each instance of the agent variables.
Norms are used to detach obligations. The detached obligations are a labeling of the temporal structure.
Definition 3 (Obligation labeling) An obligation labeling is a function O : N → 2^L.
The way we label the temporal structure determines the meaning of the norms. For the 'persistent norm semantics' we assume persistence and deductive closure of obligatory formulas.
Definition 4 (Persistent norm semantics) The persistent norm semantics of a normative system S is the unique obligation labeling O : N → 2^L such that for each node n, O(n) is the minimal set such that:
1. for all norms (i, o), all nodes n1 and all paths (n1, n2, . . . , nm) with m ≥ 1, if n1 |= i and nk ⊭ o for all 1 ≤ k ≤ m − 1, then o ∈ O(nm)
2. if O(n) |= ϕ then ϕ ∈ O(n)
We now define how to reason about norms, obligations and time. A norm is redundant if it does not affect the obligation labeling of a temporal structure.
Definition 5 (Norm redundancy) In a normative system S, a norm (i, o) ∈ S is redundant if and only if for all temporal structures, the obligation labeling of S is the same as the obligation labeling of S \ {(i, o)}. Two normative systems S1 and S2 are equivalent if and only if each norm of S1 would be redundant when added to S2, and vice versa.
We have the following result for the semantics of Definition 4.
Theorem 1 (Completeness of norm persistent reasoning) In a normative system S, a norm (i, o) ∈ S is redundant under the persistence semantics if and only if we can derive it from S \ {(i, o)} using replacement of logical equivalents in input and output, together with the following rules:
SI: from (i1, o), derive (i1 ∧ i2, o)
WO: from (i, o1 ∧ o2), derive (i, o1)
OR: from (i1, o) and (i2, o), derive (i1 ∨ i2, o)
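The persistent labeling of Definition 4 can be computed by a single walk over a finite tree; the sketch below omits the deductive-closure clause 2 and assumes a simple fact-based satisfaction relation, so the tree encoding and the owe/pay example are illustrative.

def label(tree, root, holds, norms):
    # tree: dict node -> list of children; holds(node, formula) -> bool;
    # norms: list of (i, o) pairs. Returns dict node -> set of obligations.
    O = {n: set() for n in tree}
    def walk(n, pending):
        # detach o wherever the condition i holds, and keep pending obligations
        active = pending | {o for i, o in norms if holds(n, i)}
        O[n] |= active
        # an obligation persists to the children only while o has not yet held
        carry = {o for o in active if not holds(n, o)}
        for child in tree[n]:
            walk(child, carry)
    walk(root, set())
    return O

tree = {"n0": ["n1"], "n1": ["n2"], "n2": []}
facts = {"n0": {"owe"}, "n1": set(), "n2": {"pay"}}
holds = lambda n, f: f in facts[n]
print(label(tree, "n0", holds, [("owe", "pay")]))
# {'n0': {'pay'}, 'n1': {'pay'}, 'n2': {'pay'}}: obligatory until fulfilled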
3 Fulfilling obligations before they are detached
The persistent norm semantics is not always appropriate. If Peter gives the receipt to John before John has given him the money, maybe
because they are in a long-standing relationship and Peter trusts John, or maybe because Peter wrongly believed that John had already transferred the money, then after John gives him the money, the obligation to write a receipt is still detached, and persists indefinitely. In this section we define a semantics avoiding this property, using a labeling with dyadic obligations O(o|c), read as "o is obligatory in context c."
Definition 6 (Dyadic obligation labeling) A dyadic obligation labeling is a function Od : N → 2^(L×L).

[Figure 1. Labeling of the temporal structure using the receipt normative system S = {(owejp, payjp), (payjp, receiptpj)} with persistent dyadic obligations. The obligations persist in time until they are fulfilled. The obligation O(receiptpj | payjp) does not persist after receiptpj holds.]
Example 2 (Receipt, continued) See the temporal structure in Figure 1. The desired labeling with dyadic obligations is as follows. From the root node, we detach dyadic obligations O(o|i) for all the norms (i, o). Then, a monadic obligation is detached from the dyadic obligation when the context holds in a node, and the obligation persists until its consequent holds in a node. In particular, the obligation for receiptpj in the context of payjp does not persist after receiptpj holds. The look-ahead norm semantics adds to the persistence semantics that obligations are possibly not generated because, before the moment where their condition becomes true, the obligation has already been satisfied. We achieve this by keeping track of all dyadic obligations generated by the norms at any temporal initial state.
Definition 7 (Look-ahead norm semantics) The look-ahead norm semantics of a normative system S is the obligation labeling O : N → 2^L together with the dyadic obligation labeling Od : N → 2^(L×L) such that for each n, O(n) and Od(n) are the minimal sets such that:
1. for all norms (i, o) and the root node n0, (i, o) ∈ Od(n0)
2. for all paths (n1, n2, . . . , nm), if (i, o) ∈ Od(n1) and nk ⊭ o for all 1 ≤ k ≤ m − 1, then (i, o) ∈ Od(nm)
3. for all paths (n1, n2, . . . , nl, m), if (i, o) ∈ Od(n1) and n1 |= i and nk ⊭ o for all 1 ≤ k ≤ l, then o ∈ O(m)
4. if O(n) |= ϕ then ϕ ∈ O(n)
Notably, reasoning is not different in this semantics.
Theorem 2 (Completeness of look-ahead reasoning) In a normative system S, a norm (i, o) ∈ S is redundant under the look-ahead semantics if and only if we can derive it from S \ {(i, o)} using replacement of logical equivalents in input and output, together with the following rules:
SI: from (i1, o), derive (i1 ∧ i2, o)
WO: from (i, o1 ∧ o2), derive (i, o1)
OR: from (i1, o) and (i2, o), derive (i1 ∨ i2, o)
This semantics also has a drawback. Unlike for the persistence labeling, it is not the case that an obligation is detached every time the condition of the norm is true. For example, if at some future moment in time we again have that owejp, then we no longer detach the obligations for payjp and receiptpj. This is left for further research.
4
Concluding remarks
The distinction between norms and obligations goes back to the philosophical problem known as Jørgensen's dilemma [7], which roughly says that a proper logic of norms is impossible because norms do not have truth values. Systems without explicit norms are difficult to use in multi-agent systems. However, most formal systems in the deontic literature [1, 11, 5, 10, 6, 7, 4] are restricted to obligations, prohibitions and permissions, and do not consider the originating norms explicitly. Furthermore, systems that do explicitly represent the norms of the system usually do not provide a way to reason about them. Finally, systems for reasoning about norms [7, 8, 3] do not consider the intricacies of time. Our approach aims at filling this gap. Our approach also gives temporal interpretations to well-known issues discussed in the deontic logic literature, such as the distinction between 'conditions' and 'contexts', and the distinction between creating a new obligation and detaching an obligation.
REFERENCES
[1] C.E. Alchourrón and E. Bulygin, Normative Systems, Springer, Wien, 1971.
[2] J.M. Broersen and L. van der Torre, 'Reasoning about norms, obligations, time and agents', in Proceedings PRIMA '07, eds., A. Ghose and G. Governatori, Lecture Notes in Computer Science, Springer, (2008).
[3] J. Hansen, 'Sets, sentences, and some logics about imperatives', Fundamenta Informaticae, 48, 205–226, (2001).
[4] J. Horty, Agency and Deontic Logic, Oxford University Press, 2001.
[5] B. Loewer and M. Belzer, 'Dyadic deontic detachment', Synthese, 54, 295–318, (1983).
[6] D. Makinson, 'Five faces of minimality', Studia Logica, 52, 339–379, (1993).
[7] D. Makinson, 'On a fundamental problem of deontic logic', in Norms, Logics and Information Systems. New Studies on Deontic Logic and Computer Science, eds., P. McNamara and H. Prakken, pp. 29–54. IOS, (1999).
[8] D. Makinson and L. van der Torre, 'Input-output logics', Journal of Philosophical Logic, 29(4), 383–408, (2000).
[9] D. Makinson and L. van der Torre, 'Constraints for input-output logics', Journal of Philosophical Logic, 30(2), 155–185, (2001).
[10] J. J. Ch. Meyer, 'A different approach to deontic logic: Deontic logic viewed as a variant of dynamic logic', Notre Dame Journal of Formal Logic, 29(1), 109–136, (1988).
[11] J. van Eck, 'A system of temporally relative modal and deontic predicate logic and its philosophical applications', Logique et Analyse, 25, 339–381, (1982).
Trust Aware Negotiation Dissolution Nicolás Hormazábal, Josep Lluis de la Rosa i Esteva and Silvana Aciar1 Abstract. In this paper we propose a recommender system that suggests the best moment to end a negotiation. The recommendation is made from a trust evaluation of every agent in the negotiation, based on their past negotiation experiences. For this, we introduce the Trust Aware Negotiation Dissolution algorithm.
1 INTRODUCTION
Negotiation and cooperation are critical issues in multi-agent environments [3], such as in Multi-Agent Systems and research on Distributed Artificial Intelligence. In distributed systems, high costs and time delays are associated with operators that make high demands on the communication bandwidth [1]. Considering that agents are aware of their own preferences, which guide their decision making during the negotiation process, the negotiation can go through several steps depending on these values, as each agent does not know the others' preferences. This can lead to an increase of communication bandwidth costs affecting the general performance, and might put agents in undesirable negotiation situations (such as a negotiation that probably will not end with an acceptable agreement). Termination of the negotiation process, or a negotiation dissolution action, should be considered when the negotiation is in a situation where the expected result of the following steps cannot be better than the current result. This will not only help to determine when to end a negotiation process, but also help to decide whether to end it with or without an agreement.
2 TRUST AWARE DISSOLUTION
The Trust Aware Negotiation Dissolution algorithm (TAND from now on) takes into account direct interactions from similar situations in the past (Situational Trust [4]). The basic formula used to calculate this type of trust is:

Ta(y, α) = Ua(α) × T̂a(y, Pα)    (1)

Where:
• a is the evaluator agent.
• y is the target agent.
• α is the situation.
• Ua(α) represents the utility that a gains from a situation α, calculated by its utility function.
• Pα is a set of past situations similar to α.
• T̂a(y, Pα) is an estimated general trust for the current situation. We will calculate this value considering two possible results for each situation in the set of past interactions Pα that are similar to α: a successful result or an unsuccessful one (whether or not an agreement was reached). This leads to the calculation of the probability that the current situation could end in an agreement, based on past interactions (following the work in [6]). It is calculated by:

T̂a(y, Pα) = e / n    (2)

Where e is the number of times that an agreement has been made with the agent y in each situation from Pα, and n is the total number of observed cases in Pα with the agent y: n = |Pα|. A function g based on agent a's decision process returns the set S of j possible negotiation situations (the offers the agent is willing to make) σ, based on the current situation α the agent is in:

g : α → S    (3)
S = {σ1, σ2, ..., σj}    (4)

From the possible situations, we obtain the best expected situational trust Ea(y, S), which is the trust for the best expected case from among the possible situations in which the agents can find themselves in the future, given the current negotiation:

Ea(y, S) = max_{σ ∈ S} Ta(y, σ)    (5)

We know the trust in the current situation Ta(y, α). We also have the best expected situational trust Ea(y, S). With these two values, we can calculate a rate that will help the agent decide whether or not it should continue the negotiation. The situational trust at the present time, divided by the potential situational trust, gives us the Dissolution Rate R, which in conjunction with a minimum acceptable trust value M will help to decide whether or not to dissolve the negotiation process.

R = Ta(y, α) / Ea(y, S)    (6)

The dissolution decision depends on the value of R:

R ≥ 1 ⇒ Dissolve
(R < 1) ∧ (Ea(y, S) < M) ⇒ Dissolve
(R < 1) ∧ (Ea(y, S) ≥ M) ⇒ Continue Negotiating    (7)

In other words, if, based on future steps, the expected situation does not have a better trust value than the current one, the best thing to do is to end the negotiation now. Otherwise, it is better to continue negotiating.
1 Agents Research Lab, University of Girona, Catalonia, Spain, email: (nicolash, peplluis, saciar)@eia.udg.edu
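Read together, equations (1)-(7) amount to a small decision procedure; the following sketch is an illustration under assumed inputs, with all function names and numbers invented for the example.

def situational_trust(utility, agreements, total):
    # Ta(y, alpha) = Ua(alpha) * e/n over the past similar situations (eqs. 1-2).
    return utility * (agreements / total if total else 0.0)

def decide(T_current, expected_trusts, M):
    # expected_trusts: Ta(y, sigma) for each possible next situation (eqs. 3-5);
    # M is the minimum acceptable trust value.
    E = max(expected_trusts)                  # best expected situational trust
    R = T_current / E if E else float("inf")  # dissolution rate (eq. 6)
    return "dissolve" if R >= 1 or E < M else "continue"   # eq. 7

T_now = situational_trust(utility=0.6, agreements=3, total=10)  # 0.18
print(decide(T_now, expected_trusts=[0.2, 0.45, 0.3], M=0.25))  # continue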
3 EXPERIMENT AND RESULTS
For testing the TAND algorithm, we implemented a negotiation environment where two agents negotiate to reach an agreement from a limited number of options; agents consecutively offer their next best option at each step until the offer is no better than the received one. The scenario consists of different agents that each represent a person who wants to go to a movie with a partner, so they negotiate between them, from different available movie genres, to choose which movie to go to together. The environment was developed in RePast.2 In order to avoid affecting the system performance, agents will only save the trust of a limited number of potential partners in their knowledge base; that is, they will maintain a limited contact list instead of recording the experience of every partner they have negotiated with. There will be a fixed number of available movie genres (for example, drama, comedy, horror, etc.) during the whole simulation. Each agent will have a randomly generated personal preference value (from a uniform distribution) for each genre between 0 and 1, where 0 is a genre it does not like at all, and 1 is its preferred movie genre. One of these genres, randomly chosen for each agent, will have a preference value of 1, so each agent will always have a favorite genre. We assume that there is always a movie in the theaters available for each genre. Each movie genre will be used to identify the situation α the negotiation is in, for the calculation of the trust from equation 1. The result of the utility function Ua(α) will be the preference for each movie genre randomly assigned to each agent. Partners involved in the negotiation will be randomly chosen. An agent can participate in only one negotiation at a time. The experiment will run through three different cases, each one with 100 agents and 10 different movie genres:
• Case 1: Contact list of the 20 most trusted partners.
• Case 2: Unlimited contact list size.
• Case 3: No TAND, simple negotiation.
Every experiment will run through 2000 steps. At each step, 1/4 of the total population (25 agents for the cases described above) will invite another partner to a negotiation for a movie. For evaluating the performance, we will use three values:
• Average steps used for all agreements made: AS (lower is better).
• Average preference (the final preference value during the agreement for each successful negotiation): AP (higher is better).
• Average distance from the perfect pitch: AD (lower is better).
We define the perfect pitch P as the highest value, over every possible agreement d, of the product of the preferences of each agent a in the set A of participating agents (the result of the utility function Ua(α) for each movie genre):

P = max_{d∈D} ∏_{a∈A} fa    (8)

The distance from the perfect pitch is the difference between the negotiation preference K and the perfect pitch P:

AD = P − K    (9)

After 20 experiments for each case, we averaged the results obtained at each experiment, as seen in Table 1.
2 http://repast.sourceforge.net
Table 1. Average Final Steps

             Case 1   Case 2   Case 3
AS  Avg      5,2894   4,6683   5,6993
    Std Dev  0,0283   0,0249   0,0282
AP  Avg      0,8001   0,8168   0,7892
    Std Dev  0,0073   0,0064   0,0080
AD  Avg      0,1370   0,1125   0,1548
    Std Dev  0,0034   0,0030   0,0048
The results improve in cases 1 and 2, in terms of the average steps AS needed for closing a negotiation with an agreement, compared to case 3, where TAND is not used. Moreover, the average preference AP has a higher value, and the distance from the perfect pitch AD is reduced by more than 35% from case 3 to case 2. The contact list size is a critical issue: as we can see from comparing the results of cases 1 and 2, the improvement is higher when there are no limits on the contact list's size.
4 CONCLUSIONS AND FUTURE WORK
We have presented TAND and its preliminary results, where we can see that it improves the negotiation process in terms of agents' preferences and the number of steps needed to achieve an agreement. Taking into account possible agent performance issues, a limited contact list should be considered, but its size limitation could negatively affect the TAND results, as we can see in Table 1, so work on finding the optimal contact list size should be done. So far the contact-list filling criteria are simple: agents with higher trust replace the agents with lower values when the contact list is full. Improved results are expected using other criteria for dealing with the contact list, for example using different levels of priorities, or a relation with the partner selection criteria (in the experiments the selection is made randomly). TAND has been tested on a simple bilateral negotiation process, but it can also be used on other types of temporary coalitions, such as dynamic electronic institutions [5], for supporting their dissolution phase. Future work will focus on this, expanding its scope to generic types of coalitions. In addition, work on implementing other ways to calculate trust should be done, as well as other methods to manage the dissolution (such as Case-Based Reasoning [2]), in order to compare results. The topic of dissolution of coalitions is not a new one, but it is not a topic that has been studied in depth [2], so this research topic provides a wide open field that needs to be explored.
REFERENCES
[1] H. Bui, D. Kieronska, and S. Venkatesh, 'Learning other agents' preferences in multiagent negotiation', Proceedings of the National Conference on Artificial Intelligence (AAAI-96), 114–119, (1996).
[2] N. Hormazabal and J. Ll. de la Rosa, 'Dissolution of dynamic electronic institutions, a first approach: Relevant factors and causes', in 2008 Second IEEE International Conference on Digital Ecosystems and Technologies, (February 2008).
[3] S. Kraus, 'Negotiation and cooperation in multi-agent environments', Artificial Intelligence, 79–97, (1997).
[4] S. Marsh, Formalising Trust as a Computational Concept, Ph.D. dissertation, Department of Mathematics and Computer Science, University of Stirling, 1994.
[5] E. Muntaner-Perich and J. Ll. de la Rosa, 'Dynamic electronic institutions: from agent coalitions to agent institutions', Workshop on Radical Agent Concepts (WRAC'05), Springer LNCS (Volume 3825), (2006).
[6] M. Schillo, P. Funk, and M. Rovatsos, 'Using trust for detecting deceitful agents in artificial societies', Applied Artificial Intelligence (Special Issue on Trust, Deception and Fraud in Agent Societies), 14, 825–848, (2000).
On the Role of Structured Information Exchange in Supervised Learning Ricardo M. Araujo and Luis C. Lamb, Institute of Informatics - UFRGS, Brazil
1 Introduction
When considering Multi-Agent Systems (MAS) composed of learning agents, an important aspect is how agents interact with each other, i.e. their social structure [8, 6]. Several studies are concerned with understanding the role of structures in multi-agent learning (MAL). Research on this topic is typically directed at static structures that constrain interaction between agents [3, 5]. However, in several cases it may be preferable to have a self-organized social structure, by letting agents decide the connections they will make [1]. We propose a multi-agent framework for modeling agents that exchange information through a self-organized network. This framework is composed of communication stages that separate the dynamics of and on the network [7], i.e. how the underlying network is constructed and how it is used. In several ways, our model resembles social network models and the propagation of memes [2] between individuals. We thus call the framework Memetic Networks. We apply the framework to build a distributed learning algorithm that learns concepts by exchanging information about hypotheses constructed by individual agents. We show that this algorithm is able to learn reasonably complex concepts in a real-world scenario. We use this algorithm to explore questions such as (i) is it advantageous to be connected to many sources of information? and (ii) is it beneficial to have access only to good sources of information?
2 On Memetic Networks
We define a Memetic Network as a set of agents that may exchange information through a network following rules that guide how information flows and is used. Our scenario can be formalized as follows. Let A be an ordered set of agents a1 , a2 , ..., aN and E an unordered set of pairs of distinct agents in A. The ordered pair (A, E) is thus a graph where vertices represent agents and edges represent the possibility of two agents to interact. Three rules define how the graph is wired and how information is processed. We describe conceptually each rule below and shall present specific implementations for each one in the next section. Connection rule. It specifies how individuals will connect to and disconnect from each other. This rule guides the construction of the network structure. For example, an instance of such rule could be “a connection between node n1 and n2 exists if and only if n1 is better evaluated than n2 ” or “connect randomly to a certain number of individuals”. The connection rule is executed at every step of the algorithm, thus the network is dynamic. Aggregation rule. Given a connection, this rule specifies how information is to flow through it. It guides how the solution contained in each node is to be modified as a function of the connected nodes. For example, if every node contains a single bit, this rule could be “adopt the bit that is present in the majority of connected nodes”. It defines the dynamics on the network.
Appropriation rule. After information has been aggregated, this rule specifies any local changes to the information contained in a node (e.g. the application of a hill-climbing search).
3 Concept Learning in Memetic Networks
Using the framework defined above, we can define a Memetic Network Algorithm that is able to inductively learn concepts by searching the hypothesis space. The search is guided by information exchange between agents, and new hypotheses are formed by aggregating multiple existing hypotheses. We are interested in learning binary concepts from a set of (possibly noisy) examples. In order to define each rule and thus instantiate a Memetic Network, we must choose a representation for the hypotheses and an evaluation criterion. We use propositional rules in Disjunctive Normal Form (DNF) to represent our search space, using the binary codification proposed in [4]. An agent ai thus contains a binary string hi which represents a hypothesis. Hypotheses are evaluated by the number of examples correctly classified. In what follows, we propose and discuss an implementation of each of the proposed general Memetic Network rules.
Connection rule: a directed edge (ai, aj) from agent ai to agent aj exists if and only if eval(aj) > eval(ai), where eval(ak) is the evaluation of the hypothesis of agent ak.
Aggregation rule: the bit in position j of hypothesis hk is set to the bit value in position j that occurs most often among all agents that ak connects to. Ties are broken by coin toss.
Appropriation rule: each bit of the aggregated hypothesis is flipped with a (small) probability pn and the whole hypothesis is evaluated; if this new hypothesis is better than the previous one, it becomes the hypothesis for the agent; otherwise, the previous hypothesis is kept unchanged.
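The three rules translate almost literally into code. The sketch below is a toy instantiation: eval_h is a stand-in evaluation function (the paper uses the number of training examples correctly classified), and the bit strings are much shorter than the 1020-bit hypotheses used in the experiments.

import random

def aggregate(h, neighbours):
    # Majority bit per position among the connected hypotheses; ties by coin toss.
    out = []
    for j in range(len(h)):
        ones = sum(g[j] for g in neighbours)
        if 2 * ones == len(neighbours):
            out.append(random.randint(0, 1))
        else:
            out.append(1 if 2 * ones > len(neighbours) else 0)
    return out

def appropriate(h, eval_h, pn=0.001):
    # Flip each bit with probability pn; keep the mutant only if it is better.
    mutant = [b ^ (random.random() < pn) for b in h]
    return mutant if eval_h(mutant) > eval_h(h) else h

def step(hyps, eval_h):
    new = []
    for h in hyps:
        better = [g for g in hyps if eval_h(g) > eval_h(h)]   # connection rule
        h2 = aggregate(h, better) if better else h            # aggregation rule
        new.append(appropriate(h2, eval_h))                   # appropriation rule
    return new

target = [1, 0, 1, 1, 0, 1, 0, 0]
eval_h = lambda h: sum(a == b for a, b in zip(h, target))     # toy evaluation
hyps = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
for _ in range(30):
    hyps = step(hyps, eval_h)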
4 Experiments
We have experimented with the algorithm described in the previous section on the Breast Cancer Recurrence dataset [9]. This dataset is composed of 286 examples with 9 features (each varying between 2 and 13 distinct possible values, all discrete) and 2 classes; 201 examples are positive and 85 are negative. We randomly partition the whole set of examples into a training set (200 examples) and a validation set (86 examples). Each agent was randomly initialized with a hypothesis and evaluated using the training set by counting how many examples it correctly classified (batch mode). We let the algorithm run for 1000 rounds and repeat the whole process 20 times. We set pn = 0.001, N = 100 and limited the rules to a maximum of 20 disjuncts (hypotheses are 1020 bits in length). Figure 1(a) shows convergence results over the validation set. Network Diversity. In order to better understand whether there are benefits to being connected to many agents, we modify the connec-
tion rule so as to limit the number of connections. In what follows, indegree(ak) and outdegree(ak) measure respectively the number of incoming and outgoing connections of agent ak.
Connection Rule (2nd version): a directed edge (ai, aj) from agent ai to agent aj exists if and only if eval(aj) > eval(ai) and indegree(aj) < α and outdegree(ai) < β.
Figures 1(b) and 1(c) show how performance changes with changes in α and β. We can observe a logarithmic increase in the accuracy of the best solution when we increase α. Thus, allowing agents to source information to many other agents is beneficial in our model. By setting a high α value we are allowing good information to reach as many agents as possible; thus one could argue that the above results were expected. However, the excessive spread of good information could cause early stagnation, as most agents would converge to the same solution. If this were the case, an intermediate value of α would be the best setting, but this is not what happens. Variations in β are responsible for smaller variations in performance when compared to changes in α. However, unlike for α, there is an optimum intermediate value for β: setting this parameter to lower or higher values than the optimum causes the algorithm's performance to worsen. When connecting to very few agents, not enough information is being recombined and agents are effectively cloning better hypotheses (thus exploiting good solutions but not exploring the search space efficiently). When connecting to too many agents, the high diversity of solutions seems to be detrimental to the aggregation rule's ability to perform recombinations that are useful. Agent Diversity. We assumed in our learning algorithm that it is beneficial to be connected only to agents that are better evaluated. To test this assumption, we modified the algorithm so as to let agents connect to other agents that have lower evaluations with probability pd. This increases the diversity of solutions that can be used to compose new hypotheses. Figure 1(d) shows the results of varying pd from 0.0 to 1.0 in 0.1 increments. We observe that the algorithm's performance quickly drops as we increase the probability of bad solutions taking part in the recombination process. Allowing agents to have access to worse evaluated hypotheses is detrimental to the algorithm's performance.
Figure 1. Experiment results: (a) convergence results for the validation set; (b) accuracy after 100 rounds for varying α; (c) accuracy after 100 rounds for varying β; (d) accuracy after 100 rounds for varying pd.
Acknowledgments. Work partly supported by CNPq - Brazil.
REFERENCES
[1] R. Araujo and L. Lamb, 'Memetic Networks: Analyzing the Effects of Network Properties in Multi-Agent Performance', in Proc. of AAAI-08, (2008).
[2] Richard Dawkins, The Selfish Gene, Oxford University Press, 1976.
[3] M. Giacobini, M. Tomassini, and A. Tettamanzi, 'Takeover time curves in random and small-world structured populations', in GECCO '05: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, (2005).
[4] K. De Jong and W. Spears, 'Using genetic algorithms for concept learning', Machine Learning, 13, 161-188, (1993).
[5] E. Lieberman, C. Hauert, and M. A. Nowak, 'Evolutionary dynamics on graphs', Nature, 433(7023), 312-316, (January 2005).
[6] P. Mathieu, J. C. Routier, and Y. Secq, 'Dynamic organization of multi-agent systems', in Proc. of the 1st International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 451-452, New York, NY, USA, (2002). ACM.
[7] M. Newman, A.-L. Barabási, and D. Watts (eds.), The Structure and Dynamics of Networks, Princeton University Press, 2006.
[8] Z.-G. Wang and X.-H. Liang, 'A graph based simulation of reorganization in multi-agent systems', in Proc. of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology, (2006). IEEE Computer Society.
[9] M. Zwitter and M. Soklic. Breast cancer data. Institute of Oncology, University Medical Centre Ljubljana, Yugoslavia, 1988.
Magic Agents: Using Information Relevance to Control Autonomy 1 B. van der Vecht 2,3 and F. Dignum 3 and J-J. Ch. Meyer 3 Abstract. Autonomous agents are believed to have control over their internal state and over their behaviour. For that reason, an agent should control how and by whom it is being influenced. We introduce a reasoning component for BDI-agents that deals with the control over external influences, and we propose heuristics using local knowledge to process incoming stimuli. One of those heuristics is based on information relevance with respect to the agent's current plans and goals. We have developed a way to determine the relevance of information in BDI-agents using magic sets from database research as a basis. The method presented shows a new application of magic sets, applying the theory to agent systems.
1
INTRODUCTION
Agents are believed to be autonomous, meaning that they have control over their internal state and over their behaviour [4]. In a multi-agent environment where coordination of group behaviour is required, agents will influence each other. We argue that an autonomous agent should control how and by whom it is being influenced. The key issue is to find general heuristics to control external influences. In this paper we investigate information relevance as such a heuristic. An agent that can determine the relevance of information with respect to its goals is able to deal dynamically with external input and is less sensitive to information overload [2]. We describe a way to determine information relevance in BDI agents based on magic sets [1], a method developed for efficient deductive database searching. We introduce a new use of magic-sets theory that is beneficial for agent reasoning.
2
CONTROLLING EXTERNAL INFLUENCES
A popular approach to agent reasoning is BDI reasoning. We can view a BDI-agent as a mental state and a reasoning process. The mental state captures the beliefs, goals and plans, and the reasoning process decides upon actions based on the mental state. The mental state is updated via internal actions and via external influences, such as the agent's own observations or messages from other agents. In [5] an extension to the classic reasoning model has been proposed, such that an agent analyzes incoming stimuli based on internal knowledge before it adopts them in its mental state. This requires a separate process next to the goal-directed decision making. The model uses reasoning rules in order to decide on the adoption or rejection of external influences. The reasoning rules contain predicates that consult knowledge from the agent's internal state. Reasoning rules contain a head, a guard and a body. The head indicates the activation event for the rule, the guard contains predicates that should match the agent's belief base, and the body describes the resulting action. An example of a reasoning rule:

observe(X) <- relevant(X) | Adopt(X)

1 The research reported here is part of the Interactive Collaborative Information Systems (ICIS) project, supported by the Dutch Ministry of Economic Affairs, grant no. BSIK03024.
2 TNO Defense, Security and Safety, The Netherlands
3 Department of Information and Computing Sciences, Universiteit Utrecht, The Netherlands, email: bobv, dignum, jj@cs.uu.nl
The rule is activated by an observation event. The rule states that if something is observed and it is relevant, the agent will add the information to its belief base. The agent evaluates the predicates using its local knowledge, and therefore it actively chooses whether to reject or adopt an event. This process is opposed to models that adopt observations or messages directly into the belief base, thereby taking control over the internal state away from the agent.
3
INFORMATION RELEVANCE
An agent is continuously receptive to input from the environment. In our model an agent has the option to adopt or reject incoming stimuli based on local knowledge. It evaluates predicates in reasoning rules to determine whether to adopt or reject influences. An immediate question is: what are valid reasons to adopt or reject influences? Castelfranchi introduced information relevance as a typical heuristic to control external influences [2]. He defines relevance of information with respect to a certain goal. Intuitively, an agent might want to focus on a specific type of information given its goals. A typical reason can be that the agent does not want to be distracted, or that it wants to prevent information overload. Therefore it should be able to determine whether information is relevant for the goal. Relevance of information is a heuristic for an agent to determine how it is influenced. The agent should also control by whom it is being influenced. The reasoning rules can use knowledge about the organization or about the social context, for example by evaluating whether the sender of a message can be trusted. Agents then achieve coordination by allowing influence on the internal state based on social and organizational knowledge. One can think of several other reasons to allow or refuse influence on the internal state; for example, domain knowledge always plays a role. In this paper we focus on the heuristic of information relevance.
4
MAGIC AGENTS
In work on agent autonomy Castelfranchi has described information relevance for agents with respect to goals [2]. According to his definition, information is relevant for a goal if the information is about
the goal, if it is about the content of a goal, or if it is about plan relations of the goal. Castelfranchi did not include any method for how the relevance of information could be determined. We analyze the notion of relevance based on the 3APL model [3]. 3APL provides a reasoning model and a programming language for BDI-agents. The agent's internal state consists of a belief base, a goal base, a set of reasoning rules and a set of capabilities. The deliberation cycle describes the decision-making process, which makes use of the concepts in the internal state. During the deliberation process, queries are asked of the belief base. From the formal semantics of the 3APL model we know that the following types of queries exist in the deliberation process: guard checks, test actions and goal achievement checks. Guard checks are used to activate a reasoning rule, test actions are tests on the belief base as part of a plan, and goal achievement checks are used to check whether current goals have been reached. All information used to solve those queries is relevant for the agent's reasoning process. However, the belief base also contains deduction rules, and it is difficult to tell in advance which facts will be used to solve a query. We could evaluate all possible queries in order to determine whether information is relevant at the moment of perception, but this would be a computationally intensive task; therefore we have developed a method for quick evaluation of information relevance based on magic sets.
4.1
Using magic sets for relevance determination
The magic set method is a bottom-up query evaluation technique developed in deductive database research. A straightforward algorithm for the Magic Set transformation is explained in [1]. Magic sets are used to define the relevant elements in a database for a specific query, in order to speed up the search process significantly. We can use the theory behind magic sets to determine information relevance. Consider a program P containing logical derivation rules. A query is written as q(c, X), where some variables of the query are bound (c) and others are open (X) and need to be derived. The solution to the query is a set of bindings for the variables in X that make the query expression true. The Magic Set method evaluates program P with the information of the bound variables from the query. The program P is rewritten into a new program P', which is equivalent to P with respect to the query, and which uses the bindings in the query to direct the computation. A new predicate is defined based on the query, in which all values of predicates that need to be computed are stored. This new predicate is the magic set. We can use magic sets to determine information relevance in a BDI agent: the agent wants to know the relevance of an observation with respect to the queries it may pose, so we create magic sets for those queries. We then create a predicate relevant(X), which is derived using the magic sets of the queries.
4.2 Example

Consider an agent traveling from A to B. It continuously receives traffic information, which might lead to a reconsideration of the planned route. Intuitively we know that only information concerning the agent's planned route is relevant. We will construct the predicate relevant(traffic_message(X)). The belief base contains the route the agent has planned as planned_route(X, Y) facts. Furthermore, the agent can determine whether a location is on the planned route: the fact on_route(X, Y, Z) implies that Y is on the route from X to Z. The reasoning rules contain the decision to start replanning the route based on traffic information, and otherwise to execute the action Go.

BELIEFBASE:
on_route(X, Y, Z) :- planned_route(X, Y).
on_route(X, Y, Z) :- planned_route(X, W), on_route(W, Y, Z).
traffic(X, Y) :- traffic_message(Z), on_route(X, Z, Y).

REASONING RULES:
<- traffic(X, Y) | Replan
<- NO traffic(X, Y) | Go

We want to determine the relevance of traffic messages. Based on the reasoning rules of the agent, we know that there is one query on the belief base that uses traffic messages in its deduction: traffic(X, Y)?. Therefore we need to create a magic set for the traffic predicate. The free variable Z from traffic_message(Z) is also used in on_route(X, Z, Y). Now we can define the relevance of a traffic message as follows:

relevant(traffic_message(Z)) :-
    magic_traffic(X, Y), on_route(X, Z, Y).

The construction of the magic_traffic(X, Y) predicate is done using the algorithm for the Magic Set transformation [1]. Furthermore, the traffic and on_route predicates are rewritten in the magic set transformation, which ensures quick evaluation using the immediate bindings for X and Y. The set of relevant messages is captured by the relevant predicate: whenever the agent receives a message, it can derive whether it contains relevant information or not. We can use the predicate in reasoning rules to control external influences. For example, the following agent only accepts relevant messages and ignores all others:

traffic_message(Y) <- relevant(traffic_message(Y)) | Accept(Y)
traffic_message(Y) <- TRUE | Ignore(Y)
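For intuition, here is a minimal Python sketch of the same filtering idea (ours, not the authors' 3APL implementation): the "magic set" is precomputed from the bound route facts, and each incoming traffic_message(Z) is tested against it before being adopted.

    def on_route_locations(planned_route):
        # Collect every location bound by the agent's planned route;
        # this plays the role of the magic set in this toy example.
        return set(planned_route)

    class TrafficAgent:
        def __init__(self, planned_route):
            self.magic_set = on_route_locations(planned_route)  # precomputed once
            self.beliefs = set()

        def relevant(self, location):
            return location in self.magic_set

        def observe(self, location):
            # Reasoning rule: adopt the message only if it is relevant.
            if self.relevant(location):
                self.beliefs.add(("traffic", location))  # Accept
            # otherwise Ignore: the belief base is left untouched

    agent = TrafficAgent(planned_route=["A", "C", "D", "B"])
    agent.observe("C")  # adopted: C is on the planned route
    agent.observe("X")  # ignored: X is irrelevant to the current plan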
5
CONCLUSION
We have argued that an autonomous agent should control how and by whom it is being influenced. We have introduced a reasoning component that deals with the control over external influences, and we have described heuristics based on local knowledge with which the agent can decide to adopt or reject incoming stimuli. Information relevance is such a heuristic. Agents that control external influences based on information relevance can improve their performance and are less sensitive to information overload. We have proposed a way to determine the relevance of information in BDI-agents based on magic sets theory. Our approach shows a new use of magic sets within agent systems.
REFERENCES
[1] C. Beeri and R. Ramakrishnan, 'On the power of magic', Journal of Logic Programming, 10, 255-300, (1991).
[2] C. Castelfranchi, 'Guarantees for autonomy in cognitive agent architecture', Intelligent Agents, (890), 56-70, (1995).
[3] M. Dastani, B. van Riemsdijk, F. Dignum, and J-J. Ch. Meyer, 'A programming language for cognitive agents: Goal directed 3APL', in ProMAS'03, volume 3067 of LNAI, pp. 111-130. Springer, (2004).
[4] N. R. Jennings, 'On agent-based software engineering', Artificial Intelligence, 117(2), 277-296, (2000).
[5] B. van der Vecht, F. Dignum, J-J. Ch. Meyer, and M. Neef, 'A dynamic coordination mechanism using adjustable autonomy', in COIN III, volume 4870 of LNCS, pp. 83-96. Springer, (2007).
Infection-Based Norm Emergence in Multi-Agent Complex Networks Norman Salazar and Juan A. Rodriguez-Aguilar and Josep Ll. Arcos 1 Abstract. We propose a computational model that enables agents in a MAS to collaboratively evolve their norms in order to reach the best norm conventions. Our approach borrows from the social contagion phenomenon to exploit the notion of positive infection: agents with good behaviors become infectious and spread their norms in the agent society. By combining infection and innovation, our computational model helps a MAS establish better norm conventions even when a sub-optimal one has fully settled in the population.
1
Introduction
Norms have become a common mechanism to regulate the behavior of agents in multi-agent systems (MAS). They exist to balance agents' interests with respect to the society's, in such a way that each agent can pursue its individual goals without preventing other agents from pursuing theirs. However, learning and establishing an adequate set of norms is not trivial. This process is usually referred to as either self-organization or emergence. In societies, conventions result when members agree upon a specific behavior. Thus, a norm convention refers to a set of norms that has been established among the members of a society. One of the trends of thought in social studies is that norm conventions emerge by propagation or contagion, where social facilitation and imitation are key factors [2, 1]. From a MAS perspective, the studies in [8] [7] show that norm emergence is possible. However, these works are limited to analyzing norm propagation, leaving out norm innovation (the discovery of new norms), a key factor in the evolution of societies. When the aim is to help a MAS establish conventions in dynamic environments, propagating norms may not be enough, since propagation assumes that at least some agent in the society knows the correct set of norms, which is not always the case. Additionally, the problem can become even more difficult when the aim is not only to establish (any) convention(s), but the best convention(s). We propose an evolutionary computational model that enables agents in a MAS to collaboratively evolve their norms to reach the best norm conventions for a wide range of interaction topologies. To this aim, we take inspiration from the argument in the social sciences literature that behavior conventions arise from social contagion [1]. Although further evolutionary approaches appear in the literature [4], they are usually applied either (i) as a centralized process, or (ii) as an individual self-contained process for each agent. Both approaches can be potentially slow and tend to be off-line processes, and are thus unsuitable for our purpose of dynamically adapting norms.
IIIA, Artificial Intelligence Research Institute, CSIC, Spanish National Research Council, Spain, email:{norman, jar, arcos}@iiia.csic.es
2
An Evolutionary Infection-Based Model
We propose a computational model that helps agents in a MAS reach norm conventions that maximize the social welfare. To this aim, we assume, in line with the distributed nature of the problem, that we can achieve our goal by maximizing agents' individual welfares. The social sciences literature argues that conventions in societies are reached through social contagion [1]: behaviors spread between individuals akin to an infectious disease. Hence, we chose to model social contagion in a MAS framework. However, we target beneficial conventions that, if possible, tend to maximize the social welfare. Considering the social welfare as a composition of individual welfares, it makes sense to let the individual behaviors that impact positively on it, here named good behaviors, be more infectious. Nevertheless, positive infection at most achieves a total replication of the best-known behavior among agents. Therefore, we also require a norm innovation mechanism. Hence we expect that a MAS can reach norms that are dominant in the society, so that no better ones can be found and no worse ones can upstage them. However, if some unaccounted factor(s) alter(s) the MAS so that the current norms become obsolete (the social welfare deteriorates), the infectious process will re-configure the norms toward a better social welfare. We propose an evolutionary algorithm (EA) approach that helps agents in a MAS reach the best norm conventions. In our infection-based EA, each agent has genes that encode its behavior. Agents can infect other agents with their genes, following the survival-of-the-fittest principle: the higher an agent's individual welfare, the more infectious it is. Furthermore, the model realizes innovation (exploration) by letting agents mutate their genes. This process runs in a distributed fashion: each agent decides whether to infect or mutate based on local knowledge. Thus, each agent is endowed with: i) an evaluation function to assess its individual welfare; ii) a selection process to choose a peer to infect, out of its local neighborhood, based on fitness; iii) an infection operator to inject some of its genes into the selected agent; and iv) an innovation operator to mutate its genes to create new behaviors. A sketch of this loop is given below.
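The following Python sketch illustrates one tick of this infection/innovation loop. The peer-selection detail and the real-valued gene encoding are our assumptions for illustration, not the exact operators of the paper's model.

    import random

    def infection_step(genes, welfare, neighbors, p_mutation=0.003):
        # genes[i]: norm parameters of agent i; welfare[i]: its individual
        # welfare; neighbors[i]: indices of i's neighbors in the topology.
        new = [g[:] for g in genes]
        for i, nbrs in enumerate(neighbors):
            if nbrs:
                j = random.choice(nbrs)
                # Infection: the fitter agent injects part of its genes into
                # the selected peer (survival of the fittest).
                if welfare[i] > welfare[j]:
                    k = random.randrange(len(genes[i]))
                    new[j][k] = genes[i][k]
            # Innovation: mutate own genes with a small probability.
            for k in range(len(new[i])):
                if random.random() < p_mutation:
                    new[i][k] = random.random()
        return new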
3
Empirical Results
Agents in a MAS interact with each other by engaging in iterative games with multiple rounds. During a round, each agent randomly selects a neighbor agent to play with (an opponent). A play consists of both agents performing an action, either A or B (actions constrained by their current norms). Plays are rewarded with a payoff, which is accumulated after each game round. The payoff for a round can be −1, 1 or α, depending on the agent's current action and the action selected by the neighbor (different actions, both B, and both A, respectively). This payoff can help capture pure coordination games [8][7]
(α = 1) and coordination games with equilibria differing in social efficiency [6] (α > 1). Each agent ag_i has two parameterized norms: one to help it decide what action to take based on the opponent's last action, and another to decide the action to take when no past action is known. To this end, the agent keeps in its memory the action performed by its last opponent, without distinguishing who the opponent was. Thus, our model has the task of finding, for each agent, the norm parameters that maximize the social welfare u. It is well known that the behavior of infections is affected by the type of topology on which a population interacts [9, 5]. Therefore, in order to empirically analyze such effects in our infection-based model, we chose the following interaction topologies: small-world, W_1000^(10,0.1); scale-free, S_1000^(10,-3); and random graphs, R_1000^(10). We know beforehand that four cooperative-only norms exist (norms that always try to cooperate), and also that they are the strongest attractors. Two of them always make agents do A (A-conventions) and the other two always do B (B-conventions). A-conventions give higher payoffs when α > 1. Our experiments aimed at showing that our model can help establish the best norm convention(s), i.e., maximize the social welfare, for a wide range of initial agent settings (norm configurations) and under the most common interaction topologies. Therefore, each experiment is composed of: i) an interaction topology model; ii) a payoff α ∈ {1, 1.5, 2}; and iii) an initial norm distribution, consisting of initializing the norms of every agent using one of five distributions: a) random (norms are randomly set); b) attractor-free (norms set from the non-cooperative-only norms); c) low sub-optimal (norms of 25% of the agents set from the B-conventions); d) high sub-optimal (75% of agents with norms from the B-conventions); and e) fully sub-optimal (norms of all agents set from the B-conventions). We ran 50 simulations of each experiment. In a simulation, agents interact and infect each other, as described above, during 20,000 ticks. To measure whether a convention is established, we counted the agents with the same norms per tick, and the agents doing A or B per tick. The counts of each simulation in the experiment were then aggregated using the inter-quartile mean. Pure coordination game [α = 1]. The experiments show that the population converges to an A-convention if initially more than 50% of the agents do action A; otherwise, a B-convention settles down. Importantly, a MAS establishes the cooperative-only norms even though for this game other conventions can achieve the same result. Since the A- and B-conventions are equally valuable, in this case the MAS establishes one of the best conventions regardless of the initial norm distribution and independently of the interaction topology. Different social efficiencies [α > 1]. When using the random initial distribution, a MAS readily settles in an A-convention for α > 1.0, independently of the interaction topology. The same occurs for the attractor-free and the low sub-optimal initializations, even though in the former no agent knew the best norms at startup. Departing from a high sub-optimal distribution, a MAS settles in a B-convention when α = 1.5 for all interaction topologies. However, by setting α to 2.0, the small-world networks manage to establish an A-convention. Thus, agents will not consider a new convention unless its benefit is significant enough.
As to the scale-free case, a greater benefit is needed. The fully sub-optimal distribution represents the worst-case scenario (Figure 1). In this case, innovation becomes a key factor. When the innovation probability is low (p_mutation = 0.003), the MAS is unable to converge to the best convention, because innovating agents are not able to overcome the high peer pressure.
Figure 1. Results for scale-free with fully sub-optimal initialization. Left: agents per norm (A-Conventions, Tit-For-Tat); Right: agents per action (Action A, Action B). [Plots: number of agents (0-1000) vs. ticks (0-20,000).]
Even more, infected scale-free networks are hard to overcome [3, 5]. Hence, we increased the mutation probability (p_mutation = 0.055) so that scale-free (α = 2.0) and small-world (α > 1) converged to an A-Convention. This occurred because a small group of agents playing tit-for-tat-like norms starts to appear. Agents with this strategy can coexist with B-Convention agents with little or no negative effect on their accumulated payoffs. Therefore, when agents with A-convention norms appear, they have a higher chance of having neighbors that will cooperate with them. However, a high mutation rate presents the disadvantage that a small part of the population will be constantly trying to innovate (in our case around 20%). We conclude that highly-clustered agent communities (e.g. small-world) are more open to positive infections, whereas low-clustered ones (e.g. scale-free) are harder to infect once some infection has settled. This is similar to some results shown in [6]. However, our evolutionary model can overcome the difficulty of re-infecting low-clustered networks (by using a high innovation-through-mutation rate) whenever we are ready to pay the following cost: a small subgroup of agents unable to settle on a set of norms. Finally, we claim that i) a convention is always reached, and ii) under certain conditions this convention is the best one for all topologies. Moreover, when these conditions are not met, e.g. when a sub-optimal convention is fully established, our model can still reach the best convention through innovation.
ACKNOWLEDGEMENTS The first author thanks the CONACyT scholarship. The work was funded by IEA (TIN2006-15662-C02-01), AT (CSD2007-0022), Ssourcing, and the Generalitat of Catalunya grant 2005-SGR-00093.
REFERENCES
[1] R. Burt, 'Social contagion and innovation: Cohesion versus structural equivalence', American J. of Sociology, 92, 1287-1335, (1987).
[2] R. Conte and M. Paolucci, 'Intelligent social learning', Artificial Society and Social Simulation, 4(1), 1-23, (2001).
[3] Z. Dezso and A.-L. Barabási, 'Halting viruses in scale-free networks', Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 65(5), 055103, (2002).
[4] D. Moriarty, A. Schultz, and J. Grefenstette, 'Evolutionary algorithms for reinforcement learning', Journal of Artificial Intelligence Research, 11, 241-276, (1999).
[5] R. Pastor-Satorras and A. Vespignani, 'Epidemic dynamics and endemic states in complex networks', Physical Review E, 63, 066117, (2001).
[6] J.M. Pujol, J. Delgado, R. Sangüesa, and A. Flache, 'The role of clustering on the emergence of efficient social conventions', in IJCAI 2005, pp. 965-970, (2005).
[7] Y. Shoham and M. Tennenholtz, 'On the emergence of social conventions: Modeling, analysis, and simulations', Artificial Intelligence, 94(1-2), 139-166, (1997).
[8] A. Walker and M. Wooldridge, 'Understanding the emergence of conventions in multi-agent systems', in ICMAS 1995, pp. 384-389, (1995).
[9] D.J. Watts and S.H. Strogatz, 'Collective dynamics of 'small-world' networks', Nature, 393(6684), 440-442, (June 1998).
Opponent Modelling in Texas Hold'em Poker as the Key for Success Dinis Félix and Luís Paulo Reis1 Abstract. Over the last few years, research in Artificial Intelligence has focussed on games with incomplete information and non-deterministic moves. The game of Poker is a perfect theme for studying this subject. The best-known Poker variant is Texas Hold'em, which combines simple rules with a huge number of possible playing strategies. This paper focusses on developing algorithms for performing simple online opponent modelling in Texas Hold'em Poker, enabling the selection of the best strategy to play against each given opponent. Several autonomous agents were developed in order to simulate typical Poker players' behaviour, and an observer agent was developed, capable of using simple opponent-modelling techniques in order to select the best playing strategy against each opponent. The results obtained in realistic experiments using eight distinct poker-playing agents showed the usefulness of the approach. The observer agent is clearly capable of outperforming all of its counterparts in all tests performed.
1 INTRODUCTION
Incomplete knowledge, risk management, opponent modelling and dealing with unreliable information are topics that identify Poker as an important research area in Artificial Intelligence (AI). Unlike games of perfect information, in poker players face hidden information resulting from the opponents' cards and future actions. In such a domain, to be successful, players need to use opponent-modelling techniques in order to understand and adapt themselves to the opponents' playing styles [1,2]. However, the huge number of possible playing strategies in Poker makes opponent modelling a very hard task in this domain.1 Poker is a popular card game in which players bet on the value of the card combination in their possession. The winner is the one who holds the highest-valued hand according to an established hand-rankings hierarchy, or otherwise the player who remains "in the hand" after all others have folded. Texas Hold'em is the most popular poker game. It is a community card game where each player may use any combination of the five community cards and the player's own two hidden cards to make a poker hand. This characteristic makes it a very good game for strategic analysis. The main goal of the project is to prove that a poker agent that considers the opponents' behaviour achieves better results against players that use typical poker-playing strategies than an agent that doesn't, even when playing the same global betting strategy.
1 FEUP – Faculty of Engineering of the University of Porto, Portugal, LIACC – Artificial Intelligence and Computer Science Lab., Portugal, email: felixdinis@gmail.com, lpreis@fe.up.pt.

2 RELATED WORK
This project is based on previous betting strategies developed at the University of Alberta [1,2,3,4]. They are divided into betting strategy
before the flop and after the flop [4]. There are 1326 possible hands prior to the flop. The value of one of these hands is called an income rate and is based on an off-line computation that consists of playing several million games where all players call the first bet [5,6]. The basic betting strategy after the flop is based on computing the hand strength (HS), positive potential (PPot), negative potential (NPot), and effective hand strength (EHS) of the agent's hand relative to the board. EHS is a measure of how well the agent's hand stands in relation to the remaining active opponents in the game. The hand strength (HS) is the probability that a given hand is better than that of an active opponent. Suppose an opponent is equally likely to have any possible two hole-card combination. Then it is possible to calculate the hand strength as:

HandStrength(ourcards, boardcards) {
    ahead = tied = behind = 0
    ourrank = Rank(ourcards, boardcards)
    for each case(oppcards) {
        opprank = Rank(oppcards, boardcards)
        if (ourrank > opprank) ahead += 1
        else if (ourrank == opprank) tied += 1
        else behind += 1
    }
    handstrength = (ahead + tied/2) / (ahead + tied + behind)
    return (handstrength)
}
After the flop, there are still two more board cards to be revealed and it is essential to determine their potential impact. The positive potential (PPot) is the chance that a hand that is not currently the best improves to win at the showdown. The negative potential (NPot) is the chance that a currently leading hand ends up losing. PPot and NPot are calculated by enumerating over all possible hole cards for the opponent, as in the hand strength calculation, and also over all possible board cards. The effective hand strength (EHS) combines hand strength and potential to give a single measure of the relative strength of a hand against an active opponent. A simple formula for computing the probability of winning at the showdown is:

Pr(win) = HS × (1 − NPot) + (1 − HS) × PPot

Since the interest is in the probability that the hand is either currently the best, or will improve to become the best, one possible formula for EHS sets NPot = 0, giving:

EHS = HS + (1 − HS) × PPot
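The two formulas transcribe directly into code (input validation and multi-opponent adjustments are omitted here):

    def win_probability(hs, ppot, npot):
        # Pr(win) = HS * (1 - NPot) + (1 - HS) * PPot
        return hs * (1 - npot) + (1 - hs) * ppot

    def effective_hand_strength(hs, ppot):
        # EHS with NPot = 0: the hand is either best now or improves to best.
        return hs + (1 - hs) * ppot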
3 OPPONENT MODELLING
No poker strategy is complete without a good opponent modelling system [7]. A strong poker player must develop an adaptive model of each opponent, to identify potential weaknesses. In poker, distinct opponents make different kinds of errors that may be exploited [4]. The intelligent agents developed in this project observe the moves of the other players at the table. There are many possible approaches to opponent modelling [2,8,9], but in this work the observation model is based on basic observation of the starting moves of the players, so that a fast, online estimate of their starting hands in future rounds can be created.
Players can generally be classified into four models that depend on two parameters: loose/tight and passive/aggressive. Knowing the types of hole cards various players tend to play, and in what position, is probably the starting point of opponent modelling. Players are classified as loose or tight according to the percentage of hands that they play. These two concepts are obtained by analysing the percentage of the time a player puts money into a pot to see a flop in Hold'em, the VP$IP (voluntarily put money in the pot). Players are also classified as passive or aggressive. These concepts are obtained by analysing the Aggression Factor (AF), which describes the player's nature.
4 INTELLIGENT AGENTS
Based on the player classification developed, 8 intelligent agents were created, two for each player style: LA - Loose Aggressive (Maniac and Gambler); LP - Loose Passive (Fish and Calling Station); TA - Tight Aggressive (Fox and Ace); TP - Tight Passive (Rock and Weak Tight). A general observer agent was also created, capable of keeping the information of every move made by the opponents and calculating playing information such as the VP$IP and AF of each opponent at every moment of the game. The opponents are classified into 4 types of players: loose if VP$IP is above 28%, tight otherwise; aggressive if AF is above 1, passive otherwise. After player classification, the agent can consider a different range of possible hands for different opponents. A general consideration is that tight players have a smaller range of possible hands than loose players. In order to pass this information to the Hand Strength calculation, a parameter called "sklansky" is determined for each player. This parameter represents the lowest value of a hand that belongs to the most probable range of hands that the player plays with that specific move (call or raise). Since the opponent's actual hand would otherwise often be wrongly ignored, the refined Effective Hand Strength calculation given by this technique should compensate for this. Hand Strength and Hand Potential can now be calculated with a better approach: they are calculated considering only the hands with a rank better than the "sklansky" parameter.
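The style classification itself reduces to two threshold tests using the values stated above; the counter bookkeeping behind vpip and af is assumed, not spelled out in the paper:

    def classify(vpip, af):
        # vpip: fraction of hands where the player voluntarily put money in
        # the pot pre-flop; af: aggression factor.
        style = "L" if vpip > 0.28 else "T"  # loose vs. tight
        style += "A" if af > 1.0 else "P"    # aggressive vs. passive
        return style                         # "LA", "LP", "TA" or "TP"

    classify(0.35, 1.4)  # -> "LA" (e.g. the Maniac/Gambler profile)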
5 RESULTS
In order to obtain results, several simulations were made with the agents created. In each simulation, 8 normal agents and 1 observer were used at the table, with the intention of giving the Observer Agent the possibility to play at a table with all the different kinds of players: LA in the first round of simulations, LP in the second, TA in the third and TP in the final round of simulations. The hand selection in the pre-flop of the Observer was equal to that of the type of agent modelled, using the opponent-modelling strategy to change the hand strength potential according to the opponents. Each of the simulations performed was repeated 3 times and ended when one of the two agents lost all of its bankroll, or after 2000 games. Figure 1 shows the bankroll variation of the four observer agents compared with the corresponding non-observer agents. In the 12 complete experiments performed (more than 10,000 games in total), the Observer achieved better results than the non-observer agent that uses the same hand selection in the pre-flop. The most conclusive results are with passive agents: the Observer, besides always having a big advantage over the non-observer, also achieves very good results, reaching a good level of bankroll. With aggressive agents, the simulations seem somewhat inconclusive due to big variations in bankroll that sometimes cause the game to end too soon for an agent. Still, we can conclude that opponent modelling could help these kinds of agents to stay in the game for a long time.

Figure 1: Bankroll of LA (top-left), LP (top-right), TA (bottom-left) and TP (bottom-right) observer agents (dark blue) compared with corresponding non-observer agents (magenta). [Four plots of bankroll vs. games played.]
6 CONCLUSIONS AND FUTURE WORK
From the results achieved, it is possible to verify that the Observer agent achieves better results than a non-observer agent, even when the hand-selection strategy is not very good. This proves that even with simple opponent-modelling strategies it is possible to achieve good results. However, when playing normal poker, due to the reduced number of games and the incomplete information gathered, only simple opponent models can be created online, and thus the proposed approach is very useful. At the end of this project, we have a good, stable simulator to test future work and an Observer Agent capable of playing poker at an acceptable level, improving the capabilities of the original agent and prepared to be extended with new functionalities. Future work may explore topics like learning to play depending on the position at the table, and bluffing. Regarding opponent modelling in Texas Hold'em, future work may include: considering more than 4 types of players; analysing other player-style variables; and retrieving information from the cards shown at showdown.
REFERENCES
[1] D. Billings, D. Papp, J. Schaeffer, and D. Szafron. Opponent modeling in poker. In American Association of Artificial Intelligence National Conference, AAAI'98, pages 493-499, 1998.
[2] A. Davidson, D. Billings, J. Schaeffer, and D. Szafron. Improved opponent modeling in poker. In International Conference on Artificial Intelligence, ICAI'00, pages 1467-1473, 2000.
[3] UA GAMES Group. The University of Alberta GAMES Group, http://www.cs.ualberta.ca/~games [consulted in March 2008].
[4] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The challenge of poker. Artificial Intelligence, Vol. 134(1-2), pages 201-240, January 2002.
[5] D. Papp. Dealing with imperfect information in poker. Master's thesis, Department of Computing Science, University of Alberta, 1998.
[6] L. Peña. Probabilities and simulations in poker. Master's thesis, Department of Computing Science, University of Alberta, 1999.
[7] F. Southey, M. Bowling, B. Larson, C. Piccione, N. Burch, D. Billings, and C. Rayner. Bayes' bluff: Opponent modelling in poker. In 21st Conference on Uncertainty in Artificial Intelligence, UAI'05, pages 550-558, July 2005.
[8] A. Davidson. Opponent modeling in poker. Master's thesis, Department of Computing Science, University of Alberta, 2002.
[9] D. Carmel and S. Markovitch. Incorporating opponent models into adversary search. In American Association of Artificial Intelligence National Conference, AAAI'96, pages 120-125, 1996.
8. Constraints and Search
LRTA* Works Much Better with Pessimistic Heuristics Aleksander Sadikov and Ivan Bratko1
Abstract. Recently we showed that under very reasonable conditions, incomplete, real-time search methods like RTA* work better with pessimistic heuristic functions than with optimistic, admissible heuristic functions of equal quality. The use of pessimistic heuristic functions results in a higher percentage of correct decisions and in shorter solution lengths. We extend this result to learning RTA* (LRTA*) and demonstrate that the use of pessimistic instead of optimistic (or mixed) heuristic functions of equal quality results in a much faster learning process at the cost of only marginally worse quality of converged solutions.

1 Artificial Intelligence Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Tržaška 25, 1000 Ljubljana, Slovenia, email: {aleksander.sadikov;ivan.bratko}@fri.uni-lj.si

1 INTRODUCTION

Recently, we proved [4] that under very reasonable conditions, incomplete, real-time search methods like RTA* [3] commit fewer decision errors and produce better solutions when used with pessimistic heuristics instead of optimistic ones of equal quality. Since LRTA* [3] is basically a repetitive running of the RTA* search with a slightly modified update rule to enable the agent to learn from previous runs, we decided to test how pessimistic heuristics behave in the LRTA* setting, thus logically extending our results in [4]. In this paper we experimentally demonstrate that using pessimistic heuristics with LRTA* dramatically speeds up the convergence process at the cost of just marginally worse converged solutions.

2 EXPERIMENTAL DESIGN

We have performed our experiments on classical testbeds for single-agent search methods, the Eight and Fifteen puzzles used in experiments by many authors. We conducted two series of experiments. The first series used artificially constructed heuristic functions for the 8-puzzle. These artificial heuristics enabled good experimental control over the properties of the heuristics: optimistic vs. pessimistic vs. mixed (neither optimistic nor pessimistic), all of comparable quality. The construction of these heuristics is the same as in our preceding paper and is described in detail in [4]. In the second series we used a "naturally" constructed pessimistic heuristic function, whose construction is based on problem decomposition; we will therefore refer to it as the "decomposition heuristic". The decomposition heuristic is by construction guaranteed to be pessimistic and applies to sliding-tile puzzles of any size. This heuristic was used on the 15-puzzle for comparison with the performance of the Manhattan distance heuristic and for direct comparison with other real-time search algorithms. It is based on the decomposition of solving an N × N puzzle into a partial solution of this puzzle plus the solving of an (N − 1) × (N − 1) puzzle. Accordingly, the heuristic upper bound on the solution length is computed as the cost of solving the left-most column and the top-most row of the N × N puzzle, plus (recursively) the heuristic estimate of the cost of solving the remaining (N − 1) × (N − 1) puzzle. The complete details of the realization of this idea are somewhat involved and, due to space limitations, cannot be presented here.

Figure 1. The comparison of convergence speed between optimistic (solid line), mixed 50:50 (dashed line), and pessimistic (dash-dotted line) heuristics. [Plot: number of moves to convergence (0-30,000) vs. search depth (5-30).]
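For reference, a generic depth-1 LRTA* sketch with the heuristic supplied as a mutable table, so that optimistic or pessimistic initial values can be plugged in; this illustrates the algorithm under study and is not the authors' experimental code.

    def lrta_star_trial(start, goal, neighbors, cost, h):
        # One trial of LRTA* with one-ply lookahead; h maps states to current
        # heuristic values and is updated (learned) as the agent moves.
        s, moves = start, 0
        while s != goal:
            # Lookahead: pick the most promising successor.
            best = min(neighbors(s), key=lambda t: cost(s, t) + h.get(t, 0))
            # Learning step: raise h(s) to the backed-up estimate.
            h[s] = max(h.get(s, 0), cost(s, best) + h.get(best, 0))
            s, moves = best, moves + 1
        return moves

    def moves_to_convergence(start, goal, neighbors, cost, h):
        # Repeat trials until no heuristic value changes between trials.
        total = 0
        while True:
            before = dict(h)
            total += lrta_star_trial(start, goal, neighbors, cost, h)
            if h == before:
                return total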
3
EXPERIMENTAL RESULTS
We were interested in the following characteristics of the LRTA* search: the speed of the convergence process and the total search effort needed, and the converged solution quality, all depending on the heuristic used. We compared these characteristics between various heuristics and between various search algorithms.
3.1
Artificial Heuristics
When measuring the speed of convergence we varied the nature and quality of heuristics and the depth of lookahead. Results of the experiments with artificial heuristics of various nature and quality are presented in Figure 1. For a given quality of heuristics and a given depth of lookahead, we measured the average speed of convergence on a set of 1,000 randomly chosen puzzles of various levels of difficulty — this way of testing is quite common and was used for example in [3]. The x-axis represents the depth of lookahead, and the y-axis represents the time needed for convergence to take place (measured
by the total number of moves performed by the underlying RTA* search in all trials needed to complete the solving of one puzzle). Figure 1 shows a very small (but representative) subset of the results. The chart shows results obtained with heuristics of quality similar to that of the Manhattan heuristic (in terms of root mean squared error, σ = 2.5). The three curves on the chart relate to the three types of heuristics used: the solid line represents the optimistic heuristic, the dashed line the mixed heuristic (50% optimistic and 50% pessimistic), and the bold dash-dotted line the pessimistic heuristic. It is obvious from the chart that the pessimistic heuristic causes LRTA* to converge much faster than the optimistic one. The mixed heuristic is not somewhere in the middle, as might be expected, but much closer to the optimistic one. Further experiments with mixed heuristics confirmed that they behave similarly to optimistic ones (results not shown due to lack of space).

As we have seen, pessimistic heuristics cause LRTA* to converge much faster than optimistic ones. The relevant question is how much we sacrifice in terms of the quality of converged solutions for this speed-up; if the solutions thus obtained were worthless, there would be no benefit in the speed-up. The results for artificial heuristics on the 8-puzzle are as follows. For σ = 2.5, in the worst case, with one-ply lookahead, on average over 1,000 test puzzles we lose about a single move, or in other words about 5% (ε ≈ 0.05). With deeper lookahead this suboptimality decreases, and by search depth 2 or 3 it has already halved. Heuristics of worse quality benefit even more from increased lookahead: by search depth 2 or 3 the suboptimality of their converged solutions more than halves. The results for the other qualities of heuristics tested are very similar to the reported ones.

3.2 Decomposition Heuristic

In this section we experimented with 15-puzzles, comparing the popular optimistic Manhattan distance heuristic with our decomposition-based pessimistic heuristic. We also compared the performance of LRTA* with two other related real-time search algorithms, FALCONS [1] and ε-LRTA* [5]. However, the main point of this comparison is the study of optimistic vs. pessimistic heuristics, which constitute the main difference between the compared variants of LRTA*. Table 1 shows experimental results averaged over the selected 100 15-puzzles for which Korf [2] gave the costs of optimal solutions. For each entry in the comparison, the table gives the heuristic used, the average total number of moves to convergence per puzzle, the average number of trials to convergence per puzzle, the average cost of the first solution obtained and of the converged solution, the average degradation factor (the ratio between the average cost of the converged solution and the average cost of the optimal solution), and the average CPU time. CPU times are important for the comparison because different heuristic functions take different CPU times to compute (the decomposition heuristic being more expensive than Manhattan distance). All the experiments were run on the same platform (Python interpreter on a 2.6 GHz PC). The average optimal solution cost over the 100 puzzles was 53.05. Experimenting with the greater lookahead of five with the decomposition heuristic is justified by the fact that it is affordable due to the convergence efficiency of the pessimistic heuristic.

Table 1. Experimental results for the 15-puzzle. All statistics are per puzzle and averaged over the whole set of 100 optimally solved puzzles.

Algorithm               Heuristic      #moves       #trials  First sol.  Conv. sol.  Deg. factor  CPU time (s)
FALCONS                 Manhattan      no convergence on any instance within 4 × 10^7 states [5]
LRTA* (d = 1)           Manhattan      no convergence on any instance within 4 × 10^7 states [5]
ε-LRTA* (d = 1, ε = 2)  Manhattan      2,391,847.4  1311.07  6564.59     76.96       1.45         1420.01
LRTA* (d = 1)           Decomposition  2,612.7      21.58    114.93      93.55       1.76         6.02
LRTA* (d = 5)           Decomposition  1,922.8      17.17    107.93      83.23       1.57         88.04

4 DISCUSSION

The obvious point of the obtained results is that, used with LRTA*, the pessimistic heuristic offers orders of magnitude better search efficiency (in terms of CPU time) than the optimistic and mixed heuristics, at a relatively low cost in solution quality. The speed efficiency of the pessimistic heuristic in terms of the number of moves is relatively even greater. It is important to note that these performance results are not due to the quality of the heuristics used here. The average value of the (optimistic) Manhattan distance evaluation over the 100 puzzles is 69% of the true solution cost, whereas the average value of the (pessimistic) decomposition heuristic evaluation is 250% of the true cost. Of even greater interest is the discrimination power of the two heuristics in deciding which of two given 15-puzzles is easier (has the shorter optimal solution). Manhattan distance decides correctly in 74.2% of all ½ × 100 × 99 possible pairs of Korf's 100 15-puzzles, whereas the decomposition heuristic gives the correct decision in only 56.9% of these pairs. It should be admitted that increasing the lookahead depth of LRTA* with the decomposition heuristic beyond five does not significantly improve the quality of converged solutions; it appears to reach a plateau. On the other hand, it is possible to improve the quality of solutions by decreasing the parameter ε below 2, although again at the expense of convergence time.
ACKNOWLEDGEMENTS This work was partly funded by ARRS, Slovenian Research Agency.
REFERENCES
[1] David Furcy and Sven Koenig, 'Speeding up the convergence of real-time search', in Proceedings of the Seventeenth National Conference on Artificial Intelligence, pp. 891-897, (2000).
[2] Richard E. Korf, 'Depth-first iterative deepening: An optimal admissible tree search', Artificial Intelligence, 27(1), 97-109, (1985).
[3] Richard E. Korf, 'Real-time heuristic search', Artificial Intelligence, 42(2-3), 189-211, (1990).
[4] Aleksander Sadikov and Ivan Bratko, 'Pessimistic heuristics beat optimistic ones in real-time search', in Proceedings of the Seventeenth European Conference on Artificial Intelligence, ed., Gerhard Brewka, pp. 148-152, Riva di Garda, Italy, (August 2006).
[5] Masashi Shimbo and Toru Ishida, 'Controlling the learning process of real-time heuristic search', Artificial Intelligence, 146(1), 1-41, (2003).
Thinking Too Much: Pathology in Pathfinding Mitja Luštrek 1 and Vadim Bulitko 2

1 INTRODUCTION
Incomplete single-agent search methods are often better suited to real-time pathfinding tasks than complete methods (such as A*). Incomplete methods conduct a limited-depth lookahead search, i.e., expand a part of the space centered on the agent, and heuristically evaluate the distances from the frontier of the expanded space to the goal. Actions selected this way are not necessarily optimal, but it is generally believed that deeper lookahead increases the quality of decisions. However, in two-player games, where similar methods are used, it has long been known that this is not always the case [7, 1]. This phenomenon has been termed minimax pathology. More recently pathological behavior was discovered in single-agent search as well [3]. Some attempts to explain it have been made [5, 6], but the pathology in single-agent search is largely still not understood. In this paper we investigate lookahead pathology in real-time pathfinding on maps from commercial computer games. First, we present an empirical study showing a degree of pathology in over 90% of the problems considered. Second, we give four explanations for such wide-spread pathological behavior.
2
THE PATHOLOGY OBSERVED
We study the problem of an agent trying to find a path from a start to a goal state in a two-dimensional grid world. The agent plans its path using the Learning Real-Time Search (LRTS) algorithm [2]. LRTS conducts a lookahead search centered on the current state and generates all the states up to d moves away. It heuristically estimates the distances from the frontier states to the goal state and moves to the most promising frontier state. Upon reaching it, it conducts a new search. The initial heuristic is the shortest distance assuming an empty map. After each search, the heuristic of the current state is updated to the estimated distance through the most promising frontier state, which constitutes the process of learning. We conducted two types of experiments: on-policy and off-policy. In the first type the agent follows a path from the start state to the goal state as directed by the LRTS algorithm. In the second type the agent appears in a (randomly selected) state and selects the first move towards the goal state. If the move does not lie on the shortest path to the goal state, it is erroneous. The error e(S_d) is the fraction of erroneous moves taken in the set of states S_d visited using lookahead depth d. The degree of error pathology in the sequence of sets S_1, ..., S_dmax is k iff e(S_{d+1}) > e(S_d) for k different d < dmax. We generated 1,000 problems on maps from a commercial role-playing game. The lookahead depth ranged from 1 to 10 = dmax. First we conducted the basic on-policy experiment: the agent solved the problems, we measured the degree of error pathology for each problem and counted the number of problems with each of the possible degrees. The on-policy row in Table 1 shows that over 90% of the problems are pathological.

1 Jožef Stefan Institute, Department of Intelligent Systems, Jamova cesta 39, 1000 Ljubljana, Slovenia, email: mitja.lustrek@ijs.si
2 University of Alberta, Department of Computing Science, Edmonton, Alberta T6G 2E8, Canada, email: bulitko@ualberta.ca

Table 1. Pathology in the basic on- and off-policy experiments.

Degree                         0     1     2     3    ≥4
Pat. problems on-policy [%]   6.3  13.1  24.8  29.0  26.7
Pat. problems off-policy [%] 83.1  14.9   2.0   0.0   0.0
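Given the definition above, the degree of error pathology is a one-line computation over the per-depth error rates (this helper is ours, purely illustrative):

    def pathology_degree(errors):
        # errors[d-1] holds e(S_d) for lookahead depths d = 1 .. dmax; the
        # degree counts the depths d < dmax where deeper lookahead has a
        # strictly larger error.
        return sum(1 for e_d, e_next in zip(errors, errors[1:]) if e_next > e_d)

    pathology_degree([0.30, 0.25, 0.27, 0.22, 0.24])  # -> 2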
The first possible explanation of the on-policy results in Table 1 is that the maps contain a lot of states where deeper lookaheads lead to suboptimal decisions, whereas shallower ones do not. If this were the case, the basic off-policy experiment, where the pathology is measured in randomly selected states, should yield comparable pathology. However, the off-policy row in Table 1 shows much less pathology. In the rest of the paper, we will investigate the reasons for this.
3
EXPLANATIONS OF THE PATHOLOGY
The first explanation is that the LRTS algorithm's behavioral policy steers the search to pathological states. This explanation was verified by computing off-policy pathology from the error in the states visited during the basic on-policy experiment instead of randomly selected ones. The results in Table 2 do show more pathology compared to the basic off-policy experiment in Table 1 (23.2% vs. 16.9%), but they are still far from the basic on-policy experiment (23.2% vs. 93.7%).

Table 2. Pathology measured off-policy in the states visited on-policy.

Degree                      0     1    2    3   ≥4
Pathological problems [%] 76.8  13.8  5.7  2.3  1.4
The basic on-policy experiment involves learning, but no learning takes place in the basic off-policy experiment. It is harder to find the path to the goal when the lookahead depth is small. Consequently the agent backtracks more, encountering updated states more often than when the lookahead depth is large. This leads us to the second explanation: smaller lookahead depths benefit more from the updates to the heuristic. This can be expected to make their decisions better than the mere depth would suggest and thus closer to larger depths. If they are closer to larger depths, cases where a deeper lookahead actually performs worse than a shallower one should be more common. The first test of the second explanation is an on-policy experiment where the agent is directed by the LRTS algorithm that uses learning (to prevent infinite loops), but the error is measured using only the initial, non-updated heuristic. The results in Table 3 suggest that learning is indeed responsible for the pathology, because the pathology in the new experiment is markedly smaller than in the basic on-policy experiment shown in Table 1: 70.4% vs. 93.7%.

Table 3. Pathology on-policy with error measured without learning.

Degree                      0     1     2     3    ≥4
Pathological problems [%] 29.6  20.4  19.3  18.2  12.5
The second test is to measure the volume of heuristic updates, which reflects the benefit of learning. This volume is the sum of the differences between the updated and the initial heuristics in the states generated during search. Figure 1 shows the results for the basic on-policy experiment and for the basic off-policy experiment (where no learning takes place). We see that in the on-policy experiment the volume of updates decreases with lookahead depth (unlike in the off-policy experiment), which confirms our explanation.
Figure 1. The volume of heuristic updates encountered per move with respect to the lookahead depth in the basic on- and off-policy experiments.
The results in Table 3 still show more pathology than in the basic off-policy experiment, so there must be a third explanation. Let αoff(d) and αon(d) be the average number of states generated per move in the basic off-policy and on-policy experiments respectively. In off-policy experiments a search is performed every move, whereas in on-policy experiments a search is performed every d moves. Therefore αon(d) = αoff(d)/d. This means that in the basic on-policy experiment fewer states are generated at larger lookahead depths than in the basic off-policy experiment. Consequently the depths in the basic on-policy experiment are closer to each other with respect to the number of states generated. Since the number of states generated can be expected to correspond to the quality of decisions, cases where a deeper lookahead actually performs worse than a shallower one should be more common. The first test of the third explanation is an on-policy experiment where a search is performed every move instead of every d moves. The results in Table 4 confirm the explanation. The percentage of pathological problems is considerably smaller than in the basic on-policy experiment shown in Table 1: 34.7% vs. 93.7%. Since LRTS that searches every move is very similar to LRTA* [4], LRTA* can also be expected to be less pathological.
Figure 2. The number of states generated per move with respect to the lookahead depth in different experiments.
The second test is to measure the number of states generated per move. Figure 2 shows that in the basic off-policy experiment and in the on-policy experiment when searching every move, the number increases more quickly with lookahead depth than in the basic on-policy experiment. The depths are thus less similar than in the basic on-policy experiment, which again confirms our explanation.

Experiments with the eight-puzzle [8] showed that pessimistic heuristics can prevent the pathology. This inspired the fourth explanation of the pathology. During lookahead search, states with low heuristic values are favored. If the heuristic values are optimistic (as in our case), the lowest heuristic value is likely to be particularly far from the true value. With deeper lookahead, more states are considered and the chances of selecting a state with an especially inaccurate heuristic increase. If the heuristic values are pessimistic, the opposite is true: the states with accurate heuristic values are favored, and the more states are considered, the more likely a state with a very accurate heuristic value will be selected.

We verified the fourth explanation with an on-policy experiment with pessimistic heuristic values. If the regular heuristic value of a state s is h(s) = h*(s) − e, where e is the heuristic error, then the pessimistic heuristic value is hp(s) = h*(s) + e. Such a heuristic is unrealistic, but it should give us an idea of what to expect from realistic pessimistic heuristics, should we be able to design them. The results in Table 5 do show a decrease in pathology compared to the basic on-policy experiment shown in Table 1: 86.1% vs. 93.7%.

Table 5. Pathology on-policy with pessimistic heuristic.

Degree              0    1    2     3     4    ≥5
Pat. problems [%]  13.9  4.1  8.3  22.9  27.7  23.1

4 CONCLUSION
The first two explanations of the pathology do not seem to offer practical ways of avoiding the pathology. When investigating the third explanation, we learned that searching every move the way LRTA* does brings the pathology from 93.7% to 34.7%. It also generates up to 2.6 times shorter solutions. However, it increases the number of states generated per move roughly by a factor of d. This means that the number of states generated per problem when searching every move is up to 4.5 times larger (at d = 10) than with the regular LRTS. A promising direction of research therefore seems to be a method for dynamically selecting the point at which a new search is needed. Finally, the fourth explanation suggests that pessimistic heuristics may be less prone to the pathology. In addition, the solutions found using the pessimistic heuristic were nearly optimal (3.8–7.2 times shorter than with the regular heuristic), so pessimistic heuristics deserve further attention.
Table 4. Pathology on-policy when searching every move.

Degree                       0     1    2    3   ≥4
Pathological problems [%]  65.3  14.6  8.6  7.1  4.4
REFERENCES
[1] Donald F. Beal, 'An analysis of minimax', in Advances in Computer Chess, volume 2, pp. 103–109, (1980).
[2] Vadim Bulitko and Greg Lee, 'Learning in real time search: A unifying framework', Journal of Artificial Intelligence Research, 25, 119–157, (2006).
[3] Vadim Bulitko, Lihong Li, Russell Greiner, and Ilya Levner, 'Lookahead pathologies for single agent search', in Proceedings of IJCAI, poster section, pp. 1531–1533, Acapulco, Mexico, (2003).
[4] Richard E. Korf, 'Real-time heuristic search', Artificial Intelligence, 42(2–3), 189–211, (1990).
[5] Mitja Luštrek, 'Pathology in single-agent search', in Proceedings of Information Society Conference, pp. 345–348, Ljubljana, Slovenia, (2005).
[6] Mitja Luštrek and Vadim Bulitko, 'Lookahead pathology in real-time path-finding', in Proceedings of AAAI, Learning for Search Workshop, pp. 108–114, Boston, USA, (2006).
[7] Dana S. Nau, Quality of Decision versus Depth of Search on Game Trees, Ph.D. dissertation, Duke University, 1979.
[8] Aleksander Sadikov and Ivan Bratko, 'Pessimistic heuristics beat optimistic ones in real-time search', in Proceedings of ECAI, pp. 148–152, Riva del Garda, Italy, (2006).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-901
Dynamic Backtracking for Distributed Constraint Optimization1
Redouane Ezzahir2 and Christian Bessiere3 and Imade Benelallam2 and El Houssine Bouyakhf2 and Mustapha Belaissaoui4

Abstract. We propose a new algorithm for solving Distributed Constraint Optimization Problems (DCOPs). Our algorithm, called DyBop, is based on branch and bound search with dynamic ordering of agents. A distinctive feature of this algorithm is that it uses the concept of valued nogood. Combining lower bounds on inferred valued nogoods computed cooperatively helps to dynamically prune unfeasible sub-problems and speeds up the search. DyBop requires polynomial space at each agent. Experiments show that DyBop has significantly better performance than other DCOP algorithms.
1 INTRODUCTION
The Distributed Constraint Optimization Problem (DCOP) is a powerful formalism to model a wide range of applications in multi-agent coordination. The major motivation for research on DCOPs is that they are an elegant model for many everyday combinatorial problems that are distributed by nature, such as distributed resource allocation, distributed scheduling, or sensor networks. In this paper, we present a new distributed algorithm, called DyBop: Dynamic Backtracking search for DCOPs. It is based on branch and bound search and uses the concept of valued nogood introduced in [2, 7]. DyBop is guaranteed to terminate and requires polynomial space. The agents assign their variables sequentially and compute asynchronously a lower bound on the cost of the current context. Whenever an agent is successfully assigned, it sends copies of the current context to all unassigned agents concurrently. These unassigned agents send back the cost of their cheapest assignment w.r.t. this context. If the aggregation of the costs received by the current agent is greater than the current upper bound, the current agent changes its value. If no value remains available, a valued nogood is sent from the current agent to the lowest agent involved in the nogood. Experimental results on random Max-DisCSPs and a real structured problem (Distributed Meeting Scheduling) show that DyBop outperforms AFB-BJ [5], NCBB [1], and ABFS [4].
2 BACKGROUND
The Distributed Constraint Optimization Problem is a tuple (A, X, D, C, F), where A is a set of agents {A1, A2, ..., Ak}, X is a set of variables {X1, X2, ..., Xn}, and D = {D1, D2, ..., Dn} is a set of domains, where each Di in D is a finite set containing the
1 This work has been supported by the Maroc-France PAI project no. MA/05/127 and the ANR project ANR-06-BLAN-0383-02.
2 LIMIARF/FSR, Morocco, email: ezzahir@lirmm.fr, bouyakhf@fsr.ac.ma, imade.benelallam@ieee.org
3 LIRMM (CNRS/U. Montpellier), France, email: bessiere@lirmm.fr
4 Université Hassan I, Morocco, email: m.belaissaoui@encg-settat.ma
values to which its associated variable Xi may be assigned. Only the agent that owns a variable has control of its value and knowledge of its domain. C = {cij : Di × Dj → R+, with i, j ∈ 1...n, i ≠ j} is a set of constraints, represented by a cost function cij for each pair of variables Xi and Xj. The goal is to find a global assignment I of values to variables in X such that the objective function F is minimized. The valued nogood [2] is an extension of the classical nogood for Valued CSPs. Recently, Silaghi et al. have introduced the inference power of valued nogoods in DCOP solving [7].

Definition 1 (Valued Nogood [2]) A valued nogood has the form (I, v, C). It specifies that, given a set of constraints C, a global assignment extending the partial assignment I = {(X1, v1), . . . , (Xk, vk)} has cost at least v. C is a set of reference constraints called justification in [7].
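Definition 1 can be read operationally as a small data structure plus an aggregation operator. The following Python sketch is our own illustrative reading (names and the compatibility test are our assumptions, not the authors' code); it mirrors the sum-inference aggregation and the pruning test described in Section 3:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ValuedNogood:
        assignment: tuple   # partial assignment I as ((variable, value), ...)
        cost: int           # lower bound v on the cost of any extension of I
        just: frozenset     # reference constraints C (the justification)

    def sum_inference(n1, n2):
        # Combine two valued nogoods whose assignments agree on shared
        # variables and whose justifications are disjoint: the partial
        # assignments merge and the lower bounds add up.
        a1, a2 = dict(n1.assignment), dict(n2.assignment)
        assert all(a1.get(x, v) == v for x, v in a2.items()), "incompatible"
        assert not (n1.just & n2.just), "justifications must be disjoint"
        return ValuedNogood(tuple(sorted({**a1, **a2}.items())),
                            n1.cost + n2.cost,
                            n1.just | n2.just)

    def should_prune(accumulated, upper_bound):
        # The assigning agent abandons its current value once the
        # accumulated lower bound can no longer improve on the best
        # known full assignment.
        return accumulated.cost >= upper_bound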
3 DYBOP
In DyBop, each agent stores a nogood per value. During search, each agent holds its view of the current state of search in a data structure called the current context CCTX.

Definition 2 (Context) For a partial assignment PA that we try to extend, we associate a current context CCTX of the form ⟨PA, N, CS⟩, where N = {NX1, . . . , NXk} is the set of all nogoods associated to the variable assignments in PA, and CS = {CSX1, . . . , CSXn} is a list of conflict sets. Each conflict set CSXi contains all agent IDs that are used to identify the assignments in any nogood stored by Xi.

In DyBop, only one agent performs an assignment on the current context CCTX at a time. Whenever an agent is successfully assigned, it sends copies of CCTX to all unassigned agents concurrently and awaits response messages. All unassigned agents compute asynchronously a valued nogood with a valuation equal to the lower bound of the cost of assigning a value to their variables, and send this nogood back to the agent which performed the assignment. The assigning agent accumulates these valued nogoods using sum-inference, an aggregation operator based on the objective function F. Once the valuation of the accumulated nogood exceeds that of the best known solution found so far, the agent prunes its current value. The accumulated nogood is stored as the explanation of the value removal. On the other hand, when the cost of the aggregation of all valued nogoods coming from unassigned agents is less than the cost of the best known full assignment, the agent sends the current context to the agent selected as next. So, the current context is propagated forward sequentially. Whenever the current agent cannot find a valid value, it
performs the min-resolution of its stored set of nogoods, and sends back the resulting nogood to the last assigned agent in this nogood. When an agent receives such a valued nogood due to a backtrack, before storing it the agent performs a partial reduction [2] of this nogood, by using the last stored nogood related to the current value and the nogood related to the current assignment CCTX. The communication among DyBop agents is performed by five types of messages.

CTX: A message that carries the current context CCTX.
FB CTX: A forward bounding message that is an exact copy of a CCTX. Every agent that assigns its variables on a CCTX creates an exact copy in the form of a FB CTX and sends it forward to all unassigned agents. An agent receiving an FB CTX message computes a valued nogood with a valuation equal to the lower bound on the cost increment caused by adding an assignment for its variables to the CCTX. This estimated nogood is sent back to the agent which sent the FB CTX message via an ESTIMATE message.
BACK: A message that is sent back when dynamic backtracking is performed. It carries a valued nogood that justifies the conflict and the current context CCTX. The receiver of this message is chosen as the last assigned agent in the carried nogood.
Figure 1. Results on random instances of Max-DisCSP with 10 agents.
Theorem 1 DyBop is correct and terminates.
Proof. During DyBop search, all operations on valued nogoods are logically sound. Thus, if DyBop terminates, the upper bound is optimal, so an optimal solution is found. Termination can be proved if we consider a simple version of DyBop where FB CTX messages are not used. This version is a complete algorithm because the nogoods produced by min-resolution are similar to classical nogoods rejecting an assignment. Therefore, it is sufficient to apply the method used by Ginsberg to show the termination of centralized dynamic backtracking. When we add FB CTX messages, the stored nogoods coming with these messages cannot break termination, because they follow the same inference principle used for nogoods coming from BACK messages. Thus, DyBop terminates. □
4 EXPERIMENTAL RESULTS
We considered two different problems for our experiments. The first was a random Max-DisCSP with 10 agents, each containing a single variable, in which all constraint costs are equal to 1 and the density of the constraint graph is 40%. The second was a real structured problem, the Distributed Meeting Scheduling problem (DMS). We have tested three DMS problem classes. Each DMS problem class is represented by the pair (m, p) = (#meetings, #participants), in which there are p agents with multiple variables (at most m variables each). There are 5 values in each domain, each of them representing a possible meeting start time. All experiments were performed on the DisChoco platform [3], in which agents are simulated by threads which communicate only through message passing. We evaluate the algorithms' performance by the mean number of non-obsolete messages (NO-MSGs) and of Equivalent Non-Concurrent Constraint Checks (ENCCCs) [1] over 10 instances. ENCCCs are a weighted sum of processing and communication time. For random problems, we simulate two scenarios of communication: fast communication (message delay cost = 0 CCCs) and slow communication (message delay cost = 1000 CCCs). Experimental results are shown in Fig. 1. We observe that for almost all parameter settings, DyBop is significantly better than both AFB-BJ and ABFS. For DMS problems, Table 1 presents the results in a slow communication system. It shows that DyBop does even better than on random problems. On instances (5, 7), with a cutoff set at 1,800 seconds,
DyBop provides optimal solutions for all instances against 70% for ABFS and AFB. We point out that Gershman et al. showed that AFB-BJ is faster than DPOP, which has been shown faster than NCBB [6].

Table 1. Results on DMS. #msg = #NO-MSGs and #cc = #ENCCCs.

            DyBop               ABFS                  AFB-BJ
(m, p)    #msg     #cc       #msg      #cc         #msg       #cc
(5, 5)      931    2,299      1,315     3,730       1,946     7,982
(5, 6)    1,676    3,797      3,365    12,450       5,403    34,188
(5, 7)    2,597    5,791       316K    1,513K      5,387K   80,345K
5 CONCLUSION
We have presented DyBop, a new algorithm for solving Distributed Constraint Optimization Problems (DCOPs). DyBop is based on branch and bound search with dynamic ordering of agents, and it uses the concept of valued nogood introduced in [2, 7]. Experiments show that the proposed approach of combining lower bounds on inferred valued nogoods computed cooperatively speeds up the search significantly with respect to existing techniques.
REFERENCES
[1] A. Chechetka and K. Sycara, 'No-commitment branch and bound search for distributed constraint optimization', in AAMAS, pp. 1427–1429, (2006).
[2] P. Dago and G. Verfaillie, 'Nogood recording for valued constraint satisfaction problems', in ICTAI, pp. 132–139, (1996).
[3] R. Ezzahir, C. Bessiere, M. Belaissaoui, and E.H. Bouyakhf, 'DisChoco: A platform for distributed constraint programming', in IJCAI'07 Workshop on DCR, pp. 16–21, (2007).
[4] R. Ezzahir, C. Bessiere, E. H. Bouyakhf, I. Benelallam, and M. Belaissaoui, 'Asynchronous breadth-first search DisCOP algorithm', in EUMAS'07, (2007).
[5] A. Gershman, A. Meisels, and R. Zivan, 'Asynchronous forward-bounding with backjumping', in IJCAI'07 Workshop on DCR, pp. 28–39, (2007).
[6] A. Gershman, R. Zivan, T. Grinshpoun, A. Grubshtein, and A. Meisels, 'Measuring distributed constraint optimization algorithms', in AAMAS'08 Workshop on DCR, (2008).
[7] M.C. Silaghi and M. Yokoo, 'Nogood based asynchronous distributed optimization (adopt-ng)', in AAMAS, pp. 1389–1396, (2006).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-903
Integrating Abduction and Constraint Optimization in Constraint Handling Rules
Marco Gavanelli and Marco Alberti and Evelina Lamma1

1 University of Ferrara, Italy, email: name.surname@unife.it

1 Abduction and CHR
Abductive Logic Programming (ALP) [10] is a set of languages supporting hypothetical reasoning; the corresponding proof-procedures feature a simple, sound implementation of negation by failure [6]. An ALP is a logic program KB with a distinguished set A of predicates, called abducibles, that do not have a definition, but whose truth value can be assumed. A set of implications called Integrity Constraints (IC) restricts the possible sets of hypotheses, in order to avoid unrealistic assumptions. Given a goal G, the aim is to find a set Δ ⊆ A such that KB ∪ Δ |= G and KB ∪ Δ |= IC.

ALP and Constraint Logic Programming (CLP) have been merged in works by various authors [11, 12, 5]. However, while almost all CLP languages provide algorithms for finding an optimal solution with respect to some objective function (and not just any solution), the issue has received little attention in ALP. We believe that adding optimisation meta-predicates to abductive proof-procedures would improve research and practical applications of abductive reasoning. However, abductive proof-procedures are often implemented as Prolog meta-interpreters, which makes the strong intertwining with CLP required to fully exploit optimisation meta-predicates clumsy. In line with previous research [1, 9, 4, 2], we implemented the SCIFF abductive proof-procedure [3] in Constraint Handling Rules (CHR) [7], which provides a strong integration between abduction and constraint solving/optimisation. In SCIFF, the abductive logic program can invoke optimisation meta-predicates, which can invoke abductive predicates, in a recursive way.

Previous implementations of abduction in CHR mapped abducibles into CHR constraints, and integrity constraints into CHR rules [1, 9, 4, 2]. In this way, the implementation is very efficient, but there are limitations on the language: only abducibles can occur in the condition of ICs. This limits the applicability of sound negation by failure to abducibles, while negative literals of other predicates inherit "the dubious semantics of Prolog" [4]. E.g., the following IC (where abducibles are in bold)

    a(X, Y), b(Y) → c(X) ∧ p(Y) ∨ q(X)    (1)

can be rewritten as a propagation CHR

    a(X, Y), b(Y) ==> c(X), p(Y) ; q(X)

because in the antecedent only abducibles occur, thus in the head of the propagation CHR there are only CHR constraints. Instead, the IC

    a(X, Y), p(Y) → r(X) ∧ q(Y) ∨ q(X)    (2)

cannot be represented in this way, because p/1 is not abducible. This means that it is not possible to deal with negation by failure in a sound way, since not(p(X)) should be rewritten as p(X) → false.

In SCIFF, an abducible a(X, Y) is represented as the CHR constraint abd(a(X, Y)). We do not map integrity constraints to CHR rules, but to other CHR constraints. IC (2) is mapped to the constraint ic([abd(a(X, Y)), p(Y)], [[r(X), q(Y)], [q(X)]]). The operational semantics (derived from the IFF [8]) is defined by a set of transitions [3]. The transitions are then easily implemented as CHR rules; for example, transition propagation (joined with case analysis) [8] propagates an abducible with an implication:

    abd(P), ic([P1|Rest], Head) ==>
        rename(ic([P1|Rest], Head), ic([RenP1|RenRest], RenHead)),
        reif_unify(RenP1, P, B),
        (B = 1, ic(Rest, Head) ; B = 0).

We first rename the variables (considering their quantification), and then apply reified unification [12]: a CHR constraint that imposes that either the two first arguments unify and B = 1, or that the two arguments do not unify and B = 0.

One of the features of the CHR implementation is that the abductive program written by the user is directly executed by the Prolog engine, and the resolvent of the proof-procedure coincides with the Prolog resolvent. This also means that every Prolog predicate can be invoked, and, in particular, we can invoke optimisation meta-predicates: in some cases, it is not enough to find one abductive solution, but the best solution with respect to some criteria is requested. CLP offers an answer to this practical need by optimisation meta-predicates (minimize and maximize), that select the best solution amongst those provided by a goal.
2 An example from Game Theory
N grim pirates plundered a treasure of M golden coins. They have to divide their treasure, and they want to have fun. Since they are bloodthirsty, they adopt rules in which blood might be shed:
1. The lowest pirate in grade proposes a full division: he decides how many coins are given to each pirate (including himself).
2. All the pirates vote: if the majority votes for the proposal, the money is shared as in the division. Otherwise, the proposer is killed, and the process restarts from step 1.

Knowing that all pirates are greedy and bloodthirsty (i.e., they mostly care about money, and in case of parity they like to see someone die), we have to propose a division.
This is clearly an optimisation problem, as pirates want to get as much money as possible; moreover, the proposer has to hypothesise how the other pirates will vote, in order to stay alive. The lowest in grade will abduce an atom bearing the information for each pirate; at the first proposal, there is a literal for each pirate:

    E(pirate(Grade, Vote, Coins, Alive), 1)    (3)

meaning that the proposer gives to the pirate with given Grade (1 being the highest) a number Coins of coins; we suppose his vote is expressed with a boolean Vote (an integer: 0 = false, 1 = true), and that at the end he will be alive if and only if the boolean Alive = 1. Moreover, we had better try to foresee what could possibly happen in the next protocol iterations, in the unlucky case our proposal does not get the majority. We suppose each proposal happens at a time step indicated by an integer (the last argument of Eq. 3).

Now we can see the rules of the protocol. Predicate npirates/1 defines the number of pirates. The N-th pirate makes the first proposal, the (N−1)-th has the second choice, and so on:

    turn(Grade, Turn) :- npirates(N), Turn = N + 1 − Grade.

Each pirate is alive if his turn of proposing has not come yet:

    E(pirate(Grade, Vote, Coins, Alive), T) ∧ turn(Grade, Turn) ∧ T < Turn → Alive = 1

After his proposal, a pirate is dead: he gets 0 coins and does not vote:

    E(pirate(Grade, Vote, Coins, Alive), T) ∧ turn(Grade, Turn) ∧ T > Turn → Alive = 0 ∧ Vote = 0 ∧ Coins = 0

Each pirate votes for his own proposal:

    E(pirate(Grade, Vote, Coins, Alive), T) ∧ turn(Grade, Turn) ∧ T = Turn → Vote = 1

If in the current proposal I suppose to get more money than in the next, I will vote for the current one. Otherwise, I will vote against: either I hope to get more money, or I hope to see the proposer die.

    E(pirate(Grade, Vote1, Coin1, Alive1), T1) ∧ T2 = T1 + 1 ∧ E(pirate(Grade, Vote2, Coin2, Alive2), T2) → (Coin1 > Coin2 ∧ Vote1 = 1) ∨ (Coin1 ≤ Coin2 ∧ Vote1 = 0)

If I suppose next iteration I will be dead, I will accept any proposal:

    E(pirate(Grade, Vote1, Coins1, 1), T1) ∧ T2 = T1 + 1 ∧ E(pirate(Grade, Vote2, Coins2, 0), T2) → Vote1 = 1

The predicate pirates(Lcoins, Lvotes, T) is the program entry point. Its arguments are the coins assignment (list Lcoins), the result of the voting (list Lvotes), and the iteration number T (initially 1). In the following code, CLP predicates appear underlined in the original paper. The abduce predicate abduces the atom (3) for each pirate.

    pirates([], [], T) :- npirates(N), T > N.
    pirates(Lcoins, Lvotes, T) :-
        npirates(N), ncoins(M), T ≤ N,
        % Define variables' domains
        length(Lcoins, N), domain(Lcoins, 0, M), sumlist(Lcoins, M),
        length(Lvotes, N), domain(Lvotes, 0, 1), sumlist(Lvotes, Nvotes),
        2*Nvotes > N − T + 1 ⇔ Win,     % one wins if he gets the majority
        % The pirate gets the coins only if he wins
        nth(T, Lcoins, CoinsPirate), GotCoins = Win*CoinsPirate,
        % The proposer will be alive only if he wins
        length(Lalive, N), nth(T, Lalive, Win),
        maximize( ( T1 is T+1,
                    pirates(_, _, T1),
                    abduce(Lcoins, Lvotes, Lalive, N, T),   % Abduce a division
                    labeling(Lcoins), labeling(Lvotes)
                  ), GotCoins).   % Maximise the number of obtained coins

The result for N = 4 pirates and M = 9 coins is the following:

    E(pirate(4,1,7,1),1)  E(pirate(4,0,0,0),2)  E(pirate(4,0,0,0),3)  E(pirate(4,0,0,0),4)
    E(pirate(3,0,0,1),1)  E(pirate(3,1,9,1),2)  E(pirate(3,0,0,0),3)  E(pirate(3,0,0,0),4)
    E(pirate(2,1,1,1),1)  E(pirate(2,1,0,1),2)  E(pirate(2,1,0,0),3)  E(pirate(2,0,0,0),4)
    E(pirate(1,1,1,1),1)  E(pirate(1,0,0,1),2)  E(pirate(1,0,9,1),3)  E(pirate(1,1,9,1),4)
Pirate 4 (first row of the table) takes 7 coins for himself, gives 1 coin each to pirates 1 and 2, and nothing to pirate 3. He is sure to get 3 votes: his own, plus those of pirates 1 and 2. How can he be so sure of surviving? Because if he dies (second row), pirate 3 gets all the money, while 1 and 2 get nothing, and nevertheless pirate 2 votes for the proposal! In fact, in iteration 3, pirate 2 is sure to die: whatever proposal he makes, pirate 1 will vote against, getting in the last iteration all the money, and making pirate 2 die. Besides the correct game theory result, this example shows remarkable features of SCIFF. First, a SCIFF program is a real CLP(FD) program. The user is not restricted to a subset of the available constraints, and, in particular, she can use global constraints (e.g., sumlist) in the knowledge base. Second, we have recursion through the optimisation meta-predicate maximize. SCIFF tightly integrates CLP(FD) and abduction, thanks to its CHR implementation. Finally, SCIFF is efficient: it took 49s to solve the above problem with N = 4 pirates on a Pentium M715, 1.5GHz, 512MB RAM computer, which is reasonable considering that the problem is at the fourth level of the polynomial hierarchy.
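The game-theoretic outcome can also be cross-checked independently of SCIFF by direct backward induction. The following Python sketch is our own illustration (not part of the paper's code); it encodes the strict-majority rule and the bloodthirsty tie-breaking (a vote is bought only with strictly more coins than the voter expects in the next iteration) and reproduces the division above:

    def divide(g, M):
        # Optimal division when pirates 1..g are alive and pirate g proposes.
        # Returns (coins, alive): dicts mapping grade -> coins / survival.
        if g == 1:
            return {1: M}, {1: True}
        coins, alive = divide(g - 1, M)        # outcome if pirate g dies
        need = g // 2 + 1                      # strict majority of g voters
        # Price of voter i: free if i is dead in the next iteration (he then
        # accepts any proposal), otherwise one coin more than his next share.
        price = sorted((0 if not alive[i] else coins[i] + 1, i)
                       for i in range(1, g))
        cheapest = price[:need - 1]            # the proposer's own vote is free
        total = sum(c for c, _ in cheapest)
        if total > M:                          # cannot buy a majority: g dies
            return {**coins, g: 0}, {**alive, g: False}
        division = {i: 0 for i in range(1, g + 1)}
        for c, i in cheapest:
            division[i] = c
        division[g] = M - total
        return division, {i: True for i in range(1, g + 1)}

    coins, _ = divide(4, 9)
    print(coins)   # {1: 1, 2: 1, 3: 0, 4: 7}: pirate 4 keeps 7 coins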
REFERENCES
[1] S. Abdennadher and H. Christiansen, 'An experimental CLP platform for integrity constraints and abduction', in FQAS 2000, pp. 141–152.
[2] M. Alberti, F. Chesani, D. Daolio, M. Gavanelli, E. Lamma, P. Mello, and P. Torroni, 'Specification and verification of agent interaction protocols in a logic-based system', Scalable Computing: Practice and Experience, 8(1), 1–13, (2007).
[3] M. Alberti, F. Chesani, M. Gavanelli, E. Lamma, P. Mello, and P. Torroni, 'Verifiable agent interaction in abductive logic programming: the SCIFF framework', ACM Trans. on Computational Logic, 9(4), (2008).
[4] H. Christiansen and V. Dahl, 'HYPROLOG: A new logic programming language with assumptions and abduction', in ICLP, (2005).
[5] U. Endriss, P. Mancarella, F. Sadri, G. Terreni, and F. Toni, 'The CIFF proof procedure for abductive logic programming with constraints', in JELIA 2004, eds., J. Alferes and J. Leite, volume 3229 of LNAI, (2004).
[6] K. Eshghi and R. Kowalski, 'Abduction compared with negation by failure', in ICLP'89, eds., G. Levi and M. Martelli, pp. 234–255, (1989).
[7] T. Frühwirth, 'Theory and practice of constraint handling rules', Journal of Logic Programming, 37(1–3), 95–138, (October 1998).
[8] T. Fung and R. Kowalski, 'The IFF proof procedure for abductive logic programming', Journal of Logic Programming, 33(2), (1997).
[9] M. Gavanelli, E. Lamma, P. Mello, M. Milano, and P. Torroni, 'Interpreting abduction in CLP', in APPIA-GULP-PRODE Joint Conf. on Declarative Programming, Reggio Calabria, Italy, (2003).
[10] A. Kakas, R. Kowalski, and F. Toni, 'The role of abduction in logic programming', in Handbook of Logic in Artificial Intelligence and Logic Programming, eds., D. Gabbay, C. Hogger, and J. Robinson, (1998).
[11] A. C. Kakas, A. Michael, and C. Mourlas, 'ACLP: Abductive Constraint Logic Programming', J. of Logic Programming, 44(1–3), (2000).
[12] A. C. Kakas, B. van Nuffelen, and M. Denecker, 'A-System: Problem solving through abduction', in IJCAI-01, ed., B. Nebel, (2001).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-905
Symbolic Classification of General Multi-Player Games1
Peter Kissmann and Stefan Edelkamp2

Abstract. For general two-player turn-taking games, first solvers have been contributed. Algorithms for multi-player games like Maxn, however, cannot classify general games robustly, and Maxn's extension Soft-Maxn, which can play optimally against unknown and weak opponents, demands large amounts of memory. As RAM is a scarce resource, this paper proposes a memory-efficient implementation of the Soft-Maxn algorithm, exploiting the functional representation of state and evaluation sets with BDDs.
1 INTRODUCTION
In General Game Playing (GGP), the game is provided implicitly in form of a set of rules transforming the initial state eventually into some terminal one. The games we consider, described in the Game Description Language GDL [4], are fully observable, discrete, finite, and deterministic, and provide discrete outcomes 0, . . . , 100 for each player. Thus, an automated player has no prior information on which game it actually plays. While much effort is spent in developing good players (e.g. [3]), the ultimate goal is to solve games and to provide a playing strategy. Most of the work in GGP research has focused on two-player games or on transformations into such. In [2] we proposed a symbolic classification algorithm for general two-player turn-taking games. Games with three or more players have received less attention. One line of research extends minimax search from the two-player to the multi-player scenario. As with two-player turn-taking games, opponents attempt to maximize their individual outcome. At each node in the game tree an evaluation vector (the Maxn vector) denotes the reward for each player. The successor with the maximum for the active player is selected. This results in the Maxn algorithm [5]. Even though this computes an equilibrium strategy, its deficiency is that it calculates only one of many equilibria and keeps no information about alternatives. As a bypass, Soft-Maxn [6] backs up the non-dominated information from the leaves to the root. All sets that are not dominated with respect to the active player are propagated bottom-up. Soft-Maxn has been shown to calculate a superset of all equilibria [6]. Unfortunately, the space demands of Soft-Maxn are large, as for each state a set of reward vectors has to be stored. Binary Decision Diagrams (BDDs) [1] have shown considerable advances in form of memory savings in the analysis of large systems. The crucial advantage compared to ordinary explicit-state retrograde analysis algorithms is a compact representation of state sets. As this allows studying much larger games, in this paper we consider a symbolic implementation of the Soft-Maxn algorithm to classify general multi-player games, using BDDs to compactly compute and store evaluation sets. We apply the extension to a selection of small games given in GDL.
1 Thanks to DFG for support in ED 74/3 and 74/2.
2 Dortmund University of Technology, Germany, email: {peter.kissmamn, stefan.edelkamp}@cs.uni-dortmund.de
Figure 1. Example graph for 3-player Nim with 5 matches. The nodes show the active player and the classified rewards, the edges the number of matches taken (dotted: 1, dashed: 2, solid: 3).
2 SOFT-MAXn GAME TREE SEARCH
The value of a game state is formalized by an evaluation vector, where component i denotes the value for player i. The Soft-Maxn algorithm [6] avoids the prediction of opponents' tie-breaking strategies used in Maxn and thus allows to compute a robust player. When a tie is encountered, instead of choosing a single vector, a set of vectors is selected. This set represents the possible outcomes for a particular branch of a tree. The set of Soft-Maxn vectors for player i is computed as follows. For a leaf, the Soft-Maxn vector set consists of the exact evaluation vector. At an internal node, the Soft-Maxn vector set for that node is the union of all sets of its non-dominated children with respect to the current player3. At the game tree's root, the player to move can use any decision rule to select the best of the non-dominated moves. An example for Soft-Maxn is given in Figure 1.

Here, we present our symbolic extension of Soft-Maxn (cf. Algorithm 1). All states are stored as BDDs and we extend the state description by the possible rewards for each player. After calculating the set of reachable states, the rewards for the players are set for each of the goal states by conjunction with the corresponding reward BDDs. These classified goal states are stored in the set class. For the backward search, we then take the set of states front, which are those unclassified states (unclass) whose successors are all classified. These are determined by the strong preimage: front ← ∀s′, v′. trans ⇒ class (s′ and v′ represent the effect state and reward variable sets). From these we take one state after the other and construct an array of all successors (succ), all of which we check for domination. This is done by calculating the conjunction with the less relation, which is defined for player i on the reward variable sets v and v′ in the following way:

    less_i := ⋁_{j=1}^{|V_i|−1} ( v_i^j ∧ ⋀_{k=j+1}^{|V_i|} ¬v_i^k ∧ ⋀_{l=1}^{j} ¬v′_i^l ∧ ⋁_{m=j+1}^{|V_i|} v′_i^m )

with V_i being the set of possible rewards for player i and v_i^j the BDD representation of the j-th reward for player i.
3 A Soft-Maxn vector set V1 strictly dominates another Soft-Maxn vector set V2 with respect to player i iff for all v1 ∈ V1 and v2 ∈ V2, the i-th component of v1 is strictly greater than the i-th component of v2.
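Before the symbolic version, the domination test of footnote 3 can be stated explicitly over vector sets. A plain-Python sketch (our own illustration, not the authors' code) of the Soft-Maxn backup at an internal node, where i indexes the active player's component:

    def strictly_dominates(V1, V2, i):
        # V1 strictly dominates V2 w.r.t. player i iff every vector of V1
        # gives player i a strictly higher reward than every vector of V2.
        return all(v1[i] > v2[i] for v1 in V1 for v2 in V2)

    def soft_maxn_backup(children_sets, i):
        # Union of the vector sets of all children that are not strictly
        # dominated by some other child w.r.t. the active player i.
        kept = [V for V in children_sets
                if not any(strictly_dominates(W, V, i)
                           for W in children_sets if W is not V)]
        return [v for V in kept for v in V]

    # Two tied children survive; the dominated one is dropped.
    assert soft_maxn_backup([[(0, 100, 0)], [(0, 100, 0)], [(0, 0, 100)]], 1) \
        == [(0, 100, 0), (0, 100, 0)]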
Algorithm 1: Symbolic Soft-Maxn

    reach ← reachable()
    class ← reach ∧ goal
    for p = 1, . . . , |player| do
        class ← class ∧ ⋁_i reward(p, i)
    unclass ← reach ∧ ¬class
    front ← ⊥
    while unclass ≠ ⊥ do
        for p = |player|, . . . , 1 do
            front ← unclass ∧ move(p) ∧ ∀s′, v′. trans ⇒ class
            if front ≠ ⊥ then
                foreach state ∈ front do
                    for a = 1, . . . , |A| do
                        succ_a ← ∃s, v. state ∧ trans_a
                        succ_a ← ∃s′, v′. succ_a ∧ s = s′ ∧ v = v′
                    dominated ← ∅
                    for i = 1, . . . , |A| − 1 do
                        if i ∈ dominated then continue
                        for j = i + 1, . . . , |A| do
                            if j ∈ dominated then continue
                            if succ_i ∧ succ_j ∧ less_p ≠ ⊥ then
                                dominated ← dominated ∪ {i}; break
                            if succ_j ∧ succ_i ∧ less_p ≠ ⊥ then
                                dominated ← dominated ∪ {j}
                    class ← class ∨ (state ∧ ⋁_{i ∉ dominated} ∃s. succ_i)
        unclass ← unclass ∧ ¬class
If this conjunction is not false, the index of the dominated successor will be inserted into the list of dominated states and it will not be considered any more. Once we are done, we calculate the BDD representing the set of dominating rewards by calculating the disjunction of the rewards of all non-dominated successors. This BDD is attached to the current state by calculating the conjunction. The classified current state will then be inserted into the set of classified states. With this we continue until finally all states are classified.

3 EXPERIMENTAL RESULTS
We implemented the algorithm using JavaBDD (http://javabdd.sourceforge.net), which provides an interface to Fabio Somenzi's BDD library CUDD (http://vlsi.colorado.edu/~fabio/CUDD/). We performed the experiments on an AMD Opteron with 2.4 GHz and 4 GB RAM. We chose two games to experiment on: a three-player version of Tic-Tac-Toe and a multi-player version of Nim.
3.1 Three-player Tic-Tac-Toe

Three-player Tic-Tac-Toe works similar to the two-player version with the exception of a third player, the Eraser, who erases one of the symbols from the board. It is the Eraser's turn each time the O player is done. We tried three different reward distributions. The results are shown in Table 1. The first distribution is similar to the two-player version and results in a filled board, i.e. a victory for the Eraser. In the second distribution, the X and O players do not mind if their opponent or the Eraser wins. Thus, the Eraser has no chance of winning, while the others can both win, most likely depending on the choice of the Eraser. With the third distribution, the X and O players cooperate against the Eraser, so that in the end the Eraser cannot prevent both of them from completing a line. This results in a victory for both players, though we cannot say who actually created a line.

Table 1. Results for the three-player version of Tic-Tac-Toe. The Rewards give the rewards for the X player, the O player and the Eraser, respectively, for a line of X, a line of O and a full board with no line (from top to bottom), while Opt Outcome determines the possible outcomes in case of optimal play. t is the time needed for classification (in minutes), sreach the number of reachable states, sclass the number of classified states and nclass the number of BDD nodes needed to represent them. (A classified state is a state along with one of its associated classifications; thus, a state with several classifications results in several classified states.)

Rewards                                     Opt Outcome                 t   sreach   sclass  nclass
(100, 0, 0), (0, 100, 0), (50, 50, 100)     (50, 50, 100)              20   39,742   44,319   5,257
(100, 0, 0), (0, 100, 0), (0, 0, 100)       (100, 0, 0), (0, 100, 0)   25   39,742    5,693   5,257
(100, 100, 0), (100, 100, 0), (0, 0, 100)   (100, 100, 0)              10   39,742   39,742   5,693

3.2 Multi-player Nim
In Nim, we have a row of matches. In turn, each player may take one to three of these. The player to take the last match wins the game. Table 2 shows the results obtained by the symbolic algorithm for different numbers of players n and different numbers of matches m. The resulting classification of the initial state is always a set of n − 1 different classifications, thus it suffices to give the player who surely loses. All the examples took less than one second to compute.

Table 2. Results for the game Nim, showing the number of players n, the number of matches m, the losing player l, the number of BDD nodes b, and the number of classified states s.

 n    m   l    b     s        n    m   l    b      s
 3    7   3   30    23        4    7   2   41     28
 3   10   1   44    41        4   10   4   57     58
 3   15   1   46    71        4   15   4   66    118
 3   20   1   52   101        4   20   3   73    178
 3   25   1   54   131        4   25   2   73    238
 3   30   1   55   161        4   30   1   74    298
 3   50   1   62   281        4   50   3   81    538
 3   75   1   67   431        4   75   4   83    838
 3  100   1   69   581        4  100   4   88  1,138
 5   50   2  174   831        6   50   3  216  1,200
References
[1] Randal E. Bryant, ‘Graph-based algorithms for boolean function manipulation’, IEEE Transactions on Computers, 35(8), 677–691, (1986). [2] Stefan Edelkamp and Peter Kissmann, ‘Symbolic exploration for general game playing in PDDL’, in ICAPS-Workshop on Planning in Games, (2007). [3] Hilmar Finnsson and Yngvi Bj¨ornsson, ‘Simulation-based approach to general game playing’, in AAAI, (2008). [4] Nathaniel C. Love, Timothy L. Hinrichs, and Michael R. Genesereth, ‘General game playing: Game description language specification’, Technical Report LG-2006-01, Stanford Logic Group, (April 2006). [5] Carol A. Luckhardt and Keki B. Irani, ‘An algorithmic solution of Nperson games’, in AAAI, pp. 158–162, (1986). [6] Nathan R. Sturtevant and Michael Bowling, ‘Robust game play against unknown opponents’, in AAMAS, pp. 713–719. ACM, (2006).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-907
907
Redundancy in CSPs Assef Chmeiss and Vincent Krawczyk and Lakhdar Sais 1 Abstract. In this paper, we propose a new technique to compute irredundant sub-sets of constraint networks. Since, checking redundancy is Co-NP Complete problem, we use different polynomial local consistency entailments for reducing the computational complexity. The obtained constraint network is irredundant modulo a given local consistency. Redundant constraints are eliminated from the original instance producing an equivalent one with respect to satisfiability. Eliminating redundancy might help the CSP solver to direct the search to the most constrained (irredundant) part of the network.
1
Constraint Satisfaction
Constraint-satisfaction problems (CSPs) involve the assignment of values to variables which are subject to a set of constraints. The modeling and solving phases are known to be heavily interconnected. Indeed, the efficiency of the solver depends on the way the problem instance is modeled. Until recently, these two phases are considered separately. Many improvements have been proposed for the solving side and many other approaches have been suggested to simplify the crucial modeling step [1, 3]. As there exists several ways to model the same problem, this means that the user is not safe from introducing redundancies in such modeling process. Also redundancies might result from an incorrect encoding or merging different parts from several sources. The obtained constraint network (CN), might contain parts that can be removed without losing the information it carries. However, several forms of redundancies can be characterized. In this paper, we address constraint redundancies. A CSP is redundant if and only if some its constraints can be removed while preserving its set of models. As stated by Paolo Liberatore [4] in the context of propositional clausal formulae, the deletion of redundant constraints is clearly important for several reasons. First, removing redundant constraints can simplify the CN by reducing its size. A large amount of redundancies might obscure the real set of constraints (the irredundant part of the CN). In other cases, redundancy might indicate that some pieces of the CN are more important than the others. Consequently, depending on the application domain, redundancy might be either a positive or a negative concept. Our main goal is to measure the relationships between constraints redundancies and the efficiency of CSP solvers. As a side effect, our approach can be seen as a possible technique that can be used to check the degree of redundancy of a given CN. On the current available CSP instances, our approach might give a nice way to approximate their irredundant part. However, checking constraint redundancy, meaning that deciding if a given constraint can be deduced from the remaining part of the CN is known to be Co-NP complete 1
Universit´e Lille-Nord de France, CNRS UMR 8188, Artois, Rue Jean Souvraz, SP-18, F-62307 Lens, email:{chmeiss, krawczyk, sais}@cril.fr
[4]. To deal with this main drawback, in this paper, different polynomial local consistency entailments are used for reducing the computational complexity. The obtained CN is irredundant modulo a given local consistency.
1.1
Definitions and notations
A CSP is defined as a tuple P =< X , C >. X is a finite set of n variables {x1 , . . . , xn }. Each variable xi ∈ X is defined on a finite set of di values, denoted dom(xi ) = {vi1 , . . . vidi }. C is a finite set of m constraints {c1 , . . . , cm }. Each constraint ci ∈ C of arity k is defined as a couple (scope(ci ), Rci ) where scope(ci ) = {xi1 , . . . , xik } ⊆ X is the set of variables involved in ci and Rci ⊆ dom(xi1 ) × . . . × dom(xik ) the set of allowed tuples i.e. t ∈ Rci iff the tuple t satisfies the constraint ci . A CSP P is called binary iff ∀ci ∈ C, |scope(ci )| ≤ 2. A model (solution) is an assignment of a value for each variable x ∈ X which satisfies all the constraints. In this paper, we limit our presentation to binary CSPs. However, our proposed approach can be easily extended to n-ary CSPs. We define φ(P) as the CSP P obtained after applying a local consistency φ. For φ = AC this means that all arc-inconsistent values are removed from P. If there is a variable with an empty domain in φ(P), we denote φ(P) =⊥. The sub-network obtained after the assignment of a variable x to a value v is denoted P|x=v .
1.2
Tuple Arc Consistency
In this section, we propose a new filtering technique, called Tuple Arc Consistency (TAC). This local consistency is introduced to be exploited in our redundancy framework. The main idea is that instead of fixing one value as for SAC, we fix one tuple of a constraint c i.e. we assign the variables involved in c and we apply AC on the obtained sub-network. Definition 1 Let P be a CSP. A constraint cij ∈ C is Tuple Arc Consistent (TAC) iff ∀(a, b) ∈ Rcij , AC(P |xi =a,xj =b ) =⊥. P is TAC iff ∀c ∈ C, c is TAC.
2
Constraints redundancies
The redundancy, in a CSP, occurs when some informations are present several times, that is, a subset of constraints can be deduced from others. To determine if a constraint c is redundant or not, we need to solve the CSP with the negation of a constraint c. This problem is known to be Co-NP complete [4]. However, it’s possible to detect in polynomial time some redundant constraints while using entailment modulo a given local consistency. In this section, we define formally the notion of constraint redundancy in CSP and we
908
A. Chmeiss et al. / Redundancy in CSPs
instance name (#var, #ctr) bqwh-15-106 (106, 644) domino-1000-800 (1000, 1000) driverlogw-02c-sat ehi-85-297-0 (297, 4094) frb30-15-1 (30, 208) rlfap-graph1 (200, 1134) rlfapscen11-f10 (680, 4103)
AC, RedAC time % 0,03 572 (11%) 123,01 0 (100%) 5,5 1910 (53%) 0,26 4094 (0%) 0,01 208 (0%) 0,15 1134 (0%) 0,476 4103 (0%)
AC, RedT AC time % 0,16 570 (11%) 123,85 0 (100%) 10,36 1756 (57%) 12,29 776 (81%) 5,87 208 (0%) 101,83 885 (22%) 197,742 2954 (28%)
TAC, RedAC time % 0,13 559 (13%) 123,06 0 (100%) 6,78 1428 (65%) 10,72 0,11 208 (0%) 964 1134 (0%) 69,827 -
TAC, RedT AC time % 0,25 555 (14%) 123,75 0 (100%) 9,58 1367 (66%) 11,25 5,88 208 (0%) 1042,26 522 (54%) 72,548 -
Table 1. Results on benchmarks from the second international CSPs competition
show how we can use local filterings to detect some redundant constraints. In satisfiability problem redundancy modulo unit propagation has been shown very useful in practice namely on real-world instances [2]. A CSP P is redundant if it contains a subset of redundant constraints otherwise it is called irredundant. For a CSP P =< X , C > and a constraint c ∈ C , we define P \ {c} as the CSP P =< X , C\c >. We define the negation of a constraint c, noted ¬c, as the constraint c such that scope(c ) = scope(c) and Rc = {t|t ∈ ∀x∈scope(c) dom(x), t ∈ / Rc }. Definition 2 Let P be a CSP and c ∈ C. c is redundant iff P \ {c} ∪ ¬c is unsatisfiable. P is redundant iff ∃c ∈ C such that c is redundant. Otherwise P is said to be irredundant. To avoid solving the problem P \ {c} ∪ ¬c to see if c is redundant or not, we consider an incomplete but polynomial time algorithm to detect redundant constraints. We apply a local filtering φ such AC and TAC. Any other local consistency can be used. Checking if a constraint is redundant can be done using a refutation procedure. Namely, a constraint c ∈ C is redundant iff the constraint network in which c is substituted by its negation is unsatisfiable. This is clearly intractable. That’s why, we define weaker form of refutation inducing a weaker form of redundancy. Definition 3 Let P be a CSP and φ a local consistency. A constraint c ∈ C is φ-redundant iff φ(P\{c} ∪ {¬c}) =⊥. A CSP P is called φ-redundant (respectively φ-irredundant) iff it (respectively does not) contains φ-redundant constraints. Algorithm 1: Computing a φ irredundant constraint network Input: P =< X , C > Output: A φ-irredundant CSP P 1 for each c ∈ C do 2 P ← P \ {c} ∪ ¬c; 3 if φ(P ) =⊥ then 4 C ← C\c;
3
Preliminary experiments
In this section, we show the practical interest of our approach. We present the reduction power in terms of the percentage of deleted φ-redundant constraints with φ instantiated to AC and T AC. As a CSP solver, we used the MAC algorithm with dom/WDeg. In table 1, which presents results on some instances, we provide the percentage of φ-redundant constraints. In the four double-columns, the results obtained by applying a φ consistency as a preprocessing and φ-redundancy checking are given. For example, in the second double column (AC, RedT AC ) means that we apply AC on the original problem then we check constraints redundancy using T AC. For each case, we give the run time (in seconds), the number of remaining constraints and the percentage of φ-redundant constraints. Instances solved in the preprocessing step, are indicated with a dash ”-”. On the domino-1000-1000 instance, we can see that all constraints are deleted. In fact, all the constraints become redundant since, after the preprocessing, there is one value for each domain and the instance is proven satisfiable. On the contrary, for some instances like frb30-15-1, the technique does not detect any redundant constraint. For the bqwh-15-106 and driverlogw-02c-sat instances, we remark that a stronger filtering like TAC detects more redundant constraints than AC. This remark is confirmed for other classes like ehi-85 and rlfap-graph1 where the detection of redundant constraints by TAC is more significant than with AC. For classes ehi-85 and rlfap-scen11, the filtering technique TAC prove inconsistency during the preprocessing step.
4
Conclusions
In this paper, a new approach to compute irredundant sub-sets of CN is proposed. Using polynomial time local consistency techniques for redundancy checking, significant reductions in the size of the have been obtained on many classes of CSP instances. The obtained subnetwork is irredundant modulo a local consistency entailment. The new filtering TAC we propose is clearly powerful for detecting redundant constraints. Used as a preprocessing some classes of instances are solved without search.
REFERENCES In algorithm 2, φ can be replaced by any local consistency filtering like AC and TAC. The complexity of Algorithm 1 is polynomial. If we use an AC filtering whose the time complexity is O(md2 ), then the time complexity of the algorithm 1 is bounded by O(m2 d2 ). Let us note that using different constraint orderings in the algorithm 1, might lead to different φ-irredundant constraints subnetworks.
[1] C. Bessi`ere, R. Coletta, B. O’Sullivan, and M. Paulin, ‘Query-driven constraint acquisition’, in IJCAI’2007, pp. 50–55, (2007). [2] O. Fourdrinoy, E. Gr´egoire, B. Mazure, and L. Sa¨ıs, ‘Reducing hard sat instances to polynomial ones’, in IEEE-IRI’07, pp. 18–23, (2007). [3] A. M. Frisch, C. Jefferson, B. Mart´ınez Hern´andez, and I. Miguel, ‘The rules of constraint modelling’, in Ijcai’2005, pp. 109–116, (2005). [4] P. Liberatore, ‘Redundancy in logic i: Cnf propositional formulae’, Artif. Intell., 163(2), 203–232, (2005).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-909
909
Reinforcement Learning and Reactive Search: an adaptive MAX-SAT solver Roberto Battiti and Paolo Campigotto 1 1
Introduction
This paper investigates Reinforcement Learning (RL) applied to online parameter tuning in Stochastic Local Search (SLS) methods. In particular, a novel application of RL is proposed in the Reactive Tabu Search (RTS) scheme, where the appropriate amount of diversification in prohibition-based local search is adapted in a fast online manner to the characteristics of a task and of the local configuration. The experimental tests demonstrate promising results on Maximum Satisfiability (MAX-SAT) instances when compared with state-of-the-art SLS SAT solvers, such us AdaptNovelty+ , rSAPS and gNovelty+ .
2
Reinforcement Learning for Reactive Tabu Search
This paper investigates a novel application of Reinforcement Learning in the framework of Reactive Tabu Search (RTS) proposed in [1]. Tabu Search (TS) is a prohibition-based search technique based on local search. At a given iteration some local search moves (e.g., variable flips in the case of the SAT) are prohibited, only a non-empty subset of them is allowed: the local search move executed at iteration t will not be allowed for the next T iterations, where T is the prohibition parameter. In this work, T is assumed to take values over the interval [Tmin , Tmax ]. RTS is a proposal to determine a dynamic value of the prohibition parameter which is appropriate to a specific instance and to the local characteristics of the fitness surface around the current configuration. Among all the RL methods developed, we consider the LeastSquares Policy Iteration (LSPI) algorithm [4], a form of model-free approximate policy iteration using a set of training samples collected in any arbitrary manner. In [6], we present an off-line application of LSPI to tune the prohibition parameter, in particular by considering an application to the MAX-SAT problem. The parameter-tuning policy is modeled as a Markov Decision Process (MDP) where the states summarize relevant information about the recent history of the search, and a near-optimal policy is determined by using the LSPI method. In this work, we consider an online version of the method to determine a critical algorithm parameter while the algorithm is running on a selected instance. The impact of different choices for designing the Markov states and the definition of the basis function for the approximation architecture are discussed. The effect of changing the prohibition parameter on the algorithm’s behavior can only be evaluated after a reasonable number of local moves. We therefore divide the algorithm’s trace into epochs 1
Dipartimento di Ingegneria e Scienza dell’Informazione, University of Trento, Italy, email: {battiti, campigotto}@disi.unitn.it
(E1 , E2 , . . . ) composed of a suitable number of local moves, and allow changes of T only between epochs. The state at the end of epoch Ei is a collection of features extracted from the algorithm’s execution up to that moment. Assume n and m the number of variables and clauses of the input SAT instance, respectively. Let f (x) the score function counting the number of unsatisfied clauses in the truth assignment x. Each state of the MDP is created by observing the behavior of the Tabu search algorithm over an epoch of 2∗Tmax consecutive variable flips. In particular, let us define the following: • xbsf is the “best-so-far” (BSF) configuration before the current epoch; • Tf is the current fractional prohibition value (the actual prohibition period is T = nTf ); • f epoch is the average value of f during the epoch; • H epoch is the average Hamming distance during the current epoch from the configuration at the beginning of the current epoch itself. These variables have been chosen because of the Reactive Search paradigm’s concern on the trade-off between diversification (the ability to explore new configurations in the search space by moving away from local minima) and bias (the preference for configurations with low objective function values). The compact state representation chosen to describe an epoch is the following triplet: « „ f epoch − f (xbsf ) H epoch where Δf = , Tf , . s ≡ Δf, n m The first component is the mean change of f in the current epoch with respect to the best value; all components of the state have been normalized. The actions set is composed by two choices: A = {increase, decrease}, with the following effects: j max {Tf · 1.1, Tf + 1/n} if a = increase (1) Tf = if a = decrease min {Tf /1.1, Tf − 1/n} Changes in Tf are designed in order to ensure variation of at least 1 in the actual prohibition period T . In addition, Tf is bounded between a minimum and a maximum value (0 and .2 in our experiments). An alternative definition for the actions set consists of setting Tf from scratch by one of the 20 uniformly distributed values in the range [0.01, 0.2]: Tf = 0.01 ∗ i, where i ∈ [1, 20]
(2)
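Stated as code, the two action types are simple update rules. The following is a minimal Python sketch (the function names are ours; the factor 1.1 and the bounds 0 and 0.2 follow the setting described above):

TF_MIN, TF_MAX = 0.0, 0.2   # bounds on Tf used in the experiments

def update_tf(tf, action, n):
    # Eq. 1: multiplicative step, guaranteeing a change of at least 1
    # in the actual prohibition period T = n * Tf.
    if action == "increase":
        tf = max(tf * 1.1, tf + 1.0 / n)
    else:  # action == "decrease"
        tf = min(tf / 1.1, tf - 1.0 / n)
    return min(max(tf, TF_MIN), TF_MAX)

def set_tf(i):
    # Eq. 2: set Tf from scratch to one of 20 uniformly spaced values.
    assert 1 <= i <= 20
    return 0.01 * i

print(update_tf(0.1, "increase", 500))  # 0.11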
The reward signal is given by the normalized change of the best value achieved in the observed epoch with respect to the "best-so-far" value before the epoch: (f(x_bsf) − f(x_localBest)) / m.
[Figure 1. The comparison among our RL-based method and other SAT solvers: mean best-so-far value vs. iterations (1000 to 100000) for rsaps, AdNov+, h_rts, LSPI for SAT, and gNov+.]
[Figure 2. Performance of the two implemented actions for the update of the Tf value, increasing/decreasing vs. setting Tf from scratch: mean best-so-far value vs. iterations.]
For the case of the action set defined via Eq. 1, we use the basis function set presented in [6]. If the action set is defined by Eq. 2, assume action a is "set Tf to 0.01 · i", i ∈ [1, 20], and let Φj(s, a) be the j-th entry of the considered basis function vector Φ(s, a). We have:

Φj(s, a) = Δf             if j = 1
           H_epoch        if j = 2
           H_epoch · Δf   if j = 3
           (Δf)²          if j = 4
           (H_epoch)²     if j = 5
           i/100          if j = 5 + i
           0              otherwise   (3)
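As an illustration, the basis vector of Eq. 3 can be assembled as in the following Python sketch (the function name and the 0-based list indexing are our own; entry j of the vector corresponds to list index j − 1):

def phi(state, i, num_actions=20):
    # Basis vector Phi(s, a) of Eq. 3 for action a = "set Tf to 0.01*i".
    # state = (delta_f, tf, h_epoch) as defined above; tf itself is unused.
    delta_f, _tf, h_epoch = state
    features = [0.0] * (5 + num_actions)  # entries j = 1 .. 5 + num_actions
    features[0] = delta_f                 # j = 1
    features[1] = h_epoch                 # j = 2
    features[2] = h_epoch * delta_f       # j = 3
    features[3] = delta_f ** 2            # j = 4
    features[4] = h_epoch ** 2            # j = 5
    features[4 + i] = i / 100.0           # j = 5 + i; all other entries stay 0
    return features

print(phi((0.05, 0.1, 0.3), i=4))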
The training phase is executed online, while solving a single SAT instance. This design choice implies that the best policy learnt by the SAT solver is not defined a priori by an off-line training phase over selected SAT instances, but is determined by learning while the target optimization task is performed. During an initial set-up phase, 100 training examples for the input SAT instance are extracted to calculate the initial policy. Then, the solving phase is started. As soon as the search history provides a new example, it is added to the training set and the policy is updated.
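The overall online scheme can be summarized by the following skeleton; note that lspi(), run_epoch(), extract_state(), apply_action(), collect_random_sample() and solved() are hypothetical placeholders standing in for the LSPI solver and the tabu-search epoch of 2·Tmax flips described above, not the authors' actual code:

# Hypothetical skeleton of the online LSPI loop.
samples = [collect_random_sample() for _ in range(100)]  # initial set-up phase
policy = lspi(samples)                                   # initial policy
state = extract_state()                                  # features of the first epoch
while not solved():
    action = policy.best_action(state)   # choose how to update Tf
    apply_action(action)                 # via Eq. 1 or Eq. 2
    reward, next_state = run_epoch()     # 2*Tmax tabu-search flips
    samples.append((state, action, reward, next_state))
    policy = lspi(samples)               # re-run LSPI on the enlarged training set
    state = next_state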
3 Experimental results
For our tests, we use the benchmark described in [5], formed by MAX-3-SAT random instances with 500 variables and 5000 clauses. The Tf parameter has been bounded in [0, 0.2]. To evaluate our novel MAX-SAT solver based on Reinforcement Learning, we report here a comparison with some of the best-known SLS algorithms for MAX-SAT. In particular, the SLS techniques considered are AdaptNovelty+ [7], RSAPS (a reactive version of SAPS) [3], H_RTS [1], and gNovelty+ [2]. For each algorithm, 10 runs with different random seeds are performed for each of the 50 instances taken from the benchmark set, for a total of 500 tests. Fig. 1 shows the average results as a function of the number of iterations (flips). Fig. 1 indicates that our RL-based approach is competitive with the other existing SLS MAX-SAT solvers. In the experiment in Fig. 1, for our RL-based approach we consider the case where the update of the Tf value is performed by Eq. 1. However, in Sec. 2 we presented two possible definitions for the action that updates the value of Tf:
1. Tf is increased/decreased by the value 1/n (see Eq. 1);
2. Tf is set from scratch via Eq. 2.
Fig. 2 compares the two hypotheses, showing an improvement in the second case. E.g., at iteration 100000 an improvement of 2.4% in the mean best-so-far value is registered. Setting the Tf value from scratch, our algorithm reaches the optimal performance of H_RTS. Furthermore, for the first hypothesis, a bigger increase/decrease of the Tf parameter has also been tested. In particular, we replaced the factor 1.1 in Eq. 1 by the value 1.3. However, in this case we obtained slightly worse results.
4 Conclusions
This paper describes an application of Reinforcement Learning for the online tuning of the prohibition parameter in the Reactive Tabu Search algorithm. We discussed a couple of relevant architectural choices and presented preliminary experimental results. The results are promising: over the MAX-SAT benchmark considered, our algorithm performs better than gNovelty+, a gold medal winner in the random category of the SAT 2007 competition, and achieves results which are comparable with those obtained by the original RTS algorithm. These findings are confirmed by additional experimental work not presented in this paper because of space limits.
REFERENCES
[1] R. Battiti and M. Protasi, 'Reactive search, a history-sensitive heuristic for MAX-SAT', ACM Journal of Experimental Algorithmics, 2(ARTICLE 2), (1997). http://www.jea.acm.org/.
[2] D.N. Pham, J.R. Thornton, C. Gretton, and A. Sattar, 'Advances in local search for satisfiability', in 20th Australian Joint Conference on Artificial Intelligence, Gold Coast, Australia, December 2-6, 2007, eds., M. Orgun and J. Thornton, number 4830 in Lecture Notes in Computer Science, pp. 213–222. Springer, (2007).
[3] F. Hutter, D.A.D. Tompkins, and H.H. Hoos, 'Scaling and probabilistic smoothing: Efficient dynamic local search for SAT', in Proc. Principles and Practice of Constraint Programming - CP 2002: 8th International Conference, CP 2002, Ithaca, NY, USA, September 9-13, volume 2470 of LNCS, pp. 233–248. Springer Verlag, (2002).
[4] M.G. Lagoudakis and R. Parr, 'Least-Squares Policy Iteration', Journal of Machine Learning Research, 4(6), 1107–1149, (2004).
[5] D. Mitchell, B. Selman, and H. Levesque, 'Hard and easy distributions of SAT problems', in Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), pp. 459–465, San Jose, CA, (July 1992).
[6] R. Battiti, M. Brunato, and P. Campigotto, 'Learning while optimizing an unknown fitness surface', in Proceedings of the 2nd Learning and Intelligent OptimizatioN Conference (LION II), Trento, Italy, Dec 10-12, 2007. Springer LNCS, in press, (2008).
[7] D.A.D. Tompkins and H.H. Hoos. Novelty+ and adaptive novelty+. SAT 2004 Competition Booklet. (solver description).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-911
A MAX-SAT Algorithm Portfolio1 Paulo Matos and Jordi Planes and Florian Letombe and João Marques-Silva2 Abstract. The results of the last MaxSAT Evaluations suggest there is no universal best algorithm for solving MaxSAT, as the fastest solver often depends on the type of instance. Having an oracle able to predict the most suitable MaxSAT solver for a given instance would result in the most robust solver. Inspired by the success of SATzilla for SAT, this paper describes the first approach for a portfolio of algorithms for MaxSAT. Compared to existing solvers, the resulting portfolio can achieve significant performance improvements on a representative set of instances.
1 Introduction
In recent years, one of the optimization counterparts of the Boolean satisfiability problem (SAT) has attracted the interest of researchers: the maximum satisfiability (MaxSAT) problem. MaxSAT and its variations find a number of relevant applications, including scheduling and design automation [12, 13]. This work is the first attempt to implement and evaluate an algorithm portfolio for solving MaxSAT problems. The portfolio computes several features of an instance and estimates the runtime for each solver in the portfolio. Then, it solves the instance with the estimated fastest solver. A large number of instances have been considered, indicating that the portfolio is able to solve more instances, from the selected set of instances, than any other solver. Moreover, the total run time is lower for the portfolio, despite the time spent in the feature computation. The paper is organized as follows: Section 2 gives the notions for MaxSAT solving; Section 3 introduces the portfolio learning process; and Section 4 explains the steps to execute and test the portfolio, and discusses the experimental results. The paper concludes in Section 5.
2 Preliminaries
This section provides a brief introduction to MaxSAT problem solving. Familiarity with SAT and related topics is assumed [1]. The MaxSAT problem consists of finding an assignment which satisfies the maximum number of clauses in a CNF formula. MaxSAT algorithms have been the subject of significant improvements over the last decade (e.g., see [7, 5] for a review of past work). Despite the clear relation with the SAT problem, most modern SAT techniques cannot be applied directly to the MaxSAT problem (e.g. unit propagation or clause learning). As a result, the most successful MaxSAT algorithms, in the most recent MaxSAT Evaluations, implement branch and bound search, and integrate sophisticated lower
1 This work is partially supported by EPSRC grant EP/E012973/1, and by EU grants IST/033709 and ICT/217069.
2 School of Electronics & Computer Science, University of Southampton, UK, email: {pocm,jp3,fl,jpms}@ecs.soton.ac.uk
bounding and inference techniques. However, past MaxSAT Evaluations did not consider complex problem instances from practical applications. As a result, we have also considered for the portfolio a set of practical problem instances and a recent solver focused on such instances, msu [10]. We have built on the experience of an existing efficient portfolio, SATzilla [14], an algorithm portfolio for SAT, which has been demonstrated to be a robust solver and very competitive in the SAT Competitions (http://www.satcompetition.org/). Before SATzilla, Gomes and Selman [3] worked with stochastic search portfolios on several NP-complete problems. There is also other preliminary work on algorithm portfolios dealing with problems similar to MaxSAT [8, 6, 4].
3 Model Generation
The capacity to predict the time that a solver will spend on a given instance is one of the key aspects in the design of an algorithm portfolio. The prediction is done using a model created by a learning process over a set of instances. Once the model is created, the portfolio computes the features for a given instance and, based on the model, decides which solver to run. Our models are linear functions Σ_{i>0} βi·xi + β0, which compute the approximate runtime of a solver on a particular instance, where xi is the value of feature i of the instance and the βi are the coefficients to be found for each feature by the model generator. After several steps of forward selection and basis function expansion, in order to fit supra-linear data, we perform ridge regression [9] to obtain the unknowns βi. Forward selection is performed to reduce the number of interesting features. Basis function expansion of the feature set, on the other hand, allows a linear model such as the one we used to model supra-linear data (which allowed us to generate the quadratic model presented in Section 4). Data preprocessing also handles cases where a solver timed out on a specific instance by removing it from the training set. The process of generating the model is executed for every solver in the portfolio. After each model is computed, it is tested over a test set. Our model generator was tested for correctness by generating random data and finding a model for it. If the data can be fit using our model, the model output should be the same as the model used to generate the random data. The selected solvers are of three different kinds, for the sake of complementarity: a Pseudo-Boolean Optimization solver, minisat+ [2]; a recent solver that efficiently deals with real problem instances, msu [10]; and the strongest solver in the MaxSAT category in the MaxSAT Evaluation 2007, maxsatz [7]. The solver maxsatz implements a branch and bound search and integrates sophisticated lower bounds and inference techniques.
On the other hand, algorithm msu is a process that iteratively solves several SAT problem instances, until it reaches the MaxSAT solution. Three kinds of features have been considered [11]: problem size features, balance features and local search probe features. The most important features (among the first selected by forward selection) are in the set of local search probes.
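As a concrete illustration of the model-generation and solver-selection scheme, consider the following minimal Python sketch; the feature matrix and runtimes are synthetic stand-ins, and plain ridge regression replaces the full forward-selection and basis-expansion pipeline:

import numpy as np

rng = np.random.default_rng(0)

def fit_ridge(X, y, lam=1.0):
    # Ridge regression with intercept: beta = (X'X + lam*I)^-1 X'y.
    Xb = np.hstack([np.ones((len(X), 1)), X])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(beta, x):
    return beta[0] + beta[1:] @ x

# Synthetic stand-ins: 50 training instances with 10 features each and
# per-solver runtimes (in reality, timed-out instances are dropped first).
X = rng.random((50, 10))
runtimes = {s: rng.random(50) * 1000 for s in ("maxsatz", "minisat+", "msu")}
models = {s: fit_ridge(X, runtimes[s]) for s in runtimes}

def pick_solver(features):
    # Run the solver with the smallest predicted runtime.
    return min(models, key=lambda s: predict(models[s], features))

print(pick_solver(rng.random(10)))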
4 Experimental Results
The experimentation has been performed on a Linux Intel Xeon 3.0 GHz machine. A timeout of 1000 seconds was used for all MaxSAT solvers considered. The memory limit was set to 3GB. Some of the sets of instances considered are from the MaxSAT Evaluation 2007, the ones considered hard to solve and close to real problems; and instances from real problems: circuit design and planning. There are 586 instances from the following sets: RAMSEY, SPINGLASS, MAXCUT from the MaxSAT Evaluation 2007; DEBUG, IBM, UCLID, PIMAG from circuit design; SATPLAN from planning problems converted to SAT instances. In order to check our portfolio, we have created the oracle, a virtual portfolio which always selects the best possible result. The entries pflin and pfquad correspond to our portfolios using a linear model of the features and a quadratic model of the features, respectively. A preprocessing time per instance has been added to their total times. In Table 1, we can notice the portfolio is the most robust MaxSAT solver, since it solves the largest number of instances.
solver   msu3.1  minisat+  maxsatz  pfquad  pflin  oracle
solved   507     211       135      524     548    582
Table 1. Total number of solved instances for each solver
[Figure 1. Total time spent in seconds for each solver in MaxSAT: bar chart over msu3.1, minisat+, maxsatz, pfquad, pflin, and the oracle; y-axis from 0 to 500000 seconds.]
Figure 1 shows the total time taken by each of the solvers in the portfolio, our two portfolio models and the oracle. The results obtained by our models are close to the oracle, and spend less time than the rest of the solvers. We are aware, however, that this can still be improved. As mentioned earlier, our learning method does not handle solver timeouts, which means that our portfolio is biased with respect to solvers which time out often and solve a few instances in a short time. Still, having a portfolio capable of achieving these initial results motivates additional research in algorithm portfolios for MaxSAT.
5 Conclusions
This paper presents a method to develop an algorithm portfolio for the MaxSAT problem. Given that no benchmark repository exists for MaxSAT, problem instances from real-world problems and from the MaxSAT Evaluation have been used. To the best of our knowledge, this is the first algorithm portfolio for the MaxSAT problem. From the experimental results we conclude that our MaxSAT algorithm portfolio is the most robust solver among the MaxSAT problem instances we have considered. Future research work includes adapting the model generator to handle timeouts, and also adapting the solver portfolio to deal with Partial MaxSAT and Weighted MaxSAT. Additional research on identifying suitable features will be required for further improving the model used.
REFERENCES
[1] Lucas Bordeaux, Youssef Hamadi, and Lintao Zhang, 'Propositional satisfiability and constraint programming: A comparative survey', ACM Computing Surveys, 38(4), (2006). Electronic Edition, 54 pages.
[2] Niklas Eén and Niklas Sörensson, 'Translating pseudo-boolean constraints into SAT', Journal on Satisfiability, Boolean Modeling and Computation, 2, 1–26, (2006).
[3] Carla P. Gomes and Bart Selman, 'Algorithm portfolios', Artificial Intelligence, 126(1-2), 43–62, (2001).
[4] Kevin Leyton-Brown, Eugene Nudelman, Galen Andrew, Jim McFadden, and Yoav Shoham, 'A portfolio approach to algorithm selection', in International Joint Conference on Artificial Intelligence - IJCAI'03, pp. 1542–1543, (2003).
[5] Javier Larrosa, Federico Heras, and Simon de Givry, 'A logical approach to efficient max-SAT solving', Artificial Intelligence, 172(2–3), 204–233, (2008).
[6] Kevin Leyton-Brown, Eugene Nudelman, and Yoav Shoham, 'Learning the empirical hardness of optimization problems', in Principles and Practice of Constraint Programming CP'02, volume 2470 of LNCS, pp. 556–572, (2002).
[7] Chu Min Li, Felip Manyà, and Jordi Planes, 'New inference rules for max-SAT', Journal of Artificial Intelligence Research, 30, 321–359, (2007).
[8] Lionel Lobjois and Michel Lemaître, 'Branch and bound algorithm selection by performance prediction', in National Conference on Artificial Intelligence - AAAI'98, pp. 353–358, (1998).
[9] Donald W. Marquardt and Ronald D. Snee, 'Ridge regression in practice', The American Statistician, 29(1), 3–20, (1975).
[10] João Marques-Silva and Jordi Planes, 'Algorithms for maximum satisfiability using unsatisfiable cores', in Design, Automation and Test in Europe - DATE'08, (2008).
[11] Eugene Nudelman, Alex Devkar, Yoav Shoham, and Kevin Leyton-Brown, 'Understanding random SAT: Beyond the clauses-to-variables ratio', in Principles and Practice of Constraint Programming CP'04, volume 3258 of LNCS, pp. 438–452, (2004).
[12] Sean Safarpour, Hratch Mangassarian, Andreas Veneris, Mark H. Liffiton, and Karem A. Sakallah, 'Improved design debugging using maximum satisfiability', in Formal Methods in Computer Aided Design FMCAD'07, pp. 13–19, (2007).
[13] Hui Xu, R. A. Rutenbar, and Karem A. Sakallah, 'sub-SAT: a formulation for relaxed boolean satisfiability with applications in routing', IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22(6), 814–820, (2003).
[14] Lin Xu, Frank Hutter, Holger Hoos, and Kevin Leyton-Brown, 'SATzilla-07: The design and analysis of an algorithm portfolio for SAT', in Principles and Practice of Constraint Programming CP'07, volume 4741 of LNCS, pp. 712–727, (2007).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-913
On the Practical Significance of Hypertree vs. Tree Width Rina Dechter1, Lars Otten1, and Radu Marinescu2 Abstract. The recently introduced notion of hypertree width has been shown to provide a broader characterization of tractable constraint and probabilistic networks than the tree width. This paper demonstrates empirically that in practice the bounding power of the tree width is still superior to the hypertree width for many benchmark instances of both probabilistic and deterministic networks.
1 INTRODUCTION
Inference in graphical models is known to be time and space exponential in the problem graph's tree width. In practice, however, this measure is often inaccurate, since it ignores the effects of determinism in problem solving, which can, for instance, lead to pruning of large parts of the search space. To that end, in 2000 Gottlob et al. [3] introduced a parameter called hypertree width and showed that for constraint networks it is more effective in capturing tractable classes. In [4], its applicability was extended to inference algorithms over general graphical models having relational function specification. In this paper we examine the significance of the hypertree width as compared with the tree width from a practical angle. We show empirically, on probabilistic and deterministic benchmarks, that in most cases the tree width yields a far better predictor of instance-based complexity than the hypertree width, except when the problem has substantial determinism. The outline of this paper is as follows: Section 2 gives a brief overview of the two decomposition schemes, Section 3 provides the empirical results, and Section 4 concludes.
2 DECOMPOSITION SCHEMES
We assume the usual definitions of directed and undirected graphs, hypergraphs, primal and dual graphs, and hypertrees. A graphical model is typically defined to be a set of real-valued functions F = {f1, ..., fl} over a set of variables X = {x1, ..., xn} with domains D = {D1, ..., Dn}, together with a combination operator like summation or multiplication. The scope of a function fj, denoted scope(fj), is the set of variables on which fj is defined. A common approach to solving graphical model problems is to cluster variables and functions such that the resulting decomposition exhibits tree structure:
DEFINITION 1 A tree decomposition of a graphical model is a triple T = (T, χ, ψ), where T = (V, E) is a tree and χ and ψ are labeling functions that associate with each vertex v ∈ V two sets, χ(v) ⊆ X and ψ(v) ⊆ F, that satisfy the following conditions:
1. For each fj ∈ F, there is at least one v ∈ V such that fj ∈ ψ(v).
2. If fj ∈ ψ(v), then scope(fj) ⊆ χ(v).
3. For each xi ∈ X, the set {v ∈ V | xi ∈ χ(v)} induces a connected subtree of T.
The tree width of T is w = max_{v∈V} |χ(v)| − 1. T is also a hypertree decomposition if it satisfies the following additional condition:
4. For each v ∈ V, χ(v) ⊆ ∪_{fj ∈ ψ(v)} scope(fj).
In this case, the hypertree width of T is hw = max_{v∈V} |ψ(v)|.
Finding tree and hypertree decompositions of minimal width is known to be NP-complete, therefore heuristic algorithms are employed in practice [4, 2]. Once a tree or hypertree decomposition is available, it can be processed by the suitable version of a message passing algorithm like Cluster-Tree Elimination (CTE) [4]. Allowing a probabilistic function to be placed in more than one node will lead to incorrect processing by CTE for any graphical model other than constraint networks. To remedy this we modify multiple showings of a function by flattening all but one of them into a 0/1-valued constraint.
1 Bren School of Information and Computer Sciences, University of California, Irvine, CA 92697-3435. Email: {dechter,lotten}@ics.uci.edu
2 Cork Constraint Computation Centre, Department of Computer Science, University College Cork, Ireland. Email: r.marinescu@4c.ucc.ie
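To make the two width parameters concrete, the following small Python sketch computes w and hw from the labeling functions χ and ψ of a toy decomposition (the decomposition itself is invented for illustration):

# chi maps each vertex of the decomposition tree to a set of variables;
# psi maps it to the functions placed there, given here by their scopes.
chi = {1: {"x1", "x2"}, 2: {"x2", "x3", "x4"}}
psi = {1: [{"x1", "x2"}], 2: [{"x2", "x3"}, {"x3", "x4"}]}

w = max(len(chi[v]) - 1 for v in chi)   # tree width: max cluster size minus 1
hw = max(len(psi[v]) for v in psi)      # hypertree width: max functions per cluster

# Condition 4: every variable of a cluster is covered by the scope of
# some function placed in that cluster.
assert all(chi[v] <= set().union(*psi[v]) for v in chi)
print(w, hw)  # 2 2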
Complexity bounds. The time complexity of algorithm CTE, when executed on a tree decomposition T with tree width w, has been shown to be

O((r + m) · deg · k^(w+1)),   (1)
where r is the number of functions in the problem, m the number of clusters in T, and deg the maximum degree in T. The space complexity is O(m · k^w) [4]. By virtue of using a tree decomposition, however, the hypergraph structure of the problem is completely ignored. Therefore, bound (1) does not account for any determinism that might be present in the function specifications. To that end, if T is also a hypertree decomposition, algorithm CTE can be adapted to exploit the hypergraph structure. Assuming T has hypertree width hw, the time complexity of applying CTE can be shown to be

O(m · deg · hw · log t · t^hw),   (2)
where t bounds the size of the relational representation of each function in the problem (i.e., the number of zero-cost tuples in CSPs and the number of non-zero probability tuples in belief networks). Space complexity is O(t^hw) [3, 4]. We note that bound (2) indeed takes determinism into account by using the parameter t, which denotes the number of relevant tuples in a function table. It is clear that t ≤ k^r ≤ k^w, where r is the maximum function arity. Hence bound (2) can only yield tighter results when the problem instance possesses a high degree of determinism. While it has been shown that hypertree decompositions are strictly more general than tree decompositions, it is unclear how the asymptotic bounds compare for practical problem instances.
instance        n     k   r   t       w   hw  R
Genetic linkage
pedigree1       334   4   5   32      16  13  9.934
pedigree18      1184  5   5   50      22  18  15.204
pedigree20      437   5   4   50      24  16  10.408
pedigree23      402   5   4   50      29  15  5.214
pedigree25      1289  5   5   50      27  19  13.408
pedigree30      1289  5   5   50      25  18  13.107
pedigree33      798   4   5   32      31  21  12.944
pedigree37      1032  5   4   32      22  13  4.190
pedigree38      724   5   4   50      18  10  4.408
pedigree39      1272  5   4   50      25  18  13.107
pedigree42      448   5   4   50      24  16  10.408
pedigree50      514   6   4   72      18  10  4.567
pedigree7       1068  4   4   32      40  23  10.536
pedigree9       1118  7   4   50      31  21  9.480
pedigree13      1077  3   4   18      35  29  19.704
pedigree19      793   5   5   50      27  21  16.806
pedigree31      1183  5   5   50      34  29  25.505
pedigree34      1160  5   4   32      32  25  15.262
pedigree40      1030  7   5   98      31  24  21.591
pedigree41      1062  5   5   50      35  25  18.010
pedigree44      811   4   5   32      28  22  16.256
pedigree51      1152  5   4   50      44  33  25.311
Mastermind puzzle game (WCSP)
mm 03 08 03     1220  2   3   4       21  14  2.107
mm 03 08 04     2288  2   3   4       31  20  2.709
mm 03 08 05     3692  2   3   4       40  25  3.010
mm 04 08 03     1418  2   3   4       26  17  2.408
mm 04 08 04     2616  2   3   4       38  24  3.010
mm 10 08 03     2606  2   3   4       56  34  3.612
Coding networks
BN 126          512   2   5   16      56  21  8.429
BN 127          512   2   5   16      55  22  9.934
BN 128          512   2   5   16      50  20  9.031
BN 129          512   2   5   16      54  21  9.031
BN 130          512   2   5   16      53  21  9.332
BN 131          512   2   5   16      53  21  9.332
BN 132          512   2   5   16      52  21  9.633
BN 133          512   2   5   16      56  21  8.429
BN 134          512   2   5   16      55  21  8.730
Dynamic Bayesian networks
BN 21           2843  91  4   208     7   4   -4.441
BN 23           2425  91  4   208     5   3   -2.841
BN 25           1819  91  4   208     5   2   -5.159
BN 27           3025  5   7   3645    10  2   0.134
BN 29           24    10  6   999999  6   2   6.000
Digital circuits
c432.isc        432   2   10  512     28  22  51.175
c499.isc        499   2   6   32      25  25  30.103
s386.scan       172   2   5   16      19  8   3.913
s953.scan       440   2   5   16      66  38  25.889
Radio frequency assignment (WCSP)
CELAR6-SUB0     16    44  2   1302    8   4   -0.689
CELAR6-SUB1-24  14    24  2   301     10  5   -1.409
CELAR6-SUB1     14    44  2   928     10  5   -1.597
CELAR6-SUB2     16    44  2   928     11  6   -0.273
CELAR6-SUB3     18    44  2   928     11  6   -0.273
CELAR6-SUB4-20  22    20  2   396     12  6   -0.026
CELAR6-SUB4     22    44  2   1548    12  6   -0.583
Table 1. Selected experimental results comparing the tree width and hypertree width based bounds.
3 EXPERIMENTAL RESULTS
We evaluated empirically the tree width and hypertree width bounds on 112 practical probabilistic networks and 30 constraint networks. Problem instances were obtained from various sources; all of them are available online (repository at http://graphmod.ics.uci.edu/). To obtain a tree decomposition of a problem, we perform bucket elimination along a minfill ordering (random tie breaking, optimum over 20 iterations). The tree decomposition is then extended to a hypertree decomposition by the method described in [2], where variables in a decomposition cluster are greedily covered by functions. For each problem instance we collected the following statistics: the number of variables n, the maximum domain size k, the maximum function arity r, and the maximum function tightness t. We also report the best tree width and hypertree width found in the experiments described above. We define the measure R := log10(t^hw / k^w). This compares the two dominant factors of the w bound (1) and the hw bound (2). If R is positive, it signifies how many orders of magnitude tighter the w bound is when compared to the hw bound, and vice versa for negative values of R. Some selected instances are shown in Table 1; the full set of results is available in an extended version of this paper [1]. Out of the 112 belief networks, the hw bound was only superior for 5 instances, and not by many orders of magnitude. On the other hand, for genetic linkage instances with considerable determinism in their CPTs, the hw bound is significantly worse, as is the case for most other belief networks. This situation does not change much for constraint problems, except for radio frequency assignment, where the hw bound fares somewhat better, but only by a small margin. In summary, we can conclude that, in order for the hypertree width bound to be competitive with, or even superior to, the tree width bound, problem instances need to comply with several conditions; foremost among these are very tight function specifications. The latter is promoted by large variable domains and high function arity, which we found to be not the case for the majority of practical problem instances.
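The R values in Table 1 can be reproduced directly from the reported parameters, as in the short Python check below (computing in log space to avoid overflow):

import math

def R(k, t, w, hw):
    # R = log10(t**hw / k**w), evaluated as hw*log10(t) - w*log10(k).
    return hw * math.log10(t) - w * math.log10(k)

print(round(R(k=4, t=32, w=16, hw=13), 3))   # pedigree1 -> 9.934
print(round(R(k=2, t=512, w=28, hw=22), 3))  # c432.isc -> 51.175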
4 CONCLUSION
The contribution of this paper is in exploring empirically the practical benefit of the hypertree width compared with the tree width in bounding the complexity of algorithms over given problem instances. Statistics collected over 112 Bayesian network instances and 30 weighted CSPs provided interesting, yet somewhat sobering, information. We confirmed that while the hypertree width is always smaller than the tree width, the complexity bound it implies is often inferior to the bound suggested by the tree width. Only when problem instances possess substantial determinism and the functions have large arity can the hypertree width provide bounds that are tighter and therefore more informative than the tree width. This empirical observation raises doubts regarding the need to obtain good hypertree decompositions beyond the already substantial effort that has gone into the search for good tree decompositions, which has been ongoing for three decades now.
ACKNOWLEDGEMENTS This work was partially supported by NSF grant IIS-0713118 and NIH grant R01-HG004175-02.
REFERENCES
[1] R. Dechter, L. Otten, and R. Marinescu, 'On the Practical Significance of Hypertree vs. Tree Width', Technical Report, University of California, Irvine, (2008).
[2] G. Gottlob, M. Grohe, N. Musliu, M. Samer, and F. Scarcello, 'Hypertree Decompositions: Structure, Algorithms, and Applications', International Workshop on Graph-Theoretic Concepts in Computer Science, (2005).
[3] G. Gottlob, N. Leone, and F. Scarcello, 'A comparison of structural CSP decomposition methods', Artificial Intelligence, (2000).
[4] K. Kask, R. Dechter, J. Larrosa, and A. Dechter, 'Unifying tree decompositions for reasoning in graphical models', Artificial Intelligence, (2005).
9. Planning and Scheduling
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-917
A New Approach to Planning in Networks Jussi Rintanen NICTA & the Australian National University Canberra, Australia Abstract. Control of networks, such as those for transportation, power distribution, and communication, provides challenges to planning and scheduling. Many problems can be defined in terms of a basic state space model, but more general problems require an expressive language for talking about the topology and connectivity of the system, which are outside the scope of standard planning languages. In this work we introduce a general framework for defining planning languages for networked systems, with the capability to express properties of connectivity and topology of such systems.
1 Introduction
Other areas of computer science that use the transition system model of actions include computer-aided verification and validation, where reachability analysis and model-checking problems are described in languages like SMV [6] and PROMELA [5]. These languages describe transition systems in terms of concepts naturally occurring in the relevant application areas of the tools. Much of the high-level technological infrastructure has the form of networks: transportation, telecommunications, power distribution and water distribution are all based on networks with clearly definable nodes and edges connecting them. Also, most of the standard planning benchmark problems involve networks. Even minor extensions to many of these problems are difficult to express compactly in standard planning languages. For network-structured applications we propose the use of high-level planning languages with network features, as well as efficient algorithms for directly solving the problems expressed in these languages. A practical requirement for this approach to be feasible is that the overall complexity is not increased, in comparison to expressing the same problems in a standard classical planning language. A nominally "tractable" reduction of network-planning problems to standard planning languages is often possible, but this involves, in all but the simplest cases, a prohibitively high increase in the size of the problem instance and solution times. To illustrate the differences between the approaches, consider a real-world planning problem with network structure that has been considered in earlier work on AI planning: the power-supply restoration problem for electricity distribution networks [7]. The first modeling of this problem in a classical planning language similar to PDDL leads to huge problem descriptions even for small networks [1]. Using the axioms of PDDL [4] to express network connectivity properties leads to a much more practical representation of the problem [2]. This formulation is compact but arguably not very natural, as the axiom mechanism (inductive definitions) doesn't per se directly represent any natural features of this domain.
2 Problem Definition
We model systems in terms of a set of nodes and connections between them. The properties of each node are expressed in terms of state variables. Every node has the same set of state variables, but as the actions need not treat the nodes uniformly, this is not a restriction. In this paper we only consider a deterministic planning problem, similarly to classical planning, and hence we have a unique initial state for the system. The state of the system consists of the connections and the values of the state variables.
Definition 1 (State) For given sets A of state variables, V of nodes and E of (atomic) connections, a state is a pair (v, e) where
• v : V × A → {0, 1} assigns a value to each state variable at each node, and
• e : E → 2^(V×V) assigns each (atomic) connection a binary relation.
Definition 2 A network is defined as (A, V, E, O, I, G) where
• A is the set of (Boolean) state variables,
• V is the set of nodes,
• E is the set of (atomic) connections between the nodes,
• O is the set of actions (to be defined later),
• I is the initial state, and
• G is a modal formula representing the goal of the system (to be defined later).
2.1 Network Properties
We employ modal logic to express properties of systems. The modalities express connections between nodes.
• Atomic connections c ∈ E are connections.
• If c1 and c2 are connections then so are c1; c2, c1 ∪ c2 and c1 ∩ c2.
• If c is a connection then so are c∗ and c⁻¹.
• If φ is a formula then φ? is a connection.
Composite connection c1; c2 between nodes n and n′ means that n′ can be reached from n by first following c1 to some intermediate node and from there c2 to n′. Connection c1 ∪ c2 expresses disjunctivity: there is either connection c1 or c2. Analogously, c1 ∩ c2 expresses conjunctivity. The connection c∗ represents the reflexive transitive closure of c. The connection c⁻¹ is the inverse of c: a connection going from n′ to n whenever c goes from n to n′. The connection φ? conditionally connects a node with itself if φ is true.
Definition 3 The meaning [[c]]_s of a connection c in a state s = (v, e) of S = (A, V, E, O, I, G) is defined as follows.
• [[c]]_s = e(c) if c ∈ E
• [[c; c′]]_s = {(x, z) | (x, y) ∈ [[c]]_s, (y, z) ∈ [[c′]]_s}
• [[c ∪ c′]]_s = [[c]]_s ∪ [[c′]]_s
• [[c ∩ c′]]_s = [[c]]_s ∩ [[c′]]_s
• [[c∗]]_s = ([[c]]_s)∗
• [[c⁻¹]]_s = {(n, m) | (m, n) ∈ [[c]]_s}
• [[φ?]]_s = {(t, t) ∈ V × V | s |=_t φ}
Here the operations ∪, ∩ and ∗ on the right hand sides are the set-theoretic union and intersection and the reflexive transitive closure. The connections are used as a part of a modal language that includes classical propositional logic. The atomic formulas include the propositional variables and the names of nodes.
• The constants ⊥ and ⊤ (for false and true) are formulas.
• a, for a state variable a ∈ A, is a formula.
• n, for a node n ∈ V, is a formula.
• φ ∨ ψ is a formula, where φ and ψ are formulae.
• ¬φ is a formula, where φ is a formula.
• [c]φ is a formula if φ is a formula and c is a connection.
• n: φ is a formula if φ is a formula and n ∈ V is a node.
The modal operators [c] represent universal quantification over all nodes that are reachable by a path described by c. The formula n is true in the node n and false elsewhere. Formulae n: φ refer to the truth of φ in node n. The meaning of →, ∧ and ↔ is defined in the usual way, as is ⟨c⟩, by ⟨c⟩φ = ¬[c]¬φ.
Example 1 The next formula is true in cities (nodes) from which one can fly to a tropical destination with a direct flight or two flights without changing planes in the U.S.:
⟨flight ∪ (flight; ¬US?; flight)⟩tropics
The next formula is true if there are paths in a communications network from the current node to node n that go through a designated center node and only visit nodes that are safe on the way:
⟨(link; safe?)∗; center?; (safe?; link)∗⟩n
At this point it is apparent that our logic is a variant of the propositional dynamic logic (PDL) [3] with a mechanism for referring to the names of nodes. A truth-definition for this modal logic can be given in the obvious way. We define s |= φ iff s |=_n φ for all n ∈ V.
2.2 Actions
Actions can change the values of the state variables associated with the nodes and the connections between the nodes. An action consists of a precondition, which determines the circumstances under which the action can be taken, as well as the effects, which indicate when and how the values of the state variables change and which connections between nodes are added or removed.
Definition 4 (Action) An action is a pair ⟨p, e⟩ where p is a formula and e is a set of conditional effects q ⇒ r, where q is a formula and r is a set of literals. The literals that can be effects in an action are n: a, ¬n: a, (n, c, n′) and ¬(n, c, n′), where a ∈ A, n ∈ V, n′ ∈ V and c ∈ E.
Definition 5 (Successor state) Let S = (A, V, E, O, I, G) be a system. Let ⟨p, e⟩ be an action and s = (v, g) a state. The action is executable if s |= p and the set F = ∪{r | (q ⇒ r) ∈ e, s |= q} is consistent. The successor state of s is s′ = (v′, g′) where
• v′(n, a) = 1 if n: a ∈ F, 0 if ¬n: a ∈ F, and v(n, a) otherwise, for all n ∈ V and a ∈ A;
• g′(c) = (g(c) \ {(n, n′) | ¬(n, c, n′) ∈ F}) ∪ {(n, n′) | (n, c, n′) ∈ F} for all c ∈ E.
3 Examples
Many of the standard planning benchmark problems can be viewed as consisting of nodes and connections between them.
Example 2 In Blocks World the blocks are the nodes and the on-relation constitutes the connections. Moving x from y onto z is defined by
⟨x: ⟨on⟩y ∧ x: [on⁻¹]⊥ ∧ z: [on⁻¹]⊥, {¬(x, on, y), (x, on, z)}⟩.
Here x: [on⁻¹]⊥ says that false is true in all nodes related to x by on, meaning that there is no such node, i.e., the block is clear. We can introduce an action that allows moving stacks of blocks:
⟨x: ⟨on⟩y ∧ z: ¬⟨on∗⟩x ∧ z: [on⁻¹]⊥, {¬(x, on, y), (x, on, z)}⟩
Example 3 With the network planning language it is easy to express movement from one node to any of the reachable nodes. Here a and b are names of locations and p is an object that moves:
⟨a: (p ∧ ⟨road∗⟩b), {¬a: p, b: p}⟩
Further extensions are possible. For example, we can require that property φ is satisfied after each road segment along the path:
⟨a: (p ∧ ⟨(road; φ?)∗⟩b), {¬a: p, b: p}⟩
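To illustrate the semantics of Definition 3, the following Python sketch evaluates connections as binary relations over the nodes; the toy road network and all names are invented for the example:

def compose(r1, r2):
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def star(r, nodes):
    # Reflexive transitive closure of r.
    closure = {(n, n) for n in nodes} | set(r)
    while True:
        extended = closure | compose(closure, closure)
        if extended == closure:
            return closure
        closure = extended

def inverse(r):
    return {(m, n) for (n, m) in r}

def test(pred, nodes):
    # phi?: connect a node with itself iff the formula holds there.
    return {(n, n) for n in nodes if pred(n)}

nodes = {"a", "b", "c", "d"}
road = {("a", "b"), ("b", "c"), ("c", "d")}

print(("a", "d") in star(road, nodes))                 # True: d reachable from a
print(("d", "a") in star(inverse(road), nodes))        # True: via road^-1
safe = test(lambda n: n != "b", nodes)                 # every node but b is "safe"
print(("a", "d") in star(compose(road, safe), nodes))  # False: b blocks the path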
4 Conclusions
We have considered a language for planning in networks. The language is directly relevant to many application domains with network structure. The network model is even more general, allowing one to express interesting properties of benchmarks that at the surface level are not about networks. This is a consequence of many problems having a relational/graph representation and of the support of the language for expressing properties of graphs.
Acknowledgements The research was funded by the Australian Government's Department of Broadband, Communications and the Digital Economy and the Australian Research Council through NICTA and the SuperCom project.
REFERENCES
[1] P. Bertoli, A. Cimatti, J. K. Slaney, and S. Thiébaux, 'Solving power supply restoration problems with planning via symbolic model checking', in ECAI'02, pp. 576–580, (2002).
[2] B. Bonet and S. Thiébaux, 'GPT meets PSR', in ICAPS'03, pp. 102–112, (2003).
[3] Michael J. Fischer and Richard E. Ladner, 'Propositional dynamic logic of regular programs', J. Computer and System Sciences, 18(2), 194–211, (1979).
[4] M. Ghallab, A. Howe, C. Knoblock, D. McDermott, A. Ram, M. Veloso, D. Weld, and D. Wilkins, 'PDDL - the Planning Domain Definition Language, version 1.2', Technical report, Yale Center for Computational Vision and Control, Yale University, (1998).
[5] Gerald J. Holzmann, Design and Validation of Computer Protocols, Prentice Hall, 1991.
[6] Kenneth L. McMillan, Symbolic Model Checking, Kluwer, 1993.
[7] S. Thiébaux and M.-O. Cordier, 'Supply restoration in power distribution systems - a benchmark for planning under uncertainty', in ECP'01. Springer, (2001).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-919
Detection of unsolvable temporal planning problems through the use of landmarks1 E. Marzal and L. Sebastia and E. Onaindia2 Abstract. Deadline constraints have been recently introduced in PDDL3.0. The results obtained in the constraint domains in the last Planning Competition show that planners are not yet fully competitive. When dealing with deadline constraints, the number of feasible solutions for a problem is reduced, and thus the ability to detect unsolvability is especially relevant. In this paper we present a new approach, based on the use of temporal landmarks, for the detection of unsolvable temporal planning problems.
1 Introduction
The last planning competition (IPC5) [3] introduced the new language PDDL3.0 [4] to allow the user to express strong and soft constraints about the structure of the plans. Deadline constraints, expressed through modal operators such as within, always-within or sometime-after [4], were extensively tested in several domains. Only two planners, MIPS-XXL [2] and SGPlan [6], participated in the time constraints track; MIPS-XXL could only solve a few problems of each domain, and SGPlan solved many more problems but returned worse quality solution plans. A new test of the last available version of SGPlan revealed that this planner was not able to identify unsolvable problems. Moreover, SGPlan returns the same output when it cannot find a solution and when the problem is actually unsolvable. This brings up the issue that handling deadline constraints introduces a major difficulty in planning, as propositions are now bounded to hold within a specific time interval. In this paper we present a preliminary approach, based on the extraction of landmarks, capable of determining whether a temporal planning problem with deadline constraints is unsolvable. The system builds a temporal landmarks graph which represents a skeletal plan of the solution. If the graph is not consistent then the system reports that the problem is unsolvable; otherwise, there is no guarantee that a satisfying plan exists. It is not generally possible to prove the unsolvability of problems, but the experiments will show that our approach was able to identify all the unsolvable problems that were tested. On the other hand, although the system does not yet compute a solution when the problem is solvable, the landmarks graph comprises a correct partial plan which can be further refined into an executable solution.
This work has been partially funded by Consolider Ingenio 2010 CSD200700022 project and by the Spanish Government TIN2005-08945-C06-06 project. 2 Universidad Politecnica of Valencia. e-mail: {emarzal, lstarin, onaindia}@dsic.upv.es
2 System overview
The input of our system is a PDDL3.0 problem which contains within, always-within and sometime-after constraints [4], and it returns a message in case the planning problem is unsolvable. First, we extract the set of landmarks [5] of the problem and we build a landmarks graph by adding causal relationships between them. Second, we associate some temporal intervals to each landmark. These intervals, together with the causal relationships, define a set of constraints that are inserted in an agenda. Then a CSP solver performs the consistency checking. If an inconsistency is found in the graph, the system will return a message saying that a feasible solution plan does not exist.
3 Temporal model
In a STRIPS context, a landmark [5] is a literal that must be true at some point in any solution plan. Our process for landmark extraction is similar to the method described in [8]. We establish two types of causal relationships between landmarks.
Definition 1 There is a dependency relationship between two landmarks li and lj (li ≺d lj) if for every temporal plan that achieves lj at time t (lj ∈ St) from state I there exists at least one state St′ prior to St which contains li.
Definition 2 Given two landmarks li and lj such that li ∈ St′, lj ∈ St and li ≺d lj, there is a necessary relationship between li and lj (li ≺n lj) if every temporal plan that achieves lj at time t from state St′ contains a single action a such that li ∈ Cond(a) ∧ lj ∈ AddEff(a).3
The set of landmarks and the relationships between them define a landmarks graph. Let g be a top-level goal that must be obtained at time T (specified by a within constraint) and let l be a landmark such that l ≺{d,n} ... ≺{d,n} g. We define the following intervals for l:
• the validity interval (denoted as [minv, maxv]) is the temporal interval when l will be true in the plan;
• the necessity interval (denoted as [minn, maxn]) is the set of time points when l must be true in order to satisfy other landmarks;
• the generation interval (denoted as [ming, maxg]), where ming and maxg represent respectively the earliest and latest start time of l in order to satisfy g at a time less than or equal to T.
3 A PDDL3.0 durative action a contains the following elements: Conditions Cond(a), which denote the set of conditions to be guaranteed over the execution of the action; Duration, a positive value represented by dur(a) ∈ R+; Effects Eff(a), classified into AddEff(a) as the set of all add effects and DelEff(a) as the set of all delete effects.
Initially, ming is set to the time of the earliest temporal state reached from I where landmark l appears (additionally, minv = minn = ming). maxg, maxv and maxn will be set to the temporal bound T of the corresponding top-level goal. All these values (except ming) will eventually be updated. The relationships between validity, necessity and generation intervals define a set of constraints between the two endpoints of an interval (for example, minv ≤ maxv) and constraints between the endpoints of different intervals:

ming ≤ minv ≤ minn    maxg ≤ maxv
minv ≤ maxg           maxn ≤ maxv
A causal relationship between two landmarks li and lj implicitly establishes a temporal constraint of the form endpoint(li) + distance ≤ endpoint(lj). If li ≺n lj, we calculate two temporal distances: the minimum distance between the first (respectively, the last) time instant when li is needed and the time instant when lj is generated by the set of actions {ai} such that li ∈ Cond(ai) ∧ lj ∈ AddEff(ai). If li ≺d lj, we calculate two distance values, namely DIS_F(li, lj) and DIS_L(li, lj), by recursively computing the minimum distance between all the literals in the path from a state that contains li to a state that contains lj. Therefore, a causal relationship between two landmarks establishes the following relations between the endpoints of the intervals:

∃ li ≺{d,n} lj → maxg(lj) ≥ maxg(li) + DIS_F(li, lj)
∃ li ≺{d,n} lj → minv(lj) ≥ minv(li) + DIS_F(li, lj)
minn(li) = min(minv(lj) − DIS_F(li, lj)), ∀ lj : ∃ li ≺n lj
maxn(li) = max(maxg(lj) − DIS_L(li, lj)), ∀ lj : ∃ li ≺n lj

Our system is capable of handling within, always-within and sometime-after constraints. The introduction of these deadline constraints modifies the landmarks intervals in the following way:

Constraint               In our model
within t l               maxg(l) ≤ t
always-within t li lj    minv(li) ≤ maxv(lj) − t
sometime-after li lj     maxg(li) ≤ maxg(lj), minv(li) ≤ minv(lj)
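All of the endpoint constraints above have the difference form x − y ≤ c, so one simple way to realize the consistency check is shortest-path propagation over the constraint graph. The following is a minimal Bellman-Ford-style Python sketch with an invented toy agenda (a landmark with earliest generation time 4, a goal at least 5 time units later, and a deadline of 8, which is infeasible); the paper's actual CSP solver may of course proceed differently:

def consistent(variables, constraints):
    dist = {v: 0.0 for v in variables}   # relax from a virtual source
    for _ in range(len(variables) + 1):
        changed = False
        for x, y, c in constraints:      # each constraint reads: x - y <= c
            if dist[y] + c < dist[x]:
                dist[x] = dist[y] + c
                changed = True
        if not changed:
            return True                  # fixed point reached: consistent
    return False                         # still relaxing: negative cycle

agenda = [
    ("z", "t_l", -4),    # t_l >= 4        (z - t_l <= -4, z marks time 0)
    ("t_l", "t_g", -5),  # t_g >= t_l + 5  (t_l - t_g <= -5)
    ("t_g", "z", 8),     # t_g <= 8        (t_g - z <= 8)
]
print(consistent(["z", "t_l", "t_g"], agenda))  # False: no feasible schedule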
Additionally, we also take into account inconsistencies between landmarks. If li and lj are mutex [1] at time t, we can find two different situations. If there exists a causal relationship between both landmarks, we set maxv(li) to min(maxg(li), maxg(lj) − DIS_L(li, lj)) and, consequently, maxn(li) = min(maxv(li), maxn(li)). Then we propagate this new information to the rest of the graph and insert the constraint maxv(li) ≤ minv(lj) in the agenda. In case there is no causal relationship between li and lj, we only insert the disjunctive constraint maxv(li) ≤ minv(lj) ∨ maxv(lj) ≤ minv(li) in the agenda. A CSP solver is invoked to perform consistency checking. This process will help us restrict the landmarks intervals. In case an inconsistency is found (a constraint that cannot be satisfied), the algorithm will display the message "No solution exists". As the experiments will show, our system returned the message "No solution exists" for all the unsolvable problems we tested. When no inconsistency is found in the landmarks graph, there is no guarantee the problem is solvable (or unsolvable), as it is not generally possible to prove the unsolvability of problems. However, we have observed the landmarks graph is informative enough to identify unsolvability in all the cases we have tested; thus we can
affirm the landmarks graph allows us to experimentally detect unsolvability. On the other hand, the graph will comprise a correct partial solution in case the problem is solvable, and this solution can be further expanded until a complete plan is obtained.
4 Experiments
We have tested our approach on problems from the Pipesworld domain and a modified version of the Driverlog domain including within, always-within and sometime-after constraints [3, 4]. We have compared our results with the last available version of SGPlan [7].
Pipesworld. SGPlan did not solve any problem from this domain at IPC5. In contrast, our model processed all tested problems and found no indication of unsolvability in any of them. We then changed the time limits in the within and always-within constraints and ran the first ten problems again. SGPlan could only solve the problems that contained very loose deadlines. This indicates that SGPlan fails at finding solutions for problems with very restrictive temporal constraints and few feasible solution plans.
Driverlog. We ran both SGPlan and our model on ten problems from this domain; four out of the ten problems were solvable and the remaining six problems were unsolvable. Our model identified the six unsolvable problems and it did not flag the four remaining problems as unsolvable. However, SGPlan only returned solution plans for two of the solvable problems. For the remaining eight problems, SGPlan did not provide any response, neither a solution plan nor "Solution not found" nor "No solution exists".
5 Conclusions
In this paper, we have presented a preliminary but promising approach to deal with temporal planning problems with deadline constraints. Our model allocates landmarks in time and, through the calculation of causal relationships and other constraints between them, it draws a temporal picture of the problem to detect unsolvability. This approach is very appropriate for planning problems with very restrictive deadline constraints. Currently, we are studying the properties of the landmarks graph in order to detect solvability. Our principal focus in this research is to show that if the algorithm obtains a complete and conflict-free landmarks graph and the agenda does not contain any disjunctive constraint, then solvability can be ensured.
REFERENCES [1] A. Blum and M. Furst, ‘Fast planning through planning graph analysis’, Artificial Intelligence, 90(1-2), 281–300, (1997). [2] S. Edelkamp, S. Jabbar, and M. Nazih, ‘Large-scale optimal PDDL3 planning with MIPS-XXL’, in ICAPS-2006 – Fifth International Planning Competition, pp. 28–30, (2006). [3] A. Gerevini and D. Long. ICAPS-2006 Fifth International Planning Competition, 2006. http://zeus.ing.unibs.it/ipc-5/. [4] A. Gerevini and D. Long, ‘Plan constraints and preferences in PDDL3’, in ICAPS-2006 – Fifth International Planning Competition, pp. 7–13, (2006). [5] J. Hoffmann, J. Porteous, and L. Sebastia, ‘Ordered landmarks’, Journal of Artificial Intelligence Research, 22, 215–287, (2004). [6] C. W. Hsu, B. W. Wah, R. Huang, and Y. X. Chen, ‘New features in SGPlan for handling preferences and constraints in PDDL3.0’, in ICAPS2006 – Fifth International Planning Competition, pp. 39–41, (2006). [7] C. W. Hsu, B. W. Wah, R. Huang, and Y. X. Chen. The SGPlan planner, 2007. http://manip.crhc.uiuc.edu/programs/SGPlan/index.html. [8] L. Zhu and R. Givan, ‘Landmark Extraction via Planning Graph Propagation’, in In Printed Notes of ICAPS’03 Doctoral Consortium, (2003). Trento, Italy.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-921
A Planning Graph Heuristic for Forward-Chaining Adversarial Planning Pascal Bercher and Robert Mattmüller1
Abstract. In contrast to classical planning, in adversarial planning, the planning agent has to face an adversary trying to prevent him from reaching his goals. In this paper, we investigate a forwardchaining approach to adversarial planning based on the AO* algorithm. The exploration of the underlying AND/OR graph is guided by a heuristic evaluation function, inspired by the relaxed planning graph heuristic used in the FF planner. Unlike FF, our heuristic uses an adversarial planning graph with distinct proposition and action layers for the protagonist and antagonist. First results suggest that in certain planning domains, our approach yields results competitive with the state of the art.
1 Introduction
In many planning problems, the environment in which the agent acts is not static. The exogenous dynamics can be caused by “nature” or by one or more other agents sharing the same environment. Other agents can behave neutrally (simply following their own independent agenda or otherwise acting unpredictably), adversarially, or cooperatively with respect to the protagonist’s goals. Here, we focus on adversarial problems. We assume complete observability, i.e., a plan will be a mapping from physical states to applicable actions. A usual approach to conditional (adversarial) planning is planning as model checking [5], whereas planning as heuristic search [3] tends to yield best results for static, deterministic problems. Both approaches are also used in general game playing [7]. Related work includes the dynamic programming approach by Hansen and Zilberstein [8], and, for partially observable problems, heuristic search in the belief space as implemented in the POND planner by Bryce et al. [4].
2 Adversarial Planning
We consider discrete adversarial planning problems under full observability with alternating turns. More formally, similar to STRIPS problems [6], an adversarial planning problem is given by a set of states S = 2^P over a finite set of propositions P, an initial state I ⊆ P, two finite sets of operators Op and Oa (controlled by the protagonist p and antagonist a, respectively), and a goal condition G ⊆ P. Operators have the form o = ⟨pre, add, del⟩, where pre ⊆ P is the precondition and add, del ⊆ P are the add and delete lists of o. An operator o is applicable in a state s ⊆ P iff pre ⊆ s, and if applied, leads to the successor state s′ = (s \ del) ∪ add. A state s is a goal state iff G ⊆ s. The players take alternating turns, starting with
1 University of Freiburg, Germany, {bercherp,mattmuel}@informatik.uni-freiburg.de. This work was partly supported by the German Research Council (DFG) as part of the Transregional Collaborative Research Center "Automatic Verification and Analysis of Complex Systems" (SFB/TR 14 AVACS). See www.avacs.org for more information.
the protagonist controlling Op. We assume that the player to move is known in each state. The protagonist tries to reach a goal state in a finite number of steps, whereas the antagonist tries to prevent him from doing so. A winning strategy for the protagonist is a function mapping states in which he is to move to applicable operators, such that, against each possible strategy of the antagonist, a goal state will be reached in a finite number of steps. Such an adversarial planning problem naturally corresponds to the problem of evaluating an AND/OR graph over the state space. OR (AND) nodes correspond to states where the protagonist (antagonist) is to move and arcs correspond to operator applications. The relevant part of a winning strategy for the protagonist corresponds to an acyclic subgraph containing (a) the initial state, (b) for each contained non-goal AND node all outgoing arcs and their target nodes, (c) for each contained non-goal OR node exactly one outgoing arc and its target node, and no further nodes or arcs, such that all leaf nodes are goal states.
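A minimal Python sketch of the state model just defined, with states as sets of propositions and operator application s′ = (s \ del) ∪ add (the toy operator is adapted from the rocket example that follows):

# States are frozensets of propositions; operators are (pre, add, delete) triples.
def applicable(state, op):
    pre, add, delete = op
    return pre <= state

def apply_op(state, op):
    pre, add, delete = op
    return (state - delete) | add

def is_goal(state, goal):
    return goal <= state

# Toy operator: fly from London to Paris, consuming the fuel.
state = frozenset({"inAL", "full"})
fly = (frozenset({"inAL", "full"}), frozenset({"inAP"}), frozenset({"inAL", "full"}))
if applicable(state, fly):
    state = apply_op(state, fly)
print(is_goal(state, frozenset({"inAP"})))  # True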
[Figure 1. Cargo transport from London to Paris. The initial state is depicted on the upper left hand side, goal states are doubly framed. The protagonist moves in elliptic, the antagonist in rectangular nodes; arcs are labeled with the actions load, unload, fly, fuel, and nop.]
Consider for example a modified version of the Simple Rocket domain [2] with one airplane/rocket whose tank can be either full or empty, a set of cities, and a set of cargo packages which can be loaded and unloaded. Possible actions are flying from one city to another if the tank is full, loading a package into the plane, unloading a package from the plane unless the same package has just been loaded without an intervening flying action, fueling the plane if necessary, and performing no-ops. Flying and loading can only be done by the protagonist, fueling only by the antagonist, and unloading and no-ops by both, with the antagonist being barred from two consecutive no-ops without a flight in between. The goal of the protagonist is to transport the packages to specified target cities. The agents take turns, starting with the protagonist.
| cities | pack's | BFS time | BFS mem | BFS nodes | AO*+FF time | AO*+FF mem | AO*+FF nodes | AO*+adv. FF time | AO*+adv. FF mem | AO*+adv. FF nodes | MBP time | MBP BDD nodes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 0.014 | 1 | 44 | 0.025 | 1 | 37 | 0.026 | 1 | 37 | 0.000 | 6601 |
| 2 | 2 | 0.048 | 2 | 152 | 0.071 | 1 | 88 | 0.072 | 1 | 78 | 0.016 | 84424 |
| 3 | 3 | 0.354 | 6 | 2106 | 0.202 | 6 | 625 | 0.260 | 7 | 628 | 0.380 | 23068 |
| 3 | 4 | 0.870 | 49 | 8211 | 0.463 | 28 | 1871 | 0.232 | 17 | 605 | 1.780 | 165718 |
| 3 | 5 | 5.556 | 159 | 43785 | 1.437 | 98 | 6917 | 0.321 | 23 | 794 | 9.041 | 365272 |
| 3 | 6 | 87.691 | 987 | 237264 | 16.323 | 397 | 63498 | 1.157 | 25 | 4164 | 44.287 | 546666 |
| 4 | 6 | — | 3098 | 722750 | 76.718 | 698 | 169349 | 82.701 | 642 | 194304 | 130.064 | 834704 |
| 4 | 7 | — | 2192 | 771629 | 373.553 | 1840 | 510738 | 99.639 | 1487 | 225544 | — | — |
| 4 | 8 | — | 3889 | 912816 | — | 3356 | 738520 | — | 5440 | 914602 | — | — |

Figure 2. Experimental results for the transportation benchmark problems. We used a Java implementation, running on a machine with two Quad Xeon processors, 2.66 GHz, and a memory limitation of 16 GB RAM. The time-out, indicated by dashes, was set to ten minutes. Times are given in seconds, memory usage in MB. Memory usage and node counts in case of time-outs are the current values when the time-out occurred.
Assume two cities, Paris and London, one package to be transported from London (atCL) to Paris (atCP), and the plane initially in London (inAL) with its tank empty (¬full). The variable “nop” is true iff the adversary has already performed a no-op since the last flight. A winning strategy for the protagonist is depicted in Figure 1.
3 Search Algorithm and Heuristic
As search algorithm, we used AO* [10] with maximization of cost estimates at AND nodes. The performance of the AO* algorithm depends on the choice of the evaluation function applied to the fringe nodes. To compute this function, we used an adaption of the graphplan-based [2] distance heuristic used in the FF planning system [9]. Just like the heuristic of the FF planning system, to which we will refer as FF heuristic, the adversarial FF heuristic uses relaxed operators, which we get by ignoring delete lists. For each agent ag ∈ {p, a}, let O+_ag be the set of relaxed operators he controls. Fig. 3 shows the pseudocode of the adversarial FF heuristic. Lines 1 to 3 are equal to the forward step of the FF heuristic, except that there is not only one set of relaxed operators, but two distinct sets O+_ag that belong to the two agents ag. Lines 4 to 11 correspond to the backward step of the FF heuristic. In addition, in line 12, the selected operators are put in two distinct sets SO+_ag, one for each agent. After these two sets have been completely computed, in line 13 the value of the adversarial FF heuristic is calculated as follows: since both agents move in turn, the number of moves needed to execute the plan is at most twice the number of operators contained in the larger one of the sets SO+_ag, which we call SO+_max. First, we calculate how many operators have to be applied by agent max ∈ {p, a}, which is r := |SO+_max| − |SO+_max ∩ O+_max̄|, where O+_max̄ is the set of relaxed operators agent max̄ ∈ {p, a} \ {max} controls. The value of the heuristic can then be calculated as max{2r, |SO+_p| + |SO+_a|}.
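The final value computation fits in a few lines. The sketch below is our own reconstruction (with assumed set-based encodings), taking the two selected-operator sets after the balancing step of line 13 of Fig. 3:

```python
# Sketch of the adversarial FF heuristic value (our reconstruction): given
# the balanced selected-operator sets SO+_p and SO+_a and the relaxed
# operator sets O+_p and O+_a, return max{2r, |SO+_p| + |SO+_a|}.
def adversarial_ff_value(so_p: set, so_a: set, o_p: set, o_a: set) -> int:
    if len(so_p) >= len(so_a):              # the larger set defines agent max
        so_max, o_other = so_p, o_a
    else:
        so_max, o_other = so_a, o_p
    # r: operators in SO+_max not controlled by the other agent
    r = len(so_max) - len(so_max & o_other)
    return max(2 * r, len(so_p) + len(so_a))
```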
4 Experimental Results
We experimented with solvable problems from the example domain described above with varying numbers of cities and packages. We compared running times, memory usage and node creations for uninformed breadth-first search, AO* search with the FF heuristic under the assumption of full cooperation, and AO* search with the adversarial FF heuristic. In addition, we encoded the same tasks as conditional planning problems under full observability in NuPDDL and solved them using MBP [5]. The results are summarized in Fig. 2.
5 Conclusion
The results in Fig. 2 suggest that in domains where the antagonist controls operators that may contribute to a plan, AO* search with the adversarial FF heuristic often outperforms AO* search with the FF heuristic and uninformed search. It is competitive with the symbolic approach used in MBP.
1  while G is not contained in the current layer i do
2      Let S[i] be the set of all state variables in layer i.
3      Let O+[i] be the set of all relaxed operators that are applicable in layer i and that belong to agent ag ∈ {p, a}, who is to move in layer i. Increment i.
4  Let G[m] be G.
5  for layer j := m − 1 to 0 do
6      foreach state variable g ∈ G[j + 1] do
7          if g ∈ S[j] then
8              Put g into G[j].
9          else
10             Put a relaxed operator o+ into SO+[j] that is in O+[j] and that creates g.
11             Put the precondition pre of o+ into G[j].
12     Put all selected operators of SO+[j] into SO+_ag, the set of all selected operators of agent ag ∈ {p, a} who is to move in layer j.
13 If possible, shift operators from SO+_p to SO+_a (or vice versa) to ensure that the difference between |SO+_p| and |SO+_a| is as small as possible. Calculate and return the least number of moves that will be needed to apply all operators of the two rearranged sets.

Figure 3. Adversarial FF heuristic.
REFERENCES
[1] Pascal Bercher and Robert Mattmüller, ‘A Planning Graph Heuristic for Forward-Chaining Adversarial Planning’, Technical Report 238, Albert-Ludwigs-Universität Freiburg, Institut für Informatik, (2008).
[2] Avrim L. Blum and Merrick L. Furst, ‘Fast Planning Through Planning Graph Analysis’, in Proc. of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI’95), pp. 1636–1642, (1995).
[3] Blai Bonet and Héctor Geffner, ‘Planning as Heuristic Search’, Artificial Intelligence, 129(1–2), 5–33, (2001).
[4] Daniel Bryce, Subbarao Kambhampati, and David E. Smith, ‘Planning Graph Heuristics for Belief Space Search’, Journal of Artificial Intelligence Research, 26, 35–99, (2006).
[5] Alessandro Cimatti, Marco Pistore, Marco Roveri, and Paolo Traverso, ‘Weak, Strong, and Strong Cyclic Planning via Symbolic Model Checking’, Artificial Intelligence, 147(1–2), 35–84, (2003).
[6] Richard E. Fikes and Nils J. Nilsson, ‘STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving’, Artificial Intelligence, 2(3–4), 189–208, (1971).
[7] Michael R. Genesereth, Nathaniel Love, and Barney Pell, ‘General Game Playing: Overview of the AAAI Competition’, AI Magazine, 26(2), 62–72, (2005).
[8] Eric A. Hansen and Shlomo Zilberstein, ‘Heuristic Search in Cyclic AND/OR Graphs’, in Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI’98), pp. 412–418, (1998).
[9] Jörg Hoffmann and Bernhard Nebel, ‘The FF Planning System: Fast Plan Generation Through Heuristic Search’, Journal of Artificial Intelligence Research, 14, 253–302, (2001).
[10] Nils J. Nilsson, Principles of Artificial Intelligence, Springer, 1980.
10. Perception, Sensing and Cognitive Robotics
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-925
Vector Valued Markov Decision Process for robot platooning
Matthieu Boussard and Maroua Bouzid and Abdel-Illah Mouaddib1

1 Introduction
Many approaches have been dedicated to the coordination of the movements of situated agents, such as flocking, where the global behaviour of the agents is controlled by three simple rules (separation, alignment and cohesion), or platoon formation, where agents steer towards a position following a leader. The common property of all these group movements is that the global behaviour emerges from local behaviours. Our general problem is how to formalise the local behaviours as local decision making processes and how the local interactions may lead to a coherent global behaviour. In this paper, we focus our discussion on platoon formation, for which many approaches based on longitudinal control techniques [8] have been proposed. By using local sensing, the agents can maintain a certain distance from the closest agent. In systems using longitudinal control techniques, it is necessary to have a platoon leader allowing the group to move towards the goal. But those systems consider neither the uncertainty in an explicit way nor other possible interactions between agents, such as precedence constraints [2] or common resources to share with others. In [7], a framework has been developed to formalize the impact of a local decision on the group by considering two new criteria based respectively on positive and negative effects. The local decision process of each agent considers the individual and the group interests, and the framework is based on vector-valued MDPs [5] to manage those two criteria. This approach considers all the possible interactions at any state, which leads to a framework with high complexity. To reduce this complexity, in [3] only local interactions have been considered, where the impact of a local decision on the perceived agents is assessed. In this approach, an agent perceives agents in its neighbourhood and develops on-line a 2V-MDP to derive a policy to behave in this neighbourhood. The algorithm satisfies the following constraints: (Dynamic change) the approach is suitable for world changes since only the neighbourhood is considered to make a decision. (Scalability) the approach is applicable in a real system because each agent constructs a small 2V-MDP with limited space, since it formalizes only the decision process problem in a limited area. Also, the underlying DEC-MDP is considered as a set of separate MDPs where the expected value is augmented to consider the interactions with the other MDPs. (Local coordination) an agent can interact with a limited number of agents, and in a limited number of locations; this reduces significantly the complexity of the problem. (Optimal when possible) the behaviour of an agent is optimal when it is in an “easy situation”.
1 GREYC, Université de Caen, France, email: {mboussar, bouzid, mouaddib}@info.unicaen.fr
2 Background and related works
The problem addressed here can be seen as a problem of collective decision making and multi-agent planning [10]. Platooning is a kind of flocking problem, where each agent should maintain the cohesion of the group. For this purpose many formalisms based on flocking approaches have been developed [9]. The basic idea is to maintain a global shape of the group of agents, while each agent perceives only its local environment and its close neighbours. The main benefits of these approaches are strong scalability and the possibility to manage a huge number of agents. However, the drawbacks of these approaches are the lack of an optimality proof and the lack of expressiveness to consider different kinds of interactions. Longitudinal control techniques are also used in platooning approaches [8]. These techniques aim to keep a safety distance between the platoon leader and the closest neighbour, based on local coordination. The platoon leader can also give orders to the rest of the group. In these approaches, the platoon must have a leader, which is the unique member that has an explicit goal. However, the large number of messages exchanged between the platoon members may be a limitation. Planning approaches allow a precise description of the target goal and of the environment. The DEC-MDP [1] has been proposed in order to support decentralized applications, but in the general case the complexity is too high for real applications. However, recent works have shown the possibility of using this framework for large-scale applications [2] by considering some specific and local interactions. Our approach is similar, but it offers a rich model of local interactions, uses a precise description of the environment, and, before selecting an action, the agents adapt their behaviours according to their local perceptions.
3 The Platooning problem
Let a group of agents be in a start area; from there, they have to reach a goal area as fast as possible. There are no requirements, neither for the arrival order, nor for the position in the group on the way. The safety of all agents is the most important constraint. Due to the uncertainty, agents have to respect a safety inter-agent spacing. If this spacing is too large, it can worsen the quality of the solution. We define a platoon of a group of agents as a set of mobile robots, each trying to manage its distance from its nearest neighbour, so that the group can reach the goal efficiently. We assume that the agents are evolving in a dynamic, fully observable, discrete-space environment, and that each agent perceives only a limited number of agents in its neighbourhood. All the agents use the same world model, and they all know the same goal area. There is a limited number of actions. We suppose that those actions are non-deterministic; the outcomes
of each action are ruled by a probability distribution. Because communication is sometimes impossible (e.g., in a hostile area), we consider that agents do not use communication to coordinate their activities. We also do not need an explicit platoon leader.
3.1 The platooning problem as a 2V-DEC-MDP
Because, as mentioned before, the world is fully observable, we formalize the planning problem of a single agent as an MDP ⟨S, A, T, R⟩, which allows it to compute (off-line) its optimal mono-agent policy. The coordination with the other agents is made on-line. The agents know locally the exact positions of the other agents, so the state s contains the exact position of agent i and also the exact positions of the neighbouring agents. Before each decision, the impacts of the actions are computed. Once those interactions are determined, they are used by an agent to assess the expected value of its decision. Indeed, to each decision a ∈ A in state s ∈ S a vector of values (ER(s, a), JER(s, a), JEPenalty(s, a)) is assigned. This vector represents, respectively, the individual expected value ER(s, a) (Expected Reward), the expected gain of the group JER(s, a) (Joint Expected Reward) due to the local decision a (positive impact), and the expected opportunity cost of this decision JEPenalty(s, a) (negative impact). More details are given in [3].
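As a rough illustration, an agent's on-line choice could look like the following sketch. This is our own code with a simple additive scalarization of the three criteria; the actual combination rule of the 2V-DEC-MDP is given in [3] and may differ (e.g., a lexicographic ordering).

```python
# Hypothetical sketch of vector-valued action selection: rank actions by the
# individual expected reward plus the group's expected gain minus the
# expected opportunity cost imposed on the neighbourhood.
def action_value(er: float, jer: float, jep: float,
                 w_self: float = 1.0, w_group: float = 1.0) -> float:
    # One plausible scalarization; lexicographic orderings are another option.
    return w_self * er + w_group * (jer - jep)

def best_action(s, actions, ER, JER, JEP):
    """ER, JER, JEP: mappings from (state, action) to the three criteria."""
    return max(actions, key=lambda a: action_value(ER[s, a], JER[s, a], JEP[s, a]))
```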
4 First evaluations
This simulation shows the emergence and the dynamic change of the leader. In Figure 1c, the exit gets closed, and the agents have to take another way to reach the goal. The leader changes and takes the lead until the goal. In Figure 1d, the platoon exits the tunnel and goes towards the goal. The online part of the algorithm is linear with respect to the number of neighbour agents. For 50 agents, it takes around 10 ms to select the action.

Figure 1. Emergence of a new leader: (a) start; (b) formation of the platoon; (c) re-planning, new leader; (d) movement of the platoon.

5 Discussion and theoretical consideration

Emergence of the leader. As mentioned before, in this approach we have never selected an agent as the leader of the platoon. It appears, however, that this leader actually exists. The analysis of the leader's emergence is one of the main issues of this work. This issue has two parts. The first is how to identify a leader in a group of agents. The second is to find which action will make an agent a leader. Once the leader is identified, we could apply a technique as in [6] to improve the global behaviour. The leader's change should also be studied.

Long-term impact of actions. The 2V-DEC-MDP allows us to express short-term impacts of actions. But it appears that a few actions of some agents affect the behaviour of the whole group. At the end of the platoon move, agents reach an equilibrium. We are studying the type of equilibrium that arises. We are interested in three kinds of equilibria: Nash, Pareto, and Stackelberg.

6 Conclusion

A coordination framework, based on local observations, has been presented in this paper. It uses the MDP framework to describe the planning problem, and we showed how to express the local relations in the 2V-DEC-MDP. It allows platoon emergence without explicit leader designation. Furthermore, if some changes appear in the world, a new leader can emerge. When a whole group of agents is blocked, only a small number of agents may be the cause of the blockage. The first extension of this work is to add a reinforcement learning algorithm [4]. We would like this learning algorithm to detect deadlocks and, by learning the behaviour of the other agents, to solve them. We will use equilibria from game theory to detect the agent that should learn. The second part of this work will be the analysis of the kind of equilibria this algorithm attains.

REFERENCES
[1] Daniel S. Bernstein, Shlomo Zilberstein, and Neil Immerman, ‘The complexity of decentralized control of Markov decision processes’, in UAI ’00: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pp. 32–37. Morgan Kaufmann Publishers Inc., (2000).
[2] Aurélie Beynier and Abdel-Illah Mouaddib, ‘An iterative algorithm for solving constrained decentralized Markov decision processes’, in AAAI, (2006).
[3] Matthieu Boussard, Maroua Bouzid, and Abdel-Illah Mouaddib, ‘Multi-criteria decision making for local coordination in multi-agent systems’, in Proceedings of the International Conference on Tools with Artificial Intelligence (ICTAI’07), (October 2007).
[4] Jérôme Chapelle, Olivier Simonin, and Jacques Ferber, ‘How situated agents can learn to cooperate by monitoring their neighbors’ satisfaction’, in ECAI, pp. 68–72, (2002).
[5] Kazuyoshi Wakuta, ‘Vector valued Markov decision processes with average reward criterion’, Probability in the Engineering and Informational Sciences, 14, 533–548, Cambridge, 2000.
[6] Ville Könönen, ‘Asymmetric multiagent reinforcement learning’, in IAT ’03: Proceedings of the IEEE/WIC International Conference on Intelligent Agent Technology, p. 336, Washington, DC, USA, (2003). IEEE Computer Society.
[7] Abdel-Illah Mouaddib, Matthieu Boussard, and Maroua Bouzid, ‘Towards a framework for multi-objective multiagent planning’, in AAMAS, (2007).
[8] David J. Naffin, Gaurav S. Sukhatme, and Mehmet Akar, ‘Lateral and longitudinal stability for decentralized formation control’, in Proceedings of the International Symposium on Distributed Autonomous Robotic Systems, pp. 421–430, (June 2004).
[9] Craig W. Reynolds, ‘Steering behaviors for autonomous characters’, in Proceedings of the Game Developers Conference, pp. 763–782, (1999).
[10] David H. Wolpert and Kagan Tumer, ‘Collective intelligence, data routing and Braess’ paradox’, in JAIR, (2002).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-927
Learning to Select Object Recognition Methods for Autonomous Mobile Robots
Reinaldo A. C. Bianchi1,2 and Arnau Ramisa2 and Ramón López de Mántaras2

Abstract. Selecting which algorithms should be used by a mobile robot computer vision system is a decision that is usually made a priori by the system developer, based on past experience and intuition, without systematically taking into account information that can be found in the images and in the visual process itself to learn, at execution time, which algorithm should be used. This paper presents a method that uses Reinforcement Learning to decide which algorithm should be used to recognize objects seen by a mobile robot in an indoor environment, based on simple attributes extracted on-line from the images, such as mean intensity and intensity deviation. Two state-of-the-art object recognition algorithms can be selected: the constellation method proposed by Lowe, together with its interest point detector and descriptor, the Scale-Invariant Feature Transform (SIFT), and a bag-of-features approach. A set of empirical evaluations was conducted using an image database acquired with a household mobile robot, and the results obtained show that the approach adopted here is very promising.
1 INTRODUCTION
Reinforcement Learning (RL) [7] is concerned with the problem of learning from interaction to achieve a goal, for example, an autonomous agent interacting with its environment via perception and action. On each interaction step the agent senses the current state s of the environment and chooses an action a to perform. The action a alters the state s of the environment, and a scalar reinforcement signal r (a reward or penalty) is provided to the agent to indicate the desirability of the resulting state. The policy π is some function that tells the agent which actions should be chosen, and it is learned through trial-and-error interactions of the agent with its environment. Several algorithms have been proposed as strategies to learn an optimal policy π∗ when the model (T and R) is not known in advance, for example, the Q-learning [8] and SARSA [6] algorithms. Some researchers have used RL as a technique to optimize image segmentation and object recognition algorithms. For example, Peng et al. used RL to learn, from input images, to adapt the image segmentation parameters of a specific algorithm to changing environmental conditions, in a closed-loop manner [1, 5], and Draper et al. modeled the object recognition problem as a Markov Decision Problem and proposed a method to learn sequences of image processing operators for detecting houses in aerial images [2]. To allow a robotic agent to decide which object recognition method should be used during on-line world exploration, we propose to use RL to learn a policy that minimizes computing time, discarding an image if it is not suitable for analysis or choosing between two well-known algorithms, described in the following section.
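As a reminder of the mechanics assumed throughout the paper, the tabular Q-learning update is a one-liner. The following is a generic sketch; the state encoding and rewards used in this paper appear in Section 3.

```python
# Standard tabular Q-learning update [8]:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Usage: Q = defaultdict(float); since Q-learning is off-policy, the pairs
# (s, a) may be sampled randomly during training, as done in this paper.
```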
1 Centro Universitário da FEI, São Bernardo do Campo, Brazil. 2 Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Spain.
2 TWO OBJECT RECOGNITION METHODS
Two successful general object recognition approaches that have been widely used are the constellation method proposed by Lowe, together with its interest point detector and descriptor SIFT [3], and a bag-of-features approach [4].

The first approach is a single-view object detection and recognition system with some interesting characteristics for mobile robots, the most significant of which are the ability to detect and recognize objects at the same time in an unsegmented image and the use of an algorithm for approximate fast matching. In this approach, individual descriptors of the features detected in a test image are initially matched to the ones stored in the object database using the Euclidean distance. False matches are rejected if the distance of the first nearest neighbor is not distinctive enough when compared with that of the second. Once a set of matches is found, the generalized Hough transform and Iteratively Reweighted Least Squares are used to cluster each match and to estimate the most probable affine transformation for every hypothesis.

The Bag of Features (BoF) approach to object classification comes from the text categorization domain, where the occurrence of certain words in documents is recorded and used to train classifiers that can later recognize the subject of new texts. This technique has been adapted to visual object classification by substituting the words with local descriptors such as SIFT. The descriptor space is discretized into a codebook created by applying hierarchical k-means to a dataset of descriptors. A histogram of descriptor occurrences is built to characterize an image. Next, a multi-class classifier – the k-NN in this implementation – is trained with the histograms of local descriptor counts. The class of the object in the image is determined as the dominant one among the k nearest neighbors.

Although both object recognition methods have proved their reliability in real-world applications, they have their limitations: Lowe’s method performs poorly when recognizing sparsely textured objects or objects with repetitive textures, while the Bag of Features needs an accurate segmentation stage prior to classification, which can be very time consuming. Furthermore, the method depends on the quality of that segmentation stage to provide good results.
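The BoF pipeline just described reduces to a few steps. The sketch below is our own illustration (it uses flat nearest-neighbour quantization instead of the hierarchical k-means codebook of the paper, which only changes assignment speed):

```python
import numpy as np

def bof_histogram(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantize local descriptors against a codebook and return a
    normalized histogram of visual-word occurrences."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    h = np.bincount(words, minlength=len(codebook)).astype(float)
    return h / max(h.sum(), 1.0)

def knn_classify(h, train_hists, train_labels, k=5):
    """Class of the test histogram = dominant class among its k nearest
    training histograms."""
    nearest = np.argsort(np.linalg.norm(train_hists - h, axis=1))[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```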
3 EXPERIMENTS AND RESULTS
In order to decide which algorithm should be used by the agent, the RL problem was defined as a 2-stage MDP, with 2 possible actions in each stage. In the first stage, the agent must decide whether the image contains an object, and thus must be recognized, or whether the image does not contain objects and can be discarded, saving processing time. In the second stage, the agent must decide which object recognition algorithm should be used: Lowe’s or Bag of Features.
At each stage the agent chooses a system state s, composed of the stage the agent is at plus a combination of simple attributes extracted on-line from the images, for example, mean image intensity and standard deviation. Then, it selects an action to be executed, computes the reward and updates the value function. The RL algorithm used is Q-learning [8], because it directly approximates the optimal policy independently of the policy being followed (it is an off-policy method), allowing the state and the action to be executed by the agent to be selected randomly. The rewards used during the learning phase are computed using a set of training images. If the state in which the agent is corresponds to a training image, and the action taken results in a correct classification, the agent receives a reward; otherwise it is zero. For example, if we have a training image that does not contain an object, with a mean intensity value of 50 and a standard deviation of 10, the reward given to Q(stage = 1, mean = 50, std = 10, action = discard) is 100.

Several experiments were executed using a dataset consisting of approximately 150 images of objects occurring in typical household environments plus 30 background images. The objects, which can be textured, untextured or with repetitive textures, are mugs, books, trashcans, chairs and computer monitors (Figure 1). The images include occlusions, illumination changes, blur and other typical nuisances that can be encountered while navigating with a mobile robot.

To evaluate the result of the learning process, the statistical validation method called Leave-One-Out was used. Six different experiments were conducted, using three different combinations of image attributes as state space and two different image sizes (the original size and a 10 by 10 pixels reduced-size image). The combinations of image attributes used as state space are: mean and standard deviation of the image intensity (MS); mean and standard deviation of the image intensity plus entropy of the image (MSE); and mean and standard deviation of the image intensity plus the number of interest points detected by the Difference of Gaussians operator (MSI). The parameters used in the experiments were: learning rate α = 0.1 and discount factor γ = 0.9. Values in the Q table were randomly initialized.

Tables 1 and 2 present the results obtained. The first line of Table 1 shows the percentage of times that the agent correctly chose to discard a background image, and the second line shows the percentage of times the agent correctly chose to use the Lowe algorithm instead of the BoF. The columns in this table present the results for the six experiments, the first three using the original image and, from the fourth to the sixth column, showing the results for the reduced-size image. The last column shows the percentage of times a human expert takes the correct action. Table 2 is similar to Table 1, but shows the classification error. The first line shows the percentage of images discarded as background when they should have been analyzed, and line two presents the percentage of times the Lowe algorithm is chosen when the correct one is the BoF.

Figure 1. Images from the dataset.

Table 1. Correctly classified images (percentage).

|      | MS (Full Img) | MSE (Full Img) | MSI (Full Img) | MS (Small Img) | MSE (Small Img) | MSI (Small Img) | Expert |
|------|------|------|------|------|------|------|------|
| Back | 80.4 | 100.0 | 100.0 | 82.6 | 100.0 | 100.0 | 100.0 |
| Lowe | 52.3 | 93.2 | 22.7 | 63.6 | 93.2 | 11.4 | 93.2 |

Table 2. Incorrect classification (percentage).

|      | MS (Full Img) | MSE (Full Img) | MSI (Full Img) | MS (Small Img) | MSE (Small Img) | MSI (Small Img) | Expert |
|------|------|------|------|------|------|------|------|
| Back | 4.8 | 0.0 | 1.4 | 3.4 | 0.7 | 1.4 | 8.2 |
| Lowe | 25.5 | 0.0 | 6.9 | 18.6 | 0.0 | 6.9 | 10.8 |

These results show that the use of the MSE combination presented very good results, for original-size images as well as reduced-size ones. On the other hand, the use of the number of interest points detected by the Difference of Gaussians operator as state space did not produce good results.
4 CONCLUSION
The results obtained show that using Reinforcement Learning to decide which algorithm should be used to recognize objects yields good results, performing better than a human expert in some cases. To the best of our knowledge, there is no similar approach using automatic selection of algorithms for object recognition. Future work includes testing other image attributes that can be used as the system’s state and other RL algorithms, and applying RL techniques to the image segmentation problem.
ACKNOWLEDGEMENTS
This work has been partially funded by the FI grant and the BE grant from the AGAUR, the 2005-SGR-00093 project, supported by the Generalitat de Catalunya, the MID-CBR project grant TIN 2006-15140-C03-01 and FEDER funds. Reinaldo Bianchi is supported by CNPq grant 201591/2007-3.
REFERENCES
[1] B. Bhanu, Y. Lin, G. Jones, and J. Peng, ‘Adaptive target recognition’, Machine Vision and Applications, 11(6), 289–299, (2000).
[2] B. A. Draper, J. Bins, and K. Baek, ‘ADORE: Adaptive object recognition’, in International Conference on Vision Systems, pp. 522–537, (1999).
[3] D. Lowe, ‘Distinctive image features from scale-invariant keypoints’, International Journal of Computer Vision, 60(2), 91–110, (2004).
[4] D. Nister and H. Stewenius, ‘Scalable recognition with a vocabulary tree’, in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pp. 2161–2168, (2006).
[5] J. Peng and B. Bhanu, ‘Closed-loop object recognition using reinforcement learning’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(2), 139–154, (1998).
[6] G. A. Rummery and M. Niranjan, ‘On-line Q-learning using connectionist systems’, Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department, (1994).
[7] R.S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[8] C. J. C. H. Watkins, Learning from Delayed Rewards, PhD Thesis, University of Cambridge, 1989.
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-929
Robust Reservation-Based Multi-Agent Routing
Adriaan ter Mors and Xiaoyu Mao1 and Jonne Zutt and Cees Witteveen2 and Nico Roos3

1 Problem description

In a multi-agent routing problem, agents must find the shortest-time path from source to destination while avoiding deadlocks and collisions with other agents. Agents travel over an infrastructure of resources (such as intersections and road segments between intersections), and each resource r has (i) a capacity C(r), which is the maximum number of agents that may occupy the resource at the same time, and (ii) a minimum travel time D(r). Example 1 illustrates this multi-agent routing problem.
The multi-agent routing problem often occurs in application domains of Automated Guided Vehicles (AGVs), such as transportation of goods in warehouses, or loading and unloading of ships at container terminals. The multi-agent routing problem is also relevant in taxiway planning on airports. The quality of a routing method is judged not only on the basis of its efficiency (i.e., in terms of the time required by the agents to reach their destinations), but also on the basis of its ability to deal with changing circumstances and unexpected incidents. Examples of incidents in the application domains mentioned above are human interference with AGVs (e.g. by people stepping in the path of an AGV) in a warehouse scenario, or the delay of a ship (or aircraft) at the (air)port.
2 Reservation-based multi-agent routing
Figure 1: Infrastructure of unit-capacity resources.
Example 1. Figure 1 shows an example of a multi-agent routing problem. There is an infrastructure of 14 resources: resources r1 to r7 represent intersections or interesting locations, whereas resources r8 to r14 represent lanes between the intersections. All resources have a capacity of one and the same minimum travel time. Suppose we have two agents, A1 that wants to go from r1 to r7, and agent A2 that wants to go from r5 to r2. The optimal individual plans for these agents are p1 and p2 respectively:

p1 = (r1, 1), (r8, 2), (r3, 3), (r10, 4), (r4, 5), (r14, 6), (r7, 7)
p2 = (r5, 1), (r11, 2), (r4, 3), (r10, 4), (r3, 5), (r9, 6), (r2, 7)
In the above, (r1, 1) in p1 means that during time unit 1, agent A1 is travelling on resource r1. These two plans cannot both be put into action, as they are in conflict with each other: both agents plan to travel on resource r10 during time unit 4, but this is not possible, since each resource can hold at most one agent at the same time.
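The conflict is easy to detect mechanically. The following sketch (an assumed plan encoding, not the authors' implementation) flags every resource/time pair whose capacity is exceeded:

```python
from collections import defaultdict

def find_conflicts(plans, capacity):
    """plans: dict agent -> list of (resource, time); capacity: r -> C(r)."""
    usage = defaultdict(list)
    for agent, plan in plans.items():
        for resource, t in plan:
            usage[(resource, t)].append(agent)
    return {rt: ags for rt, ags in usage.items() if len(ags) > capacity[rt[0]]}

p1 = [("r1", 1), ("r8", 2), ("r3", 3), ("r10", 4), ("r4", 5), ("r14", 6), ("r7", 7)]
p2 = [("r5", 1), ("r11", 2), ("r4", 3), ("r10", 4), ("r3", 5), ("r9", 6), ("r2", 7)]
cap = defaultdict(lambda: 1)                      # unit-capacity resources
print(find_conflicts({"A1": p1, "A2": p2}, cap))  # {('r10', 4): ['A1', 'A2']}
```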
There are two ways to solve this conflict. The first is that either A1 or A2 (but not both) does not make use of r10 , by making a detour along r6 . The second solution is that one of the agents waits until the other has passed. If, as we assume in this paper, we only optimize for time (as opposed to e.g. distance travelled), then the first solution is the best.
1 Almende BV, {adriaan,xiaoyu}@almende.org
2 Delft University of Technology, {j.zutt,c.witteveen}@tudelft.nl
3 Maastricht University, roos@micc.unimaas.nl
In a reservation-based approach to multi-agent routing, agents plan their route by reserving time intervals on resources; these reservations should be made in such a way that an agent’s plan specifies its location (i.e., the resources) at each point in time. Furthermore, the routing method should ensure that reservations of different agents are not in conflict with each other. The definition of a conflict we employ here is simply that the capacity of a resource may not be exceeded at any point in time.⁴ We have developed a routing algorithm that a single agent can use to find a conflict-free plan, given a set of reservations previously made by other agents. Our algorithm is optimal, in the sense that it finds the shortest-time conflict-free plan for this agent. The algorithm can be described as a shortest path search through the free time window graph, where a free time window is a time interval associated with a resource, in which the resource can accommodate at least one more agent. Our approach is similar to that of Kim and Tanchoco [1]. However, their distinction between lanes and intersections (as opposed to only having resources), combined with explicitly checking for conflicts, results in a computational complexity of O(A⁴R²), whereas our algorithm has a complexity of O(AR log(AR) + AR²). The full algorithm, the proof of its correctness, and the analysis of its worst-case complexity can be found in [4].

⁴ A more advanced conflict definition, in which agents are not allowed to overtake each other (catching-up conflicts) or pass each other by (head-on conflicts), can also be modeled in our framework, but we do not show this here.
Example 2. We have the same two agents: A1 with source and destination r1 and r7, and A2 with r5 and r2. Suppose they have the following plans, in which A2 plans to wait (in resource r11) until A1 has passed:

p1 = (r1, 1), (r8, 2), (r3, 3), (r10, 4), (r4, 5), (r14, 6), (r7, 7)
p2 = (r5, 1), (r11, 2), (r4, 6), (r10, 7), (r3, 8), (r9, 9), (r2, 10)
3 Dealing with incidents
It stands to reason that carefully crafted plans, which detail all actions of all agents at each point in time, may be obsoleted even by minor incidents in the environment. In their survey paper on design and control of AGV systems, Le-Anh and De Koster [2] wrote “a small change in the schedule may destroy it completely”, referring to the reservation-based routing method of Kim and Tanchoco [1]. The truth of this statement depends on the existence and quality of mechanisms that can repair route plans. The quality of a repair mechanism depends on (i) the cost of the repaired plan in relation to the cost of the original plan, (ii) the similarity between the original and the repaired plan (the more similar the better), and (iii) the computational effort required to perform the repairs. Maza and Castagna [3] proposed a repair mechanism designed to prevent deadlocks that is both computationally inexpensive and adheres closely to the original plan. In Section 4, we investigate the cost of repaired plans when combining Maza’s mechanism with our route planning algorithm. To see how a delay of one agent can create a deadlock, consider again the infrastructure of Figure 1.
Suppose that in the execution of his plan, A1 is delayed in resource r3 until time 7. To resume his journey, A1 wants to go to r10, but that resource is occupied by A2; similarly, A2 is also stuck, since the next resource in his plan, r3, is occupied by A1.

The idea of Maza and Castagna is to determine for each resource which agent will enter the resource first, second, etc. This information can be derived from the plans of the agents. Then, during the execution of the plans, an agent is only allowed to enter a resource when it is his turn. In our example, A2 is the second agent to enter r4, so it will wait in resource r11 until its turn has come, which is after A1 has exited r4.
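Our reading of this mechanism fits in a few lines. The sketch below (assumed data structures, not the authors' implementation) derives per-resource entry queues from the plans and lets an agent enter a resource only when it is at the head of the queue:

```python
from collections import defaultdict, deque

def entry_orders(plans):
    """plans: dict agent -> list of (resource, planned_entry_time)."""
    entries = defaultdict(list)
    for agent, plan in plans.items():
        for resource, t in plan:
            entries[resource].append((t, agent))
    # Per resource: agents in the order in which they plan to enter it.
    return {r: deque(a for _, a in sorted(ev)) for r, ev in entries.items()}

def may_enter(order, agent, resource):
    return bool(order[resource]) and order[resource][0] == agent

def enter(order, agent, resource):
    assert may_enter(order, agent, resource)
    order[resource].popleft()   # the next agent in line gets its turn afterwards
```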
4 Evaluating robustness

We evaluate the ability of our routing method to deal with change (its robustness) by measuring the delay caused by the deadlock-prevention mechanism. This mechanism delay is the time agents have to wait before they are allowed to enter a resource (or the time they have to wait behind other agents that are waiting for clearance to enter a resource, as we do not allow overtaking in our experiments). We had the following experimental setup: first, all agents make a route plan for their (randomly chosen) start and destination locations. Then, in a simulation environment, the agents try to execute their plans. If these (reservation-based) plans are executed perfectly, then no conflicts will occur and all agents will arrive at their destinations on time. However, we generate random incidents that cause agents to stop for a fixed duration, potentially blocking other agents behind them. Over different experiment runs, we varied the following parameters: (i) the infrastructure: we used random networks, small-world networks, lattice networks, and a map of an actual airport; (ii) the number of agents in the system; (iii) the frequency and duration of incidents. The frequency is a value p that represents, for every resource in the agent’s plan, a chance of p of having an incident.

Figure 2: Mechanism delay for Amsterdam Airport Schiphol infrastructure, plotting relative mechanism delay (%) against the number of agents. Incident parameters: HH = (p=0.1, duration=120s); HL = (p=0.1, duration=30s); LH = (p=0.01, duration=120s); LL = (p=0.01, duration=30s).

Figure 2 shows the mechanism delay, averaged over all agents, as a percentage of the agents’ planned travel time. At least three noteworthy conclusions can be drawn from this figure: first, as the number of agents increases, the relative mechanism delay decreases. It turns out that the increased congestion in the system is more important than any increase in complexity that might result in more mechanism delay. Second, even for a high frequency (p = 0.1) of long incidents (duration = 120 s), the mechanism delay is never more than 15% of planned travel time. Third, for a low frequency (p = 0.01) of short incidents (duration = 30 s), there is no discernible impact on plan quality. Experiments conducted on other types of infrastructures produced figures similar to Figure 2: on lattice networks and small-world networks, mechanism delays were slightly smaller (maximum relative mechanism delays around 10%), whereas for random networks they were higher, with a maximum relative mechanism delay of 30%.

ACKNOWLEDGEMENTS
This research is supported by NWO (Netherlands Organization for Scientific Research), Grant No. CSI4006.

REFERENCES
[1] Chang W. Kim and J.M.A. Tanchoco, ‘Conflict-free shortest-time bidirectional AGV routeing’, International Journal of Production Research, 29(1), 2377–2391, (1991).
[2] Tuan Le-Anh and M.B.M. De Koster, ‘A review of design and control of automated guided vehicle systems’, European Journal of Operational Research, 171(1), 1–23, (May 2006).
[3] Samia Maza and Pierre Castagna, ‘A performance-based structural policy for conflict-free routing of bi-directional automated guided vehicles’, Computers in Industry, 56(7), 719–733, (2005).
[4] Adriaan W. ter Mors, Jonne Zutt, and Cees Witteveen, ‘Context-aware logistic routing and scheduling’, in Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling, (2007).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © Government of Canada 2008. All rights reserved. doi:10.3233/978-1-58603-891-5-931
Automatic Animation Generation of a Teleoperated Robot Arm
Khaled Belghith and Benjamin Auder and Froduald Kabanza and Philipe Bellefeuille1 and Leo Hartman2

Abstract. In this paper we describe the Automatic Task Demonstration Generator (ATDG), a system implemented in a software prototype for teaching the operation of a robot manipulator deployed on the International Space Station (ISS). The ATDG combines path planning and camera planning to take into account the complexity of the manipulator, the limited direct view of the ISS exterior, and the unpredictability of lighting conditions in the workspace. The path-planning algorithm not only avoids obstacles in the workspace, as is normal for a path planner, but in addition takes into account the position of corridors for safe operations and the placement of cameras on the ISS. The camera planner is then invoked to find the right arrangement of cameras to follow the manipulator on its trajectory. This allows the on-the-fly production of useful and pedagogical task demonstrations to help the student carry out tasks involving the manipulation of the robot on the ISS. Even though the system has been developed for robotic manipulations, it could be used for any application involving the filming of unpredictable complex scenes.
1 Introduction
The Space Station Remote Manipulator System (SSRMS) is an articulated robot arm mounted on the International Space Station (ISS). The SSRMS is a key component of the ISS, used in the assembly, maintenance and repair of the station, and also for moving payloads from visiting shuttles. Astronauts operate the SSRMS through a workstation located inside one of the ISS compartments. The workstation has an interface with three monitors, each connected to a camera placed at a strategic location on the ISS. There are a total of 14 cameras on the ISS. Making the right camera choices for each of the three monitors available in the robotic workstation is essential for the operator to have a good awareness of the space when manoeuvering the arm. Operators manipulating the SSRMS in orbit receive support from ground operations. Part of this support consists in visualizing and validating manoeuvres before they are actually carried out. In order to improve the ground support operations on the SSRMS, we have developed the Automatic Task Demonstration Generator (ATDG), which generates 3D animations that demonstrate how to perform a given task with the SSRMS. The ATDG is integrated within the RObot MANipulation Tutoring System (Roman Tutor) [5], a simulator for the command and control of the SSRMS (Figure 1).
1 University of Sherbrooke, Canada, email: {khaled.belghith, benjamin.auder, kabanza, philipe.bellefeuille}@usherbrooke.ca
Figure 1. Roman Tutor Student Interface
Filming a trajectory of the SSRMS is a particular case of the problem of automatic movie generation. Previous approaches can be generally classified into constraint satisfaction methods and idiom-based approaches. Constraint-satisfaction methods [2] work at the level of the frame. Given a set of constraints about the objects to appear in the frame, they find the camera parameters that best satisfy these constraints. Idiom-based approaches [4] are based on cinematography principles. They establish a formalization of these principles to reduce the large search space produced by the many degrees of freedom the camera has in each frame of the animation. A key difference between these applications and ours is that in their case, they have a detailed script of the animation at the design phase, with well identified scenes and corresponding semantics. Hence, constraints for the placement of objects and the types of camera shots for different scenes are specified off-line at the design phase. In our case, the trajectory for the SSRMS has to be generated online, depending on the task at hand; we do not have a script specifying beforehand all the scenes of interest and how they should be filmed. Our main contribution is to actually explain how idiombased approaches can be adapted to filming complex robot arm trajectories by integrating an automated segmentation of the trajectory into scenes depending on some spatial and cognitive task specifications. Another difference between previous approaches and ours deals with the nature of the domain. A number of general-purpose rules have been developed in the literature constraining the types of camera shots used for filming people or animated characters. These rules do not apply when the object being filmed is an articulated arm, so we had to introduce more appropriate ones.
2 ATDG - Automatic Task Demonstration Generator
The ATDG system takes as input a start and a goal configuration for the SSRMS. It generates a movie demonstrating how to move the SSRMS from the start to the goal configuration. The ATDG algorithm sequentially performs the following steps:

1. Calls the path-planner to compute the trajectory from the start to the goal configuration
2. Segments the trajectory into scenes
3. Calls the camera planner to plan the shots on the scenes

The path-planner implements the FADPRM algorithm introduced by Belghith et al. [3], which takes into account collisions and visibility constraints. Collisions are treated as hard constraints on trajectories that must be avoided at any cost, whereas visibility constraints are handled as preferences among desirable trajectories. This approach generates safe collision-free trajectories such that the robot is visible at all times from one or more of the cameras.

In order to categorize the movements performed by the SSRMS, decompose them into scenes, and shoot them correctly using specific idioms, it was necessary to add new information to the trajectory provided by the FADPRM path planner. This new information takes the form of a geometric decomposition of the workspace. The trajectory found by the path-planner is mapped within these geometric decompositions to produce a series of corridors. Each of these corridors corresponds to a specific scene category. A list of idioms is associated with each category of scene, as in normal idiom-based animation generation. This geometric decomposition and the choice of idioms for each scene category are done manually by a domain expert. We plan to construct a complete module that will automatically generate these decompositions from the actual state of the ISS including, among others, the geometry of the workplace, visibility constraints and luminosity.

The trajectory mapped within the succession of corridors is then passed to the camera planner. For each portion of the path in a single corridor, the camera planner will try to select the best suitable idiom. The selection of the best idiom in each corridor depends on the quality of the rendering and takes into account the cinematic principles guaranteeing continuity between shots and thus consistency of the final movie.

In ATDG, each shot in an idiom is distinguished by three key attributes: shot type, camera placement mode, and camera zooming mode.

Shot types. Five shot types are currently defined in the ATDG system: Static, GoBy, Pan, Track and Pov. A Static shot, for example, is done from a static camera when the robot is in a constant position or moving slowly, whereas in a Track shot, a camera follows the robot and keeps a constant distance from it.

Camera placements. For each shot type, the camera can be placed in five different ways according to some given line of interest: External, Parallel, Internal, Apex and External II. Currently, we take the trajectory of the robot’s center of gravity as the line of interest, which allows filming of a number of typical manoeuvres. For larger coverage of manoeuvres, additional lines of interest will be added later.

Zoom modes. For each shot type and camera placement, the zoom of the camera can be in five different modes: Extreme Close up, Close up, Medium View, Full View and Long View.
Figure 2. Idiom to film the SSRMS anchoring a component on the ISS
Figure 2 shows an idiom illustrating the anchoring of a new component on the ISS. It starts with a Track shot following the robot while moving on the truss, then another Track shot showing the rotation of one joint on the robot to align with the ISS structure, and finally a Static shot focusing on the anchoring operation. In DCCL [4], idioms are specified using planning operators, so that the sequence of shots is generated by a planner. We follow a similar approach but use a different planner and another idiom specification language. In our case, we specify idioms in the Planning Domain Definition Language (PDDL 3.0) and use the TLPlan system [1]. Intuitively, a PDDL operator specifies preferences about shot types in time and in space depending on the robot manoeuvre. Parsing the trajectory of the robot mapped within the corridors designating the successive scenes, the planner tries to find a succession of shots that captures the best possible idioms. The planner also takes into account the cinematic principles to ensure consistency of the resulting movie. Idioms and cinematic principles are in fact encoded in the form of temporal logic formulas within the planner.
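Abstracting away the temporal logic encoding, the corridor-by-corridor selection can be pictured as follows. This is a simplified sketch with hypothetical structures, not the actual TLPlan encoding; the `score` and `continuity_ok` callables stand in for the rendering-quality estimate and the cinematic continuity rules.

```python
from typing import List, NamedTuple

class Shot(NamedTuple):
    kind: str        # Static, GoBy, Pan, Track or Pov
    placement: str   # External, Parallel, Internal, Apex or External II
    zoom: str        # Extreme Close up ... Long View

class Idiom(NamedTuple):
    scene_category: str
    shots: List[Shot]

def select_idioms(corridors, idioms, score, continuity_ok):
    """Greedy stand-in for the planner: per corridor, pick the applicable
    idiom with the best score whose first shot keeps continuity with the
    previous shot (assumes at least one candidate always exists)."""
    movie, prev = [], None
    for category in corridors:
        candidates = [i for i in idioms if i.scene_category == category
                      and (prev is None or continuity_ok(prev, i.shots[0]))]
        best = max(candidates, key=score)
        movie.extend(best.shots)
        prev = best.shots[-1]
    return movie
```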
3 Conclusion and Future work
We have introduced a heuristic technique for segmenting the robot trajectory and an approach for defining idioms for robot manoeuvres, allowing us to adapt idiom-based approaches to the automatic filming of robot manipulations. As we are using the TLPlan system, this framework also opens up interesting avenues for developing efficient search control knowledge for this particular application domain, and possibly for learning it. There are widespread expectations that TLPlan and the planning techniques it incorporates are useful in real-world applications, and ATDG is one of the first examples.
REFERENCES
[1] F. Bacchus and F. Kabanza, ‘Using temporal logics to express search control knowledge for planning’, Artificial Intelligence, 116(1–2), 123–191, (2000).
[2] W.H. Bares, J.P. Gregoire, and J.C. Lester, ‘Real-time constraint-based cinematography for complex interactive 3D worlds’, in Association for the Advancement of Artificial Intelligence (AAAI/IAAI), pp. 1101–1106, (1998).
[3] K. Belghith, F. Kabanza, L. Hartman, and R. Nkambou, ‘Anytime dynamic path-planning with flexible probabilistic roadmaps’, in IEEE International Conference on Robotics and Automation (ICRA), pp. 2372–2377, (2006).
[4] D.B. Christianson, S.E. Anderson, L. He, D.H. Salesin, D.S. Weld, and M.F. Cohen, ‘Declarative camera control for automatic cinematography’, in Association for the Advancement of Artificial Intelligence (AAAI), pp. 148–155, (1996).
[5] F. Kabanza, R. Nkambou, and K. Belghith, ‘Path-planning for autonomous training on robot manipulators in space’, in International Joint Conference on Artificial Intelligence (IJCAI), pp. 1729–1731, (2005).
ECAI 2008 M. Ghallab et al. (Eds.) IOS Press, 2008 © 2008 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-58603-891-5-933
Planning, Executing, and Monitoring Communication in a Logic-based Multi-agent System
Martin Magnusson and David Landén and Patrick Doherty1

1 Introduction
Imagine the chaotic aftermath of a natural disaster. Teams of rescue workers search the affected area for people in need of help, but they are hopelessly understaffed and time is short. Fortunately, they are aided by a small fleet of autonomous unmanned aerial vehicles (UAVs). The UAVs help in quickly locating the injured by scanning large parts of the area from above using infrared cameras and communicating the information to the command and control center (CCC) in charge of the emergency relief operation. An autonomous agent carrying out tasks in such dynamic environments must automatically construct plans of action adapted to the current situation and the other agents. Its multi-agent plans involve both physical actions, which affect the world, and communicative actions, which affect the other agents’ mental states. In addition, assumptions made during planning must be monitored during execution so that the agent can autonomously recover should its plans fail. The strong interdependency between these capabilities can be captured in a formal logic. We take advantage of this by building a multi-agent system that reasons directly with the logical specification using automated theorem proving. Our implementation and its integration with a physical robot platform, in the form of an autonomous helicopter, go some way towards demonstrating that this idea is not only theoretically interesting, but practically feasible.
2 Speech Acts in TAL
The system is based on automated reasoning in Temporal Action Logic (TAL) [1], a first-order logic for commonsense knowledge about action and change. Inspired by Morgenstern’s work [3], we extend TAL with syntactic operators for representing agents’ mental states and beliefs. A formula preceded by a quote is a regular first-order term that serves as a name of that formula. Alternatively, one may use a backquote, which facilitates quantifying-in by exposing variables inside the backquoted expression to binding by quantifiers. With quotation one may pass (names of) formulas as arguments to regular first-order predicates, without introducing modal operators. E.g., the fact that the UAV believes, at noon, that there were, at 11:45, five survivors in cell 2,3 in a coordinate grid of the disaster area can be expressed by:

(Believes uav 12:00 ’(= (value 11:45 (survivors (cell 2 3))) 5))
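The mechanism is easier to see with a concrete (if crude) encoding. In the sketch below (ours, not the authors' theorem prover), a formula is just an ordinary term, here a nested tuple, so a belief atom can take a name of a formula as a regular argument:

```python
# Quoted formulas as first-order terms: nested tuples stand in for terms.
Formula = tuple

def believes(agent: str, time: str, quoted: Formula) -> Formula:
    return ('Believes', agent, time, quoted)

fact = believes('uav', '12:00',
                ('=', ('value', '11:45', ('survivors', ('cell', 2, 3))), 5))
# A backquote would leave variables inside the quoted term exposed to
# quantifiers, e.g. a placeholder like ('Var', 'n') in place of 5.
```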
1 Linköping University, Sweden, email: {marma,davla,patdo}@ida.liu.se. This work is supported in part by a grant from the Swedish Research Council (VR), the National Aeronautics Research Program NFFP04 S4203, CENIIT, and the Strategic Research Center MOVIII, funded by the Swedish Foundation for Strategic Research, SSF.
This epistemic extension of TAL enables us to characterize communication in terms of actions that affect the mental states of others. Such speech acts form a basis for planning both physical and communicative actions in the same framework, as is done e.g. by Perrault, Allen, and Cohen [4]. Speech acts have also been adopted by research on agent communication languages (ACL) such as the widely used FIPA ACL², which establish standards that ensure interoperability between different multi-agent systems. With the help of quotation in TAL we formulate the FIPA inform, informRef, and request speech acts. These can be used by agents to communicate beliefs to, and to incur commitment in, other agents.
3 Planning
Planning with speech acts is, in our framework, the result of proving a goal while abductively assuming action occurrences that satisfy three kinds of preconditions. The action must be physically executable by an agent during some time interval (b e], the agent must have a belief that identifies the action, and the agent must be committed to the action occurring, at the start of the time interval:

(→ (∧ (Executable agent (b e] action)
      (Believes agent b `(ActionId `action `actionid))
      (Committed agent b `(Occurs agent (b e] action)))
   (Occurs agent (b e] action))

Executability preconditions are different for each action and are therefore part of the specification of an action. The belief preconditions are satisfied when the agent knows identifiers for the arguments of a primitive action [2]. The time point at which an action is executed is also critically important, but it seems overly restrictive to require that the agent holds beliefs that identify the action occurrence time points. Actions that do not depend on external circumstances can be executed whenever the agent so chooses, without deciding upon an identifiable clock time in advance. Actions that do depend on external circumstances can also be successfully executed, as long as the agent is sure to know the correct time point when it comes to pass. This is precisely what the concept of dynamic controllability captures. Following Vidal and Fargier [6], we denote time points controlled by the agent by b and time points over which the agent has no control by e. The temporal dependencies between actions form a simple temporal network with uncertainty (STNU) that can be checked for dynamic controllability to ensure an executable plan.
Finally, the commitment precondition can be satisfied in one of two ways. Either the agent adds the action to its own planned execution schedule (described below), or it uses the request speech act to delegate the action to another agent, thereby ensuring commitment.
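To make the temporal machinery concrete, here is a hedged sketch of an STNU as a set of constraint edges over controllable (b) and uncontrollable (e) time points. It only checks simple-temporal consistency via Bellman-Ford negative-cycle detection, which is a necessary condition; full dynamic controllability checking in the style of Vidal and Fargier is considerably more involved, and all names here are illustrative:

```python
class STNU:
    """Sketch of a simple temporal network with uncertainty (illustrative)."""

    def __init__(self):
        self.points = set()
        self.uncontrollable = set()
        self.edges = []  # distance-graph edges (u, v, w) meaning v - u <= w

    def add_constraint(self, u, v, lo, hi, contingent=False):
        """Require lo <= v - u <= hi; contingent marks v as uncontrollable."""
        self.points |= {u, v}
        if contingent:
            self.uncontrollable.add(v)
        # Distance-graph encoding: v - u <= hi and u - v <= -lo.
        self.edges += [(u, v, hi), (v, u, -lo)]

    def stn_consistent(self):
        """Necessary condition only: no negative cycle (Bellman-Ford)."""
        dist = {p: 0 for p in self.points}
        for _ in range(len(self.points)):
            for u, v, w in self.edges:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
        return all(dist[u] + w >= dist[v] for u, v, w in self.edges)

net = STNU()
net.add_constraint("fly_b", "fly_e", 5, 15, contingent=True)  # flight duration
net.add_constraint("fly_e", "scan_b", 0, 10)                  # scan after flying
print(net.stn_consistent())  # True: no negative cycle in the distance graph
```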
4 Execution
Scheduled actions are tied to the STNU through the explicit time points in the Occurs predicate. An STNU execution algorithm propagates time windows during which these time points need to occur. Executed time points are bound to the current clock time, and action occurrences scheduled at those time points are proved dispatched using the following axiom:

(→ (∧ (ActionId `action `id)
      (ProcedureCall agent (b e] id))
   (Dispatch agent (b e] action))

The axiom forces the theorem prover to find an action identifier with standardized arguments for the ProcedureCall predicate. This is the link between the automated reasoner and the execution sub-system, in that the predicate is proved by looking up the procedure associated with the given action and calling it. But the actions are often still too high-level to be passed directly to the low-level system. An example is the action of scanning a cell of the coordinate grid with the infrared camera. This involves using a scan pattern generator, flying the generated trajectory, and applying the image processing service to identify humans in the video footage. The assumption is that the scanning of a grid cell will always proceed in the manner just described, so there is no need to plan its sub-actions. Such macro-actions, and primitive physical actions, are realized (in simulation, so far) by an execution framework built using the Java agent development framework (JADE, http://jade.tilab.com/). It encapsulates the agent so that all communication is channeled through a standardized interface as FIPA ACL speech acts.
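A plausible reading of this reasoner-to-execution link is a dispatch table from action identifiers to procedures, as in the sketch below; the registration decorator, the stubbed UAV services, and the scan macro-action body are assumptions for illustration, not the system's actual API:

```python
# Sketch: ProcedureCall is "proved" by looking up the procedure registered
# under an action identifier and calling it (illustrative names throughout).
procedures = {}

def register(action_id):
    """Associate a procedure with an action identifier."""
    def wrap(fn):
        procedures[action_id] = fn
        return fn
    return wrap

def procedure_call(agent, interval, action_id, *args):
    """Dispatch an action occurrence by executing its registered procedure."""
    return procedures[action_id](agent, interval, *args)

# Hypothetical stubs standing in for the real low-level UAV services.
def generate_scan_pattern(cell): return ["wp1", "wp2", "wp3"]
def fly_trajectory(agent, trajectory): pass
def detect_humans_in_video(cell): return 5

@register("scan")
def scan(agent, interval, cell):
    # Macro-action: its sub-steps are fixed, so they need not be planned.
    trajectory = generate_scan_pattern(cell)
    fly_trajectory(agent, trajectory)
    return detect_humans_in_video(cell)

survivors = procedure_call("uav", ("b", "e"), "scan", ("cell", 2, 3))
```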
5 Monitoring
Executing the plan will satisfy the goal as long as the abduced assumptions hold up. But the real world is an unpredictable place, and unexpected events are sure to conspire to interfere with any non-trivial plan. To detect problems early, we continually evaluate the assumptions that are possible to monitor. E.g., the agent's plan might rely on some aspect of the environment to persist, in effect making a frame assumption. A failure of such an assumption produces a percept that is added to the agent's knowledge base. A simple truth maintenance system removes assumptions that are contradicted by observations and unchecks goals that were previously checked off as completed but that depended on the failed assumptions. This immediately gives rise to plan revision and failure recovery as the theorem prover tries to re-establish those goals. If the proof of the unchecked goals succeeds, the revision will have had minimal effect on the original plan. A failed proof means that the current sub-goal is no longer viable in the context of the execution failure, and the revision is extended by dropping the sub-goals one at a time. This process continues until a revision has been found or the main goal is dropped and the mission fails.
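The truth-maintenance step just described might look roughly like the following sketch, where the dependency bookkeeping (depends_on) and the contradiction test are our assumptions rather than the paper's data structures:

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    name: str
    done: bool = False
    depends_on: set = field(default_factory=set)  # assumptions the proof used

def contradicts(percept, assumption):
    # Illustrative: a percept ("not", P) contradicts the assumption P.
    return percept == ("not", assumption)

def maintain(percept, assumptions, goals):
    """Retract contradicted assumptions and uncheck goals that used them."""
    failed = {a for a in assumptions if contradicts(percept, a)}
    assumptions -= failed
    for g in goals:
        if g.done and g.depends_on & failed:
            g.done = False  # reopened: the prover tries to re-establish it
    return failed

assumptions = {("radio", "uav", "ccc"), ("location", "uav")}
goals = [Goal("report_survivors", True, {("radio", "uav", "ccc")})]
maintain(("not", ("radio", "uav", "ccc")), assumptions, goals)
assert not goals[0].done  # the reporting goal is unchecked for revision
```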
6 Multi-agent Scenario
The theory presented so far needs to be complemented with an automated reasoner. The current work utilizes a theorem prover named ANDI, which is based on Pollock's natural deduction system that makes use of unification [5]. Natural deduction is an interesting alternative to the widely used resolution method. The set of proof rules is extensible and easily accommodates special-purpose rules that make reasoning more efficient. ANDI incorporates specialized inference rules for reasoning with quoted expressions and beliefs according to the rules and axioms of TAL. This enables ANDI to process the following scenario in less than two seconds on a laptop with a 2.4GHz Intel Core 2 Duo Mobile T7700 processor.
Suppose that the CCC wants to know the number of survivors in grid cell 2,3 at 13:00. The UAV produces the following plan (in addition to an STNU that orders the actions in time):

(fly (cell 2 3))
(scan (cell 2 3))
(informRef ccc '(value 13:00 (survivors (cell 2 3))))

The success of the plan depends on two persistence assumptions that were made during planning and that are monitored during execution, namely that (location uav) is not affected between flying and scanning, and that (radio uav ccc) indicates a functioning radio communication link. There is also an assumption of the persistence of the survivor count, though this is impossible for our UAV to monitor since it cannot see the relevant area all at once. If one of the survivors leaves, then the plan revision process will take the resulting survivor count discrepancy into account when it is discovered.
Suppose, however, that due to the large distance and hostile geography of the area (or some other unknown error) the radio communication stops functioning while the UAV is scanning the area, before reporting the results. The UAV perceives that the fluent (radio uav ccc) was not persistent, and the truth maintenance system successively removes incompatible assumptions and sub-goals until a revised plan is found:

(informRef mob '(value 13:00 (survivors (cell 2 3))))
(request mob '(Occurs mob (b e] (informRef ccc '(value 13:00 (survivors (cell 2 3))))))

The new plan involves requesting help from another mobile agent (mob). By communicating the survivor count to this "middle man", and requesting it to pass on the information to the CCC, the UAV ensures that the CCC gets the requested information. Another set of assumptions now requires monitoring, namely (radio mob ccc) and (radio uav mob). While the UAV cannot monitor the other agent's radio communication, it will be monitored if that agent is also running our agent architecture. At this point let us assume that no further failures ensue, so that the knowledge gathering assignment is completed successfully within this paper's page limit.
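To tie the scenario to the earlier sketches, the plans and their monitored persistence assumptions can be written down as plain s-expression-style data; this encoding is hypothetical and only restates the scenario above:

```python
# The scenario's plans and monitored assumptions, as s-expression-style data.
survivor_query = ("value", "13:00", ("survivors", ("cell", 2, 3)))

plan = [
    ("fly", ("cell", 2, 3)),
    ("scan", ("cell", 2, 3)),
    ("informRef", "ccc", survivor_query),
]
monitored = {("location", "uav"), ("radio", "uav", "ccc")}

# After (radio uav ccc) fails, revision routes the answer via agent mob.
revised_plan = [
    ("informRef", "mob", survivor_query),
    ("request", "mob",
     ("Occurs", "mob", ("b", "e"), ("informRef", "ccc", survivor_query))),
]
monitored_after_revision = {("radio", "uav", "mob"), ("radio", "mob", "ccc")}
```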
REFERENCES
[1] Patrick Doherty and Jonas Kvarnström, 'Temporal action logics', in Handbook of Knowledge Representation, eds., Vladimir Lifschitz, Frank van Harmelen, and Bruce Porter, Elsevier, (2007).
[2] Robert Moore, 'Reasoning about knowledge and action', Technical Report 191, AI Center, SRI International, Menlo Park, CA, (1980).
[3] Leora Morgenstern, Foundations of a Logic of Knowledge, Action, and Communication, Ph.D. dissertation, New York, NY, USA, 1988.
[4] Raymond C. Perrault, James F. Allen, and Philip R. Cohen, 'Speech acts as a basis for understanding dialogue coherence', in TINLAP'78, pp. 125–132, (1978).
[5] John L. Pollock, 'Natural deduction', Technical report, Department of Philosophy, University of Arizona, (1999).
[6] Thierry Vidal and Hélène Fargier, 'Handling contingency in temporal constraint networks: From consistency to controllabilities', JETAI, 11(1), 23–45, (1999).