Saman K. Halgamuge, Lipo Wang (Eds.) Computational Intelligence for Modelling and Prediction
Studies in Computational Intelligence, Volume 2

Editor-in-chief: Prof. Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warsaw, Poland. E-mail: [email protected]

Further volumes of this series can be found on our homepage: springeronline.com

Vol. 1. Tetsuya Hoya, Artificial Mind System – Kernel Memory Approach, 2005, ISBN 3-540-26072-2
Vol. 2. Saman K. Halgamuge, Lipo Wang (Eds.), Computational Intelligence for Modelling and Prediction, 2005, ISBN 3-540-26071-4
Saman K. Halgamuge Lipo Wang (Eds.)
Computational Intelligence for Modelling and Prediction
Dr. Saman K. Halgamuge
Associate Professor and Reader, Mechatronics and Manufacturing Research Group, Department of Mechanical and Manufacturing Engineering, The University of Melbourne, Victoria 3010, Australia
E-mail: [email protected]

Dr. Lipo Wang
Associate Professor, School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, 50 Nanyang Avenue, Singapore 639798, Singapore
E-mail: [email protected]
Library of Congress Control Number: 2005926347
ISSN print edition: 1860-949X
ISSN electronic edition: 1860-9503
ISBN-10 3-540-26071-4 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-26071-4 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media (springeronline.com)
© Springer-Verlag Berlin Heidelberg 2005
Printed in The Netherlands

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: by the authors and TechBooks using a Springer LaTeX macro package
Printed on acid-free paper
Preface

Understanding the decision-making processes of living systems, and the efforts to mimic them, are identified with research in Artificial Intelligence (AI). However, the recent popularity of neural networks, fuzzy systems and evolutionary computation, which are widely considered as areas related to AI, has created a need for a new definition to distinguish them from traditional AI techniques. Lotfi Zadeh, the inventor of fuzzy logic, has suggested the term "Soft Computing." He created the Berkeley Initiative in Soft Computing (BISC) to connect researchers working in these new areas of AI. In contrast to hard computing, soft computing techniques account for the possibility of imprecision, uncertainty and partial truth.

The first joint conference of neural networks, fuzzy systems, and evolutionary computation organized by the Institute of Electrical and Electronics Engineers (IEEE) in 1994 was the World Congress on Computational Intelligence. In his paper at this historic joint conference, James Bezdek defined three kinds of intelligence to distinguish Computational Intelligence from traditional AI and living systems: biological or organic, artificial or symbolic, and computational or numeric. All natural creatures belong to the first category, while traditional AI techniques remain in the second category. Computational Intelligence (CI) is the group of techniques based on numerical or sub-symbolic computation mimicking the products of nature. However, a number of "new AI" methods have found a home in CI. The application of CI in emerging research areas, such as Granular Computing, Mechatronics, and Bioinformatics, shows its maturity and usefulness. Recently, the IEEE changed the name of its Neural Networks Society to the IEEE Computational Intelligence Society.

The International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), held in Singapore in 2002 in conjunction with two other conferences in CI, has led to the publication of these two edited volumes. This volume contains CI methods and applications in modelling, optimisation and prediction. The other volume, entitled "Classification and Clustering for Knowledge Discovery" and from the same publisher, includes the papers on clustering and classification.

Chapters 1-10 present theory and methods of fuzzy and hybrid systems useful in modelling and prediction. In contrast to the unipolar scale of the unit interval [0,1] of a fuzzy set, a more psychologically justified interval [-1,1] is considered in Chapter 1. The qualities of metal containers are predicted using a fuzzy model coupled with a vision system in Chapter 2. An extended neuro-fuzzy system is proposed in Chapter 3 using the inclusion of fuzzy relations that enhance learning and rule flexibility. A new information retrieval system is presented in Chapter 4, which combines a fuzzy relevance model and semantic-based indexing. A novel algorithm on backward reasoning for deciding on the sequence of transition firing of a fuzzy Petri net is proposed in Chapter 5. The generalized rough approximation of a fuzzy set is investigated in Chapter 6 using a weak asymmetric relation. A new approach to the fuzzy shortest path problem, including a new heuristic method to find the fuzzy shortest length, is proposed in Chapter 7. A method to evaluate the degree of agreement in a group of experts, and the application of the method in medical diagnosis, is presented in Chapter 8. A method that represents and processes fuzzy information at the coarse granular level is proposed in Chapter 9. A study on intuitionistic fuzzy relational images and their application in the representation of linguistic hedges and in the development of a meaningful concept of a fuzzy rough set is presented in Chapter 10.

A book on Modelling and Prediction must have the essence of the various applications. We capture a range of applications, from image processing (Chapters 11-15) and audio processing (Chapters 16-17) to commerce and finance (Chapters 18-20), communication networks (Chapters 21-22), and other applications (Chapters 23-28). The prediction of weed dispersal using Geographic Information Systems (GIS) spatial images, optimisation of image compression, multi-layer image transmission with inverse pyramidal decomposition, multiple feature relevance feedback in content-based image retrieval, and non-iterative independent component analysis (ICA) for detecting motion in image sequences are the topics discussed in the five chapters on image processing. A practical approach to chord analysis in acoustic recognition and the concepts and applications of audio fingerprinting are presented in the two audio processing chapters. Mining patterns in the US stock market, a fuzzy rule-based trading agent, and a case study with the Brazilian Central Bank are presented in the three chapters on finance. Congestion control in packet networks and prediction of the timeout and retransmission of the Transmission Control Protocol (TCP) are the subjects of the two chapters on communication networks. Chapters on other applications include user satisfaction in web searching (Chapter 23), population studies (Chapter 24), information retrieval (Chapter 25), learning Boolean formulas from examples (Chapter 26), fuzzy linear programming (Chapter 27) and controlling hybrid power systems (Chapter 28).

Our thanks go to the Department of Mechanical and Manufacturing Engineering, University of Melbourne, Australia, and the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, which supported our project in various ways. We thank the many people who supported the International Conference on Fuzzy Systems and Knowledge Discovery (FSKD) held in Singapore in 2002. We especially thank its honorary chair Prof. Lotfi Zadeh, who suggested the conference's name and motivated us throughout the process. Most of the papers in this book reflect the extended work from this conference. Our special thanks go to Mr. Chon Kit (Kenneth) Chan and Mr. Siddeswara Mayura Guru for managing the paper submission and formatting process, and Mr. Sean Hegarty for proofreading. We also acknowledge the partial support of the Australian Research Council. We are grateful to all the authors for enthusiastically submitting high quality work to this publication, and to Prof. Janusz Kacprzyk and Springer Verlag for realizing this book project.

Saman K. Halgamuge and Lipo Wang
March 8, 2005
Contents

Chapter 1. Symmetrization Of Fuzzy Operators: Notes On Data Aggregation
Wladyslaw Homenda and Witold Pedrycz ... 1

Chapter 2. An Integrity Estimation Using Fuzzy Logic
P. Mariño, C. A. Sigüenza, V. Pastoriza, M. Santamaría, E. Martínez and F. Machado ... 19

Chapter 3. Connectionist Fuzzy Relational Systems
Rafał Scherer and Leszek Rutkowski ... 35

Chapter 4. Semantic Indexing and Fuzzy Relevance Model in Information Retrieval
Bo-Yeong Kang, Dae-Won Kim and Sang-Jo Lee ... 49

Chapter 5. Backward Reasoning on Rule-Based Systems Modeled by Fuzzy Petri Nets Through Backward Tree
Rong Yang, Pheng-Ann Heng and Kwong-Sak Leung ... 61

Chapter 6. On The Generalization of Fuzzy Rough Approximation Based on Asymmetric Relation
Rolly Intan and Masao Mukaidono ... 73

Chapter 7. A New Approach for the Fuzzy Shortest Path Problem
Tzung-Nan Chuang and Jung-Yuan Kung ... 89

Chapter 8. Distances Between Intuitionistic Fuzzy Sets and their Applications in Reasoning
Eulalia Szmidt and Janusz Kacprzyk ... 101

Chapter 9. Efficient Reasoning With Fuzzy Words
Martin Spott ... 117

Chapter 10. Intuitionistic Fuzzy Relational Images
Martine De Cock, Chris Cornelis and Etienne E. Kerre ... 129

Chapter 11. Thematic Fuzzy Prediction of Weed Dispersal Using Spatial Dataset
Andrew Chiou and Xinghuo Yu ... 147

Chapter 12. Optimization of Image Compression Method Based on Fuzzy Relational Equations by Overlap Level of Fuzzy Sets
Hajime Nobuhara, Eduardo Masato Iyoda, Kaoru Hirota and Witold Pedrycz ... 163

Chapter 13. Multi-layer Image Transmission with Inverse Pyramidal Decomposition
Roumen Kountchev, Mariofanna Milanova, Charles Ford and Roumiana Kountcheva ... 179

Chapter 14. Multiple Feature Relevance Feedback in Content-Based Image Retrieval using Probabilistic Inference Networks
Campbell Wilson and Bala Srinivasan ... 197

Chapter 15. Non-Iterative ICA for Detecting Motion in Image Sequences
Yu Wei and Charayaphan Charoensak ... 209

Chapter 16. A Practical Approach to the Chord Analysis in the Acoustical Recognition Process
Marcin Szlenk and Wladyslaw Homenda ... 221

Chapter 17. Audio Fingerprinting Concepts And Applications
Pedro Cano, Eloi Batlle, Emilia Gómez, Leandro de C. T. Gomes and Madeleine Bonnet ... 233

Chapter 18. Mining Technical Patterns in The U.S. Stock Market through Soft Computing
Ming Dong and Xu-Shen Zhou ... 247

Chapter 19. A Fuzzy Rule-Based Trading Agent: Analysis and Knowledge Extraction
Tomoharu Nakashima, Takanobu Ariyama, Hiroko Kitano and Hisao Ishibuchi ... 265

Chapter 20. An Algorithm for Determining the Controllers of Supervised Entities: A Case Study with the Brazilian Central Bank
Vinícius Guilherme Fracari Branco, Li Weigang and Maria Pilar Estrela Abad ... 279

Chapter 21. Fuzzy Congestion Control In Packet Networks
Tapio Frantti ... 291

Chapter 22. Fuzzy Logic Strategy of Prognosticating TCP's Timeout and Retransmission
Zhongwei Zhang, Zhi Li and Shan Suthaharan ... 309

Chapter 23. Measuring User Satisfaction in Web Searching
M. M. Sufyan Beg and Nesar Ahmad ... 321

Chapter 24. Long-Range Prediction of Population by Sex, Age and District Based on Fuzzy Theories
Pyong Sik Pak and Gwan Kim ... 337

Chapter 25. An Efficient Information Retrieval Method for Brownfield Assessment with Sparse Data
Linet Ozdamar and Melek Basak Demirhan ... 355

Chapter 26. A Fuzzy Method for Learning Simple Boolean Formulas from Examples
Bruno Apolloni, Andrea Brega, Dario Malchiodi, Christos Orovas and Anna Maria Zanaboni ... 367

Chapter 27. Fuzzy Linear Programming: A Modern Tool For Decision Making
Pandian Vasant, R. Nagarajan and Sazali Yaacob ... 383

Chapter 28. Fuzzy Logic Control in Hybrid Power Systems
Josiah Munda, Sadao Asato and Hayao Miyagi ... 403
CHAPTER 1

Symmetrization Of Fuzzy Operators: Notes On Data Aggregation

Wladyslaw Homenda¹ and Witold Pedrycz²

¹ Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-661 Warsaw, Poland, [email protected]
² Dept. of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada T6G 2G7, and Systems Research Institute, Polish Academy of Sciences, 01-447 Warsaw, Poland, [email protected]
Abstract: Fundamental operators of set theory, fuzzy sets and uncertain information processing explore a unipolar scale of the unit interval [0,1]. However, it is known that human beings also handle other types of scale, including a bipolar one that can be expressed on the interval [−1,1]. This scale is of particular interest since it supports a representation of symmetric phenomena encountered in human behavior. In this paper, we investigate fundamental aspects of information aggregation operators from the standpoint of their symmetric behavior. Likewise, classical set operators are analyzed within the same setting. A new type of aggregation operators – balanced operators – is defined and discussed.

Keywords: unipolar and bipolar scale, information aggregation, t-norm and t-conorm, uninorm, balanced fuzzy operators
1 Introduction
Fundamentals of fuzzy sets were formulated on the basis of the max and min operators applied to membership functions [27]. These operators were then generalized to triangular norms. In both theory and applications, the concept of triangular norms, borrowed from [21], plays an important role. They are widely utilized in many areas, e.g. logic, set theory, reasoning, data aggregation, etc. To satisfy practical requirements, new operators have been proposed and developed. The study is structured into six sections. In Section 2 we start with a review of fuzzy operators, with special attention paid to triangular norms and uninorms. In Section 3 the motivation of this work is given. In Section 4 we introduce balanced operators based on triangular norms, then we study relations between balanced t-conorms and uninorms, and finally we define balanced uninorms, operators that satisfy intuitive constraints put on aggregation operators. The main sections of the paper are supplemented by conclusions and a bibliography.
2 PRELIMINARIES
So far, most studies of aggregation operators have been conducted on the unit interval [0,1]. The list of such operators includes triangular norms [21] and uninorms [18,26], which are of special interest in this study.
2.1 Triangular norms
Triangular norms were introduced in probabilistic metric spaces, cf. [21], and were then adopted as generalizations of union and intersection of fuzzy sets. Triangular norms are mappings from the unit square into the unit interval satisfying the following axioms:

Definition: Triangular norms (t-norms and t-conorms) are mappings p : [0,1] × [0,1] → [0,1], where p stands for both a t-norm t and a t-conorm s, satisfying the following axioms:
1. p(a, p(b, c)) = p(p(a, b), c)   (associativity)
2. p(a, b) = p(b, a)   (commutativity)
3. p(a, b) ≤ p(c, d) if a ≤ c and b ≤ d   (monotonicity)
4. t(1, a) = a for a ∈ [0,1], s(0, a) = a for a ∈ [0,1]   (boundary conditions)

Any t-norm t satisfies the boundary condition t(0, a) = 0 for all 0 ≤ a ≤ 1. For any t-conorm s we have s(1, a) = 1 for all 0 ≤ a ≤ 1.

A t-norm (t-conorm, respectively) is said to be strong if it is continuous and strictly monotone. A strong t-norm (t-conorm, respectively) satisfies the following condition: t(x, x) < x (s(x, x) > x, respectively) for all x ∈ (0,1).

Example: The following mappings are t-norms and t-conorms:
1. minimum and maximum, i.e. t(a, b) = min(a, b), s(a, b) = max(a, b);
2. product and probabilistic sum, i.e. t(a, b) = a·b, s(a, b) = a + b − a·b;
3. every strict t-norm t is generated by an additive generator f_t: t(x, y) = g_t(f_t(x) + f_t(y)), where:
   • f_t : [0,1] → [0,+∞], f_t(1) = 0, f_t(x) → +∞ as x → 0+,
   • f_t is continuous and strictly decreasing,
   • g_t denotes the inverse of f_t;
4. every strict t-conorm s is generated by an additive generator f_s: s(x, y) = g_s(f_s(x) + f_s(y)), where:
   • f_s : [0,1] → [0,+∞], f_s(0) = 0, f_s(x) → +∞ as x → 1−,
   • f_s is continuous and strictly increasing,
   • g_s denotes the inverse of f_s.
Note: the following two mappings, f_t : [0,1] → [0,+∞], f_t(x) = (1 − x)/x, and f_s : [0,1] → [0,+∞], f_s(x) = x/(1 − x), are applied as additive generators of t-norms and t-conorms in what follows. In Figures 8 and 9, the right upper quarters of the plots of the balanced t-norm and balanced t-conorm present the plots of the above t-norm and t-conorm.

A t-norm t and a t-conorm s are said to be dual if they satisfy the following condition with respect to the strong negation operator n(x) = 1 − x:

s(a, b) = 1 − t(1 − a, 1 − b)

Note: cf., for instance, [18] for weaker conditions on the representation of triangular norms by additive generators, [17,18] for studies on continuous triangular norms, and [18] for a discussion of fuzzy negation operators. The t-conorm s and the t-norm t can exchange places in this formula in order to define the t-norm dual to a given t-conorm. Duality of norms implies that properties of dual norms are dual. Note that max–min and product–probabilistic-sum are dual pairs of t-norms and t-conorms.
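As an illustration only (not part of the original text), the following Python sketch assembles a strict t-norm and t-conorm from the additive generators quoted above and checks the boundary and duality conditions numerically; all function names are ours.

```python
# Minimal sketch: t(x, y) = g_t(f_t(x) + f_t(y)) and s(x, y) = g_s(f_s(x) + f_s(y))
# with the generators f_t(x) = (1 - x)/x and f_s(x) = x/(1 - x).
import math

def f_t(x):              # additive generator of the t-norm: decreasing, f_t(1) = 0
    return math.inf if x == 0.0 else (1.0 - x) / x

def g_t(u):              # inverse of f_t
    return 0.0 if math.isinf(u) else 1.0 / (1.0 + u)

def f_s(x):              # additive generator of the t-conorm: increasing, f_s(0) = 0
    return math.inf if x == 1.0 else x / (1.0 - x)

def g_s(u):              # inverse of f_s
    return 1.0 if math.isinf(u) else u / (1.0 + u)

def t_norm(x, y):
    return g_t(f_t(x) + f_t(y))

def t_conorm(x, y):
    return g_s(f_s(x) + f_s(y))

a, b = 0.7, 0.9
print(t_norm(a, b) <= min(a, b), t_conorm(a, b) >= max(a, b))     # True True
print(t_norm(1.0, a), t_conorm(0.0, a))                           # boundary: both equal 0.7
print(abs(t_conorm(a, b) - (1.0 - t_norm(1.0 - a, 1.0 - b))))     # duality check: ~0
```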
2.2 Uninorms
Recently, unifications, extensions and generalizations of triangular norms have been introduced, cf. [4,18,26]. Both the neutral element 1 of a t-norm and the neutral element 0 of a t-conorm are boundary points of the unit interval. However, there are many important operations whose neutral element is an interior point of the underlying set. Employing this information, we can replace the boundary condition of the definition of triangular norms by any value of the unit interval [0,1]. The fact that the first three axioms of the definition of triangular norms coincide, i.e. the only axiomatic difference lies in the location of the neutral element, leads to close relations between triangular norms, uninorms and nullnorms.

Definition: A uninorm u and a nullnorm v are mappings p : [0,1] × [0,1] → [0,1], where p stands for both u and v, satisfying the following axioms:
1. p is associative, commutative and monotone,
2. for a uninorm u there exists a neutral (identity) element e ∈ [0,1] such that u(x, e) = x for all x ∈ [0,1],
3. for a nullnorm v there exists an element a ∈ [0,1] such that v(x, 0) = x for all x ∈ [0, a] and v(x, 1) = x for all x ∈ [a, 1]   (boundary conditions for the uninorm u and the nullnorm v).

Obviously, a t-norm t is a special uninorm with identity element e = 1 and a t-conorm s is a special uninorm with identity element e = 0. Assuming that u is a uninorm with identity e, and if v is defined as v(x, y) = 1 − u(1 − x, 1 − y), then v is a uninorm with identity 1 − e; v is called the dual uninorm of u. This means that a uninorm and its dual differ quantitatively while they are similar from the perspective of the global properties discussed in this paper, so duality is not discussed here. Assuming that u is a uninorm with identity element e, we have:
1. u(a, 0) = 0 for all a ≤ e, and u(a, 1) = 1 for all a ≥ e,
2. x ≤ u(x, y) ≤ y for all x ≤ e and e ≤ y,
3. either u(0, 1) = 0 or u(0, 1) = 1.

Assuming that u is a uninorm with identity e ∈ (0,1), the mappings t_u and s_u are a t-norm and a t-conorm respectively, cf. [7], where

t_u(x, y) = u(e·x, e·y)/e   and   s_u(x, y) = (u(e + (1 − e)·x, e + (1 − e)·y) − e)/(1 − e)

or, equivalently:

u(x, y) = e·t_u(x/e, y/e)   if x, y ∈ [0, e]

and

u(x, y) = e + (1 − e)·s_u((x − e)/(1 − e), (y − e)/(1 − e))   if x, y ∈ [e, 1]

The above results show that the uninorm domain is split into four areas: two squares determined by the left-bottom and right-top vertices {(0,0), (e,e)} and {(e,e), (1,1)} respectively, on the one hand, and the two remaining rectangles filling up the unit square. The uninorm restricted to the first square is a squeezed t-norm and, restricted to the second square, a squeezed t-conorm, cf. Figure 3.
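A Python sketch of this construction follows (our illustration, not from the paper): a uninorm with identity e is assembled from a chosen t-norm and t-conorm by exactly this squeezing, with min taken on the two mixed rectangles as one admissible choice of the values lying "between min and max".

```python
# Sketch of a uninorm with identity e built from a t-norm t and a t-conorm s.
def make_uninorm(t, s, e):
    def u(x, y):
        if x <= e and y <= e:                 # squeezed t-norm
            return e * t(x / e, y / e)
        if x >= e and y >= e:                 # squeezed t-conorm
            return e + (1.0 - e) * s((x - e) / (1.0 - e), (y - e) / (1.0 - e))
        return min(x, y)                      # "between min and max" area
    return u

u = make_uninorm(lambda a, b: a * b,          # product t-norm
                 lambda a, b: a + b - a * b,  # probabilistic sum
                 e=0.5)
print(u(0.7, 0.5))   # 0.7: e = 0.5 acts as the identity element
print(u(0.3, 0.2))   # 0.12: both arguments below e are driven towards 0
print(u(0.7, 0.9))   # 0.94: both arguments above e are driven towards 1
```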
2.3 Brief survey of other aggregation operators
In [23], OWA (ordered weighted averaging) aggregation operators were introduced. This family of new operators is concerned with the problem of aggregating multicriteria information to form an overall decision function. In particular, OWA operators have the property of lying between the "and", requiring all the criteria to be satisfied, and the "or", requiring at least one of the criteria to be satisfied. In other words, OWA operators form a family of generalized mean operators, cf. [24,25].

In [22] it is stated that classical fuzzy connectives are not symmetric under complement, so partitioning a universe into two subsets having comparable significance suffers from the asymmetry of fuzzy connectives. Subsequently, a class of operations on fuzzy sets called symmetric summation was developed with the intent of avoiding the asymmetry drawback of classical fuzzy operators.

The above attempts were based on the classical model of fuzzy sets, which does not outline an aspect of negative information, cf. [14]. Several attempts have also been made to explore negative information. An interesting extension of classical fuzzy sets introduced in [1] – intuitionistic fuzzy sets – copes with a kind of negative information that is attached to a classical fuzzy set as, in general, a separate component. This additional component deals with negative information, creating a balance to positive information dispersed on the unit interval of classical fuzzy sets.
The system of intuitionistic fuzzy sets, however, does not provide a tool to combine the two separated kinds of information besides a simple condition – the degree of indeterminacy.

When investigating a unipolar scale, it becomes clear that the right part of the unit interval – i.e. values close to 1 – numerically expresses the states of strong positive information, e.g. (almost) full inclusion of an element into a set. It would be intuitively desired that the middle part of the unit interval – i.e. values close to 0.5 – expresses the states of weak information, with a kind of neutral information or lack of information expressed by the numerical value 0.5. Likewise, the left part of the unit interval would be expected to represent the states of strong negative information, e.g. (almost) full exclusion of an element from a set. However, it is not clear whether, in the case of classical fuzzy sets, a value close to 0 expresses a state of strong negative information or rather a state of weak positive information. One may anticipate that a value close to 0 denotes a state of strong exclusion. Otherwise – assuming that numerical values monotonically increase from negative through neutral to positive information – the membership function does not provide a mechanism describing state(s) of exclusion or degrees of negative information. And then it is consistent to assume that the value 0.5 expresses the state of neutral information, lack of information, etc. Thus, the symmetry of the interval [−1,1] would match the numerical representation of uncertainty better than the asymmetry of the unit interval [0,1] does.

And, indeed, many studies apply the interval [−1,1] as a numerical representation of information. The very early medical expert system MYCIN, cf. [3], combined positive and negative information by a somewhat ad hoc invented aggregation operator. In [6] it was shown that the MYCIN aggregation operator is a particular case in a formal study on the aggregation of truth and falsity values; an interesting idea of re-scaling the true/false concept from {0,1} values to the interval [−1,1] was developed there. A linear space on the space of fuzzy sets with membership grades valued on the interval (−1,1) was defined in [14]; in the sequel it was extended to the interval [−1,1]. In [11] algebraic structures of fuzzy sets with membership grades valued on the interval [−1,1] were introduced. Based on strict t-norms and t-conorms, additive and multiplicative groups were defined. An immersion of fuzzy sets with membership grades valued on the interval [−1,1] into algebraic structures based on the interval (−1,1) was discussed. The immersion provides a practical approach to processing fuzzy sets with membership grades in the interval [−1,1] in the framework of algebraic structures. In [15] a model of a fuzzy neuron based on algebraic structures of fuzzy sets was studied. The interval [−1,1] is considered and endowed with an algebraic structure like a ring in [7]. The motivation stems from the idea of decision making, where scales which are symmetric with respect to 0 are needed to represent a symmetry in the behavior of a decision maker. t-conorms and t-norms are utilized and relations to uninorms are established. Symmetric pseudo-addition and symmetric pseudo-multiplication on the interval [−1,1] were defined and an attempt to build a ring is presented. It is shown that strict t-norms are of importance in algebraic structures created on the interval [−1,1]. An extensive survey of aggregation operators, as well as a discussion of various aspects of information aggregation, has been provided in several studies, e.g. [5,20].
A number of papers have considered values of the membership function on the interval [−1,1]; however, no negative meaning of information is suggested there, cf. [2,10,19,28]. In [9] an extension of triangular norms to the interval [−1,1] is introduced. Yet, that extension is rather imprecisely defined and does not address the issue of inconsistency between the operators' asymmetry and the human symmetry requirement.
3 BALANCED EXTENSION

3.1 Motivation
A kind of asymmetry of the set of values of aggregation operators can be observed: if the state of (full) inclusion of an element is denoted by 1, the state of exclusion could be denoted by −1 rather than 0. This observation would be irrelevant since most studies of fuzzy sets and all studies on crisp sets have been targeted at the unit interval [0,1]. Yet, both scales – the unipolar one with the unit interval [0,1] and the bipolar one with the symmetric interval [−1,1] – are indistinguishable in the sense of any simple isomorphism such as, e.g., i(x) = 2x − 1. Therefore it would be unreasonable to convert studies between mathematically indistinguishable scales without more benefit than forcing symmetry of the targeted scale.

However, it is known that human beings handle other types of scale, including the bipolar one modeled by the interval [−1,1], cf. [8]. It has been shown that the use of bipolar scales is of particular interest in several fields since they enable the representation of symmetry phenomena in human behavior, when one is faced with positive (gain, satisfaction, etc.) or negative (loss, dissatisfaction, etc.) quantities, but also weak or neutral information with a kind of disinterest (does not matter, not interested in, etc.). These quantities can also be interpreted in the context of the relation between an element and a set as inclusion/exclusion/lack-of-information or – so called – positive, negative and neutral information. Furthermore, symmetry of the aggregation operator is expected by human beings. This expectation can be briefly expressed as follows:

The preservation principle: aggregation operators should preserve the strength of aggregated information, i.e. keep or even increase the strength of strong information and keep or even increase the weakness of weak information.

Considering the unipolar scale and assuming that the left end of the unit interval expresses negative information, the right end of the unit interval expresses positive information and the middle part represents weak information of positive and negative type, the following conditions illustrate the preservation principle with regard to the aggregation of two pieces of information:
1. if both pieces of data are strongly positive, represented by numerical values close to 1 – say 0.7 and 0.9 – the aggregation result should be at least as strong as the stronger piece of data, i.e. greater than or equal to 0.9;
2. and vice versa, for strongly negative data pieces represented by numerical values close to 0 – say 0.3 and 0.1 – the aggregation result should be at least as strong as the stronger data piece, i.e. less than or equal to 0.1;
3. for weak positive data (weak negative data, respectively) expressed by numerical values 0.55 and 0.65, the aggregation result should be as weak as the weaker datum, i.e. less than or equal to 0.55, though greater than 0.5;
4. having contradictory pieces of information, i.e. strong positive and strong negative – say 0.7 and 0.3 – it would be expected that the result of aggregation is neutral, equal to 0.5.

The re-formulation of these conditions on a bipolar scale relies on rescaling the unit interval [0,1] into the interval [−1,1], cf. Figure 1. Classical aggregation operators do not satisfy the preservation principle regardless of whether they are valued on the unit interval [0,1], the interval [−1,1] or any other interval. To elaborate this defect in more detail, consider a strong t-norm applied to non-crisp arguments 0 < a, b < 1. The result value t(a, b) is, of course, less than the value of the smaller argument. It means that a t-norm weakens aggregated positive data and strengthens aggregated negative data, regardless of the numerical values of these data. The same discussion concerns a strong t-conorm – it weakens aggregated negative data and strengthens aggregated positive data. This evidence can be extended to any t-norm and any t-conorm. This phenomenon is clearly outlined when the convergence of aggregation of an infinite sequence of a repeated non-crisp data unit is considered:

lim_{n→+∞} t(t(...t(a, a)...), a) = 0   if a < 1   (a repeated n times)
lim_{n→+∞} s(s(...s(a, a)...), a) = 1   if a > 0   (a repeated n times)

Thus, it is clear that t-norms and t-conorms are inconsistent with the symmetry requirements of information aggregation, cf. Figure 2.
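The following small Python experiment (ours, not the authors') makes the two limits above tangible for the product t-norm and the probabilistic sum.

```python
# Repeatedly aggregating the same non-crisp value a drifts to 0 under a t-norm and
# to 1 under a t-conorm, no matter how close a is to the opposite end of the scale.
def iterate(op, a, n):
    result = a
    for _ in range(n):
        result = op(result, a)
    return result

t_prod = lambda x, y: x * y               # product t-norm
s_prob = lambda x, y: x + y - x * y       # probabilistic sum

print(iterate(t_prod, 0.9, 50))   # ~0.005: "strongly positive" data is washed out
print(iterate(s_prob, 0.1, 50))   # ~0.995: "strongly negative" data is washed out
```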
Fig. 1 Preservation rule – human anticipation of aggregation symmetry

Fig. 2 Asymmetry of triangular norms
The preservation rule with respect to uninorms is satisfied only when either strong positive data units or strong negative data units are aggregated. Yet, weakness of aggregated data is not preserved. Only neutrality of data (neutral data piece is represented numerically by the identity element of uninorm) is preserved, cf. Figure 4.
Fig. 3 Structure of uninorms

Fig. 4 Uninorms – violation of the preservation rule
The experiment with the aggregation of an infinite sequence of a repeated non-crisp data unit gives the following result for uninorms consisting of a strong t-norm and a strong t-conorm:

lim_{n→+∞} u(u(...u(a, a)...), a) = 1 if e < a ≤ 1;   e if a = e;   0 if 0 ≤ a < e   (a repeated n times)

Similar behavior characterizes nullnorms. This discussion shows that classical operators become asymmetrical in the information aggregation process. Thus the isomorphism i(x) = 2x − 1 applied in scale symmetrization creates more confusion instead of solving the inconsistency. In fact, this isomorphism makes the scale symmetrical, but it does not change the asymmetry of the aggregation operators. Concluding, we can formulate the following:

The asymmetry observation: the idea of fuzzy sets relied on spreading positive information (information about inclusion), fitting the crisp point {1}, into the interval (0,1], while negative information still remains bunched into the crisp point {0}, cf. Figure 5.
The asymmetry observation interprets uncertainty handling in classical fuzzy sets. It provides the foundation for an alternative model of fuzzy sets, a model that handles both kinds of information: positive and negative. It also forms the basis of a balanced extension of classical fuzzy sets. This extension relies on a distribution of crisp negative information, located at the point {0}, into the interval [−1, 0]. It offers symmetry of positive and negative information handling.
3.2 The extension method
The balanced extension of classical fuzzy sets and the balanced extension of classical operators rely on the dispersion of crisp negative information, located at the point {0}, into the interval [−1, 0]. The balanced extension does not affect classical fuzzy sets based on the unit interval [0,1]. As a consequence, classical fuzzy sets will be immersed into the new space of balanced fuzzy sets.

Fig. 5 Fuzzy sets as dispersion of crisp positive information into the interval (0,1]

Fig. 6 Dispersion of crisp negative information into the interval [−1,0)

Since both kinds of information – positive and negative – are equally important,
it would be reasonable to expect that such a balanced extension supports the symmetry of positive/negative information, with the center of symmetry placed in the middle of the scale:

The symmetry principle: the extension of fuzzy sets to balanced fuzzy sets relies on spreading negative information that fits the crisp point {0} of a fuzzy set (information about exclusion) into the interval [−1, 0). The extension should preserve the symmetry of positive and negative information.

This extension preserves the properties of classical fuzzy operators handled for positive information processing. It provides the symmetry of positive/negative information handling with the center of symmetry placed at the point 0, cf. Figures 6 and 7. A measure of inclusion of an element into a fuzzy set is expressed by a positive value of the membership function; a measure of exclusion of an element from a fuzzy set is expressed by a negative value of the membership function. Of course, the greater the absolute value of the membership function, the more certain the information about inclusion/exclusion.

Classical aggregation operators, cf. f in Figure 7, have the unit square [0,1] × [0,1] as domain and the unit interval [0,1] as co-domain. Balanced operators, cf. F in Figure 7, are equal to classical operators on the unit square. Due to the symmetry principle, balanced operators are defined by the formula F(x, y) = −f(−x, −y) on the square
[−1, 0) × [−1, 0). The interval [−1, 0) obviously includes the values produced by the balanced operator for negative values of its arguments. This method determines the structure of the negative information space. Such a concept is consistent with the human symmetry expectation: the negative information space has the same structure as its positive counterpart. So the space of negative information is entirely constrained by its origin in the symmetrical reflection. In contrast, the aggregation of positive and negative data is to be defined with regard to practical applications.

A fuzzy set is usually defined as a mapping from a universe X into the unit interval [0,1]. Thus, the space of fuzzy sets over the universe X is denoted by F, while the space of balanced fuzzy sets will be denoted by G:

F(X) = [0,1]^X = {µ | µ : X → [0,1]}
G(X) = [−1,1]^X = {µ | µ : X → [−1,1]}
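As a purely illustrative sketch (ours), elements of G(X) over a small finite universe can be represented directly as mappings into [−1,1], with balanced operators applied pointwise; the universe and grades below are made up.

```python
# Balanced fuzzy sets over a small universe X: positive grades encode inclusion,
# negative grades exclusion, and 0 stands for lack of information.
X = ["x1", "x2", "x3"]

A = {"x1": 0.8, "x2": -0.4, "x3": 0.0}    # an element of G(X)
B = {"x1": 0.5, "x2": -0.9, "x3": 0.3}    # another element of G(X)

def combine(op, A, B):
    """Apply a binary balanced operator pointwise over the universe X."""
    return {x: op(A[x], B[x]) for x in X}

balanced_negation = lambda v: -v          # N(x) = -x, defined later in Section 4.1
not_A = {x: balanced_negation(A[x]) for x in X}   # grades of A with signs flipped
```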
4 BALANCED OPERATORS
The concept of balanced extension of aggregation operators was introduced in [12,13] as an extension and generalization of classical aggregation operators. The concept of balanced aggregation operators defined with respect to the bipolar scale is based on the asymmetry observation and symmetry principle discussed in Section 3.1. The balanced extension method was formulated in Section 3.2 and is illustrated in Figure 7.
4.1 Balanced triangular norms
In this section we introduce formal definitions of balanced fuzzy operators. The intuitive introduction of balanced operators outlined in Figure 5 and the discussion of symmetry issues in Section 3 outline the fundamental method of the fuzzy set extension. The balanced extension is based on the assumption that classical fuzzy operators are immersed in their balanced extensions. Therefore, the classical operators are equal to their balanced extensions on the unit interval [0,1] or on the unit square [0,1] × [0,1]. Precisely, the identity of classical operators and their counterparts might not be satisfied at boundary points. For instance, the classical negation and its balanced counterpart – the inverse operator – have different values at {0}. The immersion of the classical system of fuzzy sets into its balanced extension preserves the features of the classical system on its domain (with isolated points excluded from this preservation, in some cases), i.e. on the unit interval [0,1] or on the unit square [0,1] × [0,1].

Fig. 7 Balanced extension of fuzzy sets (f – classical operator, F – balanced operator)

Notice that balanced operators are mappings from the interval [−1,1] or from the square [−1,1] × [−1,1] into the interval [−1,1] satisfying the symmetry principle.
associativity, commutativity & mononicity of both mappings the domain
4.
5.
[− 1,1]× [− 1,1]
T (1, a ) = a for a ∈ [0,1] S (0, a ) = a for a ∈ [0,1] P ( x, y ) = − P ( − x, − y )
and
S
on
boundary conditions symmetry
The balanced counterpart of a strong negation operator inverse operator (with respect to symmetry principle):
I : [− 1,1] → [− 1,1] ,
T
n(x ) = 1 − x is defined as an
1− x I ( x) = − 1 − x undefined
[
] [
for
x>0 x<0 x = 0
]
while balanced negation is the mapping: N : − 1,1 → − 1,1 N ( x) = − x Example Balanced triangular norms defined with respect to classical strong triangular norms are generated by additive generators: 1. balanced t-norm T generated by an additive generator f t :
g ( f (x ) + f t ( y )) T ( x, y ) = t t 0
for
x* y > 0 where: x* y ≤ 0
•
ft is a continuous, strictly decreasing function undefined in 0 , ft : [− 1,1] − {0} → [− ∞,+∞] , ft (− 1) = f t (1) = 0 , f t ± → ±∞ ,
•
gt denotes the inverse of ft .
x →0
12
W. Homenda and W. Pedrycz
S is generated by an additive generator f s : S ( x, y ) = gs ( fs ( x) + fs ( y )) where: • fs is a continuous, strictly increasing function, • f s : [− 1,1] → [− ∞,+∞ ] , f s (0 ) = 0 , f s → ±∞ x → ±1
2. balanced t-conorm
• gs denotes the inverse of fs . Note: the following mappings are extensions of the additive generators used in the example in Section 2.1:
f t : [− 1,1] → [− ∞,+∞]
(1 − x) / x f t (x ) = undefined
f s : [− 1,1] → [− ∞,+∞]
x ≠ 0 x = 0
for
f s (x ) = x / (1 − x )
The contour plots of the above balanced t-norm and balanced t-conorm are shown in Figures 8 and 9. Note that according to the symmetry principle and the balanced extension method the right-upper quarter of the plots are equal to the plots of t-norm and t-conorm.
Y constant value 0
Y 0.9
0.6 0.3
0.6 0.3
X
X
-0.3
-0.3
constant value 0
-0.6
-0.6
-0.9
Fig. 8 The plot of balanced t-norm
Fig. 9 The plot of balanced t-conorm
The infinite sequence of information converge to the following limits:
1 → 0 lim S (S (....S (a, a )....), a ) n →+∞
− 1 n times 1 lim T (T (....T (a, a )....), a ) n → 0 →+∞
− 1 n times
if if
0 < a ≤1 a=0
if
−1 ≤ a < 0
if
a =1
if
−1< a <1
if
a = −1
Symmetrization Of Fuzzy Operators: Notes On Data Aggregation
13
Note that balanced t-conorm convergence is similar to uninorm convergence with respect to the following piecewise transformation:
i(x ) =
4.2
x−e e
x ∈ [0, e] and i (x ) =
for
x−e 1− e
for
x ∈ [e,1]
Balanced max/min operators
Since basic triangular norms min and max can be approximated by strong triangular norms, then this observation may be applied to the whole domain of balanced triangular norms. Let
( (
f n (x ) = sign(x ) * x / 1 − x n
n
))be an additive generator of balanced t-conorm
defined in the example in Section 2.1. It is easily proved that the balanced t-conorms
[0,1]× [0,1] to arbitrary accuracy: lim S n (x, y ) n → max (x, y ) for 0 ≤ x and 0 ≤ y → +∞
Sn
approximate basic t-conorm max in the unit square
This limit formula could be used as a basis of a definition of the balanced t-conorm max, i.e. it could be extended to the whole domain − 1,1 × − 1,1 as follows:
[
] [ ] S _ max(x, y ) = lim S n (x, y ), x, y ∈ [− 1,1]× [− 1,1] − {(x, y ) : x ≠ 0 and x ≠ y} n→∞ It is assumed that
S _ max is undefined for x = − y and x ≠ 0 , otherwise
S _ max will not be associative. The limit formula is simplified to the expression:
max( x, y ) N (max(( N ( x), N ( y ))) S _ max( x, y ) = 0 udefined
x + y > 0 x + y < 0 x, y ∈ [− 1,1]× [− 1,1] 0 x = = y otherwise
By analogy, the balanced t-norms min could be defined by the limit formula:
T _ min ( x, y ) = lim Tn (x, y ) n →∞
where
for
x, y ∈ [− 1,1]× [− 1,1]
Tn are balanced t-norms generated by additive generators dual to f n . Finally
min( x, y ) T _ min( x, y ) = N (min(( N ( x), N ( y ))) 0
for
x ≥ 0, y ≥ 0 x ≤ 0, y ≤ 0 otherwise
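A direct transcription of these closed forms into Python (a sketch of ours; None stands for the undefined case) reads:

```python
def N(x):                                  # balanced negation
    return -x

def S_max(x, y):
    if x + y > 0:
        return max(x, y)
    if x + y < 0:
        return N(max(N(x), N(y)))          # equals min(x, y)
    if x == 0 and y == 0:
        return 0.0
    return None                            # undefined: x = -y with x != 0

def T_min(x, y):
    if x >= 0 and y >= 0:
        return min(x, y)
    if x <= 0 and y <= 0:
        return N(min(N(x), N(y)))          # equals max(x, y)
    return 0.0                             # mixed-sign arguments

print(S_max(0.7, 0.3), S_max(-0.7, -0.3), S_max(0.5, -0.5))   # 0.7 -0.7 None
print(T_min(0.7, 0.3), T_min(-0.7, -0.3), T_min(0.5, -0.5))   # 0.3 -0.3 0.0
```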
4.3 Balanced t-conorms vs. uninorms
Aggregation features of balanced t-conorms and uninorms are similar, i.e. both types of operators strengthen non-crisp information. On the other hand, a balanced t-norm weakens non-crisp information, which is consistent with nullnorm behavior. Neutral information is distinguished neither by uninorms, nor by nullnorms, nor by balanced triangular norms. The balanced t-conorms, as defined in Section 4.1, are special cases of uninorms in the sense of the isomorphism defined at the end of Section 4.1. Amazingly, balanced triangular norms as well as uninorms and nullnorms are similar products of two different viewpoints on fuzzy set extensions. Detailed properties of balanced triangular norms and of uninorms and nullnorms might differ. Despite this, the general meaning of balanced triangular norms and of uninorms and nullnorms is the same in the sense of the isomorphic mapping between them. The definition of a balanced t-conorm includes the symmetry axiom in addition to the other axioms that are common for a uninorm and a balanced t-conorm: associativity, commutativity, monotonicity and boundary conditions. This extra restriction – the symmetry axiom – means that not every uninorm is isomorphic with a balanced t-conorm, while every balanced t-conorm is isomorphic with some uninorm. Precisely, every balanced t-conorm is isomorphic with a set of uninorms that satisfy the symmetry axiom and differ in their unit elements. Of course, any two uninorms of such a set are isomorphic in the sense of an isomorphism analogous to the formula presented at the end of Section 4.1. Two sets of uninorms related to any two balanced t-conorms are disjoint, assuming that the respective balanced t-conorms are different. Moreover, the set of uninorms that are not isomorphic with any balanced t-conorm and the sets of uninorms related to balanced t-conorms partition the set of all uninorms, i.e. they create the equivalence classes of an equivalence relation.
The same observations concern the balanced t-norms and nullnorms. We have:

Proposition 4.3.1. Let U = {u : u is a uninorm} and let us consider isomorphic mappings as defined at the end of Section 4.1. Then the pair (U, ≈_S) is an equivalence relation if, for every two uninorms u and v, u ≈_S v iff u and v are isomorphic with the same balanced t-conorm S, or none of u and v is isomorphic with any balanced t-conorm S.
Fig. 10 The structure of balanced uninorms
Proposition 4.3.2. The analogous proposition holds for the families of nullnorms and balanced t-norms.

These propositions describe the characteristic of the set of all balanced t-conorms (balanced t-norms, respectively) as a family of equivalence classes of the relation ≈_S (≈_T, respectively) defined on the set of all uninorms (nullnorms, respectively).
4.4 Balanced uninorms
Neither balanced triangular norms, nor uninorms, nor nullnorms satisfy the preservation rule, cf. Figures 1 and 2. Applying the method of balanced extension to uninorms leads to an intuitively appealing definition of the balanced uninorm, defined as follows:

Definition: A balanced uninorm is a mapping U : [−1,1] × [−1,1] → [−1,1] satisfying the axioms:
1.–3., 5. U is associative, commutative, monotone and symmetrical,
4. identity element: there exists an identity element e ∈ [0,1] such that U(x, e) = x for all x ∈ [0,1].

The structure of the balanced uninorm is displayed in Figure 10. As in the case of balanced triangular norms, the values of the balanced uninorm on the squares [0,1] × [0,1] and [−1,0] × [−1,0] are determined by the values of the uninorm and the symmetry principle. The values of the balanced uninorm on the squares [−1,0] × [0,1] and [0,1] × [−1,0] are unconstrained and can be defined according to the subjective aim of an application (in fact, values in these regions are determined by the monotonicity axiom). This matter, as a subject of detailed discussion, is outside the general considerations of this paper and is not investigated here. Assuming that the balanced uninorm takes the value 0 on the areas [−1,0] × [0,1] and
[0,1] × [−1,0], it is possible to obtain a continuous balanced uninorm on the whole domain (with separated points of non-continuity). So, continuity of balanced uninorms, important from a practical point of view, is guaranteed if the uninorm is continuous and takes the value 0 on the border of the "between min and max" area and on the unconstrained area of the balanced uninorm. Obviously, similar considerations are valid in the case of nullnorms, though the values of balanced nullnorms on the unconstrained area meet a different type of boundary conditions. Finally, the balanced extension of uninorms brings the expected property: it strengthens strong information (strong positive and strong negative information), it weakens weak information (weak positive and weak negative information) and it retains the neutrality of neutral information, cf. Figure 11.

Fig. 11 Satisfaction of the preservation rule for balanced uninorms

The convergence experiment gives the following results for a given balanced uninorm (assuming that the original uninorm is composed of a t-norm and a t-conorm, both being strong):
lim_{n→+∞} U(U(...U(a, a)...), a) = 1 if e < a ≤ 1;   0 if −e < a < e;   −1 if −1 ≤ a < −e;   ±e if a = ±e   (a repeated n times)
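As a final illustration (our sketch, under the assumption already made in the text that the unconstrained areas are set to 0), a balanced uninorm can be obtained by reflecting a classical uninorm through the origin; it then exhibits exactly the preservation behavior summarized above.

```python
def make_uninorm(t, s, e):                     # classical uninorm, as sketched in Sect. 2.2
    def u(x, y):
        if x <= e and y <= e:
            return e * t(x / e, y / e)
        if x >= e and y >= e:
            return e + (1.0 - e) * s((x - e) / (1.0 - e), (y - e) / (1.0 - e))
        return min(x, y)
    return u

def make_balanced_uninorm(u):
    def U(x, y):
        if x >= 0.0 and y >= 0.0:
            return u(x, y)                     # classical uninorm on the unit square
        if x <= 0.0 and y <= 0.0:
            return -u(-x, -y)                  # symmetry: U(x, y) = -U(-x, -y)
        return 0.0                             # unconstrained areas set to 0
    return U

U = make_balanced_uninorm(
    make_uninorm(lambda a, b: a * b, lambda a, b: a + b - a * b, e=0.5))

print(U(0.9, 0.8), U(-0.9, -0.8))   # 0.96 -0.96: strong data reinforced towards +/-1
print(U(0.3, 0.2), U(-0.3, -0.2))   # 0.12 -0.12: weak data damped towards 0
print(U(0.7, -0.7))                 # 0.0: contradictory data gives neutral information
```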
5 CONCLUSIONS
An intuitively appealing aggregation rule – the preservation of the strength of information – is investigated for different aggregation operators. The lack of satisfaction of the preservation rule is shown for classical aggregation operators such as triangular norms, uninorms and nullnorms. To alleviate the identified difficulties, a family of balanced uninorms has been defined. It is revealed that balanced uninorms conform to the general and intuitively acceptable aggregation mechanism. Further discussion on the aggregation of mixed types of information, on the repetitive application of the symmetry principle and the balanced extension method to balanced operators, and on the continuity of balanced operators is worth pursuing.
ACKNOWLEDGMENT

Support from the Natural Sciences and Engineering Research Council (NSERC), the Faculty of Mathematics and Information Science, Warsaw University of Technology, Poland, and the Alberta Software Engineering Research Consortium (ASERC) is gratefully acknowledged.
REFERENCES

1. K. T. Atanassov, Intuitionistic Fuzzy Sets, Fuzzy Sets & Systems, 20 (1986) 87-96.
2. J. C. Bezdek, Computing with uncertainty, IEEE Communications Magazine, Sept. 1992, pp. 24-36.
3. B. Buchanan and E. Shortliffe, Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, Addison-Wesley, Reading, MA, 1984.
4. T. Calvo et al., The functional equations of Frank and Alsina for uninorms and nullnorms, Fuzzy Sets & Systems, 120 (2001) 385-394.
5. T. Calvo et al., Aggregation Operators: Properties, Classes and Construction Methods, in: T. Calvo, G. Mayor, R. Mesiar (Eds.), Aggregation Operators, pp. 3-104, Physica Verlag, Heidelberg, 2002.
6. M. Detyniecki, Mathematical Aggregation Operators and their Application to Video Querying, PhD Thesis, Université Paris VI, Paris, 2000.
7. J. Fodor, R. R. Yager, A. Rybalov, Structure of uninorms, Internat. J. Uncertainty Fuzziness Knowledge-Based Systems, 5 (1997) 411-427.
8. M. Grabisch et al., On Symmetric Pseudo-Additions and Pseudo-Multiplications: Is it possible to Build Rings on [-1,+1]?, The 9th Internat. Conf. on Inf. Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2002, 1-5 July 2002, Annecy, France.
9. M. M. Gupta, D. H. Rao, On the principles of Fuzzy Neural Networks, Fuzzy Sets and Systems, 61 (1994) 1-18.
10. W. Homenda, Fuzzy Relational Equations with Cumulative Composition Operator as a Form of Fuzzy Reasoning, Proc. of the International Fuzzy Engineering Symposium '91, Yokohama, November 1991, 277-285.
11. W. Homenda, Algebraic operators: an alternative approach to fuzzy sets, Appl. Math. and Comp. Sci., 1996, vol. 6, No. 3, 505-527.
12. W. Homenda, Balanced Fuzzy Sets, Warsaw University of Technology, Preprint, September 2001.
13. W. Homenda, Triangular norms, uni- and nullnorms, balanced norms: the cases of the hierarchy of iterative operators, 24th Linz Seminar on Fuzzy Set Theory, Linz, February 4-8, 2003.
14. W. Homenda, W. Pedrycz, Processing of uncertain information in linear space of fuzzy sets, Fuzzy Sets & Systems, 44 (1991) 187-198.
15. W. Homenda, W. Pedrycz, Fuzzy neuron modelling based on algebraic approach to fuzzy sets, Proc. of the SPIE's Intern. Symp., April 1995, Orlando, Florida.
16. W. Homenda, W. Pedrycz, Symmetrization of fuzzy operators: notes on data aggregation, Proc. of the Internat. Conf. on Fuzzy Systems and Knowledge Discovery, Singapore, November 18-22, 2002.
17. E. P. Klement, R. Mesiar, E. Pap, A characterization of the ordering of continuous t-norms, Fuzzy Sets and Systems, 86 (1997) 189-195.
18. E. P. Klement, R. Mesiar, and E. Pap, Triangular Norms, Kluwer Academic Publishers, Dordrecht, 2000.
19. B. Kosko, Fuzzy cognitive maps, Int. J. Man-Machine Studies, 24 (Jan. 1986) 65-75.
20. R. Mesiar and M. Komorníková, Aggregation Operators, Proceedings of the XI Conf. on Applied Mathematics PRIM '96, D. Herceg, K. Surla (Eds.), Institute of Mathematics, Novi Sad, 193-211, 1997.
21. B. Schweizer, A. Sklar, Probabilistic Metric Spaces, North Holland, New York, 1983.
22. W. Silvert, Symmetric Summation: A Class of Operations on Fuzzy Sets, IEEE Trans. Systems, Man, Cybernetics, 9 (1979) 659-667.
23. R. R. Yager, On Ordered Weighted Averaging Aggregation Operators in Multicriteria Decisionmaking, IEEE Trans. Systems, Man, Cybernetics, 18 (1988) 183-190.
24. R. R. Yager, Families of OWA operators, Fuzzy Sets and Systems, 59 (1993) 125-148.
25. R. R. Yager, D. Filev, Induced ordered weighted averaging operators, IEEE Trans. Systems, Man and Cybernetics – Part B: Cybernetics, 29 (1999) 141-150.
26. R. R. Yager, A. Rybalov, Uninorm aggregation operators, Fuzzy Sets & Systems, 80 (1996) 111-120.
27. L. A. Zadeh, Fuzzy Sets, Information and Control, 8 (1965) 338-353.
28. W. R. Zhang et al., Pool2: A Generic System for Cognitive Map Development and Decision Analysis, IEEE Trans. Systems, Man, Cybernetics, 19 (Jan.-Feb. 1989) 31-39.
CHAPTER 2

An Integrity Estimation Using Fuzzy Logic

P. Mariño, C. A. Sigüenza, V. Pastoriza, M. Santamaría, E. Martínez and F. Machado

Department of Electronic Technology, University of Vigo, 36280 Vigo, Spain
Email: {pmarino, csiguenza, vpastoriza, msanta, emtnez, fmachado}@uvigo.es
Abstract: The authors have been involved in developing an automated inspection system, based on machine vision, to assess the seaming quality in metal containers (cans) for fish food. In this work we present the building of a fuzzy model to make the pass/fail decision for each can, and to predict the closing machine adjustment state after closing each can, from the information obtained by the vision system. In addition, it is interesting to note that such models can be interpreted and supplemented by process operators. In order to achieve these aims, we use a fuzzy model due to its ability to favor interpretability in many applications. Firstly, the can seaming process and the current, conventional method for quality control of can seaming are described. Then we present the modeling methodology, which includes the generation of representative input-output data sets and the fuzzy modeling. After that, the results obtained and their discussion are presented. Finally, concluding remarks are stated.
1 Introduction

Due to the demand for high-quality products, quality assurance has been a major concern in all manufacturing environments, including the food industry. The growing level of production brings new challenges to seam inspection methods. Nowadays, to guarantee the desired lifespan for the target product in the food cannery sector, a manual and destructive inspection of the seam of containers is carried out. This inspection method is based on a statistical supervisory control. The worst features of this method are its inspection rate (one can every fifteen minutes) and its slowness (one check can take up to ten minutes). As an example, if an average closing machine rate is three hundred cans per minute (some closing machines can reach up to six hundred cans per minute), then only one of every 4,500 cans is checked. Moreover, if one can is found defective, since the closing machine continues closing, all cans seamed after it (3,000 cans) must be withdrawn to be analyzed. Therefore, it is important to develop an automated inspection system to improve the seaming quality control process (all cans checked, and in line). It is for this reason that we have been involved in the design and implementation of an in-line, automated machine vision system to evaluate the seaming quality of all cans. Because the quality of the can seaming depends on external and internal dimensional attributes, the current, manual inspection is also destructive when measuring the internal features (Sect. 2). However, the machine vision system, which uses CCD cameras, will be able
Fig. 1. Can pieces
Fig. 2. Can shapes and their CRSC inspected points
only to measure the external dimensional attributes. The first stage will be to find a model that estimates the can seaming quality only from its external dimensions. Besides, it will be profitable to evaluate the closing machine adjustment (CMA) state after closing each can, from the same external features, with the purpose of allowing its fast maintenance. In addition, it will be important that such models can be interpreted and supplemented by process operators. In this work, we explore the use of fuzzy models to solve these problems: to make the pass/fail decision for each can and to assess the CMA state after closing each can, based on external dimensional attributes. Takagi-Sugeno-Kang fuzzy models are developed using a neuro-fuzzy modeling technique (Sect. 3). The remainder of this document is organized as follows: in the next section we provide an overview of the can seaming process. Then, Section 3 describes the proposed modeling methodology. The results obtained and their discussion are presented in Section 4. Finally, we state concluding remarks in Section 5.
2 Background
2.1 The can seaming process
The closing of a three-piece steel food can has been well studied. In 1900 the former soldering method was replaced by a new double-seaming process, which allowed increasingly fast and more sanitary closing machines to be built. A can comprises a lid and a body, as depicted in Figure 1. Cans can be of different shapes: round, oblong, rectangular (see Fig. 2), and of various sizes as well. In the can seaming process, a lid is first placed on the can body, which has been filled with product. Then the body and lid are held between the chuck and the lifter of the machine, known as a closing machine (see Fig. 3a), and after that the assembly is kept rotating while the lid is
pressed against an element of the closing machine named the seaming roll. There are two types of seaming roll (double seaming mechanism): the 1st roll and the 2nd roll. The first roll approaches the can lid, and rolls up the lid curl and body flange sections of the can before retreating (see Fig. 3b). Next, the second roll approaches to compress the rolled-up sections to end the seaming (see Fig. 3c). In other words, the 1st roll rolls up the can lid and can body, doing mainly the bending work, while the 2nd roll compresses the rolled-up sections, and mainly does the seaming work.
Fig. 3. Outline of can seaming process: a) Closing machine holding the lid over the can body, b) First operation and, c) second operation of seaming process
The can seam obtained, the double seam, is depicted in Figure 4. It consists of five thicknesses of plate interlocked or folded and pressed firmly together, plus a thin layer of sealing compound. The seaming integrity is achieved by a combination of the following two elements:
• The sound mechanical interlocking of the can body and can end.
• The presence of sealing compound, which effectively blocks potential leakage paths.
2.2 Current can seaming quality control
There are international and national entities involved in the regulation of quality control procedures for fish cannery industries. Some of them, like the FDA [7] in the USA, DFO [2] in Canada, SEFEL [9] in the EU, and SOIVRE [8] in Spain, use a similar definition of the double seam dimensional attributes (see Fig. 4) and integrity factors. These entities have defined a quality control based on a conventional routine statistical control (CRSC) used by the canneries. The seaming control in the CRSC method is destructive, and based on seaming integrity. This integrity is assessed at several points around the seam; their number and location depend on the can shape (see Fig. 2).
Fig. 4. View of a double seam section and its dimensional attributes
The parameters used to estimate the seaming integrity at a point of the seam are: Compactness Rating (CR), Overlap Rating (OR), and Body Hook Butting (BHB). These are named integrity factors and are computed from the double seam dimensional attributes at said point (see Fig. 4), using the following ratios:

%CR = 100·(3·EF + 2·EC)/ES,    (1)
%OR = 100·a/c,    (2)
%BHB = 100·b/c.    (3)
The integrity at a point is acceptable if the three integrity factors at that point lie inside their respective acceptance intervals (Table 1). A can is passed only if the integrity is acceptable at all of its inspected points.

Table 1. Integrity factors for round cans: acceptance range
Integrity factor   Range
%CR                [75, 95]
%OR                [45, 90]
%BHB               [70, 95]
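As a concrete illustration, the following minimal Python sketch computes the three integrity factors of Eqs. (1)–(3) and checks them against the acceptance ranges of Table 1; the numeric values are taken from one measurement point of the data set in the Appendix (Table A.1), and the attribute names follow Table 2.

```python
# Acceptance ranges for round cans (Table 1)
RANGES = {"CR": (75, 95), "OR": (45, 90), "BHB": (70, 95)}

def integrity_factors(EC, EF, ES, a, b, c):
    """Integrity factors of Eqs. (1)-(3).
    EC = body wall thickness, EF = end component thickness, ES = seam thickness,
    a = overlap, b = internal body hook length, c = internal seam length."""
    return {
        "CR": 100.0 * (3 * EF + 2 * EC) / ES,
        "OR": 100.0 * a / c,
        "BHB": 100.0 * b / c,
    }

def point_is_acceptable(factors):
    """A measurement point is acceptable only if every factor is inside its range."""
    return all(lo <= factors[k] <= hi for k, (lo, hi) in RANGES.items())

# Values from one measurement point of Table A.1 (a point that passes)
f = integrity_factors(EC=0.181, EF=0.213, ES=1.16, a=1.10, b=1.84, c=2.16)
print(f, point_is_acceptable(f))
```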
3 Modeling methodology
As mentioned in the introduction, it is important to improve the current seaming quality control process. For this reason we have been involved in the design and implementation of an in-line, automated machine vision system to evaluate the seaming quality of all cans. The can seaming quality is assessed from the integrity factors CR, OR and BHB (Sect. 2.2). As seen in Eqs. (1), (2) and (3), they depend on several external and internal dimensional attributes (Fig. 4). However, the machine vision system, based on CCD cameras, will only be able to measure the external dimensional attributes: ES, HS and PC. The first stage will therefore be to verify that it is possible to find a model that estimates the can seaming quality from its external dimensions alone. It will also be profitable to evaluate the closing machine adjustment (CMA) state after closing each can, from the same external features, so that the machine can be maintained quickly. In addition, it will be important that such models can be interpreted and supplemented by process operators (linemen in this case), i.e., such models should provide enough readability of both processes (can seaming and CMA) that their interpretation can be compared with judgments made by a skilled lineman. To achieve these aims we explore the use of fuzzy models, owing to the ability of these systems to express human knowledge or experience in terms of linguistic rules, where each rule can easily capture the sense of a rule of thumb used by humans. For the purposes stated above, four fuzzy inference systems were developed: one to make the pass/fail (P/F) decision for each can, and three to estimate the three CMA parameters that affect the can seaming process. All these fuzzy models are limited to the use of ES, HS and PC as inputs.
3.1 Data sets
Obtaining representative input-output data sets of the system is the first step in developing the different fuzzy models. We planned a design of experiments (DOE) [1, 11] to generate these data sets. To create the DOE, we had to find out which of the main closing machine adjustments (CMA) affect the can seaming process. Successive closing experiments (trial and error) using the CRSC method showed that the main CMA affecting the can seaming process are: plate pressure (CMA1), and first and second roll maximum approach (CMA2 and CMA3). These three CMA are depicted in Figure 3. The DOE then takes these three variables as experiment factors, where each one can take one of the following categorical values: loose, normal, or tight. A full factorial design was chosen and, as the DOE considers three controllable variables with three possible values each, 3³ = 27 regions of the closing machine working universe appear. In the light of this, these CMA parameters were employed as indicators of the CMA status. We worked only with a specific round container format named RO-85. This format was chosen because it is the most popular and because this type of can imposes the most serious speed restrictions (it is the format with the fastest seaming cadence). RO-85 cans were closed so that nine cans from each region of the closing machine working universe were gathered and then checked using the CRSC method to obtain an input-output data set. As RO-85 cans are round, this method, based on destructive analysis, checks the seaming integrity at four points around the seaming perimeter (see Fig. 2). The final size of the input-output data set is therefore 27 regions of the closing machine working universe × 9 cans per region × 4 measurement points per can = 972 measurement points. Table 2 shows the list of all parameters measured or computed at each of these points, i.e., the characteristic vector of the I/O data set. See Figure 4 as well. Finally, this I/O data set was divided into two subsets, both representative of the closing machine working universe. One set – training data – is used to train the models, and the other – checking data – to validate them. Part of this data set is shown in Table A.1 of the Appendix.
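A minimal sketch of how such a full factorial design can be enumerated (a hypothetical Python helper, not the authors' original tooling):

```python
from itertools import product

FACTORS = {
    "CMA1_plate_pressure": ["loose", "normal", "tight"],
    "CMA2_first_roll": ["loose", "normal", "tight"],
    "CMA3_second_roll": ["loose", "normal", "tight"],
}

# Full factorial design: every combination of the three factor levels.
regions = list(product(*FACTORS.values()))
assert len(regions) == 27  # 3^3 regions of the closing machine working universe

# With 9 cans closed per region and 4 inspected points per (round) can:
n_points = len(regions) * 9 * 4
print(n_points)  # 972 measurement points
```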
Table 2. Characteristic vector of I/O data set
Parameter name               Abbreviation
Body wall thickness          EC
End component thickness      EF
Seam thickness               ES
Seam length                  HS
Countersink depth            PC
Body hook length             GC
End hook length              GF
Overlap                      a
Internal body hook length    b
Internal seam length         c
Compactness rating           CR
Overlap rating               OR
Body hook butting            BHB
Pass/Fail decision           P/F
3.2 Fuzzy Modeling
Fuzzy modeling is the task of identifying the parameters of a fuzzy inference system so that a desired behavior is attained, while neuro-fuzzy modeling refers to applying learning techniques developed in the neural network literature to fuzzy inference systems [4]. The fuzzy inference system (FIS) [5, 6, 13] is a rule-based system that uses fuzzy logic to reason about data. A FIS is also known as a fuzzy-rule-based system, fuzzy expert system, fuzzy model, fuzzy associative memory, fuzzy logic controller, or simply a fuzzy system. The basic structure of a FIS consists of four conceptual components, as illustrated in Figure 5:
• A fuzzifier, which translates crisp inputs into fuzzy values.
• An inference engine, which applies a fuzzy reasoning mechanism to obtain a fuzzy output.
• A defuzzifier, which transforms the fuzzy output into a crisp value.
• A knowledge base, which contains a rule base of fuzzy if-then rules and a database defining the membership functions (MF) of the fuzzy sets used in those rules.
Fig. 5. Fuzzy Inference System (FIS)
To solve our problem we have chosen Takagi-Sugeno-Kang (TSK) fuzzy models [10, 12], since they are well suited to modeling nonlinear systems by interpolating multiple linear models. A typical fuzzy rule in a TSK fuzzy model has the form

if x is A and y is B then z = f(x, y),

where A and B are fuzzy sets in the antecedent or premise, while z = f(x, y) is a crisp function in the consequent or conclusion. For the sake of interpretability we use only TSK models where f(x, y) is either a constant or a linear combination of the input variables plus a constant term. The resulting FIS are named zero-order and first-order TSK fuzzy models, respectively. As stated before, four TSK fuzzy models must be created: one to make the P/F decision, and three to assess each of the three CMA parameters used to estimate the CMA status. Initially, in all models, the fuzzy rules and the fuzzy values for inputs and outputs were defined by the system designers based on the linemen's judgment and on statistical analysis of the input-output data sets. In all models, the chosen defuzzification method was the weighted average, and all rules were given equal weights. After that initialization, the adaptive neuro-fuzzy inference system (ANFIS) [3, 4], a well-known neuro-fuzzy modeling technique, was used to optimize the fuzzy models. ANFIS was employed with a hybrid algorithm to minimize the sum of squared residuals. More specifically, in the forward pass of the hybrid learning algorithm the consequent parameters are identified by the least-squares method, while in the backward pass the antecedent parameters are updated by gradient descent. With the aim of providing enough readability of the can seaming and CMA processes, we must take the accuracy-interpretability trade-off into account when choosing the best fuzzy models. Better interpretability is usually attained at the cost of accuracy and vice versa. In the light of this criterion, we pick as the best model the one that provides enough accuracy while incurring as small a loss of interpretability as possible.
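To make the structure of a zero-order TSK model concrete, the following minimal Python sketch evaluates such a model with Gaussian membership functions, product T-norm and weighted-average defuzzification; the rule and membership-function parameters are illustrative assumptions, not the values identified by ANFIS in this chapter.

```python
import numpy as np

def gauss(x, c, s):
    """Gaussian membership function with centre c and width s."""
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def tsk_zero_order(inputs, rules):
    """Weighted-average output of a zero-order TSK model.
    Each rule is (antecedent, singleton), where the antecedent is a list of
    (input_index, centre, width) terms; the firing strength is the product
    T-norm over those terms."""
    num, den = 0.0, 0.0
    for antecedent, singleton in rules:
        w = 1.0
        for i, c, s in antecedent:
            w *= gauss(inputs[i], c, s)
        num += w * singleton
        den += w
    return num / den if den > 0 else 0.0

# Hypothetical two-rule model on the inputs (ES, HS, PC)
rules = [
    ([(0, 1.10, 0.05), (1, 2.90, 0.10)], 1.0),  # e.g. a "pass" region
    ([(0, 1.45, 0.08), (1, 2.65, 0.10)], 0.0),  # e.g. a "fail" region
]
print(tsk_zero_order([1.12, 2.88, 3.55], rules))
```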
4 Results and discussion
This section describes the results obtained when applying the methodology described in the previous section.
4.1 Model to make the pass/fail decision for each can The first model (P/F model) emulates the CRSC method to evaluate the seaming integrity for each can inspection point and make the P/F decision. The best P/F model found uses ES, HS, and PC as inputs, P/F decision as output, and 15 rules to define relationships between inputs and output (Table 3). This P/F model is a zero order TSK fuzzy model (singleton values for each consequent). The membership functions for each input feature are shown in Figure 6 and the singleton value for each consequent in Table 3.
Fig. 6. Membership functions for P/F model (dashed and solid alternating lines are used to assist readability)
Table 3. Rules for P/F model
Rules                                      Pass/Fail decision *
If ES is ES1 & HS is HS1                   S1[0], Fail
If ES is ES1 & HS is HS2                   S2[0], Fail
If ES is ES1 & HS is HS3 & PC is PC1       S3[1], Pass
If ES is ES1 & HS is HS3 & PC is PC2       S4[0], Fail
If ES is ES1 & HS is HS3 & PC is PC3       S5[0], Fail
If ES is ES1 & HS is HS4                   S6[1], Pass
If ES is ES1 & HS is HS5                   S7[0], Fail
If ES is ES2 & HS is HS1 & PC is PC1       S8[0], Fail
If ES is ES2 & HS is HS1 & PC is PC2       S9[0], Fail
If ES is ES2 & HS is HS1 & PC is PC3       S10[2], Pass
If ES is ES2 & HS is HS2                   S11[0], Fail
If ES is ES2 & HS is HS3                   S12[0], Fail
If ES is ES2 & HS is HS4                   S13[0], Fail
If ES is ES2 & HS is HS5                   S14[0], Fail
If ES is ES3                               S15[0], Fail
* S[m] = Singleton [mean].
4.2 Model to assess the CMA state
In Section 3.1 we established that the main CMA affecting the can seaming process are: plate pressure, and first and second roll maximum approach (see Fig. 3). Three TSK models were obtained, each responsible for estimating one CMA after closing each can. We assigned a numeric value between 0 and 1 to the output of each model (0 is loose, 0.5 is normal, and 1 is tight). One of the fuzzy models (2R model) estimates the second roll maximum approach (CMA3). The best 2R model obtained uses only ES and HS as inputs, CMA3 as output, and only 4 rules (Table 4). This 2R model is also a zero-order TSK fuzzy model. The membership functions for each input are shown in Figure 7 and the singleton value for each consequent in Table 4.
Fig. 7. Membership functions for 2R model
Table 4. Rules for 2R model
Rules                        2nd roll maximum approach (CMA3) *
If ES is ES1                 S1[1], Tight
If ES is ES2 & HS is HS1     S2[0.5], Normal
If ES is ES2 & HS is HS2     S3[1], Tight
If ES is ES3                 S4[0], Loose
* S[m] = Singleton [mean].
Another fuzzy model (P model) evaluates the plate pressure (CMA1). The best P model attained employs ES, HS, and PC as inputs, CMA1 as output, and a rule base of 33 rules (Table 5). This P model is a first-order TSK fuzzy model (a first-order polynomial in the input variables is assigned to each consequent). The membership functions for each input are shown in Figure 8, and the first-order polynomial for each consequent in Table 5.
Fig. 8. Membership functions for: P model (solid lines) and 1R model (dashed lines)
Table 5. Rules for P model
Rules for both models                    Plate pressure (CMA1) *
If ES is ES6 & HS is HS2 & PC is PC2     P1[-40.57 -1.836 23.37 -22.32]
If ES is ES6 & HS is HS1 & PC is PC4     P2[1.55 -0.5334 -0.1888 -0.2882]
If ES is ES5 & HS is HS2 & PC is PC1     P3[-17.58 -3.411 -4.116 45.72]
If ES is ES5 & HS is HS2 & PC is PC2     P4[4.731 -0.07885 -6.375 16.44]
If ES is ES5 & HS is HS2 & PC is PC3     P5[-4.716 8.918 -15.38 39.68]
If ES is ES3 & HS is HS3 & PC is PC3     P6[21.45 -22.95 4.192 18.62]
If ES is ES3 & HS is HS3 & PC is PC4     P7[6.962 -1.926 -16.52 57.28]
If ES is ES2 & HS is HS3 & PC is PC4     P8[8.465 3.021 2.711 -26.84]
If ES is ES2 & HS is HS2 & PC is PC3     P9[4.646 6.886 10.2 -58.82]
If ES is ES2 & HS is HS4 & PC is PC4     P10[18.47 -11.01 -3.803 21.64]
If ES is ES1 & HS is HS3 & PC is PC5     P11[-12.04 3.601 -9.694 39.74]
If ES is ES5 & HS is HS1 & PC is PC4     P12[-0.02447 -0.1794 0.05579 0.7417]
If ES is ES4 & HS is HS3 & PC is PC1     P13[-5.545 18.15 -16.07 17.56]
If ES is ES4 & HS is HS3 & PC is PC2     P14[7.413 -3.508 2.881 -9.964]
If ES is ES5 & HS is HS2 & PC is PC4     P15[20.89 1.723 3.897 -45.13]
If ES is ES4 & HS is HS2 & PC is PC4     P16[0.4597 -0.1405 -0.6883 2.916]
If ES is ES4 & HS is HS2 & PC is PC5     P17[-1.192 -2.774 -2.166 16.78]
If ES is ES3 & HS is HS4 & PC is PC1     P18[26.61 -31.65 60.26 -156.7]
If ES is ES3 & HS is HS4 & PC is PC3     P19[-21.55 5.447 13.76 -38.52]
If ES is ES1 & HS is HS3 & PC is PC4     P20[29.64 -3.345 -8.234 8.814]
If ES is ES1 & HS is HS4 & PC is PC4     P21[16.4 4.208 -3.928 -12.11]
If ES is ES1 & HS is HS4 & PC is PC5     P22[11.54 -11.02 -6.974 43.28]
If ES is ES5 & HS is HS1 & PC is PC1     P23[-3.733 -0.8367 2.96 -2.399]
If ES is ES6 & HS is HS1 & PC is PC1     P24[-2.565 2.181 0.977 -3.844]
If ES is ES5 & HS is HS3 & PC is PC1     P25[4.538 -2.36 1.394 -3.807]
If ES is ES5 & HS is HS3 & PC is PC2     P26[-12.88 30.37 -27.89 39.34]
If ES is ES2 & HS is HS4 & PC is PC2     P27[-48.08 -0.4343 -8.669 84.85]
If ES is ES4 & HS is HS4 & PC is PC2     P28[-20.66 -13.69 -2.758 70.92]
If ES is ES1 & HS is HS4 & PC is PC1     P29[-6.177 -1.282 -0.7175 13.21]
If ES is ES1 & HS is HS4 & PC is PC2     P30[-42.45 5.764 -0.5232 30.18]
If ES is ES1 & HS is HS4 & PC is PC3     P31[38.46 1.49 -9.318 -9.38]
If ES is ES1 & HS is HS5 & PC is PC2     P32[15.86 -0.4132 -1.459 -8.67]
If ES is ES2 & HS is HS5 & PC is PC2     P33[12.35 1.335 1.718 -22.15]
* P[p, q, r, s] = first-order polynomial (p·ES + q·HS + r·PC + s).
Finally, a third fuzzy model (1R model) assesses the first roll maximum approach (CMA2). The best 1R model found has ES, HS, and PC as inputs, CMA2 as output, and 33 rules (Table 6). This 1R model has the same antecedents as the P model, as can be appreciated in Tables 5 and 6, and it is also a first-order TSK fuzzy model. The membership functions for each input are also depicted in Figure 8 and the first-order polynomial for each consequent in Table 6.
Table 6. Rules for 1R model
Rules for both models                    1st roll maximum approach (CMA2) *
If ES is ES6 & HS is HS2 & PC is PC2     P1[1.531 -27.44 10.93 26.76]
If ES is ES6 & HS is HS1 & PC is PC4     P2[-0.2884 1.603 0.9977 -5.984]
If ES is ES5 & HS is HS2 & PC is PC1     P3[0.07629 -8.144 33.46 -95.05]
If ES is ES5 & HS is HS2 & PC is PC2     P4[-15.44 -11.31 25.06 -40.04]
If ES is ES5 & HS is HS2 & PC is PC3     P5[-0.4916 5.865 8.673 -45.09]
If ES is ES3 & HS is HS3 & PC is PC3     P6[-7.731 -7.889 -5.042 48.11]
If ES is ES3 & HS is HS3 & PC is PC4     P7[-33.61 14.57 40.38 -145.3]
If ES is ES2 & HS is HS3 & PC is PC4     P8[-9.041 -24.38 1.943 64.64]
If ES is ES2 & HS is HS2 & PC is PC3     P9[2.18 6.656 3.083 -29.41]
If ES is ES2 & HS is HS4 & PC is PC4     P10[-40.17 11.66 39.39 -129.2]
If ES is ES1 & HS is HS3 & PC is PC5     P11[3.8 -12.47 -3.264 40.26]
If ES is ES5 & HS is HS1 & PC is PC4     P12[0.5514 1.098 0.7131 -4.875]
If ES is ES4 & HS is HS3 & PC is PC1     P13[-11.41 38.62 -13.77 -34.17]
If ES is ES4 & HS is HS3 & PC is PC2     P14[-2.659 -1.996 15.22 -44.72]
If ES is ES5 & HS is HS2 & PC is PC4     P15[-19.45 -2.682 -36.69 164.6]
If ES is ES4 & HS is HS2 & PC is PC4     P16[1.869 -0.1834 1.518 -6.282]
If ES is ES4 & HS is HS2 & PC is PC5     P17[2.555 0.7514 0.916 -7.238]
If ES is ES3 & HS is HS4 & PC is PC1     P18[-27.04 52.31 -66.68 127.1]
If ES is ES3 & HS is HS4 & PC is PC3     P19[2.37 -0.07172 1.95 -9.561]
If ES is ES1 & HS is HS3 & PC is PC4     P20[13.44 9.937 6.871 -63.13]
If ES is ES1 & HS is HS4 & PC is PC4     P21[-45.1 -32.9 6.045 108.4]
If ES is ES1 & HS is HS4 & PC is PC5     P22[47.38 -144.1 94.41 -30.84]
If ES is ES5 & HS is HS1 & PC is PC1     P23[1.224 -0.03894 -0.9869 2.848]
If ES is ES6 & HS is HS1 & PC is PC1     P24[1.41 -0.4232 -2.104 7.308]
If ES is ES5 & HS is HS3 & PC is PC1     P25[4.22 -5.216 -6.709 31.2]
If ES is ES5 & HS is HS3 & PC is PC2     P26[83.59 -32.27 -21.88 48.6]
If ES is ES2 & HS is HS4 & PC is PC2     P27[2.148 -15.74 -0.5319 41.17]
If ES is ES4 & HS is HS4 & PC is PC2     P28[-22.68 -9.386 12.58 6.64]
If ES is ES1 & HS is HS4 & PC is PC1     P29[1.93 1.177 1.4 -8.861]
If ES is ES1 & HS is HS4 & PC is PC2     P30[-6.858 5.18 16.21 -62.16]
If ES is ES1 & HS is HS4 & PC is PC3     P31[18.76 -20.83 22.57 -46.07]
If ES is ES1 & HS is HS5 & PC is PC2     P32[-26.98 -19.16 -1.659 85.72]
If ES is ES2 & HS is HS5 & PC is PC2     P33[-12.89 2.659 -1.665 12.65]
* P[p, q, r, s] = first-order polynomial (p·ES + q·HS + r·PC + s).
4.3 Discussion
To assess the prediction accuracy of the models, the root-mean-square error (RMSE) over the training and checking data sets was calculated for each model. The RMSE of each model is shown in Table 7. The results show that all fuzzy models achieve excellent accuracy. Moreover, all models had RMSE values for the checking data set similar to those for the training data set, which indicates that there is no overfitting.
Table 7. RMSE* for each model
                P/F model   P model   1R model   2R model
Training set    0.0291      0.0270    0.0595     0.0060
Checking set    0.0294      0.0292    0.0768     0.0049
* RMSE (root-mean-square error).
Regarding the interpretability of the models, the P/F and 2R models are simple models with few rules. Besides, both are zero-order TSK fuzzy models (the consequent of each rule is a constant). From the rules of the P/F model (Table 3) it is deduced that the can pass decision is made when one of three cases (rules 3, 6, or 10) occurs. From the rules of the 2R model (Table 4) it is deduced that CMA3 is tight in two cases (rules 1 or 3), normal in one case (rule 2), and loose in the remaining case (rule 4). However, the P and 1R models are more complex. The number of membership functions for each input (Fig. 8) and the number of rules (Tables 5 and 6) are higher in these models. They are first-order TSK models (the consequent of each rule is computed from a first-order polynomial in the inputs). Both models have the same antecedents, and the membership functions for the inputs are very much alike. However, they have different polynomial outputs for the same rule. Due to the complexity of both models, the interpretation of their rules is not evident. As the performance of each model must be understood as a balance between accuracy and interpretability, the P/F and 2R models provide excellent performance, while the P and 1R models provide reduced performance because of their low interpretability.
5 Concluding remarks
We have been involved in the implementation of a machine vision system to improve the quality control of the can seaming process. As mentioned in Section 2, the quality of this process depends on external and internal dimensional attributes. However, the machine vision system, which uses CCD cameras, will only be able to measure the external dimensional attributes. Thus the first stage will be to find a model that estimates the can seaming quality from its external dimensions alone. It will also be profitable to evaluate the CMA state after closing each can, from the same external features, so that the machine can be maintained quickly. In addition, it will be necessary that such models can be interpreted and supplemented by process operators.
Although the seaming process of only one can format (RO-85) has been studied, the can seaming mechanism (double seam) is common to all formats, so it is reasonable to expect that fuzzy models like the P/F model can be obtained to make the P/F decision for other can formats. As the three studied CMA are the ones that mainly affect the can seaming process for any type of closing machine, it is also reasonable to expect that fuzzy models like the P, 1R, and 2R models can be found to assess the CMA for any closing machine. All this leads to the conclusion that it is possible to design an in-line, automated machine vision system that extracts only the external dimensional attributes of the seamed cans, makes the P/F decision for each seamed can, and evaluates the CMA status after closing each can. ANFIS, the neuro-fuzzy modeling technique used to optimize the fuzzy models, provided excellent prediction accuracy for all models. However, the interpretability of the P and 1R models was poor. For this reason, in the future we will study the application of other fuzzy modeling techniques that improve the interpretability without a significant loss of accuracy.
Appendix: Data sets
Table A.1. Part of I/O data set gathered
EC     EF     ES    HS    PC    GC    GF
0.178  0.192  1.50  2.66  3.57  1.72  1.53
0.180  0.224  1.39  2.68  3.57  1.75  1.53
0.184  0.201  1.05  2.45  4.13  1.29  1.56
0.178  0.192  1.42  2.67  3.54  1.98  1.37
0.181  0.213  1.16  2.93  3.52  2.13  1.66
0.167  0.210  1.05  2.96  3.63  1.96  1.82
0.171  0.205  1.13  2.81  3.56  1.99  1.70
0.181  0.213  1.11  2.73  3.58  2.03  1.53
Table A.1. (cont.)
a     b     c     %CR  %OR  %BHB  P/F
0.88  1.39  1.78  62   49   78    Fail
0.90  1.48  1.84  74   49   80    Fail
0.78  1.04  1.56  92   50   67    Fail
0.97  1.72  1.89  66   51   91    Fail
1.10  1.84  2.16  86   51   85    Pass
1.06  1.55  2.08  92   51   75    Pass
1.13  1.82  2.14  85   53   85    Pass
1.07  1.80  2.03  90   53   89    Pass
References
1. Atkinson AC, Donev AN (1992) Optimum Experimental Designs. Clarendon Press, Oxford
2. DFO (1995) Metal can defects; classification and identification manual. Inspection Services, Department of Fisheries and Oceans, Government of Canada, Ottawa
3. Jang JSR (1993) ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23(3):665–685
4. Jang JSR, Sun CT (1995) Neuro-fuzzy modeling and control. Proceedings of the IEEE, 83(3):378–406
5. Klir G, Yuan B (1995) Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, Upper Saddle River, NJ
6. Kosko B (1992) Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Prentice Hall, Englewood Cliffs, NJ
7. Lin RC, King PH, Johnston MR (1998) Examination of metal containers for integrity. In: Merker RI (ed) FDA's Bacteriological Analytical Manual Online, 8th edn., rev. A, chap. 22. Center for Food Safety and Applied Nutrition (CFSAN), U.S. Food & Drug Administration (FDA)
8. Paños-Callado C (1998) Cierres y defectos de envases metálicos para productos alimenticios, 1st edn. Secretaría de Estado de Comercio, Dirección General de Comercio Exterior, Subdirección General de Control, Inspección y Normalización del Comercio Exterior (CICE), Alicante
9. SEFEL (1999) Recommendation SEFEL no. 1, for "non easy open" steel ends. In: EUROSEAM seaming interchangeability of processed 3-piece steel food can ends (NEO) fitted on straight or necked-in bodies, 1st and 2nd Part. European Secretariat of Manufacturers of Light Metal Packaging (SEFEL), Brussels
10. Sugeno M, Kang GT (1988) Structure identification of fuzzy model. Fuzzy Sets and Systems, 28:15–33
11. Taguchi G, Wu YI (1984) Introduction to Off-line Quality Control. Central Japan Quality Control Association, Nagoya
12. Takagi T, Sugeno M (1985) Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics, 15(1):116–132
13. Yager RR, Zadeh LA (eds) (1994) Fuzzy Sets, Neural Networks, and Soft Computing. Van Nostrand Reinhold, New York
CHAPTER 3
Connectionist Fuzzy Relational Systems
Rafał Scherer¹,² and Leszek Rutkowski¹,²
¹ Department of Computer Engineering, Częstochowa University of Technology, Al. Armii Krajowej 36, 42-200 Częstochowa, Poland, http://kik.pcz.pl
² Department of Artificial Intelligence, Academy of Humanities and Economics in Łódź, ul. Rewolucji 1905 nr 64, Łódź, Poland, http://www.wshe.lodz.pl
[email protected],
[email protected]
Abstract: The paper presents neuro-fuzzy systems developed from standard neuro-fuzzy systems by adding an additional fuzzy relation. In this way the rules are more flexible and data-driven learning is improved. The systems are tested on several benchmarks.
Keywords: Neuro-fuzzy systems, fuzzy relations
1 Introduction
An enormous number of fuzzy systems have been developed so far. For excellent surveys the reader is referred to [1][3][5][7][9][10][18][19]. The most commonly used systems are linguistic models and the functional models introduced by Takagi and Sugeno. Linguistic systems store an input-output mapping in the form of fuzzy IF-THEN rules with linguistic terms in both antecedents and consequents. Functional fuzzy systems use linguistic values in the condition part of the rules, but the input-output mapping is expressed by functions of the inputs in the rule consequent part. The above models are used in all fields of machine learning and computational intelligence. They all have advantages and drawbacks. Linguistic systems use intelligible and easy-to-express IF-THEN rules with fuzzy linguistic values. Functional systems allow the modeling of an input-output mapping, but they suffer from a lack of interpretability. Another approach, rarely studied in the literature, is based on fuzzy relational systems (see e.g. Pedrycz [9]). It relates input fuzzy linguistic values to output fuzzy linguistic values through fuzzy relations. This allows the fuzzy linguistic values to be set in advance and the model mapping to be fine-tuned by changing the relation elements. Such systems have been used in several areas, e.g. classification [16] and control [2]. In this paper we propose a new neuro-fuzzy structure of the relational system (Sections 3 and 5), allowing the relation elements to be fine-tuned by the backpropagation algorithm. It will also be shown that, under specific assumptions, relational fuzzy systems are equivalent to linguistic systems with rule weights (Section 4). Moreover, another new class
of neuro-fuzzy systems, based on a relational approach with a fuzzy certainty degree, will be suggested in Section 5. Finally, the systems are tested on problems of truck backer-upper nonlinear control and nonlinear function approximation (Section 6).
2 Fuzzy Relational Systems
Fuzzy relational models can be regarded as a generalization of linguistic fuzzy systems, where each rule has more than one linguistic value, defined on the same output variable, in its consequent. Fuzzy rules in a SISO relational model have the following form

R^k: IF x is A^k THEN y is B^1 (r_{k1}), ..., y is B^m (r_{km}), ..., y is B^M (r_{kM}),    (1)

where r_{km} is a weight responsible for the strength of the connection between input and output fuzzy sets. Relational fuzzy systems store the associations between the input and the output linguistic values in the form of a discrete fuzzy relation

R(A, B) \in [0, 1].    (2)

In the general case of a multi-input multi-output (MIMO) system, the relation R is a multidimensional matrix containing a degree of connection for every possible combination of input and output fuzzy sets. In a multi-input single-output (MISO) system there are N inputs x_n and a single output. Every input variable x_n has a set A_n of K_n linguistic values A_n^k, k = 1, ..., K_n,

A_n = \{ A_n^1, A_n^2, \ldots, A_n^{K_n} \}.    (3)

The output variable y has a set B of M linguistic values B^m with membership functions \mu_{B^m}(y), for m = 1, ..., M,

B = \{ B^1, B^2, \ldots, B^M \}.    (4)

For certain MIMO and MISO systems the dimensionality of R becomes quite high, and it is then very hard to estimate the elements of R; sometimes the training data are simply not large enough. For this reason we consider a fuzzy system with multidimensional input linguistic values. Then we have only one set A of fuzzy linguistic values,

A = \{ A^1, A^2, \ldots, A^K \},    (5)

and thus the relational matrix is only two-dimensional in the MISO case. Sets A and B are related to each other with a certain degree by the K × M relation matrix

R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1M} \\ r_{21} & r_{22} & \cdots & r_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ r_{K1} & r_{K2} & \cdots & r_{KM} \end{bmatrix}.    (6)
3 Neuro-fuzzy Relational Systems
Given the vector \bar{A} of K membership values \mu_{A^k}(\bar{x}) for a crisp observed input value \bar{x}, the vector \bar{B} of M crisp memberships \mu_m is obtained through a fuzzy relational composition

\bar{B} = \bar{A} \circ R,    (7)

implemented element-wise by a generalized form of the sup-min composition, i.e. the s-t composition

\mu_m = \mathop{S}\limits_{k=1}^{K} \left[ T\left( \mu_{A^k}(\bar{x}),\, r_{km} \right) \right].    (8)

The crisp output of the relational system is computed by the weighted mean

\bar{y} = \frac{ \sum_{m=1}^{M} \bar{y}^m \, \mathop{S}\limits_{k=1}^{K} \left[ T\left( \mu_{A^k}(\bar{x}),\, r_{km} \right) \right] }{ \sum_{m=1}^{M} \mathop{S}\limits_{k=1}^{K} \left[ T\left( \mu_{A^k}(\bar{x}),\, r_{km} \right) \right] },    (9)

where \bar{y}^m is a centre of gravity (centroid) of the fuzzy set B^m. The exemplary neuro-fuzzy structure of the relational system is depicted in Fig. 1. The first layer of the system consists of K multidimensional fuzzy membership functions. The second layer is responsible for the s-t composition of the membership degrees from the previous layer and the KM crisp numbers from the fuzzy relation. Finally, the third layer realizes center average defuzzification. Depicting the system as a net structure allows learning or fine-tuning of the system parameters through the backpropagation algorithm.
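A minimal numerical sketch of Eqs. (8)–(9) in Python, taking max as the t-conorm S and the product as the t-norm T; the membership values, relation entries and centroids below are illustrative assumptions.

```python
import numpy as np

def relational_output(mu_A, R, y_centroids):
    """Crisp output of a fuzzy relational system, Eqs. (8)-(9).
    mu_A: (K,) memberships of the K multidimensional input fuzzy sets,
    R: (K, M) relation matrix, y_centroids: (M,) centroids of the output sets.
    Here S = max and T = product (one admissible s-t composition)."""
    mu_m = np.max(mu_A[:, None] * R, axis=0)               # Eq. (8), shape (M,)
    return float(np.dot(mu_m, y_centroids) / mu_m.sum())   # Eq. (9)

# Hypothetical example with K = 3 input sets and M = 2 output sets
mu_A = np.array([0.7, 0.2, 0.1])
R = np.array([[0.9, 0.1],
              [0.4, 0.6],
              [0.0, 1.0]])
y_centroids = np.array([-1.0, 1.0])
print(relational_output(mu_A, R, y_centroids))
```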
Figure 1. Neuro-fuzzy relational system
4 Equivalence between Linguistic and Relational Fuzzy Systems In this section, we show [15] that under certain assumptions a relational fuzzy system is equivalent to a rule-based system with a number of rules being equal to the number of elements of the matrix R.
4.1 Relational Fuzzy System
Relational fuzzy systems can be interpreted as a set of fuzzy rules with certainty degrees. Assuming we have a relational fuzzy system with K input and M output linguistic values, we can write down the corresponding rules

R^k: IF x is A^k THEN y is B^1 (r_{k1}) OR ... OR y is B^m (r_{km}) OR ... OR y is B^M (r_{kM}),    (10)

for k = 1, ..., K. Such a rule can be decomposed into M rules with only one linguistic value in their consequents:

R^{k1}: IF x is A^k THEN y is B^1 (r_{k1})
R^{k2}: IF x is A^k THEN y is B^2 (r_{k2})
  ...
R^{km}: IF x is A^k THEN y is B^m (r_{km})    (11)
  ...
R^{kM}: IF x is A^k THEN y is B^M (r_{kM}).

Using a product T-norm and an arithmetic mean (the boundary case of the OWA operator) as the T-conorm in the s-t composition, the output of the relational system becomes

\bar{y} = \frac{ \sum_{m=1}^{M} \bar{y}^m \, \frac{1}{K} \sum_{k=1}^{K} \{ \mu_{A^k}(\bar{x}) \cdot r_{km} \} }{ \sum_{m=1}^{M} \frac{1}{K} \sum_{k=1}^{K} \{ \mu_{A^k}(\bar{x}) \cdot r_{km} \} },    (12)

and, cancelling the factor 1/K, we obtain

\bar{y} = \frac{ \sum_{m=1}^{M} \bar{y}^m \sum_{k=1}^{K} \{ \mu_{A^k}(\bar{x}) \cdot r_{km} \} }{ \sum_{m=1}^{M} \sum_{k=1}^{K} \{ \mu_{A^k}(\bar{x}) \cdot r_{km} \} },    (13)

and

\bar{y} = \frac{ \sum_{m=1}^{M} \sum_{k=1}^{K} \{ \bar{y}^m \cdot \mu_{A^k}(\bar{x}) \cdot r_{km} \} }{ \sum_{m=1}^{M} \sum_{k=1}^{K} \{ \mu_{A^k}(\bar{x}) \cdot r_{km} \} }.    (14)
4.2 Rule-based fuzzy system with rule weights
A system with rule weights allows us to give a certainty value to each rule. Consider a fuzzy system with the following rules

R^i: IF x is A^i THEN y is B^i (w_i),    (15)

for i = 1, ..., L, where x is the multidimensional input variable, A^i is a multidimensional premise fuzzy set, y is the output variable, B^i is a conclusion fuzzy set, and w_i is a rule weight [8]. For
singleton fuzzification, algebraic Cartesian product, Larsen (product) implication, and center average defuzzification, the output of the system takes the form

\bar{y} = \frac{ \sum_{i=1}^{L} \{ \bar{y}^i \cdot \mu_{A^i}(\bar{x}) \cdot w_i \} }{ \sum_{i=1}^{L} \{ \mu_{A^i}(\bar{x}) \cdot w_i \} },    (16)

where \bar{y}^i is a centre of the fuzzy set B^i. If L = K \cdot M, every group of M rules shares the same antecedent set A^k, and every group of K rules shares the same consequent value \bar{y}^m, we can rewrite (16) as follows:

\bar{y} = \frac{ \sum_{m=1}^{M} \sum_{k=1}^{K} \{ \bar{y}^m \cdot \mu_{A^k}(\bar{x}) \cdot w_{km} \} }{ \sum_{m=1}^{M} \sum_{k=1}^{K} \{ \mu_{A^k}(\bar{x}) \cdot w_{km} \} },    (17)

where w_{km} is the reindexed weight w_i. It is obvious that formula (17), describing the fuzzy rule-based system, is equivalent to the output of the fuzzy relational system (14).
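A small numerical check of this equivalence can be written as follows (Python; the membership values, relation entries and centroids are arbitrary illustrative numbers):

```python
import numpy as np

K, M = 3, 2
rng = np.random.default_rng(1)
mu_A = rng.random(K)          # mu_Ak(x) for the K multidimensional antecedent sets
R = rng.random((K, M))        # relation elements r_km, reused as rule weights w_km
y_c = np.array([-1.0, 1.0])   # centroids ybar^m of the M output sets

# Fuzzy relational system, Eq. (14)
y_rel = (mu_A @ (R * y_c)).sum() / (mu_A @ R).sum()

# Weighted rule-based system, Eq. (16), with L = K*M rules: rule i = (k, m)
# uses antecedent A^k, consequent centroid ybar^m and weight w_i = r_km.
rules = [(mu_A[k], y_c[m], R[k, m]) for k in range(K) for m in range(M)]
y_rules = (sum(mu * yc * w for mu, yc, w in rules) /
           sum(mu * w for mu, yc, w in rules))

assert abs(y_rel - y_rules) < 1e-12
print(y_rel)
```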
5 A Fuzzy Relational System with Linguistic Antecedent Certainty Factors
The power of fuzzy systems stems from their ability to process natural language expressions. We can model nearly any term using different shapes of fuzzy sets and various modifiers, i.e. fuzzy hedges. They transform original fuzzy linguistic values (primary terms, e.g. small, big) into new, more specific fuzzy sets like very fast or quite slow flow. Thanks to them, we can make expert statements more precise. The concept of fuzzy hedges is well established in the literature. In general, they can be divided into powered and shifted hedges. Powered hedges model adverbs by powering membership functions, whereas shifted hedges model adverbs by moving points of membership functions. In this section, a different view on linguistic hedges is presented [14]. Instead of modifying antecedent or consequent linguistic values, additional fuzzy sets are introduced. In this approach, a fuzzy relational system with linguistic values defined on the unit interval is used. These values are elements of a fuzzy relation matrix R connecting antecedent and consequent linguistic values. In this case, the relation matrix contains fuzzy sets C_{km} defined on the unit interval:

R = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1M} \\ C_{21} & C_{22} & \cdots & C_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ C_{K1} & C_{K2} & \cdots & C_{KM} \end{bmatrix}.    (18)

Then, if we define several fuzzy linguistic values on the unit interval (e.g. see Fig. 2), an expert can express his uncertainty concerning antecedent terms by a linguistic description. In SISO systems, or MISO systems with multidimensional antecedent fuzzy sets, the expert can define rules similar to the following exemplary ones:

R^1: IF x is exactly A^1 THEN y is B^1
R^2: IF x is more or less A^1 THEN y is B^2    (19)
R^3: IF x is roughly A^1 THEN y is B^3.

Rules (19) do not modify antecedent values. The membership degree of an antecedent fuzzy set is divided into several intervals by the fuzzy sets C_{km}. Instead of defining many antecedent sets
we use a smaller number of input fuzzy sets and several sets C_{km}. Every fuzzy set A^k has up to M defined linguistic values C_{km}. In Fig. 2 there is also the set not at all, whose meaning is similar to the standard hedge not; it is activated when its input fuzzy set A^k is not active. The inference in this system is similar to the sup-min composition, but the min operation is replaced by the membership degree \mu_{C_{km}}(\tau_k), where \tau_k is the membership degree of the k-th multivariate input fuzzy set.

Figure 2. Example of fuzzy linguistic values, expressing uncertainty in rule antecedents

The vector of crisp memberships is obtained by

\mu_m = \mathop{S}\limits_{k=1}^{K} \left[ \mu_{C_{km}}\!\left( \mu_{A^k}(\bar{x}) \right) \right].    (20)

Eq. (20) reflects the fuzzy hedge modifier operation. For example, instead of a quadratic function for the concentration operation very, we use the fuzzy set exactly (Fig. 2). The interpretation and operation of the sets in Fig. 2 differ from standard linguistic hedges. For example, the standard fuzzy hedge more or less dilates an input fuzzy set, whereas our roughly and more or less divide the membership degree range into several intervals. The overall system output is again computed through a weighted average

\bar{y} = \frac{ \sum_{m=1}^{M} \bar{y}^m \, \mathop{S}\limits_{k=1}^{K} \left[ \mu_{C_{km}}\!\left( \mu_{A^k}(\bar{x}) \right) \right] }{ \sum_{m=1}^{M} \mathop{S}\limits_{k=1}^{K} \left[ \mu_{C_{km}}\!\left( \mu_{A^k}(\bar{x}) \right) \right] }.    (21)
Graphical inference (without defuzzification) in the new system is shown in Fig. 3; the example is based on rules (19).

Figure 3. Graphical inference of the new system on the basis of the exemplary rules (19)

The neuro-fuzzy structure of the new system is presented in Fig. 4. The first layer consists of K input multivariate fuzzy membership functions. The second layer is
responsible for the composition of certainty values and membership values from the previous layer. Finally, the third layer defuzzifies the M output values through the center average method. All parameters of the neuro-fuzzy system can be tuned by the backpropagation algorithm. In this section a new fuzzy model, with fuzzy sets linking antecedent and consequent fuzzy sets, has been proposed. These linking fuzzy sets are elements of the relation. Thus, we can model descriptions like IF x is roughly fast THEN .... This approach does not involve standard fuzzy hedges, so input and output linguistic values remain intact. This improves the transparency of the model. Numerical simulations showed reliable behaviour of the new system. Its performance is comparable with the popular singleton model, but the new system allows an expert's uncertainty about the input linguistic variables to be expressed.
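A minimal sketch of the inference of Eqs. (20)–(21) in Python; the Gaussian form of the sets C_km (width 0.6, as used in the simulations of Section 6) and the illustrative centre values are assumptions, not the parameters identified in the chapter's experiments.

```python
import numpy as np

def gauss(u, c, s=0.6):
    """Gaussian certainty set C_km defined on the unit interval (width 0.6)."""
    return np.exp(-0.5 * ((u - c) / s) ** 2)

def certainty_relational_output(mu_A, C_centres, y_centroids):
    """Eqs. (20)-(21): S = max; the min operation is replaced by membership in
    C_km, evaluated at the antecedent membership degree tau_k = mu_A[k].
    mu_A: (K,), C_centres: (K, M) centres of the sets C_km, y_centroids: (M,)."""
    K, M = C_centres.shape
    mu_m = np.array([max(gauss(mu_A[k], C_centres[k, m]) for k in range(K))
                     for m in range(M)])                        # Eq. (20)
    return float(np.dot(mu_m, y_centroids) / mu_m.sum())         # Eq. (21)

# Hypothetical example: K = 2 input sets, M = 3 output sets; a centre near 1.0
# plays the role of "exactly", near 0.5 "roughly", near 0.0 "not at all".
mu_A = np.array([0.8, 0.3])
C_centres = np.array([[1.0, 0.5, 0.0],
                      [0.0, 0.5, 1.0]])
y_centroids = np.array([-1.0, 0.0, 1.0])
print(certainty_relational_output(mu_A, C_centres, y_centroids))
```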
Figure 4. The neuro-fuzzy structure of the relational system with fuzzy certainty degrees
6 Numerical Simulations
This section presents application examples of the new relational systems. First we test our systems on the well-known problem of nonlinear truck control. Then we use the systems to approximate a nonlinear function. In both cases we compare our models with a singleton system. The systems were trained only by the backpropagation algorithm, changing the input and output fuzzy set widths and centers and the elements of the relation R. In the case of relational systems with fuzzy certainty degrees, the widths of the Gaussian fuzzy sets in the relation R were fixed to 0.6; successive simulations showed this value to be the best. The systems were initialized randomly.
6.1 Truck Backer-Upper Nonlinear Control
We tested the proposed neuro-fuzzy systems on the well-known problem of truck backer-upper nonlinear control [18]. The goal is to train a fuzzy model on a 282-element data set to park a truck. The fuzzy model generates the right steering angle of the truck on the basis of the
truck position, expressed by an angle and a distance from the loading dock. We do not use the distance between the truck and the loading dock, assuming that the truck always has enough space to park. The truck moves backward each time by a fixed distance. As a reference point we used a Mamdani rule-based fuzzy system with an algebraic Cartesian product and 6 rules without rule weights. To compare the new system with the conventional one, we used the relational system with 6 multidimensional input Gaussian fuzzy sets and 3 output singleton fuzzy sets. Relational models with fuzzy certainty degrees had Gaussian sets in the relation R. All parameters of each system were trained only by the backpropagation algorithm. Learning errors during training of all systems over 50,000 iterations are shown in Fig. 5. Fig. 6 shows numerical simulations of truck steering. All systems give good steering trajectories.
Figure 5. Learning error for a) Mamdani system, b) relational system c) relational system with fuzzy certainty degrees
Figure 6. Truck trajectories for a) Mamdani system, b) relational system, c) relational system with fuzzy certainty degrees

The learning error for the crisp relational system falls more slowly, but in exchange we can fine-tune the elements of the fuzzy relation. Moreover, the truck steered by the relational system needs fewer steps to reach the loading dock. The relational system with fuzzy certainty degrees performs worst, but at this price an expert can linguistically express his uncertainty about a given rule.
6.2 Nonlinear Function Approximation
Here we approximate the two-input, single-output nonlinear function

y = \left( 1 + x_1^{-2} + x_2^{-1.5} \right)^2, \qquad 1 \le x_1, x_2 \le 5.    (22)
We learned and tested the systems on the original 50-element data set taken from [17]. All parameters were tuned by the backpropagation algorithm. The singleton model had 6 rules and an algebraic Cartesian product. The relational system had 6 input fuzzy sets and 6 output fuzzy sets, related to each other by the relation matrix. The relational system with fuzzy certainty degrees had a similar structure, but the crisp relation values were replaced by Gaussian fuzzy sets with fixed widths; only their centres were tuned. The error during learning is shown in Fig. 7. The root mean square error (RMSE) after 100,000 iterations for each type of model is given in Table 1.
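A short sketch of how the target function (22) and an RMSE score of the kind reported in Table 1 can be computed (Python; the uniform random sampling and the trivial predictor below are assumptions for illustration only, not the original 50-element data set from [17] or the trained models):

```python
import numpy as np

def target(x1, x2):
    """Nonlinear benchmark function of Eq. (22)."""
    return (1.0 + x1 ** -2 + x2 ** -1.5) ** 2

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Illustrative data over the domain 1 <= x1, x2 <= 5
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, size=(50, 2))
y = target(x[:, 0], x[:, 1])

# Any trained model could be scored like this; here a trivial constant predictor
print(rmse(y, np.full_like(y, y.mean())))
```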
Figure 7. Learning error for a) Mamdani system, b) relational system, c) relational system with fuzzy certainty degrees
Table 1. Root mean square error for the nonlinear function
Singleton model   Relational model   Relational model with fuzzy certainty degrees
0.084             0.044              0.214

In this problem the relational system outperforms the other systems, and the relational system with fuzzy certainty degrees again performs worst.
6.3 Synthetic Normal-mixtures Data
This data set consists of 2 classes with 125 objects in each class [11]. The training set consists of 250 instances and the testing set of 1000 instances (see Fig. 8). The class distribution allows for an error rate of 8% at best. Simulation results show that the relational systems outperform the standard singleton model (Tab. 2). The relational system with fuzzy antecedent certainty degrees also performs slightly better than the standard singleton model.
6.4 Synthetic Cone-torus Data
This data set consists of 3 classes, with 100 objects in each of two classes and 200 objects in the third class [6]. The training set consists of 400 instances, as does the testing set (see Fig. 9). Simulation results (Tab. 3) show that the relational system outperforms the standard singleton model. The relational system with fuzzy antecedent certainty degrees performs slightly worse than the standard singleton model.
Table 2. Learning results for normal-mixture data

Singleton model
Number of rules   Testing error [%]
2                 18.2
4                 12.4
5                 12.0
6                 15.2
8                 19.7

Relational system with min-max composition
Number of input and output fuzzy sets   Testing error [%]
6   3                                   12.6
6   6                                   17.5

Relational system with multiplication-summation composition
Number of input and output fuzzy sets   Testing error [%]
2   2                                   18.9
3   2                                   14.0
3   3                                   10.8
4   2                                   10.1
4   3                                   10.6
6   2                                   10.8
6   3                                   11.3

Relational system with fuzzy antecedent certainty factor
Number of input and output fuzzy sets   Testing error [%]
4   3                                   11.5
7 Conclusion We proposed a new relational neuro-fuzzy system. The system allows learning all its parameters (relation matrix elements and membership function parameters) by the backpropagation algorithm. Obviously, we can set linguistic values in advance and fine-tune only elements of the relation. We also showed the equivalence between relational fuzzy systems and linguistic systems with rule weights. Finally, we suggested a new class of relational systems with fuzzy relation matrix elements, which can be regarded as linguistic certainty degrees. These are the subject of our future work. Simulations, carried out on the nonlinear control problem and function approximation, proved the usefulness of the relational systems trained by the backpropagation algorithm, showing that this method can be viewed as an alternative to classic IF-THEN systems. The relational system learns longer but the truck reaches the loading
Figure 8. Synthetic normal-mixture data
Figure 9. Synthetic cone-torus data
dock faster during testing. Also, in the case of function approximation, it outperforms the classic linguistic system. Further improvements can be made by better initialization of the antecedent and consequent fuzzy sets, and by estimating the relation R from data.
Bibliography 1. R.Babuska, Fuzzy Modeling For Control, Kluwer Academic Press, Boston, 1998.
Table 3. Learning results for cone-torus data

Singleton model
Number of rules   Testing error [%]
5                 20.5
8                 16.25
9                 16.25
10                15.5
11                13.75
12                17.5
15                16.5

Relational system with multiplication-summation composition
Number of input and output fuzzy sets   Testing error [%]
5    3                                  18.9
6    4                                  20.0
8    4                                  20.75
9    4                                  16.5
10   3                                  19.5
10   4                                  16.0
10   6                                  11.5
10   7                                  18.75
10   8                                  17.25

Relational system with fuzzy antecedent certainty factor
Number of input and output fuzzy sets   Testing error [%]
4    3                                  20.0
6    4                                  18.25
10   6                                  16.5
2. P.J.C. Branco, J.A. Dente, A Fuzzy Relational Identification Algorithm and its Application to Predict the Behaviour of a Motor Drive System, Fuzzy Sets and Systems, vol. 109, pp. 343-354, 2000.
3. E. Czogala, J. Leski, Fuzzy and Neuro-Fuzzy Intelligent Systems, Physica-Verlag, Heidelberg, 1999.
4. H. Ishibuchi, T. Nakashima, Effect of Rule Weights in Fuzzy Rule-Based Classification Systems, IEEE Transactions on Fuzzy Systems, vol. 9, no. 4, pp. 506-515, 2001.
5. R. J.-S. Jang, C.-T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, Prentice Hall, Upper Saddle River, 1997.
6. L.I. Kuncheva, Fuzzy Classifier Design, Physica-Verlag, Heidelberg, New York, 2000.
7. D. Nauck, F. Klawonn, R. Kruse, Foundations of Neuro-Fuzzy Systems, John Wiley, Chichester, U.K., 1997.
8. D. Nauck, R. Kruse, How the Learning of Rule Weights Affects the Interpretability of Fuzzy Systems, Proceedings of the 1998 IEEE World Congress on Computational Intelligence, FUZZ-IEEE, Alaska, pp. 1235-1240, 1998.
9. W. Pedrycz, Fuzzy Control and Fuzzy Systems, Research Studies Press, London, 1989.
10. A. Piegat, Fuzzy Modeling and Control, Physica-Verlag, Heidelberg, New York, 2001.
11. B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 1996.
12. L. Rutkowski, New Soft Computing Techniques for System Modelling, Pattern Classification and Image Processing, Springer-Verlag, 2004.
13. R. Scherer, L. Rutkowski, Relational Equations Initializing Neuro-Fuzzy System, 10th Zittau Fuzzy Colloquium, Zittau, Germany, 2002.
14. R. Scherer, L. Rutkowski, A Fuzzy Relational System with Linguistic Antecedent Certainty Factors, in: L. Rutkowski, J. Kacprzyk (eds), Neural Networks and Soft Computing, Physica-Verlag, Heidelberg, pp. 563-569, 2003.
15. R. Scherer, L. Rutkowski, Neuro-Fuzzy Relational Systems, 2002 International Conference on Fuzzy Systems and Knowledge Discovery, November 18-22, Singapore, 2002.
16. M. Setnes, R. Babuska, Fuzzy Relational Classifier Trained by Fuzzy Clustering, IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics, vol. 29, no. 5, pp. 619-625, October 1999.
17. M. Sugeno, T. Yasukawa, A Fuzzy-Logic-Based Approach to Qualitative Modeling, IEEE Transactions on Fuzzy Systems, vol. 1, no. 1, pp. 7-31, 1993.
18. L.-X. Wang, Adaptive Fuzzy Systems and Control, PTR Prentice Hall, Englewood Cliffs, New Jersey, 1994.
19. R.R. Yager, D.P. Filev, Essentials of Fuzzy Modeling and Control, John Wiley & Sons, Inc., 1994.
20. R.R. Yager, D.P. Filev, On a Flexible Structure for Fuzzy Systems Models, in: Fuzzy Sets, Neural Networks, and Soft Computing, R.R. Yager, L.A. Zadeh, Eds., Van Nostrand Reinhold, New York, pp. 1-28, 1994.
CHAPTER 4
Semantic Indexing and Fuzzy Relevance Model in Information Retrieval
Bo-Yeong Kang¹, Dae-Won Kim², and Sang-Jo Lee¹
¹ Department of Computer Engineering, Kyungpook National University, 1370 Sangyuk-dong, Puk-gu, Daegu, Korea
[email protected]
² Department of Computer Science, Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu, Daejeon, Korea
[email protected]
Abstract: Suppose that an information retrieval system comprehends the semantic content of documents and reflects the preferences of users. Such a system can search for information on the Internet more effectively and may improve retrieval performance. Therefore, in the present study, a new information retrieval system is proposed by combining semantic-based indexing with a fuzzy relevance model. In addition to the statistical approach, we propose a semantic approach to indexing based on lexical chains. The indexing extracts the semantic concepts in a given document. Furthermore, a fuzzy relevance model combined with the semantic index calculates the exact degrees of relevance between documents based on the user preference. The combination of these notions is expected to improve information retrieval performance.
Keywords: Information retrieval, Indexing, Fuzzy relevance model
1 Introduction
The goal of an information retrieval system is to search, in a fast and efficient way, for the documents a user wants to obtain [1]. An information retrieval system that comprehends the semantic content of documents and reflects the preferences of users can be very helpful in searching for information on the Internet and in improving the performance of existing systems. Because the information retrieval system must interpret the contents of the information items or documents in a collection and rank them according to their degree of relevance to the user query, the representation of document contents and the user preference are important factors in the retrieval process. Therefore, in this study, we focus on the design of the indexing scheme and the retrieval model. The proposed system indexes documents by a semantic approach using lexical chains; it considers not only the terms in a document, but also the semantic concepts in the document. To fully exploit the performance of the semantic indexing, a fuzzy relevance model is proposed. The fuzzy relevance model ranks documents according to the exact relevance of
the user preference and a user query by some metrics [2]. The remainder of this chapter is organized as follows: in Section 2, traditional indexing methods and fuzzy relevance models are discussed. In Section 3, the basic concepts and approaches for the proposed semantic indexing are presented. In Section 4, the proposed fuzzy relevance model based on user preference is addressed. The effectiveness of the proposed approach is demonstrated in Section 5. Finally, we present concluding remarks in Section 6.
2 Literature Review
2.1 Indexing Schemes
Indexing methods should be based not only on the occurrences of terms in a document, but also on the content of the document. Despite this obvious need, most existing indexing and weighting algorithms analyze term occurrences and do not attempt to resolve the meaning of the text. As a result, existing indexing methods do not comprehend the topics referred to in a text and therefore have difficulty in extracting semantically important indexes. Many weighting functions have been proposed and tested [3, 4, 5, 6]. However, most such functions developed to date depend on statistical methods or on the document's term distribution tendency. Representative weighting functions include such factors as term frequency (TF), inverse document frequency (IDF), and the product of TF and IDF (TF·IDF). A drawback of most TF-based methods is that they have difficulty extracting semantically exact indexes that express the topics of a document. Thus, the TF approach fails to capture the topics of the text and cannot discriminate the degree of semantic importance of each lexical item within the text. Linguistic phenomena such as the lexical chain [7], which links related lexical items in a text, have been used to enhance indexing performance [8, 9]. Al-Halimi and Kazman developed a method for indexing transcriptions of conference meetings by topic using lexical trees, the two-dimensional version of lexical chains. However, although their method demonstrated the potential usefulness of lexical trees in text indexing and retrieval, in its present form it is inappropriate for document retrieval, because it can extract topics as index terms from a video of a conference but does not contain a function to estimate the weight of each extracted topic.
2.2 Fuzzy Relevance Models
The boolean relevance model is a simple retrieval model based on set theory and boolean algebra. Given its inherent simplicity and neat formalism, boolean models received great attention in past years and were adopted by many of the early commercial systems. Their main disadvantage is that there is no notion of a partial match to the query conditions; the exact matching may thus lead to the retrieval of too few or too many documents. To overcome this disadvantage, a fuzzy model, also called the extended boolean model, was proposed. The fuzzy model handles the disadvantages of the classical boolean model by introducing the notion of document weight. The document weight is a measure of the degree to which the document is characterized by each index term. The document weights for all index terms lie in the range [0, 1]. However, previous fuzzy models did not consider the concept of user preference. In this section, we consider three fuzzy models [10, 11], i.e., the MMM, PAICE, and P-NORM models.
The MMM Model
In the Mixed Min and Max (MMM) model, each index term has a fuzzy set associated with it. The document weight of a document with respect to an index term A is considered to be the degree of membership of the document in the fuzzy set associated with A. Thus, given a document D with index-term weights (d_{A_1}, d_{A_2}, \ldots, d_{A_n}) for terms A_1, A_2, \ldots, A_n, and the queries

Q_{and} = (A_1 \text{ and } A_2 \text{ and } \ldots \text{ and } A_n)    (1)
Q_{or} = (A_1 \text{ or } A_2 \text{ or } \ldots \text{ or } A_n)    (2)

the query-document similarity in the MMM model is computed in the following manner:

SIM(Q_{and}, D) = C_{and1} \times \gamma(A) + C_{and2} \times \delta(A)    (3)
SIM(Q_{or}, D) = C_{or1} \times \delta(A) + C_{or2} \times \gamma(A)    (4)
\gamma(A) = \min(d_{A_1}, d_{A_2}, \ldots, d_{A_n})    (5)
\delta(A) = \max(d_{A_1}, d_{A_2}, \ldots, d_{A_n})    (6)

where C_{or1}, C_{or2} are softness coefficients for the or operator, and C_{and1}, C_{and2} are softness coefficients for the and operator.
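A minimal sketch of the MMM similarity computation (Python; the softness coefficient values and document weights below are illustrative assumptions):

```python
def mmm_similarity(doc_weights, operator="and", c1=0.6, c2=0.4):
    """MMM query-document similarity, Eqs. (3)-(6).
    doc_weights: document weights d_Ai for the query terms A1..An.
    For an 'and' query, c1 weighs the minimum and c2 the maximum;
    for an 'or' query, c1 weighs the maximum and c2 the minimum."""
    lo, hi = min(doc_weights), max(doc_weights)
    return c1 * lo + c2 * hi if operator == "and" else c1 * hi + c2 * lo

# Hypothetical document weights for a three-term query
d = [0.9, 0.4, 0.7]
print(mmm_similarity(d, "and"), mmm_similarity(d, "or"))
```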
The PAICE Model

This model is similar to the MMM model in that it assumes a fuzzy set associated with each index term and a document weight for each document. However, whereas the MMM model considers only the maximum and minimum document weights of the index terms when calculating the similarity, the PAICE model takes all of the document weights into account:

SIM(Q, D) = ∑_{i=1}^{n} r^{i−1} d_i / ∑_{i=1}^{n} r^{i−1}   (7)

where Q is the query, r is a constant coefficient, and d_i are the index-term weights of the document, considered in ascending order for the and operator and in descending order for the or operator.
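A short sketch of Eq. (7): the document weights are sorted in ascending order for an and-query and in descending order for an or-query, and each is discounted by r^{i−1}. The value of r and the example weights are illustrative assumptions only.

```python
def paice_similarity(doc_weights, query_type, r=1.0):
    """PAICE query-document similarity, Eq. (7)."""
    ordered = sorted(doc_weights, reverse=(query_type == "or"))
    num = sum(r ** i * d for i, d in enumerate(ordered))   # sum of r^(i-1) * d_i
    den = sum(r ** i for i in range(len(ordered)))         # sum of r^(i-1)
    return num / den

print(paice_similarity([0.8, 0.7, 0.3, 0.2], "and", r=1.0))
```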
The P-NORM Model

The previous two fuzzy relevance models, the MMM and PAICE models, do not provide a way of evaluating query weights; they consider only the document weights. The P-NORM model explicitly reflects the query weights. Given a document D with index-term weights (d_{A_1}, d_{A_2}, ..., d_{A_n}) for terms A_1, A_2, ..., A_n, and a query Q with weights (q_{A_1}, q_{A_2}, ..., q_{A_n}), the query-document relevance is calculated as

SIM(Q_and, D) = 1 − ( ∑_{i=1}^{n} (1 − d_{A_i})^p (q_{A_i})^p / ∑_{i=1}^{n} (q_{A_i})^p )^{1/p}   (8)
SIM(Q_or, D) = ( ∑_{i=1}^{n} (d_{A_i})^p (q_{A_i})^p / ∑_{i=1}^{n} (q_{A_i})^p )^{1/p}   (9)

where p is a control coefficient ranging from 1 to ∞. In general, the P-NORM model has shown superior effectiveness compared to other fuzzy relevance models.
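A minimal sketch of Eqs. (8)-(9); the weights and the value of p in the example call are hypothetical.

```python
def pnorm_similarity(doc_weights, query_weights, query_type, p=2.0):
    """P-NORM query-document similarity, Eqs. (8)-(9)."""
    qp = [q ** p for q in query_weights]
    if query_type == "and":
        num = sum(((1.0 - d) ** p) * w for d, w in zip(doc_weights, qp))
        return 1.0 - (num / sum(qp)) ** (1.0 / p)   # Eq. (8)
    num = sum((d ** p) * w for d, w in zip(doc_weights, qp))
    return (num / sum(qp)) ** (1.0 / p)             # Eq. (9)

print(pnorm_similarity([0.8, 0.7], [1.0, 0.5], "and", p=2.0))
```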
3 Semantic Approach to the Indexing Scheme

Words and phrases in a document are often too specific in representing text content, preventing generic searches of the information in texts, or vice versa. Therefore, before selecting indexes, the present work extracts semantically important concepts and then identifies which indexes are really significant.
3.1 Representing Semantic Concepts

Most traditional indexing schemes based on statistical methods suffer from limitations that diminish the precision of the extracted indexes [12]. TF is useful when indexing long documents, but not short ones. Moreover, TF algorithms do not generally represent the exact term frequency because they do not take into account semantic characteristics such as anaphora, synonyms, and so on. The normalized TF × IDF method was proposed to account for the fact that TF factors are numerous for long documents but negligible for short ones, which obscures the real importance of terms; however, this approach still relies on the TF function and therefore suffers from the same shortcomings as the TF method. Documents generally contain various concepts, and we must determine those concepts if we are to comprehend the aboutness of a document. In accordance with the accepted view in the linguistics literature that lexical chains provide a good representation of discourse structures and of the topicality of segments [7], here we take each lexical chain to represent a concept that expresses one aspect of the meaning of a document. In practice, we define each lexical chain derived from a document as a concept cluster that captures one of the concepts of the document. The proposed method first extracts concept clusters that represent the semantic content of the text and assigns scores to the extracted concept clusters and to the lexical items in the clusters to represent their degree of semantic importance. Each concept cluster is then evaluated to identify the representative concepts, which are used to determine the semantic indexes and their weights.
3.2 Extracting Representative Concepts

Concept clusters are lexical chains that represent the concepts or topics of a document. It is generally agreed that lexical chains represent the discourse structure of a document. To estimate the semantic importance of terms within a document, the representative concepts of the document must be identified. To achieve this, we define two scoring functions, one for each concept cluster and one for the terms in the cluster. To compose a concept cluster from related lexical items, we use five relations – identity, synonymy, hypernymy, hyponymy, and meronymy – whose relation weights decrease in the order listed (i.e., identity highest and meronymy lowest). The scoring functions of a concept cluster c_x and of the nouns w_i in the cluster, denoted by λ(c_x) and φ(w_i) respectively, are defined as follows:

φ(w_i) = ∑_{k∈R} τ(k) × υ(k),
λ(c_x) = ∑_{w_i∈c_x} φ(w_i),

where τ(k) and υ(k) denote the number of occurrences and the weight of relation k, respectively. From the weighted concept clusters, we discriminate a representative concept, since we cannot deal with all the
concepts of a document. A concept c_r is considered a representative concept if it satisfies the following criterion:

λ(c_r) ≥ α · ∑_x λ(c_x) / |c_x|   (10)

where |c_x| is the number of concept clusters in the given document. After extracting the representative concept clusters of a document, we take the terms in each representative concept cluster as index terms that capture the aboutness of the document, and regard the scores assigned to those terms as the index weights that represent their semantic importance within the document.
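To make the scoring concrete, the sketch below assumes concept clusters have already been built (e.g., by lexical chaining) and are given as mappings from terms to counts of relation links; the relation weights, the cluster contents, and the threshold α are hypothetical placeholders, not values from the chapter.

```python
# Hypothetical relation weights: identity highest, meronymy lowest.
RELATION_WEIGHT = {"identity": 1.0, "synonymy": 0.9, "hypernymy": 0.7,
                   "hyponymy": 0.7, "meronymy": 0.4}

def term_score(relation_counts):
    """phi(w_i): sum over relations k of (number of links of kind k) x (weight of k)."""
    return sum(n * RELATION_WEIGHT[k] for k, n in relation_counts.items())

def cluster_score(cluster):
    """lambda(c_x): sum of phi(w_i) over the nouns in the cluster."""
    return sum(term_score(rc) for rc in cluster.values())

def representative_clusters(clusters, alpha=1.0):
    """Keep clusters scoring at least alpha times the mean cluster score, Eq. (10)."""
    scores = {name: cluster_score(c) for name, c in clusters.items()}
    mean = sum(scores.values()) / len(scores)
    return {name: c for name, c in clusters.items() if scores[name] >= alpha * mean}

# Toy clusters: term -> {relation: number of links to other terms in the cluster}.
clusters = {
    "c1": {"car": {"identity": 3, "synonymy": 1}, "vehicle": {"hypernymy": 2}},
    "c2": {"door": {"meronymy": 1}},
}
print(representative_clusters(clusters, alpha=1.0))
```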
4 Preference-Based Fuzzy Relevance Model

In the previous section, we presented a new indexing scheme using the lexical concepts within a document. The semantically important terms obtained through the representative concepts are used as the indexes, which serve as the document weights in a fuzzy information retrieval system. In this section, a new fuzzy relevance model is described. Based on fuzzy set theory, the proposed model defines the similarity between a query and a document.
4.1 Overview

As mentioned above, the previous fuzzy models were introduced to overcome the disadvantages of the classical boolean models. However, they ignored the fact that each index term carries vagueness and that user preference is a very important factor in an information retrieval system. Besides the weight of each index term, the preference on the distribution of index-term weights is also essential in designing a similarity model. Consider an example. An and-query is given as a composite of (index term, index-term weight) pairs:

Q = ((fuzzy, 0.8), (system, 0.7), (korea, 0.3), (author, 0.2))   (11)

The above query can be represented by a fuzzy set of elements and membership values:

Q = {(fuzzy, 0.8), (system, 0.7), (korea, 0.3), (author, 0.2)}   (12)

Now suppose that four documents (D_1, D_2, D_3, D_4) are stored in the document collection, each represented as a set of index terms and their weights:

D_1 = {(fuzzy, 0.8), (system, 0.7)}
D_2 = {(fuzzy, 0.2), (system, 0.2), (korea, 0.3), (author, 0.2)}
D_3 = {(korea, 0.7), (author, 0.8)}
D_4 = {(fuzzy, 0.8), (system, 0.7), (korea, 0.3), (author, 0.2)}

Given the query Q and this document collection, what is the ranking of the documents by relevance degree? Intuitively, D_4 is the most relevant document and D_3 the least relevant one. What, then, about the ranking of D_1 and D_2? The classical models and the previous fuzzy models cannot decide the rank of these two documents in a deterministic way. The difference in similarity value between D_1 and D_2 is very small because the similarity value is calculated with a monotonically increasing function: the sum of the two index terms with large weights in D_1 is nearly equal to the sum of the four index terms with small weights in D_2.
Hence, we resolve the above problem by adopting the concept of user preference, which can provide a clearer ranking result. In the above example, a user might want to obtain a document with high membership values. In other words, by weighting a preference for the high-membership area, D_1 obtains a higher similarity value than D_2 even though the number of matched words in D_1 is smaller than in D_2. In this way, the proposed model can clarify the vagueness that can occur in the retrieval process.
4.2 Relevance Model using Preference

To retrieve relevant documents, the query-document similarity must be calculated for the documents in the collection. The query-document similarity is an attempt to predict the relevance of a document to the query. Given N fuzzy sets representing the documents in the collection, the goal is to find the one closest to the new fuzzy set that stands for the query. The relevance calculation can therefore be transformed into a similarity computation in fuzzy set theory. Most previous work on similarity calculation is based on Euclidean or Hamming measures, which simply calculate the vertical distance at each point of the fuzzy sets. These kinds of methods are not complete and have several sources of error. Moreover, as already mentioned, the fuzzy sets representing the query and the documents have a vague and ambiguous character, so we must consider the possibility distribution of truth in order to obtain the similarity value. In addition, the result of a similarity comparison may depend on user preference: a change of preference viewpoint implies a change in the similarity result. Several recent studies reflect these factors, but many of them lack consistency [13, 14, 15, 16, 17, 18, 19, 20, 21]. We therefore developed a new similarity measure between fuzzy sets, which can generate a better relevance degree and reflect user preference and weight. To accomplish this, we propose two new concepts: domain preference on the domain axis and membership-degree preference on the membership axis. By introducing these concepts, a user can give a weight to the specific part they consider more important. The final similarity value is a composition of the domain preference and the membership preference obtained through integration. The contribution of the approach taken in this paper is to generalize similarity comparison and to provide preference degrees in an information retrieval system and in fuzzy set theory.
4.3 Similarity Measure

To generally extend the comparison of similarity between fuzzy sets, we must consider the user preference or intention. We define two preference values in order to reflect the user's will.
Domain Preference and Membership-value Preference

The domain preference is a weight on the domain axis of the fuzzy sets being compared. It is defined by

dDomain = f_Domain(x) dx   (13)

Figure 1 shows sample fuzzy sets. Figure 1(a) shows the two fuzzy sets being compared. Figure 1(b) shows the domain preference applied in computing the similarity between fuzzy sets A and B; in this case the user puts more weight on the right side of the domain.
[Fig. 1. (a) Two fuzzy sets A, B; (b) domain preference function; (c) membership preference function]

The second preference value is placed on the membership degrees of the fuzzy sets. It changes the weight along the membership-value axis to reflect the user's will. The membership preference function is defined by

dMV = f_MV(y) dy   (14)

Figure 1(c) shows a simple shape of the membership-value preference function.
Similarity Computation using Preference

The algorithm requires two steps. First, a preliminary similarity value is calculated using the domain preference function. Then, the overall similarity value is computed by applying the membership preference function. For a point x on the domain, the similarity value ψ_{A,B}(x, y) corresponding to a specific membership value y is given by

ψ_{A,B}(x, y) = 1 if y ≤ min(µ_A(x), µ_B(x)), and 0 otherwise.   (15)

According to this equation, fuzzy sets A and B are similar at a specific level y if both membership values are at least y. Applying the domain preference of Eq. 13, the integral form is

ξ(y) = ∫_{Domain} ψ_{A,B}(x, y) dDomain   (16)
     = ∫_{Domain} ψ_{A,B}(x, y) f_Domain(x) dx   (17)
Figure 2 shows this situation. Given a domain preference f_Domain(x) and a domain area r, the similarity calculation is carried out at a specific membership value y. For all x ∈ r, ψ_{A,B}(x, y) is set to 1 by Eq. 15. The integration of the domain function in Eq. 16 then yields the similarity value that reflects the preference at level y. The similarity value obtained by the domain preference is tied to the specific membership level y, so it is necessary to add the membership preference concept in the second step of the algorithm. This computation is represented in Eq. 18.
[Fig. 2. Apply the domain preference]
[Fig. 3. Apply the membership-value preference]

δ_{A,B} = ∫_{MV} ξ(y) dMV   (18)
       = ∫_{y=0}^{1} ξ(y) f_MV(y) dy   (19)
δ_{A,B} is the final similarity value, obtained by integrating ξ(y) over the membership area (MV, membership value). It can be rewritten as Eq. 19 using the membership preference function f_MV(y), as shown in Fig. 3. In other words, the similarity value is calculated by integrating the product of the membership preference function f_MV(y) and the similarity value ξ(y) at each level y. The final similarity value δ_{A,B} between fuzzy sets A and B is determined by carrying out this calculation over the whole membership area. We now return to the document retrieval problem. Given N fuzzy sets in a document collection, the problem is to find the closest fuzzy set, as represented in Eq. 20, where χ denotes the closest fuzzy set, corresponding to the most relevant document, A to Z are the fuzzy sets representing the documents in the collection, and I is the fuzzy set standing for the query.
χ = MAX(δ_{A,I}, δ_{B,I}, ..., δ_{Z,I})   (20)
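The two-step computation of δ_{A,B} can be carried out numerically: Eq. (15) defines ψ, Eq. (17) integrates it over the domain with the domain preference, and Eq. (19) integrates the result over membership levels with the membership preference. The sketch below is a minimal illustration; the discretisation, the preference functions, and the example fuzzy sets are assumptions made for the example only.

```python
def preference_similarity(mu_a, mu_b, f_domain, f_mv, xs, n_levels=100):
    """delta_{A,B}: preference-weighted similarity between fuzzy sets A and B.

    mu_a, mu_b : membership functions of A and B
    f_domain   : domain preference function f_Domain(x)
    f_mv       : membership-value preference function f_MV(y)
    xs         : discretised domain points (assumed equally spaced)
    """
    dx = xs[1] - xs[0]
    dy = 1.0 / n_levels
    delta = 0.0
    for j in range(n_levels):
        y = (j + 0.5) * dy
        # Eq. (15): psi = 1 where both memberships reach level y.
        # Eq. (17): xi(y) = integral of psi * f_Domain over the domain.
        xi = sum(f_domain(x) * dx for x in xs if y <= min(mu_a(x), mu_b(x)))
        # Eq. (19): accumulate xi(y) weighted by the membership preference.
        delta += xi * f_mv(y) * dy
    return delta

# Illustrative example: triangular fuzzy sets, more weight on high membership levels.
xs = [i / 100.0 for i in range(101)]
mu_a = lambda x: max(0.0, 1.0 - abs(x - 0.4) / 0.3)
mu_b = lambda x: max(0.0, 1.0 - abs(x - 0.5) / 0.3)
f_domain = lambda x: 1.0                    # uniform domain preference
f_mv = lambda y: 1.0 if y >= 0.6 else 0.1   # step membership preference, as in Sect. 5
print(preference_similarity(mu_a, mu_b, f_domain, f_mv, xs))
```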
5 Experimental Results

In this section we discuss document retrieval experiments in which the proposed fuzzy information retrieval models were applied to the TREC-2 collection of 1990 Wall Street Journal (WSJ) documents, comprising 21,702 documents. The preference-based fuzzy relevance model was compared with the well-known PAICE and P-NORM models, and the semantic-based indexing was compared with TF-based indexing by applying each indexing scheme to the three relevance models. The parameters of the PAICE and P-NORM models were experimentally set to r = 1.0 and p = 2.0, respectively. The domain preference function was set to be constant, and the membership preference of the proposed model was given as a step function: 1.0 if y ≥ 0.6 and 0.1 otherwise. Because TF and the semantic weight only represent the importance of a term within a document, and cannot express the importance of the term within the document collection, we used IDF in both weighting schemes to represent the importance of a term within the collection.

For the relevance judgements of the TREC-2 collection, 50 queries, numbered 101 to 150, are provided. However, we used only 40 queries because there are no 1990 WSJ documents for the other 10 queries and hence no answer set against which to compare relevance for them.

When a document is searched, the retrieval system will already have loaded the index file into memory. When a user inputs a query, the system constructs the inverted file appropriate to the query using the index file. As the size of the index file decreases, the time required for loading and constructing the inverted file decreases, and the overall search time is reduced accordingly. The critical determinant of the index file size is the index-term dimension; thus, a reduction in the number of index terms decreases the size of the index file. When a document is indexed based on TF, all the terms in the document are used as indexes, and hence the index-term dimension simply equals the number of words in the document. When we index a document using the proposed scheme, however, we first extract representative concepts from the document and then extract index terms from those concepts; hence, the index-term dimension of the document is smaller than that obtained with the TF approach. This is clearly demonstrated in the present experiments on the 1990 WSJ documents, for which the average index-term dimension was 89.55 using the TF method but 31.90 using the proposed method. Thus, on average the TF method represents the 1990 WSJ documents using about 90 words as index terms whereas the proposed method requires only about 32 words, indicating that the proposed scheme reduces the index-term dimension by about 64.40% compared to the TF method.

The query contains various components that can serve as retrieval clues, for example narrative, concept, and summary. We used the nouns of the narrative part as the query and applied traditional query processing; the weighting scheme for the query was TF. After query processing, we searched the WSJ documents and compared the search results obtained using TF weighting with those obtained using the proposed semantic weighting scheme. Table 1 lists the overall search results of the three fuzzy models using TF-based indexes, showing the average precision of each method for the Top 1 to Top 5 documents. The results show that the proposed preference-based model outperforms the PAICE and P-NORM models.
The average precisions of the PAICE and P-NORM models are 8.90% and 9.07%, respectively.
Table 1. Average precisions of three relevance models using TF index

Top N   PAICE   P-NORM   Proposed
Top 1   12.50    7.50     10.00
Top 2   10.00    8.75     10.00
Top 3    7.50    8.33     13.33
Top 4    7.50   11.25     11.88
Top 5    7.00    9.50     11.50
Avg.     8.90    9.07     11.34

Table 2. Average precisions of three relevance models using the semantic index

Top N   PAICE   P-NORM   Proposed
Top 1   20.00   22.50     20.00
Top 2   12.50   21.25     22.50
Top 3   12.50   20.00     23.33
Top 4   11.25   20.63     21.25
Top 5   11.50   19.50     20.00
Avg.    13.55   20.78     21.42
The average precision of the proposed system (11.34%) is higher than that of the other two models. Table 2 lists the average precisions of the three fuzzy models using the proposed semantic indexes. We find that the proposed preference-based model again outperforms the PAICE and P-NORM models: their average precisions are 13.55% and 20.78%, respectively, while that of the proposed system (21.42%) is higher than both. Moreover, the semantic approach to indexing shows superior performance to TF-based indexing. The average precisions of all three fuzzy systems were markedly improved by the semantic index, with an improvement of 88.89% for the proposed fuzzy model.
6 Concluding Remarks

In the present study a new information retrieval system has been proposed by combining semantic-based indexing with a fuzzy relevance model. So far there are few index-term extraction systems that use semantic relations between words or sentences for index-term detection. However, such relations are an important factor in understanding the content of a document, and this semantic aspect should not be discarded when detecting index terms that fully represent the document's content. To find index terms in documents, lexical chains were employed; using them, we detected index terms and their semantic weights. Furthermore, we
proposed an extended fuzzy information retrieval system that addresses the shortcomings of the previous models. In the proposed model, a new fuzzy similarity measure was introduced to calculate the relevance degree between a query and the documents while exploiting user preference. These improvements enable a more robust system, with which we can better capture the content of a document and retrieve the right documents.
References
1. Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval, Addison-Wesley
2. Kim DW, Lee KH (2001) Proceedings of the 10th IEEE International Conference on Fuzzy Systems
3. Salton G, McGill MJ (1983) Introduction to modern information retrieval, McGraw-Hill, New York
4. Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval, Information Processing and Management 24:513–523
5. Fuhr N, Buckley C (1991) A probabilistic learning approach for document indexing, ACM Transactions on Information Systems 9:223–248
6. Lee JH (1995) Combining multiple evidence from different properties of weighting schemes, Proceedings of the 18th SIGIR Conference 180–188
7. Morris J, Hirst G (1991) Lexical cohesion computed by thesaural relations as an indicator of the structure of text, Computational Linguistics 17(1):21–43
8. Kominek J, Kazman R (1997) Accessing multimedia through concept clustering, Proceedings of CHI'97 19–26
9. Kazman R et al (1996) Four paradigms for indexing video conferences, IEEE Multimedia 3(1):63–73
10. Salton G, Fox EA, Wu H (1983) Extended Boolean information retrieval, Communications of the ACM 26(12):1022–1036
11. Lee JH (1994) On the evaluation of Boolean operators in the extended Boolean retrieval framework, Proceedings of the 17th ACM-SIGIR Conference on Research and Development in Information Retrieval 182–190
12. Moens MF (2000) Automatic Indexing and Abstracting of Document Texts, Kluwer Academic Publishers
13. Lee JH (1999) Comparison, ranking and determination of the ranks of fuzzy numbers based on a satisfaction function, Ph.D. Dissertation, Department of Computer Science, KAIST
14. Wang WJ (1997) New similarity measures on fuzzy sets and on elements, Fuzzy Sets and Systems 85:305–309
15. Lee KH (1994) Similarity measure between fuzzy sets and between elements, Fuzzy Sets and Systems 62:291–293
16. Hong DH, Hwang SY (1994) A note on the value similarity of fuzzy systems variables, Fuzzy Sets and Systems 66:383–386
17. Fan J, Xie W (1999) Some notes on similarity measure and proximity measure, Fuzzy Sets and Systems 101:403–412
18. Koczy LT, Hirota K (1993) Ordering, distance and closeness of fuzzy sets, Fuzzy Sets and Systems 59:281–293
19. Pappis CP, Karacapilidis NI (1993) A comparative assessment of measures of similarity of fuzzy values, Fuzzy Sets and Systems 56:171–174
20. Xuecheng L (1992) Entropy, distance measure and similarity measure of fuzzy sets and their relations, Fuzzy Sets and Systems 52:305–318
21. Chen SM et al (1995) A comparison of similarity measures of fuzzy values, Fuzzy Sets and Systems 72:79–89
CHAPTER 5
Backward Reasoning on Rule-Based Systems Modeled by Fuzzy Petri Nets Through Backward Tree

Rong Yang, Pheng-Ann Heng, and Kwong-Sak Leung
Dept. of Computer Science and Engineering, The Chinese University of Hong Kong
[email protected], [email protected], [email protected]
Abstract: The crux in rule-based systems modeled by Fuzzy Petri Nets (FPN) is to decide the sequence of transition firing. For this problem, backward reasoning shows advantages over forward reasoning. In this paper, given the goal place(s), an FPN mapped from a rule-based system is mapped further into a backward tree, which has distinct layers from the bottom to the top. The hierarchical structure of the backward tree provides the order of transition firing: the nearer a transition is to the top, the earlier it fires. An innovative and efficient algorithm for backward reasoning through the backward tree, with a detailed description of its data structure, is proposed in this paper.
1 Introduction

Petri Net (PN) is a successful tool for describing and studying information systems. Incorporating the fuzzy logic introduced in [9], Fuzzy Petri Nets (FPN) have been widely used for fuzzy knowledge representation and reasoning. They have also proved to be a powerful representation method for reasoning in rule-based systems. However, most reasoning algorithms for rule-based systems modeled by FPNs focus on forward reasoning [5, 2, 3, 6]. In fact, for a rule-based system modeled by FPNs, forward reasoning has an inherent limitation in determining the sequence of transition firing. When an FPN has a complex structure, traditional forward reasoning becomes difficult and time-consuming, because all information, no matter whether it is relevant to the goal or not, is considered during the whole reasoning process. For these reasons, backward reasoning is considered a better alternative in many cases. The advantage of backward reasoning is that only information relevant to the goal is considered. This feature makes the reasoning process more flexible and more
intelligent. Although the algorithm presented in [1] could perform fuzzy backward reasoning, it was not efficient enough for real-world applications because it was based on interval-valued fuzzy sets and took a large amount of time to perform the task. In [3], Chen provided a new fuzzy backward reasoning algorithm in which an FPN is transformed into an AND-OR graph. However, it is difficult to put this algorithm into practice for the following reasons:
• The data structure of the fuzzy AND-OR graph, and how to expand it in a computer language, have not been explicitly provided;
• Since the sequence of transition firing is not provided, the algorithm runs into trouble when applied to an FPN with a complex structure, because the IBIS of the current nonterminal node may not have its degree of truth available when it is considered;
• The lack of a description of the data structure of the fuzzy AND-OR graph makes it difficult to separate the four cases in programming.
We now propose an improved method from an innovative viewpoint. Given the goal place(s), an FPN is mapped into a tree structure, which we call the Backward Tree (BT). The BT has an explicit data structure comprising two parts, the layers and the terminal places. Each layer consists of several transitions and their output places. After an FPN has been mapped into a backward tree, the reasoning process can be performed easily by firing the transitions from the top layer to the bottom layer in turn.
This chapter is organized as follows. Section 2 gives preliminary knowledge on FPNs and on fuzzy rule-based reasoning represented by FPNs. In Section 3, the structure of the BT is introduced, together with the mapping from an FPN into a BT. In Section 4, a new efficient, complete, and practical algorithm is proposed. Some examples are provided in Section 5 to illustrate the reasoning process with our algorithm. The final section gives the conclusion and future work.
2 Fuzzy Reasoning via Petri Nets

A Fuzzy Petri Net (FPN) is a directed, weighted, bipartite graph consisting of two kinds of nodes called places and transitions, where arcs run either from a place to a transition or from a transition to a place. In graphical representation, places are drawn as circles and transitions as bars. Figure 1 shows the simplest FPN, where transition t has one input place p1 and one output place p2. According to the definition in [2], an FPN can be defined as an 8-tuple

FPN = (P, T, D, I, O, f, ω, β)   (1)

where
[Fig. 1. The simplest FPN]
P = {p1, p2, ..., pn} is a finite set of places;
T = {t1, t2, ..., tn} is a finite set of transitions;
D = {d1, d2, ..., dn} is a finite set of propositions;
I : T → P is an input function, mapping transitions to their input places;
O : T → P is an output function, mapping transitions to their output places;
f : T → [0, 1] is a function mapping transitions to real values between zero and one that express the Certainty Factors (CF) of the corresponding rules;
ω : P → [0, 1] is a function mapping places to real values between zero and one that denote the degrees of truth of the corresponding propositions;
β : P → D is an association function, mapping places to propositions.

A transition t may fire if ω(pi) ≥ λi, ∀pi ∈ I(t), where the λi are threshold values between zero and one. When transition t fires, the degrees of truth of its input places are used to calculate the degrees of truth of its output places according to certain firing schemes. A typical fuzzy production rule has the form "IF antecedent proposition THEN consequence proposition (CF = µ)". Such a rule can be mapped to the FPN shown in Fig. 1, where the antecedent and consequence propositions are represented as the input and output places, respectively, and the causality between them is represented by the transition. In real applications of fuzzy rule reasoning, the antecedent or consequence part of a fuzzy rule may contain "AND" or "OR" connectors. According to [2], these composite fuzzy production rules can be distinguished into the following types; their corresponding FPN structures and firing schemes are depicted in Figs. 2-4:
Type 1: IF d1 AND d2 AND ... AND dm, THEN dz (CF = µ). Figure 2 shows the FPN structure and the firing scheme of this case, where pi = β(di), i = 1, 2, ..., m, z.
Type 2: IF d1 THEN da AND db AND ... AND dz (CF = µ). Figure 3 shows the FPN structure and the firing scheme of this case, where pi = β(di), i = 1, a, b, ..., z.
Type 3: IF d1 OR d2 OR ... OR dm, THEN dz (CF = µ). Figure 4 shows the FPN structure and the firing scheme of this case, where pi = β(di), i = 1, 2, ..., m, z.
[Fig. 2. Fuzzy reasoning process of Type 1: ω(p_z) = min_{1≤i≤m} ω(p_i) · f(t)]
[Fig. 3. Fuzzy reasoning process of Type 2: ω(p_a) = ω(p_b) = ... = ω(p_z) = ω(p_1) · f(t)]
[Fig. 4. Fuzzy reasoning process of Type 3: ω(p_z) = max_{1≤i≤m} ω(p_i) · f(t_i)]
Let pi be a place and tj a transition in an FPN. If pi ∈ I(tj), then pi is called a Nearest Backward Place of tj; all the nearest backward places of tj constitute the Set of Nearest Backward Places of tj, denoted SNBP(tj). If pi ∈ O(tj), then pi is called a Nearest Forward Place of tj; all the nearest forward places of tj constitute the Set of Nearest Forward Places of tj, denoted SNFP(tj). In an FPN, we define pi as a Terminal Place if there does not exist a transition tj such that pi ∈ O(tj). The reasoning always starts from the terminal places, which are therefore also called Starting Place(s). The degrees of truth of the starting places are either already known before the reasoning starts or provided by the user during the reasoning process. The other kind of place whose degree of truth we are interested in is the Goal Place(s).
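The three firing schemes summarized in Figs. 2-4 translate directly into code. The following is a minimal sketch under the usual min/product model; it is an illustration, not tied to any particular FPN tool, and the example values are hypothetical.

```python
def fire_type1(omega_inputs, cf):
    """Type 1 (AND antecedent): omega(p_z) = min_i omega(p_i) * f(t)."""
    return min(omega_inputs) * cf

def fire_type2(omega_input, cf, n_outputs):
    """Type 2 (AND consequent): every output place receives omega(p_1) * f(t)."""
    return [omega_input * cf] * n_outputs

def fire_type3(omega_inputs, cfs):
    """Type 3 (OR antecedent): omega(p_z) = max_i omega(p_i) * f(t_i)."""
    return max(w * cf for w, cf in zip(omega_inputs, cfs))

print(fire_type1([0.9, 0.7], 0.85))          # -> 0.595
print(fire_type2(0.9, 0.85, 3))              # -> [0.765, 0.765, 0.765]
print(fire_type3([0.9, 0.4], [0.8, 0.95]))   # -> 0.72
```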
3 Backward Tree

The difficulty in fuzzy reasoning through an FPN is to determine the sequence of transition firing. A transition can fire on the condition that the degrees of truth of all its input places are not null and are greater than certain threshold values. If any of its input places has no information about its degree of truth, we need to go backward to find which transition infers this place. Such a backward process continues until all input places of the current transition are available. This backward process can be expressed explicitly as a tree structure, with the goal place(s) as root(s) and the starting places as leaves. We call such a tree structure mapped from an FPN the Backward Tree (BT).
The transformation from FPN to BT treats the transitions as the key element. It differs from the existing approaches [2, 3, 5, 6], where only the relationships between places are considered. This innovative idea, detailed below, makes it easy to perform the transformation and, furthermore, straightforward to implement in a computer language. As shown in Section 2, composite fuzzy production rules can be mapped into three types of FPNs. These three types of FPNs can likewise be transformed into three cases of BTs:
Case 1: Transition t has more than one input place p1, p2, ..., pm and only one output place pz (corresponding to Type 1 in Section 2). Figure 5 shows the mapping.
Case 2: Transition t has only one input place p1 and more than one output place pa, pb, ..., pz (corresponding to Type 2 in Section 2). Figure 6 shows the mapping.
Case 3: Transition t has one input place pi and one output place pj (corresponding to Type 3 in Section 2). Figure 7 shows the mapping.
[Fig. 5. Mapping from FPN to BT: Case 1]
The transformation from FPNs to BTs starts at the goal place(s). For each goal place in an FPN, all transitions whose output places include this goal place are found. These transitions and the goal places form Layer 1 of the BT. For Layer 2, all input places of the transitions of Layer 1 are selected, and all transitions whose output places include these input places are chosen. These transitions
[Fig. 6. Mapping from FPN to BT: Case 2]
[Fig. 7. Mapping from FPN to BT: Case 3]
and their output places form Layer 2. This process is repeated until all input places of the transitions in the current layer are terminal places. The BT has a hierarchical structure constituted by distinct layers from the top to the bottom. Each layer consists of several transitions and their output places. Two adjacent layers are connected so that all the input places of the transitions in the lower layer are included in the set of the output places of the layer immediately above. A BT of an FPN thus comprises two kinds of elements, the layers and the terminal (starting) places. Figure 8 shows an example, where the FPN on the left-hand side is mapped into its BT on the right-hand side.

[Fig. 8. Mapping from FPN to BT: An example]
4 Reasoning Algorithm Through the Backward Tree

The hierarchical structure of the BT gives the order of transition firing. The transitions in upper layers fire before those in lower layers, so as to guarantee that every transition has its input places available before firing. Here, "input places available" means that the degrees of truth of all input places of the considered transition have already been derived: they have been either provided by the user or inferred by other transitions, and are represented by values between zero and one. Before presenting our algorithm, some definitions are given:
Layer: An array of the layer structure, each entry consisting of a Transitions List (TL) and a Places List (PL). The places in the PL are the output places of the transitions in the TL.
Known Place Set (KPS): A list of places whose degrees of truth are already known, either from user input or by inference during the reasoning process.
Current Layer Place Set (CLPS): A copy of the PL of the current layer.
Let ti be a transition and pj a place in an FPN, let f(ti) denote the degree of certainty of ti, and let ω(pj) denote the degree of truth of pj. A complete and practical algorithm is as follows:

Step 1: Input – the FPN structure (every transition with its input and output places) and the name of the goal place(s).

Step 2: Find the terminal (starting) places and put them into the KPS –
For each ti ∈ FPN, put its output places into a list called Output-Place-List;
For each pj ∈ FPN, if pj ∉ Output-Place-List, then put pj into KPS (if duplicated, keep one);
Put the goal place(s) into the PL of Layer[1];
Set num = 1, where num is the sequence number of the layer.

Step 3: Construct the backward tree –
Copy the PL of Layer[num] to CLPS;
while CLPS ≠ ∅ {
  for each ti ∈ FPN {
    for each pj ∈ O(ti)
      if pj ∈ CLPS, then put ti into the TL of Layer[num]
      (if duplicated, keep one);
  }
  Reset CLPS;
  for each ti ∈ TL of Layer[num] {
    for each pj ∈ I(ti) {
      if pj is not a terminal place, then put pj into the PL of Layer[num + 1] (if duplicated, keep one);
    }
  }
  Copy the PL of Layer[num + 1] to CLPS;
  num++;
}
Set num = num − 1. Now num is the sequence number of the highest layer.

Step 4: Reason layer by layer from the top to the bottom –
while num ≥ 1 {
  for each ti ∈ TL of Layer[num] {
    ti fires;
    for each pj ∈ O(ti) {
      ω(pj) = min_{pk ∈ I(ti)} ω(pk) · f(ti);
      insert pj into KPS (if pj already exists in KPS with value ω'(pj), keep the larger of ω(pj) and ω'(pj));
    }
  }
  num−−;
}

Step 5: Output – the degrees of truth of the goal place(s) and of the places related to the goal place(s).

The algorithm has two phases: the construction of the BT, and the forward reasoning layer by layer. The layer structure provides the sequence of transition firing and guarantees that all input places of a transition are available before it fires. Meanwhile, during the construction of the BT, all transitions and places that have no relationship with the goal places are excluded, which makes the algorithm more efficient. After the BT is constructed, the reasoning process simply fires the transitions in turn from the top layer to the bottom layer.
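The following Python sketch implements the two phases (backward-tree construction and layer-by-layer firing) as one possible reading of the pseudocode above; it is not the authors' implementation, the data layout (transitions as (inputs, outputs, CF) triples) is an assumption, and no cycle handling is attempted. The demo applies it to the rule base used in the illustration example of the next section.

```python
def backward_tree_reasoning(transitions, known, goals):
    """transitions: {name: (input_places, output_places, cf)}
    known:       {place: degree of truth} for the starting (terminal) places
    goals:       list of goal places
    Returns the KPS: degrees of truth of all places relevant to the goals."""
    # Phase 1: build the backward tree, layer by layer, starting from the goals.
    layers, current = [], set(goals)
    terminal = set(known)
    while current:
        tl = [t for t, (ins, outs, cf) in transitions.items() if current & set(outs)]
        current = {p for t in tl for p in transitions[t][0] if p not in terminal}
        layers.append(tl)
    # Phase 2: fire the transitions from the top layer down to the bottom layer.
    kps = dict(known)
    for tl in reversed(layers):
        for t in tl:
            ins, outs, cf = transitions[t]
            value = min(kps[p] for p in ins) * cf     # Type 1 min/product firing
            for p in outs:
                kps[p] = max(kps.get(p, 0.0), value)  # keep the larger degree of truth
    return kps

# Rules R1-R7 of Sect. 5, with omega(p0) = 0.90; the goal p6 comes out as 0.81.
fpn = {
    "t1": (["p0"], ["p1"], 0.90), "t2": (["p0"], ["p2"], 0.85),
    "t3": (["p2"], ["p3"], 0.95), "t4": (["p1", "p4"], ["p5"], 0.85),
    "t5": (["p3"], ["p4"], 0.90), "t6": (["p3", "p5"], ["p6"], 0.75),
    "t7": (["p0"], ["p6"], 0.90),
}
print(backward_tree_reasoning(fpn, {"p0": 0.90}, ["p6"]))
```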
5 Illustration Example

Let d0, d1, d2, d3, d4, d5, and d6 be seven propositions. Assume that the knowledge base of a rule-based system contains the following fuzzy production rules:

R1: IF d0 THEN d1 (CF = 0.90)
R2: IF d0 THEN d2 (CF = 0.85)
R3: IF d2 THEN d3 (CF = 0.95)
R4: IF d1 AND d4 THEN d5 (CF = 0.85)
R5: IF d3 THEN d4 (CF = 0.90)
R6: IF d3 AND d5 THEN d6 (CF = 0.75)
R7: IF d0 THEN d6 (CF = 0.90)
These rules can be modeled by the FPN shown in Fig. 9.

[Fig. 9. The FPN structure of the illustration example]
Assume that we want to know the degree of truth of proposition d6 (corresponding to place p6 in the FPN); then d6 is called the goal proposition, and its corresponding place p6 is called the goal place. This FPN has one starting place, p0, whose degree of truth is given as 0.90. The FPN under consideration is easily transformed into a BT, as shown in Fig. 10. The corresponding BT has five layers:
Layer[1].TL = {t6, t7}, Layer[1].PL = {p6};
Layer[2].TL = {t3, t4}, Layer[2].PL = {p3, p5};
[Fig. 10. Backward tree mapped from the FPN shown in Fig. 9]
Layer[3].TL = {t2, t1, t5}, Layer[3].PL = {p2, p1, p4};
Layer[4].TL = {t3}, Layer[4].PL = {p3};
Layer[5].TL = {t2}, Layer[5].PL = {p2}.
According to the proposed algorithm, after constructing the BT the transitions fire in turn from the top layer (Layer[5]) to the bottom layer (Layer[1]). The reasoning process is as follows:

Layer[5]: t2 fires, then KPS = {{p0, 0.90}, {p2, 0.77}};
Layer[4]: t3 fires, then KPS = {{p0, 0.90}, {p2, 0.77}, {p3, 0.73}};
Layer[3]: t2 fires, then KPS = {{p0, 0.90}, {p2, 0.77}, {p3, 0.73}};
          t1 fires, then KPS = {{p0, 0.90}, {p2, 0.77}, {p3, 0.73}, {p1, 0.81}};
          t5 fires, then KPS = {{p0, 0.90}, {p2, 0.77}, {p3, 0.73}, {p1, 0.81}, {p4, 0.73}};
Layer[2]: t3 fires, then KPS = {{p0, 0.90}, {p2, 0.77}, {p3, 0.73}, {p1, 0.81}, {p4, 0.73}};
          t4 fires, then KPS = {{p0, 0.90}, {p2, 0.77}, {p3, 0.73}, {p1, 0.81}, {p4, 0.73}, {p5, 0.58}};
Layer[1]: t6 fires, then KPS = {{p0, 0.90}, {p2, 0.77}, {p3, 0.73}, {p1, 0.81}, {p4, 0.73}, {p5, 0.58}, {p6, 0.44}};
          t7 fires, then KPS = {{p0, 0.90}, {p2, 0.77}, {p3, 0.73}, {p1, 0.81}, {p4, 0.73}, {p5, 0.58}, {p6, 0.81}}.
The degree of truth of the goal place p6 has now been derived: it is 0.81. The other places related to p6 also have their degrees of truth determined.
6 Conclusion

We have presented a new backward reasoning technique for deriving conclusions in fuzzy rule-based systems modeled by FPNs. The crux in such systems is to determine the sequence of transition firing, and our technique solves this problem, offering advantages over existing forward and backward reasoning methods. In our approach, a fuzzy rule-based system is first transformed into an FPN. Given the goal place(s), the FPN is mapped further into a BT, which has distinct layers from the top to the bottom, each consisting of several transitions and their output places. The hierarchical structure of the BT provides the order of transition firing: transitions nearer to the top layer fire before those nearer to the bottom, guaranteeing that every transition has its input places available before firing. To fire a transition, we follow the most common model, in which the degrees of truth of the output places equal the minimum of the degrees of truth of the input places multiplied by the degree of certainty of the transition. This model rests on the basic assumption that there is no interaction among the input places of a transition. In many real applications, however, such inherent interactions cannot be ignored. In our future work, we will consider the nonlinear multiregression model proposed in [7, 8] and merge it with the backward reasoning technique presented here.
Acknowledgment The work described in this paper was fully supported by the Research Grants Council of the Hong Kong Special Administrative Region (Project No. CUHK 4185/00E).
References
1. Arnould T, Tano S (1995) Interval-valued fuzzy backward reasoning. IEEE Trans Fuzzy Systems 3:425–437
2. Chen SM (1990) Knowledge representation using fuzzy Petri nets. IEEE Trans Knowledge and Data Engineering 2:311–319
3. Chen SM (2000) Fuzzy backward reasoning using fuzzy Petri nets. IEEE Trans SMC Part B: Cybernetics 30:846–855
4. Li X, Lara-Rosano F (2000) Adaptive fuzzy Petri nets for dynamic knowledge representation and inference. Expert Systems with Applications 19:235–241
5. Looney CG (1988) Fuzzy Petri nets for rule-based decision making. IEEE Trans SMC 18:178–183
6. Pedrycz W, Gomide F (1994) A generalized fuzzy Petri net model. IEEE Trans Fuzzy Systems 2:295–301
7. Wang Z, Leung KS, Wang J (1999) A genetic algorithm for determining nonadditive set functions in information fusion. Fuzzy Sets and Systems 102:463–469
8. Wang Z, Leung KS, Wang J, Xu K (2000) Nonlinear nonnegative multiregression based on Choquet integrals. International Journal of Approximate Reasoning 25:71–87
9. Zadeh LA (1965) Fuzzy sets. Information and Control 8:338–353
CHAPTER 6
On The Generalization of Fuzzy Rough Approximation Based on Asymmetric Relation

Rolly Intan and Masao Mukaidono
Petra Christian University, Jl. Siwalankerto 121-131, Surabaya 60236, Indonesia, [email protected]
Meiji University, Higashi-mita 1-1-1, Kawasaki-shi, Kanagawa-ken, Japan, [email protected]
Abstract: An asymmetric relation, called a weak similarity relation, is introduced as a more realistic relation in representing the relationship between two elements of data in a real-world application. A conditional probability relation is considered as a concrete example of the weak similarity relation by which a covering of the universe is provided as a generalization of a disjoint partition. A generalized concept of rough approximations regarded as a kind of fuzzy rough set is proposed and defined based on the covering of the universe. Additionally, a more generalized fuzzy rough approximation of a given fuzzy set is proposed and discussed as an alternative to provide interval-valued fuzzy sets. Their properties are examined. Keywords: Rough Sets, Fuzzy Sets, Fuzzy Rough Sets, Conditional Probability Relations
1 Introduction

Rough set theory generalizes classical set theory by allowing an alternative formulation of sets with imprecise boundaries. A rough set is basically an approximate representation of a given crisp set in terms of two subsets, called the lower and upper approximations, derived from a crisp partition defined on the universal set involved [9]. In the partition, every element belongs to one equivalence class, and two distinct equivalence classes are disjoint. Formally, let U denote a non-empty universal set, and let E be an equivalence relation on U. A partition of the universe is referred to as a quotient set, expressed by U/E, where [x]_E denotes the equivalence class in U/E that contains x ∈ U. A rough set of A ⊆ U is represented by a pair of lower and upper approximations. The lower approximation,

\underline{apr}(A) = {x ∈ U | [x]_E ⊆ A} = ∪{[x]_E ∈ U/E | [x]_E ⊆ A},

is the union of all equivalence classes in U/E that are contained in A. The upper approximation,
\overline{apr}(A) = {x ∈ U | [x]_E ∩ A ≠ ∅} = ∪{[x]_E ∈ U/E | [x]_E ∩ A ≠ ∅},

is the union of all equivalence classes in U/E that overlap with A. However, as pointed out in [14, 11], even if it is easy to analyze, the rough set theory built on a partition induced by equivalence relations may not provide a realistic view of the relationships between elements in a real-world application. Instead, a covering of the universe [14] may be considered as an alternative that provides a more realistic model of rough sets. A covering of the universe, C = {C_1, ..., C_n}, is a family of subsets of a non-empty universe U such that U = ∪{C_i | i = 1, ..., n}. Two distinct sets in C may have a non-empty overlap, and an arbitrary element x of U may belong to more than one set in C. The sets in C may describe different types or various degrees of similarity between elements of U.

This paper, as an extension of [5], introduces a weak similarity relation intended to provide a more realistic model of the relationships between two elements of data. Naturally, a relationship between two elements in a real-world application is not necessarily symmetric or transitive, as characterized by the weak similarity relation. The fuzzy conditional probability relation proposed in [2] is regarded as a concrete example of the weak similarity relation. Moreover, the weak similarity relation, as well as conditional probability relations, can be considered generalizations of the similarity relation proposed in [17]. A covering, called an α-covering of the universe, is then constructed from the conditional probability relations. A generalized concept of rough approximations is introduced and defined based on α-coverings of the universe, with two interpretations: an element-oriented generalization and a similarity-class-oriented generalization. This generalized concept of rough approximations is regarded as a kind of fuzzy rough set. Several symmetric relations can be derived from the (fuzzy) conditional probability relation. Through these symmetric relations, we introduce various formulations of generalized rough approximations and examine their properties in relation to the generalized rough approximations induced by the conditional probability relation. Additionally, a more generalized fuzzy rough approximation of a given fuzzy set is proposed and discussed as an alternative for providing interval-valued fuzzy sets from an information system. Their properties are examined.
2 Conditional Probability Relations

The similarity relation proposed by Zadeh [17] generalizes equivalence relations for dealing with fuzzy data. A similarity relation on a given universe U maps each pair of elements in the universe to an element in the closed interval [0, 1].

Definition 1. [17] A similarity relation is a mapping, s : U × U → [0, 1], such that for x, y, z ∈ U,
(a) Reflexivity: s(x, x) = 1,
(b) Symmetry: s(x, y) = s(y, x),
(c) Max-min transitivity: s(x, z) ≥ max_{y∈U} min[s(x, y), s(y, z)].

There are considerable criticisms [12, 11] of the use of the similarity relation, especially regarding the max-min transitivity property. Therefore, a proximity relation [1] (also called a resemblance relation), which requires only reflexivity and symmetry, has been proposed. Moreover, even the symmetry required by similarity and proximity relations is too strong a property to represent relationships between two elements or objects in a real-world application (see [2]). Although it is true that if "x is similar to y" then "y is
similar to x", these two statements might have different degrees of similarity. Hence, we consider a conditional probability relation as a more realistic relation for representing the relationship between two elements.

Definition 2. A conditional probability relation is a mapping, R : U × U → [0, 1], such that for x, y ∈ U,

R(x, y) = P(x | y) = P(y → x) = |x ∩ y| / |y|,   (1)

where R(x, y) means the degree to which y supports x, or the degree to which y is similar to x.

In the definition of conditional probability relations, the probability values may be estimated based on the semantic relationships between two elements, using the epistemological or subjective view of probability theory. The conditional probability relation can be used to calculate the degree of similarity between elements (objects) of the universe U given in terms of an information table. When every object in U is represented by a subset of attributes, as in the case of a binary information table, we have the simple procedure of Definition 2 for estimating the conditional probability relation, where | · | denotes the cardinality of a set. Consider the binary information table given in Table 1, where the set of objects U = {O1, O2, O3} is characterized by the set of attributes At = {a1, a2, ..., a8}.
Table 1. Binary Information Table

Obj   a1  a2  a3  a4  a5  a6  a7  a8
O1     0   0   1   0   1   0   0   0
O2     1   1   0   1   0   0   1   0
O3     0   0   1   1   0   0   1   1
As shown in Table 1, O1 = {a3, a5}, O2 = {a1, a2, a4, a7}, and O3 = {a3, a4, a7, a8}. Therefore, we have: R(O1, O2) = 0, R(O1, O3) = 1/4, R(O2, O3) = 2/4, R(O2, O1) = 0, R(O3, O1) = 1/2, R(O3, O2) = 2/4. The notion of a binary information table can easily be generalized to a fuzzy information table by allowing a number in the unit interval [0, 1] for each cell of the table; the number is the degree to which an object has a particular attribute. Each object can then be represented as a fuzzy set on the set of attributes, and the degree of similarity between two objects represented by two fuzzy sets on the set of attributes can be calculated by a fuzzy conditional probability relation [2, 3]. In this case, |x| = ∑_{a∈At} µ_x(a), where µ_x is the membership function of x over the set of attributes At, and intersection is defined by minimum in order to obtain reflexivity, although other t-norm operations could be used.
Definition 3. Let µ_x and µ_y be two fuzzy sets over a set of attributes At for two elements x and y of a universe of elements U. A fuzzy conditional probability relation is defined by:

R(x, y) = ∑_{a∈At} min{µ_x(a), µ_y(a)} / ∑_{a∈At} µ_y(a).   (2)

It can easily be verified that the (fuzzy) conditional probability relation R satisfies the properties shown in Proposition 1.

Proposition 1. [6] Let R(x, y) be the (fuzzy) conditional probability relation of x given y, and R(y, x) the (fuzzy) conditional probability relation of y given x. Then, for x, y, z ∈ U,
(r1) R(x, y) = R(y, x) = 1 ⇐⇒ x = y,
(r2) [R(y, x) = 1, R(x, y) < 1] ⇐⇒ x ⊂ y,
(r3) R(x, y) = R(y, x) > 0 =⇒ |x| = |y|,
(r4) R(x, y) < R(y, x) =⇒ |x| < |y|,
(r5) R(x, y) > 0 ⇐⇒ R(y, x) > 0,
(r6) [R(x, y) ≥ R(y, x) > 0, R(y, z) ≥ R(z, y) > 0] =⇒ R(x, z) ≥ R(z, x),
(r7) R(x, z) ≥ max(R(x, y) + R(z, y) − 1, 0) × R(y, z)/R(z, y).

Proof.
(r1) [R(x, y) = |x∩y|/|y| = 1 ⇒ x∩y = y, R(y, x) = 1 ⇒ x∩y = x] =⇒ x = y;
(r2) [R(y, x) = 1 ⇒ x∩y = x, R(x, y) < 1 ⇒ x∩y ⊂ y] =⇒ x ⊂ y;
(r3) |x∩y|/|y| = |x∩y|/|x| > 0 =⇒ |y| = |x|;
(r4) |x∩y|/|y| < |x∩y|/|x| =⇒ |x| < |y|;
(r5) R(x, y) = |x∩y|/|y| > 0 ⇒ |x∩y| > 0 ⇒ |x∩y|/|x| = R(y, x) > 0;
(r6) [R(x, y) ≥ R(y, x) > 0 ⇒ |x| ≥ |y|, R(y, z) ≥ R(z, y) > 0 ⇒ |y| ≥ |z|] =⇒ [|x| ≥ |z| ⇒ R(x, z) ≥ R(z, x)];
(r7) R(x, z) ≥ max(R(x, y) + R(z, y) − 1, 0) × R(y, z)/R(z, y) ⇒ |x∩z| ≥ max(|x∩y| + |z∩y| − |y|, 0), where max(|x∩y| + |z∩y| − |y|, 0) can be verified as the least possible value of |x∩z| when x and z intersect through y.

From Proposition 1, (r7) follows the formulation of the Łukasiewicz intersection (bounded difference) in relating x and z through y. In relation to properties (r1), (r5), and (r6), we can define an interesting mathematical relation, called the weak similarity relation, whose axioms represent these constraints on the similarity level of a fuzzy relation.

Definition 4. A weak similarity relation is a mapping, s : U × U → [0, 1], such that for x, y, z ∈ U,
(a) Reflexivity: s(x, x) = 1,
(b) Conditional symmetry: if s(x, y) > 0 then s(y, x) > 0,
(c) Conditional transitivity: if s(x, y) ≥ s(y, x) > 0 and s(y, z) ≥ s(z, y) > 0, then s(x, z) ≥ s(z, x).
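Definitions 2 and 3 translate directly into code. The following minimal Python sketch reproduces the values computed from Table 1; it is an illustration only, and the fuzzy example at the end uses hypothetical membership degrees.

```python
def cpr(x, y):
    """R(x, y) = |x & y| / |y| for objects given as attribute sets (Definition 2)."""
    return len(x & y) / len(y)

def fuzzy_cpr(mu_x, mu_y):
    """Fuzzy conditional probability relation of Definition 3.
    mu_x, mu_y: {attribute: membership degree}."""
    attrs = set(mu_x) | set(mu_y)
    num = sum(min(mu_x.get(a, 0.0), mu_y.get(a, 0.0)) for a in attrs)
    den = sum(mu_y.get(a, 0.0) for a in attrs)
    return num / den

# Objects of Table 1.
O1, O2, O3 = {"a3", "a5"}, {"a1", "a2", "a4", "a7"}, {"a3", "a4", "a7", "a8"}
print(cpr(O1, O3), cpr(O3, O1))   # 0.25 and 0.5, as in the text
print(fuzzy_cpr({"a1": 0.6, "a2": 0.3}, {"a1": 0.4, "a3": 0.5}))
```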
By definition, the similarity relation is regarded as a special case (type) of the weak similarity relation, and the (fuzzy) conditional probability relation is a concrete example of the weak similarity relation.
3 Generalized Rough Approximations

In this section, we generalize the classical concept of rough approximations based on a covering of the universe induced by the conditional probability relation. Two different interpretations of lower and upper approximations are introduced in the presence of an α-covering of the universe.

3.1 Generalizations Based on the Asymmetric Property

First, based on the asymmetric property of conditional probability relations, two asymmetric similarity classes of a particular element x ∈ U, used as a basis for constructing a covering, are given by the following definition.

Definition 5. Let U be a non-empty universe, and R a conditional probability relation on U. For any element x ∈ U, R_s^α(x) and R_p^α(x) are defined as the set that supports x and the set supported by x, respectively, such that

R_s^α(x) = {y ∈ U | R(x, y) ≥ α},   (3)
R_p^α(x) = {y ∈ U | R(y, x) ≥ α},   (4)
where α ∈ (0, 1]. R_s^α(x) can also be interpreted as the set of elements similar to x; R_p^α(x), on the other hand, is the set of elements to which x is similar. The relationship between R_s^α(x) and R_p^α(x) satisfies the following theorems.

Theorem 1.
y ∈ R_p^α(x), |x| ≥ |y| ⇒ y ∈ R_s^α(x),   (5)
y ∈ R_s^α(x), |x| ≤ |y| ⇒ y ∈ R_p^α(x).   (6)

Proof. Through Bayes' theorem, P(x|y) = (|x| × P(y|x))/|y|, Eq. (1) in Definition 2 can be expressed as R(x, y) = (|x|/|y|) × R(y, x).
Proof of (5): y ∈ R_p^α(x) ⇔ R(x, y) ≥ α ⇒ (|x|/|y|) × R(y, x) ≥ α ⇒ R(y, x) ≥ (|y|/|x|) × α; |x| ≥ |y| ⇒ (|y|/|x|) × α ≤ α ⇒ R(y, x) ≥ α ⇔ y ∈ R_s^α(x). Likewise, (6) can be proved.
Theorem 2. Let x ∈ U, S_1 = R_p^α(x) − (R_p^α(x) ∩ R_s^α(x)) and S_2 = R_s^α(x) − (R_s^α(x) ∩ R_p^α(x)). Then,

∀u ∈ S_1, ∀v ∈ S_2, |v| < |u|.   (7)

Proof. Every u ∈ S_1 satisfies R(u, x) > R(x, u), since u ∈ R_p^α(x) and u ∉ R_s^α(x); hence u ∈ S_1 ⇒ R(u, x) > R(x, u) ⇒ |x∩u|/|x| > |x∩u|/|u| ⇒ |x| < |u|. Similarly, v ∈ S_2 ⇒ R(v, x) < R(x, v) ⇒ |x∩v|/|x| < |x∩v|/|v| ⇒ |v| < |x|. Hence, |v| < |x|, |x| < |u| ⇒ |v| < |u|.

By reflexivity, it follows that we can construct two different coverings of the universe, {R_s^α(x) | x ∈ U} and {R_p^α(x) | x ∈ U}. Formally, based on the similarity classes of x in Definition 5, the lower and upper approximation operators can be defined in two interpretations as follows.

Definition 6. For a subset A ⊆ U, two pairs of generalized rough approximations are given by:
(i) Element-oriented generalization:

\underline{apr}_es^α(A) = {x ∈ U | R_s^α(x) ⊆ A},   (8)
\overline{apr}_es^α(A) = {x ∈ U | R_s^α(x) ∩ A ≠ ∅}.   (9)

(ii) Similarity-class-oriented generalization:

\underline{apr}_cs^α(A) = ∪{R_s^α(x) | R_s^α(x) ⊆ A, x ∈ U},   (10)
\overline{apr}_cs^α(A) = ∪{R_s^α(x) | R_s^α(x) ∩ A ≠ ∅, x ∈ U}.   (11)
In Definition 6(i), the lower approximation consists of those elements in U whose similarity classes are contained in A, while the upper approximation consists of those elements whose similarity classes overlap with A. In Definition 6(ii), the lower approximation is the union of all similarity classes that are contained in A, and the upper approximation is the union of all similarity classes that overlap with A. Also, another formulation of the upper approximation, giving a different result, could be considered:

\overline{apr}^α_{as}(A) = ∪ {R^α_s(x) | x ∈ A}.   (12)

Relationships among these approximations can be represented by:

\underline{apr}^α_{es}(A) ⊆ \underline{apr}^α_{cs}(A) ⊆ A ⊆ \overline{apr}^α_{es}(A) ⊆ \overline{apr}^α_{cs}(A),
A ⊆ \overline{apr}^α_{as}(A) ⊆ \overline{apr}^α_{cs}(A),

where the relationship between \overline{apr}^α_{es}(A) and \overline{apr}^α_{as}(A) cannot be determined in general. The difference between the lower and upper approximations is the boundary region with respect to A:

Bnd^α_{es}(A) = \overline{apr}^α_{es}(A) − \underline{apr}^α_{es}(A),   (13)
Bnd^α_{cs}(A) = \overline{apr}^α_{cs}(A) − \underline{apr}^α_{cs}(A).   (14)
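The two interpretations of Definition 6 can be sketched as follows (an illustrative fragment added here; the covering used is hypothetical).

# Element-oriented (8)-(9) and class-oriented (10)-(11) approximations for a
# crisp covering given as a dict x -> R_s^a(x).  Hypothetical data.
covering = {
    "x1": {"x1", "x2"},
    "x2": {"x1", "x2"},
    "x3": {"x3", "x4"},
    "x4": {"x2", "x3", "x4"},
}
U = set(covering)
A = {"x1", "x2", "x4"}

lower_e = {x for x in U if covering[x] <= A}                      # (8)
upper_e = {x for x in U if covering[x] & A}                       # (9)
lower_c = set().union(*([covering[x] for x in U if covering[x] <= A] or [set()]))  # (10)
upper_c = set().union(*([covering[x] for x in U if covering[x] & A] or [set()]))   # (11)

print(sorted(lower_e), sorted(lower_c))   # lower_e ⊆ lower_c ⊆ A
print(sorted(upper_e), sorted(upper_c))   # A ⊆ upper_e ⊆ upper_c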
Similarly, one can define rough approximations based on the covering {R^α_p(x) | x ∈ U}, denoted by \underline{apr}^α_{ep}(A), \overline{apr}^α_{ep}(A), \underline{apr}^α_{cp}(A), \overline{apr}^α_{cp}(A) and \overline{apr}^α_{ap}(A). They may provide different results because of the asymmetry of a conditional probability relation. However, it can be proved using (r5) in Proposition 1 that for α = 0+, R^{0+}_s(x) = R^{0+}_p(x), ∀x ∈ U; therefore the generalized rough approximations of Definition 6 coincide for the two coverings {R^α_p(x) | x ∈ U} and {R^α_s(x) | x ∈ U} in that case. The pairs (\underline{apr}^α_{es}, \overline{apr}^α_{es}) and (\underline{apr}^α_{cs}, \overline{apr}^α_{cs}) may be interpreted as pairs of set-theoretic operators on subsets of the universe, referred to as rough approximation operators [16]. Combining them with other set-theoretic operators such as ¬, ∪, and ∩, Definition 6(i) has the properties:

(e0) \underline{apr}^α_{es}(A) = ¬\overline{apr}^α_{es}(¬A), \overline{apr}^α_{es}(A) = ¬\underline{apr}^α_{es}(¬A),
(e1) \underline{apr}^α_{es}(A) ⊆ A ⊆ \overline{apr}^α_{es}(A),
(e2) \underline{apr}^α_{es}(∅) = \overline{apr}^α_{es}(∅) = ∅,
(e3) \underline{apr}^α_{es}(U) = \overline{apr}^α_{es}(U) = U,
(e4) \underline{apr}^α_{es}(A ∩ B) = \underline{apr}^α_{es}(A) ∩ \underline{apr}^α_{es}(B), \overline{apr}^α_{es}(A ∩ B) ⊆ \overline{apr}^α_{es}(A) ∩ \overline{apr}^α_{es}(B),
(e5) \underline{apr}^α_{es}(A ∪ B) ⊇ \underline{apr}^α_{es}(A) ∪ \underline{apr}^α_{es}(B), \overline{apr}^α_{es}(A ∪ B) = \overline{apr}^α_{es}(A) ∪ \overline{apr}^α_{es}(B),
(e6) A ≠ ∅ ⇒ \overline{apr}^{0+}_{es}(A) = U,
(e7) A ⊂ U ⇒ \underline{apr}^{0+}_{es}(A) = ∅,
(e8) α ≤ β ⇒ [\underline{apr}^α_{es}(A) ⊆ \underline{apr}^β_{es}(A), \overline{apr}^β_{es}(A) ⊆ \overline{apr}^α_{es}(A)],
(e9) A ⊆ B ⇒ [\underline{apr}^α_{es}(A) ⊆ \underline{apr}^α_{es}(B), \overline{apr}^α_{es}(A) ⊆ \overline{apr}^α_{es}(B)].
However, the lower and upper approximations of Definition 6(ii) show different properties:

(c0) \underline{apr}^α_{cs}(A) ⊆ A ⊆ \overline{apr}^α_{cs}(A),
(c1) \underline{apr}^α_{cs}(∅) = \overline{apr}^α_{cs}(∅) = ∅,
(c2) \underline{apr}^α_{cs}(U) = \overline{apr}^α_{cs}(U) = U,
(c3) \underline{apr}^α_{cs}(A ∩ B) ⊆ \underline{apr}^α_{cs}(A) ∩ \underline{apr}^α_{cs}(B), \overline{apr}^α_{cs}(A ∩ B) ⊆ \overline{apr}^α_{cs}(A) ∩ \overline{apr}^α_{cs}(B),
(c4) \underline{apr}^α_{cs}(A ∪ B) ⊇ \underline{apr}^α_{cs}(A) ∪ \underline{apr}^α_{cs}(B), \overline{apr}^α_{cs}(A ∪ B) = \overline{apr}^α_{cs}(A) ∪ \overline{apr}^α_{cs}(B),
(c5) \underline{apr}^α_{cs}(A) = \underline{apr}^α_{cs}(\underline{apr}^α_{cs}(A)), \overline{apr}^α_{cs}(A) = \underline{apr}^α_{cs}(\overline{apr}^α_{cs}(A)),
(c6) A ≠ ∅ ⇒ \overline{apr}^{0+}_{cs}(A) = U,
(c7) A ⊂ U ⇒ \underline{apr}^{0+}_{cs}(A) = ∅,
(c8) α ≤ β ⇒ \overline{apr}^β_{cs}(A) ⊆ \overline{apr}^α_{cs}(A),
(c9) A ⊆ B ⇒ [\underline{apr}^α_{cs}(A) ⊆ \underline{apr}^α_{cs}(B), \overline{apr}^α_{cs}(A) ⊆ \overline{apr}^α_{cs}(B)].

The duality of the two operators is no longer satisfied in Definition 6(ii). On the other hand, property (c5) indicates that iterating the lower approximation operator does not change the result. The above properties are almost the same as those satisfied in the classical concept of rough sets, except for the additional parameter α and its relation to both operators, the lower and the upper
approximation, as shown in properties (e6, c6), (e7, c7), and (e8, c8). In fact, a covering is a generalization of a partition, so some properties are no longer satisfied. Also, one may define other interpretations of the pair of approximation operators based on intersections of the complements of elements, as well as of the complements of similarity classes [7], as shown in the following equations.

(i) Element-oriented generalization:

\underline{apr}^α_{es_1}(A) = ∩ {U − {x} | R^α_s(x) ∩ (U − A) ≠ ∅},   (15)
\overline{apr}^α_{es_1}(A) = ∩ {U − {x} | R^α_s(x) ∩ A = ∅}.   (16)

(ii) Similarity-class-oriented generalization:

\underline{apr}^α_{cs_1}(A) = ∩ {U − R^α_s(x) | R^α_s(x) ∩ (U − A) ≠ ∅},   (17)
\overline{apr}^α_{cs_1}(A) = ∩ {U − R^α_s(x) | R^α_s(x) ∩ A = ∅}.   (18)
In relation to the approximation operators of Definition 6 (based on unions of elements and of similarity classes), one can prove

\underline{apr}^α_{es_1}(A) = \underline{apr}^α_{es}(A) ⊆ A,   \underline{apr}^α_{cs_1}(A) ⊆ \underline{apr}^α_{cs}(A) ⊆ A,
A ⊆ \overline{apr}^α_{es}(A) = \overline{apr}^α_{es_1}(A),   A ⊆ \overline{apr}^α_{cs_1}(A) ⊆ \overline{apr}^α_{cs}(A).

In the element-oriented generalization, the lower and upper approximation operators based on union and on intersection are exactly the same. In the similarity-class-oriented generalization, however, \underline{apr}^α_{cs}(A) is a better (larger) lower approximation than \underline{apr}^α_{cs_1}(A), while \overline{apr}^α_{cs_1}(A) is a better (smaller) upper approximation than \overline{apr}^α_{cs}(A). Here, the relations between \underline{apr}^α_{es_1}(A) and \underline{apr}^α_{cs_1}(A), and between \overline{apr}^α_{es_1}(A) and \overline{apr}^α_{cs_1}(A), cannot be determined in general. Similarly, one may use R^α_p(x) to define approximation operators denoted by \underline{apr}^α_{ep_1}(A), \overline{apr}^α_{ep_1}(A), \underline{apr}^α_{cp_1}(A) and \overline{apr}^α_{cp_1}(A), analogous to (15)–(18). The rough approximations defined above may be regarded as a kind of fuzzy rough set based on the asymmetric property.

3.2 Generalizations Based on the Symmetric Property

Still in relation to the conditional probability relation, a symmetric property can be obtained using the relations

R∧(x, y) = min(R(x, y), R(y, x)),   (19)
R∨(x, y) = max(R(x, y), R(y, x)),   (20)
R∗(x, y) = |x ∩ y| / |x ∪ y|,   (21)

where R(x, y) is the conditional probability relation of x given y. It can be verified that R∧(x, y), R∨(x, y) and R∗(x, y) do not satisfy the transitive property; that is, they satisfy the properties of a proximity relation (also called a resemblance relation). Some of their properties are given in Proposition 2 below; first, a small numerical sketch of (19)–(21) is given after this paragraph.
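The following fragment (added here for illustration; the attribute sets are hypothetical) computes the three symmetric relations from the conditional probability relation.

# Symmetric relations (19)-(21) derived from R(x, y) = |x ∩ y| / |y|.
objects = {"x1": {"a", "b", "c"}, "x2": {"a", "b"}, "x3": {"b", "d"}}

def R(x, y):
    return len(objects[x] & objects[y]) / len(objects[y])

def R_and(x, y):                        # (19)
    return min(R(x, y), R(y, x))

def R_or(x, y):                         # (20)
    return max(R(x, y), R(y, x))

def R_star(x, y):                       # (21)
    return len(objects[x] & objects[y]) / len(objects[x] | objects[y])

x, y = "x1", "x2"
print(R_star(x, y), R_and(x, y), R(x, y), R(y, x), R_or(x, y))
# Property (t5): R*(x,y) <= R_and(x,y) <= R(x,y), R(y,x) <= R_or(x,y)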
Proposition 2. For x, y, z ∈ U,
(t1) R∧(x, x) = 1, R∨(x, x) = 1, R∗(x, x) = 1,
(t2) R∧(x, y) = R∧(y, x), R∨(x, y) = R∨(y, x), R∗(x, y) = R∗(y, x),
(t3) R∧(x, y) = 1 ⇐⇒ x = y,
(t4) R∗(x, y) = 1 ⇐⇒ x = y,
(t5) R∗(x, y) ≤ R∧(x, y) ≤ β ≤ R∨(x, y), for β ∈ {R(x, y), R(y, x)},
(t6) R∧(x, z) ≥ max(R∧(x, y) + R∧(y, z) − 1, 0),
(t7) R∗(x, z) ≥ max(R∗(x, y) + R∗(y, z) − 1, 0).

Proof. Properties (t1) to (t5) can be easily verified, so their proofs are omitted. The proof of (t6) is similar to the proof of (r7) in Proposition 1: to minimize the intersection of x and z through y, consider R∧(x, y) = R(x, y) and R∧(z, y) = R(z, y), i.e. |y| ≥ |x| and |y| ≥ |z|. It can be proved that

|x ∩ z| / |y| ≥ max((|x ∩ y| + |y ∩ z| − |y|) / |y|, 0),

and, since |y| ≥ |x| and |y| ≥ |z| give R∧(x, z) ∈ {|x ∩ z| / |x|, |x ∩ z| / |z|} with both |x ∩ z| / |x| ≥ |x ∩ z| / |y| and |x ∩ z| / |z| ≥ |x ∩ z| / |y|, it follows that R∧(x, z) ≥ max(R∧(x, y) + R∧(y, z) − 1, 0).

Proof of (t7): similarly, to obtain the least possible intersection of x and z through y, y is assumed to be equal to U (the universe). The Lukasiewicz intersection gives

|x ∩ z| / |U| ≥ max(|x| / |U| + |z| / |U| − 1, 0).

Since y = U implies |x| = |x ∩ y|, |z| = |z ∩ y| and |U| = |x ∪ y| = |z ∪ y|, we obtain

|x ∩ z| / |U| ≥ max(|x ∩ y| / |x ∪ y| + |y ∩ z| / |y ∪ z| − 1, 0),

and, since |U| ≥ |x ∪ z|,

|x ∩ z| / |x ∪ z| ≥ max(|x ∩ y| / |x ∪ y| + |y ∩ z| / |y ∪ z| − 1, 0),

i.e. R∗(x, z) ≥ max(R∗(x, y) + R∗(y, z) − 1, 0) is proved.

Although the transitive property is not satisfied, a covering of the universe is still induced by the similarity classes of those relations, defined by

R^α_∧(x) = {y ∈ U | R∧(x, y) ≥ α},   (22)
R^α_∨(x) = {y ∈ U | R∨(x, y) ≥ α},   (23)
R^α_∗(x) = {y ∈ U | R∗(x, y) ≥ α},   (24)

where α ∈ (0, 1]. In relation to Definition 5, R^α_∧(x) and R^α_∨(x) satisfy the following theorem.
Theorem 3. For every x ∈ U:

R^α_∧(x) = R^α_s(x) ∩ R^α_p(x),   (25)
R^α_∨(x) = R^α_s(x) ∪ R^α_p(x).   (26)

Proof. For (25): R^α_s(x) ∩ R^α_p(x) = {y ∈ U | R(x, y) ≥ α and R(y, x) ≥ α} = {y ∈ U | min(R(x, y), R(y, x)) ≥ α} = {y ∈ U | R∧(x, y) ≥ α} = R^α_∧(x). Similarly, for (26), R^α_s(x) ∪ R^α_p(x) = {y ∈ U | R(x, y) ≥ α or R(y, x) ≥ α} = {y ∈ U | max(R(x, y), R(y, x)) ≥ α} = {y ∈ U | R∨(x, y) ≥ α} = R^α_∨(x).

Also, (t5) in Proposition 2 implies the following relation between the similarity classes:

R^α_∗(x) ⊆ R^α_∧(x) ⊆ X ⊆ R^α_∨(x), for X ∈ {R^α_s(x), R^α_p(x)}.   (27)
That is, {R^α_∗(x) | x ∈ U} provides the finest covering and, conversely, {R^α_∨(x) | x ∈ U} provides the coarsest covering. Two pairs of generalized rough approximations, based on the two interpretations given in Definition 6, can be defined on {R^α_∗(x) | x ∈ U}.

(i) Element-oriented generalization:

\underline{apr}^α_{e∗}(A) = {x ∈ U | R^α_∗(x) ⊆ A},   (28)
\overline{apr}^α_{e∗}(A) = {x ∈ U | R^α_∗(x) ∩ A ≠ ∅}.   (29)

(ii) Similarity-class-oriented generalization:

\underline{apr}^α_{c∗}(A) = ∪ {R^α_∗(x) | R^α_∗(x) ⊆ A, x ∈ U},   (30)
\overline{apr}^α_{c∗}(A) = ∪ {R^α_∗(x) | R^α_∗(x) ∩ A ≠ ∅, x ∈ U}.   (31)
Also, \overline{apr}^α_{a∗}(A), \underline{apr}^α_{e∗_1}(A), \overline{apr}^α_{e∗_1}(A), \underline{apr}^α_{c∗_1}(A) and \overline{apr}^α_{c∗_1}(A) could be defined corresponding to (12) and (15)–(18). Similarly, all the operators with subscripts e∧, c∧, a∧, e∧_1, c∧_1 are defined on {R^α_∧(x) | x ∈ U}, and those with subscripts e∨, c∨, a∨, e∨_1, c∨_1 are defined on {R^α_∨(x) | x ∈ U}.
As a consequence of (27), all the rough approximations defined above satisfy the following properties:

(p1) \underline{apr}^α_{e∨}(A) ⊆ \underline{apr}^α_{e∧}(A) ⊆ \underline{apr}^α_{e∗}(A),
(p2) \overline{apr}^α_{e∗}(A) ⊆ \overline{apr}^α_{e∧}(A) ⊆ \overline{apr}^α_{e∨}(A),
(p3) \underline{apr}^α_{c∨}(A) ⊆ \underline{apr}^α_{c∧}(A) ⊆ \underline{apr}^α_{c∗}(A),
(p4) \overline{apr}^α_{c∗}(A) ⊆ \overline{apr}^α_{c∧}(A) ⊆ \overline{apr}^α_{c∨}(A),
(p5) \overline{apr}^α_{a∗}(A) ⊆ \overline{apr}^α_{a∧}(A) ⊆ \overline{apr}^α_{a∨}(A),
(p6) \underline{apr}^α_{e∨_1}(A) ⊆ \underline{apr}^α_{e∧_1}(A) ⊆ \underline{apr}^α_{e∗_1}(A),
(p7) \overline{apr}^α_{e∗_1}(A) ⊆ \overline{apr}^α_{e∧_1}(A) ⊆ \overline{apr}^α_{e∨_1}(A),
(p8) \underline{apr}^α_{c∨_1}(A) ⊆ \underline{apr}^α_{c∧_1}(A) ⊆ \underline{apr}^α_{c∗_1}(A),
(p9) \overline{apr}^α_{c∗_1}(A) ⊆ \overline{apr}^α_{c∧_1}(A) ⊆ \overline{apr}^α_{c∨_1}(A).
Moreover, in relation to the generalized rough approximations based on the asymmetric property, they satisfy the following proposition.

Proposition 3.
(f1) \underline{apr}^α_{e∧}(A) ⊇ \underline{apr}^α_{es}(A) ∪ \underline{apr}^α_{ep}(A),
(f2) \overline{apr}^α_{e∧}(A) ⊆ \overline{apr}^α_{es}(A) ∩ \overline{apr}^α_{ep}(A),
(f3) \underline{apr}^α_{c∧}(A) ⊇ \underline{apr}^α_{cs}(A) ∩ \underline{apr}^α_{cp}(A),
(f4) \overline{apr}^α_{c∧}(A) ⊆ \overline{apr}^α_{cs}(A) ∩ \overline{apr}^α_{cp}(A),
(f5) \underline{apr}^α_{e∨}(A) = \underline{apr}^α_{es}(A) ∩ \underline{apr}^α_{ep}(A),
(f6) \overline{apr}^α_{e∨}(A) = \overline{apr}^α_{es}(A) ∪ \overline{apr}^α_{ep}(A),
(f7) \underline{apr}^α_{c∨}(A) ⊆ \underline{apr}^α_{cs}(A) ∪ \underline{apr}^α_{cp}(A),
(f8) \overline{apr}^α_{c∨}(A) ⊇ \overline{apr}^α_{cs}(A) ∪ \overline{apr}^α_{cp}(A),
(f9) \overline{apr}^α_{a∧}(A) = \overline{apr}^α_{as}(A) ∩ \overline{apr}^α_{ap}(A),
(f10) \overline{apr}^α_{a∨}(A) = \overline{apr}^α_{as}(A) ∪ \overline{apr}^α_{ap}(A),
(f11) \underline{apr}^α_{e∧_1}(A) ⊇ \underline{apr}^α_{es_1}(A) ∪ \underline{apr}^α_{ep_1}(A),
(f12) \overline{apr}^α_{e∧_1}(A) ⊆ \overline{apr}^α_{es_1}(A) ∩ \overline{apr}^α_{ep_1}(A),
(f13) \underline{apr}^α_{c∧_1}(A) ⊇ \underline{apr}^α_{cs_1}(A) ∪ \underline{apr}^α_{cp_1}(A),
(f14) \underline{apr}^α_{e∨_1}(A) = \underline{apr}^α_{es_1}(A) ∩ \underline{apr}^α_{ep_1}(A),
(f15) \overline{apr}^α_{e∨_1}(A) = \overline{apr}^α_{es_1}(A) ∪ \overline{apr}^α_{ep_1}(A),
(f16) \underline{apr}^α_{c∨_1}(A) ⊆ \underline{apr}^α_{cs_1}(A) ∩ \underline{apr}^α_{cp_1}(A),
(f17) \overline{apr}^α_{c∨_1}(A) ⊇ \overline{apr}^α_{cs_1}(A) ∩ \overline{apr}^α_{cp_1}(A).

Proof. The items of Proposition 3 are proved as follows.
(f1) R^α_s(x) ⊆ A or R^α_p(x) ⊆ A implies R^α_∧(x) ⊆ A (by (25)); possibly ∃x with R^α_∧(x) ⊆ A although neither R^α_s(x) ⊆ A nor R^α_p(x) ⊆ A, so the inclusion may be proper.
(f2) R^α_∧(x) ∩ A ≠ ∅ implies R^α_s(x) ∩ A ≠ ∅ and R^α_p(x) ∩ A ≠ ∅; possibly ∃x with R^α_s(x) ∩ A ≠ ∅ and R^α_p(x) ∩ A ≠ ∅ but R^α_∧(x) ∩ A = ∅.
(f3) The proof is similar to (f1).
(f4) The proof is similar to (f2).
(f5) ∀x, R^α_s(x) ⊆ A and R^α_p(x) ⊆ A ⇔ R^α_∨(x) ⊆ A (by (26)).
(f6) ∀x, R^α_s(x) ∩ A ≠ ∅ or R^α_p(x) ∩ A ≠ ∅ ⇔ R^α_∨(x) ∩ A ≠ ∅.
(f7) R^α_∨(x) ⊆ A ⇒ R^α_s(x) ⊆ A and R^α_p(x) ⊆ A ⇒ R^α_∨(x) = R^α_s(x) ∪ R^α_p(x) ⊆ \underline{apr}^α_{cs}(A) ∪ \underline{apr}^α_{cp}(A).
(f8) R^α_s(x) ∩ A ≠ ∅ or R^α_p(x) ∩ A ≠ ∅ ⇒ R^α_∨(x) ∩ A ≠ ∅, and every M ∈ {R^α_s(x), R^α_p(x), R^α_s(x) ∪ R^α_p(x)} satisfies M ⊆ R^α_∨(x) = R^α_s(x) ∪ R^α_p(x) ⊆ \overline{apr}^α_{c∨}(A).
(f9) The proof is given by (25) in Theorem 3.
(f10) The proof is given by (26) in Theorem 3.
(f11)–(f17) are verified analogously by a case analysis, for each x ∈ U, of whether R^α_s(x), R^α_p(x), R^α_∧(x) and R^α_∨(x) intersect A (for the upper approximations) or U − A (for the lower approximations), comparing the corresponding sets U − {x}, U − R^α_s(x), U − R^α_p(x), U − R^α_∧(x) and U − R^α_∨(x) that enter the intersections (15)–(18).
Almost all combinations of relationships between the rough approximations based on the symmetric and the asymmetric properties can be verified as given in Proposition 3, except for the relationship between \overline{apr}^α_{c∧_1}(A) and the pair \overline{apr}^α_{cs_1}(A), \overline{apr}^α_{cp_1}(A). In the case where R^α_s(x) ∩ A ≠ ∅ or R^α_p(x) ∩ A ≠ ∅, it does not follow that R^α_∧(x) ∩ A ≠ ∅; hence both (U − R^α_s(x)) ∪ (U − R^α_p(x)) ⊆ U − R^α_∧(x) and (U − R^α_s(x)) ∪ (U − R^α_p(x)) ⊇ U − R^α_∧(x) are possible, and no general inclusion holds.
4 Generalized Fuzzy Rough Set

A covering of the universe, as a generalization of a disjoint partition, constructed from the similarity classes of Definition 5 is a crisp covering. Both a crisp covering and a disjoint partition are regarded as crisp granularity. Here, the crisp covering can be generalized to a fuzzy covering; a crisp covering can then be obtained by applying an α-level set of the fuzzy covering. A fuzzy covering may be regarded as a case of fuzzy granularity in which the similarity classes used as a basis for constructing the covering are fuzzy sets, as defined below.

Definition 7. Let U be a non-empty universe, and R be a (fuzzy) conditional probability relation on U. For any element x ∈ U, Rs(x) and Rp(x) are regarded as fuzzy sets, defined as the set that supports x and the set supported by x, and called the s-fuzzy similarity class of x and the p-fuzzy similarity class of x, respectively, as given by

µ_{Rs(x)}(y) = R(x, y), y ∈ U,   (32)
µ_{Rp(x)}(y) = R(y, x), y ∈ U,   (33)
where µ_{Rs(x)}(y) and µ_{Rp(x)}(y) are the grades of membership of y in Rs(x) and Rp(x), respectively. Now, considering a given fuzzy set A on U instead of a crisp set, and a fuzzy covering of the universe constructed from the fuzzy similarity classes of Definition 7 instead of a crisp covering, the generalized fuzzy rough approximations can be defined as follows.

Definition 8. Let U be a non-empty universe, Rs(x) be the s-fuzzy similarity class of x, and A be a given fuzzy set on U:

(i) Element-oriented generalization:

µ_{\underline{apr}_{es}(A)}(x) = inf_{y∈U} min(µ_{Rs(x)}(y), µ_A(y)),   (34)
µ_{\overline{apr}_{es}(A)}(x) = sup_{y∈U} min(µ_{Rs(x)}(y), µ_A(y)).   (35)

(ii) Similarity-class-oriented generalization, for y ∈ U:

µ_{\underline{apr}^m_{cs}(A)}(y) = inf_{x∈ν} inf_{z∈U} min(µ_{Rs(x)}(z), µ_A(z)),   (36)
µ_{\underline{apr}^M_{cs}(A)}(y) = sup_{x∈ν} inf_{z∈U} min(µ_{Rs(x)}(z), µ_A(z)),   (37)
µ_{\overline{apr}^m_{cs}(A)}(y) = inf_{x∈ν} sup_{z∈U} min(µ_{Rs(x)}(z), µ_A(z)),   (38)
µ_{\overline{apr}^M_{cs}(A)}(y) = sup_{x∈ν} sup_{z∈U} min(µ_{Rs(x)}(z), µ_A(z)),   (39)

where, for short, ν = {x ∈ U | µ_{Rs(x)}(y) > 0}. Here µ_{\underline{apr}_{es}(A)}(x) and µ_{\overline{apr}_{es}(A)}(x) are the grades of membership of x in \underline{apr}_{es}(A) and \overline{apr}_{es}(A), respectively; similarly, µ_{\underline{apr}^∗_{cs}(A)}(y) and µ_{\overline{apr}^∗_{cs}(A)}(y) are the grades of membership of y in \underline{apr}^∗_{cs}(A) and \overline{apr}^∗_{cs}(A), respectively (with ∗ ∈ {m, M}). Using the p-fuzzy similarity classes, Definition 8 analogously yields µ_{\underline{apr}_{ep}(A)}(x), µ_{\overline{apr}_{ep}(A)}(x), µ_{\underline{apr}^m_{cp}(A)}(x), µ_{\underline{apr}^M_{cp}(A)}(x), µ_{\overline{apr}^m_{cp}(A)}(x) and µ_{\overline{apr}^M_{cp}(A)}(x).
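A small numerical sketch of Definition 8 (added here for illustration; the relation R and the fuzzy set A are hypothetical) follows the printed formulas (34)–(39) directly.

# Generalized fuzzy rough approximations of Definition 8 on a tiny universe.
U = ["x1", "x2", "x3"]
R = {("x1","x1"): 1.0, ("x1","x2"): 0.8, ("x1","x3"): 0.2,
     ("x2","x1"): 0.6, ("x2","x2"): 1.0, ("x2","x3"): 0.4,
     ("x3","x1"): 0.1, ("x3","x2"): 0.5, ("x3","x3"): 1.0}
A = {"x1": 0.9, "x2": 0.6, "x3": 0.3}

def mu_Rs(x, y):                 # (32): membership of y in R_s(x)
    return R[(x, y)]

def lower_es(x):                 # (34)
    return min(min(mu_Rs(x, y), A[y]) for y in U)

def upper_es(x):                 # (35)
    return max(min(mu_Rs(x, y), A[y]) for y in U)

def cs_bounds(y):                # (36)-(39), nu = {x : mu_Rs(x)(y) > 0}
    nu = [x for x in U if mu_Rs(x, y) > 0]
    lows = [lower_es(x) for x in nu]   # inner inf_z min(mu_Rs(x)(z), A(z))
    ups = [upper_es(x) for x in nu]    # inner sup_z min(mu_Rs(x)(z), A(z))
    return min(lows), max(lows), min(ups), max(ups)

for y in U:
    print(y, lower_es(y), A[y], upper_es(y), cs_bounds(y))

The printed ordering of the membership values (lower ≤ µ_A ≤ upper) can be checked directly on the output.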
Since µ_{Rs(x)}(y) = µ_{Rp(y)}(x), as follows from Definition 7, we may express Definition 8 using Rp as follows:

(i) Element-oriented generalization:

µ_{\underline{apr}_{es}(A)}(x) = inf_{y∈U} min(µ_{Rp(y)}(x), µ_A(y)),   (40)
µ_{\overline{apr}_{es}(A)}(x) = sup_{y∈U} min(µ_{Rp(y)}(x), µ_A(y)).   (41)

(ii) Similarity-class-oriented generalization, for y ∈ U:
µ_{\underline{apr}^m_{cs}(A)}(y) = inf_{x∈ψ} inf_{z∈U} min(µ_{Rp(z)}(x), µ_A(z)),   (42)
µ_{\underline{apr}^M_{cs}(A)}(y) = sup_{x∈ψ} inf_{z∈U} min(µ_{Rp(z)}(x), µ_A(z)),   (43)
µ_{\overline{apr}^m_{cs}(A)}(y) = inf_{x∈ψ} sup_{z∈U} min(µ_{Rp(z)}(x), µ_A(z)),   (44)
µ_{\overline{apr}^M_{cs}(A)}(y) = sup_{x∈ψ} sup_{z∈U} min(µ_{Rp(z)}(x), µ_A(z)),   (45)
where, for short, ψ = {x ∈ U | µ_{Rp(y)}(x) > 0}. Obviously, \underline{apr}_{es}(A) and \overline{apr}_{es}(A), as well as \underline{apr}^∗_{cs}(A) and \overline{apr}^∗_{cs}(A), are fuzzy sets, and for all y ∈ U we have

µ_{\underline{apr}_{es}(A)}(y) ≤ µ_A(y) ≤ µ_{\overline{apr}_{es}(A)}(y),
µ_{\underline{apr}^m_{cs}(A)}(y) ≤ µ_{\underline{apr}^M_{cs}(A)}(y) ≤ µ_A(y) ≤ µ_{\overline{apr}^m_{cs}(A)}(y) ≤ µ_{\overline{apr}^M_{cs}(A)}(y).

Moreover, the relationship between the element-oriented and the similarity-class-oriented generalization is represented by

µ_{\underline{apr}_{es}(A)}(y) ≤ µ_{\underline{apr}^M_{cs}(A)}(y),   µ_{\overline{apr}_{es}(A)}(y) ≤ µ_{\overline{apr}^M_{cs}(A)}(y),

while the relationships between µ_{\underline{apr}_{es}(A)}(y) and µ_{\underline{apr}^m_{cs}(A)}(y), and between µ_{\overline{apr}_{es}(A)}(y) and µ_{\overline{apr}^m_{cs}(A)}(y), cannot be determined in general. Also, the pairs (µ_{\underline{apr}_{es}(A)}(y), µ_{\overline{apr}_{es}(A)}(y)), (µ_{\underline{apr}^m_{cs}(A)}(y), µ_{\overline{apr}^m_{cs}(A)}(y)) and (µ_{\underline{apr}^M_{cs}(A)}(y), µ_{\overline{apr}^M_{cs}(A)}(y)) can be regarded as lower and upper membership functions of y in A given on the s-fuzzy similarity classes. Lower and upper membership functions are the bounds of an interval value characterized by an interval-valued fuzzy set, and all pairs of lower and upper membership functions can be combined to represent a fuzzy set of type 2. In this sense, Definition 8 offers a way to obtain an interval-valued fuzzy set from an information system via a generalized fuzzy rough approximation of a fuzzy set. Let A also be regarded as an interval-valued fuzzy set obtained from the ordinary fuzzy set A. For y ∈ U,

µ^{es}_A(y) = [µ_{\underline{apr}_{es}(A)}(y), µ_{\overline{apr}_{es}(A)}(y)],
µ^{cs_m}_A(y) = [µ_{\underline{apr}^m_{cs}(A)}(y), µ_{\overline{apr}^m_{cs}(A)}(y)],   µ^{cs_M}_A(y) = [µ_{\underline{apr}^M_{cs}(A)}(y), µ_{\overline{apr}^M_{cs}(A)}(y)].
A fuzzy set of type 2, represented by a membership function ϒ : U → F([0, 1]), where F([0, 1]) is the fuzzy power set of [0, 1], could be defined based on the element-oriented and the similarity-class-oriented generalizations by

ϒ^{es}_A(y) = { β / µ_{\underline{apr}_{es}(A)}(y), 1 / µ_A(y), β / µ_{\overline{apr}_{es}(A)}(y) },

ϒ^{cs}_A(y) = { β1 / µ_{\underline{apr}^m_{cs}(A)}(y), β2 / µ_{\underline{apr}^M_{cs}(A)}(y), 1 / µ_A(y), β2 / µ_{\overline{apr}^m_{cs}(A)}(y), β1 / µ_{\overline{apr}^M_{cs}(A)}(y) },

where β, β1, β2 ∈ (0, 1], β1 ≤ β2, and each entry grade/value assigns a grade to the corresponding membership value. Both ϒ^{es}_A(y)(µ_A(y)) and ϒ^{cs}_A(y)(µ_A(y)) should be equal to 1, representing the most accurate membership value of y in A, and the grade decreases as the difference between a value and µ_A(y) grows (which is reflected by β1 ≤ β2). As discussed in the previous section, the symmetric relations defined in (19)–(21) can also be used to provide various formulations of the generalized fuzzy rough approximations proposed in this section. The relationship between the generalized fuzzy rough approximations based on the conditional probability relation (regarded as an asymmetric relation) and those based on the symmetric relations would give interesting properties and contributions in relation to fuzzy granularity; we will discuss this issue in future work.
5 CONCLUSIONS

A notion of an asymmetric relation, called the weak similarity relation, was introduced. Two examples of such relations, the conditional probability relation and the fuzzy conditional probability relation, were suggested for constructing and interpreting coverings of the universe. Based on such coverings, rough approximations were generalized in an element-oriented and a similarity-class-oriented way; this generalization can be considered a type of fuzzy rough set. Some symmetric relations derived from the (fuzzy) conditional probability relation were also proposed. Through these symmetric relations, we introduced various formulations of generalized rough approximations and examined their properties in relation to the generalized rough approximations induced by the conditional probability relation. Additionally, a notion of a generalized fuzzy rough set of a given fuzzy set was proposed and discussed as a way to obtain interval-valued fuzzy sets, as well as fuzzy sets of type 2, from an information system.
References
1. Dubois D, Prade H (1990) Rough Fuzzy Sets and Fuzzy Rough Sets. Intern. J. of General Systems 17(2-3): 191-209
2. Intan R, Mukaidono M (2000) Conditional Probability Relations in Fuzzy Relational Database. Proceedings of RSCTC'00, LNAI 2005, Springer-Verlag, 251-260
3. Intan R, Mukaidono M, Yao Y.Y (2001) Generalization of Rough Sets with α-coverings of the Universe Induced by Conditional Probability Relations. Proceedings of the International Workshop on Rough Sets and Granular Computing, 173-176
4. Intan R, Mukaidono M (2002) Generalization of Rough Membership Function based on α-Coverings of the Universe. Proceedings of AFSS'02, LNAI 2275, Springer-Verlag, 129-135
5. Intan R, Mukaidono M (2002) Generalization of Fuzzy Rough Sets by Fuzzy Covering Based on Weak Fuzzy Similarity Relation. Proceedings of Fuzzy Sets and Knowledge Discovery 2002, 344-348
6. Intan R, Mukaidono M (2002) Generalized Fuzzy Rough Sets by Conditional Probability Relations. International Journal of Pattern Recognition and Artificial Intelligence 16(7), World Scientific, 865-881
7. Inuiguchi M, Tanino T (2001) On Rough Sets under Generalized Equivalence Relations. Proceedings of the International Workshop on Rough Sets and Granular Computing, 167-171
8. Komorowski J, Pawlak Z, Polkowski L, Skowron A (1999) Rough Sets: A Tutorial. In: Pal S.K, Skowron A (eds) Rough Fuzzy Hybridization. Springer, 3-98
9. Klir G.J, Yuan B (1995) Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall, New Jersey
10. Pawlak Z (1982) Rough Sets. International Journal of Computer and Information Sciences 11: 341-356
11. Slowinski R, Vanderpooten D (2000) A Generalized Definition of Rough Approximations Based on Similarity. IEEE Transactions on Knowledge and Data Engineering 12(2): 331-336
12. Tversky A (1977) Features of Similarity. Psychological Review 84(4): 327-353
13. Yamaguchi Y, Intan R, Emoto M, Mukaidono M (2003) Generalization of Rough Sets Using Active and Passive Relations. Proceedings of InTech 2003, 539-544
14. Yao Y.Y, Zhang J.P (2000) Interpreting Fuzzy Membership Functions in the Theory of Rough Sets. Proceedings of RSCTC'00, LNAI 2005, Springer-Verlag, 82-89
15. Yao Y.Y (1996) Two Views of the Theory of Rough Sets in Finite Universes. International Journal of Approximate Reasoning 15: 291-317
16. Yao Y.Y (1998) A Comparative Study of Fuzzy Sets and Rough Sets. International Journal of Information Science 109: 227-242
17. Zadeh L.A (1970) Similarity Relations and Fuzzy Orderings. Information Sciences 3(2): 177-200
CHAPTER 7

A new approach for the fuzzy shortest path problem

Tzung-Nan Chuang (1) and Jung-Yuan Kung (2)
(1) Department of Merchant Marine, National Taiwan Ocean University
(2) Department of Information Management, Chinese Naval Academy
Abstract: Many researchers have paid attention to the fuzzy shortest path problem because it is central to many applications. In this problem, the fuzzy shortest length and the corresponding shortest path are useful information for decision makers. In this paper, we propose a new approach that obtains both pieces of information. First, we propose a heuristic procedure to find the fuzzy shortest length among all possible paths in a network. It is based on the idea that a crisp number is the minimum if and only if any other number is larger than or equal to it; it has a firm theoretical basis in fuzzy set theory and can be implemented efficiently. Secondly, we propose a way to measure the similarity degree between the fuzzy shortest length and each fuzzy path length; the path with the highest similarity degree is taken as the shortest path. An illustrative example is given to demonstrate the proposed approach.
1. INTRODUCTION

In a network, an arc length may represent time or cost. The shortest path problem has received much attention from researchers in past decades because it is important to many areas such as communication, transportation, scheduling, and routing. In practical situations it is often reasonable to assume that each arc length is a fuzzy set. Several methods for the fuzzy shortest path problem have been reported in the open literature [1-10]. Dubois and Prade [1] first treated the fuzzy shortest path problem. In their method, the shortest length can be found, but there may not exist a corresponding actual path in the network. To overcome this shortcoming, Klein [6] proposed a fuzzy algorithm based on dynamic programming recursion, in which each arc length is an integer between 1 and a fixed number; such a constraint, however, seems impractical in real applications. On the other hand, Yager [4] introduced the idea of a possibilistic production system, composed of states and of the possibilistic production rules necessary to traverse between states, to investigate a path problem. The concept of possibility was used for traversing between states, similar to the notion of uncertainty introduced in [15]. He developed a general algorithm to find a path with maximal possibility from an initial state (or node) to a goal state (or node), called a path of least resistance. In [10], Okada and Soper proposed a fuzzy algorithm based on the multiple labeling method [2,3] for solving a multi-criteria shortest path problem, offering non-dominated paths to a decision maker. In order to reduce the number of non-dominated paths, they [10] also used the concept of the possibility level h (the degree of optimism of a decision maker) for the order relation introduced by [13], which is easily
implemented for computer computation, as shown by [14]. Nevertheless, the fuzzy path length that corresponds to the shortest path obtained via such an algorithm is not necessarily the actual shortest length. In this paper, to avoid the problems stated above, we propose an approach that obtains the fuzzy shortest length and the corresponding shortest path. The approach is composed of two parts. First, we propose a heuristic procedure to find the fuzzy shortest length among all possible paths in a network. This procedure yields a permissible solution and is meaningful, rational, computationally efficient, and general in nature. It is based on the idea that a crisp number is the minimum if and only if any other number is larger than or equal to it, has a firm theoretical basis in fuzzy set theory, and can be implemented effectively. Secondly, we propose a way to measure the similarity degree between the fuzzy shortest length and each fuzzy path length. The path with the highest similarity degree is taken as the shortest path.
2. FUZZY SHORTEST LENGTH HEURISTIC PROCEDURE

In this section, we design a procedure to determine the fuzzy shortest path length. If the path lengths are crisp, lmin(prv) is defined as the minimum path length among all given path lengths l(prv); that is, a number l(prv) is the minimum lmin(prv) if and only if any other number is greater than or equal to it. This idea is generalized to the fuzzy shortest path length Lmin(prv). Assume there are m paths p^i_rv from node r to v in a network, i = 1, 2, …, m, and L(p^i_rv) is the i-th fuzzy path length. We want to determine Lmin(prv). In the following derivation, L(p^i_rv) and Lmin(prv) are replaced by Li and Lmin, respectively, for ease of description.

Let µL1(x) denote the membership function of the possible time x with respect to L1. As noted for crisp values, x is the minimum if any other number is greater than or equal to x. In other words, if x with respect to L1 is the shortest time, then x exists and no other time with respect to Lk (k ≠ 1) smaller than x exists. This idea is extended to determine Lmin. We define SL1 as the fuzzy set expressing that the possible time x with respect to L1 is the shortest one, and let µSL1(x) be the membership function of x with respect to SL1. Therefore,

µSL1(x) = µL1(x) ∧ µL'2(x) ∧ µL'3(x) ∧ … ∧ µL'm(x) = min[µL1(x), µL'2(x), …, µL'm(x)] = min[µL1(x), min_{k≠1} µL'k(x)].   (1)

Formula (1) can also be expressed as

SL1 = L1 ∩ L'2 ∩ … ∩ L'm = L1 ∩ (∩_{k≠1} L'k),   (2)

where L'k denotes the fuzzy set expressing that no other time y with respect to Lk (k ≠ 1) smaller than x exists, µL'k(x) is the membership function of x with respect to L'k, and the minimum operator is used for the intersection of two fuzzy sets. Further, µL'k(x) can be determined as

µL'k(x) = 1 − max_{y ≤ x, y ∈ Lk} µLk(y).   (3)

Assume the triangular membership function of Lk is represented as (ak, bk, ck). The following two cases may occur when calculating µL'k(x):
Case (a): x is to the right of bk. In this case, µL'k(x) = 1 − max_{y ≤ x} µLk(y) = 1 − µLk(bk) = 1 − 1.0 = 0.0.
Case (b): x is to the left of bk. In this case, µL'k(x) = 1 − max_{y ≤ x} µLk(y) = 1 − µLk(x).

Therefore, for a triangular membership function Lk = (ak, bk, ck), the set L'k, called a half-inverse fuzzy set, is given by

µL'k(x) = 0,            for x > bk,
µL'k(x) = 1 − µLk(x),   for ak ≤ x ≤ bk,
µL'k(x) = 1,            for x < ak.   (4)
Substituting (4) into (1) yields µSL1(x). Similarly, we can get the membership functions µSLk(x). Let µLmin(x) be the membership function of x with respect to Lmin. It can be determined as

µLmin(x) = µSL1(x) ∨ µSL2(x) ∨ … ∨ µSLm(x) = max_{t=1..m} µSLt(x) = max_{t=1..m} [min[µLt(x), min_{k≠t} µL'k(x)]].   (5)
According to the above derivation, the fuzzy shortest path length procedure can be performed as follows.
Fuzzy shortest path length procedure
Input: m triangular fuzzy sets L1, L2, …, Lm.
Output: The fuzzy shortest path length Lmin.
Step 1: For each fuzzy set Lk, k = 1 to m, find its half-inverse membership function µL'k.
Step 2: For each fuzzy set Lt, t = 1 to m, find µSLt(x) = min[µLt(x), min_{k≠t} µL'k(x)].
Step 3: Set µLmin(x) = max_{t=1..m} µSLt(x).
Step 4: Normalize Lmin; Lmin is then output as the fuzzy shortest path length.

Although the fuzzy shortest path length procedure stated above can be used to find the fuzzy shortest path length, the computation associated with it is somewhat involved. We hope to reduce the computational complexity and the complexity of the resulting membership function. First, we discuss the fuzzy shortest path length of two fuzzy path lengths. We simply want to obtain three points that describe the intermediate result, with the point b carrying the highest membership grade. Take two fuzzy path lengths L1 and L2 for an example; a grid-based numerical sketch of the procedure is given below.
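The following fragment (added here for illustration, with hypothetical lengths L1 = (10, 20, 30) and L2 = (15, 25, 40)) samples the membership functions on a grid and applies Steps 1–4; its modal value, 17.5, coincides with the b obtained from the formula derived later in this section.

# Grid-based sketch of the fuzzy shortest path length procedure (Steps 1-4)
# for two triangular path lengths; illustration only.
def tri(a, b, c):
    """Triangular membership function (a, b, c)."""
    def mu(x):
        if a < x <= b:
            return (x - a) / (b - a)
        if b < x < c:
            return (c - x) / (c - b)
        return 1.0 if x == b else 0.0
    return mu

def half_inverse(a, b, mu):
    """Formula (4): 0 for x > b, 1 - mu(x) for a <= x <= b, 1 for x < a."""
    def mu_inv(x):
        if x > b:
            return 0.0
        if x < a:
            return 1.0
        return 1.0 - mu(x)
    return mu_inv

L = [(10.0, 20.0, 30.0), (15.0, 25.0, 40.0)]          # hypothetical lengths
mus = [tri(*p) for p in L]
invs = [half_inverse(p[0], p[1], m) for p, m in zip(L, mus)]
xs = [x / 10.0 for x in range(0, 501)]                # sample grid

def mu_SL(t, x):                                      # Step 2
    return min(mus[t](x), min(invs[k](x) for k in range(len(L)) if k != t))

mu_Lmin = [max(mu_SL(t, x) for t in range(len(L))) for x in xs]   # Step 3
peak = max(mu_Lmin)
mu_Lmin = [v / peak for v in mu_Lmin]                 # Step 4: normalize
print("modal value of Lmin ~", xs[mu_Lmin.index(1.0)])   # -> 17.5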
There are 24 possible combinations of shapes for L1 and L2. According to the fuzzy shortest path length procedure, we can find the fuzzy shortest path length for each case.
The fuzzy shortest path length for the other cases can be derived similarly. Summarizing the 24 cases, we can derive a formula for the fuzzy shortest path length as follows.

The fuzzy shortest path length formula for two fuzzy path lengths

Let two fuzzy path lengths be L1 = (a1, b1, c1) and L2 = (a2, b2, c2). Then the fuzzy shortest path length Lmin = (a, b, c) can be determined by

b = min(b1, b2), if min(b1, b2) ≤ max(a1, a2);
b = ((b1 × b2) − (a1 × a2)) / ((b1 + b2) − (a1 + a2)), if min(b1, b2) > max(a1, a2);
a = min(a1, a2);
c = min[min(c1, c2), max(b1, b2)].
If there are more than two fuzzy path lengths, we can design a procedure based on the above formula. First, we form the set Q by sorting the Li in ascending order of bi. Then we find the fuzzy shortest path length of the first and second fuzzy path lengths in Q using the above formula. Next, we find the fuzzy shortest path length of the previously obtained fuzzy shortest path length and the third fuzzy path length in Q using the same formula. Repeating this step, the final fuzzy shortest path length is found. According to the above derivation, the fuzzy shortest length heuristic procedure can be performed as follows.

Fuzzy shortest length heuristic procedure
Input: Li = (ai, bi, ci), i = 1, 2, 3, …, m, where Li denotes a triangular fuzzy length.
Output: Lmin = (a, b, c), where Lmin denotes the fuzzy shortest length.
Step 1: Form the set Q by sorting the Li in ascending order of bi; Q = {Q1, Q2, Q3, …, Qm}, where Qi = (a'i, b'i, c'i), i = 1, 2, …, m.
Step 2: Set Lmin = (a, b, c) = Q1 = (a'1, b'1, c'1).
Step 3: Let i = 2.
Step 4: Calculate
  b = b, if b ≤ a'i;
  b = ((b × b'i) − (a × a'i)) / ((b + b'i) − (a + a'i)), if b > a'i;
  a = min(a, a'i); c = min(c, b'i).
Step 5: Set Lmin = (a, b, c).
Step 6: Set i = i + 1.
Step 7: Repeat Step 4 to Step 6 until i = m + 1.
Remark: In Step 1, if more than two Li share the same bi, the corresponding sorting order for these Li can be chosen arbitrarily; it does not affect the result of the procedure. It can easily be seen that the heuristic procedure is very simple and needs only a light computational load.

Example 1: Suppose there are five fuzzy path lengths L1 = (187, 248, 271), L2 = (196, 253, 282), L3 = (177, 195, 256), L4 = (159, 234, 249), L5 = (160, 222, 235) (these are the path lengths of the network considered in Example 2 below). The procedure is executed as follows:
Step 1: Q1 = (177, 195, 256) = L3, Q2 = (160, 222, 235) = L5, Q3 = (159, 234, 249) = L4, Q4 = (187, 248, 271) = L1, Q5 = (196, 253, 282) = L2.
Step 2: Set Lmin = (a, b, c) = Q1 = (a'1, b'1, c'1) = (177, 195, 256).
Step 3: Let i = 2.
Step 4: Calculate (a, b, c): b = 195 > a'2 = 160, so b = ((195 × 222) − (177 × 160)) / ((195 + 222) − (177 + 160)) = 187.125; a = min(177, 160) = 160; c = min(256, 222) = 222.
Step 5: Set Lmin = (160, 187.125, 222).
Step 6: Set i = i + 1 = 3.
Step 7: Repeat Step 4 to Step 6 until i = 6.
After the procedure is executed, the shortest length is (159, 179.65, 222).
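The heuristic is easy to implement; the following sketch (added here for illustration) reproduces the computation of Example 1.

# Fuzzy shortest length heuristic procedure (Steps 1-7) for triangular lengths.
def fuzzy_shortest_length(lengths):
    Q = sorted(lengths, key=lambda t: t[1])        # Step 1: sort by b_i
    a, b, c = Q[0]                                 # Step 2
    for a_i, b_i, c_i in Q[1:]:                    # Steps 3-7
        if b > a_i:                                # Step 4
            b = (b * b_i - a * a_i) / ((b + b_i) - (a + a_i))
        a = min(a, a_i)
        c = min(c, b_i)
    return a, b, c

paths = [(187, 248, 271), (196, 253, 282), (177, 195, 256),
         (159, 234, 249), (160, 222, 235)]
print(fuzzy_shortest_length(paths))                # -> (159, 179.65..., 222)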
3. A NEW APPROACH FOR THE FUZZY SHORTEST PATH PROBLEM

Several approaches have been presented to treat the fuzzy shortest path problem and to provide decision makers with the shortest path [1-10]. In this section, we propose a new approach, which consists of the fuzzy shortest length heuristic procedure and a way of determining the similarity degree between two fuzzy lengths, to find the fuzzy shortest length and offer a corresponding shortest path to decision makers. The fuzzy shortest length heuristic procedure has already been presented in Section 2 and illustrated in Example 1. Now we show a way to measure the similarity degree between two fuzzy lengths. Based on the idea that the larger the intersection area of two triangles, the higher the similarity degree between them, we use the intersection area of two triangular fuzzy sets to measure their similarity degree. Let the i-th fuzzy path length be Li = (ai, bi, ci) and the fuzzy shortest length be Lmin = (a, b, c). Then the similarity degree si between Li and Lmin can be calculated as

si = area(Li ∩ Lmin) = 0, if Li ∩ Lmin = ∅;
si = (c − ai)² / (2[(c − b) + (bi − ai)]), if Li ∩ Lmin ≠ ∅.   (1)

It should be noted that a, b, and c are always smaller than or equal to ai, bi, and ci, respectively. Integrating the fuzzy shortest length heuristic procedure and the way of finding the similarity degree between two triangular fuzzy sets, we propose the following fuzzy shortest path approach:
An approach for the fuzzy shortest path problem
Input: A directed network of n nodes with fuzzy arc lengths for traversing from one node to another.
Output: The fuzzy shortest length and the corresponding shortest path.
Step 1: Form the possible paths from the source node to the destination node and compute the corresponding path lengths Li = (ai, bi, ci), i = 1, 2, ..., m, for the m possible paths.
Step 2: Find Lmin = (a, b, c) using the fuzzy shortest length heuristic procedure.
Step 3: Find the similarity degree si between Lmin and Li, i = 1, 2, ..., m.
Step 4: Determine the actual shortest path as the one with the highest similarity degree si.

Some remarks concerning the above approach are worth mentioning:
(R1): For an easier demonstration of the proposed approach, in Step 1 we form the possible paths directly instead of utilizing the multiple labeling method [2,3,12], which can also be applied to our approach and is capable of reducing the computational load.
(R2): In Step 3, one can also use similarity measures [13-14] to measure the similarity degree between fuzzy sets if one does not choose our proposed measure.

Next, an example is provided to illustrate the proposed approach.

Example 2: A classical network with fuzzy arc lengths is shown in Figure 1.
Fig. 1. Classical network (nodes 1–6 with the triangular fuzzy arc lengths listed below).

Let drn denote the arc length from node r to node n. In Fig. 1, suppose that d12 = (33, 45, 50), d13 = (42, 57, 61), d23 = (36, 38, 47), d24 = (56, 58, 72), d25 = (51, 79, 85), d35 = (43, 55, 60), d45 = (32, 40, 46), d46 = (88, 92, 134), and d56 = (75, 110, 114). Then the possible paths are 1-2-3-5-6 with length L1 = (187, 248, 271), 1-2-4-5-6 with L2 = (196, 253, 282), 1-2-4-6 with L3 = (177, 195, 256), 1-2-5-6 with L4 = (159, 234, 249), and 1-3-5-6 with L5 = (160, 222, 235), respectively. In Step 2, we obtain the shortest length Lmin = (159, 179.65, 222) through the proposed fuzzy shortest length heuristic procedure, as illustrated in Example 1. Next, substituting Lmin and each Li into (1), we get the similarity degrees s1 ≈ 5.93, s2 ≈ 3.40, s3 ≈ 16.78, s4 ≈ 16.91, and s5 ≈ 18.42. Finally, in Step 4, we choose 1-3-5-6 as the shortest path, since the corresponding length L5 (= Q2 in Example 1) has the highest similarity degree (≈ 18.42) to the fuzzy shortest length Lmin.
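A sketch of Step 3 is given below (added here for illustration); the path lengths and Lmin follow Example 2, and the similarity degrees are recomputed from formula (1).

# Similarity degree (1) between each fuzzy path length and Lmin.
def similarity(Li, Lmin):
    (ai, bi, ci), (a, b, c) = Li, Lmin
    if ai >= c:                    # the two triangles do not intersect
        return 0.0
    return (c - ai) ** 2 / (2 * ((c - b) + (bi - ai)))

Lmin = (159, 179.65, 222)
paths = {"1-2-3-5-6": (187, 248, 271), "1-2-4-5-6": (196, 253, 282),
         "1-2-4-6":   (177, 195, 256), "1-2-5-6":   (159, 234, 249),
         "1-3-5-6":   (160, 222, 235)}
scores = {p: similarity(L, Lmin) for p, L in paths.items()}
print(max(scores, key=scores.get), scores)   # 1-3-5-6 has the highest degree (~18.4)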
4. CONCLUSIONS

The fuzzy shortest length and the corresponding shortest path are useful information for decision makers in a fuzzy shortest path problem. In this paper, we proposed the fuzzy shortest length heuristic procedure, which finds the fuzzy shortest length among all possible paths in a network. It is based on the idea that a crisp number is the minimum if and only if any other number is larger than or equal to it. In addition, we proposed a way to decide the shortest path that corresponds to the fuzzy shortest length, based on the calculation of the similarity degree between the fuzzy shortest length and each fuzzy path length. The approach that combines the above methods provides the fuzzy shortest length and the corresponding shortest path for decision makers. Illustrative examples are included to demonstrate the proposed methods.
5. REFERENCES
1. Dubois D, Prade H (1980) Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York
2. Hansen P (1980) Bicriterion path problems. In: Beckmann M, Kunzi HP (eds) Multiple Criteria Decision Making: Theory and Applications, Lecture Notes in Economics and Mathematical Systems, vol. 177, Springer, Berlin, pp. 109-127
3. Martins EQV (1984) On a multicriteria shortest path problem. Eur. J. Oper. Res. 16: 236-245
4. Yager RR (1986) Paths of least resistance on possibilistic production systems. Fuzzy Sets and Systems 19: 121-132
5. Brumbaugh-Smith J, Shier D (1989) An empirical investigation of some bicriterion shortest path algorithms. Eur. J. Oper. Res. 43: 216-224
6. Klein CM (1991) Fuzzy shortest paths. Fuzzy Sets and Systems 39: 27-41
7. Lin K, Chen M (1994) The fuzzy shortest path problem and its most vital arcs. Fuzzy Sets and Systems 58: 343-353
8. Okada S, Gen M (1994) Fuzzy shortest path problem. Comput. Indust. Eng. 27: 465-468
9. Henig MI (1994) Efficient interactive methods for a class of multiattribute shortest path problems. Management Science 40(7): 891-897
10. Okada S, Soper T (2000) A shortest path problem on a network with fuzzy arc lengths. Fuzzy Sets and Systems 109: 129-140
11. Hyung LK, Song YS, Lee KM (1994) Similarity measure between fuzzy sets and between elements. Fuzzy Sets and Systems 62: 291-293
12. Wang WJ (1997) New similarity measures on fuzzy sets and on elements. Fuzzy Sets and Systems 85: 305-309
13. Tanaka H, Ichihashi H, Asai K (1984) A formulation of fuzzy linear programming problem based on comparison of fuzzy numbers. Control and Cybernetics 13: 185-194
14. Ramik J, Rimanek J (1985) Inequality relation between fuzzy numbers and its use in fuzzy optimization. Fuzzy Sets and Systems 16: 123-138
15. Zadeh LA (1978) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 1: 3-28
Appendix: 24 possible combinations of shapes for L1 and L2

[The 24 figure panels of the original appendix are not reproducible here. Each panel (1)–(24) plots the triangular membership functions of L1 = (a1, b1, c1) and L2 = (a2, b2, c2) for one relative ordering of their parameters and states the resulting fuzzy shortest path length Lmin = (a, b, c), e.g. Lmin = (a2, b2, c2), (a2, f, c2), (a1, f, b1), …, (a1, b1, c1), where f denotes the abscissa at which the relevant edges of the two membership functions cross, i.e. the value of b given by the formula of Section 2.]
CHAPTER 8 Distances Between Intuitionistic Fuzzy Sets and their Applications in Reasoning Eulalia Szmidt and Janusz Kacprzyk Systems Research Institute, Polish Academy of Sciences ul. Newelska 6, 01–447 Warsaw, Poland E-mail: {szmidt, kacprzyk}@ibspan.waw.pl Abstract: In this article we propose the use of intuitionistic fuzzy sets (Atanassov [2]) as a tool for reasoning under imperfect facts and imprecise knowledge, particularly via distances between intuitionistic fuzzy sets. We consider two issues: (1) a method to evaluate a degree (extent) of agreement (meant here as a distance from consensus) in a group of experts (individuals), and (2) a new approach to supporting medical diagnosis based on a reasoning scheme using a distance between intuitionistic fuzzy sets. Keywords: intuitionistic fuzzy sets, distances between intuitionistic fuzzy sets, medical databases, medical diagnosis
1 Introduction

Intuitionistic fuzzy sets (Atanassov [1], [2]), because of an additional degree of freedom in comparison with fuzzy sets (Zadeh [22]), can be viewed as their generalization. The additional degree of freedom makes it possible to better model imperfect information, which is omnipresent in reality, notably in most reasoning schemes. First, we propose a method to evaluate a degree (extent) of agreement (meant as a degree of consensus) in a group of experts (individuals) when the individual testimonies are intuitionistic fuzzy preference relations, as opposed to the traditional fuzzy preference relations commonly employed. For a comprehensive review of group decision making and (soft) measures of consensus under fuzzy preferences and fuzzy majority, see Kacprzyk and Nurmi [5]. Basically, in those works, the point of departure is a set of individual fuzzy preference relations which associate with each pair of options a number from [0, 1]. So, we have a set of n options, S = {s1, . . . , sn}, and a set of m individuals, I = {1, . . . , m}. Each individual k provides his or her individual fuzzy preference relation µ_{Rk}: S × S → [0, 1], conveniently represented by a matrix [r^k_{ij}] such that r^k_{ij} = µ_{Rk}(si, sj); i, j = 1, . . . , n; k = 1, . . . , m; r^k_{ij} + r^k_{ji} = 1. The elements
of the matrix, 0 < r^k_{ij} < 1, are such that the higher the preference of individual k for si over sj, the higher r^k_{ij}: from r^k_{ij} = 0 indicating a definite preference for sj over si, through r^k_{ij} = 0.5 indicating indifference between si and sj, to r^k_{ij} = 1 indicating a definite preference for si over sj. In Szmidt [8], Szmidt and Kacprzyk [11, 13, 14, 17], and Kacprzyk and Szmidt [16], the use of intuitionistic fuzzy sets for formulating and solving group decision problems, and for the determination of soft measures of consensus, was considered. The starting point of this approach is the use of intuitionistic fuzzy preferences instead of fuzzy preferences. In effect, each individual k provides his or her (individual) intuitionistic fuzzy preference relation, giving not only Rk (i.e. µ_{Rk}) but also Πk – the so-called hesitation margins π_k: S × S → [0, 1], conveniently represented by a matrix [π^k_{ij}]; i, j = 1, . . . , n; k = 1, . . . , m. Such a representation of individual preferences (taking into account another degree of freedom, i.e. the hesitation margins) makes it possible to express the hesitation of individuals, and leads to finding a degree of soft consensus – in this article via distances between the preferences of individuals. Second, a medical database is considered. Employing intuitionistic fuzzy sets, we can simply and adequately express hesitation concerning the objects considered – both patients and illnesses – and render two important facts. First, the values of symptoms change for each patient as, e.g., temperature goes up and down, pain increases and decreases, etc. Second, for different patients suffering from the same illness, values of the same symptom can often be different. Our task, i.e. to find an illness for each patient, is fulfilled by looking for the smallest distance [cf. Szmidt and Kacprzyk [12, 15]] between the symptoms that are characteristic for a patient and the symptoms describing the illnesses considered.
2 Brief introduction to intuitionistic fuzzy sets

As opposed to a fuzzy set in X (Zadeh [22]), given by

A′ = {< x, µ_{A′}(x) > | x ∈ X},   (1)

where µ_{A′}(x) ∈ [0, 1] is the membership function of the fuzzy set A′, an intuitionistic fuzzy set (Atanassov [1], [2]) A is given by

A = {< x, µA(x), νA(x) > | x ∈ X},   (2)

where µA: X → [0, 1] and νA: X → [0, 1] are such that

0 ≤ µA(x) + νA(x) ≤ 1,   (3)

and µA(x), νA(x) ∈ [0, 1] denote a degree of membership and a degree of non-membership of x ∈ A, respectively. Obviously, each fuzzy set may be represented by the following intuitionistic fuzzy set:

A = {< x, µA(x), 1 − µA(x) > | x ∈ X}.   (4)

For each intuitionistic fuzzy set in X, we will call

πA(x) = 1 − µA(x) − νA(x)   (5)
an intuitionistic fuzzy index (or a hesitation margin) of x ∈ A; it expresses a lack of knowledge of whether x belongs to A or not (cf. Atanassov [2]). It is obvious that 0 ≤ πA(x) ≤ 1 for each x ∈ X. Here is the definition of the basic relations and operations on intuitionistic fuzzy sets:

Definition 1. (Atanassov, 1999) For every two intuitionistic fuzzy sets A and B the following relations and operations can be defined ("iff" means "if and only if"):

A ⊂ B iff (∀x ∈ E)(µA(x) ≤ µB(x) & νA(x) ≥ νB(x)),   (6)
A ⊃ B iff B ⊂ A,   (7)
A = B iff (∀x ∈ E)(µA(x) = µB(x) & νA(x) = νB(x)),   (8)
A^C = {< x, νA(x), µA(x) > | x ∈ E},   (9)
A ∩ B = {< x, min(µA(x), µB(x)), max(νA(x), νB(x)) > | x ∈ E},   (10)
A ∪ B = {< x, max(µA(x), µB(x)), min(νA(x), νB(x)) > | x ∈ E},   (11)
A + B = {< x, µA(x) + µB(x) − µA(x)·µB(x), νA(x)·νB(x) > | x ∈ E},   (12)
A·B = {< x, µA(x)·µB(x), νA(x) + νB(x) − νA(x)·νB(x) > | x ∈ E}.   (13)

The relations (6), (7) and (8) defined above are analogous to the relations of inclusion and equality in ordinary fuzzy set theory. Here it is also true that, for every two intuitionistic fuzzy sets A and B,

A ⊂ B and B ⊂ A iff A = B.   (14)
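Definition 1 translates directly into code; a sketch added here for illustration, storing an intuitionistic fuzzy set as a dict x -> (mu, nu) with hypothetical values.

# Basic operations of Definition 1 on intuitionistic fuzzy sets.
def hesitation(S):                                               # (5)
    return {x: 1 - mu - nu for x, (mu, nu) in S.items()}

def complement(S):                                               # (9)
    return {x: (nu, mu) for x, (mu, nu) in S.items()}

def intersect(A, B):                                             # (10)
    return {x: (min(A[x][0], B[x][0]), max(A[x][1], B[x][1])) for x in A}

def union(A, B):                                                 # (11)
    return {x: (max(A[x][0], B[x][0]), min(A[x][1], B[x][1])) for x in A}

A = {"x1": (0.6, 0.2), "x2": (0.1, 0.7)}     # hypothetical sets
B = {"x1": (0.5, 0.4), "x2": (0.3, 0.3)}
print(intersect(A, B), union(A, B), hesitation(A))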
Applications of intuitionistic fuzzy sets to group decision making, negotiations and other situations are presented in Szmidt [8], Szmidt and Kacprzyk [10], [11], [13], [16], [20], [21].

2.1 Distances between intuitionistic fuzzy sets

Our point of departure for calculating distances is a geometrical interpretation of intuitionistic fuzzy sets shown in Figure 1 (cf. Szmidt [8], Szmidt and Baldwin [9], Szmidt and Kacprzyk [12], [15]), to be understood as follows. As the membership function, non-membership function, and hesitation margin take values from the interval [0, 1],
Fig. 1. A geometrical interpretation of intuitionistic fuzzy sets: the triangle with vertices A(1, 0, 0), B(0, 1, 0) and D(0, 0, 1) inside the unit cube with origin C(0, 0, 0); an element x′ of an intuitionistic fuzzy set is represented by a point of the triangle.
we can imagine a unit cube with three edges given by these parameters. Because of the condition µ(x) + ν(x) + π(x) = 1, the values of the parameters characterizing an intuitionistic fuzzy set can belong to the triangle ABD only. In other words, the triangle ABD represents the surface on which the coordinates of any element belonging to an intuitionistic fuzzy set can be represented. Points A and B represent crisp elements: point A(1, 0, 0) represents elements fully belonging to an intuitionistic fuzzy set, and point B(0, 1, 0) represents elements fully not belonging to it. Point D(0, 0, 1) represents elements about which we are not able to say whether they belong to the set or not (the intuitionistic fuzzy index, or hesitation, is π(x) = 1). Segment AB (where µ(x) + ν(x) = 1, i.e. π(x) = 0) represents elements belonging to classical fuzzy sets. Remark: in the above geometrical interpretation we used the letters A and B for points representing crisp elements of an intuitionistic fuzzy set; the same letters are used in the article for any two intuitionistic fuzzy sets, but this should not lead to confusion, as the exact meaning of the symbols is given each time. Employing the above geometrical representation, we can calculate distances between any two intuitionistic fuzzy sets A and B containing n elements (see Szmidt [8], Szmidt and Kacprzyk [15]) as, e.g.:
• the normalized Hamming distance:
lIFS(A, B) = (1/(2n)) Σ_{i=1}^{n} (|µA(xi) − µB(xi)| + |νA(xi) − νB(xi)| + |πA(xi) − πB(xi)|).   (15)
• the normalized Euclidean distance:

eIFS(A, B) = ( (1/(2n)) Σ_{i=1}^{n} ((µA(xi) − µB(xi))² + (νA(xi) − νB(xi))² + (πA(xi) − πB(xi))²) )^{1/2}.   (16)

For (15) and (16) there holds, respectively,

0 ≤ lIFS(A, B) ≤ 1,   (17)
0 ≤ eIFS(A, B) ≤ 1.   (18)
In Szmidt [8] and Szmidt and Kacprzyk [15] it is shown why, when calculating distances between intuitionistic fuzzy sets, it is expedient to take into account all three parameters (degrees of membership, non-membership and hesitation), although distances based on two parameters only (degrees of membership and non-membership) fulfil all the conditions necessary to be proper distances. The main reason is that when only two parameters are taken into account, distances for elements of classical fuzzy sets (which are a special case of intuitionistic fuzzy sets) come from a different interval than distances for elements of intuitionistic fuzzy sets, which practically makes it impossible to treat the two types of sets by the same formula. In the case of the Hamming distance this can be overcome, as the difference is removed by multiplying the results for fuzzy sets (or for intuitionistic fuzzy sets) by a proper constant. Theoretically this is simple, but for large data sets it would be troublesome: one would have to verify each time the type of data (fuzzy or intuitionistic fuzzy), multiply by the constant (or not), and only then compare the obtained results. We can illustrate the problem for the normalized Hamming distance in the following way. As we can see (Figure 1), each side of the considered triangle has the same length, i.e. AB = BD = AD. But when using two parameters only to calculate distances between two intuitionistic fuzzy sets A and B, i.e., using the general formula

lIFS(A, B) = (1/(2n)) Σ_{i=1}^{n} (|µA(xi) − µB(xi)| + |νA(xi) − νB(xi)|),   (19)

we obtain the following distances between the elements whose parameters are represented by the points (Figure 1) A(1, 0, 0), B(0, 1, 0) and D(0, 0, 1):

lIFS(A, B) = ½ (|1 − 0| + |0 − 1|) = 1,   (20)
lIFS(A, D) = ½ (|1 − 0| + |0 − 0|) = ½,   (21)
lIFS(B, D) = ½ (|0 − 0| + |1 − 0|) = ½,   (22)

so that

lIFS(A, D) = lIFS(B, D) ≠ lIFS(A, B).   (23)

Only when using formula (15), with all three parameters, do we obtain

lIFS(A, B) = ½ (|1 − 0| + |0 − 1| + |0 − 0|) = 1,   (24)
lIFS(A, D) = ½ (|1 − 0| + |0 − 0| + |0 − 1|) = 1,   (25)
lIFS(B, D) = ½ (|0 − 0| + |1 − 0| + |0 − 1|) = 1,   (26)

which means that the condition lIFS(A, D) = lIFS(A, B) = lIFS(B, D) is fulfilled. As shown above, even for linear distances (the Hamming distance) it is more convenient to use all three parameters when calculating distances for both types of data – fuzzy sets and intuitionistic fuzzy sets. In the case of nonlinear distances (e.g., the Euclidean distance) the situation is even more complicated: there is no simple way to obtain distances (using two parameters only) that belong, for fuzzy sets and for intuitionistic fuzzy sets, to the same interval so that they can be compared. The only way out is to use the formulas with all three parameters, which makes it possible to obtain results belonging to the same interval for both fuzzy sets and intuitionistic fuzzy sets. For more details we refer the interested reader to Szmidt [8], Szmidt and Kacprzyk [15]. In our further considerations we will use both the normalized Hamming distance and the normalized Euclidean distance between intuitionistic fuzzy sets.
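Formulas (15) and (16) are straightforward to implement; a sketch added here for illustration, representing each intuitionistic fuzzy set as a list of (µ, ν, π) triples, reproduces the vertex distances (24)–(26).

# Normalized Hamming (15) and Euclidean (16) distances between
# intuitionistic fuzzy sets given as lists of (mu, nu, pi) triples.
from math import sqrt

def hamming(A, B):
    n = len(A)
    return sum(abs(a - b) for x, y in zip(A, B) for a, b in zip(x, y)) / (2 * n)

def euclid(A, B):
    n = len(A)
    return sqrt(sum((a - b) ** 2 for x, y in zip(A, B) for a, b in zip(x, y)) / (2 * n))

# Single-element sets located at the vertices of the triangle in Fig. 1:
A, B, D = [(1, 0, 0)], [(0, 1, 0)], [(0, 0, 1)]
print(hamming(A, B), hamming(A, D), hamming(B, D))   # 1.0 1.0 1.0, cf. (24)-(26)
print(euclid(A, B), euclid(A, D), euclid(B, D))      # 1.0 1.0 1.0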
3 Analysis of agreement in a group of experts expressed as distances between intuitionistic fuzzy preferences

We will use the concept of distance between intuitionistic fuzzy sets to analyse the extent of agreement between experts, i.e. to say whether all considered pairs of experts' preferences are:
• just the same (i.e. full agreement, meaning consensus in the traditional sense – the distance between preferences is equal to 0),
• quite opposite (i.e. full disagreement, or dissensus – the distance between preferences is equal to 1),
• different to some extent (which means that the distance from consensus lies in the interval (0, 1)).
Preferences given by each individual are expressed via intuitionistic fuzzy sets (describing intuitionistic fuzzy preferences). Since the calculation of distances between intuitionistic fuzzy sets involves all three parameters, we start from a set of data consisting of three types of matrices describing individual preferences. The first type of matrices is the same as for classical fuzzy sets, i.e. the [µ^k_{ij}] given by each individual k for each pair of options (si, sj) (for simplicity denoted ij). Additionally, it is necessary to take into account the hesitation margins [π^k_{ij}] and the non-membership values [ν^k_{ij}]. In general, the extent of agreement of two experts k1, k2 considering n options is given, in the case of

• the normalized Hamming distance, as

l^{k1,k2}_{IFS} = (1/A) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} l_{i,j},
l_{i,j} = ½ (|µ_{ij}(k1) − µ_{ij}(k2)| + |ν_{ij}(k1) − ν_{ij}(k2)| + |π_{ij}(k1) − π_{ij}(k2)|),   (27)

where A is the number of pairs of options,

A = C²_n = n(n − 1)/2;   (28)

• the normalized Euclidean distance, as

e^{k1,k2}_{IFS} = (1/A) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} e_{i,j},
e_{i,j} = (½ [(µ_{ij}(k1) − µ_{ij}(k2))² + (ν_{ij}(k1) − ν_{ij}(k2))² + (π_{ij}(k1) − π_{ij}(k2))²])^{1/2},   (29)

where A is given by (28).

When we have m experts, we examine their agreement pairwise via (27) or (29) and then find the agreement of all the experts:

lIFS = (1/C²_m) Σ_{p=1}^{m−1} Σ_{r=p+1}^{m} l^{kp,kr}_{IFS},   C²_m = m(m − 1)/2,   (30)

where l^{kp,kr}_{IFS} is given by (27), or

eIFS = (1/C²_m) Σ_{p=1}^{m−1} Σ_{r=p+1}^{m} e^{kp,kr}_{IFS},   (31)

where e^{kp,kr}_{IFS} is given by (29).
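A sketch of (27)–(30) for the Hamming case is given below (added here for illustration); the preference values correspond to the upper triangles of the matrices of Example 1 below, and the aggregate result agrees with the value 0.2 reported there.

# Extent of agreement between experts from intuitionistic fuzzy preferences.
# Each expert is a dict mapping an option pair (i, j), i < j, to (mu, nu, pi).
def pair_distance(p, q):                       # per-pair Hamming distance
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def expert_distance(e1, e2):                   # (27)-(28): average over option pairs
    pairs = e1.keys()
    return sum(pair_distance(e1[ij], e2[ij]) for ij in pairs) / len(pairs)

def group_distance(experts):                   # (30): average over expert pairs
    m = len(experts)
    dists = [expert_distance(experts[p], experts[r])
             for p in range(m) for r in range(p + 1, m)]
    return sum(dists) / len(dists)

e1 = {(1, 2): (0.1, 0.9, 0.0), (1, 3): (0.5, 0.4, 0.1), (2, 3): (0.5, 0.3, 0.2)}
e2 = {(1, 2): (0.1, 0.9, 0.0), (1, 3): (0.5, 0.2, 0.3), (2, 3): (0.5, 0.2, 0.3)}
e3 = {(1, 2): (0.2, 0.8, 0.0), (1, 3): (0.1, 0.2, 0.7), (2, 3): (0.6, 0.3, 0.1)}
print(group_distance([e1, e2, e3]))            # -> ~0.2, the distance from consensus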
Example 1. For 3 individuals (m = 3) considering 3 options (n = 3), let the individual intuitionistic fuzzy preference relations be:
µ1(i, j):          ν1(i, j):          π1(i, j):
 −   .1  .5         −   .9  .4         −   0   .1
 .9  −   .5         .1  −   .3         0   −   .2
 .4  .3  −          .5  .5  −          .1  .2  −

µ2(i, j):          ν2(i, j):          π2(i, j):
 −   .1  .5         −   .9  .2         −   0   .3
 .9  −   .5         .1  −   .2         0   −   .3
 .2  .2  −          .5  .5  −          .3  .3  −

µ3(i, j):          ν3(i, j):          π3(i, j):
 −   .2  .1         −   .8  .2         −   0   .7
 .8  −   .6         .2  −   .3         0   −   .1
 .2  .3  −          .1  .6  −          .7  .1  −
p,r
First, we calculate the distances l_{i,j}^{p,r} and e_{i,j}^{p,r} between the preferences given by each pair (p, r) of experts for each pair of options (i, j):

$$ l_{1,2}^{1,2} = 0 \qquad e_{1,2}^{1,2} = 0 \quad (32) $$
$$ l_{1,3}^{1,2} = 0.2 \qquad e_{1,3}^{1,2} = 0.2 \quad (33) $$
$$ l_{2,3}^{1,2} = 0.1 \qquad e_{2,3}^{1,2} = 0.22 \quad (34) $$
To find the agreement concerning all options, we aggregate the above values for experts k = 1, 2:

$$ l_{IFS}^{1,2} = \frac{1}{3}(0 + 0.2 + 0.1) = 0.1 \quad (35) $$
$$ e_{IFS}^{1,2} = \frac{1}{3}(0 + 0.2 + 0.22) = 0.14 \quad (36) $$
The same calculations are done for the pairs of experts (1, 3) and (2, 3):

$$ l_{1,2}^{1,3} = 0.1 \qquad e_{1,2}^{1,3} = 0.1 \quad (37) $$
$$ l_{1,3}^{1,3} = 0.6 \qquad e_{1,3}^{1,3} = 0.53 \quad (38) $$
$$ l_{2,3}^{1,3} = 0.1 \qquad e_{2,3}^{1,3} = 0.1 \quad (39) $$

$$ l_{IFS}^{1,3} = \frac{1}{3}(0.1 + 0.6 + 0.1) = 0.27 \quad (40) $$
$$ e_{IFS}^{1,3} = \frac{1}{3}(0.1 + 0.53 + 0.1) = 0.24 \quad (41) $$
$$ l_{1,2}^{2,3} = 0.1 \qquad e_{1,2}^{2,3} = 0.1 \quad (42) $$
$$ l_{1,3}^{2,3} = 0.4 \qquad e_{1,3}^{2,3} = 0.4 \quad (43) $$
$$ l_{2,3}^{2,3} = 0.2 \qquad e_{2,3}^{2,3} = 0.17 \quad (44) $$

$$ l_{IFS}^{2,3} = \frac{1}{3}(0.1 + 0.4 + 0.2) = 0.23 \quad (45) $$
$$ e_{IFS}^{2,3} = \frac{1}{3}(0.1 + 0.4 + 0.17) = 0.22 \quad (46) $$

From (30), (35), (40), (45) (for lIFS), and (31), (36), (41), (46) (for eIFS), we obtain the distance from full agreement between the three experts considered:

$$ l_{IFS} = \frac{1}{3}(0.1 + 0.27 + 0.23) = 0.2 \quad (47) $$
$$ e_{IFS} = \frac{1}{3}(0.14 + 0.24 + 0.22) = 0.2 \quad (48) $$

In our example the agreement of the group of experts is rather strong: the distance from consensus is equal to 0.2, while the largest possible distance from consensus is 1.
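The computations above are easy to script. The following minimal Python sketch is not part of the original text; the data layout and the function names are our own. It evaluates the per-pair distances and the group agreement for the preference data of Example 1, folding the per-pair normalising constant 1/2 into the distance functions.

```python
# Minimal sketch (ours): per-pair distances and group agreement, cf. (27)-(31).
from itertools import combinations
from math import sqrt

# preferences[expert][(i, j)] = (mu, nu, pi) for the option pair (i, j), as in Example 1
preferences = {
    1: {(1, 2): (.1, .9, .0), (1, 3): (.5, .4, .1), (2, 3): (.5, .3, .2)},
    2: {(1, 2): (.1, .9, .0), (1, 3): (.5, .2, .3), (2, 3): (.5, .2, .3)},
    3: {(1, 2): (.2, .8, .0), (1, 3): (.1, .2, .7), (2, 3): (.6, .3, .1)},
}

def hamming(a, b):
    # normalized Hamming distance between two (mu, nu, pi) triples
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

def euclid(a, b):
    # normalized Euclidean distance between two (mu, nu, pi) triples
    return sqrt(0.5 * sum((x - y) ** 2 for x, y in zip(a, b)))

def pair_agreement(k1, k2, dist):
    # average distance of two experts over all option pairs
    pairs = preferences[k1].keys()
    return sum(dist(preferences[k1][p], preferences[k2][p]) for p in pairs) / len(pairs)

# distance of the whole group from full agreement, cf. (47)-(48)
expert_pairs = list(combinations(preferences, 2))
l_group = sum(pair_agreement(p, r, hamming) for p, r in expert_pairs) / len(expert_pairs)
e_group = sum(pair_agreement(p, r, euclid) for p, r in expert_pairs) / len(expert_pairs)
print(round(l_group, 2), round(e_group, 2))
```

The printed Hamming value matches (47); the Euclidean value may differ slightly from (48) because the chapter rounds the intermediate per-pair figures. Weights for pairs of experts, as mentioned below, could be introduced by replacing the plain averages with weighted ones.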
Moreover, the presented method makes it possible to take into account that some experts can be more important than others – proper weights for pairs of individuals can be included in (30) and (31).
4 Medical diagnostic reasoning via distances for intuitionistic fuzzy sets

Before we present the usefulness of distances for intuitionistic fuzzy sets in the case of medical diagnostic reasoning, we recall another method (having its roots in Sanchez’ [6], [7] approach). This will make it possible to emphasise the advantages of reasoning via distances.

4.1 An intuitionistic fuzzy set based approach to medical diagnosis due to De, Biswas and Roy [4]

Following the reasoning of De, Biswas and Roy [4] (which is an extension of Sanchez’ approach [6], [7]), we now recall their approach to medical diagnosis via intuitionistic fuzzy sets, or to be more precise, via intuitionistic fuzzy relations, which in effect boils down to applying the max-min-max composition [3].
Definition 2. [4] Let X and Y be two sets. An intuitionistic fuzzy relation R from X to Y is an intuitionistic fuzzy set of X × Y characterized by the membership function µR and non-membership function νR. An intuitionistic fuzzy relation R from X to Y will be denoted by R(X → Y).

Definition 3. [4] If A is an intuitionistic fuzzy set of X, the max-min-max composition of the intuitionistic fuzzy relation R(X → Y) with A is an intuitionistic fuzzy set B of Y denoted by B = R ◦ A, and is defined by the membership function

$$ \mu_{R \circ A}(y) = \bigvee_{x} \left[ \mu_A(x) \wedge \mu_R(x, y) \right] \quad (49) $$

and the non-membership function

$$ \nu_{R \circ A}(y) = \bigwedge_{x} \left[ \nu_A(x) \vee \nu_R(x, y) \right] \quad (50) $$

∀y ∈ Y, where ∨ denotes max and ∧ denotes min.
Definition 4. [4] Let Q(X → Y) and R(Y → Z) be two intuitionistic fuzzy relations. The max-min-max composition R ◦ Q is the intuitionistic fuzzy relation from X to Z defined by the membership function

$$ \mu_{R \circ Q}(x, z) = \bigvee_{y} \left[ \mu_Q(x, y) \wedge \mu_R(y, z) \right] \quad (51) $$

and the non-membership function

$$ \nu_{R \circ Q}(x, z) = \bigwedge_{y} \left[ \nu_Q(x, y) \vee \nu_R(y, z) \right] \quad (52) $$

∀(x, z) ∈ X × Z and ∀y ∈ Y.

The approach presented by De, Biswas and Roy [4] involves the following three steps:
• determination of symptoms,
• formulation of medical knowledge expressed by intuitionistic fuzzy relations, and
• determination of diagnosis on the basis of a composition of intuitionistic fuzzy relations.

A set of n patients is considered. For each patient pi, i = 1, . . . , n, a set of symptoms S is given. As a result, an intuitionistic fuzzy relation Q is given from the set of patients to the set of symptoms S. Next, it is assumed that another intuitionistic fuzzy relation R is given – from the set of symptoms S to the set of diagnoses D. The composition T of intuitionistic fuzzy relations R and Q describes the state of a patient given in terms of a membership
function, µT(pi, dk), and a non-membership function, νT(pi, dk), for each patient pi and each diagnosis dk. The functions are calculated in the following way [4]:

$$ \mu_T(p_i, d_k) = \bigvee_{s \in S} \left[ \mu_Q(p_i, s) \wedge \mu_R(s, d_k) \right] \quad (53) $$

and

$$ \nu_T(p_i, d_k) = \bigwedge_{s \in S} \left[ \nu_Q(p_i, s) \vee \nu_R(s, d_k) \right] \quad (54) $$

where ∨ denotes max and ∧ denotes min.
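Computationally, (53)–(54) amount to a max-min-max composition of two tables of (µ, ν) pairs. A minimal sketch is given below; it is ours, not from [4], and the dictionary layout is an assumption. Applied to the data of Tables 1 and 2 of the following example it yields the relation T of Table 3.

```python
# Minimal sketch (ours, not from [4]): the max-min-max composition (53)-(54).
# Q[p][s] = (mu, nu) for patient p and symptom s; R[s][d] = (mu, nu) for
# symptom s and diagnosis d.  Returns T with T[p][d] = (mu, nu).
def max_min_max(Q, R):
    diagnoses = next(iter(R.values())).keys()
    T = {}
    for p, symptoms in Q.items():
        T[p] = {}
        for d in diagnoses:
            mu = max(min(symptoms[s][0], R[s][d][0]) for s in symptoms)   # (53)
            nu = min(max(symptoms[s][1], R[s][d][1]) for s in symptoms)   # (54)
            T[p][d] = (mu, nu)
    return T
```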
Example 2. [4] Let there be four patients: Al, Bob, Joe and Ted, i.e. P = {Al, Bob, Joe, Ted}. The set of symptoms considered is S = {temperature, headache, stomach pain, cough, chest pain}. The intuitionistic fuzzy relation Q(P → S) is given in Table 1.

Table 1. The intuitionistic fuzzy relation Q(patients → symptoms)

Q     Temperature  Headache    Stomach pain  Cough       Chest pain
Al    (0.8, 0.1)   (0.6, 0.1)  (0.2, 0.8)    (0.6, 0.1)  (0.1, 0.6)
Bob   (0.0, 0.8)   (0.4, 0.4)  (0.6, 0.1)    (0.1, 0.7)  (0.1, 0.8)
Joe   (0.8, 0.1)   (0.8, 0.1)  (0.0, 0.6)    (0.2, 0.7)  (0.0, 0.5)
Ted   (0.6, 0.1)   (0.5, 0.4)  (0.3, 0.4)    (0.7, 0.2)  (0.3, 0.4)
Let the set of diagnoses be D = {Viral fever, Malaria, Typhoid, Stomach problem, Chest problem}. The intuitionistic fuzzy relation R(S → D) is given in Table 2.

Table 2. The intuitionistic fuzzy relation R(symptoms → diagnoses)

R             Viral fever  Malaria     Typhoid     Stomach problem  Chest problem
Temperature   (0.4, 0.0)   (0.7, 0.0)  (0.3, 0.3)  (0.1, 0.7)       (0.1, 0.8)
Headache      (0.3, 0.5)   (0.2, 0.6)  (0.6, 0.1)  (0.2, 0.4)       (0.0, 0.8)
Stomach pain  (0.1, 0.7)   (0.0, 0.9)  (0.2, 0.7)  (0.8, 0.0)       (0.2, 0.8)
Cough         (0.4, 0.3)   (0.7, 0.0)  (0.2, 0.6)  (0.2, 0.7)       (0.2, 0.8)
Chest pain    (0.1, 0.7)   (0.1, 0.8)  (0.1, 0.9)  (0.2, 0.7)       (0.8, 0.1)
Therefore, the composition T (53)–(54) is given in Table 3. But as the max-min-max composition was used when looking for T, in fact only the “dominating” symptoms were taken into account. So, in the next step an improved version of R is calculated for which the following holds [4]:
Table 3. The intuitionistic fuzzy relation T (patients → diagnoses)

T     Viral fever  Malaria     Typhoid     Stomach problem  Chest problem
Al    (0.4, 0.1)   (0.7, 0.1)  (0.6, 0.1)  (0.2, 0.4)       (0.2, 0.6)
Bob   (0.3, 0.5)   (0.2, 0.6)  (0.4, 0.4)  (0.6, 0.1)       (0.1, 0.7)
Joe   (0.4, 0.1)   (0.7, 0.1)  (0.6, 0.1)  (0.2, 0.4)       (0.2, 0.5)
Ted   (0.4, 0.1)   (0.7, 0.1)  (0.5, 0.3)  (0.3, 0.4)       (0.3, 0.4)
• SR = µR − νR πR is the greatest, and
• equations (53)–(54) are retained.

Effects of the presented improvements [4] are given in Table 4.

Table 4. The intuitionistic fuzzy relation SR = µR − νR πR

SR    Viral fever  Malaria  Typhoid  Stomach problem  Chest problem
Al    0.35         0.68     0.57     0.04             0.08
Bob   0.20         0.08     0.32     0.57             0.04
Joe   0.35         0.68     0.57     0.04             0.05
Ted   0.32         0.68     0.44     0.18             0.18
The approach proposed in [4] seems to have some drawbacks. First, the max-min-max rule alone (Table 3) does not give a solution. To obtain one, the authors of [4] proposed changes to the medical knowledge R(S → D). But these changes are difficult to justify: the medical knowledge R(S → D) is (or at least should be) based on many cases and on the knowledge of physicians, so sudden, arbitrary changes to it are hard to understand. Next, the type of change, SR = µR − νR πR, means that the membership function describing the relation R (the medical knowledge) is weakened. However, nowhere in the idea of intuitionistic fuzzy sets is it assumed that the membership function can decrease because of the hesitation margin or the non-membership function. The hesitation margin (or part of it) can be split between the membership and non-membership functions, so in fact it can be added to, not subtracted from, the membership function. Summing up, the proposed improvements, although leading to some solutions, are difficult to justify from both the practical (physicians’ knowledge) and the theoretical (theory of intuitionistic fuzzy sets) points of view.
Table 5. Symptoms characteristic for the diagnoses considered

              Viral fever      Malaria          Typhoid          Stomach problem  Chest problem
Temperature   (0.4, 0.0, 0.6)  (0.7, 0.0, 0.3)  (0.3, 0.3, 0.4)  (0.1, 0.7, 0.2)  (0.1, 0.8, 0.1)
Headache      (0.3, 0.5, 0.2)  (0.2, 0.6, 0.2)  (0.6, 0.1, 0.3)  (0.2, 0.4, 0.4)  (0.0, 0.8, 0.2)
Stomach pain  (0.1, 0.7, 0.2)  (0.0, 0.9, 0.1)  (0.2, 0.7, 0.1)  (0.8, 0.0, 0.2)  (0.2, 0.8, 0.0)
Cough         (0.4, 0.3, 0.3)  (0.7, 0.0, 0.3)  (0.2, 0.6, 0.2)  (0.2, 0.7, 0.1)  (0.2, 0.8, 0.0)
Chest pain    (0.1, 0.7, 0.2)  (0.1, 0.8, 0.1)  (0.1, 0.9, 0.0)  (0.2, 0.7, 0.1)  (0.8, 0.1, 0.1)
4.2 A new approach to medical diagnostic reasoning via distances for intuitionistic fuzzy sets

To solve the same problem as in [4], but without manipulations of the medical knowledge base, and taking into account all the symptoms characteristic for each patient, we propose a new method based on calculating distances between diagnoses and patient tests. As in [4], to make a proper diagnosis D for a patient with given values of tested symptoms S, a medical knowledge base is necessary. In our case the knowledge base is formulated in terms of intuitionistic fuzzy sets. To compare the approach proposed in this article with the method of De, Biswas and Roy [4], described shortly in Subsection 4.1, we consider just the same data. The set of diagnoses is D = {Viral fever, Malaria, Typhoid, Stomach problem, Chest problem}, and the set of symptoms is S = {temperature, headache, stomach pain, cough, chest pain}. The data are given in Table 5 – each symptom is described by three numbers: membership µ, non-membership ν, and hesitation margin π. For example, for malaria the temperature is high (µ = 0.7, ν = 0, π = 0.3), whereas for the chest problem the temperature is low (µ = 0.1, ν = 0.8, π = 0.1). In fact the data in Table 2 and Table 5 are exactly the same (due to (5)) but involve the hesitation margin in an explicit way too; this is useful and expedient in our approach. The set of patients is P = {Al, Bob, Joe, Ted}. The symptoms characteristic for the patients are given in Table 6 – as before, we use all three parameters (µ, ν, π) to describe each symptom, but the data are the same (due to (5)) as in Table 1. We seek a proper diagnosis for each patient pi, i = 1, . . . , 4. To do so, we propose
• to calculate for each patient pi the distance of his symptoms (Table 6) from the set of symptoms sj, j = 1, . . . , 5 characteristic for each diagnosis dk, k = 1, . . . , 5 (Table 5), and
• to find the proper diagnosis as the one minimizing the obtained distance.
Using the results obtained earlier [cf. also Szmidt and Kacprzyk [12], [15]], we find the normalised Hamming distance of all symptoms of patient i from diagnosis k, which is equal to
Table 6. Symptoms characteristic for the patients considered

      Temperature      Headache         Stomach pain     Cough            Chest pain
Al    (0.8, 0.1, 0.1)  (0.6, 0.1, 0.3)  (0.2, 0.8, 0.0)  (0.6, 0.1, 0.3)  (0.1, 0.6, 0.3)
Bob   (0.0, 0.8, 0.2)  (0.4, 0.4, 0.2)  (0.6, 0.1, 0.3)  (0.1, 0.7, 0.2)  (0.1, 0.8, 0.1)
Joe   (0.8, 0.1, 0.1)  (0.8, 0.1, 0.1)  (0.0, 0.6, 0.4)  (0.2, 0.7, 0.1)  (0.0, 0.5, 0.5)
Ted   (0.6, 0.1, 0.3)  (0.5, 0.4, 0.1)  (0.3, 0.4, 0.3)  (0.7, 0.2, 0.1)  (0.3, 0.4, 0.3)
Table 7. The normalized Hamming distances for each patient from the considered set of possible diagnoses

      Viral fever  Malaria  Typhoid  Stomach problem  Chest problem
Al    0.28         0.24     0.28     0.54             0.56
Bob   0.40         0.50     0.31     0.14             0.42
Joe   0.38         0.44     0.32     0.50             0.55
Ted   0.28         0.30     0.38     0.44             0.54
$$ l_{IFS}(s(p_i), d_k) = \frac{1}{10} \sum_{j=1}^{5} \big( |\mu_j(p_i) - \mu_j(d_k)| + |\nu_j(p_i) - \nu_j(d_k)| + |\pi_j(p_i) - \pi_j(d_k)| \big) \quad (55) $$
The distances (55) for each patient from the considered set of possible diagnoses are given in Table 7. The lowest distance points out the proper diagnosis: Al suffers from malaria, Bob from a stomach problem, Joe from typhoid, and Ted from viral fever. We obtained the same results, i.e. the same diagnosis for each patient, when looking for the solution with the normalized Euclidean distance [cf. Szmidt and Kacprzyk [12], [15]]

$$ q_{IFS}(s(p_i), d_k) = \left( \frac{1}{10} \sum_{j=1}^{5} \big( (\mu_j(p_i) - \mu_j(d_k))^2 + (\nu_j(p_i) - \nu_j(d_k))^2 + (\pi_j(p_i) - \pi_j(d_k))^2 \big) \right)^{\frac{1}{2}} \quad (56) $$
The results are given in Table 8 – the lowest distance for each patient pi from the possible diagnoses D indicates the solution sought. As before, Al suffers from malaria, Bob from a stomach problem, Joe from typhoid, and Ted from viral fever. It is worth noticing that the proposed approach differs in spirit from the intuitionistic fuzzy relation based approach by De, Biswas and Roy [4]. The latter one
Table 8. The normalized Euclidean distances for each patient from the considered set of possible diagnoses

      Viral fever  Malaria  Typhoid  Stomach problem  Chest problem
Al    0.29         0.25     0.32     0.53             0.58
Bob   0.43         0.56     0.33     0.14             0.46
Joe   0.36         0.41     0.32     0.52             0.57
Ted   0.25         0.29     0.35     0.43             0.50
employs the max-min-max composition of intuitionistic fuzzy relations, with all their drawbacks.
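For completeness, the distance-based procedure of Section 4.2 is equally simple to implement. The sketch below is ours; the data structures are assumed to mirror Tables 5 and 6, with (µ, ν, π) triples indexed by symptom. It evaluates (55) and (56) and picks the diagnosis at minimal distance; with the data of Tables 5 and 6 it returns malaria for Al, a stomach problem for Bob, typhoid for Joe and viral fever for Ted, in agreement with Tables 7 and 8.

```python
# Minimal sketch (ours): distance-based diagnosis via (55)/(56).
# diagnoses[d][s] and patients[p][s] hold (mu, nu, pi) triples as in Tables 5 and 6.
from math import sqrt

def hamming(patient, diagnosis):                     # equation (55)
    return sum(abs(a - b)
               for s in patient
               for a, b in zip(patient[s], diagnosis[s])) / (2 * len(patient))

def euclidean(patient, diagnosis):                   # equation (56)
    return sqrt(sum((a - b) ** 2
                    for s in patient
                    for a, b in zip(patient[s], diagnosis[s])) / (2 * len(patient)))

def diagnose(patients, diagnoses, dist=hamming):
    # for each patient, pick the diagnosis at minimal distance
    return {p: min(diagnoses, key=lambda d: dist(patients[p], diagnoses[d]))
            for p in patients}
```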
5 Concluding remarks

We proposed a method for the evaluation of the degree (extent) of agreement in a group of individuals by calculating distances between intuitionistic fuzzy preference relations. We also proposed to employ intuitionistic fuzzy sets to represent knowledge more adequately. We showed that reasoning via those distances makes it possible to avoid some drawbacks of the max-min-max approach, notably the “neglecting” of all values but the extreme ones.
References

1. Atanassov K. (1986), Intuitionistic fuzzy sets, Fuzzy Sets and Systems, 20, 87–96.
2. Atanassov K. (1999), Intuitionistic Fuzzy Sets: Theory and Applications. Springer-Verlag, Heidelberg and New York.
3. Biswas R. (1997) Intuitionistic fuzzy relations. Bull. Sous. Ens. Flous. Appl. (BUSEFAL), 70, 22–29.
4. De S.K., Biswas R. and Roy A.R. (2001) An application of intuitionistic fuzzy sets in medical diagnosis. Fuzzy Sets and Systems, 117, 209–213.
5. Kacprzyk J. and Nurmi H. (1998) Group decision making under fuzziness. In R. Slowinski (Ed.): Fuzzy Sets in Decision Analysis, Operations Research and Statistics, Kluwer, Boston, pp. 103–136.
6. Sanchez E. (1977) Solutions in composite fuzzy relation equation. Application to medical diagnosis in Brouwerian logic. In: M.M. Gupta, G.N. Saridis, B.R. Gaines (Eds.), Fuzzy Automata and Decision Process, Elsevier, North-Holland.
7. Sanchez E. (1976) Resolution of composite fuzzy relation equations. Inform. Control, 30, 38–48.
8. Szmidt E. (2000) Applications of Intuitionistic Fuzzy Sets in Decision Making. D.Sc. dissertation, Techn. Univ., Sofia.
9. Szmidt E. and Baldwin J.F. (2003) New similarity measures for intuitionistic fuzzy set theory and mass assignment theory. Notes on IFS, 9, 60–76.
10. Szmidt E. and Kacprzyk J. (1996a) Intuitionistic fuzzy sets in group decision making, Notes on IFS, 2, 15–32.
11. Szmidt E. and Kacprzyk J. (1996c) Remarks on some applications of intuitionistic fuzzy sets in decision making, Notes on IFS, 2, 22–31.
12. Szmidt E. and Kacprzyk J. (1997) On measuring distances between intuitionistic fuzzy sets, Notes on IFS, 3, 1–13.
13. Szmidt E. and Kacprzyk J. (1998a) Group Decision Making under Intuitionistic Fuzzy Preference Relations. Proc. IPMU’98 (Paris), pp. 172–178.
14. Szmidt E. and Kacprzyk J. (1998b) Applications of Intuitionistic Fuzzy Sets in Decision Making. Proc. EUSFLAT’99 (Pamplona), pp. 150–158.
15. Szmidt E. and Kacprzyk J. (2000) Distances between intuitionistic fuzzy sets, Fuzzy Sets and Systems, 114, 505–518.
16. Szmidt E. and Kacprzyk J. (2000) On Measures of Consensus Under Intuitionistic Fuzzy Relations. Proc. IPMU’2000 (Madrid), pp. 641–647.
17. Szmidt E. and Kacprzyk J. (2001) Distance from Consensus Under Intuitionistic Fuzzy Preferences. Proc. EUROFUSE Workshop on Preference Modelling and Applications (Granada), pp. 73–78.
18. Szmidt E. and Kacprzyk J. (2001) Entropy for intuitionistic fuzzy sets. Fuzzy Sets and Systems, 118, 467–477.
19. Szmidt E. and Kacprzyk J. (2001) Analysis of Consensus under Intuitionistic Fuzzy Preferences. Proc. Int. Conf. in Fuzzy Logic and Technology (Leicester), pp. 79–82.
20. Szmidt E. and Kacprzyk J. (2002) Analysis of Agreement in a Group of Experts via Distances Between Intuitionistic Fuzzy Preferences. Proc. IPMU’2002 (Annecy), pp. 1859–1865.
21. Szmidt E. and Kacprzyk J. (2002) An Intuitionistic Fuzzy Set Based Approach to Intelligent Data Analysis: An Application to Medical Diagnosis. In A. Abraham, L. Jain and J. Kacprzyk (Eds.): Recent Advances in Intelligent Paradigms and Applications. Springer-Verlag, pp. 57–70.
22. Zadeh L.A. (1965) Fuzzy sets. Information and Control, 8, 338–353.
23. Zadeh L.A. (1983) A computational approach to fuzzy quantifiers in natural languages. Comput. Math. Appl., 9, 149–184.
CHAPTER 9
Efficient Reasoning With Fuzzy Words

Martin Spott
Intelligent Systems Research Centre, BT Group
Adastral Park, Orion pp 1/12, Martlesham Heath
Ipswich, IP5 3RE, UK
[email protected]

Abstract: Dealing with coarse granular information is very attractive for several reasons:
1. The number of details in an application might be so large that processing is not feasible without abstracting from details.
2. As we see in spoken languages, for example, coarse granular information is often easier to understand than lots of details.
3. Detailed information is not always available.
Fuzzy systems are a natural choice for processing coarse granular information, but unfortunately, most fuzzy systems suffer from two drawbacks. Although knowledge is formulated on a coarse granular level using fuzzy sets, most information processing algorithms operate on the details and are, therefore, computationally costly. Furthermore, the fuzzy results are not expressed with the predefined fuzzy sets that we used to describe fuzzy knowledge in a comprehensive way, and are therefore often difficult to understand. As a solution to these problems, we propose a methodology that represents and processes fuzzy information at the coarse granular level.

Keywords: computing with words, fuzzy granules, fuzzy partition, approximate reasoning
1 Introduction In many applications, we need to abstract from detailed information in order to make processing of the information feasible, since the number of details is too large. In other cases, information is a priori an imprecise abstraction from details, because more precise information is not available. Furthermore, it is often easier to understand information at a higher level of abstraction. For instance, precise voltages of batteries are useless information for many people whereas “the battery is almost flat” might be much more helpful. We can classify the precision of information by its level of granularity. We call the most precise units of information fine granular. If we, for example, deal with heights of persons, then the most precise information units might be cm or mm, depending on the accuracy of the measuring device. However, a loss in precision leads to coarse granular information units, which might be collections of fine granular units, for example. In this chapter, we distinguish only two levels of granularity, fine and coarse. Generally, whole hierarchies of granularity might be useful in applications, since users could dig into details in different depths.
Fuzzy systems have been invented for processing coarse granular information. Unfortunately, most of them suffer from a serious drawback: although information like a fuzzy rule “rich customers buy many bottles of champagne” is formulated at the coarse granular level, the information is still processed at the fine granular level. An example is the max-mincomposition (or more generally, the compositional rule of inference) as an inference mechanism for rule-based fuzzy systems [11]. In the worst case, you have to calculate a maximum on the set of details (fine granular pieces of information). Since the number of details can be very large, think of the number of customers of a supermarket chain, these methods are useless, if your reason for abstraction is saving computational costs. We would have expected that the computational expense depends only on the number of coarse granular information units but not on the number of fine granular ones. Nevertheless, some special cases of the maxmin-composition like the M AMDANI controller [6] are quite efficient processing techniques [3]. A second problem is related to the interpretability of processing results. Rules in rulebased systems are often defined by experts who use linguistic values like tall for the size of a person, i.e., a meaning is attached to a fuzzy set. Here, we call a linguistic value a fuzzy word. Methods like the max-min-composition then aggregate a fuzzy input and the fuzzy rules and calculate fuzzy sets as output that represent the imprecise result of the information processing. These fuzzy sets can often not be interpreted in terms of the fuzzy words the experts used to define the rules. Therefore, the interpretability of information at the coarse granular level is lost. This holds, for example, for the popular M AMDANI controller. There are methods, indeed, that interpret fuzzy sets in terms of predefined words (linguistic approximation), but they are usually separate from the processing procedure. This might result in additional computational expense or inconsistencies in semantics. In summary, it would be useful if processing results are inherently expressed with the predefined fuzzy words, which would guarantee that the results are easy to understand. Aiming at efficient information processing at the coarse granular level and at understandable fuzzy results, we propose a method to combine fuzzy words that has been introduced first in [8, 9]. The main idea is to represent fuzzy information at the coarse granular level as a combination of predefined fuzzy words. The concept will be described in Sections 2 and 3. The main part of this chapter is dedicated to efficient reasoning with fuzzy words. Instead of processing arbitrary fuzzy sets, we always base processing on the fuzzy words and obtain results in the form of combined fuzzy words. Therefore, computational costs depend only on the number of fuzzy words and the results are easy to understand.
2 Fuzzy Partitions and Fuzzy Words As pointed out in the introduction, we aim at processing coarse granular pieces of information. Before we can discuss ways to process knowledge, we have to define what a coarse granular piece of information is. The starting point in our approach is a universe U being a set of basic elements as fine granular pieces of information. At this level, information can only be expressed with the basic elements: “Peter is 190 cm tall” with 190 cm as a basic element of the set of possible heights [1 cm, 300 cm]. At a coarse granular level, we abstract from the basic elements and use a more abstract language to express information like “Peter is tall.” We call a term like tall a word. Since words like tall might be inherently fuzzy, we will represent words by fuzzy sets on a universe U and call them fuzzy words. By abstraction, the set U of details will then be replaced by a set of fuzzy words as basic pieces of information that partition the entire universe in a
fuzzy way. As the details do, the set of words shall cover the entire universe (completeness: all phenomena in the universe can be described), and, on the other hand, the words shall represent disjoint pieces of information (consistency: ability to distinguish phenomena). If the words were represented by crisp subsets W ⊆ U of the universe, these two requirements would be
(c1) The union of all words is the universe (Completeness)
(c2) The words are pairwise disjoint (Consistency)
A set of crisp subsets of U that is complete and consistent is called a partition of U. For the definition of fuzzy partitions, we express fuzziness as a combination of different crisp definitions. A fuzzy partition is then formed by a collection of crisp partitions. For example, we may ask people to define crisp partitions {very small, small, medium, tall, very tall} of U = [1 cm, 300 cm] that describe possible heights of men as crisp sets. We then have several crisp definitions for each of the words very small, small, medium, tall, very tall. For the construction of the respective fuzzy words, we use the context model [5, 1]. In this model, a source c of information is called a context. In our example, each person is a context. Each context c defines a word W as a crisp set W = Γ(c) via a function Γ : C −→ 2^U. C is the set of contexts, 2^U the set of all subsets of U. For instance, a person c might define tall = Γ(c) = [185 cm, 200 cm]. To each context c, a weight is attached that measures the correctness, importance or reliability of an information Γ(c). This weight is modelled by a probability [5]. Finally, a fuzzy word W̃ aggregates all definitions W(c) = Γ(c). W̃ is defined by a fuzzy set µ_W̃, i.e., for all u ∈ U

$$ \mu_{\tilde W}(u) = P(\{c \in C \mid u \in \Gamma(c)\}) = \sum_{c \in C:\, u \in \Gamma(c)} P(c) \quad (1) $$
As an example, we asked 22 persons to define the heights very small, small, medium, tall and very tall of men. The resulting five fuzzy sets are shown in Fig. 1. Since we considered all persons equally reliable, the probability of each piece of information Γ(c) is 1/22. Therefore, a membership value like µtall (u) is equivalent to the relative number of persons that declare a man of u cm height as tall. If we use the context model to form a set of fuzzy words P = {W˜ 1 , W˜ 2 . . . W˜ n } from a collection of crisp partitions, it is easy to show that P forms a partition of unity [8]: ∀u ∈ U : ∑W ∈P µW˜ (u) = 1 . From here on, we call a partition of unity a fuzzy partition. An example for a fuzzy partition is shown in Fig. 1.
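As a small illustration of (1), the following sketch is ours; the intervals and weights are invented for the example and are not the survey data behind Fig. 1.

```python
# Minimal sketch (ours): membership of a fuzzy word from weighted crisp
# definitions Gamma(c), as in equation (1).  Intervals are hypothetical.
tall_definitions = [((180, 200), 1/3), ((185, 200), 1/3), ((178, 195), 1/3)]  # (Gamma(c), P(c))

def membership(u, definitions):
    # mu_W(u) = sum of the weights of all contexts whose crisp set contains u
    return sum(p for (lo, hi), p in definitions if lo <= u <= hi)

print(membership(183, tall_definitions))  # -> 2/3
```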
3 Combining Fuzzy Words Having abstracted from fine granular information, we have a set of words we can use to describe phenomena. For example, we can describe the height of a man with one of the words very small, small, medium, tall or very tall. A restriction to such a small number of possible statements, in this case five different heights, is not very expressive and therefore not very useful for applications. In [8], we proposed a method called combining fuzzy words in order to dramatically increase the expressiveness of fuzzy words. It works in a similar way, as we combine words of our vocabulary in spoken languages to form new statements. We, for example, say that a man is “tall or of medium height” (simple combination), or it is “more likely that his height is medium than tall” (weighted combination). More likely or less likely are typical terms that
are used to weight statements. The overall idea behind the combination of words is to use a small set of words, the vocabulary, to generate a large number of statements on the basis of a grammar, which defines the rules how words can be combined. We use a probabilistic model for the combination of fuzzy words, which is consistent with the context model that we use for the definition of fuzzy sets. P = {W˜ 1 , W˜ 2 . . . W˜ n } be a set of words that form a fuzzy partition of the universe U . We define a probability measure PP on P , or more precisely, on 2P , that models the weights of the words. The result are statements like “The height of the man is medium with PP (medium) and he is tall with PP (tall)”. More formally: A statement “∀W˜ ∈ P : x˜ = W˜ with probability PP (W˜ )” is called a weighted combination of the fuzzy words W˜ ∈ P ; x˜ is a linguistic variable like man’s height that can hold fuzzy values. In the following, we often leave out the word weighted for reasons of simplicity and just say combination of fuzzy words. A combination of fuzzy words is a fuzzy set and is represented by a membership function, as a simple fuzzy word is. We therefore have two different levels of presentation for fuzzy information: the coarse granular or symbolic level at which statements are expressed using fuzzy words and the fine granular level where we deal only with membership values to fuzzy sets. The crucial question is how these two levels of granularity are related. At the fine granular level, fuzzy information is given by a membership function that defines a fuzzy set. The relation of such a fuzzy set to predefined fuzzy words is unclear. We therefore cannot easily express this information on the basis of the predefined fuzzy words. On the other hand, we can describe information at the symbolic level using a combination of fuzzy words, but we do not know the membership function of the fuzzy set that defines the combination at the fine granular level. In order to describe the relation of the fine and the coarse level we introduced two operators in [8, 9]: The synthesis of a fuzzy set translates a symbolic combination of fuzzy words into the equivalent membership function, whereas the analysis of a fuzzy set transforms a membership function into a symbolic combination of predefined fuzzy words. In this way, the synthesis calculates the exact definition of a combination of fuzzy words, and the analysis explains a fuzzy set in terms of comprehensible fuzzy words. Fig. 2 illustrates this relation. 1 0.8 0.6 0.4 0.2 0 120
Fig. 1. Fuzzy heights of men: very small, small, medium, tall and very tall (left to right)
This idea of transforming fuzzy information back and forth between the coarse and fine granular levels has not as yet received special attention in the relevant literature. Regarding the analysis, measures of similarity [4] or consistency [2] could be used to describe the relation of fuzzy sets. A fuzzy set, representing the fuzzy height of a person, for example, could be compared separately with each of the predefined fuzzy words like medium or tall with respect to similarity or consistency. The problem is that these comparisons are independent from another. Instead, we will propose an approach that finds a combination of all fuzzy words, which is similar to the given fuzzy height in the above example. Furthermore, we will see that our approach allows us to transform a combination of fuzzy words into a fuzzy set on the fine granular level and back into a combination of fuzzy words without losing information. This is generally not possible with most of the techniques based on the max-min-composition or fuzzy production rules like [10].
Fig. 2. Synthesis and Analysis of a fuzzy set (coarse granular level: x̃ = W̃1 with 10%, W̃2 with 60% and W̃3 with 30%; fine granular level: membership function; the two levels are connected by synthesis and analysis)
3.1 Synthesis Of A Fuzzy Set

As introduced above, a synthesis transforms a combination of fuzzy words, given by a probability measure PP attached to a fuzzy partition P, into a fuzzy set. In order to achieve this, we enrich the context model that is used to define the fuzzy words by PP. The basic idea is to adapt the context weights P(c) for each fuzzy word W̃ ∈ P with PP(W̃). In the context model, P(c) measures the importance or correctness of c and Γ_W̃(c) at the same time. Γ_W̃(c) is the crisp definition of a fuzzy word W̃ by the context c, as described in Sect. 2. We now interpret the fact that W̃ is weighted with PP(W̃) in the following way: The importance or correctness of the definition Γ_W̃(c) of the fuzzy word W̃ depends not only on the weight P(c) but also on the importance or correctness PP(W̃) of the fuzzy word W̃. We assume that the probabilities P(c) and PP(W̃) are independent and weight Γ_W̃(c) with

$$ P(c, \tilde W) = P(c) \cdot P_P(\tilde W) \quad (2) $$
The assumption of independency means that the definition of the fuzzy words is independent from the determination of their probabilities. In applications, we have to take care that the assumption of independency is fulfilled. If we, for example, ask the same people that defined the fuzzy words very small, small, medium, tall and very tall, to describe the height of a person with these words, they might tend to base their description on their own definitions ΓW˜ (c) of these words rather than using the fuzzy
words. We have to tell them not to do so, but the psychological aspects in surveys like this must not be underrated. Following the descriptions above, we can calculate the membership of the fuzzy set Ã that is represented by the combination “∀W̃ ∈ P : x̃ = W̃ with PP(W̃)”. We generalise (1) with the help of (2):

$$ \mu_{\tilde A}(u) = P(\{c \in C, \tilde W \in P \mid u \in \Gamma_{\tilde W}(c)\}) \quad (3) $$
$$ = \sum_{c \in C, \tilde W \in P:\, u \in \Gamma_{\tilde W}(c)} P(c)\, P_P(\tilde W) \quad (4) $$
This leads to

Theorem 1. Let P be a fuzzy partition of the universe U based on a set C of contexts with probability measure P on 2^C. For the fuzzy set Ã that represents the weighted combination “∀W̃ ∈ P : x̃ = W̃ with probability PP(W̃)” it holds that

$$ \mu_{\tilde A}(u) = \sum_{\tilde W \in P} P_P(\tilde W)\, \mu_{\tilde W}(u) \quad (5) $$
Proof. Rewriting (4) and applying (1) reveals

$$ \mu_{\tilde A}(u) = \sum_{c \in C, \tilde W \in P:\, u \in \Gamma_{\tilde W}(c)} P(c)\, P_P(\tilde W) = \sum_{\tilde W \in P} P_P(\tilde W) \sum_{c \in C:\, u \in \Gamma_{\tilde W}(c)} P(c) = \sum_{\tilde W \in P} P_P(\tilde W)\, \mu_{\tilde W}(u) $$
The theorem shows that the membership function µ_Ã of a weighted combination of fuzzy words W̃ ∈ P can be computed directly from the membership functions µ_W̃ of the words. In particular, it is not necessary to consider the contexts and their probabilities as in (4). Due to this result, we also write Ã = ∑_{W̃∈P} PP(W̃) W̃ alternatively to (5). Furthermore, (5) defines the space S of possible fuzzy statements that can be defined by a combination of fuzzy words. S = {∑_{W̃∈P} α_W̃ µ_W̃ | ∑_{W̃∈P} α_W̃ = 1} is the convex hull of the membership functions {µ_W̃ | W̃ ∈ P} and obviously a subset of the set of all membership functions on the universe U. We will now have a closer look at (5) and interpret the equation in a slightly different way that proves useful for the analysis of fuzzy sets in the following section. Let us assume a random variable Y that takes fuzzy words W̃ ∈ P as possible values. As an example, we can again use Y being the fuzzy height of a person. Furthermore let Y(u) be the membership value of u to a given value Y = W̃, for instance, Y(u) = µ_tall(u) in case of Y = tall. With the probability measure PP introduced above, obviously PP(W̃) denotes the probability of Y = W̃. Under these assumptions, µ_Ã(u) in (5) is the expected value E(Y(u)), since for all u ∈ U

$$ E(Y(u)) = \sum_i P(Y(u) = y_i(u))\, y_i(u) = \sum_{\tilde W \in P} P_P(\tilde W)\, \mu_{\tilde W}(u) = \mu_{\tilde A}(u) \quad (6) $$
if we take into account that the possible values yi (u) of Y (u) are indeed the membership values µW˜ (u) for the set of given fuzzy words W˜ ∈ P . For the example of the fuzzy set A˜
being the height of a person, defined as weighted combination of fuzzy words, this means that µA˜ (u) measures the expected membership value of the crisp height u to the fuzzy height of the person.
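A synthesis according to (5) is just a convex combination of the membership functions of the fuzzy words. The sketch below is ours; the triangular fuzzy words and their parameters are hypothetical and stand in for a partition like the one in Fig. 1.

```python
# Minimal sketch (ours): synthesis (5) of a fuzzy set from weighted fuzzy words.
def synthesis(words, weights):
    # words: name -> membership function, weights: name -> P_P(word)
    def mu_A(u):
        return sum(weights[w] * mu(u) for w, mu in words.items())
    return mu_A

def tri(a, b, c):
    # hypothetical triangular membership function
    return lambda u: max(0.0, min((u - a) / (b - a), (c - u) / (c - b)))

words = {"medium": tri(160, 172, 184), "tall": tri(172, 184, 196)}
height = synthesis(words, {"medium": 0.3, "tall": 0.7})
print(height(180))   # expected membership value of 180 cm, cf. (6)
```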
3.2 Analysis Of A Fuzzy Set

The analysis of a fuzzy set Ã calculates a combination of fuzzy words that describe Ã in an understandable way. As we saw with the definition of S, not all fuzzy sets on U can be represented accurately by a combination of fuzzy words. Generally, we can only expect that the combination is an approximation of Ã. This is the price we have to pay for the abstraction of knowledge. However, the analysis should at least fulfil the following requirement:

$$ \tilde A = \sum_{\tilde W \in P} P_P(\tilde W)\, \tilde W \;\Rightarrow\; \mathrm{synthesis}(\mathrm{analysis}(\tilde A)) = \tilde A \quad (7) $$
In other words: If we analyse a fuzzy set that can be accurately represented by a combination of fuzzy words then the analysis must find this combination. The starting point for an analysis is a fuzzy partition P, i.e., a set of fuzzy words, and a fuzzy set Ã that is to be described with the fuzzy words. Following (5), we are looking for a probability measure PP such that the weighted combination of fuzzy words approximates Ã as accurately as possible:

$$ \mu_{\tilde A}(u) \approx \sum_{\tilde W \in P} P_P(\tilde W)\, \mu_{\tilde W}(u) \quad (8) $$
for all u ∈ U. We need to solve two problems. The first is how to define a measure for the quality of the approximation, and the second one is how we optimise the probability measure PP with respect to the quality measure. Our derivation of a quality measure is based on a maximum likelihood estimator. For reasons of simplicity, let us assume a finite universe U with elements u_i, i.e., (8) turns into y_i := µ_Ã(u_i) ≈ ∑_{W̃∈P} PP(W̃) µ_W̃(u_i) for all i. Further we assume that the y_i are values of independent, normally distributed random variables Y_i. In accordance with (6) we also require E(Y_i) = ∑_{W̃∈P} α_W̃ µ_W̃(u_i) and Var(Y_i) = σ² > 0. Depending on how the fuzzy set Ã we seek to analyse has been obtained, the assumption of independence of the Y_i might be too strong. However, we view this assumption as a sensible, practical simplification. The likelihood function that measures the joint likelihood of the points (u_i, y_i) is then given by

$$ L(\alpha_{\tilde W}, \sigma^2; y_1, y_2, \ldots) = \prod_{i=1}^{n} (2\pi\sigma^2)^{-\frac{1}{2}}\, e^{-(y_i - \sum_{\tilde W \in P} \alpha_{\tilde W}\mu_{\tilde W}(u_i))^2 / 2\sigma^2} \quad (9) $$
$$ = (2\pi\sigma^2)^{-\frac{n}{2}}\, e^{-\sum_{i=1}^{n}(y_i - \sum_{\tilde W \in P} \alpha_{\tilde W}\mu_{\tilde W}(u_i))^2 / 2\sigma^2} \quad (10) $$
Following the maximum likelihood approach we look for parameters α_W̃ that maximise the likelihood of the occurrence of the (u_i, y_i), i.e., we seek to maximise L with respect to α_W̃. Given any variance σ² > 0, L reaches its maximum when ∑_{i=1}^{n}(y_i − ∑_{W̃∈P} α_W̃ µ_W̃(u_i))² is minimal. Therefore, we base the quality measure of the approximation on this function:

$$ Q = \Big(\sum_{i=1}^{n}\big(\mu_{\tilde A}(u_i) - \sum_{\tilde W \in P} \alpha_{\tilde W}\,\mu_{\tilde W}(u_i)\big)^2\Big)^{\frac{1}{2}} = \Big\|\mu_{\tilde A} - \sum_{\tilde W \in P} \alpha_{\tilde W}\,\mu_{\tilde W}\Big\|_2 \quad (11) $$
with standard norm || · ||2 . We know from approximation theory that the solution to this optimisation problem is unique, if and only if the vectors (µW˜ (ui ))i are linearly independent. In case of the continuous
124
M. Spott
b 2 1/2 problem, we replace the norm in (11) by its continuous equivalent || f ||2 = a f (x) dx . In this case, the solution is unique if and only if the membership functions µW˜ are linearly independent. For the following reason, it is not a restriction to assume linear independency: If the membership functions of our fuzzy words are linearly dependent then at least one of the respective fuzzy words can be represented accurately by a weighted combination of the other fuzzy words. Therefore, we lose nothing, if we choose a linearly independent subset of membership functions, since we can always reconstruct the excluded functions. The optimal parameters αW˜ are calculated by solving a system of linear equations, cf. standard books on numerical mathematics. The optimised parameters αW˜ generally do not define a probability measure PP (W˜ ) on 2P , since ∑W˜ ∈P αW˜ = 1 and some of the αW˜ may even be negative. A negative αW˜ can obviously not be interpreted as probability. For practical reasons, we propose the following procedure as analysis of A˜ in favour of adding the constraints above to the optimisation problem. ˜ M := {W˜ ∈ P | supp(W˜ ) ∩ 1. Determine the set M of fuzzy words, that overlap with A: ˜ = 0}. / supp(A) 2. Optimise the weights αW˜ for all W˜ ∈ P . 3. If some of the αW˜ are negative, exclude the respective fuzzy words from M. 4. Define αW˜ = 0 for all W˜ ∈ M und normalise the weights αW˜ so that ∑W˜ ∈P αW˜ = 1. 5. Define the probabilities PP (W˜ ) ∀W˜ ∈ P : PP (W˜ ) := αW˜
(12)
˜ = {u ∈ U | µ ˜ (u) > 0}. Step 1 is based on The support of a fuzzy set A˜ is defined as supp(A) A the observation that a fuzzy word that does not overlap with A˜ cannot contribute to the approximation of A˜ with a positive weight, since all membership functions are positive. Positive side effects of this step are that the approximation problem can be reduced – fewer linear equations – and that the likelihood of negative weights is smaller. The proposed method indeed fulfils requirement (7): Theorem 2. Let the fuzzy set A˜ be a combination of fuzzy words W˜ ∈ P with linearly independent membership functions and weights PP (W˜ ): ∀u ∈ U : µA˜ (u) =
∑
W˜ ∈P
PP (W˜ ) µW˜ (u)
(13)
Under these assumptions, for the proposed procedure for the analysis of A˜ holds ∀W˜ ∈ P : PP (W˜ ) = PP (W˜ )
(14)
with PP (W˜ ) defined as in (12). Therefore, the analysis reconstructs the synthesised fuzzy set ˜ A. The theorem can be proven by using the fact that the solution of the function approximation problem is unique [8]. The theorem proves that the proposed analysing procedure of a fuzzy ˜ if such a combination set A˜ finds a combination of fuzzy words that accurately describes A, exists. If such a combination of fuzzy words does not exist, we generally end up with an ˜ In both cases, the combination of fuzzy words is unique. approximation of the fuzzy set A. If no negative weight can be detected in step 3 of the procedure, we even know that the combination is optimal with respect to the quality measure: we found the best interpretation of A˜ with fuzzy words.
Efficient Reasoning With Fuzzy Words
125
4 Reasoning With Fuzzy Words In Sections 2 and 3, we showed how fuzzy information can be represented at the symbolic level. Information represented in this way is easy to understand and the representation is very compact. As already mentioned in the introduction, we also aim at information processing at the symbolic level in order to achieve comprehensible results with low computational costs. Let us begin with a couple of terms frequently used in information processing: knowledge base, observations, inference and conclusion. The knowledge base comprises knowledge that is important and generally valid for the solution of a given problem. Inference is a procedure that draws conclusions or deduces information from the knowledge base. In the example of medical diagnosis, known relations between symptoms and causes form the knowledge base. Observed symptoms of patients are then used to infer possible causes. In this case, a conclusion (possible causes) is not drawn from the knowledge base alone, but from a combination of the knowledge base and an observation. The difference between an observation and an element of the knowledge base is that an observation is not generally valid. In this example, the observation depends on the patient. The idea of information processing at the symbolic level is based on the assumption that important parts of the knowledge base are defined at the symbolic level, anyway. An example are fuzzy if-then rules like “if the person is tall, then the person is heavy”. Since tall and heavy are fuzzy words at the symbolic level, the rule is a relation at the symbolic level, as well. In most approaches for fuzzy rule-based reasoning like Zadeh’s compositional rule of inference, such a rule is transformed into a fuzzy relation at the fine granular level. A fuzzy observation of a person’s height, given as a fuzzy set or fuzzy word, is then combined with the transformed rule and a fine granular conclusion will be drawn. By following such an approach, we obtain a computationally costly algorithm, since we operate at the fine granular level. The result is also at the fine granular level, i.e. not expressed with fuzzy words. Instead, we propose to process information at the symbolic level. A fuzzy observation is either given at symbolic level, e.g., a person is tall, or at fine granular level as an arbitrary fuzzy set. In the latter case, the fuzzy set will be analysed as described in Sect. 3.2 and in this way transferred onto the symbolic level. Then the symbolic observation and the symbolic knowledge base will be combined and a symbolic conclusion can be drawn.
4.1 Linear Inference Functions Let us formalise the inference procedure and assume that an inference maps a combination of fuzzy words onto another combination of fuzzy words, thus it is a function fK : Comb(PU ) −→ Comb(PV ). Comb(PU ) and Comb(PV ) denote the sets of all combinations of fuzzy words on universes U and V . fK also depends on the knowledge base K . For reasons of simplicity we assume that the knowledge base does not change and we can just write f instead of fK . Following the idea of the previous sections to represent fuzzy information as a combination of fuzzy words that form a partition of the underlying universe, we introduce an important subset of all inference functions, the linear inference functions. Being linear means that for all A˜ i ∈ PU , αi ∈ R holds f (∑i αi A˜ i ) = ∑i αi f (A˜ i ). We will indeed introduce rule-based reasoning later on that can be expressed as a linear inference function. The great thing about linear inference functions in the context of our theory is that the computational complexity of an inference is very low. Instead of calculating an inference ˜ for a combination A˜ ∈ Comb(PU ) from scratch, we can pre-calculate f (A˜ i ) for all fuzzy f (A)
126
M. Spott
˜ is just a linear combination of the pre-calculated words A˜ i ∈ PU . Due to the linearity of f , f (A) ˜ f (Ai ). This is the simple but important Theorem 3. Let PU = {A˜ i }i=1...n and PV = {B˜ i }i=1...m be sets of fuzzy words and A˜ a combination of fuzzy words ∑ni=1 αi A˜ i . f : Comb(PU ) −→ Comb(PV ) be a linear inference function ˜ = ∑m β j B˜ j with β j = ∑n αi γi j . with f (A˜ i ) := ∑mj=1 γi j B˜ j . Then f (A) j=1 i=1 For the proof, we use the linearity property of f . The theorem shows that the coefficients β j we are looking for can be obtained by a simple vector-matrix-multiplication b = Gt a with vectors a = (αi )i , b = (β j ) j and matrix G = (γi j )i j . In the same way, a chain of linear inferences can be written as a chain of matrix multiplications. Consider linear inference functions fk : Comb(PUk ) −→ Comb(PUk+1 ) and the chain of infer˜ It is easy to see that this can be represented by b = Gtm · · · Gt2 Gt1 a. ences fm (. . . ( f2 ( f1 (A)))). Furthermore, we can transform this chain of inferences into a single-step inference by combining the matrices: b = Gt a with Gt = Gtm · · · Gt2 Gt1 . In this way, the computational complexity can be further reduced, since the product of matrices can be calculated in advance. Finally, we observe that the computational complexity depends on the number of fuzzy words (or fuzzy rules) only and not on the number of underlying details.
4.2 Rule-Based Reasoning As an example of linear inference we introduce fuzzy rule-based reasoning that is based on Bayesian networks [7]. Each connection in a Bayesian network between nodes A and B represents a conditional probability P(B|A). We can interpret P(B|A) as a weighted if-then rule “if A then B with probability P(B|A)”. Bringing this together with common fuzzy rules and ˜ then y˜ = B˜ with probability p” as our theory, we interpret a weighted fuzzy rule “if x˜ = A, ˜ ˜ ˜ ˜ conditional probability PP (B|A) := PP (y˜ = B|x˜ = A) = p with P = PU × Pv Vv . Instead of the ˜ ˜
PP (B|A) ˜ if-then phrase we write A˜ −→ B. For the inference mechanism, we refer to probability theory:
PP (B˜ j ) = ∑ PP (B˜ j |A˜ i )PP (A˜ i )
(15)
i
which holds, since the fuzzy words A˜ i are discrete elements of PU and i A˜ i = PU . Using ˜ (15), an inference process that maps a fuzzy observation x˜ = A˜ onto a fuzzy conclusion y˜ = B, ˜ ˜ ˜ consists of three steps. First, we have to determine the probabilities PP (Ai ) = PP (Ai |A). If A˜ is represented as a combination of fuzzy words A˜ i like “the person is tall”, we already know the probabilities. Otherwise, we analyse A˜ as described in Sect. 3.2. In the second step, the probabilities PP (B˜ j ) are calculated using (15) and finally combined in conclusion B˜ using (5) ˜ (substitute W˜ with B). Obviously, the inference formula (15) is a linear inference function as described in Sect. 4.1, i.e., the inference can be described as vector-matrix-multiplication b = Gt a with vectors a = (PP (A˜ i ))i , b = (PP (B˜ j )) j and matrix G = (PP (B˜ j |A˜ i ))i j . The conclusion of the inference is a combination of fuzzy words, i.e., it is easy to understand, and at the same time, it is well defined as a fuzzy set using (5). Furthermore, the modus 1 ponens holds. That means, that in case of a rule A˜ i −→ B˜ in the rule set, an observation A˜ i ˜ This is not true for most fuzzy inference mechanisms, will be mapped onto the conclusion B. for example, M AMDANI ’ S popular approach. The problem is that if the modus ponens does not hold, the fuzziness of conclusions increases with every inference step. Therefore, after
Efficient Reasoning With Fuzzy Words
Observation coarse
˜ i) A ˜i PP (A
127
Conclusion ˜ i) PP (A
Inference ˜ j |A ˜ i) PP (B
˜ j) PP (B
Analysis
˜ j) B ˜j PP (B Synthesis
fine granular
Fig. 3. Rule-based inference with fuzzy words
a couple of inferences, the level of fuzziness might be so large that the conclusion becomes meaningless. This will not happen with our approach. Additionally, multi-stage inferences can be rewritten as a single-stage inference, as already explained in Sect. 4.1, which leads to a very efficient inference mechanism.
5 Conclusions Two very attractive goals of processing coarse granular fuzzy information are a reduction of computational complexity and the interpretability of information, in particular, of fuzzy conclusions. Unfortunately, most of the existing fuzzy information processing approaches are based on fine granular representations of coarse granular information and, therefore, cannot fulfil the required goals. In order to achieve the goals, we proposed a method called combination of fuzzy words in [8, 9] that represents fuzzy information always at the coarse granular level. The underlying technique describes in a mathematically sound way, how a fuzzy set can be synthesised by a combination of fuzzy words. Arbitrary fuzzy information, on the other hand, will be transformed into a combination of fuzzy words with a method called analysis, which finds an accurate coarse granular representation if it exists, and an approximation, in the other case. Thereby, the representations at the fine and coarse granular levels are isomorphic. The implications of this result are extremely interesting with respect to our goals of processing fuzzy information, computational efficiency and interpretability of information. Existing approaches process fuzzy information at the fine granular level; we, instead, transform information onto the coarse granular level, process it there, and transform it back, if necessary. We showed that, in this way, a very efficient multi-stage fuzzy rule-based reasoning mechanism can be realised. Apart from an initial transformation step onto the coarse granular level, all operations are done at the coarse granular level and therefore, the computational complexity depends on the number of fuzzy words and not on the number of details. Quite often, the costly initial step onto the coarse granular level is not necessary in practical applications, since the observations to be combined with the knowledge base are crisp [8]. Furthermore, fuzzy conclusions drawn by the system are represented as combinations of fuzzy words and are, therefore, easy to understand.
128
M. Spott
An interesting topic for future research is whether there are linear inference functions other than the rule-based approach that could benefit from these results with respect to comprehensibility of conclusions and computational complexity.
References 1. C. Borgelt. Data Mining with Graphical Models. PhD thesis, Otto-von-GuerickeUniversit¨at Magdeburg, 2000. 2. M. Cayrol, H. Farreny, and H. Prade. Fuzzy pattern matching. Kybernetes, 11:103–116, 1982. 3. H.-P. Chen and T.-M. Parng. A new approach of multi-stage fuzzy logic inference. Fuzzy Sets and Systems, 78:51–72, 1996. 4. D. Dubois and H. Prade. On distances between fuzzy points and their use for plausible reasoning. In Proc. of IEEE Intern. Conference on Cybernetics and Society, pages 300– 303, Bombay, India, 1983. 5. J. Gebhardt and R. Kruse. The context model—an integrating view of vagueness and uncertainty. Intern. Journal of Approximate Reasoning, 9:283–314, 1993. 6. E. H. Mamdani and S. Assilian. An experiment in linguistic synthesis with a fuzzy logic controller. Intern. Journal of Man–Machine Studies, 7, 1975. 7. J. Pearl. Probabilistic reasoning in intelligent systems: networks of pausible inference. Morgan Kaufmann, 1988. 8. M. Spott. Schließen mit unscharfen Begriffen (Reasoning with Fuzzy Terms). PhD thesis, Universit¨at Karlsruhe, Germany, 2000. 9. M. Spott. Combining fuzzy words. In Proc. of FUZZ-IEEE 2001, Melbourne, Australia, 2001. 10. D. S. Yeung and E. C. C. Tsang. Weighted fuzzy production rules. Fuzzy Sets and Systems, 88:299–313, 1997. 11. L. A. Zadeh. A theory of approximate reasoning. In J.E. Hayes, D. Mitchie, and L.I. Mikulich, editors, Machine Intelligence, pages 149–194. Wiley, New York, 1979.
CHAPTER 10 Intuitionistic Fuzzy Relational Images Martine De Cock, Chris Cornelis, and Etienne E. Kerre Ghent University Dept. of Mathematics and Computer Science Fuzziness and Uncertainty Modelling Research Unit Krijgslaan 281 (S9), B-9000 Gent, Belgium {Martine.DeCock, Chris.Cornelis, Etienne.Kerre}@UGent.be Abstract: By tracing intuitionistic fuzzy sets back to the underlying algebraic structure that they are defined on (a complete lattice), they can be embedded in the well–known class of L– fuzzy sets, whose formal treatment allows the definition and study of order–theoretic concepts such as triangular norms and conorms, negators and implicators, as well as the development of intuitionistic fuzzy relational calculus. In this chapter we study the intuitionistic fuzzy relational direct and superdirect image. An important aspect of our work, differentiating it from the study of L–fuzzy relational images in general, concerns the construction of an intuitionistic fuzzy relational image from the separate fuzzy relational images of its membership and non–membership function. We illustrate our results with two applications: the representation of linguistic hedges, and the development of a meaningful concept of an intuitionistic fuzzy rough set. Keywords: intuitionistic fuzzy relational calculus, direct and superdirect image, linguistic hedge, intuitionistic fuzzy rough set
1 INTRODUCTION Intuitionistic fuzzy sets (IFSs for short), an extension of fuzzy sets, were introduced by Atanassov [1] and are currently generating a great deal of interest. IFS theory basically defies the claim that from the fact that an element x “belongs” to a given degree (say µA (x)) to a fuzzy set A, naturally follows that x should “not belong” to A to the extent 1 − µA (x), an assertion implicit in the concept of a fuzzy set. On the contrary, IFSs assign to each element x of the universe both a degree of membership µA (x) and one of non–membership νA (x) such that µA (x) + νA (x) ≤ 1 thus relaxing the enforced duality νA (x) = 1 − µA (x) from fuzzy set theory. Obviously, when µA (x) + νA (x) = 1 for all elements of the universe, the traditional fuzzy set concept is recovered.
M.De Cock et al.: Intuitionistic Fuzzy Relational Images, Studies in Computational Intelligence (SCI) 2, 129–145 (2005) c Springer-Verlag Berlin Heidelberg 2005 www.springerlink.com
130
M.De Cock et al.
Wang and He [32], and later also Deschrijver and Kerre [17], noticed that IFSs can be considered as special instances of Goguen’s L–fuzzy sets [19], so every concept definable for L–fuzzy sets is also available to IFS theory. In this spirit, in [16, 18] suitable definitions and representation theorems for the most important intuitionistic fuzzy connectives have been derived; negators, triangular norms and conorms, and implicators can be used to model the elementary set–theoretical operations of complementation, intersection, union, and inclusion as well as the logical operations of negation, conjunction, disjunction, and implication. In this way, slowly but surely, IFSs start giving away their secrets. Using these building blocks, we can arrive at the study of more complex frameworks such as intuitionistic fuzzy relational calculus (IF relational calculus for short). The importance of L–fuzzy relational calculus in computer science can hardly be overestimated. It has already proven its usefulness in fields such as approximate reasoning (modelling linguistic IF–THEN rules as fuzzy relations, see e.g. [34]), fuzzy morphology for image processing (see e.g. [25]), fuzzy preference modelling (see e.g. [9]), and for obvious reasons also fuzzy relational databases. Furthermore benefits of fuzzy relations for the search in unstructured environments is becoming more and more clear (see [14] for a recent overview on the construction of fuzzy term–term relationships). In this chapter we will illustrate our results in yet two other applications, namely intuitionistic fuzzy rough sets for knowledge discovery, and the mathematical representation of linguistic terms for computing with words. At the heart of L–fuzzy relational calculus and very closely related to the composition of L–fuzzy relations, are the notions of direct and superdirect image of an L–fuzzy set under an L–fuzzy relation. Most of the research on L–fuzzy relational images so far has been carried out for L = [0, 1]. We refer to [11, 26] for an overview of both established theoretical properties and applications. The mapping of elements of the universe to the interval [0, 1] however implies a crisp, linear ordering of these elements, making [0, 1]–valued fuzzy set theory inadequate to deal with incomparable information. Attention to other complete lattices L of membership degrees is growing. In [8] a thorough study of L–fuzzy relational images was carried out, and in [13] their use for the representation of linguistic hedges — such as very and more or less — was proposed and investigated. The approach boasts in general a lot of nice properties as well as many practical and intuitive advantages over “traditional” modifiers such as powering [33] and shifting hedges [24]. IF relational calculus is situated between the extremes of the traditional [0, 1]–fuzzy relational calculus (or simply called fuzzy relational calculus) on one hand, and the very general notion of L–fuzzy relational calculus on the other. As a consequence it provides a more expressive formalism than traditional fuzzy relational calculus, but at the same time treasures special properties and a specific behavior that is lost when moving onto the more general L–fuzzy relational calculus. This makes IF relational calculus an attractive topic of study. 
An important issue, differentiating it from the study of L–fuzzy relational images in general, is what we call the "divide and conquer" rationale: taking images of the membership and the non–membership function separately, while ensuring that the resulting construct is still an IFS, thus effectively breaking up our original problem into simpler, better–understood tasks. However, the reader should not get the impression that IFS theory comes down to merely applying ideas from fuzzy set theory twice, once for the membership and once for the non–membership function. Indeed, throughout this chapter it will become clear that the "divide and conquer" approach is a challenge rather than a triviality in IFS theory, and sometimes even impossible. However, some conditions imposed on the logical operators involved allow for results in this direction. This chapter is an extended version of [4] in which the use of IF relational images for the representation of linguistic hedges was introduced. We will recall this in Section 4. In Section
5 we additionally discuss IF rough sets as a second application, drawing upon results from [6]. First, however, we give the necessary preliminaries on IFSs (Section 2), and we study IF relational images in general (Section 3).
2 Preliminaries

2.1 L–fuzzy sets

In 1967 Goguen formally introduced the notion of an L–fuzzy set with a membership function taking values in a complete lattice L [19]. In this paper we assume that (L, ≤L) is a complete lattice with smallest element 0L and greatest element 1L. An L–fuzzy set A in a universe X is a mapping from X to L, again called the membership function. The L–fuzzy set A in X is said to be included in the L–fuzzy set B in X, usually denoted by A ⊆ B, if A(x) ≤L B(x) for all x in X. An L–fuzzy set R in X × X is called an L–fuzzy relation on X. For all x and y in X, R(x, y) expresses the degree to which x and y are related through R. For every y in X, the R-foreset of y is an L–fuzzy set in X, denoted as Ry and defined by Ry(x) = R(x, y) for all x in X.
L–fuzzy–set–theoretical operations such as complementation, intersection, and union can be defined by means of suitable generalizations of the well-known connectives from boolean logic. Negation, conjunction, disjunction and implication can be generalized respectively to negator, triangular norm, triangular conorm and implicator, all mappings taking values in L. More specifically, a negator in L is any decreasing L → L mapping N satisfying N(0L) = 1L. It is called involutive if N(N(x)) = x for all x in L. A triangular norm (t-norm for short) T in L is any increasing, commutative and associative L² → L mapping satisfying T(1L, x) = x for all x in L. A triangular conorm (t-conorm for short) S in L is any increasing, commutative and associative L² → L mapping satisfying S(0L, x) = x for all x in L. The N-complement of an L–fuzzy set A in X as well as the T-intersection and the S-union of L–fuzzy sets A and B in X are the L–fuzzy sets coN(A), A ∩T B and A ∪S B defined by

coN(A)(x) = N(A(x))
A ∩T B(x) = T(A(x), B(x))
A ∪S B(x) = S(A(x), B(x))

for all x in X. The dual of a t-conorm S in L w.r.t. a negator N in L is a t-norm T in L defined as

T(x, y) = N(S(N(x), N(y)))

An implicator in L is any L² → L mapping I satisfying I(0L, 0L) = 1L and I(1L, x) = x for all x in L. Moreover we require I to be decreasing in its first, and increasing in its second component. If S and N are respectively a t-conorm and a negator in L, then it is well-known that the mapping IS,N defined by

IS,N(x, y) = S(N(x), y)
is an implicator in L, usually called S-implicator (induced by S and N). Note that if N is involutive then S(x, y) = IS,N(N(x), y) for all x and y in L. For ease of notation we also use the concept of S-implicator induced by a t-norm T and an involutive negator N:

IT,N(x, y) = N(T(x, N(y)))

If S is the dual of T then IS,N = IT,N. Furthermore, if T is a t-norm in L, the mapping IT defined by

IT(x, y) = sup{λ | λ ∈ L and T(x, λ) ≤L y}

is an implicator in L, usually called the residual implicator (of T). The partial mappings of a t-norm T in L are sup-morphisms if

T(sup_{i∈I} xi, y) = sup_{i∈I} T(xi, y)

for every family (xi)_{i∈I} in L. Every implicator induces a negator in the following way:

N(x) = I(x, 0L)

for all x in L. The negator induced by an S-implicator IS,N coincides with N. The negator induced by the residual implicator IT is denoted by NT. It is easy to verify that the meet and the join operation on L are respectively a t-norm and a t-conorm in L. We denote them by TM and SM respectively. Also A ∩ B is a shorter notation for A ∩TM B, while A ∪ B corresponds to A ∪SM B. The [0, 1] → [0, 1] mapping Ns defined as Ns(x) = 1 − x for all x in [0, 1] is a negator on [0, 1], often called the standard negator. For a [0, 1]-fuzzy set A, coNs(A) is commonly denoted by co(A).
Table 1. Triangular norms and conorms on [0, 1]

t-norm                              t-conorm
TM(x, y) = min(x, y)                SM(x, y) = max(x, y)
TP(x, y) = x · y                    SP(x, y) = x + y − x · y
TW(x, y) = max(x + y − 1, 0)        SW(x, y) = min(x + y, 1)
Table 1 depicts the values of well-known t-norms and t-conorms on [0, 1], for all x and y in [0, 1]. The first column of Table 2 shows the values of the S-implicators in [0, 1] induced by the t-conorms of Table 1 and the standard negator Ns, while the second column lists the values of the corresponding residual implicators.
An L–fuzzy relation R on X is called an L–fuzzy T-equivalence relation if for all x, y, and z in X

(E1) R(x, x) = 1L                          (reflexivity)
(E2) R(x, y) = R(y, x)                     (symmetry)
(E3) T(R(x, y), R(y, z)) ≤L R(x, z)        (T-transitivity)
Table 2. S-implicators and residual implicators on [0, 1]

S-implicator                             residual implicator
ISM,Ns(x, y) = max(1 − x, y)             ITM(x, y) = 1 if x ≤ y, y otherwise
ISP,Ns(x, y) = 1 − x + x · y             ITP(x, y) = 1 if x ≤ y, y/x otherwise
ISW,Ns(x, y) = min(1 − x + y, 1)         ITW(x, y) = min(1 − x + y, 1)
When L = {0, 1}, L–fuzzy set theory coincides with traditional set theory, in this context also called crisp set theory. {0, 1}-fuzzy sets and {0, 1}-fuzzy relations are usually also called crisp sets and crisp relations. When L = [0, 1], fuzzy set theory in the sense of Zadeh is recovered. [0, 1]-fuzzy sets and [0, 1]-fuzzy relations are commonly called fuzzy sets and fuzzy relations. Furthermore it is customary to omit the indication “in [0, 1]” when describing the logical operators, and hence to talk about negators, triangular norms, etc.
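As an aside, the connectives of Tables 1 and 2 are easy to experiment with numerically. The following minimal Python sketch is added purely for illustration (the grid approximation of the residual implicator is a naive choice made here, not a recommended implementation); it encodes the t-norms of Table 1, the standard negator and both families of implicators, so that individual entries of Table 2 can be checked.

# Illustrative sketch (added): connectives of Tables 1 and 2 on [0, 1].

def t_min(x, y):        return min(x, y)                  # TM
def t_prod(x, y):       return x * y                      # TP
def t_lukas(x, y):      return max(x + y - 1.0, 0.0)      # TW

def s_max(x, y):        return max(x, y)                  # SM
def s_probsum(x, y):    return x + y - x * y              # SP
def s_boundedsum(x, y): return min(x + y, 1.0)            # SW

def neg_std(x):         return 1.0 - x                    # standard negator Ns

def s_implicator(S, N):
    # I_{S,N}(x, y) = S(N(x), y)
    return lambda x, y: S(N(x), y)

def residual_implicator(T, steps=1000):
    # I_T(x, y) = sup{lam | T(x, lam) <= y}, approximated on a grid of [0, 1]
    def impl(x, y):
        lams = [i / steps for i in range(steps + 1)]
        return max(lam for lam in lams if T(x, lam) <= y + 1e-12)
    return impl

# Example: check a few Table 2 entries at x = 0.7, y = 0.4
ISM = s_implicator(s_max, neg_std)
ITP = residual_implicator(t_prod)
print(ISM(0.7, 0.4))   # max(1 - 0.7, 0.4) = 0.4
print(ITP(0.7, 0.4))   # roughly y/x = 0.571..., since x > y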
2.2 Intuitionistic Fuzzy Sets

IFSs can also be considered as special instances of L–fuzzy sets [16]. Let (L∗, ≤L∗) be the complete, bounded lattice defined by:

L∗ = {(x1, x2) ∈ [0, 1]² | x1 + x2 ≤ 1}
(x1, x2) ≤L∗ (y1, y2) ⇔ x1 ≤ y1 and x2 ≥ y2

The units of this lattice are denoted 0L∗ = (0, 1) and 1L∗ = (1, 0). For each element x ∈ L∗, by x1 and x2 we denote its first and second component, respectively. An IFS A in a universe X is a mapping from X to L∗. For every x ∈ X, the value µA(x) = (A(x))1 is called the membership degree of x to A; the value νA(x) = (A(x))2 is called the non–membership degree of x to A; and the value πA(x) is called the hesitation degree of x to A. Just like L∗-fuzzy sets are called IFSs, L∗-fuzzy relations are called IF relations.
By complementing the membership degree with a non-membership degree that expresses to what extent the element does not belong to the IFS, such that the sum of the degrees does not exceed 1, a whole spectrum of knowledge not accessible to fuzzy sets can be accessed. The applications of this simple idea are manifold indeed: it may be used to express positive as well as negative preferences; in a logical context, a degree of truth and one of falsity may be associated with a proposition; within databases, it can serve to evaluate the satisfaction as well as the violation of relational constraints. More generally, IFSs address the fundamental two-sidedness of knowledge, of positive versus negative information, and by not treating the two sides as exactly complementary (like fuzzy sets do), a margin of hesitation is created. This hesitation is quantified for each x in X by the number

πA(x) = 1 − µA(x) − νA(x)

The terms IF negator, IF t-norm, IF t-conorm and IF implicator are used to denote respectively a negator in L∗, a t-norm in L∗, a t-conorm in L∗ and an implicator in L∗. A t-norm T in L∗ (resp. a t-conorm S) is called t-representable [16] if there exists a t-norm T and a t-conorm S in [0, 1] (resp. a t-conorm S and a t-norm T in [0, 1]) such that, for x = (x1, x2), y = (y1, y2) ∈ L∗,
T(x, y) = (T(x1, y1), S(x2, y2))
S(x, y) = (S(x1, y1), T(x2, y2))

T and S (resp. S and T) are called the representants of T (resp. S). Finally, denoting the first projection mapping on L∗ by pr1, we recall from [16] that the [0, 1] → [0, 1] mapping N defined by N(a) = pr1(N(a, 1 − a)) for all a in [0, 1] is an involutive negator in [0, 1], as soon as N is an involutive negator in L∗. N is called the negator induced by N. Furthermore

N(x1, x2) = (N(1 − x2), 1 − N(x1))

for all x in L∗. The standard IF negator is defined by

Ns(x) = (x2, x1)

for all x in L∗. The meet and the join operators on L∗ are respectively the IF t-norm TM and the IF t-conorm SM defined by

TM(x, y) = (min(x1, y1), max(x2, y2))
SM(x, y) = (max(x1, y1), min(x2, y2))

Combining TW and SW of Table 1 gives rise to the t-representable IF t-norm TW and IF t-conorm SW defined by

TW(x, y) = (max(0, x1 + y1 − 1), min(1, x2 + y2))
SW(x, y) = (min(1, x1 + y1), max(0, x2 + y2 − 1))

However TL and SL are also possible extensions of TW and SW to IFS theory:

TL(x, y) = (max(0, x1 + y1 − 1), min(1, x2 + 1 − y1, y2 + 1 − x1))
SL(x, y) = (min(1, x1 + 1 − y2, y1 + 1 − x2), max(0, x2 + y2 − 1))

They are however not t-representable [16]. All of these IF t-conorms induce IF S-implicators

ISM,Ns(x, y) = (max(x2, y1), min(x1, y2))
ISW,Ns(x, y) = (min(1, x2 + y1), max(0, x1 + y2 − 1))
ISL,Ns(x, y) = (min(1, y1 + 1 − x1, x2 + 1 − y2), max(0, y2 + x1 − 1))

while the IF t-norms have residual IF implicators

ITM(x, y) = 1L∗            if x1 ≤ y1 and x2 ≥ y2
            (1 − y2, y2)   if x1 ≤ y1 and x2 < y2
            (y1, 0)        if x1 > y1 and x2 ≥ y2
            (y1, y2)       if x1 > y1 and x2 < y2

ITW(x, y) = (min(1, 1 + y1 − x1, 1 + x2 − y2), max(0, y2 − x2))
Finally we note that ITL equals ISL ,Ns .
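To make the L∗ formulas above concrete, the following sketch (an added illustration in which an element of L∗ is simply a pair (mu, nu) with mu + nu ≤ 1) encodes the order ≤L∗, the t-representable pair TW/SW, the non-t-representable TL and the S-implicator ISW,Ns, and checks that the results remain inside L∗.

# Illustrative sketch (added): elements of L* as pairs (mu, nu) with mu + nu <= 1.

def in_Lstar(a):
    mu, nu = a
    return 0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu + nu <= 1.0 + 1e-12

def leq_Lstar(a, b):
    # (x1, x2) <=_{L*} (y1, y2)  iff  x1 <= y1 and x2 >= y2
    return a[0] <= b[0] and a[1] >= b[1]

def T_W(a, b):   # t-representable IF t-norm built from TW and SW
    return (max(0.0, a[0] + b[0] - 1.0), min(1.0, a[1] + b[1]))

def S_W(a, b):
    return (min(1.0, a[0] + b[0]), max(0.0, a[1] + b[1] - 1.0))

def T_L(a, b):   # non t-representable extension of TW
    return (max(0.0, a[0] + b[0] - 1.0),
            min(1.0, a[1] + 1.0 - b[0], b[1] + 1.0 - a[0]))

def I_SW_Ns(a, b):  # IF S-implicator induced by SW and the standard IF negator
    return (min(1.0, a[1] + b[0]), max(0.0, a[0] + b[1] - 1.0))

x, y = (0.5, 0.3), (0.6, 0.2)
for r in (T_W(x, y), S_W(x, y), T_L(x, y), I_SW_Ns(x, y)):
    assert in_Lstar(r)
print(T_W(x, y), T_L(x, y))   # (0.1, 0.5) and (0.1, 0.7): the two extensions differ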
3 IF Relational Images

Next to the composition of relations, the direct image of a set under a relation is a basic operation in traditional relational calculus. Let R be a relation from X to X and A a subset of X, then the direct image of A under R is defined by

R↑A = {y | y ∈ X and (∃x ∈ X)(x ∈ A and (x, y) ∈ R)}    (1)

The direct image of A contains all elements of X that are related to at least one element of A. Furthermore the superdirect image of A under R

R↓A = {y | y ∈ X and (∀x ∈ X)((x, y) ∈ R ⇒ x ∈ A)}    (2)

contains all elements of X that are related only to elements of A. These images can be generalized to the L–fuzzy relational case ([8, 21]). Since in our quest for a "divide and conquer" approach we attempt to express IF relational images as constructs of fuzzy relational images for membership and non–membership functions, we recall the most general definition on the level of L–fuzzy relational calculus.

Definition 1 (L–fuzzy relational images). Let T and I be a t–norm and an implicator in L. Let A be an L–fuzzy set in X and R an L–fuzzy relation on X. The direct and superdirect image of A under R are the L–fuzzy sets in X respectively defined by

R↑T A(y) = sup_{x∈X} T(A(x), R(x, y))
R↓I A(y) = inf_{x∈X} I(R(x, y), A(x))

for all y in X. R↑T A(y) is the height of the T–intersection of A and Ry, i.e. the degree to which A and Ry overlap. R↓I A(y) corresponds to a well–known measure of inclusion of Ry in A. The main aim in this section is to provide properties of these L–fuzzy relational images in general and of IF relational images in particular. In the following sections we will go into the semantics of the concepts of IF relational direct and superdirect images and their properties in the context of modelling linguistic hedges and IF rough sets.

Proposition 1. [12] Let T and I be a t–norm and an implicator in L. Let R be an L–fuzzy relation on X. If R is reflexive then for every L–fuzzy set A in X

R↓I A ⊆ A ⊆ R↑T A

Proposition 2. [12] Let T and I be a t–norm and an implicator in L. Let R be an L–fuzzy relation on X. Let A and B be L–fuzzy sets in X. If A ⊆ B then

R↓I A ⊆ R↓I B
R↑T A ⊆ R↑T B

Proposition 3. [12] Let T and I be a t–norm and an implicator in L. Let A be an L–fuzzy set in X. Let R1 and R2 be L–fuzzy relations on X. If R1 ⊆ R2 then

R1↓I A ⊇ R2↓I A
R1↑T A ⊆ R2↑T A
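On a finite universe the two images of Definition 1 reduce to a maximum and a minimum over the universe, so they are easy to compute directly. The sketch below is an added illustration for L = [0, 1] with the minimum t-norm and its residual implicator; the fuzzy set A and the reflexive resemblance relation R are invented, and the final assertion checks the inclusions of Proposition 1.

# Illustrative sketch (added): direct and superdirect images for L = [0, 1]
# on a finite universe, with T = min and its residual (Goedel) implicator.

def t_min(x, y):
    return min(x, y)

def impl_goedel(x, y):
    return 1.0 if x <= y else y

def direct_image(R, A, T):
    # (R up_T A)(y) = sup_x T(A(x), R(x, y))
    X = list(A)
    return {y: max(T(A[x], R[(x, y)]) for x in X) for y in X}

def superdirect_image(R, A, I):
    # (R down_I A)(y) = inf_x I(R(x, y), A(x))
    X = list(A)
    return {y: min(I(R[(x, y)], A[x]) for x in X) for y in X}

X = ["a", "b", "c"]
A = {"a": 1.0, "b": 0.6, "c": 0.1}
# an invented reflexive, symmetric "resemblance" relation
R = {("a", "a"): 1.0, ("b", "b"): 1.0, ("c", "c"): 1.0,
     ("a", "b"): 0.8, ("b", "a"): 0.8,
     ("b", "c"): 0.5, ("c", "b"): 0.5,
     ("a", "c"): 0.2, ("c", "a"): 0.2}

up = direct_image(R, A, t_min)
down = superdirect_image(R, A, impl_goedel)
# Proposition 1: R down A  <=  A  <=  R up A (pointwise), since R is reflexive
assert all(down[y] <= A[y] <= up[y] for y in X)
print(down, A, up)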
Proposition 4. [12] Let T and IT be a t–norm and its residual implicator in L. Let A be an L–fuzzy set in X and R a T–transitive L–fuzzy relation on X. If the partial mappings of T are sup–morphisms then the following are equivalent

(1) A = R↓IT A
(2) A = R↑T A

Proposition 5. [12] Let T and IT be a t–norm and its residual implicator in L. Let A be an L–fuzzy set in X and R a T–transitive L–fuzzy relation on X. If the partial mappings of T are sup–morphisms then

R↑T (R↑T A) = R↑T A
R↓IT (R↓IT A) = R↓IT A

Corollary 1. Let T and IT be a t–norm and its residual implicator in L. Let A be an L–fuzzy set in X and R a T–transitive L–fuzzy relation on X. If the partial mappings of T are sup–morphisms then

R↑T (R↓IT A) = R↓IT A
R↓IT (R↑T A) = R↑T A

We recall some results concerning the interaction of L–fuzzy relational images with union, intersection, and complementation of L–fuzzy sets.

Proposition 6. [12] Let T and IT be a t–norm and its residual implicator in L. Let A and B be L–fuzzy sets in X and R an L–fuzzy relation on X. If the partial mappings of T are sup–morphisms then

R↑T (A ∪ B) = R↑T A ∪ R↑T B
R↓IT (A ∩ B) = R↓IT A ∩ R↓IT B

Proposition 7. [12] Let T and IT be a t–norm and its residual implicator in L. Let R be an L–fuzzy relation on X. For every L–fuzzy set A in X

coNT (R↑T A) ⊆ R↓IT (coNT A)    (3)

If the partial mappings of T are sup–morphisms then

R↑T (coNT A) ⊆ coNT (R↓IT A)    (4)

If NT is involutive then the left and right hand sides in (3) and (4) are equal.

Proposition 8. [12] Let T and N be a t–norm and an involutive negator in L, and let IT,N be the corresponding S–implicator. Let R be an L–fuzzy relation on X. For every L–fuzzy set A in X

coNT (R↑T A) ⊆ R↓IT,N (coN A)
R↑T (coN A) ⊆ coN (R↓IT,N A)

An IFS A is characterized by means of a membership function µA and a non-membership function νA. A natural question which arises is whether the direct image and the superdirect image of A could be defined in terms of the direct and the superdirect image of µA and νA (all under the proper L–fuzzy relations of course). Generally such a "divide and conquer" approach is anything but trivial in IFS theory, and sometimes even impossible. However, some conditions imposed on the logical operators involved allow for results in this direction. Particularly attractive are the t-representable t-norms and t-conorms, and the S-implicators that can be associated with them.
Proposition 9. Let T be a t-representable IF t–norm such that T = (T, S), let N be an involutive negator in [0, 1], and let IS,N be the S-implicator in [0, 1] induced by S and N. Let R be an IF relation on X. For every IFS A in X

R↑T A = (µR ↑T µA , (coN(νR))↓IS,N νA)    (5)

Proof. For all y in X we obtain successively

R↑T A(y) = sup_{x∈X} T(R(x, y), A(x))
         = sup_{x∈X} (T(µR(x, y), µA(x)), S(νR(x, y), νA(x)))
         = ( sup_{x∈X} T(µR(x, y), µA(x)), inf_{x∈X} IS,N(N(νR(x, y)), νA(x)) )
         = ( (µR ↑T µA)(y), (coN(νR)↓IS,N νA)(y) )

Proposition 10. Let S be a t-representable IF t-conorm such that S = (S, T), let N be an involutive IF negator, let IS,N be the IF S–implicator induced by S and N, let N be the negator in [0, 1] induced by N and let IS,N be the S-implicator induced by S and N. Let R be an IF relation on X. For every IFS A in X

R↓IS,N A = ((co(νR))↓IS,N µA , (co(coN(µR)))↑T νA)    (6)

Proof. For all y in X we obtain successively

R↓IS,N A(y) = inf_{x∈X} IS,N(R(x, y), A(x))
            = inf_{x∈X} S(N(R(x, y)), A(x))
            = ( inf_{x∈X} S(N(1 − νR(x, y)), µA(x)), sup_{x∈X} T(1 − N(µR(x, y)), νA(x)) )
            = ( inf_{x∈X} IS,N(co(νR)(x, y), µA(x)), sup_{x∈X} T(co(coN(µR))(x, y), νA(x)) )
            = ( (co(νR)↓IS,N µA)(y), (co(coN(µR))↑T νA)(y) )
Observe that in both (5) and (6) on the "fuzzy level" the images are taken under the membership function µR, or something semantically very much related, such as the N-complement of the non-membership function νR or once even the standard complement of the N-complement of µR. Presented in this way, the resulting formulas look quite complicated. For better understanding, let N be the standard negator, and let the IF relation R be a fuzzy relation, i.e. µR = co(νR); then formulas (5) and (6) reduce to

R↑T A = (µR ↑T µA , µR ↓IS,N νA)
R↓IS,N A = (µR ↓IS,N µA , µR ↑T νA)
Apparently in this case the membership function of the direct image of A is the direct image of the membership function of A, while the non-membership function of the direct image of A is the superdirect image of the non-membership function of A. For the superdirect image of A the dual proposition holds. Finally let us recall a proposition that helps to construct non-trivial T–transitive IF relations (i.e. µR not necessarily equal to co(νR)).

Proposition 11. [6] Let T be a t-representable IF t-norm such that T = (T, S) and such that S(x, y) = 1 − T(1 − x, 1 − y), and let R1, R2 be two fuzzy T–equivalence relations such that R1(x, y) ≤ R2(x, y) for all x and y in X. Then the relation R defined, for x and y in X, by

R(x, y) = (R1(x, y), 1 − R2(x, y))

is an IF T–equivalence relation.

Example 1. Let X = [0, 100], and let the fuzzy TW-equivalence relation Ec on X be defined by

Ec(x, y) = max(1 − |x − y|/c, 0)

for all x and y in X, and with real parameter c > 0. Obviously, if c1 ≤ c2 then Ec1(x, y) ≤ Ec2(x, y). By Proposition 11, (Ec1, co(Ec2)) is an IF TW-equivalence relation.
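Example 1 and the decomposition of Proposition 9 can be reproduced numerically on a discretised version of X = [0, 100]. The sketch below is an added illustration: the IFS A is invented, TW and SW are taken as representants, and the membership and non-membership parts of the direct image are computed separately, in the divide and conquer spirit discussed above.

# Illustrative sketch (added): the relation E_c of Example 1, the IF relation
# (E_c1, co(E_c2)) of Proposition 11, and the decomposition of Proposition 9.

def E(c):
    # E_c(x, y) = max(1 - |x - y| / c, 0)
    return lambda x, y: max(1.0 - abs(x - y) / c, 0.0)

def t_w(x, y):
    return max(x + y - 1.0, 0.0)

def i_sw_ns(x, y):
    # S-implicator induced by SW and the standard negator: min(1 - x + y, 1)
    return min(1.0 - x + y, 1.0)

X = list(range(0, 101, 5))            # a coarse grid on [0, 100]
E30, E50 = E(30.0), E(50.0)
muR = lambda x, y: E30(x, y)          # membership of the IF relation
nuR = lambda x, y: 1.0 - E50(x, y)    # non-membership: co(E50)

# an invented IFS A on the grid
muA = {x: max(1.0 - abs(x - 40) / 20.0, 0.0) for x in X}
nuA = {x: min(abs(x - 40) / 30.0, 1.0 - muA[x]) for x in X}

# Proposition 9 with the standard negator: the membership part of R up A is the
# fuzzy direct image of muA under muR, the non-membership part is the fuzzy
# superdirect image of nuA under co(nuR) = E50.
mu_up = {y: max(t_w(muA[x], muR(x, y)) for x in X) for y in X}
nu_up = {y: min(i_sw_ns(1.0 - nuR(x, y), nuA[x]) for x in X) for y in X}
assert all(mu_up[y] + nu_up[y] <= 1.0 + 1e-9 for y in X)   # the result is again an IFS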
4 Representation of Linguistic Hedges

Since its introduction in the 1960s, fuzzy set theory [34] has rapidly acquired an immense popularity as a formalism for the representation of vague linguistic information. Over the years many researchers have studied the automatic computation of membership functions for modified linguistic terms (such as very cool) from those of atomic ones (such as cool). Whether we are working with fuzzy sets, IFSs or L–fuzzy sets in general, establishing a concrete mathematical model for a given linguistic expression is typically one of the most difficult tasks when developing an application. Therefore it is very useful to have standard representations of linguistic modifiers such as very and more or less at hand, since they allow for the automatic construction of representations for the modified terms from representations of the original terms. The first proposal in this direction was made by Zadeh [33] who suggested to transform the membership function of a fuzzy set A in X into membership functions for very A and more or less A in the following way:

very A(y) = A(y)²
more or less A(y) = A(y)^(1/2)

for all y in X. One can easily verify that the following natural condition, called semantical entailment [24], is respected:

very A ⊆ A ⊆ more or less A

These representations have the significant shortcoming of keeping the kernel and the support, which are defined as
ker A = {y | y ∈ X and A(y) = 1}
supp A = {y | y ∈ X and A(y) > 0}

As a consequence they do not make any distinction between e.g. being old to degree 1 and being very old to degree 1, while intuition might dictate to call a woman of 85 old to degree 1 but very old only to a lower degree. Many representations developed in the same period are afflicted with these and other disadvantages on the level of intuition as well as on the level of applicability (we refer to [22] for an overview), in our opinion due to the fact that these operators are only technical tools, lacking inherent meaning. In fact it wasn't until the second half of the 1990s that new models with a clear semantics started to surface, such as the horizon approach [27] and the context (or fuzzy relational) based approach [10, 15]. The latter can be elegantly generalized to L–fuzzy sets [13], which accounts for its strength.
A characteristic of the "traditional" approaches is that they do not really look at the context: when computing the degree to which y is very A, Zadeh's representation for instance only looks at the degree to which y is A. It completely ignores all the other objects of the universe and their degree of belonging to A. In the context based approach the objects in the context of y are taken into account as well. This context is defined as the set of objects that are related to y by some relation R that models approximate equality. Specifically it is the R-foreset of y. One could say that somebody is more or less adult "if he resembles an adult". Likewise a park is more or less large "if it resembles a large park". In general: y is more or less A if y resembles an x that is A. Hence y is more or less A if the intersection of A and Ry is not empty. Or to state it more fuzzy–set–theoretically: y is more or less A to the degree to which Ry and A overlap, i.e.

more or less A(y) = R↑T A(y)

For the representation of very an analogous scheme can be used. Indeed: if all men resembling Alberik in height are tall, then Alberik must be very tall. Likewise Krista is very kind "if everyone resembling Krista is kind". In general: y is very A if all x resembling y are A. Hence y is very A if Ry is included in A. To state it more fuzzy–set–theoretically: y is very A to the degree to which Ry is included in A, i.e.

very A(y) = R↓I A(y)

Under the natural assumption that R is reflexive (every object is approximately equal to itself to the highest degree), semantical entailment holds (Proposition 1). As mentioned in the introduction, since IFSs are also L–fuzzy sets, a representation for more or less and very is readily obtained.

Example 2. Figure 1 depicts the membership function µA and non-membership function νA of an IFS A in R. A is modified by taking the direct image by means of TW under an IF relation R with a membership function based on the general shape S-membership function

S(x; α, γ) = 0                              if x ≤ α
             2(x − α)²/(γ − α)²             if α ≤ x ≤ (α + γ)/2
             1 − 2(x − γ)²/(γ − α)²         if (α + γ)/2 ≤ x ≤ γ
             1                              if γ ≤ x

for x, α and γ in R and α < γ. Specifically R is defined as

µR(x, y) = S(x; y − 20, y − 5)              if x ≤ y − 5
           1                                if y − 5 < x < y + 5
           1 − S(x; y + 5, y + 20)          if y + 5 ≤ x
Fig. 1. Membership and non-membership functions of A (µA, νA) and of the modified IFS R↑A (µR↑A, νR↑A), plotted over the universe [0, 100]
and νR (x, y) = 1 − µR (x, y), for all x and y in R. This results in the membership and the non-membership function for the modified IFS R↑A also depicted in Figure 1. As Figure 2 illustrates, the modification does not preserve the local hesitation: depending on its context, the hesitation degree of y in A increases, decreases or remains unaltered when passing to R↑A. On the global level however, the overall hesitation seems to be invariant, but this is not in
Fig. 2. Hesitation degrees πA and πR↑A, plotted over the universe [0, 100]
general the case. Let us assume that the conditions of Proposition 9 are fulfilled. Under the natural assumption that R is reflexive, we have

(coN(νR))↓I νA ⊆ νA  and  µA ⊆ µR ↑T µA

If νA is the constant [0, 1] → {0} mapping, modification of the non–membership function will have no effect. Any change in the membership function will therefore give rise to a decrease
of the overall hesitation. Note that this seems natural: the hesitation to call objects A might be greater than the hesitation to call them more or less A.
As far as the authors are aware, the only other existing approach to the modification of linguistic terms modeled by IFSs is due to De, Biswas and Roy [7]. They proposed an extension of Zadeh's representation; it is based on the so–called product A ∩TP B of IFSs A and B. One can easily verify that

A²(u) = (µA(u)², 1 − (1 − νA(u))²)    (7)

in which A² is used as a shorthand notation for A ∩TP A. Furthermore, for A^(1/2) defined in a similar manner as

A^(1/2)(u) = (µA(u)^(1/2), 1 − (1 − νA(u))^(1/2))    (8)

one can verify that A^(1/2) ∩TP A^(1/2) = A, which justifies the notation. Entirely in the line of Zadeh's work, in [7] the authors propose to use A^(1/2) and A² for the representation of more or less and very respectively. As a consequence, the drawbacks listed at the beginning of this section are also inherited, making the approach less interesting from the semantical point of view. Nevertheless Equations (7) and (8) reveal some interesting semantical clues. Indeed, these formulas actually suggest to model very A by

(very µA, not (very not νA))

and more or less A by

(more or less µA, not (more or less not νA))

As such it is an example of what we have called the divide–and–conquer approach. The resulting expressions for the non–membership functions are clearly more complicated than those for the membership functions; they stem from the observation that the complement of the non–membership function can be interpreted loosely as a kind of second membership function. As Proposition 9 indicates, taking the IF direct image ("more or less") involves both a fuzzy direct image ("more or less") and a fuzzy superdirect image ("very"). A dual observation can be made for Proposition 10. Interestingly enough, De, Biswas and Roy [7] do not use both hedges at the same time, but their approach involves negation of the non-membership function. Possible connections between intensifying hedges (like very) and weakening hedges (such as more or less) by means of negation have already intrigued several researchers. In [2] the meaning of "not overly bright" is described as "rather underly bright" which gives rise (albeit simplified) to the demand for equality of the mathematical representations for not very A and more or less not A. Propositions 7, 8, 9, and 10 show that under certain conditions on the connectives involved, the IF relational model indeed behaves in this way.
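On a discretised universe the context-based hedges of this section are just the images of Section 3. The following added sketch uses an invented fuzzy set old on an age universe and an invented resemblance relation, and computes more or less old and very old with TW and its residual implicator, so that semantical entailment can be observed directly.

# Illustrative sketch (added): "very" and "more or less" as relational images,
# for an ordinary fuzzy set on a hypothetical age universe.

def t_w(x, y):
    return max(x + y - 1.0, 0.0)

def i_tw(x, y):
    # residual implicator of TW: min(1 - x + y, 1)
    return min(1.0 - x + y, 1.0)

ages = list(range(0, 101, 5))
old = {a: min(max((a - 50) / 30.0, 0.0), 1.0) for a in ages}   # invented membership for "old"
resembles = lambda x, y: max(1.0 - abs(x - y) / 45.0, 0.0)     # approximate equality (reflexive)

more_or_less_old = {y: max(t_w(old[x], resembles(x, y)) for x in ages) for y in ages}
very_old         = {y: min(i_tw(resembles(x, y), old[x]) for x in ages) for y in ages}

# semantical entailment: very old <= old <= more or less old (pointwise)
assert all(very_old[a] <= old[a] <= more_or_less_old[a] for a in ages)
print(very_old[65], old[65], more_or_less_old[65])   # about 0.33, 0.5 and 0.67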
5 IF Rough Sets

Pawlak [28] launched rough set theory as a framework for the construction of approximations of concepts when only incomplete information is available. Since its introduction it has become a popular tool for knowledge discovery (see e.g. [23] for a recent overview of the theory
and its applications). As a new trend in the attempts to combine the best of several worlds, very recently all kinds of suggestions for approaches merging rough set theory and IFS theory have started to pop up [3, 20, 30, 31]. In the literature there exist many views on the notion "rough set", which can be grouped into two main streams. Several suggested options for fuzzification have led to an even greater number of views on the notion "fuzzy rough set". Typically, under the same formal umbrella, they can be further generalized to the notion "L–fuzzy rough set". Needless to say that, when trying to compare and/or to combine rough set theory, fuzzy set theory and IFS theory, one finds oneself at a complicated crossroads with an abundance of possible ways to proceed. The proposals referred to above all suffer from various drawbacks making them less eligible for applications. In [6] we made the definition of fuzzy rough set by Radzikowska and Kerre [29] — which has existed in a more specific form for more than a decade — undergo the natural transformation process towards intuitionistic fuzzy rough set theory (IFRS theory for short), which led to a mathematically elegant and semantically interpretable concept.

Definition 2 (IF rough set). Let T and I be an IF t–norm and an IF implicator respectively. Let R be an IF T–equivalence relation on X. A couple of IFSs (A1, A2) is called an intuitionistic fuzzy rough set (IFRS) in the approximation space (X, R, T, I) if there exists an IFS A such that R↓I A = A1 and R↑T A = A2. R↓I A and R↑T A are called the lower and upper approximation of A respectively.

Proposition 1 ensures that the lower approximation of A is included in A, while A is included in its upper approximation. Propositions 2 and 3 describe how the lower and upper approximations behave w.r.t. a refinement of the IFS to be approximated, or a refinement of the IF relation that defines the approximation space.

Definition 3. [6] A is called definable in (X, R, T, I) iff R↓I A = R↑T A.

In classical rough set theory, a set is definable if and only if it is a union of equivalence classes. This property no longer holds in IFRS theory. However, if we impose sufficient conditions on the IF t-norm T and the IF implicator I defining the approximation space, we can still establish a weakened theorem, relying on Propositions 4 and 5.

Theorem 1. [6] Let T and IT be an IF t-norm and its residual implicator. If the partial mappings of T are sup-morphisms then any union of R-foresets is definable, i.e.

(∃B)(B ⊆ X and A = ∪_{z∈B} Rz)  implies  R↓IT A = R↑T A

Under the same conditions imposed on T and I as in Theorem 1, the SM-union and TM-intersection of two definable IFSs are definable. This is a corollary of Proposition 6. Finally, the following examples illustrate the concept of an IFRS computed in approximation spaces involving t–representable as well as non t–representable IF t–norms.

Example 3. Figure 3 shows the membership function µA and the non–membership function νA of the IFS A in the universe X = [0, 100]. Using the non-t-representable IF t-norm TL, its residual IF implicator ITL and the IF relation R defined by

R(x, y) = (E40(x, y), 1 − E40(x, y))

for all x and y in [0, 100] (see Example 1 for the definition of E40), we computed the lower approximation of A (= A1) as well as the upper approximation of A (= A2). They are both depicted in Figure 3.
Fig. 3. A (solid lines: µA, νA); upper approximation A2 of A (dashed lines: µA2, νA2); lower approximation A1 of A (dotted lines: µA1, νA1), plotted over the universe [0, 100]
Example 4. Figure 4 shows the same IFS A we used in Example 3. However, to compute its lower approximation A1 and its upper approximation A2, this time we used the t-representable IF t-norm TW, its residual IF implicator and the IF relation R defined by

R(x, y) = (E30(x, y), 1 − E50(x, y))

for all x and y in [0, 100].
Fig. 4. Dashed lines: upper approximation of A; dotted lines: lower approximation of A; solid lines: A, plotted over the universe [0, 100]
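The approximations of Definition 2 are again nothing but the two relational images, so rough approximations in the style of Examples 3 and 4 can be computed on a grid. The sketch below is an added illustration: the IFS A is invented (it is not the one shown in the figures), the IF relation is the one of Example 4, and the IF t-norm TW and its residual implicator ITW are used.

# Illustrative sketch (added): IF rough lower/upper approximations on a grid,
# with the t-representable IF t-norm T_W and its residual implicator I_TW.

def E(c):
    return lambda x, y: max(1.0 - abs(x - y) / c, 0.0)

def T_W(a, b):
    return (max(0.0, a[0] + b[0] - 1.0), min(1.0, a[1] + b[1]))

def I_TW(a, b):
    return (min(1.0, 1.0 + b[0] - a[0], 1.0 + a[1] - b[1]), max(0.0, b[1] - a[1]))

def sup_Lstar(pairs):   # supremum in L*: (max of memberships, min of non-memberships)
    return (max(p[0] for p in pairs), min(p[1] for p in pairs))

def inf_Lstar(pairs):
    return (min(p[0] for p in pairs), max(p[1] for p in pairs))

X = list(range(0, 101, 5))
E30, E50 = E(30.0), E(50.0)
R = lambda x, y: (E30(x, y), 1.0 - E50(x, y))            # IF relation of Example 4

# an invented IFS A on [0, 100]
A = {x: (max(1.0 - abs(x - 50) / 25.0, 0.0), min(abs(x - 50) / 40.0, 1.0)) for x in X}

upper = {y: sup_Lstar([T_W(A[x], R(x, y)) for x in X]) for y in X}   # R up_T A
lower = {y: inf_Lstar([I_TW(R(x, y), A[x]) for x in X]) for y in X}  # R down_I A

# Proposition 1: lower <= A <= upper in the order of L*
ok = all(lower[y][0] <= A[y][0] <= upper[y][0] and
         lower[y][1] >= A[y][1] >= upper[y][1] for y in X)
print(ok)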
6 Conclusion

Partly due to existing studies on connectives in the lattice L∗, the direct and superdirect relational images are readily obtained in an intuitionistic fuzzy setting. Under certain conditions
on the connectives used in the formulas of direct and superdirect image, a meaningful representation for the image of the whole in terms of that of its constituting parts is established. This is not only interesting from the computational point of view, but, using the images to represent linguistic hedges, also helps us to gain more insight in the semantics of the linguistic modification process. Furthermore the IF relational images lead to a mathematically elegant concept of an IF rough set, where they are used to construct upper and lower approximations.
Acknowledgement M. De Cock and C. Cornelis would like to thank the Fund for Scientific Research - Flanders (FWO) for funding the research reported on in this paper.
References

1. Atanassov K T (1999) Intuitionistic Fuzzy Sets. Physica–Verlag, Heidelberg, New York
2. Bolinger D (1972) Degree Words. Mouton, the Hague
3. Chakrabarty K, Gedeon T, Koczy L (1998) Intuitionistic fuzzy rough sets. In: Proceedings of JCIS'98 (4th Joint Conference on Information Sciences), 211–214
4. Cornelis C, De Cock M, Kerre E E (2002) Linguistic Hedges in an Intuitionistic Fuzzy Setting. In: Proceedings of FSKD'02 (1st International Conference on Fuzzy Systems and Knowledge Discovery), Volume I:101–105
5. Cornelis C, Deschrijver G, De Cock M, Kerre E E (2002) Intuitionistic Fuzzy Relational Calculus: an Overview. In: Proceedings of First International IEEE Symposium "Intelligent Systems", 340–345
6. Cornelis C, De Cock M, Kerre E E (2003) Intuitionistic Fuzzy Rough Sets: at the Crossroads of Imperfect Knowledge. Expert Systems 20(5):260–269
7. De S K, Biswas R, Roy A R (2000) Some operations on intuitionistic fuzzy sets. Fuzzy Sets and Systems 114:477–484
8. De Baets B (1995) Solving Fuzzy Relational Equations: an Order Theoretical Approach. PhD thesis, Ghent University (in Dutch)
9. De Baets B, Fodor J, Perny P (2000) Preferences and Decisions under Incomplete Knowledge. Physica–Verlag, Berlin
10. De Cock M, Kerre E E (2000) A New Class of Fuzzy Modifiers. In: Proceedings of 30th IEEE International Symposium on Multiple-Valued Logic, IEEE Computer Society, 121–126
11. De Cock M, Nachtegael M, Kerre E E (2000) Images under Fuzzy Relations: a Master–Key to Fuzzy Applications. In: Ruan D, Abderrahim H A, D'hondt P, Kerre E E (eds) Intelligent Techniques and Soft Computing in Nuclear Science and Engineering. World Scientific, 47–54
12. De Cock M (2002) A Thorough Study of Linguistic Modifiers in Fuzzy Set Theory. PhD thesis, Ghent University (in Dutch)
13. De Cock M, Kerre E E (2002) A Context Based Approach to Linguistic Hedges. International Journal of Applied Mathematics and Computer Science 12(3):371–382
14. De Cock M, Guadarrama S, Nikravesh M (2005) Fuzzy Thesauri for and from the WWW. In: Nikravesh M, Zadeh L A, Kacprzyk J (eds) Soft Computing for Information Processing and Analysis, Studies in Fuzziness and Soft Computing 164, Springer-Verlag
15. De Cock M, Kerre E E (2004) Fuzzy Modifiers based on Fuzzy Relations. Information Sciences 160:173–199
16. Deschrijver G, Cornelis C, Kerre E E (2002) Intuitionistic Fuzzy Connectives Revisited. In: Proceedings of IPMU2002 (9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems), 1839–1844
17. Deschrijver G, Kerre E E (2003) On the Relationship Between Some Extensions of Fuzzy Set Theory. Fuzzy Sets and Systems 133(2):227–235
18. Deschrijver G, Cornelis C, Kerre E E (2004) On the Representation of Intuitionistic Fuzzy t-norms and t-conorms. IEEE Transactions on Fuzzy Systems 12(1):45–61
19. Goguen J (1967) L–fuzzy sets. Journal of Mathematical Analysis and Applications 18:145–174
20. Jena S P, Ghosh S K, Tripathy B (2002) Intuitionistic Fuzzy Rough Sets. Notes on Intuitionistic Fuzzy Sets 8(1):1–18
21. Kerre E E (1992) A Walk through Fuzzy Relations and their Applications to Information Retrieval, Medical Diagnosis and Expert Systems. In: Ayyub B M, Gupta M M, Kanal L N (eds) Analysis and Management of Uncertainty: Theory and Applications. Elsevier Science Publishers, 141–151
22. Kerre E E, De Cock M (1999) Linguistic Modifiers: an overview. In: Chen G, Ying M, Cai K Y (eds) Fuzzy Logic and Soft Computing. Kluwer Academic Publishers, 69–85
23. Komorowski J, Pawlak Z, Polkowski L, Skowron A (1999) Rough Sets: A Tutorial. In: Pal S K, Skowron A (eds) Rough Fuzzy Hybridization: A New Trend in Decision-Making. Springer-Verlag, Singapore
24. Lakoff G (1973) Hedges: a Study in Meaning Criteria and the Logic of Fuzzy Concepts. Journal of Philosophical Logic 2:458–508
25. Nachtegael M, Kerre E E (2000) Classical and Fuzzy Approaches towards Mathematical Morphology. In: Kerre E E, Nachtegael M (eds) Fuzzy Techniques in Image Processing. Springer–Verlag, Heidelberg, 3–57
26. Nachtegael M, De Cock M, Van Der Weken D, Kerre E E (2002) Fuzzy Relational Images in Computer Science. In: de Swart H C M (ed) Relational Methods in Computer Science, Lecture Notes in Computer Science Volume 2561, Springer–Verlag Berlin Heidelberg, 134–151
27. Novák V (1996) A Horizon Shifting Model of Linguistic Hedges for Approximate Reasoning. In: Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, New Orleans, Volume I:423–427
28. Pawlak Z (1982) Rough Sets. International Journal of Computer and Information Sciences 11(5):341–356
29. Radzikowska A M, Kerre E E (2000) A Comparative Study of Fuzzy Rough Sets. Fuzzy Sets and Systems 126:137–156
30. Rizvi S, Naqvi H J, Nadeem D (2002) Rough Intuitionistic Fuzzy Sets. In: Caulfield H J, Chen S, Chen H, Duro R, Honavar V, Kerre E E, Lu M, Romay M G, Shih T K, Ventura D, Wang P P, Yang Y (eds) Proceedings of 6th Joint Conference on Information Sciences, 101–104
31. Samanta S K, Mondal T K (2001) Intuitionistic Fuzzy Rough Sets and Rough Intuitionistic Fuzzy Sets. The Journal of Fuzzy Mathematics 9(3):561–582
32. Wang G, He Y (2000) Intuitionistic fuzzy sets and L-fuzzy sets. Fuzzy Sets and Systems 110(2):271–274
33. Zadeh L A (1972) A Fuzzy–Set–Theoretic Interpretation of Linguistic Hedges. Journal of Cybernetics 2(3):4–34
34. Zadeh L A (1975) The Concept of a Linguistic Variable and its Application to Approximate Reasoning I, II, III. Information Sciences 8:199–249, 301–357; 9:43–80
CHAPTER 11

Thematic Fuzzy Prediction of Weed Dispersal Using Spatial Dataset

Andrew Chiou (1) and Xinghuo Yu (2)

(1) Faculty of Informatics and Communication, Central Queensland University, Rockhampton, 4702 Queensland, Australia
e-mail: [email protected]

(2) School of Electrical and Computer Engineering, RMIT University, Melbourne, 3001 Victoria, Australia
e-mail: [email protected]
Abstract: This paper demonstrates the framework and methodology by which weed population dynamics can be predicted using rule-based fuzzy logic applied to GIS spatial images. Parthenium weed (Parthenium hysterophorus L.) infestation in the Central Queensland region poses a serious threat to the environment and to the economic viability of the infested areas. Government agencies have taken steps to control and manage existing infestation and to curb future spread of this noxious weed. One of the tools used in these strategies is the prediction of parthenium weed population. Conventional weed forecasting methods rely extensively on discrete values in exponential models and linear algorithms. Attempts at predicting weed dispersal have relied heavily on the accuracy of the original charts or images to yield reasonable results; using these methods, weed population forecasts are only as reliable as the data originally provided. This paper demonstrates that, by using GIS spatial images categorised into themes, a fuzzy logic based forecasting methodology can be applied. Fuzzy logic is well suited to this type of problem because of its ability to handle approximate data.

Keywords: fuzzy logic, GIS, spatial image, datasets, weed dispersal, prediction, meta consequent, thematic datasets
1 Introduction

In Central Queensland, parthenium weed (Parthenium hysterophorus L.) has demonstrated the ability to cause significant environmental, health, and financial impacts on the pasture and grazing industries if not controlled or managed properly (Adamson, 1996; Chippendale and Panetta, 1994). However, due to the limited resources and capabilities of government agencies and individuals, there is a need for an alternative means of disseminating expert knowledge. One such alternative is the development of an expert advisory
system, P-Expert, that can be deployed to end users to provide expert knowledge and recommendations on control and management strategies for parthenium weed. P-Expert's usefulness to the community lies in the range of advisory outputs it presents to the user to cater for different requirements. These are management strategies, control strategies, herbicide use, biological control strategies, an explanatory function, and parthenium weed forecasting (Chiou, Yu and Adsett, 2002; Chiou, Yu and Adsett, 2001; Chiou, Yu and Lowry, 2000). In this paper, a detailed description is presented of the methodology employed in using fuzzy rules to forecast parthenium weed population dynamics in Central Queensland. The forecasting mechanism is carried out on spatial images of known infested sites and growth influence factors (e.g. rivers, roads, soil type). These spatial images are separated and categorised into individual layers, where each layer is further sub-classified into themes. By applying a fuzzy membership function to each of these themes, it is possible to aggregate the individual values into a final weed infestation factor. This infestation factor is then applied to the actual spatial image of known infested sites to yield a weed population forecast (Chiou and Yu, 2001).
Using fuzzy rules, this methodology overcomes some of the weaknesses encountered with more conventional methods of predicting parthenium weed infestation. Currently, the process of charting infested areas is still carried out manually and hence available data are only estimates. By applying conventional methods to these estimates, the results obtained would only be as reliable as the accuracy of the original data. In this case, fuzzy logic is well suited to this type of problem, as one of its original purposes is the handling of approximate data (Saint-Joan and Desachy, 1995). Furthermore, the methodology presented in this paper allows weed prediction not only at a large-scale level, but also at a localised level. By allowing qualitative data (e.g. anecdotal references, local information) to be included in the forecasting process, the fuzzy logic mechanism is able to use them to refine large-scale forecasts into localised forecasts.
2 Forecasting Parthenium Weed Population

In parthenium weed control and management strategies, it is of utmost importance that preventive measures be taken to hinder weed growth in uninfested areas (Rejmanek, 2000). In past and present practice, this is achieved by destroying isolated outbreaks and the edges of saturated areas to maintain economic thresholds (Auld, Menz and Monaghan, 1978; Norris, 1999). That is, the cost of destroying a known infestation should not be greater than the cost of the actual damage caused by the infestation itself. This strategy is only as successful or optimal as the weed forecast results permit. However, only a few studies and attempts have been made to study weed dispersal (Auld and Coote, 1990; Ghersa and Roush, 1993; Rejmanek, 2000). Established work in predicting weed population dynamics utilises discrete values in linear algorithms and exponential models extensively (Auld and Coote, 1980; Auld, Hosking and McFadyen, 1982; Cousens, 1985; Mortimer, McMahon, Manlove et al., 1980). Such attempts relied heavily on accuracy in the original charts or images for the forecasting mechanism to be able to yield reasonably acceptable or useful results (Rozenfelds and Mackenzie, 1999). However, due to the coverage of existing infested sites (approximately 170,000 km², or 10% of Queensland), such accuracy can only be achieved with proper and efficient mapping tools and cannot be practically attained in the original spatial images. Data gathering was normally done manually via visualisation, that is, an official would travel in person to the site and give an approximate measurement of that infested site. And in many instances, data collection is accomplished by referring to
anecdotal references given by landowners themselves. The spatial data collected in such a manner are, for the most part, the only existing data available for forecasting parthenium weed population in this region. In addition, conventional forecasting algorithms normally do not have provisions to take into account other influencing factors (e.g. gas exchange, photoperiod). At best, other factors are only taken into account under special circumstances. Conventional methodology requires that the algorithm be re-written each time an additional factor is taken into consideration as a supplementary variable. Given that results may be affected by these factors, it is nevertheless difficult to determine which factor or factors played a major role in influencing the weed dispersal. Moreover, conventional forecasting algorithms normally predict future infestation on a global scale (i.e. large-scale), ignoring local conditions. Landowners whose property is smaller than the scale of the given global resolution would find such forecast results of minimal use (Daehler, 1998). The parthenium weed prediction methodology in P-Expert was developed using fuzzy logic techniques to: 1) predict weed population reasonably well based on approximate data, 2) take into consideration additional parameters without re-writing the algorithm, 3) refine large-scale forecasts to suit localised situations, 4) allow users to determine and inspect individual infestation factors, and 5) adapt the methodology to other weed or plant species.
3 Methodology

This section describes the methodology used in the prediction of parthenium weed population using spatial images from GIS datasets. The overall methodology is in four phases: pre-processing, thematic forecast, global forecast and local forecast. All references to examples refer to the diagram in Figure 1.
3.1 Phase 1: Pre-processing

This step involves preparing the proper data structures to facilitate analysis of the spatial image datasets. The spatial image representation of the GIS datasets is first separated into different layers. Each layer is sectioned into a matrix, where each cell represents a resolution of 10² m in this demonstration. In practice, however, this resolution is determined by the level of detail in the original charts and the resolution required by the fuzzy membership functions. Adhering to standard GIS practices (Demers, 2000, p98-109; Tomlin, 1990, p46-54), each layer comprises themes, τ1, τ2, …, τn. Themes are made up of features, Fs1, Fs2, …, Fsn, indicating natural or man-made phenomena found in maps (e.g. rivers, roads, buildings). In Figure 1, each theme contains only one feature. Each of the themes is represented as a raster image in the matrices of each layer. In practice, this method of structuring the spatial image helps accommodate fuzzy variables and membership functions (Dragicevic and Marceau, 1999). In Figure 1, the fuzzy representation is: 1) all layers represent a universe of discourse, and 2) each theme represents a fuzzy variable. There are two special themes, T1 and τnull. T1 is a theme reserved for spatial images representing known locations of infested sites, while τnull is a reserved theme for regions labelled as null-zones. Null-zone areas are non-negotiable regions, where no known plant species will propagate at all. In actual weed control and management practices, these areas are occupied by man-made structures, such as buildings, washdown and quarantine areas. Null-zones are significant in the forecasting
process as explained in the following sections. Hence, in Figure 1, the themes are as follows:

T1 : known infestation sites
τ1 : river feature
τ2 : road feature
τ3 : soil type feature
τnull : washdown areas    (1)
3.2 Phase 2: Thematic Forecast

This phase requires that a thematic forecast be carried out individually on each of τ1, τ2 and τ3. In this way, each factor that influences the propagation of parthenium weed can be segregated from the other factors. Forecasting is performed layer by layer on each corresponding cell to determine its infestation factor, that is, a fuzzy consequent membership value. Membership parameters can be derived from the dispersal rates of weeds given by Cousens (1995, p57-85). In each theme, a thematic forecast function is assigned to all cells, such that the function is a fuzzy If-Then rule determining the infestation factor. The function for each current cell, Cm, in τn, yielding its predicted consequent infestation factor In, is

flayer(τn : Cm) = If (Cm is proximity_Fsn) Then (infestation_factor is In)    (2)

Using rule (2), based on the example in (1), the actual fuzzy If-Then rule implementation block for the thematic forecast, excluding the special themes τnull and T1, is

begin
  If (current_cell is close_to_river) Then (infestation_factor is likely1)
  If (current_cell is close_to_road) Then (infestation_factor is likely2)
  If (current_cell is close_to_soiltype) Then (infestation_factor is likely3)
end    (3)

Hence, the thematic forecast phase has the function

ftheme(T1 : τ1, τ2, τ3) = (T1 : τ1f, τ2f, τ3f)    (4)

where τ1f, τ2f and τ3f are the resulting themes after the flayer function has been performed on all cells at each layer of the respective themes. Note that the predicate In (i.e. likelyn) of the linguistic variable infestation_factor from (2) and (3) is not identical in each theme. Since each theme has only one membership function assigned, linguistic predicates such as low, medium or high do not apply. Instead, In is used to indicate the fuzzy sets of the different membership functions for each layer. For example, I1 indicates the infestation_factor for τ1 and I2 indicates the infestation_factor for τ2.
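A thematic forecast in the style of rules (2) and (3) can be sketched very simply: each theme contributes one proximity membership function, and the firing strength of its rule is recorded per cell as that theme's infestation factor. The code below is an added illustration with invented membership parameters and an invented grid; it is not the P-Expert implementation.

# Illustrative sketch (added): one thematic forecast layer, in the spirit of rule (2),
# with an invented proximity membership function.

def close_to(distance, full=1, zero=5):
    # membership of "close to feature": 1 within `full` cells, 0 beyond `zero` cells
    if distance <= full:
        return 1.0
    if distance >= zero:
        return 0.0
    return (zero - distance) / float(zero - full)

def thematic_forecast(feature_cells, size):
    # f_layer(tau_n : C_m): the rule firing strength is used directly as the
    # infestation factor I_n of the current cell C_m.
    layer = {}
    for i in range(size):
        for j in range(size):
            d = min(abs(i - fi) + abs(j - fj) for fi, fj in feature_cells)
            layer[(i, j)] = close_to(d)
    return layer

# a 10 x 10 grid with a "river" feature running down one column
river_cells = [(i, 3) for i in range(10)]
tau1_f = thematic_forecast(river_cells, 10)
print(tau1_f[(0, 3)], tau1_f[(0, 5)], tau1_f[(0, 9)])   # 1.0, 0.75, 0.0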
3.3 Phase 3: Global (Large-Scale) Forecast

In this phase, all the layers from the previous thematic forecast, τ1f, τ2f, τ3f, are mapped onto the known infestation theme, T1, to form a composite, T2, which gives the forecast of infestation sites on a global scale. It must be pointed out that the term global does not refer to a worldwide scale, but rather to the whole given area represented on a spatial image (e.g. state, city, county). Based on parthenium weed dispersal and life-cycle parameters, the length of time between T1 and T2 is approximately 28 days. Inserting the themes T1, τ1f, τ2f, τ3f and τnull from (1) into the following global forecast function we have

fglobal(T1 : τ1f, τ2f, τ3f, τnull) = T2    (5)

Therefore, by recursively iterating (5), we can obtain a forecast in multiples of approximately 28 days. The actual implementation of this process is the aggregation of all the consequent infestation factors, In, of each corresponding cell on each layer to form a new membership function. Hence, this process results in a DOF (degree of fulfilment). The DOF is a straightforward implementation of a number of standard defuzzification methods (Berkan and Trubatch, 1997, p70-73). The DOF for every cell is mapped onto every cell in T1, to result in T2. Thus, for every cell in τ1f, τ2f and τ3f, we have an implication that maps onto a corresponding cell in T1:

τ1f(Cm, I1f) Φ τ2f(Cm, I2f) Φ τ3f(Cm, I3f) → T1(Cm)    (6)

where Φ is an aggregation operator. Recall from (1) that we have a special theme τnull that requires special treatment. As this theme represents null-zones with cells of null values, the standard aggregation process cannot be carried out for τnull. Instead, a cell from τnull that has a null value shall nullify all previously aggregated values from τ1f, τ2f, τ3f for the corresponding cell. Thus, from (6), if the aggregation process encounters a null-zone, then the resulting forecast for that cell in T1 is also null:

τ1f(Cm, I1f) Φ τ2f(Cm, I2f) Φ τ3f(Cm, I3f) Φ τnull(null) → T1(null)    (7)

However, not all regions in a τnull theme are null-zones. In the case where the aggregation process encounters a cell that is not a null-zone in τnull, the cell in question is simply ignored and the process reverts to (6). Note that null-zones in parthenium weed prediction are of great significance in best management practices as, by law, it is required that 'quarantine' areas such as washdown locations are made available in heavily infested sites to allow vehicles or persons to be sanitised to prevent spread of seeds to uninfested areas (Parthenium Action Group, 2000). Hence the emphasis placed on this issue.
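The aggregation in (6) and (7) can likewise be sketched in a few lines. In the added illustration below the aggregation operator Φ is taken, purely for the example, to be the maximum of the thematic infestation factors, a null-zone forces the forecast of a cell to a null value, and an already infested cell simply remains infested; none of these specific choices are prescribed by the text.

# Illustrative sketch (added): global forecast by aggregating thematic layers,
# with null-zones overriding the aggregated degree of fulfilment.

def aggregate(values):
    # placeholder for the aggregation operator Phi; here simply the maximum
    return max(values)

def global_forecast(thematic_layers, null_zone, known_infestation):
    T2 = {}
    for cell, infested in known_infestation.items():
        if null_zone.get(cell, False):
            T2[cell] = None                        # rule (7): a null-zone nullifies the cell
            continue
        dof = aggregate([layer[cell] for layer in thematic_layers])   # rule (6)
        # assumption for the example: an already infested cell stays infested,
        # otherwise the aggregated DOF becomes the forecast
        T2[cell] = max(infested, dof)
    return T2

cells = [(i, j) for i in range(3) for j in range(3)]
tau1_f = {c: 0.2 for c in cells}; tau1_f[(0, 1)] = 0.9
tau2_f = {c: 0.4 for c in cells}
tau3_f = {c: 0.1 for c in cells}
null_zone = {(2, 2): True}
T1 = {c: 0.0 for c in cells}; T1[(0, 0)] = 1.0

T2 = global_forecast([tau1_f, tau2_f, tau3_f], null_zone, T1)
print(T2[(0, 0)], T2[(0, 1)], T2[(2, 2)])   # 1.0, 0.9, None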
Fig. 1. The methodology requires each feature (e.g. rivers, roads, soil type) to be layered into themes to facilitate the assignment of appropriate fuzzy membership functions. (The diagram shows the static themes — river, road, soil and null-zone — passing through Phase 1: Pre-processing and Phase 2: Thematic Forecast, being combined with the known infestation theme in Phase 3: Global Forecast to give DOFglobal, and being refined with the dynamic themes such as weather and flood in Phase 4: Local Forecast to give DOFlocal; a cell inside a null-zone nullifies all aggregated values, while a cell outside a null-zone simply ignores τnull.)
3.4 Phase 4: Local Forecast

In a global forecast, the factors influencing parthenium weed growth can mostly be categorised as static themes. Static themes normally do not, and are not expected to, change in the time frame of the forecasting process. However, the results from such a prediction are more useful on a global scale than for a localised condition. As most spatial images only indicate infested sites at larger scales, they do not cover sites that may have been considered insignificant in the original data collection process. Landowners on these properties can normally give useful input (i.e. anecdotal references, local knowledge) to the forecasting process that is not available elsewhere. Coincidentally, most of these inputs coincide with factors termed dynamic themes. These factors can be conveniently layered into individual themes as in Phase 1. In Figure 1, the themes are

τ4 : local weather
τ5 : local flood areas    (8)

In this case, to obtain a local forecast of parthenium infestation, these dynamic factors are mapped against the original global forecast obtained from (5). Hence, the function is

flocal(fglobal(T1 : τ1f, τ2f, τ3f, τnull), τ4, τ5) = T3    (9)

Observe from (9) that, since anecdotal references are the source of the data and the membership values are arbitrarily adjusted, the dynamic themes need not undergo the thematic forecasting phase like their static theme counterparts, and are therefore represented by τ4, τ5 without the subscript f. T3 is the resulting local forecast. Since the size of the spatial image that T3 represents is less than or equal to T2 (i.e. T3 ⊆ T2), there can be more than one single instance of T3. That is, as many T3 as needed can be generated, each independently of other occurrences of T3. Each instance of T3 can be generated from additional sets of dynamic themes other than the ones shown in (8). Note that dynamic themes can also contain previously uncharted null-zones.
4 Meta Consequent Function

In the case of local forecasting, which has different consequences depending on different localised conditions, an operator has been defined specifically as part of P-Expert's meta consequent function (Chiou, Yu and Lowry, 2000). The operator Case-Of is a passive operator. That is, it does not operate directly on the operands, but rather performs a navigational function. In this way, in the instance of a one-to-many mapping consequent, the multi consequent function in P-Expert will allow branching to different membership functions within the same fuzzy variable. Depending on its complexity, the meta-consequent function can be categorised as a simple consequent, multi consequent or complex consequent function.
4.1 Simple Consequent Function

Lee and Kim (1995) have previously proposed a post adjustment mechanism to incorporate real world data with data from the expert system's knowledge base. While the knowledge base contains pre-defined data collected from past instances, real world data represent the trends of external factors unknown at run-time (e.g. weather anomalies) and personal views (e.g. anecdotal references) not contained in the knowledge base. This post adjustment mechanism has been extensively modified for P-Expert's purposes. Therefore, in the instance of a one-to-one mapping consequent, the meta-consequent function caters for the post adjustment by applying a BUT operator. This is expressed as

(IF x is a THEN y is b) BUT (z is c THEN y is d)    (10)

where the antecedent z is c is not a condition, but a factual statement. Unlike an ELSE operator, the BUT part of the If-Then rule is always true. That is, the BUT operator is evaluated and executed under all conditions. It is used to supersede the consequent of the original rule on the left-hand side of the BUT operator. From (10), this implies that b is not equal to d. However, under certain circumstances it is possible that b has the same value as d, that is, (b = d). In the case of b = d, the consequent is the same regardless of the LHS values. This case is highly unlikely, and it is a mechanism provided to negate the effects of the BUT operator in rare circumstances as a 'catch-all' condition.
4.2 Multi Consequent Function In the instance of a one-to-many mapping consequent, the multi consequent function allows branching to different membership functions within the same fuzzy variable. This will allow the same variable to have ownership over different sub-domains. We introduce a new operator, CASE-OF, to facilitate the operation of this function. The simplified expression is, (IF x is a THEN CASE-OF m) {CASE-OF m1:y1 is b1; . . CASE-OF mn:yn is bn }
(11)
4.3 Complex Consequent Function In a one-to-one-to-many instance, the operators, BUT and CASE-OF are combined to give a mechanism to override the consequent of a rule, and yet facilitating branching under different cases. The simplified expression is a combination of (10) and (11) giving, (IF x is a THEN y is b) BUT (z is c THEN CASE-OF m) {CASE-OF m1:w1 is d1; . . CASE-OF m n:wn is dn }
(12)
Thematic Fuzzy Prediction of Weed Dispersal Using Spatial Dataset
155
Here, we see that the BUT operation overrides the default consequent of the If-Then rule, and at the same time the overriding consequent is dependent on the real-world input. In this example, the location determines the final outcome.
5 Example 5.1 Progressive Linear Dispersal: A Pedagogical Example The basis of the range of generic weed dispersal parameters is based on the Cousens and Mortimer (1995) dispersal vector, where dispersal is likely related to plant height, distances are given as multiple of height (H). Even though Cousens and Mortimer stated that this dispersal vector data was based on very limited available data, conforming it to fuzzy membership in this project was uncomplicated. There are many other factors for parthenium weed dispersal, such as wind direction, season and soil type. In this example, these factors are omitted as not to detract readers from the underlaying methodology. However, even with some of these factors omitted, it yields a totally acceptable prediction of weed infestation, as we shall see. This is to the credit of fuzzy logic, that it inherently retains its robustness towards incomplete or limited knowledge of other influencing factors. Using the known fact that a mature parthenium weed has the potential of growing up to an average height of 1 metre, we have H=1. As parthenium weed is categorised as the ‘shakers’ type, where propagule dispersal is by vigorous shaking of the branches before the seedlings fall off the parent plant and is carried further by the wind factor, we shall use the wind-assisted/‘shaker’ factor in the dispersal vector (Table 1). Table 1. The Cousens and Mortimer (1995) weed dispersal vector. The distance of dispersal is a multiple of the plant height (H). The maximum dispersal distance is dmax and d50 is the dispersal distance of where 50% of propagules are found. Vector
dmax
d50
Unaided Explosive Wind: ‘Gliders/parachuters’ ‘Shakers’ ‘Tumblers’ Water Animals: Farm & birds Ants Cultivation Harvesting
1H 3H
0.5H 05-1.0H
100 m 5H Several km >100 m
2H 2H ? ?
>100 m 20 m (?) 2-5 m 20-100 m
>100 m 5 m (?) 0.5 m 5-50 m
From the table, we can determine that the maximum distance (dmax) parthenium weed can disperse is over 5 metres (with wind assistance) from the parent plant. An approximate 50% (d50) of the seedlings can be found dispersed unaided within 2 to 3 metres of the parent
156
A. Chiou and X. Yu
plant. In this example, only wind assisted dispersal factors for the ‘shakers’ type is developed into fuzzy membership functions to simplify explanation. Hence, the antecedent (LHS) linguistic variable, area, is concerned with how far, midpoint or near the proximity of an infested cell. The consequent (RHS) is the infestation factor with the predicates low, medium or hight. The If-Then fuzzy rule block is,
If area is near Then infestation factor is high If area is midpoint Then infestation factor is medium If area is far Then infestation factor is low Hence, the fuzzy sets for LHS and RHS are in Figure 2. near
midpoint
far
0 1 2 3 4 5 distance from infestation
low
medium
high
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 infestation factors
Fig. 2. The membership for the LHS predicates is the distance of predicted area from a known infestation and the RHS memberships are the level of infestation factor. In this purely pedagogical example, if weed dispersal is linear (i.e. propagule spreads toward a single direction with wind assistance in that same direction), it is possible to obtain a prediction showing a progressive level of weed infestation. As parthenium weed requires approximately 28 days from seed to full maturity capable of flowering, prediction for this particular weed’s dispersal dynamics is set at a cycle of every 28 days, Cycle28. Therefore, to predict the uni-direction of weed dispersal for three parthenium weed cycle can be implemented in the following program block, For Cycle28 = 1 To 3 Predict_Dispersal(Cycle28) Next Cycle28 Figure 3 shows this progression using a vector of linear cells. Each cell represents a 1m2 area and the level of infestation through time is uni-directional (towards North). The originating cell at Cycle28 0 has an infestation point of 60% saturation. After a cycle of 28 days, the predicted dispersal is Northwards covering five more cells (i.e. within 5 to 6 metres of the original infested site). However, the saturation of each cell gets lighter as it departs further from the original cell. As the prediction is repeated for two more cycles (Cycle28 2 to Cycle28 3), the dispersal gets further away from the original infested site. Note that the level of saturation does not refer to the height or maturity of the plant, but refers to the approximate number of physical weed found within each 1m2 area. Also, an area that has been infested can only disperse to uninfested areas, it cannot reinfest itself.
Thematic Fuzzy Prediction of Weed Dispersal Using Spatial Dataset
157
direction of dispersal with wind assistance Cycle28 0
Cycle28 1
Cycle28 2
Cycle28 3
Fig. 3. Progressive linear dispersal in a uni-direction showing the original infestation through a period of three cycles. Heavy infestation is shaded with a darker colour band (in this chapter, colour bands have been converted to a grey-scale). In practice, infestation of 10% is normally ignored. In this case, a previously infested area increases its saturation by 10% for every cycle. Therefore, a previously infested area can increase the number of plants in its current area through several cycles until it attains 100% saturation. That is, the number of plants cannot increase any more within that given area. In this case, all subsequent seedlings borne after the 100% saturation point has been attained will not germinate due to competition for resources within the same infested area. In this case, the parthenium weed completely dominates the pasture, resulting in a monoculture where no other plants can survive within this area.
5.2 Scale-Colour Bands In the P-Expert expert system, the level of predicted infestation for each cell is indicated by different scale-colour bands (i.e. dark colour = heavy infestation, light colour = light infestation). These colours are not arbitrarily assigned, but is part of the requirements for GIS spatial image representation using scale-colour bands as meaningful legends. In practice, it is extremely difficult for users to read actual printed numbers on spatial images. Hence, users have to rely on scale-colour for efficient visual cues. Even though it is common for current hardware display units to be capable of displaying and outputting spatial images with greater than 1,024 scale-colours, this level of accuracy is not fully utilised here as the required scale-colour goes up to a maximum 100 (i.e. 100% saturation).
6 Case Study The parthenium weed infestation prediction methodology presented in this paper is part of a three-year long on-going project, P-Expert Project, jointly funded by the Australian Re-
158
A. Chiou and X. Yu
search Council (ARC), Queensland Department of Natural Resources (QDNR) and Central Queensland University (CQU). Printed spatial images (i.e. maps) of known location of parthenium weed infestation were provided by QDNR for the years 1981, 1985, 1989, 1992 and 2000 for the state of Queensland. A specific region of known parthenium weed location, covering approximately 300,000km2 of Central Queensland has been selected for the case study. The themes and features of the selected region were entered into a specialised GIS editor specially developed for this project with provision to include fuzzy If-Then rules and associated variables and membership functions. The membership functions were adjusted and fine-tuned from one known period to another known period until predicted infested sites corresponded within the known sites. In this way, the adjusted membership functions can be applied to future predictions.
7 Results A 50 by 50 metre (shown as 50 by 50 grid on the spatial image) local area was selected from the property of a volunteer stakeholder. This area has not been significantly infested by parthenium weed except for several isolated known patches. Throughout six parthenium weed cycles (approximately 6 months), records and manual field survey was completed by the stakeholder. For each cycle, the known infestation was drafted onto a grid map of that selected area. Based on the first original grid map at Cycle28 0, the fuzzy prediction algorithm performed a predicted dispersal of parthenium weed of the selected area for six cycles. As it is impractical to compare the actual physical number of weeds in this sort of survey, comparison of results is performed solely on spatial images with scale-colour banding. In this chapter, the colour bands have been reproduced as a grey-scale.
Thematic Fuzzy Prediction of Weed Dispersal Using Spatial Dataset
159
N
>0 >10 >20 >30 >40 >50 >60 >70 >80 >90 Level of infestation (%)
Fig. 4. The original known parthenium weed infestation as at Cycle28 0 recorded on a 50 by 50 grid. Each cell is an area of 1m2. For readability, all other features (e.g. contours, landmarks) of the area have been removed.
N
>0 >10 >20 >30 >40 >50 >60 >70 >80 >90 Level of infestation (%)
Fig. 5. Actual known infestation of the same 50 by 50 grid local area after six cycles (approximately six months).
160
A. Chiou and X. Yu
Figure 4 shows the original area, with known isolated patches. Figure 5 and 6 respectively shows the previously known and predicted dispersal for the parthenium weed after six cycles. As can be seen from the two images, the predicted dispersal closely matches the known infestation. In actual practice, the level of accuracy shown by the predicted dispersal in Figure 6 is totally adequate and acceptable to help decide the next course of action to carry out.
N
>10 >20 >30 >40 >50 >60 >70 >80 >90 >100 Level of infestation (%)
Fig. 6. This is the predicted dispersal of parthenium weed based on the original known infestation from Figure 4 using the fuzzy algorithm.
8 Conclusion Parthenium weed infestation in the Central Queensland region poses a serious threat to the environment and also economic viability of areas occupied by agricultural industries. Government agencies have taken steps to control and manage existing infestation and also to curb the future spread of this noxious weed. One of the tools used in these strategies is the prediction of parthenium weed population. With the foreknowledge of where and what location the next outbreak would occur, preventive steps can be taken to hinder the weed dispersal from actually taking place. Conventional weed forecasting methods utilises discrete values in exponential models and linear algorithms extensively. Attempts at predicting weed dispersal relied heavily on accuracy in the original charts or images to yield reasonable results. However, since early data were poorly collected, the results of weed population forecasting are only as reliable as the data originally provided. In this case, fuzzy logic is best suited to this type of challenge as its original requisite is in handling approximate data.
Thematic Fuzzy Prediction of Weed Dispersal Using Spatial Dataset
161
This chapter shows, how by using GIS spatial image datasets categorised into themes, a fuzzy logic based forecasting methodology can be performed. This methodology would allow us to: 1) predict weed population reasonably well based on approximate data, 2) take into consideration additional parameters without re-writing the algorithm, 3) refine largescale forecasts to suit localised situations, 4) allow users to determine and inspect individual infestation factors, and 5) adapt the methodology to other weed or plant species.
References Adamson, D. C. (1996). Determining the Economic Impact of Parthenium on the Australian Beef Industry: A Comparison of Static and Dynamic Approaches. MSc Thesis. Brisbane, University of Queensland. Auld, B. A. and Coote, B. G. (1980). "A Model of a Spreading Plant Population." OIKOS 34: 287-92. Auld, B. A. and Coote, B. G. (1990). "Invade: Towards the Simulation of Plant Spread." Agricultural Ecosystem & Environment 30: 121-128. Auld, B. A., Hosking, J. and McFadyen, R. E. (1982). "Analysis of the Spread of Tiger Pear and Parthenium Weed in Australia." Australian Weeds 2: 56-60. Auld, B. A., Menz, K. M. and Monaghan, N. M. (1978). "Dynamics of Weed Spread: Implications for Policies of Public Control." Protection Ecology 1: 141-148. Berkan, R. C. and Trubatch, S. L. (1997). Fuzzy Systems Design Principles: Building Fuzzy IF-THEN Rule Bases, IEEE Press. Chiou, A., Yu, X. and Adsett, K. (2002). P-Expert Project: Development of a Prototype Expert System for Control and Management of Parthenium in Central Queensland. Transforming Regional Economies and Communities with Information Technology. Green Greenwood Publishing. in press. Chiou, A. and Yu, X. (2001). Prediction of Parthenium Weed Infestation Using Fuzzy Logic Applied to Geographic Information System (GIS) Spatial Image. Proceedings of 10th IEEE Inernational Conference on Fuzzy Systems 2001, Vol. 3. University of Melbourne. Chiou, A., Yu, X. and Adsett, K. (2001). Parthenium Weed Problem in Regional Queensland: The Role of an Expert Advisory System. Proceedings of the Third International Information Technology in Regional Areas Conference. Central Queensland University, CQU Press. Chiou, A., Yu, X. and Lowry, J. (2000). P-Expert: A Prototype Expert Advisory System in the Management and Control of Parthenium Weed in Central Queensland. Second International Discourse With Fuzzy Logic In The New Millennium. Physica-Verlag. Chippendale, J. F. and Panetta, F. D. (1994). "The Cost of Parthenium Weed to the Queensland Cattle Industry." Plant Protection 9: 73-76. Cousens, R. (1985). "A Simple Model Relating Yield Loss to Weed Density." Annals of Applied Biology 107: 239-252. Cousens, R. and Mortimer, M. (1995). Dynamics of Weed Population. Cambridge, Cambridge University Press. Daehler, C. (1998). Comparison of Region-Specific Models for Predicting Plant Invaders: Prospects for a General Model. VIIth International Congress of Ecology Proceedings: 103. Demers, M. N. (2000). Fundamentals of Geographic Information Systems. New York, John Wiley & Sons.
162
A. Chiou and X. Yu
Dragicevic, S. and Marceau, D. J. (1999). "Spatio-Temporal Interpolation and Fuzzy Logic for GIS Simulation of Rural-to-Urban Transition." Cartography and Geographic Information Science 26(2): 125-137. Ghersa, C. M. and Roush, M. L. (1993). "Searching for Solutions to Weed Problems." BioScience 43(2): 104-109. Lee, K.C. and Kim W.C. (1995). “Integration of Human Knowledge and Machine Knowledge by Using Post Adjustment: its Performance in Stock Market Timing Prediction.” Expert Systems 12(4): 331-338. Mortimer, A. M., McMahon, D. J., Manlove, R. J. and Putwain, P. D. (1980). The Prediction of Weed Infestations and Cost of Differing Control Strategies. Proceedings 1980 British Crop Protection Conference - Weeds. Brighton, England. 2: 415-422. Norris, R. F. (1999). "Ecological Implications of Using Thresholds for Weed Management." Journal of Crop Production 2(1): 31-58. Parthenium Action Group (2000). Parthenium Weed: Best Management Practice. Brisbane, DPI Publications. Rejmanek, M. (2000). "Invasive Plants: Approaches and Predictions." Austral Ecology 25(5): 497-506. Rozenfelds, A. C. F. and Mackenzie, R. (1999). The Weed Invasion in Tasmania in the 1870s: Knowing the Past to Predict the Future. 12th Australian Weed Conference Papers and Proceedings: 581-583. Saint-Joan, D. and Desachy, J. (1995). A Fuzzy Expert System for Geographical Problems: an Agricultural Application. Proceedings of IEEE International Conference on Fuzzy Systems. 2: 469-476. Tomlin, C. D. (1990). Geographic Information Systems and Cartographic Modeling. Englewood Cliffs, NJ, Prentice-Hall Inc.
CHAPTER 12 Optimization of Image Compression Method Based on Fuzzy Relational Equations by Overlap Level of Fuzzy Sets Hajime Nobuhara1 , Eduardo Masato Iyoda1 , Kaoru Hirota1 , and Witold Pedrycz2 1 2
Tokyo Institute of Technology {nobuhara, iyoda, hirota}@hrt.dis.titech.ac.jp University of Alberta
[email protected]
Abstract: A design method of coders on YUV color space is proposed based on an overlap level of fuzzy sets, in order to optimize the image compression/reconstruction method based on fuzzy relational equations. In the YUV color representation of the original image, the Y plane contains more information than the U and V planes, in terms of human perception. Therefore, coders with different sizes (Y plane coders bigger than U and V planes) lead to more effective compression/reconstruction, where the appropriate coders for YUV planes can be constructed based on an overlap level of fuzzy sets. Through image compression/reconstruction experiments using 100 typical images (extracted from Corel Gallery, Arizona Directory), it is confirmed that the peak signal to noise ratio of the proposed method increases at a rate of 7.1% ∼ 13.2%, compared to the conventional method, when compression rates range from 0.0234 ∼ 0.0938. Keywords: Fuzzy Relations, Fuzzy Relational Equations, Image Compression, Overlap of Fuzzy Sets.
1 Introduction The Image Compression and reconstruction method based on Fuzzy relational equations (ICF) has been proposed in [1] as an image compression method with practical applicability (e.g. image database [12][13][14][15], digital watermarking [7]) Improvements to ICF have been proposed to achieve faster compression/reconstruction times [5][6]. Currently, ICF is performed over the RGB color space [8], i.e., the original image is considered as three fuzzy relations (R, G, and B fuzzy relations) and these fuzzy relations are compressed/reconstructed by coders with the same size. In this paper, in order to optimize the quality of reconstructed images, the ICF on YUV color space is proposed. In the proposed method, the original image is represented as Y,U, and V fuzzy relations, where the Y plane includes more information than the U and V planes, in terms of human perception. By constructing coders with different sizes for
H. Nobuhara et al.: Optimization of Image Compression Method Based on Fuzzy Relational Equations by Overlap Level of Fuzzy Sets, Studies in Computational Intelligence (SCI) 2, 163– 177 (2005) c Springer-Verlag Berlin Heidelberg 2005 www.springerlink.com
164
H. Nobuhara et al.
YUV planes, particularly, by adjusting the size of Y plane coders bigger than that of U and V planes, more efficient compression/reconstruction is achieved. Although there are indexes (parameters) to construct appropriate coders for compression/reconstruction, these indexes depend on the size of the coders. In order to construct appropriate coders with different sizes on the YUV color space, an invariant index called an overlap level of fuzzy sets is proposed. The effectiveness of ICF on YUV using coders with different sizes, constructed based on the overlap level, is shown through experiments using the Corel image gallery. ICF on RGB color space is formulated in Section 2. In Section 3, ICF on YUV color space, and the overlap level for appropriate coders with different sizes are proposed. Section 4 shows image compression and reconstruction experiments using 100 typical natural images (extracted from Corel Gallery, Arizona Directory) in order to compare ICF on YUV with ICF on RGB.
2 Formalization of ICF on RGB Color Space The lossy Image Compression and reconstruction method based on Fuzzy relational equations (ICF) treats an original gray scale image of size M × N as a fuzzy relation R ∈ F(X × Y), X = {x1 , x2 , . . . , xM }, Y = {y1 , y2 , . . . , yN } by normalizing the intensity range of each pixel into [0, 1]. In RGB color space [8], an original image is expressed by three fuzzy relations, i.e., P = {P(R) , P(G) , P(B) } ⊂ F(X × Y) denote the red, green, and blue planes of the color image, respectively. ICF compresses the original image P into an image C = {C(R) ,C(G) ,C(B) } ⊂ F(I × J) of size I × J by max t-norm composition, i.e.,
(X) (X) , (1) C(X) (i, j) = max B j (y) t max Ai (x) t P(X) (x, y) y∈Y
x∈X
(X)
(X)
where t denotes a continuous t-norm [4], and Ai ∈ A(X) ⊂ F(X), B j ∈ B(X) ⊂ F(Y) are the coders (Figure 1). In this chapter, only max t-norm composite fuzzy relational equations are considered, but other types of fuzzy relational equations, e.g., min s-norm composite, adjoint of max t-norm, and adjoint min s-norm composite fuzzy relational equations can also be applied to ICF [3] [9] [10] [11]. The super script (X) denotes (R), (G), and (B), respectively. The coders (Gaussian type) of R, G, and B planes are defined by (X)
(X)
(X)
A(X) = {A1 , A2 , . . . , AI }, 2 iM (X) −m Ai (xm ) = exp −Sh , I
(2) (3)
(m = 1, 2, . . . , M), and (X)
(X)
(X)
B(X) = {B1 , B2 , . . . , BJ }, 2 jN (X) −n B j (yn ) = exp −Sh , J
(4) (5)
Optimization of Image Compression Method Based on Fuzzy Relational Equations
165
(n = 1, 2, . . . , N), (X) = (R), (G), (B). Triangular and parabolic membership functions can also be considered as coders for ICF. The coders defined above are uniform, i.e, the fuzzy sets have the same size and shape for each plane. This is done to preserve the color balance in the reconstructed image. The comI×J . The image reconstrucpression rate ρ of an image compressed by ICF is given by M×N tion process of ICF corresponds to an inverse problem, under the condition that the compressed image C(X) and coders A(X) , B(X) , ((X) = (R), (G), (B)) are given (Figure 2). A method for solving the inverse problem has been presented in [6], and the reconstructed image P˜ = {P˜ (R) , P˜ (G) , P˜ (B) } ⊂ F(X × Y) is given by (X) (X) P˜ (X) (x, y) = min B j (y) ϕt Ai (x) ϕt C(X) (i, j) , (6) j∈J
where ϕt denotes the t-relative pseudo-complement defined by aϕt b = sup{c ∈ [0, 1]|at c ≤ b}. A
t-norm t
(7)
must be selected such that ∀a, b ∈ [0, 1],
atb ≤ at b.
(8)
In order to perform the image reconstruction process (Eqs.(6)-(8)), a parameterized t-norm (e.g. Yager’s t-norm [5] [9]) should be employed.
R Plane Coders
N
(R)
A ,B
(R)
I
G Plane Coders (G)
A ,B
M
(G)
J B Plane Coders (B) (B) A ,B Compressed Image Original Image
Max Continuous T-norm Composition
Correspondence
Original Image (Fuzzy Relations)
T
Coders (Fuzzy Relations)
Compressed Image (Fuzzy Relations)
Fig. 1. Color Image Compression by Fuzzy Relational Equations
166
H. Nobuhara et al.
R Plane Coders
N
(R)
A ,B
(R)
I
G Plane Coders (G)
A ,B
M
(G)
J B Plane Coders (B) (B) A ,B
Compressed Image
Reconstructed Image Given Correspondence
Original Image (Fuzzy Relations)
T
Coders (Fuzzy Relations)
Compressed Image (Fuzzy Relations)
Fig. 2. Color Image Reconstruction by Fuzzy Relational Equations
An example of image compression and reconstruction is performed using an original image of size M × N = 256 × 256 pixels (File No.611000, Arizona Directory, Corel Gallery) is shown in Fig. 3. The original image is compressed into an image of size I × J = 64 × 64 (ρ = 0.0625) using Yager’s t-norm (9) at p b = 1 − min 1, {(1 − a) p + (1 − b) p }1/p , where the parameter p in the compression and reconstruction processes are set at 1.0 and 1.8, respectively. The parameter Sh of Eqs. (2)- (5) is set at 0.2, 0.05, and 0.01, respectively. If the parameter Sh of coders is large, the sharpness of the reconstructed image increases, but the white slit is more prominent, as shown in Fig. 6. Conversely, if Sh is small, then the white slit is decreased, but the reconstructed image is blurred (Fig. 4). Therefore, we assign a compromise value between these opposite sides (Fig. 5).
3 ICF on YUV Color Space ICF on YUV color space [8] is considered. The RGB to YUV conversion is performed by P(Y ) (x, y) = 0.299 · P(R) (x, y) + 0.587 · P(G) (x, y) + 0.114 · P(B) (x, y), P(U) (x, y) = −0.1687 · P(R) (x, y) − 0.3313 · P(G) (x, y) + 0.5 · P(B) (x, y) + 0.5, P(V ) (x, y) = 0.5 · P(R) (x, y) − 0.4187 · P(G) (x, y) − 0.0813 · P(B) (x, y) + 0.5, (x, y) ∈ X × Y.
(10)
Optimization of Image Compression Method Based on Fuzzy Relational Equations
167
Fig. 3. Original image (File No.611000, Arizona Directory, Corel Gallery) Membership Value ’coder_001.dat’
1
0.8
0.6
0.4
0.2
0
0
20
40
60
80
100
Pixel Coordinate
Fig. 4. Reconstructed image (left) and coders (right), Sh = 0.01 Membership Value ’coder_005.dat’
1
0.8
0.6
0.4
0.2
0
0
20
40
60
80
100
Pixel Coordinate
Fig. 5. Reconstructed image (left) and coders (right), Sh = 0.05
Figure 7 shows the original image and its RGB representation, and Fig. 8 shows the original image and its YUV representation. As shown in Fig. 7, the appearance of each plane is almost the same, which means that information quantity of each plane is almost the same in terms of human perception. ICF on
168
H. Nobuhara et al.
Membership Value ’coder_02.dat’
1
0.8
0.6
0.4
0.2
0
0
20
40
60
80
100
Pixel Coordinate
Fig. 6. Reconstructed image (left) and coders (right), Sh = 0.2
Fig. 7. Original image (upper left) and R (upper right), G (lower left), and B (lower right) planes.
RGB requires coders with the same size and shape of membership function, in order to preserve the color balance in the reconstructed image. As noted in Fig. 8, in the case of ICF over YUV color space, intensity distributions of Y, U, and V planes are deviated, and the Y plane particularly includes more important information than that of U and V planes from the viewpoint of human perception [8]. An effective compression/reconstruction can be achieved by constructing coders with different sizes, i.e., defining the size of Y plane coders bigger than that of U and V planes. However, the indexes (parameters) required to construct appropriate
Optimization of Image Compression Method Based on Fuzzy Relational Equations
169
Fig. 8. Original image (upper left) and Y (upper right), U (lower left), and V (lower right) planes.
coders for compression/reconstruction depend on the size of the coders. In this paper, an invariant index for constructing appropriate coders with different sizes independently of compression rates and type of membership functions, called an overlap level of fuzzy sets, is proposed. The overlap is defined as α(Ai−1 , Ai ) = min {max(Ai−1 (x), Ai (x))} , (i)
(11)
x∈X(i−1) (i)
where X(i−1) = {xm ∈ X | ai−1 ≤ xm ≤ ai } and ai corresponds to the center point of Ai (see Fig. 9). The overlap level of coders A, i.e., the average of overlap levels of fuzzy sets in A, is defined as 1 I (12) α¯ = ∑ α(Ai−1 , Ai ). I − 1 i=2 In order to confirm the overlap level invariance to compression rates and type of membership functions, root mean square errors (RMSE) of reconstructed images are measured with respect to the overlap level, for compression rates equal to 0.0625 and 0.1406, respectively. Here, 1000 test images extracted from the Corel Gallery are used to measure the average values. The RMSE ε is defined as # $ 2 $∑ (X) % (X)∈{Y,U,V } ∑(x,y)∈X×Y E (x, y) , ε= 3 × |X × Y|
170
H. Nobuhara et al.
A(x) Ai-1
Ai
ai-1
ai
Overlap Level
x
Fig. 9. Overlap level of fuzzy sets Ai−1 and Ai . E (X) (x, y) = P(X) (x, y) − P˜ (X) (x, y).
(13)
In this experiment, Gaussian (Eqs.(2) - (5)), triangular, and parabolic membership functions are used. The triangular membership functions of coders are defined as: (X)
(X)
(X)
A(X) = {A1 , A2 , . . . , AI (X) }, 1 m − iM D if D I (X) + Ai (xm ) = 1 m − iM if D I (X)
(14) m≤ m>
iM , I (X) iM , I (X)
(15) (m = 1, 2, . . . , M), and (X)
(X)
(X)
B(X) = {B1 , B2 , . . . , BJ (X) }, 1 n − jN D if D J (X) + (X) B j (yn ) = jN 1 n − (X) if D
J
(16) n≤ n>
jN , J (X) jN , J (X)
(17) (n = 1, 2, . . . , N), (X) = (R), (G), (B). The parabolic membership functions of coders are defined as: (X)
(X)
(X)
A(X) = {A1 , A2 , . . . , AI (X) }, iM (X) Ai (xm ) = −Pa m − (X) − 1.0, I
(18)
(19)
Optimization of Image Compression Method Based on Fuzzy Relational Equations
171
(m = 1, 2, . . . , M), and (X)
(X)
(X)
B(X) = {B1 , B2 , . . . , BJ (X) }, jN (X) B j (yn ) = −Pa n − (X) − 1.0, J
(20)
(21) (n = 1, 2, . . . , N), (X) = (R), (G), (B). Figures 10 and 11 show RMSE comparison with respect to the overlap level of each membership function. From Figs. 10 and 11, the appropriate overlap level α¯ is approximately 0.85 for all membership functions in terms of RMSE. It is confirmed that the overlap level is invariant to compression rate and type of membership functions. RMSE 80 75
Gaussian Triangular Parabolic
70 65 60 55 50 45 40 35 30 25 0.4
0.5
0.6
0.7
0.8
0.9
1
Overlap Level
Fig. 10. RMSE with respect to overlap level (compression rate = 0.0625).
Although triangular and parabolic functions can also be used as coders in ICF, in this paper only Gaussian are considered. The Gaussian coders for YUV color space are defined as (X)
(X)
(X)
A(X) = {A1 , A2 , . . . , AI }, 2 iM (X) Ai (xm ) = exp −Sh (X) − m , I
(22) (23)
(m = 1, 2, . . . , M),
172
H. Nobuhara et al. RMSE 40 38
Gaussian
36
Triangular
34
Parabolic
32 30 28 26 24 22
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
Overlap Level
Fig. 11. RMSE measurement with respect to overlap level (compression rate = 0.1406).
and (X)
(X)
(X)
B(X) = {B1 , B2 , . . . , BJ }, 2 jN (X) B j (yn ) = exp −Sh (Y ) − n , J
(24) (25)
(n = 1, 2, . . . , N), (X) = (Y ), (U), (V ). The compressed images are expressed by C = {C(Y ) ∈ F(I(Y ) × J(Y ) ),C(U) ∈ F(I(U) × J(U) ),C(V ) ∈ F(I(V ) × J(V ) )}. In order to perform effective compression/reconstruction, the number of fuzzy sets in the coders of the U and V planes are reduced, such that, |I(U) × J(U) | = |I(V ) × J(V ) | < |I(Y ) × J(Y ) |.
(26)
The shape of fuzzy sets in coders of Y, U, and V planes are adjusted to suit the appropriate overlap level (≈ 0.85).
4 Experimental Comparisons Image compression and reconstruction experiments using 100 typical images (extracted from Corel Gallery, Arizona Directory, image size M × N = 256 × 256 pixels) are performed in order to compare the proposed method with the conventional method which corresponds to image compression on RGB color space using uniform coders defined by Eqs.(2)-(5). The proposed method corresponds to image compression on YUV color space using non-uniform coders defined by Eqs.(22)-(25). Peak signal to noise ratio (PSNR) of the proposed method and the conventional method is measured, for the compression rates defined in Tabs. 1 and 2.
Optimization of Image Compression Method Based on Fuzzy Relational Equations
173
Table 1. Compression rates and corresponding coder sizes in the conventional method Compression Rates R plane size G plane size B plane size 40 × 40 48 × 48 60 × 60 80 × 80
0.0234 0.0366 0.0527 0.0938
40 × 40 48 × 48 60 × 60 80 × 80
40 × 40 48 × 48 60 × 60 80 × 80
Table 2. Compression rates and corresponding coder sizes in the proposed method Compression Rates Y plane size U plane size V plane size 64 × 64 80 × 80 96 × 96 128 × 128
0.0234 0.0366 0.0527 0.0938
16 × 16 20 × 20 24 × 24 32 × 32
16 × 16 20 × 20 24 × 24 32 × 32
Figure 12 shows the results of a comparison between the proposed method and the conventional method. Figures 13 - 18 illustrate examples of reconstructed images obtained by the proposed method and the conventional method, under the condition that the compression rate is 0.0938. As can be observed in Fig. 12, it is confirmed that the PSNR of the proposed method increases at a rate of 7.1% ∼ 13.2%, compared with the conventional method for the compression rates ranging in the interval 0.0234 ∼ 0.0938. Furthermore, Figs. 13 - 18 show that the quality of reconstructed images obtained by the proposed method is better than that obtained by the conventional method. PSNR 22
YUV + Non-uniform 21
20
19
18
RGB+Uniform
17
16
15 0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
Compression Rate
Fig. 12. PSNR Comparison.
174
H. Nobuhara et al.
Fig. 13. Original Image (File No. 611000)
Fig. 14. Reconstructed Images, ICF on RGB (left), ICF on YUV (right), File No. 611000.
5 Conclusions An image compression/reconstruction method based on fuzzy relational equations on YUV color space has been proposed. In the case of ICF over YUV color space, intensity distributions of Y, U, and V planes are deviated, and an effective compression and reconstruction can be achieved by applying coders with different sizes to each plane. In order to design appropriate coders with different sizes, an invariant index for compression rates and types of membership function in coders, called the overlap level, has been proposed. Through image compression/reconstruction experiments using 100 typical natural images (extracted from Corel Gallery, Arizona Directory), it is confirmed that peak signal to noise ratio of the proposed method (ICF on YUV color space, non-uniform coders) is increased at a rate of 7.1% ∼ 13.2%, compared with the conventional method (ICF on RGB color space, uniform coders) for compression rates ranging from 0.0234 ∼ 0.0938.
Optimization of Image Compression Method Based on Fuzzy Relational Equations
175
Fig. 15. Original Image (File No. 611001)
Fig. 16. Reconstructed Images, ICF on RGB (left), ICF on YUV (right), File No. 611001.
Fig. 17. Original Image (File No. 611002)
References 1. Hirota K., and Pedrycz W., Fuzzy Relational Compression, IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 29, No. 3, 1999, pp. 407-415.
176
H. Nobuhara et al.
Fig. 18. Reconstructed Images, ICF on RGB (left), ICF on YUV (right), File No. 611002.
2. Liang K., and Kuo C., WaveGuide: A Joint Wavelet-Based Image Representation and Description System, IEEE Transactions on Image Processing, Vol. 8, No. 11, 1999, pp. 1619-1629. 3. Miyakoshi M., and Shinbo M., Solutions of Composite Fuzzy Relational Equations with Triangular Norms, Fuzzy Sets and Systems, vol. 16, no. 1, 1985, pp. 53-64. 4. Mizumoto M., Pictorial Representation of Fuzzy Connectives, Fuzzy Sets and Systems, Vol. 31, No. 2, 1989, pp. 217-242. 5. Nobuhara H., Pedrycz W., and Hirota K., Fast Solving Method of Fuzzy Relational Equation and its Application to Image Compression/Reconstruction, IEEE Transaction on Fuzzy Systems, Vol. 8, No. 3, 2000, pp. 325-334. 6. Nobuhara H., Takama Y., and Hirota K., Image Compression/Reconstruction Based on Various Types of Fuzzy Relational Equations, The Transactions of The institute of Electrical Engineers of Japan, Vol. 121-C, No. 6, 2001, pp.1102-1113. 7. Nobuhara H., Pedrycz W., and Hirota K., A Digital Watermarking Algorithm using Image Compression Method based on Fuzzy Relational Equation, IEEE International Conference on Fuzzy Systems, CD-ROM Proceeding, 2002, Hawaii, USA. 8. Pennebaker W. B. and Mitchell J. K, JPEG : Still Image Data Compression Standard, VAN NOSTRAND REINHOLD, NEW YORK, 1993. 9. Pedrycz W., Fuzzy Relational Equations with Generalized Connectives and Their Applications, Fuzzy Sets and Systems, vol. 10, 1983, pp. 185-201. 10. Pedrycz W., On Generalized Fuzzy Relational Equations and Their Applications, Journal of Mathematical Analysis and Applications, 107, 1985, pp. 510-536. 11. Pedrycz W., Processing in Relational Structures: Fuzzy Relational Equations, Fuzzy Sets and Systems, vol. 40, no. 1, 1991, pp. 77-106. 12. Shanbehzadeh J., Moghadam A.M.E., and Mahmoudi F., Image Indexing and Retrieval Techniques : Past, Present, and Next, Proc. of SPIE, vol. 3972, 2000, pp. 461-470. 13. Stejic Z., Iyoda E.M., Takama Y., and Hirota K., Content-Based Image Retrieval using Local Similarity Patterns defined by Interactive Genetic Algorithm , Late-Breaking Papers of the Genetic and Evolutionary Computation Conference (GECCO-2001), SAN FRANCISCO, CA, USA, July 2001, pp. 390-397. 14. Yu D., Liu Y., Mu Y., and Yang S., Integrated System for Image Storage, Retrieval and Transmission Using Wavelet Transform, Proc. of SPIE, vol. 3656, 1999, pp. 448-457.
Optimization of Image Compression Method Based on Fuzzy Relational Equations
177
15. Zhu B., Ramsey M., and Chen H., Creating a Large-Scale Content-Based Air-photo Image Digital Library, IEEE Transactions on Image Processing, vol. 9, no. 1, 2000, pp. 163-167.
CHAPTER 13 Multi-layer Image Transmission with Inverse Pyramidal Decomposition 1
2
3
Roumen Kountchev , Mariofanna Milanova , Charles Ford and Roumiana 4 Kountcheva 1
Technical University of Sofia University of Arkansas at Little Rock 3 University of Arkansas at Little Rock 4 T&K Engineering Co. 1 Department of Radiocommunications, Technical University of Sofia, Boul. Kl. Ohridsky 8, Sofia 1000, Bulgaria
[email protected] 2 Department of Computer Science, University of Arkansas at Little Rock, AR, 72204, USA
[email protected] 3 Department of Computer Science, University of Arkansas at Little Rock, AR, 72204, USA
[email protected] 4 T&K Engineering Co. Mladost 3, POB12, Sofia 1712, Bulgaria
[email protected] 2
Abstract: This paper presents a new image compression method based on the Inverse Difference Pyramid (IDP) decomposition, and one specific application of this method aimed at layered image transfer via a standard communication networks. A basic feature of the IDP method is that image decomposition is in the frequency domain coupled with the fact that every successive layer consists of increasing number of spectral coefficients, i.e. the pyramid, constituted by these coefficients values, is inverse. The higher pyramid levels correspond with higher image quality after image restoration. This permits, depending on the specific requirements, the image to be transferred and restored layer by layer until the required image quality is obtained. A special header of the compressed image data contains information about the number of pyramid levels, the coefficients used, etc. which can be used, when necessary, to transfer and restore only a selected part of the image only. In this case, the initial information (the data from the lower pyramid levels of the compressed image) is first transferred via a communication network. At the receiving side the image is visualized scaled down. At this point, the customer can, on demand, request to see the whole picture or a part of it, enlarged and with more details. The basic advantage of the method described for pyramidal image decomposition is that the received initial information for the lowest pyramid levels is used to upgrade the whole image while no part of the compressed image data is sent twice. Keywords: image compression, inverse pyramid decomposition, layered image transfer, fast image transfer
R. Kountchev et al.: Multi-layer Image Transmission with Inverse Pyramidal Decomposition, Studies in Computational Intelligence (SCI) 2, 179–196 (2005) c Springer-Verlag Berlin Heidelberg 2005 www.springerlink.com
180
R. Kountchev et al.
1. Introduction The up-to-date application of image compression systems in e-commerce, digital libraries and mobile video communications, require the creation of specific structures of compressed data to facilitate the process of data transmission and receiving. In order to meet these requirements, image compression methods must ensure random code stream access and processing, progressive transmission by pixel accuracy and resolution, to have open architecture, and to permit content based description. Some of the most effective methods for still image compression are based on different kinds of pyramidal decompositions. There are two basic approaches used for the pyramid creation: non-orthogonal [1-6] and orthogonal decomposition [7-9]. These pyramidal decompositions have some common deficiencies: they require multiple decimations and interpolations together with low frequency or subband filtration, which cause specific distortions in the restored images. The specific features of the Inverse Difference Pyramid (IDP) method [10, 11] which result from the image decomposition in the frequency domain, avoid these deficiencies to a great extent. This paper represents one application of the IDP method, which permits flexible image transfer layer by layer, gradually increasing its resolution and quality.
2. Basic principles of the IDP decomposition The essence of the IDP decomposition can be explained for an 8-bit grayscale image as follows. The image is processed with a two-dimensional (2D) orthogonal transform using only limited number of coefficients. The values of the coefficients, calculated by the transform, constitute the first pyramid level. Then, using the values of these coefficients, the image is restored with an inverse orthogonal transform and then - subtracted pixel by pixel from the original one. The obtained difference image, which is of the same size as the original, is divided into 4 sub-images and each sub-image is processed with the 2D orthogonal transform again; the values of the coefficients constitute the second pyramid level. The processing continues in a similar way for the following pyramid levels. In this way, all levels consisting of transform coefficients only are calculated. The set of the retained orthogonal transform coefficients for every pyramid level can be different. The image decomposition is stopped when the required image quality for the restored image is reached - usually earlier than the highest pyramid level. The coefficients, obtained as a result of the orthogonal transform from all pyramid levels, are quantizated, sorted in accordance with their frequency, and scanned sequentially. In order to achieve higher compression ratio the data is processed with adaptive entropy and run-length coding. The algorithm described above is shown on Figure 1. The application of the IDP method, aimed at layered image transfer, is performed as follows. At the beginning, the matrix of the halftone image with elements B(i,j) is divided in K sub-images with size 2n×2n. Then, each sub-image is represented with IDP, consisting of p P levels (1
L p × L p and L p = 2 n − p elements, we obtain the coefficients of the
corresponding transform, defined with the equation:
Multi-layer Image Transmission with Inverse Pyramidal Decomposition
s k p (u r , v r ) =
L p −1L p −1
∑ ∑T i =0 j=0
k p (i,
181
(1)
j) t p (i, j, u r , v r )
for r = 1,2,.., Rp and ur,vr = 0,1,..,Lp-1. B k 0 (i, j)
E k p-1 (i, j)
+ 1
2
3
4
Inverse Orthogonal Transform
−
Σ
~ ~ B k 0 (i,j)/E k p- 2 (i, j )
Z p (t)
RLD + HD
s 'k p (u r , v r )
IDP coder
s qk p (u r , v r )
s kp (u r , v r )
Truncated Orthogonal Transform
B k 0 (i,j)/E k p -2 (i, j)
MS
IQ
s qk p ( t)
s qk p ( t)
Inverse Orthogonal Transform
+
~ ~ B k 0 (i, j)/E k p -1 (i, j)
RLE + HE
IMS
+
Σ
IDP decoder
Q
Z p (t)
s qk p ( u r , v r )
IQ
s′k p (u r ,v r )
P -1 ~ ~ B k 0 (i, j)+∑ E k p −1 (i, j) p =1
B′k p (i, j )
B′k 0 (i, j)
Fig. 1: Block diagram of the recursive IDP algorithm MS and IMS are blocks for forward and inverse “meander” scans of the two-dimensional data massif from a given frequency band of the corresponding pyramid level and (RLD+HD) is a block for decoding the series of equal symbols and the Huffman code; Q – quantization; IQ – inverse quantization.
182
R. Kountchev et al.
n-p Here Rp is the number of the retained transform's coefficients, limited by 1•Rp•4 ; tp(i,j,ur,vr) which is the element (i,j) of the basic image (transform kernel) with spatial frequency (ur,vr) in the layer p, defined by the selected orthogonal transform, and
B k (i, j) Tk p (i, j) = 0 E (i, j) k p-1
for p = 0; for p = 1,2, ..., P - 1
(2)
where i, j = 0,1,2,..,Lp-1. In this case
B k 0 (i, j) is one element of the sub-image k0 from the input image, and
E k p−1 (i, j) - one element of the difference image in the layer p = 2,..,P-1, defined with the recursive relation:
~ ~ Е k p−1(i, j) = E k p−2 (i, j) − E k p−2 (i, j) = Tk p −1 (i, j) − Tk p −1 (i, j) ,
(3)
and in accordance with the initial condition p=1:
~ ~ Е k 0 (i, j) = B k 0 (i, j) − B k 0 (i, j) = Tk 0 (i, j) − Tk 0 (i, j) , where
(4)
~ Tk p (i, j) is one element of the approximating model for the layer p of the
restored image, obtained with the truncated inverse two-dimensional orthogonal transform of the coefficients from the sub-image kp, in correspondence with the relations:
~ Tkp (i, j) =
∑∑s′ ur vr
k p (u r , v r ) f 0 (i, j, u r ,v r ) = 4
p−n
∑∑s′ ru
rv
kp (u r ,v r ) t 0 (i,
j, u r , vr )
(5)
for i, j = 0,1,..,Lp-1 and r = 1,2,.., Rp. Here f p (i, j, u r ,v r ) is the kernel of the inverse orthogonal transform;
{
}
{ [
s′k p (u r , v r ) = Q −p1 s qk p (u r , v r ) = Q −p1 Q p s k p (u r , v r ) where
]}
(6)
Q p {∗} and Q −1 p {∗} are the corresponding operators for quantization and
dequantization of the coefficients
s k p (u r , v r ) from the sub-image kp transform, and
s qk p (u r , v r ) are the quantizated coefficients from the same transform. Quantization and dequantization operations are performed in accordance with the relations:
s qk p (u r , v r ) = Q p {s k p (u r , v r )} = s k p (u r , v r ) / ∆ p (u r , v r )
s′k p (u r , v r ) = Q −p1{s qk p (u r , v r )} = s qk p (u r , v r ) ∆ p (u r , v r ) ,
(7) (8)
Multi-layer Image Transmission with Inverse Pyramidal Decomposition
where symbol
183
∆ p (u r , v r ) is the corresponding quantization step, set in advance, and the
. in Eq. (6) and (7) is an operator, representing that only the integer part of the
result is used. The coefficients
s qk p (u r , v r ) from all sub-images in the pyramid layer p are gathered
in Rp two-dimensional blocks in correspondence with their spatial frequency ( u r , v r ) for r=1,2,.., Rp, and are arranged in a one-dimensional data sequence (“meander” scan). After that the data is processed with two-stage lossless coding: - coding the lengths of the series of equal symbols (RLE); - entropy coding with adaptive modified Huffman code (HE). As a result, the compressed data for layer p of the processed image is obtained. The compression ratio and the quality of the restored image depend on the selected number of coefficients for every sub-image Rp, on the quantization step ∆ p (u r , v r ) and on the kind of the selected orthogonal transform, which defines the basic images tp(i,j,ur,vr) in Eqs. (1) and (5). The inverse processing of the compressed image data starts with entropy and RLE
s qk p (u r , v r ) in correspondence ~ with Eq. (6), calculation of the model of the sub-image Tk p (i, j) with inverse orthogonal
decoding, followed by dequantization of the coefficients
transform - Eq. (5), and finally - image restoration when all the information from the P pyramid layers is transferred, in correspondence with the equation:
B′k 0 (i, j) = Here
P −1
~
∑T p =0
kp
(i, j) , for i, j = 0,1,2,. ., Lp-1.
(9)
B′k 0 (i, j) is the gray level (respectively - the brightness) of the element (i,j) in the
restored image. The same method is also used for processing of 24-bit color bmp images. When color images are represented with the components Y, U and V and are compressed, three similar pyramids are built: one for the luminance (Y) and two for the chrominance components (U,V). The chrominance pyramids have the same number of levels as the luminance one, but their resolution differs in accordance with the selected standard – 4:4:4, 4:2:2, 4:2:0, or 4:1:1. For RGB images the preferred standard is usually 4:4:4. A higher compression ratio for the color images is obtained using the techniques already explained for grayscale images: limitation of the coefficients and pyramid levels number, quantization with increasing step, and entropy coding. For standards 4:2:2, 4:2:0, or 4:1:1 every pyramid could have different settings, but the standard 4:4:4 requires the settings for the three pyramids to be the same. The algorithm offers the highest compression ratio with retained image quality for the 4:2:0 standard. All parameters, used for the image compression, are described in a special header, sent together with the compressed data. The header contains information about the number of used pyramid levels (image layers), the start level, the selected orthogonal transform in every level, the number of coefficients and their corresponding spatial frequencies, the quantization parameters and the kind of entropy coding for every level.
184
R. Kountchev et al.
The results obtained with IDP decomposition of P=9 levels and K=1 are shown in Figure 2. An example is the grayscale test image, “Lena,” with size 512u512 pixels and 8 bpp. It is easy to evaluate the increasing image quality, which corresponds with the higher pyramid levels.
p=0, PSNR(0)=14,07 dB p=1, PSNR(1)=14,67 dB
p=2, PSNR(2)=16,04 dB
p=3, PSNR(3)=18,50 dB p=4, PSNR(4)=20,69 dB p=5, PSNR(5)=23,71 dB
p=6, PSNR(6)=27,11 dB p=7, PSNR(7)=32,10 dB p=8, PSNR(8)=40,85 dB Fig.2. Test image “Lena”, restored using the data, obtained from 9 consecutive pyramid layers
Multi-layer Image Transmission with Inverse Pyramidal Decomposition
185
3. IDP adaptation The retained coefficients of the transform, which describe the sub-image model in layer p, are defined with the relation:
s k p ( u r , v r ) = m p (u, v) s k p (u , v) for
(10)
u, v = 0,1,.., L p −1 , r = 1,2,..,Rp,
where
s k p (u,v) is the coefficient (u,v) of the sub-image kp transform in layer p; mp(u,v) is
one element from the binary mask, described with the matrix [Mp] with size Lp× Lp, which defines the positions of the retained coefficients in the transform; Rp is connected with mp(u,v) with the relation: L p −1L p −1
R p=
∑ ∑ m (i,j) .
(11)
p
i = 0 j= 0
In order to increase the efficiency of the pyramidal decomposition the described method should be adapted in correspondence with the image contents. The parameters, which could be changed for this purpose, are the sub-image size L p× L p , the number of the pyramid layers P, and the values of the elements for the layer p in the mask [Mp]. Below the two basic algorithms used for the IDP adaptation are explained: The First algorithm is based on the relation between MSE (Mean Square Error) and the number of layers, P. The minimum value of P (for P=2,3, .. , n) is defined in accordance with following conditions: 2
2 n −12 n −1
ε (P)=
∑∑ i = 0 j=0
P −1 ~ ~ [B(i,j)−B k 0 (i,j)− L k p (i,j)]2 ≤ ε 0
∑
(12)
p =1
2
and ε (P −1) > ε 0 . 2
Here ε0 is a threshold, selected in advance, which defines ε (P) of the restored image. In this case the compression ratio is different for every image and depends on the image contents. The Second algorithm is related to the number of the retained coefficients and their positions, defined with the elements of the mask [Mp]. This algorithm is performed in the following four steps: • Step 1. Calculation of the modules of the transform coefficients for every sub-image in the current layer p=1,2,..P-1:
1 | s k p (u, v) | = | 2 Lp
L p −1 L p −1
∑ ∑L i =0
j= 0
k p (i, j) t p (i,
j, u, v) |
(13)
• Step 2. Calculation of the modules of the mean transform coefficients for the layer p:
186
R. Kountchev et al. p
| s p (u , v ) | =
1 4 K | s (u, v) | 4 p K k p =1 k p
∑
(14)
• Step 3. Arrangement of the modules of the mean coefficients in monotonously decreasing order, limited with the selected value for Rp in the interval 1≤ Rp< 4n-p:
| s (u 1 ,v1) | ≥ | s (u 2 ,v 2 ) | ≥ . .. .≥ | s (u R p , v R p ) | •
(15)
Step 4. The elements mp(u,v) of the mask [Mp] are defined with the relation:
1 for u = u r and v = v r when r = 1,2,..., R p ; m p ( u , v) = in all other cases. 0
(16)
The defined matrix-mask [Mp] ensures the obtaining of the required compression ratio. In this case the quality of the restored image depends on the image contents and is different for every image.
4. Image transmission with multi-layered IDP decomposition The coefficients
k
s p p (u ,v) build the pyramid in the image frequency domain. The number
of the coefficients in every pyramid level is correspondingly: 2 n −1 2 n −1 0
0
p = 0 - 4 R0 = 4
∑ ∑m u =0 v =0
0 ( u , v)
2 n −1 −1 2 n −1 −1
p=1
∑ ∑ m ( u , v)
- 41 R1 = 4
u =0
2
p = 2 - 4 R2 = 4
v =0
∑∑ u =0 v =0
(18)
1
(19)
2 n −1 2 n −1 2
(17)
m 2 ( u , v)
……………………………………………………………………….. 2 n − r −1 2 n − r −1
p = r - 4r Rr = 4r
∑ ∑ u =0
v =0
(20)
m r ( u , v)
In this case the total number of the coefficients in a pyramid of r levels is: r
∑
R Σ (r ) =
s =0
r
2 n −s 2 n −s
∑ ∑ ∑4
4s R s =
s =0 u =0 v =0
s
m s ( u , v) for r < n,
In particular, if R0 = R1 =....= Rr = 4 from Eq. (22) is obtained:
(21)
Multi-layer Image Transmission with Inverse Pyramidal Decomposition r
∑
R Σ (r ) =
s =0
4 s +1 =
4 2 r 1 1 r+2 (4 − ) ≈ 4 3 4 3
187
(22)
If the matrix [B(2n)] represents one block from the image with size NxN pixels, then the total number of the coefficients for all blocks is: n −s n −s
RΣ =
r 2 2 N2 2 R ( r ) = N 4 s − n m s ( u , v) Σ n 4 s =0 u =0 v =0
∑∑∑
(23)
Then, for R0 = R1 =....= Rr = 4 and from Eq. (23) is obtained:
R Σ = ( N 2 / 3)4 r − n + 2 .
Finally all the coefficients
k
s p p (u ,v) from the pyramid level p are processed with lossless
compression which results in obtaining the binary massif {Zp(r): r=1,2,..,lp}, with length lp. At the beginning of the massif {Zp(r)}, the special header, Hp is included. The description of the IDP algorithm, given here, includes all the operations used in the process of the image compression. In the case when the image has to be transferred via the Internet, or represented with increasing resolution, it is suitable that this be done layer by layer. The nature of the method, based on the pyramidal decomposition, suits this approach very well. It could be used for e-commerce applications: at the beginning, only a small picture is shown, and if requested, the detailed image, or a selected part of it, follows. For example, it would be convenient, when a customer needs to view only a part of a road or a city map. Such applications are best illustrated with Fig. 3. The restored initial image is shown in Fig.3a. This is the lowest pyramid level, representing the whole image, transferred with low resolution and visualized scaled down (1/4 of the real image size). If the customer wants to see the whole or full-size picture, there are no special requirements and the remaining information (the higher pyramid levels up to the last one), follow. If only a part of the image is needed, the customer can mark it (Fig.3b), and after that only the corresponding information for the higher pyramid levels will be transferred and received. The two following pyramid levels (containing the selected part of the image only) as they are received and visualized, are shown in Fig. 3c and Fig. 3d. Only three of the pyramid levels are used in this example. An advantage of the method is that no part of the compressed data is transferred twice. The information from the initial image (the first pyramid level) is used as a basis for the upgraded full-size image (or the selected part of it) and when all the remaining information is transferred, the image is restored.
Fig.3.a
Fig.3.b
188
R. Kountchev et al.
Fig.3.c
Fig.3. Example image HongK. Imagea is the restored image (scaled down to 25%) after compression 179:1; Image b shows the selected part; Image c represents the result from the restoration of the selected part of the image using the data from pyramid level 2
Fig.3.d
Fig.3. Example image HongK. Image d represents the result from the restoration of the selected part of the image using the data from pyramid level3.
Multi-layer Image Transmission with Inverse Pyramidal Decomposition
189
The simpler case is to select the upper (lower) left (right) corner, asshown in Fig. 4. The header and the data of the compressed image are rearranged depending on the selected part of the picture.
Fig.4.a
Fig.4.b
Fig.4. Image a is the restored image after compression 225:1; Image b shows the marked selected part.
Fig.4.c
190
R. Kountchev et al.
Fig.4.d Fig.4. Example image City. Images c and d represent the results from the restoration of the selected part of the image using the data from pyramid levels 2 and 3.
5. Computational complexity of IDP and JPEG 2000 This section presents the evaluation of the computational complexity of IDP based on a combination of discrete cosine transform (DCT) and Walsh-Hadamard Transform (WHT), compared with the wavelet image transform, used in the standard JPEG2000.
5.1. Computational complexity of the IDP decomposition

In order to evaluate the computational complexity of the IDP decomposition based on the DCT, it is necessary to define the number of additions (A) and multiplications (M) used for the calculation of the DCT coefficients. If the Lee algorithm [5] for the fast DCT of an N-dimensional vector with N = 2ⁿ is used, then:
A = (3/2)(n − 1)N + 1 ;    M = (1/2)nN ,  where n = log₂N .    (24)
When the DCT is truncated to only one spectral coefficient, the relation in Eq. (24) becomes:
A₁ = (3/2) Σ_{l=1}^{n} (N/2^{l−1}) − N + 1 = 2(N − 1) ,    M₁ = N Σ_{l=1}^{n} (1/2^{l}) = N − 1 .    (25)
In the case when two of the N coefficients are used, then:
A₂ = (3/2) Σ_{l=1}^{n} (N/2^{l−1}) + (n/2) − N + 1 = 2(N − 1) + n/2 ,    M₂ = M₁ = N − 1 .    (26)
Correspondingly, for the truncated two-dimensional DCT:

• For the calculation of 2 of the N² coefficients:

A₂ = (2N + 1)N + 2·2(N − 1) = N(2N + 5) − 4 ,    (27)
M₂ = (N − 1)N + 2(N − 1) = (N − 1)(N + 2) .    (28)
• For the calculation of 4 of the N² coefficients:

A₄ = (2N + 1)N + 2(2N + 1) = (2N + 1)(N + 2) ,    (29)
M₄ = M₂ = (N − 1)(N + 2) .    (30)
Then, for the IDP decomposition with z levels, with start level p₀ = 0, 1, 2, .., r, number of DCT coefficients per sub-image R_p = 4 for the levels p₀, p₀+1, .., p₀+z−2, and R_{p₀+z−1} = 2 for the last level (p₀+z−1), is obtained:

• For one sub-image in the level (p₀+i) of the IDP coder:
A_c(p₀+i) = 2(2N_{p₀+i} + 1)(N_{p₀+i} + 2) ,    (31)
M_c(p₀+i) = 2(N_{p₀+i} − 1)(N_{p₀+i} + 2) ,    (32)

where i = 0, 1, .., z−2; N_{p₀+i} = 2^{n_{p₀+i}} is the size of the sub-image in the level (p₀+i). In the last level (p₀+z−1) of the IDP coder the equations (27) and (28) are changed in correspondence with equations (31) and (32) as follows:
A_c(p₀+z−1) = 2[N_{p₀+z−1}(2N_{p₀+z−1} + 5) − 4] ,    (33)
M_c(p₀+z−1) = 2(N_{p₀+z−1} − 1)(N_{p₀+z−1} + 2) .    (34)
• For one sub-image in the level (p₀+i) of the IDP decoder:
A_d(p₀+i) = (1/2)A_c(p₀+i) ;    M_d(p₀+i) = (1/2)M_c(p₀+i) ,    (35)
where i = 0, 1, .., z−1. In order to lower the computational complexity of IDP, the DCT is used only at the start level, while the remaining levels use the WHT. Then, for an IDP decomposition with z = 3 levels and applying the fast WHT, the global number of additions and multiplications for one pixel of the image
with size 2ⁿ × 2ⁿ is:

A0c = 2A0d = (2/N²)·{2(N/2^{p₀} + 1)(N/2^{p₀} − 1) + 3N(N/2^{p₀+2}) + 2N} ,    (36)

M0c = 2M0d = (2/N²)·(N/2^{p₀})(N/2^{p₀} + 1) .    (37)

For n = 9, p₀ = 6 and z = 3, from Eqs. (36) and (37) follows: A0c = 2A0d = 14.31 and M0c = 2M0d = 2.18.
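As a quick numerical cross-check of the per-sub-image counts, the short sketch below evaluates Eqs. (31)-(34) and the decoder halving of Eq. (35). Because those formulas were reconstructed from a garbled source, the sketch should be read as illustrative; the function names are ours.

# Numerical sketch of the per-sub-image operation counts of Eqs. (31)-(34) and the
# decoder halving of Eq. (35); the reconstructed formulas are assumptions, so the
# printed numbers are illustrative only.
def coder_ops(N, last_level=False):
    """Additions/multiplications for one N x N sub-image of the IDP coder."""
    if last_level:                       # Eq. (33)/(34): only 2 coefficients kept
        A = 2 * (N * (2 * N + 5) - 4)
        M = 2 * (N - 1) * (N + 2)
    else:                                # Eq. (31)/(32): 4 coefficients kept
        A = 2 * (2 * N + 1) * (N + 2)
        M = 2 * (N - 1) * (N + 2)
    return A, M


def decoder_ops(N, last_level=False):
    """Eq. (35): the decoder needs half of the coder operations."""
    A, M = coder_ops(N, last_level)
    return A / 2, M / 2


if __name__ == "__main__":
    # Start-level sub-images of the example (n = 9, p0 = 6): 512 / 2**6 = 8 pixels.
    N0 = 8
    A, M = coder_ops(N0)
    print("DCT start level, per pixel:", A / N0**2, "adds,", M / N0**2, "mults")
    # The multiplication count per pixel (2.1875) is close to the 2.18 quoted in
    # Table 1; the additions of the WHT levels are not modelled here.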
5.2. Computational complexity of the wavelet decomposition

The wavelet pyramid (WP) with 3 levels (respectively, with 10 frequency bands) in JPEG 2000 [13], built using a bank of separable 2-dimensional digital filters with 3 and 5 coefficients, 1/2(1,2,1) and 1/8(-1,2,6,2,-1), requires:
A0c = [(8 + 6 + 6 + 4) + 24/4 + 24/16] = 31.5 ,    (38)

M0c = M0d = [(9 + 3 + 3 + 1) + 16/4 + 16/16] = 21 ,    (39)

A0d = [(9 + 7 + 7 + 5) + 28/4 + 28/16] = 36.75 .    (40)
5.3. Comparison of the computational complexity of IDP and WP

Table 1 contains the values of the normalized number of additions A0 and multiplications M0, obtained for IDP (z = 3, p₀ = 6) and for WP, for an image with size N×N = 512×512 pixels. The comparison shows that the IDP decomposition has a lower computational complexity than WP. In the coding process the number of multiplications M0c in IDP is about 10 times smaller, and in the decoding M0d is about 19 times smaller; the number of additions A0c for IDP is more than 2 times smaller, and A0d more than 5 times.
Table 1. Comparison of the computational complexity of IDP and JPEG 2000

Operation    IDP      WP (JPEG 2000)
A0c          14.31    31.50
A0d           7.15    36.75
M0c           2.18    21.00
M0d           1.09    21.00
6. Results

The quality evaluation of the restored image B̂(i, j) with size M×N pixels was performed on the basis of the peak signal to noise ratio (PSNR), defined with the equation:

PSNR = 10·log₁₀(255²/ε²) [dB] ,    (41)

where:

ε² = (1/MN) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} [B(i, j) − B̂(i, j)]²    (42)
is the mean square error of the restoration, and 255 is the maximum grayscale value for the image pixels. The compression ratio for an image coded with the modified IDP, limited up to the level p (before RLE and HE), is calculated in accordance with:
C(p) = MN·log₂m / [K Σ_{s=0}^{p} (A_s + 4^s q_s R_s)]   for p = 0, 1, … , P−1 ,    (43)
where m is the number of the gray levels for a single pixel; A_s – the number of bits used for coding the positions of the coefficients s_{k_s}(u_r, v_r) selected for the transforms of the pyramid level s; q_s – the number of bits used for coding of every coefficient from level s; R_s – the number of coefficients of the spectral model of a sub-image in the level s; and K – the number of sub-images in one pyramid level. As an example, for M×N = 4⁹, P = 3, m = 256, K = 1024 (n = 4), q_s = 8, R_s = 4 and A_s = 0 for s = 0, 1, 2 (i.e. the same positions are used for the selected spectral coefficients), from Eq. (43) we obtain correspondingly C(0) = 64, C(1) = 12.8 and C(2) = 3.04. After RLE and HE were applied over the quantized pyramid coefficients s_{k_p}(u_r, v_r), this compression ratio C(p) was increased 4-6 times, retaining the image quality. Table 2 contains the results from the compression of two example images: HongK and City. Here the size of the original image HongK is 1024×768 pixels, and that of the City image is
600×480 pixels. Fig. 3a is the restored image HongK, obtained using the data from the lowest pyramid level only, after compression 179, and Fig. 4a is the restored image City after compression 225.5. The selected parts of the images are marked with a white line, and the restored parts are shown correspondingly in Fig. 3c, d and Fig. 4c, d. The image in Fig. 3c (HongK3) was restored using the initial data file (13 176 [B]) and the difference (36 693 [B] − 13 176 [B]) with the data from the following pyramid level; that in Fig. 3d (HongK4) using the corresponding difference (60 535 [B] − 36 693 [B]) with the data from the last level. The example shows that in most cases the customer will be satisfied with the image quality of the second picture and will not need the last one (with the highest quality). The selected parts of the two images are smaller than the original ones and their sizes are correspondingly 484×424 for the image in Fig. 3c (HongK3) and 450×450 for Fig. 4d (City4). The compression ratios for the images HongK3, HongK4, City3 and City4 are calculated for these sizes. The table also shows the number of the pyramid level used for the image restoration.

Table 2. Compression results for example images (IDP layers)
Image     Layer   Compression ratio   PSNR [dB]   File length [B]
HongK1    0       179.06              18.63       13 176
HongK3    1         9.44              21.40       36 693
HongK4    2         5.72              27.80       60 535
City1     0       225.53              20.53        3 831
City3     1        13.64              25.16       19 793
City4     2        31.84              31.84       31 514
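The worked example for Eq. (43) can be verified with a few lines of code; the sketch below also evaluates the PSNR of Eqs. (41)-(42). The reading M·N = 4⁹ and K = 1024 sub-images follows the reconstruction above and is therefore an assumption, not a statement of the authors' exact parameters.

# Quick check of the worked example for Eq. (43), plus the PSNR of Eqs. (41)-(42).
import math


def compression_ratio(p, MN, m, K, A, q, R):
    """Eq. (43): C(p) = MN*log2(m) / (K * sum_{s=0..p} (A_s + 4**s * q_s * R_s))."""
    coded_bits = K * sum(A[s] + 4**s * q[s] * R[s] for s in range(p + 1))
    return MN * math.log2(m) / coded_bits


def psnr(original, restored):
    """Eqs. (41)-(42) for two equal-size 8-bit grayscale images given as 2-D lists."""
    M, N = len(original), len(original[0])
    eps2 = sum((original[i][j] - restored[i][j]) ** 2
               for i in range(M) for j in range(N)) / (M * N)
    return float("inf") if eps2 == 0 else 10 * math.log10(255**2 / eps2)


if __name__ == "__main__":
    MN, m, K = 4**9, 256, 1024
    A, q, R = [0, 0, 0], [8, 8, 8], [4, 4, 4]
    for p in range(3):
        # Prints C(0) = 64.00, C(1) = 12.80, C(2) = 3.05, matching the example above.
        print(f"C({p}) = {compression_ratio(p, MN, m, K, A, q, R):.2f}")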
The orthogonal transform used in the lowest pyramid layer is the DCT, and for the two following levels the Walsh-Hadamard transform (WHT). The selected coefficients are as follows: for level 0 they are s_{k₀}(0,0), s_{k₀}(1,0) and s_{k₀}(0,1); for level 1 – s_{k₁}(0,0), s_{k₁}(1,0), s_{k₁}(0,1) and s_{k₁}(1,1); for level 2 – s_{k₂}(0,0), s_{k₂}(0,1) and s_{k₂}(1,0). The values of the quantization steps are:

Δ_{k₀}(u_r, v_r) = 4 for level 0 and r = 1, 2, 3;
Δ_{k₁}(0,0) = 16, Δ_{k₁}(0,1) = Δ_{k₁}(1,0) = 32, Δ_{k₁}(1,1) = 64 for level 1;
Δ_{k₂}(0,0) = 16, Δ_{k₂}(0,1) = Δ_{k₂}(1,0) = Δ_{k₂}(1,1) = 32 for level 2.
The results from the comparison of IDP with the JPEG standard [12] are shown in Table 3. All examples are color images, 24 bpp. The compression with JPEG was performed with Microsoft Photo Editor 3.01. The comparison was made for the highest compression ratios, which correspond to the lowest pyramid level and could be used as the first transferred image layer. The results show that for equal compression ratios, the quality of the restored image processed with IDP is always higher than that with JPEG. Even in cases when the compression ratio obtained with the IDP method was higher, the image quality was better. The software implementation of the IDP method was performed with the fast DCT and WHT [11] in C++ and Visual C. This program, created on the basis of the described algorithm, is a powerful and flexible tool, permitting all the parameters (the number of pyramid levels, coefficients, quantization steps, etc.) to be set independently. The processing time is negligible.
Table 3. Results of the compression with IDP and JPEG

Image      Size [pixels]   Compression   Method   PSNR [dB]   Compressed file [B]
Hungary    1024×768        159.54        IDP      26.79       14 788
Hungary    1024×768        161.87        JPEG     23.95       14 575
Birds      1024×768        317.96        IDP      26.62        7 420
Birds      1024×768        164.00        IDP      28.41       14 377
Birds      1024×768        161.56        JPEG     23.92       14 603
Myanmar    1024×768        106.09        IDP      19.66       21 239
Myanmar    1024×768        140.74        IDP      19.40       16 764
Myanmar    1024×768        168.70        IDP      19.06       13 985
Myanmar    1024×768        106.10        JPEG     19.30       21 233
Nepal      1024×768        185.73        IDP      21.35       12 703
Nepal      1024×768        156.42        IDP      21.69       15 083
Nepal      1024×768        139.07        JPEG     21.17       16 964
Summer     1024×768        181.50        IDP      34.26       13 020
Summer     1024×768        263.50        IDP      30.73        8 954
Summer     1024×768        177.05        JPEG     25.10       13 325
Building   2048×2048       223.05        IDP      24.92       56 414
Building   2048×2048       204.68        IDP      25.10       61 475
Building   2048×2048       169.25        JPEG     22.45       74 345
7. Conclusion

A novel method for layered image compression has been investigated. The main advantages of the IDP method compared with the standard JPEG [12] are the following:
• It gives better quality of the restored image for high compression ratios (over 40:1) and close image quality for lower compression ratios;
• It offers a fast initial representation of the image (scaled down), permitting interactive requests for higher resolution;
• It has lower computational complexity.
The future development of the method could be in the following main directions:
• Pyramid adaptation on the basis of the image histogram;
• Increasing the efficiency of processing for color images, using the Karhunen-Loeve transform;
• Improving the entropy coding (EC) of the coded data with adaptive arithmetic coding;
• Increasing the noise suppression with pre- and post-processing of the image with homomorphic filtering.
References

1. B. Aiazzi, L. Alparone, S. Baronti. A Reduced Laplacian Pyramid for Lossless and Progressive Image Communication, IEEE Trans. on Communication, Vol. 44, No 1, January 1996, pp. 18-22.
2. O. Egger, P. Fleury, T. Ebrahimi. High-Performance Compression of Visual Information: A Tutorial Review, Part I: Still Pictures, Proceedings of the IEEE, Vol. 87, No 6, June 1999, pp. 976-1011.
3. W. Kim, P. Balsara, D. Harper, J. Park. Hierarchy Embedded Differential Image for Progressive Transmission Using Lossless Compression, IEEE Trans. on Circuits and Systems for Video Technol., Vol. 5, No 1, Feb. 1995, pp. 2-13.
4. C. Lu, A. Chen, K. Wen. Polynomial Approximation Coding for Progressive Image Transmission, Journal of Visual Communication and Image Representation, Vol. 8, June 1997, pp. 317-324.
5. K. Rao, P. Yip, Eds. The Transform and Data Compression Handbook, CRC Press LLC, 2001.
6. M. Helsingius, P. Kuosmanen, J. Astola. Image Compression Using Multiple Transforms, Signal Processing, Vol. 15, 2000, pp. 513-529.
7. F. Meyer, A. Averbuch, R. Coifman. Multilayered Image Representation: Application to Image Compression, IEEE Trans. on Image Processing, Vol. 11, No 9, September 2002, pp. 1072-1080.
8. P. Topiwala. Wavelet Image and Video Compression, Kluwer, NY, 1998.
9. M. Unser. Splines: A Perfect Fit for Signal and Image Processing, IEEE Signal Processing Magazine, Nov. 1999, pp. 22-38.
10. R. Kountchev, J. Ronsin. Inverse Pyramidal Decomposition with "Oriented" Surfaces, Picture Coding Symposium'99, April 1999, Portland, USA, pp. 395-398.
11. R. Kountchev, V. Haese-Coat, J. Ronsin. Inverse Pyramidal Decomposition with Multiple DCT, Signal Processing: Image Communication, Vol. 17, February 2002, pp. 201-218.
12. G. Wallace. The JPEG Still Image Compression Standard, IEEE Trans. Consumer Electronics, Vol. 38, No 1, Feb. 1992.
13. M. Rabbani, R. Joshi. An Overview of the JPEG 2000 Still Image Compression Standard, Signal Processing: Image Communication, Vol. 17, Jan. 2002, pp. 3-48.
CHAPTER 14

Multiple Feature Relevance Feedback in Content-Based Image Retrieval using Probabilistic Inference Networks

Campbell Wilson and Bala Srinivasan

School of Computer Science and Software Engineering, Monash University, P.O. Box 197, Caulfield East 3145, Australia
{campbell.wilson,bala.srinivasan}@infotech.monash.edu.au
Abstract: The BIR content based image retrieval system uses a Bayesian belief network architecture to match query by example images to images in a database. This probabilistic architecture provides support for multiple image features at varying levels of abstraction. Relevance feedback may be natively implemented in the model via diagnostic inference in the Bayesian network. In this paper we describe how different relevance feedback scenarios can be dealt with in this system, in particular those involving multiple classes of image features. In addition, a feature weighting scheme is proposed in order to automatically apportion flows of diagnostic inference according to the relative importance of visual features in the images chosen as relevant by the user.
1 Content-Based Image Retrieval using Probabilistic Inference Networks

The BIR (Bayesian network Image Retrieval) system [1], [2] was developed as a research prototype to demonstrate the utility of probabilistic inference networks in image retrieval tasks. Bayesian belief networks (or more simply Bayesian networks) have been employed in textual information retrieval with encouraging results [3], [4]. In addition to the native modelling of uncertainty offered by these structures, the BIR system is constructed in such a way that multiple features at various levels of abstraction may be incorporated into the image retrieval system. A typical BIR network (Fig. 1) consists of three subnetworks: the (trivial) query subnetwork or QNet, which contains one node representing the user's query by example image; the indexing subnetwork or INet, which consists of an arbitrary number of nodes representing the extracted image features; and the database subnetwork or DNet, the nodes of which represent the images making up the database being queried. Each node in the network is associated with a belief value, initially instantiated to be the a priori probability of observing the event associated with the node (imposing a causal interpretation on the Bayesian network).
Fig. 1. Typical BIR network architecture: the query network (QNet), the indexing network (INet) with level 0, 1 and 2 feature nodes, and the database object network (DNet)
Additionally, each link A→B in the network is weighted with a value which represents the conditional probability P(B|A). The interpretation of this link weight then depends on the nodes being linked. For example, P(B|A) may represent the probability of observing a particular feature in an image (which can be derived by treating the feature histogram of the image as a probability distribution), or it may represent the probability of observing a particular image given a feature (which can be derived by applying Bayes' theorem using the local feature histogram of the image and the global feature histogram representing the distribution of the particular feature across all the images in the database). In the most general case, the INet component may consist of several layers, each representing features at a different level of abstraction, i.e. higher level features which are represented as aggregations of lower level features. For example, an object appearing in a query image may be composed of particular shapes, colours and/or textures. The network architecture is independent of the mechanisms used to extract and weight the individual features in an image. The semantic interpretation of inference in the network depicted in Fig. 1 is essentially analogous to the flow of inference in text retrieval Bayesian networks when the topology of Indrawan [4] is employed. In other words, the instantiation of the query node causes the remaining nodes in the network to have their belief values updated via evidential inference (i.e. inference in the direction of the connecting arcs). This inference is performed using a closed form approximation expression equivalent to the link tensor approach to evaluation taken in Indrawan [4], which itself is similar in nature to the noisy-OR canonical interaction model described by Pearl [5]. The updated belief values of the nodes representing the images in the database (i.e. the DNet nodes) are used directly to rank the retrieved images. The user interface to the system is shown in Fig. 2, which also shows relevant images selected by a user.
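As a hedged illustration of the two link-weight interpretations just mentioned, the sketch below derives P(feature | image) from an image's normalised feature histogram and P(image | feature) via Bayes' theorem from the local and global histograms. The histogram representation, the uniform image prior and the function names are our assumptions, not the BIR implementation itself.

# Illustrative derivation of BIR-style link weights from feature histograms.
from collections import Counter


def p_feature_given_image(image_histogram: Counter, feature) -> float:
    """P(feature | image): the image's normalised feature histogram read as a distribution."""
    total = sum(image_histogram.values())
    return image_histogram[feature] / total if total else 0.0


def p_image_given_feature(image_histogram: Counter, global_histogram: Counter,
                          feature, n_images: int) -> float:
    """P(image | feature) by Bayes' theorem, assuming a uniform prior P(image) = 1/n_images."""
    p_f_given_i = p_feature_given_image(image_histogram, feature)
    p_f = global_histogram[feature] / sum(global_histogram.values())
    return (p_f_given_i / n_images) / p_f if p_f else 0.0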
The network evaluation algorithm for a generalised multilayer BIR network is as follows:

Evaluation of the leaf nodes in the QNet. Let:
  Q_j be the jth leaf node in the QNet,
  b(Q_j) be the a priori belief in the node Q_j,
  w(Q_j) be the weight of the incoming link terminating at Q_j.
Then,
  ∀Q_j ∈ QNet:  P(Q_j | e) = w(Q_j) b(Q_j) .

Evaluation of the nodes in the INet. Let:
  F_i^j be the jth feature node at level i,
  Θ(F_i^j) be the set of parent nodes of F_i^j,
  Θ(k, F_i^j) be the kth element of Θ(F_i^j), k ∈ [0, |Θ(F_i^j)| − 1],
  b(N) be the belief value of node N,
  w(N→M) be the weight of the link from node N to M.
Then,
For i = n−2 to 1 do {
  ∀F_i^j ∈ INet_i:
  P(F_i^j | e) = b(Θ(|Θ(F_i^j)|−1, F_i^j)) · w(Θ(|Θ(F_i^j)|−1, F_i^j) → F_i^j)
    + Σ_{k=0}^{|Θ(F_i^j)|−2} b(Θ(k, F_i^j)) w(Θ(k, F_i^j) → F_i^j) · ∏_{l=k+1}^{|Θ(F_i^j)|−1} [1 − b(Θ(l, F_i^j)) w(Θ(l, F_i^j) → F_i^j)]
}

Evaluation of the nodes in the DNet to determine the final set of belief values. Let:
  D_i be the ith (leaf) database object node in the DNet,
  Θ(D_i) be the set of parent nodes of D_i,
  Θ(k, D_i) be the kth element of Θ(D_i),
  b(N) be the belief value of node N,
  w(N→M) be the weight of the link from node N to M.
Then,
  ∀D_i ∈ DNet:
  P(D_i | e) = b(Θ(|Θ(D_i)|−1, D_i)) · w(Θ(|Θ(D_i)|−1, D_i) → D_i)
    + Σ_{k=0}^{|Θ(D_i)|−2} b(Θ(k, D_i)) w(Θ(k, D_i) → D_i) · ∏_{l=k+1}^{|Θ(D_i)|−1} [1 − b(Θ(l, D_i)) w(Θ(l, D_i) → D_i)]
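The closed-form combination used for the INet and DNet nodes above can be written compactly; the sketch below assumes the reconstruction of the garbled expression is correct (it reduces to the familiar noisy-OR form 1 − Π(1 − b·w)) and is illustrative rather than the authors' code.

# Minimal sketch of the closed-form (noisy-OR style) combination for one node with
# parents Theta(0..K-1); names are assumptions for illustration.
from typing import Sequence


def combine_parents(beliefs: Sequence[float], weights: Sequence[float]) -> float:
    """P(node | e) from parent beliefs b(Theta(k)) and link weights w(Theta(k) -> node)."""
    K = len(beliefs)
    bw = [b * w for b, w in zip(beliefs, weights)]
    value = bw[K - 1]                       # leading term for the last parent
    for k in range(K - 1):                  # sum over the remaining parents
        prod = 1.0
        for l in range(k + 1, K):           # product of (1 - b*w) over later parents
            prod *= 1.0 - bw[l]
        value += bw[k] * prod
    return value


# Example: a DNet node with three parent feature nodes.
print(combine_parents([0.6, 0.3, 0.8], [0.5, 0.9, 0.4]))   # 0.65252 = 1 - 0.7*0.73*0.68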
Fig. 2. BIR user interface

Diagnostic inference (i.e. inference counter to the direction of the connecting arcs) is suppressed in the initial network evaluation in order to avoid the problem of infinite propagation
of inference which may occur since the undirected network is likely to include cycles. However, diagnostic inference in the network offers us a natural methodology by which we can implement relevance feedback. The utility of relevance feedback is well recognised within the image retrieval community. The general mechanism by which relevance feedback is implemented using diagnostic inference in BIR is described conceptually elsewhere (e.g. [6]). In this paper, we will outline more specific relevance feedback scenarios than those considered in [6], and present the algorithm for multiple feature feedback. The remainder of the paper is organised as follows: Section 2 describes the taxonomy of relevance feedback scenarios we consider in the BIR system. In section 3 we concentrate on a relevance feedback scenario which has particular importance, namely feedback based on multiple image features. We show how this feedback may be implemented in the Bayesian network context. Section 4 contains experimental results and discussion, with section 5 concluding the paper.
2 BIR Feedback Taxonomy

The implementation of relevance feedback in an image retrieval system is commonly dependent on (tightly coupled to) the system's underlying retrieval model. In the context of the Bayesian network image retrieval model, this requires consideration of the direction of probabilistic inference. Inference in Bayesian networks may be either evidential (inference in the direction of causation) or diagnostic (inference against the direction of causation). It is evidential inference via a noisy-OR type mechanism which is used to perform the initial retrieval iteration in BIR. This process results in the assignment of belief values to the database image object nodes and consequently the presentation of a ranked list of images given a particular query image. In order to perform relevance feedback via the BIR system, we employ diagnostic inference. The semantic implication of such an inference process is that we are changing our belief in the likelihood of a particular cause A given the observation of an effect B (of which A is a possible cause). This conceptual overview of the relevance feedback process in BIR is more fully outlined elsewhere [6]. Briefly, the process may be summarized as below:
1. Perform predictive inference In1 using the user’s query by example image and evaluate the belief values in all database object nodes. This gives us the initial retrieval set Ri.
2. The user chooses a set of relevant images R_r ⊆ R_i. The nodes in R_r are then used as the evidential nodes in the diagnostic inference process In2.
3. Perform predictive inference In3 using the entire representational feature node space in order to present a new ranked list of images to the user. From this point, inference In2 and In3 may be reapplied if the user desires further refinement of the result.
We have divided the relevance feedback process into several distinct categories or feedback scenarios. Firstly, we can classify the feedback scenario depending on whether or not the user wishes to weight the images chosen as relevant (i.e. indicate that they believe some images are comparatively more or less relevant than others). If the user does wish to assign a weight, then we term this scenario weighted feedback and conversely, if no weight is to be assigned, we term the feedback Boolean. Within the Boolean and weighted categories, we can further subdivide the feedback scenarios depending on whether a single feature was used for the initial retrieval or multiple features were used for the initial retrieval. The rationale behind this classification is that the consideration of relevance feedback in multi-
ple feature retrieval is significantly more complex than for single features. Furthermore, relevance feedback may be either positive (images are selected because they are relevant and the user wishes to see more images similar to the selected images), negative (images are selected because they are not relevant and the user does not wish to see more images similar to the selected images), or a combination of positive and negative feedback. This therefore leaves us with a taxonomy of BIR relevance feedback scenarios as follows:

• Single Feature Boolean/Weighted Positive Feedback
• Multiple Feature Boolean/Weighted Positive Feedback
• Single Feature Boolean/Weighted Positive and/or Negative Feedback
• Multiple Feature Boolean/Weighted Positive and/or Negative Feedback
The feedback results we have presented previously [6] have been instances of the simplest case, namely, single feature Boolean positive relevance feedback. However, since the BIR system is designed to be flexible enough to allow retrieval based on multiple features, the more interesting relevance feedback scenarios are those based on multiple features, which we will discuss in the next section.
3 Multiple Feature Feedback

We described in [2] the method by which retrieval based on multiple image features is implemented in BIR. In particular, we employ a late integration modality whereby various result integration algorithms may be employed, i.e. retrieval based on n independent features involves inference across n BIR networks. We employ the same result combination algorithm used in the initial retrieval to combine the results of relevance feedback across the different feature networks involved in the retrieval. The relevance feedback process for each feature under consideration (i.e. the inference In2+In3) is described by the following algorithm.

For each feature network, perform the network evaluation process as follows. Let:
  nI = |INet|, nD = |DNet|,
  F_i be the ith feature node in the INet (i = 0, …, nI − 1),
  Ω(F_i) be the set of child nodes of F_i representing relevant images, with Ω(k, F_i) its kth element (k = 0, …, |Ω(F_i)| − 1),
  Ψ(F_i) be the corresponding set of child nodes of F_i representing images selected for negative feedback, with Ψ(m, F_i) its mth element,
  D_i be the ith (leaf) database object node in the DNet,
  Θ(D_i) be the set of parent nodes of D_i, with Θ(k, D_i) its kth element,
  b(N) be the belief value of an arbitrary node N,
  w(N→M) be the weight of a link from a node N to a node M.
Then,
For i = 0 to nI−1 do {
  ∀Ω(k, F_i):  w(Ω(k, F_i) → F_i) := w(F_i → Ω(k, F_i)) b(F_i)
  ∀Ψ(m, F_i):  w(Ψ(m, F_i) → F_i) := w(F_i → Ψ(m, F_i)) b(F_i)

  b(F_i) = P(F_i | e) = b(F_i) + (1 − 1/|Ω(F_i)|) · { b(Ω(|Ω(F_i)|−1, F_i)) w(Ω(|Ω(F_i)|−1, F_i) → F_i)
    + Σ_{k=0}^{|Ω(F_i)|−2} b(Ω(k, F_i)) w(Ω(k, F_i) → F_i) ∏_{l=k+1}^{|Ω(F_i)|−1} [1 − b(Ω(l, F_i)) w(Ω(l, F_i) → F_i)] }
  if (b(F_i) > 1) then b(F_i) = 1

  b(F_i) = P(F_i | e) = b(F_i) − (1 − 1/|Ψ(F_i)|) · { b(Ψ(|Ψ(F_i)|−1, F_i)) w(Ψ(|Ψ(F_i)|−1, F_i) → F_i)
    + Σ_{m=0}^{|Ψ(F_i)|−2} b(Ψ(m, F_i)) w(Ψ(m, F_i) → F_i) ∏_{l=m+1}^{|Ψ(F_i)|−1} [1 − b(Ψ(l, F_i)) w(Ψ(l, F_i) → F_i)] }
  if (b(F_i) < 0) then b(F_i) = 0
}

∀D_i ∈ DNet:
  b(D_i) = P(D_i | e) = b(Θ(|Θ(D_i)|−1, D_i)) w(Θ(|Θ(D_i)|−1, D_i) → D_i)
    + Σ_{k=0}^{|Θ(D_i)|−2} b(Θ(k, D_i)) w(Θ(k, D_i) → D_i) ∏_{l=k+1}^{|Θ(D_i)|−1} [1 − b(Θ(l, D_i)) w(Θ(l, D_i) → D_i)]
We can then combine the (normalized) vectors of belief values β_i = {b(D_j): D_j ∈ DNet} using, for example, the normalized belief averaging method [2]:

T_i = (1/n) Σ_{j=1}^{n} w_j β_{ji} ,

where w_j is the weighting of feature j and T_i is the ith element of the final result vector T. The major issue to consider in this scenario is how to apportion the flow of diagnostic inference from the nodes representing the relevant images amongst the individual feature networks, in other words, the determination of the values w_j. For example, if the initial retrieval had been based on colour and texture, the diagnostic inference In2 should be performed across both the colour and the texture networks used for the initial retrieval. However, it may not necessarily be obvious from the set of images chosen by the user as relevant the degree to
which the colour and/or texture features contributed to this relevance judgment. Different approaches may be taken to this problem, namely:
1. Apply the diagnostic inference In2+In3 to each individual feature network as described. Then combine the results using the same result combination method and feature weighting scheme used for the initial retrieval.
2. Apply the diagnostic inference In2+In3 to each individual feature network as described. Then combine the results using the same result combination method used in the initial retrieval, with equal feature weighting (with the assumption in mind that the original feature weighting scheme was inadequate from the user's point of view).
3. Ask the user to specify a new feature weighting scheme, and use this scheme in the result combination module after applying the inference In2+In3 to the individual feature networks.
4. Apply the diagnostic inference In2+In3 to each individual feature network. Examine the feature histograms of the images chosen as relevant in order to automatically modify the initial feature weighting scheme so that it better reflects the particular distribution of features in those images chosen as relevant. This new feature weighting scheme would then be used in the result combination phase after the final inference In3 has been performed.
Of these four approaches, the last would seem to be the most promising. In order to perform such automatic feature weighting, we would need to assess the similarity of the relevant images in terms of their feature composition, to determine which features seem to be influencing the user's choice of images as relevant. We propose to consider the normalised histograms as points in n-dimensional feature space (where n is the number of bins in the histograms, equal to the number of feature nodes at the lowest level of the INet). The feature space which contains the cluster with the smallest radius (commonly defined as the average distance of any point in the cluster to the centroid of the cluster) may be considered to represent the dominant feature for relevance feedback. This is because a small cluster is indicative of a high correlation between the histograms of the images chosen as relevant, whereas a larger cluster means that the images in the cluster are less closely related. Furthermore, we propose that the ratio of cluster radii be used as the new feature weightings for the final result combination. An algorithm for this process is presented as follows. Let:
  n be the number of feature classes participating in the In2 inference process and m be the number of images chosen as relevant by the user,
  F_i, i = 1, …, n, represent the set of feature classes (where a feature class corresponds to one entire BIR indexing layer, e.g. colour or texture) participating in the In2 inference process,
  n(F_i) be the number of individual feature elements in feature class F_i,
  γ_j(F_i), j = 1, …, m, be the histogram vector representing the distribution of the features of feature class F_i in the jth image chosen as relevant,
  γ_C(F_i) be the centroid of the set {γ_j(F_i), j = 1, …, m},
  r(F_i) be the radius of the cluster of histograms γ_j(F_i), j = 1, …, m,
  w(F_i) be the feature weight to be assigned to feature class F_i for the relevance feedback inference In2.
1. Calculate the centroid point for each feature class:
   For k = 1 to n do:  γ_C(F_k) := (1/m) Σ_{l=1}^{m} γ_l(F_k)

2. Calculate the average (Euclidean) cluster radii:
   For k = 1 to n do:  r(F_k) := (1/m) Σ_{l=1}^{m} ‖γ_l(F_k) − γ_C(F_k)‖₂

3. Assign feature weightings to the feature classes for the relevance feedback inference In2, based on the ratio of the radii calculated in step 2:
   For k = 1 to n do:  w(F_k) := r(F_k) / max_k{r(F_k)}
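A compact sketch of steps 1-3 is given below. It sums over the m relevant images, as in the reconstruction above, and uses NumPy only for convenience; the function name and data layout are our assumptions.

# Sketch of steps 1-3: centroids, Euclidean cluster radii and radius-ratio weights.
import numpy as np


def relevance_feature_weights(histograms_per_class):
    """histograms_per_class: {feature_class: array of shape (m, n(F))} -> {class: weight}."""
    radii = {}
    for fc, gamma in histograms_per_class.items():
        gamma = np.asarray(gamma, dtype=float)
        centroid = gamma.mean(axis=0)                                # step 1
        radii[fc] = np.linalg.norm(gamma - centroid, axis=1).mean()  # step 2
    r_max = max(radii.values()) or 1.0
    return {fc: r / r_max for fc, r in radii.items()}                # step 3


# Example: colour histograms cluster tightly, texture histograms do not.
colour = [[0.8, 0.1, 0.1], [0.78, 0.12, 0.1], [0.82, 0.09, 0.09]]
texture = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.3, 0.3, 0.4]]
print(relevance_feature_weights({"colour": colour, "texture": texture}))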
In addition to the above algorithms, we have recently investigated the use of rough set theory in the assignment of feature weights. This is achieved by considering the retrieved images, the feature components Fi involved in the retrieval and the users’ relevance classification of the retrieved images as a decision system. Two feature weighting schemes are proposed in this context; attribute significance reweighting and core composition reweighting. Preliminary results of this work are presented in [7].
Fig. 3(a). BIR multiple feature feedback results – initial retrieval set. DWR collection, 64-colour quantized histogram and 5×5-gram texture; feature weighting for the initial retrieval: Colour 1.0, Texture 1.0. Of the twelve retrieved images, images 1, 3, 4 and 5 were chosen as relevant (marked with a *).

Fig. 3(b). BIR multiple feature feedback results: the twelve images returned after multiple feature relevance feedback with the weightings Colour 1.0 / Texture 1.0, Colour 1.0 / Texture 0.0, and Colour 0.0 / Texture 1.0.
4 Experimental Results

In the absence of standard image database test collections, we employed images from the California Department of Water Resources photography collection [8] for the testing of our relevance feedback approach. Figures 3(a) and 3(b) show the results of multiple feature relevance feedback on colour (RGB histogram) and texture (NM-gram texture [9]). All feature weights employed in the BIR model are considered to be in the range [0,1]. As such, figures 3(a) and 3(b) show the extremes of the weight range for the allocation of diagnostic inference to the individual feature networks. Informally, we can see that the images selected as relevant were most likely selected because they were "blue water" images. The feedback results for this example indicate that putting more weight on the colour network compared with the texture network does produce more "blue" images, post In3, as we would expect, whereas a bias towards the texture network tends to produce images broadly texturally similar to the images chosen as relevant. Additionally, we would expect that when the relevant images chosen are clustered in the appropriate feature space, the cluster radius would be smaller in the colour space compared with that in the texture space. Since we are comparing heterogeneous features, there is always an issue of the semantic gap between the numeric comparison of the results of retrieval based on these features and a comparison done on a purely visually perceptual basis. However, the proposed ratio of cluster radii is calculated on the basis of normalized belief values and is intended to serve as a significant first basis for the relevance feedback process.
5 Conclusion

Relevance feedback is viewed by many in the content based image retrieval field as particularly important, given the highly subjective nature of the image retrieval task. The BIR Bayesian network image retrieval model gives us a probabilistic framework for image retrieval which is not only suited to modelling uncertainty but offers a native paradigm for relevance feedback, namely that of diagnostic inference. We have described in this paper a late integration noisy-OR type approximation algorithm for the evaluation of such inference in BIR networks, the results of which have been encouraging when tested with users of the system. Furthermore, since relevance feedback is often performed on the basis of multiple features, we have proposed an algorithm based on the ratio of the radii of clusters formed by considering the vectors of images chosen as relevant in an appropriate feature space for the automatic assignment of weightings for the networks to be considered in the diagnostic inference feedback process. Future work in this area includes the refining and further testing of the cluster radii scheme. Another promising area of research in the BIR model context is the idea of persistent relevance feedback or user profiling which may also be used to tailor the image retrieval results more specifically to individual users.
References

[1] C. Wilson, B. Srinivasan and M. Indrawan, "A General Inference Network Based Architecture for Multimedia Information Retrieval", Proc. ICME2000, New York City, pp. 347-350, 2000.
[2] C. Wilson, B. Srinivasan and M. Indrawan, "BIR – The Bayesian Network Image Retrieval System", Proc. ISIMP2001, Hong Kong, pp. 304-307, 2001.
[3] H. Turtle, Inference Networks for Document Retrieval, PhD Dissertation, The University of Massachusetts, 1991.
[4] M. Indrawan, A Framework for Information Retrieval Based on Bayesian Networks, PhD Dissertation, Monash University, 1998.
[5] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, 1988.
[6] C. Wilson, B. Srinivasan and M. Indrawan, "Relevance Feedback in an Image Retrieval System based on Bayesian Networks", Proc. Sixth International Conference on Control, Automation, Robotics and Vision, ICARCV2000, Singapore, 2000.
[7] S. Zutshi, C. Wilson, B. Srinivasan and S. Krishnaswamy, "Modelling Relevance Feedback using Rough Sets", Proc. ICAPR2003, Calcutta, India, 2003.
[8] C. Carson and V.E. Ogle, "Storage and Retrieval of Feature Data for a Very Large Online Image Collection", IEEE Computer Society Bulletin of the Technical Committee on Data Engineering, vol. 19, no. 4, pp. 19-27, 1996.
[9] A. Soffer, "Image Categorization using Texture Features", Proc. 4th International Conference on Document Analysis and Recognition, Ulm, pp. 233-237, 1997.
CHAPTER 15

Non-Iterative ICA for Detecting Motion in Image Sequences

Yu Wei¹ and Charayaphan Charoensak²

¹ School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798. E-mail: [email protected]
² School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798. E-mail: [email protected]
Abstract: The ICA (Independent Component Analysis) technique can be used to detect differences in a sequence of images. However, most implementations of this algorithm are highly computation intensive. In this chapter, non-iterative algorithms for ICA are introduced and a non-iterative algebraic ICA is selected to detect motion in image sequences. Simulation results in MATLAB are compared with direct image subtraction, and an FPGA (Field Programmable Gate Array) implementation is used to speed up the computation of this non-iterative algebraic ICA. A system level approach to the design of an FPGA prototype is explained. FPGA allows short development time and enables verification of algorithms in hardware at low cost.

Keywords: Image Segmentation, Algebraic ICA, FPGA, Non-Iterative
1 Introduction Techniques for segmenting objects in image sequences may be grouped into three main categories: (1) intra-frame segmentation, (2) motion field cluster segmentation, and (3) frame differencing [10]. In the first approach, each frame in the sequences is segmented separately. Image segmentation methods using color, texture, and intensity matching are applied to every image frame. This technique works well when it produces a small number of regions; i.e. in images with simple compositions. For more complicated images, however, over-segmentation often occurs. In the second technique, the motion field is used for segmentation. One such criterion for grouping the pixels in the motion field into a region is based on similar motion vectors. This approach offers good performance when a reliable motion field can be obtained from image motion estimation. However, motion estimation techniques may not produce accurate results, especially in real world applications. In the third approach, image segmentation is computed partially using the difference between image frames. This technique is very easy to implement and works rather well in
simple image sequences. However, the result achieved is often sensitive to noise and the technique may fail completely if there is global motion or an abrupt scene change. In recent years, ICA (Independent Component Analysis) has been successfully applied in various signal processing applications such as audio signal processing [5], EEG [6], ECG, watermarking, and financial signal analysis [3]. Due to its high computational cost, ICA has not been applied very successfully in real time applications. Yamaguchi and Itoh [8] have proposed an algebraic algorithm to compute ICA without requiring iteration. Using this technique, the speed of ICA computation can be drastically improved. In section 3, experimentation results of this technique when applied to some image sequences are presented. Its performance in comparison to direct image subtraction is described. A system level design of the FPGA for the computations of the algebraic ICA is explained in section 4. Optimal parallelism, which is needed to handle the high computation load in real-time, is easily achieved through the implementation using FPGA.
2 Independent Component Analysis

2.1 ICA introduction

ICA computation of a random vector x consists of finding a linear transform s = Wx so that the components s_i are as independent as possible, in the sense of maximizing some function F(s₁, …, s_m) that measures independence [1]. The observed signals x₁, …, x_m are assumed to be linear mixtures of statistically independent sources s₁, …, s_n, which can be expressed as:

x_i = Σ_{j=1}^{n} a_ij s_j = a_i1 s₁ + a_i2 s₂ + ... + a_in s_n ,  1 ≤ i ≤ m ,

or in matrix form:

x = As .    (1)
The only observed signal is the random vector x. Both A and s must be estimated using the available signal x. It is impossible to solve for A and s using only the equation x = As. The key to estimating the ICA model is non-Gaussianity. The Central Limit Theorem, a classical theory in probability, provides a way to solve this problem. This theorem shows that the distribution of a sum of independent random variables tends to move toward a Gaussian distribution under certain conditions. Thus, a sum of two independent random variables usually has a distribution that is closer to Gaussian than either of the two original random variables. The statistical properties such as consistency, asymptotic variance, and robustness achieved from the ICA method depend on the choice of the objective function and the algorithm implementation. Other performance characteristics such as convergence speed, memory requirements, and numerical stability depend on the optimization algorithm. We may write:
ICA = Objective function + Optimization algorithm .    (2)
There are many implementations of ICA models based on various assumptions regarding the characteristics of noise and the source densities. Two basic approaches are commonly used to solve the problem of slow calculation. One approach is to use greedy algorithms to let the data converge as soon as possible. The other is to avoid the time-consuming iteration. ICA implementations based on the latter approach are not very well studied in spite of their potential advantage of fast execution.

Table 2.1. Non-iterative ICA methods
Name: Algebraic ICA [8]
  Advantages: Very fast
  Assumption: None
  Disadvantages: Only supports two independent components

Name: MS-ICA (Molgedey-Schuster ICA) [4]
  Advantages: Fast
  Assumption: Non-vanishing auto-correlation functions of the independent sources, which can be Gaussian
  Disadvantages: Not directly guaranteed to minimize the likelihood; may not achieve the minimum generalization error

Name: Scatter diagram of mixed signals [9]
  Advantages: Very fast
  Assumption: The density of the data around the apexes is not low
  Disadvantages: Transforms the scatter diagram precisely only when the density of the data around the apexes is not low
Table 2.1 lists three non-iterative ICA methods described in the literature. Among the three, only algebraic ICA is not based on any assumptions regarding the input data. This is one of the reasons why this algorithm was selected for our motion detection application. The main disadvantage of this algorithm is that it supports only two independent components. Higher order data are not supported by this algorithm. The details of the algebraic ICA algorithm are introduced in the following section.
2.2 Algebraic ICA algorithm

Most ICA algorithms require iteration, making use of the learning ability of an artificial neural network and optimization techniques such as the fixed-point method [2]. Yamaguchi and Itoh [8] proposed an algebraic ICA algorithm which does not require iteration. The mathematical model of two-dimensional ICA is defined as follows. Two observed signals X₁ and X₂ are given as:
| X₁ |   | 1  a | | S₁ |
| X₂ | = | b  1 | | S₂ |    (3)
where a and b are unknown mixing coefficients. S1 and S2 are two unknown original independent signals. A series of constants is defined below:
C₁ = E(X₁²) − [E(X₁)]²
C₂ = E(X₂²) − [E(X₂)]²
C₃ = E(X₁X₂) − E(X₁)E(X₂)
C₄ = E(X₁⁴) − E(X₁³)E(X₁)
C₅ = E(X₁³X₂) − E(X₁³)E(X₂)
C₆ = E(X₁³X₂) − E(X₁²X₂)E(X₁)
C₇ = E(X₁²X₂²) − E(X₁²X₂)E(X₂)
C₈ = E(X₁²X₂²) − E(X₁X₂²)E(X₁)
C₉ = E(X₁X₂³) − E(X₁X₂²)E(X₂)
C₁₀ = E(X₁X₂³) − E(X₁)E(X₂³)
C₁₁ = E(X₂⁴) − E(X₂³)E(X₂)    (4)
From this definition of ICA, two basic rules can be made, as shown in Equations (5) and (6):

E(S₁S₂) = E(S₁)E(S₂) ,    (5)
E(S₁³S₂) = E(S₁³)E(S₂) .    (6)

From Equations (3) and (5), b can be expressed as:

b = (aC₂ − C₃) / (aC₃ − C₁) .    (7)

From Equations (6) and (7), b can be eliminated. A fourth-order equation for a is left:
Non-Iterative ICA for Detecting Motion in Image Sequences
213
(C2 C10 C11C3 ) u a 4 (3C3C9 3C2 C8 C3C10 C1C11 ) u a 3 (3C2 C6 3C3C8 3C1C9 3C3C7 ) u a 2
(8)
(C3C5 3C1C7 3C3C6 C2 C4 ) u a (C3C4 C1C5 )
0
Therefore, a and b can be solved using Equation (8). There should be four solutions to Equation (8). The following rules are used to screen out three solutions and keep only one:
• Real solution,
• Positive solution,
• Minimized absolute value of a + b.
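For illustration, the whole algebraic procedure of Eqs. (3)-(8) fits in a short routine: estimate C₁..C₁₁ by sample means, solve the quartic (8), screen the roots and recover b from Eq. (7). The NumPy-based sketch below is an assumption-laden illustration, not the authors' MATLAB or FPGA implementation.

# Sketch of non-iterative algebraic ICA for two mixed signals (Eqs. (3)-(8)).
import numpy as np


def algebraic_ica(x1, x2):
    x1, x2 = np.asarray(x1, float).ravel(), np.asarray(x2, float).ravel()
    E = lambda v: float(np.mean(v))
    C1 = E(x1**2) - E(x1)**2
    C2 = E(x2**2) - E(x2)**2
    C3 = E(x1*x2) - E(x1)*E(x2)
    C4 = E(x1**4) - E(x1**3)*E(x1)
    C5 = E(x1**3*x2) - E(x1**3)*E(x2)
    C6 = E(x1**3*x2) - E(x1**2*x2)*E(x1)
    C7 = E(x1**2*x2**2) - E(x1**2*x2)*E(x2)
    C8 = E(x1**2*x2**2) - E(x1*x2**2)*E(x1)
    C9 = E(x1*x2**3) - E(x1*x2**2)*E(x2)
    C10 = E(x1*x2**3) - E(x1)*E(x2**3)
    C11 = E(x2**4) - E(x2**3)*E(x2)

    # Quartic coefficients of Eq. (8), highest power first for numpy.roots.
    coeffs = [C2*C10 - C11*C3,
              3*C3*C9 - 3*C2*C8 - C3*C10 + C1*C11,
              3*C2*C6 + 3*C3*C8 - 3*C1*C9 - 3*C3*C7,
              C3*C5 + 3*C1*C7 - 3*C3*C6 - C2*C4,
              C3*C4 - C1*C5]
    roots = np.roots(coeffs)

    # Screening rules: real, positive, minimum |a + b|.
    best = None
    for r in roots:
        if abs(r.imag) > 1e-9 or r.real <= 0:
            continue
        a = r.real
        b = (a*C2 - C3) / (a*C3 - C1)        # Eq. (7)
        if best is None or abs(a + b) < abs(best[0] + best[1]):
            best = (a, b)
    a, b = best
    # Invert the 2x2 mixing matrix [[1, a], [b, 1]] to recover the sources.
    s1 = (x1 - a*x2) / (1 - a*b)
    s2 = (x2 - b*x1) / (1 - a*b)
    return a, b, s1, s2

For motion detection, x₁ and x₂ would be two flattened grayscale frames; one recovered source then approximates the common image content and the other the frame difference.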
3 Performance Analysis

This section presents the performance comparison of the two frame difference detection techniques, ICA and direct image subtraction. The video frames used in the simulation were randomly recorded from local television channels. Fig. 3.1 (a) and (b) show two images from an image sequence with abrupt global and local brightness changes. In (a), the image brightness is increased and a dark patch is added in the middle-left corner using an image editing program. The algebraic ICA is then performed and the two output images are shown in (c) and (d). The result from direct image subtraction is shown in (e). Finally, (f) shows the difference between (d) and (e). The output in (c) shows the common component between the two images in (a) and (b). The other output in (d) shows their difference. More details in the difference can be seen from the ICA result in (d) than in the direct subtraction in (e). For example, the building in the background behind the reporter is clearly visible.
Fig. 3.1. (a) and (b) are input frames where frame (a) has abrupt changes in both global and local brightness, (c) and (d) are ICA output results, (e) is the result from direct subtraction of (a) and (b), and (f) is the difference between (d) and (e)
Fig. 3.2. (a) and (b) are input frames where there is abrupt change due to a scene switch, (c) and (d) are ICA output results, (e) is the result from direct subtraction of (a) and (b), and (f) is the difference between (d) and (e)

Fig. 3.2 (a) and (b) show two images from an image sequence with global changes due to an abrupt scene switch. The algebraic ICA is performed and the two output images are shown in (c) and (d). As before, the result from direct image subtraction is shown in (e) and the difference between (d) and (e) is shown in (f). When comparing the results between (d) and (e), it can be seen that the ICA performs better. Since two images are more independent when there is a global change, the ICA results show that the global change is almost equal to the first frame in (a). Fig. 3.3 (a) and (b) show the case of typical motion in normal video without illumination change or scene switch. The ICA result shown in (d) is comparable to that of direct subtraction in (e). Some areas such as the bottom-right corner of (d) show a slightly clearer edge when compared to the same area in (e).
It is thus shown that ICA generally performs better than direct image subtraction in all conditions, although direct image subtraction is much simpler.
Fig. 3.3. (a) and (b) are input frames from a normal video sequence, (c) and (d) are ICA output results, (e) is the result from direct subtraction of (a) and (b), and (f) is the difference between (d) and (e)

Table 3.1 shows the time needed for the computation of the non-iterative algebraic ICA in the above simulations. The simulation program was written using MATLAB 6.1 and the computer used is a Pentium II, with a 400MHz clock and 256MB of RAM.
Table 3.1. Simulation time of the non-iterative algebraic ICA

Two 388×284 pixel images   Fig 3.1   Fig 3.2   Fig 3.3
Time (s)                   1.26      1.21      1.21
First running (s)          1.63      1.59      1.59
With FPGA hardware acceleration, the algebraic ICA implementation can achieve real-time performance, as will be shown in the next section.
4 FPGA Implementation

This section presents a system level approach to FPGA design for the algebraic ICA implementation explained earlier.
4.1 System level FPGA hardware design

In this experimentation, the FPGA design tools used were Xilinx System Generator version 2.1, Simulink, and MATLAB version 6.1. The FPGA synthesis tool used was Xilinx ISE 4.1i. System Generator from Xilinx [7] provides a bit-true and cycle-true simulation of the Simulink models under the MATLAB environment. It provides a very convenient platform for testing the algorithms before prototyping in real hardware. In this experimentation, a portion of the design is realised as a MATLAB program. This program is implemented as an s-function for simulation under Simulink together with the System Generator design. The two input images are stored in two RAM blocks in the FPGA.
Fig. 4.1. Top-level design of algebraic ICA algorithm using System Generator under MATLAB SimulinkTM
Fig. 4.2. Circuit design of computation in Equation 2.4 The top-level design of the motion detection based on a non-iterative, algebraic ICA using System Generator is shown in Fig 4.1. A more detailed circuit design of the portion, which implements the computation in Equation 2.4, is shown in Fig 4.2. The FPGA simulation was then carried out to test the design. The output of the simulation show that the FPGA design performs the motion detection as expected.
Non-Iterative ICA for Detecting Motion in Image Sequences
219
After that, the System Generator was used to automatically generate a set of VHDL source codes from the designs. The VHDL codes were then used to synthesize the FPGA using the Xilinx ISE 4.1i development tool. The Xilinx Virtex-E family of FPGA was chosen for synthesis with optimization set for speed and the design required 90,200 gates. The testing of the designed FPGA in prototype board is given in the next subsection.
4.2 Testing algebraic ICA in prototype board The Xilinx Virtex-E family, fabricated on a 0.18 um six-layer metal silicon process, provides high speed and density performance. The Virtex-E family combines advanced DLL (Delayed-locked loop) with high-performance Select I/O technologies to deliver over 311 Mbits/second I/O speed performance (www.xilinx.com/partinfo/ds022-1.pdf). The testing of the FPGA was done using a prototype board equipped with a 600K gate Virtex-E FPGA, XCV600E-6HQ240, with a speed grade 6. The chip is in a 240-pin package with 158 user I/O pins. A digital input/output interfacing adapter was used for transferring data between the computer and the prototype board. MATLAB SimulinkTM was used to control the operation of the FPGA and to read the data stream from the FPGA in the prototype board using the input/output adapter. The rate of the data stream is thus less than real-time operating frequency. This means that the testing was not in real-time but with real functioning of the FPGA. The estimated achievable maximum processing speed of 20 frames per second is reported by ISE 4.1i after the FPGA routing. The two input images in Fig 3.1, (a) and (b), were used for testing the FPGA prototype and the motion detection results achieved were identical to the simulation result shown in Fig 3.1, (c) and (d).
5 Conclusion

In this chapter, we have shown that a non-iterative algebraic ICA technique can be used to successfully detect differences in a sequence of images. Requiring no iteration, the algorithm is simple and thus suitable for real-time hardware implementation. The performance and implementation of the FPGA has also been discussed. It has been estimated that the design for non-iterative algebraic ICA can easily achieve a real-time speed of 20 frames/second. Using FPGA for hardware implementation allows quick prototyping for algorithm verification. Although the price/performance of designs using FPGAs often lags behind those using ASICs (Application Specific Integrated Circuits), the start-up cost is much lower. The newer generations of FPGA promise improvements that are closing the gap between the two technologies. The developed motion detection hardware requires only a small FPGA (90,200 gates) and so the device cost is low. The solution can be added to equipment such as digital cameras, security cameras, etc., without excessive additional production cost.
References

1. A. Hyvärinen (1997) Survey on independent component analysis. http://www.cis.hut.fi/aapo/
2. A. Hyvärinen, E. Oja (1997) A fast fixed-point algorithm for independent component analysis, Neural Computation, 9(7):1483-1492
3. K. Kiviluoto, E. Oja (1998) Independent component analysis for parallel financial time series. In Proc. ICONIP'98, volume 2, pages 895-898, Tokyo, Japan
4. L. Molgedey, H.G. Schuster (1994) Separation of a mixture of independent signals using time delayed correlations, Physical Review Letters, Volume 72, Number 23, June
5. P. Smaragdis (1997) Efficient blind separation of convolved sound mixtures, IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October
6. R. Vigário (1997) Extraction of ocular artifacts from EEG using independent component analysis, Electroenceph. clin. Neurophysiol., 103(3):395-404
7. Xilinx Inc. (2001) Xilinx System Generator v2.1 for MathWorks Simulink: Quick Start Guide
8. T. Yamaguchi, K. Itoh (2000) An algebraic solution to independent component analysis, Optics Communications 178, pp. 59-64
9. T. Yamaguchi, K. Hirokawa, K. Itoh (2000) Independent component analysis by transforming a scatter diagram of mixtures of signals, Optics Communications 173, pp. 107-114
10. J. Zhang, W. Lin (2001) Real time image sequence segmentation using curve evolution, Proceedings of SPIE Vol. 4303
CHAPTER 16

A Practical Approach to the Chord Analysis in the Acoustical Recognition Process

Marcin Szlenk¹ and Wladyslaw Homenda²

¹ Institute of Control and Computation Engineering, Warsaw University of Technology, 00-665 Warsaw, Poland, [email protected]
² Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-661 Warsaw, Poland, [email protected]
Abstract: The identification of simultaneously sounding notes is one of the key problems for music recognition. In this paper we outline the localization of this problem in the whole process of music recognition and propose a practical solution. Keywords: music recognition, acoustical recognition, chord recognition, automatic identification, spectrum analysis
1 Introduction

1.1 Problem Definition

Music recognition is the mathematical analysis of an audio signal generated by musical instruments and its conversion into musical notation. The input data for the system of music recognition is a digitally sampled signal representing the analogue sound waveform generated by musical instruments. A typical example of such data is the contents of a CD-Audio disc. From the user's point of view, the operation of an automated music recognition system is as follows: in response to digital sound data at the input, the system returns a musical score at the output. The input sound data can be received as a piece of music performed by only one musician, playing one instrument (e.g. a piano sonata, a violin sonata etc.) or as multi-instrument music, i.e. as a piece of music performed by many musicians (e.g. a string quartet or chamber orchestra). Multi-instrument music can be seen as a union of single-instrument music and in this case the recognition system could be based on a system recognizing music performed on only one instrument. Such a solution would require the separation of input data into the parts played on single instruments. The implementation of this idea is, however, a long way off. The problem of single-instrument music recognition is still a challenge without, as yet, a satisfactory solution. The more complex the piece of multi-instrument music is, the more difficult it is to recognize solo instrument signals because they interfere with each other, which moves
the problem of music recognition to a more complex class¹. On the other hand, conversion of an acoustic signal into digital format distorts the music data, which raises other issues to be dealt with. These arguments allow for the conclusion that multi-instrument music recognition is still over the horizon of current technological development. Therefore, in this paper the discussion is restricted to the problem of single-instrument music only.
1.2 Music Recognition Stages In the process of generating a musical score from an input audio signal, there are two distinct stages (see Fig. 1):
• acoustical recognition, and
• music analysis.
The aim of acoustical recognition is to determine the number of simultaneously sounding notes and to establish their pitches and time parameters, i.e. the start time and duration. The result of this stage gives sufficient information to generate a proper sequence of MIDI (Musical Instrument Digital Interface) commands. The MIDI standard is primarily used to control digital musical instruments [9]. The MIDI commands contain only information about the occurrence of a certain event at a given moment (e.g. the Note On and Note Off events define the starting and ending times of a note); they carry no additional information about the real sound waveform. Music analysis, by contrast, determines the information required to generate a musical score from the data received from the acoustical recognition stage (e.g. in the form of a MIDI file). It includes the recognition of tempo, tonality, note values, dynamics and other musical characteristics which can be reflected in printed music.
Fig. 1. Music recognition stages
1.3 Acoustical Recognition The music recognition stages above can be implemented completely independently because both solve different technological problems. This paper deals with acoustical recognition and, more precisely, the problem of simultaneously sounding note recognition. Regardless of the way in which the acoustical recognition stage is implemented, there is always the issue of how to recognize the number and determine the pitches of simultaneously sounding notes. Let us consider the simplest schema of acoustical recognition. The input sequence of samples is divided into short time periods and then the notes employed in consecutive periods are recognized. Then, notes with the same pitch, recognized in successive periods of time, are glued together. Below, we present a more formal description of this solution:
The input samples of sound can be observed by moving a window of a certain width over them. Let us assume that Z denotes the set of notes which are currently being played. The operation of inserting a note into the set Z is accompanied by generating a MIDI event indicating the starting time of the note as the moment corresponding to the current position of the window. By analogy, the operation of removing a note from the set Z is accompanied by generating an event indicating its ending time. With the initially empty set Z, the following steps are performed:
1. Place the window at the beginning of the sequence of samples.
2. Define the set Z' of notes occurring in the fragment visible in the window.
3. For all the notes in set Z perform:
• If the note n currently being played is not among the identified notes (n ∈ Z and n ∉ Z') then remove it from set Z.
4. For all the notes in set Z' perform:
• If the recognized note n' is not among the notes currently being played (n' ∈ Z' and n' ∉ Z) then insert it into set Z.
5. If the analyzed fragment was the last one, then remove the remaining notes from set Z and finish. If not, shift the window to the next position (move it a distance equal to the window's width) and repeat the above steps, starting at 2.
The solution to step 2 in the above algorithm is the main objective of this paper. The above algorithm requires further explanation. We should take into account that the width of the window, which we move over the input data, determines the time resolution of the output MIDI information. As the window is narrowed, we can identify start times and note durations more accurately. For most musical compositions a width of 50 ms is sufficient. Obviously, the narrower the window we use, the better the resolution we get, but the correct recognition of notes becomes more difficult.
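The window-tracking loop above translates almost line by line into code. The sketch below is a minimal illustration only: the function recognize_notes (standing in for step 2, the subject of the rest of the paper) and the event callback emit (standing in for MIDI output) are hypothetical names introduced here, not part of the original system.

```python
def track_notes(samples, window_len, recognize_notes, emit):
    """Slide a fixed-width window over the samples and convert per-window
    note sets into note-on / note-off events (Sect. 1.3)."""
    playing = set()                               # the set Z of notes currently sounding
    pos = 0
    while pos < len(samples):
        frame = samples[pos:pos + window_len]
        detected = set(recognize_notes(frame))    # the set Z' for this window
        for note in playing - detected:           # n in Z but not in Z': note ended
            emit("note_off", note, pos)
        for note in detected - playing:           # n' in Z' but not in Z: note started
            emit("note_on", note, pos)
        playing = detected
        pos += window_len                         # shift by the window width
    for note in playing:                          # close notes still sounding at the end
        emit("note_off", note, len(samples))
```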
2 Musical Instrument Sound 2.1 Introduction Sound generated by musical instruments is an acoustic wave. Such a wave has a complex structure that does not have a precise mathematical model. In this paper we discuss selected aspects of the sound generated by musical instruments. Mathematical modelling approximates the aspects explored in the paper and provides tools for analysing selected features of musical instrument sounds.
2.2 Tone and Frequency The sound of a musical instrument is an acoustic wave at a certain frequency, or rather a series of such waves with differing frequencies interfering with each other. When analyzing a single sound generated by one instrument, e.g. the sound generated by a piano string in reaction to hitting one key of the keyboard, a sinusoidal wave of fundamental frequency f (also called the first harmonic) and its higher harmonics, i.e. sinusoidal waves of frequencies equal to f, 2f, 3f, 4f, etc., can be detected. All these waves are generated by the vibrating string, since not only the whole string vibrates but also its parts of lengths equal to 1/2, 1/3, 1/4, etc. We can therefore interpret the sound generated by a string as a collection
of its harmonic partials. On the other hand, all harmonic partials aggregated create a periodic wave with a complex shape and frequency f. Thus, the frequency of the first harmonic is taken as the frequency of the sound. The set of amplitude values of the harmonic partials is called the harmonic spectrum of the sound. The spectrum determines the timbre and colour of the sound and differentiates between the same tones generated by different instruments, e.g. violin, flute, piano, etc. For instance, in the spectrum of the clarinet the lower odd harmonic partials are much stronger than the others [1] (see Fig. 2). The spectrum of a given instrument may change depending on the fundamental frequency of a tone. It may even happen that the presence of a given harmonic partial depends on the fundamental frequency. However, such changes are usually rather small and, in most cases, do not affect music recognition systems.
2.3 Musical Scale Frequency, as a physical parameter of a sound, corresponds to the ear's impression of pitch: we experience lower and higher tones. Some tones are mapped to the musical scale. For instance, the 440 hertz frequency corresponds to the note A4 – the symbol A4 denotes tone A in octave number 4. The 880 hertz frequency corresponds to the note A5, i.e. to note A in octave number 5. Thus we have the following sequence of tones A: A1 – 55 hertz, A2 – 110 hertz, A3 – 220 hertz, A4 – 440 hertz, A5 – 880 hertz, A6 – 1760 hertz, A7 – 3520 hertz, A8 – 7040 hertz, A9 – 14080 hertz. All these tones are denoted by the letter A since the ear's experience is very similar for all of them. Note that higher A tones are harmonics of lower A tones. The frequencies between two neighbouring A tones are split into 12 intervals called halftones. The notes corresponding to consecutive halftones are: A, A♯, B, C, C♯, D, D♯, E, F, F♯, G, G♯. The proportion of the frequencies corresponding to two consecutive halftones is equal to 2^(1/12) : 1 (see Table 1).
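The halftone ratio can be checked numerically: the frequencies of Table 1 follow from the single reference tone A4 = 440 Hz. The helper below is a small sketch; the function name and the choice of A4 as the anchor are ours, not the chapter's.

```python
RATIO = 2 ** (1 / 12)          # frequency ratio between two consecutive halftones
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_frequency(name, octave, a4=440.0):
    """Frequency (Hz) of a note in equal temperament, anchored at A4 = 440 Hz."""
    semitones_from_a4 = (octave - 4) * 12 + NAMES.index(name) - NAMES.index("A")
    return a4 * RATIO ** semitones_from_a4

print(round(note_frequency("A", 2), 2))   # 110.0
print(round(note_frequency("C", 4), 2))   # 261.63, matching Table 1
```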
2.4 Monophony vs. Polyphony Monophony is music having only one voice line. Roughly speaking, this means that at any time at most one note can be played. By contrast, polyphony is music with many voice lines, which means that many notes may be played at a time. Recognition of monophonic music is much easier than recognition of polyphonic music. In the case of monophonic music, the input sound signal can contain only a single harmonic structure (see Sect. 2.2). Excluding the usually small inharmonic partials and additional noise, such a sound signal may be approximately described by
Fig. 2. Clarinet sound spectrum (f = 233.08 Hz); amplitude plotted against frequency for harmonics up to about 20f.
Table 1. Notes and their frequencies (Hz) [1]

C2         65.41    C3        130.81    C4        261.63    C5        523.25
D♭2/C♯2    69.30    D♭3/C♯3   138.59    D♭4/C♯4   277.18    D♭5/C♯5   554.37
D2         73.42    D3        146.83    D4        293.66    D5        587.33
E♭2/D♯2    77.78    E♭3/D♯3   155.56    E♭4/D♯4   311.13    E♭5/D♯5   622.25
E2         82.41    E3        164.81    E4        329.63    E5        659.26
F2         87.31    F3        174.61    F4        349.23    F5        698.46
G♭2/F♯2    92.50    G♭3/F♯3   185.00    G♭4/F♯4   369.99    G♭5/F♯5   739.99
G2         98.00    G3        196.00    G4        392.00    G5        783.99
A♭2/G♯2   103.83    A♭3/G♯3   207.65    A♭4/G♯4   415.30    A♭5/G♯5   830.61
A2        110.00    A3        220.00    A4        440.00    A5        880.00
B♭2/A♯2   116.54    B♭3/A♯3   233.08    B♭4/A♯4   466.16    B♭5/A♯5   932.33
B2        123.47    B3        246.94    B4        493.88    B5        987.77

x(t) = Σ_k A_k(t) sin(kωt + φ_k),        (1)
where x(t) is the sound signal in the time domain, ω is the fundamental (angular) frequency of the current note, A_k(t) is the amplitude of the kth harmonic at time t, and φ_k is the phase of the kth harmonic. In this case acoustical recognition amounts to detecting the fundamental frequency, finding the note corresponding to it (bearing in mind that the detected fundamental frequency usually differs slightly from that defined in Table 1) and establishing the note duration. Recognition of polyphonic music poses a more complicated problem. Since many notes may sound at the same time, the analysis of the spectrum must consider many harmonic structures corresponding to different simultaneous notes. Harmonic partials of different notes may superimpose on each other, so the analysis of such data is much more difficult than in the case of monophony.
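Equation (1) can be illustrated by synthesizing a simple monophonic tone as a sum of harmonics. The snippet below is only a sketch of the signal model, not of any particular instrument: the amplitudes are arbitrary placeholders, the phases are set to zero, and the amplitudes are held constant over time.

```python
import numpy as np

def harmonic_tone(f0, amplitudes, fs=44100, duration=0.05):
    """Synthesize x(t) = sum_k A_k * sin(2*pi*k*f0*t) per Eq. (1),
    with constant amplitudes and zero phases."""
    t = np.arange(int(fs * duration)) / fs
    x = np.zeros_like(t)
    for k, a_k in enumerate(amplitudes, start=1):
        x += a_k * np.sin(2 * np.pi * k * f0 * t)
    return x

# A 50 ms test tone with fundamental A4 and four decaying harmonics
tone = harmonic_tone(440.0, amplitudes=[1.0, 0.5, 0.3, 0.2])
```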
3 Mathematical Apparatus 3.1 Frequency Analysis One of the most popular tools for determining the harmonic contents of a signal is the Fourier transform. For analyzing a signal by virtue of its N input samples (a sequence x(n)), the so-called discrete Fourier transform (DFT) is used, defined as

S(m) = Σ_{n=0}^{N−1} x(n) e^(−i2πnm/N),   m = 0, 1, 2, . . . , N − 1.        (2)
For N input samples in the time domain, the DFT establishes the harmonic contents of the input signal at N equally spaced points of the frequency axis. For a given m, the value S(m) of the DFT is the value of the spectrum at the frequency

f_analysis(m) = m f_s / N,        (3)
where f_s denotes the frequency used to sample the input signal. For example, in the case of data coming from CD-Audio this frequency is equal to 44.1 kHz. When we examine signals in the frequency domain we are usually interested in the values of their power in comparison to the power of another signal. If we show the instantaneous power of the signal, represented by successive values of the DFT, the easiest way is to compare them with the partial of the highest power:

S_power(m) = 20 log₁₀ ( |S(m)| / |S(m_max)| ) dB,        (4)

where m_max is the index of the DFT value with the highest power. This is the so-called normalized decibel scale. Obviously the highest value on such a normalized scale is 0 dB.
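Equations (2) to (4) translate directly into a few lines of NumPy. The fragment below is a minimal sketch with our own variable names; it uses the real-input FFT, which returns the non-redundant half of the DFT values S(m).

```python
import numpy as np

def power_spectrum_db(frame):
    """Normalized decibel power spectrum of one frame (Eqs. 2 and 4)."""
    S = np.fft.rfft(frame)                     # DFT values S(m) for a real-valued frame
    mag = np.abs(S)
    mag[mag == 0] = np.finfo(float).tiny       # avoid log10(0)
    return 20 * np.log10(mag / mag.max())      # 0 dB for the strongest partial

def bin_frequency(m, n_samples, fs=44100.0):
    """Frequency of DFT bin m, Eq. (3): f = m * fs / N."""
    return m * fs / n_samples
```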
3.2 Spectrum Analysis As a result of the frequency analysis we obtain information about the harmonic contents of the input signal. By virtue of this information we would like to find out which note, or simultaneously sounding notes, correspond to the received harmonic structure. This problem appears to be very interesting especially if we do not know the number of simultaneously sounding notes and we do not know anything about the instrument from which the input signal comes. Tanguiane [7] reduced this problem to the search for an appropriate deconvolution of the chord spectrum. Only an outline of this approach will be presented here. In practice, spectrum analysis is restricted to a certain frequency range. The lower limit is defined by the fundamental frequency of the lowest note which is to be recognized, and the upper limit is a consequence of the frequency used to sample the input signal (the spectrum of a discrete signal is a periodic function with the period equal to the sampling rate). Let us divide the frequency range which we analyze into bands wherein the signal power is represented by one value. By a discrete spectrum we understand an expression of the form

S = S(x) = Σ_{n=0}^{N−1} S_n δ(x − n) = Σ_{n=0}^{N−1} S_n δ_n,        (5)
where N is the total number of frequency bands; n is the index of a frequency band; Sn is a value interpreted as the signal power in the nth frequency band; δn is the Dirac delta function, i.e. the unit impulse at the nth frequency. For the given discrete spectrum S we can introduce the conception of Boolean spectrum (associated with S) defined by
0 if Sn = 0, (6) s = s(n)δn , s(n) = 1 if Sn = 0. n √ Each successive note on the musical scale has the frequency 12 2 times larger than the preceding one (see Sect. 2.3). It means that equal distances of notes’ pitches (e.g. the interval equal to one octave) do not correspond to equal distances on the frequency axis. For example, the musical distances between the notes A3 –A4 and A4 –A5 are equal to an octave but, in terms of their fundamental frequencies, these distances are respectively 220 and 440 Hz. The corresponding distances on both scales may be achieved by rescaling the frequency axis with the log2 function. The index of the frequency band wherein falls a frequency f is defined by the formula 2
The spectrum of a discrete signal is a periodic function with the period equal to the sampling rate.
A Practical Approach to the Chord Analysis
227
' & f (7) n = C log2 + 0,5 , f0 where C is the constant, equal to the number of frequency bands per octave; f0 is the middle of the frequency band with the index 0. For log2 -scaled frequency axis a Boolean spectrum s of a sound of several simultaneously sounding notes can be treated as generated by multiple translations of a Boolean spectrum of one note. The number of notes and their musical interval (precisely, the number of frequency bands) in relation to a note with the spectrum t is defined by the interval distribution i. Thus the spectrum s has the form of the following s=
s(n)δn = t ∗ i,
(8)
n
where both t and i are Boolean spectra. Generally, each Boolean spectrum s can be represented as s = t ∗ i + ε − λ.
(9)
In [7] Tanguiane shows how to find this representation for any Boolean spectrum s minimizing, at the same time, the components ε and λ. This issue, however, is too wide to be presented here in detail.
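The convolutional model of Eq. (8) is easy to verify numerically: translating a single-note Boolean spectrum t by the offsets recorded in an interval distribution i and merging the results is exactly a (Boolean) convolution. The toy spectra below are invented for illustration only and use 12 bands per octave.

```python
import numpy as np

# Toy Boolean spectrum of one note on a log2-scaled axis (12 bands/octave):
# partials at 0, 12, 19 and 24 bands above the fundamental (f, 2f, 3f, 4f).
t = np.zeros(30, dtype=int)
t[[0, 12, 19, 24]] = 1

# Interval distribution of a major triad: root, +4 and +7 halftones.
i = np.zeros(8, dtype=int)
i[[0, 4, 7]] = 1

# Eq. (8): the chord spectrum is the convolution of t with i (clipped to 0/1).
s = np.clip(np.convolve(t, i), 0, 1)
print(np.nonzero(s)[0])   # frequency bands occupied by the chord's partials
```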
4 Chord Recognition Process We assume that the input data for the recognition process is a sequence of signal samples representing a sound of one or several notes played on any musical instrument. The identification of notes occurring in the signal will be realized in several stages (see Fig. 3):
Fig. 3. Chord recognition stages: reading samples from file → computation of DFT spectrum → computation of power spectrum → conversion into discrete spectrum → conversion into Boolean spectrum → finding spectrum deconvolution.
1. The first step of the recognition process is the frequency analysis. For computing the DFT spectrum of the input signal we use the fast Fourier transform (FFT) algorithm [3, 4, 5]. The computation of the N-point DFT directly from the definition (see Eq. 2) requires O(N²) complex multiplications, whereas the FFT algorithm reduces the complexity of the computation to O(N log₂ N). The only disadvantage of the FFT algorithm in comparison with direct evaluation of the DFT is the requirement that the number of input samples must be a whole power of two.
2. For the obtained DFT spectrum we compute the corresponding power spectrum and represent it using a normalized decibel scale (see Eq. 4).
3. The next step is to convert the power spectrum into the form of a discrete spectrum with a simultaneous rescaling of the frequency axis (see Eq. 7). Depending on the applied resolution of the discrete spectrum, its bands correspond to wider or narrower frequency ranges. As the value of the discrete spectrum in a given band, the highest value of the power spectrum belonging to this frequency range is taken (another approach may be based on estimating the signal power in the given band). As the lowest note which is to be recognized we took the note C2. Thus the middle of the band with index 0 is equal to the fundamental frequency of the note C2 (65.406 Hz).
4. The discrete spectrum of the signal power is then converted into a Boolean spectrum (because of the applied normalized decibel scale, this spectrum does not completely satisfy the definition of the discrete spectrum, where the value 0 denotes the lack of a signal partial in the given band; conformity with the definition may be achieved by an appropriate rescaling of the spectrum values). From the definition of the Boolean spectrum s associated with the discrete spectrum S (see Eq. 6) we have
s(n) = 0 if S_n = 0, and s(n) = 1 if S_n ≠ 0. In practice, the Fourier transform signals the presence of partials (which are possibly weak but always with power above zero) in each frequency band. This fact does not allow us to apply the above definition directly. The simplest way of overcoming this inconvenience is to remove all spectrum partials which lie below a certain threshold. After this operation, according to the definition of the Boolean spectrum, we replace the remaining partials with unit values.
5. The last step is to find the deconvolution t ∗ i + ε − λ of the Boolean spectrum. The interval distribution i, which we obtain as a result, is the answer to the question about the number and pitches of the notes employed in a chord. To determine the notes' pitches, we interpret the interval values in the spectrum i in relation to the first partial of the spectrum t.
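Steps 1 to 4 above amount to computing a windowed FFT, regrouping the bins into log₂-spaced bands (Eq. 7), and thresholding. The sketch below is self-contained but only illustrative: the 12-bands-per-octave resolution and the −40 dB threshold are example values, not the authors' settings.

```python
import numpy as np

F0 = 65.406          # middle of band 0: fundamental frequency of C2
C = 12               # frequency bands per octave (example resolution)

def boolean_spectrum(frame, fs=44100.0, n_bands=72, threshold_db=-40.0):
    """Steps 1-4 of Sect. 4: dB power spectrum, log2-banded discrete
    spectrum (Eq. 7), then thresholding into a Boolean spectrum."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    spectrum[spectrum == 0] = np.finfo(float).tiny
    power_db = 20 * np.log10(spectrum / spectrum.max())      # Eq. (4)
    discrete = np.full(n_bands, -np.inf)
    for m in range(1, len(power_db)):                        # skip the DC bin
        f = m * fs / len(frame)                              # Eq. (3)
        n = int(np.floor(C * np.log2(f / F0) + 0.5))         # Eq. (7)
        if 0 <= n < n_bands:
            discrete[n] = max(discrete[n], power_db[m])      # strongest partial per band
    return (discrete > threshold_db).astype(int)
```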
5 Examples of Recognition Tables 2, 3 and 4 show the results of the recognition of three chords: (C4, E4, G4), (C4, E4, G4, A4), (C4, E4, G4, A4, D5), played on the piano, organ and flute respectively. The recording parameters were the same as in the case of CD-Audio, i.e. 16-bit samples were taken at 44,100 samples/s. The length of the input sequence was equal to 2048 samples, which corresponds to about 50 ms. Before the calculation of the FFT (see step 1 in Sect. 4) the input samples were first centered and windowed using the Hanning window [3, 4, 5]. The tables show the answer of our chord recognition system only for those threshold values (see step 4 in Sect. 4) for
which the result of the recognition of one of the chords had changed. The misrecognized notes are marked in bold. The results shown allow us to make some vital observations. A simple prediction to make was that the correctness of recognition significantly depends on the number of notes employed in a chord. As the number of simultaneously sounding notes is increased, it becomes more difficult to recognize them accurately. There are two main reasons for this. Firstly, the increased number of strong harmonic partials in a sound signal makes the problem of spectral leakage in FFT-based spectrum analysis more noticeable [3, 4]. This problem could be reduced by calculating the FFT for more samples of the signal, but as previously mentioned in Sect. 1.3, it would decrease the time resolution in the whole process of acoustical recognition. Secondly, the increased number of harmonic partials in the signal causes an increase of the complexity (number of partials) of the Boolean spectrum (see step 4 in Sect. 4). Consequently, this reduces the effectiveness of the algorithm for finding the deconvolution of the Boolean spectrum (see step 5 in Sect. 4). Briefly, with the increased complexity of the Boolean spectrum it becomes increasingly likely that one or a subset of its partials fit more than one note at the same time. The analysis of the above results leads us to make another observation. The correctness of recognition depends not only on the number of simultaneously sounding notes but also on the kind of instrument on which they were played. The origin for this phenomena is, however, the same. Musical instruments differ in the number and the relative amplitudes of the harmonic partials in their sound. This is particularly apparent with the sound of the organ. Here the only harmonic partials which appear have a distance of one octave between them [6]. Thus, the complexity of its spectrum is much smaller in comparison to other musical instruments researched, which visibly makes recognition easier.
6 Conclusion Analyzing the properties of the proposed solution to the problem of simultaneously sounding note identification, we performed a number of tests taking account of different initial conditions such as the kind of musical instrument, the number of simultaneously sounding notes and the number of signal samples.
Table 2. Example of chord recognition (piano)

dB   | C4, E4, G4    | C4, E4, G4, A4    | C4, E4, G4, A4, D5
0    | G4            | G4                | D5
-1   | G4            | G4                | G4
-2   | G4            | G4                | G4
-5   | C4            | G4, A4            | G4, D5
-6   | C4, C5        | C4, C5            | G4, D5
-7   | C4, C5        | E4, A4, E5        | G4, D5
-8   | E 4, F4, E5   | C 4, F4, C 5      | E4, G4, D5
-9   | correct       | C4, C4, C5        | F4, G4, D5, G5
-10  | correct       | C4, F4, G4        | C4, F4, A4, G4, D5
-12  | correct       | C4, F4, G4        | E4, G4, C5
-13  | correct       | correct           | E4, G4, C5
-14  | C4, C4, C5    | C4, E4, F4, A4    | E4, G4, C5
Table 3. Example of chord recognition (organ)

dB   | C4, E4, G4    | C4, E4, G4, A4    | C4, E4, G4, A4, D5
0    | G4            | G4                | A5
-1   | C4            | C4                | E4, G4, D5
-2   | E4, E5        | E4, E5            | correct
-3   | E4, F4, E5    | C4, C5            | C4, E4, D5
-4   | E4, F4, E5    | C4, C5            | C4, A4, D5
-5   | C4, C4, C5    | C4, G4, A4        | C4, A4, C5, D5
-6   | correct       | C4, E4, G4        | C4, E4, G4, A4
-8   | correct       | correct           | C4, E4, G4, A4
-9   | correct       | correct           | correct
-10  | correct       | correct           | C4, E4, C5
-12  | A3, C4, F4    | A3, D4, F4, G4    | C4, E4, C5
-14  | A3, D4, F4    | C4, E4, F4, A4    | C4, C4, A4, C5, D5
Table 4. Example of chord recognition (flute)

dB   | C4, E4, G4    | C4, E4, G4, A4    | C4, E4, G4, A4, D5
0    | G4            | G4                | D6
-1   | G4            | A4                | D6
-3   | G4            | G4                | D6
-4   | G4            | G4, G5            | A5
-5   | G4            | G4, G5            | C4
-6   | E 4, E5       | G4, A4, G5        | E4, E5
-7   | E 4, F4, E 5  | F4, G4, A4, E5    | E4, E5
-8   | E 4, F4, E 5  | E4, G4            | E4, E5
-10  | E4, G4        | E4, G4            | C4, C4, C5
-11  | E4, G4        | E 4, F4, B 4      | C4, A4, D5
-12  | E4, G4        | E4, F4, G4, E5    | C4, A4, C5, D5
-13  | E4, G4        | E4, F4, G4, E5    | C4, A4, C5, D5
The issue which still requires a solution is an appropriate choice of the threshold value (or values) used to generate the Boolean spectrum. By analyzing the answers of the recognition process for several threshold values, it is possible to identify the whole chord accurately even if we did not get the correct answer for any single one of these values. In view of this, the results obtained thus far seem very promising.
References 1. Berg RE, Stork DG (1995) The Physics of Sound. Prentice-Hall Inc., New Jersey 2. Bregman AS (1990) Auditory Scene Analysis: The Perceptual Organization of Sound. M.I.T Press, Massachusetts
3. Lyons RG (1997) Understanding Digital Signal Processing. Addison Wesley Longman Inc. 4. Oppenheim AV, Schafer RW (1989) Discrete-Time Signal Processing. Prentice-Hall Inc., New Jersey 5. Oppenheim AV, Willsky AS with Nawab SH (1983) Signals & Systems. Prentice-Hall Inc., New Jersey 6. Szlenk M (2000) The Automatic Identification of the Properties of Acoustical Space from the Sound Generated by Musical Instruments (in Polish). MS Thesis, Warsaw University of Technology, Warsaw 7. Tanguiane AS (1993) Artificial Perception and Music Recognition. Springer-Verlag, Berlin 8. Intel Integrated Performance Primitives, Reference Manual, vol 1: Signal Processing. http://developer.intel.com 9. The Complete MIDI 1.0 Detailed Specification. http://www.midi.org
CHAPTER 17
Audio Fingerprinting: Concepts And Applications

Pedro Cano¹, Eloi Batlle¹, Emilia Gómez¹, Leandro de C. T. Gomes², and Madeleine Bonnet²

¹ Institut Universitari de l'Audiovisual, Universitat Pompeu Fabra, Ocata 1, Barcelona, 08003, Spain, {pcano, eloi, egomez}@iua.upf.es
² InfoCom-Crip5, Université René Descartes, 45, rue des Saints-Pères, 75270 Paris cedex 06, France, {tgomes,bonnet}@math-info.univ-paris5.fr
Abstract: An audio fingerprint is a unique and compact digest derived from perceptually relevant aspects of a recording. Fingerprinting technologies allow the monitoring of audio content without the need of metadata or watermark embedding. However, additional uses exist for audio fingerprinting. This paper aims to give a vision on Audio Fingerprinting. The rationale is presented along with the differences with respect to watermarking. The main requirements of fingerprinting systems are described. The basic modes of employing audio fingerprints, namely identification, authentication, content-based secret key generation for watermarking and content-based audio retrieval and processing are depicted. Some concrete scenarios and business models where the technology is used are presented, as well as an example of an audio fingerprinting extraction algorithm which has been proposed for both identification and verification. Keywords: Audio fingerprinting, content-based audio retrieval, integrity verification, watermarking.
1 Definition of Audio Fingerprinting An audio fingerprint is a content-based compact signature that summarizes an audio recording. Audio fingerprinting has attracted a lot of attention for its audio monitoring capabilities. Audio fingerprinting or content-based identification (CBID) technologies extract acoustically relevant characteristics of a piece of audio content and store them in a database. When presented with an unidentified piece of audio content, characteristics of that piece are calculated and matched against those stored in the database. Using fingerprints and efficient matching algorithms, distorted versions of a single recording can be identified as the same music title [1].
The approach differs from an alternative existing solution to monitoring audio content: audio watermarking. In audio watermarking [7], research on psychoacoustics is conducted so that an arbitrary message, the watermark, can be embedded in a recording without altering the perception of the sound. Compliant devices can check for the presence of the watermark before proceeding to operations that could result in copyright infringement. In audio fingerprinting, the message is automatically derived from the perceptually most relevant components of the sound. Compared to watermarking, it is ideally less vulnerable to attacks and distortions, since trying to modify this message, the fingerprint, means altering the quality of the sound. It is also suited to dealing with legacy content, that is, audio material released without a watermark. In addition, it requires no modification of the audio content. As a drawback, the complexity of fingerprinting is higher than that of watermarking, and a connection to a fingerprint repository is needed. Contrary to watermarking, the message is not independent of the content. It is therefore not possible, for example, to distinguish between perceptually identical copies of a recording. For more detail on the relation between fingerprinting and watermarking we refer to [14]. At this point, we should clarify that the term "fingerprinting" has been employed for many years as a special case of watermarking devised to keep track of an audio clip's usage history. Watermark fingerprinting consists in uniquely watermarking each legal copy of a recording. This makes it possible to trace a copy back to the individual who acquired it [12]. However, the same term has been used to name techniques that associate an audio signal to a much shorter numeric sequence (the "fingerprint") and use this sequence to e.g. identify the audio signal. The latter is the meaning of the term "fingerprinting" in this article. Other terms for audio fingerprinting are Robust Matching, Robust or Perceptual Hashing, Passive Watermarking, Automatic Music Recognition, Content-based Digital Signatures and Content-based Audio Identification. The areas relevant to audio fingerprinting include Information Retrieval, Pattern Matching, Signal Processing, Cryptography and Music Cognition, to name a few.
2 Properties of Audio Fingerprinting The requirements depend heavily on the application but are useful in order to evaluate and compare different audio fingerprinting technologies. In their Request for Information on Audio Fingerprinting Technologies [1], the IFPI (International Federation of the Phonographic Industry) and the RIAA (Recording Industry Association of America) tried to evaluate several identification systems. Such systems have to be computationally efficient and robust. A more detailed enumeration of requirements can help to distinguish among the different approaches [2][18]:
Accuracy: The number of correct identifications, missed identifications, and wrong identifications (false positives).
Reliability: Methods for assessing whether a query is present or not in the repository of items to identify are of major importance in play list generation for copyright enforcement organizations. In such cases, if a song has not been broadcast, it should not be identified as a match, even at the cost of missing actual matches. Approaches to deal with false positives have been treated for instance in [11]. In other applications, like automatic labeling of MP3 files (see Sect. 4), avoiding false positives is not such a mandatory requirement.
Robustness: Ability to accurately identify an item, regardless of the level of compression and distortion or interference in the transmission channel. Ability to identify whole titles from excerpts a few seconds long (known as cropping or granularity), which requires methods
for dealing with lack of synchronization. Other sources of degradation are pitching, equalization, background noise, D/A-A/D conversion, audio coders (such as GSM and MP3), etc.
Security: Vulnerability of the solution to cracking or tampering. In contrast with the robustness requirement, the manipulations to deal with are designed to fool the fingerprint identification algorithm.
Versatility: Ability to identify audio regardless of the audio format. Ability to use the same database for different applications.
Scalability: Performance with very large databases of titles or a large number of concurrent identifications. This affects the accuracy and the complexity of the system.
Complexity: This refers to the computational costs of the fingerprint extraction, the size of the fingerprint, the complexity of the search, the complexity of the fingerprint comparison, the cost of adding new items to the database, etc.
Fragility: Some applications, such as content-integrity verification systems, may require the detection of changes in the content. This is contrary to the robustness requirement, as the fingerprint should be robust to content-preserving transformations but not to other distortions (see Subsection 3.2).
Improving a certain requirement often implies losing performance in some other. Generally, the fingerprint should be:
• A perceptual digest of the recording. The fingerprint must retain the maximum of acoustically relevant information. This digest should allow the discrimination over a large number of fingerprints. This may be conflicting with other requirements, such as complexity and robustness.
• Invariant to distortions. This derives from the robustness requirement. Content-integrity applications, however, relax this constraint for content-preserving distortions in order to detect deliberate manipulations.
• Compact. A small-sized representation is interesting for complexity, since a large number (maybe millions) of fingerprints need to be stored and compared. An excessively short representation, however, might not be sufficient to discriminate among recordings, affecting accuracy, reliability and robustness.
• Easily computable. For complexity reasons, the extraction of the fingerprint should not be excessively time-consuming.
3 Usage Modes 3.1 Identification Independently of the specific approach to extract the content-based compact signature, a common architecture can be devised to describe the functionality of fingerprinting when used for identification [1]. The overall functionality mimics the way humans perform the task. As seen in Fig. 1, a memory of the works to be recognized is created off-line (top); in the identification mode (bottom), unlabeled audio is presented to the system to look for a match. Database creation: The collection of works to be recognized is presented to the system for the extraction of their fingerprint. The fingerprints are stored in a database and can be linked to a tag or other meta-data relevant to each recording.
Fig. 1. Content-based audio identification framework. Database creation (top): recordings' collection → fingerprint extraction → DB of fingerprints linked to recordings' IDs. Identification (bottom): unlabeled recording → fingerprint extraction → match against the DB → recording ID.
Identification: The unlabeled audio is processed in order to extract the fingerprint. The fingerprint is then compared with the fingerprints in the database. If a match is found, the tag associated with the work is obtained from the database. A reliability measure of the match can also be provided.
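In its simplest form, the identification step of Fig. 1 is a nearest-neighbour search over stored fingerprints. The toy class below assumes fingerprints are fixed-length binary vectors compared by Hamming distance; this is only one possible representation and metric among those discussed later (Sect. 5), not the specific method of any cited system, and the rejection threshold is an invented example value.

```python
import numpy as np

class FingerprintDB:
    """Toy fingerprint database: binary fingerprints compared by Hamming distance."""

    def __init__(self):
        self.tags, self.prints = [], []

    def add(self, tag, fingerprint):
        self.tags.append(tag)
        self.prints.append(np.asarray(fingerprint, dtype=np.uint8))

    def identify(self, query, max_distance=10):
        """Return (tag, distance) of the best match, or None if no match is close enough."""
        query = np.asarray(query, dtype=np.uint8)
        distances = [int(np.sum(fp != query)) for fp in self.prints]
        best = int(np.argmin(distances))
        if distances[best] > max_distance:     # reliability check: reject weak matches
            return None
        return self.tags[best], distances[best]
```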
3.2 Integrity Verification Integrity verification aims at detecting the alteration of data. The overall functionality (see Fig. 2) is similar to identification. First, a fingerprint is extracted from the original audio. In the verification phase, the fingerprint extracted from the test signal is compared with the fingerprint of the original. As a result, a report indicating whether the signal has been manipulated is output. Optionally, the system can indicate the type of manipulation and where in the audio it occurred. The verification data, which should be significantly smaller than the audio data, can be sent along with the original audio data (e.g. as a header) or stored in a database. A technique known as self-embedding avoids the need of a database or a specially dedicated header, by embedding the content-based signature into the audio data using watermarking (see Fig. 4). An example of such a system is described in [15].
Fig. 2. Integrity verification framework. The fingerprint extracted from the test audio signal is compared with the original audio fingerprint to produce the verification results.
3.3 Watermarking Support Audio fingerprinting can assist watermarking. Audio fingerprints can be used to derive secret keys from the actual content. As described by Mihçak et al. [19], using the same secret key for a number of different audio items may compromise security, since each item may leak partial information about the key. Perceptual hashing can help generate input-dependent keys for each piece of audio. Haitsma et al. [17] suggest audio fingerprinting to enhance the security of watermarks in the context of copy attacks. Copy attacks estimate a watermark from watermarked content and transplant it to unmarked content. Binding the watermark to the content can help to defeat this type of attack. In addition, fingerprinting can be useful against insertion/deletion attacks that cause desynchronization of the watermark detection: by using the fingerprint, the detector is able to find anchor points in the audio stream and thus to resynchronize at these locations [19].
3.4 Content-Based Audio Retrieval and Processing Deriving compact signatures from complex multimedia objects and powerful indexes to search media assets is an essential issue in Multimedia Information Retrieval. Fingerprinting can extract information from the audio signal at different abstraction levels, from low level descriptors to higher level descriptors. Especially, higher level abstractions for modeling audio hold the possibility to extend the fingerprinting usage modes to content-based navigation, search by similarity, content-based processing and other applications of Music Information Retrieval [13]. Adapting existing efficient fingerprinting systems from identification to similarity browsing can have a significant impact in the music distribution industry (e.g: www.itunes.com, www.mp3.com ). At the moment, on-line music providers offer searching by editorial data (artist, song name, and so on) or following links generated through collaborative filtering. In a query-by-example scheme, the fingerprint of the song can be used to retrieve not only the original version but also “similar” ones [10].
4 Application Scenarios Most of the applications presented in this section are particular cases of the identification usage mode described above. They are therefore based on the ability of audio fingerprinting to link unlabeled audio to corresponding metadata, regardless of audio format.
4.1 Audio Content Monitoring and Tracking Monitoring at the Distributor End Content distributors may need to know whether they have the rights to broadcast the content to consumers. Fingerprinting can help identify unlabeled audio in TV and Radio channels repositories. It can also identify unidentified audio content recovered from CD plants and distributors in anti-piracy investigations (e.g. screening of master recordings at CD manufacturing plants) [1].
Monitoring at the Transmission Channel In many countries, radio stations must pay royalties for the music they air. Rights holders need to monitor radio transmissions in order to verify whether royalties are being properly paid. Even in countries where radio stations can freely air music, rights holders are interested in monitoring radio transmissions for statistical purposes. Advertisers also need to monitor radio and TV transmissions to verify whether commercials are being broadcast as agreed. The same is true for web broadcasts. Other uses include chart compilations for statistical analysis of program material or enforcement of “cultural laws” (e.g. French titles in France). Fingerprinting-based monitoring systems are being used for this purpose. The system “listens” to the radio and continuously updates a play list of songs or commercials broadcast by each station. Of course, a database containing fingerprints of all songs and commercials to be identified must be available to the system, and this database must be updated as new songs come out. Examples of commercial providers of this service are: Broadcast Data System (www.bdsonline.com), Music Reporter (www.musicreporter.net), Audible Magic (www.audiblemagic.com), Yacast (www.yacast.fr). Napster and Web-based communities alike, where users share music files, have been excellent channels for music piracy. After a court battle with the recording industry, Napster was prohibited from facilitating the transfer of copyrighted music. The first measure taken to conform with the judicial ruling was the introduction of a filtering system based on filename analysis, according to lists of copyrighted music recordings supplied by the recording companies. This simple system did not solve the problem, as users proved to be extremely creative in choosing file names that deceived the filtering system while still allowing other users to easily recognize specific recordings. The large number of songs with identical titles was an additional factor in reducing the efficiency of such filters. Fingerprinting-based monitoring systems constitute a well-suited solution to this problem. Napster actually adopted a fingerprinting technology (see www.relatable.com) and a new file-filtering system relying on it. Additionally, audio content can be found in ordinary web pages. Audio fingerprinting combined with a web crawler can identify this content and report it to the corresponding right owners (e.g. www.baytsp.com).
Monitoring at the Consumer End In usage-policy monitoring applications, the goal is to avoid misuse of audio signals by the consumer. We can conceive a system where a piece of music is identified by means of a fingerprint and a database is contacted to retrieve information about the rights. This information dictates the behavior of compliant devices (e.g. CD and DVD players and recorders, MP3 players or even computers) in accordance with the usage policy. Compliant devices are required to be connected to a network in order to access the database.
4.2 Added-Value Services Content information is defined as information about an audio excerpt that is relevant to the user or necessary for the intended application. Depending on the application and the user profile, several levels of content information can be defined. Here are some of the situations we can imagine:
• Content information describing an audio excerpt, such as rhythmic, melodic or harmonic descriptions.
• Meta-data describing a musical work, how it was composed and how it was recorded. For example: composer, year of composition, performer, date of performance, studio recording/live performance.
• Other information concerning a musical work, such as album cover image, album price, artist biography, information on the next concerts, etc.
Different user profiles can be defined. Common users would be interested in general information about a musical work, such as title, composer, label and year of edition; musicians might want to know which instruments were played, while sound engineers could be interested in information about the recording process. Content information can be structured by means of a music description scheme (MusicDS), which is a structure of meta-data used to describe and annotate audio data. The MPEG-7 standard proposes a description scheme for multimedia content based on the XML metalanguage [20], providing for easy data interchange between different equipments. Some systems store content information in a database that is accessible through the Internet. Fingerprinting can then be used to identify a recording and retrieve the corresponding content information, regardless of support type, file format or any other particularity of the audio data. For example, MusicBrainz, Id3man or Moodlogic (www.musicbrainz.org, www.id3man.com, www.moodlogic.com) automatically label collections of audio files; the user can download a compatible player that extracts fingerprints and submits them to a central server from which meta data associated to the recordings is downloaded. Gracenote (www.gracenote.com), who has been providing linking to music meta-data based on the TOC (Table of Contents) of a CD, started offering audio fingerprinting technology to extend the linking from CD’s TOC to the song level. Their audio identification method is used in combination with text-based classifiers to improve accuracy. Another example is the identification of a tune through mobile devices, e.g. a cell phone; this is one of the most demanding situations in terms of robustness, as the audio signal goes through radio distortion, D/A-A/D conversion, background noise and GSM coding, mobile communication’s channel distortions and only a few seconds of audio are available (e.g: www.shazam.com).
4.3 Integrity Verification Systems In some applications, the integrity of audio recordings must be established before the signal can actually be used, i.e. one must assure that the recording has not been modified or that it is not too distorted. If the signal undergoes lossy compression, D/A-A/D conversion or other content-preserving transformations in the transmission channel, integrity cannot be checked by means of standard hash functions, since a single bit flip is sufficient for the output of the hash function to change. Methods based on fragile watermarking can also provide false alarms in such a context. Systems based on audio fingerprinting, sometimes combined with watermarking, are being researched to tackle this issue (see Sect. 5.3). Among some possible applications [15], we can name: Check that commercials are broadcast with the required length and quality, verify that a suspected infringing recording is in fact the same as a recording whose ownership is known, etc.
5 Audio Fingerprinting System: An Example 5.1 Introduction The actual implementations of audio fingerprinting usually follow the presented scheme in 3.1. They differ mainly in the type of features being considered, the modeling of the fingerprint, the type of similarity metric to compare fingerprints and the indexing mechanisms for the fast database look-up. The simplest approach would be direct file comparison. A way to efficiently implement this idea consists in using a hash method, such as MD5 (Message Digest 5) or CRC (Cyclic Redundancy Checking), to obtain a compact representation of the binary file. In this setup, one compares the compact signatures instead of the whole files. Of course, this approach is not robust to compression or distortions of any kind, and might not even be considered as content-based identification of audio, since it is not based on an analysis of the content (understood as perceptual information) but just on manipulations performed on binary data. This approach would not be appropriate for monitoring streaming audio or analog audio; however, it sets the basis for a class of fingerprinting methods: Robust or Perceptual hashing [19][17]. The idea behind Robust hashing is the incorporation of acoustic features in the hash function, so that the final hash code is robust to audio manipulations as long as the content is preserved. Many features for audio characterization can be found in the literature, such as energy, loudness, spectral centroid, zero crossing rate, pitch, harmonicity, spectral flatness [3] and Mel-Frequency Cepstral Coefficients (MFCC’s). Several methods perform a filter bank analysis, apply a transformation to the feature vector and, in order to reduce the representation size, extract some statistics: means or variances over the whole recording, or a codebook for each song by means of unsupervised clustering. Other methods apply higher-level algorithms that try to go beyond signal processing comparisons and make use of notions such as beat and harmonics [6]. For a complete review of algorithms for fingerprinting we refer to [8].
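The "direct file comparison" baseline mentioned above is trivial to implement and makes the contrast with robust hashing concrete: a single re-encoding of the audio changes the digest completely, which is exactly why content-based fingerprints are needed. The snippet below is only this naive baseline, not a fingerprinting method.

```python
import hashlib

def file_digest(path):
    """MD5 digest of the raw bytes - identifies only bit-exact copies."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Two files containing the same music in different encodings (e.g. WAV vs. MP3)
# produce unrelated digests, so this is not content-based identification.
```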
5.2 Hidden Markov Model based Audio Fingerprinting We now present a case study to illustrate in more detail an implementation of an audio fingerprinting solution. The implementation was designed with high robustness requirements: identification of radio broadcast songs [9], and has been tested for content integrity verification [15]. The inherent difficulty in the task of identifying broadcast audio is mainly due to differences between the original titles (as available on CDs) and the broadcast ones: a song may be partially transmitted, the speaker may talk on top of different segments of the song, the piece may be played faster and several manipulation effects may be applied to increase the listener’s psychoacoustic impact (compressors, enhancers, equalization, bass-booster, etc.). Yet the system also has to be fast because it must do comparisons with several thousand (of the order of 100,000’s) songs on-line. This affects memory and computation requisites, since the system should observe several radio stations, give results on-line and should not be very expensive in terms of hardware. In this scenario, a particular abstraction of audio to be used as robust fingerprint is presented: audio as a sequence of basic sounds. The whole identification system works as follows. An alphabet of sounds that best describe the music is extracted in an off-line process out of a collection of music representative of the type of songs to be identified. These audio units are modeled with Hidden Markov Models (HMM). The unlabeled audio and the set of songs are decomposed into these audio units. We end up with
a sequence of symbols for the unlabeled audio and a database of sequences representing the original songs. The song sequence that best resembles the sequence of the unlabeled audio is obtained using approximate string matching [16].
Fig. 3. Fingerprint extraction case study. Audio signal → preprocessing → mel-frequency analysis → acoustic modeling (ADU HMM models) → fingerprint.
The rationale is indeed very much inspired by speech technology. In speech, an alphabet of sound classes, e.g. phonemes, can be used to transcribe a collection of raw speech data into a collection of text. We thus achieve a great deal of redundancy reduction without losing "much" relevant information. Similarly, we can view a collection of songs as concatenations of musical phonemes. "Perceptually equivalent" acoustic events, say drum kicks, occur in different commercial songs. In speech, the phonemes of a given language are known in advance. In music, the phonemes, which we refer to as Audio Descriptor Units, are unknown. The discovery of the audio descriptor units is performed via unsupervised training, that is, without any previous knowledge of music events, with a modified Baum-Welch algorithm [4, 21]. The audio data is pre-processed by a front-end in a frame-by-frame analysis. Then a set of relevant feature vectors is extracted from the sound. In the acoustic modeling block, the feature vectors are run against the statistical models of the ADUs (the ADU HMMs) using the Viterbi algorithm. As a result, the most likely ADU sequence is produced. The fingerprint then consists of a sequence of symbols with their associated start and end times. The average output can be adapted to the task by tuning the alphabet size and the average number of symbols per minute [5].
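The final matching step compares the ADU sequence of the unlabeled audio with the stored song sequences by approximate string matching [16]. A minimal edit-distance sketch is given below; real systems use faster indexed variants, and the example sequences are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences (ADU strings)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

def best_match(query, database):
    """Return the song whose ADU sequence is closest to the query sequence."""
    return min(database, key=lambda song: edit_distance(query, database[song]))

db = {"song A": "abcabdab", "song B": "ccabddca"}
print(best_match("abcabdcb", db))   # "song A"
```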
5.3 Integrity verification task The presented framework is used for audio-integrity verification in combination with watermarking [15]. The verification is performed by embedding the fingerprint into the audio signal by means of a watermark. The original fingerprint is reconstructed from the watermark and compared with the new fingerprint extracted from the watermarked signal (see Fig. 4). If they are identical, the signal has not been modified; if not, the system is able to determine the approximate locations where the signal has been corrupted. The watermarked signal should go through content preserving transformations, such as D/A and A/D conversion, resampling, etc. without triggering the corruption alarm. The requirements of the fingerprint of this particular task include: an extremely small fingerprint size and robustness to content-preserving transformations. The fingerprint extractor was thus tuned for a small size ADU alphabet (16
symbols) and an average output rate of 100 ADU per minute. As a result of the tuning, the average fingerprint rate is approximately 7 bits/s. The small fingerprint rate allows its watermark embedding in the signal with high redundancy and enough information to detect corruptions.
Fig. 4. Self-embedding integrity verification framework: (a) fingerprint embedding (original audio signal → fingerprint extraction → watermark embedding → watermarked signal); (b) fingerprint comparison (the original fingerprint is recovered by watermark extraction, the current fingerprint is extracted from the watermarked signal, and the two are compared to produce the results).
Results of cut-and-paste tests are presented for four 8-s test signals: two songs with voice and instruments (signal “cher”, from Cher’s “Believe”, and signal “estrella morente”, a piece of flamenco music), one song with voice only (signal “svega”, Suzanne Vega’s “Tom’s Diner”, a cappella version), and one speech signal (signal “the breakup”, Art Garfunkel’s “The Breakup”). Fig. 5 shows the simulation results for all test signals. For each signal, the two horizontal bars represent the original signal (upper bar) and the watermarked and attacked signal (lower bar). Time is indicated in seconds on top of the graph. The dark-gray zones correspond to attacks: in the upper bar, they represent segments that have been inserted into the audio signal, whereas in the lower bar they represent segments that have been deleted from the audio signal. Fingerprint information (i.e. the ADUs) is marked over each bar. For all signals, the original fingerprint was successfully reconstructed from the watermark. Detection errors introduced by the cut-and-paste attacks were eliminated by exploiting the redundancy of the information stored in the watermark. A visual inspection of the graphs in Fig. 5 shows that the ADUs in the vicinities of the attacked portions of the signal were always modified. These corrupted ADUs allow the system to determine the instant of each attack within a margin of approximately ±1 second. For the last signal (“the breakup”), we also observe that the attacks induced two changes in relatively distant ADUs (approximately 2 s after the first attack and 2 s before the second one). This can be considered a false alarm, since the signal was not modified in that zone.
Fig. 5. Simulation results: (a) signal “cher”; (b) signal “estrella moriente”, (c) signal “svega”, (d) signal “the breakup”.
5.4 Content-based similarity task As mentioned in Sect. 3.4, the identification requirements can be relaxed to offer, for instance, a similarity search. [10] is an example of browsing and retrieval using a similarity measure derived from the fingerprints. In this setup, a user can give the system a musical piece and ask for similar ones. The best match will be the original song, and the next 10 best matches will, hopefully, be similar ones. Another possible feature allows interaction with a huge collection of music. A similarity matrix is obtained by comparing all songs against all songs. Using visualization techniques, like Multidimensional Scaling (MDS), one can then map the songs onto points of a Euclidean space. In Fig. 6, we can see a representation of 1840 commercial songs using MDS.
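The visualization described above can be reproduced with standard tools: given a matrix of pairwise fingerprint-derived distances, MDS finds 2-D coordinates that approximately preserve them. The sketch below assumes scikit-learn is available and uses a random symmetric matrix as a stand-in for the real all-against-all song comparison.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n_songs = 40
d = rng.random((n_songs, n_songs))
distances = (d + d.T) / 2                  # symmetric pairwise dissimilarities
np.fill_diagonal(distances, 0.0)

# Embed the songs as points in a 2-D Euclidean space (cf. Fig. 6)
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(distances)
print(coords.shape)                        # (40, 2)
```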
6 Summary We have presented an introduction to concepts related to Audio Fingerprinting along with some possible usage scenarios and application contexts. We have reviewed the requirements desired in a fingerprinting scheme acknowledging the existence of a trade-off between them.
Fig. 6. Representing 1840 songs as points in a 2-D space by MDS. Asterisks and circles correspond to songs by Rex Gildo and Three Doors Down respectively.
Most applications profit from the ability to link content to unlabeled audio but there are more uses. The list of applications provided is necessarily incomplete since many uses are likely to come up in the near future.
References 1. Request for information on audio fingerprinting technologies, (2001) http://www.ifpi.org/site-content/press/20010615.html 2. Audio identification technology overview, (2002) http://www.audiblemagic.com/about 3. Allamanche E, Herre J, Hellmuth O, Fr¨oba B, Cremer M (2001) AudioId: Towards content-based identification of audio material. Proc. AES 110th Int. Conv., Amsterdam, Netherlands 4. Batlle E, Cano P (2000) Automatic segmentation for music classification using competitive hidden markov models. Proc. of the Int. Symp. on Music Information Retrieval, Boston, US 5. Batlle E, Masip J, Cano P (2003) System analysis and performance tuning for broadcast audio fingerprinting. Proc. of the DAFX, London, UK 6. Blum T L, Keislar D F, Wheaton J A, and Wold E H (1999) Method and article of manufacture for content-based analysis, storage, retrieval and segmentation of audio information. US Patent 5,918,223 7. Boney L, Tewfik A, Hamdy K (1996) Digital watermarks for audio signals. IEEE Proceedings Multimedia, pps. 473–480
8. Cano P, Batlle E, Kalker T, Haitsma J (2002) A review of algorithms for audio fingerprinting. Proc. of the IEEE MMSP, St. Thomas, V.I. 9. Cano P, Batlle E, Mayer H, Neuschmied H (2002) Robust sound modeling for song detection in broadcast audio. Proc. AES 112th Int. Conv., Munich, Germany 10. Cano P, Kaltenbrunner M, Gouyon F, Batlle E (2002) On the use of fastmap for audio information retrieval. Proc. of the Int. Symp. on Music Information Retrieval, Paris, France 11. Cano P, Kaltenbrunner M, Mayor O, Batlle E (2001) Statistical significance in songspotting in audio. Proc. of the Int. Symp. on Music Information Retrieval, Bloomington, Indiana 12. Craver S A, Wu M, Liu B (2001) What can we reasonably expect from watermarks? Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY 13. Dannenberg R, Foote J, Tzanetakis G, Weare C.(2001) Panel: New directions in music information retrieval. Proc. of the ICMC, La Havana, Cuba 14. Gomes L de C T, Cano P, G´omez E, Bonnet M, and Batlle E (2003) Audio watermarking and fingerprinting: For which applications? Journal of New Music Research 32(1) pps. 65–82 15. G´omez E, Cano P, Gomes L de C T, Batlle E, Bonnet M (2002) Mixed watermarkingfingerprinting approach for integrity verification of audio recordings. Proc. of the Int. Telecommunications Symposium, Natal, Brazil 16. Gusfield D (1997) Algorithms on strings, trees and sequences. Cambridge University Press 17. Haitsma J, Kalker T, Oostveen J (2001) Robust audio hashing for content identification. Proc. of the Content-Based Multimedia Indexing, Firenze, Italy 18. Kalker T (2001) Applications and challenges for audio fingerprinting. Presentation at the 111th AES Convention, New York 19. Mihc¸ak M K, Venkatesan R (2001) A perceptual audio hashing algorithm: a tool for robust audio identification and information hiding. 4th Int. Information Hiding Workshop, Pittsburg, PA 20. Moving Picture Experts Group (MPEG) (2002) Mpeg working documents 21. Vidal E and Marzal A (1990) A Review and New Approaches for Automatic Segmentation of Speech Signals. Signal Processing V: Theories and Applications, Elsevier Science Publisher, pps. 43–53
CHAPTER 18
Mining Technical Patterns in The U. S. Stock Market through Soft Computing
Ming Dong¹ and Xu-Shen Zhou²
¹ Department of Computer Science, Wayne State University, Detroit, MI 48202, USA, [email protected]
² Department of Finance and Legal Studies, Bloomsburg University, Bloomsburg, PA 17815, USA, [email protected]
Abstract: Technical analysis has been a part of financial practice for many decades. One of the most challenging areas in technical analysis is the automatic detection of technical patterns that are similar in the eyes of expert investors. In this chapter, we propose a soft computing based approach for technical analysis. By introducing inter- and intra-fuzzification into an automatic pattern detection and analysis process, we incorporate human cognitive uncertainty into the technical analysis domain. The importance of fuzzy technical patterns on investment decisions is confirmed through a neural network based saliency analysis. Using a random sample of U. S. stocks, we find that our approach is able to detect subtle differences within a clearly defined pattern. Our results suggest that such subtle differences could be a source of controversy surrounding technical analysis. Compared with existing visual technical pattern analysis approaches, our soft computing based approach offers superior precision in detecting and interpreting technical patterns.

Keywords: Fuzzy Logic, Neural Networks, Data Mining, Technical Patterns
1.1 Introduction

The progress of computer technologies has dramatically reduced the cost of collecting, storing and retrieving data. However, what is really useful is not the data itself, but the knowledge that we can acquire from it. With the huge amount of stock data readily available, automatic knowledge acquisition capabilities are becoming more and more important. Recent applications of soft computing technology have inspired great interest among researchers. Wong et al. [1] develop a fuzzy neural system for stock selection; the system is intended to resemble experts' knowledge. Using a genetic algorithm approach, Allen et al. [2] found technical trading rules for the S&P 500 index and then applied these rules to out-of-sample daily prices of the S&P 500. Technical analysis of securities, which is based on price and volume rather than the underlying firms' fundamentals, has been a practice among practitioners as well as academic
researchers for many decades. Leigh et al. [3] propose a "romantic" decision support system that incorporates pattern recognizers, neural networks, and genetic algorithms; in forecasting the NYSE composite index using technical analysis, the system achieved high quality results. Recently, Lo et al. [4] proposed the use of pattern recognition and statistical inference methods in technical analysis. Building upon their foundational work, we use fuzzy logic to introduce human cognitive uncertainty [5] into the automatic detection and analysis process. This reformulation of technical indicators brings machines one step closer to thinking and reasoning like human experts in the stock markets. Specifically, we use a Gaussian kernel to smooth the adjusted stock price time series [6], and we then identify five extrema that meet the definitions of the eight technical pattern templates provided in [4]. We fuzzify each technical pattern previously identified and assign it a membership value from 0 to 1.

Based on a random sample of 1451 U. S. stocks from 1962 to 2000, we first show, using a neural network based saliency analysis, that these fuzzy memberships play an important role in investment decisions. Specifically, our results show that among the 26 variables that are related to future returns, the membership value has, on average, the second highest saliency coefficient, and thus strong relevance. We then test whether the occurrence of technical patterns with a certain membership value signals future abnormal returns of the stocks. Our findings suggest that the fuzzy logic approach can be used to detect subtle differences even within a single type of pattern. The findings can also explain why there is so much controversy surrounding technical analysis.

The remainder of this chapter is organized as follows: Section 1.2 provides our motivation to introduce fuzzy logic into technical analysis and the details of our fuzzy logic model of technical patterns. A brief review of neural network based saliency analysis is given in Section 1.3. Section 1.4 presents the data sample, analysis, and the results. Section 1.5 concludes.
1.2 Fuzzy Logic Based Automating Technical Analysis

1.2.1 Gaussian kernel based smoothing

Any study of technical analysis starts from the recognition that the stock market is a nonlinear dynamic system and that the nonlinearity contains certain regularities or patterns. In general, the stock price P_t can be described by the following equation,

P_t = H(X_t) + e_t, \quad t = 1, 2, \ldots, T    (1.1)
where H is a dynamic system, X_t is a state vector at time t and e_t is white noise. One of the most common methods for eliminating white noise is smoothing, in which the white noise is greatly reduced by averaging the data. Assume we have n observations P_{t_i}, i = 1, 2, ..., n; it can be easily shown that \hat{H}(X_t) = \bar{P}_t \to H(X_t) when n \to \infty. Surely P_t is a time series and there is no way that we can get n observations of P(t). However, if we assume H is sufficiently smooth, we can use the average over a predefined neighborhood instead of over n observations. In this case, \bar{P}_t can be calculated by the weighted sum

\bar{P}_t = \sum_{i \in S} w_i P_i    (1.2)
where S is the pre-defined neighborhood and wi is the corresponding weight for data point Pi . One popular approach to choosing the weights is to use Gaussian kernel regression. If there are n data points in the neighborhood of point t, we can generate n Gaussian distributions, each distribution centered at one of the data points. Then we have,
\bar{P}_t = \sum_{i=1}^{n} P_i \, \frac{1}{(2\pi\sigma^2)^{1/2}} \, e^{-(P_i - P_t)^2 / 2\sigma^2}, \quad i = 1, 2, \ldots, n    (1.3)
where σ², the variance of the Gaussian distribution, acts as a smoothing parameter. When σ² is too large, the data is oversmoothed; conversely, if σ² is too small, the data will be undersmoothed.
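As an illustration of the smoothing step, the following sketch implements a normalized Gaussian kernel smoother over a local time window, in the spirit of Equation (1.3) and of the kernel regression used by Lo et al. [4]; the function name, the window size and the bandwidth σ are illustrative choices, not values taken from the chapter.

```python
import numpy as np

def gaussian_kernel_smooth(prices, sigma=2.5, half_window=10):
    """Smooth a price series with a Gaussian kernel over a local window.

    Each smoothed value is a weighted average of the raw prices in a
    neighborhood of the target day; the weights follow a Gaussian in the
    time distance and are normalized to sum to one."""
    prices = np.asarray(prices, dtype=float)
    smoothed = np.empty_like(prices)
    for t in range(len(prices)):
        lo = max(0, t - half_window)
        hi = min(len(prices), t + half_window + 1)
        idx = np.arange(lo, hi)
        weights = np.exp(-((idx - t) ** 2) / (2.0 * sigma ** 2))
        smoothed[t] = np.sum(weights * prices[idx]) / np.sum(weights)
    return smoothed
```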
1.2.2 Motivation

The second step of automating technical analysis is the detection of technical patterns. The detection must be able to match the judgment of a professional technical analyst. Following Lo et al. [4], we use a sequence of five consecutive local extrema E1, ..., E5 to describe a pattern template. In the following, we will focus on the eight pattern templates proposed in Lo et al. [4]. They are head-and-shoulders (HS) and inverted head-and-shoulders (IHS), broadening tops (BTOP) and bottoms (BBOT), triangle tops (TTOP) and bottoms (TBOT), and rectangle tops (RTOP) and bottoms (RBOT). For the convenience of discussion, we duplicate the definitions of HS, IHS, and RBOT as follows.

Definition 1. An inverted head-and-shoulders (IHS) pattern is characterized by a sequence of five consecutive local extrema E1, ..., E5 such that
- E1 is a minimum,
- E3 < E1, E3 < E5,
- E1 and E5 are within 1.5 percent of their average,
- E2 and E4 are within 1.5 percent of their average.

Definition 2. A rectangle bottoms (RBOT) pattern is characterized by a sequence of five consecutive local extrema E1, ..., E5 such that
- E1 is a minimum,
- the tops are within 0.75 percent of their average,
- the bottoms are within 0.75 percent of their average,
- the lowest top is higher than the highest bottom.

Although the definitions above are straightforward, their crisp nature cannot adequately handle the uncertainty of human perception and reasoning [7, 8]. Consequently, they cannot truly reflect the judgments of a professional technical analyst. By introducing fuzzy logic into the definition of technical patterns, we obtain a better way to match the opinion of a professional technical analyst. In practice, there are two separate steps for introducing fuzziness: inter-fuzzification and intra-fuzzification. To illustrate, consider the following two examples taken from the U. S. stock market.

The data of the first example are daily prices of the Potlatch Corp. (ticker: PCH) from July 21, 1997 to October 23, 1997. Figure 1.1 shows the prices after Gaussian kernel based smoothing and the corresponding extrema detection. It is not hard to see that the pattern shown in Figure 1.1 fits both the IHS and the RBOT templates described above. The membership value of IHS is 0.974 and that of RBOT is 0.664 according to our algorithm. This is the case that we want to distinguish from the single pattern match. By introducing the inter-fuzzification process, we can handle this situation with more flexibility and act more like human beings. The detail of the inter-fuzzification process is given in Section 1.2.3.
Fig. 1.1. Illustration example of inter-fuzzification. The data are the smoothed daily prices of the Potlatch Corp. (PCH) from July 21, 1997 to October 23, 1997.
The data of the second example are smoothed daily prices of the California Water Service Group Holding (ticker: CWT) from June 1, 1992 to July 29, 1992 (upper panel of Figure 1.2) and from April 9, 1990 to May 23, 1990 (bottom panel of Figure 1.2) respectively. Again the prices after Gaussian kernel based smoothing and corresponding extrema detection are shown in Figure 1.2. If we examine the two patterns shown in Figure 1.2 carefully, we will find that both of them satisfy the four conditions of the IHS template. However, it is obvious that they have differences based on our visual judgments. For example, the price difference between the two local maxima of the upper pattern is much less than that of the bottom pattern. By introducing the intra-fuzzification process, we can model those subtle differences with great ease. In fact, the membership of the upper pattern is 0.99 while the membership of the bottom pattern is only 0.76 according to our intra-fuzzification process given in section 1.2.3.
Fig. 1.2. Illustration example of intra-fuzzification. The data are smoothed daily prices of the California Water Service Group Holding (ticker: CWT) from June 1, 1992 to July 29, 1992 (upper panel) and from April 9, 1990 to May 23, 1990 (bottom panel), respectively.

1.2.3 Fuzzification process

We follow a consistent method of fuzzifying the membership for all 8 pattern templates. As mentioned before, there are two steps of fuzzification: inter-fuzzification and intra-fuzzification.

The first step, inter-fuzzification, simply means that the fuzzification is done among the pattern templates. The method is quite straightforward. Assume we have a membership vector m = [m_1, m_2, ..., m_8] whose elements m_i, i = 1, ..., 8, indicate the corresponding membership value for the 8 pattern templates. If the testing pattern belongs to only one pattern template, we simply set the corresponding membership value to 1 while keeping all other membership values at 0. In the case where the testing pattern belongs to several pattern templates at the same time, as illustrated by the pattern shown in Figure 1.1, we set all corresponding membership values to 1. We thus obtain a binary membership vector in which 1 indicates that the testing pattern belongs to the corresponding pattern template and 0 that it does not.

By introducing the intra-fuzzification process, we try to model the subtle differences of patterns within the same pattern template. We fuzzify the crisp conditions of each pattern template using the trapezoid membership function shown in Figure 1.3. The parameters of the trapezoid membership functions for each condition of each pattern template are shown in Table 1.1. For example, the second row of Table 1.1 shows the parameters for fuzzification of the first condition of the IHS pattern template. Here the fuzzification is based on the variable x that is defined as

x = \frac{E_3 - Ave_1}{Ave_1 - Ave_2}    (1.4)

where Ave_1 = (E_1 + E_5)/2 is the average value of the first and last maximum and Ave_2 = (E_2 + E_4)/2 is the average value of the first and second minimum. Under such a definition, the variable x indicates how high "the head" is above "the shoulders" relative to the distance between "the shoulders" and "the body" (the two minima). According to our visual observation, when x is less than 0.1, "the head" is so close to "the shoulders" that the entire pattern looks rather flat; therefore we set the membership value to 0. When x is above 40, "the head" is very high and the pattern looks like a spike in price instead of a normal IHS pattern; again we set the membership value to 0 in this case. When x is in the range of [1, 5], "the head", "the shoulders" and "the body" are well placed and give us a perfect visualization of the IHS
pattern. In such situations, we set the membership value to 1. Based on this procedure, we obtain the parameters for fuzzification of the first condition of the IHS pattern template. Although the choices of the variable and parameters are ad hoc, we can redefine the variable x in Equation (1.4) or adjust the parameters to achieve the best out-of-sample prediction results in our experiments.
Fig. 1.3. The trapezoid membership function. (The membership rises from 0 at l to 1 at lp, stays at 1 between lp and rp, and falls back to 0 at r.)
Similarly, we can define the variables for all fuzzification processes (each condition of each pattern template); these are listed in Table 1.2. We fuzzify each variable using the corresponding membership function defined in Table 1.1. Then we obtain the intra-fuzzification membership value by averaging all memberships within the pattern template. For example, if the three sub-intra-memberships of a pattern for the IHS pattern template are 0.8, 0.6, and 1, the final intra-membership value of that pattern for the IHS pattern template is (0.8 + 0.6 + 1)/3 = 0.8. If we do this for all the pattern templates, we will get another vector n = [n_1, n_2, ..., n_8] that contains the intra-membership values of the testing pattern for all pattern templates. Finally, the membership values w of the testing pattern for each pattern template after the two-step fuzzification can be described by the following equation,

w = [m_1 \cdot n_1, m_2 \cdot n_2, \ldots, m_8 \cdot n_8]    (1.5)
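The sketch below illustrates the two-step fuzzification on the first HS/IHS condition, using the trapezoid of Figure 1.3 with the (0.1, 1.0, 5.0, 40.0) parameters of Table 1.1 and the combination of Equation (1.5); the function names are ours, and the remaining sub-conditions are omitted for brevity.

```python
import numpy as np

def trapezoid(x, l, lp, rp, r):
    """Trapezoid membership of Fig. 1.3: 0 outside (l, r), 1 on [lp, rp]."""
    if x <= l or x >= r:
        return 0.0
    if lp <= x <= rp:
        return 1.0
    return (x - l) / (lp - l) if x < lp else (r - x) / (r - rp)

def hs_ihs_condition1(E1, E2, E3, E4, E5):
    """Sub-membership of the first HS/IHS condition (Eq. 1.4, Table 1.1)."""
    ave1, ave2 = (E1 + E5) / 2.0, (E2 + E4) / 2.0
    x = (E3 - ave1) / (ave1 - ave2)
    return trapezoid(x, 0.1, 1.0, 5.0, 40.0)

def two_step_membership(m, n):
    """Eq. (1.5): combine the binary inter-membership vector m with the
    intra-membership vector n (the average of a template's sub-memberships)."""
    return np.asarray(m, dtype=float) * np.asarray(n, dtype=float)
```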
1.3 Neural Network Based Saliency Analysis

Correlation coefficients between two variables or two time series are often used to explore the relationships between them. However, only linear relations can be revealed this way. Coherence in time series analysis is the counterpart of the correlation coefficient in the analysis of two random variables. Again, coherence is only suitable for measuring the linear relationship between two time series, even though a certain time delay is allowed. Neural network based saliency analysis for time series has three advantages when compared with correlation coefficients/coherence,
• It attempts to perform simultaneously the analysis and the processing of the data: the input/output analysis process is part of the training process and their relationships are sought for optimizing a cost function (usually forecasting).
• Neural networks are able to capture nonlinear relationships between inputs and outputs.
• We can have many different inputs feed the neural network. Therefore, it is possible to investigate multivariate relations.

Pat.        Cd.   l     lp    rp         r
HS/IHS      1     0.1   1.0   5.0        40.0
            2     −∞    −∞    0.5%·p1    4%·p1
            3     −∞    −∞    0.5%·p2    0.5%·p2
BTOP/BBOT   1     0.1   0.8   1.2        10
            2     1.2   2     4          15
TTOP/TBOT   1     0.1   0.8   1.2        10
            2     1.2   2     4          15
RTOP/RBOT   1     −∞    −∞    0.5%·p3    4%·p3
            2     −∞    −∞    0.5%·p4    4%·p4
            3     1     5     ∞          ∞

Table 1.1. The trapezoid membership function parameters for the technical patterns HS, IHS, BTOP, BBOT, TTOP, TBOT, RTOP and RBOT. p1 = (E1 + E5)/2, p2 = (E2 + E4)/2, p3 = (E1 + E3 + E5)/3 and p4 = (E2 + E4)/2. See Figure 1.3 for the meaning of the parameters l, lp, rp and r.

Pat.        Cd.   Variable
HS/IHS      1     (2·E3 − E1 − E5)/(E1 + E5 − E2 − E4)
            2     0.5 · |E5 − E1|
            3     0.5 · |E4 − E2|
BTOP/BBOT   1     (E5 − E3)/(E3 − E1)
            2     (E5 − E4)/(E1 − E2)
TTOP/TBOT   1     (E5 − E3)/(E3 − E1)
            2     (E1 − E2)/(E5 − E4)
RTOP        1     0.5 · (p1 − p2)
            2     0.5 · |E2 − E4|
            3     100 · (p2/p3 − 1)
RBOT        1     0.5 · (p1 − p2)
            2     0.5 · |E2 − E4|
            3     100 · (p4/p1 − 1)

Table 1.2. The variables for all fuzzification processes (each condition of each pattern template). p1 = Max(E1, E3, E5), p2 = Min(E1, E3, E5), p3 = Max(E2, E4) and p4 = Min(E2, E4).
A neural network’s performance depends on its weights. We need to find a heuristic function that can reveal the relationship between the kth input and the output based on the weights W . We call it the C function, denoted by C(W, k). Generally there are three ways to define the C function,
• Zero order functions, which use only the weight values [9].
• First order functions, which use the first derivatives of the weight values [10].
• Second order functions, which use the second derivatives of the weight values [6].
Considering the influence of a perturbation of weight w_{jk} on the change of the energy function E (j is the index of the neurons in the hidden layer), we choose the saliency measure defined as follows [11],

s_{jk} \approx \frac{\partial^2 E}{\partial w_{jk}^2} w_{jk}^2    (1.6)

and our C function will be

C(W, k) = \sum_j \frac{\partial^2 E}{\partial w_{jk}^2} w_{jk}^2    (1.7)

The second derivative term \partial^2 E / \partial w_{jk}^2 can be easily calculated by the chain rule. Given a two-layer feed-forward neural network with one output neuron, and assuming that the activation functions of the hidden layer and the output layer are f_1(x) = 1/(1 + e^{-x}) and f_2(x) = x respectively, the second derivative reduces to a very simple form,

\frac{\partial^2 E}{\partial w_{jk}^2} = x_k^2 \, w_{1j} \left( f_1'^2 - f_1'' e \right)    (1.8)

where e is the training error and x_k is the kth input. Let a network have q input time series. If a time series x_m (m = 1, 2, ..., q) has its previous p steps of information as inputs, then the C function for the time series x_m is

C(x_m) = \sum_{k=1}^{p} C(W, k)    (1.9)

For one training epoch, let C_max be the largest value of all C(x_m) (m = 1, 2, ..., q). Then the saliency coefficient of a time series x_m is defined as

S(x_m) = \frac{C(x_m)}{C_{max}}    (1.10)
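A compact sketch of Equations (1.6)–(1.10) for a one-hidden-layer network with sigmoid hidden units and a linear output; the array names and the accumulation over training samples are our own choices, so this is an illustration of the computation rather than the authors' code.

```python
import numpy as np

def saliency_coefficients(W_in, w_out, X, y, groups):
    """Saliency coefficients S(x_m) of Eqs. (1.6)-(1.10).

    W_in   : (H, K) input-to-hidden weights w_jk
    w_out  : (H,)   hidden-to-output weights w_1j
    X, y   : (T, K) inputs and (T,) targets
    groups : list of index lists, one per input time series (its p lags)."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    C = np.zeros(W_in.shape[1])
    for x, target in zip(X, y):
        h = sigmoid(W_in @ x)               # hidden activations f1
        e = float(w_out @ h - target)       # training error
        f1p = h * (1.0 - h)                 # f1'
        f1pp = f1p * (1.0 - 2.0 * h)        # f1''
        # Eq. (1.8): second derivative of E w.r.t. each w_jk
        d2E = np.outer(w_out * (f1p ** 2 - f1pp * e), x ** 2)
        C += np.sum(d2E * W_in ** 2, axis=0)            # Eqs. (1.6)-(1.7)
    C_series = np.array([C[idx].sum() for idx in groups])   # Eq. (1.9)
    return C_series / C_series.max()                          # Eq. (1.10)
```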
1.4 Data Selection, Analysis and Results

1.4.1 Data and Sample Selection

We select data from the 2000 Center for Research in Security Prices (CRSP) daily database. We first list all of the firms each year from 1962 to 2000 and select firms that are assigned CRSP size deciles from 1 to 10. Decile 1 contains the smallest firms and decile 10 contains the largest. From each decile, we randomly select 200 firms with replacement. To be included in our sample, the firm must meet the following criteria:

1. It must be listed on the NYSE, AMEX, or Nasdaq and must be an ordinary common stock of a U.S. based corporation (CRSP share codes 10 and 11). We select stocks based on this criterion in order to be consistent with the selection criteria imposed by corporate events study researchers, such as [12] and [13]. This leaves 1699 firms in our sample.
2. It must exist on the CRSP database for 24 consecutive months. In this way, each sample stock has at least six months of data available for our pattern detection.
3. It must contain at least 80 percent non-missing price observations in the sample period. Following [4], we also omit missing price observations when we apply our algorithm to the data.

Our final sample contains 1451 stocks with a total of 44,150 technical pattern occurrences detected. In the following, we focus our study on the head-and-shoulders (HS), inverted head-and-shoulders (IHS), rectangle tops (RTOP), and rectangle bottoms (RBOT) patterns.
1.4.2 Saliency Analysis

For each trading day, we use stock prices 10 days before and 3 days after in the Gaussian kernel smoothing method. Let t = 0 be the day when the pattern has been completed for 3 days. We calculate the five-day post-pattern return from t = 0 to t = 5, so the return does not contain any information that is used in detecting technical patterns. We use this post-pattern return as the output of a neural network. We use the following 26 variables as the input to the neural network: the membership value (memb-value), the pattern length (length), size, stock price, prior one-year return (return-1yr), prior one-month return (return-1mon), the daily returns from t = -10 to t = -1 (return-10 to return-1), and the daily turnover of these prior ten days. The turnover is defined as the ratio of trading volume to the number of shares outstanding. The inputs and output are all normalized. The network has 40 hidden neurons and is trained by back propagation with an adaptive learning rate and momentum. The corresponding training parameters are: training rate = 0.01, increasing rate = 1.05, decreasing rate = 0.7 and momentum threshold = 1.04. The network is trained for 4,000 epochs. At the end of the 4,000th epoch, we compute the saliency coefficients for each input variable to see the impact of each input on the post-pattern return.

We trained the network 20 times with random initial weights; the average results of network training, the saliency coefficients, their ranks, and statistics for the head-and-shoulders pattern are shown in Table 1.3. At the end of 4,000 epochs, the average mean of squared errors (MSE) is 0.34 with a standard deviation of 0.019. The pattern length has a saliency coefficient of 1 in 17 out of 20 training runs. It has the highest mean coefficient, followed by the membership value and firm size. The prior one-month return ranks 5th. The stock price has the lowest saliency coefficient. These results are consistent with various previous studies. Studies show that the head-and-shoulders pattern is informative. Our pattern length and membership value further define the shape of the pattern, thus they are the most relevant to the post-pattern return. Firm size is a well-known factor in asset returns; our saliency results confirm that. It has also been documented that price momentum affects returns; our prior one-month return is the 5th most relevant input.

It is interesting to note that return-8 and turnover-10 rank higher than return-1, return-2, return-3, and turnover-1, which seems counterintuitive: intuitively, the most recent returns and volume turnover should have a larger impact on the five-day return. The reason is that our data contain head-and-shoulders patterns with lengths ranging from 20 to 50 trading days. So t = -10 to -8 are likely to be the days surrounding the last one or two extrema, while t = -3 to t = -1 are the end of the right shoulder of the pattern. Our results are thus consistent with the technical analysis practice that investors look for returns and volume around turning points, so the information at t = -3 to t = -1 may not be as important as the information at t = -10 to t = -8.

We also calculate the correlation coefficients between the five-day return and the 26 input variables and rank them based on their absolute value. Return-2 ranks the highest with -0.066 and turnover-9 ranks the lowest with -0.001. The membership value is the 12th with -0.026.
Rank  Inputs        Mean   Std.   Min    Max
1     length        0.963  0.069  0.692  1.000
2     memb-value    0.759  0.154  0.478  1.000
3     size          0.686  0.172  0.421  0.977
4     return-8      0.645  0.172  0.380  0.958
5     return-1mon   0.610  0.130  0.346  0.908
6     turnover-10   0.555  0.165  0.271  0.883
7     return-4      0.548  0.158  0.337  0.980
8     return-6      0.542  0.153  0.306  0.831
9     return-1      0.525  0.133  0.257  0.745
10    return-7      0.494  0.187  0.273  1.000
11    return-9      0.439  0.149  0.280  0.855
12    return-5      0.434  0.158  0.254  0.767
13    return-3      0.427  0.108  0.224  0.586
14    return-10     0.424  0.092  0.287  0.663
15    turnover-7    0.338  0.123  0.187  0.620
16    return-2      0.328  0.133  0.197  0.685
17    turnover-8    0.274  0.087  0.127  0.466
18    turnover-6    0.267  0.065  0.140  0.374
19    turnover-9    0.266  0.099  0.115  0.517
20    turnover-2    0.254  0.113  0.146  0.565
21    turnover-1    0.232  0.083  0.157  0.474
22    return-1yr    0.214  0.085  0.102  0.484
23    turnover-3    0.203  0.063  0.124  0.356
24    turnover-5    0.078  0.017  0.044  0.114
25    turnover-4    0.075  0.028  0.034  0.120
26    price         0.009  0.003  0.006  0.016

Table 1.3. Saliency coefficients and their rank and statistics of 26 input variables for the head-and-shoulders pattern. The output is the five-day post-pattern return. With 40 hidden neurons, the network has been trained 20 times, each time for 4,000 epochs. At the end of 4,000 epochs, the average mean of squared errors (MSE) is 0.34 with a standard deviation of 0.019.
With a value of 0.0097, the pattern length is the 23rd. Having a correlation coefficient of -0.0192 with the output, firm size ranks 14th. Based on the correlation coefficient results, none of the 26 variables are important for determining the five-day post-pattern return. Among all variables that are related to the head-and-shoulders pattern, the membership value and pattern length have even lower correlation coefficients than the other variables. The saliency results, however, show that the membership value and pattern length, followed by firm size, are the most important variables, consistent with many other studies and with practice. We obtained similar results for the IHS, RTOP, and RBOT patterns, which are summarized in Table 1.4. Tables 1.5 and 1.6 present the summary of the saliency coefficient ranks of the 26 input variables when the output is the one-month and the three-month post-pattern return, respectively. It is clear that pattern length, fuzzy membership value and firm size are still the most important variables for longer term returns.
Rank  Input Variables  HS Rank  IHS Rank  RTOP Rank  RBOT Rank  Ave. Rank
1     length           1        1         2          2          1.50
2     memb-value       2        2         1          3          2.00
3     size             3        4         5          1          3.25
4     return-1mon      5        3         8          5          5.25
5     return-1         9        8         4          7          7.00
6     return-10        14       5         11         4          8.50
7     return-3         13       10        6          6          8.75
8     return-4         7        9         12         10         9.50
9     return-8         4        13        7          14         9.50
10    return-2         16       7         9          8          10.00
11    return-7         10       11        10         13         11.00
12    turnover-10      6        15        3          23         11.75
13    return-9         11       16        13         11         12.75
14    return-5         12       12        14         16         13.50
15    return-6         8        14        20         12         13.50
16    turnover-4       25       6         17         15         15.75
17    turnover-3       23       17        24         9          18.25
18    turnover-6       18       22        16         18         18.50
19    turnover-7       15       21        21         19         19.00
20    turnover-9       19       24        19         17         19.75
21    turnover-2       20       19        18         24         20.25
22    turnover-1       21       18        22         22         20.75
23    turnover-8       17       23        23         20         20.75
24    return-1yr       22       25        15         26         22.00
25    turnover-5       24       20        25         21         22.50
26    price            26       26        26         25         25.75

Table 1.4. Average saliency ranks for five-day post-pattern returns for head-and-shoulders (HS), inverted head-and-shoulders (IHS), rectangle tops (RTOP), and rectangle bottoms (RBOT) patterns. The output is the five-day post-pattern return. The last column (Average Rank) shows the rank averaged over the four different ranks in the HS, IHS, RTOP, and RBOT patterns. The first column shows the rank in the ascending order of Average Rank.
1.4.3 Statistical Test and Results

The saliency analysis shows that the membership value and the pattern length are the most relevant variables for future returns, but would the occurrence of technical patterns with a certain membership value signal future abnormal returns of the stocks? We apply our fuzzy logic based algorithm to the adjusted stock prices (adjusted for all corporate events including stock splits and dividends). Since for any trading day we use adjusted stock prices m days before and r days after in the Gaussian kernel smoothing method (in this study we use 10 days for m and 3 days for r), we let t = 0 be the day when the pattern has been completed for r days. We calculate post-pattern returns starting from t = 1, so the return on t = 1 does not contain information that is used in detecting technical patterns.
Rank  Input Variables  HS Rank  IHS Rank  RTOP Rank  RBOT Rank  Ave. Rank
1     length           1        1         4          2          2.00
2     memb-value       2        3         2          3          2.50
3     size             4        4         1          1          2.50
4     return-9         12       6         7          9          8.50
5     return-7         6        11        12         6          8.75
6     return-1         7        9         10         10         9.00
7     return-1mon      13       2         9          12         9.00
8     return-6         5        14        15         5          9.75
9     return-8         3        16        13         8          10.00
10    return-2         15       10        6          11         10.50
11    return-3         14       12        3          14         10.75
12    return-10        16       7         18         4          11.25
13    return-5         10       8         11         16         11.25
14    return-4         9        5         16         18         12.00
15    turnover-10      11       13        5          22         12.75
16    turnover-7       8        21        17         17         15.75
17    turnover-3       21       18        24         7          17.50
18    turnover-2       17       20        14         23         18.50
19    turnover-6       19       23        20         13         18.75
20    return-1yr       18       25        8          25         19.00
21    turnover-4       25       15        23         15         19.50
22    turnover-1       20       17        19         24         20.00
23    turnover-9       23       22        22         19         21.50
24    turnover-5       24       19        25         20         22.00
25    turnover-8       22       24        21         21         22.00
26    price            26       26        26         26         26.00

Table 1.5. Average saliency ranks for one-month post-pattern returns for head-and-shoulders (HS), inverted head-and-shoulders (IHS), rectangle tops (RTOP), and rectangle bottoms (RBOT) patterns. The output is the one-month post-pattern return. The last column (Average Rank) shows the rank averaged over the four different ranks in the HS, IHS, RTOP, and RBOT patterns. The first column shows the rank in the ascending order of Average Rank.
Following Boehme and Sorescu's [13] control-firm-matching method, we compute abnormal returns of sample firms relative to a group of control firms matched one by one based on size and one-year price momentum. We compute the abnormal return for day t as

AR_{it} = R_{it} - R_{ct}, \quad t = 1, \ldots, 120    (1.11)

where R_{it} is the return for firm i for day t and R_{ct} is the return for the corresponding control firm for that day. We put all firms that complete a certain pattern in a portfolio. For each day t, we compute a mean abnormal return MAR_t across all the firms in the portfolio:

MAR_t = \frac{1}{N_t} \sum_{i=1}^{N_t} AR_{it}    (1.12)
Rank  Input Variables  HS Rank  IHS Rank  RTOP Rank  RBOT Rank  Ave. Rank
1     length           1        1         2          1          1.25
2     size             2        2         1          2          1.75
3     memb-value       3        3         3          3          3.00
4     return-1         4        11        5          7          6.75
5     return-6         5        10        6          9          7.50
6     return-1mon      13       4         4          10         7.75
7     return-9         9        14        8          5          9.00
8     return-5         11       7         7          12         9.25
9     return-2         10       12        11         6          9.75
10    return-4         8        5         12         14         9.75
11    return-3         14       13        10         4          10.25
12    return-8         6        8         16         11         10.25
13    return-10        17       6         13         8          11.00
14    return-7         7        9         17         13         11.50
15    turnover-10      15       15        15         23         17.00
16    turnover-6       18       23        14         18         18.25
17    turnover-7       12       21        21         20         18.50
18    turnover-1       19       16        19         24         19.50
19    turnover-2       20       18        18         22         19.50
20    turnover-4       25       17        20         16         19.50
21    return-1yr       22       24        9          25         20.00
22    turnover-3       23       19        24         15         20.25
23    turnover-9       21       22        23         17         20.75
24    turnover-8       16       25        22         21         21.00
25    turnover-5       24       20        25         19         22.00
26    price            26       26        26         26         26.00

Table 1.6. Average saliency ranks for three-month post-pattern returns for head-and-shoulders (HS), inverted head-and-shoulders (IHS), rectangle tops (RTOP), and rectangle bottoms (RBOT) patterns. The output is the three-month post-pattern return. The last column (Average Rank) shows the rank averaged over the four different ranks in the HS, IHS, RTOP, and RBOT patterns. The first column shows the rank in the ascending order of Average Rank.
where N_t is the total number of firms in the portfolio. We then calculate the cumulative abnormal return CAR_t from day 1 to day t:

CAR_t = \sum_{i=1}^{t} MAR_i    (1.13)
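A minimal sketch of Equations (1.11)–(1.13), assuming that the post-pattern daily returns of the sample firms and of their matched control firms are already aligned in two arrays; the names are illustrative.

```python
import numpy as np

def cumulative_abnormal_returns(sample_returns, control_returns):
    """CAR_1..CAR_T from Eqs. (1.11)-(1.13).

    Both inputs have shape (N_firms, T_days): row i holds the post-pattern
    daily returns of sample firm i and of its size/momentum-matched control."""
    ar = np.asarray(sample_returns) - np.asarray(control_returns)   # Eq. (1.11)
    mar = ar.mean(axis=0)                                           # Eq. (1.12)
    return np.cumsum(mar)                                           # Eq. (1.13)
```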
We compute CARs for stocks after the completion of a pattern based on the pattern membership value. We form two sample portfolios for each pattern, one containing stocks with pattern membership values no larger than 0.7 and the other containing stocks with pattern membership values larger than 0.7. The results for RBOT are shown in Figure 1.4. Technical analysts regard RBOT as a bullish technical indicator (indicating that the stock prices would rise, [14]). Our results, however, show that the CARs for the portfolio of stocks with no larger
than 0.7 membership values are actually significantly negative from the 77th day to the 120th day. Tests of the null hypothesis of equality of the means of the two portfolios and equality of the medians of the two portfolios (using the F-test and the Kruskal and Wallis test) show that during the same period, the CARs of the two portfolios are significantly different. Similar results are obtained for HS and IHS patterns. The results for HS are shown in Figure 1.5.
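The portfolio comparison above can be sketched as follows, splitting the pattern occurrences at the 0.7 membership threshold and testing equality of means (a one-way ANOVA F-test) and of medians (Kruskal–Wallis) day by day; this uses SciPy purely for illustration and is not the authors' test code.

```python
import numpy as np
from scipy import stats

def compare_portfolios(car_by_firm, memberships, threshold=0.7):
    """Day-by-day p-values comparing the two membership portfolios.

    car_by_firm : (N_firms, T_days) per-firm cumulative abnormal returns
    memberships : (N_firms,) pattern membership values."""
    car_by_firm = np.asarray(car_by_firm, dtype=float)
    memberships = np.asarray(memberships, dtype=float)
    low = car_by_firm[memberships <= threshold]
    high = car_by_firm[memberships > threshold]
    p_mean, p_median = [], []
    for t in range(car_by_firm.shape[1]):
        p_mean.append(stats.f_oneway(low[:, t], high[:, t]).pvalue)
        p_median.append(stats.kruskal(low[:, t], high[:, t]).pvalue)
    return np.array(p_mean), np.array(p_median)
```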
Fig. 1.4. Cumulative abnormal returns for two portfolios (upper panel; membership values in (0.7, 1] and in (0, 0.7]) and p-values for tests of the null hypothesis of equality of the means & medians of the two portfolios (F-test and Kruskal and Wallis test, lower panel) after the completion of the RBOT pattern. The CARs that are statistically significant from zero at the 0.05 level are marked with "*".
Based on our fuzzification procedure, a pattern with a membership value below 0.7 meets the classical definition of the pattern but visually is less appealing to investors. The results suggest that for the HS, IHS, and RBOT patterns, our fuzzy logic based algorithm can be used to detect subtle differences even within a single type of pattern.
Fig. 1.5. Cumulative abnormal returns for two portfolios (upper panel; membership values in (0.7, 1] and in (0, 0.7]) and p-values for tests of the null hypothesis of equality of the means & medians of the two portfolios (F-test and Kruskal and Wallis test, lower panel) after the completion of the HS pattern. The CARs that are statistically significant from zero at the 0.05 level are marked with "*".
The results may also explain why there is so much controversy surrounding technical analysis, since a single pattern could generate entirely different post-pattern performances. The difference within a pattern detected by our fuzzy logic algorithm may not be apparent to average investors, but only to certain experts, so the experts can avoid using a pattern with a low membership value. We also form two different sample portfolios for each pattern, one containing stocks with pattern membership values of 1 and the other containing stocks with pattern membership values other than 1. The results are similar to those of the two sample portfolios mentioned above.
1.5 Conclusions and Future Research

In this chapter, we have presented a soft computing based approach to measure the degree of effectiveness of technical patterns. Our approach incorporates human cognitive uncertainty into the automatic pattern detection and analysis process, so that it may simulate human judgment better than before. Our results show that our algorithm is able to detect subtle differences within a clearly defined pattern template. The differences may not be apparent to average investors, but only to certain experts. We find that the post-pattern performance is not consistent within a given pattern template. This may explain why there is so much controversy surrounding technical analysis. Compared with existing visual technical pattern analysis approaches, our approach offers superior precision in detecting and interpreting technical patterns.

Chang and Osler (1999) [15] studied the profitability and efficiency of the head-and-shoulders pattern in foreign exchange markets. They found the technical pattern to be profitable but not efficient, because simpler trading rules dominate the trading strategy based on the head-and-shoulders pattern. Our approach has the potential to explain their results: if we exclude head-and-shoulders patterns with a certain membership value, we may improve the efficiency of the trading strategy based on the head-and-shoulders pattern. Lo et al. [4] used the conditional (conditioned on the technical patterns) and unconditional distributions of returns to test the informativeness of technical analysis. Their findings suggest that the technical patterns are informative. The informativeness, however, does not necessarily generate profits when using technical trading strategies. Our approach, with the firm-matched abnormal returns test, would help investors to identify profitable trading opportunities.

In this study, we did not consider the situation where more than one pattern shows up consecutively in a short period. This may affect the post-pattern performance. For instance, if the bearish indicator HS is followed by the bullish indicator IHS, how should we handle this situation? This problem will be addressed in the future.
References

1. F. S. Wong, P. Z. Wang, T.H.G., Quek, B.: Fuzzy neural system for stock selection. Financial Analysts Journal (1992) 47–74
2. Allen, F., Karjalainen, R.: Using genetic algorithms to find technical trading rules. Journal of Financial Economics 51 (1999) 245–271
3. Leigh, W., Purvis, R., Ragusa, J.M.: Forecasting the NYSE composite index with technical analysis, pattern recognizer, neural network, and genetic algorithm: a case study in romantic decision support. Decision Support Systems 32 (2002) 361–377
4. Lo, A.W., Mamaysky, H., Wang, J.: Foundations of technical analysis: Computational algorithms, statistical inference, and empirical implementation. Journal of Finance (2000) 1705–1765
5. Zadeh, L.: Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems (1978) 3–28
6. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press (1995)
7. Gupta, M.M.: Twenty-five years of fuzzy sets and systems: A tribute to Professor Lotfi A. Zadeh. Fuzzy Sets and Systems 40 (1991) 409–413
8. Klir, G.J.: Where do we stand on measures of uncertainty, ambiguity, fuzziness and the like? Fuzzy Sets and Systems 24 (1987) 141–160
9. Yacoub, M., Bennani, Y.: HVS: A heuristic for variable selection in multilayer artificial neural network classifiers. In: Proc. of ANNIE'97 (1997) 527–532
10. Refenes, A.N., Zapranis, A., Utans, J.: Neural model identification, variable selection and model adequacy. In: Neural Networks in Financial Engineering, Proceedings of NnCM-96 (1996)
11. LeCun, Y., Denker, J.S., Solla, S.A.: Optimal brain damage. In: Touretzky, D.S. (ed.): Advances in Neural Information Processing Systems, Volume 2. Morgan Kaufmann Publishers, San Mateo, CA (1990) 598–605
12. Michaely, R., Thaler, R., Womack, K.: Price reactions to dividend initiations and omissions: overreaction or drift? Journal of Finance 50 (1995) 573–608
13. Boehme, R.D., Sorescu, S.M.: The long-run performance following dividend initiations and resumptions: underreaction or product of chance? Journal of Finance (2002)
14. Edwards, R., Magee, J.: Technical Analysis of Stock Trends. John Magee Inc., Boston, Massachusetts (1997)
15. Chang, P.K., Osler, C.L.: Methodical madness: Technical analysis and the irrationality of exchange-rate forecasts. Economic Journal 109 (1999) 636–661
CHAPTER 19
A Fuzzy Rule-Based Trading Agent: Analysis and Knowledge Extraction
Tomoharu Nakashima, Takanobu Ariyama, Hiroko Kitano, and Hisao Ishibuchi
Department of Industrial Engineering, Osaka Prefecture University, Gakuen-cho 1-1, Sakai, Osaka 599-8531, JAPAN
{nakashi,ariyama,kitano,hisaoi}@ie.osakafu-u.ac.jp

Abstract: In this paper, we show how a fuzzy rule-based system is developed for trading in a futures market. With our fuzzy rule-based system, an agent determines whether it should buy or sell a futures stock index based on the time series of both the spot and the futures prices. The fuzzy rule-based system is fine-tuned so that the amount of profit is maximized. Since a fuzzy system is used as a decision making tool, the decision making process of the learning agent can be linguistically interpreted. The performance of the fuzzy rule-based system is evaluated in a virtual stock market. We also try to extract a knowledge base from the fuzzy rule-based system after it is fully trained. A statistical test shows the effectiveness of the extracted knowledge as a human decision support.

Keywords: Fuzzy Systems, Futures Trade, Online Learning, Knowledge Extraction, Time Series Data
1 Introduction

Fuzzy systems have proved to be a good modeling method since the first concept of fuzzy logic was proposed [1]. In particular, fuzzy rule-based systems have been successfully applied to various problems such as control problems [2, 3]. A number of learning algorithms have been applied in order to improve the performance of fuzzy rule-based systems. Genetic algorithms [4] are one of the most important topics for the fine-tuning of fuzzy rule-based systems. Fuzzy rule-based systems have also been successfully applied to pattern classification problems. For example, Ishibuchi et al. [5] formulated a rule extraction method from numerical data; they used fuzzy if-then rules with a certainty grade. Abe & Lan [6] also proposed a rule extraction method from numerical data. The difference between [5] and [6] is the shape of the membership functions used in the fuzzy if-then rules: the shape of the membership functions in [5] was pre-specified and never adjusted, while in [6] the shape of the membership functions was determined by the distribution of the training patterns given for a particular pattern classification problem.
While the shape of the membership functions was fixed in [5], the grade of certainty of the fuzzy if-then rules was adjusted in [7] in order to improve the performance of fuzzy rule-based systems. The grade of certainty of a fuzzy if-then rule can be viewed as a weight of the rule. It was shown that adjusting the rule weights has the same effect as adjusting the shape of the membership functions [8]. Thus, human users can linguistically understand the fuzzy if-then rules if the shape of the membership functions used in the rules can be linguistically understood.

One important advantage of fuzzy rule-based systems over other input-output systems is their linguistic interpretability. For example, it is well known that neural networks can approximate arbitrarily complex input-output relations. It is, however, difficult to extract the information processing in neural networks since they are a black-box type of information processing tool. On the other hand, fuzzy rule-based systems can be linguistically interpreted since they normally consist of fuzzy if-then rules that linguistically describe the behavior of the target system. Furthermore, fuzzy rule-based systems have also been proved to approximate arbitrarily high-order input-output systems.

In this paper, we develop a trading agent that trades a futures stock index in a virtual market. The decision making of the trading agent is performed by a fuzzy rule-based system. The virtual market allows a number of autonomous agents to take part in the stock market. Human users are also allowed to trade the futures stock index in the virtual market. Autonomous agents and human users are required to determine whether they buy or sell the futures stock index, the limit price, and the quantity of the futures stock index. The fuzzy rule-based system is trained in an on-line manner based on the results of the trades. The learning of our fuzzy rule-based trading agent is performed by adjusting the weights of the fuzzy if-then rules based on the previous decision making and its evaluation. That is, the weight of a fuzzy if-then rule is increased if the agent's decision making in the previous time step was successful; on the other hand, we decrease the weight of a fuzzy if-then rule if the decision making was not successful.

The performance of our fuzzy rule-based trading agent is evaluated both by comparative experiments and by a public competition in the virtual futures market. The tuned weights of the fuzzy if-then rules after a sufficient number of trades are also analyzed for linguistically understanding the behavior of the agent. We select a small number of fuzzy if-then rules from the fuzzy rule-based system based on the difference between the two weights of a fuzzy if-then rule. That is, we select those fuzzy if-then rules that have a high contrast between the two weights. The set of selected fuzzy if-then rules can be viewed as a compressed knowledge base for the futures trade. We examine the performance of the selected fuzzy if-then rules as a decision support for human traders.
2 U-Mart: A Virtual Futures Market

Recently, virtual economic markets have attracted much attention for analyzing economic systems and developing autonomous agents. The U-Mart (Unreal Market as Artificial Research Test-bed) project is one such virtual market that trades a futures stock index (Fig. 1). In U-Mart, machine clients (i.e., autonomous agents) and human clients are given market information such as the time series data of spot prices and futures prices (called U-Mart prices). Each client also has its own current information such as its position (i.e., the balancing amount of its futures index trades), its remaining cash, and the time to the final settlement. Based on the above information, each client has to make a decision on whether it buys or sells the futures index, the limit price of the futures trade, and the quantity of the futures trade. Thus, a client in U-Mart can be viewed as an input-output system A as follows:
Fig. 1. A snapshot of the U-Mart.

A(S, F, Pos, Cash, t) = (BS, P, Q),    (1)
where S is the time series of spot prices, F is the time series of futures prices (i.e., U-Mart prices), Pos is the position of a client, Cash is the remaining cash, t is the remaining time to the final settlement, BS represents whether a client decides to buy or sell the futures stock index, P is the limit price, and Q is the quantity of the trade. Each U-Mart client interacts with the U-Mart server for trading in a futures stock index via the TCP/IP connection. In Fig. 2, we show a general view of the trade between a U-Mart client and the U-Mart server.
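The input-output mapping of Equation (1) can be sketched as a small Python interface; the class and field names below are our own illustration, not part of the actual U-Mart client API.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Order:
    buy_or_sell: str    # BS: "BUY" or "SELL"
    limit_price: float  # P
    quantity: int       # Q

class UMartClient:
    """A client viewed as the input-output system A of Eq. (1)."""

    def decide(self, spot: Sequence[float], futures: Sequence[float],
               position: int, cash: float, time_left: int) -> Order:
        """Map (S, F, Pos, Cash, t) to a trade order (BS, P, Q)."""
        raise NotImplementedError
```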
Fig. 2. A general view of the trade in U-Mart: the client sends a trade order (BS, P, Q) to the U-Mart server and receives external information (S, F), while maintaining its internal information (Pos, Cash, t).

The U-Mart server determines the futures index price by an Itayose method. That is, the U-Mart server first collects the orders, i.e., buy or sell of the futures index, the limit price, and
the quantity of trade from all the U-Mart clients. Then it compares the buy orders with the sell orders. The futures price is determined at the point that the price and quantity of buy orders are matched by those of sell orders. The iteration of collecting the orders and determining the futures price is performed until the final settlement time comes.
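The Itayose (call auction) matching described above can be sketched as picking the price that maximizes the executable volume among the collected limit orders; this is only a rough illustration of the idea, not the U-Mart server's actual matching rule.

```python
def itayose_price(buy_orders, sell_orders):
    """Pick a clearing price from lists of (limit_price, quantity) orders.

    At a candidate price p, buyers with a limit >= p and sellers with a
    limit <= p are willing to trade; the clearing price is chosen to
    maximize the matched (executable) volume."""
    candidates = sorted({p for p, _ in buy_orders} | {p for p, _ in sell_orders})
    best_price, best_volume = None, -1
    for p in candidates:
        demand = sum(q for limit, q in buy_orders if limit >= p)
        supply = sum(q for limit, q in sell_orders if limit <= p)
        matched = min(demand, supply)
        if matched > best_volume:
            best_price, best_volume = p, matched
    return best_price, best_volume
```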
3 Fuzzy Rule-Based System

3.1 Problem Formulation and Fuzzy If-Then Rules

In this section, we introduce a fuzzy rule-based system for determining whether an agent should buy or sell a futures stock index. In the virtual futures market described in Section 2, time series data of both the spot and the futures prices are available to the agent. Let us assume that n pieces of information are used by the agent to determine whether the agent buys or sells the futures stock index. That is, the task of the agent in this paper is to make a decision on whether to buy or sell the futures stock from the information on the time series of spot and futures prices. The problem of the futures market in this paper can be viewed as a two-class pattern classification problem with n-dimensional inputs. In this paper, a fuzzy rule-based system is employed for the pattern classification problem. A fuzzy rule-based system consists of a number of fuzzy if-then rules of the following type:

R_j: If x_1 is A_{j1} and ... and x_n is A_{jn} then Buy with q_{j1} and Sell with q_{j2},  j = 1, ..., N,    (2)
where R_j is the label of the j-th fuzzy if-then rule, x = (x_1, ..., x_n) is an input vector to the fuzzy rule-based system, A_{j1}, ..., A_{jn} are antecedent fuzzy sets, and q_{j1} and q_{j2} are real values of the fuzzy if-then rule R_j that correspond to buying and selling the futures stock, respectively.

As an input value we use the average difference between a spot price and a futures price. Three different time steps are used for calculating the three average differences between the spot and the futures prices. That is, the number of input variables in the fuzzy rule-based system is three and the decision making problem can be viewed as a three-dimensional two-class pattern classification problem (Fig. 3). Figure 3 shows the time series of the spot prices (s_m, s_{m-1}, ...) and the futures prices (f_m, f_{m-1}, ...). The fuzzy rule-based system used in our virtual futures trade in this paper considers the difference in price at the same time step (i.e., s_t − f_t, for several t's) in its decision making process. Let S = (s_1, ..., s_m) be the time series of spot prices from the beginning of the trade until time step m, where s_k is the spot price at time step k. Also let F = (f_1, ..., f_m) be the time series of futures prices. We use the following three pieces of information as input variables for the fuzzy rule-based system:

x_1 = s_m - f_m,    (3)
x_2 = \sum_{i=3}^{5} (s_{m-i} - f_{m-i})/3,    (4)
x_3 = \sum_{i=8}^{10} (s_{m-i} - f_{m-i})/3.    (5)
That is, we employ a price difference and two average price differences around three time steps (i.e., around time steps 1, 4, and 9) as input variables in our fuzzy rule-based system.
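A direct transcription of Equations (3)–(5), assuming the spot and futures price histories are plain Python lists with the most recent value last (at least 11 observations are needed):

```python
def input_features(spot, futures):
    """Compute x1, x2, x3 of Eqs. (3)-(5) from the price histories."""
    m = len(spot) - 1                      # index of the current time step
    diff = [s - f for s, f in zip(spot, futures)]
    x1 = diff[m]                                        # Eq. (3)
    x2 = sum(diff[m - i] for i in range(3, 6)) / 3.0    # Eq. (4)
    x3 = sum(diff[m - i] for i in range(8, 11)) / 3.0   # Eq. (5)
    return x1, x2, x3
```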
Fig. 3. Input variables in our fuzzy rule-based system. (The figure plots the spot price series s_{m−10}, ..., s_m and the futures price series f_{m−10}, ..., f_m up to the current time step m.)
Our fuzzy rule-based trading agent in this paper makes a decision on whether it buys or sells the futures stock using a fuzzy rule-based classification system which consists of fuzzy if-then rules of the following type:

R_j: If x_1 is A_{j1} and x_2 is A_{j2} and x_3 is A_{j3} then Buy with q_{j1} and Sell with q_{j2},  j = 1, ..., N.    (6)
3.2 Fuzzy Inference and Decision Making

Let us consider that at a particular time we have already calculated the three input variables x_1, x_2, and x_3 for the next decision making (i.e., whether the agent buys or sells the futures stock at the next time). In this section, we show how an agent makes a decision on whether it buys or sells the futures stock using the fuzzy rule-based system in (6). Assume that there are N fuzzy if-then rules in the fuzzy rule-based system. Note that each fuzzy if-then rule has two weight values q_{j1}, q_{j2} that are associated with buying and selling the futures stock, respectively. After the calculation of the three input values x_1, x_2 and x_3 (see the last subsection for details), the evaluation values Q_Buy and Q_Sell for buying and selling the futures stock are calculated as follows:

Q_Y = \frac{\sum_{j=1}^{N} \mu_j(x) \cdot q_{jY}}{\sum_{j=1}^{N} \mu_j(x)}, \quad Y \in \{Buy, Sell\},    (7)
where x = (x_1, x_2, x_3) is an input vector to the fuzzy rule-based system and \mu_j(\cdot) is the compatibility of an input vector x with the fuzzy if-then rule R_j. The compatibility \mu_j(x) of an input vector x with a fuzzy if-then rule R_j is defined by a multiplication operator as follows:

\mu_j(x) = \mu_{j1}(x_1) \times \mu_{j2}(x_2) \times \mu_{j3}(x_3),    (8)
where \mu_{ji}(\cdot) is the membership function of the antecedent fuzzy set A_{ji} in the j-th fuzzy if-then rule R_j. Any type of membership function can be used for our fuzzy rule-based system, and we can use any number of fuzzy sets for each variable. In Fig. 4, we show the membership functions that are used as antecedent fuzzy sets in this paper. We use trapezoidal and triangular membership functions. The fuzzy partition of each axis in Fig. 4 is manually specified in our computer simulations. Although the automatic generation of optimal membership functions is an interesting and important topic, we do not go into the detail of such topics since it is beyond the scope of this paper.

Fig. 4. Membership functions. (Each input axis, the difference (spot price) − (futures price), is partitioned into three fuzzy sets N, Z and P around the values −60, 0 and 60.)

After calculating Q_Buy and Q_Sell, the agent makes a decision on whether it buys or sells the futures stock according to the following decision rule:

[Decision Rule: To buy or sell the futures stock]
1. If Q_Buy > Q_Sell, the agent buys the futures stock,
2. Else if Q_Buy < Q_Sell, the agent sells the futures stock,
3. Otherwise, the agent's trade is the same as the decision at the previous time step.

Note that the agent does not make a decision for the first six time steps since there is not enough information to calculate an input vector for the first six time steps.
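The inference of Equations (7)–(8) together with the decision rule can be sketched as below; the exact trapezoidal/triangular shapes of the fuzzy sets N, Z and P are simplified guesses built around the ±60 breakpoints read from Fig. 4, and the helper names are ours.

```python
import numpy as np
from itertools import product

def mf_N(d):            # "negative" spot-futures difference
    return 1.0 if d <= -60 else max(0.0, -d / 60.0)

def mf_Z(d):            # difference "around zero"
    return max(0.0, 1.0 - abs(d) / 60.0)

def mf_P(d):            # "positive" spot-futures difference
    return 1.0 if d >= 60 else max(0.0, d / 60.0)

MFS = (mf_N, mf_Z, mf_P)
RULES = list(product(range(3), repeat=3))    # 3^3 = 27 antecedent combinations

def evaluate(x, q):
    """Q_Buy and Q_Sell of Eq. (7); q is a (27, 2) array of rule weights."""
    mu = np.array([MFS[a](x[0]) * MFS[b](x[1]) * MFS[c](x[2])    # Eq. (8)
                   for a, b, c in RULES])
    total = mu.sum()
    if total == 0.0:
        return 0.0, 0.0, mu
    q_buy, q_sell = (mu @ q) / total
    return q_buy, q_sell, mu

def decide(q_buy, q_sell, previous):
    """Decision rule: buy, sell, or repeat the previous decision on a tie."""
    if q_buy > q_sell:
        return "BUY"
    if q_buy < q_sell:
        return "SELL"
    return previous
```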
3.3 Decision Making on the Limit Price and the Quantity

In this paper, we assume that the agent tries to optimize the decision making only on whether it buys or sells the futures stock. Thus the fuzzy if-then rules in the fuzzy rule-based system have weights corresponding only to Buy and Sell. In the U-Mart virtual futures stock market, each agent must determine not only whether to buy or sell but also the limit price and the quantity of the futures stock. After determining whether to buy or sell the futures stock by the procedure described in the last subsection, the other information that is necessary for the trade order is determined as follows. First, the limit price P in (1) is determined as follows:
P = \begin{cases} f_m - 15, & \text{if the decision is Buy}, \\ f_m + 15, & \text{if the decision is Sell}, \end{cases}    (9)

where f_m is the latest futures price. That is, the limit price is determined based on the futures price at the previous time step. Then the quantity Q of the trade in (1) is determined according to Q_Buy and Q_Sell in (7) as follows:
Q = \begin{cases} Q_{Buy} \times 100, & \text{if the decision is Buy}, \\ Q_{Sell} \times 100, & \text{if the decision is Sell}. \end{cases}    (10)
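Equations (9) and (10) translate directly into a small helper; the offset of 15 and the factor of 100 are the values stated in the text.

```python
def order_terms(decision, latest_futures_price, q_buy, q_sell):
    """Limit price (Eq. 9) and quantity (Eq. 10) for the chosen action."""
    if decision == "BUY":
        return latest_futures_price - 15, q_buy * 100
    return latest_futures_price + 15, q_sell * 100
```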
3.4 On-Line Learning of Fuzzy If-Then Rules

In this subsection, we show how the weights q_{j1} and q_{j2}, j = 1, ..., N, of the fuzzy if-then rules in our fuzzy rule-based system are adjusted so that the agent can maximize its profit through the futures trading.
Let us assume that the agent has already made a decision on whether it buys or sells the futures index. At the next time step, the agent is given new information on the time series data of both the spot and the futures prices. In this paper, we evaluate the agent's decision from the latest prices s_{m+1} and f_{m+1}, and the evaluation values Q_Buy and Q_Sell that were calculated by (7) in the previous subsection, as follows:

[Evaluation Criterion]
1. If Q_Buy > Q_Sell and s_{m+1} > f_{m+1}, then the decision at the previous time step is evaluated as successful,
2. Else if Q_Buy < Q_Sell and s_{m+1} < f_{m+1}, then the decision at the previous time step is evaluated as successful,
3. Otherwise the decision at the previous time step is evaluated as unsuccessful.

That is, we evaluate the decision making based on the sign of the price difference between the spot price and the futures price. If the agent's decision is Buy, the evaluation for the decision is successful only if the spot price is higher than the futures price, since in this case it is expected that the agent can gain some profit at the final settlement. On the other hand, if Sell is chosen, the evaluation for the decision is successful only if the spot price is lower than the futures price. The evaluation described above is used for updating the weights of the fuzzy if-then rules. The main idea of the weight update is that the weights of those fuzzy if-then rules that contribute to successful decision making are increased, while we decrease the weights of those fuzzy if-then rules that are responsible for unsuccessful decision making. Thus the update rule of the weights is described as follows:

q_{jY}^{new} = \begin{cases} q_{jY}^{old} + \alpha \cdot (1 - q_{jY}) \cdot \mu_j(x), & \text{if successful}, \\ q_{jY}^{old} - \alpha \cdot q_{jY} \cdot \mu_j(x), & \text{otherwise}, \end{cases} \quad Y \in \{Buy, Sell\}, \; j = 1, \ldots, N,    (11)

where α is a positive learning rate and q_{jBuy} and q_{jSell} are the weight values of the j-th fuzzy if-then rule R_j that are associated with buying and selling the futures stock, respectively.
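A sketch of the evaluation criterion and the update rule (11); it rewards or penalizes only the weight column of the action actually taken, which is one reading of Eq. (11), and the learning rate α is an illustrative value.

```python
import numpy as np

def update_weights(q, mu, decision, spot_next, futures_next, alpha=0.1):
    """Adjust rule weights after one decision, following Eq. (11).

    q  : (N, 2) weights, column 0 for Buy and column 1 for Sell
    mu : (N,) compatibilities mu_j(x) of the last input vector."""
    successful = ((decision == "BUY" and spot_next > futures_next) or
                  (decision == "SELL" and spot_next < futures_next))
    col = 0 if decision == "BUY" else 1
    if successful:
        q[:, col] += alpha * (1.0 - q[:, col]) * mu
    else:
        q[:, col] -= alpha * q[:, col] * mu
    return q
```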
4 Performance Evaluation

4.1 Experimental Results

In this subsection, we examine the performance of our fuzzy rule-based system in the virtual futures market. In the computer simulations in this subsection, the virtual futures trade was iterated 53 times with different time series of spot prices. Before the trade, the initial weights of each fuzzy if-then rule R_j were specified as q_{j1} = 0.5 and q_{j2} = 0.3 for j = 1, 2, ..., N. Since the number of fuzzy sets is three for each axis (see Fig. 4) and the number of attributes is three, the total number of fuzzy rules is N = 3³ = 27. For comparison, we also examined the performance of a random strategy, which randomly decides whether it sells or buys the futures stock. In order to compare the fuzzy rule-based system with the random strategy in a fair manner, the quantity is specified as 100 and the limit price is specified as in (9). We show the simulation results in Fig. 5 for the fuzzy rule-based system and in Fig. 6 for the random strategy. From Fig. 5 and Fig. 6, we can see that the performance of the fuzzy rule-based system is better than that of the random strategy. The average profit of the fuzzy
rule-based system is higher than that of the random strategy. However, the variance of the profit for the fuzzy rule-based system during the course of the experiments is higher than that for the random strategy. This means that our fuzzy rule-based system does not constantly gain positive profit. This issue is left for future research.
Fig. 5. Simulation results for the proposed fuzzy rule-based system (total profit ×10⁸ against the number of trials, 1–53).
Fig. 6. Simulation results for the random strategy (total profit ×10⁸ against the number of trials, 1–53).
4.2 Competition Results
We took part in the U-Mart2001 competition, which was held in August 2001, and in the UMIE2002 international competition [9]. In the competitions, two types of on-line learning agents were submitted:
Type A: All the weights in the fuzzy rule-based system were initialized as q_jY = 0.5, j = 1, …, N, Y ∈ {Buy, Sell}.
Type B: The weights had already been learned before the competitions by applying the fuzzy rule-based system to the on-line learning agent in several preliminary experiments whose environmental conditions were similar to those in the competition.
Forty-two agents participated in the U-Mart2001 competition. The competition had two parts: qualification and final. In the qualification part, four types of time series data were used to evaluate the performance of each machine agent. For each time series, a prespecified number of the best agents were given a score. The scores were accumulated over the qualification trials, and about half of the agents with the highest accumulated scores qualified for the final competition. In October 2002, there was another U-Mart competition, in which we won first place in terms of total profit. Unfortunately, since detailed information on the final competition is not publicly available, we cannot describe the results.
5 Knowledge Extraction

5.1 Procedure of Knowledge Extraction
The learning U-Mart client in Section 3 can be used as a knowledge acquisition tool, since the adaptive fuzzy rule-based system in the U-Mart client can be seen as a knowledge base for the virtual futures trade. In this section, we examine this possibility through laboratory experiments. The knowledge extraction procedure consists of two phases: tuning the fuzzy if-then rules in the adaptive fuzzy rule-based system, and selecting a small number of fuzzy if-then rules with a large contrast between their consequent weights. In the following subsections, each phase of knowledge extraction is explained.
5.2 Tuning and Interpreting the Fuzzy Rule-Based System
First, the learning U-Mart client with our fuzzy rule-based system is iteratively applied to the virtual futures market. Since the learning client needs a number of iterations to learn the weights of the fuzzy if-then rules, we repeated the virtual futures trade several times. After the futures trade, those fuzzy if-then rules that are related to critical input states are expected to show a contrast between their weights for Buy and Sell. For example, the weight for Buy is larger than that for Sell in a fuzzy if-then rule if the U-Mart client has made many successful Buy decisions in situations compatible with the antecedent part of that rule. Such a fuzzy if-then rule is likely to suggest Buy in the corresponding situation. Conversely, if the U-Mart client has made many unsuccessful decisions in situations compatible with the antecedent part of a fuzzy if-then rule, the weight corresponding to that decision becomes smaller than the weight corresponding to the other decision. In this case, the suggestion of such a fuzzy if-then rule is not to perform the trade action (either Buy or Sell) with the smaller weight.
5.3 Selecting a Small Number of Fuzzy If-Then Rules
We examined the weights of each fuzzy if-then rule to select a small number of fuzzy if-then rules with a strong contrast between their weights for Buy and Sell. From the total of 27 (= 3³) fuzzy if-then rules, we manually selected five such rules. Table 1 shows the selected fuzzy if-then rules with a strong contrast in the weights. Note that these fuzzy if-then rules were selected manually and subjectively according to the difference in their weights. Although it would be possible to select a small number of fuzzy if-then rules systematically using a statistical technique, this is beyond the scope of this paper and will be investigated in our future research.

Table 1. Selected fuzzy if-then rules.
No.   x3   x2   x1   q_j1    q_j2
1     N    N    N    0.598   0.086
2     N    Z    P    0.778   0.318
3     N    P    N    0.546   0.870
4     P    Z    Z    0.573   0.800
5     P    P    P    0.924   0.141
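The selection in Table 1 was made by hand. Purely as an illustration of how a systematic variant might look, the sketch below keeps the rules whose Buy/Sell weights differ by more than a threshold; the threshold value and the function name are our assumptions, not part of the paper.

```python
def select_contrasting_rules(rules, threshold=0.4):
    """Keep fuzzy if-then rules with a strong contrast between weights.

    rules     : list of (antecedent, q_buy, q_sell) tuples
    threshold : minimum |q_buy - q_sell| for a rule to be selected
                (0.4 is an arbitrary illustrative value)
    """
    return [r for r in rules if abs(r[1] - r[2]) >= threshold]
```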
5.4 Experiments with Human Users
In this subsection, we report experiments in which human users were provided with a small number of the selected fuzzy if-then rules while participating in the virtual futures trade. In order to make the selected fuzzy if-then rules more understandable, we visualized them as shown in Fig. 7. Figs. 7(a)–(e) correspond to the selected rules No. 1–No. 5 in Table 1, respectively. In the visualization, the antecedent linguistic values (N, Z, and P) are interpreted as the relative position between the present spot price s_t and the previous spot prices s_{t−3}, s_{t−2}, and s_{t−1}, and the recommended action is determined from the more extreme of the two consequent weights (a value near 0 means the corresponding action is not recommended, and a value near 1 means it is recommended).
In our experiments, six human users each participated in the virtual futures trade twice: once with the selected fuzzy if-then rules presented and once without. For three of the human users, the first experiment was done with the presentation of the selected rules and the second without; for the other three, the order was reversed. This was done to minimize the effect of ordering in the experiments: the bias of presenting the selected fuzzy if-then rules in the first experiment for the first three users is offset by presenting them in the second experiment for the other three.
We show the experimental results in Table 2, which lists the remaining assets after the final settlement. From Table 2, we can see that almost all the human users performed better with the presentation of the selected fuzzy if-then rules than without it. Thus we can expect that our learning client could serve as a human decision support tool. In order to confirm this observation statistically, we performed the Wilcoxon rank-sum test, a nonparametric counterpart of the two-sample t-test based solely on the order in which the observations from the two samples fall.
Fig. 7. Visualization of the selected fuzzy if-then rules. (Panels (a)–(e) correspond to rules No. 1–No. 5 in Table 1: (a) N, N, N — Not Sell; (b) N, Z, P — Buy; (c) N, P, N — Sell; (d) P, Z, Z — Sell; (e) P, P, P — Buy. Each panel shows the spot prices at t−3, t−2, t−1 and t relative to a ±25 band around Price(t).)

Table 2. Experimental results (remaining assets after the final settlement; * shows the human user went bankrupt).
Human ID   Without presentation   With presentation
A          258,687,000            1,449,355,000
B          878,280,000            2,093,095,000
C          961,393,000            2,335,901,000
D          *                      1,675,616,000
E          926,983,000            1,220,221,000
F          740,654,000            508,363,000
In the Wilcoxon test, we order the results of the human users in descending order of their final remaining assets in Table 2; each user's rank is recorded, and the rank sum serves as the statistic for a one-sided test. Testing the null hypothesis (there is no difference between the results with and without the presentation of the selected rules) against the alternative hypothesis (there is a difference), the null hypothesis is rejected at the 0.05 level. Thus, we can say statistically that the human users perform better with the help of the selected fuzzy if-then rules.
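For readers who want to reproduce the test, the sketch below runs the one-sided Wilcoxon rank-sum (Mann-Whitney U) test on the Table 2 figures with SciPy. The row pairing follows the table as reconstructed above, and scoring the bankrupt run (*) as 0 is our own convention, so the exact p-value should be treated as indicative only.

```python
from scipy.stats import mannwhitneyu  # Wilcoxon rank-sum / Mann-Whitney U test

# Final assets from Table 2; the bankrupt run (*) is scored as 0 here,
# which is our convention and not stated in the paper.
without_rules = [258_687_000, 878_280_000, 961_393_000, 0,
                 926_983_000, 740_654_000]
with_rules = [1_449_355_000, 2_093_095_000, 2_335_901_000,
              1_675_616_000, 1_220_221_000, 508_363_000]

# One-sided alternative: assets with the presented rules tend to be larger.
stat, p_value = mannwhitneyu(with_rules, without_rules, alternative="greater")
print(f"U = {stat}, one-sided p = {p_value:.3f}")
```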
6 Conclusions
In this paper, we developed an on-line learning agent for a futures market. A fuzzy rule-based system was used to determine whether the agent should buy or sell a particular futures stock based on the time series of both the spot and the futures prices. The fuzzy rule-based system was fine-tuned so that the amount of profit is maximized: the weights of those fuzzy if-then rules that contribute to successful decision making are increased, and the weights of those fuzzy if-then rules that are responsible for unsuccessful decision making are decreased. The developed agent was applied to a virtual futures market where multiple machine agents and human agents simultaneously make decisions on the futures trade. Comparison with a random strategy showed that the proposed fuzzy rule-based system performs better than the random strategy. We also reported the results of the competitions, in which any trading agent is allowed to participate.
Another topic of this paper is the secondary use of the fuzzy rules in the fuzzy rule-based system as a support for human users. From the trained fuzzy rule-based system, a small number of fuzzy if-then rules are selected. The selected fuzzy if-then rules are visualized so that human users can easily understand their meaning. A statistical test shows that human users perform significantly better with the support of the selected fuzzy if-then rules than without it.
One of the future works for the trading agent is to implement an adaptive determination of the limit price and quantity. This could also be done by fuzzy inference, but we did not consider it here since we wanted first to evaluate the performance of the agent with only a simple decision making mechanism. The analysis of the learned agent by inspecting the weights of the fuzzy if-then rules in the fuzzy rule-based system is also an interesting topic. Other future research includes the automatic determination of the shape and number of fuzzy sets and an investigation of how to establish stable trading in which the fuzzy rule-based system constantly gains a profit in arbitrary situations.
References
1. Yen J (1999) Fuzzy Logic – A Modern Perspective. IEEE Trans. on Knowledge and Data Engineering 11(1):153–165
2. Sugeno M (1985) An Introductory Survey of Fuzzy Control. Information Science 30(1/2):59–83
3. Lee C (1990) Fuzzy Logic in Control Systems: Fuzzy Logic Controller Part I and Part II. IEEE Trans. Systems, Man, and Cybernetics 20:404–435
4. Goldberg D E (1989) Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA
5. Ishibuchi H, Nozaki K, Tanaka H (1992) Distributed Representation of Fuzzy Rules and its Application to Pattern Classification. Fuzzy Sets and Systems 52(1):21–32
6. Abe S, Lan M-S (1995) A Method for Fuzzy Rules Extraction Directly from Numerical Data and its Application to Pattern Classification. IEEE Trans. on Fuzzy Systems 3(1):18–28
7. Nozaki K, Ishibuchi H, Tanaka H (1996) Adaptive Fuzzy Rule-Based Classification Systems. IEEE Trans. on Fuzzy Systems 4(3):238–250
8. Ishibuchi H, Nakashima T (2001) Effect of Rule Weights in Fuzzy Rule-Based Classification Systems. IEEE Trans. on Fuzzy Systems 9(4):505–515
9. U-MART International Experiment 2002 in CASOS2002, http://www.u-mart.econ.kyoto-u.ac.jp/umie2002/index.html
CHAPTER 20 An Algorithm for Determining the Controllers of Supervised Entities: A Case Study with the Brazilian Central Bank Vinícius Guilherme Fracari Branco, Li Weigang and Maria Pilar Estrela Abad 1 Deorf/Copec of the Brazilian Central Bank, Brasilia – DF, Brazil
[email protected] 2 Department of Computer Science, University of Brasilia, C.P. 4466, CEP: 70919-970, Brasilia – DF, Brazil
[email protected] 3 Politec Informática Ltda., Brasilia – DF, Brazil
[email protected]
Abstract: An algorithm was developed and implemented to find the controllers of the stock shares of financial institutions for the Brazilian Central Bank (BCB). The original problem is similar to a typical Sum of Subset problem, which can be solved by a backtracking algorithm and is NP-complete. Usually BCB solves this problem manually, which is time consuming and prone to errors. The heuristic approximation algorithm presented in this paper has polynomial complexity O(n³) and is based on subroutines for determining controllers at the first two levels. The paper describes the basic concepts and business rules currently employed at BCB, our algorithm and its major subroutines. It also gives a brief complexity analysis and an example at level 2. Our experimental results indicate the feasibility of automating the process of finding controllers. Though developed for BCB, our algorithm works equally well for other financial institutions. Keywords: Brazilian Central Bank, heuristic approximation algorithm, Sum of Subset.
1 Introduction
Although computer hardware development has increased processing speed enormously over the last few years, there are still many real problems which cannot be solved optimally in a reasonable time on modern computers, as computation theory has shown. Finding the controllers of an institution or company from information about stock shares involving intermediate entities is one example of such a problem. According to the rules of the Brazilian Central Bank (BCB), all financial institutions in Brazil, such as banks, investment and financial companies, credit societies, etc., should inform BCB about each ownership change. These institutions are called Supervised Entities (SE). After each stock movement, BCB needs to recalculate the potential controllers of the SE, which
are defined as the possible combinations of shareholders who have the right to vote directly and who together hold 51% or more of the stock shares of the SE. Determining the potential controllers is important: for example, if an SE goes bankrupt unexpectedly, BCB needs to know its main shareholders in order to freeze their assets immediately for further judicial proceedings. In 2001 alone, there were 1100 financial institutions in Brazil and at least 500 SEs informed BCB about changes to their ownership; this meant that BCB had to determine the controllers of these institutions about 500 times during 2001. Usually the ownership of an SE is complicated because of the transitive nature of control through intermediate shareholders: SE A may hold stocks of SE B, and SE B may be the main controller of SE C. BCB divides the distribution of stock shares into levels; in this case, SE C is at level 1, SE B at level 2 and SE A at level 3.
On the other hand, determining the controllers is a difficult problem. In terms of mathematics, the problem is known as Sum of Subset, and the best known method to solve it optimally is a backtracking algorithm with complexity O(p(n)·2ⁿ) or O(p(n)·n!), where p(n) is an nth-order polynomial [2, 3, 4]. Here n corresponds to the number of individual shareholders. This means that it is close to impossible to determine the controllers from a huge number of stock shares for a big SE. To solve the Sum of Subset problem, there have been various theoretical and empirical studies [5, 6, 7]. In the BCB case, the possible solution space shrinks quickly because of the large number of shareholders with small holdings. This makes it possible to develop an approximation algorithm with polynomial complexity.
Even though all other management tasks of BCB are processed automatically, the determination of controllers is still carried out manually. The main disadvantages of manually searching for controllers are that it is time consuming and prone to human error. For example, finding the controllers of a financial institution involving three levels of SEs can take two bank officers a full day, and sometimes they end up with a wrong result. Recently, a project to determine the controllers automatically was launched by BCB. Its intention was to develop an algorithm and implement it in a new system, in order to establish the feasibility of automating the process of finding controllers. Though developed for BCB, the algorithm naturally works equally well for other financial institutions with the same kind of application.
This paper describes the basic concepts and business rules currently employed at BCB that have to be upheld by the algorithm, the subroutines for determining controllers at the first two levels (levels 1 and 2), the main algorithm, a brief complexity analysis, and the application of the algorithm at the second level (for applications at the first level, see [8]).
2 Concepts, Business Rules and Example Description
The basic concepts and business rules for dealing with controllers have been established by BCB. In the following, through some simple examples, we describe the problem of controller calculation at two levels. Understanding them is useful for developing an efficient model to solve the problem [1].
2.1 Basic Concepts
A Supervised Entity (SE) is a company which is registered at BCB. At any moment, BCB would like to know who the controllers of that SE are, based on the ownership of the stock shares.
To effect the calculation of the controllers, only the stocks or quotas held by shareholders with the right to vote are considered.
Types of shareholder controllers:
a) Natural Person (PF) – a legal person represented by a single shareholder;
b) Artificial Person (AP) – a legal person represented by a group of shareholders;
c) Individual Artificial Person (IAP, in Portuguese PJDD) – a legal entity whose specific owners are not clearly defined. The IAPs can be: cooperatives, associations and foundations, investment funds, private pension entities, public companies headquartered abroad, and financial institutions headquartered abroad, except upon solicitation of BCB.
An AP needs to be divided into PFs and/or IAPs, which are the only entities to be considered as controllers. If an AP is found at a certain level, it indicates that there are upper levels of controllers at this SE. The corporate control of an SE is considered definite when its final control is exercised by one, and only one, natural person (PF) or Individual Artificial Person (IAP), and indefinite when its final control is exercised by more than one legal person.
Levels of participation of the controllers in a Supervised Entity (SE):
a) The first level: an SE is considered to be at the first level when it contains only PFs and IAPs in the total composition of the stocks or quotas held by those with the right to vote.
b) The second, third and higher levels: if an SE has some APs in its composition of voting shares, and any such AP can be divided into PFs and/or IAPs, then this entity has more than one level of participation of the controllers.
In our research, the algorithm was developed with the highest level being 3. This paper concentrates on the first two levels.
2.2 Business Rules
The business rules were established by BCB for a variety of situations. The following describes the rules for stock shares at levels 1 and 2.

At level 1: the shareholders with the right to vote are considered the potential controllers in this work. If there is a large shareholder of an SE who holds more than fifty percent of the stock shares, he is already considered the unique controller of the company. If this condition does not hold, the calculation to find the controllers uses the following rules (a brute-force sketch of these two rules is given at the end of this subsection):
a) assessing all the possible combinations of shareholders: if the total of the stocks from such a combination is 50% plus one of the voting shares, they are considered the effective controllers of the SE;
b) considering the combinations chosen in item a), if we remove a shareholder with a certain percentage of stocks and the total of the remaining stocks is still larger than 50%, this removed shareholder will not be considered a controller of the SE.

At level 2:
a) if a shareholder at level 2 is a unique controller of an AP, he is immediately considered as the unique shareholder PF of that AP at level 1 and, if he also possesses other shares at level 1, all his shares will be aggregated for calculating the controllers of the SE;
b) if there is no unique controller at level 2, all the controllers of this level are possible controllers. Therefore, no PF in an AP can sum his share within other APs and his share at level 1;
c) at level 1, if two or more APs are controlled by the same group of PFs, the shares of these APs will be summed together at level 1;
d) at level 1, if the APs are controlled by different groups of PFs at level 2, no stock shares of the same PF appearing within several APs and at level 1 can be summed together.
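To make rules a) and b) at level 1 concrete, the following brute-force sketch enumerates all share combinations and keeps the shareholders whose removal would break the majority. It is exponential in the number of shareholders — exactly the cost the algorithm of Section 3 is designed to avoid — and the names and the dictionary interface are our own.

```python
from itertools import combinations

def level1_controllers(shares, majority=50.0):
    """Brute-force reading of business rules a) and b) at level 1.

    shares   : dict shareholder -> percentage of voting shares
    majority : control threshold (the combination needs more than 50%)
    """
    names = list(shares)
    controllers = set()
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            total = sum(shares[n] for n in combo)
            if total <= majority:                  # rule a): needs 50% plus one
                continue
            for n in combo:                        # rule b): a member counts only if
                if total - shares[n] <= majority:  # removing it breaks the majority
                    controllers.add(n)
    return controllers
```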
2.3 Example Description
Suppose a Supervised Entity (SE), called BM Bank, with four stockholders: BM Automovies (AP1), Nelson da Silva (PF1), BM Insurance (AP2) and Andre de Barros (PF3); see Fig. 1. For BM Automovies (AP1), there are three stockholders: Nelson da Silva (PF1), Anderson Ferrnando (PF2) and Marcelo Dantas (PF4). For BM Insurance (AP2), there are also three stockholders: Nelson da Silva (PF1), Anderson Ferrnando (PF2) and Marcelo Dantas (PF4). All PFs in APs are considered to be at level 2. The first step of the analysis is to transfer the APs to PFs and combine them into level 1. Then we have to find the controllers of the SE at level 1 (with only PF stockholders). In Section 5, we present the solution for this BM Bank example of Figure 1 using our algorithm.
3 Algorithm
To determine the controllers of an SE, three main subroutines (which we have developed) are needed: a subroutine for determining controllers at level 1 (sdcl1), a subroutine for determining controllers at level 2 (sdcl2) and a subroutine for determining the possible combinations of the stock shares to form groups of controllers (sdmlc). Sdmlc generates as many combinations of the stock shares as possible to form groups of controllers, although it has not been proved that it finds all combinations. For every new combination, sdcl1 is called within sdmlc to determine the controllers. When an AP holds part of the stock, sdcl2 is used to analyze the stock shares of that AP and combine them into level 1 as PFs. Figure 2 shows the general procedure for determining the controllers using these subroutines.
Fig. 1 The distribution of BM Bank stock shares
Fig. 2 The general procedure to determine the controllers
3.1 The subroutine for determining controllers at level 1 (sdcl1)
The task of this subroutine is to determine the controllers from a group of stock shares of an SE when there are no APs. The stock share of every shareholder is represented as a percentage, and the percentages total 100%. The process of the algorithm is described in the following steps; a simplified code sketch of the first steps is given after the list.
1) Rearrange the stock share percentages of the shareholders in decreasing order: A(1) > A(2) > … > A(k) > … > A(n), where k = 1, 2, …, n.
2) Test the first shareholder. If his/her shares equal 50%, all shareholders of the SE are controllers and the process finishes. If his/her shares are larger than 50%, this shareholder is the only controller of the SE and the process finishes.
3) Create a vector Sum(k) by adding the stock shares one by one from A(1), A(2), … until the A(k) with which Sum(k) reaches 50% or higher. If Sum(k) equals 50%, all of the shareholders are controllers, and the program stops. If it is higher than 50%, all the shareholders whose stock is in Sum(k) are controllers, k = 1, 2, …, s.
4) Calculate the key variable Difference = 50% − Sum(s). Test the stock shares from k = s+1: if the stock share is larger than Difference, the shareholder is a controller. Suppose this test stops at A(d), s < d < n.
5) Create a vector Remainder_Sum(k) which equals the sum of all remaining stock shares from k to n, k = 1, 2, …, n. For k = d+1, do the following tests. If Remainder_Sum(k) is smaller than Difference, stop the program; if it equals Difference, all the remaining shareholders are controllers; if it is larger than Difference, the shareholder of A(k) is a controller. At the same time, a vector is generated: V1(p) = V1(p) + A(k), k from d+1, V1(0) = 0; if V1(p) > Difference, V1(p) = V1(p) − A(k). Suppose this test stops at A(h), d < h < n, so p = 0, 1, 2, …, h − d.
6) There may still be some additional possible controllers. The test now begins from k = h+1. If Remainder_Sum(k) + V1(p) = Difference, then all the remaining shareholders are controllers and the program stops. If Remainder_Sum(k) + V1(p) > Difference, p = 0, 1, …, h − d, the shareholder of A(k) is a controller. Another vector is also generated: V2(q) = V2(q) + A(k), k from h+1, V2(0) = 0; if V2(q) > Difference, V2(q) = V2(q) − A(k). Suppose this test stops at A(z), h < z < n, so q = 0, 1, 2, …, z − h.
7) If Remainder_Sum(k) + V1(p) < Difference, p = 0, 1, …, h, then if Remainder_Sum(k) + V1(p) + V2(q) > Difference and V1(p) + V2(q) < Difference, the shareholder of A(k) is a controller. Suppose this test stops at A(z), h < z < n, so for V2(q), q = 0, 1, 2, …, z − h. If Remainder_Sum(k) + V1(p) + V2(q) < Difference, there are no more controllers in this Supervised Entity; stop. If z = n, all of the shareholders are controllers; then stop the program.
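The first four steps of sdcl1 are easy to state in code. The sketch below follows our reading of the paper's indexing (in the example of Section 5, Sum(s) = 48% and Difference = 2%). The refinements of steps 5–7, which need the Remainder_Sum, V1 and V2 bookkeeping, are deliberately not reproduced, so this is an illustration of the idea rather than the full subroutine.

```python
def sdcl1_core(shares):
    """Steps 1-4 of sdcl1 on the voting-share percentages of one SE.

    shares is assumed to sum to 100.  Returns the indices (in
    decreasing-share order) accepted as controllers by steps 1-4 and the
    key variable Difference used by the later steps.
    """
    a = sorted(shares, reverse=True)            # step 1: decreasing order
    n = len(a)
    if a[0] == 50.0:                            # step 2: exactly half the votes
        return list(range(n)), 0.0              #   -> every shareholder controls
    if a[0] > 50.0:
        return [0], 0.0                         #   -> unique controller
    partial, s = 0.0, 0
    while s < n and partial + a[s] < 50.0:      # step 3: largest s with Sum(s) < 50%
        partial += a[s]
        s += 1
    if s == n or partial + a[s] == 50.0:        # sum reaches exactly 50% (or never does)
        return list(range(n)), 0.0              #   -> all shareholders control
    controllers = list(range(s + 1))            # everyone inside the crossing sum
    difference = 50.0 - partial                 # step 4: Difference = 50% - Sum(s)
    k = s + 1
    while k < n and a[k] > difference:          # step 4: shares larger than Difference
        controllers.append(k)                   #   are controllers; stops at A(d)
        k += 1
    return controllers, difference
```

On the 21 shares of Table 3, for example, this yields eleven controllers and Difference = 2.0%, consistent with the values s = 3 and d = 11 quoted in Section 5.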
3.2 The subroutine for determining the possible combinations (sdmlc)
The subroutine for determining the controllers at level 1 (sdcl1) was developed to find controllers from one combination of the shareholders. To find more controllers, almost all of the combinations should be considered. Sdmlc is designed to combine almost all possible shareholders to find the possible groups of controllers. Suppose I is an upper-level loop that alternates the combinations of stock shares and J is a lower-level loop that alternates the combinations of stock shares. I × J is the total number of loops used to generate most combinations of shareholders A(k) for finding possible controllers with sdcl1.
1) Rearrange the stock share percentages of the shareholders in decreasing order: A(1) > A(2) > … > A(k) > … > A(n), where k = 1, 2, …, n.
2) Test the first shareholder. If his/her shares equal 50%, all shareholders of the SE are controllers and the process finishes. If his/her shares are larger than 50%, this shareholder is the only controller of the SE and the process finishes. In both cases, I equals 1 and J equals 1.
3) Define the value of I. Create a variable Sum and add the stock share percentages one by one from A(1) to Sum until it reaches 50%. If Sum equals 50%, all of the shareholders are controllers, and the process finishes. If it just exceeds 50%, all these shareholders A(k), k = 1, 2, …, s, are controllers, and I = k − 1.
4) Create a vector Remainder_Sum(k) which equals the sum of all the remaining stock share percentages, computed for k = n−1, n−2, …, 1.
5) For every i = 1, 2, …, I (I > 1), define J(i). If i = 1 and j = 1, call the subroutine for determining the controllers at level 1, sdcl1(), directly to find controllers:
– if i = 1, call sdcl1();
– if i = 2, set the initial value j = 0; while Remainder_Sum((i−1)+(j+1)) − A((i−1)+(j+1)) ≥ 50%, do j = j + 1, temporarily eliminate A(i−1), …, A(j+1) to form a new group of stock shares and call sdcl1();
– if i = 3, set the initial value j = 0; while A(1) + Remainder_Sum((i−1)+(j+1)) − A((i−1)+(j+1)) ≥ 50%, do j = j + 1, temporarily eliminate A(i−1), …, A(j+1) to form a new group of stock shares and call sdcl1();
– if i > 3, set the initial value j = 0; while A(1) + … + A((i−1)−1) + Remainder_Sum((i−1)+(j+1)) − A((i−1)+(j+1)) ≥ 50%, do j = j + 1, temporarily eliminate A(i−1), …, A(j+1) to form a new group of stock shares and call sdcl1().
3.3 The subroutine for determining controllers at level 2 (sdcl2)
This subroutine was developed to extend the function of sdcl1 when there are some APs in an SE. If the n elements of the stock shares of the SE, A(k), k = 1, …, n, are considered to be at level 1, the m elements of an AP of the SE, Ap(k,i), i = 1, …, m, are at level 2. In this paper, all Ap(·,i) are PFs, i.e., the algorithm covers up to level 2. The main objective of sdcl2 is to analyze the elements at level 2 and combine them into level 1 (a code sketch of the substitution step follows the list).
1) Read all elements A(k) of an SE, where k = 1, …, n: if A(k) is an AP, go to step 2; if there is no AP, all elements are PFs and there is no level 2, so apply sdmlc/sdcl1 to A(·).
2) For every AP, create a matrix Ap(k,i), i = 1, …, m, where k corresponds to the k of A(k) and m is the total number of PFs in A(k). All of the elements of Ap(k,·) are at level 2. Suppose K is the total number of APs in A(·).
2.1) For all k ≤ K, apply sdmlc/sdcl1 to Ap(k,i), i = 1, …, m: if some Ap(k,i) is the unique controller of this AP, use this PF, Ap(k,i), to substitute A(k). Read A(k), k = 1, …, n; if the same PF as Ap(k,i) appears elsewhere in A(·), sum it into A(k) and eliminate it from A(·). If there is no unique controller of Ap(k,·), create a vector Apc(k,j), j = 1, …, p, for all k ≤ K, where p is the number of controllers of Ap(k,·).
2.2) Compare Apc(k,j), j = 1, …, p, with Apc(k′,j), j = 1, …, p, for all k, k′ ≤ K and k ≠ k′; if there are APs with the same PFs, sum the shares of Apc(k,·) and Apc(k′,·) and eliminate Apc(k′,·).
2.3) After the possible eliminations, a new vector A′(·) is formed, where k = 1, …, n′ and n′ < n. Apply sdmlc/sdcl1 to A′(·) and obtain a new vector A_controller(k″), k″ = 1, …, n″, n″ < n′, with K″ APs in A_controller(·).
3) In A_controller(·), there may still be some APs, but all of them are combined into level 1. The following steps substitute the APs by their PFs.
3.1) For all k″ ≤ K″, use the elements of Apc(k″,·) to substitute A_controller(k″), where Apc(k″,·) contains all PFs of the AP of k″.
3.2) In A_controller(·), all elements are now PFs, but there may still be some repeated PFs. For k″ = 1, …, n″, eliminate the repeated A_controller(k″) and put the new elements in the vector A_controller_final(·).
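The core of sdcl2 is the substitution of an AP by its unique level-2 controller and the aggregation of that PF's holdings at level 1 (step 2.1). The sketch below shows only that step, with our own names and a generic find_controllers callback standing in for sdmlc/sdcl1; the merging of APs with identical controller groups (step 2.2) and the final substitution (steps 3–3.2) are omitted.

```python
def fold_level2_into_level1(level1, level2, find_controllers):
    """Substitution step 2.1 of sdcl2.

    level1 : dict holder -> share % of the SE (holders may be PFs or APs)
    level2 : dict AP -> dict PF -> share % inside that AP
    find_controllers : callable applying sdmlc/sdcl1 to a share dict and
                       returning the set of controllers
    """
    folded = dict(level1)
    for ap, pf_shares in level2.items():
        ctrl = find_controllers(pf_shares)
        if len(ctrl) == 1:                            # a unique controller of this AP
            pf = next(iter(ctrl))
            stake = folded.pop(ap)                    # the PF replaces the AP at level 1 ...
            folded[pf] = folded.get(pf, 0.0) + stake  # ... and absorbs any direct holding
    return folded
```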
4 Complexity Analysis
Determining the controllers from their stock shares is similar to the combinatorial problem Sum of Subset. A backtracking algorithm was developed with a search space that includes all of the possible solutions [2, 3]. Unfortunately, this algorithm, in the worst case, has complexity O(p(n)·2ⁿ) or O(p(n)·n!), where p(n) is an nth-order polynomial [2, 3]. Therefore, this algorithm is only of limited value for the practical problems of BCB. The developed subroutines (sdcl1, sdcl2 and sdmlc) take advantage of the features of the actual problem instance. To decrease the number of combinations of stock shareholders, some criteria are established that reduce the dimensions of the search space significantly compared to the backtracking algorithm. For example, in sdcl1, if Remainder_Sum(k) + V1(p) + V2(q) < Difference, then A(k) is not a controller and there are no more controllers in this group of stock shares. In sdmlc, if A(1) + … + A((i−1)−1) + Remainder_Sum((i−1)+(j+1)) − A((i−1)+(j+1)) < 50%, then there are no more potential controllers.

Complexity analysis of sdcl1. The complexity analysis of the main steps of sdcl1 is illustrated in Table 1. From steps 6 to 8, O(p × q × (z−h)) < O(h × z × (z−h)) < O(z³ − h³) < O((z−h)³). After simplifying the results in Table 1, the complexity of sdcl1 is max(O(n log n), O((z−h)³)), where 1 < h < z < n.

Table 1. Complexity analysis for sdcl1
Step   Method                          Complexity              Condition
1      merge sort                      O(n log n)
3      while iterative                 O(s)                    1 < s < n
4      while iterative                 O(d − s)                s < d < n
5      for iterative, while iterative  O(n), O(h − d)          d < h < n
6–8    while and 2 for iterative       O(p × q × (z − h))      h < z < n
Complexity analysis of sdmlc. The complexity analysis of the main steps of sdmlc is shown in Table 2. From step 5, the complexity of sdmlc is max(O(I × J × n log n), O(I × J × (z−h)³)), where 1 < I < d < h < z < n and 1 < J < n. I and J are usually given parameters. Since max(O(I × J × n log n), O(I × J × (z−h)³)) < O(n³), the complexity of sdmlc is O(n³).
Complexity analysis of sdcl2. The main operations of sdcl2 are to call sdcl1 and sdmlc for some APs of an SE; other operations involve the necessary comparisons, etc. Basically, the complexity of sdcl2 is similar to that of sdmlc.
Table 2. Complexity analysis of sdmlc
Step   Method                                      Complexity                                      Condition
1      merge sort                                  O(n log n)
3      while iterative                             O(I)                                            1 < I < d < h < z < n
4      for iterative                               O(n)
5      for iterative, while iterative, sdcl1()     max(O(I × J × n log n), O(I × J × (z − h)³))    1 < J < n
5 Examples
Table 3 shows an example of the application of sdcl1. There are 21 stock shares for this SE. In this example, I = 4. In the loop with i = 1 and j = 1, n = 21, Difference = 2.0%, s = 3, d = 11, h = 18, z = 21. Since z = n, all of the shareholders are controllers and the program stops [8].
Fig. 3 The illustration of the application of the algorithm: sdcl2. (The figure traces the BM Bank example of Fig. 1 through the steps of sdcl2: the level-2 shares Ap(1,·) and Ap(2,·) with their accumulated and inverse accumulated sums, the merging of the controller vectors Apc(1,·) and Apc(2,·) in Step 2.2, the elimination of repeated APs, and Steps 3, 3.1 and 3.2, which yield the final stock controllers PF1, PF2, PF3 and PF4.)
Table 3. Illustration of the application of algorithm: sdcl1 and sdmlc
k        Stock shares %   Sum %   Remainder_Sum %
1        22               22      100
2        13               35      78
3 (s)    13               48      65
4        10               58      52
5        8                66      42
6        7                73      34
7        4.5              77.5    27
8        3.3              80.8    22.5
9        2.9              83.7    19.2
10       2.8              86.5    16.3
11 (d)   2.1              88.6    13.5
12       1.8              90.4    11.4
13       1.8              92.2    9.6
14       1.75             94      7.8
15       1.75             95.7    6.5
16       1.1              96.8    4.3
17       0.75             97.6    3.2
18 (h)   0.75             98.3    2.45
19       0.7              99      1.7
20       0.7              99.7    1.0
21 (z)   0.3              100     0.3
In Figure 3, sdcl2 was used to find the controllers of an SE, BM Bank, as in Fig. 1. In this example, there are two APs (BM Automovies and BM Insurance) and two other PFs that hold stocks of BM Bank. The process of finding the stock controllers of the bank is described in Figure 3. The final stock controllers of BM Bank are PF1, PF2, PF3 and PF4. This example is aimed simply at showing the complexities of real instances of our problem.
6 Conclusions
An algorithm for determining the stock controllers of financial institutions for the Brazilian Central Bank (BCB) was developed and implemented. Initial analysis shows that the proposed algorithm has polynomial complexity O(n³). It can be used to find almost all possible controllers of an SE, though how close it comes to a complete solution has to be evaluated further. The paper described the three subroutines dealing with the first two levels; in real situations, determining the controllers through stock shares of an SE may sometimes require up to 10 levels. The algorithm for the first two levels can basically be adapted to higher levels, but BCB asked us to report only the first two levels in this paper.
The developed algorithm was implemented by Politec Informática Ltda, Brazil, and is currently employed at Unicad/BCB. The implementation language was Natural. All the code runs in the computing environment of Unicad (National Unique Registration) of Brazil and is integrated with the database of BCB. The experimental results indicate the feasibility of automating the process of finding controllers using our approximation algorithm instead of the manual operation: what used to take two agents a full day of work has been reduced to minutes of computation. Though developed for BCB, the algorithm is expected to work equally well for other financial institutions which are controlled by stockholders. Future work will involve methods of Artificial Intelligence and mathematical models to improve efficiency and reduce human involvement in the process even further.
References
1. Banco Central do Brasil: Regras para se calcular o controlador de uma entidade supervisionada pelo Banco Central do Brasil (2001)
2. Cormen, T. H., Leiserson, C. E., Rivest, R. L.: Introduction to Algorithms, The MIT Press, Cambridge, USA (1996)
3. Horowitz, E., Sahni, S., Rajasekaran, S.: Computer Algorithms, Computer Science Press, New York, USA (1998)
4. Garey, M. R., Johnson, D. S.: Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company (1979)
5. Brickell, E. F.: Solving low density knapsacks, Advances in Cryptology, Proceedings of Crypto '83, Plenum Press, New York (1984), 25–37
6. Lagarias, J. C., Odlyzko, A. M.: Solving low-density subset sum problems, J. Assoc. Comp. Mach. 32(1) (1985), 229–246
7. Radziszowski, S., Kreher, D.: Solving subset sum problems with the L3 algorithm, J. Combin. Math. Combin. Comput. 3 (1988), 49–63
8. Branco, V. G. F., Weigang, L., Ribeiro, A. F., Shibata, W., Abad, M. P. E., Torres, N. V., Dib, M. V. P., de Andrade, V. M.: An Algorithm for Determining the Controllers of Supervised Entities at the First Level: Brazilian Central Bank Case Study, Proc. of FSKD'02, 1st International Conference on Fuzzy Systems and Knowledge Discovery, pp. 240–244, Singapore (2002)
CHAPTER 21 Fuzzy Congestion Control In Packet Networks Tapio Frantti1 Technical Research Center of Finland, Kaitoväylä 1, PL 1100, 90571, Oulu, Finland
[email protected]
Abstract: This paper describes an adaptive fuzzy congestion control solution for active queue management in packet switched networks which use TCP for end-to-end flow control. It supports end-to-end control of packet data traffic between intermediate nodes, such as routers, in the network. In the presented solution, a fuzzy controller with an adaptive fuzzy membership function generation and tuning procedure and a multilevel hierarchical decision making method regulates the rejection probability of received packets at intermediate nodes, to prevent buffer overflow and the rejection of all transmitted packets during congestion. In the simulations we compared a fuzzy controller with a fixed shape of membership functions against the fuzzy controller with adaptive membership functions. The membership functions were updated according to the incoming traffic in order to keep the equation-form rule base as simple as possible. The simulation results showed that the developed adaptive control algorithm decreased the packet loss rate with a lower occupation of the buffer and increased the link utilisation by decreasing the retransmission rate more effectively.
Keywords: active queue management, congestion control, fuzzy control
1 Introduction
Communication networks like the Internet provide unreliable, connectionless packet delivery at the lowest levels of communication. Therefore, packets can be lost or destroyed due to transmission errors, network hardware failures, or too heavy traffic loads. Moreover, these networks can deliver packets out of order, with substantial delay, or deliver duplicates. A network may also dictate an optimal packet size or pose other constraints. The Internet also has two independent flow problems: internet protocols need end-to-end flow control, and they need a mechanism for intermediate nodes (like routers) to control the amount of traffic (congestion control). Congestion is a condition of severe delay caused by an overload of datagrams at routers. Usually congestion can arise from two different causes: a high-speed computer may generate traffic faster than a network can transfer it, or many computers may send datagrams simultaneously through a single router. Hence, if a router has insufficient memory to hold the received datagrams, congestion follows. Adding memory may temporarily help, but it has been pointed out (see [9]) that an infinite amount of
memory will make congestion even worse. This is due to the fact that datagrams have already timed out before getting out of the queue, leading to duplicates. The datagrams will also be forwarded to the next intermediate node, thus increasing the load all the way to the destination. Slow processors on the routing devices can also lead to congestion: the bookkeeping tasks of routers (queueing buffers, updating tables) cause queues if the line capacity exceeds the processing capacity (a mismatch between the parts of the system). This problem will persist until all the parts are in balance, but such equal capacity makes gradual updates of the network difficult, since all parts of the system would have to be renewed at the same time. Moreover, congestion feeds itself and becomes worse. With a lack of buffer space, a router has to discard arriving datagrams, leading to retransmission after a time-out at the transmitting node, which cannot discard datagrams before acknowledgement.
Flow control is closely related to the point-to-point traffic between a sender and a receiver. It guarantees that a fast sender cannot continually send datagrams faster than the receiver can absorb them. Hence, congestion control can be considered more of a global issue, whereas flow control is more of a point-to-point issue with some direct feedback from the receiver to the sender. The transmission control protocol (TCP) solves end-to-end flow control with a sliding window scheme, but it does not have a mechanism for congestion control¹. However, a carefully programmed TCP implementation can detect and recover from congestion, while a poor implementation can even make it worse [2]. Endpoints do not usually have the details about congestion and its reasons. Routers, on the other hand, use techniques like ICMP (Internet Control Message Protocol) to inform hosts that congestion has occurred.
In this chapter we present an adaptive fuzzy congestion control solution for active queue management in networks which use TCP to solve end-to-end flow control. The rest of the chapter is organised as follows. Section 2 introduces the principles of congestion control, admission control, congestion prevention datagrams and load shedding. In Section 3 the TCP and congestion control issues are presented. Section 4 introduces the active queue management (AQM) scheme. In Section 5 fuzzy techniques are presented and Section 8 describes the developed fuzzy model. Section 7 explains an on-line tuning method for queue control, and in Sections 9 and 10 the developed simulation system model and the achieved results are presented, respectively. Finally, conclusions are drawn in Section 11.
2 Principles of Congestion Control
Solutions to congestion control problems can be divided into two main categories, i.e., open loop and closed loop solutions. In the former, problems are addressed beforehand by good design, whereas the latter solutions are based on feedback information. Open loop control includes rules of acceptance for incoming traffic and rules for the discarding of datagrams, as well as scheduling decisions; the decisions are made without information about the current network state. Closed loop control, on the other hand, has three congestion control related parts, namely traffic or system monitoring to detect when and where congestion occurs, the transfer of information to the places where an action should happen, and adjustment of the system accordingly [17]. The occurrence of congestion can be identified by monitoring different kinds of metrics, such as the percentage share of discarded datagrams, average queue lengths, the percentage
For example, the basic Tahoe TCP implementation linearly increases its rate with no loss and exponentially decreases its rate when congestion is detected.
share of datagrams that are timed out and retransmitted, and the average value and variance of the datagram delay. A natural step after the monitoring and identification of congestion is to transfer information from the congested places to the places where control actions can be performed. A host or a router can also periodically send so-called probe datagrams to get explicit information about congestion via feedback. The purpose of the feedback information is that a host can take appropriate actions (such as decreasing its transmission rate) to reduce congestion. However, this information delivery increases the load of the already congested network. According to the taxonomy of Yang and Reddy (1995) [3], open loop control algorithms can further be divided into ones that act on the source and ones that act on the destination, while closed loop algorithms can be divided into implicit and explicit feedback algorithms. In explicit algorithms datagrams are sent back from intermediate nodes to warn the source, whereas in implicit algorithms the source has to deduce the existence of congestion from local observations, such as round-trip delay times.
As stated above, congestion occurs when the load on the network is temporarily greater than its resources. Therefore, in a congestion state one can either increase the resources (increase bandwidth, split traffic over multiple routes, put backup devices on line) or decrease the load (denying and/or degrading services). One of the main reasons for congestion is bursty traffic. Traffic shaping is commonly used to regulate the average rate (and burstiness) of data transmission; it decreases congestion and helps with quality of service (QoS) requirements. Congestion can also be reduced by the right kind of prevention policies, such as retransmission, acknowledgement, flow control, out-of-order datagram discard, queueing and service policies, as well as datagram lifetime management and routing algorithms [17].
2.1 Admission Control
Admission control is a widely used way to prevent congestion from getting worse. After congestion has been signalled, no more connections or virtual circuits are allowed to be set up until the congestion has gone away. Although crude, it is a simple and robust method. In packet switched networks it is also possible to allow new virtual circuits by routing traffic via different, uncongested routes. An alternative solution is to negotiate an agreement between the hosts and the network during connection set-up by specifying the volume and shape of the traffic as well as the quality of service requirements. This, however, is not an optimal solution for bandwidth usage, i.e., it tends to waste network resources because all the necessary resources are guaranteed to be available.
2.2 Congestion Prevention Datagrams
Routers in the network can monitor the utilisation of their output lines. Whenever the utilisation of a line approaches a specified threshold level, the router transmits choke datagrams to the sources in order to warn them. The source nodes or hosts are then required to reduce their transmission rate to the specified destination by n percent. Another suggested paradigm for congestion prevention is weighted fair queueing, where a router selects datagrams from multiple queues in a round-robin way for the idle output line. The router gives some services more bandwidth (weight) than others; hence the name weighted fair queueing.
2.3 Load Shedding
If the congestion does not disappear despite the preventive actions, routers can throw away datagrams they cannot handle. This can be done either randomly or in a rational way: for a file transfer, for example, dropping a newer datagram is more rational than dropping an older one because of the acknowledgement and retransmission procedures, whereas in real-time data transfer newer datagrams are more valuable than older ones. In the same way, for many applications some datagrams are more valuable than others. However, this requires the applications to mark their datagrams with priority classes to indicate their importance.
3 TCP and Congestion Control
The transmission control protocol (TCP) provides the interface² and control mechanism for traffic in TCP/IP networks³. TCP is a connection-oriented protocol, i.e., both connection endpoints have to agree to participate. It is based on a fundamental technique known as positive acknowledgement with retransmission, which requires a recipient to communicate with the source using acknowledgement (ACK) messages as it receives data. The sender keeps a record of each packet sent and waits for an ACK before sending the next packet. It also starts a timer after transmission and retransmits a packet if the timer expires. In order to adapt to the varying delays of an Internet environment, TCP uses an adaptive retransmission algorithm that monitors the delays on each connection and adjusts its timeout parameter accordingly. However, a simple positive ACK protocol is very inefficient because it must delay sending a new packet until it receives an ACK for the previous one. Therefore, the concept of a sliding window was developed to make stream transmission more efficient. It simply allows a sender to transmit multiple packets before waiting for an ACK, i.e., when the window size is exhausted, the transmitting source has to wait for an ACK before sending a new packet. A well tuned sliding window protocol keeps the network completely saturated with packets, thus obtaining substantially higher throughput than a simple positive ACK protocol. Therefore, it increases transmission efficiency and solves the flow control problem [2]. TCP allows the window size to vary over time according to the traffic density. The ACK includes a window advertisement that states the number of additional octets of data the receiver is prepared to accept, i.e., it specifies the receiver's current buffer size.
TCP implementations like Tahoe, Reno and Vegas differ in their window management. The basic Tahoe TCP implementation linearly increases its rate when there is no loss and exponentially decreases its rate when congestion is detected. In the Reno TCP implementation, two refinements, fast recovery and congestion avoidance, are designed to recover from loss more efficiently. In the first refinement, the window size is frozen when a loss is detected through a duplicate ACK; the source may then increase its window size by one upon receiving each duplicate ACK, because each duplicate ACK signals that a packet has left the network. The second refinement sets the window size to half of its value, and the protocol enters the congestion avoidance phase directly. Therefore, slow-start is entered very rarely (only when a loss is detected by timeout). The new Reno implementation uses loss as a measure of congestion, and this information is indirectly fed back to the source node through a DropTail queueing
3
The TCP specification does not dictate the details of the interface between the application programs and TCP. TCP is usually implemented in the computer’s operating system and programmers can employ whatever interface the operating system supplies. Actually TCP is an independent, general purpose protocol [2].
discipline that drops an arrival to a full buffer. In the DropTail mechanism a severe problem can occur when the datagrams travelling through a router carry segments from many TCP connections, because the drop-tail then causes global synchronisation (it discards one segment from n connections rather than n segments from one connection). In the TCP Vegas implementation a source adjusts its rate based on the queueing delay. The queueing information is updated by the first-in-first-out buffer process and fed to the source implicitly through round-trip time measurements [14].
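The window behaviour sketched above — linear growth during congestion avoidance, halving on a loss detected through duplicate ACKs, and a fall back to slow start after a timeout — can be summarised per round-trip time as follows. This is a deliberately simplified sketch (no fast-retransmit bookkeeping, integer segment counts, our own names), not an exact description of any particular TCP implementation.

```python
def reno_like_update(cwnd, ssthresh, event):
    """Per-RTT congestion window update in the spirit of Reno.

    cwnd, ssthresh : congestion window and slow-start threshold in segments
    event          : "ack" (a round trip without loss), "dup_ack_loss"
                     (loss detected by duplicate ACKs) or "timeout"
    """
    if event == "ack":
        cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1   # slow start / linear growth
    elif event == "dup_ack_loss":
        ssthresh = max(cwnd // 2, 2)
        cwnd = ssthresh                                    # halve and keep going
    elif event == "timeout":
        ssthresh = max(cwnd // 2, 2)
        cwnd = 1                                           # back to slow start
    return cwnd, ssthresh
```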
4 Active Queue Management
TCP is insufficient for high quality of service (QoS) support due to the limited control ability of the end-points (Figure 1). Therefore, routers need a separate congestion avoidance mechanism, such as active queue management (AQM) with a built-in packet dropping/marking mechanism [1]. In the AQM scheme the routers explicitly or implicitly feed back the congestion to each other by dropping or marking packets in the router queues. Hence, the end-nodes react to it by reducing the amount of data transmitted. The targets are to reduce the average queue length in routers, decrease the end-to-end delay and reduce the packet loss probability by preventing buffer overflows. These ensure the efficient use of network resources. By keeping the average queue size small, AQM also has the ability to accommodate bursty traffic without dropping packets. It also emphasizes the trade-off between delay and throughput. Traditional methods, like 'drop packets at the front of a full buffer' or 'randomly drop packets at the front of a full buffer', are useful when a sub-set of the flows sharing the link monopolizes the queue during the congestion. However, these kinds of solutions cannot fully solve the queueing problem [5]. RED (Random Early Discard/Drop/Detection) is an alternative AQM strategy for generating the congestion measure. It is especially used to avoid global synchronisation. RED maintains an exponentially weighted queue length and drops packets with a probability that increases with the average queue length. The operation of RED can be described by the following rules [2]:
- if the queue contains fewer than Tmin datagrams, add the new one to the queue,
- if the queue contains more than Tmax datagrams, discard the new one,
- if the queue contains between Tmin and Tmax datagrams, randomly discard the datagram with a probability p.
Therefore, in [2] the RED policy for routers has been summarized as: if the input queue is full when a datagram arrives, discard the datagram; if the input queue is not full but its size exceeds a minimum threshold, avoid synchronisation by discarding the datagram with probability p. The computation of the probability p is the most complex task of RED. It is defined for each datagram and its value depends on the current queue size and the threshold values. However, it makes sense to measure the queue size in octets instead of datagrams, because this makes the discard probability proportional to the amount of transmitted data; hence small datagrams, like ACKs, have a lower discard probability than large datagrams. One deficiency of the AQM methods is their probabilistic nature. Reliable multicasting, for example, encourages the development of protocols that incorporate redundant information to reduce or eliminate retransmission, and one simple scheme is to send redundant datagrams: instead of sending a single copy of each datagram, the source node sends n copies. This increases the traffic load on the network. In order to tackle this, routers should check the sequence numbers of received datagrams and discard duplicates first in a congestion situation.
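The RED rules quoted above translate almost directly into code. In the sketch below the average queue length is the usual exponentially weighted moving average and the drop probability grows linearly between the two thresholds up to p_max; the linear form and the default constants are common choices in RED descriptions, not values taken from this chapter.

```python
import random

def red_arrival(queue_len, avg, t_min, t_max, p_max=0.1, w=0.002):
    """Decide whether to drop an arriving datagram under RED.

    queue_len    : instantaneous queue length on arrival
    avg          : exponentially weighted average queue length (state)
    t_min, t_max : the thresholds Tmin and Tmax of the rules above
    Returns (drop, new_avg).
    """
    avg = (1.0 - w) * avg + w * queue_len          # update the weighted average
    if avg < t_min:
        return False, avg                          # below Tmin: always enqueue
    if avg >= t_max:
        return True, avg                           # above Tmax: always discard
    p = p_max * (avg - t_min) / (t_max - t_min)    # in between: linear drop probability
    return random.random() < p, avg
```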
Fig. 1. The traffic flow between the routers and end-nodes in a.) wireline and b.) wireless environments.

In [8] the authors analysed very thoroughly the stability effects of wireline network parameters by considering the network as a linear and constant (time-invariant) system. Based on the stability analysis, they designed a proportional-integral (PI) controller for AQM. Another paper [5] argues that the linear and constant system assumption is problematic, inaccurate and unrealistic because an actual network has many degrees of freedom. Therefore, they present an AQM algorithm based on fuzzy control theory. They claim, and show via experiments, that a fuzzy controller is useful in complex and time-varying networks because the fuzzy controller is independent of a mathematical model of the controlled plant. In this chapter we extend the view of [5] by presenting a fuzzy controller with an adaptive fuzzy membership function generation and tuning procedure and a multilevel hierarchical decision-making method in order to adapt to the incoming traffic. The procedure makes it possible to use very small buffer sizes under very bursty conditions without buffer overflows, thus decreasing end-to-end delay, reducing packet loss probability and increasing the degree of link utilisation.
5 Adaptive Fuzzy Queue Controller

5.1 Fuzzy Set Theory and Fuzzy Logic

Fuzzy set theory was originally presented by Lotfi Zadeh in his seminal paper "Fuzzy Sets" [19]. Fuzzy logic was developed from it later to reason with uncertain and vague information and to represent knowledge in an operationally powerful form. The name fuzzy sets is used to distinguish them from the crisp sets of conventional set theory. The characteristic function of a crisp set C, µC(u), assigns a discrete value (usually either 0 or 1) to each element u in the universal set U. The characteristic function can be generalized so that the values
assigned to the elements u of the universal set U fall within a prespecified range (usually the unit interval [0, 1]), indicating the degree of membership of these elements in the set. The generalized function is called a membership function, and the set defined with its aid is a fuzzy set. The standard operations of fuzzy set theory for handling fuzzy sets are complement, union (S-norm) and intersection (T-norm). More information about the standard operations and other fuzzy set theoretic operations can be found, for example, in reference [12], which covers the topic thoroughly.
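As a small illustration of the standard operations named above, the sketch below applies the usual complement (1 − µ), the max S-norm and the min T-norm to two fuzzy sets given by their membership grades over a discretized universe; the numeric grades are made up for the example.

```python
# membership grades of two fuzzy sets A and B over the same discretized universe
mu_a = [0.0, 0.2, 0.7, 1.0, 0.4]
mu_b = [0.1, 0.5, 0.6, 0.3, 0.0]

complement_a = [1.0 - a for a in mu_a]                  # complement of A
union = [max(a, b) for a, b in zip(mu_a, mu_b)]         # S-norm (max)
intersection = [min(a, b) for a, b in zip(mu_a, mu_b)]  # T-norm (min)

print(union)         # [0.1, 0.5, 0.7, 1.0, 0.4]
print(intersection)  # [0.0, 0.2, 0.6, 0.3, 0.0]
```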
5.2 Inference of the Grade of Membership

In systems where the knowledge can be expressed in linguistic form, a language-oriented approach can be used in model generation. Conventional control engineering uses mathematical models of a system and of its inputs and outputs in order to design and analyse the control policy. In fuzzy control engineering, fuzzy sets and fuzzy reasoning are used to derive the control actions. The fuzzy control approach is especially useful for systems with no precise mathematical model, for systems where model derivation is laborious, and for systems where most of the information is in qualitative form. Here the creation of a precise mathematical model was considered very difficult because of the very non-linear nature of networks with several degrees of freedom. Therefore, we chose the fuzzy control based linguistic approach as a faster and more accurate modelling method for active queue management. The aim was to reduce the average length of the queue in the routers, decrease the end-to-end delay and reduce the packet loss probability via the adaptive fuzzy controller. The main idea of fuzzy modelling is the use of an expert's knowledge for the creation of the rule base. The rule base is usually presented with linguistic conditional statements, i.e., if-then rules. However, in this chapter we present the rule bases (there are two rule bases in this application because of the two-level hierarchical reasoning) in matrix form (see more details below and in [10], [7]). Reasoning can be done using either composition based or individual based inference. In the former, all rules are combined into an explicit relation (describing the degrees of association between fuzzy sets given in linguistic form) which is then fired with a fuzzy input, whereas in the latter the rules are individually fired with a crisp input and then combined into one overall fuzzy set. Here we used individual based inference with Mamdani's implication [4] on both reasoning levels of our model. The main reason for this choice was that it is easier to implement (the results are equivalent for both methods when Mamdani's implication is used). In individual based inference the grade of membership of each fired rule is formed by taking the T-norm (see [12] for more details) of the grades of membership of the inputs of that rule. Its definition is based on the intersection operation and the relation Rc (c for conjunction, defined by the T-norm):

    µRc(x, y) = min(µA(x), µB(y)),   (1)

where x and y denote the input variables, and A and B are the meanings of x and y, respectively [4]. The meaning of the whole set of rules is given by taking the S-norm (see [12] for more details) of the grades of membership of the rules with the same output value, to form an output set with only linguistically different values. In this application, the result of the first hierarchical level is the input to the second level of reasoning. The advantage of hierarchical reasoning is the reduction of the number of rules in the rule bases: the number of rules increases linearly with the number of system variables and not exponentially as in conventional systems [16].
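The individual-rule inference of Eq. (1) can be sketched as follows: each fired rule gets the min of its input grades, and rules sharing the same linguistic output are merged with the max (S-norm). In the two-level scheme described above, the resulting label would then be passed on as an input to the second reasoning level. The labels and grades below are hypothetical.

```python
def rule_strength(mu_a_x, mu_b_y):
    # grade of membership of a fired rule: T-norm (min) of the input grades, Eq. (1)
    return min(mu_a_x, mu_b_y)

def combine_rules(fired):
    """fired: list of (output_label, strength). Rules with the same
    output label are merged with the S-norm (max)."""
    combined = {}
    for label, strength in fired:
        combined[label] = max(combined.get(label, 0.0), strength)
    return combined

# hypothetical example: two rules share the output label 'NS'
fired_rules = [("NS", rule_strength(0.7, 0.4)),   # -> 0.4
               ("NS", rule_strength(0.6, 0.9)),   # -> 0.6
               ("ZE", rule_strength(0.3, 0.8))]   # -> 0.3
print(combine_rules(fired_rules))                 # {'NS': 0.6, 'ZE': 0.3}
```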
5.3 Linguistic Equations

In the framework of linguistic equations a linguistic model of a system can be described by groups of linguistic relations [10]. The linguistic relations form a rule base of the system that can be converted into matrix equations. Suppose, as an example, that Xj, j = 1, ..., m (m is an uneven number), is a linguistic level (e.g., negative big, negative small, zero, positive small, and positive big) for a variable. The linguistic levels are replaced by the integers −(m−1)/2, ..., −2, −1, 0, 1, 2, ..., (m−1)/2. The direction of the interaction between the fuzzy sets is presented by coefficients Aij ∈ {−1, 0, 1}, i = 1, ..., m. This means that the direction of the change in the output variable decreases or increases depending on the directions of the changes in the input variables [11]. Thus a compact equation is:

    Σ_{j=1}^{m} Aij Xj = 0.   (2)
The mapping of linguistic relations to linguistic equations is described in Figure 2.
Fig. 2. The mapping of linguistic relations to linguistic equations.
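The sketch below illustrates the idea of equation (2) for the five-level case, mapping linguistic labels to the integers −2...2 and checking a relation against interaction coefficients taken from {−1, 0, 1}. The particular relation and coefficients are made-up examples, not rows of Figure 2.

```python
# five linguistic levels mapped to integers, as described above
LEVELS = {"nb": -2, "ns": -1, "ze": 0, "ps": 1, "pb": 2}

def satisfies(relation, coefficients):
    """Check whether a linguistic relation (one label per variable)
    satisfies the linguistic equation sum_j A_j * X_j = 0, i.e. Eq. (2)."""
    return sum(a * LEVELS[x] for a, x in zip(coefficients, relation)) == 0

# hypothetical relation: error 'ps', change of error 'ns', output 'ze'
# with interaction coefficients A = (1, 1, -1)
print(satisfies(("ps", "ns", "ze"), (1, 1, -1)))   # True: 1*1 + 1*(-1) - 1*0 = 0
```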
6 Network Traffic Model

In the literature ([13], [15], [18]) it has been shown that Ethernet LAN traffic is statistically self-similar (the time-scaled discrete or continuous time processes have similar patterns, i.e., the process on a larger scale is a copy of itself on smaller scales) and that none of the commonly used traffic models is able to model this fractal-like behaviour. Moreover, in [13] it has also been argued that such behaviour has serious implications for the design, control, and analysis of high-speed, cell-based networks, and that aggregating streams of such traffic typically intensifies the self-similarity (burstiness) instead of smoothing it. On the other hand, it is well known that telecommunication traffic in traditional circuit-switched networks
obeys a Poisson distribution. One should also note that in the case of packet-switched voice applications (e.g., VoIP) the Poisson process model can be a reasonable approximation. Therefore, the simulation model, which was developed to rigorously test the adaptive active queue management scheme on the routers, uses Pareto-distributed traffic and interval times (self-similar network traffic can be generated by multiplexing several sources of Pareto-distributed packet trains with silence periods between the packet trains) for the evaluation of the developed adaptive fuzzy queue management for non-voice data traffic in packet-switched networks, and Poisson-distributed traffic with exponentially distributed interval times for the evaluation of queue management for VoIP traffic in packet-switched networks. A Pareto distribution has the following probability density function:

    Ppareto(x) = α b^α / x^(α+1),   x ≥ b,   (3)

where α is a shape parameter and b is the minimum value of x. The mean value of a Pareto distribution is

    E(x) = α b / (α − 1).   (4)

For self-similar traffic α should be between one and two. When α ≤ 2, the variance of the distribution is infinite. The formula to generate a Pareto distribution is

    XPareto = b / U^(1/α),   (5)

where U is a uniformly distributed value in the range (0,1]. For the packet trains we used the value 1 for b (minimum size of the packet train) and 1.5 for the shape parameter α, and for the silence periods between packet trains the values 8 for b (size of the preamble compared to the maximum size of the Ethernet packet, 1518) and 1.2 for α. A Poisson distribution, on the other hand, has the following probability density function:

    Ppoisson(x, λ) = (λ^x / x!) e^(−λ),   x = 0, 1, ...,   (6)

where λ is the shape parameter, which indicates the mean number of events in the given time interval. The formula for the Poisson cumulative probability function is:

    F(x, λ) = Σ_{i=0}^{x} e^(−λ) λ^i / i!   (7)
The Poisson percent point function does not exist in simple closed form like for the Pareto distribution above. Hence, it is computed numerically and is only defined for integer values of x.
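A minimal sketch of how such traffic can be drawn is given below: Pareto variates via the inverse-transform formula (5), and Poisson variates via a simple numerical method (here Knuth's multiplication algorithm, which is an implementation choice, not something prescribed by the text). The b and α values for the packet trains and silence periods are the ones quoted above.

```python
import math
import random

def pareto_sample(b, alpha):
    # Eq. (5): X = b / U**(1/alpha), U uniform on (0, 1]
    u = 1.0 - random.random()          # random() is in [0,1); map to (0,1]
    return b / (u ** (1.0 / alpha))

def poisson_sample(lam):
    # Knuth's multiplication method; adequate for the small rates of a sketch
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

# packet-train sizes (b = 1, alpha = 1.5) and silence periods (b = 8, alpha = 1.2)
train_sizes = [pareto_sample(1, 1.5) for _ in range(5)]
silences = [pareto_sample(8, 1.2) for _ in range(5)]
arrivals_per_slot = [poisson_sample(3.0) for _ in range(5)]
```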
7 On-line Tuning in Queue Control

The source data originated from the incoming traffic density on the intermediate nodes. The problem was to find a suitable data period for the generation of reliable and reactive membership functions to estimate the variability of the incoming traffic density with a very compact packet buffer and rule base (a compact rule base is especially required for embedded systems)
to control the packet rejection probability on the routers for congestion control. The requirement of a small packet buffer decreases the need for silicon area on circuits. With a smaller buffer size it is also easier to design more effective microprocessor pipelines for buffer reading (for more sophisticated algorithms than FIFO, for example). Furthermore, the small packet buffer necessitates an effective congestion control algorithm on the routers to allow very high traffic density on the link with a tolerable packet loss rate. A larger buffer size with fixed fuzzy membership functions absorbs to some degree the changes in incoming traffic density. However, it does not limit the incoming traffic via acknowledgements as effectively as the fuzzy controller with automatically generated fuzzy membership functions does in changing traffic conditions. This is because with buffer overflow all the packets have to be retransmitted, and the link has not adapted to the full buffer as it smoothly does with the adaptive controller. In the traditional approach the division of the dynamic scale into fuzzy membership functions has to be done for the worst or near-worst case. Hence, the resolution of control is inadequate in better conditions and causes unnecessary oscillation of the queue control around the set point. Naturally, we could also define a very large dynamic scale for the variables. However, we should then also define a very large rule base (increasing exponentially with the dynamic scales of the variables), which unnecessarily increases the complexity of the model and the amount of testing effort (testing of consistency, continuity and interaction of a set of rules, see [4] for more details) as well as the control time of the controller. Therefore, it is natural to model the traffic behaviour by taking into consideration the dynamic scale of the incoming traffic to the routers via on-line tuning of the membership functions.
8 Fuzzy Model

The developed adaptive fuzzy model for active queue control consists of four sequential basic modules: adaptive membership function generation, fuzzification, hierarchical inference and defuzzification modules (Figure 3).
Fig. 3. The structure of the developed fuzzy model.

The input values of the controller are the error (e = set value of buffer length − length of buffer) and the change of error (δe = error − earlier error) of the incoming traffic density on the router. The number of Ethernet frames (in this example the TCP/IP protocol uses Ethernet frames; Ethernet frames are of variable length, with no frame smaller than 64 octets or larger than 1518 octets, see Figure 4) created in one time unit on the transmitter (see Figure 1) was assumed to be either Poisson or Pareto⁵ distributed. The transmission delay of each packet was assumed to be normally distributed. The bursty nature of the traffic was modelled by adding Poisson- or Pareto-distributed packet bursts randomly on the simulated link. The error (e) and change of error (δe) of the incoming traffic are fuzzified according to the adaptively tuned fuzzy membership function values (see below).

⁵ Traditional telecommunication network traffic is Poisson distributed with exponentially distributed interval times. Data network traffic has a self-similar and long-range dependent nature, which obeys Pareto-distributed interval times.
[Figure 4: Preamble (8 octets) | Destination Address (6 octets) | Source Address (6 octets) | Frame Type (2 octets) | Frame Data (46-1500 octets) | CRC (4 octets)]
Fig. 4. The format of an Ethernet frame (packet).

The tuning is done for both input variables on the routers at the control action frequency (100 to 1000 Hz depending on the incoming traffic density), and it includes data from the last 50 to 200 (depending on the value set by the model designer) measured input values. The tuning of the membership function values is performed by periodically collecting a discrete density distribution of the time series of measured input data (see [6] for more details). The cumulative distribution is determined from it. As an illustration, assume that the minimum and maximum values of the time series are determined. In order to find the density distribution, the values between the minimum and maximum are selected. The density distribution is defined by dividing the interval, for example, into 10 classes and determining the share of the values in each of them. This is done by summing the number of samples in each class and dividing by the total number of samples in the interval. The cumulative distribution is obtained from the density values by summing them from left to right (from small to large). The division points (corner points of the quadrangle-shaped membership functions) are defined from the cumulative distribution. We show the five-label case (Figure 5):

- define the value 0.125 on the vertical axis and find the corresponding value on the horizontal axis = division point 1
- ...
- define the value 1.000 on the vertical axis and find the corresponding value on the horizontal axis = division point 8

In this method, when the number of divisions approaches the number of samples in the original time series, the division points form a distribution like the original (see [6]). Therefore, we can conclude that the method is suitable for dividing a time series into linguistic areas if the resolution is high enough. In the first inference level of the hierarchical reasoning the fuzzy inferences are performed using the fuzzy rules, which are presented in matrix form (see references [10] and [7]). The rule base is composed by analysing the dynamics of the incoming bursty traffic to the routers. The rule base on the first hierarchical level includes 25 rules (see Figure 6). The control strategy produces the linguistic control output (the incoming packets' rejection probability). On the second level of hierarchical reasoning, which includes only 3 rules, the second derivative of the incoming traffic is used to regulate the previously determined linguistic rejection probability. The final linguistic rejection probability is transformed back into the physical domain to find the crisp control output value for the incoming packets' rejection probability. In this (defuzzification) phase the centre of area (CoA) method was used (see [20]).
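The on-line tuning step described above can be sketched as follows; the helper below computes the eight division points of the five-label case from a window of measured input values. The class count of 10 and the helper name are assumptions for illustration.

```python
def division_points(samples, n_classes=10, n_points=8):
    """Sketch of the membership-function tuning step: derive the corner
    points from the cumulative distribution of recent measured values."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_classes or 1.0

    # density distribution: share of samples falling in each class
    counts = [0] * n_classes
    for s in samples:
        idx = min(int((s - lo) / width), n_classes - 1)
        counts[idx] += 1
    density = [c / len(samples) for c in counts]

    # cumulative distribution, summed from small to large values
    cumulative, total = [], 0.0
    for d in density:
        total += d
        cumulative.append(total)

    # division point k = value where the cumulative share first reaches k/8
    points = []
    for k in range(1, n_points + 1):
        level = k / n_points                      # 0.125, 0.250, ..., 1.000
        for i, c in enumerate(cumulative):
            if c >= level - 1e-9:
                points.append(lo + (i + 1) * width)
                break
    return points

# example: division points for 200 hypothetical traffic density measurements
import random
print(division_points([random.gauss(300, 60) for _ in range(200)]))
```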
9 Simulation System Model

In the simulated example (see Figures 7-10) we used an expected value of 50 000 packets/s, which results in a traffic rate of 600 Mb/s when an Ethernet packet size of 1500 bytes (octets) was assumed.
Fig. 5. Division points from the cumulative distribution.

The delay of each packet was assumed to be normally distributed⁶ with an expected value of 5 ms and a standard deviation of 1 ms.

⁶ According to the central limit theorem the distribution of an average tends to be normal even if the distribution from which the average is computed is non-normal.
[Rule matrix over the linguistic levels of error and change of error. Legend: PB = positive big, PS = positive small, ZE = zero, NS = negative small, NB = negative big. Example: if error is PS and change of error is NB, then the output is PS.]
Fig. 6. The structure of the rule base.

The bursty nature of the traffic (here the ratio of the standard deviation of the number of arrivals to the mean number of arrivals⁷) was modelled via increasing packet bursts with a probability of 0.1. The expected size of the bursts was 60 Mb/s and their delays were assumed to be normally distributed with an expected value of 5 ms and a standard deviation of 1 ms. Hence, the average data rate was 600 Mb/s + 0.1 × 600 Mb/s = 660.0 Mb/s. The model was implemented using the C++ programming language and it runs under the Linux OS.
⁷ There exist at least two other commonly used definitions of burstiness: the ratio of the peak bandwidth to the mean bandwidth and the ratio of the arrival variance to the mean number of arrivals.

10 Results

In this section we present the effects of the developed fuzzy models on the required buffer length in the router and on the end-to-end throughput for the Pareto-Poisson incoming traffic model described earlier in Section 6. One fuzzy model utilizes predefined membership functions for the input and output variables, whereas the other applies adaptive membership functions for the input values and predefined membership functions for the output.
Adaptive membership functions are not utilized for the output variable (the incoming packet rejection probability) due to its stable scale (always between 0 and 1.0). Pareto-distributed traffic (Pareto-distributed ON-OFF periods with Poisson-distributed traffic during the ON period) is used because it is more realistic for packet data networks with non-real-time traffic (NRTT), as was assumed here. Poisson-distributed traffic, in turn, can be used to describe, e.g., voice over IP (VoIP) or some other kind of real-time traffic (RTT).
Fig. 7. Required length of the buffer in the router when fixed membership functions were used. Incoming data rate = 600 Mbit/s.

Figure 7 presents the results obtained with the fuzzy model with fixed, manually created fuzzy membership functions for the required buffer length in the router, whereas Figure 8 presents the results with adaptive membership functions with a period length (the data sample collection interval for the tuning of the membership functions) of 50 ms. From this comparison one can notice that the hierarchical fuzzy model with adaptive membership functions manages the buffer more economically. The required buffer size stays continuously under 600 Mbit/s, even if the buffer is sometimes occupied longer than with the non-adaptive model. This is probably due to traffic bursts and the adaptive model's better ability to store incoming packets during and after bursts within the limit of the buffer. The fuzzy model with predefined membership functions discards packets more easily during bursts due to its limited capability to adapt to changing circumstances. This kind of behaviour also explains the better throughput of the adaptive model (see Figures 9-10). Moreover, the buffer-size 'peak frequency' is lower in the hierarchical fuzzy reasoning model with on-line tuned fuzzy membership functions, i.e., the created control method has a better predictive nature (a lower packet loss rate and a lower occupation of the buffer). Hence, we can state that hierarchical fuzzy reasoning with adaptive membership functions offers a good choice of active queue management for routers.
Fig. 8. Required length of the buffer in the router when adaptive membership functions were used. Incoming data rate = 600 Mbit/s.
This will be even more emphasized if active queue management is applied in a wireless environment in order to eliminate unnecessary retransmissions, since radio signal attenuation, fading, interference and shadowing in the wireless environment already cause a high retransmission rate by themselves. We would like to note that unnecessary retransmissions in a TCP/IP network can also be avoided via proper updating of the timeout parameter value. TCP cannot update timeout values (round-trip time values, RTT) from retransmitted segments, but an algorithm that merely ignores times from retransmitted segments can also lead to failure when there is a sharp increase in delay [2]. Therefore, for example, the Karn algorithm⁸ separates the computation of the timeout value from the current RTT estimate. The current estimate is used to calculate the initial timeout value, and the timeout is then backed off on each retransmission until a segment can be transferred successfully. Figures 9-10 present the end-to-end throughput when fixed fuzzy membership functions and adaptive membership functions were used, respectively. The average input data rate in the figures is 600 Mbit/s. One can notice that the throughput is higher when adaptive membership functions were used. This is due to the fact that, when fixed membership functions were used, the average input data rate increased from time to time because of bursts over the (pre)designed limit of the control area, i.e., we should know the approximate traffic volume beforehand. Therefore, we can notice that by using adaptive membership functions the model adapts itself to the increased average data rate, and it is not required to know beforehand the traffic volume on the network.

⁸ Phil Karn, an amateur radio enthusiast, developed the Karn algorithm to allow TCP communication across a high-loss packet radio connection.
Fig. 9. End-to-end throughput when fixed membership functions were used. Incoming data rate = 600 Mbit/s.

A fuzzy controller with manually predefined membership functions for the assumed traffic data rate instead acts (or has to act) above the predesigned control area when the incoming data rate is higher than assumed, applying the highest possible rejection probability to incoming packets and thus unnecessarily increasing the packet loss rate. Moreover, the adaptive model does not overreact to small traffic changes, unlike the fuzzy model with fixed fuzzy membership functions obviously does, which can be noticed from the peaks in the required buffer length. This is probably due to the fact that the resolution of the control area is lower for the model with fixed membership functions (see Section 7 for more details and explanations). From the presented figures we can notice that FAQM (Fuzzy Active Queue Management) significantly decreases the packet loss rate with fewer overflows of the buffer. FAQM also decreases the control signalling between the transmitting nodes and the routers by decreasing the number of ICMP messages, thanks to the decreased packet loss rate. Furthermore, it guarantees better utilisation of the link capacity with a sliding window scheme (fewer slow-starts due to a lower packet loss rate). Also, by keeping the queue size small, the adaptive algorithm is able to accommodate very bursty traffic more effectively than the non-adaptive algorithm without unnecessarily dropping packets. The on-line tuning of the membership functions and the hierarchical reasoning also decrease the variance of the buffer length, allowing the use of smaller buffer sizes and therefore of more optimised circuits and algorithms on the routers. Moreover, the adaptive fuzzy controller adjusts itself very effectively to an input data rate increased over a predefined maximum level. This is especially important if active queue management is applied in a wireless environment. Hence, we are not required to continuously update the FAQM algorithms according to a link's throughput, for example after an increase in link capacity due to an update of the interface medium from cable to optical fibre, or an update from IEEE 802.11b to IEEE 802.11a devices in wireless local area networks.
Fig. 10. End-to-end throughput when adaptive membership functions were used. Incoming data rate = 600 Mbit/s.
11 Conclusions

An adaptive fuzzy congestion control solution for active queue management in packet-switched networks was described. It supports end-to-end control of packet data traffic between intermediate nodes, such as routers, in the network. In the presented solution a fuzzy controller regulates the rejection probability of received packets on the intermediate nodes to prevent buffer overflow and the rejection of all transmitted packets during congestion. In the simulations we compared a fuzzy controller with a fixed shape of membership functions with a fuzzy controller with adaptive membership functions. The membership functions were updated according to the incoming traffic in order to keep the equation-form rule base as simple as possible. The simulation results showed that the developed adaptive control algorithm decreased the packet loss rate with fewer overflows of the buffer, and increased the link utilisation by decreasing the retransmission rate more effectively. Furthermore, the algorithm also decreases the power consumption of the nodes (by decreasing retransmissions) and therefore increases the communication and standby times of battery-powered devices.
References

1. Braden, B., et al.: Recommendations on queue management and congestion avoidance in the Internet. RFC 2309, April 1998
2. Comer, D. E.: Internetworking with TCP/IP: Principles, Protocols, and Architectures. Prentice Hall, New Jersey, 4th edition, (2000)
3. Yang, C. Q. and Reddy, A. V. S.: A taxonomy for congestion control algorithms in packet switching networks. IEEE Network Magazine, 9:34-35, (1995)
4. Driankov, D., Hellendoorn, H., and Reinfrank, M.: An Introduction to Fuzzy Control. Springer-Verlag, New York, 2nd edition, (1996)
5. Fengyuan, R., Yong, R., and Xiuming, S.: Design of a fuzzy controller for active queue management. Computer Communications, 25:874-883, (2002)
6. Frantti, T.: Timing of fuzzy membership functions from data. Doctoral thesis, University of Oulu, Department of Process Engineering, Oulu, Finland, (2001)
7. Frantti, T. and Mahonen, P.: Fuzzy logic based forecasting model. Engineering Applications of Artificial Intelligence, 14(2):189-201, (2001)
8. Hollot, C., Misra, V., Towsley, D., and Gong, W.: On designing improved controllers for AQM routers supporting TCP flows. In: INFOCOM 2001, Anchorage, Alaska, April 2001, (2001)
9. Nagle, J.: On packet switches with infinite storage. IEEE Transactions on Communications, 35:435-438, (1987)
10. Juuso, E.: Linguistic equations framework for adaptive expert systems. In: Stephenson, J. (ed.), Modelling and Simulation 1992, Proceedings of the 1992 European Simulation Multiconference, pages 99-103, (1992)
11. Juuso, E.: Linguistic simulation in production control. In: Pooley, R. and Zobel, R. (eds.), UKSS'93 Conference of the United Kingdom Simulation Society, pages 34-38, Keswick, UK, (1993)
12. Klir, G. and Folger, T.: Fuzzy Sets, Uncertainty, and Information. Prentice Hall, New York, 1st edition, (1988)
13. Leland, W. E., Taqqu, M. S., Willinger, W., and Wilson, D. V.: On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking, 2(1):1-15, (1994)
14. Low, S., Paganini, F., and Doyle, J.: Internet congestion control: An analytical perspective. IEEE Control Systems Magazine, 22(1):28-43, (2002)
15. Paxson, V. and Floyd, S.: Wide area traffic: The failure of Poisson modeling. IEEE/ACM Transactions on Networking, 3(3):226-244, (1995)
16. Raju, G., Zhou, J., and Kisner, R.: Hierarchical fuzzy control. International Journal of Control, 54(5):1201-1216, (1991)
17. Tanenbaum, A. S.: Computer Networks. Prentice Hall, New Jersey, USA, 3rd edition, (1996)
18. Willinger, W., Taqqu, M. S., Leland, W. E., and Wilson, D. V.: Self-similarity in high-speed packet traffic: Analysis and modeling of Ethernet traffic measurements. Statistical Science, 10:67-85, (1995)
19. Zadeh, L.: Fuzzy sets. Information and Control, 8:338-353, (1965)
20. Zimmermann, H. J.: Fuzzy Set Theory and Its Applications. Kluwer Academic Publishers, Massachusetts, USA, 5th edition, (1992)
CHAPTER 22

Fuzzy Logic Strategy of Prognosticating TCP's Timeout and Retransmission

Zhongwei Zhang¹, Zhi Li¹ and Shan Suthaharan²

¹ Dept of Mathematics and Computing, University of Southern Queensland, Toowoomba, QLD 4350, Australia
zhongwei, [email protected]
² Department of Mathematical Sciences, University of North Carolina at Greensboro, Greensboro, NC 27402, USA
[email protected]
Abstract: The work presented in this paper is the design and implementation of an intelligent strategy using fuzzy logic technology to gauge the TCP timeout and retransmission value. The conventional algorithms, which are based on statistical analysis, perform in a marginally acceptable way when estimating these two values, but they have been shown to be increasingly incapable of dealing with more complicated TCP traffic because they ignore the complexity of the traffic. Fuzzy logic technology is applied here to estimate the TCP timeout and retransmission value, combining knowledge about the network traffic and the connection.
1 Introduction

The size and the complexity of computer networks have grown in recent years. To achieve efficient and reliable transmission, some protocols inevitably need to handle complicated network traffic and unexpected transmission losses. These problems are usually referred to as flow control and congestion control. Technologies for managing complex computer networks need to be more circumspect not only at the sending and receiving hosts, but also at the intermediate routers. One of these protocols is the Transmission Control Protocol (TCP), which is responsible for ensuring reliability. Because TCP guarantees the reliable delivery of data, it retransmits each segment if an ACK is not received within a certain period of time. TCP sets this timeout as a function of the round-trip time (RTT) it expects between the two ends of the connection. Unfortunately, given the range of possible RTTs between any pair of hosts in
the Internet, as well as the variation in RTT between the same two hosts over time, choosing an appropriate timeout value is not easy [1]. To address this problem, TCP uses an adaptive retransmission mechanism. An important improvement occurred in 1986 when a simple algorithm was developed. The idea is that every time TCP sends a data segment, it records the time. When an ACK for that segment arrives, TCP reads the time again and takes the difference between these two times as SampleRTT. TCP then computes EstimatedRTT as a weighted average between the previous estimate and this new sample. The main problem with this simple algorithm is that it does not take the variance of the RTT samples into account. An improved algorithm was then developed; for details about these conventional methods, refer to Section 2. The conventional methods used in past years have largely relied on statistical data analysis. However, the use of fuzzy logic to improve TCP performance has been proposed in [2], where a fuzzy logic controller was used to selectively drop cells in an unspecified bit rate (UBR) service. This paper introduces a new method for calculating TCP's timeout and retransmission value. The paper is organized as follows. Section 2 explains the TCP timeout and retransmission problem and surveys the relevant strategies. In Section 3 a new approach based on fuzzy inference is introduced. The experimental results are presented in Section 4. In the last section we conclude the paper and outline some possible improvements for the future.
2 TCP Timeout and Retransmission

TCP is one of the most predominant transport layer protocols in the TCP/IP protocol suite. TCP provides a reliable transport layer, even though the service it uses (IP) is unreliable. Every piece of TCP data that gets transferred around the Internet goes through the IP layer at both end systems and at every intermediate router. One approach to providing reliability is for each end to acknowledge the data it receives from the other end. But data segments and acknowledgments can get lost. TCP handles this by setting a timeout when it sends data; if the data is not acknowledged when the timeout expires, it retransmits the data. Fundamental to TCP's timeout and retransmission is the measurement of the round-trip time (RTT) experienced on a given connection. The RTT changes over time, due to changes in routes and changes in network traffic, and TCP should track these changes and modify its timeout accordingly. In the early days of the Internet, a simple algorithm was used to determine the timeout (RTO). This algorithm doubles an estimated RTT, where the estimated RTT is a weighted average of the measured RTT and the previous estimate. That is:

    EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT   (1)

    TimeOut = 2 × EstimatedRTT   (2)
This algorithm has been proven to be rather conservative. It also has a flaw: when a segment is retransmitted, it is unclear whether the measured RTT belongs to the first transmission of the data or to the second. The Internet was suffering from high levels of network congestion at that time. An improved algorithm was introduced in 1987 [9] to mitigate the causes of congestion. The algorithm can at best fix some causes of that congestion, but cannot completely eliminate it. Soon another approach was proposed to battle congestion [4]. TCP calculates the round-trip time and then uses these measurements to keep track of a smoothed RTT estimator and a smoothed mean deviation estimator. These two estimators are then used to calculate the next retransmission timeout value. The new algorithm folds the variance into the timeout calculation as follows:

    Difference = SampleRTT − EstimatedRTT   (3)

    EstimatedRTT = EstimatedRTT + (δ × Difference)   (4)

    Deviation = Deviation + δ × (|Difference| − Deviation)   (5)

    TimeOut = µ × EstimatedRTT + φ × Deviation   (6)
where, based on experience, δ is a fraction between 0 and 1, µ is typically set to 1, and φ is set to 4. Thus, when the variance is small, TimeOut is close to EstimatedRTT; a large variance causes the Deviation term to dominate the calculation. Apparently, the approaches to estimating the timeout and retransmission value are conventionally based on statistical analysis. In the next section we propose a new approach [2, 3, 5] based on fuzzy logic. Nevertheless, the basis of the new approach for computing the timeout and retransmission value is the same as that of the Jacobson/Karn algorithm. More specifically, the new approach also computes the next timeout value from the two smoothed estimators.
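A compact sketch of the estimator of equations (3)-(6) is given below. The value δ = 0.125 is only an assumed example of a "fraction between 0 and 1"; and, in the spirit of the Karn algorithm mentioned earlier, samples measured on retransmitted segments would simply not be fed to update().

```python
class RTOEstimator:
    """Sketch of the smoothed-estimator update in Eqs. (3)-(6).
    delta = 0.125 is an assumed value; the text only requires 0 < delta < 1."""

    def __init__(self, first_rtt, delta=0.125, mu=1.0, phi=4.0):
        self.delta, self.mu, self.phi = delta, mu, phi
        self.estimated_rtt = first_rtt
        self.deviation = 0.0

    def update(self, sample_rtt):
        difference = sample_rtt - self.estimated_rtt                        # (3)
        self.estimated_rtt += self.delta * difference                       # (4)
        self.deviation += self.delta * (abs(difference) - self.deviation)   # (5)
        return self.mu * self.estimated_rtt + self.phi * self.deviation     # (6)

# example: feed a few RTT samples (in seconds) and print the resulting timeouts
est = RTOEstimator(first_rtt=1.0)
for rtt in [1.1, 0.9, 1.8, 1.0]:
    print(round(est.update(rtt), 3))
```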
3 A Fuzzy Approach of Estimating the Timeout Value

Fuzzy logic looks at the world in imprecise terms, in much the same way that our own brain takes in information [6, 7]. When considering the retransmission of a packet, it is important to estimate an appropriate value of the time after which TCP retransmits the packet (RTO). Too large an estimated RTO results in the data transmission becoming idle and thus network resources being wasted, while too small an estimated RTO results in unnecessary retransmissions. In order to keep the network running efficiently, we designed a fuzzy system to implement an RTO setting which is capable of tracking the trend of the RTTs and quickly assessing the state of the network from the previous RTTs. The relationship between RTT and RTO is nonlinear, and fuzzy inference systems perform well on nonlinear dynamic systems. The fuzzy system is designed as shown in Figure 1.
Fig. 1. Fuzzy inference system
The input variables for the system are:
(1) ∆RTT: the variation between the current RTT and the previous one;
(2) ∆RTO_RTT: the difference between the current RTO and the current RTT.
The output variable for the system is:
∆RTO: the variation between the next RTO and the current RTO.
Once we have the variation between the next RTO and the current one, it is straightforward to calculate the next RTO as follows:

    RTO_next = RTO_prev + ∆RTO   (7)
Seven membership functions were used for each of the two inputs and for the output. The membership functions of ∆RTT defined fuzzy sets for NB: Negative Big, NM: Negative Medium, NS: Negative Small, ZERO: A fuzzy set, PS: Positive Small, PM: Positive Medium and PB: Positive Big. The membership functions of ∆RTO RTT defined fuzzy sets for VH: Very High, H: High, MH: Medium High, M: Medium, ML: Medium Low, L: Low, and VL: Very Low. The seven membership functions for the output are NB: Negative Big, NM: Negative Medium, NS: Negative Small, ZERO: A special fuzzy set,
PS: Positive Small, PM: Positive Medium and PB: Positive Big. Although the input and output are similarly labeled, the membership functions for the variables were independently specified and adaptable. The fuzzy estimator thus comprised a total of 21 linguistic variables. Triangular and trapezium shapes were adopted for both the input and output membership functions. The distribution of the membership functions is related to the averages of the RTT and RTO. Figure 2 shows the membership functions of the input and output linguistic variables. Note that the memberships for ∆RTT and ∆RTO are symmetric, but the membership function for ∆RTO_RTT is asymmetric.
Fig. 2. Membership functions for the input and output variables
The design of the fuzzy linguistic rules for the inference of the RTO takes the following conditions into consideration:

• The timeout is related to congestion. If you time out too soon, you may unnecessarily retransmit a segment, which only adds to the load on the network.
• The timeout is related to the network situation. The previous RTTs are a good indicator of this; the network situation is well reflected by the average RTTs.
• The difference between the previous RTT and the mean RTT contributes significantly to the current RTT.
• The current RTO relies heavily on the difference between the previous RTO and the current RTT.

The fuzzy rule base consists of 7 rules, which are summarized in Table 1. For instance: if ∆RTT is NB OR ∆RTO_RTT is VH, then ∆RTO is NB.
The principle of the fuzzy rules is to follow the track of the RTT of the data transmission and to keep the distance between the estimated RTO and the actual RTT around an ideal time interval (1 second).

Table 1. Fuzzy rule base

  #   ∆RTT    ∆RTO_RTT   ∆RTO
  1   NB      VH         NB
  2   NM      H          NM
  3   NS      MH         NS
  4   ZERO    M          ZERO
  5   PS      ML         PS
  6   PM      L          PM
  7   PB      VL         PB
Min-product inference was used, along with center of gravity defuzzification. For more information about gravity defuzzification, refer to [6].
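To make the inference pipeline concrete, the sketch below strings together triangular memberships, the seven OR-connected rules of Table 1 and a centre-of-gravity style defuzzification over singleton output positions. All breakpoints and output positions are illustrative placeholders: they are not the membership functions of Figure 2, and the singleton simplification stands in for the full centre-of-gravity computation.

```python
def tri(x, a, b, c):
    # triangular membership function with peak at b (breakpoints are illustrative)
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# one illustrative triangle per linguistic label, ordered as in Table 1
D_RTT = [(-2.0, -1.0, -0.5), (-1.0, -0.5, -0.1), (-0.5, -0.1, 0.0),   # NB NM NS
         (-0.1, 0.0, 0.1), (0.0, 0.1, 0.5), (0.1, 0.5, 1.0),          # ZERO PS PM
         (0.5, 1.0, 2.0)]                                             # PB
D_RTO_RTT = [(2.5, 3.5, 6.0), (2.0, 2.5, 3.5), (1.5, 2.0, 2.5),       # VH H MH
             (1.0, 1.5, 2.0), (0.5, 1.0, 1.5), (0.0, 0.5, 1.0),       # M ML L
             (-0.5, 0.0, 0.5)]                                        # VL
# singleton positions of the seven output labels NB..PB (placeholder values)
D_RTO_OUT = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]

def delta_rto(d_rtt, d_rto_rtt):
    """Rules of Table 1: rule i pairs the i-th label of each input with the
    i-th output label, connected by OR (max); the rule strengths then weight
    the singleton output positions (a centre-of-gravity approximation)."""
    num = den = 0.0
    for i in range(7):
        strength = max(tri(d_rtt, *D_RTT[i]), tri(d_rto_rtt, *D_RTO_RTT[i]))
        num += strength * D_RTO_OUT[i]
        den += strength
    return num / den if den else 0.0
```

For example, delta_rto(-1.2, 3.0) fires mainly the NB and NM rules and returns a clearly negative value (about -1.3 with these placeholder shapes), i.e., the next RTO is reduced.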
4 Experiment Results

The fuzzy system has been tested on two different kinds of networks. It has been used to estimate the RTO on a real network and on a simulated network. With the simulated network, the fuzzy system has been used on a set of networks with different congestion levels.

4.1 A Real Network

An experiment of this kind can be carried out on the Internet or on any computer network which regularly suffers packet loss. We performed this experiment on our experimental network consisting of three Linux boxes named maximian, Gordian and valerian. Basically, the timer for the connection is started when segment 1 is transmitted, and
turned off when its acknowledgement (segment 2) arrives. The timer usually reports clock ticks, so we need to determine how many seconds the timeout value corresponds to. Figure 3 shows the relation between the actual RTT and the counted clock ticks.
Fig. 3. Measuring the Timeout
The proposed fuzzy system has been implemented in MATLAB. The measured RTTs were recorded with the tcpdump program [8]. In this experiment 128 segments were transmitted and 18 RTT samples were collected. Figure 4 shows the measured RTT along with the RTO used by TCP for the timeout. Note that the top curve is calculated using Karn's algorithm and the middle curve is calculated by the fuzzy estimator, while the bottom curve shows the measured RTTs.
Fig. 4. Measured RTT and the calculated RTOs
This experiment has demonstrated that fuzzy inference can be applied to predicting the TCP timeout and retransmission value. Secondly, it is obvious that the RTO calculated by the fuzzy estimator is smaller than that calculated by Karn's algorithm, which indicates that the RTO calculated using fuzzy inference is a finer estimate. The fuzzy estimator also becomes ready to predict the RTOs quickly: the RTO drops so fast that it gives a much better estimation within 3 seconds. The RTO calculated by our fuzzy strategy is more sensitive to the dynamics of the measured RTTs, while the RTO given by Karn's method is less sensitive to the changes of the
RTTs. This insensitivity might make TCP inefficient in sending packets and possibly result in congestion in the routers.

4.2 Simulated Networks

All the simulations were conducted using the NS2 network simulator [10] as the platform. TCP-Reno is adopted.

Simulation Topology

We simulate environments where a bottleneck link lies between a premises on the client side and an edge router on the ISP side, as in reality. In fact, such links are among the most cost-sensitive and bandwidth-constrained components in the Internet, and remain a primary concern of an ISP. The dumbbell topology used in the simulations is depicted in Figure 5. There are two servers, two clients, and three Internet routers. The bottleneck link capacity in our simulations is set to 2 Mb, while the capacity of the other links is either 10 Mb or 100 Mb, as shown in the figure. R1 is a core router, while R2 is an edge router and R3 is a premises router. One-way TCP traffic has been used in the simulations. There are two kinds of traffic, one from server s1 to client c1, and the other from server s2 to client c2. The propagation delays of these two traffic types are 44 ms and 80 ms, respectively. The AQM schemes are implemented in router R2 to conduct the performance comparison with DropTail. The output queue buffer of router R2 is 300 packets. A dropping strategy is used to inform the TCP senders.
Fig. 5. Network topology
Traffic Pattern

It is well known [11] that Internet traffic tends to be made up of a large number of quite small flows, a small number of very large flows, and not much in between. This observation follows from another well accepted model of Internet traffic, the Poisson Pareto Burst Process (PPBP) model [12, 13]. In this paper the PPBP traffic model is generated by making the inter-arrival times of the two kinds of traffic follow a negative exponential distribution and the file sizes of each flow follow the Pareto distribution. In our simulations, the Pareto distribution has an average flow size of 12 packets (1000 bytes per packet) and
with a shape parameter of 1.2. We vary the inter-arrival time to get different traffic loads on the network. With the topology shown in Figure 5, for example, the traffic load is approximately 80%, 100% and 125% of the bottleneck link capacity when the inter-arrival time variable λ is 8, 10 and 12.5 respectively, using the following formula:

    traffic_load = ((40 + afs × 1040) × 8 × n) / (λ × c)   (8)
We explain the above formula as follows. SYN control packets are 40 bytes, and each data packet is 1040 bytes, with 1000 bytes of data and 40 bytes of header. afs stands for the average flow size. The variable n is the number of traffic types, and in our case it is two: one from server s1 to client c1, and the other from server s2 to client c2. The variable c is the bottleneck link capacity, and we have chosen c as 2 Mb.
Fig. 6. Traffic pattern with λ = 10
Before conducting each simulation, the traffic generated by the method described above was checked in order to find a simulation duration long enough to obtain PPBP traffic. For example, when the variable λ is 10, we found that 400 s is an appropriate simulation time interval and that the generated traffic is approximately PPBP. Figure 6 shows the traffic pattern using a log scale on the axes.

Simulation Results

The simulations were carried out with three scenarios: lambda = 8, lambda = 10, and lambda = 12.5. In each scenario we randomly choose one TCP connection as a sample to examine the performance of the proposed fuzzy estimator in comparison with Karn's algorithm. The simulation results are shown in Figure 7, Figure 8, and Figure 9.

• Scenario 1: With the parameter lambda = 8, the traffic load is 80%, which indicates that the network is relatively relaxed and not much congested. In
this case there is considerable oscillation in the RTT, caused by the bursty nature of the Internet. The difference between the RTT and the RTO is fairly big.
Fig. 7. where λ = 8
The TCP RTO predicted by Karn's algorithm nevertheless converges at the late stage of the experiment.

• Scenario 2: With the parameter lambda = 10, the traffic load on the network is heavy. In this case the RTTs are larger than those in the previous scenario, and with smaller oscillation.
Fig. 8. where λ = 10
As in scenario 1, the difference between the RTT and the RTO is fairly big, and the RTO predicted by Karn's algorithm converges at the late stage of the experiment.

• Scenario 3: With the parameter lambda = 12.5, the traffic load on the network is very heavy. The output buffer at the bottleneck is full most of the time, and thus the RTTs stay steady.
Fig. 9. where λ = 12.5
For this relatively congested network, the RTTs and the RTOs predicted by Karn's algorithm get closer to each other. All the results indicate that the proposed fuzzy estimator is able to follow the tendency of the RTT at a certain desired distance. In addition, for the congested network where the parameter lambda = 12.5, the fuzzy estimator has also reduced the TCP timeout, which is obvious in the early phase of the experiment. In the later phase the fuzzy estimator performed no worse than TCP-Reno on average, although occasionally the TCP timeout is a little bigger.
5 Conclusion

This paper has presented a fuzzy estimator for TCP timeout and retransmission. The main feature of this method is the application of fuzzy logic prediction. It has been shown that the performance of the fuzzy inference system is finer than that of the conventional methods; in particular, the TimeOut calculated by our fuzzy estimator is always less than that estimated by Karn's algorithm. The fuzzy estimator for the TCP timeout has also been tested on a set of simulated networks with different traffic loads. The fuzzy estimator produced favourable results,
although the improvement to the TCP timeout was not as obvious as on the real network. This research can be extended in the following directions:

1. The input variables for the fuzzy system can be made more specific, which means we could use the RTT directly.
2. The fuzzy rules can be refined using the max-product.
3. The fuzzy logic can be applied to other TCP algorithms such as slow start, fast recovery and congestion window control.
References

1. Karn, P. and Partridge, C.: Improving round-trip time estimates in reliable transport protocols. Computer Communication Review, vol. 17, no. 5, pp. 2-7, August 1987.
2. Lim, H. H. and Qiu, B.: Performance improvement of TCP using fuzzy logic prediction. Proceedings of the 2001 International Symposium on Intelligent Signal Processing and Communication Systems, Nashville, Tennessee, USA, November 20-21, 2001, pp. 152-156.
3. Kosko, B.: Neural Networks and Fuzzy Systems. Englewood Cliffs, N.J.: Prentice Hall, Inc., 1992.
4. Comer, D. E. and Stevens, D. L.: Internetworking with TCP/IP: Design, Implementation and Internals, vol. 3, Prentice-Hall, Inc., 1994.
5. Zadeh, L. A.: Outline of a new approach to the analysis of complex systems and decision processes. IEEE Transactions on Systems, Man and Cybernetics, SMC-3, pp. 28-44, 1973.
6. Zimmermann, H.-J.: Fuzzy Set Theory and Its Applications. 3rd edition, Kluwer Academic Publishers, Boston, 1996.
7. Yager, R. and Filev, D. P.: Essentials of Fuzzy Modeling and Control. John Wiley & Sons, Inc., 1994.
8. Jacobson, V., Leres, C. and McCanne, S.: Freeware tcpdump. ftp://ftp.ee.lbl.gov
9. Peterson, L. L. and Davie, B. S.: Computer Networks. Morgan Kaufmann Publishers, 2000.
10. McCanne, S. and Floyd, S.: Network simulator ns-2. http://www.isi.edu/nsman/ns, 1997.
11. Paxson, V. and Floyd, S.: Wide-area traffic: The failure of Poisson modeling. IEEE/ACM Transactions on Networking, 3(3), pp. 226-244, 1995.
12. Likhanov, N., Tsybakov, B. and Georganas, N. D.: Analysis of an ATM buffer with self-similar ("fractal") input traffic. In Proceedings, IEEE Infocom 1995, pp. 1-15, April 1995.
13. Addie, R. G., Neame, T. M. and Zukerman, M.: Performance evaluation of a queue fed by a Poisson Pareto burst process. Computer Networks, 40:377-397, October 2002.
CHAPTER 23

Measuring User Satisfaction in Web Searching

M. M. Sufyan Beg and Nesar Ahmad
Department of Electrical Engineering, Indian Institute of Technology, Delhi, New Delhi, 110 016, India
{mmsbeg,nahmad}@ee.iitd.ac.in
Abstract- Search engines are among the most popular as well as useful services on the web. But a problem arises from the large number of search engines publicly available for the purpose: which one of them should be our first choice? To answer this crucial question, we need to know how these search engines compare. In this work, our aim is to assess the quality of the search results obtained through several popular search engines. We propose to collect the feedback of the user on a document in terms of the sequence in which he picks up the results, the time he spends at those documents, and whether or not he prints, saves, bookmarks, e-mails to someone or copies-and-pastes a portion of that document. The document importance weight is then calculated from the user feedback by a proposed expression. Alternatively, each component of the user feedback vector is converted to a preference relation. These individual preference relations are then aggregated using the Linguistic Ordered Weighted Averaging (LOWA) technique. The combined preference ordering is then obtained from the combined preference relation using the Quantifier Guided Dominance Degree (QGDD). The Spearman Rank Order Correlation Coefficient (rs) is found between the combined preference ordering and the initial result listing of the search engine. We repeat this procedure for an ad-hoc set of queries and take the average of rs. The resulting average value of rs is the required measure of user satisfaction in web searching. We show our results pertaining to 15 queries and 7 public search engines, namely, AltaVista, DirectHit, Excite, Google, HotBot, Lycos and Yahoo.

Keywords- Search Engines, World Wide Web, User Satisfaction, Search Quality Measure, Ordered Weighted Averaging.

1 Introduction

Resource discovery on the Internet relies heavily on fast and good quality document search engines. A number of search engines are available to Internet users today and more are likely to appear in the future. However, the results for the same query from different discovery engines vary considerably. We made an effort in [1] to quantify the quality of web search results. We employ a subjective approach, in which we measure the "satisfaction" a user gets when presented with search results. We watch the actions of the user on the search results presented in response to his query, and infer the feedback of the user therefrom. The
1 Introduction Resource discovery on the Internet relies heavily on fast and good quality document search engines. A number of search engines are available to Internet users today and more are likely to appear in the future. But, the results for the same query from different discovery engines vary considerably. We made an effort in [1] to quantify the quality of web search results. We employ a subjective approach, in which we measure the "satisfaction" a user gets when presented with search results. We watch the actions of the user on the search results presented in response to his query, and infer the feedback of the user therefrom. The M.M.S. Beg and N. Ahmad: Measuring User Satisfaction in Web Searching, Studies in Computational Intelligence (SCI) 2, 321–336 (2005) c Springer-Verlag Berlin Heidelberg 2005 www.springerlink.com
322
M.M.S. Beg and N. Ahmad
implicit ranking thus provided by the user is compared with the original ranking given by the search engine. The correlation coefficient obtained this way is averaged for a set of queries. The resulting average value is the required quantitative measure of the search quality (SQM). We also propose an alternative approach of gauging user satisfaction from the user feedback vector by using fuzzy theory. We begin our discussion with a brief review of related work in section 1.1. The technique of obtaining the user feedback and then measuring the search quality will be discussed in section 2. Section 3 will discuss our experiments and results. Finally, we conclude in section 4.
1.1 Related Work

To the best of our knowledge, no attempt has been made to quantify the user's satisfaction with search results. However, efforts have been made to compare the performance of various search engines by other means. The underlying principle has been to collect a uniform sample of web pages by carrying out random walks on the web. This uniform sample is then used to measure the size of the indices of the various search engines. This index size is an indirect means of estimating the performance of a search engine: the larger the index size, the greater the web coverage and the more likely the emergence of good search results. Some of the efforts in this direction are given in [2], [3] and [4]. The relative size and overlap of search engines has also been found in [5], but this time, instead of random walks, random queries were used. These random queries are generated from a lexicon of about 400,000 words, built from a broad crawl of roughly 300,000 documents in the Yahoo hierarchy. In [6] and [7], the same goal, viz. comparing the search engines, has been achieved using a standard query log like that of the NEC Research Institute. In another instance [8], a test data set has been made available for the evaluation of web search systems and techniques by freezing an 18.5 million page snapshot of a part of the web. One of the recent works in this direction appears in [9]. For two different sets of ad-hoc queries, the results from AltaVista, Google and InfoSeek are obtained. These results are automatically evaluated for relevance on the basis of the vector space model, which is a content-based technique with no apparent consideration of the satisfaction of the user. These results are found to agree with a manual evaluation of relevance based on precision. Precision scores are given as 0, 1 or 2. But this precision evaluation is similar to a form-filling exercise, which may turn out to be too demanding for users. Most users may not be willing to waste time filling out such a form. Even if we provide some kind of incentive for filling out these forms, the users might provide careless or even intentionally wrong feedback, just to bag that incentive. A precision evaluation of search engines is reported in [14]. But "precision", being just the ratio of retrieved documents that are judged relevant, does not say anything about the ranking of the relevant documents in the search results. Using only precision evaluation, other important aspects of web search evaluation, such as recall, response time and web coverage, are also missed. Here, with the quantification of "user satisfaction" as a whole, we aspire to get a complete picture of web search evaluation. In fact, it is acknowledged in [14] itself that the major benefit of subjective evaluation of web searching is accuracy.
2 SEARCH QUALITY
Let us begin with some useful definitions.
Definition 1. Given a universe U and T ⊆ U, an ordered list (or simply, a list) l with respect to U is given as l = [d_1, d_2, ..., d_|T|], with each d_i ∈ T, and d_1 ≻ d_2 ≻ ... ≻ d_|T|, where "≻" is some ordering relation on T. Also, for i ∈ U ∧ i ∈ l, let l(i) denote the position or rank of i, with a higher rank having a lower numbered position in the list. We may assign a unique identifier to each element in U, and thus, without loss of generality, we may take U = {1, 2, ..., |U|}.
Definition 2. Full List: If a list l contains all the elements in U, then it is said to be a full list.
Example 1. A full list l given as [c, d, b, a, e] has the ordering relation c ≻ d ≻ b ≻ a ≻ e. The universe U may be taken as {1, 2, 3, 4, 5} with, say, a ≡ 1, b ≡ 2, c ≡ 3, d ≡ 4 and e ≡ 5. With such an assumption, we have l = [3, 4, 2, 1, 5]. Here l(3) ≡ l(c) = 1, l(4) ≡ l(d) = 2, l(2) ≡ l(b) = 3, l(1) ≡ l(a) = 4, l(5) ≡ l(e) = 5.
Definition 3. Partial List: A list l containing elements which are a strict subset of U is called a partial list. We have a strict inequality |l| < |U|.
Definition 4. Spearman Rank Order Correlation Coefficient: Let the full lists [a_0, a_1, ..., a_{N-1}] and [b_0, b_1, ..., b_{N-1}] be the two rankings for some query Q. The ranking [a_0, a_1, ..., a_{N-1}] can be partitioned into g_a groups, where each group has rankings that are in tie. Let the number of elements in these groups be u_1, u_2, ..., u_{g_a}. Similarly, the ranking [b_0, b_1, ..., b_{N-1}] can also be partitioned into g_b groups. Let the number of elements in each group be v_1, v_2, ..., v_{g_b}. We define the Spearman rank-order correlation coefficient (r_s) as follows:

r_s = [ (1/6)(N^3 − N) − Σ_{k=0}^{N−1} (a_k − b_k)^2 − U' − V' ] / sqrt{ [ (1/6)(N^3 − N) − 2U' ] [ (1/6)(N^3 − N) − 2V' ] }

where

U' = (1/12) Σ_{k=1}^{g_a} (u_k^3 − u_k)   and   V' = (1/12) Σ_{k=1}^{g_b} (v_k^3 − v_k).
The Spearman rank-order correlation coefficient (rs) is a measure of closeness of two rankings. The coefficient rs ranges between -1 and 1. When the two rankings are identical, rs = 1, and when one of the rankings is the inverse of the other, then rs = -1.
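For concreteness, the tie-corrected coefficient of Definition 4 can be computed as in the following sketch. This is our own Python rendering of the formula; the chapter gives no code, and the function and variable names are ours.

```python
import numpy as np

def spearman_with_ties(a, b):
    """Spearman rank-order correlation between two full rankings a and b
    (ranks of the same N items), with the tie correction of Definition 4."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = len(a)
    d2 = np.sum((a - b) ** 2)                    # sum of squared rank differences

    def tie_term(r):
        # (1/12) * sum over tie groups of (u^3 - u)
        _, counts = np.unique(r, return_counts=True)
        return np.sum(counts ** 3 - counts) / 12.0

    u, v = tie_term(a), tie_term(b)
    base = (n ** 3 - n) / 6.0
    return (base - d2 - u - v) / np.sqrt((base - 2 * u) * (base - 2 * v))

# Identical rankings give +1, reversed rankings give -1.
print(spearman_with_ties([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))   # 1.0
print(spearman_with_ties([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))   # -1.0
```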
2.1 User Feedback Vector
The underlying principle of our approach to the performance evaluation of search engines is to measure the "satisfaction" a user gets when presented with the search results. For this, we need to monitor the response of the user to the search results presented before him. We characterize the feedback of the user by a vector (V, T, P, S, B, E, C), which consists of the following: (a) The sequence V in which the user visits the documents, V = (v1, v2, …, vN). If document i is the kth document visited by the user, then we set vi = k. If a document i is not visited by the user at all before the next query is submitted, the corresponding value of vi is set to -1. (b) The time ti that a user spends examining the document i. We denote the vector (t1, t2, …, tN) by T. For a document that is not visited, the corresponding entry in the array T is 0. (c) Whether or not the user prints the document i. This is denoted by the Boolean pi. We shall denote the vector (p1, p2, …, pN) by P. (d) Whether or not the user saves the document i. This is denoted by the Boolean si. We shall denote the vector (s1, s2, …, sN) by S. (e) Whether or not the user bookmarks the document i. This is denoted by the Boolean bi. We shall denote the vector (b1, b2, …, bN) by B. (f) Whether or not the user e-mails the document i. This is denoted by the Boolean ei. We shall denote the vector (e1, e2, …, eN) by E. (g) The number of words ci that the user copies and pastes elsewhere from the document i. We denote the vector (c1, c2, …, cN) by C. The motivation behind collecting this feedback is the belief that a well-educated user is likely to select the more appropriate documents early in the resource discovery process. Similarly, the time that a user spends examining a document, and whether or not he prints, saves, bookmarks, e-mails it to someone else or copies and pastes a portion of the document, indicate the level of importance that document holds for the specified query.
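The feedback vector can be represented directly as a small record per query. The sketch below is our own illustrative data structure, shown only to make the seven components concrete; the field names are ours, not the chapter's.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FeedbackVector:
    """Per-query user feedback over the N listed documents (Sect. 2.1).
    Our own illustrative record; field names mirror (V, T, P, S, B, E, C)."""
    visit_order: List[int]     # V: v_i = k if document i was the k-th visited, -1 if untouched
    view_time: List[float]     # T: time spent on document i (0 if not visited)
    printed: List[bool]        # P
    saved: List[bool]          # S
    bookmarked: List[bool]     # B
    emailed: List[bool]        # E
    words_copied: List[int]    # C: words copied and pasted from document i

fb = FeedbackVector(visit_order=[2, 1, -1], view_time=[5.0, 50.0, 0.0],
                    printed=[False, True, False], saved=[False, False, False],
                    bookmarked=[False, False, False], emailed=[False, False, False],
                    words_copied=[0, 40, 0])
print(fb.visit_order)
```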
2.2 The Search Quality Measure (SQM) When feedback recovery is complete, we compute the following weighted sum Vj for each document j selected by the user.
Vj = wV · (1/2^(vj − 1)) + wT · (tj / tjmax) + wP · pj + wS · sj + wB · bj + wE · ej + wC · (cj / cjtotal)    (1)

where tjmax represents the maximum time a user is expected to spend in examining the document j, and cjtotal is the total number of words in the document j. Here, wV, wT, wP, wS,
wB, wE and wC, all lying between 0 and 1, give the respective weights we want to give to each of the seven components of the feedback vector. The sum Vj represents the importance of document j. The intuition behind the formulation in equation (1) is as follows. The importance of the document should decrease monotonically with the postponement being afforded by the user in picking it up. The more the time spent by the user in glancing through the document, the more important it must be for him. If the user is printing the document, or saving it, or book-marking it, or e-mailing it to someone else, or copying and pasting a portion of the document, it must have some importance in the eyes of the user. A combination of the above seven factors by simply taking their weighted sum gives the overall importance the document holds in the eyes of the user. As regards "the maximum time a user is expected to spend in examining the document j", we clarify that this is taken to be directly proportional to the size of the document. We assume that an average user reads at a speed of about 10 bytes per second. This includes the pages containing text as well as images. So a document of 1 kB size is expected to take a minute and 40 seconds to go through. This default reading speed of 10 bytes per second may be set differently by the user, if he so wishes. It may be noted that, depending on his preferences and practice, the user can set the importance of the different components of the feedback vector. For instance, if a user does not have a printer at his disposal, then there is no sense in setting the importance weight (wP) corresponding to the printing feedback component (P). Similarly, if a user has a dialup network connection, and so is in the habit of saving the relevant documents rather than spending time on them while online, it would be better to give a higher value to wS, and a lower value to wT. In such a case, lower values may also be given to wP, wE and wC, as he would not usually be printing, e-mailing or copying and pasting a document at a stretch while online. So, after explaining the modalities to him, the user is requested to modify the otherwise default values of 1 for all these weights. It may, however, be noted that the component of the feedback vector corresponding to the sequence of clicking always remains the prime one, and so wV must always be 1. Now, sorting the documents on the descending values of Vj will yield a sequence Σ. Let the full list U be the sequence in which the documents were initially short-listed. Without loss of generality, it could be assumed that U = (1, 2, 3, …, N), where N is the total number of documents listed in the result. We compare the sequences Σ and U, and find the Spearman Rank Order Correlation Coefficient (rs). We repeat this procedure for a representative set of queries and take the average of rs. The resulting average value of rs is the required quantitative measure of the search quality (SQM). The above procedure is illustrated in Figure 1.
Fig. 1. Search Quality Evaluation
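A minimal sketch of the SQM computation for a single query follows, assuming every listed document was visited (so that neither tie handling nor the reverse-order completion of Sect. 2.3 is needed). All function names and numbers are ours, and the plain no-ties form of rs is used.

```python
import numpy as np

def document_importance(v, t, t_max, p, s, b, e, c, c_total,
                        w=(1, 1, 1, 1, 1, 1, 1)):
    """Importance V_j of one document, equation (1). Argument names are ours."""
    wV, wT, wP, wS, wB, wE, wC = w
    return (wV / 2 ** (v - 1) + wT * t / t_max + wP * p + wS * s
            + wB * b + wE * e + wC * c / c_total)

def rs_for_query(importance):
    """Spearman r_s between the user-induced ordering (descending V_j) and the
    engine's ordering 1..N; plain formula, assuming no ties among the V_j."""
    imp = np.asarray(importance, dtype=float)
    n = len(imp)
    user_rank = np.empty(n, dtype=int)
    user_rank[np.argsort(-imp, kind="stable")] = np.arange(1, n + 1)
    engine_rank = np.arange(1, n + 1)
    d2 = np.sum((user_rank - engine_rank) ** 2)
    return 1 - 6 * d2 / (n ** 3 - n)

# Three visited results; SQM is the average of r_s over a set of queries.
V = [document_importance(v=2, t=50, t_max=60, p=1, s=0, b=0, e=0, c=40, c_total=400),
     document_importance(v=1, t=20, t_max=60, p=0, s=0, b=0, e=0, c=0, c_total=350),
     document_importance(v=3, t=5, t_max=60, p=0, s=0, b=0, e=0, c=0, c_total=500)]
print(rs_for_query(V))   # 1.0 here: the ordering induced by V_j matches the engine's
```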
2.3 Some Finer Points in SQM
We must note here that it is a very common practice that the user views only those documents whose snippets, displayed by the search engine, he finds worth viewing. This gives only the ranking of the documents viewed by the user. That is to say, the list Σ would almost invariably be a partial list. In such a case, it is assumed that the rest of the documents are implicitly declared irrelevant by the user, and so they are sequenced in reverse order to complete the sequence Σ, so as to penalize the search engine for displaying irrelevant information. For instance, suppose the user looks at the documents in such a way that sorting the documents based on Vj gives the sequence 5, 2 and 7. Then, for a total of ten documents, the sequence Σ would be obtained as 5, 2, 7, 10, 9, 8, 6, 4, 3 and 1. As seen in Section 3, this harsh step of penalization brings down the search quality measures of the search engines drastically. In an effort to moderate this step, we used an average ranking method. Let there be a partial list lj and a full list l, with the number of elements in them being |lj| and |U|, respectively. In order to evaluate the Spearman Rank Order Correlation Coefficient between lj and l, we first complete lj into a full list as follows.
lj(i) = { unchanged,                           if i ≤ |lj|,
        { x such that x ∈ U and x ∉ lj,        otherwise.
Next, in order to complete the average ranking method, we modify the positions in the full list l as follows.
l(i) = { unchanged,                                           if i ≤ |lj|,
       { ( Σ_{k=|lj|+1}^{|U|} l(k) ) / ( |U| − |lj| ),         otherwise.
Now, we can find the Spearman Rank Order Correlation Coefficient (rs) between the lists l and lj as explained in Definition 4.
Example 2. For |U| = 5, let the full list be l = {5,4,1,3,2} and the partial list lj with |lj| = 3 be lj = {2,1,4}. We shall first complete lj into a full list as lj = {2,1,4,3,5} and also modify l as l = {5,4,1,2.5,2.5}.
An extreme case would be when no matching document is found by a search engine for a given query. In such a case, the engine must be penalized for the lack of information by assuming the sequence Σ to be just the reverse of U, and hence the Spearman rank-order correlation coefficient (rs) would be taken as -1 for that case. Similarly, if a search engine lists a document location which, when accessed, cannot be found, either due to a "404 error" or the document having moved elsewhere, the engine is penalized for this outdated information by taking its time fraction tj/tjmax as well as its word fraction cj/cjtotal to be zero. In the following section, we use fuzzy techniques to provide an alternate procedure for evaluating user satisfaction with the search results from the user feedback vector.
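The completion and average-ranking steps can be sketched as follows; this is our own helper, not the authors' code, and it reproduces Example 2.

```python
def average_rank_completion(l_full, l_partial, universe):
    """Complete the user's partial list and average the tail positions of the
    engine's full list, as in Sect. 2.3."""
    # 1. Complete the partial list by appending the missing items of U (in U's order).
    completed = list(l_partial) + [x for x in universe if x not in l_partial]
    # 2. Replace the entries of l beyond position |l_partial| by their average.
    k = len(l_partial)
    tail = l_full[k:]
    avg = sum(tail) / len(tail) if tail else None
    modified = list(l_full[:k]) + [avg] * len(tail)
    return completed, modified

# Example 2 of the chapter: U = {1,...,5}, l = [5,4,1,3,2], lj = [2,1,4].
print(average_rank_completion([5, 4, 1, 3, 2], [2, 1, 4], [1, 2, 3, 4, 5]))
# -> ([2, 1, 4, 3, 5], [5, 4, 1, 2.5, 2.5])
```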
2.4 Fuzzy Search Quality Measure (FSQM)
In equation (1), we combined all seven components of the feedback vector in a heuristic way to get the importance weight of each document. However, there remains the question of what numerical values we should assign in practice to the weights wV, wT, wP, wS, wB, wE and wC. In other words, what respective weightages should we give to each of the seven components of the feedback vector? We address this problem with a fuzzy technique. We propose to combine the seven components of the user feedback vector in such a way that the need for their respective weightages is replaced by some appropriate fuzzy linguistic quantifier [10]. This approach results in what we call the Fuzzy Search Quality Measure (FSQM), as depicted in Figure 2. The procedure of FSQM is similar to that of SQM, except that the User Feedback Vectors are transformed into a set of fuzzy preference relations, which are then aggregated through a series of fuzzy linguistic quantifiers, to result finally in a user preference ordering.
Fig. 2. Fuzzy Search Quality Measure

Definition 5. Preference Ordering of the Alternatives: A full list, which is a permutation over the set {1, 2, …, N}, giving an ordered vector of alternatives, from best to worst.
Definition 6. Utility Function Ordering: Preferences on some X = {x1, x2, …, xN} are given as a set of N utility values, {ui; i = 1, …, N}, ui ∈ [0,1], where ui represents the utility evaluation given to the alternative xi.
Definition 7. Fuzzy Preference Relation [11]: Preferences on some X = {x1, x2, …, xN} are described by a fuzzy preference relation, PX ⊂ X × X, with membership function μ_PX : X × X → [0,1], where μ_PX(xi, xj) = pij denotes the preference degree of the alternative xi over xj. pij = 1/2 indicates indifference between xi and xj, pij = 1 indicates that xi is unanimously preferred over xj, and pij > 1/2 indicates that xi is preferred over xj. It is usual to assume that pij + pji = 1 and pii = 1/2. As shown in Figure 2, we first convert the preference ordering given by the sequence vector V = (v1, v2, …, vN) into a fuzzy preference relation RV as [13]:
RV(i, j) = (1/2) (1 + (vj − vi)/(N − 1))    (2)

Example 3. For V = (2,1,3,6,4,5), we see that N = 6. Using equation (2), we get:

RV =
[ 0.5  0.4  0.6  0.9  0.7  0.8 ]
[ 0.6  0.5  0.7  1.0  0.8  0.9 ]
[ 0.4  0.3  0.5  0.8  0.6  0.7 ]
[ 0.1  0.0  0.2  0.5  0.3  0.4 ]
[ 0.3  0.2  0.4  0.7  0.5  0.6 ]
[ 0.2  0.1  0.3  0.6  0.4  0.5 ]
Next, we convert the utility values given by the normalized time vector T/Tmax = (t1/t1max, t2/t2max, …, tN/tNmax), the normalized cut-paste vector C/Cmax = (c1/c1max, c2/c2max, …, cN/cNmax), and all the remaining vectors, namely, P = (p1, p2, …,
pN), S = (s1, s2, …, sN), B = (b1, b2, …, bN), E = (e1, e2, …, eN), into the corresponding fuzzy preference relations as [13]:
RX(i, j) = xi² / (xi² + xj²)    (3)
where X may be substituted by T or P or S or B or E or C, as the case may be. Example 4. For T /Tmax = (0.3,0.9,0.4,0.2,0.7,0.5), we see that N=6. Using equation (3), we get:
RT =
[ 0.50  0.10  0.36  0.69  0.16  0.26 ]
[ 0.90  0.50  0.84  0.95  0.62  0.76 ]
[ 0.64  0.16  0.50  0.80  0.25  0.39 ]
[ 0.31  0.05  0.20  0.50  0.08  0.14 ]
[ 0.84  0.38  0.75  0.92  0.50  0.66 ]
[ 0.74  0.24  0.61  0.86  0.34  0.50 ]
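Equations (2) and (3) translate directly into the following sketch (function names are ours); the printed matrices reproduce Examples 3 and 4.

```python
import numpy as np

def preference_from_sequence(v):
    """R_V(i, j) = (1/2)(1 + (v_j - v_i)/(N - 1)), equation (2)."""
    v = np.asarray(v, dtype=float)
    n = len(v)
    return 0.5 * (1 + (v[None, :] - v[:, None]) / (n - 1))

def preference_from_utilities(x):
    """R_X(i, j) = x_i^2 / (x_i^2 + x_j^2), equation (3); pairs of zero
    utilities are mapped to indifference (1/2)."""
    x = np.asarray(x, dtype=float)
    num = x[:, None] ** 2
    den = num + x[None, :] ** 2
    return np.divide(num, den, out=np.full_like(den, 0.5), where=den > 0)

print(np.round(preference_from_sequence([2, 1, 3, 6, 4, 5]), 2))                 # Example 3
print(np.round(preference_from_utilities([0.3, 0.9, 0.4, 0.2, 0.7, 0.5]), 2))    # Example 4
```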
This way, we would have a set of seven fuzzy preference relations, with a one-to-one correspondence with the seven components of the user feedback vector. What must follow is the aggregation of these individual preference relations into a combined preference relation (RA). This is achieved by what is called a Linguistic OWA [12] (Ordered Weighted Averaging), or LOWA, in short.
Definition 8. Relative (Fuzzy Linguistic) Quantifier: A relative quantifier, Q : [0,1] → [0,1], satisfies: Q(0) = 0, and there exists r ∈ [0,1] such that Q(r) = 1. In addition, it is non-decreasing if it has the following property: for all a, b ∈ [0,1], if a > b, then Q(a) ≥ Q(b). The membership function of a relative quantifier can be represented as:

Q(r) = { 0,                  if r < a,
       { (r − a)/(b − a),    if a ≤ r ≤ b,
       { 1,                  if r > b,

where a, b, r ∈ [0,1]. Some examples of relative linguistic quantifiers are shown in Figure 3, where the parameters (a, b) are (0.3, 0.8), (0, 0.5) and (0.5, 1), respectively.
Fig. 3. Relative Linguistic Quantifiers

More generally, Yager [12] computes the weights wi of the aggregation from the function Q describing the quantifier. In the case of a relative quantifier, with m criteria: wi = Q(i/m) − Q((i − 1)/m), i = 1, 2, …, m, with Q(0) = 0.
Example 5. For the number of criteria (m) = 7 and the fuzzy quantifier "most", with the pair (a = 0.3, b = 0.8), the corresponding LOWA operator would have the weighting vector w1 = 0, w2 = 0, w3 = 0.257143, w4 = 0.285714, w5 = 0.285714, w6 = 0.171429, w7 = 0. Now, we can find the combined preference relation as:

RA(i, j) = Σ_{k=1}^{7} wk · zk,
where zk is the kth largest element in the collection {RV(i,j), RT(i,j), RP(i,j), RS(i,j), RB(i,j), RE(i,j), RC(i,j)}.
Example 6. Let us assume that RV(5,2) = 0.342, RT(5,2) = 0.248, RP(5,2) = 0.0, RS(5,2) = 1.0, RB(5,2) = 0.0, RE(5,2) = 0.0, RC(5,2) = 0.637. Then, using the fuzzy majority criterion with the fuzzy quantifier "most" with the pair (0.3, 0.8), and the corresponding LOWA operator with the weighting vector as in Example 5, for the combined preference relation, RA(5,2) = [w1, w2, w3, w4, w5, w6, w7] · [Descending_Sort(RV(5,2), RT(5,2), RP(5,2), RS(5,2), RB(5,2), RE(5,2), RC(5,2))]^T = 0.0 × 1.0 + 0.0 × 0.637 + 0.257143 × 0.342 + 0.285714 × 0.248 + 0.285714 × 0.0 + 0.171429 × 0.0 + 0.0 × 0.0 = 0.1588.
After the combined preference relation RA has been found using the LOWA operator and taking into account all the seven components of the user feedback vector, we need to convert RA into the combined utility function ordering, YC. We can do this using what is called the Quantifier Guided Dominance Degree (QGDD) [13], as follows.
YC(i) = Σ_{k=1}^{N} wk · zk,
where wk is the kth element of the weighting vector of the LOWA operator, and zk is the kth largest element of the row RA(i, j), j = 1, 2, …, N.
Example 7. Let N = 6. Then, for the fuzzy quantifier "as many as possible", with the pair (a = 0.5, b = 1), the corresponding LOWA operator would have the weighting vector W = [0, 0, 0, 1/3, 1/3, 1/3]. For, say, RA(4, j) = [0.332, 0.228, 0.260, 0.500, 0.603, 0.598], we would have YC(4) = 0.0 × 0.603 + 0.0 × 0.598 + 0.0 × 0.500 + 0.333 × 0.332 + 0.333 × 0.260 + 0.333 × 0.228 = 0.273. If we sort the elements of YC in descending order, we get the combined preference ordering OC.
Example 8. For YC = (0.437, 0.517, 0.464, 0.273, 0.286, 0.217), sorting in descending order gives OC = (2, 3, 1, 5, 4, 6). Now, similar to the procedure of SQM, the combined preference ordering OC is compared with U, the sequence in which the documents were initially short-listed by the search engine. The Spearman Rank Order Correlation Coefficient (rs) is found between the preference orderings OC and U. We repeat this procedure for a representative set of queries and take the average of rs. The resulting average value of rs is the required fuzzy measure of the search quality (FSQM).
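The whole FSQM aggregation (LOWA weights from a relative quantifier, the combined relation RA, the QGDD values YC, and the ordering OC) can be sketched as below. This is our own illustrative implementation; the names and array layout are assumptions, and ties in YC are broken arbitrarily by the sort.

```python
import numpy as np

def quantifier(r, a, b):
    """Relative linguistic quantifier Q(r) with parameters (a, b), Definition 8."""
    if r < a:
        return 0.0
    if r > b:
        return 1.0
    return (r - a) / (b - a)

def lowa_weights(m, a, b):
    """OWA weights w_i = Q(i/m) - Q((i-1)/m) for m criteria (Yager [12])."""
    return np.array([quantifier(i / m, a, b) - quantifier((i - 1) / m, a, b)
                     for i in range(1, m + 1)])

def owa(values, weights):
    """Ordered weighted average: weights applied to the values sorted descending."""
    return float(np.dot(weights, np.sort(np.asarray(values, dtype=float))[::-1]))

def fsqm_ordering(relations, most=(0.3, 0.8), many=(0.5, 1.0)):
    """Combine the per-component N x N preference relations with 'most',
    then rank alternatives by QGDD under 'as many as possible'."""
    rel = np.stack([np.asarray(r, dtype=float) for r in relations])   # (7, N, N)
    w_most = lowa_weights(rel.shape[0], *most)
    ra = np.apply_along_axis(lambda z: owa(z, w_most), 0, rel)        # combined R_A
    w_many = lowa_weights(ra.shape[0], *many)
    yc = np.array([owa(row, w_many) for row in ra])                   # QGDD values Y_C
    return np.argsort(-yc) + 1                                        # ordering O_C (1-based)

print(lowa_weights(7, 0.3, 0.8))   # Example 5: [0, 0, 0.2571, 0.2857, 0.2857, 0.1714, 0]
```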
3 Experiments and Results
We experimented with queries on seven popular search engines, namely, AltaVista, DirectHit, Excite, Google, HotBot, Lycos and Yahoo. It may be noted here that our emphasis is more to demonstrate the procedure of measuring user satisfaction than to carry out the actual performance measurement of these search engines. It is for this reason, as well as to simplify our experiments, that we have obtained all our results with the weights in equation (1) being wV=1, wT=1, wP=1, wS=0, wB=0, wE=0 and wC=0. In other words, we are taking into account just the three components V, T and P, out of the seven components of the user feedback vector. For instance, the observation corresponding to the query "document categorization query generation" is given in Table 1.

Table 1. Results for the query "document categorization query generation" (Sequence and Document together give the document preference V)

Search Engine   Sequence   Document   Fraction of Time (T)   Printed (P)
AltaVista       1          2          0.0                    0
                2          1          0.0011                 0
DirectHit       1          10         0.00091                0
Excite          1          7          0.012                  0
Google          1          1          0.092                  0
                2          2          0.88                   0
                3          3          0.0                    0
                4          5          0.94                   1
HotBot          1          1          0.092                  0
                2          6          0.88                   0
Lycos           1          1          0.0                    0
                2          2          0.88                   0
                3          7          0.92                   0
Yahoo           1          1          0.092                  0
                2          2          0.88                   0
                3          3          0.0                    0
                4          5          0.92                   0
                5          9          0.47                   0
Table 1 shows that, from the results of AltaVista, the second document was picked up first by the user, but zero time was spent on it, most probably because the document was not at this location anymore, and so no printout could be taken. The second pick-up was made by the user on what was listed as the first document by AltaVista; the document was
read by the user for just 0.11% of the time required to read it completely, and no printout was taken once again. In this way, the values corresponding to the rest of the search engines are shown in Table 1 for the query "document categorization query generation". We experimented with 15 queries in all. These queries are listed in Table 2 and their results are summarized in Table 3. This experiment was carried out in August 2002. The value of FSQM is obtained using the relative quantifier "most" with the parameters (a = 0.3, b = 0.8) for aggregating the individual preference relations, and then "as many as possible" with the parameters (a = 0.5, b = 1.0) for getting the combined preference ordering. The results given in Table 3 are graphically represented in Figure 4. We have taken these 15 queries for the sake of demonstrating our procedure; we could have taken many more. In fact, the actual measure would require a continuous evaluation by taking a running average over an appropriate window size of the successive queries being posed by the users.

Table 2. List of Test Queries
1. measuring search quality
2. mining access patterns from web logs
3. pattern discovery from web transactions
4. distributed association rule mining
5. document categorization query generation
6. term vector database
7. client-directory-server model
8. similarity measure for resource discovery
9. hypertextual web search
10. IP routing in satellite networks
11. focussed web crawling
12. concept based relevance feedback for information retrieval
13. parallel sorting neural network
14. spearman rank order correlation coefficient
15. web search query benchmark
From Figure 4, we see that Yahoo is the best, followed by Google, Lycos, HotBot, AltaVista, Excite and DirectHit, in that order. Both the SQM and the FSQM exhibit a similar pattern. We also observe that most of the publicly available search engines taken in this study are much below the users' expectations. That is the reason why all but one of them get a negative value of SQM averaged over all the queries. Moreover, we were also a bit harsh in penalizing these search engines for giving irrelevant results. As explained in Section 2.3, the documents that are not touched by the user at all are sequenced in reverse order to complete the sequence, so as to penalize the search engine for displaying irrelevant information. This harsh step has brought down the search quality measures of these search engines. In an effort to moderate this step, we adopted the average ranking method explained in Section 2.3. With this, the performance of all the engines improves fairly uniformly, as shown in Figure 5.
Fig. 4. Performance of Search Engines based on the three components of the user feedback vector (V, T, P)
Fig. 5. Performance of Search Engines based on the three components of the user feedback vector (V, T, P), with the sequence of untouched documents averaged

Comparing Figures 4 and 5, we see that both SQM and FSQM follow a similar pattern. We also see that Yahoo still stands out as the best of the lot, followed by Google, Lycos, HotBot, AltaVista and Excite, in that order. DirectHit, which appeared worst in Figure 4, has however improved its performance over AltaVista and Excite in Figure 5. This is because DirectHit was giving more irrelevant results, and so was penalized by the harsh measure taken for Figure 4; this penalty is eased substantially in Figure 5. Let us, however, reiterate that these rankings of the search engines are just a pointer to what to expect, and should not be taken as the last word as yet. Our aim in this work is just to bring out a procedure for ranking search engines. We have taken only a few ad hoc queries, 15 to be precise. For a more confident ranking of search engines, we need to have a more comprehensive set of test queries.
4 Conclusion
We have tried to quantify the search quality of a search system. The more satisfied a user is with the search results presented in response to his query, the higher the quality of the search system. The "user satisfaction" is gauged by the sequence in which he picks up the results, the time he spends at those documents, and whether or not he prints, saves, bookmarks, e-mails it to someone or copies-and-pastes a portion of that document. We have proposed a fuzzy technique based on the relative quantifiers "most" and "as many as possible" for the proper combination of the metrics of the user feedback. Our proposition is tested on seven public web search engines using 15 queries. It is observed that the fuzzy measure of search quality, FSQM, is in conformity with the other measure of search quality, SQM.
With this limited set of queries used herein, it has been found that Yahoo gives the best performance followed by Google, Lycos, HotBot, AltaVista, Excite and DirectHit. To say this with more confidence, we need to have a better set of queries. Our aim in this work is just to bring out an improved procedure for ranking search engines.
References
1. Beg M. M. S. and Ravikumar C. P., "Measuring the Quality of Web Search Results", Proc. 6th International Conference on Computer Science and Informatics - a track at the 6th Joint Conference on Information Sciences (JCIS 2002), March 8-13, 2002, Durham, NC, USA, pp. 324-328.
2. Henzinger M. R., Heydon A., Mitzenmacher M. and Najork M., "Measuring Index Quality Using Random Walks on the Web," Computer Networks, 31, 1999, pp. 1291-1303.
3. Henzinger M. R., Heydon A., Mitzenmacher M. and Najork M., "On Near Uniform URL Sampling," Proc. 9th International World Wide Web Conference (WWW9), May 2000, pp. 295-308.
4. Bar-Yossef Z., Berg A., Chien S., Fakcharoenphol J. and Weitz D., "Approximating Aggregate Queries about Web Pages via Random Walks," Proc. 26th VLDB Conference, Cairo, Egypt, 2000.
5. Bharat K. and Broder A., "A Technique for Measuring the Relative Size and Overlap of Public Web Search Engines," Proc. 7th International World Wide Web Conference (WWW7), April 1998, pp. 379-388.
6. Lawrence S. and Giles C. L., "Searching the World Wide Web," Science, 280(5360):98, 1998.
7. Lawrence S. and Giles C. L., "Accessibility of Information on the Web," Nature, 400:107-109, 1999.
8. Hawking D., Craswell N., Thistlewaite P. and Harman D., "Results and Challenges in Web Search Evaluation," Toronto '99, Elsevier Science, pp. 243-252.
9. Li L. and Shang Y., "A New Method for Automatic Performance Comparison of Search Engines," World Wide Web, Kluwer Academic, vol. 3, no. 4, Dec. 2000, pp. 241-247.
10. Herrera F., Herrera-Viedma E. and Verdegay J. L., "A Linguistic Decision Process in Group Decision Making," Technical Report DECSAI-94102, Department of Computer Science and Artificial Intelligence, University of Granada, Spain, Feb. 1994.
11. Shimura M., "Fuzzy Sets Concept in Rank-Ordering Objects," J. Math. Anal. Appl., vol. 43, 1973, pp. 717-733.
12. Yager R. R., "On Ordered Weighted Averaging Aggregation Operators in Multicriteria Decision Making," IEEE Trans. Systems, Man and Cybernetics, vol. 18, no. 1, January/February 1988, pp. 183-190.
13. Herrera-Viedma E., Herrera F. and Chiclana F., "A Consensus Model for Multiperson Decision Making with Different Preference Structures," Technical Report DECSAI-99106, Department of Computer Science and Artificial Intelligence, University of Granada, Spain, April 1999.
14. Shang Y. and Li L., "Precision Evaluation of Search Engines," World Wide Web: Internet and Web Information Systems, 5, 2002, pp. 159-173.
CHAPTER 24
Long-Range Prediction of Population by Sex, Age and District Based on Fuzzy Theories
Pyong Sik Pak¹ and Gwan Kim²
¹ Graduate School of Information Science and Technology, Osaka University, 2-1 Yamada-Oka, Suita, Osaka 565-0871, Japan, [email protected]
² Graduate School of Information Science and Technology, Osaka University, 2-1 Yamada-Oka, Suita, Osaka 565-0871, Japan, [email protected]
Abstract: A method to predict population by sex, age and district over a long-range period is proposed based on fuzzy theories. First, a fuzzy model is described which is composed of a set of "if-then" rules to estimate the total social increase in each of 402 districts. Specific premises and consequences of the rules were constructed based on actual data; the premises constitute fuzzy propositions and the consequences regression models. Second, a method to estimate the social increase by sex and age in each district is proposed based on a fuzzy clustering method for dealing with long-range socioeconomic changes in population migration. By the proposed methods, it became possible to predict the population by sex, age and district over a long-range period. Finally, results of the validity test of a constructed population model are presented.
Keywords: fuzzy modeling theory, Fuzzy c-Means, population, migration, social increase
1 Introduction
In 21st century Japan, various social problems, such as a rapidly aging population and a drastic decrease in the number of young people, are considered likely to arise. Estimation of the population, its age structure, and the degree of aging in each local administrative government (district) is indispensable for the future planning of various policies in each district. This paper describes a method to predict the population by sex and age in each district over a long-range period based on fuzzy theories. The Kansai region of Japan, consisting of seven prefectures composed of 402 districts, as shown in Fig. 1, was adopted as an example region. The objective time
periods adopted were 1980, 1985, and 1990 (five years were taken as one period) owing to the delay in procurement of necessary data and in the required adjustment work of a large amount of data.
Fig. 1. The Kansai region of Japan, consisting of seven prefectures composed of 402 districts
2 Outline of Prediction of Population by Sex, Age and District The population in each district can be forecasted by estimating its population migration (that is also called social increase) and closed population, which is calculated from birth and death. It is easy to estimate the future value of the closed population based on the fundamental equations in demography by using the population by sex and age, age-specific birth rate, and sex- and age-specific death rate [1]. Hence the problem of estimating the population is reduced to the one of estimating the social increase. Prediction of the social increase in each district, however, is not easy by using a conventional regression model (CRM), since it is difficult for CRMs to take into account the fact that the causes of migration vary according to districts and from year to year.
In the proposed method, the total social increase in each district is predicted based on a fuzzy modeling theory and then the social increase by sex, age and district is estimated by utilizing a fuzzy clustering method.
3 Prediction of the Total Social Increase Based on a Fuzzy Modeling Method
It is generally said that the factors of population migration depend on the possibility of procuring housing, job opportunities, the degree of convenience of the transportation system, the level of living environments, and so on. These factors vary across districts and over time. By taking these characteristics into account, in this research the total social increase is estimated by applying a fuzzy modeling method [2].

3.1 Outline of the Proposed Method
The procedure for estimation is composed of two steps. In the first step, all the samples are classified into several rules. Then, the degree of membership belonging to the nth rule, ωn, is calculated. The migration rate (which is defined as the ratio of the total social increase to the population at the previous period) and the total social increase were adopted as the variables of the premises in this research, considering that these two values determine the major characteristics of the social increase in each district. In the second step, regression models for estimating the total social increase of district j in rule n, denoted by y_n^j(K), are constructed by using various socioeconomic variables or indicators, x_i(K) (i = 1, 2, ...), which constitute the consequences, as shown in (1).

y_n^j(K) = b_n0 + Σ_i b_ni x_i(K).    (1)

Here, the superscript j denotes the district number (j = 1-402). The argument K designates the time, and K = 0 denotes the year 1980, K = 1 the year 1985, K = 2 the year 1990, and so on. The total social increase y^j(K) in district j at time K is obtained by weighting y_n^j(K) with ωn, as shown in (2).

y^j(K) = Σ_n ωn y_n^j(K) / Σ_n ωn.    (2)
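The rule combination of (1) and (2) can be sketched as follows. The chapter does not state how ω_n is obtained from the two premise membership functions, so the common choice of taking their minimum is assumed here; the data layout, function names and all numeric values are hypothetical, not the authors' implementation.

```python
import numpy as np

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], is 1 on [b, c], falls on [c, d].
    Breakpoints below are illustrative only; the actual ones are those of Fig. 3."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def predict_total_social_increase(x, rules, migration_rate, prev_increase):
    """Equations (1)-(2): per-rule regressions weighted by premise degrees.
    omega_n is assumed here to be the minimum of the two premise memberships."""
    x = np.asarray(x, dtype=float)
    num = den = 0.0
    for r in rules:
        omega = min(r["mf_rate"](migration_rate), r["mf_incr"](prev_increase))
        y_n = r["b0"] + float(np.dot(r["b"], x))        # equation (1)
        num += omega * y_n
        den += omega
    return num / den if den > 0 else 0.0                 # equation (2)

# Two hypothetical rules over two indicators (e.g. ihous and iept).
rules = [
    {"b0": 0.0, "b": [0.52, 0.10],
     "mf_rate": lambda v: trapezoid(v, 12, 15, 27, 33),
     "mf_incr": lambda v: trapezoid(v, 0, 1, 14, 16)},
    {"b0": 0.0, "b": [1.60, 0.30],
     "mf_rate": lambda v: trapezoid(v, 27, 30, 60, 70),
     "mf_incr": lambda v: trapezoid(v, 14, 15, 40, 50)},
]
print(predict_total_social_increase([1.5, 2.0], rules, migration_rate=20, prev_increase=10))
```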
3.2 Construction of Premises
In devising the premise, all the samples were classified into nine rules as shown in Fig. 2.

Fig. 2. Nine rules R1 through R9 classified by using two variables (the migration rate, %, and the total social increase, persons), and four special rules S1 through S4 (the numeric in each rectangle indicates the number of districts belonging to each rule; total number of districts: 1,172)

In performing the classification, we tried to keep the size of samples belonging
to each rule roughly the same, although the actual results obtained differed much owing to differences in the total social increase and the migration rate. The samples used in the analysis were the time-series/cross-section pooled data with 1,172 samples, excluding 34 samples (these 34 samples were not usable because of the change in the regional boundary due to merger or separation of administrative districts during the time period of analysis). Let us denote each obtained rule as R1, R2, ..., R9. As can be seen from Fig. 2, R1 through R4 include the districts whose total social increase is positive, while R6 through R8 include the districts whose total social increase is negative. R5 is composed of the districts whose total social increase has a positive or negative value. Note that R9 had no districts or samples. Also note that there were four special rules named S1 through S4, which have at least one variable with a very large value. The two districts belonging to rule S1 were included in R1 in the subsequent analysis. The rules S2, S3 and S4 had no samples and were therefore neglected in the following. Figure 3 shows the membership functions of the migration rate and the total social increase adopted to determine ωn. The detailed explanation of the membership functions is omitted here.
Fig. 3. Membership functions of migration rate and the total social increase on each rule (Rule 1 through Rule 8). The diagrams on the left-hand side denote the membership functions of the migration rate (%) and the ones on the right-hand side those of the total social increase (thousand persons).
3.3 Construction of Consequences
Regression analyses have been performed to estimate the total social increase y_n^j(K) in each district. Among various socioeconomic indicators affecting y_n^j(K), the following six socioeconomic indicators were used as the consequence variables in this research [3]: increment in housing (ihous), increment in labor force supply (ilfs), increment in employment potential (iept), business and commercial employment used as the substitute variable of the land price (comm), population density (pdens), and manufacturing employment in production sectors (manu). Table 1 shows the results of the constructed regression model in each rule. As can be seen from Table 1, the total social increase is significantly related to the increment in housing, ihous, in all rules except R8. Concerning the districts belonging to R1 and R2, where either the migration rate or the total social increase has a very large value, the regression model was constructed by using only the variable ihous as the explaining variable. In R6, R7 and R8, it can be seen that variables such as ilfs (increment in labor force supply), comm (business and commercial employment), and pdens (population density) have been adopted to estimate the total social increase. The regression model obtained in R5 has the lowest R-value of 0.176. This shows that it was difficult to construct the regression model, since the districts included in R5 had very small values of both the total social increase and the migration rate. However, this low accuracy causes no serious problems, because the estimates give nearly zero values. Although the R-value of the regression model obtained in R5 is low, the R-values of those in the other rules are large, and the t-values of the estimated coefficients are statistically significant at the level of 1% except those in R8. Therefore, the constructed models are considered to be applicable for predicting the total social increase.
Table 1. Results of regression models constructed in each rule

Rule (no. of districts)   R       Variable    bi        T
R1 (17)                   0.982   ihous       2.616     21
R2 (9)                    0.899   ihous       1.540     15
R3 (30)                   0.910   ihous       1.601     9.3
                                  iept        461713    3.7
R4 (289)                  0.661   ihous       0.5216    14
                                  iept        164820    5.5
                                  comm(-1)    -0.0162   -3.5
R5 (161)                  0.176   ihous       0.0991    4.0
                                  ilfs        -0.0544   -3.7
R6 (636)                  0.781   ihous       0.4050    8.6
                                  ilfs        -0.6275   -20
                                  comm(-1)    -0.0250   -9.8
                                  iept        55981     3.5
R7 (11)                   0.967   ihous       2.3816    4.7
                                  ilfs        -0.3308   -2.2
R8 (19)                   0.687   ilfs        -4.9234   -17
                                  pdens(-1)   -80.125   -7.4
                                  manu(-1)    -0.1389   -2.0

Note: The numeric in the parenthesis indicates the number of districts belonging to each rule. R: multiple correlation coefficient, bi: partial regression coefficient, T: t-value, (-1): previous time period, ihous: increment in housing, ilfs: increment in labor force supply, iept: increment in employment potential, comm: business and commercial employment, pdens: population density, manu: manufacturing employment in production sectors
Results of the tests performed for checking the accuracy of the constructed models will be shown in Sect. 6.
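As a minimal sketch of how one rule's consequence in Table 1 could be estimated, an ordinary least-squares fit of (1) on the samples assigned to that rule might look as follows. The data are hypothetical, and the chapter's reported R and t-values are not computed here.

```python
import numpy as np

def fit_consequence(X, y):
    """Least-squares fit of one rule's consequence, equation (1):
    y = b0 + sum_i b_i x_i, on the samples assigned to that rule."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1:]        # intercept b_n0 and coefficients b_ni

# Hypothetical samples with two indicators (say ihous and iept).
X = [[1.0, 0.5], [2.0, 0.7], [3.0, 0.2], [4.0, 0.9]]
y = [1.1, 2.1, 2.9, 4.3]
print(fit_consequence(X, y))
```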
4 Cluster Analysis of Migration Ratio by Sex, Age and District In this section, results of cluster analysis of the migration ratio by sex, age and district are described, which have been utilized to predict the social increase by sex and age in each district.
4.1 Cluster Analysis of Migration Ratio by Sex, Age and District Based on a Fuzzy Theory
The total social increase y^j(K) in district j at time K can be estimated by the method described in the previous section. Hence, the social increase by sex and age in district j at time K, denoted by y_{s,a}^j(K), can be calculated if the migration ratio by sex and age, denoted by μ_{s,a}^j(K), which is defined in the following equation, is predicted.

μ_{s,a}^j(K) = 100 y_{s,a}^j(K) / | Σ_{s,a} y_{s,a}^j(K) |.    (3)
Here, the subscripts s and a represent sex and age, respectively; s = 1 denotes male and s = 2 female; a = 0 denotes 0 years old, a = 1 one year old and so on, and a = 85 designates 85 years old and over. Since μ_{s,a}^j(K) have similar characteristics in many districts, an averaging operation on μ_{s,a}^j(K) over districts is considered to be effective for investigating the characteristics of μ_{s,a}^j(K). Furthermore, it is considered easy to forecast the future migration ratios by sex and age in each district by making use of the average migration ratios obtained. From this point of view, a method of ordinary cluster analysis was applied. That is, among the 402 × 3 samples of migration ratios, districts with similar values of migration ratios were classified into the same cluster. However, this simple direct approach was not successful, since there were so many samples. Hence, in applying cluster analysis, the following approach consisting of two steps was adopted in this research [4]. In the first step, all the 402 × 3 samples were divided into several regions by using the two variables of the total social increase and the migration rate in district j, since these two values determine the major characteristics of μ_{s,a}^j(K) in each district. In the second step, clustering of the samples belonging to each region was carried out by applying Fuzzy c-Means (FCM) as the specific method of cluster analysis [5] to cope with the change in the future, as will be shown in Sect. 5. The detailed procedure of clustering is as follows. The data used in the analysis are the time-series/cross-section pooled data with 1,114 samples, excluding 92 samples which have unusual values because of the smallness of the denominator in (3).
(1) Division of samples: All samples were divided into the same eight regions obtained by using the two variables described in Sect. 3.2. In actually applying FCM, R5 was further divided into two sub-rules or subregions R5+ and R5− according to the value of the total social increase being positive or negative. This division made it possible to perform cluster analysis for the respective regions, so that the number of samples could be remarkably decreased.
(2) Normalization of μ_{s,a}^j: Since the values of μ_{s,a}^j differ greatly according to districts, μ_{s,a}^j were normalized as shown in (4).

μ̄_{s,a}^j = μ_{s,a}^j / σ^j    (4)

where σ^j shows the standard deviation of μ_{s,a}^j.
(3) Clustering by FCM: For the normalized μ̄_{s,a}^j belonging to each region, FCM was carried out. In applying FCM, an adjustable parameter (the coefficient of fuzziness) was set to 1.2 by trial and error, so that patterns as distinct as possible could be obtained. From the results of FCM, a matrix U consisting of membership values u_j^J and the average migration ratio in each cluster, μ̄_{s,a}^J (where J denotes the cluster number), were obtainable. Note that the value of u_j^J indicates the degree of belonging to cluster J for district j.
(4) Conversion of μ̄_{s,a}^J to μ_{s,a}^J: By using (5), μ̄_{s,a}^J can be converted into μ_{s,a}^J.

μ_{s,a}^J = μ̄_{s,a}^J · σ_J    (5)
where σ_J = Σ_{j=1}^{m} u_j^J · σ^j / Σ_{j=1}^{m} u_j^J and m denotes the number of districts belonging to cluster J.

4.2 Obtained Migration Patterns and Their Major Characteristics
As the result of FCM, all samples were classified into 20 clusters (J = 1-20), and the average migration ratios were obtained. We refer to the obtained average migration ratio in cluster J, μ_{s,a}^J, as the migration pattern in cluster J in the following. The obtained migration patterns for male and female of the 20 clusters are shown in Figs. 4 and 5, where the axis of ordinate denotes the value of μ_{s,a}^J and the axis of abscissa age. In R1, R2 and R3 there are two clusters each, which have similar migration patterns. R4 and R6, to which many districts belong, are classified into three clusters each, which have different patterns. Increasing the number of clusters to more than three was not adopted, since only clusters having similar migration patterns were obtained. R5+ consists of two clusters, #10 and #11, and the migration pattern in cluster #10 shows fluctuating values, as shown in Fig. 4(e). This is because both the values of the total social increase and the migration rate are close to zero. As seen from Figs. 4 and 5, the migration patterns in each cluster are mutually different. However, they have the following characteristics in general. The migration pattern of females is similar to that of males. The values of μ_{s,a}^J are very small for ages higher than 55. The values of μ_{s,a}^J of children aged less than 15 are mainly determined by those of their parents with corresponding ages of 30-49. The districts with patterns of cluster #14 in R6 and of cluster #19 and cluster #20 in R8 are located in highly developed districts such as Osaka, Kyoto, and Kobe. In these districts younger labor forces immigrate; however, at the ages of 25-34, when most of them marry and need new houses, many of them emigrate to districts in R1-R3. In districts in R1-R3, almost all aged people immigrate. The districts with the patterns of the clusters in R5− and R7 are located in the periphery of the Kansai region, and in these districts most people emigrate; especially young labor forces who have graduated from high schools or colleges, to seek better job opportunities. It is known through time-series analyses that the migration pattern in each district varies in general in accordance with the socioeconomic changes in each district [6]. The detailed explanation, however, is omitted here.
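For readers who want to reproduce the clustering step of Sect. 4.1, a minimal Fuzzy c-Means routine (Bezdek [5]) is sketched below with the fuzziness coefficient 1.2 used in the chapter. The initialisation, stopping rule and toy data are our own assumptions, not the authors' implementation.

```python
import numpy as np

def fuzzy_c_means(X, c, m=1.2, n_iter=100, seed=0):
    """Minimal Fuzzy c-Means: returns (centers, U) where U[k, j] is the
    membership of sample j in cluster k (columns of U sum to one)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        centers = Um @ X / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)   # (c, n)
        d = np.fmax(d, 1e-12)                    # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
    return centers, U

# Toy example: cluster 6 "migration-ratio profiles" into 2 fuzzy clusters.
profiles = np.array([[1.0, 0.9], [0.9, 1.1], [1.1, 1.0],
                     [-1.0, -0.8], [-0.9, -1.1], [-1.1, -1.0]])
centers, U = fuzzy_c_means(profiles, c=2, m=1.2)
print(np.round(U, 2))
```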
Fig. 4. Migration patterns in clusters R1 to R5+ (migration ratio, %, versus age, for male and female): (a) cluster #1 and cluster #2 in R1; (b) cluster #3 and cluster #4 in R2; (c) cluster #5 and cluster #6 in R3; (d) cluster #7, cluster #8 and cluster #9 in R4; (e) cluster #10 and cluster #11 in R5+
Fig. 5. Migration patterns in clusters R5− to R8 (migration ratio, %, versus age, for male and female): (a) cluster #12 and cluster #13 in R5−; (b) cluster #14, cluster #15 and cluster #16 in R6; (c) cluster #17 and cluster #18 in R7; (d) cluster #19 and cluster #20 in R8
5 Prediction of Migration Ratio by Sex, Age and District

5.1 Procedure for Estimating Migration Ratio
As mentioned in Sect. 4, prediction of the social increase by sex and age, y_{s,a}^j(K), in district j becomes feasible if the migration ratio μ_{s,a}^j(K) in district j in the future can be predicted. However, it is difficult to directly estimate the future migration ratio μ_{s,a}^j(K) in district j, although prediction of various socioeconomic indicators in each district is possible. Hence, to predict μ_{s,a}^j(K), we first determine to which cluster district j belongs based on the predicted socioeconomic indicators, and then calculate μ_{s,a}^j(K) by making use of the obtained migration pattern μ_{s,a}^J. As for the variables determining to which cluster each district belongs, the 12 indicators shown in Fig. 6 were used based on the results of [1].
The procedure to predict the future value of μ_{s,a}^j(K) in district j is as follows. By using the actual values of the indicator p (p = 1-12) in district j (j = 1-402) at the time period K (K = 0-2), denoted by x_p^j, we calculate the average value and the standard deviation, denoted by x̂_p^J and σ_p^J, by using (6) and (7), respectively.

x̂_p^J = Σ_{j=1}^{m} u_j^J · x_p^j / Σ_{j=1}^{m} u_j^J,    (6)

σ_p^J = sqrt( Σ_{j=1}^{m} u_j^J · (x_p^j − x̂_p^J)² / Σ_{j=1}^{m} u_j^J )    (7)
where u_j^J is the membership value obtained from the results of applying FCM and m denotes the number of districts belonging to cluster J. When the predicted value of indicator p for district j at time period K (K = 3, 4, ...), denoted by x_p^j(K), is given, the value of the membership function for the indicator p in cluster J, denoted by ν_{j,p}^J(K), is obtained from (8).

ν_{j,p}^J(K) = exp( − (x_p^j(K) − x̂_p^J)² / (φ · (σ_p^J)²) )    (8)

where φ is an arbitrary parameter, for which 0.5 was used. The degree of belonging to cluster J of district j, denoted by w_j^J(K), can be calculated as

w_j^J(K) = Σ_{p=1}^{12} ν_{j,p}^J(K) / Σ_{p=1}^{12} Σ_{J=1}^{20} ν_{j,p}^J(K).    (9)
The future value of μ_{s,a}^j(K) can be estimated by weighting μ_{s,a}^J with w_j^J(K) as

μ_{s,a}^j(K) = Σ_{J=1}^{20} w_j^J(K) · μ_{s,a}^J.    (10)
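Equations (6)-(10) can be sketched as below. Our version sums the membership-weighted statistics over all districts rather than only the m districts assigned to cluster J, and all names, array shapes and numbers are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def cluster_indicator_stats(X, U):
    """Membership-weighted mean and standard deviation of each indicator in each
    cluster, equations (6)-(7). X: (n_districts, n_indicators) actual values;
    U: (n_clusters, n_districts) FCM memberships."""
    w = U / U.sum(axis=1, keepdims=True)
    mean = w @ X                                          # x_hat_p^J
    std = np.sqrt(np.fmax(w @ (X ** 2) - mean ** 2, 0.0)) # sigma_p^J
    return mean, std

def synthesize_pattern(x_future, mean, std, patterns, phi=0.5):
    """Equations (8)-(10): indicator memberships nu, cluster degrees w_j^J, and
    the synthesized migration pattern as a weighted sum of cluster patterns."""
    std = np.fmax(std, 1e-9)                              # guard against zero spread
    nu = np.exp(-((x_future[None, :] - mean) ** 2) / (phi * std ** 2))   # (8)
    w = nu.sum(axis=1) / nu.sum()                                        # (9)
    return w @ patterns                                                  # (10)

# Hypothetical data: 3 clusters, 2 indicators, cluster patterns over 5 age groups.
rng = np.random.default_rng(1)
X = rng.random((10, 2))
U = rng.random((3, 10)); U /= U.sum(axis=0)
mean, std = cluster_indicator_stats(X, U)
patterns = rng.normal(size=(3, 5))                        # stand-in for mu_{s,a}^J
print(synthesize_pattern(np.array([0.4, 0.6]), mean, std, patterns))
```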
Fig. 6. Twelve indicators used for estimating migration patterns: (1) migrants / total population; (2) employment potential; (3) increment in employment potential; (4) increment in labor force supply; (5) 25-34 years old population / total population; (6) area of residential land / area of usable land; (7) area of business & commercial land / area of usable land; (8) employment in primary industry / total employment; (9) income difference rate; (10) trip time of commuting purpose from Center Ward; (11) business & commercial employment / total employment; (12) 15-24 years old population / total population
5.2 Estimated Results
Figure 7 shows the value of the membership function ν_{j,p}^J in R4 when the trip time of commuting purpose is changed, as an example of the obtained membership functions.

Fig. 7. Membership function ν_{j,p}^J in R4 when the trip time of commuting purpose from Center Ward is changed (value of the membership function versus trip time, minutes, for clusters #7, #8 and #9)

Parts of the estimated migration patterns are shown in Figs. 8-10. Figure 8 shows the estimated result in Kameoka City, Kyoto prefecture, in 1990. In this example, the belonging values to clusters #7 and #9 in R4 shown in Fig. 8(a) and (b) were calculated to be 0.43 and 0.57, respectively. By weighting the corresponding two migration patterns with the values of 0.43 and 0.57, the intermediate pattern shown in Fig. 8(c) was synthesized. Figure 8(d) shows the actual pattern. As can be seen from Fig. 8(c) and (d), the estimated pattern agreed well with the actual one. Figures 9 and 10 show the results of Sennan City in Osaka prefecture and Kizu town in Kyoto prefecture, respectively. As can be seen from these examples and from (10), an intermediate migration pattern among the 20 patterns can be appropriately synthesized by using the values of w_j^J(K).
Fig. 8. Estimated and actual migration pattern in Kameoka City, Kyoto prefecture in 1990 (migration ratio, %, versus age, male and female): (a) migration pattern of cluster #7 (belonging value: 0.43); (b) migration pattern of cluster #9 (belonging value: 0.57); (c) estimated pattern; (d) actual pattern
Fig. 9. Estimated and actual migration pattern in Sennan City in Osaka prefecture in 1990: (a) migration pattern of cluster #14 (belonging value: 0.50); (b) migration pattern of cluster #16 (belonging value: 0.40); (c) estimated pattern; (d) actual pattern
Fig. 10. Estimated and actual migration pattern in Kizu town in Kyoto prefecture in 1990: (a) migration pattern of cluster #1 (belonging value: 0.04); (b) migration pattern of cluster #2 (belonging value: 0.96); (c) estimated pattern; (d) actual pattern
6 Fuzzy Population Model
For estimating the total social increase and determining to which cluster district j belongs, it is required to estimate various socioeconomic indicators such as the increment in housing, business and commercial employment, and so on. Hence, the population model has to be integrated with other models to estimate these indicators. Figure 11 shows the structure of the population model constructed. The explanation of the developed population model is omitted here, since it has the same structure as the one constructed before [7],[8], although the contents are quite different.
Fig. 11. Structure of the population model constructed (the population block is integrated with sub-models for labor force supply and demand, housing stock, land use, employment and the time distance between districts)
The validity of the constructed fuzzy population model has been tested by comparing the predicted social increase and population in each district with the actual data. Figure 12 shows the results of the comparison between the actual and predicted total social increase in 1990, where the predicted values were obtained by using a conventional regression model (CRM) [1] and the fuzzy model, taking 1980 as the initial year. In Fig. 12, the axis of ordinate represents the predicted values and the axis of abscissa the actual ones. The correlation coefficient between the actual values and the predicted ones, denoted by r, was 0.776 when the CRM was used, while that obtained by using the fuzzy model was 0.942. Therefore, it was seen that the fuzzy model could estimate the total social increase more precisely than the CRM. Figure 13 shows the comparison of the predicted and actual productive and aging population in 1990. It can be seen from Fig. 13 that the predicted population in each district agrees well with the actual one.
Fig. 12. Results of comparison of partial test on the total social increase in 1990 (predicted value versus actual value, thousand persons): (a) conventional regression model, r = 0.776; (b) fuzzy model, r = 0.942
Fig. 13. Comparison of predicted and actual population in 1990 (predicted value versus actual value, thousand persons): (a) productive population (20-64 years old), r = 0.998; (b) aging population (over 65 years old), r = 0.996
7 Conclusion
This paper described a method of predicting the population by sex and age for each of the 402 districts in the Kansai region, Japan, based on fuzzy theories. First, to predict the total social increase for the 402 districts while directly taking into consideration the differences in the factors of migration in each district, eight rules were set up using the migration rate and the total social increase in each district as the premise variables. Regression models that use various socioeconomic indicators as explanatory variables were constructed in the consequent parts. The future value of the total social increase in each district can be obtained by weighting the values
calculated from the estimated regression models with the membership values denoting the degree of belonging to each rule. Second, a method to estimate the social increase by sex and age in each district was proposed, based on a fuzzy clustering method, for dealing with long-range socioeconomic changes in population migration by sex, age and district. All the 402 × 3 samples of the migration ratio by sex and age were classified into 20 clusters by applying Fuzzy c-Means, and the average migration pattern in each cluster, µ^J_{s,a} (J = 1, ..., 20), was obtained. The future migration ratio µ^j_{s,a}(K) in district j (s = 1, 2; a = 0, ..., 85; j = 1, ..., 402; K = 3, 4, ...) can be synthesized by weighting the migration pattern of each cluster µ^J_{s,a} with the values w^j_J(K), which are obtained from 12 socioeconomic indicators in each district and denote the degree of belonging to each cluster. It was confirmed that the synthesized migration ratios agreed well with the actual ones. The validity of the constructed fuzzy population model was tested by comparing the estimated social increase and population with the actual data. It was shown that the constructed fuzzy model could estimate the social increase and population more precisely than the conventional regression model. The constructed population model is considered to be of great help in making various regional plans to cope with long-term socioeconomic changes.
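As a worked illustration of this synthesis step (a sketch only: the array names, shapes and the normalization of the weights are assumptions made for the example, not taken from the paper), the future migration ratio of a district is a membership-weighted mixture of the 20 cluster patterns:

```python
import numpy as np

n_clusters, n_sexes, n_ages = 20, 2, 86      # clusters J, sexes s, ages a = 0..85

# mu_cluster[J, s, a]: average migration pattern of cluster J (from Fuzzy c-Means);
# filled with zeros here as a placeholder for the estimated patterns.
mu_cluster = np.zeros((n_clusters, n_sexes, n_ages))

def synthesize_migration(w_district):
    """w_district[J]: degree of belonging of one district to cluster J at period K,
    derived from its socioeconomic indicators. Returns mu[s, a] for that district."""
    w = np.asarray(w_district, dtype=float)
    w = w / w.sum()                              # assumed: memberships normalized to sum to 1
    return np.tensordot(w, mu_cluster, axes=1)   # sum over J of w_J * mu_cluster[J, :, :]
```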
References
1. Pak P, Suzuki Y, Kim G (1988) Multizonal simulation model of population. J. of Urban Planning and Development, ASCE, 114:91-109
2. Takagi T, Sugeno M (1985) Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. on Systems, Man and Cybernetics, SMC-15:116-132
3. Kim G, Pak P (2000) Long-range forecasting of social increase by districts based on fuzzy modeling method and its application. JSST (Japan Society of Simulation Technology) Int. Conf. on Modeling, Control and Computation in Simulation, 401-406
4. Pak P, Kim G (2002) Long-range prediction of population by sex, age and district based on fuzzy theory. 2002 International Conference on Fuzzy Systems and Knowledge Discovery (FSKD'02), Singapore, 318-323
5. Bezdek JC (1981) Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York
6. Pak P, Suzuki Y (1981) Analysis of population migration by sex, age, and district in Osaka Prefecture. T. of Urban Planning, No.16:229-234 (in Japanese)
7. Kim G, Pak P, Suzuki Y (1992) Comprehensive regional socioeconomic simulation system. J. of Urban Planning and Development, ASCE, 118:81-96
8. Pak P, Kim G (2002) A model to estimate population by sex, age and district based on fuzzy theory. 2002 International Conference on Control, Automation and Systems, Muju, Korea, 286-289
CHAPTER 25 An Efficient Information Retrieval Method for Brownfield Assessment with Sparse Data
Linet Özdamar¹ and Melek Basak Demirhan²
1 Nanyang Technological University, School of Mechanical and Production Engineering, Systems and Engineering Management Division, 50 Nanyang Avenue, Singapore 639798. e-mail: [email protected], [email protected]
2 Yeditepe University, Dept. of Systems Engineering, Kayisdagi, 81120 Istanbul, Turkey. e-mail: [email protected]
Abstract: Environmental site investigations are carried out prior to the reclamation of industrially contaminated sites. In this study, an information retrieval approach involving a fuzzy partitioning scheme is proposed to identify the topology of polluted areas in a given industrial site. While its primary objective is to locate contaminated zones accurately with a minimal number of samples, the second target is to reduce the size of the investigation area subjected to re-sampling in the next phase. The performance of the proposed approach is demonstrated using results from real industrially contaminated sites.
Keywords: fuzzy partitioning, information retrieval, environmental site characterization
1 Introduction
The issue of preserving the natural environment becomes ever more crucial as industrialization gains speed. Brownfields, which are former industrial sites, usually contain hazardous contaminants in the soil and can also lead to groundwater pollution. Furthermore, brownfields may be re-used for urban purposes and therefore need to be investigated in detail. An accurate characterization of potentially contaminated brownfields is essential before reclamation takes place, since they carry high health risks. The process of site characterization is carried out more effectively if a preliminary screening action is employed with the following two objectives taken into consideration simultaneously.
This research is part of the project PURE-EVK1-1999-00153: Protection of Groundwater Resources at Industrially Contaminated Sites under EU 5th Framework R&D Program.
These are: minimizing the cover of the potentially contaminated area that will undergo a more expensive and detailed site investigation phase, and achieving high precision in determining the contaminant distribution over the site. These objectives should be attained with minimal sampling and lab analysis costs. An efficient information retrieval method that works on an initial set of samples from the site is proposed here. The goal is to classify the site into polluted and non-polluted regions. After this first screening is completed, the next phase of the investigation may start and remedial action follows. In the next section, conventional site characterization approaches are briefly discussed. Next, a formal definition of site characterization is given, followed by a description of the proposed Fuzzy Tessellation Approach (FTA). Finally, FTA is applied to real data from two sites and the cost reduction that can be achieved by FTA is illustrated.
2 Conventional site assessment approaches
In site characterization, one seldom has the opportunity of exhaustive sampling that is sufficiently representative of the site. Instead, a sparse data set is available for site assessment. This results in the necessity of making inferences over the whole site using limited sample information; hence, the level of uncertainty is usually high. The conventional approach to handling uncertainty is to use statistical methods based on probability theory. However, in spatial data analysis, samples are not independent and identically distributed random variables, and this requires methodologies that deviate from conventional statistics. Spatial correlation (continuity) is a very important feature in earth science data: it is assumed that two data points that are spatially close to each other are more likely to have similar values than two that are far apart [8]. The most important features of a spatial data set are the overall trend, the degree and direction of spatial correlation, the locations of extreme values and the locations of values over a threshold. For spatial data analysis, a general class of methods called Spatial Interpolation Methods (SIM) was introduced [1, 2, 5, 6, 7]. SIM are used in many contexts such as image reconstruction, remotely sensed satellite imagery, identification of mining resources, and topographical mapping. In the more limited context of the earth sciences these methods are called geo-statistics, and they are concerned with the study of phenomena that fluctuate in space or time. These approaches are mostly based on interpolation: they estimate the values of characteristics at unsampled locations and usually make use of the spatial correlation existing in most earth science data. Geo-statistics provides many deterministic and probabilistic tools to model existing spatial correlation and variability. It also changes the entire methodology of sampling, because it eliminates the need to avoid auto-correlated data, which is a requirement in traditional sampling methods. The emphasis is on mapping spatially distributed populations rather than on the estimation of averages. In brief, geostatistical methods estimate the surface topology by using a spatial correlation
(correlogram) model and its estimated parameters. Among many geostatistical point estimation methods Ordinary Kriging (OK) is of special importance due to its distinguishing statistical aim of minimizing the error variance and its wide use in practice. In Kriging, a model of the data (a variogram) is built to achieve the latter objective [4]. Once a variogram model is selected, its parameters are estimated and the resulting model is used to estimate surface topology. As stated in [7] there is no universally accepted unique algorithm for determining a spatial correlation model. Hence, the same data set can yield quite different results. Interpolation methods have an additional disadvantage of involving a smoothing effect. The latter misleads the identification of highly contaminated small areas surrounded by clean regions. Unlike many geostatistical methods, the proposed Fuzzy Tessellation Approach (FTA) exploits the available sample information without relying on any assumptions about the statistical or spatial distribution of the contaminants. Therefore, there is no need for the identification of an appropriate spatial correlation model. Furthermore, FTA does not involve any interpolation, and hence, does not suffer from smoothing effects.
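For illustration, the sketch below implements inverse-distance weighting, one of the simplest spatial interpolation methods of the SIM family; it is given only to show how an interpolation-based estimate behaves on sparse samples, it is not the Kriging estimator discussed above, and the borehole data are invented for the example.

```python
import numpy as np

def idw_estimate(locations, values, query, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate of a concentration at an unsampled location.

    locations: (n, 2) array of sampled coordinates w_j
    values:    (n,)   array of measured concentrations f(w_j)
    query:     (2,)   coordinates of the location to estimate
    """
    d = np.linalg.norm(locations - query, axis=1)
    if np.any(d < eps):                      # query coincides with a sample point
        return float(values[np.argmin(d)])
    w = 1.0 / d ** power                     # closer samples weigh more
    return float(np.dot(w, values) / w.sum())

# Hypothetical sparse boreholes: one isolated hot spot among clean samples (mg/l).
pts = np.array([[0., 0.], [100., 0.], [0., 100.], [100., 100.], [50., 50.]])
conc = np.array([1., 2., 1., 2., 95.])
print(idw_estimate(pts, conc, np.array([75., 75.])))  # value pulled between hot spot and clean data: smoothing
```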
3 Preliminaries
In environmental site characterization, a sample of size n is collected from locations wj = (xj, yj), j = 1, 2, ..., n, within the site of interest α0. These are subjected to laboratory tests and their contaminant concentration values f(wj) are identified. The information (wj; f(wj)) consists of the location of the samples and their concentration values. It is utilized to determine the spatial distribution of contaminants over the site. In our context, the determination of the spatial distribution of contaminants is approached by identifying non-potential regions, hence classifying the complementary parts as "potentially contaminated". This approach is carried out by partitioning any regular cover C of the site (usually a rectangle with vertices located at v0 = (min x, min y), v1 = (max x, min y), v2 = (max x, max y), and v3 = (min x, max y)) into two complementary sets α and α′ representing the set of non-potential and potentially contaminated zones respectively. Here any subset αi ⊂ α0 is called a non-potential block if it does not contain any w ∈ α′. Thus, the problem can be stated as follows:

Identify αi ⊂ α, i ∈ I, given (wj; f(wj)) for j = 1, 2, ..., n.    (1)
Expression (1) implies that the exact boundaries of sub-spaces whose contamination levels are below a threshold level (thrs) should be identified. The distribution of the contaminant over α0 is not explicitly given as a mathematical expression, that is, f (w) is unknown. Rather, we have (w j ; f (w j )), j = 1, 2, . . . n, locations being the inputs and observations the outputs. Therefore, the problem expressed by Expression (1) is a black box problem to be solved with only a finite number of samples.
4 Fuzzy Tessellation Approach: An Information Retrieval Method
FTA is a novel approach in site characterization where spatial inferences are based only on the sample data, without using any correlation model. It uses fuzzy inferencing in classifying regions of α0. Three features of FTA are listed below.
i. FTA works with an initial sample set S0 collected from α0 to be used in the preliminary investigation. The size of S0 should be minimal, since sampling and lab costs might require a significant budget depending on the geological properties of α0. A second round of sampling involves serious setup costs concerning the mobilization of the equipment to the field and the transfer of samples to the lab. Transfer of samples might involve lengthy official procedures.
ii. FTA uses an overlapping partitioning scheme in order to utilize the information extracted from the given set of data more efficiently. This approach also leads to increased reliability of regional inferences.
iii. FTA's inferencing is based on securing non-potential zones and discarding them. The site topology resulting from FTA is identified as the complementary, potentially contaminated area.
We denote the set of locations claimed to be non-potential by FTA by A and the complementary set by A′. FTA provides non-statistical estimates A and A′ for the sets α and α′ respectively, without making use of any statistical means such as global or local estimation. Although the sets α and α′ are mutually exclusive, A ∩ A′ = ∅ does not have to apply, because both sets involve spatial uncertainty and can be treated as fuzzy sets. Unless the exact topology of the site is known, the uncertainty (uncertainty introduced by lack of information) is not zero. However, as the lack of information decreases, the uncertainty related to the topology of the site decreases too, i.e., the fuzzy set A approaches the crisp set α.
4.1 Partitioning the site
In site assessment, the solution for the problem stated in (1) has to be achieved using S0, i.e., α0 has to be tessellated into clusters A and A′ (non-potential and potential block estimates of α and α′ respectively) with a finite number of data points. Hence, the problem can be re-stated as:

Identify the non-potential block estimate A given (wj; f(wj)) for j = 1, 2, ..., n.    (2)
The overlapped partitioning scheme utilized in FTA aims at extracting information more efficiently by enabling data re-utilization, and hence obtaining a better estimate for A. A definition of overlapped partitioning is given below.
Definition 1. A block is partitioned into k overlapping sub-blocks with extension rate ER if, for any two sub-blocks αi and αj, αi ∩ αj ≠ ∅ and their overlapping ratio is equal in size.
Overlapped partitioning has two advantages that are discussed below.
Exploitation of data: FTA partitions the site into smaller regions to identify relatively more homogeneous sets of locations that can be reliably classified as non-potential. The procedure begins with a global measure (or attribute) belonging to the whole site. This global measure indicates the potential of having contamination in the site. Then, partitioning iterations are executed and at every cycle the global measure is applied to the partitions generated by the overlapped partitioning strategy. Thus, the global potential estimator is localized. Due to the overlapping property of this scheme, when αi is partitioned, neighboring blocks share samples and these contribute to the global measure of the neighboring blocks. With consecutive parallel partitioning iterations taking place, the number of blocks to which a sample contributes simultaneously increases exponentially, and hence the sample is included in many clusters, some of which might fit its concentration level better. Re-partitioning iterations on a block αi stop either when a stopping criterion is reached or when it has zero potential.
Efficiency: An algorithm is efficient in terms of information retrieval if it draws reliable conclusions from a given set of data. The following discussion shows that an overlapped partitioning algorithm results in a more reliable classification as compared to a non-overlapped one [3] and is therefore more efficient. FTA requires multiple controls on any block before it classifies αi in A. The three conditions that have to hold for classifying a block in A are listed below.
Condition I. αi has zero potential.
Condition II. ∀ wj ∈ αi, f(wj) < thrs.
Condition III. All blocks αj with αi ∩ αj ≠ ∅ should have zero potential.
Here, thrs denotes the legal threshold value (mg/l) for the pollutant of interest. The major contribution to the reduction of misclassification risk is provided by the third condition, because the scheme of overlapped partitioning itself imposes multiple control on a block's neighbors as well. In a non-overlapped partitioning approach, reliability, Rel, is defined to be the probability that a potential block αi is correctly classified in A′. That is,

Rel(αi) = 1 − Pr(αi is classified in A | αi ∩ α′ ≠ ∅),    (3)

where the conditioning event αi ∩ α′ ≠ ∅ indicates that the block is contaminated. On the other hand, in overlapped partitioning,

Rel(αi) = 1 − Pr{[αi is classified in A and all blocks αj with αj ∩ αi ≠ ∅ are classified in A] | αi ∩ α′ ≠ ∅}.    (4)
The fact that Rel(αi) in Equation (3) ≤ Rel(αi) in Equation (4) shows that the overlapped strategy is more reliable.
4.2 Block Assessment
Definition 2. The membership function µA(wj) measures the extent to which wj belongs to the fuzzy set A. Similarly, µA′(wj) is defined to be the membership function that represents the location's degree of belonging to the contaminated homogeneous cluster A′.
Hence a location wj may belong to the fuzzy sets A and A′ with membership values not necessarily summing up to one. In the context of site characterization, an arbitrarily selected, monotone non-decreasing function may serve as a membership function. Linear, Gaussian and Sigmoid membership functions are examples of such functions. After determining the membership values µA(wj) for all observed locations wj, a block measure associated with each subset is defined as follows.
Definition 3. The potential ri of a block αi is defined to be a block measure that aggregates µA(wj) over all observations wj ∈ αi.
The potential of a block is also a mapping into the unit interval and it can be interpreted as the degree of membership with which the block belongs to the fuzzy sets A and A′.
4.3 The Procedure
Step 1. Define a regular cover of the site, α0. Initialize the set of existing blocks as C = {α0}.
Step 2. Divide each block αi ∈ C into k overlapping blocks α1, α2, ..., αk (without loss of generality we assume that k = 2). Replace the parent by the k sub-blocks. Update C accordingly.
Step 3. Compute the potentials ri of all sub-blocks αi ∈ C. Either classify blocks in cluster A partially or as a whole, or preserve them for re-partitioning. (When αi is declared a member of cluster A, it is totally or partially exempt from being considered in the remaining iterations, i.e., C = C − {αi}.)
Step 4. If C = ∅, then stop. Else go to Step 2.
Once the algorithm completes its iterations, all blocks are assigned to one of the sets A or A′, resulting in a final tessellation of the site α0.
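A minimal sketch of the partitioning loop in Steps 1 to 4 is given below, assuming axis-aligned rectangular blocks, k = 2 overlapping children per split, a potential based simply on the fraction of samples above the threshold, and a maximum depth as the stopping criterion. It omits the fuzzy membership aggregation and the neighbor-based multiple control of Condition III, and it treats blocks containing no samples as non-potential; all names are illustrative, not those of the original implementation.

```python
def split_overlapping(block, extension_rate=0.25):
    """Split a rectangle (x0, y0, x1, y1) into two overlapping halves along its longer axis."""
    x0, y0, x1, y1 = block
    if (x1 - x0) >= (y1 - y0):
        mid, ext = (x0 + x1) / 2.0, (x1 - x0) * extension_rate / 2.0
        return [(x0, y0, mid + ext, y1), (mid - ext, y0, x1, y1)]
    mid, ext = (y0 + y1) / 2.0, (y1 - y0) * extension_rate / 2.0
    return [(x0, y0, x1, mid + ext), (x0, mid - ext, x1, y1)]

def in_block(p, b):
    return b[0] <= p[0] <= b[2] and b[1] <= p[1] <= b[3]

def potential(block, samples, thrs):
    """Fraction of samples inside the block whose concentration exceeds the threshold."""
    inside = [f for (w, f) in samples if in_block(w, block)]
    return 0.0 if not inside else sum(f >= thrs for f in inside) / len(inside)

def fta_tessellate(cover, samples, thrs, max_depth=6):
    """Return (non_potential, potential) block lists: a rough FTA-style tessellation."""
    A, A_bar, frontier = [], [], [(cover, 0)]
    while frontier:
        block, depth = frontier.pop()
        r = potential(block, samples, thrs)
        if r == 0.0:
            A.append(block)               # secured as non-potential
        elif depth >= max_depth:
            A_bar.append(block)           # kept as potentially contaminated
        else:
            frontier.extend((child, depth + 1) for child in split_overlapping(block))
    return A, A_bar
```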
5 Illustrations on Real Data
The performance of FTA is illustrated by utilizing real data from two sites. The details of the data are confidential; however, the sites can be considered
moderately large (about 50 and 39 ha, respectively), and they have been characterized with reasonable numbers of boreholes drilled according to regulations. To give an idea about site assessment costs, the first site required a budget of over one million Euros, with 70% dedicated to lab analysis and 20% to the collection of samples. To illustrate FTA's performance in site assessment, a number of data points are picked at random in each grid from the whole set of borehole data already available from past site investigations. An iterative sampling procedure in which the grid is further reduced is implemented to observe the accuracy of the contaminant topology obtained with different numbers of samples. The reduction in sampling and analysis costs resulting from the use of FTA is thus observed.
5.1 The First Site
In the first site, FTA characterizes the site initially with one third of the number of data that are actually available. Then, in each consecutive re-sampling iteration, an additional sample is collected from each grid. Since the available data pattern is not uniform, the number of samples in any iteration is not twice the number of samples in the previous iteration. By using this sampling strategy, the effects of increasing numbers of samples on FTA's performance are observed and the extent of cost savings can be measured. The re-sampling iterations continue until the full set of data is included in the sample set. In this manner, the initial grid size of 100m x 100m is gradually reduced to 25m x 25m. The true topology of the contaminant distribution cannot be known with accuracy on a real site; namely, the sets of contaminated and non-contaminated locations, α′ and α in α0, are not known. Consequently, a performance measure is devised. This measure is denoted Cover% and it is the percentage of contaminated data covered by FTA out of the whole set of available contaminated data. Naturally, in early iterations FTA works with sparser data than the whole set, and it is presumed that if the areas claimed to be contaminated include a high percentage of the contaminated data in the whole set (most of which are unknown to FTA at the current sampling iteration), then the method should be deemed successful. In the first set of data, 20% of the whole set is contaminated and contaminant concentrations vary within a very wide range of values. The results obtained in 7 sampling iterations and those obtained with the whole set of data are given in Table 1. The first column indicates the sampling iteration number, and the second column denotes the percentage of the whole set of data that is made available to FTA at the current sampling iteration. The third column indicates the percentage of the whole set of contaminated data that is known by FTA at the current iteration. The next column indicates Cover% and the next another performance measure denoted Area%. Area% is the ratio of FTA's coverage of contaminated regions to the area of the whole site. In the first iteration, FTA works with only one third of the data, but is able to include 87% of the whole set of contaminated data in its coverage. This is accomplished with only 32% of the whole set of contaminated data known by FTA in this
iteration. In the next sampling iterations, as more contaminated data are made available to FTA, an increasing Cover% is achieved. In the fourth iteration, with 67% of the data now available to FTA, 100% of the contaminated data are covered. The Area% covered is about 51% of the site. Thus, savings of 33% are achieved on sampling and lab analysis costs while reducing the area to be re-investigated by half. When the full set of data is utilized, a similar cover topology is obtained; however, it is somewhat more refined. Figure 1 illustrates the cover obtained in the second iteration and Fig. 2 depicts the one obtained with the full data.
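The two measures can be computed as sketched below, reusing the in_block helper and the rectangular-block representation from the earlier sketch; the site is treated as a plain rectangle and overlaps between claimed blocks are not subtracted, so this is only an approximation for illustration.

```python
def cover_percent(claimed_blocks, all_samples, thrs):
    """Percentage of all contaminated boreholes falling inside the claimed blocks."""
    contaminated = [w for (w, f) in all_samples if f >= thrs]
    covered = [w for w in contaminated
               if any(in_block(w, b) for b in claimed_blocks)]
    return 100.0 * len(covered) / len(contaminated)

def area_percent(claimed_blocks, site_area):
    """Claimed area as a percentage of the whole site (block overlaps ignored)."""
    claimed = sum((b[2] - b[0]) * (b[3] - b[1]) for b in claimed_blocks)
    return 100.0 * claimed / site_area
```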
Fig. 1. Topology of the cover obtained in the second iteration (first site). (Crosses in circles indicate known contaminated data, plain crosses indicate contaminated data unknown to FTA)
Table 1. Results on the first site

Iter.      % of available data   % of known cont. data   Cover %   Area %
1          32.88                 31.66                   86.66     36.79
2          46.64                 43.33                   91.66     36.35
3          57.71                 53.33                   91.66     37.61
4          67.78                 65.00                   100.00    51.80
5          75.50                 75.00                   100.00    51.05
6          81.20                 81.66                   100.00    49.83
7          85.23                 86.66                   100.00    49.32
Full set   100.00                100.00                  100.00    47.79
Fig. 2. Topology of the cover obtained in the final iteration (first site). (Crosses in circles indicate known contaminated data)
5.2 The Second Site
The total data set collected from the second site is quite sparse (0.47 ha/sample) compared with that of the first (0.17 ha/sample). Three sets of experiments are conducted using this set of data. The first one involves 30% of the data, the second one 67%, and the last includes the full set. In Table 2 we provide the results.

Table 2. Results on the second site

Iter.      % of available data   % of known cont. data   Cover %   Area %
1          30.72                 35.60                   94.11     65.00
2          67.07                 70.58                   100.00    56.00
Full set   100.00                100.00                  100.00    46.00
In Fig. 3, the topology of the cover obtained with 30% of the data is illustrated; 35% of the contaminated data are known to FTA. It is observed that only one contaminated sample (on the left-side boundary of the site, well above the other contaminated data) is left out of the cover. In Fig. 4, the topology obtained with 67% of the data is illustrated. The cover includes all contaminated data. Finally, we illustrate an ordinary Kriging assessment of the site with the 30% of the data used in the first iteration (Fig. 5). We observe that Kriging leaves 52% of the contaminated data out of its coverage, as compared to 6% by FTA.
Fig. 3. Topology of the cover obtained in the first iteration (second site). (Crosses indicate full set of contaminated data 35% of which are known to FTA)
Fig. 4. Topology of the cover obtained in the second iteration (second site). (Crosses indicate full set of contaminated data 70% of which are known to FTA)
These two site illustrations can be regarded as representative of the other tests conducted for other pollutants in these sites. Empirical results show that FTA has the capability to reduce by one third the total number of samples collected from the site while securing the accuracy of the contaminant topology of the site. Furthermore, it excludes half of the site area from the next round of detailed sampling. The cost reduction thus obtained can be calculated with the total site assessment costs in mind. It is also demonstrated that Kriging (a conventional site characterization approach) is much less accurate when utilized on sparse data.
6 Conclusion
We propose a site assessment method to determine the contaminant distribution over an industrially polluted site with sparse data.
Fig. 5. Topology of the cover obtained by Kriging in the first iteration (second site). (Crosses indicate contaminated data (35% known to Kriging), contours indicate contaminated areas claimed by Kriging)
Unlike conventional approaches, the method does not depend on any statistical assumption about the distribution of pollutants and works only with the available sample information. This approach localizes spatial inferences by partitioning the site and assessing the smaller regions thus formed. The assessment is carried out by using fuzzy measures that exploit the advantages of the proposed overlapped partitioning approach. This approach is particularly useful when the data are sparse and a reliable topology of contamination is required. In overlapped partitioning, samples are shared by neighboring partitions and they contribute to the assessment of numerous blocks simultaneously. This scheme enables the identification of more homogeneous clusters that represent partitions. Applications of the method on real sites illustrate the effectiveness of this approach. Results on the two sites show that two thirds of the total number of samples are adequate for representing the site topology, thereby reducing site assessment costs significantly. Further, in the case where a second round of sampling is to be conducted for finer analysis, the size of the area to be re-sampled is reduced by half.
References
1. Burrough PA, McDonnell R (1998) Principles of GIS (Spatial Information Systems). Oxford University Press, New York
2. Cressie NA (1993) Statistics for spatial data. John Wiley & Sons, New York
3. Demirhan M, Ozdamar L (2003) A fuzzy adaptive partitioning algorithm (FAPA) for global optimization. In: Verdegay JL (ed) Fuzzy sets based heuristics for optimization. Studies in Fuzziness and Soft Computing, Vol. 126, Springer, Heidelberg New York
4. Deutsch CV, Journel AG (1998) GSLIB: Geostatistical Software Library and User's Guide. Oxford University Press, New York
5. Englund EJ (1993) Spatial simulation: Environmental applications. In: Goodchild MF, Parks BO, Steyaert S (eds) Environmental modelling with GIS. Oxford University Press, New York
6. Heywood I, Cornelius S, Carver S (1998) An introduction to geographical information systems. Longman Science and Technology, New York
7. Isaaks EH, Srivastava RM (1989) An introduction to applied geostatistics. Oxford University Press, New York
8. Ripley BD (1981) Spatial statistics. John Wiley and Sons, New York
CHAPTER 26 A Fuzzy Method for Learning Simple Boolean Formulas from Examples
Bruno Apolloni¹, Andrea Brega¹, Dario Malchiodi¹, Christos Orovas², and Anna Maria Zanaboni¹
1 Department of Computer Science, University of Milan, Via Comelico 39/41, 20135 Milan, Italy {apolloni, brega, malchiodi, zanaboni}@dsi.unimi.it
2 Technological Educational Institute of West Macedonia, Greece [email protected]
Abstract: We discuss a method for inferring Boolean functions from examples. The method is inherently fuzzy in two respects: i) we work with a pair of formulas representing rough sets respectively included in and including the support of the goal function, and ii) we manage the gap between the sets to simplify their expressions. Namely, we endow the gap with a pair of membership functions of its elements to the sets of positive and negative points of the goal function and balance the fuzzy broadening of the sets. This gives the benefit of describing them with a smaller number of symbols, for a better understandability of the formulas. The cost-benefit trade-off is obtained via a simulated annealing procedure equipped with special backtracking facilities. We tested the method on both an ad hoc case study and a well known benchmark found on the web. Keywords: Computational learning, rule learning, algorithmic inference, fuzzy sets, Boolean formula simplification
1 Introduction
The leap from examples to concepts, for instance from the observation of fire to the determination of the laws of thermodynamics, could require a complex sequence of hierarchical steps, like the ones humankind took over the course of the centuries. With a view to capturing some features of this abstraction process, we devised an agnostic learning procedure [10] that generates a pair of functions which are both consistent with the achieved experience and described with few symbols. In particular, we identify experience with a set of positive and negative examples and seek to discriminate them through a pair of rough sets [15] with the following features. From subsets of equally labelled examples we compute partially consistent hypotheses. They are represented by Boolean functions which are satisfied by a part of the positive examples and contradicted by all negative ones, or vice versa satisfied by all positive examples and contradicted by only some negative ones. The criterion is that the union of the hypotheses coming
from positive subsets and the intersection of the others form two nested regions delimiting the gap where the contours of suitable fully consistent hypotheses (hence satisfied by all positive and contradicted by all negative examples) can be found. In Fig. 1 the gap between these rough sets is represented by the light gray area. It is delimited on the inside by a region we call the inner border and, analogously, on the outside by the outer border.
Fig. 1. Inner (union of the dark gray circles) and outer (intersection of the white circles) border of a concept. The light gray area denotes the gap between the two borders
Since we have no a priori specifications on the shape a fully discriminating hypothesis should have in connection with our learning problem (a situation that is denoted as ignorance of the hypothesis class in the Valiant PAC learning framework [21]), we initially obtain necessary rough sets by using monomials and clauses for the partial gathering of positive and negative examples respectively. Thus the inner border is described by a Disjunctive Normal Form (DNF) and the outer one by a Conjunctive Normal Form (CNF). Here necessity of a set means, in the case of the inner border, that it cannot include another set described by a Boolean formula satisfied by all the positive points of the training set; as for the outer border, that it cannot be included in an analogous formula excluding all negative points. This kind of minimality does not, however, guarantee an analogous minimality of the formula description length. So, here we propose an algorithm for simplifying the description of the borders in such a way that they preserve descriptional power yet are easily understandable from a semantic viewpoint. We assume the minimal borders to be a first release of our rough sets. Then we feel free to modify them in any way, provided that: i) they do not trespass on the minimal ones, and ii) they do not cross each other. In this framework we face a constrained optimization problem whose qualitative target consists of a simple formula correctly discriminating almost all future inputs. The solution search is driven by special rough set membership functions we plug into the gap. Namely, removing symbols from a DNF produces an expansion of its support (apart from the case where a whole monomial is removed, with a consequent support contraction). Conversely, removing symbols narrows the CNF support (except for the analogous case where a whole clause is removed). As a consequence, simplifying formulas causes a broadening of the original monomials and clauses. We interpret this as a spreading of their contours into
fuzzy regions. In this way we understand fuzziness as an uncertainty property of the domains gathering points rather than of the points per se. In turn, examples falling into fuzzy regions help to determine the shape of the membership functions. Through these functions we quantify the thickness of the formula contours as a negative feature, which we then contrast with the benefit of simplifying them while moving toward the solution of the optimization problem. With the general understanding that it is very difficult to know a priori the class of the goal formula or even the set of involved variables [6], the idea of fitting the formula within minimal and maximal hypotheses is well known in the literature, having been examined from various perspectives [15, 13, 19, 14]. For instance, Mitchell [13] frames the class of candidate concepts into a version space constituted exactly by all hypotheses consistent with a set of examples. According to our notation, he partitions these concepts into the possible outer and inner borders plus all those in the gap between pairs of them. In this respect the peculiarities of our approach lie in:
• the inductive bias. We start from an initial bias-free framework (apart from the assumption that the goal formula is monotone) and obtain a set of minimal (monomials) and maximal (clauses) formulas necessarily representing the constituents of the inner and outer borders. This gives specific tasks to the set union and intersection operations involved in the Version-Space Merging algorithm by Hirsh [9] and a sharp meaning to the soft combination of formulas proposed by Sebag [18].
• simplification. We consider inner and outer borders as a kind of rough sets for simply describing the concept. The fuzzy criterion we use reads closer to an addition of new inductive bias restricting the class of formulas from which we pick simple borders than to other syntactical criteria, such as the best fitting and post-pruning of the decision tree algorithms [16] or the mutual information minimization target of rule extractors like RULEX [2]. Moreover, it is more complex than the support maximization and similar criteria used in the Hamming classifier [14]. This allows for a more sophisticated and well-founded management of the trade-off between class complexity and error rate clearly synthesized by Vapnik in the structural risk minimization problem [22].
• set of rules formation. We have an essentially parallel way of building rules from atomic formulas. It is quite the opposite of sequential covering algorithms such as CN2 [7], with only a light influence of the actual sequential path we follow for simulating the parallel procedure.
We tested the suitability of this perspective numerically: a) on a set of artificially constructed benchmarks, where we drew labeled examples from the support of original DNFs and from their complements with respect to the assignment space. The value of the formulas we learn is appreciated through the exact measure of their symmetric difference with the original formulas. For the sake of comparison this quantity is also computed on the output of the Quinlan C4.5 procedure [16]; and b) on the web benchmark constituted by the voting behaviors of Democratic and Republican Congresspeople in some crucial political contests. In this case the efficiency of our method is appreciated only in comparison to the C4.5 results within a cross-validation scheme. The paper is organized as follows. In Sect. 2 we detail the procedure for building the rough set initial issue. Section 3 describes the special routines designed to simplify the learnt formulas. Their performance is discussed in Sect. 4, while the last section is devoted to outlooks and concluding remarks.
2 Building Boolean Formulas
Let us denote by Xn = {0, 1}^n the space of the Boolean vectors x of size n, which we can assign to the set Vn = {v1, ..., vn} of propositional variables in argument to Boolean formulas, a literal being an affirmed (v) or a negated (v̄) propositional variable. We denote by set(g) the set {vi} of all the literals occurring in the Boolean formula g. A formula is monotone if only affirmed variables appear therein. By abuse, we will use the same notation for g and its support, i.e. the set of points x ∈ Xn such that g(x) = 1. Therefore, when we say that a formula is larger than another we refer to their supports. For a given set E ⊆ Xn of examples partitioned into a set E+ of positive examples and E− of negative examples, a goal concept χ is a Boolean function such that: i) E+ and E− represent sets of assignments respectively belonging and not belonging to χ, and ii) any suffix of these examples drawn with the same distribution law does the same. Here we interpret Ockham's razor principle [20] as a request not to introduce, in the absence of prior knowledge, relations between data that cannot be justified by the data themselves. Therefore, within the class of monotone formulas we start by considering the minimal form including a positive point and the maximal form excluding a negative point, as follows.
Definition 1.
i) a monotone monomial m with arguments in Vn is a canonical monomial if an x ∈ E+ exists such that, for each i ∈ {1, ..., n}, vi ∈ set(m) if xi = 1 and vi ∉ set(m) otherwise;
ii) a monotone clause c with arguments in Vn is a canonical clause if an x ∈ E− exists such that, for each i ∈ {1, ..., n}, vi ∈ set(c) if xi = 0 and vi ∉ set(c) otherwise.
Given a positive example, the related canonical monomial m represents a set of points which necessarily belong to the goal concept χ. Hence the minimal (support) form is consistent with the point. Therefore, the union of the canonical monomials constitutes the Disjunctive Normal Form representation of the minimal support of a monotone Boolean form consistent with the training set. We call it the inner border since we know it is included in χ. Analogously, the intersection of the canonical clauses constitutes a Conjunctive Normal Form representing a maximal support formula consistent with the training set, which we know includes χ and is called the outer border. The algorithmic counterpart of the above definitions is given in Table 1. Canonical monomials and clauses, which henceforth we will call atomic formulas, play the role of granules in rough sets. Unlike the latter, however, they combine in the outer border by intersection rather than union. The syntactical simplicity of the border is paid for in terms of description length: in principle the borders are constituted by a number of monomials or clauses equal to the size of the training set.4
Note that one and only one assignment – exactly one positive example – is necessary to bind the narrowing of the support of a canonical monomial into the support of another monomial. Similarly a sole negative example is needed for binding the enlargement of a canonical clause support. This stands for a very efficient way of sentineling concepts with great benefit in terms of sample complexity of the learning task [3, 4]. The same does not happen if, for instance, we still use monomials for covering negative examples [14].
Table 1. Pseudocode of the preliminary border building

First issue(E+, E−)
Begin
  b0 = ∅
  FOR ALL x ∈ E+
    m = ∩_{i : xi = 1} vi
    b0 = b0 ∪ m
  B0 = Xn
  FOR ALL x ∈ E−
    c = ∪_{i : xi = 0} vi
    B0 = B0 ∩ c
  Return (b0, B0)
End
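A direct Python transcription of Table 1 might look as follows, representing each canonical monomial or clause by the set of indices of the variables in set(m) or set(c); the function names and the representation are illustrative, not the authors' code.

```python
def first_issue(E_pos, E_neg):
    """Build the inner border b0 (list of canonical monomials) and the
    outer border B0 (list of canonical clauses) from Boolean examples."""
    b0 = []
    for x in E_pos:                       # canonical monomial: variables set to 1 in x
        b0.append(frozenset(i for i, xi in enumerate(x) if xi == 1))
    B0 = []
    for x in E_neg:                       # canonical clause: variables set to 0 in x
        B0.append(frozenset(i for i, xi in enumerate(x) if xi == 0))
    return b0, B0

def dnf_value(b, x):
    """Evaluate the inner border (a DNF over monotone monomials) on assignment x."""
    return any(all(x[i] == 1 for i in m) for m in b)

def cnf_value(B, x):
    """Evaluate the outer border (a CNF over monotone clauses) on assignment x."""
    return all(any(x[i] == 1 for i in c) for c in B)
```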
Some initial benefits in terms of formula length come from the monotonicity of the formulas. Indeed, when a canonical monomial is included in another one it can be removed from the description of the inner border. The same happens in the outer border for a clause including another one. Note that if a non-monotone formula is needed we may still represent it in our framework after duplicating the propositional variables, setting v_{n+k} = v̄_k for each k ∈ {1, ..., n}, so that the same example which assigns value 1 to v_k assigns value 0 to v_{n+k} and vice versa (a small sketch of this encoding is given below). In this way each canonical monomial and clause has all n literals (affirmed or negated) in its argument. Its support hence consists of only one point (in the case of a monomial) or of all but one point (in the case of a clause). In this case the benefit of the inclusions between atomic formulas is exploited only after their simplification (i.e. after some literals have been removed from the formulas). In the next section we will discuss a series of steps in this direction.
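The variable-duplication trick for non-monotone targets can be written in a couple of lines; the encoding below (doubling each example so that the new variable v_{n+k} carries the negation of v_k) is a sketch of the idea, not the authors' code.

```python
def duplicate_variables(x):
    """Map an n-bit example to 2n bits with v_{n+k} = NOT v_k, so that
    negated literals become affirmed literals over the new variables."""
    return list(x) + [1 - xi for xi in x]

# e.g. (1, 0, 1) -> [1, 0, 1, 0, 1, 0]
```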
3 Fuzzy Relaxation
Moving forward in the fuzzy reading of our learning framework, we decide to broaden the contours of the inner and outer borders. So, we balance a desirable shortening of the formula f with the undesirable loss of description power, as mentioned in the introduction. We account for this trade-off by minimizing a cost function O(f, λ) that linearly adds, through the weight vector λ: i) the contributions of the single monomials, and ii) the cost of a dummy monomial that accounts for the combined effect of the single monomials not taken into account in i). Let us focus initially on the inner border and later transfer the results to the outer one, thanks to the dual relationship between CNF and DNF. A first component of the cost of any monomial m in the inner border is its length, which we can shorten trivially through two actions: 1. removal of a literal from set(m), obtaining a relaxed monomial m′; 2. removal of the entire monomial. Both actions may corrupt the accuracy of the formula. This is because the former annexes new points to the support of m′, while the second may leave some points in the support of m with no cover in the inner border (i.e. not satisfying any other monomial of the border). We manage the first drawback via the introduction of a fuzzy frontier for the monomial, to which
the annexed points belong with a membership degree. On the other hand, points left without a covering after a monomial removal are attributed to a common dummy monomial with an analogous cost. Starting from a crisp monomial m, any subsequent removal of literals gives rise to a monotone enlargement of the monomial support. We consider this enlargement as a thick fuzzy frontier of the monomial, which we describe through a fuzzy membership function decreasing from 1 in the crisp monomial to 0 outside its enlargement (see Fig. 2). An enlargement slice is associated with each removed literal. We attribute to all the points belonging to the same enlargement slice the same membership value to the fuzzy frontier. We also transfer this value to the literal whose removal generated the slice, and identify the fuzzy frontier directly with the ordered sequence of the removed literals. Due to consistency constraints, no negative sample point may be enclosed in these enlargements. On the contrary, points in E+ determine the construction of the membership function as follows.
Definition 2. Given a monomial m, for an ordered sequence d = (d1, ..., ds) of s literals from set(m), let us denote by d^k its prefix of length k. Let m_{d^0} = m, and let m_{d^k} denote the monomial obtained by removing the literal dk from the representation of m_{d^{k−1}}. Let us denote by σ(d^k) the cardinality of the subset of E+ belonging to m_{d^k} − m. We define the (fuzzy) membership function µ_{m_d}(dk) of a literal dk with respect to m_d as follows:

µ_{m_d}(dk) = 1 − σ(d^k)/σ(d).    (1)
The monotonicity of the membership function, referring to assignments as points in the unit hypercube, induces a dummy metric where points annexed by one literal in d are farther from the crisp monomial than the ones annexed by previous literals. In a localist interpretation of the membership function we can consider µ_{m_d}(dk) as a probability estimate (a notion somehow dual to the formula certainty factor in rough set theory [15]) of finding points that belong to the fuzzy frontier outside the enlargement induced by dk. And we can define the radius of the frontier as the mean value of the distances of points belonging to each enlargement slice from the crisp monomial m, as follows.
Definition 3. Given a monomial mi and an ordered sequence di = (d1, ..., ds) of s literals from set(mi), we call m_{d_i} − mi the fuzzy frontier of mi, and

ρi = Σ_{k=1}^{s} µ_{m_{d_i}}(dk)    (2)

its radius.
Remark 1. In the above sum the s-th literal has weight 0. This is not a real drawback since we sum all the different values of µ in any case.
Remark 2. We could define in a similar way the border of the dummy monomial, whose crisp support is empty by definition. However, in the following we will consider the support of the border directly as a cost to be minimized and we will estimate it through the cardinality of the intersection of this monomial with E.

Fig. 2. The fuzzy border of a monomial. Bullets: sampled points

Moving to clauses, we have a dual membership function definition which takes into account that:
• the first issue clause c identifies the crisp set of the points outside it,
• the removal of a literal dk expands the set of these points, thus giving rise to enlargements c_{d^k} of the complement of the support of c, with notations and properties analogous to those in Definition 2, and a radius ri for the i-th clause as in Definition 3, now referred to the negative examples.
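Under the notation of Definitions 2 and 3, the membership values and the radius of a fuzzy frontier can be computed as sketched below, where sigma[k-1] stands for σ(d^k), the number of positive examples annexed by the first k removals; the helper names and the example history are illustrative only.

```python
def frontier_membership(sigma):
    """mu[k] = 1 - sigma(d^k)/sigma(d) for k = 1..s (Definition 2)."""
    total = sigma[-1] if sigma[-1] > 0 else 1   # avoid division by zero when nothing is annexed
    return [1.0 - s_k / total for s_k in sigma]

def frontier_radius(sigma):
    """rho_i: sum of the membership values of the removed literals (Definition 3)."""
    return sum(frontier_membership(sigma))

# Hypothetical removal history annexing 1, 3 and finally 5 positive points:
print(frontier_membership([1, 3, 5]))   # [0.8, 0.4, 0.0]
print(frontier_radius([1, 3, 5]))       # 1.2
```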
Summing up, the cost function we want to minimise with respect to the formula f is

O(f, λ) = λ1 Σ_{i=1}^{m} Li + λ2 Σ_{i=1}^{m} Ri + λ3 ν0,    (3)

where:
• f is an inner or outer border;
• Σi Li is the length of the formula, Li being the number of literals in the i-th atomic formula;
• Ri = either ρi or ri, depending on whether f is an inner or outer border;
• ν0 is the percentage of examples not covered by f, i.e. either positive points not included in the inner border or negative points not excluded from the outer one;
• λ is the set of free parameters balancing the costs.
3.1 Stochastic Optimization
The true goal of our simplification is to find a pair of formulas that both have a short description and discriminate with high accuracy the new x submitted to them in the contour. We essentially relate the latter property to the tightness of the fuzzy border introduced by the removal of literals. Consequently, we identify our goal with the minimization of the fitness function (3). Instead of exhaustively searching for the best fitting solution, we consider only a limited set of candidate simplifications. Namely, the search is driven by a simulated annealing procedure [1] along the landscape of the local cost function (3). In greater detail:
Selecting a Move
The candidate solution is randomly selected from the neighborhood of the current solution. We uniformly draw:
• the monomial/clause within the inner/outer border, and
• the literal inside it.
We maintain the role of inner and outer borders through the entire simplification process, in terms of a functional consistency (see Fig. 3) requiring the inclusion of the current inner border b inside the current outer border B. Namely, we envisage a series of incremental simplifications where at each iteration i the following inclusion relation holds:

b0 ⊆ bi ⊆ Bi ⊆ B0,    (4)
where subscripts identify the iteration and where iteration 0 represents the original issue of the formulas. The intermediate inclusion is checked by testing whether, for each monomial m ∈ bi , and each clause c ∈ Bi , it is true that m ⊂ c. The quickest way to perform it is to check if at least one element of set(m) belongs to set(c). The extreme inclusions could be violated when entire monomials or clauses are removed during the simplification process. However, having introduced dummy atomic formulas for recovering examples that are included/excluded only by the above monomials/clauses, we get that functional consistency implies the consistency with the entire sample E considered earlier. Therefore, each proposed solution f for O is preliminarily checked against consistency with the routine described in Table 2.
Fig. 3. Functional consistency is satisfied in (a) and violated in (b). Black and gray ellipses denote respectively a DNF and a CNF. Dashed curves: contours of dummy monomial and clauses. Circles and diamonds denote respectively points in E + and E −
Table 2. Pseudocode of the functional consistency check

Functional check(bL, BL)
Begin
  IF FOR ALL m ∈ bL, FOR ALL c ∈ BL: set(m) ∩ set(c) ≠ ∅
  THEN return true
  ELSE return false
End
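In Python the functional consistency test of Table 2 reduces to a nested check over the literal sets, using the same set-of-indices representation of monomials and clauses as in the earlier sketches.

```python
def functional_check(b, B):
    """True iff every monomial of the inner border shares at least one literal
    with every clause of the outer border, i.e. b is included in B."""
    return all(m & c for m in b for c in B)
```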
Given the current solution f, we move to a candidate consistent solution f′ according to the Metropolis stochastic decision [12]:

If O(f′, λ) ≤ O(f, λ) accept f′, else accept f′ with probability

1 / (1 + exp(−β (O(f, λ) − O(f′, λ)))).    (5)
Here β is a tuning parameter with the flavor of an inverse temperature. Low values raise the probability of accepting moves that increase O, thus allowing the process to overcome local minima. High values concentrate the process asymptotically around the lowest values of O, with a high risk, however, of getting stuck in local minima. A suitable schedule of β from low to high values, usually called a cooling schedule [1], will be detailed in the numerical examples section.
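A minimal sketch of the acceptance rule (5): a candidate is always accepted when it does not increase the cost, and otherwise with a logistic probability governed by β. The function is generic; the calling code is assumed to supply the two cost values.

```python
import math
import random

def accept(cost_current, cost_candidate, beta):
    """Metropolis-style decision of Eq. (5)."""
    if cost_candidate <= cost_current:
        return True
    # worse candidates are accepted with probability 1 / (1 + exp(-beta * (O(f) - O(f')))),
    # which shrinks toward 0 as beta grows
    p = 1.0 / (1.0 + math.exp(-beta * (cost_current - cost_candidate)))
    return random.random() < p
```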
Trajectory Bundle
Note that the membership functions we defined are not univocal for a given expansion of a monomial from m to m_d: they depend strictly on the history of literal removals we followed. Thus we may have in general O_d(f, λ) ≠ O_{d′}(f, λ), because the two deletion histories d and d′ leading to f differ either in the sole deletion order from a same initial formula or in the sets of deleted literals (hence coming from two different initial formulas). Moreover, we further enrich this history by considering as well some reinsertions of literals whenever they are accepted by (5). A reinsertion just deletes an item from d while preserving the relative order of the remaining ones. In particular, we may have a conjunction of two deletion histories when a monomial m_{d^k} includes another monomial m′_{d′^h} during the simplification of m and m′. In this case m′_{d′^h} is erased from the DNF and the two histories of m and m′ continue jointly with d_{k+1}. In addition, we decide to reinsert in the back history literals from d′^h after d_{k+1}. Thus we attain two benefits:
• we privilege the locally finer description of the membership function,
• we possibly pass through unseen sites during the walk toward the optimal DNF, in a strategy reminiscent of Tabu Search [8].
Management of the Simplification Steps
We synthesize the simplification steps of our incremental procedure in Table 3. The simplification of an atom of a border gets it closer to the other border, with less chance of a further simplification for the latter. Thus, on the one hand, the selection of the first move is crucial (i.e. the choice of which border is to be modified first, as will be numerically evidenced in the next section). On the other hand, in order to relieve the mentioned bias, in the procedure we alternate cycles removing literals from monomials with cycles removing literals from clauses.
4 Numerical Examples
We checked the effectiveness of our procedure on two test beds: an artificial instance consisting of recovering monotone formulas, plus a widely used benchmark [17].
Table 3. Pseudocode of a Relaxation step and the cooling schedule on the inner border. We can get the analogous routine for the outer border cooling by substituting clauses with monomials

Relaxation(b_{L-1}, B_{L-1})
Begin
  b_L = Inner border cooling cycle(b_{L-1}, B_{L-1})
  B_L = Outer border cooling cycle(b_L, B_{L-1})
  Return (b_L, B_L)
End

Inner border cooling cycle(b, B)
Begin
  FOR (k = 1, k ≤ cyclesteps, k++)
  Begin
    Draw an m ∈ b
    Draw a consistent enlargement m′
    IF StochasticAcceptanceRule(β(k)) is true
    THEN b = (b \ m) ∪ m′
  End
  Return b
End
4.1 Rediscovering DNFs
The experiments focused on learning DNFs on 12 propositional variables drawn randomly so as to have a number of terms between 2 and 7. We considered 100 such formulas. For each of them we generated two kinds of training sets containing 100 examples equally partitioned into positive and negative examples. The first (unbiased) training set is constituted by assignments of the value 0 or 1 to the propositional variables with the same probability 0.5. In the second (biased) training set we draw at random 4 propositional variables to which value 1 is assigned with probability 0.7, while the probability remains 0.5 for the others. The test set is constituted by the whole set of 4096 different 12-bit Boolean vectors labeled according to the DNF to be recovered. To run the procedure in Table 3 we must assess a few parameters. The cost function (3) breaks down as

O(f, λ) = λ Σ_{i=1}^{m} Li + (1 − λ) Σ_{i=1}^{m} Ri,    (6)
where the cost of the uncovered points disappears since, in fact, the consistency constraints never allow the removal of an entire monomial or clause. Thus the cost of each atom reduction is totally independent of the reduction of other atoms of the same border. Apart from parameter λ that fixes the trade-off between formula length and fuzziness, we must specify the cooling schedule of the simulated annealing algorithm. Namely, theoretical and numerical reasons suggest periodically incrementing β in (5), during the search for the best simplification. Our schedule, coming from a set of numerical trials [5], is the following for all experiments in this section. At cooling step k the temperature β is set to β(k) = k/c, with c ranging from 0.05 to 0.2. It stays at this value for νk tosses of candidate solutions, with νk = 30k + νk−1 , starting from ν0 = 0. At the end of a first cooling cycle the search procedure tends to get stuck in a local minimum, since the high value of β does not allow us to try escaping moves from it [1]. Then we start again with the initial temperature to give a second chance to the procedure so as
to descend in a better local minimum. As a matter of fact, no better or even different solution is generally found after two cooling cycles.
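The cooling schedule just described (β(k) = k/c, held for ν_k = 30k + ν_{k−1} candidate tosses with ν_0 = 0, restarted once for a second cycle) can be sketched as follows, reusing accept() from the earlier sketch; propose and cost are placeholders for the move-selection and cost-evaluation routines.

```python
def annealing_schedule(num_steps, c=0.1):
    """Yield (beta, tosses): beta(k) = k / c, and the chain stays at that value
    for nu_k = 30*k + nu_{k-1} candidate tosses (nu_0 = 0)."""
    nu = 0
    for k in range(1, num_steps + 1):
        nu = 30 * k + nu
        yield k / c, nu

def cooling_cycle(solution, propose, cost, num_steps=10, c=0.1):
    """One cooling cycle; the paper runs a second one from the initial temperature."""
    current_cost = cost(solution)
    for beta, tosses in annealing_schedule(num_steps, c):
        for _ in range(tosses):
            candidate = propose(solution)
            candidate_cost = cost(candidate)
            if accept(current_cost, candidate_cost, beta):
                solution, current_cost = candidate, candidate_cost
    return solution
```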
Table 4. Comparing the performances of the FR and C4.5 methods for learning formulas in the polynomial hierarchy. Headers unbiased and biased qualify the training set distribution law. µ and σ are mean and standard deviation over 100 trials; ρ equals the ratio between the found and first issue rule lengths; FP, FFP, FN, FFN denote percentages of False Positives, Fuzzy False Positives, False Negatives, and Fuzzy False Negatives, respectively, determined by the discovered rules

                          DNF                          CNF
                          ρ       FFP     FN           ρ       FP      FFN
FR     unbiased    µ      0.11    3.69    8.45         0.12    8.24    1.59
                   σ      0.07    3.12    6.95         0.05    4.94    1.93
       biased      µ      0.11    3.79    7.25         0.12    8.22    1.93
                   σ      0.06    3.95    6.51         0.05    5.78    2.41
C4.5   unbiased    µ      –       2.24    5.58         –       –       –
                   σ      –       2.45    4.72         –       –       –
       biased      µ      –       2.58    6.09         –       –       –
                   σ      –       2.85    4.93         –       –       –
Table 4 reports the performances of the method (under the heading FR) in terms of: i) the compression rate ρ of the simplified versus first issue formulas; and ii) test set classification accuracy. In regard to the latter, positive points falling within the inner border definitely denote a correct classification; those falling outside the outer border add to the error score, while those falling in the gap between the borders are questionable. If we consider the latter correct, we get a lower bound on the error percentage in the column headed FFN (Fuzzy False Negative) of the quoted table; if we consider them incorrect, we get an upper bound in the FN (False Negative) column of the same table (see Fig. 4). The same happens with the negative points, for which the FFP and FP percentages are defined analogously. If we want to focus on exactly one formula, the error percentages suggest that we use the inner border as the final concept for discriminating positive from negative points. In this case the FFP errors determine a sharp error percentage for the selected formula, and their counterpart remains the quoted FN errors. In two respects, and for two reasons, this option works better than assuming the outer border as the final concept. Accuracy in the second option, synthesized by the FP and FFN columns, is globally worse, and so is the compression rate. The more favorable behavior of the disjunctive representation comes both from the fact that the original formula is a disjunctive form too and from the mentioned privilege we give to the inner border versus the outer one in the simplification procedure. Indeed, before processing the outer border, we first conclude a whole cooling schedule cycle on the inner one. As a result, with this method we allow less room for a restriction of the outer border⁶. The gap between the borders is around 3% of the whole instance space, with a standard
⁶ To give a quantitative idea of this asymmetry, part (a) of the table below refers to results analogous to those in Table 4 (though with a different formula dimensioning), obtained by a process fed with examples generated through a DNF and producing a DNF as the first formula; part (b) refers to a process fed through a CNF and producing a CNF as the first formula.
Fig. 4. Point error contributions. Circles → positive points; diamonds → negative points. White → correctly classified points; gray + black → contribution to falsely classified points; black → contribution to fuzzy falsely classified points. (The figure also shows the inner and outer border contours.)
deviation around 3%, too. This denotes a generally low uncertainty on the final discrimination rule. On average we obtain formulas of slightly smaller length than the original one labeling the examples (with a ratio around 0.94) and a standard deviation around 1/3 of the length. These quantities shift to around 1.5 times the original length, with a standard deviation of 1/2 of it, when we start the simplification from the opposite normal forms. As a matter of fact, 11% of the formulas are exactly recovered by FR with unbiased samples and 10% with biased ones. In the remaining cases we pay for compression with accuracy. In general a slight degradation of all performance indices is registered when we pass from unbiased to biased samples. Fuzzy relaxation demands relatively little computer time. In all these trials it averaged 1.9 seconds (with almost zero variance) on a 2.0 GHz Pentium IV as the reference architecture. With these running times we may compete with the widespread concept learning algorithm C4.5 [16], where a decision tree in terms of IF-THEN-ELSE rules is drawn directly by iterated partitioning of the sampled data on the basis of mutually exclusive tests on their range. The last lines of Table 4 report the analogous results from this algorithm, as available on the web site [17]; they are reported in the columns FFP and FN for comparison with the DNF computed through FR. With an average running time of 0.03 seconds it yields mean description lengths from 1.3 to 1.9 times greater than those provided by FR, and generally worse accuracies. Two orders of magnitude may not be a dramatic gap in running time, since we must consider that the procedures are implemented in different languages: FR is written in Java (thus compiled into bytecode), while the C4.5 code is written in C (thus compiled into machine
             DNF                 CNF
             FFP     FN          FP      FFN
(a)   µ      1.46    1.31        5.95    0.78
      σ      2.79    2.16        5.06    1.42
(b)   µ      1.25    6.98        1.85    2.67
      σ      2.44    5.44        3.35    4.33
language). It is interesting to note how the standard deviations of the various performance indices are generally almost equal to the means. This denotes the presence of some rare hard instances causing very high values of the indices, balancing a higher concentration of cases in the almost one-standard-deviation-wide gap between 0 and the mean. Finally, the correlation between the indices of FR and C4.5 is small, around 0.2, which denotes an inherent difference between them, since many goal formulas are recognized more easily by one method than by the other.
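The error bookkeeping used throughout this section can be made concrete with the sketch below; inner(x) and outer(x) are assumed Boolean predicates for the two borders, and points is a list of (assignment, label) pairs from the test set.

def border_errors(points, inner, outer):
    """Error percentages in the style of Table 4.

    FFN/FN bound the false-negative rate (gap points counted as correct /
    incorrect); FFP/FP bound the false-positive rate analogously.
    """
    n_pos = sum(1 for _, y in points if y) or 1
    n_neg = sum(1 for _, y in points if not y) or 1
    ffn = sum(1 for x, y in points if y and not outer(x)) / n_pos
    fn  = sum(1 for x, y in points if y and not inner(x)) / n_pos
    ffp = sum(1 for x, y in points if not y and inner(x)) / n_neg
    fp  = sum(1 for x, y in points if not y and outer(x)) / n_neg
    return {k: 100.0 * v for k, v in
            {"FFN": ffn, "FN": fn, "FFP": ffp, "FP": fp}.items()}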
4.2 Discriminating Voting Records This is a benchmark reported on the C4.5 website mentioned above. It consists of the votes of 300 U.S. House of Representatives Congresspeople on 16 key votes. The goal is to recognize, on the basis of the votes, whether a Congressperson is a Democrat or a Republican. The records consist of 16 three-valued variables, reporting a favorable (y), unfavorable (n) or unexpressed vote (u), plus a binary variable declaring the party of the voter. To come to a Boolean problem we associated y with 1 and n with 0, and split the records each time we met a u, substituting it once with 1 and once with 0, getting a total of 439 binary records. Analysis of the results from both FR and Quinlan's algorithm clearly highlights the different approaches. In the latter, syntactical strategies are employed to find commonalities between the data strings of the training examples (with possible statistical correctives). The main goal lies in finding the best fit of the data through a decision tree descriptor. Ockham's razor [11] is used as much as possible to simplify the description of the example clusters thus gathered. We took a different approach in drawing a lesson from the data: we look for a set of formulas capturing the inherent syntactical structure of the data. To obtain the values in Table 5 we used a cross-validation strategy, namely 50 different random partitions of the file into training and test sets, where the former comprises 70% of the whole file.
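A sketch of the record encoding described above; how records containing several unexpressed votes were handled is not detailed in the text, so this version simply expands all combinations.

from itertools import product

def expand_record(votes):
    """Encode one record: 'y' -> 1, 'n' -> 0, and each unexpressed vote 'u'
    is replaced once by 1 and once by 0 (so k u's yield 2**k Boolean records)."""
    choices = [{'y': (1,), 'n': (0,), 'u': (1, 0)}[v] for v in votes]
    return [list(bits) for bits in product(*choices)]

print(expand_record("ynu"))   # -> [[1, 0, 1], [1, 0, 0]]; real records have 16 votes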
Table 5. Comparing the performance of the two methods for learning a discriminating rule between voters from two parties. Same notation as in Table 4

            DNF                 CNF
            FFP     FN          FP      FFN
FR     µ    4.45    2.43        5.03    0.84
       σ    1.86    1.02        2.02    0.36
C4.5   µ    2.33    0.32        –       –
       σ    1.32    1.03        –       –
C4.5 definitely fits the data better, with an accuracy about two times greater than ours. However, covering is not explaining. If we analyze the mislabeled data we discover that our approach finds more stable structures, as denoted by the persistence of the errors on some records despite the change of the training set generating the formulas. Indeed, more than half of the misclassified records are incorrectly classified by more than 30 of the 50 inferred formulas. This does not happen with the Quinlan algorithm, where errors are spread almost uniformly over the records: only one quarter of the misclassified records are incorrectly classified by more than 10 formulas. In other words, our borders fail to classify mainly
some atypical records that cannot find an easy explanation through a monotone Boolean formula.
5 Conclusions Confining an unknown function f between tight and weak hypotheses is a common way in which the human brain infers properties about the function and takes operational decisions. It is a functional extension of the confidence interval notion, and is thus based on statistics over the observed examples. In the absence of any indication about f we assign to the examples the sole role of watching for inconsistencies. Then, to render the formulas understandable, we accept the compromise of reducing the sharpness of the borders. This gives rise to an algorithm for learning concepts that proves highly accurate when a monotone formula really underlies the data. Although it does not find the best fit of the data, the structure it finds underlying them pays off in terms of robustness against the randomness of the training set. The algorithm runs fast with few free parameters, making it usable as a default tool for analyzing data.
Acknowledgements This work was partially supported by the project ORESTEIA: mOdular hybRid artEfactS wiTh adaptivE functIonAlities, funded by the European Commission under the grant No. IST2000-26091.
References
1. E. Aarts and J. Korst. Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. Wiley, 1989.
2. R. Andrews and S. Geva. Inserting and extracting knowledge from constrained backpropagation network. In Proc. 6th Australian Conference on Neural Networks, pages 29–32, Sydney, 1995.
3. B. Apolloni, D. Malchiodi, and S. Gaito. Algorithmic Inference in Machine Learning. International Series on Advanced Intelligence. Advanced Knowledge International, Magill, Adelaide, 2003.
4. B. Apolloni, D. Malchiodi, C. Orovas, and G. Palmas. From synapses to rules. Cognitive Systems Research, 3/2:167–201, 2002.
5. B. Apolloni, D. Malchiodi, C. Orovas, and A. M. Zanaboni. Fuzzy methods for simplifying a Boolean formula inferred from examples. In L. Wang, S. Halgamuge, and X. Yao, editors, FSKD'02, Proceedings of the First International Conference on Fuzzy Systems and Knowledge Discovery (November 18–22, 2002, Singapore), volume 2, pages 554–558, 2002.
6. A. Blum. Learning Boolean functions in an infinite attribute space. Machine Learning, 9:373–386, 1992.
7. P. Clark and T. Niblett. The CN2 induction algorithm. Machine Learning, 3:261–284, 1989.
8. F. Glover and M. Laguna. Tabu Search. Kluwer Academic Publishers, Norwell, MA, 1997.
9. H. Hirsh. Generalizing version spaces. Machine Learning, 17(1), 1994.
10. M. Kearns, M. Li, L. Pitt, and L. Valiant. On the learnability of Boolean formulae. In Proc. 19th ACM Symp. on Theory of Computing, pages 285–295, NY, 1987. ACM Press.
11. M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. The MIT Press, Cambridge, 1994.
12. N. Metropolis, A. Rosenbluth, and A. Teller. Equation of state calculations by fast computing machines. J. Chem. Phys., 21:1087–1092, 1953.
13. T. M. Mitchell. Machine Learning. McGraw-Hill Series in Computer Science. The McGraw-Hill Companies, Inc., New York, 1997.
14. M. Muselli and D. Liberati. Binary rule generation via Hamming clustering. IEEE Transactions on Knowledge and Data Engineering, 14:1258–1268, 2002.
15. Z. Pawlak. Rough sets and decision algorithms. In W. Ziarko and Y. Yao, editors, Rough Sets and Current Trends in Computing, pages 30–45, Berlin, 2001. Springer.
16. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, California, 1993.
17. Ross Quinlan home page. http://www.cse.unsw.edu.au/~quinlan/.
18. M. Sebag. Delaying the choice of bias; a disjunctive version space approach. In L. Saitta, editor, Machine Learning, Proceedings of the Thirteenth International Conference (ICML '96), Bari, Italy, July 3–6, 1996, pages 444–452. Morgan Kaufmann, San Francisco, 1996.
19. B. Selman and H. Kautz. Knowledge compilation and theory approximation. Journal of the ACM, 43(2):193–224, 1996.
20. S. C. Tornay. Ockham: Studies and Selections. Open Court, La Salle, IL, 1938.
21. L. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
22. V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.
CHAPTER 27 FUZZY LINEAR PROGRAMMING: A MODERN TOOL FOR DECISION MAKING
PANDIAN VASANT 1,2, R. NAGARAJAN 2 and SAZALI YAACOB 2
1 Department of Mathematics, Mara University of Technology, 88997 Kota Kinabalu, Malaysia, [email protected]
2 School of Engineering and Information Technology, Universiti Malaysia Sabah, 88999 Kota Kinabalu, Malaysia, [email protected] and [email protected]
Abstract: In this paper, the S-curve membership function methodology is used in a real life industrial problem of mix product selection. This problem occurs in the chocolate manufacturing industry, whereby a decision maker, an analyst and an implementer play important roles in making decisions in an uncertain environment. As analysts, we try to find a solution with a higher level of satisfaction for the decision maker to make a final decision. This problem of mix product selection is considered because all the coefficients, namely the objective, technical and resource coefficients, are fuzzy. It is a sufficiently large problem, involving 29 constraints and 8 variables. A decision maker can identify which vagueness (D) is suitable for achieving a satisfactory optimal revenue. The decision maker can also suggest to the analyst some possible and practicable changes in the fuzzy intervals for improving the satisfactory revenue. This interactive process has to go on among the analyst, the decision maker and the implementer until an optimum satisfactory solution is achieved and implemented. Keywords: S-curve membership function, vagueness, degree of satisfaction, decision making
1. INTRODUCTION A non-linear membership function, referred to as the "S-curve membership function", has been used in problems involving interactive fuzzy systems. The modified S-curve membership function [9, 16, 23] can be applied and tested for its suitability through an applied problem. In this problem, the S-curve membership function was applied to reach a decision when all three kinds of coefficients of the mix product selection problem (FPS), namely the objective function coefficients, the technical coefficients and the resources, were fuzzy. The solution thus obtained is suitable to be given to the decision maker and implementer for final implementation. The problem illustrated in this paper is only one of eight cases of FPS problems which occur in real life applications. It is interesting to investigate the fuzzy solution patterns of this FPS problem, and this paper discusses such an aspect. The above case of the FPS problem is
considered in a real life situation, the case of chocolate manufacturing. The data for this problem are taken from the data bank of Chocoman Inc., USA [22]. Chocoman produces varieties of chocolate bars, candy and wafers using a number of raw materials and processes. The objective is to use the modified S-curve membership function to obtain a revenue maximization procedure through a fuzzy linear programming (FLP) approach. Many authors have studied fuzzy linear programming models and used different methodologies in solving problems related to fuzzy optimization [3, 13, 14, 18, 19, 21, 24]. Zimmermann offered a solution for the formulation by fuzzy linear programming [27]. Fuzzy linear programming models are robust and flexible [10, 11, 13]. Decision makers not only consider the existing alternatives under given constraints, but also develop new alternatives by considering all possible situations [28]. Various types of membership functions which express a vague aspiration level of a decision maker have been proposed, such as the linear membership function [6, 12], a tangent-type membership function [29], an interval linear membership function [30], an exponential membership function [23] and an inverse tangent membership function [12]. Since tangent-type, exponential and inverse tangent membership functions are non-linear, a fuzzy mathematical programming problem defined with such a membership function results in non-linear programming. Usually a linear membership function is employed in order to avoid non-linearity. Nevertheless, there are some difficulties in selecting the solution of a problem written with a linear membership function. Therefore a logistic membership function was employed by [23] to overcome the deficits of the linear function. In this paper a more flexible modified logistic membership function, the S-curve membership function, is employed for the fuzzy mix product selection problem.
2. METHODOLOGY OF FPS PROBLEM The methodology for this FLP draws on various works [2, 9, 17, 23, 29, 31]. The approach proposed here is based on an interaction among the decision maker, the implementer and the analyst in order to find a compromised satisfactory solution for a fuzzy linear programming (FLP) problem. In a decision process using an FLP model, resource variables may be fuzzy, instead of precisely given numbers as in the crisp linear programming (CLP) model. For example, machine hours, labor force, material needed and so on in a manufacturing center are always imprecise, because of incomplete information and uncertainty in various potential suppliers and environments. Therefore, they should be considered as fuzzy resources, and the FLP problem should be solved by using fuzzy set theory. The general methodology to solve the FLP is given below. A general model of crisp linear programming is formulated as:
Max z = cx
subject to Ax ≤ b, x ≥ 0        (standard formulation)        (2.1)
where c and x are n-dimensional vectors, b is an m-dimensional vector, and A is an m × n matrix.
Since the industrial problem of mix product selection occurs in an uncertain environment, the coefficients of the objective function (c), the technical coefficients of the matrix (A) and the resource variables (b) are fuzzy. They can therefore be represented by fuzzy numbers, and hence the problem can be solved by the FLP approach. The fuzzy linear programming problem is formulated as:
Max z̃ = c̃x
subject to Ãx ≲ b̃, x ≥ 0        (fuzzy formulation)        (2.2)
where x is the vector of decision variables; Ã, b̃ and c̃ are fuzzy quantities; the operations of addition and multiplication of fuzzy quantities by a real number are defined by Zadeh's extension principle [25]; the inequality relation ≲ is given by a certain fuzzy relation; and the objective function z̃ is to be maximized in the sense of a given crisp LP problem. The Carlsson and Korhonen [2] approach, which allows full trade-off, is considered to solve the FLP problem (2.2), meaning that the solution will have a certain degree of satisfaction.
Firstly, we formulate the membership functions for the fuzzy parameters c̃, Ã and b̃. Here a non-linear membership function such as the logistic function is employed. The membership functions are represented by μaij, μbi and μcj, where aij are the technical coefficients of the matrix A for i = 1,…,m and j = 1,…,n, bi are the resource variables for i = 1,…,m, and cj are the coefficients of the objective function z for j = 1,…,n. Next, through the appropriate transformation, with the assumption of trade-off between the fuzzy numbers ãij, b̃i and c̃j, an expression for ãij, b̃i and c̃j will be obtained. After the trade-off between ãij, b̃i and c̃j, the solution will exist at [2]:
μ = μcj = μaij = μbi  for all i = 1,…,m and j = 1,…,n        (2.3)
Therefore, we can obtain:
c = gc(μ),  A = gA(μ)  and  b = gb(μ)        (2.4)
where μ ∈ [0,1] and gc, gA and gb are inverse functions [4] of μc, μA and μb, respectively. Eq. (2.2) becomes
Max z = [gc(μ)] x
subject to [gA(μ)] x ≤ gb(μ), x ≥ 0        (2.5)
By using the above methodology, one can find an optimal compromise 'in between' as a function of the grades of imprecision in the parameters. Furthermore, one can plot the optimal solutions (zk*: k = 1, 2, 3, 4,…) as a function of the membership grades in two- and three-dimensional graphical mode. The graphics offer a decision maker a clear holistic perception of how the objective function behaves for varying grades of precision, and enable him to arrive at appropriate conclusions. This compromised fuzzy solution will be given to the implementer for further discussion and implementation. In order to solve the FLP it is assumed that the non-linear membership functions and operators are consistent with the judgement of the decision maker and the implementer and with the rationality of the fuzzy decision making process.
3. FORMULATION OF A LOGISTIC FUNCTION As mentioned by [23], a trapezoidal membership function entails some difficulties, such as degeneration (a form of deterioration of the solution), while solving fuzzy linear programming problems. In order to avoid degeneration, we should employ a non-linear logistic function, such as the hyperbolic tangent, which has asymptotes at 1 and 0 [4]. In this case, we employ a logistic function as the non-linear membership function, given by:
f(x) = B / (1 + C e^(Dx))        (3.1)
where B and C are scalar constants and D, 0 < D < ∞, is a fuzzy parameter which measures the degree of vagueness; D = 0 indicates a crisp value, and fuzziness becomes highest as D → ∞. Equation (3.1) takes the form indicated in Figure 3.1 when 0 < D < ∞. The parameter D determines the shape of the membership function f(x), D > 0. The larger D gets, the greater the vagueness, meaning that the availability of the parameters aij, bi and cj becomes lower. The parameter D, which determines the shape of the membership functions, should be decided heuristically and experientially by experts. The logistic function of Equation (3.1) is a monotonically non-increasing function [8], which will be employed as a fuzzy membership function. This is important because, due to the uncertain environment, the availability of the variables is represented by a degree of vagueness, and the membership function is accordingly non-increasing.
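A small numerical illustration of Equation (3.1); the values of B and C here are placeholders (the specific S-curve values are fixed in Section 4), and the printout shows how a larger D pushes the membership grades down, i.e. increases the vagueness.

import math

def logistic(x, B=1.0, C=0.001, D=13.0):
    """Logistic membership function of Eq. (3.1): f(x) = B / (1 + C * exp(D * x))."""
    return B / (1.0 + C * math.exp(D * x))

for D in (5.0, 13.0, 40.0):          # increasing vagueness
    print(D, [round(logistic(x, D=D), 3) for x in (0.0, 0.25, 0.5, 0.75, 1.0)])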
FIG. 3.1 VARIATION OF F(X) WITH RESPECT TO D
The reason why we use this function is that the logistic membership function has a shape similar to that of the hyperbolic tangent function employed by [4], but it is more flexible [1, 17]. It is also known that a trapezoidal membership function is an approximation to a logistic function. Therefore, the logistic function is considered an appropriate function to represent a vague goal level. This function has been found to be very useful in making decisions and in implementation by the decision maker and implementer [16, 23]. The first step is the construction of an S-curve membership function for the FPS problem. This is followed by the formulation of the FLP problem which represents the FPS problem. This mathematical model of the FLP problem will be solved by using the LP toolbox in MATLAB.
4. MODIFIED S-CURVE MEMBERSHIP FUNCTION There are many possible forms for a membership function: linear, exponential, hyperbolic, hyperbolic inverse, piecewise linear, etc. [20]. Here we employ the modified S-curve form as it is not as restrictive as the linear form, but flexible enough to describe the vagueness in the fuzzy parameters. The S-curve membership function is a particular case of the logistic function with specific values of B, C and D; these values are to be determined. This logistic function, as given by Equation (4.1) and depicted in Figure 4.1, is indicated as an S-shaped membership function by [5, 26]. If the obtained membership value of the solution is appropriate and proper, that is, if it lies in (0,1), then regardless of whether we employ a linear or a non-linear membership function in the analysis, the two solutions do not differ much [23]. Nevertheless, a non-linear membership function such as the S-curve membership function can change its shape according to
the parameter values. A decision maker is thus able to apply his strategy to a fuzzy mix product selection problem using these parameters. Therefore, the non-linear membership function is much more convenient than the linear one. We define, here, a modified S-curve membership function as follows:
μ(x) = 1                         for x < xa
     = 0.999                     for x = xa
     = B / (1 + C e^(Dx))        for xa < x < xb
     = 0.001                     for x = xb
     = 0                         for x > xb        (4.1)
where μ is the degree of membership.
Figure 4.1 shows the S-curve. In Equation (4.1) the membership function is redefined so that 0.001 ≤ μ(x) ≤ 0.999. This range is selected because in a manufacturing system the work force need not always be 100% of the requirement, while at the same time it will not be 0%. Therefore there is a range between xa and xb with 0.001 ≤ μ(x) ≤ 0.999. This concept of a range of μ(x) is used in the real life applied problem of mix product selection.
FIG. 4.1 S-CURVE MEMBERSHIP FUNCTION
We rescale the x axis so that xa = 0 and xb = 1 in order to find the values of B, C and D. In [15] such a rescaling was done in the social sciences. The values were calculated analytically as B = 1, C = 0.001001001 and D = 13.813. Here we consider only one FPS problem, in which the objective coefficients, technical coefficients and resource variables are all fuzzy. The FLP model for this problem is given in Equation (4.2). The objective function is the revenue of the FPS problem.
Maximize z̃ = ∑j=1..8 c̃j xj
subject to ∑j=1..8 ãij xj ≲ b̃i,  i = 1,…,29        (4.2)
where c̃j, ãij and b̃i are fuzzy parameters. Equation (4.2) is solved by using the fuzzy parametric programming approach [2] with the modified S-curve membership function as the methodology [16]. The input data are the fuzzy revenue values for cj, the technical coefficients aij and the resource variables bi of the FPS problem. There are 29 constraints and 8 products and hence, in Equation (4.2), i = 1, 2, 3,…,29 and j = 1, 2, 3,…,8. Membership functions and membership values for cj are constructed and evaluated. The FLP problem has been formulated and all the coefficients are parameterized. However, it is not possible to use a linear parametric formulation to solve the FLP problem, since the membership functions are non-linear [2]. Hence we need to carry out a series of experiments for 21 membership values, μaij = μbi = μcj = μ = 0.0010, 0.0509, 0.1008,…, 0.9990, with an interval of 0.0499. These experiments are carried out by using the Simplex Method in the Optimization Toolbox of MATLAB.
5. FUZZY COEFFICIENT FOR OBJECTIVE FUNCTION c̃j Since cj is a fuzzy coefficient of the objective function, as in Equation (5.4), it is denoted as c̃j. Therefore
c̃j = caj + ((cbj − caj) / D) ln[ (1/C) (B/μcj − 1) ]        (5.5)
The membership function μcj and the fuzzy interval, caj to cbj, for c̃j are given in Figure 4.1. Using Equation (5.5), the formulation (4.2) is made equivalent to:
Max ∑j=1..8 [ caj + ((cbj − caj)/D) ln( (1/C)(B/μcj − 1) ) ] xj
subject to ∑j=1..8 [ aaij + ((abij − aaij)/D) ln( (1/C)(B/μaij − 1) ) ] xj ≤ bai + ((bbi − bai)/D) ln( (1/C)(B/μbi − 1) ),  i = 1,…,29        (5.6)
where xj ≥ 0, j = 1, 2, 3,…,8, 0 < μcj, μaij, μbi < 1 and 0 < D < ∞. In Equation (5.6), the best value of the objective function at a fixed level of μ is reached when [2]
μ = μcj = μaij = μbi  for i = 1, 2,…,29; j = 1, 2,…,8        (5.7)
Using Equation (5.6) with the above values of D, B and C, values of μcj and c̃j are generated and computed for the range from μcj = 0.001 to μcj = 0.999. The interval between two adjacent μcj values can be arbitrary, but has to be as small as possible to reach a given level of precision in the optimal solution. Here an interval of 0.0499 for μcj is considered. Kuz'min [9] has indicated that the membership function μcj can be obtained in several ways. One way is by using a functional rule for determining μcj. This observation is adopted in forming a function for c̃j as given in Equation (5.6). Carlsson and Korhonen [2] have also used their own functional rule for the computation of c̃j; the interval for μcj in their works, 0.1, is considerably large.
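A minimal sketch of the computation described in Sections 4 and 5, assuming SciPy is available: for each membership grade μ the fuzzy intervals are converted into crisp coefficients through the inverse S-curve of Equations (5.5)-(5.6), and the resulting crisp LP is solved. The arrays c_lo/c_hi, A_lo/A_hi and b_lo/b_hi stand for the interval data of Tables 6.1-6.5 and are assumed to be supplied by the user; this is not the authors' MATLAB implementation.

import math
import numpy as np
from scipy.optimize import linprog

B, C, D = 1.0, 0.001001001, 13.813        # modified S-curve constants (Section 4)

def crisp(lo, hi, mu):
    """Inverse S-curve of Eq. (5.5): crisp value inside [lo, hi) at grade mu."""
    return lo + (hi - lo) / D * math.log((1.0 / C) * (B / mu - 1.0))

def solve_fps(c_lo, c_hi, A_lo, A_hi, b_lo, b_hi, mu):
    """Solve the crisp LP obtained from the fuzzy model (4.2) at grade mu (Eq. (5.6))."""
    c = crisp(np.asarray(c_lo), np.asarray(c_hi), mu)
    A = crisp(np.asarray(A_lo), np.asarray(A_hi), mu)
    b = crisp(np.asarray(b_lo), np.asarray(b_hi), mu)
    res = linprog(-c, A_ub=A, b_ub=b, bounds=(0, None), method="highs")
    return -res.fun if res.success else None   # maximum revenue z*

# sweep the 21 membership grades 0.0010, 0.0509, ..., 0.9990
mus = [0.0010 + 0.0499 * k for k in range(21)]
# revenues = [solve_fps(c_lo, c_hi, A_lo, A_hi, b_lo, b_hi, mu) for mu in mus]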
6. CASE STUDY ANALYSIS AND FUZZY MODELING Due to limitations in the resources for manufacturing a product and the need to satisfy certain conditions in manufacturing and demand, a problem of fuzziness occurs in production planning systems. This problem also occurs in chocolate manufacturing when deciding on a mixed selection of raw materials to produce varieties of products. This is referred to here as product-mix selection [22]. The fuzzy product-mix selection (FPS) problem is stated as follows: there are n products to be manufactured by mixing m raw materials in different proportions and by using k varieties of processing. There are limitations in the resources of raw materials. There are also some constraints imposed by the marketing department, such as
the product-mix requirement, the main product line requirement and lower and upper limits of demand for each product. All the above requirements and conditions are fuzzy. It is necessary to obtain maximum revenue with a certain degree of satisfaction by using the fuzzy linear programming approach. Chocoman Inc. manufactures 8 chocolate products. There are 8 raw materials to be mixed in different proportions and 9 processes (facilities) to be utilized. The product demand and the material and facility availability are given in Table 6.1 and Table 6.2, respectively. Table 6.3 and Table 6.4 give the mixing proportions and facility usage required for manufacturing each product.
Table 6.1. Demand of product

Synonym   Product                       Fuzzy Interval (x10^3 units)
x1        Milk chocolate, 250 g         MC 250    [500, 625)
x2        Milk chocolate, 100 g         MC 100    [800, 1000)
x3        Crunchy chocolate, 250 g      CC 250    [400, 500)
x4        Crunchy chocolate, 100 g      CC 100    [600, 750)
x5        Chocolate with nuts, 250 g    CN 250    [300, 375)
x6        Chocolate with nuts, 100 g    CN 100    [500, 625)
x7        Chocolate candy               CANDY     [200, 250)
x8        Wafer                         WAFER     [400, 500)
Table 6.2. Raw material and facility availability

Raw Material / Facility (units)    Fuzzy Interval for Availability
Coco (kg)                          [75000, 125000)
Milk (kg)                          [90000, 150000)
Nuts (kg)                          [45000, 75000)
Confectionery sugar (kg)           [150000, 250000)
Flour (kg)                         [15000, 25000)
Aluminum foil (ft2)                [375000, 625000)
Paper (ft2)                        [375000, 625000)
Plastic (ft2)                      [375000, 625000)
Cooking (ton-hours)                [750, 1250)
Mixing (ton-hours)                 [150, 250)
Forming (ton-hours)                [1125, 1875)
Grinding (ton-hours)               [150, 250)
Wafer making (ton-hours)           [75, 125)
Cutting (hours)                    [300, 500)
Packaging 1 (hours)                [300, 500)
Packaging 2 (hours)                [900, 1500)
Labor (hours)                      [750, 1250)
In each table, the entries are given as non-fuzzy data as well as fuzzy data with two limits; the lower limit is crisp whereas the upper limit is fuzzy, and hence the range is fuzzy. For example, in Table 6.1, MC 250 (Milk chocolate, 250 g) has a certainty
of 500,000 units of demand, but the range 625,000 − 500,000 = 125,000 is fuzzy. This fuzziness is due to various reasons, such as the availability and usage of raw materials and the availability and usage of process facilities. It is unnecessary to point out that fuzziness of this nature is inevitable in any large manufacturing center such as Chocoman Inc. The following constraints were established by the sales department of Chocoman:
1. Product mix requirements. Large-sized products (250 g) of each type should not exceed 60% (non-fuzzy value) of the small-sized products (100 g), such that:
x1 ≤ [45%, 75%) x2        (6.1)
x3 ≤ [45%, 75%) x4        (6.2)
x5 ≤ [45%, 75%) x6        (6.3)
2. Main product line requirement. The total sales from candy and wafer products should not exceed 15% (non-fuzzy value) of the total revenues from the chocolate bar products, such that:
[300, 500) x7 + [112.5, 187.5) x8 ≤ [42.19, 70.31) x1 + [16.87, 28.12) x2 + [45, 77) x3 + [18, 30) x4 + [47.25, 78.75) x5 + [19.69, 32.81) x6        (6.4)
Table 6.5. Objective coefficients

Product                        Fuzzy Interval Profit ($/100 units)
Milk chocolate, 250 g          [135, 225)
Milk chocolate, 100 g          [62, 104)
Crunchy chocolate, 250 g       [115, 191)
Crunchy chocolate, 100 g       [54, 90)
Chocolate with nuts, 250 g     [97, 162)
Chocolate with nuts, 100 g     [52, 87)
Chocolate candy                [156, 261)
Chocolate wafer                [62, 104)
By using a linear programming technique we are able to solve the fuzzy mix product selection problem and a fuzzy frontier solution for revenue function could be obtained. The obtained results are summarized in the following section.
7. RESULT OF FUZZY FRONTIER SOLUTION The FPS problem is solved by using MATLAB and its Linear Programming (LP) toolbox. The vagueness is given by D, and μ is the degree of satisfaction. The LP toolbox has two inputs, namely D and μ, in addition to the fuzzy parameters, and one output, z*, the optimal revenue. The given values of the various parameters of chocolate manufacturing are fed to the toolbox. The solution can be tabulated and presented as two- and three-dimensional graphs.
Fig. 7.1. Revenue and Degree of Satisfaction at D = 13.813
From Table 7.1 and Figure 7.2, we can conclude that a higher degree of satisfaction gives a higher value of revenue. But the realistic solution for the above problem exists at a 50% degree of satisfaction [2], that is 608440. From Fig. 7.1 we can see that the fuzzy outcome of the revenue function z* is an increasing function.
7.1 Revenue z* For Various Values Of Vagueness D The membership value μ in Figure 7.1 represents the degree of satisfaction and z* is the revenue function of the FPS problem. We can observe that when the vagueness increases, the revenue value at a particular μ decreases. This phenomenon actually occurs in real life problems in a fuzzy environment. The ideal solution in a fuzzy environment exists at μ = 0.5 [2, 16, 17, 23]. Hence the results for a 50% degree of satisfaction for 2 ≤ D ≤ 40 and the corresponding values of z* are presented in Table 7.1.
Fig. 7.2. Revenue and degree of satisfaction for 2 ≤ D ≤ 40
Table 7.1: Vagueness D and revenue z* at 50% degree of satisfaction

Vagueness D    Revenue z*
2              761960
4              758940
6              745660
8              709900
10             666350
12             631840
14             606370
16             587130
18             572130
20             560110
22             550260
24             542050
26             535090
28             529120
30             523950
32             519420
34             515420
36             511850
38             508560
40             504920
The data in Table 7.1 are the outcome of analyzing the FPS problem with the fuzzy mathematical model of Equation (5.6). These data are very useful for the decision maker in taking a specific decision towards implementation, after consulting the implementer. The three-dimensional plot of μ, D and z* is shown in Figure 6.1. It is found that the S-curve membership function with varying values of D offers an acceptable solution with a certain degree of satisfaction in a fuzzy environment; more vagueness results in less revenue. The relationship between z*, μ and D is given in Figure 6.1. These figures and tables are very useful for the decision maker to find the revenue at any given value of D with degree of satisfaction μ. From Figure 6.1, it can be seen that a higher degree of satisfaction does not necessarily give a higher revenue, but at a 99% degree of satisfaction the revenue value is largest even for higher values of vagueness. From the diagonal values in Figure 6.1, we can conclude that when vagueness increases in the fuzzy parameters the revenue reduces. This means one should settle for a certain degree of satisfaction when making decisions in a fuzzy environment. The result shows that the outcome hardly depends on the decision made at an early stage about the input level for the fuzzy parameters of the objective coefficients, technical coefficients and resource variables.
Fig. 6.1: Variation of Revenue z* in terms of μ and D
From the theory and numerical results, it can be seen that the method presented here for solving the fuzzy mix product selection problem with the modified S-curve membership function is very promising and encouraging. Moreover, in employing the modified S-curve membership function as the methodology for denoting fuzzy parameters, the solution μ of the fuzzy mix product selection problem satisfies the relation 0 < μ < 1 and, therefore, lies on the efficient new frontier solution.
8. CONCLUSION The S-curve membership function was used to generate fuzzy parameters for solving an industrial production planning problem. These parameters are defined in terms of the fuzzy linear programming problem, namely the fuzzy coefficients of the objective function, the fuzzy technical coefficients and the fuzzy resource variables. Membership values for these fuzzy parameters were created by using the S-curve membership function. This formulation is found to be suitable for applying the Simplex Method in a linear programming (LP) approach. This approach to solving an industrial production planning problem has ramifications for the decision maker, the implementer and the analyst. It is to be noted that higher revenue need not lead to a higher degree of satisfaction. The decision maker has to be satisfied with the revenue obtained through the FLP process with respect to the degree of satisfaction. Therefore there must be interaction between the analyst and the decision maker to continue this process until the decision maker is satisfied with his preferred solution. As analysts, we need to work hand in hand with the decision maker so that he gets the best outcome (highest degree of satisfaction) from this interactive process towards achieving a higher profit in a situation filled with vagueness. Furthermore, for the problem considered, the optimal satisfactory solution indicates that incorporating fuzziness in the objective function, technical coefficients, resource variables and decision variables of a linear programming model provides a better level of satisfaction with the obtained result than non-fuzzy linear programming.
ACKNOWLEDGEMENT The authors would like to sincerely thank the reviewers for their very valuable comments and suggestions for the improvement of this paper.
REFERENCES
1. Bells S (1999) Flexible membership functions. http://www.louderthanabomb.com/spark_features.html
2. Carlsson C, Korhonen PA (1986) A parametric approach to fuzzy linear programming. Fuzzy Sets and Systems 20: 17-30
3. Chen HK, Chou HW (1996) Solving multiobjective linear programming problems – a generic approach. Fuzzy Sets and Systems 82: 35-38
4. Freeling ANS (1980) Fuzzy sets and decision analysis. IEEE Transactions on Systems, Man and Cybernetics 10: 341-354
5. Goguen JA (1969) The logic of inexact concepts. Synthese 19: 325-373
6. Hannan EL (1981) Linear programming with multiple fuzzy goals. Fuzzy Sets and Systems 6: 235-248
7. Hu CF, Fang SC (1999) Solving fuzzy inequalities with piecewise linear membership functions. IEEE Transactions on Fuzzy Systems 7: 230-235
8. Jeffery A (1996) Mathematics for engineers and scientists. Chapman and Hall, London
9. Kuz'min VB (1981) A parametric approach to description of linguistic values of variables and hedges. Fuzzy Sets and Systems 6: 27-41
10. Lai TY, Hwang CL (1993) Possibilistic linear programming for managing interest rate risk. Fuzzy Sets and Systems 54: 135-146
11. Lai TY, Hwang CL (1994) Fuzzy multi objective decision making: methods and applications. Springer-Verlag, Berlin
12. Leberling H (1981) On finding compromise solutions in multicriteria problems using the fuzzy min-operator. Fuzzy Sets and Systems 6: 105-118
13. Lootsma FA (1997) Fuzzy logic for planning and decision making. Kluwer Academic Publishers, Dordrecht/Boston/London
14. Maleki HR, Tata M, Mashinchi M (2000) Linear programming with fuzzy variables. Fuzzy Sets and Systems 109: 21-33
15. Nowakowska N (1977) Methodological problems of measurement of fuzzy concepts in the social sciences. Behavioral Science 22: 107-115
16. Pandian MV (2002) A methodology of decision making in an industrial production planning using interactive fuzzy linear programming. M.Sc. Thesis, Universiti Malaysia Sabah
17. Pandian MV (2003) Application of fuzzy linear programming in production planning. Fuzzy Optimization and Decision Making 3: 229-241
18. Parra MA, Terol AB, Rodríguez Uría MV (1999) Theory and methodology: solving the multi-objective possibilistic linear programming problem. European Journal of Operational Research 117: 175-182
19. Rommelfanger H (1996) Fuzzy linear programming and applications. European Journal of Operational Research 92: 512
20. Sakawa M (1983) Interactive computer program for fuzzy linear programming with multiple objectives. International Journal of Man-Machine Studies 18: 489-503
21. Sengupta A, Pal TK, Chakraborty D (2001) Interpretation of inequality constraints involving interval coefficients and a solution to interval linear programming. Fuzzy Sets and Systems 119: 129-138
22. Tabucanon MT (1996) Multi objective programming for industrial engineers. In: Mathematical programming for industrial engineers, Marcel Dekker, Inc., New York, pp 487-542
23. Watada J (1997) Fuzzy portfolio selection and its applications to decision making. Tatra Mountains Mathematical Publications 13: 219-248
24. Wu YK, Guu SM (1999) Two phase approach for solving the fuzzy linear programming problems. Fuzzy Sets and Systems 107: 191-195
25. Zadeh LA (1971) Similarity relations and fuzzy orderings. Information Sciences 3: 177-206
26. Zadeh LA (1975) The concept of a linguistic variable and its application to approximate reasoning I, II, III. Information Sciences 8: 199-251, 301-357; 9: 43-80
27. Zimmermann HJ (1976) Description and optimization of fuzzy systems. International Journal of General Systems 2
28. Zimmermann HJ (1978) Fuzzy programming and linear programming with several objective functions. Fuzzy Sets and Systems 1: 45-55
29. Zimmermann HJ (1985) Application of fuzzy set theory to mathematical programming. Information Sciences 36: 25-58
30. Zimmermann HJ (1987) Fuzzy Sets, Decision Making, and Expert Systems. Kluwer, Boston
31. Zimmermann HJ (1991) Fuzzy Set Theory and Its Applications. Kluwer, Boston
CHAPTER 28 Fuzzy Logic Control in Hybrid Power Systems
Josiah Munda1, Sadao Asato2, and Hayao Miyagi3
1 Tshwane University of Technology, Power Engineering Department, Private Bag X680, Pretoria 0001, South Africa, [email protected]
2 Okiden Sekkei Co Inc, Engineering Department, 5-2-1 Makiminato, Urasoe, Okinawa 901-2131, Japan, [email protected]
3 University of the Ryukyus, Information Engineering Department, 1 Senbaru, Nishihara, Okinawa 903-0213, Japan, [email protected]
Abstract: In this chapter the dynamic analysis of a hybrid power system, employing diesel and locally available wind and solar energies, is discussed and performed. Diesel engines drive synchronous generators, which are equipped with governors and automatic voltage regulators. Self-excited induction generators are used to convert the mechanical energy of wind turbines into electrical energy, and the value of the terminal capacitors is controlled so as to maintain constant terminal voltage. Solar energy is converted into electrical energy in the PV arrays. To obtain maximum power output from the wind turbines and PV arrays, as wind speed and solar irradiance vary, fuzzy logic control is introduced in the control of various system parameters. To guarantee the performance of the fuzzy controller based on stability and robustness properties, a sliding mode control aspect is incorporated in this study. Keywords: Fuzzy logic control, sliding mode control, power system, wind turbine, PV arrays
1 Introduction Hybrid power systems employing different forms of renewable energy sources are nowadays a common feature the world over. Wind energy conversion systems and photovoltaic (PV) systems are extensively used in areas with high wind and solar potentials. These systems, which are very common in remote areas, are exposed to extreme variations in wind speed and solar insolation levels. This, coupled with typical load fluctuations, poses unique challenges in the design and operation of such systems [1]. Self-excited induction generators are used in wind energy conversion systems due to a number of factors, some of which are: simplicity, ruggedness, lower capital cost, no need for a separate exciter, simpler starting, and ability to produce electrical energy at a wide range of speeds. Many researchers have studied control means to achieve almost constant power irrespective of wind speed, within wind cut-in and cut-out speeds, which are about 5 m/s and 25 m/s respectively.
Power system control is required to maintain a continuous balance between electrical generation and a varying load demand, while system frequency, voltage levels and security are maintained. Stability and control analysis of self-excited induction generators is complicated by the fact that the terminal voltage and frequency are not initially known. Another problem is the nonlinear nature of the magnetizing reactance. To implement adequate control, it is essential to clearly understand the dynamic performance of each element of the system. For practical purposes only those dynamics, which are relevant to a particular time scale, need to be included in the models of the system [2]. This aspect of modelling is the biggest challenge in conventional control strategies, especially for large interconnected power systems. The concept of fuzzy logic control (FLC) significantly alters the control approach, as it does not adjust the system control parameters based on accurate mathematical models of the process dynamics [3]. One of the earliest fuzzy logic applications in power systems was in the area of intelligent control. Fuzzy systems, utilizing heuristic knowledge, have been employed very effectively as controllers. These controllers, modelling the thinking process of a human expert, replace the conventional complex nonlinear controllers and are easier to implement [4]. Most of the available controllers, however, concentrate on enhancing control by governors, AVR, and FACTS devices on systems supplied only from synchronous generators, whereby simulation tests are performed to confirm improved operating conditions from the point of view of synchronous machine parameters [5]–[8]. In [9], a fuzzy logic control technique is implemented to design a tracking controller with the objective of identifying and extracting the maximum power from a wind energy system and transferring this power to a utility. This study considers the hybrid power system case, that is, where supply is from both conventional sources of energy and renewable ones. For hybrid power systems, in which the contribution from renewable energy conversion systems is very small compared with the diesel power output, most system operators simply employ governor control (diesel engine) while letting, say, the wind power output to vary freely within wind cut-in and cut-out speeds. Where the contribution from the renewable sources is relatively big, it becomes necessary to derive as much power as possible from these sources so as to keep the use of the diesel system in the most cost effective range, bearing in mind the related environmental impacts. Sliding mode control, based on the theory of variable structure systems, has attracted a lot of research on control systems for over 20 years now. The main advantage of sliding mode control is robustness against structured and unstructured parameter uncertainties [10]. This article considers fuzzy logic control of the wind turbine blade pitch angle, the stability analysis of which is derived from sliding mode control theory. Simulation studies are conducted to compare the performance of fuzzy logic controllers with that of conventional control methods for various abnormal operating conditions in the system. The control targets are: to maintain synchronous generator frequency and terminal voltages within their respective limits, and to keep the induction generator slip as much as possible within the stable operating region. 
This chapter is organised as follows: Modelling of the system machines is given in Section 2. In Section 3, the sliding mode control strategy is discussed. Section 4 deals with the fuzzy logic controller details. Simulation studies are then presented in Section 5, followed by conclusions in Section 6.
2 Modelling The system under consideration is given in Fig. 1. In this figure, SG1 and SG2 represent aggregate synchronous generators, IG1 and IG2 – aggregate induction generators, PV – photovoltaic cells, B1–B10 – buses, and L1–L19 – connected loads.
Fig. 1. Test system
The synchronous generator equations are written with the assumptions that: the machine can be represented by a voltage behind transient impedance, the phase angle of the voltage coincides with the rotor angle, the system is initially in sinusoidal steady state, and the machine is equipped with a governor and an AVR which can be represented by first-order equations. These equations are [11]:
dδi/dt = ωi − ω0 = ω̃i        (1)
Mi dω̃i/dt = Pmi + ΔPmi − Pei        (2)
dΔPmi/dt = −(1/TGi)[KGi ω̃i + ΔPmi] + UGi/TGi        (3)
T′doi dE′qi/dt = Efdi + ΔEfdi − Eqi − (xdi − x′di) Idi        (4)
dΔEfdi/dt = −(1/TAi)[KAi ΔVi + ΔEfdi]        (5)
where δi = generator rotor angle [rad], ωi, ω0 = rotor speed and synchronous speed, respectively [rpm], Pei = electrical power output [p.u.], Pmi = mechanical power input [p.u.], Mi = moment of inertia [p.u.], Idi = d-axis component of stator current [p.u.], E′qi = q-axis component of stator voltage [p.u.], xdi, x′di = d-axis synchronous and transient reactance, respectively [p.u.], T′doi = open circuit d-axis time constant [s], Efdi = excitation field voltage [p.u.], KAi, TAi = controller gain and time constant of the AVR, KGi, TGi = controller gain
and time constant of the governor, UGi = fuzzy controller output signal for the governor, and ΔPmi, ΔEfdi, ΔVi = change in mechanical power, field voltage, and terminal voltage, respectively. For the induction generators, we shall assume that the phase angle of the voltage coincides with the rotor angle. The mechanical torque is obtained from a wind turbine, which is connected through a gearbox. The mathematical models of the induction generator are derived as [11]:
2Hk dsk/dt = Tgk − (Tmk + ΔTmk)        (6)
dE′dk/dt = −[E′dk − (xk − x′k) Iqk] / T′ok + ωb sk E′qk        (7)
dE′qk/dt = −[E′qk + (xk − x′k) Idk] / T′ok − ωb sk E′dk        (8)
where Tgk = Pgk /ωk = electromagnetic (developed) torque [p. u.], Pgk = electrical power output [p. u.], Hk = inertia constant [s], Tmk = mechanical torque [p. u.], ∆Tmk = change in mechanical = open circuit time constant [s], ω , ω = bus and torque, due to variations in wind speed, Tok b k rotor speed respectively [rpm], and sk = slip [p. u.], where:
sk = (ωb − ωk) / ωb        (9)
The mechanical torque from the wind turbine can be represented by the following expression [12]:
Tmk = Ctk Ktk Vw²        (10)
where Ctk = [−C1k βk − C2k] γk + [C3k βk + C4k],  Ktk = ρ π Rk³ / 2,  γk = Rk ωk / (pk Gk Vw).
Here C1k — C4k = constants, Ctk = torque coefficient, βk = pitch angle of windmill blade [deg], Vw = wind speed [m/s], γk = angular speed rate of windmill, ρ = air density, Rk = radius of windmill [m], Gk = gear ratio of gearbox, pk = number of pole pairs of induction generator. The initial change in the mechanical torque is assumed to be initiated by the variations in wind speed. The resulting change is then neutralised by an appropriate change in the pitch angle of the turbine blades, leading to the overall change in mechanical torque: ∆Tmk = ∆Tmkw + ∆Tmkβ
(11)
where ∆Tmkw is the wind component, and ∆Tmkβ - pitch angle part. The equations for the control of the self-excitation capacitor, and pitch angle of wind turbine blade will be represented as follows:
dΔXSEk/dt = −(1/TSEk)[KSEk ΔVk + ΔXSEk]        (12)
dΔβk/dt = −(1/Tβk)[Kβk ΔVw + Δβk] + Uβk/Tβk        (13)
Here KSEk, Kβk = controller gains of the self-excitation and blade pitch angle controllers; TSEk, Tβk = time constants of the self-excitation and pitch angle controllers; ΔVk, ΔXSEk, ΔVw = change in terminal voltage, self-excitation reactance, and wind speed; and Uβk = fuzzy controller output signal for the blade pitch angle. The voltage equation, from which ΔVk can be obtained, is written as:
E′k = Vk + (rk + jx′k) Ik        (14)
where E′k = E′dk + jE′qk = |E′k| ∠δk, Vk = terminal voltage [p.u.], rk = stator resistance [p.u.], and E′dk, E′qk = d-axis and q-axis components of stator voltage, respectively [p.u.]. The PV array output power can be modelled simply as a combination of a dc component and an ac one [13]:
PPV(t) = PDC + KAC sin(ωPV t)        (15)
where ωPV = 2πFPV , and FPV , the PV array output variation frequency may be arbitrarily determined depending on typical area solar insolation patterns.
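For completeness, Equation (15) in code; the numeric values are arbitrary illustrative placeholders.

import math

def pv_output(t, P_dc=0.8, K_ac=0.1, F_pv=0.05):
    """PV array output of Eq. (15): a dc component plus a slow ac variation."""
    return P_dc + K_ac * math.sin(2.0 * math.pi * F_pv * t)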
3 Sliding Mode Control The equation of motion of an induction generator (6) can be written in state variable form as:
ẋ1k = x2k
ẋ2k = (ωb / 2Hk)[Tmk + ΔTmk − Tgk]        (16)
where x1k = δk − δok and x2k = dx1k/dt = ωb(sok − sk) [14]. δk and δok are the induction generator rotor angle and equilibrium state angle, respectively, and sok is the equilibrium state slip. Sliding mode control is a feedback control with switching parameters. The switching line A that gives the desired dynamic characteristics to the control system is defined by: A = λx1k + x2k
(17)
where λ is the slope of the switching line, and λ > 0. By assuming a Lyapunov function of the form V = (1/2)A², the condition for sliding motion to occur on A = 0 can be written as: V̇ = AȦ = λA² + ATa < 0
(18)
where Ta is the effective acceleration torque, given by:
Ta = (ωb / 2Hk)(Tmk + ΔTmk − Tgk) − λ² x1k        (19)
From (18), the following two conditions must be satisfied for the system to be stabilized:
Ta < 0 and |Ta| > λA  if A > 0
Ta > 0 and Ta > |λA|  if A < 0        (20)
In terms of the mechanical torque, the conditions become:
Tmk + ΔTmk < Tgk + (2Hk/ωb) λ² x1k  if A > 0
Tmk + ΔTmk > Tgk + (2Hk/ωb) λ² x1k  if A < 0        (21)
where Tk = Tmk + ΔTmk. Let us now introduce the question of modelling uncertainties. It is generally very difficult to measure rotor angles in multimachine power systems. The rotor angle of an induction machine has even been described by various researchers as having no consequence in stability analysis [11]. To deal with this difficulty, assume the range of uncertainty of the equilibrium point to be δokmin ≤ δok ≤ δokmax. Also, introduce a sigmoid function ρk, defined as: ρk = −0.5 + 1.0/(1.0 + e^(−τk))
(22)
where τk = σ1 |x2 | + σ2 |Tmk + ∆Tmk − Tgk |; σ1 , σ2 are positive constants; δokmin = δk − ρk , δokmax = δk + ρk [7]. The new set of conditions then becomes:
Tk ≤ Tgk + (2Hk/ωb) λ² ρk  if x2k > λρk
Tk ≥ Tgk + (2Hk/ωb) λ² ρk  if x2k < −λρk        (23)
Finally, to test robustness to parameter variation, we assume a range of values of the inertia constant, Hkmin < Hk < Hkmax, and consider the worst case scenario:
Tk ≤ Tk* = Tgk + (2Hkmin/ωb) λ² ρk  if x2k > λρk
Tk ≥ Tk* = Tgk + (2Hkmax/ωb) λ² ρk  if x2k < −λρk        (24)
where Tk∗ is the desired mechanical torque for the generator to be stable. Equation (24) represents the final form of the slope of the switching line A and the desired generator mechanical torque for stable operation, taking into account the uncertainty of δ and parameter variations of H.
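A small sketch of the robust stability test of Equations (22)-(24); the numerical parameter values are illustrative placeholders, not values from the chapter.

import math

def sliding_mode_check(x2k, Tk, Tgk, lam=1.0, wb=377.0,
                       Hk_min=2.0, Hk_max=4.0, sigma1=1.0, sigma2=1.0):
    """Evaluate the robust sliding-mode conditions (22)-(24).

    x2k : speed-deviation state, Tk = Tmk + dTmk, Tgk : developed torque.
    Returns the desired torque bound Tk* (or None inside the boundary layer)
    and whether the current operating point satisfies the condition.
    """
    tau = sigma1 * abs(x2k) + sigma2 * abs(Tk - Tgk)
    rho = -0.5 + 1.0 / (1.0 + math.exp(-tau))            # sigmoid of Eq. (22)
    if x2k > lam * rho:                                   # first case of Eq. (24)
        T_star = Tgk + 2.0 * Hk_min / wb * lam ** 2 * rho
        return T_star, Tk <= T_star
    if x2k < -lam * rho:                                  # second case of Eq. (24)
        T_star = Tgk + 2.0 * Hk_max / wb * lam ** 2 * rho
        return T_star, Tk >= T_star
    return None, True                                     # neither case applies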
4 Fuzzy Logic Controllers The basic configuration of a Fuzzy Logic Controller (FLC) is shown in Fig. 2 [6]. Fuzzy controllers consist of a collection of control rules (if-then rules) describing the behaviour of the controller by employing linguistic terms (fuzzification stage), an inference mechanism (decision making logic), and an output interface (defuzzification stage). At the decision making
stage, the fuzzy control is obtained via a compositional rule of inference, usually the max-min compositional rule. The knowledge base consists of a database, which provides necessary definitions used to define linguistic control rules and fuzzy data manipulation, and a rule base, which characterizes the control goals and control policy of the domain experts by means of a set of linguistic control rules.
Fig. 2. Configuration of a typical FLC (blocks: input, normalization and fuzzification, decision making logic, defuzzification, output, knowledge base)
In this study, we consider two-input, single-output fuzzy controllers. The control inputs for the governor are the deviation in rotor speed and the associated rate of change of the speed deviation. The membership functions for the speed deviation and the change in speed deviation are composed of 5 and 3 parts, respectively. The set of conditional statements used to control the two variables is given in Table 1.
Table 1. Fuzzy rules for governor control

∆ω̇2 \ ∆ω2   NB    NS    Z     PS    PB
N            PB    PM    PT    NT    NS
Z            PM    PS    Z     NS    NM
P            PS    PT    NT    NM    NB
The control strategy is to increase the mechanical power input as the speed decreases, whether the decrease is brought about by a reduction in the output power of the renewable sources or by an increase in load. For example, the first rule states that if the error in speed is negative big (say, due to an increase in load) and the error rate is negative, then the change in input power should be positive big. The abbreviations used in Table 1 stand for: N: Negative, NB: Negative Big, NM: Negative Medium, NS: Negative Small, NT: Negative Tiny, P: Positive, PB: Positive Big, PM: Positive Medium, PS: Positive Small, PT: Positive Tiny, Z: Zero.
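As a concrete illustration of how such a rule base can be represented, the following Python sketch encodes the governor rules of Table 1 as a lookup table, together with triangular membership functions for the 5-part speed-deviation partition on the normalized interval [0, 1]; the membership function breakpoints are assumptions for illustration, since the chapter does not specify them.

# Governor rule base of Table 1: (speed-deviation label, rate label) -> output label
GOV_RULES = {
    ("NB", "N"): "PB", ("NS", "N"): "PM", ("Z", "N"): "PT", ("PS", "N"): "NT", ("PB", "N"): "NS",
    ("NB", "Z"): "PM", ("NS", "Z"): "PS", ("Z", "Z"): "Z",  ("PS", "Z"): "NS", ("PB", "Z"): "NM",
    ("NB", "P"): "PS", ("NS", "P"): "PT", ("Z", "P"): "NT", ("PS", "P"): "NM", ("PB", "P"): "NB",
}

def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Illustrative 5-part partition of the normalized speed-deviation universe [0, 1]
SPEED_DEV_MF = {
    "NB": lambda x: triangular(x, -0.25, 0.00, 0.25),
    "NS": lambda x: triangular(x,  0.00, 0.25, 0.50),
    "Z":  lambda x: triangular(x,  0.25, 0.50, 0.75),
    "PS": lambda x: triangular(x,  0.50, 0.75, 1.00),
    "PB": lambda x: triangular(x,  0.75, 1.00, 1.25),
}

# Example: membership degrees of a normalized speed deviation of 0.6
degrees = {label: mf(0.6) for label, mf in SPEED_DEV_MF.items()}   # Z: 0.6, PS: 0.4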
For the blade pitch angle control, based on (24), we shall use the deviations in generator rotor speed, x2k , and mechanical torque, Tk − Tk∗ , corresponding to deviations in wind speed. The membership functions for these control inputs consist of 5 parts each. The corresponding set of conditional statements to control the variables in this case is provided in Table 2. The abbreviations used here have the same meanings as those used in Table 1.
Table 2. Fuzzy rules for pitch angle control

Tk − Tk∗ \ x2k   NB    NS    Z     PS    PB
NB               NB    NM    NM    NS    Z
NS               NS    Z     PS    PS    PM
Z                NM    NS    NS    Z     PS
PS               NS    NS    Z     PS    PS
PB               Z     PS    PM    PM    PB
In both cases the input and output variables are transformed, by means of appropriate scaling factors, into the normalized values used in fuzzy systems theory (in the interval [0, 1]). We then employ the correlation product inference mechanism and the centre of gravity defuzzification method commonly used in fuzzy control systems [15][16] to generate the required crisp control signals. The output of the fuzzy controller is a fuzzy set, while the power system under control requires nonfuzzy control values; hence the need for defuzzification, that is, the conversion of fuzzy quantities to crisp values. For example, suppose we have a set of premises:
Ri: if εi (error in x2k) is Ai and εj (error in Tk) is Bi then ui (control) is Ci
The fuzzy control is obtained through the compositional rule of inference as:
U_i = \max[\min(\mu_{A_i}, \mu_{B_i}, \mu_{C_i})], \qquad i = 1, 2, \ldots, n    (25)
where µAi, µBi, µCi are the membership functions of the respective parameters. We can express the firing strengths (weights) Wi of the premises as:
W_i = \min[\mu_{A_i}(\varepsilon_{i0}), \mu_{B_i}(\varepsilon_{j0})]    (26)
where µAi(εi0), µBi(εj0) are regarded as the degrees of partial match between the user-supplied data and the data in the rule base. A crisp control action is then obtained as the weighted combination:
u_{\beta k} = \frac{\sum_{i=1}^{n} U_i W_i}{\sum_{i=1}^{n} W_i}    (27)
5 Simulations
In the simulation studies, we consider three abnormal operating conditions for the system given in Fig. 1: (a) a short circuit in the middle of one of the parallel lines connecting buses 1 and 2, (b) a uniform increase in system load, and (c) a decrease in wind speed. Synchronous
generator SG2 is equipped with a fuzzy-controlled governor, and the wind turbines connected to induction generators IG1 and IG2 have their pitch angles controlled by FLCs. Time domain simulation studies are performed for the above faults, and the results are given in Figs. 3–7 below. For faults (a) and (b), tests are done for two cases: (i) conventional control of the governor and (ii) fuzzy control of the governor. Similar tests are performed for fault (c) by considering pitch angle control and variation in the inertia constant of the induction generators. Synchronous generator rotor angles are drawn for SG2, taking SG1 as the reference machine.
Fig. 3. SG2 rotor angle for a short circuit (rotor angle [rad] vs. time [sec]; conventional vs. fuzzy control)
Fig. 4. IG2 slip for a short circuit (slip [p.u.] vs. time [sec]; conventional vs. fuzzy control)
Fig. 5. IG1 slip for a decrease in wind speed (slip [p.u.] vs. time [sec]; conventional vs. fuzzy control)
Fig. 6. SG2 rotor angle for a decrease in wind speed (rotor angle [rad] vs. time [sec]; conventional vs. fuzzy control)
6 Conclusions
This chapter introduced fuzzy logic controllers for the governor and the windmill blade pitch angle in the control of a hybrid power system. Simulation results show an improvement in the time taken for the individual generating machine parameters to reach new stable values following the various fault cases. In the derivation of the fuzzy logic control quantities, the sliding mode control strategy was incorporated in order to ensure stability and robustness of the control system to parameter variations. The control of a hybrid power system with actual wind and solar insolation data over a reasonably long period of time is a topic for future work.
Fig. 7. SG2 angle for a variation in Vw and H1 (rotor angle [rad] vs. time [sec]; fuzzy control with H vs. fuzzy control with 0.8H)
Acknowledgement
The authors gratefully acknowledge the financial and material assistance of The Okinawa Electric Power Company, Japan.
References
1. Bowen A.J, Cowie M, Zakay N (2001) Renewable Energy 22: 429–445
2. Anderson P.M, Fouad A.A (1977) Power System Control and Stability. Iowa State University Press, Ames
3. Mohamed A.Z et al. (2001) Renewable Energy 23: 235–245
4. Srinivasan D, Liew A.C, Chang C.S (1995) Electric Power Systems Research 35: 39–43
5. Lie T.T, Shrestha G.B, Ghosh A (1995) Electric Power Systems Research 33: 17–23
6. Lakshmi P, Khan M.A (1998) Electric Power Systems Research 47: 39–46
7. Senjyu T, Uezato K (1998) Journal of Intelligent and Fuzzy Systems 6: 209–221
8. Rastegar H et al. (1999) Electric Power Systems Research 50: 191–204
9. Mohamed A.Z, Eskander M.N, Ghali F.A (2001) Renewable Energy 23: 235–245
10. Ha Q.P, Rye D.C, Durrant-Whyte H.F (1999) Automatica 35: 607–616
11. Pavella M, Murthy P.G (1994) Transient Stability of Power Systems: Theory and Practice. John Wiley & Sons Ltd., England
12. Long Y et al. (2000) Transactions IEE Japan 120B(3): 346–353
13. Kim H, Okada N, Takigawa K (2001) Solar Energy Materials & Solar Cells 67: 559–569
14. Munda J.L, Miyagi H (2002) International Journal of Power and Energy System 22: 8–15
15. Lee C.C (1990) IEEE Trans. Syst. Man Cybernetics 20(2): 404–435
16. Pedrycz W (1993) Fuzzy Control and Fuzzy Systems. Research Studies Press Ltd, England