Forging New Frontiers: Fuzzy Pioneers I
Studies in Fuzziness and Soft Computing, Volume 217 Editor-in-chief Prof. Janusz Kacprzyk Systems Research Institute Polish Academy of Sciences ul. Newelska 6 01-447 Warsaw Poland E-mail:
[email protected]
Further volumes of this series can be found on our homepage: springer.com
Vol. 202. Patrick Doherty, Witold Łukaszewicz, Andrzej Skowron, Andrzej Szalas: Knowledge Representation Techniques: A Rough Set Approach, 2006. ISBN 978-3-540-33518-4
Vol. 203. Gloria Bordogna, Giuseppe Psaila (Eds.): Flexible Databases Supporting Imprecision and Uncertainty, 2006. ISBN 978-3-540-33288-6
Vol. 204. Zongmin Ma (Ed.): Soft Computing in Ontologies and Semantic Web, 2006. ISBN 978-3-540-33472-9
Vol. 205. Mika Sato-Ilic, Lakhmi C. Jain: Innovations in Fuzzy Clustering, 2006. ISBN 978-3-540-34356-1
Vol. 206. A. Sengupta (Ed.): Chaos, Nonlinearity, Complexity, 2006. ISBN 978-3-540-31756-2
Vol. 207. Isabelle Guyon, Steve Gunn, Masoud Nikravesh, Lotfi A. Zadeh (Eds.): Feature Extraction, 2006. ISBN 978-3-540-35487-1
Vol. 208. Oscar Castillo, Patricia Melin, Janusz Kacprzyk, Witold Pedrycz (Eds.): Hybrid Intelligent Systems, 2007. ISBN 978-3-540-37419-0
Vol. 209. Alexander Mehler, Reinhard Köhler: Aspects of Automatic Text Analysis, 2007. ISBN 978-3-540-37520-3
Vol. 210. Mike Nachtegael, Dietrich Van der Weken, Etienne E. Kerre, Wilfried Philips (Eds.): Soft Computing in Image Processing, 2007. ISBN 978-3-540-38232-4
Vol. 211. Alexander Gegov: Complexity Management in Fuzzy Systems, 2007. ISBN 978-3-540-38883-8
Vol. 212. Elisabeth Rakus-Andersson: Fuzzy and Rough Techniques in Medical Diagnosis and Medication, 2007. ISBN 978-3-540-49707-3
Vol. 213. Peter Lucas, José A. Gámez, Antonio Salmerón (Eds.): Advances in Probabilistic Graphical Models, 2007. ISBN 978-3-540-68994-2
Vol. 214. Irina Georgescu: Fuzzy Choice Functions, 2007. ISBN 978-3-540-68997-3
Vol. 215. Paul P. Wang, Da Ruan, Etienne E. Kerre (Eds.): Fuzzy Logic, 2007. ISBN 978-3-540-71257-2
Vol. 216. Rudolf Seising: The Fuzzification of Systems, 2007. ISBN 978-3-540-71794-2
Vol. 217. Masoud Nikravesh, Janusz Kacprzyk, Lotfi A. Zadeh (Eds.): Forging New Frontiers: Fuzzy Pioneers I, 2007. ISBN 978-3-540-73181-8
Masoud Nikravesh · Janusz Kacprzyk · Lotfi A. Zadeh Editors
Forging New Frontiers: Fuzzy Pioneers I With 200 Figures and 52 Tables
Masoud Nikravesh
University of California, Berkeley, Department of Electrical Engineering and Computer Science (EECS), Berkeley, CA 94720, USA
[email protected]

Janusz Kacprzyk
Systems Research Institute, Polish Academy of Sciences (PAN), ul. Newelska 6, 01-447 Warszawa, Poland
[email protected]

Lotfi A. Zadeh
University of California, Berkeley, Department of Electrical Engineering and Computer Science (EECS), Soda Hall 729, Berkeley, CA 94720-1776, USA
[email protected]
Library of Congress Control Number: 2007930457
ISSN print edition: 1434-9922 ISSN electronic edition: 1860-0808 ISBN 978-3-540-73181-8 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media, springer.com
© Springer-Verlag Berlin Heidelberg 2007
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Typesetting: Integra Software Services Pvt. Ltd., India
Cover design: WMX Design, Heidelberg
Printed on acid-free paper
To the Fuzzy Community
Preface
The 2005 BISC International Special Event, BISCSE'05 “FORGING THE FRONTIERS”, was held at the University of California, Berkeley, “WHERE FUZZY LOGIC BEGAN”, from November 3–6, 2005. The successful applications of fuzzy logic and its rapid growth suggest that the impact of fuzzy logic will be felt increasingly in coming years. Fuzzy logic is likely to play an especially important role in science and engineering, but eventually its influence may extend much farther. In many ways, fuzzy logic represents a significant paradigm shift in the aims of computing - a shift which reflects the fact that the human mind, unlike present day computers, possesses a remarkable ability to store and process information which is pervasively imprecise, uncertain and lacking in categoricity.
The BISC Program invited pioneers, the most prominent contributors, researchers, and executives from around the world who are interested in forging the frontiers by the use of fuzzy logic and soft computing methods to create the information backbone for global enterprises. This special event provided a unique and excellent opportunity for the academic, industry and corporate communities to address new challenges, share solutions, and discuss research directions for a future with enormous potential for the global economy. During this event, over 300 researchers and executives attended. They presented their most recent work, their original contributions, and historical trends of the field. In addition, four high-level panels were organized to discuss issues related to the field and its trends: past, present and future.
Panel one, “SOFT COMPUTING: PAST, PRESENT, FUTURE (GLOBAL ISSUE)”, was organized and moderated by Janusz Kacprzyk and Vesa A. Niskanen. The panelists were Janusz Kacprzyk, Vesa A. Niskanen, Didier Dubois, Masoud Nikravesh, Hannu Nurmi, Rudi Seising, Richard Tong, Enric Trillas, and Junzo Watada. The panel discussed general issues such as the role of SC in the social and behavioral sciences, medicine, economics and philosophy (especially philosophy of science, methodology, and ethics). The panel also considered aspects of decision making, democracy and voting, as well as manufacturing, industrial and business aspects.
The second panel, “FLINT AND SEMANTIC WEB”, was organized and moderated by Elie Sanchez and Masoud Nikravesh. The panelists were C. Carlsson, N. Kasabov, T. Martin, M. Nikravesh, E. Sanchez, A. Sheth, R. R. Yager and L. A. Zadeh. The organizers believe that this is an exciting time in the fields of Fuzzy Logic and the Internet (FLINT) and the Semantic Web. The panel discussion added to
the excitement, as it focused on the growing connections between these two fields. Most of today's Web content is suitable for human consumption. The Semantic Web is presented as an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. But while the vision of the Semantic Web and the associated research attract attention, as long as bivalent-based logical methods are used, little progress can be expected in handling the ill-structured, uncertain or imprecise information encountered in real-world knowledge. During recent years, important initiatives have led to reports of connections between Fuzzy Logic and the Internet (FLINT). FLINT meetings (“Fuzzy Logic and the Internet”) have been organized by BISC (“Berkeley Initiative in Soft Computing”). Meanwhile, scattered papers were published on Fuzzy Logic and the Semantic Web. Special sessions and workshops were organized, showing the positive role Fuzzy Logic could play in the development of the Internet and the Semantic Web, filling a gap and facing a new challenge. The Fuzzy Logic field has been maturing for forty years. These years have witnessed a tremendous growth in the number and variety of applications, with a real-world impact across a wide variety of domains with humanlike behavior and reasoning. We believe that in the coming years, FLINT and the Semantic Web will be major fields of application of Fuzzy Logic. This panel session discussed concepts, models, techniques and examples exhibiting the usefulness, and the necessity, of using Fuzzy Logic with the Internet and the Semantic Web. In fact, the question is not really a matter of necessity, but of recognizing where, and how, it is necessary.
The third panel, “FUZZY SETS IN INFORMATION SYSTEMS”, was organized and moderated by Patrick Bosc and Masoud Nikravesh. The panelists were P. Bosc, R. de Caluwe, D. Kraft, M. Nikravesh, F. Petry, G. de Tré, Raghu Krishnapuram, and Gabriella Pasi. Fuzzy sets approaches have been applied in the database and information retrieval areas for more than thirty years. A certain degree of maturity has been reached in the use of fuzzy sets techniques in relational databases, object-oriented databases, information retrieval systems, geographic information systems and systems dealing with the huge quantity of information available through the Web. The panel discussed research work undertaken on database design, flexible querying, imprecise database management, digital libraries and Web retrieval, and emphasized its major outputs. On the other hand, aspects that seem to be promising for further research were identified.
The fourth panel, “A GLIMPSE INTO THE FUTURE”, was organized and moderated by L. A. Zadeh and Masoud Nikravesh. The panelists were Janusz Kacprzyk, K. Hirota, Masoud Nikravesh, Henri Prade, Enric Trillas, Burhan Turksen, and Lotfi A. Zadeh. Predicting the future of fuzzy logic is difficult if fuzzy logic is interpreted in its wide sense, that is, a theory in which everything is or is allowed to be a matter of degree. But what is certain is that as we move further into the age of machine intelligence and mechanized decision-making, both theoretical and applied aspects of fuzzy logic will gain in visibility and importance. What will stand out is the unique capability of fuzzy logic to serve as a basis for reasoning and computation with information described in natural language. No other theory has this capability.
The chapters of the book evolved from presentations made by selected participants at the meeting and are organized in two volumes. The papers report from the different fronts of soft computing in various industries and address problems of different fields of research in fuzzy logic, fuzzy sets and soft computing. The book provides a collection of forty-two (42) articles in two volumes. We would like to take this opportunity to thank all the contributors and reviewers of the articles. We also wish to acknowledge our colleagues who have contributed, directly and indirectly, to the area and to the content of this book. Finally, we gratefully acknowledge BT, OMRON, Chevron, ONR, the EECS Department, the CITRIS program and the BISC associate members for their financial and technical support, and especially Prof. Shankar Sastry, CITRIS Director and former EECS Chair, for his special support for this event, which made the meeting and the publication and preparation of this book possible.
November 29, 2006, Berkeley, California, USA
Masoud Nikravesh, Lotfi A. Zadeh, and Janusz Kacprzyk Berkeley Initiative in Soft Computing (BISC) University of California, Berkeley
Contents
Web Intelligence, World Knowledge and Fuzzy Logic
Lotfi A. Zadeh . . . . . 1

Towards Human Consistent Linguistic Summarization of Time Series via Computing with Words and Perceptions
Janusz Kacprzyk, Anna Wilbik and Sławomir Zadrożny . . . . . 17

Evolution of Fuzzy Logic: From Intelligent Systems and Computation to Human Mind
Masoud Nikravesh . . . . . 37

Pioneers of Vagueness, Haziness, and Fuzziness in the 20th Century
Rudolf Seising . . . . . 55

Selected Results of the Global Survey on Research, Instruction and Development Work with Fuzzy Systems
Vesa A. Niskanen . . . . . 83

Fuzzy Models and Interpolation
László T. Kóczy, János Botzheim and Tamás D. Gedeon . . . . . 111

Computing with Antonyms
E. Trillas, C. Moraga, S. Guadarrama, S. Cubillo, E. Castiñeira . . . . . 133

Morphic Computing: Concept and Foundation
Germano Resconi and Masoud Nikravesh . . . . . 155

A Nonlinear Functional Analytic Framework for Modeling and Processing Fuzzy Sets
Rui J. P. de Figueiredo . . . . . 171

Concept-Based Search and Questionnaire Systems
Masoud Nikravesh . . . . . 193

Towards Perception Based Time Series Data Mining
Ildar Z. Batyrshin and Leonid Sheremetov . . . . . 217
Veristic Variables and Approximate Reasoning for Intelligent Semantic Web Systems
Ronald R. Yager . . . . . 231

Uncertainty in Computational Perception and Cognition
Ashu M. G. Solo and Madan M. Gupta . . . . . 251

Computational Intelligence for Geosciences and Oil Exploration
Masoud Nikravesh . . . . . 267

Hierarchical Fuzzy Classification of Remote Sensing Data
Yan Wang, Mo Jamshidi, Paul Neville, Chandra Bales and Stan Morain . . . . . 333

Real World Applications of a Fuzzy Decision Model Based on Relationships between Goals (DMRG)
Rudolf Felix . . . . . 351

Applying Fuzzy Decision and Fuzzy Similarity in Agricultural Sciences
Marius Calin and Constantin Leonte . . . . . 369

Soft Computing in the Chemical Industry: Current State of the Art and Future Trends
Arthur Kordon . . . . . 397

Identifying Aggregation Weights of Decision Criteria: Application of Fuzzy Systems to Wood Product Manufacturing
Ozge Uncu, Eman Elghoneimy, William A. Gruver, Dilip B Kotak and Martin Fleetwood . . . . . 415

On the Emergence of Fuzzy Logics from Elementary Percepts: the Checklist Paradigm and the Theory of Perceptions
Ladislav J. Kohout and Eunjin Kim . . . . . 437
List of Contributors
Anna Wilbik Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warsaw, Poland
[email protected] Arthur Kordon Engineering&Process Sciences, Core R&D, The Dow Chemical Company, 2301 N Brazosport Blvd, Freeport, TX 77541, USA
[email protected]
Germano Resconi Catholic University, Brescia, Italy
[email protected] Ildar Z. Batyrshin · Leonid Sheremetov Research Program in Applied Mathematics and Computing (PIMAyC), Mexican Petroleum Institute, Eje Central Lazaro Cardenas 152, Col. San Bartolo Atepehuacan, C.P. 07730, Mexico, D.F., Mexico, {batyr,
[email protected]}
Ashu M. G. Solo Maverick Technologies America Inc., Suite 808, 1220 North Market Street, Wilmington, Delaware, U.S.A. 19801
[email protected]

János Botzheim
Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary; Department of Computer Science, The Australian National University

S. Cubillo · E. Castiñeira
Technical University of Madrid, Department of Applied Mathematics, Madrid, Spain

Dilip B Kotak · Martin Fleetwood
Institute for Fuel Cell Innovation, National Research Council, Vancouver, BC, Canada, {dilip.kotak, martin.fleetwood}@nrc-cnrc.gc.ca

Eunjin Kim
Department of Computer Science, John D. Odegard School of Aerospace Science, University of North Dakota, Grand Forks ND 58202-9015, USA
Janusz Kacprzyk · Sławomir Zadro˙zny Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warsaw, Poland Warsaw School of Information Technology (WIT), ul. Newelska 6, 01-447 Warsaw, Poland
[email protected], zadrożny@ibspan.waw.pl
Ladislav J. Kohout Department of Computer Science, Florida State University, Tallahassee, Florida 32306-4530, USA László T. Kóczy Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary Institute of Information Technology and Electrical Engineering, Széchenyi István University, Gy˝or, Hungary Lotfi A. Zadeh BISC Program, Computer Sciences Division, EECS Department, University of California, Berkeley, CA 94720, USA
[email protected] Madan M. Gupta Intelligent Systems Research Laboratory College of Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada S7N 5A9
[email protected]
Masoud Nikravesh BISC Program, Computer Sciences Division, EECS Department and Imaging and Informatics Group-LBNL, University of California, Berkeley, CA 94720, USA
[email protected] Masoud Nikravesh Berkeley Initiative in Soft Computing (BISC) rogram, Computer Science Division- Department of EECS, University of California, Berkeley, CA 94720
[email protected] Mo Jamshidi Department of Electrical and Computer Engineering and ACE Center, University of Texas at San Antonio, San Antonio, TX, 78924
[email protected]
Marius Calin · Constantin Leonte The “Ion Ionescu de la Brad”, University of Agricultural Sciences and Veterinary Medicine of Iasi, Aleea Mihail Sadoveanu no. 3, Iasi 700490, Romania {mcalin, cleonte}@univagro-iasi.ro
Ozge Uncu · Eman Elghoneimy · William A. Gruver School of Engineering Science, Simon Fraser University, 8888 University Dr. Burnaby BC, Canada, {ouncu, eelghone, gruver}@sfu.ca
Masoud Nikravesh
BISC Program, Computer Sciences Division, EECS Department, University of California, Berkeley, and NERSC, Lawrence Berkeley National Lab., California 94720–1776, USA
[email protected]
Paul Neville · Chandra Bales · Stan Morain The Earth Data Analysis Center (EDAC), University of New Mexico, Albuquerque, New Mexico, 87131 {pneville, cbales, smorain}@edac.unm.edu
Masoud Nikravesh BISC Program, EECS Department, University of California, Berkeley, CA 94720, US
[email protected]
Ronald R. Yager Machine Intelligence Institute, Iona College, New Rochelle, NY 10801
[email protected]
Rudolf Felix FLS Fuzzy Logik Systeme GmbH Joseph-von-Fraunhofer Straße 20, 44 227 Dortmund, Germany
[email protected]
Tamás D. Gedeon
Department of Computer Science, The Australian National University

E. Trillas · C. Moraga · S. Guadarrama
Technical University of Madrid, Department of Artificial Intelligence

Rudolf Seising
Medical University of Vienna, Core Unit for Medical Statistics and Informatics, Vienna, Austria
[email protected]

Vesa A. Niskanen
Department of Economics & Management, University of Helsinki, PO Box 27, 00014 Helsinki, Finland
[email protected]

Rui J. P. de Figueiredo
Laboratory for Intelligent Signal Processing and Communications, California Institute for Telecommunications and Information Technology, University of California Irvine, Irvine, CA, 92697-2800
[email protected]

Yan Wang
Intelligent Inference Systems Corp., NASA Research Park, MS: 566-109, Moffett Field, CA, 94035
Web Intelligence, World Knowledge and Fuzzy Logic Lotfi A. Zadeh
Abstract Existing search engines—with Google at the top—have many remarkable capabilities; but what is not among them is deduction capability—the capability to synthesize an answer to a query from bodies of information which reside in various parts of the knowledge base. In recent years, impressive progress has been made in enhancing performance of search engines through the use of methods based on bivalent logic and bivalent-logic-based probability theory. But can such methods be used to add nontrivial deduction capability to search engines, that is, to upgrade search engines to question-answering systems? A view which is articulated in this note is that the answer is “No.” The problem is rooted in the nature of world knowledge, the kind of knowledge that humans acquire through experience and education. It is widely recognized that world knowledge plays an essential role in assessment of relevance, summarization, search and deduction. But a basic issue which is not addressed is that much of world knowledge is perception-based, e.g., “it is hard to find parking in Paris,” “most professors are not rich,” and “it is unlikely to rain in midsummer in San Francisco.” The problem is that (a) perception-based information is intrinsically fuzzy; and (b) bivalent logic is intrinsically unsuited to deal with fuzziness and partial truth. To come to grips with the fuzziness of world knowledge, new tools are needed. The principal new tool—a tool which is briefly described in this note—is Precisiated Natural Language (PNL). PNL is based on fuzzy logic and has the capability to deal with partiality of certainty, partiality of possibility and partiality of truth. These are the capabilities that are needed to be able to draw on world knowledge for assessment of relevance, and for summarization, search and deduction.
Lotfi A. Zadeh BISC Program, Computer Sciences Division, EECS Department, University of California, Berkeley, CA 94720, USA e-mail:
[email protected]
M. Nikravesh et al. (eds.), Forging New Frontiers: Fuzzy Pioneers I. © Springer 2007
1 Introduction In moving further into the age of machine intelligence and automated reasoning, we have reached a point where we can speak, without exaggeration, of systems which have a high machine IQ (MIQ) (Zadeh, [17]). The Web and especially search engines—with Google at the top—fall into this category. In the context of the Web, MIQ becomes Web IQ, or WIQ, for short. Existing search engines have many remarkable capabilities. However, what is not among them is the deduction capability—the capability to answer a query by a synthesis of information which resides in various parts of the knowledge base. A question-answering system is by definition a system which has this capability. One of the principal goals of Web intelligence is that of upgrading search engines to question-answering systems. Achievement of this goal requires a quantum jump in the WIQ of existing search engines [1]. Can this be done with existing tools such as the Semantic Web [3], Cyc [8], OWL [13] and other ontology-centered systems [12, 14]—tools which are based on bivalent logic and bivalent-logic-based probability theory? It is beyond question that, in recent years, very impressive progress has been made through the use of such tools. But can we achieve a quantum jump in WIQ? A view which is advanced in the following is that bivalent-logic- based methods have intrinsically limited capability to address complex problems which arise in deduction from information which is pervasively ill-structured, uncertain and imprecise. The major problem is world knowledge—the kind of knowledge that humans acquire through experience and education [5]. Simple examples of fragments of world knowledge are: Usually it is hard to find parking near the campus in early morning and late afternoon; Berkeley is a friendly city; affordable housing is nonexistent in Palo Alto; almost all professors have a Ph.D. degree; most adult Swedes are tall; and Switzerland has no ports. Much of the information which relates to world knowledge—and especially to underlying probabilities—is perception-based (Fig. 1). Reflecting the bounded ability of sensory organs, and ultimately the brain, to resolve detail and store information, perceptions are intrinsically imprecise. More specifically, perceptions are f-granular in the sense that (a) the boundaries of perceived classes are unsharp; and (b) the values of perceived attributes are granular, with a granule being a clump of values drawn together by indistinguishability, similarity, proximity or functionality [2]. Imprecision of perception-based information is a major obstacle to dealing with world knowledge through the use of methods based on bivalent logic and bivalentlogic-based probability theory—both of which are intolerant of imprecision and partial truth. What is the basis for this contention? A very simple example offers an explanation. Suppose that I have to come up with a rough estimate of Pat’s age. The information which I have is: (a) Pat is about ten years older than Carol; and (b) Carol has two children: a son, in mid-twenties; and a daughter, in mid-thirties. How would I come up with an answer to the query q: How old is Pat? First, using my world knowledge, I would estimate Carol’s age, given (b); then I would
Fig. 1 Measurement-based and perception-based information. Measurement-based (numerical): it is 35 °C; Eva is 28; Tandy is three years older than Dana. Perception-based (linguistic): it is very warm; Eva is young; Tandy is a few years older than Dana; it is cloudy; traffic is heavy; Robert is very honest.
add “about ten years,” to the estimate of Carol’s age to arrive at an estimate of Pat’s age. I would be able to describe my estimate in a natural language, but I would not be able to express it as a number, interval or a probability distribution. How can I estimate Carol’s age given (b)? Humans have an innate ability to process perception-based information—an ability that bivalent-logic-based methods do not have; nor would such methods allow me to add “about ten years” to my estimate of Carol’s age. There is another basic problem—the problem of relevance. Suppose that instead of being given (a) and (b), I am given a collection of data which includes (a) and (b), and it is my problem to search for and identify the data that are relevant to the query. I came across (a). Is it relevant to q? By itself, it is not. Then I came across (b). Is it relevant to q? By itself, it is not. Thus, what I have to recognize is that, in isolation, (a) and (b) are irrelevant, that is, uninformative, but, in combination, they are relevant i.e., are informative. It is not difficult to recognize this in the simple example under consideration, but when we have a large database, the problem of identifying the data which in combination are relevant to the query, is very complex. Suppose that a proposition, p, is, in isolation, relevant to q, or, for short, is i -relevant to q. What is the degree to which p is relevant to q? For example, to what degree is the proposition: Carol has two children: a son, in mid-twenties, and a daughter, in mid-thirties, relevant to the query: How old is Carol? To answer this question, we need a definition of measure of relevance. The problem is that there is no quantitative definition of relevance within existing bivalent-logic-based theories. We will return to the issue of relevance at a later point. The example which we have considered is intended to point to the difficulty of dealing with world knowledge, especially in the contexts of assessment of relevance and deduction, even in very simple cases. The principal reason is that much of world knowledge is perception-based, and existing theories of knowledge representation
and deduction provide no tools for this purpose. Here is a test problem involving deduction from perception-based information.
1.1 The Tall Swedes Problem

Perception: Most adult Swedes are tall, with adult defined as over about 20 years in age. Query: What is the average height of Swedes? A fuzzy-logic-based solution of the problem will be given at a later point.
A concept which provides a basis for computation and reasoning with perception-based information is that of Precisiated Natural Language (PNL) [15]. The capability of PNL to deal with perception-based information suggests that it may play a significant role in dealing with world knowledge. A quick illustration is the Carol example. In this example, an estimate of Carol's age, arrived at through the use of PNL, would be expressed as a bimodal distribution of the form

Age(Carol) is ((P1, V1) + . . . + (Pn, Vn)), i = 1, . . . , n
where the Vi are granular values of Age, e.g., less than about 20, between about 20 and about 30, etc.; Pi is a granular probability of the event (Age(Carol) is Vi ), i = 1, . . . , n; and + should be read as “and.” A brief description of the basics of PNL is presented in the following.
2 Precisiated Natural Language (PNL)

PNL deals with perceptions indirectly, through their description in a natural language, Zadeh [15]. In other words, in PNL a perception is equated to its description in a natural language. PNL is based on fuzzy logic—a logic in which everything is, or is allowed to be, a matter of degree. It should be noted that a natural language is, in effect, a system for describing perceptions. The point of departure in PNL is the assumption that the meaning of a proposition, p, in a natural language, NL, may be represented as a generalized constraint of the form (Fig. 2) X isr R, where X is the constrained variable, R is a constraining relation which, in general, is not crisp (bivalent); and r is an indexing variable whose values define the modality of the constraint. The principal modalities are: possibilistic (r = blank); veristic (r = v); probabilistic (r = p); random set (r = rs); fuzzy graph (r = fg); usuality (r = u); and Pawlak set (r = ps).
Fig. 2 Generalized constraint. A standard constraint has the form X ∈ C; a generalized constraint has the form X isr R, where X is the constrained variable, R the constraining relation and r a modality identifier. X may be a vector, X = (X1, . . . , Xn); may have a structure, e.g., X = Location(Residence(Carol)); may be a function of another variable, X = f(Y); or may be conditioned, (X/Y). The modality r ranges over = / ≤ / . . . / ⊂ / ⊃ / blank / v / p / u / rs / fg / ps / . . .
The set of all generalized constraints, together with their combinations, qualifications and rules of constraint propagation, constitutes the Generalized Constraint Language (GCL). By construction, GCL is maximally expressive. In general, X, R and r are implicit in p. Thus, in PNL the meaning of p is precisiated through explicitation of the generalized constraint which is implicit in p, that is, through translation into GCL (Fig. 3). Translation of p into GCL is exemplified by the following.

(a) Monika is young −→ Age(Monika) is young, where young is a fuzzy relation which is characterized by its membership function μyoung, with μyoung(u) representing the degree to which a numerical value of age, u, fits the description of age as “young.”

(b) Carol lives in a small city near San Francisco −→ Location(Residence(Carol)) is SMALL[City; μ] ∩ NEAR[City; μ], where SMALL[City; μ] is the fuzzy set of small cities; NEAR[City; μ] is the fuzzy set of cities near San Francisco; and ∩ denotes fuzzy set intersection (conjunction).

(c) Most Swedes are tall −→ Count(tall.Swedes/Swedes) is most. In this example, the constrained variable is the relative count of tall Swedes among Swedes, and the constraining relation is the fuzzy quantifier “most,” with “most” represented as a fuzzy number (Fig. 3).

Fig. 3 Calibration of “most” and “usually”, represented as trapezoidal fuzzy numbers over [0, 1]

The relative count, Count(A/B), is defined as follows. If A and B are fuzzy sets in a universe of discourse U = {u1, . . . , un}, with the grades of membership of ui in A and B being μi and νi, respectively, then by definition the relative Count of A in B is expressed as

Count(A/B) = (Σi (μi ∧ νi)) / (Σi νi)
where the conjunction, ∧ , is taken to be min. More generally, conjunction may be taken to be a t-norm [11]. What is the rationale for introducing the concept of a generalized constraint? Conventional constraints are crisp (bivalent) and have the form X ε C, where X is the constrained variable and C is a crisp set. On the other hand, perceptions are f-granular, as was noted earlier. Thus, there is a mismatch between f-granularity of perceptions and crisp constraints. What this implies is that the meaning of a perception does not lend itself to representation as a collection of crisp constraints; what is needed for this purpose are generalized constraints or, more generally, an element of the Generalized Constraint Language (GCL) [15]. Representation of the meaning of a perception as an element of GCL is a first step toward computation with perceptions—computation which is needed for deduction from data which are resident in world knowledge. In PNL, a concept which plays a key role in deduction is that of a protoform—an abbreviation of “prototypical form,” [15]. If p is a proposition in a natural language, NL, its protoform, PF( p), is an abstraction of p which places in evidence the deep semantic structure of p. For example, the protoform of “Monika is young,” is “A(B) is C,” where A is abstraction of “Age,” B is abstraction of “Monika” and C is abstraction of “young.” Similarly, Most Swedes are tall −→ Count( A/B) is Q, where A is abstraction of “tall Swedes,” B is abstraction of “Swedes” and Q is abstraction of “most.” Two propositions, p and q, are protoform-equivalent, or PF-equivalent, for short, if they have identical protoforms. For example, p: Most Swedes are tall, and q: Few professors are rich, are PF-equivalent. The concept of PF-equivalence suggests an important mode of organization of world knowledge, namely protoform-based organization. In this mode of organization, items of knowledge are grouped into PF-equivalent classes. For example, one such class may be the set of all propositions whose protoform is A(B) is C, e.g., Monika is young. The partially instantiated class Price(B) is low, would be the set of all objects whose price is low. As will be seen in the following, protoform-based organization of world knowledge plays a key role in deduction form perception-based information. Basically, abstraction is a means of generalization. Abstraction is a familiar and widely used concept. In the case of PNL, abstraction plays an especially important role because PNL abandons bivalence. Thus, in PNL, the concept of a protoform is
Fig. 4 Precisiation and abstraction (the basic idea): a perception is described in NL as a proposition p; precisiation maps the description NL(p) to GC(p) in GCL; abstraction maps GC(p) to the protoform PF(p) in PFL; GCL (Generalized Constraint Language) is maximally expressive.
not limited to propositions whose meaning can be represented within the conceptual structure of bivalence logic. In relation to a natural language, NL, the elements of GCL may be viewed as precisiations of elements of NL. Abstractions of elements of GCL gives rise to what is referred to as Protoform Language, PFL, (Fig. 4). A consequence of the concept of PF-equivalence is that the cardinality of PFL is orders of magnitude smaller than that of GCL, or, equivalently, the set of precisiable propositions in NL. The small cardinality of PFL plays an essential role in deduction. The principal components of the structure of PNL (Fig. 5) are: (1) a dictionary from NL to GCL; (2) a dictionary from GCL to PFL (Fig. 6); (3) a multiagent, modular deduction database, DDB; and (4) a world knowledge database, WKDB. The constituents of DDB are modules, with a module consisting of a group of protoformal rules of deduction, expressed in PFL (Fig. 7), which are drawn from a particular domain, e.g., probability, possibility, usuality, fuzzy arithmetic, fuzzy
Fig. 5 Basic structure of PNL: a proposition p in NL is mapped by dictionary 1 to its precisiation p* = GC(p) in GCL and by dictionary 2 to its protoform p** = PF(p) in PFL, supported by a world knowledge database (WKDB) and a deduction database (DDB).
Fig. 6 Structure of PNL: dictionaries. Dictionary 1 (precisiation) maps a proposition p in NL to its GC-form p*, e.g., “most Swedes are tall” → ΣCount(tall.Swedes/Swedes) is most; dictionary 2 maps the GC-form p* to its protoform PF(p*), e.g., ΣCount(tall.Swedes/Swedes) is most → Q A's are B's.
logic, search, etc. For example, a rule drawn from fuzzy logic is the compositional rule of inference [11], expressed as (symbolic part)

X is A
(X, Y) is B
Y is A ◦ B

with the computational part

μA◦B(v) = maxu (μA(u) ∧ μB(u, v))

where A◦B is the composition of A and B, defined in the computational part, in which μA, μB and μA◦B are the membership functions of A, B and A◦B, respectively. Similarly, a rule drawn from probability is (symbolic part)

Prob(X is A) is B
Prob(X is C) is D

with the computational part

μD(v) = maxg (μB(∫U μA(u) g(u) du))

subject to:

v = ∫U μC(u) g(u) du
∫U g(u) du = 1

where D is defined in the computational part and g is the probability density function of X. The rules of deduction in DDB are, basically, the rules which govern propagation of generalized constraints. Each module is associated with an agent whose function is that of controlling execution of rules and performing embedded computations. The top-level agent controls the passing of results of computation from a module to other modules. The structure of protoformal, i.e., protoform-based, deduction is shown in Fig. 5. A simple example of protoformal deduction is shown in Fig. 8.
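The computational part of the compositional rule reduces, on finite universes, to a max-min composition. The following sketch is not part of the original text; the universes and membership values are made up for illustration, with min playing the role of the t-norm.

```python
# Illustrative sketch of the compositional rule of inference on finite universes.
# mu_A encodes "X is A"; mu_B encodes the fuzzy relation "(X, Y) is B".

U = ["u1", "u2", "u3"]
V = ["v1", "v2"]

mu_A = {"u1": 0.2, "u2": 0.9, "u3": 0.5}
mu_B = {("u1", "v1"): 0.7, ("u1", "v2"): 0.1,
        ("u2", "v1"): 0.4, ("u2", "v2"): 0.8,
        ("u3", "v1"): 1.0, ("u3", "v2"): 0.6}

def compose(mu_A, mu_B, U, V):
    """mu_{A o B}(v) = max_u min(mu_A(u), mu_B(u, v))."""
    return {v: max(min(mu_A[u], mu_B[(u, v)]) for u in U) for v in V}

mu_AoB = compose(mu_A, mu_B, U, V)   # "Y is A o B"
print(mu_AoB)                        # {'v1': 0.5, 'v2': 0.8}
```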
Fig. 7 Basic structure of PNL: modular deduction database, with agent-controlled modules for possibility, probability, fuzzy arithmetic, fuzzy logic, search and the extension principle.
The principal deduction rule in PNL is the extension principle [19]. Its symbolic part is expressed as

f(X) is A
g(X) is B

in which the antecedent is a constraint on X through a function of X, f(X); and the consequent is the induced constraint on a function of X, g(X). The computational part of the rule is expressed as

μB(v) = supu (μA(f(u)))
subject to v = g(u)
Fig. 8 Example of protoformal reasoning. From p: “Dana is young” and “Tandy is a few years older than Dana” the GC-forms are Age(Dana) is young and Age(Tandy) is (Age(Dana) + few); their protoforms are X is A and Y is (X + B); the protoformal rule

X is A
Y is (X + B)
Y is (A + B)

yields Age(Tandy) is (young + few), with μA+B(v) = supu (μA(u) ∧ μB(v − u)).
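As a companion to Fig. 8, the computational part μA+B(v) = supu (μA(u) ∧ μB(v − u)) can be evaluated on a discretized grid. In the sketch below the membership functions assumed for “young” and “few” are hypothetical calibrations, not taken from the chapter.

```python
# Illustrative sketch of the rule X is A, Y is (X + B) |- Y is (A + B):
# mu_{A+B}(v) = sup_u min(mu_A(u), mu_B(v - u)), evaluated on a coarse grid.

def mu_young(u):            # assumed: fully "young" up to 25, fading out by 40
    if u <= 25: return 1.0
    if u >= 40: return 0.0
    return (40.0 - u) / 15.0

def mu_few(b):              # assumed: "a few years" around 3, support [1, 6]
    if 2 <= b <= 4: return 1.0
    if 1 <= b < 2: return b - 1.0
    if 4 < b <= 6: return (6.0 - b) / 2.0
    return 0.0

def mu_sum(v, grid):
    """Extended addition: sup over u of min(mu_young(u), mu_few(v - u))."""
    return max(min(mu_young(u), mu_few(v - u)) for u in grid)

grid = [x / 2.0 for x in range(0, 201)]          # ages 0..100 in steps of 0.5
tandy = {v: round(mu_sum(v, grid), 2) for v in (25, 30, 35, 40, 45)}
print(tandy)   # degrees to which "Age(Tandy) is (young + few)" holds at sample ages
```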
To illustrate the use of the extension principle, we will consider the Tall Swedes Problem:

p: Most adult Swedes are tall
q: What is the average height of adult Swedes?

Let P = (u1, . . . , uN) be a population of Swedes, with the height of ui being hi, i = 1, . . . , N, and μa(ui) representing the degree to which ui is an adult. The average height of adult Swedes is denoted as h_ave. The first step is precisiation of p, using the concept of relative Count:

p −→ Count(tall ∧ adult.Swedes/adult.Swedes) is most

or, more explicitly,

p −→ (Σi μtall(hi) ∧ μa(ui)) / (Σi μa(ui)) is most.

The next step is precisiation of q:

q −→ h_ave is ?B, where h_ave = (1/N) Σi hi

and B is a fuzzy number. Using the extension principle, computation of h_ave as a fuzzy number is reduced to the solution of the variational problem

μB(v) = suph (μmost((Σi μtall(hi) ∧ μa(ui)) / (Σi μa(ui)))), h = (h1, . . . , hN),

subject to

v = (1/N) Σi hi.
Note that we assume that a problem is solved once it is reduced to the solution of a well-defined computational problem.
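Since the problem is now a well-defined computational problem, it can be attacked directly by discretization. The sketch below is an illustration, not the authors' implementation: it brute-forces the variational problem for a tiny synthetic population, and the membership functions for tall, adult and most are assumed calibrations.

```python
# Brute-force sketch of the Tall Swedes computation for a tiny synthetic
# population; all membership functions are assumed shapes for illustration.
import itertools

def mu_tall(h):       # assumed: 0 below 170 cm, 1 above 185 cm
    return min(1.0, max(0.0, (h - 170.0) / 15.0))

def mu_most(r):       # assumed: 0 below 0.5, 1 above 0.9 (cf. Fig. 3)
    return min(1.0, max(0.0, (r - 0.5) / 0.4))

ages = [17, 25, 40, 60]                       # fixed ages of the u_i
mu_a = [0.2, 1.0, 1.0, 1.0]                   # degrees of being adult

def mu_B(v, heights_grid, tol=1.0):
    """mu_B(v) = sup over height assignments h with (1/N) sum h_i close to v of
    mu_most( sum_i mu_tall(h_i) ^ mu_a(u_i) / sum_i mu_a(u_i) )."""
    best, n = 0.0, len(ages)
    for h in itertools.product(heights_grid, repeat=n):
        if abs(sum(h) / n - v) > tol:
            continue                           # constraint v = (1/N) sum_i h_i
        r = sum(min(mu_tall(hi), ai) for hi, ai in zip(h, mu_a)) / sum(mu_a)
        best = max(best, mu_most(r))
    return best

grid = range(160, 200, 5)                      # candidate heights in cm
print({v: round(mu_B(v, grid), 2) for v in (170, 180, 190)})
```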
3 PNL as a Definition Language One of the important functions of PNL is that of serving as a high level definition language. More concretely, suppose that I have a concept, C, which I wish to define. For example, the concept of distance between two real-valued functions, f and g, defined on the real line.
The standard approach is to use a metric such as L 1 or L 2 . But a standard metric may not capture my perception of the distance between f and g. The PNL-based approach would be to describe my perception of distance in a natural language and then precisiate the description. This basic idea opens the door to (a) definition of concepts for which no satisfactory definitions exist, e.g., the concepts of causality, relevance and rationality, among many others, and (b) redefinition of concepts whose existing definitions do not provide a good fit to reality. Among such concepts are the concepts of similarity, stability, independence, stationarity and risk. How can we concretize the meaning of “good fit?” In what follows, we do this through the concept of cointension. More specifically, let U be a universe of discourse and let C be a concept which I wish to define, with C relating to elements of U . For example, U is a set of buildings and C is the concept of tall building. Let p(C) and d(C) be, respectively, my perception and my definition of C. Let I ( p(C)) and I (d(C)) be the intensions of p(C) and d(C), respectively, with “intension” used in its logical sense, [6, 7] that is, as a criterion or procedure which identifies those elements of U which fit p(C) or d(C). For example, in the case of tall buildings, the criterion may involve the height of a building. Informally, a definition, d(C), is a good fit or, more precisely, is cointensive, if its intension coincides with the intension of p(C). A measure of goodness of fit is the degree to which the intension of d(C) coincides with that of p(C). In this sense, cointension is a fuzzy concept. As a high level definition language, PNL makes it possible to formulate definitions whose degree of cointensiveness is higher than that of definitions formulated through the use of languages based on bivalent logic. A substantive exposition of PNL as a definition language is beyond the scope of this note. In what follows, we shall consider as an illustration a relatively simple version of the concept of relevance.
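To make the idea of a degree of cointension slightly more tangible, one possible (and admittedly crude) measure is the normalized overlap of the two intensions when both are represented as fuzzy sets over U. The measure and the membership values for “tall building” below are assumptions for illustration only; the text itself leaves the choice of measure open.

```python
# A hedged sketch of one possible degree-of-cointension measure: the ratio of
# the overlap to the union of the two intensions, represented as membership
# functions over a common universe of building heights (values are invented).

heights = [20, 50, 100, 150, 250]                  # building heights in metres

mu_perception = {20: 0.0, 50: 0.1, 100: 0.6, 150: 0.9, 250: 1.0}   # p(C)
mu_definition = {20: 0.0, 50: 0.0, 100: 0.5, 150: 1.0, 250: 1.0}   # d(C)

def cointension(mu_p, mu_d, universe):
    overlap = sum(min(mu_p[x], mu_d[x]) for x in universe)
    union = sum(max(mu_p[x], mu_d[x]) for x in universe)
    return overlap / union if union else 1.0

print(round(cointension(mu_perception, mu_definition, heights), 2))   # 0.89
```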
4 Relevance

We shall examine the concept of relevance in the context of a relational model such as shown in Fig. 9. For concreteness, the attributes A1, . . . , An may be interpreted as symptoms and D as diagnosis. For convenience, rows which involve the same value of D are grouped together. The entries are assumed to be labels of fuzzy sets. For example, A5 may be blood pressure and a53 may be “high.” An entry represented as * means that the entry in question is conditionally redundant in the sense that its value has no influence on the value of D (Fig. 10). An attribute, Aj, is redundant, and hence deletable, if it is conditionally redundant for all values of Name. An algorithm, termed compactification, which identifies all deletable attributes is described in Zadeh [18]. The compactification algorithm is a generalization of the Quine-McCluskey algorithm for minimization of switching circuits. The reduct algorithm in the theory of rough sets [10] is closely related to the compactification algorithm.
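The deletability test described above can be operationalized directly on a concrete table: Aj is treated as deletable when dropping it never makes D ambiguous, i.e., rows that agree on all remaining attributes also agree on D. The following sketch, with an invented symptom table, illustrates this reading; it is an illustration of the idea, not the compactification algorithm of [18] itself.

```python
# Sketch of the deletability test: attribute A_j is deletable when removing it
# never makes the diagnosis D ambiguous.  The tiny table is invented.

rows = [
    # (A1,      A2,       A3,     D)
    ("fever",  "high",   "yes",  "flu"),
    ("fever",  "normal", "yes",  "flu"),
    ("none",   "high",   "no",   "hypertension"),
    ("none",   "normal", "no",   "healthy"),
]

def deletable(rows, j):
    """True if attribute j has no influence on D given the other attributes."""
    seen = {}
    for r in rows:
        attrs, d = r[:-1], r[-1]
        key = attrs[:j] + attrs[j + 1:]          # drop attribute j
        if seen.setdefault(key, d) != d:
            return False
    return True

for j, name in enumerate(["A1", "A2", "A3"]):
    print(name, "deletable:", deletable(rows, j))
# A1 deletable: True   (A1 duplicates the information carried by A3)
# A2 deletable: False  (without A2, rows 3 and 4 collide with different D)
# A3 deletable: True   (symmetrically to A1; only one of the two may be dropped)
```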
Fig. 9 A relational model of decision (relevance, redundance and deletability): a decision table with rows Name1, . . . , Namen, symptom attributes A1, . . . , An (aij is the value of the j-th symptom of Namei), and a diagnosis column D; rows with the same value of D are grouped together.
Fig. 10 Conditional redundance and redundance: Aj is conditionally redundant for Namer (A1 is ar1, . . . , An is arn) if D is ds for all possible values of Aj (the entry marked *); Aj is redundant if it is conditionally redundant for all values of Name; cf. the compactification algorithm (Zadeh, 1976) and the Quine-McCluskey algorithm.
The concept of relevance (informativeness) is weaker and more complex than that of redundance (deletability). As was noted earlier, it is necessary to differentiate between relevance in isolation (i-relevance) and relevance as a group. In the following, relevance should be interpreted as i-relevance. A value of Aj, say arj, is irrelevant (uninformative) if the proposition Aj is arj does not constrain D (Fig. 11). For example, the knowledge that blood pressure is high may convey no information about the diagnosis (Fig. 12). An attribute, Aj, is irrelevant (uninformative) if, for all arj, the proposition Aj is arj does not constrain D.
Fig. 11 Irrelevance: (Aj is aij) is irrelevant (uninformative) when the same value aij occurs in rows with different values of D (d1 and d2).
Fig. 12 Relevance and irrelevance: a constraint on Aj induces a constraint on D, e.g., (blood pressure is high) constrains D, so D is ?d if Aj is arj; (Aj is arj) is uninformative if D is unconstrained; Aj is irrelevant if (Aj is arj) is uninformative for all arj; irrelevance does not imply deletability.
What is important to note is that irrelevance does not imply deletability, as redundance does. The reason is that Aj may be i-irrelevant but not irrelevant in combination with other attributes. An example is shown in Fig. 13. As defined above, relevance and redundance are bivalent concepts, with no degrees of relevance or redundance allowed. But if the definitions in question are interpreted as idealizations, then a measure of the departure from the ideal could be used as a measure of the degree of relevance or redundance. Such a measure could be defined through the use of PNL. In this way, PNL may provide a basis for defining relevance and redundance as matters of degree, as they are in realistic settings. However, what should be stressed is that our analysis is limited to relational models. Formalizing the concepts of relevance and redundance in the context of the Web is a far more complex problem—a problem for which no cointensive solution is in sight.
Fig. 13 Redundance and irrelevance: an example in which D is black or white; in the first case A1 and A2 are individually irrelevant (uninformative) but not deletable, while in the second case A2 is redundant (deletable).
5 Concluding Remark

Much of the information which resides in the Web—and especially in the domain of world knowledge—is imprecise, uncertain and partially true. Existing bivalent-logic-based methods of knowledge representation and deduction are of limited effectiveness in dealing with information which is imprecise or partially true. To deal with such information, bivalence must be abandoned and new tools, such as PNL, should be employed. What is quite obvious is that, given the deeply entrenched tradition of basing scientific theories on bivalent logic, a call for abandonment of bivalence is not likely to meet a warm response. Abandonment of bivalence will eventually become a reality but it will be a gradual process.
References 1. Arjona, J.; Corchuelo, R.; Pena, J. and Ruiz, D. 2003. Coping with Web Knowledge. Advances in Web Intelligence. Springer-Verlag Berlin Heidelberg, 165–178. 2. Bargiela, A. and Pedrycz, W. 2003. Granular Computing—An Introduction. Kluwer Academic Publishers: Boston, Dordrecht, London. 3. Berners-Lee, T.; Hendler, J. and Lassila, O. 2001. The Semantic Web. Scientific American. 4. Chen, P.P. 1983. Entity-Relationship Approach to Information Modeling and Analysis. North Holland. 5. Craven, M.; DiPasquo, D.; Freitag, D.; McCallum, A.; Mitchell, T.; Nigam, K. and Slattery, S. 2000. Learning to Construct Knowledge Bases from the World Wide Web. Artificial Intelligence 118 (1-2): 69–113. 6. Cresswell, M. J. 1973. Logic and Languages. London, U.K.: Methuen. 7. Gamat, T. F. 1996. Language, Logic and Linguistics. University of Chicago Press. 8. Lenat, D. B. 1995.cyc: A Large-Scale Investment in Knowledge Infrastructure. Communications of the ACM 38(11): 32–38. 9. Novak, V. and Perfilieva, I., eds. 2000. Discovering the World with Fuzzy Logic. Studies in Fuzziness and Soft Computing. Heidelberg New York: Physica-Verlag. 10. Pawlak, Z. 1991. Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic, Dordrecht. 11. Pedrycz, W., and Gomide, F. 1998. Introduction to Fuzzy Sets. Cambridge, Mass.: MIT Press. 12. Smith, B. and Welty, C. 2002. What is Ontology? Ontology: Towards a new synthesis. Proceedings of the Second International Conference on Formal Ontology in Information Systems. 13. Smith, M. K.; Welty, C.; McGuinness, D., eds. 2003. OWL Web Ontology Language Guide. W3C Working Draft 31. 14. Sowa, J. F., in Albertazzi, L. 1999. Ontological Categories. Shapes of Forms: From Gestalt Psychology and Phenomenology to Ontology and Mathematics, Dordrecht: Kluwer Academic Publishers, 307–340. 15. Zadeh, L.A. 2001. A New Direction in AI—Toward a Computational Theory of Perceptions. AI Magazine 22(1): 73–84. 16. Zadeh, L.A. 1999. From Computing with Numbers to Computing with Words—From Manipulation of Measurements to Manipulation of Perceptions. IEEE Transactions on Circuits and Systems 45(1): 105–119. 17. Zadeh, L.A. 1994. Fuzzy Logic, Neural Networks, and Soft Computing, Communications of the ACM—AI, Vol. 37, pp. 77–84.
18. Zadeh, L.A. 1976. A fuzzy-algorithmic approach to the definition of complex or imprecise concepts, Int. Jour. Man-Machine Studies 8, 249–291. 19. Zadeh, L.A. 1975. The concept of a linguistic variable and its application to approximate reasoning, Part I: Inf. Sci.8, 199–249, 1975; Part II: Inf. Sci. 8, 301-357, 1975; Part III: Inf. Sci. 9, 43–80.
Towards Human Consistent Linguistic Summarization of Time Series via Computing with Words and Perceptions Janusz Kacprzyk, Anna Wilbik and Sławomir Zadro˙zny
Abstract We present a new, human consistent approach to the summarization of numeric time series aimed at capturing the very essence of what occurs in the data in terms of an increase, decrease and stable values of data, when relevant changes occur, and how long are periods of particular behavior types exemplified by an increase, decrease and a stable period, all with some intensity. We use natural language descriptions that are derived by using tools of Zadeh’s computing with words and perceptions paradigm, notably tools and techniques involving fuzzy linguistic quantifiers. More specifically, we use the idea of fuzzy logic based linguistic summaries of data(bases) in the sense of Yager, later developed by Kacprzyk and Yager, and Kacprzyk, Yager and Zadro˙zny. We extend that idea to a dynamic setting, to the summarization of trends in time series characterized by: dynamics of change, duration and variability. For the aggregation of partial trends, which is crucial element in the problem considered, we apply the traditional Zadeh’s fuzzy logic based calculus of linguistically quantified propositions and the Sugeno integral. Results obtained are promising.
1 Introduction The availability of information is a prerequisite for coping with difficult, changeable and competitive environments within which individuals, human groups of various size, and organizations are to operate. To be effective and efficient, information
Janusz Kacprzyk · Sławomir Zadro˙zny Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warsaw, Poland Warsaw School of Information Technology (WIT), ul. Newelska 6, 01-447 Warsaw, Poland e-mail:
[email protected] e-mail: zadrożny@ibspan.waw.pl Anna Wilbik Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warsaw, Poland e-mail:
[email protected]
M. Nikravesh et al. (eds.), Forging New Frontiers: Fuzzy Pioneers I. © Springer 2007
should be not only easily available but also in a form that can be comprehensible by individuals, groups or organizations. This means first of all that it should be in a proper language. In our context we have human beings as individuals, human groups and human organizations. Unfortunately, this brings about serious difficulties because with computer systems being indispensable for supporting these tasks related to the acquisition, processing and presentation of information, and with a human being as a crucial element, there is an inherent discrepancy between the very specifics of them. Notably, this is mainly related to a remarkable human ability to properly understand imprecise information resulting from the use of natural language, and to use it purposefully. This is clearly not the case for computers and information technology. A real challenge is to attain a synergistic collaboration between a human being and computer, taking advantage of the strengths of both of them. This is the basic philosophy of our approach. We are concerned with some time distributed sequences of numerical data, representing how a quantity of interest evolves over time. A human being has a remarkable ability to capture the essence of how this evolves in terms of duration, variability, etc. However, when asked to articulate or assess this, a human being is normally able to only provide an imprecise and vague statement in a natural language, exemplified by “values grow strongly in the beginning, then stabilize, and finally decrease sharply”. The traditional approach to such a trend analysis is based on the detection of change of the first and second derivative (e.g., [8, 12, 20, 21]). Some first attempts at using nontraditional, natural language based analyses appeared in the works by Batyrshin [5], and Batyrshin et al. [3], [4]. Our approach, though maybe somewhat similar in spirit, is different with respect to the point of departure and tools and techniques employed even if both are heavily based on the philosophy of Zadeh’s computing with words. In this paper we employ linguistic summaries of databases introduced by Yager [25], and then further developed by Kacprzyk and Yager [14], and Kacprzyk, Yager and Zadro˙zny [15]). They serve a similar purpose. They are meant to capture in a simple natural language like form essential features of original data according to the user’s needs or interests. Therefore some attempts have been made to apply them to temporal data, too (cf., e.g., [7]) but that approach is different from the one proposed in this paper. Basically, our approach boils down to the extraction of (partial) trends which express whether over a particular time span (window), usually small in relation to the total time horizon of interest, data increase, decrease or are stable, all this with some intensity. We use for this purpose a modification of the well known Sklansky and Gonzalez [23] algorithm that is simple and efficient. Then, those consecutive partial trends are to be aggregated to obtain a linguistic summary of trends in a particular time series. For instance, in case of stock exchange data, a linguistic summary of the type considered may be: “initially, prices of most stocks increased sharply, then stabilized, and finally prices of almost all stocks exhibited a strong variability”. One can use various methods of aggregation and here we will use first the classic one based on Zadeh’s [26] calculus of linguistically quantified propositions. Then, we will use the Sugeno [24] integral.
2 Temporal Data and Trend Analysis We deal with numerical data that vary over time, and a time series is a sequence of data measured at uniformly spaced time moments. We will identify trends as linearly increasing, stable or decreasing functions, and therefore represent given time series data as piecewise linear functions. Evidently, the intensity of an increase and decrease (slope) will matter, too. These are clearly partial trends as a global trend in a time series concerns the entire time span of the time series, and there also may be trends that concern parts of the entire time span, but more than a particular window taken into account while extracting partial trends by using the Sklansky and Gonzalez algorithm. In particular, we use the concept of a uniform partially linear approximation of a time series. Function f is a uniform ε-approximation of a time series, or a set of points {(x i , yi )}, if for a given, context dependent ε > 0, there holds ∀i : | f (x i ) − yi | ≤ ε
(1)
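Condition (1) is straightforward to check for a candidate line over a window of points; a minimal sketch, with illustrative points, slope and intercept, is given below.

```python
# A small check of condition (1): a line f(x) = a*x + b is a uniform
# eps-approximation of the points if |f(x_i) - y_i| <= eps for every i.
# The sample points, slope and intercept are illustrative only.

def is_uniform_approx(a, b, points, eps):
    return all(abs(a * x + b - y) <= eps for x, y in points)

window = [(0, 1.0), (1, 1.6), (2, 1.9), (3, 2.6)]
print(is_uniform_approx(0.5, 1.0, window, eps=0.25))   # True
print(is_uniform_approx(0.5, 1.0, window, eps=0.05))   # False
```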
We use a modification of the well known, effective and efficient Sklansky and Gonzalez [23] algorithm that finds a linear uniform ε-approximation for subsets of points of a time series. The algorithm constructs the intersection of cones starting from point pi of the time series and including the circle of radius ε around the subsequent data points pi+ j , j = 1, 2, . . . , until the intersection of all cones starting at pi is empty. If for pi+k the intersection is empty, then we construct a new cone starting at pi+k−1 . Figures 1 and 2 present the idea of the algorithm. The family of possible solutions is indicated as a gray area. Clearly other algorithms can also be used, and there is a lot of them in the literature. To present details of the algorithm, let us first denote: • p_0 – a point starting the current cone, • p_1 – the last point checked in the current cone, • p_2 – the next point to be checked,
Fig. 1 An illustration of the algorithm for the uniform ε-approximation – the intersection of the cones is indicated by the dark grey area
Fig. 2 An illustration of the algorithm for the uniform ε-approximation – a new cone starts in point p2
• Alpha_01 – a pair of angles (γ1, β1), meant as an interval, that defines the current cone as shown in Fig. 1,
• Alpha_02 – a pair of angles of the cone starting at the point p_0 and inscribing the circle of radius ε around the point p_2 (cf. (γ2, β2) in Fig. 1),
• function read_point() reads the next point of the data series,
• function find() finds a pair of angles of the cone starting at the point p_0 and inscribing the circle of radius ε around the point p_2.

The pseudocode of the procedure that extracts the trends is depicted in Fig. 3. The bounding values of Alpha_02, i.e. (γ2, β2), computed by function find(), correspond to the slopes of two lines that:
• are tangent to the circle of radius ε around point p2 = (x_2, y_2),
• start at the point p0 = (x_0, y_0).
Thus

\gamma_2 = \mathrm{arctg}\left(\frac{\Delta x \cdot \Delta y - \varepsilon\sqrt{(\Delta x)^2 + (\Delta y)^2 - \varepsilon^2}}{(\Delta x)^2 - \varepsilon^2}\right)

and

\beta_2 = \mathrm{arctg}\left(\frac{\Delta x \cdot \Delta y + \varepsilon\sqrt{(\Delta x)^2 + (\Delta y)^2 - \varepsilon^2}}{(\Delta x)^2 - \varepsilon^2}\right)
where Δx = x_0 − x_2 and Δy = y_0 − y_2. The resulting ε-approximation of a group of points p_0, . . . , p_1 is either a single segment, chosen as, e.g., a bisector of the cone or a segment that minimizes the distance (e.g.,
read_point(p_0); read_point(p_1);
while(1) {
  p_2 = p_1;
  Alpha_02 = find();
  Alpha_01 = Alpha_02;
  do {
    Alpha_01 = Alpha_01 ∩ Alpha_02;   /* narrow the current cone */
    p_1 = p_2;
    read_point(p_2);
    Alpha_02 = find();                /* cone inscribing the circle around the new point */
  } while (Alpha_01 ∩ Alpha_02 ≠ ∅);  /* continue while the cones still intersect */
  save_found_trend();                 /* the extracted trend covers p_0, ..., p_1 */
  p_0 = p_1;
  p_1 = p_2;                          /* a new cone starts at the last consistent point */
}
Fig. 3 Pseudocode of the modified Sklansky and Gonzalez [23] procedure for extracting trends
the sum of squared errors, SSE) from the approximated points, or the whole family of possible solutions, i.e., the rays of the cone. This method is effective and efficient as it requires only a single pass through the data. Now we will identify (partial) trends with the line segments of the constructed piecewise linear function.
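For concreteness, the procedure of Fig. 3 can be sketched in Python as follows. The function names (cone_angles, extract_trends) are our own, and cone_angles is a direct transcription of the formulas for γ2 and β2 above (assuming, as there, that the horizontal spacing of consecutive points exceeds ε).

import math

def cone_angles(p0, p2, eps):
    # Angles (gamma_2, beta_2) of the cone starting at p0 and inscribing
    # the circle of radius eps around p2.
    dx, dy = p0[0] - p2[0], p0[1] - p2[1]
    denom = dx * dx - eps * eps
    if denom <= 0:                       # points too close: accept any slope (simplification)
        return -math.pi / 2, math.pi / 2
    root = eps * math.sqrt(max(dx * dx + dy * dy - eps * eps, 0.0))
    gamma = math.atan((dx * dy - root) / denom)
    beta = math.atan((dx * dy + root) / denom)
    return min(gamma, beta), max(gamma, beta)

def extract_trends(points, eps):
    # Returns index pairs (start, end): each partial trend covers points[start..end].
    trends, i = [], 0
    while i < len(points) - 1:
        start = i
        lo, hi = cone_angles(points[start], points[i + 1], eps)
        i += 1
        while i + 1 < len(points):
            lo2, hi2 = cone_angles(points[start], points[i + 1], eps)
            new_lo, new_hi = max(lo, lo2), min(hi, hi2)
            if new_lo > new_hi:          # intersection of the cones became empty
                break
            lo, hi = new_lo, new_hi
            i += 1
        trends.append((start, i))        # a new cone starts at the last consistent point
    return trends

As in the pseudocode of Fig. 3, each new cone starts at the last point of the previously extracted trend, so the whole procedure needs only a single pass through the data.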
3 Dynamic Characteristics of Trends

In our approach, while summarizing trends in time series data, we consider the following three aspects:
• dynamics of change,
• duration, and
• variability,
and it should be noted that by trends we mean here global trends, concerning the entire time series (or some, probably large, part of it), not partial trends concerning a small time span (window) taken into account in the (partial) trend extraction phase via the Sklansky and Gonzalez [23] algorithm. In what follows we will briefly discuss these factors.
3.1 Dynamics of Change

By dynamics of change we understand the speed of changes. It can be described by the slope of a line representing the trend (cf. α in Fig. 1). Thus, to quantify the dynamics of change we may use the interval of possible angles α ∈ [−90, 90] (in degrees) or their trigonometrical transformation. However, it might be impractical to use such a scale directly while describing trends. Therefore we may use a fuzzy granulation in order to meet the users’ needs and task specificity. The user may construct a scale of linguistic terms corresponding to various directions of a trend line as, e.g.:
• quickly decreasing,
• decreasing,
• slowly decreasing,
• constant,
• slowly increasing,
• increasing,
• quickly increasing.
Figure 4 illustrates the lines corresponding to the particular linguistic terms. In fact, each term represents a fuzzy granule of directions. In Batyrshin et al. [1, 4] many methods of constructing such a fuzzy granulation are presented. The user may define the membership functions of the particular linguistic terms depending on his or her needs.
Fig. 4 A visual representation of angle granules defining the dynamics of change
We map a single value α (or the whole interval of angles corresponding to the gray area in Fig. 2), characterizing the dynamics of change of a trend identified using the algorithm shown as pseudocode in Fig. 3, into a fuzzy set (linguistic label) best matching a given angle. We can use, for instance, some measure of distance or similarity, cf. the book by Cross and Sudkamp [9]. Then we say that a given trend is, e.g., “decreasing to a degree 0.8”, if μ_decreasing(α) = 0.8, where μ_decreasing is the membership function of a fuzzy set representing “decreasing” that is the best match for angle α.
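As an illustration of this matching step, the following minimal Python sketch maps a trend angle to its best fitting label. The trapezoidal membership functions below are purely hypothetical (the paper leaves their exact shape to the user), and all names (trapezoid, LABELS, best_label) are our own.

def trapezoid(x, a, b, c, d):
    # Standard trapezoidal membership function with support [a, d] and core [b, c].
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Hypothetical granulation of the angle axis (degrees), one fuzzy set per label.
LABELS = {
    "quickly decreasing": lambda a: trapezoid(a, -91, -90, -65, -50),
    "decreasing":         lambda a: trapezoid(a, -65, -50, -40, -20),
    "slowly decreasing":  lambda a: trapezoid(a, -40, -20, -10, -5),
    "constant":           lambda a: trapezoid(a, -10, -5, 5, 10),
    "slowly increasing":  lambda a: trapezoid(a, 5, 15, 20, 40),
    "increasing":         lambda a: trapezoid(a, 20, 40, 50, 65),
    "quickly increasing": lambda a: trapezoid(a, 50, 65, 90, 91),
}

def best_label(angle):
    # Return the linguistic label best matching a given trend angle and its degree.
    return max(((lab, mu(angle)) for lab, mu in LABELS.items()), key=lambda t: t[1])

With these illustrative definitions, best_label(-45) returns ("decreasing", 1.0), while best_label(-55) returns ("decreasing", 0.66...), in the spirit of the “decreasing to a degree 0.8” example above.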
3.2 Duration

Duration describes the length of a single trend, meant as a linguistic variable and exemplified by a “long trend” defined as a fuzzy set whose membership function may be as in Fig. 5, where OX is the time axis divided into appropriate units. The definitions of linguistic terms describing the duration depend clearly on the perspective or purpose assumed by the user.
3.3 Variability

Variability refers to how “spread out” (“vertically”, in the sense of values taken on) a group of data is. The following five statistical measures of variability are widely used in traditional analyses:
• The range (maximum – minimum). Although the range is computationally the easiest measure of variability, it is not widely used, as it is based on only two data points that are extreme. This makes it very vulnerable to outliers and therefore it may not adequately describe the true variability.
• The interquartile range (IQR), calculated as the third quartile (the 75th percentile) minus the first quartile (the 25th percentile), which may be interpreted as representing the middle 50% of the data. It is resistant to outliers and is computationally as easy as the range.
• The variance, calculated as \frac{\sum_i (x_i - \bar{x})^2}{n}, where x̄ is the mean value.
• The standard deviation – the square root of the variance. Both the variance and the standard deviation are affected by extreme values.
Fig. 5 Example of a membership function describing the term “long” concerning the trend duration
• The mean absolute deviation (MAD), calculated as \frac{\sum_i |x_i - \bar{x}|}{n}. It is not frequently encountered in mathematical statistics, essentially because, while the mean deviation has a natural intuitive definition as the “mean deviation from the mean”, the introduction of the absolute value makes analytical calculations using this statistic much more complicated.

We propose to measure the variability of a trend as the distance of the data points covered by this trend from a linear uniform ε-approximation (cf. Sect. 2) that represents the trend. For this purpose we propose to employ a distance between a point and the family of possible solutions, indicated as a gray cone in Fig. 1. Equation (1) assures that this distance is definitely smaller than ε. We may use this information for normalization. The normalized distance equals 0 if the point lies in the gray area. In the opposite case it is equal to the distance to the nearest point belonging to the cone, divided by ε. Alternatively, we may bisect the cone and then compute the distance between the point and this ray. Similarly as in the case of the dynamics of change, for a given value of variability obtained as above we find a best matching fuzzy set (linguistic label) using, e.g., some measure of distance or similarity, cf. the book by Cross and Sudkamp [9]. Again, the measure of variability is treated as a linguistic variable and expressed using linguistic terms (labels) modeled by fuzzy sets defined by the user.
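A minimal Python sketch of this variability measure, using the bisector-of-the-cone variant described above and averaging the normalized per-point distances, may clarify the idea; the aggregation by a mean, as well as all names, are our own illustrative choices:

import math

def trend_variability(segment, apex, angle, eps):
    # segment: data points covered by the trend; apex: the point starting the cone;
    # angle: direction of the bisector ray that represents the trend.
    x0, y0 = apex
    slope = math.tan(angle)
    dists = []
    for x, y in segment:
        # perpendicular distance from (x, y) to the bisector line, normalized by eps;
        # by (1) it cannot exceed eps, so the normalized value stays within [0, 1]
        d = abs(slope * (x - x0) - (y - y0)) / math.sqrt(slope * slope + 1.0)
        dists.append(min(d / eps, 1.0))
    return sum(dists) / len(dists)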
4 Linguistic Data Summaries

A linguistic summary is meant as a (usually short) natural language like sentence (or some sentences) that subsumes the very essence of a set of data (cf. Kacprzyk and Zadrożny [18], [19]). This data set is numeric and usually large, not comprehensible in its original form by the human being. In Yager’s approach (cf. Yager [25], Kacprzyk and Yager [14], and Kacprzyk, Yager and Zadrożny [15]) the following perspective for linguistic data summaries is assumed:
• Y = {y_1, . . . , y_n} is a set of objects (records) in a database, e.g., the set of workers;
• A = {A_1, . . . , A_m} is a set of attributes characterizing objects from Y, e.g., salary, age, etc. in a database of workers, and A_j(y_i) denotes a value of attribute A_j for object y_i.
A linguistic summary of a data set consists of:
• a summarizer P, i.e. an attribute together with a linguistic value (fuzzy predicate) defined on the domain of attribute A_j (e.g. “low salary” for attribute “salary”);
• a quantity in agreement Q, i.e. a linguistic quantifier (e.g. most);
• truth (validity) T of the summary, i.e. a number from the interval [0, 1] assessing the truth (validity) of the summary (e.g. 0.7); usually, only summaries with a high value of T are interesting; • optionally, a qualifier R, i.e. another attribute together with a linguistic value (fuzzy predicate) defined on the domain of attribute Ak determining a (fuzzy subset) of Y (e.g. “young” for attribute “age”). Thus, a linguistic summary may be exemplified by T (most of employees earn low salary) = 0.7
(2)
or, in a richer (extended) form, including a qualifier (e.g. young), by T (most of young employees earn low salary) = 0.9
(3)
Thus, basically, the core of a linguistic summary is a linguistically quantified proposition in the sense of Zadeh [26] which, for (2), may be written as Qy’s are P
(4)
Q Ry’s are P
(5)
and for (3), may be written as
Then, T , i.e., the truth (validity) of a linguistic summary, directly corresponds to the truth value of (4) or (5). This may be calculated by using either original Zadeh’s calculus of linguistically quantified propositions (cf. [26]), or other interpretations of linguistic quantifiers.
5 Protoforms of Linguistic Trend Summaries

It was shown by Kacprzyk and Zadrożny [18] that Zadeh’s [27] concept of the protoform is convenient for dealing with linguistic summaries. This approach is also employed here. Basically, a protoform is defined as a more or less abstract prototype (template) of a linguistically quantified proposition. Then, the summaries mentioned above might be represented by two types of protoforms:
• Summaries based on frequency:
– a protoform of a short form of linguistic summaries:

Q trends are P
(6)
and exemplified by: Most of trends have a large variability – a protoform of an extended form of linguistic summaries: Q R trends are P
(7)
and exemplified by: Most of slowly decreasing trends have a large variability • Duration based summaries: – a protoform of a short form of linguistic summaries: The trends that took Q time are P
(8)
and exemplified by: The trends that took most time have a large variability – a protoform of an extended form of linguistic summaries: R trends that took Q time are P
(9)
and exemplified by: Slowly decreasing trends that took most time have a large variability
The truth values of the above types and forms of linguistic summaries will be found using the classic Zadeh [26] calculus of linguistically quantified propositions, and the Sugeno [24] integral.
5.1 The Use of Zadeh’s Calculus of Linguistically Quantified Propositions

In Zadeh’s [26] fuzzy logic based calculus of linguistically quantified propositions, a (proportional, nondecreasing) linguistic quantifier Q is assumed to be a fuzzy set in the interval [0, 1] as, e.g.
\mu_Q(x) = \begin{cases} 1 & \text{for } x \ge 0.8 \\ 2x - 0.6 & \text{for } 0.3 < x < 0.8 \\ 0 & \text{for } x \le 0.3 \end{cases}
(10)
The truth values (from [0, 1]) of (6) and (7) are calculated, respectively, as

T(Qy’s are P) = \mu_Q\left(\frac{1}{n}\sum_{i=1}^{n}\mu_P(y_i)\right)   (11)

and

T(Q Ry’s are P) = \mu_Q\left(\frac{\sum_{i=1}^{n}\left(\mu_R(y_i) \wedge \mu_P(y_i)\right)}{\sum_{i=1}^{n}\mu_R(y_i)}\right)   (12)
Here and later on, we assume, for obvious reasons, that the respective denominators are not equal to zero. Otherwise we need to resort to slightly modified formulas, but this will not be considered in this paper. The calculation of the truth values of duration based summaries is more complicated and requires a different approach. In the case of a summary “the trends that took Q time are P” we should compute the time taken by those trends for which “trend is P”. This is obvious when “trend is P” holds to degree 1, as we can then count the whole time taken by this trend. However, what should be done if “trend is P” holds only to some degree? We propose to take only a part of the time, defined by the degree to which “trend is P” holds. In other words, we compute this time as μ_P(y_i) t_{y_i}, where t_{y_i} is the duration of trend y_i. The value obtained (the duration of those trends for which “trend is P”) is then normalized by dividing it by the overall time T. Finally, we may compute to which degree the time taken by those trends for which “trend is P” is Q. We proceed in a similar way in the case of the extended form of linguistic summaries. Thus, we obtain the following formulae:
• for the short form of the duration based summaries (8):

T(y that took Q time are P) = \mu_Q\left(\frac{1}{T}\sum_{i=1}^{n}\mu_P(y_i)\, t_{y_i}\right)   (13)

where T is the total time of the summarized trends and t_{y_i} is the duration of the i-th trend;
• for the extended form of the duration based summaries (9):

T(Ry that took Q time are P) = \mu_Q\left(\frac{\sum_{i=1}^{n}\left(\mu_R(y_i) \wedge \mu_P(y_i)\right) t_{y_i}}{\sum_{i=1}^{n}\mu_R(y_i)\, t_{y_i}}\right)   (14)

where t_{y_i} is the duration of the i-th trend.
Both the fuzzy predicates P and R are assumed above to be of a rather simplified, atomic form referring to just one attribute. They can be extended to cover more sophisticated summaries involving some confluence of various, multiple attribute values as, e.g., “slowly decreasing and short”. Alternatively, we may obtain the truth values of (8) and (9) if we divide each trend, which takes t_{y_i} time units, into t_{y_i} trends, each lasting one time unit. For this new set of trends we use frequency based summaries with the truth values defined in (11) and (12).
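For readers who want to experiment, formulas (11)–(14) translate almost directly into code. Below is a minimal Python sketch in which P, R and mu_Q are user-supplied membership functions and duration returns t_{y_i}; all function names are our own, not part of the original method.

def truth_frequency(mu_Q, P, trends):
    # (11): T(Q y's are P)
    return mu_Q(sum(P(y) for y in trends) / len(trends))

def truth_frequency_ext(mu_Q, R, P, trends):
    # (12): T(Q R y's are P); assumes the denominator is nonzero, as in the text
    num = sum(min(R(y), P(y)) for y in trends)
    den = sum(R(y) for y in trends)
    return mu_Q(num / den)

def truth_duration(mu_Q, P, trends, duration):
    # (13): trends that took Q time are P; total time T = sum of all durations
    total = sum(duration(y) for y in trends)
    return mu_Q(sum(P(y) * duration(y) for y in trends) / total)

def truth_duration_ext(mu_Q, R, P, trends, duration):
    # (14): R trends that took Q time are P
    num = sum(min(R(y), P(y)) * duration(y) for y in trends)
    den = sum(R(y) * duration(y) for y in trends)
    return mu_Q(num / den)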
5.2 Use of the Sugeno Integral

The use of the Sugeno integral is particularly justified for the duration based linguistic summaries, for which the use of Zadeh’s method is not straightforward. Let us briefly recall the idea of the Sugeno [24] integral. Let X = {x_1, . . . , x_n} be a finite set. Then, cf., e.g., Sugeno [24], a fuzzy measure on X is a set function μ : P(X) → [0, 1] such that:

\mu(\emptyset) = 0, \quad \mu(X) = 1, \quad \text{if } A \subseteq B \text{ then } \mu(A) \le \mu(B), \ \forall A, B \in \mathcal{P}(X)
(15)
where P(X) denotes the set of all subsets of X. Let μ(·) be a fuzzy measure on X. The discrete Sugeno integral of a function f : X → [0, 1], f(x_i) = a_i, with respect to μ(·) is a function S_μ : [0, 1]^n → [0, 1] such that

S_\mu(a_1, \ldots, a_n) = \max_{i=1,\ldots,n} \left(a_{\sigma(i)} \wedge \mu(B_i)\right)   (16)

where ∧ stands for the minimum, σ is such a permutation of {1, . . . , n} that a_{σ(i)} is the i-th smallest element from among the a_i’s, and B_i = {x_{σ(i)}, . . . , x_{σ(n)}}. We can view function f as a membership function of a fuzzy set F ∈ F(X), where F(X) denotes the family of fuzzy sets defined in X. Then, the Sugeno integral can be equivalently defined as a function S_μ : F(X) → [0, 1] such that

S_\mu(F) = \max_{\alpha_i \in \{a_1,\ldots,a_n\}} \left(\alpha_i \wedge \mu(F_{\alpha_i})\right)   (17)
where Fαi is the α-cut of F and the meaning of other symbols is as in (16). The fuzzy measure and the Sugeno integral may be intuitively interpreted in the context of multicriteria decision making (MCDM) where we have a set of criteria and some options (decisions) characterized by the degree of satisfaction of particular criteria. In such a setting X is a set of criteria and μ(.) expresses the importance of each subset of criteria, i.e., how the satisfaction of a given subset of criteria contributes to the overall evaluation of the option. Then, the properties of the fuzzy measure (15) properly require that the satisfaction of all criteria makes an option fully satisfactory and that the more criteria are satisfied by an option the better
its overall evaluation. Finally, the set F represents an option and μ F (x) defines the degree to which it satisfies the criterion x. Then, the Sugeno integral may be interpreted as an aggregation operator yielding an overall evaluation of option F in terms of its satisfaction of the set of criteria X. In such a context the formula (17) may be interpreted as follows: find a subset of criteria of the highest possible importance (expressed by μ) such that at the same time minimal satisfaction degree of all these criteria by the option F is as high as possible (expressed by α), and take the minimum of these two degrees as the overall evaluation of the option F.
(18)
Now we will briefly show how the various linguistic summaries discussed in the previous section may be interpreted using the Sugeno integral. The monotone and nondecreasing linguistic quantifier Q is still defined, as in Zadeh’s calculus, as a fuzzy set in [0, 1], exemplified by (10). The truth value of particular summaries is computed using the Sugeno integral (17). For simple types of summaries we are in a position to provide an interpretation similar to the one given above for the MCDM. For this purpose we will identify the set of criteria X with the set of (partial) trends, while option F will be the whole time series under consideration, characterized in terms of how well its trends satisfy P. In what follows |A| denotes the cardinality of set A, summarizers P and qualifiers R are identified with the fuzzy sets modelling the linguistic terms they contain, X is the set of all trends extracted from the time series and time(x_i) denotes the duration of the trend x_i. Now, for the particular types of summaries we obtain:
• For simple frequency based summaries defined by (6): The truth value may be expressed as S_μ(P) where

\mu(P_\alpha) = \mu_Q\left(\frac{|P_\alpha|}{|X|}\right)   (19)
Thus, referring to (18), the truth value is determined by looking for a subset of trends of a cardinality high enough, as required by the semantics of the quantifier Q, and such that all these trends “are P” to the highest possible degree.
• For extended frequency based summaries defined by (7): The truth value of this type of summary may be expressed as S_μ(P) where

\mu(P_\alpha) = \mu_Q\left(\frac{|(P \cap R)_\alpha|}{|R_\alpha|}\right)   (20)

• For simple duration based summaries defined with (8): The truth value of this type of summary may be expressed as S_μ(P) where
\mu(P_\alpha) = \mu_Q\left(\frac{\sum_{i: x_i \in P_\alpha} \mathrm{time}(x_i)}{\sum_{i: x_i \in X} \mathrm{time}(x_i)}\right)   (21)
Thus, referring to (18), the truth value is determined by looking for a subset of trends such that their total duration with respect to the duration of the whole time series is long enough, as required by the semantics of the quantifier Q, and such that all these trends “are P” to the highest possible degree.
• For extended duration based summaries defined with (9): The truth value of this type of summary may be expressed as S_μ(P) where

\mu(P_\alpha) = \mu_Q\left(\frac{\sum_{i: x_i \in (P \cap R)_\alpha} \mathrm{time}(x_i)}{\sum_{i: x_i \in R_\alpha} \mathrm{time}(x_i)}\right)   (22)
Due to the properties of the quantifiers employed it is obvious that all μ(.)s defined above for the particular types of summaries satisfy the axioms (15) of the fuzzy measure. This concludes our brief exposition of the use of the classic Zadeh’s calculus of linguistically quantified propositions and the Sugeno integral for linguistic quantifier driven aggregation of partial scores (trends). In particular the former method is explicitly related to the computing with words and perception paradigm.
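A compact Python sketch of this Sugeno-integral-based aggregation may be useful: with weight(y) = 1 the first function implements (19), with weight(y) equal to the trend duration it implements (21), and the second function covers the extended forms (20) and (22). All names are our own, and mu_Q, P, R are user-supplied membership functions.

def sugeno_truth(mu_Q, P, trends, weight):
    # Discrete Sugeno integral (17) of the fuzzy set P over the set of trends, with the
    # fuzzy measure mu(P_alpha) = mu_Q(weight of the alpha-cut / total weight).
    total = sum(weight(y) for y in trends)
    best = 0.0
    for alpha in sorted({P(y) for y in trends}):
        cut = sum(weight(y) for y in trends if P(y) >= alpha)
        best = max(best, min(alpha, mu_Q(cut / total)))
    return best

def sugeno_truth_ext(mu_Q, R, P, trends, weight):
    # Extended forms: the measure compares the alpha-cuts of P ∩ R and of R.
    best = 0.0
    for alpha in sorted({P(y) for y in trends}):
        num = sum(weight(y) for y in trends if min(P(y), R(y)) >= alpha)
        den = sum(weight(y) for y in trends if R(y) >= alpha)
        if den > 0:
            best = max(best, min(alpha, mu_Q(num / den)))
    return best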
6 Example

Assume that from some given data we have extracted the trends listed in Table 1 using the method presented in this paper. We assume the granulation of dynamics of change given in Sect. 3.1. We will show the results obtained by using for the calculation of the truth (validity) values first the classic Zadeh’s calculus of linguistically quantified propositions, and then the Sugeno integral.

Table 1 An example of trends extracted

id   dynamics of change (α, in degrees)   duration (time units)   variability ([0,1])
1    25                                   15                      0.2
2    –45                                  1                       0.3
3    75                                   2                       0.8
4    –40                                  1                       0.1
5    –55                                  1                       0.7
6    50                                   2                       0.3
7    –52                                  1                       0.5
8    –37                                  2                       0.9
9    15                                   5                       0.0
6.1 Use of Zadeh’s Calculus of Linguistically Quantified Propositions

First, we consider the following frequency based summary:

Most of trends are decreasing
(23)
In this summary most is the linguistic quantifier Q defined by (10). “Trends are decreasing” is the summarizer P with decreasing given as:

\mu_P(\alpha) = \begin{cases} 0 & \text{for } \alpha \le -65 \\ 0.066\alpha + 4.333 & \text{for } -65 < \alpha < -50 \\ 1 & \text{for } -50 \le \alpha \le -40 \\ -0.05\alpha - 1 & \text{for } -40 < \alpha < -20 \\ 0 & \text{for } \alpha \ge -20 \end{cases}   (24)

The truth value of (23) is calculated via (11), and we obtain:

T(Most of the trends are decreasing) = \mu_Q\left(\frac{1}{n}\sum_{i=1}^{n}\mu_P(y_i)\right) = 0.389

where n is the number of all trends, here n = 9. If we consider the extended form of the linguistic summary, then we may have, for instance:

Most of short trends are decreasing
(25)
Again, most is the linguistic quantifier Q given by (10). “Trends are decreasing” is the summarizer P as in the previous example. “Trend is short” is the qualifier R with μ_R(t) defined as:

\mu_R(t) = \begin{cases} 1 & \text{for } t \le 1 \\ -\frac{1}{2}t + \frac{3}{2} & \text{for } 1 < t < 3 \\ 0 & \text{for } t \ge 3 \end{cases}   (26)

The truth value of (25) is:

T(Most of short trends are decreasing) = \mu_Q\left(\frac{\sum_{i=1}^{n}\left(\mu_R(y_i) \wedge \mu_P(y_i)\right)}{\sum_{i=1}^{n}\mu_R(y_i)}\right) = 0.892
On the other hand, we may have the following duration based linguistic summary:

Trends that took most time are slowly increasing
(27)
The linguistic quantifier Q is most, defined as previously. “Trends are slowly increasing” is the summarizer P with μ_P(α) defined as:

\mu_P(\alpha) = \begin{cases} 0 & \text{for } \alpha \le 5 \\ 0.1\alpha - 0.5 & \text{for } 5 < \alpha < 15 \\ 1 & \text{for } 15 \le \alpha \le 20 \\ -0.05\alpha + 2 & \text{for } 20 < \alpha < 40 \\ 0 & \text{for } \alpha \ge 40 \end{cases}   (28)

The truth value of (27) is then:

T(Trends that took most time are slowly increasing) = \mu_Q\left(\frac{1}{T}\sum_{i=1}^{n}\mu_P(y_i)\, t_{y_i}\right) = 0.48333
We may consider an extended form of a duration based summary, for instance:

Trends with low variability that took most time are slowly increasing
(29)
Again, most is the linguistic quantifier Q and “trends are slowly increasing” is the summarizer P defined as in the previous example. “Trends have low variability” is the qualifier R with μ_R(v) given as:

\mu_R(v) = \begin{cases} 1 & \text{for } v \le 0.2 \\ -5v + 2 & \text{for } 0.2 < v < 0.4 \\ 0 & \text{for } v \ge 0.4 \end{cases}   (30)

The truth value of (29) is calculated via (14), and we obtain:

T(Trends with low variability that took most time are slowly increasing) = \mu_Q\left(\frac{\sum_{i=1}^{n}\left(\mu_R(y_i) \wedge \mu_P(y_i)\right) t_{y_i}}{\sum_{i=1}^{n}\mu_R(y_i)\, t_{y_i}}\right) = 0.8444
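The four values above can be reproduced with a short, self-contained Python check; the piecewise membership functions below are transcribed from (10), (24), (26), (28) and (30), the data come from Table 1, and all variable and function names are our own.

# Table 1: (angle in degrees, duration in time units, variability in [0, 1])
TRENDS = [(25, 15, 0.2), (-45, 1, 0.3), (75, 2, 0.8), (-40, 1, 0.1), (-55, 1, 0.7),
          (50, 2, 0.3), (-52, 1, 0.5), (-37, 2, 0.9), (15, 5, 0.0)]

def mu_Q(x):        # "most", eq. (10)
    return 1.0 if x >= 0.8 else (2 * x - 0.6 if x > 0.3 else 0.0)

def decreasing(a):  # eq. (24)
    if a <= -65 or a >= -20: return 0.0
    if -50 <= a <= -40:      return 1.0
    return 0.066 * a + 4.333 if a < -50 else -0.05 * a - 1

def slowly_increasing(a):   # eq. (28)
    if a <= 5 or a >= 40: return 0.0
    if 15 <= a <= 20:     return 1.0
    return 0.1 * a - 0.5 if a < 15 else -0.05 * a + 2

def short(t):       # eq. (26)
    return 1.0 if t <= 1 else (0.0 if t >= 3 else -0.5 * t + 1.5)

def low_var(v):     # eq. (30)
    return 1.0 if v <= 0.2 else (0.0 if v >= 0.4 else -5 * v + 2)

n = len(TRENDS)
total_time = sum(t for _, t, _ in TRENDS)

# (23): most trends are decreasing -> about 0.39 (cf. 0.389 above)
print(mu_Q(sum(decreasing(a) for a, _, _ in TRENDS) / n))
# (25): most short trends are decreasing -> about 0.89 (cf. 0.892 above)
print(mu_Q(sum(min(short(t), decreasing(a)) for a, t, _ in TRENDS)
           / sum(short(t) for _, t, _ in TRENDS)))
# (27): trends that took most time are slowly increasing -> about 0.48
print(mu_Q(sum(slowly_increasing(a) * t for a, t, _ in TRENDS) / total_time))
# (29): low-variability trends that took most time are slowly increasing -> about 0.84
print(mu_Q(sum(min(low_var(v), slowly_increasing(a)) * t for a, t, v in TRENDS)
           / sum(low_var(v) * t for _, t, v in TRENDS)))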
6.2 Use of the Sugeno Integral

For the following simple frequency based summary:

Most of trends are decreasing
(31)
with most as in (10) and “Trends are decreasing” with the membership function of decreasing given as in (24), the truth value of (31) is calculated using (17) and (19), which yield:

T(Most of the trends are decreasing) = \max_{\alpha_i \in \{a_1,\ldots,a_n\}} \left(\alpha_i \wedge \mu_Q\left(\frac{|P_{\alpha_i}|}{|X|}\right)\right) = 0.511

If we assume the extended form, we may have the following summary:

Most of short trends are decreasing
(32)
Again, most is the linguistic quantifier Q given by (10). “Trends are decreasing” is the summarizer P given as previously. “Trend is short” is the qualifier R defined by (26). The truth value of (32) is calculated using (17) and (20), and we obtain:

T(Most of short trends are decreasing) = \max_{\alpha_i \in \{a_1,\ldots,a_n\}} \left(\alpha_i \wedge \mu_Q\left(\frac{|(P \cap R)_{\alpha_i}|}{|R_{\alpha_i}|}\right)\right) = 0.9

On the other hand, we may have the following simple duration based linguistic summary:

Trends that took most time are slowly increasing
(33)
where “Trends are slowly increasing” is the summarizer P with μ_P(α) defined as in (28) and the linguistic quantifier most is defined as previously. The truth value of (33) is:

T(Trends that took most time are slowly increasing) = \max_{\alpha_i \in \{a_1,\ldots,a_n\}} \left(\alpha_i \wedge \mu_Q\left(\frac{\sum_{i: x_i \in P_{\alpha_i}} \mathrm{time}(x_i)}{\sum_{i: x_i \in X} \mathrm{time}(x_i)}\right)\right) = 0.733

Finally, we may consider an extended form of a duration based summary, here exemplified by:

Trends with a low variability that took most of the time are slowly increasing
(34)
Again, most is the linguistic quantifier and “trends are slowly increasing” is the summarizer P defined as previously. “Trends have a low variability” is the qualifier R with μ_R(v) given in (30). The truth value of (34) is:

T(Trends with low variability that took most of the time are slowly increasing) = \max_{\alpha_i \in \{a_1,\ldots,a_n\}} \left(\alpha_i \wedge \mu_Q\left(\frac{\sum_{i: x_i \in (P \cap R)_{\alpha_i}} \mathrm{time}(x_i)}{\sum_{i: x_i \in R_{\alpha_i}} \mathrm{time}(x_i)}\right)\right) = 0.75
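Continuing the Table 1 sketch from Sect. 6.1 (again, our own illustrative code rather than the authors’), the Sugeno-based value for (31) can be checked by taking the membership degrees of “decreasing” as the α levels; the remaining three values follow analogously with the appropriate measures.

degrees = sorted({decreasing(a) for a, _, _ in TRENDS})
# (31): most trends are decreasing, Sugeno variant -> about 0.51 (cf. 0.511 above)
print(max(min(alpha,
              mu_Q(sum(1 for a, _, _ in TRENDS if decreasing(a) >= alpha) / n))
          for alpha in degrees))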
7 Concluding Remarks

In the paper we have proposed an effective and efficient method for the linguistic summarization of time series. We use a modification of the Sklansky and Gonzalez [23] algorithm to extract from a given time series trends that are described by three basic attributes related to their dynamics of change, duration and variability. The linguistic summarization employed is in the sense of Yager. For the aggregation of partial trends we used the classic Zadeh calculus of linguistically quantified propositions, and the Sugeno integral. The approach proposed may be a further step towards a more human consistent, natural language based temporal data analysis. This may be viewed as reflecting the philosophy of Zadeh’s computing with words and perceptions.
References

1. I. Batyrshin (2002). On granular Derivatives and the solution of a Granular Initial Value Problem. International Journal of Applied Mathematics and Computer Science, 12, 403–410.
2. I. Batyrshin, J. Kacprzyk, L. Sheremetov, L.A. Zadeh, Eds. (2006). Perception-based Data Mining and Decision Making in Economics and Finance. Springer, Heidelberg and New York.
3. I. Batyrshin, R. Herrera-Avelar, L. Sheremetov, A. Panova (2006). Moving Approximation Transform and Local Trend Association on Time Series Data Bases. In: Batyrshin, I.; Kacprzyk, J.; Sheremetov, L.; Zadeh, L.A. (Eds.): Perception-based Data Mining and Decision Making in Economics and Finance. Springer, Heidelberg and New York, pp. 53–79.
4. I. Batyrshin, L. Sheremetov (2006). Perception Based Functions in Qualitative Forecasting. In: Batyrshin, I.; Kacprzyk, J.; Sheremetov, L.; Zadeh, L.A. (Eds.): Perception-based Data Mining and Decision Making in Economics and Finance. Springer, Heidelberg and New York, pp. 112–127.
5. I. Batyrshin, L. Sheremetov, R. Herrera-Avelar (2006). Perception Based Patterns in Time Series Data Mining. In: Batyrshin, I.; Kacprzyk, J.; Sheremetov, L.; Zadeh, L.A. (Eds.): Perception-based Data Mining and Decision Making in Economics and Finance. Springer, Heidelberg and New York, pp. 80–111.
6. T. Calvo, G. Mayor, R. Mesiar, Eds. (2002). Aggregation Operators. New Trends and Applications. Springer-Verlag, Heidelberg and New York.
7. D.-A. Chiang, L. R. Chow, Y.-F. Wang (2000). Mining time series data by a fuzzy linguistic summary system. Fuzzy Sets and Systems, 112, 419–432.
8. J. Colomer, J. Melendez, J. L. de la Rosa, J. Augilar (1997). A qualitative/quantitative representation of signals for supervision of continuous systems. In Proceedings of the European Control Conference - ECC97, Brussels.
9. V. Cross, Th. Sudkamp (2002). Similarity and Compatibility in Fuzzy Set Theory: Assessment and Applications. Springer-Verlag, Heidelberg and New York.
10. H. Feidas, T. Makrogiannis, E. Bora-Senta (2004). Trend analysis of air temperature time series in Greece and their relationship with circulation using surface and satellite data: 1955–2001. Theoretical and Applied Climatology, 79, 185–208.
11. I. Glöckner (2006). Fuzzy Quantifiers. A Computational Theory. Springer, Heidelberg and New York.
12. F. Hoeppner (2003). Knowledge Discovery from Sequential Data. Ph.D. dissertation, Braunschweig University.
13. B.G. Hunt (2001). A description of persistent climatic anomalies in a 1000-year climatic model simulation. Climate Dynamics, 17, 717–733.
14. J. Kacprzyk and R.R. Yager (2001). Linguistic summaries of data using fuzzy logic. International Journal of General Systems, 30, 33–154.
15. J. Kacprzyk, R.R. Yager and S. Zadrożny (2000). A fuzzy logic based approach to linguistic summaries of databases. International Journal of Applied Mathematics and Computer Science, 10, 813–834.
16. J. Kacprzyk and S. Zadrożny (1995). FQUERY for Access: fuzzy querying for a Windows-based DBMS. In P. Bosc and J. Kacprzyk (Eds.), Fuzziness in Database Management Systems, Springer-Verlag, Heidelberg and New York, pp. 415–433.
17. J. Kacprzyk and S. Zadrożny (1999). The paradigm of computing with words in intelligent database querying. In L.A. Zadeh and J. Kacprzyk (Eds.), Computing with Words in Information/Intelligent Systems. Part 2. Foundations, Springer-Verlag, Heidelberg and New York, pp. 382–398.
18. J. Kacprzyk, S. Zadrożny (2005). Linguistic database summaries and their protoforms: toward natural language based knowledge discovery tools. Information Sciences, 173, 281–304.
19. J. Kacprzyk, S. Zadrożny (2005). Fuzzy linguistic data summaries as a human consistent, user adaptable solution to data mining. In B. Gabrys, K. Leiviska, J. Strackeljan (Eds.), Do Smart Adaptive Systems Exist? Springer, Berlin, Heidelberg and New York, pp. 321–339.
20. S. Kivikunnas (1999). Overview of Process Trend Analysis Methods and Applications. In Proceedings of Workshop on Applications in Chemical and Biochemical Industry, Aachen.
21. K. B. Konstantinov, T. Yoshida (1992). Real-time qualitative analysis of the temporal shapes of (bio)process variables. American Institute of Chemical Engineers Journal, 38, 1703–1715.
22. K. Neusser (1999). An investigation into a non-linear stochastic trend model. Empirical Economics, 24, 135–153.
23. J. Sklansky and V. Gonzalez (1980). Fast polygonal approximation of digitized curves. Pattern Recognition, 12, 327–331.
24. M. Sugeno (1974). Theory of Fuzzy Integrals and Applications. Ph.D. Thesis, Tokyo Institute of Technology, Tokyo, Japan.
25. R.R. Yager (1982). A new approach to the summarization of data. Information Sciences, 28, 69–86.
26. L.A. Zadeh (1983). A computational approach to fuzzy quantifiers in natural languages. Computers and Mathematics with Applications, 9, 149–184.
27. L.A. Zadeh (2002). A prototype-centered approach to adding deduction capabilities to search engines – the concept of a protoform. BISC Seminar, University of California, Berkeley.
Evolution of Fuzzy Logic: From Intelligent Systems and Computation to Human Mind

Masoud Nikravesh
Abstract Inspired by humans’ remarkable capability to perform a wide variety of physical and mental tasks without any measurements and computations, and dissatisfied with classical logic as a tool for modeling human reasoning in an imprecise environment, Lotfi A. Zadeh developed the theory and foundation of fuzzy logic with his 1965 paper “Fuzzy Sets” [1] and extended his work with his 2005 paper “Toward a Generalized Theory of Uncertainty (GTU)—An Outline” [2]. Fuzzy logic has at least two main sources over the past century. The first of these sources was initiated by Peirce in the form of what he called a logic of vagueness in the 1900s, and the second source is Lotfi A. Zadeh’s work on fuzzy sets and fuzzy logic in the 1960s and 1970s. Keywords: Nature of Mind · Zadeh · Fuzzy Sets · Logic · Vagueness
1 Introduction

Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and any computations. In the traditional sense, computation means manipulation of numbers, whereas humans use words for computation and reasoning. Underlying this capability is the brain’s crucial ability to manipulate perceptions and its remarkable capability to operate on, and reason with, perception-based information, which is intrinsically vague, imprecise and fuzzy. In this perspective, the addition of the machinery of computing with words to existing theories may eventually lead to theories which have a superior capability to deal with real-world problems and make it possible to conceive and design intelligent systems or systems of systems with a much higher MIQ (Machine IQ) than those we have today. The role model for an intelligent system is the human mind. Dissatisfied with
Masoud Nikravesh BISC Program, Computer Sciences Division, EECS Department University of California, Berkeley, and NERSC-Lawrence Berkeley National Lab., California 94720–1776, USA e-mail:
[email protected]
classical logic as a tool for modeling human reasoning in an imprecise and vague environment, Lotfi A. Zadeh developed the formalization of fuzzy logic, starting with his 1965 paper “Fuzzy Sets” [1] and extending his work with his 2005 paper “Toward a Generalized Theory of Uncertainty (GTU)—An Outline” [2]. To understand how the human mind works and its relationship with intelligent systems, fuzzy logic and Zadeh’s contribution, in this paper we will focus on issues related to common sense, the historical sources of fuzzy logic, different historical paths with respect to the nature of the human mind, and the evolution of intelligent systems and fuzzy logic.
2 Common Sense

In this section, we will focus on issues related to common sense as they bear on the nature of the human mind. We will also give a series of examples in this respect. Piero Scaruffi, in his book “Thinking about Thought” [3], argues that humans have a remarkable capability to confront real problems in real situations by using a very primitive logic. In Lotfi A. Zadeh’s perspective, humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and any computations. This is the notion that we usually refer to as “common sense”. This remarkable capability resembles less that of a mathematical genius, but it is quite effective for the purpose of human survival in this world. Common sense usually determines what we do, regardless of what we think. However, it is very important to mention that common sense is sometimes wrong. There are many examples of “paradoxical” common sense reasoning in the history of science. For example, Zeno “proved” that Achilles could never overtake a turtle; one can easily “prove” that General Relativity is absurd (a twin that gets younger just by traveling very far is certainly a paradox for common sense); and common sense told us that the Earth is flat and at the center of the world. However, physics is based on precise mathematics and not on common sense, for the reason that common sense can often be wrong. Now the question is: what is this very primitive logic that humans use for common sense based on? Logic is based on deduction, a method of exact inference based on classical logic (e.g., “all Greeks are human and Socrates is a Greek, therefore Socrates is human”) or approximate inference in the form of fuzzy logic. There are other types of inference, such as induction, which infers generalizations from a set of events (e.g., “Water boils at 100 degrees”), and abduction, which infers plausible causes of an effect (e.g., “You have the symptoms of a flu”). While common sense is based on a set of primitive logics, humans rarely employ classical logic to determine how to act in a new situation. If we used classical logic, which is often paradoxical, we might be able to take only a few actions a day. Classical logic allows humans to reach a conclusion when a problem is well defined and formulated, while common sense helps humans deal with the complexity of the real world, whose problems are ill defined, vague and not formulated. In short, common sense provides an effective path to make critical
decisions very quickly based on perceptive information. The bases of common sense are both reasoning methods and human knowledge, both of which are quite distinct from classical logic. It is extremely difficult, if not impossible, to build a mathematical model for some of the simplest decisions we make daily. For example, humans can use common sense to draw conclusions when we have incomplete, imprecise, unreliable or perceptive information such as “many”, “tall”, “red”, “almost”. Furthermore, and most importantly, common sense does not have to deal with logical paradoxes. What is important to mention is that classical logic is inadequate for ordinary life. The limits and inadequacies of classical logic have been known for decades, and numerous alternatives or improvements have been proposed, including the use of Intuitionism, Non-Monotonic Logic (second thoughts), plausible reasoning (a quick, efficient response to problems when an exact solution is not necessary), and fuzzy logic. Another important aspect of common sense reasoning is the remarkable human capability of dealing with uncertainties, which are either implicit or explicit. The most common classic tool for representing uncertainties is probability theory, which was formulated by Thomas Bayes in the late 18th century. The basis of probability theory is to translate uncertainty into some form of statistics. In addition, other techniques such as Bayes’ theorem allow one to draw conclusions from a number of probable events. In a technical framework, we usually use probability to express a belief. In the 1960s Leonard Savage [4] thought of the probability of an event not merely as the frequency with which that event occurs, but also as a measure of the degree to which someone believes it will happen. While Bayes’ rule would be very useful to build generalizations, unfortunately it requires the “prior” probability, which, in the case of induction, is precisely what we are trying to assess. Also, Bayes’ theorem oftentimes does not yield intuitive conclusions: for example, the accumulation of evidence may tend to lower a probability rather than increase it, and the sum of the probabilities of all possible events must be one, which is also not very intuitive. To overcome some of the limitations of probability theory and make it more plausible, in 1968 Glenn Shafer [5] and Stuart Dempster devised a theory of evidence by introducing a “belief function” which operates on all subsets of events (not just the single events). Dempster-Shafer’s theory allows one to assign a probability to a group of events, even if the probability of each single event is not known. Indirectly, Dempster-Shafer’s theory also allows one to represent “ignorance”, as the state in which the belief of an event is not known. In other words, Dempster-Shafer’s theory does not require a complete probabilistic model of the domain. Zadeh introduced fuzzy sets in 1965 and fuzzy logic in 1973, generalizing classical, bivalent logic to deal with common sense problems.
3 Historical Source of Fuzzy Logic

Fuzzy logic has at least two main sources over the past century. The first of these sources was initiated by Charles S. Peirce, who used the term “logic of vagueness” but could not finish the idea and fully develop the theory prior to his death. In 1908,
he was able to outline and envision the theory of triadic logic. The concept of “vagueness” was later picked up by Max Black [6]. In 1923, the philosopher Bertrand Russell [7], in a paper on vagueness, suggested that language is invariably vague and that vagueness is a matter of degree. More recently, the logic of vagueness became the focus of studies by others such as Brock [8], Nadin [9, 10], Engel-Tiercelin [11], and Merrell [12, 13, 14, 15]. The second source is the one initiated by Lotfi A. Zadeh, who used the term “Fuzzy Sets” for the first time in 1960 and extended the idea during the past 40 years.
3.1 Lotfi A. Zadeh [1]

Prior to the publication of his [Prof. Zadeh’s] first paper on fuzzy sets in 1965—a paper which was written while he was serving as Chair of the Department of Electrical Engineering at UC Berkeley—he had achieved both national and international recognition as the initiator of system theory, and as a leading contributor to what is more broadly referred to as systems analysis. His principal contributions were the development of a novel theory of time-varying networks; a major extension of Wiener’s theory of prediction (with J.R. Ragazzini) [16]; the z-transform method for the analysis of sampled-data systems (with J.R. Ragazzini) [17]; and, most importantly, the development of the state space theory of linear systems, published as a co-authored book with C.A. Desoer in 1963 [18]. Publication of his first paper on fuzzy sets in 1965 [1] marked the beginning of a new phase of his scientific career. From 1965 on, almost all of his publications have been focused on the development of fuzzy set theory, fuzzy logic and their applications. His first paper, entitled “Fuzzy Sets,” got a mixed reaction. His strongest supporter was the late Professor Richard Bellman, an eminent mathematician and a leading contributor to systems analysis and control. For the most part, he encountered skepticism and sometimes outright hostility. There were two principal reasons: the word “fuzzy” is usually used in a pejorative sense; and, more importantly, his abandonment of classical, Aristotelian, bivalent logic was too radical a departure from deep-seated scientific traditions. Fuzzy logic goes beyond Lukasiewicz’s [19] multi-valued logic because it allows for an infinite number of truth values: the degree of “membership” can assume any value between zero and one. Fuzzy logic is also consistent with the principle of incompatibility stated at the beginning of the century by a father of modern Thermodynamics, Pierre Duhem. The second phase, 1973–1999 [22, 23, 24, 25, 26], began with the publication of the 1973 paper, “Outline of a New Approach to the Analysis of Complex Systems and Decision Processes” [20]. Two key concepts were introduced in this paper: (a) the concept of a linguistic variable; and (b) the concept of a fuzzy if-then rule. Today, almost all applications of fuzzy set theory and fuzzy logic involve the use of these concepts. What should be noted is that Zadeh employed the term “fuzzy logic” for the first time in his 1974 Synthese paper “Fuzzy Logic and Approximate Reasoning” [21]. Today, “fuzzy logic” is used in two different senses: (a) a narrow sense, in which fuzzy logic, abbreviated as FLn, is a logical system which is a generalization
of multivalued logic; and (b) a wide sense, in which fuzzy logic, abbreviated as FL, is a union of FLn , fuzzy set theory, possibility theory, calculus of fuzzy if-then rules, fuzzy arithmetic, calculus of fuzzy quantifiers and related concepts and calculi. The distinguishing characteristic of FL is that in FL everything is, or is allowed to be, a matter of degree. Many of Prof. Zadeh’s papers written in the eighties and early nineties were concerned, for the most part, with applications of fuzzy logic to knowledge representation and commonsense reasoning. Unlike Probability theory, Fuzzy Logic represents the real world without any need to assume the existence of randomness. Soft computing came into existence in 1991, with the launching of BISC (Berkeley Initiative in Soft Computing) at UC Berkeley. Basically, soft computing is a coalition of methodologies which collectively provide a foundation for conception, design and utilization of intelligent systems. The principal members of the coalition are: fuzzy logic, neurocomputing, evolutionary computing, probabilistic computing, chaotic computing, rough set theory and machine learning. The basic tenet of soft computing is that, in general, better results can be obtained through the use of constituent methodologies of soft computing in combination rather than in a stand-alone mode. A combination which has attained wide visibility and importance is that of neuro-fuzzy systems. Other combinations, e.g., neuro-fuzzy-genetic systems, are appearing, and the impact of soft computing is growing on both theoretical and applied levels.
3.2 Charles S. Peirce [26]

Charles S. Peirce is known for what he termed a logic of vagueness and, equivalently, a logic of possibility and a logic of continuity. Peirce believed that this logic would fit a theory of possibility that could be used for most cases of reasoning. Peirce’s view was indeed a radical one in his time, which obviously went against classical logicians such as Boole, de Morgan, Whatley, Schröder, and others. For a period of time, researchers believed that Peirce’s work was in line with triadic logic; however, it must be more than that. In fact, one should assume, as Peirce himself oftentimes used the term logic of vagueness, that his theory and thinking are most likely one of the sources of today’s fuzzy logic. Unfortunately, Peirce was not able to fully develop the idea and theory of the logic of vagueness that he envisioned. In 1908, he was able to outline the makings of a ‘triadic logic’ and envision his logic based on actuality, real possibility and necessity.
4 Human Mind

Mind is what distinguishes humans from the rest of creation. The human mind is the next unexplored frontier of science and arguably the most important for humankind. Our mind is who we are. While we may consider the human brain as a machine or as a powerful
computer, the mind is a set of processes. Therefore the mind occurs in the brain; however, the mind is not like a computer program.

Fig. 1 New Age
4.1 Different Historical Paths

There have been different historical paths toward the understanding of the human mind; these include philosophy, psychology, mathematics, biology (neuro-computing), computer science and AI (McCarthy, 1955), linguistics and computational linguistics (Chomsky, 1960s), and physics (since the 1980s) (Figs. 1–4). Each of these fields has contributed in a different way to help scientists better understand the human mind. For example, the contribution of information science includes: 1) considering the mind as a symbol processor, 2) the formal study of human knowledge, 3) knowledge processing, 4) the study of common-sense knowledge, and 5) neuro-computing. The contribution of linguistics includes competence over performance, pragmatics, and metaphor. The contribution of psychology includes understanding the mind as a processor of concepts, reconstructive memory, the view that memory is learning and is reasoning, and the fundamental unity of cognition. The contribution of neurophysiology includes understanding that the brain is an evolutionary system, that the mind is shaped mainly by genes and experience, neural-level competition, and connectionism. Finally, the contribution of physics includes understanding that living beings create order from disorder, non-equilibrium thermodynamics, self-organizing systems, the mind as a self-organizing system, and theories of consciousness based on quantum & relativity physics.
MACHINE INTELLIGENCE: History 1936: TURING MACHINE 1940: VON NEUMANN’S DISTINCTION BETWEEN DATA AND INSTRUCTIONS 1943: FIRST COMPUTER 1943 MCCULLOUCH & PITTS NEURON 1947: VON NEUMANN’s SELF-REPRODUCING AUTOMATA 1948: WIENER’S CYBERNETICS 1950: TURING’S TEST 1956: DARTMOUTH CONFERENCE ON ARTIFICIAL INTELLIGENCE 1957: NEWELL & SIMON’S GENERAL PROBLEM SOLVER 1957: ROSENBLATT’S PERCEPTRON 1958: SELFRIDGE’S PANDEMONIUM 1957: CHOMSKY’S GRAMMAR 1959: SAMUEL’S CHECKERS 1960: PUTNAM’S COMPUTATIONAL FUNCTIONALISM 1960: WIDROW’S ADALINE 1965: FEIGENBAUM’S DENDRAL
1965: ZADEH’S FUZZY LOGIC 1966: WEIZENBAUM’S ELIZA 1967: HAYES-ROTH’S HEARSAY 1967: FILLMORE’S CASE FRAME GRAMMAR 1969: MINSKY & PAPERT’S PAPER ON NEURAL NETWORKS 1970: WOODS’ ATN 1972: BUCHANAN’S MYCIN 1972: WINOGRAD’S SHRDLU 1974: MINSKY’S FRAME 1975: SCHANK’S SCRIPT 1975: HOLLAND’S GENETIC ALGORITHMS 1979: CLANCEY’S GUIDON 1980: SEARLE’S CHINESE ROOM ARTICLE 1980: MCDERMOTT’S XCON 1982: HOPFIELD’S NEURAL NET 1986: RUMELHART & MCCLELLAND’S PDP 1990: ZADEH’S SOFT COMPUTING 2000: ZADEH’S COMPUTING WITH WORDS AND PERCEPTIONS & PNL
Fig. 2 History of Machine Intelligence
4.2 Mind and Computation

As we discussed earlier, the mind is not like a computer program. The mind is a set of processes that occur in the brain. The mind has computational equivalences in many dimensions, such as imagination and thinking. Other important dimensions in which the mind has computational equivalences include appreciation of beauty, attention, awareness, belief, cognition, consciousness, emotion, feeling, imagination, intelligence, knowledge, meaning, perception, planning, reason, sense of good and bad, sense of justice & duty, thought, understanding, and wisdom. Dr. James S. Albus, Senior NIST Fellow, Intelligent Systems Division, Manufacturing Engineering Laboratory at the National Institute of Standards and Technology, provided a great overview of these dimensions and the computational equivalences of the mind, as described below (with his permission):
• Understanding = correspondence between model and reality
• Belief = level of confidence assigned to knowledge
• Wisdom = ability to make decisions that achieve long term goals
• Attention = focusing sensors & perception on what is important
• Awareness = internal representation of external world
DAVID HILBERT (1928) MATHEMATICS = BLIND MANIPULATION OF SYMBOLS FORMAL SYSTEM = A SET OF AXIOMS AND A SET OF INFERENCE RULES PROPOSITIONS AND PREDICATES DEDUCTION = EXACT REASONING
KURT GOEDEL (1931) •
A CONCEPT OF TRUTH CANNOT BE DEFINED WITHIN A FORMAL SYSTEM
ALFRED TARSKI (1935) •
DEFINITION OF “TRUTH”: A STATEMENT IS TRUE IF IT CORRESPONDS TO REALITY (“CORRESPONDENCE THEORY OF TRUTH”)
ALFRED TARSKI (1935) BUILD MODELS OF THE WORLD WHICH YIELD INTERPRETATIONS OF SENTENCES IN THAT WORLD TRUTH CAN ONLY BE RELATIVE TO SOMETHING “META-THEORY”
ALAN TURING (1936) • • • •
COMPUTATION = THE FORMAL MANIPULATION OF SYMBOLS THROUGH THE APPLICATION OF FORMAL RULES HILBERT’S PROGRAM REDUCED TO MANIPULATION OF SYMBOLS LOGIC = SYMBOL PROCESSING EACH PREDICATE IS DEFINED BY A FUNCTION, EACH FUNCTION IS DEFINED BY AN ALGORITHM
NORBERT WIENER (1947) CYBERNETICS BRIDGE BETWEEN MACHINES AND NATURE, BETWEEN "ARTIFICIAL" SYSTEMS AND NATURAL SYSTEMS FEEDBACK, HOMEOSTASIS, MESSAGE, NOISE, INFORMATION PARADIGM SHIFT FROM THE WORLD OF CONTINUOUS LAWS TO THE WORLD OF ALGORITHMS, DIGITAL VS ANALOG WORLD
CLAUDE SHANNON AND WARREN WEAVER (1949) • INFORMATION THEORY ENTROPY = A MEASURE OF DISORDER = A MEASURE OF THE LACK OF INFORMATION LEON BRILLOUIN'S NEGENTROPY PRINCIPLE OF INFORMATION
ANDREI KOLMOGOROV (1960) ALGORITHMIC INFORMATION THEORY COMPLEXITY = QUANTITY OF INFORMATION CAPACITY OF THE HUMAN BRAIN = 10 TO THE 15TH POWER MAXIMUM AMOUNT OF INFORMATION STORED IN A HUMAN BEING = 10 TO THE 45TH ENTROPY OF A HUMAN BEING = 10 TO THE 23TH
LOTFI A. ZADEH (1965) FUZZY SET “Stated informally, the essence of this principle is that as the complexity of a system increases, our ability to make precise and yet significant statements about its behavior diminishes until a threshold is reached beyond which precision and significance (or relevance) become almost mutually exclusive characteristics.”
Fig. 3 History of Computation Science
• Consciousness = aware of self & one’s relationship to world
• Intelligence = ability to achieve goals despite uncertainty
• Sense of good and bad = fundamental value judgment
• Sense of justice & duty = culturally derived value judgment
First Attempts Simple neurons which are binary devices with fixed thresholds – simple logic functions like “and”, “or” – McCulloch and Pitts (1943) Promising & Emerging Technology Perceptron – three layers network which can learn to connect or associate a given input to a random output - Rosenblatt (1958) ADALINE (ADAptive LInear Element) – an analogue electronic device which uses least-mean-squares (LMS) learning rule – Widrow & Hoff (1960) 1962: Rosenblatt proved the convergence of the perceptron training rule. Period of Frustration & Disrepute Minsky & Papert’s book in 1969 in which they generalised the limitations of single layer Perceptrons to multilayered systems. “...our intuitive judgment that the extension (to multilayer systems) is sterile” Innovation Grossberg's (Steve Grossberg and Gail Carpenter in 1988) ART (Adaptive Resonance Theory) networks based on biologically plausible models. Anderson and Kohonen developed associative techniques Klopf (A. Henry Klopf) in 1972, developed a basis for learning in artificial neurons based on a biological principle for neuronal learning called heterostasis. Werbos (Paul Werbos 1974) developed and used the back-propagation learning method 1970-1985: Very little research on Neural Nets Fukushima’s (F. Kunihiko) cognitron (a step wise trained multilayered neural network for interpretation of handwritten characters). Re-Emergence 1986: Invention of Backpropagation [Rumelhart and McClelland, but also Parker and earlier on: Werbos] which can learn from nonlinearly-separable data sets. Since 1985: A lot of research in Neural Nets! 1983 Defense Advanced Research Projects Agency(DARPA) funded neurocomputing research
Fig. 4 History of Machine Intelligence (Neuro Computing)
• Appreciation of beauty = perceptual value judgment
• Imagination = modeling, simulation, & visualization
• Thought = analysis of what is imagined
• Reason = logic applied to thinking
• Planning = thinking about possible future actions and goals
• Emotion = value judgment, evaluation of good and bad
• Feeling = experience of sensory input or emotional state
• Perception = transformation of sensation into knowledge
• Knowledge = information organized so as to be useful
• Cognition = analysis, evaluation, and use of knowledge
• Meaning = relationships between forms of knowledge
4.3 The Nature of Mind

As we discussed earlier, there have been different historical paths toward the understanding of the human mind. This understanding provides a better insight into the nature of the mind. The main contributions of the information sciences toward a better understanding of the nature of the mind include the development of neural network models, the model of the mind as a symbol processor, the formal study of human knowledge and knowledge processing, and common sense knowledge. The main contributions of linguistics include the understanding of pragmatics, metaphor, and competence. Psychology has opened an important door toward better understanding the nature of
the mind through interpreting the mind as a processor of concepts, the unity of cognition, memory as learning and reasoning, and reconstructive memory. Neurophysiology helped to better understand the connection between brain and mind, interpreting the brain as an evolutionary system, establishing that the mind is shaped and formed mainly by genes and experience, and providing better insight into the nature of the mind through neural-level competition and connectionism. Physics has been in the front row of understanding the nature of the mind and of developing systems to represent such unexplored frontiers. Physics provided further insights into the nature of the mind, such as the mind as a self-organizing system and theories of consciousness based on quantum and relativity physics.
4.4 Recent Breakthroughs

In recent years we have witnessed a series of breakthroughs toward a better understanding of the human mind. These include breakthroughs in Neurosciences, Cognitive Modeling, Intelligent Control, Depth Imaging, and Computational Power. Dr. James S. Albus, Senior NIST Fellow, summarized some of these new breakthroughs as follows (with his permission):
Neurosciences – Focused on understanding the brain – chemistry, synaptic transmission, axonal connectivity, functional MRI.
Cognitive Modeling – Focused on representation and use of knowledge in performing cognitive tasks – mathematics, logic, language.
Intelligent Control – Focused on making machines behave appropriately (i.e., achieve goals) in an uncertain environment – manufacturing, autonomous vehicles, agriculture, mining.
Depth Imaging – Enables geometrical modeling of the 3-D world. Facilitates grouping and segmentation. Provides a solution to the symbol-grounding problem.
Computational Power – Enables processes that rival the brain in operations per second. At 10^10 ops, heading for 10^15 ops.
4.5 Future Anticipations: Reality or SciFi

It is the year 2020. What can we expect in the year 2020? Scientists project the following possible predictions and anticipations:
• Computing Power ==> Quadrillion/sec/$100 – 5–15 Quadrillion/sec (IBM’s fastest computer = 100 Trillion)
• High Resolution Imaging (Brain and Neuroscience) – Human Brain, Reverse Engineering – Dynamic Neuron Level Imaging/Scanning and Visualization
• Searching, Logical Analysis and Reasoning – Searching for the Next Google
• Technology goes to the Nano and Molecular Level – Nanotechnology – Nano Wireless Devices and OS
• Tiny, blood-cell-size robots
• Virtual Reality through controlling Brain Cell Signals
• Once we develop a very intelligent robot, or maybe a humanoid, that can do everything a human can do, and maybe do most of the tasks in the corporate and labor world better, faster and more efficiently – at that time, who do you think should work and who should get paid?

Every day, the boundary between reality and fantasy, and between science and fiction, gets fuzzier and fuzzier, and it is hard to know whether we are dreaming or living in reality. Humans have dreamed for years and fantasized about their dreams, and today, or maybe in the year 2020 or 2030 or 2050, we will see some of these long-awaited dreams become reality. Figures 5 through 10 are some of my picks that connect reality to SciFi and show what we can anticipate in the years to come.
The year is 2035
In science fiction, the Three Laws of Robotics are a set of three rules written by Isaac Asimov, which most robots appearing in his fiction must obey. Introduced in his 1942 short story "Runaround", the Laws state the following:
1. A robot may not harm a human being, or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given to it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence, as long as such protection does not conflict with the First or Second Law.
Sonny (Wikipedia -- Encyclopedia)
Fig. 5 Movie I-Robot, the age of robot and machine intelligence
Star Wars is an influential science fantasy saga and fictional universe created by writer/producer/director George Lucas during the 1970s.
Wikipedia -- Encyclopedia
Fig. 6 Movie Star Wars, the age of human cloning
5 Quotations "As complexity increases, precise statements lose meaning and meaningful statements lose precision." (Lotfi A. Zadeh) "As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality." (A. Einstein) "The certainty that a proposition is true decreases with any increase of its precision. The power of a vague assertion rests in its being vague ("I am not tall").
It seems 400 years prior, the rest of the planet's population succumbed to a vague industrial disease (shades of Todd Haynes' Safe) that was later cured by a distant ancestor of the current Chairman. Despite seemingly unlimited biotechnology, fear of disease prevents citizens from venturing out into the wild greenery beyond Bregna's walls. The city's population is having strange dreams and people are disappearing. Æon Flux was created by Korean-American animator Peter Chung.
Wikipedia -- Encyclopedia
Fig. 7 Movie AeonFlux, the new age of communication and genetic cloning
The first and most obvious theme is whether individuals are dominated by fate or whether they have free will.
Minority Report is a science fiction short story by Philip K. Dick, first published in 1956. A movie, Minority Report (2002), starring Tom Cruise, was directed by Steven Spielberg.
Fig. 8 Movie Minority Report, the age of human thoughts
A very precise assertion is almost never certain ("I am 1.71 cm tall"). Principle of incompatibility." (Pierre Duhem) "Was Einstein misguided? Must we accept that there is a fuzzy, probabilistic quantum arena lying just beneath the definitive experiences of everyday reality? As of today, we still don't have a final answer. Fifty years after Einstein's death,
Mind-Manipulation Invention
Wikipedia -- Encyclopedia
Fig. 9 Movie Batman Forever, the age of reading thoughts through human brain activity
Fantastic Voyage is a 1966 science fiction film written by Harry Kleiner. 20th Century Fox wanted a book that would be a tie-in with the movie, and hired Isaac Asimov to write a novelization based on the screenplay.
Fig. 10 Movie Fantastic Voyage, the age of nano-medical devices
however, the scales have certainly tipped farther in this direction.” (Brian Greene, NYT (2005)) [28] “One is tempted to rewrite Quantum Mechanics using Fuzzy Theory instead of Probability Theory. After all, Quantum Mechanics, upon which our description of matter is built, uses probabilities mainly for historical reasons: Probability Theory was the only theory of uncertainty available at the time. Today, we have a standard interpretation of the world which is based on population thinking: we cannot talk about a single particle, but only about sets of particles. We cannot know whether a particle will end up here or there, but only how many particles will end up here or there. The interpretation of quantum phenomena would be slightly different if Quantum Mechanics was based on Fuzzy Logic: probabilities deal with populations, whereas Fuzzy Logic deals with individuals; probabilities entail uncertainty, whereas Fuzzy Logic entails ambiguity. In a fuzzy universe a particle’s position would be known at all times, except that such a position would be ambiguous (a particle would be simultaneously “here” to some degree and “there” to some other degree). This might be viewed as more plausible, or at least more in line with our daily experience that in nature things are less clearly defined than they appear in a mathematical representation of them.” Piero Scaruffi (2003) “What I believe, and what as yet is widely unrecognized, is that, the genesis of computing with words and the computational theory of perceptions in my 1999 paper [26], “From Computing with Numbers to Computing with Words—from Manipulation of Measurements to Manipulation of Perceptions,” will be viewed, in retrospect, as an important development in the evolution of fuzzy logic, marking the beginning of the third phase, 1999–. Basically, development of computing with
words and perceptions brings together earlier strands of fuzzy logic and suggests that scientific theories should be based on fuzzy logic rather than on Aristotelian, bivalent logic, as they are at present. A key component of computing with words is the concept of Precisiated Natural Language (PNL). PNL opens the door to a major enlargement of the role of natural languages in scientific theories. It may well turn out to be the case that, in coming years, one of the most important application-areas of fuzzy logic, and especially PNL, will be the Internet, centering on the conception and design of search engines and question-answering systems. From its inception, fuzzy logic has been—and to some degree still is—an object of skepticism and controversy. In part, skepticism about fuzzy logic is a reflection of the fact that, in English, the word “fuzzy” is usually used in a pejorative sense. But, more importantly, for some fuzzy logic is hard to accept because by abandoning bivalence it breaks with centuries-old tradition of basing scientific theories on bivalent logic. It may take some time for this to happen, but eventually abandonment of bivalence will be viewed as a logical development in the evolution of science and human thought.” Lotfi A. Zadeh (2005)
6 Conclusions Many of Zadeh's papers written in the eighties and early nineties were concerned, for the most part, with applications of fuzzy logic to knowledge representation and commonsense reasoning. Then, in 1999, a major idea occurred to him – an idea which underlies most of his current research activities. This idea was described in a seminal paper entitled "From Computation with Numbers to Computation with Words – From Manipulation of Measurements to Manipulation of Perceptions." His 1999 paper initiated a new direction in computation which he called Computing with Words (CW). In Zadeh's view, Computing with Words (CW) opens a new chapter in the evolution of fuzzy logic and its applications. It has led him to the initiation of a number of novel theories, including Protoform Theory (PFT), the Theory of Hierarchical Definability (THD), Perception-based Probability Theory (PTp), Perception-based Decision Analysis (PDA) and the Unified Theory of Uncertainty (UTU) (Figs. 11–12). Zadeh believes that these theories will collectively have a wide-ranging impact on scientific theories, especially those in which human perceptions play a critical role – as they do in economics, decision-making, knowledge management, information analysis and other fields. Successful applications of fuzzy logic and its rapid growth suggest that the impact of fuzzy logic will be felt increasingly in coming years. Fuzzy logic is likely to play an especially important role in science and engineering, but eventually its influence may extend much farther. In many ways, fuzzy logic represents a significant paradigm shift in the aims of computing – a shift which reflects the fact that the human mind, unlike present-day computers, possesses a remarkable ability to store and process information which is pervasively imprecise, uncertain and lacking in categoricity.
Fig. 11 Evolution of Fuzzy Logic (Zadeh's Logic). [The figure depicts the evolution of fuzzy logic as a series of shifts in direction: crisp logic + PT (1965); f-generalization + PT+ (1973); f.g-generalization + PT++ (1999); computing with words + PTp. Legend: PT = standard probability theory (measurement-based); PT+ = f-generalization of PT; PT++ = f.g-generalization of PT; PTp = p-generalization of PT = perception-based probability theory.]
Fig. 12 Fuzzy Logic, New Tools. [The figure, titled "From computing with numbers to computing with words, and from computing with words to the computational theory of perceptions and the theory of hierarchical definability," traces the path from computing with numbers (CN) through computing with intervals (CI) to computing with words (CW), built on granulation and the generalized constraint. Legend: GrC = computing with granules; PNL = precisiated natural language; CTP = computational theory of perceptions; CSNL = constraint-centered semantics of natural languages; THD = theory of hierarchical definability; GCL = generalized constraint language.]
Acknowledgments Funding for this research was provided in part by British Telecommunications (BT), OMRON, Tekes, Chevron-Texaco, the Imaging and Informatics–Life Sciences program of Lawrence Berkeley National Laboratory, and the BISC Program of UC Berkeley. The author would like to thank Prof. Lotfi A. Zadeh (UC Berkeley) and Dr. James S. Albus, Senior NIST Fellow, for allowing him to use some of their presentation materials.
References
1. Zadeh, Lotfi A. (1965). Fuzzy sets. Information and Control 8, 338–353.
2. Zadeh, Lotfi A. (2005). Toward a Generalized Theory of Uncertainty (GTU) – An Outline, Information Sciences.
3. Scaruffi, Piero (2003). Thinking About Thought, iUniverse publishing.
4. Savage, Leonard (1954). The Foundations of Statistics, John Wiley.
5. Shafer, Glenn (1976). A Mathematical Theory of Evidence, Princeton University Press.
6. Black, Max (1937). Vagueness, an exercise in logical analysis. Philosophy of Science 4, 427–455.
7. Russell, Bertrand (1923). Vagueness. Australasian Journal of Psychology and Philosophy 1, 84–92.
8. Brock, Jarrett E. (1979). Principle themes in Peirce's logic of vagueness. In Peirce Studies 1, 41–50. Lubbock: Institute for Studies in Pragmaticism.
9. Nadin, Mihai (1982). Consistency, completeness and the meaning of sign theories. American Journal of Semiotics 1 (3), 79–98.
10. Nadin, Mihai (1983). The logic of vagueness and the category of synechism. In The Relevance of Charles Peirce, E. Freeman (ed.), 154–66. LaSalle, IL: Monist Library of Philosophy.
11. Engel-Tiercelin, Claudine (1992). Vagueness and the unity of C. S. Peirce's realism. Transactions of the C. S. Peirce Society 28 (1), 51–82.
12. Merrell, Floyd (1995). Semiosis in the Postmodern Age. West Lafayette: Purdue University Press.
13. Merrell, Floyd (1996). Signs Grow: Semiosis and Life Processes. Toronto: University of Toronto Press.
14. Merrell, Floyd (1997). Peirce, Signs, and Meaning. Toronto: University of Toronto Press.
15. Merrell, Floyd (1998). Sensing Semiosis: Toward the Possibility of Complementary Cultural 'Logics'. New York: St. Martin's Press.
16. Zadeh, Lotfi A. (1950). An extension of Wiener's theory of prediction (with J. R. Ragazzini), J. Appl. Phys. 21, 645–655.
17. Zadeh, Lotfi A. (1952). The analysis of sampled-data systems (with J. R. Ragazzini), Applications and Industry (AIEE) 1, 224–234.
18. Zadeh, Lotfi A. (1963). Linear System Theory – The State Space Approach (co-authored with C. A. Desoer). New York: McGraw-Hill Book Co.
19. Lukaszewicz, Witold (1990). Non-Monotonic Reasoning, Ellis Horwood.
20. Zadeh, Lotfi A. (1973). Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. on Systems, Man and Cybernetics SMC-3, 28–44.
21. Zadeh, Lotfi A. (1975). Fuzzy Logic and Approximate Reasoning, Synthese 30, 407–428.
22. Zadeh, Lotfi A. (1970). Decision-making in a fuzzy environment (with R. E. Bellman), Management Science 17, B-141–B-164.
23. Zadeh, Lotfi A. (1972). Fuzzy languages and their relation to human and machine intelligence, Proc. of Intl. Conf. on Man and Computer, Bordeaux, France, 130–165.
24. Zadeh, Lotfi A. (1979). A theory of approximate reasoning, Machine Intelligence 9, J. Hayes, D. Michie, and L. I. Mikulich (eds.), 149–194. New York: Halstead Press.
25. Zadeh, Lotfi A. (1992). Fuzzy Logic for the Management of Uncertainty, L. A. Zadeh and J. Kacprzyk (Eds.), John Wiley & Sons, New York.
26. Zadeh, Lotfi A. (1999). From Computing with Numbers to Computing with Words – From Manipulation of Measurements to Manipulation of Perceptions, IEEE Transactions on Circuits and Systems 45, 105–119.
27. Peirce, Charles S. (1908). CP 6.475, CP 6.458, CP 6.461 [CP refers to Collected Papers of Charles Sanders Peirce, C. Hartshorne, P. Weiss and A. W. Burks, eds., Harvard University Press, Cambridge, Mass., 1931–58; the first number refers to the volume, the number after the dot to the paragraph, and the last number to the year of the text].
28. Greene, Brian (2005). One Hundred Years of Uncertainty, The New York Times, Op-Ed, Late Edition – Final, Section A, Page 27.
Pioneers of Vagueness, Haziness, and Fuzziness in the 20th Century1 Rudolf Seising
Abstract This contribution deals with developments in the history of philosophy, logic, and mathematics during the time before and up to the beginning of fuzzy logic. Even though the term "fuzzy" was introduced by Lotfi A. Zadeh in 1964/65, it should be noted that older concepts of "vagueness" and "haziness" had previously been discussed in philosophy, logic, mathematics, applied sciences, and medicine. This paper delineates some specific paths through the history of the use of these "loose concepts". Vagueness was avidly discussed in the fields of logic and philosophy during the first decades of the 20th century – particularly in Vienna, at Cambridge and in Warsaw and Lvov. An interesting sequel to these developments can be seen in the work of the Polish physician and medical philosopher Ludwik Fleck. Haziness and fuzziness were concepts of interest in mathematics and engineering during the second half of the 1900s. The logico-philosophical history presented here covers the work of Bertrand Russell, Max Black, and others. The mathematical-technical history deals with the theories founded by Karl Menger and Lotfi Zadeh. Menger's concepts of probabilistic metrics, hazy sets (ensembles flous) and micro geometry as well as Zadeh's theory of fuzzy sets paved the way for the establishment of soft computing methods using vague concepts that connote the nonexistence of sharp boundaries. The intention of this chapter is to show that these developments (historical and philosophical) were preparatory work for the introduction of soft computing as an artificial intelligence approach in 20th century technology.
Rudolf Seising Medical University of Vienna, Core Unit for Medical Statistics and Informatics, Vienna, Austria e-mail:
[email protected] 1
This chapter is a modified and revised version of my conference contributions to the IFSA 2005 World Congress [1] and the BISC Special Event ’05 Forging New Frontiers. 40th of Fuzzy Pioneers (1965–2005) in Honour of Prof. Lotfi A. Zadeh [2]. It accompanies a series of articles on the history of the theory of fuzzy sets that will appear in the near future, each in a different journal. An expanded survey of the author’s historical reconstruction will appear in [3]. Four additional articles [4, 5, 6, 7] will highlight different aspects of this history. The idea underlying this project came from my friend Thomas Runkler, who suggested it during the FUZZ-IEEE 2004 conference in Budapest. Subsequently, the journal’s chief editors encouraged me to realize the project.
1 Introduction Exact concepts, i.e. concepts with strict boundaries, bivalent logic that enables us to decide yes or no, mathematical formulations to represent sharp values of quantities, measurements and other terms are tools of logic and mathematics that have given modern science its exactness since Galileo (1564–1642) and Descartes (1596–1650) in the 17th century. It has been possible to formulate axioms, definitions, theorems, and proofs in the language of mathematics. Moreover, the ascendancy of modern science achieved through the works of Newton (1643–1727), Leibniz (1646–1716), Laplace (1749–1827) and many others made it seem that science was able to represent all the facts and processes that people observe in the world. But scientists are human beings, who also use natural languages that clearly lack the precision of logic and mathematics. Human language – and possibly also human understanding – are not strictly defined systems. On the contrary, they include a great deal of uncertainty, ambiguities, etc. They are, in fact, vague. "Vagueness" is also part of the vocabulary of modern science. In his Essay Concerning Human Understanding (1689), John Locke (1632–1704) complained about the "vague and insignificant forms of speech" and in the French translation (1700), the French word "vague" is used for the English word "loose". Nevertheless, "vague" did not become a technical term in philosophy and logic during the 18th and 19th centuries. In the 20th century, however, philosophers like Gottlob Frege (1848–1925) (Fig. 1, left side), Bertrand Russell (1872–1970) (Fig. 3, left side), Max Black (1909–1988) (Fig. 3, middle), and others focused attention on and analyzed the problem of "vagueness" in modern science. A separate and isolated development took place at the Lvov-Warsaw School of logicians in the work of Kazimierz Twardowski (1866–1938) (Fig. 1, middle), Tadeusz Kotarbiński (1886–1981), Kazimierz Ajdukiewicz (1890–1963) (Fig. 1, right side), Jan Łukasiewicz (1878–1956), Stanisław Leśniewski (1886–1939), Alfred Tarski (1901–1983), and many other Polish mathematicians and logicians. Their important contributions to modern logic were recognized when Tarski gave a lecture to the Vienna Circle in September 1929 – following an invitation extended after the Viennese mathematician Karl Menger (1902–1985) had got to know the Lvov-Warsaw scholars during his travels to Warsaw the previous summer. It turned out that these thinkers had been influenced by Frege's studies. This was especially true of Kotarbiński, who argued that a concept for a property is vague (Polish: chwiejne) if the property may be the case by grades [8], and Ajdukiewicz, who stated the definition that "a term is vague if and only if its use in a decidable context ... will make the context undecidable in virtue of those [language] rules" [9]. The Polish characterization of "vagueness" was therefore the existence of fluid boundaries [8, 9]. In the first third of the 20th century there were several groups of European scientists and philosophers who concerned themselves with the interrelationships between logic, science, and the real world, e.g. in Berlin, Cambridge, Warsaw, and the so-called Vienna Circle. The scholars in Vienna regularly debated these issues over a period of years until the annexation of Austria by Nazi Germany in 1938 marked the end of the group. One member of the Vienna Circle was Karl Menger, who later became a professor of mathematics in the USA. As a young man in Vienna,
Fig. 1 Gottlob Frege, Tadeusz Kotarbiński, Kazimierz Ajdukiewicz
Menger raised a number of important questions that culminated in the so-called principle of logical tolerance. In addition, in his work after 1940 on the probabilistic or statistical generalization of metric space, he introduced the new concepts “hazy sets” (ensembles flous), t-norms and t-conorms, which are also used today in the mathematical treatment of problems of vagueness in the theory of fuzzy sets. This new mathematical theory to deal with vagueness was established in the mid 1960s by Lotfi A Zadeh (born 1921) (Fig. 6, right side), who was then a professor of electrical engineering at Berkeley. In 1962 he described the basic necessity of a new scientific tool to handle very large and complex systems in the real world: “we need a radically different kind of mathematics, the mathematics of fuzzy or cloudy quantities which are not describable in terms of probability distributions. Indeed, the need for such mathematics is becoming increasingly apparent even in the realm of inanimate systems, for in most practical cases the a priori data as well as the criteria by which the performance of a man-made system are judged are far from being precisely specified or having accurately-known probability distributions” ([10], p. 857). In the two years following the publication of this paper, Zadeh developed the theory of fuzzy sets [11, 12, 13], and it has been possible to reconstruct the history of this process [1, 2, 3, 14, 15, 16, 17, 18, 19]. Very little is known about the connectivity between the philosophical work on vagueness and the mathematical theories of hazy sets and fuzzy sets. In this contribution the author shows that there is common ground in the scientific developments that have taken place in these different disciplines, namely, the attempt to find a way to develop scientific methods that correspond to human perception and knowledge, which is not the case with the exactness of modern science.
2 Vagueness – Loose Concepts and Borderless Phenomena 2.1 Logical Analysis of Vagueness In his revolutionary book Begriffsschrift (1879) the German philosopher and mathematician Gottlob Frege confronted the problem of vagueness when formalizing
the mathematical principle of complete induction: he saw that some predicates are not inductive, viz. they have been defined for all natural numbers, but they result in false conclusions, e.g. the predicate "heap" cannot be evaluated for all natural numbers [20]. Ten years later, when Frege revised the basics of his Begriffsschrift for a lecture to the Society of Medicine and Science in Jena at the beginning of 1891, he reinterpreted concepts as functions and consequently applied this functional view of concepts throughout. He stated: If "x + 1" is meaningless for all arguments x, then the function x + 1 = 10 has no value and no truth value either. Thus, the concept "that which when increased by 1 yields 10" would have no sharp boundaries. Accordingly, for functions the demand for sharp boundaries entails that they must have a value for every argument [21]. This is a mathematical verbalization of what is called the classical sorites paradox, which can be traced back to the old Greek word σωρός (sorós, "heap") used by Eubulides of Miletus (4th century BC). In his Grundgesetze der Arithmetik (Basic Laws of Arithmetic), which appeared in the years 1893–1903, Frege called for concepts with sharp boundaries, because otherwise we could break logical rules and, moreover, the conclusions we draw could be false [22]. Frege's specification of vagueness as a particular phenomenon influenced other scholars, notably his British contemporary and counterpart, the philosopher and mathematician Bertrand Russell, who published the first logico-philosophical article on "Vagueness" in 1923 [23]. Russell quoted the sorites – in fact, he did not use mathematical language in this article, but, for example, discussed colours and "bald men" (Greek: falakros; English: fallacy, false conclusion): "Let us consider the various ways in which common words are vague, and let us begin with such a word as 'red'. It is perfectly obvious, since colours form a continuum, that there are shades of colour concerning which we shall be in doubt whether to call them red or not, not because we are ignorant of the meaning of the word 'red', but because it is a word the extent of whose application is essentially doubtful. This, of course, is the answer to the old puzzle about the man who went bald. It is supposed that at first he was not bald, that he lost his hairs one by one, and that in the end he was bald; therefore, it is argued, there must have been one hair the loss of which converted him into a bald man. This, of course, is absurd. Baldness is a vague conception; some men are certainly bald, some are certainly not bald, while between them there are men of whom it is not true to say they must either be bald or not bald." ([23], p. 85). Russell showed in this article from 1923 that concepts are vague even though there have been and continue to be many attempts to define them precisely: "The metre, for example, is defined as the distance between two marks on a certain rod in Paris, when that rod is at a certain temperature. Now, the marks are not points, but patches of a finite size, so that the distance between them is not a precise conception. Moreover, temperature cannot be measured with more than a certain degree of accuracy, and the temperature of a rod is never quite uniform. For all these reasons the conception of a metre is lacking in precision." ([23], p.
86) Russell also argued that a proper name – and here we can take as an example the name “Lotfi Zadeh” – cannot be considered to be an unambiguous symbol even if we believe that there is only one person with this name. Lotfi Zadeh “was born, and being born is a gradual process. It would seem natural to suppose that the name was not attributable before birth; if so, there was doubt, while birth was taking place,
whether the name was attributable or not. If it be said that the name was attributable before birth, the ambiguity is even more obvious, since no one can decide how long before birth the name became attributable." ([23], p. 86) Russell reasoned "that all words are attributable without doubt over a certain area, but become questionable within a penumbra, outside which they are again certainly not attributable." ([23], p. 86f) Then he generalized that words of pure logic also have no precise meanings, e.g. in classical logic the composed proposition "p or q" is false only when p and q are false and true elsewhere. He went on to claim that the truth values "'true' and 'false' can only have a precise meaning when the symbols employed – words, perceptions, images ... – are themselves precise". As we have seen above, this is not possible in practice, so he concludes "that every proposition that can be framed in practice has a certain degree of vagueness; that is to say, there is not one definite fact necessary and sufficient for its truth, but a certain region of possible facts, any one of which would make it true. And this region is itself ill-defined: we cannot assign to it a definite boundary." Russell emphasized that there is a difference between what we can imagine in theory and what we can observe with our senses in reality: "All traditional logic habitually assumes that precise symbols are being employed. It is therefore not applicable to this terrestrial life, but only to an imagined celestial existence." ([23], p. 88f). He proposed the following definition of accurate representations: "One system of terms related in various ways is an accurate representation of another system of terms related in various other ways if there is a one-one relation of the terms of the one to the terms of the other, and likewise a one-one relation of the relations of the one to the relations of the other, such that, when two or more terms in the one system have a relation belonging to that system, the corresponding terms of the other system have the corresponding relation belonging to the other system." And in contrast to this, he stated that "a representation is vague when the relation of the representing system to the represented system is not one-one, but one-many." ([23], p. 89) He concluded that "Vagueness, clearly, is a matter of degree, depending upon the extent of the possible differences between different systems represented by the same representation. Accuracy, on the contrary, is an ideal limit." ([23], p. 90). The Cambridge philosopher and mathematician Max Black responded to Russell's article in "Vagueness. An exercise in logical analysis", published in 1937 [24]. Black was born in Baku, the capital of the former Soviet Republic of Azerbaijan.2 Because of the anti-Semitism in their homeland, his family emigrated to Europe, first to Paris and then to London, where Max Black grew up after 1912. He studied at the University of Cambridge and took his B.A. in 1930. He went to Göttingen in Germany for the academic year 1930/31 and afterwards continued his studies in London, where he received a PhD in 1939. His doctoral dissertation was entitled "Theories of logical positivism" [..]. After 1940 he taught at the Department of Philosophy at the University of Illinois at Urbana and in 1946 he became a professor at Cornell University in Ithaca, New York.
2
It is a curious coincidence that Max Black and Lotfi Zadeh, the founder of the theory of fuzzy sets, were both in the same city and both became citizens of the USA in the 1940s.
Influenced by Russell and Wittgenstein (and the other famous analytical philosophers at Cambridge, Frank P. Ramsey (1903–1930) and George E. Moore (1873–1958)), in 1937 Black continued Russell's approach to the concept of vagueness, and he differentiated vagueness from ambiguity, generality, and indeterminacy. He emphasized "that the most highly developed and useful scientific theories are ostensibly expressed in terms of objects never encountered in experience. The line traced by a draughtsman, no matter how accurate, is seen beneath the microscope as a kind of corrugated trench, far removed from the ideal line of pure geometry. And the 'point-planet' of astronomy, the 'perfect gas' of thermodynamics, and the 'pure species' of genetics are equally remote from exact realization." ([24], p. 427) Black proposed a new method to symbolize vagueness: "a quantitative differentiation, admitting of degrees, and correlated with the indeterminacy in the divisions made by a group of observers." ([24], p. 441) He assumed that the vagueness of a word involves variations in its application by different users of a language and that these variations fulfill systematic and statistical rules when one symbol has to be discriminated from another. He defined this discrimination of a symbol x with respect to a language L by DxL (= Dx¬L) (Fig. 2). Most speakers of a language and the same observer in most situations will determine that either L or ¬L is used. In both cases, among competent observers there is a certain unanimity, a preponderance of correct decisions. For all DxL with the same x but not necessarily the same observer, m is the number of L uses and n the number of ¬L uses. On this basis, Black stated the following definition: "We define the consistency of application of L to x as the limit to which the ratio m/n tends when the number of DxL and the number of observers increase indefinitely. [...] Since the consistency of the application, C, is clearly a function of both L and x, it can be written in the form C(L, x)." ([24], p. 442)
Fig. 2 Consistency of application of a typical vague symbol ([24], p. 443)
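Black's consistency of application lends itself to a direct computation. The following sketch is not from Black's paper; the symbol "tall", the heights, and the verdict counts are invented for illustration. It estimates C(L, x) from a finite sample of observer decisions as the ratio m/n of L-uses to not-L-uses.

from collections import Counter

def consistency(decisions):
    """Estimate Black's consistency of application C(L, x).

    `decisions` lists observer verdicts for a single object x:
    True means the observer applied the symbol L to x, False means
    the observer applied not-L.  Black defines C(L, x) as the limit
    of the ratio m/n of L-uses to not-L-uses as the number of
    decisions grows; here we simply compute the finite-sample ratio.
    """
    counts = Counter(decisions)
    m, n = counts[True], counts[False]
    return float("inf") if n == 0 else m / n

# Hypothetical verdicts on whether three heights count as "tall".
samples = {
    "1.60 m": [False] * 95 + [True] * 5,   # rarely called tall
    "1.78 m": [True] * 55 + [False] * 45,  # borderline case
    "1.95 m": [True] * 99 + [False] * 1,   # almost always called tall
}
for x, verdicts in samples.items():
    print(x, "-> C(tall, x) =", round(consistency(verdicts), 2))

For clear-cut cases the ratio tends toward zero or toward very large values, while borderline cases hover near 1, which is exactly the shape of the curve Black plots in Fig. 2.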
Fig. 3 Bertrand Russell, Max Black, Ludwik Fleck
More than a quarter century later and two years before Zadeh introduced fuzzy sets into science, Black published “Reasoning with loose concepts”. In this article he labelled concepts without precise boundaries as “loose concepts” rather than “vague” ones, in order to avoid misleading and pejorative implications [25]. Once again he expressly rejected Russell’s assertion that traditional logic is “not applicable” as a method of conclusion for vague concepts: “Now, if all empirical concepts are loose, as I think they are, the policy becomes one of abstention from any reasoning from empirical premises. If this is a cure, it is one that kills the patient. If it is always wrong to reason with loose concepts, it will, of course, be wrong to derive any conclusion, paradoxical or not, from premises in which such concepts are used. A policy of prohibiting reasoning with loose concepts would destroy ordinary language – and, for that matter, any improvement upon ordinary language that we can imagine.” ([25], p. 7)
2.2 Phenomena of Diseases Do Not Have Strict Boundaries Reasoning with loose concepts is standard practice in medical thinking. In 1926 the young Polish physician and philosopher Ludwik Fleck (1896–1961) (Fig. 3, right side) expressed this fact in a striking manner: “It is in medicine that one encounters a unique case: the worse the physician, the ‘more logical’ his therapy.” Fleck said this in a lecture to the Society of Lovers of the History of Medicine at Lvov entitled Some specific features of the medical way of thinking, which was published in 1927 (in Polish) [26]. Even though he was close to the Polish school of logic, he opposed the view that medical diagnoses are the result of strong logical reasoning. He thought that elements of medical knowledge, symptoms and diseases are essentially indeterminate and that physicians rely on their intuition rather than on logical consequences to deduce a disease from a patient’s symptoms.
Ludwik Fleck was born in Lvov (Poland), where he received his medical degree at the Jan Kazimierz University. When Nazi Germany attacked the Soviet Union in 1941 and German forces occupied Lvov, Fleck was deported to the city's Jewish ghetto. In 1943 he was sent to the Auschwitz concentration camp. From the end of 1943 to April 1945, he was detained in Buchenwald, where he worked in a laboratory set up by the SS for the production and study of production methods for typhus serum. After the Second World War, Fleck served as the head of the Institute of Microbiology of the School of Medicine in Lublin and then as the Director of the Department of Microbiology and Immunology at a state institute in Warsaw. After 1956 he worked at the Israel Institute for Biological Research in Ness-Ziona. He died at the age of 64 of a heart attack. Fleck was both a medical scientist and a philosopher. In 1935 he wrote Genesis and Development of a Scientific Fact (in German) [27], but unfortunately this very important work was unknown in most parts of the world until it was translated into English in the late 1970s [28]. In it Fleck anticipated many of Thomas Kuhn's ideas on the sociological and epistemological aspects of scientific development that Kuhn published in his very influential book The Structure of Scientific Revolutions [29]. In Fleck's philosophy of science, sciences grow like living organisms and the development of a scientific fact depends on particular "thought-styles". Fleck denied the existence of any absolute and objective criteria of knowledge; on the contrary, he was of the opinion that different views can be true. He suggested that truth in science is a function of a particular "thinking style" by the "thought-collective", that is, a group of scientists or people "exchanging ideas or maintaining intellectual interaction". Fleck adopted this "notion of the relativity of truth" from his research on philosophy of medicine, where he pointed out that diseases are constructed by physicians and do not exist as objective entities. Some specific features of the medical way of thinking begins with the following sentence: "Medical science, whose range is as vast as its history is old, has led to the formation of a specific style in the grasping of its problems and of a specific way of treating medical phenomena, i.e. to a specific type of thinking." A few lines later, he exemplified this assumption: "Even the very subject of medical cognition differs in principle from that of scientific cognition. A scientist looks for typical, normal phenomena, while a medical man studies precisely the atypical, abnormal, morbid phenomena. And it is evident that he finds on this road a great wealth and range of individuality of these phenomena which form a great number without distinctly delimited units, and abounding in transitional, boundary states. There exists no strict boundary between what is healthy and what is diseased, and one never finds exactly the same clinical picture again. But this extremely rich wealth of forever different variants is to be surmounted mentally, for such is the cognitive task of medicine. How does one find a law for irregular phenomena? – This is the fundamental problem of medical thinking. In what way should they be grasped and what relations should be adopted between them in order to obtain a rational understanding?" ([30], p. 39)
Fleck emphasized two points: The first of these was the impact of the knowledge explosion in medical science. Accelerated progress in medical research had led to an enormous number of highly visible disease phenomena. Fleck argued that medical research has “to find in this primordial chaos, some laws, relationships, some types of higher order”. He appreciated the vital role played by statistics in medicine, but he raised the objections that numerous observations “eliminate the individual character of the morbid element” and “the statistical observation itself does not create the fundamental concept of our knowledge, which is the concept of the clinical unit.” ([30], p. 39.) Therefore “abnormal morbid phenomena are grouped round certain types, producing laws of higher order, because they are more beautiful and more general than the normal phenomena which suddenly become profoundly intelligible. These types, these ideal, fictitious pictures, known as morbid units, round which both the individual and the variable morbid phenomena are grouped, without, however, ever corresponding completely to them – are produced by the medical way of thinking, on the one hand by specific, far-reaching abstraction, by rejection of some observed data, and on the other hand, by the specific construction of hypotheses, i.e. by guessing of non observed relations.” ([30], p. 40) The second point that concerned Fleck was the absence of sharp borders between these phenomena: “In practice one cannot do without such definitions as ‘chill’, ‘rheumatic’ or ‘neuralgic’ pain, which have nothing in common with this bookish rheumatism or neuralgia. There exist various morbid states and syndromes of subjective symptoms that up to now have failed to find a place and are likely not to find it at any time. This divergence between theory and practice is still more evident in therapy, and even more so in attempts to explain the action of drugs, where it leads to a peculiar pseudo-logic.” ([30], p. 42) Clearly, it is very difficult to define sharp borders between various symptoms in the set of all symptoms and between various diseases in the set of diseases, respectively. On the contrary, we can observe smooth transitions from one entity to another and perhaps a very small variation might be the reason why a medical doctor diagnoses a patient with disease x instead of disease y. Therefore Fleck stated that physicians use a specific style of thinking when they deliberate on attendant symptoms and the diseases patients suffer from. Of course, Fleck could not have known anything about the methods of fuzzy set theory, but he was a philosopher of vagueness in medical science. He contemplated a “space of phenomena of disease” and realized that there are no boundaries either in a continuum of phenomena of diseases or between what is diseased and what is healthy. Some 40 years later Lotfi Zadeh proposed to handle similar problems with his new theory of fuzzy sets when he lectured to the audience of the International Symposium on Biocybernetics of the Central Nervous System: “Specifically, from the point of view of fuzzy set theory, a human disease, e.g., diabetes may be regarded as a fuzzy set in the following sense. Let X = {x} denote the collection of human beings. Then diabetes is a fuzzy set, say D, in X, characterized by a membership function μ D (x) which associates with each human being x his grade of membership in the fuzzy set of diabetes” ([31], p. 205). 
This “fuzzy view” on vagueness in medicine was written without any knowledge of Fleck’s work, but it offers an interesting view
of the history of socio-philosophical and system theoretical concepts of medicine. For more details the reader is referred to [32].
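Read as a program, the quoted passage amounts to defining a membership function. The following is a minimal sketch, assuming (purely hypothetically) that the grade of membership in the fuzzy set D = "diabetes" is read off a single fasting-glucose value with a piecewise-linear ramp; the cut-off values are illustrative only and are not clinical guidance or part of Zadeh's text.

def mu_diabetes(fasting_glucose_mg_dl: float) -> float:
    """Hypothetical membership grade in the fuzzy set D = 'diabetes'.

    A piecewise-linear ramp: values below 100 mg/dL map to grade 0,
    values above 126 mg/dL map to grade 1, with a smooth transition
    in between.  There is no strict boundary between healthy and
    diseased, only a gradual change of grade.
    """
    lo, hi = 100.0, 126.0
    if fasting_glucose_mg_dl <= lo:
        return 0.0
    if fasting_glucose_mg_dl >= hi:
        return 1.0
    return (fasting_glucose_mg_dl - lo) / (hi - lo)

for g in (90, 105, 115, 126, 150):
    print(f"glucose {g} mg/dL -> mu_D = {mu_diabetes(g):.2f}")

The borderline patients receive intermediate grades rather than being forced into a yes/no diagnosis, which is precisely the absence of sharp boundaries that Fleck described.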
3 Haziness – A Micro Geometrical Approach Karl Menger (Fig. 5, right side), a mathematician in Vienna and later in the USA, was one of the first to begin laying the groundwork for human-friendly science, i.e. for the development of scientific methods to deal with loose concepts. In his work, he never abandoned the framework of classical mathematics, but used probabilistic concepts and methods. Karl Menger was born in Vienna, Austria, and he entered the University of Vienna in 1920. In 1924, after he had received his PhD, he moved to Amsterdam because of his interests in topology. There, he became the assistant of the famous mathematician and topologist Luitzen E. J. Brouwer (1881–1966). In 1927 Menger returned to Vienna to become professor of geometry at the University and he became a member of the Vienna Circle. In 1937 – one year before the annexation of Austria by Nazi Germany – he immigrated to the USA, where he was appointed to a professorship at the University of Notre Dame. In 1948 Menger moved to the Illinois Institute of Technology, and he remained in Chicago for the rest of his life.
3.1 Logical Tolerance When Menger came back to Vienna from Amsterdam as a professor of geometry, he was invited to give a lecture on Brouwer's intuitionism to the Vienna Circle. In this talk he rejected the view held by almost all members of this group that there is one unique logic. He claimed that we are free to choose axioms and rules in mathematics, and thus we are free to consider different systems of logic. He realized the philosophical consequences of this assumption, which he shared with his student Kurt Gödel (1906–1978): "the plurality of logics and language entailing some kind of logical conventionalism" ([33], p. 88). Later in the 1930s, when the Vienna Circle became acquainted with different systems of logic – e.g. the three-valued and multi-valued logics founded by Jan Łukasiewicz (1878–1956), which were also discussed by Tarski in his Vienna lecture, and Brouwer's intuitionistic logic – Rudolf Carnap (1891–1970) also defended this tolerant view of logic. In his lecture The New Logic: A 1932 Lecture, Menger wrote: "What interests the mathematician and all that he does is to derive propositions by methods which can be chosen in various ways but must be listed. And to my mind all that mathematics and logic can say about this activity of the mathematician (which neither needs justification nor can be justified) lies in this simple statement of fact." ([34], p. 35.) This "logical tolerance" proposed by Menger later became well-known through Carnap's famous book Logische Syntax der Sprache (Logical Syntax of Language) in 1934 [35]. It is a basic principle for the contemplation of "deviant logics", that is, systems of logic that differ from the usual bivalent logic, one of which is the logic of fuzzy sets.
3.2 Statistical Metrics, Probabilistic Geometry, the Continuum and Poincaré's Paradox In Vienna in the 1920s and 1930s, Menger evolved into a specialist in topology and geometry, particularly with regard to the theories of curves, dimensions, and general metrics. After he emigrated to the USA, he took up these subjects again. In 1942, with the intention of generalizing the theory of metric spaces more in the direction of probabilistic concepts, he introduced the term "statistical metric": A statistical metric is "a set S such that with each two elements ('points') p and q of S, a probability function Π(x; p, q) is associated satisfying the following conditions:
1. Π(0; p, p) = 1.
2. If p ≠ q, then Π(0; p, q) < 1.
3. Π(x; p, q) = Π(x; q, p).
4. T[Π(x; p, q), Π(y; q, r)] ≤ Π(x + y; p, r),
where T[α, β] is a function defined for 0 ≤ α ≤ 1 and 0 ≤ β ≤ 1 such that
(a) 0 ≤ T(α, β) ≤ 1.
(b) T is non-decreasing in either variable.
(c) T(α, β) = T(β, α).
(d) T(1, 1) = 1.
(e) If α > 0, then T(α, 1) > 0." ([36], p. 535f)
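The conditions on T can be spot-checked numerically for any candidate function. The following is a minimal sketch, assuming three functions that later became standard t-norms (minimum, product, and the Łukasiewicz norm); none of them appear in Menger's note, and the finite-grid test is an illustration, not a proof.

import itertools

def t_min(a, b): return min(a, b)
def t_product(a, b): return a * b
def t_lukasiewicz(a, b): return max(0.0, a + b - 1.0)

def satisfies_menger_conditions(T, steps=11):
    """Spot-check conditions (a)-(e) on a finite grid in [0, 1]."""
    grid = [i / (steps - 1) for i in range(steps)]
    for a, b in itertools.product(grid, repeat=2):
        v = T(a, b)
        if not (0.0 <= v <= 1.0):                       # (a) range
            return False
        if abs(v - T(b, a)) > 1e-12:                    # (c) symmetry
            return False
        for a2, b2 in itertools.product(grid, repeat=2):
            if a2 >= a and b2 >= b and T(a2, b2) < v - 1e-12:
                return False                            # (b) monotonicity
    if abs(T(1.0, 1.0) - 1.0) > 1e-12:                  # (d) T(1, 1) = 1
        return False
    if any(T(a, 1.0) <= 0.0 for a in grid if a > 0):    # (e) positivity
        return False
    return True

for name, T in [("min", t_min), ("product", t_product),
                ("Lukasiewicz", t_lukasiewicz)]:
    print(name, satisfies_menger_conditions(T))

All three candidates pass the grid test, which is consistent with their later role as the canonical examples of triangular norms in fuzzy set theory.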
Menger called Π(x; p, q) the distance function of p and q, which bears the meaning of the probability that the points p and q have a distance ≤ x. Condition 4, the "triangular inequality" of the statistical metric S, implies the following inequality for all points q and all numbers x between 0 and z: Π(z; p, r) ≥ Max T[Π(x; p, q), Π(z − x; q, r)]. In this paper Menger used the name triangular norm (t-norm) to indicate the function T for the first time. Almost 10 years later, in February 1951, Menger wrote two more notes to the National Academy of Sciences. In the first note, "Probabilistic Geometry" [37], among other things he introduced a new notation ab for the non-decreasing cumulative distribution function associated with every ordered pair (a, b) of elements of a set S, and he presented the following definition: "The value ab(x) may be interpreted as the probability that the distance from a to b be < x." ([37], p. 226.) Much more interesting is the following text passage: "We call a and a′ certainly-indistinguishable if aa′(x) = 1 for each x > 0. Uniting all elements which are certainly indistinguishable from each other into identity sets, we decompose the space into disjoint sets A, B, . . . We may define AB(x) = ab(x) for any a belonging to A and b belonging to B. (The number is independent of the choice of a and b.) The
identity sets form a perfect analog of an ordinary metric space since they satisfy the condition: If A ≠ B, then there exists a positive x with AB(x) < 1."3 In the paper "Probabilistic Theories of Relations" of the same year [38], Menger addressed the difference between the mathematical and the physical continuum. Regarding A, B, and C as elements of a continuum, he referred to the French mathematician and physicist Henri Poincaré's (1854–1912) (Fig. 5, middle) claim "that only in the mathematical continuum do the equalities A = B and B = C imply the equality A = C. In the observable physical continuum, 'equal' means 'indistinguishable', and A = B and B = C by no means imply A = C. The raw result of experience may be expressed by the relation A = B, B = C, A < C, which may be regarded as the formula for the physical continuum. According to Poincaré, physical equality is a non-transitive relation." ([38], p. 178.) Menger suggested a realistic description of the equality of elements in the physical continuum by associating with each pair (A, B) of these elements the probability that A and B will be found to be indistinguishable. He argued: "For it is only very likely that A and B are equal, and very likely that B and C are equal – why should it not be less likely that A and C are equal? In fact, why should the equality of A and C not be less likely than the inequality of A and C?" ([38], p. 178.) To solve "Poincaré's paradox" Menger used his concept of probabilistic relations and geometry. For the probability E(a, b) that a and b would be equal he postulated:
E(a, a) = 1 for every a;
E(a, b) = E(b, a) for every a and b;
E(a, b) · E(b, c) ≤ E(a, c) for every a, b, c.
If E(a, b) = 1, then he called a and b certainly equal. (In this case we obtain the ordinary equality relation.) "All the elements which are certainly equal to a may be united to an 'equality set', A. Any two such sets are disjoint unless they are identical." ([38], p. 179.) In 1951, as a visiting lecturer at the Sorbonne University, Menger presented similar ideas in the May session of the French Académie des sciences in his "Ensembles flous et fonctions aléatoires". He proposed to replace the ordinary element relation "∈" between each object x in the universe of discourse U and a set F by the probability ΠF(x) of x belonging to F. In contrast to ordinary sets, he called these entities "ensembles flous": "Une relation monaire au sens classique est un sous-ensemble F de l'univers. Au sens probabiliste, c'est une fonction ΠF définie pour tout x ∈ U. Nous appellerons cette fonction même un ensemble flou et nous interpréterons ΠF(x) comme la probabilité que x appartienne à cet ensemble." ([39], p. 2002) (In English: "A unary relation in the classical sense is a subset F of the universe. In the probabilistic sense, it is a function ΠF defined for every x ∈ U. We shall call this function itself an ensemble flou and interpret ΠF(x) as the probability that x belongs to this set.") Later, he also used the English expression "hazy sets" [40].
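A small numerical sketch shows how Menger's postulates accommodate Poincaré's paradox; the exponential model of the indistinguishability probability E used below is invented for illustration and is not taken from [38].

import math

def E(a, b, resolution=1.0):
    """Hypothetical probability that two magnitudes are found indistinguishable.

    The exponential fall-off exp(-|a - b| / resolution) is chosen purely
    for illustration; by the triangle inequality it automatically
    satisfies the composition postulate E(a, b) * E(b, c) <= E(a, c).
    """
    return math.exp(-abs(a - b) / resolution)

a, b, c = 0.0, 0.6, 1.2  # three nearby marks on a "physical continuum"

eab, ebc, eac = E(a, b), E(b, c), E(a, c)
print(f"E(a,b) = {eab:.3f}")  # ~0.549: a and b likely indistinguishable
print(f"E(b,c) = {ebc:.3f}")  # ~0.549: b and c likely indistinguishable
print(f"E(a,c) = {eac:.3f}")  # ~0.301: a and c noticeably less so

assert math.isclose(E(a, a), 1.0)        # E(a, a) = 1
assert math.isclose(E(a, b), E(b, a))    # symmetry
assert eab * ebc <= eac + 1e-12          # composition postulate

The intermediate outcome is exactly what Poincaré described: a is very likely equal to b and b to c, yet a is markedly less likely to be equal to c, without any contradiction in the postulates.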
3
In his original paper Menger wrote “>”. I would like to thank E. P. Klement for this correction.
3.3 Hazy Lumps and Micro Geometry Almost 15 years later, in 1966, Menger read a paper on "Positivistic Geometry" at the Mach symposium that was held by the American Association for the Advancement of Science to mark the 50th anniversary of the death of the Viennese physicist and philosopher Ernst Mach (1838–1916) (Fig. 5, left side). The very important role of Mach's philosophical thinking and his great influence on Vienna Circle philosophy are obvious from the initial name of this group of scholars in the first third of the 20th century – "Verein (Society) Ernst Mach". In his contribution to the Mach symposium in the USA, Menger began with the following passage from Mach's chapter on the continuum in his book Die Principien der Wärmelehre (Principles of the Theory of Heat), published in 1896: "All that appears to be a continuum might very well consist of discrete elements, provided they were sufficiently small compared with our smallest practically applicable units or sufficiently numerous." [41] Then he again described Poincaré's publications connected with the notion of the physical continuum and he summarized his own work on statistical metrics, probabilistic distances, indistinguishableness of elements, and ensembles flous. He believed that it would be important in geometry to combine these concepts with the concept "of lumps, which can be more easily identified and distinguished than points. Moreover, lumps admit an intermediate stage between indistinguishability and apartness, namely that of overlapping. It is, of course, irrelevant whether the primitive (i.e. undefined) concepts of a theory are referred to as points and probably indistinguishable, or as lumps and probably overlapping. All that matters are the assumptions made about the primitive concepts. But the assumptions made in the two cases would not be altogether identical and I believe that the ultimate solution of problems of micro geometry may well lie in a probabilistic theory of hazy lumps. The essential feature of this theory would be that lumps would not be point sets; nor would they reflect circumscribed figures such as ellipsoids. They would rather be in mutual probabilistic relations of overlapping and apartness, from which a metric would have to be developed." [40] Menger reminded his readers of "another example of distances defined by chains" that is a more detailed representation of what Russell had mentioned more than 40 years before: In the 1920s, the well-known Viennese physicist Erwin Schrödinger (1887–1961) – who was later one of the founders of quantum mechanics – established a theory of a metric of colors, called Colorimetry (Farbenmetrik) (Fig. 4): "Schrödinger sets the distance between two colors C and C′ equal to 1 if C and C′ are barely distinguishable; and equal to the integer n if there exists a sequence of elements C0 = C, C1, C2, . . . , Cn = C′ such that any two consecutive elements Ci−1 and Ci are barely distinguishable while there does not exist any shorter chain of this kind. According to Schrödinger's assumption about the color cone, each element C (represented by a vector in a 3-dimensional space) is the center of an ellipsoid whose interior points cannot be distinguished from C while the points on the surface of the ellipsoid are barely distinguishable from C. But such well-defined neighborhoods do not seem to exist in the color cone any more than on the human skin. One must first define distance 1 in a probabilistic way." ([42], p. 232).
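Schrödinger's chain distance, as quoted above, is in effect a shortest-path computation: colors are nodes and an edge joins two colors exactly when they are barely distinguishable. The following is a minimal sketch with a made-up one-dimensional "hue" coordinate and a made-up just-noticeable difference; Schrödinger's actual construction works in the three-dimensional color cone.

from collections import deque

def chain_distance(colors, src, dst, just_noticeable=1.0):
    """Shortest chain of barely-distinguishable steps between two colors.

    Colors are points (1-D here for simplicity); two colors count as
    barely distinguishable when their separation is at most
    `just_noticeable`.  Breadth-first search returns the minimal chain
    length n, i.e. the chain distance.
    """
    def neighbours(c):
        return [d for d in colors
                if d != c and abs(d - c) <= just_noticeable]

    queue, seen = deque([(src, 0)]), {src}
    while queue:
        c, dist = queue.popleft()
        if c == dst:
            return dist
        for nxt in neighbours(c):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two colors are not connected by any chain

palette = [0.0, 0.8, 1.5, 2.4, 3.0]      # hypothetical hue coordinates
print(chain_distance(palette, 0.0, 3.0))  # -> 4: 0.0-0.8-1.5-2.4-3.0

Menger's objection is that the threshold itself is not sharp in practice, which is why he wanted distance 1 defined probabilistically before such chains are built.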
Fig. 4 Schrödinger’s color cone ([42], p. 428)
Menger never envisaged a mathematical theory of loose concepts that differs from probability theory. He compared his “micro geometry” with the theory of fuzzy sets: “In a slightly different terminology, this idea was recently expressed by Bellman, Kalaba and Zadeh under the name fuzzy set. (These authors speak of the degree rather than the probability of an element belonging to a set.)” [40] Menger did not see that this “slight difference” between “degrees” (fuzziness) and “probabilities” is a difference not just in terminology but in the meaning of the concepts. Ludwik Fleck and Karl Menger were philosophers of vagueness in the first half of the 20th century, each one with a specific field of study. Fleck’s field of research was physicians’ specific style of thinking. He contemplated a “space of phenomena of disease” and realized that there are no clear boundaries either between these phenomena or between what is diseased and what is healthy. Fleck therefore called the medical way of thinking a “specific style” that is not in accordance with deductive logic. Today we would say that this way of thinking is essentially fuzzy. The mathematician Menger looked for a tool to deal with elements of the physical continuum, which is different from the mathematical continuum, because these elements can be indistinguishable but not identical. To that end, Menger generalized the theory of metric spaces, but he could not break away from probability theory and statistics. Thus, Menger created a theory of probabilistic distances of points or elements of the physical continuum. Both of these scholars concerned themselves with tools to deal with fuzziness before the theory of fuzzy sets came into being.
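The difference Menger glossed over can be made concrete with a toy contrast, drawn from neither author: a degree of membership is a single graded fact about one individual, whereas a probability of membership describes the long-run frequency of crisp yes/no verdicts. The value 0.6 and the example of "tall" below are invented for illustration.

import random

# Degree-of-membership reading (Zadeh): a person of height 178 cm simply
# *is* tall to degree 0.6 -- a single, non-random number.
tall_degree = 0.6

# Probability-of-membership reading (Menger's ensemble flou / hazy set):
# each time the question is posed, the person either is or is not counted
# as tall; 0.6 is the long-run frequency of "yes" verdicts.
random.seed(0)
verdicts = [random.random() < 0.6 for _ in range(10_000)]
tall_probability = sum(verdicts) / len(verdicts)

print("degree of membership  :", tall_degree)
print(f"estimated probability : {tall_probability:.3f}")
# Both numbers hover around 0.6, yet they answer different questions:
# "to what extent is this person tall?" versus
# "how often will this person be judged tall?"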
Fig. 5 Ernst Mach, Jules Henri Poincaré, Karl Menger
4 Fuzzy Sets – More Realistic Mathematics 4.1 Thinking Machines In the late 1940s and early 1950s, information theory, the mathematical theory of communication, and cybernetics – developed during the Second World War by Claude E. Shannon (1916–2001) (Fig. 6, second from left), Norbert Wiener (1894–1964) (Fig. 6, left side), Andrej N. Kolmogorov (1903–1987), Ronald A. Fisher (1890–1962) and many others – became well-known. When Shannon and Wiener went to New York to give lectures on their new theories at Columbia University in 1946, they introduced these new milestones in science and technology to the young doctoral student of electrical engineering Lotfi A. Zadeh. Also the new era of digital computers – which began in the 1940s with the Electronic Numerical Integrator and Computer (ENIAC) and the Electronic Discrete Variable Computer (EDVAC), both of which were designed by J. P. Eckert (1919–1995) and J. W. Mauchly (1907–1980), and continuing developments by scientists such as John von Neumann and others – gave huge stimuli to the field of engineering. In 1950 Zadeh moderated a discussion between Shannon, Edmund C. Berkeley (1909–1988, author of the book Giant Brains or Machines That Think [43] published in 1949), and the mathematician and IBM consultant Francis J. Murray (1911–1996). "Can machines think?" was Alan Turing's (Fig. 6, second from right) question in his famous Mind article "Computing Machinery and Intelligence" in the same year [44]. Turing proposed the imitation game, now called the "Turing Test", to decide whether a computer or a program could think like a human being or not. Inspired by Wiener's Cybernetics [45], Shannon's Mathematical Theory of Communication [46], and the new digital computers, Zadeh wrote the article "Thinking Machines – A New Field in Electrical Engineering" (Fig. 7) in the student journal The Columbia Engineering Quarterly in 1950 [47]. As a prelude, he quoted some of the headlines that had appeared in newspapers throughout the USA during 1949: "Psychologists Report Memory is Electrical", "Electric Brain Able to Translate Foreign Languages is Being Built", "Electronic Brain Does Research", "Scientists Confer on Electronic Brain," and he asked, "What is behind these headlines? How will "electronic brains" or "thinking machines"
Fig. 6 Norbert Wiener, Claude E. Shannon, Alan M. Turing, Lotfi A. Zadeh
Fig. 7 Headline of L. A. Zadeh’s 1950-paper [47]
affect our way of living? What is the role played by electrical engineers in the design of these devices?” ([47], p. 12.) Zadeh was interested in “the principles and organization of machines which behave like a human brain. Such machines are now variously referred to as ‘thinking machines’, ‘electronic brains’, ‘thinking robots’, and similar names.” In a footnote he added that the “same names are frequently ascribed to devices which are not ‘thinking machines’ in the sense used in this article” and specified that “The distinguishing characteristic of thinking machines is the ability to make logical decisions and to follow these, if necessary, by executive action.” ([47], p. 12) In the article, moreover, Zadeh gave the following definition: “More generally, it can be said that a thinking machine is a device which arrives at a certain decision or answer through the process of evaluation and selection.” On the basis of this definition, he decided that the MIT differential analyzer was not a thinking machine, but that both of the large-scale digital computers that had been built at that time, UNIVAC and BINAC, were thinking machines because they both were able to make non-trivial decisions. ([47], p. 13) Zadeh explained “how a thinking machine works” (Fig. 8) and stated that “the box labeled Decision Maker is the most important part of the thinking machine”. Lotfi Zadeh moved from Tehran, Iran, to the USA in 1944. Before that, he had received a BS degree in Electrical Engineering in Tehran in 1942. After a while, he continued his studies at MIT where he received an MS degree in 1946. He then moved to New York, where he joined the faculty of Columbia University as an instructor. In 1949 he wrote his doctoral dissertation on Frequency Analysis of Variable Networks [48], under the supervision of John Ralph Ragazzini, and in 1950 he became an assistant professor at Columbia. The following sections of this article
Fig. 8 Zadeh’s schematic diagram illustrating the arrangement of the basic elements of a thinking machine ([47], p. 13)
Fig. 9 Signal space representation: comparison of the distances between the received signal y and all possible transmitted signals ([49], p. 202)
deal with Zadeh’s contributions to establishing a new field in science that is now considered to be a part of research in artificial intelligence: soft computing, also known as computational intelligence. This development starts with Zadeh’s work in information and communication technology.
Information and Communication Science In March 1950 Zadeh gave a lecture on Some Basic Problems in Communication of Information at the New York Academy of Sciences [49], in which he represented signals as ordered pairs (x(t), y(t)) of points in a signal space Σ, which is embedded in a function space (Fig. 9). The first problem deals with the recovery process of transmitted signals: “Let X = {x(t)} be a set of signals. An arbitrarily selected member of this set, say x(t), is transmitted through a noisy channel Γ and is received as y(t). As a result of the noise and distortion introduced by Γ , the received signal y(t) is, in general, quite different from x(t). Nevertheless, under certain conditions it is possible to
Fig. 10 Geometrical representation of nonlinear (left) and linear filtering (right) ([49], p. 202)
recover x(t) – or rather a time-delayed replica of it – from the received signal y(t).” ([49], p. 201) In this paper, he did not examine the case where {x(t)} is an ensemble; he restricted his view to the problem of recovering x(t) from y(t) “irrespective of the statistical character of {x(t)}” ([49], p. 201). Corresponding to the relation y = Γ x between signals x(t) and y(t), he represented the recovery process of x(t) from y(t) by x = Γ −1 y, where Γ −1 is the inverse of Γ , if it exists, over {y(t)}. Zadeh represented signals as ordered pairs of points in a signal space Σ, which is embedded in a function space with a delta-function basis. To measure the disparity between x(t) and y(t), he attached a distance function d(x, y) with the usual properties of a metric. Then he considered the special case in which it is possible to achieve a perfect recovery of the transmitted signal x(t) from the received signal y(t). He supposed that “X = {x(t)} consist of a finite number of discrete signals x 1 (t), x 2 (t), . . . , x n (t), which play the roles of symbols or sequences of symbols. The replicas of all these signals are assumed to be available at the receiving end of the system. Suppose that a transmitted signal x k is received as y. To recover the transmitted signal from y, the receiver evaluates the ‘distance’ between y and all possible transmitted signals x 1 , x 2 , . . . , x n , with the use of a suitable distance function d(x, y), and then selects that signal which is ‘nearest’ to y in terms of this distance function (Fig. 9). In other words, the transmitted signal is taken to be the one that results in the smallest value of d(x, y). This in brief, is the basis of the reception process.” ([49], p. 201) In this process, the transmitted signal x k is always ‘nearer’ – in terms of the distance functional d(x, y) – to the received signal y(t) than any other possible signal x i is. Zadeh conceded that “in many practical situations it is inconvenient, or even impossible, to define a quantitative measure, such as a distance function, of the disparity between two signals. In such cases we may use instead the concept of neighborhood, which is basic to the theory of topological spaces.” ([49], p. 202) – About 15 years later, he proposed another ‘concept of neighborhood’ which is now basic to the theory of fuzzy systems. He also discussed the multiplex transmission of two or more signals: A system has two channels and the sets of signals assigned to their respective channels are X = {x(t)} and Y = {y(t)}. If we are given the sum signal u(t) = x(t) + y(t) at the receiving end, how can we extract x(t) and y(t) from u(t)? – We have to find two filters N1 and N2 such that, for any x in X and any y in Y , N1 (x + y) = x and N2 (x + y) = y. In the signal space representation, two manifolds Mx and M y in Σ correspond to the sets of signals, X and Y . Zadeh showed that the coordinates of the signal vector x(t) are computable in terms of the coordinates of the signal vector u(t) in the following form:
x ν = Hν (u 1 , u 2 , . . . , u n ), ν = 1, 2, . . . , n
Symbolically, he wrote x = H (u), where H (or equivalently, the components Hν ) provides the desired characterization of the ideal filter. Zadeh argued that all equations in all coordinates of the vectors x, y, and u “can be solved by the use of machine computation or other means” and he stated an analogy between filters and computers: “It is seen that, in general, a filter such as N1 may be regarded essentially as a computer which solves equations [. . . ] of x(t) in terms of those of u(t).” ([49], p. 204) Finally, in this lecture Zadeh mentioned the case of linearity of the equations for f i and g j . “In this case the manifolds Mx and M y are linear, and the operation performed by the ideal filter is essentially that of projecting the signal space Σ on Mx along M y ” ([49], p. 204). He illustrated both the nonlinear and the linear modes of ideal filtering in Fig. 10 in terms of two-dimensional signal space. This analogy between the projection in a function space and the filtration by a filter led Zadeh in the early 1950s to a functional symbolism of filters [50, 51]. Thus, N = N1 + N2 represents a filter consisting of two filters connected by addition, N = N1 N2 represents their tandem combination and N = N1 |N2 the separation process (Fig. 11). But in reality filters don’t do exactly what they are supposed to do in theory. Therefore, in later papers Zadeh started (e.g. in [52]) to differentiate between ideal and optimal filters. Ideal filters are defined as filters that achieve a perfect separation of signal and noise, but in practice such ideal filters do not exist. Zadeh regarded optimal filters to be those that give the “best approximation” of a signal and he noticed that “best approximations” depend on reasonable criteria. At that time he formulated these criteria in statistical terms [52]. In the late 1950s Zadeh realized that there are many problems in applied science (primarily those involving complex, large-scale systems, or complex or animated systems) in which we cannot compute exact solutions and therefore we have to be content with unsharp – fuzzy – solutions. To handle such problems, reasoning with loose concepts turned out to be very successful and in the 1960s Zadeh developed a mathematical theory to formalize this kind of reasoning, namely the theory of fuzzy sets [11, 12, 13]. During this time Zadeh was in constant touch with his close
Fig. 11 Functional symbolism of ideal filters ([51], p. 225)
friend and colleague Richard E. Bellman (1920–1984) (Fig. 12, left side), a young mathematician working at the RAND Corporation in Santa Monica. This friendship began in the late 1950s, when they met for the first time in New York while Zadeh was at Columbia University and ended at Bellman’s death. Even though they dealt with different, special problems in mathematical aspects of electrical engineering, system theory, and later computer science, they met each other very often and discussed many subjects involved in their scientific work. Thus, it is not surprising that the history of fuzzy set theory is closely connected with Richard Bellman.
Fuzzy Sets and Systems Zadeh came up with the idea of fuzzy sets in the summer of 1964. He and Bellman had planned to work together at RAND in Santa Monica. Before that, Zadeh had to give a talk on pattern recognition at a conference at the Wright Patterson Air Force Base in Dayton, Ohio. During that time he started thinking about the use of grades of membership for pattern classification and he conceived the first example of fuzzy mathematics, which he wrote in one of his first papers on the subject: “For example, suppose that we are concerned with devising a test for differentiating between the handwritten letters O and D. One approach to this problem would be to give a set of handwritten letters and indicate their grades of membership in the fuzzy sets O and D. On performing abstraction on these samples, one obtains the estimates μ˜ 0 and μ˜ D of μ0 and μ D , respectively. Then given a letter x which is not one of the given samples, one can calculate its grades of membership in O and D; and, if O and D have no overlap, classify x in O or D.” ([12], p. 30.) Over the next few days he extended this idea to a preliminary version of the theory of fuzzy sets, and when he got to Santa Monica, he discussed these ideas with Bellman. Then he wrote two manuscripts: “Abstraction and Pattern Classification” and “Fuzzy Sets”. The authors of the first manuscript were listed as Richard Bellman, his associate Robert Kalaba (1926–2004) (Fig. 12, middle), and Lotfi Zadeh (Fig. 12, right side). It was printed as RAND memorandum RM-4307-PR in October 1964 and was written by Lotfi Zadeh (Fig. 13). In it he defined fuzzy sets for the
Fig. 12 Richard E. Bellman, Robert E. Kalaba, and Lotfi A. Zadeh
first time in a scientific paper, establishing a general framework for the treatment of pattern recognition problems [52]. This was an internal RAND communication, but Zadeh also submitted the paper to the Journal of Mathematical Analysis and Applications, whose editor was none other than Bellman. It was accepted for publication, under the original title, in 1966 [13]. This was the article that Karl Menger cited in his contribution at the Mach symposium in 1966. The second manuscript that Zadeh wrote on fuzzy sets was called “Fuzzy Sets” [11]. He submitted it to the editors of the journal Information and Control in November 1964 (Fig. 15, left side). As Zadeh himself was a member of the editorial board of this journal, the reviewing process was quite short. “Fuzzy Sets” appeared in June 1965 as the first article on fuzzy sets in a scientific journal. As was usual in the Department of Electrical Engineering at Berkeley, this text was also preprinted as a report and this preprint appeared as ERL Report No. 64–44 of the Electronics Research Laboratory in November 1964 [53]. In this seminal article Zadeh introduced new mathematical entities as classes or sets that “are not classes or sets in the usual sense of these terms, since they do not dichotomize all objects into those that belong to the class and those that do not.” He introduced “the concept of a fuzzy set, that is a class in which there may be a
Fig. 13 Zadeh’s RAND memo [52]
Fig. 14 Zadeh’s Illustration of fuzzy sets in R1 : “The membership function of the union is comprised of curve segments 1 and 2; that of the intersection is comprised of segments 3 and 4 (heavy lines).” ([11], p. 342.)
continuous infinity of grades of membership, with the grade of membership of an object x in a fuzzy set A represented by a number f A (x) in the interval [0, 1].” [11] The question was how to generalize various concepts, union of sets, intersection of sets, and so forth. Zadeh defined equality, containment, complementation, intersection and union relating to fuzzy sets A, B in any universe of discourse X as follows (for all x ∈ X) (Fig. 14): A = B if and only if μ A (x) = μ B (x), A ⊆ B if and only if μ A (x) ≤ μ B (x), ¬A is the complement of A, if and only if μ¬ A (x) = 1– μ A (x), A ∪ B if and only if μ A∪B (x) = max(μ A (x), μ B (x)), A ∩ B if and only if μ A∩B (x) = min(μ A (x), μ B (x)). Later, the operations of minimum and maximum of membership functions could be identified as a specific t-norm and t-conorm, respectively. These algebraic concepts that Karl Menger introduced into mathematics in connection with statistical metrics in the early 1940s are now an important tool in modern fuzzy mathematics. In April 1965 the Symposium on System Theory was held at the Polytechnic Institute in Brooklyn. Zadeh presented “A New View on System Theory”: a view that deals with the concepts of fuzzy sets, “which provide a way of treating fuzziness in a quantitative manner.” He explained that “these concepts relate to situations in which the source of imprecision is not a random variable or a stochastic process but rather a class or classes which do not possess sharply defined boundaries.” [12] His “simple” examples in this brief summary of his new “way of dealing with classes in which there may be intermediate grades of membership” were the “class” of real numbers which are much larger than, say, 10, and the “class” of “bald men”, as well as the “class” of adaptive systems. In the subsequent publication of the proceedings of this symposium, there is a shortened manuscript version of Zadeh’s talk, which
Fig. 15 Zadeh’s first published papers on fuzzy sets in 1965 [11] and [12]
here is entitled “Fuzzy Sets and Systems”. [12] (Fig. 15, right side). In this lecture and published paper, Zadeh first defined “fuzzy systems” as follows: A system S is a fuzzy system if its input u(t), output y(t), or state s(t), or any combination of them, ranges over fuzzy sets. ([12], p. 33). We have the same systems equations that hold for usual systems, but with different meanings for u, y, and s as specified in this definition:
st+1 = f (st , u t ), yt = g(st , u t ), t = 0, 1, 2, . . .
He maintained that these new concepts provide a “convenient way of defining abstraction – a process which plays a basic role in human thinking and communication.” ([12], p. 29)
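The pointwise operations just listed are easy to state in executable form. The following short Python sketch is an illustration written for this overview, not code from any of the cited papers; the two triangular membership functions are arbitrary examples chosen only to make the max, min and 1 − μ operations visible.

# Illustrative sketch of the 1965 operations on fuzzy sets (arbitrary example sets).
def mu_A(x):
    # fuzzy set A: "roughly between 2 and 6", peak at 4
    return max(0.0, 1.0 - abs(x - 4.0) / 2.0)

def mu_B(x):
    # fuzzy set B: "roughly between 5 and 9", peak at 7
    return max(0.0, 1.0 - abs(x - 7.0) / 2.0)

def complement(mu):
    return lambda x: 1.0 - mu(x)

def union(mu1, mu2):           # pointwise maximum, later read as a t-conorm
    return lambda x: max(mu1(x), mu2(x))

def intersection(mu1, mu2):    # pointwise minimum, later read as a t-norm
    return lambda x: min(mu1(x), mu2(x))

for x in (3.0, 5.0, 6.5):
    print(x, mu_A(x), mu_B(x),
          union(mu_A, mu_B)(x),
          intersection(mu_A, mu_B)(x),
          complement(mu_A)(x))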
5 Towards Fuzzy Sets in Artificial Intelligence The theory of fuzzy sets is a mathematical theory to deal with vagueness and other loose concepts lacking strict boundaries. It seems that “vagueness”, as it has been used in philosophy and logic since the 20th century, may be formalized by fuzzy sets, whereas “haziness” like other scientific concepts, e.g. indeterminacy, is a concept that needs to be formalized by probability theory and statistics. Nevertheless, fuzzy mathematics cannot possibly be imagined without the use of t-norms and t-conorms. The theory of fuzzy sets is considered to be the core of “soft computing”, which became a new dimension of artificial intelligence in the final years of the 20th century. Zadeh was inspired by the “remarkable human capability to perform a wide variety of physical and mental tasks without any measurements and any computations. Everyday examples of such tasks are parking a car, playing golf, deciphering sloppy handwriting and summarizing a story. Underlying this capability is the brain’s crucial ability to reason with perceptions – perceptions of time, distance, speed, force, direction, shape, intent, likelihood, truth and other attributes of physical and mental objects.” ([54], p. 903).
Fig. 16 Perception-based system modelling [57]
In the 1990s he established Computing with Words (CW) [55, 56] instead of exact computing with numbers, as a method for reasoning and computing with perceptions. In an article entitled “Fuzzy Logic = Computing with Words” he stated that “the main contribution of fuzzy logic is a methodology for computing with words. No other methodology serves this purpose” ([55], p. 103.). Three years later he wrote “From Computing with Numbers to Computing with Words – From Manipulation of Measurements to Manipulation of Perceptions”, to show that a new “computational theory of perceptions, or CTP for short, is based on the methodology of CW. In CTP, words play the role of labels of perceptions and, more generally, perceptions are expressed as propositions in natural language.” ([56], p. 105). Zadeh had observed “that progress has been, and continues to be, slow in those areas where a methodology is needed in which the objects of computation are perceptions – perceptions of time, distance, form, direction, color, shape, truth, likelihood, intent, and other attributes of physical and mental objects.” ([57], p. 73.) He set out his ideas in the AI Magazine in the spring of 2001. In this article called “A New Direction in AI” he presented perception-based system modelling (Fig. 16): “A system, S, is assumed to be associated with temporal sequences of input X 1 , X 2 , . . . ; output Y1 , Y2 , . . . ; and states S1 , S2 , . . . is defined by state-transition function f with St +1 = f (St , X t ), and output function g with Yt = g(St , X t ), for t = 0, 1, 2, . . . . In perception-based system modelling, inputs, outputs and states are assumed to be perceptions, as state-transition function, f , and output function, g.” ([57], p. 77.) In spite of its advances, artificial intelligence cannot compete with human intelligence on many levels and will not be able to do so in the very near future. Nonetheless, CTP, CW, and Fuzzy Sets enable computers and human beings to communicate in terms that allow them to express uncertainty regarding measurements, evaluations, and (medical) diagnostics, etc. In theory, this should put the methods of information and communication science used by machines and human beings on levels that are much closer to each other. Acknowledgments The author would like to thank Professor Dr. Lotfi A. Zadeh (Berkeley, California) for his generous help and unstinted willingness to support the author’s project on the history of the theory of fuzzy sets and systems and also Professor Dr. Erich-Peter Klement (Linz, Austria) for his generous help and advice on Karl Menger’s role in the history of fuzzy sets. Many thanks are due to Jeremy Bradley for his advice and help in the preparation of this contribution.
References 1. R. Seising, “Fuzziness before Fuzzy Sets: Two 20th Century Philosophical Approaches to Vagueness - Ludwik Fleck and Karl Menger”. Proceedings of the IFSA 2005 World Congress, International Fuzzy Systems Association, July 28–31, 2005, Beijing, China, CD. 2. R. Seising, “Vagueness, Haziness, and Fuzziness in Logic, Science, and Medicine – Before and When Fuzzy Logic Began”. BISCSE’05 Forging New Frontiers. 40th of Fuzzy Pioneers (1965–2005), BISC Special Event in Honour of Prof. Lotfi A. Zadeh, Memorandum No. UCB/ERL M05/31, November 2, 2005, Berkeley Initiative in Soft Computing (BISC), Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of Berkeley, Berkeley, California, USA. 3. R. Seising, The 40th Anniversary of Fuzzy Sets – New Views on System Theory. International Journal of General Systems, Special Issue “Soft Computing for Real World Applications, 2006 (in review). 4. R. Seising, Optimality, Noninferiority, and the Separation Problem in Pattern Classification: Milestones in the History of the Theory of Fuzzy Sets and Systems – The 40th Anniversary of Fuzzy Sets. IEEE Transactions on Fuzzy Systems 2006 (in preparation). 5. R. Seising, From Filters and Systems to Fuzzy Sets and Systems – The 40th Anniversary of Fuzzy Sets. Fuzzy Sets and Systems 2006 (in preparation). 6. R. Seising, Roots of the Theory of Fuzzy Sets in Information Sciences – The 40th Anniversary of Fuzzy Sets. Information Sciences 2006 (in preparation). 7. R. Seising, From Vagueness in the Medical Way of Thinking to Fuzzy Reasoning Foundations of Medical Diagnoses. Artificial Intelligence in Medicine, (2006) 38, 237–256. 8. T. Kortabi´nski, Elementy teorii poznania, logiki formalnej i metodologii nauk, Ossolineum, Lwów, 1929. 9. K. Ajdukiewicz, On the problem of universals, Przeglad Filozoficzny, Vol. 38, 1935, 219–234. 10. L. A. Zadeh, From Circuit Theory to System Theory. Proceedings of the IRE, Vol. 50, 1962, 856–865. 11. L. A. Zadeh, Fuzzy Sets, Information and Control, 8, 1965, 338–353. 12. L. A. Zadeh, Fuzzy Sets and Systems. In: Fox J (ed): System Theory. Microwave Research Institute Symposia Series XV, Polytechnic Press, Brooklyn, New York, 1965, 29–37. 13. L. A. Zadeh, R. E. Bellman, R. E. Kalaba, Abstraction and Pattern Classification, Journal of Mathematical Analysis and Applications 13, 1966, 1–7. 14. R. Seising, “Noninferiority, Adaptivity, Fuzziness in Pattern Separation: Remarks on the Genesis of Fuzzy Sets”. Proceedings of the Annual Meeting of the North American Fuzzy Information Processing Society, Banff, Alberta, Canada: Fuzzy Sets in the Hearth of the Canadian Rockies 2004 (NAFIPS 2004), June 27–30, 2004, Banff, Alberta, Canada, 2002–2007. 15. R. Seising, “40 years ago: Fuzzy Sets is going to be published”, Proceedings of the 2004 IEEE International Joint Conference on Neural Networks / International Conference on Fuzzy Systems (FUZZ-IEEE 2004), July 25–29, 2004, Budapest, Hungary, CD. 16. R. Seising, “1965 – “Fuzzy Sets” appear – A Contribution to the 40th Anniversary”. Proc. of the Conference FUZZ-IEEE 2005, Reno, Nevada, May 22–25, 2005, CD. 17. R. Seising, “The 40th Anniversary of Fuzzy Sets – A New View on System Theory”. Ying H, Filev D (eds) Proc. of the NAFIPS Annual Conference, Soft Computing for Real World Applications, 22–25 June, 2005, Ann Arbor, Michigan, USA, 92–97. 18. R. 
Seising, “On the fuzzy way from “Thinking Machines” to “Machine IQ””, Proceedings of the IEEE International Workshop on Soft Computing Applications, IEEE – SOFA 2005, 27–30 August, 2005, Szeged-Hungary and Arad-Romania, 251–256. 19. R. Seising, Die Fuzzifizierung der Systeme. Die Entstehung der Fuzzy Set Theorie und ihrer ersten Anwendungen - Ihre Entwicklung bis in die 70er Jahre des 20. Jahrhunderts, Franz Steiner Verlag, Stuttgart, 2005. 20. G. Frege, Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens, Halle, 1879.
21. G. Frege, Funktion und Begriff, 1891. In: Patzig G (ed) Frege G (1986) Funktion, Begriff, Bedeutung, Vandenheoeck & Ruprecht, Göttingen, 18–39. 22. G. Frege, Grundgesetze der Arithmetik. 2 vols., Hermann Pohle, Jena, 1893–1903. 23. B. Russell, Vagueness, The Australasian Journal of Psychology and Philosophy, 1, 1923, 84–92. 24. M. Black, Vagueness. An exercise in logical analysis, Philosophy of Science, 4, 1937, 427–455. 25. M. Black, Reasoning with loose concepts, Dialogue, 2, 1963, 1–12. 26. L. Fleck, O niektórych swoistych ceceach my´ss´lekarskiego, Archiwum Historji i Filo-zofji Medycyny oraz Historji Nauk Przyrodniczych, 6, 1927, 55–64. Engl. Transl.: Ludwik Fleck: Some specific features of the medical way of thinking. In: [23], 39–46. 27. L. Fleck, Entstehung und Entwicklung einer wissenschaftlichen Tatsache. Einführung in die Lehre vom Denkstil und Denkkollektiv, Schwabe & Co, Basel, 1935. 28. T. J. Trenn, R. K. Merton, Genesis and Development of a Scientific Fact, University of Chicago Press, Chicago/London, 1979. 29. T. S. Kuhn, The Structure of Scientific Revolutions, University of Chicago Press, Chicago, 1962. 30. R. S. Cohen, T. Schnelle (eds), Cognition and Fact. Materials on Ludwik Fleck, D. Reidel Publ. Comp. Dordrecht, Boston, Lancaster, Tokyo, 1986, 39–46. 31. L. A. Zadeh, Biological Applications of the Theory of Fuzzy Sets and Systems. In: The Proceedings of an International Symposium on Biocybernetics of the Central Nervous System. Little, Brown and Company: Boston, 1969, 199–206. 32. R. Seising, From Vagueness in the Medical Way of Thinking to Fuzzy Reasoning Foundations of Medical Diagnosis. Artificial Intelligence in Medicine, Special Issue: Fuzzy Sets in Medicine (85th birthday of Lotfi Zadeh), October 2006 (in review). 33. K. Menger, Memories of Moritz Schlick. In: E. T. Gadol (ed), Rationality and Science. A Memorial Volume for Moritz Schlick in Celebration of the Centennial of his Birth. Springer, Vienna, 1982, 83–103. 34. K. Menger, The New Logic: A 1932 Lecture. In: Selected Papers in Logics and Foundations in Didactics, Economics, Vienna Circle Collection, Volume 10, D. Reidel, Dordrecht, 1979. 35. R. Carnap, Logische Syntax der Sprache, Springer, Wien, 1934. (English edition: R. Carnap, Logical Syntax of Language, Humanities, New York.) 36. K. Menger, Statistical Metrics, Proceedings of the National Academy of Sciences, 28, 1942, 535–537. 37. K. Menger, Probabilistic Geometry, Proceedings of the National Academy of Sciences, 37, 1951, 226–229. 38. K. Menger, Probabilistic Theories of Relations, Proceedings of the National Academy of Sciences, 37, 1951, 178–180. 39. K. Menger, Ensembles flous et fonctions aléatoires. Comptes Rendus Académie des Sciences, 37, 1951, 2001–2003. 40. K. Menger, Geometry and Positivism. A Probabilistic Microgeometry. In: K. Menger (ed.) Selected Papers in Logic and Foundations, Didactics, Economics, Vienna Circle Collection, 10, D. Reidel Publ. Comp., Dordrecht, Holland, 1979, 225–234. 41. E. Mach, Die Principien der Wärmelehre. Historisch-kritisch entwickelt. Leipzig, Verlag von Johann Ambrosius Barth, 1896. 42. E. Schrödinger, Grundlinien einer Theorie der Farbenmetrik im Tagessehen. Annalen der Physik, 63, 4, 1920, 397–456, 481–520. 43. E. C. Berkeley, Giant Brains or Machines that Think. John Wiley & Sons, Chapman & Hall, New York, London, 1949. 44. A. M. Turing, Computing Machinery and Intelligence, Mind, LIX, 236, 1950, 433–460. 45. N. 
Wiener, Cybernetics or Control and Communication in the Animal and the Machine, Hermann & Cie., The Technology Press, and John Wiley & Sons, Cambridge-Mass., New York 1948. 46. C. E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal, 27, 379–423 and 623–656.
47. L. A. Zadeh, Thinking Machines – A New Field in Electrical Engineering, Columbia Eng. Quarterly, Jan. 1950, 12–13, 30–31. 48. L. A. Zadeh, Frequency Analysis of Variable Networks, Proceedings of the IRE, March, 38, 1950, 291–299. 49. L. A. Zadeh, Some Basic Problems in Communication of Information. In The New York Academy of Sciences, Series II, 14, 5, 1952, 201–204. 50. L. A. Zadeh, K. S. Miller, Generalized Ideal Filters. Journal of Applied Physics, 23, 2, 1952, 223–228. 51. L. A. Zadeh, Theory of Filtering, Journal of the Society for Industrial and Applied Mathematics, 1, 1953, 35–51. 52. R. E. Bellman, R. E. Kalaba, L. A. Zadeh, Abstraction and Pattern Classification, Memorandum RM- 4307-PR, Santa Monica, California: The RAND Corporation, October, 1964. 53. L. A. Zadeh, Fuzzy Sets, ERL Report No. 64-44, University of California at Berkeley, November 16, 1964. 54. L. A. Zadeh, The Birth and Evolution of Fuzzy Logic – A Personal Perspective, Journal of Japan Society for Fuzzy Theory and Systems, 11, 6, 1999, 891–905. 55. L. A. Zadeh, Fuzzy Logic = Computing with Words, IEEE Transactions on Fuzzy Systems, 4, 2, 1996, 103–111. 56. L. A. Zadeh, From Computing with Numbers to Computing with Words – From Manipulation of Measurements to Manipulation of Perceptions, IEEE Trans. on Circuits And Systems-I: Fundamental Theory and Applications, 45, 1, 1999, 105–119. 57. L. A. Zadeh, A New Direction in AI. Toward a Computational Theory of Perceptions. AI-Magazine, 22, 1, 2001, 73–84.
Selected Results of the Global Survey on Research, Instruction and Development Work with Fuzzy Systems Vesa A. Niskanen
Abstract Selected results of a survey on research, instruction and development work with fuzzy systems are presented. The study was carried out at the request of the International Fuzzy Systems Association (IFSA). Electronic questionnaire forms were sent to the IFSA Societies, relevant mailing lists and central persons in fuzzy systems, and 166 persons from 36 countries filled in our form. Since our data set was relatively small, we could only draw some general guidelines from this data, and their statistical analysis is presented below. Nevertheless, studies of this type can, first, support future research design and work on fuzzy systems by revealing the relevant research areas and activities globally. Second, they can support future comprehensive educational planning on fuzzy systems globally. Third, they can better promote the marketing of fuzzy systems to decision makers, researchers, students and customers. Fourth, they can support the future global networking of persons, institutes and firms in the area of fuzzy systems.
1 Background Since Lotfi Zadeh’s invention of fuzzy systems in 1965, numerous books and articles have been published and many applications have been produced in this field. In addition, numerous university-level courses on fuzzy systems have been held. However, we still lack exact numbers concerning publications, applications, researchers and fuzzy courses, even though they are relevant for planning and decision making in both business and non-corporate public services. In addition, if these numbers are sufficiently convincing, they can well promote the further global dissemination of fuzzy systems in the future. For example, it seems that there are already over 40 000 publications and thousands of patents on fuzzy systems, as well as thousands of researchers globally.
Vesa A. Niskanen Department of Economics & Management, University of Helsinki, PO Box 27, 00014 Helsinki, Finland e-mail:
[email protected]
Hence, we need global surveys on these activities in order to achieve the foregoing aims. The author has already designed and performed a global survey of this type at the request of the International Fuzzy Systems Association (IFSA) in 2003–2005, and some results were presented at the IFSA World Congress in Beijing in 2005. However, since only 166 persons responded to our questionnaire, we are unable to draw any general conclusions from this data set. Hence, below we mainly present certain descriptive results and we also draw up some methodological guidelines for performing a global survey on fuzzy activities in research, instruction and development work. We expect that the considerations below can provide a basis for further studies on this important subject matter. Section 2 describes our design and the basic characteristics of a survey. Section 3 presents the results. Section 4 provides a summary of our research.
2 Basic Characteristics of a Survey A survey usually aims to describe systematically and in an exact manner a phenomenon, a group of objects or a subject matter at a particular point in time. Surveys can also determine the relationships that exist between specific events [1]. In the survey research design we establish our aims and possible hypotheses. We also acquaint ourselves with previous similar studies and the necessary analysis methods, as well as make plans for the data collection. Practical matters include tasks such as scheduling and financing. In our case we can establish these aims: 1. A world-wide causal-comparative survey and examination of fuzzy systems in scientific research, university-level instruction and commercial products. 2. Establishment of aims and guidelines for future studies and instruction on fuzzy systems globally. The data collection is based on structured questionnaire forms, and we can use them in interviews or in postal questionnaires (today electronic forms and the Internet seem to be replacing traditional postal forms). In addition, we can collect data with standardized tests. A census study seemed more usable in this context than a sample survey, although the size of the population was obviously large. This approach was due to the fact that the population structure and size were unknown and thus we were unable to design a reasonable sampling. The author designed an electronic structured questionnaire form, and this was distributed to the IFSA Societies, the BISC mailing list and to some key persons in 2004–2005 [4]. Surveys usually proceed through well-defined stages, and we thus proceed from our problem-setting to conclusions and a research report. If we assume four principal stages in our study, research design is carried out first. Second, data collection is performed. Third, our data is examined. Fourth, conclusions are provided and a report is published.
If we focus on the nature of our variables and consider the frequencies of our observations, we carry out a descriptive survey. If we also perform comparisons between the classes in our data or consider interrelationships between the variables, we perform a comparative survey. Finally, if we aim to explain or interpret the interrelationships between the variables by mainly focusing on their causal connections, we are dealing with a causal-comparative survey. We mainly adopted the third approach. Our data was examined with statistical methods [2, 5]. In this context various methods and tools, such as chi-square analysis, correlation analysis, t-tests, analysis of variance and regression analysis, can be used. Today we can also replace some of these methods with Soft Computing modeling [3]. Below we only sketch some essential results, which are mainly based on descriptive statistics, because our relatively small and heterogeneous data set prevented us from performing deeper and more conclusive analyses. In addition, lack of space also imposes restrictions in this context.
3 Selected Results
Most of our 166 respondents were males (Table 1). The respondents represented 36 countries according to their present citizenship, and they had usually stayed permanently in their native countries although about 60% had also worked abroad for brief periods of time. Their ages ranged from 24 to 70 years, and the mean was about 44 years (Fig. 1). There were no significant differences in the means of the ages between the males and females when the t-test was used.

Table 1 Respondents according to their gender
Gender    Frequency    Percent
Female    34           20.5
Male      132          79.5
Total     166          100.0
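The group comparisons quoted throughout this section rest on standard tests (t-test, one-way and two-way ANOVA, Kruskal-Wallis). Purely as an illustration of the kind of computation involved, and with invented placeholder numbers rather than the survey data (which were analysed with the statistical software cited in [5]), such comparisons could be reproduced as follows.

# Illustration of the tests referred to in this section; the age values are
# invented placeholders, not the survey data.
from scipy import stats

ages_female = [38, 44, 51, 29, 47, 40]
ages_male = [36, 42, 55, 33, 48, 45, 39, 60]
t_stat, p_t = stats.ttest_ind(ages_female, ages_male, equal_var=False)

ages_professor = [50, 55, 61, 47]
ages_lecturer = [38, 42, 45, 40]
ages_other = [30, 33, 36, 44]
f_stat, p_f = stats.f_oneway(ages_professor, ages_lecturer, ages_other)   # one-way ANOVA
h_stat, p_h = stats.kruskal(ages_professor, ages_lecturer, ages_other)    # non-parametric counterpart

# p-values above the usual 0.05 threshold are read as "no significant difference"
print(p_t, p_f, p_h)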
The average age when the respondents became acquainted with the fuzzy systems, i.e., their “fuzzy ages”, was about 29 years (ranging from 16 to 55 years; Fig. 2). According to the t-test, there were no significant differences in the means of these ages between the males and females. As Fig. 3 also shows, there was a significant positive correlation between the respondents’ ages and their “fuzzy ages”. About 70% were scholars, such as Professors and Lecturers, and about 84% had Doctoral degrees (Figs. 4 and 5). According to the one-way analysis of variance (ANOVA) and Kruskal-Wallis non-parametric analysis, there were significant differences in the means of the ages between the titles of the respondents (Fig. 6), but these differences were not significant when their corresponding “fuzzy ages” were considered. Two-way ANOVA models with the additional factor Gender were also used but this variable seemed to be insignificant in this context. The two-way
Fig. 1 Ages of respondents (histogram; axes: Age, Frequency)
Fig. 2 Ages in which respondents first became acquainted with fuzzy systems (“fuzzy ages”) (histogram; axes: Age fuzzy, Frequency)
ANOVA model also showed that there were no significant differences in the means of fuzzy ages when Gender and Degree were our factors. As regards the respondents’ function, most of them were involved in R&D, Education or in several areas simultaneously (Fig. 7). Significant differences in the
Fig. 3 Ages vs. “fuzzy ages” (scatter plot; axes: Age, Age fuzzy; by Gender)
Fig. 4 Titles of respondents (pie chart: Professor; Dr., Lecturer; Several; Other)
Fig. 5 Respondents’ degrees (pie chart: Bachelor; Master; Dr.; Other)
Fig. 6 The means of ages according to the titles (bar chart; axes: Title, Mean of age)
means of ages were found (Fig. 8), whereas no significant differences were found in the case of their “fuzzy ages” (standard ANOVA and Kruskal-Wallis modeling). Only about 55% were members of the IFSA Societies, and most of the members belonged to EUSFLAT (about 25%; Fig. 9). Significant differences in the means of
Fig. 7 Respondents’ functions (pie chart: R&D; Eng / Computer sci; Education; Several; Other)
Fig. 8 Ages according to the functions (bar chart; axes: Function, Mean of age)
the ages were found in this context (ANOVA, Fig. 10), whereas there were no significant differences in the means of the “fuzzy ages”. According to Fisher’s exact test, there was no correlation between the IFSA membership and respondent’s degree.
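Fisher’s exact test checks for association in a contingency table. As an illustration only, the sketch below tests a hypothetical 2 × 2 cross-tabulation of IFSA membership against holding a doctorate; the counts are invented and the split into two degree categories is a simplification of the four categories actually used in the survey.

# Hypothetical 2x2 table: IFSA membership vs. doctoral degree (invented counts).
from scipy.stats import fisher_exact

table = [[70, 21],   # members:     with doctorate, without doctorate
         [58, 17]]   # non-members: with doctorate, without doctorate
odds_ratio, p_value = fisher_exact(table)
print(odds_ratio, p_value)   # a large p-value indicates no detectable association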
Fig. 9 Membership in the IFSA Societies (pie chart: No member; EUSFLAT; SBA; Several; Other)
Fig. 10 Ages according to the IFSA membership (bar chart; axes: MemIFSA, Mean of age)
Fig. 11 Research areas (pie chart: Philosophy & Math; Control; Pattern recognition; Decision making; HW & SW eng; Other; Several)
When the respondents’ research areas were examined, about 63% of them were working in several fields. The most popular single areas were decision making as well as philosophy (including logic) and mathematics (Fig. 11). No significant differences were found in the means of the ages or “fuzzy ages” when Gender and Research area were the factors in the two-way ANOVA models. About 45% of the respondents had completed a doctoral thesis on fuzzy systems within the years 1975–2005 (Fig. 12). There were no significant differences in the means of these years between the males and females. There was a negative correlation between the ages and the years when the thesis was completed (Fig. 13). About 7% of the respondents had never attended any fuzzy conference (Fig. 14). About 13% had published at least one scientific book on fuzzy systems (Fig. 15), about 39% at least one journal article (Fig. 16) and about 40% at least one conference paper (Fig. 17). No significant differences in the means of the numbers of publications between the males and the females were found when the t-tests were used. About 62% of the respondents had provided instruction in fuzzy systems, and 41% of them had provided this in robotics, 31% in philosophy and mathematics, 26% in control and 19% in pattern recognition. The first courses were held in 1971 (Fig. 18). In instruction on fuzziness, about 9% had produced textbooks (1 to 4 items), 17% digital material, 14% computer software and 2% other materials. There was a significant negative correlation between their ages and the year when they first held classes on fuzzy systems (Fig. 19).
Fig. 12 Doctoral theses on fuzzy systems (histogram; axes: Thesis year, Frequency)
Fig. 13 Age vs. the year when fuzzy thesis completed (scatter plot; axes: Age, Thesis; by Gender)
Fig. 14 Respondents’ number of fuzzy conferences per year (bar chart; axes: Attendconf, Frequency)
Fig. 15 Number of published scientific books per person (histogram; axes: Books, Frequency)
Fig. 16 Number of published articles per person (histogram; axes: Articles, Frequency)
Fig. 17 Number of published conference papers per person (histogram; axes: Confpapers, Frequency)
Fig. 18 When the respondents’ first courses on fuzzy systems were held (histogram; axes: Teachyear, Frequency)
Fig. 19 Age vs. the year when the first fuzzy courses were held (scatter plot; axes: Age, Teachyear)
About 58% of the respondents had made fuzzy applications, and they were made between the years 1971 and 2005 (Fig. 20). These included such areas as heavy industry (about 7% of them), vehicle applications (4%), spacecraft applications (1%), consumer products (2%), finance and economics (8%), robotics (7%), computer software and hardware (10%), health care (6%) and military applications (2%). There was a significant negative correlation between their ages and the year when they first made their applications (Fig. 21). There were no significant differences in the means between the males and females when the numbers of applications were considered. The same was true for the differences in the means of the years when fuzzy applications were first made.
4 Summary and Discussion We have considered the data set which was collected within the global survey on fuzzy research, instruction and development work in 2004–2005. We used an electronic questionnaire form in a census study, and 166 responses were obtained. Since our data set was quite small, we were unable to draw any generalized conclusions concerning the entire fuzzy community; rather, we could only sketch some tentative results and provide some guidelines for future studies in this area. Lack of space also restricted our examinations in this context. In addition, due to the small data set, we performed both parametric and non-parametric statistical analyses. The foregoing results nevertheless show that we already have a global fuzzy community with a diversity of professions, and for decades this community has been very active in research, instruction and development work. It also seems that younger generations have been recruited well in these areas. However, we should still have many more females in this work, as well as persons from developing countries. Our research aims would also allow us to perform such qualitative research as semi-structured interviews and action research in order to examine the structures and underlying connections of those human and institutional networks that can be found behind the fuzzy activities. In this context, as well as in our quantitative research, we could also apply the theory of networks, graph theory, concept maps, cognitive maps and neuro-fuzzy modeling. For example, we could use neuro-fuzzy models in regression and cluster analysis. In addition, we could examine simulations of emerging networks and their information flows by applying fuzzy cognitive maps. Hopefully, we can perform these studies in further surveys. In future surveys we should collect more representative data sets, because this type of work is very important to the fuzzy community. In practice this means hundreds or even thousands of cases. If a sufficient number of cases can be collected, we can, first, support future research design and work on fuzzy systems by revealing the relevant research areas and activities globally. Second, we can enhance the visibility of the fuzzy community. Third, we can support future comprehensive educational planning on fuzzy systems globally. Fourth, we can better promote the marketing of fuzzy systems to decision makers, researchers, students and customers.
Fig. 20 Number of fuzzy applications (histogram; axes: Year_application, Frequency)
Fig. 21 Age vs. the first year when fuzzy applications were made (scatter plot; axes: Age, Year_application)
Fifth, we can support the future global networking of persons, institutes and firms in the area of fuzzy systems. In brief, we can thus convincingly disseminate the fact that the future is fuzzy. Acknowledgments I express my gratitude to Professor Lotfi Zadeh for providing me the idea on performing this survey. I also express my thanks to IFSA for supporting this work.
References 1. Cohen L and Manion L (1989) Research Methods in Education. Routledge, London. 2. Guilford J and Fruchter B (1978) Fundamental Statistics in Psychology and Education. McGraw-Hill, London. 3. Niskanen V A (2003) Soft Computing Methods in Human Sciences. Springer Verlag, Heidelberg. 4. Niskanen V A (2004) questionnaire form, https://kampela.it.helsinki.fi/elomake/lomakkeet/573/lomake.html 5. SPSSTM Statistical Software, Version 12.0.1.
Appendices
Appendix 1. Respondents’ present citizenship.
Country          Frequency    Percent
Argentina        3            1.8
Australia        4            2.4
Austria          1            .6
Belgium          10           6.0
Brazil           16           9.6
Bulgaria         3            1.8
Canada           4            2.4
China            3            1.8
Colombia         3            1.8
Czech Republic   8            4.8
Finland          5            3.0
France           2            1.2
Germany          6            3.6
Greece           2            1.2
Hungary          2            1.2
India            8            4.8
Iran             2            1.2
Ireland          1            .6
Israel           1            .6
Italy            10           6.0
Japan            1            .6
Latvia           1            .6
Netherlands      1            .6
Nigeria          2            1.2
Poland           2            1.2
Portugal         1            .6
Romania          4            2.4
Russia           3            1.8
Slovakia         3            1.8
Slovenia         1            .6
South Africa     1            .6
Spain            34           20.5
Taiwan           2            1.2
UK               5            3.0
Ukraine          2            1.2
USA              9            5.4
Total            166          100.0
Appendix 2. The electronic questionnaire form

World-Wide Survey on Fuzzy Systems

Dear Recipient of This Questionnaire Form,

Since information on world-wide statistics and research networks of fuzzy systems activities is continuously required in various instances of decision making, by the initiative of Prof. Lotfi Zadeh the International Fuzzy Systems Association (IFSA) has nominated Docent, Dr. Vesa A. Niskanen at the University of Helsinki, Finland, to carry out this task as a Chair of an International Committee of IFSA, referred to as the IFSA Information Committee (website: http://www.honeybee.helsinki.fi/users/niskanen/ifsainfocom.htm ), in 2004–2005.

In practice the Information Committee collects world-widely the data on research groups, applications, patents, types of products applying fuzzy logic and university-level instruction arrangements. According to the results, aims and guidelines for future global planning, decision making, research and instruction on fuzzy systems can be suggested. This survey also gives you a good opportunity to provide information on your activities and promote the visibility of your work world-widely. The information you provide is also very valuable for future policies of IFSA and it will be examined anonymously and strictly confidentially.

We wish that you can spend a little bit of your time to answer the questions provided below. In case of problems or questions concerning the filling of this form, please feel free to contact Dr. Niskanen. An example of a filled form is on http://www.honeybee.helsinki.fi/users/niskanen/surveyexample.pdf

****************************** PLEASE REPLY PRIOR TO MARCH 20, 2005. ******************************

Sincerely,
Lotfi Zadeh, IFSA Honorary President
Zenn Bien, IFSA President
Dr. Vesa A. Niskanen, IFSA Information Committee Chair
University of Helsinki, Dept. of Economics & Management
PO Box 27, 00014 Helsinki, Finland
Tel. +358 9 40 5032030, Fax +358 9 191 58096
E-mail:
[email protected]
Part 1. Background information about you
1. Your gender: Male
Female
2. Your country of citizenship when you were born: 3. Your present country of citizenship: 4. Your year of birth (e.g. 1965): 5. In what year did you first time become acquainted with fuzzy systems (for example in your work, studies or research. E.g. 1989)? 6. Countries in which you have principally studied or worked with fuzzy systems (max. 5):
Country #1 Country #2 Country #3 Country #4 Country #5
Part 2. Your present working environment
7. What is your title? President / Chair of the board / CEO General manager / Manager Owner Member of technical staff Engineer Computer scientist Professor / Associate prof. / Assistant prof. Docent / Dr. / Reader / Lecturer / Instructor / Research scientist Consultant Retired Other (please specify -->)
8. What is your highest university degree? Bachelor Master Doctor Other (please specify -->)
9. What is your principal function? Management Research / Development
Engineering / Computer science Marketing / Sales / Purchasing Education / Teaching Consulting Retired Other (please specify -->)
10. Are you a member of any IFSA Society? (c.f. http://www.pa.info.mie-u.ac.jp/~furu/ifsa/ )
No, I am not a member SOFT EUSFLAT NAFIPS CFSAT KFIS SBA VFSS SIGEF FMSAC HFA FIRST NSAIS SCI HAFSA
Part 3. Your activities in fuzzy systems research at scientific level (skip Part 3 if you do not have any) 11. What are your principal research areas on fuzzy systems (tick all the items that fit your case)?
Philosophy (including logic) or mathematics Control Pattern recognition Decision making Robotics Hardware or software engineering Other (please specify -->)
12. If you have written a doctoral thesis on fuzzy systems, in what year did you complete it (the first one if you have many, e.g. 1989; leave blank if none): 13. How many scientific publications have you written on fuzzy systems? (put numbers to boxes, e.g. 3; put 0 if none) You are the only author Books in English Articles in edited books or refereed international journals in English Papers in international conference proceedings in English
With coauthor(s)
Books in other languages Articles in edited books or refereed international journals in other languages Papers in conference proceedings in other languages
14. How often have you attended international conferences, seminars or workshops on fuzzy systems (average per year)?
Never Less than once a year Once per year Twice per year Three times per year Four times per year Five times per year Six times per year More than six times per year
Part 4. Your activities at university-level fuzzy systems teaching or education (skip Part 4 if you do not have any) 15. In what year did you start teaching (i.e., holding classes) on fuzzy systems (e.g. 1993, leave blank if none): 16. In what year did you start tutoring or supervising undergraduate or postgraduate students on fuzzy systems (e.g. 1993, leave blank if none): 17. What are your principal areas in education or teaching of fuzzy systems (tick all the items that fit your case)? Philosophy (including logic) or mathematics Control Pattern recognition Decision making Robotics Hardware or software engineering Other (please specify -->)
18. How many items of instructional material have you published or produced on fuzzy systems? (Put numbers to boxes, e.g. 3; put 0 if none) You are the only publisher or producer / With someone else Textbooks in English
Digital material in English (www pages, slide presentations, videos, DVDs etc.) Computer software in English Textbooks in other languages Digital material in other languages Computer software in other languages Other material
Part 5. Your activities in fuzzy systems applications (skip Part 5 if you do not have any) 19. In what year did you start doing applications with fuzzy systems (e.g. 1997): 20. How many fuzzy systems applications have you done? (put numbers to boxes, e.g. 3; put 0 if none) For commercial purposes when you are the only designer To heavy industry To automobiles, trains, airplanes or other vehicles To spacecrafts To home appliances (washing machines etc.), cameras, phones) To finance or economics To robotics To computer software or hardware
For commercial purposes as a team member
For other purposes (scientific, educational etc) when you are the only designer
For other purposes (scientific, educational etc) as a team member
To health care instruments or services To military purposes 21. In which countries are your applications principally used (mention max. 5 countries)
Country #1: Country #2: Country #3: Country #4: Country #5:
Part 6. Additional information 22. If you have any comments or additional information on the foregoing questions, please include them in here. You can also put your website address here if you like it to be mentioned in the survey report.
Proceed Reset
Thank you very much for your contribution.
Fuzzy Models and Interpolation László T. Kóczy, János Botzheim and Tamás D. Gedeon
Abstract This paper focuses on two essential topics of the fuzzy area. The first is the reduction of fuzzy rule bases. The classical inference methods of fuzzy systems deal with dense rule bases where the universe of discourse is fully covered. By applying sparse or hierarchical rule bases the computational complexity can be decreased. The second subject of the paper is the introduction of some fuzzy rule base identification techniques. In some cases the fuzzy rule base might be given by a human expert, but usually there are only numerical patterns available and an automatic method has to be applied to determine the fuzzy rules.
1 Introduction An overview of fuzzy model identification techniques and fuzzy rule interpolation is presented in this paper. In Sect. 2 the historical background of the fuzzy systems is described. Section 3 introduces the fuzzy rule base reduction methods, including fuzzy rule interpolation, hierarchical fuzzy rule bases, and the combination of these two methods. The next part of the paper deals with fuzzy model identification. In Sect. 4 clustering based fuzzy rule extraction is discussed, and in Sect. 5 our most recently proposed technique, the bacterial memetic algorithm is introduced.
László T. Kóczy Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary Institute of Information Technology and Electrical Engineering, Széchenyi István University, Gy˝or, Hungary János Botzheim Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary Department of Computer Science, The Australian National University Tamás D. Gedeon Department of Computer Science, The Australian National University
2 Background Fuzzy rule based models were first proposed by Zadeh (1973) and later practically implemented with some technical innovations by Mamdani and Assilian (1975). If . . . then . . . rules contain an arbitrary number of variables xi with antecedent descriptors, formulated either by linguistic terms or directly by convex and normal fuzzy sets (fuzzy numbers) in the If . . . part, and one or several output variables yi and their corresponding consequent terms/membership functions in the then . . . part. An alternative model was proposed by Takagi and Sugeno (1985) where the consequent part was represented by y = f(x) type functions. Later Sugeno and his team proposed the basic idea of hierarchical fuzzy models where the meta-level contained rules with symbolic outputs referring to further sub-rule bases on the lower levels. All these types of fuzzy rule based models might be interpreted as sophisticated and human friendly ways of defining (approximate) mappings from the input space to the output space, the rule bases being technically fuzzy relations of X×Y but representing virtual “fuzzy graphs” of extended (vague) functions in the state space. A great advantage of all these models compared to more traditional symbolic rule models is the inherent interpolation property of the fuzzy sets providing a cover of the input space, with the kernel values of the antecedents corresponding to the areas (or points) where supposedly exact information is available – typical points of the graph, and vaguer “grey” areas bridging the gaps between these characteristic values by the partially overlapping membership functions or linguistic terms. It was however necessary in all initial types of fuzzy models that the antecedents formed a mathematically or semantically full cover, without any gaps between the neighboring terms. The “yellow tomato problem” proposed by Kóczy and Hirota pointed out that often it was not really necessary to have a fuzzy cover, since “yellow tomatoes” could be interpreted as something between “red tomatoes” and “green tomatoes”, with properties lying also in between, the former category describing “ripe tomatoes”, the latter “unripe tomatoes”, yellow tomatoes being thus “half ripe”. Linear fuzzy rule interpolation proposed in 1990 provided a new algorithm extending the methods of reasoning and control to sparse fuzzy covers where gaps between adjoining terms could be bridged with the help of this new approach, delivering the correct answer both in the case of linguistic rules like the “Case of the yellow tomato” and in more formal situations where membership functions were defined only by mathematical functions. This idea was later extended to many non-linear interpolation methods, some of them providing very practical means to deal with real life systems. In 1993 and in a more advanced and practically applicable way in 1997 the authors proposed the extension of fuzzy interpolation to a philosophically completely new area, the sub-models of the hierarchical rule base itself. Interpolation of submodels in different sub-spaces raised new mathematical problems. The solution was given by a projection based approach with local calculations and subsequent substitution into the meta-level rule base, replacing sub-model symbols by actual fuzzy sub-conclusions.
After an overview of these types of fuzzy models we will deal with the problem of model identification. Often human experts provide the approximate rules – like they did in the hierarchical fuzzy helicopter control application by Sugeno referred to above, but more often, there are only input-output data observed on a black box system as the starting point to construct the fuzzy rule structure and the rules themselves. Sugeno and Yasukawa (1993) proposed the use of fuzzy c-means (FCM) clustering originally introduced by Bezdek (1981) for identifying output clusters in the data base, from where rules could be constructed by best fitting trapezoidal membership functions applied on the input projections of the output clusters. This method has limited applicability, partly because of the conditions of FCM clustering, and partly because of the need to have well structured behavior from the black box under observation. In the last few years our group has introduced a variety of identification methods for various types of fuzzy models. The FCM based approach could be essentially extended to hierarchical models as well, and was applied to a real life problem (petroleum reservoir characterization (Gedeon et al., 1997; Gedeon et al., 2003)) successfully. Another was the bacterial evolutionary algorithm. The use of Levenberg-Marquardt (LM) optimization for finding the rule parameters was borrowed from the field of neural networks. These latter approaches all had their advantages and disadvantages, partly concerning convergence speed and accuracy, partly locality and globality of the optimum. The most recent results (Botzheim et al., 2005) showed that the combination of bacterial evolutionary algorithm and LM, called Bacterial Memetic Algorithm delivered very good results compared to the previous ones, with surprisingly good approximations even when using very simple fuzzy models. Research on extending this new method to more complicated fuzzy models is going on currently.
3 Fuzzy Rule Base Reduction

The classical approaches of fuzzy control deal with dense rule bases where the universe of discourse is fully covered by the antecedent fuzzy sets of the rule base in each dimension, thus for every input there is at least one activated rule. The main problem is the high computational complexity of these traditional approaches. If a fuzzy model contains k variables and at most T linguistic terms in each dimension, the order of the number of necessary rules is O(T^k). This expression can be decreased either by decreasing T, or k, or both. The first method leads to sparse rule bases and rule interpolation, and was first introduced by Kóczy and Hirota (see e.g. Kóczy and Hirota, 1993a; Kóczy and Hirota, 1993b). The second, more effectively, aims to reduce the dimension of the sub-rule bases by using meta-levels or hierarchical fuzzy rule bases. The combination of these two methods leads to decreasing both T and k, and was introduced in (Kóczy and Hirota, 1993c).
3.1 Fuzzy Rule Interpolation

The rule bases containing gaps require completely different techniques of reasoning from the traditional ones. The idea of the first interpolation technique, the KH method, was proposed in (Kóczy and Hirota, 1993a; Kóczy and Hirota, 1993b), and considers the representation of a fuzzy set as the union of its α-cuts:

A = \bigcup_{\alpha \in [0,1]} \alpha A_\alpha   (1)

This method calculates the conclusion by its α-cuts. Theoretically all α-cuts should be considered, but for practical reasons only a finite set is taken into consideration during the computation. The KH rule interpolation algorithm requires the following conditions to be fulfilled: the fuzzy sets in both premises and consequences have to be convex and normal (CNF) sets, with bounded support, having continuous membership functions. Further, there should exist a partial ordering among the CNF sets of each variable. We shall use the vector representation of fuzzy sets, which assigns a vector of its characteristic points to every fuzzy set. The representation of the piecewise linear fuzzy set A will be denoted by the vector a = [a_{-m}, . . . , a_0, . . . , a_n], where a_k (k ∈ [-m, n]) are the characteristic points of A and a_0 is the reference point of A having membership degree one. A partial ordering among CNF fuzzy sets (convex and normal fuzzy sets) is defined as: A ≺ B if a_k ≤ b_k (k ∈ [-m, n]). The basic idea of fuzzy rule interpolation (KH-interpolation) is formulated in the Fundamental Equation of Rule Interpolation (FERI):

D(A^*, A_1) : D(A^*, A_2) = D(B^*, B_1) : D(B^*, B_2)   (2)

In this equation A^* and B^* denote the observation and the corresponding conclusion, while R_1 = A_1 → B_1 and R_2 = A_2 → B_2 are the rules to be interpolated, such that A_1 ≺ A^* ≺ A_2 and B_1 ≺ B_2. If, in some sense, D denotes the Euclidean distance between two symbols, the solution for B^* results in simple linear interpolation. If D = \tilde{d} (the fuzzy distance family), linear interpolation between corresponding α-cuts is performed and the generated conclusion can be computed as below (as first described in (Kóczy and Hirota, 1993c)):

b_k^* = \frac{\dfrac{b_{1k}}{d(a_{1k}, a_k^*)} + \dfrac{b_{2k}}{d(a_{2k}, a_k^*)}}{\dfrac{1}{d(a_{1k}, a_k^*)} + \dfrac{1}{d(a_{2k}, a_k^*)}}   (3)
where the first index (1 or 2) represents the number of the rule, while the second (k) refers to the corresponding α-cut. From now on d(x, y) = |x − y|, so that (3) becomes {}^{KH}b_k^* = (1 − \lambda_k) b_{1k} + \lambda_k b_{2k}, where \lambda_k = (a_k^* − a_{1k}) / (a_{2k} − a_{1k}) (for the left and right side, respectively).
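To make the simplified KH formula concrete, the short Python sketch below applies it point by point to triangular fuzzy sets given by three characteristic points; the rules, the observation and all numeric values are invented for illustration and are not taken from the paper.

```python
# Minimal sketch of KH (linear) rule interpolation on piecewise-linear fuzzy sets.
# Each fuzzy set is a vector of characteristic points [left, reference, right];
# the same index k of each vector belongs to the same alpha-cut side.
# Illustrative values only -- not from the paper.

def kh_interpolate(a1, a2, b1, b2, a_obs):
    """Interpolate the conclusion B* point by point:
    b*_k = (1 - lambda_k) * b1_k + lambda_k * b2_k,
    where lambda_k = (a*_k - a1_k) / (a2_k - a1_k)."""
    b_star = []
    for a1k, a2k, b1k, b2k, ak in zip(a1, a2, b1, b2, a_obs):
        lam = (ak - a1k) / (a2k - a1k)          # relative position of the observation
        b_star.append((1.0 - lam) * b1k + lam * b2k)
    return b_star

# Two rules "If x is A1 then y is B1", "If x is A2 then y is B2" with A1 < A* < A2.
A1, B1 = [1.0, 2.0, 3.0], [10.0, 12.0, 14.0]
A2, B2 = [7.0, 8.0, 9.0], [20.0, 22.0, 24.0]
A_obs = [4.0, 5.0, 6.0]                         # observation between the two antecedents

print(kh_interpolate(A1, A2, B1, B2, A_obs))    # -> [15.0, 17.0, 19.0]
```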
This family of interpolation techniques has various advantageous properties, mainly simplicity, convenient practical applicability and stability; however, the direct applicability of this result is limited, as the points defined by these equations often describe an abnormal membership function for B^* that needs further transformation to obtain a regular fuzzy set. Even then, the conclusion may need further transformation, since it may not be normal. Furthermore, the method does not conserve piecewise linearity of the rules and the observation. In order to avoid this difficulty, some alternative or modified interpolation algorithms were proposed, which solve the problem of abnormal solutions. Some of these algorithms rely on the same idea of applying the Fundamental Equation of rule interpolation for some kind of metric, or in some cases a non-metric dissimilarity degree. However, the way of measuring this distance or dissimilarity varies. In (Gedeon and Kóczy, 1996), the distance is measured only for α = 1, and so the position and length of the core in each of the k input dimensions determine the position and the core of the conclusion in the output space Y. The remaining information is contained in the shape of the flanks of the fuzzy sets describing the rules (the parts where 0 < μ < 1). Depending on the type of the membership functions (trapezoidal or more general), a single value of fuzziness, F, is the basis for the calculation of the points of the flanks of B^*, where F = |S_0 − S_1|, S being an arbitrary fuzzy set and the subscripts referring to the corresponding core and support points, respectively. Alternatively, a function of α-dependent fuzziness values based on either the core or the support points is the basis for this calculation. Here, however, a logarithmic type of dissimilarity is applied. These latter methods guarantee that the resulting conclusion is always "normal" (in the sense of not being abnormal). In (Baranyi et al., 1999) a coordinate transformation of the data is used before applying the KH interpolation, which guarantees the normality of the conclusion obtained after interpolation.
3.2 Hierarchical Fuzzy Rule Bases

The basic idea of using hierarchical fuzzy rule bases is the following. Often the multi-dimensional input space X = X_1 × X_2 × . . . × X_m can be decomposed, so that some of its components, e.g. Z_0 = X_1 × X_2 × . . . × X_p, determine a subspace of X (p < m), so that in Z_0 a partition \Pi = \{D_1, D_2, . . . , D_n\} can be determined:

\bigcup_{i=1}^{n} D_i = Z_0.

In each element of \Pi, i.e. D_i, a sub-rule base R_i can be constructed with local validity. In the worst case, each sub-rule base refers to exactly X/Z_0 = X_{p+1} × . . . × X_m. The complexity of the whole rule base, O(T^m), is then not decreased, as the size of R_0 is O(T^p) and each R_i, i > 0, is of order O(T^{m-p}), so O(T^p) × O(T^{m-p}) = O(T^m). A way to decrease the complexity would be to find in each D_i a proper subset of \{X_{p+1}, . . . , X_m\}, so that each R_i contains fewer than m − p input variables. In some concrete applications, in each D_i a proper subset of \{X_{p+1}, . . . , X_m\} can be found so that each R_i contains fewer than m − p input variables, and the rule base has the following structure:

R_0:  If z_0 is D_1 then use R_1
      If z_0 is D_2 then use R_2
      ...
      If z_0 is D_n then use R_n

R_1:  If z_1 is A_{11} then y is B_{11}
      If z_1 is A_{12} then y is B_{12}
      ...
      If z_1 is A_{1r_1} then y is B_{1r_1}

R_2:  If z_2 is A_{21} then y is B_{21}
      If z_2 is A_{22} then y is B_{22}
      ...
      If z_2 is A_{2r_2} then y is B_{2r_2}

...

R_n:  If z_n is A_{n1} then y is B_{n1}
      If z_n is A_{n2} then y is B_{n2}
      ...
      If z_n is A_{nr_n} then y is B_{nr_n}

where z_i ∈ Z_i, with Z_0 × Z_i being a proper subspace of X for i = 1, . . . , n. If the number of variables in each Z_i is k_i < m − p and \max_{i=1}^{n} k_i = K < m − p, then the resulting complexity will be O(T^{p+K}) < O(T^m), so the structured rule base leads to a reduction of the complexity. The task of finding such a partition is often difficult, if not impossible (sometimes such a partition does not even exist). There are cases when, locally, some variables unambiguously dominate the behavior of the system, and consequently the omission of the other variables allows an acceptably accurate approximation. The bordering regions of the local domains might not be crisp or, even worse, these domains may overlap. For example, there can be a region D_1 where the proper subspace Z_1 dominates, and another region D_2 where another proper subspace Z_2 is sufficient for the description of the system; however, in the region between D_1 and D_2 all variables in [Z_1 × Z_2] play a significant role ([. × .] denoting the space that contains all variables that occur in either argument within the brackets). In this case, sparse fuzzy partitions can be used, so that in each element of the partition a proper subset of the remaining input state variables is identified as exclusively dominant. Such a sparse fuzzy partition can be described as follows: \hat{\Pi} = \{D_1, D_2, . . . , D_n\} and \bigcup_{i=1}^{n} Core(D_i) \subset Z_0 in the proper sense (fuzzy partition). Even \bigcup_{i=1}^{n} Supp(D_i) \subset Z_0 is possible (sparse partition). If the fuzzy partition chosen is informative enough concerning the behavior of the system, it is possible to interpolate its model among the elements of \hat{\Pi}. Each element D_i will determine a sub-rule base R_i referring to another subset of variables. The technical difficulty is how to combine the "sub-conclusions" B_i^* with the help of R_0 into the final conclusion.
3.3 Fuzzy Rule Interpolation in Hierarchical Rule Bases

This section deals with the reduction of both the maximum number of linguistic terms T and the number of variables k. Let us assume that the observation on X is A^* and its projections are A_0^* = A^*/Z_0, A_1^* = A^*/Z_1, A_2^* = A^*/Z_2. Using the Fundamental Equation, the two sub-conclusions obtained from the two sub-rule bases R_1 and R_2 are:

{}^{KH}b_k^{1*} = (1 − \lambda_{1k}) b_{1k}^1 + \lambda_{1k} b_{2k}^1, and
{}^{KH}b_k^{2*} = (1 − \lambda_{2k}) b_{1k}^2 + \lambda_{2k} b_{2k}^2, respectively.

(The superscript shows the reference to the rule bases R_1 and R_2.) Finally, by substituting the sub-conclusions into the meta-rule base we get:

{}^{KH}b_k^* = (1 − \lambda_{0k}) b_k^{1*} + \lambda_{0k} b_k^{2*}   (4)
The steps of the algorithm are as follows:

1. Determine the projection A_0^* of the observation A^* to the subspace of the fuzzy partition \hat{\Pi}.
2. Find the interpolating rules.
3. Determine \lambda_{0k}.
4. For each R_i determine A_i^*, the projection of A^* to Z_i. Find the interpolating rules in each R_i.
5. Determine the sub-conclusions for each sub-rule base R_i.
6. Using the sub-conclusions from step 5, compute the final conclusion according to (4).

Figure 1 shows an example of the algorithm. In step 4, in each sub-rule base a different inference engine can be applied: e.g., if the sub-rule base itself is dense, the Mamdani algorithm (or one of its variations), or in any case one of the interpolation algorithms can be used. It might be reasonable to apply the KH interpolation or one of the methods summarized above in step 4, while in step 5 usually the best recommendation is the method in (Gedeon and Kóczy, 1996), as it is directly applicable also for flanking domains D_i with wide areas and projected observations A_0^* with narrow supports, e.g. crisp singletons.
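As a rough illustration of step 6, the sketch below combines two sub-conclusions according to (4) by a point-wise convex combination; the interpolation coefficients, breakpoints and sub-conclusions are invented values, not results from the paper.

```python
# Sketch of combining sub-conclusions B1*, B2* into the final conclusion B*
# with the meta-level interpolation coefficient lambda_0k, eq. (4).
# All numbers are illustrative.

def combine_subconclusions(b1_star, b2_star, lam0):
    """b*_k = (1 - lambda_0k) * b1*_k + lambda_0k * b2*_k for every characteristic point k."""
    return [(1.0 - l) * p1 + l * p2 for p1, p2, l in zip(b1_star, b2_star, lam0)]

B1_star = [15.0, 17.0, 19.0]          # sub-conclusion from sub-rule base R1
B2_star = [25.0, 27.0, 29.0]          # sub-conclusion from sub-rule base R2
lambda0 = [0.3, 0.3, 0.3]             # position of A0* between D1 and D2, per point

print(combine_subconclusions(B1_star, B2_star, lambda0))   # -> [18.0, 20.0, 22.0]
```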
Fig. 1 Interpolation in a hierarchical model (the observation A* is projected to Z_1 and Z_2; the sub-rule bases R_1 and R_2 yield the sub-conclusions B_1* and B_2*, and the meta-rule base R_0, with rules "If z_0 is D_1 then use R_1" and "If z_0 is D_2 then use R_2", combines them into the final conclusion B*)

4 Fuzzy Model Identification

One of the crucial problems of fuzzy rule based modeling is how to find an optimal or at least a quasi-optimal rule base for a certain system. In most applications there is no human expert available, thus some automatic method to determine the fuzzy rule base must be employed. In this section and in the next one some of these methods are introduced.
4.1 Clustering-based Rule Extraction Technique

Recently, clustering-based approaches have been proposed for rule extraction (Wong et al., 1997; Wang and Mendel, 1992). Most of the techniques use the idea of partitioning the input space into fixed regions to form the antecedents of the fuzzy rules. Although these techniques have the advantage of efficiency, they may lead to the creation of a dense rule base that suffers from rule explosion. In general, the number of rules generated is T^k, where k is the number of input dimensions and T is the number of terms per input. In this case, the number of rules grows exponentially with the number of input dimensions. For this reason, these techniques are not suited for generating fuzzy rule bases that have a large number of input dimensions. Among the rule extraction techniques proposed in the literature, Sugeno and Yasukawa's (Sugeno and Yasukawa, 1993) technique (referred to as the "SY method" hereafter) is one of the earliest works that emphasize the generation of a sparse rule base. The SY approach clusters only the output data and induces the rules by computing the projections to the input domains of the cylindrical extensions of the fuzzy clusters. This way, the method produces only the necessary number of rules for the input-output sample data (more details later). The paper (Sugeno and Yasukawa, 1993) discusses the proposed technique at the methodological level, leaving out some implementation details. The SY technique was further examined in (Tikk et al., 2002), where additional readily implementable techniques are proposed to complete the modeling methodology.
In the first step of SY modeling, the Regularity criterion (Ihara, 1980) is used to assist in the identification of “true” input variables that have significant influence on the output. The input variables that have little or no influence on the output are ignored for the rest of the process. The true input variables are then used in the actual rule extraction process. The rule extraction process starts with the determination of the partition of the output space. This is done by using fuzzy c-means clustering (Bezdek, 1981) (see Sect. 4.2). For each output fuzzy cluster Bi resulting from the fuzzy c-means clustering, a cluster in the input space Ai can be induced. The input cluster can be projected onto the various input dimensions to produce rules of the form: If x1 is Ai1 and x2 is Ai2 and . . . and xn is Ain then y is Bi
4.2 Fuzzy c-Means Clustering

Given a set of data, Fuzzy c-Means Clustering (FCMC) performs clustering by iteratively searching for a set of fuzzy partitions and the associated cluster centers that represent the structure of the data as well as possible. The FCMC algorithm relies on the user to specify the number of clusters present in the set of data to be clustered. Given the number of clusters c, FCMC partitions the data X = {x_1, x_2, . . . , x_n} into c fuzzy partitions by minimizing the within-group sum of squared error objective function (5):

J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (U_{ik})^m \| x_k - v_i \|^2,   1 ≤ m ≤ ∞   (5)

where J_m(U, V) is the sum of squared error for the set of fuzzy clusters represented by the membership matrix U and the associated set of cluster centers V, and ||·|| is some inner product induced norm. In the formula, ||x_k − v_i||^2 represents the distance between the datum x_k and the cluster center v_i. The squared error is used as a performance index that measures the weighted sum of distances between cluster centers and elements in the corresponding fuzzy clusters. The number m governs the influence of membership grades in the performance index. The partition becomes fuzzier with increasing m, and it has been shown that the FCMC algorithm converges for any m ∈ (1, ∞). The necessary conditions for (5) to reach its minimum are

U_{ik} = \left( \sum_{j=1}^{c} \left( \frac{\| x_k - v_i \|}{\| x_k - v_j \|} \right)^{2/(m-1)} \right)^{-1}   ∀i, ∀k   (6)

and

v_i = \frac{\sum_{k=1}^{n} (U_{ik})^m x_k}{\sum_{k=1}^{n} (U_{ik})^m}   (7)
In each iteration of the FCMC algorithm, the matrix U is computed using (6) and the associated cluster centers are computed using (7). This is followed by computing the squared error in (5). The algorithm stops when either the error is below a certain tolerance value or its improvement over the previous iteration is below a certain threshold. The FCMC algorithm cannot be used in situations where the number of clusters in a set of data is not known in advance. Since the introduction of FCMC, a reasonable amount of work has been done on finding the optimal number of clusters in a set of data. This is referred to as the cluster validity problem. Among the numerous cluster validity indices proposed in the literature, the most suitable proved to be the one proposed by Fukuyama and Sugeno (1989):

S(c) = \sum_{k=1}^{n} \sum_{i=1}^{c} (U_{ik})^m \left( \| x_k - v_i \|^2 - \| v_i - \bar{x} \|^2 \right)   (8)
2 < c < n, where n is the number of data points to be clustered; c is the number of clusters; x_k is the k-th datum; \bar{x} is the average of the data; v_i is the i-th cluster center; U_{ik} is the membership degree of the k-th datum with respect to the i-th cluster; and m is the fuzzy exponent. The number of clusters, c, is determined so that S(c) reaches a local minimum as c increases. The terms ||x_k − v_i|| and ||v_i − \bar{x}|| represent the variance within each cluster and the variance between clusters, respectively. Therefore, the optimal number of clusters is found by minimizing the distance between the data and their corresponding cluster centers while maximizing the separation between clusters. Other cluster validity indices can be found in (Yang and Wu, 2001).
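A compact illustration of the FCMC iteration (6)–(7) follows; it uses NumPy, fixes m = 2, and the toy data and the number of clusters are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Toy fuzzy c-means: alternate the membership update (6) and the
    center update (7) until the improvement of J_m in (5) is below tol."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                       # memberships of each datum sum to 1
    prev_J = np.inf
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)              # eq. (7): cluster centers
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        U = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)  # eq. (6)
        J = ((U ** m) * d ** 2).sum()                             # eq. (5): objective
        if prev_J - J < tol:
            break
        prev_J = J
    return U, V

X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
U, V = fcm(X, c=2)
print(np.round(V, 2))       # two centers, one near each group of points
```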
4.3 Hierarchical Fuzzy Modeling

Fuzzy clustering plays an important role in feature selection. The idea is that, given a set of clusters, the most important subset of features (true inputs) can be selected by considering its capability to separate the clusters (Tikk and Gedeon, 2000; Pal, 1992). We use the interclass separability criterion for this purpose. Consider a set of N input-output pairs F = {X; y}, X = {x_i | i ∈ I}, where I is the index set and x_i and y are column vectors. By deleting some features (input variables), we obtain a subspace \tilde{X} = {x_i | i ∈ \tilde{I}}, \tilde{I} ⊂ I. Suppose that the input X is clustered into clusters C_i (i = 1, . . . , N_c); then the criterion function for feature ranking based on the interclass separability is formulated by means of the following fuzzy between-class (9) and within-class (11) scatter (covariance) matrices:

Q_b = \sum_{i=1}^{N_c} \sum_{j=1}^{N} \mu_{ij}^m (v_i - \bar{v})(v_i - \bar{v})^T   (9)

Q_i = \frac{1}{\sum_{j=1}^{N} \mu_{ij}^m} \sum_{j=1}^{N} \mu_{ij}^m (x_j - v_i)(x_j - v_i)^T   (10)

Q_w = \sum_{i=1}^{N_c} Q_i   (11)

where

\bar{v} = \frac{1}{N_c} \sum_{i=1}^{N_c} v_i   (12)

Here, v_i is given by (7). The criterion is a trade-off between Q_b (9) and Q_w (11), often expressed as:

J(\tilde{X}) = \frac{tr(Q_b)}{tr(Q_w)}   (13)
where 'tr' denotes the trace of a matrix. In Tikk and Gedeon (2000), the set of classes C is determined by clustering the output space using fuzzy clustering algorithms such as fuzzy c-means (Bezdek, 1981). The elements of the resulting partition matrix U = {\mu_{ik} | i = 1 . . . N_c, k = 1 . . . N} are then used as weights in (9)–(12). Each feature can be ranked using the sequential backwards algorithm. Firstly, different subsets of data are obtained by temporarily deleting each feature. This is followed by permanently deleting the feature whose removal resulted in the largest criterion value. This process is repeated until all features are deleted, and the order of the deleted variables gives their rank of importance. From here, we obtain a set of features ordered (ascending) by their importance. The set of true inputs can then be determined by selecting the n most important features that minimize:

\sum_i \left\{ y_i - \mathrm{defuzz}\!\left( \frac{\mu_{ci}}{\sum_{j=1}^{c} \left( \| x_k - v_i \| / \| x_k - v_j \| \right)^{2/(m-1)}} \right) \right\}^2   (14)
where vi is given by (7). Here, defuzz(.) denotes a defuzzification method, such as the center of area (COA), that is used in the fuzzy inference.
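The following sketch computes the separability criterion (13) from the weighted scatter matrices (9)–(12), given a membership matrix U (e.g. from the FCM sketch above) and cluster centers V; the data and memberships are invented toy values, not the paper's.

```python
import numpy as np

def interclass_separability(X, U, V, m=2.0):
    """Return J = tr(Q_b) / tr(Q_w) using the fuzzy between-class (9) and
    within-class (11) scatter matrices; V holds the cluster centers (7)."""
    Nc, N = U.shape
    Um = U ** m
    v_bar = V.mean(axis=0)                                   # eq. (12)
    Qb = np.zeros((X.shape[1], X.shape[1]))
    Qw = np.zeros_like(Qb)
    for i in range(Nc):
        diff_c = (V[i] - v_bar)[:, None]
        Qb += Um[i].sum() * (diff_c @ diff_c.T)              # eq. (9)
        diffs = X - V[i]                                     # shape (N, dim)
        Qi = (Um[i][:, None, None] * (diffs[:, :, None] @ diffs[:, None, :])).sum(axis=0)
        Qw += Qi / Um[i].sum()                               # eqs. (10)-(11)
    return np.trace(Qb) / np.trace(Qw)

X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
U = np.array([[0.95, 0.9, 0.1, 0.05],
              [0.05, 0.1, 0.9, 0.95]])
V = (U**2 @ X) / (U**2).sum(axis=1, keepdims=True)           # centers as in (7)
print(round(interclass_separability(X, U, V), 3))            # large value: well separated
```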
4.3.1 Finding Π and Z_0

The main requirement of a reasonable Π is that each of its elements D_i can be modelled by a rule base with local validity. In this case, it is reasonable to expect D_i to contain homogeneous data. The problem of finding Π can thus be reduced to finding homogeneous structures within the data. This can be achieved by clustering algorithms. The subspace Z_0 is used by the meta-rules to select the most appropriate sub-rule base to infer the output for a given observation (i.e. system input). In general, the more separable the elements in Π are, the easier the sub-rule base selection becomes. Therefore, the proper subspace can be determined by means of some criterion that ranks the importance of different subspaces (combinations of variables) based on their capability in separating the components D_i. Unfortunately, the problems of finding Π and Z_0 are not independent of each other. Consider the following helicopter control fuzzy rules extracted from the hierarchical fuzzy system in (Sugeno, 1991):

If distance (from obstacle) is small then hover
Hover: if (helicopter) body rolls right then move lateral stick leftward
       if (helicopter) body pitches forward then move longitudinal stick backward

On one hand, we need to rely on the feature selection technique to find a "good" subspace (Z_0) for the clustering algorithm to be effective. On the other hand, most of the feature selection criteria assume the existence of the clusters beforehand. One way to tackle this problem is to adopt a projection based approach. The data are first projected to each individual dimension and fuzzy clustering is performed on the individual dimensions. The number of clusters C for clustering can be chosen arbitrarily large. With the set of one-dimensional fuzzy clusters obtained, the separability (importance) of each input dimension can be determined separately by computing (13). With the ranking, we can then select the n most important features to form the n-dimensional subspace Z_0. From here, a hierarchical fuzzy system can be obtained using the following steps:

1. Perform fuzzy c-means clustering on the data along the subspace Z_0. The optimal number of clusters, C, within the set of data is determined by means of the FS index (8).
2. The previous step results in a fuzzy partition Π = {D_1, . . . , D_C}. For each component in the partition, a meta-rule is formed as: If Z_0 is D_i then use R_i.
3. From the fuzzy partition Π, a crisp partition of the data points is constructed, i.e. for each fuzzy cluster D_i, the corresponding crisp cluster of points is determined as P_i = {p | μ_i(p) > μ_j(p) ∀ j ≠ i}.
4. Construct the sub-rule bases. For each crisp partition P_i, apply a feature extraction algorithm to eliminate unimportant features. The remaining features (known as true inputs) are then used by a fuzzy rule extraction algorithm to create the fuzzy rule base R_i. Here, we suggest the use of the projection-based fuzzy modeling approach (Chong et al., 2002). We remark, however, that the hierarchical fuzzy modeling scheme does not place any restriction on the technique used.

If more hierarchical levels are desired, repeat steps 1–4 with the data points in P_i using another subspace.
4.3.2 Modeling

Now, the proposed fuzzy modeling technique is described. The algorithm is as follows:

1. Rank the importance of the input variables using fuzzy clustering and the interclass separability criterion, and let F be the set of features ordered (ascending) by their importance.
2. For i = 1 . . . |F|:
   a. Construct a hierarchical fuzzy system using the subspace Z_0 = X_1 × . . . × X_i.
   b. Compute ε_i, the re-substitution error of the fuzzy system constructed during the i-th iteration.
   c. If i > 1 and ε_i > ε_{i-1}, stop.
3. Perform parameter tuning. The completed hierarchical fuzzy rule base then goes through a parameter identification process where the parameters of the membership functions used in the fuzzy rules are adjusted on a trial and error basis to improve the overall performance. The method proposed in (Sugeno and Yasukawa, 1993) adjusts each of the trapezoidal fuzzy set parameters in both directions and chooses the one that gives the best system performance.
5 Bacterial Memetic Algorithm

There are various successful evolutionary optimization algorithms known from the literature. The advantage of these algorithms is their ability to solve and quasi-optimize problems with a non-linear, high-dimensional, multi-modal, and discontinuous character. The original genetic algorithm was based on the process of evolution of biological organisms. These processes can be easily applied in optimization problems where one individual corresponds to one solution of the problem. A more recent evolutionary technique is called the bacterial evolutionary algorithm (Nawa and Furuhashi, 1999; Botzheim et al., 2002). This mimics microbial rather than eukaryotic evolution. Bacteria share chunks of their genes rather than perform neat crossover in chromosomes. This mechanism is used in the bacterial mutation and in the gene transfer operations. For the bacterial algorithm, the first step is to determine how the problem can be encoded in a bacterium (chromosome). Our task is to find the optimal fuzzy rule base for a pattern set. Thus, the parameters of the fuzzy rules must be encoded in the bacterium. The parameters of the rules are the breakpoints of the trapezoids, thus a bacterium will contain these breakpoints. For example, the encoding method of a fuzzy system with two inputs and one output can be seen in Fig. 2.

Fig. 2 Encoding of the fuzzy rules (the bacterium lists, rule by rule, the four trapezoid breakpoints a, b, c, d of each input variable and of the output)

Neural network training algorithms can also be used for fuzzy rule optimization (Botzheim et al., 2004). They result in an accurate local optimum. By incorporating the neural network training algorithm with the bacterial approach, the advantages of both methods can be utilized in the optimization process. The hybridization of these two methods leads to a new kind of memetic algorithm, because the bacterial technique is used instead of the classical genetic algorithm, with the Levenberg-Marquardt method as the local searcher. This method is the Bacterial Memetic Algorithm (BMA) (Botzheim et al., 2005). The flowchart of the algorithm can be seen in Fig. 3. The difference between the BEA and the BMA is that the latter contains a local searcher step, the Levenberg-Marquardt procedure.
5.1 Initial Population Creation

First the initial (random) bacterium population is created. The population consists of N_ind bacteria, which means that all membership functions in the bacteria must be randomly initialized. The initial number of rules in a single bacterium is a constant, N_rule. So, N_ind (k + 1) N_rule membership functions are created, where k is the number of input variables in the given problem, and each membership function has four parameters.
Fig. 3 Flowchart of the Bacterial Memetic Algorithm

5.2 Bacterial Mutation

The bacterial mutation is applied to each bacterium one by one (Nawa and Furuhashi, 1999). First, N_clones copies (clones) of the rule base are generated. Then a certain part of the bacterium (e.g. a rule) is randomly selected and the parameters of this selected part are randomly changed in each clone (mutation). Next all the clones and the original bacterium are evaluated by an error criterion. The best individual transfers the mutated part into the other individuals. This cycle is repeated for the remaining parts, until all parts of the bacterium have been mutated and tested. At the end the best rule base is kept and the remaining N_clones clones are discarded.
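A minimal sketch of this mutation loop follows; the encoding (a flat list of rules, each rule a list of four trapezoid breakpoints), the error function and all numeric choices are placeholders for illustration, not the authors' implementation.

```python
import random

def bacterial_mutation(bacterium, error, n_clones=3, seed=0):
    """For every rule: clone the bacterium, randomly perturb that rule in each clone,
    evaluate all variants, and keep the best one's mutated rule (bacterial mutation)."""
    rng = random.Random(seed)
    best = [list(rule) for rule in bacterium]
    for idx in range(len(best)):                     # mutate each part (rule) in turn
        candidates = [best] + [[list(r) for r in best] for _ in range(n_clones)]
        for clone in candidates[1:]:
            clone[idx] = sorted(rng.uniform(0.0, 10.0) for _ in range(4))  # random a<=b<=c<=d
        best = min(candidates, key=error)            # evaluation by the error criterion
    return best

def error(bacterium):
    # Placeholder fitness: prefer rules whose breakpoints are close to [2, 4, 6, 8].
    target = [2.0, 4.0, 6.0, 8.0]
    return sum((p - t) ** 2 for rule in bacterium for p, t in zip(rule, target))

bacterium = [[0.0, 1.0, 2.0, 3.0], [5.0, 6.0, 7.0, 8.0]]   # two toy rules
print(error(bacterium), error(bacterial_mutation(bacterium, error)))  # error never increases
```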
5.3 Levenberg-Marquardt Method

After the bacterial mutation step, the Levenberg-Marquardt algorithm (Marquardt, 1963) is applied to each bacterium. In this step a minimization criterion has to be employed as well, which is related to the quality of the fitting. The training criterion that will be employed is the usual Sum-of-Squares-of-Errors (SSE):

SSE = \frac{\| t - y \|^2}{2} = \frac{\| e[k] \|^2}{2}   (15)

where t stands for the target vector, y for the output vector, e for the error vector, and \| \cdot \| denotes the 2-norm. It will be assumed that there are m patterns in the training set.
The most commonly used method to minimize (15) is the Error-Back-Propagation (BP) algorithm, which is a steepest descent algorithm. The BP algorithm is a first-order method as it only uses derivatives of the first order. If no line-search is used, then it has no guarantee of convergence and the convergence rate obtained is usually very slow. If a second-order method is to be employed, the best to use is the Levenberg-Marquardt (LM) algorithm (Marquardt, 1963), which explicitly exploits the underlying structure (sum-of-squares) of the optimization problem at hand. Denote by J the Jacobian matrix:

J[k] = \frac{\partial y(x^{(p)})}{\partial par}[k]   (16)

where the vector par contains all membership function parameters (all breakpoints in the membership functions), and k is the iteration variable. The new parameter values can be obtained by the update rule of the LM method:

par[k + 1] = par[k] - \left( J^T[k] J[k] + \alpha I \right)^{-1} J^T[k] e[k]   (17)

In (17), α is a regularization parameter, which controls both the search direction and the magnitude of the update. The search direction varies between the Gauss-Newton direction and the steepest descent direction, according to the value of α. This is dependent on how well the actual criterion agrees with a quadratic function in a particular neighborhood. The good results presented by the LM method (compared with other second-order methods such as the quasi-Newton and conjugate gradient methods) are due to the explicit exploitation of the underlying characteristics of the optimization problem (a sum of squares of errors) by the training algorithm. The Jacobian matrix with respect to the parameters in the bacterium must be computed. This can be done on a pattern by pattern basis (Botzheim et al., 2004; Ruano et al., 2001).
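To illustrate the update rule (17), the sketch below performs repeated LM steps for a generic parameter vector, given a function that returns the model outputs and the Jacobian over all patterns; the linear toy model is only a stand-in for the fuzzy system, and the error is defined as e = y − t for the sign convention used here.

```python
import numpy as np

def lm_step(par, targets, model_and_jacobian, alpha=0.1):
    """One Levenberg-Marquardt update in the spirit of eq. (17):
    par <- par - (J^T J + alpha I)^(-1) J^T e,  with e = y - t."""
    y, J = model_and_jacobian(par)             # outputs (m,) and Jacobian (m, n_par)
    e = y - targets                            # error vector over the m patterns
    H = J.T @ J + alpha * np.eye(par.size)     # regularized Gauss-Newton matrix
    return par - np.linalg.solve(H, J.T @ e)

# Toy stand-in for the fuzzy model: y(x) = p0 + p1 * x evaluated on fixed inputs.
xs = np.array([0.0, 1.0, 2.0, 3.0])
targets = np.array([1.0, 3.0, 5.0, 7.0])       # generated by p = (1, 2)

def model_and_jacobian(par):
    y = par[0] + par[1] * xs
    J = np.stack([np.ones_like(xs), xs], axis=1)   # dy/dp0, dy/dp1 per pattern
    return y, J

par = np.array([0.0, 0.0])
for _ in range(20):
    par = lm_step(par, targets, model_and_jacobian)
print(np.round(par, 3))                         # approaches [1, 2]
```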
Jacobian computation

Because trapezoidal membership functions are used and each trapezoid has four parameters, the relative importance of the j-th fuzzy variable in the i-th rule is:

\mu_{ij}(x_j) = N_{i,j,1}(x_j) \frac{x_j - a_{ij}}{b_{ij} - a_{ij}} + N_{i,j,2}(x_j) + N_{i,j,3}(x_j) \frac{d_{ij} - x_j}{d_{ij} - c_{ij}}   (18)

where a_{ij} ≤ b_{ij} ≤ c_{ij} ≤ d_{ij} must hold and:

N_{i,j,1}(x_j) = 1 if x_j ∈ [a_{ij}, b_{ij}], 0 otherwise
N_{i,j,2}(x_j) = 1 if x_j ∈ [b_{ij}, c_{ij}], 0 otherwise
N_{i,j,3}(x_j) = 1 if x_j ∈ [c_{ij}, d_{ij}], 0 otherwise   (19)

The activation degree of the i-th rule (the t-norm is the minimum) is:

w_i = \min_{j=1}^{n} \mu_{ij}(x_j)   (20)
where n is the number of input dimensions, w_i is the importance of the i-th rule if the input vector is x, and μ_{ij}(x_j) is the j-th membership function in the i-th rule. The i-th output membership function is cut at the height w_i. Then the output is calculated with the COG (centre of gravity) defuzzification method:
y(x) = \frac{\sum_{i=1}^{R} \int_{y \in supp\, \mu_i(y)} y\, \mu_i(y)\, dy}{\sum_{i=1}^{R} \int_{y \in supp\, \mu_i(y)} \mu_i(y)\, dy}   (21)

where R is the number of rules. If this defuzzification method is used, the integrals can be easily computed. With (21), y(x) will be the following:

y(x) = \frac{1}{3} \cdot \frac{\sum_{i=1}^{R} (C_i + D_i + E_i)}{\sum_{i=1}^{R} \left( 2 w_i (d_i - a_i) + w_i^2 (c_i + a_i - d_i - b_i) \right)}   (22)

C_i = 3 w_i (d_i^2 - a_i^2)(1 - w_i)
D_i = 3 w_i^2 (c_i d_i - a_i b_i)
E_i = w_i^3 (c_i - d_i + a_i - b_i)(c_i - d_i - a_i + b_i)

Then, the Jacobian matrix can be written as:

J = \left[ \frac{\partial y(x^{(p)})}{\partial a_{11}}\; \frac{\partial y(x^{(p)})}{\partial b_{11}} \cdots \frac{\partial y(x^{(p)})}{\partial a_{12}} \cdots \frac{\partial y(x^{(p)})}{\partial d_1} \cdots \frac{\partial y(x^{(p)})}{\partial d_R} \right]   (23)

where
\frac{\partial y(x^{(p)})}{\partial a_{ij}} = \frac{\partial y}{\partial w_i} \frac{\partial w_i}{\partial \mu_{ij}} \frac{\partial \mu_{ij}}{\partial a_{ij}}, \quad
\frac{\partial y(x^{(p)})}{\partial b_{ij}} = \frac{\partial y}{\partial w_i} \frac{\partial w_i}{\partial \mu_{ij}} \frac{\partial \mu_{ij}}{\partial b_{ij}}, \quad
\frac{\partial y(x^{(p)})}{\partial c_{ij}} = \frac{\partial y}{\partial w_i} \frac{\partial w_i}{\partial \mu_{ij}} \frac{\partial \mu_{ij}}{\partial c_{ij}}, \quad
\frac{\partial y(x^{(p)})}{\partial d_{ij}} = \frac{\partial y}{\partial w_i} \frac{\partial w_i}{\partial \mu_{ij}} \frac{\partial \mu_{ij}}{\partial d_{ij}}   (24)

From (20), w_i depends on the membership functions, and each membership function depends only on four parameters (breakpoints). So, the derivatives of w_i are

\frac{\partial w_i}{\partial \mu_{ij}} = \begin{cases} 1, & \text{if } \mu_{ij} = \min_{k=1}^{n} \mu_{ik} \\ 0, & \text{otherwise} \end{cases}   (25)

and the derivatives of the membership functions will be

\frac{\partial \mu_{ij}}{\partial a_{ij}} = N_{i,j,1}(x_j^{(p)}) \frac{x_j^{(p)} - b_{ij}}{(b_{ij} - a_{ij})^2}, \quad
\frac{\partial \mu_{ij}}{\partial b_{ij}} = N_{i,j,1}(x_j^{(p)}) \frac{a_{ij} - x_j^{(p)}}{(b_{ij} - a_{ij})^2}, \quad
\frac{\partial \mu_{ij}}{\partial c_{ij}} = N_{i,j,3}(x_j^{(p)}) \frac{d_{ij} - x_j^{(p)}}{(d_{ij} - c_{ij})^2}, \quad
\frac{\partial \mu_{ij}}{\partial d_{ij}} = N_{i,j,3}(x_j^{(p)}) \frac{x_j^{(p)} - c_{ij}}{(d_{ij} - c_{ij})^2}   (26)

The derivative \partial y / \partial w_i and the derivatives with respect to the output membership function parameters also have to be computed. From (22):

\frac{\partial y}{\partial *_{i^*}} = \frac{1}{3} \cdot \frac{den \cdot \dfrac{\partial F_{i^*}}{\partial *_{i^*}} - num \cdot \dfrac{\partial G_{i^*}}{\partial *_{i^*}}}{(den)^2}   (27)

where *_i = w_i, a_i, b_i, c_i, d_i; den and num are the denominator and the numerator of (22), respectively; and F_{i^*} and G_{i^*} are the i*-th members of the sums in the numerator and the denominator. The derivatives will be as follows:

\frac{\partial F_i}{\partial w_i} = 3 (d_i^2 - a_i^2)(1 - 2 w_i) + 6 w_i (c_i d_i - a_i b_i) + 3 w_i^2 \left[ (c_i - d_i)^2 - (a_i - b_i)^2 \right]
\frac{\partial G_i}{\partial w_i} = 2 (d_i - a_i) + 2 w_i (c_i + a_i - d_i - b_i)   (28)

\frac{\partial F_i}{\partial a_i} = -6 w_i a_i + 6 w_i^2 a_i - 3 w_i^2 b_i - 2 w_i^3 (a_i - b_i), \quad \frac{\partial G_i}{\partial a_i} = -2 w_i + w_i^2
\frac{\partial F_i}{\partial b_i} = -3 w_i^2 a_i + 2 w_i^3 (a_i - b_i), \quad \frac{\partial G_i}{\partial b_i} = -w_i^2
\frac{\partial F_i}{\partial c_i} = 3 w_i^2 d_i - 2 w_i^3 (d_i - c_i), \quad \frac{\partial G_i}{\partial c_i} = w_i^2
\frac{\partial F_i}{\partial d_i} = 6 w_i d_i - 6 w_i^2 d_i + 3 w_i^2 c_i + 2 w_i^3 (d_i - c_i), \quad \frac{\partial G_i}{\partial d_i} = 2 w_i - w_i^2   (29)
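The inference pipeline behind these derivatives, namely trapezoidal memberships (18)–(19), minimum activation (20), and the closed-form COG output (22), can be sketched as follows; the two-rule base and all numbers are made up for illustration and are not taken from the paper.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership (18)-(19) with breakpoints a <= b <= c <= d."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def infer(rules, x):
    """Min activation (20) and the closed-form centre-of-gravity output (22)
    for trapezoidal consequents cut at height w_i."""
    num, den = 0.0, 0.0
    for antecedents, (a, b, c, d) in rules:
        w = min(trapezoid(xj, *abcd) for xj, abcd in zip(x, antecedents))   # eq. (20)
        C = 3 * w * (d**2 - a**2) * (1 - w)
        D = 3 * w**2 * (c*d - a*b)
        E = w**3 * (c - d + a - b) * (c - d - a + b)
        num += C + D + E
        den += 2 * w * (d - a) + w**2 * (c + a - d - b)
    return num / (3 * den) if den else 0.0

# Two rules with one input each: antecedent trapezoid -> output trapezoid.
rules = [([(0, 0, 2, 4)], (0, 1, 2, 3)),
         ([(2, 4, 6, 8)], (4, 5, 6, 7))]
print(round(infer(rules, [3.0]), 3))   # -> 3.5, midway between the two consequent centroids
```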
5.4 Gene Transfer

The gene transfer operation allows the recombination of genetic information between two bacteria. First the population must be divided into two halves: the better bacteria are called the superior half, the other bacteria the inferior half. One bacterium is randomly chosen from the superior half (the source bacterium) and another is randomly chosen from the inferior half (the destination bacterium). A part (e.g. a rule) of the source bacterium is chosen and this part overwrites a rule of the destination bacterium. This cycle is repeated N_inf times, where N_inf is the number of "infections" per generation.
5.5 Stop Condition

If the population satisfies a stop condition or the maximum number of generations N_gen is reached, the algorithm ends; otherwise it returns to the bacterial mutation step.
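Putting the pieces together, the overall BMA loop of Fig. 3 can be paraphrased in skeleton form as below; `bacterial_mutation`, `lm_refine`, `gene_transfer` and `error` stand for the operations described above, their signatures are invented for this sketch, and the stubs in the smoke test are trivial placeholders rather than real operators.

```python
import random

def bacterial_memetic_algorithm(population, error, n_gen, n_inf,
                                bacterial_mutation, lm_refine, gene_transfer, seed=0):
    """Skeleton of the BMA loop of Fig. 3: bacterial mutation, Levenberg-Marquardt
    refinement, then gene transfer between the superior and inferior halves."""
    rng = random.Random(seed)
    for _ in range(n_gen):
        population = [bacterial_mutation(b, error) for b in population]   # Sect. 5.2
        population = [lm_refine(b, error) for b in population]            # Sect. 5.3
        for _ in range(n_inf):                                            # Sect. 5.4
            population.sort(key=error)                                    # superior half first
            half = max(1, len(population) // 2)
            src = rng.choice(population[:half])                           # source: superior half
            dst = rng.randrange(half, len(population))                    # destination: inferior half
            population[dst] = gene_transfer(src, population[dst])
    return min(population, key=error)                                     # best rule base found

# Tiny smoke test with stub operators (identity mutation/refinement, whole-bacterium copy).
pop = [[[1.0, 2.0, 3.0, 4.0]], [[0.0, 1.0, 2.0, 9.0]]]
err = lambda b: sum(abs(p - t) for r in b for p, t in zip(r, [1, 2, 3, 4]))
best = bacterial_memetic_algorithm(pop, err, n_gen=2, n_inf=1,
                                   bacterial_mutation=lambda b, e: b,
                                   lm_refine=lambda b, e: b,
                                   gene_transfer=lambda s, d: [list(r) for r in s])
print(err(best))   # -> 0.0 for this toy setup
```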
6 Conclusion Fuzzy rule base reduction methods and fuzzy modeling techniques were introduced in this paper. We discussed how the complexity of a fuzzy rule base can be reduced. The classical fuzzy models deal with dense rule bases where the universe of discourse is fully covered. The reduction of the number of linguistic terms in each dimension leads to sparse rule bases and rule interpolation. By decreasing the dimension of the sub-rule bases by using meta-levels or hierarchical fuzzy rule bases the complexity can also be reduced. It is also possible to apply interpolation in hierarchical rule bases. The paper discussed automatic methods to determine the fuzzy rule base. After an overview of some clustering methods, the bacterial memetic algorithm was introduced. This approach combines the bacterial evolutionary algorithm and a gradient based learning method, namely the Levenberg-Marquardt procedure. By this combination the advantages of both methods can be exploited. Acknowledgments Supported by the Széchenyi University Main Research Direction Grant 2005, a National Scientific Research Fund Grant OTKA T048832 and the Australian Research Council. Special acknowledgements to Prof. Antonio Ruano and Cristiano Cabrita (University of Algarve, Faro, Portugal) and to Dr. Alex Chong (formerly with Murdoch University, Perth, Australia).
References

P. Baranyi, D. Tikk, Y. Yam, and L. T. Kóczy, "Investigation of a New Alpha-cut Based Fuzzy Interpolation Method", Tech. Rep., Dept. of Mechanical and Automation Engineering, The Chinese University of Hong Kong, 1999.
J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, New York: Plenum Press, 1981.
J. Botzheim, B. Hámori, L. T. Kóczy, and A. E. Ruano, "Bacterial algorithm applied for fuzzy rule extraction", in Proceedings of the International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, IPMU 2002, pp. 1021–1026, Annecy, France, 2002.
J. Botzheim, C. Cabrita, L. T. Kóczy, and A. E. Ruano, "Estimating Fuzzy Membership Functions Parameters by the Levenberg-Marquardt Algorithm", in Proceedings of the IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2004, pp. 1667–1672, Budapest, Hungary, 2004.
J. Botzheim, C. Cabrita, L. T. Kóczy, and A. E. Ruano, "Fuzzy rule extraction by bacterial memetic algorithm", in Proceedings of the 11th World Congress of the International Fuzzy Systems Association, IFSA 2005, pp. 1563–1568, Beijing, China, 2005.
A. Chong, T. D. Gedeon, and L. T. Kóczy, "Projection Based Method for Sparse Fuzzy System Generation", in Proceedings of the 2nd WSEAS International Conference on Scientific Computation and Soft Computing, pp. 321–325, Crete, 2002.
Y. Fukuyama and M. Sugeno, "A new method of choosing the number of clusters for the fuzzy c-means method", in Proceedings of the 5th Fuzzy System Symposium, 1989.
T. D. Gedeon and L. T. Kóczy, "Conservation of fuzziness in rule interpolation", Intelligent Technologies, vol. 1, International Symposium on New Trends in Control of Large Scale Systems, Herlany, pp. 13–19, 1996.
T. D. Gedeon, P. M. Wong, Y. Huang, and C. Chan, "Two Dimensional Fuzzy-Neural Interpolation for Spatial Data", in Proceedings of Geoinformatics'97: Mapping the Future of the Asia-Pacific, Taiwan, vol. 1, pp. 159–166, 1997.
T. D. Gedeon, K. W. Wong, P. M. Wong, and Y. Huang, "Spatial Interpolation Using Fuzzy Reasoning", Transactions on Geographic Information Systems, 7(1), pp. 55–66, 2003.
J. Ihara, "Group method of data handling towards a modelling of complex systems – IV", Systems and Control (in Japanese), 24, pp. 158–168, 1980.
L. T. Kóczy and K. Hirota, "Approximate reasoning by linear rule interpolation and general approximation", International Journal of Approximate Reasoning, 9, pp. 197–225, 1993a.
L. T. Kóczy and K. Hirota, "Interpolative reasoning with insufficient evidence in sparse fuzzy rule bases", Information Sciences, 71, pp. 169–201, 1993b.
L. T. Kóczy and K. Hirota, "Interpolation in structured fuzzy rule bases", in Proceedings of the IEEE International Conference on Fuzzy Systems, FUZZ-IEEE'93, pp. 803–808, San Francisco, 1993c.
E. H. Mamdani and S. Assilian, "An experiment in linguistic synthesis with a fuzzy logic controller", Int. J. Man-Mach. Stud., vol. 7, pp. 1–13, 1975.
D. Marquardt, "An Algorithm for Least-Squares Estimation of Nonlinear Parameters", J. Soc. Indust. Appl. Math., 11, pp. 431–441, 1963.
N. E. Nawa and T. Furuhashi, "Fuzzy System Parameters Discovery by Bacterial Evolutionary Algorithm", IEEE Transactions on Fuzzy Systems, vol. 7, pp. 608–616, 1999.
S. K. Pal, "Fuzzy set theoretic measure for automatic feature evaluation – II", Information Sciences, pp. 165–179, 1992.
A. E. Ruano, C. Cabrita, J. V. Oliveira, L. T. Kóczy, and D. Tikk, "Supervised Training Algorithms for B-Spline Neural Networks and Fuzzy Systems", in Proceedings of the Joint 9th IFSA World Congress and 20th NAFIPS International Conference, Vancouver, Canada, 2001.
M. Sugeno and T. Yasukawa, "A fuzzy-logic-based approach to qualitative modelling", IEEE Transactions on Fuzzy Systems, 1(1), pp. 7–31, 1993.
M. Sugeno, T. Murofushi, J. Nishino, and H. Miwa, "Helicopter flight control based on fuzzy logic", in Proceedings of IFES'91, Yokohama, 1991.
T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modeling and control", IEEE Trans. Syst., Man, Cybern., vol. SMC-15, pp. 116–132, 1985.
D. Tikk and T. D. Gedeon, "Feature ranking based on interclass separability for fuzzy control application", in Proceedings of the International Conference on Artificial Intelligence in Science and Technology (AISAT'2000), pp. 29–32, Hobart, 2000.
D. Tikk, Gy. Biró, T. D. Gedeon, L. T. Kóczy, and J. D. Yang, "Improvements and Critique on Sugeno's and Yasukawa's Qualitative Modeling", IEEE Transactions on Fuzzy Systems, vol. 10, no. 5, October 2002.
L. X. Wang and J. M. Mendel, "Generating fuzzy rules by learning from examples", IEEE Transactions on Systems, Man, and Cybernetics, 22(6), pp. 1414–1427, 1992.
K. W. Wong, C. C. Fung, and P. M. Wong, "A self-generating fuzzy rules inference system for petrophysical properties prediction", in Proceedings of the IEEE International Conference on Intelligent Processing Systems, Beijing, 1997.
M. S. Yang and K. L. Wu, "A New Validity Index For Fuzzy Clustering", in Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 89–92, Melbourne, 2001.
L. A. Zadeh, "Outline of a new approach to the analysis of complex systems", IEEE Trans. Syst., Man, Cybern., vol. SMC-1, pp. 28–44, 1973.
Computing with Antonyms E. Trillas1 , C. Moraga2, S. Guadarrama3, S. Cubillo and E. Castiñeira
Abstract This work tries to follow some agreements linguists seem to have on the semantic concept of antonym, and to model by means of a membership function an antonym aP of a predicate P whose use is known through a given μ_P.

Keywords: Antonym · Negate · Computing with Words · Semantics
1 Introduction

Nature shows a lot of geometrical symmetries, and it is today beyond discussion that the mathematical concept of symmetry is of paramount importance for Nature's scientific study. Language also shows the relevant symmetry known as antonymy which, as far as the authors know, has not received much attention from the point of view of Classical Logic, perhaps because of the scarcely syntactical character of that phenomenon of language. However, maybe because Fuzzy Logic deals more than Classical Logic with semantic aspects of language, the concept of antonym was considered early in Fuzzy Logic (see [1]) and originated some theoretical studies, such as papers [4] and [3]. In any dictionary an antonym of a word P is defined as one of its opposite words aP, and one can identify "opposite" as "reached by some symmetry on the universe of discourse X" on which the word P is used.
E. Trillas · C. Moraga · S. Guadarrama
Technical University of Madrid, Department of Artificial Intelligence
S. Cubillo · E. Castiñeira
Technical University of Madrid, Department of Applied Mathematics, Madrid, Spain

1 Corresponding author: [email protected]. This work has been partially supported by CICYT (Spain) under project TIN2005-08943-C02-01.
2 The work of C. Moraga has been supported by the Spanish State Secretary of Education and Universities of the Ministry of Education, Culture and Sports (Grant SAB2000-0048) and by the Social Fund of the European Community.
3 The work of S. Guadarrama has been supported by the Ministry of Education, Culture and Sports of Spain (Grant AP2000-1964).
In [11] it was settled that the membership function μ P of a fuzzy set in X labelled P (equivalently, the compatibility function for the use on X of predicate P) is nothing else than a general measure on the set X (P) of atomic statements “x is P” for any x in X, provided that a preorder translating the secondary use of P on X has been established on X (P).
2 Fuzzy Sets

Let P be a predicate, proper name of a property, common name or linguistic label. The use of P on a universe of objects X not only depends on both X and P themselves but also on the way P is related to X. Three aspects of the use of a predicate P on X can be considered:

• Primary use, or the descriptive use of the predicate, expressed by the statements "x is P", for x in X.
• Secondary use, or the comparative use of the predicate, expressed by the statements "x is less P than y", for x, y in X.
• Tertiary use, or the quantitative use of the predicate, expressed by the statements "degree up to which x is P", for x in X.

A fuzzy set labeled P on a universe of discourse X is nothing else than a function μ_P : X → [0, 1]. On the other hand, when using P on X by means of a numerical characteristic of P, φ_P : X → S, the atomic statement "x is P" is made equivalent to a corresponding statement "φ_P(x) is P*"; this is the case, for example, of interpreting "x is cold" as "temperature of x is low", where low is predicated on an interval S = [a, b] of numerical values a temperature can take. At this point, it is not to be forgotten that six different sets are under consideration: the ground set X, the set X(P) of the statements "x is P", the set of the values of a numerical characteristic φ, the set of the statements "φ(x) is P*", the partially ordered set L of the degrees up to which "x is P", and the partially ordered set L* of the degrees up to which "φ(x) is P*" (see Fig. 1). We will say that the use of P on X is R-measurable (see [11]) if there exists a mapping φ : X → S, with S a non-empty subset of R endowed with a pre-order ≤_P* and a ≤_P*-measure m on S, such that:

1. φ is injective.
2. φ(x) ≤_P* φ(y) iff "x is less P than y", for x, y in X.
3. The degree up to which "x is P" is given by m(φ(x)) for each x in X.

After an experimental process, a mathematical model for the use of P on X is obtained. Of course the model can be reached thanks to the fact that P is used on X by means of a numerical characteristic given by φ. To measure is, after all, to evaluate something that the elements of a given set exhibit, relative to some characteristic they share. We always measure concrete aspects of some elements; for example, the height but not the volume of a building,
the wine's volume in barrels but not the wine's alcoholic rank, the probability of getting an odd score when throwing a die but not the die's size, and so on. Hence, at the very beginning we consider a set X of objects sharing, possibly among others, a common characteristic K of which we know how it varies on X, that is, for all x, y ∈ X we know whether it is or it is not the case that "x shows K less than y shows K". Therefore, we at least know the qualitative and perception-based relation ≤_K on X × X defined as: "x ≤_K y if and only if x shows K less than y shows K". Since it does not seem too strong to assume that this relation verifies the reflexive and transitive properties, we will consider that ≤_K is a pre-order. A fuzzy set P on a universe of discourse X is only known, as discussed earlier, once a membership function μ_P : X → [0, 1] is fixed, and provided that this function reflects the meaning of the statements "x is P" for all x in X. This is the same as to consider that μ_P translates the use of P on X, and in this sense μ_P is a model of this use.

Fig. 1 Sets under consideration (the ground set X, the set X(P) of statements "x is P", the set S of values φ(x), the set X(P*) of statements "φ(x) is P*", and the ordered sets (L, ≤_P) and (L*, ≤_P*) of degrees)
3 Antonyms

3.1 Linguistic Aspects of Antonyms

Antonymy is a phenomenon of language linking couples of words (P, Q), which grammarians call pairs of antonyms, and it seems very difficult to define and study in a formal way. Language is too complex to admit strictly formal definitions as happens in other domains after experimentation and mathematical modelling. Nevertheless, there are some properties that different authors attribute to a pair of antonyms (see [5], [6], [7], [8]). These properties can be summarized in the following points:
1. In considering a pair (P, Q) of antonyms, Q is an antonym of P and P an antonym of Q. Then, from Q = aP and P = aQ it follows that P = a(aP) and Q = a(aQ): in a single pair of antonyms the relation established by (P, aP) is a symmetry.

2. For any object x such that the statements "x is P" and "x is aP" are meaningful, it holds that "if x is aP, then it is not the case that x is P". The converse, "if it is not the case that x is P, then x is aP", does not hold. A clear example is given by the pair (full, empty): we say "if the bottle is empty, it is not full" but not "if the bottle is not full, it is empty". In short, aP implies Not-P, but it is not the case that Not-P implies aP, except in some particular situations in which it happens that (P, Not-P) is a pair of antonyms. That is, aP = Not-P happens only as a limiting case.

3. Pairs of antonyms (P, Q) are "disjoint". This point is difficult to understand without some kind of formalization allowing us to know precisely what grammarians understand by disjoint. Anyway, point (2) suggests that it means "contradiction", a concept that, as is well known, coincides with that of "incompatibility" in Boolean algebras, but not always in other logical structures.

4. The more applicable a predicate P is to objects, the less applicable aP is to these objects, and the less applicable P is, the more applicable aP is. For example, with the pair (tall, short) and a subject x, the taller x is, the less short x is, and the less tall x is, the shorter she/he is.

5. For aP to be a "good" antonym of P, among the objects x to which P and aP apply there should be a zone of neutrality, that is, a subset of X to which neither P nor aP can be applied.

In what follows we will try to take advantage of points 1 to 5 by translating these properties to the compatibility functions μ_P and μ_aP, to obtain the basic relations that a fuzzy set μ_aP should verify to be considered the antonym of a given fuzzy set μ_P, and to define some types of antonymy, synthesizing and extending the work done in [10], [13] and [14].
3.2 Antonyms and Fuzzy Sets

Since symmetries will be used below, we specify them first in a general way. As was explained in Sect. 2, there are three sets on which symmetries are to be considered: the universe [a, b] of numerical values that the elements of the universe of discourse X may take through the numerical characteristic of P, the unit interval [0, 1], and the set F([a, b]) of all fuzzy sets of [a, b]. Symmetries on [a, b] are just mappings α : [a, b] → [a, b] such that α² = id_{[a,b]} and α(a) = b (and hence α(b) = a). Symmetries on [0, 1] are mappings N : [0, 1] → [0, 1] such that N² = id_{[0,1]} and N(0) = 1 (and hence N(1) = 0). And a symmetry on F([a, b]) should be a mapping A : F([a, b]) → F([a, b]) such that A² = id_{F([a,b])} and A(μ_∅) = μ_{[a,b]} (and hence A(μ_{[a,b]}) = μ_∅).
It is easy to prove that for each couple of symmetries (N, α) the combined mapping N/α : F([a, b]) → F([a, b]), defined by (N/α)(μ) = N ∘ μ ∘ α, is a symmetry on F([a, b]) (see [13]). Furthermore, if N is a strong negation (that is, x ≤ y implies N(y) ≤ N(x)), then N/α is an involution (that is, a dual automorphism of period two) on F([a, b]). S. Ovchinnikov (see [4]) proved that any such involution A : F([a, b]) → F([a, b]) is of the form A(μ)(x) = N_x(μ(α(x))) for a symmetry α on [a, b] and a family of dual automorphisms {N_x ; x ∈ [a, b]} on [0, 1] such that N_{α(x)} = N_x^{-1}. Of course, if α = id_{[a,b]}, the functions N_x are strong negations. Negations N/α are not always functionally expressible, that is, there does not generally exist a symmetry N* : [0, 1] → [0, 1] such that (N/α)(μ) = N* ∘ μ. For example, if X = [0, 1] with N = α = 1 − id_{[0,1]}, it results that (N/α)(μ)(x) = 1 − μ(1 − x), and (N/α)(μ)(0) = 1 − μ(1); it is enough to take μ such that μ(1) = 1, μ(0) = 0 to have (N/α)(μ)(0) = 0, but N* ∘ μ(0) = N*(μ(0)) = N*(0) = 1. Let us try to model point 4 of the former section. If μ_P and μ_aP are the compatibility functions of the predicates P and aP in the universe of discourse X, the implication "if 0 < μ_P(x) ≤ μ_P(y), then μ_aP(y) ≤ μ_aP(x)" must be satisfied. Nevertheless, this condition would agree with the requirements only when working with predicates that, translated to their numerical characteristic in [a, b], are monotonic, that is, when the preorder induced by the predicate P in [a, b] (see [11]) is a total order, but not in general; this is why a new point of view on point 4 of the former section could be necessary.
3.3 Example 1

Given the predicate P = "close to 4" on the interval [0, 10], it should be aP = "far from 4". But taking the involution α given by α(x) = 10 − x, it results that μ_aP = μ_P ∘ α which, as shown in Fig. 2, obviously does not correspond to "far from 4".
Fig. 2 Solid line: Close to 4. Dotted line: Close to 6
Fig. 3 Solid line: Close to 4. Dotted line: Far from 4
If we consider α defined through the two intervals [0, 4] and [4, 10], with α_1 : [0, 4] → [0, 4], α_1(x) = 4 − x, and α_2 : [4, 10] → [4, 10], α_2(x) = 14 − x, we obtain the μ_aP shown in Fig. 3, which better reflects "far from 4".

Since the use of a predicate P on X is here viewed as:

1. the ordering ≤_P given in X by "x is less P than y" (the comparative use of P), and
2. a ≤_P-measure μ(x is P) = μ_P(x) (measuring to what extent x is P),

it seems reasonable that the use of aP should be viewed as:

1. the ordering ≤_aP = ≤_P^{-1}, and
2. a ≤_aP-measure μ(x is aP) = μ(α_P(x) is P), or μ_aP(x) = μ_P(α_P(x)), with α_P a symmetry (involution) for ≤_P^{-1} = ≤_aP.

As the ordering ≤_P is obtained in practice from the function μ_P by looking at the subintervals of [a, b] on which μ_P is either decreasing or non-decreasing, it results that α_P should be an involution (symmetry) on each one of them. Then, what is most natural is to define α : [a, b] → [a, b] by

α(x) = α_i(x), if x ∈ [a_{i−1}, a_i], i = 1, . . . , n (with a_0 = a, a_n = b),

with involutions α_i : [a_{i−1}, a_i] → [a_{i−1}, a_i]. Of course, what is not obtained is that α reverses the order ≤; it reverses the order ≤_P instead. Hence, to obtain a perfect antonym (see below, Definition 4.17) of P by using μ_P, the following steps should be observed:

1. Determine the order ≤_P by looking at the function μ_P.
2. Consider the subintervals [a_{i−1}, a_i].
3. Construct an α_i in each interval [a_{i−1}, a_i].
4. Obtain μ_aP(x) = μ_P(α_i(x)), if x ∈ [a_{i−1}, a_i].
It is immediate that this is the algorithm used in the example with the subintervals [0, 4] and [4, 10].
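To make the four steps concrete, here is a small computational sketch of Example 1 (not part of the original text): the triangular shape chosen for "close to 4" and the sampling grid are our own illustrative assumptions, since the paper gives the membership function only graphically (Figs. 2 and 3).

```python
# Sketch of the four-step perfect-antonym construction applied to Example 1.
# mu_close_to_4 is an assumed triangular membership; the paper shows it only in a figure.

def mu_close_to_4(x):
    return max(0.0, 1.0 - abs(x - 4.0) / 4.0)      # assumed shape of mu_P on [0, 10]

def alpha(x):
    # involutions on the two subintervals where mu_P is monotonic (steps 2 and 3)
    return 4.0 - x if x <= 4.0 else 14.0 - x        # alpha_1 on [0, 4], alpha_2 on [4, 10]

def mu_far_from_4(x):                               # step 4: mu_aP(x) = mu_P(alpha_i(x))
    return mu_close_to_4(alpha(x))

for x in [0, 2, 4, 6, 8, 10]:
    print(x, round(mu_close_to_4(x), 2), round(mu_far_from_4(x), 2))
# mu_aP reaches 1 at the endpoints 0 and 10 and vanishes at 4, as "far from 4" should.
```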
3.4 Example 2

Let us consider the predicate P = "very close to 0 or close to 10", given in X = [0, 10] by the membership function in Fig. 4:

μ_P(x) = (x − 4)²/16 if 0 ≤ x ≤ 4;  0 if 4 < x ≤ 6;  (x − 6)/4 if 6 < x ≤ 10.

Fig. 4 Representation of "very Close to 0 or Close to 10"
A possible way to find an antonym would be, as in [14], to consider two strong negations on two intervals of X, for example α_1 : [0, 5] → [0, 5] given by α_1(x) = 5 − x and α_2 : [5, 10] → [5, 10] given by α_2(x) = 15 − x. So, the fuzzy set μ_aP(x) = μ_P(α(x)), where α(x) = α_1(x) if x ∈ [0, 5] and α(x) = α_2(x) if x ∈ [5, 10], will be the one represented in Fig. 5. Now this fuzzy set can be understood as a membership function of the antonym of P: "Very far from 0 and far from 10". Notice that μ_P(5/2) = 9/64 ≤ μ_P(15/2) = 3/8 and also μ_aP(5/2) = 9/64 ≤ μ_aP(15/2) = 3/8. This leads us to the necessity of considering the condition "if μ_P(x) ≤ μ_P(y), then μ_aP(x) ≥ μ_aP(y)" in each interval in which μ_P is monotonic, that is, to consider the preorder induced by μ_P.

As was introduced in [11], if μ_P : X → [0, 1] is the membership function of a predicate P in X and, [a, b] being the interval in which its numerical characteristic takes values, there exists a partition {c_0, c_1, . . . , c_n} of this interval in such a way that the functions μ̃_P|_[c_{i−1}, c_i], with μ̃_P = μ_P ∘ (Φ_P)^{-1}, are monotonic, then a preorder (n-sharpened) can be defined in [a, b] in the following way: if r, s ∈ [a, b], then r ≤_{μP} s if and only if there exists i ∈ {1, . . . , n} such that r, s ∈ [c_{i−1}, c_i] and μ̃_P(r) ≤ μ̃_P(s). Furthermore, the predicate P also induces a preorder in X: if x, y ∈ X it is defined by
μ_aP(x) = (1 − x)²/16 if 1 < x ≤ 5;  0 if 0 ≤ x ≤ 1 or 9 < x ≤ 10;  (9 − x)/4 if 5 < x ≤ 9.

Fig. 5 Representation of "very Far from 0 and Far from 10"
x ≤_P y ⇔ Φ_P(x) ≤_{μP} Φ_P(y), which we must take into account in order to formalize point 4.
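The Example 2 construction above can be checked numerically. The following sketch (ours, using the piecewise formulas of Figs. 4 and 5) reproduces the values 9/64 and 3/8 quoted in the text, illustrating that the reversal condition between μ_P and μ_aP only makes sense interval by interval.

```python
# Numerical check of Example 2: P = "very close to 0 or close to 10" on [0, 10],
# with alpha_1(x) = 5 - x on [0, 5] and alpha_2(x) = 15 - x on [5, 10].

def mu_P(x):
    if 0 <= x <= 4:
        return (x - 4) ** 2 / 16
    if 4 < x <= 6:
        return 0.0
    return (x - 6) / 4                    # 6 < x <= 10

def alpha(x):
    return 5 - x if x <= 5 else 15 - x

def mu_aP(x):                             # mu_aP(x) = mu_P(alpha(x))
    return mu_P(alpha(x))

print(mu_P(2.5), mu_P(7.5))               # 0.140625 (= 9/64) and 0.375 (= 3/8)
print(mu_aP(2.5), mu_aP(7.5))             # again 9/64 and 3/8: the condition
                                          # "mu_P(x) <= mu_P(y) => mu_aP(x) >= mu_aP(y)"
                                          # holds per monotonicity interval, not globally
```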
4 Regular and Perfect Antonym Pairs

Let X be a universe of discourse and P a predicate naming a gradual property of the elements of X, that is, such that with each atomic statement "x is P" there is associated a number μ_P(x) ∈ [0, 1] reflecting the degree up to which x verifies the property named by P. To approach the concept of antonym based on the former five points, let us consider involutive mappings A : F(X) → F(X), that is, mappings verifying A² = id_F(X).

Definition 4.1 A couple (μ, A(μ)) is a pair of antonyms if

A.1. There exists a strong negation N_μ such that A(μ) ≤ N_μ ∘ μ (that is, μ and A(μ) are N_μ-contradictory fuzzy sets).
A.2. If x ≤_μ y, then y ≤_A(μ) x.

Of course, if the preorder induced by μ on X is a total order (that is, μ is monotonic), then axiom A.2 becomes: "if 0 < μ(x) ≤ μ(y), then A(μ)(x) ≥ A(μ)(y)". For the sake of simplicity, and if there is no confusion, let us write μ_a = A(μ), μ′ = N_μ ∘ μ. Of course, rules A.1 and A.2 do not determine a unique A(μ) for each μ ∈ F(X), and some definitions should be introduced both to distinguish among diverse kinds of antonyms and to take the former points 1 to 5 into account.
Theorem 4.2 If (μ, μ_a) is a pair of antonyms:

a) μ(x) ≥ ε implies μ_a(x) ≤ N_μ(ε), and μ_a(x) ≥ ε implies μ(x) ≤ N_μ(ε), for all ε ∈ [0, 1].
b) If μ(x) = 1, then μ_a(x) = 0. If μ_a(x) = 1, then μ(x) = 0.
c) Property A.1 is equivalent to μ ≤ N_μ ∘ μ_a.
d) Property A.1 is equivalent to the existence of a t-norm W_μ in Łukasiewicz's family such that W_μ ∘ (μ × μ_a) = μ_∅. That is, the W_μ-intersection of μ and μ_a is empty, or μ and μ_a are W_μ-disjoint.

Proof. (a) From ε ≤ μ(x), it follows that N_μ(μ(x)) ≤ N_μ(ε) and μ_a(x) ≤ N_μ(ε), for any ε ∈ [0, 1]. (b) Take ε = 1 in (a). (c) From μ_a ≤ N_μ ∘ μ it follows that N_μ ∘ N_μ ∘ μ = μ ≤ N_μ ∘ μ_a, and reciprocally. (d) As is well known (see [2]), there exists a strictly increasing function g_μ : [0, 1] → [0, 1] verifying g_μ(0) = 0, g_μ(1) = 1 and such that N_μ = g_μ^{-1} ∘ (1 − id) ∘ g_μ. Hence, A.1 can be written as μ_a(x) ≤ g_μ^{-1}(1 − g_μ(μ(x))) or, equivalently, g_μ(μ_a(x)) + g_μ(μ(x)) − 1 ≤ 0. This inequality is equivalent to Max(0, g_μ(μ_a(x)) + g_μ(μ(x)) − 1) = 0, or to 0 = g_μ^{-1}(W(g_μ(μ_a(x)), g_μ(μ(x)))) = g_μ^{-1} ∘ W ∘ (g_μ × g_μ)(μ(x), μ_a(x)). Then, with W_μ = g_μ^{-1} ∘ W ∘ (g_μ × g_μ), it is W_μ ∘ (μ × μ_a) = μ_∅. The reciprocal is immediate.

Definition 4.3 A pair of antonyms (μ, μ_a) is strict whenever μ_a ≠ μ′, that is, if there exists some x ∈ X such that μ_a(x) < N_μ(μ(x)). Otherwise, the pair is non-strict or complementary and μ_a = μ′.

Theorem 4.4 (μ, μ_a) is a non-strict pair of antonyms if and only if S_μ ∘ (μ × μ_a) = μ_X, with S_μ the t-conorm N_μ-dual of the t-norm W_μ in Theorem 4.2.

Proof. If the pair (μ, μ_a) is non-strict, from μ_a = μ′ it follows that S_μ ∘ (μ × μ_a) = N_μ ∘ W_μ ∘ (N_μ ∘ μ × N_μ ∘ μ′) = N_μ ∘ W_μ ∘ (μ′ × μ) = N_μ ∘ μ_∅ = μ_X. Reciprocally, if S_μ ∘ (μ × μ_a) = μ_X, it follows that W_μ ∘ (N_μ ∘ μ × N_μ ∘ μ_a) = μ_∅ and g_μ^{-1} ∘ W ∘ (g_μ ∘ N_μ ∘ μ × g_μ ∘ N_μ ∘ μ_a) = μ_∅, or W ∘ (1 − g_μ ∘ μ × 1 − g_μ ∘ μ_a) = μ_∅, that is, for any x ∈ X, Max(0, 1 − g_μ(μ(x)) − g_μ(μ_a(x))) = 0. Then, 1 − g_μ(μ(x)) − g_μ(μ_a(x)) ≤ 0, and g_μ^{-1}(1 − g_μ(μ(x))) ≤ μ_a(x). Hence, μ′(x) ≤ μ_a(x) and, by A.1, μ′ = μ_a.
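Before moving on, properties (d) of Theorem 4.2 and Theorem 4.4 can be illustrated numerically. In the following sketch (not part of the original; the membership values are invented for illustration) we take g_μ = id, so that N_μ(t) = 1 − t, W_μ is the standard Łukasiewicz t-norm and S_μ its dual t-conorm.

```python
# Checking W-disjointness (Theorem 4.2(d)) and the strict / non-strict distinction
# (Theorem 4.4, Corollary 4.5) on invented membership values, with g_mu = id.

def N(t): return 1.0 - t                    # strong negation N_mu
def W(a, b): return max(0.0, a + b - 1.0)   # Lukasiewicz t-norm W_mu
def S(a, b): return min(1.0, a + b)         # its N-dual t-conorm S_mu

X = [0, 1, 2, 3, 4, 5]                      # toy universe
mu       = {0: 1.0, 1: 0.8, 2: 0.4, 3: 0.1, 4: 0.0, 5: 0.0}   # mu (invented)
mu_a_ns  = {x: N(mu[x]) for x in X}         # non-strict antonym: mu_a = mu'
mu_a_str = {0: 0.0, 1: 0.0, 2: 0.2, 3: 0.6, 4: 0.9, 5: 1.0}   # strict: mu_a < mu' somewhere

for name, mu_a in [("non-strict", mu_a_ns), ("strict", mu_a_str)]:
    assert all(mu_a[x] <= N(mu[x]) for x in X)         # axiom A.1 (N-contradictory)
    assert all(W(mu[x], mu_a[x]) == 0.0 for x in X)    # Theorem 4.2(d): W-disjoint
    covers = all(S(mu[x], mu_a[x]) == 1.0 for x in X)  # Theorem 4.4 criterion
    print(name, "-> S(mu, mu_a) = mu_X ?", covers)     # True only for the non-strict pair
```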
Corollary 4.5 A pair of antonyms (μ, μ_a) is strict if and only if S_μ ∘ (μ × μ_a) ≠ μ_X, with S_μ the t-conorm of Theorem 4.4.

Remark If μ is a fuzzy set labelled P (μ = μ_P) and μ_a can be identified by a label Q, then Q = aP and the words (P, Q) form a pair of antonym words. If
this pair is non-strict then aP ≡ ¬P, and if it is strict then aP is not coincidental with ¬P. In general, for a pair of antonym words (P, aP) translated, by means of their use in X, into the fuzzy sets μ_P and μ_aP, respectively, it happens (Theorem 4.2.d) that the intersection μ_P ∩ μ_aP obtained with the t-norm W_μ is empty: μ_P ∩ μ_aP = μ_∅; that is, in this sense μ_P and μ_aP are disjoint fuzzy sets. Of course, this emptiness is equivalent to the contrariness of μ_P and μ_aP relative to a strong negation N_P that allows one to define μ_¬P = N_P ∘ μ_P (μ_aP ≤ N_P ∘ μ_P, the version of A.1 with N_μ = N_P). Moreover, the fuzzy sets μ_P and μ_aP are a partition of X in the theory (F(X), T_P, S_P, N_P) (with T_P = W_μ, S_P = W_μ* and N_P = N_μ, W_μ* being the t-conorm N_μ-dual of the t-norm W_μ, see Theorem 4.2) if and only if aP ≡ ¬P (Theorem 4.4), as only in this case is μ_P ∪ μ_aP = μ_X and μ_P ∩ μ_aP = μ_∅.

Definition 4.6 Given a pair of antonyms (μ, μ_a) and a t-norm T, μ_m = T ∘ (μ′ × (μ_a)′) = T ∘ (N_μ ∘ μ × N_μ ∘ μ_a) is the T-medium fuzzy set between μ and μ_a.

Theorem 4.7 For any t-norm T, μ_m is 1-normalized if and only if there exists x ∈ X such that μ(x) = μ_a(x) = 0.

Proof. If μ(x) = μ_a(x) = 0, it is μ_m(x) = T(N_μ(0), N_μ(0)) = T(1, 1) = 1, and μ_m is 1-normalized at x. Reciprocally, if μ_m is 1-normalized at a point x, from 1 = μ_m(x) = T(N_μ(μ(x)), N_μ(μ_a(x))) it follows that 1 = N_μ(μ(x)) = N_μ(μ_a(x)), or μ(x) = μ_a(x) = 0, as T ≤ Min.

Hence, that μ_m be 1-normalized is equivalent to {x ∈ X; μ(x) = μ_a(x) = 0} = {x ∈ X; μ_m(x) = 1} ≠ ∅. Let us denote this set by N(μ, μ_a) and call it the Neutral Zone of X for the pair of antonyms (μ, μ_a).

Remark: Logical and linguistic contradiction. As is known (see [12]), a fuzzy set μ_P is selfcontradictory (with respect to the considered negation) when μ_P ≤ μ_¬P = μ′_P = N ∘ μ_P. This concept is directly derived from the logical implication "If P, then not P", but in language the contradiction is often used as "If P, then aP".

Definition 4.8 A fuzzy set labelled P is linguistically selfcontradictory if μ_P ≤ μ_aP for some antonym aP of P.

Of course, linguistic selfcontradiction implies logical selfcontradiction: as μ_aP ≤ μ_¬P, from μ_P ≤ μ_aP it follows that μ_P ≤ μ_¬P. As is also known (see [15]), in very general structures comprising all theories of fuzzy sets and, in particular, the standard ones ([0, 1]^X, T, S, N), it holds that:

1. μ_P ∧ μ′_P ≤ (μ_P ∧ μ′_P)′, the principle of non-contradiction (NC);
2. (μ_P ∨ μ′_P)′ ≤ μ_P ∨ μ′_P, the principle of excluded-middle (EM).

In this formulation, the principle NC says that μ_P ∧ μ′_P is logically selfcontradictory, and the principle EM says that (μ_P ∨ μ′_P)′ is also selfcontradictory, as (μ_P ∨ μ′_P)′ ≤ μ_P ∨ μ′_P = ((μ_P ∨ μ′_P)′)′. What about μ_P ∧ μ_aP and (μ_P ∨ μ_aP)′ with a given antonym aP of P?
Theorem 4.9 For any antonym aP of P, P ∧ aP is always logically selfcontradictory (see [16]).

Proof. As μ_aP ≤ μ_¬P, it is μ_P ∧ μ_aP ≤ μ_P ∧ μ_¬P. Hence, (μ_P ∧ μ_¬P)′ ≤ (μ_P ∧ μ_aP)′, and as it is always μ_P ∧ μ_¬P ≤ (μ_P ∧ μ_¬P)′, it follows that μ_P ∧ μ_aP ≤ (μ_P ∧ μ_aP)′.

What still remains open is to know whether μ_P ∧ μ_aP is linguistically selfcontradictory. Concerning the case of ¬(P ∨ aP) for any antonym aP of P, it is obvious that, provided the duality law ¬(P ∨ Q) = ¬P ∧ ¬Q holds, it is logically selfcontradictory if and only if the medium term ¬P ∧ ¬aP is logically selfcontradictory. What still remains open is to know what happens without duality, and to know when a(P ∨ aP) is linguistically selfcontradictory.

Example The fuzzy set (μ_P ∨ μ_aP)′ represented in Fig. 6 is not logically selfcontradictory because, assuming that T = Min, S = Max and N = 1 − id, the medium term μ′_P ∧ μ′_aP is normalized and, hence, is not logically selfcontradictory.
Fig. 6 Medium term is not selfcontradictory
Assuming T = Prod, S = Prod* and N = 1 − id, the fuzzy set (μ_P ∨ μ_aP)′ represented in Fig. 7 is, however, logically selfcontradictory, because the medium term μ′_P ∧ μ′_aP is below the fixed point of the negation N = 1 − id and hence is logically selfcontradictory.

Definition 4.10 A pair of antonyms (μ, μ_a) is regular whenever the Neutral Zone N(μ, μ_a) is not empty.

Of course, a pair of antonyms (μ, μ_a) is regular if and only if for any t-norm T the T-medium μ_m is 1-normalized and, hence, if and only if the Min-medium between μ and μ_a is 1-normalized. That is, if μ_m(x) = Min(N_μ(μ(x)), N_μ(μ_a(x))) = 1 for some x ∈ X.

Theorem 4.11 Any regular pair of antonyms is strict.

Proof. If N(μ, μ_a) ≠ ∅, there exists x in X such that μ(x) = μ_a(x) = 0. Then μ′(x) = N_μ(0) = 1, and 0 = μ_a(x) < 1 = μ′(x), hence μ_a ≠ μ′.
Fig. 7 Medium term is selfcontradictory
Corollary 4.12 Any non-strict pair of antonyms is non-regular.

The set E(μ, μ_a) = {x ∈ X; μ(x) = μ_a(x)} obviously verifies N(μ, μ_a) ⊂ E(μ, μ_a). Then, if N(μ, μ_a) is non-empty, so is E(μ, μ_a); hence, for any regular pair of antonyms it is E(μ, μ_a) ≠ ∅. Notice that, if μ = μ_P and μ_a = μ_aP, the points x in E(μ_P, μ_aP) are those for which the truth-values of the statements "x is P" and "x is aP" are equal. Then, points in E(μ_P, μ_aP) are those that can be considered both as P and as aP.

Theorem 4.13 For any pair of antonyms (μ, μ_a), it is E(μ, μ_a) ⊂ {x ∈ X; μ(x) ≤ g_μ^{-1}(1/2)}, with N_μ = g_μ^{-1} ∘ (1 − id_[0,1]) ∘ g_μ, and g_μ^{-1}(1/2) ∈ (0, 1) the fixed point of N_μ.

Proof. Obviously, g_μ^{-1}(1/2) ∈ (0, 1) is the fixed point of N_μ. If x ∈ E(μ, μ_a), from μ(x) = μ_a(x) ≤ μ′(x) it follows that μ(x) ≤ g_μ^{-1}(1 − g_μ(μ(x))), which is equivalent to g_μ(μ(x)) ≤ 1/2. The theorem obviously holds if E(μ, μ_a) = ∅. Consequently, it is N(μ, μ_a) ⊂ E(μ, μ_a) ⊂ {x ∈ X; μ(x) ≤ g_μ^{-1}(1/2)}.

Theorem 4.14 If μ is monotonic and E(μ, μ_a) ≠ ∅, both functions μ and μ_a take at most the two values 0 and r on E(μ, μ_a), with 0 < r ≤ g_μ^{-1}(1/2).

Proof. If E(μ, μ_a) − N(μ, μ_a) ≠ ∅, the functions μ and μ_a are constantly equal to a non-zero value on this difference set. In fact, let x, y ∈ E(μ, μ_a) − N(μ, μ_a). It should be 0 < μ(x) = μ_a(x), 0 < μ(y) = μ_a(y). Let us suppose 0 < μ(x) ≤ μ(y); then μ_a(y) ≤ μ_a(x) because of A.2. Hence: μ_a(y) ≤ μ_a(x) = μ(x) ≤ μ(y) = μ_a(y), and μ(x) = μ(y) = r, with r ≤ g_μ^{-1}(1/2). Of course, on N(μ, μ_a) it is μ(x) = μ_a(x) = 0.

Theorem 4.15 If (μ, μ_a) is a non-strict pair of antonyms, E(μ, μ_a) = {x ∈ X; μ(x) = g_μ^{-1}(1/2)}.
Proof. If μ_a = μ′, E(μ, μ_a) = {x ∈ X; μ(x) = N_μ(μ(x))} = {x ∈ X; μ(x) = g_μ^{-1}(1 − g_μ(μ(x)))} = {x ∈ X; μ(x) = g_μ^{-1}(1/2)}. Of course, in this case it is N(μ, μ_a) = ∅.

Corollary 4.16 If (μ, μ_a) is a pair of antonyms such that E(μ, μ_a) ≠ ∅ and the value r of μ verifies r < g_μ^{-1}(1/2), the pair is strict.

Proof. Obvious, since if the pair were non-strict, by Theorem 4.15 it should be r = g_μ^{-1}(1/2).

A symmetry α : X → X may or may not have fixed points. For example, in a Boolean algebra, α(x) = x′ (the complement of x) is a symmetry, as α²(x) = α(x′) = x″ = x and α(0) = 0′ = 1. It is also an involution, since x ≤ y implies α(x) ≥ α(y); nevertheless, there is no x such that α(x) = x′ = x. If X is a closed interval [a, b] of the real line, any symmetry α : [a, b] → [a, b] such that x < y implies α(x) > α(y) and α(a) = b (that is, any involution) has a single fixed point α(z) = z ∈ (a, b).

Definition 4.17 A pair of antonyms (μ, μ_a) is perfect if there exists a symmetry α_μ : [a, b] → [a, b] such that μ_a = μ ∘ α_μ.

Notice that, in this case, if μ is crisp, μ_a is also crisp. Of course, to have μ_aa = μ, as μ_aa = μ_a ∘ α_{μa} = μ ∘ α_μ ∘ α_{μa}, it is a sufficient condition that α_μ ∘ α_{μa} = id, or α_μ = α_{μa}. In what follows this "uniformity" condition will be assumed.

Theorem 4.18 If α_μ has some fixed point, a sufficient condition for a perfect pair of antonyms (μ, μ_a) with μ_a = μ ∘ α_μ to be regular is that μ(z) = 0 for some fixed point z ∈ X of α_μ.

Proof. As it is μ_a(z) = μ(α_μ(z)) = μ(z) = 0, it follows that z ∈ N(μ, μ_a).

For example, in X = [1, 10], if μ(x) = (x − 1)/9 and α_μ(x) = 11 − x (a symmetry of [1, 10], as α_μ²(x) = x, it is non-increasing and verifies α_μ(1) = 10), the definition μ_a(x) = μ(α_μ(x)) = μ(11 − x) = (10 − x)/9 gives the non-strict (with respect to N_μ) pair of antonyms (μ, μ_a) with N_μ(x) = 1 − x. In fact, it holds that N_μ(μ(x)) = μ_a(x).

Let us consider a different kind of example in X = [1, 10]. Taking μ as

μ(x) = 1 if 1 ≤ x ≤ 4;  5 − x if 4 ≤ x ≤ 5;  0 if 5 ≤ x ≤ 10,

it is μ(x) > 0 if and only if x ∈ [1, 5). To have a regular pair of perfect antonyms (μ, μ ∘ α_μ), it is sufficient but not necessary that 11 − x < α_μ(x), as then 5.5 < z provided that α_μ(z) = z.
Remarks

1. If α_μ has the fixed point z, from μ_a(z) = μ(α_μ(z)) = μ(z) it follows that z ∈ E(μ, μ_a), and μ(z) > 0 implies μ(z) = r (Theorem 4.14). Hence, if μ = μ_P, μ_a = μ_aP, the fixed point z of α_μ can be considered to be both P and aP. If this common truth-value is non-zero, the pair of antonyms is not regular.
2. In the former first example in [1, 10] with μ(x) = (x − 1)/9, as μ(x) = 0 only if x = 1, to have a perfect pair of antonyms (μ, μ ∘ α_μ) to which Theorem 4.18 can be applied it would be necessary that α_μ have the fixed point z = 1, and that both A.1 and A.2 be verified by the pair (μ, μ ∘ α_μ).
3. When the pair of antonyms is not perfect, there is the serious technical difficulty of proving μ_aa = μ. This is a problem that with perfect antonymy is automatically solved, as the hypotheses α_μ = α_{μa} and α_μ² = id make it immediate.
4. Let us consider P = "less than three" on X = [1, 10]. The extension of P on X is given by the fuzzy (crisp) set μ_P(x) = 1 if x ∈ [1, 3), and μ_P(x) = 0 if x ∈ [3, 10]. The negate ¬P of P gives the complement of the set μ_P^{-1}(1) = [1, 3), that is, μ_¬P^{-1}(1) = [3, 10], and μ_¬P = 1 − μ_P, with ¬P ≡ "greater than or equal to three". Then N_P(x) = 1 − x and, for any antonym aP of P, it should be μ_aP ≤ 1 − μ_P, and hence μ_aP^{-1}(1) ⊂ [3, 10]. Consequently, if μ_aP = μ_P ∘ α_P is a perfect antonym of P, from μ_aP(x) = μ_P(α_P(x)) = 1 if α_P(x) ∈ [1, 3) and 0 if α_P(x) ∈ [3, 10], it follows that μ_aP^{-1}(1) = (α_P(3), 10], and it should be (α_P(3), 10] ⊂ [3, 10]. Then 3 ≤ α_P(3), and as μ_P(α_P(3)) = 0, any of these perfect antonyms are regular. If 3 = α_P(3), it is μ_aP^{-1}(1) = (3, 10] and aP ≡ "greater than three". If 3 < α_P(3), it is aP ≡ "greater than α_P(3)"; if, for example, α_P(x) = 1 + 10 − x = 11 − x, it is μ_aP^{-1}(1) = (8, 10], with aP ≡ "greater than eight". Notice that, as μ_P^m(x) = 1 if and only if μ_aP(x) = μ_P(x) = 0, it is (μ_P^m)^{-1}(1) = [3, α_P(3)], for any t-norm T.
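Remark 4 is simple enough to check directly; the following sketch (the sampling grid is ours) computes μ_P and μ_aP and shows that with α_P(x) = 11 − x the perfect antonym is "greater than eight".

```python
# Sketch of Remark 4: P = "less than three" on [1, 10], alpha_P(x) = 11 - x.

def mu_P(x):
    return 1.0 if 1 <= x < 3 else 0.0      # crisp extension of "less than three"

def mu_aP(x):
    return mu_P(11 - x)                    # mu_aP(x) = mu_P(alpha_P(x))

for x in [1, 2, 3, 5, 8, 8.5, 9, 10]:
    print(x, mu_P(x), mu_aP(x))
# mu_aP(x) = 1 iff 11 - x lies in [1, 3), i.e. iff x lies in (8, 10]; mu_P and mu_aP are
# never both 1, and both vanish on [3, 8] = [3, alpha_P(3)], the zone where the medium
# term takes the value 1.
```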
4.1 The Case Probable–Improbable in Boolean Algebras

Following Kolmogorov's axiomatics, it is generally accepted in the field of mathematics that the predicate P = probable is used on a Boolean algebra (B, ·, +, ′; ≤; 0, 1) by means of the partial order of B (a ≤ b iff a · b = a) and some probability p : B → [0, 1]. That is, the compatibility functions μ_P for the different uses of P on B are given by μ_P(x) = p(x) for all x in B: to fix a concrete use of P on B is
to fix a probability p, and the degree up to which "x is P" is p(x). The ordering ≤_P that P induces on B is always taken as the partial ordering ≤, and hence μ_P is a ≤-measure (vid. [11]) on B, as x ≤ y implies p(x) ≤ p(y), and μ_P(0) = p(0) = 0, μ_P(1) = p(1) = 1.

Concerning the use of ¬P = not probable, as the ordering induced by ¬P is ≤^{-1} (the reverse ordering of ≤: a ≤^{-1} b iff b ≤ a), and "x is not P" is identified with "not x is P" ("x′ is P"), from p(x′) = 1 − p(x) it follows that μ_¬P(x) = μ_P(x′) = 1 − p(x); a function that verifies:

1. μ_¬P(1) = 0, with 1 the minimum of (B, ≤^{-1});
2. μ_¬P(0) = 1, with 0 the maximum of (B, ≤^{-1});
3. if x ≤^{-1} y, then μ_¬P(x) ≤ μ_¬P(y),

and consequently it is a ≤^{-1}-measure. What about the uses of aP = improbable, the antonym of P? Let us only consider perfect antonyms aP of P, which are given by μ_aP(x) = μ_P(α(x)), with some mapping α : B → B such that:

1. α(0) = 1;
2. if x ≤ y, then α(x) ≤^{-1} α(y);
3. α² = id_B; and
4. μ_aP(x) = μ_P(α(x)) ≤ μ_¬P(x).
Hence, α(1) = α(α(0)) = 0, and p(α(x)) ≤ 1 − p(x). It should be pointed out that neither 1 − p nor p ∘ α is a probability, as (1 − p)(0) = 1 and (p ∘ α)(0) = 1, but they are ≤^{-1}-measures since, if x ≤^{-1} y, then (1 − p)(x) ≤ (1 − p)(y) and (p ∘ α)(x) ≤ (p ∘ α)(y), and (1 − p)(1) = (p ∘ α)(1) = 0, (1 − p)(0) = (p ∘ α)(0) = 1.

The symmetry α can be either a Boolean function or not. If α is Boolean, that is, if α(x) = a · x + b · x′ for some coefficients a, b in B, it is α² = id_B if and only if b = a′, and then with α(0) = 1 it results that α(x) = x′. Hence, if α is Boolean, μ_aP(x) = μ_P(α(x)) = μ_P(x′) = μ_¬P(x), and there is no difference between aP and ¬P. The Booleanity of α corresponds to the non-strict use of aP (a use that appears in some dictionaries of antonyms, which give a(probable) = not-probable), but the reciprocal is not true, as the following example will show:

Example Take B = 2³ with atoms a_1, a_2, a_3, and a probability p. Take α as:

• α(0) = 1, α(1) = 0;
• α(a′_1) = a_2, α(a′_2) = a_3, α(a′_3) = a_1;
• α(a_2) = a′_1, α(a_3) = a′_2, α(a_1) = a′_3.

It is α² = id_B, α(0) = 1 and, if x ≤ y, then α(x) ≥ α(y); but α is non-Boolean, as α(x) ≠ x′ (for example, α(a_1) = a′_3 ≠ a′_1). Then:

• μ_aP(0) = p(1) = 1, μ_aP(1) = p(0) = 0;
• μ_aP(a_1) = p(a′_3) = p(a_1) + p(a_2); μ_aP(a_2) = p(a′_1) = p(a_2) + p(a_3); μ_aP(a_3) = p(a′_2) = p(a_1) + p(a_3);
• μ_aP(a′_1) = p(a_2); μ_aP(a′_2) = p(a_3); μ_aP(a′_3) = p(a_1).
An easy computation shows that μ_aP ≤ μ_¬P whenever p(a_1) = p(a_2), with p(a_3) = 1 − p(a_1) − p(a_2) = 1 − 2p(a_1). With p(a_1) = p(a_2) = 1/4, p(a_3) = 1/2 it is, for example, μ_aP(a_1) = 1/2 < 3/4 = 1 − p(a_1) = μ_¬P(a_1); hence μ_aP < μ_¬P, and the corresponding use of "improbable" is a strict and regular antonym of "probable".
4.2 A Complex Example

A special example is the case of the predicate "Organized", applied to the degree of organization of desks, and its antonym "Disorganized": a "non-organized" desk does not mean a "disorganized" desk and, likewise, a "non-disorganized" desk does not mean an "organized" desk. Moreover, we assume that while the organization of the desk can decrease, and accordingly the degree to which it is organized, it need not yet be disorganized until a certain weak level of organization is reached. The same holds in the inverse direction: the disorganization of the desk can decrease, and so the degree to which it is disorganized, without the desk yet being organized, until a certain weak level of disorganization is reached. A possible representation of these predicates can be seen in Fig. 8.
Fig. 8 Organized versus Disorganized, Non-Organized and Non-Disorganized
Notice that α(x) = x_n − x for all x ∈ [x_0, x_n]. Let μ_aP = μ_P ∘ α. It follows that:

• (μ_P, μ_aP) satisfies Definition 4.1 and is a pair of antonyms, since μ_aP ≤ N ∘ μ_P, where N = 1 − id, and ∀x_i, x_j ∈ [x_0, x_n]: 0 < μ_P(x_i) ≤ μ_P(x_j) ⇒ μ_aP(x_i) ≥ μ_aP(x_j).
• (μ_P, μ_aP) satisfies Definition 4.3, hence it is strict: ∀x ∈ (x_1, x_5): μ_aP(x) < N(μ_P(x)).
• μ_P satisfies Theorem 4.2: ∀x ∈ [x_0, x_n]: μ_P(x) ≤ N(μ_aP(x)).
• (μ_P, μ_aP) satisfies Theorem 4.2: ∀x ∈ [x_0, x_n]: W(μ_P(x), μ_aP(x)) = max(0, μ_P(x) + μ_aP(x) − 1) = 0.
• (μ_P, μ_aP) satisfies Corollary 4.5: W* ∘ (μ_P × μ_aP) ≠ μ_X, since it is fairly obvious that for x ∈ (x_1, x_2] and x ∈ [x_4, x_5), μ_P(x) + μ_aP(x) < 1.
• The polygon (x_0, x_1, M, x_5, x_n) represents the T-medium fuzzy set, with T = Min (see Definition 4.6). This T-medium (or, more properly, Min-medium) is non-normalized (see Theorem 4.7).
• The neutral zone N(μ_P, μ_aP) = ∅, whence this pair of antonyms is non-regular (see Definition 4.10); however, it is perfect (see Definition 4.17).
• The set E(μ_P, μ_aP) = {x_3} and μ_P(x_3) < 1/2 (see Corollary 4.12, Theorem 4.13 and Corollary 4.16).
5 On Two Controversial Points

Using the model introduced for perfect pairs of antonyms, let us briefly consider two controversial issues concerning antonymy: first, the relation between the predicates ¬(aP) and a(¬P) derived from P, and second, the relation between a(P ∧ Q), a(P ∨ Q), aP and aQ. Notice that Example 2 in Sect. 3.4 furnishes a case in which a(P ∨ Q) can be interpreted as aP ∧ aQ, a property of duality that actually does not seem to be general.

Concerning the first issue, if both predicates ¬(aP) and a(¬P) are used on X in such a way that they can be respectively represented by the fuzzy sets μ_¬(aP) and μ_a(¬P), it is:

μ_¬(aP) = N_aP ∘ μ_aP = N_aP ∘ μ_P ∘ α_P, and
μ_a(¬P) = μ_¬P ∘ α_¬P = N_P ∘ μ_P ∘ α_¬P.
Hence, in general it is not true that μ_¬(aP) = μ_a(¬P); that is, the predicates ¬(aP) and a(¬P) are not generally synonyms on a universe X: they cannot be identified. Nevertheless, it is obvious that a sufficient condition for this identification is that N_aP = N_P and α_P = α_¬P. These "uniformity" conditions seem far from always being attainable, and can perhaps be accepted only in very restricted contexts. Whenever ¬(aP) ≡ a(¬P), both predicates are synonyms on X, but it is not usual to have a single word designating this predicate. For example, there seem to be no English words designating either a(Not Young) or a(Not Probable). It should also be pointed out that, even if N_aP = N_P, from μ_P ∘ α_P = μ_P ∘ α_¬P it does not generally follow that α_P = α_¬P, except, of course, if μ_P is an injective function.

Concerning the second issue, and provided that μ_P∧Q is given, in any standard fuzzy set theory, by T ∘ (μ_P × μ_Q) for some t-norm T, it is:

μ_a(P∧Q) = μ_P∧Q ∘ α_P∧Q = T ∘ (μ_P × μ_Q) ∘ α_P∧Q = T ∘ (μ_P ∘ α_P∧Q × μ_Q ∘ α_P∧Q), and
μ_aP∧aQ = T ∘ (μ_aP × μ_aQ) = T ∘ (μ_P ∘ α_P × μ_Q ∘ α_Q).
Hence, in general μ_a(P∧Q) ≠ μ_aP∧aQ. But a sufficient condition for a(P ∧ Q) and aP ∧ aQ to be synonyms is that α_P∧Q = α_P = α_Q, a case in which a(P ∧ Q) can be identified with aP ∧ aQ. Under these uniformity conditions, a(P ∨ Q) is again identifiable with aP ∨ aQ, as is easy to prove analogously if, for some t-conorm S, μ_P∨Q = S ∘ (μ_P × μ_Q). It should nevertheless be noticed that the conjunction of the properties a(aP) ≡ P (a general one) and a(P ∧ Q) ≡ aP ∧ aQ (only valid in a restricted situation) results in a contradiction when, with the t-norm of Theorem 4.2.d, μ_P∧aP = μ_∅. In fact, if all that happens, it is μ_∅ = μ_P∧aP = μ_a(aP)∧aP = μ_a(aP∧P), and the absurdity a(empty) ≡ empty follows. Of course, if P ∧ aP does not represent the empty set (this was only proved for a special t-norm), that contradiction cannot appear in the same way.

The relationship between the involution A and the logical connectives ¬, ∧ and ∨ does not always result in general laws, because aP strictly depends on the language: aP actually exists only if there is a name for it in the corresponding language. For example, in a restricted family domain in which "young and rich" is made a synonym of "interesting", it could perfectly well be a(interesting) ≡ "old and poor" ≡ a(young) and a(rich).
6 Conclusions

6.1 Antonyms and Symmetries

If with the negate ¬P of a predicate P it is not always the case that the statements "x is ¬¬P" and "x is P" are equivalent, the situation is different with the antonym aP, as the law a(aP) ≡ P seems to be generally accepted for a given antonym aP of P. But the antonym does not seem definable without appealing to the way of negating the predicate: the inequality μ_aP ≤ μ_¬P should be respected, and it shows ¬P as a limit case for any possible antonym of P.

As long as languages are phenomena of great complexity, not fully coincidental with what one finds in dictionaries and grammar textbooks but constantly created by the people using them, a lot of experimentation should be done in order to better know on which (possibly restricted) domains the mathematical models arising from theoretical fuzzy logic can actually be taken as good enough models for language. Antonymy is in language and, if classical logics cannot possibly cope with it, fuzzy logic not only can but has dealt with it from its inception. That the antonym is a non-static concept of language, and perhaps not of any kind of syntactical logic, is shown by the fact that "John is old" is used thanks to the previous use, for example, of "Joe is young": antonymy comes from direct observation of the objects in the universe of discourse. This leads to a certain empirical justification for sometimes making equivalent the statements "x is aP" and "α_P(x) is P", for some adequate symmetry α_P : X → X. Possibly it is the consideration of a concrete ground set X, and of how predicates are used in it, that makes the difference between how an abstract science like Formal Logic approaches language and how it is done by an empirical one like Fuzzy Logic.
6.2 A Final Complex Example

Consider the pair of antonyms ("closed", "open") in the following scenario. A door is "closed" if neither light nor a person can get through. The door is "slightly open" if light and sound, but not a person, can get through. Finally, the door is "open" if a person can get through. Let the universe of discourse be [a, b] = [0, 90°]. A possible representation is shown in Fig. 9. It becomes apparent that "closed" implies "not open" and "open" implies "not closed"; moreover, the core of "slightly open" represents a neutral zone between "closed" and "open". That is, "closed" and "open" satisfy the basic requirements to constitute a pair of antonyms. However, no symmetry seems to relate them. Let ϕ : [a, b] → [a, b] be an (appropriate) order isomorphism. Then, given a predicate P, its antonym aP may be obtained as follows:

• with μ_P(x), build μ_P(ϕ(x));
• with μ_P(ϕ(x)) and the symmetry α(ϕ(x)) = b + a − ϕ(x), build μ_aP(ϕ(x));
• obtain μ_aP(x) from μ_aP(ϕ^{-1}(x)).
Fig. 9 Representation of "closed", "slightly open" and "open"
For the example above, one possible order isomorphism ϕ is shown in Fig. 10. In all former examples, the order isomorphism implicitly used was obviously the identity. The important open question remains how to find an appropriate order isomorphism which will allow the use of a simple symmetry to relate antonyms.
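As a sketch of how such an order isomorphism could be used (one possible reading of the three-step recipe above, namely conjugating the simple symmetry by ϕ, so that α_P(x) = ϕ^{-1}(a + b − ϕ(x)) and μ_open = μ_closed ∘ α_P), the piecewise-linear ϕ and the shape of μ_closed below are illustrative assumptions of ours, not taken from the paper.

```python
# Relating "closed" and "open" on [a, b] = [0, 90] through an assumed order
# isomorphism phi: mu_open(x) = mu_closed(phi_inv(a + b - phi(x))).

A, B = 0.0, 90.0

def mu_closed(x):                 # assumed: fully closed up to 5 deg, fading out by 10 deg
    if x <= 5:  return 1.0
    if x <= 10: return 1.0 - (x - 5) / 5.0
    return 0.0

def phi(x):                       # assumed order isomorphism: stretches [0, 30] onto [0, 45]
    return 1.5 * x if x <= 30 else 45.0 + 0.75 * (x - 30.0)

def phi_inv(y):
    return y / 1.5 if y <= 45 else 30.0 + (y - 45.0) / 0.75

def mu_open(x):                   # the antonym via the conjugated symmetry
    return mu_closed(phi_inv(A + B - phi(x)))

for x in [0, 5, 20, 50, 75, 90]:
    print(x, mu_closed(x), round(mu_open(x), 2))
# mu_open(90) = 1, mu_open(0) = 0, and both predicates vanish on a middle zone
# (roughly [10, 70] with these choices), which plays the role of "slightly open".
```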
6.3 Final remarks

This paper, again as a continuation of the former essays [13], [11] and especially [14], is nothing else than a first approach to how fuzzy methodology can help to see more clearly across the fog of language, by attempting to model the concept of antonym, a concept that is far from being sufficiently well known for operational purposes in Computer Science. It is a concept that plays a relevant role both in concept formation and in meaning representation, as well as in the interpretation of written texts or utterances. That methodology can also help linguistics as a useful representation for the study of antonyms; for example, to clarify in which cases it is either a(P ∧ Q) = aP ∨ aQ or a(P ∧ Q) = aP ∧ aQ, or something else.
Fig. 10 An isomorphism ϕ for the example
References

1. Zadeh, L. (1975) "The concept of a linguistic variable and its application to approximate reasoning" I, II, III. Information Sciences, 8, 199–251; 8, 301–357; 9, 43–80.
2. Trillas, E. (1979) "Sobre funciones de negación en la teoría de conjuntos difusos". Stochastica, Vol. III, 1, 47–60. Reprinted in English (1998) as "On negation functions in fuzzy set theory", in Advances in Fuzzy Logic, Barro, S., Bugarín, A., Sobrino, A. (eds.), 31–45.
3. Trillas, E. and Riera, T. (1981) "Towards a representation of synonyms and antonyms by fuzzy sets". Busefal, V, 30–45.
4. Ovchinnikov, S.V. (1981) "Representations of Synonymy and Antonymy by Automorphisms in Fuzzy Set Theory". Stochastica, 2, 95–107.
5. Warczyk, R. (1981) "Antonymie, Négation ou Opposition?". La Linguistique, 17/1, 29–48.
6. Lehrer, A. and Lehrer, K. (1982) "Antonymy". Linguistics and Philosophy, 5, 483–501.
7. Kerbrat-Orecchioni, C. (1984) "Antonymie et Argumentation: La Contradiction". Pratiques, 43, 46–58.
8. Lehrer, A. (1985) "Markedness and antonymy". J. Linguistics, 21, 397–429.
9. Trillas, E. (1998) "Apuntes para un ensayo sobre qué 'objetos' pueden ser los conjuntos borrosos". Proceedings of ESTYLF'1998, 31–39.
10. De Soto, A.R. and Trillas, E. (1999) "On Antonym and Negate in Fuzzy Logic". Int. Jour. of Intelligent Systems, 14, 295–303.
11. Trillas, E. and Alsina, C. (1999) "A Reflection on what is a Membership Function". Mathware & Soft-Computing, VI, 2–3, 219–234.
12. Trillas, E., Alsina, C. and Jacas, J. (1999) "On contradiction in Fuzzy Logic". Soft Computing, Vol. 3, 4, 197–199.
13. Trillas, E. and Cubillo, S. (2000) "On a Type of Antonymy in F([a, b])". Proceedings of IPMU'2000, Vol. III, 1728–1734.
14. Trillas, E., Cubillo, S. and Castiñeira, E. (2000) "On Antonymy from Fuzzy Logic". Proceedings of ESTYLF'2000, 79–84.
15. Trillas, E., Alsina, C. and Pradera, A. (to appear) "Searching for the roots of Non-Contradiction and Excluded-Middle". International Journal of General Systems.
16. Guadarrama, S., Trillas, E. and Renedo, E. (to appear) "Non-Contradiction and Excluded-Middle with antonyms". Proceedings of ESTYLF'2002.
Morphic Computing: Concept and Foundation

Germano Resconi and Masoud Nikravesh
Abstract In this paper, we introduce a new type of computation called "Morphic Computing". Morphic Computing is based on Field Theory and, more specifically, on Morphic Fields. We claim that Morphic Computing is a natural extension of Holographic Computation, Quantum Computation, Soft Computing, and DNA Computing. All natural computations bounded by the Turing Machine can be formalised and extended by our new type of computation model – Morphic Computing. In this paper, we introduce the basis for this new computing paradigm.

Keywords: Morphic computing · Morphogenetic computing · Morphic fields – morphogenetic fields · Quantum computing · DNA computing · Soft computing · Computing with words · Morphic systems · Morphic neural network · Morphic system of systems · Optical computation by holograms · Holistic systems
1 Introduction

In this paper, we introduce a new type of computation called "Morphic Computing" [Resconi and Nikravesh, 2006, 2007a, and 2007b]. We assume that computing is not always related to symbolic entities such as numbers, words or other symbolic entities. Fields as entities are more complex than any symbolic representation of the knowledge. For example, Morphic Fields include the universal database for both organic (living) and abstract (mental) forms. Morphic Computing can also change or compute non-physical conceptual fields. The basis for Morphic Computing is Field Theory and, more specifically, Morphic Fields. Morphic Fields were first introduced by Rupert Sheldrake [1981] from his hypothesis of formative causation
[Sheldrake, 1981 and 1988], which made use of the older notion of Morphogenetic Fields. Rupert Sheldrake developed his famous theory, Morphic Resonance [Sheldrake 1981 and 1988], on the basis of the work by the French philosopher Henri Bergson [1896, 1911]. Morphic Fields and their subset, Morphogenetic Fields, have been at the center of controversy for many years in mainstream science. The hypothesis is not accepted by some scientists, who consider it a pseudoscience. "The term [Morphic Fields] is more general in its meaning than Morphogenetic Fields, and includes other kinds of organizing fields in addition to those of morphogenesis; the organizing fields of animal and human behaviour, of social and cultural systems, and of mental activity can all be regarded as Morphic Fields which contain an inherent memory." – Sheldrake [1988].
We claim that Morphic Fields reshape multidimensional space to generate local contexts. For example, gravitational fields in general relativity are Morphic Fields that reshape space-time to generate the local context where particles move. Morphic Computing reverses the ordinary N input basis fields of possible data to one field in the output system. The set of input fields forms an N-dimensional space or context. The N-dimensional space can be obtained by a deformation of a Euclidean space, and we argue that the Morphic Fields are the cause of the deformation. In line with Rupert Sheldrake [1981], our Morphic Fields are the formative causation of the context. In our approach, the output field of data is the input field X in Morphic Computing. To compute the coherence of X with the context, we project X into the context. Our computation is similar to a quantum measurement, where the aim is to find coherence between the behaviour of the particles and the instruments. Therefore, the quantum measurement projects the physical phenomena into the instrument as a context. The logic of our projection is the same as quantum logic in the quantum measurement. In conclusion, we compute how a context can implement desired results based on Morphic Computing.

Gabor [1972] and Fatmi and Resconi [1988] discovered the possibility of computing images, made of a huge number of points, as output from objects given as a huge number of points as input, by reference beams or laser light (holography). It is also known that a set of particles can have a huge number of possible states that, in classical physics, are separate from one another; however, only one state (position and velocity of the particles) would be possible at the same time. In quantum mechanics, one can have a superposition of all states, with all states present in the superposition at the same time. It is also very important to note that, at the same time, one cannot separate the states as individual entities but must consider them as one entity. Because of this very peculiar property of quantum mechanics, one can change all the superposed states at the same time. This type of global computation is the conceptual principle by which we think one can build quantum computers. Similar phenomena can be used to develop DNA computation, where a huge number of DNA strands, as a field of DNA elements, are transformed (replication) at the same time and filtered (selection) to solve non-polynomial problems. In addition, soft computing, or computation by words, extends the classical local definition of the true and false values of a logic predicate to a field of degrees of truth and falsity inside the space of all possible values of the
predicates. In this way, the computational power of soft computing is extended in a manner similar to what one finds in quantum computing, DNA computing, and optical computing by holography. In conclusion, one can expect that all the previous approaches and models of computing are examples of a more general computation model called "Morphic Computing", where "Morphic" means "form" and is associated with the ideas of holism, geometry, field, superposition, globality and so on. We claim that Morphic Computing is a natural extension of Optical Computation by holograms [Gabor 1972] and of holistic systems [Smuts 1926], though in the form of complex Systems of Systems [Kotov 1997], of Quantum Computation [Deutsch 1985, Feynman 1982, Omnès 1994, Nielsen and Chuang 2000], of Soft Computing [Zadeh 1991], and of DNA Computing [Adleman 1994]. All natural computation bounded by the Turing Machine [Turing 1936–7 and 1950], such as classical logic and AI [McCarthy et al. 1955], can be formalized and extended by our new type of computation model – Morphic Computing. In holistic systems (biological, chemical, social, economic, mental, linguistic, etc.) the properties of the system cannot be determined or explained by the sum of its component parts alone; "The whole is more than the sum of its parts" (Aristotle). In this paper, we introduce the basis for our new computing paradigm – Morphic Computing.
2 Morphic Computing and Field Theory: Classical and Modern Approach

2.1 Fields

In this paper, we assume that computing is not always related to symbolic entities such as numbers, words or other symbolic entities. Fields as entities are more complex than any symbolic representation of the knowledge. For example, Morphic Fields include the universal database for both organic (living) and abstract (mental) forms. In classical physics, we represent the interaction among particles by local forces that are the cause of the movement of the particles. Also in classical physics, it is more important to know the individual values of the forces at any moment than the structure of the forces. This approach considers the particles to be independent from the other particles under the effect of the external forces. But with the further development of the theory of particle physics, researchers discovered that forces are produced by intermediate entities that are not located at one particular point of space but are at any point of a specific space at the same time. These entities are called "Fields". Based on this new theory, the structure of the fields is more important than the value itself at any point. In this representation of the universe, any particle at any position is under the effect of the fields. Therefore, the fields are used to connect all the particles of the universe into one global entity. However, if any particle is under the effect of the other particles, every local invariant property will disappear, because every system is open and it is not possible to close any local system. To solve this invariance problem, scientists discovered that the local invariant can be conserved
with a deformation of the local geometry and the metric of the space. While the form of the invariant does not change for the field, the action is changed. However, these changes are only in reference to how we write the invariance. In conclusion, we can assume that the action of the fields can be substituted by a deformation of the space. The particle is then not under the action of the fields, and the invariances, such as energy, momentum, etc., remain true (physical symmetry). However, the references that we have chosen change in space and time, in such a way as to simulate the action of the field. In this case, the whole reference space has been changed and the action of the field is only a virtual phenomenon. We then have a different reference space whose geometry, in general, is non-Euclidean. With quantum phenomena, the problem becomes more complex because the particles are correlated with one another in a more hidden way, without any physical interaction through fields. This correlation or entanglement generates a structure inside the universe for which the probability of detecting a particle is a virtual or conceptual field that covers the entire Universe.
2.2 Morphic Computing: Basis for Quantum, DNA, and Soft Computing

Gabor [1972] and H. Fatmi and Resconi [1988] discovered the possibility of computing images, made of a huge number of points, as output from objects given as a huge number of points as input, by reference beams or laser light (holography). It is also known that a set of particles can have a huge number of possible states that, in classical physics, are separate from one another; however, only one state (position and velocity of the particles) would be possible at a time. In quantum mechanics, one can have a superposition of all states, with all states present in the superposition at the same time. It is also very important to note that, at the same time, one cannot separate the states as individual entities but must consider them as one entity. Because of this very peculiar property of quantum mechanics, one can change all the superposed states at the same time. This type of global computation is the conceptual principle by which we think one can build quantum computers. Similar phenomena can be used to develop DNA computation, where a huge number of DNA strands, as a field of DNA elements, are transformed (replication) at the same time and filtered (selection) to solve non-polynomial problems. In addition, soft computing, or computation by words, extends the classical local definition of the true and false values of a logic predicate to a field of degrees of truth and falsity inside the space of all possible values of the predicates. In this way, the computational power of soft computing is extended in a manner similar to what one finds in quantum computing, DNA computing, and Holographic computing. In conclusion, one can expect that all the previous approaches and models of computing are examples of a more general computation model called "Morphic Computing", where "Morphic" means "form" and is associated with the ideas of holism, geometry, field, superposition, globality and so on.
2.3 Morphic Computing and Conceptual Fields – Non-Physical Fields

Morphic Computing can change or compute non-physical conceptual fields. One example is the representation of the semantics of words. In this case, a field is generated by a word or a sentence acting as a source. For example, in a library the reference space would be where the documents are located. For any given word, we define the field as a map from the positions of the documents in the library to the number of occurrences (values) of the word in each document. The word or source is located at one point of the reference space (query), but the field (answer) can be located in any part of the reference space. Complex strings of words (a structured query) generate a complex field or complex answer, whose structure can be obtained by the superposition of the fields of the words as sources with different intensities. Any field is a vector in the space of the documents. A set of basic fields spans a vector space and forms a concept. We break with the traditional idea that a concept is one word in the conceptual map. The internal structure (entanglement) of the concept is the relation of dependence among the basic fields. The ambiguous word is the source (query) of the fuzzy set (field or answer).
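A toy numerical sketch of this word-field picture (documents, counts and intensities are invented for illustration): each word is a source whose field is its vector of occurrence counts over the documents, and a structured query superposes these fields with different source intensities.

```python
import numpy as np

# Reference space: five documents. Each word-field is the vector of occurrence
# counts of that word across the documents (all numbers invented).
docs = ["d1", "d2", "d3", "d4", "d5"]
field = {
    "fuzzy": np.array([3, 0, 1, 4, 0], dtype=float),
    "logic": np.array([2, 1, 0, 3, 1], dtype=float),
    "field": np.array([0, 5, 2, 0, 1], dtype=float),
}

# A structured query superposes the word-fields with chosen source intensities;
# the resulting vector is the "answer" field over the document space.
intensity = {"fuzzy": 1.0, "logic": 0.5, "field": 0.2}
answer = sum(intensity[w] * field[w] for w in field)

print(dict(zip(docs, answer)))
```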
2.4 Morphic Computing and Natural Languages – Theory of Generalized Constraint

In a particular case, we know that a key assumption in computing with words is that the information conveyed by a proposition expressed in a natural language, or by a word, may be represented as a generalized constraint of the form "X isr R", where X is a constrained variable, R is a constraining relation, and r is an indexing variable whose value defines the way in which R constrains X. Thus, if p is a proposition expressed in a natural language, then "X isr R" represents the meaning of p or, equivalently, the information conveyed by p. The generalised constraint model can therefore be represented by field theory in the following way: the meaning of any natural-language proposition p is given by the space X of the fields that form a concept in the reference space or objective space, and by a field R in the same reference space. We note that a concept is not only a word, but a domain or context X where the proposition p, represented by the field R, is located. The word in the new image is not a passive entity but an active one. In fact, the word is the source of the field. We can also use the idea that the word, as an abstract entity, is a query and the field, as the set of instances of the query, is the answer.
2.5 Morphogenetic and Neural Network

A neural network is a complex structure that connects simple entities denoted as neurons. The main feature of the neural network is the continuous evolution in
complexity of the interconnections. This evolutionary process is called morphogenesis. Besides the evolution of the structure, we also have an evolution of the biochemical network inside the neurons and of the relations among neurons. Biochemical morphogenesis is useful for adapting the neurons to the desired functionality. Finally, at a deeper level, the biochemical and structural morphogenesis is under the control of the gene-network morphogenesis. In fact, any gene will control the activity of the other genes through a continuously adaptive network. Morphogenesis is essential to brain plasticity so as to obtain homeostasis, or the invariance of the fundamental and vital functionality of the brain. The predefined vital functions interact with the neural network in the brain, whose activity is oriented to the implementation or projection of the vital functions into the neural activities. In conclusion, morphogenetic activity is oriented to compensate for the difference between the vital functions and the neural reply. The field nature of the designed functionality as input, and of the implemented functionality in the neural network as output, suggests the holistic nature of the neural network activity. The neural network, with its morphogenesis, can be considered a prototype of Morphic Computing.
2.6 Morphic Systems and Morphic System of Systems (M-SOS)

We know that the System of Systems (SoS) movement studies large-scale systems integration. Traditional systems engineering is based on the assumption that, given the requirements, the engineer will give you the system. The emerging system-of-systems context arises when a need or set of needs is met with a mix of multiple systems, each of which is capable of independent operation, in order to fulfil the global mission or missions. For example, design optimisation strategies focus on minimizing or maximizing an objective while meeting several constraints. These objectives and constraints typically characterize the performance of the individual system for a typical design mission or missions. However, these design strategies rarely address the impact on the performance of a larger system of systems, nor do they usually address the dynamic, evolving environment in which the system of systems must act. A great body of work exists that can address "organizing" a system of systems from existing single systems: resource allocation is currently used in any number of fields of engineering and business to improve profit and other systems-optimisation objectives. One reason for this new emphasis on large-scale systems is that customers want solutions that provide a set of capabilities, not a single specific vehicle or system meeting an exact set of specifications. Systems engineering is based on the assumption that, given the requirements, the engineer will give you the system; yet "there is growing recognition that one does not necessarily attack design problems solely at the component level." The cardinal point for SoS studies is that the solutions are unlikely to be found in any one field. There is an additional consideration: a fundamental characteristic of the problem areas where we detect SoS phenomena is that they are open systems, without fixed and stable boundaries, that adapt to changing circumstances. In SoS there are a common language and shared goals, and communication helps form
communities around a confluence of issues, goals, and actions. Therefore, if we wish to increase the dimensionality, we need tools not to replace human reasoning but to assist and support this type of associative reasoning. Important effects occur at multiple scales, involving not only multiple phenomena but phenomena of very different types. These, in turn, are intimately bound to diverse communities of affected individuals. Human activity now yields effects on a global scale, yet we lack the tools to understand the implications of our choices. Given a set of prototype systems, integrations (superpositions) of these systems generate a family of meta-systems. Given the requirement X, the MS (Morphic System) reshapes X in a minimal way, or not at all. The reshaped requirement belongs to the family of meta-systems and can be obtained by an integration of the prototype systems. In conclusion, the MS is the instrument with which to implement the SoS. To use an MS, we represent any individual system as a field; the superposition with the fields of the other systems is the integration process. This is similar to the quantum computer, where any particle (system) in quantum physics is a field of uncertainty (a probability density). The integration among particles is obtained by a superposition of the different uncertainties. The meta-system or SoS is the result of the integration. Any instrument in a quantum measurement has only a few possible kinds of integration, so the quantum measurement reshapes the given requirement, or physical integration, so as to be one of the possible integrations in the instrument. The quantum instrument is thus a particular case of the SoS and of the MS. In an MS we also give a suitable geometric structure (the space of the fields of the prototype systems) to represent integration in the SoS and correlation among the prototype systems. With this geometry we obtain an invariance property under transformations of the prototype systems and of the requirement X, so we adjoin to the SoS integration a dynamical process whose dynamical law is the invariant. Since the geometry is in general non-Euclidean, we can define for the SoS a morpho field (MF) whose action is to give the deformation of the geometry from Euclidean (independent prototype systems) to non-Euclidean (dependent prototype systems). The Morphic Field is comparable to the gravity field in general relativity. The models of quantum mechanics and general relativity are thus special cases of the MS and are also the physical models for the SoS.
2.7 Morphic Computing and Agents – Non-Classical Logic

In the agent image, where only one word (query) is used as a source for any agent, the field generated by the word (answer) is a Boolean field (the values at any point are true or false). Therefore, we can compose the words by logic operations to create complex Boolean expressions or complex Boolean queries. Such a query generates a Boolean field for any agent. The set of agents creates a set of elementary Boolean fields whose superposition is the fuzzy set represented by a field with fuzzy values. The field is the answer to the ambiguous structured query whose source is the complex expression p. The fields with fuzzy values for complex logic expressions are coherent with traditional fuzzy logic, with more conceptual transparency because they are founded on agents and Boolean logic structure. As [Nikravesh, 2006] points out,
the Web is a large, unstructured and in many cases conflicting set of data. So, in the World Wide Web, fuzzy logic and fuzzy sets are essential parts of a query and also of finding appropriate searches to obtain the relevant answer. In the agent interpretation of the fuzzy set, the net of the Web is structured as a set of conflicting and in many cases irrational agents whose task is to create any concept. Agents produce actions to create answers for ambiguous words in the Web. A structured query in RDF can be represented as a graph of three elementary concepts – subject, predicate and complement – in a conceptual map. Every word and relationship in the conceptual map are variables whose values are fields, and whose superposition gives the answer to the query. Because we are more interested in the meaning of the query than in how we write the query itself, we are more interested in the field than in how we produce the field by the query. In fact, different linguistic representations of the query can give the same field or answer. In the construction of the query, we use words as sources of fields with different intensities. With superposition, we obtain the answer to our structured query. We structure the text or query to build the desired field or meaning. It is also possible to use the answer, as a field, to generate the intensities of the words as sources inside a structured query. The first process is denoted the READ process, by which we can read the answer (meaning) of the structured query. The second process is the WRITE process, by which we give the intensity or rank of the words in a query when we know the answer. In analogy to holography, the WRITE process is the construction of the hologram when we know the light field of the object; the READ process is the construction of the light-field image by the hologram. In holography, the READ process uses a beam of coherent light, such as a laser, to obtain the image. Now, in our structured query, the words inside the text are activated at the same time. The words as sources are coherent in the construction, by superposition, of the desired answer or field. The field image of computation by words, in both a crisp and a fuzzy interpretation, thus prepares the implementation of the Morphic Computing approach to computation by words. In this way, we have presented an example of the meaning of the new type of computation, "Morphic Computing".
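The READ and WRITE processes described above admit a minimal numerical sketch (matrix and values invented; this is only an illustration of the idea, not the authors' implementation): READ superposes the word-fields with given source intensities to produce the answer field, while WRITE recovers the intensities from a desired answer field by least squares.

```python
import numpy as np

# Columns of F are the word-fields (one column per word, one row per document).
F = np.array([[3, 2, 0],
              [0, 1, 5],
              [1, 0, 2],
              [4, 3, 0],
              [0, 1, 1]], dtype=float)

# READ: from source intensities s to the answer field y = F s.
s = np.array([1.0, 0.5, 0.2])
y = F @ s

# WRITE: from a desired answer field x back to source intensities (least squares),
# i.e. the intensities of the best superposition approximating x.
x = np.array([4.0, 1.0, 1.0, 5.5, 0.7])
s_hat, *_ = np.linalg.lstsq(F, x, rcond=None)

print(y)       # field produced by the given sources (READ)
print(s_hat)   # intensities recovered from the desired field (WRITE)
```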
2.8 Morphic Computing: Basic Concepts Morphic Computing is based on the following concepts: 1) The concept of a field in the reference space. 2) The fields as points or vectors in the N-dimensional Euclidean space of the objects (points). 3) A set of M ≤ N basis fields in the N-dimensional space. The M basis fields are vectors in the N-dimensional space and form a non-Euclidean subspace H (the context) of the space of dimension N. The coordinates Sα of the field X in H are the contravariant components of X; they are also the intensities of the sources of the basis fields. The superposition of the basis fields with different intensities gives us the projection Q of X, i.e. Y = QX, into the space H.
When M < N, the projection operator of X into H defines a constraint, or relation, among the components of Y. 4) With tensor calculus on the components Sα of the vector X, or on the components of more complex entities such as tensors, we can generate invariants under any unitary transformation of the object space or change of the basis fields. 5) Given two projection operators Q1, Q2 on two spaces H1, H2 of dimensions M1 and M2, we can generate the space of dimension M = M1 M2 with the product of Y1 and Y2, i.e. Y = Y1 Y2. Any projection Q of X into the space H of the product of the basis fields generates Y. When Y ≠ Y1 Y2, the output Y is in an entangled state and cannot be separated into the two projections Q1 and Q2. 6) The logic of the Morphic Computing entity is the logic of the projection operators, which is isomorphic to quantum logic. Information can be coded inside the basis fields by the relation among the basis fields. In Morphic Computing, this relation is represented by a non-Euclidean geometry whose metric, or expression of the distance between two points, shows this relation. The projection operator is similar to the measurement in quantum mechanics and can introduce constraints among the components of Y. The sources are the instrument with which to control the image Y in Morphic Computing. There is a strong analogy between Morphic Computing and computation by holography and computation by secondary sources (Jessel 1982) in physical fields. The computation of Y from X by the projection operator Q that projects X into the space H gives its result when Y is similar to X; in this case, the sources S are the solution of the computation. We see the analogy with the neural network, where the solution is to find the weights wk at the synapses; in this paper, we show that in Morphic Computing the weights are sources. Now, it is possible to compose different projection operators in a network of Morphic Systems, and it is natural to consider this system as a System of Systems. Any Morphic Computation is always context dependent, where the context is H. The context H, through the operator Q, defines a set of rules that are comparable with the rules implemented in a digital computer. So when we change the context while keeping the same operations, we obtain different results, and we can control the context so as to obtain the desired results. When any projection of X, i.e. QX, is regarded as a measurement, in analogy with quantum mechanics, any projection operator depends on the previous projection operators: any measurement depends on the previous measurements, so any measurement depends on the path of measurements, or projection operators, realised before it, i.e. on its history. We can therefore say that different projection operators form a story (see Roland Omnès [1994] on stories in quantum mechanics). The measurement analogy also gives us another intuitive idea of Morphic Computing: a measurement is a good measurement when it gives us an image Y of the real phenomenon X that is similar to X, that is, when the internal rules of X are not destroyed. The same holds for Morphic Computing: the computation is a good computation when the projection operator does not destroy the internal relations of the input field X.
The analogy with measurement in quantum mechanics is also useful to explain the concept of Morphic Computing, because the instrument in a quantum measurement is the fundamental context that interferes with the physical phenomenon, just as H interferes with the input field X. A deeper connection exists between the projection-operator lattice that represents quantum logic and Morphic Computing processes (see Eddie Oshins 1992). Any fuzzy set is a scalar field of membership values on the factors (the reference space) (Wang and Sugeno, 1982), and we recall that any concept can be viewed as a fuzzy set in the factor space. So for fuzzy sets we can introduce all the processes and concepts that we utilise in Morphic Computing. Through the relation between concept and field, we introduce into the field theory an intrinsic fuzzy logic. Thus in Morphic Computing we have an external logic of the projection, or measurement (quantum logic), and a possible internal fuzzy logic coming from the fuzzy interpretation of the fields. Finally, because we also use the superposition of agents to define fuzzy sets and fuzzy rules, we can again use Morphic Computing to compute the agents' inconsistency and irrationality. Thus, fuzzy sets and fuzzy logic are part of the more general computation denoted as Morphic Computing.
3 Reference Space, Space of the Objects, and Space of the Fields in Morphic Computing Given the n-dimensional reference space (R1, R2, . . . , Rn), any point P = (R1, R2, . . . , Rn) is an object. We now create the space of the objects, whose dimension is equal to the number of points, and in which the value of each coordinate is equal to the value of the field at the corresponding point. We call this space of the points the "space of the objects". Any field connects all the points in the reference space and is represented as a point in the object space; the components of this vector are the values of the field at the different points. When two points are connected by a link, they assume the value of that connection, while all the other points assume the value zero. Thus any value of the field at a point can be considered as a degree of connection of this point with all the others. Therefore, at a point where the field is zero, we can consider the point as not connected to the others: because the field at this point is zero, the other points cannot be connected by the field to the given point. In conclusion, we consider the field as a global connector of the objects in the reference space. Inside the space of the objects we can locate any type of field as vectors or points. In field theory, we assume that any complex field can be considered as a superposition of prototype fields whose models are well known. The prototype fields are vectors in the space of the objects and form a new reference, or field, space. In general, the field space is a non-Euclidean space. In conclusion, any complex field Y can be written in this way:
Y = S1 H1(R1, . . . , Rn) + S2 H2(R1, . . . , Rn) + · · · + Sn Hn(R1, . . . , Rn) = H(R) S    (1)
In (1), H1, H2, . . . , Hn are the basic, or prototype, fields and S1, S2, . . . , Sn are the weights, or source values, of the basic fields. We assume that any basic field is generated by a source: the intensity of each prototype field is proportional to the intensity of the source that generates it.
3.1 Example of the Basic Field and Sources In Fig. 1, we show an example of two different basic fields in a two-dimensional reference space (x, y). The general equation of the fields is

F(x, y) = S exp(−h((x − x0)^2 + (y − y0)^2))    (2)
The parameters of the field F1 are S = 1, h = 2, x0 = −0.5 and y0 = −0.5; the parameters of the field F2 are S = 1, h = 2, x0 = 0.5 and y0 = 0.5. For the sources S1 = 1 and S2 = 1, the superposition field F shown in Fig. 2 is F = F1 + F2. For the sources S1 = 1 and S2 = 2, the superposition field F, shown again in Fig. 2, is F = F1 + 2 F2.
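To make the construction concrete, the following small Python sketch evaluates Eq. (2) and the two superpositions of Fig. 2; the grid resolution and variable names are illustrative choices of ours, not part of the original formulation.

```python
import numpy as np

# Sketch of Eq. (2): a Gaussian-shaped basic field on the 2-D reference space (x, y).
def basic_field(x, y, S=1.0, h=2.0, x0=0.0, y0=0.0):
    return S * np.exp(-h * ((x - x0) ** 2 + (y - y0) ** 2))

# Sample the reference space on a regular grid (arbitrary resolution).
x, y = np.meshgrid(np.linspace(-2, 2, 81), np.linspace(-2, 2, 81))

# The two prototype fields of Fig. 1 (S = 1, h = 2, centres at (-0.5, -0.5) and (0.5, 0.5)).
F1 = basic_field(x, y, S=1.0, h=2.0, x0=-0.5, y0=-0.5)
F2 = basic_field(x, y, S=1.0, h=2.0, x0=0.5, y0=0.5)

# The superpositions of Fig. 2: the source intensities S1, S2 weight the prototype fields.
F_a = 1.0 * F1 + 1.0 * F2   # sources S1 = 1, S2 = 1
F_b = 1.0 * F1 + 2.0 * F2   # sources S1 = 1, S2 = 2
```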
3.2 Computation of the Sources To compute the sources Sk, we represent the prototype fields Hk and the input field X in Table 1, where the objects are the points and the attributes are the fields. The values in Table 1 are represented by the following matrices:
Fig. 1 Two different basic fields in the two dimensional reference space (x,y)
Fig. 2 Example of superposition of the elementary fields F1, F2, shown for F = F1 + F2 and for F = F1 + 2F2

Table 1 Fields values for M points in the reference space

        H1      H2      ...    HN      Input Field X
P1      H1,1    H1,2    ...    H1,N    X1
P2      H2,1    H2,2    ...    H2,N    X2
...     ...     ...     ...    ...     ...
PM      HM,1    HM,2    ...    HM,N    XM
H = ⎡ H1,1  H1,2  ...  H1,N ⎤
    ⎢ H2,1  H2,2  ...  H2,N ⎥
    ⎢  ...   ...  ...   ... ⎥
    ⎣ HM,1  HM,2  ...  HM,N ⎦ ,    X = (X1, X2, . . . , XM)^T
The matrix H expresses the relation between the prototype fields Hk and the points Ph. At this point, we are interested in computing the sources S that give the best linear model of X in terms of the elementary field values. Therefore, we have the superposition expression

Y = S1 (H1,1, H2,1, . . . , HM,1)^T + S2 (H1,2, H2,2, . . . , HM,2)^T + · · · + Sn (H1,n, H2,n, . . . , HM,n)^T = H S    (3)
Then, we compute the best sources S such that the difference |Y − X| is the minimum distance over all possible choices of the set of sources. It is easy to show that the best sources are obtained by the expression

S = (H^T H)^{-1} H^T X    (4)
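A minimal numerical sketch of Eqs. (3)-(4): the prototype fields are stacked as the columns of H, the sources are recovered by solving the normal equations, and the reconstructed field Y = H S is checked to be the projection Q X of the input. The random test data are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 50, 3                               # M points in the reference space, N prototype fields
H = rng.normal(size=(M, N))                # columns of H = prototype fields sampled at the M points
X = H @ np.array([1.0, -0.5, 2.0]) + 0.01 * rng.normal(size=M)   # an input field (superposition + noise)

# Eq. (4): S = (H^T H)^{-1} H^T X, solved via the normal equations
S = np.linalg.solve(H.T @ H, H.T @ X)

# Eq. (3): reconstructed field Y = H S; it equals the projection Q X of X onto the space H
Y = H @ S
Q = H @ np.linalg.inv(H.T @ H) @ H.T
assert np.allclose(Y, Q @ X)
assert np.allclose(Q @ (Q @ X), Q @ X)     # Q is idempotent: Q^2 X = Q X
F = X - Q @ X                              # X = Q X + F, with F perpendicular to the prototype fields
assert np.allclose(H.T @ F, 0.0, atol=1e-8)
```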
Given the previous discussion and field representation, the elementary Morphic Computing element is given by the input-output system shown in Fig. 3.
Fig. 3 Elementary Morphic Computing: from the input field X and the prototype fields H(R), the sources S = (H^T H)^{-1} H^T X are computed and the output field Y = H S = QX is produced
Figure 4 shows a network of elementary Morphic Computing elements, with three sets of prototype fields and three types of sources, one general field X as input, one general field Y as output, and intermediate fields between X and Y. When H is a square matrix, we have Y = X and

S = H^{-1} X  and  Y = X = H S    (5)
Now, for any elementary computation in Morphic Computing, we have the following three fundamental spaces: 1) the reference space; 2) the space of the objects (points); 3) the space of the prototype fields. Figure 5 shows a very simple geometric example in which the number of objects is three (P1, P2, P3) and the number of prototype fields is two (H1, H2). The space whose coordinates are the two fields is the space of the fields. Please note that the output Y = H S is the projection of X into the space H:

Y = H (H^T H)^{-1} H^T X = Q X,  with the property  Q^2 X = Q X.

Therefore, the input X can be separated into two parts, X = QX + F, where the vector F is perpendicular to the space H, as we can see in the simple example given in Fig. 6.
Fig. 4 Network of Morphic Computing (input field X, sources S1, S2, S3, prototype fields H1, H2, H3, output field Y)
Fig. 5 The fields H1 and H2 are the space of the fields. The coordinates of the vectors H1 and H2 are the values of the fields in the three points P1 , P2 , P3
Now we extend the expression of the sources in the following way. Given G(Λ) = Λ^T Λ and G(H) = H^T H, let

S* = [G(H) + G(Λ)]^{-1} H^T X,  so that  [G(H) + G(Λ)] S* = H^T X.

Writing S* = (H^T H)^{-1} H^T X + Δ = S + Δ, we have

[G(H) + G(Λ)] (S + Δ) = H^T X,
(H^T H)(H^T H)^{-1} H^T X + G(Λ) S + [G(H) + G(Λ)] Δ = H^T X,
G(Λ) S + [G(H) + G(Λ)] Δ = 0,

and therefore

S* = S + Δ = (H^T H + Λ^T Λ)^{-1} H^T X,

where Δ is a function of S through the equation G(Λ) S + [G(H) + G(Λ)] Δ = 0. For a non-square and/or singular matrix, we can use the generalized model given by Nikravesh [] as follows:

S* = (H^T Λ^T Λ H)^{-1} H^T Λ^T Λ X = [(ΛH)^T (ΛH)]^{-1} (ΛH)^T (ΛX),

where we transform the input X and the references H by Λ. The value of the variable D (the metric of the space of the fields) is computed by the expression

D^2 = (H S)^T (H S) = S^T H^T H S = S^T G S = (Q X)^T Q X    (6)
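Assuming the regularized form given above, the following sketch checks the relation between S, S* and the correction Δ numerically; the choice of Λ (a scaled identity) and the test data are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 50, 3
H = rng.normal(size=(M, N))                # prototype fields
X = rng.normal(size=M)                     # input field

Lam = 0.1 * np.eye(N)                      # an assumed regularizing transformation Lambda
G_H = H.T @ H                              # G(H)
G_L = Lam.T @ Lam                          # G(Lambda)

S      = np.linalg.solve(G_H, H.T @ X)           # plain sources, Eq. (4)
S_star = np.linalg.solve(G_H + G_L, H.T @ X)     # regularized sources S*

# The correction Delta = S* - S satisfies G(Lambda) S + [G(H) + G(Lambda)] Delta = 0
Delta = S_star - S
assert np.allclose(G_L @ S + (G_H + G_L) @ Delta, 0.0, atol=1e-8)
```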
For a unitary transformation U, for which U^T U = I and H' = U H, the prototype fields change in the following way:
Fig. 6 Projection operator Q and output of the elementary Morphic Computing. We see that X = Q X + F , where the sum is the vector sum
H' = U H,  G' = (U H)^T (U H) = H^T U^T U H = H^T H,

and

S' = [(U H)^T (U H)]^{-1} (U H)^T Z = G^{-1} H^T U^T Z = G^{-1} H^T (U^{-1} Z).

For Z = U X we have S' = S, and the variable D is invariant under the unitary transformation U. We remark that G = H^T H is a square matrix that gives the metric tensor of the space of the fields. When G is a diagonal matrix, the elementary fields are independent of one another; but when G has non-diagonal elements, the elementary fields are dependent on one another. Among the elementary fields there is then a correlation, or relationship, and the geometry of the space of the fields is a non-Euclidean geometry.
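These invariance statements can be checked numerically; in the sketch below the unitary matrix U is generated at random and the data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 50, 3
H = rng.normal(size=(M, N))
X = rng.normal(size=M)

U, _ = np.linalg.qr(rng.normal(size=(M, M)))   # a random orthogonal matrix, so U^T U = I

S = np.linalg.solve(H.T @ H, H.T @ X)          # sources from the original fields

H2, Z = U @ H, U @ X                           # transformed fields H' = U H and input Z = U X
S2 = np.linalg.solve(H2.T @ H2, H2.T @ Z)      # sources from the transformed fields

G, G2 = H.T @ H, H2.T @ H2                     # metric tensors of the space of the fields
D2  = S  @ G  @ S                              # Eq. (6): D^2 = S^T G S
D2b = S2 @ G2 @ S2

assert np.allclose(G, G2)                      # G' = (U H)^T (U H) = H^T H
assert np.allclose(S, S2)                      # the sources are invariant: S' = S
assert np.allclose(D2, D2b)                    # and so is D
```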
References
1. L.M. Adleman (1994), "Molecular Computation of Solutions to Combinatorial Problems", Science 266 (11): 1021–1024, 1994.
2. H. Bergson (1896), Matter and Memory (Matière et mémoire), Zone Books 1990: ISBN 0-942299-05-1 (published in English in 1911).
3. D. Deutsch (1985), "Quantum Theory, the Church-Turing Principle, and the Universal Quantum Computer", Proc. Roy. Soc. Lond. A400, 97–117.
4. R.P. Feynman (1982), "Simulating Physics with Computers", International Journal of Theoretical Physics 21: 467–488.
5. H.A. Fatmi and G. Resconi (1988), A New Computing Principle, Il Nuovo Cimento, Vol. 101 B, N. 2, Febbraio 1988, pp. 239–242.
6. D. Gabor (1972), Holography 1948–1971, Proc. IEEE, Vol. 60, pp. 655–668, June 1972.
7. M. Jessel, Acoustique Théorique, Masson et Cie Éditeurs, 1973.
8. V. Kotov (1997), "Systems-of-Systems as Communicating Structures", Hewlett Packard Computer Systems Laboratory Paper HPL-97-124, 1997, pp. 1–15.
9. J. McCarthy, M.L. Minsky, N. Rochester, and C.E. Shannon (1955), Proposal for Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955.
10. M. Nielsen and I. Chuang (2000), Quantum Computation and Quantum Information, Cambridge: Cambridge University Press. ISBN 0-521-63503-9.
11. M. Nikravesh, Intelligent Computing Techniques for Complex Systems, in Soft Computing and Intelligent Data Analysis in Oil Exploration, pp. 651–672, Elsevier, 2003.
12. R. Omnès (1994), The Interpretation of Quantum Mechanics, Princeton Series in Physics, 1994.
13. E. Oshins, K.M. Ford, R.V. Rodriguez and F.D. Anger (1992), A comparative analysis: classical, fuzzy, and quantum logic. Presented at 2nd Florida Artificial Intelligence Research Symposium, St. Petersburg, Florida, April 5, 1989. In Fishman, M.B. (Ed.), Advances in Artificial Intelligence Research, Vol. II, Greenwich, CT: JAI Press. Most Innovative Paper Award, 1989, FLAIRS-89, Florida AI Research Symposium.
14. G. Resconi and M. Nikravesh, Morphic Computing, to be published in Forging New Frontiers, 2007.
15. G. Resconi and M. Nikravesh, Morphic Computing, Book in Preparation, 2007.
16. G. Resconi and M. Nikravesh, Field Theory and Computing with Words, FLINS 2006, 7th International FLINS Conference on Applied Artificial Intelligence, August 29–31, 2006, Italy.
17. G. Resconi and L.C. Jain (2004), Intelligent Agents, Springer, 2004.
18. R. Sheldrake (1981), A New Science of Life: The Hypothesis of Morphic Resonance (1981, second edition 1985), Park Street Press; Reprint edition (March 1, 1995).
19. R. Sheldrake (1988), The Presence of the Past: Morphic Resonance and the Habits of Nature (1988), Park Street Press; Reprint edition (March 1, 1995).
20. J.C. Smuts (1926), Holism and Evolution, 1926, MacMillan; Compass/Viking Press 1961 reprint: ISBN 0-598-63750-8; Greenwood Press 1973 reprint: ISBN 0-8371-6556-3; Sierra Sunrise 1999 (mildly edited): ISBN 1-887263-14-4.
21. A.M. Turing (1936–7), 'On computable numbers, with an application to the Entscheidungsproblem', Proc. London Maths. Soc., ser. 2, 42: 230–265; also in (Davis 1965) and (Gandy and Yates 2001).
22. A.M. Turing (1950), 'Computing machinery and intelligence', Mind 50: 433–460; also in (Boden 1990), (Ince 1992).
23. P.Z. Wang and M. Sugeno (1982), The factor fields and background structure for fuzzy subsets, Fuzzy Mathematics, Vol. 2, pp. 45–54.
24. L.A. Zadeh, Fuzzy Logic, Neural Networks and Soft Computing, Communications of the ACM, 37(3): 77–84, 1994.
25. L.A. Zadeh and M. Nikravesh (2002), Perception-Based Intelligent Decision Systems; Office of Naval Research, Summer 2002 Program Review, Covel Commons, University of California, Los Angeles, July 30–August 1, 2002.
26. L.A. Zadeh and J. Kacprzyk (eds.) (1999a), Computing With Words in Information/Intelligent Systems 1: Foundations, Physica-Verlag, Germany, 1999.
27. L.A. Zadeh and J. Kacprzyk (eds.) (1999b), Computing With Words in Information/Intelligent Systems 2: Applications, Physica-Verlag, Germany, 1999.
28. L.A. Zadeh, Fuzzy sets, Inf. Control 8, 338–353, 1965.
A Nonlinear Functional Analytic Framework for Modeling and Processing Fuzzy Sets
Rui J. P. de Figueiredo
Abstract We present a nonlinear functional analytic framework for modeling and processing fuzzy sets in terms of their membership functions. Let X = {x} denote a universe of discourse, and A a fuzzy set of elements x in X with membership function μ_A. First, we formally introduce a class C = {Ã} of attributes Ã, and a judgment criterion J, in the definition of μ_A; and explain the role of such an attribute- and judgment-based membership function μ_A(Ã, J, x) in the interpretation of A as a value-based or uncertainty-based fuzzy set. Second, for uncertainty-based fuzzy sets, we associate with each attribute Ã (e.g., old) a corresponding event, also denoted by Ã (e.g., the event of being old), the set C of all such events constituting a completely additive class in an appropriate probability space, x being a random variable, vector, or object induced by this space. On such a basis, we present and discuss the role of the membership function μ_A as a generalization of the concept of posterior probability P(Ã/x). This allows us to introduce rigorously, on a single platform, both human and machine judgment J in assigning objects to fuzzy sets by minimizing conditional risk. Third, we assume that X is a vector space endowed with a scalar product, such as a Euclidian Space or a separable Hilbert Space. Typically, X = {x} would be a feature space, its elements x = x(p) being feature vectors associated with objects p of interest in the discourse. Then, membership functions become membership "functionals", i.e., mappings from a vector space to the real line. Fourth, with this motivation, we focus attention on the class Φ of fuzzy sets A whose membership functions μ_A are analytic (nonlinear) functionals on X, which can therefore be represented by abstract power series in the elements x of X. Specifically, the μ_A in Φ are assumed to lie in a positive cone Λ of vectors in a Generalized Fock Space F(X). This F(X) is a Reproducing Kernel Hilbert Space (RKHS) of analytic functionals on X introduced by the author and T. A. W. Dwyer in 1980 for nonlinear signals and systems analysis. Fifth, in such a setting, we view the μ_A as vectors in F(X) acting as "representers" of their respective fuzzy sets A. Thus, because of the one-to-one relationship between the fuzzy sets A and
Rui J. P. de Figueiredo Laboratory for Intelligent Signal Processing and Communications, California Institute for Telecommunications and Information Technology, University of California Irvine, Irvine, CA, 92697-2800
their respective μ_A, the fuzzy sets A can benefit from all the results of analytical processing of their μ_A as vectors. Finally, we derive a "best model" Â for a fuzzy set A based on a best approximation μ̂_A of its membership functional μ_A in the space F(X), subject to appropriate interpolating or smoothing training-data constraints and a positivity constraint. The closed-form solution μ̂_A thus obtained appears in the form of an artificial neural network, proposed by the author in another theoretical context in 1990. For the purpose of illustration of the underlying technology, an application to the diagnosis of Alzheimer's disease is briefly discussed.
1 Introduction In 1965, L.A. Zadeh generalized the concept of an ordinary set to that of a fuzzy set [1]. In the four decades that have elapsed since then, this conceptual generalization has enabled the development of simple and yet powerful rules and methodologies for processing data sets taking into account the uncertainty present in them (see, e.g., [2, 3, 4, 5]). To further extend the capabilities of the above generalization, we present, in this chapter, a framework for modeling and processing fuzzy sets in terms of their membership functions. For this purpose, we formally introduce a class C = {Ã} of attributes Ã, and a judgment criterion J, in the definition of μ_A, and explain the role of such a μ_A(Ã, J, x) in the interpretation of a set A, consisting of elements x of a universal set X, as a value-based or uncertainty-based fuzzy set. For uncertainty-based fuzzy sets we assume that there is a one-to-one relationship between an attribute Ã and a corresponding event, denoted by the same symbol, in an appropriate probability space. In such a setting, we present and discuss the role of the membership function μ_A as a generalization of the concept of posterior probability P(Ã/x). This allows us to introduce rigorously, on a single platform, both human and machine judgment J in minimizing the conditional risk for the purpose of linking x with the attribute Ã characterizing the fuzzy set A. Then, under the conditions to be stated in a moment, we allow the membership functions μ_A to be analytic (nonlinear) functionals, from X to the real line, belonging to a Reproducing Kernel Hilbert Space F(X). We may think of them as "representers" of their respective fuzzy sets A. In such a setting, we pose the problem of finding the "best model" for A as a constrained optimization problem of finding the best approximation μ̂_A of the corresponding μ_A in F(X), under training-data constraints. In this way, the benefits of the vector-based analytical processing of μ_A are passed on to the fuzzy sets A that they represent. Specifically, we consider, in this chapter, a class of fuzzy sets A whose μ_A satisfy the following three conditions. 1. The universe of discourse X is a vector space endowed with a scalar product. Thus X may be a (possibly weighted) Euclidian Space or a separable Hilbert Space. Typically, the elements x = x(p) of X are feature vectors associated with objects p of interest in the discourse. Then, the membership functions μ_A(.) :
X → [0, 1] generalize into membership "functionals", i.e., mappings from a vector space to the real line. These functionals are, in general, nonlinear in x. 2. The μ_A are analytic, and hence representable by abstract power series in the elements x of X. 3. The μ_A belong to a positive cone of a Reproducing Kernel Hilbert Space F(X) of analytic functionals on the Hilbert Space X. The space F(X) was introduced in 1980 by de Figueiredo and Dwyer [6, 7] for nonlinear functional analysis of signals and systems. F(X) constitutes a generalization of the Symmetric Fock Space, the state space of non-self-interacting Boson fields in quantum field theory. In the next section, we discuss interpretations of μ_A based on an attribute variable Ã (e.g., "old", "rich", "educated", "hot", . . . , or any event of interest) and a judgment criterion J, and, in particular, focus on the role of the characterization of μ_A as a generalization of a posterior probability. This provides the motivation and basis for the material presented in the subsequent sections. In particular, as we have just indicated, under the above three conditions, we derive a "best model" Â for a fuzzy set A based on a best approximation μ̂_A of its membership functional μ_A in the space F(X). Let {v^i, u_i}, i = 1, . . . , q, denote a set of training pairs, where, for i = 1, . . . , q, u_i is the value μ̂_A(v^i) to be assigned by an intelligent machine or a human expert to a prototype feature vector v^i assumed to belong to Â. Then μ̂_A is obtained by minimizing the maximum error between μ̂_A and all other μ_A in an uncertainty set in F(X), subject to a positivity constraint and interpolating constraints μ̂_A(v^i) = u_i, i = 1, . . . , q. This set may be viewed as a fuzzy set of fuzzy set representatives. It turns out that the closed-form solution for μ̂_A thus obtained appears naturally in the form of an artificial neural network introduced by us, in a general theoretical setting, in 1990 and called an Optimal Interpolative Neural Network (OINN) [8, 9]. In conclusion, an example of application to medical diagnosis is given.
2 Interpretation of Fuzzy Sets Via Their Membership Functionals 2.1 What is a Fuzzy Set? Let us recall, at the outset, using our own clarifying notation, the following: Definition 1. Let X = {x} denote a universal collection of objects x constituting the universe of discourse. Then A is called a fuzzy set of objects x in X if and only if the following conditions are satisfied: (a) There is an attribute, or an event characterized by the respective attribute, Ã, and a judgment criterion J, on the basis of which the membership of x in A is judged.
(b) For a given Ã and J, there is a membership functional μ_A(Ã, J, .) : X → [0, 1], whose value at x expresses, on a scale of zero to one, the extent to which x belongs to A. Under these conditions, the fuzzy set A is defined as

A = Support{μ_A} = {x ∈ X : μ_A(Ã, J, x) ≠ 0}    (1)
From now on, for simplicity in notation, we will drop Ã and J in the arguments of μ_A unless, for the sake of clarity, they need to be invoked.¹ Remark 1. Let B denote an ordinary subset (called a "crisp" subset) of X with the characteristic function ϕ_B : X → {0, 1} ⊂ [0, 1], i.e.,

ϕ_B(x) = 1 if x ∈ B, and 0 otherwise    (2)
It is clear from (1) and (2) that B is a fuzzy set with membership functional

μ_B(x) = ϕ_B(x),  x ∈ X    (3)
Thus a crisp subset constitutes a special case of a fuzzy set and all statements regarding fuzzy sets hold for crisp sets or subsets, with the qualification expressed by (3). For this reason, fuzzy sets may well have been called “generalized sets.” Remark 2. According to condition (a) of Definition 1, there is a judgment J involved in the mathematical definition of a fuzzy set. This judgment may come from a human. For this reason, just as the conventional (crisp) set theory provides the foundations for conventional mathematical analysis, fuzzy set theory can provide a rigorous foundation for a more general mathematical analysis, which takes human judgment into account in the definitions of appropriate measure spaces, functions, functionals, and operators. This may explain why fuzzy set theory has led to “unconventional” mathematical developments, like fuzzy logic and approximate reasoning, used in modeling human/machine systems.
2.2 Interpretation of Membership Functionals For a given universe of discourse X, the following are two of the possible important interpretations of fuzzy sets via their respective membership functions. They are:
¹ Ends of formal statements will be signaled by ∎.
1. In the most general case (without the restriction that X be a vector space), a fuzzy set A may be viewed as a value-based set for which the membership functional μ_A assigns a normalized value μ_A(Ã, J; x) to an object x, based on an attribute Ã and judgment J. Note that the other interpretation listed below may also be considered value-based, where "value" is represented by the value of the dependent variable used in scoring the uncertainty in Ã. 2. As an uncertainty set, one may interpret the value of a membership functional as the value of a functional analogous to the posterior probability of Ã given x. In the following two sub-sections we discuss these two interpretations respectively.
2.2.1 Value-Based Fuzzy Sets This type of interpretation can best be explained by means of an example such as the following. Example 1. Value-Based Fuzzy Sets. Let the set of all persons p of ages between 0 and 100 years generate the universe of discourse. Specifically for this problem, choose the "Age" x(p) of p as the feature vector belonging to the one-dimensional real Euclidian Space E^1. Thus we have for the universe of discourse

X = {x ∈ E^1 : 0 ≤ x ≤ 100}.    (4)
We can construct various fuzzy sets using different attributes, say, an attribute Ã_1 such as "Old" or an attribute Ã_2 such as "Rich", as well as judgments from different entities, say a judgment J_1 from a private sector organization or a judgment J_2 from a public sector organization. For the fuzzy sets "Old", select, for example, the expression for the membership function (where we use this term instead of functional because its argument is scalar)

g(λ, x) = [1 − exp(−100λ)]^{-1} [1 − exp(−λx)],  0 ≤ x ≤ 100    (5)
Then, assuming that judgments J_1 and J_2 correspond to some values λ_1 and λ_2 of the parameter λ, we can construct the fuzzy sets A_1 and A_2 of "Old" persons as follows:

A_1 = {x ∈ [0, 100] : μ_A1(x) = μ_A1(Ã_1, J_1, x) = g(λ_1, x)}    (6a)
A_2 = {x ∈ [0, 100] : μ_A2(x) = μ_A2(Ã_1, J_2, x) = g(λ_2, x)}    (6b)
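A small Python sketch of Eq. (5) and the fuzzy sets (6a)-(6b), using the parameter values λ_1 = 0.03 and λ_2 = 0.05 shown in Fig. 1; the array names are ours.

```python
import numpy as np

def g(lam, x):
    """Eq. (5): membership function for the attribute "Old" on the age range [0, 100]."""
    return (1.0 - np.exp(-100.0 * lam)) ** -1 * (1.0 - np.exp(-lam * x))

ages = np.linspace(0.0, 100.0, 101)
mu_A1 = g(0.03, ages)    # fuzzy set A1 ("Old" under judgment J1, lambda_1 = 0.03 as in Fig. 1)
mu_A2 = g(0.05, ages)    # fuzzy set A2 ("Old" under judgment J2, lambda_2 = 0.05 as in Fig. 1)

# Both curves rise monotonically from 0 at age 0 to 1 at age 100, e.g.
print(g(0.03, 0.0), round(g(0.03, 50.0), 3), g(0.03, 100.0))   # approx. 0.0 0.818 1.0
```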
For the fuzzy sets "Rich", we may select, for example, another expression for the membership function,

h(λ, x) = [1 − exp(−10^4 λ)]^{-1} [1 − exp(−λx^2)],  0 ≤ x ≤ 100,    (7)

and construct the fuzzy sets A_3 and A_4 of "Rich" persons
Fig. 1 Old: membership function curves μ_A(x) = g(λ, x) versus x (Age), for λ_1 = 0.03 and λ_2 = 0.05
A_3 = {x ∈ [0, 100] : μ_A3(x) = μ_A3(Ã_2, J_1, x) = h(λ_1, x)}    (8a)
A_4 = {x ∈ [0, 100] : μ_A4(x) = μ_A4(Ã_2, J_2, x) = h(λ_2, x)}    (8b)
Figures 1 and 2 depict the shape of the membership function curves for the fuzzy sets of old and rich persons for the nominal values of the parameters λ shown. Remark 3. We glean from the above example the following important capabilities in the modeling of fuzzy sets: (1) attributes or events, as well as judgments from an intelligent machine and/or human, can be taken into account; (2) it is not required to partition, at the outset, the universal set into subsets, because different fuzzy sets may coincide with one another, and, in fact, with the entire universe of discourse. What distinguishes one fuzzy set from another is the difference in their membership functionals. This property, which is useful in modeling and processing the content of sets, is absent in the analysis of ordinary sets. Example 2. Image as an "Object" x or a "Value-Based Fuzzy Set" A. Consider an image f as an array of pixels

f = { f(u), u = (i, j) : i = 1, . . . , M, j = 1, . . . , N }    (9)
In the context of fuzzy sets, f allows two interpretations: (a) f is an object x = x( f ) belonging to a fuzzy set A of images; (b) f is a fuzzy set of pixel locations u = (i, j ), the gray level f (u) being the membership value of the fuzzy set f at the pixel location u.
Fig. 2 Rich: membership function curves μ_A(x) = h(λ, x) versus x (Age), for λ_3 = 0.0005 and λ_4 = 0.0008
In many image-oriented applications, such as medical imaging and SAR and HRR radar applications, interpretation (a) is used. Specifically, in interpretation (a), an image f is converted into a feature vector x = x(f), and then the membership functional pertaining to a target class Ã assigns to x a score value μ_A(x), between 0 and 1, according to the judgment of an appropriately trained machine or expert. An example of this type of application to the diagnosis of Alzheimer's disease is described at the end of this chapter. An application of interpretation (b) occurs in hyper-spectral image processing in remote sensing, where each sub-pixel is assigned, according to its spectral signature, a membership value w.r.t. different classes of materials, e.g., grass, sand, rock, . . . Example 3. Motion-Blurred Image. Consider the blurred image f shown in Fig. 3. It is a photo of a rapidly moving car taken by a still camera during a ChampCar championship race. The car is an MCI racing car that participated in the race. The blur is not caused by random noise but by a motion operator applied to f. Let Ã denote a class of racing cars. The membership of this car in Ã can be assigned by an expert with a prescribed degree of certainty. If the image is de-blurred using a de-blurring operator G, one obtains a de-blurred version of f, namely g = G(f). It will then be easier to classify g and therefore to assign a higher membership value w.r.t. the class to which it correctly belongs. So modeling of the membership functional based on uncertainty models that are not necessarily probabilistic can be of value in some applications, and, in fact, the approach that we will be presenting allows this option.
Fig. 3 Blurred image of an MCI racing car taken by a still camera while racing in a ChampCar Championship Race. The entire image may be considered to be an object x in the universe of car images X = {x} or a fuzzy set in the universe of 2-dimensional pixel locations as explained in the text
2.2.2 Uncertainty-Based Fuzzy Sets An uncertainty-based fuzzy set A is one for which the membership value μ_A(x) represents the degree of certainty, on a scale from 0 to 1, with which an element x of A has an attribute Ã or belongs to a category Ã. The assignment of the value can be made based on machine-, and/or human-, and/or probabilistically-based judgment. In a probabilistic setting, it is intuitively appealing to think that the human brain makes decisions by minimizing the conditional risk rather than by processing likelihood functions. For example, if one had the choice of investing a sum x in one of several stocks, one would choose the stock which one would judge to have the highest probability of success. To state the above in formal terms, we recall that if Ã_1, . . . , Ã_n are n attributes, or events, or hypotheses, then the Bayes conditional risk, or cost of making a decision in favor of Ã_i, is

C(Ã_i/x) = 1 − P(Ã_i/x)    (10)
So the optimal (minimum-risk) decision corresponds to assigning x to the Ã_i for which P(Ã_i/x) is maximum, i.e., such that

P(Ã_i/x) ≥ P(Ã_j/x),  j ≠ i,  j = 1, . . . , n    (11)
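The decision rule (10)-(11) reduces to picking the attribute with the largest posterior, as the following toy sketch illustrates (the posterior values are invented for illustration):

```python
import numpy as np

posteriors = np.array([0.2, 0.5, 0.3])     # hypothetical P(A_i | x) for n = 3 attributes
risks = 1.0 - posteriors                   # Eq. (10): Bayes conditional risk C(A_i / x)

i_star = int(np.argmax(posteriors))        # Eq. (11): minimum risk <=> maximum posterior
assert i_star == int(np.argmin(risks))
print("decide in favor of attribute index", i_star)   # -> 1
```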
So, on the above basis, we have the following proposition. Proposition 1. Let Ã denote the attribute or event characterizing a generic fuzzy set A, based on an (objective or subjective) judgment J. Then, in a probabilistic setting, the membership function μ_A constitutes a generalization of the posterior probability, expressed as

μ_A(Ã, J; x) = P(Ã|x),    (12)

if the objects x in A are random. If x is a nonrandom parameter appearing in the expression of the probability of Ã, then the above expression is replaced by

μ_A(Ã, J; x) = P(Ã; x)    (13)
In the case of (13), the posterior probability plays a role similar to that of the likelihood function in conventional estimation theory. Remark 4. Relaxing Restrictions on the Membership Values of Fuzzy Set Combinations. Fuzzy intersections, fuzzy unions, and fuzzy complements of fuzzy sets A and B defined by Zadeh [1] are equivalent to the following definitions:

A ∩ B = {x ∈ X : min[μ_A(x), μ_B(x)] ≠ 0}    (14)
A ∪ B = {x ∈ X : max[μ_A(x), μ_B(x)] ≠ 0}    (15)
A^C = {x ∈ X : 1 − μ_A(x) ≠ 0}    (16)
There are several ways one may agree to assign membership values to these sets. One that is most common is

μ_{A∩B}(x) = min[μ_A(x), μ_B(x)]    (17)
μ_{A∪B}(x) = max[μ_A(x), μ_B(x)]    (18)
μ_{A^C}(x) = 1 − μ_A(x)    (19)
Another set of assignments that leads to the same fuzzy sets defined in (14)–(16) is the one given by equations (20) to (22) below. The advantage of the latter choice is that it satisfies the same rules as the posterior probabilities identified in (12) and (13). Thus the following rules enable machines and humans to use the same framework in processing information.

μ_{A∩B}(x) = μ_A(x) μ_B(x)    (20)
μ_{A∪B}(x) = μ_A(x) + μ_B(x) − μ_A(x) μ_B(x)    (21)
μ_{A^C}(x) = 1 − μ_A(x)    (22)
Remark 5. Conditional Membership. More generally, following the analogy with probability, one may define a conditional membership μ A/B (x) in a straightforward way, and generalize (20) and (21) by
μ_{A∩B}(x) = μ_{A/B}(x) μ_B(x)    (23)
μ_{A∪B}(x) = μ_A(x) + μ_B(x) − μ_{A/B}(x) μ_B(x).    (24)
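Both families of combination rules, (17)-(19) and (20)-(22), are straightforward to compute; the sketch below uses made-up membership values and checks that the two choices give the same supports, hence the same fuzzy sets (14)-(16).

```python
import numpy as np

mu_A = np.array([0.2, 0.7, 1.0, 0.0])      # illustrative membership values on four objects
mu_B = np.array([0.5, 0.6, 0.3, 0.9])

# Min/max assignments, Eqs. (17)-(19)
inter_minmax = np.minimum(mu_A, mu_B)
union_minmax = np.maximum(mu_A, mu_B)
compl_A = 1.0 - mu_A

# Probability-like assignments, Eqs. (20)-(22), mirroring the posterior-probability rules
inter_prob = mu_A * mu_B
union_prob = mu_A + mu_B - mu_A * mu_B

# Both choices have the same supports, hence define the same fuzzy sets (14)-(16).
assert np.array_equal(inter_minmax > 0, inter_prob > 0)
assert np.array_equal(union_minmax > 0, union_prob > 0)
```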
Example 4. Memoryless Communication Channel. Figure 4 depicts a memoryless communication channel transmitting sets A_1, · · · , A_n of input symbols from an alphabet X = {x_1, · · · , x_m}. The interpretation is explained under the figure title. The system consists of n fuzzy sets A_1, · · · , A_n with a Universe of Discourse X = {x_1, · · · , x_m}. The x_i and Ã_j, i = 1, · · · , m, j = 1, · · · , n, may be viewed respectively as inputs to, and outputs from, the channel, and the membership functional values μ_{Aj}(x_i) as channel transition probabilities P(Ã_j|x_i).
3 Fuzzy Sets in a Euclidian Space Let the universal set X be an N-dimensional Euclidian space E^N over the reals, with the scalar product of any two elements x = (x_1, . . . , x_N)^T and y = (y_1, . . . , y_N)^T denoted and defined by

⟨x, y⟩ = x^T y = Σ_{i=1}^{N} x_i y_i,    (25)

where the superscript T denotes the transpose. If E^N is a weighted Euclidian Space with a positive definite weight matrix R, the scalar product is given by

⟨x, y⟩ = x^T R^{-1} y    (26)
As indicated before, typically such an X would constitute the space of finite-dimensional feature vectors x associated with the objects alluded to in a given discourse, as in the following example. Example 5. The database of a health care clinic for the diagnosis of the conditions of patients w.r.t. various illnesses could constitute a universal set X. In this space, each patient p would be represented by a feature vector x = x(p) ∈ E^N, consisting of N observations made on p. These observations could consist, for example, of lab test results and clinical test scores pertaining to p. Then a subset A in the feature space X with a membership functional μ_A could characterize the feature vectors of patients possibly affected by an illness Ã. For a given patient p, a physician or a computer program would assign a membership value μ_A(x(p)) = μ_A(x) to x, indicating the extent of sickness of the patient p w.r.t. the illness Ã (e.g., very well, fairly well, okay, slightly sick, or very sick).
Fig. 4 Memoryless communication channel model of a Fuzzy-Set-Based System: elements x_1, . . . , x_m of X (the Universe of Discourse) are linked, with membership values μ_{Aj}(x_i), to attributes A_1, . . . , A_n in C (a completely additive class of Attributes/Events). Fuzzy sets corresponding to the attributes shown in the figure are: A_1 = {x_1, x_2}, A_2 = {x_2, x_i, x_{i+1}}, A_3 = {x_1, x_i}, A_4 = {x_{i+1}, x_k}, . . . , A_j = {x_1, x_i, x_k}, A_{j+1} = {x_k, x_{k+2}, x_m}, A_{j+2} = {x_{k+1}, x_{k+2}, x_m}, . . . , A_n = {x_{k+2}, x_m}
3.1 Membership Functionals as Vectors in the RKH Space F(E^N) Returning to our formulation, in the present case, F is the space of bounded analytic functionals on a bounded set Ω ⊂ E^N defined by

Ω = {x ∈ E^N : ‖x‖ ≤ γ}    (27)
where γ is a positive constant selected according to a specific application. The conditions under which F is defined and the mathematical properties of F are given in the Appendix.
Remark 6. It is important to recall that, under the assumptions made in this paper, the fuzzy sets A under consideration belong to Ω, and the membership functionals μ_A belong to the positive cone in F

Λ = {μ_A ∈ F : μ_A(x) ≥ 0},    (28)

and satisfy the condition

μ_A(x) ∈ [0, 1]  ∀ x ∈ Ω ⊂ E^N    (29)
In our algorithms for modeling and processing μ_A, these constraints are satisfied. In view of the analyticity and other conditions satisfied by members of F, stated in the Appendix, any μ pertaining to a fuzzy set belonging to Ω can be represented in the form of an N-variable power series, known as an abstract Volterra series, in these N variables, absolutely convergent at every x ∈ Ω, expressible by:

μ(x) = Σ_{n=0}^{∞} (1/n!) μ_n(x)    (30)
where the μ_n are homogeneous Hilbert-Schmidt (H-S) polynomials of degree n in the components of x, a detailed representation of which is given in the Appendix. The scalar product of any μ_A and μ_B ∈ F, corresponding to fuzzy sets A and B, is defined by

⟨μ_A, μ_B⟩_F = Σ_{n=0}^{∞} (1/λ_n) (1/n!) Σ_{|k|=n} (|k|!/k!) c_k d_k    (31)
where λ_n, n = 0, 1, 2, . . ., is a sequence of positive weights expressing the prior uncertainty in the terms μ_n, n = 0, 1, 2, . . ., satisfying (A11) in the Appendix, and the c_k and d_k are defined for μ_A and μ_B in the same way as c_k is defined for μ in (A4) and (A8) in the Appendix. The reproducing kernel K(x, z) in F is

K(x, z) = ϕ(⟨x, z⟩) = Σ_{n=0}^{∞} (λ_n/n!) ⟨x, z⟩^n,    (32)
where ⟨x, z⟩ denotes the scalar product of x and z in E^N. In the special case in which λ_n = λ_0^n, ϕ is an exponential function, and thus K(x, z) becomes

K(x, z) = exp(λ_0 ⟨x, z⟩).    (33)
Remark 7. For a simple explanation of the above formulation for the case in which fuzzy sets are intervals on a real line, please see [16].
3.2 Best Approximation of a Membership Functional in F Now we address the important issue of the recovery of a membership functional μ, associated with a fuzzy set A ∈ Ω, from a set of training pairs (v^i, u_i), i = 1, 2, . . . , q, i.e.,

{(v^i, u_i) : v^i ∈ A ⊂ E^N, u_i = μ(v^i) ∈ R^1, i = 1, · · · , q},    (34)

and under the positivity constraint (29), which we re-write here as

0 ≤ μ(x) ≤ 1,  x ∈ A    (35)
The problem of best approximation μ̂ of μ can be posed as the solution of the optimization problem in F

inf_{μ ∈ F} sup_{μ̃ ∈ F} ‖μ − μ̃‖_F  subject to (34) and (35)    (36)

This is a quadratic programming problem in F that can be solved by standard algorithms available in the literature. However, a procedure that usually works well (see [10]) is the one which solves (36) recursively using an RLS algorithm under (34) alone. In such a learning-process setting, the pairs in (34) are used sequentially and a new pair is added whenever (35) is violated. This procedure is continued until (35) is satisfied. Such a procedure leads to the following closed form for μ̂, where the pairs (v^i, u_i) are all the pairs used until the end of the procedure:

μ̂(x) = u^T G^{-1} K̃(x)    (37)
where

K̃(x) = (ϕ(⟨v^1, x⟩), ϕ(⟨v^2, x⟩), . . . , ϕ(⟨v^q, x⟩))^T = (s_1(x), s_2(x), . . . , s_q(x))^T    (38)
u = (u_1, u_2, . . . , u_q)^T    (39)
and G is a q × q matrix with elements G_{ij}, i, j = 1, · · · , q, defined by

G_{ij} = ϕ(⟨v^i, v^j⟩)    (40)
In terms of the above, another convenient way of expressing (37) is

μ̂(x) = w_1 ϕ(⟨v^1, x⟩) + w_2 ϕ(⟨v^2, x⟩) + · · · + w_q ϕ(⟨v^q, x⟩) = w_1 s_1(x) + w_2 s_2(x) + . . . + w_q s_q(x)    (41)

where the w_i are the components of the vector w obtained by

w = G^{-1} u    (42)
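The closed-form solution (37)-(42) can be sketched compactly with the exponential kernel (33); the training pairs, the value of λ_0 and the function names below are illustrative assumptions of ours, and the positivity correction of Remark 8 is not applied.

```python
import numpy as np

lambda0 = 0.5

def phi(v, x):
    """Exponential reproducing kernel of Eq. (33): phi(<v, x>) = exp(lambda0 <v, x>)."""
    return np.exp(lambda0 * np.dot(v, x))

# Illustrative training pairs (v^i, u_i): prototype feature vectors and membership values.
V = np.array([[0.1, 0.2], [0.8, 0.9], [0.5, 0.4]])   # q = 3 prototypes in E^2
u = np.array([0.1, 0.9, 0.5])
q = len(u)

G = np.array([[phi(V[i], V[j]) for j in range(q)] for i in range(q)])   # Eq. (40)
w = np.linalg.solve(G, u)                                               # Eq. (42): w = G^{-1} u

def mu_hat(x):
    """Eq. (41): mu_hat(x) = sum_i w_i * phi(<v^i, x>)."""
    return float(sum(w[i] * phi(V[i], x) for i in range(q)))

# The interpolation constraints mu_hat(v^i) = u_i of (34) hold by construction.
assert np.allclose([mu_hat(v) for v in V], u)
print(mu_hat(np.array([0.4, 0.5])))   # membership score for a new feature vector
```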
Remark 8. It is important to note that the functional μ̂ expressed by (37) or, equivalently, (41) need not satisfy the condition (35) required for it to be a membership functional. If this condition is violated at any point, say x^0 ∈ X, then x^0 and its observed membership value at x^0 are inserted as an additional training pair (v^{q+1}, u_{q+1}) in the training set (34), and the procedure for computing the parameter vector w is repeated. This process is continued using the RLS algorithm until it converges to a membership functional that satisfies (35). For details on a general recursive learning algorithm that implements this process, see [10].
3.3 Neural Network Realization of a Membership Functional in the Space F The structure represented by (37) or, equivalently, (41) is shown in the block diagram of Fig. 5. It is clear from this figure that this structure corresponds to a two-hidden-layer artificial neural network, with the synapses and activation functions labeled according to the symbols appearing in (37) and (41). Such an artificial neural network, as an optimal realization of an input-output map in F based on a training set as given by (34), was first introduced by de Figueiredo in 1990 [8, 9] and called by him an Optimal Interpolative Neural Network (OINN). It now turns out, as developed above, that such a network is also an optimal realization of the membership functional of a fuzzy set in Ω.
Fig. 5 Optimal realization of the membership functional μ_A of a fuzzy set A ∈ Ω: the feature vector x = (x_1, x_2, . . . , x_N) feeds, through synapses v_{ij}, the activation functions s_1, s_2, . . . , s_q, whose outputs are combined with weights w_i in a summation node Σ to produce μ_A(x)
4 Fuzzy Sets in L2 Two important cases of applications in which the universal set X is infinite-dimensional are those in which the objects in X are waveforms x = {x(t) : a ≤ t ≤ b} or images x = {x(u, v) : a ≤ u ≤ b, c ≤ v ≤ d}. By simply carrying over the formulas for the scalar product in X given by (25) or (26) for the Euclidian case to the present case, all the remaining developments follow in the same way as before, with the correct interpretation of the inner products and with the understanding that summations now become integrations. For example, the scalar product, analogous to (25) and (26), for the case of two waveforms x and y would be

⟨x, y⟩ = ∫_a^b x(t) y(t) dt    (43)

and

⟨x, y⟩ = ∫_a^b ∫_a^b x(t) R^{-1}(t, s) y(s) dt ds    (44)
Then the synaptic weight summations in Fig. 5 would be converted into integrals representing matched filters matched to the prototype waveforms in the respective fuzzy set. The author called such networks dynamic functional artificial neural networks (D-FANNs) and provided a functional analytic method for their analysis in [12] and application in [14].
5 Applications Due to limitations of space, it is not possible to dwell on applications, except to briefly mention the following example of an analysis of a database of Brain Spectrogram image feature vectors x belonging to two mutually exclusive fuzzy sets A and B of images of patients with possible Alzheimer's and vascular dementia (for details, see [13]). Figure 6 shows the images of 12 slices extracted from a patient brain by single photon emission computed tomography (SPECT) with hexamethylpropyleneamine oxime technetium-99, abbreviated as HMPAO-99Tc. The components of
Fig. 6 Images of slices of Brain Spectrogram images of a prototype patient. The template frames are used to extract the components of the feature vector x characterizing the patient
Fig. 7 An OINN (Optimal Interpolative Neural Network) used to realize the membership functionals of two fuzzy sets of patients with two different types of dementia: the feature vector x = (x_1, x_2, . . . , x_N) of SPECT Image Regions-Of-Interest (ROI) intensities is correlated with prototype patients (α_1, α_2, β_1, β_2), and the resulting scores μ_A(x) and μ_B(x) for fuzzy set A and fuzzy set B yield the classification into a diagnosis class
the feature vector x are the average intensities of the image slices in the regions of interest in the templates displayed in the lower part of Fig. 6. This feature vector was applied as input to the OINN shown in Fig. 7, which is self-explanatory. The results of the study, including a comparison of the performance of a machine and a human expert in achieving the required objective, are displayed in Table 1.

Table 1 Result of the study of dementia based on the fuzzy set vector functional membership analysis

Source of Classification              Rate of Correct Classification*
Fuzzy-Set-Based Classification        81%
Radiological Diagnostician            77%

* 41 subjects: 15 Probable AD, 12 Probable VD, 10 Possible VD, 4 Normal.
6 Conclusion For the case in which the universe of discourse X is a vector space, we presented a framework for modeling and processing fuzzy sets based on the representation of their membership functionals as vectors belonging to a Reproducing Kernel Hilbert Space F(X) of analytic functionals. A number of interesting properties of such models have been described, including that of best approximation of membership functionals as vectors in F(X) under training data constraints. These best
approximation models of membership functionals naturally lead to their realization as artificial neural networks. Due to the one-to-one correspondence between fuzzy sets and their membership functionals, the benefits of vector processing of the latter are automatically carried over to the fuzzy sets that they represent. Potential application of the underlying technology has been illustrated by an example of computationally intelligent diagnosis of Alzheimer's disease.
References
1. Zadeh, L.A., "Fuzzy sets", Information and Control, 8, pp. 338–353, 1965.
2. Klir, G.J. and Yuan, B., Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems – Selected Papers by Lotfi A. Zadeh, World Scientific Publishing Co., 1996.
3. Klir, G.J. and Folger, T.A., Fuzzy Sets, Uncertainty, and Information, Prentice Hall, 1988.
4. Ruan, Da (Ed.), Fuzzy Set Theory and Advanced Mathematical Applications, Kluwer Academic, 1995.
5. Bezdek, J.C. and Pal, S.K. (Eds.), Fuzzy Models for Pattern Recognition, IEEE Press, 1992.
6. R.J.P. de Figueiredo and T.A.W. Dwyer, "A best approximation framework and implementation for simulation of large-scale nonlinear systems", IEEE Trans. on Circuits and Systems, vol. CAS-27, no. 11, pp. 1005–1014, November 1980.
7. R.J.P. de Figueiredo, "A generalized Fock space framework for nonlinear system and signal analysis", IEEE Trans. on Circuits and Systems, vol. CAS-30, no. 9, pp. 637–647, September 1983 (Special invited issue on "Nonlinear Circuits and Systems").
8. R.J.P. de Figueiredo, "A New Nonlinear Functional Analytic Framework for Modeling Artificial Neural Networks" (invited paper), Proceedings of the 1990 IEEE International Symposium on Circuits and Systems, New Orleans, LA, May 1–3, 1990, pp. 723–726.
9. R.J.P. de Figueiredo, "An Optimal Matching Score Net for Pattern Classification", Proceedings of the 1990 International Joint Conference on Neural Networks (IJCNN-90), San Diego, CA, June 17–21, 1990, Vol. 2, pp. 909–916.
10. S.K. Sin and R.J.P. de Figueiredo, "An evolution-oriented learning algorithm for the optimal interpolative neural net", IEEE Trans. Neural Networks, Vol. 3, No. 2, March 1992, pp. 315–323.
11. R.J.P. de Figueiredo and T. Eltoft, "Pattern classification of non-sparse data using optimal interpolative nets", Neurocomputing, vol. 10, no. 4, pp. 385–403, April 1996.
12. R.J.P. de Figueiredo, "Optimal interpolating and smoothing functional artificial neural networks (FANNs) based on a generalized Fock space framework", Circuits, Systems, and Signal Processing, vol. 17, no. 2, pp. 271–287, Birkhauser Boston, 1998.
13. R.J.P. de Figueiredo, W.R. Shankle, A. Maccato, M.B. Dick, P.Y. Mundkur, I. Mena, C.W. Cotman, "Neural-network-based classification of cognitively normal, demented, Alzheimer's disease and vascular dementia from brain SPECT image data", Proceedings of the National Academy of Sciences USA, vol. 92, pp. 5530–5534, June 1995.
14. T. Eltoft and R.J.P. de Figueiredo, "Nonlinear adaptive time series prediction with a Dynamical-Functional Artificial Neural Network", IEEE Transactions on Circuits and Systems, Part II, Vol. 47, no. 10, Oct. 2000, pp. 1131–1134.
15. R.J.P. de Figueiredo, "Beyond Volterra and Wiener: Optimal Modeling of Nonlinear Dynamical Systems in a Neural Space for Applications in Computational Intelligence", in Computational Intelligence: The Experts Speak, edited by Charles Robinson and David Fogel, volume commemorative of the 2002 World Congress on Computational Intelligence, published by IEEE and John Wiley & Sons, 2003.
16. R.J.P. de Figueiredo, "Processing fuzzy set membership functionals as vectors", to appear in a Special Issue of ISCJ honoring the 40th anniversary of the invention of fuzzy sets by L.A. Zadeh, International Journal on Soft Computing, 2007 (in press).
APPENDIX Brief overview of the Space F In this Appendix we briefly review some definitions and results on the space F(X) invoked in the body of the paper. For the sake of brevity, we focus on the case in which X is the N-dimensional Euclidian space E^N. The developments also apply, of course, when X is any separable Hilbert space, such as l_2, the space of square-summable strings of real numbers of infinite length (e.g., discrete-time signals), or L_2, the space of square-integrable waveforms or images. Thus let F consist of bounded analytic functionals μ on E^N [we denote members of F by μ because, in this paper, they are candidates for membership functionals of fuzzy sets in Ω]. Such functionals μ can be represented by abstract power (Volterra functional) series on E^N satisfying the following conditions. (a) μ is a real analytic functional on a bounded set Ω ⊂ E^N defined by
Ω = {x ∈ E^N : ‖x‖ ≤ γ}    (A1)

where γ is a positive constant. This implies that there is an N-variable power series, known as a Volterra series in these N variables, absolutely convergent at every x ∈ Ω, expressible by:

μ(x) = Σ_{n=0}^{∞} (1/n!) μ_n(x)    (A2)
where the μ_n are homogeneous Hilbert-Schmidt (H-S) polynomials of degree n in the components of x, given by

μ_n(x) = Σ_{|k| = k_1 + k_2 + · · · + k_N = n} c_{k_1···k_N} (|k|!/(k_1! · · · k_N!)) x_1^{k_1} · · · x_N^{k_N}    (A3)
       = Σ_{|k|=n} c_k (|k|!/k!) x^k    (A4)

where, in the last equality, we have used the notation
k = (k_1, · · · , k_N)    (A5)
|k| = Σ_{i=1}^{N} k_i    (A6)
k! = k_1! · · · k_N!    (A7)
c_k = c_{k_1···k_N}    (A8)
x^k = x_1^{k_1} · · · x_N^{k_N}    (A9)
(b) Let there be given a sequence of positive numbers

λ = {λ_0, λ_1, · · ·},    (A10)

where the weights λ_n express prior uncertainty in the terms μ_n, n = 0, 1, 2, . . ., and satisfy

Σ_{n=0}^{∞} (1/λ_n) (γ^{2n}/n!) < ∞    (A11)
Actually, some elements of λ, namely λ_k, k ∈ S, where S is a subset of non-negative integers, may be allowed to be zero, if we assume that μ belongs to a subspace of F consisting of power series in F with the terms of degree k ∈ S deleted. (c) Finally, the coefficients c_k in the terms μ_n of the abstract power series expansion of the membership function satisfy the restriction

Σ_{n=0}^{∞} (1/(n! λ_n)) Σ_{|k|=n} (|k|!/k!) |c_k|^2 < ∞    (A12)
The above allows us to state the following theorem. For a proof, see [6]. Theorem 1. (de Figueiredo / Dwyer) [6]. Under (A1), (A2), (A11), and (A12), the completion of the set of nonlinear functionals μ in (A2) constitutes a Reproducing Kernel Hilbert Space (RKHS) F(E^N) = F on Ω, with: (i) The scalar product between any μ_A and μ_B ∈ F, corresponding to fuzzy sets A and B, defined by

⟨μ_A, μ_B⟩_F = Σ_{n=0}^{∞} (1/λ_n) (1/n!) Σ_{|k|=n} (|k|!/k!) c_k d_k    (A13)

where the c_k and d_k are defined for μ_A and μ_B in the same way as c_k is defined for μ in (A4) and (A8).
(ii) The reproducing kernel K(x, z) in F is

K(x, z) = ϕ(⟨x, z⟩) = Σ_{n=0}^{∞} (λ_n/n!) ⟨x, z⟩^n    (A14)

where ⟨x, z⟩ denotes the inner product in E^N. In the special case that λ_n = λ_0^n, ϕ is an exponential function, and thus K(x, z) becomes

K(x, z) = exp(λ_0 ⟨x, z⟩)    (A15)
It is easily verified that the K(x, z) defined above has the reproducing property

⟨K(x, .), μ(.)⟩_F = μ(x)    (A16)
Methods for best approximation of a functional in F, on which the developments in Sect. 3.2 are based, are described in [6, 7, 8, 9, 10] and [15].
Concept-Based Search and Questionnaire Systems
Masoud Nikravesh
Abstract World Wide Web search engines, including Google, Yahoo and MSN, have become the most heavily-used online services (including targeted advertising), with millions of searches performed each day on unstructured sites. In this presentation, we would like to go beyond the traditional web search engines that are based on keyword search, and beyond the Semantic Web, which provides a common framework that allows data to be shared and reused across applications. For this reason, our view is that "Before one can use the power of web search, the relevant information has to be mined through a concept-based search mechanism and logical reasoning with a capability for Q&A representation, rather than simple keyword search". In this paper, we first present the current state of search engines. Then we focus on the development of a framework for reasoning and deduction in the web. A new web search model will be presented. One of the main core ideas that we will use to extend our technique is to change the terms-documents-concepts (TDC) matrix into a rule-based and graph-based representation. This will allow us to evolve the traditional search engine (keyword-based search) into a concept-based search and then into a Q&A model. Given the TDC, we will transform each document into a rule-based model, including its equivalent graph model. Once the TDC matrix has been transformed into maximally compact concepts based on graph representation and rules based on possibilistic relational universal fuzzy–type II (pertaining to composition), one can use the Z(n)-compact algorithm and transform the TDC into a decision tree and hierarchical graph that will represent a Q&A model. Finally, the concepts of semantic equivalence and semantic entailment based on possibilistic relational universal fuzzy will be used as a basis for question-answering (Q&A) and inference from fuzzy premises. This will provide a foundation for approximate reasoning, a language for the representation of imprecise knowledge, a meaning representation language for natural languages, precisiation of fuzzy propositions expressed in a natural language, and a tool for Precisiated Natural Language (PNL) and precisiation of meaning. The maximally compact documents based on the Z(n)-compact algorithm and
Masoud Nikravesh BISC Program, Computer Sciences Division, EECS Department and Imaging and Informatics Group-LBNL, University of California, Berkeley, CA 94720, USA e-mail:
[email protected]
possibilistic relational universal fuzzy–type II will be used to cluster the documents based on concept-based, query-based search criteria.

Keywords: Semantic web · Fuzzy query · Fuzzy search · PNL · NeuSearch · Z(n)-compact · BISC-DSS
1 Introduction

What is the Semantic Web? "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." – Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001

"The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming." – W3C organization (http://www.w3.org/2001/sw/)

"Facilities to put machine-understandable data on the Web are becoming a high priority for many communities. The Web can reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people. For the Web to scale, tomorrow's programs must be able to share and process data even when these programs have been designed totally independently. The Semantic Web is a vision: the idea of having data on the web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications." (http://www.w3.org/2001/sw/)

The Semantic Web is a mesh or network of information linked up in such a way that it can easily be accessed and processed by machines, on a global scale. One can think of the Semantic Web as an efficient way of representing and sharing data on the World Wide Web, or as a globally linked database. It is important to mention that Semantic Web technologies are still very much in their infancy and there seems to be little consensus about the likely characteristics of such systems. It is also important to keep in mind that data that is generally hidden in one way or another is often useful in some contexts, but not in others. It is also difficult to use such information on a large scale, because there is no global system for publishing data in such a way that it can be easily processed by anyone. For example, one can think of information about local hotels, sports events, car or home sales, insurance data, weather information, stock market data, subway or plane times, Major League Baseball or Football statistics, television guides, etc. All this information is presented by numerous web sites in HTML format, and it is therefore difficult to use the data/information in the way one might want to. To build any semantic web-based system, it will be necessary to construct a powerful logical language for making inferences and reasoning, so that the system becomes
expressive enough to help users in a wide range of situations. This paper will try to develop a framework to address this issue. Figure 1 shows Concept-Based Web-Based Databases for Intelligent Decision Analysis, a framework for the next generation of the Semantic Web. In the next sections, we will describe the components of such a system.
2 Mining Information and Questionnaire System

The central tasks for most search engines can be summarized as 1) the query or user information request (do what I mean and not what I say!), 2) the model for the Internet and Web representation (web page collection, documents, text, images, music, etc.), and 3) the ranking or matching function (degree of relevance, recall, precision, similarity, etc.). One can use clarification dialog, user profile, context, and ontology in an integrated framework to design a more intelligent search engine. The model will be used for intelligent information and knowledge retrieval through conceptual matching of text. The selected query does not need to match the decision criteria exactly, which gives the system a more human-like behavior. The model can also be used for constructing an ontology, or terms related to the context of the search or query, to resolve ambiguity. The new model can execute conceptual matching dealing with context-dependent word ambiguity and produce results in a format that permits the user to interact dynamically to customize and personalize its search strategy. It is also possible to automate ontology generation and document indexing using term similarity based on the Conceptual-Latent Semantic Indexing Technique (CLSI). Often it is hard to find the "right" term, and in some cases the term does not even exist. The ontology is automatically constructed from the text document collection and can be used for query refinement. It is also possible to generate a conceptual document similarity map that can be used for an intelligent search engine based on CLSI, personalization and user profiling. The user profile is automatically constructed from the text document collection and can be used for query refinement, for providing suggestions, and for ranking the information based on the pre-existing user profile. Given the ambiguity and imprecision of a "concept" on the Internet, which may be described by both textual and image information, the use of Fuzzy Conceptual Matching (FCM) is a necessity for search engines. In the FCM approach (Figs. 2 through 4), the "concept" is defined by a series of keywords with different weights depending on the importance of each keyword. Ambiguity in concepts can be defined by a set of imprecise concepts. Each imprecise concept in fact can be defined by a set of fuzzy concepts. The fuzzy concepts can then be related to a set of imprecise words given the context. Imprecise words can then be translated into precise words given the ontology and ambiguity resolution through clarification dialog. By constructing the ontology and fine-tuning the strength of links (weights), we can construct a fuzzy set to integrate piecewise the imprecise concepts and precise words to define the ambiguous concept. To develop FCM, a series of new tools are needed. The new tools are based on the fuzzy-logic-based
Fig. 1 Concept-Based Web-Based Databases for Intelligent Decision Analysis; a framework for the next generation of the Semantic Web. (The figure, omitted here, shows spiders/crawlers and a search index feeding unstructured and structured information; image extraction, analysis and annotation; a CFS unit with unsupervised clustering, SVM, SOM, tf.idf and evolutionary-kernel components; deduction and aggregation units; a Q/A system; expert-knowledge model representation including linguistic formulation (functional requirements, constraints, goals and objectives, linguistic-variable requirements); genetic algorithm/genetic programming operators (selection, cross over, mutation); knowledge discovery and data mining; data and model management with query, aggregation, ranking and fitness evaluation; data visualization; a knowledge base and inference engine serving as the Semantic Web engine; and the user interface/dialog function and knowledge base editor.)
Fig. 2 Evolution of the Term-Document Matrix representation. (The figure, omitted here, shows the progression from the [0, 1] representation based on bivalent-logic theory, to [tf-idf] based on statistical-probabilistic theory, to representations based on fuzzy set theory, and finally to [set] entries based on fuzzy set-object-based theory, as a path of increasing specialization.)
method of computing with words and perceptions (CWP [1, 2, 3, 4]), with the understanding that perceptions are described in a natural language [5, 6, 7, 8, 9], and on state-of-the-art computational intelligence techniques [10, 11, 12, 13, 14, 15]. Figure 2 shows the evolution of the Term-Document Matrix representation. The [0,1] representation of the term-document matrix (or, in general, storing the document based on its keywords) is the simplest representation. Most of the current search engines such as GoogleTM, Teoma, Yahoo!, and MSN use this technique to store the term-document information.
Fig. 3 Fuzzy Conceptual Similarity. (The figure, omitted here, shows query nodes q1…qi and result nodes r1…rj connected to a page p, with fuzzy conceptual similarity p(q1 to qi) and p(r1 to rj), and doc-doc similarity based on fuzzy similarity measures and rules between authorities, hubs and doc-doc links.)

Fig. 4 Matrix representation of the Fuzzy Conceptual Similarity model. (The figure, omitted here, shows a webpage-webpage matrix RX' whose entries are tuples (Text_Sim, In_Link, Out_Link, Rules, Concept). Text_Sim is based on the conceptual term-document matrix and is a fuzzy set; In_Link and Out_Link are based on the conceptual links, which include actual links and virtual links, and are fuzzy sets; Rules are fuzzy rules extracted from data or provided by the user; Concept consists of Precisiated Natural Language definitions extracted from data or provided by the user.)
One can extend this model by the use of ontology and other similarity measures. This is the core idea that we will use to extend this technique to FCM. In this case, the presence of a keyword that does not occur directly in a document is decided through its connection weights to other keywords that do occur in the document. For example, consider the following:

• The connection weight (based on the ontology) between term "i" (e.g. Automobile) and term "j" (e.g. Transportation) is 0.7, and the connection weight between term "j" (Transportation) and term "k" (City) is 0.3:
  ◦ wij = 0.7
  ◦ wjk = 0.3
• Term "i" does not occur in document "I", term "j" occurs in document "I", and term "k" does not occur in document "I":
  ◦ Ti DI = 0
  ◦ Tj DI = 1
  ◦ Tk DI = 0
• Given the above observations and a threshold of 0.5, one can modify the term-document matrix as follows:
  ◦ Ti DI' = 1
  ◦ Tj DI' = 1
  ◦ Tk DI' = 0
• In general, one can use a simple statistical model such as the co-occurrence matrix to calculate wij [5, 12].
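The short Python sketch below (not from the original paper) illustrates this thresholded propagation of keyword occurrence through ontology weights; the term names and weight values simply restate the example above.

```python
# Hypothetical sketch of the FCM extension of a binary term-document vector:
# a term absent from the document is switched on when its ontology weight to
# a present term reaches the threshold.
def extend_document(doc_terms, ontology_weights, threshold=0.5):
    """doc_terms: dict term -> 0/1; ontology_weights: dict (term_a, term_b) -> weight."""
    extended = dict(doc_terms)
    for (a, b), w in ontology_weights.items():
        if w >= threshold:
            # if one end of a strong link occurs in the document, add the other end
            if doc_terms.get(a, 0) == 1:
                extended[b] = 1
            if doc_terms.get(b, 0) == 1:
                extended[a] = 1
    return extended

doc_I = {"automobile": 0, "transportation": 1, "city": 0}
weights = {("automobile", "transportation"): 0.7, ("transportation", "city"): 0.3}
print(extend_document(doc_I, weights))
# -> {'automobile': 1, 'transportation': 1, 'city': 0}, matching the example above
```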
An alternative to the [0,1] representation is the use of the tf-idf (term frequency–inverse document frequency) model. In this case, each term gets a weight given its frequency in individual documents (tf, the frequency of the term in each document) and its frequency in all documents (idf, the inverse document frequency). There are many ways to create a tf-idf matrix [7]. In FCM, our main focus is the use of fuzzy set theory to find the association between terms and documents. Given such an association, one can represent each entry in the term-document matrix with a set rather than either [0,1] or tf-idf. The use of fuzzy-tf-idf is an alternative to the use of the conventional tf-idf. In this case, the original tf-idf weighting values are replaced by a fuzzy set representing the original crisp value of that specific term. To construct such a value, both ontology and similarity measures can be used. To develop the ontology and similarity, one can use conventional Latent Semantic Indexing (LSI) or Fuzzy-LSI [7]. Given this concept (FCM), one can also modify the link analysis (Fig. 3) and, in general, the webpage-webpage similarity (Fig. 4). More information about this project, and also a Java version of the Fuzzy Search Tool (FST) that uses the FCM model, is available at http://www.cs.berkeley.edu/∼nikraves/fst//SFT and in a series of papers by the author in Nikravesh, Zadeh and Kacprzyk [12]. Currently, we are extending our work to use graph theory to represent the term-document matrix instead of fuzzy sets. While each step increases the complexity and the cost of developing the FCM model, we believe this will increase the performance of the model, given our understanding based on the results that we have analyzed so far. Therefore, our target is to develop a more specialized and personalized model with better performance rather than a general, less personalized model with less accuracy. In this case, the cost and complexity will be justified.

One of the main core ideas that we will use to extend our technique is to change the terms-documents-concepts (TDC) matrix into a rule-based and graph-based representation. In the following section, we will illustrate how one can build such a model. Consider a terms-documents-concepts (TDC) matrix presented as in Table 1, where (note that the TDC entries kii^jj could be crisp numbers, tf-idf values, sets or fuzzy objects, including the linguistic labels of fuzzy granules, as shown in Fig. 2):

Di: documents, where i = 1 . . . m (in this example 12)
Keyj: terms/keywords in the documents, where j = 1 . . . n (in this example 3)
Cij: concepts, where ij = 1 . . . l (in this example 2)

One can use the Z(n)-compact algorithm (Sect. 3.1.7) to represent the TDC matrix with a rule-based model. Table 2 shows the intermediate results of the Z(n)-compact algorithm for concept 1. Table 3 shows how the original TDC matrix (Table 1) is represented, in the final pass, by a maximally compact representation. Once the TDC matrix is represented by a maximally compact representation (Table 3), one can translate this compact representation into rules, as presented in Tables 4 and 5. Table 6 shows the Z(n)-compact algorithm. Z(n)-compact is the basis used to create the web-web similarity shown in Fig. 4.
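As a side illustration of the tf-idf starting point described above (the toy corpus and the granule boundaries below are invented for the example, and the simple granulation step only hints at the fuzzy-tf-idf idea), a minimal Python sketch:

```python
import math
from collections import Counter

docs = {"D1": ["java", "coffee", "cup"], "D2": ["java", "computer", "code"],
        "D3": ["computer", "code", "bug"]}   # toy corpus, not from the paper

def tf_idf(docs):
    """Plain tf-idf weights: tf(term, doc) * log(N / df(term))."""
    n_docs = len(docs)
    df = Counter(term for words in docs.values() for term in set(words))
    weights = {}
    for d, words in docs.items():
        tf = Counter(words)
        weights[d] = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return weights

def granulate(w, low=0.3, high=0.8):
    """Crude granulation step: map a crisp tf-idf value to a linguistic label."""
    return "low" if w < low else ("medium" if w < high else "high")

W = tf_idf(docs)
print({d: {t: granulate(w) for t, w in row.items()} for d, row in W.items()})
```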
Table 1 Terms-Documents-Concepts (TDC) Matrix

Documents   key1    key2    key3    Concepts
D1          k1^1    k1^2    k1^3    c1
D2          k1^1    k2^2    k1^3    c1
D3          k2^1    k2^2    k1^3    c1
D4          k3^1    k2^2    k1^3    c1
D5          k3^1    k1^2    k2^3    c1
D6          k1^1    k2^2    k2^3    c1
D7          k2^1    k2^2    k2^3    c1
D8          k3^1    k2^2    k2^3    c1
D9          k3^1    k1^2    k1^3    c2
D10         k1^1    k1^2    k2^3    c2
D11         k2^1    k1^2    k1^3    c2
D12         k2^1    k1^2    k2^3    c2

(Here ki^j denotes the i-th value of keyword j.)
Table 2 Intermediate results for the Z(n)-compact algorithm (concept c1)

Documents                   key1    key2    key3    C
D1                          k1^1    k1^2    k1^3    c1
D2                          k1^1    k2^2    k1^3    c1
D3                          k2^1    k2^2    k1^3    c1
D4                          k3^1    k2^2    k1^3    c1
D5                          k3^1    k1^2    k2^3    c1
D6                          k1^1    k2^2    k2^3    c1
D7                          k2^1    k2^2    k2^3    c1
D8                          k3^1    k2^2    k2^3    c1

Merged rows:
D2, D3, D4                  *       k2^2    k1^3    c1
D6, D7, D8                  *       k2^2    k2^3    c1
D5, D8                      k3^1    *       k2^3    c1
D2, D6                      k1^1    k2^2    *       c1
D3, D7                      k2^1    k2^2    *       c1
D4, D8                      k3^1    k2^2    *       c1
D2, D3, D4, D6, D7, D8      *       k2^2    *       c1

Final rows for concept c1:
D1                          k1^1    k1^2    k1^3    c1
D5, D8                      k3^1    *       k2^3    c1
D2, D3, D4, D6, D7, D8      *       k2^2    *       c1
Table 3 Maximally Z(n)-compact representation of the TDC matrix

Documents                   key1    key2    key3    Concepts
D1                          k1^1    k1^2    k1^3    c1
D5, D8                      k3^1    *       k2^3    c1
D2, D3, D4, D6, D7, D8      *       k2^2    *       c1
D9                          k3^1    k1^2    k1^3    c2
D10                         k1^1    k1^2    k2^3    c2
D11, D12                    k2^1    k1^2    *       c2
Table 4 Rule-based representation of the Z(n)-compact TDC matrix

D1:                      If key1 is k1^1 and key2 is k1^2 and key3 is k1^3 THEN Concept is c1
D5, D8:                  If key1 is k3^1 and key3 is k2^3 THEN Concept is c1
D2, D3, D4, D6, D7, D8:  If key2 is k2^2 THEN Concept is c1
D9:                      If key1 is k3^1 and key2 is k1^2 and key3 is k1^3 THEN Concept is c2
D10:                     If key1 is k1^1 and key2 is k1^2 and key3 is k2^3 THEN Concept is c2
D11, D12:                If key1 is k2^1 and key2 is k1^2 THEN Concept is c2
Table 5 Rule-based representation of the maximally Z(n)-compact TDC matrix (alternative representation of Table 4)

D1; D5, D8; D2, D3, D4, D6, D7, D8:
  If (key1 is k1^1 and key2 is k1^2 and key3 is k1^3) OR (key1 is k3^1 and key3 is k2^3) OR (key2 is k2^2) THEN Concept is c1
D9; D10; D11, D12:
  If (key1 is k3^1 and key2 is k1^2 and key3 is k1^3) OR (key1 is k1^1 and key2 is k1^2 and key3 is k2^3) OR (key1 is k2^1 and key2 is k1^2) THEN Concept is c2
Table 6 Z(n)-Compactification Algorithm

Z(n)-compact algorithm: the following steps are performed successively for each column jj, jj = 1 . . . n.

1. Starting with kii^jj (ii = 1, jj = 1), check whether, for any kii^1 (ii = 1, . . . , 3 in this case), all the other columns are the same; if so, kii^1 can be replaced by ∗.
   • For example, we can replace kii^1 (ii = 1, . . . , 3) with ∗ in rows 2, 3, and 4. One can also replace kii^1 with ∗ in rows 6, 7, and 8 (Table 1, first pass).
2. Starting with kii^jj (ii = 1, jj = 2), check whether, for any kii^2 (ii = 1, . . . , 3 in this case), all the other columns are the same; if so, kii^2 can be replaced by ∗.
   • For example, we can replace kii^2 (ii = 1, . . . , 3) with ∗ in rows 5 and 8 (Table 1, first pass).
3. Repeat steps 1 and 2 for all jj.
4. Repeat steps 1 through 3 on the new rows created (rows containing ∗), from Pass 1 to Pass nn (in this case, Pass 1 to Pass 3).
   • For example, on rows 2, 3, 4 (Pass 1), check whether any of the rows, given columns jj, can be replaced by ∗. In this case, kii^3 can be replaced by ∗. This gives: (∗, k2^2, ∗).
   • For example, on Pass 3, check whether any of the rows, given columns jj, can be replaced by ∗. In this case, kii^1 can be replaced by ∗. This gives: (∗, k2^2, ∗).
5. Repeat steps 1 through 4 until no further compactification is possible.
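A minimal Python sketch of this compaction idea follows (my own simplified, greedy reading of Table 6, not the authors' implementation): within each concept, rows that agree on all but one column and together cover every observed value of that column are merged into a single row with ∗ in that column, and the passes repeat until nothing changes. Because it consumes rows greedily, it can produce a slightly different, though still consistent, grouping than Tables 2 and 3 (for example, it keeps D5 fully specified instead of merging it with D8).

```python
def zn_compact(rows, domains):
    """rows: set of key-value tuples for one concept; domains: per-column sets of observed values."""
    rows, changed = set(rows), True
    while changed:
        changed = False
        for j, dom in enumerate(domains):
            buckets = {}
            for r in rows:                        # group rows that agree on every column except j
                buckets.setdefault(r[:j] + r[j + 1:], set()).add(r)
            for rest, group in buckets.items():
                vals = {r[j] for r in group}
                if '*' not in vals and dom <= vals:   # column j fully covered -> replace by '*'
                    rows -= group
                    rows.add(rest[:j] + ('*',) + rest[j:])
                    changed = True
    return rows

c1 = {('k1', 'k1', 'k1'), ('k1', 'k2', 'k1'), ('k2', 'k2', 'k1'), ('k3', 'k2', 'k1'),
      ('k3', 'k1', 'k2'), ('k1', 'k2', 'k2'), ('k2', 'k2', 'k2'), ('k3', 'k2', 'k2')}
domains = [{'k1', 'k2', 'k3'}, {'k1', 'k2'}, {'k1', 'k2'}]
print(zn_compact(c1, domains))
# -> {('k1','k1','k1'), ('k3','k1','k2'), ('*','k2','*')}: three compact rows for concept c1
```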
As has been noted, the TDC entries need not be crisp numbers. The following cases are possible:

A. The kii^jj values are binary, [0 or 1]. This is the simplest case and Z(n)-compact works as presented.
B. The kii^jj values are tf-idf or similar statistically based values, or GA-GP context-based tf-idf, ranked tf-idf or fuzzy-tf-idf. In this case, we use fuzzy granulation to granulate tf-idf into a series of granules: two ([0 or 1], or [high and low]), three (i.e. low, medium, and high), etc. Then Z(n)-compact works as presented.
C. The kii^jj values are set values created from an ontology, which can be built with traditional statistically based methods, by hand, or as a fuzzy ontology. In this case, the first step is to find the similarities between the set values using statistical or fuzzy similarity measures. The BISC-DSS software has a set of similarity measures, T-norm and T-conorm operators, and aggregation operators for this purpose. The second step is to use fuzzy granulation to granulate the similarity values into a series of granules: two ([0 or 1], or [high and low]), three (i.e. low, medium, and high), etc. Then Z(n)-compact works as presented.
D. It is also important to note that the concepts themselves may not be crisp. Therefore, steps B and C can also be used to granulate the concepts, just as they are used to granulate the keyword entry values (kii^jj).
E. Another important issue is how to select the right keywords in the first place. One can use traditional statistical or probabilistic techniques based on tf-idf, non-traditional techniques such as GA-GP context-based tf-idf, ranked tf-idf or fuzzy-tf-idf, or clustering techniques. These techniques are used as a first pass to select the initial set of keywords. The second step is a feature selection technique aimed at maximally separating the concepts. These techniques are currently part of the BISC-DSS toolbox, which includes the Z(n)-Compact-Feature-Selection technique (Z(n)-FCS).
F. Other possible cases include keywords represented by a set of concepts (such as the Chinese-DNA model) or concepts presented as a set of keywords (traditional techniques). In the following sections, we will use the Neu-FCS model to create concepts automatically and relate the keywords to the concepts through a mesh of networks of neurons.
G. Another very important case is when the entries are not possibilistic and are in the general form "IF Keyi isri Ki and . . . THEN Class isrc Cj", where isr can be any of the following:
   r: =     equality constraint: X = R is an abbreviation of X is = R
   r: ≤     inequality constraint: X ≤ R
   r: ⊂     subsethood constraint: X ⊂ R
   r: blank  possibilistic constraint; X is R; R is the possibility distribution of X
   r: v     veristic constraint; X isv R; R is the verity distribution of X
   r: p     probabilistic constraint; X isp R; R is the probability distribution of X
   r: rs    random set constraint; X isrs R; R is the set-valued probability distribution of X
   r: fg    fuzzy graph constraint; X isfg R; X is a function and R is its fuzzy graph
   r: u     usuality constraint; X isu R means usually (X is R)
   r: g     group constraint; X isg R means that R constrains the attribute-values of the group
   • Primary constraints: possibilistic, probabilistic and veristic
   • Standard constraints: bivalent possibilistic, probabilistic and bivalent veristic
H. It is important to note that the keys can also be presented in the form of grammar and linguistics, such as subjects 1 to n, verbs 1 to m, objects 1 to l, etc. In this case, each sentence can be presented in the "isr" form or as "IF . . . THEN" rules in the "isr" or "is" format.

Once the TDC matrix has been transformed into a maximally compact concept based on the graph representation and on rules based on possibilistic relational universal fuzzy – Type I, II, III, and IV (pertaining to modification, composition, quantification, and qualification), one can use the Z(n)-compact algorithm and transform the TDC into a decision tree and hierarchical graph that represents a Q&A model. Finally, the concept of semantic equivalence and semantic entailment based on possibilistic relational universal fuzzy will be used as a basis for question-answering (Q&A) and inference from fuzzy premises. This will provide a foundation for approximate reasoning, a language for representation of imprecise knowledge, a meaning representation language for natural languages, precisiation of fuzzy propositions expressed in a natural language, and a tool for Precisiated Natural Language (PNL) and precisiation of meaning. The maximally compact documents based on the Z(n)-compact algorithm and possibilistic relational universal fuzzy – Type II will be used to cluster the documents based on concept-based, query-based search criteria. Table 7 shows the technique based on possibilistic relational universal fuzzy – Type I, II, III, and IV (pertaining to modification, composition, quantification, and qualification) [3, 4], together with a series of examples to clarify the techniques. Types I through IV of possibilistic relational universal fuzzy, in connection with Z(n)-compact, can be used as a basis for precisiation of meaning. Tables 8 through 17 show the technique

Table 7 Four Types of PRUF: Type I, Type II, Type III and Type IV

PRUF Type I: pertaining to modification
  X is very small
  X is much larger than Y
  Eleanor was very upset
  The man with the blond hair is very tall

PRUF Type II: pertaining to composition
  X is small and Y is large (conjunctive composition)
  X is small or Y is large (disjunctive composition)
  If X is small then Y is large (conditional and conjunctive composition)
  If X is small then Y is large else Y is very large (conditional and conjunctive composition)

PRUF Type III: pertaining to quantification
  Most Swedes are tall
  Many men are much taller than most men
  Most tall men are very intelligent

PRUF Type IV: pertaining to qualification
  Abe is young is not very true (truth qualification)
  Abe is young is quite probable (probability qualification)
  Abe is young is almost impossible (possibility qualification)
Table 8 Possibilistic Relational Universal Fuzzy – Type I; pertaining to modification, rules of Type I (the basis is the modifier)

  Proposition p:            p : N is F
  Modified proposition p+:  p+ = N is mF,  m: not, very, more or less, quite, extremely, etc.

  Example: Lisa is very young → Π_Age(Lisa) = YOUNG²
    μ_YOUNG = 1 − S(25, 35, 45)
    μ_YOUNG² = (1 − S(25, 35, 45))²

  Example: p: Vera and Pat are close friends
    Approximation p → p∗: Vera and Pat are friends²
    π(FRIENDS) = μ²_FRIENDS(Name1 = Vera; Name2 = Pat)
Table 9 Possibilistic Relational Universal Fuzzy – Type II; operation of composition (to be used to compact the rules presented in Table 5)

  R:   X1    X2    ...   Xn
       F11   F12   ...   F1n
       ...   ...   ...   ...
       Fm1   Fm2   ...   Fmn

  R = X1 is F11 and X2 is F12 and . . . and Xn is F1n
      OR X1 is F21 and X2 is F22 and . . . and Xn is F2n
      OR . . .
      OR X1 is Fm1 and X2 is Fm2 and . . . and Xn is Fmn

  R → (F11 × · · · × F1n) + · · · + (Fm1 × · · · × Fmn)
Table 10 Possibilistic Relational Universal Fuzzy–Type II – pertaining to composition – main operators p = q∗ r q:M is F r:N is G M is F and N is G : F ∩ G = F × G M is F or N is G = F + G if M is F then N is G = F ⊕ G or If M is F then N is G = F × G + F × V F
=
F
×V
G =U ×G μ F×G (u, v) = μ F (u) ∧ μG (V ) μF⊕G (u, v) = 1 ∧ (1 − μ F (u) + μG (V )) ∧ : min, +arithmetic sum, -arithmetic difference
If M is F then N is G else N is H Π(X,...,X m ,Y1 ,...,Yn ) = (F ⊕ G) ∩ (F ⊕ H ) or If M is F then N is G else N isH ↔ (If M is F then N is G) and (If M is not F then N is H)
Table 11 Possibilistic Relational Universal Fuzzy – Type II; examples

  Example: U = V = 1 + 2 + 3, M: X, N: Y
    F: SMALL = 1/1 + 0.6/2 + 0.1/3
    G: LARGE = 0.1/1 + 0.6/2 + 1/3

  X is small and Y is large:
    Π(x,y) = [ 0.1  0.6  1
               0.1  0.6  0.6
               0.1  0.1  0.1 ]

  X is small or Y is large:
    Π(x,y) = [ 1    1    1
               0.6  0.6  1
               0.1  0.6  1 ]

  If X is small then Y is large (using F ⊕ G):
    Π(x,y) = [ 0.1  0.6  1
               0.5  1    1
               1    1    1 ]

  If X is small then Y is large (using F × G + F′ × V):
    Π(x,y) = [ 0.1  0.6  1
               0.4  0.6  0.6
               0.9  0.9  0.9 ]
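The four Π(x,y) matrices of Table 11 follow directly from the definitions in Table 10; the short Python sketch below (not part of the original chapter) reproduces them for SMALL = 1/1 + 0.6/2 + 0.1/3 and LARGE = 0.1/1 + 0.6/2 + 1/3.

```python
import numpy as np

small = np.array([1.0, 0.6, 0.1])   # mu_SMALL on U = {1, 2, 3}
large = np.array([0.1, 0.6, 1.0])   # mu_LARGE on V = {1, 2, 3}
F = small[:, None]                  # cylindrical extensions over U x V
G = large[None, :]

and_comp = np.minimum(F, G)                           # F x G : min(mu_F(u), mu_G(v))
or_comp = np.maximum(F, G)                            # F + G : max(mu_F(u), mu_G(v))
bounded_impl = np.minimum(1.0, 1.0 - F + G)           # F (+) G : 1 ^ (1 - mu_F(u) + mu_G(v))
maxmin_impl = np.maximum(1.0 - F, np.minimum(F, G))   # F x G + F' x V

for name, m in [("and", and_comp), ("or", or_comp),
                ("if-then (bounded sum)", bounded_impl), ("if-then (max-min)", maxmin_impl)]:
    print(name)
    print(np.round(m, 2))   # reproduces the four matrices of Table 11
```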
Table 12 Possibilistic Relational Universal Fuzzy – Type III; pertaining to quantification, rules of Type III; p: Q N are F

  p = Q N are F,  Q: fuzzy quantifier (most, many, few, some, almost, ...)
  Example: Most Swedes are tall
  N is F → Π_X = F
  Q N are F → Π_count(F) = Q
  m p ↔ q, with Π_q given by:
    1 − Π_p     if m: not
    (Π_p)²      if m: very
    (Π_p)^0.5   if m: more or less

Table 13 Possibilistic Relational Universal Fuzzy – Type III; pertaining to quantification, rules of Type III; p: Q N are F, examples

  m (N is F) ↔ N is mF
  not (N is F) ↔ N is not F
  very (N is F) ↔ N is very F
  more or less (N is F) ↔ N is more or less F
  m (M is F and N is G) ↔ (X, Y) is m(F × G)
  not (M is F and N is G) ↔ (X, Y) is (F × G)′ ↔ (M is not F) or (N is not G)
  very (M is F and N is G) ↔ (M is very F) and (N is very G)
  more or less (M is F and N is G) ↔ (M is more or less F) and (N is more or less G)
  m (Q N are F) ↔ (m Q) N are F
  not (Q N are F) ↔ (not Q) N are F
Table 14 Possibilistic Relational Universal Fuzzy – Type IV; pertaining to qualification, rules of Type IV; q: p is ?, Example 1

  T: a truth value, a probability value, a possibility value, . . . for p: N is F; let q be a truth-qualified version of p
  q: N is F is T, where T is a linguistic truth value
  N is F is T ↔ N is G, where μ_G(u) = μ_T(μ_F(u)) and T = μ_F(G)
  Example: q: N is small is very true
    μ_SMALL = 1 − S(5, 10, 15),  μ_TRUE = S(0.6, 0.8, 1.0)
  N is F → Π_X = F; then N is F is T → Π_X = F+, with μ_F+(u) = μ_T(μ_F(u))
  q → Π_X(u) = S²(1 − S(u; 5, 10, 15); 0.6, 0.8, 1.0)
  If T = u-true, with μ_u-true(v) = v, v ∈ [0, 1], then N is F is u-true → N is F

Table 15 Possibilistic Relational Universal Fuzzy – Type IV; pertaining to qualification, rules of Type IV; q: p is ?, Example 2

  (Same construction as in Table 14: a qualified proposition q: N is F is T is translated as N is G with μ_G(u) = μ_T(μ_F(u)); for μ_SMALL = 1 − S(5, 10, 15) and μ_TRUE = S(0.6, 0.8, 1.0), "N is small is very true" yields Π_X(u) = S²(1 − S(u; 5, 10, 15); 0.6, 0.8, 1.0), and N is F is u-true → N is F.)

Table 16 Possibilistic Relational Universal Fuzzy – Type IV; pertaining to qualification, rules of Type IV; q: p is ?, Example 3

  m (N is F is T) ↔ N is F is m T
  not (N is F is T) ↔ N is F is not T
  very (N is F is T) ↔ N is F is very T
  more or less (N is F is T) ↔ N is F is more or less T
  N is not F is T ↔ N is F is ant T   (ant: antonym; false = ant true)
  N is very F is T ↔ N is F is 0.5T,  where μ_{0.5T}(v) = μ_T(v²) and, in general, μ_{aT}(v) = μ_T(v^{1/a})
Table 17 Possibilistic Relational Universal Fuzzy – Type IV; pertaining to qualification, rules of Type IV; q: p is ?, Example 4

  "Barbara is not very rich"
  Semantically equivalent propositions:
    Barbara is not very rich
    Barbara is not very rich is u-true
    Barbara is very rich is ant u-true
    Barbara is rich is 0.5(ant u-true),  where μ_{0.5(ant u-true)}(v) = 1 − v²
  Since true is approximately semantically equal to u-true, the proposition can be approximated by:
    "Barbara is rich is not very true"
  The answer to "Is Barbara rich?" is "not very true".
based on possibilistic relational universal fuzzy – Type I, II, III, and IV (pertaining to modification, composition, quantification, and qualification) [11, 12, 13], together with a series of examples to clarify the definitions.
2.1 Fuzzy Conceptual Match

The main problems with conventional information retrieval and search, such as the vector space representation of term-document vectors, are that 1) there is no real theoretical basis for the assumption of a term and document space and 2) terms and documents are not really orthogonal dimensions. These techniques are used more for visualization, and most similarity measures work about the same regardless of the model. In addition, terms are not independent of all other terms. With regard to probabilistic models, important indicators of relevance may not be terms, although usually only terms are used. Regarding the Boolean model, complex query syntax is often misunderstood, and problems of null output and information overload exist. One solution to these problems is to use an extended Boolean model or fuzzy logic. In this case, one can add a fuzzy quantifier to each term or concept. In addition, one can interpret AND as a fuzzy-MIN and OR as a fuzzy-MAX function. Alternatively, one can add agents to the user interface and assign certain tasks to them, or use machine learning to learn user behavior or preferences to improve performance. This technique is useful when past behavior is a useful predictor of the future and a wide variety of behaviors exists amongst users. From our perspective, we define this framework as Fuzzy Conceptual Matching based on the Human Mental Model. The Conceptual Fuzzy Search (CFS) model will be used for intelligent information and knowledge retrieval through conceptual matching of both text and images (here defined as "Concept"). The selected query does not need to match the decision criteria exactly, which gives the system a more human-like behavior. The CFS can also be used for constructing a fuzzy ontology, or terms related to the context of the search or query, to resolve ambiguity. It is intended to combine expert knowledge with soft computing tools. Expert knowledge needs to be partially converted into artificial intelligence that can better handle the huge information stream. In addition, sophisticated management
workflows need to be designed to make optimal use of this information. The new model can execute conceptual matching dealing with context-dependent word ambiguity and produce results in a format that permits the user to interact dynamically to customize and personalize its search strategy.
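To make the extended Boolean idea mentioned above concrete, here is a small Python sketch (an illustration of the general fuzzy AND/OR interpretation, not code from the paper) that scores documents against a query by taking fuzzy-MIN for AND and fuzzy-MAX for OR over term membership degrees; the membership values are invented for the example.

```python
# Each document holds fuzzy membership degrees of terms (e.g. from a fuzzy term-document matrix).
docs = {
    "D1": {"java": 0.9, "coffee": 0.7, "computer": 0.1},
    "D2": {"java": 0.8, "coffee": 0.1, "computer": 0.9},
}

def fuzzy_and(*degrees):   # AND interpreted as fuzzy-MIN
    return min(degrees)

def fuzzy_or(*degrees):    # OR interpreted as fuzzy-MAX
    return max(degrees)

def score(doc, query):
    """query: ('and'|'or', term, term, ...) evaluated over the document's term memberships."""
    op, *terms = query
    degrees = [doc.get(t, 0.0) for t in terms]
    return fuzzy_and(*degrees) if op == "and" else fuzzy_or(*degrees)

query = ("and", "java", "coffee")          # "java AND coffee"
ranking = sorted(docs, key=lambda d: score(docs[d], query), reverse=True)
print(ranking)   # ['D1', 'D2']: D1 matches the query concept better
```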
2.2 From Search Engine to Q/A and Questionnaire Systems

(Extracted text from Prof. Zadeh's presentation and abstracts; Nikravesh et al., Web Intelligence: Conceptual-Based Model, Memorandum No. UCB/ERL M03/19, 5 June 2003): "Construction of Q/A systems has a long history in AI. Interest in Q/A systems peaked in the seventies and eighties, and began to decline when it became obvious that the available tools were not adequate for construction of systems having significant question-answering capabilities. However, Q/A systems in the form of domain-restricted expert systems have proved to be of value, and are growing in versatility, visibility and importance. Upgrading a search engine to a Q/A system is a complex, effort-intensive, open-ended problem. Semantic Web and related systems for upgrading quality of search may be viewed as steps in this direction. But what may be argued, as is done in the following, is that existing tools, based as they are on bivalent logic and probability theory, have intrinsic limitations. The principal obstacle is the nature of world knowledge. Dealing with world knowledge needs new tools. A new tool which is suggested for this purpose is the fuzzy-logic-based method of computing with words and perceptions (CWP [1, 2, 3, 4]), with the understanding that perceptions are described in a natural language [5, 6, 7, 8, 9]. A concept which plays a key role in CWP is that of Precisiated Natural Language (PNL). It is this language that is the centerpiece of our approach to reasoning and decision-making with world knowledge. The main thrust of the fuzzy-logic-based approach to question-answering which is outlined here is that to achieve significant question-answering capability it is necessary to develop methods of dealing with the reality that much of world knowledge, and especially knowledge about underlying probabilities, is perception-based. Dealing with perception-based information is more complex and more effort-intensive than dealing with measurement-based information. In this instance, as in many others, complexity is the price that has to be paid to achieve superior performance."

Once the TDC matrix (Table 1) has been transformed into the maximally Z(n)-compact representation (Table 3) and rules (Tables 4 and 5), one can use the Z(n)-compact algorithm and transform the TDC into a decision tree/hierarchical graph which represents a Q&A model, as shown in Fig. 5.
2.3 NeuSearchTM

There are two types of search engines in which we are interested and which dominate the Internet. First, the most popular search engines, mainly for unstructured
Q Q & A and Deduction Model Z(n)-Compact Clustering
Q2 ak222
a1122 k Q1 k11 a
Q3
ak2211
a1 c 1
ak3311
D2,D3, D4, D6,D7, D8
Q3
ac2D11, D12 2
ak13
k1133 ak2233 a
c1
ac22
D1
D10
ac2
2 D9
ak223 ac1
1
D5,D8
Fig. 5 Q & A model of TDC matrix and maximally Z(n)-compact rules and concept-based querybased clustering
data, such as GoogleTM, Yahoo, MSN, and Teoma, which are based on the concept of authorities and hubs. Second, search engines that are task specific, such as 1) Yahoo!: manually pre-classified, 2) NorthernLight: classification, 3) Vivisimo: clustering, 4) Self-Organizing Map: clustering + visualization, and 5) AskJeeves: natural-language-based search with human experts. Google uses PageRank and Teoma uses HITS for the ranking. To develop such models, state-of-the-art computational intelligence techniques are needed [10, 11, 12, 13, 14, 15]. Figures 5 through 8 show how neuroscience and PNL can be used to develop the next generation of search engines. Figure 6 shows a unified framework for the development of a search engine based on conceptual semantic indexing. This model will be used to develop the NeuSearch model based on the neuroscience and PNL approach. As explained previously and represented by Figs. 2 through 4 with respect to the development of FCM, the first step is to represent the term-document matrix. Tf-idf is the starting point for our model. As explained earlier, there are several ways to create the tf-idf matrix [7]. One attractive idea is to use a term ranking algorithm based on evolutionary computing, as presented in Fig. 6: GA-GP context-based tf-idf or ranked tf-idf. Once the fuzzy tf-idf (term-document matrix) is created, the next step is to use this indexing to develop the search and information retrieval mechanism. As shown in Fig. 6, there are two alternatives: 1) classical search models such as the LSI-based approach, and 2) the NeuSearch model. The LSI-based models include probability-based LSI, Bayesian LSI, Fuzzy-LSI, and NNnet-based LSI models. It is interesting that one can find an alternative to each such LSI-based model using Radial Basis Functions (RBF). For example, one can use a probabilistic RBF equivalent to
Fig. 6 Search engine based on conceptual semantic indexing. (The figure, omitted here, shows the evolution of the term-document matrix, from [0,1] keyword search with bivalent logic as used by classical engines such as Google, Teoma and Lycos, through [tf-idf] with statistical-probabilistic theory and GA-GP context-based or ranked tf-idf, to [set] entries based on fuzzy set theory, feeding either classical search (LSI with probability-based, Bayesian, fuzzy w(i,j) and NNnet (BP, GA-GP, SVM) variants) or the NeuSearch/Neu-FCS branch built on RBF equivalents (PRBF, GRRBF, ANFIS, RBFNN), together with graph theory, semantic nets and NLP for concept-based indexing, summarization and imprecise search.)
Fig. 7 Neuro-Fuzzy Conceptual Search (NeuFCS). (The figure, omitted here, shows a word space connected to a document (corpus) / concept-and-context space, organized by SOM or PCA, and a concept-context-dependent word space. The weights are w(i, j) = f(p(i, j), p(i), p(j)), where i is a neuron in the word layer and j a neuron in the document or concept-context layer, and w(j, k) = f(p(j, k), p(j), p(k)), where k is a neuron in the word layer; w(i, j) is calculated based on Fuzzy-LSI or probabilistic LSI and, in general form, can be calculated based on PNL.)
Fig. 8 PNL-based conceptual fuzzy search using brain science. (The figure, omitted here, adds interconnections based on mutual information to the Fig. 7 architecture: the relations r_ij between nodes are "i is an instance of j" (is or isu), "i is a subset of j" (is or isu), "i is a superset of j" (is or isu), "j is an attribute of i", "i causes j" (or usually), and "i and j are related", with weights w(i, j) = f(p(i, j), p(i), p(j)) and w(j, k) = f(p(j, k), p(j), p(k)) between the word, document/concept-context and concept-context-dependent word layers.)
probability-based LSI, a Generalized RBF as an equivalent to Bayesian LSI, ANFIS as an equivalent to Fuzzy-LSI, and an RBF neural network (RBFNN) as an equivalent to NNnet-LSI. The RBF model is the basis of the NeuSearch model (Neuro-Fuzzy Conceptual Search, NeuFCS). Given the NeuSearch model, one needs to calculate the w(i,j) (the network weights), which will be defined in the next section. Depending on the model and framework used (probability, Bayesian, fuzzy, or NNnet model), the interpretation of the weights will be different. Figure 7 shows the typical input-output of the Neuro-Fuzzy Conceptual Search (NeuFCS) model. The main idea with respect to NeuSearch is concept-context-based search. For example, one can search for the word "Java" in the context of "Computer" or in the context of "Coffee". One can also search for "apple" in the context of "Fruit" or in the context of "Computer". Therefore, before we relate the terms to the documents, we first extend the keywords given the existing concepts-contexts, and then we relate the extended terms to the documents. There is therefore always a two-step process in the NeuSearch model: original keywords are mapped through the concept-context (RBF) nodes, via the weights Wi,j and Wj,k, to extended keywords, and, analogously, original documents are mapped through the concept-context nodes, via the weights W'i,j and W'j,k, to extended documents.
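A toy Python sketch of this two-step expansion follows (my own illustration: the concept names and all weight values are invented, and a simple matrix product with clipping stands in for the RBF machinery). A keyword activation vector is pushed into the concept-context layer via W1 (playing the role of w(i,j)) and back out to the word layer via W2 (playing the role of w(j,k)), yielding an extended, context-dependent keyword vector.

```python
import numpy as np

words = ["java", "coffee", "computer", "apple"]
concepts = ["beverages", "computing"]

# Hypothetical word-to-concept weights w(i,j) and concept-to-word weights w(j,k).
W1 = np.array([[0.6, 0.7],    # java   -> beverages, computing
               [0.9, 0.0],    # coffee
               [0.0, 0.9],    # computer
               [0.3, 0.6]])   # apple
W2 = W1.T                     # reuse the same associations in the reverse direction

def extend(query_vector, context=None):
    """Two-step NeuSearch-style expansion: words -> concept-context nodes -> extended words."""
    activation = query_vector @ W1                    # step 1: activate concept-context nodes
    if context is not None:                           # optionally restrict to one context
        mask = np.array([1.0 if c == context else 0.0 for c in concepts])
        activation = activation * mask
    return np.clip(activation @ W2, 0.0, 1.0)         # step 2: map back to an extended word vector

q = np.array([1.0, 0.0, 0.0, 0.0])                    # the single keyword "java"
print(dict(zip(words, np.round(extend(q, "computing"), 2))))   # Java in the context of Computer
print(dict(zip(words, np.round(extend(q, "beverages"), 2))))   # Java in the context of Coffee
```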
In general, w(i,j) is a function of p(i,j), p(i), and p(j), where p represents probability. Therefore, one can use probabilistic LSI or PRBF given the NeuSearch
Fig. 9 Organization of world knowledge: epistemic (knowledge-directed) and lexicon (ontology-related). (The figure, omitted here, shows a network of nodes (lexines) and links, with w_ij the granular strength of association between i and j. A lexine i is an object, construct or concept (e.g., car, Ph.D. degree); K(i) is the world knowledge about i (mostly perception-based), organized into n(i) relations R_i1, …, R_in whose entries are bimodal-distribution-valued attributes of i, with attribute values that are, in general, granular and context-dependent. The relations r_ij are: i is an instance of j (is or isu), i is a subset of j (is or isu), i is a superset of j (is or isu), j is an attribute of i, i causes j (or usually), and i and j are related.)
framework. If the probabilities are not known, which is often the case, one can use the Fuzzy-LSI, ANFIS, NNnet-LSI or RBFNN model within the NeuSearch framework. In general, the PNL model can be used as a unified framework, as shown in Figs. 8 through 10. Figures 8 through 10 present the PNL-based conceptual fuzzy search using brain science, building on the models and concepts of Figs. 2 through 7.
Fig. 10 Generalized Constraint. (The figure, omitted here, contrasts the standard constraint X ∈ C with the generalized constraint X isr R, where "isr" is the copula, r the type identifier, R the constraining relation and X the constrained variable; this is the GC-form, the generalized constraint form of type r. X = (X1, …, Xn) may have a structure, e.g. X = Location(Residence(Carol)), may be a function of another variable, X = f(Y), or may be conditioned, (X/Y). The type identifier r ranges over =, ≤, ⊂, blank, v, p, u, rs, fg, ps, …; in particular, r: rs is the random set constraint (X isrs R; R is the set-valued probability distribution of X), r: fg the fuzzy graph constraint (X isfg R; X is a function and R is its fuzzy graph), r: u the usuality constraint (X isu R means usually (X is R)), and r: ps the Pawlak set constraint (X isps (X̲, X̄) means that X is a set and X̲ and X̄ are its lower and upper approximations).)
Based on the PNL approach, w(i,j) is defined in terms of r_ij: the link from node i to node j carries the epistemic-lexicon relation r_ij and the weight w_ij, where w_ij is the granular strength of association between i and j (w_ij ⇐ r_ij), and r_ij is one of the following:
  i is an instance of j (is or isu)
  i is a subset of j (is or isu)
  i is a superset of j (is or isu)
  j is an attribute of i
  i causes j (or usually)
  i and j are related

Figure 11 shows a typical example of a Neu-FCS result. Both the term-document matrix and the term-term/document-document matrices are reconstructed. For example, the original document corpus is very sparse, as one can see in the figure; however, after processing through the concept nodes, the document corpus is less sparse and smoother. Also, the original keywords are expanded given the concept-context nodes. While the nodes in the hidden space (concept-context nodes) are based on RBFs (the NeuSearch model) and are one-dimensional, one can use a SOM to
Fig. 11 Typical example of Neu-FCS. (The figure, omitted here, shows an input word in the word space activating documents/concept-contexts in the document (corpus) layer and producing as output a concept-context-dependent word space.)
rearrange the nodes given the similarity of the concepts (in this case, the RBF in each node) and present the 1-D nodes in a 2-D model, as shown in Fig. 11.
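To give a feel for the smoothing effect described above (this is my own stand-in: an LSI-style truncated SVD plays the role of the concept-context layer, and the tiny matrix is invented), a sparse binary term-document matrix reconstructed through a small number of latent "concept" dimensions becomes graded and less sparse:

```python
import numpy as np

# Toy sparse term-document matrix (rows: terms, columns: documents); values invented.
A = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                   # number of latent "concept" dimensions
A_smooth = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(A_smooth, 2))            # several zero entries become graded values,
                                        # i.e. the corpus representation is less sparse and smoother
```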
3 Conclusions

Intelligent search engines with growing complexity and technological challenges are currently being developed. This requires new technology in terms of understanding, development, engineering design and visualization. While the technological expertise for each component becomes increasingly complex, there is a need for better integration of the components into a global model that adequately captures imprecision and supports deduction. In addition, intelligent models can mine the Internet to conceptually match and rank homepages based on predefined linguistic formulations and rules defined by experts, or based on a set of known homepages. The FCM model can be used as a framework for intelligent information and knowledge retrieval through conceptual matching of both text and images (here defined as "Concept"). The FCM can also be used for constructing a fuzzy ontology, or terms related to the context of the query and search, to resolve ambiguity. This model can be used to calculate conceptually the degree of match to the object or query.

Acknowledgments Funding for this research was provided by British Telecommunications (BT), the BISC Program of UC Berkeley, and the Imaging and Informatics group at Lawrence Berkeley National Lab. The author would like to thank Prof. Zadeh for his feedback and comments, and for allowing the authors to use his published and unpublished documents, papers, and presentations in preparing this paper.
References

1. L. A. Zadeh, From Computing with Numbers to Computing with Words – From Manipulation of Measurements to Manipulation of Perceptions, IEEE Transactions on Circuits and Systems, 45, 105–119, 1999.
2. L. A. Zadeh, "A new direction in AI: Towards a Computational Theory of Perceptions," AI Magazine, vol. 22, pp. 73–84, 2001.
3. L. A. Zadeh, Toward a Perception-based Theory of Probabilistic Reasoning with Imprecise Probabilities, Journal of Statistical Planning and Inference, 105, 233–264, 2002.
4. L. A. Zadeh and M. Nikravesh, Perception-Based Intelligent Decision Systems, Office of Naval Research, Summer 2002 Program Review, Covel Commons, University of California, Los Angeles, July 30th–August 1st, 2002. (L. A. Zadeh, PRUF – a meaning representation language for natural languages, Int. J. Man-Machine Studies 10, 395–460, 1978.)
5. M. Nikravesh and B. Azvine, New Directions in Enhancing the Power of the Internet, Proc. of the 2001 BISC Int. Workshop, University of California, Berkeley, Report: UCB/ERL M01/28, August 2001.
6. V. Loia, M. Nikravesh, L. A. Zadeh, Journal of Soft Computing, Special Issue: Fuzzy Logic and the Internet, Springer Verlag, Vol. 6, No. 5, August 2002.
7. M. Nikravesh et al., "Enhancing the Power of the Internet", Volume 139, published in the series Studies in Fuzziness and Soft Computing, Physica-Verlag, Springer (2004).
8. M. Nikravesh, Fuzzy Logic and Internet: Perception Based Information Processing and Retrieval, Berkeley Initiative in Soft Computing, Report No. 2001-2-SI-BT, September 2001a.
9. M. Nikravesh, BISC and The New Millennium, Perception-based Information Processing, Berkeley Initiative in Soft Computing, Report No. 2001-1-SI, September 2001b.
10. M. Nikravesh, V. Loia, and B. Azvine, Fuzzy Logic and the Internet (FLINT), Internet, World Wide Web, and Search Engines, International Journal of Soft Computing, Special Issue in Fuzzy Logic and the Internet, 2002.
11. M. Nikravesh, Fuzzy Conceptual-Based Search Engine using Conceptual Semantic Indexing, NAFIPS-FLINT 2002, June 27–29, New Orleans, LA, USA.
12. M. Nikravesh and B. Azvine, Fuzzy Queries, Search, and Decision Support System, International Journal of Soft Computing, Special Issue in Fuzzy Logic and the Internet, 2002.
13. M. Nikravesh, V. Loia, and B. Azvine, Fuzzy Logic and the Internet (FLINT), Internet, World Wide Web, and Search Engines, International Journal of Soft Computing, Special Issue in Fuzzy Logic and the Internet, 2002.
14. M. Nikravesh, Fuzzy Conceptual-Based Search Engine using Conceptual Semantic Indexing, NAFIPS-FLINT 2002, June 27–29, New Orleans, LA, USA.
15. V. Loia, M. Nikravesh and Lotfi A. Zadeh, "Fuzzy Logic and the Internet", Volume 137, published in the series Studies in Fuzziness and Soft Computing, Physica-Verlag, Springer (2004).
Towards Perception Based Time Series Data Mining Ildar Z. Batyrshin and Leonid Sheremetov
Abstract Human decision making procedures in problems related to the analysis of time series data bases (TSDB) often use perceptions like "several days", "high price", "quickly increasing", etc. Computing with Words and Perceptions can be used to formalize perception-based expert knowledge and inference mechanisms defined on the numerical domains of a TSDB. To extract from a TSDB perception-based information relevant to decision making problems, it is necessary to develop methods of perception based time series data mining (PTSDM). The paper considers different approaches used in the analysis of time series databases for the description of perception-based patterns and discusses some methods of PTSDM.
1 Introduction

Human decision making procedures in problems related to the analysis of time series data bases (TSDB) in economics, finance, meteorology, medicine, geophysics, etc. often use perceptions like several days, near future, high price, very quickly increasing, slowly oscillating, new companies, highly associated behavior, very probably, etc., defined on a time domain, on a range of time series values, on sets of time series, attributes or system elements, on a set of possibility or probability values, etc. Computing with Words and Perceptions (CWP) [38], containing fuzzy logic as its main constituent, can serve as a bridge between the linguistic information in expert knowledge and the numerical information given in a TSDB. The inference procedures of CWP can be used for modeling human perception-based reasoning mechanisms. To extract from a TSDB perception-based information relevant to decision making problems, it is necessary to develop methods of perception based time series data mining (PTSDM). Figure 1 shows a possible architecture of an intelligent decision making system integrating PTSDM, CWP and the expert knowledge used in decision making procedures in problems related to the analysis of a TSDB.
Ildar Z. Batyrshin · Leonid Sheremetov Research Program in Applied Mathematics and Computing (PIMAyC), Mexican Petroleum Institute, Eje Central Lazaro Cardenas 152, Col. San Bartolo Atepehuacan, C.P. 07730, Mexico, D.F., Mexico, e-mail: {batyr,
[email protected]}
Fig. 1 Architecture of an intelligent decision making system based on expert knowledge in time series data base domains. (The figure, omitted here, shows the Intelligent Decision Making System composed of Expert Knowledge, Computing with Words and Perceptions, and Perception Based Time Series Data Mining, stacked on top of the Time Series Data Base.)
Many economic, financial, technological and natural systems described by TSDB are very complex, and it is often impossible to take into account all the information influencing the solution of the decision making problems related to their analysis. Zadeh's Principle of Incompatibility applies to such systems [37]: "As the complexity of a system increases, our ability to make precise and yet significant statements about its behavior diminishes until a threshold is reached beyond which precision and significance (or relevance) become almost mutually exclusive characteristics". For this reason, often only qualitative, perception-based solutions make sense for decision making in complex systems. The realization of an intelligent system supporting perception-based decision making procedures in problems related to the analysis of time series data bases requires extending the methods of time series data mining (TSDM) to give them the possibility to operate with perceptions. The goal of data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner [18]. The following list contains the main time series data mining tasks [23, 1, 3, 12, 13, 16, 19, 27, 28, 36].
Segmentation: Split a time series into a number of "meaningful" segments. Possible representations of segments include an approximating line, a perceptual pattern, a word, etc.
Clustering: Find natural groupings of time series or time series patterns.
Classification: Assign a given time series or time series pattern to one of several predefined classes.
Indexing: Realize efficient execution of queries.
Summarization: Give a short description of a time series (or multivariate time series) which retains its essential features in the considered problem.
Anomaly Detection: Find surprising, unexpected patterns.
Motif Discovery: Find frequently occurring patterns.
Forecasting: Forecast time series values based on the time series history or human expertise.
Discovery of Association Rules: Find rules relating patterns in time series (e.g. patterns that occur frequently in the same or in neighboring time segments).
These tasks are mutually related; for example, segmentation can be used for indexing, clustering, summarization, etc. Perception-based time series data mining systems should be able to manipulate linguistic information, fuzzy concepts and perception-based patterns of time series to support human decision making in problems related to time series data bases. Fortunately, a number of methods manipulating such information have recently been developed in time series data mining. The types of perceptions which can be defined on the domains of a TSDB, and a short survey of papers operating with perception-based shape patterns, are considered in Sect. 2. The patterns used in many works are usually crisp, but they may be generalized to represent fuzzy patterns. In Sect. 3 we consider new methods of parameterization of perception-based shape patterns and the application of fuzzy perception-based functions to modeling a qualitative forecast of new product sales. The conclusions contain a discussion of future directions of research in PTSDM.
2 Perception Based Patterns in Time Series Data Bases

The development of methods of PTSDM requires formalizing human perceptions about time, time series values, patterns and shapes, about associations between patterns and time series, etc. These perceptions can be represented by words whose meaning is defined on the following domains of time series data bases:

• the time domain: time intervals (one-two weeks, several days, end of the day), absolute or relative position on the time scale (in June 2006, near future), periodic or seasonal time intervals (end of the day, several weeks before Christmas);
• the range of TS values (high price; very low level of production);
• a perception-based function or pattern of TS shape (slowly decreasing, quickly increasing and slightly concave);
• the set of time series, attributes or system elements (stocks of new companies);
• the set of relations between TS, attributes or elements (highly associated);
• the set of possibility or probability values (unlikely, very probable).

Most such perceptions can be represented as fuzzy sets defined on the corresponding domain [24]; a small sketch of such a representation follows the classification below. An exact definition of the membership values of the fuzzy sets used in models of CWP is often not very important when the input and output of the models are words [38], which are usually insensitive to some change in the membership values of the fuzzy sets representing them. This situation differs from Mamdani or Sugeno fuzzy models, where the initial definition of the fuzzy sets usually does not play an important role in the construction of the final fuzzy model when a tuning of membership functions is used in the presence of training input-output data [20]. The different techniques used for the description of time series shape patterns can be conventionally classified as follows:

• analysis of the signs of the first and second derivatives of the time series variable,
• scaling of trends and convex-concave patterns,
• parameterization of patterns,
• shape definition language,
• clustering and linguistic interpretation of cluster shape patterns,
• analysis of temporal relations between shape patterns,
• analysis of shape patterns given in expert knowledge, summaries and forecasting texts.
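As a minimal illustration of representing such a perception as a fuzzy set (my own example: the label, the slope domain and the breakpoints below are invented, not taken from the cited works), a trapezoidal membership function for "quickly increasing" defined on slope values:

```python
def trapezoid(x, a, b, c, d):
    """Standard trapezoidal membership function with support [a, d] and core [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def quickly_increasing(slope):
    """Hypothetical perception 'quickly increasing' on a slope domain (units per time step)."""
    return trapezoid(slope, 0.5, 1.0, 5.0, 8.0)

for s in (0.2, 0.8, 2.0, 7.0):
    print(s, round(quickly_increasing(s), 2))   # 0.0, 0.6, 1.0, 0.33
```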
A triangular episodes representation language was formulated in [15] for the representation and extraction of temporal patterns. These episodes, defined by the signs of the first and second derivatives of a time dependent function, can be linguistically described as A: Increasing and Concave; B: Decreasing and Concave; C: Decreasing and Convex; D: Increasing and Convex; E: Linearly Increasing; F: Linearly Decreasing; G: Constant. These episodes can be used for coding time series and time series patterns as a sequence of symbols like ABCDABCDAB. Such a coded representation of time series can be used for dimensionality reduction, indexing, clustering of time series, etc. Possible applications of such a representation are process monitoring, diagnosis and control. A more extended dictionary of temporal patterns defined by the signs of the first and second derivatives in the pattern is considered in [25]. This dictionary includes the perceptual patterns PassedOverMaximum, IncreasingConcavely, StartedToIncrease, ConvexMaximum, etc. In the considered approach, the profile of a measured variable x_j(t) is transformed into qualitative form as a result of the approximation of x_j(t) by a proper analytical function from which the signs of the derivatives are extracted. The paper [25] introduces a method for reasoning about the form of the recent temporal profiles of process variables, which carry important information about the process state and the underlying process phenomena. Below is an example of a shape-analyzing rule used by a decision making system in the control of fermentation processes [25]:

IF (DuringTheLast1hr Dissolved Oxygen has been DecreasingConcavelyConvexly) THEN (Report: Foaming) and (Feed antifoam agent).

A scaling of perception-based patterns is used in many papers. This scaling can be applied to time series values, to slope values, to convex-concave shapes, etc. The list of primitives Upslope, Large-Pos, Large-Neg, Med-Pos, Med-Neg, Trailing Edge, Cup, Cap was used for the description of carotid waveforms [32]. The sequence of primitives can be used in syntactic pattern recognition of systolic and diastolic epochs. In [11], the possibility of introducing fuzziness into syntactic descriptions of digital signals is discussed. Fuzzy boundaries between the systolic and diastolic regimes and between primitives can be represented by transition membership functions. Scaling of the slope values of functional dependencies and time series is used in a system [9] that generates linguistic descriptions of time series in the form of rules If T is Tk then Y is Ak, where Tk are fuzzy intervals, like Between A and B, Small, Large, and Ak are linguistic descriptions of trends, like Very Quickly Increasing, Slowly Decreasing, etc.
In [9], an evolutionary procedure is used to find an optimal partition of the time domain into fuzzy intervals on which the time series values are approximated by linear functions. The paper discusses methods of retranslating the obtained piecewise linear approximation of the time series into linguistic form. Similar linguistic descriptions of time series are reported in [13], where a scaling of trends is used in a system that detects and linguistically describes significant trends in time-series data, applying wavelets and scale space theory. That work presents experimental results of applying the system to the summarization of weather data. Below is an example of a description generated automatically by the system, which used perception based patterns like dropped, dropped sharply, etc.: "The temperature dropped sharply between the 3rd and the 6th. Then it rose between the 6th and the 11th, dropped sharply between the 14th and 16th and dropped between the 23rd and 28th." Another approach to the description of time series by rules If T is Tk then Y is Ak of the type considered above was proposed in [7]. The method is based on the Moving Approximation (MAP) transform [8, 6], which replaces a time series by the sequence of slope values of the linear functions approximating the time series values in a sliding window. Perceptions like "Quickly Increasing", "Very Slowly Decreasing", etc. can easily be defined on the set of slope values generated by the MAP [7]. MAP can be used for the analysis of TS, e.g. in economics and finance, where the analysis of trends and tendencies in TS values plays an important role. MAP gives a basis for the definition of trend association measures for evaluating associations between time series and time series patterns [6]. Unlike the known similarity measures used in TSDM, the new measures are invariant under linear transformations of TS. For this reason they can be used as regular measures of similarity between TS and TS patterns in most tasks of TSDM. Perception based conditions can be used in the analysis of associations between elements of systems described by multivariate time series, e.g. a system of petroleum wells given by time series of oil and gas production, a system of currencies given by time series of exchange rates, etc. The generation of association rules of the form IF PBC then A is highly associated with B, where PBC is a perception based condition like High level of oil production in winter months in well number 25, was considered in [5] for the analysis of associations between the oil and gas production of wells from an oilfield. Such associations can be compared with spatial and other relations existing between system elements. The paper [3] presents an approach to modeling time series datasets using linguistic shape descriptors represented by parametric functions. A linguistic term "rising" indicates that the series at this point is rising, i.e. yk+1 > yk. A more complex term such as "rising more steeply" indicates that the trend of the series is changing, i.e. yk+2 − yk+1 > yk+1 − yk. These terms are measures of the first and the second derivatives of the series, respectively. Parametric prototype trend shapes are given by the following functions:
f_fls(t) = 1 − √(1 − (1 − t)^α),  f_rms(t) = 1 − √(1 − t^α),  f_fms(t) = 1 − t^α,  f_rls(t) = 1 − (1 − t)^α,  f_cr(t) = 1 − α²(t − 0.5)^α,  f_tr(t) = α²(t − 0.5)^α,
representing the perception based shape patterns falling less steeply, rising more steeply, falling more steeply, rising less steeply, crest and trough, respectively. Each part of a time series can have membership in several trend shapes. The trend concepts are used to build fuzzy rules of the form [3]: If trend is F then next point is Y; If trend is F then next point is current point + dY, where F is a trend fuzzy set, such as "rising more steeply", and Y, dY are fuzzy time series values. Prediction using these trend fuzzy sets is performed using the Fril evidential logic rule. The approach also uses fuzzy scaling of trends with the linguistic patterns falling fast, falling slowly, constant, rising slowly, and rising fast. A Shape Definition Language (SDL) was developed in [1] for retrieving objects based on shapes contained in the histories associated with these objects. For example, in a stock database, the histories of the opening price, closing price, the highest for the day, the lowest for the day, and the trading volume may be associated with each stock. SDL allows a variety of queries about the shapes of histories. It performs "blurry" matching [1], where the user cares about the overall shape but does not care about specific details. SDL has an efficient implementation based on an index structure for speeding up the execution of SDL queries. The alphabet of SDL contains a set of scaled patterns like slightly increasing transition, highly increasing transition, slightly decreasing transition and highly decreasing transition, denoted as up, Up, down, Down, respectively. This alphabet can be used for the definition of shapes as follows: (shape name(parameters) descriptor). For example, "a spike" can be defined as (shape spike() (concat Up up down Down)), where the list of parameters is empty and concat denotes concatenation. Complex shapes can be derived by recursive combination of elementary and previously defined shapes. The approach gives the possibility to retrieve combinations of several shapes in different histories by using the logical operators and and or. Clustering and linguistic interpretation of shape patterns is considered in [16]. This paper studies the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. The patterns are formed from data. The method first forms subsequences by sliding a window through the time series, and then clusters these subsequences using a suitable measure of time series similarity. Further, rule finding methods are used
to obtain the rules from the sequences. The simplest rules have the format A ⇒(T) B, i.e. if A occurs, then B occurs within time T, where A and B are identifiers of patterns (clusters of patterns) discovered in the time series. The approach was applied to several data sets. As an example, from the daily closing share prices of 10 database companies traded on the NASDAQ the
following significant rule was found: s18 ⇒(20) s4. The patterns s18 and s4 represent clusters of patterns. An interpretation of the rule can have the form: "a stock which follows the 2.5-week declining pattern of s18 (a sharper decrease and then leveling out) will likely incur a short sharp fall within 4 weeks (20 days) before leveling out again (the shape of s4)". Temporal relations between shape patterns are considered in [19]. This approach to knowledge discovery from multivariate time series uses segmentation of the time series and their transformation into sequences of state intervals (bi, si, fi), i = 1, . . ., n. Here, si are time series states like increasing, decreasing, constant, highly increasing, convex, holding during the time periods [bi, fi). The temporal relationships between state intervals are described by the thirteen temporal relationships of Allen's interval logic [2]. Finally, rules with frequent temporal patterns in the premise and conclusion are derived. The method was applied to time series of air-pressure and wind strength/wind direction data [19]. The smoothed time series were partitioned into segments with primitive patterns like very highly increasing, constant, decreasing. Below is an example of an association rule generated by the proposed approach: convex air pressure, highly decreasing air pressure, decreasing air pressure → highly increasing wind strength, where the patterns in the antecedent part of the rule are mutually related by Allen temporal relationships like overlaps, equals, meets. The meaningful rules obtained by the described technique can be used together with expert knowledge in the construction of an expert system [19]. Several approaches use the analysis of time series shape patterns given in expert knowledge, summaries and forecasting texts, transforming them into a rule based fuzzy expert system or generating texts similar to human descriptions in the considered problem area. A rule-based fuzzy expert system WXSYS, realized in FuzzyClips, attempts to predict local weather based on conventional wisdom [29]. Below are examples of expert rules which are used in the system in a formalized form: "Generally, if the barometer falls steadily and the wind comes from an easterly quarter, expect foul weather". "If the current wind is blowing from S to SW and the current barometric pressure is rising from 30.00 or below, then the weather will be clearing within a few hours, then fair for several days". A system called Ana [26] uses a special grammar for the generation of summaries based on patterns retrieved from summaries written by human experts. Ana generates descriptions of stock market behavior. Data from a Dow Jones stock quotes database serves as the input to the system, and the opening paragraphs of a stock market summary are produced as the output. The following text sample is one of the possible interpretations of data generated by the system:
"Wall Street's securities markets rose steadily through most of the morning, before sliding downhill late in the day. The stock market posted a small loss yesterday, with the indexes finishing with mixed results in active trading. The Dow Jones average of 30 industrials surrendered a 16.28 gain at 4pm and declined slightly, to finish at 1083.61, off 0.18 points". A more extended system, called StockReporter, is discussed in [33]. This system is one of a few online text generation systems which generate textual descriptions of numeric data sets [36]. StockReporter produces reports that incorporate both text and graphics. It reports on the behavior of any one of 100 US stocks and on how that stock's behavior compares with the overall behavior of the Dow Jones Index or the NASDAQ. StockReporter can generate a text like the following: "Microsoft avoided the downwards trend of the Dow Jones average today. Confined trading by all investors occurred today. After shooting to a high of $104.87, its highest price so far for the month of April, Microsoft stock eased to finish at an enormous $104.37". Finally, we cite some perception based weather forecasts generated by weather.com: "Scattered thunderstorms ending during the evening", "Skies will become partly cloudy after midnight", "Occasional showers possible". The considered examples of generated texts with perception based terms support one of the main ideas of Computing with Words and Perceptions: that the inputs and/or outputs of real decision making systems can contain perceptions described by words [38].
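Since the MAP transform mentioned above also underlies the trend scaling used in the next section, a minimal sketch may help. It fits a least-squares line in each sliding window and keeps only the slope; the window length and the example series are illustrative assumptions, not data from the cited works.

```python
import numpy as np

def map_transform(y, window=5):
    """Moving Approximation transform: the slope of the least-squares line
    fitted to the series values in each sliding window of the given length."""
    y = np.asarray(y, dtype=float)
    t = np.arange(window, dtype=float)
    tc = t - t.mean()                      # centered time index
    denom = float((tc ** 2).sum())
    slopes = []
    for i in range(len(y) - window + 1):
        w = y[i:i + window]
        slopes.append(float((tc * (w - w.mean())).sum() / denom))
    return np.array(slopes)

# Perceptions such as "quickly increasing" can then be defined as fuzzy sets on these slopes.
series = [1.0, 1.2, 1.9, 2.8, 4.1, 5.9, 6.2, 6.3, 6.1, 5.0]
print(map_transform(series, window=4))
```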
3 Perception Based Functions in Qualitative Forecasting

In this section we discuss methods of modeling qualitative expert forecasting by perception based functions (PBF) [4]. Qualitative forecasting methods use the opinions of experts to subjectively predict future events [12]. These methods are usually used when historical data are either unavailable or scarce, for example to forecast sales for a new product. In subjective curve fitting applied to predicting sales of a new product, the product life cycle is usually thought of as consisting of several stages: "growth", "maturity" and "decline". Each stage is represented by qualitative patterns of sales as follows [12]:
"Growth" stage: Start Slowly, then Increase Rapidly, and then Continue to Increase at a Slower Rate;
"Maturity" stage, in which sales of the product stabilize: Increasing Slowly, Reaching a Plateau, and then Decreasing Slowly;
"Decline" stage: Decline at an Increasing Rate.
Fig. 2 Product Life Cycle. Adapted from [12]
The "growth" stage is subjectively represented as an S-curve, which can then be used to forecast sales during this stage (see Fig. 2). To predict the time intervals for each step of the "Growth" stage, the company uses expert knowledge and its experience with other products. Such subjective curve fitting is very difficult and requires a great deal of expertise and judgment [12]. The methods of reconstruction of perception based functions [4, 10] can support the process of qualitative forecasting. The qualitative patterns of sales may be represented by perception based patterns of trends, and each stage may be represented by a sequence of fuzzy time intervals with different trend patterns. For example, the S-curve modeling the "Growth" stage can be represented by the following rules:
R1: If T is Start of Growth Stage then V is Slowly Increasing and Convex;
R2: If T is Middle of Growth Stage then V is Quickly Increasing;
R3: If T is End of Growth Stage then V is Slowly Increasing and Concave;
where T denotes time intervals and V denotes the sales volume. Perception based functions use scaling of trend patterns and fuzzy granulation of the scale grades. Below is an example of a linguistic scale of directions given by Increasing-Decreasing patterns:
LD = <Extremely Quickly Decreasing, Quickly Decreasing, Decreasing, Slowly Decreasing, Constant, Slowly Increasing, Increasing, Quickly Increasing, Extremely Quickly Increasing>.
In abbreviated form this scale will be written as follows:
LD = <1:EQD, 2:QDE, 3:DEC, 4:SDE, 5:CON, 6:SIN, 7:INC, 8:QIN, 9:EQI>.
The granulation of this scale may be done by a suitable crisp or fuzzy partition of the range of slope values of the MAP transform of the time series. Figure 3 depicts possible axes of fuzzy directions corresponding to the grades of this scale. The scale of concave-convex patterns can have the following grades:
LCC = <Strongly Concave, Concave, Slightly Concave, Linear, Slightly Convex, Convex, Strongly Convex>.
The grades of these scales can be used for the generation of linguistic descriptions like "Quickly Increasing and Slightly Concave", "Slowly Decreasing and Strongly Convex", etc. Such perception based descriptions may be represented as suitable convex-concave modifications of the corresponding axes of
Fig. 3 Axis of directions
directions and by further fuzzification of the obtained curves. Several methods of convex (CV) and concave (CC) modification of directions were proposed in [10]. BZ-modifications are based on Zadeh's operation of contrast intensification:
CC(y) = y2 − (y2 − y)²/(y2 − y1),   CV(y) = y1 + (y − y1)²/(y2 − y1),
where y is the function being modified and y1, y2 are the minimal and maximal values of y(x) on the considered interval of the input variable x. The grades of the LCC scale are represented by the following patterns, respectively:
PCC = <CC(CC(CC(y))), CC(CC(y)), CC(y), y, CV(y), CV(CV(y)), CV(CV(CV(y)))>.
Figure 4a) depicts these patterns applied to directions 7:INC and 4:SDE. For example, the pattern Slowly Decreasing and Strongly Convex is represented by the undermost curve in Fig. 4a) and is calculated by f = CV(CV(CV(y))), where y is the line corresponding to the direction 4:SDE. BY-modification uses the following CC-CV modifications of a linear function y:
CC_t(y) = y2 − ((y2 − y1)^t − (y − y1)^t)^(1/t),   CV_t(y) = y1 + ((y2 − y1)^t − (y2 − y)^t)^(1/t),
where t is a parameter, t ∈ (0, 1]. We have CC_1 = CV_1 = I, where I(y) = y. Figure 4b) shows the CC-CV patterns in directions 7:INC and 4:SDE corresponding to all grades of the LCC scale, obtained by BY-modifications.
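The BZ- and BY-modifications above are easy to state in code. The sketch below applies them to a sampled linear segment; the segment endpoints and the choice of t are illustrative assumptions.

```python
import numpy as np

def cc_bz(y):
    """BZ concave modification of the values y (contrast-intensification based)."""
    y1, y2 = y.min(), y.max()
    return y2 - (y2 - y) ** 2 / (y2 - y1)

def cv_bz(y):
    """BZ convex modification."""
    y1, y2 = y.min(), y.max()
    return y1 + (y - y1) ** 2 / (y2 - y1)

def cc_by(y, t):
    """BY concave modification; t in (0, 1], and t = 1 leaves y unchanged."""
    y1, y2 = y.min(), y.max()
    return y2 - ((y2 - y1) ** t - (y - y1) ** t) ** (1.0 / t)

def cv_by(y, t):
    """BY convex modification; t in (0, 1]."""
    y1, y2 = y.min(), y.max()
    return y1 + ((y2 - y1) ** t - (y2 - y) ** t) ** (1.0 / t)

# Direction 7:INC as a rising line (assumed endpoints); "Strongly Convex" is three
# nested CV-modifications, as in the CV(CV(CV(y))) example above.
y = np.linspace(18.0, 28.0, 50)
strongly_convex = cv_bz(cv_bz(cv_bz(y)))
slightly_concave_by = cc_by(y, t=0.5)
print(strongly_convex[:3], slightly_concave_by[:3])
```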
Fig. 4 Granulation of convex-concave patterns in directions 7:INC (dashed line) and 4:SDE (dotted line) obtained by a) BZ-modification, b) BY-modification, and c) BS-modification
BS-modification applies the following CC-CV modifications of a linear function y:
CC_s(y) = y1 + (y2 − y1)(y − y1)/((y2 − y1) + s(y2 − y)),   CV_s(y) = y2 − (y2 − y1)(y2 − y)/((y2 − y1) + s(y − y1)),
where s ∈ (−1, 0]. We have CC_0 = CV_0 = I. Fig. 4c) shows the CC-CV patterns obtained by BS-modification of directions 7:INC and 4:SDE corresponding to all grades of the LCC scale. The linear and convex-concave patterns may be used for crisp and fuzzy modeling of perception based time series patterns. Fuzzy modeling uses fuzzification of the trend patterns [4]. This fuzzification may be defined parametrically, depending on the type and parameters of the fuzzy set used for fuzzification. A possible reconstruction of the rules R1-R3 considered above is shown in Fig. 5. The resulting fuzzy function is obtained as a concatenation of fuzzy perception based patterns defined on the sequence of fuzzy intervals given in the antecedents of the rules R1-R3. The width of this fuzzy function corresponds to the uncertainty of the forecast. This PBF
Fig. 5 Fuzzy perception based S-curve modeling “Growth stage”
can be used for forecasting sales of a new product. Perception based functions corresponding to the "Maturity" and "Decline" stages can be reconstructed similarly. Perception based functions give natural and flexible tools for modeling human expert knowledge about function shapes. They have the following advantages over the logarithmic, exponential and other mathematical functions usually used in such modeling: a PBF can easily be composed from different segments; each segment has a natural linguistic description corresponding to human perceptions; each perception based pattern can easily be tuned, due to its parametric definition, to better match both expert opinion and training data if they are available; and the fuzzy function obtained as a result of the reconstruction of a PBF makes it possible to model the different types of uncertainty present in forecasting.
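A rough sketch of how a crisp S-curve prototype could be assembled from the three rules R1-R3 and then widened into a band that mimics forecasting uncertainty. The interval boundaries, slopes, the use of BZ-modifications for the convex/concave segments, and the constant band width are all illustrative assumptions, not the reconstruction method of [4, 10].

```python
import numpy as np

def segment(t0, t1, v0, v1, shape="linear", n=30):
    """One trend segment on [t0, t1] from value v0 to v1; 'convex' and 'concave'
    apply the BZ-modifications of the connecting line (an assumed choice)."""
    t = np.linspace(t0, t1, n)
    y = np.linspace(v0, v1, n)
    y1, y2 = y.min(), y.max()
    if shape == "convex":
        y = y1 + (y - y1) ** 2 / (y2 - y1)
    elif shape == "concave":
        y = y2 - (y2 - y) ** 2 / (y2 - y1)
    return t, y

# R1: start of the growth stage - slowly increasing and convex
t1, v1 = segment(0, 4, 0.0, 2.0, "convex")
# R2: middle of the growth stage - quickly increasing
t2, v2 = segment(4, 8, 2.0, 8.0, "linear")
# R3: end of the growth stage - slowly increasing and concave
t3, v3 = segment(8, 12, 8.0, 10.0, "concave")

t = np.concatenate([t1, t2, t3])
core = np.concatenate([v1, v2, v3])
width = 0.8                                  # assumed forecasting uncertainty
lower, upper = core - width, core + width    # a crude band around the crisp prototype
print(core[::20])
```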
4 Conclusions

Human decision making in economics, finance, industry, natural systems, etc. is often based on the analysis of time series data bases. Such decision making procedures use expert knowledge, usually formulated linguistically, and perceptions obtained as a result of the analysis of TSDB. CWP and PTSDM can serve as the main constituents of intelligent decision making systems which use expert knowledge in decision making problems related to TSDB. The goal of perception based time series data mining is to support procedures of computing with words and perceptions in TSDB domains. The methods of representation and manipulation of perceptions and the methods
of extraction of perceptions from TSDB relevant to decision making problems should adopt methods developed in fuzzy set theory, computational linguistics, syntactic pattern recognition, time series analysis and time series data mining. Some approaches to the modeling of perception based patterns in time series data base domains have been discussed in this paper. Further directions of research include the adaptation of models and efficient algorithms developed in TSDM to the representation and processing of fuzzy perceptions, the development of methods of computing with words and perceptions for modeling decision-making procedures in TSDB domains, and the development of decision making systems based on CWP and PTSDM in specific problem areas. Some applications of soft computing methods in data mining and time series analysis can be found in [3, 11, 14, 17, 19, 21, 22, 27, 29, 30, 31, 34, 35].
Acknowledgments The research work was supported by projects D.00006 and D.00322 of IMP.
References 1. Agrawal, R., Psaila, G., Wimmers, E.L., Zait M.: Querying shapes of histories. Proc. 21st Intern. Conf. Very Large Databases, VLDB ’95, Zurich, Switzerland (1995) 502–514 2. Allen, J.F.: Maintaining knowledge about temporal intervals. Comm. ACM. 26 (11) (1983) 832–843 3. Baldwin, J.F., Martin, T.P., Rossiter, J.M.: Time series modeling and prediction using fuzzy trend information. Proc. Int. Conf. SC Information/Intelligent Syst. (1998) 499–502 4. Batyrshin, I.: Construction of granular derivatives and solution of granular initial value problem. In: Nikravesh, M.; Zadeh, L.A.; Korotkikh, V. (eds.). Fuzzy Partial Differential Equations and Relational Equations. Reservoir Characterization and Modeling. Studies in Fuzziness and Soft Computing, Vol. 142. Springer-Verlag, Berlin Heidelberg New York (2004) 285–307 5. Batyrshin, I., Cosultchi, A., Herrera-Avelar, R.: Association network of petroleum wells based on partial time series analysis. Proc. Workshop Intel. Comput., Mexico (2004) 17–26 6. Batyrshin, I., Herrera-Avelar, R., Sheremetov, L., Panova, A.: Association networks in time series data mining. NAFIPS 2005, USA (2005) 754–759 7. Batyrshin, I., Herrera-Avelar, R., Sheremetov, L., Suarez, R.: On qualitative description of time series based on moving approximations. Proc. Intern. Conf. Fuzzy Sets and Soft Computing in Economics and Finance, FSSCEF 2004, Russia, vol. I, (2004) 73–80 8. Batyrshin, I., Herrera-Avelar, R., Sheremetov, L., Suarez, R.: Moving approximations in time series data mining. Proc. Int. Conf. Fuzzy Sets and Soft Computing in Economics and Finance FSSCEF 2004, St. Petersburg, Russia, v. I, (2004) 62–72 9. Batyrshin, I., Wagenknecht, M.: Towards a linguistic description of dependencies in data. Int. J. Applied Math. and Computer Science. Vol. 12 (2002) 391–401 10. Batyrshin, I.Z.: On reconstruction of perception based functions with convex-concave patterns. Proc. Int. Conf. Computational Intelligence ICCI 2004, Nicosia, North Cyprus, Near East University Press (2004) 30–34 11. Bezdek, J. C.: Fuzzy models and digital signal processing (for pattern recognition): Is this a good marriage? Digital Signal Processing, 3 (1993) 253–270 12. Bowerman, B.L., O’Connell, R.T.: Time series and forecasting: an applied approach. Mass. Duxbury Press (1979) 13. Boyd, S.: TREND: A system for generating intelligent descriptions of time-series data. Proc. IEEE Intern. Conf. Intelligent Processing Systems (ICIPS1998) (1998) 14. Chen, G.Q., Wei, Q., Kerre E.E.: Fuzzy Data Mining: Discovery of Fuzzy General. Assoc. Rules. In: Recent Research Issues Manag. Fuzz. in DB, Physica-Verlag (2000)
15. Cheung, J.T., Stephanopoulos, G.: Representation of process trends. -Part I. A formal representation framework. Computers and Chemical Engineering, 14 (1990) 495–510 16. Das, G., Lin, K.I., Mannila, H., Renganathan, G., Smyth, P.: Rule discovery from time series. Proc. KDD98 (1998) 16–22 17. Dubois, D., Hullermeier, E., Prade, H.: A note on quality measures for fuzzy association rules. IFSA 2003, LNAI 2715, Springer-Verlag (2003) 677–648 18. Hand, D., Manilla, H., Smyth, P.: Principles of Data Mining. MIT Press, Cambridge (2001) 19. Höppner, F.: Learning Temporal Rules from State Sequences. IJCAI Workshop on Learning from Temporal and Spatial Data, Seattle, USA (2001) 25–31 20. Jang, J.-S.R., Sun, C.T., Mizutani, E.: Neuro-Fuzzy and Soft Computing. A Computational Approach to Learning and Machine Intelligence. Prentice-Hall International (1997) 21. Kacprzyk, J., Zadrozny, S.: Data Mining via Linguistic Summaries of Data: An Interactive Approach, IIZUKA’98, Iizuka, Japan (1998) 668–671 22. Kandel, A., Last, M., Bunke, H. (eds): Data Mining and Computational Intelligence, PhysicaVerlag, Studies in Fuzziness and Soft Computing, Vol. 68. (2001) 23. KDnuggets: Polls: Time-Series Data Mining (Nov 2004). What Types of TSDM You’ve Done? http://www.kdnuggets.com/polls/2004/time_series_data_mining.htm 24. Klir, G.J., Clair, U.S., Yuan, B.: Fuzzy Set Theory: Foundations and Applications, Prentice Hall Inc. (1997) 25. Konstantinov, K.B., Yoshida, T.: Real-time qualitative analysis of the temporal shapes of (bio) process variables. American Inst. Chem. Eng. Journal 38 (1992) 1703–1715 26. Kukich, K.: Design of a knowledge-based report generator. Proc. 21st Annual Meeting of the Association for Computational Linguistics (ACL-1983) (1983) 145–150 27. Last, M., Klein, Y., Kandel A.: Knowledge discovery in time series databases. IEEE Trans. SMC, Part B, 31 (2001) 160–169 28. Lin, J., Keogh, E., Lonardi, S., Chiu, B.: A symbolic representation of time series, with implications for streaming algorithms. Proc. 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. San Diego, CA. (2003) 29. Maner, W., Joyce, S.: WXSYS: Weather Lore + Fuzzy Logic = Weather Forecasts. Present. 1997 CLIPS Virtual Conf. http://web.cs.bgsu.edu/maner/wxsys/wxsys.htm (1997) 30. Mitra, S., Pal, S.K., Mitra, P.: Data Mining in Soft Computing Framework: A Survey. IEEE Transactions Neural Networks, 13 (2002) 3–14 31. Pons, O., Vila, M.A., Kacprzyk J. (eds.): Knowledge Management in Fuzzy Databases, Physica-Verlag (2000) 32. Stockman, G., Kanal, L., Kyle, M.C.: Structural pattern recognition of carotid pulse waves using a general waveform parsing system. CACM 19 (1976) 688–695 33. StockReporter. http://www.ics.mq.edu.au/∼ltgdemo/StockReporter/about.html 34. Sudkamp, T. Examples, counterexamples, and measuring fuzzy associations. Fuzzy Sets and Systems, 149 (2005) 57–71 35. Yager, R.R.: On linguistic summaries of data. In: Piatetsky-Shapiro, G., Frawley, B. (eds.): Knowledge Discovery in Databases, MIT Press, Cambridge, MA (1991) 347–363 36. Yu, J., Reiter, E., Hunter, J., Mellish C.: Choosing the content of textual summaries of large time-series data sets. Natural Language Engineering (2005) (To appear) 37. Zadeh, L.A.: Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. Systems, Man and Cybernetics SMC-3 (1973) 28–44 38. Zadeh, L.A.: From computing with numbers to computing with words - from manipulation of measurements to manipulation of perceptions. 
IEEE Trans. on Circuits and Systems - 1: Fundamental Theory and Applications, vol. 45, (1999) 105–119
Veristic Variables and Approximate Reasoning for Intelligent Semantic Web Systems Ronald R. Yager
Abstract Our concern is with the development of tools useful for the construction of semantically intelligent web based systems. We indicate that fuzzy set based reasoning systems such as approximate reasoning provide a fertile framework for the construction of these types of tools. Central to this framework is the representation of knowledge as the association of a constraint with a variable. Here we study one important type of variable, the veristic variable; these are variables that can assume multiple values. Different types of statements providing information about veristic variables are described. A methodology is presented for representing and manipulating veristic information. We consider the role of these veristic variables in databases and describe methods for representing and evaluating queries involving veristic variables.
1 Introduction

An important aspect of the future of man-machine interaction will involve search through a body of text. Particular examples of this are question-answering systems and agent mediated searchers. The future development of the semantic web will greatly help in this task (Berners-Lee, Hendler and Lassila 2001; Fensel, Hendler, Lieberman and Wahlster 2003; Antoniou and van Harmelen 2004; Passin 2004). A fundamental requirement here is a semantically rich formal mechanism for reasoning with the kinds of information expressed in terms of a natural language. Starting with Zadeh's paradigm-changing 1965 paper, more and more sophisticated fuzzy set based semantic technologies have been developed with these required capabilities. We particularly note the theory of approximate reasoning (Zadeh 1979a), the paradigm of computing with words (Zadeh 1996) and the generalized theory of uncertainty (Zadeh 2005). While these were developed by Zadeh, other researchers too numerous to mention have made important contributions to the development of a fuzzy set based semantics (Dubois and Prade 1991; Novak 1992). A recent book
Ronald R. Yager Machine Intelligence Institute, Iona College, New Rochelle, NY 10801 e-mail: [email protected]
edited by Sanchez (2006) focuses on the interaction between fuzzy sets and the semantic web.
2 Approximate Reasoning

The fuzzy set based theory of approximate reasoning (AR) can provide the formal machinery needed to help in the construction of intelligent semantic web systems. A central component of AR is the representation of knowledge by the association of a constraint with a variable. Zadeh [2005] discussed the idea of a generalized constraint and suggested some of the types of variables needed to represent various kinds of human knowledge; among these were veristic and possibilistic variables. While our focus here is on veristic variables (Yager 2000), a basic understanding of possibilistic variables is needed because of their central role in the theory of approximate reasoning. Possibilistic variables (Zadeh 1978; Dubois and Prade 1988b) have been widely studied in the approximate reasoning literature and have often been used to represent linguistic information. An important characteristic of possibilistic variables is the fact that they are restricted to have one and only one value. Variables such as Bernadette's age, the current temperature or your father's name are examples of possibilistic variables; they can only assume one value. While the actual value of a possibilistic variable is unique, our knowledge about this value can be uncertain. This is especially true when our knowledge comes from information described in a natural language. In order to represent this uncertainty, sets, and particularly fuzzy sets, play an important role in expressing our knowledge of the value of a possibilistic variable. For example, if U is the variable Bernadette's age, then the knowledge "Bernadette is young" can be expressed in the language of AR as U is A, where A is a fuzzy subset of the domain of ages and A(x) indicates the possibility that Bernadette is x years old. We note that the special case where A is a crisp set with one element corresponds to the situation in which the value of U is exactly known. Thus the knowledge "Bernadette is 21" is represented as U is A where A = {21}. Of particular significance to us is that a well established machinery has been developed, within the theory of AR, for the manipulation of possibilistic information, which we shall feel free to draw upon (Zadeh 1979a; Yager and Filev 1994). In addition to playing a fundamental role in AR, possibilistic variables have also been directly used in databases (Zemankova and Kandel 1984; Petry 1996). Their use in databases has been primarily in two ways. One use has been to represent linguistic quantities appearing as attribute values in a database. In this usage we would represent the age of a person who is known to be young by a possibilistic distribution obtained from the fuzzy subset young. The second and currently more prevalent use in database applications is to use possibility distributions to represent flexible queries (Andreasen, Christianisen and Larsen 1997; Bosc and Prade 1997). A second class of variables available in approximate reasoning is veristic variables (Yager 1982; 1984; 1987a; 1987b; 1988; 2000; 2002). These types of variables, as opposed to possibilistic ones, are allowed to have any number of solutions. One example of a veristic variable is the variable children of Masoud. The friends of
Didier is another example of a veristic variable. The temperatures I like, as well as the days I worked this week, are also examples of veristic variables. We note that each of these variables can have multiple solutions or none. As we shall see, here, as in the case of possibilistic variables, sets play a central role in expressing information about the variable. However, here the sets will be used to represent the multiplicity of solutions rather than uncertainty. Fuzzy sets will also be used to allow for degrees of solution.
3 Expressing Veristic Type Knowledge

Yager [2000] provided an extension of Zadeh's (1997; 2005) copula notation for the representation of statements involving veristic variables within the language of approximate reasoning. Let V be a veristic variable taking its values in the set X; that is, V can attain any number of values in X. Zadeh [1997] suggested the symbolic representation of a canonical statement involving veristic information, based on the association of a fuzzy subset A with a veristic variable, as V is ν A. Here the copula is ν is used to indicate that the variable is veristic; we note that for possibilistic variables we use the copula is. In this framework the membership grade of x in A, A(x), indicates the truth of the statement that x is a value of V given the statement V is ν A. If V is the variable languages spoken by Janusz, then the statement V is ν {French, Spanish, English, Polish, Russian, Italian, German} indicates that Janusz speaks the seven languages in the set. Fuzzy sets enter the picture when there exists some gradation as to the degree to which an element belongs to the set. Let V be the variable women Lotfi likes, which is a veristic variable. Let the fuzzy subset blond women be defined as Blond = {1/Alice, 0.7/Barbara, 0.5/Carol, 0.3/Debbie, 0/Esther}. The statement Lotfi likes blonds is expressible as V is ν Blond. In this case Blond(Barbara) = 0.7 indicates the veracity of the statement that Lotfi likes Barbara. The statement Henrik likes warm weather can be seen to induce a veristic type statement V is ν Warm, where V is the veristic variable temperatures Henrik likes and Warm is a fuzzy subset over the space of temperatures. In order to clearly understand the distinction between possibilistic variables and veristic variables, consider the following. Let U and V be possibilistic and veristic variables respectively and let A = {1, 2, 3}. Consider the two AR statements U is A and V is ν A. The first statement conveys the knowledge U = 1 or U = 2 or U = 3, while the second conveys the knowledge V = 1 and V = 2 and V = 3.
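A minimal sketch of this distinction, reusing the Blond example above; the dictionary-based representation is just an implementation convenience assumed here.

```python
# Veristic statement "V is ν Blond": each membership grade is read directly as the
# veracity that the person is one of the (possibly many) solutions of V.
blond = {"Alice": 1.0, "Barbara": 0.7, "Carol": 0.5, "Debbie": 0.3, "Esther": 0.0}

def veracity(A, x):
    """Veracity that x is a solution of V given the open affirmative statement V is ν A."""
    return A.get(x, 0.0)

print(veracity(blond, "Barbara"))   # 0.7: the degree to which Lotfi likes Barbara

# Possibilistic statement "U is A" with A = {1, 2, 3}: U has exactly one value, and A
# only tells us it is 1 or 2 or 3, whereas the veristic reading would be 1 and 2 and 3.
possible_values = {1: 1.0, 2: 1.0, 3: 1.0}
```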
Yager [2000] identified four different types of canonical statements involving veristic variables and used these to motivate a refinement of the use of veristic statements within the language of AR. Consider the statements
S-1 John likes blonds
S-2 John likes only blonds
S-3 John doesn't like blonds
S-4 John only doesn't like blonds.
We see that statements S-1 and S-2 are positive statements in that they indicate objects that are solutions to our variable, women John likes. Statements S-3 and S-4 are negative statements; they indicate objects that are not solutions to our variable. Along a different dimension, statements S-1 and S-3 are of the same type, while statements S-2 and S-4 are of the same type. S-1 and S-3 are "open" statements in the sense that while S-1 indicates that John likes blonds, it doesn't preclude him from liking other women. Thus one would not necessarily find the two statements
John likes blonds
John likes redheads
contradictory; these types of statements are open to additions. The same is true of S-3: while it states that John doesn't like blonds, additional statements indicating other people John doesn't like would not be contradictory. On the other hand, the statements S-2 and S-4 are "closed" statements in that they preclude any other elements from joining. Based upon these observations, Yager [2000] suggested that in expressing veristic information in the language of the theory of approximate reasoning we must, in addition to denoting that a variable is veristic, include information about the manner of association of the variable with its value. In the preceding we essentially indicated that there are two dimensions, one along a scale of positive/negative and the other along a scale of open/closed. In order to include this information in the AR expression, it was suggested to use two parameters with the veristic copula is ν. These parameters are c for closed and n for negative; the absence of a parameter means its opposite holds. Thus
is ν means open and positive
is ν(n) means open and negative
is ν(c) means closed and positive
is ν(c,n) means closed and negative
Using these copulas and letting V indicate the variable women John likes and B indicate the subset blond, we can express the statements S-1 to S-4 as
S-1 V is ν B
S-2 V is ν(c) B
S-3 V is ν(n) B
S-4 V is ν(n, c) B.
Now that we have a means for expressing statements involving veristic variables we turn to the issue of representing the knowledge contained in these statements in a manner we can manipulate within the AR system.
4 Representing and Manipulating Veristic Information

Since approximate reasoning has a powerful machinery for manipulating statements involving possibilistic variables, Yager [2000] suggested a methodology for converting statements involving veristic variables into propositions involving possibilistic variables. The possibilistic type information can then be manipulated using AR and the results retranslated into statements involving veristic variables. Figure 1 shows the basic idea. Ideally, as we get more experience with this process we shall learn how to directly manipulate veristic information and circumvent the translation steps.
Veristic Information → Translation into Possibilistic Statements → Knowledge Manipulation → Retranslation → Veristic Statement

Fig. 1 Translation Process
We now describe Yager's method for expressing veristic information in terms of possibilities. Let V be a veristic variable taking its values in the space X. We associate with V a possibilistic variable V∗ taking its value in the space IX, the set of all fuzzy subsets of X. Thus V∗ has one and only one value, a fuzzy subset of X. The relationship between V and V∗ is that V∗ is the set of all solutions to V. Let V be the veristic variable week-days. We can have veristic statements such as Tuesday and Thursday are week-days (V is ν {Tuesday, Thursday}) as well as Saturday is not a week day (V is ν(n) {Saturday}). On the other hand, the set of all solutions to V is {Monday and Tuesday and Wednesday and Thursday and Friday}:
V∗ is 1/{Monday, Tuesday, Wednesday, Thursday, Friday}
Yager [2000] noted that statements involving a veristic variable V tell us something about the value of the associated possibilistic variable V∗. For example, with V the variable corresponding to the people John likes, the veristic statement John likes Mary and Susan, V is ν {Mary, Susan}, indicates that the set that is the solution to
V∗, the set of all the people John likes, must contain the elements Mary and Susan. Essentially this statement says that all sets containing Mary and Susan are possible solutions of V∗. Motivated by this, Yager [2000] suggested a means for converting statements involving veristic variables into those involving the associated possibilistic variable. As seen from the above, the concept of inclusion (containment) plays a fundamental role in translating veristic statements into possibilistic ones. Specifically, the statement V is ν A means that any fuzzy subset containing A is a possible solution for V∗. More formally, as a result of this statement, if F is a fuzzy subset of X then the possibility that F is the solution of V∗ is equal to the degree to which A is contained in F. This situation requires us to introduce a definition of containment. As discussed by Bandler and Kohout [1980], there are many possible definitions of containment in the fuzzy framework. We have chosen to use Zadeh's original definition of containment: if G and H are two fuzzy subsets on the domain X, then G is said to be contained in H if G(x) ≤ H(x) for all x in X. We emphasize here that this is not the only possible definition of containment that can be used. However, this choice does have the very special property that it is a binary definition: Deg(G ⊆ H) is either one or zero. This leads to a situation in which the possibility of a fuzzy subset of X being the solution to V∗ is either one or zero. This has the benefit of allowing us to more clearly see the processes involved in manipulating veristic information, which in turn will help us develop our intuition at this early stage of our work with veristic information. In the following we list the translation rules, under the assumption of the Zadeh definition of containment, for the four canonical types of veristic statements.
I.
V is ν A ⇒ V∗ is A∗1
where A∗1 is a subset of IX such that for any fuzzy subset F ∈ IX its membership grade is defined by
A∗1(F) = 1 if A(x) ≤ F(x) for all x in X (A ⊆ F)
A∗1(F) = 0 if A ⊄ F
Thus any F for which A∗1(F) = 1 is a possible solution for V∗.
II.
V is ν(c) A ⇒ V∗ is A∗2
where A∗2 is a subset of IX such that
A∗2(A) = 1
A∗2(F) = 0 for F ≠ A
We see here that the statement V is ν(c) A induces an exact solution for V∗, namely A.
III. V is ν(n) A ⇒ V∗ is A∗3
where A∗3 is a subset of IX such that for any F ∈ IX
A∗3(F) = 1 if F(x) ≤ 1 − A(x) for all x in X (F ⊆ Ā)
A∗3(F) = 0 if F ⊄ Ā
Here again we see that we get a set of possible solutions, essentially those subsets contained in the complement Ā of A.
IV. V is ν(n, c) A ⇒ V∗ is A∗4
where A∗4 is a subset of IX such that
A∗4(Ā) = 1
A∗4(F) = 0 for all F ≠ Ā
Here we see that we again get an exact solution for V∗, namely Ā.
Once we have converted our information involving veristic variables into statements involving the associated possibilistic variable, we can use the machinery of approximate reasoning to manipulate, fuse, and reason with the information. In making this conversion we pay a price: the domain of V is X, while the domain of V∗, IX, is a much more complex and bigger space. While in general the manipulation of pieces of information involving veristic variables requires their conversion into possibilistic statements, some of these operations can be performed directly with the veristic statements. Among the most useful are the following, which involve combinations of like statements:
1. V is ν A1 and V is ν A2 and . . . and V is ν An ≡ V is ν(∪j Aj)
The "anding" of open affirmative veristic statements results in the union of the associated sets.
John likes {Mary, Bill, Ann} and John likes {Tom, Bill} ⇒ John likes {Mary, Bill, Ann, Tom}
2. V is ν A1 or V is ν A2 or . . . or V is ν An ≡ V is ν(∩j Aj)
The "oring" of open affirmative veristic statements results in the intersection of the associated sets.
John likes {Mary, Bill, Ann} or John likes {Tom, Bill} ⇒ John likes {Bill}
3. V is ν(n) A1 and V is ν(n) A2 and . . . and V is ν(n) An ≡ V is ν(n)(∪j Aj)
4. V is ν(n) A1 or V is ν(n) A2 or . . . or V is ν(n) An ≡ V is ν(n)(∩j Aj)
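A compact sketch of the four translation rules, restricted (as in the text) to Zadeh's binary containment. Representing fuzzy sets as plain dictionaries over a finite domain is an implementation convenience assumed here, not part of the formal theory.

```python
def contains(G, H, X):
    """Zadeh containment: G ⊆ H iff G(x) <= H(x) for all x in X (binary degree)."""
    return all(G.get(x, 0.0) <= H.get(x, 0.0) for x in X)

def equal(G, H, X):
    return all(abs(G.get(x, 0.0) - H.get(x, 0.0)) < 1e-9 for x in X)

def translated_W(A, X, closed=False, negative=False):
    """Return the function W with W(F) = A*_i(F) for the statement V is ν(...) A."""
    A_bar = {x: 1.0 - A.get(x, 0.0) for x in X}
    def W(F):
        if closed and not negative:                  # rule II: exact solution A
            return 1.0 if equal(F, A, X) else 0.0
        if closed and negative:                      # rule IV: exact solution A-bar
            return 1.0 if equal(F, A_bar, X) else 0.0
        if negative:                                 # rule III: any F contained in A-bar
            return 1.0 if contains(F, A_bar, X) else 0.0
        return 1.0 if contains(A, F, X) else 0.0     # rule I: any F containing A
    return W

# "John likes Mary and Susan" (open affirmative):
X = ["Mary", "Susan", "Tom"]
W = translated_W({"Mary": 1.0, "Susan": 1.0}, X)
print(W({"Mary": 1.0, "Susan": 1.0, "Tom": 1.0}))   # 1.0: a possible solution set
print(W({"Mary": 1.0}))                              # 0.0: does not contain Susan
```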
We emphasize the unexpected association of union with "and" and of intersection with "or" in the above combination rules. Often we are interested in having information about a particular element of X, such as whether x1 is a solution of V; this kind of information is not readily apparent from statements involving V∗. In order to more readily supply this kind of information about elements of X based upon statements involving V∗, we introduce two distributions on X induced by a statement involving V∗; these are called the verity and possibility distributions. Let V be a variable taking its values in the space X. From the statement V∗ is W, where W is a fuzzy subset of IX, we can obtain a verity distribution on X, Ver: X → I, where I = [0, 1], such that
Ver(x) = MinF∈IX [F(x) ∨ (1 − W(F))]
In the special case when W is a crisp subset of IX this reduces to
Ver(x) = MinF∈W [F(x)]
Here we see that Ver(x) is essentially the smallest membership grade of x in any fuzzy subset of X that can be the solution set of V∗. Ver(x) can be viewed as a lower bound or minimal support for the truth of the statement "x is a solution of V." The measure of verity is closely related to the measure of certainty (necessity) used with possibilistic variables, although it is a little less complex. The additional complexity with possibilistic variables is a result of the fact that we have one and only one solution for a possibilistic variable; this requires that in determining the certainty of x we take into account information about the possibility of other solutions. In the case of veristic variables, by contrast, because any number of solutions is allowed, the determination of whether x is a solution to V is based solely on information about x and is independent of the situation with respect to any other element of the space X. We can show that the following are the verity distributions associated with the four canonical veristic statements:
V is ν A ⇒ Ver(x) = A(x)
V is ν(n) A ⇒ Ver(x) = 0
V is ν(c) A ⇒ Ver(x) = A(x)
V is ν(c, n) A ⇒ Ver(x) = 1 − A(x)
It should be emphasized that the statement V is ν(n) A has Ver(x) = 0 for all A; thus it never provides any support for the statement x is a solution of V, whether x is contained in A or not. We also note that since the "anding" of a collection of statements of the form V is ν Aj results in V is ν(∪j Aj), the verity distribution associated with this "anding" is
Ver(x) = Maxj [Aj(x)]
On the other hand, since the "oring" of statements of the form V is ν Aj results in V is ν(∩j Aj), the verity distribution associated with this "oring" is
Ver(x) = Minj [Aj(x)]
In the case of negative statements of the form V is ν(n) Aj, both the "anding" and the "oring" of these kinds of statements result in Ver(x) = 0. The second distribution on X generated from V∗ is W is called the possibility distribution; it is denoted Poss: X → I and is defined as
Poss(x) = MaxF∈IX [F(x) ∧ W(F)]
In the special case when W is crisp this reduces to
Poss(x) = MaxF∈W [F(x)]
Here we see that Poss(x) is the largest membership grade of x in any fuzzy subset of X that can be a potential solution of V∗. In this respect Poss(x) can be seen as an upper bound or maximal support for the truth of the statement "x is a solution of V." This measure is exactly analogous to the possibility measure used with possibilistic variables. It can be shown that the following are the possibility distributions associated with the four canonical types of veristic statements:
V is ν A ⇒ Poss(x) = 1
V is ν(n) A ⇒ Poss(x) = 1 − A(x)
V is ν(c) A ⇒ Poss(x) = A(x)
V is ν(c, n) A ⇒ Poss(x) = 1 − A(x)
The "anding" and "oring" of statements of the form V is ν Aj result in a possibility distribution such that Poss(x) = 1. In the case of statements of the form V is ν(n) Aj, their "anding" results in a possibility distribution such that Poss(x) = 1 − Maxj [Aj(x)], while for their "oring" we get Poss(x) = 1 − Minj [Aj(x)]. It is also worth noting that in the case of the canonical closed statement V is ν(c) A both the verity and the possibility attain the same value, A(x). Similarly, for the canonical closed statement V is ν(c, n) A both the verity and the possibility attain the same value, 1 − A(x). This fact is not surprising in that these types of statements imply that the solution set V∗ is exactly known (A and Ā, respectively).
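A small sketch of the crisp-W special case highlighted above, with W given as an explicit list of the candidate solution sets:

```python
def verity(x, candidates):
    """Ver(x): the smallest membership grade of x over the possible solution sets."""
    return min(F.get(x, 0.0) for F in candidates)

def possibility(x, candidates):
    """Poss(x): the largest membership grade of x over the possible solution sets."""
    return max(F.get(x, 0.0) for F in candidates)

# "John likes Mary and Susan" over the crisp domain {Mary, Susan, Tom}: the possible
# solution sets are exactly the supersets of {Mary, Susan}.
candidates = [
    {"Mary": 1.0, "Susan": 1.0},
    {"Mary": 1.0, "Susan": 1.0, "Tom": 1.0},
]
for person in ("Mary", "Tom"):
    print(person, verity(person, candidates), possibility(person, candidates))
# Mary: Ver = 1, Poss = 1; Tom: Ver = 0, Poss = 1, matching the distributions for V is ν A.
```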
5 Basic Querying of Relational Databases Containing Veristic Variables A relational database provides a structured representation of knowledge in the form of a collection of relations. A relation in the database can be viewed as a table in
which each column corresponds to an attribute and each row corresponds to a record or object. The intersection of a row and a column is a variable indicating the value of the row object for the column attribute. Normally one associates with each attribute a set called its universe, constituting the allowable values for the attribute. One desirable characteristic of good database design is to try to make the tables in a database have a very natural semantics; tables in which the records, i.e. the rows, correspond to real entities are in this spirit. Tables such as employees, customers and parts are examples of these types of real entity objects. Another semantically appealing type of table is those used to connect entity objects; these can be seen as linking tables. Another class of tables which have a semantic reality are those which define some concept. For example, a table with just one attribute, states, can be used to define a concept such as Northeast. We can call these concept definition tables. One often imposed requirement in the design of databases is the satisfaction of the first normal form. This requires that each variable in a database is allowed to take one and only one element from its domain. The satisfaction of this requirement often leads to an increase in the number of tables needed to represent our knowledge. In addition, its satisfaction often requires us to compromise other desired features and sometimes to sacrifice semantic considerations. For example, if we have in a database one table corresponding to the entity employee, and if we want to include information about the names of an employee's children, this must be stored in a different table. One reason for the imposition of the requirement of satisfying the first normal form is the lack of a mechanism for querying databases in which attribute values can be multi-valued rather than single valued. The introduction of the idea of a veristic variable provides a facility for querying these types of multiple-solutioned variables. Here we shall begin to look at the issue of querying databases with veristic variables. We note the previous related work on this issue (Dubois and Prade 1988a; Vandenberghe, Van Schooten, De Caluwe and Kerre 1989). Assume V is a variable corresponding to an attribute of a relational database which can have multiple solutions, such as the children of John. In this case V is a veristic variable. We shall let X indicate the domain of V. Typically the information about this variable is expressed in terms of a canonical veristic statement of the form V is ν(.,.) A, e.g. John's children are Mary, Bill and Tom. In order to use this information we need to be able to answer questions about this variable based on this information: is Tom a child of John? Ideally we desire a system in which the information is expressed in terms of a veristic variable, V is ν(.,.) A, and answers are obtained in terms of A. However, in order to develop the formalism necessary to construct a question-answering system we need to draw upon the possibility based theory of approximate reasoning. This necessitates that we introduce the associated possibilistic variable V∗. Thus we shall assume that our information about V can be expressed as V∗ is W where W ⊆ IX. Since we have already indicated a means for translating statements involving V into those involving V∗, this poses no problem. In the following, by a question we shall mean one in which we try to match the value of some attribute to some specified value.
Typically, for any object, the answer
to this question is some measure of the truth that the attribute value of the object satisfies the query value. In order to provide the machinery for answering questions, we need to introduce some tools from the theory of approximate reasoning. Assume U is a possibilistic variable taking its value in the space Y. An example useful here is the variable the Age of John. Consider the situation in which we know that John is over thirty. If we ask whether John is twelve years old, the answer is clearly no. If we ask whether John is over twenty-one, the answer is clearly yes. However, it is not always the case that we can answer every relevant question using our information. Consider the question, "Is John forty years old?" or the question, "Is John over fifty?", neither of which can be answered. In these cases the best answer is "I don't know". The problem here is that there exists some uncertainty in our knowledge; all we know is that John is over thirty. If we knew the exact age of John, with no uncertainty, then any relevant question could be answered. The theory of approximate reasoning, which was developed to implement reasoning with uncertain information, has formally looked into the issue of answering questions in the face of uncertainty. With U as our possibilistic variable, consider the situation in which we have the knowledge that U is A and want to know if U is B is true. Formally, we have the data U is A and we desire to determine Truth[U is B/ U is A]. As we have indicated, it is often impossible to answer this question when there exists some uncertainty as to the value of U. We should note that the problem exists even if we allow the truth value to lie in the unit interval rather than being binary. Within approximate reasoning two bounding measures for Truth[U is B/ U is A] have been introduced, Poss[U is B/ U is A] and Cert[U is B/ U is A] (Zadeh 1979b). Poss[U is B/ U is A] is defined as
Poss[U is B/ U is A] = Maxy [B(y) ∧ A(y)]
It provides an upper bound on Truth[U is B/ U is A]. The other measure, the certainty, is defined as
Cert[U is B/ U is A] = 1 − Maxy [(1 − B(y)) ∧ A(y)]
and it provides a lower bound on Truth[U is B/ U is A]. If Cert[U is B/ U is A] = Poss[U is B/ U is A] = α then Truth[U is B/ U is A] is precisely known; its value is α. Two particular cases of this are when α = 1 and α = 0. In the first case the truth is one; this is usually given the linguistic denotation TRUE. In the second case the truth is zero; this is usually given the linguistic denotation FALSE. Another special case is when Poss[U is B/ U is A] = 1 and Cert[U is B/ U is A] = 0; in this case the truth value can be any value in the unit interval, and this situation is given the linguistic denotation UNKNOWN. We shall now use these surrogate values to provide a machinery for answering questions about veristic attributes in a database. Assume our knowledge about the veristic attribute V can be expressed as V∗ is W. Here we recall that V∗ is the associated possibilistic variable and W is a fuzzy subset of IX, X being the domain of V.
Further assume our query is expressible as V∗ is Q; here again Q is a fuzzy subset of IX. Using the measures just introduced we get
Poss[V∗ is Q/V∗ is W] = MaxF [Q(F) ∧ W(F)]
Cert[V∗ is Q/V∗ is W] = 1 − MaxF [(1 − Q(F)) ∧ W(F)]
Here we note that F ∈ IX; the Max is taken over all fuzzy subsets of X. Let us consider the question "is x1 a solution of V?" We note that this statement can be expressed in AR as V is ν{x1}. As we have indicated, this type of statement can be translated as V∗ is Q where Q is a fuzzy subset of IX such that for each F ∈ IX, Q(F) = Deg({x1} ⊆ F). Here we shall use the definition of inclusion suggested by Bandler and Kohout [2]: if A and B are two fuzzy subsets of X then Deg(A ⊆ B) = Minx [(1 − A(x)) ∨ B(x)]. Using this in our query we get Q(F) = Deg(Ex1 ⊆ F) = Minx [(1 − Ex1(x)) ∨ F(x)], where Ex1 indicates the set consisting solely of x1. Since Ex1(x) = 0 for x ≠ x1 and Ex1(x) = 1 for x = x1, we get Q(F) = Minx [(1 − Ex1(x)) ∨ F(x)] = F(x1). Using this we can now answer the question "Is x1 a solution to V given V∗ is W" by obtaining the possibility and certainty:
Poss[V is ν{x1}/V∗ is W] = Poss[V∗ is Q/V∗ is W] = MaxF [Q(F) ∧ W(F)] = MaxF [F(x1) ∧ W(F)]
and
Cert[V is ν{x1}/V∗ is W] = 1 − MaxF [(1 − Q(F)) ∧ W(F)] = MinF [Q(F) ∨ (1 − W(F))] = MinF [F(x1) ∨ (1 − W(F))]
It should be noted here that
Cert[V is ν{x1}/V∗ is W] = Ver(x1)
Poss[V is ν{x1}/V∗ is W] = Poss(x1)
We further note that if V∗ is W is generated by one of the atomic types of statements V is ν(.,.) A, then we can directly provide these possibility and certainty values from A:

DATA             Cert[V is ν{x}/Data]    Poss[V is ν{x}/Data]
V is ν A         A(x)                    1
V is ν(c) A      A(x)                    A(x)
V is ν(n) A      0                       1 − A(x)
V is ν(n,c) A    1 − A(x)                1 − A(x)
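The table can be reproduced directly from A, as in the following sketch (A is a fuzzy set given as a dictionary; the example data are assumptions):

```python
def query_is_solution(A, x, closed=False, negative=False):
    """Return (Cert, Poss) for the query "is x a solution of V?" given the
    canonical data V is ν(...) A, reproducing the table above."""
    a = A.get(x, 0.0)
    if closed and not negative:
        return a, a                # V is ν(c) A
    if closed and negative:
        return 1.0 - a, 1.0 - a    # V is ν(n,c) A
    if negative:
        return 0.0, 1.0 - a        # V is ν(n) A
    return a, 1.0                  # V is ν A

blond = {"Alice": 1.0, "Barbara": 0.7, "Carol": 0.5, "Debbie": 0.3, "Esther": 0.0}
# "John likes only blonds" (closed affirmative): the truth of "John likes Carol" is 0.5.
print(query_is_solution(blond, "Carol", closed=True))
```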
It is interesting to note that in the cases in which our data is of the closed form, V is ν(c) A or V is ν(c,n) A, the truth of the question can be precisely answered, since the possibility and the certainty have the same value. In particular, if we have V is ν(c) A then the truth of the question of x being a solution of V is A(x), while in the case when V is ν(c,n) A the truth is 1 − A(x). The preceding observation inspires us to consider a more general observation. Assume we have a query about a veristic variable V in a database that can be expressed as V∗ is Q. If our information in the database is V∗ is W, we indicated that
Poss[V∗ is Q/V∗ is W] = MaxF [Q(F) ∧ W(F)]
Cert[V∗ is Q/V∗ is W] = 1 − MaxF [(1 − Q(F)) ∧ W(F)]
which represent the upper and lower bounds on Truth[V∗ is Q/V∗ is W]. Consider now the case when the known data is an atomic statement of the closed affirmative type, V∗ is W ⇔ V is ν(c) A. As we indicated, in this case the set W is such that W(A) = 1 and W(F) = 0 for all F ≠ A. From this we see MaxF [Q(F) ∧ W(F)] = Q(A). We also see that 1 − MaxF [(1 − Q(F)) ∧ W(F)] = 1 − (1 − Q(A)) = Q(A). Thus the possibility of V∗ is Q given V∗ is W equals its certainty. From this we get Truth[V∗ is Q/ V is ν(c) A] = Q(A), the membership grade of A in the query. Consider now the case when the known data is an atomic statement of the closed negative type, V∗ is W ⇔ V is ν(n,c) A. In this case we indicated that W(Ā) = 1 and W(F) = 0 for all F ≠ Ā. From this we get that
MaxF[Q(F) ∧ W(F)] = Q(Ā)
1 − MaxF[Q̄(F) ∧ W(F)] = 1 − Q̄(Ā) = Q(Ā)
This of course implies that Poss[V∗ is Q/V is ν(n,c) A] = Cert[V∗ is Q/V is ν(n,c) A] and hence Truth[V∗ is Q/V is ν(n,c) A] = Q(Ā). Thus we see that in the case of closed atomic veristic statements the truth value of any question expressible as V∗ is Q can be uniquely obtained. This result is very useful especially in the light of the so-called “closed world assumption” (Reiter 1978) often imposed in databases. Under this assumption a statement such as “John’s children are Mary and Tom” would be assumed to be a closed statement, that is, John’s only children are Mary and Tom.
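The closed-form results collected in the table above can be turned directly into a small lookup procedure. The sketch below is my own illustration (the fuzzy subset A and the element names are invented for the example); it returns the Cert/Poss pair for the query “Is x a solution of V?” under each of the four atomic veristic statement forms:

A = {"Mary": 1.0, "Tom": 0.7, "Sue": 0.0}    # assumed fuzzy subset A of X

def single_element_query(x, A, form):
    """Cert and Poss of 'is x a solution of V?' given an atomic statement about V."""
    a = A.get(x, 0.0)
    if form == "v":       # V is ν A       (open affirmative)
        return a, 1.0
    if form == "v(c)":    # V is ν(c) A    (closed affirmative)
        return a, a
    if form == "v(n)":    # V is ν(n) A    (open negative)
        return 0.0, 1.0 - a
    if form == "v(n,c)":  # V is ν(n,c) A  (closed negative)
        return 1.0 - a, 1.0 - a
    raise ValueError(form)

for form in ("v", "v(c)", "v(n)", "v(n,c)"):
    print(form, single_element_query("Tom", A, form))   # (Cert, Poss) pairs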
6 Compound Querying
Let us now look at another question that we can ask about a veristic variable assuming the knowledge V∗ is W: “Are x1 and x2 solutions of V?” We can express this query as the statement V is ν A where A = {x1, x2}, so then our question becomes that of finding the truth of V is ν A given V∗ is W. The statement V is ν A can be represented as V∗ is Q where Q is a fuzzy subset of IX in which Q(F) = Deg(A ⊆ F) = Minx[Ā(x) ∨ F(x)]. In this case
Poss[V∗ is Q/V∗ is W] = MaxF[Q(F) ∧ W(F)] = MaxF[Minx(Ā(x) ∨ F(x)) ∧ W(F)]
With A(x1) = A(x2) = 1 and all other membership grades equal to zero, we get Q(F) = F(x1) ∧ F(x2). Since A = {x1, x2}, then Poss[V∗ is Q/V∗ is W] = MaxF[F(x1) ∧ F(x2) ∧ W(F)]. For the certainty we get
Cert[V∗ is Q/V∗ is W] = 1 − MaxF[Q̄(F) ∧ W(F)] = MinF[(F(x1) ∧ F(x2)) ∨ W̄(F)]
In the preceding we represented the question “Are x1 and x2 solutions of V?” as V is ν A where A = {x1, x2} and then translated this into V∗ is Q where Q(F) = Deg(A ⊆ F) = Minx[Ā(x) ∨ F(x)]. There exists an alternative and ultimately more useful way of getting the same result.
We indicated in the preceding that the query “Is x1 a solution of V?” can be expressed as V∗ is Q1 where Q1(F) = Deg(Ex1 ⊆ F) = Minx[Ēx1(x) ∨ F(x)] = F(x1). Similarly the query whether x2 is a solution of V can be expressed as V∗ is Q2 where Q2(F) = F(x2). Using this we can form the query as to whether x1 and x2 are solutions of V as V∗ is Q, where V∗ is Q = (V∗ is Q1 and V∗ is Q2) = V∗ is Q1 ∩ Q2. In this case we also get Q(F) = Min[Q1(F), Q2(F)] = F(x1) ∧ F(x2), which leads to the same value for the possibility and certainty.
More generally, if we ask the question “Are x1 and x2 and ... and xm solutions of V?” then our question can be expressed as V∗ is Q = (V∗ is Q1 and V∗ is Q2 and ... and V∗ is Qm) = V∗ is Q1 ∩ Q2 ∩ ... ∩ Qm. Since Qj(F) = F(xj), then Q(F) = Minx∈B[F(x)] where B = {x1, ..., xm}. From this we get that
Poss[all x in B are solutions to V/V∗ is W] = MaxF[Minx∈B[F(x)] ∧ W(F)]
Cert[all x in B are solutions to V/V∗ is W] = MinF[Minx∈B[F(x)] ∨ W̄(F)]
In the case when V∗ is W ⇔ V is ν(c) A, then W(A) = 1 and W(F) = 0 for F ≠ A, hence
Poss[all x in B are solutions to V/V is ν(c) A] = Minx∈B[A(x)]
Cert[all x in B are solutions to V/V is ν(c) A] = Minx∈B[A(x)]
As expected these are the same, indicating that Truth[all x in B are solutions to V/V is ν(c) A] = Minx∈B[A(x)]. In the case when V∗ is W ⇔ V is ν(n,c) A, then we also get a unique truth value, equal to Truth[all x in B are solutions to V/V is ν(n,c) A] = Minx∈B[Ā(x)].
If V∗ is W ⇔ V is ν A, then W(F) = 1 if A ⊆ F and W(F) = 0 otherwise, where A ⊆ F if A(x) ≤ F(x) for all x. Since A ⊆ X, then
Poss[all x in B are solutions to V/V is ν A] = 1
To calculate the certainty we need to evaluate MinF[Minx∈B[F(x)] ∨ W̄(F)]. Since for any F such that A ⊄ F we get W(F) = 0 and for those in which A ⊆ F we get W(F) = 1, then
Cert[all x in B are solutions to V/V is ν A] = MinF[Minx∈B[F(x)] ∨ W̄(F)] = Minx∈B[A(x)]
Thus the certainty is the minimal membership grade in A of any of the question elements. In the case when our database has the value V is ν(n) A, then W(F) = 1 if F ⊆ Ā and W(F) = 0 otherwise. Using this we get
Poss[all x in B/V is ν(n) A] = MaxF[Minx∈B[F(x)] ∧ W(F)] = MaxF⊆Ā[Minx∈B[F(x)]]
This maximum occurs when F = Ā, hence Poss[all x in B/V is ν(n) A] = Minx∈B[Ā(x)]. In this case Cert[all x in B/V is ν(n) A] = MinF⊆Ā[Minx∈B[F(x)]] = 0. Thus here the certainty is zero but the possibility is the minimal membership grade in Ā of any x in B.
We now turn to another, closely related query type. Consider the question “Are x1 or x2 solutions to V?” Again we shall assume that the value of the veristic variable V in the database is V∗ is W. We must now find an appropriate representation of this query, V∗ is Q. We first recall that the question “Is xj a solution to V?”, V is ν{xj}?, induces the statement V∗ is Qj where Qj is a fuzzy subset of IX such that Qj(F) = Deg(Exj ⊆ F) = Minx[Ēxj(x) ∨ F(x)], here Exj indicates the set consisting solely of xj. Since Ēxj(x) = 0 for x = xj and Ēxj(x) = 1 for x ≠ xj, then Qj(F) = F(xj). Our query as to whether x1 or x2 are solutions can be expressed as (V∗ is Q1 or V∗ is Q2), thus our query is V∗ is Q where
Q(F) = Max(Q1(F), Q2(F)) = F(x1) ∨ F(x2)
Using this we get
Poss[x1 or x2 are solutions/V∗ is W] = Poss[V∗ is Q/V∗ is W] = MaxF[Q(F) ∧ W(F)] = MaxF[(F(x1) ∨ F(x2)) ∧ W(F)]
and
Cert[x1 or x2 are solutions/V∗ is W] = 1 − Poss[V∗ is Q̄/V∗ is W] = MinF[(F(x1) ∨ F(x2)) ∨ W̄(F)]
The generalization of this to any crisp subset B of X is straightforward, thus
Poss[∃x ∈ B that is a solution of V/V∗ is W] = MaxF[Maxx∈B[F(x)] ∧ W(F)]
Cert[∃x ∈ B that is a solution of V/V∗ is W] = MinF[Maxx∈B[F(x)] ∨ W̄(F)]
We consider now the answer in the case of special forms for V∗ is W. First we consider the case where V∗ is W ⇔ V is ν A. Here W(F) = 1 if A ⊆ F and W(F) = 0 otherwise. Again since A ⊆ X we get
Poss[∃x ∈ B that is a solution of V/V is ν A] = 1
The certainty in this case can be easily shown to be
Cert[∃x ∈ B that is a solution of V/V is ν A] = Maxx∈B[A(x)]
In the case where V∗ is W ⇔ V is ν(c) A, W(A) = 1 and W(F) = 0 for all F ≠ A, we get Poss[∃x ∈ B that is a solution of V/V is ν(c) A] = Cert[∃x ∈ B that is a solution of V/V is ν(c) A] = Truth[∃x ∈ B that is a solution of V/V is ν(c) A] = Maxx∈B[A(x)]. When the knowledge is V∗ is W ⇔ V is ν(n) A, W(F) = 1 if F ⊆ Ā and W(F) = 0 otherwise, then
Poss[∃x ∈ B that is a solution of V/V is ν(n) A] = Maxx∈B[Ā(x)]
and
Cert[∃x ∈ B that is a solution of V/V is ν(n) A] = MinF⊆Ā[Maxx∈B[F(x)]] = 0, since Ø ⊆ Ā.
When V∗ is W ⇔ V is ν(n,c) A we can show that both the possibility and certainty equal Maxx∈B[Ā(x)].
An interesting special case is when we have no information about the value of V. In this case V∗ is W where W(F) = 1 for all F. Under this circumstance
Poss[∃x ∈ B that is a solution of V/V∗ is W] = MaxF[Maxx∈B[F(x)] ∧ W(F)] = 1, since W(X) = 1
Cert[∃x ∈ B that is a solution of V/V∗ is W] = MinF[Maxx∈B[F(x)] ∨ W̄(F)] = 0, since W(Ø) = 1
Poss[all x in B are solutions of V/V∗ is W] = MaxF[Minx∈B[F(x)] ∧ W(F)] = 1
Cert[all x in B are solutions of V/V∗ is W] = 0
Actually this case corresponds to the situation where V∗ is W ⇔ V is ν A with A = Ø. Since for this type of information W(F) = 1 for all F such that A ⊆ F, and with A = Ø, then W(F) = 1 for all F. Another interesting case is when we have V is ν(c) Ø. Note this is different from the preceding. In this case we are saying that V has no solutions, while in the preceding we are saying that we don’t know any solutions to V.
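For small, crisp examples these compound-query bounds can be verified by brute force, enumerating every crisp subset F of X. The sketch below is purely illustrative: the domain, the set A and the query set B are invented, and restricting the enumeration to crisp subsets is an assumption made only for tractability:

from itertools import product

X = ["x1", "x2", "x3"]
A = {"x1": 1, "x2": 1, "x3": 0}   # data: V is ν A (open affirmative), crisp A
B = ["x1", "x3"]                  # query: are all x in B solutions of V?

def W(F):        # W(F) = 1 iff A ⊆ F
    return 1 if all(F[x] >= A[x] for x in X) else 0

def Q(F):        # Q(F) = Min over x in B of F(x)
    return min(F[x] for x in B)

subsets = [dict(zip(X, bits)) for bits in product((0, 1), repeat=len(X))]
poss = max(min(Q(F), W(F)) for F in subsets)       # MaxF[Q(F) ∧ W(F)]
cert = min(max(Q(F), 1 - W(F)) for F in subsets)   # MinF[Q(F) ∨ W̄(F)]
print(poss, cert)   # 1, 0: it is possible but not certain that all of B solves V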
7 Conclusion
Veristic variables are variables that can assume multiple values, such as the favorite songs of Didier, the books owned by Henri, or John’s hobbies. Different types of statements providing information about veristic variables were described. A methodology, based upon the theory of approximate reasoning, was presented for representing
and manipulating veristic information. We then turned to the role of these veristic variables in databases. We described methods for representing and evaluating queries involving veristic variables.
References 39. T. Andreasen, H. Christianisen and H. L. Larsen, Flexible Query Answering Systems, Kluwer Academic Publishers: Norwell, MA, 1997. 40. G. Antoniou and F. van Harmelen, A Semantic Web Primer, The MIT Press: Cambridge, MA, 2004. 41. W. Bandler and L. Kohout, L., “Fuzzy power sets and fuzzy implication operators,” Fuzzy Sets and Systems 4, 13–30, 1980. 42. T. Berners-Lee, J. Hendler and O. Lassila, “The semantic web,” Scientific America May, 2001. 43. P. Bosc and H. Prade, “An introduction to the fuzzy set and possibility-based treatment of flexible queries and uncertain or imprecise databases,” in Uncertainty Management of Information Systems, edited by Motro, A. and Smets, P., Kluwer Academic Publishers: Norwell, MA, 285–326, 1997. 44. D. Dubois and H. Prade, “Incomplete conjunctive information,” Computational Mathematics Applications 15, 797–810, 1988a. 45. D. Dubois and H. Prade, Possibility Theory : An Approach to Computerized Processing of Uncertainty, Plenum Press: New York, 1988b. 46. D. Dubois and H. Prade, “Fuzzy sets in approximate reasoning Part I: Inference with possibility distributions,” Fuzzy Sets and Systems 40, 143–202, 1991. 47. V. Novak, The Alternative Mathematical Model of Linguistic Semantics and Pragmatics, Plenum Press: New York, 1992. 48. D. Fensel, J. Hendler, H. Lieberman and W. Wahlster, Spinning the Semantic Web, MIT Press: Cambridge, MA, 2003. 49. T. B. Passin, Explorers Guide to the SEmantic Web, Manning: Greenwich, CT, 2004. 50. F. E. Petry, Fuzzy Databases Principles and Applications, Kluwer: Boston, 1996. 51. R. Reiter, “On closed world data bases,” in Logic and Data Bases, Gallaire, H., & Minker, J. (eds.), New York: Plenum, 119–140, 1978. 52. E. Sanchez, Fuzzy Logic and the Semantic Web, Elsevier: Amsterdam, 2006. 53. R. Vandenberghe, A. Van Schooten, R. De Caluwe and E. E. Kerre, “Some practical aspects of fuzzy database techniques; An example,” Information Sciences 14, 465–472, 1989. 54. R. R. Yager, “Some questions related to linguistic variables,” Busefal 10, 54–65, 1982. 55. R. R. Yager, “On different classes of linguistic variables defined via fuzzy subsets,” Kybernetes 13, 103–110, 1984. 56. R. R. Yager, “Set based representations of conjunctive and disjunctive knowledge,” Information Sciences 41, 1–22, 1987a. 57. R. R. Yager, “Toward a theory of conjunctive variables,” Int. J. of General Systems 13, 203– 227, 1987b. 58. R. R. Yager, “Reasoning with conjunctive knowledge,” Fuzzy Sets and Systems 28, 69–83, 1988. 59. R. R. Yager, “Veristic variables,” IEEE Transactions on Systems, Man and Cybernetics Part B: Cybernetics 30, 71–84, 2000. 60. R. R. Yager, “Querying databases containing multivalued attributes using veristic variables,” Fuzzy Sets and Systems 129, 163–185, 2002. 61. R. R. Yager and D. P. Filev, Essentials of Fuzzy Modeling and Control, John Wiley: New York, 1994. 62. L. A. Zadeh, “Fuzzy sets,” Information and Control 8, 338–353, 1965. 63. L. A. Zadeh, “Fuzzy sets as a basis for a theory of possibility,” Fuzzy Sets and Systems 1,
3–28, 1978. 64. L. A. Zadeh, “A theory of approximate reasoning,” in Machine Intelligence, Vol. 9, edited by Hayes, J., Michie, D. and Mikulich, L. I., Halstead Press: New York, 149–194, 1979a. 65. L. A. Zadeh, “Fuzzy sets and information granularity,” in Advances in Fuzzy Set Theory and Applications, edited by Gupta, M. M., Ragade, R. K. and Yager, R. R., North-Holland: Amsterdam, 3–18, 1979b. 66. L. A. Zadeh, “Fuzzy logic = computing with words,” IEEE Transactions on Fuzzy Systems 4, 103–111, 1996. 67. L. A. Zadeh, “Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic,” Fuzzy Sets and Systems 90, 111–127, 1997. 68. L. A. Zadeh, “Toward a generalized theory of uncertainty (GTU)-An outline,” Information Sciences 172, 1–40, 2005. 69. M. Zemankova and A. Kandel, Fuzzy Relational Data Bases – A Key to Expert Systems, Verlag TUV Rheinland: Cologne, 1984.
Uncertainty in Computational Perception and Cognition Ashu M. G. Solo and Madan M. Gupta
Abstract Humans often mimic nature in the development of new machines or systems. The human brain, particularly its faculty for perception and cognition, is the most intriguing model for developing intelligent systems. Human cognitive processes have a great tolerance for imprecision or uncertainty. This is of great value in solving many engineering problems as there are innumerable uncertainties in real-world phenomena. These uncertainties can be broadly classified as either uncertainties arising from the random behavior of physical processes or uncertainties arising from human perception and cognition processes. Statistical theory can be used to model the former, but lacks the sophistication to process the latter. The theory of fuzzy logic has proven to be very effective in processing the latter. The methodology of computing with words and the computational theory of perceptions are branches of fuzzy logic that deal with the manipulation of words that act as labels for perceptions expressed in natural language propositions. New computing methods based on fuzzy logic can lead to greater adaptability, tractability, robustness, a lower cost solution, and better rapport with reality in the development of intelligent systems.
1 Introduction For a long time, engineers and scientists have learned from nature and tried to mimic some of the capabilities observed in humans and animals in electrical and mechanical machines. The Wright brothers started their work on the first airplane by studying the flight of birds. Most scientists of the time thought that it was the
Ashu M. G. Solo Maverick Technologies America Inc., Suite 808, 1220 North Market Street, Wilmington, Delaware, U.S.A. 19801 e-mail: [email protected] Madan M. Gupta Intelligent Systems Research Laboratory College of Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada S7N 5A9 e-mail: [email protected]
flapping of wings that was the principle component of flying. However, the Wright brothers realized that wings were required to increase the buoyancy in air. In biomedical engineering, the principles of natural science and engineering are applied to the benefit of the health sciences. The opposite approach, reverse biological engineering, is used to apply biological principles to the solution of engineering and scientific problems. In particular, engineers and scientists use this reverse engineering approach on humans and animals in developing intelligent systems. The principle attributes of a human being can be classified in three categories (3 H’s): hands, head, and heart. The hands category refers to the physical attributes of humans. These physical attributes have been somewhat mimicked and somewhat improved on to surpass the restrictive physical limitations of humans through such mighty machines as the tractor, assembly line, and aircraft. The head category refers to the perception and cognition abilities of the brain. The restrictive reasoning limitations of humans have been surpassed through the ongoing development of microprocessors. However, the challenge of creating an intelligent system is still in its incipient stages. Finally, the heart category refers to emotions. Emotions have not yet been mimicked in machines. It is debatable whether or not they can be replicated and whether or not replicating them is even beneficial if they could be replicated. One of the most exciting engineering endeavors involves the effort to mimic human intelligence. Intelligence implies the ability to comprehend, reason, memorize, learn, adapt, and create. It is often said that everybody makes mistakes, but an intelligent person learns from his mistakes and avoids repeating them. This fact of life emphasizes the importance of comprehension, reasoning, learning, and the ability to improve one’s performance autonomously in the definition of intelligence. There are essentially two computational systems: carbon-based organic brains, which have existed in humans and animals since their inception, and silicon-based computers, which have rapidly evolved over the latter half of the twentieth century and beyond. Technological advances in recent decades have made it possible to develop computers that are extremely fast and efficient for numerical computations. However, these computers lack the abilities of humans and animals in processing cognitive information acquired by natural sensors. For example, the human brain routinely performs tasks like recognizing a face in an unfamiliar crowd in 100–200 ms whereas a computer can take days to perform a task of lesser complexity. While the information perceived through natural sensors in humans is not numerical, the brain can process such cognitive information efficiently and cause the human to act on it accordingly. Modern day computers fail miserably in processing such cognitive information. This leads engineers to wonder if some of the functions and attributes of the human sensory system, cognitive processor, and motor neurons can be emulated in an intelligent system. For such an emulation process, it is necessary to understand the biological and physiological functions of the brain. Hardware can be developed to model aspects of neurons, the principle element of the brain. Similarly, new theories and methodologies can be developed to model the human thinking process. Soft computing refers to a group of methodologies used in the design of intelligent systems. 
Each of these soft computing methodologies can be combined with
other soft computing methodologies to utilize the benefits of each. Soft computing is composed of the constituent methodologies of fuzzy logic, neuro-computing, evolutionary computing, chaotic computing, probabilistic computing, and parts of machine learning theory. Fuzzy logic includes fuzzy mathematics, linguistic variables, fuzzy rule bases, computing with words, and the computational theory of perceptions. Neuro-computing includes static and dynamic neural networks. Evolutionary computing includes evolutionary algorithms and genetic algorithms. Chaotic computing includes Fractal geometry and chaos theory. Probabilistic computing includes Bayesian networks and decision analysis. Some of these soft computing methodologies can be combined to create hybrid soft computing methodologies such as neuro-fuzzy systems, fuzzy neural networks, fuzzy genetic algorithms, neuro-genetic algorithms, fuzzy neuro-genetic algorithms, and fuzzy chaos. The synergy of different soft computing methodologies has been used in the creation of intelligent systems with greater adaptability, tractability, robustness, a lower cost solution, and better rapport with reality. Many advances have been made in mimicking human cognitive abilities. These advances were mostly inspired by certain biological aspects of the human brain. One of the intriguing aspects of human perception and cognition is its tolerance for imprecision and uncertainty [1, 2, 3, 4, 5, 6, 7], which characterize most real-world phenomena.
2 Certainty and Precision
The pursuit of excessive precision and certainty in engineering and scientific research and development has often produced unrealizable solutions. Certainty and precision have much too often become an absolute standard in design, decision making, and control problems. This tradition can be seen in the following statement [8] by Lord Kelvin in 1883:
In physical science, the first essential step in the direction of learning any subject is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it. I often say that when you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge but you have scarcely, in your thoughts, advanced to the state of science, whatever the matter may be.
One of the fundamental aims in science and engineering has been to move from perceptions to measurements in observations, analysis, and decision making. Through the methodology of precise measurements, engineers and scientists have had many remarkable accomplishments. These include putting people on the moon and returning them safely to Earth, sending spacecraft to the far reaches of the solar system, sending rovers to explore the surface of Mars, exploring the oceans depths, designing computers that can perform billions of computations per second, developing the nuclear bomb, mapping the human genome, and constructing a scanning tunneling microscope that can move individual atoms. However, the path
of precision, as manifested in the theories of determinism and stochasticism, has often caused engineers to be ineffectual and powerless as well as lose scientific creativity.
3 Uncertainty and Imprecision in Perception and Cognition The attribute of certainty or precision does not exist in human perception and cognition. Alongside many startling achievements using the methodology of precise measurements, there have been many abysmal failures that include modeling the behavior of economic, political, social, physical, and biological systems. Engineers have been unable to develop technology that can decipher sloppy handwriting, recognize oral speech as well as a human can, translate between languages as well as a human interpreter, drive a car in heavy traffic, walk with the agility of a human or animal, replace the combat infantry soldier, determine the veracity of a statement by a human subject with an acceptable degree of accuracy, replace judges and juries, summarize a complicated document, and explain poetry or song lyrics. Underlying these failures is the inability to manipulate imprecise perceptions instead of precise measurements in computing methodologies. Albert Einstein wrote, “So far as the laws of mathematics refer to reality, they are not certain. And so far as they are certain, they do not refer to reality [9].” There are various types of uncertainty. However, they can be classified under two broad categories [1]: type one uncertainty and type two uncertainty.
3.1 Type One Uncertainty Type one uncertainty deals with information that arises from the random behavior of physical systems. The pervasiveness of this type of uncertainty can be witnessed in random vibrations of a machine, random fluctuations of electrons in a magnetic field, diffusion of gases in a thermal field, random electrical activities of cardiac muscles, uncertain fluctuations in the weather pattern, and turbulent blood flow through a damaged cardiac valve. Type one uncertainty has been studied for centuries. Complex statistical mathematics has evolved for the characterization and analysis of such random phenomena.
3.2 Type Two Uncertainty Type two uncertainty deals with information or phenomena that arise from human perception and cognition processes or from cognitive information in general. This subject has received relatively little attention. Perception and cognition
through biological sensors (eyes, ears, nose, etc.), perception of pain, and other similar biological events throughout our nervous system and neural networks deserve special attention. The perception phenomena associated with these processes are characterized by many great uncertainties and cannot be described by conventional statistical theory. A person can linguistically express perceptions experienced through the senses, but these perceptions cannot be described using conventional statistical theory. Type two uncertainty and the associated cognitive information involve the activities of neural networks. It may seem strange that such familiar notions have recently become the focus of intense research. However, it is the relative unfamiliarity of these notions and their possible technological applications in intelligent systems that have led engineers and scientists to conduct research in the field of type two uncertainty and its associated cognitive information.
4 Human Perception and Cognition The development of the human cognitive process and environmental perception starts taking shape with the development of imaginative ability in a baby’s brain. A baby in a cradle can recognize a human face long before it is conscious of any visual physical attributes of humans or the environment. Perception and cognition are not precise or certain. Imprecision and uncertainty play an important role in thinking and reasoning throughout a person’s life. For example, when a person is driving a car, his natural sensors (in this case, the eyes and ears) continuously acquire information and relay it through sensory neurons to the cognitive processor (brain), which perceives environmental conditions and stimuli. The cognitive processor synthesizes the control signals and transmits them through the motor neurons to the person’s hands and feet, which activate steering, accelerating, and braking mechanisms in the car. The human brain can instantly perceive and react to a dangerous environmental stimulus, but it might take several seconds to perform a mathematical calculation. Also, the brain has an incredible ability to manipulate perceptions of distance, size, weight, force, speed, temperature, color, shape, appearance, and innumerable other environmental attributes. The human brain is an information processing system and neurocontroller that takes full advantage of parallel distributed hardware, handling thousands of actuators (muscle fibers) in parallel, functioning effectively in the presence of noise and nonlinearity, and capable of optimizing over a long-range planning horizon. It possesses robust attributes of distributed sensors and control mechanisms. The perception process acquires information from the environment through the natural sensory mechanisms of sight, hearing, touch, smell, and taste. This information is integrated and interpreted through the cognitive processor called the brain. Then the cognitive process, by means of a complex network of neurons distributed in the central nervous system, goes through learning, recollecting, and reasoning, resulting in appropriate muscular control.
Silicon-based computers can outperform carbon-based computers like the human brain in numerical computations, but perform poorly in tasks such as image processing and voice recognition. It is widely believed that this is partly because of the massive parallelism and tolerance for imprecision and uncertainty embedded in biological neural processes. Therefore, it can be very beneficial to have a greater understanding of the brain in developing intelligent systems.
5 Computational Perception and Cognition
The computational function of a microcomputer is based on Boolean or binary logic. Boolean logic is unable to model human perception and cognition processes. For many years, the conventional wisdom espoused only Boolean logic, as expressed by Willard Van Orman Quine in 1970 [10]:
Let us grant, then, that the deviant can coherently challenge our classical true-false dichotomy. But why should he want to? Reasons over the years have ranged from bad to worse. The worst one is that things are not just black and white; there are gradations. It is hard to believe that this would be seen as counting against classical negation; but irresponsible literature to this effect can be cited.
Bertrand Russell, on the other hand, saw the inherent limitations of traditional logic, as he expressed in 1923 [11]: All traditional logic habitually assumes that precise symbols are being employed. It is therefore not applicable to this terrestrial life, but only to an imagined celestial one. The law of excluded middle [A OR not-A] is true when precise symbols are employed but it is not true when symbols are vague, as, in fact, all symbols are.
Conventional mathematical tools, both deterministic and stochastic, are based on some absolute measures of information. However, the perception and cognition activity of the brain is based on relative grades of information acquired by the human sensory system. Human sensors acquire information in the form of relative grades rather than absolute numbers. Fresh information acquired from the environment using natural sensors and information (experience, knowledge base) is stored in memory. The action of the cognitive process also appears in the form of relative grades. For example, a person driving an automobile perceives the environment in a relatively graded sense and acts accordingly. Thus, cognitive processes are based on graded information. Conventional mathematical methods can describe objects such as circles, ellipses and parabolas, but they are unable to describe mountains, lakes, and cloud formations. One can estimate the volume of snow, the heights of mountains, or the frequencies of vibrating musical strings, but conventional mathematics cannot be used to linguistically describe the feelings associated with human perception. The study of such formless uncertainties has led engineers and scientists to contemplate developing morphology for this amorphous uncertainty. In the past, engineers and scientists have disdained this challenge and have chosen to avoid this gap by devising theories unrelated to perception and cognitive processes.
6 Fuzzy Logic The theory of fuzzy logic [8, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28] is based on the notion of relative graded membership, as inspired by the processes of human perception and cognition. Lotfi A. Zadeh published his first celebrated paper on fuzzy sets [12] in 1965. Since then, fuzzy logic has proven to be a very promising tool for dealing with type two uncertainty. Stochastic theory is only effective in dealing with type one uncertainty. Fuzzy logic can deal with information arising from computational perception and cognition that is uncertain, imprecise, vague, partially true, or without sharp boundaries. Fuzzy logic allows for the inclusion of vague human assessments in computing problems. Also, it provides an effective means for conflict resolution of multiple criteria and better assessment of options. New computing methods based on fuzzy logic can be used in the development of intelligent systems for decision making, identification, recognition, optimization, and control. Measurements are crisp numbers, but perceptions are fuzzy numbers or fuzzy granules, which are groups of objects in which there can be partial membership and the transition of a membership function is gradual, not abrupt. A granule is a group of objects put together by similarity, proximity, functionality, or indistinguishability. Fuzzy logic is extremely useful for many people involved in research and development including engineers (electrical, mechanical, civil, chemical, aerospace, agricultural, biomedical, computer, environmental, geological, industrial, mechatronics), mathematicians, computer software developers and researchers, natural scientists (biology, chemistry, earth science, physics), medical researchers, social scientists (economics, management, political science, psychology), public policy analysts, business analysts, jurists, etc. Indeed, the applications of fuzzy logic, once thought to be an obscure mathematical curiosity, can be found in many engineering and scientific works. Fuzzy logic has been used in numerous applications such as facial pattern recognition, washing machines, vacuum cleaners, antiskid braking systems, transmission systems, control of subway systems and unmanned helicopters, knowledge-based systems for multiobjective optimization of power systems, weather forecasting systems, models for new product pricing or project risk assessment, medical diagnosis and treatment plans, and stock trading. This branch of mathematics has instilled new life into scientific disciplines that have been dormant for a long time.
7 Computing with Words and Computational Theory of Perceptions Computing has traditionally been focused on the manipulation of numbers and symbols. On the other hand, the new methodology called computing with words in its initial stages of development is focused on the manipulation of words and propositions taken from natural language. The computational theory of perceptions
is based on the methodology of computing with words [8, 23, 24, 25]. Computing with words and the computational theory of perceptions are subsets of fuzzy logic. Words act as the labels for perceptions, and perceptions are expressed in natural language propositions. The methodology of computing with words is used to translate natural language propositions into a generalized constraint language. In this generalized constraint language, a proposition is expressed as a generalized constraint X isr R, where X is the constrained variable, R is the constraining relation, and isr is a variable copula in which r is a variable that defines the way X is constrained by R. The basic types of constraints are possibilistic, probabilistic, veristic, random set, Pawlak set, fuzzy graph, and usuality. There are four principal reasons for using computing with words in an engineering or scientific problem:
1. The values of variables and/or parameters are not known with sufficient precision or certainty to justify using conventional computing methods.
2. The problem cannot be solved using conventional computing methods.
3. A concept is too difficult to be defined using crisp numerical criteria.
4. There is a tolerance for imprecision or uncertainty that can be exploited to achieve greater adaptability, tractability, robustness, a lower cost solution, or better rapport with reality in the development of intelligent systems.
The computing with words methodology is much more effective and versatile for knowledge representation, perception, and cognition than classical logic systems such as predicate logic, propositional logic, and modal logic, and than artificial intelligence techniques for knowledge representation and natural language processing. The computational theory of perceptions and computing with words will be extremely important branches of fuzzy logic in the future.
8 Examples of Uncertainty Management in Computational Perception and Cognition for Linguistic Evaluations The following examples of uncertainty management in computational perception and cognition for linguistic evaluations illustrate applications of fuzzy sets, linguistic variables, fuzzy math, the computational theory of perceptions, computing with words, and fuzzy rule bases. A database of students’ grades on a test is kept by a professor: Database = Students [Name, Grade] The students are assigned a linguistic grade for their test performance. These linguistic grades are fail, poor, satisfactory, good, excellent, or outstanding. A crisp set is created for the fail linguistic qualifier and a fuzzy set is created for each of these remaining linguistic qualifiers, as shown in Fig. 1. Each of these crisp and fuzzy sets corresponds to a range of numeric grades.
Fig. 1 Fuzzy Sets for Test Grades
The fail crisp set is represented by a Z-shaped membership function that assigns a membership value of 1 to all grades below 49.5% and a membership value of 0 to all grades above 49.5%. The outstanding fuzzy set is represented by an S-shaped membership function that assigns a membership value of 0 to grades below 88%, a membership value increasing from 0 at 88% to 1 at 92%, and a membership value of 1 to grades above 92%. The poor, satisfactory, good, and excellent fuzzy sets are represented by trapezoidal membership functions. The poor fuzzy set assigns unity membership from 49.5%–58%, membership transitioning from 1 at 58% to 0 at 62%, and 0 membership below 49.5% and above 62%. Membership in the satisfactory fuzzy set transitions from 0 at 58% to 1 at 62%, is 1 from 62% through 68%, and transitions from 1 at 68% to 0 at 72%. Membership in the good fuzzy set transitions from 0 at 68% to 1 at 72%, is 1 from 72% through 78%, and transitions from 1 at 78% to 0 at 82%. The excellent fuzzy set assigns unity membership from 82%-88%, membership transitioning from 0 at 78% to 1 at 82%, and membership transitioning from 1 at 88% to 0 at 92%. The absolute values of the slopes is the same for the trailing edge of the poor fuzzy set, the leading edge of the outstanding fuzzy set, and the leading and trailing edges of the satisfactory, good, and excellent fuzzy sets. Thus, membership is not biased toward one fuzzy set over another and if a student’s test grade has membership in two fuzzy sets, the summation of its membership in both of these sets will be 1. For example, for a grade of 60%, there is a membership of 0.5 in the poor fuzzy set and 0.5 in the satisfactory fuzzy set. For a grade of 71%, there is a membership of 0.75 in the good fuzzy set and a membership of 0.25 in the satisfactory fuzzy set. Linguistic qualifiers by themselves can be restrictive in describing fuzzy variables, so linguistic hedges can be used to supplement linguistic qualifiers through numeric transformation. For example, a very hedge can square the initial degree of membership µ in a fuzzy set: µvery good (Student 5) = [µgood (Student 5)]2 Following are some other possible linguistic hedges:
• somewhat
• slightly
• more or less
• quite
• extremely
• above
• below
• fairly
• not
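The trapezoidal grade sets and the squaring effect of the very hedge described above can be sketched in a few lines of code. This is an illustration of my own; the breakpoints are the ones stated in the text, while the function names are invented:

def trapezoid(x, a, b, c, d):
    """Membership rising from 0 at a to 1 at b, flat from b to c, falling to 0 at d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def satisfactory(g):
    return trapezoid(g, 58, 62, 68, 72)

def good(g):
    return trapezoid(g, 68, 72, 78, 82)

def very(mu):
    return mu ** 2          # the "very" hedge squares the membership grade

print(good(71), satisfactory(71))   # 0.75 and 0.25, as in the example above
print(very(good(71)))               # 0.5625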
A fuzzy rule base can be developed to decide what to do with individual students based on their grades on a test. A sample fuzzy rule base follows: If a student receives a grade of fail on a test, then phone her parents and give her additional homework. If a student receives a grade of poor on a test, then give her additional homework. If a student receives a grade of satisfactory or good or excellent on a test, then do nothing. If a student receives a grade of outstanding on a test, then give her a reward. If a student receives a grade that has membership in two different fuzzy sets, this will potentially assign two different crisp consequents for dealing with the student. For example, if a student receives a grade of 91%, then this grade has membership in two fuzzy sets: 1. excellent with a degree of membership of 0.25 2. outstanding with a degree of membership of 0.75 The maximum (MAX) operator can be used to resolve these conflicting fuzzy antecedents to the fuzzy rule. The MAX operator in fuzzy set theory is a generalization of the MAX operator in crisp set theory. In the following equation, µ represents the degree of membership; x represents the domain value; k1, . . . , kn. represent fuzzy sets in which there is membership for a domain value of x; and k represents the fuzzy set in which there is greatest membership for a domain value of x: µk = MAX{µk1 (x), . . . , µkn (x)} Using the MAX operator, the antecedent used in the fuzzy rules above is outstanding with a degree of membership of 0.75. This results in the student getting a reward. Linguistic grades possibly modified by linguistic hedges can be defuzzified using the center of gravity defuzzification formula to obtain a single crisp representative value. In the following equation for center of gravity defuzzification, µ is the degree of membership, x is the domain value, A is the membership function possibly modified by linguistic hedges, and x 0 is the single crisp representative value:
x0 = ( ∫x µÃ(x) · x dx ) / ( ∫x µÃ(x) dx )
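As a small illustration of the last two steps, here is a sketch of mine (not the authors' code); the discrete sum below approximates the center-of-gravity integral on a sampled domain:

def max_resolve(memberships):
    """Pick the fuzzy set with the greatest membership degree (the MAX operator)."""
    return max(memberships, key=memberships.get)

print(max_resolve({"excellent": 0.25, "outstanding": 0.75}))   # 'outstanding'

def cog_defuzzify(mu, xs):
    """Center of gravity: x0 = sum(mu(x)*x) / sum(mu(x)) over sampled domain values xs."""
    num = sum(mu(x) * x for x in xs)
    den = sum(mu(x) for x in xs)
    return num / den

# e.g. defuzzifying the 'good' set from the earlier sketch on a 0..100 grid
# gives a crisp representative grade near 75:
# cog_defuzzify(good, range(0, 101))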
An instructor can assign linguistic grades to students for a series of tests. Each of these linguistic grades can be defuzzified using the center of gravity defuzzification formula, so a crisp overall numerical average can be calculated. Then this crisp overall grade can be refuzzified for an overall linguistic grade. A numerical average is not always useful in determining overall performance. For example, there could be two critical exams that test two very different criteria, and performance in each of these exams must be considered both individually and together in making a decision on an individual’s advancement. In such a case, a fuzzy rule base can be developed to evaluate an individual’s performance on different exams and make a decision on the individual’s advancement. A sample fuzzy rule base is shown below that evaluates the performance of a soldier on a special forces academic exam and a special forces athletic exam, and then determines the degree of desirability of that soldier for special forces training. The same fuzzy sets shown in Fig. 1 and described above are used for the antecedents of fuzzy rules in this fuzzy rule base. The fuzzy sets shown in Fig. 2 are used for the consequents of fuzzy rules in this fuzzy rule base to describe the degree of desirability of soldiers for special forces training. The sample fuzzy rule base follows: If a soldier receives a grade of fail or poor on the special forces academic exam or a grade of fail or poor on the special forces athletic exam, then he is undesirable for special forces training. If a soldier receives a grade of satisfactory on the special forces academic exam or a grade of satisfactory on the special forces athletic exam, then he has low desirability for special forces training. If a soldier receives a grade of good or excellent on the special forces academic exam and a grade of good or excellent on the special forces athletic exam, then he has medium desirability for special forces training. If a soldier receives a grade of good or excellent on the special forces academic exam and a grade of outstanding on the special forces athletic exam, then he has high desirability for special forces training. If a soldier receives a grade of outstanding on the special forces academic exam and a grade of good or excellent on the special forces athletic exam, then he has high desirability for special forces training. If a soldier receives a grade of outstanding on the special forces academic exam and a grade of outstanding on the special forces athletic exam, then accept him for special forces training. The overall desirability for special forces training of soldiers in a platoon can be determined to evaluate the performance of a platoon leader. To calculate this, the desirability for special forces training of each individual soldier in the platoon is
defuzzified. Then the crisp overall numerical average can be calculated and refuzzified for a single linguistic grade describing the overall desirability for special forces training of soldiers in a platoon.
Fig. 2 Fuzzy Sets for Desirability
The computational theory of perceptions and computing with words can be used to form propositions and queries on linguistic grades. A professor keeps the grades of N students in a database. The name and grade of student i, where i = 1, . . . , N, are indicated by Namei and Gradei, respectively. Consider the following natural language query: What proportion of students received outstanding grades on the test? The answer to this query is given by the sigma count [17]:
Count(outstanding.students/students) = (1/N) Σi µoutstanding(Gradei)
Consider another natural language query: What proportion of students received satisfactory or better grades on the test? The answer to this query is given by the following:
Count((satisfactory.students + good.students + excellent.students + outstanding.students)/students)
= (1/N) Σi [µsatisfactory(Gradei) + µgood(Gradei) + µexcellent(Gradei) + µoutstanding(Gradei)]
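Both proportion queries reduce to averaging membership grades over the class. Here is a tiny sketch of my own; the grades and the shape of the outstanding membership function are invented for illustration:

grades = [45, 61, 66, 74, 85, 90, 93, 95]          # hypothetical class grades

def outstanding(g):                                 # S-shaped: 0 below 88, 1 above 92
    return 0.0 if g <= 88 else 1.0 if g >= 92 else (g - 88) / 4.0

proportion = sum(outstanding(g) for g in grades) / len(grades)
print(proportion)    # sigma count of outstanding.students divided by N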
Consider the natural language proposition: Most students did satisfactory on the test. The membership function of most is defined as in Fig. 3. This natural language proposition can be translated to a generalized constraint language proposition:
(1/N) Σi µsatisfactory(Gradei) is most
Fig. 3 Membership Function of Most
Finally, a fuzzy rule base can be developed to decide whether to make adjustments to students’ test grades. A sample fuzzy rule base follows: If most students receive a grade of fail on the test, then give them a makeup test and take teaching lessons. If most students receive a grade of fail or poor or satisfactory on the test, then boost the grades so most students receive a grade of good or excellent or outstanding on the test. If most students receive a grade of good or excellent or outstanding on the test, then hand back the tests with these grades.
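Evaluating a rule whose antecedent is quantified by most then amounts to pushing a sigma-count proportion through the membership function of most. The sketch below is illustrative only: the shape chosen for most is an assumption (the actual shape is the one in Fig. 3) and the membership degrees are made up:

def most(p):
    """Assumed S-shaped membership for the quantifier 'most' on a proportion p."""
    return 0.0 if p <= 0.5 else 1.0 if p >= 0.8 else (p - 0.5) / 0.3

# degree to which each student's grade is fail, poor, or satisfactory
low_grades = [1.0, 1.0, 0.9, 0.75, 0.5, 0.0, 0.0, 1.0]
proportion = sum(low_grades) / len(low_grades)
firing = most(proportion)     # degree to which the grade-boosting rule applies
print(proportion, firing)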
9 Conclusion Uncertainty is an inherent phenomenon in the universe and in peoples’ lives. To some, it may become a cause of anxiety, but to engineers and scientists it becomes a frontier full of challenges. Engineers and scientists attempt to comprehend the language of this uncertainty through mathematical tools, but these mathematical tools are still incomplete. In the past, studies of cognitive uncertainty and cognitive information were hindered by the lack of suitable tools for modeling such information. However, fuzzy logic, neural networks, and other methods have made it possible to expand studies in this field.
Progress in information technology has significantly broadened the capabilities and applications of computers. Today’s computers are merely being used for the storage and processing of binary information. The functions of these computing tools should be reexamined in view of an increasing interest in intelligent systems for solving problems related to decision making, identification, recognition, optimization, and control [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41]. Shannon’s definition of information was based on certain physical measurements of random activities in systems, particularly communication channels [42]. This definition of information was restricted only to a class of information arising from physical systems. To emulate cognitive functions such as learning, remembering, reasoning, and perceiving in intelligent systems, this definition of information must be generalized and new mathematical tools and hardware must be developed that can deal with the simulation and processing of cognitive information.
References 1. M. M. Gupta, “Cognition, Perception and Uncertainty,” in Fuzzy Logic in Knowledge-Based Systems, Decision and Control, M. M. Gupta and T. Yamakawa, eds., New York: NorthHolland, 1988, pp. 3–6. 2. M. M. Gupta, “On Cognitive Computing: Perspectives,” in Fuzzy Computting: Theory, Hardware, and Applications, M. M. Gupta and T. Yamakawa, eds., New York: North-Holland, 1988, pp. 7–10. 3. M. M. Gupta, “Uncertainty and Information: The Emerging Paradigms,” International Journal of Neuro and Mass-Parallel Computing and Information Systems, vol. 2, 1991, pp. 65–70. 4. M. M. Gupta, “Intelligence, Uncertainty and Information,” in Analysis and Management of Uncertainty: Theory and Applications, B. M. Ayyub, M. M. Gupta, and L. N. Kanal, eds., New York: North-Holland, 1992, pp. 3–12. 5. G. J. Klir, “Where Do We Stand on Measures of Uncertainty, Ambiguity, Fuzziness and the Like,” Fuzzy Sets and Systems, Special Issue on Measure of Uncertainty, vol. 24, no. 2, November 1987, pp. 141–160. 6. G. J. Klir, “The Many Faces of Uncertainty,” in Uncertainty Modelling and Analysis: Theory and Applications, B. M. Ayyub and M. M. Gupta, eds., New York: North-Holland, 1994, pp. 3–19. 7. M. Black, “Vagueness: An Exercise in Logical Analysis,” Philosophy of Science, no. 4, 1937, pp. 427–455. 8. L. A. Zadeh, “From Computing with Numbers to Computing with Words—From Manipulation of Measurements to Manipulation of Perceptions,” IEEE Transactions on Circuits and Systems—I: Fundamental Theory and Applications, vol. 46, no. 1, 1999, pp. 105–119. 9. A. Einstein, “Geometry and Experience,” in The Principle of Relativity: A Collection of Original Papers on the Special and General Theory of Relativity, New York: Dover, 1952. 10. W. V. O. Quine, Philosophy of Logic, Englewood Cliffs, NJ: Prentice Hall, 1970. 11. B. Russell, “Vagueness,” Australian Journal of Philosophy, vol. 1, 1923. 12. L. A. Zadeh, “Fuzzy Sets,” Information and Control, vol. 8, 1965, pp. 338–353. 13. L. A. Zadeh, “A fuzzy-set-theoretic interpretation of linguistic hedges,” Journal of Cybernetics, vol. 2, 1972, pp. 4–34. 14. L. A. Zadeh, “Outline of a new approach to the analysis of complex system and decision processes,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 3, pp. 28–44. 1973.
15. L. A. Zadeh, “Calculus of fuzzy restrictions,” in Fuzzy Sets and Their Applications to Cognitive and Decision Processes, L. A. Zadeh, K. S. Fu, and M. Shimura, eds., New York: Academic, 1975, pp. 1–39. 16. L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning,” Part I: Information Science, vol. 8, pp. 199–249. Part II: Information Science, vol. 8, pp. 301–357, Part III: Information Science, vol. 9, pp. 43–80, 1975. 17. L. A. Zadeh, “PRUF—A meaning representation language for natural languages,” International Journal of Man-Machine Studies, vol. 10, 1978, pp. 395–460. 18. L. A. Zadeh, “Fuzzy sets and information granularity,” in Advances in Fuzzy Set Theory and Applications, M. M. Gupta, R. Ragade, and R. Yager, eds., Amsterdam: North-Holland, 1979, pp. 3-18. 19. L. A. Zadeh, “A theory of approximate reasoning,” Machine Intelligence, vol. 9, J. Hayes, D. Michie, and L. I. Mikulich, eds., New York: Halstead, 1979, pp. 149–194. 20. L. A. Zadeh, “Outline of a computational approach to meaning and knowledge representation based on the concept of a generalized assignment statement,” Proceedings of the International Seminar on Artificial Intelligence and Man-Machine Systems, 1986, pp. 198–211. 21. L. A. Zadeh, “Fuzzy logic, neural networks, and soft computing,” Communications of the ACM, vol. 37, no. 3, 1994, pp. 77–84. 22. L.A. Zadeh, Fuzzy logic and the calculi of fuzzy rules and fuzzy graphs: A précis,” in Multiple Valued Logic 1, Gordon and Breach Science, 1996, pp. 1–38. 23. L. A. Zadeh, “Fuzzy Logic = Computing with Words,” IEEE Transactions on Fuzzy Systems, vol. 4, 1996, pp. 103-111. 24. L. A. Zadeh, “Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic,” Fuzzy Sets and Systems, vol. 90, pp. 111–127. 1997. 25. L. A. Zadeh, “Outline of a Computational Theory of Perceptions Based on Computing with Words,” in Soft Computing & Intelligent Systems, N. K. Sinha and M. M. Gupta, eds., New York: Academic, 2000, pp. 3–22. 26. B. Kosko, Fuzzy Thinking: The New Science of Fuzzy Logic, New York: Hyperion, 1993. 27. A. Kaufmann and M. M. Gupta, Introduction to Fuzzy Arithmetic: Theory and Applications, New York: Van Nostrand Reinhold, 1985. 28. A. Kaufmann and M. M. Gupta, Fuzzy Mathematical Models in Engineering and Management Science, Amsterdam: North-Holland, 1988. 29. R. J. Sarfi and A. M. G. Solo, “The Application of Fuzzy Logic in a Hybrid Fuzzy KnowledgeBased System for Multiobjective Optimization of Power Distribution System Operations,” Proceedings of the 2005 International Conference on Information and Knowledge Engineerin. (IKE’05), pp. 3–9. 30. M. M. Gupta , G. N. Saridis, and B. R. Gaines, eds., Fuzzy Automata and Decision Processes, Elsevier North-Holland, 1977. 31. M. M. Gupta and E. Sanchez, eds., Approximate Reasoning in Decision Analysis, NorthHolland, 1982. 32. M. M. Gupta and E. Sanchez, eds., Fuzzy Information and Decision Processes, NorthHolland, 1983. 33. M. M. Gupta, A. Kandel, W. Bandler, and J. B. Kiszka, eds., Approximate Reasoning in Expert Systems, North-Holland, 1985. 34. S. Mitra, M. M. Gupta, and W. Kraske, eds., Neural and Fuzzy Systems: The Emerging Science of Intelligent Computing, International Society for Optical Computing (SPIE), 1994. 35. M. M. Gupta and D. H. Rao, eds., Neuro-Control Systems: Theory and Applications, Piscataway, NJ: IEEE, 1994. 36. M. M. Gupta and G. K. Knopf, eds., Neuro-Vision Systems: Principles and Applications, Piscataway, NJ: IEEE, 1994. 37. H. 
Li and M. M. Gupta, eds., Fuzzy Logic and Intelligent Systems, Boston: Kluwer Academic, 1995. 38. B. M. Ayyub and M. M. Gupta, eds., Uncertainty Analysis in Engineering and Sciences: Fuzzy Logic, Statistics and Neural Networks Approach, Boston: Kluwer Academic, 1997.
39. M. M. Gupta and N. K. Sinha, eds., Intelligent Control Systems: Theory and Applications, Piscataway, NJ: IEEE, 1995. 40. N. K. Sinha and M. M. Gupta, eds., Soft Computing and Intelligent Systems, New York: Academic, 2000. 41. M. M. Gupta, L. Jin, and N. Homma, Static and Dynamic Neural Networks: From Fundamentals to Advanced Theory, Hoboken, NJ: John Wiley & Sons, 2003, pp. 633–686. 42. F. M. Reza, An Introduction to Information Theory, New York: McGraw-Hill, 1961, p. 10.
Computational Intelligence for Geosciences and Oil Exploration Masoud Nikravesh
Abstract In this overview paper, we highlight the role of soft computing techniques for intelligent reservoir characterization and exploration, seismic data processing and characterization, well logging, reservoir mapping, and engineering. Reservoir characterization plays a crucial role in modern reservoir management. It helps to make sound reservoir decisions and improves the asset value of oil and gas companies. It maximizes the integration of multi-disciplinary data and knowledge and improves the reliability of reservoir predictions. The ultimate product is a reservoir model with realistic tolerance for imprecision and uncertainty. Soft computing aims to exploit such a tolerance for solving practical problems. In reservoir characterization, these intelligent techniques can be used for uncertainty analysis, risk assessment, data fusion and data mining, which are applicable to feature extraction from seismic attributes, well logging, reservoir mapping and engineering. The main goal is to integrate soft data such as geological data with hard data such as 3D seismic and production data to build a reservoir and stratigraphic model. While some individual methodologies (especially neurocomputing) have gained much popularity during the past few years, the true benefit of soft computing lies in the integration of its constituent methodologies rather than their use in isolation.
1 Introduction
This paper reviews the recent geosciences applications of soft computing (SC) with special emphasis on exploration. The role of soft computing as an effective method of data fusion will be highlighted. SC is a consortium of computing methodologies (Fuzzy Logic (FL), Neuro-Computing (NC), Genetic Computing (GC), and Probabilistic Reasoning (PR), the latter including Genetic Algorithms (GA), Chaotic Systems (CS), Belief Networks (BN), and Learning Theory (LT)) which collectively provide a foundation for the Conception, Design and Deployment of Intelligent Systems. The role
Masoud Nikravesh Berkeley Initiative in Soft Computing (BISC) Program, Computer Science Division, Department of EECS, University of California, Berkeley, CA 94720 e-mail: [email protected]
model for Soft Computing is the Human Mind. Unlike conventional or hard computing, it is tolerant of imprecision, uncertainty and partial truth. It is also tractable, robust, efficient and inexpensive. Among the main components of soft computing, the artificial neural networks, fuzzy logic and genetic algorithms in the “exploration domain” will be examined. Specifically, the earth exploration applications of SC in various aspects will be discussed. We outline the unique roles of the three major methodologies of soft computing – neurocomputing, fuzzy logic and evolutionary computing. We will summarize a number of relevant and documented reservoir characterization applications. We will also provide a list of recommendations for the future use of soft computing. This includes hybrids of various methodologies (e.g. neural-fuzzy or neuro-fuzzy, neural-genetic, fuzzy-genetic and neural-fuzzy-genetic) and the latest tool of “computing with words” (CW) (Zadeh 1996). CW provides a completely new insight into computing with imprecise, qualitative and linguistic phrases and is a potential tool for geological modeling, which is based on words rather than exact numbers.
2 The Role of Soft Computing Techniques for Intelligent Reservoir Characterization and Exploration Soft computing is bound to play a key role in the earth sciences. This is in part due to the subjective nature of the rules governing many physical phenomena in the earth sciences. The uncertainty associated with the data, the immense size of the data to deal with, and the diversity of data types and their associated scales are important factors that call for unconventional mathematical tools such as soft computing. Many of these issues are addressed in recent books (Nikravesh et al. 2002; Wong et al. 2001), recent special issues (Nikravesh et al. 2001a, 2001b; Wong and Nikravesh 2001), and other publications such as Zadeh (1994), Zadeh and Aminzadeh (1995), and Aminzadeh and Jamshidi (1994). Recent applications of soft computing techniques have already begun to enhance our ability to discover new reserves and assist in improved reservoir management and production optimization. This technology has also proven useful for production from low-permeability and fractured reservoirs, such as fractured shale, fractured tight gas reservoirs and reservoirs in deep water or below salt, which contain major portions of future oil and gas resources. Intelligent techniques such as neural computing, fuzzy reasoning, and evolutionary computing for data analysis and interpretation are an increasingly powerful tool for making breakthroughs in science and engineering by transforming data into information and information into knowledge. In the oil and gas industry, these intelligent techniques can be used for uncertainty analysis, risk assessment, data fusion and mining, data analysis and interpretation, and knowledge discovery from diverse data such as 3-D seismic, geological data, well logs, and production data. In addition, these techniques can be a key to cost-effectively locating and producing our remaining oil and gas reserves. These techniques can be used as a tool for:
1) Lowering exploration risk
2) Reducing exploration and production cost
3) Improving recovery through more efficient production
4) Extending the life of producing wells.
In what follows, we will first address data processing, fusion, and mining. Then, we will discuss interpretation, pattern recognition and intelligent data analysis.
2.1 Mining and Fusion of Data In recent years, the oil industry has witnessed a massive explosion in the data volumes we have to deal with. As outlined in Aminzadeh (1996), this is caused by increased sampling rates, larger offsets and longer record acquisition, multi-component surveys, 4-D seismic and, most recently, the possibility of continuous recording in "instrumented oil fields". Thus we need efficient techniques to process such large data volumes. Automated techniques to refine the data (trace editing and filtering), select the desired event types (first break picking) or automate interpretation (horizon tracking) are needed for large data volumes. Fuzzy logic and neural networks have proven to be effective tools for such applications. To make use of large volumes of field data and the multitude of associated data volumes (e.g. different attribute volumes, partial stacks or angle gathers), effective data compression methods will be of increasing significance, both for fast data transmission, efficient processing, analysis and visualization, and economical data storage. Most likely, the biggest impact of advances in data compression techniques will be realized when geoscientists have the ability to fully process and analyze data in the compressed domain. This will make it possible to carry out computer-intensive processing of large volumes of data in a fraction of the time, resulting in tremendous cost reductions. Data mining is another alternative that helps identify the most information-rich part of the large volumes of data. Again, many recent reports have demonstrated that neural networks and fuzzy logic, in combination with some of the more conventional methods such as eigenvalue or principal component analysis, are very useful; a small sketch of such an attribute reduction is given at the end of this subsection. Figure 1 shows the relationship between Intelligent Technology and Data Fusion/Data Mining. Tables 1 and 2 list the Data Mining and Data Fusion techniques. Figure 2 and Table 3 show the Reservoir Data Mining and Reservoir Data Fusion concepts and techniques. Table 4 shows the comparison between geostatistical and intelligent techniques. In the sections that follow, we will highlight some of the recent applications of these methods in various earth sciences disciplines.
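As a minimal sketch of the eigenvalue/principal-component reduction mentioned above, the following NumPy fragment projects a set of seismic attribute vectors onto their leading principal components. The attribute array, the number of retained components and the synthetic data are purely illustrative assumptions, not part of any workflow described in this chapter.

```python
import numpy as np

def pca_compress(attributes, n_components=3):
    """Project attribute vectors onto their leading principal components.

    attributes : (n_samples, n_attributes) array, e.g. one row per trace
                 location and one column per seismic attribute.
    Returns the low-dimensional scores, the component vectors and the
    fraction of total variance carried by each retained component.
    """
    X = attributes - attributes.mean(axis=0)            # center each attribute
    U, s, Vt = np.linalg.svd(X, full_matrices=False)    # economy-size SVD
    scores = X @ Vt[:n_components].T                    # compressed representation
    explained = (s ** 2) / np.sum(s ** 2)               # variance fractions
    return scores, Vt[:n_components], explained[:n_components]

# Illustrative data: 1000 locations, 8 correlated attributes driven by 2 factors
rng = np.random.default_rng(0)
factors = rng.normal(size=(1000, 2))
attrs = factors @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(1000, 8))
scores, comps, frac = pca_compress(attrs, n_components=2)
print("variance captured by two components:", frac.sum().round(3))
```

In practice such low-dimensional scores, rather than the raw attribute volumes, would be passed on to clustering or neural-network analysis.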
2.2 Intelligent Interpretation and Data Analysis For a detailed review of various applications of soft computing in intelligent interpretation, data analysis and pattern recognition, see the recent books by Nikravesh et al.
Fig. 1 Intelligent Technology: the relationship between data fusion and data mining, with regions I (conventional interpretation), II (conventional integration) and III (intelligent characterization)
(2002), Wong et al. (2001), recent special issues, Nikravesh et al. (2001a and 2001b) and Wong and Nikravesh (2001), and other publications such as Aminzadeh (1989), Aminzadeh (1991) and Aminzadeh and Jamshidi (1995). Although seismic signal processing has advanced tremendously over the last four decades, the fundamental assumption of a "convolution model" is violated in many practical settings. Sven Treitel, quoted in Aminzadeh (1995), posed the question: what if Mother Earth refuses to convolve? Among such situations are:
Table 1 Data Mining Techniques
• Deductive Database Client
• Inductive Learning
• Clustering
• Case-based Reasoning
• Visualization
• Statistical Package

Table 2 Data Fusion Techniques
• Deterministic
  – Transform based (projections, . . .)
  – Functional evaluation based (vector quantization, . . .)
  – Correlation based (pattern match, if/then productions)
  – Optimization based (gradient-based, feedback, LDP, . . .)
• Non-deterministic
  – Hypothesis testing (classification, . . .)
  – Statistical estimation (maximum likelihood, . . .)
  – Discrimination function (linear aggregation, . . .)
  – Neural network (supervised learning, clustering, . . .)
  – Fuzzy logic (fuzzy c-means clustering, . . .)
• Hybrid (genetic algorithms, Bayesian networks, . . .)
Fig. 2 Reservoir Data Mining: geological/stratigraphic, seismic, well log, core and test data yield seismic attributes and formation characters, which are linked to reservoir properties through rockphysical, geostatistical and intelligent algorithms
Table 3 Reservoir Data Fusion
• Rockphysical
  – transform seismic data to attributes and reservoir properties
  – formulate seismic/log/core data to reservoir properties
• Geostatistical
  – transform seismic attributes to formation characters
  – transform seismic attributes to reservoir properties
  – simulate the 2D/3D distribution of seismic and log attributes
• Intelligent
  – clustering anomalies in seismic/log data and attributes
  – ANN layers for seismic attributes and formation characters
  – supervised training model to predict unknown from existing
  – hybrid such as GA and SA for complicated reservoirs

Table 4 Geostatistical vs. Intelligent
• Geostatistical
  – Data assumption: a certain probability distribution
  – Model: weight functions come from variogram trend, stratigraphic facies, and probability constraints
  – Simulation: stochastic, not optimized
• Intelligent
  – Data: automatic clustering and expert-guided segmentation
  – Classification of the relationship between data and targets
  – Model: weight functions come from supervised training based on geological and stratigraphic information
  – Simulation: optimized by GA, SA, ANN, and BN
• highly heterogeneous environments,
• very absorptive media (such as unconsolidated sand and young sediments),
• fractured reservoirs,
• mud volcanoes,
• karst, and
• gas chimneys.

In such cases we must consider non-linear processing and interpretation methods. Neural networks, fractals, fuzzy logic, genetic algorithms, chaos and complexity theory are among such non-linear processing and analysis techniques that have proven to be effective. The highly heterogeneous earth model that geophysics attempts to quantify is an ideal place for applying these concepts. The subsurface lives in a hyper-dimensional space (its properties can be considered as additional space dimensions), but its actual response to external stimuli initiates an internal coarse-graining and self-organization that results in a low-dimensional structured behavior. Fuzzy logic and other non-linear methods can describe shapes and structures generated by chaos. These techniques will push the boundaries of seismic resolution, allowing smaller-scale anomalies to be characterized.
2.3 Pattern Recognition Due to recent advances in computer systems and technology, artificial neural networks and fuzzy logic models have been used in many pattern recognition applications ranging from simple character recognition, interpolation, and extrapolation between specific patterns to the most sophisticated robotic applications. To recognize a pattern, one can use the standard multi-layer perceptron with a backpropagation learning algorithm or simpler models such as self-organizing networks (Kohonen, 1997) or fuzzy c-means techniques (Bezdek et al., 1981; Jang and Gulley, 1995). Self-organizing networks and fuzzy c-means techniques can easily learn to recognize the topology, patterns, or seismic objects and their distribution in a specific set of information.
2.4 Clustering Cluster analysis encompasses a number of different classification algorithms that can be used to organize observed data into meaningful structures. For example, k-means is an algorithm to assign a specific number of centers, k, to represent the clustering of N points (k < N). The centers are adjusted iteratively: each point is assigned to its nearest center, and each center is then moved to the mean of the points assigned to it.
Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers in a set of data. The cluster estimates obtained from subtractive clustering can be used to initialize iterative optimization-based clustering methods and model identification methods. In addition, the self-organizing map technique known as Kohonen's self-organizing feature map (Kohonen, 1997) can be used as an alternative for clustering purposes. This technique converts patterns of arbitrary dimensionality (the pattern space) into the response of one- or two-dimensional arrays of neurons (the feature space). This unsupervised learning model can discover any relationship of interest such as patterns, features, correlations, or regularities in the input data, and translate the discovered relationship into outputs. The first application of clustering techniques to combine different seismic attributes was introduced in the mid-eighties (Aminzadeh and Chatterjee, 1984).
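A minimal sketch of the fuzzy c-means idea referred to above (each sample receives a membership grade in every cluster) is given below. The implementation, the fuzzifier value and the synthetic attribute data are illustrative assumptions rather than the algorithms used in any of the cited studies.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means: every sample gets a membership grade in every cluster."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                 # memberships sum to 1 per sample
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = dist ** (-2.0 / (m - 1.0))            # closer centers -> larger membership
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

# Illustrative use: three fuzzy groups in a two-attribute space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
               rng.normal(2.0, 0.3, (100, 2)),
               rng.normal([0.0, 2.0], 0.3, (100, 2))])
centers, U = fuzzy_c_means(X, c=3)
hard_labels = U.argmax(axis=1)                        # crisp assignment if one is needed
```

The membership matrix U, rather than a hard assignment, is what distinguishes fuzzy c-means from k-means and makes it convenient for expressing gradational seismic facies.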
2.5 Data Integration and Reservoir Property Estimation Historically, the link between reservoir properties and seismic and log data has been established either through "statistics-based" or "physics-based" approaches. The latter, also known as model-based approaches, attempt to relate changes in seismic character or seismic attributes to a given reservoir property, based on physical phenomena. Here, the key issues are sensitivity and uniqueness. Statistics-based methods attempt to establish a heuristic relationship between seismic measurements and prediction values from examination of the data only. It can be argued that a hybrid method, combining the strengths of statistics- and physics-based methods, would be most effective. Figure 3, taken from Aminzadeh (1999), shows the concepts schematically.
Fig. 3 A schematic description of physics-based (blue), statistics-based (red) and hybrid (green) methods: data (seismic, log, ...) are related to reservoir properties, with associated uncertainty, through statistical methods (regression, clustering, cross plots, kriging, co-kriging, ANN, ...) and physical methods (rock measurements, synthetic modeling, bright spot, Q, Biot-Gassmann, ...)
Many geophysical analysis methods, and consequently seismic attributes, are based on physical phenomena. That is, based on certain theoretical physics (wave propagation, the Biot-Gassmann equation, the Zoeppritz equations, tuning thickness, shear wave splitting, etc.), certain attributes may be more sensitive to changes in certain reservoir properties. In the absence of a theory, using experimental physics (for example, rock property measurements in a laboratory environment such as the one described in the last section of this paper) and/or numerical modeling, one can identify or validate suspected relationships. Although physics-based methods and direct measurements (the ground truth) are the ideal and reliable way to establish such correlations, for various reasons this is not always practical. Those reasons range from the lack of known theories to differences between the laboratory and field environments (noise, scale, etc.) and the cost of conducting elaborate physical experiments. Statistics-based methods aim at deriving an explicit or implicit heuristic relationship between measured values and the properties to be predicted. Neural-network and fuzzy-neural-network-based methods are ideally suited to establish such implicit relationships through proper training. We all attempt to establish relationships between different seismic attributes, petrophysical measurements, laboratory measurements and different reservoir properties. In such statistics-based methods one has to keep in mind the impact of noise on the data, the data population used for statistical analysis, scale, the geologic environment, and the correlation between different attributes when performing clustering or regressions. The statistics-based conclusions have to be reexamined and their physical significance explored.
2.6 Quantification of Data Uncertainty, Prediction Error and Confidence Interval One of the main problems we face is to handle the non-uniqueness issue and quantify uncertainty and confidence intervals in our analysis. We also need to understand the incremental improvements in prediction error and confidence range from the introduction of new data or a new analysis scheme. Methods such as evidential reasoning and fuzzy logic are well suited for this purpose. Figure 4 shows the distinction between conventional probability and these techniques. A "point probability" describes the probability of an event, for example, having a commercial reservoir; the implication is that we know exactly what this probability is. Evidential reasoning provides an upper bound (plausibility) and a lower bound (credibility) for the event; the difference between the two bounds is considered the ignorance range. Our objective is to reduce this range through use of all the new information. Given that in real life we may have non-rigid boundaries for the upper and lower bounds, and we ramp our confidence in an event up or down at some point, we introduce fuzzy logic to handle this and refer to it as a "membership grade". Next-generation earth modeling will incorporate quantitative representations of geological processes and stratigraphic/structural variability. Uncertainty will be quantified and built into the models.
Fig. 4 Point probability, evidential reasoning and fuzzy logic: membership grade μ plotted against probability, showing a point probability (0.3) bracketed by the credibility (0.2) and plausibility (0.5) bounds, whose separation is the ignorance range (confidence interval)
On the issue of non-uniqueness, the more sensitive a particular seismic character is to a given change in a reservoir property, the easier that property is to predict. The more uniquely a change in seismic character can be tied to a change in a specific reservoir property, the higher the confidence level in such predictions. Fuzzy logic can handle subtle changes in the impact of different reservoir properties on the wavelet response. Moreover, comparison of a multitude of wavelet responses (for example near, mid and far offset wavelets) is easier through the use of neural networks. As discussed in Aminzadeh and de Groot (2001), let us assume that seismic patterns for three different lithologies (sand, shaly sand and shale) are compared from different well information and seismic responses (both model and field data), and that the respective seismic character within the time window or reservoir interval is described with four "classes" of wavelets (w1, w2, w3 and w4). These four wavelets (basis wavelets) serve as a segmentation vehicle. The histograms in Fig. 5a show which classes of wavelets are likely to be present for given lithologies. In the extreme positive (EP) case we would have one wavelet uniquely representing one lithology. In the extreme negative (EN) case we would have a uniform distribution of all wavelets for all lithologies. In most cases, unfortunately, we are closer to EN than to EP. The question is how best we can move these distributions from the EN side to the EP side, thus improving our prediction capability and increasing the confidence level. The common-sense answer is to enhance the information content of the input data. How about using wavelet vectors comprised of pre-stack data (in the simple case, near, mid and far offset data) as the input to a neural network to perform the classification? Intuitively, this should lead to a better separation of different lithologies (or other reservoir properties). Likewise, including three-component data as the input to the classification process would further improve the confidence level. Naturally, this requires the introduction of a new "metric" measuring "the similarity" of these "wavelet vectors". This can be done using the new basis wavelet vectors as input to a neural network, applying different weights to near, mid and far offset traces. This is demonstrated conceptually in Fig. 5 to predict lithology. Compare the sharper histograms of the vector wavelet classification (in this case, near, mid, and far offset gathers) in Fig. 5b against those of Fig. 5a based on scalar wavelet classification.
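The wavelet-segmentation idea described above can be illustrated in a few lines of code: windows of seismic traces at well locations are grouped into a small number of "basis wavelet" classes, and a class histogram is built per lithology. The k-means grouping, window layout and lithology labels below are illustrative assumptions, not the classification scheme of Aminzadeh and de Groot (2001).

```python
import numpy as np

def kmeans(X, k=4, n_iter=50, seed=0):
    """Plain k-means used here to define a small set of 'basis wavelet' classes."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def wavelet_class_histograms(windows, lithologies, k=4):
    """windows: (n, n_samples) seismic segments over the reservoir interval at
    well locations; lithologies: (n,) labels such as 'sand', 'shaly sand', 'shale'.
    Returns, per lithology, the distribution over the k wavelet classes."""
    lithologies = np.asarray(lithologies)
    _, labels = kmeans(windows, k=k)
    hists = {}
    for lith in np.unique(lithologies):
        counts = np.bincount(labels[lithologies == lith], minlength=k)
        hists[lith] = counts / counts.sum()
    return hists
```

The sharper (more peaked) each per-lithology histogram is, the closer the situation is to the EP case and the more confident the lithology prediction; stacking near, mid and far offset windows into one longer feature vector is the "wavelet vector" extension mentioned above.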
Fig. 5 Statistical distribution of different wavelet types versus lithologies: (a) stacked (scalar) data; (b) pre-stack (vector) data
3 Artificial Neural Networks and Geoscience Applications of ANN for Exploration Although artificial neural networks (ANN) were introduced in the late fifties (Rosenblatt, 1962), interest in them has grown rapidly in recent years. This has been in part due to new application fields in academia and industry. Also, advances in computer technology (both hardware and software) have made it possible to develop ANN capable of tackling practically meaningful problems with a reasonable response time. Simply put, neural networks are computer models that attempt to simulate specific functions of the human nervous system. This is accomplished through parallel structures comprised of non-linear processing nodes that are connected by fixed (Lippmann, 1987), variable (Barhen et al., 1989) or fuzzy (Gupta and Ding, 1994) weights. These weights establish a relationship between the inputs and the output of each "neuron" in the ANN. Usually ANN have several "hidden" layers, each comprised of several neurons. Feed-forward (FF, or concurrent) networks are those with unidirectional data flow; the full technical details can be found in Bishop (1995). If an FF network is trained by a back-propagation (BP) algorithm, it is called a BP network. Other types of ANN are unsupervised (self-organizing) and auto- (hetero-) associative networks. Neurocomputing represents general computation with the use of artificial neural networks. An artificial neural network is a computer model that attempts to mimic simple biological learning processes and simulate specific functions of the human nervous system. It is an adaptive, parallel information processing system which is able to develop associations, transformations or mappings between objects or data. It is also the most popular intelligent technique for pattern recognition to date. The major applications of neurocomputing are seismic data processing and interpretation, well logging, and reservoir mapping and engineering. Good quality seismic data is essential for realistic delineation of reservoir structures. Seismic data quality depends largely on the efficiency of data processing. The processing step is time consuming and complex. The major applications include first arrival picking, noise elimination, structural mapping, horizon picking and event tracking. A detailed review can be found in Nikravesh and Aminzadeh (1999).
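For concreteness, a minimal one-hidden-layer feed-forward network trained by plain back-propagation is sketched below in NumPy. The layer sizes, tanh activation and learning rate are illustrative choices and are not tied to any of the networks discussed in this chapter.

```python
import numpy as np

class TinyMLP:
    """One-hidden-layer feed-forward network trained with plain back-propagation."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)     # hidden activations
        return self.h @ self.W2 + self.b2           # linear output layer

    def train_step(self, X, y, lr=0.01):
        y_hat = self.forward(X)
        err = y_hat - y                              # gradient of 0.5*MSE w.r.t. output
        dW2 = self.h.T @ err / len(X)
        db2 = err.mean(axis=0)
        dh = (err @ self.W2.T) * (1 - self.h ** 2)   # back-propagate through tanh
        dW1 = X.T @ dh / len(X)
        db1 = dh.mean(axis=0)
        self.W2 -= lr * dW2; self.b2 -= lr * db2
        self.W1 -= lr * dW1; self.b1 -= lr * db1
        return float((err ** 2).mean())

# Toy usage: fit y = sin(3x) on a one-dimensional input
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 1)); y = np.sin(3 * X)
net = TinyMLP(1, 8, 1)
for _ in range(5000):
    loss = net.train_step(X, y, lr=0.1)
```

The geoscience applications below differ mainly in what is placed on the input side (logs, attributes, trace windows) and what is predicted on the output side.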
For interwell characterization, neural networks have been used to derive reservoir properties from crosswell seismic data. In Chawathé et al. (1997), the authors used a neural network to relate five seismic attributes (amplitude, reflection strength, phase, frequency and quadrature) to gamma ray (GR) logs obtained at two wells in the Sulimar Queen field (Chaves County). The GR response was then predicted between the wells and was subsequently converted to porosity based on a field-specific porosity-GR transform. The results provided good delineation of various lithofacies. Feature extraction from 3D seismic attributes is an extremely important area. Most statistical methods fail due to the inherent complexity and nonlinear information content. Figure 3 shows an example use of neural networks for segmenting seismic characters, thus deducing information on the seismic facies and reservoir properties (lithology, porosity, fluid saturation and sand thickness). A display of the level of confidence (degree of match) between the seismic character at a given point and the representative wavelets (centers of clusters) is also shown. Combining this information with the seismic model derived from the well logs, while perturbing for different properties, gives physical meaning to the different clusters. Monson and Pita (1997) applied neural networks to find relationships between 3D seismic attributes and well logs. The study provided realistic prediction of log responses far away from the wellbore. Boadu (1997) also used similar technology to relate seismic attributes to rock properties for sandstones. In Nikravesh et al., the authors applied a combination of k-means clustering, neural networks and fuzzy c-means (a clustering algorithm in which each data vector belongs to each of the clusters to a degree specified by a membership grade) to characterize a field that produces from the Ellenburger Dolomite. The techniques were used to perform clustering of 3D seismic attributes and to establish relationships between the clusters and the production log, so that the production log could be estimated away from the wellbore. The production log and the clusters were then superimposed at each point of a 3D seismic cube. They also identified the optimum locations for new wells based on the connectivity, size and shape of the clusters related to the pay zones. The use of neural networks in well logging has been popular for nearly a decade. Many successful applications have been documented (Wong et al., 1998; Bruce et al., 1999; Wong et al., 2000). The most recent work by Bruce et al. (2000) presented a state-of-the-art review of the use of neural networks for predicting permeability from well logs. In this application, the network is used as a nonlinear regression tool to develop a transformation between well logs and core permeability. Such a transformation can be used for estimating permeability in uncored intervals and wells. In this work, the permeability profile was predicted by a Bayesian neural network. The network was trained with a training set of four well logs (GR, NPHI, RHOB and RT) and core permeability. The network also provided a measure of confidence (the standard deviation of a Gaussian function): the higher the standard deviation ("sigma"), the lower the prediction reliability. This is very useful for understanding the risk of data extrapolation. The same tool can be applied to estimate porosity and fluid saturations.
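A much simplified stand-in for the log-to-permeability mapping described above is sketched below: an ordinary feed-forward regressor from scikit-learn trained on hypothetical GR, NPHI, RHOB and RT values against core permeability. Unlike the Bayesian network in the cited work, this sketch provides no confidence estimate; the synthetic data and the network size are assumptions for illustration only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical training table: one row per cored depth,
# columns GR, NPHI, RHOB, RT and a synthetic core permeability (mD).
rng = np.random.default_rng(1)
logs = rng.normal(size=(300, 4))
perm = 10 ** (0.8 * logs[:, 1] - 0.5 * logs[:, 2] + 0.1 * rng.normal(size=300))

scaler = StandardScaler().fit(logs)
model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
model.fit(scaler.transform(logs), np.log10(perm))     # regress log-permeability

# Apply to an uncored interval (here simply new synthetic log readings)
new_logs = rng.normal(size=(50, 4))
k_pred = 10 ** model.predict(scaler.transform(new_logs))
```

Working in log10(permeability) is a common choice because permeability spans several orders of magnitude; the cited Bayesian approach would additionally report a per-sample "sigma" quantifying extrapolation risk.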
Another important application is the clustering of well logs for the recognition of lithofacies (Rogers et al., 1992). This provides useful information for improved petrophysical estimates and well correlation.
Neurocomputing has also been applied to reservoir mapping. In Wong et al. (1997) and Wang et al. (1998, 1999), the authors applied a radial basis function neural network to relate the conceptual distribution of geological facies (in the form of hand drawings) to reservoir porosity. It is able to incorporate the general property trend provided by local geological knowledge and to simulate fine-scaled details when used in conjunction with geostatistical simulation techniques. In Caers (1999) and Caers and Journel (1998), the authors trained a neural network to recognize the local conditional probability based on multiple-point information retrieved from a "training image," which can be any densely populated image (e.g. outcrop data, photographs, hand drawings, seismic data, etc.). The conditional probability was used in stochastic simulation with a Markov chain sampler (e.g. Markov chain Monte Carlo). These methodologies can be applied to produce 3D models of petrophysical properties using multiple seismic attributes and conceptual geological maps. This is a significant advantage compared to conventional geostatistical methods, which are limited to two-point statistics (e.g. variograms) and simple objects (e.g. channels). In Nikravesh et al. (1996), the authors used neural networks to predict field performance and optimize oil recovery by water injection in the Lost Hills Diatomite (Kern County). They constructed several neural networks to model individual well behavior (wellhead pressure and injection-production history) based on data obtained from the surrounding wells. The trained networks were used to predict future fluid production. The results matched well with the actual data. The study also led to the best oil recovery with the minimum water injected. In what follows we will review the geoscience applications of ANN in two broad areas: data processing and prediction. We will not address other geoscience applications such as classification of multi-source remote sensing data (Benediktsson et al., 1990), earthquake prediction (Aminzadeh et al., 1994), and ground water remediation (Johnson and Rogers, 1975).
3.1 Data Processing Various types of geoscience data are used in the oil industry to ultimately locate the most prospective locations for oil and gas reservoirs. These data sets go through an extensive amount of processing and manipulation before they are analyzed and interpreted. The processing step is very time consuming yet very important. ANN have been utilized to help improve the efficiency of operations in this step. Under this application area we will examine first seismic arrival picking and noise elimination problems. Also see Aminzadeh (1991), McCormack (1991), Zadeh and Aminzadeh (1995) and Aminzadeh et al. (1999) for other related applications.
3.1.1 First Arrival Picking Seismic data are the response of the earth to any disturbance (compressional waves or shear waves). The seismic source can be generated either artificially (petroleum
seismology, PS) or naturally (earthquake seismology, ES). The recorded seismic data are then processed and analyzed to make an assessment of the subsurface (both the geological structures and rock properties) in PS, and of the nature of the source (location or epicenter and magnitude, for example on the Richter scale) in ES. Conventional PS relies heavily on compressional (P-wave) data while ES is essentially based on shear (S-wave) data. The first seismic arrivals (FSA) of P and S waves on a seismic record contain useful information both in PS and ES. However, one should make sure that the arrival is truly associated with a seismically generated event and not with noise generated by various factors. Since we usually deal with thousands of seismic records, their visual inspection for distinguishing FSA from noise, even if reliable, could be quite time consuming. One of the first geoscience applications of ANN has been to streamline the operation of identifying the FSA in an efficient and reliable manner. Among the recent publications in this area are McCormack (1990) and Veezhinathan et al. (1991). Key elements of the latter (V91) are outlined below. Here, FSA picking is treated as a pattern recognition problem. Each event is classified either as an FSA or a non-FSA. A segment of the data within a window is used to obtain four "Hilbert" attributes of the seismic signal. The Hilbert attributes of seismic data were introduced by Taner et al. (1979). In V91, these attributes are derived from the seismic signal using a sliding time window. Those attributes are: 1) maximum amplitude, 2) mean power level (MPL), 3) power ratio, and 4) envelope slope peak. These types of attributes have been used by Aminzadeh and Chatterjee (1984) for predicting gas sands using clustering and discriminant analysis techniques. In V91, the network processes three adjacent peaks at a time to decide whether the center peak is an FSA or a non-FSA. A BPN with five hidden layers combined with a post-processing scheme achieved 97% correct picks. Adding a fifth attribute, distance from the travel-time curve, generated satisfactory results without the need for the post-processing step. McCormack (1990) created a binary image from the data and used it to train the network to move up and down across the seismic record to identify the FSA. This image-based approach captures space-time information in the data but requires a large number of input units, thus necessitating a large network. Some empirical schemes are used to ensure its stability.
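A rough sketch of the attribute-extraction step is given below: sliding-window features computed from the Hilbert envelope of a trace, in the spirit of the four attributes listed above. The exact attribute definitions, the window length and any downstream classifier are illustrative assumptions and do not reproduce V91.

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_attributes(trace, win=50):
    """Sliding-window attributes of a single seismic trace (illustrative definitions):
    maximum amplitude, mean power level, power ratio (after/before window)
    and envelope-slope peak."""
    env = np.abs(hilbert(trace))                    # instantaneous amplitude (envelope)
    feats = []
    for start in range(0, len(trace) - 2 * win, win):
        before = trace[start:start + win]
        after = trace[start + win:start + 2 * win]
        e = env[start + win:start + 2 * win]
        feats.append([
            np.max(np.abs(after)),                                            # maximum amplitude
            np.mean(after ** 2),                                              # mean power level
            (np.mean(after ** 2) + 1e-12) / (np.mean(before ** 2) + 1e-12),   # power ratio
            np.max(np.diff(e)),                                               # envelope-slope peak
        ])
    return np.asarray(feats)
```

Feature vectors labelled FSA / non-FSA at a few hand-picked records could then be used to train a small classifier, for example the TinyMLP sketched earlier in this section.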
3.1.2 Noise Elimination A problem related to FSA picking is editing noise from the seismic record. The objective here is to identify events of non-seismic origin (the reverse of FSA) and then remove them from the original data in order to increase the signal-to-noise ratio. Liu et al. (1989), McCormack (1990) and Zhang and Li (1994) are some of the publications in this area. Zhang and Li (1994) addressed the simpler problem of editing out whole noisy traces from the record. They initiate the network in the "learning" phase by "scanning" over the whole data set. The weights are adapted in the learning phase either with some human input as the distinguishing factor between "good" and "bad"
traces, or during an unsupervised learning phase. Then, in the "recognizing" phase, the data are scanned again and, depending upon whether the output of the network is less than or greater than a threshold level, the trace is either left alone or edited out as a bad trace.
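The recognize-and-edit step can be summarized in a few lines: given any trained per-trace scoring model, traces whose score exceeds a threshold are removed. The function below is an illustrative sketch; the score function, the threshold and the convention of zeroing bad traces are assumptions, not the scheme of Zhang and Li (1994).

```python
import numpy as np

def edit_noisy_traces(record, score_fn, threshold=0.5):
    """record: (n_traces, n_samples) seismic gather.
    score_fn: any trained model returning a 'badness' score in [0, 1] per trace
    (for example a small network trained on hand-labelled good/bad traces).
    Traces whose score exceeds the threshold are zeroed out (edited)."""
    scores = np.array([score_fn(tr) for tr in record])
    edited = record.copy()
    edited[scores > threshold] = 0.0
    return edited, scores
```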
3.2 Identification and Prediction Another major application area for ANN in the oil industry is the prediction of various reservoir properties. This is ultimately used as a decision tool for exploration and development drilling and for redevelopment or extension of existing fields. The input data to this prediction problem are usually processed and interpreted seismic and log data and/or a set of attributes derived from the original data set. Historically, many "hydrocarbon indicators" have been proposed to make such predictions. Among them are: bright spot analysis (Sheriff and Geldart, 1982), amplitude versus offset analysis (Ostrander, 1982), seismic clustering analysis (Aminzadeh and Chatterjee, 1984), fuzzy pattern recognition (Griffiths, 1987) and other analytical methods (Agterberg and Griffiths, 1991). Many of the ANN developed for this purpose are built around the earlier techniques, either to establish a relationship between the raw data and physical properties of the reservoirs and/or to train the network using the previously established relationships. Huang and Williamson (1994) developed a general regression neural network (GRNN) to predict a rock's total organic carbon (TOC) using well log data. First, they model the relationship between the resistivity log and TOC with a GRNN, using published data. After training the ANN in two different modes, the GRNN found optimum values of sigma, an important smoothing parameter used in GRNN. They established the superiority of GRNN over BP-ANN in determining the architecture of the network. After completing the training phase, a predictive equation for determining TOC was derived. Gogan et al. (1995) used ANN for determining lithology and fluid saturation from well log and pre-stack seismic data. Various seismic attributes from partial stacks (near, mid and far offsets) were used as input to the ANN. The network was calibrated using synthetic (theoretical) data with the pre-stack seismic response of known lithologies and saturations from the well log data. The output of the network was a set of classes of lithologies and saturations.
3.3 Neural Network and Nonlinear Mapping In this section, a series of neural network models will be developed for nonlinear mapping between wireline logs. A series of neural network models will also be developed to analyze actual well-log data and seismic information, and the nonlinear mapping between wireline logs and seismic attributes will be recognized. In this study, wireline logs such as travel time (DT), gamma ray (GR), and density (RHOB) will be predicted based on SP and resistivity (RILD) logs. In addition, we
will predict travel time (DT) based on induction resistivity and vice versa. In this study, all logs are scaled uniformly between -1 and 1 and results are given in the scaled domain. Figures 6A through 6E show the typical behavior of the SP, RILD, DT, GR, and RHOB logs in the scaled domain. The design of a neural network to predict DT, GR, and RHOB based on RILD and SP logs starts with filtering, smoothing, and interpolating values (over a small horizon) for missing information in the data set. A first-order filter and a simple linear recursive parameter estimator were used to filter and reconstruct the noisy data. The available data were divided into three data sets: training, testing, and validation. The network was trained on the training data set and continuously tested on the test data set during the training phase. The network was trained using a backpropagation algorithm and a modified Levenberg-Marquardt optimization technique. Training was stopped when the test-set prediction began to deteriorate with further training steps.
Fig. 6 Typical behavior of SP, RILD, DT, GR, and RHOB logs (panels a-e; scaled values plotted against sampling space, 0-4000)
3.3.1 Travel Time (DT) Prediction Based on SP and Resistivity (RILD) Logs The neural network model to predict DT has 14 input nodes (two windows of data, each with 7 data points) representing the SP (7 points or input nodes) and RILD (7 points or input nodes) logs. The hidden layer has 5 nodes. The output layer has 3 nodes representing the prediction of DT (a window of 3 data points). Typical performance of the neural network for the training, testing, and validation data sets is shown in Figures 7A, 7B, 7C, and 7D. The network shows good performance for prediction of DT for the training, testing, and validation data sets. However, there is not a perfect match between actual and predicted values for DT in the testing and validation data sets. This is due to changes of lithology from point to point. In other words, some of the data points in the testing and validation data sets are in a lithological layer which was not represented in the training phase. Therefore, to have a perfect mapping, it would be necessary to use the layering information (using other types of logs or linguistic information) as input into the network, or to use a larger training data set which represents all the possible behaviors in the data. A sketch of the window construction is given below.
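The sketch below shows how the 14-input / 3-output training pairs described above can be assembled from scaled logs; the function name and the (commented) coupling to the TinyMLP sketch from earlier in Sect. 3 are illustrative assumptions, and the original study used a modified Levenberg-Marquardt scheme rather than plain back-propagation.

```python
import numpy as np

def make_windows(sp, rild, dt, in_win=7, out_win=3):
    """Build training pairs matching the architecture described above:
    two 7-sample input windows (SP and RILD) -> 14 inputs,
    one 3-sample output window of DT -> 3 outputs.
    The logs are assumed to be numpy arrays already scaled to [-1, 1]."""
    half_in, half_out = in_win // 2, out_win // 2
    X, Y = [], []
    for i in range(half_in, len(dt) - half_in):
        x = np.concatenate([sp[i - half_in:i + half_in + 1],
                            rild[i - half_in:i + half_in + 1]])
        y = dt[i - half_out:i + half_out + 1]
        X.append(x)
        Y.append(y)
    return np.asarray(X), np.asarray(Y)

# With the TinyMLP sketched earlier (14 inputs, 5 hidden nodes, 3 outputs):
# net = TinyMLP(14, 5, 3)
# for epoch in range(2000):
#     loss = net.train_step(X_train, Y_train, lr=0.05)
```

The same windowing, with a different target log, applies to the GR and RHOB models described next.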
3.3.2 Gamma Ray (GR) Prediction Based on SP and Resistivity (RILD) Logs In this study, a neural network model is developed to predict GR based on SP and RILD logs. The network has 14 input nodes (two windows of data, each with 7 data points) representing the SP (7 points or input nodes) and RILD (7 points or input nodes) logs. The hidden layer has 5 nodes. The output layer has 3 nodes representing the prediction of GR. Figures 8A through 8D show the performance of the neural network for the training, testing, and validation data. The neural network model shows good performance for prediction of GR for the training, testing, and validation data sets. In comparison with the previous study (DT prediction), this study shows that GR is not as sensitive as DT to noise in the data. In addition, a better global relationship exists between SP-resistivity-GR than between SP-resistivity-DT. However, the local relationship is of the same order of complexity. Therefore, the two models have the same (excellent) performance for training. However, the model for prediction of GR has a better generalization property. Since the two models have been trained based on the same criteria, it is unlikely that this lack of mapping for generalization is due to overfitting during the training phase.
3.3.3 Density (RHOB) Prediction Based on SP and Resistivity (RILD) Logs To predict density based on SP and RILD logs, a neural network model with 14 input nodes representing SP (7 points or input nodes) and RILD (7 points or input nodes) logs, 5 nodes in the hidden layer, and 3 nodes in the output layer representing the prediction of the RHOB is developed. Figures 9A through 9D show a typical performance of the neural network for the training, testing, and validation data sets. The network shows excellent performance for the training data set as shown in Figs. 9A and 9B. The model shows a good performance for the testing data set as shown in
Fig. 7 Typical neural network performance for prediction of DT based on SP and RILD: (a) training data set, actual and predicted DT versus sampling space; (b) training data set, predicted versus actual cross-plot
Fig. 9C. Figure 9D shows the performance of the neural network for the validation data set. The model has relatively good performance for the validation data set. However, there is not a perfect match between the actual and predicted values for RHOB for the testing and validation data sets. Since RHOB is directly related to lithology and layering, to have a perfect mapping it would be necessary to use the layering information (using other types of logs or linguistic information) as an
Fig. 7 (continued): (c) test data set; (d) validation data set, actual and predicted DT versus sampling space
input into the network, or to use a larger training data set which represents all the possible behaviors in the data. In these cases, one can use a knowledge-based approach, using the knowledge of an expert to select more diverse information representing all the different possible layerings as the training data set. Alternatively, one can use an automated clustering technique to recognize the important clusters existing in the data and use this information for selecting the training data set.
Fig. 8 Typical neural network performance for prediction of GR based on SP and RILD: (a) training data set, actual and predicted GR versus sampling space; (b) training data set, predicted versus actual cross-plot
3.3.4 Travel Time (DT) Prediction Based on Resistivity (RILD) The neural network model to predict the DT has 11 input nodes representing a RILD log. The hidden layer has 7 nodes. The output layer has 3 nodes representing the prediction of the DT. Using engineering knowledge, a training data set is carefully
Fig. 8 (continued): (c) test data set; (d) validation data set
selected so as to represent all the possible layering existing in the data. The typical performance of the neural network for the training, testing, and validation data sets is shown in Figs. 10A through 10D. As expected, the network has excellent performance for prediction of DT. Even though only RILD logs are used for prediction of DT, the network model has better performance than when SP and RILD logs
Fig. 9 Typical neural network performance for prediction of RHOB based on SP and RILD: (a) training data set, actual and predicted RHOB versus sampling space; (b) training data set, predicted versus actual cross-plot
were used for prediction of DT (compare Figs. 10A through 10D with Figs. 7A through 7D). However, in this study, the knowledge of an expert was used as extra information. This knowledge not only reduced the complexity of the model, but also led to better predictions.
Fig. 9 (continued): (c) test data set; (d) validation data set
3.3.5 Resistivity (RILD) Prediction Based on Travel Time (DT) In this section, to show that the technique presented in the previous section is effective, the performance of the inverse model is tested. The network model has 11 input nodes representing DT, 7 nodes in the hidden layer, and 3 nodes in the output
Fig. 10 Typical neural network performance for prediction of DT based on RILD: (a) training data set, actual and predicted DT versus sampling space; (b) training data set, predicted versus actual cross-plot
layer representing the prediction of RILD. Figures 11A through 11D show the performance of the neural network model for the training, testing, and validation data sets. Figures 11A and 11B show that the neural network has excellent performance for the training data set. Figures 11C and 11D show the performance of
Fig. 10 (continued): (c) test data set; (d) validation data set
the network for the testing and validation data sets. The network shows very good performance for testing and validation purposes. As was mentioned in the previous section, using engineering knowledge, the complexity of the model was reduced and better performance was achieved.
Fig. 11 Typical neural network performance for prediction of RILD based on DT: (a) training data set, actual and predicted RILD versus sampling space; (b) training data set, predicted versus actual cross-plot
In addition, since the network model (prediction of DT from resistivity) and its inverse (prediction of resistivity based on DT) both have very good performance and generalization properties, a one-to-one mapping was achieved. This implies that a good representation of the layering was selected based on the knowledge of an expert.
Fig. 11 (continued): (c) test data set; (d) validation data set
4 Fuzzy Logic In recent years, it has been shown that uncertainty may be due to fuzziness rather than chance. Fuzzy logic is considered appropriate for dealing with the nature of uncertainty in systems and human error, which are not included in current reliability theories. The basic theory of fuzzy sets was first introduced by Zadeh (1965). Unlike classical logic, which is based on crisp sets of "true and false", fuzzy logic views problems as a degree of "truth", or "fuzzy sets of true and false" (Zadeh 1965). Despite the meaning of the word "fuzzy", fuzzy set theory is not one that permits vagueness. It is a methodology that was developed to obtain an approximate solution where the problems are subject to vague description. In addition, it can help engineers and researchers to tackle uncertainty and to handle imprecise information in a complex situation. During the past several years, the successful application of fuzzy logic for solving complex problems subject to uncertainty has greatly increased, and today fuzzy logic plays an important role in various engineering disciplines. In recent years, considerable attention has been devoted to the use of hybrid neural-network/fuzzy-logic approaches as an alternative for pattern recognition, clustering, and statistical and mathematical modeling. It has been shown that neural network models can be used to construct internal models that capture the presence of fuzzy rules. However, determination of the input structure and of the number of membership functions for the inputs has been one of the most important issues of fuzzy modeling. Fuzzy logic provides a completely new way of modeling complex and ill-defined systems. The major concept of fuzzy logic is the use of a linguistic variable, that is, a variable whose values are words or sentences in a natural or synthetic language (a small numerical sketch follows below). This also leads to the use of fuzzy if-then rules, in which the antecedents and consequents are propositions containing linguistic variables. In recent years, fuzzy logic, or more generally fuzzy set theory, has been applied extensively in many reservoir characterization studies. This is mainly due to the fact that reservoir geology is mainly a descriptive science which uses mostly uncertain, imprecise, ambiguous and linguistic information (Bois 1984). Fuzzy set theory has the ability to deal with such information and to combine it with quantitative observations. The applications are many, including seismic and stratigraphic modeling and formation evaluation. Bois (1984) proposed to use fuzzy set theory as a pattern recognition tool to interpret a seismic section. The algorithm produced a synthetic seismic section by convolving a geological model with a representative impulse (by deconvolution or the signature of the source), both of which were of a subjective and fuzzy nature. The synthetic section was then compared with the original seismic section in a fuzzy context. In essence, the algorithm searches for the appropriate geological model from the observed seismic section by an iterative procedure, which is a popular way of solving inverse problems. In Baygun et al. (1985), the authors used fuzzy logic as a classifier for the delineation of geological objects in a mature hydrocarbon reservoir with many wells. The authors showed that fuzzy logic can be used to extract dimensions and orientations of geological bodies, and geologists can use such a technique for reservoir characterization in a practical way which bypasses many tedious steps.
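As a small numerical illustration of a linguistic variable, the sketch below defines three fuzzy terms for "porosity" with triangular membership functions; the term names and breakpoints are purely illustrative assumptions.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership: 0 below a, rising to 1 at b, falling back to 0 at c."""
    return float(np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0))

# Linguistic variable "porosity" with three illustrative terms
porosity_terms = {
    "low":    lambda p: trimf(p, 0.00, 0.05, 0.12),
    "medium": lambda p: trimf(p, 0.08, 0.15, 0.22),
    "high":   lambda p: trimf(p, 0.18, 0.28, 0.40),
}

p = 0.17                                     # a measured porosity value
grades = {term: mu(p) for term, mu in porosity_terms.items()}
# A rule such as "IF porosity is high AND clay content is low THEN permeability is good"
# would combine such grades with min() (fuzzy AND) before aggregation/defuzzification.
print(grades)
```

The point is that a single crisp measurement can simultaneously belong, to different degrees, to several linguistic terms, which is what allows descriptive geological statements to enter a numerical workflow.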
In Nordlund (1996), the author presented a study on dynamic stratigraphic modeling using fuzzy logic. In stratigraphic modeling, it is possible to model several processes simultaneously in space and time. Although many processes can be modeled using conventional mathematics, modeling the change of deposition and erosion on surfaces is often difficult. Formalizing geological knowledge is a difficult exercise as it involves handling several independent and complex parameters. In addition, most information is qualitative and imprecise, which is unacceptable for direct numerical treatment. In the paper, the author showed a successful use of fuzzy rules to model erosion and deposition. The fuzzified variables included the depth of the reservoirs, the distances to source and shore, a sinusoidal sea-level curve, the tectonic subsidence rate and the simulation time with depositional surface. From the study, the author demonstrated that a few (10) fuzzy rules could produce stratigraphies with realistic geometry and facies. In Cuddy (1997), the author applied fuzzy logic to solve a number of petrophysical problems in several North Sea fields. The work included the prediction of lithofacies and permeability from well logs. Lithofacies prediction was based on the use of a possibility value (a Gaussian function with a specific mean and variance) to represent a well log belonging to a certain lithofacies. The lithofacies associated with the highest combined fuzzy possibility (the multiplication of all values) was taken as the most likely lithofacies for that set of logs. A similar methodology was applied to predict permeability by dividing the core permeability values into ten equal bin sizes on a log scale, thereby converting the problem into a classification problem. All the results suggested that the fuzzy approach gave better petrophysical estimates than the conventional techniques. Fang and Chen (1997) also applied fuzzy rules to predict porosity and permeability from five compositional and textural characteristics of sandstone in the Yacheng field (South China Sea). The five input attributes were the relative amounts of rigid grains, ductile grains and detrital matrix, the grain size and the Trask sorting coefficient. All the porosity and permeability data were first divided into a certain number of clusters by fuzzy c-means. The corresponding sandstone characteristics for each cluster were used to generate the fuzzy linguistic rules. Each fuzzy cluster produced one fuzzy if-then rule with five input statements. The formulated rules were then used to make linguistic predictions by combining the individual conclusions from each rule. If a numerical output was desired, a defuzzification algorithm could be used to extract a crisp output from a fuzzy set. The results showed that the fuzzy modeling gave better results compared to those presented in Bloch (1991). In Huang et al. (1999), the authors presented a simple but practical fuzzy interpolator for predicting permeability from well logs on the North West Shelf (offshore Australia). The basic idea was to simulate local fuzzy reasoning. When a new input vector (well logs) was given, the system would select the two training vectors nearest to the new input vector and build a set of piece-wise linear inference rules with the training values, in which the membership value of the training values was one. The study used well log and core data from two wells and the performance was tested at a third well, where actual core data were available for comparison.
Although the accuracy of the permeability predictions at the test well was similar to that
obtained from the authors' previous neural-fuzzy technique, the fuzzy approach was more than 7,000 times faster in terms of CPU time. In Nikravesh and Aminzadeh (2000), the authors applied a neural-fuzzy approach to develop an optimum set of rules for nonlinear mapping between porosity, grain size, clay content, P-wave velocity, P-wave attenuation and permeability. The rules developed from a training set were used to predict permeability in another data set. The prediction performance was very good. The study also showed that the integrated technique discovered clear relationships between P-wave velocity and porosity, and between P-wave attenuation and clay content, which were useful to geophysicists.
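A minimal sketch in the spirit of the two-nearest-neighbour fuzzy interpolator of Huang et al. (1999) is given below; the distance-based weighting is a simplification assumed for illustration and is not the published formulation.

```python
import numpy as np

def fuzzy_interpolate(x_new, X_train, y_train):
    """Predict a target (e.g. permeability) for a new log vector by selecting the
    two nearest training vectors and blending their target values with
    distance-based weights (membership 1 at the training points themselves)."""
    d = np.linalg.norm(X_train - x_new, axis=1)
    i, j = np.argsort(d)[:2]                   # indices of the two nearest neighbours
    if d[i] < 1e-12:                           # exact match: full membership
        return y_train[i]
    w_i, w_j = 1.0 / d[i], 1.0 / d[j]          # the closer point gets the larger weight
    return (w_i * y_train[i] + w_j * y_train[j]) / (w_i + w_j)
```

Because the reasoning is purely local, such an interpolator is extremely cheap to evaluate, which is consistent with the large CPU-time advantage reported above.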
4.1 Geoscience Applications of Fuzzy Logic The uncertain, fuzzy, and linguistic nature of geophysical and geological data makes it a good candidate for interpretation through fuzzy set theory. The main advantage of this technique is in combining quantitative data with qualitative information and subjective observation. The imprecise nature of the information available for interpretation (such as seismic data, wireline logs, geological and lithological data) makes fuzzy set theory an appropriate tool to utilize. For example, Chappaz (1977) and Bois (1983, 1984) proposed to use fuzzy set theory in the interpretation of seismic sections. Bois used fuzzy logic as a pattern recognition tool for seismic interpretation and reservoir analysis. He concluded that fuzzy set theory, in particular, can be used for the interpretation of seismic data, which are imprecise, uncertain, and include human error. He maintained that these types of error and fuzziness cannot be taken into consideration by conventional mathematics, whereas they are well captured by fuzzy set theory. He also concluded that, using fuzzy set theory, one can extract geological information from seismic data, and therefore predict the boundary of a reservoir in which hydrocarbons exist. Baygun et al. (1985) used fuzzy logic as a classifier for the delineation of geological objects in a mature hydrocarbon reservoir with many wells; they showed that fuzzy logic can be used to extract dimensions and orientations of geological bodies, and that the geologist can use such a technique for reservoir characterization in a very quick way, bypassing several tedious steps. H. C. Chen et al. used fuzzy regression analysis to extract parameters for the Archie equation. Bezdek et al. (1981) also reported a series of applications of fuzzy set theory in geostatistical analysis. Tamhane et al. (2002) show how to integrate linguistic descriptions in petroleum reservoirs using fuzzy logic. Many of our geophysical analysis techniques, such as migration, DMO and wave equation modeling, as well as the potential methods (gravity, magnetic, electrical methods), use conventional partial differential wave equations with deterministic coefficients. The same is true for the partial differential equations used in reservoir simulation. For many practical and physical reasons, deterministic values for the coefficients of these PDEs lead to unrealistic results (for example, for medium velocities in seismic wave propagation or fluid flow in the Darcy equation). Stochastic parameters
in these cases can provide us with a more practical characterization. Fuzzy coefficients for PDEs can prove to be even more realistic and easier to parameterize. Today's deterministic processing and interpretation ideas will give way to stochastic methods, even if the industry has to rewrite the book on geophysics. That is, using wave equations with random and fuzzy coefficients to describe subsurface velocities and densities in statistical and membership-grade terms, thereby enabling a better description of wave propagation in the subsurface, particularly when a substantial amount of heterogeneity is present. More generalized applications of geostatistical techniques will emerge, making it possible to introduce risk and uncertainty at the early stages of the seismic data processing and interpretation loop.
5 Neuro-Fuzzy Techniques

In recent years, considerable attention has been devoted to the use of hybrid neural-network/fuzzy-logic approaches as an alternative for pattern recognition, clustering, and statistical and mathematical modeling. It has been shown that neural network models can be used to construct internal models that recognize fuzzy rules. Neuro-fuzzy modeling is a technique for describing the behavior of a system using fuzzy inference rules within a neural network structure. The model has a unique feature in that it can express linguistically the characteristics of a complex nonlinear system. As a part of any future research opportunities, we will use the neuro-fuzzy model originally presented by Sugeno and Yasukawa (1993). The neuro-fuzzy model is characterized by a set of rules. The rules are expressed as follows:

$$R_i:\ \text{if } x_1 \text{ is } A_1^i \text{ and } x_2 \text{ is } A_2^i \ \ldots\ \text{and } x_n \text{ is } A_n^i \quad \text{(Antecedent)}$$
$$\text{then } y^* = f_i(x_1, x_2, \ldots, x_n) \quad \text{(Consequent)} \qquad (1)$$

where $f_i(x_1, x_2, \ldots, x_n)$ can be constant, linear, or a fuzzy set. For the linear case:

$$f_i(x_1, x_2, \ldots, x_n) = a_{i0} + a_{i1} x_1 + a_{i2} x_2 + \ldots + a_{in} x_n \qquad (2)$$

Therefore, the predicted value for output $y$ is given by

$$y = \sum_i \mu_i f_i(x_1, x_2, \ldots, x_n) \Big/ \sum_i \mu_i \qquad (3)$$

with

$$\mu_i = \prod_j A_j^i(x_j) \qquad (4)$$

where $R_i$ is the $i$th rule, $x_j$ are the input variables, $y$ is the output, $A_j^i$ are fuzzy membership functions (fuzzy variables), and $a_{ij}$ are constant values.

As a part of any future research opportunities, we will use the Adaptive Neuro-Fuzzy Inference System (ANFIS) technique originally presented by Jang (1992). The model uses neuro-adaptive learning techniques, which are similar to those of neural networks. Given an input/output data set, the ANFIS can construct a fuzzy inference system (FIS) whose membership function parameters are adjusted using
the back-propagation algorithm or other similar optimization techniques. This allows fuzzy systems to learn from the data they are modeling.
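To make the weighted-average inference of equations (1) through (4) concrete, the following minimal sketch evaluates a Sugeno-type rule base in Python. It is illustrative only: the Gaussian membership parameters and linear consequent coefficients are placeholders, whereas in the studies described here they would be obtained from clustering and from backpropagation/Levenberg-Marquardt training.

```python
import numpy as np

def gaussian_mf(x, center, width):
    """Membership grade A_j^i(x_j) of a Gaussian fuzzy set."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def sugeno_predict(x, centers, widths, coeffs):
    """Weighted-average output of Eqs. (1)-(4).

    x       : (n,) input vector
    centers : (r, n) Gaussian centers, one row per rule
    widths  : (r, n) Gaussian widths
    coeffs  : (r, n+1) linear consequent coefficients [a_i0, a_i1, ..., a_in]
    """
    # Rule firing strengths mu_i = prod_j A_j^i(x_j)            (Eq. 4)
    mu = np.prod(gaussian_mf(x, centers, widths), axis=1)
    # Linear consequents f_i = a_i0 + sum_j a_ij x_j            (Eq. 2)
    f = coeffs[:, 0] + coeffs[:, 1:] @ x
    # Defuzzified prediction y = sum_i mu_i f_i / sum_i mu_i    (Eq. 3)
    return np.sum(mu * f) / np.sum(mu)

# Toy example with 2 rules and 3 inputs (all numbers are placeholders)
x = np.array([0.2, -0.1, 0.4])
centers = np.array([[0.0, 0.0, 0.0], [0.5, -0.5, 0.5]])
widths = np.full((2, 3), 0.3)
coeffs = np.array([[0.1, 1.0, -0.5, 0.2], [0.0, 0.3, 0.8, -0.1]])
print(sugeno_predict(x, centers, widths, coeffs))
```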
5.1 Neural-Fuzzy Model for Rule Extraction

In this section, a neuro-fuzzy model will be developed for model identification and knowledge extraction (rule extraction) purposes. The model is characterized by a set of rules which can be further used to represent the data in the form of linguistic variables; in this situation the fuzzy variables become linguistic variables. The neuro-fuzzy technique is used to implicitly cluster the data while finding the nonlinear mapping. The neuro-fuzzy model developed in this study is an approximate fuzzy model with triangular and Gaussian membership functions, originally presented by Sugeno and Yasukawa (1993). The k-means technique is used for clustering, and the network is trained using a backpropagation algorithm and a modified Levenberg-Marquardt optimization technique.

In this study, the effect of rock parameters and seismic attenuation on permeability will be analyzed based on soft computing techniques and experimental data. The software will use fuzzy logic techniques because the data and our requirements are imperfect. In addition, it will use neural network techniques, since the functional structure of the data is unknown. In particular, the software will be used to group data into important data sets; extract and classify dominant and interesting patterns that exist between these data sets; and discover secondary, tertiary, and higher-order data patterns. The objective of this section is to predict permeability based on grain size, clay content, porosity, P-wave velocity, and P-wave attenuation.
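The clustering step mentioned above can be illustrated with a plain k-means sketch; the cluster centers obtained this way are the kind of quantity that would seed the antecedent fuzzy sets. The data and the number of clusters below are placeholders, not the study's values.

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0):
    """Plain k-means; the resulting centers can seed antecedent fuzzy sets."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest center
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Move each center to the mean of its members
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers, labels

# Toy example: 40 random samples of five scaled inputs, grouped into 7 clusters
rng = np.random.default_rng(1)
samples = rng.uniform(-1, 1, size=(40, 5))
centers, labels = kmeans(samples, k=7)
print(centers.shape)   # (7, 5)
```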
5.2 Prediction of Permeability Based on Porosity, Grain Size, Clay Content, P-wave Velocity, and P-wave Attenuation

In this section, a neural-fuzzy model will be developed for nonlinear mapping and rule extraction (knowledge extraction) between porosity, grain size, clay content, P-wave velocity, P-wave attenuation, and permeability. Figure 12 shows typical data used in this study. Permeability will be predicted based on equations (1) through (4) and rules of the following form:

IF Rock Type = Sandstones [5] AND Porosity = [p1, p2] AND Grain Size = [g1, g2] AND Clay Content = [c1, c2] AND P-Wave Vel. = [pwv1, pwv2] AND P-Wave Att. = [pwa1, pwa2]
THEN Y* = a0 + a1*P + a2*G + a3*C + a4*PWV + a5*PWA.
Fig. 12 Actual data (Boadu 1997): porosity (percent), grain size (micron), clay content (percent), P-wave velocity (m/sec), P-wave attenuation (dB/cm), and permeability (mD) plotted against sample number (40 samples)
where P is porosity (%), G is grain size, C is clay content, PWV is P-wave velocity, PWA is P-wave attenuation, and Y*, the predicted permeability, is equivalent to f in (1). Data are scaled uniformly between -1 and 1 and the results are given in the scaled domain. The available data were divided into three sets: training, testing, and validation. The neuro-fuzzy model was trained on the training data set and continuously tested against the test data set during the training phase. Training was stopped when the model's prediction on the test set began to degrade with continued training. Next, the number of rules was increased by one and training was repeated. Using this procedure, an optimal number of rules was selected. Figures 13A through 13E and Table 5 show typical rules extracted from the data. In Table 5, columns 1 through 5 show the membership function boundaries for porosity, grain size, clay content, P-wave velocity, and P-wave attenuation, respectively. Using the model defined by equations (1) through (4) and the membership functions defined in Figs. 13A through 13E and Table 5, permeability was predicted as shown in Fig. 14A. In this study, 7 rules were identified for prediction of permeability based on porosity, grain size, clay content, P-wave velocity, and P-wave attenuation (Figs. 13A through 13E and Fig. 14A).
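The data preparation just described (uniform scaling to [-1, 1] and a training/testing/validation split) can be sketched as follows; the split fractions and the random stand-in data are illustrative assumptions, not the proportions or data used in the study.

```python
import numpy as np

def scale_to_unit_interval(x):
    """Scale each column linearly to [-1, 1], as done before training."""
    xmin, xmax = x.min(axis=0), x.max(axis=0)
    return 2.0 * (x - xmin) / (xmax - xmin) - 1.0

def split_data(x, y, frac_train=0.6, frac_test=0.2, seed=0):
    """Shuffle and split into training, testing, and validation subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_train = int(frac_train * len(x))
    n_test = int(frac_test * len(x))
    tr, te, va = np.split(idx, [n_train, n_train + n_test])
    return (x[tr], y[tr]), (x[te], y[te]), (x[va], y[va])

# Example with random stand-ins for a 40-sample, 5-input data set
rng = np.random.default_rng(2)
features, permeability = rng.uniform(size=(40, 5)), rng.uniform(size=40)
scaled = scale_to_unit_interval(features)
train, test, valid = split_data(scaled, permeability)
print(len(train[0]), len(test[0]), len(valid[0]))
```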
Fig. 13A Typical rules extracted from data, 7 Rules (Porosity)
Fig. 13B Typical rules extracted from data, 7 Rules (Grain Size)
Fig. 13C Typical rules extracted from data, 7 Rules (Clay Content)
Fig. 13D Typical rules extracted from data, 7 Rules (P-Wave Velocity)
Fig. 13E Typical rules extracted from data, 7 Rules (P-Wave Attenuation)
(Each of Figs. 13A through 13E plots the membership functions of the seven rules against the corresponding scaled variable.)
In addition, 8 rules were identified for prediction of permeability based on porosity, clay content, P-wave velocity, and P-wave attenuation (Fig. 14B). Ten rules were identified for prediction of permeability based on porosity, P-wave velocity, and P-wave attenuation (Fig. 14C). Finally, 6 rules were identified for prediction of permeability based on grain size, clay content, P-wave velocity, and P-wave attenuation (Fig. 14D). The neural network model shows very good performance for prediction of permeability. In this situation, not only was a nonlinear mapping identified between porosity, grain size, clay content, P-wave velocity, P-wave attenuation, and permeability, but the rules existing in the data were also identified. For this case study, our software clustered the parameters as grain size, P-wave velocity/porosity (as confirmed by Fig. 15, since a clear linear relationship exists between these two variables), and P-wave attenuation/clay content (as confirmed by Fig. 16, since an approximate linear relationship exists between these two variables).

Table 5 Boundary of rules extracted from data

Rule  Porosity            Grain Size          Clay Content        P-Wave Velocity     P-Wave Attenuation
1     [-0.4585, -0.3170]  [-0.6501, -0.3604]  [-0.6198, -0.3605]  [-0.0893, 0.2830]   [-0.6460, -0.3480]
2     [0.4208, 0.5415]    [-0.9351, -0.6673]  [0.2101, 0.3068]    [-0.7981, -0.7094]  [0.0572, 0.2008]
3     [-0.3610, -0.1599]  [-0.7866, -0.4923]  [-0.3965, -0.1535]  [-0.0850, 0.1302]   [-0.4406, -0.1571]
4     [-0.2793, -0.0850]  [-0.5670, -0.2908]  [-0.4005, -0.1613]  [-0.1801, 0.0290]   [-0.5113, -0.2439]
5     [-0.3472, -0.1856]  [-0.1558, 0.1629]   [-0.8093, -0.5850]  [0.1447, 0.3037]    [-0.8610, -0.6173]
6     [0.2700, 0.4811]    [-0.8077, -0.5538]  [-0.0001, 0.2087]   [-0.6217, -0.3860]  [-0.1003, 0.1316]
7     [-0.2657, -0.1061]  [0.0274, 0.3488]    [-0.4389, -0.1468]  [-0.1138, 0.1105]   [-0.5570, -0.1945]
Fig. 14A Performance of Neural-Fuzzy model for prediction of permeability [K = f(P, G, C, PWV, PWA)]; predicted vs. actual permeability for 5, 7, and 10 rules
Fig. 14B Performance of Neural-Fuzzy model for prediction of permeability [K = f(P, C, PWV, PWA)]; predicted vs. actual permeability for 4, 6, and 8 rules
Fig. 14C Performance of Neural-Fuzzy model for prediction of permeability [K = f(P, PWV, PWA)]; predicted vs. actual permeability for 3, 5, 7, and 10 rules
Fig. 14D Performance of Neural-Fuzzy model for prediction of permeability [K = f(G, C, PWV, PWA)]; predicted vs. actual permeability for 4, 6, and 8 rules
Fig. 15 Relationship between P-Wave Velocity and Porosity
Fig. 16 Relationship between P-Wave Attenuation and Clay Content
In addition, using the rules extracted, it was shown that P-wave velocity is closely related to porosity and P-wave attenuation is closely related to clay content. Boadu (1997) also indicated that the most influential rock parameter on attenuation is the clay content. In addition, our software ranked the variables in the order grain size, P-wave velocity, P-wave attenuation, and clay content/porosity (since clay content and porosity can be predicted from P-wave velocity and P-wave attenuation).
6 Genetic Algorithms

Evolutionary computing represents computing that uses known mechanisms of evolution as key elements in algorithmic design and implementation. A variety of algorithms have been proposed; they all share a common conceptual base of simulating the evolution of individual structures via processes of parent selection, mutation, crossover, and reproduction. The major one is the genetic algorithm (GA) (Holland, 1975). The genetic algorithm is a stochastic optimization method that simulates the process of natural evolution, following the same principle found in nature: survival of the fittest (Charles Darwin). The GA was first presented by John Holland as academic research; today, however, it has turned out to be one of the most promising approaches for dealing with complex systems, more than could have been imagined for such a relatively modest technique. The GA is applicable to multi-objective optimization and can handle conflicts among objectives; it is therefore robust where multiple solutions exist. In addition, it is highly efficient and easy to use.

Another important feature of the GA is its ability to extract knowledge in terms of fuzzy rules, and it is now widely applied to the discovery of fuzzy rules. However, when the data sets are very large, it is not easy to extract the rules. To overcome such a limitation, a new coding technique based on biological DNA has been presented recently. The DNA coding method and the mechanism of development from artificial DNA are suitable for knowledge extraction from large data sets. The DNA can have many redundant parts, which is important for extraction of knowledge. In addition, this technique allows overlapped representation of genes, has no constraint on crossover points, and the same type of mutation can be applied to every locus. In this technique, the length of the chromosome is variable and it is easy to insert and/or delete any part of the DNA. Today, genetic algorithms can be used in a hierarchical fuzzy model for pattern extraction and to reduce the complexity of neuro-fuzzy models. In addition, the GA can be used to extract the number of membership functions required for each parameter and input variable, and for robust optimization along multidimensional, highly nonlinear, and non-convex search hyper-surfaces.

GAs work by first encoding the parameters of a given estimator as chromosomes (binary or floating-point). This is followed by populating a range of potential solutions. Each chromosome is evaluated by a fitness function. The better parent
solutions are reproduced and the next generation of solutions (children) is generated by applying the genetic operators (crossover and mutation). The children solutions are evaluated and the whole cycle repeats until the best solution is obtained. The methodology is in fact general and can be applied to optimizing parameters in other soft computing techniques, such as neural networks. In Yao (1999), the author gave an extensive review of the use of evolutionary computing in neural networks, with more than 300 references, in three general areas: evolution of connection weights, evolution of neural network architectures, and evolution of learning rules.

Most geoscience applications began in the early 1990s. Gallagher and Sambridge (1994) presented an excellent overview of the use of GAs in seismology. Other applications include geochemical analysis, well logging, and seismic interpretation. Fang et al. first used GAs in petrophysics to predict porosity and permeability from compositional and textural information and the Archie parameters. The same authors later used the same method to map geochemical data into a rock's mineral composition (1996). The performance was much better than the results obtained from linear regression and nonlinear least-squares methods. In Huang et al. (1998), the authors used GAs to optimize the connection weights in a neural network for permeability prediction from well logs. The study showed that the GA-trained networks (neural-genetic model) gave consistently smaller errors than networks trained by the conventional gradient descent algorithm (backpropagation). However, GAs were comparatively slow in convergence. In Huang et al. (2000), the same authors initialized the connection weights in GAs using the weights trained by backpropagation. The technique was also integrated with fuzzy reasoning, which gave a hybrid neural-fuzzy-genetic system (Huang, 1998). This improved the speed of convergence and still obtained better results.

Another important feature of GAs is their capability of extracting fuzzy rules. However, this becomes impractical when the data sets are large. To overcome this, a new encoding technique based on biological DNA has been presented recently. Unlike conventional chromosomes, the length of the chromosome is variable, and it is easy to insert new parts and/or delete redundant parts. In Yashikawa et al. (1998) and Nikravesh et al. (1998), the authors used a hybrid neural-fuzzy-DNA model for knowledge extraction from seismic data, mapping of well logs into seismic data, and reconstruction of porosity based on multi-attribute seismic mapping.
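The generate-evaluate-select-recombine cycle described above can be sketched in a few dozen lines. The following is a minimal illustration, not the implementation used in the cited studies; the fitness function is a placeholder for whatever estimator error one wishes to minimize.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(chromosome):
    """Placeholder fitness: negative squared error against a hidden target."""
    target = np.array([0.2, -0.5, 0.7, 0.1])
    return -np.sum((chromosome - target) ** 2)

def evolve(pop_size=30, n_genes=4, n_generations=100,
           crossover_rate=0.8, mutation_rate=0.1):
    # Floating-point chromosomes encoding the estimator parameters
    population = rng.uniform(-1, 1, size=(pop_size, n_genes))
    for _ in range(n_generations):
        scores = np.array([fitness(c) for c in population])
        # Rank-based parent selection: keep the better half
        parents = population[np.argsort(scores)[::-1][: pop_size // 2]]
        children = []
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            child = a.copy()
            if rng.random() < crossover_rate:            # one-point crossover
                point = rng.integers(1, n_genes)
                child[point:] = b[point:]
            mask = rng.random(n_genes) < mutation_rate   # mutation
            child[mask] += rng.normal(0, 0.1, size=mask.sum())
            children.append(child)
        population = np.array(children)
    return population[np.argmax([fitness(c) for c in population])]

print(evolve())
```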
6.1 Geoscience Applications of Genetic Algorithms

Most applications of the GA in the area of petroleum reservoirs or geoscience are limited to inversion or optimization. In other fields, however, the GA is used as a powerful tool for extraction of knowledge, fuzzy rules, and fuzzy memberships, and in combination with neural networks and fuzzy logic. Recently, Nikravesh et al. (?) proposed to use a neuro-fuzzy-genetic model
for data mining and fusion in the area of geoscience and petroleum reservoirs. In addition, it has been proposed to use a neuro-fuzzy-DNA model for extraction of knowledge from seismic data, mapping of wireline logs into seismic data, and reconstruction of porosity (and permeability, if reliable permeability data exist) based on multi-attribute seismic mapping. Seismic inversion was accomplished using genetic algorithms by Mallick (1999). Potter et al. (1999) used GAs for stratigraphic analysis. For an overview of GAs in exploration problems, see McCormack et al. (1999).
7 Principal Component Analysis and Wavelet

Some of the data fusion and data mining methods used in exploration applications are as follows. First, we need to reduce the space to make the data size more manageable and to reduce the time required for data processing. For this we can use principal component analysis: using the eigenvalues and eigenvectors of the data, we reduce the space domain by choosing the eigenvectors corresponding to the largest eigenvalues. Then, in the eigenvector space, we use the fuzzy k-means or fuzzy c-means technique. For details of the fuzzy c-means algorithm see Cannon et al. (1986); see also Lashgari (1991), Aminzadeh (1989), and Aminzadeh (1994) for the application of fuzzy logic and the fuzzy k-means algorithm to several earth exploration problems.

We can also use wavelets and extract the patterns and wavelets describing different geological settings and the respective rock properties. Using wavelets and a neural network, we can fuse the data for nonlinear modeling. For clustering purposes, we can take the wavelet output and apply fuzzy c-means or fuzzy k-means. To account for uncertainty and see its effect, it is easy to add a distribution to each point, or a weight for the importance of each data point; once we assign a weight to each point, each weight can be made to correspond to a number of points in a volume around that point.

Of course, techniques based on principal component analysis have certain limitations. One limitation arises when the SNR is negative or zero, causing the technique to fail; the reason is the singularity of the variance and covariance matrices. Therefore, an important step is to use a KF or some form of fuzzy set theory for noise reduction and extraction of the signal.
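A minimal sketch of the reduction step described above: form the covariance matrix, keep the eigenvectors associated with the largest eigenvalues, and project the data onto them before clustering. The number of components and the stand-in attribute matrix are illustrative assumptions.

```python
import numpy as np

def pca_reduce(data, n_components):
    """Project data onto the eigenvectors of the covariance matrix
    associated with the largest eigenvalues."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)                     # ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]] # largest first
    return centered @ top                                      # reduced coordinates

# Example: reduce 17 seismic attributes to 3 principal components
rng = np.random.default_rng(3)
attributes = rng.normal(size=(500, 17))
reduced = pca_reduce(attributes, n_components=3)
print(reduced.shape)   # (500, 3)
```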
8 Intelligent Reservoir Characterization In reservoir engineering, it is important to characterize how 3-D seismic information is related to production, lithology, geology, and logs (e.g. porosity, density, gamma ray, etc.) (Boadu 1997; Nikravesh 1998a-b; Nikravesh et al., 1998; Chawathe et al. 1997; Yoshioka et al 1996; Schuelke et al. 1997; Monson and
Pita 1997, Aminzadeh and Chatterjee, 1985). Knowledge of 3-D seismic data will help to reconstruct the 3-D volume of relevant reservoir information away from the well bore. However, data from well logs and 3-D seismic attributes are often difficult to analyze because of their complexity and our limited ability to understand and use the intensive information content of these data. Unfortunately, only linear and simple nonlinear information can be extracted from these data by standard statistical methods such as ordinary Least Squares, Partial Least Squares, and nonlinear Quadratic Partial Least-Squares. However, if a priori information regarding nonlinear input-output mapping is available, these methods become more useful. Simple mathematical models may become inaccurate because several assumptions are made to simplify the models in order to solve the problem. On the other hand, complex models may become inaccurate if additional equations, involving a more or less approximate description of phenomena, are included. In most cases, these models require a number of parameters that are not physically measurable. Neural networks (Hecht-Nielsen 1989) and fuzzy logic (Zadeh 1965) offer a third alternative and have the potential to establish a model from nonlinear, complex, and multi-dimensional data. They have found wide application in analyzing experimental, industrial, and field data (Baldwin et al. 1990; Baldwin et al. 1989; Pezeshk et al. 1996; Rogers et al. 1992; Wong et al. 1995a, 1995b; Nikravesh et al. 1996; Nikravesh and Aminzadeh, 1997). In recent years, the utility of neural network and fuzzy logic analysis has stimulated growing interest among reservoir engineers, geologists, and geophysicists (Nikravesh et al. 1998; Nikravesh 1998a; Nikravesh 1998b; Nikravesh and Aminzadeh 1998; Chawathe et al. 1997; Yoshika et al. 1996; Schuelke et al. 1997; Monson and Pita 1997; Boadu 1997; Klimentos and McCann 1990; Aminzadeh and Katz 1994). Boadu (1997) and Nikravesh et al. (1998) applied artificial neural networks and neuro-fuzzy successfully to find relationships between seismic data and rock properties of sandstone. In a recent study, Nikravesh and Aminzadeh (1999) used an artificial neural network to further analyze data published by Klimentos and McCann (1990) and analyzed by Boadu (1997). It was concluded that to find nonlinear relationships, a neural network model provides better performance than does a multiple linear regression model. Neural network, neuro-fuzzy, and knowledge-based models have been successfully used to model rock properties based on well log databases (Nikravesh, 1998b). Monson and Pita (1997), Chawathe et al. (1997) and Nikravesh (1998b) applied artificial neural networks and neuro-fuzzy techniques successfully to find the relationships between 3-D seismic attributes and well logs and to extrapolate mapping away from the well bore to reconstruct log responses. Adams et al. (1999a and 1999b), Levey et al. (1999), Nikravesh et al. (1999a and 1999b) showed schematically the flow of information and techniques to be used for intelligent reservoir characterization (IRESC) (Fig. 17). The main goal will be to integrate soft data such as geological data with hard data such as 3-D seismic, production data, etc. to build a reservoir and stratigraphic model. Nikravesh et al. 
(1999a and 1999b) developed a new integrated methodology to identify a nonlinear relationship and mapping between 3-D seismic data and production-log data, and the technique was applied to a producing field. This advanced data analysis and interpretation methodology for 3-D seismic and production-log data uses
Fig. 17 Integrated Reservoir Characterization (IRESC): hard data (reservoir engineering, log, seismic, and mechanical well data), soft data (geological data), economic and cost data, and risk assessment feed an inference engine (kernel) that builds the reservoir and stratigraphic models, accessed by the user through a user interface
conventional statistical techniques combined with modern soft-computing techniques. It can be used to predict: 1. mapping between production-log data and seismic data, 2. reservoir connectivity based on multi-attribute analysis, 3. pay zone recognition, and 4. optimum well placement (Fig. 18). Three criteria have been used to select potential locations for infill drilling or recompletion (Nikravesh et al., 1999a and 1999b): 1. continuity of the selected cluster, 2. size and shape of the cluster, and 3. existence of high Production-Index values inside a selected cluster with high Cluster-Index values. Based on these criteria, locations of the new wells were selected, one with high continuity and potential for high production and one with low continuity and potential for low production. The neighboring wells that are already in production confirmed such a prediction (Fig. 18). Although these methodologies have limitations, the usefulness of the techniques will be for fast screening of production zones with reasonable accuracy. This new methodology, combined with techniques presented by Nikravesh (1998a, 1998b), Nikravesh and Aminzadeh (1999), and Nikravesh et al. (1998), can be used to reconstruct well logs such as DT, porosity, density, resistivity, etc. away from the well bore. By doing so, net-pay-zone thickness, reservoir models, and geological representations will be accurately identified. Accurate reservoir characterization through data integration is an essential step in reservoir modeling, management, and production optimization.
8.1 Reservoir Characterization Figure 17 shows schematically the flow of information and techniques to be used for intelligent reservoir characterization (IRESC). The main goal is to integrate soft
Fig. 18 Optimal well placement (Nikravesh et al., 1999a and 1999b): a cluster-index/production-index map of wells Ah through Fh (high, medium, and low production), with proposed new well locations marked as having potential for high or low production
data such as geological data with hard data such as 3-D seismic, production data, etc. to build reservoir and stratigraphic models. In this case study, we analyzed 3-D seismic attributes to find similarity cubes and clusters using three different techniques: 1. k-means, 2. neural network (self-organizing map), and 3. fuzzy c-means. The clusters can be interpreted as lithofacies, homogeneous classes, or similar patterns that exist in the data. The relationship between each cluster and production-log data was recognized around the well bore and the results were used to reconstruct and extrapolate production-log data away from the well bore. The results from clustering were superimposed on the reconstructed production-log data and optimal locations to drill new wells were determined.
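Of the three clustering techniques listed above, fuzzy c-means is sketched below (a standard formulation after Bezdek, with the usual fuzzifier m = 2); it returns soft memberships rather than crisp labels, which is what allows fuzzy cluster boundaries. The stand-in attribute matrix and settings are illustrative, not the study's seismic volumes.

```python
import numpy as np

def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means: returns cluster centers and a membership matrix U
    whose rows sum to one (soft assignments instead of crisp labels)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(data), n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        # Centers are membership-weighted means of the data
        centers = (um.T @ data) / um.sum(axis=0)[:, None]
        # Distances from every sample to every center (small epsilon avoids /0)
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Standard membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        u = 1.0 / np.sum((dist[:, :, None] / dist[:, None, :]) ** (2 / (m - 1)), axis=2)
    return centers, u

# Toy run on random stand-ins for six seismic attributes
rng = np.random.default_rng(4)
attrs = rng.normal(size=(300, 6))
centers, u = fuzzy_c_means(attrs, n_clusters=10)
print(centers.shape, u.sum(axis=1)[:3])   # (10, 6) and rows summing to ~1
```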
8.1.1 Examples

Our examples are from fields that produce from the Ellenburger Group. The Ellenburger is one of the most prolific gas producers in the conterminous United States, with greater than 13 TCF of production from fields in west Texas. The Ellenburger Group was deposited on an Early Ordovician passive margin in shallow subtidal to intertidal environments. Reservoir description indicates the study area is affected by a karst-related, collapsed paleocave system that acts as the primary reservoir in the field studied (Adams et al., 1999; Levey et al., 1999).
8.1.2 Area 1

The 3-D seismic volume used for this study has 3,178,500 data points (Table 6). Two hundred seventy-four well-log data points intersect the seismic traces, and eighty-nine production-log data points are available for analysis (19 production and 70 non-production). A representative subset of the 3-D seismic cube, production-log data, and an area of interest were selected in the training phase for clustering and mapping purposes. The subset (150 samples, with each sample equal to 2 msec of seismic data or approximately 20 feet of Ellenburger dolomite) was designed as a section (670 seismic traces) passing through all the wells as shown in Fig. 19 and has 100,500 (670*150) data points. However, only 34,170 (670*51) data points were selected for clustering purposes, representing the main Ellenburger focus area. This subset covers the horizontal boreholes of producing wells, starts approximately 15 samples (300 feet) above the Ellenburger, and ends 20 samples (400 feet) below the locations of the horizontal wells. In addition, the horizontal wells are present in a 16-sample interval, for a total interval of 51 samples (102 msec or 1020 feet). Table 6 shows typical statistics for this case study. Figure 20 shows a schematic diagram of how the well path intersects the seismic traces.

For clustering and mapping, there are two windows that must be optimized: the seismic window and the well-log window. Optimal numbers of seismic attributes and clusters also need to be determined, depending on the nature of the problem. Figure 21 shows the iterative technique that has been used to select an optimal number of clusters, seismic attributes, and optimal processing windows for the seismic section shown in Fig. 19. Expert knowledge regarding geological parameters has also been used to constrain the maximum number of clusters to be selected. In this study, six attributes were selected (Raw Seismic, Instantaneous Amplitude, Instantaneous Phase, Cosine Instantaneous Phase, Instantaneous Frequency, and Integrated Absolute Amplitude) out of 17 attributes calculated (Table 7). Figures 22 through 27 show typical representations of these attributes in our case study. Ten clusters were recognized, a window of one sample was used as the optimal window size for the seismic, and a window of three samples was used for the production-log data.
Table 6 Typical statistics for main focus area, Area 1, and Ellenburger

Cube: InLine: 163; Xline: 130; Time Samples: 150; Total Number of Points: 3,178,500; For Clustering: 1.08%
Section: Total Number of Traces: 670; Time Samples: 150; Total Number of Points: 100,500; Used for Clustering: 34,170; Section/Cube = 3.16%
Well Data: Total Number of Points: 274; Well Data/Section: 0.80%; Well Data/Cube: 0.009%
Production Data: Total Number of Points: 89; Production: 19; No Production: 70; Production Data/Section: 0.26%; Production Data/Cube: 0.003%
Fig. 19 Seismic section passing through all the wells, Area 1 (wells Ah, Bh1-Bh3, Ch1-Ch2, Cv, Dh, Dv, Eh, and Fh; well paths and section path shown; scale in miles)
Based on qualitative analysis, specific clusters with the potential to be in producing zones were selected. Software was developed to perform the qualitative analysis, running on a personal computer using MatlabTM. Figure 28 shows typical windows and parameters of this software. Clustering was based on three different techniques: k-means (statistical), neural network, and fuzzy c-means clustering.
Fig. 20 Schematic diagram of how the well path intersects the seismic traces (showing the well log window and the seismic window)
Fig. 21 Iterative technique to select an optimal number of clusters, seismic attributes, and optimal processing windows (attributes include raw seismic, amplitude envelope, instantaneous frequency, instantaneous phase, cosine instantaneous phase, integrated absolute amplitude, etc.; clustering is applied to the cube, section, seismic, logs, and seismic/log combinations, with classifications such as Gas/No-Gas, Breccia/No-Breccia, and DOL, LS, SH, CHERT, Others (SS, COAL))
Different techniques recognized different cluster patterns, as shown by the cluster distributions (Figs. 29A through 31). Figures 29A, 30, and 31 show the distribution of clusters in the section passing through the wells as shown in Fig. 19. By comparing the k-means clusters (Fig. 29A) and the neural network clusters (Fig. 30) with the fuzzy c-means clusters (Fig. 31), one can conclude that the neural network predicted a different structure and patterns than did the other techniques.

Table 7 List of the attributes calculated in this study

Attribute No.  Abbrev.   Attribute
1              ampenv    Amplitude Envelope
2              ampwcp    Amplitude Weighted Cosine Phase
3              ampwfr    Amplitude Weighted Frequency
4              ampwph    Amplitude Weighted Phase
5              aprpol    Apparent Polarity
6              avgfre    Average Frequency
7              cosiph    Cosine Instantaneous Phase
8              deriamp   Derivative Instantaneous Amplitude
9              deriv     Derivative
10             domfre    Dominant Frequency
11             insfre    Instantaneous Frequency
12             inspha    Instantaneous Phase
13             intaamp   Integrated Absolute Amplitude
14             integ     Integrate
15             raw       Raw Seismic
16             sdinam    Second Derivative Instantaneous Amplitude
17             secdev    Second Derivative
Fig. 22 Typical time slice of Raw Seismic in Area 1 with Rule 1
Fig. 23 Typical time slice of Amplitude envelope in Area 1 with Rule 1
Fig. 24 Typical time slice of Instantaneous Phase in Area 1 with Rule 1
Fig. 25 Typical time slice of Cosine Instantaneous Phase in Area 1 with Rule 1
Fig. 26 Typical time slice of Instantaneous Frequency in Area 1 with Rule 1
Fig. 27 Typical time slice of Integrated Absolute Amplitude in Area 1 with Rule 1
Fig. 28 MatLabTM software to do the qualitative analysis
Figures 29B and 29C show a typical time-slice from the 3-D seismic cube that has been reconstructed with the extrapolated k-means cluster data. Finally, based on a qualitative analysis, specific clusters that have the potential to include producing zones were selected. Each clustering technique produced two clusters that included most of the production data, and each of these three pairs of clusters is equivalent. To confirm this conclusion, cluster patterns were generated for the section passing through the wells as shown in Fig. 19. Figures 32 through 34 show the two clusters from each technique that correlate with production: clusters one and four from k-means clustering (Fig. 32); clusters one and six from neural network clustering (Fig. 33); and clusters six and ten from fuzzy c-means clustering (Fig. 34). By comparing these three cross sections, one can conclude that, in the present study, all three techniques predicted the same pair of clusters based on the objective of predicting potential producing zones. However, this may not always be the case, because the information that can be extracted by the different techniques may differ. For example, clusters derived using classical techniques have sharp boundaries, whereas those generated using the fuzzy technique have fuzzy boundaries. Based on the clusters recognized in Figures 32 through 34 and the production-log data, a subset of the clusters was selected and assigned as cluster 11, as shown in Figures 35 and 36. In this sub-cluster, the relationship between production-log data and clusters was recognized, and the production-log data were reconstructed and extrapolated away from the well bore. Finally, the production-log data
Fig. 29A Typical k-means distribution of clusters in the section passing through the wells as shown in Fig. 19
Fig. 29B Typical 2-D presentation of time-slice of k-means distribution of clusters
Fig. 29C Typical 3-D presentation of time-slice of k-means distribution of clusters
Fig. 30 Typical neural network distribution of clusters in the section passing through the wells as shown in Fig. 19
Fig. 31 Typical fuzzy c-means distribution of clusters in the section passing through the wells as shown in Fig. 19
cluster data were superimposed at each point in the 3-D seismic cube. Figures 37A and 37B (Fig. 18) show a typical time-slice of a 3-D seismic cube that has been reconstructed with the extrapolated production-log data and cluster data. The color scale in Figs. 37A and 20B is divided into two indices, Cluster Index and Production Index. Criteria used to define Cluster Indices for each point are expressed as a series of dependent IF-THEN statements. Before defining Production Indices for each point within a specified cluster, a new cluster must first be defined based only on the seismic data that represents production-log data with averaged values greater than 0.50. Averaged values are determined by assigning a value to each sample of a horizontal borehole (two feet/sample). Sample intervals that are producing gas are assigned values of one and non-producing sample intervals are assigned values of zero. The optimum window size for production-log data is three samples and the averaged value at any point is the average of the samples in the surrounding window. After the new cluster is determined, a series of IF-THEN statements is used to define the Production Indices. Three criteria have been used to select potential locations for infill drilling or recompletion: 1. continuity of the selected cluster, 2. size and shape of the cluster, and 3. existence of high Production-Index values inside a selected cluster with high Cluster-Index values. Based on these criteria, locations of the new wells were selected and two such locations are shown in Fig. 37B (Fig. 18), one with high continuity and potential for high production and one with low continuity and potential for low production. The neighboring wells that are already in production confirm such a prediction as shown in Fig. 37B (Fig. 18).
Fig. 32 Clusters one and four that correlate with production using the k-means clustering technique on the section passing through the wells as shown in Fig. 19
Fig. 33 Clusters one and six that correlate with production using the neural network clustering technique on the section passing through the wells as shown in Fig. 19
Fig. 34 Clusters six and ten that correlate with production using the fuzzy c-means clustering technique on the section passing through the wells as shown in Fig. 19
Fig. 35 The section passing through all wells showing a typical distribution of clusters generated by combining the three sets of 10 clusters created by each clustering technique (k-means, neural network, and fuzzy c-means). Note that eleven clusters are displayed rather than the ten clusters originally generated. The eleventh cluster is a subset of the three pairs of clusters that contain most of the production
Fig. 36 The section passing through all wells showing only the eleventh cluster, a subset of the three pairs of clusters that contains most of the production (wells Ah and Bh: high production; wells Ch, Dh, and Fh: medium production; well Eh: low production)
Fig. 37A A 2-D time slice showing clusters and production (Cluster Index and Production Index, with areas of potential for low and high production), with areas indicated for optimal well placement
Fig. 37B An oblique view of the 2-D time slice in Fig. 37A showing clusters and production (wells Ah through Fh with high, medium, and low production; Cluster Index and Production Index; new well locations with potential for high or low production), with areas indicated for optimal well placement
9 Fractured Reservoir Characterization

When we are faced with fractured reservoir characterization in particular, an efficient method of data entry, compiling, and preparation becomes important. Not only does the initial model require a considerable amount of data preparation, but subsequent stages of model updating will also require a convenient way to input the new data into the existing data stream. Well log suites provided by the operator will be supplied to the project team; we anticipate a spectrum of resistivity logs, image logs, cuttings, and core where available. A carefully designed data collection phase will provide the necessary input to develop a 3-D model of the reservoir. An optimum number of test wells and training wells needs to be identified. In addition, a new technique needs to be developed to optimize the location and orientation of each new well to be drilled based on data gathered from previous wells. If possible, we want to prevent clustering of too many wells at some locations and under-sampling at other locations, thus maintaining a level of randomness in data acquisition. The data to be collected will depend on the type of fractured reservoir.

The data collected will also provide the statistics to establish the trends, variograms, shape, and distribution of the fractures in order to develop a non-linear and non-parametric statistical model and various possible realizations of this model. For example, one can use stochastic modeling techniques and the Alternating Conditional Expectation (ACE) model developed by Breiman and Friedman [1985] for initial reservoir model prediction. This provides crucial information on the variability of the estimated models. Significant changes from one realization to the other indicate a high level of uncertainty, and thus the need for additional data to reduce the standard
deviation. In addition, one can use our neuro-fuzzy approach to better quantify, and perhaps reduce, the uncertainties in the characterization of the reservoir. Samples from well cuttings (commonly available) and cores (where available) from the focus area can also be analyzed semi-quantitatively by XRD analysis of clay mineralogy to determine vertical variability. Calibration to image logs needs to be performed to correlate fracture density to conventional log signatures and mineralogical analysis.

Based on the data obtained and the statistical representation of the data, an initial 3-D model of the boundaries of the fractures and their distribution can be developed. The model is represented by a multi-valued parameter, which reflects different subsurface properties to be characterized. This parameter is derived through integration of all the input data using a number of conventional statistical approaches. A novel neuro-fuzzy based algorithm that combines the training and learning capabilities of conventional neural networks with the ability of fuzzy logic to incorporate subjective and imprecise information can be refined for this application. Nikravesh [1998a, b] showed the significant superiority of the neuro-fuzzy approach for data integration over conventional methods for characterizing the boundaries. A similar method with minor modifications can be implemented and tested for fractured reservoirs. Based on this information, an initial estimate of the distribution of reservoir properties, including fracture shape and distribution in 2-D and 3-D space, can be predicted. Finally, the reservoir model is used as an input to this step to develop an optimum strategy for management of the reservoir. As data collection continues in the observation wells, the model parameters will be updated using the new data. These models are then continually evaluated and visualized to assess the effectiveness of the production strategy. The wells chosen in the data collection phase will be designed and operated with the aid of an intelligent advisor.
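Among the statistics listed earlier in this section for building the fracture model, the experimental variogram is the most standard; a minimal isotropic sketch is given below, with placeholder well coordinates and fracture-density values.

```python
import numpy as np

def experimental_variogram(coords, values, lag, n_lags):
    """Isotropic experimental semivariogram: gamma(h) is the mean of
    0.5*(z_i - z_j)^2 over pairs whose separation falls in each lag bin."""
    coords, values = np.asarray(coords, float), np.asarray(values, float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    gammas = []
    for k in range(n_lags):
        mask = (d > k * lag) & (d <= (k + 1) * lag)
        gammas.append(sq[mask].mean() if mask.any() else np.nan)
    return np.arange(1, n_lags + 1) * lag, np.array(gammas)

# Placeholder well locations and fracture-density values
rng = np.random.default_rng(5)
xy = rng.uniform(0, 1000, size=(50, 2))
z = rng.normal(size=50)
lags, gamma = experimental_variogram(xy, z, lag=100.0, n_lags=8)
print(np.round(gamma, 3))
```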
10 Future Trends and Conclusions

We have discussed the main areas where soft computing can make a major impact in geophysical, geological, and reservoir engineering applications in the oil industry. These areas include facilitation of automation in data editing and data mining. We also pointed out applications in non-linear signal (geophysical and log data) processing, and better parameterization of wave equations with random or fuzzy coefficients, both in seismic and other geophysical wave propagation equations and in those used in reservoir simulation. Of significant importance is their use in data integration and reservoir property estimation. Finally, quantification and reduction of uncertainty and confidence intervals is possible through more comprehensive use of fuzzy logic and neural networks. The true benefit of soft computing, which is to use the intelligent techniques in combination (hybrid) rather than in isolation, has not yet been demonstrated to its full extent. This section addresses two particular areas for future research: hybrid systems and computing with words.
10.1 Hybrid Systems

So far we have seen the primary roles of neurocomputing, fuzzy logic, and evolutionary computing. Their roles are in fact unique and complementary, and many hybrid systems can be built. For example, fuzzy logic can be used to combine results from several neural networks; GAs can be used to optimize the number of fuzzy rules; linguistic variables can be used to improve the performance of GAs; and fuzzy rules can be extracted from trained neural networks. Although some hybrid systems have been built, this topic has not yet reached maturity and certainly requires more field studies. In order to make full use of soft computing for intelligent reservoir characterization, it is important that the design and implementation of hybrid systems aim to improve prediction and its reliability. At the same time, the improved systems should contain a small number of sensitive user-definable model parameters and use less CPU time. The future development of hybrid systems should incorporate the various disciplinary knowledge of reservoir geoscience and maximize the amount of useful information extracted between data types, so that reliable extrapolation away from the wellbores can be obtained.
10.2 Computing with Words

One of the major difficulties in reservoir characterization is to devise a methodology for integrating qualitative geological descriptions. One simple example is the core descriptions in standard core analysis. These descriptions provide useful and meaningful observations about the geological properties of core samples. They may serve to explain many geological phenomena in well logs, mud logs, and petrophysical properties (porosity, permeability, and fluid saturations). Yet these details are not utilized, due to the lack of a suitable computational tool. Gedeon et al. (1999) provided one of the first attempts to relate these linguistic descriptions (grain size, sorting, matrix, roundness, bioturbation, and lamina) to core porosity levels (very poor, poor, fair, and good) using intelligent techniques. The results were promising and brought the field a step closer to Zadeh's idea of computing with words (Zadeh, 1996).

Computing with words (CW) aims to perform computing with objects which are propositions drawn from a natural language or which have the form of mental perceptions. In essence, it is inspired by the remarkable human capability to manipulate words and perceptions and perform a wide variety of physical and mental tasks without any measurements or computations. It is fundamentally different from traditional expert systems, which are simply tools to "realize" an intelligent system but are not able to process natural language, which is imprecise, uncertain, and partially true. CW has gained much popularity in many engineering disciplines (Zadeh, 1999a and 1999b). In fact, CW plays a pivotal role in fuzzy logic and vice versa. Another aspect of CW is that it also involves a fusion of natural languages and computation with fuzzy variables.
In reservoir geology, natural language has long played a crucial role. We are faced with many intelligent statements and questions on a daily basis. For example: "if the porosity is high then permeability is likely to be high"; "most seals are beneficial for hydrocarbon trapping, a seal is present in reservoir A, what is the probability that the seal in reservoir A is beneficial?"; and "high resolution log data is good, the new sonic log is of high resolution, what can be said about the goodness of the new sonic log?" CW has much to offer in reservoir characterization because most available reservoir data and information are too imprecise. There is a strong need to exploit the tolerance for such imprecision, which is the prime motivation for CW. Future research in this direction will surely provide a significant contribution to bridging reservoir geology and reservoir engineering.

Given the level of interest and the number of useful networks developed for earth science applications, and especially the oil industry, it is expected that soft computing techniques will play a key role in this field. Many commercial packages based on soft computing are emerging. The challenge is how to explain or "sell" the concepts and foundations of soft computing to practicing explorationists and convince them of the validity, relevance, and reliability of results based on intelligent systems using soft computing methods.

Acknowledgments The author would like to thank Dr. Fred Aminzadeh for his contribution to an earlier version of this paper, for his feedback and comments, and for allowing the author to use his published and unpublished documents, papers, and presentations in preparing this paper.
References 1. Adams, R.D., J.W. Collister, D.D. Ekart, R.A. Levey, and M. Nikravesh, 1999a, Evaluation of gas reservoirs in collapsed paleocave systems, Ellenburger Group, Permian Basin, Texas, AAPG Annual Meeting, San Antonio, TX, 11–14 April. 2. Adams, R.D., M. Nikravesh, D.D. Ekart, J.W. Collister, R.A. Levey, and R.W. Siegfried, 1999b, Optimization of Well Locations in Complex Carbonate Reservoirs, Summer 1999b, Vol. 5, Number 2, GasTIPS, GRI 1999b. 3. Aminzadeh, F. and Jamshidi, M.: Soft Computing: Fuzzy Logic, Neural Networks, and Distributed Artificial Intelligence, PTR Prentice Hall, Englewood Cliffs, NJ (1994). 4. Adams, R.D., Nikravesh, M., Ekart, D.D., Collister, J.W., Levey, R.A. and Seigfried, R.W.: “Optimization of Well locations in Complex Carbonate Reservoirs,” GasTIPS, Vol. 5, no. 2, GRI (1999). 5. Aminzadeh, F. 1989, Future of Expert Systems in the Oil Industry, Proceedings of HARC Conference on Geophysics and Computers, A Look at the Future. 6. Aminzadeh, F. and Jamshidi,M., 1995, Soft Computing, Eds., PTR Prentice Hall, 7. Aminzadeh, F. and S. Chatterjee, 1984/85, Applications of clustering in exploration seismology, Geoexploration, v23, p.147–159. 8. Aminzadeh, F., 1989, Pattern Recognition and Image Processing, Elsevier Science Ltd. Aminzadeh, F., 1994, Applications of Fuzzy Expert Systems in Integrated Oil Exploration, Computers and Electrical Engineering, No. 2, pp 89–97. 9. Aminzadeh, F., 1991, Where are we now and where are we going?, in Expert Systems in Exploration, Aminzadeh, F. and M. Simaan, Eds., SEG publications, Tulsa, 3–32.
10. Aminzadeh, F., S. Katz, and K. Aki, 1994, Adaptive Neural Network for Generation of Artificial Earthquake Precursors, IEEE Trans. On Geoscience and Remote Sensing, 32 (6) 11. Aminzadeh, F. 1996, Future Geoscience Technology Trends in , Stratigraphic Analysis, Utilizing Advanced Geophysical, Wireline, and Borehole Technology For Petroleum Exploration and Production, GCSEPFM pp 1–6, 12. Aminzadeh, F. Barhen, J. Glover, C. W. and Toomanian, N. B , 1999, Estimation of Reservoir Parameters Using a Hybrid Neural Network, Journal of Science and Petroleum Engineering, Vol. 24, pp 49–56. 13. Aminzadeh, F. Barhen, J. Glover, C. W. and Toomanian, N. B., 2000, Reservoir parameter estimation using hybrid neural network, Computers & Geoscience, Vol. 26 pp 869–875. 14. Aminzadeh, F., and de Groot, P., Seismic characters and seismic attributes to predict reservoir properties, 2001, Proceedings of SEG-GSH Spring Symposium. 15. Baldwin, J.L., A.R.M. Bateman, and C.L. Wheatley, 1990, Application of Neural Network to the Problem of Mineral Identification from Well Logs, The Log Analysis, 3, 279. 16. Baldwin, J.L., D.N. Otte, and C.L. Wheatley, 1989, Computer Emulation of human mental process: Application of Neural Network Simulations to Problems in Well Log Interpretation, Society of Petroleum Engineers, SPE Paper # 19619, 481. 17. Baygun et. al, 1985, Applications of Neural Networks and Fuzzy Logic in Geological Modeling of a Mature Hydrocarbon Reservoir, Report# ISD-005-95-13a, Schlumberger-Doll Research. 18. Benediktsson, J. A. Swain, P. H., and Erson, O. K., 1990, Neural networks approaches versus statistical methods in classification of multisource remote sensing data, IEEE Geoscience and Remote Sensing, Transactions, Vol. 28, No. 4, 540–552. 19. Bezdek, J.C., 1981, Pattern Recognition with Fuzzy Objective Function Algorithm, Plenum Press, New York. 20. Bezdek, J.C., and S.K. Pal,eds., 1992, Fuzzy Models for Pattern Recognition: IEEE Press, 539p. 21. Boadu, F.K., 1997, Rock properties and seismic attenuation: Neural network analysis, Pure Applied Geophysics, v149, pp. 507–524. 22. Bezdek, J.C., Ehrlich, R. and Full, W.: The Fuzzy C-Means Clustering Algorithm,” Computers and Geosciences (1984) 10, 191–203. 23. Bois, P., 1983, Some Applications of Pattern Recognition to Oil and Gas Exploration, IEEE Transaction on Geoscience and Remote Sensing, Vol. GE-21, No. 4, PP. 687–701. 24. Bois, P., 1984, Fuzzy Seismic Interpretation, IEEE Transactions on Geoscience and Remote Sensing, Vol. GE-22, No. 6, pp. 692–697. 25. Breiman, L., and J.H. Friedman, Journal of the American Statistical Association, 580, 1985. 26. Cannon, R. L., Dave J. V. and Bezdek, J. C., 1986, Efficient implementation of Fuzzy C- Means algorithms, IEEE Trans. On Pattern Analysis and Machine Intelligence, Vol. PAMI 8, No. 2. 27. Bruce, A.G., Wong, P.M., Tunggal, Widarsono, B. and Soedarmo, E.: “Use of Artificial Neural Networks to Estimate Well Permeability Profiles in Sumatera, Indonesia,” Proceedings of the 27th Annual Conference and Exhibition of the Indonesia Petroleum Association (1999). 28. Bruce, A.G., Wong, P.M., Zhang, Y., Salisch, H.A., Fung, C.C. and Gedeon, T.D.: “A Stateof-the-Art Review of Neural Networks for Permeability Prediction,” APPEA Journal (2000), in press. 29. Caers, J.: “Towards a Pattern Recognition-Based Geostatistics,” Stanford Center for Reservoir Forecasting Annual Meeting (1999). 30. Caers, J. 
and Journel, A.G.: “Stochastic Reservoir Simulation using Neural Networks Trained on Outcrop Data,” SPE 49026, SPE Annual Technical Conference and Exhibition, New Orleans (1998), 321–336. 31. Chawathe, A., A. Quenes, and W.W. Weiss, 1997, Interwell property mapping using crosswell seismic attributes, SPE 38747, SPE Annual Technical Conference and Exhibition, San Antonio, TX, 5–8 Oct.
32. Chen, H. C. ,Fang, J. H. ,Kortright, M. E. ,Chen, D. C Novel Approaches to the Determination of Archie Parameters II: Fuzzy Regression Analysis,. SPE 26288-P 33. Cybenko, G., 1989, Approximation by superposition of a sigmoidal function, Math. Control Sig. System, v2, p. 303. 34. Cuddy, S.: “The Application of Mathematics of Fuzzy Logic to Petrophysics,” SPWLA Annual Logging Symposium (1997), paper S. 35. Bloch, S.: “Empirical Prediction of Porosity and Permeability in Sandstones,” AAPG Bulletin (1991) 75, 1145–1160. 36. Fang, J.H., Karr, C.L. and Stanley, D.A.: “Genetic Algorithm and Application to Petrophysics,” SPE 26208, unsolicited paper. 37. Fang, J.H., Karr, C.L. and Stanley, D.A.: “Transformation of Geochemical Log Data to Mineralogy using Genetic Algorithms,” The Log Analyst (1996) 37, 26–31. 38. Fang, J.H. and Chen, H.C.: “Fuzzy Modelling and the Prediction of Porosity and Permeability from the Compositional and Textural Attributes of Sandstone,” Journal of Petroleum Geology (1997) 20, 185–204. 39. Gallagher, K. and Sambridge, M.: “Genetic Algorithms: A Powerful Tool for Large-Scale Nonlinear Optimization Problems,” Computers and Geosciences (1994) 20, 1229–1236. 40. Gedeon, T.D., Tamhane, D., Lin, T. and Wong, P.M.: “Use of Linguistic Petrographical Descriptions to Characterise Core Porosity: Contrasting Approaches,” submitted to Journal of Petroleum Science and Engineering (1999). 41. Gupta, M. M. and H. Ding, 1994, Foundations of fuzzy neural computations, in Soft Computing, Aminzadeh, F. and Jamshidi,M., Eds., PTR Prentice Hall, 165–199. 42. Hecht-Nielsen, R., 1989, Theory of backpropagation neural networks, presented at IEEE Proc., Int. Conf. Neural Network, Washington DC. 43. Hecht-Nielsen, R., 1990, Neurocomputing: Addison-Wesley Publishing, p. 433.Jang, J.S.R., 1992, Self-learning fuzzy controllers based on temporal backpropagation, IEEE Trans. Neural Networks, 3 (5). 44. Holland, J.: Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Harbor (1975). 45. Horikawa, S, T. Furuhashi, A. Kuromiya, M. Yamaoka, and Y. Uchikawa, Determination of Antecedent Structure for Fuzzy Modeling Using Genetic Algorithm, Proc. ICEC’96, IEEE International onference on Evolutionary Computation, Nagoya, Japan, May 20–22, 1996. 46. Horikawa, S, T. Furuhashi, and Y. Uchikawa, On Fuzzy Modeling Using Fuzzy Neural Networks with the Backpropagation Algorithm, IEEE Trans. On Neural Networks, 3 (5), 1992. 47. Horikawa, S, T. Furuhashi, Y. Uchikawa, and T. Tagawa, “A study on Fuzzy Modeling Using Fuzzy Neural Networks”, Fuzzy Engineering toward human Friendly Systems, IFES’91. 48. Huang, Z. and Williamson, M. A., 1994, Geological pattern recognition and modeling with a general regression neural network, Canadian Jornal of Exploraion Geophysics, Vol. 30, No. 1, 60–68. 49. Huang, Y., Gedeon, T.D. and Wong, P.M.: “A Practical Fuzzy Interpolator for Prediction of Reservoir Permeability,” Proceedings of IEEE International Conference on Fuzzy Systems, Seoul (1999). 50. Huang, Y., Wong, P.M. and Gedeon, T.D.: “Prediction of Reservoir Permeability using Genetic Algorithms,” AI Applications (1998) 12, 67–75. 51. Huang, Y., Wong, P.M. and Gedeon, T.D.: “Permeability Prediction in Petroleum Reservoir using a Hybrid System,” In: Soft Computing in Industrial Applications, Suzuki, Roy, Ovaska, Furuhashi and Dote (eds.), Springer-Verlag, London (2000), in press. 52. Huang, Y., Wong, P.M. 
and Gedeon, T.D.: “Neural-fuzzy-genetic Algorithm Interpolator in Log Analysis,” EAGE Conference and Technical Exhibition, Leipig (1998), paper P106. 53. Jang, J.S.R., and N. Gulley, 1995, Fuzzy Logic Toolbox, The Math Works Inc., Natick, MA. 54. Jang, J.S.R., 1992, Self-learning fuzzy controllers based on temporal backpropagation, IEEE Trans. Neural Networks, 3 (5). 55. Jang, J.S.R., 1991, Fuzzy modeling using generalized neural networks and Kalman filter algorithm, Proc. of the Ninth National Conference on Artificial Intelligence, pp. 762–767.
330
M. Nikravesh
56. Jang, J.-S.R., Sun, C.-T. and Mizutani, E.: Neuro-Fuzzy and Soft Computing, Prentice-Hall International Inc., NJ (1997). 57. Johnson, V. M. and Rogers, L. L., Location analysis in ground water remediation using artificial neural networks, Lawrence Livermore National Laboratory Report, UCRL-JC-17289. 58. Klimentos, T., and C. McCann, 1990, Relationship among Compressional Wave Attenuation, Porosity, Clay Content and Permeability in Sandstones, Geophys., v55, p. 991014. 59. Kohonen, T., 1997, Self-Organizing Maps, Second Edition, Springer.Berlin. 60. Kohonen, T., 1987, Self-Organization and Associate Memory, 2nd Edition, Springer Verlag., Berlin. 61. Lashgari, B, 1991, Fuzzy Classification with Applications, in Expert Systems in Exploration, Aminzadeh, F. and M. Simaan, Eds., SEG publications, Tulsa, 3–32. 62. Levey, R., M. Nikravesh, R. Adams, D. Ekart, Y. Livnat, S. Snelgrove, J. Collister, 1999, Evaluation of fractured and paleocave carbonate reservoirs, AAPG Annual Meeting, San Antonio, TX, 11–14 April. 63. Lin, C.T., and Lee, C.S.G., 1996, Neural fuzzy systems, Prentice Hall, Englewood Cliffs, New Jersey. 64. Lippmann, R. P., 1987, An introduction to computing with neural networks, ASSP Magazine, April, 4–22. 65. Liu, X., Xue, P., Li, Y., 1989, Neural networl method for tracing seismic events, 59th annual SEG meeting, Expanded Abstracts, 716–718. 66. MacQueen, J., 1967, Some methods for classification and analysis of multivariate observation: in LeCun, L. M. and Neyman, J., eds., The Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability: University of California Press, v.1, 281–297. 67. Mallick, S., 1999, Some practical aspects of prestack waveform inversion using a genetic algorithm: An example from the east Texas Woodbine gas sand: Geophysics, Soc. of Expl. Geophys., 64, 26–336 68. Monson, G.D. and Pita, J.A.: “Neural Network Prediction of Pseudo-logs for Net Pay and Reservoir Property Interpretation: Greater Zafiro Field Area, Equatorial Guinea,” SEG Annual Meeting (1997). 69. Matsushita, S., A. Kuromiya, M. Yamaoka, T. Furuhashi, and Y. Uchikawa, “A study on Fuzzy GMDH with Comprehensible Fuzzy Rules”, 1994 IEEE Symposium on Emerging Tehnologies and Factory Automation.Gaussian Control Problem, Mathematics of operations research, 8 (1) 1983. 70. The Math WorksTM 1995, Natick. 71. McCormack, M. D., 1990, Neural computing in geophysics, Geophysics: The Leading Edge of Exploration, 10, 1, 11–15. 72. McCormack, M. D., 1991, Seismic trace editing and first break piking using neural networks, 60th annual SEG meeting, Expanded Abstracts, 321–324. 73. McCormack, M. D., Stoisits, R. F., Macallister, D. J. and Crawford, K. D., 1999, Applications of genetic algorithms in exploration and production‘: The Leading Edge, 18, no. 6, 716–718. 74. Medsker, L.R., 1994, Hybrid Neural Network and Expert Systems: Kluwer Academic Publishers, p. 240. 75. Monson G.D., and J.A. Pita, 1997, Neural network prediction of pseudo-logs for net pay and reservoir property interpretation: Greater Zafiro field area, Equatorial Guinea, SEG 1997 meeting, Dallas, TX. 76. Nikravesh, M, and F. Aminzadeh, 1997, Knowledge discovery from data bases: Intelligent data mining, FACT Inc. and LBNL Proposal, submitted to SBIR-NASA. 77. Nikravesh, M. and F. Aminzadeh, 1999, Mining and Fusion of Petroleum Data with Fuzzy Logic and Neural Network Agents, To be published in Special Issue, Computer and Geoscience Journal, 1999–2000. 78. 
Nikravesh, M., 1998a, “Mining and fusion of petroleum data with fuzzy logic and neural c Report, LBNL-DOE, ORNL-DOE, and DeepLook Industry network agents”, CopyRight Consortium. 79. Nikravesh, M., 1998b, “Neural network knowledge-based modeling of rock properties based on well log databases,” SPE 46206, 1998 SPE Western Regional Meeting, Bakersfield, California, 10–13 May.
Computational Intelligence for Geosciences and Oil Exploration
331
80. Nikravesh, M., A.E. Farell, and T.G. Stanford, 1996, Model Identification of Nonlinear TimeVariant Processes Via Artificial Neural Network, Computers and Chemical Engineering, v20, No. 11, 1277p. 81. Nikravesh, M., B. Novak, and F. Aminzadeh, 1998, Data mining and fusion with integrated neuro-fuzzy agents: Rock properties and seismic attenuation, JCIS 1998, The Fourth Joint Conference on Information Sciences, North Carolina, USA, October 23–2. 82. Nikravesh, M., Adams, R.D., Levey, R.A. and Ekart, D.D.: “Soft Computing: Tools for Intelligent Reservoir Characterization (IRESC) and Optimum Well Placement (OWP),” ?????. 83. Nikravesh, M. and Aminzadeh, F.: “Opportunities to Apply Intelligent Reservoir Characterizations to the Oil Fields in Iran: Tutorial and Lecture Notes for Intelligent Reservoir Characterization,” unpublished. 84. Nikravesh, M., Kovscek, A.R., Murer, A.S. and Patzek, T.W.: “Neural Networks for FieldWise Waterflood Management in Low Permeability, Fractured Oil Reservoirs,” SPE 35721, SPE Western Regional Meeting, Anchorage (1996), 681–694. 85. Nikravesh, M.: “Artificial Intelligence Techniques for Data Analysis: Integration of Modeling, Fusion and Mining, Uncertainty Analysis, and Knowledge Discovery Techniques,” 1998 BISC-SIG-ES Workshop, Lawrence Berkeley National Laboratory, University of California, Berkeley (1998). 86. Nikravesh, M., R.A. Levey, and D.D. Ekart, 1999, Soft Computing: Tools for Reservoir Characterization (IRESC), To be presented at 1999 SEG Annual Meeting. 87. Nikravesh, M. and Aminzadeh, F., 2001, Mining and Fusion of Petroleum Data with Fuzzy Logic and Neural Network Agents, Journal of Petroleum Science and Engineering, Vol. 29, pp 221–238, 88. Nordlund, U.: “Formalizing Geological Knowledge – With an Example of Modeling Stratigraphy using Fuzzy Logic,” Journal of Sedimentary Research (1996) 66, 689–698. 89. Pezeshk, S, C.C. Camp, and S. Karprapu, 1996, Geophysical log interpretation using neural network, Journal of Computing in Civil Engineering, v10, 136p. 90. Rogers, S.J., J.H. Fang, C.L. Karr, and D.A. Stanley, 1992, Determination of Lithology, from Well Logs Using a Neural Network, AAPG Bulletin, v76, 731p. 91. Potter, D., Wright, J. and Corbett, P., 1999, A genetic petrophysics approach to facies and permeability prediction in a Pegasus well, 61st Mtg.: Eur. Assn. Geosci. Eng., Session:2053. 92. Rosenblatt, F., 1962 Principal of neurodynamics, Spartan Books. 93. Schuelke, J.S., J.A. Quirein, J.F. Sarg, D.A. Altany, and P.E. Hunt, 1997, Reservoir architecture and porosity distribution, Pegasus field, West Texas-An integrated sequence stratigraphic-seismic attribute study using neural networks, SEG 1997 meeting, Dallas, TX. 94. Selva, C., Aminzadeh, F., Diaz B., and Porras J. M., 200, Using Geostatistical Techniques For Mapping A Reservoir In Eastern Venezuela Proceedings of 7th INTERNATIONAL CONGRESS OF THE BRAZILIAN GEOPHYSICAL SOCIETY, OCTOBER 2001, SALVADOR, BRAZIL
95. Sheriff, R. E., and Geldart, L. P., 1982, Exploration Seismology, Vol. 1 and 1983, Vol. 2, Cambridge 96. Sugeno, M. and T. Yasukawa, 1993, A Fuzzy-Logic-Based Approach to Qualitative Modeling, IEEE Trans. Fuzzy Syst., 1 97. Tamhane, D., Wang, L. and Wong, P.M.: “The Role of Geology in Stochastic Reservoir Modelling: The Future Trends,” SPE 54307, SPE Asia Pacific Oil and Gas Conference and Exhibition, Jakarta (1999). 98. Taner, M. T., Koehler, F. and Sherrif, R. E., 1979, Complex seismic trace analysis, Geophysics, 44, 1196–1212Tamhane, D., Wong, P. M.,Aminzadeh, F. 2002, Integrating Linguistic Descriptions and Digital Signals in Petroleum Reservoirs, International Journal of Fuzzy Systems, Vol. 4, No. 1, pp 586–591 99. Tura, A., and Aminzadeh, F., 1999, Dynamic Reservoir Characterization and Seismically Constrained Production Optimization, Extended Abstracts of Society of Exploration Geophysicist, Houston, 100. Veezhinathan, J., Wagner D., and Ehlers, J. 1991, First break picking using neural network, in Expert Systems in Exploration, Aminzadeh, F. and Simaan, M., Eds, SEG, Tulsa, 179–202.
332
M. Nikravesh
101. Wang, L. and Wong, P.M.: “A Systematic Reservoir Study of the Lower Cretaceous in Anan Oilfield, Erlian Basin, North China,” AAPG Annual Meeting (1998). 102. Wang, L., Wong, P.M. and Shibli, S.A.R.: “Integrating Geological Quantification, Neural Networks and Geostatistics for Reservoir Mapping: A Case Study in the A’nan Oilfield, China,” SPE Reservoir Evaluation and Engineering (1999) 2(6), in press. 103. Wang, L., Wong, P.M., Kanevski, M. and Gedeon, T.D.: “Combining Neural Networks with Kriging for Stochastic Reservoir Modeling,” In Situ (1999) 23(2), 151–169. 104. Wong, P.M., Henderson, D.H. and Brooks, L.J.: “Permeability Determination using Neural Networks in the Ravva Field, Offshore India,” SPE Reservoir Evaluation and Engineering (1998) 1, 99–104. 105. Wong, P.M., Jang, M., Cho, S. and Gedeon, T.D.: “Multiple Permeability Predictions using an Observational Learning Algorithm,” Computers and Geosciences (2000), in press. 106. Wong, P.M., Tamhane, D. and Wang, L.: “A Neural Network Approach to Knowledge-Based Well Interpolation: A Case Study of a Fluvial Sandstone Reservoir,” Journal of Petroleum Geology (1997) 20, 363–372. 107. Wong, P.M., F.X. Jiang and, I.J. Taggart, 1995a, A Critical Comparison of Neural Networks and Discrimination Analysis in Lithofacies, Porosity and Permeability Prediction, Journal of Petroleum Geology, v18, 191p. 108. Wong, P.M., T.D. Gedeon, and I.J. Taggart, 1995b, An Improved Technique in Prediction: A Neural Network Approach, IEEE Transaction on Geoscience and Remote Sensing, v33, 971p. 109. Wong, P. M., Aminzadeh, F and Nikravesh, M., 2001, Soft Computing for Reservoir Characterization, in Studies in Fuzziness, Physica Verlag, Germany 110. Yao, X.: “Evolving Artificial Neural Networks,” Proceedings of IEEE (1999) 87, 1423–1447. 111. Yashikawa, T., Furuhashi, T. and Nikravesh, M.: “Development of an Integrated NeuroFuzzy-Genetic Algorithm Software,” 1998 BISC-SIG-ES Workshop, Lawrence Berkeley National Laboratory, University of California, Berkeley (1998). 112. Yoshioka, K, N. Shimada, and Y. Ishii, 1996, Application of neural networks and co-kriging for predicting reservoir porosity-thickness, GeoArabia, v1, No 3. 113. Zadeh L.A. and Yager R.R.” Uncertainty in Knowledge Bases”’ Springer-Verlag, Berlin, 1991. 11Zadeh L.A., “The Calculus of Fuzzy if-then-Rules”, AI Expert 7 (3), 1992. 114. Zadeh L.A., “The Roles of Fuzzy Logic and Soft Computing in the Concept, Design and deployment of Intelligent Systems”, BT Technol Journal, 14 (4) 1996. 115. Zadeh, L. A. and Aminzadeh, F.( 1995) Soft computing in integrated exploration, proceedings of IUGG/SEG symposium on AI in Geophysics, Denver, July 12. 116. Zadeh, L.A., 1965, Fuzzy sets, Information and Control, v8, 33353. 117. Zadeh, L.A., 1973, Outline of a new approach to the analysis of complex systems and decision processes: IEEE Transactions on Systems, Man, and Cybernetics, v. SMC-3, 244 118. Zadeh, L.A.: “Fuzzy Logic, Neural Networks, and Soft Computing,” Communications of the ACM (1994) 37(3), 77–84. 119. Zadeh, L. and Kacprzyk, J. (eds.): Computing With Words in Information/Intelligent Systems 1: Foundations, Physica-Verlag, Germany (1999). 120. Zadeh, L. and Kacprzyk, J. (eds.): Computing With Words in Information/Intelligent Systems 2: Applications, Physica-Verlag, Germany (1999). 121. Zadeh, L.A.: “Fuzzy Logic = Computing with Words,” IEEE Trans. on Fuzzy Systems (1996) 4, 103–111. 122. Bishop, C.: Neural Networks for Pattern Recognition, Oxford University Press, NY (1995). 123. 
von Altrock, C., 1995, Fuzzy Logic and NeuroFuzzy Applications Explained: Prentice Hall PTR, 350p.
Hierarchical Fuzzy Classification of Remote Sensing Data Yan Wang, Mo Jamshidi, Paul Neville, Chandra Bales and Stan Morain
Abstract Mixed land covers and high input dimensionality are two important issues affecting the classification accuracy of remote sensing images. Fuzzy classification has been developed to represent mixtures of land covers. Two fuzzy classifiers, Fuzzy Rule-Based (FRB) and Fuzzy Neural Network (FNN), were studied to illustrate the interpretability of fuzzy classification. A hierarchical structure was proposed to simplify multi-class classification into multiple binary classifications and thus to reduce the computation time caused by a high number of inputs. The classifiers were compared on the land cover classification of a Landsat 7 ETM+ image over Rio Rancho, New Mexico, and the Hierarchical Fuzzy Neural Network (HFNN) classifier proved to offer the best combination of classification accuracy and CPU time.
1 Introduction Remotely-sensed imagery classification involves the grouping of image data into a finite number of discrete classes. The Maximum Likelihood Classifier (MLC) method, widely used in remote sensing classification, is based on the assumption of normally distributed data. However, geo-spatial phenomena do not occur randomly in nature and frequently are not displayed in the image data with a normal distribution. Neural Network (NN) classifiers, which have no such distributional requirement [1],
Yan Wang Intelligent Inference Systems Corp., NASA Research Park, MS: 566-109, Moffett Field, CA, 94035 e-mail: [email protected] Mo Jamshidi Department of Electrical and Computer Engineering and ACE Center, University of Texas at San Antonio, San Antonio, TX, 78924 e-mail: [email protected] Paul Neville · Chandra Bales · Stan Morain The Earth Data Analysis Center (EDAC), University of New Mexico, Albuquerque, New Mexico, 87131 e-mail: {pneville, cbales, smorain}@edac.unm.edu
have increasingly been used in remote sensing classification. In remote sensing data classification, multiple classes are identified, and each class is represented by a variety of patterns to reflect natural variability. A neural network works from training data and learning algorithms, which cannot be interpreted in human language. Normally, the neural network classification takes longer training and/or classification time than the Maximum Likelihood Classifier [1]. In remote sensing images, a pixel might represent a mixture of class covers, within-class variability, or other complex surface cover patterns that cannot be properly described by one class. These effects may be caused by the ground characteristics of the classes and by the image spatial resolution. Since one class cannot uniquely describe such pixels, fuzzy classification [2, 3] has been developed, in contrast to traditional classification, where a pixel either does or does not belong to a class. In fuzzy classification, each pixel belongs to a class with a certain degree of membership, and the sum of all class degrees is 1. A fuzzy sets approach [2, 4, 5] to image classification makes no assumption about the statistical distribution of the data and so reduces classification inaccuracies. It allows for the mapping of a scene's natural fuzziness or imprecision [2] and provides more complete information for a thorough image analysis. The fuzzy c-means algorithm [6], an unsupervised method, is widely used in fuzzy classification and outputs membership values. Fuzzy k-Nearest Neighbour (kNN) [7] and fuzzy MLC [8] algorithms have been applied to improve classification accuracy. Fuzzy Rule-Based (FRB) classifiers are used in [9, 10] for multispectral images with specific membership functions. Fuzzy expert systems based on human knowledge, with general membership functions and interpretative ability, are widely used in control systems [11] and will be applied here to land cover classification. There are two common ways to generate rules in a fuzzy expert system: expert knowledge and training data. Given the natural variability and complicated patterns in remote sensing data, it is difficult to incorporate complete fuzzy rules from expert knowledge into the classification system. Training data are therefore desirable for obtaining the rules, but there is then no learning process to adapt to the patterns. Fuzzy Neural Network (FNN) classifiers, which combine the learning capability of neural networks with fuzzy classification, have been applied to remote sensing data classification. Fuzzy classification has been applied in neural networks to relate their outputs to the class contributions in a given pixel [12, 13]. The combination of fuzzy c-means and neural networks [14, 15] has again attracted attention. A possibility-based fuzzy neural network is proposed in [16] as an extension of a typical multilayer neural network. High input dimension is a crucial problem in land cover classification since it can make the classification complexity and the computation time unacceptable. Different feature extraction methods, such as Principal Component Analysis (PCA) [17], Decision Boundary Feature Extraction (DBFE) [18], and genetic algorithms [7], have been applied to preprocess the data. We will propose a hierarchical structure that simplifies multi-class classification into multiple binary classifications so that the computation time does not increase even with a high number of input bands.
Specifically, the fuzzy neural network classifier with the hierarchical structure is proposed as the Hierarchical Fuzzy Neural Network (HFNN) classifier. The expert
knowledge can be used not only to generate the fuzzy rules but also to build up the hierarchical structure. In this paper, the land cover classification of a Landsat 7 Enhanced Thematic Mapper Plus (ETM+) image over Rio Rancho, New Mexico is studied. Section II gives the Landsat 7 ETM+ data set. Section III describes the Neural Network (NN), Fuzzy Rule-Based (FRB) and Fuzzy Neural Network (FNN) classifiers. Section IV illustrates the hierarchical structure and the Hierarchical Fuzzy Neural Network (HFNN) classifier. Section V presents the fuzzy classification and hierarchical fuzzy classification results on the Landsat 7 ETM+ image. The classification performance is evaluated using overall, average, and individual producer accuracy, and the kappa coefficient (Foody, 1992). Section VI concludes with a summary.
2 Experimental Setup 2.1 Landsat 7 ETM+ Data The Landsat 7 ETM+ instrument is a nadir-viewing, multispectral scanning radiometer which provides image data of the Earth's surface in eight spectral bands (NASA, 2000; USGS Landsat 7, 2000). These bands cover the visible and near-infrared (VNIR), mid-infrared (Mid-IR), and thermal infrared (TIR) regions of the electromagnetic spectrum (Table 1). The image is initially obtained as a Level 1G data product through pixel reformatting, radiometric correction, and geometric correction (Bales, 2001). Data are quantized at 8 bits. The image (Fig. 1) used in the study was acquired over Rio Rancho, New Mexico and is 744 lines × 1014 columns (754,416 pixels in total) for each band. Nine types of land cover are identified in this area: water (WT), urban impervious (UI), irrigated vegetation (IV), barren (BR), caliche-barren (CB), bosque/riparian forest (BQ), shrubland (SB), natural grassland (NG), and juniper savanna (JS). Urban areas, with their mix of buildings, streets, and vegetation, are an example of a high degree of class mixture. Similarly, the shrubland, natural grassland, and juniper savanna classes are highly mixed. In addition, two non-spectral bands are considered.
Table 1 Landsat 7 ETM+ bands, spectral ranges, and ground resolutions

Band Number        Spectral Range (μm)   Ground Resolution (m)
TM1 (Vis-Blue)     0.450 – 0.515         30
TM2 (Vis-Green)    0.525 – 0.605         30
TM3 (Vis-Red)      0.630 – 0.690         30
TM4 (NIR)          0.750 – 0.900         30
TM5 (Mid-IR)       1.550 – 1.750         30
TM6 (TIR)          10.40 – 12.50         60
TM7 (Mid-IR)       2.090 – 2.350         30
TM8 (Pan)          0.520 – 0.900         15
Fig. 1 ETM+ image with 3 bands TM1, TM4, TM7 displayed in blue, green, and red, respectively; training and testing areas displayed in red and yellow squares respectively
NDVI (Normalized Difference Vegetation Index, TM9) is used to discriminate between the land covers' vegetation responses. A scaled NDVI [19] for display is computed by:

Scaled NDVI = 100 × [(TM4 − TM3)/(TM4 + TM3) + 1].   (1)

In (1), TM4 is the near-infrared band and TM3 is the visible red band; values greater than 100 indicate an increasing vegetation response, and lower values (as they approach 0) indicate an increasing soil response. DEM (Digital Elevation Model, TM10) can be used to discriminate between some map units such as the juniper savanna, which is found at higher elevations, and the bosque/riparian forest, which is found at lower elevations. These two band images are displayed in Fig. 2.
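As a worked illustration of (1), the following minimal sketch (Python/NumPy is assumed here; the band arrays and the small epsilon guard are illustrative additions, not part of the original formulation) computes the scaled NDVI for whole band arrays:

```python
import numpy as np

def scaled_ndvi(tm4, tm3, eps=1e-6):
    """Scaled NDVI of Eq. (1): 100 * [(TM4 - TM3) / (TM4 + TM3) + 1].

    tm4, tm3: floating-point arrays of the near-infrared and visible-red
    bands. eps guards against division by zero on fully dark pixels
    (an implementation choice, not part of the original equation).
    """
    ratio = (tm4 - tm3) / (tm4 + tm3 + eps)
    return 100.0 * (ratio + 1.0)

# Values above 100 indicate an increasing vegetation response,
# values approaching 0 an increasing soil response.
```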
2.2 Regions of Interest (ROIs) and Signature Data Regions of Interest (ROIs) are groups of image pixels that represent known class labels, also called ground-truth data. First, sixty-nine field samples were collected based on information gathered in the field, using a GPS to record the locations and the map unit that each sample belongs to. Second, the samples were extended to sixty-nine homogeneous regions using a region growing method in ERDAS IMAGINE [20]. In the region growing, a distance and a maximum number of pixels are set for the polygon (or linear) region, and starting from the seeds (field samples) the contiguous pixels within the predefined spectral distance are included in the regions of interest.
Fig. 2 (a) Scaled Normalized Difference Vegetation Index (TM9) image (b) Digital Elevation Model (TM10) image
From these seed polygons, basic descriptive statistics are gathered for each of the pixels in each band; these constitute the signature data. The signature means are plotted in Fig. 3. From the plot we can see that some classes have very similar statistics, such as natural grassland and shrubland, or barren and caliche-barren, which are hard to separate with the maximum likelihood classifier on the basis of the data statistics. In total, 9,968 ground-truth data points were collected from the sixty-nine regions, of which 37 areas (4,901 data points) were randomly selected as training data and the other 32 areas (5,068 data points) were used as testing data. Their distribution over the classes is listed in Table 2.
Fig. 3 Signature mean (mean values of each class plotted over bands TM1–TM10)
Table 2 Training and testing data (# data points / # areas) distribution

Class                  Training Data (4901/37)   Testing Data (5068/32)
Water                  265/3                     303/2
Urban Impervious       977/6                     1394/6
Irrigated Vegetation   709/4                     601/4
Barren                 1729/8                    1746/7
Caliche-Barren         133/2                     70/1
Bosque                 124/2                     74/1
Shrubland              470/5                     453/5
Natural Grassland      229/4                     263/3
Juniper Savanna        265/3                     164/3
3 Classification Methods 3.1 Neural Network (NN) Classifiers One of the most popular multilayer neural networks is the Back-Propagation Neural Network (BPNN), which is based on gradient descent in error and is also known as the back-propagation algorithm or generalized delta rule. Figure 4 shows a simple three-layer neural network consisting of an input layer, a hidden layer, and an output layer. More hidden layers can be used for special problem conditions or requirements. There are two operations in the learning process: feedforward and back propagation [21]. The number of input units is determined by the number of features n, and the number of output units corresponds to the number of categories c. The target output of a training sample (x, ωi) is coded as a unit vector t = (t1, t2, ..., tc)^T with ti = 1 and tj = 0 (where i, j = 1, 2, ..., c and j ≠ i), where ωi is the category the input vector x = (x1, x2, ..., xn)^T belongs to. The classification output ω̂ of the input x is determined by the maximum element of the output z = (z1, z2, ..., zc)^T.
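A small sketch of the target coding and decision rule just described (the class count and output values below are made up for illustration):

```python
import numpy as np

def one_hot_target(class_index, num_classes):
    """Unit target vector t with t_i = 1 for the true class and t_j = 0 otherwise."""
    t = np.zeros(num_classes)
    t[class_index] = 1.0
    return t

def predict_class(z):
    """Classification output: index of the maximum element of the network output z."""
    return int(np.argmax(z))

# Example with c = 9 land cover classes (numbers are illustrative only).
t = one_hot_target(class_index=3, num_classes=9)   # a sample labelled with class 4
z = np.array([0.01, 0.05, 0.02, 0.80, 0.03, 0.01, 0.04, 0.02, 0.02])
assert predict_class(z) == 3
```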
Fig. 4 Three-layer feedforward network (input layer x1 ... xn, hidden layer y1 ... ynH, output layer z1 ... zc with targets t1 ... tc)
3.2 Fuzzy Rule-Based (FRB) Classifiers A fuzzy expert system is an automatic system that is capable of mimicking human actions for a specific task. There are three main operations in a fuzzy expert system [11]. The first operation is fuzzification, the mapping from a crisp point to a fuzzy set. The second operation is inferencing, the evaluation of the fuzzy rules in IF-THEN form. The last operation is defuzzification, which maps the fuzzy output of the expert system into a crisp value, as shown in Fig. 5. This figure also represents a definition of a fuzzy control system. In the FRB classification, the ranges of the input variables are divided into subregions in advance and a fuzzy rule is defined for each grid cell. A set of fuzzy rules is defined for each class. The membership degree of an unknown input for each class is calculated, and the input is classified into the class associated with the highest membership degree. The FUzzy Logic Development Kit, FULDEK [22], is used to generate the fuzzy rules from data points. In FULDEK, each input variable is divided evenly and the output variable is represented by a singular function. More subregions (up to 125) or more rules (up to 400) are required to map the input and output data points with lower error, since no pruning process is applied.
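A minimal sketch of the three operations described above, assuming normalized [0, 1] inputs, triangular membership functions, product for the AND of antecedents, and a firing-strength-weighted average of singleton outputs as one common defuzzification choice (FULDEK's exact internals may differ):

```python
import numpy as np

def tri_mf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Three evenly spaced fuzzy sets over a normalized [0, 1] input variable.
MFS = {"Small": (-0.5, 0.0, 0.5), "Medium": (0.0, 0.5, 1.0), "Big": (0.5, 1.0, 1.5)}

def fuzzify(x):
    """Fuzzification: map a crisp value to a degree for every linguistic term."""
    return {name: tri_mf(x, *params) for name, params in MFS.items()}

def infer(fuzzified_inputs, rules):
    """Inference: evaluate IF-THEN rules; AND of the antecedents as a product."""
    return [(np.prod([fuzzified_inputs[i][term] for i, term in enumerate(antecedent)]),
             output)
            for antecedent, output in rules]

def defuzzify(fired_rules):
    """Defuzzification: firing-strength-weighted average of the singleton outputs."""
    num = sum(w * out for w, out in fired_rules)
    den = sum(w for w, _ in fired_rules)
    return num / den if den > 0 else 0.0
```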
3.3 Fuzzy Neural Network (FNN) Classifiers A Fuzzy Neural Network (FNN) is a connectionist model for fuzzy rule implementation and inference, in which fuzzy rule prototypes are embedded in a generalized neural network that is trained using numerical data. There is a wide variety of FNN architectures and functionalities. They differ in the type of fuzzy rules, the type of inference method, and the mode of operation. A general FNN architecture consists of five layers. Figure 6 depicts an FNN for two exemplar fuzzy rules of Takagi and Sugeno's type, in which the consequent of each rule is a linear combination of the inputs:
Fig. 5 Definition of a fuzzy expert system (crisp input → fuzzification → inference engine, driven by a rule set from human experts and a data set → defuzzification → approximate crisp output)
Fig. 6 Fuzzy neural network (FNN) structure (input layer, fuzzification layer, rule layer, action layer, output layer)
Rule 1: If x1 is A1 and x2 is B1, then f1 = p11 x1 + p12 x2 + r1.
Rule 2: If x1 is A2 and x2 is B2, then f2 = p21 x1 + p22 x2 + r2.   (2)
where Ai, Bi (i = 1, 2) are fuzzy linguistic membership functions and pij, ri are parameters in the consequent fi of Rule i. The network consists of the following layers: an input layer, where the neurons represent the input variables x1 and x2; a fuzzification layer, where the neurons represent the fuzzy values μAi(x1) and μBi(x2), the membership degrees of x1 and x2; a rule layer, where the neurons represent the fuzzy rules with normalized firing strengths w̄i, the firing strength wi being the product of the membership degrees, μAi(x1) × μBi(x2); an action layer, where the neurons represent the weighted rule consequents w̄i fi (i = 1, 2); and an output layer, where the neuron represents the output variable o = Σ_{i=1,2} w̄i fi = (Σ_{i=1,2} wi fi) / (Σ_{i=1,2} wi). The ANFIS (Adaptive-Network-Based Fuzzy Inference System) [23] is used as the fuzzy neural network classifier; it is able to build up fuzzy rules from data points, and zeroth- or first-order Sugeno-type fuzzy inference is provided. A gradient descent learning algorithm, alone or in combination with a least-squares estimate (hybrid learning), is available to adjust the parameters in (2).
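The forward pass through the five layers for the two rules in (2) can be sketched as follows (Gaussian membership functions are assumed here purely for illustration; in practice the parameter values would come from ANFIS training):

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function, one common fuzzification choice."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def ts_fnn_output(x1, x2, mf_params, consequents):
    """Five-layer forward pass for the two Takagi-Sugeno rules of Eq. (2).

    mf_params:   ((cA1, sA1), (cB1, sB1), (cA2, sA2), (cB2, sB2))
    consequents: ((p11, p12, r1), (p21, p22, r2))
    """
    A1, B1, A2, B2 = mf_params
    # Layer 2: fuzzification
    muA1, muB1 = gauss_mf(x1, *A1), gauss_mf(x2, *B1)
    muA2, muB2 = gauss_mf(x1, *A2), gauss_mf(x2, *B2)
    # Layer 3: firing strengths as products of the membership degrees
    w1, w2 = muA1 * muB1, muA2 * muB2
    # Layer 4: rule consequents f_i = p_i1*x1 + p_i2*x2 + r_i
    f1 = consequents[0][0] * x1 + consequents[0][1] * x2 + consequents[0][2]
    f2 = consequents[1][0] * x1 + consequents[1][1] * x2 + consequents[1][2]
    # Layer 5: normalized weighted sum o = (w1*f1 + w2*f2) / (w1 + w2)
    return (w1 * f1 + w2 * f2) / (w1 + w2 + 1e-12)
```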
4 Hierarchical Structure In Section II we defined the nine land cover classes to be separated. Normally, it is hard to achieve good performance with only two or three input bands, especially for similar classes without large differences in most spectral bands. To discriminate the similar classes, we have introduced the two non-spectral bands, NDVI and DEM. For the fuzzy classifiers discussed in Section III, assume that each input band is partitioned by m membership functions; then n input bands lead to m^n × 9 rules for the classification, so the classification complexity increases exponentially with the number of input bands. With a high number of rules, the time
required for the classification would be unreasonable. A hierarchical structure is proposed to solve this problem. In the hierarchical structure, any of the classifiers discussed in Section III can be used. The hierarchical structure simplifies multi-class, one-level classification into successive levels of binary classification. In the first level, all the land cover classes are separated into two groups. Each group is further divided into sub-groups at the next level through binary classification, until only one land cover class remains in a group. For each group classification, additional input bands are selected by inspecting the signature data and consulting geographical experts. For the signature data in Fig. 3, we found that bands TM5 and TM7 are effective for separating the nine land cover classes into two groups.

Fig. 7 Classification structure of (a) HFNN (b) FNN (WT: Water, UI: Urban Impervious, IV: Irrigated Vegetation, BR: Barren, CB: Caliche-Barren, BQ: Bosque, SB: Shrubland, NG: Natural Grassland, JS: Juniper Savanna)
In Fig. 7(a), the upper group consists of water, urban impervious, irrigated vegetation, and bosque, with lower brightness values in bands TM5 and TM7; the lower group consists of barren, caliche-barren, shrubland, natural grassland, and juniper savanna, with higher brightness values in TM5 and TM7. Similarly, each upper and lower group is further separated into two groups with the help of additional input bands, until a group contains a single land cover class. For each group classification, two or three input bands are enough to achieve good performance regardless of the total number of input bands. By using the hierarchical structure, the classification complexity is therefore not affected by the increasing number of input bands. The classification structure of the Hierarchical Fuzzy Neural Network (HFNN) is compared to that of the Fuzzy Neural Network (FNN) in Fig. 7. The hierarchical structure is not unique, and it is flexible: once there is any change in the input bands, we only need to modify the levels that use the changed bands and their subsequent levels, without changing the previous levels. In total, five spectral bands (TM1, TM3, TM5, TM7, TM8) and two non-spectral bands (TM9, TM10) are used for the classification. If ordinary rules were used blindly, the number of rules needed for the classification would be extremely high: assuming each input is partitioned using two membership functions, the number of rules would be 2^7 × 9 = 1152. The hierarchical structure with the same membership functions, however, consists of eight group classifications, six with two inputs and two with three inputs, so only 2^2 × 6 + 2^3 × 2 = 40 rules are used in total. The number of rules is reduced tremendously by the hierarchical classification, which greatly reduces the computation time.
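The rule-count argument above and the cascading classification can be summarized in a short sketch (the tree traversal is only schematic; the actual grouping and band selection follow Fig. 7(a), and the binary classifiers themselves are placeholders):

```python
def classify_hierarchically(pixel, node):
    """Walk a binary tree of group classifiers until a single class remains.

    node is either a class label (leaf, a string) or a dict with a binary
    'classifier' working on a few selected bands and 'left'/'right' sub-trees.
    """
    if isinstance(node, str):
        return node
    branch = "left" if node["classifier"](pixel) < 0.5 else "right"
    return classify_hierarchically(pixel, node[branch])

# Rule-count check from the text: a flat rule base with 2 membership
# functions per band, 7 bands and 9 classes, versus a cascade of six
# 2-input and two 3-input binary group classifiers.
flat_rules = 2 ** 7 * 9                 # 1152 rules
hier_rules = 2 ** 2 * 6 + 2 ** 3 * 2    # 24 + 16 = 40 rules
print(flat_rules, hier_rules)           # -> 1152 40
```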
5 Classification Results 5.1 Fuzzy Classification In this experiment, we illustrate the fuzzy classification results of the Fuzzy Rule-Based (FRB) and Fuzzy Neural Network (FNN) classifiers on the ETM+ image and compare them with those of the Maximum Likelihood Classifier (MLC) and the Back-Propagation Neural Network (BPNN) classification. First, the three input bands TM1, TM4, and TM7 are recommended for the classification by geographical experts as one of the most effective combinations for discriminating the land cover classes. In the MLC classification, the mean and covariance matrix of each category are estimated from the training data and each pixel is classified into the category with the maximum posterior probability. The accuracy of the water and bosque classes is zero, and the overall and average accuracy is only 64% and 47%, respectively (Table 3), which is not acceptable for most land cover classifications. In the BPNN classification, one hidden layer with 5, 10, or 15 hidden nodes and two hidden layers with 5 and 5 hidden nodes are tested with 1000 and 10000 training epochs, respectively, on the ETM+ image.
Table 3 Classification accuracy (%) comparison (BPNN 3109, 10000 represents the BPNN with 3 input nodes, 10 hidden nodes, 9 output nodes, trained with 10000 epochs; the bolded columns are for comparison)

Classes                MLC    BPNN   BPNN   BPNN   BPNN   BPNN   BPNN   BPNN   BPNN   FRB    FNN
                              359    359    3109   3109   3159   3159   3559   3559
                              1000   10000  1000   10000  1000   10000  1000   10000
Water                  0      100    100    100    100    100    100    0      100    100    100
Urban Impervious       99.78  99.28  99.43  99.21  98.35  99.43  98.13  99.07  99.07  96.7   99
Irrigated Vegetation   73.54  97.5   95.01  97.34  97.5   97.84  98.17  98.5   98     97.34  97.5
Barren                 56.41  93.36  91.81  89.98  93.07  90.49  92.9   89.69  92.1   93.36  90.66
Caliche-Barren         87.14  0      0      0      0      0      0      0      0      0      0
Bosque                 0      71.62  100    89.19  95.95  89.19  89.19  0      100    100    100
Shrubland              36.2   67.55  63.58  62.91  74.61  56.29  71.96  99.34  64.68  66.89  64.46
Natural Grassland      53.61  0      28.9   50.95  33.08  59.32  0      0      59.32  4.94   47.15
Juniper Savanna        20.73  0      8.54   14.02  14.63  7.32   14.02  0      0      5.49   22.56
Overall                63.5   84.1   85.14  85.83  86.92  85.75  84.81  78.71  86.9   84.16  86.4
Average                47.49  58.81  65.25  67.07  67.47  66.65  62.71  42.96  68.13  62.75  69.04
Kappa Coefficient      54.21  79.19  80.74  81.74  83     81.62  80.19  72.32  83.06  79.48  82.47
Each band is first normalized to [0, 1]. The output nodes are coded with each node representing one class. The training of the BPNN adaptively adjusts the learning rate and the momentum. The classification accuracy is listed in Table 3. The eight BPNNs with different structures or training parameters yield different accuracies for individual classes, such as the water, bosque, natural grassland, and juniper savanna classes. However, this variation does not affect the zero accuracy of the caliche-barren class, most of which is misclassified as barren. The accuracy of the natural grassland and juniper savanna classes is 0–59% and 0–15% respectively, even lower than the 54% and 21% of the MLC classification, due to their mutual mixing and their mixing with the shrubland class. Nevertheless, the overall and average accuracy of the BPNN classification, 79%–87% and 43%–68% respectively, is generally higher than that of the MLC classification, and the kappa coefficient, 72%–83%, is also higher. The BPNN (3109, 10000) classification, with 3 input nodes, 10 hidden nodes, 9 output nodes, and trained with 10000 epochs, is the representative used for comparison with the other classifiers. In the FRB classification [24, 25], bands TM1, TM4, and TM7 are normalized to [0, 1] as in the BPNN classification and then evenly represented by three triangular membership functions, Small, Medium, and Big, shown in Fig. 8(a). There are 27 (3^3) rules generated for each class by combining the differently fuzzified input variables, so 27 × 9 = 243 rules are used in the classification. The rule base, for example for the water class, is the following:
IF TM1 is Small and TM4 is Small and TM7 is Small THEN water is S1;
IF TM1 is Small and TM4 is Small and TM7 is Medium THEN water is S2;
. . .
IF TM1 is Big and TM4 is Big and TM7 is Medium THEN water is S26;
IF TM1 is Big and TM4 is Big and TM7 is Big THEN water is S27.
where S1, . . ., S27 are singular membership functions for the water class (Table 4); multiplication is used to evaluate the AND of the antecedents, and all rules are combined by a maximum operation. The maximum output of all the rules represents the membership degree of a pixel belonging to the water class. The overall and average accuracy of the FRB classification in Table 3 is degraded by 3% and 4% respectively, and the kappa coefficient by 3%, compared to the BPNN classification. The accuracy of the natural grassland and juniper savanna classes is much lower, 5% and 6% respectively. To improve the individual class accuracy of the FRB classifier, we applied the FNN classification. Similarly, three triangular membership functions are used to represent each input variable, so 27 rules are generated for each class and a total of 243 rules are used in the classification. However, the input membership functions are now changed to adapt to the training data with the hybrid learning algorithm, as displayed in Fig. 8(b), (c), (d). The rule base for the water class is modified to:
IF TM1 is TM1Small and TM4 is TM4Small and TM7 is TM7Small THEN water is S1;
Fig. 8 Input membership functions: (a) TM1, TM4, TM7 in the FRB classification (Small, Medium, Big); (b) TM1 in the FNN classification (TM1Small, TM1Medium, TM1Big); (c) TM4 in the FNN classification; (d) TM7 in the FNN classification
IF TM1 is TM1Small and TM4 is TM4Small and TM7 is TM7Medium THEN water is S2;
. . .
IF TM1 is TM1Big and TM4 is TM4Big and TM7 is TM7Medium THEN water is S26;
where S1, . . ., S27 are the constants for the water class (Table 5) and the other options are the same as in the FRB classification. If the maximum output of all the rules is less than zero or greater than one, it is taken as zero or one, respectively. The natural grassland and juniper savanna classes are now discriminated with a better accuracy of 47% and 23%, respectively (Table 3).
Table 4 Singular membership functions for the water class of the FRB classification

S1: 0.749   S2: 0.141   S3: 0.499   S4: 0       S5: 0.310   S6: 0.497   S7: 0.238
S8: 0.285   S9: 0.5     S10: 0.04   S11: 0.259  S12: 0.508  S13: 0.294  S14: 0.217
S15: 0.215  S16: 0.297  S17: 0.297  S18: 0.223  S19: 0.217  S20: 0.245  S21: 0.498
S22: 0.224  S23: 0.237  S24: 0.224  S25: 0.261  S26: 0.227  S27: 0.235
Table 5 Constant outputs for the water class of the FNN classification

S1: 0.999    S2: −0.222   S3: 3.800    S4: 0.015    S5: −0.001   S6: 0.006    S7: 0.001
S8: −0.010   S9: 0.002    S10: −0.349  S11: 0.093   S12: −1.588  S13: −0.001  S14: 0
S15: 0       S16: −0.003  S17: 0       S18: 0       S19: −0.022  S20: 0.043   S21: 0
S22: 0       S23: 0       S24: 0       S25: 0       S26: 0       S27: 0
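Putting the pieces together, the following sketch evaluates the 27 grid rules of one class and assigns a pixel to the class with the highest membership degree. The rule ordering (TM1 varying slowest, TM7 fastest) matches the listing above; taking each rule's output as its firing strength times its singleton value, and clipping the final degree to [0, 1] for the FNN variant, is a plausible reading of the text rather than a documented FULDEK/ANFIS detail:

```python
import itertools

def tri_mf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

MFS = {"Small": (-0.5, 0.0, 0.5), "Medium": (0.0, 0.5, 1.0), "Big": (0.5, 1.0, 1.5)}
LABELS = ("Small", "Medium", "Big")

def class_degree(bands, singletons):
    """Max over the 27 grid rules of (product of antecedent degrees) * singleton.

    bands: normalized (TM1, TM4, TM7) values; singletons: S1..S27 of Table 4
    (or Table 5), ordered with TM1 varying slowest and TM7 fastest.
    """
    best = 0.0
    for k, combo in enumerate(itertools.product(LABELS, repeat=3)):
        strength = 1.0
        for x, term in zip(bands, combo):
            strength *= tri_mf(x, *MFS[term])
        best = max(best, strength * singletons[k])
    return best

def classify(bands, singletons_per_class):
    """Assign the pixel to the class with the highest membership degree;
    degrees are clipped to [0, 1] as done for the FNN constants."""
    degrees = {cls: min(max(class_degree(bands, s), 0.0), 1.0)
               for cls, s in singletons_per_class.items()}
    return max(degrees, key=degrees.get)
```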
The overall accuracy is 86%, similar to that of the BPNN classification. The average accuracy is 69%, better than the 67% and 62% of the BPNN and FRB classifications, respectively. The classification maps of the four classifiers are shown in Fig. 9, with each class displayed in an individual color. In the MLC classification map, all the water and bosque areas are misclassified as urban impervious. The maps of the BPNN and FNN classifiers are similar to each other and differ from that of the FRB classifier, in which natural grassland and juniper savanna show up less and are instead covered by the shrubland class, consistent with the previous analysis.
Fig. 9 Classification map of (a) MLC (b) BPNN (c) FRB (d) FNN (classes: water, urban impervious, irrigated vegetation, barren, caliche-barren, bosque, shrubland, natural grassland, juniper savanna)
5.2 Hierarchical Classification In this experiment we implement the hierarchical classification of Fig. 7(a). In the Hierarchical Fuzzy Neural Network (HFNN) classification, each group classification is performed by an FNN classifier, and the FNN outputs of the previous level are fed to the group classification of the next level. A prior classification therefore affects its posterior classifications and the final classification result. For each FNN classifier, the input variable is represented by two Gaussian combination membership functions. Similarly, the FNN classification without the hierarchical structure, Fig. 7(b), is implemented with the seven input bands and the same membership functions. Their error matrices are listed in Tables 6 and 7, respectively.

Table 6 Error matrix of the HFNN classification (WT: Water, UI: Urban Impervious, IV: Irrigated Vegetation, BR: Barren, CB: Caliche-Barren, BQ: Bosque, SB: Shrubland, NG: Natural Grassland, JS: Juniper Savanna)

Actual     Predicted Classes                                              Accuracy (%)
Classes    WT    UI    IV    BR    CB   BQ   SB   NG   JS
WT         303   0     0     0     0    0    0    0    0      100
UI         14    1367  0     12    0    0    0    0    1      98.06
IV         8     16    577   0     0    0    0    0    0      96.01
BR         0     46    0     1491  53   0    8    37   111    85.4
CB         0     0     0     0     70   0    0    0    0      100
BQ         0     0     0     0     0    74   0    0    0      100
SB         0     4     0     0     0    0    439  0    10     96.91
NG         0     20    0     0     0    0    168  42   33     15.97
JS         0     0     0     2     0    0    0    0    162    98.78
Overall Accuracy (%) = 89.29, Average accuracy (%) = 87.9, Kappa Coefficient (%) = 86.39, CPU Time = 223 s
Table 7 Error matrix of the FNN classification (WT: Water, UI: Urban Impervious, IV: Irrigated Vegetation, BR: Barren, CB: Caliche-Barren, BQ: Bosque, SB: Shrubland, NG: Natural Grassland, JS: Juniper Savanna)

Actual     Predicted Classes                                              Accuracy (%)
Classes    WT    UI    IV    BR    CB   BQ   SB   NG   JS
WT         303   0     0     0     0    0    0    0    0      100
UI         1     1245  1     36    0    0    1    1    109    89.31
IV         2     79    496   0     0    0    24   0    0      82.53
BR         0     73    2     1527  0    0    2    27   115    87.46
CB         0     0     0     2     68   0    0    0    0      97.14
BQ         0     0     0     0     0    74   0    0    0      100
SB         0     17    0     1     0    0    108  327  0      23.84
NG         0     0     0     0     0    0    103  127  33     48.29
JS         0     0     0     44    2    0    57   0    61     37.2
Overall Accuracy (%) = 79.1, Average accuracy (%) = 73.97, Kappa Coefficient (%) = 73.41, CPU Time = 10070 s
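The summary figures reported with Tables 6 and 7 can be recomputed from the error matrices. A minimal sketch (using the standard definitions of overall accuracy, producer accuracy, and Cohen's kappa, which the chapter evaluates following Foody, 1992) applied to the Table 6 matrix reproduces the reported values up to rounding:

```python
import numpy as np

def summarize(error_matrix):
    """Overall accuracy, average producer accuracy and kappa coefficient (%)
    from a square error matrix (rows: actual classes, columns: predicted)."""
    m = np.asarray(error_matrix, dtype=float)
    n = m.sum()
    po = np.trace(m) / n                                # overall accuracy
    producer = np.diag(m) / m.sum(axis=1)               # per-class producer accuracy
    pe = (m.sum(axis=1) * m.sum(axis=0)).sum() / n**2   # chance agreement
    kappa = (po - pe) / (1.0 - pe)
    return 100 * po, 100 * producer.mean(), 100 * kappa

# HFNN error matrix of Table 6 (rows and columns: WT, UI, IV, BR, CB, BQ, SB, NG, JS).
hfnn = [
    [303, 0, 0, 0, 0, 0, 0, 0, 0],
    [14, 1367, 0, 12, 0, 0, 0, 0, 1],
    [8, 16, 577, 0, 0, 0, 0, 0, 0],
    [0, 46, 0, 1491, 53, 0, 8, 37, 111],
    [0, 0, 0, 0, 70, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 74, 0, 0, 0],
    [0, 4, 0, 0, 0, 0, 439, 0, 10],
    [0, 20, 0, 0, 0, 0, 168, 42, 33],
    [0, 0, 0, 2, 0, 0, 0, 0, 162],
]
print(summarize(hfnn))   # approximately (89.29, 87.90, 86.39)
```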
Fig. 10 Classification map of (a) HFNN (b) FNN
The overall accuracy, average accuracy, and kappa coefficient of the HFNN classification are 10%, 14%, and 12% higher, respectively, than those of the FNN classification. The HFNN classification runs in 223 s (3.7 min) on a Pentium IV 2.2 GHz computer, whereas the FNN classification takes 10070 s (2.8 h) on the same computer, around 45 times the HFNN computation time. Even when the input membership functions are changed in number or type, so that the accuracy difference between the HFNN and FNN classifications varies, the HFNN classifier always achieves better classification accuracy in shorter CPU time than the FNN classifier with the same parameters. The classification maps of the HFNN and FNN classifiers are compared in Fig. 10. Compared to the FNN classifier with the three input bands, the HFNN classifier improves the overall accuracy and average accuracy by 3% and 9%, respectively, with a 4% higher kappa coefficient. In particular, the caliche-barren class is separated from the barren class with an accuracy of 100%, and the classification of the shrubland and juniper savanna classes is much improved, with accuracies of 97% and 99%, respectively. The natural grassland class is still not classified well, with an accuracy of 16%, and 64% of its area is misclassified as shrubland. The FNN classifier with the seven input bands, however, degrades the overall accuracy and kappa coefficient by 7% and 9%, respectively, but improves the average accuracy by 5%, since each input variable is represented by a smaller number of membership functions.
6 Conclusions In this paper, we discussed the Neural Network (NN), Fuzzy Rule-Based (FRB), and Fuzzy Neural Network (FNN) classifiers, which were compared and tested on a Landsat 7 ETM+ image over Rio Rancho, NM. The FRB and FNN classifiers were represented by IF-THEN rules, which are interpretable by humans and make it easy to incorporate human experience. However, the FRB classifier did not achieve classification accuracy as good as that of the NN classifier because of the limited expert knowledge
or rule base. The FNN classifier improved the classification accuracy of some categories by combining the learning capability of neural networks. Even though we applied different classification methods to improve the classification accuracy, some similar classes could still not be separated, or could only be separated with very low accuracy. To discriminate the similar classes we introduced non-spectral bands, which on the other hand increase the classification complexity and the computation time. A high input dimension can make the computation time unacceptably long. The hierarchical structure was proposed to simplify the multi-class classification into multiple binary classifications so that the classification complexity does not increase even with more input bands. By using the hierarchical structure, we gain better classification accuracy in shorter computation time. The experiment showed that the Hierarchical Fuzzy Neural Network (HFNN) classifier achieved the best overall accuracy (89%) and average accuracy (88%) in a shorter CPU running time (3.7 min) with only 40 fuzzy rules. In particular, the similar classes, such as the caliche-barren, shrubland, and juniper savanna classes, were separated with higher accuracy. In the fuzzy classification, we only used linear rules and linear outputs. Nonlinear rules or nonlinear outputs, such as the extension of ANFIS, Coactive Neuro-Fuzzy Modeling (CANFIS) [26], could be considered to improve the classification accuracy further, especially that of the natural grassland class, which was still low. An appropriate hierarchical structure is necessary to obtain good classification performance, but it is difficult to generate an optimal one by observing the data statistics. In future work, we could consider building the hierarchical structure automatically with genetic algorithms. Acknowledgments The authors would like to thank the anonymous reviewers for their comments, the staff at the Earth Data Analysis Center (EDAC) at UNM for providing the Landsat 7 data set and expert knowledge, and the professors and students at the Autonomous Control Engineering (ACE) Center at UNM for their support.
References 1. Heermann, P. D., and N. Khazenic, 1992. “ Classification of Multispectral Remote Sensing Data using a Back-Propagation Neural Network”, IEEE Trans. Geosci. and Remote Sens., 30(1): 81–88. 2. Foody, G. M., 1992. “A fuzzy sets approach to the representation of vegetation continua from remotely sensed data: An example from lowland heath”, Photogramm. Eng. & Remote Sens., 58(2): 221–225. 3. Foody, G. M., 1999. “The Continuum of Classification Fuzziness in Thematic Mapping,” Photogramm. Eng. Remote Sens., 65(4): 443–451. 4. Jeansoulin, R., Y. Fonyaine, and W. Fri, 1981. “Multitemporal Segmentation by Means of Fuzzy Sets,” Proc. Machine Processing of Remotely Sensed Data Symp., 1981, pp. 336–339. 5. Kent, J. T., and K. V. Mardia, 1988. “Spatial Classification Using Fuzzy Member-Ship Models,” IEEE Trans. Pattern Anal. Machine Intel., 10(5): 659–671.
6. Cannon, R. L., J. V. Dave, J. C. Bezdek, and M. M. Trivedi, 1986. “Segmentation of A Thematic Mapper Image Using The Fuzzy C-Means Clustering Algorithm,” IEEE Trans. Geosci. and Remote Sensing, GRS-24(3): 400–408. 7. Yu, Shixin, De Backer, S., and Scheunders, P., 2000 “Genetic feature selection combined with fuzzy kNN for hyperspectral satellite imagery”, Proc. IEEE International Geosci. and Remote Sensing Symposium (IGARSS’00), July 2000, vol. 2, pp. 702–704. 8. Wang, F., 1990. “Improving Remote Sensing Image Analysis Through Fuzzy Information representation,” Photogramm. Eng. & Remote Sens., 56(8): 1163–1169. 9. Bárdossy, A., and L. Samaniego, 2002. “Fuzzy Rule-Based Classification of Remotely Sensed Imagery”, IEEE Trans. Geosci. and Remote Sens., 40(2): 362–374. 10. Melgani, F. A., B. A. Hashemy, and S. M. Taha, 2000. “An explicit fuzzy supervised classification method for multispectral remote sensing images”, IEEE Trans. Geosci. and Remote Sens., 38(1): 287–295. 11. Jamshidi, M., 1997. Large-Scale Systems: Modeling, control, and fuzzy logic, Prentice Hall Inc.. 12. Foody, G. M., 1996. “Relating the land-cover composition of mixed pixels to artificial neural network classification output”, Photogramm. Eng. & Remote Sens., 62: 491–499. 13. Warner, T. A., and M. Shank, 1997. “An evaluation of the potential for fuzzy classification of multispectral data using artificial neural networks”, Photogramm. Eng .& Remote Sens., 63: 1285–1294. 14. Tzeng, Y. C. and K. S. Chen, 1998. “A Fuzzy Neural Network to SAR Image Classification”, IEEE Trans. Geosci. and Remote Sens., 36(1). 15. Sun, C. Y., C. M. U. Neale, and H. D. Cheng, 1998. “A Neuro-Fuzzy Approach for Monitoring Global Snow and Ice Extent with the SSM/I”, Proc. IEEE International Geosci. and Remote Sensing Symposium (IGARSS’98), July 1998, vol. 3, pp. 1274–1276,. 16. Chen, L., D. H. Cooley, and J. P. Zhang, 1999. “Possibility-Based Fuzzy Neural Networks and Their Application to Image Processing”, IEEE Trans. Systems, Man, and Cybernetics-Part B: Cybernetics, 29(1): 119–126. 17. Bachmann, C., T. Donato, G. M. Lamela, W. J. Rhea, M. H. Bettenhausen, R. A. Fusina, K. R. Du Bois, J. H. Porter, and B. R. Truitt, 2002. “Automatic classification of land cover on Smith Island, VA, using HyMAP imagery,” IEEE Trans. Geosci. and Remote Sens., 40: 2313–2330. 18. Dell’Acqua, F., and Gamba, P., 2003. “Using image magnification techniques to improve classification of hyperspectral data”, Proc. IEEE International Geosci. and Remote Sensing Symposium (IGARSS ’03), July 2003, vol. 2, pp. 737–739. 19. Bales, C. L. K., 2001. Modeling Feature extraction for erosion sensitivity using digital image processing, Master Thesis, Department of Geography, University of New Mexico, Albuquerque, New Mexico. 20. ERDAS, 1997. ERDAS Field Guide, Fourth Edition, 1997, ERDAS@ , Inc. Atlanta, Georgia. 21. Duda, R. O., P. E. Hart, and D. G. Stork, 2001. Pattern Classification, Jone Wiley & Sons (Asia) Pte. Ltd., New York. 22. Dreier, M. E., 1991, FULDEK-Fuzzy Logic Development Kit, TSI Press, Albuquerque, NM. 23. Jang, J. S. R., 1993. “ANFIS: Adaptive-Network-Based Fuzzy Inference System”, IEEE Trans. Systems, Man, and Cybernetics, 23(3): 665–685. 24. Wang, Y., and M. Jamshidi, 2004. “Fuzzy Logic Applied in Remote Sensing Image Classification”, IEEE Conference on Systems, Man, and Cybernetics, 10-13 Oct. 2004, Hague, Netherlands, pp. 6378–6382. 25. Wang, Y., and M. Jamshidi, 2004. 
“Multispectral Landsat Image Classification Using Fuzzy Expert Systems”, World Automation Congress, June 28 - July 1, 2004, Seville, Spain, in Image Processing, Biomedicine, Multimedia, Financial Engineering and Manufacturing, vol. 18, pp. 21–26, TSI press, Albuquerque, NM, USA. 26. Jang, J.-S. R., C.-T. Sun, and E. Mizutani, 1997. Neuro-Fuzzy and Soft Computing, Prentice Hall, New Jersey.
Real World Applications of a Fuzzy Decision Model Based on Relationships between Goals (DMRG) Rudolf Felix
Abstract Real world applications of a decision model of relationships between goals based on fuzzy relations (DMRG) are presented. In contrast to other approaches the relationships between decision goals or criteria for each decision situation are represented and calculated explicitly. The application fields are decision making for financial services, optimization of production sequences in car manufacturing and vision systems for quality inspection and object recognition.
1 Introduction When human decision makers deal with complex decision situations, they intuitively deal with relationships between decision goals or decision criteria and reason about why these relationships exist (Felix 1991). As discussed before, for instance in (Felix 1995), other decision making models are not flexible enough and do not reflect the tension between interacting goals in the way human decision makers do. In contrast, the decision making model underlying the real world applications presented in this paper meets the required flexibility (Felix et al. 1996), (Felix 2001), (Felix 2003). The key issue of this model, called DMRG ("decision making based on relationships between goals"), is the formal definition of both positive and negative relationships between decision goals. When the model is used in applications, the relationships are calculated from the current decision data and are thus made explicit in every decision situation. After each calculation, the obtained information about the relationships between the goals, together with the priorities of the goals, is used to select the appropriate aggregation strategies. Finally, based on the selected decision strategies, adequate decisions are found.
Rudolf Felix FLS Fuzzy Logik Systeme GmbH Joseph-von-Fraunhofer Straße 20, 44 227 Dortmund, Germany e-mail: [email protected]
2 Basic Definitions Before we define relationships between goals, we introduce the notion of the positive impact set and the negative impact set of a goal. A more detailed discussion can be found in (Felix 1991, Felix et al. 1994).

Def. 1) Let A be a non-empty and finite set of potential alternatives, G a non-empty and finite set of goals, A ∩ G = Ø, a ∈ A, g ∈ G, δ ∈ (0, 1]. For each goal g we define two fuzzy sets Sg and Dg, each from A into [0, 1], by:

1. positive impact function of the goal g:
   Sg(a) := δ, if a affects g positively with degree δ; 0, otherwise.
2. negative impact function of the goal g:
   Dg(a) := δ, if a affects g negatively with degree δ; 0, otherwise.

Def. 2) Let Sg and Dg be defined as in Def. 1). Sg is called the positive impact set of g and Dg the negative impact set of g. The set Sg contains alternatives with a positive impact on the goal g, and δ is the degree of the positive impact. The set Dg contains alternatives with a negative impact on the goal g, and δ is the degree of the negative impact.

Def. 3) Let A be a finite non-empty set of alternatives. Let P(A) be the set of all fuzzy subsets of A. Let X, Y ∈ P(A), with x and y the membership functions of X and Y respectively. The fuzzy inclusion I is defined as follows:

I : P(A) × P(A) → [0, 1],
I(X, Y) := ( Σ_{a∈A} min(x(a), y(a)) ) / ( Σ_{a∈A} x(a) )  for X ≠ Ø,  and  I(X, Y) := 1 for X = Ø,

with x(a) ∈ X and y(a) ∈ Y. The fuzzy non-inclusion N is defined as:

N : P(A) × P(A) → [0, 1],  N(X, Y) := 1 − I(X, Y).

The inclusions indicate the existence of relationships between two goals. The higher the degree of inclusion between the positive impact sets of two goals, the more
cooperative the interaction between them. The higher the degree of inclusion between the positive impact set of one goal and the negative impact set of the other, the more competitive the interaction. The non-inclusions are evaluated in a similar way. The higher the degree of non-inclusion between the positive impact sets of two goals, the less cooperative the interaction between them. The higher the degree of non-inclusion between the positive impact set of one goal and the negative impact set of the other, the less competitive the relationship. Note that the pair (Sg, Dg) represents the entire known impact of the alternatives on the goal g. (Dubois and Prade, 1992) shows that (Sg, Dg) can be taken to be so-called twofold fuzzy sets. Then Sg is the set of alternatives which more or less certainly satisfy the goal g, and Dg is the fuzzy set of alternatives which are rather less possible, merely tolerable according to the decision maker.
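A minimal sketch of Def. 3, assuming that fuzzy sets over the alternative set A are represented as dictionaries mapping alternatives to membership degrees:

```python
def inclusion(x, y):
    """Fuzzy inclusion I(X, Y) of Def. 3."""
    denom = sum(x.values())
    if denom == 0:                 # X is the empty fuzzy set
        return 1.0
    num = sum(min(mu, y.get(a, 0.0)) for a, mu in x.items())
    return num / denom

def non_inclusion(x, y):
    """Fuzzy non-inclusion N(X, Y) = 1 - I(X, Y)."""
    return 1.0 - inclusion(x, y)
```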
3 Definition of Relationships Between Goals Based on the inclusion and non-inclusion defined above, 8 basic types of relationships between goals are defined. The relationships cover the whole spectrum from a very high confluence between goals (analogy) to strict competition (trade-off). Independence of goals and the case of unspecified dependence are also considered.

Def. 4) Let Sg1, Dg1, Sg2 and Dg2 be fuzzy sets given by the corresponding membership functions as defined in Def. 2). For simplicity we write S1 instead of Sg1, etc. Let g1, g2 ∈ G, where G is a set of goals. The relationships between two goals are defined as fuzzy subsets of G × G as follows:

1. g1 is independent of g2 :⇔ IS-INDEPENDENT-OF(g1, g2) := min(N(S1, S2), N(S1, D2), N(S2, D1), N(D1, D2)),
2. g1 assists g2 :⇔ ASSISTS(g1, g2) := min(I(S1, S2), N(S1, D2)),
3. g1 cooperates with g2 :⇔ COOPERATES-WITH(g1, g2) := min(I(S1, S2), N(S1, D2), N(S2, D1)),
4. g1 is analogous to g2 :⇔ IS-ANALOGOUS-TO(g1, g2) := min(I(S1, S2), N(S1, D2), N(S2, D1), I(D1, D2)),
5. g1 hinders g2 :⇔ HINDERS(g1, g2) := min(N(S1, S2), I(S1, D2)),
6. g1 competes with g2 :⇔ COMPETES-WITH(g1, g2) := min(N(S1, S2), I(S1, D2), I(S2, D1)),
7. g1 is in trade-off to g2 :⇔ IS-IN-TRADE-OFF(g1, g2) := min(N(S1, S2), I(S1, D2), I(S2, D1), N(D1, D2)),
Fig. 1 Subsumption of relationships between goals (ASSISTS – COOPERATION – ANALOGOUS; HINDERS – COMPETE – TRADE OFF)
8. g1 is unspecified dependent from g2 :⇔
   IS-UNSPECIFIED-DEPENDENT-FROM(g1, g2) := min(I(S1, S2), I(S1, D2), I(S2, D1), I(D1, D2)).

The relationships 2, 3 and 4 are called positive relationships. The relationships 5, 6 and 7 are called negative relationships. Note: The relationships between goals have a subsumption relation in the sense of Fig. 1. Furthermore, there are the following duality relations: assists ↔ hinders, cooperates ↔ competes, analogous ↔ trade-off, which correspond to the common sense understanding of relationships between goals. The relationships between goals turned out to be crucial for an adequate modeling of the decision making process (Felix 1994) because they reflect the way the goals depend on each other and describe the pros and cons of the decision situation. Together with information about goal priorities, the relationships between goals are used as the basic aggregation guidelines in each decision situation: Cooperative goals imply a conjunctive aggregation. If the goals are rather competitive, then an aggregation based on an exclusive disjunction is appropriate. In case of independent goals a disjunctive aggregation is adequate.
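To make Defs. 1–4 concrete, the following short Python sketch computes the fuzzy inclusion I, the non-inclusion N and, from them, the degrees of two of the relationships for a pair of goals represented by their impact sets. The impact sets, goal data and function names are hypothetical illustrations, not part of the original contribution.

```python
def inclusion(x, y, alternatives):
    """Fuzzy inclusion I(X, Y) as in Def. 3."""
    denom = sum(x.get(a, 0.0) for a in alternatives)
    if denom == 0:          # X is empty
        return 1.0
    num = sum(min(x.get(a, 0.0), y.get(a, 0.0)) for a in alternatives)
    return num / denom

def non_inclusion(x, y, alternatives):
    """Fuzzy non-inclusion N(X, Y) = 1 - I(X, Y)."""
    return 1.0 - inclusion(x, y, alternatives)

# Hypothetical impact sets (Def. 1) over alternatives a1..a4.
A = ["a1", "a2", "a3", "a4"]
S1 = {"a1": 0.9, "a2": 0.7}      # positive impact set of goal g1
D1 = {"a4": 0.8}                 # negative impact set of goal g1
S2 = {"a1": 0.8, "a2": 0.6}      # positive impact set of goal g2
D2 = {"a4": 0.5}                 # negative impact set of goal g2

# Two of the relationship degrees of Def. 4:
cooperates = min(inclusion(S1, S2, A),
                 non_inclusion(S1, D2, A),
                 non_inclusion(S2, D1, A))
trade_off = min(non_inclusion(S1, S2, A),
                inclusion(S1, D2, A),
                inclusion(S2, D1, A),
                non_inclusion(D1, D2, A))
print(cooperates, trade_off)     # here: high cooperation, no trade-off
```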
4 Types of Relationships Between Goals Imply Way of Aggregation

The observation that cooperative goals imply conjunctive aggregation and that conflicting goals rather lead to exclusive disjunctive aggregation is easy to understand from the intuitive point of view. The fact that the relationships between goals are defined as fuzzy relations based on both positive and negative impacts of alternatives on the goals provides the information about the confluence and competition between the goals. Figure 2 shows two different representative situations which can be distinguished appropriately only if, besides the positive impact of decision alternatives, additionally their negative impact on goals is considered.
Fig. 2 Distinguishing independence and trade-off based on positive and negative impact functions of goals
In case the goals were represented only by the positive impact of alternatives on them, situation A and situation B could not be distinguished and a disjunctive aggregation would be considered as plausible in both cases. However, in situation B a decision set S1 ∪ S2 would not be appropriate because of the conflicts indicated by C12 and C21. In this situation the set (S1/D2) ∪ (S2/D1) could be recommended in case that the priorities of both goals are similar (where X/Y is defined as the difference between the sets X and Y, that means X/Y = X ∩ Ȳ, where X and Y are fuzzy sets). In case that one of the two goals, for instance goal 1, is significantly more important than the other one, the appropriate decision set would be S1. The aggregation used in that case would then not be a disjunction, but an exclusive disjunction between S1 and S2 with emphasis on S1.
This very important aspect can easily be integrated into decision making models by analyzing the types of relationships between goals as defined in Def. 4. The information about the relationships between goals in connection with goal priorities is used in order to define relationship-dependent decision strategies which describe the relationship-dependent way of aggregation. For conflicting goals, for instance, the following decision strategy, which deduces the appropriate decision set, is given: If (g1 is in trade-off to g2) and (g1 is significantly more important than g2) then S1. Another strategy for conflicting goals is the following: If (g1 is in trade-off to g2) and (g1 is insignificantly more important than g2) then S1/D2. Note that both strategies use priority information. Especially in the case of a conflictive relationship between goals, priority information is crucial for an appropriate aggregation. Note also that the priority information can only be used adequately if in the decision situation there is a situation-dependent explicit calculation of which goals are conflictive and which of them are rather confluent (Felix 2000).
5 Related Approaches, a Brief Comparison

Since fuzzy set theory has been suggested as a suitable conceptual framework of decision making (Bellman and Zadeh 1970), two directions in the field of fuzzy decision making are observed. The first direction reflects the fuzzification of established approaches like linear programming or dynamic programming (Zimmermann 1991). The second direction is based on the assumption that the process of decision making is adequately modeled by axiomatically specified aggregation operators (Dubois and Prade 1984). None of the related approaches sufficiently addresses the crucial aspect of decision making, namely the explicit and non-hard-wired modeling of the interaction between goals. Related approaches like those described in (Dubois and Prade 1984), (Biswal 1992) either require a very restricted way of describing the goals or postulate that decision making shall be performed based on rather formal characteristics of aggregation like commutativity or associativity. Other approaches are based on strictly fixed hierarchies of goals (Saaty 1980) or on the modeling of the decision situations as probabilistic or possibilistic graphs (Freeling 1984). In contrast to that, human decision makers usually proceed in a different way. They concentrate on the information about which goals are positively or negatively affected by which alternatives. Furthermore, they evaluate this information in order to infer how the goals interact with each other and ask for the actual priorities of the goals. In the sense that the decision making approach presented in this contribution explicitly refers to the interaction between goals in terms of the defined relationships, it significantly differs from other related approaches (Felix 1995). The subsequently presented real world applications demonstrate the practical relevance of DMRG. In Sects. 6 and 7 two applications in financial services are described. Section 8 shows how DMRG is used for the optimization of sequences with which car manufacturers organize the production process. Finally, it is shown
in Sect. 9 how the concept of impact sets is used for image representation and that the relationships between goals can be applied for calculating similarity of images.
6 Application to Calculating Limit Decisions for Factoring

Factoring is the business of purchasing the accounts receivable of a firm (client) before their due date, usually at a discount. The bank to which the accounts are sold is called the factor. The factor has to decide what shall be the maximum acceptable limit of the accounts by evaluating the debtor of the client. For making its limit decisions, the factor follows a set of criteria. The decision alternatives consist of limit levels expressed, for instance, by percentages of the amount requested from the factor. Examples of decision criteria are explained in detail below.
6.1 Examples of Handling Decision Criteria

Let us explain some typical decision criteria:

1. Statistical Information Index SI. This index bears values of a specific SI-interval, ranging between SI1 and SI2. There is, however, a subinterval of [SI1, SI2] where the decision is made only in interaction with other criteria.
2. Integration in the Market. This value indicates in which way the company has been active in the market and what the corresponding risk of default is. Experience shows that the relation between life span and risk of default is not inversely proportional, but that it shows a non-linear behavior, for instance depending on relations to other companies.
3. Public Information Gained from Mass Media, Phone Calls, Staff, etc. This kind of information has to be entered into the factoring system by means of special, qualitatively densified data statements.
4. Finance Information. This information concerns aspects like credit granting, utilization and overdrawing as well as receivables. The factor's staff will express the information obtained using qualitatively scaled values. The values depend on the experience of interaction with other criteria.
6.2 Impact Sets For expressing decision criteria, decision makers use linguistic statements. They say, for instance, that a particular criterion may have the values high, neutral or low, as
Table 1 Impact sets for the criterion C and its values

Limit granted in % of L    High    Neutral    Low
100 %                      ++      0          –
75 %                       ++      +          –
50 %                       +       ++         –
25 %                       –       +          –
0 %                        –       –          ++
presented in Table 1. The decision alternatives reflect a granularity of percentage of the limit requested by the client. For simplicity we choose here a granularity of 25%. That means, when the limit requested is L, the limit decision equals a percentage ≤ 100% of the requested amount L. For each criterion C and its corresponding value the impact sets are defined according to Def. 1. For instance, for the value "high" of the criterion C, the entry "++" will be expressed, according to Def. 1, as a δ of S_C,high with a value close to 1. Analogously, an entry "–" will get, as a δ of D_C,high, a value close to 1. This is reflected by the set S_C,high for positive and by the set D_C,high for negative impacts. Since the decision maker is evaluating pros and cons, both positive and negative impacts must be expressed. Thus, both the positive and the negative impact sets S_C,high and D_C,high are needed.
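As a sketch of how the symbolic entries of Table 1 can be mapped onto impact sets in the sense of Def. 1 (the concrete δ values below are illustrative assumptions, not the ones used in the factoring system), one may proceed as follows:

```python
# Illustrative mapping of the symbolic entries of Table 1 to impact degrees (Def. 1).
# The delta values chosen here are assumptions for the sketch only.
SYMBOL_TO_DELTA = {
    "++": ("S", 0.9),   # strong positive impact
    "+":  ("S", 0.6),
    "0":  (None, 0.0),  # no impact
    "-":  ("D", 0.6),
    "--": ("D", 0.9),   # strong negative impact
}

# Column "high" of Table 1: alternative (limit in % of L) -> symbolic entry.
column_high = {"100%": "++", "75%": "++", "50%": "+", "25%": "-", "0%": "-"}

def impact_sets(column):
    """Build the positive (S) and negative (D) impact sets of one criterion value."""
    S, D = {}, {}
    for alternative, symbol in column.items():
        target, delta = SYMBOL_TO_DELTA[symbol]
        if target == "S":
            S[alternative] = delta
        elif target == "D":
            D[alternative] = delta
    return S, D

S_C_high, D_C_high = impact_sets(column_high)
print(S_C_high)   # {'100%': 0.9, '75%': 0.9, '50%': 0.6}
print(D_C_high)   # {'25%': 0.6, '0%': 0.6}
```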
6.3 Real World Application Environment The factoring application presented in this section is part of an internet-based software system used by a factoring bank. The quality of the system’s recommended decisions is equal to the decision quality achieved by senior consultants of the bank. An important aspect of the system is its standardization of the general quality of limit decisions. Using the system, a very high percentage of the limit decisions is made automatically.
7 Application to Cross-Selling

The so-called cross-selling is part of the business process that takes place during every advisory conversation in which a bank's customer applies for a credit. In case the bank employee decides to grant the credit, the question arises whether the bank should offer the customer additional products like, for instance, a life assurance, credit card, current account, etc. It is the task of the bank's sales assistant to define a ranking of the different products and then to select the product (decision alternative) which corresponds best to the customer's profile. The customer profile is defined by a variety of criteria which build up the bank's cross-selling policy. The task of cross-selling consists in checking to which extent the customer characteristics
(cross-selling signals) match the cross-selling criteria. Subsequently, the matching results are aggregated into the cross-selling product ranking. The cross-selling product ranking shows the bank's sales assistant in which sequence the cross-selling products are to be offered to the customer. Let us express the decision making situation in terms of the decision making model presented in Sect. 2: The set A of decision alternatives is the set of cross-selling products, for instance: A := {life assurance (LA), accident assurance (AA), credit card (CC), current account (CA)}. The set of decision goals G is the set of criteria under which the cross-selling ranking is performed, for instance: G := {customer lives close to a branch (LCB), customer already has other assurance policies (OAP), customer is young-aged (CYA), customer is old-aged (COA)}.
7.1 Impact Sets

The impact the cross-selling products have on the criteria is defined by the positive and negative impact functions (see Table 2). For simplicity, they are given in the following so-called impact matrix, which subsumes the resulting positive and negative impact sets S and D for each goal. The ranking is calculated based on the impact matrix, which is the result of a knowledge acquisition process with human expert decision makers. The ranking calculation can be extended by defining priorities attached to the criteria. Attaching such priorities enables the bank to define strategies that express on which criteria which emphasis should be put while calculating the ranking. Please note that in real world cross-selling the number of cross-selling products is about 10 and the number of goals (criteria) about 40. That means that a rule-based approach practically cannot be used for the representation of the cross-selling knowledge, because there are approximately 2^40 · 2^10 dependencies between the criteria (goals) and the decision alternatives (cross-selling products). Using the decision model presented in Sect. 2, at most 2 · 10 · 40 statements are necessary. Generally speaking, using DMRG the complexity of the knowledge description and therefore the complexity of the knowledge acquisition process can be decisively reduced and is estimated by O(m · n), where m is the cardinality of the set of alternatives A and n the cardinality of the set of goals (criteria) G. In contrast to
Table 2 Example of an Impact Matrix for Cross-Selling Decisions

Goals           LCB              OAP              CYA              COA
Alternatives    SLCB   DLCB      SOAP   DOAP      SCYA   DCYA      SCOA   DCOA
LA              Ø      Ø         0.9    Ø         0.6    Ø         Ø      0.6
AA              Ø      Ø         0.6    Ø         0.9    Ø         0.6    Ø
CC              Ø      Ø         Ø      Ø         Ø      0.6       0.5    Ø
CA              0.8    Ø         Ø      Ø         0.9    Ø         0.6    Ø
this, conventional (fuzzy) rule-based approaches require a complexity of O(2^m · 2^n). The decisively lower complexity of the knowledge acquisition process is a key issue when introducing fuzzy decision making models into real world application fields like the one presented in this section.
7.2 Real World Application Environment

The cross-selling application presented in this section is part of a credit and cross-selling software system used by a private customer bank with branches in more than 55 German cities and over 300 consultants using the application. The system has been working for more than five years and is implemented in all direct crediting and cross-selling processes. The quality of the system's recommended decisions is equal to the decision quality achieved by senior consultants of the bank. The system provides for a standardization of the cross-selling decision quality across the bank's branches.
8 Application to the Optimization of Production Sequences in Car Manufacturing

The production process of cars is performed in various successive steps. In the first step, the car bodies are built and moved to a store before the painting area. Then, the car bodies are fetched from the store, painted in the so-called paint shop and placed in another store before entering the assembly line. According to the production program, the painted car bodies are fetched from this store one by one and transported to the assembly line, where modules such as undercarriage, motor, wheels, or, depending on the particular car, special equipment such as electric window lifters, sunroofs or air-conditioning systems, are assembled until the entire car assembly process is finished. The assembly is carried out at various assembly stations usually connected in series, where specialized teams of workers perform the different assembly work. Depending on the particular car and its equipment, each car body that passes the assembly line creates a different workload at the individual assembly stations. Figure 3 gives an overview of the structure of the assembly process. For simplicity, the store between the body works and the paint shop is not shown. The sequence with which the car bodies are fetched from the store and entered into the assembly line is crucially important for both the degree of capacity utilization at the assembly stations and the smoothness of the assembly process. It is ultimately extremely important for both the quality of the cars and adequate production costs. A good sequence is organized in such a way that the utilization of capacity is as constant as possible and as near to 100% as possible, in order to guarantee minimal possible costs of the assembly process with respect to the cars to be produced in a planning period. At the same time the sequence must meet a large
Fig. 3 Assembly process of cars
number of technical restrictions which define which sequences are admissible in terms of technical conditions, for instance, the availability of tools. These conditions basically describe which equipment variants can be assembled after one another and in which combination. Examples of such conditions are that, for instance, approximately only every second car may have a sunroof (distribution goal), exactly every third car must be a car with an air conditioning system (set point distance goal), and sedans must be sequenced in groups of four cars (group goal), etc. A particular car may have 50 and more different relevant equipment items. To each equipment item a distribution goal, a group goal or a distance goal may be attached simultaneously. This means that there are approximately 50 to 100 optimization goals and more to be met at the same time when calculating the sequence. Of course, the complexity of the optimization problem is exponential in terms of the number of these optimization goals (approximately 50 to 100) and factorial in terms of the number of cars to be sequenced.
8.2 Impact Sets

In the sense of DMRG the distribution goals, the distance goals, the group goals etc. are goals g1, . . . , gn with their positive and negative impact sets Sg1, . . . , Sgn and Dg1, . . . , Dgn defined by the positive and negative impact functions. Let us describe how the impacts are defined based on an example of an optimization goal of the type 'group' (see Fig. 4). The value 'Setpoint' indicates the set point of the group. The values 'Min' and 'Max' indicate the minimum and the maximum length of the group respectively. In case of values that are smaller than 'Min' the positive impact is high. Between 'Min' and 'Setpoint' the positive impact decreases until 0. Between 'Setpoint' and 'Max' the negative impact increases and from the value 'Max' on the negative impact becomes the highest possible.
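A minimal sketch of such impact functions for a goal of type 'group' is given below. The piecewise-linear shape and the example parameters are assumptions consistent with the verbal description above; the actual system may use different curves and values.

```python
def group_goal_impacts(length, g_min, setpoint, g_max):
    """Positive and negative impact of extending the current group to `length`.

    Below g_min the positive impact is maximal; it decreases linearly to 0 at
    the setpoint. Beyond the setpoint the negative impact grows linearly and
    is maximal from g_max on (piecewise-linear assumption).
    """
    if length <= g_min:
        positive = 1.0
    elif length < setpoint:
        positive = (setpoint - length) / (setpoint - g_min)
    else:
        positive = 0.0

    if length <= setpoint:
        negative = 0.0
    elif length < g_max:
        negative = (length - setpoint) / (g_max - setpoint)
    else:
        negative = 1.0
    return positive, negative

# Example: a hypothetical group goal with Min=2, Setpoint=4, Max=6.
for n in range(1, 8):
    print(n, group_goal_impacts(n, g_min=2, setpoint=4, g_max=6))
```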
8.3 Calculation of a Sequence The calculation of a sequence is performed iteratively. In every iteration step the current set points are calculated based on the current production status and the
Fig. 4 Positive and Negative Impact Sets for an Optimization Goal of Type ‘group’
content of the store. For every optimization goal the impact sets are calculated and the DMRG is invoked to select the best car for the next position within the sequence. The procedure terminates if either all available cars have been sequenced or there are conflicts in the requirements of the optimization goals which cannot be resolved based on the content of the store. In such a case the current status of the conflicts and confluences between the goals is reported based on the relationships between the goals specified in Def. 4. The relationships between the optimization goals are then used to readjust the optimization goals.
8.4 Real World Application

The principle of the optimization presented in this section is used at more than 20 different factories for more than 30 different car models. It is used for both the pre-calculation of the sequences (planning mode) and the online execution of the sequence during the production process (execution mode). The planning mode is used for calculating sequences of a package of cars scheduled for a period of approximately one day and is performed two or three days before the production. The execution mode continuously optimizes the sequence during the assembly process.
9 Application to Image Comparison and Recognition In the field of image analysis fuzzy techniques are often used for extraction of characteristics and for the detection of local image substructures like edges
(Bezdek 1994). In these approaches the results of the analysis procedure often improve the visual quality of the image but do not lead to an automated evaluation of the content of the image in the sense of image comparison as required in the field of industrial quality inspection, for instance, where the membership to object classes has to be recognized in order to make ok-or-not-ok decisions. Other approaches like the so-called syntactic pattern recognition or utility functions (Gonzalez and Thomason 1978) refer more directly to a comparison of images but the corresponding algorithms lead to a high computational complexity (polynomial or even exponential with respect to the number of image characteristics) (Felix et al. 1994). What is required in order to make a step forward in the field of image comparison and/or recognition is a comparison of the similarity of two or more images based on algorithms performing the comparison and/or recognition in linear computation time.
9.1 Impact Sets and Image Representation

One attempt to meet this requirement is the following: We represent images based on the concept of impact sets (Def. 1). The impact sets are derived from the presence and absence of elementary image characteristics in the particular image. Examples of such elementary image characteristics are average grey levels or average grey level gradients. The advantage of this image representation is that it can be derived from any image without knowing its content. The representation of images in terms of impact sets leads to the idea of considering the images as being decision goals in the sense of DMRG. If we accept this idea then the positive and negative relationships as defined in Def. 4 can be used as a tool for the comparison of images. Positive relationships indicate similarity, negative relationships indicate non-similarity of images. The final recognition decision is made by comparing the value of the relationships, for instance, with empirically or statistically estimated (learned) threshold values. Another way to generate the final recognition decision is the selection of images with the highest relationship value compared to a reference image that stands for a class of images to be identified. More complex decision and aggregation strategies as used in other applications of DMRG are not needed. In order to integrate location-oriented information into the method of representing images by impact sets we partition the whole image by a grid structure of sectors. Then, for each sector the elementary characteristics (average grey level values and average grey level gradients) are calculated. In this way a vector of sector characteristics is calculated. Every component of the vector corresponds to one sector of the image and consists of values of the characteristics. To every sector an index is attached. The corresponding indices of each vector component with its characteristic indicate a location-oriented distribution of the characteristics within the image. Based on the indices, impact sets can again be used in order to represent the image, where the index values are the elements and the characteristics of the sectors are the membership values. Please note that the computational complexity of the representation, comparison and recognition of images based for instance on the notion
of analogy (Def. 4.4) is O(n · m), where n is the number of image sectors and m the number of elementary characteristics under consideration.
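The following rough Python sketch illustrates the sector-based representation and a comparison built from the fuzzy inclusion of Def. 3. The grid size, the single characteristic (average grey level) and the symmetric min-of-inclusions similarity are illustrative choices made for this sketch, not the exact measure or parameters of the industrial systems; the overall cost is linear in the number of sectors and characteristics, as stated above.

```python
def sector_features(image, grid=4):
    """Impact set of an image: sector index -> normalized average grey level."""
    h, w = len(image), len(image[0])
    sh, sw = h // grid, w // grid
    s = {}
    for i in range(grid):
        for j in range(grid):
            block = [image[r][c]
                     for r in range(i * sh, (i + 1) * sh)
                     for c in range(j * sw, (j + 1) * sw)]
            s[(i, j)] = sum(block) / (255.0 * len(block))
    return s

def inclusion(x, y):
    """Fuzzy inclusion I(X, Y) as in Def. 3."""
    denom = sum(x.values())
    if denom == 0:
        return 1.0
    return sum(min(v, y.get(k, 0.0)) for k, v in x.items()) / denom

def similarity(s1, s2):
    """Symmetric similarity built from the fuzzy inclusion (sketch choice)."""
    return min(inclusion(s1, s2), inclusion(s2, s1))

reference = sector_features([[200] * 16 for _ in range(16)])
candidate = sector_features([[190] * 16 for _ in range(16)])
print(similarity(reference, candidate))   # close to 1 for visually similar images
```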
9.2 Qualitative and Structural Image Representation, Comparison and Recognition The higher the number of sectors, the more details of the image are represented in a location-oriented way. Therefore, with an increasing number of sectors the representation of the image becomes increasingly structural. The lower the number of sectors the less structural information is considered and the more qualitative becomes the representation of the image. The term “qualitative” refers to image representation, which does not concern information about the location of the characteristics within the image at all: Qualitative analysis refers rather to the presence or absence of image characteristics, to their distribution within the image and to the intensity of the distribution. In contrast to this, structural image representation additionally takes location oriented characteristics into account. In the extreme case of structural analysis, each sector corresponds to a single pixel. The extreme case of qualitative representation is reached when the entire image is contained as the one and only sector.
9.3 Real World Applications

Both the structural and the qualitative image analysis have been applied in a number of industrial applications in the field of quality inspection and optical object recognition. The problem-independent representation of images opens a variety of possible application fields. Inspection of the quality of punched work pieces, quality analysis of casings, and robot-controlled placement and positioning actions in process automation (Felix et al. 1994) are examples of successful applications. Subsequently we describe the quality inspection of rubber surfaces and the optical recognition of tire profiles for the supervision of the tire flow in the process of tire production.
9.3.1 Recognition of the Quality of Rubber Surfaces

Surface classification of rubbers is needed for so-called soot dispersion checks of rubber quality (Felix, Reddig 1993). Different qualities of the dispersion process can be observed using a microscope. A high-quality dispersion leads to smooth surfaces whereas a lower quality is given when the surface becomes rough. Figure 5 shows an industrial standard classification consisting of six different quality classes. The upper left image in Fig. 5 represents the highest quality class, the lower right
Fig. 5 Six different quality classes of rubber surfaces and a surface to be classified
the lowest quality class. An example of a surface to be classified is given in the right-hand part of the figure. In this application basic characteristics such as average grey levels and grey level gradients are used without any additional location orientation. For the surface classification, location-oriented information is not needed, because of the purely qualitative character of the information being evaluated. Location-oriented information would even be disturbing here, as two images with different locations of the same kind of disturbances represent identical surface qualities. The application has been implemented for an automotive supplier company producing so-called rubber-metal components.
9.3.2 Recognition of Tire Profiles for Supervision of Tire Flow

In a tire plant many thousands of tires are produced a day. Every day hundreds of different tire types are transported by conveyors through the factory from the production area through the inspection area to the delivery store. Before entering the delivery store, the tires have to be supervised in order to ensure that every tire is going to be located in the right area in the store. For this purpose every single tire has to be identified for a supervised entry into the store. An important identification criterion is the profile of the tread of the tire. The structural image recognition discussed above has been applied within a tire identification system containing more than 200 different tire types. Figure 6 shows the user interface of the system used for the teach-in procedure of the reference images describing the tire types. In its recognition mode the system works automatically without any user interaction. It automatically recognizes the tires and gives the information about each recognized tire type to the PLC, which is responsible for the control of the conveyors and the supervision of the material flow. Figure 6 also indicates the grid partitioning the system uses in order to represent and evaluate the structure of the profile of the tire tread. The recognition of the profile of the tire tread is solved using the definition of the positive relationship of type 'analogy' (Def. 4). The recognition is positive if the value of the positive relationship reaches a specific threshold.
Fig. 6 Profile Recognition of Tires
10 Conclusions

In this contribution real world applications of the decision making model called DMRG have been presented. The key issue of this model is the formal definition of both positive and negative relationships between decision goals. The relationships between goals are calculated dynamically, driven by situation-dependent current decision data. The real world applications presented above cover a wide range of different application fields. What the applications have in common is that all of them take advantage of the explicit and situation-dependent calculation of relationships between the decision goals. This justifies, in the opinion of the author, the statement that explicit modeling of relationships between goals is crucial for further development of adequate decision making models.
References

1. Bellman RE, Zadeh LA (1970) Decision Making in a Fuzzy Environment. Management Sciences 17: 141–164
2. Bezdek JC (1994) Edge Detection Using the Fuzzy Control Paradigm. Proceedings of EUFIT '94, Aachen
3. Biswal MP (1992) Fuzzy programming technique to solve multiobjective geometric programming problems. Fuzzy Sets and Systems 51: 67–71
4. Dubois D, Prade H (1984) Criteria Aggregation and Ranking of Alternatives in the Framework of Fuzzy Set Theory. Studies in Management Sciences 20: 209–240
5. Dubois D, Prade H (1992) Fuzzy Sets and Possibility Theory: Some Applications to Inference and Decision Processes. In: Reusch B (ed) Fuzzy Logik - Theorie und Praxis. Springer, Berlin, pp 66–83
6. Felix R (1991) Entscheidungen bei qualitativen Zielen. Ph.D. thesis, University of Dortmund, Department of Computer Sciences
7. Felix R, Reddig S (1993) Qualitative Pattern Analysis for Industrial Quality Assurance. Proceedings of the 2nd IEEE Conference on Fuzzy Systems, San Francisco, USA, Vol. I: 204–206
8. Felix R (1994) Relationships between goals in multiple attribute decision making. Fuzzy Sets and Systems 67: 47–52
9. Felix R, Kretzberg T, Wehner M (1994) Image Analysis Based on Fuzzy Similarities. In: Kruse R, Gebhardt J, Palm R (eds) Fuzzy Systems in Computer Science. Vieweg, Wiesbaden
10. Felix R (1995) Fuzzy decision making based on relationships between goals compared with the analytic hierarchy process. Proceedings of the Sixth International Fuzzy Systems Association World Congress, Sao Paulo, Brazil, Vol. II: 253–256
11. Felix R, Kühlen J, Albersmann R (1996) The Optimization of Capacity Utilization with a Fuzzy Decision Support Model. Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, New Orleans, USA, pp 997–1001
12. Felix R (2000) On fuzzy relationships between goals in decision analysis. Proceedings of the 3rd International Workshop on Preferences and Decisions, Trento, Italy, pp 37–41
13. Felix R (2001) Fuzzy decision making with interacting goals applied to cross-selling decisions in the field of private customer banking. Proceedings of the 10th IEEE International Conference on Fuzzy Systems, Melbourne, Australia
14. Freeling ANS (1984) Possibilities Versus Fuzzy Probabilities – Two Alternative Decision Aids. In: Fuzzy Sets and Decision Analysis, pp 67–82, Studies in Management Sciences, Vol. 20
15. Gonzalez RC, Thomason MG (1978) Syntactic Pattern Recognition – An Introduction. Addison Wesley
16. Saaty TL (1980) The Analytic Hierarchy Process. McGraw-Hill
17. Zimmermann HJ (1991) Fuzzy Set Theory and its Applications. Kluwer/Nijhoff
Applying Fuzzy Decision and Fuzzy Similarity in Agricultural Sciences Marius Calin and Constantin Leonte
Abstract Decision making in environments that do not allow much algorithmic modeling is not easy. Even more difficulties arise when part of the knowledge is expressed through linguistic terms instead of numeric values. Agricultural Sciences are such environments and Plant Breeding is one of them. This is why Fuzzy Theories can be very useful in building Decision Support Systems in this field. This chapter presents a number of Fuzzy methods to be used for decision support in the selection phase of Plant Breeding programs. First, the utilization of the Fuzzy Multi-Attribute Decision Model is discussed and a case study is analyzed. Then, two Fuzzy Querying methods are suggested. They can be used when a database must be searched in order to extract the information to be utilized in decision making. In this context, the concept of Fuzzy Similarity is important. Ways to employ it in database querying are suggested.
1 Introduction

1.1 Why Use Fuzzy Theories in Agricultural Research?

Fuzzy Theories provide powerful tools to formalize linguistic terms and vague assertions. It was long ago accepted that in Biological Sciences much knowledge is expressed in this form. Being strongly related to the former, Agricultural Sciences are also fit for such approaches. Many decisions are made not by means of numeric values, but on the basis of previous expertise, non-numeric evaluation, or linguistically expressed similarities. An important branch of Agricultural Sciences is Plant Breeding. The basic aim of a Plant Breeding program is to produce a cultivated crop variety (so-called cultivar) with superior characteristics i.e. higher production levels and/or better accommodation to some environmental factors. Recent developments in Genetics
brought new approaches in Plant Breeding. Detailed description of different genotypes can now be made and, through Genetic Engineering, direct action at DNA level has become possible. However, large-scale use of Genetic Engineering for obtaining genetically modified crops is a heavily disputed issue. The danger of the unmonitored, widespread use of transgenic seeds was emphasized by numerous scientists and organizations [11, 6]. In this context, organic agriculture and organic food production become more and more popular. These activities encourage the development of organic plant breeding programs. A new concept, Participatory Plant Breeding (PPB), emerged in the last decade. PPB aims to involve [20] many actors in all major stages of the breeding and selection process, from scientists, farmers and rural cooperatives, to vendors and consumers. All these show that classic work procedures in Plant Breeding are far from becoming obsolete. Within a Plant Breeding program many decision situations occur. The parental material consists of different sources of cultivated and spontaneous forms (so-called germplasm). The breeder's decisions [13] include: which parents to use, how to combine them, what, when, and how to select, what to discard, what to keep. Usually, a large number of individuals are taken into consideration. For each of them, a set of characters is measured in order to be used in the selection process. The number of monitored characters can also be large. As one can see, selection is an important step. Its efficiency is conditioned by many factors. A very important one is the expert's ability to identify the individuals that fulfill the goals for most of the considered characters. The difficulty of processing a large amount of numerical data in order to make the selection is evident and the usefulness of software tools to aid decision-making and data retrieving doesn't need any further argumentation. However, much of the handled information is expressed through linguistic terms rather than crisp, numerical values. A plant breeding professional will properly deal with qualifiers like "low temperature", "about 50 grains per spikelet" or "medium bean weight", despite their imprecise nature. If such terms are to be handled within a software tool, such as a Decision Support System, they must be properly formalized. Fuzzy Sets Theory and Fuzzy Logic provide mathematical support for such formalization.
1.2 Fuzzy Terms, Fuzzy Variables, and Fuzzy Relations

To discuss the topics in the subsequent sections, some well-known definitions [23, 17, 16] must be reviewed.

Definition 1. A fuzzy set is a mapping

F : U → [0, 1].   (1)
where for any x ∈ U, F(x) is the membership degree of x in F. The referential set U is said to be the universe of discourse. The family of all fuzzy sets in U can be denoted F(U).

Definition 2. For a fuzzy set F on U, the following parts are defined [17]:

- the kernel of F:    KER(F) =def {x | x ∈ U ∧ F(x) = 1},   (2)
- the co-kernel of F: COKER(F) =def {x | x ∈ U ∧ F(x) = 0},   (3)
- the support of F:   SUPP(F) =def {x | x ∈ U ∧ F(x) > 0}.   (4)
A linguistic expression can be represented as a fuzzy set whose universe of discourse is defined by the referred element (temperature, concentration, weight, etc.). This is why the names fuzzy term and linguistic term are considered to be equivalent. The membership function F(x) describes the degree in which, for any value in U, the linguistic statement may be considered to be true. Several fuzzy terms in the family F(U) can be defined to cover the entire universe of discourse with non-zero values, that is a fuzzy variable [17]. The definition of fuzzy terms and fuzzy variables can rely on various criteria like subjective belief, prior experience or statistical considerations. In fact, defining the appropriate fuzzy variables is a topmost step in approaching a problem through Fuzzy Theories. Examples of fuzzy variables defined on different universes of discourse will be given later on a real world application. Anticipating the discussion, three preliminary remarks can be made. • Most generally, the membership function F(x) can have any shape. In practice there are only a few shapes utilized for defining fuzzy terms. The most frequently used are the trapezoidal fuzzy sets, also named Fuzzy Intervals, and the triangular fuzzy sets or Fuzzy Numbers [21]. • In most cases, a real world problem leads to modeling the linguistic terms through normal fuzzy sets. A fuzzy set F is called normal if there is at least one x ∈ U such that F(x) = 1. • It has been remarked that for defining a fuzzy variable, no more than nine fuzzy terms are generally needed. In many cases three or five are sufficient. This is directly related to the human ability to define qualitative nuances. Linguistic terms and variables are often used in various branches of biological sciences. Definition 3. Let U and V be two arbitrary sets and U × V their Cartesian product. A fuzzy relation (in U × V) is a fuzzy subset R of U × V. Remark 1. When U = V (the relation is defined between elements of the same universe), R is said to be a fuzzy relation in U. Definition 4. Let R ⊆ X × Y and S ⊆ Y × Z be two fuzzy relations. The composition of the given fuzzy relations, denoted R ◦ S, is the fuzzy relation
(R ◦ S)(x, z) = sup_{y∈Y} min(R(x, y), S(y, z)).   (5)
Departing from the classic concept of relation of equivalence, an extension to Fuzzy Sets Theory [16] was made, that is the relation of similarity. It aims to model and evaluate the "resemblance" between several members of a universe of discourse.

Definition 5. A relation of similarity R in the universe U is a fuzzy relation with the following properties:

- reflexivity:   R(x, x) = 1, x ∈ U;   (6)
- symmetry:      R(x, y) = R(y, x), x, y ∈ U;   (7)
- transitivity:  R ◦ R ⊆ R, i.e. R(x, z) ≥ sup_{y∈U} min(R(x, y), R(y, z)), x, z ∈ U.   (8)
Definition 6. A relation R is a relation of tolerance [21] iff it has the properties of reflexivity and symmetry. When the universe U is finite, U = {x1, . . . , xn}, a fuzzy relation R can be described through an n × n matrix A_R = (a_ij), i, j = 1, . . . , n. The elements a_ij express the relation between the corresponding pairs of members of the universe U. They have the property 0 ≤ a_ij ≤ 1. If R is reflexive, then a_ii = 1, i = 1, . . . , n. If R is symmetrical, then a_ij = a_ji, i, j = 1, . . . , n. A classical relation of equivalence allows the partition of the universe in classes of equivalence. In the case of fuzzy relations of similarity, defining the corresponding classes of similarity is possible, although handling them proves difficult. Unlike classes of equivalence, which are disjoint, classes of similarity can overlap. Moreover, when defining a fuzzy relation of similarity in a real world application, it is quite difficult to fulfill the property of transitivity. This is why attempts were focused on defining fuzzy similarity, rather than fuzzy equivalence [16, 21, 25]. One of these definitions [25] is used in a subsequent section to build a fuzzy database querying method dedicated to assist decision in the selection phase of a plant breeding program. Departing from Definition 5, another way to define a fuzzy relation of similarity is to build it as the limit of a sequence of fuzzy relations [16]. Corresponding classes of similarity can be then defined. The departing point is a reflexive and symmetrical fuzzy relation. In practice, the actual problem is to define this relation in a convenient way. In Sect. 3.3 a method is presented to express the matching degree of two arbitrary intervals through a fuzzy relation having the properties of reflexivity and symmetry, that is a fuzzy relation of tolerance. This matching degree can be used in assessing the "equivalence" of interval valued characters between a prototype plant and other varieties of the same kind.
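For a finite universe, the properties of Definitions 4–6 can be checked directly on the matrix representation. The following Python sketch, with a small made-up tolerance matrix, computes the max-min composition R ◦ R and tests reflexivity, symmetry and transitivity; it also illustrates the remark above that transitivity is often the property that fails in practice.

```python
# Small made-up relation matrix on a universe of three elements (sketch only).
R = [
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.5],
    [0.4, 0.5, 1.0],
]

def compose(a, b):
    """Max-min composition (Definition 4): (A o B)(x, z) = sup_y min(A(x,y), B(y,z))."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_reflexive(r):
    return all(r[i][i] == 1.0 for i in range(len(r)))

def is_symmetric(r):
    n = len(r)
    return all(r[i][j] == r[j][i] for i in range(n) for j in range(n))

def is_transitive(r):
    rr = compose(r, r)
    n = len(r)
    return all(rr[i][j] <= r[i][j] for i in range(n) for j in range(n))

# A reflexive and symmetric relation is a relation of tolerance (Definition 6);
# if it is also transitive, it is a relation of similarity (Definition 5).
print(is_reflexive(R), is_symmetric(R), is_transitive(R))   # True True False
```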
2 Applying Fuzzy Decision in Plant Breeding 2.1 The Fuzzy Multi-Attribute Decision Model A decision problem that can be approached using the Multi-Attribute Decision Model (MADM) is expressed by means of an m × n decision matrix:
        C1     C2     ...    Cn
X1      x11    x12    ...    x1n
X2      x21    x22    ...    x2n
...     ...    ...    ...    ...
Xm      xm1    xm2    ...    xmn
The lines of the decision matrix stand for m decision alternatives X1, . . . , Xm that are considered within the problem and the columns signify n attributes or criteria according to which the desirability of an alternative is to be judged. An element xij expresses in numeric form the performance rating of the decision alternative Xi with respect to the criterion Cj. The aim of MADM is to determine an alternative X* with the highest possible degree of overall desirability. There are several points of view in designating X*. The maximin and maximax methods are two classical approaches (short description and examples in [7]). Classic MADM works with criteria that allow exact, crisp, expression (e.g. technical features of some device). But within an MADM, the decision-maker could express the criteria Cj as linguistic terms, which leads to the utilization of fuzzy sets to represent them. Thus, a Fuzzy MADM is achieved and the element xij of the decision matrix becomes the value of a membership function: it represents the degree in which the decision alternative Xi satisfies the fuzzy criterion Cj. To find the most desirable alternative, the decision-maker uses an aggregation approach, which generally has two steps [24]:

• Step 1. For each decision alternative Xi, the aggregation of all judgements with respect to all goals.
• Step 2. The rank ordering of the decision alternatives according to the aggregated judgements.

The aforesaid maximin and maximax methods for classic MADM are such aggregation approaches that use the min and max operators. For example, in the maximin method, in step 1 the minimum value from each line of the decision matrix is picked; then, in step 2, the maximum of the previously found m values is determined. Moving towards the Fuzzy MADM domain, Bellman and Zadeh [1] introduced the Principle of Fuzzy Decision Making stating the choice of the "best compromise alternative" by applying the maximin method to the membership degrees expressed by xij. There are numerous other methods that use diverse aggregation operators, other than min and max. Not all of them were particularly intended for solving MADM problems, but many can be used in step 1. Authors enumerate different types of
averaging operators [7, 24] appropriate for modeling real world situations where people have a natural tendency to apply averaging judgements to aggregate criteria. A special mention must be made for the ordered weighted averaging (OWA) operators. These were introduced by Yager [22], and have the basic properties of averaging operators but are supplementary characterized by a weighting function. A simple and frequently used method to perform step 1 of the MADM, the aggregation for each decision alternative Xi, is the calculation of an overall rating Ri using relation (9):

Ri = Σ_{j=1}^{n} wj · xij , i = 1, . . . , m.   (9)
where each wj is a weight expressing the importance of criterion Cj. Having these ratings computed, step 2 simply consists of sorting them in descending order.
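A minimal Python sketch of the aggregation of relation (9) and the subsequent ranking of step 2 is shown below; the membership degrees and weights are made up for illustration only.

```python
# Decision matrix: membership degree of each alternative in each fuzzy criterion
# (made-up numbers for the sketch).
x = [
    [0.9, 0.4, 1.0],   # alternative X1
    [0.6, 0.8, 0.7],   # alternative X2
    [1.0, 0.2, 0.5],   # alternative X3
]
w = [0.5, 1.0, 0.8]    # importance weights of the criteria C1..C3

# Step 1: overall rating R_i = sum_j w_j * x_ij  (relation (9)).
ratings = [sum(wj * xij for wj, xij in zip(w, row)) for row in x]

# Step 2: rank the alternatives by descending rating.
ranking = sorted(range(len(x)), key=lambda i: ratings[i], reverse=True)
print(ratings)                         # [1.65, 1.66, 1.1]
print([f"X{i + 1}" for i in ranking])  # ['X2', 'X1', 'X3']
```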
2.2 The Analytic Hierarchy Process

The problem in calculating Ri is to evaluate the weights wj. Usually they express a subjective importance that the decision-maker assigns to the respective criteria. A fixed weight vector is difficult to find, as human perception and judgment can vary according to information inputs or psychological states of the decision maker [7]. For the evaluation of the subjective weights wj, Saaty [18] proposed an intuitive method: AHP (Analytic Hierarchy Process). It is based on a pairwise comparison of the n criteria. The decision maker builds a matrix A = [a_ij]n×n whose elements a_kl are assigned with respect to the following two rules:

• Rule 1. a_kl = 1/a_lk, k = 1, . . . , n, l = 1, . . . , n;
• Rule 2. If criterion k is more important than criterion l, then assign to criterion k a value from 1 to 9 (see in Table 1 a guide for assigning such values).

The weights wj are determined as the components of the normalized eigenvector corresponding to the largest eigenvalue of the matrix A. This requires solving the equation det[A − λI] = 0, which might be a difficult task when A is a large

Table 1 Saaty's scale of relative importance

Intensity of relative importance    Definition
1                                   equal importance
3                                   weak importance
5                                   strong importance
7                                   demonstrated importance
9                                   absolute importance
2, 4, 6, 8                          intermediate values
matrix, that is, many decision criteria are considered. An alternative solution [7] that gives a good approximation of the normalized eigenvector components, but is easier to apply, consists of calculating the normalized geometric means of the rows of matrix A.
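This approximation can be sketched as follows, assuming a small 3 × 3 pairwise comparison matrix made up for illustration (note that the normalization convention, here "weights sum to one", is a choice of the sketch):

```python
import math

# Made-up pairwise comparison matrix (Saaty scale) for three criteria.
A = [
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
]

# Geometric mean of each row, then normalization: an easy-to-apply approximation
# of the principal eigenvector components.
geo_means = [math.prod(row) ** (1.0 / len(row)) for row in A]
total = sum(geo_means)
weights = [g / total for g in geo_means]
print([round(value, 3) for value in weights])   # approximately [0.648, 0.230, 0.122]
```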
2.3 Applying the Fuzzy MADM within a Plant Breeding Program As shown above, every plant-breeding program involves a selection phase. When choosing the parents for the next stages, a number of characters are observed and measured. The result of these measurings is a list of numeric values: one row for each plant and one column for each character. Afterwards, this list must be examined in order to decide which plants to select. It is a large one, because the number of examined individuals is usually great and numerous characters are measured. This is why a decision support tool can be useful and MADM can underlie it. Moreover, the utilization of fuzzy techniques is appropriate as the specialist often deals with linguistic values. This leads to the Fuzzy MADM. The assisted selection methodology [4] has seven steps that are described further on. The application of each step will be illustrated through an example that consists of selecting the parents from a set of 150 plants of pod bean, variety Cape. The study aimed to evaluate the potential of this cultivar in view of adapting it to new site factors. Ten characters were measured for each plant: height, number of branches, number of pods, average length of pods, average diameter of pods, number of beans, average number of beans per pod, weight of beans, average weight of beans per pod, weight of 1000 beans. Table 2 contains the results obtained for only 10 out of 150 plants. It suggests the difficulty of dealing with the entire list of 1500 entries.
Table 2 Results of measuring some characters of 10 plants of the pod bean variety Cape

ID   H (cm)  Brs  P/pl  Avg. p.l. (cm)  Avg. p.d. (cm)  B/pl     Avg. B/pd  Bw/pl   Avg. Bw/pd  W1000 (g)
1    29      5    10    9.80            0.78            32.00    3.20       10.20   1.02        318
2    43      7    8     8.12            0.68            16.00    2.00       4.20    0.52        262
3    35      7    13    9.84            0.76            40.00    3.07       13.30   1.02        332
4    39      6    12    9.25            0.75            40.00    3.33       11.00   0.91        275
5    41      10   25    11.04           0.73            102.00   4.08       30.50   1.22        299
6    46      8    18    10.33           0.75            58.00    3.22       15.80   0.87        272
7    32      8    17    8.64            0.64            43.00    2.52       12.00   0.70        279
8    31      6    7     9.85            0.85            27.00    3.85       7.50    1.07        278
9    38      7    16    8.18            0.56            30.00    1.87       10.20   0.63        340
10   32      7    10    10.00           0.88            39.00    3.90       10.50   1.05        269

H Height; Brs Branches; P/pl Pods per plant; Avg.p.l. Average pod length; Avg.p.d. Average pod diameter; B/pl Beans per plant; Avg. B/pd Average beans per pod; Bw/pl Beans weight per plant; Avg. Bw/pd Average beans weight per pod; W1000 Weight of 1000 beans.
Fig. 1 The variability of the average length of pods per plant
2.3.1 Defining the Linguistic Variables Linguistic variables can be defined through various means. One method is using the information obtained from statistical study. Such studies are always performed within plant breeding programs. In the case study, ten fuzzy variables were defined, one for each observed character. The maximum number of fuzzy terms within a fuzzy variable was five. The fuzzy terms were modeled using trapezoidal fuzzy sets. The following example describes the fuzzy variable length_of_pods that was defined by examining the results of the statistical study. The fuzzy variable regards the (average) length of pods per plant and comprises three fuzzy terms: low, medium, high. Figure 1 shows the variability chart of the character average length of pods per plant (cm). Figure 2 illustrates the corresponding fuzzy variable length_of_pods. The construction of these linguistic variables is advisable even though they may not always be used by the plant breeder. For some selection criteria, other linguistic evaluations might be preferred.
Fig. 2 Representation of the fuzzy variable length_of_pods
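A fuzzy variable of this kind can be sketched in Python with trapezoidal membership functions as below. The breakpoints are rough assumptions chosen to be consistent with the membership degrees reported later for this case study, not the exact values used by the authors.

```python
def trapezoid(a, b, c, d):
    """Trapezoidal membership function with support [a, d] and kernel [b, c]."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu

# Breakpoints below are assumptions for the sketch only.
length_of_pods = {
    "low":    trapezoid(4.0, 5.0, 7.0, 8.5),
    "medium": trapezoid(7.0, 8.5, 11.0, 12.5),
    "high":   trapezoid(11.0, 12.5, 15.0, 16.0),
}

print(round(length_of_pods["medium"](8.12), 2))   # about 0.75
print(round(length_of_pods["medium"](9.8), 2))    # 1.0
```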
2.3.2 Stating the Selection Criteria

According to the situation, the selection criteria can be stated using fuzzy terms or crisp criteria. However, as the case study shows, there could be criteria that seem to have a crisp definition, but must also be fuzzyfied. The selection criteria used in the case study are enumerated further on.

• Height: normal. This linguistic value was chosen because the statistical study did not reveal some special correlation between height and the other characters.
• Number of branches: great. A great number of branches determines a great number of pods.
• Number of pods: great. This being a pod bean variety, this is the most important criterion.
• Average length of pods: medium. Further processing requirements and marketing reasons imposed this linguistic value.
• Average diameter of pods: medium. Same reasons as above.
• Number of beans: medium. Even though it's a pod bean variety, beans are important for taste and nutritional parameters. Too small or too large a number of beans can lower the overall quality.
• Average number of beans per pod: medium. Same reasons as above.
• Weight of beans: approximately 15–20 g. The selection criterion was in this case defined as an interval rather than using a fuzzy term of the fuzzy variable Weight of beans. This choice was determined by the results acquired [14] in studying the statistical correlation between this character and the others. Figure 3 shows such a correlation. A fuzzyfication of this criterion was also made to allow degrees of variable confidence on both sides of the interval 15–20. The corresponding trapezoidal fuzzy set is shown in Fig. 4.
Fig. 3 Correlation between the weight of beans per plant and the weight of 1000 beans (fitted curve: y = –0.4963x² + 19.055x + 118.93, R² = 0.5314)
Fig. 4 Fuzzyfication of the interval 15–20
• Average weight of beans per pod: approximately between 1 and 1.5 g. Similar reasons as before determined the choice of a fuzzyfied interval.
• Weight of 1000 beans: big. This parameter ensures a high quality of beans; this is why high values are desirable.
2.3.3 Applying the Selection Criteria

By applying the selection criteria expressed as fuzzy sets, one obtains the values of the corresponding membership functions, that is the measure in which each plant satisfies each criterion. In Table 3 some results for the individuals within the case study are shown. The same plants as in Table 2 were chosen.

Table 3 Results of applying the selection criteria

ID   H     Brs  P/pl  Avg. p.l.  Avg. p.d.  B/pl  Avg. B/pd  Bw/pl  Avg. Bw/pd  W1000
1    0.33  0.   0.    1.         1.         1.    1.         0.04   1.          0.36
2    0.    0.   0.    0.75       1.         0.4   0.5        0.     0.          0.
3    0.67  0.   0.    1.         1.         1.    1.         0.66   1.          0.64
4    0.    0.   0.    1.         1.         1.    1.         0.2    0.64        0.
5    0.    1.   0.    1.         1.         0.    0.96       0.     1.          0.
6    0.    0.   1.    1.         1.         0.8   1.         1.     0.48        0.
7    1.    0.   1.    1.         1.         1.    0.76       0.4    0.          0.
8    1.    0.   0.    1.         0.5        1.    1.         0.     1.          0.
9    0.    0.   1.    0.79       0.6        1.    0.44       0.04   0.          0.8
10   1.    0.   0.    1.         0.2        1.    1.         0.1    1.          0.

H Height; Brs Branches; P/pl Pods per plant; Avg.p.l. Average pod length; Avg.p.d. Average pod diameter; B/pl Beans per plant; Avg. B/pd Average beans per pod; Bw/pl Beans weight per plant; Avg. Bw/pd Average beans weight per pod; W1000 Weight of 1000 beans.
2.3.4 Calculating the Weights of the Selection Criteria One of the most time consuming steps of this methodology is the two by two comparison of the selection criteria in view of determining the elements of Saaty’s
matrix from which the weights expressing the relative importance of each criterion are subsequently computed. Table 4 illustrates Saaty's matrix for the selection criteria in the case study. Next to the matrix are the corresponding calculated weights. As one can see, the most important character is the number of pods per plant. On the next two places are other important characters: the diameter of pods and the length of pods. Thus, the calculus gives a numerical expression to some obvious intuitive ideas.

2.3.5 Computing the Overall Ratings for Decision Alternatives

The overall ratings are computed according to relation (9) defined earlier. Thus, a final score will be assigned to each studied plant. The maximum possible score would be equal to the sum of the weights. For the studied pod bean variety, this calculation involves the membership degrees shown partially in Table 3 and the weights shown in Table 4. Relation (9) becomes:

Ri = Σ_{j=1}^{10} wj · xij , i = 1, . . . , 150.   (10)
The maximum score that one plant can obtain is, in this case study, 3.919442.

Table 4 The elements of Saaty's matrix and the computed weights of importance
Chr             (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)   (10)  Weight
(1)  H          1     1/3   1/7   1/5   1/5   1     1/3   1/3   1/3   1/3   0.10502
(2)  Brs        3     1     1/5   1/3   1/3   5     5     3     3     1     0.39971
(3)  P/pl       7     5     1     4     1     3     1     7     7     7     1
(4)  Avg.p.l.   5     3     1/4   1     1     5     2     3     2     3     0.58489
(5)  Avg.p.d.   5     3     1     1     1     5     3     3     2     5     0.73633
(6)  B/pl       1     1/5   1/3   1/5   1/5   1     1     1     1     1     0.16855
(7)  Avg.B/pd   3     1/5   1     1/2   1/3   1     1     3     3     5     0.35437
(8)  Bw/pl      3     1/3   1/7   1/3   1/3   1     1/3   1     1     1     0.18050
(9)  Avg.Bw/pd  3     1/3   1/7   1/2   1/2   1     1/3   1     1     3     0.21848
(10) W1000      3     1     1/7   1/3   1/5   1     1/3   1     1/3   1     0.17152
Chr Character; H Height; Brs Branches; P/pl Pods per plant; Avg.p.l. Average pod length; Avg.p.d. Average pod diameter; B/pl Beans per plant; Avg.B/pd Average beans per pod; Bw/pl Beans weight per plant; Avg.Bw/pd Average beans weight per pod; W1000 Weight of 1000 beans.
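The chapter does not state which numerical scheme was used to extract the weights from Saaty's matrix; the classical choices are the principal eigenvector and the geometric mean of the rows. The sketch below uses the geometric-mean variant and rescales the weights so that the largest one equals 1, which gives values of the same order as the last column of Table 4; it is an illustration, not the authors' exact procedure.

import numpy as np

def saaty_weights(A):
    """Relative weights from a pairwise comparison matrix A
    (geometric-mean method, rescaled so that max(weight) = 1)."""
    A = np.asarray(A, dtype=float)
    gm = np.prod(A, axis=1) ** (1.0 / A.shape[1])  # row geometric means
    return gm / gm.max()

# Toy example: criterion 1 is 3x as important as 2 and 6x as important as 3
toy = [[1,   3,   6],
       [1/3, 1,   2],
       [1/6, 1/2, 1]]
print(saaty_weights(toy))   # -> approximately [1.0, 0.33, 0.17]

Applied to the full 10 x 10 matrix of Table 4, the same helper should reproduce weights close to the printed column (for example about 0.105 for Height and 1 for Pods per plant).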
2.3.6 Ranking the Individuals

The final hierarchy is determined by sorting the overall ratings in descending order. The top ten final results for the case study are listed in Table 5. In first place is the individual with ID number 110, which attained a final score of 3.57, that is, 91.12% of the maximum possible score.
Table 5 The final hierarchy (top 10 places)
Place  ID    Score  Percent
1      110   3.57   91.12%
2      39    3.37   85.86%
3      28    3.01   76.81%
4      117   2.95   75.16%
5      64    2.64   67.43%
6      31    2.63   67.04%
7      96    2.61   66.59%
8      21    2.57   65.55%
9      149   2.52   64.34%
10     51    2.51   64.16%
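The score-and-rank computation of Sects. 2.3.5 and 2.3.6 reduces to a weighted sum (relation (10)) followed by a descending sort. A minimal sketch, using the weights of Table 4 and the membership degrees of Table 3 (only plants 1 and 2 are transcribed here for brevity):

weights = [0.10502, 0.39971, 1.0, 0.58489, 0.73633,
           0.16855, 0.35437, 0.18050, 0.21848, 0.17152]   # last column of Table 4
max_score = sum(weights)                                    # about 3.919

# Membership degrees per plant (rows of Table 3); only two plants shown
plants = {
    1: [0.33, 0.0, 0.0, 1.0, 1.0, 1.0, 0.04, 1.0, 0.36, 1.0],
    2: [0.0,  0.0, 0.0, 0.75, 0.4, 0.5, 0.0,  0.0, 0.0,  1.0],
}

scores = {pid: sum(w * x for w, x in zip(weights, mu))
          for pid, mu in plants.items()}
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for pid, score in ranking:
    print(pid, round(score, 3), f"{100 * score / max_score:.2f}%")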
2.3.7 Selecting the Parents

The plant-breeding expert will choose as many individuals from the top of the final classification as he considers appropriate. A graphical representation of the results can help in making a good decision. Such a diagram for the studied pod beans is shown in Fig. 5.

Fig. 5 The diagram showing the final ranking of the decision alternatives (score and percent of the maximum possible score for each alternative; maximum score obtained: 3.571389, i.e. 91.12% of the maximum possible score)

2.4 Conclusion

As shown in this section, the selection phase of a plant-breeding program can be viewed as a Fuzzy Multi-Attribute Decision problem. A methodology to apply this model was proposed. Its main advantages consist in the possibility of stating the
selection criteria through linguistic terms. Some selection criteria could be expressed by intervals instead of linguistic values. However, these intervals should also be transformed into fuzzy sets to ensure a better utilization. The results of the statistical study performed on the sets of plants involved can be used to define the fuzzy variables, as such statistical studies are present in all plant breeding programs.
3 Similarity Measures in Fuzzy Database Querying

3.1 Why Employ Fuzzy Database Querying in Plant Breeding?

In many decision situations large amounts of data must be inspected in order to extract information that would suggest the course of action. A usual IT approach is to store the respective data in a relational database (RDB). Its exploitation consists in applying queries to select those records that match the requirements in a given context. The resulting set of records, the dynaset, would play the role of decision alternatives that are to be ranked in order to determine the best solution.
However, one can point out some limits of the conventional RDB approach. In a conventional query, the implicit assumption is that the searched values are known exactly: the search parameters must be expressed in a crisp form, such as strings, values or intervals. Moreover, at the simplest level, query criteria are connected through logical AND, meaning that a record must match all of them in order to be selected. At least two drawbacks arise from here. On the one hand, when referring to different attributes, the database user often deals with linguistic terms (like much, approximately, etc.) instead of crisp values. On the other hand, a crisp formulation of the search criteria might lead to omissions in the dynaset. For example, one applies a query stating: (attribute) A between 0.7 and 1.4. A record having A = 0.65 would be rejected, even though this value could be quite satisfactory considering other points of view that prevail; for example, attribute values B and C could make this record very suitable for the overall purpose.
In the decades that followed the inception of fuzzy theories due to Zadeh [23], important research efforts were made towards involving this theoretical support in modeling different facets of vagueness in database management. Advances were made in the general framework, named fuzzy database management [9], which covered various scopes: fuzzy querying of classic databases [10], storing and handling linguistic values, generalizing the RDB model [15], and indexing of fuzzy databases [2]. An important part of these efforts concentrated upon introducing and implementing query techniques for intelligent data retrieval (e.g. [10, 12]).
As the literature shows, fuzzy querying is almost always linked to two other important topics: fuzzy decision and fuzzy similarity. These concepts appear together in many research projects engaged in intelligent data manipulation. Data retrieval is often involved in selecting decision alternatives, as shown in the preceding sections. Matching between search criteria and stored data, one or both being represented through fuzzy sets, means evaluating degrees of similarity. Moreover, fuzzy similarity and fuzzy decision often interact with each other [21].
In this context, the procedure introduced in the previous section can be regarded as a querying method using fuzzy criteria that applies to a database table containing point values. The next sections introduce two query methods that use linguistic criteria and apply to attributes stored as numeric intervals.
3.2 A Fuzzy Querying Method for Interval Valued Attributes

Many quantitative characters of different cultivars are officially recorded as intervals or linguistic values. Even those expressed as single numbers represent averages rather than exact values. For instance, the standard length of leaves is officially recorded for any variety of vine. Examples: Grenache Noir – short, that is 11–13 cm; Chenin Blanc – medium, that is 16–18 cm.
When designing a plant breeding program, the objectives are defined in a rather linguistic manner: plants having between 60 and 80 cm in height at a medium growth rate, producing at least 5 kg of fruits, etc. These goals are expressed according to different factors (costs, quality level, etc.) and rely on previous expertise, statistical observation, technical literature and others. Again, linguistic terms can be expressed as fuzzy variables. Yet, even exact intervals and single values are to be put as fuzzy numbers of trapezoidal and triangular shape. This is because there are always ranges of variable confidence on both sides of the declared range or number. In fact, levels of approximation are often present when expressing biological characteristics.
Such a situation could impose applying a linguistic query, whose parameters are expressed as fuzzy intervals, to a database with crisp attribute values. A crisp interval I can be viewed as a special case of fuzzy interval having KER(I) = SUPP(I), that is, a rectangular membership function. Therefore, the matching between a query criterion and an actual value in the database can be expressed as the similarity of two fuzzy sets [5].
There are many ways to define fuzzy similarity. One of them is the model used within FuzzyCLIPS¹, an extended version of the CLIPS² rule-based shell for expert systems. FuzzyCLIPS [25] uses two distinct inexact concepts: fuzziness and uncertainty. Fuzziness means that the facts can be not only exact statements, but linguistic assertions represented by fuzzy sets. Furthermore, every fact (fuzzy or non-fuzzy) has an associated Certainty Factor (CF) that expresses the degree of certainty about that piece of information. The value of CF lies between 0 and 1, where 1 indicates complete certainty that the fact is true. A FuzzyCLIPS simple rule has the form:
¹ FuzzyCLIPS was developed by the Knowledge Systems Laboratory, Institute for Information Technology, National Research Council of Canada.
² CLIPS was developed by the Artificial Intelligence Section, Lyndon B. Johnson Space Center, NASA.
if A then C       CFr
A′                CFf
---------------------------
C′                CFc
where A is the antecedent of the rule, A′ is the matching fact in the fact database, C is the consequent of the rule, C′ is the actual consequent, CFr is the certainty factor of the rule, CFf is the certainty factor of the fact, and CFc is the certainty factor of the conclusion. Three types of simple rules are defined in FuzzyCLIPS:
• CRISP_ rules, which have a crisp fact in the antecedent;
• FUZZY_CRISP rules, which have a fuzzy fact in the antecedent and a crisp one in the consequent;
• FUZZY_FUZZY rules, which have fuzzy facts both in the antecedent and in the consequent.
To perform a query with fuzzy criteria, the model of the FUZZY_CRISP rule was chosen. In a FUZZY_CRISP rule:
• A and A′ are fuzzy facts, not necessarily equal, but overlapping;
• C′ is equal to C;
• CFc is calculated using the following relation:
CFc = CFr · CFf · S.
(11)
In relation (11), S is the measure of similarity between the fuzzy sets FA (determined by the fuzzy fact A) and FA′ (determined by the matching fact A′). Similarity is based on the measure of possibility P and the measure of necessity N. These terms have the following definitions. For the fuzzy sets FA : U → [0, 1] and FA′ : U → [0, 1],

S = P(FA|FA′)                         if N(FA|FA′) > 0.5,    (12)
S = (N(FA|FA′) + 0.5) · P(FA|FA′)     otherwise,             (13)

where

P(FA|FA′) = max(min(FA(x), FA′(x))), ∀x ∈ U,    (14)
N(FA|FA′) = 1 − P(F̄A|FA′).    (15)

F̄A is the complement of FA:

F̄A(x) = 1 − FA(x), ∀x ∈ U.    (16)
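On a discretized universe, relations (12)–(16) can be evaluated directly. The sketch below is a straightforward transcription of those relations (it is not the FuzzyCLIPS implementation itself); FA is a trapezoidal query set and FA′ a rectangular set built from a database interval, with the numeric breakpoints chosen purely for illustration.

import numpy as np

def similarity(mu_q, mu_f):
    """Similarity S between a query fuzzy set FA (mu_q) and a matching
    fact FA' (mu_f), both sampled on the same universe, per (12)-(16)."""
    P = np.max(np.minimum(mu_q, mu_f))                # possibility, relation (14)
    N = 1.0 - np.max(np.minimum(1.0 - mu_q, mu_f))    # necessity, relations (15)-(16)
    if N > 0.5:
        return P                                      # relation (12)
    return (N + 0.5) * P                              # relation (13)

# Example: query "A between 0.9 and 1.4" with support (0.7, 1.6),
# database fact A in [1.0, 1.45]
x = np.linspace(0.5, 2.0, 3001)
mu_q = np.clip(np.minimum((x - 0.7) / 0.2, (1.6 - x) / 0.2), 0.0, 1.0)
mu_f = ((x >= 1.0) & (x <= 1.45)).astype(float)
print(similarity(mu_q, mu_f))   # -> 1.0 (a Case 2-like situation, see below)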
A search criterion may be put as a FUZZY_CRISP simple rule. For example:

if (A between k1 and k2) then (A_OK) CFr 0.8.

In this example:
• the antecedent "A between k1 and k2" is represented by a fuzzy set having the kernel [k1, k2] and a user-defined support (s1, s2); it plays the role of A in the FuzzyCLIPS model (see for example Fig. 6);
• the consequent A_OK is a crisp fact, meaning that, from the user's point of view, the mentioned range of A meets the requirements;
• the value of the certainty factor of the rule means that, when necessary, a value smaller than 1 can be assigned to CFr to express some doubt about the validity of the rule. However, the default value should be CFr = 1.
The database containing the recorded specifications provides corresponding intervals that play the role of A′, the fact involved in the matching process. Consequently, for the fact A′, a rectangular membership function should be used. This fact has CFf = 1. To illustrate, Fig. 6 shows the (database) interval [a1, a2] represented as a rectangular fuzzy set.
Fig. 6 Graphical illustration for Case 1 (the rule's trapezoidal set, with support (s1, s2) and kernel [k1, k2], and the database interval [a1, a2])

The result of this FUZZY_CRISP inference is the fact A_OK with a calculated CFc, meaning that, with respect to attribute A, the record meets the requirements to the extent expressed by CFc. The value of CFc is computed according to relation (11). As discussed, in this relation CFr = 1 and, except for some special situations, CFf = 1. Therefore, the most important factor in the calculation of CFc
is S, which is in fact the only necessary factor in most situations. Hence, for the sake of simplicity, the implicit value CFr = 1 will be considered.
The overall conclusion is that, following the model of FUZZY_CRISP rules, the matching degree of a fuzzy interval query parameter to the corresponding interval stored in the database can be expressed by calculating the measure of similarity S between the respective trapezoidal and rectangular fuzzy sets. Moreover, as a result of the search process, this method provides computed degrees of similarity between each query parameter and the corresponding database attribute value. This additional information can be used in subsequent evaluation and ranking of alternatives within a Decision Support System, as shown in a previous section.
The computing procedure described by relations (12) to (16) could seem quite laborious, but a closer look shows that the calculations need to be performed only in a reduced number of cases. This is shown in the following case discussion.

Case 1
• Rule: if (A between k1 and k2) then (A_OK) CFr = 1
• Fact in the database: a1 ≤ A ≤ a2, with [a1, a2] ⊆ [k1, k2]
The calculation of S does not actually need to be performed. This case corresponds to the situation of a classical query: the kernel of the fuzzy fact FA′ from the database is included in the kernel of the rule (query) FA, which means that the interval in the database entirely fulfils the conditions of the query. Thus, the value of S can be directly set to 1.

Case 2
• Rule: if (A between k1 and k2) then (A_OK) CFr = 1
• Fact in the database: a1 ≤ A ≤ a2, with k1 < a1 < k2 and k2 < a2 < (k2 + s2)/2
Taking into account the position of a2 and using relation (14), geometrical considerations lead to P(F̄A|FA′) < 0.5. Then, from (15), N(FA|FA′) > 0.5, which entails from (12) S = P(FA|FA′). From the position of a1 it results P(FA|FA′) = 1, that is, S = 1. Finally, CFc = CFr · CFf · S = 1.
The numerical example shown in Fig. 7 illustrates the considered situation, that is: P(F̄A|FA′) = 0.33 ⇒ N(FA|FA′) = 0.66 > 0.5 ⇒ S = P(FA|FA′) = 1.

Fig. 7 Numerical example for Case 2

In Case 2 the kernel of the fuzzy set FA′ is not included in the kernel of FA, but the "deviation" is very slight and it does not affect the eligibility of the record. This situation is expressed by the calculated value S = 1. One can see that in a "classic" crisp querying process, such a record would have been rejected because the actual interval in the database did not match the query criterion.

Case 3
• Rule: if (A between k1 and k2) then (A_OK) CFr = 1
• Fact in the database: a1 ≤ A ≤ a2, with k1 < a1 < k2 and (k2 + s2)/2 ≤ a2 < s2
Taking into account the position of a2 and using relation (14), geometrical considerations lead to P(F̄A|FA′) > 0.5. Then, from (15), N(FA|FA′) < 0.5, which entails from (13) S = (N(FA|FA′) + 0.5) · P(FA|FA′). From the position of a1 we have P(FA|FA′) = 1, that is, S < 1. Finally, CFc = CFr · CFf · S < 1.
The numerical example shown in Fig. 8 illustrates the considered situation, that is: P(F̄A|FA′) = 0.66 ⇒ N(FA|FA′) = 0.33 < 0.5 ⇒ S = (N(FA|FA′) + 0.5) · P(FA|FA′) = 0.83.

Fig. 8 Numerical example for Case 3

Again, the kernel of the fuzzy set FA′ is not included in the kernel of FA, which means that in a crisp querying process the database record would have been rejected. Instead, the calculated value of S (in the numerical example, S = 0.83) can be interpreted as the degree in which the query criterion is satisfied by the actual interval in the database.

Case 4
• Rule: if (A between k1 and k2) then (A_OK) CFr = 1
• Fact in the database: a1 ≤ A ≤ a2, with s1 < a1 < k2 and a2 > s2

Fig. 9 Graphical illustration for Case 4

Taking into account the position of a2 (shown in Fig. 9) and using relation (14), geometrical considerations lead to P(F̄A|FA′) = 1. Then, from (15), N(FA|FA′) = 0, and from (13)
S = (N(FA|FA′) + 0.5) · P(FA|FA′). From the position of a1: P(FA|FA′) = 1. Making the respective replacements we have S = 0.5, and finally CFc = CFr · CFf · S = 0.5. Like Case 1, this is another situation that must be treated in the conventional manner. Otherwise, the calculated value of S could lead to a misleading conclusion, because one might consider that S = 0.5 represents an acceptable value. In fact, the database record must be rejected (as it would have been in a conventional query, too) without performing any computation for S. The discussed cases show that the validity of a criterion can be judged according to the following three situations, which are mutually exclusive:
• KER(FA′) ⊆ KER(FA): the criterion matches in the crisp sense; set S to 1.
• KER(FA′) ⊆ SUPP(FA) and KER(FA′) ⊄ KER(FA): matching in the fuzzy sense; compute the value of S using relations (12) to (16).
• KER(FA′) ∩ COKER(FA) ≠ ∅: the criterion does not match; reject the record.
The preceding discussion has only analyzed those cases that occur at one end of the matching intervals, but the final conclusions are valid for any combined situations occurring at both ends.
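For the particular setting treated here (a rectangular database fact matched against a trapezoidal query set), the three mutually exclusive situations can be checked directly on the interval bounds, falling back to relations (12)–(16) only in the middle case. A sketch under those assumptions; it reuses the similarity() and numpy helpers from the previous listing, and the example values are invented:

def trapezoid(x, s1, k1, k2, s2):
    return np.clip(np.minimum((x - s1) / (k1 - s1), (s2 - x) / (s2 - k2)), 0.0, 1.0)

def match_interval(a1, a2, s1, k1, k2, s2):
    """Three-way check of a database interval [a1, a2] against a query
    with kernel [k1, k2] and support (s1, s2); returns S or None (reject)."""
    if k1 <= a1 and a2 <= k2:
        return 1.0                       # kernel within kernel: crisp match
    if a1 <= s1 or a2 >= s2:
        return None                      # kernel reaches outside the support: reject
    x = np.linspace(s1 - 1.0, s2 + 1.0, 5001)
    mu_q = trapezoid(x, s1, k1, k2, s2)
    mu_f = ((x >= a1) & (x <= a2)).astype(float)
    return similarity(mu_q, mu_f)        # fuzzy match, relations (12)-(16)

# print(match_interval(1.0, 1.45, 0.7, 0.9, 1.4, 1.6))   # -> 1.0 (Case 2)
# print(match_interval(1.0, 1.55, 0.7, 0.9, 1.4, 1.6))   # Case 3: S < 1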
3.3 A Fuzzy Relation for Comparing Intervals

This section presents a method [3] to express the matching degree of two arbitrary intervals through a fuzzy relation having the properties of reflexivity and symmetry, that is, a fuzzy relation of tolerance. This method could be useful in situations in which one has to compare two entities whose attributes are (partially) expressed through intervals, and to give a measure of their matching degree. Such a real-world situation can occur in plant breeding or in other agricultural disciplines when two cultivars must be compared in order to express some sort of resemblance. For example, one wishes to search in a database for equivalents of some given cultivar (that already exists or is designed to be obtained through breeding). Those characters expressed as intervals can be compared using the described procedure. Parts of the method can also be used in different other situations.
Numerous methods for comparing intervals are currently known. The following approach is based on geometrical remarks regarding the two partial aspects of the topic: the length comparison and the measure of concentricity. Each of them leads to a reflexive and symmetrical fuzzy relation. The overall measure of interval matching is obtained by combining these two partial measures. Each of them can also be used separately wherever necessary.
3.3.1 Matching

Let [a, b] and [c, d] be two intervals that must be compared in order to compute a measure of their matching. Let this matching degree be match([a, b]; [c, d]). The aim is to define match as a non-dimensional measure having the following properties:

Property 1.

0 ≤ match([a, b]; [c, d]) ≤ 1,    (17)

where

match([a, b]; [c, d]) = 0    (18)

means no matching at all, and

match([a, b]; [c, d]) = 1 iff a = c and b = d    (19)

means a perfect match.

Property 2 (reflexivity).

match([a, b]; [a, b]) = 1    (20)
Property 3 (symmetry).

match([a, b]; [c, d]) = match([c, d]; [a, b])    (21)

Definition 7. The matching degree is calculated as

match([a, b]; [c, d]) = round([a, b]; [c, d]) · conc([a, b]; [c, d]),    (22)

where
• round([a, b]; [c, d]) measures the degree in which the lengths of the two intervals match;
• conc([a, b]; [c, d]) expresses the degree of concentricity of the intervals.
Remark 2. The name round stands for roundness, and conc stands for concentricity. The partial measures round and conc are defined further on.
3.3.2 Roundness

In relation (22), round denotes the roundness of a rectangle whose sides have the lengths b − a and d − c, respectively. The measure roundness of a shape has been used in the image processing literature [8]. For two dimensions this measure is defined as follows. Let R be a two-dimensional (2D) region. The roundness of region R can be computed using the following procedure:
• consider a reference circle C2D having
  – the centre in the centre of gravity of region R;
  – the radius

    r = √(area(R)/π);    (23)

• compute

    roundness2D(R) = max(1 − [area(C2D \ R) + area(R \ C2D)] / area(R), 0);    (24)

• normalize to the interval [0, 1].
When region R is a rectangle, denoted ABCD, two special cases appear. These are shown in Fig. 10, where α1 = ∠EGP and α2 = ∠JGQ. There are two more cases that correspond to Fig. 10, in which AB > BC, but the lines of reasoning are the
Fig. 10 Computation of the roundness of rectangle ABCD
same. Figure 11 shows the special case in which the rectangle ABCD is a square; as can be expected, this is the case of maximum roundness.

Fig. 11 The square is "the roundest rectangle"

Denoting by roundness(ABCD) the roundness of the rectangle ABCD, and denoting

X = [area(C2D \ ABCD) + area(ABCD \ C2D)] / area(ABCD),    (25)

relation (24) for roundness becomes

roundness(ABCD) = max(1 − X, 0).    (26)

According to Fig. 10, two distinct cases appear.
Case a (left side of Fig. 10):

area(C2D \ ABCD) = 2 · area(EFP),    (27)
area(ABCD \ C2D) = area(ABCD) − area(C2D) + 2 · area(EFP),    (28)
X = 4 · area(EFP) / area(ABCD)    (29)

(note that area(C2D) = area(ABCD) by the choice of the radius in (23)),

area(EFP) = (area(ABCD)/π) · (α1 − sin α1 cos α1),    (30)
X = (4/π) · (α1 − sin α1 cos α1).    (31)
Case b (right side of Fig. 10). In this case we obtain

X = (4/π) · (α1 − sin α1 cos α1 + α2 − sin α2 cos α2).    (32)

Then, α1 and α2 must be computed:

cos α1 = GP/EG = GP/radius(C2D) = (AB/2) / √(AB · BC/π) = (1/2) · √(π p),    (33)

where

p = AB/BC.    (34)

Thus,

α1 = arccos((1/2) · √(π p)),    (35)

and, through similar calculations,

α2 = arccos((1/2) · √(π/p)).    (36)
From Table 6, where a few computed values of roundness are listed, some conclusions can be drawn:
i) roundness(ABCD) becomes zero when the longer side becomes approximately five times bigger than the shorter one;
ii) roundness(ABCD) reaches its maximum, roundness(ABCD) ≈ 0.82, for the square; the normalization in [0, 1] is shown in the last column of Table 6.
Another important conclusion comes from relations (31), (32), (35) and (36):
iii) roundness(ABCD) only depends on the ratio between the height and width of the rectangle.
The component round([a, b]; [c, d]) in relation (22) will be expressed using the roundness of the rectangle that has its two dimensions equal to the lengths of the compared intervals, [a, b] and [c, d] respectively. Conclusion iii) drawn above is important because it shows the symmetry of round([a, b]; [c, d]). The other property of round([a, b]; [c, d]) must be reflexivity. It comes from ii) and the consequent normalization, which entails round([a, b]; [c, d]) = 1 iff a = c and b = d. This is why a normalization in [0, 1] of roundness was necessary.
Table 6 Computed values of the roundness of a rectangle and their normalization in [0, 1]
p = AB/BC   1/p    α1 (rad.)  α2 (rad.)  round     normalization
0.2         5      1.16       0          0         0
0.21        4.75   1.15       0          0.006183  0.00755047
0.66        1.5    0.76       0          0.665968  0.81323948
0.78        1.27   0.66       0          0.769158  0.93924826
0.8         1.25   0.65       0.14       0.778422  0.9505613
1           1      0.48       0.48       0.818908  1
1.25        0.8    0.14       0.65       0.778422  0.9505613
1.27        0.78   0          0.66       0.769158  0.93924826
1.5         0.66   0          0.76       0.665968  0.81323948
4.75        0.21   0          1.15       0.006183  0.00755047
5           0.2    0          1.16       0         0
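Relations (31)–(36) make roundness a function of the side ratio p alone, which can be checked numerically against Table 6 (small differences arise from the rounding of p and of the angles in the table). A minimal sketch; the normalization simply divides by the value obtained for the square, p = 1:

import math

def roundness(p):
    """Roundness of a rectangle with side ratio p = AB/BC, relations (31)-(36)."""
    x = 0.0
    c1 = 0.5 * math.sqrt(math.pi * p)      # cos(alpha1), relation (35)
    if c1 < 1.0:
        a1 = math.acos(c1)
        x += (4.0 / math.pi) * (a1 - math.sin(a1) * math.cos(a1))
    c2 = 0.5 * math.sqrt(math.pi / p)      # cos(alpha2), relation (36)
    if c2 < 1.0:
        a2 = math.acos(c2)
        x += (4.0 / math.pi) * (a2 - math.sin(a2) * math.cos(a2))
    return max(1.0 - x, 0.0)

def roundness_normalized(p):
    return roundness(p) / roundness(1.0)   # equals 1.0 for the square

for p in (0.2, 0.8, 1.0, 1.25, 5.0):
    print(p, round(roundness(p), 6), round(roundness_normalized(p), 6))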
3.3.3 Concentricity

The other element in calculating the matching degree (22) is the concentricity of the two intervals, conc([a, b]; [c, d]). The concentricity is determined by the centers of the intervals; some cases are shown in Fig. 12. The intervals are placed on the real-number axis, in its positive region, but this does not restrict the generality. The centers of [a, b] and [c, d] have the co-ordinates m and m′, respectively.
To express the concentricity, the points shown in Fig. 13 were placed in a Cartesian system of co-ordinates. The points A, B and C, D are placed on Oy as if the intervals [a, b] and [c, d] were represented on the vertical axis. Again, the two intervals were assumed to be in the positive region of the axis, but this does not affect the generality. Denoting by lab = b − a the length of interval [a, b] and by lcd = d − c the length of [c, d], the following relations and co-ordinates are defined for the points in Fig. 13:
Fig. 12 Different positions of the centers of two intervals: a) [a, b] ≠ [c, d], m ≠ m′; b) [a, b] ≠ [c, d], m = m′; c) [a, b] = [c, d], m = m′
lab = b − a = yB − yA,     lcd = d − c = yD − yC,
A (0, yA),     A′ ((lab + lcd)/2, yA),
B (0, yB),     B′ ((lab + lcd)/2, yB),
C (0, yC),     C′ ((lab + lcd)/2, yC),
D (0, yD),     D′ ((lab + lcd)/2, yD),
O′ (0, (yA + yB)/2),     M′ ((lab + lcd)/4, (yC + yD)/2).
Having defined the points shown in Fig. 13, the concentricity conc is defined as the cosine of δ = ∠MO’M’ : conc([a, b]; [c, d]) = cos δ.
(37)
Elementary calculations lead to

cos δ = 1/√(1 + tg²δ),  where  tg δ = 2 · [(yC + yD) − (yA + yB)] / (lab + lcd).    (38)
Concentricity conc([a, b]; [c, d]) = cos δ has the following properties:
i) 0 < conc([a, b]; [c, d]) ≤ 1;
ii) conc([a, b]; [a, b]) = 1;
iii) conc([a, b]; [c, d]) = conc([c, d]; [a, b]);
iv) conc([a, b]; [c, d]) = conc([a, b]; [c − α, d + α]).
Property i) shows the normalization of conc in [0, 1]. Property ii) shows the reflexivity of conc, which means that when the current interval is equal to the reference one, conc is equal to 1. Property iii) shows the symmetry of conc, that is, when computing the concentricity of two intervals, it does not matter which one is the reference.
Fig. 13 Construction of the angle δ expressing the concentricity of two intervals
In addition, from ii) and iv) we may conclude that the concentricity is equal to 1 for any two intervals that have the same centre (m = m′ in Fig. 12). This means that conc is not sufficient to express the whole matching degree between two intervals; it must be used together with round in the evaluation of the matching degree.
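Putting relations (22), (37) and (38) together, the matching degree of two intervals can be computed in a few lines. The sketch below reuses the roundness helpers (and the math import) from the previous listing; the example intervals are invented.

def conc(a, b, c, d):
    """Concentricity of [a, b] and [c, d], relations (37)-(38)."""
    l_ab, l_cd = b - a, d - c
    tg_delta = 2.0 * ((c + d) - (a + b)) / (l_ab + l_cd)
    return 1.0 / math.sqrt(1.0 + tg_delta ** 2)    # cos(delta)

def match(a, b, c, d):
    """Matching degree of two intervals, relation (22)."""
    p = (b - a) / (d - c)                  # side ratio of the associated rectangle
    return roundness_normalized(p) * conc(a, b, c, d)

print(match(2.0, 5.0, 2.0, 5.0))   # identical intervals -> 1.0
print(match(2.0, 5.0, 3.0, 7.0))   # partial match, between 0 and 1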
3.4 Conclusions

The utilization of the fuzzy technique described in Sect. 3.2 allows linguistic formulations of the query criteria for interval-valued attributes. Compared to the conventional methods, it reduces the risk of rejecting records that can be valuable for the final purpose. Degrees of acceptability are also computed for the respective attributes in each selected record.
In Sect. 3.3, a reflexive and symmetrical fuzzy relation was defined for comparing two intervals. It can also be used as a query technique on a database having interval-valued attributes. Moreover, a matching degree between the two intervals is also computed.
The computed values of similarity that result from applying the methods described in Sects. 3.2 and 3.3 can be used to automate further decision-making within a Multi-Attribute Decision Making procedure, as discussed in Sect. 2.
References

1. Bellman RA, Zadeh LA (1970) Decision-making in a fuzzy environment. Management Sciences, Ser. B 17, 141–164
2. Boss B, Helmer S (1996) Indexing a Fuzzy Database Using the Technique of Superimposed Coding – Cost Models and Measurements. Technical Report of the University of Mannheim. www.informatik.uni-mannheim.de/techberichte/html/TR-96-002.html
3. Calin M, Galea D (2001) A Fuzzy Relation for Comparing Intervals. In: Reusch B (ed) Computational Intelligence. Theory and Applications: International Conference, 7th Fuzzy Days, Dortmund, Germany, October 1–3, 2001, Proceedings. Lecture Notes in Computer Science, Vol. 2206, p. 904. Springer-Verlag, Heidelberg
4. Calin M, Leonte C (2001) Applying the Fuzzy Multi-Attribute Decision Model in the Selection Phase of Plant Breeding Programs. In: Farkas I (ed) Artificial Intelligence in Agriculture 2001. Proc. of the 4th IFAC Workshop, Budapest, Hungary, 6–8 June 2001, pp. 93–98. Elsevier Science
5. Calin M, Leonte C, Galea D (2003) Fuzzy Querying Method With Application in Plant Breeding. Proc. of the International Conference "Fuzzy Information Processing – FIP 2003", Beijing, China. Springer and Tsinghua University Press
6. Coghlan A (2006) Genetically modified crops: a decade of disagreement. New Scientist, 2535, 21 Jan 2006
7. Fullér R (2003) Soft Decision Analysis. Lecture notes. http://www.abo.fi/~rfuller/lsda.html
8. Hiltner J (1996) Image Processing with Fuzzy Logic. Summer School "Intelligent Technologies and Soft Computing", Black Sea University, Romania, Sept. 22–28, 1996
9. Kacprzyk J (1997) Fuzziness in Database Management Systems. Tutorial at the Conference "5th Fuzzy Days", Dortmund, Germany
10. Kacprzyk J (1995) Fuzzy Logic in DBMSs and Querying. 2nd New Zealand Two-Stream International Conference on Artificial Neural Networks and Expert Systems (ANNES '95), Dunedin, New Zealand
11. Lammerts van Bueren E, Osman A (2001) Stimulating GMO-free breeding for organic agriculture: a view from Europe. LEISA Magazine, December 2001
12. Lelescu A. Processing Imprecise Queries in Database Systems. http://www.cs.uic.edu/~alelescu/eecs560_project.html
13. Leonte C (1996) Horticultural Plant Breeding (published in Romanian). EDP, Bucharest, Romania
14. Leonte C, Târdea G, Calin M (1997) On the correlation between different quantitative characters of Cape pod bean variety (published in Romanian). Lucrari stiintifice, Horticultura, USAMV Iasi, Romania, Vol. 40, 136–142
15. Medina JM, Pons O, Vila MA (1994) GEFRED. A Generalized Model of Fuzzy Relational Databases. Technical Report #DECSAI-94107, Approximate Reasoning and Artificial Intelligence Group (ARAI), Granada University, Spain. http://decsai.ugr.es/difuso/fuzzy.html
16. Negoita CV, Ralescu DA (1974) Fuzzy Sets and Applications (published in Romanian). Editura Tehnica, Bucharest
17. Reusch B (1996) Mathematics of Fuzzy Logic. In: Zimmermann H-J, Negoita M, Dascalu D (eds) Real World Applications of Intelligent Technologies. Publishing House of the Romanian Academy, Bucharest, Romania
18. Saaty TL (1980) The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation. McGraw-Hill, New York
19. Serag-Eldin G, Nikravesh M (2003) Web-Based BISC Decision Support System. 2003 BISC FLINT-CIBI International Joint Workshop on Soft Computing for Internet and Bioinformatics, UC Berkeley, 15–19 December 2003
20. Sperling L, Ashby JA, Smith ME, Weltzien E, McGuire S (2001) A Framework for Analysing Participatory Plant Breeding Approaches and Results. Participatory Plant Breeding – A Special Issue of EUPHYTICA, Vol. 122(3)
21. Williams J, Steele N (2002) Difference, Distance and Similarity as a basis for fuzzy decision support based on prototypical decision classes. Fuzzy Sets and Systems 131, 35–46
22. Yager RR (1988) Ordered weighted averaging aggregation operators in multi-criteria decision making. IEEE Trans. on Systems, Man and Cybernetics 18, 183–190
23. Zadeh LA (1965) Fuzzy Sets. Information and Control 8, 338–353
24. Zimmermann H-J (1996) Fuzzy Decision Support Systems. In: Zimmermann H-J, Negoita M, Dascalu D (eds) Real World Applications of Intelligent Technologies. Publishing House of the Romanian Academy, Bucharest, Romania
25. *** (1994) FuzzyCLIPS Version 6.02A. User's Guide. Knowledge Systems Laboratory, Institute for Information Technology, National Research Council, Canada
Soft Computing in the Chemical Industry: Current State of the Art and Future Trends

Arthur Kordon
Abstract The chapter summarizes the current state of the art of applying soft computing solutions in the chemical industry, based on the experience of The Dow Chemical Company, and projects the future trends in the field, based on the expected future industrial needs. Several examples of successful industrial applications of different soft computing techniques are given: automated operating discipline, based on fuzzy logic; empirical emulators of fundamental models, based on neural networks; accelerated new product development, based on genetic programming; and advanced process optimization, based on swarm intelligence. The impact of these applications is a significant improvement in process operation and much faster modeling of new products. The expected future needs of industry are defined as: predictive marketing, accelerated new products diffusion, high-throughput modeling, manufacturing at economic optimum, predictive optimal supply-chain, intelligent security, reduced virtual bureaucracy, emerging simplicity, and handling the curse of decentralization. The future trends that are expected from the soft computing technologies which may satisfy these needs are as follows: perception-based modeling, integrated systems, universal role of intelligent agents, models/rules with evolved structure, and swarm intelligence-based process control.
Arthur Kordon, Engineering & Process Sciences, Core R&D, The Dow Chemical Company, 2301 N Brazosport Blvd, Freeport, TX 77541, USA, e-mail: [email protected]

1 Introduction

One of the current trends in introducing emerging technologies in industry is the growing acceptance of different approaches of computational intelligence in general and of soft computing in particular. This trend is represented in several recently published books. An overview of different industrial implementations is given in (Jain and Martin 1999); numerous soft computing applications in the area of intelligent control are shown in (Zilouchian and Jamshidi 2001). A recent summary of various
soft computing applications in fish processing, facilities layout planning, mobile position estimation in cellular systems, etc. is given in (Karray and De Silva 2004). Of special importance for the future development of soft computing is its successful introduction into big global corporations in key industries.
The focus of this chapter is to show exactly this case and to present a systematic view of the current state of the art of soft computing applications, the open issues that need to be overcome, and the future trends in this area, based on the experience of a big global corporation, such as The Dow Chemical Company. Although limited to the chemical industry, the presented view is typical for many other industries based on manufacturing and new product development.
The proposed future trends in soft computing industrial applications are based on the accumulated experience from past successes and failures and on predicted future industrial needs. One can define this approach as a "supply–demand" method of prediction, where the future demands of industry are expected to be satisfied by the supply of research approaches and ideas. From the predicted future industrial needs, the attention is focused on those that could be resolved by soft computing or computational intelligence technologies. The hypothesis is that these needs will navigate the direction of future research in soft computing and inspire new ideas and solutions. It is assumed, though, that independently of the "supply side", research will also be driven by its internal scientific mechanisms, such as combining evolutionary development and revolutionary paradigm shifts (Barker 1989). However, predicting the future trends on the "supply side" is not a subject of this chapter. This topic is covered in different papers from academic authors; an example of defining the future directions of Genetic Fuzzy Systems is given in (Herrera 2005).
In predicting the future trends it is important to emphasize the critical role of industry in the short history of soft computing. For example, the successful industrial applications of fuzzy logic in Japan in the mid-70s gave decisive support for the development of this approach at a moment when many researchers had rejected the idea or had serious doubts about its scientific value (Karray and De Silva 2005). In similar ways, the fast acceptance in industry and the demonstrated value from many successful applications contributed to the accelerated growth of the research areas of neural networks and evolutionary computation. It is expected that industry will play this role in the future as well, and this assumption is at the basis of the "supply–demand" method of prediction.
However, in order to make the predictions more accurate, some specific nontechnical issues have to be taken into account. Industrial R&D has a faster dynamics and is subject to more frequent structural changes than the academic environment. As a result, it is more sensitive to the speed of acceptance of a new technology by several different stakeholders, such as researchers, process engineers, managers, etc. That increases significantly the importance of proper solutions to various organizational and political issues for promoting the technology from research to the business domain. Without addressing these issues there is always a potential for reducing the support or even questioning the rationale of using soft computing.
The chapter is organized in the following manner.
The starting point of our analysis is based on the current state of the art of soft computing in the chemical
industry, illustrated with typical examples in Sect. 2. The open issues in soft computing industrial applications, which will affect the future trends, are discussed in Sect. 3, followed by the projected industrial needs defined in Sect. 4. The future trends in soft computing industrial applications, driven by the expected industrial needs, are described in Sect. 5.
2 Current State of the Art of Soft Computing in the Chemical Industry

The chemical industry has some specific issues, such as:
• high dimensionality of plant data (thousands of variables and control loops);
• increased requirements for model robustness toward process changes, due to the cyclical nature of the industry;
• multiple optima;
• key process knowledge resides in process operators and is poorly documented;
• high uncertainty in new material market response, etc.
One of the observed current tendencies in the chemical industry is saturation with a variety of modeling solutions in manufacturing. As a result of the intensive modeling efforts of the last 20 years, many manufacturing processes in the most profitable plants are already supplied with different types of models (steady-state, dynamic, model predictive control, etc.). This creates a culture of modeling "fatigue" and resistance to the introduction of new solutions, especially ones based on unknown technologies. This makes the efforts of applying soft computing systems in industry especially challenging, since demonstration of significant competitive advantages relative to the alternative modeling and hardware solutions is required.
One area where soft computing systems have a clear competitive advantage is the development of simple empirical solutions in terms of models and rules. In several industrial applications it has been shown that the models generated by soft computing are a low-cost alternative to both high-fidelity models (Kordon et al. 2003a) and expensive hardware analyzers (Kordon et al. 2003b).
Different soft computing technologies have been explored by several researchers in The Dow Chemical Company since the late 80s – early 90s (Gerules et al. 1992; Kalos 1992; Kalos and Rey 1995). However, the real breakthrough in applying these technologies was achieved after consolidating the available resources in a specialized research group in 2000. It resulted in various successful applications in the area of inferential sensors, automated operating discipline, accelerated new product development, and advanced process optimization. The key accomplishments and application areas are summarized in (Kotanchek et al. 2002; Kordon et al. 2004; Kordon et al. 2005). Examples of successful industrial applications of different soft computing technologies are given below.
2.1 Fuzzy Logic Application

Operating discipline is a key factor for competitive manufacturing. Its main goal is to provide a consistent process for handling all possible situations in the plant. It is the biggest knowledge repository for plant operation. However, the documentation for operating discipline is static and is detached from the real-time data of the process. The missing link between the dynamic nature of process operation and the static nature of operating discipline documents is traditionally provided by the operations personnel. However, this makes the existing operating discipline process very sensitive to human errors, competence, inattention, or lack of time to consult the documentation.
One approach to solving the problems associated with operating discipline and making it adaptive to the changing operating environment is to use real-time hybrid intelligent systems. Such a system was successfully implemented in a large-scale chemical plant at The Dow Chemical Company (Kordon et al. 2001). It is based on integrating experts' knowledge with soft sensors and fuzzy logic. The hybrid system runs in parallel with the process; it detects and recognizes problem situations automatically in real time; it provides a user-friendly interface so that operators can readily handle complex alarm situations; it suggests the proper corrective actions via a hyper-book; and it facilitates effective shift-to-shift communication. The processed data are then supplied to feature detection, by testing against a threshold or a range. Every threshold contains a membership function as defined by the fuzzy logic approach. The values for the parameters of the membership function are based either on experts' assessment or on statistical analysis of the data.
2.2 Neural Networks Application

The execution speed of the majority of complex first-principle models is too slow for real-time operation. One effective solution is to emulate a portion of the fundamental model by a neural network or a symbolic regression model, called an emulator, built only with selected variables related to process optimization. The data for the emulator are generated by design of experiments from the first-principle model. In this specific case, neural networks are an appropriate solution because they will operate within the training range. Usually the fundamental model is represented by several simple emulators, which are implemented on-line.
One interesting benefit of emulators is that they can be used as fundamental model validation indicators as well. Complex model validation during continuous process changes requires tremendous efforts in data collection and fitting of numerous model parameters. It is much easier to validate the simple emulators and to infer the state of the complex model on the basis of the high correlation between them. An example of emulator application for intermediate product optimization between two chemical plants in The Dow Chemical Company is given in (Kordon et al. 2003a).
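The emulator idea can be illustrated with a generic sketch: a slow first-principles model is sampled over a designed grid of the optimization-relevant inputs, and a small neural network is fitted to the samples. The code below uses scikit-learn's MLPRegressor and an invented two-input toy function standing in for the fundamental model; it illustrates the workflow only, not the actual Dow application.

import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in for a slow fundamental model (two inputs kept for optimization)
def fundamental_model(x1, x2):
    return np.exp(-x1) * np.sin(3.0 * x2) + 0.5 * x1 * x2

# "Design of experiments": a full factorial grid over the operating range
x1, x2 = np.meshgrid(np.linspace(0.0, 2.0, 25), np.linspace(0.0, 1.0, 25))
X = np.column_stack([x1.ravel(), x2.ravel()])
y = fundamental_model(X[:, 0], X[:, 1])

# Small neural-network emulator trained inside the DOE range
emulator = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
emulator.fit(X, y)

# The emulator is then used on-line instead of the slow model
print(emulator.predict([[1.0, 0.5]]), fundamental_model(1.0, 0.5))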
2.3 Evolutionary Computation Application

The key creative process in fundamental model building is hypothesis search. Unfortunately, the effectiveness of hypothesis search depends very strongly on the creativity, experience, and imagination of model developers. The broader the assumption space (i.e., the higher the complexity and dimensionality of the problem), the larger the differences in modelers' performance and the higher the probability for ineffective fundamental model building.
In order to improve the efficiency of hypothesis search and to make the fundamental model discovery process more consistent, a new "accelerated" fundamental model building sequence was proposed. The key idea is to reduce the fundamental hypothesis search space by using symbolic regression, generated by Genetic Programming (GP) (Koza 1992). The main steps of the proposed methodology are shown in Fig. 1. The key difference from the classical modeling sequence is in running simulated evolution before beginning the fundamental model building. As a result of GP-generated symbolic regression, the modeler can identify key variables and assess the physical meaning of their presence/absence. Another significant side effect of the simulated evolution is the analysis of the key transforms with high fitness that persist during the GP run. Very often some of the transforms have a direct physical interpretation that can lead to better process understanding at the very early phases of fundamental model development. The key result from the GP run, however, is the list of potential nonlinear empirical models in the form of symbolic regression. The expert may select and interpret several empirical solutions or repeat the GP-generated symbolic regression until an acceptable model is found. The fundamental model building step 5 is based either on a direct use of empirical models or on independently derived first-principle models induced by the results from the symbolic regression. In both cases, the effectiveness of the whole modeling sequence can be significantly improved.
The large potential of GP-based symbolic regression for accelerated fundamental model building was demonstrated in a case study for structure–property relationships (Kordon et al. 2002). The generated symbolic solution was similar to the fundamental model and was delivered with significantly less human effort (10 hours vs. 3 months). By optimizing the capabilities for obtaining fast and reliable GP-generated functional solutions in combination with the fundamental modeling process, a real breakthrough in the speed of new product development can be achieved.
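Step 2 of the sequence in Fig. 1 (running symbolic regression before fundamental model building) can be tried with any GP package; the sketch below uses the open-source gplearn library on synthetic data and is only meant to show what a GP-generated symbolic model looks like, not the toolchain used at Dow.

import numpy as np
from gplearn.genetic import SymbolicRegressor

# Synthetic "process" data: the hidden relationship is y = x0**2 - 0.5*x1
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = X[:, 0] ** 2 - 0.5 * X[:, 1] + rng.normal(0.0, 0.01, size=200)

gp = SymbolicRegressor(population_size=500, generations=20,
                       function_set=('add', 'sub', 'mul', 'div'),
                       parsimony_coefficient=0.001, random_state=0)
gp.fit(X, y)
print(gp._program)   # the evolved symbolic expression, e.g. built from X0 and X1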
2.4 Swarm Intelligence Application

Process optimization is an area where soft computing technologies can make an almost immediate economic impact and demonstrate value. Since the early 90s various
evolutionary computation methods, mostly genetic algorithms, have been successfully applied in industry, including at The Dow Chemical Company. Recently, a new approach, Particle Swarm Optimization (PSO), has been found to be very attractive for industrial applications. The main attractiveness of PSO is that it is fast, it can handle complex high-dimensional problems, it needs a small population size, and it is simple to implement and use. Different types of PSO have been explored in The Dow Chemical Company. A hybrid PSO and Levenberg–Marquardt method was used for quick screening of complicated kinetic models (Katare et al. 2004). The PSO successfully identified the promising regions of parameter space, which were then optimized locally. A different, multi-objective PSO was investigated in (Rosca et al. 2004) and applied for real-time optimization of a color spectrum of plastics based on 15 parameters.
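Part of PSO's industrial appeal is that a bare-bones (global-best) variant fits in a couple of dozen lines. The sketch below minimizes a toy objective; it is a generic illustration, not the hybrid or multi-objective variants cited above.

import numpy as np

def pso(objective, bounds, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO; bounds is a (dim, 2) array of [low, high]."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    dim = bounds.shape[0]
    x = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.apply_along_axis(objective, 1, x)
    g = pbest[np.argmin(pbest_f)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, bounds[:, 0], bounds[:, 1])
        f = np.apply_along_axis(objective, 1, x)
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

# Toy objective: sphere function in 5 dimensions
best_x, best_f = pso(lambda z: float(np.sum(z ** 2)), [[-5, 5]] * 5)
print(best_x, best_f)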
Fig. 1 Accelerated new product development by using Genetic Programming (main steps: 1. Problem definition; 2. Run symbolic regression (GP); 3. Identify key factors & transforms; 4. Select GP-generated models; 5. Construct first principle models; 6. Select & verify the final model solution; 7. Validate the model)
3 Open Issues in Soft Computing Industrial Applications

The ultimate success of an industrial application requires support from and interaction with multiple people (operators, engineers, managers, researchers, etc.). Because it may interfere with the interests of some of these participants, human factors, even "politics", can play a significant role (Kordon 2005). On the one hand, there are many proponents of the benefits of soft computing who are ready to cooperate and take the risk to implement this new technology. Their firm commitment and enthusiasm is a decisive factor in driving the implementation to final success. On the other hand, however, there are also a lot of skeptics who do not believe in the real value of soft computing and look at the implementation efforts as a "research toy exercise". In order to address these issues, a more systematic approach to organizing the development and application efforts is recommended. The key organizational and political issues are discussed below.
3.1 Organizational Issues

1) Critical mass of developers: It is very important at this early phase of industrial applications of soft computing to consolidate the development efforts. The probability of success based only on individual attempts is very low. The best-case scenario is to create a virtual group that includes not only specialists directly involved in soft computing development and implementation, but also specialists with complementary areas of expertise, such as machine learning, expert systems, and statistics.
2) Soft computing marketing to business and research communities: Since soft computing is virtually unknown not only to business-related users but to most of the other research communities as well, it is necessary to promote the approach by significant marketing efforts. Usually this research-type marketing includes a series of promotional meetings based on two different presentations. The one directed toward the research communities focuses on the "technology kitchen", i.e., it gives enough technical details to describe the soft computing technologies, demonstrates the differences from other known methods, and clearly illustrates their competitive advantages. The presentation for the business-related audience focuses on the "technology dishes", i.e., it demonstrates with specific industrial examples the types of applications that are appropriate for soft computing, describes the work process to develop, deploy, and support a soft computing application, and illustrates the potential financial benefits.
3) Link soft computing to proper corporate initiatives: The best-case scenario we recommend is to integrate the development and implementation efforts within the infrastructure of a proper corporate initiative. A typical example is the Six Sigma initiative, which is practically a global industrial standard (Breyfogle III 2003). In this case the organizational efforts will be minimized, since the companies have already invested in the Six Sigma infrastructure and the implementation process is standard and well known.
3.2 Political Issues

1) Management support: Consistent management support for at least several years is critical for introducing any emerging technology, including soft computing. The best way to win this support is to define the expected research efforts and to assess the potential benefits from specific application areas. Of decisive importance, however, is the demonstration of any value creation by resolving practical problems as soon as possible.
2) Skepticism and resistance toward soft computing technologies: There are two key sources of this attitude. The first source is the potential user in the businesses, who is economically pushed more and more toward cheap, reliable, and easy-to-maintain solutions. In principle, soft computing applications require some training and significant cultural change. Many users are reluctant to take the risk even if they see a clear technical advantage. A persistent dialog, supported with economic arguments and examples of successful industrial applications, is needed. Sometimes sharing the risk by absorbing the development cost within the research organization is a good strategy, especially in applications that could be easily leveraged. The second source of skepticism and resistance is in the research community itself. For example, the majority of model developers prefer the first-principle approaches and very often treat the data-driven methods as "black magic" which cannot replace solid science. In addition, many statisticians express serious doubts about some of the technical claims in different soft computing technologies. The only winning strategy to change this attitude is a more intensive dialog and finding areas of common interest. An example of a fruitful collaboration between fundamental model developers and soft computing developers was given in the previous section and described in (Kordon et al. 2002). Recently, a joint effort between statisticians and soft computing developers demonstrated significant improvement in using genetic programming in industrial statistical model building (Castillo et al. 2004).
3) Lack of initial credibility: As a relatively new approach in industry, soft computing does not have a credible application history for convincing a potential industrial user. Almost any soft computing application requires a high-risk culture and significant communication efforts. The successful examples discussed in this chapter are a good start to gain credibility and increase the soft computing potential customer base in the chemical industry.
4 Projected Industrial Needs Related to Soft Computing Technologies

In defining the expected industrial needs for the next 10–15 years one has to take into account the increasing pressure for more effective and fast innovation strategies (George et al. 2005). One of the key consequences is the expected reduced cost and time of exploring new technologies. As a result, priority will be given to those methods which can deliver practical solutions with minimal investigation and
development efforts. This requirement pushes toward a better understanding not only of the unique technical capabilities of a specific approach, but also of the total cost of ownership (potential internal research, software development and maintenance, training, implementation efforts, etc.). Only approaches with a clear competitive advantage will be used for solving the problems driven by the new needs in industry. The projected industrial needs are based on the key trends of increased globalization, outsourcing, progressively growing computational power, and expanding wireless communication. The selected needs are summarized in the mind-map in Fig. 2 and are discussed below.
4.1 Predictive Marketing

Successful acceptance of any new product or service by the market is critical for industry. Until now, most modeling efforts have been focused on product discovery and manufacturing. Usually the product discovery process is based on new compositions or technical features. Often, however, products with expected attractive new features are not accepted by the potential customers. As a result, big losses are recorded and the rationale of the product discovery and the credibility of the related technical modeling efforts are questioned.
One of the possible solutions to this generic issue in industry is to improve the accuracy of market predictions with better modeling. Recently, there have been a lot of activities in using the enormous amount of data on the Internet for customer
Fig. 2 A mind-map of the future industrial needs
characterization (Swartz 2000) and modeling by intelligent agents (Said and Bouron 2001). Soft computing plays a significant role in this growing new type of modeling. The key breakthrough, however, can be achieved by modeling customers' perceptions. The subject of marketing is the customer. In predicting her response to a new product, the perception of the product is at the center of the decision-making process. Unfortunately, perception modeling is still an open area of research. However, this gap may create a good opportunity to be filled by the new approach in soft computing – perception-based computing (Zadeh 1999, 2005). With the growing speed of globalization, predicting the perception of a new product in different cultures is of special importance. If successful, the impact of predictive marketing on any type of industry will be enormous.
4.2 Accelerated New Products Diffusion

The next big gap to be filled with improved modeling is the optimal diffusion of new products, assuming a favorable market acceptance. The existing modeling efforts in this area vary from analytical methods (Mahajan et al. 2000) to agent-based simulations (Bonabeau 2002). The objective is to define an optimal sequence of actions to promote a new product into research and development, manufacturing, the supply chain, and different markets. There are different solutions for some of these sectors (Carlsson et al. 2002). What is missing, however, is an integrated approach across all sectors.
4.3 High-Throughput Modeling

Recently, high-throughput combinatorial chemistry has become one of the leading approaches for generating innovative products in the chemical and pharmaceutical industries (Cawse 2003). It is based on intensive design of experiments through a combinatorial sequence of potential chemical compositions and catalysts on small-scale reactors. The idea is to find new materials by fast data analysis. The bottleneck, however, is the speed and quality of model generation, which is much slower than the speed of experimental data generation. Soft computing could increase the efficiency of high-throughput research by adding modeling methods that can generate empirical dependencies and capture the knowledge accumulated during the experimental work.
4.4 Manufacturing at Economic Optimum

Most existing manufacturing processes operate under model predictive control systems or well-tuned PID controllers. It is assumed that the setpoints of the
controllers are calculated based on either optimal or sub-optimal conditions. However, in the majority of cases the objective functions include technical criteria and are not directly related to economic profit. Even when economics is explicitly used in the optimization, the resulting optimal control cannot cope with fast changes and swings in raw material prices. The high dynamics and sensitivity to local events of the global economy require process control methods that continuously track the moving economic optimum. It is also desirable to include a predictive component based on economic forecasts.
4.5 Predictive Optimal Supply-Chain

As a consequence of the increased outsourcing of manufacturing and the growing tendency toward purchasing over the Internet, the share of supply-chain operations in total cost has significantly increased (Shapiro 2001). One of the challenges for future supply-chain systems is the exploding amount of data they need to handle in real time because of the application of radio frequency identification (RFID) technology. It is expected that this new technology will require on-line fast pattern recognition, trend detection, and rule definition on very large data sets. Of special importance are extensions of supply-chain models that incorporate demand management decisions as well as corporate financial decisions and constraints. It is obviously suboptimal to evaluate strategic and tactical supply chain decisions without considering marketing decisions that will shift future sales to those products and geographical areas where profit margins will be highest. Without including future demand forecasts, supply-chain models lack predictive capability. Current efforts in the supply chain are mostly focused on improved data processing, analysis, and exploring different optimization methods (Shapiro 2001). What is missing in the supply-chain modeling process is including the growing experience of all participants in the process and refining the decisions with the methods of soft computing.
4.6 Intelligent Security

It is expected that, with the technological advancements in distributed computing and communication, the demand for protecting the security of information will grow. At some point the security challenges may even prevent the mass-scale application of some technologies in industry because of the high risk of intellectual property theft, manufacturing disturbances, and even process incidents. One technology in this category is distributed wireless control systems, which may include many communicating smart sensors and controllers. Without secure protection of the communication, however, it is doubtful that this technology will be accepted in manufacturing even if it has clear technical advantages.
Soft computing can play a significant role in developing sophisticated systems with built-in intelligent security features. Of special importance are the co-evolutionary approaches based on intelligent agents (Skolicki et al. 2005).
4.7 Reduced Virtual Bureaucracy

Contrary to the initial expectations of improved efficiency from global virtual offices, an exponentially growing share of electronic communication is not related to creative work activities. Virtual bureaucracy amplifies the classical bureaucratic pressure with new features such as bombardment with management emails from all levels of the hierarchy; transferring the whole communication flow to any employee in the organization; obsession with numerous virtual training programs, electronic feedback forms, and exhaustive surveys; and continuous tracking of bureaucratic tasks that pushes the individual to complete them at any cost. As a result, the efficiency of creative work is significantly reduced. Soft computing cannot eliminate the root cause of this phenomenon, which lies in business culture, management policies, and human nature. However, by modeling management decision making and analyzing the efficiency of business communication, it is possible to identify criteria for the bureaucratic content of messages. These could be used to protect the individual from virtual bureaucracy in much the same way that spam filters operate.
4.8 Emerging Simplicity

An analysis of the survivability of different approaches and modeling techniques in industry shows that the simpler the solution, the longer it is used and the lower the need to replace it with anything else. A typical case is the longevity of PID controllers, which are still the backbone of manufacturing control systems. According to a recent survey, PID is used in more than 90% of practical control systems, ranging from consumer electronics such as cameras to industrial processes such as chemical processes (Li et al. 2006). One of the reasons is their simple structure, which appeals to the generic knowledge of process engineers and operators. The defined tuning rules are also simple and easy to explain. Soft computing technologies can deliver simple solutions when multi-objective optimization with complexity as an explicit criterion is used. An example of this approach, with many successful applications in the chemical industry, is symbolic regression generated by Pareto Front Genetic Programming (Smits and Kotanchek 2004). Most of the implemented empirical models are very simple (see examples in Kordon et al. 2006) and were easily accepted by process engineers. Their maintenance efforts were low and the performance over changing process conditions was acceptable.
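To illustrate the selection criterion behind such simple solutions, the following Python sketch keeps only the candidate models that are Pareto-optimal in the trade-off between prediction error and structural complexity. It illustrates the principle of using complexity as an explicit criterion; it is not the Pareto Front Genetic Programming algorithm itself, and the model names and scores are invented.

```python
# Keep only the models that are not dominated in (error, complexity).
def pareto_front(models):
    """models: list of (name, error, complexity) tuples; lower is better for both."""
    front = []
    for name, err, cplx in models:
        dominated = any(e <= err and c <= cplx and (e < err or c < cplx)
                        for _, e, c in models)
        if not dominated:
            front.append((name, err, cplx))
    return front

# m3 is dominated by m1 (worse error and worse complexity) and is discarded.
print(pareto_front([("m1", 0.10, 25), ("m2", 0.12, 8), ("m3", 0.18, 30)]))
```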
4.9 Handling the Curse of Decentralization

The expected growth of wireless communication technology will allow the mass-scale introduction of "smart" components into many industrial entities, such as sensors, controllers, packages, parts, and products, at relatively low cost. This tendency will create a technical opportunity for the design of totally decentralized systems with built-in self-organization capabilities. On the one hand, this could lead to the design of entirely new flexible industrial intelligent systems capable of continuous structural adaptation and fast response to a changing business environment. On the other hand, the design principles and reliable operation of a totally distributed system of thousands of communicating entities of diverse nature are still a technical dream. Most existing industrial systems avoid the curse of decentralization through their hierarchical organization. However, this imposes significant structural constraints and assumes that the designed hierarchical structure is at least rational, if not optimal. The problem is the static nature of the designed structure, which is in striking contrast to the increased dynamics expected in industry in the future. Once set, the hierarchical structure of industrial systems changes either very slowly or not at all, since significant capital investment is needed. As a result, the system does not operate effectively or optimally under dynamic conditions. The appearance of totally decentralized intelligent systems of freely communicating sensors, controllers, manufacturing units, equipment, etc., could allow the emergence of optimal solutions in real time and an effective response to changes in the business environment. Soft computing technologies could be the basis for the design of such systems, and intelligent agents could be the key carrier of the intelligent component of each distributed entity.
5 Future Trends of Soft Computing Driven by Expected Industrial Needs

An analysis of the expected industrial needs shows that there will be a shift of demand from manufacturing-type, process-related models to business-related models, such as marketing, product diffusion, supply chain, security, etc. As was discussed in Sect. 2, manufacturing is saturated with different models and most of the existing plants operate close to the optimum. The biggest impact could come from using soft computing to model the human-related activities in the business process. By their nature these are based on fuzzy information, perceptions, and imprecise communication, and soft computing techniques are the appropriate methods to address these modeling challenges. The selected future trends in soft computing industrial applications are shown in Fig. 3 and discussed below.
Fig. 3 A mind-map of the future trends in soft computing industrial applications
5.1 Perception-Based Modeling

The recent developments of Prof. Zadeh in the areas of computing with words, precisiated natural language, and the computational theory of perceptions (Zadeh 1999, 2005) are the theoretical foundation of perception-based modeling. Predictive marketing could be a key application area of this approach. However, the theoretical development has to be gradually complemented with software tools suitable for real-world applications. A representative test case for a proof-of-concept application is also needed.
5.2 Integrate & Conquer

The current trend of integrating different soft computing approaches will continue in the future. Most existing industrial applications are not based on a single soft computing technique and require several methods for effective modeling. An example of the successful integration of neural networks, support vector machines, and genetic programming is given in (Kordon 2004). The three techniques complement each other to generate the final simple solution. Neural networks are used for nonlinear variable selection and reduce the number of variables. Support vector machines condense the data points to those with significant information content, i.e., the support vectors. Only the reduced, information-rich data set is used for the computationally intensive genetic programming. As a result, the model development time and cost are significantly reduced. More intensive integration efforts are needed in the future, with inclusion of the new developments in soft computing. This trend is a must for satisfying any of the needs discussed in the previous section.
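The following Python sketch shows one possible shape of such a data-condensation pipeline, assuming generic scikit-learn building blocks rather than the actual tools used in the cited applications: a small neural network ranks the variables, support vector regression keeps only the support vectors, and the reduced data set is what would be handed to the symbolic-regression step. Function names, thresholds, and model settings are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.inspection import permutation_importance

def condense_data(X, y, n_vars=5):
    # Step 1: nonlinear variable selection with a small neural network,
    # ranked here by permutation importance (an illustrative choice).
    nn = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0).fit(X, y)
    imp = permutation_importance(nn, X, y, n_repeats=10, random_state=0)
    top = np.argsort(imp.importances_mean)[::-1][:n_vars]

    # Step 2: keep only the information-rich records, i.e., the support vectors.
    svr = SVR(kernel="rbf", epsilon=0.1).fit(X[:, top], y)
    keep = svr.support_

    # The reduced set X[keep][:, top], y[keep] would then be passed to the
    # computationally intensive symbolic-regression (GP) engine.
    return X[keep][:, top], y[keep], top
```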
5.3 Universal Role of Intelligent Agents

The growing role of distributed systems in the future requires a software entity that represents a system, is mobile, and has the intelligent capabilities to interact, pursue goals, negotiate, and learn. Intelligent agents can successfully play this role, and soft computing has a significant impact on this technology (Loia and Sessa 2001). Of special importance is the use of intelligent agents as a natural integrator of the different soft computing techniques. It is possible that intelligent agents will become the universal carrier of computational intelligence. The implementation of soft computing systems based on intelligent agents can satisfy most of the future industrial needs, with a critical role in the predictive supply chain, intelligent security, and solving the curse of decentralization.
5.4 Solutions with Evolved Structure

One of the key differences between the existing industrial applications and the future needs is the transition from models and rules with a fixed structure to solutions with an evolved structure. Recently, evolving Takagi-Sugeno fuzzy systems have demonstrated impressive performance on several benchmarks and applications (Angelov and Filev 2004). They are based on incremental recursive clustering for the on-line training of fuzzy partitions, with the possibility of adjoining new or removing old fuzzy sets or rules whenever there is a change in the input-output data pattern. If successful, this approach can significantly reduce the maintenance cost of many applied models in manufacturing.
5.5 Swarm Intelligence-Based Control

One potential approach that could address the need for continuously tracking the economic optimum, and could revolutionize process control, is swarm-based control (Conradie et al. 2002). It combines the design and controller development functions into a single coherent step through the use of evolutionary reinforcement learning. In several case studies, described in (Conradie et al. 2002) and (Conradie and Aldrich 2006), the designed swarm of neuro-controllers finds and continuously tracks the economic optimum while avoiding the unstable regions. The profit gain in the case of bioreactor control is greater than 30% relative to classical optimal control (Conradie and Aldrich 2006).
6 Conclusion

The chapter summarizes the current state of the art of applying soft computing solutions in the chemical industry, based on the experience of The Dow Chemical Company, and projects the future trends in the field based on the expected future industrial needs. Several examples of successful industrial applications of different soft computing techniques are given: automated operating discipline, based on fuzzy logic; empirical emulators of fundamental models, based on neural networks; accelerated new product development, based on genetic programming; and advanced process optimization, based on swarm intelligence. The impact of these applications is a significant improvement in process operation and much faster modeling of new products. The expected future needs of industry are defined as: predictive marketing, accelerated new products diffusion, high-throughput modeling, manufacturing at economic optimum, predictive optimal supply-chain, intelligent security, reduced virtual bureaucracy, emerging simplicity, and handling the curse of decentralization. The future trends expected from the soft computing technologies that may satisfy these needs are: perception-based modeling, integrated systems, a universal role for intelligent agents, models and rules with evolved structure, and swarm intelligence-based process control.

Acknowledgments The author would like to acknowledge the contribution to the discussed industrial applications of the following researchers from The Dow Chemical Company: Flor Castillo, Elsa Jordaan, Guido Smits, Alex Kalos, Leo Chiang, and Kip Mercure, and Mark Kotanchek from Evolved Analytics.
References

Angelov P, Filev D (2004) Flexible models with evolving structure. International Journal of Intelligent Systems 19:327–340
Barker J (1989) Discovering the future: the business of paradigms. ILI Press, St. Paul
Bonabeau E (2002) Agent-based modeling: methods and techniques for simulating human systems. PNAS 99(suppl 3):7280–7287
Breyfogle III F (2003) Implementing Six Sigma, 2nd edn. Wiley, Hoboken
Carlsson B, Jacobsson S, Holmen M, Rickne A (2002) Innovation systems: analytical and methodological issues. Research Policy 31:233–245
Castillo F, Kordon A, Sweeney J, Zirk W (2004) Using genetic programming in industrial statistical model building. In: O'Reilly U, Yu T, Riolo R, Worzel B (eds) Genetic programming theory and practice II. Springer, New York, pp 31–48
Cawse J (ed) (2003) Experimental design for combinatorial and high-throughput materials development. Wiley, Hoboken
Conradie A, Miikkulainen R, Aldrich C (2002) Adaptive control utilising neural swarming. In: Proceedings of GECCO'2002, New York, pp 60–67
Conradie A, Aldrich C (2006) Development of neurocontrollers with evolutionary reinforcement learning. Computers and Chemical Engineering 30:1–17
George M, Works J, Watson-Hemphill K (2005) Fast innovation. McGraw Hill, New York
Gerules M, Kalos A, Katti S (1992) Artificial neural network application development: a knowledge engineering approach. In: 132A, AIChE 1992 annual fall meeting, Miami
Herrera F (2005) Genetic fuzzy systems: status, critical considerations and future directions. International Journal of Computational Intelligence 1:59–67
Jain J, Martin N (eds) (1999) Fusion of neural networks, fuzzy sets, and genetic algorithms: industrial applications. CRC Press, Boca Raton
Kalos A (1992) Knowledge engineering methodology for industrial applications. In: EXPERSYS-92, ITT-International
Kalos A, Rey T (1995) Application of knowledge based systems for experimental design selection. In: EXPERSYS-95, ITT-International
Karray G, De Silva C (2004) Soft computing and intelligent systems design: theory, tools and applications. Addison Wesley, Harlow
Katare S, Kalos A, West D (2004) A hybrid swarm optimizer for efficient parameter estimation. In: Proceedings of CEC'2004, Portland, pp 309–315
Kordon A (2004) Hybrid intelligent systems for industrial data analysis. International Journal of Intelligent Systems 19:367–383
Kordon A (2005) Application issues of industrial soft computing systems. In: Proc. of NAFIPS'2005, Ann Arbor
Kordon A, Kalos A, Smits G (2001) Real time hybrid intelligent systems for automating operating discipline in manufacturing. In: Artificial Intelligence in Manufacturing Workshop, Proceedings of the 17th International Joint Conference on Artificial Intelligence IJCAI-2001, pp 81–87
Kordon A, Pham H, Bosnyak C, Kotanchek M, Smits G (2002) Accelerating industrial fundamental model building with symbolic regression: a case study with structure-property relationships. In: Proceedings of GECCO'2002, New York, volume Evolutionary Computation in Industry, pp 111–116
Kordon A, Kalos A, Adams B (2003a) Empirical emulators for process monitoring and optimization. In: Proceedings of the IEEE 11th Conference on Control and Automation MED'2003, Rhodes, p 111
Kordon A, Smits G, Kalos A, Jordaan E (2003b) Robust soft sensor development using genetic programming. In: Leardi R (ed) Nature-inspired methods in chemometrics. Elsevier, Amsterdam
Kordon A, Castillo F, Smits G, Kotanchek M (2006) Application issues of genetic programming in industry. In: Yu T, Riolo R, Worzel B (eds) Genetic programming theory and practice III. Springer, New York, pp 241–258
Kotanchek M, Kordon A, Smits G, Castillo F, Pell R, Seasholtz MB, Chiang L, Margl P, Mercure PK, Kalos A (2002) Evolutionary computing in Dow Chemical. In: Proceedings of GECCO'2002, New York, volume Evolutionary Computation in Industry, pp 101–110
Koza J (1992) Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge
Li Y, Ang K, Chong G (2006) Patents, software, and hardware for PID control. IEEE Control Systems Magazine 26:42–54
Loia V, Sessa S (eds) (2001) Soft computing agents. Physica-Verlag, Heidelberg
Mahajan V, Muller E, Wind Y (eds) (2000) New product diffusion models. Kluwer, Boston
Rosca N, Smits G, Wessels J (2005) Identifying a pareto front with particle swarms. Unpublished
Said B, Bouron T (2001) Multi-agent simulation of a virtual consumer population in a competitive market. In: Proc. of the 7th Scandinavian Conference on Artificial Intelligence, pp 31–43
Schwartz D (2000) Concurrent marketing analysis: a multi-agent model for product, price, place, and promotion. Marketing Intelligence & Planning 18:24–29
Shapiro J (2001) Modeling the supply chain. Duxbury, Pacific Grove
Skolicki Z, Houck M, Arciszewski T (2005) Improving the security of water distribution systems using a co-evolutionary approach. In: Critical Infrastructure Protection Session, "Working Together: Research & Development (R&D) Partnerships in Homeland Security," Department of Homeland Security Conference, Boston
Smits G, Kotanchek M (2004) Pareto front exploitation in symbolic regression. In: O'Reilly UM, Yu T, Riolo R, Worzel B (eds) Genetic programming theory and practice II. Springer, New York, pp 283–300
Zadeh L (1999) From computing with numbers to computing with words—from manipulation of measurements to manipulation of perceptions. IEEE Transactions on Circuits and Systems 45:105–119
Zadeh L (2005) Toward a generalized theory of uncertainty (GTU) – an outline. Information Sciences 172:1–40
Zilouchian A, Jamshidi M (eds) (2001) Intelligent control systems using soft computing methodologies. CRC Press, Boca Raton
Identifying Aggregation Weights of Decision Criteria: Application of Fuzzy Systems to Wood Product Manufacturing

Ozge Uncu, Eman Elghoneimy, William A. Gruver, Dilip B Kotak and Martin Fleetwood

Ozge Uncu · Eman Elghoneimy · William A. Gruver, School of Engineering Science, Simon Fraser University, 8888 University Dr., Burnaby, BC, Canada, e-mail: {ouncu, eelghone, gruver}@sfu.ca
Dilip B Kotak · Martin Fleetwood, Institute for Fuel Cell Innovation, National Research Council, Vancouver, BC, Canada, e-mail: {dilip.kotak, martin.fleetwood}@nrc-cnrc.gc.ca

1 Introduction

A rough mill converts lumber into components with prescribed dimensions. The manufacturing process begins with lumber being transferred from the warehouse to the rough mill, where its length and width are determined by a scanner. Then, each board is cut longitudinally by the rip saw to produce strips of prescribed widths. Next, the strips are conveyed to another scanner where defects are detected. Finally, the chop saw cuts the strips into pieces with lengths based on the defect information and the component requirements. These pieces are conveyed to pneumatic kickers which sort them into bins. The rough mill layout and process of a major Canadian window manufacturer is used throughout this study. The rough mill layout and process, as described by Kotak, et al. [11], is shown in Fig. 1.

The operator receives an order in which due dates and required quantities of components with specific dimensions and qualities are listed. Since there is a limited number of sorting bins (far fewer than the number of components in the order), the operator must select a subset of the order, called the cut list, and assign it to the kickers. After selecting the cut list, the loads of lumber (called jags) that will be used to produce the components in the cut list must be selected by the operator. The operator also determines the arbor configuration and priority of the rip saw. Methods for kicker assignment have been reported by Siu, et al. [17], and jag selection has been investigated by Wang, et al. [22], [23]. A discrete event simulation model was developed by Kotak, et al. [11] to simulate the daily operation of a rough mill for specified jags, cut lists, ripsaw configuration, and orders.

Wang, et al. [22] proposed a two-step method to select the most suitable jag for a given cut list. The first step selects the best jag type based on the proximity of the length distribution of the cut list and the historical length distribution of the components cut by using the corresponding jag type.
Fig. 1 Rough mill layout and process
The second step selects the most suitable jag in the inventory with the selected jag type. The following two decision criteria are used to rank the jags: the width code of the jag and the board footage of the jag. In this model, the priorities of these two criteria are supplied a priori by the user and kept the same, even under different cut list characteristics. Wang, et al. [23] extended this method to cover a number of decision criteria while selecting the suitable jag type. In their method, the priorities of the decision criteria must be identified in order to select the most suitable jag type for a given cut list. This was achieved by using a Mamdani Type 1 fuzzy rulebase, Fuzzy AHP [4] and Fuzzy TOPSIS [5]. The fuzzy rulebase has cut list characteristics as antecedents and pair-wise comparisons of the importance of decision criteria as consequents. In the study presented in this paper, the method of Wang, et al. [23] is simplified so that conventional AHP and TOPSIS methods are utilized, thereby avoiding the computationally expensive fuzzy arithmetic and the irregularities associated with algebraic manipulation of triangular fuzzy numbers. The simplified model retains the fuzzy rulebase in order to capture the uncertainty due to different expert opinions on the importance of the decision criteria for jag type selection. Then, the simplified model is further modified and extended to include decision criteria for determining the most suitable jag of the selected jag type. The modified and extended model does not require expert input to obtain pair-wise comparisons of the importance of decision criteria. Our approach eliminates the need for expert input, which may lead to unsatisfactory results as discussed in Sect. 4. A fuzzy rulebase is used to determine the weights of the decision criteria in selecting jag types and jags under different cut list characteristics. This rulebase is tuned by using genetic algorithms (GA) [8]. Since the order to be fulfilled is constant, the objective function to be minimized is the cost associated with the selected raw material and the processing time to complete the order.

The remainder of the paper is structured as follows. Background on multi-criteria decision making methods and the jag selection approach used in the proposed
methods will be provided in Sect. 2. Then, the benchmark method and two methods that find the most suitable weights to lower the overall cost of the jag sequence will be explained in Sect. 3. The proposed approaches have been evaluated on four test order files, for which results are provided in Sect. 4. Conclusions based on the results and potential extensions are discussed in Sect. 5.
2 Background

Since the proposed methods use the jag selection approach of Wang, et al. [22], the method will be summarized. In addition, we provide a review of MCDM methods and their associated disadvantages.
2.1 Multi-Criteria Decision Making Methods

Multi-Criteria Decision Making (MCDM) involves the evaluation of alternatives on the basis of a number of decision criteria. When the alternatives and decision criteria are given, there are two questions to be answered: how are the weights of the decision criteria determined, and how are the alternatives ranked on the basis of their scores? The earliest MCDM methods, which treated the second question, are the weighted sum model (WSM) [18] and the weighted product model (WPM) [18]. WSM and WPM provide a basis to calculate scores of the alternatives in order to rank them. The following two equations are used to choose the best alternative with respect to the scores calculated by WSM and WPM, respectively:

$$A_{k^*} = \Big\{ A_k \;\Big|\; \sum_{j=1}^{m} a_{kj} w_j = \max_{i=1,\dots,n} \sum_{j=1}^{m} a_{ij} w_j \Big\} \qquad (1)$$

$$A_{k^*} = \Big\{ A_k \;\Big|\; \prod_{j=1}^{m} (a_{kj})^{w_j} = \max_{i=1,\dots,n} \prod_{j=1}^{m} (a_{ij})^{w_j} \Big\} \qquad (2)$$

where wj is the weight of the jth criterion, aij is the value of the ith alternative in terms of the jth criterion, Ak* is the best alternative, and Ak is the kth alternative.
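As a minimal illustration of Eqs. (1) and (2), the following Python sketch scores a small decision matrix with both models; the numbers are invented and the criteria are assumed to be benefit-type values on a common scale.

```python
import numpy as np

A = np.array([[0.7, 0.5, 0.9],      # rows: alternatives, columns: criteria
              [0.6, 0.8, 0.4],
              [0.9, 0.6, 0.5]])
w = np.array([0.5, 0.3, 0.2])       # criterion weights

wsm_scores = A @ w                   # Eq. (1): weighted sum per alternative
wpm_scores = np.prod(A ** w, axis=1) # Eq. (2): weighted product per alternative

print("WSM best:", np.argmax(wsm_scores), "WPM best:", np.argmax(wpm_scores))
```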
A major limitation of WSM is that it requires the criteria to be on the same scale so that the addition of scores is meaningful. WPM uses ratios to avoid such scale issues. Since the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) [9] and the Analytic Hierarchy Process (AHP) [15] will be used in Sect. 3, these methods will now be explained. For a given set of weights and alternatives, TOPSIS chooses the alternative that has the shortest distance to the positive ideal solution and the longest distance from the negative ideal solution. The set of criteria is divided into two subsets,
one to be minimized and the other to be maximized. Let the sets of criteria to be minimized and maximized be denoted as Smin and Smax, respectively, and let d(x, y) denote the Euclidean distance between vectors x and y. Then, the positive ideal solution s+ = (s1+, . . . , sm+) and the negative ideal solution s− = (s1−, . . . , sm−) are defined as follows:

$$s_j^+ = \begin{cases} \max\limits_{i=1,\dots,n} (a_{ij} w_j) & \text{if } j \in S_{\max} \\ \min\limits_{i=1,\dots,n} (a_{ij} w_j) & \text{else} \end{cases}, \qquad s_j^- = \begin{cases} \min\limits_{i=1,\dots,n} (a_{ij} w_j) & \text{if } j \in S_{\max} \\ \max\limits_{i=1,\dots,n} (a_{ij} w_j) & \text{else} \end{cases} \qquad (3)$$

Let the score vector of the ith alternative be denoted by SCVi = (sci1, . . . , scim), where scij = aij wj is the score of the ith alternative in terms of the jth criterion. Then, the best alternative is selected as follows:

$$A_{k^*} = \Big\{ A_k \;\Big|\; \frac{d(SCV_k, s^-)}{d(SCV_k, s^-) + d(SCV_k, s^+)} = \max_{i=1,\dots,n} \frac{d(SCV_i, s^-)}{d(SCV_i, s^-) + d(SCV_i, s^+)} \Big\} \qquad (4)$$
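A compact Python sketch of the ranking in Eqs. (3)–(4) follows. It works directly on the weighted scores as written above (without the column normalization used in some TOPSIS variants), and the decision matrix, weights, and criterion directions are illustrative.

```python
import numpy as np

def topsis(A, w, maximize):
    """A: alternatives x criteria matrix, w: weights, maximize: bool per criterion."""
    S = A * w                                                  # sc_ij = a_ij * w_j
    s_pos = np.where(maximize, S.max(axis=0), S.min(axis=0))   # Eq. (3)
    s_neg = np.where(maximize, S.min(axis=0), S.max(axis=0))
    d_pos = np.linalg.norm(S - s_pos, axis=1)
    d_neg = np.linalg.norm(S - s_neg, axis=1)
    closeness = d_neg / (d_neg + d_pos)                        # Eq. (4)
    return np.argsort(closeness)[::-1], closeness

ranking, c = topsis(np.array([[0.7, 0.5, 0.9], [0.6, 0.8, 0.4], [0.9, 0.6, 0.5]]),
                    np.array([0.5, 0.3, 0.2]),
                    np.array([True, True, False]))   # third criterion is a cost
print(ranking, c)
```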
The Analytic Hierarchy Process (AHP) [15] decomposes the solution of a MCDM problem into the following three steps. The first step models the problem in a hierarchical structure showing the goal, objectives (criteria), sub-objectives, and alternatives. The second step identifies the weights of each decision criterion. The domain experts compare the importance of decision criteria in pairs to create a pair-wise comparison matrix. In most of the methods derived from AHP, the criteria are assumed to be in ratio scale. The pair-wise comparison matrix has the following form:

$$PCM = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1m} \\ c_{21} & \ddots & c_{ij} & \vdots \\ \vdots & c_{ji} & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{mm} \end{bmatrix}, \quad \text{where } c_{ij} = 1/c_{ji} \text{ and } c_{ii} = 1 \qquad (5)$$
In the original AHP method, the weight vector was calculated by determining the eigenvector associated with the largest eigenvalue of the PCM. If the PCM is a perfectly consistent comparison, then the rank of the PCM is equal to 1 and its only nonzero eigenvalue is equal to m, the number of decision criteria. Thus, in order to validate the results, a consistency measure is defined with respect to the difference between the largest eigenvalue of the PCM and m. The consistency index is defined by Saaty [15] as follows:

$$CI = \frac{\lambda_{\max} - m}{m - 1} \qquad (6)$$
Then, a random index RI is calculated as the average of the consistency measures of many thousands of randomly generated PCM matrices of the form shown in (5). The ratio of CI to RI, called the consistency ratio, is used as the consistency measure. A rule of thumb is to consider the weight vector to be reliable if the
consistency ratio is less than 0.1. Then, the alternatives can be scored, ranked, and the best alternative can be selected by using a ranking method such as WSM, WPM, or TOPSIS.

The major criticisms of the original AHP method are as follows: i) the number of pair-wise comparisons grows quadratically with the number of criteria, ii) it does not deal with interdependencies between decision criteria, iii) it is sensitive to the pair-wise comparisons and measurement scales, iv) whenever another alternative is added, the ranks of the alternatives can change completely (rank reversal), and, most importantly, v) experts are asked to provide crisp numbers while building the PCM matrix. Several studies have been published to address these issues. Triantaphyllou [19] proposed a method in which the relative performance of two decision criteria is examined within a given alternative instead of comparing two alternatives at a time. To address the rank reversal issue, Belton and Gear [1] proposed to divide each relative value by the maximum value in the corresponding vector of relative values instead of having the relative values sum to one. Sensitivity to the selection of the measurement scale was studied by Ishizaka, et al. [10]. The last issue has been treated by several researchers using fuzzy set theory. Van Laarhoven and Pedrycz [21] used triangular membership functions and the least squares method to extend the AHP method to the fuzzy domain. Buckley [3] criticized the use of triangular membership functions and used trapezoidal ones to modify the fuzzy AHP method of Van Laarhoven and Pedrycz. These methods still produce unreliable results due to rank reversals and require a considerable amount of computation. Deng [7] proposed a method that used α-cuts and fuzzy extent analysis to reduce the computations in the pair-wise comparison matrix represented by triangular membership functions. Mikhailov [13] cited the criticism of Saaty [16] on the use of extent analysis and further investigated the disadvantages of utilizing a comparison matrix constructed from fuzzy judgments represented by triangular membership functions and their reciprocals. He proposed to use fuzzy linear and nonlinear programming with the reciprocals of the triangular membership functions to determine crisp weights. Fuzzy extensions of WSM, WPM, and TOPSIS have also been studied [18].
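To make the weight-derivation step concrete, here is a small Python sketch of the classical (crisp) AHP computation described above: the principal eigenvector of a pair-wise comparison matrix gives the weights, and the consistency index of Eq. (6) validates them. The 3×3 matrix is invented, and 0.58 is the commonly tabulated random index for three criteria.

```python
import numpy as np

PCM = np.array([[1.0, 3.0, 5.0],
                [1/3, 1.0, 3.0],
                [1/5, 1/3, 1.0]])

eigvals, eigvecs = np.linalg.eig(PCM)
k = np.argmax(eigvals.real)                 # principal eigenvalue
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()                    # normalized criterion weights

m = PCM.shape[0]
CI = (eigvals.real[k] - m) / (m - 1)        # Eq. (6)
RI = 0.58                                   # random index for m = 3
CR = CI / RI
print(weights, "reliable" if CR < 0.1 else "inconsistent")
```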
2.2 Jag Selection

Jag selection requires "finding the most suitable jag (among available jags) with proper capabilities to produce a given cut list" [22]. Wang, et al. [22] utilized a method of case-based reasoning, called CBR-JagSelection, where jag types are defined by mill, grade, production line, length code, and thickness. A distribution of the percentage of cut quantities in each predetermined sort length bin is calculated for each jag type, based on the average of the historical cut quantities achieved by using jags that belong to the corresponding jag type. The CBR-JagSelection algorithm can be summarized as follows. For a given cut list, the distribution of the percentage of required quantities for the components in the cut list is created. The difference between the quantity-length distribution of the cut list and the historical distribution of each jag type is calculated by using the differences between the percentages in the
two distributions and the values associated with each length bin. This difference, which can be considered a penalty measure, is used to sort the jag types. By this means, the penalty measure is a measure of the suitability of the jag type for the given cut list. Starting from the most suitable jag type, the available jags in the inventory are selected for each jag type until a maximum number of jags has been reached. The list of jags for each jag type is sorted by using criteria such as the proximity of the board footages (a measure of volume used in the lumber industry) of the jag and the cut list, and the suitability of the width of the selected jag for the given ripsaw arbor configuration and priority. For each alternative, individual scores corresponding to each decision criterion are calculated. Then the weighted average of the scores is used to rank the jags. The weights are specified by the user at the beginning of the run and are not changed until the order is finished. The sorted list of jags is presented to the user to select the jag [22]. The weights associated with the proximity of the board footages and the suitability of the width of the selected jag will be denoted by wBDT and wWDT, respectively. Users of the jag selection method have reported that it is difficult to keep wBDT and wWDT fixed during the production run, because the shop floor managers want the weights to change dynamically depending on the characteristics of the cut lists. This issue will be addressed in Sect. 3.4.

The algorithm of Wang, et al. [22] generates a sorted list of alternative jags for a given cut list. Suppose we would like to generate a list of suitable jags for a cut list with homogeneous width and constant thickness. Let the cut list consist of the set of pairs (li, qi), where li and qi are the length and the required quantity of the ith component, respectively. Furthermore, let NCO be the number of components in the cut list and suppose that the components in the cut list are arranged in descending order with respect to their length. Then, the cumulative quantity percentage distribution of the cut list can be computed as follows:

$$CPD_m = \frac{\sum_{i=1}^{m} q_i}{\sum_{i=1}^{NCO} q_i}, \quad \forall m = 1, \dots, NCO \qquad (7)$$
Each jag type in the history file has a similar distribution that will be used to measure the degree of match between the jag type and the cut list. The distribution defined on the length axis is discretized into 75 bins in the interval [50 mm, 3750 mm] with 50 mm steps. The degree of match between the cut list and the jag type distributions is calculated. Then, the jag types are sorted according to their degree of match. Next, the jags available in the inventory are found for each jag type and sorted by priority according to their board footage and width. The procedure for determining the sorted list of jags is as follows. Let NJT be the number of jag types, NCO be the number of components in the cut list, the cumulative percentage in the jth bucket of the distribution of the kth jag type JTk be CPD_JTk,j, and the value of the ith component with length li be V(li).
Step 0. Provide wBDT and wWDT.
Step 1. For all jag types, ∀k = 1, . . . , NJT, DO
  Step 1.1 Initialize the extra, E, and the cumulative penalty, Pk, with 0.
  Step 1.2 Initialize flagFeasible with 1.
  Step 1.3 For all components in the cut list, ∀i = 1, . . . , NCO, DO
    Step 1.3.1 Find the bucket j in the jag type distribution in which li resides.
    Step 1.3.2 D is defined as CPD_JTk,j + E − CPDj.
    Step 1.3.3 IF D < 0 THEN set flagFeasible to 0 and E to 0, ELSE E = D.
    Step 1.3.4 Update the penalty by Pk = Pk + |D| × V(li).
  Step 1.4 IF flagFeasible equals 0 THEN the jag type is categorized as infeasible, ELSE it is categorized as feasible.
Step 2. Sort the feasible jag types in ascending order with respect to their penalties.
Step 3. Sort the infeasible jag types in ascending order with respect to their penalties.
Step 4. Merge the two sorted feasible and infeasible jag type lists and select the best MAXJAGTYPE ones.
Step 5. For each of the remaining jag types in the sorted jag type list DO
  Step 5.1 Get the available jags from the inventory.
  Step 5.2 Sort the available jags in descending order with respect to their priority, calculated by using the jag width, the match between the board footage of the jag and the cut list, and the weights (wBDT and wWDT) specified by the user at the beginning of the run.
Algorithm 1. Jag Ranking (Jag List Creation)
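A minimal Python sketch of the jag-type ranking loop (Steps 1–4 of Algorithm 1) is given below. The data structures — the bucketed cumulative distributions, the value function V, and the bucket lookup — are assumptions introduced only for illustration.

```python
def rank_jag_types(cut_list, cpd, cpd_jt, value, bucket_of):
    """cut_list: (length, quantity) pairs sorted by descending length;
    cpd: cut-list cumulative distribution over length buckets;
    cpd_jt: one such distribution per jag type; value(l): value V(l) of length l."""
    feasible, infeasible = [], []
    for k, jt_cpd in enumerate(cpd_jt):
        extra, penalty, is_feasible = 0.0, 0.0, True
        for length, _ in cut_list:
            j = bucket_of(length)                 # Step 1.3.1
            d = jt_cpd[j] + extra - cpd[j]        # Step 1.3.2
            if d < 0:                             # Step 1.3.3
                is_feasible, extra = False, 0.0
            else:
                extra = d
            penalty += abs(d) * value(length)     # Step 1.3.4
        (feasible if is_feasible else infeasible).append((penalty, k))
    # Steps 2-4: feasible jag types first, each group in ascending penalty order.
    return sorted(feasible) + sorted(infeasible)
```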
3 Jag Sequencing Approaches

As mentioned in Sect. 2, the jag selection algorithm of Wang, et al., determines the most suitable jag for a given cut list. Because the entire production run of an order is not considered, there is no measure that can be directly associated with business goals (such as minimizing raw material cost, minimizing production time, or maximizing profit) to investigate whether the selected jags provide acceptable results. Solution of the jag sequencing problem requires determining the most suitable sequence of jags to produce all components in the order while minimizing raw material cost and production time. Thus, the objective is to minimize the costs associated with the jag sequence. Let R(Jagi) be the raw material cost of one board foot of lumber with grade equal to the grade of the ith jag in the sequence, BF(Jagi) be the board footage of the ith jag, PTC be the opportunity cost incurred for an extra second of production, and T(Jagi) be the time required to cut components out of the ith jag in the sequence. Then, the evaluation function for the jth jag sequence can be represented as:
$$E(JS_j) = \sum_{i=1}^{NJ_j} \big( R(Jag_i)\,BF(Jag_i) + T(Jag_i)\,PTC \big) \qquad (8)$$

where NJj is the number of jags in the jth jag sequence. There is also a time constraint on solving this problem. A 15-minute break in production must occur every two hours. Thus, the most suitable jag sequence should be determined within 15 minutes.
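The following short Python sketch evaluates the cost of Eq. (8) for a candidate jag sequence. The per-board-foot prices and processing times in the example are invented; PTC = 0.75 $/sec follows the setting used later in Sect. 4.

```python
PTC = 0.75  # opportunity cost of one extra second of production, $/sec

def sequence_cost(jag_sequence, ptc=PTC):
    """jag_sequence: (R_cost_per_bdft, board_footage, processing_time_sec) triples."""
    return sum(r * bf + t * ptc for r, bf, t in jag_sequence)

print(sequence_cost([(1.10, 1552.6, 1800), (0.95, 588.6, 700)]))
```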
3.1 Zero Intelligence Jag Sequencing

In order to have a benchmark against which to evaluate the proposed methods for optimizing the weights of the decision criteria used in jag selection, a method with no intelligence is used to produce the jag sequence. In this method, the jags are selected repeatedly by using the jag selection approach of Algorithm 1 until the order is fulfilled. Thus, the selected jags are optimized with respect to the selection criteria (with fixed weights) for the current cut list, and the overall objective is ignored.
3.2 Fuzzy Rulebase Based Analytic Hierarchy Process

Wang, et al. [23] extended their jag selection method to consider factors other than the penalty due to the difference between the length-quantity distribution of the cut list and that associated with the jag types. The chopsaw yield, the average processing time, the above-mentioned penalty, and the standard material cost associated with the jag types were used to rank the jag types from more desirable to less desirable. Their method determines the weights of these new decision criteria and ranks the jag types by using these weights and the values of the alternatives with respect to the decision criteria. Although AHP is a popular MCDM method for finding the weights of decision criteria, it requires experts to make pair-wise comparisons of the importance of the decision criteria. However, it is difficult for a decision maker to precisely quantify and process linguistic statements such as "how important is criterion A compared to criterion B." Fuzzy set theory [24] provides a basis for treating such problems. Wang, et al., adopted a fuzzy AHP method [4] to determine the weights. Trapezoidal membership functions were used to represent the pair-wise comparisons. Since these comparisons change depending on the characteristics of the cut list, a fuzzy rulebase, with cut list characteristics as antecedents and pair-wise comparisons as consequents, was constructed through interviews with domain experts. There are uncertainties due to different expert opinions and uncertainties due to imprecision in the historical data used in the jag selection process. The antecedent variables of the rulebase are the percentage of long sizes, the due date characteristics, and the cumulative required quantity of the cut list. The consequents of the rulebase are the pair-wise
comparisons of the importance of the decision criteria, namely the chopsaw yield, the average processing time, the penalty (as described in Sect. 2.3), and the standard material cost of the jag type. Let x1 be the percentage of long sizes in the current cut list file, li and qi denote the length and required quantity of the ith component in the order file, respectively, NCO be the number of components in the order file, and LONG be the set of components in the order file with length greater than 2000 mm. Similarly, let x2 be the due date attribute of the current cut list file, di be the due date of the ith component in the cut list file, and the due date priority of any component be 1, 2, 3, or 4, where 1 corresponds to the most urgent. Furthermore, let x3 be the scaled cumulative quantity of the current cut list file. Then, the antecedent variables of the rulebase in Fig. 2 can be defined as follows:

$$x_1 = \lambda \, \frac{\sum_{i \in LONG} l_i^2 q_i}{\sum_{j=1}^{NCO} l_j^2 q_j}, \qquad x_2 = \frac{\sum_{i=1}^{NCO} q_i}{\sum_{i=1}^{NCO} q_i d_i}, \qquad x_3 = \frac{\sum_{i=1}^{NCO} q_i}{\gamma} \qquad (9)$$

where λ and γ are set to 2 and 20000, respectively. These constants are used to normalize the antecedent variables.
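The following short Python sketch computes the three cut-list attributes following Eq. (9) as reconstructed above; the example components (length in mm, quantity, due-date priority) are invented.

```python
LAMBDA, GAMMA = 2.0, 20000.0   # normalization constants of Eq. (9)

def cutlist_attributes(components):
    """components: list of (length_mm, quantity, due_date_priority) triples."""
    lengths = [l for l, _, _ in components]
    qs = [q for _, q, _ in components]
    ds = [d for _, _, d in components]
    lq2 = [l * l * q for l, q in zip(lengths, qs)]
    x1 = LAMBDA * sum(v for v, l in zip(lq2, lengths) if l > 2000) / sum(lq2)
    x2 = sum(qs) / sum(q * d for q, d in zip(qs, ds))
    x3 = sum(qs) / GAMMA
    return x1, x2, x3

print(cutlist_attributes([(2450, 120, 2), (1800, 300, 1), (900, 500, 3)]))
```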
The rulebase used to capture the pair-wise comparisons is shown in Fig. 2. The acronyms VE, E, M, D, VD, L, N, U, S, M, L, ELS, CLS, SLS, ES, SMS, CMS, and EMS represent the linguistic labels Very Easy, Easy, Medium, Difficult, Very Difficult, Loose, Normal, Urgent, Small, Medium, Large, Extremely Less Significant, Considerably Less Significant, Somewhat Less Significant, Equal Significance, Somewhat More Significant, Considerably More Significant, and Extremely More Significant, respectively.
Fig. 2a General rulebase structure to capture fuzzy pair-wise comparisons based on cut list characteristics
Fig. 2b Sample rule
The membership functions of the fuzzy sets used in the rulebase are defined in Table 1. The rulebase contains 45 rules, corresponding to all possible combinations of the fuzzy sets that represent the antecedent variables.

In the method proposed by Wang, et al. [23], for a given cut list, x1, x2 and x3 are calculated and fuzzified by using the rulebase shown in Fig. 2. Let μi(xj) denote the membership value of the jth antecedent variable in the ith rule. The antecedent membership values are aggregated to find the degree of fire of each rule, and Zadeh's MIN operator is used to determine the output fuzzy sets in each rule, as in Mamdani type fuzzy rulebases [12]. Then, the output fuzzy sets are aggregated using Zadeh's MAX operator. Finally, the output fuzzy set is defuzzified with the centre of gravity method. After finding the defuzzified values of the pair-wise comparisons, Wang, et al., convert the results from ordinal scale to ratio scale by using Table 2. They use these ratio-scale results as the modes of triangular fuzzy numbers to construct a fuzzy pair-wise comparison matrix. The left and right spread values are determined as explained in [23]. Then, the fuzzy weights are identified by using the Fuzzy Row Means of Normalized Columns method [4].

Table 1a Fuzzy sets and membership functions of the antecedent variables
Variable   Linguistic Term   Membership function
Length     VE                (0, 0, 0.02, 0.04)
Length     E                 (0.02, 0.04, 0.16, 0.2)
Length     M                 (0.16, 0.2, 0.4, 0.5)
Length     D                 (0.4, 0.5, 0.9, 0.95)
Length     VD                (0.9, 0.95, 1, 1)
Due date   L                 (0, 0, 0.33, 0.4)
Due date   N                 (0.33, 0.4, 0.8, 0.9)
Due date   U                 (0.8, 0.9, 1, 1)
Quantity   S                 (0, 0, 0.2, 0.25)
Quantity   M                 (0.2, 0.25, 0.4, 0.45)
Quantity   L                 (0.4, 0.45, 1, 1)
Table 1b Fuzzy sets and membership functions of the scales used in pair-wise comparisons in consequent variables

Linguistic Term   Membership function
ELS               (0, 0, 0.1, 0.15)
CLS               (0.1, 0.15, 0.25, 0.3)
SLS               (0.25, 0.3, 0.4, 0.45)
ES                (0.4, 0.45, 0.55, 0.6)
SMS               (0.55, 0.6, 0.7, 0.75)
CMS               (0.7, 0.75, 0.85, 0.9)
EMS               (0.85, 0.9, 1, 1)
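A minimal sketch of the fuzzification and defuzzification machinery described above is given below: trapezoidal membership functions as in Table 1, rule firing with Zadeh's MIN, aggregation with MAX, and centre-of-gravity defuzzification on a discretized universe. The two-rule, two-input example is only illustrative and is not the 45-rule base of Wang et al.

```python
import numpy as np

def trapmf(x, a, b, c, d):
    left = 1.0 if b == a else (x - a) / (b - a)
    right = 1.0 if d == c else (d - x) / (d - c)
    return max(0.0, min(left, 1.0, right))

universe = np.linspace(0, 1, 101)   # discretized output domain

def infer(inputs, rules):
    """rules: list of (tuple of antecedent trapezoids, consequent trapezoid)."""
    aggregated = np.zeros_like(universe)
    for antecedents, consequent in rules:
        fire = min(trapmf(x, *mf) for x, mf in zip(inputs, antecedents))      # MIN
        clipped = np.minimum(fire, [trapmf(u, *consequent) for u in universe])
        aggregated = np.maximum(aggregated, clipped)                          # MAX
    return (aggregated * universe).sum() / max(aggregated.sum(), 1e-9)        # centroid

rules = [(((0.16, 0.2, 0.4, 0.5), (0.33, 0.4, 0.8, 0.9)), (0.4, 0.45, 0.55, 0.6)),
         (((0.4, 0.5, 0.9, 0.95), (0.8, 0.9, 1, 1)),       (0.55, 0.6, 0.7, 0.75))]
print(infer((0.45, 0.85), rules))
```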
After identifying the weights, the Fuzzy TOPSIS method proposed in [5] is used to rank the alternatives. Please refer to [23] for the technical details. Thus, the model has three stages: i) identify the pair-wise comparisons depending on the cut list characteristics, ii) identify the weights by processing the result of the first stage with fuzzy AHP, and iii) rank the jag types by using Fuzzy TOPSIS with the weights identified in the second stage. This model provides decision support for selecting a jag for a given cut list. The first stage of the model is extremely useful because the rules are determined by expert users through interviews, i.e., the rulebase that implies the pair-wise comparisons between the importance of the decision criteria is decided by domain experts. Thus, the use of fuzzy sets (linguistic labels) offers a valuable mechanism to capture the uncertainty of the linguistic labels due to different expert opinions under different cut list characteristics. The second stage identifies fuzzy weights associated with the criteria that affect the jag type selection. The use of fuzzy sets is beneficial if the weights are to be presented to the end user, either to give linguistic feedback or to obtain the approval of the experts.

However, the jag sequencing problem is different from selecting a jag for a given cut list. In this study, the goal is to find the most suitable weights to obtain a better jag sequence with respect to the objective function in (8). The recommendation supplied by the decision support system is not the weights but the jag sequence itself. Hence, the use of fuzzy logic is no longer necessary in the second stage. Thus, in this study, the method proposed by Wang, et al. [23] is simplified. In the simplified model, the fuzzy pair-wise comparisons are obtained by firing the fuzzy rulebase with crisp cut list attribute values. Then, the output fuzzy sets are defuzzified to obtain a crisp comparison matrix for the traditional AHP method, and the traditional TOPSIS method is utilized to rank the alternatives. This is advantageous for the following three reasons: i) imprecision due to expert opinion is still captured in the first stage, ii) computationally expensive fuzzy arithmetic operations are avoided, and iii) irregularities due to fuzzy arithmetic on triangular fuzzy numbers are avoided.

Table 2 Correspondence of ordinal to ratio scale
Pair-wise Comparison Value
Ordinal   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Ratio     1/9   1/7   1/5   1/3   1     3     5     7     9
Instead of Fuzzy AHP, this study utilizes conventional AHP on the defuzzified pair-wise comparisons. The weights are determined by the Row Means of Normalized Columns method. The jag types, with scores calculated by using these weights, are then ranked by using the conventional TOPSIS method as explained in Sect. 2.
3.3 Weight Determination Using Fuzzy Rulebase Tuned by GA

In the previous subsection, we proposed a simplified method to find the most suitable jag sequence by determining the most suitable weights for the decision criteria used to rank the jag types. This goal is achieved with the AHP method. The pair-wise comparisons of the importance of the decision criteria under different cut list characteristics, made by domain experts, are used to determine these weights. The uncertainty in these comparisons under different cut list conditions is captured by using fuzzy rulebases. However, is it necessary to use methods that require human input, such as AHP, to determine the weights? Since the simulation model proposed by Kotak, et al. [11] can quickly evaluate what-if scenarios, we utilize a metaheuristic search mechanism to identify the most suitable weights for the decision criteria without pair-wise comparisons acquired from experts. Thus, a Type 1 Mizumoto fuzzy rulebase model [14] is proposed to infer the crisp weights under different cut list conditions.

In this study, the scope of application of fuzzy models is extended. We propose the use of two types of rulebases: (i) a rulebase for the weights used to rank the jag types, and (ii) a rulebase for the weights used to rank the jags. In rulebase (i), the antecedent variables are similar to those in the rulebase of Wang, et al. [23], i.e., the size distribution, the due date characteristics, and the cumulative required quantity of the order file. Similarly, the consequent variables are the weights of the average yield of the jag type, the standard material cost of the grade associated with the jag type, and the penalty due to the difference between the size distributions associated with the jag type and the cut list. The processing time of the jag type and the standard material cost are highly correlated, both being determined by the grade of the jag type. For this reason, the weight associated with the processing time of the jag type is eliminated from the model. Thus, the input variables of the fuzzy rulebase used to select the jag type are defined as in (9). After selecting the most suitable jag type, the jags of the selected jag type are ranked by using the weights identified by rulebase (ii), in which the antecedent variables are the percentage of long components in the cut list, the board footage ratio between the jag and the cut list, and the scaled difference between the longest and second longest pieces in the current cut list. The consequent variables are the weights associated with the board footage and the width code of the jag. The first input variable is defined as in (9). The remaining two input variables of the second type of fuzzy rulebase are defined as follows:
$$x_4 = \frac{\max\limits_{i=1,\dots,NO} (l_i) \;-\; \max\limits_{i=1,\dots,NO,\; i \neq i_{\max}} (l_i)}{\phi} \qquad (10)$$

$$x_5 = \frac{\max\limits_{Jag_j \in SelectedJagSet} \big( BF(Jag_j) \big)}{BF(cutlist)} \qquad (11)$$
where φ is 3700, Jagj is the jth jag in the set of jags of the selected jag type, chosen by using the weights identified by rulebase (i), and BF(·) is the board footage of its argument.

Let NVk and NRk be the number of cut list state variables and the total number of rules for the kth decision criterion, respectively. It is assumed that the decision criteria are independent. Thus, a separate rulebase can be built for each criterion. Based on this assumption, the proposed rulebase structure for the kth decision criterion is formally represented as:

$$R_k: \;\; \underset{i=1,\dots,NR_k}{\mathrm{ALSO}} \left( \underset{j=1,\dots,NV_k}{\mathrm{AND}} \big[ x_j \in X_j \;\; isr \;\; A_{ij} \big] \rightarrow w_{ki} \in [0,1] \right), \quad \forall k = 1, \dots, NC \qquad (12)$$

where xj is the jth cut list state variable with domain Xj, Aij is the Type 1 fuzzy set (linguistic value) of the jth cut list state variable in the ith rule with membership function μi(xj): Xj → [0, 1], and wki is the scalar weight of the kth decision criterion in the ith rule. It should be noted that we have the following constraint on the membership functions:

$$\sum_{i=1}^{NR_k} \mu_i(x_j) = 1, \quad \forall j = 1, \dots, NV_k$$
where xj is any crisp value in Xj. A sample fuzzy rulebase that defines the weight of the standard material cost is given in Fig. 3. To reduce the number of rules, each antecedent variable was assigned only two linguistic labels, and the consequent constants (crisp weights) are tuned by using genetic algorithms [8]. Let L, H, LO, and U denote Low, High, Loose, and Urgent, respectively. The fuzzy sets representing the linguistic labels associated with the antecedent variables are given in Table 3. Sugeno's product, sum, and standard negation triplet is used in place of AND, OR, and NEGATION, respectively. Since the Mizumoto Type 1 fuzzy rulebase structure [14] is used, IMPLICATION is equivalent to scaling each consequent constant by the corresponding degree of fire. The aggregation of the consequents is achieved by taking the weighted average of the scaled consequents. The consequent scalar weights of the above-mentioned rulebases are tuned by using the genetic algorithm toolbox of MATLAB®.
Fig. 3 Sample rulebase with cut list characteristics as antecedent and scalar weight as consequent
Since there are only two linguistic labels for each antecedent variable and three variables in each of the rulebases designed for determining the weights used to rank jag types and jags, eight scalar values must be tuned for each weight type. Thus, a total of forty values must be tuned for the five different weight types. Therefore, each chromosome in the population has forty genes, where each gene is constrained to have a value in the unit interval. The chromosome structure is shown in Fig. 4, where wki is the scalar weight of the kth decision criterion in the ith rule, k = 1, . . . , 5, and i = 1, . . . , 8. Here w1i, w2i, w3i, w4i, and w5i stand for the weights associated with the board footage, the width code of the jag, the standard cost of the jag type, the chopsaw yield of the jag type, and the penalty due to the mismatch in the distributions associated with the cut list and the jag type, respectively. The population size is forty chromosomes. The initial population was created randomly from a uniform distribution in the unit interval. The maximum execution time was 15 minutes. The maximum number of generations allowed to have the same best solution before termination was set to 10. The default values were used for the rest of the genetic algorithm parameters of the MATLAB® GA toolbox.

Table 3 Linguistic labels and membership functions of the antecedent variables

Variable                     Linguistic Term   Membership function
Percentage Long Components   L                 (0, 0, 0.2, 0.8)
Percentage Long Components   H                 (0.2, 0.8, 1, 1)
Due Date                     LO                (0, 0, 0.2, 0.8)
Due Date                     U                 (0.2, 0.8, 1, 1)
Total Quantity               L                 (0, 0, 0.2, 0.8)
Total Quantity               H                 (0.2, 0.8, 1, 1)
BDRatio                      L                 (0, 0, 0.2, 0.8)
BDRatio                      H                 (0.2, 0.8, 1, 1)
GapBetweenLongComponents     L                 (0, 0, 0.2, 0.8)
GapBetweenLongComponents     H                 (0.2, 0.8, 1, 1)
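The sketch below shows how a rulebase of the form (12) turns three cut-list attributes into one crisp criterion weight: product AND over the two-label trapezoids of Table 3, consequents scaled by the degree of fire, and a weighted average as aggregation. The eight consequent constants stand in for one row of the chromosome of Fig. 4 and, like the inputs, are illustrative values only.

```python
from itertools import product

LOW, HIGH = (0, 0, 0.2, 0.8), (0.2, 0.8, 1, 1)   # Table 3 membership functions

def trapmf(x, a, b, c, d):
    left = 1.0 if b == a else (x - a) / (b - a)
    right = 1.0 if d == c else (d - x) / (d - c)
    return max(0.0, min(left, 1.0, right))

def criterion_weight(x, consequents):
    """x: three antecedent values in [0, 1]; consequents: 8 GA-tuned constants."""
    rules = list(product((LOW, HIGH), repeat=3))          # 2^3 = 8 rules
    fires = [trapmf(x[0], *a) * trapmf(x[1], *b) * trapmf(x[2], *c)
             for a, b, c in rules]                        # product AND
    total = sum(fires)
    return sum(f * w for f, w in zip(fires, consequents)) / total if total else 0.0

print(criterion_weight((0.3, 0.9, 0.5), [0.1, 0.2, 0.4, 0.3, 0.6, 0.8, 0.7, 0.9]))
```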
w11  w12  . . .  w18  |  w21  . . .  w28  |  . . .  |  w51  w52  . . .  w58
Fig. 4 Chromosome structure used in the genetic algorithm
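The following sketch suggests how the 40-gene chromosome of Fig. 4 could be decoded and evaluated: the genes are grouped into five 8-value consequent tables (one per decision criterion), and the fitness is the overall cost of Eq. (8) returned by the rough-mill simulation. Here `simulate_order_cost` is a hypothetical stand-in for the discrete event simulator of Kotak et al., not an interface defined in the paper.

```python
import random

N_CRITERIA, N_RULES = 5, 8   # five weight types, eight rules each

def decode(chromosome):
    """Split the 40 genes into one consequent table per decision criterion."""
    return [chromosome[k * N_RULES:(k + 1) * N_RULES] for k in range(N_CRITERIA)]

def fitness(chromosome, order, simulate_order_cost):
    weight_tables = decode(chromosome)
    # Lower Eq. (8) cost is better; the GA would minimize this value.
    return simulate_order_cost(order, weight_tables)

# Random initial population of forty chromosomes in the unit interval.
population = [[random.random() for _ in range(N_CRITERIA * N_RULES)]
              for _ in range(40)]
```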
4 Experimental Results

In order to test the proposed methods, four order files were selected from actual orders. Since the GA method is randomized, it was executed with 100 different seeds to observe whether it converged in the objective value. The three methods explained in Sect. 3 were compared with respect to the objective function defined in (8). The value of a second of production, PTC, was set to $0.75/sec. The quantity and due date distributions of these four orders are shown in Figs. 5 and 6, respectively. The total quantities of components that need to be produced in Orders 1, 2, 3, and 4 are 3415, 2410, 3312, and 3410, respectively. The cumulative lengths (lineal meters) of the components ordered in Orders 1, 2, 3 and 4 are 4338.8 m, 2609.8 m, 3944.7 m, and 3652.1 m. The gaps between the longest and the second longest components in Orders 1, 2, 3, and 4 are 45 mm, 330 mm, 541 mm, and 500 mm, respectively. The percentages of long components (length greater than 2000 mm) in Orders 1, 2, 3 and 4 are 0%, 8.1%, 27.8%, and 0.2%, respectively. The board footages of Orders 1, 2, 3 and 4 are 4191, 2521, 3811 and 2911, respectively. The first three orders have components with 6/4 (or 40 mm) thickness and 57 mm width. The last order has components with 5/4 (or 33 mm) thickness and 57 mm width.

The results for Order 1 are summarized in Table 4.
Fig. 5 Quantity distribution over length for four benchmark order files
Table 4 Results of three methods for Order 1

Zero Intelligence (OverAll Cost: $18,858)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     BT    S1    15.9  20    6/4   1552.58
2     KF    S1    15.4  1     6/4   588.645
3     KF    S1    15    1     6/4   479.108
4     PT    S1    16    1     6/4   1456.37
5     KF    CL    12.8  12    6/4   1384.93
6     PT    S1    15.2  1     6/4   1356.36
7     PT    S1    16    1     6/4   1421.13

AHP - Fuzzy Rulebase (OverAll Cost: $18,633)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     BT    S1    15.9  20    6/4   1552.58
2     KF    S1    15.4  1     6/4   588.645
3     KF    S1    15    1     6/4   479.108
4     PT    S1    16    1     6/4   1456.37
5     PT    S1    15.7  1     6/4   1563.05
6     BT    S1    8     20    6/4   771.525
7     PT    S1    15.2  1     6/4   1356.36

GA - Fuzzy Rulebase (OverAll Cost: $18,435)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     BT    S1    15.9  20    6/4   1552.58
2     KF    S1    15.4  1     6/4   588.645
3     KF    S1    15    1     6/4   479.108
4     PT    S1    16    1     6/4   1456.37
5     KF    S1    12.9  1     6/4   1809.75
6     KF    S1    12.6  1     6/4   625.793
7     BT    S1    13    20    6/4   1114.43
Although the three methods used the same number of jags to produce the order, the Zero Intelligence (ZI), AHP, and GA methods used 8239, 7767, and 7626 board feet of lumber, respectively. Moreover, the ZI method used a CL (Clear) grade jag, which is a considerably better grade, and hence more expensive, than the S1 (Shop 1) grade jags. The difference in overall cost can be attributed to these two observations. Thus, the ZI, AHP, and GA methods are ranked 3rd, 2nd, and 1st, respectively, with respect to overall cost for Order 1.
Fig. 6 Due date distribution for four benchmark order files
Table 5 Results of three methods for Order 2

Zero Intelligence (OverAll Cost: $8,898.40)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     KF    CL    12.8  12    6/4   1384.93
2     BT    S1    12    20    6/4   1157.29
3     YB    S3    14.2  25    6/4   789.623
4     KF    S1    12.6  1     6/4   625.793
5     KF    S1    15    1     6/4   479.108

AHP - Fuzzy Rulebase (OverAll Cost: $10,726)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     BT    S1    15.9  20    6/4   1552.58
2     KF    CL    13    6     6/4   420.053
3     KF    S1    12.6  1     6/4   625.793
4     KF    S1    15.4  1     6/4   588.645
5     KF    S1    15    1     6/4   479.108
6     PT    S1    16    20    6/4   1165.86

GA - Fuzzy Rulebase (OverAll Cost: $7,924.40)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     KF    CL    12.8  12    6/4   1384.93
2     YB    VG    14    25    6/4   1508.76
3     YB    S3    14.2  25    6/4   789.623
4     KF    S1    15    1     6/4   479.108
The results for Order 2 are summarized in Table 5. The ZI, AHP and GA methods used different numbers of jags and different overall board footages of lumber for Order 2. The ZI, AHP, and GA methods used 4436, 4832, and 4162 board feet of lumber, respectively. Although the ZI method used a high board footage of a CL (Clear) grade jag, it also used an S3 (Shop 3) grade jag, which is a considerably lower grade than the S1 grade jags. The GA method used two jags with higher grades (CL and VG) first and then a lower grade (S3) and a medium grade (S1) to complete the production. The difference in the consumed board footages and associated grades creates the main difference in the ranks of the methods. For Order 2, the ZI, AHP, and GA methods are ranked 2nd, 3rd, and 1st, respectively, with respect to overall cost. The results for Order 3 are summarized in Table 6. This time, the ZI and AHP methods used exactly the same jag sequence. Although the GA method used the same number of jags, the board footage used to complete the order was 7665 vs. 9020. The GA method used high grade lumber to produce the longer pieces in the order first, and then it used a considerably smaller amount of middle-grade S1 jags to finish the rest of the order. For Order 3, the GA method is the winner, and the ZI and AHP methods are tied with respect to overall cost. The results for Order 4 are summarized in Table 7. Although the jag sequence obtained by the GA method used a higher number of jags than the one acquired by the AHP method, the GA method is slightly better than the AHP method with respect to overall cost. The reason is that the GA method selected jags with a total board footage less than that determined by the AHP method. The AHP method yields a better jag sequence than the ZI method. The cost differences for the jag sequences acquired by using the ZI, AHP and GA methods are shown in Fig. 7. The percentage improvement in overall cost for the three methods in each order, with respect to the worst case in the corresponding order, is given in Table 8.
Table 6 Results of three methods for Order 3

Zero Intelligence (OverAll Cost: $19,718)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     BT    CL    15.6  20    6/4   1566.86
2     BT    S1    12    20    6/4   1157.29
3     BT    S1    15.9  20    6/4   1552.58
4     KF    S1    18.2  1     6/4   1611.63
5     KF    S1    15.4  1     6/4   588.645
6     KF    S1    19.8  1     6/4   2543.18

AHP - Fuzzy Rulebase (OverAll Cost: $19,718)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     BT    CL    15.6  20    6/4   1566.86
2     BT    S1    12    20    6/4   1157.29
3     BT    S1    15.9  20    6/4   1552.58
4     KF    S1    18.2  1     6/4   1611.63
5     KF    S1    15.4  1     6/4   588.645
6     KF    S1    19.8  1     6/4   2543.18

GA - Fuzzy Rulebase (OverAll Cost: $15,283)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     KF    CL    18.6  12    6/4   2048.83
2     KF    CL    12.8  12    6/4   1384.93
3     KF    S1    15.4  1     6/4   588.645
4     KF    S1    15    1     6/4   479.108
5     KF    S1    18.2  1     6/4   1611.63
6     BT    S1    15.9  20    6/4   1552.58

Table 7 Results of three methods for Order 4

Zero Intelligence (OverAll Cost: $11,201)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     BT    S1    13.8  4     5/4   1322.75
2     BT    S1    10    4     5/4   1234.25
3     BT    S1    13.7  4     5/4   1312.17
4     BT    S1    19.6  12    5/4   755.17
5     BT    S1    11    12    5/4   267.436
6     BT    S1    11.6  6     5/4   784.03
7     BT    S1    11.4  6     5/4   903.318

AHP - Fuzzy Rulebase (OverAll Cost: $10,295)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     BT    S1    13.8  4     5/4   1322.75
2     BT    S1    10    4     5/4   1234.25
3     BT    S1    13.7  4     5/4   1312.17
4     BT    S1    11.4  6     5/4   903.318
5     BT    S1    11    12    5/4   267.436
6     BT    S1    11.6  6     5/4   784.03

GA - Fuzzy Rulebase (OverAll Cost: $10,093)
Jag#  Mill  Grde  Leng  Wdth  Thck  BdFt
1     BT    S1    13.8  4     5/4   1322.75
2     BT    S1    10    4     5/4   1234.25
3     BT    S1    13.7  4     5/4   1312.17
4     BT    S1    11    12    5/4   267.436
5     BT    S1    12    25    5/4   346.32
6     BT    S1    18.4  8     5/4   434.824
7     BT    S1    11.6  6     5/4   784.03
Fig. 7 Cost comparison of methods for different orders

Table 8 Percentage improvement in overall cost for ZI, AHP and GA methods

Order   Zero Intelligence   AHP - Fuzzy Rulebase   GA - Fuzzy Rulebase
1       0                   1.193127585            2.24307986
2       17.12287899         0                      26.12343837
3       0                   0                      22.49213916
4       0                   8.088563521            9.891973931
It should be noted that the GA method gives several sets of weights that yield jag sequences with the same overall cost, i.e., multiple solutions to the optimization problem. We checked whether the different sets of weights were close to each other by taking the averages of the weights and running the model with the averaged weights. Since the 100 executions for all order files yielded the same result, we concluded that the GA method converges to the same objective value.
5 Conclusions and Future Studies

This study extended the method proposed by Wang et al. [23] to identify the aggregation weights of a multicriteria decision making problem of jag type and jag selection. However, rather than finding suitable jag types, and hence jags, for a given cut list, the larger issue is to determine a suitable jag sequence for a given order file. Thus, the performance measure used to evaluate the proposed methods is the overall cost (including raw material and processing time costs) to complete a given order with the jag sequence. As our results have shown, the AHP method, which requires expert opinion for creating a fuzzy rulebase, may yield worse results than the Zero Intelligence method with fixed weights. This result is likely due to inconsistent pair-wise comparisons. The GA method, which utilizes a fuzzy rulebase created randomly and tuned by a genetic algorithm, consistently yielded better results than the other two methods. The major result of this study is the use of Type 1 fuzzy rulebases to capture the uncertainty due to the selection of the learning parameters of genetic algorithms. This study can be extended to Type 2 by using an approach proposed by Uncu [20] involving the selection of the values of the learning parameters in the Fuzzy C-Means algorithm [2].
In order to explain the rationale, consider the following analogy. In grey-box system modeling techniques, different expert opinions are a source of uncertainty. The learning algorithm, or different values of the learning parameters of the structure identification algorithm, can be treated as different experts. In this study, Type 1 fuzzy system models are identified by using different values of the learning parameters of the structure identification method. Thus, the genetic algorithm will be executed with different values of its learning parameters. The uncertainty due to the selection of the learning parameters could be treated as the source of Type 2 fuzziness.
References

1. Belton V., Gear T. (1983) On a shortcoming of Saaty's method of analytic hierarchies. Omega 11: 228–230
2. Bezdek J.C. (1973) Fuzzy Mathematics in Pattern Classification. Ph.D. Thesis, Cornell University, Ithaca
3. Buckley J.J. (1985) Fuzzy hierarchical analysis. Fuzzy Sets and Systems 17: 233–247
4. Chang P.T., Lee E.S. (1995) The estimation of normalized fuzzy weights. Computers & Mathematics with Applications 29: 21–42
5. Chen S.J., Hwang C.L. (1992) Fuzzy multiple attribute decision making. Lecture Notes in Economics and Mathematical Systems 375
6. Chu T.C., Lin Y.C. (2003) A fuzzy TOPSIS method for robot selection. International Journal of Advanced Manufacturing Technology 21: 284–290
7. Deng H. (1999) Multicriteria analysis with fuzzy pair-wise comparisons. International Journal of Approximate Reasoning 21: 215–331
8. Holland J.H. (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI
9. Hwang C.L., Yoon K. (1981) Multi-Attribute Decision Making: Methods and Applications. Springer-Verlag, New York
10. Ishizaka A., Balkenborg D., Kaplan T. (2005) AHP does not like compromises: the role of measurement scales. Joint Workshop on Decision Support Systems, Experimental Economics & e-Participation: 45–54
11. Kotak D.B., Fleetwood M., Tamoto H., Gruver W.A. (2001) Operational scheduling for rough mills using a virtual manufacturing environment. Proc. of the IEEE International Conference on Systems, Man, and Cybernetics, Tucson, AZ, USA 1: 140–145
12. Mamdani E.H., Assilian S. (1974) An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller. International Journal of Man-Machine Studies 7: 1–13
13. Mikhailov L. (2003) Deriving priorities from fuzzy pair-wise comparison judgements. Fuzzy Sets and Systems 134: 365–385
14. Mizumoto M. (1989) Method of fuzzy inference suitable for fuzzy control. Journal of the Society of Instrument Control Engineering 58: 959–963
15. Saaty T.L. (1980) The Analytic Hierarchy Process. McGraw Hill, New York
16. Saaty T.L. (1988) Multicriteria Decision Making: The Analytic Hierarchy Process. RWS Publications, Pittsburgh, PA
17. Siu N., Elghoneimy E., Wang Y., Fleetwood M., Kotak D.B., Gruver W.A. (2004) Rough mill component scheduling: Heuristic search versus genetic algorithms. Proc. of the IEEE Int. Conference on Systems, Man, and Cybernetics, Netherlands 5: 4226–4231
18. Triantaphyllou E., Lin C.-T. (1996) Development and evaluation of five fuzzy multiattribute decision-making methods. International Journal of Approximate Reasoning 14: 281–310
19. Triantaphyllou E. (1999) Reduction of pair-wise comparisons in decision making via a duality approach. Journal of Multi-Criteria Decision Analysis 8: 299–310
20. Uncu O. (2003) Type 2 Fuzzy System Models with Type 1 Inference. Ph.D. Thesis, University of Toronto, Toronto, Canada
21. Van Laarhoven M.J.P., Pedrycz W. (1983) A fuzzy extension of Saaty's Priority Theory. Fuzzy Sets and Systems 11: 229–241
22. Wang Y., Gruver W.A., Kotak D.B., Fleetwood M. (2003) A distributed decision support system for lumber jag selection in a rough mill. Proc. of the IEEE International Conference on Systems, Man, and Cybernetics, Washington, DC, USA 1: 616–621
23. Wang Y., Elghoneimy E., Gruver W.A., Fleetwood M., Kotak D.B. (2004) A fuzzy multiple decision support for jag selection. Proc. of the IEEE Annual Meeting of the North American Fuzzy Information Processing Society – NAFIPS'04 2: 27–30
24. Zadeh L.A. (1965) Fuzzy Sets. Information and Control 8: 338–353
On the Emergence of Fuzzy Logics from Elementary Percepts: the Checklist Paradigm and the Theory of Perceptions

Ladislav J. Kohout and Eunjin Kim
1 Introduction

The Checklist paradigm provides an organizing principle by means of which one investigates how and why specific systems of fuzzy logic emerge from more fundamental semantic and pragmatic principles. The basic construct that this paradigm uses is a checklist. Checklists are useful in two distinct but complementary ways:

• As abstract formal constructs through which, by a mathematically rigorous procedure, the semantics of a variety of fuzzy logic connectives emerge from basic cognitive principles (Bandler and Kohout 1980, 1986).
• In a concrete way, as a useful tool for acquiring the fuzzy membership function in practical applications (Bandler and Kohout 1980, Kohout and Kim 2000).

The Checklist Paradigm (Bandler and Kohout 1980) also furnishes the semantics for both point and interval-valued fuzzy logics. It clarifies the difference between these two kinds of logics and shows their mutual relationship. Through its cognitive interpretation, the Checklist paradigm establishes the fundamental link between fuzzy sets and Zadeh's theory of perceptions (Zadeh 2001). The granulation of fuzzy classes depends on the selection of the elementary percepts which are captured as atomic properties appearing in the template of the corresponding checklist.
1.1 Plethora of Fuzzy Logic Connectives

It is now well established that in fuzzy sets and systems we use a number of different many-valued logic connectives.
Ladislav J. Kohout Department of Computer Science, Florida State University, Tallahassee, Florida 32306-4530, USA Eunjin Kim Department of Computer Science, John D. Odegard School of Aerospace Science, University of North Dakota, Grand Forks ND 58202-9015, USA
In control engineering, in information retrieval, in the analysis of medical data, or when analyzing data in the aviation industry, different many-valued logics are used for different classes of application problems. Although Zadeh's first paper used max and min, in one of his subsequent papers Zadeh already used other connectives, such as the BOLD connective (now called the Lukasiewicz t-norm). Currently, the most widespread systems of connectives are based on t-norms and implication operators linked together by residuation. Other systems link the AND connective with OR by means of DeMorgan rules, but some of these systems may have an implication that does not necessarily residuate. An example is min, max and the Kleene-Dienes implication. In approximate reasoning, one is also interested in using a number of different connectives. This naturally motivates the comparative study of different logic connectives from the point of view of their adequacy for various applications.
1.2 Where did Fuzzy Logic Connectives Come From?

This practical need to know the useful properties of different systems of fuzzy logics leads to the question of whether there are any underlying epistemological principles from which the different systems may be generated. The answer to this question is provided by the Checklist Paradigm, which was proposed by Bandler and Kohout in 1979. Since then a number of new results in this area have been obtained. The purpose of this paper is to survey the most important results to date.
2 Linking the Checklist Paradigm to the Theory of Perceptions

When dealing with fuzzy sets, we are concerned not only with semantics but also with epistemological issues. This concern manifests itself in three facets that are mutually interlinked (Zadeh 2001):

• Theory of perceptions,
• Granularity, and
• Computing with words.

The issue of indiscernibility appears deeply entrenched in the very foundations of fuzzy logic in the wider sense. Indistinguishability (a tolerance relation1 in the mathematical sense) is deeply embedded in the semantics of fuzzy logic, whichever (pragmatic) interpretation we use. According to Zadeh, a granule is a fuzzy set of points having the form of a clump of elements drawn together by similarity. The Checklist paradigm provides the elementary percepts that participate in forming granules, and computing with words provides a framework for computations with granules and interpreting their meaning.

1 Tolerance is a relation that is symmetric and reflexive.
2.1 Basic Principles of the Checklist Paradigm

In this section we present an overview of the Checklist Paradigm semantics. As the Checklist Paradigm furnishes the semantics for both point and interval-valued fuzzy logics, we also clarify the difference between these two types of logics as well as their mutual relationship. The checklist paradigm puts an ordering on pairs of distinct implication operators and on other pairs of connectives. It pairs two connectives of the same type (i.e. two ANDs or two ORs or two Implications), thus providing the interval bounds on the value of the fuzzy membership function. The interval logic given by the pair of TOP and BOT connectives has as its membership function a fuzzy-valued function2 μ(Fuzz): X → P(R) with a rectangular shape. Hence it is a special case of fuzzy sets of the second type. We shall call such a logic system a checklist paradigm based fuzzy logic of the second type, or proper interval fuzzy logic. The logic which has as its membership function a real-valued function that yields a single point as the value of a logic formula, in the way analogous to the assignment of values to the elements of ordinary fuzzy sets (i.e. fuzzy sets of the 1st type), is called a singleton logic, or checklist fuzzy logic of the first kind. In the logic of the 2nd type the atomic object, the basic element of the valuation space, is a subset of a rectangular shape. In the logic of the 1st type, the atomic object is a singleton.
3 Checklist Paradigm: Its Mathematics and Cognitive Interpretation

First, we provide the checklist paradigm based semantics of a fuzzy membership function for both kinds of logic: fuzzy point logics and fuzzy interval logics. This is followed by deriving an alternative semantics of connectives for several systems of fuzzy logic by means of the checklist paradigm.
3.1 The Checklist, Degrees of Membership and Fuzzy Classes

A checklist template Q is a finite family of items. For the purpose of this paper, we restrict ourselves to specific elementary items, namely properties (Q1, Q2, ..., Qi, ..., Qn). With a template Q and a given proposition A, one can associate a specific checklist QA = (Q, A) which pairs the template Q with the proposition A. A valuation gA of a checklist QA is a function from Q to {0, 1}. Let us denote by aQ the degree δ(A) to which the proposition A holds with respect to the template Q. This degree is given by the formula
2 P(R) denotes the power set of the reals. An interval is an element of this power set.
aQ = δ(AQ) = (1/n) · Σ_{i=1..n} aiA,   where n = card(Q) and aiA = gA(Qi)    (1)
The checklist (Bandler and Kohout 1980) is used by an observer to assess the degree of membership of some thing to a specific fuzzy class; hence such a class is a cognitive construct. This thing can be an abstract object, a linguistic statement, a logic sentence or a concrete physical object. An observer can be an abstract mathematical construct, an identification algorithm, a sensor or measuring device, or a real person. This bridges the gap between the pure mathematical theory of the fuzzy membership function and its empirical acquisition. Each different choice of properties for the repertory of properties {Qi} defines a different specific class. The degree to which a given object belongs to a chosen class is then obtained by comparison of the properties of a specific checklist with the properties of this object. So δ(A) is in fact a fuzzy membership function.
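A minimal sketch of this acquisition step (with illustrative names, not taken from the original papers): the degree of membership of equation (1) is simply the proportion of checklist items the observer ticks.

def checklist_degree(valuation):
    """valuation: list of 0/1 answers gA(Qi) over the template's n items."""
    n = len(valuation)
    return sum(valuation) / n          # aQ = delta(AQ): proportion of items checked "yes"

# e.g. a template of 5 elementary percepts, 3 of which the observer checks:
print(checklist_degree([1, 0, 1, 1, 0]))   # -> 0.6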
3.1.1 Cognitive Interpretation

The checklist can be given a cognitive interpretation. In this interpretation, the properties {Qi} listed in a checklist are some elementary percepts. This establishes the fundamental link of fuzzy sets to the theory of perception. It can be seen that the granulation of fuzzy classes depends on the choice of the properties {Qi} of a corresponding checklist. This idea was first published in (Bandler and Kohout 1980), together with the mathematics of the coarse and fine structures obtained by means of the checklist. Furthermore, the interdependency of granules, as well as of different levels of granulation, can be investigated by analyzing the inter-dependency of various checklists. Non-associative triangle products of relations can be used for this purpose (Bandler and Kohout 1980, Kohout 2001, 2004, Kohout and Kim 2002).
3.2 The Emergence of Fuzzy Logic Connectives

Having just the valuations of logic formulas that provide membership functions is not enough. Membership functions are combined together by logic operations that are defined by many-valued logic connectives. In order to derive the semantics of these connectives, we shall use two checklists. Let us assume that we have two distinct checklists that are filled in by two observers, agents A and B (who may or may not in fact be the same individual). Each observer uses the checklist to assess a different thing3. Each observer then fills in the relevant properties of the given object.
3 As remarked previously, this thing may be either an abstract object, a linguistic statement, a logic sentence or a concrete physical object, etc.
As we already know, these properties provide the arguments for the valuation function. It should be noted that the "observers" may be viewed in two ways:

• In an abstract way, as a mathematical procedure, and
• In a concrete way, as a mechanism for constructing the fuzzy membership function from some empirical material.
3.2.1 Fine and Coarse Structure

A fine valuation structure of a pair of propositions A, B with respect to the template Q is a function from Q into {0,1} × {0,1} assigning to each attribute Qi the ordered pair of its values (QiA, QiB). The cardinality of the set of all attributes Qi whose pair of values equals (j, k) is denoted by αjk. We have the following constraint on the values: α00 + α01 + α10 + α11 = n. Further, we define r0 = α00 + α01, r1 = α10 + α11, c0 = α00 + α10, c1 = α01 + α11. These entities can be displayed systematically in a contingency table (see Table 1). The four αjk of the contingency table constitute its fine structure. The margins c0, c1, r0, r1 constitute its coarse structure. The coarse structure imposes bounds upon the fine structure, without determining it completely. Hence, associated with the various logical connectives between propositions are their extreme values. Thus we obtain the inequalities restricting the possible values of mi(F). See Bandler and Kohout (1980, 1986).
3.2.2 Valuation of the Fine Structure and its Cognitive Interpretation

Now let F be any logical propositional function of the propositions A and B. For i, j ∈ {0, 1} let f(i, j) be the classical truth value of F for the pair (i, j) of truth values, and let u(i, j) = αij / n. Then we define the (non-truth-functional) fuzzy assessment of the truth of the proposition F(A, B) to be

mfin(F(A, B)) = Σ_{i,j} f(i, j) · uij    (2)
This assessment operator will be called the value of the fine structure.
Table 1 Checklist Paradigm Contingency Table

               No for B    Yes for B    Row total
No for A       α00         α01          r0
Yes for A      α10         α11          r1
Column total   c0          c1           n
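The following sketch (with assumed, illustrative names) builds the fine structure of Table 1 from two 0/1 valuations over the same template and evaluates the assessment of equation (2); the example at the end uses the classical implication pattern in which every cell except α10 supports "if yes-A then yes-B".

from collections import Counter

def fine_structure(val_a, val_b):
    """val_a, val_b: equal-length lists of 0/1 answers for observers A and B.
    Returns alpha[(j, k)] = number of items with A-answer j and B-answer k."""
    alpha = Counter(zip(val_a, val_b))
    return {jk: alpha.get(jk, 0) for jk in [(0, 0), (0, 1), (1, 0), (1, 1)]}

def m_fin(f, val_a, val_b):
    """f(i, j): classical truth value of the propositional function F; u_ij = alpha_ij / n."""
    alpha, n = fine_structure(val_a, val_b), len(val_a)
    return sum(f(i, j) * alpha[(i, j)] / n for i, j in alpha)

# Classical implication: supported by every cell except (1, 0).
imp = lambda i, j: 0 if (i, j) == (1, 0) else 1
print(m_fin(imp, [1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))   # -> 0.8  (= 1 - alpha_10 / n)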
3.2.3 Percepts and Observers

When human observers summarize observational data into percepts, the fine structure of the observed properties on the basis of which the summarization is performed may be hidden. In other words, a summarization is a cognition-based measure that acts on the perceived properties. This may be conscious or subconscious. We can capture this summarization mathematically by assigning a measure to the fine structure (the interior part of the contingency table). Different measures can be assigned, each having a different cognitive and epistemic interpretation (Bandler and Kohout 1980). Let us look at the ways in which Bandler and Kohout (1980) used the measures for the summarization of the fine structure of the contingency tables of the checklists. Two observers, A and B (who may or may not in fact be the same individual), assess two different objects V and W (parts in engineering, patients in clinical observations, etc.), each using her/his own checklist, as follows:

• A uses the list on V,
• B uses the list on W.

Both checklists are derived from the same checklist template, i.e. they both contain the list of the same properties. Then, we wish to assign a measure to the degree to which observer A's saying "yes" to items on her/his checklist implies observer B's saying "yes" to the same items on her/his checklist; in brief, a measure of the support that the fine data of the checklist4 give to the statement "if yes-A then yes-B" while referring to the same item (property) on her/his respective checklist.
3.2.4 Assigning Measures to Percepts and Providing Epistemological Interpretation

In classical logic, "if yes-A then yes-B" is satisfied by "performance" whenever yes-A and yes-B occur together, and "by default" whenever no-A occurs, regardless of B's answer. Thus all entries support the statement except for α10. Thus if (in ignorance of any yet finer detail) we weight all items equally, the appropriate classical measure of support is

m1 = (α00 + α01 + α11) / (α00 + α01 + α10 + α11) = 1 − α10/n    (3)
The selection of different αij of the contingency table (Table 1) as the components of the measure formula yields different types of logic connectives. Bandler and Kohout (1980) dealt with five implication measures: m1 to m5.
4 summarized in the fine structure of the contingency table.
Fig. 1 Directed Graph of Transformations of 8-group
Their paper contains a wealth of other material on implication operators and their properties. Here we shall deal only with the first three measures. The classical approach of the first measure is not the only possible one. Another point of view, worthy of attention, says that only the cases in which an item was checked "yes-A" are relevant, that is, only the cases of satisfaction "by performance". In this view the appropriate measure would be a performance measure:

m2 = α11 / (α10 + α11) = 1 − α10/r1    (4)
Still another point of view wishes to distinguish the proportions of satisfaction "by performance", α11/n, and "by default", r0/n, and to assign as measure the better of the two, thus

m3 = (α11 ∨ (α00 + α01)) / n    (5)

Each measure generates a different implication operator, as shown in Table 2.
Table 2 Bounds on Implication Measures

Measure                        BOT-connective                              TOP-connective
m1 = 1 − (α10/n)               max(1−a, b)  (Kleene-Dienes implication)    min(1, 1−a+b)  (Lukasiewicz implication)
m2 = 1 − (α10/r1)              max(0, (a+b−1)/a)                           min(1, b/a)  (Goguen G43 implication)
m3 = (α11 ∨ (α00+α01)) / n     max(a+b−1, 1−a)                             (a ∧ b) ∨ (1−a)  (EZ (Zadeh) implication5)

5 ∨, ∧ are max, min respectively.
The bounds on normalized values6 of the interior of the contingency table are displayed in the two constraint structures, labeled as MINDIAG and MAXDIAG. The first is obtained by minimizing the values of the diagonal, the second by maximizing these.
3.2.5 Fuzzy Membership and the Emergence of Intervals

We have seen that the checklist paradigm puts an ordering on pairs of distinct implication operators. Hence, it provides a theoretical justification of interval-valued approximate inference. The measures discussed so far were implicative measures yielding implication operators. We shall denote these7 as m(Ply). Other connectives have their measures as well. For example, one may have

m1(&) = α11/n = u11,              m1(∨) = (α00 + α01 + α11)/n = u00 + u01 + u11.    (6)
m1(⊕) = (α01 + α10)/n = u01 + u10,    m1(≡) = (α00 + α11)/n = u00 + u11.    (7)
Table 3 MINDIAG and MAXDIAG constraints
6 Normalized values are the values given by uij. These can be translated directly into the names of argument variables as shown in Table 3.
7 Ply is a neutral name for an implication operator that does not put any ontological connotation onto it.
Based on the above observations one can make the following statements:

• A specific approximation measure m(F) provides the assessment of the amount of elementary percepts participating in the formation of the membership function within the framework of the fine structure of the checklist.
• Within the framework of the coarse structure, the Checklist Paradigm generates pairs of distinct connectives BOT and TOP of the same logical type that determine the end points of the intervals of fuzzy membership values expressed by m(F). Thus we have the following general form of inequality:

conBOT ≤ m(F) ≤ conTOP    (8)
For the m 1 measure, there are 16 inequalities linking the TOP and BOT types of connectives thus yielding 16 logical types of TOP and BOT pairs of connectives. Ten of these interval pairs generated by m1 are listed in Table 4. All 16 pairs of connectives derived from m 1 are dealt with in (Bandler and Kohout 1982, 1986). The most up-to-date account of the system based on m 1 appears in (Kohout and Kim 2004).
Table 4 Connectives of System with m1

Logical Type               Proposition   BOTTOM valuation     ≤   TOP valuation
Conjunction (AND)          a & b         max(0, a+b−1)        ≤   min(a, b)
Non-Disjunction (Nicod)    a ↓ b         max(0, 1−a−b)        ≤   min(1−a, 1−b)
Non-Conjunction (Sheffer)  a | b         max(1−a, 1−b)        ≤   min(1, 2−a−b)
Disjunction (OR)           a ∨ b         max(a, b)            ≤   min(1, a+b)
Non-Inverse Implication    a ←| b        max(0, b−a)          ≤   min(1−a, b)
Non-Implication            a →| b        max(0, a−b)          ≤   min(a, 1−b)
Inverse Implication        a ← b         max(a, 1−b)          ≤   min(1, 1+a−b)
Implication                a → b         max(1−a, b)          ≤   min(1, 1−a+b)
Equivalence (iff)          a ≡ b         max(1−a−b, a+b−1)    ≤   min(1−a+b, 1+a−b)
Exclusion (eor)            a ⊕ b         max(a−b, b−a)        ≤   min(2−a−b, a+b)

3.2.6 Natural Crispness and Fuzziness

The width of the interval produced by an application of a pair of associated connectives (i.e. TOP and BOT connectives) characterizes the margins of imprecision of an interval logic expression. The interval between the TOP connective and the BOT connective is directly linked to the concept of fuzziness for system m1. We define the un-normalized fuzziness of x (cf. Bandler and Kohout 1980a) as φ(x) = min(x, 1 − x); then for x in the range [0, 1], φ(x) is in the range [0, 1/2], with value 0 if and only if x is
crisp, and value 0.5 if and only if x is 0.5. The crispness is defined as the dual of fuzziness8. The Gap Theorems (Bandler and Kohout 1986), given as (9) and (10) below, determine the width of the interval as a function of the fuzziness of the arguments of a logic connective.

(a ≡TOP b) − (a ≡BOT b) = (a ⊕TOP b) − (a ⊕BOT b) = 2 min(φa, φb)    (9)
(a &TOP b) − (a &BOT b) = (a ∨TOP b) − (a ∨BOT b) = (a →TOP b) − (a →BOT b) = min(φa, φb)    (10)
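As a quick consistency check of the interval reading, the sketch below (illustrative, not from the cited papers) encodes the BOT/TOP pairs for AND and implication from Table 4 and verifies inequality (8) together with the Gap Theorem (10) on a small grid of argument values.

and_bot = lambda a, b: max(0.0, a + b - 1)        # Lukasiewicz t-norm (AND_BOT)
and_top = lambda a, b: min(a, b)                  # min (AND_TOP)
imp_bot = lambda a, b: max(1 - a, b)              # Kleene-Dienes implication (->_BOT)
imp_top = lambda a, b: min(1.0, 1 - a + b)        # Lukasiewicz implication (->_TOP)
phi     = lambda x: min(x, 1 - x)                 # un-normalized fuzziness

for a in (0.0, 0.3, 0.7, 1.0):
    for b in (0.0, 0.4, 0.9):
        assert and_bot(a, b) <= and_top(a, b)                     # inequality (8) for AND
        assert imp_bot(a, b) <= imp_top(a, b)                     # inequality (8) for ->
        # Gap Theorem (10): the interval width equals min(phi(a), phi(b)).
        assert abs((imp_top(a, b) - imp_bot(a, b)) - min(phi(a), phi(b))) < 1e-9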
3.3 Cognitive Aspects

Linguists distinguish the surface structure from the deep structure of language. The deep structure is hidden in the cognitive structures of the brain, while the surface structure manifests itself in the utterances of sentences by a speaker. A similar relationship obtains between the fine and the coarse structures of fuzzy statements. The fine structure is a deep structure that may or may not be hidden, while the coarse structure is on the surface and manifests itself in approximate reasoning.
4 Groups of Transformations

Logic transformations are useful in investigating the mutual interrelationships of logic connectives. The global structure of systems of logic connectives can be fruitfully studied by employing the abstract group properties of their group transformations. A specific abstract group provides a global structural characterization of a specific family of permutations that concretely represent this abstract group. This idea can be used for a global characterization of logic connectives.
4.1 Abstract Groups and their Realization by the Groups of Transformations

An abstract group captures the general structure of a whole class of systems of logics. The concrete realization of such an abstract group is a group of transformations, which captures how the connectives transform into each other. This, of course, also captures the interactions of the connectives.
8 This idea of the natural crispness of a fuzzy proposition as κa = a ∨ (1 − a) and fuzziness as its dual φa = 1 − κa first appeared in (Bandler and Kohout 1978).
An abstract group has to be distinguished from its realization by transformations (Weyl 1950). A realization consists in associating with each element a of the abstract group a transformation T (a) of a point-field9 in such a way that to the composition of elements of the group corresponds composition of the associated transformations T (ba) = T (b) T(a). To each point of the field over which the transformations are applied may correspond some structure, and transformations capture the laws of dependencies of these structures on each other. In our case, the points of the field of elements of transformations are the individual logic connectives, the inner structure of which is determined by the logic definitions of these connectives.
4.2 Transforming Logics with Commutative AND and OR

4.2.1 Piaget Group of Transformations

A global characterization of the two-valued connectives of logic using groups was first given by the Swiss psychologist Jean Piaget in the context of studies of human cognitive development. Given a family of logical connectives, one can apply various transformations to these. Individual logic connectives are 2-argument logic functions. Transformations are functors that, taking one connective as the argument, produce another connective. Let four transformations on basic propositional functions f(x, y) of two arguments be given as follows:

I(f) = f(x, y)       : Identity
D(f) = ¬f(¬x, ¬y)    : Dual
C(f) = f(¬x, ¬y)     : Contradual
N(f) = ¬f(x, y)      : Negation    (11)
The transformations {I, D, C, N} are called the identity, dual, contradual, and negation transformations, respectively. It is well known that for crisp (2-valued) logic these transformations determine the Piaget group. This group of transformations is a realization of the abstract Klein 4-element group. The group multiplication table is shown as (12) below.

 ◦ | I  C  N  D
 I | I  C  N  D
 C | C  I  D  N
 N | N  D  I  C
 D | D  N  C  I    (12)
9 A transformation is a relation with special properties (i.e. a function) that has a domain X and a range Y. These two taken together (i.e. X ∪ Y) form a set called the field.
It is well known that the Piaget group of transformations is satisfied by some many-valued logics (Dubois and Prade 1980). For all the logics described in the quoted reference, the connectives characterized by the Piaget group are just a single family that can be used for point-based many-valued logic inference. By point-based we mean an inferential system where any formula has just a single truth-value from the interval [0, 1]. Kohout and Bandler (1979, 1992), however, demonstrated that these transformations also apply to interval fuzzy logics. This has been further investigated in subsequent papers; see the list of references in (Kohout and Kim 2004) for further details.
4.2.2 Kohout and Bandler Group of Transformations

Adding new non-symmetrical transformations to those defined by Piaget enriches the algebraic structure of logical transformations. Kohout and Bandler (1979) added the following non-symmetric operations to the four symmetric transformations defined above:

LC(f) = f(¬x, y),    RC(f) = f(x, ¬y)
LD(f) = ¬f(¬x, y),   RD(f) = ¬f(x, ¬y)    (13)
The group multiplication table that includes the asymmetrical transformations given by (13) is shown in Table 5. It is an 8-element symmetric group S2×2×2. It can be seen that it contains the abstract Klein 4-group as its subgroup. In order to see the link of these abstractions to logic, we have to describe the concrete realization of the abstract group by means of the transformations of Piaget with the added extensions provided by Kohout and Bandler (1979). We can see that the transformations discovered by Piaget are symmetrical in the arguments x, y with respect to the application of negation, while those provided by Kohout and Bandler are asymmetrical in x, y. The point-field of the group of transformations that realizes the 8-element symmetric abstract group (cf. Table 5) consists of the set of points CON. The transformations TP are listed in Table 6 together with the set CON.

Table 5 Abstract Symmetric 8-element group

 ∘  | I   C   N   D   LC  RC  LD  RD
 I  | I   C   N   D   LC  RC  LD  RD
 C  | C   I   D   N   RC  LC  RD  LD
 N  | N   D   I   C   LD  RD  LC  RC
 D  | D   N   C   I   RD  LD  RC  LC
 LC | LC  RC  LD  RD  I   C   N   D
 RC | RC  LC  RD  LD  C   I   D   N
 LD | LD  RD  LC  RC  N   D   I   C
 RD | RD  LD  RC  LC  D   N   C   I
Table 6 Group of transformations realizing the abstract 8-element group

CON = {&, ↓, |, ∨, ←|, →|, ←, →}

With the connectives listed in the order of CON, each transformation maps them, respectively, to:
I  : &, ↓, |, ∨, ←|, →|, ←, →
C  : ↓, &, ∨, |, →|, ←|, →, ←
N  : |, ∨, &, ↓, ←, →, ←|, →|
D  : ∨, |, ↓, &, →, ←, →|, ←|
LC : ←|, →|, ←, →, &, ↓, |, ∨
RC : →|, ←|, →, ←, ↓, &, ∨, |
LD : ←, →, ←|, →|, |, ∨, &, ↓
RD : →, ←, →|, ←|, ∨, |, ↓, &
It is revealing to display the concrete group of transformations as a graph where the graph edges are labeled by the names of the transformations that were defined abstractly by (11) and (13) above.
5 Group Representation of the Checklist Paradigm Interval Based Fuzzy Logic Systems

In this section we shall look at system m1, whose transformation group representation is a representation of the 8-element symmetric abstract group S2×2×2 introduced and described above. This group includes implications, AND, OR and some other connectives forming a closed system of connectives. The transformations of another closed system of connectives, containing equivalence and exclusive-or, form a subgroup of the S2×2×2 group.
5.1 Interval System m1 with Lukasiewicz and Kleene-Dienes Bounds

The full systems of connectives we are concerned with contain 16 connectives, ten of which are genuinely two-argument. The following two theorems are concerned with closed subsystems that contain eight connectives.
5.1.1 Group Representation of Subsystems with Eight Connectives

Theorem 1. (Kohout and Bandler 1992) The closed set of connectives generated from the Lukasiewicz implication operator a →5 b = min(1, 1 − a + b) by the transformation T is listed below. This set of connectives together with T is a realization of the abstract group S2×2×2.
Connective / Transformation                 Type of interval bound
g1 = I(→5)  = min(1, 1 − a + b)             →TOP
g2 = C(→5)  = min(1, 1 + a − b)             ←TOP
g3 = D(→5)  = max(0, b − a)                 ←|BOT
g4 = N(→5)  = max(0, a − b)                 →|BOT
g5 = LC(→5) = min(1, a + b)                 ∨TOP
g6 = LD(→5) = max(0, 1 − a − b)             ↓BOT
g7 = RC(→5) = min(1, 2 − a − b)             |TOP
g8 = RD(→5) = max(0, a + b − 1)             &BOT
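The closed set of Theorem 1 can be reproduced mechanically. The sketch below (with illustrative names) encodes the transformations of (11) and (13) as higher-order functions, applies them to the Lukasiewicz implication, and spot-checks two of the resulting connectives against the list above.

ply5 = lambda a, b: min(1.0, 1 - a + b)               # a ->5 b, Lukasiewicz implication

T = {
    "I":  lambda f: lambda a, b: f(a, b),
    "C":  lambda f: lambda a, b: f(1 - a, 1 - b),     # contradual
    "N":  lambda f: lambda a, b: 1 - f(a, b),         # negation
    "D":  lambda f: lambda a, b: 1 - f(1 - a, 1 - b), # dual
    "LC": lambda f: lambda a, b: f(1 - a, b),
    "RC": lambda f: lambda a, b: f(a, 1 - b),
    "LD": lambda f: lambda a, b: 1 - f(1 - a, b),
    "RD": lambda f: lambda a, b: 1 - f(a, 1 - b),
}

# Spot-check g5 (OR_TOP) and g8 (AND_BOT) of Theorem 1 on a grid of arguments:
grid = [i / 10 for i in range(11)]
for a in grid:
    for b in grid:
        assert abs(T["LC"](ply5)(a, b) - min(1.0, a + b)) < 1e-9
        assert abs(T["RD"](ply5)(a, b) - max(0.0, a + b - 1)) < 1e-9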
Theorem 2. (Kohout and Kim 1997). The closed set of connectives generated by the Kleene-Dienes implication operator a →6 b = max(1 − a, b) by the transformation T is listed below. This set of connectives together with T is a realization of the abstract group S2×2×2 .
Connective / Transformation                 Type of interval bound
g1 = I(→6)  = max(1 − a, b)                 →BOT
g2 = C(→6)  = max(a, 1 − b)                 ←BOT
g3 = D(→6)  = min(1 − a, b)                 ←|TOP
g4 = N(→6)  = min(a, 1 − b)                 →|TOP
g5 = LC(→6) = max(a, b)                     ∨BOT
g6 = LD(→6) = min(1 − a, 1 − b)             ↓TOP
g7 = RC(→6) = max(1 − a, 1 − b)             |BOT
g8 = RD(→6) = min(a, b)                     &TOP
5.1.2 4-Group Structure of the Group Representation of the Subsystem {≡TOP, ⊕BOT, ≡BOT, ⊕TOP} of m1

It is interesting to see that the Klein group, which is a subgroup of the S2×2×2 group, characterizes the structure of some TOP–BOT pairs of interval logic systems generated by the checklist paradigm. Thus the application of T to the set of interval connectives {≡TOP, ⊕BOT, ≡BOT, ⊕TOP} yields a representation that has the structure of the Klein group.
5.1.3 The TOP and BOT Bounds-Preserving Interval Inferences

Definition 1. A subset A of a set B of connectives is an elementary closed set iff it is a knot in A. This means that

1. any element in B is reachable from any other element in B, and
2. no element outside B is reachable from B by any number of applications of the transformation T.

The closure is relative with respect to the application of a specific set of transformations T.

Definition 2. A pure set of connectives is a set that

1. contains only the connectives of one classification type (e.g. only TOP, or only BOT, or only Min-diag, or only Max-diag, etc.), and
2. is an elementary closed set.

Theorem 3. (Kohout and Kim 1997). The following sets of connectives are pure sets of connectives with respect to the transformation T{I, C, LC, RC}:

1) The set of TOP connectives {&, ↓, ←|, →|}. This set is TOP-bound preserving.
2) The set of BOT connectives {&, ↓, ←|, →|}. This set is BOT-bound preserving.
3) The set of TOP connectives {|, ∨, ←, →}. This set is TOP-bound preserving.
4) The set of BOT connectives {|, ∨, ←, →}. This set is BOT-bound preserving.
5.1.4 Max-diag and Min-diag Classification of Interval Connectives

We have seen that the interval TOP and BOT classes of connectives represent the extremes of the fine structure that is captured by the contingency tables. Contingency tables attain these extreme values for minimal and maximal values on their diagonal, respectively. This yields another classification of interval connectives. This classification reflects the properties of the fine structure, of which the coarse structure is an approximation.

• A contingency table that has a zero on the diagonal, i.e. αii = 0, yields a connective of Min-diag type.
• A contingency table that has a zero off its diagonal, that is where αij = 0 or αji = 0 and j ≠ i, yields a connective of Max-diag type.

Theorem 4. (Kohout and Kim 1997). The following sets of connectives are pure sets of connectives with respect to the transformation T{I, N, C, D}:

1) The set of Max-diag connectives {&TOP, ↓TOP, |BOT, ∨BOT}.
2) The set of Min-diag connectives {&BOT, ↓BOT, |TOP, ∨TOP}.
3) The set of Min-diag connectives {←BOT, →BOT, ←|TOP, →|TOP}.
4) The set of Max-diag connectives {←TOP, →TOP, ←|BOT, →|BOT}.
5.2 Classification of Connectives

From the previous theorems we can see that there are at least three kinds of classification of connectives generated by the checklist paradigm semantics, namely,
• Interval-based: the TOP-BOT interval pair giving the bounds on the values of the interval. See Table 4 above; for further details see (Bandler and Kohout 1982, 1986).
• Constraint-based (Theorem 4): Maxdiag-Mindiag, characterizing the fine structure of the checklist paradigm. For further details see (Kohout and Kim 1997).
• Group-transformation-based: studied in great detail in (Kohout and Kim 2004). The subset {≡TOP, ⊕BOT, ≡BOT, ⊕TOP} given above was just an example of one of the subsystems that fall into this classification.
5.3 Non-commutative Systems of Fuzzy Logics

Recently several works have been published on systems of fuzzy logics with non-commutative AND and OR connectives. It should be noted that the Checklist paradigm has provided semantics for some of these as well. In particular, the systems based on the m2 and m3 measures introduced above are in this category. We shall overview system m2 in some detail. It is obvious that a non-commutative system of logic has to contain two AND and also two OR connectives. As a consequence of non-commutativity it will also contain two implication operators (denoted by right arrows →) and two co-implication operators (←). As a further consequence some other connectives will also be duplicated. We denote the duplicate connectives by bullets attached to the symbol of the connective. For the purpose of capturing non-commutativity, a new transformation is introduced, namely a commutator K, which is added to the eight transformations already introduced:

TP = {I, C, D, N, LC, LD, RC, RD, K}    (14)
This is applied to the AND and OR connectives. The following expression is the equational definition of the commutator:

a &• b = K(a & b) = b & a;    a ∨• b = K(a ∨ b) = b ∨ a.    (15)
It is clear that for any connective ∗ the commutator yields a ∗ b = K(b ∗ a).
5.4 Commutativity and Contrapositivity of Connectives: Its Presence or Absence

Definition 3. An implication operator is contrapositive if its valuation satisfies the semantic equality of contraposition (i.e. an implication and its contrapositive receive the same value). Otherwise it is not contrapositive.

The implication operators of Kleene-Dienes (ply-bot) and Lukasiewicz (ply-top) appearing in the interval system m1 are contrapositive. When the implication operator → is not contrapositive, the AND and OR connectives generated by the application of the group-compliant logic transformation T to → are non-commutative.
Theorem 5. Let S2×2×2 be represented by the set of transformations T{I, N, C, D, LC, RC, LD, RD} applied to the set CON = {&, ∨, ↓, |, →, ←, →|, ←|}. Then, if → is contrapositive, the corresponding & and ∨ in CON must be commutative.

For the proof see (Kohout and Kim 1997).

We say that a pair of two-argument logic sentences is pairwise commutative if and only if

a ∗ b = K(b ∗ a).    (16)
Commutativity involves restrictions on the transformations of connectives, as does contrapositivity.

Theorem 6.
• For any contrapositive → the following equalities hold:

C[K(a → b)] = K[C(a → b)] = a → b    (17)

• For a non-contrapositive →, (17) fails, but the following equality holds:

K(C(K(C(a → b)))) = a → b.    (18)
For the proof see (Kohout and Kim 1997).
5.5 The Group of Transformations of the Interval System m2 of Connectives

First, we shall examine two closed subsystems generated by the implications of this system. The abstract group structure of these is captured by S2×2×2. Each of these subsystems contains one AND and one OR of the non-commutative pair of these connectives. This will be followed by the presentation of the 16-element S2×2×2×2 abstract group and by the graph of the corresponding transformation group representation.

Theorem 7. The following closed set of connectives is generated from the ply-top implication operator a →4 b = min(1, b/a) by the transformation TP:
→   = I(→4)  = min(1, b/a)
←•  = C(→4)  = min(1, (1−b)/(1−a))
←|• = D(→4)  = max(0, (b−a)/(1−a))
→|  = N(→4)  = max(0, (a−b)/a)
∨   = LC(→4) = min(1, b/(1−a))
↓   = LD(→4) = max(0, (1−a−b)/(1−a))
|   = RC(→4) = min(1, (1−b)/a)
&   = RD(→4) = max(0, (a+b−1)/a)
This set of connectives together with TP is a realization of the abstract group S2×2×2. It is a closed subsystem of connectives. This is, however, not the full story. There is another representation that generates the same abstract group S2×2×2. It is generated by the co-implication operator a ←4 b = min(1, a/b).

Theorem 8. The following closed set of connectives is generated from the ply-top co-implication operator a ←4 b = min(1, a/b) by the transformation TP:

←   = I(←4)  = min(1, a/b)
→•  = C(←4)  = min(1, (1−a)/(1−b))
→|• = D(←4)  = max(0, (a−b)/(1−b))
←|  = N(←4)  = max(0, (b−a)/b)
∨•  = LC(←4) = min(1, a/(1−b))
↓•  = LD(←4) = max(0, (1−b−a)/(1−b))
|•  = RC(←4) = min(1, (1−a)/b)
&•  = RD(←4) = max(0, (b+a−1)/b)
The transformation group of this set is a representation of the S2×2×2 abstract group. It is a closed subsystem of connectives.

Corollary 9. Application of the commutator K to any connective from the closed system of connectives given by Theorem 7 yields the connective of the same type that belongs to the closed system of connectives of Theorem 8, and vice versa.

Proof. A direct application of the commutator K, defined by (16) above, to a connective from one subsystem yields the corresponding connective of the same type in the other subsystem.

Non-commutative systems contain two AND and two OR connectives. The closed system of Theorem 7 contains the & and ∨ connectives. The closed system of Theorem 8 contains their counterparts, the &• and ∨• connectives. These subsystems are linked together by the commutator as shown in Corollary 9.
Table 7 Interval connectives emerging from measure m2

Bot    CONbot(a, b)                 ≤   CONtop(a, b)              Top
&      max(0, (a+b−1)/a)            ≤   min(1, b/a)               →
←|     max(0, (b−a)/b)              ≤   min(1, (1−a)/b)           |•
↓      max(0, (1−a−b)/(1−a))        ≤   min(1, (1−b)/(1−a))       ←•
→|•    max(0, (a−b)/(1−b))          ≤   min(1, a/(1−b))           ∨•
→|     max(0, (a−b)/a)              ≤   min(1, (1−b)/a)           |
&•     max(0, (a+b−1)/b)            ≤   min(1, a/b)               ←
←|•    max(0, (b−a)/(1−a))          ≤   min(1, b/(1−a))           ∨
↓•     max(0, (1−a−b)/(1−b))        ≤   min(1, (1−a)/(1−b))       →•
Ten two-argument connectives of the system based on measure m2 are shown in Table 7. All sixteen connectives are derived and further examined in (Kohout and Kim 2005). The non-commutative interval system that emerges from the fine structure of the Checklist paradigm by application of the measure m2 has a 16-element transformation group, the structure of which is captured in Fig. 2. The next question that naturally arises is: "What abstract group is represented by Fig. 2?" The multiplication table of this group, shown in Table 8, answers this question. Further details concerning the non-commutative interval system based on the measure m2, which contains the Goguen implication and co-implication, can be found in (Kohout and Kim 1997, 2005).
Fig. 2 Graph of Transformations of 16-symmetric group
Table 8 Abstract Symmetric 16-element Group S2×2×2×2
The group classification of the subsystems of connectives of this non-commutative system appears in (Kohout and Kim 2005). It looks at the structure of the system in terms of the subgroups of S2×2×2×2.
5.6 Interval versus Point Fuzzy Logics

Fuzzy interval logic systems are usually treated as completely different from fuzzy point logics. It is often believed that these two kinds of logics are not linked by any mathematical theory or foundational ontological and epistemological principles. We shall demonstrate that the Checklist paradigm semantics provide such unifying principles for both.
6 Distinguishing Checklist Paradigm Fuzzy Logics of the Second and the First Kind

In order to clarify these issues we have to provide some meaningful definitions first.

Definition 4. Fuzzy singleton logic (also called point logic) has as its membership function a real-valued function that yields a single point as the value of a logic formula, in the way analogous to the assignment of values to the elements of ordinary fuzzy sets (i.e. fuzzy sets of the first type). Hence this system should be called a singleton logic, or the checklist fuzzy logic of the first kind.
Definition 5. Fuzzy interval logic, which will also be called proper interval fuzzy logic, has as its membership function a fuzzy-valued function μ(Fuzz): X → P(R) with a rectangular shape. It is a special case of fuzzy sets of the second type. Hence we shall also call it fuzzy logic of the second kind.

The interval logic derived with the help of m1 (see Table 2 above) is a proper interval logic – it has a membership function with a rectangular shape. Now we have to demonstrate under which conditions it may collapse into an m1-based singleton fuzzy logic, i.e. a logic of the first kind.
6.1 A Measure for the Expected Case

Systems m1 to m5 are pure fuzzy interval logics, not involving any probabilistic notions. Bandler and Kohout (1980), however, also asked the question as to how the fine structure can be characterized by expected values. When only the row and column totals ri, cj of the fine structure captured by the contingency table are known (see Table 1 above), one can ask what the expected values for the αij are. It can be seen that the ways in which numbers can be sprinkled into the cells of the contingency table so as to give the fixed coarse totals constitute a hypergeometric distribution. The inequalities determining the interval formed by the BOT and TOP pair of connectives of the same logical type change into equalities if the probabilistic notion of expected value is introduced as an additional constraint. The interval collapses into a single point, the expected value, and the MID connective emerges (Bandler and Kohout 1980). The expected values for the measure m1 and the corresponding logical types of MID connectives are listed in Table 9; see also Fig. 1 in (Bandler and Kohout 1986). Their paper lists all 16 connectives. Interestingly, the implication operator that emerges from the checklist paradigm derivation is Reichenbach's implication operator, which was introduced by Hans Reichenbach in the late 1930s.
Table 9 Expected case, system m1 – fuzzy logic of Type 1 (point logic)

Logical type              Connective     Expected value
AND                       a ∧ b          ab
OR                        a ∨ b          a + b − ab
Implication               a → b          1 − a + ab
Inverse Implication       a ← b          1 − b + ab
Non-Implication           ¬(a → b)       a(1 − b)
Non-Inverse Implication   ¬(a ← b)       (1 − a)b
Equivalence               a ≡ b          (1 − a)(1 − b) + ab
Exclusive OR              a ⊕ b          (1 − a)b + a(1 − b)
Nicod                     ¬(a OR b)      (1 − a)(1 − b)
Sheffer                   ¬(a AND b)     1 − ab
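A small sketch (illustrative, under the hypergeometric-expectation reading described above) shows the collapse for the implication: with E[αij] = ri·cj/n, the expected m1 becomes 1 − a(1 − b) = 1 − a + ab, i.e. Reichenbach's operator, and it always lies inside the m1 interval bounded by the Kleene-Dienes and Lukasiewicz implications.

def expected_mid_implication(a, b):
    # a = r1/n and b = c1/n are the degrees to which A and B hold.
    # Expected m1 = 1 - E[alpha_10]/n = 1 - a(1 - b) = 1 - a + ab (Reichenbach).
    return 1 - a * (1 - b)

# The expected (MID) value always lies inside the m1 interval of Table 2 / Table 4:
for a in [i / 10 for i in range(11)]:
    for b in [j / 10 for j in range(11)]:
        mid = expected_mid_implication(a, b)
        assert max(1 - a, b) - 1e-9 <= mid <= min(1.0, 1 - a + b) + 1e-9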
6.2 Fuzzy Singleton and Interval Logics Revisited: an Ontological Distinction

The previous section demonstrated that the epistemology offered by the Checklist paradigm could yield ontological distinctions10 between these two kinds of fuzzy logics. The bounds induced by m1, described by (1) and also shown in Table 3 above as distinctions between Maxdiag and Mindiag, of course still hold for these MID connectives. What changes is the character of the space of values which the lower and upper bounds jointly delimit. In the logic of the 2nd type the atomic object, the basic element of the valuation space, is a subset of a rectangular shape. In the logic of the 1st type, the atomic object is a singleton. The constraints that delimit the atomic objects of the logic of the 1st type are tighter, and contain more information determining a specific single connective on which the bounds of the interval operate. In the expected case for m1 these connectives are probabilistic, i.e. the connectives of the Reichenbach logic. In the case of the logic of the 2nd type we do not have information about the exact nature of the connective that is constrained by the lower and the upper bound forming the interval. For example, in the case of AND, it could be any continuous t-norm (or, more generally, any copula).
7 Conclusion

The Checklist paradigm has clarified the following issues:

• The many-valued logic connectives which form the basis for operations on fuzzy sets emerge from more basic epistemological principles that can be given a cognitive interpretation, instead of being postulated ad hoc.
• Through the distinction between the fine and the coarse structure, the Checklist paradigm demonstrates that indiscernibility is related to the semantics of fuzzy sets.
• The granules provided by the intervals are primary; the singleton membership function emerges by the collapse of the intervals into points under special circumstances.
• The Checklist paradigm also demonstrates how probability emerges from fuzziness under very special circumstances.
• This sheds some light on the issue of the ontology of subjective probabilities.
10 The word ‘ontology’ here has a broader meaning than is customary in computational intelligence or knowledge engineering. In this paper we use it in the broader sense in which it is used in logic and philosophy. Namely ‘ontology’ here means “What is there” – the basic notions under which the defined systems and their important features exist.
On the Emergence of Fuzzy Logics from Elementary Percepts:
459
some experience. This experience cannot be captured fully by counting frequencies or by postulating axioms for subjective probabilities. • The Checklist paradigm indicates that this experience could be captured within the framework of Zadeh’s theory of perceptions (Zadeh 2001). • The Checklist paradigm indicates that deeper cognitive justification of Zadeh’s Theory of perception could come from Piaget’s cognitive epistemology; in particular through the Piaget logic transformation group which has been empirically discovered and psychologically justified (Inherder and Piaget 1964).
References

Bandler W, Kohout LJ (1980) Semantics of implication operators and fuzzy relational products. Internat J of Man-Machine Studies, 12:89–116. Reprinted in: Mamdani EH, Gaines BR (eds) Fuzzy Reasoning and its Applications. Academic Press, London, pp 219–246
Bandler W, Kohout LJ (1984) Unified theory of multiple-valued logical operators in the light of the checklist paradigm. In: Proc. of the 1984 IEEE Conference on Systems, Man and Cybernetics. IEEE, New York, pp 356–364
Bandler W, Kohout LJ (1986) The use of checklist paradigm in inference systems. In: Negoita CV, Prade H (eds) Fuzzy Logic in Knowledge Engineering, Chap. 7. Verlag TÜV Rheinland, Köln, pp 95–111
Dubois D, Prade H (1980) Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York
Inhelder B, Piaget J (1964) The Early Growth of Logic in the Child. Norton, New York
Kohout LJ (2001) Boolean and fuzzy relations. In: Pardalos PM, Floudas CA (eds) The Encyclopedia of Optimization, Vol. 1 A–D. Kluwer, Boston, pp 189–202
Kohout LJ (2001a) Checklist paradigm semantics for fuzzy logics. In: Pardalos PM, Floudas CA (eds) The Encyclopedia of Optimization, Vol. 1 A–D. Kluwer, Boston, pp 23–246
Kohout LJ (2004) Theory of fuzzy generalized morphisms and relational inequalities. Internat J of General Systems, 22:339–360
Kohout LJ, Bandler W (1979) Checklist paradigm and group transformations. Technical Note EES–MMS–ckl91.2, Dept of Electrical Eng. Science, University of Essex, UK
Kohout LJ, Bandler W (1992) How the checklist paradigm elucidates the semantics of fuzzy inference. In: Proc. of the IEEE Internat Conf on Fuzzy Systems, Vol. 1. IEEE, New York, pp 571–578
Kohout LJ, Bandler W (1992a) Use of fuzzy relations in knowledge representation, acquisition and processing. In: Zadeh LA, Kacprzyk J (eds) Fuzzy Logic for the Management of Uncertainty. John Wiley, New York, pp 415–435
Kohout LJ, Bandler W (1993) Interval-valued systems for approximate reasoning based on the checklist paradigm. In: Wang P (ed) Advances in Fuzzy Theory and Technology
Kohout LJ, Kim E (1997) Global characterization of fuzzy logic systems with paraconsistent and grey set features. In: Wang P (ed) Proc. 3rd Joint Conf. on Information Sciences JCIS'97 (5th Int. Conf. on Fuzzy Theory and Technology), Vol. 1, Fuzzy Logic, Intelligent Control and Genetic Algorithms. Duke University, Research Triangle Park, NC, pp 238–241
Kohout LJ, Kim E (2000) Reasoning with cognitive structures of agents I: Acquisition of rules for computational theory of perceptions by fuzzy relational products. In: Ruan D, Kerre E (eds) Fuzzy IF-THEN Rules in Computational Intelligence, Chap. 8. Kluwer, Boston, pp 161–188
Kohout LJ, Kim E (2002) The role of BK products of relations in soft computing. Soft Computing, 6:89–91
Kohout LJ, Kim E (2004) Characterization of interval fuzzy logic systems of connectives by group transformations. Reliable Computing, 10:299–334
Kohout LJ, Kim E (2005) Non-commutative fuzzy interval logics with approximation semantics based on the checklist paradigm and their group transformations. In: Proc. of FUZZ-IEEE 2005. IEEE Neural Network Council, Piscataway, NJ, CD-ROM
Zadeh LA (2001) From computing with numbers to computing with words – from manipulation of measurements to manipulation of perceptions. In: Wang PP (ed) Computing with Words. John Wiley, New York, pp 35–68