Novel Approaches in Cognitive Informatics and Natural Intelligence

Yingxu Wang
University of Calgary, Canada
Information Science Reference
Hershey • New York
Director of Editorial Content: Kristin Klinger
Assistant Development Editor: Deborah Yahnke
Director of Production: Jennifer Neidig
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Typesetter: Michael Brehm
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.
Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.igi-global.com

and in the United Kingdom by
Information Science Reference (an imprint of IGI Global)
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 0609
Web site: http://www.eurospanbookstore.com

Copyright © 2009 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.

Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
Library of Congress Cataloging-in-Publication Data
Novel approaches in cognitive informatics and natural intelligence / Yingxu Wang, editor.
p. cm.
Includes bibliographical references and index.
Summary: "This book covers issues of cognitive informatics with a transdisciplinary enquiry of cognitive and information sciences that investigates the internal information processing mechanisms and processes of the brain and natural intelligence, and their engineering applications via an interdisciplinary approach"--Provided by publisher.
ISBN 978-1-60566-170-4 (hardcover) -- ISBN 978-1-60566-171-1 (ebook)
1. Neural computers. 2. Cognitive science. 3. Artificial intelligence. I. Wang, Yingxu.
QA76.87.N68 2009
006.3--dc22
2008018331
British Cataloguing in Publication Data A Cataloguing in Publication record for this book is available from the British Library. All work contributed to this book set is original material. The views expressed in this book are those of the authors, but not necessarily of the publisher.
Novel Approaches in Cognitive Informatics and Natural Intelligence is part of the IGI Global series named Advances in Cognitive Informatics and Natural Intelligence (ACINI) Series, ISBN: Pending
If a library purchased a print copy of this publication, please go to http://www.igi-global.com/agreement for information on activating the library's complimentary electronic access to this publication.
Advances in Cognitive Informatics and Natural Intelligence (ACINI) Series ISBN: pending
Editor-in-Chief: Yingxu Wang, University of Calgary, Canada

Novel Approaches in Cognitive Informatics and Natural Intelligence
Yingxu Wang, University of Calgary, Canada
Information Science Reference • copyright 2009 • 395 pp • H/C (ISBN: 978-1-60566-170-4) • US $195.00
Creating a link between a number of natural science and life science disciplines, the emerging field of cognitive informatics presents a transdisciplinary approach to the internal information processing mechanisms and processes of the brain and natural intelligence. Novel Approaches in Cognitive Informatics and Natural Intelligence penetrates the academic field to offer the latest advancements in cognitive informatics and natural intelligence. This book covers the five areas of cognitive informatics, natural intelligence, autonomic computing, knowledge science, and relevant development, to provide researchers, academicians, students, and practitioners with a ready reference to the latest findings.
The Advances in Cognitive Informatics and Natural Intelligence (ACINI) Book Series seeks to fill the gap in literature that transcends disciplinary boundaries, and is devoted to the rapid publication of high-quality books. In providing a scholarly channel for new research principles, theories, and concepts, the book series will enhance the fields of Natural Intelligence, Autonomic Computing, and Neuroinformatics. The development of, and the cross-fertilization between, the aforementioned science and engineering disciplines have led to a whole range of extremely interesting new research areas known as Cognitive Informatics and Natural Intelligence. The Advances in Cognitive Informatics and Natural Intelligence (ACINI) Book Series seeks to expand the availability of literature for international researchers, practitioners, and graduate students to investigate the cognitive mechanisms and processes of human information processing, and to stimulate the transdisciplinary effort on cognitive informatics and natural intelligence research and engineering applications.
Hershey • New York
Order online at www.igi-global.com or call 717-533-8845 x 100 – Mon-Fri 8:30 am - 5:00 pm (est) or fax 24 hours a day 717-533-7115
Editorial Advisory Board
Editor-in-Chief
Yingxu Wang, University of Calgary, Canada
Associate Editors
Lotfi A. Zadeh, University of California, Berkeley, USA
Witold Kinsner, University of Manitoba, Canada
John Bickle, University of Cincinnati, USA
Christine Chan, University of Regina, Canada
International Editorial Advisory Board
James Anderson, Brown University, USA
George Baciu, Hong Kong Polytechnic University, Hong Kong
Franck Barbier, University of Pau, France
Brian H. Bland, University of Calgary, Canada
Keith Chan, Hong Kong Polytechnic University, Hong Kong
Michael R.W. Dawson, University of Alberta, Canada
Geoff Dromey, Griffith University, Australia
Frank L. Greitzer, Pacific Northwest National Lab, USA
Ling Guang, Ryerson University, Canada
Bo Huang, The Chinese University of Hong Kong, Hong Kong
Brian Henderson-Sellers, University of Technology Sydney, Australia
Zeng-Guang Hou, Chinese Academy of Sciences, China
Yaochu Jin, Honda Research Institute Europe, Germany
Jiming Liu, University of Windsor, Canada
Pelayo F. Lopez, Universidad de Castilla-La Mancha, Spain
Roger K. Moore, Department of Computer Science, University of Sheffield, UK
Bernard Moulin, University of Laval, Canada
Dilip Patel, South Bank University, UK
Shushma Patel, South Bank University, UK
Witold Pedrycz, University of Alberta, Canada
Lech Polkowski, University of Warmia and Mazury, Poland
Vaclav Rajlich, Wayne State University, USA
Fernando Rubio, Universidad Complutense de Madrid, Spain
Gunther Ruhe, University of Calgary, Canada
Philip Sheu, University of California, Irvine, USA
Kenji Sugawara, Chiba Institute of Technology, Japan
Jeffrey Tsai, University of Illinois at Chicago, USA
Guoyin Wang, Chongqing University of Posts and Telecommunications, China
Yiyu Yao, University of Regina, Canada
Du Zhang, Department of Computer Science, California State University, USA
Ning Zhong, Maebashi Institute of Technology, Japan
Mengchu Zhou, New Jersey Institute of Technology, USA
Xiaolin Zhou, Peking University, China
Table of Contents
Preface ............................................................................................................................................................... xix

Acknowledgment .............................................................................................................................................. xxii

Section I
Cognitive Informatics

Chapter I
The Theoretical Framework of Cognitive Informatics ............................................................................................ 1
Yingxu Wang, University of Calgary, Canada

Chapter II
Is Entropy Suitable to Characterize Data and Signals for Cognitive Informatics? ............................................... 28
Witold Kinsner, University of Manitoba, Canada

Chapter III
Cognitive Processes by using Finite State Machines ............................................................................................ 52
Ismael Rodríguez, Universidad Complutense de Madrid, Spain
Manuel Núñez, Universidad Complutense de Madrid, Spain
Fernando Rubio, Universidad Complutense de Madrid, Spain

Chapter IV
On the Cognitive Processes of Human Perception with Emotions, Motivations, and Attitudes ........................... 65
Yingxu Wang, University of Calgary, Canada

Chapter V
A Selective Sparse Coding Model with Embedded Attention Mechanism ........................................................... 78
Qingyong Li, Beijing Jiaotong University, China
Zhiping Shi, Chinese Academy of Sciences, China
Zhongzhi Shi, Chinese Academy of Sciences, China

Section II
Natural Intelligence

Chapter VI
The Cognitive Processes of Formal Inferences ..................................................................................................... 92
Yingxu Wang, University of Calgary, Canada

Chapter VII
Neo-Symbiosis: The Next Stage in the Evolution of Human Information Interaction ....................................... 106
Douglas Griffith, General Dynamics Advanced Information Systems, USA
Frank L. Greitzer, Pacific Northwest National Laboratory, USA

Chapter VIII
Language, Logic, and the Brain ........................................................................................................................... 118
Ray E. Jennings, Simon Fraser University, Canada

Chapter IX
The Cognitive Process of Decision Making ........................................................................................................ 130
Yingxu Wang, University of Calgary, Canada
Guenther Ruhe, University of Calgary, Canada

Chapter X
A Commonsense Approach to Representing Spatial Knowledge Between Extended Objects ........................... 142
Tiansi Dong, Cognitive Ergonomic Systems, Germany

Chapter XI
A Formal Specification of the Memorization Process ......................................................................................... 157
Natalia López, Universidad Complutense de Madrid, Spain
Manuel Núñez, Universidad Complutense de Madrid, Spain
Fernando L. Pelayo, Universidad de Castilla-La Mancha, Spain

Section III
Autonomic Computing

Chapter XII
Theoretical Foundations of Autonomic Computing ............................................................................................ 172
Yingxu Wang, University of Calgary, Canada

Chapter XIII
Towards Cognitive Machines: Multiscale Measures and Analysis ..................................................................... 188
Witold Kinsner, University of Manitoba, Canada

Chapter XIV
Towards Autonomic Computing: Adaptive Neural Network for Trajectory Planning ....................................... 200
Amar Ramdane-Cherif, Université de Versailles St-Quentin, France

Chapter XV
Cognitive Modelling Applied to Aspects of Schizophrenia and Autonomic Computing ................................... 220
Lee Flax, Macquarie University, Australia

Chapter XVI
Interactive Classification Using a Granule Network ........................................................................................... 235
Yan Zhao, University of Regina, Canada
Yiyu Yao, University of Regina, Canada

Section IV
Knowledge Science

Chapter XVII
A Cognitive Computational Knowledge Representation Theory ....................................................................... 247
Mehdi Najjar, University of Sherbrooke, Canada
André Mayers, University of Sherbrooke, Canada

Chapter XVIII
A Fixpoint Semantics for Rule-Base Anomalies ................................................................................................. 265
Du Zhang, California State University, USA

Chapter XIX
Development of an Ontology for an Industrial Domain ...................................................................................... 277
Christine W. Chan, University of Regina, Canada

Chapter XX
Constructivist Learning During Software Development ..................................................................................... 292
Václav Rajlich, Wayne State University, USA
Shaochun Xu, Laurentian University, Canada

Chapter XXI
A Unified Approach to Fractal Dimensions ........................................................................................................ 304
Witold Kinsner, University of Manitoba, Canada

Section V
Relevant Development

Chapter XXII
Cognitive Informatics: Four Years in Practice: A Report on IEEE ICCI’05 ...................................................... 327
Du Zhang, California State University, USA
Witold Kinsner, University of Manitoba, Canada
Jeffrey Tsai, University of Illinois at Chicago, USA
Yingxu Wang, University of Calgary, Canada
Philip Sheu, University of California, USA
Taehyung Wang, California State University, USA

Chapter XXIII
Toward Cognitive Informatics and Cognitive Computers: A Report on IEEE ICCI’06 ..................................... 330
Yiyu Yao, University of Regina, Canada
Zhongzhi Shi, Chinese Academy of Sciences, China
Yingxu Wang, University of Calgary, Canada
Witold Kinsner, University of Manitoba, Canada
Yixin Zhong, Beijing University of Posts and Telecommunications, China
Guoyin Wang, Chongqing University of Posts and Telecommunications, China
Zeng-Guang Hou, Chinese Academy of Sciences, China

Compilation of References ................................................................................................................................. 335

About the Contributors ....................................................................................................................................... 363

Index ................................................................................................................................................................... 369
Detailed Table of Contents
Preface ............................................................................................................................................................... xix

Acknowledgment .............................................................................................................................................. xxii

Section I
Cognitive Informatics

Chapter I
The Theoretical Framework of Cognitive Informatics ............................................................................................ 1
Yingxu Wang, University of Calgary, Canada

Cognitive Informatics (CI) is a transdisciplinary enquiry of the internal information processing mechanisms and processes of the brain and natural intelligence shared by almost all science and engineering disciplines. This chapter presents an intensive review of the new field of CI. The structure of the theoretical framework of CI is described, encompassing the Layered Reference Model of the Brain (LRMB), the OAR model of information representation, Natural Intelligence (NI) vs. Artificial Intelligence (AI), Autonomic Computing (AC) vs. imperative computing, CI laws of software, the mechanism of human perception processes, the cognitive processes of formal inferences, and the formal knowledge system. Three types of new structures of mathematics, Concept Algebra (CA), Real-Time Process Algebra (RTPA), and System Algebra (SA), are created to enable rigorous treatment of cognitive processes of the brain as well as knowledge representation and manipulation in a formal and coherent framework. A wide range of applications of CI in cognitive psychology, computing, knowledge engineering, and software engineering has been identified and discussed.

Chapter II
Is Entropy Suitable to Characterize Data and Signals for Cognitive Informatics? ............................................... 28
Witold Kinsner, University of Manitoba, Canada

This chapter provides a review of Shannon and other entropy measures in evaluating the quality of materials used in perception, cognition, and learning processes. Energy-based metrics are not suitable for cognition, as energy itself does not carry information. Instead, morphological (structural and contextual) metrics as well as entropy-based multiscale metrics should be considered in cognitive informatics. Appropriate data and signal transformation processes are defined and discussed in the perceptual framework, followed by various classes of information and entropies suitable for characterization of data, signals, and distortion. Other entropies are also described, including the Rényi generalized entropy spectrum, Kolmogorov complexity measure, Kolmogorov-Sinai entropy, and Prigogine entropy for evolutionary dynamical systems. Although such entropy-based measures are suitable for many signals, they are not sufficient for scale-invariant (fractal and multifractal) signals without corresponding complementary multiscale measures.
Chapter III
Cognitive Processes by using Finite State Machines ............................................................................................ 52
Ismael Rodríguez, Universidad Complutense de Madrid, Spain
Manuel Núñez, Universidad Complutense de Madrid, Spain
Fernando Rubio, Universidad Complutense de Madrid, Spain

Finite State Machines (FSM) are formalisms that have been used for decades to describe the behavior of systems. They can also provide an intelligent agent with a suitable formalism for describing its own beliefs about the behavior of the world surrounding it. In fact, FSMs are the suitable acceptors for right-linear languages, which are the simplest languages considered in Chomsky’s classification of languages. Since Chomsky proposes that the generation of language (and, indirectly, any mental process) can be expressed through a kind of formal language, it can be assumed that cognitive processes can be formulated by means of the formalisms that can express those languages. Hence, we will use FSMs as a suitable formalism for representing (simple) cognitive models. We present an algorithm that, given an observation of the environment, produces an FSM describing an environment behavior that is capable of producing that observation. Since an infinite number of different FSMs could have produced that observation, we have to choose the most feasible one. When a phenomenon can be explained by several theories, Occam’s razor principle, which is basic in science, encourages choosing the simplest explanation. Applying this criterion to our problem, we choose the simplest (smallest) FSM that could have produced the observation. An algorithm is presented to solve this problem. In conclusion, our framework provides a cognitive model that is the most preferable theory for the observer, according to the Occam’s razor criterion.

Chapter IV
On the Cognitive Processes of Human Perception with Emotions, Motivations, and Attitudes ........................... 65
Yingxu Wang, University of Calgary, Canada

An interactive motivation-attitude theory is developed based on the Layered Reference Model of the Brain (LRMB) and the Object-Attribute-Relation (OAR) model. This chapter presents a rigorous model of human perceptual processes such as emotions, motivations, and attitudes. A set of mathematical models and formally described cognitive processes are developed. The interactions and relationships between motivation and attitude are formally described in Real-Time Process Algebra (RTPA). Applications of the mathematical models of motivations and attitudes in software engineering are demonstrated. This work is a detailed description of a part of the LRMB, which provides a comprehensive model for explaining the fundamental cognitive processes of the brain and their interactions. This work demonstrates that complicated human emotional and perceptual phenomena can be rigorously modeled in mathematics and be formally treated and described.
Chapter V
A Selective Sparse Coding Model with Embedded Attention Mechanism ........................................................... 78
Qingyong Li, Beijing Jiaotong University, China
Zhiping Shi, Chinese Academy of Sciences, China
Zhongzhi Shi, Chinese Academy of Sciences, China

Sparse coding theory demonstrates that the neurons in the primary visual cortex form a sparse representation of natural scenes from the viewpoint of statistics, but a typical scene contains many different patterns (corresponding to neurons in the cortex) competing for neural representation because of the limited processing capacity of the visual system. We propose an attention-guided sparse coding model. This model includes two modules: a non-uniform sampling module simulating the processing of the retina, and a data-driven attention module based on response saliency. Our experimental results show that the model notably decreases the number of coefficients that may be activated while retaining the main visual information. It provides a way to improve the coding efficiency of the sparse coding model and to achieve good performance in both population sparseness and lifetime sparseness.
Section II
Natural Intelligence

Chapter VI
The Cognitive Processes of Formal Inferences ..................................................................................................... 92
Yingxu Wang, University of Calgary, Canada

Theoretical research is predominantly an inductive process, while applied research is mainly a deductive process. Both inference processes are based on the cognitive process and means of abstraction. This chapter describes the cognitive processes of formal inferences such as deduction, induction, abduction, and analogy. Conventional propositional arguments adopt static causal inference. This chapter introduces more rigorous and dynamic inference methodologies, which are modeled and described as a set of cognitive processes encompassing a series of basic inference steps. A set of mathematical models of formal inference methodologies is developed. Formal descriptions of the four forms of cognitive processes of inferences are presented using Real-Time Process Algebra (RTPA). The cognitive processes and mental mechanisms of inferences are systematically explored and rigorously modeled. Applications of abstraction and formal inferences in both the revelation of the fundamental mechanisms of the brain and the investigation of next-generation cognitive computers are explored.

Chapter VII
Neo-Symbiosis: The Next Stage in the Evolution of Human Information Interaction ....................................... 106
Douglas Griffith, General Dynamics Advanced Information Systems, USA
Frank L. Greitzer, Pacific Northwest National Laboratory, USA

The purpose of this paper is to re-address the vision of human-computer symbiosis as originally expressed by J.C.R. Licklider nearly a half-century ago and to argue for the relevance of this vision to the field of cognitive informatics. We describe this vision, place it in some historical context relating to the evolution of human factors research, and observe that the field is now in the process of re-invigorating Licklider’s vision. A central concept of this vision is that humans need to be incorporated into computer architectures. We briefly assess the state of the technology within the context of contemporary theory and practice, and we describe what we regard as this emerging field of neo-symbiosis. Examples of neo-symbiosis are provided, but these are nascent examples and the potential of neo-symbiosis is yet to be realized. We offer some initial thoughts on requirements to define functionality of neo-symbiotic systems and discuss research challenges associated with their development and evaluation. Methodologies and metrics for assessing neo-symbiosis are discussed.

Chapter VIII
Language, Logic, and the Brain ........................................................................................................................... 118
Ray E. Jennings, Simon Fraser University, Canada

Language is primarily a physical, and more particularly a biological, phenomenon. To say that it is primarily so is to say that that is how, in the first instance, it presents itself to observation. It is curious then that theoreticians of language treat it as though it were primarily semantic or syntactic, or some fusion of the two, and as though our implicit understanding of semantics and syntax regulates both our language production and our language comprehension.
On this view the brain is both a repository of semantic and syntactic constraints and the instrument by which we draw upon these accounts for the hard currency of linguistic exchange. With this view comes a division of the vocables of language into those that carry semantic content (lexical vocabulary) and those that mark syntactic form (functional and logical vocabulary). Logical theory of the past 150 years has been understood by many as a purified abstraction of linguistic forms. So it is not surprising that the “logical” vocabulary of natural language has been understood in the reflected light of that formal science. Those internal transactions in which “logical” vocables essentially figure, the transactions that we think of as reasonings, are seen by many as constrained by those laws of thought that logic was supposed to codify. Of course no vocabulary can be entirely independent of semantic understanding, but whereas the meaning of lexical vocabulary varies from context to context (run on the treadmill,
run on the market, run-on sentence, run in her stocking, run down, run the tap, etc.) logical vocabulary is thought to have fixed minimal semantic content independently of context. A biological view of language presents a sharply contrasting picture. On an evolutionary time-scale the human brain and human language have co-evolved. So we have pre-linguistic ancestors, some of whose cunning we have inherited, as we have quasi-linguistic ancestors and early linguistic ancestors whose inherited skills were enhanced and made more effective by the slow acquisition of linguistic instruments of control and coordination. Where in this long development does logic enter? On the shorter time-scale of linguistic evolution, we know that all connective vocabulary descends from lexical vocabulary, much of it from the language of spatial and other physical relationships. We can now say, more or less, how that happens. We can even find many cases of mutations in logicalized vocabulary, semantic changes that come about in much the way that biological mutations occur in molecular biological processes. These changes proliferate to yield a wide diversity in the evolved uses of natural language connectives. Just as surprisingly, we discover, we don’t in general understand connective vocabulary, nor do we need to for the purpose of using it correctly in speech. And by no means do our automatic uses of it coincide with those that would be predicted by the syntax/semantics view. Far from having fixed minimal semantic content, logical vocabulary is semantically rich, context-dependent, and, partly because we do not in general understand it, semantically extremely fragile.

Chapter IX
The Cognitive Process of Decision Making ........................................................................................................ 130
Yingxu Wang, University of Calgary, Canada
Guenther Ruhe, University of Calgary, Canada

Decision making is one of the basic cognitive processes of human behaviors by which a preferred option or a course of action is chosen from among a set of alternatives based on certain criteria. Decision theories are widely applied in many disciplines encompassing cognitive informatics, computer science, management science, economics, sociology, psychology, political science, and statistics. A number of decision strategies have been proposed from different angles and application domains, such as the maximum expected utility and the Bayesian method. However, there is still a lack of a fundamental mathematical decision model and a rigorous cognitive process for decision making. This chapter presents a fundamental cognitive decision-making process and its mathematical model, which is described as a sequence of Cartesian-product-based selections. A rigorous description of the decision process in Real-Time Process Algebra (RTPA) is provided. Real-world decisions are perceived as a repetitive application of the fundamental cognitive process. The result shows that all categories of decision strategies fit in the formally described decision process. The cognitive process of decision making may be applied in a wide range of decision-based systems, such as cognitive informatics, software agent systems, expert systems, and decision support systems.
Chapter X
A Commonsense Approach to Representing Spatial Knowledge Between Extended Objects ........................... 142
Tiansi Dong, Cognitive Ergonomic Systems, Germany

This chapter proposes a commonsense understanding of distance and orientation knowledge between extended objects, and presents a formal representation of spatial knowledge. The connection relation is taken as primitive. A new axiom is introduced to govern the connection relation. Notions of ‘near extension’ regions and the ‘nearer’ predicate are coined. Distance relations between extended objects are understood as degrees of the near extension from one object to the other. Orientation relations are understood as distance comparisons from one object to the sides of the other object. Therefore, distance and orientation relations are internally related through the connection relation. The ‘fiat projection’ mechanism is proposed to model the mental formation of the deictic orientation reference framework. This chapter shows diagrammatically the integration of topological relations, distance relations, and orientation relations in the RCC framework.
Chapter XI
A Formal Specification of the Memorization Process ......................................................................................... 157
Natalia López, Universidad Complutense de Madrid, Spain
Manuel Núñez, Universidad Complutense de Madrid, Spain
Fernando L. Pelayo, Universidad de Castilla-La Mancha, Spain

In this chapter we present the formal language STOPA (STOchastic Process Algebra) to specify cognitive systems. In addition to the usual characteristics of these formalisms, this language features the possibility of including stochastic time. This kind of time is useful to represent systems where the delays are not controlled by fixed amounts of time, but are given by probability distribution functions. In order to illustrate the usefulness of our formalism, we formally represent a cognitive model of the memory. Following contemporary theories of memory classification (see Squire et al., 1993; Solso, 1999), we consider sensory buffer, short-term, and long-term memories. Moreover, borrowing from Y. Wang and Y. Wang (2006), we also consider the so-called action buffer memory.

Section III
Autonomic Computing

Chapter XII
Theoretical Foundations of Autonomic Computing ............................................................................................ 172
Yingxu Wang, University of Calgary, Canada

Autonomic computing (AC) is an intelligent computing approach that autonomously carries out robotic and interactive applications based on goal- and inference-driven mechanisms. This chapter attempts to explore the theoretical foundations and technical paradigms of AC. It reviews the historical development that leads to the transition from imperative computing to AC. It surveys transdisciplinary theoretical foundations for AC such as those of behaviorism, cognitive informatics, denotational mathematics, and intelligent science. On the basis of this work, a coherent framework towards AC may be established for both interdisciplinary theories and application paradigms, which will result in the development of new-generation computing architectures and novel information processing systems.

Chapter XIII
Towards Cognitive Machines: Multiscale Measures and Analysis ..................................................................... 188
Witold Kinsner, University of Manitoba, Canada

Numerous attempts are being made to develop machines that could act not only autonomously, but also in an increasingly intelligent and cognitive manner. Such cognitive machines ought to be aware of their environments, which include not only other machines but also human beings. Such machines ought to understand the meaning of information in more human-like ways by grounding knowledge in the physical world and in the machines’ own goals. The motivation for developing such machines ranges from self-evidenced practical reasons, such as the expense of computer maintenance, to wearable computing in health care, to gaining a better understanding of the cognitive capabilities of the human brain. Achieving such an ambitious goal requires solutions to many problems, ranging from human perception, attention, concept creation, cognition, and consciousness to executive processes guided by emotions and values, and symbiotic conversational human-machine interactions. An important component of this cognitive machine research includes multiscale measures and analysis. This chapter presents definitions of cognitive machines, representations of processes, as well as their measurements, measures, and analysis.
It provides examples from current research, including cognitive radio, cognitive radar, and cognitive monitors.
Chapter XIV
Towards Autonomic Computing: Adaptive Neural Network for Trajectory Planning ....................................... 200
Amar Ramdane-Cherif, Université de Versailles St-Quentin, France

The cognitive approach through the neural network (NN) paradigm is a critical discipline that will help bring about autonomic computing (AC). NN-related research, some involving new ways to apply control theory and control laws, can provide insight into how to run complex systems that optimize to their environments. NNs are one kind of AC system that can embody human cognitive powers and can adapt, learn, and take over certain functions previously performed by humans. In recent years, artificial neural networks have received a great deal of attention for their ability to perform nonlinear mappings. In trajectory control of robotic devices, neural networks provide a fast method of autonomously learning the relation between a set of output states and a set of input states. In this chapter, we apply the cognitive approach to solve position controller problems using an inverse geometrical model. In order to control a robot manipulator in the accomplishment of a task, trajectory planning is required in advance or in real time. The desired trajectory is usually described in Cartesian coordinates and needs to be converted to joint space for the purpose of analyzing and controlling the system behavior. In this chapter, we use a memory neural network (MNN) to solve the optimization problem concerning the inverse of the direct geometrical model of the redundant manipulator when subject to constraints. Our approach offers substantially better accuracy, avoids the computation of the inverse or pseudoinverse Jacobian matrix, and does not produce problems such as singularity, redundancy, and considerably increased computational complexity.

Chapter XV
Cognitive Modelling Applied to Aspects of Schizophrenia and Autonomic Computing ................................... 220
Lee Flax, Macquarie University, Australia

We give an approach to cognitive modelling which allows for richer expression than one based simply on the firing of sets of neurons. The object language of the approach is first-order logic augmented by the operations of an algebra, PSEN. Some operations useful for this kind of modelling are postulated: combination, comparison, and inhibition of sets of sentences. Inhibition is realised using an algebraic version of AGM belief contraction (Peter Gärdenfors, Knowledge in Flux, 1988). It is shown how these operations can be realised using PSEN. Algebraic modelling using PSEN is used to give an account of an explanation of some signs and symptoms of schizophrenia due to Frith (The Cognitive Neuropsychology of Schizophrenia, 1992), as well as a proposal for the cognitive basis of autonomic computing. A brief discussion of the computability of the operations of PSEN is also given.

Chapter XVI
Interactive Classification Using a Granule Network ........................................................................................... 235
Yan Zhao, University of Regina, Canada
Yiyu Yao, University of Regina, Canada

Classification is one of the main tasks in machine learning, data mining, and pattern recognition. Compared with the extensively studied automation approaches, the interactive approaches, centered on human users, are less explored. This chapter studies interactive classification at three levels. At the philosophical level, the motivations and a process-based framework of interactive classification are proposed.
At the technical level, a granular computing model is suggested for re-examining not only existing classification problems, but also interactive classification problems. At the application level, an interactive classification system, ICS, using a granule network as the search space, is introduced. ICS allows multiple strategies for granule tree construction, and enhances the understanding and interpretation of the classification process. Interactive classification is complementary to existing classification methods.
Section IV
Knowledge Science

Chapter XVII
A Cognitive Computational Knowledge Representation Theory ....................................................................... 247
Mehdi Najjar, University of Sherbrooke, Canada
André Mayers, University of Sherbrooke, Canada

Encouraging results of previous years in the field of knowledge representation within virtual learning environments confirm that artificial intelligence research on this topic finds it very beneficial to integrate the knowledge that psychological research has accumulated on understanding the cognitive mechanisms of human learning with the positive results obtained in computational modelling theories. This chapter introduces a novel cognitive and computational knowledge representation approach inspired by cognitive theories which explain human cognitive activity in terms of memory subsystems and their processes, and whose aim is to suggest formal computational models of knowledge that offer efficient and expressive representation structures for virtual learning. Practical studies both validate the novel approach and permit drawing general conclusions.

Chapter XVIII
A Fixpoint Semantics for Rule-Base Anomalies ................................................................................................. 265
Du Zhang, California State University, USA

A crucial component of an intelligent system is its knowledge base, which contains knowledge about a problem domain. Knowledge base development involves domain analysis, context space definition, ontological specification, and knowledge acquisition, codification, and verification. Knowledge base anomalies can affect the correctness and performance of an intelligent system. In this chapter, we describe a fixpoint semantics for a knowledge base that is based on a multi-valued logic. We then use the fixpoint semantics to provide formal definitions for four types of knowledge base anomalies: inconsistency, redundancy, incompleteness, and circularity. We believe such formal definitions of knowledge base anomalies will help pave the way for a more effective knowledge base verification process.

Chapter XIX
Development of an Ontology for an Industrial Domain ...................................................................................... 277
Christine W. Chan, University of Regina, Canada

This chapter presents a method for ontology construction and its application in developing an ontology for the domain of natural gas pipeline operations. Both the method and the application ontology developed contribute to the infrastructure of the Semantic Web, which provides a semantic foundation for supporting information processing by autonomous software agents. This chapter presents the processes of knowledge acquisition and ontology construction for developing a knowledge-based decision support system for monitoring and control of natural gas pipeline operations. Knowledge of the problem domain was acquired and analyzed using the Inferential Modeling Technique; the analyzed knowledge was then organized into an application ontology and represented in the Knowledge Modeling System. Since an ontology is an explicit specification of a conceptualization that provides a comprehensive foundation specification of knowledge in a domain, it provides semantic clarifications for autonomous software agents that process information on the Internet.
Chapter XX
Constructivist Learning During Software Development ..................................................................................... 292
Václav Rajlich, Wayne State University, USA
Shaochun Xu, Laurentian University, Canada
This chapter explores the non-monotonic nature of the programmer learning that takes place during incremental program development. It uses a constructivist learning model that consists of four fundamental cognitive activities: absorption, which adds new facts to the knowledge; denial, which rejects facts that do not fit in; reorganization, which reorganizes the knowledge; and expulsion, which rejects obsolete knowledge. A case study of an incremental program development illustrates the application of the model and demonstrates that it can explain the learning process with episodes of both increase and decrease in the knowledge. Implications for documentation systems are discussed in the conclusions.

Chapter XXI
A Unified Approach to Fractal Dimensions ........................................................................................................ 304
Witold Kinsner, University of Manitoba, Canada

Many scientific chapters treat the diversity of fractal dimensions as mere variations on either the same theme or a single definition. There is a need for a unified approach to fractal dimensions, for there are fundamental differences between their definitions. This chapter presents a new description of three essential classes of fractal dimensions based on: (i) morphology, (ii) entropy, and (iii) transforms, all unified through the generalized-entropy-based Rényi fractal dimension spectrum. It discusses practical algorithms for computing 15 different fractal dimensions representing the classes. Although the individual dimensions have already been described in the literature, the unified approach presented in this chapter is unique in terms of (i) its progressive development of the fractal dimension concept, (ii) similarity in the definitions and expressions, (iii) analysis of the relation between the dimensions, and (iv) their taxonomy. As a result, a number of new observations have been made and new applications discovered. Of particular interest are behavioral processes (such as dishabituation), irreversible and birth-death growth phenomena (e.g., diffusion-limited aggregates, DLAs, dielectric discharges, and cellular automata), as well as dynamical nonstationary transient processes (such as speech and transients in radio transmitters), multifractal optimization of image compression using learned vector quantization with Kohonen’s self-organizing feature maps (SOFMs), and multifractal-based signal denoising.

Section V
Relevant Development

Chapter XXII
Cognitive Informatics: Four Years in Practice: A Report on IEEE ICCI’05 ...................................................... 327
Du Zhang, California State University, USA
Witold Kinsner, University of Manitoba, Canada
Jeffrey Tsai, University of Illinois at Chicago, USA
Yingxu Wang, University of Calgary, Canada
Philip Sheu, University of California, USA
Taehyung Wang, California State University, USA

The 2005 IEEE International Conference on Cognitive Informatics (ICCI’05) was held from August 8 to 10, 2005, on the campus of the University of California, Irvine. This was the fourth ICCI conference. The previous conferences were held in Calgary, Canada (ICCI’02); London, UK (ICCI’03); and Victoria, Canada (ICCI’04).
ICCI’05 was organized by General Co-Chairs Jeffrey Tsai (University of Illinois) and Yingxu Wang (University of Calgary), Program Co-Chairs Du Zhang (California State University) and Witold Kinsner (University of Manitoba), and Organization Co-Chairs Philip Sheu (University of California), Taehyung Wang (California State University, Northridge), and Shangping Ren (Illinois Institute of Technology).
Chapter XXIII
Toward Cognitive Informatics and Cognitive Computers: A Report on IEEE ICCI’06 ..................................... 330
Yiyu Yao, University of Regina, Canada
Zhongzhi Shi, Chinese Academy of Sciences, China
Yingxu Wang, University of Calgary, Canada
Witold Kinsner, University of Manitoba, Canada
Yixin Zhong, Beijing University of Posts and Telecommunications, China
Guoyin Wang, Chongqing University of Posts and Telecommunications, China
Zeng-Guang Hou, Chinese Academy of Sciences, China

Cognitive informatics (CI) is a cutting-edge and multidisciplinary research area that tackles the fundamental problems shared by modern informatics, computation, software engineering, AI, cybernetics, cognitive science, neuropsychology, medical science, systems science, philosophy, linguistics, economics, management science, and life sciences. CI can be viewed as a trans-disciplinary enquiry of cognitive and information sciences that investigates the internal information processing mechanisms and processes of the brain and natural intelligence—human brains and minds—and their engineering applications.

Compilation of References ................................................................................................................................. 335

About the Contributors ....................................................................................................................................... 363

Index ................................................................................................................................................................... 369
Preface
Cognitive informatics (CI) is a new discipline that studies the natural intelligence and internal information processing mechanisms of the brain, as well as the processes involved in perception and cognition. CI provides a coherent set of fundamental theories and contemporary mathematics, which form the foundation for most information- and knowledge-based science and engineering disciplines, such as computer science, cognitive science, neuropsychology, systems science, cybernetics, computer/software engineering, knowledge engineering, and computational intelligence.

The basic characteristic of the human brain is information processing. Information is recognized as the third essence, supplementing matter and energy, in modeling the natural world. Information is any property or attribute of the natural world that can be distinctly elicited, generally abstracted, quantitatively represented, and mentally processed. Informatics is the science of information that studies the nature of information, its processing, and the ways of transformation between information, matter, and energy. Cognitive informatics is the transdisciplinary enquiry of cognitive and information sciences that investigates the internal information processing mechanisms and processes of the brain and natural intelligence, and their engineering applications via an interdisciplinary approach.

In many disciplines of human knowledge, almost all of the hard problems yet to be solved share a common root in the understanding of the mechanisms of natural intelligence and the cognitive processes of the brain. Therefore, CI is a discipline that forges links between a number of natural science and life science disciplines with informatics and computing science.

This book, “Novel Approaches in Cognitive Informatics and Natural Intelligence,” is the first volume in the IGI Global series of Advances in Cognitive Informatics and Natural Intelligence. It covers five sections on (i) Cognitive Informatics; (ii) Natural Intelligence; (iii) Autonomic Computing; (iv) Knowledge Science; and (v) Relevant Development.
Section I. Cognitive Informatics

A wide range of interesting and ground-breaking progress has been made in CI, especially on the theoretical frameworks of CI and denotational mathematics for CI. This section presents recent advances in CI theories, models, methodologies, mathematical means, and techniques toward the exploration of natural intelligence and the brain, which form the foundations for natural intelligence, neural informatics, autonomic computing, and agent systems. This section on cognitive informatics encompasses the following five chapters:

• Chapter I. The Theoretical Framework of Cognitive Informatics
• Chapter II. Is Entropy Suitable to Characterize Data and Signals for Cognitive Informatics?
• Chapter III. Cognitive Processes by using Finite State Machines
• Chapter IV. On the Cognitive Processes of Human Perception with Emotions, Motivations, and Attitudes
• Chapter V. A Selective Sparse Coding Model with Embedded Attention Mechanism
Section II. Natural Intelligence

Natural intelligence, in the narrow sense, is a human or system ability that transforms information into behaviors. In the broad sense, it is any human or system ability that autonomously transfers the forms of abstract information among data, information, knowledge, and behaviors in the brain. The history of the human quest to understand the brain and natural intelligence is certainly as long as human history itself. It is recognized that artificial intelligence is a subset of natural intelligence; therefore, the understanding of natural intelligence is a foundation for investigating artificial, machinable, and computational intelligence. This section on natural intelligence encompasses the following six chapters:

• Chapter VI. The Cognitive Processes of Formal Inferences
• Chapter VII. Neo-Symbiosis: The Next Stage in the Evolution of Human Information Interaction
• Chapter VIII. Language, Logic, and the Brain
• Chapter IX. The Cognitive Process of Decision Making
• Chapter X. A Commonsense Approach to Representing Spatial Knowledge Between Extended Objects
• Chapter XI. A Formal Specification of the Memorization Process
Section III. Autonomic Computing

The approaches to computing can be classified into two categories, known as imperative computing and autonomic computing. Correspondingly, computing systems may be implemented as imperative or autonomic computing systems. An imperative computing system is a passive system that implements deterministic, context-free, and stored-program controlled behaviors, while an autonomic computing system is an intelligent system that autonomously carries out robotic and interactive actions based on goal- and event-driven mechanisms. An autonomic computing system implements nondeterministic, context-dependent, and adaptive behaviors. Autonomic computing does not rely on instructive and procedural information, but is dependent on internal status and on willingness formed by long-term historical events and current rational or emotional goals. This section on autonomic computing encompasses the following five chapters:

• Chapter XII. Theoretical Foundations of Autonomic Computing
• Chapter XIII. Towards Cognitive Machines: Multiscale Measures and Analysis
• Chapter XIV. Towards Autonomic Computing: Adaptive Neural Network for Trajectory Planning
• Chapter XV. Cognitive Modelling Applied to Aspects of Schizophrenia and Autonomic Computing
• Chapter XVI. Interactive Classification Using a Granule Network
Section IV. Knowledge Science

Knowledge science is an emerging field that studies the nature of human knowledge, its mathematical model, and its manipulation. Because almost all disciplines of science and engineering deal with information and knowledge, investigation into the generic theories of knowledge science and its cognitive foundations is one of the profound areas of cognitive informatics. Francis Bacon (1561-1626) asserted that “knowledge is power.” In CI, knowledge is recognized as one of the important forms of cognitive information, supplementary to behaviors, experience, and skills. This section on knowledge science encompasses the following five chapters:

• Chapter XVII. A Cognitive Computational Knowledge Representation Theory
• Chapter XVIII. A Fixpoint Semantics for Rule-Base Anomalies
• Chapter XIX. Development of an Ontology for an Industrial Domain
• Chapter XX. Constructivist Learning During Software Development
• Chapter XXI. A Unified Approach to Fractal Dimensions
Section V. Relevant Development

A series of IEEE International Conferences on Cognitive Informatics (ICCI) has been organized annually. The inaugural conference was held at Calgary, Canada (ICCI’02), followed by events in London, UK (ICCI’03); Victoria, Canada (ICCI’04); Irvine, USA (ICCI’05); Beijing, China (ICCI’06); Lake Tahoe, USA (ICCI’07); and Stanford University, USA (ICCI’08). This section on relevant development encompasses the following two chapters:

• Chapter XXII. Cognitive Informatics: Four Years in Practice: A Report on IEEE ICCI’05
• Chapter XXIII. Toward Cognitive Informatics and Cognitive Computers: A Report on IEEE ICCI’06
A wide range of applications of CI has been identified. The key application areas of CI can be divided into two categories. The first category of applications implements informatics and computing techniques to investigate problems in cognitive science, such as memory, learning, and reasoning. The second category adopts cognitive theories to investigate problems in informatics, computing, and software/knowledge engineering. CI focuses on the nature of information processing in the brain, such as information acquisition, representation, memory, retrieval, generation, and communication. Through this interdisciplinary approach, and with the support of modern information and neuroscience technologies, the mechanisms of the brain and the mind may be systematically explored within the framework of CI.
Acknowledgment
Many people have contributed their dedicated work to this book and the related research and events. The Editor-in-Chief would like to thank all authors, the associate editors of IJCINI, the editorial board members, and the invited reviewers for their great contributions to this book. I would also like to thank the IEEE Steering Committee and the organizers of the series of IEEE International Conferences on Cognitive Informatics (ICCI) over the last eight years, particularly Witold Kinsner, James Anderson, Witold Pedrycz, John Bickle, Du Zhang, Yiyu Yao, Jeffrey Tsai, Philip Sheu, Jean-Claude Latombe, Dilip Patel, Christine Chan, Shushma Patel, Guoyin Wang, Ron Johnston, and Michael R.W. Dawson.

I would like to acknowledge the publisher of this book, IGI Global, USA. I would like to thank Dr. Mehdi Khosrow-Pour, Jan Travers, Kristin M. Klinger, and Deborah Yahnke for their professional editorship. I would also like to thank Maggie Ma and Siyuan Wang for their valuable help and assistance.
Yingxu Wang
Section I
Cognitive Informatics
Chapter I
The Theoretical Framework of Cognitive Informatics

Yingxu Wang
University of Calgary, Canada
Abstract

Cognitive Informatics (CI) is a transdisciplinary enquiry of the internal information processing mechanisms and processes of the brain and natural intelligence shared by almost all science and engineering disciplines. This chapter presents an intensive review of the new field of CI. The structure of the theoretical framework of CI is described, encompassing the Layered Reference Model of the Brain (LRMB), the OAR model of information representation, Natural Intelligence (NI) vs. Artificial Intelligence (AI), Autonomic Computing (AC) vs. imperative computing, CI laws of software, the mechanism of human perception processes, the cognitive processes of formal inferences, and the formal knowledge system. Three types of new structures of mathematics, Concept Algebra (CA), Real-Time Process Algebra (RTPA), and System Algebra (SA), are created to enable rigorous treatment of cognitive processes of the brain as well as knowledge representation and manipulation in a formal and coherent framework. A wide range of applications of CI in cognitive psychology, computing, knowledge engineering, and software engineering has been identified and discussed.
Introduction

The development of classical and contemporary informatics, and the cross-fertilization between computer science, systems science, cybernetics, computer/software engineering, cognitive science, knowledge engineering, and neuropsychology, have led to an entire range of extremely interesting new research known as Cognitive Informatics (Wang, 2002a; Wang et al., 2002; Wang, 2003a/b; Wang, 2006b; Wang and Kinsner, 2006). Informatics is the science of information that studies the nature of information, its processing, and the ways of transformation between information, matter, and energy.

Definition 1. Cognitive Informatics (CI) is a transdisciplinary enquiry of cognitive and information sciences that investigates the internal information processing mechanisms and processes of the brain and natural intelligence, and their engineering applications, via an interdisciplinary approach.
In many disciplines of human knowledge, almost all of the hard problems yet to be solved share a common root in the understanding of the mechanisms of natural intelligence and the cognitive processes of the brain. Therefore, CI is a discipline that forges links between a number of natural science and life science disciplines and informatics and computing science.

The structure of the theoretical framework of CI is described in Figure 1, which covers the Information-Matter-Energy (IME) model (Wang, 2003b), the Layered Reference Model of the Brain (LRMB) (Wang et al., 2006), the Object-Attribute-Relation (OAR) model of information representation in the brain (Wang, 2006h; Wang and Wang, 2006), the cognitive informatics model of the brain (Wang et al., 2003; Wang and Wang, 2006), Natural Intelligence (NI) (Wang, 2003b), Autonomic Computing (AC) (Wang, 2004), Neural Informatics (NeI) (Wang, 2002a; Wang, 2003b; Wang, 2006b), CI laws of software (Wang, 2006f), the mechanisms of human perception processes (Wang, 2005a), the cognitive processes of formal inferences (Wang, 2005c), and the formal knowledge system (Wang, 2006g).

In this chapter, the theoretical framework of CI is explained in Section 2. Three structures of new descriptive mathematics, namely Concept Algebra (CA), Real-Time Process Algebra (RTPA), and System Algebra (SA), are introduced in Section 3 in order to rigorously deal with knowledge and cognitive information representation and manipulation in a formal and coherent framework. Applications of CI are discussed in Section 4, covering cognitive computing, knowledge engineering, and software engineering. Section 5 draws conclusions on the theories of CI, the contemporary mathematics for CI, and their applications.
Figure 1. The theoretical framework of cognitive informatics (CI), comprising the CI theories (T1 the IME model, T2 the LRMB model, T3 the OAR model, T4 the CI model of the brain, T5 natural intelligence, T6 neural informatics, T7 CI laws of software, T8 perception processes, T9 inference processes, and T10 the knowledge system), the descriptive mathematics for CI (M1 concept algebra, M2 RTPA, and M3 system algebra), and the CI applications (A1 future generation computers, A2 capacity of human memory, A3 autonomic computing, A4 cognitive properties of knowledge, A5 simulation of cognitive behaviors, A6 agent systems, A7 CI foundations of software engineering, A8 deductive semantics of software, and A9 cognitive complexity of software).
The Fundamental Theories of CI

The fundamental theories of CI encompass ten transdisciplinary areas and fundamental models, T1 through T10, as identified in Figure 1. This section presents an intensive review of the theories developed in CI, which form a foundation for exploring natural intelligence and its applications in brain science, neural informatics, computing, knowledge engineering, and software engineering.
The Information-Matter-Energy (IME) Model

Information is recognized as the third essence of the natural world, supplementing matter and energy (Wang, 2003b), because the primary function of the human brain is information processing.

Theorem 1. A generic world view, the Information-Matter-Energy (IME) model, states that the natural world (NW) that forms the context of human beings is a dual world: one aspect of it is the physical or concrete world (PW), and the other is the abstract or perceptive world (AW), where matter (M) and energy (E) are used to model the former, and information (I) the latter, i.e.:

NW ≙ PW || AW = p(M, E) || a(I) = n(I, M, E)    (1)
where || denotes a parallel relation, and p, a, and n are functions that determine a certain PW, AW, or NW, respectively, as illustrated in Figure 2. According to the IME model, information plays a vital role in connecting the physical world with the abstract world. Models of the natural world have been well studied in physics and other natural sciences. However, the modeling of the abstract world is still a fundamental issue yet to be explored in cognitive informatics, computing, software science, cognitive science, brain sciences, and knowledge engineering. In particular, the relationships between I-M-E and their transformations are deemed one of the fundamental questions in CI.

Corollary 1. The natural world NW(I, M, E), particularly the part of the abstract world AW(I), is cognized and perceived differently by individuals because of the uniqueness of perceptions and mental contexts among people.

Corollary 1 indicates that although the physical world PW(M, E) is the same to everybody, the natural world NW(I, M, E) is unique to different individuals because the abstract world AW(I), as a part of it, is subjective, depending on the information an individual obtains and perceives.
Figure 2. The IME model of the world view: information (I) models the abstract world (AW), while matter (M) and energy (E) model the physical world (PW); together they constitute the natural world (NW).
Corollary 2. The principle of transformability between I-M-E states that, according to the IME model, the three essences of the world are predicated to be transformable between each other as described by the following generic functions f1 to f6:

I = f1(M)    (2.1)
M = f2(I) ≟ f1⁻¹(I)    (2.2)
I = f3(E)    (2.3)
E = f4(I) ≟ f3⁻¹(I)    (2.4)
E = f5(M)    (2.5)
M = f6(E) = f5⁻¹(E)    (2.6)

where a question mark on the equal sign denotes an uncertainty about whether such a reverse function exists (Wang, 2003b).

Albert Einstein revealed functions f5 and f6, the relationship between matter (m) and energy (E), in the form E = mc², where c is the speed of light. It remains a great curiosity to explore what the remaining relationships and forms of transformation between I-M-E will be. To a certain extent, cognitive informatics is the science that seeks possible solutions for f1 to f4. A clue to exploring these relations and their transformability is believed to lie in the understanding of natural intelligence and its information processing mechanisms in CI.

Definition 2. Information in CI is defined as a generic abstract model of properties or attributes of the natural world that can be distinctly elicited, generally abstracted, quantitatively represented, and mentally processed.

Definition 3. The measurement of information, Ik, is defined by the cost of code to abstractly represent a given size of internal message X in the brain in a digital system based on k, i.e.:

Ik = f : X → Sk = ⌈log_k X⌉    (3)

where Ik is the content of information in a k-based digital system, and Sk the measurement scale based on k. The unit of Ik is the number of k-based digits (Wang, 2003b).

Eq. 3 is a generic measure of information sizes. When a binary digital representation system is adopted, i.e. k = b = 2, it becomes the most practical one, as follows.

Definition 4. The meta-level representation of information, Ib, is that when k = b = 2, i.e.:

Ib = f : X → Sb = ⌈log_b X⌉    (4)

where the unit of information, Ib, is a bit. Note that the bit here is a concrete and deterministic unit; it is no longer probability-based as in conventional information theories (Shannon, 1948; Bell, 1953). To a certain extent, computer science and engineering is a branch of modern informatics that studies machine representation and processing of external information, while CI is a branch of contemporary informatics that studies internal information representation and processing in the brain.
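As a quick numeric illustration of Definitions 3 and 4 (the function name and the sample message size below are invented for illustration), the information content of a message that can take X distinguishable states is the number of k-based digits needed to encode it, ⌈log_k X⌉:

```python
import math

def info_content(x: int, k: int = 2) -> int:
    """I_k = ceil(log_k x): the number of k-based digits needed to
    represent a message with x distinguishable states (Eqs. 3 and 4)."""
    if x < 1 or k < 2:
        raise ValueError("x must be >= 1 and k >= 2")
    return math.ceil(math.log(x, k))

print(info_content(1000, k=2))   # 10 bits, since 2**10 = 1024 >= 1000
print(info_content(1000, k=10))  # 3 decimal digits, since 10**3 = 1000
```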
Theorem 2. The most fundamental form of information that can be represented and processed is the binary digit, where k = b = 2.

Theorem 2 indicates that any form of information in the physical (natural) and abstract (mental) worlds can be unified on the basis of binary data. This is the CI foundation of modern digital computers and NI.
The Layered Reference Model of the Brain

The Layered Reference Model of the Brain (LRMB) (Wang et al., 2006) is developed to explain the fundamental cognitive mechanisms and processes of natural intelligence. Because a variety of life functions and cognitive processes have been identified in CI, psychology, cognitive science, brain science, and neurophilosophy, there is a need to organize all the recurrent cognitive processes in an integrated and coherent framework. The LRMB model explains the functional mechanisms and cognitive processes of natural intelligence, encompassing 37 cognitive processes at six layers known as the sensation, memory, perception, action, meta-cognitive, and higher cognitive layers, from the bottom up, as shown in Figure 3. LRMB elicits the core and highly recurrent cognitive processes from a huge variety of life functions, which may shed light on the study of the fundamental mechanisms and interactions of complicated mental processes, particularly the relationships and interactions between the inherited and the acquired life functions, as well as those of the subconscious and conscious cognitive processes.
The OAR Model of Information Representation in the Brain

Investigation into the cognitive models of information and knowledge representation in the brain is perceived to be one of the fundamental research areas that help to unveil the mechanisms of the brain. The Object-Attribute-Relation (OAR) model (Wang et al., 2003; Wang, 2006h) describes human memory, particularly the long-term memory, by using a relational metaphor rather than the traditional container metaphor adopted in psychology, computing, and information science. The OAR model shows that human memory and knowledge are represented by relations, i.e., connections of synapses between neurons, rather than by the neurons themselves, as the traditional container metaphor suggested. The OAR model can be used to explain a wide range of human information processing mechanisms and cognitive processes.

Figure 3. The layered reference model of the brain (LRMB): from the bottom up, Layer 1 (sensation), Layer 2 (memory), Layer 3 (perception), and Layer 4 (action) constitute the subconscious cognitive processes, while Layer 5 (meta-cognitive functions) and Layer 6 (higher cognitive functions) constitute the conscious cognitive processes.
The Cognitive Informatics Model of the Brain

The human brain and its information processing mechanisms are central to CI. A cognitive informatics model of the brain is proposed in (Wang and Wang, 2006), which explains natural intelligence via interactions between the inherent (subconscious) and acquired (conscious) life functions. The model demonstrates that memory is the foundation of any natural intelligence. Formalism, in the form of mathematics, logic, and rigorous treatment, is introduced into the study of cognitive and neural psychology and natural informatics. Fundamental cognitive mechanisms of the brain, such as the architecture of the thinking engine, internal knowledge representation, long-term memory establishment, and the roles of sleep in long-term memory development, have been investigated (Wang and Wang, 2006).
Natural Intelligence (NI)

Natural Intelligence (NI) is the domain of CI. Software and computer systems are recognized as a subset of intelligent behaviors of human beings described by programmed instructive information (Wang, 2003b; Wang and Kinsner, 2006). The relationship between Artificial Intelligence (AI) and NI can be described by the following theorem.

Theorem 3. The law of compatible intelligent capability states that artificial intelligence (AI) is always a subset of natural intelligence (NI), i.e.:

AI ⊆ NI    (5)
Theorem 3 indicates that AI is dominated by NI. Therefore, one should not expect a computer or a software system to solve a problem where humans cannot. In other words, no AI or computing system may be designed and/or implemented for a given problem where there is no solution being known by human beings.
Neural Informatics (NeI)

Definition 5. Neural Informatics (NeI) is a new interdisciplinary enquiry of the biological and physiological representation of information and knowledge in the brain at the neuron level and their abstract mathematical models (Wang, 2004; Wang and Wang, 2006).

NeI is a branch of CI, where memory is recognized as the foundation and platform of any natural or artificial intelligence (Wang and Wang, 2006).

Definition 6. The Cognitive Models of Memory (CMM) states that the architecture of human memory is parallel-configured by the Sensory Buffer Memory (SBM), Short-Term Memory (STM), Long-Term Memory (LTM), and Action-Buffer Memory (ABM), i.e.:

CMM ≙ SBM || STM || LTM || ABM    (6)

where the ABM is newly identified in (Wang and Wang, 2006).
The major organ that accommodates memories in the brain is the cerebrum, or the cerebral cortex; in particular, the association and premotor cortex in the frontal lobe, the temporal lobe, the sensory cortex in the frontal lobe, the visual cortex in the occipital lobe, the primary motor cortex in the frontal lobe, the supplementary motor area in the frontal lobe, and procedural memory in the cerebellum (Wang and Wang, 2006). The CMM model and the mapping of the four types of human memory onto the physiological organs in the brain reveal a set of fundamental mechanisms of NeI. The OAR model of information/knowledge representation described in Section 2.3 provides a generic description of information/knowledge representation in the brain (Wang et al., 2003; Wang, 2006h). The theories of CI and NeI explain a number of important questions in the study of NI. Enlightening conclusions derived in CI and NeI include: (a) LTM establishment is a subconscious process; (b) the long-term memory is established during sleep; (c) the major mechanism for LTM establishment is sleep; (d) the general acquisition cycle of LTM is equal to or longer than 24 hours; (e) the mechanism of LTM establishment is to update the entire memory of information represented as an OAR model in the brain; and (f) eye movement and dreams play an important role in LTM creation. The latest development in CI and NeI has led to the determination of the remarkable expected capacity of human memory, as described in Section 4.2.
Cognitive Informatics Laws of Software

It is commonly conceived that software, as an artifact of human creativity, is not constrained by the laws and principles discovered in the physical world. However, it has remained unclear what does constrain software. The new informatics metaphor proposed by the author in CI perceives software as a type of instructive and behavioral information. Based on this, it is asserted that software obeys the laws of informatics. A comprehensive set of 19 CI laws for software has been established in (Wang, 2006f), covering:

1. Abstraction
2. Generality
3. Cumulativeness
4. Dependency on cognition
5. Three-dimensional behavior space known as the object (O), space (S), and time (T)
6. Sharability
7. Dimensionless
8. Weightless
9. Transformability between I-M-E
10. Multiple representation forms
11. Multiple carrying media
12. Multiple transmission forms
13. Dependency on media
14. Dependency on energy
15. Wearless and time dependency
16. Conservation of entropy
17. Quality attributes of informatics
18. Susceptible to distortion
19. Scarcity
The informatics laws of software extend our knowledge of the fundamental laws and properties of software into areas that the conventional product metaphor could not explain. Therefore, CI forms one of the foundations of software engineering and computing science.
Mechanisms of Human Perception Processes Definition 7. Perception is a set of interpretive cognitive processes of the brain at the subconscious cognitive function layers that detects, relates, interprets, and searches internal cognitive information in the mind.
Perception may be considered the sixth sense of human beings, on which almost all cognitive life functions rely. Perception is also an important cognitive function at the subconscious layers that determines personality. In other words, personality is a faculty of all subconscious life functions and of experience accumulated via conscious life functions. According to LRMB, the main cognitive processes at the perception layer are emotion, motivation, and attitude (Wang, 2005a). The relationship between the internal emotion, motivation, and attitude and the embodied external behaviors can be formally and quantitatively described by the motivation/attitude-driven behavioral (MADB) model (Wang and Wang, 2006), which demonstrates that complicated psychological and cognitive mental processes may be formally modeled and rigorously described by mathematical means (Wang, 2002b; Wang, 2003d; Wang, 2005c).
The Cognitive Processes of Formal Inferences

Theoretical research is predominantly an inductive process, while applied research is mainly a deductive one. Both inference processes are based on the cognitive process and means of abstraction. Abstraction is a powerful means of philosophy and mathematics. It is also a preeminent trait of the human brain identified in CI studies (Wang, 2005c). All formal logical inferences and reasonings can only be carried out on the basis of abstract properties shared by a given set of objects under study.

Definition 8. Abstraction is a process that elicits a subset of objects sharing a common property from a given set of objects, and uses the property to identify and distinguish the subset from the whole, in order to facilitate reasoning.

Abstraction is a gifted capability of human beings. It is a basic cognitive process of the brain at the meta-cognitive layer according to LRMB (Wang et al., 2006). Only by abstraction can important theorems and laws about the objects under study be elicited and discovered from a great variety of phenomena and empirical observations in an area of inquiry.

Definition 9. Inference is a formal cognitive process that reasons about a possible causality from given premises, based on known causal relations between a pair of cause and effect proven true by empirical arguments, theoretical inferences, or statistical regularities.

Formal inferences may be classified into the deductive, inductive, abductive, and analogical categories (Wang, 2005c). Deduction is a cognitive process by which a specific conclusion necessarily follows from a set of general premises. Induction is a cognitive process by which a general conclusion is drawn from a set of specific premises, based on three designated samples in reasoning or experimental evidence. Abduction is a cognitive process that infers the best explanation or most likely reason for an observation or event. Analogy is a cognitive process that infers that the same relations hold between different domains or systems, and/or that if two things agree in certain respects then they probably agree in others. A summary of the formal definitions of the five inference techniques is shown in Table 1 below. For seeking generality and universal truth, either the objects or the relations can only be abstractly described and rigorously inferred by abstract models rather than real-world details.
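As a concrete, executable reading of the primitive forms summarized in Table 1 (the universe X and the predicates below are arbitrary illustrations), deduction and induction can be checked mechanically over a small finite set:

```python
# Illustrative checks of two primitive inference forms over a finite universe.
X = range(1, 11)

# Deduction: from the generic premise (forall x in X, p(x)),
# conclude p(a) for a specific a in X.
p = lambda x: x > 0
assert all(p(x) for x in X)        # the generic premise
assert p(7)                        # the deduced specific conclusion

# Induction: from P(1) and (P(k) => P(k+1)), conclude (forall x in X, P(x)).
P = lambda n: sum(range(1, n + 1)) == n * (n + 1) // 2
assert P(1)                                       # base sample
assert all((not P(k)) or P(k + 1) for k in X)     # inductive step P(k) => P(k+1)
assert all(P(x) for x in X)                       # the induced general conclusion
```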
Table 1. Definitions of formal inferences

1. Abstraction. Primitive form: ∀S, p ⇒ ∃e ∈ E ⊆ S, p(e). Usage: to elicit a subset of elements with a given generic property.
2. Deduction. Primitive form: ∀x ∈ X, p(x) ⇒ ∃a ∈ X, p(a). Composite form: (∀x ∈ X, p(x) ⇒ q(x)) ⇒ (∃a ∈ X, p(a) ⇒ q(a)). Usage: to derive a conclusion based on a known and generic premise.
3. Induction. Primitive form: ((∃a ∈ X, P(a)) ∧ (∃k, k+1 ∈ X, (P(k) ⇒ P(k+1)))) ⇒ ∀x ∈ X, P(x). Composite form: ((∃a ∈ X, p(a) ⇒ q(a)) ∧ (∃k, k+1 ∈ X, ((p(k) ⇒ q(k)) ⇒ (p(k+1) ⇒ q(k+1))))) ⇒ (∀x ∈ X, p(x) ⇒ q(x)). Usage: to determine the generic behavior of a given list or sequence of recurring patterns by three samples.
4. Abduction. Primitive form: (∀x ∈ X, p(x) ⇒ q(x)) ⇒ (∃a ∈ X, q(a) ⇒ p(a)). Composite form: (∀x ∈ X, (p(x) ⇒ q(x)) ∧ (r(x) ⇒ q(x))) ⇒ (∃a ∈ X, q(a) ⇒ (p(a) ∨ r(a))). Usage: to seek the most likely cause(s) and reason(s) of an observed phenomenon.
5. Analogy. Primitive form: ∃a ∈ X, p(a) ⇒ ∃b ∈ X, p(b). Composite form: (∃a ∈ X, p(a) ⇒ q(a)) ⇒ (∃b ∈ X, p(b) ⇒ q(b)). Usage: to predict a similar phenomenon or consequence based on a known observation.

The Formal Knowledge System

Mathematical thought (Jordan and Smith, 1997) provides a successful paradigm for organizing and validating human knowledge: once a truth or a theorem is established, it holds until the axioms or conditions on which it stands are changed or extended. A proven truth or theorem in mathematics does not need to be argued each time one uses it. This is the advantage and efficiency of formal knowledge in science and engineering. In other words, if any theory or conclusion may be argued from time to time based on a wiser idea or a trade-off, it is an empirical result rather than a formal one.
The Framework of Formal Knowledge (FFK) of mankind (Wang, 2006g) can be described as shown in Figure 4. An FFK is centered by a set of theories. A theory is a statement of how and why certain objects, facts, or truths are related. All objects in nature and their relations are constrained by invariable laws, no matter whether one observes them or not at any given time. An empirical truth is a truth based on, or verifiable by, observation, experiment, or experience. A theoretical proposition is an assertion based on formal theories or logical reasoning. Theoretical knowledge is a formalization of generic truth and proven, abstracted empirical knowledge. Theoretical knowledge may be easier to acquire when it exists. However, empirical knowledge is very difficult to gain without hands-on practice.

According to the FFK model, an immature discipline of science and engineering is characterized by its body of knowledge not yet being formalized. Instead of a set of proven theories, the immature disciplines document a large set of observed facts, phenomena, and their possible or partially working explanations and hypotheses. In such disciplines, researchers and practitioners might be able to argue every informal conclusion documented in natural languages from time to time, probably for hundreds of years, until it is formally described in mathematical forms and rigorously proved. The disciplines of mathematics and physics are successful paradigms that adopt the FFK formal knowledge system. The key advantages of the formal knowledge system are its stability and efficiency. The former is the property of formal knowledge that, once it is established and formally proved, users who refer to it no longer need to reexamine or reprove it. The latter is the property of formal knowledge that it is exclusively true or false, which saves everybody's time in arguing over a proven theory.
Denotational Mathematics for CI

The history of science and engineering shows that new problems require new forms of mathematics. CI is a new discipline, and its problems require new mathematical means that are descriptive and precise in expressing and denoting human and system actions and behaviors. Conventional analytic mathematics is unable to solve the fundamental problems inherent in CI and related disciplines such as neuroscience, psychology, philosophy, computing, software engineering, and knowledge engineering. Therefore, denotational mathematical structures and means (Wang, 2006c) beyond mathematical logic are yet to be sought. Although there are various ways to express facts, objects, notions, relations, actions, and behaviors in natural languages, it is found in CI that human and system behaviors may be classified into three basic categories known
Figure 4. The framework of formal knowledge (FFK): a discipline is organized as a doctrine centered on a set of theories, which encompass definitions, propositions, hypotheses, concepts, theorems, lemmas, corollaries, laws, principles, truths, phenomena, rules, models, methodologies, and algorithms, supported by empirical verifications, formal proofs, arguments, instances, case studies, and statistical norms.
as to be, to have, and to do. All mathematical means and forms, in general, are an abstract and formal description of these three categories of expressibility and their rules. Taking this view, mathematical logic may be perceived as the abstract means for describing ‘to be,’ set theory describing ‘to have,’ and algebras, particularly process algebra, describing ‘to do.’ Theorem 4. The utility of mathematics is the means and rules to express thought rigorously and generically at a higher level of abstraction. Three types of new mathematics, concept algebra (CA), real-time process algebra (RTPA), and system algebra (SA), are created in CI to enable rigorous treatment of knowledge representation and manipulation in a formal and coherent framework. The three new structures of contemporary mathematics have extended the abstract objects under study in mathematics from basic mathematical entities of numbers and sets to a higher level, i.e. concepts, behavioral processes, and systems. A wide range of applications of the denotational mathematics in the context of CI has been identified (Wang, 2002b; Wang, 2006d; Wang, 2006e).
Concept Algebra (CA)

A concept is a cognitive unit (Ganter and Wille, 1999; Quillian, 1968; Wang, 2006e) by which the meanings and semantics of a real-world or an abstract entity may be represented and embodied based on the OAR model.

Definition 10. An abstract concept c is a 5-tuple, i.e.:

c ≙ (O, A, Rc, Ri, Ro)    (7)

where
• O is a nonempty set of objects of the concept, O = {o1, o2, …, om} ⊆ ÞU, where ÞU denotes a power set of U;
• A is a nonempty set of attributes, A = {a1, a2, …, an} ⊆ ÞM;
• Rc ⊆ O × A is a set of internal relations;
• Ri ⊆ C′ × C is a set of input relations, where C′ is a set of external concepts;
• Ro ⊆ C × C′ is a set of output relations.
A structural concept model of c = (O, A, Rc, Ri, Ro) is illustrated in Figure 5, where c, A, O, and R, R = {Rc, Ri, Ro}, denote the concept, its attributes, objects, and internal/external relations, respectively.
Definition 11. Concept algebra (CA) is a new mathematical structure for the formal treatment of abstract concepts and their algebraic relations, operations, and associative rules for composing complex concepts and knowledge (Wang, 2006e).
Figure 5. The structural model of an abstract concept: within the closure Θ, the concept c binds its set of objects O to its set of attributes A via the internal relations Rc, and is connected to other concepts through the input relations Ri and output relations Ro.
Figure 6. The nine concept association operations as knowledge composing rules between concepts c1 = (O1, A1, R1) and c2 = (O2, A2, R2): inheritance, extension, tailoring, substitute, composition, decomposition, aggregation, specification, and instantiation.
Concept algebra deals with the algebraic relations and associational rules of abstract concepts. The associations of concepts form a foundation for denoting complicated relations between concepts in knowledge representation. The associations among concepts can be classified into nine categories: inheritance, extension, tailoring, substitute, composition, decomposition, aggregation, specification, and instantiation, as shown in Figure 6 and Table 2 (Wang, 2006e). In Figure 6, R = {Rc, Ri, Ro}, and all nine associations describe composing rules among concepts, except instantiation, which is a relation between a concept and a specific object.

Definition 12. A generic knowledge K is an n-nary relation Rk among a set of n multiple concepts in C, i.e.:

K = Rk : (C1 × C2 × … × Cn) → C    (8)

where C1 ∪ C2 ∪ … ∪ Cn = C, and Rk ∈ ℜ, the set of the nine concept association operations defined in CA, whose notations are summarized in Table 2.

In Definition 12 the relation Rk is one of the concept operations in CA as defined in Table 2 (Wang, 2006e) that serve as the knowledge composing rules.

Definition 13. A concept network CN is a hierarchical network of concepts interlinked by the set of nine associations ℜ defined in CA, i.e.:

CN = Rk : (C1 × … × Cn) → (C1 × … × Cn)    (9)

where Rk ∈ ℜ. Because the relations between concepts are transitive, the generic topology of knowledge is a hierarchical concept network. The advantages of the hierarchical knowledge architecture K in the form of concept networks are as follows: a) Dynamic: the knowledge networks may be updated dynamically along with information acquisition and learning, without destructing the existing concept nodes and relational links; and b) Evolvable: the knowledge networks may grow adaptively without changing the overall and existing structure of the hierarchical network. A summary of the algebraic relations and operations of concepts defined in CA is provided in Table 2.
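The 5-tuple of Definition 10 and the composition association can be sketched as a data structure (a simplified illustration; the class, the composition rule, and the sample concepts are invented and do not capture the full operational semantics of CA):

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """An abstract concept c = (O, A, Rc, Ri, Ro) per Definition 10,
    reduced here to its objects, attributes, and internal relations."""
    O: set = field(default_factory=set)    # objects
    A: set = field(default_factory=set)    # attributes
    Rc: set = field(default_factory=set)   # internal relations, a subset of O x A

def compose(c1: Concept, c2: Concept) -> Concept:
    """Simplified composition: a super concept uniting the objects,
    attributes, and internal relations of its sub concepts."""
    return Concept(c1.O | c2.O, c1.A | c2.A, c1.Rc | c2.Rc)

pen = Concept({"pen-1"}, {"writes", "portable"}, {("pen-1", "writes")})
paper = Concept({"sheet-1"}, {"writable", "portable"}, {("sheet-1", "writable")})
print(compose(pen, paper).A)   # union of attributes of both sub concepts
```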
Real-Time Process Algebra (RTPA)

A key metaphor in system modeling, specification, and description is that a software system can be perceived and described as the composition of a set of interacting processes. Hoare (Hoare, 1985), Milner (Milner, 1989), and others developed various algebraic approaches to represent communicating and concurrent systems, known as process algebra. A process algebra is a set of formal notations and rules for describing algebraic relations of software processes. Real-Time Process Algebra (RTPA) (Wang, 2002b; Wang, 2005b) extends process algebra to time/event, architecture, and system dispatching manipulations in order to formally describe and specify the architectures and behaviors of software systems. A process in RTPA is a computational operation that transforms a system from one state to another by changing its inputs, outputs, and/or internal variables. A process can be a single meta process or a complex process formed by using the process combination rules of RTPA known as process relations.

Definition 14. Real-Time Process Algebra (RTPA) is a set of formal notations and rules for describing algebraic and real-time relations of software processes.

RTPA models 17 meta processes and 17 process relations. A meta process is an elementary and primary process that serves as a common and basic building block for a software system. Complex processes can be derived from meta processes by a set of process relations that serve as process combinatory rules. Detailed semantics of RTPA can be found in (Wang, 2002b).
Program modeling concerns the coordination of computational behaviors with given data objects. Behavioral or instructive knowledge can be modeled by RTPA. A generic program model can be described by a formal treatment of statements, processes, and complex processes from the bottom up in the program hierarchy.

Definition 15. A process P is a composed listing and a logical combination of n meta statements p_i and p_j, 1 ≤ i < n, 1 < j ≤ m = n + 1, according to certain composing relations r_ij, i.e.:

P = R_{i=1}^{n-1} (p_i r_{ij} p_j), j = i + 1
  = (...(((p_1) r_{12} p_2) r_{23} p_3) ... r_{n-1,n} p_n)    (10)
where the big-R notation (Wang, 2002b; Wang, 2007) is adopted to describe the nature of processes as the building blocks of programs.

Definition 16. A program P is a composition of a finite set of m processes according to the time-, event-, and interrupt-based process dispatching rules, i.e.:

P = R_{k=1}^{m} (@e_k P_k)    (11)
Equations 10 and 11 indicate that a program is an embedded relational algebraic entity. A statement p in a program is an instantiation of a meta instruction of a programming language that executes a basic unit of coherent function and leads to a predictable behavior.

Theorem 5. The embedded relational model (ERM) states that a software system or a program P is a set of complex embedded relational processes, in which all previous processes of a given process form the context of the current process, i.e.:

P = R_{k=1}^{m} (@e_k P_k)
  = R_{k=1}^{m} [@e_k R_{i=1}^{n-1} (p_i(k) r_{ij}(k) p_j(k))], j = i + 1    (12)
The ERM presented in Theorem 5 provides a unified mathematical model of programs (Wang, 2006a) for the first time, which reveals that a program is a finite and nonempty set of embedded binary relations between a current statement and all previous ones that form the semantic context or environment of computing.

Definition 17. A meta process is the most basic and elementary process in computing that cannot be broken down further. The set of meta processes P encompasses 17 fundamental primitive operations in computing, i.e.:

P = {:=, ⇒, ⇐, |, @, ↑, ↓, !, §, …}    (13)

comprising assignment, evaluation, addressing, memory allocation, memory release, read, write, input, output, timing, duration, increase, decrease, exception detection, skip, stop, and system, whose notations are summarized in Table 2.
Definition 18. A process relation is a composing rule for constructing complex processes by using the meta processes. The process relations R of RTPA are a set of 17 composing operations and rules to build larger architectural components and complex system behaviors using the meta processes, i.e.:

R = {→, |, |…|, R*, R+, Ri, ||, ∫∫, |||, », …}    (14)

comprising sequence, jump, branch, switch, while-loop, repeat-loop, for-loop, recursion, procedure call, parallel, concurrence, interleave, pipeline, interrupt, and the time-, event-, and interrupt-driven dispatches, whose notations are summarized in Table 2.
The definitions, syntaxes, and formal semantics of each of the meta processes and process relations can be found in RTPA (Wang, 2002b; Wang, 2006f). A complex process and a program can be derived from the meta processes by the set of algebraic process relations. Therefore, a program is a set of embedded relational processes, as described in Theorem 5. A summary of the meta processes and their algebraic operations in RTPA is provided in Table 2.
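The embedded relational view of Theorem 5 can also be illustrated operationally (a toy sketch: the two "processes" and the single "sequence" relation below stand in for the 17 + 17 constructs of RTPA): a program is the left fold of its statements under composing relations, so each statement executes in the context built by all previous ones.

```python
from functools import reduce

# A toy "process" maps a state (a dict of variables) to a new state;
# a "relation" composes two processes, mirroring
# P = (...(((p1) r12 p2) r23 p3) ... r_{n-1,n} pn) of Eq. 10.
def assign(var, value):
    return lambda state: {**state, var: value}

def sequence(p, q):                 # stand-in for the RTPA sequence relation ->
    return lambda state: q(p(state))

processes = [assign("x", 1), assign("y", 2), assign("x", 3)]
program = reduce(sequence, processes)   # embedded relational composition
print(program({}))                      # {'x': 3, 'y': 2}
```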
System Algebra (SA)

Systems are the most complicated entities and phenomena in the physical, information, and social worlds across all science and engineering disciplines (Klir, 1992; Bertalanffy, 1952; Wang, 2006d). Systems are needed because the physical and/or cognitive power of an individual component or person is not enough to carry out a work or solve a problem. An abstract system is a collection of coherent and interactive entities that has stable functions and a clear boundary with its external environment. An abstract system forms the generic model of various real-world systems and represents their most common characteristics and properties.

Definition 19. System algebra (SA) is a new abstract mathematical structure that provides an algebraic treatment of abstract systems as well as their relations and operational rules for forming complex systems (Wang, 2006d).

Abstract systems can be classified into two categories, known as closed and open systems. Most practical and useful systems in nature are open systems, in which there are interactions between the system and its environment. However, for ease of understanding, the closed system is introduced first.

Definition 20. A closed system S is a 4-tuple, i.e.:

S = (C, R, B, Ω)    (15)

where
• C is a nonempty set of components of the system, C = {c1, c2, …, cn};
• R is a nonempty set of relations between pairs of the components in the system, R = {r1, r2, …, rm}, R ⊆ C × C;
• B is a set of behaviors (or functions), B = {b1, b2, …, bp};
• Ω is a set of constraints on the memberships of components, the conditions of relations, and the scopes of behaviors, Ω = {ω1, ω2, …, ωq}.
Most practical systems in the real world are not closed. That is, they need to interact with the external world, known as the environment Θ, in order to exchange energy, matter, and/or information. Such systems are called open systems. Typical interactions between an open system and its environment are inputs and outputs.

Definition 21. An open system S is a 7-tuple, i.e.:

S = (C, R, B, Ω, Θ) = (C, Rc, Ri, Ro, B, Ω, Θ)    (16)

where the extensions of entities beyond the closed system are as follows:
• Θ is the environment of S with a nonempty set of components CΘ outside C;
• Rc ⊆ C × C is a set of internal relations;
• Ri ⊆ CΘ × C is a set of external input relations;
• Ro ⊆ C × CΘ is a set of external output relations.

An open system S = (C, Rc, Ri, Ro, B, Ω, Θ) is illustrated in Figure 7 (Wang, 2006d).
Theorem 6. The equivalence between open and closed systems states that an open system S is equivalent to a closed system Ŝ, or vice versa, when its environment Θ_S or Θ_Ŝ is conjoined or detached, respectively, i.e.:

Ŝ = S + Θ_S
S = Ŝ − Θ_Ŝ    (17)

According to Theorem 6, any subsystem Sk of a closed system Ŝ is an open system. That is, any super system formed by a given set of n open systems Sk, plus their environments Θk, 1 ≤ k ≤ n, is a closed system. The algebraic relations and operations of systems in SA are summarized in Table 2.
Theorem 7. Wang's first law of system science, system fusion, states that system conjunction or composition between two systems S1 and S2 creates new relations ∆R12 and/or new behaviors (functions) ∆B12 that are solely a property of the new super system S, determined by the sizes of the two intersected component sets #(C1) and #(C2), i.e.:

∆R12 = #(R) − (#(R1) + #(R2)) = (#(C1 + C2))² − ((#(C1))² + (#(C2))²) = 2 (#(C1) • #(C2))    (18)
The discovery in Theorem 7 reveals that the mathematical explanation of system utilities is the newly gained relations ∆R12 and/or behaviors (functions) ∆B12 created during the conjunction of two systems or subsystems. The empirical awareness of this key system property has been intuitively or qualitatively observed for centuries. However, Theorem 7 is the first rigorous explanation of the mechanism of system gains during system conjunctions and compositions. According to Theorem 7, the maximum incremental or system gain equals the number of direct interconnections between all components in both S1 and S2, i.e., 2(#(C1) • #(C2)).

Figure 7. The abstract model of an open system: within the universe U, the system S comprises component sets C1 and C2 with behaviors B1 and B2 and constraints Ω1 and Ω2, linked by the internal relations Rc1 and Rc2, and connected to the environment Θ through the input relations Ri1 and Ri2 and the output relations Ro1 and Ro2.
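Theorem 7 is easy to verify numerically (a sketch; the component counts below are arbitrary): fusing two systems whose components may pairwise interrelate yields exactly 2 · #(C1) · #(C2) newly possible relations.

```python
def max_relations(n: int) -> int:
    """Upper bound #(C)^2 on binary relations among n components,
    counting ordered pairs, as used in Theorem 7."""
    return n * n

n1, n2 = 4, 6
gain = max_relations(n1 + n2) - (max_relations(n1) + max_relations(n2))
assert gain == 2 * n1 * n2   # the system fusion gain of Eq. 18
print(gain)                  # 48 newly possible relations
```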
Table 2. Taxonomy of contemporary mathematics for knowledge representation and manipulation (notation partially reproduced; see Wang, 2006d/e for the full symbol set)

Concept algebra and system algebra operations: super/sub relation, related/independent (↔), equivalent (=), consistent (≅), overlapped (Π), conjunction, elicitation, comparison (~), definition, difference, inheritance (⇒), extension, tailoring, substitute, composition, decomposition, aggregation/generalization, specification, and instantiation.

RTPA meta processes: assignment (:=), evaluation, addressing (⇒), memory allocation (⇐), memory release, read, write, input (|), output (|), timing (@), duration, increase (↑), decrease (↓), exception detection (!), skip, stop, and system (§).

RTPA relational operations: sequence (→), jump, branch (|), switch (|…|), while-loop (R*), repeat-loop (R+), for-loop (Ri), recursion, procedure call, parallel (||), concurrence, interleave (|||), pipeline (»), interrupt, time-driven dispatch, event-driven dispatch, and interrupt-driven dispatch.
Theorem 8. Wang's second law of system science, the maximum system gain, states that the work done by a system is always larger than the work done by any of its components, but is less than or equal to the sum of the work of all its components, i.e.:

W(S) ≤ Σ_{i=1}^{n} W(e_i)
W(S) > max(W(e_i)), e_i ∈ E_S    (19)
There was a myth about an ideal system in conventional systems theory, which supposed that the work done by the ideal system, W(S), may be greater than the sum of the work of all its components, i.e. W(S) ≥ Σ_{i=1}^{n} W(e_i). According to Theorems 7 and 8, such ideal system utility is impossible to achieve. A summary of the algebraic operations and their notations in CA, RTPA, and SA is provided in Table 2. Details may be found in (Wang, 2006d; Wang, 2006g).
Applications of CI

Sections 2 and 3 have reviewed the latest developments in fundamental research in CI, particularly its theoretical framework and descriptive mathematics. A wide range of applications of CI has been identified in multidisciplinary
and transdisciplinary areas, such as: (1) the architecture of future generation computers; (2) estimation of the capacity of human memory; (3) autonomic computing; (4) cognitive properties of information, data, knowledge, and skills in knowledge engineering; (5) simulation of human cognitive behaviors using descriptive mathematics; (6) agent systems; (7) CI foundations of software engineering; (8) deductive semantics of software; and (9) cognitive complexity of software.
The Architecture of Future Generation Computers

Conventional machines were invented to extend human physical capability, while modern information processing machines, such as computers, communication networks, and robots, are developed to extend human intelligence, memory, and the capacity for information processing (Wang, 2004). Recent advances in CI provide a formal description of an entire set of cognitive processes of the brain (Wang et al., 2006). The fundamental research in CI also creates an enriched set of contemporary denotational mathematics (Wang, 2006c) for dealing with the extremely complicated objects and problems in natural intelligence, neural informatics, and knowledge manipulation.

The theory and philosophy behind the next generation of computers and computing methodologies are those of CI (Wang, 2003b; Wang, 2004). It is commonly believed that future-generation computers, known as cognitive computers, will adopt non-von Neumann (Neumann, 1946) architectures. The key requirements for implementing a conventional stored-program controlled computer are the generalization of common computing architectures and the ability of the computer to interpret the data loaded in memory as computing instructions. These are the essences of stored-program controlled computers known as the von Neumann architecture (Neumann, 1946). Von Neumann elicited five fundamental and essential components to implement general-purpose programmable digital computers in order to embody the concept of stored-program-controlled computers.

Definition 22. A von Neumann Architecture (VNA) of computers is a 5-tuple that consists of the components: (a) the arithmetic-logic unit (ALU), (b) the control unit (CU) with a program counter (PC), (c) a memory (M), (d) a set of input/output (I/O) devices, and (e) a bus (B) that provides the data path between these components, i.e.:

VNA ≙ (ALU, CU, M, I/O, B)    (20)
Definition 23. Conventional computers with VNA are aimed at stored-program-controlled data processing based on mathematical logic and Boolean algebra.

A VNA computer is centered on the bus and characterized by its all-purpose memory for both data and instructions. A VNA machine is an extended Turing machine (TM), where the power and functionality of all components of the TM, including the control unit (with wired instructions), the tape (memory), and the head of I/O, are greatly enhanced and extended with more powerful instructions and I/O capacity.

Definition 24. A Wang Architecture (WA) of computers, known as the Cognitive Machine as shown in Figure 8, is a parallel structure encompassing an Inference Engine (IE) and a Perception Engine (PE) (Wang, 2006b; Wang, 2006g), i.e.:

WA ≙ (IE || PE)
   = ( KMU    // the knowledge manipulation unit
     || BMU   // the behavior manipulation unit
     || EMU   // the experience manipulation unit
     || SMU   // the skill manipulation unit
     )
     ||
     ( BPU    // the behavior perception unit
     || EPU   // the experience perception unit
     )    (21)
As shown in Figure 8 and Eq. 21, WA computers are not centered on a CPU for data manipulation, as VNA computers are. Instead, WA computers are centered on the concurrent IE and PE for cognitive learning and autonomic perception, based on abstract concept inferences and empirical stimuli perception. The IE is designed for concept/knowledge manipulation according to concept algebra (Wang, 2006e), particularly the nine concept operations for knowledge acquisition, creation, and manipulation. The PE is designed for feeling and perception processing according to RTPA (Wang, 2002b) and the formally described cognitive process models of the perception layers as defined in the LRMB model (Wang et al., 2006).

Definition 25. Cognitive computers with WA are aimed at cognitive and perceptive concept/knowledge processing based on contemporary denotational mathematics, i.e., Concept Algebra (CA), Real-Time Process Algebra (RTPA), and System Algebra (SA).

Just as mathematical logic and Boolean algebra are the mathematical foundations of VNA computers, the mathematical foundations of WA computers are denotational mathematics (Wang, 2006b; Wang, 2006c). As described in the LRMB reference model (Wang et al., 2006), all 37 fundamental cognitive processes of the human brain can be formally described in CA and RTPA (Wang, 2002b; Wang, 2006e); in other words, they are simulatable and executable by WA-based cognitive computers.
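A structural sketch of Eq. 21 follows (the class and method names are invented for illustration, and no actual inference or perception algorithms are implied): the defining architectural feature is that the IE and PE run concurrently, rather than around a single CPU and bus.

```python
import threading

class InferenceEngine:
    """IE = KMU || BMU || EMU || SMU (knowledge, behavior, experience,
    and skill manipulation units), per Eq. 21."""
    def run(self, enquiry):
        return f"inference on {enquiry}"

class PerceptionEngine:
    """PE = BPU || EPU (behavior and experience perception units)."""
    def run(self, stimulus):
        return f"percept of {stimulus}"

class CognitiveMachine:
    """CM = IE || PE: both engines operate concurrently."""
    def __init__(self):
        self.ie, self.pe = InferenceEngine(), PerceptionEngine()

    def step(self, enquiry, stimulus):
        results = {}
        t1 = threading.Thread(target=lambda: results.update(ie=self.ie.run(enquiry)))
        t2 = threading.Thread(target=lambda: results.update(pe=self.pe.run(stimulus)))
        t1.start(); t2.start(); t1.join(); t2.join()
        return results

print(CognitiveMachine().step("what is X?", "light pattern"))
```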
Estimation of the Capacity of Human Memory

Despite the fact that the number of neurons in the brain has been identified in cognitive and neural sciences, the magnitude of human memory capacity is still unknown. According to the OAR model, a recent discovery in CI is that the upper bound of the memory capacity of the human brain is in the order of 10^8,432 bits (Wang et al., 2003). The determination of the magnitude of human memory capacity is not only theoretically significant in CI, but also practically useful for unveiling human potential, as well as the gaps between natural and machine intelligence. This result indicates that the next generation of computer memory systems may be built according to the OAR model rather than the traditional container metaphor, because the former is more powerful, flexible, and efficient, generating a tremendous memory capacity from a limited number of neurons in the brain or hardware cells in next generation computers.
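The order of magnitude of this bound can be reproduced with a back-of-the-envelope computation (an assumption about the derivation: the bound is read here as the number of ways roughly 10^3 synaptic connections per neuron can be selected among roughly 10^11 neurons):

```python
import math

def log10_binomial(n: float, k: float) -> float:
    """log10 of the binomial coefficient C(n, k), via the log-gamma function."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(10)

neurons = 1e11     # approximate number of neurons in the brain
synapses = 1e3     # approximate synaptic connections per neuron
print(round(log10_binomial(neurons, synapses)))   # ~8432, i.e. 10^8432
```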
Figure 8. The architecture of a cognitive machine, CM = IE || PE: the inference engine (IE) comprises the knowledge, behavior, experience, and skill manipulation units (KMU, BMU, EMU, SMU) operating between LTM and ABM, while the perception engine (PE) comprises the behavior and experience perception units (BPU, EPU) operating on SBM, ABM, and LTM; the machine interacts with the external world through enquiries, interactions, and stimuli.
Autonomic Computing

The approaches to implementing intelligent systems can be classified into those of biological organisms, silicon automata, and computing systems. Based on CI studies, autonomic computing (Wang, 2004) is proposed as a new and advanced computing technique built upon the routine, algorithmic, and adaptive systems, as shown in Table 3. The approaches to computing can be classified into two categories, known as imperative and autonomic computing. Correspondingly, computing systems may be implemented as imperative or autonomic computing systems.

Definition 26. An imperative computing system is a passive system that implements deterministic, context-free, and stored-program-controlled behaviors, where a behavior is defined as a set of observable actions of a given computing system.

Definition 27. An autonomic computing system is an intelligent system that autonomously carries out robotic and interactive actions based on goal- and event-driven mechanisms.

The first three categories of computing techniques shown in Table 3 are imperative. In contrast, an autonomic computing system is an active system that implements nondeterministic, context-dependent, and adaptive behaviors, which do not rely on instructive and procedural information, but depend on internal status and willingness formed by long-term historical events and current rational or emotional goals. Autonomic computing does not rely on imperative and procedural instructions, but on perceptions and inferences based on internal goals, as revealed in CI.
Table 3. Classification of computing systems (event type I vs. behavior type O)
• Constant event, constant behavior: routine system (deterministic behavior)
• Constant event, variable behavior: adaptive system (nondeterministic behavior)
• Variable event, constant behavior: algorithmic system (deterministic behavior)
• Variable event, variable behavior: autonomic system (nondeterministic behavior)

Cognitive Properties of Knowledge

Almost all modern disciplines of science and engineering deal with information and knowledge. According to CI theories, cognitive information may be classified into four categories, known as knowledge, behaviors, experience, and skills, as shown in Table 4.

Definition 28. The taxonomy of cognitive information is determined by its types of inputs and outputs to and from the brain during learning and information processing, where both inputs and outputs can be either abstract information (concepts) or empirical information (actions).

It is noteworthy that the approaches to acquiring knowledge/behaviors and experience/skills are fundamentally different. The former may be obtained either directly based on hands-on activities or indirectly by reading, while the latter can never be acquired indirectly.

Table 4. Types of cognitive information
• Abstract input (concept) → abstract output (concept): knowledge; acquired directly or indirectly
• Abstract input (concept) → empirical output (action): behavior; acquired directly or indirectly
• Empirical input (action) → abstract output (concept): experience; acquired directly only
• Empirical input (action) → empirical output (action): skill; acquired directly only
According to Table 4, the following important conclusions on information manipulation and learning for both human and machine systems can be derived.

Theorem 9. The principle of information acquisition states that there are four sufficient categories of learning, known as those of knowledge, behaviors, experience, and skills.

Theorem 9 indicates that learning theories and their implementation in autonomic and intelligent systems should study all four categories of cognitive information acquisition, particularly behaviors, experience, and skills, rather than focusing only on knowledge.

Corollary 3. All four categories of information can be acquired directly by an individual.

Corollary 4. Knowledge and behaviors can be learnt indirectly by inputting abstract information, while experience and skills must be learnt directly by hands-on or empirical actions.

The above theory of CI lays an important foundation for learning theories and pedagogy (Wang, 2004; Wang, 2006e). Based on this fundamental work, the IE and PE of cognitive computers working as a virtual brain can be implemented on WA-based cognitive computers or simulated on VNA-based conventional computers.
Simulation of Human Cognitive Behaviors Using Contemporary Mathematics

The contemporary denotational mathematics described in Section 3, particularly CA and RTPA, may be used to simulate the cognitive processes of the brain as modeled in LRMB (Wang et al., 2006). Most of the 37 cognitive processes identified in LRMB, such as the learning (Wang, 2006e), reasoning (Wang, 2006b), decision making (Wang et al., 2004), and comprehension (Wang and Gafurov, 2003) processes, have been rigorously modeled and described in RTPA and CA. Based on this fundamental work, the inference engine and perception engine of a virtual brain can be implemented on cognitive computers or simulated on conventional computers. In the former case, a working prototype of a fully autonomic computer would be realized on the basis of CI theories.
Agent Systems

Definition 29. A software agent is an intelligent software system that autonomously carries out robotic and interactive applications based on goal-driven mechanisms (Wang, 2003c).

Because a software agent may be perceived as an application-specific virtual brain (see Theorem 3), the behaviors of an agent mirror human behaviors. The fundamental characteristics of agent-based systems are autonomic computing, goal-driven action generation, and knowledge-based machine learning. In recent CI research, perceptivity is recognized as the sixth sense that serves the brain as the thinking engine and the kernel of natural intelligence. Perceptivity implements self-consciousness inside the abstract memories of the brain. Almost all cognitive life functions rely on perceptivity, such as consciousness, memory searching, motivation, willingness, goal setting, emotion, sense of spatiality, and sense of motion. The brain may be stimulated by external and internal information, which can be classified as willingness-driven (internal events such as goals, motivations, and emotions), event-driven (external events), and time-driven (mainly external events triggered by an external clock). Unlike a computer, the brain works in two approaches: the internal willingness-driven processes, and the external event- and time-driven processes. The external information and events are the major sources that drive the brain, particularly for conscious life functions.

Recent research in CI reveals that the foundations of agent technologies and autonomic computing are CI, particularly goal-driven action generation techniques (Wang, 2003c). The LRMB model (Wang et al., 2006) described in Section 2.2 may be used as a reference model for agent-based technologies. This is a fundamental view toward the formal description and modeling of the architectures and behaviors of agent systems, which are created to do something repeatable in context, to extend human capability, reachability, and/or memory capacity. It is found that both human and software behaviors can be described by a 3-dimensional representative model comprising action, time, and space. For agent system behaviors, the three dimensions are known as mathematical operations, event/process timing, and memory manipulation (Wang, 2006g). The 3-D behavioral space of agents can be formally described by RTPA, which serves as an expressive mathematical means for describing thoughts and notions of dynamic system behaviors as a series of actions and cognitive processes.
CI Foundations of Software Engineering

Software is an intellectual artifact and a kind of instructive information that provides a solution for a repeatable computer application, which enables existing tasks to be done easier, faster, and smarter, or which provides innovative applications for industry and daily life. Large-scale software systems are among the most complicated systems mankind has ever handled or experienced. The fundamental cognitive characteristics of software engineering have been identified as follows (Wang, 2006g):

• The inherent complexity and diversity
• The difficulty of establishing and stabilizing requirements
• The changeability or malleability of system behavior
• The abstraction and intangibility of software products
• The requirement of varying problem domain knowledge
• The non-determinism and poly-solvability in design
• The polyglotics and polymorphism in implementation
• The dependability of interactions among software, hardware, and human beings
The above list forms a set of fundamental constraints for software engineering, identified as the cognitive constraints of intangibility, complexity, indeterminacy, diversity, polymorphism, inexpressiveness, inexplicit embodiment, and unquantifiable quality measures (Wang, 2006g). A set of psychological requirements for software engineers has also been identified: a) abstract-level thinking; b) imagination of dynamic behaviors from static descriptions; c) organization capability; d) a cooperative attitude in team work; e) long-period focus of attention; f) preciseness; g) reliability; and h) expressive capability in communication.
Deductive Semantics of Software

Deduction is a reasoning process that discovers new knowledge or derives a specific conclusion from generic premises such as abstract rules or principles. In order to provide an algebraic treatment of the semantics of programs and human cognitive processes, a new type of formal semantics known as deductive semantics has been developed (Wang, 2006f/g).

Definition 30. Deductive semantics is a formal semantics that deduces the semantics of a program from a generic abstract semantic function to the concrete semantics, which are embodied in the changes of status of a finite set of variables constituting the semantic environment of computing (Wang, 2006g).
Theorem 10. The semantics of a statement p, θ(p), on a given semantic environment Θ in deductive semantics is a double partial differential of the semantic function, $f_\theta(p) = f_p: T \times S \to V = v_p(t, s),\; t \in T \wedge s \in S \wedge v_p \in V$, on the sets of variables S and executing steps T, i.e.:

$$\theta(p) \triangleq \frac{\partial^2 f_\theta(p)}{\partial t\,\partial s} = \frac{\partial^2 v_p(t,s)}{\partial t\,\partial s} = R_{i=0}^{\#T(p)}\; R_{j=1}^{\#S(p)}\; v_p(t_i, s_j) = R_{i=0}^{1}\; R_{j=1}^{\#\{s_1, s_2, \ldots, s_m\}}\; v_p(t_i, s_j) = \begin{pmatrix} & s_1 & s_2 & \cdots & s_m \\ t_0 & v_{01} & v_{02} & \cdots & v_{0m} \\ (t_0, t_1] & v_{11} & v_{12} & \cdots & v_{1m} \end{pmatrix} \qquad (22)$$
where t denotes the discrete time immediately before and after the execution of p during (t0, t1), and # is the cardinal calculus that counts the number of elements in a given set, i.e., n = #T(p) and m = #S(p). The first partial differential in Eq. 22 selects all related variables S(p) of the statement p from Θ. The second partial differential selects the set of discrete steps of p's execution, T(p), from Θ. According to Theorem 10, the semantics of a statement can be reduced to a semantic function that results in a 2-D matrix recording the changes of values of all variables over time along program execution. Deductive semantics perceives that the carriers of software semantics are a finite set of variables declared in a given program; therefore, software semantics can be reduced to the changes of values of these variables. The deductive mathematical models of semantics and of the semantic environment at various composing levels of systems are formally described. Properties of software semantics and relationships between the software behavioral space and the semantic environment are discussed. Deductive semantics is applied in the formal definitions and explanations of the semantic rules of a comprehensive set of software static and dynamic behaviors as modeled in RTPA. Deductive semantics can be used to define the abstract and concrete semantics of software and cognitive systems, and to facilitate software comprehension and recognition by semantic analyses.
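To make the matrix view of Theorem 10 concrete, the following toy sketch (not part of the original formulation; the statements and variable names are invented, and Python assignments stand in for RTPA statements) tabulates variable values over discrete execution steps:

```python
# A toy illustration of the Theorem 10 view: the semantics of a program
# fragment is the matrix of variable values over the execution steps
# (t0, (t0, t1], ...) and the declared variables S(p).

def semantic_matrix(statements, variables):
    """Execute a list of assignment statements, recording the values of
    all variables at each discrete execution step."""
    env = {v: None for v in variables}
    rows = [dict(env)]                      # values at t0, before execution
    for stmt in statements:
        exec(stmt, {}, env)                 # one discrete execution step
        rows.append(dict(env))              # values after this step
    return rows

rows = semantic_matrix(["x = 1", "y = x + 2", "x = y * 3"], ["x", "y"])
for t, row in enumerate(rows):
    print(f"t{t}: {row}")
# t0: {'x': None, 'y': None} ... final row: {'x': 9, 'y': 3}
```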
Cognitive Complexity of Software

The estimation and measurement of the functional complexity of software is an age-old problem in software engineering. The cognitive complexity of software (Wang, 2006j) is a new measurement for cross-platform analysis of the complexities, sizes, and comprehension effort of software specifications and implementations in the design, implementation, and maintenance phases of software engineering. This work reveals that the cognitive complexity of software is a product of its architectural and operational complexities, on the basis of deductive semantics and abstract system theory. Ten fundamental basic control structures (BCS's) are elicited from software architectural/behavioral specifications and descriptions. The cognitive weights of these BCS's are derived and calibrated via a series of psychological experiments. Based on this work, the cognitive complexity of software systems can be rigorously and accurately measured and analyzed. Comparative case studies demonstrate that cognitive complexity is a highly distinguishing measure of software functional complexity and size in software engineering. On the basis of the ERM model described in Theorem 5 and the deductive semantics of software presented in Section 4.8, the finding on the cognitive complexity of software is obtained as follows.
Theorem 11. The sum of the cognitive weights of all rij, w(rij), in the ERM model determines the operational complexity of a software system Cop, i.e.:

$$C_{op} = \sum_{i=1}^{n-1} w(r_{ij}), \quad j = i + 1 \qquad (23)$$
A set of psychological experiments has been carried out in undergraduate and graduate classes in software engineering. Based on 126 experiment results, the equivalent cognitive weights of the ten fundamental BCS's have been statistically calibrated as summarized in Table 5 (Wang, 2006j), where the relative cognitive weight of the sequential structure is assumed to be one, i.e., w1 = 1. According to deductive semantics, the complexity of a software system, or its semantic space, is determined not only by the number of operations, but also by the number of data objects.

Theorem 12. The cognitive complexity Cc(S) of a software system S is a product of the operational complexity Cop(S) and the architectural complexity Ca(S), i.e.:

$$C_c(S) = C_{op}(S) \cdot C_a(S) = \left\{ \sum_{k=1}^{n_C} \sum_{i=1}^{\#(C_s(C_k))} w(k, i) \right\} \cdot \left\{ \sum_{k=1}^{n_{CLM}} OBJ(CLM_k) + \sum_{k=1}^{n_C} OBJ(C_k) \right\} \; [\mathrm{FO}] \qquad (24)$$
Based on Theorem 12, the following corollary can be derived.

Corollary 5. The cognitive complexity of a software system is proportional to both its operational and structural complexities. That is, the more the architectural data objects and the higher the operational complexity applied to these objects, the larger the cognitive complexity of the system.
Table 5. Calibrated cognitive weights of BCS's

BCS   RTPA notation   Description     Calibrated cognitive weight
1     →               Sequence        1
2     |               Branch          3
3     |…|…            Switch          4
4     R^i             For-loop        7
5     R^*             Repeat-loop     7
6     R^*             While-loop      8
7     —               Function call   7
8     —               Recursion       11
9     || or ∫∫        Parallel        15
10    —               Interrupt       22
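As an illustration of Theorems 11 and 12 (a sketch, not from the original text; the component structure and data-object counts below are invented for the example), the operational, architectural, and cognitive complexities can be computed from the calibrated weights in Table 5:

```python
# Calibrated cognitive weights of the ten BCS's from Table 5.
BCS_WEIGHTS = {
    "sequence": 1, "branch": 3, "switch": 4, "for": 7, "repeat": 7,
    "while": 8, "call": 7, "recursion": 11, "parallel": 15, "interrupt": 22,
}

def operational_complexity(components):
    """C_op (Theorem 11): sum of cognitive weights over all BCS's of all components."""
    return sum(BCS_WEIGHTS[b] for bcs_list in components.values() for b in bcs_list)

def architectural_complexity(objects_per_clm, objects_per_component):
    """C_a: data objects in class-level models plus those local to components."""
    return sum(objects_per_clm) + sum(objects_per_component)

def cognitive_complexity(components, objects_per_clm, objects_per_component):
    """C_c = C_op * C_a (Eq. 24), in function-object (FO) units."""
    return operational_complexity(components) * \
           architectural_complexity(objects_per_clm, objects_per_component)

# Toy component: a sequence, a while-loop, and a branch inside it.
components = {"toy_component": ["sequence", "while", "branch"]}
print(cognitive_complexity(components, objects_per_clm=[3], objects_per_component=[2]))
# -> (1 + 8 + 3) * (3 + 2) = 60 [FO]
```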
Table 6. Measurement of software system complexities

System      Time complexity   Cyclomatic          Symbolic             Operational          Architectural        Cognitive
            Ct (OP)           complexity Cm (-)   complexity Cs (LOC)  complexity Cop (F)   complexity Ca (O)    complexity Cc (FO)
IBS (a)     ε                 1                   7                    13                   5                    65
IBS (b)     O(n)              2                   8                    34                   5                    170
MaxFinder   O(n)              2                   5                    115                  7                    805
SIS_Sort    O(m+n)            5                   8                    163                  11                   1,793

(The last three columns together constitute the cognitive complexity measures.)
Based on Theorem 11, the cognitive complexities of four typical software components (Wang, 2006j) have been comparatively analyzed, as summarized in Table 6. To enable comparative analyses, data based on existing complexity measures, such as time, cyclomatic, and symbolic (LOC) complexities, are also contrasted in Table 6. Observing Table 6, it can be seen that the first three traditional measurements cannot actually reflect the real complexity of software systems in software design, representation, cognition, comprehension, and maintenance. It is found that: (a) Although the four example systems have similar symbolic complexities, their operational and functional complexities differ greatly. This indicates that symbolic complexity cannot be used to represent the operational or functional complexity of software systems. (b) The symbolic complexity (LOC) does not represent the throughput or the input size of problems. (c) Time complexity does not work well for a system with no loops and no dominant operations, because in this measure all statements in linear structures are treated as zero, no matter how long they are. In addition, time complexity cannot distinguish the real complexities of systems with the same asymptotic function, such as Case 2 (IBS (b)) and Case 3 (MaxFinder). (d) Cognitive complexity is an ideal measure of software functional complexity and size, because it represents the real semantic complexity by integrating both the operational and architectural complexities in a coherent measure. For example, the difference between IBS (a) and IBS (b) is successfully captured by the cognitive complexity, whereas the symbolic and cyclomatic complexities cannot identify such functional differences well.
Conclusion

This chapter has presented an intensive survey of recent advances and groundbreaking studies in cognitive informatics (CI), particularly its theoretical framework, denotational mathematics, and main application areas. CI has been described as a new discipline that studies the natural intelligence and internal information processing mechanisms of the brain, as well as the processes involved in perception and cognition. CI is a new frontier across the disciplines of computing, software engineering, cognitive sciences, neuropsychology, brain sciences, and philosophy. It has been recognized that many fundamental issues in knowledge and software engineering rest on a deeper understanding of the mechanisms of human information processing and cognitive processes. A coherent set of theories for CI has been described in this chapter, such as the Information-Matter-Energy (IME) model, the Layered Reference Model of the Brain (LRMB), the OAR model of information representation, Natural Intelligence (NI) vs. Artificial Intelligence (AI), Autonomic Computing (AC) vs. imperative computing, CI laws of software, mechanisms of human perception processes, the cognitive processes of formal inferences, and the formal knowledge system. Three contemporary mathematical means, known collectively as denotational mathematics, have been created in CI. Within these new forms of denotational mathematics for CI, Concept Algebra (CA) has been designed to deal with the new abstract mathematical structure of concepts and their representation and manipulation in learning and knowledge engineering. Real-Time Process Algebra (RTPA) has been developed
as an expressive, easy-to-comprehend, and language-independent notation system, and a specification and refinement method, for describing and specifying software system behaviors. System Algebra (SA) has been created for the rigorous treatment of abstract systems and their algebraic relations and operations. A wide range of applications of CI has been identified in multidisciplinary and transdisciplinary areas, such as the architecture of future-generation computers; estimation of the capacity of human memory; autonomic computing; cognitive properties of information, data, knowledge, and skills in knowledge engineering; simulation of human cognitive behaviors using descriptive mathematics; agent systems; CI foundations of software engineering; deductive semantics of software; and the cognitive complexity of software systems.
Acknowledgment

The author would like to acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) for its support of this work. The author would also like to thank the anonymous reviewers for their valuable comments and suggestions.
References

Bell, D. A. (1953). Information theory. London: Pitman.

Ganter, B., & Wille, R. (1999). Formal concept analysis (pp. 1-5). Springer.

Hoare, C. A. R. (1985). Communicating sequential processes. Prentice-Hall Inc.

Jordan, D. W., & Smith, P. (1997). Mathematical techniques: An introduction for the engineering, physical, and mathematical sciences (2nd ed.). UK: Oxford University Press.

Klir, G. J. (1992). Facets of systems science. New York: Plenum.

Milner, R. (1989). Communication and concurrency. Englewood Cliffs, NJ: Prentice-Hall.

Quillian, M. R. (1968). Semantic memory. In M. Minsky (Ed.), Semantic information processing. Cambridge, MA: Cambridge Press.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423, 623-656.

von Bertalanffy, L. (1952). Problems of life: An evolution of modern biological and scientific thought. London: C. A. Watts.

von Neumann, J. (1946). The principles of large-scale computing machines. Reprinted in Annals of History of Computers, 3(3), 263-273.

Wang, Y. (2002, August). Cognitive informatics. Keynote speech, Proceedings of the 1st IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 34-42). Calgary, Canada: IEEE CS Press.

Wang, Y. (2002). The real-time process algebra (RTPA). The International Journal of Annals of Software Engineering, 14, 235-274.

Wang, Y., Johnston, R., & Smith, M. (Eds.) (2002, August). Cognitive informatics: Proceedings of the 1st IEEE International Conference (ICCI'02). Calgary, AB, Canada: IEEE CS Press.

Wang, Y. (2003). Cognitive informatics: A new transdisciplinary research field. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 115-127.
Wang, Y. (2003). On cognitive informatics. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 151-167.

Wang, Y. (2003, August). Cognitive informatics models of software agent systems and autonomic computing. Keynote speech, Proceedings of the International Conference on Agent-Based Technologies and Systems (ATS'03) (p. 25). Calgary, Canada: Univ. of Calgary Press.

Wang, Y. (2003). Using process algebra to describe human and software system behaviors. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 199-213.

Wang, Y., Liu, D., & Wang, Y. (2003). Discovering the capacity of human memory. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 189-198.

Wang, Y., & Gafurov, D. (2003, August). The cognitive process of comprehension. Proceedings of the 2nd IEEE International Conference on Cognitive Informatics (ICCI'03) (pp. 93-97). London, UK: IEEE CS Press.

Wang, Y. (2004, August). Autonomic computing and cognitive processes. Keynote speech, Proceedings of the 3rd IEEE International Conference on Cognitive Informatics (ICCI'04) (pp. 3-4). Victoria, Canada: IEEE CS Press.

Wang, Y., Dong, L., & Ruhe, G. (2004, July). Formal description of the cognitive process of decision making. Proceedings of the 3rd IEEE International Conference on Cognitive Informatics (ICCI'04) (pp. 124-130). Victoria, Canada: IEEE CS Press.

Wang, Y. (2005, August). On the cognitive processes of human perceptions. Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 203-211). Irvine, California: IEEE CS Press.

Wang, Y. (2005, May). On the mathematical laws of software. Proceedings of the 18th Canadian Conference on Electrical and Computer Engineering (CCECE'05) (pp. 1086-1089). Saskatoon, SK, Canada.

Wang, Y. (2005, August). The cognitive processes of abstraction and formal inferences. Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 18-26). Irvine, California: IEEE CS Press.

Wang, Y. (2006, May). A unified mathematical model of programs. Proceedings of the 19th Canadian Conference on Electrical and Computer Engineering (CCECE'06) (pp. 2346-2349). Ottawa, ON, Canada.

Wang, Y. (2006, July). Cognitive informatics - Towards the future generation computers that think and feel. Keynote speech, Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 3-7). Beijing, China: IEEE CS Press.

Wang, Y. (2006, July). Cognitive informatics and contemporary mathematics for knowledge representation and manipulation. Invited plenary talk, Proceedings of the 1st International Conference on Rough Set and Knowledge Technology (RSKT'06) (pp. 69-78). Lecture Notes in Artificial Intelligence, LNAI 4062. Chongqing, China: Springer.

Wang, Y. (2006, July). On abstract systems and system algebra. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 332-343). Beijing, China: IEEE CS Press.

Wang, Y. (2006, July). On concept algebra and knowledge representation. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 320-331). Beijing, China: IEEE CS Press.

Wang, Y. (2006). On the informatics laws and deductive semantics of software. IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), 161-171.

Wang, Y. (2006, May). The OAR model for knowledge representation. Proceedings of the 19th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE'06) (pp. 1696-1699). Ottawa, Canada.
Wang, Y., & Kinsner, W. (2006, March). Recent advances in cognitive informatics. IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), 121-123.

Wang, Y., & Wang, Y. (2006, March). On cognitive informatics models of the brain. IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), 203-207.

Wang, Y. (2006, July). On the Big-R notation for describing iterative and recursive behaviors. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 132-140). Beijing, China: IEEE CS Press.

Wang, Y., Wang, Y., Patel, S., & Patel, D. (2006, March). A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), 124-133.

Wang, Y. (2006, July). Cognitive complexity of software and its measurement. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 226-235). Beijing, China: IEEE CS Press.

Wang, Y. (2007). Software engineering foundations: A software science perspective. CRC Book Series in Software Engineering, Vol. II. Auerbach Publications, USA.
Chapter II
Is Entropy Suitable to Characterize Data and Signals for Cognitive Informatics?

Witold Kinsner
University of Manitoba, Canada
Abstract

This chapter provides a review of Shannon and other entropy measures in evaluating the quality of materials used in perception, cognition, and learning processes. Energy-based metrics are not suitable for cognition, as energy itself does not carry information. Instead, morphological (structural and contextual) metrics, as well as entropy-based multiscale metrics, should be considered in cognitive informatics. Appropriate data and signal transformation processes are defined and discussed in the perceptual framework, followed by various classes of information and entropies suitable for the characterization of data, signals, and distortion. Other entropies are also described, including the Rényi generalized entropy spectrum, the Kolmogorov complexity measure, the Kolmogorov-Sinai entropy, and the Prigogine entropy for evolutionary dynamical systems. Although such entropy-based measures are suitable for many signals, they are not sufficient for scale-invariant (fractal and multifractal) signals without corresponding complementary multiscale measures.
Introduction

This chapter is concerned with measuring the quality of various materials used in perception, cognition, and evolutionary learning processes. The multimedia materials may include temporal signals such as sound, speech, music, biomedical and telemetry signals, as well as spatial signals such as still images, and spatio-temporal signals such as animation and video. A comprehensive review of the scope of multimedia storage and transmission is presented by Kinsner (2002). Most such original materials are altered (compressed or enhanced) either to fit the available storage or bandwidth during their transmission, or to enhance perception of the materials. Since the signals may also be contaminated by noise during different stages of their processing and transmission, various denoising techniques must be used to minimize the noise without affecting the signal itself (Kinsner, 2002). Different classes of coloured and fractal noise are described by Kinsner (1996). The multimedia compression
is often lossy in that the signals are altered with respect not only to their redundancy, but also to their cognitive relevancy. Since the signals are presented to humans, cognitive processes must be considered in the development of suitable quality metrics. This chapter describes a very fundamental class of metrics based on entropy, and identifies its usefulness and limitations in the area of cognitive informatics (CI) (Wang, 2002).
Issues in Compression and Coding

A simple source compression consists of taking an input stream of symbols S and mapping the stream into an output stream of codes G, so that G is smaller than S. The effectiveness of the mapping depends on the selection of an appropriate model of the source. This two-step process is illustrated in Figure 1. Modelling of the source is intended to extract information from the source in order to guide the coder in the selection of proper codes. The models may be either given a priori (static) or constructed on-the-fly (dynamic, in adaptive compression) throughout the compression process. In data compression, the modeller may either consider the discrete probability mass function (pmf) of the source, or look for a structure (e.g., the pattern of edges and textures) in the source itself. In perceptual signal compression, the modeller may consider the perceptual framework (e.g., edges and textures in images and the corresponding masking in either the human visual system, HVS (Pennebaker & Mitchell, 1993), or the human psycho-acoustic system, PAS (Jayant, 1992)). It is in this modelling that CI ought to be used extensively. A simple data source coder minimizes the bit rate of the data by redundancy minimization based on Shannon first-order or higher-order entropies. Redundancy is a probabilistic measure (entropy) of the spread of probabilities of the occurrence of individual symbols in the source with respect to the equal (uniform) symbol probabilities. If the probabilities of the source symbols are all equal, the source entropy becomes maximum, and there is no redundancy in the source alphabet, implying that a random (patternless) source cannot be compressed without a loss of information. The objective of lossless compression techniques is to remove as much redundancy from the source as possible. This approach cannot produce large source compression. The quality of an actual code is determined by the difference between the code entropy and the source entropy; if both are equal, then the code is called perfect in the information-theoretic sense. For example, Huffman and Shannon-Fano codes (e.g., Held, 1987, and Kinsner, 1991) are close to perfect in that sense. Clearly, no statistical code will be able to have entropy smaller than the source entropy. On the other hand, a perceptual source coder minimizes the bit rate of the input signal while preserving its perceptual quality, as guided by two main factors: (i) information attributes derived from the structure in the given source (e.g., probabilities related to frequency of occurrence or densities, as well as edges and textures related to the singularities in the signal), and (ii) features derived from the perceptual framework (e.g., masking in the HVS and PAS). This corresponds to the removal of both redundancy and irrelevancy, as shown by the Schouten diagram in Figure 2. This orthogonal principle of both redundancy reduction and irrelevancy removal is usually difficult, as it does not correspond to the maximization of the signal-to-noise ratio, SNR (i.e., the minimization of the mean-squared error, MSE), and is central to the second generation of codecs.

Figure 1. Compression is modeling and coding

For example, an edge of an object
in an image may not carry much energy, but may be critical in its shape recognition. Another example is a stop consonant in speech, which may be insignificant energetically and broadband spectrally, but may be critical in speech recognition. The major questions in data compression include: (i) how to model the source data (e.g., through statistical or dictionary models, transforms, prediction), (ii) how to measure the redundancy (e.g., through low- or high-order entropies, which deal with precise knowledge), and (iii) how to encode the source data (through fixed or variable-length codes). On the other hand, the major questions in signal compression include: (i) how to model a linear time-invariant (LTI) signal or a scale-invariant (SI) signal, as described in Sec. 2.1 (i.e., how to find transforms, patterns, prediction, scalar and vector quantization, and analysis/synthesis), (ii) how to measure irrelevancy, and (iii) how to encode the source signal (e.g., through fixed or variable-length codes) (Sayood, 2000). Measuring irrelevancy can be done through feature maps, perceptual entropy (Jayant, Johnson, & Safranek, 1993), and relative multifractal dimension measures (Dansereau & Kinsner, 2001; and Dansereau, Kinsner, & Cevher, 2002), as well as through other models of uncertainty. These include: (i) possibilistic models to deal with vague and imprecise, but coherent knowledge (Dubois & Prade, 1988), (ii) Dempster-Shafer belief theory to deal with inaccurate and uncertain information, (iii) rough sets to establish the granularity of the information available, (iv) fuzzy sets to deal with membership functions, and (v) fuzzy perceptual measures. Another major question relates to how the source and channel are treated. Figure 3 shows a combined encoding and decoding scheme. A source coder is often followed by a channel coder, which adds redundancy for error protection, and a modem, which maximizes the bit rate that can be supported in a given channel or storage medium without causing an unacceptable level of bit error probability. This is of particular importance in wireless communications, in which the channel may change appreciably not only during a single transaction but over a session. Ideally, the entire process of source coding, channel coding, and modulation should be considered jointly to achieve the most resilient bit stream for transmission, as is often the case in modern joint source-channel coding. There may also be a considerable advantage to joint coding of text, image, video, and sound. This chapter addresses source coding only.
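As a concrete illustration of the two-step model/coder split of Figure 1 (a sketch, not from the chapter; the message and helper names are invented), a zeroth-order static model can be estimated from symbol counts and fed to a Huffman coder:

```python
import heapq
from collections import Counter

def huffman_code(message: str) -> dict:
    """Modelling step: symbol counts (a static zeroth-order model).
    Coding step: repeatedly merge the two least frequent nodes."""
    freqs = Counter(message)
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                            # degenerate one-symbol source
        (_, _, table), = heap
        return {sym: "0" for sym in table}
    while len(heap) > 1:
        n0, i0, t0 = heapq.heappop(heap)
        n1, i1, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (n0 + n1, min(i0, i1), merged))
    return heap[0][2]

msg = "in the beginning was the word"
code = huffman_code(msg)
bits = sum(len(code[s]) for s in msg)
print(f"{bits} bits coded vs {8 * len(msg)} bits uncoded")
```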
Figure 2. Reduction of redundancy and irrelevancy
Figure 3. Joint source-channel-multimedia coding
Another problem is due to the characteristics of packet switched networks. Specifying the characteristics of traffic in multimedia environments is more difficult than in circuit switched systems in which a fixed bandwidth channel is held for the duration of a call, and only the incidence of calls and their durations are required. Packet switched systems carrying multimedia have variable-bit rates with bandwidth on demand. This calls for knowledge not only of the statistics of the sources, but also of the rules for assembling the packets in order to control the traffic. Such metrics must be based on multi-scale singularity measures because the signals have long-term dependence.
Taxonomy of Compression Methods

Multimedia compression can be classified into lossless and lossy approaches, based on the distinctive features of the materials, as described in the next section. The lossless approach includes five methods: (i) run-length encoding, (ii) statistical encoding, (iii) dictionary encoding, (iv) adaptive encoding, and (v) transform-based encoding. The lossy approach includes transform-based encoding and quantization encoding. A comprehensive taxonomy of the techniques, together with extensive reference material, is provided by Kinsner (2002); Sayood (2000); Kinsner (1998); and Kinsner (1991).
Models of Data, Signals and Complexity

Models of Data and Signals

The objective of source coding (compression) is a compact digital representation of the source information. Often, the receiver of data is a computer, while the receiver of signals is a human. The above definition of compression requires a distinction between data and signals. Digital data are defined as a collection (a bag) of arbitrary finite-state representations of source information, with no concept of temporal or spatial separation between the elements of the bag, and no concept of the origin or destination of the bag. (Notice that in bag theory, elements of a bag may be equal, while elements of a set must be different.) Examples of data could include an intercepted encrypted stream of bits (without a known beginning or end), a financial file, or a computer program. As a consequence, if nothing is known about the nature of the source or destination, compression can only be done losslessly, i.e., without any loss of information, as measured through redundancy (entropy difference),
with the data modelled either statistically, or through a dictionary, or a transform such as prediction. The coder could then use either fixed or variable-length codes. A signal, on the other hand, is a function of independent variables such as time, distance, temperature, and pressure. The value of the function is called its amplitude, and the variation of the amplitude forms its waveform. The waveform can be either (i) unchanging (DC), (ii) periodic, such as alternating (AC) or oscillating, (iii) aperiodic, (iv) chaotic, or (v) random (stochastic). The signals can be either (i) analog (continuous with infinite resolution), (ii) discrete (sampled in time or space, but still with infinite resolution), (iii) digital (discrete and quantized to a specific resolution), or (iv) boxcar (continuous, piecewise constant with step displacements, as formed after a digital-to-analog converter). We are mostly concerned with digital signals in this chapter. The signals can be classified as linear time-invariant, LTI (additive invariance), or scale-invariant, SI (multiplicative invariance). The LTI system theory is based on the idea that periodic waveforms shifted by multiples of the period are the same (e.g., Oppenheim & Schafer, 1975; Oppenheim & Willsky, 1983; Oppenheim & Schafer, 1989; and Mitra, 1998). This also applies to stationary and cyclostationary signals in the sense that their statistics do not change (i.e., either the wide-sense stationarity, WSS, in which the first two moments must not change, or the strict-sense stationarity, SSS, where none of the moments may change). Fourier (spectral) and wavelet (spectral and scale) transforms may be applied to such signals in order to extract appropriate features. On the other hand, scale-invariant (fractal) signals are fundamentally different from LTI signals (Wornell, 1996). Their short-scale and long-scale behaviours are similar (i.e., they have no characteristic scale). Such self-similar signals (i.e., signals with one scale for time and amplitude) or self-affine signals (different scales for time and amplitude) must be processed differently, because well-separated samples in the signal may be correlated strongly. Unlike the LTI signals (whose Gaussian distributions have very short tails), the SI signals have power-law distributions with long tails, and their higher-order moments do not vanish. Consequently, detection, estimation, identification, feature extraction, and classification of fractal signals are all different from those of LTI signals. Most physical signals are not LTI. Examples of such signals include speech, audio, image, animation and video, telecommunications traffic signals, biomedical signals such as the electrocardiogram (ecg) and electromyogram (emg), sonar, radar, seismic waves, turbulent flow, resistance fluctuations, noise in electronic devices, frequency variations of atomic clocks, and time series such as stock market and employment data. They are often highly non-Gaussian and nonstationary, and in general have a complex and intractable (broadband) power spectrum. To emphasize this important point, Figure 4 shows the two classes, LTI and SI, of systems and signals. Many dynamical systems produce signals that are chaotic, i.e., deterministic yet unpredictable in the long term (e.g., Kinsner, 1996; Peitgen, Jürgens, & Saupe, 1992; Sprott, 2003; Kantz & Schreiber, 1997; and Schroeder, 1991).
Since such a signal has more attributes than a self-affine signal, more information can be extracted from it if one can show that the measured signal is indeed chaotic. We must also remember that the common assumption that both the LTI and SI signals originate from (and are processed by) systems that do not change in time and space can rarely be assured, because both artifacts (such as electronic and mechanical systems) and living organisms age and change with the environment.
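For illustration only (the function names and parameters below are invented; spectral synthesis is one standard way to generate self-affine test signals), the distinction between SI signals and LTI-friendly signals can be made tangible by synthesizing 1/f^β noise and estimating its spectral slope:

```python
import numpy as np

rng = np.random.default_rng(0)

def fractal_noise(n: int, beta: float) -> np.ndarray:
    """Spectral synthesis of 1/f^beta (self-affine) noise."""
    freqs = np.fft.rfftfreq(n, d=1.0)
    amp = np.zeros_like(freqs)
    amp[1:] = freqs[1:] ** (-beta / 2.0)          # power spectrum ~ f^(-beta)
    phases = rng.uniform(0, 2 * np.pi, len(freqs))
    return np.fft.irfft(amp * np.exp(1j * phases), n)

def spectral_slope(x: np.ndarray) -> float:
    """Least-squares slope of log-power vs log-frequency."""
    f = np.fft.rfftfreq(len(x), d=1.0)[1:]
    p = np.abs(np.fft.rfft(x))[1:] ** 2
    return np.polyfit(np.log(f), np.log(p), 1)[0]

x = fractal_noise(2 ** 14, beta=1.8)              # fractal (SI) signal
print(spectral_slope(x))                          # ~ -1.8: no characteristic scale
g = rng.normal(size=2 ** 14)                      # white Gaussian noise
print(spectral_slope(g))                          # ~ 0: flat spectrum
```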
Figure 4. LTI and SI systems and signals
An added complication in processing such signals is that the human receiver does not employ a mean-squarederror criterion to judge the quality of the reconstructed signal (Jayant, 1992). Instead, humans use a perceptual distortion criterion to measure source entropy. This leads to two approaches to source compression: lossless and lossy, with the latter involving characteristic (relevant) features related to the HVS and PAS. The relevancy is measured through feature maps and perceptual entropy (Jayant, Johnson, & Safranek, 1993). The signal is modelled through either transforms, patterns, or analysis/synthesis processes. As it was with data, the coder may use either fixed or variable-length codes.
The EMO and Other World Views

We have seen that simple redundant patterns can be removed from messages quite easily through many contextual (non-probabilistic) techniques such as run-length encoding (Sayood, 2000). More complicated patterns based on the spread of probabilities in the pmf of the source can lead to lossless techniques such as Huffman and Shannon-Fano coding (Held, 1987). A transform-based technique such as JPEG produces higher compression ratios based on the concentration of energy in a few coefficients in the transform (discrete cosine) domain (Pennebaker & Mitchell, 1993). The consideration of the psycho-acoustic model in audio has resulted in MP3 (MPEG-1 Layer 3) compression (ISO/IEC 11172-3, 1993). On the other hand, perceptual and cognitive signal processing requires techniques based on features related to perception and cognition that go beyond simple morphological or probabilistic patterns. To enhance perception and cognition, information and knowledge must be considered. Wang (2002) postulated an E-M-I model of the CI world view, where E, M, and I denote energy, matter, and information, respectively. The E and M components are located in the physical world, while the I component is placed in an abstract world, as shown in Figure 5. A similar IME world view was discussed by Stonier (Stonier, 1990, Ch. 3), with the major difference that the information (I) was considered by Stonier to be an integral part of the physical world. Still another approach to a CI world view is to develop an ontology for the structure in the knowledge base of an expert system (e.g., as described by Chan, 2002). We propose another CI world view in which organization (complexity, or pattern, or order, O) is an integral part of the physical world that also includes the E and M components, as shown in Figure 6. The argument for treating order as an integral part of the physical world is as follows. Order can be found in both M and E when the system is far from its thermodynamic equilibrium. In Newtonian physics, space and time were given once and for all, with perfect reversibility, and time was common to all observers. In relativity, space and time were no longer fixed, but "the distinction between the past, present and future was an illusion," according to Einstein. On the other hand, irreversibility, or Eddington's thermodynamic arrow of time (e.g., Mackey, 1992; and Hawkins, 1996), is fundamental in Boltzmann's thermodynamic evolution of an isolated (Hamiltonian) system, from order to disorder, towards its equilibrium at which entropy is maximum. Nonequilibrium is the source of order; it brings "order out of chaos" [Prigogine & Stengers, 1984, p. 287]. Irreversibility is the source of order; it brings "order out of chaos" [Prigogine & Stengers, 1984, p. 292].

Figure 5. Wang's I-M-E world view with matter (M), energy (E) and information (I)

Figure 6. The EMO world view that includes complexity with energy (E), matter (M) and order (O)

Far-from-equilibrium self-organization in open systems leads
to their increased complexity. This also leads to the existential time arrow (duration) as introduced by Henri Bergson (1859-1941) (Bergson, 1960) which could also play an important role in CI. This complexity can be described in a number of distinct ways: by information, entropy, dimensionality spectra (Rényi), and singularity spectra (Hölder and Mandelbrot). Cognitive processes are also being linked to dynamical systems (e.g., Thelen & Smith, 2002; and Mainzer, 2004). In this view, information and the other measures are just descriptors of the fundamental natural entity, complexity. Figure 6 also illustrates the incompleteness of any view on reality. There are two objective worlds: the physical world and the abstract world. The third is the perceptual world, as formed by the intersection of the physical and abstract worlds. Within this world, order has always been seen by human observers, though time and matter were comprehended just centuries ago, while energy was comprehended even later, and only then the relationship between E and M was established. Today, much is known about the relation between all three elements (e.g., Prigogine & Stengers, 1984; Turcotte, 1997; Vicsek, 1992; Kadanoff, 1993; Alligood, Sauer, & Yorke, 1996; and Mainzer, 2004). The diagram also illustrates that a part of the physical world is not known yet (e.g., the dark matter and dark energy in the Universe), and that a part of the abstract world transcends the physical world.
Objective and Subjective Metrics

There are three basic classes of performance evaluation of compression algorithms and their implementations: (i) efficiency metrics (e.g., compression ratio, percentage, bit rates), (ii) complexity metrics (processing cost, memory size, and chip size), and (iii) delay metrics (to evaluate delays due to the processor used and networking). There are also three classes of metrics that relate to the quality of reconstruction: (i) difference distortion metrics (signal-to-noise ratio, SNR, and its variations), (ii) perceptual quality metrics (mean opinion score, MOS, segmented SNR), and (iii) recognizability metrics (relative and absolute). The first three classes are always required to evaluate the process of compressing the source and its transmission over a network. The other three classes relate to the evaluation of the fidelity of the reconstructed signal with respect to the human observer. Of course, lossless compression assures the quality of the reconstruction to less than one bit per pixel. On the other hand, lossy compression requires perceptual quality metrics to establish how accurate the reconstructed sound, image, or video is to a human user. The recognizability metrics are concerned with the preservation of the intended message in the reconstructed signal, without any reference to the source, thus being an absolute subjective measure. In speech, this metric is called intelligibility. The confusion matrix is another recognizability metric; the test is non-binary in that, in addition to the correct utterance, other confusing utterances are also scored. These metrics are summarized by Kinsner (2002). Since many of these objective metrics are based on energy (e.g., MSE, and peak SNR), and energy itself does not carry information, they do not agree with the subjective quality metrics. For example, whispering or shouting
of a speech utterance differs much in its energy, although the message itself is unaltered significantly. Formants of the utterance and their transitions in time carry much more information than their energy. Fricatives also convey more information than would be implied by their energy. Much effort is being directed towards perceptual coding of digital audio (Painter & Spanias, 1998) and digital image and video (e.g., Farrell & Van Den Branden Lambrecht, 2002; and Tekalp, 1998), with corresponding developments in multidimensional quality metrics. Our focus has been on multifractal complexity measures to determine both the local and global complexities of the signal, using the Rényi fractal dimension spectrum, Mandelbrot singularity spectrum (Kinsner, 1994), and the generalized Kullback-Leibler distance (e.g., Kinsner & Dansereau, 2006; Dansereau, Kinsner, & Cevher, 2002; and Cover & Thomas, 1991).
Symbols, Alphabets, Messages, Probability and Information

Since the non-energy-based metrics are related to the concepts of information and entropy, the next three sections describe them critically in order to delineate their advantages and limitations from the perspective of CI. Information, regardless of its definition, will be considered in this chapter as a measure of complexity.
Symbols and Alphabets

A symbol σj is defined as a unique entity in a set. There is no limitation on the form that the symbol can take. For example, in a specific natural language, it could be a letter or a punctuation mark (e.g., a, A, α, ℵ, a Braille symbol, or a sign in the American Sign Language). In a specific number system, it could be a digit (e.g., unary {1}, binary {0, 1}, octal {0, 1, ..., 7}, hexadecimal {0, 1, ..., F}, Mayan nearly-vigesimal {•, —} corresponding to {1, 5}, or Babylonian base-60 with two symbols corresponding to {1, 10}). Other universal symbols (morphs) have been designed to form either an arbitrary font, or iconic languages (e.g., Chinese), or music notation, or chemical expressions. A symbol may also be a pixel (either binary, or gray scale, or colour). Another example of a symbol is the phoneme, defined as the elementary indecomposable sound in speech. A set of such unique symbols forms an alphabet. We shall consider several distinct alphabets relevant to compression. A source alphabet, Σ, is a set of symbols that the source uses to generate a message. It is denoted by

$$\Sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_N\} \qquad (1)$$

where N is the cardinality (size) of Σ, and is denoted by

$$N = |\Sigma| \qquad (2)$$
It should be clear from the context of this chapter that this notation does not represent an absolute value. It should also be noticed that each symbol is independent from any other symbol in Σ. This independence of symbols could lead to a message whose symbols are arranged in either a random or correlated pattern, depending on the probability mass function discussed in the next section. For transmission and storage, each symbol σj must be encoded with other symbols from a coding alphabet, Γc, denoted by

$$\Gamma_c = \{\gamma_{c1}, \gamma_{c2}, \ldots, \gamma_{cb}\} \qquad (3)$$
where the cardinality b = | Γc | gives the base of the number system from which the digits γcj are drawn. This is also the base of the logarithm used in all the subsequent calculations. For example, the binary coding alphabet is Γc = {0, 1} with b = 2. The encoded symbols γj corresponding to the source symbol σj form the code alphabet, Γ, denoted by
$$\Gamma = \{\gamma_1, \gamma_2, \ldots, \gamma_N\} \qquad (4)$$
Its cardinality usually matches the cardinality of the source alphabet. There are also other alphabets and dictionaries used in the formation of compact messages, but they are outside the scope of this chapter.
Strings and Messages

A string sj is a collection of symbols σj (a bag, in the bag theory) that is larger than any individual symbol, but smaller than a message M. For example, the string "the" in English could be coded as a unit, and not as three separate symbols, thus resulting in a more compact representation of the string. A bag of all the symbols and strings forms a message M denoted by

$$\mathbf{M} \equiv M[\sigma_1, \sigma_2, \ldots, \sigma_M] \qquad (5)$$
where M = | M | is the size of the message, and the symbol ≡ denotes equivalence. Notice that this vectorial notation [•] allows σi = σj for i ≠ j, while the set notation {•} would preclude equality of its elements.
Probability

A Priori Definition

The definition of probability used in this chapter is in the context of the formation of a message, as defined by Ralph Hartley (1888-1970) (Hartley, 1928) and Claude Shannon (1916-2001) (Shannon, 1948). Let us consider a process of successive selection of symbols σj (according to some probability p(σj) ≡ pj for that symbol) from a given source alphabet Σ of size N to form a message M containing M symbols. In this scheme of generating the message, the probabilities pj for all the symbols must be given in advance. This collection of known symbol probabilities forms the a priori probability mass function (pmf), P, denoted by

$$P \equiv P[p(\sigma_1), p(\sigma_2), \ldots, p(\sigma_N)] \qquad (6)$$
Since the pmf is a bag, the vectorial notation [•] is used again. Notice that the name pmf implies a discrete distribution, and distinguishes it from a continuous probability density function (pdf). Also notice that the selection of a symbol can be called an event. Finally, notice that the symbols can be substituted with strings of symbols, sj. We must distinguish between two fundamentally different probability distributions in the pmf: uniform and nonuniform. The uniform distribution is selected if nothing is known about the symbols in the message to be formed. As we shall see, this will lead to the longest possible (worst-case) message. If the symbols in a message form associations and patterns, the distribution is nonuniform, thus leading to shorter messages. If the symbols are independent, then the two distributions are also called the independent-identically-distributed (iid) pmf and independent-nonuniformly-distributed (ind) pmf. We shall see that the iid pmf produces messages whose elements are uncorrelated (memoryless) and have the maximum entropy, while the ind pmf produces messages whose elements are still uncorrelated but shorter and with a lower entropy.
A Posteriori Definition

If the message M has been formed, transmitted and received, the pmf can be estimated directly from M. If the symbol σj occurs nj times in the message of size M = |M|, then the relative frequency of occurrence of this symbol is defined as

$$f(\sigma_j) \triangleq \frac{n_j}{M} \quad [\text{dimensionless}] \qquad (7)$$
where the symbol ≜ denotes equality by definition. With this definition, the following conditions are satisfied:

$$0 \le f(\sigma_j) \le 1, \quad \forall \sigma_j \qquad (8)$$

and

$$\sum_{j=1}^{N} f(\sigma_j) = 1 \qquad (9)$$
where N is the size of the alphabet. If the message is ergodic, then the frequency of occurrence f(σj) becomes the a posteriori probability p(σj) for a symbol σj:

$$p(\sigma_j) \equiv f(\sigma_j) \qquad (10)$$
and their complete collection forms the a posteriori pmf.
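As a small illustration (a sketch, not part of the chapter; the message is invented), the a posteriori pmf of Eqs. 7-10 can be estimated directly from a received message:

```python
from collections import Counter

def a_posteriori_pmf(message: str) -> dict:
    """Estimate p(sigma_j) = n_j / M from a received message (Eqs. 7-10)."""
    counts = Counter(message)
    total = len(message)
    return {sym: n / total for sym, n in counts.items()}

pmf = a_posteriori_pmf("cognitive informatics")
assert abs(sum(pmf.values()) - 1.0) < 1e-12       # Eq. (9): frequencies sum to 1
print(pmf["i"])                                   # relative frequency of 'i'
```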
Conditional and Joint Probabilities

The above symbol selection process assumes no dependence of one symbol on any other symbol in the message. This is true when there is no pattern in the message (a random message). However, patterns may imply dependence between either individual symbols or even groups of symbols. This can be measured by a conditional probability that symbol σj occurs, given that symbol σi has occurred. This can be expressed as

$$p(\sigma_j \mid \sigma_i) \triangleq \frac{p(\sigma_i \sigma_j)}{p(\sigma_i)} \qquad (11)$$

where p(σiσj) is called the joint probability of a digram σiσj (i.e., the probability that both σi and σj occur). The scaling by p(σi) assures that the conditional probability of the sample space equals 1 again. This concept of digrams can be expanded to k-grams if the dependence (memory) exists between k symbols. When the symbols are independent, the joint probability is the product of the probabilities of the individual symbols:

$$p(\sigma_i, \sigma_j) = p(\sigma_i)\, p(\sigma_j) \qquad (12)$$
In this case, the message is called memoryless, or the 0th-order Markovian.
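A minimal sketch (with an invented example message) of estimating the joint and conditional digram probabilities of Eqs. 11-12 from a message:

```python
from collections import Counter

def digram_model(message: str):
    """Estimate joint p(a,b) and conditional p(b|a) for adjacent symbol pairs."""
    pairs = Counter(zip(message, message[1:]))
    total_pairs = len(message) - 1
    firsts = Counter(message[:-1])                # conditioning symbol occurrences
    joint = {ab: n / total_pairs for ab, n in pairs.items()}
    cond = {(a, b): n / firsts[a] for (a, b), n in pairs.items()}
    return joint, cond

joint, cond = digram_model("abababac")
print(cond[("a", "b")])                           # p('b' | 'a') = 3/4
```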
Shannon’s Self-Information For such a memoryless source, the Shannon self-information Ij of the jth event is defined as
I (σ j ) ≡ I j log b
1 = − log b p j pj
[information unit, or u]
(13)
where pj ≡ p(σj) for brevity, and b is the size of the coding alphabet Γc required to code each symbol. Since each symbol probability is confined to the unit interval, pj ∈ [0, 1], the self-information is always non-negative, Ij ∈ [0, ∞]. For a binary coding alphabet Γc = {0, 1}, b = 2 and u ≡ bit (binary digit), while for the natural base b = e, u ≡ nat (natural digit), and for b = 10, u ≡ Hartley. For simplicity, we shall assume the binary coding alphabet. This gives a clear basis for the interpretation of Shannon self-information: it is the number of bits required to represent a symbol.
If the probability of a symbol is 1, it requires no bits, as it is a tautology. When the probability of a symbol drops, the number of bits required increases. This statement could also be rephrased as "information that is surprising (improbable, news) is more informative." For example, the probabilities of the frequent letters E and T in English are p(E) = 0.13 and p(T) = 0.09, respectively, while the less frequent letter Q has a probability of p(Q) = 0.0025. Consequently, the letters require I(E) = -log2(0.13) = 2.94 bits, I(T) = 3.47 bits, and I(Q) = 8.64 bits. Of course, the numbers of bits used in any simple practical code would have to be the integers 3, 4, and 9, respectively. In general, the number of information units λj required to encode a symbol σj, whose probability is pj, can be computed from

$$\lambda_j \triangleq \lceil I_j \rceil = \lceil -\log_b p_j \rceil \qquad (14)$$
where ⌈x⌉ is the ceiling function that produces the closest integer greater than or equal to x. This encoded symbol with λj information units is called a codeword. This strategy has been employed in many codes. For example, the Shannon-Fano codes for E, T, and Q are 000, 001, and 111111110, while the slightly better Huffman codes for the letters are 000, 0010, and 1111110, respectively (Kinsner, 1991). Another example is the Morse code used in telegraphy, in which the letter E requires a single short sound DIT and the letter T a single long sound DAH, while the less frequent Q requires four sounds DAH DAH DIT DAH. Such variable-length codes always reduce the number of bits in a message with respect to a code that uses the same number of bits per symbol regardless of their frequency of use in a specific class of messages. What is the computational application of this definition of Shannon's information? As we have seen, it leads to more compact messages through efficient coding of symbols, and it allows one to calculate the total number of bits in any message to be generated. It should be clear, however, that this definition of information is divorced from all subjective factors, such as meaning (context), common-sense understanding, and perception or cognition. It just means more bits for a lower-probability symbol. This is the main source of difficulties in connecting this definition with subjective performance metrics.
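The letter example above can be reproduced in a few lines (a sketch; the probabilities are those quoted in the text):

```python
from math import ceil, log2

def self_information(p: float) -> float:
    """Shannon self-information in bits (Eq. 13)."""
    return -log2(p)

for letter, p in [("E", 0.13), ("T", 0.09), ("Q", 0.0025)]:
    bits = self_information(p)
    print(f"I({letter}) = {bits:.2f} bits -> codeword length {ceil(bits)}")
# I(E) = 2.94 -> 3, I(T) = 3.47 -> 4, I(Q) = 8.64 -> 9, matching the text
```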
Conditional Self-Information

Following the reasoning behind the definitions of conditional and joint probabilities for messages with inter-symbol dependence (memory), we define the conditional self-information as

$$I(\sigma_j \mid \sigma_i) \equiv I_{j \mid i} \triangleq \log_b \frac{1}{p(\sigma_j \mid \sigma_i)} = -\log_b p_{j \mid i} \quad [\text{information unit, or } u] \qquad (15)$$

and the joint self-information as

$$I(\sigma_i \sigma_j) \equiv I_{ij} \triangleq \log_b \frac{1}{p(\sigma_i \sigma_j)} = -\log_b p_{ij} \quad [\text{information unit, or } u] \qquad (16)$$

As before, for M independent events, the joint self-information is

$$I_{1 \ldots M} = \sum_{j=1}^{M} I_j \qquad (17)$$
This definition of conditional self-information shortens the number of bits per symbol for digrams and, when expanded further, for k-grams.
Entropies of Alphabets and Messages

There are many definitions of entropy, as summarized at the end of the next section. We shall first define it based on Shannon's self-information, followed by a review of other definitions of entropy and distortion entropies in Sec. 6.
Shannon’s Source Etropy and Redundancy While self-information describes the length of a single symbol in terms of information units, thus providing the length of the entire message containing M symbols, entropy gives the average information, regardless of the message size. It is then defined as the average (expected) value of self-information N
H ∑ p (σ j ) I (σ j ) j =1
(18)
N
= −∑ p (σ j ) log b p (σ j ) j =1 N
≡ −∑ p ( j ) log b p ( j ) j =1 N
≡ −∑ p j log b p j [u/symbol] j =1
where N is the size of the source alphabet Σ = {σ1, σ2, ..., σN} and p(σj) ≡ p(j) ≡ pj is the probability of the jth symbol taken from the corresponding pmf P = [p1, p2, ..., pN]. The expression is related to the Boltzmann entropy (but with the opposite sign) and the Boltzmann-Gibbs entropy (with the same sign), as described in Secs. 6.6.3 and 6.6.4, respectively. This entropy function H(P) is non-negative and concave in P (Cover & Thomas, 1991). This is also called the 1st-order entropy, denoted by H(1), because the expression uses a single value of the probability in both the self-information and the weight. The parentheses are used in the subscript to differentiate this notation from the Hq notation in the Rényi entropy, as discussed later. We often use another subscript to emphasize the order of the Markov chain model of the message itself. For example, the 1st-order entropy for a memoryless message with a nonuniform pmf is denoted by H(1,0), while the 1st-order entropy for a memoryless message with a uniform pmf is denoted by H(1,-1). This special case can be expressed as

$$H_{(1,-1)} = H_{max} = -\sum_{j=1}^{N} \frac{1}{N} \log_b \frac{1}{N} = \log_b N \qquad (19)$$
It is very important because it defines the redundancy HR:

$$H_R(A) = H_{max}(A) - H(A) \qquad (20)$$
where A represents any alphabet (either source or code), Hmax(A) represents the maximum possible entropy for an iid distribution, and H(A) is the actual entropy for the given alphabet A. If HR(A) is removed from the message, no loss of information occurs. This defines lossless compression.
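As an illustration (a sketch, not from the chapter; the pmfs are invented), Eqs. 18-20 can be computed directly from a pmf:

```python
from math import log2

def entropy(pmf) -> float:
    """First-order Shannon entropy in bits per symbol (Eq. 18)."""
    return -sum(p * log2(p) for p in pmf if p > 0)

def redundancy(pmf) -> float:
    """H_R = H_max - H (Eq. 20), with H_max = log2(N) (Eq. 19)."""
    return log2(len(pmf)) - entropy(pmf)

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform), redundancy(uniform))   # 2.0 bits, 0.0: incompressible
print(entropy(skewed), redundancy(skewed))     # ~1.36 bits, ~0.64: compressible
```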
Shannon’s Code Entropy If each individual symbol has a codeword that has an integer number of bits, λj, then the source entropy H(Σ) may be different from the code entropy H(Γ). The code entropy is defined as the weighted sum of the self-information of the individual codewords
$$H(\Gamma) \triangleq \sum_{j=1}^{N} p_j \lambda_j \quad [u/\text{symbol}] \qquad (21)$$

Notice that, since Ij ≤ λj,

$$H(\Sigma) \le H(\Gamma) \qquad (22)$$
When the equality in (22) is reached, then the code is called perfect in the information theoretic sense. For example, the arithmetic code (which does not require an integer number of bits per symbol) is closer to the perfect code than the Huffman code (Sayood, 2000).
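A small sketch (with an invented three-symbol alphabet) showing a case where Eq. 22 holds with equality, i.e., a perfect code:

```python
from math import log2

def source_entropy(pmf) -> float:
    return -sum(p * log2(p) for p in pmf.values())

def code_entropy(pmf, lengths) -> float:
    """Average codeword length, Eq. (21): H(Gamma) = sum of p_j * lambda_j."""
    return sum(pmf[s] * lengths[s] for s in pmf)

pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
lengths = {"a": 1, "b": 2, "c": 2}              # e.g. a -> 0, b -> 10, c -> 11
print(source_entropy(pmf))                      # 1.5 bits/symbol
print(code_entropy(pmf, lengths))               # 1.5: a perfect code (Eq. 22 tight)
```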
Higher-Order Message Entropy

For independent symbols, the message M is of the 0th order, and its entropy equals the source entropy, H(M) = H(Σ). If encoded, then the following relation must hold: H(M) ≤ H(Γ). However, if the message is of the 1st order (i.e., it has memory of one symbol), then the message entropy must be of the 2nd order, as denoted by

$$H_{(2,1)}(M) \triangleq -\sum_{i=1}^{N} \sum_{j=1}^{N} p(i, j) \log_b p(j \mid i) \quad [u/\text{symbol}] \qquad (23)$$
where the p(i,j) and p(j | i) are the joint and conditional probabilities, respectively. This can be generalized to any higher order entropy H(k+1,k) for messages of higher-order k (Sayood, 2000).
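For illustration (the messages are invented), the 2nd-order entropy of Eq. 23 can be estimated from digram statistics:

```python
from collections import Counter
from math import log2

def second_order_entropy(message: str) -> float:
    """H(2,1): digram-based entropy, Eq. (23), in bits per symbol."""
    pairs = Counter(zip(message, message[1:]))
    total = len(message) - 1
    firsts = Counter(message[:-1])
    h = 0.0
    for (a, b), n in pairs.items():
        p_joint = n / total                       # p(i, j)
        p_cond = n / firsts[a]                    # p(j | i)
        h -= p_joint * log2(p_cond)
    return h

print(second_order_entropy("abababab"))         # 0.0: fully predictable digrams
print(second_order_entropy("aabbaabaabbb"))     # > 0: residual uncertainty
```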
Entropies of Distortion

In lossless compression, the original message M and the reconstructed message M* are the same, and the measures discussed so far are sufficient for their comparison. In lossy compression, the reconstructed message may be different from M, thus leading to distortion and a different reconstruction alphabet Σ*. The distortion can be measured through distortion entropies such as conditional, mutual, and relative (Cover & Thomas, 1991, and Kinsner, 1998).

Figure 7. Venn diagram illustration of joint entropy, H(X,Y), conditional entropy, H(X|Y) and H(Y|X), and mutual entropy, H(X;Y)
In order to avoid cumbersome notation, we shall denote the original message as X ≡ M, and the reconstructed message as Y ≡ M*, with the corresponding source and reconstruction alphabets denoted by X = {x1, x2, ..., xN} and Y = {y1, y2, ..., yL}, and their cardinalities of N and L, respectively. Notice that N and L do not have to be equal. We also assume that the entropy of each message equals the entropy of its alphabet.
Joint Entropy, H(X,Y)

The joint entropy H(X,Y) of two discrete random variables X and Y is fundamental to the definition of the conditional and other entropies. It is defined as

$$H(X, Y) \triangleq -\sum_{i=1}^{N} \sum_{j=1}^{L} p(x_i, y_j) \log_b p(x_i, y_j) \qquad (24)$$

where N and L are the cardinalities of X and Y, respectively, and p(x,y) is the joint pmf. This joint entropy can be illustrated by the Venn diagram shown in Figure 7. It can be seen from the diagram in Figure 7 that (for proof, see Cover & Thomas, 1991, p. 28)

$$H(X, Y) \le H(X) + H(Y) \qquad (25a)$$

and

$$H(X, Y) = H(Y, X) \qquad (25b)$$
Conditional Entropy, H(Y|X) and H(X|Y) The conditional entropy H(Y|X) that the reconstruction message Y has occurred, given that the source message X has occurred, is defined as the average conditional self-information I(y|x)

$$H(Y \mid X) \triangleq \sum_{x \in \mathcal{X}} p(x)\, I(Y \mid X = x) = -\sum_{i=1}^{N} \sum_{j=1}^{L} p(x_i, y_j)\, \log_b p(y_j \mid x_i) \qquad (26)$$
Similarly,

$$H(X \mid Y) \triangleq -\sum_{j=1}^{L} \sum_{i=1}^{N} p(x_i, y_j)\, \log_b p(x_i \mid y_j) \qquad (27)$$
This conditional entropy is illustrated in Figure 7. It can be seen that (Cover & Thomas, 1991, p. 27)

$$H(Y \mid X) \le H(Y) \qquad (28a)$$
$$H(X \mid Y) \le H(X) \qquad (28b)$$

and, in general,

$$H(X \mid Y) \neq H(Y \mid X) \qquad (29)$$
It can also be shown that (Sayood, 1996, Example 7.4.2)
$$H(Y \mid X) = H(Y, X) - H(X) \qquad (30a)$$
$$H(X \mid Y) = H(X, Y) - H(Y) \qquad (30b)$$
Mutual Entropy, H(X;Y) The mutual entropy H(X;Y) of the source message X and the reconstruction message Y is defined as the average mutual self-information I(x;y)

$$H(X;Y) \triangleq \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y)\, I(x; y) \qquad (31)$$

where

$$I(x_i; y_j) = \log_b \frac{p(x_i \mid y_j)}{p(x_i)} = \log_b \frac{p(x_i, y_j)}{p(x_i)\, p(y_j)} \qquad (32)$$
It can be seen from Figure 7 that, since the mutual entropy is common to both the source and the reconstruction, it could be used to make the reconstruction look like the source when H(X;Y) reaches its maximum value. When H(X;Y) = 0, the source and reconstruction are totally different. This feature has made mutual entropy a prominent player in many areas of signal processing. It can also be shown that

$$H(X; Y) = H(X) - H(X \mid Y) \qquad (33a)$$
$$H(Y; X) = H(Y) - H(Y \mid X) \qquad (33b)$$

and

$$H(X; Y) = H(X) + H(Y) - H(X, Y) \qquad (33c)$$
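The definitions (24), (26)-(27), and the identities (30) and (33) are straightforward to verify numerically. The sketch below (an illustrative Python example with a hypothetical joint pmf; it assumes strictly positive marginals) computes the joint, conditional, and mutual entropies from a joint pmf matrix.

```python
import numpy as np

def entropies(P):
    """Joint, conditional, and mutual entropies (bits) from a joint pmf
    matrix P[i, j] = p(x_i, y_j); Eqs. (24), (30a)-(30b), (33c).
    Assumes strictly positive marginal pmfs."""
    px, py = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    H_XY = -np.sum(P[nz] * np.log2(P[nz]))   # joint entropy, Eq. (24)
    H_X = -np.sum(px * np.log2(px))
    H_Y = -np.sum(py * np.log2(py))
    H_Y_given_X = H_XY - H_X                 # chain rule, Eq. (30a)
    H_X_given_Y = H_XY - H_Y                 # chain rule, Eq. (30b)
    I_XY = H_X + H_Y - H_XY                  # mutual entropy, Eq. (33c)
    return H_XY, H_X_given_Y, H_Y_given_X, I_XY

P = np.array([[0.25, 0.25],                  # hypothetical joint pmf
              [0.00, 0.50]])
print(entropies(P))  # approx (1.5, 0.689, 0.5, 0.311) bits
```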
Relative Entropy, H(X||Y) In this chapter, the most important distortion-related entropy is the relative entropy denoted by H(X||Y). If we assume that both the source alphabet $\mathcal{X}$ and the reconstruction alphabet $\mathcal{Y}$ have the same cardinality N, then the relative entropy can be written as

$$H(X \,\|\, Y) \triangleq \sum_{j=1}^{N} p(x_j)\, \log_b \frac{p(x_j)}{p(y_j)} \qquad (34)$$
This value is positive if the pmfs of the two alphabets are not equal, and zero if and only if P(X) = P(Y). The relative entropy is also called the Kullback-Leibler divergence (distance), as it measures the dissimilarity between two alphabets of the same cardinality. This property makes it suitable for perceptual quality metrics (Dansereau & Kinsner, 2001, 2006).
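A direct implementation of (34) is equally short. In the following sketch (ours; the pmfs are hypothetical), the divergence is positive when the two pmfs differ and zero when they coincide, as stated above; the implementation also assumes absolute continuity, i.e., p(y_j) > 0 wherever p(x_j) > 0.

```python
from math import log2

def kl_divergence(P, Q):
    """Relative entropy H(X||Y) of Eq. (34), in bits; assumes alphabets of
    the same cardinality and q_j > 0 wherever p_j > 0."""
    return sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))   # > 0: the pmfs differ
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))   # 0.0: identical pmfs
```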
Rényi Entropy Spectrum, H q Shannon’s 1st-order and higher-order entropies provide a measure of the average information for either the source or the reconstruction or both, and are of great importance in data and signal transmission, storage, and signal processing. In 1955, Alfréd Rényi (1921-1970; Erdős number 1) introduced a generalized entropy, Hq, that could discern the spread of probabilities in the pmf. For a source message M with its source alphabet Σ of cardinality N and its corresponding pmf, P, the Rényi entropy spectrum is given by
$$H_q(P) = \frac{1}{1-q}\, \log_b \sum_{j=1}^{N} p_j^{\,q}, \quad -\infty \le q \le \infty \qquad (35)$$
where q is the moment order. For q = 0, the Rényi entropy becomes the maximum (capacity) entropy H(1,–1), also known as the morphological entropy (Kinsner, 1996, 2005)

$$H_0(P) = H_{\max} = \log_b N \quad \text{[u/symbol]} \qquad (36)$$
For q = 1, it can be shown that it reduces to the Shannon entropy H(1,0), also known as the information entropy (Kinsner, 1996)

$$H_1(P) = -\sum_{j=1}^{N} p_j \log_b p_j \qquad (37)$$
For q = 2, it becomes the correlation entropy (Kinsner, 1994)

$$H_2(P) = -\log_b \sum_{j=1}^{N} p_j^{\,2} \qquad (38)$$
For q = ±∞, it becomes the Chebyshev entropy (Kinsner, 1996), with the extreme values of the probability defining the following two extreme values

$$H_{\infty}(P) = -\log_b p_{\max} \qquad (39a)$$
$$H_{-\infty}(P) = -\log_b p_{\min} \qquad (39b)$$
Since $p_{\min} \le p_{\max}$, then $|\log_b p_{\max}| \le |\log_b p_{\min}|$, and the entropy spectrum has upper and lower bounds. It can be shown that $H_q$ is a monotonically nonincreasing function of q, and it becomes constant only for an iid (uniform) pmf. Since the spread of this “inverted S” curve in Figure 8 depends on the spread of probabilities in the pmf, the curve can be used as a measure of the differences (distortion) between the source pmf, X, and the corresponding reconstructed pmf, Y, as shown in Figure 8. Based on these measures, a suitable cost function can then be established for rate-distortion minimization.
Figure 8. Rényi entropy spectrum for a source X and its reconstruction Y messages
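The spectrum (35)-(39) is simple to compute for a given pmf. The Python sketch below (ours; the pmf is hypothetical) evaluates H_q over a range of moment orders, substituting the Shannon entropy (37) at q = 1, where (35) is defined only as a limit; the printed values decrease monotonically with q, as stated above.

```python
import numpy as np

def renyi_spectrum(pmf, qs, b=2.0):
    """Rényi entropy H_q of Eq. (35) for a pmf; the Shannon entropy of
    Eq. (37) is substituted at q = 1, where (35) holds only as a limit."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]
    out = []
    for q in qs:
        if np.isclose(q, 1.0):
            out.append(float(-np.sum(p * np.log(p)) / np.log(b)))             # Eq. (37)
        else:
            out.append(float(np.log(np.sum(p ** q)) / ((1 - q) * np.log(b)))) # Eq. (35)
    return out

pmf = [0.6, 0.2, 0.1, 0.1]                        # hypothetical pmf
print(renyi_spectrum(pmf, [-4, -1, 0, 1, 2, 4]))  # nonincreasing in q; H_0 = 2
```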
This entropy spectrum can also be used as a detector of stationarity of a signal; i.e., while a stationary signal produces a constant curve over time or space, a nonstationary signal produces a varying spectrum trajectory. The major advantages of this approach over the direct study of the pmfs include: (i) the pmfs can be of different cardinalities, (ii) this entropy spectrum Hq can be used in multiscale analysis to establish the fractal dimension spectrum Dq (Kinsner, 1996, 2005), and (iii) Dq can then be used to extract the Mandelbrot singularity spectrum (Kinsner, 1996, 2005). We have applied both Hq and Dq in the study of multifractals in dielectric discharges, transient signal analysis, fingerprint compression, speech segmentation into phonemes, image and video compression, biomedical (ECG and EMG) segmentation and classification, DNA sequencing, and cryptography.
Other Entropies The Shannon and Rényi entropies relate to the Boltzmann-Gibbs entropy concept in which a probability function, W, determines the direction towards disorder: since a closed system tends to thermodynamical disorder, the entropy increases with increasing W. Since self-information was defined in the same direction, a random message carries more self-information than a legible message. Clearly, such self-information runs opposite to the conventional perceptual and cognitive notion of information. Several alternative approaches to defining entropy and information will be summarized. We shall start from the Kolmogorov and Kolmogorov-Sinai entropies that provide a fundamental alternative to the Shannon entropy as they do not involve probabilities, with the latter describing dynamic rather than static systems. It is followed by Prigogine’s entropy for open self-organizing systems. For completeness, the Boltzmann, Gibbs, Schrödinger, and Stonier entropies will also be highlighted. There are still other entropies (e.g., fuzzy entropy) that are not treated in this chapter.
Kolmogorov Entropy (Complexity) In 1965, Andrei N. Kolmogorov (1903-87) introduced an alternative algorithmic (descriptive) complexity measure $K_U(X)$ of a message X as the shortest length of a binary program P that can be interpreted and halted on a universal computer U (such as the Turing machine), and that describes the message completely, without any reference to the pmf. The entropy is given by

$$K_U(X) = \min_{P:\, U(P) = X} |P| \quad \text{[bits]} \qquad (40)$$
Since the expected value of this Kolmogorov complexity measure of a random message is close to Shannon’s entropy, this concept can be considered more fundamental than the entropy concept itself (Cover & Thomas, 1991, Ch. 7).
Kolmogorov-Sinai Entropy In dynamical systems, the Kolmogorov-Sinai (KS) entropy HKS is a measure of information loss per iteration in maps for which the iteration count n is an integer, n ∈ Z (or per unit of time in flows for which time t is continuous, t ∈ R) in m-dimensional (mD) phase space (Kinsner, 2003a). Thus, the KS entropy can be used to characterize chaos in an mD phase space (Atmanspacher & Scheingraber, 1987). For example, while nonchaotic systems have HKS = 0, chaotic systems have HKS > 0, and uncorrelated noise has HKS = ∞ (Kinsner, 2003c). There are several schemes to compute the KS entropy (Kinsner, 2003b). If a dynamical system has several positive Lyapunov exponents, the following Ruelle inequality holds for most dynamical systems (Ruelle, 1978; Grassberger & Procaccia, 1983)

$$H_{KS}(\lambda) \le \sum_{j=1}^{J} \lambda_j \qquad (41)$$
where J is the index of the smallest positive Lyapunov exponent. Pesin (1977) has shown that the inequality also holds for flows. Thus, Lyapunov exponents provide a good estimate of the KS entropy, without any reference to the source statistics, because the Lyapunov exponents can be calculated directly from the trajectories of the corresponding strange attractor. This is important because accurate estimates of the entropy from the process statistics would require a very large number of data points in a time series (Williams, 1997, Ch. 26). The significance of the KS entropy is that it extends the static probabilistic Shannon entropy measure to dynamical systems which are deterministic and dynamic in that they provide a continuous supply of new information during their evolution in chaos. We propose that this single KS entropy could also be generalized to $H_q^{KS}$ with moment orders q ∈ R, similarly to the generalization of the single Shannon entropy, as discussed in Sec. 6.5.
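As a numerical illustration of the link between Lyapunov exponents and the KS entropy in (41), the following Python sketch (ours; the logistic map is a standard textbook example, not one used in this chapter) estimates the single Lyapunov exponent of the logistic map at r = 4 as the orbit average of ln|f'(x)|; the result, approximately ln 2 ≈ 0.693 nats per iteration, is positive, indicating chaos.

```python
import math

def lyapunov_logistic(r=4.0, x0=0.3, n=100_000, burn=1_000):
    """Estimate the Lyapunov exponent of the logistic map x -> r x (1 - x)
    as the orbit average of ln|f'(x)| = ln|r(1 - 2x)|, in nats/iteration.
    For this 1-D map, the Ruelle inequality (41) relates it to H_KS."""
    x = x0
    for _ in range(burn):                 # discard the transient
        x = r * x * (1 - x)
    acc = 0.0
    for _ in range(n):
        acc += math.log(abs(r * (1 - 2 * x)))
        x = r * x * (1 - x)
    return acc / n

print(lyapunov_logistic())   # ~0.693 = ln 2 > 0, i.e., the orbit is chaotic
```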
Prigogine Entropy For years, Ilya Prigogine (1917-2003) had been developing ideas related to dynamical systems and complexity, with emphasis on far-from-equilibrium self-organization. He described three forms of thermodynamics: (i) thermostatics (i.e., systems in equilibrium at which nothing special can happen because any perturbation is ignored by the system due to the Gibbs’ minimum free energy principle), (ii) linear thermodynamics (near-equilibrium, also governed by the minimum principle), and (iii) far-from-equilibrium thermodynamics (Prigogine & Stengers, 1984; Prigogine, 1996). The latter form is the most interesting, as it includes both inflows and outflows of energy, matter and entropy (organization) between the open system and its environment. This exchange can be written as

$$dS_P = dS_C + dS_E \qquad (42)$$
where SP denotes Prigogine’s entropy which consists of the internal (Clausius) entropy SC and the exchange entropy SE. Since for irreversible systems, dSC > 0, the Prigogine entropy dSP depends on the new component which can now be either (i) dSE > 0 (nothing special), or (ii) dSE = 0 (an isolated system at equilibrium), or (iii) dSE < 0 (negentropy, or provision of order). If |dSC| < |dSE| then dSP < 0. This negentropy indicates self-organization which can occur in the far-from-equilibrium state because the system does not have to conform to any minimum principle. This entropy appears to be critical in future studies of measures for CI.
Clausius Entropy In 1820, Sadi Carnot (1796-1832) formulated the first law of thermodynamics (that energy cannot be created or destroyed) in the context of the maximum efficiency that a steam engine could achieve. In 1865, Rudolf Clausius (1822-88) proposed the following definition of the entropy function $S_C$

$$dS_C = \left.\frac{\delta Q}{T}\right|_R \qquad (43)$$
where $dS_C$ denotes the exact differential (i.e., one whose integral is independent of the configuration path selected), while δQ is an inexact differential of thermal energy Q (as its integral depends on the path selected), T is the absolute temperature in K, and the subscript R denotes that the expression is valid for reversible processes only, close to thermal equilibrium at a macroscopic scale. He also expanded this expression to irreversible systems for which the entropy increases, $dS_C > 0$, and by introducing this physical evolution, he defined the second law of thermodynamics (that a flow of heat from a colder to a hotter body is impossible), and coined the word entropy from the Greek word (τροπη) for “transformation” or “evolution.” Clausius also made the following famous categorical statements: (1) “The energy of the universe is constant”, and (2) “The entropy of the universe tends to a maximum.” These statements apply to an abstract closed universe only.
Boltzmann Entropy In 1898, following Carnot and Clausius, Ludwig Boltzmann (1844-1906) expanded this fundamental concept of thermodynamic entropy $S_T$, as given by

$$S_T = k \log_b W \qquad (44)$$
where k is the Boltzmann constant (1.3807×10^-23 J/K or 3.2983×10^-24 cal/K), b = e, and W is the thermodynamic function such that when the disorder of the system increases, W increases with it, thus increasing $S_T$. He defined entropy in terms of a macrostate determined by a large number of microstates. For example, let us consider that the macrostate is determined by a set of 16 non-overlapping coins distributed in a 2D space, and that the microstate is formed by each coin lying either face up or face down. The number of the most unlikely scenarios of the organized macrostate (in which all the coins are either face up or face down) is W(pmin) = 1. The most likely scenario of the disorganized macrostate is that half of the coins are up and the other half are down (or vice versa), which gives W(pmax) = C(16,8) = 12,870. Thus, ST(pmin) < ST(pmax). Since W is represented by the natural numbers, starting from 1, ST is non-negative. Since any ordered closed system tends to a disordered state at its equilibrium ST*, the disordered state is more probable than an ordered state, thus leading to the second law of thermodynamics. Observe that if W is reformulated in terms of a probability function, and the sequence of macrostates is substituted by time t, then –∞ < ST(t0) ≤ ST(t) ≤ 0 for all times t0 < t, regardless of the initial system preparation, where t0 is the initial time. In either case, the entropy difference between t and t0 is positive. Work is required to organize a system. The present research interest is in open systems that are far from this equilibrium. Notice that although Boltzmann did not deal with information explicitly, the concept of “degree of disorder” is related to it.
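The 16-coin example is easy to verify. The short sketch below (illustrative only) reproduces the microstate counts and the corresponding Boltzmann entropies in units of k (i.e., S_T/k = ln W, taking b = e in (44)).

```python
from math import comb, log

# The 16-coin macrostate example: W counts the microstates of a macrostate,
# and S_T / k = ln W by Eq. (44) with b = e.
W_min = 1              # fully ordered: all 16 coins face up (or all face down)
W_max = comb(16, 8)    # maximally disordered: 8 coins up and 8 down
print(W_max)                    # 12870
print(log(W_min), log(W_max))   # 0.0 < 9.46..., so S_T(p_min) < S_T(p_max)
```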
Boltzmann-Gibbs Entropy In 1902, J. Willard Gibbs (1839-1903) formalized Boltzmann’s entropy within a measure space (consisting of a phase space X, a σ-algebra, and a measure µ (Mackey, 1992)), and formulated the thermodynamic entropy in terms of densities f on an ensemble to deal with the very large numbers of particles in a volume. An ensemble is a set of small subsystems that are configured identically, each with a finite number of particles. The entropy can be written as

$$H_T(f) = -\int_X f(x)\, \log f(x)\, dx \qquad (45)$$
which is the expected value of $-\log f$ (for the continuous case). Notice that the sign is the opposite of the original Boltzmann’s $S_T$. Again, Gibbs did not deal with information explicitly. He also formulated the concept of free energy, which is the difference between the total energy and the unavailable energy (lost in the processes). This leads to the concept of the quality of energy sources, and may also be useful in CI.
Schrödinger Negentropy In 1944, Erwin Schrödinger (1887-1961) introduced the concept of negative entropy (negentropy) to stress the organization of living systems (Schrödinger, 1944). He started from Boltzmann’s formulation

$$S_S = k \log_b D_S \qquad (46)$$

where $D_S$ is similar to W in (44). Since living organisms have the tendency to maintain a low level of entropy by “feeding upon negative entropy” (i.e., taking orderliness from their environment), he expressed it as
$$-S = k \log_b \frac{1}{D_S} \qquad (47)$$
Again, Schrödinger did not deal with information directly. Later, the expression was also pursued by Brillouin (1964) who considered W to be a measure of uncertainty.
Stonier Entropy The Schrödinger entropy was further developed by many others, including Tom Stonier (1927-99) (Stonier, 1990). He considered

$$O_S = \frac{1}{D_S} \qquad (48)$$
in (46) as a measure of an ordered system, and defined information as

$$I = f(O_S) \qquad (49)$$

or

$$I = c\, e^{-S/k} \qquad (50)$$
where S is the Schrödinger entropy, k is Boltzmann’s constant, and c is an information constant of a system at zero entropy. This formulation of information is totally different from Shannon’s and Rényi’s in that an ordered (legible) message M1 now carries more information than a more random string M2.
Summary and Discussion The main objective of this chapter was to provide a review of self-information and entropy as they might be used in measuring the quality of reconstruction in data and signal compression for multimedia. Another objective was to introduce alternative definitions of entropy that do not require the source or reconstruction statistics. Still another objective was to describe an entropy capable of measuring dynamic information content, as can be found in chaotic dynamical systems. This chapter is an extension of the data and signal compression techniques and metrics described by Kinsner (2002). We have defined data as bags of symbols (or strings) whose origin and destination are not known. Any transformation of the data must be lossless in the sense that no information is lost. On the other hand, signals are bags of symbols (or strings) with known origin and destination. Such data or signals can form finite messages. In cognitive informatics, we are concerned with the transformation of signals to enhance their characteristic features for perception, cognition and learning. The transformations can be lossy, as long as the distortion between the reconstruction and the source does not impede the key objective of the maximal transfer of information through the signals used. We have also distinguished between two fundamentally different classes of signals: linear time-invariant (LTI) and scale-invariant (SI). Many new metrics can be found for the SI signals that are not available for the LTI signals. This chapter has reviewed a number of different forms of Shannon self-information and entropy. The self-information of a symbol is defined as a function of its probability, and is measured in information units such as bits. Entropy is defined as the average (expected) self-information, which can be interpreted as the average number of information units per symbol, regardless of the size of the message. Since the Shannon self-information and entropy both have the same root, their interpretation relates to the Boltzmann entropy. Consequently, Shannon self-information had to be divorced from any cognitive meaning.
The single kth-order Shannon entropy of messages with different memories (according to Markov-chain models) is useful in developing perfect codes in the information-theoretic sense, but does not deal with the spread of probabilities in the source or destination alphabet. To solve the problem, we discussed the Rényi generalized entropy spectrum, Hq, which provides a bounded representation of the signal. This functional (or vectorial) representation could be used to determine the distortion between a source, Hq(X), and its reconstruction, Hq(Y), no longer in terms of scalars, but in terms of vectors. The difference between Hq(X) and Hq(Y) could then be used to establish a cost function in order to achieve an optimal perceptual quality of the reconstruction. This single-scale Rényi entropy spectrum, however, has a serious limitation when dealing with self-similar or self-affine signals, which are scale-invariant. For such signals, the analysis must be done at different scales to discover any power-law relationship that might be present in the signal, and if present, a spectrum of fractal dimensions could be computed (Kinsner, 1996). The significance of this Rényi fractal dimension spectrum is that it can characterize strange attractors that are often multifractal. Furthermore, since images or temporal signals can be considered as strange attractors of iterated function systems (Barnsley, 1988), the Rényi fractal dimension spectrum can be used to characterize such signals. We have demonstrated elsewhere that this approach can lead to even better perceptual metrics (Dansereau & Kinsner, 2001; Kinsner & Dansereau, 2006). Other definitions of entropies have also been presented in this chapter. For example, the Kolmogorov entropy generalizes Shannon’s entropy, as it does not refer to the pmf at all. The Kolmogorov-Sinai entropy also extends Shannon’s entropy, as it can deal with systems that create new information during their evolution. Such metrics could be applicable to learning processes in CI. Although there are many definitions of entropy, the core idea that makes entropy so important in the probabilistic and algorithmic information theories is that it describes disorder and order of a message. This order is critical to CI. Many contemporary quality metrics still have a major difficulty with measuring perceptual quality because they are based on the error energy between the source and the reconstruction, while the human visual system and the psychoacoustic system involve not only energy, but many other factors such as singularities. On the other hand, entropy-based measures are more suitable for quality metrics, as they describe the disorder of the source and reconstruction. A suitable cost function could then be designed to maximize the perceptual quality of the reconstruction at the lowest possible bit rate. Since it is most unlikely that a single cost function could apply to all multimedia materials, it should use adaptation and learning to match both the properties of the material and the specific needs of a user. Thus, the question posed in this chapter has an affirmative answer: although the entropy-based measures are useful in characterizing data and signals, and in establishing the perceptual quality of their reconstructions objectively, they should be used only in conjunction with other complementary concepts, such as the various multiscale singularity measures that could be developed from the entropy-based measures described in this chapter.
In fact, such measures are described by Kinsner (2005) and Kinsner & Dansereau (2006). The fundamental reason for multiscale entropy-based measures being more suitable for quality metrics than various energy-based measures is that the former describe the complexity of the source and reconstruction. The complexity is related not only to the structure and context of the message, but also to the singularity distribution in the message over multiple scales. This property is essential in perceptual, cognitive and conscious processes. Thus, such entropy-based multiscale metrics differ fundamentally from any other measures in classical information theory. This is described in more detail by the unified approach to fractal dimensions (Kinsner, 2005), and is illustrated by the explicit examples of perceptual quality metrics through relative multiscale entropy-based measures, as described by Kinsner & Dansereau (2006). However, since measuring the content (meaning) and value (utility) of a message to a single user and to multiple users requires not only the static multiscale entropy-based measures, as described here, but also measures of their relative dynamics, this problem will be covered in our future work.
Acknowledgment This work was supported partially by a grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada.
References
Alligood, K.T., Sauer, T.D., & Yorke, J.A. (1996). Chaos: An introduction to dynamical systems (p. 603). New York, NY: Springer Verlag.
Atmanspacher, H., & Scheingraber, H. (1987). A fundamental link between system theory and statistical mechanics. Foundations of Physics, 17, 939-963.
Barnsley, M. (1988). Fractals everywhere (p. 396). Boston, MA: Academic.
Bergson, H. (1960). Time and free will: An essay on the immediate data of consciousness. New York, NY: Harper Torchbooks (Original edition 1889, translated by F.L. Pogson).
Brillouin, L. (1964). Scientific uncertainty and information. New York, NY: Academic.
Chan, C.W. (2002, August). Cognitive informatics: A knowledge engineering perspective. In Proceedings of the 1st IEEE International Conference on Cognitive Informatics (pp. 19-20, 49-56). Calgary, AB. {ISBN 0-7695-1724-2}
Cover, T.M., & Thomas, J.A. (1991). Elements of information theory (p. 542). New York, NY: Wiley.
Dansereau, R.M., Kinsner, W., & Cevher, V. (2002, May 12-15). Wavelet packet best basis search using Rényi generalized entropy. In Proceedings of the IEEE 2002 Canadian Conference on Electrical & Computer Engineering, CCECE02, 2, 1005-1008. Winnipeg, MB. ISBN: 0-7803-7514-9.
Dansereau, R., & Kinsner, W. (2001, May 7-11). New relative multifractal dimension measures. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, ICASSP2001, 1741-1744. Salt Lake City, UT.
Dubois, D., & Prade, H. (1988). Possibility theory: An approach to computerized processing of uncertainty (p. 263). New York, NY: Plenum.
Farrell, J.E., & Van Den Branden Lambrecht, C.J. (eds.) (2002, January). Translating human vision research into engineering technology [Special Issue]. Proceedings of the IEEE, 90(1).
Grassberger, P., & Procaccia, I. (1983, January 31). Characterization of strange attractors. Physical Review Letters, 50(5), 346-349.
Hawking, S. (1996). The illustrated A brief history of time (2nd ed.) (p. 248). New York, NY: Bantam.
Hartley, R.V.L. (1928). Transmission of information. Bell System Technical Journal, 7, 535-563.
Held, G. (1987). Data compression: Techniques and applications, hardware and software considerations (2nd ed.) (p. 206). New York, NY: Wiley.
ISO/IEC 11172-3 (1993). Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbits/s - Part 3: Audio.
Jayant, N. (1992, June). Signal compression: Technology targets and research directions. IEEE Journal on Selected Areas in Communications, 10, 796-818.
Jayant, N. (ed.) (1997). Signal compression: Coding of speech, audio, text, image and video (p. 231). Singapore: World Scientific.
Jayant, N.S., Johnson, J.D., & Safranek, R.S. (1993, October). Signal compression based on models of human perception. Proceedings of the IEEE, 81(10), 1385-1422.
Kadanoff, L.P. (1993). From order to chaos: Essays (p. 555). Singapore: World Scientific.
Kantz, H., & Schreiber, T. (1997). Nonlinear time series analysis (p. 304). Cambridge, UK: Cambridge University Press.
Kinsner, W. (1991). Review of data compression methods, including Shannon-Fano, Huffman, arithmetic, Storer, Lempel-Ziv-Welch, fractal, neural network, and wavelet algorithms. Technical Report DEL91-1 (p. 157). Winnipeg, MB, Canada: Dept. Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (1994). Fractal dimensions: Morphological, entropy, spectrum, and variance classes. Technical Report DEL94-4 (p. 146). Winnipeg, MB, Canada: Dept. Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (1996). Fractal and chaos engineering: Postgraduate lecture notes (p. 760). Winnipeg, MB, Canada: Dept. Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (1998). Signal and data compression: Postgraduate lecture notes (p. 642). Winnipeg, MB, Canada: Dept. Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (2002, August 19-20). Compression and its metrics for multimedia. In Proceedings of the 1st IEEE International Conference on Cognitive Informatics (pp. 107-121). Calgary, AB. {ISBN 0-7695-1724-2}
Kinsner, W. (2003a). Characterizing chaos with Lyapunov exponents and Kolmogorov-Sinai entropy. Technical Report DEL03-1 (p. 76). Winnipeg, MB, Canada: Dept. Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (2003b, August 18-20). Characterizing chaos through Lyapunov metrics. In Proceedings of the 2nd IEEE International Conference on Cognitive Informatics (pp. 189-201). London, UK. ISBN 0-7695-1986-5.
Kinsner, W. (2003c). Is it noise or chaos? Technical Report DEL03-2 (p. 98). Winnipeg, MB, Canada: Dept. Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (2005, August 8-10). A unified approach to fractal dimensions. In Proceedings of the 4th IEEE International Conference on Cognitive Informatics (pp. 58-72). Irvine, CA. ISBN 0-7803-9136-5.
Kinsner, W., & Dansereau, R. (2006, July 17-19). A relative fractal dimension spectrum as a complexity measure. In Proceedings of the 5th IEEE International Conference on Cognitive Informatics. Beijing, China. ISBN 1-4244-0475-4.
Mackey, M.C. (1992). Time’s arrow: The origin of thermodynamic behavior (p. 175). New York, NY: Springer Verlag.
Mainzer, K. (2004). Thinking in complexity (4th ed.) (p. 456). New York, NY: Springer Verlag.
Mitra, S.K. (1998). Digital signal processing: A computer-based approach (p. 864). New York: McGraw-Hill (MatLab Series).
Oppenheim, A.V., & Schafer, R.W. (1975). Digital signal processing (p. 585). Englewood Cliffs, NJ: Prentice-Hall.
Oppenheim, A.V., & Willsky, A.S. (1983). Signals and systems (p. 796). Englewood Cliffs, NJ: Prentice-Hall.
Oppenheim, A.V., & Schafer, R.W. (1989). Discrete-time signal processing (p. 879). Englewood Cliffs, NJ: Prentice-Hall.
Painter, T., & Spanias, A. (1998, April). Perceptual coding of digital audio. Proceedings of the IEEE, 88(4), 451-513.
Peitgen, H.-O., Jürgens, H., & Saupe, D. (1992). Chaos and fractals: New frontiers of science (p. 984). New York: Springer Verlag.
Pennebaker, W.B., & Mitchell, J.L. (1993). JPEG still image data compression standard (p. 638). New York, NY: Van Nostrand Reinhold.
Pesin, Y.B. (1977). Characteristic Lyapunov exponents and smooth ergodic theory. Russian Mathematical Surveys, 32, 55-114.
Prigogine, I., & Stengers, I. (1984). Order out of chaos: Man’s new dialogue with nature (p. 349). New York, NY: Bantam.
Prigogine, I. (1996). The end of certainty: Time, chaos, and the new laws of nature (p. 228). New York, NY: The Free Press.
Ruelle, D. (1978). Thermodynamic formalism (p. 183). Reading, MA: Addison-Wesley-Longman and Cambridge, UK: Cambridge University Press.
Sayood, K. (2000). Introduction to data compression (2nd ed.) (p. 636). San Francisco, CA: Morgan Kaufmann.
Schroeder, M.R. (1991). Fractals, chaos, power laws (p. 429). New York, NY: W.H. Freeman.
Schrödinger, E. (1944). What is Life? with Mind and Matter and Autobiographical Sketches (p. 184). Cambridge, UK: Cambridge University Press. {ISBN 0-521-42708-8 pbk; Reprinted 2002}
Shannon, C.E. (1948, July). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423. Reprinted in Shannon, C.E. & Weaver, W. (1949). The mathematical theory of communication. Urbana, IL: University of Illinois Press.
Sprott, J.C. (2003). Chaos and time-series analysis (p. 507). Oxford, UK: Oxford University Press.
Stonier, T. (1990). Information and the internal structure of the universe: An exploration into information physics (p. 155). New York, NY: Springer Verlag.
Tekalp, A.M. (ed.) (1998, May). Multimedia signal processing [Special Issue]. Proceedings of the IEEE, 86(5).
Thelen, E., & Smith, L.B. (2002). A dynamic systems approach to the development of cognition and action (5th pr.) (p. 376). Cambridge, MA: MIT Press.
Turcotte, D.L. (1997). Fractals and chaos in geology and geophysics (2nd ed.) (p. 398). Cambridge, UK: Cambridge University Press.
Vicsek, T. (1992). Fractal growth phenomena (2nd ed.) (p. 488). Singapore: World Scientific.
Wang, Y. (2002, August 19-20). On cognitive informatics. In Proceedings of the 1st IEEE International Conference on Cognitive Informatics (pp. 34-42). Calgary, AB. {ISBN 0-7695-1724-2}
Wornell, G.W. (1996). Signal processing with fractals: A wavelet-based approach (p. 177). Upper Saddle River, NJ: Prentice-Hall.
Williams, G.P. (1997). Chaos theory tamed (p. 499). Washington, DC: Joseph Henry Press.
Chapter III
Cognitive Processes by using Finite State Machines
Ismael Rodríguez, Universidad Complutense de Madrid, Spain
Manuel Núñez, Universidad Complutense de Madrid, Spain
Fernando Rubio, Universidad Complutense de Madrid, Spain
Abstract Finite State Machines (FSM) are formalisms that have been used for decades to describe the behavior of systems. They can also provide an intelligent agent with a suitable formalism for describing its own beliefs about the behavior of the world surrounding it. In fact, FSMs are the suitable acceptors for right linear languages, which are the simplest languages considered in Chomsky’s classification of languages. Since Chomsky proposes that the generation of language (and, indirectly, any mental process) can be expressed through a kind of formal language, it can be assumed that cognitive processes can be formulated by means of the formalisms that can express those languages. Hence, we will use FSMs as a suitable formalism for representing (simple) cognitive models. We present an algorithm that, given an observation of the environment, produces an FSM describing an environment behavior that is capable of producing that observation. Since an infinite number of different FSMs could have produced that observation, we have to choose the most feasible one. When a phenomenon can be explained with several theories, the Occam’s razor principle, which is basic in science, encourages choosing the simplest explanation. Applying this criterion to our problem, we choose the simplest (smallest) FSM that could have produced that observation. An algorithm is presented to solve this problem. In conclusion, our framework provides a cognitive model that is the most preferable theory for the observer, according to the Occam’s razor criterion.
INTRODUCTION Cognitive Informatics (Kinsner 2005, Wang 2002, 2003) provides Computer Science with a remarkable source of inspiration for solving computational problems where the objectives are similar to those performed by the human mind. In spite of the fact that computational environments have some specific requirements and constraints that must not be ignored, understanding our mind is usually the key to providing successful (particularized) intelligent systems. This cross-fertilization has yielded the development of some successful intelligence mechanisms such as neural networks (Lau 1991) and case-based reasoning algorithms (Schank and Abelson 1977). It is particularly relevant to note that the relationship between Computer Science and other mind-related sciences is two-way. In particular, the development of formal language theories (oriented to computational languages) has led to a better understanding of our mind. Due to the close relationship between language generation and mental processes, some mathematical formalisms proposed for dealing with formal computational languages turned out to be good approximations to model human reasoning. Following this line, the language theory developed by Noam Chomsky is especially relevant. He proposed that natural languages can be represented as formal languages (Chomsky 1957, 1965). Chomsky considered four categories of languages (from simpler to more complex: right linear, context-free, context-sensitive, and recursively enumerable) and he argued that natural languages are context-sensitive. All of these categories can be produced by a kind of suitable formal machine or acceptor (finite state automata, push-down automata, linear bounded automata, and Turing machines, respectively). Thus, the generation of natural languages can be represented in terms of some kind of formal automata, specifically linear bounded automata. This statement is especially relevant: since language is a projection of our cognitive processes, we can say that our own reasoning can be represented by using context-sensitive languages. Similarly, other less expressive languages (like right linear or context-free) may provide approximated and simpler models to represent human mental processes. The difficulty of using a formal language to represent reasoning in a computational environment has discouraged most researchers from exploring this trend. Paradoxically, the great expressivity of formal languages is its main handicap. For example, the beliefs/knowledge of an intelligent system cannot be internally represented by a recursively enumerable language (or its acceptor, a Turing machine), because there is no method to automatically construct the Turing machine that produces some given behavior. So, such an internal representation would be impossible to create and maintain. Nevertheless, in some domains, using the simplest languages according to Chomsky’s classification could provide us with formalisms endowed with a suitable structure and expressivity while being efficient to handle. In particular, let us note that right linear languages are a suitable formalism for representing the behavior of a wide range of entities and systems. Their acceptors, that is, Finite State Machines, have been used for decades to model the behavior of sequential digital circuits and communication protocols. Similarly, an intelligent entity can use an FSM to represent its belief about the behavior of the world that surrounds it.
As with any other knowledge representation, this model should be updated and maintained so that it provides, at any time, a feasible explanation of the events the agent has observed so far. If the model is accurate then the agent could use it to predict future situations. Hence, FSMs may be the basic formalism for knowledge representation in a learning system. In order to use an FSM to represent the knowledge of an intelligent agent, the agent must create an FSM that is consistent with all the observations and interactions it has performed so far with the environment. Once we have fixed the set of inputs and outputs that the agent will use to interact with its environment (that is, the set of operations an agent can produce to affect the environment and the actions the environment could produce in response, respectively), an environment observation is a sequence of pairs (input, output). Given such a historical trace, the agent will create an FSM describing a behavior that, in particular, produces that trace. There exists an infinite number of FSMs that may produce a given (finite) sequence of interactions between an agent and its environment, so we have to choose one of them. Any of these FSMs extrapolates infinite behaviors from a single finite behavior. Thus, our aim is to choose an FSM with the best predictive power. If several FSMs fit into some observation then no observational information provides us with a criterion to choose one of them. However, the Occam’s razor principle will give us a scientific criterion to choose one of them. This criterion says that, on equal plausibility, the simplest theory must be chosen. The application of this criterion to our problem will provide us with a scientific argument to choose the machine that has the minimal number of states (arguments for
applying this criterion in this case, and in Cognitive Informatics in general, will be extensively discussed in the next section). Since we assume that our capability to express and develop hypotheses matches the one provided by a specific model (in our case, FSMs), we will have that the simplest model (machine) that could have produced the observed events is actually the simplest hypothesis to explain these events. In this chapter, our objective is to create a learning algorithm based on this idea. The rest of this chapter is structured as follows. In Section 2 we will discuss our criterion to choose the best FSM that fits into an observation. This criterion will be based on the Occam’s razor principle. In Section 3 we present finite state machines, which are the basic formalism we will use throughout the chapter. In Section 4 we define folding operations (also called unifications), which are the basic operations we will apply to construct our minimal machines. Next, in Section 5 we present our algorithm to build the minimal finite state machine that could have produced a given observation. We apply that algorithm to the construction of a learning mechanism in Section 6. Finally, in Section 7 we present our conclusions.
APPLYING THE OCCAM’S RAZOR PRINCIPLE
A key aspect to understanding human learning processes is the preference of people to explain their observations through the simplest available theory. Let us consider the example depicted in Figure 1. Let us imagine a person who observes the inside of a room through the keyhole of the door, and let us suppose that he observes that, just at this moment, seven flies appear. As a consequence, he will think that the room is full of flies. Let us imagine another person who looks through a different keyhole of a different room, and he sees nothing. Then, he will think that the room is completely empty. Let us remark that both observers could be mistaken. In the first case, it could happen that there are only seven flies in the room, but that these flies love keyholes. In the second case, it could happen that the room is full of flies, but all of them are so shy that they keep away from the keyhole. However, the criteria of our observers are basically valid because, before more data are collected, they choose the simplest and most probable option.
Figure 1. Room with/without flies
This intuitive preference for simple things is usually known as the Occam’s razor criterion. William of Occam criticized the high complexity of the scholastic philosophical theories because their complexity did not improve their predictive power. His criticism can be stated as “Entities should not be multiplied beyond necessity” (Tornay 1938). We can interpret it by considering that, on equal plausibility, we should choose the simplest solution. This distinction criterion, which is one of the main scientific criteria of all times (typical examples of its applicability are Newton’s laws and Maxwell’s equations of electromagnetism), has been applied to develop computational mechanisms of Knowledge Discovery (Blumer et al. 1987). In fact, the application of the Occam’s razor to these systems is controversial (Domingos 1999). Actually, there exist theoretical arguments supporting it (the Bayesian information criterion (Schwarz 1978) and the minimum description length principle (Rissanen 1978)) and against it (the conservation law of generalization performance (Schwarz 1978) and the theory of structural
risk minimization (Vapnik 1995)). Similarly, there are empirical results that support it (the improvement of accuracy obtained by using pruning mechanisms (Mingers 1989)) while some others refuse its validity (in only some cases does concept simplification improve ID3 (Fisher and Schlimmer 1988)). It is worth pointing out that those who refuse the validity of the Occam’s razor principle usually accept its practical applicability in real-world domains (Rao et al. 1995). If we consider the applicability of the Occam’s razor in the context of Cognitive Informatics, we should ask ourselves whether this criterion is actually applied by the human mind. We conjecture that it actually is. We can illustrate it easily by considering erroneous reasonings of human beings. For instance, a child learning to speak will make linguistic errors such as saying “I eated a peach” instead of “I ate a peach” (even if he never heard the word “eated” before). This error comes from his intuitive use of the English grammar rules, which say that past verbal forms are created by adding the suffix -ed. That is, children’s minds try to apply the simplest theory to explain the observations they perceive, and the exceptions to the rules are what requires the highest learning effort. In fact, the child would never learn to speak if he did not seek the simplest rules that explain his environment (in this case, the linguistic rules). Therefore, the Occam’s razor criterion seems to be part of our own learning mechanism. Natural language is not an accidental example of the applicability of the Occam’s razor within human learning. Let us remark that language has been created as the result of the simultaneous interaction of a huge number of human minds over generations. So, the rules underlying it are actually a projection of our own mental processes. In that projection, the preference for simplification is clear: regular rules and patterns dramatically outnumber exceptions and irregularities. This property is especially relevant in our context since our application of formal languages for representing reasoning is based on the assumption that the generation of language and reasoning can be produced by a formal language. Our aim is to formally apply a criterion based on the Occam’s razor to obtain, in each case, the most plausible theory that explains a sequence of observations, where our abstraction model of reasoning generation will be based on Chomsky’s theory. This means finding the simplest model that could have generated the perceived observation. Specifically, since according to that theory cognitive models can always be modelled by using linear bounded automata, our ideal objective should be to find the simplest linear bounded automaton that could have generated the detected observations. In this regard, we could define the simplicity criterion in terms of the number of states or the number of transitions of the automaton (that is, we assume that the simplest model is the smallest model). However, as we argued before, using a very expressive language is not feasible in practice because of the difficulty or impossibility of creating and/or updating it automatically. Hence, as a first approach to this difficult problem, we will tackle the previous task in the context of right linear languages, which can be modelled by finite state machines and are the simplest languages according to the classification provided by Chomsky.
Therefore, our application of the Occam’s razor criterion to Chomsky’s language theory will consist of developing an algorithm capable of finding the simplest FSM that could have generated the detected observation. Specifically, in this approach we will use the number of states of the machine as the simplicity criterion. Let us remark that our objective is not to minimize the number of states of a given finite state machine (the classical minimization algorithm for finite automata can be found in (Huffman 1954)), but to create from scratch the FSM with the minimal number of states that could have generated the observation. In general, two machines that can generate a given observation will not be equivalent, because any behavior that is not explicitly included in the observation is not specified and can be anything. In fact, the problem of finding the minimal deterministic Mealy machine that can produce a given sequence of inputs/outputs was first identified in (Gold 1978), where it was found to be NP-hard. To the best of our knowledge, this is the first time this problem is used as the core of a cognitive learning method in the Cognitive Informatics field. The suitability of this application is based on the arguments commented on before. Besides, let us note that the solution to this problem given in this chapter is substantially different from the one given in (Gold 1978). While the method in (Gold 1978) basically consists of filling holes (that is, giving values to undefined transitions) in such a way that the minimal machine is pursued, we iteratively manipulate an initial FSM by introducing new loops (we call these operations folding operations, or just unifications) until the FSM is minimal. This enables an intuitive and efficient use of pruning in our branch and bound algorithm. Moreover, if the algorithm is terminated before completion, the partial output is actually a (suboptimal) FSM that can be taken as it is. On the contrary, the only output of the algorithm in (Gold 1978) is given upon termination.
FINITE STATE MACHINES
In this section we define the simple abstraction we will use as cognitive model. Basically, we will assume that theories explaining observations must be constructed in terms of a finite state machine. These machines can be represented in two main forms: Moore and Mealy machines. The difference between them concerns the treatment of output actions. Due to the clear separation between outputs and states, we will use Mealy machines in our framework.
Definition 1. A finite state machine (FSM) M is a tuple (S, I, O, T, s_in) where S is the set of states of M, I is the set of input actions of M, O is the set of output actions of M, s_in ∈ S is the initial state of M, and T ⊆ S × I × O × S is the set of transitions of M.
Intuitively, a transition t = (s, i, o, s') ∈ T represents that if M is in state s and receives an input i then M produces an output o and moves to state s'. Transitions (s, i, o, s') ∈ T will be simply denoted by $s \xrightarrow{i/o} s'$. In Figure 2 we show two FSMs. Let us consider M1. We have M1 = (S, I, O, T, 1) where S = {1,2,3,4,5,6}, I = {a,c,x,z}, and O = {b,d,y,w}. The transition set T includes all transitions linking states in M1. Thus,

$$T = \{\, 1 \xrightarrow{a/b} 2,\ 2 \xrightarrow{a/d} 2,\ 2 \xrightarrow{c/d} 3,\ 3 \xrightarrow{c/w} 2,\ 3 \xrightarrow{x/y} 4,\ 4 \xrightarrow{a/b} 5,\ 5 \xrightarrow{c/d} 6,\ 6 \xrightarrow{z/w} 1 \,\}$$
For the sake of simplicity, we will assume that our cognitive model concerns only deterministic finite state machines.
Definition 2. Let M = (S, I, O, T, s_in) be a finite state machine. We say that M is deterministic if for every state s ∈ S and input i ∈ I there do not exist transitions (s, i, o1, s1), (s, i, o2, s2) ∈ T with either o1 ≠ o2 or s1 ≠ s2, or both.
Figure 2. Examples of FSMs
The finite state machines shown in Figure 2 are deterministic. Let us note that if we did not constrain our cognitive models to be deterministic, then the problem of finding the minimal finite state machine that could have produced a sequence of inputs and outputs would be trivial. This is so because it would be enough to create a machine with a single state where each pair of input and output in the observation sequence is represented by a
transition outgoing from that state and incoming to the same state, labelled by that pair. Since nondeterministic FSMs may produce several outputs in response to an input, they are less suitable as cognitive models than deterministic FSMs. Nondeterministic machines do not provide any additional criterion to choose one of the available outputs after an input is produced. In forthcoming definitions, we will have to deal with sequences of transitions. In the next definition we introduce the notion of trace.
Definition 3. Let M = (S, I, O, T, s_in) be an FSM such that $s_1 \xrightarrow{i_1/o_1} s_2$, $s_2 \xrightarrow{i_2/o_2} s_3$, ..., $s_{n-1} \xrightarrow{i_{n-1}/o_{n-1}} s_n$, and $s_n \xrightarrow{i_n/o_n} s_{n+1}$. In this case we say that $\sigma = s_1 \xrightarrow{i_1/o_1} s_2 \xrightarrow{i_2/o_2} \cdots \xrightarrow{i_n/o_n} s_{n+1}$ is a trace of M.
In our framework, the interaction with the environment will be defined by means of traces. For instance, if the inputs a and b denote “drop a glass” and “take the glass with your hands”, respectively, and the outputs c and d denote “a glass falls and breaks” and “a broken glass pricks”, respectively, then for some states s1, s2, s3 the trace $s_1 \xrightarrow{a/c} s_2 \xrightarrow{b/d} s_3$ could probably be generated by a real environment. However, if we are not interested in the states involved in the trace, then we will use the simpler notion of observation sequence, which is basically a sequence of pairs of inputs and outputs. For instance, for the previous trace, (a/c, b/d) is an observation sequence.
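To make Definitions 1-3 concrete, here is a minimal Python sketch (ours, not the authors'; the encoding of M1 follows the transition set reconstructed above). Storing transitions in a dictionary keyed by (state, input) pairs enforces the determinism of Definition 2 by construction, and replaying a sequence of inputs yields the observation sequence of the induced trace.

```python
class FSM:
    """A Mealy machine M = (S, I, O, T, s_in) as in Definition 1; the
    transition table maps (state, input) -> (output, next_state), which
    makes the machine deterministic by construction (Definition 2)."""
    def __init__(self, transitions, s_in):
        self.T = dict(transitions)   # {(s, i): (o, s')}
        self.s_in = s_in

    def run(self, inputs):
        """Feed a sequence of inputs; return the observation sequence of
        (input, output) pairs along the induced trace (Definition 3)."""
        s, obs = self.s_in, []
        for i in inputs:
            o, s = self.T[(s, i)]
            obs.append((i, o))
        return obs

# Machine M1 of Figure 2, as reconstructed above.
M1 = FSM({(1,'a'):('b',2), (2,'a'):('d',2), (2,'c'):('d',3), (3,'c'):('w',2),
          (3,'x'):('y',4), (4,'a'):('b',5), (5,'c'):('d',6), (6,'z'):('w',1)},
         s_in=1)
print(M1.run("acxac"))  # [('a','b'), ('c','d'), ('x','y'), ('a','b'), ('c','d')]
```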
PERFORMING FOLDING OPERATIONS
In this section we define the basic operations we will use in our minimization algorithm. The learning algorithm we will present in this chapter, which finds the simplest finite state machine that could have produced a given observation, is based on the folding of traces. This technique consists of the iterative modification of a given finite state machine by creating cycles of states. By introducing new cycles, some states become unnecessary and can be removed. So, the total amount of states is reduced. In this process, newly created states become representative of two former states of the machine. To keep the needed information about the former states represented by a new single state, we need to extend our notion of finite state machine to attach that information. In the next definition we assume that P(X) represents the powerset of set X.
Definition 4. A folding machine is a tuple U = (M, S, f) where M = (S, I, O, T, s_in) is a finite state machine, S is a set of states called the set of original states of U, and the total function f : S → P(S) is the set function of U.
Intuitively, given a folding machine U = (M, S, f), the set S represents the set of original states in the former finite state machine from which M has been constructed. The mechanism of construction of M will be described later. Besides, the function f associates each state in M with the set of states of S that it represents. Each time two states of the former machine are unified into a single new state, the function f will be modified to include such information. In the next definition we provide the mechanism to perform that operation.
Definition 5. Let f : X → P(Y) be a total function. We define the addition of the set y ⊆ Y to the element x ∈ X, denoted by f ⊕ (x, y), as the total function g : X → P(Y) where
$$g(z) = \begin{cases} f(z) & \text{if } z \neq x \\ f(z) \cup y & \text{otherwise} \end{cases}$$

We extend that operation to sets of elements in (X, P(Y)) by overloading the symbol ⊕ in such a way that f ⊕ {(x1, y1), ..., (xn, yn)} = (((f ⊕ (x1, y1)) ⊕ ...) ⊕ (xn, yn)). Let us remark that there is no ambiguity in the definition of the operation ⊕ for sets, since the order of application of elements is irrelevant. Now we are ready to present the formal definition of the folding of traces, in which we introduce a new cycle in a finite state machine. Given two traces $\sigma = q \xrightarrow{i^q/o^q} s_1 \xrightarrow{i_1/o_1} s_2 \xrightarrow{i_2/o_2} \cdots \xrightarrow{i_n/o_n} s_{n+1} \xrightarrow{i^r/o^r} r$
and $\sigma' = q' \xrightarrow{i^{q'}/o^{q'}} s'_1 \xrightarrow{i_1/o_1} s'_2 \xrightarrow{i_2/o_2} \cdots \xrightarrow{i_n/o_n} s'_{n+1} \xrightarrow{i^{r'}/o^{r'}} r'$ of a machine M that produce the same sequence of inputs and outputs from state $s_1$ to $s_{n+1}$ and from $s'_1$ to $s'_{n+1}$, respectively, the goal of the folding is to remove the states $s'_1$ to $s'_{n+1}$ in M. In order to do so, the transition in σ' connecting q' to $s'_1$ has to be redirected to $s_1$. Besides, the transition of σ' that goes from $s'_{n+1}$ to r' has to be replaced by one departing from $s_{n+1}$. More generally, any transition departing or arriving at a state in $\{s'_1, ..., s'_{n+1}\}$ has to be redirected to/from the corresponding state in $\{s_1, ..., s_{n+1}\}$.
r'
in / on i1 / o1 i2 / o2 Definition 6. Let U=(M,S,f ) be a folding machine, where M=(S, I, O, T, sin). Let σ = s1 → s2 → ... → sn +1 in / on i1 / o1 i2 / o2 and σ ' = s '1 → s '2 → ... → s 'n +1 be two traces of M. The folding operation (also called unification) of the traces σ and σ’ in the folding machine U is a new folding machine U’=(M’,S,f’), with M’=(S’, I, O, T’, s’ in), where
(1) (2)
S ' = S \ {s '1 ,..., s 'n }, i/o T ' = (T \ {u → v {u , v} {s '1,..., s ' n + 1} ≠ ∅}) i/o i/o {u → s j u → s ' j ∈ T ∧ u ∉{s '1 ,..., s 'n +1}} i/o i/o {s j → u s ' j → u ∈ T ∧ u ∉{s '1 ,..., s 'n +1}}
(3) (4)
i/o i/o {s j → sk s ' j → s 'k ∈ T }, f ' = f ⊕ {( s1 , f ( s '1 )),...,( sn + 1, f ( s ' n + 1))}
sin if sin ∉{s '1 ,..., s 'n +1} s 'in = si if sin = s 'i
From now on, we will say that the location of a folding operation is the state where both unified traces diverge (that is, $s_{n+1}$ in the previous definition). As an example, let us consider the finite state machine M1 depicted in Figure 2, and let us suppose that we want to unify the traces $\sigma = 1 \xrightarrow{a/b} 2 \xrightarrow{c/d} 3$ and $\sigma' = 4 \xrightarrow{a/b} 5 \xrightarrow{c/d} 6$. That is, we want the resulting machine to perform both instances of the input/output sequence (a/b, c/d) through the same sequence of states, in this case 1, 2, and 3. Hence, the states 4, 5, 6 will be unified to 1, 2, 3, respectively. The resulting FSM M2 is also depicted in Figure 2. In this folding, the location is the state 3. Let us remark that the machine resulting after the folding is not equivalent, in general, to the one we had before. In particular, some sequences of inputs and outputs that are available from the initial state in the new machine are not available in the old one. Let us consider the traces we commented on just before Definition 6. Supposing that there is a path from state r to state q' (or from r' to q), a new cycle will be introduced (we assume the example we introduced before Definition 6). So, an infinite set of new available sequences of inputs and outputs will be introduced in the new machine. For instance, the sequence of inputs and outputs (a/b, c/d, x/y, a/b, c/d, x/y) can be executed from state 1 in the machine M2 depicted in Figure 2, but this trace is not available in the machine M1. On the other hand, let us note that no trace that was available before the folding will become unavailable afterwards. We formally present this idea in the next result.
Lemma 4.0.1. Let U, U' be folding machines such that U' represents the folding of traces σ, σ' in U. Let U = (M, S, f) with M = (S, I, O, T, s_in) and let U' = (M', S, f') with M' = (S', I, O, T', s'_in). Then, for every trace $s_1 \xrightarrow{i_1/o_1} s_2, ..., s_n \xrightarrow{i_n/o_n} s_{n+1}$ in M there exist states $s'_2, ..., s'_{n+1}$ such that $s'_1 \xrightarrow{i_1/o_1} s'_2, ..., s'_n \xrightarrow{i_n/o_n} s'_{n+1}$ is a trace in M'. Besides, for 1 ≤ i ≤ n+1 we have that if $s_i \notin S'$ then $s_i \in f'(s'_i)$.
Due to lack of space, we do not include the proofs of the lemmas and theorems presented in the chapter. The interested reader can find them in (Núñez et al. 04). More detailed proofs are available from the authors. The main feature of the folding of traces is that the folding operation reduces the number of states in the machine. The key to constructing the minimal machine that could have produced an observed trace is that some folding
operations will be iteratively introduced so that, after each of them, the resulting machine is still able to produce the observed trace. However, let us remark that not all folding operations are suitable. In particular, care must be taken not to lose the determinism of the machine. For instance, let us suppose that $i^r = i^{r'}$ and $o^r \neq o^{r'}$, where we consider again the traces commented on before Definition 6. In this case, the unified machine would have two transitions, $s_{n+1} \xrightarrow{i^r/o^r} r$ and $s_{n+1} \xrightarrow{i^{r'}/o^{r'}} r'$, outgoing from the same state $s_{n+1}$, so the new machine would be nondeterministic. Moreover, if $i^r = i^{r'}$ and $o^r = o^{r'}$, then there would exist two equally labelled transitions outgoing from $s_{n+1}$ and arriving at different states. So, a condition to unify two traces is that $i^r \neq i^{r'}$. Similarly, that restriction applies to any pair of inputs that label transitions leaving the unified path at any intermediate point. In particular, if there exists a transition leaving the path labelled by some input in one of the traces, and there does not exist any transition labelled by that input leaving the path at that point in the other trace, then there is no incompatibility for that input. We will refer to the availability to introduce a new transition labelled by an input at a point of the folding as the input slot for that input at that point. If it is possible to introduce such a new transition, then we will say that the input slot for that input at that point is free. For instance, in the folding operation we performed in machine M1 to create machine M2 (see Figure 2), the input slot to introduce a transition labelled z/w in state 3 is free, because there is no outgoing transition in 3 labelled with the input z.

Definition 7. Let U=(M,S,f) be a folding machine, where M=(S, I, O, T, s_in). Let $\sigma = s_1 \xrightarrow{i_1/o_1} s_2 \xrightarrow{i_2/o_2} \cdots \xrightarrow{i_n/o_n} s_{n+1}$ and $\sigma' = s'_1 \xrightarrow{i_1/o_1} s'_2 \xrightarrow{i_2/o_2} \cdots \xrightarrow{i_n/o_n} s'_{n+1}$ be two traces of M. The folding of the traces σ and σ' in the folding machine U is acceptable if the following two conditions hold (a sketch of this test, in the transition-set encoding used above, appears after the definition):
• For any trace $\sigma^1 = s_1 \xrightarrow{i_1/o_1} s_2 \xrightarrow{i_2/o_2} \cdots s_j \xrightarrow{i^1/o^1} s^1$ of M such that $i^1 \neq i_j$ and 1 ≤ j ≤ n+1, we have that either there does not exist another trace $\sigma^2 = s'_1 \xrightarrow{i_1/o_1} s'_2 \xrightarrow{i_2/o_2} \cdots s'_j \xrightarrow{i^2/o^2} s^2$ of M such that $i^2 \neq i_j$ and $i^1 = i^2$, or such a trace does exist but $s^1 = s^2$ and $o^1 = o^2$.

• For any trace $\sigma^2 = s'_1 \xrightarrow{i_1/o_1} s'_2 \xrightarrow{i_2/o_2} \cdots s'_j \xrightarrow{i^2/o^2} s^2$ of M such that $i^2 \neq i_j$ and 1 ≤ j ≤ n+1, we have that either there does not exist another trace $\sigma^1 = s_1 \xrightarrow{i_1/o_1} s_2 \xrightarrow{i_2/o_2} \cdots s_j \xrightarrow{i^1/o^1} s^1$ of M such that $i^1 \neq i_j$ and $i^1 = i^2$, or such a trace does exist but $s^2 = s^1$ and $o^2 = o^1$.
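A hedged sketch of the acceptability test follows, in the same transition-set encoding as above. Rather than checking the two path-local conditions literally, it checks that the folded machine remains deterministic, which is equivalent under the assumption that the original machine is deterministic; the names are ours.

```python
def acceptable(transitions, kept, removed):
    """True if unifying `removed` into `kept` keeps the machine deterministic."""
    to_kept = dict(zip(removed, kept))
    rename = lambda state: to_kept.get(state, state)
    seen = {}                                     # (state, input) -> (output, target)
    for (u, i, o, v) in transitions:
        key, value = (rename(u), i), (o, rename(v))
        if seen.setdefault(key, value) != value:  # occupied input slot conflicts
            return False
    return True
```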
For example, the folding where we created M2 from M1 is acceptable. Folding operations are the basic operations used to minimize a machine so that we obtain the minimal machine that could have produced a given observed sequence. These operations will be iteratively applied to improve an initial machine that we explicitly construct. This is a very simple machine that has the capability of performing the observed sequence that is provided. It consists of a set of states containing one state for each step in the observed sequence, and a set of transitions where each transition links each state with the next state through the corresponding input and output in the sequence. No cycle is performed in the machine, so all states lead to a new state. The resulting machine is a simple linear machine whose structure is directly inherited from that of the sequence. Let us formally present this idea.

Definition 8. Let I and O be sets of inputs and outputs, respectively, and L=[i1/o1, ..., in/on] be a sequence, where for all 1 ≤ j ≤ n we have $i_j \in I$ and $o_j \in O$. The initial machine to perform L is a finite state machine M=(S, I, O, T, s_in) where

- $S = \{s_1, \ldots, s_{n+1}\}$
- $T = \{s_1 \xrightarrow{i_1/o_1} s_2, \ldots, s_n \xrightarrow{i_n/o_n} s_{n+1}\}$
- $s_{in} = s_1$

For instance, let L=(a/b, a/c, a/d, a/b) be a sequence. Then, the initial machine to perform L is the machine M3 depicted in Figure 3. Trivially, an initial machine becomes an (initial) folding machine by introducing the suitable additional information. As we want the new folding machine to represent the machine where no folding operation has been performed yet, the set of original states coincides with the one corresponding to the associated finite state machine. Besides, the set function returns for each state a trivial unitary set containing that state.
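Definition 8 translates directly into code; a minimal sketch in the encoding used above (the names are ours):

```python
def initial_machine(observation):
    """Build the linear machine for L = [(i1, o1), ..., (in, on)]."""
    transitions = {(j, i, o, j + 1)                  # s_j --i_j/o_j--> s_(j+1)
                   for j, (i, o) in enumerate(observation, start=1)}
    return transitions, 1                            # states 1..n+1, s_in = s_1

# For L = (a/b, a/c, a/d, a/b) this yields the linear machine M3 of Figure 3.
M3, init = initial_machine([('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'b')])
```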
Definition 9. Let M=(S, I, O, T, s_in) represent an initial machine to perform L. The initial folding machine to perform L is the folding machine U=(M,S,f), where for all $s \in S$ we have that f(s) = {s}.

Before presenting the algorithm to construct the minimal machine producing a given observation, we formally define the properties such a machine must have.

Definition 10. Let M=(S, I, O, T, s_in) be a finite state machine. Let L = [i1/o1, ..., in/on] be a sequence such that there exists a trace $\sigma = s_1 \xrightarrow{i_1/o_1} s_2 \xrightarrow{i_2/o_2} \cdots \xrightarrow{i_n/o_n} s_{n+1}$ in M with $s_1 = s_{in}$. We say that M is a minimal machine producing L if there does not exist another machine M'=(S', I, O, T', s'_in) and a trace $\sigma' = s'_1 \xrightarrow{i_1/o_1} s'_2 \xrightarrow{i_2/o_2} \cdots \xrightarrow{i_n/o_n} s'_{n+1}$ in M' such that $s'_{in} = s'_1$ and |S'| < |S|.

For instance, the minimal machine producing L=(a/b, a/c, a/d, a/b) is the machine M4, shown in Figure 3. In our algorithm, the new minimized machine will be obtained by the iterative application of some folding operations to an initial machine. The original machine is the initial folding machine associated to the given observation sequence. We need suitable notation to represent the iterative application of a sequence of folding operations to a machine, in such a way that the result of each folding is the input of the next one. This notion is formally introduced in the next definition.

Definition 11. Let U1, ..., Un be folding machines and σ1, σ'1, ..., σn, σ'n be traces such that for all 2 ≤ i ≤ n we have that Ui is the folding of σi and σ'i in U_{i−1}. Let us suppose that these n−1 folding operations are acceptable. We say that α = [(σ1, σ'1), ..., (σn, σ'n)] is a folding sequence from U1 leading to Un, and we denote it by $U_1 \stackrel{\alpha}{\Rightarrow} U_n$.
MINIMIZATION

Before presenting our minimization algorithm, we present a property that we will use to prove the optimality of the machines obtained by the algorithm. It says that a minimal folding machine can be obtained by applying a suitable folding sequence.

Lemma 5.0.2. Let U be the initial folding machine to perform L. Then, there exists a folding sequence α with $U \stackrel{\alpha}{\Rightarrow} U'$ such that U' = (M', S', f') and M' is a minimal machine producing L.

Next, we show that the order of application of the folding operations is irrelevant.

Lemma 5.0.3. Let $U_1 \stackrel{\alpha}{\Rightarrow} U_n$ with α = [(σ1, σ'1), (σ2, σ'2)]. Then, $U_1 \stackrel{\alpha'}{\Rightarrow} U_n$ with α' = [(σ2, σ'2), (σ1, σ'1)].
Now we are ready to present our minimization algorithm. The minimal machine will not be available until the end of the execution.

Figure 3. Examples of initial and minimal machines
We will find the minimal finite state machine that could have produced a given observation by executing the backtracking algorithm presented in Figure 5. The inputs of the algorithm are the observation sequence L = [i1/o1, ..., in/on] and the initial folding machine U=(M,S,f) associated to L, where we suppose that M = (S, I, O, T, s1). Besides, we assume that $s_1 \xrightarrow{i_1/o_1} s_2 \cdots s_n \xrightarrow{i_n/o_n} s_{n+1}$ is the unique trace of M that ranges over all the states in S. We have used a functional programming notation to define lists: () denotes an empty list; head(l) and tail(l) denote the first element of l and the remainder of l after removing the head, respectively; and x:l denotes the inclusion of x as the first element of the list l.

Let us comment on the algorithm. Initially, we identify all folding operations that could be performed in the initial machine M and introduce them into a list (unificationList). Besides, we calculate the number of states that would be eliminated if all the folding operations appearing in the list, from a given folding up to the end, were performed. We store this information in another list (heuristicList). Then, we search for the best solution in the solution space. At each point of the search tree we decide whether a folding of the list is performed or not. Hence, each branch of the tree specifies a subset of the folding operations in the list. Branches are pruned by comparing the best solution found so far with the sum of the states eliminated from the root to the current node plus a heuristic estimation of the number of states that could be eliminated down to the leaves. This heuristic consists in adding the states that would be eliminated if all the folding operations remaining in the list were performed. Trivially, this method gives an upper bound of the profit that could be gained down to the leaves of the tree, so the heuristic is valid: it will never cause a potentially useful branch to be pruned. If the upper bound of the profit is less than the one provided by the best solution found so far, the corresponding branch is not constructed.

Next we prove that our minimization algorithm is optimal, that is, the returned folding machine is a minimal folding machine.

Theorem 5.0.1. Let L = [i1/o1, ..., in/on] be an observation sequence and U be the initial folding machine associated to L. Then, the folding machine U'' returned after the application of the algorithm depicted in Figure 5 to the machine U is a minimal folding machine associated to L.

For example, an application of the algorithm shown in Figure 5 to the initial machine M3 that performs L=(a/b, a/c, a/d, a/b), depicted in Figure 3, gives us the minimal machine M4 depicted in the same figure. In this case, the only folding performed is the one relating the traces $1 \xrightarrow{a/b} 2$ and $4 \xrightarrow{a/b} 5$, which is acceptable.
LEARNING ALGORITHM

In this section we consider how our algorithm to find the minimal FSM fitting an observation can be used as the core of a learning algorithm. An algorithm that allows an intelligent agent to develop the simplest theory consistent with the observations it has collected so far is the following (a sketch of the loop appears after the list):

1. First, the sets of inputs and outputs in its environment, that is, the ways in which the agent and its environment can affect each other, are fixed.
2. The agent interacts with its environment and collects a historical record of the results of each interaction.
3. When the length of the record exceeds a given threshold, the minimal FSM capable of producing that behavior is constructed according to the algorithm depicted in Figure 5. This FSM represents the cognitive theory of the agent.
4. From then on, the agent takes that theory into account to make its decisions, that is, to decide at any moment the input it will use to interact with the environment. It will use the theory to try to guess in advance the possible effects of its hypothetical future actions, so it can use it to succeed and to avoid failing.
5. The agent keeps recording its interaction with the environment. Periodically, the minimal FSM is reconstructed according to the (longer) record, which allows the agent to refine its cognitive theory over time.
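A minimal sketch of this loop is given below. The environment interface, the input-selection policy, and the minimize_fsm procedure (the Figure 5 algorithm) are passed in as parameters, since the chapter does not fix them; all names are illustrative assumptions.

```python
def learning_loop(environment, choose_input, minimize_fsm,
                  threshold=100, steps=1000):
    history = []      # the historical record of interactions (step 2)
    theory = None     # the current cognitive theory: a minimal FSM (step 3)
    for _ in range(steps):
        i = choose_input(theory, history)   # decide using the theory (step 4)
        o = environment(i)                  # interact and observe the outcome
        history.append((i, o))
        # Periodically rebuild the minimal FSM from the (longer) record (step 5)
        if len(history) >= threshold and len(history) % threshold == 0:
            theory = minimize_fsm(history)
    return theory
```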
Let us note that using the simplest theory (in this case, the smallest FSM) as the cognitive model is not only a suitable procedure to extrapolate infinite behaviors from a finite observation, but also a mechanism to reduce the size of the cognitive model. Since the basic mechanism of the learning algorithm encourages the creation of small knowledge models, this algorithm may help to reduce the amount of memory required in an intelligent system.
Figure 5. Minimization algorithm

    unificationList := [];
    maximalSaving := 0;
    heuristicList := [0];
    for j := 1 to n do
        for all substrings Y of L of length j do
            let k be the position where substring Y starts in L;
            for all substrings of L of length j coinciding with Y do
                let l be the position where such a substring starts in L;
                let σ  = s_k →(i_k/o_k) s_{k+1} ... s_{k+j} →(i_{k+j}/o_{k+j}) s_{k+j+1};
                let σ' = s_l →(i_l/o_l) s_{l+1} ... s_{l+j} →(i_{l+j}/o_{l+j}) s_{l+j+1};
                if the unification of σ, σ' is acceptable in U then
                    unificationList := (σ, σ') : unificationList;
                    maximalSaving := maximalSaving + j;
                    heuristicList := maximalSaving : heuristicList;
                fi
            od
        od
    od
    (u, bestSaving) := SearchBest(U, unificationList, heuristicList, 0, 0);
    return u;

    function SearchBest(u, unificationList, heuristicList, currentSaving, bestSaving)
        if unificationList = [] then
            if currentSaving ≥ bestSaving then
                bestSaving := currentSaving;
            fi
            return (u, bestSaving);
        else
            (σ, σ') := head(unificationList);
            maximalSaving := head(heuristicList);
            bestIndex := 0;
            // Branch 1: perform the folding, if promising and acceptable
            if currentSaving + maximalSaving ≥ bestSaving
               and the unification of σ, σ' is acceptable in u then
                u' := unification(u, σ, σ');
                (u'', bestSaving') := SearchBest(u', tail(unificationList),
                        tail(heuristicList), currentSaving + length(σ), bestSaving);
                if bestSaving' ≥ bestSaving then
                    bestSaving := bestSaving';
                    bestIndex := 1;
                fi
            fi
            // Branch 2: skip the folding (the machine u is left unchanged)
            maximalSaving := head(tail(heuristicList));
            if currentSaving + maximalSaving ≥ bestSaving then
                (u''', bestSaving') := SearchBest(u, tail(unificationList),
                        tail(heuristicList), currentSaving, bestSaving);
                if bestSaving' ≥ bestSaving then
                    bestSaving := bestSaving';
                    bestIndex := 2;
                fi
            fi
            if bestIndex = 2 then return (u''', bestSaving);
            else if bestIndex = 1 then return (u'', bestSaving);
            else return (u, bestSaving);
            fi
        fi
CONCLUSION

We have presented an algorithm that provides a mechanism to find the simplest finite state machine that could have produced a given observation. This algorithm obtains the simplest theory to explain an observation, so it represents the theory we would obtain in the chosen cognitive model by systematically applying Occam's razor. Finite state machines are formalisms that produce the simplest kind of languages according to Chomsky's classification (right linear languages). Hence, since languages and reasoning processes are linked, our approach provides a learning algorithm that fits the simplest form of reasoning.

Let us note that our methodology assumes two postulates. First, following Chomsky, since natural language (and, indirectly, any human cognitive process) is produced by one of the languages in Chomsky's classification (specifically, the context-sensitive languages), we postulate that the lowest languages in that classification (that is, right linear languages) provide a suitable (simplified) model to represent human reasoning. Second, we assume that the simplest model is the smallest model. Thus, a suitable way to apply Occam's razor to build the simplest theory that explains an observation of the environment consists in finding the smallest finite state machine that could have produced that observation.
REFERENCES

Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1987). Occam's razor. Information Processing Letters, 24, 377-380.

Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.

Chomsky, N. (1965). Aspects of the theory of syntax. MIT Press.

Domingos, P. (1999). The role of Occam's razor in knowledge discovery. Data Mining and Knowledge Discovery, 3(4), 409-425.

Fisher, D., & Schlimmer, J. (1988). Concept simplification and prediction accuracy. In Proceedings of the Fifth International Conference on Machine Learning (pp. 22-28). Morgan Kaufmann.

Gold, E. M. (1978). Complexity of automaton identification from given data. Information and Control, 37, 302-320.

Huffman, D. (1954). The synthesis of sequential switching circuits. Journal of the Franklin Institute, 257(3-4), 161-190, 275-303.

Kinsner, W. (2005). Some advances in cognitive informatics. In International Conference on Cognitive Informatics (ICCI'05) (pp. 6-7). IEEE Press.

Lau, C. (1991). Neural networks: Theoretical foundations and analysis. IEEE Press.

Mingers, J. (1989). An empirical comparison of pruning measures for decision tree induction. Machine Learning, 4, 227-243.

Núñez, M., Rodríguez, I., & Rubio, F. (2004). Applying Occam's razor to FSMs. In International Conference on Cognitive Informatics (pp. 138-147). IEEE Press.

Rao, R., Gordon, D., & Spears, W. (1995). For every generalization action, is there really an equal or opposite reaction? Analysis of conservation law. In Proceedings of the Twelfth International Conference on Machine Learning (pp. 471-479). Morgan Kaufmann.

Rissanen, J. (1978). Modelling by shortest data description. Automatica, 14, 465-471.

Schaffer, J. (1994). A conservation law for generalization performance. In Proceedings of the 11th International Conference on Machine Learning (pp. 259-265). Morgan Kaufmann.
Schank, R., & Abelson, R. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ: Erlbaum.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464.

Tornay, S. (1938). Ockham: Studies and selections. La Salle, IL: Open Court Publishers.

Vapnik, V. (1995). The nature of statistical learning theory. Springer.

Wang, Y. (2002). On cognitive informatics. In International Conference on Cognitive Informatics (ICCI'02) (pp. 34-42). IEEE Press.

Wang, Y. (2003). Cognitive informatics: A new transdisciplinary research field. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 115-127.
Chapter IV
On the Cognitive Processes of Human Perception with Emotions, Motivations, and Attitudes Yingxu Wang University of Calgary, Canada
Abstract

An interactive motivation-attitude theory is developed based on the Layered Reference Model of the Brain (LRMB) and the Object-Attribute-Relation (OAR) model. This chapter presents a rigorous model of human perceptual processes such as emotions, motivations, and attitudes. A set of mathematical models and formally described cognitive processes are developed. The interactions and relationships between motivation and attitude are formally described in real-time process algebra (RTPA). Applications of the mathematical models of motivations and attitudes in software engineering are demonstrated. This work is the detailed description of a part of the layered reference model of the brain (LRMB) that provides a comprehensive model for explaining the fundamental cognitive processes of the brain and their interactions. This work demonstrates that the complicated human emotional and perceptual phenomena can be rigorously modeled in mathematics and be formally treated and described.
INTRODUCTION

A variety of life functions and cognitive processes have been identified in cognitive informatics (Wang, 2002a, 2003a, 2007b), cognitive science, neuropsychology, and neurophilosophy. In order to formally and rigorously describe a comprehensive and coherent set of mental processes and their relationships, a layered reference model of the brain (LRMB) has been developed (Wang and Wang, 2006; Wang et al., 2006) that explains the functional mechanisms and cognitive processes of natural intelligence. LRMB encompasses 37 cognitive processes at six layers known as the sensation, memory, perception, action, meta-cognitive, and higher cognitive layers, from the bottom up.
Definition 1. Perception is a set of internal sensational cognitive processes of the brain at the subconscious cognitive function layers that detects, relates, interprets, and searches internal cognitive information in the mind.

Perception may be considered the sixth sense of human beings, since almost all cognitive life functions rely on it. Perception is also an important cognitive function at the subconscious layers that determines personality. In other words, personality is a faculty of all subconscious life functions and of experience accumulated via conscious life functions. It is recognized that a crucial component of the future generation of computers, known as cognitive computers, is the perceptual engine that mimics natural intelligence (Wang, 2006, 2007c).

The main cognitive processes at the perception layer of LRMB are emotion, motivation, and attitude (Wang et al., 2006). This chapter presents a formal treatment of the three perceptual processes, their interrelationships, and their interactions. It demonstrates that complicated psychological and cognitive mental processes may be formally modeled and rigorously described. Mathematical models of the psychological and cognitive processes of emotions, motivations, and attitudes are developed in the following three sections. Then, interactions and relationships between emotions, motivations, and attitudes are analyzed. Based on the integrated models of the three perception processes, the formal description of the cognitive processes of motivations and attitudes will be presented using Real-Time Process Algebra (RTPA) (Wang, 2002b, 2003c). Applications of the formal models of emotions, motivations, and attitudes will be demonstrated in a case study on maximizing the strengths of individual motivations in software engineering.
THE HIERARCHICAL MODEL OF EMOTIONS

Emotions are a set of states or results of perception that interpret the feelings of human beings about external stimuli or events in the binary categories of pleasant or unpleasant.

Definition 2. An emotion is a personal feeling derived from one's current internal status, mood, circumstances, historical context, and external stimuli.

Emotions are closely related to desires and willingness. A desire is a personal feeling or willingness to possess an object, to conduct an interaction with the external world, or to prepare for an event to happen. A willingness is the faculty of conscious, deliberate, and voluntary choice of actions. According to the study of Fischer and his colleagues (Fischer et al., 1990; Wilson and Keil, 1999), the taxonomy of emotions can be described at three levels known as the sub-category, basic, and super levels, as shown in Table 1. It is interesting that human emotions at the perceptual layer may be classified into only two opposite categories: pleasant and unpleasant. The various emotions in the two categories can be classified at five levels according to their strengths of subjective feeling, as shown in Table 2, where each level encompasses a pair of positive/negative or pleasant/unpleasant emotions.
Table 1. Taxonomy of emotions

| Super level        | Positive (pleasant)       |                       | Negative (unpleasant)                    |                                 |               |
| Basic level        | Joy                       | Love                  | Anger                                    | Sadness                         | Fear          |
| Sub-category level | Bliss, pride, contentment | Fondness, infatuation | Annoyance, hostility, contempt, jealousy | Agony, grief, guilt, loneliness | Horror, worry |
Table 2. The hierarchy of emotions

| Level | Strength          | Emotion (positive/negative) | Description                                              |
| 0     | No emotion        | -                           | -                                                        |
| 1     | Weak emotion      | Comfort                     | Safeness, contentment, fulfillment, trust                |
| 1     | Weak emotion      | Fear                        | Worry, horror, jealousy, frightening, threatening        |
| 2     | Moderate emotion  | Joy                         | Delight, fun, interest, pride                            |
| 2     | Moderate emotion  | Sadness                     | Anxiety, loneliness, regret, guilt, grief, sorrow, agony |
| 3     | Strong emotion    | Pleasure                    | Happiness, bliss, excitement, ecstasy                    |
| 3     | Strong emotion    | Anger                       | Annoyance, hostility, contempt, infuriated, enraged      |
| 4     | Strongest emotion | Love                        | Intimacy, passion, amorousness, fondness, infatuation    |
| 4     | Strongest emotion | Hate                        | Disgust, detestation, abhorrence, bitterness             |
Definition 3. The strength of emotion |Em| is a normalized measure of how strong a person's emotion is, on a five-level scale identified from 0 through 4, i.e.:

0 ≤ |Em| ≤ 4
(1)
where |Em| represents the absolute strength of an emotion, regardless of whether it is positive (pleasant) or negative (unpleasant), and the scope of |Em| corresponds to the definitions of Table 2. It is observed that the organ known as the hypothalamus in the brain is supposed to interpret the properties or types of emotions in terms of pleasant or unpleasant (Payne and Wenger, 1998; Pinel, 1997; Smith, 1993; Westen, 1999; Wang et al., 2006).

Definition 4. Let Te be a type of emotion, ES the external stimulus, IS the internal perceptual status, and BL the Boolean values true or false. The perceptual mechanism of the hypothalamus can be described as a function, i.e.:

Te : ES × IS → BL
(2)
It is interesting that the same event or stimulus ES may be interpreted as either type, pleasant or unpleasant, due to differences in the real-time context of the perceptual status IS of the brain. For instance, walking from home to the office may be interpreted as a pleasant activity by someone who likes physical exercise, but the same walk due to a car breakdown will be interpreted as unpleasant. This observation and the taxonomy provided in Tables 1 and 2 lead to the following theorem.

Theorem 1. The human emotional system is a binary system that interprets or perceives an external stimulus and/or internal status as pleasant or unpleasant.

Although there are various emotional categories at different levels, the binary emotional system of the brain provides a set of pairwise universal solutions to express human feelings. For example, anger may be explained as a default solution or generic reaction for an emotional event when there is no better solution available; otherwise, delight will be the default emotional reaction.
THE MATHEMATICAL MODEL OF MOTIVATION

Motivation is an innate potential power of human beings that energizes behavior. It is motivation that triggers the transformation from thought (information) into action (energy). In other words, human behaviors are the embodiment of motivations. Therefore, any cognitive behavior is driven by an individual motivation.
Definition 5. A motivation is a willingness or desire, triggered by an emotion or external stimulus, to pursue a goal or a reason for triggering an action.

As described in the Layered Reference Model of the Brain (LRMB) (Wang et al., 2006), motivation is a cognitive process of the brain at the perception layer that explains the initiation, persistence, and intensity of personal emotions and desires, which are the faculty of conscious, deliberate, and voluntary choices of actions. Motivation is a psychological and social modulating and coordinating influence on the direction, vigor, and composition of behavior. This influence arises from a wide variety of internal, environmental, and social sources, and is manifested at many levels of behavioral and neural organization.

The taxonomy of motives can be classified into two categories known as learned and unlearned (Wittig, 2001). The latter comprises the primary motives, such as the survival motives (hunger, thirst, breathing, shelter, sleep, and elimination) and pain. The former comprises the secondary motives, such as the needs for achievement, friendship, affiliation, dominance of power, and relief of anxiety, which are acquired and extended based on the primary motives.

Definition 6. The strength of motivation M is a normalized measure of how strong a person's motivation is, on a scale of 0 through 100, i.e.:

0 ≤ M ≤ 100
(3)
where M = 100 is the strongest motivation and M = 0 is the weakest motivation. It is observed that the strength of a motivation is determined by multiple factors (Westen, 1999; Wilson and Keil, 1999), such as:

a. The absolute motivation |Em|: the strength of the emotion.
b. The relative motivation E − S: a relative difference or inequity between the expectancy of a person E for an object or an action towards a certain goal and the current status S of the person.
c. The cost to fulfill the motivation C: a subjective assessment of the effort needed to accomplish the expected goal.
Therefore, the strength of a motivation can be quantitatively analyzed and estimated by the subjective and objective motivations and their cost, as described in the following theorem.

Theorem 2. The strength of a motivation M is proportional to both the strength of the emotion |Em| and the difference between the expectancy of desire E and the current status S of a person, and is inversely proportional to the cost to accomplish the expected motivation C, i.e.:
M = 2.5 · |Em| · (E − S) / C        (4)
where 0 ≤ |Em| ≤ 4, 0 ≤ E, S ≤ 10, and 1 ≤ C ≤ 10; the coefficient 2.5 normalizes the value of M to the scope of (0 .. 100).

In Theorem 2, the strength of a motivation is measured in the scope 0 ≤ M ≤ 100. When M > 1, the motivation is considered a desired motivation, because it indicates both an existing emotion and a positive expectancy. The higher the value of M, the stronger the motivation. According to Theorem 2, in a software engineering context, the rational action of the manager of a group is to encourage the individual emotional desire and the expectancy of each software engineer, and to decrease the required effort for the employees by providing additional resources or adopting certain tools.

Corollary 1. There are super strong motivations toward a resolute goal, driven by a determined expectancy of a person, at any cost.
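Equation (4) can be transcribed directly into code; the function and argument names below are ours, not part of the source model.

```python
def motivation_strength(em, expectancy, status, cost):
    """Eq. (4): M = 2.5 * |Em| * (E - S) / C, with 0 <= |Em| <= 4,
    0 <= E, S <= 10 and 1 <= C <= 10."""
    assert 0 <= em <= 4 and 0 <= expectancy <= 10 \
           and 0 <= status <= 10 and 1 <= cost <= 10
    return 2.5 * em * (expectancy - status) / cost
```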
It is noteworthy that a motivation is only a potential mental power of human beings, and a strong motivation will not necessarily result in a behavior or action. The condition for transforming a motivation into a real behavior or action depends on multiple factors, such as values, social norms, expected difficulties, availability of resources, and the existence of alternative goals. The motivation of a person is constrained by the attitude and the decision-making strategies of the person. The former is the internal (subjective) judgment of the feasibility of the motivation, and the latter is the external (social) judgment of the feasibility of the motivation. Attitude and decision-making mechanisms will be analyzed in the following sections.
THE MATHEMATICAL MODEL OF ATTITUDE

As described in the previous section, motivation is the potential power that may trigger an observable behavior or action. Before the behavior is performed, it is judged by an internal regulation system known as the attitude. Psychologists perceive attitude in various ways. R. Fazio describes an attitude as an association between an act or object and an evaluation (Fazio, 1986). A. Eagly and S. Chaiken define an attitude as a tendency of a human to evaluate a person, concept, or group positively or negatively in a given context (Eagly and Chaiken, 1992). More recently, Arno Wittig describes attitude as a learned evaluative reaction to people, objects, events, and other stimuli (Wittig, 2001). Attitudes may be formally defined as follows.

Definition 7. An attitude is a subjective tendency towards a motivation, an object, a goal, or an action, based on an intuitive evaluation of its feasibility.

The mode of an attitude can be positive or negative, which can be quantitatively analyzed using the following model.

Definition 8. The mode of an attitude A is determined by both an objective judgment of its conformance to the social norm N and a subjective judgment of its empirical feasibility F, i.e.:
$A = \begin{cases} 1, & N = T \wedge F = T \\ 0, & N = F \vee F = F \end{cases}$        (5)
where A = 1 indicates a positive attitude; otherwise, it indicates a negative attitude.
INTERACTION BETWEEN MOTIVATION AND ATTITUDE

This section discusses the relationships between a set of interlinked perceptual psychological processes, such as emotions, motivations, attitudes, decisions, and behaviors. A motivation/attitude-driven behavioral model will be developed for formally describing the cognitive processes of motivations and attitudes.

It is observed that motivation and attitude have considerable impact on behavior and influence the ways a person thinks and feels (Westen, 1999). A reasoned action model, proposed by Martin Fishbein and Icek Ajzen in 1975, suggests that human behavior is directly generated by behavioral intentions, which are controlled by attitude and social norms (Fishbein and Ajzen, 1975). An initial motivation, before the judgment by an attitude, is only a temporal idea; with the judgment of the attitude, it becomes a rational motivation (Wang et al., 2006), also known as the behavioral intention.

The relationship between an emotion, motivation, attitude, and behavior can be formally and quantitatively described by the motivation/attitude-driven behavioral (MADB) model, as illustrated in Figure 1. In the MADB model, motivation and attitude have been defined in Eqs. (4) and (5). The rational motivation, decision, and behavior
can be quantitatively analyzed according to the following definitions. It is noteworthy that, as shown in Figure 1, a motivation is triggered by an emotion or desire.

Definition 9. A rational motivation Mr is a motivation regulated by an attitude A with a positive or negative judgment, i.e.:
Mr = M · A = (2.5 · |Em| · (E − S) / C) · A        (6)
Definition 10. A decision D for confirming an attitude for executing a motivated behavior is a binary choice on the basis of the availability of time T, resources R, and energy P, i.e.:

$D = \begin{cases} 1, & T \wedge R \wedge P = T \\ 0, & T \vee R \vee P = F \end{cases}$        (7)
Definition 11. A behavior B driven by a motivation Mr and an attitude is a realized action initiated by a motivation M and supported by a positive attitude A and a positive decision D toward the action, i.e.:
$B = \begin{cases} T, & M_r \cdot D = \dfrac{2.5 \cdot |Em| \cdot (E - S)}{C} \cdot A \cdot D > 1 \\ F, & \text{otherwise} \end{cases}$        (8)

Figure 1. The model of motivation/attitude-driven behavior (figure: stimuli trigger an emotion, which generates a motivation M; the attitude A, the perceptual feasibility determined by values/social norms N, regulates M into the rational motivation Mr; the decision D, the physical feasibility determined by the availability of time, resources, and energy T/R/P, gates the behavior B; the outcome satisfies/dissatisfies and strengthens/weakens the emotion via experience; emotion through decision form the internal process, while behavior is the external process)
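A minimal sketch tying Eqs. (4) through (8) together follows; the Boolean parameters stand for the T/F judgments in Definitions 8 and 10, and all names are our assumptions.

```python
def rational_motivation(em, e, s, c, norm_ok, feasible):
    m = 2.5 * em * (e - s) / c                 # Eq. (4): strength of motivation M
    a = 1 if (norm_ok and feasible) else 0     # Eq. (5): mode of attitude A
    return m * a                               # Eq. (6): Mr = M * A

def behavior(em, e, s, c, norm_ok, feasible, time_ok, resources_ok, energy_ok):
    d = 1 if (time_ok and resources_ok and energy_ok) else 0   # Eq. (7): D
    return rational_motivation(em, e, s, c, norm_ok, feasible) * d > 1  # Eq. (8)
```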
FORMAL DESCRIPTION OF THE PROCESSES OF MOTIVATION AND ATTITUDE

The formal models of emotion, motivation, and attitude have been developed in previous sections. This section extends the models and their relationships into detailed cognitive processes based on the object-attribute-relation (OAR) model (Wang, 2007d) and using RTPA (Wang, 2002b, 2003c), which enables more rigorous treatment and computer simulation.
The Cognitive Process of Motivations

The mathematical model of motivation is described in Equation (6). Based on it, the cognitive process of motivation (MTVT) is presented in Figure 2. The motivation process is divided into four major sub-processes, known as (i) Form motivation goal, (ii) Estimate strength of motivation, (iv) Form rational motivation, and (vi) Stimulate behavior for the motivation. The MADB model provides a formal explanation of the mechanism of and relationships between motivation, attitude, and behavior. The model can be used to describe how the motivation process drives human behaviors and actions, and how the attitude, as well as the decision-making process, helps to regulate the motivation and determines whether the motivation should be implemented.
The Cognitive Process of Attitudes

The mathematical model of attitude has been described in Equation (5). Based on it, the cognitive process of attitude (ATTD) is presented in Figure 3. The attitude process is divided into three major sub-processes, known as (iii) Check the mode of attitude, (v) Determine physical availability, and (vi) Stimulate behavior for the motivation.
The Integrated Process of Motivation and Attitudes

According to the model of motivation/attitude-driven behavior (MADB) and the formal descriptions of the motivation and attitude processes shown in Figures 1 through 3, the cognitive processes of motivation and attitude are interleaved. An integrated process that combines both motivation and attitude is given in Figure 4, via the following sub-processes: (i) Form motivation goals, (ii) Estimate strength of motivation, (iii) Check the mode of attitude, (iv) Form rational motivation, (v) Determine physical availability, and (vi) Stimulate behavior for the rational motivation.
MAXIMIZING STRENGTHS OF MOTIVATIONS

Studies in sociology provide a rich theoretical basis for gaining new insights into the organization of software engineering. It is noteworthy that, in a software organization, according to Theorem 2, the strength of a motivation of an individual, M, is proportional to both the strength of the emotion and the difference between the expectancy and the current status of the person. At the same time, it is inversely proportional to the cost to accomplish the expected motivation, C. The job of management at different levels of an organization tree is to encourage and improve Em and E, and to help employees to reduce C.

Example 1. In a software engineering project organization, the manager and programmers may be motivated to improve software quality to different extents. Assume the factors shown in Table 3 are collected from a project on the strengths of motivations to improve the quality of a software system; analyze how the factors influence the strengths of motivations of the manager and the programmer.
Figure 2. The cognitive process of motivations
The Motivation Process
Motivation (I:: oS; O:: OAR(O’, A’, R’)ST)
{
  I. Form motivation goal(s)
       Identify (o, A’, R’)
  II. Estimate strength of motivation M(o)N
       Quantify (Em(o)N)      // The strength of emotion
       Quantify (S(o)N)       // The current status
       Quantify (E(o)N)       // The expectancy of desire
       Quantify (C(o)N)       // The cost to accomplish
       M(o)N := 2.5 · Em(o)N · (E(o)N − S(o)N) / C(o)N
       ( M(o)N > 1
           M(o)BL = T         // Positive motivation
       | ~
           M(o)BL = F         // Negative motivation
       )
  III. Check the mode of attitude A(o)N     // Refer to the Attitude process
  IV. Form rational motivation Mr(o)
       Mr(o)N := M(o)N • A(o)N
       ( Mr(o)N > 1
           Mr(o)BL = T        // Rational motivation
       | ~
           Mr(o)BL = F        // Irrational motivation
       )
  V. Determine physical availability D(o)N  // Refer to the Attitude process
  VI. Stimulate behavior for Mr(o)
       ( D(o)N = 1            // Implement motivation o
           GenerateAction (Mr(o))
           ExecuteAction (Mr(o))
           R’ := R’ ∪ …
       | ~                    // Give up motivation o
           D(o)N := 0
           o := Ø
           R’ := Ø
       )
  OAR’ST = …                  // Form new OAR model
  Memorization (OAR’ST)
}
Figure 3. The cognitive process of attitude

The Attitude Process
Attitude (I:: oS; O:: OAR(O’, A’, R’)ST)
{
  I. Form motivation goal(s)
       Identify (o, A’, R’)
  II. Estimate strength of motivation M(o)N  // See the MTVT process
  III. Check the mode of attitude A(o)N      // Perceptual feasibility
       Qualify (N(o)BL)       // The social norm
       Qualify (F(o)BL)       // The subjective feasibility
       ( N(o)BL ∧ F(o)BL = T
           A(o)N := 1
       | ~
           A(o)N := 0
       )
  IV. Form rational motivation Mr(o)         // Refer to the Motivation process
  V. Determine physical availability D(o)N
       Qualify (T(o)BL)       // The time availability
       Qualify (R(o)BL)       // The resource availability
       Qualify (P(o)BL)       // The energy availability
       ( T(o)BL ∧ R(o)BL ∧ P(o)BL = T
           D(o)N := 1         // Confirmed motivation
       | ~
           D(o)N := 0         // Infeasible motivation
       )
  VI. Stimulate behavior for Mr(o)
       ( D(o)N = 1            // Implement motivation o
           GenerateAction (Mr(o))
           ExecuteAction (Mr(o))
           R’ := R’ ∪ …
       | ~                    // Give up motivation o
           D(o)N := 0
           o := Ø
           R’ := Ø
       )
  OAR’ST = …                  // Form new OAR model
  Memorization (OAR’ST)
}
Table 3. Motivation factors of a project

| Role        | Em  | C | E | S |
| The manager | 4   | 3 | 8 | 5 |
| Programmers | 3.6 | 8 | 8 | 6 |
According to Theorem 2, the strengths of motivations of the manager (M1) and the programmer (M2) can be estimated using Equation (4), respectively:

M1(manager) = 2.5 · |Em| · (E − S) / C = 2.5 · 4 · (8 − 5) / 3 = 10.0

and

M2(programmer) = 2.5 · 3.6 · (8 − 6) / 8 = 2.25 ≈ 2.3
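For illustration, the two strengths can be checked numerically with a one-line transcription of Eq. (4); the values are those of Table 3.

```python
def strength(em, e, s, c):
    return 2.5 * em * (e - s) / c   # Eq. (4)

print(strength(4, 8, 5, 3))     # manager:    10.0
print(strength(3.6, 8, 6, 8))   # programmer: 2.25
```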
The results show that the manager has a much stronger motivation to improve the quality of the software than the programmer in the given project. Therefore, the rational action for the manager is to encourage the expectancy of the programmer, or to decrease the required effort for the programmer by providing additional resources or adopting certain tools.

According to social psychology (Wiggins et al., 1994), the social environment, such as culture, ethical norms, and attitude, greatly influences people's motivation, behavior, productivity, and quality towards collaborative work. The chain of individual motivation in a software organization can be illustrated as shown in Figure 5. The culture and values of a software development organization help to establish a set of ethical principles or standards, shared by the individuals of the organization, for judging and normalizing social behaviors. The identification of a larger set of values and organizational policies towards social relations may be helpful to normalize individual and collective behaviors in a software development organization that produces information products for a global market.

Another condition for supporting the creative work of individuals in a software development organization is to encourage diversity in both ways of thinking and work allocation. It is observed in social ecology that a great diversity of species, and a complex and intricate pattern of interactions among the populations of a community, may confer greater stability on an ecosystem.

Definition 12. Diversity refers to the social and technical differences of people in working organizations. Diversity includes a wide range of differences between people, such as those of race, ethnicity, age, gender, disability, skills, education, experience, values, native language, and culture.

System theory indicates that if the number of components of a system reaches a certain level, the critical mass, then the functionality of the system may be dramatically increased (Wang, 2007a). That is, the increase of diversity in a system is the condition for realizing the system fusion effect, which results in a totally new system.

Theorem 3. The diversity principle states that the more diverse the workforce in an organization (particularly in the creative software industry), the higher the opportunity to form new relations and connections that lead to the gain of the system fusion effect.
Figure 4. The integrated process of motivation and attitude

The Motivation and Attitude Process
Motivation-Attitude (I:: oS; O:: OAR(O’, A’, R’)ST)
{
  I. Form motivation goal(s)
       Identify (o, A’, R’)
  II. Estimate strength of motivation M(o)N
       Quantify (Em(o)N)      // The strength of emotion
       Quantify (S(o)N)       // The current status
       Quantify (E(o)N)       // The expectancy of desire
       Quantify (C(o)N)       // The cost to accomplish
       M(o)N := 2.5 · Em(o)N · (E(o)N − S(o)N) / C(o)N
       ( M(o)N > 1
           M(o)BL = T         // Positive motivation
       | ~
           M(o)BL = F         // Negative motivation
       )
  III. Check the mode of attitude A(o)N      // Perceptual feasibility
       Qualify (N(o)BL)       // The social norm
       Qualify (F(o)BL)       // The subjective feasibility
       ( N(o)BL ∧ F(o)BL = T
           A(o)N := 1
       | ~
           A(o)N := 0
       )
  IV. Form rational motivation Mr(o)
       Mr(o)N := M(o)N • A(o)N
       ( Mr(o)N > 1
           Mr(o)BL = T        // Rational motivation
       | ~
           Mr(o)BL = F        // Irrational motivation
       )
  V. Determine physical availability D(o)N
       Qualify (T(o)BL)       // The time availability
       Qualify (R(o)BL)       // The resource availability
       Qualify (P(o)BL)       // The energy availability
       ( T(o)BL ∧ R(o)BL ∧ P(o)BL = T
           D(o)N := 1         // Confirmed motivation
       | ~
           D(o)N := 0         // Infeasible motivation
       )
  VI. Stimulate behavior for Mr(o)
       ( D(o)N = 1            // Implement motivation o
           GenerateAction (Mr(o))
           ExecuteAction (Mr(o))
           R’ := R’ ∪ …
       | ~                    // Give up motivation o
           D(o)N := 0
           o := Ø
           R’ := Ø
       )
  OAR’ST = …                  // Form new OAR model
  Memorization (OAR’ST)
}
Figure 5. The chain of motivation in a software organization (figure: basic human needs of individuals → motivation → behavior → productivity → quality, modulated by attitude and organizational objectives, within the social environment of software engineering)
CONCLUSION

This chapter has described the cognitive processes of emotions, motivations, and attitudes, and has demonstrated that complicated psychological and cognitive mental processes may be formally modeled and rigorously described. The perceptual cognitive processes, such as emotions, motivations, and attitudes, have been explored in order to explain the natural drives and constraints of human behaviors. Relationships and interactions between motivation and attitude have been discussed and formally described in Real-Time Process Algebra (RTPA). It has been recognized that the human emotional system is a binary system that interprets or perceives an external stimulus and/or internal status as pleasant or unpleasant. It has been revealed that the strength of a motivation is proportional to both the strength of the emotion and the difference between the expectancy of desire and the current status of a person, and is inversely proportional to the cost to accomplish the expected motivation. Case studies on applications of the interactive motivation-attitude theory and the cognitive processes of motivations and attitudes in software engineering have been presented.

This work has demonstrated that complicated human emotional and perceptual phenomena can be rigorously modeled in mathematics and be formally treated and described. It has been based on two fundamental cognitive informatics models: the Layered Reference Model of the Brain (LRMB) and the Object-Attribute-Relation (OAR) model. The former has provided a blueprint for exploring natural intelligence and its mechanisms. The latter has established a contextual foundation to reveal the logical representation of information, knowledge, and skills in the abstract space of the brain.
ACKNOWLEDGMENT

The author would like to acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) for its support of this work. We would like to thank the anonymous reviewers for their valuable comments and suggestions.
REFERENCES

Eagly, A.H., & Chaiken, S. (1992). The psychology of attitudes. San Diego: Harcourt, Brace.

Fazio, R.H. (1986). How do attitudes guide behavior? In R.M. Sorrentino & E.T. Higgins (Eds.), The handbook of motivation and cognition: Foundations of social behavior. New York: Guilford Press.
Fischer, K.W., Shaver, P.R., & Carnochan, P. (1990). How emotions develop and how they organize development. Cognition and Emotion, 4, 81-127.

Fishbein, M., & Ajzen, I. (1975). Belief, attitude, intention, and behavior: An introduction to theory and research. Reading, MA: Addison-Wesley.

Payne, D.G., & Wenger, M.J. (1998). Cognitive psychology. New York: Houghton Mifflin Co.

Pinel, J.P.J. (1997). Biopsychology (3rd ed.). Needham Heights, MA: Allyn and Bacon.

Smith, R.E. (1993). Psychology. St. Paul, MN: West Publishing Co.

Wang, Y. (2002a, August). On cognitive informatics. Keynote lecture, Proceedings of the 1st IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 34-42). Calgary, Canada: IEEE CS Press.

Wang, Y. (2002b). The real-time process algebra (RTPA). The International Journal of Annals of Software Engineering, 14, 235-274. Oxford: Baltzer Science Publishers.

Wang, Y. (2003a). Cognitive informatics: A new transdisciplinary research field. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 115-127.

Wang, Y. (2003b). On cognitive informatics. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 151-167.

Wang, Y. (2003c). Using process algebra to describe human and software behaviors. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 199-213.

Wang, Y. (2006, July). Cognitive informatics - Towards the future generation computers that think and feel. Keynote speech, Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 3-7). Beijing, China: IEEE CS Press.

Wang, Y. (2007a). Software engineering foundations: A software science perspective. CRC Software Engineering Series, 2. USA: CRC Press.

Wang, Y. (2007b, January). The theoretical framework of cognitive informatics. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(1), 1-57. Hershey, PA: IGI Publishing.

Wang, Y. (2007c, July). Towards theoretical foundations of autonomic computing. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(3), 1-15. Hershey, PA: IGI Publishing.

Wang, Y. (2007d, July). The OAR model of neural informatics for internal knowledge representation in the brain. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(3), 64-75. Hershey, PA: IGI Publishing.

Wang, Y., & Wang, Y. (2006, March). Cognitive informatics models of the brain. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 203-207.

Wang, Y., Wang, Y., Patel, S., & Patel, D. (2006, March). A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 124-133.

Westen, D. (1999). Psychology: Mind, brain, and culture (2nd ed.). New York: John Wiley & Sons, Inc.

Wiggins, J.A., Wiggins, B.B., & Zanden, J.V. (1994). Social psychology (5th ed.). New York: McGraw-Hill, Inc.

Wilson, R.A., & Keil, F.C. (Eds.) (1999). The MIT encyclopedia of the cognitive sciences. Cambridge, MA: The MIT Press.

Wittig, A.F. (2001). Schaum's outlines of theory and problems of introduction to psychology (2nd ed.). New York: McGraw-Hill.
Chapter V
A Selective Sparse Coding Model with Embedded Attention Mechanism Qingyong Li Beijing Jiaotong University, China Zhiping Shi Chinese Academy of Sciences, China Zhongzhi Shi Chinese Academy of Sciences, China
Abstract

Sparse coding theory demonstrates that the neurons in the primary visual cortex form a sparse representation of natural scenes from the viewpoint of statistics, but a typical scene contains many different patterns (corresponding to neurons in the cortex) competing for neural representation because of the limited processing capacity of the visual system. We propose an attention-guided sparse coding model. This model includes two modules: a non-uniform sampling module simulating the processing of the retina, and a data-driven attention module based on response saliency. Our experimental results show that the model notably decreases the number of coefficients that may be activated, and retains the main visual information at the same time. It provides a way to improve the coding efficiency of the sparse coding model and to achieve good performance in both population sparseness and lifetime sparseness.
Introduction

Understanding and modeling the functions of neurons and neural systems is one of the primary goals of cognitive informatics (CI) (Wang 2002, 2007; Wang and Kinsner 2006). The computational capabilities and limitations of neurons, and the environment in which the organism lives, are two fundamental components driving the evolution and development of such systems. Researchers have broadly investigated both.
The utilization of environmental constraints is most clearly evident in sensory systems, where it has long been assumed that neurons are adapted to the signals to which they are exposed (Simoncelli 2001). Because not all signals are equally likely, it is natural to assume that perceptual systems should be able to best process those signals that occur most frequently. Thus, it is the statistical properties of the environment that are relevant for the sensory processes of vision perception (Field 1987; Simoncelli 2003).

The efficient coding hypothesis (Barlow 1961) provides a quantitative relationship between environmental statistics and neural processing. Barlow for the first time hypothesized that the role of early sensory neurons was to remove statistical redundancy in the sensory input. Then, Olshausen and Field put forward a model, called sparse coding, which makes the variables (the equivalent of neurons stimulated by the same stimulus in neurobiology) be activated (i.e., significantly non-zero) only rarely (Olshausen 1996). This model is named SC here. Vinje's results validated the sparse properties of neural responses under natural stimulus conditions (Vinje 2000). Afterwards, Bell brought forward another sparse coding model based on statistical independence (called SCI) and obtained the same results as Olshausen and Field's model (Bell 1997). More recent studies can be found in the survey (Simoncelli 2003).

However, Willmore and Tolhurst (Willmore 2001) argued that there are two different notions of 'sparseness': population sparseness and lifetime sparseness. Population sparseness describes codes in which few neurons are active at any time, and it is utilized in Olshausen and Field's sparse coding model (Olshausen 1996); lifetime sparseness describes codes in which each neuron's lifetime response distribution has high kurtosis, which is the main contribution of Bell's sparse coding model (Bell 1997). In addition, it has been shown that lifetime sparseness is uncorrelated with population sparseness. As Figure 3(a) shows, the number of variables that take large values under the sparse coding model, and thus may be activated, is relatively large compared with the computational capacity of neurons, even though the kurtosis of every response coefficient is high. So, how to improve both population sparseness and lifetime sparseness at the same time, while retaining as much of the important information as possible, is a valuable problem in practice.

The visual attention mechanism is an active strategy in the information processing procedure of the brain, which has many interesting characteristics such as selectivity and competition. Attention is everywhere in the visual pathway (Britten 1996). Furthermore, a typical scene within a neuron's classic receptive field (CRF) contains many different patterns that compete for neural representation because of the limited processing capacity of neurons in the visual system. So, integrating an attention mechanism into the sparse coding framework to reduce the number of active coefficients and to improve the coding efficiency is reasonable and essential.

In this chapter, we extend the sparse coding principle by combining it with visual attention. We first model the sampling mechanism of the retina by a non-uniform sampling module; then, we implement a bottom-up attention mechanism based on the response saliency of the sparse coefficients. The diagram is illustrated in Figure 1. This model has two main contributions (a small numeric sketch of the two sparseness measures follows the list):

1. Modeling visual attention in the framework of sparse coding.
2. Improving the population sparseness of the response coefficients while retaining the most efficient information.
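The following sketch computes lifetime sparseness as per-neuron excess kurtosis and population sparseness as the fraction of inactive neurons per stimulus; the activity threshold and the names are illustrative assumptions, not definitions from the chapter.

```python
import numpy as np

def lifetime_kurtosis(responses):
    """responses: array of shape (neurons, stimuli); excess kurtosis per neuron."""
    z = (responses - responses.mean(axis=1, keepdims=True)) \
        / responses.std(axis=1, keepdims=True)
    return (z ** 4).mean(axis=1) - 3.0     # high kurtosis = lifetime-sparse

def population_sparseness(responses, eps=1e-3):
    """Fraction of near-zero responses per stimulus (higher = sparser)."""
    return (np.abs(responses) < eps).mean(axis=0)
```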
Figure 1. The diagram of the model (natural image → retina with attention module (1) → simple cell → attention module (2) → complex cell)
The rest of the chapter is organized as follows. Section 2 presents related work. In section 3, a detailed description of the model is given. Experimental results are presented in section 4. Conclusions are given in section 5.
Related Work

In the sparse coding model (Olshausen 1996; Bell 1997), a perceptual system is exposed to a series of small image patches, drawn from one or more large images, just like the CRF of neurons. Imagine that each image patch, represented by the vector x, has been formed by a linear combination of N basis functions. The basis functions form the columns of a fixed matrix A. The weights of this linear combination are given by a vector s. Each component of this vector has its own associated basis function, and represents the response value of a neuron in the vision system. The linear synthesis model is therefore given by:

x = As
(1)
The goal of a perceptual system in this simplified framework is to linearly transform the images x with a matrix of filters W so that the resulting vector u = Wx
(2)
recovers the response values s. In a cortical interpretation, the s models the responses of (signed) simple cells and the column of matrix A closely related to their CRF’s (Olshausen 1996). Figure 2.a shows some basis functions which are selective for location, orientation and frequency just as simple cells. Note that we are considering the contrast only. In the framework of efficient coding hypothesis, a fundamental assumption is that s is non-gaussian in a particular way, called sparseness (Field 1994). Sparseness means that random variable takes very small (absolute) values or very large values more often than a Gaussian random variable and it takes values in between relatively more rarely. Thus, the random variable is activated, which has significantly non-zero value only rarely. There are many models to implement the efficient coding. The most noted models include SC model in (Olshausen 1996) and SCI model in (Bell 1997). Though SCI model achieves good lifetime sparseness, its population sparseness does not show good results (Willmore 2001). It also does not consider the computational capacity limitation of neuron in primary vision cortex. Convergent evidences from single-cell recording studies in monkeys, functional brain imaging and eventrelated potential studies in humans indicate that selective attention can modulate neural processing in visual cortex. Visual attention affects neural processing in several ways. These include the following: enhancement of neural responses to a pattern, filtering of unwanted pattern counteracting the suppression, and so on. There are also many computational modeling of visual attention: given that the purpose of visual attention is to focus computational resources on a specific, “conspicuous" or “salient" region within a scene, it has been proposed that
Figure 2. Basis functions randomly selected from the set. (a) The original basis functions produced by the sparse coding model; (b) the corresponding binary basis functions with the distinct excitatory subregion labeled in white
the control structure underlying visual attention needs to represent such locations within a topographic saliency map. There are several well-known saliency-based visual attention models (Itti 1998; Rybak 1998; Itti 2001). They provide data-driven models that simulate the attention mechanism in visual perception. Obviously, a typical image patch, the input of a neuron's CRF, contains many different patterns. Because of the limited processing capacity, these patterns compete for neural representation. That is to say, some variables of u for certain basis functions (which we also call patterns), corresponding to simple cells' responses in the cortex, will be selected for further processing; on the contrary, some variables will be omitted. The next section shows how to model this competition, or attention mechanism, in the visual sparse coding framework.
Attention-Guided Sparse Coding Model (AGSC)

General Description of the Model

A functional diagram of the AGSC model is shown in Figure 1. The AGSC model includes two sequential attention modules in the sparse coding framework. At the beginning, the first attention module performs a transformation of the image into a 'retinal image', simulating the processing of the retina. The transformation provides a decrease of resolution for the retinal image from the center to the periphery of the CRF. The retinal image is used as the input to the sparse coding module of the simple cells. Then, the second attention module performs selective attention based on response saliency. It is a data-driven module, related to the so-called 'feature integration theory' and 'saliency-based attention model' (Itti 1998). The simple cell's response value, together with the discrepancy distance based on selective properties such as location, orientation and spatial frequency, forms the response saliency of the simple cell. The simple cells' responses compete, based on their response saliency values, for further processing in the complex cell.
Non-Uniform Sampling Module

It is well known that the density of photoreceptors in the retina is greatest in the central area (fovea) and decreases toward the retinal periphery (Kronaver 1985). As a result, the resolution of the image representation in the visual cortex is highest for the part of the image projected onto the fovea and decreases rapidly with distance from the fovea center. That is, the retina non-uniformly samples the input visual information. The retinal image (labeled RI = {V'ij}) is derived from the initial image I = {Vij} by way of a special transformation which produces a decrease in resolution from the center of the CRF to its periphery. To represent a certain area D in the image I at resolution level n (n ∈ {1, 2, 3}), we utilize the recursive computation of a Gaussian-like convolution at each position in D:

$$R^{1}_{ij} = V_{ij}, \qquad R^{2}_{ij} = \sum_{p=-2}^{2}\sum_{q=-2}^{2} G_{pq}\, R^{1}_{i-p,\, j-q}, \qquad R^{3}_{ij} = \sum_{p=-2}^{2}\sum_{q=-2}^{2} G_{pq}\, R^{2}_{i-2p,\, j-2q}$$
(3)
where the convolution coefficient matrix is as follows (Burt 1985):
$$[G_{pq}] = \frac{1}{256}\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix}$$
(4)
The input image patch is taken as the whole CRF, and the center of the image patch is the center of the CRF. Here, we simply divide the image patch into three concentric circles from center to periphery. The radii of the concentric circles are R0, R1, R2 (empirically specified as 6R0 = 2R1 = R2) (Rybak 1998), and the Euclidean distance between point (i, j) and the center is D(i, j). So the retinal image RI after non-uniform sampling can be represented as follows:

$$V'_{ij} = \begin{cases} R^{1}_{ij} & \text{if } D(i,j) \le R_0 \\ R^{2}_{ij} & \text{if } R_0 < D(i,j) \le R_1 \\ R^{3}_{ij} & \text{if } R_1 < D(i,j) \le R_2 \end{cases}$$
(5)
Thus, the input image patch is represented as follows: the pixels are fully sampled within the central circle, just as in the original image; sampled with lower resolution within the first ring surrounding the central circle; and sampled with the lowest resolution within the outermost ring.
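A minimal sketch of this sampling module follows (our illustration, not the chapter's code; the helper name `retinal_image` and the SciPy-based blurring are assumptions, and the 2-pixel stride of Equation 3's third level is approximated by repeated blurring):

```python
import numpy as np
from scipy.ndimage import convolve

G = np.array([[1, 4, 6, 4, 1],
              [4, 16, 24, 16, 4],
              [6, 24, 36, 24, 6],
              [4, 16, 24, 16, 4],
              [1, 4, 6, 4, 1]], dtype=float) / 256.0  # Burt's kernel (Eq. 4)

def retinal_image(I):
    n = I.shape[0]
    R2 = n / 2.0                 # outermost radius spans the patch
    R1, R0 = R2 / 3.0, R2 / 6.0  # empirical ratios 6*R0 = 2*R1 = R2
    R_1 = I                                  # level 1: full resolution
    R_2 = convolve(R_1, G, mode="nearest")   # level 2: one Gaussian blur
    R_3 = convolve(R_2, G, mode="nearest")   # level 3: blurred again (the
                                             # 2-pixel stride of Eq. 3 is
                                             # approximated by re-blurring)
    yy, xx = np.mgrid[:n, :n]
    D = np.hypot(yy - (n - 1) / 2.0, xx - (n - 1) / 2.0)
    # Corner pixels slightly beyond R2 also fall back to the coarsest level.
    return np.where(D <= R0, R_1, np.where(D <= R1, R_2, R_3))

patch = np.random.default_rng(1).normal(size=(16, 16))
print(retinal_image(patch).shape)  # (16, 16)
```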
Response Saliency and Discrepancy Distance

The second attention module in AGSC, the selective attention module based on response saliency, operates after the input stimulus has been processed by the non-uniform sampling module. It is the key part of the attention mechanism in AGSC, since it determines which input patterns are selected and further processed in higher cortex. This section introduces the details of the selective attention module based on response saliency.

Definition 1: Response saliency is the response extent of a neuron compared with a group of neurons that respond to the same stimulus.

The purpose of response saliency is to represent the conspicuity of every neuron at the same perception level for a stimulus and to guide the selection of attended neurons based on the value of response saliency. A neuron whose response has a large response saliency value will be chosen for further processing; on the contrary, a neuron with a small value will be omitted. In the framework of sparse coding, the simple cells in the primary visual cortex (V1) produce sparse codes for the input stimuli. That is to say, the response of a simple cell takes very small (absolute) values or very large values often; to compensate, it takes values in between relatively rarely. Lifetime sparseness focuses on the probability distribution of the response (Olshausen 1996). Intuitively, the response value itself provides very useful information: if the response value is larger, the information represented by the neuron is more important; otherwise, the information is less important. Obviously, the response value gives a foundation for the attention mechanism. Suppose that Ai represents simple cell i and Ri represents that simple cell's response; the greater Ri is, the greater the response saliency value of Ai.

Every simple cell (corresponding to a column of A in Equation 1) carries a specific pattern. Furthermore, every such pattern is selective for location, orientation and frequency. Based on the Gestalt similarity perception principle and the Hebb rule, neurons that have similar visual selectivity characteristics, such as location, orientation and spatial frequency, will enhance the response saliency of each other. On the contrary, neurons with different selectivity characteristics will suppress each other's response saliency values (Simon 1998). We suppose that the response saliency value of a neuron that has a large discrepancy in visual selectivity characteristics among a group of neurons responding to the same stimulus will decrease, and the value for a neuron with a small discrepancy will increase relatively (Boothe 2002). The set of neurons responding to the same stimulus is denoted S, S = {A1, A2, ..., Am}, corresponding to the basis functions in the sparse coding model. We first define two important measures.

Definition 2: Pattern distance measures the similarity between the patterns of two simple cells, and it is represented as D(Ai, Aj) for simple cells Ai and Aj. D(Ai, Aj) is a function of the simple cell's selectivity characteristics: location (L), orientation (O) and frequency (F), since every simple cell here can be regarded as a pattern characterized by the parameters L, O, F.
Definition 3: Discrepancy distance measures the discrimination of a simple cell within the simple cell set S when they respond to the same stimulus, and it is denoted Diff(Ai, S) for simple cell Ai.

The basis functions obtained by the sparse coding model are selective for location, orientation and spatial frequency, just like the simple cell receptive field, so we analyze the visual selectivity of these basis functions instead of the simple cells themselves. We first treat each basis function as a gray image, and transform the gray image into a binary image using Otsu's method (Otsu 1979). Figure 2.b shows the binary basis functions with the distinct excitatory subregion labeled in white. Then we extract the location, orientation and frequency features from the binary basis functions. Location selectivity is the first important characteristic of the simple cell receptive field. We treat the center, L = (x, y), of the excitatory subregion as the location selectivity; orientation O is a scalar representing the angle (in degrees) between the x-axis and the major axis of the ellipse that has the same second moments as the excitatory subregion; and here spatial frequency F is replaced by size, the area of the excitatory subregion. So D(Ai, Aj) can be calculated as below:

$$D(A_i, A_j) = W_1 \cdot N\!\left(\sqrt{(L_{ix}-L_{jx})^2 + (L_{iy}-L_{jy})^2}\right) + W_2 \cdot N(|O_i - O_j|) + W_3 \cdot N(|F_i - F_j|)$$
(6)
Here, the operator N(·) normalizes values into the range 0 to 1, and 0 ≤ W1, W2, W3 ≤ 1 are the weights, with W1 + W2 + W3 = 1. Lx and Ly refer to the x-axis and y-axis coordinates, respectively. We call the subset of simple cells in S other than Ai the neighbor cells of Ai, and denote it NSi. According to Definition 3, Diff(Ai, S) reflects the extent of response discrimination between Ai and its neighbor cells. It is influenced not only by the pattern distance, but also by the response values. So we define Diff(Ai, S) as the weighted sum of the response values of the neighbor cells, where the weights are designated by the pattern distance. The equation is given by:

$$\mathrm{Diff}(A_i, S) = \sum_{A_j \in NS_i} N(D(A_i, A_j)) \cdot \frac{R_j}{\sum_{A_k \in NS_i} R_k}$$
(7)
Here, the operator N(·) again normalizes values into the range 0 to 1. Note that the normalization is also applied to the response values of the neighbor cells in order to limit the value of Diff(Ai, S) to the range (0, 1). From Equation 7, we can easily see that if the pattern distance and the response value are both larger, then the discrepancy distance will be larger too, so the response of Ai will be suppressed, just like the lateral suppression mechanism in the neural system (Simon 1998). After we obtain the response value and the discrepancy distance, we can finally define the response saliency (RS). Two factors influence the RS value. The first is the internal factor, the response value. The response value provides the foundation for the data-driven attention mechanism, as discussed above, and it is also the most important difference among the simple cells responding to the same stimulus. The second is the external factor, the discrepancy distance. It measures the relationship between an individual simple cell and its neighbor cells and simulates the interaction among the cells. Because the details of the neural mechanism of attention are not yet known (Britten 1996), we define the RS value, for simplicity, as the weighted sum of the normalized response value and the complement of the discrepancy distance. The equation is given by:
RS(Ai) = N(Ri) + λ · (1 − Diff(Ai, S))
(8)
Here λ is the weight that determines the importance of each component. Note that the second component is defined as the complement of Diff(Ai, S), since Diff(Ai, S) is a counteractive factor, like the function of suppression: the greater its value, the smaller the RS value will be.
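Putting Equations 6-8 together, a small sketch of the response saliency computation might look as follows (our illustration only; the helper names, the per-neighborhood application of N(·), and the use of absolute responses are assumptions):

```python
import numpy as np

def N(v):
    # Normalizer: rescale absolute values into [0, 1].
    v = np.abs(np.asarray(v, dtype=float))
    rng = v.max() - v.min()
    return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)

def response_saliency(L, O, F, R, W=(0.4, 0.4, 0.2), lam=0.2):
    m = len(R)
    # Pairwise pattern distance (Eq. 6) over location, orientation, size.
    dL = np.hypot(L[:, None, 0] - L[None, :, 0], L[:, None, 1] - L[None, :, 1])
    dO = np.abs(O[:, None] - O[None, :])
    dF = np.abs(F[:, None] - F[None, :])
    D = W[0] * N(dL) + W[1] * N(dO) + W[2] * N(dF)
    absR = np.abs(R)
    diff = np.empty(m)
    for i in range(m):                       # discrepancy distance (Eq. 7)
        nbr = np.arange(m) != i              # neighbor cells NS_i
        diff[i] = np.sum(N(D[i, nbr]) * absR[nbr] / absR[nbr].sum())
    return N(absR) + lam * (1.0 - diff)      # response saliency (Eq. 8)

rng = np.random.default_rng(2)
m = 160
RS = response_saliency(L=rng.uniform(0, 16, (m, 2)), O=rng.uniform(0, 180, m),
                       F=rng.uniform(1, 40, m), R=rng.normal(size=m))
print(RS.shape)   # one saliency value per simple cell
```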
Selective Attention Module Based on Response Saliency

After we obtain the simple cells' response saliency values, we can select certain simple cells as the complex cell's inputs according to those values. Selection is an important characteristic of the attention mechanism (Kahneman 1973). Psychologists regard it as an internal mechanism which controls how input stimuli are processed and adjusts behaviors. Selection makes the information processing procedure more efficient (Kahneman 1973). We design two selection strategies: threshold selection (TS) and proportion selection (PS).
Threshold Selection Strategy

Treisman first put forward the concept of thresholds in the famed attenuation model of attention (Treisman 1964). He argued that every response pattern has its own threshold: an input stimulus is activated if its response is greater than the threshold; otherwise it is attenuated or ignored. Intuitively, it sounds reasonable to set up a threshold for the simple cell's response based on the RS value, resembling the attenuation model. So we put forward a threshold selection (TS) strategy. TS is a threshold filtering algorithm. Assume we have a threshold T. If the response saliency value of a simple cell is greater than T, the simple cell is chosen as an input for the complex cell; on the contrary, if the value is smaller than T, the simple cell is omitted. We can formalize it as follows:

$$\mathrm{Output}(A_i) = \begin{cases} 0 & \text{if } RS(A_i) \le T_i \\ R_i & \text{if } RS(A_i) > T_i \end{cases}$$
(9)
where RS(Ai) refers to the response saliency value of simple cell Ai, and Ri is the response value of Ai. Output(Ai) represents the output of the attention module for Ai. Obviously, if its value equals 0, the output of simple cell Ai is omitted; otherwise, the output of Ai will be further processed in the complex cell.

The key problem is how to determine the threshold. In principle, different simple cells have different thresholds; however, it is very difficult to determine the thresholds even by biological experiments (Treisman 1964). For simplicity, we assume that all simple cells have the same threshold T, so we can learn the threshold from a data set. Note that the purpose of the attention mechanism is to omit the minor information of the input stimuli and to retain the primary information. From the viewpoint of data transformation, this means that the original stimulus should be well reconstructable from the information passed by the attention module. So we can learn the threshold T by controlling the reconstruction error. The threshold learning algorithm is described below:

Algorithm 1: Threshold Learning Algorithm
Input: the upper limit of reconstruction error (UE), the basis function set (A), the training data set (I), the sparse coding coefficient set (R), and the response saliency value set (RS).
Output: threshold (T)
Method:
1. Initialize T.
2. Filter the sparse coding coefficients by T: if RSi is greater than T, R'i = Ri; else R'i = 0.
3. Compute the reconstruction error for the data set I:

$$\mathrm{Error}(R', A) = \sum_{I_i} \sum_{x,y} \left( I_i(x,y) - \sum_{j} R'_j A_j(x,y) \right)^{2}$$

4. If Error ≥ UE, set T = ηT, where 0 < η < 1, and go to step 2; otherwise, return T.
To improve the algorithm's convergence, the threshold T is initialized to the mean of the simple cells' response values. The learning rate η is set to 0.8, which is empirically shown to give a good tradeoff between convergence speed and precision.
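A compact sketch of Algorithm 1 follows (ours, with assumed array shapes; a small floor on T guards the loop in case UE is unreachable):

```python
import numpy as np

def learn_threshold(patches, A, R, RS, UE, eta=0.8):
    # patches: (n, pixels); A: (pixels, cells); R, RS: (n, cells)
    T = np.abs(R).mean()                 # step 1: initialize to mean response
    while T > 1e-12:
        Rp = np.where(RS > T, R, 0.0)    # step 2: filter coefficients by T
        recon = Rp @ A.T                 # reconstruct every patch
        error = np.sum((patches - recon) ** 2)   # step 3: total squared error
        if error < UE:
            return T                     # step 4: error small enough
        T *= eta                         # otherwise lower T and retry
    return T
```

Note that as T shrinks, more coefficients pass the filter, so the reconstruction error decreases toward the unfiltered baseline; the loop therefore terminates whenever UE exceeds that baseline.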
Proportion Selection Strategy

In this section we introduce the proportion selection strategy, which is a bottleneck filtering algorithm. As we know, the processing capacity of a primate's neurons in V1 is finite (Hubel 1995). That is to say, the number of input stimuli that can be processed by a neuron is restricted to a maximal value; here we assume this maximal number is M. So the simple cells responding to a stimulus are first sorted in descending order of their response saliency values, and then the top M responses are chosen as inputs for the complex cell. For generality and simplicity, we transform the control parameter M into a proportion factor P, where P = M / total number of stimuli. We hypothesize that every complex cell has the same proportion, so the top proportion P of the simple cells' responses are processed by a complex cell. We learn the proportion P from training data by controlling the reconstruction error, just as in the threshold learning algorithm. The proportion learning algorithm is described below:

Algorithm 2: Proportion Learning Algorithm
Input: the upper limit of reconstruction error (UE), the basis function set (A), the training data set (I), the sparse coding coefficient set (R), and the response saliency value set (RS).
Output: proportion (P)
Method:
1. Initialize P.
2. Filter the sparse coding coefficients by P: if RSi belongs to the top P × (total number of simple cells), R'i = Ri; else R'i = 0.
3. Compute the reconstruction error for the data set I:

$$\mathrm{Error}(R', A) = \sum_{I_i} \sum_{x,y} \left( I_i(x,y) - \sum_{j} R'_j A_j(x,y) \right)^{2}$$

4. If Error ≥ UE, set P = (1 + η)P, where 0 < η < 1, and go to step 2; otherwise, return P.
To maintain error precision, the proportion P must be initialized to a relatively small value; it is set to 0.1 in our experiments. The learning rate η is set to 0.01.
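The per-stimulus selection step of AGSC-P can be sketched as follows (our illustration; the helper name is an assumption):

```python
import numpy as np

def proportion_select(R, RS, P):
    # Keep only the top fraction P of simple cells, ranked by saliency RS.
    m = len(R)
    keep = np.argsort(RS)[::-1][: int(np.ceil(P * m))]
    Rp = np.zeros_like(R)
    Rp[keep] = R[keep]        # all other responses are omitted (set to 0)
    return Rp

rng = np.random.default_rng(3)
R, RS = rng.normal(size=160), rng.uniform(size=160)
print(np.count_nonzero(proportion_select(R, RS, 0.45)))  # 72 of 160 pass
```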
Experimental Results

In this section, we evaluate the model using patches of natural images as the input data. For simplicity, we consider only static, monochromatic and monocular images. A set of 50,000 image patches of 16×16 pixels was sampled from natural images available on the WWW (http://www.cis.hut.fi/projects/ica/data/images). Because the characteristics of location and orientation are more important than spatial frequency for simple cells, in our experiments we set W1 = 2/5, W2 = 2/5, W3 = 1/5 in Equation 6, and assign λ = 0.2 in Equation 8 in order to make the response value more significant. We use the FastICA algorithm, which was also successfully applied in (Hyvarinen 2001), to implement the sparse coding, and obtain 160 basis functions, as shown in figure 2. There are two selection strategies: threshold selection (the corresponding AGSC model is named AGSC-T) and proportion selection (named AGSC-P), as described in Section 3. The next sub-section demonstrates the performance of AGSC-P and AGSC-T.
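For orientation, a sketch of this setup using scikit-learn's FastICA is given below (our assumption of tooling, not the chapter's code; real natural-image patches should replace the random stand-in, which is Gaussian and thus not actually separable by ICA):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(4)
patches = rng.normal(size=(5000, 256))   # stand-in for 16x16 natural patches

ica = FastICA(n_components=160, whiten="unit-variance",
              max_iter=400, random_state=0)
S = ica.fit_transform(patches)           # sparse responses, one row per patch
A = ica.mixing_                          # (256, 160): columns ~ basis functions
print(A.shape, S.shape)
```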
Sparseness

As Willmore and Tolhurst's results show (Willmore 2001), lifetime sparseness does not imply population sparseness, so the AGSC model is utilized to improve the population sparseness of the sparse coding model (Olshausen 1996). In our sparse coding model, which is implemented with the FastICA algorithm (Bell 1997), there are more than
Figure 3. Response coefficients for an input stimulus. (a) Coefficients produced by the sparse coding model; (b) response saliency values for every simple cell; (c) coefficients produced by the AGSC-P model; (d) coefficients produced by the AGSC-T model
70% of simple cells whose responses are greater than the mean response. Figure 3.a shows the sparse codes for an input stimulus, and figure 3.b shows the response saliency values of these simple cells. Comparing figure 3.b with figure 3.a, we find that the response saliency value is mainly dominated by the response value, though it is a function of both the response value and the discrepancy distance Diff. That is to say, the response saliency value is large in most cases where the response value (here, the absolute value) is large. In figure 3.c, the response coefficients processed by AGSC-P are presented. Here, the proportion factor P learned by Algorithm 2 is 45%, which means that 72 simple cells' responses pass the attention module. Note that the proportion factor P is learned once and then used for the whole data set. Compared with the response coefficients produced by the sparse coding model, in which 122 simple cells' responses are greater than the mean response value, the final coefficients produced by the AGSC-P model are much sparser: only 72 simple cells' responses are greater than 0. Figure 3.d demonstrates the response coefficients produced by the AGSC-T model, in which the threshold T equals 1.372 (greater than the mean value of 0.984). After the processing of AGSC-T, 69 simple cells pass, slightly fewer than for AGSC-P; obviously, this is sparser than the SC and SCI models. Comparing AGSC-T with AGSC-P, there is little difference between them, except at the location labeled by the red arrow in figure 3.c and figure 3.d. In addition, the output of a simple cell follows a 'winner takes all or none' strategy, which is extensively utilized in artificial neural networks.
Reconstruction Error Analysis

The AGSC model indeed greatly improves the sparseness of the codes produced by the sparse coding model, but does it still code the input stimulus effectively and correctly? After all, one foundational requirement for a coding model is to transfer the major information of the stimulus effectively. Encouragingly, the AGSC model shows good performance in this respect. Figure 4 intuitively demonstrates the capability of reconstructing the original stimulus from the code coefficients. The first column is the original image patch (16×16); the second column is the image reconstructed from the SCI coefficients; the third is the reconstruction by AGSC-P, and the last is the reconstruction by AGSC-T. From figure 4, we can easily see that the reconstructed images are visually very similar to the original image. This implies that the AGSC model effectively codes the major information and retains the primary features, even though the attention module omits a large percentage of the coefficients.
Figure 4. Input image patch and reconstructed image patches. The first column is the original image patch; the second column is the reconstruction based on the SCI model; the third is based on the AGSC-P model; the last is based on the AGSC-T model
Having subjectively validated the performance of the AGSC model in reconstructing the original stimulus, we now numerically analyze its reconstruction error. The reconstruction error is defined as the squared sum of the errors between the original image pixels and the reconstructed image pixels (Olshausen 1996). The formula is as follows:

$$\mathrm{Error}(s, A) = \sum_{x,y} \left( I(x,y) - \sum_{i} s_i a_i(x,y) \right)^{2}$$
(10)
Table 1 shows the reconstruction errors of SCI, AGSC-P and AGSC-T. Note that 'original image' here refers to the image after the preprocessing procedure, in order to eliminate the influence of the preprocessing in the FastICA algorithm. The reconstruction errors of AGSC-P and AGSC-T rise slightly compared with SCI, but the increase is small; for example, the relative increase in the mean error is about 10%. The increases in the maximum and minimum errors are also small. In short, the extra reconstruction error is very small compared with the improvement in sparseness (the AGSC model omits at least 55% of the response coefficients). In practice, a little error is acceptable as long as the reconstruction error is limited to a certain small range. After all, the sparse coding procedure of simple cells is not the same as signal compression, and a completely reversible code is not required (Simoncelli 2001).

Having analyzed the performance of the AGSC model, we take a quick look at the relationship between AGSC-P and AGSC-T. The experimental results discussed above show that AGSC-T and AGSC-P both perform well, with little difference between them. Essentially, AGSC-T and AGSC-P are homologous: the threshold T indirectly determines the proportion factor P, while the proportion factor P determines the range of T. The main differences between AGSC-T and AGSC-P are as follows:
Table 1. The reconstruction error comparison between the SCI model and the AGSC models

Model     Mean error    Maximum error    Minimum error
SCI       0.1002        0.3209           0.0032
AGSC-P    0.1105        0.3913           0.0060
AGSC-T    0.1113        0.3929           0.0078
• The sparseness of AGSC-T is adaptive to the input stimulus: the number of simple cells that pass the AGSC-T attention module differs for different stimuli, because the response values and response saliency values differ. For AGSC-P, by contrast, this number is invariant, since it is determined by the proportion factor.
• The Proportion Learning Algorithm converges faster than the Threshold Learning Algorithm.
Parameter Analysis

The threshold T and the proportion P play important roles in the AGSC model. Obviously, different values yield different coding coefficients, and they directly influence the performance of the AGSC model. Because the threshold T can be transformed into the proportion P, and P is more explicit, we discuss the influence of the parameter P in AGSC-P. Intuitively, the smaller P is, the fewer simple cells are activated in the AGSC-P model, i.e., the sparser the AGSC-P codes are; however, more information is then omitted. Figure 5 shows the reconstructed image for different proportion values. From left to right, P is set to 0.1, 0.2, 0.3, 0.45 and 0.80, respectively, and the rightmost image is the original. It is easy to see that the reconstructed image becomes more and more similar to the original from left to right. The leftmost image is distinctly different from the original, but when P is greater than 0.3, the difference between the reconstructed image and the original is very small. This implies that most of the information is coded by a small number of coefficients, or simple cells. We now numerically analyze the relationship between the reconstruction error and the proportion P in the AGSC-P model. Figure 6 plots the mean reconstruction error against P. The reconstruction error descends sharply as P rises while P is less than 0.3, but drops only mildly with further increases of P beyond 0.3. This result is not surprising: some hidden patterns are most important for the input stimulus and constitute its major features; if such patterns are retained, the reconstruction error is certainly low, and otherwise it becomes large. Most of the other patterns are not very important, so omitting their response coefficients does not dramatically increase the reconstruction error.
Figure 5. The reconstruction image in AGSC-P model according to different proportion value P. The proportion P is designated with 0.10, 0.20, 0.30, 0.45, and 0.80, respectively. The last column refers to the original image
Figure 6. The relationship between proportion factor P and reconstruction error in AGSC-P model
Conclusion

In this chapter, we have put forward an attention-guided sparse coding model, which includes a non-uniform sampling module and a saliency-based data-driven module, in the framework of the efficient coding hypothesis. Our experiments demonstrate that it not only prominently reduces the number of activated coefficients for an input stimulus but also retains the essential visual information while omitting more than 50% of the coding coefficients. This model designs and implements an active and efficient mechanism to adapt to the limited computational capacity of the neural system, and it improves the efficiency of the traditional sparse coding model. There are two kinds of attention mechanisms: the bottom-up or data-driven method and the top-down or prompt-driven method. AGSC provides an effective data-driven attention mechanism and improves the coding efficiency. Nevertheless, top-down attention is another important aspect of visual attention, and we will carry out more research on the top-down attention mechanism in visual perception.
Acknowledgment

Supported by National Natural Science Foundation of China No. 60435010, National Basic Research Priorities Programme No. 2003CB317004, and Beijing Jiaotong University Foundation No. 2006RC020. For comments and suggestions, we are grateful to Adetunmbi Adebayo.O., Jun Shi and Sulan Zhang.
References

Barlow, H.B. (1961). Possible principles underlying the transformation of sensory messages. In W.A. Rosenblith (Ed.), Sensory communication (pp. 217-234). Cambridge, MA: The MIT Press.

Bell, A.J., & Sejnowski, T.J. (1997). The 'independent components' of natural scenes are edge filters. Vision Research, 37(23), 3327-3338.

Boothe, R.G. (2002). Perception of the visual environment. New York: Springer-Verlag.

Britten, K.H. (1996). Attention is everywhere. Nature, 382, 497-498.
Burt, P.J. (1985). Smart sensing within a pyramid vision machine. Proceedings of the IEEE, 76(8), 1006-1015.

Field, D.J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A, 4(12), 2379-2394.

Field, D.J. (1994). What is the goal of sensory coding? Neural Computation, 6, 559-601.

Hubel, D.H. (1995). Eye, brain and vision (Reprint edition). W.H. Freeman & Company.

Hyvarinen, A., & Hoyer, P.O. (2001). A two-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images. Vision Research, 41(18), 2413-2423.

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on PAMI, 20(11), 1254-1259.

Itti, L. (2001). Visual attention and target detection in cluttered natural scenes. Optical Engineering, 40(9), 1784-1793.

Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice Hall, Inc.

Kronaver, R.E., & Yehoshua, Y.Z. (1985). Reorganization and diversification of signals in vision. IEEE Transactions on Systems, Man, and Cybernetics, 15(1), 91-101.

Olshausen, B.A., & Field, D.J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607-609.

Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1), 62-66.

Rybak, I.A., Gusakova, V.I., Golovan, A.V., Podladchikova, L.N., & Shevtsova, N.A. (1998). A model of attention-guided visual perception and recognition. Vision Research, 38, 2387-2400.

Simon, H. (1998). Neural networks: A comprehensive foundation. Upper Saddle River, NJ: Prentice Hall PTR.

Simoncelli, E.P. (2003). Vision and the statistics of the visual environment. Current Opinion in Neurobiology, 13, 144-149.

Simoncelli, E.P., & Olshausen, B.A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193-1216.

Treisman, A.M. (1964). Verbal cues, language, and meaning in selective attention. American Journal of Psychology, 77, 206-219.

Vinje, W.E., & Gallant, J.L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273-1276.

Wang, Y., & Kinsner, W. (2006, March). Recent advances in cognitive informatics. IEEE Transactions on Systems, Man, and Cybernetics, 36(2), 121-123.

Wang, Y. (2002, August). On cognitive informatics. Keynote speech, Proceedings of the 1st IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 34-42). Calgary, Canada: IEEE CS Press.

Wang, Y. (2007, January). The theoretical framework of cognitive informatics. The International Journal of Cognitive Informatics and Natural Intelligence (IJCiNi), 1(1), 1-27. Hershey, PA: IGP.

Willmore, B., & Tolhurst, D.J. (2001). Characterizing the sparseness of neural codes. Network: Computation in Neural Systems, 12(3), 255-270.
Section II
Natural Intelligence
Chapter VI
The Cognitive Processes of Formal Inferences Yingxu Wang University of Calgary, Canada
Abstract

Theoretical research is predominantly an inductive process, while applied research is mainly a deductive process. Both inference processes are based on the cognitive process and means of abstraction. This chapter describes the cognitive processes of formal inferences such as deduction, induction, abduction, and analogy. Conventional propositional arguments adopt static causal inference. This chapter introduces more rigorous and dynamic inference methodologies, which are modeled and described as a set of cognitive processes encompassing a series of basic inference steps. A set of mathematical models of formal inference methodologies is developed. Formal descriptions of the four forms of cognitive processes of inferences are presented using Real-Time Process Algebra (RTPA). The cognitive processes and mental mechanisms of inferences are systematically explored and rigorously modeled. Applications of abstraction and formal inferences in both the revelation of the fundamental mechanisms of the brain and the investigation of next-generation cognitive computers are explored.
INTRODUCTION

Inference is a formalized cognitive process that reasons a possible causal conclusion from given premises based on known causal relations between a pair of cause and effect proven true by empirical observations, theoretical inferences, and/or statistical regularities (Bender, 1996; Wilson and Keil, 2001; Wang, 2007a). Formal logical inferences may be classified as causal argument, deductive inference, inductive inference, abductive inference, and analogical inference (Schoning, 1989; Sperschneider and Antoniou, 1991; Hurley, 1997; Tomassi, 1999; Smith, 2001; Wilson and Keil, 2001; Wang et al., 2006). Theoretical research is predominantly an inductive process, while applied research is mainly a deductive process.

Abstraction is a powerful means of philosophy and mathematics. It is also a preeminent trait of the human brain identified in cognitive informatics studies (Wang, 2005, 2007c; Wang et al., 2006). All formal logical inferences and reasoning can only be carried out on the basis of abstract properties shared by a given set of objects under study.
Definition 1. Abstraction is a process to elicit a subset of objects that shares a common property from a given set of objects, and to use the property to identify and distinguish the subset from the whole in order to facilitate reasoning.

Abstraction is a gifted capability of human beings. It is a basic cognitive process of the brain at the meta-cognitive layer according to the Layered Reference Model of the Brain (LRMB) (Wang, 2003a, 2007c; Wang et al., 2003, 2006). Only by abstraction can important theorems and laws about the objects under study be elicited and discovered from a great variety of phenomena and empirical observations in an area of inquiry.

Definition 2. Inferences are a formal cognitive process that reasons a possible causality from given premises based on known causal relations between a pair of cause and effect proven true by empirical arguments, theoretical inferences, or statistical regularities.

Mathematical logic, such as propositional and predicate logic, provides a powerful means for logical reasoning and inference on truth and falsity (Schoning, 1989; Sperschneider and Antoniou, 1991; Hurley, 1997; van Heijenoort, 1997).

Definition 3. An argument, A, is an assertion that yields (⊢) a proposition Q called the conclusion from a given finite set of propositions known as the premises P1, P2, …, Pn, i.e.:

A BL = (P1 BL ∧ P2 BL ∧ … ∧ Pn BL ⊢ Q BL) BL
(1)
where the argument and all propositions are of type Boolean (BL). Hence, A BL = T is called a valid argument; otherwise it is a fallacy, i.e., A BL = F. Equation 1 can also be denoted in the following inference structure:
$$A\,BL = \frac{\mathrm{Premises}\,BL}{\mathrm{Conclusion}\,BL} = \frac{P_1\,BL \wedge P_2\,BL \wedge \cdots \wedge P_n\,BL}{Q\,BL}$$
(2)
Example 1. The following expressions are concrete arguments:

(a) A concrete deductive argument:

A1 BL = (Information processing is an intelligent behavior (P1) ∧ Computer is able to process information (P2)) ⊢ Computer is an intelligent machine (Q)
(3)
(b) A concrete inductive argument:

A2 BL = (Human is able to process information (P1) ∧ Computer is able to process information (P2)) ⊢ Information processing is a common property of intelligence (Q)
(4)
Example 2. The following expressions are abstract arguments:
(a) Abstract deductive arguments:

A3 BL = (∀x ∈ S, P(x)) ∧ (a ∈ S) ⊢ (∃x = a, P(a))
(5)
"n Î N, n < n + 1 $n = 1 Î N, 1 < 2
(6)
where N represents the type of natural numbers.

(b) Abstract inductive arguments:

A5 BL = (∃x = a ∈ S, P(a)) ∧ (∃x = b ∈ S, P(b)) ∧ (∃x = c ∈ S, P(c)) ⊢ (∀x ∈ S, P(x))
(7)
$$A_6\,BL = \left(\exists n = 1 \in N \Rightarrow \sum_{i=1}^{1} i = \frac{1\,(1+1)}{2}\right) \wedge \left(\exists n = 14 \in N \Rightarrow \sum_{i=1}^{14} i = \frac{14\,(14+1)}{2}\right) \wedge \left(\exists n = 15 \in N \Rightarrow \sum_{i=1}^{15} i = \frac{15\,(15+1)}{2}\right) \vdash \left(\forall n \in N \Rightarrow \sum_{i=1}^{n} i = \frac{n(n+1)}{2}\right)$$
(8)
where N represents the type of natural numbers. In the above examples, the premise propositions should be arranged in a list such that the most general ones are put in front; this condition preserves the deductive chain in reasoning. It is noteworthy that propositional arguments can be classified as a kind of causal and static inference. More rigorous and dynamic inferences may be modeled and described as a set of mental processes encompassing a series of basic inference steps (Wang, 2007a; Wang and Wang, 2006). Further, the cognitive processes and mental mechanisms of inferences need to be systematically explored and rigorously modeled (Wang, 2002b, 2007b; Wang and Kinsner, 2006; Wang et al., 2006). This chapter describes the cognitive processes of formal inferences, which are the foundations of human reasoning, thinking, learning, and problem solving. A formal treatment of the mechanisms of inferences is presented. Mathematical models of four forms of inferences, known as deduction, induction, abduction, and analogy, are rigorously developed. Formal descriptions of all inference processes are developed using Real-Time Process Algebra (RTPA) (Wang, 2002a, 2003b, 2006a, 2007a). Applications of the formal inference methodologies and processes in dealing with complicated problems, in both the revelation of the fundamental mechanisms of the brain and the investigation of next-generation cognitive computers, are explored.
MATHEMATICAL MODELS OF FORMAL INFERENCES

Inferences are a formal cognitive process that reasons a possible causal conclusion from given premises based on known causal relations between a pair of cause and effect proven true by empirical arguments, theoretical inferences, or statistical regularities. Inferences may be classified into the deductive, inductive, abductive, and
analogical categories (Hurley, 1997; Tomassi, 1999; Wilson and Keil, 2001; Wang et al., 2006). For seeking generality and universal truth, either the objects or the relations can only be abstractly described and rigorously inferred by abstract models rather than real-world details.
Deduction

Definition 4. Deduction is a cognitive process by which a specific conclusion necessarily follows from a set of general premises.

Deduction is a reasoning process that discovers or generates new knowledge based on generic beliefs one already holds, such as abstract rules or principles. The validity of a deductive inference depends on its conformity to the validity of the generic principle; at the same time, the generic principle that the deduction is based on is evaluated during deductive practice.

Theorem 1. A generic inference formula of logical deduction states that, given an arbitrary nonempty finite set X, let p(x) be a proposition for ∀x ∈ X; then a specific conclusion on ∃a ∈ X, p(a) can be drawn as follows:

(∀x ∈ X, p(x)) ⊢ (∃a ∈ X, p(a))
(9)
where ⊢ denotes yield or a causal relation. A composite form of the propositions of Equation 9 can be given below:

(∀x ∈ X, p(x) ⇒ q(x)) ⊢ (∃a ∈ X, p(a) ⇒ q(a))
(10)
Any valid logical statement, established mathematical formula, or proven theorem can be used as the generic premise for facilitating the above deductive inference process.

Corollary 1. A sound deductive inference is yielded iff all premises are true and the argument is valid.

Corollary 1 may be used to avoid deductive dilemmas and falsity in logical reasoning.
Induction

Definition 5. Induction is a cognitive process by which a general conclusion is drawn from a set of specific premises, based mainly on experience or experimental evidence.

Induction is a reasoning process that derives a general rule, pattern, or theory from summarizing a series of stimuli or events. In contrast to the deductive inference approach, induction may introduce uncertainty during the extension of limited observations into general rules. Inductive inferences encompass rule learning, category formation, generalization, and analogy.

Theorem 2. A generic inference formula of logical induction states that, if ∃a, k, succ(k) ∈ X, p(a) and p(k) ⇒ p(succ(k)) are three valid predicates, then a generic conclusion on ∀x ∈ X, p(x) can be drawn as follows:

((∃a ∈ X, p(a)) ∧ (∃k, succ(k) ∈ X, (p(k) ⇒ p(succ(k))))) ⊢ (∀x ∈ X, p(x))
(11)
where succ(k) denotes the next element of k in X. A composite form of equation 11 can be given below:
((∃a ∈ X, p(a) ⇒ q(a)) ∧ (∃k, succ(k) ∈ X, ((p(k) ⇒ q(k)) ⇒ (p(succ(k)) ⇒ q(succ(k)))))) ⊢ (∀x ∈ X, p(x) ⇒ q(x)) (12)

Theorem 2 indicates that for a finite list or an infinite sequence of recurring patterns, three samplings (two determinate and one random) are sufficient to determine the behavior of the given list or sequence of patterns. Therefore, logical induction is a tremendously powerful and efficient cognitive and inference tool in science and engineering, as well as in everyday life. It is noteworthy that, because of the limitation of samples, logical induction may result in faulty proofs or conclusions. Therefore, as a rule of thumb, the results of logical induction need to be evaluated or validated by more random samples.

Corollary 2. A cogent inductive inference is yielded iff all premises are true and the argument is valid.

Corollary 2 may be used to avoid inductive dilemmas in logical reasoning.
Abduction

Definition 6. Abduction is a cognitive process by which an inference to the best explanation or most likely reason of an observation or event is produced.

Abduction is widely used in causal reasoning, particularly when a chain of events needs to be traced back but not all of the events have been observed.

Theorem 3. A generic inference formula of logical abduction states that, based on a general implication ∀x ∈ X, p(x) ⇒ q(x), a specific conclusion on ∃a ∈ X, p(a) can be drawn as follows:

(∀x ∈ X, p(x) ⇒ q(x)) ⊢ (∃a ∈ X, q(a) ⇒ p(a))
(13)
A composite form of Equation 13 can be given below:

(∀x ∈ X, p(x) ⇒ q(x) ∧ r(x) ⇒ q(x)) ⊢ (∃a ∈ X, q(a) ⇒ (p(a) ∨ r(a)))
(14)
Abduction is a powerful inference technique for seeking the most likely cause(s) and reason(s) of an observed phenomenon in causal analyses.
Analogy

Definition 7. Analogy is a cognitive process by which an inference is drawn that the same relations hold between different domains or systems, and/or that if two things agree in certain respects then they probably agree in others.

Analogy is a mapping process that identifies relations in order to understand one situation in terms of another. Analogy can be used as a mental model for understanding new domains, explaining new phenomena, capturing significant parallels across different situations, describing new concepts, and discovering new relations.

Theorem 4. A generic inference formula of logical analogy states that, based on a specific predicate ∃a ∈ X, p(a), a similar specific conclusion can be drawn iff ∃x ∈ X, p(x), as follows:

(∃x ∈ X, p(x)) ∧ (∃a ∈ X, p(a)) ⊢ (∃b ∈ X ∧ b ≠ a, p(b))
(15)
Table 1. Summary of the mathematical models of formal inferences

1. Deduction (Equations 9/10)
   Primitive form: (∀x ∈ X, p(x)) ⊢ (∃a ∈ X, p(a))
   Composite form: (∀x ∈ X, p(x) ⇒ q(x)) ⊢ (∃a ∈ X, p(a) ⇒ q(a))
   Usage: To derive a conclusion based on a known and generic premise.

2. Induction (Equations 11/12)
   Primitive form: ((∃a ∈ X, P(a)) ∧ (∃k, k+1 ∈ X, (P(k) ⇒ P(k+1)))) ⊢ (∀x ∈ X, P(x))
   Composite form: ((∃a ∈ X, p(a) ⇒ q(a)) ∧ (∃k, k+1 ∈ X, ((p(k) ⇒ q(k)) ⇒ (p(k+1) ⇒ q(k+1))))) ⊢ (∀x ∈ X, p(x) ⇒ q(x))
   Usage: To determine the generic behavior of a given list or sequence of recurring patterns by three samples.

3. Abduction (Equations 13/14)
   Primitive form: (∀x ∈ X, p(x) ⇒ q(x)) ⊢ (∃a ∈ X, q(a) ⇒ p(a))
   Composite form: (∀x ∈ X, p(x) ⇒ q(x) ∧ r(x) ⇒ q(x)) ⊢ (∃a ∈ X, q(a) ⇒ (p(a) ∨ r(a)))
   Usage: To seek the most likely cause(s) and reason(s) of an observed phenomenon.

4. Analogy (Equations 15/16)
   Primitive form: (∃a ∈ X, p(a)) ⊢ (∃b ∈ X, p(b))
   Composite form: (∃a ∈ X, p(a) ⇒ q(a)) ⊢ (∃b ∈ X, p(b) ⇒ q(b))
   Usage: To predict a similar phenomenon or consequence based on a known observation.
A composite form of Equation 15 can be given below:

(∃x ∈ X, p(x)) ∧ (∃a ∈ X, p(a) ⇒ q(a)) ⊢ (∃b ∈ X ∧ b ≠ a, p(b) ⇒ q(b))
(16)
Analogy is widely used to predict a similar phenomenon or consequence based on a known observation. The four inference methodologies, deduction, induction, abduction, and analogy, form a set of fundamental cognitive processes of the natural intelligence, which are modeled in LRMB (Wang et al., 2006). A summary of the formal definitions and mathematical models of the four forms of inference techniques is provided in Table 1.
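As a concrete illustration of Table 1 (our own toy sketch, not part of the chapter's formal treatment), the four inference formulas can be exercised over an explicit finite domain:

```python
# A toy finite-domain sketch of the four inference formulas in Table 1.
# It checks premises over an explicit set X instead of proving them
# symbolically; all names and predicates are illustrative only.
X = list(range(1, 16))
p = lambda n: n > 0            # a property that happens to hold on all of X
q = lambda n: n >= 1

# Deduction (Eq. 9): from (forall x in X, p(x)), conclude p(a) for a given a.
if all(p(x) for x in X):
    print("deduction:", p(3))                      # True

# Induction (Eq. 11): p(a) and p(k) => p(succ(k)) for sampled a, k
# support the generalization (forall x, p(x)), pending validation.
a, k = X[0], 7
supported = p(a) and ((not p(k)) or p(k + 1))
print("induction supported:", supported)

# Abduction (Eq. 13): from (forall x, p(x) => q(x)) and an observed q(a),
# p(a) is proposed as the most likely explanation (not a certainty).
if all((not p(x)) or q(x) for x in X) and q(5):
    print("abduction: p(5) proposed as explanation:", p(5))

# Analogy (Eq. 15): from p(a) for one element, predict p(b) for another.
if p(2):
    b = next(x for x in X if x != 2)
    print("analogy predicts p(%d) =" % b, p(b))
```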
THE COGNITIVE PROCESSES OF FORMAL INFERENCES

This section formally describes the four cognitive processes of deduction, induction, abduction, and analogy using RTPA (Wang, 2002a, 2003b, 2006a, 2007a). RTPA is designed for describing the architectures and the static and dynamic behaviors of software systems, as well as human cognitive behaviors and sequences of actions. In the discussion, a generic model, the Object-Attribute-Relation (OAR) model (Wang et al., 2003; Wang, 2007c), is adopted to describe internal knowledge representation. The RTPA descriptions of the cognitive processes of inferences provide rigorous models of the mental processes, which enable accurate and precise reasoning about natural intelligence. The formal models also enable computer simulations of human thinking mechanisms as a set of cognitive processes (Wang, 2006b, 2007b, 2007d).
The Cognitive Process of Deduction

Based on the mathematical model of deduction as described in Equations 9 and 10, the cognitive process of deduction is presented in Figure 1. The deduction process is divided into three sub-processes: (i) to form the deductive goal; (ii.a) to search for and validate the primitive predicate in memory; (ii.b) to search for and validate the composite predicate in memory; and (iii) to represent and memorize the deduction result. The input of the deduction process is the deductive goal aS and the abstract properties p(x)BL and q(x)BL. The output of the deduction process is the validation of the deduction result p(a | ∃a ∈ X)BL and the memorization of the updated OAR'ST in memory. In the deduction process, Step (i) forms one or multiple deductive goal(s) oS by identifying the object and abstracting its property or category; Steps (ii.a) and (ii.b) search for and validate primitive and/or composite deductive predicates in memory in parallel. The former repetitively searches all related objects
Figure 1. The cognitive process of deduction in RTPA
x that is equivalent to b until the search is successful or given up. If the premise is true based on the search, the conclusion p(o b)BL is validated. The latter does the same for the composite deduction. Finally, Step (iii) represents the deduction result by a sub-OAR model (o, A, R')ST, and memorizes it by a composition operation between the newly established sOARST and the entire OARST (Wang, 2006c, 2007e).
The Cognitive Process of Induction

Based on the mathematical model of induction as described in Equations 11 and 12, the cognitive process of induction is presented in Figure 2. The induction process is divided into two sub-processes: (i.a) to check the primitive predicate; (i.b) to check the composite predicate; and (ii) to represent and memorize the induction result.

Figure 2. The cognitive process of induction in RTPA
The input of the induction process is a set of inductive samples XSET and the abstract properties p(a)BL and q(a)BL. The output of the induction process is the validation of the induction result p(x | ∀x ∈ X)BL and the memorization of the updated OAR’ST in memory. In the induction process, Step (i.a) checks the primitive induction by three samples in X, i.e. xS = aS (a specific, usually the first, element), xS = kS (a random element), and xS = succ(k)S (the next element following k). If all three samples confirm p(x | x ∈ X)BL is true, an induction result p(x | ∀x ∈ X)BL = T is achieved. Step (i.b) does the same reasoning with the composite induction in parallel. Finally, Step (ii) represents the induction result by a sub-OAR model (o, A, R’)ST, and memorizes it by a composition operation OARST sOARST between the newly established sOARST and the entire OARST (Wang, 2006c, 2007e).
The Cognitive Process of Abduction

Based on the mathematical model of abduction as described in Equations 13 and 14, the cognitive process of abduction is presented in Figure 3. The abduction process is divided into four sub-processes: (i) to form the abductive goal; (ii.a) to search for the primitive abductive predicate; (ii.b) to search for the composite abductive predicate; (iii) to validate the abductive predicate; and (iv) to represent and memorize the abduction result. The input of the abduction process is the abductive goal aS and the abstract properties p(x)BL, q(x)BL, and r(x)BL. The output of the abduction process is the validation of the abduction result p(q(a) ⇒ p(a) | ∃a ∈ X)BL and the memorization of the updated OAR'ST in memory. In the abduction process, Step (i) forms one or multiple abductive goal(s) oS by identifying the object and abstracting its property or category; Steps (ii.a) and (ii.b) search for primitive and/or composite abductive propositions in memory in parallel. The former repetitively searches all related objects x that validate the proposition p(x b)BL = T ⇒ q(x c)BL = T, until the search is successful or given up. The latter does the same for the composite abduction. Step (iii) validates the abduction result: if the premise is true based on the search in Steps (ii.a) and/or (ii.b), the conclusion p(q(a) ⇒ p(a) | ∃a ∈ X)BL is validated. Finally, Step (iv) represents the abduction result by a sub-OAR model (o, A, R')ST, and memorizes it by a composition operation between the newly established sOARST and the entire OARST (Wang, 2006c, 2007e).
The Cognitive Process of Analogy

Based on the mathematical model of analogy as described in Equations 15 and 16, the cognitive process of analogy is presented in Figure 4. The analogy process is divided into three sub-processes: (i) to form the analogical goal; (ii.a) to search for the primitive analogy predicate; (ii.b) to search for the composite analogy predicate; and (iii) to represent and memorize the analogy result. The input of the analogy process is the analogical goal aS and the abstract properties p(x)BL and q(x)BL. The output of the analogy process is the validation of the analogy result p(b | ∃b ∈ X)BL and the memorization of the updated OAR'ST in memory. In the analogy process, Step (i) forms the analogical goal aS and abstracts the properties p(x)BL and q(x)BL. Step (ii.a) searches the primitive analogical propositions in memory for p(a k)BL = T ⇒ p(b k)BL = T, until the search is successful or given up; the existence of p(b k)BL = T validates the analogy based on p(a k)BL = T. Step (ii.b) does the same analogical reasoning for the composite analogy, in parallel with the primitive analogy. Finally, Step (iii) represents the analogy result by a sub-OAR model (o, A, R')ST, and memorizes it by a composition operation between the newly established sOARST and the entire OARST (Wang, 2006c, 2007e).
APPLICATIONS OF THE FORMAL INFERENCE PROCESSES

The formal modeling of the cognitive processes of formal inference is not only important for revealing the fundamental mechanisms of the brain, but also inspiring for the investigation of the next generation of intelligent computers, known as cognitive computers. This section describes the applications of the formal inference processes in the design of cognitive computers (Wang, 2006b).
Figure 3. The cognitive process of abduction in RTPA
Figure 4. The cognitive process of analogy in RTPA
Definition 8. The architecture of a cognitive computer (CC) is a parallel structure of an Inference Engine (IE) and a Perception Engine (PE), i.e.:

CC ≜ (IE || PE)
   = ( KPU    // The Knowledge Processing Unit
       || BPU // The Behavior Processing Unit
     )
   || ( BPU   // The Behavior Perception Unit
       || EPU // The Event Perception Unit
     )
(17)
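The parallel composition (||) of Equation 17 can be pictured with a small, purely illustrative sketch (the unit names come from the definition; the threading scaffold is our assumption, not the CC design):

```python
import threading

def unit(name):
    # Each processing/perception unit is modeled as an independent thread.
    return threading.Thread(target=lambda: print(name, "running"), name=name)

inference_engine = [unit("KPU"), unit("BPU (processing)")]
perception_engine = [unit("BPU (perception)"), unit("EPU")]

for t in inference_engine + perception_engine:
    t.start()   # the four units run concurrently, mirroring the || operator
for t in inference_engine + perception_engine:
    t.join()
```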
As shown in Definition 8, a CC is not centered on a CPU for data manipulation, as conventional computers with the von Neumann architecture are. Instead, a CC is centered on the concurrent IE and PE for cognitive knowledge processing and autonomic perception, based on abstract concept inferences and empirical stimulus perception. In the architecture of CCs, the IE is designed for formal inference and thinking based on the four cognitive inference processes, and for concept/knowledge manipulation based on concept algebra (Wang, 2006c, 2006d), particularly the nine concept operations for knowledge acquisition, creation, and manipulation. The PE is designed for feeling and perception processing based on RTPA (Wang, 2002a, 2003b, 2006a, 2007a) and the formally described cognitive process models of the perception layers as defined in the LRMB model (Wang et al., 2006).

Definition 9. Concept algebra is an abstract mathematical structure for the formal treatment of concepts and their algebraic relations, operations, and associative rules for composing complex concepts.

Associations of concepts, ℜ, defined in concept algebra form a foundation to denote complicated relations between concepts in knowledge representation. The associations between concepts can be classified into nine categories: inheritance, extension, tailoring, substitute, composition, decomposition, aggregation, specification, and instantiation, i.e.:
ℜ = {⇒, ⇒⁺, ⇒⁻, ⇒̃, …}    (18)
According to concept algebra, a concept is the basic unit of thinking and formal inference (Wang, 2006c, 2006d). Human knowledge can be formally represented by concept networks in the form of the OAR model (Wang, 2007c). The formal modeling of concepts and their manipulation in formal inference form a foundation for machine intelligence beyond conventional data processing. The formal inference processes are frequently applied by other higher-layer cognitive processes such as those of knowledge presentation, comprehension, learning, decision making, and problem solving (Wang et al., 2006; Wang, 2007a).
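As an informal illustration (the Python classes and names are our own assumptions, not Wang's formal notation), a concept network in the OAR style can be sketched as objects carrying attributes and typed relations drawn from the nine association categories:

```python
from dataclasses import dataclass, field

# The nine association categories of Eq. (18), named as in the text.
ASSOCIATIONS = {
    "inheritance", "extension", "tailoring", "substitute", "composition",
    "decomposition", "aggregation", "specification", "instantiation",
}

@dataclass
class Concept:
    """A node in an OAR-style network: an object with attributes and relations."""
    obj: str
    attributes: set[str] = field(default_factory=set)
    relations: list[tuple[str, "Concept"]] = field(default_factory=list)

    def associate(self, kind: str, other: "Concept") -> None:
        # Only the nine categories of concept algebra are admitted.
        if kind not in ASSOCIATIONS:
            raise ValueError(f"unknown association: {kind}")
        self.relations.append((kind, other))

# A new sub-OAR fragment, analogous to the (o, A, R') triples formed in the
# memorization steps of the inference processes above.
animal = Concept("animal", {"alive", "mobile"})
bird = Concept("bird", {"alive", "mobile", "winged"})
bird.associate("inheritance", animal)      # bird inherits from animal
tweety = Concept("tweety", {"alive", "mobile", "winged"})
tweety.associate("instantiation", bird)    # tweety instantiates bird
```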
CONCLUSION

This chapter has explained how rigorous thinking may be carried out by formal inferences. It has demonstrated that formal inferences in the brain may be embodied by the cognitive processes of deduction, induction, abduction, and analogy. Formal descriptions of the four forms of cognitive inference processes have been presented using Real-Time Process Algebra (RTPA). A set of rigorous and dynamic inference methodologies has been introduced, modeled and described as a set of cognitive processes encompassing a series of basic inference steps. It has been recognized that theoretical research is predominantly an inductive process, while applied research is mainly a deductive process. All forms of formal inference are based on the cognitive process and means of abstraction and symbolic representation, because the basic unit of human language is the abstract concept. In order to seek generality and universal truth, either the objects or the relations can only be abstractly described
and rigorously inferred by abstract models rather than by real-world details. Applications of abstraction and formal inferences in revealing the fundamental mechanisms of the brain and in investigating the next generation of cognitive computers have been explored.
ACKNOWLEDGMENT

The author would like to acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) for its support of this work. We would like to thank the anonymous reviewers for their valuable comments and suggestions.
REFERENCES

Bender, E.A. (1996). Mathematical methods in artificial intelligence. Los Alamitos, CA: IEEE CS Press.

Hurley, P.J. (1997). A concise introduction to logic (6th ed.). London: Wadsworth Publishing Co., ITP.

Lipschutz, S. (1964). Schaum's outline of theories and problems of set theory and related topics. New York, NY: McGraw-Hill Inc.

Schöning, U. (1989). Logic for computer scientists. Boston: Birkhäuser.

Smith, K.J. (2001). The nature of mathematics (9th ed.). CA: Brooks/Cole, Thomson Learning Inc.

Sperschneider, V., & Antoniou, G. (1991). Logic: A foundation for computer science. Reading, MA: Addison-Wesley.

Tomassi, P. (1999). Logic. London and New York: Routledge.

van Heijenoort, J. (1997). From Frege to Gödel: A source book in mathematical logic, 1879-1931. Cambridge, MA: Harvard University Press.

Wang, Y. (2002a). The real-time process algebra (RTPA). Annals of Software Engineering: An International Journal, 14, 235-274. Oxford: Baltzer Science Publishers.

Wang, Y. (2002b). On cognitive informatics. Keynote speech at the Proceedings of the 1st IEEE International Conference on Cognitive Informatics (ICCI'02). Calgary, Canada: IEEE CS Press.

Wang, Y. (2003a). On cognitive informatics. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 151-167.

Wang, Y. (2003b). Using process algebra to describe human and software behaviors. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 199-213.

Wang, Y. (2005, August). The cognitive processes of abstraction and formal inferences. Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 18-26). Irvine, California: IEEE CS Press.

Wang, Y. (2006a, March). On the informatics laws and deductive semantics of software. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 161-171.

Wang, Y. (2006b, July). Cognitive informatics: Towards the future generation computers that think and feel. Keynote speech at the Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 3-7). Beijing, China: IEEE CS Press.

Wang, Y. (2006c, July). On concept algebra and knowledge representation. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 320-331). Beijing, China: IEEE CS Press.
Wang, Y. (2006d, July). Cognitive informatics and contemporary mathematics for knowledge representation and manipulation. Proceedings of the 1st International Conference on Rough Set and Knowledge Technology (RSKT'06) (pp. 69-78). Lecture Notes in Artificial Intelligence, LNAI 4062. Chongqing, China: Springer.

Wang, Y. (2007a, June). Software engineering foundations: A software science perspective. CRC Series in Software Engineering, Vol. 2. New York: CRC Press.

Wang, Y. (2007b, January). The theoretical framework of cognitive informatics. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(1), 1-27. Hershey, PA: IGI Global.

Wang, Y. (2007c, July). The OAR model of neural informatics for internal knowledge representation in the brain. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(3), 64-75. USA: IPI Publishing.

Wang, Y. (2007d, July). Toward theoretical foundations of autonomic computing. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(3), 1-16. USA: IPI Publishing.

Wang, Y. (2007e, August). Formal description of the cognitive process of memorization. Proceedings of the 6th IEEE International Conference on Cognitive Informatics (ICCI'07). California, USA: IEEE CS Press.

Wang, Y., Liu, D., & Wang, Y. (2003). Discovering the capacity of human memory. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 189-198.

Wang, Y., & Kinsner, W. (2006, March). Recent advances in cognitive informatics. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 121-123.

Wang, Y., & Wang, Y. (2006, March). Cognitive informatics models of the brain. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 203-207.

Wang, Y., Wang, Y., Patel, S., & Patel, D. (2006, March). A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 124-133.

Wilson, R.A., & Keil, F.C. (2001). The MIT encyclopedia of the cognitive sciences. Cambridge, MA: MIT Press.
Chapter VII
Neo-Symbiosis:
The Next Stage in the Evolution of Human Information Interaction

Douglas Griffith, General Dynamics Advanced Information Systems, USA
Frank L. Greitzer, Pacific Northwest National Laboratory, USA
ABSTRACT

The purpose of this article is to re-address the vision of human-computer symbiosis as originally expressed by J.C.R. Licklider nearly a half-century ago and to argue for the relevance of this vision to the field of cognitive informatics. We describe this vision, place it in some historical context relating to the evolution of human factors research, and observe that the field is now in the process of re-invigorating Licklider's vision. A central concept of this vision is that humans need to be incorporated into computer architectures. We briefly assess the state of the technology within the context of contemporary theory and practice, and we describe what we regard as this emerging field of neo-symbiosis. Examples of neo-symbiosis are provided, but these are nascent examples and the potential of neo-symbiosis is yet to be realized. We offer some initial thoughts on requirements to define functionality of neo-symbiotic systems and discuss research challenges associated with their development and evaluation. Methodologies and metrics for assessing neo-symbiosis are discussed.
Background

In 1960, J.C.R. Licklider wrote in his paper "Man-Computer Symbiosis":

The hope is that in not too many years, human brains and computing machines will be coupled together very tightly, and that the resulting partnership will think as no human brain has ever thought and process data in a way not approached by the information-handling machines we know today (p. 5).

This statement is breathtaking for its vision — especially considering the state of computer technology at that time, that is, large mainframes, punch cards, and batch processing. The purpose of this article is to re-address
Licklider’s vision and build upon his ideas to inform contemporary theory and practice within the broader field of human factors as well as to offer a historical perspective for the emerging field of cognitive informatics. It is curious to note that Licklider did not use the term symbiosis again, but he did introduce more visionary ideas in a symbiotic vein. A paper he co-authored with Robert Taylor, titled “The Computer As a Communication Device,” made the bold assertion, “In a few years, men will be able to communicate more effectively through a machine than face to face” (p. 21). Clearly the time estimate was optimistic, but the vision was noteworthy. Licklider and Taylor described the role of the computer in effective communication by introducing the concept of “On-Line Interactive Vicarious Expediter and Responder” (OLIVER), an acronym that by no coincidence was chosen to honor artificial intelligence researcher and the father of machine perception, Oliver Selfridge. OLIVER would be able to take notes when so directed, and would know what you do, what you read, what you buy and where you buy it. It would know your friends and acquaintances and would know who and what is important to you. This paper made heavy use of the concept of “mental models,” relatively new to the psychology of that day. The computer was conceived of as an active participant rather than as a passive communication device. Remember that when this paper was written, computers were large devices used by specialists. The age of personal computing was off in the future. Born during World War II, the field of human factors engineering (HFE) gained prominence for its research on the placement of controls — commonly referred to as knobology within the field of HFE, which was an unjust characterization. Many important contributions were made to the design of aircraft, including controls and displays. With strong roots in research on human performance and human errors, the field gained prominence through the work of many leaders in the field who came out of the military: Alphonse Chapanis, a psychologist and a Lieutenant in the U.S. Air Force; Alexander Williams, a psychologist and naval aviator; Air Force Colonel Paul Fitts; and J.C.R. Licklider. Beginning with Chapanis, who realized that “pilot errors” were most often cockpit design errors that could be corrected by the application of human factors to display and controls, these early educators were instrumental in launching the discipline of aviation psychology and HFE that led to worldwide standards in the aviation industry. These men were influential in demonstrating that the military and aviation industry could benefit from research and expertise of the human factors academic community; their works (Fitts, 1951a) were inspirational in guiding research and design in engineering psychology for decades. Among the most influential early articles in the field that came out of this academic discipline was George Miller’s (1956) “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity to Process Information,” which heralded the field of cognitive science and application of quantitative approaches to the study of cognitive activity and performance. 
An early focus of HFE was to design systems informed by known human information processing limitations and capabilities — systems that exploit our cognitive strengths and accommodate our weaknesses (inspired by the early ideas represented in the Fitts' List that compared human and machine capabilities; Fitts, 1951b). While the early HFE practice emphasized improvements in the design of equipment to make up for human limitations (reflecting a tradition of machine-centered computing), a new way of thinking about human factors was characterized by the design of the human-machine system or, more generally, human- or user-centered computing (Norman & Draper, 1986). The new subdiscipline of interaction design emerged in the 1970s and 1980s, emphasizing the need to organize information in ways that help reduce clutter and "information overload" and that help cope with design challenges for next-generation systems that will be increasingly complex while being staffed with fewer people. Emphasis on human cognitive processes, and on the need to regard the human-machine system as a joint cognitive system, represented a further refinement that has been called cognitive systems engineering (Hollnagel & Woods, 1983). Fundamental to all of these approaches and perspectives on HFE is the overriding principle to "know your user." In a recent critical essay, Don Norman (2005) asks us to re-assess the human-centered design perspective: developed to overcome the poor design of software products, human-centered design emphasized the needs and abilities of users and improved the usability and understandability of products; but despite these improvements, software complexity is still with us. Norman goes on to ask why so many designs of everyday things work so well, even without the benefit of user studies and human-centered design. He suggests that they all were "developed with a deep understanding of the activities that were to be performed" (p. 14). Successful designs are those that fit gracefully into the requirements of the underlying activity. Norman does not reject human-centered design,
but rather encompasses it within a broader perspective of activity-centered design. Further, he questions a basic tenet of human-centered design, that technology should adapt to the human, rather than vice versa. He regards much of human behavior as an adaptation to the "powers and limitations of technology." Activity-centered design aims to exploit this fact. Other perspectives suggest that the focus of design should be on human-information interaction rather than human-computer interaction. Gershon (1995) coined the term Human-Information Interaction (HII) to focus attention on improving the way people "find, interact with, and understand information." As such, HII includes aspects of many traditional research efforts, including usability evaluation methods and cognitive task analysis, but also design concepts that address the ethnographic and ecological environment in which action takes place. Examples of work in this area include distributed cognition (Zhang & Norman, 1994), naturalistic and recognition-primed decision making (Zsambok, 1997), and information foraging and information scent (Pirolli & Card, 1999). In summary, over the last half century or so, the field of human factors has evolved through a series of modest perspective shifts and insights that have yielded a fair degree of success in approaches, methods, and techniques for design and evaluation of systems that are created to support and enhance human-information interaction. The many labels that have been applied to the field (cognitive engineering, human-centered computing, participatory design, decision centered design, etc.) are all "differently hued variants of the same variety" (Hoffman, Feltovich, Ford, Woods, Klein & Feltovich, 2002). Engineering psychology and human factors are moving toward a more encompassing scope. Raja Parasuraman (2003) married neuroscience with ergonomics and termed the result neuroergonomics. Don Norman (2004) incorporated affect (emotion) into the field with his book, Emotional Design: Why We Love (or Hate) Everyday Things. Hancock, Pepe and Murphy (2005) are developing the concept of hedonomics. They have developed a hierarchy of ergonomic and hedonomic needs derived from Maslow's (1970) hierarchy of needs: safety, the prevention of pain, forms the foundation of this pyramid; next comes functionality, the promulgation of process; then usability, the priority of preference (the transition from ergonomics to hedonomics begins at the usability layer); the next layer is pleasurable experience; and the apex of the pyramid comprises individuation and personal perfection. So the field is beginning to address the enhancement of individual potential. Recent research in the emerging field of cognitive informatics (Wang, 2005a, b) addresses Maslow's hierarchy of needs within a formal model that attempts to capture the relationships among human factors and basic human needs. Recently a new research thrust has emerged that aims to shift the focus once more, not only to enhancing the interaction environment, which is the aim of cognitive systems engineering, but also to enhancing the cognitive abilities of the human operators and decision makers themselves.
The Augmented Cognition program (Schmorrow & Kruse, 2004) within the DARPA Information Processing Technology Office (IPTO) aims to monitor and assess the user's cognitive state through behaviorally and neurologically derived measures acquired from the user while interacting with the system, and then to adapt or augment the computational interface to improve performance of the user-computer system. Schmorrow and McBride (2005) explain that this research is based on the view that the weak link in the human-computer system may be attributed to human information processing limitations, and that human and computer capabilities are increasingly reliant on each other to achieve maximal performance. Much of the research within the augmented cognition program seeks to further our understanding of how information processing works in the human mind so that augmentation schemes might be developed and exploited more effectively — in a variety of domains from clinical restoration of function to education to worker productivity to warfighting superiority. Thus, as described by Schmorrow and McBride: "the DARPA Augmented Cognition program at its core is an attempt to create a new frontier, not by optimizing the friendliness of connections between human and computer, but by reconceptualizing a true marriage of silicon- and carbon-based enterprises" (International Journal of Human-Computer Interaction, p. 128). While augmented cognition exploits neuroscience research as a path toward symbiosis of humans and machines, research in cognitive informatics embraces neuroscience research as a potential model and point of departure for "brain-like" machine-based cognitive systems that may someday exhibit human-like properties of sensation, perception, and other complex cognitive behavior (Anderson, 2005a, b). We believe that neo-symbiosis provides a strong contextual framework to organize and guide research in cognitive informatics.
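As a purely illustrative sketch of that closed loop (the measures, fusion rule, and thresholds below are assumptions of ours, not part of the DARPA program), sensing cognitive state and adapting the interface has a simple control-loop shape:

```python
def estimate_cognitive_load(eeg_sample: list[float], response_times: list[float]) -> float:
    """Toy fusion of a neurological and a behavioral measure into [0, 1]."""
    neuro = min(sum(abs(x) for x in eeg_sample) / (len(eeg_sample) or 1), 1.0)
    behav = min((sum(response_times) / (len(response_times) or 1)) / 2.0, 1.0)
    return 0.5 * neuro + 0.5 * behav

def adapt_interface(load: float) -> str:
    # Mitigation policy: declutter under overload, enrich under low load.
    if load > 0.8:
        return "declutter display; defer non-critical alerts"
    if load < 0.2:
        return "surface additional context and pending tasks"
    return "no change"

print(adapt_interface(estimate_cognitive_load([0.3, 0.4], [1.5, 2.0])))
```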
Neo-Symbiosis

Once more, then, we are on the threshold of resurrecting a vision of symbiosis – but today we have the advantage of far greater computational resources and decades of evolution in the field of human factors/cognitive engineering. Licklider's notion of symbiosis does require updating. First, the term "man/machine symbiosis" is politically incorrect and would be more appropriately termed "human/machine symbiosis." Then there is a problem with the term symbiosis itself. Symbiosis implies co-equality between mutually supportive organisms. However, we contend that the human must be in the superordinate position. The Dreyfuses (Dreyfus, 1972, 1979, 1992; Dreyfus & Dreyfus, 1986) have made compelling arguments that there are fundamental limitations to what computers can accomplish, limitations that will never be overcome (Dreyfus & Dreyfus, 1986). In this case, it is important that the human remain in the superordinate position so that these computer limitations can be circumvented. On the other hand, Kurzweil has argued for the unlimited potential of computers (Kurzweil, 1999). But should it be proven that computers do, indeed, have this unlimited potential, then some attention needs to be paid to Bill Joy and his nightmarish vision of the future should technology go awry (Joy, 2000). In this case, humans would need to be in the superordinate position for their own survival. Griffith (2005a) has suggested the term neo-symbiosis for this updated vision of symbiosis. The augmented cognition research community is taking Licklider's vision quite literally in exploring technologies for acquiring, measuring, and validating neurological cognitive state sensors to facilitate human-information interaction and decision-making. Neurobiologically inspired forms of symbiosis, while consistent with the metaphor that Licklider used, were not a focus of Licklider's vision; but the possibilities for enhanced cognitive performance are enticing. Clearly, however, much work is required to achieve a brain-computer interface that might be called neo-symbiotic. Much of the effort in this field to date has focused on cognitive activity that tends to be more oriented toward attention and perception processes, and less toward decision making and thinking. In this sense, augmented-cognition neurological inputs can help to approach neo-symbiosis by providing information to the computer that can in turn be fed back to the human in the form of adaptive displays and interactions or other functions aimed to mitigate the effects of stress or information overload. More ambitious goals of increasing total cognitive capacity through augmented cognition technologies are still on the horizon of this research program and recent offshoots of augmented cognition R&D such as DARPA's Neurotechnology for Intelligence Analysts program1. Our interest, similarly, is in the current potential for enhanced human-computer collaboration that will achieve a level of performance that is superior to either the human or the computer acting alone. The principal reason that the beginning of the 21st century is so propitious for the reinvigoration of Licklider's vision is the advancement of computer technology and psychological theory. Therefore, one of our major objectives is to increase the human's understanding, accuracy, and effectiveness by supporting the development of creative insights. Understanding involves learning about the problem area and increasing the variety of contexts from which the problem can be understood.
Enhanced accuracy/effectiveness can be achieved by endowing the computer with a variety of means to support the task or activity. Revisiting thoughtful prescriptions for such computer-based intelligent support capabilities from two decades ago, we find examples such as knowledge of the user's goals and intentions and contextual knowledge (Croft, 1984), and "cognitive coupling" functions (Fitter & Sime, 1980) that include (Greitzer, Hershman & Kaiwi, 1985) the ability to inform the user about the status of tasks, remind the user to perform certain tasks, advise the user in selecting alternative actions, monitor progress toward the goal, anticipate requests to display or process information, and test hypotheses. In the context of information analysis tasks, examples of such neo-symbiotic contributions by the computer include considering alternative hypotheses, assessing the accuracy of intelligence sources, and increasing the precision of probability estimates through systematic revision. These types of activity-based support functions, enhanced by cognitive models, are the concepts that we believe will put us more solidly on the path to the original vision of Licklider, a neo-symbiosis where there is a greater focus on cognitive coupling between the human user and the computer.
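As a hedged sketch (the interface and method names are ours, not those of the cited authors), the cognitive-coupling functions enumerated above can be read as the surface of a support agent:

```python
from typing import Protocol

class CognitiveCouplingAgent(Protocol):
    """Illustrative interface covering the support functions listed above."""

    def update_task_status(self, task: str, status: str) -> None:
        """Inform the user about the status of tasks."""

    def remind(self, task: str) -> None:
        """Remind the user to perform certain tasks."""

    def advise_alternatives(self, goal: str) -> list[str]:
        """Advise the user in selecting alternative actions."""

    def monitor_progress(self, goal: str) -> float:
        """Report progress toward the goal as a fraction in [0, 1]."""

    def anticipate_information_needs(self, context: dict) -> list[str]:
        """Anticipate requests to display or process information."""

    def test_hypothesis(self, hypothesis: str, evidence: list[str]) -> float:
        """Return an assessed likelihood for a hypothesis given evidence."""
```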
Neo-Symbiosis Research Agenda

Requirements: Implementing Neo-Symbiosis

How should neo-symbiosis be implemented? Fortunately, Kahneman (2002, 2003) and Kahneman and Frederick (2002) have provided guidance through a theoretical framework. In his effort to organize seemingly contradictory results in studies of judgment under uncertainty, Kahneman has advanced the notion of two cognitive systems introduced by Sloman (1996, 2002) and others (Stanovich, 1999; Stanovich & West, 2002). System 1, termed Intuition, is fast, parallel, automatic, effortless, associative, slow-learning, and emotional. System 2, termed Reasoning, is slow, serial, controlled, effortful, rule-governed, flexible, and neutral. The cognitive illusions, which were part of the work for which he won the Nobel Prize, as well as perceptual illusions, are the results of System 1 processing. Expertise is primarily a resident of System 1. So are most of our skilled performances, such as recognition, speaking, driving, and many social interactions. System 2, on the other hand, consists of conscious operations, such as what is commonly thought of as thinking. Table 1 summarizes these characteristics and relationships. The upper portion of the table describes human information processing characteristics and strengths, interpreted within Kahneman's (2003) System 1/System 2 conceptualization. The bottom portion of the table represents an update of traditional characterizations of functional allocation based on human and computer capabilities, such as the original Fitts' List (Fitts, 1951b), cast within the System 1/System 2 framework. System 1 is effective presumably due to evolutionary forces, massive experience, and constrained context. Most of the time, it is quite effective. System 1 uses nonconscious heuristics to achieve these efficiencies, so occasionally it errs and misfires. Such misfires are responsible for perceptual and cognitive errors. One of the roles of System 2 is to monitor the outputs of System 1 processes. It is the System 2 processes that require computer support, not only with respect to the pure drudgery and slowness of human System 2 processes, but also with respect to the monitoring of System 1 processes. In most cases, however, it is a mistake to assign System 1 processes to the computer. This was the fundamental error in many automatic target recognition and image interpretation algorithms that attempted to automate the human out of the loop. Even to this day, computer technology has been unsuccessful in modeling human expertise in System 1 domains2. The perceptual recognition processes of most humans are excellent. System design should capitalize upon these superb processes and provide support to other areas of human information processing, such as search (there is a tendency to overlook targets); interpretation keys to provide a check and support for the recognition process; analysis and synthesis (e.g., to augment reasoning processes); support to facilitate adjusting to changes in context (e.g., to maintain situational awareness); and computational support (e.g., to make predictions). The bottom portion of Table 1 exhibits examples of how human and computer contributions can be allocated to System 1 and System 2 processing in a neo-symbiotic system. Greitzer (2005b) has discussed the importance of identifying cognitive states in real-world decision-making tasks. A critical question here is, what are the cognitive states that need to be measured?
What are the cognitive states that, if identified and measured, could enhance neo-symbiosis? Clearly it would be beneficial to identify neurological correlates for System 1 and System 2 processes. It would be especially beneficial to identify neurological correlates of System 2 while it monitors System 1 processing. Perhaps there is a neurological signature when potential errors are detected in System 1 processing. It is conceivable that some of these errors remain below the threshold of consciousness. If these errors were detectable in the neurological stream, computers could assist in this error monitoring process. As was mentioned previously, the identification of neurological correlates is not a requirement, nor is it the only enabler for neo-symbiosis. Griffith (2005b) has argued that neo-symbiosis can be achieved over a wide range of technological sophistication. Overviews and tutorials can be presented on basic human information processing capabilities, limitations, and biases. A software agent, or avatar, can pop up at strategic times with reminding prompts or checklists. Of course, the capability to monitor the human's cognitive state through neurological correlates will enhance the ability of the avatar to pop up at strategic times. It might also be possible to monitor the content of the interactions with the computer to identify potential processing problems. Differences in processing time present yet another potential source of information for detecting errors and biases. In our view, the thrust of the HII research agenda should be targeted at enhancing neo-symbiosis. A major focus of HII research today is aimed at visualization technology that processes and seeks to represent massive
Table 1. System 1 and System 2 processes

Human Processes
  System 1: Intuition
    Processing characteristics (a): fast; parallel; automatic; effortless; associative; slow-learning; emotional.
    Type of processing (examples of human information processing strengths): expertise; skilled performance; most perception.
  System 2: Reasoning
    Processing characteristics (a): slow; serial; controlled; effortful; rule-governed; flexible; neutral.
    Type of processing: thinking; goal-driven performance; anomaly and paradox detection.

Neo-Symbiotic Functions
  System 1: Intuition
    Examples of human contributions: providing context; detecting contextual shifts; intuition; pattern recognition; creative insights.
    Examples of computer contributions: recognize cognitive state changes; adapt displays/interaction characteristics to the human's cognitive state.
  System 2: Reasoning
    Examples of human contributions: supervision/monitoring; inductive reasoning; adaptability to change; contextual evaluations; anomaly recognition/detection; goal-driven processes/planning; creative insights.
    Examples of computer contributions: deductive reasoning; search; situational awareness; analysis/synthesis; hypothesis generation/tracking; computational support; information storage/retrieval; multiprocessing; update status of tasks; advise on alternatives; monitor progress; monitoring System 1 processes.

(a) This portion of the table is based on Kahneman (2003).
data in ways that facilitate insight and decision making. Data visualization technology seeks to facilitate visual thinking to gain an understanding of complex information, and perhaps most particularly to gain insights that would otherwise not be apparent from other data representations. A famous example of a successful visualization is the periodic table of elements (conceived by Mendeleev and published in 1869), which not only provided a simple display of known data but also pointed out gaps in knowledge that led to discoveries of new elements. However, creating novel visualizations of complex data (information visualization) does not guarantee success; there are arguably more examples of visualizations that have not lived up to expectations than success stories. A leap of faith is required to expect that a given scientific visualization will produce the “aha!” moment that leads
to an insightful solution to a difficult problem. We assert that the key to a successful scientific visualization is its effectiveness in fostering new ways of thinking about a problem — in the System 1 sense as exemplified in Table 1 (e.g., seeing contextual shifts, recognizing new patterns, finding creative insights). This view stresses that the interaction component of HII needs to be emphasized. The human should not be regarded as simply a passive recipient of information display, however creative that information display might be. The human needs to be able to manipulate and interact with the information. The ability to manipulate information and view it in different contexts is key to the elimination of cognitive biases and to the achievement of novel insights (e.g., finding the novel intelligence in massive data). The goal is a neo-symbiotic interaction between the human and the information. Thus, requirements should be defined so that a neo-symbiosis can be achieved between humans and their technology. Questions to guide the requirements definition process for neo-symbiotic systems designed to facilitate HII include:

• How can such systems be designed to mitigate or eliminate cognitive biases? Detecting/recognizing possible bias is one part of the challenge; an equally critical R&D goal is to define mitigation strategies. What types of interventions will be effective, and how should interventions be managed? We suggest that a mixed-initiative solution will be required that maintains the supervisory control of the human (see the sketch after this list).

• How can such systems be designed to leverage the unique processing skills of humans? A prerequisite here is to identify the unique processing skills of humans. Technologies and approaches for developing idiosyncratic user models would be most useful. Moreover, expert users can identify and contribute their own unique skills: Consider an image interpretation system in which an expert with knowledge of a certain area could correct and elaborate upon outputs of image interpretation algorithms.

• How can such systems be designed to facilitate collaboration? One aim is to realize the assertion made by Licklider and Taylor (1968) that people will be able to communicate more effectively through a machine than face to face.

• How can such systems promote a more pleasurable experience? The goal here is to address some of the objectives outlined by Hancock et al. (2005).

• How can such systems help someone to leverage personal potential or overcome a personal deficit (e.g., through augmentative/assistive technology)? A major area of interest for neurally-based symbiotic studies is the use of implant technology, in which a connection is made between technology and the human brain or nervous system. Important medical applications include restoring functionality lost to neurological trauma or a debilitating disease, or ameliorating symptoms of physical impairment such as blindness or deafness. Other applications that do not address medical needs but instead aim to enhance or augment mental or physical attributes provide a rich area of research in the growing field of augmented cognition. Warwick and Gasson (2005) review this field of research and describe Warwick's research and experiences as a researcher and experimental subject who was the first human to have a computer chip inserted into his body that enabled bidirectional information flow and the demonstration of control of a remote robot hand using the subject's own neural signals (Gasson, Hutt, Goodhew, Kyberd & Warwick, 2002; Warwick & Gasson, 2005). Warwick and Gasson (2005) observe:

By linking the mental functioning of a human and a machine network, a hybrid identity is created. When the human nervous system is connected directly with technology, this not only affects the nature of an individual's … identity, … but also it raises serious questions as to that individual's autonomy.

It should be appreciated, however, that assistive technology need not necessarily entail implants or any involvement with neurology. Indeed, a great deal has already been accomplished via adaptive software and input and output devices (Griffith, 1990; Griffith, Gardner-Bonneau, Edwards, Elkind & Williges, 1989).

• What are the implications and requirements for computer architectures to achieve neo-symbiosis? A central point underlying neo-symbiosis is that humans need to be included in the computer architecture or system design. It is anathema to the concept of neo-symbiosis that computers and humans be regarded in isolation. They need to be considered together, with the objective of each exploiting the other's potential and compensating for the other's weaknesses. Ideally the interaction between the two will achieve a multiplicative effect, a true leveraging.
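Purely as an illustration of the mixed-initiative idea raised in the first question above (every name and the toy detector below are our own assumptions), detection can be separated from intervention, with the human retaining supervisory control:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class BiasAlert:
    kind: str        # e.g., "confirmation", "anchoring"
    evidence: str    # why the detector fired
    suggestion: str  # proposed mitigation (alternative view, checklist, ...)

def detect_biases(interaction_log: list[str]) -> list[BiasAlert]:
    # Placeholder detector: real detectors might use processing-time
    # differences or neurological correlates, as discussed above.
    alerts = []
    if interaction_log and len(set(interaction_log)) == 1:
        alerts.append(BiasAlert(
            kind="confirmation",
            evidence="only one hypothesis has been examined",
            suggestion="display competing hypotheses for review",
        ))
    return alerts

def mitigation_loop(interaction_log: list[str],
                    user_accepts: Callable[[BiasAlert], bool]) -> None:
    # Mixed initiative: the system proposes, the human disposes.
    for alert in detect_biases(interaction_log):
        if user_accepts(alert):
            print(f"applying mitigation: {alert.suggestion}")
        else:
            print(f"user declined mitigation for {alert.kind} bias")

mitigation_loop(["hypothesis-A"] * 5, user_accepts=lambda a: True)
```

The design point is that no mitigation is applied without the user's assent, preserving the superordinate position of the human argued for earlier.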
Metrics: Measuring Success

An important question is how to identify neo-symbiotic design and how to assess it. It is important to recognize instances of neo-symbiotic design that are already among us in the form of productivity enhancement tools or job aids. For example, spell checking in contemporary word processors compensates for memory and perceptual/motor shortcomings; thesauruses leverage communicative abilities. Various creativity tools, such as concept mapping, leverage creative potential. In the augmented cognition domain, various neurologically-based "cognitive state sensors" are emerging as indicators of cognitive load and as potential cognitive prosthetics for medical purposes. In each of these cases, particularly the most recent developments that aim to enhance cognitive functions and effectiveness, evaluation methods and metrics are needed to guide research and facilitate deployment of technologies. For more advanced development of neo-symbiotic designs that aim to enhance human information processing and decision making (e.g., intelligence analysis performance) or knowledge/skill acquisition (e.g., training applications), we recognize the need for more rigorous evaluation methods and metrics that reflect the impact of the technology on performance. Of course, standard subjective measures can readily be expanded to include neo-symbiotic potential. Many subjective measures are interpreted in terms of usability. There are several sources of established guidelines for usability testing (e.g., Nielsen, 1993). Commonly used criteria include efficiency, ability to learn, and memorability. Usability measures address the experience of users: whether or not they found the tool useful, easy to learn, easy to use, and so forth. Often, users are asked to provide this sort of feedback using qualitative measures obtained through verbal ("out loud") protocols and/or post-hoc comments (via questionnaires, interviews, ratings). Likert scales, in which respondents indicate their degree of agreement or disagreement with particular statements using numerical ratings, can use question stems such as: "Using this application/system enhanced my performance"; or "Using this application/system compensated for my information processing shortcomings." Subjective measures such as these are designed to assess the acceptance by users of the system. It is unfortunate that the term subjective is used in a pejorative sense and that subjective measures are all too often regarded as second-rate measures. Whether or not a system is perceived favorably and judged to be useful are central questions in evaluating the system's value. Especially relevant to neo-symbiosis is the user's assessment of the extent to which his or her potential has been enhanced. It is possible to use magnitude estimation to assess the subjective amount, or lack, of neo-symbiosis in an application/system. In magnitude estimation (Stevens, 1975), stimuli are evaluated with respect to a standard stimulus, or modulus. That standard stimulus is assigned a value, and other stimuli are evaluated proportionate to it. So if the modulus was assigned a value of 50, and the stimulus being rated was regarded as half of whatever the rating dimension was, it would be rated 25. Were it regarded as having twice the value on the rating dimension, it would be rated 100. A given version of Microsoft Word™ could be assigned a value of 50.
If someone regarded another word processor as being twice as neo-symbiotic as this version of Word, it would be rated 100. Were it regarded to be only half as neo-symbiotic, it would be rated 25. A desirable property of magnitude estimation methods is that they produce ratio scales. Magnitude estimation is a remarkably robust methodology. Its validity has been demonstrated with stimuli ranging from the loudness of tones to the seriousness of crimes. It uses an anchor to a standard that allows proportional assessments of where an issue, item, or stimulus stands with respect to that standard. Thus, statements can be made that a product is 20% better than a related product, 40% worse, and so forth. These ratings are more meaningful and interpretable than many other subjective rating techniques. Whenever feasible, subjective measures should be supplemented with objective measures. Greitzer (2005a) has argued for the development of measures of effectiveness based on performance impact, in addition to the continued use of traditional subjective usability measures. User satisfaction is a necessary, but not sufficient, measure. Behavioral measures are needed to address more cognitive factors and the utility of tools or technologies: Does technology X improve the throughput of cognitive tasks of type Y? Does it yield more efficient or higher quality output for certain types of tasks? Quantitative measures that assess utility may include efficiency in completing
the task (time, accuracy, completeness). These will be most useful in comparing the utility of alternative tools or assessing the utility of a given tool vs. baseline performance without the tool. For example, in information analysis tasks, it has been observed (Scholtz, Morse & Hewett, 2004) that analysts tend to spend more time in data collection and report generation than in analysis activity (hence a kind of "bathtub curve," as described by Wilkins (2002) in the context of product reliability); tools or technologies that help alleviate the processing load for the collection phase and allow more time for analysis, for example, would be valued for their positive impact on performance (Badalamente & Greitzer, 2005). Time-based measures such as total time on task and dwell times can provide insight on user preferences and the efficiency/impact of technologies being assessed (Sanquist, Greitzer, Slavich, Littlefield, Littlefield & Cowley, 2004). Other performance measures must be derived from specific decomposition of cognitive tasks. Greitzer (2005a) described examples of such analysis, within the information analysis domain, based on a decomposition of chains of reasoning (following the work of Hughes and Schum, 2003) and on an analysis of behavior chains (based on the work of Kantor, 1980, originally applied to the evaluation of library science applications). While subjective measures provide weak support for neo-symbiosis, behavioral or performance measures provide strong support. Absent behavioral or performance measures, questions remain as to the justification for the subjective ratings. Further research is needed to understand the basis for the subjective ratings.
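The ratio-scale arithmetic of magnitude estimation described above is simple enough to state in a few lines; in this sketch the product names and ratings are invented, with Word as the modulus at 50:

```python
def magnitude_scores(ratings: dict[str, float], modulus: str) -> dict[str, float]:
    """Express each rating as a ratio to the standard (modulus) stimulus."""
    standard = ratings[modulus]
    return {item: rating / standard for item, rating in ratings.items()}

# Invented ratings: 100 means "twice as neo-symbiotic as the modulus",
# 25 means "half as neo-symbiotic".
ratings = {"Word": 50.0, "EditorX": 100.0, "EditorY": 25.0}
for item, ratio in magnitude_scores(ratings, "Word").items():
    print(f"{item}: {ratio:.2f}x the standard")
```

Because the result is a ratio scale, statements such as "20% better" or "40% worse" are directly meaningful, which is the property claimed for the method above.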
Summary and Conclusion

The convergence of developments in different fields provides the foundation for a quantum leap in HII. Advancements in computer technology, cognitive theory, and neuroscience provide the potential for significant advances. Moreover, there is a movement toward a more encompassing view of the scope of the field of human factors and ergonomics. The objective has been raised from making technology usable to using technology to enhance human potential, which was the original goal set by Licklider in 1960. The fulfillment of this objective will require collaboration and interaction among the fields of cognitive science, neuroscience, and computer technology. Most of the work in human factors and ergonomics has been empirical. Only occasionally has the field drawn upon theory. The field of HII has been primarily technology driven. Programs and systems are developed on the basis of intuitions and what is regarded as cool and challenging by the developer, rather than from considerations of the information processing shortcomings and potential of the users. Very often techniques are not even subjected to empirical assessment. But a strategy of generating an idea and then evaluating it empirically will not prove successful in the long run. HII requirements need to be developed not only on the basis of what a given system is being designed to accomplish, but also on the basis of theory and data in cognitive science and neuroscience. To sum up, we have argued that the field of HII is on the threshold of realizing a new vision of symbiosis — one that embraces the concept of mutually supportive systems, but with the human in a leadership position, and that exploits the advances in computational technology and the field of human factors/cognitive engineering to yield a level of human-machine collaboration and communication that was envisioned by Licklider, yet not attained. As we have described, the field of human factors/HII is not static, but rather must inexorably advance. With advances in computer technology, cognitive science, and neuroscience, human potential and fulfillment can be leveraged further, yielding a spiral of progress: as human potential is raised, that new potential can be leveraged even further. We think this vision provides a useful framework for cognitive informatics.
References Anderson, J. A. (2005a). Cognitive computation: The Ersatz brain project. In Proceedings of the IEEE 2005 International Conference on Cognitive Informatics (pp. 2-3). IEEE Computer Society. Anderson, J. A. (2005b). A brain-like computer for cognitive software applications: The Ersatz brain project. In Proceedings of the IEEE 2005 International Conference on Cognitive Informatics (pp. 27-36). IEEE Computer Society.
Badalamente, R. V., & Greitzer, F. L. (2005). Top ten needs for intelligence analysis tool development. In Proceedings of the 2005 International Conference on Intelligence Analysis, McLean, Virginia.

Croft, W. B. (1984). The role of context and adaptation in user interfaces. International Journal of Man-Machine Studies, 21, 283-292.

Dreyfus, H. L. (1992). What computers still can't do. Cambridge, MA: MIT Press.

Dreyfus, H. L., & Dreyfus, S. E. (1986). Mind over machine: The power of human intuition and expertise in the era of the computer. New York: The Free Press.

Fitter, M. J., & Sime, M. E. (1980). Creating responsive computers: Responsibility and shared decision-making. In H. T. Smith & T. R. G. Green (Eds.), Human interaction with computers. London: Academic Press.

Fitts, P. M. (1951a). Engineering psychology and equipment design. In S. S. Stevens (Ed.), Handbook of experimental psychology (pp. 1287-1340). New York: Wiley.

Fitts, P. M. (Ed.) (1951b). Human engineering for an effective air navigation and traffic control system. Washington, DC: National Academy Press, National Academy of Sciences.

Gasson, M., Hutt, B., Goodhew, I., Kyberd, P., & Warwick, K. (2002, September). Bi-directional human machine interface via direct neural connection. In Proceedings of the IEEE Workshop on Robot and Human Interactive Communication (pp. 265-270), Berlin, Germany.

Gershon, N. (1995). Human information interaction. In Proceedings of the WWW4 Conference, Boston, MA.

Greitzer, F. L. (2005a). Toward the development of cognitive task difficulty metrics to support intelligence analysis research. In Proceedings of the IEEE 2005 International Conference on Cognitive Informatics (pp. 315-320). IEEE Computer Society.

Greitzer, F. L. (2005b). Extending the reach of augmented cognition to real-world decision making tasks. In Proceedings of the HCI International 2005/Augmented Cognition Conference, Las Vegas, NV.

Greitzer, F. L., Hershman, R. L., & Kaiwi, J. (1985). Intelligent interfaces for C2 operability. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics.

Griffith, D. (1990). Computer access for persons who are blind or visually impaired: Human factors issues. Human Factors, 32, 467-475.

Griffith, D. (2005a). Beyond usability: The new symbiosis. Ergonomics in Design, 13, 3.

Griffith, D. (2005b). Neo-symbiosis: A tool for diversity and enrichment. Retrieved August 6, 2006, from http://2005.cyberg.wits.ac.za

Griffith, D., Gardner-Bonneau, D. J., Edwards, A. D. N., Elkind, J. I., & Williges, R. C. (1989). Human factors research with special populations will further advance the theory and practice of the human factors discipline. In Proceedings of the Human Factors 33rd Annual Meeting (pp. 565-566), Santa Monica, CA: Human Factors Society.

Hancock, P. A., Pepe, A. A., & Murphy, L. (2005). Hedonomics: The power of positive and pleasurable ergonomics. Ergonomics in Design, 13, 8-14.

Hoffman, R. R., Feltovich, P. J., Ford, K. M., Woods, D. D., Klein, G., & Feltovich, A. (2002). A rose by any other name… would probably be given an acronym. Retrieved August 6, 2006, from http://www.ihmc.us/research/projects/EssaysOnHCC/TheRose.pdf

Hollnagel, E., & Woods, D. D. (1983). Cognitive systems engineering: New wine in new bottles. International Journal of Man-Machine Studies, 18, 583-600. Reprinted (1999) in 30th Anniversary Issue of International Journal of Human-Computer Studies, 51, 339-356. Retrieved August 6, 2006, from http://www.idealibrary.com
Hughes, F. J., & Schum, D. A. (2003). Preparing for the future of intelligence analysis: Discovery – Proof – Choice. Unpublished manuscript. Joint Military Intelligence College.

Joy, B. (2000, April). Why the future doesn't need us. Wired, 8.04.

Kahneman, D. (2002, December 8). Maps of bounded rationality: A perspective on intuitive judgment and choice. Nobel Prize lecture.

Kahneman, D. (2003). A perspective on judgment and choice: Mapping bounded rationality. American Psychologist, 58, 697-720.

Kahneman, D., & Frederick, S. (2002). Representativeness revisited: Attribute substitution in intuitive judgment. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases (pp. 49-81). New York: Cambridge University Press.

Kantor, P. B. (1980). Availability analysis. Journal of the American Society for Information Science, 27(6), 311-319. Reprinted (1980) in Key papers in information science (pp. 368-376). White Plains, NY: Knowledge Industry Publications, Inc.

Kurzweil, R. (1999). The age of spiritual machines: When computers exceed human intelligence. New York: Penguin Group.

Licklider, J. C. R. (1960). Man-computer symbiosis. IRE Transactions on Human Factors in Electronics, HFE-1, 4-11.

Licklider, J. C. R., & Taylor, R. G. (1968, April). The computer as a communication device. Science & Technology, 76, 21-31.

Maslow, A. H. (1970). Motivation and personality (2nd ed.). New York: Viking.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity to process information. Psychological Review, 63, 81-97.

Nielsen, J. (1993). Usability engineering. Cambridge, MA: Academic Press/AP Professional.

Norman, D. A. (2004). Emotional design: Why we love (or hate) everyday things. New York: Basic Books.

Norman, D. A. (2005). Human-centered design considered harmful. Interactions. Retrieved August 6, 2006, from http://delivery.acm.org/10.1145/1080000/1070976/p14-norman.html?key1=1070976&key2=3820555211&coll=portal&dl=ACM&CFID=554857554&CFTOKEN=554857554

Norman, D. A., & Draper, S. W. (1986). User-centered system design: New perspectives on human-computer interaction. Mahwah, NJ: Lawrence Erlbaum.

Parasuraman, R. (2003). Neuroergonomics: Research and practice. Theoretical Issues in Ergonomics Science, 4(1-2), 5-20.

Pirolli, P., & Card, S. K. (1999). Information foraging. Psychological Review, 106(4), 643-675.

Sanquist, T. F., Greitzer, F. L., Slavich, A., Littlefield, R., Littlefield, J., & Cowley, P. (2004). Cognitive tasks in information analysis: Use of event dwell time to characterize component activities. In Proceedings of the Human Factors and Ergonomics Society 48th Annual Meeting, New Orleans, Louisiana.

Schmorrow, D. D., & Kruse, A. A. (2004). Augmented cognition. In W. S. Bainbridge (Ed.), Berkshire encyclopedia of human computer interaction (pp. 54-59). Great Barrington, MA: Berkshire Publishing Group.

Schmorrow, D., & McBride, D. (2005). Introduction to special issue on augmented cognition. International Journal of Human-Computer Interaction, 17(2).
Scholtz, J., Morse, E., & Hewett, T. (2004, March). In depth observational studies of professional intelligence analysts. Paper presented at Human Performance, Situation Awareness, and Automation (HPSAA), Daytona Beach, FL. Retrieved August 6, 2006, from http://www.itl.nist.gov/iad/IADpapers/2004/scholtz-morse-hewett.pdf

Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119, 3-22.

Sloman, S. A. (2002). Two systems of reasoning. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases (pp. 379-396). New York: Cambridge University Press.

Stanovich, K. E. (1999). Who is rational? Studies of individual differences in reasoning. Mahwah, NJ: Erlbaum.

Stanovich, K. E., & West, R. F. (2002). Individual differences in reasoning: Implications for the rationality debate. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases. New York: Cambridge University Press.

Stevens, S. S. (1975). Psychophysics: Introduction to perceptual, neural, and social prospects. New York: Wiley.

Wang, Y. (2005a). On cognitive properties of human factors in engineering. In Proceedings of the IEEE 2005 International Conference on Cognitive Informatics (pp. 174-182). IEEE Computer Society.

Wang, Y. (2005b). On the cognitive properties of human perception. In Proceedings of the IEEE 2005 International Conference on Cognitive Informatics (pp. 203-210). IEEE Computer Society.

Warwick, K., & Gasson, M. (2005). Human-machine symbiosis overview. In Proceedings of the HCI International 2005/Augmented Cognition Conference, Las Vegas, NV.

Wilkins, D. J. (2002, November). The bathtub curve and product failure behavior. Reliability HotWire, 21. Retrieved August 6, 2006, from http://www.weibull.com/hotwire/issue21/hottopics21.htm

Zhang, J., & Norman, D. (1994). Representations in distributed cognitive tasks. Cognitive Science, 18(1), 87-122.

Zsambok, C. E. (1997). Naturalistic decision making: Where are we now? In C. Zsambok & G. Klein (Eds.), Naturalistic decision making. Mahwah, NJ: Erlbaum.
Endnotes

1. A research program at DARPA, Neurotechnology for Intelligence Analysts, seeks to identify robust brain signals that may be recorded in an operational environment and that are correlated with imagery data of potential interest to the analyst. Investigations of visual neuroscience mechanisms have indicated that the human brain is capable of responding to visually salient objects significantly faster than an individual's visuomotor response—i.e., essentially before the human indicates awareness. The program seeks to develop information processing triage methods to increase the speed and accuracy of image analysis. http://www.darpa.mil/dso/thrust/biosci/nia.htm

2. As Anderson (2005b) has observed, human expertise in System 1 domains has been very difficult to model in computers, and many researchers (connectionists, behavior-based roboticists) have used this to argue that the digital computer metaphor is flawed.
This work was previously published in International Journal of Cognitive Informatics and Natural Intelligence, Vol. 1, Issue 1, edited by Y. Wang, pp. 39-52, copyright 2007 by IGI Publishing (an imprint of IGI Global).
Chapter VIII
Language, Logic, and the Brain

Ray E. Jennings, Simon Fraser University, Canada
ABSTRACT

Although linguistics may treat a language as a syntactic and/or semantic entity that regulates both language production and comprehension, this article regards language as a physical and biological phenomenon. The biological view of language presents a new metaphor: on an evolutionary time-scale, the human brain and human language have co-evolved, so that the brain itself is the repository of syntactic and semantic constraints. The logical vocabulary of natural languages has been understood by many as a purified abstraction of the formal sciences, in which the internal transactions of reasoning are constrained by the logical laws of thought; on this view, although no vocabulary can be entirely independent of semantic understanding, logical vocabulary has a fixed minimal semantic content independent of context. This article instead places logic at the center of linguistic evolution by observing that all connective vocabulary descends from lexical vocabulary grounded in the spatial relationships of sentences. Far from having fixed minimal semantic content, logical vocabulary is semantically rich and context-dependent. Many mutations in logical vocabulary and their semantic changes have been observed to parallel biological mutations. These changes proliferate to yield a wide diversity in the evolved uses of natural-language connectives.
Introduction and Background George Boole (1815-1864) called his seminal study The Laws of Thought. Boole took the title seriously as a description of the relationship between logic and thought. Cognitive scientists should not. Why not? is the subject of this address. Boole’s ambitions for the treatise are clear enough, and are repeated several times in the course of his introduction. This is the opening sentence:
The design of the following treatise is to investigate the fundamental laws of those operations of the mind by which reasoning is performed2, to give expression to them in the symbolic language of a Calculus, and upon this foundation to establish the science of Logic, and construct its method. (Boole, p. 1)

The difficulty is in getting at those fundamental laws. Boole presents his solution to the problem, which is to get at them through an examination of the language of reason, as a tactic of convenience. A fastidious reader might wish that Boole had spent fewer words on a more perspicuous justification of the method.

It will not be necessary to enter into the discussion of … whether language is … an essential instrument of reasoning … . It is the business of Science to investigate laws; … whether we regard signs as the representatives of things and of their relations, or as the representatives of the conceptions and operations of the human intellect, in studying the laws of signs, we are in effect studying the manifested laws of reasoning … . For though in investigating the laws of signs a posteriori, the immediate subject of examination is Language, with the laws which govern its use; while in making the internal processes of thought the direct object of inquiry, we appeal in a more immediate way to our personal consciousness, it will be found that in both cases the results obtained are formally equivalent. Nor could we easily conceive, that the unnumbered tongues and dialects of the earth should have preserved through a long succession of ages so much that is common and universal, were we not assured of the existence of some deep foundation of their agreement in the laws of the mind itself. (pp. 24-25)

Boole's contemporary, Augustus De Morgan (1806-1871), had he understood what we now understand of such things, would have been forced to judge Boole in this matter as he judged the philosophers of his day, who, he said, "speak with the majesty of ignorance." It is merely an article of faith that we could have access to something called thought independently of our use of language, though we could empirically investigate some human capacities by experiments that required no explanation as to protocol.
Folk Semantics

The notion that universal semantic features of language reveal themselves at the level of connective vocabulary is more likely an artefact of our practices of translation. One still reads in introductory logic texts that English has one word for or, whereas Latin had two. In fact Latin has at least half a dozen words (including vel, aut, sive, seu, and the enclitic ve) that we can translate as or, and so does English: they include unless, alternatively, but, and otherwise. One might ask whether each of these is usefully thought of as the representative of a conception or operation of the human intellect.

Boole, like many more recent authors3, in the course of expounding oversimple theories about natural language connectives, himself quite unselfconsciously uses the very connectives he is speaking of in ways that his theory does not comprehend. (In the Boole passage cited, notice the whether … or construction.) In every case that I know of, the judgement about the connectives of natural languages arises not from a study of natural languages, but from a fascination with or a devotion to a formal technology. Psycholinguistic studies are not free of this taint. In one such study4, the dogma that the satisfaction-conditions of non-exclusive disjunction and those of exclusive disjunction exhaust the uses of English or premises a test administered to young children; nevertheless, the explanation of the experimental protocol offered to the young subjects is framed with a use of or that is not admitted by the experiment. Unremarkably, the four-year-olds had sufficient command of that use of the word that they were able to understand the test. Unremarkably, none of the adult academics noticed.

Here is an example: suppose that a waiter says to you "You can have tea or you can have coffee." Is his pronouncement an exclusive or5 inclusive disjunction? Audiences generally divide on this subject into two camps: a majority who think the former, and a minority who think the latter. But although there is a division on the question when they are explicitly asked, when asked immediately thereafter whether in such a situation they would infer that they could have tea, they are unanimous that no one to whom a waiter said this would be in doubt that he or she could have tea. But this implies that in practice we do not understand the sentence as either sort of disjunction, because from a disjunction (inclusive or exclusive) we cannot infer either disjunct. The answer to the abstract question is an artefact of academic training; our inference in the event is an outcome of our natural understanding of the language.
In general our naturally acquired linguistic understanding does not require and does not confer a semantic understanding. This is evident in the case of lexical vocabulary: as children we acquire both the ability to be the first to use a vocable in conversation, and the ability to make conversational responses to its introduction in suitable ways that nevertheless do not involve our using the vocabulary6. Gradually, our uses and responses occasion neither puzzlement nor correction. But we have almost all made it to adulthood with some eccentricities of vocabulary selection that our interlocutors have not corrected or queried. There is no semantic audit in adolescence by which our uses of the language are held up to independent expert scrutineers. There can be no such scrutineers, for all of us are, in this respect, in the same semantic pickle. Unless we specifically make a study of it, we do not understand the connective vocabulary of our own natural language. We merely know how to use it and how to respond to others' uses of it.

Experts with sufficient academic training in logic and the formal representation of natural language sentences can write textbooks of logic. But few have themselves much understanding beyond what they have acquired from teachers similarly placed who themselves wrote textbooks. The result has been an academic tradition of glib recitals of uninvestigated myths: myths about English or, myths about Latin aut and vel. They are to be found in virtually every introductory logic text7. Though venial in themselves, they both skew and miscolour academic perceptions of natural language and its role in our cognitive life. In the first place, they falsely teach us that language is primarily semantic, though they give us no usable theory of meaning; in the second, they elevate the language of thought, belief, intention, inference, and reasoning to a theoretical role that it cannot serve.

In general, we have at best an imperfect semantic understanding of natural language connective vocabulary. As difficult a lesson as that is to learn, it is also true of mental vocabulary. Like natural language connective vocabulary, mental vocabulary is learned without the benefit of explicit semantic theory. Part of the reason is that we require no explicit semantic understanding in order to use the vocabulary with standard proficiency. It is perhaps the failure to recognize that fact that sends us haring after semantic theories. It is that fact among others that makes it a certainty that we will not find one. For if its existence is not dictated as a requirement of the transmission of language, then nothing remains that is capable of enforcing or preserving it. One might more plausibly argue that the overburden of such a requirement would effectively prevent the transmission of language.

We can, however, explain why a language has the particular connectives and the particular mental terms that it has; moreover, as we better understand how we come by the terms, we also better understand the futility of attempting to gain an understanding beyond that required for conversational use. We will consider connectives in this light in due course. As for mental terms, I merely illustrate the point with the verb intend. English owes the word to Norman French and ultimately to Latin, and we may suppose without grief that there were uses in these ancestral languages that roughly correspond to our own.
But the earliest uses of its Latin forebear, as any Latin pupil can confirm, and its principal nonfigurative understanding, had to do with the drawing of a bowstring, an understanding that was certainly still present in 16th-century English. A secondary but still physical understanding, namely to point towards a target, exploited a usual incidental feature of the primary action of drawing the string. Its initial mental uses were also crudely spatial: one aimed one's soul, and in another use, one aimed in one's soul. Thus anyone who supposes that in its descent from those earliest figurative uses it has acquired a semantics, or has become transformed into a theoretically useful explanatory instrument, is bound to say how that acquisition or that transformation came about. Its history records only the loss of its original lexical understanding, and the ellipsis of explicit mention of anima (soul) in its use. No doubt educated 16th-century English speakers still distinguished figurative uses from nonfigurative ones, since they still spoke of intending a bowstring. But the language seems to have lost the nonfigurative use since the advent of firearms. Paraphrase of the residual use requires similarly problematic mentalistic vocabulary, in such constructions as minded to, and figurations, as in have it in mind to.
The Fragility of Syntax All of the so-called “logical” connectives of English (and, or, if, but, unless, and so on) are typically morphologically reduced residua of lexical vocabulary, typically prepositional or relative adverbial, but in any case, typically representing a spatial, temporal, or other natural relationship. Or, for example, is a reduced form of oþer (other),
which we would translate as second, as in "every second (every other, every alternate) day." The original survives in such words as otherwise and either. But is a reduced form of by outan, then butan (outside); unless is a reduction of the longer prepositional phrase on [a condition] less (than that …).

Consider but as a representative example. Its early ancestral uses were closely constrained to the outside of structures such as dwellings. The first stage of its functionalization is the slow generalization of its use in application to geographical, and by assimilation, to institutional boundaries (but (that is, outside) the country), then to categorical and unit boundaries (nothing but rubble, no one but his mother, could not but smile), thence to circumstantial boundaries (but for the rain, but that the rain ended the search). At this point there stands only the ellipsis of that between prepositional uses of but and its first, disjunctive, connective use, as in It doesn't rain but it pours. Compare the dialectical It doesn't rain without (≈ outside) it pours. Of course this is a crude sketch of one path, and there is more than one path by which the language of physical relationship yields up in its maturity what some would call a propositional connective. Each of these stages represents an exploitation or abaptation8 of a previous use. Our familiarity with the word but tells us that the story does not end there, since the uses mentioned so far do not exhaust the familiar list. In fact our explanation can be extended to include even a short list of linguistically problematic uses of but; these are, however, discussed more fully elsewhere (Jennings 2005a, b).

For present illustrative purposes we take but also as a clear illustration of a second phenomenon of linguistic development: mutation. Although mutations in language transmission are more unlike DNA mutations than like, the use of the term mutation is anything but metaphorical; it is very particularly applied to string replication errors, and such errors are therefore more like frameshift mutations than like other biological sorts. Like biological mutations, they are sometimes insignificant, sometimes unimportant, and sometimes capable of introducing major and widespread syntactic novelty. Frameshift mutations occur as a result of the deletion or addition of some small number of nucleotides in a DNA string. The alteration causes a shift in the reading frame of the resulting mRNA transcription. Unless compensated for by some other mutation later in the string, all codons below the site of alteration are affected. As an example, consider the syntactic disturbance resulting from the deletion of the first uracil of the third codon in the following RNA string:

... UUU UCU UAU UGU ...
... UUU UCU AUU GU ...

Such a mutation is significant if it results in the production of some nonfunctional protein, an outcome that may or may not be deleterious.

Linguistic use also involves replication of syntax. The brain of the recipient must (or rather does in general) transcribe the product of the brain of the speaker in such a way that the syntax of the transcription matches the syntax of the produced speech. The speaker reinforces syntactic construal by prosodic presentation, guiding the hearer to the correct construal by appropriate variations of pitch contour, syllable lengthenings, and stress.
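To see the reading-frame disturbance in the RNA example mechanically, here is a minimal sketch (Python; the codons helper and the strings are our own illustration, not drawn from any genetics library): a single deletion shifts every downstream codon.

def codons(rna, frame=0):
    # Read an RNA string into successive three-letter codons from a given offset.
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]

original = "UUUUCUUAUUGU"
mutant = original[:6] + original[7:]  # delete the first uracil of the third codon

print(codons(original))  # ['UUU', 'UCU', 'UAU', 'UGU']
print(codons(mutant))    # ['UUU', 'UCU', 'AUU'] -- every codon below the deletion is altered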
Sometimes, however, that syntax is in some respects both indeterminate and immaterial, as is that of No trees have fallen over here, so that no disruption is occasioned if the speaker should report the claim as Over here no trees have fallen while the hearer would report it as Here no trees have fallen over9.
Sometimes the consequences become apparent in the later word-formation patterns of the language. Under this heading we may note the transitions from a numpire to an umpire, from an once word to a nonce word, from an ick name to a nickname, and from a naperon to an apron. Though we might be misled in our understanding of the origins of the newly formed word, it makes little practical difference that one word has replaced another. But these trivial mutations do share a feature worth noting with more interesting mutations: namely, that a mutation requires an incubation period in which it remains undetected and therefore uncorrected, and during which it gradually establishes itself within a critical proportion of a linguistic community. It is only after such a period that the capacity for correction is overwhelmed, and the new use can constitute a new transmitted orthodoxy. That it should take hold at all requires a degree of ignorance as to linguistic history, a condition that is generally met.

A theoretically more interesting form of mutation is one that also requires for its survival and propagation just the condition I have claimed to obtain generally in our use of language. That level of innocence is maintained in large part through semantic ignorance and in part through syntactic indeterminacy. In the case of connectives those two conditions are nurtured by the morphological reduction of the vocabulary from its original, recognizably lexical form, and by the scope-indeterminacies created by the presence of other sentence elements, notably negations and modal elements. Since our example is to be but, we might note the prevalence of negations (no one but, nothing but, and so forth) in the later prepositional uses of the particle, and in the earliest sentential uses, where modal auxiliaries are also found in negativizing roles. The following illustrate the patterns (with that's restored parenthetically):

I would have come earlier but (that) I was delayed.
He does not visit but (that) he begs money of me.

In the former the scope of the modal would have extends to the end of the sentence; likewise the negative does not in the second. Both are conditional in character (I would have come earlier had I not been delayed / If he visits, he begs money of me). Now suppose that (the that being absent) the audience takes the scope of the modal or the scope of the negation to extend only to the main clause. How does this syntactic misconstrual distinguish this hearer from the hearer whose construal retains long scope of the modal and the negation? There is sufficiently little difference between the construals that it could readily remain undetected. On the correct construal of the one hearer, a subordinate clause suggests that there was a delay in the one case, and suggests that the visitor did actually beg money in the other. On the incorrect construal of the other hearer, a second main clause asserts that there was a delay in the one case, and asserts that the visitor begs money in the other. There may be few occasions on which the hearers actually agree in their syntactic construal of similar constructions; nevertheless they might agree on the occasions in which the constructions are appropriately produced. It is this agreement among speakers as to occasions of use, and disagreement as to syntax, that characterizes the first stage of a mutation. The novel syntactic construal need not require some clearly enunciated reconstrual of the use of but.
There is in general no presumption that our interlocutor has a clear understanding of his vocabulary, only that the use is not wildly eccentric. So we ourselves need satisfy only that presumptive standard. This first stage of mutation, because the new use lies hidden beneath the old, could be called the succubinal stage. The concealment of the novel construal permits an increase in the subpopulation of language users for which the novel construal is the correct one. In the second, migratory stage, the new construal emerges from concealment into environments in which no scope ambiguity hides it. In the case of but, it now finds uses in negation-free constructions: He visited his brother regularly, but he always begged money. No doubt mavens of the old school object to this newly emerging use and decry it as nonsense; however, the use survives because, for a sufficient number of the linguistic community, it is the natural generalization from the use in negative environments. On the other hand, the generation for which there can be no such generalization is partly right. The construction is clearly conjunctive, by which is meant that either clause can be detached; however, the new generation would be hard-pressed to explain why but should have been selected rather than and. Nevertheless but inherits a niche from its earlier use, and although that niche has perhaps changed, it has distinctive
uses in present English speech10. For conservative speakers, but has competing readings in negative environments, and ill-read young or non-native English speakers can only with difficulty be persuaded that the proverb It doesn't rain but it pours is to be read as If it rains it pours. That new ambiguity in negative environments typically represents a third stage of mutation, in which the mutated reading gets marked in some such way as by the affixing of an extremal adverb. Thus we have, for example, just any, even if, for all, and others, to distinguish mutated uses from ancestral ones. By far the most common mutations yield conjunctive readings where there were only conditional ones, or conditional ones where there were only conjunctive ones. But and without (both, incidentally, descended from vocabulary of outsideness) afford examples of the former. (Contrast She'll die without medical help and She'll die without betraying her comrades.11) Unless affords an example of the latter. Some mutated connectives, such as but, are minimally or only prosodically marked; a comma suffices to mark the conjunctive reading in written English. Without is marked variously, but marked. In some cases, such as that of unless, the ancestral reading does not survive.

Canadian political English affords an instance that shares with but the feature that it is semantically difficult, and with unless the feature that the original is lost. The prime minister initiates a general election by drawing up a writ. The verb phrase draw up has been heard as drop, with the result that one now speaks of the PM "dropping the writ" rather than "drawing it up," and of the PM as having finally "dropped the writ" rather than as having "drawn it up." As to what the dropping consists in, no one is quite clear.

I have spoken of "stages of mutation," but these stages, like the generations that produce them, are convenient fictions. Populations change by proportionally minute additions and deletions, and stages emerge as populations of language users do. The result is that, understood population-wise, the syntax of a language is, at many points, indeterminate. Moreover a mere 35 independent points of difference would suffice to provide every human on the planet with a unique syntactic understanding of the English language (2³⁵ is more than 34 billion), and many more than 35 such points of difference are present among the English-speaking minority of the Earth's population. If the syntax of natural language is the language of thought, then the notion of a universally shared language of thought is simply a nonstarter. Even in a small linguistic community, there is no reason to suppose that all of its members work within the same set of syntactic generalizations, nor even that they operate with a syntax, as distinct from a set of merely familiar strings12.
Theory and Practice Again

The familiarity of strings depends upon instruction and memory, and both memory and instruction let us down. We can all offer examples from our own missteps, so the example that I quote will not be taken to disoblige its author. In fact he himself offers it in the course of making a point closely related to the general one that I am trying to convey. The author is M. C. Corballis (2002), distinguishing human skills at abstract and applied reasoning. His main point is well taken. The point makes reference to and adapts "a simple test of reasoning": the Wason13 test.

Suppose that you are shown four cards, bearing the symbols A, C, 22, and 17, and are told that there are also symbols on the other side of each card. You are then asked which two cards need to be turned over to check the truth of the following claim: "If a card has a vowel on one side, then it has an even number on the other side." If you're like most people, you'll choose the cards displaying the A (a vowel) and the 22 (an even number). It is indeed rational to turn over the A, but turning over the 22 is not really very revealing, since whatever is on the
other side cannot disconfirm the statement. The better strategy is to turn over the 17, because the presence of an A on the other side would falsify the statement. (Corballis, 2002, p. 97)

Corballis goes on to compare our ineptness at solving the abstract problem with our superior talents when the case is described more concretely:

But now suppose it is explained that the symbol A stands for ale and C for coke, and it is explained that there are people's ages on the other side. If asked which two cards to turn over to check the truth of the statement: "If a person is drinking ale, he or she must be over 20 years old," most people easily understand that the critical cards are those bearing A and 17.14 The task is formally the same as that involving the letters and digits as abstract symbols, but now the policeman in you readily understands that you should examine what the seventeen-year-old is drinking if you want to stamp out underage drinking.

Now it is by no means to the discredit of Corballis that I should offer this comparison as an excellent example of his own point. The concrete case does not parallel the more abstract one, and while his solution to the practical problem is indeed correct, his solution to the more abstract one is by no means so. In the abstract case we are told only that there is a symbol on each side of each card; in the practical one we are told that there are beverages on one side and people's ages on the other. With a little charity we can accept, in the concrete case, that no beverages are coded numerically, and that no ages are given in Roman numerals. Failure of that condition would indeed change the case. But in the abstract case, we must turn over three cards, not just two, for we are given no assurance that there is a numeral on the other side of the C card; it too must be flipped, to check whether its other side bears a vowel. In fact, if we take from his statement of the case only what can be strictly inferred, flipping the A card might falsify the claim in either of two ways. It would be falsified if there were no even number on the other side; but assuming that the side initially displayed contains only an A, the statement would also be falsified if on the reverse there were an even number and a vowel.

Again, to point this out is to say nothing to the discredit of the author. He readily acquiesces in the view of Tooby and Cosmides15 that "most of us are poor logicians." We are poor logicians because we are not trained in logic. We fare ill in Wason-like tests because we are not practiced in Wason-like tests. Even if, as De Morgan maintained, the study of logic tends to reduce the difference in difficulty between the familiar and the novel, and even if logic has found its own technological sphere of supremacy, logic is itself unnatural. An ability to extend sequences of sentences to proofs by the application of available rules is not what makes clever people clever. To be sure, clever people learn logic with ease, but that is because the extensions of sequences of sentences to proofs provide an uncluttered arena for their cleverness. The steps in a proof do not themselves require extensions of sequences of sentences behind the scenes. Sequent-introduction, the formal practice that most closely approximates this model, does not depend upon a hierarchy of meta-sequent-introductions.
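Returning to the abstract card case: the three-card analysis can be checked mechanically. The following sketch (Python, our own illustration) assumes one symbol per face, with hidden faces drawn from the same four symbols, and so does not model the further complication of two symbols on one face. A card must be flipped exactly when some possible hidden face would make it falsify the claim.

VOWELS = set("AEIOU")

def is_vowel(face):
    return isinstance(face, str) and face in VOWELS

def is_even_number(face):
    return isinstance(face, int) and face % 2 == 0

def falsifies(face1, face2):
    # The claim 'a vowel on one side implies an even number on the other'
    # fails when either orientation pairs a vowel with a non-even-number face.
    return (is_vowel(face1) and not is_even_number(face2)) or \
           (is_vowel(face2) and not is_even_number(face1))

symbols = ["A", "C", 22, 17]
for visible in symbols:
    must_flip = any(falsifies(visible, hidden) for hidden in symbols)
    print(visible, "must be flipped" if must_flip else "cannot disconfirm the claim")
# Prints that A, C, and 17 must be flipped; only 22 cannot disconfirm the claim.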
Whence Reasoning?

Where then, in our evolutionary development, do our practical reasoning skills come from? Were we looking for a promising clue, we could do worse than to resurrect from the dust the writings of Augustus De Morgan, and browse through his lectures on the syllogism. In his Cambridge lectures, De Morgan complained of traditional syllogistic that the only relation it was prepared to accept was that associated with the copula verb, and that all other relational terms must be relegated to the predicate. So, for example, the syllogistic demands that we represent the sentence [Three] is greater than [two] as
All [three] is [a thing greater than two].

But then the evidently valid argument

4 > 3; 3 > 2; ∴ 4 > 2

must be accounted syllogistically invalid, since when, as the tradition requires, we give the copula its accustomed pre-eminence, the resulting representation commits the syllogistic fallacy of introducing four terms: four, things greater than three, three, and things greater than two. De Morgan did not quite have the means to make his point as we would now most naturally put it, but his intention was clear enough. What makes syllogistic work is not the particular relation that the copula represents, but rather some assumed formal properties of that relation. From our vantage point, we see that there are really two relations: that of inclusion, and that of set membership. So, to consider an example, in general the first-figure syllogism in Barbara is valid in virtue of the transitivity of set inclusion, but in the special case in which the subject term is singular, the validating property can also be understood as the monotonicity of membership along inclusion. To say that a relation R is monotonic along a relation S is to say that for any individuals x, y, and z, if Rxy and Syz, then Rxz. (Thus to say that a binary relation R is transitive is to say that it is monotonic along itself.) Membership is monotonic along inclusion, because for any individual x, and any sets a and b, if x ∈ a and a ⊆ b, then x ∈ b. Contains is monotonic along identity, because for any individuals x and y, and any set a, if x ∈ a and x = y, then y ∈ a.

De Morgan preferred the language of composition. The composition R ∘ S of two binary relations R and S is the set of pairs {⟨x, y⟩ | ∃z : Rxz ∧ Szy}. To say that R is monotonic along S is to say that R ∘ S ⊆ R. Thus, properly understood, a first-figure syllogism in Barbara in which both the minor and the middle term are singular owes its validity to that monotonicity. The language of syllogistic made no special use of these distinctions, the undifferentiated copula serving adequately for its rather limited purposes. However, De Morgan observed, seen from the vantage point of sufficient mathematical abstraction, the first-figure syllogism in Barbara shares a valid form with the greater than argument cited, and with some arguments in which distinct premises introduce distinct relations, provided that the relations bear to one another the right second-order relationships (inclusion, monotonicity, and so on) in the right combinations. First-figure Barbara in which the minor and middle terms are singular shares a valid form with

y < z; x = y; ∴ x < z.

De Morgan's exposure of the traditionalists was intended in part as a demonstration of the uselessness of syllogistic as an instrument of scientific investigation and of human reasoning more generally. He likened the syllogism to an ornamental cannon: "out of mathematics, nearly all the writing is spent in loading the syllogism, and very little in firing it."16 It is easy to see why De Morgan would have regarded his own proposal as an improvement on this score. Insofar as explanation involves decomposing causal relationships into spatiotemporal component relationships, a general theory of relations and their compositions would provide a general mathematical theory of scientific explanation.
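De Morgan's definitions are easily made computational. A small sketch (Python; the toy universe is our own illustration) represents relations as sets of pairs, composes them as defined above, and confirms the two monotonicities just cited.

def compose(R, S):
    # R o S = {(x, y) | there is a z with (x, z) in R and (z, y) in S}.
    return {(x, y) for (x, z) in R for (w, y) in S if z == w}

def monotonic_along(R, S):
    # R is monotonic along S iff R o S is a subset of R.
    return compose(R, S) <= R

# Toy universe: individuals 1, 2, 3 and the sets a = {1}, b = {1, 2}.
a, b = frozenset({1}), frozenset({1, 2})
membership = {(x, s) for s in (a, b) for x in s}                # pairs x ∈ s
inclusion = {(s, t) for s in (a, b) for t in (a, b) if s <= t}  # pairs s ⊆ t
less_than = {(x, y) for x in (1, 2, 3) for y in (1, 2, 3) if x < y}

print(monotonic_along(membership, inclusion))  # True: x ∈ a and a ⊆ b give x ∈ b
print(monotonic_along(less_than, less_than))   # True: < is monotonic along itself, i.e. transitive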
But although he saw the possibility of composing relations in arguments, as in

Lucas is a son of Laurie; Laurie is a sister of Alison; Martyn is a son of Alison; therefore, Lucas is a cousin of Martyn,

he made no special plea on behalf of decomposable relations, such as those of cousin, nephew, and niece, for purposes of detailed understanding. His point is the negative one that inclusion relations between classes of items, whether objects or moments or states, relations that depend upon identities between elements, provide inadequate resources. Part of the reason is that identity, as deployed in the definition of inclusion, does not allow for predictable transitions of states or for regularity of change. On De Morgan's account,
it is to Algebra that we must look for the most habitual use of logic forms … . Here the general idea of relation emerges, and for the first time in the history of knowledge, the notion of relation and relation of relation are symbolized. And here again is seen the scale of gradations of form, the manner in which what is difference of form at one step of the ascent, is difference of matter at the next. It will hereafter be acknowledged that … the algebraist was living in the higher atmosphere of syllogism, the unceasing composition of relation, before it was admitted that such an atmosphere existed. (p. 241)

De Morgan's insight deserves to be better known and its importance more widely acknowledged. The reason is this: the algebraic character of human reasoning is the product of the algebraic character of its evolution. The distinction between vocabulary of matter and vocabulary of form, if it exists, must itself be an evolved distinction, since all linguistic organisms are descended from nonlinguistic ones. Moreover the process of functionalization of which I have earlier sketched an account must itself be descended from an ancestral process, through some higher-order development that produced the one kind of change from the other. Again, this follows directly from our having had non-linguistic ancestors. Whatever process it is that alters the significance and morphology of vocabulary has descended from a process that altered the (ancestor of) significance and (the ancestor of) morphology of nonlinguistic physical interventions. Each accession to a higher order requires a longer time frame for illustration. So the number of degree-raising iterations (changes in changes in changes, and so forth) that we can usefully make is strictly limited, and in any case beyond my expository skills. But if we are looking for the origins of these peculiarly human skills of reasoning, then we must certainly look to prelinguistic conditions, perhaps to biological conditions that ground the earliest developments toward language itself. Certainly we can find purely biological monotonicities that would have offered exploitable traits in the ancestry of referentiality, and they can serve as sufficient illustration of the more general point. One such trait is the evolved response of fixing in foveal attention and tracking singular motion detected within the visual field.

The story is a complicated one. In the nature of things, we cannot have a detailed understanding, but we can make intelligent inferences that accord with the outcome and with the anthropological evidence17. It begins with the development of bipedalism, the consequent narrowing of the pelvis, with changes in the shape and size of the birth canal that resulted in soft-cranial births and postpartum encephalization, with the increase in the degree of cranial flexion that grew out of the requirements of balancing a head on an upright body, and with the slow recession of the prognathous jaw, rendered superfluous by the advent of hand-held tools. Early in these developments the forelimbs became available for non-locomotive uses, among them the articulated ballistic motions of tool making and particularly of throwing. Again, with these developments came a brain free to develop in size and functional constitution to organize and initiate such motions.
An early legacy of that cluster of developments, together with the more primitive tracking response, was the capacity to track thrown objects, and, one may safely presume, the capacity to occasion tracking by throwing. Now a developing capacity for tracking intermittently visible motion establishes a monotonicity between two relations: the graph of successive vectors representing the direction of gaze, and the graph of the trajectory of the tracked motion. The monotonicity is represented by the accurately anticipated reappearances of intermittently visible objects. At its most highly developed, it would permit the accurate anticipation of a trajectory solely from the perceived character of the articulated ballistic motion that produced it. With this anticipation comes the increased capacity of articulated motion, even when disarmed, to direct attention accurately to a distant location, even one beyond dead ground. Finally, that ability to anticipate is exploited through ballistic motion now disarmed of any ballistic outcome, permitting the motion to acquire a purely deictic significance. On such an account, one can usefully see the development as an (ancestral) functionalization of throwing. The disambiguation of the functionalized form evolves as a natural consequence of the disarming: like the morphologically reduced profile of functionalized vocabulary, it would have evolved into an evidently low-energy version of a comparatively high-energy ancestral ballistic intervention.

The evidence of neocortical architecture suggests strongly that the thinking of a thought, and the production of one particular phonemic stream rather than another, is the outcome of a very fast Darwinian competition for limited workspace in neocortex (Calvin, 1996a, b). The workspace itself is the outcome of a very long evolutionary development, orders of magnitude longer than any particular linguistic functionalization. Indeed what seems evident is that the evolved linguistic brain is a brain capable of contributing to functionalizations
through abstractive innovations. Consider with what ease we understand the sentence He won't pass through the village without I see him. Moreover there is no reason to suppose that successive functionalizations produce functionally equivalent connective vocabulary. That notion is a product of the semantic illusion18. It follows that what evolved brains of temporally close epochs have in common is the command of presumably similar spatial and other physical relationships from which functional vocabulary develops. That is to say that a more plausible neural ground for individual human inferential capacities is the neural ground of our capacities to register and react to monotonicity and compositional relationships between physical relations. Linguistically, our capacities to reason with physical vocabulary provide the biologically relevant measure of brainpower, not our comparatively feeble logical abilities.
Conclusion

We have all judged of this or that charlatan that he does not know what he is talking about. And we have all marvelled at that plausible fluency to which ignorance is no impediment. Such remarks are reserved for the mischievous and the insidious, but at quite an ordinary level the same has to be true of everyone who speaks a language. We need no understanding beyond conversational fluency to engage in conversation in a natural language. Since conversational fluency is what is transmitted from one generation of language users to the next, the most that we are guaranteed is whatever is necessary for that transmission. The difference between the ordinary speaker and the charlatan is that the charlatan lays claim to a level of understanding that he, unlike others, has not attained. The speaker's naïve confidence, though demonstrably unwarranted, is not mischievous; he is merely caught up in a compelling illusion that almost all speakers tacitly share. The mischief arises when the illusion is given charge of theory: linguistic theory, logical theory, cognitive theory, or any other.

One certain theoretical way out of the illusion is to ground cognitive theory in the physicality, and more particularly in the biological character, of language. What does this amount to? No one to whom the fact is pointed out will disagree that language is primarily a biological phenomenon. The difficulty lies not in acceding to the fact, but in troubling to work to the theoretical standards that the fact imposes. This requires more than verbal acceptance; it requires a commitment to understanding language in the light of linguistic data: not just data pertaining to how we find ourselves speaking, but the data that reveal how it comes about that what we say has the physical significance that it has. One consequence of this imposition is particularly unwelcome in some academic quarters: it is likely that in some tolerated intellectual pursuits, the worse one's biological understanding of language, the sillier one's theories.
References

Boole, G. (1854). An investigation of the laws of thought, on which are founded the mathematical theories of logic and probabilities. New York: Dover Publications, Inc.

Braine, M. D. S., & Rumain, B. (1981). Development of comprehension of "or": Evidence for a sequence of competencies. Journal of Experimental Child Psychology, 31, 46-70.

Calvin, W. H. (1996a). How brains think: Evolving intelligence, then and now. New York: Basic Books.

Calvin, W. H. (1996b). The cerebral code: Thinking a thought in the mosaics of the mind. Cambridge, MA: MIT Press.

Calvin, W. H., & Bickerton, D. (2000). Lingua ex machina: Reconciling Darwin and Chomsky with the human brain. Cambridge, MA: MIT Press.

Corballis, M. C. (2002). From hand to mouth: The origins of language. Princeton/Oxford: Princeton University Press.
Cox, J. R., & Griggs, R. A. (1982). The effects of experience on performance in Wason's selection task. Memory and Cognition, 10, 496-503.

Han, C.-h., Lidz, J., & Musolino, J. (2003). Verb-raising and grammar competition in Korean: Evidence from negation and quantifier scope. Unpublished manuscript, Simon Fraser University, Northwestern University, Indiana University.

Han, C.-h., Ryan, D., Storoshenko, S., & Yasuko, S. (in press). Scope of negation and clause structure in Japanese. In Proceedings of the 30th Berkeley Linguistics Society.

Heath, P. (Ed.). (1966). On the syllogism and other logical writings by Augustus De Morgan. New Haven: Yale University Press.

Jennings, R. E. (1994). The genealogy of disjunction. New York: Oxford University Press.

Jennings, R. E. (2004). The meaning of connectives. In S. Davis & B. Gillon (Eds.), Semantics: A reader. New York: Oxford University Press.

Jennings, R. E. (2005, in press). The semantic illusion. In A. Irvine & K. Peacock (Eds.), Errors of reason. Toronto: University of Toronto Press.

Jennings, R. E., & Friedrich, N. A. (2006). Proof and consequence: An introduction to classical logic. Peterborough: Broadview Press.

Jennings, R. E., & Schapansky, N. (2000). Without: From separation to negation, a case study in logicalization. In Proceedings of the CLA 2000 (pp. 147-158). Ottawa: Cahiers Linguistiques d'Ottawa.

Lieberman, P. (1984). The biology and evolution of language. Cambridge, MA: Harvard University Press.

Lieberman, P. (1991). Uniquely human: The evolution of speech, thought, and selfless behavior. Cambridge, MA: Harvard University Press.

Lieberman, P. (2000). Human language and our reptilian brain: The subcortical bases of speech, syntax and thought. Cambridge, MA: Harvard University Press.

Wason, P. (1966). Reasoning. In B. M. Foss (Ed.), New horizons in psychology (pp. 135-151). London: Penguin.
Endnotes

1. I appropriate biological terminology here and there throughout; my use of biological terms is not metaphorical. Minimal generalizations are required to capture common features of organic and linguistic biology, and in some cases (reading frames and syntax, for example) biology appropriates linguistic vocabulary with comparatively greater violence. The biology of language has closer kinship to biology than the linguistics of biology has to linguistics.
2. The emphasis is mine.
3. Paul Grice (1913-1988) is another.
4. Braine and Rumain, 1981.
5. Notice this interrogative use of or.
6. Compare the introduction and elimination rules of natural deduction, which we understand purely proof-theoretically.
7. See Jennings, 1994 for a wide survey.
8. Abaptation in language use is akin to exaptation in evolutionary biology except that in the former case, an earlier function is exploited in a later analogous one, whereas in exaptation a morphological feature adapted to one function adapts to later functions in a disanalogous environment.
9. The example is due to Mary Shaw.
10. I say no more on the subject of its present use, as it is more fully discussed elsewhere, notably in Jennings (2004). The reader may, of course, have views on the subject. I remark only that any explanation of the role of but must apply to such constructions as "He got here, but he got here late," in which the second clause implies the first, and to "He got here late, but he got here," in which the first clause implies the second. The explanation must also tell us why the construction is noncommutative; that is, why the two quoted sentences are not conversationally equivalent.
11. For a fuller account of mutations in the history of without, as well as examples of mutations in Breton, see Jennings and Schapansky (2000).
12. Examples of large-scale syntactic schism in Korean and Japanese are studied by ….. in ……
13. Wason (1966). [Note that, unlike Corballis' version, the Wason test specifies a letter on one side and a numeral on the other.]
14. Example adapted from Cox and Griggs (1982).
15. Tooby and Cosmides (1989).
16. In "On the Syllogism: IV and on the Logic of Relations" in Heath (1966, p. 239).
17. For more detailed accounts of these developments, see Lieberman 1984, 1991, 2000, Calvin 1996a, b, and Calvin and Bickerton 2000.
18. See Jennings (2005) for a more detailed exposition.
This work was previously published in International Journal of Cognitive Informatics and Natural Intelligence, Vol. 1, Issue 1, edited by Y. Wang, pp. 66-78, copyright 2007 by IGI Publishing (an imprint of IGI Global).
Chapter IX
The Cognitive Process of Decision Making

Yingxu Wang
University of Calgary, Canada

Guenther Ruhe
University of Calgary, Canada
Abstract

Decision making is one of the basic cognitive processes of human behaviors by which a preferred option or a course of actions is chosen from among a set of alternatives based on certain criteria. Decision theories are widely applied in many disciplines encompassing cognitive informatics, computer science, management science, economics, sociology, psychology, political science, and statistics. A number of decision strategies have been proposed from different angles and application domains, such as the maximum expected utility and Bayesian method. However, there is still a lack of a fundamental and mathematical decision model and a rigorous cognitive process for decision making. This chapter presents a fundamental cognitive decision making process and its mathematical model, which is described as a sequence of Cartesian-product-based selections. A rigorous description of the decision process in Real-Time Process Algebra (RTPA) is provided. Real-world decisions are perceived as a repetitive application of the fundamental cognitive process. The result shows that all categories of decision strategies fit in the formally described decision process. The cognitive process of decision making may be applied in a wide range of decision-based systems, such as cognitive informatics, software agent systems, expert systems, and decision support systems.
INTRODUCTION

Decision making is a process that chooses a preferred option or a course of actions from among a set of alternatives on the basis of given criteria or strategies (Wilson and Keil, 2001; Wang et al., 2004). Decision making is one of the 37 fundamental cognitive processes modeled in the Layered Reference Model of the Brain (LRMB) (Wang et al., 2004; Wang, 2007b). The study of decision making is of interest to multiple disciplines, such as cognitive informatics, cognitive science, computer science, psychology, management science, decision science, economics, sociology, political science, and statistics (Wald, 1950; Berger, 1990; Pinel, 1997; Matlin, 1998; Payne
and Wenger, 1998; Edwards and Fasolo, 2001; Hastie, 2001; Wilson and Keil, 2001; Wang et al., 2004). Each of those disciplines has emphasized a special aspect of decision making. It is recognized that there is a need to seek an axiomatic and rigorous model of the cognitive decision-making process in the brain, which may serve as the foundation of various decision making theories.

Decision theories can be categorized into two paradigms: the descriptive and normative theories. The former is based on empirical observation and on experimental studies of choice behaviors; the latter assumes a rational decision-maker who follows well-defined preferences that obey certain axioms of rational behaviors. Typical normative theories are the expected utility paradigm (Osborne and Rubinstein, 1994) and the Bayesian theory (Berger, 1990; Wald, 1950). W. Edwards developed a 19-step decision making process (Edwards and Fasolo, 2001) by integrating Bayesian and multi-attribute utility theories. W. Zachary and his colleagues (Zachary et al., 1982) perceived that there are three constituents in decision making known as the decision situation, the decision maker, and the decision process. Although the cognitive capacities of decision makers may vary greatly, the core cognitive processes of the human brain share similar and recursive characteristics and mechanisms (Wang, 2003a; Wang and Gafurov, 2003; Wang and Wang, 2004; Wang et al., 2004).

This chapter adopts the philosophy of the axiom of choice (Lipschutz, 1967). The three essences for decision making recognized in this chapter are the decision goals, a set of alternative choices, and a set of selection criteria or strategies. According to this theory, decision makers are the engine or executive of a decision making process. If the three essences of decision making are defined, a decision making process may be rigorously carried out by either a human decision maker or by an intelligent system. This is a cognitive foundation for implementing expert systems and decision support systems (Ruhe, 2003; Ruhe and An, 2004; Wang et al., 2004; Wang, 2007a).

In this chapter, the cognitive foundations of decision theories and their mathematical models are explored. A rigorous description of decisions and decision making is presented. The cognitive process of decision making is explained and formally described using Real-Time Process Algebra (RTPA). The complexity of decision making in real-world problems such as software release planning is studied, and the need for powerful decision support systems is discussed.
A MATHEMATICAL MODEL OF DECISION AND DECISION MAKING

Decision making is one of the fundamental cognitive processes of human beings (Wang et al., 2004; Wang, 2007a; Wang, 2007b) that is widely used in determining rational, heuristic, and intuitive selections in complex scientific, engineering, economical, and management situations, as well as in almost every procedure of daily life. Since decision making is a basic mental process, it occurs every few seconds in the thinking course of the human mind, consciously or subconsciously. This section explores the nature of selection, decision, and decision making, and their mathematical models. A rigorous description of decision making and its strategies is developed.
The Mathematical Model of Decision Making

The axiom of selection (or choice) (Lipschutz, 1967) states that there exists a selection function for any nonempty collection of nonempty disjoint sets of alternatives.

Definition 1. Let {Ai | i ∈ I} be a collection of disjoint sets, Ai ⊆ U and Ai ≠ ∅. A function

c: {Ai} → Ai, i ∈ I    (1)

is a choice function if c(Ai) = ai, ai ∈ Ai; that is, an element ai ∈ Ai may be chosen by c, where Ai is called the set of alternatives, U the universal set, and I a set of natural numbers.

On the basis of the choice function and the axiom of selection, a decision can be rigorously defined as follows.
Definition 2. A decision, d, is a selected alternative a ∈ 𝒜 from a nonempty set of alternatives 𝒜, 𝒜 ⊆ U, based on a given set of criteria C, i.e.:

d = f(𝒜, C) = f: 𝒜 × C → 𝒜, 𝒜 ⊆ U, 𝒜 ≠ ∅    (2)

where × represents a Cartesian product.

It is noteworthy that a criterion in C can be a simple one or a complex one; the latter is the combination of a number of joint criteria depending on multiple factors.

Definition 3. Decision making is a process of decision selection from available alternatives against the chosen criteria for a given decision goal.

According to Definition 2, the number of possible decisions, n, can be determined by the sizes of 𝒜 and C, i.e.:

n = #𝒜 • #C    (3)

where # is the cardinal calculus on sets and 𝒜 ∩ C = ∅. According to Equation 3, in case #𝒜 = 0 and/or #C = 0, no decision may be derived.

The above definitions provide a generic and fundamental mathematical model of decision making, which reveals that the factors determining a decision are the alternatives 𝒜 and the criteria C for a given decision-making goal. A unified theory of fundamental and cognitive decision making can be developed on the basis of the axiomatic and recursive cognitive process elicited from the most fundamental decision-making categories, as shown in Table 1.
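As a minimal computational sketch of Definitions 1 to 3 (Python; the route example, the criteria, and the additive scoring are our own illustration, not part of the formal RTPA model), a decision is a selection from a nonempty set of alternatives against a nonempty set of criteria:

def decide(alternatives, criteria):
    # d = f(A, C): select the alternative that best satisfies the given criteria.
    if not alternatives or not criteria:
        raise ValueError("no decision derivable: #A = 0 or #C = 0")  # cf. Equation 3
    # Evaluate the pairs of the Cartesian product A x C and aggregate per alternative.
    return max(alternatives, key=lambda a: sum(score(a) for score in criteria.values()))

# Hypothetical example: choosing a route under two joint criteria.
routes = {"highway": {"time": 30, "toll": 5}, "backroad": {"time": 45, "toll": 0}}
criteria = {
    "fast":  lambda r: -routes[r]["time"],  # shorter travel time scores higher
    "cheap": lambda r: -routes[r]["toll"],  # lower toll scores higher
}
print(decide(list(routes), criteria))  # 'highway': aggregate score -35 beats -45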
Strategies and Criteria of Decision Making

According to Definition 2, the outcomes of a decision making process are determined by the decision-making strategies selected by decision makers once a set of alternative decisions has been identified. It is obvious that different decision making strategies require different decision selection criteria. There is a great variety of decision-making strategies developed in traditional decision and game theories, as well as in cognitive science, system science, management science, and economics. The taxonomy of strategies and corresponding criteria for decision making can be classified into four categories known as intuitive, empirical, heuristic, and rational, as shown in Table 1.

It is noteworthy in Table 1 that the existing decision theories provide a set of criteria (C) for evaluating alternative choices for a given problem. As summarized in Table 1, the first two categories of decision making, intuitive and empirical, are in line with human intuitive cognitive psychology, and there is no specific rational model for explaining those decision criteria. The rational decision making strategies can be described in two subcategories: the static and the dynamic strategies and criteria. The heuristic decision-making strategies are frequently used by human beings as decision makers. Details of the heuristic decision-making strategies may be found in cognitive psychology and AI (Matlin, 1998; Payne and Wenger, 1998; Hastie, 2001; Wang, 2007a).

It is interesting to observe that the simplest decision making theories belong to the intuitive category, such as arbitrary and preference choices based on personal propensity, hobby, tendency, expectation, and/or common sense. That is, even a naïve decision maker may still be able to make important and perhaps wise decisions every day, even every few seconds. Therefore, the most fundamental and core process of decision making shared among human cognitive processes is elicited in the following sections. Recursive applications of this core process of decision making will be helpful in solving complicated decision problems in the real world.
The Framework of Rational Decision Making

According to Table 1, rational and complex decision making strategies can be classified into the static and dynamic categories. Most existing decision-making strategies are static, because the environments of decision makers change independently of the decision makers' activities. Also, different decision strategies may be selected in the same situation or environment, based on the decision makers' values and attitudes towards risk and their predictions of future outcomes. When the environment of a decision maker is interactive with his or her decisions, or the environment changes according to the decision maker's activities while the decision strategies and rules are predetermined, decision making falls into the category of dynamic decisions, such as games and decision grids (Pinel, 1997; Matlin, 1998; Payne and Wenger, 1998; Wang, 2005a, b).

Definition 4. The dynamic strategies and criteria of decision making are those in which all alternatives and criteria depend on both the environment and the effects of the historical decisions made by the decision maker.

Classic dynamic decision making methods are decision trees (Edwards and Fasolo, 2001). A new theory of decision grids is developed in Wang (2005a, b) for serial decision making. Decision making under interactive events and competition is modeled by games (von Neumann and Morgenstern, 1980; Matlin, 1998; Payne and Wenger, 1998; Wang, 2005a). Wang (2005a) presents a formal model of games, which rigorously describes the architecture or layout of games and their dynamic behaviors.

An overview of the classification of decisions and related rational strategies is provided in Figure 1. It can be seen that games are used to deal with the most complicated decision problems, which are dynamic, interactive, and under uncontrollable competition. Further discussion of game theories and their formal models may be found in (von Neumann and Morgenstern, 1980; Berger, 1990; Wang, 2005a, b).
Figure 1. A framework of decisions and strategies. Decision models split into static decisions and dynamic decisions. Static decisions branch on whether outcomes are certain and whether probabilities are predictable: decision under certainty (maximum profit, minimum cost), decision under risk (maximum expected utility, maximax utility probability), and decision under uncertainty (optimistic: maximax profit, minimin cost; pessimistic: maximin profit, minimax cost; minimum regret: minimax regret). Dynamic decisions cover decision series, decisions with interactive events (event-driven automata, decision grids), and decisions with competition (formal games, zero sum and nonzero sum).
Table 1. Taxonomy of strategies and criteria for decision making

| No. | Category | Strategy | Criterion (C) |
|-----|-----------|----------------------------|--------------------------------------------------------------|
| 1 | Intuitive | 1.1 Arbitrary | Based on the easiest or most familiar choice |
| | | 1.2 Preference | Based on propensity, hobby, tendency, expectation |
| | | 1.3 Common senses | Based on axioms and judgment |
| 2 | Empirical | 2.1 Trial and error | Based on exhaustive trial |
| | | 2.2 Experiment | Based on experiment results |
| | | 2.3 Experience | Based on existing knowledge |
| | | 2.4 Consultant | Based on professional consultation |
| | | 2.5 Estimation | Based on rough evaluation |
| 3 | Heuristic | 3.1 Principles | Based on scientific theories |
| | | 3.2 Ethics | Based on philosophical judgment and belief |
| | | 3.3 Representative | Based on common rules of thumb |
| | | 3.4 Availability | Based on limited information or local maximum |
| | | 3.5 Anchoring | Based on presumption or bias and their justification |
| 4 | Rational | 4.1 Static | |
| | | 4.1.1 Minimum cost | Based on minimizing energy, time, money |
| | | 4.1.2 Maximum benefit | Based on maximizing gain of usability, functionality, reliability, quality, dependability |
| | | 4.1.3 Maximum utility | Based on cost-benefit ratio |
| | | 4.1.3.1 – Certainty | Based on maximum probability, statistic data |
| | | 4.1.3.2 – Risks | Based on minimum loss or regret |
| | | – Uncertainty | |
| | | 4.1.3.3 – Pessimist | Based on maximin |
| | | 4.1.3.4 – Optimist | Based on maximax |
| | | 4.1.3.5 – Regretist | Based on minimax of regrets |
| | | 4.2 Dynamic | |
| | | 4.2.1 Interactive events | Based on automata |
| | | 4.2.2 Games | Based on conflict |
| | | 4.2.2.1 – Zero sum | Based on ∑ (gain + loss) = 0 |
| | | 4.2.2.2 – Non zero sum | Based on ∑ (gain + loss) ≠ 0 |
| | | 4.2.3 Decision grids | Based on a series of choices in a decision grid |
Decision models may also be classified from other points of view, such as structures, constraints, degrees of uncertainty, clearness and scopes of objectives, difficulties of information processing, degrees of complexity, utilities and beliefs, ease of formalization, time constraints, and uniqueness or novelty.
Typical Theories of Decision Making

Decision making is the process of constructing the choice criteria (or functions) and strategies and using them to select a decision from a set of possible alternatives. In this view, existing decision theories are about how a choice function may be created for finding a good decision. Different decision theories provide different choice functions. The following are examples from some of the typical decision paradigms shown in Table 1.
(a) The Game Theory

In game theory (Osborne and Rubinstein, 1994), a decision problem can be modeled as a triple, i.e.:

d = (Ω, C, 𝒜)    (4)

where Ω is a set of possible states of the nature, C is a set of consequences, and 𝒜 is a set of actions, 𝒜 ⊆ C^Ω. If an action a ∈ 𝒜 is chosen and the prevailing state is ω ∈ Ω, then a certain consequence a(ω) ∈ C is obtained. Assuming that a probability estimation and a utility function are defined for a given action a as p(a): Ω → ℜ and u: C → ℜ, respectively, a choice function based on the utility theory can be expressed as follows:

d = { a | ∑_Ω u(a(ω)) p(a) = max_{x∈𝒜} ( ∑_Ω u(x(ω)) p(x) ) }    (5)
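To make the choice function of Eq. 5 concrete, the following is a minimal Python sketch under a hypothetical two-state setup; the states, probabilities, actions, and utilities are all invented for illustration, and the probability table is read as a distribution over states rather than over actions.

```python
# A toy instance of the expected-utility choice function (Eq. 5); all values
# are hypothetical.

states = ["sunny", "rainy"]                         # Ω: possible states of nature
p = {"sunny": 0.7, "rainy": 0.3}                    # probability estimation over Ω
actions = {                                         # actions map states to consequences
    "picnic": {"sunny": "great day", "rainy": "soaked"},
    "cinema": {"sunny": "pleasant", "rainy": "pleasant"},
}
u = {"great day": 10, "soaked": -5, "pleasant": 4}  # utility function u: C -> R

def expected_utility(a):
    """Sum over Ω of u(a(ω)) p(ω) for one action a."""
    return sum(u[actions[a][w]] * p[w] for w in states)

# The choice function selects the action(s) with maximal expected utility.
best = max(expected_utility(a) for a in actions)
d = [a for a in actions if expected_utility(a) == best]
print(d)  # ['picnic']: 0.7*10 + 0.3*(-5) = 5.5 beats 4.0 for 'cinema'
```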
(b) The Bayesian Theory

In Bayesian theory (Wald, 1950; Berger, 1990), the choice function is called a decision rule. A loss function, L, is adopted to evaluate the consequence of an action as follows:

L: Ω × 𝒜 → ℜ    (6)

where Ω is a set of all possible states of nature, 𝒜 is a set of actions, and Ω × 𝒜 denotes their Cartesian product. Using the loss function for determining possible risks, a choice function for decision making can be derived as follows:

d = { a | p[L(ω, a)] = min_{x∈𝒜} ( p[L(ω, x)] ) }    (7)
where p[L(ω, x)] is the expected probability of loss for action x on ω ∈ Ω. Despite their different representations, the utility theory and the Bayesian theory provide alternative decision-making criteria from different angles, where loss in the latter is equivalent to negative utility in the former. Therefore, it may be perceived that a decision maker who uses the utility theory is seeking optimistic decisions, while a decision maker who uses the loss- or risk-based theory is seeking pessimistic or conservative decisions.
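For comparison, a companion sketch of the Bayesian decision rule of Eq. 7, again with hypothetical numbers: taking the loss as negative utility, minimizing expected loss selects the same action as maximizing expected utility, mirroring the equivalence noted above.

```python
# A toy instance of the minimum-expected-loss decision rule (Eq. 7), with
# L(ω, a) = -u(a(ω)); all values are hypothetical.

states = ["sunny", "rainy"]
p = {"sunny": 0.7, "rainy": 0.3}                  # distribution over Ω
payoff = {"picnic": {"sunny": 10, "rainy": -5},   # u(a(ω)) folded into one table
          "cinema": {"sunny": 4, "rainy": 4}}

def expected_loss(a):
    # E[L(ω, a)] over Ω, with the loss taken as negative utility
    return sum(-payoff[a][w] * p[w] for w in states)

d = min(payoff, key=expected_loss)
print(d, expected_loss(d))  # picnic -5.5: least expected loss = greatest utility
```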
Figure 2. Relationships between the decision-making process and other processes in LRMB (the Problem-Solving, Comprehension, Qualification, Quantification, Search, Representation, and Memorization processes).
THE PROCESS OF DECISION MAKING

The LRMB model has revealed that there are 37 interacting cognitive processes in the brain (Wang et al., 2004). Relationships between the decision-making process and other major processes in LRMB are shown in Figure 2. Figure 2 indicates that, according to UML semantics, the decision-making process inherits the Problem-Solving process. At the other end, it functions by aggregation of, or is supported by, the Layer 6 processes of Comprehension, Qualification, and Quantification, as well as the Layer 5 processes of Search, Representation, and Memorization. Formal descriptions of these related cognitive processes in LRMB may be found in (Wang, 2003b; Wang and Gafurov, 2003; Wang et al., 2003, 2004). In contrast to the traditional container metaphor, the human memory mechanism can be described by a relational metaphor, which perceives that memory and knowledge are represented by the connections between neurons in the brain, rather than by the neurons themselves as information containers. Therefore, the cognitive model of human memory, particularly the Long-Term Memory (LTM), can be described by two fundamental artifacts (Wang et al., 2003): (a) Objects: the abstractions of external entities and internal concepts. There are also sub-objects known as attributes, which are used to denote detailed properties and characteristics of an object. (b) Relations: connections and relationships between object and object, object and attribute, and attribute and attribute. Based on the above discussion, an Object-Attribute-Relation (OAR) model of memory can be described as a triple (Wang and Wang, 2004; Wang et al., 2003), i.e.:

OAR = (O, A, R)    (8)
where O is a given object identified by an abstract name, A is a set of attributes for characterizing the object, and R is a set of relations between the object and other objects or their attributes. On the basis of the LRMB and OAR models developed in cognitive informatics (Wang, 2003a, 2007b), the cognitive process of decision making may be informally described by the following courses:
1. To comprehend the decision-making problem, and to identify the decision goal in terms of an object (O) and its attributes (A).
2. To search in the abstract layer of LTM (Squire et al., 1993; Wang & Wang, 2004) for alternative solutions (𝒜) and criteria for useful decision strategies (C).
3. To quantify 𝒜 and C, and determine whether the search should go on.
4. To build a set of decisions by using 𝒜 and C as obtained in the above searches.
5. To select the preferred decision(s) on the basis of the decision maker's satisfaction.
6. To represent the decision(s) in a new sub-OAR model.
7. To memorize the sub-OAR model in LTM.
A detailed cognitive process model of decision making is shown in Figure 3, where a double-ended rectangle block represents a function call that invokes a predefined process provided in the LRMB model. The first step in the cognitive process of decision making is to understand the given decision-making problem. According to the cognitive process of comprehension (Wang and Gafurov, 2003), the object (goal) of the decision will be identified, and an initial OAR model will be created. The object, its attributes, and known relations are retrieved and represented in the OAR model. Then, alternatives and strategies are searched, which results in the two sets 𝒜 and C, respectively. The results of the searches will be quantified in order to form a decision as given in Equation 2, i.e., d = f: 𝒜 × C → 𝒜, where 𝒜 ⊆ U and 𝒜 ≠ ∅. When the decision d is derived, the previous OAR model will be updated with d and related information. Then, the decision maker may consider whether the decision is satisfactory according to the current states of nature and personal judgment. If so, the OAR model for the decision is memorized in the LTM. Otherwise, the decision-making process has to be repeated until a satisfactory decision is found, or the decision maker chooses to quit without a final decision. During the decision-making process, both the mind state of the decision maker and the global OAR model in the brain change from time to time. Although the state of nature will not change in a short period during decision making, the perception of it may change with the effect of the updated OAR model. As described in the LRMB model (Wang et al., 2004), the process of decision making is a higher-layer cognitive process defined at Layer 6. The decision-making process interacts with the processes underneath this layer, such as Search, Representation, and Memorization, as well as with the processes at the same layer, such as Comprehension, Qualification, Quantification, and Problem Solving. Relationships between the decision-making process and other related processes have been described in Figure 2 and in (Wang and Wang, 2004; Wang et al., 2004).
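As a rough operational reading of this loop, the following Python sketch runs the search-quantify-evaluate-select-memorize cycle; the search, scoring, and satisfaction functions are hypothetical stand-ins for the cognitive operations, not the RTPA specification of Figure 4.

```python
# A minimal, illustrative sketch of the decision-making loop of Figure 3.

def dmp(search_alternatives, search_criteria, satisfied, max_rounds=10):
    ltm = []                                 # stands in for the OAR model in LTM
    for _ in range(max_rounds):
        A = search_alternatives()            # Search and Quantify the alternatives
        C = search_criteria()                # Search and Quantify the criteria
        if not A or not C:                   # adequacy check failed: search again
            continue
        # d = f: A x C -> A: score every alternative against every criterion
        d = max(A, key=lambda a: sum(c(a) for c in C))
        if satisfied(d):                     # Evaluate(d) against satisfaction
            ltm.append(d)                    # Represent and Memorize the decision
            return d
    return None                              # quit without a final decision

choice = dmp(lambda: ["bus", "bike", "walk"],
             lambda: [lambda a: len(a), lambda a: a != "bus"],
             satisfied=lambda d: True)
print(choice)  # 'bike' under these toy criteria
```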
FORMAL DESCRIPTION OF THE DECISION MAKING PROCESS

On the basis of the cognitive model of decision making described in Figure 3, a rigorous cognitive process can be specified using RTPA (Wang, 2002; Wang, 2003b). RTPA is designed for describing the architectures and the static and dynamic behaviors of software systems (Wang, 2002), as well as human cognitive behaviors and sequences of actions (Wang, 2003b; Wang and Gafurov, 2003). The formal model of the cognitive process of decision making in RTPA is presented in Figure 4. According to LRMB and the OAR model of internal knowledge representation in the brain, the result of a decision in the mind of the decision maker is a new sub-OAR model, or an updated version of the global OAR model of knowledge in the human brain. As shown in Figure 4, the decision-making process (DMP) starts by defining the goal of the decision in terms of its object and attributes. Then, exhaustive searches for the alternative decisions (𝒜) and useful criteria (C) are carried out in parallel. The searches are conducted both internally, in the brain of the decision maker, and through external resources, based on the decision maker's knowledge, experiences, and goal expectations. The results of the searches are quantitatively evaluated until both 𝒜 and C are satisfactory. If nonempty sets are obtained for both 𝒜 and C, the decisions in d already exist and are determinable by Equations 2 and 3. It is noteworthy that the learning results, experiences, and skills of the decision maker may dramatically reduce the exhaustive search in the DMP through known heuristic strategies. When one or more suitable decisions are selected from the set d by the decision maker via evaluation of the satisfaction levels, the satisfactory decisions will be represented in a sub-OAR model, which will be added to the entire knowledge of the decision maker in LTM.
SOLVING PLANNING PROBLEMS BY DECISION SUPPORT SYSTEMS

The decision-making models and the formal description of the cognitive decision-making process presented in Sections 2 through 4 can be used to address the solution of wicked planning problems in software engineering. Wicked planning problems are not only difficult to solve but also difficult to formulate explicitly. The notion of a wicked planning problem was introduced by Rittel and Webber (1984), where several characteristics were given to classify a problem as wicked.
Figure 3. The cognitive process of decision making (flowchart: identify the object O and the attributes A; search the alternatives of choices 𝒜 and the criteria of choices C in parallel; quantify each and evaluate the adequacy of 𝒜 and of C, repeating the searches if inadequate; select the decision d = f(𝒜, C); evaluate the satisfaction of d, repeating the process if unsatisfactory; represent and memorize the resulting OAR model).
One of these characteristics states that there is no definite formulation of the problem. Another states that wicked problems have no stopping rule. So, in these cases, does it make sense to look into a more systematic approach at all, or should we just rely on human intuition and personal experience to figure out a decision? A systematic approach for solving the wicked planning problem of software release planning was given in (Ngo-The and Ruhe, 2006). Release planning is known to be cognitively and computationally difficult (Ruhe and Ngo-The, 2004). Different kinds of uncertainties make it hard to formulate and solve, because real-world release planning problems may involve several hundred factors potentially affecting the decisions for the next release. Thus, a good release plan in decision making is characterized as follows:
• It provides a maximum utility value from offering a best possible blend of features in the right sequence of releases.
• It is feasible with respect to the existing hard constraints that have to be fulfilled.
• It satisfies some additional soft constraints sufficiently well. These soft constraints, for example, can be related to stakeholder satisfaction, consideration of the risk of implementing the suggested releases, balancing of resources, or other aspects which are either hard to formalize or not known in advance.
Figure 4. Formal description of the cognitive process of decision making in RTPA

The Decision Making Process (DMP)

DMP_Process(I:: OS; O:: OAR(dS)ST)
{ // I. Form decision goal(s)
   Identify (O)    // The decision making goal
   Identify (A)    // Sub decision making goals
   R( Search (𝒜) → Quantify (𝒜) → Evaluate (𝒜) )    // iterate until Satisfaction of 𝒜 = T
   ||
   R( Search (C) → Quantify (C) → Evaluate (C) )    // iterate until Satisfaction of C = T
  // II. Select decisions
   d = f: 𝒜 × C → 𝒜    // Refers to Eq. 2
   Evaluate (d)
   ( s(d) ≥ k    // k: a satisfaction threshold
        → Memorize (OARST)
        → ⊗
   | ~    // Otherwise
        → ( GiveUpBL = F → DMP_Process(I:: OS; O:: OAR(dS)ST)
          | ~ → ∅
          )
   )
  // III. Represent decisions
   R =    // Form new relation on d
   OARST =    // Form new OAR model for d
}
It seems that uncertain software engineering decision problems are difficult to model explicitly and to formalize completely, given the constraints of organizations, people, technology, functionality, time, budget, and resources. Therefore, the whole spectrum of decision strategies identified in Table 1 and Figure 1 needs to be examined. This is a typical case where the idea of decision support arises: human decisions have to be made in complex, uncertain, and/or dynamic environments. Carlsson and Turban (2002) point out that the acceptance of these systems is primarily limited by human-related factors: (i) cognitive constraints, (ii) understanding the support of such a model, (iii) difficulty in handling large amounts of information and knowledge, and (iv) frustration caused by complicated theories.
The solution approach presented in (Ngo-The and Ruhe, 2006) addresses the inherent cognitive and computational complexity by (i) an evolutionary problem-solving method that combines rigorous solution methods for the actual formalization of the problem with the interactive involvement of human experts in this process; (ii) offering a portfolio of diversified and qualified solutions at each iteration of the solution process; and (iii) using the multi-criteria decision aid method ELECTRE (Roy, 1991) to assist the project manager in the selection of the final solution from the set of qualified solutions. Further research is ongoing to integrate these results with the framework of the decision-making models and the improved understanding of the cognitive process of decision making as developed in this chapter.
CONCLUSION

Decision making is one of the basic cognitive processes of human behaviors, by which a preferred option or a course of actions is chosen from among a set of alternatives based on certain criteria. Interest in the study of decision making is widely shared across disciplines because it is a fundamental process of the brain. This chapter has developed an axiomatic and rigorous model of the cognitive decision-making process, which explains the nature and course of human and machine-based decision making on the basis of recent research results in cognitive informatics. A rigorous description of the decision process in real-time process algebra (RTPA) has been presented. Various decision-making theories have been comparatively analyzed, and a unified decision-making model has been obtained, which shows that existing theories and techniques of decision making fit well into the formally described decision process. One of the interesting findings of this work is that the most fundamental decision step, recurrently used in any complex decision system and in everyday life, is a Cartesian product of a set of alternatives and a set of selection criteria: the larger both sets, the more ideal the decisions generated. Another interesting finding is that, although the cognitive complexities of new decision problems are always extremely high, they become dramatically lower once a rational or formal solution is figured out. Therefore, reducing the cognitive complexity of decision problems by heuristic feedback of known solutions in each category of decision strategies will be further studied in intelligent decision support systems. According to case studies related to this work, the models and cognitive processes of decision making provided in this chapter can be applied in a wide range of decision-support and expert systems.
Acknowledgment

This work is partially sponsored by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Informatics Circle of Research Excellence (iCORE) of Alberta. The authors would like to thank the anonymous reviewers for their valuable suggestions and comments on this work.
References

Berger, J. (1990). Statistical decision theory: Foundations, concepts, and methods. Springer-Verlag.

Carlsson, C., & Turban, E. (2002). DSS: Directions for the next decade. Decision Support Systems, 33, 105-110.

Edwards, W., & Fasolo, B. (2001). Decision technology. Annual Review of Psychology, 52, 581-606.

Hastie, R. (2001). Problems for judgment and decision making. Annual Review of Psychology, 52, 653-683.

Lipschutz, S. (1967). Schaum's outline of set theory and related topics. McGraw-Hill Inc.

Matlin, M.W. (1998). Cognition (4th ed.). Orlando, FL: Harcourt Brace College Publishers.

Ngo-The, A., & Ruhe, G. (2006). A systematic approach for solving the wicked problem of software release planning. Submitted to Journal of Soft Computing.
Osborne, M., & Rubinstein, A. (1994). A course in game theory. MIT Press.

Payne, D.G., & Wenger, M.J. (1998). Cognitive psychology. New York: Houghton Mifflin Co.

Pinel, J.P.J. (1997). Biopsychology (3rd ed.). Needham Heights, MA: Allyn and Bacon.

Rittel, H., & Webber, M. (1984). Planning problems are wicked problems. In N. Cross (Ed.), Developments in Design Methodology (pp. 135-144). Chichester, UK: Wiley.

Roy, B. (1991). The outranking approach and the foundations of ELECTRE methods. Theory and Decision, 31, 49-73.

Ruhe, G. (2003). Software engineering decision support: Methodologies and applications. In Tonfoni and Jain (Eds.), Innovations in Decision Support Systems, 3, 143-174.

Ruhe, G., & Ngo-The, A. (2004). Hybrid intelligence in software release planning. International Journal of Hybrid Intelligent Systems, 1(2), 99-110.

Squire, L.R., Knowlton, B., & Musen, G. (1993). The structure and organization of memory. Annual Review of Psychology, 44, 453-459.

von Neumann, J., & Morgenstern, O. (1980). Theory of games and economic behavior. Princeton University Press.

Wald, A. (1950). Statistical decision functions. John Wiley & Sons.

Wang, Y. (2002). The real-time process algebra. Annals of Software Engineering, 14, 235-274.

Wang, Y. (2003a). On cognitive informatics. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 151-167.

Wang, Y. (2003b). Using process algebra to describe human and software behaviors. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 199-213.

Wang, Y. (2005a). Mathematical models and properties of games. Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 294-300). Irvine, California, USA: IEEE CS Press.

Wang, Y. (2005b, August). A novel decision grid theory for dynamic decision making. Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 308-314). Irvine, California: IEEE CS Press.

Wang, Y. (2007a). Software engineering foundations: A software science perspective. CRC Software Engineering Series, 2(3). USA: CRC Press.

Wang, Y. (2007b, January). The theoretical framework of cognitive informatics. The International Journal of Cognitive Informatics and Natural Intelligence, 1(1), 1-27.

Wang, Y., & Gafurov, D. (2003). The cognitive process of comprehension. Proceedings of the 2nd IEEE International Conference on Cognitive Informatics (ICCI'03) (pp. 93-97). London, UK.

Wang, Y., & Wang, Y. (2004, March). Cognitive informatics models of the brain. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 203-207.

Wang, Y., Liu, D., & Wang, Y. (2003). Discovering the capacity of human memory. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 189-198.

Wang, Y., Wang, Y., Patel, S., & Patel, D. (2004, March). A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 124-133.

Wilson, R.A., & Keil, F.C. (2001). The MIT Encyclopedia of the Cognitive Sciences. MIT Press.

Zachary, W., Wherry, R., Glenn, F., & Hopson, J. (1982). Decision situations, decision processes, and decision functions: Towards a theory-based framework for decision-aid design. Proceedings of the 1982 Conference on Human Factors in Computing Systems.
Chapter X
A Commonsense Approach to Representing Spatial Knowledge Between Extended Objects1 Tiansi Dong Cognitive Ergonomic Systems, Germany
Abstract

This chapter proposes a commonsense understanding of distance and orientation knowledge between extended objects, and presents a formal representation of spatial knowledge. The connection relation is taken as primitive. A new axiom is introduced to govern the connection relation. Notions of 'near extension' regions and the 'nearer' predicate are coined. Distance relations between extended objects are understood as degrees of the near extension from one object to the other. Orientation relations are understood as distance comparisons from one object to the sides of the other object. Distance and orientation relations are therefore internally related through the connection relation. The 'fiat projection' mechanism is proposed to model the mental formation of the deictic orientation reference framework. This chapter shows diagrammatically the integration of topological, distance, and orientation relations in the RCC framework.
Introduction

When we open our eyes, we see a snapshot view of the spatial environment. In this chapter, spatial environments are those in which objects are projectively as large as, or larger than, the body but can be visually apprehended from a single place without appreciable locomotion (Montello, 1993, p. 315). They are vista spatial environments following Montello (1993), or the space surrounding the body following Tversky, Morrison, Franklin, and Bryant (1999) and Tversky (2005). We subjectively decompose snapshot spatial environments into objects and the spatial relations among them; we recognize the objects, describe their spatial relations, identify whether it is the target environment which we want to enter, and even detect object movements in the environment. For example, when you have the first snapshot view of your office in the morning, you will recognize objects in the room, such as a chair and a desk which are indistinguishable from yours, describe their relations, such as the chair is near and in front of the desk, identify whether
it is your office, and detect object movements, such as that the chair has been moved a bit to the left of its location when you left yesterday. There are several interesting issues involved in the knowledge that people have about snapshot views of spatial environments. Firstly, from snapshot views we only see parts of objects, while other parts may be blocked; nevertheless, we can recognize them. This is evidenced by Gestalt theory, e.g., (Koehler, 1929; Koffka, 1935; Wertheimer, 1958), and by research in object recognition, e.g., (Biederman, 1987; Spelke, 1990; Buelthoff & Edelman, 1992; Humphreys & Khan, 1992; Tarr, 1995; Tarr & Buelthoff, 1995). We can even recognize objects in snapshot views both from the real world and from the virtual world, e.g., films, TV programs, and photos. When you open your office door in the morning, you receive the light reflection from the chair, and you recognize your office chair; when you see a photo of your office, you receive the light reflection from the photo, and you also recognize your office chair. Recognizing objects either in the real environment or in the photo owes to the light reflection and to our recognition activity, for we have knowledge about objects. The knowledge of objects resides in memory and is activated either by external stimuli or by certain mental desires. Secondly, object recognition means categorization. Objects in the same category are considered equivalent. Rosch, Mervis, Gray, Johnson, and Boyes-Braem (1976) argued that categories of recognized objects are structured such that there is generally one level of abstraction at which we find it easiest to name objects and recognize them the fastest, namely the "basic level category". The basic level is the first categorization made during perception of the environment. This suggests a link between our spatial knowledge and our spatial descriptions; that is, the spatial knowledge acquired through perception structures our language (Tversky & Lee, 1999). Thirdly, from the perspective of cognitive informatics (CI) (Wang, 2003), the meta-cognitive process layer carries out the fundamental and elementary cognitive processes commonly used by processes in higher cognitive layers. It is a critical layer which connects with the perception layer and all other higher layers. How the brain builds spatial representations in the meta-cognitive process layer has become a topic of great interest in cognitive informatics (Wang, Wang, Patel, & Patel, 2006). The aim of the chapter is to present the internal relations between three kinds of spatial relations. The remainder of the chapter is structured as follows: we first propose a commonsense understanding of the spatial knowledge that can be acquired through perception; then we review the Region Connection Calculus (the RCC-8 theory) in the literature and point out one weakness in its axioms; after that we set the starting point of the formal representation, propose a new axiom to govern a characteristic property of the connection relation, and present the formalism of distance knowledge between regions; then we show the formalism of the orientation knowledge between regions, and present two examples to demonstrate the integration of distance and orientation relations into the RCC framework.
A Commonsense Understanding

Suppose you are standing at the entrance door shown in Figure 1 (a) and looking into the room. You could recognize not only objects in the room, but also spatial relations among them. You would recognize spatial relations between yourself and objects, such as that you are nearer to the balloon than to the writing-desk, and also spatial relations among objects, e.g., that the balloon is in front of the writing-desk. If you turn on a flashlight, it will emit a light beam. Imagine our eyes are such a flashlight: they can emit a light beam, and we see the side of the object which blocks the light beam. Then, that an object is in front of the observer can be understood as follows: if the observer faces the object, there will be a light beam which connects with both the eyes and the object, as shown in Figure 1 (b). That an object is at the left-hand side of the observer can be understood as: if the observer turns to the left, there will be a light beam which connects with both the eyes and the object, or, if the observer faces forward, there will be a light beam which connects with the left side of the observer's face and the object; the observer can prove this by turning to the left to see whether there is a light beam connecting the object and the eyes. That the observer is nearer to object A than to object B can be explained as: the light beam which connects with the observer and object A is shorter than the light beam which connects with the observer and object B. Imagine the observer does not emit light beams but rather ultrasonic waves, like blind bats, or imagine a blind man with a stick; they can also know whether there are obstacles in front of them and which obstacle is nearer to them than the other, if they know the distance that the emitted ultrasonic wave travels or the extent reached
Figure 1. Spatial relations acquired through observation: panels (a) and (b) show a room containing a window, a lamp, a writing-desk, a bookshelf, a balloon, a tea-table, a cup, a door, and a couch, viewed from the standing point at the entrance.
when using a stick. Light beams, ultrasonic waves, sticks, and the like serve as extensions of the body space of the observer, no matter what these extension objects are and no matter whether they are created actively or received passively. Spatial relations between the observer and objects can be understood by making extensions of the body space into the space surrounding the body. When we assign extensions to objects, we can give spatial relations between objects. After we have the experience that we can reach the writing-desk while sitting on the chair, we could assign the space of our body as an extension space to the chair and give the spatial relation between the chair and the desk as: the chair is near the writing-desk. If we notice that we reach the front side of the writing-desk while sitting on the chair, we would give the orientation relation between the chair and the desk as: the chair is in front of the writing-desk.
Distance: The Extension from One Object to the Other

When you observed that the balloon was nearer to you than the writing-desk, you used the light beam as an extension object and found that the balloon blocked part of the light beam which connects with the writing-desk and the eyes, as shown in Figure 1 (b). This can be explained as: the light beam that connects with your eyes and the balloon is shorter than the light beam that connects with the eyes and the writing-desk. This can also be proven by the fact that if you walk to the writing-desk following the light beam which connects with the writing-desk and the eyes, you reach the balloon first. The distance between two extended objects is the degree of the extension from one object to the other by certain extension objects.
Orientation: The Extension to Which Side

If you now sit in the chair in Figure 2, you can easily reach the front side of the desk. The extension of the chair by your body space reaches the front side of the desk. The spatial relation between the chair and the desk is not only that the chair is near the desk, but also that the chair is near the front side of the desk rather than its other sides. That is, the chair is nearer to the front side of the desk than to its other sides. This is described by the orientation relation that the chair is in front of the desk. The orientation relation between two extended objects can be explained by the distance comparison from one object to the sides of the other object. The orientation relation specifies the extension from one object to a particular side of the other object.
Figure 2. Qualitative orientation relations among the cup, the chair, and the desk will be described as "the cup is on the right-hand side of the desk" and "the chair is in front of the desk"
Figure 3. The fiat projection of the observer to the white ball
Figure 3. The fiat projection of the observer to the white ball: panels (a), (b), and (c), with labels for the left side, the right side, and a near extension of the black ball.
Representing Spatial Knowledge: A Review

Spatial representation using the connection relation was pioneered by Clarke (1981) and Clarke (1985). In Clarke's formalism the connection relation was assumed: that two regions are connected means that they share a common point. The problem with this interpretation is that a region does not connect with its own complement, which clearly violates our commonsense understanding of space. This problem was repaired in the RCC-8 theory by Randell, Cui, and Cohn (1992). RCC-8 theory inherited two axioms of the connection relation from Clarke's work: 1) any region connects with itself; 2) if region A connects with region B, then region B connects with region A. However, RCC-8 theory changes the interpretation of the connection relation: two regions are connected means that their topological closures share a common point. Therefore, in RCC-8 theory a region connects with its complement. Unfortunately, the two axioms are not strong enough to represent distance relations. Think of the relation between two regions such that their distance is less than 1 meter. Any region is less than 1 meter away from itself; and if region A is less than 1 meter away from region B, then region B is less than 1 meter away from region A. That is, this distance relation satisfies the axioms of a connection relation in the RCC-8 theory; however, the closures of the two regions might not share a common point. As RCC-8 theory cannot represent a region property like 1 meter, distance relations cannot be properly represented. In this chapter, we inherit the interpretation of the connection relation in the RCC-8 theory and introduce the notion of "the category" of a region together with a new axiom, so that the connection relation can be better governed and distance relations can be formalized in the RCC framework.
Starting Points

A region is understood as the space occupied by an extended object. The category of a region is the collection of all regions which are congruent with this region (i.e., which can coincide with this region by translation or rotation). A region has its topological closure and its interior part. A region has sides. A side is a part of the boundary of the region that can be seen from a view point; it is called a side region. A region is denoted by italic capital letters, such as DESK, CHAIR, DOOR, . . . . The category of a region is denoted by typewriter capital letters, such as DESK, CHAIR, DOOR, . . . . Let OBJ represent a region and the category of OBJ be OBJ; OBJ.category represents the category of OBJ, OBJ.category = OBJ. A side of a region is denoted by OBJ.side. OBJ.front, OBJ.left, OBJ.back, and OBJ.right are four qualitative values of OBJ.side. The category of OBJ.side is written as OBJ.side. Formulae have the form of Z notation following Woodcock & Davies (1996): "∀x|p•q" (read as "for every x satisfying p, q holds"), "∀x•q" (read as "for every x, q holds"), "∃x|p•q" (read as "there is x satisfying p such that q"), "∃x•q" (read as "there is x such that q"), and "ιx(q)" (read as "the x that q's"), where x is the bound variable, p is the constraint on x, and q is the predicate.
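For readers more used to executable notation, the quantifier forms above can be read off against checks over a finite domain; the following Python lines are purely illustrative and are not part of the chapter's formalism.

```python
# Executable readings of the Z-style forms over a hypothetical finite domain.

X = range(10)
p = lambda x: x % 2 == 0                 # the constraint p
q = lambda x: x < 9                      # the predicate q

print(all(q(x) for x in X if p(x)))      # "forall x | p . q" -> True (all even x < 9)
print(all(q(x) for x in X))              # "forall x . q"     -> False (9 < 9 fails)
print(any(q(x) for x in X if p(x)))      # "exists x | p . q" -> True
print(any(q(x) for x in X))              # "exists x . q"     -> True
print([x for x in X if x * x == 16])     # "iota x (q)": the x that q's -> [4]
```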
Spatial Relations Between Regions

The 'Connection' Relation is Primitive

The only primitive relation among regions is the 'connection' relation: C. If two regions are connected, then for every region, we can move that region to a place where it connects with both regions. If one region disconnects from the other, then we can find a small region such that this region cannot connect with both regions. This is equivalent to saying that if one region disconnects from the other, then there is a region category such that no region in this category connects with both regions: let the category be that of the ball whose diameter is smaller than the minimal distance between the two disconnected regions; then no region in this category connects with both disconnected regions. An axiom is therefore proposed as follows.
Axiom 1: For any regions A and B, if one connects with the other, then for any category Z there is a region Z in Z such that Z connects with both A and B.

∀A, B • C(A, B) → ∀Z ∃Z | Z ∈ Z • C(A, Z) ∧ C(Z, B)

Two trivial properties of the connection relation are axiomatized in RCC-8 theory by Randell, et al. (1992) as follows.

Axiom 2: For any region A, A connects with A.

∀A • C(A, A)

Axiom 3: For any regions A and B, if A connects with B, then B connects with A.

∀A, B • C(A, B) → C(B, A)

The parthood relation P(A, B) is defined in RCC-8 as follows.

Definition 1: Given two regions A and B, 'A is part of B' is defined as: for any region Z, if Z connects with A, then Z connects with B.

P(A, B) ≜ ∀Z • C(Z, A) → C(Z, B)

The identity relation EQ is defined in RCC-8 by the parthood relation as follows.

Definition 2: That two regions A and B are identical is defined as: each is a part of the other.

EQ(A, B) ≜ P(A, B) ∧ P(B, A)
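As a finite illustration of these primitives, one can model regions as sets of grid points and read C(A, B) as closure contact (points within one grid step). The following Python sketch is a toy encoding under these assumptions, not the RCC axiomatization itself; on this encoding, parthood reduces to set inclusion.

```python
# A toy grid model of the connection, parthood, and identity relations.

def C(A, B):
    # Two grid regions connect if some of their points lie within one step.
    return any(abs(ax - bx) <= 1 and abs(ay - by) <= 1
               for (ax, ay) in A for (bx, by) in B)

def P(A, B):
    # Parthood in the sense of Definition 1; for grid regions, the subset
    # test suffices: anything that connects with A also connects with B.
    return A <= B

def EQ(A, B):
    # Identity per Definition 2: mutual parthood.
    return P(A, B) and P(B, A)

A = {(0, 0), (1, 0)}
B = {(2, 0)}                            # touches A at distance one
Z = {(5, 5)}
print(C(A, A), C(A, B), C(A, Z))        # True True False (Axiom 2 holds)
print(P(A, A | B), EQ(A, A))            # True True
```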
Representation of Spatial Extensions

When you sit on a couch, you stretch out your arm to reach the cup on the tea-table. The space that you can reach with the help of your arm is larger than the space of your body; it is a spatial extension of your body space. The extended space is such that for any object connecting with it, you can stretch out your arm and connect with this object. Formally, let A and X be two regions. The spatial extension of A by X, called the "near extension of A by X", can be defined as the sum of all regions of the same category as X which connect with the region A, written as A X. A is called "the anchor region" and X is called "the extension region", as shown in Figure 4 (c).

Definition 3: Given two regions A and X, the near extension of A by X is defined as the Y such that for every W, W connects with Y if and only if there is a region V in X.category such that V connects with A and W.

A X ≜ ιY(∀W • (C(W, Y) ≡ ∃V | V ∈ X.category • C(A, V) ∧ C(V, W)))

The existence of A X is guaranteed by Theorem 1; the uniqueness of A X is guaranteed by Definition 2.

Theorem 1: Given a region A, there is X such that if X connects with A, then there is Y such that for any W, Y connects with W if and only if there is V in X.category which connects with A and W.

∃X • C(X, A) → ∃Y ∀W • (C(W, Y) ≡ (∃V | V ∈ X.category • C(A, V) ∧ C(V, W)))
Figure 4. (a) the extension region X; (b) the anchor region A; (c) the near extension region of A by X; (d) the near extension region of TRUNK by ARM
For example, let TRUNK be the region of your shoulder's trunk and ARM be the region occupied by your arm. Then the near extension of TRUNK by ARM, TRUNK ARM, refers to the extension of your body that your arm can reach, as shown in Figure 4 (d).

Theorem 2: For any regions A and X, A is a part of the near extension of A by X.

∀A, X • P(A, A X)
Theorem 3: For any regions A, B and X, the near extension of A by X connects with B if and only if A connects with the near extension of B by X.

∀A, B, X • C(A X, B) ≡ C(A, B X)

When you sit on the couch, you cannot reach the books on the writing-desk, no matter how you move the arm; yet you can stretch out the arm and reach the cup on the tea-table. That is, the cup is nearer to you than the books. This suggests a method of distance comparison through the notion of the near extension region. Formally, this distance comparison method can be stated as follows: given three regions A, B, and C, that A is nearer to B than to C can be defined as: there is an extension region X such that the near extension of A by X connects with B and disconnects from C.

Definition 4: Let A, B and C be regions; that A is nearer to B than to C, written as "nearer(A, B, C)", is defined as: there is a region X such that the near extension of A by X connects with B and disconnects from C.

nearer(A, B, C) ≜ ∃X • C(A X, B) ∧ ┐C(A X, C)
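The near extension and the nearer predicate can be illustrated on the same toy grid model: a ball-shaped extension region X of radius r turns the near extension A X into a dilation of A, and Definition 4 becomes a search over radii. The encoding below is an assumption-laden sketch, not the region-based definition itself.

```python
# A toy grid reading of Definitions 3-4: near extension as dilation, and
# the nearer predicate as a search over extension radii.

def connects(A, B):
    return any(abs(ax - bx) <= 1 and abs(ay - by) <= 1
               for (ax, ay) in A for (bx, by) in B)

def near_extension(A, radius):
    # Union of all placements of a radius-r ball that connect with A.
    return {(x + dx, y + dy) for (x, y) in A
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}

def nearer(A, B, C_reg, max_radius=20):
    # Definition 4: some extension of A connects with B but not with C.
    return any(connects(near_extension(A, r), B) and
               not connects(near_extension(A, r), C_reg)
               for r in range(max_radius))

observer = {(0, 0)}; balloon = {(3, 0)}; desk = {(9, 0)}
print(nearer(observer, balloon, desk))   # True: the balloon is nearer
```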
Defining Distance Relations Using Near Extension Regions

The distance between two disconnected regions can be defined by the number of extension regions which are connected one after another and which connect with the two disconnected objects. For example, the meaning of "the distance between a chair and a desk is two meters" can be understood as: there is a ruler with the length of one meter, and there are two regions occupied by this ruler such that they are connected one after another
and such that they connect with the desk and the chair. This is equivalent to saying that the near extension of the desk by the ruler connects with the near extension of the chair by the ruler, or that a twofold extension of the desk by the ruler connects with the chair. Formally, the distance from region A to region B can be represented by the minimal number of extension regions (of the same category X) such that the near extension of A by these regions (one after another) connects with B. If the first near extension region is named "1X", the second near extension region "2X", the third near extension region "3X", . . . , then a naive natural number system for distance relations is developed. Given two regions A and B, "the distance from A to B is nX" is defined as: B connects with the nth near extension region and disconnects from the (n-1)th near extension region. The distance from A to B is, therefore, defined in the notation of the naive natural number system as follows.

Definition 5: Let A and B be two regions, and X1, X2, . . . , Xn be n regions of category X; the distance from A to B is defined as 'nX' if

┐C((((A X1) X2) …) Xn-1, B) ∧ C((((A X1) X2) …) Xn, B)
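On the same toy encoding, Definition 5 can be run directly: the distance from A to B in units of X is the smallest n for which n chained near extensions of A reach B. The unit "ruler" and the overlap test below are assumptions of this grid sketch.

```python
# A toy grid reading of Definition 5: distance as the count of chained
# near extensions needed to reach B.

def extend_once(A):
    # One near extension by a unit grid ball.
    return {(x + dx, y + dy) for (x, y) in A
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

def distance_in_X(A, B, limit=100):
    region = set(A)
    for n in range(1, limit + 1):
        region = extend_once(region)     # (((A X1) X2) ...) Xn
        if region & B:                   # the nth extension reaches B
            return n
    return None                          # beyond the search limit

chair = {(0, 0)}; desk = {(4, 0)}
print(distance_in_X(chair, desk))        # 4: the distance is "4X"
```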
Defining Orientation Relations Between Regions Using the Nearer Predicate

Woolf (1980) explained the word left as "located nearer to the left hand than to the right". That is, the left orientation with regard to the body is understood by a distance comparison with the two hands: an object is left of the body if it is nearer to the left hand than to the right hand. The seemingly cyclic reference to the left hand is resolved by explaining left as "of, relating to, situated on, or being the side of the body in which the heart is mostly located"; by this, the left hand is defined as the hand which is nearer to the heart than the other hand. If we distinguish four salient sides, left, right, front, and back, the orientation relations between regions can be formalized as follows.

Definition 6: Let A and B be two regions, and B.left, B.front, B.right, B.back be four side regions of B. 'A is in front of B', written as Front(A, B), is defined as: A is nearer to B.front than to B.left, B.right, and B.back.

Front(A, B) ≜ ∀P | P ∈ {B.left, B.front, B.right, B.back} • ┐EQ(P, B.front) → nearer(A, B.front, P)

Similarly, 'Left(A, B)' stands for "A is at the left-hand side of B"; 'Right(A, B)' for "A is at the right-hand side of B"; and 'Behind(A, B)' for "A is behind B".

Definition 7: Let A and B be two regions, and B.left, B.front, B.right, B.back be four side regions of B.

Left(A, B) ≜ ∀P | P ∈ {B.left, B.front, B.right, B.back} • ┐EQ(P, B.left) → nearer(A, B.left, P)

Right(A, B) ≜ ∀P | P ∈ {B.left, B.front, B.right, B.back} • ┐EQ(P, B.right) → nearer(A, B.right, P)

Behind(A, B) ≜ ∀P | P ∈ {B.left, B.front, B.right, B.back} • ┐EQ(P, B.back) → nearer(A, B.back, P)
The undetermined orientation relation between regions A and B with four salient sides, written as Undetermined4, is interpreted as follows: for any near extension of a region, it either disconnects from the other or connects with at least two sides of the other. That is, for any X, if A X connects with one side of B, A X connects with another side of B.

Definition 8: Let A and B be two regions, and B.left, B.front, B.right, B.back be the four sides of B.

Undetermined4(A, B) ≜ ∀X ∃P, Q | P, Q ∈ {B.left, B.front, B.right, B.back} ∧ ┐EQ(P, Q) • C(A X, P) → C(A X, Q)

Theorem 4: Let A and B be two regions; then Front(A, B), Left(A, B), Right(A, B) and Behind(A, B) are pairwise disjoint.

┐(Front(A, B) ∧ Left(A, B))
┐(Left(A, B) ∧ Behind(A, B))
┐(Behind(A, B) ∧ Right(A, B))
┐(Right(A, B) ∧ Front(A, B))
┐(Front(A, B) ∧ Behind(A, B))
┐(Left(A, B) ∧ Right(A, B))

Theorem 5: Let A and B be two regions; then Undetermined4(A, B) and each of Front(A, B), Left(A, B), Right(A, B) and Behind(A, B) are disjoint.

┐(Front(A, B) ∧ Undetermined4(A, B))
┐(Left(A, B) ∧ Undetermined4(A, B))
┐(Behind(A, B) ∧ Undetermined4(A, B))
┐(Right(A, B) ∧ Undetermined4(A, B))

Theorem 6: Let A and B be two regions; then Undetermined4(A, B), Front(A, B), Left(A, B), Right(A, B), and Behind(A, B) are jointly exhaustive.

Front(A, B) ∨ Left(A, B) ∨ Right(A, B) ∨ Behind(A, B) ∨ Undetermined4(A, B)

Theorem 4—Theorem 6 show the JEPD (jointly exhaustive and pairwise disjoint) property of the orientation system, which distinguishes four orientation relations from the undetermined orientation relation and is therefore named ORIEN4. The conceptual neighborhoods network of ORIEN4 is shown in Figure 5. Each transition from one orientation to another should pass through the Undetermined4 relation. For example, the transition from Front to Left passes through the "front-left" relation, which is covered by Undetermined4 in ORIEN4.

Figure 5. The conceptual neighborhoods of ORIEN4 (Front, Left, Right, and Behind, each linked through Undetermined4)
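A small runnable reading of Definitions 6-8: orientation falls out of comparing distances from A to the four side regions of B, with ties yielding the undetermined relation. The side regions, the Manhattan distance, and the tie rule below are toy assumptions standing in for the extension-based nearer predicate.

```python
# A toy grid reading of the ORIEN4 relations.

def nearer(A, X, Y):
    dist = lambda R: min(abs(ax - rx) + abs(ay - ry)
                         for (ax, ay) in A for (rx, ry) in R)
    return dist(X) < dist(Y)

def orientation(A, sides):
    # Return the side A is strictly nearer to than every other side, if any.
    for name, side in sides.items():
        if all(nearer(A, side, other)
               for n, other in sides.items() if n != name):
            return name
    return "Undetermined4"               # e.g., exactly diagonal positions

desk_sides = {"Front": {(0, -1)}, "Back": {(0, 1)},
              "Left": {(-1, 0)}, "Right": {(1, 0)}}
print(orientation({(0, -4)}, desk_sides))  # Front
print(orientation({(3, -3)}, desk_sides))  # Undetermined4 ("front-left" tie)
```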
Integrating Distance and Orientation Relations into the RCC Framework

In this section, we show how to integrate distance and orientation relations into the RCC framework.
Integrating Distance Relations into the RCC Framework: RCC-10

In RCC-8 theory, the DC relation is defined as "┐C"; that is, any non-zero distance relation is covered by the DC relation. Therefore, distance relations can be integrated into RCC-8 by splitting DC into several distance relations. For example, for a cup disconnected from you, you would say that the cup is near you if you can stretch out your hand and hold it; you would say that the cup is far away from you if you cannot even touch it after you stretch out your hand; and you would say that the cup is neither very near nor very far away from you if you stretch out your hand and your fingers can touch it (though you cannot hold it). We can formalize near, neither very near nor very far away, and far away as follows.

Definition 9: Let A, B and X be three regions. That A is near B, written as "NRX(A, B)", is interpreted as: A disconnects from B and overlaps B X. That A is penumbra far-or-near B, written as "PRX(A, B)", is interpreted as: A disconnects from B and externally connects with B X. That A is far away from B, written as "FRX(A, B)", is interpreted as: A disconnects from B X.

NRX(A, B) ≜ DC(A, B) ∧ O(A, B X)
PRX(A, B) ≜ DC(A, B) ∧ EC(A, B X)
FRX(A, B) ≜ DC(A, B X)

O (overlaps) and EC (externally connects) are defined in RCC-8 as follows.

O(A, B) ≜ ∃Z • P(Z, A) ∧ P(Z, B)
EC(A, B) ≜ C(A, B) ∧ ┐O(A, B)

Theorem 7: Let A, B, and X be three regions; NRX(A, B), PRX(A, B), and FRX(A, B) are pairwise disjoint.

┐(FRX(A, B) ∧ PRX(A, B))
┐(PRX(A, B) ∧ NRX(A, B))
┐(NRX(A, B) ∧ FRX(A, B))

Theorem 8: Let A, B, and X be three regions; then

DC(A, B) ≡ NRX(A, B) ∨ PRX(A, B) ∨ FRX(A, B)
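Under the same grid assumptions, Definition 9 becomes a three-way test of A against the near extension of B by X; the radius and the one-step contact test below are illustrative choices, not part of the chapter's formalism.

```python
# A toy grid reading of Definition 9: splitting DC into NRx, PRx, and FRx.

def dilate(B, r):
    return {(x + dx, y + dy) for (x, y) in B
            for dx in range(-r, r + 1) for dy in range(-r, r + 1)}

def ec(A, B):
    # External connection: closure contact without overlap.
    return not (A & B) and any(abs(ax - bx) <= 1 and abs(ay - by) <= 1
                               for (ax, ay) in A for (bx, by) in B)

def distance_relation(A, B, r=2):
    BX = dilate(B, r)                    # the near extension of B by X
    if A & B:
        return "not DC"                  # covered by the other RCC relations
    if A & BX:
        return "NRx"                     # A overlaps B X: near
    if ec(A, BX):
        return "PRx"                     # A externally connects with B X
    return "FRx"                         # A disconnects from B X: far away

hand = {(0, 0)}
for cup_x in (2, 3, 9):
    print(cup_x, distance_relation({(cup_x, 0)}, hand))  # NRx, PRx, FRx
```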
Theorem 9: Let A and B be two regions; then FRX(A, B), PRX(A, B), NRX(A, B), EC(A, B), PO(A, B), TPP(A, B), NTPP(A, B), EQ(A, B), TPP-1(A, B) and NTPP-1(A, B) are jointly exhaustive.

FRX(A, B) ∨ PRX(A, B) ∨ NRX(A, B) ∨ EC(A, B) ∨ PO(A, B) ∨ TPP(A, B) ∨ NTPP(A, B) ∨ EQ(A, B) ∨ TPP-1(A, B) ∨ NTPP-1(A, B)

PO (partially overlaps), TPP (tangential proper part), NTPP (non-tangential proper part), TPP-1, and NTPP-1 are defined in RCC-8 as follows.

PO(A, B) ≜ O(A, B) ∧ ┐P(A, B) ∧ ┐P(B, A)
TPP(A, B) ≜ P(A, B) ∧ ┐P(B, A) ∧ ∃Z • EC(Z, A) ∧ EC(Z, B)
NTPP(A, B) ≜ P(A, B) ∧ ┐P(B, A) ∧ ┐∃Z • EC(Z, A) ∧ EC(Z, B)
TPP-1(A, B) ≜ TPP(B, A)
NTPP-1(A, B) ≜ NTPP(B, A)

This is a simple extension of RCC-8, obtained by specializing DC into three distance relations, and is called RCC-10 following Dong (2005b). The conceptual neighborhood of RCC-10 is shown in Figure 6. The JEPD property of RCC-8 and Theorem 7—Theorem 9 guarantee the JEPD property of RCC-10.
Figure 6. The conceptual neighborhoods of RCC-10. The DC relation in RCC-8 is specialized into three distance relations: FRX, PRX, and NRX

Integrating Orientation Relations into the RCC Framework: RCC-38

In the RCC-8 theory, there is no room for orientation relations, as the sides of regions are not represented. If we represent side regions, each relation in RCC-10 can be further specified by orientation relations. FRX in RCC-10 can be further refined into five qualitative relations: FRX∧Front, FRX∧Left, FRX∧Behind, FRX∧Right, and
FRX∧Undetermined4.

Theorem 10: Let A and B be two regions. A being far away from B is equivalent to A being far away from B with the orientation that A is in front of, at the left-hand side of, at the right-hand side of, or behind B, or with the orientation relation undetermined.

FRX(A, B) ≡ FRX(A, B)∧Front(A, B) ∨ FRX(A, B)∧Left(A, B) ∨ FRX(A, B)∧Behind(A, B) ∨ FRX(A, B)∧Right(A, B) ∨ FRX(A, B)∧Undetermined4(A, B)

Similarly, PRX, NRX, EC, PO, TPP, and NTPP in RCC-10 can be refined in the same way as FRX. When a region covers another region, as in EQ, TPP-1, and NTPP-1, their orientation relations are undetermined.

Theorem 11: Let A and B be two regions; if one of the three relations EQ(A, B), TPP-1(A, B), and NTPP-1(A, B) holds, then Undetermined4(A, B) holds.

EQ(A, B) → Undetermined4(A, B)
TPP-1(A, B) → Undetermined4(A, B)
NTPP-1(A, B) → Undetermined4(A, B)

The nodes of RCC-10 can, therefore, be refined by adding orientation relations as follows: each node of RCC-10 nests its neighborhood network of orientation relations, as shown in Figure 7. A node of the nested structure represents a spatial relation combining topological, distance, and orientation information. Piaget and Inhelder (1948) claimed that people have three kinds of spatial knowledge: topological, distance, and orientation; therefore, a node in the nested structure represents a full spatial relation, and such a node is called a full spatial node. A spatial transition from one full spatial node to another can be a combination of three kinds of spatial transitions: transition through a topological relation, transition through a distance relation, and transition through an orientation relation. A neighborhood transition between two full spatial nodes is a neighborhood transition through one and only one of the three spatial transitions. Therefore, the conceptual neighborhood network of full spatial relations can be obtained by flattening the nested structure, as shown in Figure 8. This conceptual neighborhoods network of spatial relations has 38 nodes representing 7 topological relations of RCC-8, 3 distance relations, and 4 salient orientation relations; thus, it is named RCC-38. Theorem 10, Theorem 11, the JEPD property of RCC-10, and the JEPD property of ORIEN4 guarantee the JEPD property of RCC-38.
Conclusions and Discussions

This chapter presented a formal spatial knowledge representation of distance and orientation relations between extended objects, and illustrated how to integrate them into the well-known RCC framework. The notion of the fiat projection is proposed to simulate the formation of the deictic orientation reference framework. This chapter pointed out that the axioms of the RCC-8 theory are not strong enough to represent distance relations, and proposed a new axiom to strengthen them. A detailed analysis of the power of the axioms of RCC-8 theory and of the proposed axiom will be published separately. This formal spatial knowledge representation system has successfully served as the semantic interpretation for spatial linguistic descriptions in the symbolic simulation system for recognizing changed spatial environments (Dong, 2005a; Dong, 2006). The computational complexity of comparing two snapshot views in the recognition process is polynomial based on this knowledge representation system.
Figure 7. The conceptual neighborhoods of refined RCC-10; "Und." stands for Undetermined4
Theorem 7: Let A, B, and X be three regions. NRX(A, B), PRX(A, B), and FRX(A, B) are pairwise disjoint:

¬(FRX(A, B) ∧ PRX(A, B))
¬(PRX(A, B) ∧ NRX(A, B))
¬(NRX(A, B) ∧ FRX(A, B))

PO (partially overlaps), TPP (tangential proper part), NTPP (non-tangential proper part), TPP-1, and NTPP-1 are defined in RCC-8 as follows:

PO(A, B) ≜ O(A, B) ∧ ¬P(A, B) ∧ ¬P(B, A)
TPP(A, B) ≜ PP(A, B) ∧ ∃Z[EC(Z, A) ∧ EC(Z, B)]
NTPP(A, B) ≜ PP(A, B) ∧ ¬∃Z[EC(Z, A) ∧ EC(Z, B)]
TPP-1(A, B) ≜ TPP(B, A)
NTPP-1(A, B) ≜ NTPP(B, A)

Theorem 8: Let A, B, and X be three regions; then

DC(A, B) ≡ NRX(A, B) ∨ PRX(A, B) ∨ FRX(A, B)

Theorem 9: Let A and B be two regions; then FRX(A, B), PRX(A, B), NRX(A, B), EC(A, B), PO(A, B), TPP(A, B), NTPP(A, B), EQ(A, B), TPP-1(A, B), and NTPP-1(A, B) are jointly exhaustive:

FRX(A, B) ∨ PRX(A, B) ∨ NRX(A, B) ∨ EC(A, B) ∨ PO(A, B) ∨ TPP(A, B) ∨ NTPP(A, B) ∨ EQ(A, B) ∨ TPP-1(A, B) ∨ NTPP-1(A, B)
Figure 8. The conceptual neighborhoods of RCC-38; "Und." stands for Undetermined4
References

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115-147.
Buelthoff, H. H., & Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. In Proceedings of the National Academy of Sciences, USA (pp. 60-64).
Clarke, B. L. (1981). A calculus of individuals based on connection. Notre Dame Journal of Formal Logic, 23(3), 204-218.
Clarke, B. L. (1985). Individuals and points. Notre Dame Journal of Formal Logic, 26(1), 61-75.
Dong, T. (2005a). Recognizing variable spatial environments—The theory of cognitive prism. Unpublished doctoral dissertation, University of Bremen, Germany.
Dong, T. (2005b). SNAPVis and SPANVis: Ontologies for recognizing variable vista spatial environments. In C. Freksa, M. Knauff, B. Krieg-Brückner, B. Nebel, & T. Barkowsky (Eds.), International Conference Spatial Cognition, 4, 344-365. Berlin: Springer.
Dong, T. (2006). The theory of cognitive prism—Recognizing variable spatial environments. In Proceedings of the Nineteenth International Florida Artificial Intelligence Research Society Conference (pp. 719-724). Menlo Park, CA: AAAI Press.
Dong, T. (2007). Knowledge representation of distances and orientation of regions. International Journal of Cognitive Informatics and Natural Intelligence, 1(2), 86-99.
Franklin, N., & Tversky, B. (1990). Searching imagined environments. Journal of Experimental Psychology: General, 119, 63-76.
Humphreys, G. K., & Khan, S. C. (1992). Recognizing novel views of three-dimensional objects. Canadian Journal of Psychology, 46, 170-190.
Koehler, W. (1929). Gestalt psychology. London: Liveright.
Koffka, K. (1935). Principles of Gestalt psychology. New York: Brace & World.
Montello, D. (1993). Scale and multiple psychologies of space. In A. Frank & I. Campari (Eds.), Spatial information theory: A theoretical basis for GIS (pp. 312-321). Berlin: Springer.
Piaget, J., & Inhelder, B. (1948). La représentation de l'espace chez l'enfant. Paris: PUF, Bibliothèque de Philosophie Contemporaine.
Randell, D., Cui, Z., & Cohn, A. (1992). A spatial logic based on regions and connection. In B. Nebel, W. Swartout, & C. Rich (Eds.), Proceedings of the 3rd International Conference on Knowledge Representation and Reasoning (pp. 165-176). San Mateo, CA: Morgan Kaufmann.
Rosch, E., Mervis, C. B., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.
Spelke, E. S. (1990). Principles of object perception. Cognitive Science, 14, 29-56.
Tarr, M. J. (1995). Rotating objects to recognize them: A case study of the role of mental transformations in the recognition of three-dimensional objects. Psychonomic Bulletin and Review, 2, 55-82.
Tarr, M. J., & Buelthoff, H. H. (1995). Is human object recognition better described by geon structural descriptions or by multiple views? Comment on Biederman and Gerhardstein (1993). Journal of Experimental Psychology: Human Perception and Performance, 21(6), 1494-1505.
Tversky, B. (2005). Functional significance of visuospatial representation. In P. Shah & A. Miyake (Eds.), Handbook of higher-level visuospatial thinking. Cambridge: Cambridge University Press.
Tversky, B., & Lee, P. (1999). How space structures language. In C. Freksa, C. Habel, & K. F. Wender (Eds.), Spatial cognition: An interdisciplinary approach to representation and processing of spatial knowledge (pp. 157-176). Berlin: Springer-Verlag.
Tversky, B., Morrison, J. B., Franklin, N., & Bryant, D. (1999). Three spaces of spatial cognition. Professional Geographer, 51, 516-524.
Wang, Y. (2003). On cognitive informatics. Brain and Mind, 4, 151-167.
Wang, Y., Wang, Y., Patel, S., & Patel, D. (2006). A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), 124-133.
Wertheimer, M. (1958). Principles of perceptual organization. In Readings in Perception. New York: Van Nostrand.
Woolf, H. B. (1980). Webster's New Collegiate Dictionary. Springfield, MA: G. & C. Merriam Company.
Endnote
1. This chapter is the revised version of Dong (2007). The work reported in this chapter is independent of the work carried out by Cognitive Ergonomic Systems.
Chapter XI
A Formal Specification of the Memorization Process

Natalia López, Universidad Complutense de Madrid, Spain
Manuel Núñez, Universidad Complutense de Madrid, Spain
Fernando L. Pelayo, Universidad de Castilla-La Mancha, Spain
Abstract

In this chapter we present the formal language stochastic process algebra (STOPA) to specify cognitive systems. In addition to the usual characteristics of these formalisms, this language features the possibility of including stochastic time. This kind of time is useful to represent systems where the delays are not controlled by fixed amounts of time, but are given by probability distribution functions. In order to illustrate the usefulness of our formalism, we formally represent a cognitive model of memory. Following contemporary theories of memory classification (see Squire et al., 1993; Solso, 1999), we consider sensory buffer, short-term, and long-term memories. Moreover, borrowing from Y. Wang and Y. Wang (2006), we also consider the so-called action buffer memory.
Introduction

Cognitive informatics (Wang, 2002a, 2007a) is a relatively new field of knowledge. In fact, we are still in a period where new mechanisms are being introduced in the field, including implementation and representation languages, theoretical models, algorithms, and so on. However, not all the advances in cognitive informatics are completely novel; we may (and should!) take advantage of other, more mature fields. This is the case, for example, for the development of frameworks for the formal specification of cognitive systems. The introduction of RTPA (Wang, 2002b, 2003) represents a very adequate step in this direction: by conveniently putting together the ideas underlying classical process algebras, RTPA provides a new framework to represent cognitive processes and systems.
In order to understand why we consider stochastic time, it is worth briefly reviewing the main milestones in the development of process algebras (see Bergstra et al. (2001) for a good overview of current research topics in the field, and Hoare (1985), Milner (1989), and Baeten and Weijland (1990) for the presentation of the classical formalisms). Process algebras are very suitable for formally specifying systems where concurrency is essential. The early work was significant mainly in shedding light on concepts and in opening research methodologies; however, because complicated features were abstracted away, the models remained far from real systems, and some of the solutions were not specific enough, for instance those related to real-time systems. Subsequently, features that had previously been abstracted away were introduced into the models, allowing the design of systems where not only functional requirements but also performance requirements are included. The most significant of these additions concern notions such as time (e.g., Reed and Roscoe, 1988; Nicollin and Sifakis, 1991; Yi, 1991; Davies and Schneider, 1995; Baeten and Middelburg, 2002) and probabilities (e.g., Glabbeek et al., 1995; Núñez et al., 1995; Núñez and de Frutos, 1995; Cleaveland et al., 1999; Cazorla et al., 2003). An attempt to integrate time and probabilistic information has been made by introducing stochastic process algebras (e.g., Götz et al., 1993; Hillston, 1996; Bernardo and Gorrieri, 1998; Pelayo et al., 2000; López and Núñez, 2001). Most stochastic process algebras work exclusively with exponential distributions (some exceptions are Bravetti et al., 1998; D'Argenio et al., 1998; Harrison and Strulo, 2000; Bravetti and Gorrieri, 2002; López et al., 2004). The problem is that the combination of parallel/concurrent composition operators and general distributions strongly complicates the definition of semantic models; that is why stochastic process algebras are usually based on (semi-)Markov chains. However, this assumption decreases the expressiveness of the language, because it cannot properly express behaviors whose time distributions are not exponential.

In this chapter we build, on the one hand, on our work on stochastic and probabilistic process algebras (see, for example, Núñez et al., 1995; Núñez, 2003; López et al., 2004) and, on the other hand, on the process algebra RTPA (Wang, 2002b, 2003) to introduce a new stochastic process algebra to formally represent cognitive processes containing stochastic information. We call this language STOPA. The main improvement of our framework with respect to RTPA is that we may specify timing information given by stochastic delays. For example, a process such as ξ;P is delayed an amount of time t with probability Fξ(t), where Fξ is the probability distribution function associated with ξ. Nevertheless, let us remark that deterministic delays (as presented in timed process algebras) can be specified by using Dirac distributions. As we will see in the following sections, the inclusion of stochastic time introduces some additional complexity in the definition of the operational semantics of the language. In order to assess the usefulness of our language, we formally represent a high-level description of the cognitive process of memory. We will consider sensory buffer, short-term, long-term, and action buffer memories (see Squire et al., 1993; Solso, 1999; Wang and Wang, 2006).
The Language STOPA

In this section we present our language and its operational semantics. The semantic model is strongly based on (Wang, 2002b, 2006); our modifications concern mainly the presentation of the operational semantics for process relations and the introduction of stochastic time. A process can be defined as a single meta-process or as a complex process built from meta-processes using process relations. Thus, STOPA is described by using the following structure:

STOPA ::= Meta-processes | Primary types | Abstract data types | Process relations | System architectures | Specification refinement sequences

The syntax and operational semantics of the meta-processes are given in Tables 1 and 2 (Wang, 2002b). The definitions of the primary data types are the meta-types defined from 2.1 to 2.10 in Table 3 (Wang, 2002b). The abstract data types are described in Table 4 (Wang, 2002b). The interested reader may find in (Wang, 2006, 2007b) a detailed explanation of all the definitions appearing in Tables 1-4, as well as the definitions of system architectures and specification refinement sequences. In Section 3 we will describe the syntax and operational semantics of the process relations.
As we have already commented in the introduction of this chapter, our aim is to add stochastic information to the framework of RTPA. For that reason we have included random variables in the syntax of the process relations; that is, a process can be delayed according to a random variable. We will suppose that the sample space (that is, the domain of random variables) is the set of real numbers R, and that random variables take only positive values; that is, given a random variable ξ we have P(ξ ≤ t) = 0 for any t ≤ 0. The reason for this restriction is that random variables are always associated with time distributions.

Definition 1. Let ξ be a random variable. We define its probability distribution function, denoted by Fξ, as the function Fξ: R → [0, 1] such that Fξ(x) = P(ξ ≤ x), where P(ξ ≤ x) is the probability that ξ assumes values less than or equal to x.

The following probability distribution functions will be used in the rest of the chapter.

Uniform Distributions. Let a, b ∈ R+ be such that a < b. A random variable ξ follows a uniform distribution over the interval [a, b], denoted by U(a, b), if its associated probability distribution function is:
Fξ(x) = 0                  if x ≤ a
Fξ(x) = (x − a)/(b − a)    if a < x < b
Fξ(x) = 1                  if x ≥ b
These distributions allow us to keep compatibility with time intervals in timed process algebras, in the sense that the same weight is assigned to all the times in the interval.

Discrete Distributions. Let PI = {(ti, pi) | i ∈ I} be a set of pairs such that for any i ∈ I we have ti ≥ 0 and pi > 0, for any i, j ∈ I, if i ≠ j then ti ≠ tj, and Σ pi = 1. A random variable ξ follows a discrete distribution with respect to PI, denoted by D(PI), if its associated probability distribution function is

Fξ(x) = Σi∈I {| pi | x ≥ ti |}

where {| and |} represent multisets. Discrete distributions are important because they allow us to express passive actions, that is, actions that are willing, from a certain point of time, to be executed with probability 1.

Exponential Distributions. Let 0 < λ ∈ R. A random variable ξ follows an exponential distribution with parameter λ, denoted by E(λ) or simply λ, if its associated probability distribution function is:
Fξ(x) = 1 − e^(−λx)    if x ≥ 0
Fξ(x) = 0              if x < 0
Poisson Distributions. Let 0<λ∈R. A random variable ξ follows a Poisson distribution with parameter λ, denoted by P(λ), if it takes positive values only in N and its associated probability distribution function is:
Fξ(x) = Σ(t∈N, t ≤ x) (λ^t / t!) e^(−λ)    if x ≥ 0
Fξ(x) = 0                                  if x < 0
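Although the chapter manipulates these distribution functions symbolically, they can also be sampled directly when simulating stochastic delays such as ξ;P. The following Python sketch (our illustration, not part of STOPA; all function names are ours) draws delays from the four families defined above.

import math
import random

def sample_uniform(a: float, b: float) -> float:
    """Draw a delay from U(a, b)."""
    return random.uniform(a, b)

def sample_discrete(pairs):
    """Draw from D(PI), with PI given as [(t_i, p_i), ...] and the p_i summing to 1."""
    r, acc = random.random(), 0.0
    for t, p in pairs:
        acc += p
        if r < acc:
            return t
    return pairs[-1][0]  # guard against floating-point round-off

def sample_exponential(lam: float) -> float:
    """Draw from E(lam)."""
    return random.expovariate(lam)

def sample_poisson(lam: float) -> int:
    """Draw from P(lam) by inverse transform over the integer support."""
    r, k, cdf = random.random(), 0, 0.0
    while True:
        cdf += math.exp(-lam) * lam**k / math.factorial(k)
        if r < cdf:
            return k
        k += 1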
During the rest of the chapter, mainly in the examples and when no confusion arises, we will identify random variables with their probability distribution functions. Regarding communication actions, they can be divided into input and output actions. Next, we define our alphabet of actions.
Table 1. Meta-processes (1/2)
Table 2. Meta-processes (2/2)
Definition 2. We consider a set of communication actions Act = Input ∪ Output, where we assume Input ∩ Output = ∅. We suppose that there exists a bijection f: Input → Output; for any input action a? ∈ Input, f(a?) is denoted by the output action a! ∈ Output. If there exists a message transmission between a? and a!, we say that a is the channel of the communication (a, b, c, … range over Act). We also consider a special action τ ∉ Act that represents internal behavior of the processes, and we denote by Actτ the set Act ∪ {τ} (α, β, γ, … range over Actτ). Besides, we consider a denumerable set Id of process identifiers. In addition, we denote by V the set of random variables (ξ, ξ', ψ, … range over V).
Process Relations

In this section we define the syntax and operational semantics of our process algebra to describe process relations. Operational semantics is probably the simplest and most intuitive way to give semantics to any process language. In this part of the description of the language, operational behaviors will be defined by means of transitions P --ω--> P' that each process can execute. These are obtained in a structured way by applying a set of inference rules (Plotkin, 1981). The intuitive meaning of a transition P --ω--> P' is that the process P may perform the action ω and, after this action is performed, it behaves as P'. Let us note that the labels appearing in these operational transitions have the following types: a? ∈ Input, a! ∈ Output, a ∈ Act, α ∈ Act ∪ {τ}, ω ∈ Actτ ∪ V, and ξ ∈ V. The set of process relations, as well as their operational semantics, is given in Table 5. Next, we intuitively describe each of the operators.
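Before the operator-by-operator description, the following small sketch (our illustration, not the chapter's formalism; the encoding and names are ours) shows one way to read a transition P --ω--> P' computationally: a process is a piece of data, and a step function lists the transitions its topmost operator licenses.

from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Stop:
    """The terminated process: no transitions."""

@dataclass
class Choice:
    """An external choice sum(a_i ; P_i): pairs of (action, continuation)."""
    branches: List[Tuple[str, "Proc"]]

Proc = Union[Stop, Choice]

def steps(p: Proc) -> List[Tuple[str, Proc]]:
    """All transitions p --action--> p' licensed by a (CHO1)-style rule."""
    if isinstance(p, Choice):
        return [(a, cont) for a, cont in p.branches]
    return []

# Example: a? ; Stop + b? ; Stop can perform either a? or b?.
p = Choice([("a?", Stop()), ("b?", Stop())])
assert [label for label, _ in steps(p)] == ["a?", "b?"]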
The external, internal, and stochastic choice process relations are used to describe the choice among different actions. They are denoted, respectively, by:

∑ ai ; Pi        ∑ τ ; Pi        ∑ ξi ; Pi
where ai ∈ Act and ξi ∈ V. The inference rules describing the behavior of these process relations are (CHO1), (CHO2), and (CHO3). The process ∑ ai;Pi will perform one of the actions ai and after that it behaves as Pi. The term ∑ τ;Pi represents the internal choice among the processes Pi: once the choice is made, by performing an internal action τ, the process behaves as the chosen process. Finally, the process ∑ ξi;Pi will be delayed a certain amount of time t, with probability p = P(ξi ≤ t), and after that it will behave as Pi. Sequence is a process relation in which two processes are performed consecutively; this relation is denoted by P;Q. The rules (SEQ1), (SEQ2), (SEQ3), and (SEQ4) describe the behavior of these process relations. Intuitively, P is initially performed; once P finishes, Q starts its performance. If the process P can perform an action then P;Q will perform it, and if the process P finishes then the process P;Q will behave as Q. A process finishes its execution when it performs the action √ (see rule (SEQ4)). The branch and switch process relations are denoted by (? expBL = true);P | (? expBL = false);Q and (? expNUM = i);Pi. The behavior of the first operator is described in the rule (BRA): if the boolean expression expBL evaluates to true then P is performed; otherwise, Q is performed. In the second operator, if the value of the numerical expression expNUM is equal to i then the process Pi is performed (see rule (SWI)). The FOR-DO, REPEAT, and WHILE-DO process relations are denoted, respectively, by:
R_{i=1}^{n} P        R_{≥1}^{expBL≠true} P        R_{≥0}^{expBL≠true} P

and their operational semantics are given in the rules (FOR), (REP), (WHI1), and (WHI2). The FOR-DO process relation may be used to describe the performance of a certain process a fixed number of times n. Thus, R_{i=1}^{n} P behaves as P and after that as R_{i=1}^{n-1} P. The REPEAT process relation executes a process P at least once; it continues its execution until a certain expression takes the value true. Thus, the term R_{≥1}^{expBL≠true} P behaves as P, and after that, as a WHILE-DO process relation. The WHILE-DO process relation R_{≥0}^{expBL≠true} P behaves as P if the boolean expression is true, and after that behaves again as R_{≥0}^{expBL≠true} P; if the boolean expression is false then the process finishes its performance.

Recursion is a process relation in which the definition of a process may contain a call to itself. We will use a notation slightly different from that in RTPA: X := P, where X ∈ Id is a process identifier. The rule (REC1) applies to external and internal actions, while (REC2) applies to stochastic actions. Let us also remark that P(X/X:=P) represents the substitution of all free occurrences of X in P by X := P.

The parallel, concurrence, and interleaving process relations are denoted, respectively, by:

P ||tc Q        P ∫ Q        P ||| Q
Parallel is a process relation in which two processes are executed simultaneously, synchronized by a common system clock. The parallel process relation is designed to model behaviors of a multi-processor single-clock system. The parameter tc indicates the time of the system clock; it varies as time passes. If one of the processes of the parallel composition can perform a non-stochastic action then the composition will perform it (see rules (PAR1) and (PAR2)). We suppose that there is an operation * on the set of actions Act such that (Act, *) is a monoid and τ is its identity element. Thus, by rule (PAR3), if we have the parallel composition of P and Q, P may perform a, Q may perform b, and a*b ≠ τ, then they will evolve together. Let us now suppose that P can perform a stochastic action, and that neither external nor internal actions can be performed by the composition. Then P ||tc Q
Table 3. Meta-types
Table 4. Abstract data types
will perform the temporal action and Q will evolve into cond(Q, ∆t), where ∆t is the actual time consumed by the stochastic action, that is, ∆t = tc' − tc, where tc is the system time at the moment the performance of ξ started and tc' is the system time after the execution of ξ. The definition of the function cond(P, ∆t) is given in Table 6. In that table we have included, for the sake of completeness, all the cases of the function; however, the most relevant part of this definition is the one concerning choice process relations. In this case, ξi' is the random variable whose probability distribution function is given as a conditional probability: we have P(ξi' ≤ t') = P(ξi ≤ t'' | ∆t + t' ≥ t''), that is, the probability that the new random variable finishes before t' units of time have passed equals the probability that the original random variable ξi is less than or equal to t'', provided that ∆t + t' ≥ t''. Let us remark that the modification of the side of the parallel composition that does not perform the stochastic action is needed: since we have here a multi-processor single-clock system, the time consumed by the stochastic action has to be taken into account on both sides of the parallel composition.

Concurrence is a process relation in which two processes are simultaneously and asynchronously executed, according to separate system clocks. This process relation is designed to model behaviors of a multi-processor multi-clock system. The rules (CON1) and (CON2) indicate that if one of the processes of the composition can perform an action then the composition will asynchronously perform it. However, if one of the processes of the composition can perform an input action and the other can perform the corresponding output action, then there is a communication and the process relation will perform it (see rules (CON3) and (CON4)). Since we have in this case a multi-clock system, if one of the processes can perform a stochastic action then the composition will perform it without modifying the other component of the composition (see rules (CON5) and (CON6)).

Interleaving is a process relation in which two processes are executed simultaneously while sharing a common system clock. The interleaving process relation is designed to model behaviors of a single-processor single-clock system. If one of the processes can perform a non-stochastic action then the composition will perform it (see rules (INT1) and (INT2)). Regarding stochastic actions (see rules (INT3) and (INT4)), we consider that a stochastic action, once it has started, cannot be interrupted. Besides, if one of the processes performs a stochastic action then the other component is not modified: since we suppose a single-processor single-clock system, the other side of the composition is not active, so no time has passed for it.

Pipeline is a process relation in which two processes are interconnected to each other: the second process takes its inputs from the outputs of the first one. This process relation is denoted by P >> Q. Thus, if P can perform an action then P >> Q will perform it (see rules (PIPE1) and (PIPE3)). Once the process P is finished, that is, P can perform the action √, P >> Q behaves as Q (see rule (PIPE2)).

Time-driven dispatch is a process relation in which the i-th process is triggered by a predefined system time tihh:mm:ss:ms. It is denoted as follows: @tihh:mm:ss:ms → Pi, i ∈ {1, ..., n}. According to the rule (TDD), the process Pi is performed when the value of the system time is equal to tihh:mm:ss:ms.

Event-driven dispatch is a process relation in which the i-th process is triggered by a system event @eiS. It is denoted as follows: @eiS → Pi, i ∈ {1, ..., n}. When the event eiS occurs, the process Pi is performed (rule (EDD)).

Interrupt is a process relation in which a process is temporarily held by another with higher priority. The term describing that the process P is interrupted by the process Q when the event @eS is captured at the interrupt point is represented by P || (@eS Q). If the event eS is captured (rule (INTER1)), the process P || (@eS Q) will behave as Q, and after that the process will continue its execution behaving as P. If the event eS is not captured and P can perform an action (rules (INTER2) and (INTER3)), the complete process will also perform it.
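The adjustment performed by cond can be read operationally: after ∆t time units have elapsed, a pending stochastic delay must be redrawn from its residual distribution. A simulation-style sketch follows (our illustration with invented names, not part of STOPA); it uses rejection over the original sampler, and the exponential case illustrates why exponential-only algebras avoid the issue entirely: by memorylessness, the residual distribution is unchanged.

import random

def residual(sample_xi, dt: float, max_tries: int = 10_000) -> float:
    """Sample the remaining delay of xi, given that dt time units have
    already elapsed without xi expiring, i.e. (xi - dt | xi > dt)."""
    for _ in range(max_tries):
        x = sample_xi()
        if x > dt:
            return x - dt
    raise RuntimeError("xi > dt is too unlikely; distribution nearly exhausted")

# For an exponential delay the residual is again exponential (memorylessness):
lam = 2.0
xs = [residual(lambda: random.expovariate(lam), 0.5) for _ in range(10_000)]
print(sum(xs) / len(xs))  # empirical mean stays close to 1/lam regardless of dt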
Table 5. Operational semantics of the process relations
Table 6. Definition of function cond (P, ∆t)
The Memorization Process

In this section we give a brief explanation of how the memorization process works. The main goal of this description is to answer the following three basic questions:

• How are memories formed? (encoding)
• How are memories retained? (storage)
• How are memories recalled? (retrieval)
Encoding Process

Encoding is an active process which requires selective attention to the material to be encoded. Memories may then be affected by the amount or type of attention devoted to the task of encoding the material. There may be different levels of processing, some of them deeper than others. These processes are structural encoding (where emphasis is placed on the physical structural characteristics of the stimulus), phonemic encoding (with emphasis on the sounds of the words), and semantic encoding (that is, emphasis on the meaning). The main aspects of encoding fit the OAR-model in the following sense:

1. Relation: Association with other information.
2. Object: Visual imagery of the real entity or concept that can be used to add richness to the material to be remembered. Besides, it also adds more sensory modalities.
3. Attributes: To make the material personally relevant and to add more detailed information about the object.
Storage Process

Over the years, analogies with available technologies have been made in order to try to explain the behavior of the memory. Nowadays, memory theories use a computer-based, or information-processing, model. The most accepted model states that there are three stages of memory storage and one more for memory retrieval:

• Sensory Store retains the sensory image for only a small part of a second, just long enough to develop a perception. This is stored in the Sensory Buffer Memory (SFM). Following (Wang and Wang, 2006), we also consider the Action Buffer Memory (ABM), which is used as a buffer when recovering information.
• Short Term Memory (STM) lasts about 20 to 30 seconds when we do not consider rehearsal of the information. On the contrary, if rehearsal is used then short term memory will last as long as the rehearsal continues. Short term memory is also limited in terms of the number of items it can hold. Its capacity is about 7 items but can be increased by chunking, that is, by combining similar material into units.

Let us remark that short term memory was originally perceived as a simple rehearsal buffer. However, it turns out to have a more complicated underlying process, being better modeled by using an analogy with a computer, which has the ability to store a limited amount of information in its cache RAM while performing other tasks. In other words, we can consider it as a kind of working memory.

• Long Term Memory (LTM) has been suggested to be permanent. However, even though no information is forgotten, we might lose the means of retrieving it.

Another interesting point regarding memory is to determine the mechanism that changes the condition of a certain memory. In other words, how does short term memory content get into long term memory? We have to take into account the following:

• Serial position effect: primacy (first words get rehearsed more often, so they move into long term memory) and recency (for instance, words at the end that are not rehearsed as often but that are still available in STM) affect long term memory.
• Rehearsal helps to move things into long term memory.

According to the organizational structures of long term memory, we also have to consider:

• Related items are usually remembered together.
• Conceptual hierarchies are used as a classification scheme to organize memories.
• Semantic networks are less neatly organized bunches of conceptual hierarchies linked together by associations to other concepts.
• Schemes are clusters of knowledge about an event or object abstracted from prior experience with the object. Actually, we tend to recall objects that fit our conception of the situation better than ones that do not.
• A script is a schema which organizes our knowledge about common things or activities (if you know the script applicable to the event, you can better remember the elements of the event).
The process of storing new information in LTM is called consolidation.
Retrieval Process

Memory retrieval is not a random process. Once a request is generated, the appropriate searching and finding processes take place. This process is triggered according to the organizational structures of the LTM, while the requested information is provided via the Action Buffer Memory. Finally, the graphic description of the memorization process is shown in Figure 1.
Formal Description of the Memorization Process

Taking as starting point the description of the memorization process given in the previous section, we present in this section how STOPA describes this cognitive process of the brain in a rigorous way.
Figure 1. The model of the memorization process
MemorizationProcess (I:: ThePerceptionS; O:: OAR(ThePerceptionS)ST, LmemorizationN)   // PNN = 1
{                                                                                     // PNN = 2
   oS := ThePerceptionS
   (  ScopeS := ObjectsS    Search (I:: oS, ScopeS; O::)                              // PNN = 3
    | ScopeS := AttributesS Search (I:: oS, ScopeS; A::)                              // PNN = 4
    | ScopeS := RelationsS  Search (I:: oS, ScopeS; R::)                              // PNN = 5
   )
   EncodingSTM (I:: OAR(oS); O:: OAR(oS)) { }                                         // PNN = 6
   (  ? (time in REHEARSAL) = true                                                    // PNN = 7
         EncodingLTM (I:: OAR(oS); O:: OAR(oS)) { }                                   // PNN = 8
         PL1S
    | ? (time in REHEARSAL) = false                                                   // PNN = 10
         LOOSING (I:: OAR(oS); O::) PL1S
   ) PL1S
   DecodingABM (I:: OAR(oS); O:: TheInformation(ST)) { }                              // PNN = 9
}                                                                                     // PNN = 11
Conclusion

In this chapter we have presented the algebraic formalism STOPA. The main new feature of this language with respect to previous work is the ability to represent stochastic time, that is, to consider that time is not given by a fixed amount but depends on a probability distribution function. We have formally described the memorization process in order to show the usefulness of our formalism. We contemplate three lines of future work. First, we have to perform a more thorough study of the semantic framework; in particular, it would be very appropriate to define a bisimulation semantics to allow the comparison of different representations of a cognitive process. Second, we are working on the use of other graphical formalisms for the formal representation of cognitive systems. Finally, we plan to study other cognitive processes and systems in detail, so that we can better test the capabilities of STOPA. Actually, we have already started to give a formal description of the algorithms and systems presented in (Núñez et al., 2003).
References

Baeten, J., & Middelburg, C. (2002). Process algebra with timing. EATCS Monograph. Springer.
Baeten, J., & Weijland, W. (1990). Process algebra. Cambridge Tracts in Computer Science, 18. Cambridge University Press.
Bergstra, J., Ponse, A., & Smolka, S. (Eds.) (2001). Handbook of process algebra. North Holland.
Bernardo, M., & Gorrieri, R. (1998). A tutorial on EMPA: A theory of concurrent processes with nondeterminism, priorities, probabilities and time. Theoretical Computer Science, 202, 1-54.
Bravetti, M., Bernardo, M., & Gorrieri, R. (1998). Towards performance evaluation with general distributions in process algebras. In CONCUR'98, LNCS 1466 (pp. 405-422). Springer.
Bravetti, M., & Gorrieri, R. (2002). The theory of interactive generalized semi-Markov processes. Theoretical Computer Science, 282(1), 5-32.
Cazorla, D., Cuartero, F., Valero, V., Pelayo, F., & Pardo, J. (2003). Algebraic theory of probabilistic and nondeterministic processes. Journal of Logic and Algebraic Programming, 55(1-2), 57-103.
Cleaveland, R., Dayar, Z., Smolka, S., & Yuen, S. (1999). Testing pre-orders for probabilistic processes. Information and Computation, 154(2), 93-148.
D'Argenio, P., Katoen, J.-P., & Brinksma, E. (1998). An algebraic approach to the specification of stochastic systems. In Programming Concepts and Methods (pp. 126-147). Chapman & Hall.
Davies, J., & Schneider, S. (1995). A brief history of timed CSP. Theoretical Computer Science, 138, 243-271.
Glabbeek, R. v., Smolka, S., & Steffen, B. (1995). Reactive, generative and stratified models of probabilistic processes. Information and Computation, 121(1), 59-80.
Götz, N., Herzog, U., & Rettelbach, M. (1993). Multiprocessor and distributed system design: The integration of functional specification and performance analysis using stochastic process algebras. In 16th Int. Symp. on Computer Performance Modelling, Measurement and Evaluation (PERFORMANCE'93), LNCS 729 (pp. 121-146). Springer.
Harrison, P., & Strulo, B. (2000). SPADES: A process algebra for discrete event simulation. Journal of Logic and Computation, 10(1), 3-42.
Hillston, J. (1996). A compositional approach to performance modelling. Cambridge University Press.
Hoare, C. (1985). Communicating sequential processes. Prentice Hall.
López, N., & Núñez, M. (2001). A testing theory for generally distributed stochastic processes. In CONCUR 2001, LNCS 2154 (pp. 321-335). Springer.
López, N., Núñez, M., & Rubio, F. (2004). An integrated framework for the analysis of asynchronous communicating stochastic processes. Formal Aspects of Computing, 16(3), 238-262.
Milner, R. (1989). Communication and concurrency. Prentice Hall.
Nicollin, X., & Sifakis, J. (1991). An overview and synthesis on timed process algebras. In Computer Aided Verification'91, LNCS 575 (pp. 376-3). Springer.
Núñez, M. (2003). Algebraic theory of probabilistic processes. Journal of Logic and Algebraic Programming, 56(1-2), 117-177.
Núñez, M., & de Frutos, D. (1995). Testing semantics for probabilistic LOTOS. In Formal Description Techniques 8 (pp. 365-380). Chapman & Hall.
Núñez, M., de Frutos, D., & Llana, L. (1995). Acceptance trees for probabilistic processes. In CONCUR'95, LNCS 962 (pp. 249-263). Springer.
Núñez, M., Rodríguez, I., & Rubio, F. (2003). Towards the identification of living agents in complex computational environments. In 2nd IEEE Int. Conf. on Cognitive Informatics (pp. 151-160). IEEE Computer Society Press.
Pelayo, F. L., Cuartero, F., Valero, V., & Cazorla, D. (2000). An example of performance evaluation by using the stochastic process algebra ROSA. In 7th Int. Conf. on Real-Time Systems and Applications (pp. 271-278). IEEE Computer Society Press.
Plotkin, G. D. (1981). A structural approach to operational semantics. Technical Report DAIMI FN-19, Computer Science Department, Aarhus University.
Reed, G., & Roscoe, A. (1988). A timed model for communicating sequential processes. Theoretical Computer Science, 58, 249-261.
Solso, R. (Ed.) (1999). Mind and brain science in the 21st century. MIT Press.
Squire, L., Knowlton, B., & Musen, G. (1993). The structure and organization of memory. Annual Review of Psychology, 44, 453-459.
Wang, Y. (2002a). On cognitive informatics. Keynote speech. In Proceedings of the 1st IEEE Int. Conf. on Cognitive Informatics (pp. 34-42). IEEE Computer Society Press.
Wang, Y. (2002b). The Real-Time Process Algebra (RTPA). Annals of Software Engineering, 14, 235-274.
Wang, Y. (2003). Using process algebra to describe human and software behaviors. Brain and Mind, 4, 199-213.
Wang, Y. (2006, March). On the informatics laws and deductive semantics of software. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 161-171.
Wang, Y. (2007a, January). The theoretical framework of cognitive informatics. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(1), 1-27. USA: IPG Publishing.
Wang, Y. (2007b). Software engineering foundations: A software science perspective. New York: CRC Press.
Wang, Y., & Wang, Y. (2006, March). Cognitive informatics models of the brain. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 203-207.
Yi, W. (1991). CCS + time = an interleaving model for real time systems. In 18th ICALP, LNCS 510 (pp. 217-228). Springer.
Section III
Autonomic Computing
Chapter XII
Theoretical Foundations of Autonomic Computing1

Yingxu Wang, University of Calgary, Canada
Abstract

Autonomic computing (AC) is an intelligent computing approach that autonomously carries out robotic and interactive applications based on goal- and inference-driven mechanisms. This chapter attempts to explore the theoretical foundations and technical paradigms of AC. It reviews the historical development that leads to the transition from imperative computing to AC. It surveys transdisciplinary theoretical foundations for AC such as those of behaviorism, cognitive informatics, denotational mathematics, and intelligent science. On the basis of this work, a coherent framework towards AC may be established for both interdisciplinary theories and application paradigms, which will result in the development of new generation computing architectures and novel information processing systems.
INTRODUCTION
Autonomic computing (AC) is a mimicry and simulation of the natural intelligence possessed by the brain using generic computers. This indicates that the nature of software in AC is the simulation and embodiment of human behaviors, and the extension of human capability, reachability, persistency, memory, and information processing speed. The history towards AC may be traced back to the work on automata by Norbert Wiener, John von Neumann, Alan Turing, and Claude E. Shannon as early as the 1940s (Wiener, 1948; von Neumann, 1946/58/63/66; Turing, 1950; Shannon, 1956; Rabin and Scott, 1959). In the same period, Warren McCulloch proposed the term artificial intelligence (AI) (McCulloch, 1943/65/93), and S. C. Kleene analyzed the relations of automata and nerve nets (Kleene, 1956). Then, Bernard Widrow developed the technology of artificial neural networks in the 1950s (Widrow and Lehr, 1990). The concepts of robotics (Brooks, 1970) and expert systems (Giarratano and Riley, 1989) were developed in the 1970s and 1980s, respectively. Then, intelligent systems (Meystel and Albus, 2002) and software agents (Negroponte, 1995; Jennings, 2000) emerged in the 1990s. These events and developments led to the formation of the concept of AC.
Table 1. Classification of computing methodologies and systems

                            Behavior (O)
Event (I)             Constant           Variable
Constant              Routine            Adaptive
Variable              Algorithmic        Autonomic
Type of behavior      Deterministic      Nondeterministic
AC was first proposed by IBM in 2001, where it is perceived that "AC is an approach to self-managed computing systems with a minimum of human interference. The term derives from the body's autonomic nervous system, which controls key functions without conscious awareness or involvement (IBM, 2001)." Various studies on AC have been reported following the IBM initiative (Pescovitz, 2002; Kephart and Chess, 2003; Murch, 2004). The cognitive informatics foundations of AC have been revealed in (Wang, 2002a/03a/03b/04/06b/06f/07a/07c; Wang and Kinsner, 2006). A paradigm of AC in terms of cognitive machines has been surveyed in (Kinsner, 2007) and investigated in (Wang, 2006a; Wang, 2007b). Based on cognitive informatics theories (Wang, 2002a; Wang, 2003a; Wang, 2007b), AC is proposed as a new and advanced technology for computing built upon the routine, algorithmic, and adaptive systems, as shown in Table 1. The first three categories of computing techniques, namely routine, algorithmic, and adaptive computing, are imperative. In contrast, AC systems do not rely on imperative and procedural instructions, but depend on goal-, perception-, and inference-driven mechanisms.

Definition 1. An Imperative Computing (IC) system is a passive system that implements deterministic, context-free, and stored-program controlled behaviors.

Definition 2. An Autonomic Computing (AC) system is an intelligent system that implements nondeterministic, context-dependent, and adaptive behaviors based on goal- and inference-driven mechanisms.

This chapter attempts to explore the theoretical foundations and engineering paradigms of AC. It is extended from two invited keynote speeches of the author at the 3rd IEEE International Conference on Cognitive Informatics (ICCI'04) (Wang, 2004) and the 1st International Conference on Agent-Based Technologies and Systems (ATS'03) (Wang, 2003b). In the remainder of this chapter, the historical development that transfers IC to AC is reviewed. Then, a comprehensive set of theoretical foundations and paradigms for AC is explored, encompassing those of behaviorism, cognitive informatics, denotational mathematics, and intelligent science. On the basis of this work, a coherent framework towards AC may be established for the development of interdisciplinary theories and application paradigms.
FROM IMPERATIVE COMPUTING TO AUTONOMIC COMPUTING

A general-purpose computer may not do anything until a specific program is loaded; the stored program transfers the computer, as a general behavior-implementing machine, to a specific intelligent application. The approaches to computing, or the ways of embodying intelligent behaviors, can be classified into the two categories known as IC and AC, as given in Definitions 1 and 2. The IC system is a traditional and passive system that implements deterministic, context-free, and stored-program controlled behaviors, where a behavior is defined as a set of observable actions of a given computing system. The AC system, however, is an active system that implements nondeterministic, context-dependent, and adaptive behaviors. Autonomic systems do not rely on instructive and procedural information, but depend on internal status and willingness formed by long-term historical events and current rational or emotional goals.
In its AC manifesto, IBM proposed eight conditions setting forth an AC system: self-awareness, self-configuration, self-optimization, self-maintenance, self-protection (security and integrity), self-adaptation, self-resource-allocation, and open-standard-based (IBM, 2001). Kinsner pointed out that these characteristics indicate that IBM perceives AC as a mimicry of human nervous systems (Kinsner, 2007). In other words, self-awareness (consciousness) and non-imperative (goal-driven) behaviors are the main characteristics of AC systems (Wang, 2007c). According to cognitive informatics, the eight characteristics of AC identified by IBM may be sufficient only to identify an adaptive system rather than an autonomic system, because adaptive behaviors can be implemented by IC techniques, whereas autonomic behaviors may only be implemented by non-imperative and intelligent means. This leads to the formal description of the conditions and basic characteristics of AC, and of what distinguishes AC systems from conventional IC systems.

Theorem 1. The necessary and sufficient conditions of IC, CIC, are the possession of event (Be), time (Bt), and interrupt (Bint) driven computational behaviors, i.e.:

CIC = (Be, Bt, Bint)    (1)
Theorem 2. The necessary and sufficient conditions of AC, CAC, are the possession of goal (Bg) and inference (Binf) driven computational behaviors, in addition to the event (Be), time (Bt), and interrupt (Bint) driven behaviors, i.e.:

CAC = (Bg, Binf, Be, Bt, Bint)    (2)
Corollary 1. The behavioral space of IC, CIC, is a subset of that of AC, CAC; in other words, CAC is a natural extension of CIC, i.e.:

CIC ⊆ CAC    (3)
The theory and philosophy behind AC are those of cognitive informatics (Wang, 2002a; Wang, 2003a; Wang, 2007b). Cognitive processes of the brain, particularly the perceptive and inference cognitive processes, are the fundamental means for describing AC paradigms, such as robots, software agent systems, and distributed intelligent networks. In recent research in cognitive informatics, perceptivity is recognized as the sixth sense, which serves the brain as the thinking engine and the kernel of the natural intelligence. Perceptivity implements self-consciousness inside the abstract memories of the brain. Almost all cognitive life functions rely on perceptivity, such as consciousness, memory searching, motivation, willingness, goal setting, emotion, sense of spatiality, and sense of motion. In cognitive informatics research, it is recognized that Artificial Intelligence (AI) is a subset of natural intelligence (NI) (Wang, 2007d). Therefore, AC may refer to the natural intelligence and behaviors of human beings. A Layered Reference Model of the Brain (LRMB) has been developed (Wang, 2006g), which provides a reference model for the design and implementation of AC systems. LRMB expresses a systematic view toward the formal description and modeling of the architectures and behaviors of AC systems, which are created to extend human capability, reachability, and/or memory capacity. The LRMB model explains the functional mechanisms and cognitive processes of the natural intelligence with 39 cognitive processes at six layers, known as the sensation, memory, perception, action, meta-cognitive, and higher-cognitive layers from the bottom up. The cognitive model of the brain can be used as a reference model for goal-driven technologies in AC.
BEHAVIORISM FOUNDATIONS OF AUTONOMIC COMPUTING

Behaviorism is a doctrine of psychology and intelligence science, developed on the basis of associationism, that reveals the association between a given stimulus and an observed response of NI or AI systems (Sternberg, 1998).
Cognitive informatics reveals that human and machine behaviors may be classified into four categories known as the perceptive behaviors, cognitive behaviors, instructive behaviors, and reflective behaviors (Wang, 2007d). This section investigates the behavioral spaces and their basic properties of IC and AC.
The Behavioral Model of Imperative Computing

Before developing the behavioral model of IC, the types of events that trigger a conventional computational behavior need to be discussed.

Definition 3. An event is an advanced type in computing that captures the occurrence of a predefined external or internal change of status, such as an action of users, an external change of the environment, or an internal change of the value of a control variable.

The event types that may trigger a behavior can be classified into operational (@eS), time (@tTM), and interrupt (@int) events, as shown in Table 2, where @ is the event prefix, and S, TM, and int are the type suffixes, respectively. In Table 2, the time type suffix TM, as defined in RTPA (Wang, 2002b), can be extended into three subtypes:

TM = hh:mm:ss:ms | yy:MM:dd | yyyy:MM:dd:hh:mm:ss:ms    (4)
The interrupt event is a kind of special event that models the interruption of an executing process, the temporary handover of control to an Interrupt Service Routine (ISR), and the return of control after its completion. In a real-time environment, an ISR should conduct only the most necessary functions and must be short compared with the time slice scheduled for a normal process. A formal model of interrupt can be described below.

Definition 4. An interrupt, denoted by ↯, is a process relation in which a running process P is temporarily held before termination by another, higher-priority process Q via an interrupt event @int at the interrupt point, and the interrupted process is resumed when the high-priority process has been completed, i.e.:

P ↯ Q ≜ P || (@int↗ Q↘)    (5)
where ↗ and ↘ denote an interrupt service and an interrupt return, respectively. According to Theorem 1, the three generic behaviors of IC, known as the event-, time-, and interrupt-driven behaviors, can be defined as follows.

Definition 5. An event-driven behavior, denoted by ↳e, is a machine cognitive process in which the kth behavior, in terms of process Pk, is triggered by a predefined system event @ekS, i.e.:
Table 2. Types of events in imperative computing

No.  Type               Syntax   Usage in system behavioral description   Category
1    Operational event  @eS      @ekS ↳e Pk                               External or internal
2    Time event         @tTM     @tkTM ↳t Pk                              Internal
3    Interrupt event    @int     @intk ↳int Pk                            External or internal
R_{k=1}^{n} @ekS ↳e Pk    (6)
Definition 6. A time-driven behavior, denoted by ↳t, is a machine cognitive process in which the kth behavior, in terms of process Pk, is triggered by a predefined time point @tkTM, i.e.:

R_{k=1}^{n} @tkTM ↳t Pk    (7)
where the time point @tk may be a system timing or an external timing event.

Definition 7. An interrupt-driven behavior, denoted by ↳int, is a machine cognitive process in which the kth behavior, in terms of process Pk, is triggered by a predefined system interrupt @intk, i.e.:

R_{k=1}^{n} @intk ↳int Pk    (8)
In general, all types of events, including the operational events, timing events, and interrupt events, are captured by the system in order to dispatch a designated behavior. On the basis of Theorem 1 and Definitions 5 through 7, the mathematical model of a generic IC system can be described as follows.

Figure 1. The imperative computing system model
§IC Imperative-SysIDS ::
{  < R(iN = 0 .. nprocN − 1) PiST >              // Processes
|| < R(addrP = 0 .. nMEMH − 1) MEM[addrP]RT >    // Memory
|| < R(ptrP = 0 .. nPORTH − 1) PORT[ptrP]RT >    // Ports
|| < §tTM >                                      // The system clock
|| < R(kN = 0 .. neN − 1) @ekS ↳ Pk >            // Event-driven behaviors
|| < R(kN = 0 .. ntN − 1) @tkTM ↳ Pk >           // Time-driven behaviors
|| < R(kN = 0 .. nintN − 1) @intk ↳ Pk >         // Interrupt-driven behaviors
|| < R(iN = 0 .. nVN − 1) ViRT >                 // System variables
|| < R(iN = 0 .. nSN − 1) SiBL >                 // System statuses
}
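To make the dispatch semantics of Figure 1 concrete, here is a minimal event-loop sketch (our illustration; the registries and function names are assumptions, not RTPA): behaviors Pk are plain callables registered against the three trigger types of Table 2.

from typing import Callable, Dict, Iterable

# Hypothetical registries mapping triggers to behaviors P_k (cf. Table 2).
event_handlers: Dict[str, Callable[[], None]] = {}
time_handlers: Dict[float, Callable[[], None]] = {}
interrupt_handlers: Dict[int, Callable[[], None]] = {}

def dispatch(events: Iterable[str], interrupts: Iterable[int], now: float) -> None:
    """One pass of an imperative dispatcher: interrupts pre-empt (Definition 4),
    then operational events and due time points fire their behaviors."""
    for irq in interrupts:                         # @int_k  -> P_k
        handler = interrupt_handlers.get(irq)
        if handler:
            handler()
    for e in events:                               # @e_kS   -> P_k
        handler = event_handlers.get(e)
        if handler:
            handler()
    for t in sorted(t for t in time_handlers if t <= now):
        time_handlers.pop(t)()                     # @t_kTM  -> P_k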
Definition 8. The Imperative Computing System, §IC, is an abstract logical model of conventional computing platforms denoted by a set of parallel or concurrent computing resources and behaviors, as shown in Figure 1, where || denotes the parallel relation between given components of the system.

As shown in Figure 1, an IC system §IC is the executing platform or operating system that controls all the computing resources of an abstract target machine. The IC system is logically abstracted as a set of process behaviors and underlying resources, such as the memory, ports, the system clock, and system statuses. An IC behavior, in terms of a process Pk, is controlled and dispatched by the system §IC, and is triggered by various external, system timing, or interrupt events (Wang, 2007d).
The Behavioral Model of Autonomic Computing

AC extends the conventional behaviors of IC, as discussed in the preceding subsection, to more powerful and intelligent ones such as goal-driven and inference-driven behaviors. According to Theorem 2, by possessing all five forms of intelligent behaviors, AC has advanced closer to the intelligent power of human brains than IC.

Definition 9. A goal-driven behavior, denoted by ↳g, is a machine cognitive process in which the kth behavior, in terms of process Pk, is triggered by a given goal @gkST, i.e.:

R_{k=1}^{n} @gkST ↳g Pk    (9)
where the goal @gkST is in the system type ST, which denotes a structured description of the goal. In Definition 9, the goal may be formally described as follows.

Definition 10. A goal, denoted by @gkST, is a triple, i.e.:

@gkST = (P, Ω, Θ)    (10)
where P = {p1, p2, …, pn} is a non-empty finite set of purposes or motivations, Ω is a set of constraints for the goal, and Θ is the environment of the goal.

To some extent, therefore, AC is a goal-driven problem-solving process by machines that searches a solution for a given problem or finds a path to reach a given goal. There are two categories of problems in problem solving: (a) the convergent problem, where the goal of problem solving is given but the path may be known or unknown; and (b) the divergent problem, where the goal of problem solving is unknown and the path is either known or unknown. The combination of the above cases in problem solving is summarized in Table 3. A special case in Table 3 is that when both the goal and the path are known, the case is a solved instance of a given problem.

Table 3. Classification of problems and goals

Type of problem   Goal      Path      Type of solution
Convergent        Known     Unknown   Proof (specific)
Convergent        Known     Known     Instance (specific)
Divergent         Unknown   Known     Case study (open-ended)
Divergent         Unknown   Unknown   Explorative (open-ended)
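Definition 10's goal triple and the goal-driven dispatch of Definition 9 can be pictured with a small sketch (our illustration; all names and the admissibility test are assumptions, not part of the chapter):

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

@dataclass
class Goal:
    """The triple of Definition 10: purposes P, constraints Omega, environment Theta."""
    purposes: List[str]
    constraints: Set[str] = field(default_factory=set)
    environment: Set[str] = field(default_factory=set)

def goal_driven_dispatch(goals: List[Goal],
                         behaviors: Dict[str, Callable[[Goal], None]]) -> None:
    """@g_k -> P_k: run the behavior registered for each goal whose
    constraints are satisfied in its environment."""
    for g in goals:
        if not g.constraints <= g.environment:
            continue                      # goal not admissible yet
        for purpose in g.purposes:
            behavior = behaviors.get(purpose)
            if behavior:
                behavior(g)

# Example: a 'recharge' goal triggers its behavior once 'docked' holds.
g = Goal(purposes=["recharge"], constraints={"docked"}, environment={"docked"})
goal_driven_dispatch([g], {"recharge": lambda goal: print("charging...")})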
According to Theorem 2, inference capability is the second extension of AC on top of the capabilities of IC. Inference is a cognitive process that reasons about a possible causality from given premises, based on known causal relations between a pair of cause and effect proven true by empirical arguments, theoretical inference, or statistical regularities.

Definition 11. An inference-driven behavior, denoted by ↳inf, is a machine cognitive process in which the kth behavior, in terms of process Pk, is triggered by a given result of an inference process @infkST, i.e.:

R_{k=1}^{n} @infkST ↳inf Pk    (11)
Formal inferences can be classified into the deductive, inductive, abductive, and analogical categories (Wang, 2007d). A summary of the formal definitions of the four inference techniques will be given in Table 5 in the section on denotational mathematics. On the basis of the definitions of the behavioral space of AC, a generic AC system may be rigorously modeled as follows.

Definition 12. The AC System, §AC, is an abstract logical model of a computing platform denoted by a set of parallel or concurrent computing resources and behaviors, as shown in Figure 2, where || denotes the parallel relation between given components of the system.
Figure 2. The autonomic computing system model §AC

§AC ≜ Autonomic-SysID ST ::
{    < R(i = 0 .. nproc-1): Pi ST >            // Processes
  || < R(addr = 0 .. nMEM-1): MEM[addr] RT >   // Memory
  || < R(ptr = 0 .. nPORT-1): PORT[ptr] RT >   // Ports
  || < §t TM >                                 // The system clock
  || < R(k = 0 .. ne-1): @ek S ↳ Pk >          // Event-driven behaviors
  || < R(k = 0 .. nt-1): @tk TM ↳ Pk >         // Time-driven behaviors
  || < R(k = 0 .. nint-1): @intk ↳ Pk >        // Interrupt-driven behaviors
  || < R(k = 0 .. ng-1): @gk ST ↳ Pk >         // Goal-driven behaviors
  || < R(k = 0 .. ninf-1): @infk ST ↳ Pk >     // Inference-driven behaviors
  || < R(i = 0 .. nV-1): Vi RT >               // System variables
  || < R(i = 0 .. nS-1): Si BL >               // System statuses
}
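To make the structure of Figure 2 concrete, the following minimal Python sketch mirrors the resources and the five families of dispatchable behaviors. The class and method names are hypothetical, and RTPA's true semantics of || and ↳ are not reproduced; the parallel relation is only approximated by handler tables:

    from typing import Callable, Dict, List

    class ACSystem:
        """Skeleton of the §AC model: resources plus five behavior families."""
        def __init__(self, n_mem: int, n_ports: int) -> None:
            self.memory: List[object] = [None] * n_mem    # MEM[addr]
            self.ports: List[object] = [None] * n_ports   # PORT[ptr]
            self.clock: float = 0.0                       # §t, the system clock
            self.variables: Dict[str, object] = {}        # system variables Vi
            self.statuses: Dict[str, bool] = {}           # system statuses Si
            # One handler table per behavior category of Figure 2.
            self.handlers: Dict[str, Dict[str, Callable[[], None]]] = {
                c: {} for c in ("event", "time", "interrupt", "goal", "inference")}

        def on(self, category: str, key: str,
               process: Callable[[], None]) -> None:
            self.handlers[category][key] = process

        def dispatch(self, category: str, key: str) -> None:
            # @x_k triggers P_k for the matching behavior category
            self.handlers[category][key]()

    ac = ACSystem(n_mem=1024, n_ports=8)
    ac.on("event", "door_open", lambda: print("close the door"))
    ac.on("goal", "recharge", lambda: print("seek the charger"))
    ac.dispatch("goal", "recharge")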
4. COGNITIVE INFORMATICS FOUNDATIONS OF AUTONOMIC COMPUTING

Cognitive processes of the brain, particularly the perceptive cognitive processes, are the fundamental means for describing AC systems such as robots, software agent systems, and distributed intelligent networks. In recent research in cognitive informatics, perceptivity is recognized as the sixth sense, one that serves the brain as the thinking engine and the kernel of natural intelligence (Wang et al., 2006). Perceptivity implements self-consciousness inside the abstract memories of the brain. Almost all cognitive life functions rely on perceptivity, such as consciousness, memory searching, motivation, willingness, goal setting, emotion, sense of spatiality, and sense of motion.

The LRMB reference model (Wang et al., 2006) was developed to explain the fundamental cognitive mechanisms and processes of natural intelligence. Because a variety of life functions and cognitive processes have been identified in CI, psychology, cognitive science, brain science, and neurophilosophy, there is a need to organize all the recurrent cognitive processes in an integrated and coherent framework. The LRMB model explains the functional mechanisms and cognitive processes of natural intelligence, encompassing 39 cognitive processes at six layers known, from the bottom up, as the sensation, memory, perception, action, metacognitive, and higher cognitive layers, as shown in Figure 3. LRMB elicits the core and highly recurrent cognitive processes from a huge variety of life functions, which may shed light on the study of the fundamental mechanisms and interactions of complicated mental processes, as well as of AC, particularly the relationships and interactions between the inherited and the acquired life functions, and between the subconscious and conscious cognitive processes.

The LRMB model of the brain establishes a reference model for implementing AC systems, providing insightful and multidisciplinary theories for their design and implementation. According to LRMB, the brain, as well as an AC system, can be formally treated as a real-time information processing system at the functional level, as described below.
Figure 3. The layered reference model of the brain (LRMB)
[Layer 6: higher cognitive functions and Layer 5: metacognitive functions (conscious cognitive processes); Layer 4: action, Layer 3: perception, Layer 2: memory, and Layer 1: sensation (subconscious cognitive processes)]
Definition 13. The cognitive system model of the brain or an AC system, NI-Sys, can be described as a real-time natural intelligent system with an inherited operating system (the thinking engine), NI-OS, and a set of acquired life applications, NI-App, i.e.:

NI-Sys ≜ NI-OS || NI-App
(12)
where NI-OS represents the inherited life functions, NI-App the developed life functions, and || a parallel relation. Corresponding to Figure 3, the Level 1 through Level 4 life functions belong to NI-OS, while the Level 5 and Level 6 life functions are fundamental parts of NI-App. All everyday behaviors can be perceived as a real-time combination and invocation of these fundamental life functions at various levels of LRMB.

The characteristics of NI-OS have been observed as follows:

• Inherited
• Wired (by neural networks)
• Working subconsciously
• A real-time system
• Running subconsciously and automatically
• Person-independent, common and similar
• Highly parallel and fault-tolerant
• With event-/time-/interrupt-/goal-/inference-driven mechanisms

In contrast to NI-OS, the characteristics of NI-App have been identified as follows:

• Acquired
• Partially wired (frequently used functions) and partially programmed (temporary functions)
• Working consciously
• Can be trained and programmed
• Person-specific
Figure 4. The functional model of the brain (Wang and Wang, 2006)
[The internal world: the thinking engine NI-Sys (NI-OS || NI-App) with the long-term memory (LTM) and short-term memory (STM). The external world: input sensors (vision, audition, touch, smell, taste) feeding the sensory buffer memory (SBM), and output actions (looking, speaking, writing, etc.) buffered in the action buffer memory (ABM).]
The goal-driven and inference-driven mechanisms are unique features of NI and AC systems, which are autonomously determined by internal events or conditions such as emotions, desires, and rational reasoning. The LRMB model for AC can be extended to a functional model, as illustrated in Figure 4, with the NI-Sys (NI-OS || NI-App), LTM, STM, SBM (connected with a set of sensors), and ABM (connected with a set of servos). In Figure 4, the kernel of the brain is the natural intelligence system (NI-Sys), which is the thinking engine of the brain as described in the LRMB model. It is noteworthy that each abstract object is physically stored in a different type of memory, as given below.

Theorem 3. The Cognitive Models of Memory (CMM) states that the architecture of human memory is configured in parallel by the Sensory Buffer Memory (SBM), Short-Term Memory (STM), Long-Term Memory (LTM), and Action Buffer Memory (ABM), i.e.:

CMM ≜ SBM || STM || LTM || ABM
(13)
The CMM model presents the neural informatics foundation of natural intelligence and the physiological evidence for why natural intelligence may be classified into four forms, as given in Theorem 4 in the next section. With the CMM model, an AC system can be implemented by mimicking the following abstract brain model.

Definition 14. The functional model of the brain, BRAIN, as a real-time system and a high-level logical model of the brain, describes the functional configuration of the brain and how the NI-Sys interacts with the memory system, i.e.:

BRAIN ≜ NI-Sys || CMM
      = (NI-OS || NI-App) || (LTM || STM || SBM || ABM)
(14)
Equation 14 indicates that, although the thinking engine NI-Sys is considered the center of natural intelligence, the memories are essential to enable the NI-Sys to function properly and to keep temporary and stable results stored and retrievable. The corresponding biological organ of NI-OS in neurophysiology is the thalamus, a switching center located above the midbrain, which possesses a tremendous number of connections to almost all parts of the brain, especially the cerebral cortex, the eyes, and the visual cortex (Sternberg, 1998). NI-Sys interacts with LTM and STM bi-directionally, which forms the basic functionality of the brain as a thinking machine. STM provides working space for the NI-Sys, while LTM stores both accumulated information (knowledge) and wired, usually subconscious, procedures (skills). The NI-Sys communicates with the external world through inputs and outputs. The former are sensory information, including vision, audition, touch, smell, and taste. The latter are the actions and behaviors of life functions, such as looking, speaking, writing, and driving. The actions and behaviors generated in the brain, whether from NI-OS or NI-App, are buffered in the ABM before they are executed and output to implement the predetermined actions and behaviors. Therefore, ABM plays an important role in the brain in planning and implementing
human behaviors; however, it was overlooked in the literature of neuropsychology and cognitive science (Pinel, 1997; Sternberg, 1998; Gabrieli, 1998). It is noteworthy that, unlike a computer, the brain works in two modes: the internal, willingness-driven processes (in NI-OS), and the external, event- and time-driven processes (in NI-App). External information and events are the major sources that drive the brain, particularly its NI-App functions. In this sense, the brain may be perceived as a passive system, at least when it is conscious, controlled and driven by external information. Even internal willingness, such as goals, desires, and emotions, may be considered derived information based on originally external information. A fundamental question in cognitive psychology is how consciousness can be the product of physiological processes in the brain. Similarly, the fundamental question for AC is how autonomic behaviors may be generated by non-imperative processes on generic computers. The cognitive models developed in this section reveal that, just as IC is controlled by stored programs, AC should be controlled by predefined and learned cognitive processes as identified in LRMB.
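The parallel structure of Theorem 3 and Definition 14 can be sketched in a few lines of Python. The names below are hypothetical, and the || relation is approximated by independent component objects rather than true concurrency:

    class Memory:
        """One of the four parallel memories of the CMM model (Theorem 3)."""
        def __init__(self, name: str) -> None:
            self.name, self.store = name, []

        def write(self, item) -> None:
            self.store.append(item)

        def read_last(self):
            return self.store[-1] if self.store else None

    class Brain:
        """BRAIN = (NI-OS || NI-App) || (LTM || STM || SBM || ABM), Definition 14."""
        def __init__(self) -> None:
            self.sbm, self.stm = Memory("SBM"), Memory("STM")
            self.ltm, self.abm = Memory("LTM"), Memory("ABM")

        def ni_os_step(self, stimulus) -> None:
            # Inherited, subconscious path: sense -> sensory buffer -> reflex
            self.sbm.write(stimulus)
            self.abm.write(("reflex", stimulus))

        def ni_app_step(self) -> None:
            # Acquired, conscious path: STM as working space backed by LTM
            percept = self.sbm.read_last()
            self.stm.write(percept)
            self.ltm.write(("knowledge", percept))
            self.abm.write(("deliberate", percept))

    b = Brain()
    b.ni_os_step("loud noise")
    b.ni_app_step()
    print(b.abm.store)   # buffered actions awaiting execution via the ABM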
DENOTATIONAL MATHEMATICS FOUNDATIONS OF AUTONOMIC COMPUTING

Just as IC is based on the mathematical foundation of Boolean algebra, the more intelligent capabilities of AC should be treated by more powerful mathematical structures known as denotational mathematics, in the forms of system algebra (Wang, 2006c), concept algebra (Wang, 2006d), and real-time process algebra (RTPA) (Wang, 2002b).

The history of science and engineering shows that new problems require new forms of mathematics. AC and intelligence science are new disciplines, and their problems require new mathematical means that are descriptive and explicit in expressing and denoting human and system behaviors in a non-imperative approach. Conventional analytic mathematics is unable to solve the fundamental problems inherent in AC and related disciplines. Therefore, denotational mathematical structures and means (Wang, 2006b) beyond set theory and mathematical logic are yet to be sought.

Although there are various ways to express facts, objects, notions, relations, actions, and behaviors in natural languages, it is found in cognitive informatics that human and AC system behaviors may be classified into three basic categories known as to be, to have, and to do. All mathematical means and forms, in general, are abstract and formal descriptions of these three categories of expressibility and their rules. In this view, mathematical logic may be perceived as the abstract means for describing 'to be,' set theory for describing 'to have,' and algebras, particularly process algebra, for describing 'to do.' Three forms of new mathematics, known as concept algebra, system algebra, and RTPA, have been created in cognitive informatics to enable rigorous treatment of knowledge representation and manipulation in a formal and coherent framework. These three structures of contemporary mathematics extend the abstract objects under study in mathematics from the basic mathematical entities of numbers and sets to higher levels, i.e., concepts, systems, and behavioral processes, as shown in Table 4.

Table 4 indicates that, in general, the utility of mathematics lies in the means and rules to express thought rigorously and generically at a higher level of abstraction. It is recognized that intelligent inference capability is based on the cognitive process of abstraction. Abstraction is not only a powerful means of philosophy and mathematics, but also a preeminent trait of the human brain identified in cognitive informatics studies (Wang, 2005). All formal logical inferences and reasoning can only be carried out on the basis of abstract properties shared by a given set of objects under study.

Table 4. Conventional and contemporary mathematical means for denotational problems
  Basic denotational problems in AC | Classic mathematics | Contemporary mathematics
  ----------------------------------+---------------------+--------------------------
  To be                             | Logic               | Concept algebra
  To have                           | Set theory          | System algebra
  To do                             | Functions           | RTPA
Table 5. Definitions of formal inference processes (⊢ denotes the inference relation)

1. Deduction
   Primitive form: ∀x ∈ X, p(x) ⊢ ∃a ∈ X, p(a)
   Composite form: (∀x ∈ X, p(x) ⇒ q(x)) ⊢ (∃a ∈ X, p(a) ⇒ q(a))
   Usage: To derive a conclusion based on known and generic premises.

2. Induction
   Primitive form: ((∃a ∈ X, P(a)) ∧ (∃k, k+1 ∈ X, P(k) ⇒ P(k+1))) ⊢ ∀x ∈ X, P(x)
   Composite form: ((∃a ∈ X, p(a) ⇒ q(a)) ∧ (∃k, k+1 ∈ X, (p(k) ⇒ q(k)) ⇒ (p(k+1) ⇒ q(k+1)))) ⊢ (∀x ∈ X, p(x) ⇒ q(x))
   Usage: To determine the generic behavior of a given list or sequence of recurring patterns from three samples.

3. Abduction
   Primitive form: (∀x ∈ X, p(x) ⇒ q(x)) ⊢ (∃a ∈ X, q(a) ⇒ p(a))
   Composite form: (∀x ∈ X, (p(x) ⇒ q(x)) ∧ (r(x) ⇒ q(x))) ⊢ (∃a ∈ X, q(a) ⇒ (p(a) ∨ r(a)))
   Usage: To seek the most likely cause(s) and reason(s) of an observed phenomenon.

4. Analogy
   Primitive form: ∃a ∈ X, p(a) ⊢ ∃b ∈ X, p(b)
   Composite form: (∃a ∈ X, p(a) ⇒ q(a)) ⊢ (∃b ∈ X, p(b) ⇒ q(b))
   Usage: To predict a similar phenomenon or consequence based on a known observation.
Definition 15. Abstraction is a process to elicit a subset of objects that share a common property from a given set of objects, and to use the property to identify and distinguish the subset from the whole in order to facilitate reasoning, i.e.:

∀S, p ⇒ ∃e ∈ E ⊆ S, p(e)
(15)
Abstraction is a gifted capability of human beings, identified as a basic cognitive process of the brain at the metacognitive layer according to LRMB (Wang et al., 2006). Only by abstraction can important theorems and laws about the objects under study be elicited and discovered from a great variety of phenomena and empirical observations in an area of knowledge inquiry.

On the basis of abstraction, formal inferences may be classified into the deductive, inductive, abductive, and analogical categories, as shown in Table 5 (Wang, 2005). Deduction is a cognitive process by which a specific conclusion necessarily follows from a set of general premises. Induction is a cognitive process by which a general conclusion is drawn from a set of specific premises, based on three designated samples in reasoning or on experimental evidence. Abduction is a cognitive process by which an inference is made to the best explanation or most likely reason for an observation or event. Analogy is a cognitive process by which it is inferred that the same relations hold between different domains or systems, and/or that if two things agree in certain respects then they probably agree in others. Detailed descriptions of the formal cognitive inference processes for AC listed in Table 5 may be found in (Wang, 2005); they can be used to simulate machine cognition and to implement inference engines for AC systems on the basis of denotational mathematics.
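Over a finite domain X, the four inference forms of Table 5 can be checked or simulated mechanically. The Python sketch below is illustrative only: it assumes small enumerable domains, and the abductive and analogical functions merely propose hypotheses, since those two inferences are not logically sound:

    def deduce(X, p, a):
        """Deduction: from the generic premise (p holds on all of X),
        conclude p(a) for a specific member a of X."""
        return all(p(x) for x in X) and p(a)

    def induce(X, p):
        """Induction over integers in X: a base case plus a successor step
        suggest p holds on all of X (verified here only by exhaustion)."""
        base = p(min(X))
        step = all((not p(k)) or p(k + 1) for k in X if k + 1 in X)
        return base and step and all(p(x) for x in X)

    def abduce(X, p, q, b):
        """Abduction: given the rule (p => q) on X and an observation q(b),
        propose p(b) as the most likely cause; not a sound inference."""
        rule_holds = all((not p(x)) or q(x) for x in X)
        return rule_holds and q(b)      # then p(b) is hypothesized

    def analogize(p, a):
        """Analogy: a property observed at a is conjectured for a similar b."""
        return p(a)                     # then p(b) is conjectured

    X = set(range(1, 6))
    print(deduce(X, lambda x: x > 0, 3))                             # True
    print(induce(X, lambda x: x >= 1))                               # True
    print(abduce(X, lambda x: x % 4 == 0, lambda x: x % 2 == 0, 4))  # True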
INTELLIGENT SCIENCE FOUNDATIONS OF AUTONOMIC COMPUTING

Intelligence is perceived as the driving force, or the ability, to acquire and use knowledge and skills, or to reason in problem solving. It was conventionally believed that only human beings possess higher-level intelligence. However, the development of computers, robots, and autonomic systems indicates that intelligence may also be created or implemented by machines and man-made systems. This is the intelligent behavioral foundation for designing and implementing AC systems.

Definition 16. Intelligence, in the narrow sense, is a human or system ability that transforms information into behaviors; in the broad sense, it is any human or system ability that autonomously transfers the forms of abstract information between data, information, knowledge, and behaviors in the brain.
With the clarification of the intension and extension of the generic concept of intelligence, the terms natural and artificial intelligence can be derived as follows.

Definition 17. Natural intelligence (NI) is the intelligent capability possessed or implemented by the brains of human beings and other advanced species.

Definition 18. Artificial intelligence (AI) is the intelligent capability possessed or implemented by machines or man-made systems.

Intelligence can be formally modeled as a set of functions that transfer a pair of abstract objects in the brain or system, as given in Definitions 17 and 18.

Theorem 4. The nature of intelligence states that intelligence I can be classified into four forms, called the perceptive intelligence Ip, cognitive intelligence Ic, instructive intelligence Ii, and reflective intelligence Ir, as modeled below:

I ≜    Ip: D → I    (Perceptive)
    || Ic: I → K    (Cognitive)
    || Ii: I → B    (Instructive)
    || Ir: D → B    (Reflective)                                    (16)
According to Definition 16 and Theorem 4, the narrow sense of intelligence corresponds to the instructive and reflective intelligence, while the broad sense includes all four forms, that is, the perceptive, cognitive, instructive, and reflective intelligence. On the basis of the conceptual models developed in this subsection, the mechanisms of NI can be described by a generic intelligence model as given in Definition 19 and Figure 5.

Definition 19. The Generic Intelligence Model (GIM) describes the mechanisms of NI, as shown in Figure 5, according to Theorem 4 on the nature of intelligence.

In Figure 5, the four forms of natural intelligence are described as the driving forces that transfer between pairs of abstract objects in the brain or an AC system, namely data (D), information (I), knowledge (K), and behavior (B). The GIM model and Theorem 4 reveal that NI and AI share the same cognitive informatics foundation; in other words, they are compatible. Therefore, on the basis of Theorem 4, the studies on NI and AI in cognitive informatics, and on AC in particular, may be unified into a common framework. The intelligent behavioral foundations of AC given in the GIM model provide a new paradigm of AC, revealing that an AC system may implement not only the reflective and instructive intelligence, but also the cognitive and perceptive intelligence.
Figure 5. The generic intelligence model (GIM)
[Stimuli and enquiries arrive as data (D) in the SBM; the perceptive intelligence Ip transfers D into information (I) in the STM; the cognitive intelligence Ic transfers I into knowledge (K) in the LTM; the instructive intelligence Ii transfers I into behaviors (B) buffered in the ABM; and the reflective intelligence Ir transfers D directly into B.]
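Read as transfer functions, the GIM reduces to four typed mappings between D, I, K, and B. A toy Python sketch follows; the representations of data, information, and knowledge are invented placeholders:

    # Four intelligence forms of the GIM as transfer functions between
    # data (D), information (I), knowledge (K), and behavior (B).

    def Ip(data):                 # Perceptive intelligence: D -> I
        return {"percept": data}

    def Ic(info):                 # Cognitive intelligence: I -> K
        return {"concept": info["percept"], "relations": []}

    def Ii(info):                 # Instructive intelligence: I -> B
        return ["act on " + str(info["percept"])]

    def Ir(data):                 # Reflective intelligence: D -> B (reflex-like)
        return ["reflex to " + str(data)]

    stimulus = "bright light"     # arrives as data via the SBM
    info = Ip(stimulus)           # broad-sense intelligence chains the forms
    knowledge = Ic(info)
    print(Ii(info), Ir(stimulus)) # narrow sense: the two behavior-producing forms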
Conclusion

This chapter has presented a new perspective on autonomic computing as a novel computing system with the highest level of machine intelligence, one that embodies goal- and inference-driven computational behaviors on top of imperative computing techniques with event-, time-, and interrupt-driven computational behaviors. The chapter has explored the theoretical foundations and engineering paradigms of AC. A comprehensive set of theoretical foundations for AC, covering behaviorism, cognitive informatics, denotational mathematics, and intelligent science, has been identified. The findings of this work, particularly the theorems on the necessary and sufficient conditions of imperative and autonomic computing and the generic intelligence model of natural and machine intelligence, form a solid foundation for understanding and developing advanced autonomic computing techniques and their engineering applications.
Acknowledgment

The author would like to acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) for this work, and to thank the reviewers and colleagues for their valuable comments and suggestions.
References

Brooks, R.A. (1970). New approaches to robotics. American Elsevier, 5, 3-23. New York.

Gabrieli, J.D.E. (1998). Cognitive neuroscience of human memory. Annual Review of Psychology, 49, 87-115.

Giarratano, J., & Riley, G. (1989). Expert systems: Principles and programming. Boston: PWS-KENT Publishing Co.

IBM (2001). IBM autonomic computing manifesto. http://www.research.ibm.com/autonomic/

IBM (2006, June). Autonomic computing white paper: An architectural blueprint for autonomic computing (4th ed.) (pp. 1-37).

Jennings, N.R. (2000). On agent-based software engineering. Artificial Intelligence, 117(2), 277-296.

Kephart, J., & Chess, D. (2003, January). The vision of autonomic computing. IEEE Computer, 36(1), 41-50.

Kinsner, W. (2007). Towards cognitive machines: Multiscale measures and analysis. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(1), 28-38.

Kleene, S.C. (1956). Representation of events by nerve nets. In C.E. Shannon & J. McCarthy (Eds.), Automata Studies (pp. 3-42). Princeton, NJ: Princeton University Press.

McCulloch, W.S., & Pitts, W.H. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5.

McCulloch, W.S. (1965). Embodiments of mind. Cambridge, MA: MIT Press.

McCulloch, W.S. (1993). The complete works of Warren S. McCulloch. Salinas, CA: Intersystems Publishing.

Meystel, A.M., & Albus, J.S. (2002). Intelligent systems: Architecture, design, and control. New York, NY: John Wiley & Sons.

Murch, R. (2004). Autonomic computing. London: Pearson Education.
Pescovitz, D. (2002). Autonomic computing: Helping computers help themselves. IEEE Spectrum, 39(9), 49-53.

Pinel, J.P.J. (1997). Biopsychology (3rd ed.). Needham Heights, MA: Allyn and Bacon.

Rabin, M.O., & Scott, D. (1959). Finite automata and their decision problems. IBM Journal of Research and Development, 3, 114-125.

Shannon, C.E. (Ed.) (1956). Automata studies. Princeton, NJ: Princeton University Press.

Sternberg, R.J. (1998). In search of the human mind (2nd ed.). Orlando, FL: Harcourt Brace & Co.

Turing, A.M. (1950). Computing machinery and intelligence. Mind, 59, 433-460.

von Neumann, J. (1946). The principles of large-scale computing machines. Reprinted in Annals of the History of Computing, 3(3), 263-273.

von Neumann, J. (1958). The computer and the brain. New Haven, CT: Yale University Press.

von Neumann, J. (1963). General and logical theory of automata. In A.H. Taub (Ed.), Collected Works, 5 (pp. 288-328). Pergamon.

von Neumann, J., & Burks, A.W. (1966). Theory of self-reproducing automata. Urbana, IL: University of Illinois Press.

Wang, Y. (Ed.) (2007a). Special issue on autonomic computing. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(3).

Wang, Y. (2007b). The theoretical framework of cognitive informatics. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(1), 1-27. Hershey, PA: IGI.

Wang, Y. (2007c). Exploring machine cognition mechanisms for autonomic computing. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(2), i-v.

Wang, Y. (2007d). Software engineering foundations: A software science perspective. CRC Book Series in Software Engineering, 2-3. Boca Raton, FL: CRC Press.

Wang, Y. (2006a, July). Cognitive informatics: Towards future-generation computers that think and feel. Keynote speech, Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 3-7). Beijing, China: IEEE CS Press.

Wang, Y. (2006b, July). Cognitive informatics and contemporary mathematics for knowledge representation and manipulation. Invited plenary talk, Proceedings of the 1st International Conference on Rough Sets and Knowledge Technology (RSKT'06), Lecture Notes in Artificial Intelligence, LNAI 4062 (pp. 69-78). Chongqing, China: Springer.

Wang, Y. (2006c, July). On abstract systems and system algebra. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 332-343). Beijing, China: IEEE CS Press.

Wang, Y. (2006d, July). On concept algebra and knowledge representation. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 320-331). Beijing, China: IEEE CS Press.

Wang, Y. (2005, August). The cognitive processes of abstraction and formal inferences. Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 18-26). Irvine, CA: IEEE CS Press.

Wang, Y. (2004, August). On autonomic computing and cognitive processes. Keynote speech, Proceedings of the 3rd IEEE International Conference on Cognitive Informatics (ICCI'04) (pp. 3-4). Victoria, Canada: IEEE CS Press.

Wang, Y. (2003a). Cognitive informatics: A new transdisciplinary research field. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 115-127.
Wang, Y. (2003b, August). Cognitive informatics models of software agent systems and autonomic computing. Keynote speech, Proceedings of the 1st International Conference on Agent-Based Technologies and Systems (ATS'03) (p. 25). Calgary, Canada: University of Calgary Press.

Wang, Y. (2002a). On cognitive informatics. Keynote speech, Proceedings of the 1st IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 34-42). Calgary, Canada: IEEE CS Press.

Wang, Y. (2002b). The real-time process algebra (RTPA). Annals of Software Engineering: An International Journal, 14, 235-274.

Wang, Y., & Wang, Y. (2006, March). On cognitive informatics models of the brain. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 16-20.

Wang, Y., & Kinsner, W. (2006, March). Recent advances in cognitive informatics. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 121-123.

Wang, Y., Wang, Y., Patel, S., & Patel, D. (2006, March). A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 124-133.

Wiener, N. (1948). Cybernetics or control and communication in the animal and the machine. Cambridge, MA: MIT Press.

Widrow, B., & Lehr, M.A. (1990, September). 30 years of adaptive neural networks: Perceptron, Madaline, and backpropagation. Proceedings of the IEEE, 78(9), 1415-1442.
Endnote

1. This chapter is an extension based on the author's keynote lectures at the 3rd IEEE International Conference on Cognitive Informatics (ICCI'04), "On Autonomic Computing and Cognitive Processes" (Wang, 2004), and at the 1st International Conference on Agent-Based Technologies and Systems (ATS'03), "Cognitive Informatics Models of Software Agents and Autonomic Computing" (Wang, 2003b).
Chapter XIII
Towards Cognitive Machines: Multiscale Measures and Analysis
Witold Kinsner, University of Manitoba, Canada
Abstract

Numerous attempts are being made to develop machines that could act not only autonomously, but also in an increasingly intelligent and cognitive manner. Such cognitive machines ought to be aware of their environments, which include not only other machines but also human beings. Such machines ought to understand the meaning of information in more human-like ways, by grounding knowledge in the physical world and in the machines' own goals. The motivation for developing such machines ranges from self-evident practical reasons, such as the expense of computer maintenance, to wearable computing in health care, and to gaining a better understanding of the cognitive capabilities of the human brain. Achieving such an ambitious goal requires solutions to many problems, ranging from human perception, attention, concept creation, cognition, and consciousness to executive processes guided by emotions and values, and symbiotic conversational human-machine interactions. An important component of this cognitive machine research is multiscale measures and analysis. This chapter presents definitions of cognitive machines, representations of processes, as well as their measurements, measures, and analysis. It provides examples from current research, including cognitive radio, cognitive radar, and cognitive monitors.
INTRODUCTION

Computer science and computer engineering have contributed to many shifts in technological and computing paradigms. For example, we have seen shifts (i) from large batch computers to personal and embedded real-time computers, (ii) from control-driven microprocessors to data- and demand-driven processors, (iii) from uniprocessors to multiple processors (loosely coupled) and multiprocessors (tightly coupled), (iv) from data-path processors to structural processors (e.g., neural networks (Bishop, 1995)), quantum processors (Nielsen and Chuang, 2000), and biomolecular processors (Sienko et al., 2003), (v) from silicon-based processors to biochips (Ruaro et al., 2005), (vi) from vacuum tubes to transistors to microelectronics to nanotechnology, (vii) from large passive
sensors to very small smart active sensors (Soloman, 1999), (viii) from local computing to distributed computing and network-wide computing, (ix) from traditional videoconferencing to telepresence (e.g., WearTel and EyeTap (Mann, 2002)), (x) from machines that require attention (like a palmtop or a wristwatch computer) to those that have a constant online connectivity that drops below the conscious level of awareness of users (like autonomic computers (Ganek and Corbi, 2003), (IBM, 2006), and eyeglass-based systems (Mann, 2002), (Haykin and Kosko, 2001)), (xi) from crisp-logic-based computers to fuzzy or neurofuzzy computers (Pedrycz and Gomide, 1998), as well as (xii) from control-driven (imperative) systems to cognitive systems such as cognitive radio (Haykin, 2005a), cognitive radar (Haykin, 2006), active audition (Haykin and Chen, 2005), and cognitive robots. These remarkable shifts have been necessitated by system complexity, which now exceeds our ability to maintain such systems (Ganek and Corbi, 2003), while being facilitated by new developments in technology, intelligent signal processing, and machine learning (Haykin et al., 2006).

Since the 1950s, philosophers, mathematicians, physicists, cognitive scientists, neuroscientists, computer scientists, and computer engineers have debated the question of what could constitute digital sentience (i.e., the ability to feel or perceive in the absence of thought and inner speech), as well as machine consciousness or artificial consciousness (e.g., (von Neumann, 1958), (Searle, 1980), (Minsky, 1986), (Rumelhart and McClelland, 1986), (Cotterill, 1988), (Posner, 1989), (Klivington, 1989), (Penrose, 1989), (Kurzweil, 1990), (Dennett, 1991), (Searle, 1992), (Penrose, 1994)). Consequently, many approaches to modelling consciousness have been developed, including biological, neurological, and engineering (practical) ones.

The approach to cognition taken in this chapter is mostly an engineering one, in which the behaviour of a system can be observed, measured, characterized, modelled, and implemented as an artefact, such as a cognitive robot (either isolated or societal) to improve its interaction with people, or a cognitive radio to improve the utilization of a precious resource, the frequency spectrum. In general, the intent of such cognitive systems is to improve their performance, to reduce waste in resource utilization, and to provide a test-bed for learning about cognition and cognitive processes. If an approach is purely reductionist, it may not be capable of describing the complexities of cognition. Since our engineering approach considers not only the individual components of a system but also their interactions, it may be capable of describing the dynamics of cognitive processes. Although engineering approaches have serious limitations (e.g., (Parsell, 2005), (Chalmers, 1997)), they are intended to produce a range of specific practical outcomes.

The next section provides a few definitions and models of consciousness, and serves as a preamble for several applications. The chapter also identifies a few problems with the current status of cognitive machines.
WHAT IS COGNITION?

According to the Oxford Dictionary, cognition is "knowing, perceiving, or conceiving as an act." The Encyclopedia of Computer Science (Ralston et al., 2003) provides a computational point of view of cognition consisting of the following three attributes: (i) cognition can be described by mental states and processes (MSPs) intervening between input stimuli and output responses, (ii) the MSPs can be described by algorithms, and (iii) the MSPs should lend themselves to scientific investigation. Another view of cognition is suggested by Pfeifer and Scheier (Pfeifer & Scheier, 1999, pp. 5-6) as an interdisciplinary study of the general principles of intelligence through a synthetic methodology termed learning by understanding. Cognition also includes language and communications, as studied in different forms (e.g., (Fischler and Firschein, 1987, p. 81), (Roy and Pentland, 2002), (Roy, 2005)). Although the language and communications of humans and machines differ significantly, their roles are similar. More specifically, Haikonen (Haikonen, 2003) defines cognition as the association of auxiliary meanings with percepts, the use of percepts as symbols, the manipulation of these symbols, reasoning, response generation, and language.
COGNITIVE MACHINE MODELS

Haikonen's Model

Pentti Haikonen of the Cognitive Technology group at Nokia in Helsinki has advanced the concept of cognitive machines in the context of the mind-body problem. He sees emulation of the cognitive processes of the brain through a cognitive architecture (consisting of the flow of perception, inner speech, inner imagery, and emotions) rather than the classical architecture involving rule-based artificial intelligence and neural networks; his architecture does not involve rule-based computing. To explain the scope of the architecture, Haikonen gives a good illustrative example of five classes of behavioural machines (mobile robots), which have external environmental sensors to monitor the environment and internal self-sensors to monitor the internal states of the machine itself.

1. Simple-Reflex Machine. In response to an obstacle, the robot backs off and turns a little so that it can avoid the obstacle. This behaviour is driven by recognition (i.e., sensing, characterization, and classification) alone, and its operation requires no memory. Although the robot is "aware" of the obstacle (by detecting it) and responds to it, this behaviour is too primitive to be called a conscious act.

2. Simple-Reflex Machine with Smart Memory. If memory and some processing capabilities are added, the robot not only can sense the obstacles, but can also record their locations and the responses to each obstacle, and can then compute the "best" path by deleting any loops in the path, using known optimization algorithms. Thus, its "awareness" has been expanded from the encounters with the obstacles to the layout of the maze traversed. However, this behaviour is still too simple to be called consciousness.

3. Machine with Meaningful Perception and Associative Memory. If an associative memory (Kohonen, 1987; Hinton & Anderson, 1981) is added to the previous robot, the robot can not only sense, but can also perceive its environment, and can learn from the experiences encountered so far. Furthermore, if a cost function is established within the robot (i.e., a function capable of discriminating between a "pain" for dangerous obstacles and a "pleasure" for useful objects), then a "seen" obstacle can evoke "images" of its past encounters, and a response can be produced even before the actual contact with the object. This perception goes beyond recognition in that it is an active process of searching for and finding threats and opportunities through evoked "imagery" of the object and associated actions. In a way, such a machine exhibits an attention process, seeking "satisfaction," as controlled by a good-bad discrimination criterion and its own needs.

4. Machine with Associative Memory and Reporting. If representational symbols (words) and a corresponding symbol system in the form of a language (e.g., (Chomsky, 2006), (Dawkins, 1990)) are established within the robot, then the robot not only can exhibit the behaviour just described, but can also report (declare) on its perceptions and actions during and after the perceived events, as well as on its inner self-sensed percepts. The robot can also associate meaning with the declarations of its peers.

5. Machine with Self-Awareness and a "Mind." The previous two classes of behaviour were based on an "open-loop" control process. If feedback is added to the flow of percepts, then the robot could perceive its own declarations and understand them silently, without the need to act upon them. This would allow the robot to distinguish between the percepts caused by the external environment and those caused internally. The "inner images" could lead to "imagining." If another good-bad criterion could be developed and applied to the "inner images," then the robot could develop a "mental content" or a form of a "mind."
In summary, the Haikonen cognitive machine architecture involves: (i) sensory preprocessing circuits to derive representations of the sensed external events, (ii) introspective feedback loops that discriminate between the external representations and broadcast the outcomes to other loops, (iii) associative cross-coupling of those loops, and (iv) attention control through adaptive thresholding. Such a machine requires: (i) distributed signal representations, (ii) associative processing and learning, (iii) perception processes, (iv) sensory attention and inner attention, (v) the flow of inner "speech" and "imagery," and (vi) evaluation of the significance (relevance) of reasoning, motivation, and action. Thus, consciousness might emerge spontaneously in such a complex system. A demonstration of a low-complexity system exhibited emotions, but not consciousness (Haikonen, 2004).
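The first two of Haikonen's five classes are simple enough to sketch directly; the difference between them is precisely the added memory of visited locations and the loop-deleting optimization over it. A hypothetical grid-world sketch in Python (not Haikonen's architecture):

    class ReflexRobot:
        """Class 1: backs off and turns on contact; keeps no memory of it."""
        def react(self, obstacle_ahead: bool) -> str:
            return "back_off_and_turn" if obstacle_ahead else "forward"

    class SmartMemoryRobot(ReflexRobot):
        """Class 2: additionally records visited cells and deletes loops."""
        def __init__(self) -> None:
            self.path = []                    # visited cells, in order

        def visit(self, cell) -> None:
            if cell in self.path:             # a loop: cut back to the first visit
                self.path = self.path[: self.path.index(cell)]
            self.path.append(cell)

    r = SmartMemoryRobot()
    for cell in [(0, 0), (0, 1), (1, 1), (0, 1), (0, 2)]:
        r.visit(cell)
    print(r.path)   # [(0, 0), (0, 1), (0, 2)]: the loop through (1, 1) is gone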
Similar emergent consciousness in autonomous agents is described by Freeman (Freeman, 2001) and Cotterill (Cotterill, 2003). Both approaches are consistent with Grossberg's view of the mind and brain, in which new states are synthesized continuously to form a better adaptive relationship with the environment (Grossberg, 1982), with Grossberg's adaptive resonance theory (ART) (Grossberg, 1988), and with the dynamic systems approach to consciousness and action (Thelen & Smith, 2002). A major criticism of Haikonen's model is that there is no systematic approach to measuring the basic system reactions or emotions, or the perception of time (Parsell, 2005). This chapter addresses a part of this problem.
Franklin's Model

Functional consciousness of an autonomous agent (in the form of an intelligent distribution agent, IDA) has been described by Stan Franklin of the Conscious Software Research Group at the University of Memphis (Franklin, 1995; Franklin, 2003). An IDA can exhibit several functions of consciousness as elicited by Bernard Baars under his global workspace theory (Baars, 1988). An implementation of the concept includes a system that communicates with US Navy sailors in natural language.
Aleksander's WISARD and MAGNUS

Igor Aleksander of the Neural Systems Engineering group at the Department of Electrical and Electronic Engineering at the Imperial College of Science, Technology and Medicine in London has an extensive background in neurocomputing, including parsimonious neural networks (Aleksander, 1989). He has been working on a neural pattern recognition system called WISARD (Aleksander, 1998), a neural representation modeller called MAGNUS (Aleksander, 1998; Aleksander, 2003), and an artificial neural consciousness system (Aleksander, 2003; Aleksander, 2006). His model adds the ability to predict (anticipate) foreseeable events; Aleksander considers prediction the key feature of consciousness.
Taylor's CODAM Model

John Taylor of the Department of Mathematics at King's College in London has developed many models of perception and consciousness, with a focus on attention as the prerequisite for consciousness. His thesis is that the application of engineering control theory to attention could show how consciousness is created. This view is motivated by his extensive knowledge of experimental data on visual illusions, memory, attention, and motor control obtained through a number of imaging techniques. One such technique is magnetoencephalography (MEG), capable of detecting the very small (femtotesla) magnetic fields generated by the brain. MEG provides fine temporal resolution (milliseconds) but poor spatial resolution (millimetres). The problem of spatial resolution can be addressed by functional magnetic resonance imaging (fMRI) and positron emission tomography (PET), whose temporal resolution is poor (tens of seconds) but whose spatial resolution is very good. His studies identified the parietal lobe as an important brain region in the creation of biological consciousness. This has led to the development of the corollary discharge of attention movement (CODAM) model (Taylor, 2001; Taylor, 2002; Taylor, 2003), which has been used to study mental diseases such as schizophrenia and autism.

Over 15 other models amenable to implementation have been described at a Workshop on Models of Consciousness (Sanz et al., 2003). A brief review of the state of cognitive systems engineering is presented in (Hoffman et al., 2002).
EXAMPLES OF COGNITIVE SYSTEMS

The models described in the previous sections are designed to provide insight into the development of cognition and consciousness. Practical applications may not require the complexities involved. To this end, Simon Haykin of the Adaptive Systems Laboratory at McMaster University in Hamilton has defined (Haykin, 2005b) a cognitive
machine as an intelligent system that: (i) is aware of its surrounding environment (i.e., outside world), (ii) uses understanding-by-building to learn through interactions with the environment and other means, and (iii) adapts its internal states to statistical variations in input stimuli by making corresponding changes in adjustable system parameters in real-time with specific objectives (e.g., reliability, efficiency, active sensing) as determined by the application of interest. The essential attributes of such a machine include: awareness, intelligence, learning, adaptivity with temporal and structural reconfigurability, action, and real-time operation. This characterization of cognitive machines is mostly behavioural, and leads to systems realizable with today’s technologies.
Cognitive Radio

The electromagnetic radio spectrum is a precious natural resource that should be utilized efficiently. Some frequency bands are used heavily, while others are underutilized, and the utilization of the bands also changes with location and time. By being aware of the actual utilization of the bands, and by dynamic spectrum management (i.e., through frequency allocation and adjustment of the transmitted power), this resource could be used more efficiently. Cognitive radio based on software-defined radio (Dillinger et al., 2003) can achieve this goal. A comprehensive description of such a cognitive radio system is provided by Haykin (Haykin, 2005a).
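As a toy illustration of dynamic spectrum management (not Haykin's algorithm; the bands, loads, and threshold below are invented), a cognitive radio senses per-band occupancy and then transmits in the least-occupied band if it is sufficiently free:

    import random

    def sense_occupancy(true_load, samples=200):
        """Estimate each band's utilization by (simulated) energy detection."""
        return {band: sum(random.random() < load for _ in range(samples)) / samples
                for band, load in true_load.items()}

    def pick_channel(occupancy, threshold=0.5):
        """Choose the least-occupied band, if any falls below the threshold."""
        band, occ = min(occupancy.items(), key=lambda kv: kv[1])
        return band if occ < threshold else None

    true_load = {"2.4 GHz": 0.80, "5.2 GHz": 0.35, "5.8 GHz": 0.10}  # invented
    occupancy = sense_occupancy(true_load)
    print("transmit on:", pick_channel(occupancy))  # almost always 5.8 GHz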
Cognitive Radar

Similarly to cognitive radio, cognitive radar is aware of its environment, utilizes intelligent signal processing, provides feedback from the receiver to the transmitter for adaptive illumination based on range and velocity, and preserves the information content of radar returns. A network of such systems can also be set up to collaborate as multifunction radars and noncoherent radars. A description of such a system is provided by Haykin (Haykin, 2006).
Active Audition

Active audition involves sound localization, segregation of the target source from interference, tracking of the source, and learning from the experience to adapt to the changing environment (Haykin and Chen, 2005). This problem has many applications, including improved devices for hearing-impaired persons.
Meaning Machines and Affective Computing

Deb Roy of the Cognitive Machines Group at the MIT Media Laboratory has been focusing on human-machine language development for conversational robots (Roy, 2005). Others from that group have been developing machine recognition of affective (emotional) states and their synthesis in order to accelerate and improve the learning of individuals (Kort and Reilly, 2002).
Autonomic Computing Systems

Autonomic computing (AC) is a scaled-down form of cognitive machines (Ganek and Corbi, 2003), (IBM, 2006), (Kinsner, 2005a), (Wang, 2003), (Wang, 2004), (Wang, 2006). Such systems are evolving earnestly because cost-performance improvements in hardware (speed and capacity) have led to escalating complexity of software (features and interfaces). This increased complexity requires elaborate managing systems that now cost six to ten times as much as the equipment itself. Autonomic computing is intended to simplify this problem by making the systems self-configuring, self-optimizing, self-organizing, self-healing, self-protecting, and self-telecommunicating, thus leading to increased reliability, robustness, and dynamic flexibility. This involves not only traditional fault-tolerant computing (i.e., tolerating hardware and software faults), but also tolerating various faults made by human operators and users, thus shifting attention from the mean time between failures
(MTBF) to the mean time to recover (MTTR) in order to make the systems more available. AC applies to desktop computing, portable computing, pervasive computing, and embedded systems alike.
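The shift from MTBF to MTTR can be quantified with the standard steady-state availability formula A = MTBF / (MTBF + MTTR): halving the recovery time buys the same availability as doubling the time between failures, and self-healing usually makes the former cheaper. A quick check in Python:

    def availability(mtbf_h: float, mttr_h: float) -> float:
        """Steady-state availability A = MTBF / (MTBF + MTTR)."""
        return mtbf_h / (mtbf_h + mttr_h)

    # Doubling MTBF vs. halving MTTR, from a 1000 h MTBF and a 10 h MTTR:
    print(f"{availability(1000, 10):.5f}")   # 0.99010  baseline
    print(f"{availability(2000, 10):.5f}")   # 0.99502  MTBF doubled
    print(f"{availability(1000, 5):.5f}")    # 0.99502  MTTR halved: same gain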
Other Projects

Sandia Laboratories have been developing several projects related to cognitive machines, including human emulation, augmented cognition, knowledge capture, episodic memory, and naturalistic decision making (Sandia National Laboratories, 2006). The UCLA Cognitive Systems Laboratory has been involved in several projects, including evidential reasoning, default reasoning, learning, constraint processing, and graphoids (UCLA, 2006). James Anderson of the Department of Cognitive and Linguistic Sciences at Brown University in Providence has been developing the Ersatz brain project (Anderson, 2002; Anderson, 2005), among many other neurocomputing projects.
PROCESSING AND METRICS

What do the theoretical models and practical implementations of cognitive machines have in common? The key distinguishing features include: (i) intelligent signal processing that is robust and embedded in the machine itself, (ii) real-time learning prompted by awareness of the external and internal environments, and interaction with both, (iii) closed-loop feedback control systems engineering so that past experiences can be used in the future, and (iv) state and quality metrics to discern between desirable and undesirable actions.
Intelligent Signal Processing

From the perspective of applied and industrial mathematics, a signal is an electrical representation of various ubiquitous, quantifiable physical variables such as temperature, pressure, flow, and light intensity. Such signals can be analog (continuous, with very high resolution), discrete (sampled, but with very high resolution), digital (sampled and quantized to a finite resolution), or boxcar (step-wise analog) functions over time (e.g., speech), space (e.g., images), or both (e.g., volumetric Doppler radar). Digital signals are of particular importance to computer-based signal processing, which deals with the modelling, analysis/synthesis, feature extraction, and classification of such signals in order to gain insight into the underlying physical process, or to perform specific control tasks with the process. Signal processing is used in nearly all fields of human endeavour, from signal detection in the presence of noise, to fault diagnosis, advanced control, audio and image processing (restoration, enhancement, segmentation, reconstruction, coding, compression), communications engineering, intelligent sensor systems with reconfigurable architectures, business, and humanistic intelligence (HI) (Haykin and Kosko, 2001), which utilizes the natural capabilities of the human body and mind, as well as cognitive informatics (CI) (Wang, 2002).

Intelligent signal processing (ISP) is treated very thoroughly in the literature (e.g., (Haykin and Kosko, 2001)). Since many real-world physical systems are time-varying, complex (high-dimensional), nonlinear, statistically nonstationary, non-Gaussian, nonlocal, sometimes chaotic, and subjected to unwanted signals (noise), classical statistical signal processing (SSP) (e.g., (Oppenheim et al., 1999), (Proakis and Manolakis, 1995)) must be augmented by ISP and CI because of the required autonomy and interaction with humans. ISP has been found to be a more useful approach, as it employs adaptation and learning to extract the essential information from the acquired signals and noise, without any assumed statistical models of the signals or their sources. These signals no longer exhibit additive invariance (short-range dependence), but multiplicative invariance (self-affinity with long-range dependence). The ISP tools include supervised and unsupervised learning through adaptive neural networks (Bishop, 1995), wavelets and their variations (Mallat, 1998), fuzzy rule-based computation (Pedrycz and Gomide, 1998), rough sets (Pawlak, 1991), granular computing (Bargiela and Pedrycz, 2002), genetic algorithms and evolutionary computation (Goldberg, 2002), and blind signal estimation (Haykin and Kosko, 2001). Algorithms for ISP must be implemented to satisfy the B-robustness property (Gadhok and Kinsner, 2006) because of the higher-order statistics involved.
CI is concerned with (i) the extraction of characteristic features from signals obtained from measurements and observations, and (ii) the measurement and characterization of patterns (i.e., order and correlation) in processes related to perception and cognition (i.e., interaction with humans). Signals obtained from physical dynamical processes appear to be very complex. Much attention has been given to deterministic and stochastic linear-time-invariant (LTI) signals with a limited-bandwidth power spectrum density and short-tail distributions, leading to processing with finite moments. However, many physical signals are fundamentally different from the LTI signals in that they are invariant to scale rather than to translation (Worn, 1996). Such signals have different degrees of singularity as measured by their noninteger (fractal) dimensions (Kinsner, 2005b). Correlation in such signals persists from short to very long ranges, with distributions having long tails (infinite moments). In contrast to the well-established LTI system theory, the nonlinear scale invariant (NSI) system theory and applications are still developing. There is also another class of signals, the chaotic signals, originating from nonlinear dynamical systems, such as the AC systems (Kantz and Schreiber, 2004), (Sprott, 2003), (Principe et al., 2000). Research is being conducted to measure and characterize such systems.
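Long-range dependence of the kind just described is often summarized by a single scaling exponent. The variance-time sketch below is a crude estimator offered only as an illustration; the multiscale measures discussed in this chapter are considerably more refined:

    import numpy as np

    def hurst_variance_time(x, scales=(1, 2, 4, 8, 16, 32)):
        """Variance-time estimate of the Hurst exponent H: for a long-range
        dependent series, the variance of m-aggregated block means scales
        as m^(2H - 2), so the log-log slope gives H = 1 + slope / 2."""
        x = np.asarray(x, dtype=float)
        variances = []
        for m in scales:
            n = len(x) // m
            block_means = x[: n * m].reshape(n, m).mean(axis=1)
            variances.append(block_means.var())
        slope = np.polyfit(np.log(scales), np.log(variances), 1)[0]
        return 1.0 + slope / 2.0

    rng = np.random.default_rng(0)
    print(round(hurst_variance_time(rng.normal(size=4096)), 2))  # ~0.5: no memory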
Real-Time Learning

Real-time learning may not be achievable due to the curse of dimensionality. A partial solution to this problem seems to be forthcoming through approximate dynamic programming (de Farias, 2002; de Farias and Van Roy, 2003; de Farias and Van Roy, 2004), (Haykin, 2005b). This approach is based on Bellman's dynamic programming, and it can be used as an optimal policy to guide the interaction of the learning system with its environment. Other approaches include neurofuzzy (Pedrycz and Gomide, 1998), granular (Bargiela and Pedrycz, 2002), and evolutionary computing (Goldberg, 2002).
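Approximate dynamic programming builds on Bellman's recursion; on a tiny exact problem the underlying update is just value iteration, as in the sketch below (an invented two-state example; the cited approximate methods replace the exact table V with a parameterized approximation when the state space is large):

    # Bellman value iteration on an invented two-state, two-action problem.
    states, actions, gamma = [0, 1], [0, 1], 0.9
    P = {(s, a): [(1 - 0.8 * a, 0), (0.8 * a, 1)]        # (prob, next state)
         for s in states for a in actions}
    R = {(s, a): 1.0 if (s, a) == (1, 0) else 0.0        # reward model
         for s in states for a in actions}

    V = {s: 0.0 for s in states}
    for _ in range(200):                                  # Bellman backups
        V = {s: max(R[s, a] + gamma * sum(p * V[s2] for p, s2 in P[s, a])
                    for a in actions)
             for s in states}
    print({s: round(v, 2) for s, v in V.items()})         # converged values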
Control with Feedback

This component can be implemented in many different ways, from classical control systems to neurofuzzy ones (Pedrycz and Gomide, 1998).
Metrics

Cognitive systems cannot operate without some specific good-bad discriminating criteria, which, in turn, require metrics. Such metrics are being developed (e.g., (Kinsner and Dansereau, 2006)). The classical energy-based metrics (such as the scalar peak signal-to-noise ratio, PSNR) must be augmented by vectorial information-based or entropy-based metrics, such as the Rényi entropy spectrum, the Rényi fractal dimension spectrum (Kinsner, 2005b), or the relative Rényi fractal dimension spectrum (Kinsner and Dansereau, 2006). The latter can also be used as a subjective quality metric to evaluate images and video.
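The contrast between an energy-based metric and an entropy-based one can be seen in a few lines: PSNR collapses a comparison to a single energy number, whereas sweeping the order q of the Rényi entropy yields a spectrum. A sketch over a toy histogram (the Rényi fractal dimension spectrum in the cited work is multiscale and more involved):

    import numpy as np

    def psnr(x, y, peak=255.0):
        """Classical energy-based metric: peak signal-to-noise ratio in dB."""
        mse = np.mean((np.asarray(x, float) - np.asarray(y, float)) ** 2)
        return 10.0 * np.log10(peak ** 2 / mse)

    def renyi_entropy(p, q):
        """Renyi entropy of order q (q != 1) of a probability vector p;
        sweeping q yields an entropy spectrum instead of one number."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return np.log2(np.sum(p ** q)) / (1.0 - q)

    print(round(psnr([100, 120], [101, 119]), 1))         # one energy number
    hist = np.array([0.5, 0.25, 0.125, 0.125])            # toy histogram
    print([round(renyi_entropy(hist, q), 3) for q in (0.5, 2.0, 4.0)])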
CONCLUDING REMARKS

Intelligent signal processing, real-time machine learning, and cognitive informatics are essential in solving some design and implementation issues of cognitive machines. Although the examples of cognitive radio, cognitive radar, active audition, cognitive robotics, and autonomic computing indicate that such cognitive systems are now feasible, much work must be conducted in all these areas to make the ambitious goals of such systems more practicable. Particular attention must be given to metrics and cognitive informatics.
Acknowledgment

Partial financial support of this work from the Natural Sciences and Engineering Research Council (NSERC) of Canada is gratefully acknowledged.
References

Aleksander, I. (1989). Neural computing architectures: The design of brain-like machines (p. 401). Cambridge, MA: MIT Press.

Aleksander, I. (1998, August). From WISARD to MAGNUS: A family of weightless neural machines (pp. 18-30).

Aleksander, I. (2003). How to build a mind: Towards machines with imagination (Maps of the Mind) (p. 192). New York, NY: Columbia University Press.

Aleksander, I. (2006). Artificial consciousness: An update. Available as of May 2006 from http://www.ee.ic.ac.uk/research/neural/publications/iwann.html (an update of his paper "Towards a neural model of consciousness," in Proc. ICANN94, New York, NY: Springer, 1994).

Anderson, J. (2002, August 19-20). Hybrid computation with an attractor neural network. In Proceedings of the 1st IEEE International Conference on Cognitive Informatics (pp. 3-12). Calgary, AB. ISBN 0-7695-1724-2.

Anderson, J. (2005, August 8-10). Cognitive computation: The Ersatz brain project. In Proceedings of the 4th IEEE International Conference on Cognitive Informatics (pp. 2-3). Irvine, CA. ISBN 0-7803-9136-5.

Austin, J. (Ed.) (1998). RAM-based neural networks (p. 240). Singapore: World Scientific.

Baars, B. (1988). A cognitive theory of consciousness. Cambridge, UK: Cambridge University Press. Available as of May 2006 from http://nsi.edu/users/baars/BaarsConsciousnessBook1988/index.html

Bargiela, A., & Pedrycz, W. (2002). Granular computing: An introduction (p. 480). New York, NY: Springer.

Bishop, C.M. (1995). Neural networks for pattern recognition (p. 482). Oxford, UK: Oxford University Press.

Chalmers, D. (1997). The conscious mind: In search of a fundamental theory (p. 432). Oxford, UK: Oxford University Press.

Chomsky, N. (2006). Language and mind (3rd ed.) (p. 208). Cambridge, UK: Cambridge University Press.

Cotterill, R. (Ed.) (1988). Computer simulations in brain science (p. 566). Cambridge, UK: Cambridge University Press.

Cotterill, R. (2003). CyberChild: A simulation test-bed for consciousness studies. Journal of Consciousness Studies, 10(4-5), 31-45.

Dawkins, R. (1990). The selfish gene (2nd ed.) (p. 368). Oxford, UK: Oxford University Press.

Dennett, D.C. (1991). Consciousness explained (p. 528). London, UK: Allen Lane/Penguin.

Dillinger, M., Madani, K., & Alonistioti, N. (Eds.) (2003). Software defined radio: Architectures, systems and functions (p. 454). New York, NY: Wiley.

de Farias, D.P. (2002). The linear programming approach to approximate dynamic programming: Theory and application. Doctoral dissertation (p. 146). Stanford, CA: Stanford University. Available as of May 2006 from http://web.mit.edu/~pucci/www/daniela_thesis.pdf
de Farias, D.P., & Van Roy, B. (2003). The linear programming approach to approximate dynamic programming. Operations Research, 51(6), 850-865.

de Farias, D.P., & Van Roy, B. (2004, August). On constraint sampling in the linear programming approach to approximate dynamic programming. Mathematics of Operations Research, 29(3), 462-478.

Fischler, M.A., & Firschein, O. (1987). Intelligence: The eye, the brain and the computer (p. 331). Reading, MA: Addison-Wesley.

Franklin, S. (1995). Artificial minds (p. 464). Cambridge, MA: MIT Press.

Franklin, S. (2003). IDA: A conscious artefact.

Freeman, W. (2001). How brains make up their minds (2nd ed.) (p. 146). New York, NY: Columbia University Press.

Gadhok, N., & Kinsner, W. (2006, May 10-12). An implementation of beta-divergence for blind source separation. In Proceedings of the IEEE Canadian Conference on Electrical & Computer Engineering, CCECE06 (pp. 642-646). Ottawa, ON.

Ganek, A.G., & Corbi, T.A. (2003). The dawning of the autonomic computing era. IBM Systems Journal, 42(1), 34-42. Available as of May 2006 from http://www.research.ibm.com/journal/sj/421/ganek.pdf

Goldberg, D.E. (2002). The design of innovation: Genetic algorithms and evolutionary computation (p. 272). New York, NY: Springer.

Grossberg, S. (1982). Studies of mind and brain: Neural principles of learning, perception, development, cognition and motor control (p. 662). Boston, MA: D. Reidel Publishing.

Grossberg, S. (Ed.) (1988). Neural networks and natural intelligence (p. 637). Cambridge, MA: MIT Press.

Haikonen, P.O.A. (2003). The cognitive approach to conscious machines (p. 294). New York, NY: Academic. (See also http://personal.inet.fi/cool/pentti.haikonen/)

Haikonen, P.O.A. (2004, June). Conscious machines and machine emotions. Workshop on Models for Machine Consciousness, Antwerp, BE.

Haykin, S. (2005a, February). Cognitive radio: Brain-empowered wireless communications. IEEE Journal on Selected Areas in Communications, 23(2), 201-220.

Haykin, S. (2005b, September 28-30). Cognitive machines. In IEEE International Workshop on Machine Intelligence & Signal Processing, IWMISP05. Mystic, CT. Available as of May 2006 from http://soma.crl.mcmaster.ca/ASLWeb/Resources/data/Cognitive_Machines.pdf

Haykin, S. (2006, January). Cognitive radar. IEEE Signal Processing Magazine (pp. 30-40).

Haykin, S., & Chen, Z. (2005). The cocktail party problem. Neural Computation, 17, 1875-1902.

Haykin, S., & Kosko, B. (2001). Intelligent signal processing (p. 553). New York, NY: Wiley.

Haykin, S., Principe, C.J., Sejnowski, T.J., & McWhirter, J. (2006). New directions in statistical signal processing (p. 544). Cambridge, MA: MIT Press.

Hinton, G.E., & Anderson, J.A. (1981). Parallel models of associative memory (p. 295). Hillsdale, NJ: Lawrence Erlbaum Associates.

Hoffman, R.R., Klein, G., & Laughry, K.R. (2002, January/February). The state of cognitive systems engineering. IEEE Intelligent Systems Magazine (pp. 73-75).

Holland, O. (Ed.) (2003). Machine consciousness (p. 192). Exeter, UK: Imprint Academic.
196
Towards Cognitive Machines
Hyvarinen, A., Karhunen, J., & Oja, E. (2001). Independent component analysis (p. 481). New York, NY: Wiley. IBM Autonomic Computing Manifesto. (Available as of May 2006, http://www.research.ibm.com/autonomic/) Kantz, H., & Schreiber, T. (2004). Nonlinear time series analysis (p. 369) (2nd ed.). Cambridge, UK: Cambridge University Press. Kinsner, W. (2005, June 16-18). Signal processing for autonomic computing. In Proceedings 2005 Meet. Can. Applied & Industrial Math Soc., CAIMS 2005, Winnipeg, MB Available as of May 2006 from http://www. umanitoba.ca/institutes/iims/caims2005_theme_signal.shtml Kinsner, W. (2005, August 8-10). A unified approach to fractal dimensions. In Proceedings of the IEEE 2005 Intern. Conf. Cognitive Informatics, ICCI05 (pp. 58-72). Irvine, CA.{ISBN: 0-7803-9136-5}. Kinsner, W., & Dansereau, R. (2006, July 17-19). A relative fractal dimension spectrum as a complexity measure. In Proceedings of the IEEE 2006 Intern. Conf. Cognitive Informatics, ICCI06, Beijing, China.{ISBN: 1-42440475-4}. Klivington, K. (1989). The science of mind (p. 239). Cambridge, MA: MIT Press. Kohonen, T. (2002). Self-organization and associative memory. (pp. 312) (2nd ed.). New York, NY: Springer Verlag. Kort, B., & Reilly, R. (2002). Theories for deep change in affect-sensitive cognitive machines: A constructivist model. Educational Techology. & Society, 5(4), 3 Kurzweil, R. (1990). The age of intelligent machines (p. 565). Cambridge, MA: MIT Press. Mallat, S. (1998). A wavelet tour of signal processing (p. 577). San Diego, CA: Academic Press. Mann, S. (2002). Intelligent image processing (p. 339). New York, NY: Wiley/IEEE. Minsky, M. (1986). The society of mind (p339). New York, NY: Touchstone. Nielsen, M.A., & Chuang, I.L. (2000). Quantum computation and quantum information (p. 676) Cambridge, UK: Cambridge University Press.. Oppenheim, A.V., Schafer, R.W., & Buck, J.R. (1999). Discrete-time signal processing (p. 870) (2nd Ed.) Prentice Hall. Parsell, M. (2005, March). Review of P.O. Haikonen, the cognitive approach to consious machines. Psyche, 11(2), 1-6.. Available as of May 2006 from http://psyche.cs.monash.edu.au/book_reviews/haikonen/haikone.pdf Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data (p. 252). New York, NY: Springer. Pedrycz, W., & Gomide, F. (1998). An introduction to fuzzy sets: Analysis and design (p. 465). Cambridge, MA: MIT Press. Penrose, R. (1989). The emperor’s new mind (p. 480). Oxford, UK: Oxford University Press. Penrose, R. (1994). The shadows of the mind: A search for the missing science of consciousness (p.457). Oxford, UK: Oxford University Press. Pheifer, R., & Scheier, C. (1999). Understanding intelligence (pp. 720). Cambridge, MA: MIT Press. Posner, M. (ed.) (1989). Foundations of cognitive science (p 888). Cambridge, MA: MIT Press.
197
Towards Cognitive Machines
Principe, J.C., Euliano, N.R., & Lefebvre, W.C. (2000). Neural and adaptive systems: Foundations through simulations (p. 656). New York, NY: Wiley. Proakis, J.G., & Manolakis, D.G. (1995). Digital signal processing: Principles, algorithms and applications (p. 1016) (3rd ed.). Upper Saddle River, NJ: Prentice-Hall. Ralson, A., Reilly, E.D., & Hemmendinger, D. (eds.) (2003). Encyclopedia of computer science (p. 2064) (4th ed.), New York, NY: Wiley. Roy, D.K. (2005, August). Grounding words in perception and action: Insight for computational models. Trends in Cognitive Sciences, 9(8), 389-396. Roy, D.K., & Pentland, A.P. (2002). Learning words from sights and sounds: A computational model. Cognitive Science, 26, 113-146. Ruaro, M.E., Bonifazi, P., & Torre, V. (2005, March). Toward the neurocomputer: Image processing and pattern recognition with neuronal cultures. IEEE Trans. Biomedical Eng., 52(3), 371-383. Rumelhart, D.E., & McClelland, J.L. (1986). Parallel Distributed Processing, 1-2, 547&611. Cambridge, MA: MIT Press. Sandia National Laboratories (2006). Projects. Available as of May 2006 from http://www.sandia.gov/cog.systems/Projects.htm Sanz, R., Chrisley, R., & Sloman, A. (2003). Models of sonsciousness: Scientific report (p. 37). European Science Foundation. Available as of May 2006 from http://www.esf.org/generic/1650/EW0296Report.pdf Searle, J.R. (1980). Minds, brains and programs. Behavioral & Brain Sciences, 3, 417-424. Searle, J.R. (1992). The rediscovery of the mind. (p. 288). Cambridge, MA: MIT Press. Sienko, T., Adamatzky, A., Rambidi, N.G., & Conrad, M. (2003). Molecular computing (p. 257). Cambridge, MA: MIT Press. Soloman, S. (1999). Sensor handbook (p. 1486). New York, NY: McGraw-Hill. Sprott, J.C. (2003). Chaos and time-series analysis (p. 507). Oxford, UK: Oxford University Press. Taylor, J.G. (2001). The Race to Consciousness (p. 392). Cambridge, MA: MIT Press. Taylor, J.G. (2002). Paying attention to consciousness. Trends Cogn. Sciences, 6, 206-210. Taylor, J.G. (2003, June 20-24). The CODAM model of attention and consciousness. In Proceedings of the Intern. Joint Conf. Neural Networks, IJCNN03, 1, 292-297. Portland, OR. Thelen, E., & Smith, L.B. (2002). A dynamic system approach to the development of cognition and action (p. 376). Cambridge, MA: MIT Press. UCLA Cognitive Systems Laboratory (2006). Available as of May 2006 fromhttp://singapore.cs.ucla.edu/cogsys. html Velmans, M. (2000). Understanding consousness (p. 296). New York, NY: Routledge. von Neumann, J. (1958). The computer and the brain (p. 82). New Haven, CT: Yale University Press. Wang, Y. (2002, August 19-20). On cognitive informatics. In Proceedings of the 1st IEEE Intern. Conf. Cognitive Informatics (p. 34-42). Calgary, AB. ISBN 0-7695-1724-2. Wang, Y. (2003, August). Cognitive informatics models of software agent systems and autonomic computing. Keynote speech of the Proceedings of the Intern. Conf. Agent-Based Technologies and Systems (ATS’03) (p. 25). Calgary, Canada: U of C Press.
198
Towards Cognitive Machines
Wang, Y. (2004, August 16-17). On autonomic computing and cognitive processes. Keynote speech in Proceedings of the 3rd IEEE Intern. Conf. on Cognitive Informatics, ICCI04,(pp. 3-4). Victoria, BC. ISBN 0-7695-2190-8. Wang, Y. (2006). Cognitive informatics: Towards the future generation computers that think and feel. Keynote speech in Proceedings of the 5th IEEE Intern. Conf. Cognitive Informatics, ICCI06, (pp. 3-7). Beijing, China2006). Wornell, G.W. (2006) Signal processing with fractals: A wavelet-based approach (p. 127). Upper Saddle River, NJ: Prentice-Hall.
199
200
Chapter XIV
Towards Autonomic Computing: Adaptive Neural Network for Trajectory Planning Amar Ramdane-Cherif Université de Versailles St-Quentin, France
Abstract

The cognitive approach through the neural network (NN) paradigm is a critical discipline that will help bring about autonomic computing (AC). NN-related research, some involving new ways to apply control theory and control laws, can provide insight into how to run complex systems that optimize to their environments. NNs are one kind of AC system that can embody human cognitive powers and can adapt, learn, and take over certain functions previously performed by humans. In recent years, artificial neural networks have received a great deal of attention for their ability to perform nonlinear mappings. In trajectory control of robotic devices, neural networks provide a fast method of autonomously learning the relation between a set of output states and a set of input states. In this chapter, we apply the cognitive approach to solve position controller problems using an inverse geometrical model. In order to control a robot manipulator in the accomplishment of a task, trajectory planning is required in advance or in real time. The desired trajectory is usually described in Cartesian coordinates and needs to be converted to joint space for the purpose of analyzing and controlling the system behavior. In this chapter, we use a memory neural network (MNN) to solve the optimization problem concerning the inverse of the direct geometrical model of the redundant manipulator when subject to constraints. Our approach offers substantially better accuracy, avoids the computation of the inverse or pseudoinverse Jacobian matrix, and does not produce problems such as singularity, redundancy, and considerably increased computational complexity.
Introduction

Current research areas on theories and applications of Cognitive Informatics (Wang et al., 2005; Chiew, 2003; Wang, 2005) have demonstrated a consistent effort at applying cognitive informatics to real-world problem domains such as autonomic computing. Almost all of the hard problems yet to be solved in this discipline stem from the fundamental constraints of the brain and the understanding of its cognitive mechanisms and processes (Wang et al., 2002; Wang et al., 2003).
Autonomic computing (Wang, 2003) takes its name from the body's autonomic nervous system, which controls key functions without conscious awareness or involvement. Autonomic controls use motor neurons to send indirect messages to organs at a sub-conscious level. These messages regulate temperature, breathing, and heart rate without conscious thought. The implications for computing are immediately evident: a NN can compute joint positions for a robot and adapt itself under varying conditions without considerably increased computational complexity. In recent years, artificial neural networks have received a great deal of attention for their ability to perform nonlinear mappings. In trajectory control of robotic devices, neural networks provide a fast method of autonomously learning the relation between a set of output states and a set of input states (Guez et al., 1989; Kieffer et al., 1991; Hunt et al., 1992; Ramdane-Cherif et al., 1995). In (Jung et al., 2000; Fang et al., 1993; Fang et al., 1998) several neural network inverse control techniques are applied for trajectory tracking of a PD controlled rigid robot, and in (Kawato et al., 1990) look-ahead planning based on neural networks is successfully applied to real-time control of a robot arm, where the task is to touch a rolling ball with the arm. Traditional approaches to control redundant manipulators have centered on the Jacobian pseudoinverse (Klein et al., 1983), which is non-intuitive, tiresome to compute and generates arbitrary joint position vectors in the neighborhood of singularities. These solutions are often inappropriate and result in unacceptably large joint velocities and accelerations. Many methods have therefore been developed to solve this problem. Some of them extend the pseudoinverse (Liegeois, 1986; Klein et al., 1993; Chen et al., 1993) so as to use the kinematic redundancy for optimizing an objective function, or to directly resolve the redundancy by including some constraints in the direct kinematic model. Other works are oriented towards real-time application, like the gradient projection scheme proposed in (Dubey et al., 1991) or the decomposition of the Jacobian matrix into two square matrices found in (Chevallereau, 1988). The computational complexity is thereby partly reduced. The globally optimal resolution has also been proposed. This approach converts the redundancy resolution to an optimal control problem using the necessary conditions of optimality given by Pontryagin's principle (Nakamura et al., 1987) or by optimal control theory (Kim et al., 1994). This work is however inappropriate for real-time control. These solutions generate arbitrary joint positions in the neighborhood of singularities and when target positions are out of reach. Since a neural network can learn to generalize and to capture relationships that were not present in the training set, it will give an approximated solution near singularity points. As a result, a neural network is able to solve the singularity problem, which has long been the major difficulty in implementing Resolved Motion Rate Control. A neural network is also able to give solutions that work well even when target positions are out of reach. Another problem that arises when studying the kinematic aspects of a robot manipulator is the so-called inverse geometrical problem. When using a position controller to operate a robot manipulator, trajectory planning is required in advance or in real time.
The desired trajectory is usually described in Cartesian coordinates and needs to be converted to joint space for the purpose of analyzing and controlling system behavior. The inverse geometrical model can generally be approached by a numerical solution. The geometric method (Hunt, 1987; Kircanski et al., 1993) derives a closed-form solution from the direct kinematic equation. For manipulators that do not have a closed-form solution, like kinematically redundant manipulators, standard numerical procedures (Featherstone, 1994; Bestaoui, 1991) solve the nonlinear differential equation at a discrete set of points. Moreover, for redundant arms, one solution has to be chosen from an infinite number of inverse kinematic solutions obtained for a given workspace position. This leads to increased numerical complexity. In this chapter, we use the Modified Memory Neural Network (MMNN) to solve both the redundancy and singularity problems of the inverse geometrical model in a more effective manner than the standard methods. Specifically, we use the MMNN to find solutions to the inverse geometrical model which are required for operating the position controller of a redundant robot manipulator. We demonstrate that the MMNN-based position controller is better able to track straight-line trajectories than standard methods, especially when the trajectories involve singularities. We show that the MMNN-based position controller can also be trained to honor problem-specific constraints. This neural network controller is able to successfully track more challenging cyclical trajectories when certain constraints are enforced.
The organization of this chapter is as follows. In section 2, we recall the kinematic formulation of a robot manipulator needed for the inverse geometrical model of a redundant robot subject to constraints. Section 3 presents the topology of the memory neural network (MNN) and its corresponding learning method. Simulation results for a three degrees of freedom (DOF) robot arm are given in section 4, and finally some conclusions are drawn in section 5.
Kinematic Formulations for the Position Based Controller

In this section, we consider the kinematics of a redundant manipulator which has more DOF than the dimension of the task space. Let q be an n×1 vector of joint angles and x be an m×1 vector which represents the corresponding position of the end-effector in Cartesian coordinates (n>m). The variables x and q are then related by the forward kinematic transformation f(.) (a well-known nonlinear function). Given xd, a trajectory of end-effector positions in Cartesian space, the inverse problem is to compute the corresponding joint trajectory position qd. Classical approaches consist of computing the solution by introducing the pseudoinverse matrix J+ so that:

$q = \tau\,\dot{q} + q_0, \qquad \dot{q} = J^{+}\dot{x} + (I - J^{+}J)\,\upsilon$    (1)
where I is the identity matrix, J is an m×n Jacobian matrix, (I − J+J) is the null space projection matrix, υ is an arbitrary joint velocity vector and τ is the sampling period. For the purpose of our proposed method, we introduce an extended position vector as:
$X = F(q) = \begin{pmatrix} x \\ 0 \end{pmatrix} = \begin{pmatrix} f(q) \\ g(q) \end{pmatrix}$    (2)
where f(q) is the forward kinematic model and g(q) is the constraint vector added in the redundant case. These constraints can be represented by the general form proposed in (Baillieul, 1985) as

$g(q) = \left(\frac{\partial \phi(q)}{\partial q}\right)^{T} N = 0$    (3)
where φ(q) is a scalar kinematic objective function to be minimized and N is the (n−m) null space vector of J, which corresponds to the self-motion of a redundant arm. Mathematically, N can be expressed as

$N = \det(J_a)\,\tilde{N} \quad \text{with} \quad \tilde{N} = \begin{pmatrix} J_a^{-1} J_b \\ -I_{n-m} \end{pmatrix}$    (4)
where J_a is an m-square matrix made of the m first columns of J and J_b is an m×(n−m) matrix of the remaining columns. More specifically, J = (J_a J_b) and the (n−m) null space vector N of J is obtained using (4). As the criterion φ(q) is convex with regard to q, the condition of optimality (3) is necessary and sufficient. For our first simulation, we chose the following objective function:

$\phi(q) = \frac{1}{2}\sum_{i=1}^{n} w_i l_i (q_i - q_{0i})^2$    (5)

where l_i is the i-th link and w_i is its corresponding weight coefficient.
This objective function aims to minimize joint displacement between two successive points. According to (4) and (5) the constraint equation becomes:

$g(q) = N^{T} \begin{pmatrix} w_1 l_1 (q_1 - q_{01}) \\ w_2 l_2 (q_2 - q_{02}) \\ \vdots \\ w_n l_n (q_n - q_{0n}) \end{pmatrix} = 0$    (6)
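To make (4)-(6) concrete, the following is a minimal numerical sketch for a 3 DOF planar arm; the NumPy code and all function names are our own illustration, not part of the original formulation:

    import numpy as np

    def jacobian(q, l):
        # Analytical Jacobian of an n-link planar arm: rows are dx1/dq and dx2/dq.
        n = len(q)
        J = np.zeros((2, n))
        for i in range(n):
            for j in range(i, n):
                a = np.sum(q[:j + 1])
                J[0, i] += -l[j] * np.sin(a)
                J[1, i] += l[j] * np.cos(a)
        return J

    def null_vector(J):
        # Equation (4): N = det(Ja) * [Ja^{-1} Jb ; -I], here with m = 2, n = 3.
        Ja, Jb = J[:, :2], J[:, 2:]
        N_tilde = np.vstack([np.linalg.solve(Ja, Jb), -np.eye(1)])
        return np.linalg.det(Ja) * N_tilde

    def g_constraint(q, q0, w, l, N):
        # Equation (6): g(q) = N^T [w_i l_i (q_i - q_{0i})], zero at the optimum.
        return N.T @ (w * l * (q - q0))

    l = np.array([2.0, 1.0, 0.5]); w = np.ones(3); q0 = np.zeros(3)
    q = np.array([0.3, 0.4, 0.2])
    N = null_vector(jacobian(q, l))
    print(g_constraint(q, q0, w, l, N))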
The extended forward kinematic equation for an n DOF planar manipulator (m=2 ⇒ redundancy equals n−2) is:

$F(q) = X(q) = \begin{pmatrix} x_1 \\ x_2 \\ z \end{pmatrix} = \begin{pmatrix} l_1 c_1 + l_2 c_{12} + l_3 c_{123} + \cdots + l_n c_{123\ldots n} \\ l_1 s_1 + l_2 s_{12} + l_3 s_{123} + \cdots + l_n s_{123\ldots n} \\ z \end{pmatrix}$    (7)
where x_1 and x_2 are the Cartesian coordinates in the plane (m=2), s_i = sin(q_i), s_{ij} = sin(q_i + q_j), s_{ijk} = sin(q_i + q_j + q_k), c_i = cos(q_i), c_{ij} = cos(q_i + q_j), c_{ijk} = cos(q_i + q_j + q_k), l_1, l_2, ..., l_n are the link lengths, and z is a vector of dimension (n−2). In the general case:

$F(q) = X(q) = (x(q), z(q)) = (x_1(q), x_2(q), \ldots, x_m(q), z_1(q), z_2(q), \ldots, z_{n-m}(q))$    (8)
The optimization problem consists in finding the joint position vector q for a given Cartesian position vector (x,z). Generally z may be set to zero in order to find the optimal configuration that places the end-effector at the desired position x and minimizes the objective function φ(q). The robot arm must move in operational space following commands that operate on the joint variables. Generally, the task that the manipulator must achieve is specified in operational space, where it is easier to represent the movement of the robot. The passage from one space to another is thus very important and requires the direct and inverse models of the robot. These models are often very difficult to obtain because they are non-linear and their computation is time consuming. While the direct models can be obtained relatively easily, the inverse models do not have an a priori analytical expression. Finding the inverse model is even more difficult if the robot is redundant (Figure 1).
Figure 1. (a) The extended direct model and (b) the inverse model of a redundant manipulator
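As a complement to Figure 1a, here is a short sketch of the extended direct model F(q); the constraint part z is supplied by a caller-chosen function g (the placeholder g below is our own illustration, not the chapter's constraint):

    import numpy as np

    def forward_kinematics(q, l):
        # f(q) of equation (7): planar end-effector position.
        a = np.cumsum(q)
        return np.array([np.sum(l * np.cos(a)), np.sum(l * np.sin(a))])

    def extended_model(q, l, g):
        # F(q) = (f(q), g(q)) of equations (2) and (8); g supplies the
        # (n - 2)-dimensional constraint part z(q), e.g. equation (6).
        return np.concatenate([forward_kinematics(q, l), np.atleast_1d(g(q))])

    l = np.array([2.0, 1.0, 0.5])
    g = lambda q: 0.5 * np.sum(q ** 2)   # placeholder constraint, illustration only
    print(extended_model(np.array([0.3, 0.4, 0.2]), l, g))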
The Neural Network Topology and the Learning Method

An efficient method to design a Feedback Recurrent Network has been proposed in (Chatry et al., 1996). Here, we present a new neural method using the memory neural network. Neural networks have limitations in identifying complex plants. Indeed, they consist of layers of neurons interconnected by weights, i.e. they are static maps. The usual way to build dynamic maps is to feed back outputs and delay inputs as much as necessary, so that the input layer is made of both the inputs and the outputs at different time instants. This Time Delay Neural Network (TDNN) (Figure 2) is intrinsically static, and the learning phase is complicated by the feedback loops, which raise the possibility of instability. Sastry (Sastry et al., 1994) first introduced the concept of the MNN (Figure 3). To each neuron of a TDNN are added special neurons named memory neurons, which are connected to the neurons by a time delay so that they are able to retain information. Sastry has proven the efficiency of such MNNs in automatic control, especially for the identification of non-linear dynamical systems. However, he kept the external feedback loop and therefore kept the same drawbacks as seen with TDNNs. Compared to a TDNN, the new MNN used in this chapter has many advantages (Chatry et al., 1996): the absence of a feedback loop allows the learning phase to be processed with a structure parallel to the plant, which is much simpler, avoids stability problems, and is also intrinsically dynamic. Each memory cell is a first order discrete time function and the hidden layer is a parallel structure of these dynamic functions. In this chapter, we have generalized MNN structures to study their ability to learn and identify very complex non-linear systems. The architecture of the Modified Memory Neural Network (MMNN) used to learn the inverse geometrical model is presented in Figure 4. It consists of n inputs in layer E, c neurons in the hidden layer H, and n neurons in the output layer S. The E vector corresponds to the Cartesian coordinates X = F(q) and the S vector corresponds to the joint angle component q. Each neuron of the hidden and output layers is connected, by a time delay, to a special neuron named a memory neuron, as is shown in Figure 3. Thus, we obtain the new MMNN architecture shown in Figure 4 (for clarity, we do not represent the complete structure of the neuron as represented in Figure 3).

Figure 2. Dynamic TDNN structure
Figure 3. Sastry MNN Structure
Figure 4. The NN architecture
(The figure shows the input layer E receiving the inputs, the hidden layer H connected by weights Wi, and the output layer S connected by weights Wc, producing the outputs.)
Equations

Our learning database is composed of 2 vectors: inputs u and outputs y. W^c, W^{mc}, W^{bc} are weight vectors for connections (neurons to neurons, memory cells to themselves) between layer c and layer c+1, where W^0 is chosen to be null to avoid a direct relationship between input and output. B^c is the bias vector associated with the neurons of layer c (c≠0: input layer). A^c is the activation vector for layer c neurons, V^c is the non-linear output function and V^{mc} is the output vector of memory cells. NBCC is the number of hidden layers (in our example NBCC=1), NBC(c) is the number of layer c neurons and N is the index of the last layer (N=NBCC+1). The learning procedure involves the presentation of a set of pairs of input and output patterns. The NN first uses the input vector to produce its own output vector and then compares this with the desired output, or target vector. Then the weights are changed to reduce the difference. The rule for changing weights following presentation of an input/output pair is derived by using the double back-propagation algorithm (the spatial back-propagation (index c) and the temporal back-propagation (index k)). We first compute the input layer equations, all hidden layer equations and output layer equations. Then, we use the Lagrange equations to compute the partial derivatives and finally we derive the changing rule: ΔW^k and ΔB^k.
Equations for the Input Layer

We compute the input layer equations:

$V_i^0(k) = u_i(k)$    (9)

$V^{m0}(k) = M^0\, V^{m0}(k-1) + (I - M^0)\,V^0(k-1)$    (10)

with $M^0 = \mathrm{diag}(W^{b0})$, where k refers to time.

Equations for All Hidden Layers: c = 1 to NBCC

We compute all hidden layer equations:

$V^c(k) = f^c(A^c(k))$    (11)

with

$A^c(k) = W^{m(c-1)}\,V^{m(c-1)}(k) + W^{(c-1)}\,V^{(c-1)}(k) + B^c$    (12)

and f a sigmoid function. W and W^m are vectors, so:

$V^c(k) = f\Big(\sum_{j=1}^{NBC(c-1)} \big(W_j^{m(c-1)\,T} V_j^{m(c-1)}(k) + W_j^{(c-1)\,T} V_j^{(c-1)}(k)\big) + B^c\Big)$    (13)

$V^{mc}(k) = M^c\, V^{mc}(k-1) + (I - M^c)\,V^c(k-1)$    (14)

with $M^c = \mathrm{diag}(W^{bc})$.

Equation for the Output Layer, N, and the Output, S

We compute the output layer equations:

$V_s^N(k) = A_s^N(k) = W_s^{N-1\,T}\,V^{N-1}(k) + W_s^{m(N-1)\,T}\,V^{m(N-1)}(k) + B_s^N$    (15)
Double Back-Propagation Equations

Lagrange Equations

We use the Lagrange equations to compute the partial derivatives. The criterion is

$J = \frac{1}{NBC(N)} \sum_{s=1}^{NBC(N)} J_s$    (16)

with

$J_s = \frac{1}{2} \sum_k \big(y_s(k) - yd_s(k)\big)^2$    (17)

where yd_s is the desired output. Thus

$L(X, \psi, W) = \frac{1}{2\,NBC(N)} \sum_{s=1}^{NBC(N)} \sum_k \Big(\big(W_s^{N-1\,T} V^{N-1}(k) + W_s^{m(N-1)\,T} V^{m(N-1)}(k) + B_s^N\big) - yd_s(k)\Big)^2$
$\quad + \sum_{c=2}^{N-1} \Big( \sum_k \psi_k^{c\,T} \Big( V^c(k) - f^c\Big( \sum_{j=1}^{NBC(c-1)} \big(W_j^{m(c-1)\,T} V_j^{m(c-1)}(k) + W_j^{(c-1)\,T} V_j^{(c-1)}(k)\big) + B^c \Big) \Big)$
$\qquad + \sum_k \psi_k^{mc\,T} \big( V^{mc}(k) - M^c V^{mc}(k-1) - (I - M^c)V^c(k-1) \big) \Big)$
$\quad + \sum_k \psi_k^{1\,T} \Big( V^1(k) - f^1\Big( \sum_{j=1}^{NBC(0)} W_j^{m0\,T} V_j^{m0}(k) + B^1 \Big) \Big)$
$\quad + \sum_k \psi_k^{m1\,T} \big( V^{m1}(k) - M^1 V^{m1}(k-1) - (I - M^1)V^1(k-1) \big)$
$\quad + \sum_k \psi_k^{m0\,T} \big( V^{m0}(k) - M^0 V^{m0}(k-1) - (I - M^0)V^0(k-1) \big)$    (18)

where ψ is the Lagrange multiplier.

Partial Derivatives

Setting the partial derivatives of L to zero yields the conditions from which the changing rules ΔW^k and ΔB^k are derived (here F_k^{c} denotes the derivative of the non-linear function of layer c evaluated at time k). For the layer N−1 (a particular case, because there is no non-linear function):

$\frac{\partial L}{\partial V^{N-1}(k)} = \sum_{s=1}^{NBC(N)} \big(y_s(k) - yd_s(k)\big)W_s^{N-1} + \psi_k^{N-1} - (I - M^{N-1})\,\psi_{k+1}^{m(N-1)} = 0$    (19)

$\frac{\partial L}{\partial V^{m(N-1)}(k)} = \sum_{s=1}^{NBC(N)} \big(y_s(k) - yd_s(k)\big)W_s^{m(N-1)} + \psi_k^{m(N-1)} - M^{N-1}\,\psi_{k+1}^{m(N-1)} = 0$    (20)

For all hidden layers c, with c from 1 to N−2:

$\frac{\partial L}{\partial V^{c}(k)} = \psi_k^{c} - (I - M^{c})\,\psi_{k+1}^{mc} - W^{c\,T} F_k^{c+1}\,\psi_k^{c+1} = 0$    (21)

$\frac{\partial L}{\partial V^{mc}(k)} = \psi_k^{mc} - M^{c}\,\psi_{k+1}^{mc} - W^{mc\,T} F_k^{c+1}\,\psi_k^{c+1} = 0$    (22)

For the layer 0 (a particular case, because W^0 is null):

$\frac{\partial L}{\partial V^{m0}(k)} = -W^{m0\,T} F_k^{1}\,\psi_k^{1} + \psi_k^{m0} - M^{0}\,\psi_{k+1}^{m0} = 0$    (23)

Recurrent Equations for Double Back-Propagation

The rule for changing weights following presentation of an input/output pair is derived by using the double back-propagation algorithm (the spatial back-propagation (index c) and the temporal back-propagation (index k)). The following formulas are placed in order of the spatial back-propagation (index c) from top to bottom and in order of the temporal back-propagation (index k) from right to left. For the layer N−1:

$\forall k \neq K: \quad \psi_k^{m(N-1)} = -\sum_{s=1}^{NBC(N)} \big(y_s(k) - yd_s(k)\big)W_s^{m(N-1)} + M^{N-1}\,\psi_{k+1}^{m(N-1)}$    (24)

and at the last time K:

$\psi_K^{m(N-1)} = -\sum_{s=1}^{NBC(N)} \big(y_s(K) - yd_s(K)\big)W_s^{m(N-1)}$    (25)

$\forall k \neq K: \quad \psi_k^{N-1} = (I - M^{N-1})\,\psi_{k+1}^{m(N-1)} - \sum_{s=1}^{NBC(N)} \big(y_s(k) - yd_s(k)\big)W_s^{N-1}$    (26)

and at the last time K:

$\psi_K^{N-1} = -\sum_{s=1}^{NBC(N)} \big(y_s(K) - yd_s(K)\big)W_s^{N-1}$    (27)

For the hidden layers, where c ranges from 1 to N−2:

$\forall k \neq K: \quad \psi_k^{mc} = M^{c}\,\psi_{k+1}^{mc} + W^{mc\,T} F_k^{c+1}\,\psi_k^{c+1}$    (28)

at the last time K:

$\psi_K^{mc} = W^{mc\,T} F_K^{c+1}\,\psi_K^{c+1}$    (29)

$\forall k \neq K: \quad \psi_k^{c} = (I - M^{c})\,\psi_{k+1}^{mc} + W^{c\,T} F_k^{c+1}\,\psi_k^{c+1}$    (30)

and at the last time K:

$\psi_K^{c} = W^{c\,T} F_K^{c+1}\,\psi_K^{c+1}$    (31)

For the layer 0, at time k:

$\psi_k^{m0} = M^{0}\,\psi_{k+1}^{m0} + W^{m0\,T} F_k^{1}\,\psi_k^{1}$    (32)

and at the last time K:

$\psi_K^{m0} = W^{m0\,T} F_K^{1}\,\psi_K^{1}$    (33)

Partial Derivatives for the Computation of Gradient

At the optimum, ∇J = ∇L, so:

$\frac{\partial J}{\partial W} = \frac{\partial L}{\partial W} \quad \forall W \in \{W^c, W^{mc}, W^{bc}, B^c\}$    (34)

$\frac{\partial L}{\partial W^0} = 0$    (35)

with c from 1 to N−2:

$\frac{\partial L}{\partial W_i^{c}} = -\sum_k F_k\,V_i^{c}(k)\,\psi_k^{c+1}$    (36)

$\frac{\partial L}{\partial W_s^{N-1}} = \sum_k \big(y_s(k) - yd_s(k)\big)\,V^{N-1}(k)$    (37)

with c from 0 to N−2:

$\frac{\partial L}{\partial W_i^{mc}} = -\sum_k F_k\,V_i^{mc}(k)\,\psi_k^{c+1}$    (38)

$\frac{\partial L}{\partial W_s^{m(N-1)}} = \sum_k \big(y_s(k) - yd_s(k)\big)\,V^{m(N-1)}(k)$    (39)

with c from 0 to N−1:

$\frac{\partial L}{\partial W^{bc}} = \frac{\partial L}{\partial M^{c}} = -\sum_k \big(V^{mc}(k-1) - V^{c}(k-1)\big)\,\psi_k^{mc}$    (40)

with c from 1 to N−1:

$\frac{\partial L}{\partial B^{c}} = -\sum_k F_k\,\psi_k^{c}$    (41)

and

$\frac{\partial L}{\partial B_s^{N}} = \sum_k \big(y_s(k) - yd_s(k)\big)$    (42)
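A compact sketch of the top-layer part of these recurrences, (24)-(27), together with the corresponding gradients (37) and (39); the hidden- and memory-layer cases follow the same backwards-in-time pattern. The Python/NumPy formulation and all names are our own illustration:

    import numpy as np

    def top_layer_multipliers(err, Ws, Wsm, M):
        # err[k, s] = y_s(k) - yd_s(k); M = diag(Wb) of the layer below the output.
        # Returns psi^{N-1} and psi^{m(N-1)} per (24)-(27), computed from k = K down.
        K, n_hid = err.shape[0], Ws.shape[1]
        psi = np.zeros((K, n_hid))
        psi_m = np.zeros((K, n_hid))
        psi_m[K - 1] = -err[K - 1] @ Wsm                      # (25)
        psi[K - 1] = -err[K - 1] @ Ws                         # (27)
        for k in range(K - 2, -1, -1):
            psi_m[k] = -err[k] @ Wsm + M * psi_m[k + 1]       # (24)
            psi[k] = (1 - M) * psi_m[k + 1] - err[k] @ Ws     # (26)
        return psi, psi_m

    def top_layer_gradients(err, V, Vm):
        # (37) and (39): gradients w.r.t. Ws and Wsm, summed over time.
        return err.T @ V, err.T @ Vm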
The weights and the biases of the neural network are adjusted according to: W^{k+1} = W^k + ηΔW^k and B^{k+1} = B^k + ηΔB^k, where η is the learning rate. Various techniques of one-dimensional optimisation have been developed to adjust the learning rate during neural network training, such as the Wolfe-Powell rule, the dichotomy method and the Goldstein rule. The last rule is used in this chapter because it sacrifices accuracy in the line search routine in order to conserve overall computation time. The essential idea is that this rule should first guarantee that the selected η is not too large and next that it is not too small (Ramdane-Cherif et al., 1996). Let E(·) denote the total squared error function to be minimized, P the gradient vector, w the weight coefficient values and ξ1, ξ2 small predefined parameters (ξ1 ∈ [0, 1] and ξ2 ∈ [ξ1, 1]). This rule can be written as:

a) Choose initial values ηmin = 0 and ηmax = 10; compute

$g'(0) = \nabla E^{t}(w)P$    (43)

and choose an initial value of η.

b) Compute

$g(\eta) = E(w + \eta P)$    (44)

If

$g(\eta) \leq g(0) + \xi_1 \eta\, g'(0)$    (45)

go to (c); else set ηmax = η and go to (e).

c) Compare g(η) and g(0) + ξ2 η g'(0). If

$g(\eta) \geq g(0) + \xi_2 \eta\, g'(0)$    (46)–(47)

end; else set ηmin = η and go to (e).

e) Select a new value of η from [ηmin, ηmax] and return to (b).
Table 1. Test performance

Training set: 1000 pts    SSE: 0.05
Test set: npt = 500 pts   SSET: 0.08
Equation (45) ensures that η is not too large at the beginning of the training process, and (47) ensures that η is not too small at the end of the training process.
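A hedged sketch of this rule follows; the bisection used to pick the new η in step (e), and all names, are our own choices:

    import numpy as np

    def goldstein_lr(E, grad_E, w, P, xi1=0.1, xi2=0.9, eta0=1.0, max_iter=50):
        # Goldstein rule per (43)-(47): keep eta below the xi1 line (not too
        # large) and above the xi2 line (not too small). P is a descent direction.
        eta_min, eta_max = 0.0, 10.0
        g0 = E(w)
        gp0 = grad_E(w) @ P          # g'(0) of (43); negative for descent
        eta = eta0
        for _ in range(max_iter):
            g = E(w + eta * P)                      # (44)
            if g > g0 + xi1 * eta * gp0:            # (45) fails: eta too large
                eta_max = eta
            elif g < g0 + xi2 * eta * gp0:          # (47) fails: eta too small
                eta_min = eta
            else:
                return eta
            eta = 0.5 * (eta_min + eta_max)         # step (e)
        return eta

    E = lambda w: np.sum(w ** 2)
    grad_E = lambda w: 2 * w
    w = np.array([1.0, -2.0])
    print(goldstein_lr(E, grad_E, w, -grad_E(w)))   # prints an accepted eta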
Simulation

For our simulation, we used a three degrees of freedom planar robotic arm with links l1 = 2 m, l2 = 1 m and l3 = 0.5 m. The network is made up of one hidden layer, with three input layer neurons for the Cartesian coordinates (x1, x2, z), 15 neurons in the hidden layer and three neurons in the output layer for the joint positions (q1, q2, q3). This neural network is trained on a set of Cartesian coordinates (x1, x2, z) which span the first quadrant of the manipulator's workspace (1000 points). Substituting (x1, x2, z) in our algorithm (Ramdane-Cherif et al., 1996), we obtain the output set of examples (q1, q2, q3). The initial weights of the neural network are chosen randomly within the range −1 to +1. We have adopted the Sum Squared Error (SSE) criterion, as a function of the iteration number, to evaluate the training performance. We see that the MNN leads to a fast minimization of the criterion function. To verify the generalization properties of the new architecture, the weight matrix obtained at the end of the learning phase is used. The test set is chosen out of the range of the learning set interval (500 pts). Table 1 gives the test performance. By comparing the SSE and the sum squared error (SSET) of the test set, we conclude that the neural network has done a good job of learning the inverse geometrical model.
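As a hedged illustration of how such a database and the SSE criterion might be assembled (in the chapter the target joint angles come from the constrained inverse algorithm of (Ramdane-Cherif et al., 1996); here we simply sample q directly, and the z coordinate is omitted):

    import numpy as np

    def forward(q, l=np.array([2.0, 1.0, 0.5])):
        # Planar forward kinematics used to generate (x1, x2) from (q1, q2, q3).
        a = np.cumsum(q)
        return np.array([np.sum(l * np.cos(a)), np.sum(l * np.sin(a))])

    rng = np.random.default_rng(0)
    Q = rng.uniform(0.0, np.pi / 2, size=(1000, 3))      # joint-space samples
    X = np.array([forward(q) for q in Q])                # corresponding workspace points

    def sse(predict, X, Q):
        # Sum Squared Error criterion used to evaluate training performance.
        return np.sum((np.array([predict(x) for x in X]) - Q) ** 2)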
Experiment #1: Using the Neural Network to Follow a Straight-Line Trajectory While Minimizing the Performance Criterion φ(q)

This neural network is used in the position control scheme as a static map to follow the straight-line trajectory (x2 = −x1 + 2) while minimizing the performance criterion, φ(q), as defined in (5). The results obtained for the tracking errors and criterion evolution are shown in Figure 5a and Figure 5b respectively, where ex1 and ex2 are the errors in the Cartesian coordinates (the joint positions output by the NN are used in (6) and (7) to obtain the Cartesian coordinates, which are compared to the desired Cartesian coordinates). The simulation gives a smooth joint trajectory, which minimizes the objective function and follows the desired Cartesian trajectory. The tracking errors and criterion evolution are a little large at the beginning but very small for the remainder of the trajectory. In order to demonstrate the good behavior of this neural network controller we compared it to the analytical inverse Jacobian based controller (Resolved Motion Rate Control: RMRC), where Kv is the diagonal matrix of joint velocity loop gains and the command, τ, can be calculated as:

$\tau = K_v\left(J^{-1} K_p (X_d - X) - \dot{q}\right)$    (48)
If Xd(t) lies inside the work envelope and does not go through singularities, then a joint space trajectory qd(t) can be obtained by $\dot{q} = J^{-1}\dot{x}$ and the control torque is given by (48). The singularities associated with the Jacobian matrix are often too complicated to be determined analytically. In order to provide an innovative solution to this singularity problem, we propose using the previous neural network in the position control scheme. For this purpose, the robot end-effector is commanded to follow various straight-line trajectories (this robot has l1 = 2 m, l2 = 0.5 m and l3 = 0.5 m).
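For contrast with the neural controller, here is a minimal kinematic sketch of the classical scheme; the discretisation and the 2 DOF simplification (square Jacobian) are our own, not the chapter's exact controller, and the torque law (48) additionally wraps this in a velocity loop Kv:

    import numpy as np

    def rmrc_step(q, x_d, l, Kp, dt):
        # One Resolved Motion Rate Control step for a 2 DOF planar arm:
        # q_dot = J^{-1} Kp (x_d - x), followed by Euler integration.
        a = np.cumsum(q)
        x = np.array([l[0]*np.cos(a[0]) + l[1]*np.cos(a[1]),
                      l[0]*np.sin(a[0]) + l[1]*np.sin(a[1])])
        J = np.array([[-l[0]*np.sin(a[0]) - l[1]*np.sin(a[1]), -l[1]*np.sin(a[1])],
                      [ l[0]*np.cos(a[0]) + l[1]*np.cos(a[1]),  l[1]*np.cos(a[1])]])
        q_dot = np.linalg.solve(J, Kp @ (x_d - x))   # blows up where det(J) -> 0
        return q + dt * q_dot

    q = np.array([0.5, 2.5])                          # near the q2 = pi singularity
    print(rmrc_step(q, np.array([0.7, 0.9]), np.array([2.0, 0.5]), np.eye(2), 0.01))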
Figure 5. (a) The Cartesian trajectory tracking error ex1, ex2 (meter vs. trajectory points); (b) the criterion evolution z = N^T(∂φ(q)/∂q) (meter vs. trajectory points)
Figure 6. Results: (a) using the classical controller, (b) using the neural controller
1) Trajectory #1: The X1(t) trajectory starts from point A, which is in the neighborhood of a singularity (q2 = π ⇔ x1 = 0.71 m, x2 = 0.71 m). Figure 6a shows the robot end-effector motion using the RMRC controller (solid line). The dotted line represents the desired trajectory. We see that this kind of controller leaves the manipulator uncontrolled, since oscillations occur. Figure 6b shows the robot attempting to follow the same trajectory utilizing the neural network controller. This controller allows the entire trajectory to be achieved, with small deviations at the beginning, near the singular point. 2) Trajectory #2: The X2(t) trajectory starts from point C inside the manipulator workspace and leads to the workspace singular point D (q2 = 0 ⇔ x1 = 2.12 m, x2 = 2.12 m). Figure 7a and Figure 7b give the end-effector trajectory (solid line) using the inverse Jacobian based controller and the neural controller, respectively. The desired trajectory is represented by the dotted line. In the first, the robot end-effector behaves unexpectedly near the boundary singularity, while with the neural controller the desired path is approximately followed. 3) Trajectory #3: The X3(t) trajectory goes outside the manipulator workspace to the point E (x1 = 2.14 m, x2 = 2.14 m). The position response (Figure 8a) with the inverse Jacobian based controller shows very large deviations, because the end-effector is only permitted to move within the workspace. With the neural controller (Figure 8b), the arm attempts to stay close to the desired trajectory. Thus, these results demonstrate that the neural network controller can provide good trajectory tracking, even near singularities, compared to the classical RMRC scheme. The performance of this controller only depends on the neural network's accuracy in approximating the inverse extended geometrical function. This gives more interesting results than the methods presented in (Perdereau et al., 2002).
Figure 7. Results: (a) using the classical controller, (b) using the neural controller
Figure 8. Results: (a) using the classical controller, (b) using the neural controller
Experiment #2: Neural Network to Follow a Cyclic Trajectory while Minimizing the Performance Criterion φ(q) and Satisfying Inequality Constraints h(q)

The primary task of a robotic arm is to move its end-effector on a desired trajectory but, thanks to a redundant structure, the arm can simultaneously accomplish a secondary task. In the first experiment, the redundancy is carried out by minimizing the objective function φ(q) characterizing the additional task. In this second experiment, the objective function is subject to inequality constraints. Several constraints must be introduced into the objective function. Then, we deal with a constrained optimization problem. The algorithms usually proposed to solve constrained optimization problems can be divided into two categories: boundary-following methods and penalty-function methods. Among the first we find, for example, the feasible-directions and the gradient-projection methods. These techniques change directions from the gradient of the objective function to the gradient of the violated constraint whenever a move outside the feasible region occurs. The penalty-function methods lead to safer computational algorithms compared to gradient methods, and involve fewer computations, since they convert constrained minimization into the unconstrained optimization of an augmented objective function. In this simulation we decided to use the penalty-function method to solve the constrained optimization problem. The neural network has the same topology as in the first simulation. The learning algorithm is augmented to deal with the inequality constraints. However, we will derive new equations and use a new augmented objective function. We will compare the results for three different learning procedures:

• Using only the objective function
• Using the objective function and the exterior penalty method
• Using the objective function and the interior penalty method
The objective function φ(q) is augmented by the inequality or equality constraint vector h(q). In this case, the augmented objective function is

$\Omega(q) = \phi(q) + \sum_{i=1}^{l} \alpha_i\, p(h_i(q))$    (49)

where l is the number of constraints, α_i is the i-th penalty multiplier and p the penalty function. The constraints h(q) are:

$h(q) \leq 0$    (50)
Indeed, a penalty function p(h(q)) aims to describe how the task requirements are fulfilled; that is, this function tends to a null effect when the constraint h(q) is satisfied. There are several formulations of the penalty functions applied to the constraints in order to include them in our approach.

1) Exterior penalty functions: they account for the constraint only when it is violated (p(h) > 0 if h(q) ≥ 0 and p(h) = 0 if h(q) ≤ 0). One possible exterior penalty function is p(h_i) = h_i² Γ(h_i), where Γ(h_i) is the Heaviside function. The solution is outside the feasible region and tends towards the limit 0 (h(q) ≤ 0) as the α_i penalty parameters increase. In this case we have:

$\Omega(q) = \phi(q) + \sum_{i=1}^{l} \alpha_i\, h_i^2\, \Gamma(h_i(q))$    (51)

and

$\frac{\partial \Omega(q)}{\partial q} = \frac{\partial \phi(q)}{\partial q} + \sum_{i=1}^{l} 2\alpha_i\, h_i(q)\, \frac{\partial h_i(q)}{\partial q}\, \Gamma(h_i(q))$    (52)

2) Interior penalty functions: they avoid impacts and act at all times in a repulsive field manner (p(h) > 0 if h(q) < 0 and p(h) → ∞ if h(q) → 0). In general this function is expressed as p(h_i) = −1/h_i. The solution is always inside the feasible region and tends towards the limit 0 (h(q) ≤ 0) as α_i decreases. In this case we obtain:

$\Omega(q) = \phi(q) + \sum_{i=1}^{l} \alpha_i \frac{-1}{h_i(q)}$    (53)

and

$\frac{\partial \Omega(q)}{\partial q} = \frac{\partial \phi(q)}{\partial q} + \sum_{i=1}^{l} \alpha_i \frac{1}{(h_i(q))^2} \frac{\partial h_i(q)}{\partial q}$    (54)
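A small numerical sketch of (49)-(54), applied to the joint-limit constraint used below; the Python formulation and the sample value of q are ours:

    import numpy as np

    def omega_exterior(phi, h, alpha, q):
        # Equation (51): exterior penalty, active only when h_i(q) > 0.
        hv = h(q)
        return phi(q) + np.sum(alpha * hv**2 * (hv > 0))

    def omega_interior(phi, h, alpha, q):
        # Equation (53): interior (barrier) penalty, requires h_i(q) < 0.
        hv = h(q)
        return phi(q) + np.sum(alpha * (-1.0 / hv))

    phi = lambda q: 0.5 * np.sum(q ** 2)                   # objective of (61)
    h = lambda q: np.array([-0.4 - q[2], q[2] - 0.4])      # joint limits of (62)
    alpha_ext = np.array([1000.0, 1000.0])                 # multipliers from the chapter
    alpha_int = np.array([0.01, 0.01])
    q = np.array([0.3, -0.2, 0.35])
    print(omega_exterior(phi, h, alpha_ext, q), omega_interior(phi, h, alpha_int, q))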
The appropriate joint configuration vector q_d is now the inverse solution of the redundant robot when minimizing the augmented objective function Ω(q) instead of φ(q). The augmented geometric model is then

$X = F(q) = \begin{pmatrix} f(q) \\ N^{T} \dfrac{\partial \Omega(q)}{\partial q} \end{pmatrix}$    (55)
For this simulation, we propose the kinematic learning scheme shown in Figure 9. (for a given desired Cartesian trajectory, the NN can compute the corresponding joint trajectory. Then, we use (55) to convert these joint positions to the Cartesian positions and calculated the errors. This error is used in the updating weights algorithm). Our objective is the minimization of the performance function subject to joint limitations.
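In outline, one pass of this scheme might look as follows (a hedged sketch: net is any parameterised map standing in for the MMNN, W is its flat parameter vector, Xd is the extended target of (55), and finite differences replace the analytical double back-propagation):

    import numpy as np

    def train_epoch(net, W, F, Xd_traj, lr=1e-3, eps=1e-6):
        # One pass of the Figure 9 scheme: q = net(W, Xd), e = Xd - F(q),
        # and W is moved to reduce the squared error.
        for Xd in Xd_traj:
            err = lambda w: np.sum((Xd - F(net(w, Xd))) ** 2)
            e0 = err(W)
            grad = np.array([(err(W + eps * d) - e0) / eps for d in np.eye(W.size)])
            W = W - lr * grad
        return W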
In practice, the relative movement of a robot link is geometrically limited within its minimal and maximal bounds:

$q_{i\min} < q_i < q_{i\max}$    (56)

thus, we have 2n scalar inequality constraints, written as:

$h(q) \leq 0 \;\Leftrightarrow\; \begin{cases} h_i(q) = q_{i\min} - q_i \leq 0 \\ h_{i+n}(q) = q_i - q_{i\max} \leq 0 \end{cases} \quad i = 1, 2, \ldots, n$    (57)
A 3 DOF planar robot is used to demonstrate the validity of our method (Figure 10). The desired path used in our simulation is given by:

$x_d = \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = \begin{pmatrix} 3\cos(2\pi t) \\ \sin(2\pi t) \end{pmatrix}$    (58)
The forward kinematic function is:

$f(q) = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} l_1 c_1 + l_2 c_{12} + l_3 c_{123} \\ l_1 s_1 + l_2 s_{12} + l_3 s_{123} \end{pmatrix}$    (59)
and the (n−m) null space vector of J is:

$N^{t} = \begin{pmatrix} -l_2 l_3 s_3, & l_1 l_3 s_{23} + l_2 l_3 s_3, & -l_1 l_2 s_2 - l_1 l_3 s_{23} \end{pmatrix}$    (60)
where s_i = sin(q_i), s_{ij} = sin(q_i + q_j), s_{ijk} = sin(q_i + q_j + q_k), c_i = cos(q_i), c_{ij} = cos(q_i + q_j), c_{ijk} = cos(q_i + q_j + q_k), and the link lengths are l1 = 1.5 m, l2 = 1 m and l3 = 0.5 m. For this experiment we chose the following objective function:

$\phi(q) = \frac{1}{2}\sum_{i=1}^{3} q_i^2$    (61)

Figure 9. The proposed learning scheme (for a desired trajectory Xd(k), the NN outputs q(k); F(q) produces X(k); the error e(k) = Xd(k) − X(k) drives the weight updates)
Figure 10. The redundant robot (joint limitation constraints on the third joint position)
The solution must satisfy: i) the desired trajectory x_d is tracked by the manipulator, ii) the objective function φ(q) is minimized, and iii) the joint q3 is maintained within its minimal and maximal bounds:

$h(q) \leq 0 \;\Leftrightarrow\; q_{3\min} \leq q_3 \leq q_{3\max}$    (62)
In this simulation, q3min = −0.4 rad and q3max = 0.4 rad. According to (3) the constraint equation becomes:

$g(q_k) = N^{t} \begin{pmatrix} q_1 \\ q_2 \\ q_{3\mathrm{mod}} \end{pmatrix} = 0$    (63)
where q3mod is given by:

$q_{3\mathrm{mod}} = q_3 + 2\alpha_1 (q_3 - q_{3\max})\Gamma(q_3 - q_{3\max}) + 2\alpha_2 (q_{3\min} - q_3)\Gamma(q_{3\min} - q_3)$    (64)
when using the exterior penalty method, and

$q_{3\mathrm{mod}} = q_3 + \alpha_1 \frac{1}{(q_3 - q_{3\max})^2} + \alpha_2 \frac{1}{(q_{3\min} - q_3)^2}$    (65)
when the interior penalty method is utilized. In order to avoid a large number of iterations in the training procedure, the penalty multipliers, αi, are taken as constant values determined by an experimental procedure before the NN training. After several trials, the best values of the penalty multipliers are found to be respectively α1 = 1000, α2 = 1000 for the exterior penalty and α1 = 0.01, α2 = 0.01 for the interior penalty. Three training procedures are performed using a few randomly selected points along the desired trajectory. The NN is then tested on all the trajectories.
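A direct transcription of (64) and (65), with the Python formulation ours:

    import numpy as np

    def q3_mod(q3, a1, a2, q3min=-0.4, q3max=0.4, exterior=True):
        # Equations (64) and (65): penalty-modified third joint coordinate.
        if exterior:
            return (q3 + 2*a1*(q3 - q3max)*(q3 > q3max)
                       + 2*a2*(q3min - q3)*(q3 < q3min))
        return q3 + a1 / (q3 - q3max)**2 + a2 / (q3min - q3)**2

    print(q3_mod(0.5, 1000.0, 1000.0))                 # exterior penalty active
    print(q3_mod(0.3, 0.01, 0.01, exterior=False))     # interior penalty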
The first learning procedure uses the objective function φ(q) without the constraints h(q); the second uses φ(q) with constraints, applying the exterior penalty method; the last uses φ(q) with constraints, applying the interior penalty method. Figure 11a shows the arm configurations using the objective function φ(q) without inequality constraints. By applying inequality constraints, appropriate motions of q3 are now obtained, as shown by Figure 11b and Figure 11c, which represent the arm configuration for the exterior and interior penalty methods, respectively. Figure 11d compares the evolution of q3 in three cases: without constraints (the third joint angle goes out of its allowed bounds, that is to say, in part of the motion q3 is beyond its maximum limit), with the exterior penalty function, and with the interior penalty function. In Figure 11d, the solid line is the no-constraints case, the "X" line is the exterior penalty case and the "O" line is the interior penalty case. It is clear that q3 is maintained within its limits when employing either penalty method. With the exterior penalty method, the position is bounded when the constraints are violated, while with the interior penalty method the position trajectory is smoother. We point out that the tracking error remains very small and the additional function g(q) of (63) is near the minimum for the entire trajectory (g(q) ≈ 0). Next we used the same penalty multipliers, the same inequality constraint, and the same robot as in the previous simulation for another test on a different desired trajectory. The new trajectory is a square with the summits ((3,0); (0,1); (−3,0); (0,−1)). In this simulation, the constraints are satisfied in the same way as the previous one. Figure 12a and Figure 12b show the arm configurations and the joint positions along the trajectory using φ(q) without constraints. Figure 12c gives the evolution of the joint position q3 using φ(q) with constraints. In the case of the minimization of φ(q) without inequality constraints, the joint position q3 goes out of its allowed bounds (0.4 rad). By applying inequality constraints (exterior ('+') or interior ('o')), the joint position q3 remains within its limits.

Figure 11. Arm configuration along a part of the trajectory, (a) using φ(q) without constraints, (b) using φ(q) with constraints (exterior penalty), (c) using φ(q) with constraints (interior penalty), (d) joint position q3 (rad) versus trajectory points
Figure 12. (a) Arm configuration for the second trajectory, (b) joint positions using the φ(q) without constraints (y label represents the joint position in rad and the x label represents the trajectory points), (c) joint position q3 (rad) versus trajectory points (y label represents the joint positions in radian and the x label represents the trajectory points)
In this simulation, a penalty approach which deals with a constrained optimization to solve the inverse kinematics of redundant robot manipulators is looked at. An optimization procedure using a neural network is formulated, and it produces on-line position trajectories in joint space from position and orientation trajectories in Cartesian space. In the case of constrained optimization, a penalty-function approach based on an MNN is used. Compared to the conventional methods, the proposed neural network offers substantially better accuracy and guarantees good minimization of a performance function subject to joint limitations while achieving the end-effector task.
Conclusion

In this chapter, an artificial neural network is investigated for the purpose of trajectory control problems in robotics. An MMNN network is applied to learn the inverse geometrical model. Then the MMNN is used in the position based controller for a redundant robot which is restricted by various constraints. Good results have been obtained: the tracking error is very small and the criterion is minimized effectively. In order to demonstrate the good behavior of this neural network controller, we also compared it to the analytical inverse Jacobian based controller. The results demonstrate that, when compared to the classical RMRC scheme, the neural network controller ensures good trajectory tracking results even near singularities. The performance of this controller only depends on the accuracy of the neural network in its ability to approximate the inverse extended geometrical function. The simulation results discussed in the last section exhibit good tracking of the end-effector trajectory and show that additional constraints can be satisfied. The basic idea in this case is a constrained optimization using a penalty-function approach based on the MMNN. This neural network can be used in other command structures and transformations. In this chapter we have shown that a cognitive approach through the neural network paradigm allows the resolution of some complex problems which cannot currently be solved with conventional methods. The MMNN is capable of adapting to varying conditions, which is an important feature of autonomic computing systems.
References

Baillieul, J. (1985). Kinematic programming alternatives for redundant manipulators. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 722-728). St. Louis, MO.

Bestaoui, Y. (1991). An unconstrained optimization approach to the resolution of the inverse kinematic problem of redundant and non-redundant robot manipulators. Int. J. Robotics, Autonomous Systems, 7, 37-45.

Chatry, N., Perdereau, V., Drouin, M., Milgram, M., & Riat, J. C. (1996, May). A new design method for dynamical feedback networks. In Proc. Int. Sym. Soft Computing for Industry, Montpellier, France.

Chiew, V., & Wang, Y. (2003, August). A multi-disciplinary perspective on cognitive informatics. In The 2nd IEEE International Conference on Cognitive Informatics (ICCI'03) (pp. 114-120). London, UK: IEEE CS Press.

Chen, Y.-C., & Walker, I. D. (1993). A consistent null-space approach to inverse kinematics of redundant robots. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 374-381). Atlanta, USA.

Chevallereau, C., & Khalil, W. (1988). A new method for the solution of the inverse kinematics of redundant robots. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 37-42). Philadelphia, USA.

Dubey, R. V., Euler, J. A., & Babcock, S. M. (1991). Real-time implementation of an optimization scheme for seven-degree-of-freedom redundant manipulators. IEEE Tran. Sys. Robotics and Automation, 7(5), 579-588.

Fang, G., & Dissanayake, M.W.M.G. (1993, July). A neural network-based algorithm for robot trajectory planning. In Proceedings of the International Conference of Robots for Competitive Industries (pp. 521-530). Brisbane, Qld, Australia.

Fang, G., & Dissanayake, M.W.M.G. (1998). Experiments on a neural network-based method for time-optimal trajectory planning. Robotica, 16, 143-158.

Featherstone, R. (1994). Accurate trajectory transformations for redundant and non-redundant robots. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 1867-1872). San Diego, USA.

Jung, S., & Hsia, T.C. (2000). Neural network inverse control techniques for PD controlled robot manipulator. Robotica, 18, 305-314.

Hunt, K. J., et al. (1992, November). Neural networks for control systems. Automatica, 28(2), 1083-1112.

Hunt, K. H. (1987, March). Robot kinematics - A compact analytic inverse solution for velocities. ASME J. Mechanisms, Transmissions and Automat. Design, 109, 42-49.

Guez, A., & Ahmad, Z. (1989, June). Accelerated convergence in the inverse kinematics via multilayer feedforward networks. In Proc. IEEE Int. Conf. Neural Networks (pp. 341-344). Washington, USA.

Kawato, M., Maeda, Y., Uno, Y., & Suzuki, R. (1990). Trajectory formation of arm movement by cascade neural network model based on minimum torque-change criterion. Biological Cybernetics, 62, 275-288.

Kieffer, S., Morellas, V., & Donath, M. (1991, April). Neural network learning of the inverse kinematic relationships for a robot arm. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 2418-2425). Sacramento, CA.

Klein, C., Chu-Jenq, C., & Ahmed, S. (1993). Use of an extended Jacobian method to map algorithmic singularities. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 632-637). Atlanta, USA.

Klein, C. A., & Huang, C. H. (1983, April). Review of pseudoinverse control for use with kinematically redundant manipulators. IEEE Tran. Sys. Man and Cyb., SMC-13(3), 245-250.

Kim, S. W., Park, K. B., & Lee, J. J. (1994). Redundancy resolution of robot manipulators using optimal kinematic control. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 683-688). San Diego, USA.

Kircanski, M., & Petrovic, T. (1993). Combined analytical-pseudoinverse inverse kinematic solution for simple redundant manipulators and singularity avoidance. Int. J. Robotics Research, 12(1), 188-196.

Liegeois, A. (1986, December). Automatic supervisory control of the configuration and behavior of multi-body mechanisms. IEEE Tran. Sys. Man and Cyb., SMC-7(3), 868-871.

Nakamura, Y., & Hanafusa, H. (1987). Optimal redundancy control of robot manipulators. Int. J. Robotics Research, 6(1), 32-42.

Perdereau, V., Passi, C., & Drouin, M. (2002). Real-time control of redundant robotic manipulators for mobile obstacle avoidance. Int. J. Robotics, Autonomous Systems, 41, 41-59.

Ramdane-Cherif, A., Perdereau, V., & Drouin, M. (1995, November-December). Optimization schemes for learning the forward and inverse kinematic equations with neural network. In Proc. IEEE Int. Conf. Neural Networks, Perth, Australia.

Ramdane-Cherif, A., Perdereau, V., & Drouin, M. (1996, April). Penalty approach for a constrained optimization to solve on-line the inverse kinematic problem of redundant manipulators. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 133-138). Minneapolis, USA.

Sastry, P. S., Santharam, G., & Unnikrishnan, K. P. (1994, March). Memory neuron networks for identification and control of dynamical systems. IEEE Tran. Neural Networks, 5(2), 306-319.

Wang, Y., & Kinsner, W. (2005). Recent advances in cognitive informatics. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 121-123.

Wang, Y., & Wang, Y. (2002, August). Cognitive models of the brain. In Proceedings of the First IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 259-269). Calgary, AB, Canada: IEEE CS Press.

Wang, Y., & Liu, D. (2003, August). On information and knowledge representation in the brain. In The 2nd IEEE International Conference on Cognitive Informatics (ICCI'03) (pp. 26-31). London, UK: IEEE CS Press.

Wang, Y. (2005, August). On cognitive properties of human factors in engineering. In Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 174-182). Irvine, CA: IEEE CS Press.

Wang, Y. (2003). Cognitive informatics models of software agents and autonomic computing. Keynote speech at The First International Conference on Agent-Based Technologies and Systems (ATS'03) (pp. 25-26). Canada: University of Calgary Press.
Endnote

1. Portions of this chapter were presented at the IEEE International Conference on Cognitive Informatics, British Columbia, Canada, August 2004.
Chapter XV
Cognitive Modelling Applied to Aspects of Schizophrenia and Autonomic Computing Lee Flax Macquarie University, Australia
Abstract

We give an approach to cognitive modelling, which allows for richer expression than the one based simply on the firing of sets of neurons. The object language of the approach is first-order logic augmented by operations of an algebra, PSEN. Some operations useful for this kind of modelling are postulated: combination, comparison, and inhibition of sets of sentences. Inhibition is realised using an algebraic version of AGM belief contraction (Gärdenfors, 1988). It is shown how these operations can be realised using PSEN. Algebraic modelling using PSEN is used to give an account of an explanation of some signs and symptoms of schizophrenia due to Frith (1992) as well as a proposal for the cognitive basis of autonomic computing. A brief discussion of the computability of the operations of PSEN is also given.
Introduction

Here we give an account of an algebraic method of modelling cognitive processes. What is unusual about our approach is that the material of the model consists of sets of logical sentences and entailments between them. These components are used to build a preboolean algebra, PSEN, whose elements are sets of sentences. Our approach correlates a set of sentences to a brain component, and a neural path from one component to another induces an entailment between their correlated sets of sentences. A set of sentences correlated to a brain component is meant to model its cognitive state. This approach to brain functioning sees its behaviour as embodied in the firing of sets of neurons inducing other neurons to fire along neural pathways. This view can be modelled logically in a simple way in a language with one relation symbol, which is interpreted as the firing of a set of neurons. But it is not easy to see what the firing of a particular set of neurons might be connected to as a concept that a person, acting as a modeller, might readily comprehend. Accordingly, in our approach the complete syntax of first-order logic is used. Even though this is an abstraction from the simple language hinted at previously, it allows for richer expression in the object language for the modeller. So at each cognitive component there will be sets of sentences expressed in first-order logic that correspond to some cognitive situation. To be useful, these sets of sentences need to be operated on and combined in different ways. The algebra, PSEN, is used to perform the operations. We show how PSEN has sufficient power to model a cognitive neuropsychological model of some signs and
symptoms of schizophrenia due to Frith (1992). The algebraic model uses Boolean operations as well as an algebraic formulation of the contraction operation of belief revision (Gärdenfors, 1988, p. 9). An examination of this modelling leads to the proposal of a set of operations useful for cognitive modelling. These are then applied to show how some aspects of autonomic computing can be treated as a cognitive process. Finally, with a view toward computer simulation of cognitive modelling, computability of the operations of PSEN is briefly discussed. The second section develops a simple logical model, which we have called neural logic, to model the view of cognitive functioning based on the firing of groups of neurons. Next, in the third section some examples of pathology exhibited in schizophrenia are used to illustrate the working of a component model of cognitive functioning. Some operations are abstracted from this analysis and their application to autonomic computing is described in the fourth section. The fifth section briefly surveys the syntax and semantics of first-order logic, which is proposed as the "language of thought" and also used as the basis for the algebra PSEN. The sixth section describes PSEN and its use in modelling the examples from schizophrenia of the third section and the autonomic computing of the fourth section. The computability of the operations of PSEN is discussed in the last section.
Cognitive Components and Neural Logic

We conceive of cognitive functioning as being based upon a system of cognitive components, some of which are connected by neural pathways. This is an abstraction from the actual structure and function of the brain. Cognitive components are an abstraction of physical regions in the brain. Abstract pathways are taken to be channels which convey information, and each pathway corresponds to a bundle of neuronal chains linking one brain region to another. We assume that there is a resultant electrical flow of current in one direction only in a neuronal bundle. The correspondence of this abstraction to reality may not be perfect, but we take it as the basis for our cognitive modelling. Figure 1, based on Frith (1992), is an example of this kind of abstraction. It shows seven cognitive components. Some of them are connected by pathways, which are shown as arrows. For example, there
Figure 1. Action with monitoring
is a pathway connecting the components action and monitor. The direction of the arrow indicates that activity in the action component induces activity in the monitor component. What kind of information do the abstract pathways convey and what activity do they induce in cognitive components? To answer this we again abstract from what actually happens in the brain. First, we give a simplified description of brain activity and then describe our abstraction of it. Suppose there is a neural pathway from brain region A to B. We say that A is the source of the pathway and B is the target. The electrical nature of the neural pathway means that some neurons firing in A cause some neurons to fire in B. Say the firing of neurons a1,...,am in A causes neurons b1,...,bn to fire in B. We can describe the firing of those neurons in B by using a simple language. If the set of neurons {b1,...,bn} is firing we write fires({b1,...,bn}). This can be written succinctly as fires(p), where p = {b1,...,bn}. What we have done previously is to abstract from the physical behaviour of brain region B and use a simple language to describe its behaviour. We think of the language as something that exists at the level of the abstract cognitive component B. The components of the language are the constant symbols such as p (names of sets of B's neurons) and a one-place relation symbol, fires. The logical expression fires(p) is a sentence (because it has no free variables). We can now ask when a sentence in the language is true. We do this by interpreting the sentence in a structure and then checking whether the structure satisfies the sentence. If the structure does satisfy the sentence, we say the sentence is true in that structure. Here is an example of a structure, S, that we can use in our case. S has the following component parts:

1. A set, denoted dom(S), which is the domain of interpretation of S. In our case, we take dom(S) to be the family of all possible sets of neurons. So {a1,...,am} and {b1,...,bn} are both elements of dom(S).
2. For every name c of a set of neurons, S maps c to a member of dom(S), denoted S(c). In our case, we map c to the set of neurons it names. So, for example, S(p) = {b1,...,bn}.
3. S maps the one-place relation symbol, fires, to a subset of dom(S) denoted S(fires). In our case we expect S(fires) to reflect reality, so we set out to define it to do this.
A set of neurons, q, is an element of S(fires) if and only if q belongs to a brain region which is the target of some pathway and q fires as a result of the firing of some set of neurons in the source of the pathway. In our case {b1,...,bn} fires as a result of the firing of {a1,...,am} in the source, A, of the pathway from A to B. So {b1,...,bn} is in S(fires). This completes the description of the component parts of S. We can now say what it means for the structure S to satisfy the sentence fires(p). The idea is that the structure S satisfies the sentence fires(p) if and only if the translation of the sentence into objects related to the domain of S behaves, in that domain, as the sentence says it should. The translation of the name p is S(p) = {b1,...,bn} according to item 2 above. The translation of fires was defined to be a certain set in item 3 above, and it has {b1,...,bn} as an element. This is exactly what is needed for S to satisfy the sentence fires(p). This can be stated more precisely as follows. The structure S satisfies fires(p), denoted S ⊨ fires(p), if and only if S(p) ∈ S(fires). So we see from the discussion in the previous paragraph that, indeed, S ⊨ fires(p) and so fires(p) is satisfied by S. In contrast to this, suppose that r = {bn+1, bn+2} is not made to fire by any subset of neurons in any brain region; then according to the definition in item 3 above {bn+1, bn+2} ∉ S(fires). So S does not satisfy fires(r). If S does not satisfy fires(r), we write S ⊭ fires(r). The language can be expanded to include the negation sign, ¬, and the satisfaction rule for negation is specified as follows: S ⊨ ¬fires(r) if and only if S ⊭ fires(r). To summarise, we have defined a simple logical language based on the working of brain components and the paths between them. We shall call this language neural logic. Neural logic has names for sets of neurons, a one-place relation, fires, and the negation sign, ¬. We have also defined the notion of a structure, which interprets the language into a domain, and what it means for a structure to satisfy a sentence of the language. However, using neural logic to do cognitive modelling presents a difficulty. It is hard to see how to link neuron firing patterns to things we as human cognitive modellers may wish to express. This is a problem that must be solved, but it may take some time to do this completely. So we take another step in abstraction, which is to enrich neural logic to first-order logic, which is more expressive than neural logic, and propose using this as "the
language of thought." The syntax and semantics of first-order logic are described in the fifth section. The following two sections use the architecture shown in Figure 1 to discuss an approach to cognitive modelling using cognitive components and information flows between them. The third section gives Frith's explanation for some signs and symptoms of schizophrenia, while the fourth section takes some steps toward addressing the area of autonomic computing as if it were a cognitive process.
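To make the neural logic of this section concrete, here is a minimal sketch in Python; the particular neuron sets, names, and firing relation are hypothetical choices of ours, not data from the chapter.

```python
# A minimal sketch of neural logic satisfaction (hypothetical data).
# A structure S has: a domain of neuron sets, a map from constant names
# to domain elements, and an interpretation of the relation symbol "fires".
from typing import Dict, FrozenSet, Set

Neurons = FrozenSet[str]

dom_S: Set[Neurons] = {frozenset({"b1", "b2"}), frozenset({"b3", "b4"})}
S_const: Dict[str, Neurons] = {"p": frozenset({"b1", "b2"}),
                               "r": frozenset({"b3", "b4"})}
# S(fires): the neuron sets that fire as a result of activity in a source region.
S_fires: Set[Neurons] = {frozenset({"b1", "b2"})}

def satisfies_fires(name: str) -> bool:
    """S satisfies fires(name) iff S(name) is an element of S(fires)."""
    return S_const[name] in S_fires

def satisfies_not_fires(name: str) -> bool:
    """S satisfies the negation iff S does not satisfy fires(name)."""
    return not satisfies_fires(name)

print(satisfies_fires("p"))        # True:  S satisfies fires(p)
print(satisfies_not_fires("r"))    # True:  S satisfies ~fires(r)
```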
Examples of Pathology in Cognitive Function

Frith (1992) presents a theory which explains several signs and symptoms of schizophrenia in terms of a cognitive model relating action to goals and planning. Our intention here is not to argue for or against the correctness of Frith's account, but rather to describe it and to show how it can be modelled algebraically in the sixth section.
Fith’s Account of Signs and Symptoms Schizophrenia is a psychiatric illness. Patients suffering from schizophrenia report positive symptoms such as the following (a symptom is something a patient describes, it cannot be observed): thought insertion and delusions of control. Also there are abnormalities in behaviour that can be observed, these are called signs. Signs can be positive or negative. An example of a negative sign is poverty of speech. A positive sign is incoherence of speech. Explanations of these terms, taken from Frith (1992), are given below. Positive symptoms: Thought insertion--patients experience thoughts coming into their mind from an outside source. Delusions of control—patients experience their actions being con trolled by an outside force. Negative sign: Poverty of speech—answers are restricted to the minimum number of words necessary. Positive sign: Incoherence of speech—grammar is distorted, there are unexpected shifts of topic, there is a lack of logical connection between sentences. Frith explains how these signs and symptoms can occur using a model that correlates corresponding to the following cognitive components: perception, response, stimulus intention, action, willed intention, goals/plans, and a cognitive monitor. A schematic diagram based on Frith (1992) is shown in Figure 2. It gives the pathways between the components and is the same as Figure 1 also shows three possible diconnections (black ellipses) located on three different pathways. The pathway from goals/plans to stimulus intention is inhibitory; that is goals that activate this pathway will have an inhibiting effect on corresponding stimulus intentions. If a disconnection is present, then no signal or information can flow along the pathway connecting its two end components. Frith argues that it is the presence of these disconnections that can cause symptoms and signs of schizophrenia. Here are his explanations for three examples (Frith, 1992).
Example 1.
Poverty of speech is induced by a failure of willed intention. Willed intention involves the path (see Figure 2): goals/plans → willed intention → action → response. If disconnection 1 is present, then goals fail to generate intentions and this explains poverty of speech in the context of speech action, because according to Frith (1992):
Figure 2. Action with monitoring and disconnections
The lack of behaviour in patients with negative features seems to occur specifically in situations in which actions have to be self generated.

If willed intention is absent, then there is no self-generation of actions.
Example 2. Incoherence of speech is induced by a failure to inhibit stimulus-driven action. Stimulus-driven action involves the path: perception → stimulus intention → action → response. If disconnection 2 is present, then unwanted or inappropriate speech resulting from perception fails to be inhibited at stimulus intention.

Example 3. Delusion of control is induced by "information about willed intentions (failing) to reach the monitor" (Frith, 1992). This will happen if disconnection 3 is present. Frith explains how this arises as follows (Frith, 1992).
Thinking, like all our actions, is normally accompanied by a sense of effort and deliberate choice as we move from one thought to the next. If we found ourselves thinking without any awareness of the sense of effort that reflects central monitoring, we might well experience these thoughts as alien and, thus, being inserted into our minds. Similarly, actions would appear to be determined by external forces if there was no awareness of the intention to act.

The abstract modelling of the cognitive component architecture used here regards the various behaviours described in the examples as inducing sets of sentences in cognitive components. These sets of sentences are taken to be elements in an algebra called PSEN. In the sixth section, it will be shown how an algebraic account can be given of the behaviours described here. The next section proposes that the same component architecture as described in Figure 1 can be used for the "cognitive component" of an autonomic computing system.
A Cognitive Architecture for Autonomic Computing

According to Anonymous (2005) and Wang (2007), autonomic computing is an initiative started by IBM in 2001. Its ultimate aim is to create self-managing computer systems to overcome their rapidly growing complexity and to enable their further growth.
IBM (2005) says that autonomic computing is an approach to self-managed computing systems with a minimum of human interference. The term derives from the body's autonomic nervous system, which controls key functions without conscious awareness or involvement. IBM also says (IBM, 2005) that the following eight elements are characteristic of an autonomic computing system:

1. To be autonomic, a computing system needs to "know itself" and comprise components that also possess a system identity.
2. An autonomic computing system must configure and reconfigure itself under varying and unpredictable conditions.
3. An autonomic computing system never settles for the status quo--it always looks for ways to optimize its workings.
4. An autonomic computing system must perform something akin to healing--it must be able to recover from routine and extraordinary events that might cause some of its parts to malfunction.
5. A virtual world is no less dangerous than a physical one, so an autonomic computing system must be an expert in self-protection.
6. An autonomic computing system knows its environment and the context surrounding its activity, and acts accordingly.
7. An autonomic computing system cannot exist in a hermetic environment.
8. Perhaps most critical for the user, an autonomic computing system will anticipate the optimized resources needed while keeping its complexity hidden.
Implementation of systems exhibiting the previous eight characteristics is, no doubt, a far-reaching and complex endeavour and will take significant time and effort to see through. However, it seems that first steps can be taken toward modelling self-knowledge and knowledge of the environment in which a system finds itself. These capabilities are important aspects of items 1 and 6 above. We take as our basic model that depicted in Figure 1, based on Frith (1992). This model involves self-knowledge as well as the incorporation of stimuli from the environment. We postulate that self-knowledge derives from the operation of the willed intention and monitor components of the system. We assume that each cognitive component incorporates data (in the form of first-order sentences) relating to each part of the autonomic computing system. An examination of the examples described in the third section on some signs and symptoms of schizophrenia leads us to propose the following operations as being of use in cognitive component modelling. Suppose there are two pathways, b and c, ending in cognitive component B. Then there are two sets of sentences at B, X and Y say, induced by entailment along b and c respectively. The basic idea behind our approach to cognitive modelling is to find useful ways to combine and compare sets such as X and Y. The following examples refer to Figure 1.
Combining Sets of Sentences

Suppose WI and SI are sets of sentences that are associated with the cognitive component action by entailment from the cognitive components willed intention and stimulus intention respectively. The two sets may need to be combined into a set associated with cognitive component action. An example would be where the sets WI and SI do not interact (are independent) and action performs on the basis of the combined set.
Comparing Sets of Sentences

Suppose WI and A are sets of sentences that are associated with the cognitive component monitor by entailment from the cognitive components willed intention and action respectively. The two sets may need to be compared for equivalence and the result acted on by monitor. An example of this would be where, in a mission-critical application, monitor needs to check that the potential performance of action coincides with what has been induced at willed intention on the basis of the functioning of goals/plans.
Inhibition of a Set of Sentences

Figure 1 shows an inhibitory path from goals/plans to stimulus intention. Suppose perception induces a set of sentences SI at stimulus intention, and suppose that goals/plans induces the set of sentences GP by entailment (without inhibition) at stimulus intention. What is needed is that all sentences that conflict with GP, or entail conflict with GP, be removed from SI. An example would be where the autonomic system becomes aware of the details of a malicious intrusion and this knowledge is suitably incorporated into goals/plans, so that GP is designed to counter this intrusion. Then, under this inhibitory set-up, an attempt by SI to comply with the intruder's demands would be neutralised.
Temporal Behaviour

Modal temporal operators need to be included in the language so as to allow for dynamic behaviour.
Keyed Memory

The capacity is needed to form a set of sentences whose members are retrieved from several different cognitive components based on a key. The modal "time signatures" of the sentences may relate to different times in the past. In the sixth section we show how each of the operations previously listed, except temporal behaviour and keyed memory, which are items for future work, can be modelled in PSEN. The next section is a terse summary of the elements of first-order logic.
First-Order Logic

At the end of the second section, we proposed using first-order logic as the language of thought, and so we briefly describe its syntax and semantics in this section. Our treatment is standard (Ebbinghaus, Flum, & Thomas, 1984). Some of the terminology introduced in the second section is defined again here more precisely. The vocabulary consists of a countable number of individual variables, a countable number of constant symbols, a countable number of function symbols of any finite arity, and a countable number of relation symbols of any finite arity. A first-order language consists of formulas, and formulas are built from terms. Terms are built recursively from individual variables, constants, and function symbols as follows.

• An individual variable is a term.
• A constant is a term.
• If f is a function symbol of arity n, and t1,...,tn are terms, then f(t1,...,tn) is a term.
Formulas are built recursively from terms, relation symbols, connectives and quantifiers as follows. If p is a relation symbol of arity n and t1,...,tn are terms, then p(t1,...,tn) is a formula. Formulas of this kind are called atomic formulas.

• If ϕ and ψ are formulas, then so are ¬ϕ, ϕ∨ψ, ϕ∧ψ and ϕ→ψ.
• If ϕ is a formula and x an individual variable, then (∀x)ϕ is a formula.
• If ϕ is a formula and x an individual variable, then (∃x)ϕ is a formula.
If ϕ is a formula and x is an individual variable of ϕ, then x is free in ϕ if x does not fall within the scope of a quantifier appearing in ϕ. If x does fall within the scope of a quantifier of ϕ, then it is bound. A sentence is a formula with no free variables.
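As an illustration of this grammar, the following sketch encodes terms and formulas as nested tuples and computes free variables to decide sentencehood; the tuple encoding is our own convention, not part of the chapter.

```python
# A sketch of one possible encoding of first-order syntax as nested tuples.
# Terms:    ('var', x) | ('const', c) | ('fun', f, t1, ..., tn)
# Formulas: ('rel', p, t1, ..., tn) | ('not', phi) | ('or', phi, psi)
#           ('and', phi, psi) | ('imp', phi, psi)
#           ('forall', x, phi) | ('exists', x, phi)

def free_vars(phi):
    """Return the set of individual variables occurring free in phi."""
    tag = phi[0]
    if tag == 'var':
        return {phi[1]}
    if tag == 'const':
        return set()
    if tag in ('fun', 'rel'):
        return set().union(*(free_vars(t) for t in phi[2:]))
    if tag == 'not':
        return free_vars(phi[1])
    if tag in ('or', 'and', 'imp'):
        return free_vars(phi[1]) | free_vars(phi[2])
    if tag in ('forall', 'exists'):
        return free_vars(phi[2]) - {phi[1]}   # the quantifier binds x
    raise ValueError(tag)

def is_sentence(phi):
    """A sentence is a formula with no free variables."""
    return not free_vars(phi)

# (forall x) fires(x) is a sentence; fires(x) alone is not.
phi = ('forall', 'x', ('rel', 'fires', ('var', 'x')))
print(is_sentence(phi), is_sentence(('rel', 'fires', ('var', 'x'))))  # True False
```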
The semantics of this language is examined in the usual way by considering the behaviour of interpretations on formulas (Ebbinghaus et al., 1984). There are two equivalent ways of regarding the behaviour of an interpretation on a formula. First, one can define what it means for an interpretation to evaluate a formula as either TRUE or FALSE. Secondly, one can define what it means for an interpretation to satisfy a formula. As usual, one then finds that a formula is TRUE under an interpretation if and only if the formula is satisfied by the interpretation. We take the idea of satisfaction as primary, and an interpretation's action on formulas is defined in terms of satisfaction (see below). An interpretation U consists of two parts: a structure, S say, and an assignment of variables, u say. So the interpretation U is an ordered pair: U = (S, u). A structure S consists of the following four things:

• A set D associated with S called its domain. The set D is also denoted dom(S).
• A mapping of each constant, c, to an element S(c) of D.
• A mapping of each function symbol, f, of arity n to a function S(f) with domain D^n and codomain D.
• A mapping of relation symbols of arity n into n-fold relations on D. If p is a relation symbol of arity n, then S maps p to an n-fold relation on D denoted S(p); that is, S(p) is a subset of D^n.
We remark that the set D = dom(S) should not be confused with the domain of definition of the structure S when it is considered as a mapping. When regarding S as a mapping, it has a domain of definition that contains the constants, the function symbols, the relation symbols as well as all the terms. The set D is the "domain of interpretation" of S. An assignment of variables u maps each individual variable to an element of D. The explanation of how U satisfies formulas is given in two steps. First, the action of U on terms is described. This is a combination of the actions of S and u. Then the definition of U satisfying a formula is given in 5.1. This is a recursive definition. The interpretation U maps a term (other than an individual variable or a constant) recursively to an element of D as follows.

• If f is a function symbol of arity n and t1,...,tn are terms mapped by U to elements of D respectively denoted by U(t1),...,U(tn), then U maps f(t1,...,tn) to the element S(f)(U(t1),...,U(tn)) of D.
Satisfaction is a relation between the set of all interpretations and the set of all formulas of a first-order language. The fact that an interpretation U satisfies a formula ϕ is written U ⊨ ϕ. If U does not satisfy ϕ it is written U ⊭ ϕ. The satisfaction relation is defined as follows.

Definition 5.1 (First-Order Satisfaction) Let U = (S, u) be an interpretation.

1. Let x be an individual variable. The interpretation V with assignment of variables v is an x-variant of U if V has the same structure, S, as U and if v is the same as u except that it possibly maps the individual variable x to a different element of dom(S) than u does.
2. If p is a relation symbol of arity n and t1,...,tn are terms, then U ⊨ p(t1,...,tn) if and only if (U(t1),...,U(tn)) is an element of the relation S(p).
3. The satisfaction relation between U and an arbitrary formula is defined recursively as follows.
   • U ⊨ ¬ϕ iff U ⊭ ϕ.
   • U ⊨ ϕ∨ψ iff U ⊨ ϕ or U ⊨ ψ.
   • U ⊨ ϕ∧ψ iff U ⊨ ϕ and U ⊨ ψ.
   • U ⊭ ϕ→ψ iff U ⊨ ϕ and U ⊭ ψ; otherwise U ⊨ ϕ→ψ.
   • U ⊨ (∀x)ϕ iff for every x-variant V of U, V ⊨ ϕ.
   • U ⊨ (∃x)ϕ iff for some x-variant V of U, V ⊨ ϕ.
4. If U ⊨ ϕ we also write S ⊨u ϕ and say that S satisfies ϕ under u.
5. If S ⊨u ϕ for every assignment of variables u, then S is said to satisfy ϕ, and this is denoted S ⊨ ϕ.
6. The formula ϕ is valid, written ⊨ ϕ, if and only if U ⊨ ϕ for every interpretation U.
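Definition 5.1 is directly executable when domains are finite, a point exploited in the last section. The following sketch uses the same hypothetical tuple encoding as above and checks quantifiers by looping over x-variants.

```python
# A sketch of first-order satisfaction (Definition 5.1) for structures with
# finite domains.  A structure is a dict with a domain plus maps for
# constants, functions and relations; an interpretation (S, u) adds an
# assignment of variables u.

def eval_term(S, u, t):
    tag = t[0]
    if tag == 'var':
        return u[t[1]]
    if tag == 'const':
        return S['consts'][t[1]]
    if tag == 'fun':
        args = tuple(eval_term(S, u, s) for s in t[2:])
        return S['funs'][t[1]][args]
    raise ValueError(tag)

def satisfies(S, u, phi):
    tag = phi[0]
    if tag == 'rel':
        args = tuple(eval_term(S, u, t) for t in phi[2:])
        return args in S['rels'][phi[1]]
    if tag == 'not':
        return not satisfies(S, u, phi[1])
    if tag == 'or':
        return satisfies(S, u, phi[1]) or satisfies(S, u, phi[2])
    if tag == 'and':
        return satisfies(S, u, phi[1]) and satisfies(S, u, phi[2])
    if tag == 'imp':
        return (not satisfies(S, u, phi[1])) or satisfies(S, u, phi[2])
    if tag == 'forall':   # every x-variant of (S, u) must satisfy the body
        return all(satisfies(S, {**u, phi[1]: d}, phi[2]) for d in S['dom'])
    if tag == 'exists':   # some x-variant of (S, u) must satisfy the body
        return any(satisfies(S, {**u, phi[1]: d}, phi[2]) for d in S['dom'])
    raise ValueError(tag)

# A two-element domain in which 'fires' holds of element 1 only (hypothetical).
S = {'dom': {1, 2}, 'consts': {'p': 1}, 'funs': {},
     'rels': {'fires': {(1,)}}}
print(satisfies(S, {}, ('rel', 'fires', ('const', 'p'))))                 # True
print(satisfies(S, {}, ('exists', 'x', ('rel', 'fires', ('var', 'x')))))  # True
print(satisfies(S, {}, ('forall', 'x', ('rel', 'fires', ('var', 'x')))))  # False
```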
In order to avoid difficulties with set-theoretical foundations, we work entirely in a universe of sets (Cameron, 1999); all collections of objects are sets and all set-theoretical constructions yield sets. We suppose a first-order language is given. The set of all structures defined on the vocabulary is denoted STRUC. The set of all subsets of STRUC is denoted PSTRUC. In the next section, we describe the algebra PSEN and show how to model the operations mentioned in the fourth section on autonomic computing and the operations giving an account of Frith’s examples described in the third section.
Algebraic Modelling of Component Architectures

In this section, a brief description is given of the algebra PSEN. Then PSEN is used to model the operations mentioned in the third and fourth sections.
The Algebra PSEN

We give an informal summary of the algebra PSEN here. PSEN behaves almost like a Boolean algebra. Given two sets of sentences X and Y, we can say the following things about them. The entailment relation ⊢ acts as a partial order, that is: X ⊢ X; if X ⊢ Y and Y ⊢ Z, then X ⊢ Z; if X ⊢ Y and Y ⊢ X, then X and Y are equivalent (not equal, as they would be in a Boolean algebra). We sometimes denote this equivalence by the symbol ~. There are, with respect to ⊢, a smallest element, ⊥, and a largest element, ⊤. That is, for any X in PSEN, ⊥ ⊢ X ⊢ ⊤. There are elements which are greatest lower and least upper bounds for X and Y with respect to ⊢. They are denoted respectively by X∧°Y and X∨°Y. This means that (X∧°Y) ⊢ X ⊢ (X∨°Y) and (X∧°Y) ⊢ Y ⊢ (X∨°Y), and that any Z that satisfies Z ⊢ X and Z ⊢ Y must satisfy Z ⊢ (X∧°Y); (X∨°Y) behaves dually. There is also an inverse: the inverse of X is −X. It has the properties that −X∧°X is equivalent to ⊥ and −X∨°X is equivalent to ⊤. In summary: Partial order: the entailment relation, ⊢, is a partial order. Zero and unit: ⊥, ⊤. Greatest lower bound: X∧°Y. Least upper bound: X∨°Y. Inverse: −X. In the algebra, X∧°Y can be thought of, roughly, as taking the union of the sentences in X and Y, and X∨°Y as, roughly, taking the intersection. (It must be remembered that the partial order of PSEN is entailment, not set inclusion.) This completes the informal summary of PSEN. The facts about PSEN as an algebra are summarised in the following two results. For a detailed development of PSEN see Flax (2004).
Theorem 6.1
(PSEN, ⊢, −, ∧°, ∨°) is a distributive prelattice with order ⊢ and operators −, ∧° and ∨° which are unique up to equivalence, ~. PSEN also has infinitary meet and join operators ⋀° and ⋁° which are unique up to equivalence.

Corollary 6.2
(PSEN, ⊢, −, ∧°, ∨°, ⋀°, ⋁°) is a complete preboolean algebra with preorder ⊢ and distinguished elements and operators which are unique up to equivalence, ~, as follows. Zero: ⊥; unit: ⊤; inverse: −; binary meet: ∧°; binary join: ∨°; infinitary meet: ⋀°; and infinitary join: ⋁°. De Morgan's laws hold for PSEN and the following well-known results are true.

X ⊢ Y iff X ∨° Y ~ Y
X ⊢ Y iff X ∧° Y ~ X
X ⊢ Y iff X ∧° −Y ~ ⊥
X ⊢ Y iff −Y ⊢ −X

The next subsection shows how some of the operations that are used to model the working of the component architectures of the third and fourth sections can be done using PSEN.
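Because the last section shows that, over a finite restriction R, entailment reduces to inclusion of model sets, the preboolean operations admit a very small sketch in which a set of sentences is represented by its set of models within R. This model-set representation, and the data in it, are our simplification, not the chapter's own construction.

```python
# A sketch of the PSEN operations on model sets within a fixed finite
# restriction R of structures (structures are named by strings here).
from typing import FrozenSet

Models = FrozenSet[str]
R: Models = frozenset({'s1', 's2', 's3', 's4'})

BOTTOM: Models = frozenset()     # the inconsistent element: no models
TOP: Models = R                  # the trivial element: every structure in R

def meet(X: Models, Y: Models) -> Models:    # X meet Y: combine sentences,
    return X & Y                             # so fewer models survive

def join(X: Models, Y: Models) -> Models:    # X join Y: roughly intersection
    return X | Y                             # of sentences, so more models

def inverse(X: Models) -> Models:            # -X
    return R - X

def entails(X: Models, Y: Models) -> bool:   # X entails Y iff mod(X) <= mod(Y)
    return X <= Y

def equivalent(X: Models, Y: Models) -> bool:
    return X == Y

# Comparing WI and A: the symmetric difference (WI join A) minus (WI meet A).
WI: Models = frozenset({'s1', 's2'})
A: Models = frozenset({'s1', 's2'})
sym_diff = meet(join(WI, A), inverse(meet(WI, A)))
print(equivalent(sym_diff, BOTTOM))   # True: WI and A are equivalent
# One of the "well-known results": X entails Y iff X meet -Y ~ bottom.
X, Y = frozenset({'s1'}), frozenset({'s1', 's3'})
print(entails(X, Y), equivalent(meet(X, inverse(Y)), BOTTOM))  # True True
```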
Modelling Operations Using PSEN

The algebra PSEN is used to give expressions for some of the operations mentioned in the fourth section.
Combining Sets of Sentences

We combine the sets of sentences WI and SI by forming the expression WI∧°SI. This is roughly the same as taking the union of the two sets of sentences (Flax, 2004).
Comparing Sets of Sentences

To compare WI and A for equivalence, we form the expression (WI∨°A) − (WI∧°A), which is the symmetric difference of the two sets. If the two sets are equivalent then the symmetric difference will be ⊥. This follows from one of the results at the end of the previous subsection: X ⊢ Y iff X ∧° −Y ~ ⊥. Supposing that the symmetric difference is equivalent to ⊥, if we set X = WI∨°A and Y = WI∧°A, then we see that WI∨°A ⊢ WI∧°A. But then, from the properties of ∧° and ∨°, WI ⊢ WI∨°A ⊢ WI∧°A ⊢ A, so WI ⊢ A. A similar argument shows that A ⊢ WI, and so WI is equivalent to A.
Inhibition of a Set of Sentences

What is needed is that any sentences in SI that conflict with GP, or entail conflict with GP, must be removed from SI. We use the contraction operation to do this. Let X and Y be sets of sentences associated with the same cognitive component. We contract X by Y when we remove from X any sentences that will entail Y. The approach we use to contraction comes from the AGM treatment of belief revision. Detailed explanations can be found in Gärdenfors (1988) and Gärdenfors and Rott (1995). We have formulated an algebraic approach to contraction that is equivalent to the AGM approach. We give the main principles which are needed, but cannot give a detailed account here; see Flax (2004) for details on algebraic contraction in PSEN. To contract X by Y we need to specify a rejection function, whose value is denoted M_{X,Y}, and then operate (in PSEN) on X and M_{X,Y}. Intuitively, the rejection function M_{X,Y} entails the inverse, −Y, of Y. The rejection function has to satisfy the following two properties (amongst others).
• If Y ~ ⊤ or X ⊬ Y, then M_{X,Y} ~ ⊥.
• M_{X,Y} ⊢ −Y.
In general, it turns out that there are many different rejection functions. Here we take M_{X,Y} to be the canonical rejection function M_{X,Y} = −Y, provided neither Y ~ ⊤ nor X ⊬ Y; else M_{X,Y} is taken to be ⊥. The contraction of X by Y is the least upper bound of X and the rejection function. In symbols, the contraction is the following expression in PSEN: X ∨° M_{X,Y}. For our example, we must remove from SI any sentences that entail the (boolean) inverse of the set of sentences GP associated with stimulus intention by entailment from goals/plans. The inverse of GP is −GP. We need to remove sentences that entail −GP because the link is inhibitory. As we have said above, this removal operation is contraction. It removes from SI any sentences that entail −GP; we contract SI by −GP. Here we use the rejection function M_{SI,−GP}. This rejection function turns out to be equivalent to GP, or in symbols M_{SI,−GP} = −(−GP) ~ GP. In the special cases when −GP ~ ⊤ or SI ⊬ −GP then, as mentioned above, M_{SI,−GP} is taken to be ⊥. The contraction of SI by −GP is SI ∨° M_{SI,−GP}. If −GP is not equivalent to ⊤ and SI ⊢ −GP, then the contraction is SI∨°GP, the least upper bound of SI and GP. Otherwise the contraction is just SI, because M_{SI,−GP} ~ ⊥ and the contraction is (SI∨°⊥) ~ SI. These methods can be used to model cognitive malfunctioning in the illness schizophrenia, as we show next.
Algebraic Modelling of Frith's Account

We use restricted entailment, the algebra PSEN, and contraction to model Frith's examples described in the third section.

Example 1. Suppose WI and SI are sets of sentences that are associated with the cognitive component action by entailment from the cognitive components willed intention and stimulus intention respectively. The two sets need to be combined algebraically into an element of PSEN associated with cognitive component action. We do this by taking the meet of WI and SI and take this as the element of PSEN that is associated with action: WI∧°SI. If disconnection 1 is present then WI is taken to be ⊤, and so (WI∧°SI) ~ (⊤∧°SI) ~ SI. Now no sentences sourced by entailment from willed intention are in the set associated with action.

Example 2. Our approach is to consider sets of sentences associated with the cognitive component stimulus intention and to operate upon them to give a resultant element of PSEN, so that no unwanted sentences are entailed by that resultant element. In this way, the cognitive component action will behave appropriately. In our approach, the inhibitory action of the path from goals/plans to stimulus intention is accomplished by an operation within stimulus intention. It is a two-step process. First we allow the cognitive component stimulus intention to associate with a set of sentences P obtained by entailment from the cognitive component perception. Secondly, we remove from P any sentences that entail the (boolean) inverse of the set of sentences GP associated with stimulus intention by entailment from goals/plans. The inverse of GP is −GP. We need to remove sentences that entail −GP because the link is inhibitory. As we have said previously, this removal operation is contraction. It removes from P any sentences that entail −GP; we say that we contract P by −GP. In general, to apply contraction we use the rejection function M_{P,−GP} = −(−GP) ~ GP. If, however, −GP ~ ⊤ or P ⊬ −GP then, as mentioned in the previous subsection, M_{P,−GP} is taken to be ⊥. The contraction of P by −GP is P ∨° M_{P,−GP}. If −GP is not equivalent to ⊤ and P ⊢ −GP, then the contraction is P∨°GP; otherwise M_{P,−GP} ~ ⊥ and the contraction is (P∨°⊥) ~ P. If disconnection 2 is present then GP is taken to be ⊤, so −GP ~ ⊥. Suppose P is not equivalent to ⊥; then P ⊬ ⊥ ~ −GP and, as previously shown, the contraction is P. In this case, inappropriate speech fails to be inhibited.
Example 3. Suppose WI and A are sets of sentences that are associated with the cognitive component monitor by entailment from the cognitive components willed intention and action respectively. The two sets need to be compared algebraically for equivalence and the result acted on by monitor. We do this by taking the symmetric difference of the two sets; if it is equivalent to ⊥ then the two sets are taken to be equivalent. The symmetric difference of WI and A is (WI∨°A) − (WI∧°A). If disconnection 3 is present then WI is taken to be ⊤ and the symmetric difference is ((⊤∨°A) − (⊤∧°A)) ~ (⊤ − A) ~ −A, which is different from ⊥ (unless A ~ ⊤). In this way, monitor is able to detect that there is no input from willed intention. This ends our modelling of Frith's examples.
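The contraction operation and the disconnection scenarios above can be sketched concretely by continuing the model-set representation used earlier; the data is again hypothetical and the representation is our simplification.

```python
# A sketch of algebraic contraction with the canonical rejection function,
# using model sets within a finite restriction R (hypothetical data).
from typing import FrozenSet

Models = FrozenSet[str]
R: Models = frozenset({'s1', 's2', 's3', 's4'})
BOTTOM: Models = frozenset()
TOP: Models = R

def rejection(X: Models, Y: Models) -> Models:
    """Canonical M_{X,Y}: -Y, unless Y ~ top or X does not entail Y."""
    if Y == TOP or not X <= Y:
        return BOTTOM
    return R - Y

def contract(X: Models, Y: Models) -> Models:
    """Contraction of X by Y: the join X vee M_{X,Y} (union of model sets)."""
    return X | rejection(X, Y)

# Inhibition: remove from SI anything that entails -GP.
GP: Models = frozenset({'s1', 's2'})
SI: Models = frozenset({'s3'})            # here SI entails -GP = {'s3','s4'}
print(contract(SI, R - GP))               # SI join GP: {'s1','s2','s3'}
# With disconnection 2, GP ~ top, so -GP ~ bottom and SI is unchanged:
print(contract(SI, R - TOP) == SI)        # True
```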
Computability of PSEN

In the previous sections, there has been an unstated underlying theme: that something can be gained from algebraic modelling of cognitive architectures. No doubt this remains to be seen, but experimentation will be made easier if simulations can be run on computers. This will be feasible if the algebra PSEN is computable. So in this section some aspects of the computability of the operations of PSEN and contraction are discussed by examining the computability of the algebraic expressions and entailment relation of PSEN. We use the approach to decidability, computability, and enumerability described by Ebbinghaus et al. (1984). In order to begin discussion of computability in PSEN, one has to start with something that is computable, and so assumptions about basic computability properties of structures and assignments of variables are made explicit in 7.1. An important assumption we have made for this section is that all structures have finite domains. With these basic assumptions it can be shown that it is decidable whether a given structure entails a given first-order sentence. In order to discuss computability of the entailment relation and operations of PSEN, one has to move to the restricted versions of these objects. It is easiest to see what is involved by looking at restricted entailment. Ordinary entailment is well known to be undecidable. A first step toward decidability of entailment is to consider an entailment relation where the semantic checking of satisfaction of structures is limited to a subset of the class of all structures. If one only checks a subset of structures when examining entailment, then one gets a restricted entailment. Suppose the subset of structures constituting the restriction is R; then ⊢R denotes a restricted entailment with restriction R. This is defined precisely in definition 7.4. If the restriction set R is finite and each member of R has the properties listed in 7.1, then the restricted entailment ⊢R turns out to be decidable on finite sets of sentences. There are also restricted versions of the operations of PSEN. They are denoted by X ∧°R Y, X ∨°R Y and −R X. Definitions, relevant results, and detailed proofs are given in Flax (2004). The fact that operations and relations are capable of being restricted to a finite set of structures with finite domains is exploited to allow computability results to be proved. The expressions X ∧°R Y, X ∨°R Y and −R X are computable, where R is a finite set of structures with finite domains and X and Y are finite sets of sentences; the restriction of the relation ⊢R to pairs of finite sets of sentences is decidable; and finally contraction is computable, subject to the computability assumptions made about structures. To get started in examining the computability of the satisfaction relation for first-order sentences, we assume that some basic "raw materials" are computable: structures are assumed to be computable and only structures with finite domains are considered. In this section, the following properties relating to the computability of structures are assumed.
Properties 7.1

• All structures have finite domains.
• All structures are computable. This means that all the parts making up the structure are computable: its mapping of
  • constants to domain members,
  • function symbols to functions defined on its domain,
  • relation symbols to relations over its domain.
• Any assignment of variables is computable.
• If S is a structure and f a function symbol of arity n, then the domain of S(f) is (dom S)^n and the values of S(f) are computable over its domain.
• If S is a structure and p is a relation symbol of arity n, then S(p) is a relation on (dom S)^n and the relation S(p) is decidable over (dom S)^n.
With these assumptions, the next proposition states that the value of any term under a given interpretation is computable and the satisfaction of any first-order formula by a given interpretation is decidable. Therefore, it is decidable whether a structure satisfies a first-order sentence.

Proposition 7.2 Let S be a structure with finite domain dom(S) and let u be an assignment of variables.

1. The value of any term t under the interpretation (S, u) is computable.
2. Let ϕ be a first-order formula. It is decidable whether or not (S, u) ⊨ ϕ.
As a corollary, satisfaction of a specific sentence by a specific structure is decidable. For a proof see Flax (2004).

Corollary 7.3 Let S be a structure with finite domain and let ϕ be a first-order sentence. It is decidable whether or not S ⊨ ϕ.

Now the following fundamental restricted notions are defined: set of models and entailment. We recall that the set of all structures is denoted STRUC. Note that by setting R = STRUC in definition 7.4, one gets the unrestricted version of what is being defined.
Definition 7.4 Let X and Y be sets of sentences and let R ⊆ STRUC.

1. The set of models of X restricted to R, denoted modR(X), is modR(X) = {S ∈ R : S ⊨ X}.
2. X entails Y with restriction R iff modR(X) ⊆ modR(Y); this is written as X ⊢R Y.
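Under the finiteness assumptions of properties 7.1, definition 7.4 is directly computable. A sketch follows, assuming a satisfaction checker such as the one given after Definition 5.1 in the fifth section.

```python
# A sketch of Definition 7.4 for a finite restriction R: mod_R(X) is computed
# by checking each structure against every sentence of X, and restricted
# entailment is the subset test on the resulting model sets.  `satisfies` is
# assumed to be a first-order satisfaction checker, decidable here because
# all domains are finite.

def mod_R(R, X, satisfies):
    """mod_R(X) = indices of structures S in R with S |= phi for all phi in X."""
    return {i for i, S in enumerate(R)
            if all(satisfies(S, {}, phi) for phi in X)}

def entails_R(R, X, Y, satisfies):
    """X entails Y with restriction R iff mod_R(X) is a subset of mod_R(Y)."""
    return mod_R(R, X, satisfies) <= mod_R(R, Y, satisfies)
```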
The next definition is convenient because it allows the following results to be stated more succinctly than without it.
Definition 7.5

1. FPSTRUCFIN is the family of all finite sets of structures with finite domains.
2. FPSENFO is the family of all finite sets of first-order sentences.
3. If R ∈ FPSTRUCFIN, the restriction of the relation ⊢R to FPSENFO × FPSENFO is denoted ⊢RFO.

Subject to the standard assumptions of this section, that the properties 7.1 hold, the restricted set of models of a finite set of sentences is computable. See Flax (2004) for a proof.

Proposition 7.6 Let R ∈ FPSTRUCFIN and let X ∈ FPSENFO; then modR(X) is finite and computable.

The next two results show that the boolean operators are computable, subject to the usual assumptions for this section. See Flax (2004) for a proof.

Proposition 7.7 Let R ∈ PSTRUCFIN and let modR(X) and modR(Y) be computable; then X ∧°R Y, X ∨°R Y and −R X are computable.
Corollary 7.8 For R ∈ FPSTRUCFIN and X, Y ∈ FPSENFO, X ∧°R Y, X ∨°R Y and −R X are computable.

The next proposition states that restricted entailment between finite sets of sentences is decidable. See Flax (2004) for a proof.

Proposition 7.9 For R ∈ FPSTRUCFIN, the relation ⊢RFO is decidable.

We now sketch the argument for the computability of contraction. We recall that in the sixth section we defined the contraction of X by Y to be the expression X ∨° M_{X,Y}. The next proposition shows that when structures and sentences belong to FPSTRUCFIN and FPSENFO respectively, contraction is computable. We recall that the canonical construction of M_{K,X} used in the sixth section, that is, taking −X to be M_{K,X}, ensures that M_{K,X} is computable for X ∈ FPSENFO or for computable modR(X). Alternatively, if M_{K,X} ∈ FPSENFO then it is also computable. So the assumptions concerning M in the next proposition are feasible. The proof of proposition 7.10 follows from proposition 7.7.

Proposition 7.10 For R ∈ FPSTRUCFIN, X, Y ∈ FPSENFO, and modR(M_{X,Y}) computable, we have that the contraction X ∨° M_{X,Y} is computable.
Conclusion

We have shown that the "physicalist" view of cognitive functioning, where the firing of neurons in one cognitive component induces the firing of neurons in another component via a neural pathway, can be modelled in a very simple logical language. This language is based on names for sets of neurons and one relation symbol, fires, which is interpreted to mean that a named set of neurons is firing. The underlying theme of this article is that it is worthwhile taking the simple logical model to a higher level of abstraction and modelling cognitive function in terms of sets of sentences of first-order logic being operated on algebraically at each cognitive component. At this level of abstraction, sets of sentences are induced from one cognitive component to another by means of entailment. At any cognitive component, sets of sentences can be combined, compared or inhibited by using the operations of the algebra PSEN. It is possible to make these operations computable by imposing suitable restrictions. This is important for enabling computer simulation. Future work needs to address the pragmatics of defining computable operations in PSEN. The algebraic account of Frith's examples and autonomic computing works well enough at its own level, but there are loose ends to be followed. More work needs to be done to investigate how memory can be modelled using PSEN. This could then be used to explain how sets of sentences associate with a cognitive component. Finally, modal temporal operators need to be incorporated to enable modelling of dynamic behaviour through time.
References

Anonymous (2005). Autonomic computing. Retrieved April 2005, from http://en.wikipedia.org/wiki/Autonomic_Computing

Cameron, P. J. (1999). Sets, logic, and categories. Springer.

Ebbinghaus, H. D., Flum, J., & Thomas, W. (1984). Mathematical logic. Springer.

Flax, L. (2004, September). Algebraic belief revision and nonmonotonic entailment results and proofs (Technical Report C/TR04-01). Macquarie University. Retrieved from http://www.comp.mq.edu.au/~flax/techReports/brNm.pdf
Frith, C. D. (1992). The cognitive neuropsychology of schizophrenia. Lawrence Erlbaum Associates.

Gärdenfors, P. (1988). Knowledge in flux. MIT Press.

Gärdenfors, P., & Rott, H. (1995). Belief revision. In D. M. Gabbay, C. J. Hogger, & J. A. Robinson (Eds.), Handbook of logic in artificial intelligence and logic programming (Vol. 4, pp. 35-132). Oxford University Press.

IBM Corp (2005). Autonomic computing. Retrieved April 2005, from http://www.research.ibm.com/autonomic/glossary.html

IBM Corp (2005). The eight elements. Retrieved April 2005, from http://www.research.ibm.com/autonomic/manifesto/autonomic_computing.pdf

Wang, Y. (2007). Exploring machine cognition mechanisms for autonomic computing. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(2), i-v.
This work was previously published in International Journal of Cognitive Informatics and Natural Intelligence, Vol. 1, Issue 2, edited by Y. Wang, pp. 58-72, copyright 2007 by IGI Publishing (an imprint of IGI Global).
Chapter XVI
Interactive Classification Using a Granule Network

Yan Zhao
University of Regina, Canada

Yiyu Yao
University of Regina, Canada
Abstract

Classification is one of the main tasks in machine learning, data mining, and pattern recognition. Compared with the extensively studied automation approaches, the interactive approaches, centered on human users, are less explored. This chapter studies interactive classification at three levels. At the philosophical level, the motivations and a process-based framework of interactive classification are proposed. At the technical level, a granular computing model is suggested for re-examining not only existing classification problems, but also interactive classification problems. At the application level, an interactive classification system (ICS), using a granule network as the search space, is introduced. ICS allows multi-strategies for granule tree construction, and enhances the understanding and interpretation of the classification process. Interactive classification is complementary to the existing classification methods.
Introduction

Human cognitive activities rely on classification to organize the vast number of known materials, plants, animals and events into categories that can be named, remembered, and discussed. The problems of human-based classification are that, for very large and separate datasets, it is difficult for people to be aware of, to extract, to memorize, to search and to retrieve classification patterns, in addition to interpreting and evaluating classification results that are constantly changing, and then making recommendations or predictions in the face of inconsistent and incomplete data. Computers perform classification by revealing the internal structures of data according to programmed algorithms. They maintain precise operations under a heavy information load and preserve steady performance. A typical automatic classification approach is batch processing, where all the input is prepared before the program runs. The problems of automatic classification are that the systems often do not allow users, or limit users' ability,
to contact and participate in the discovery process. A fixed algorithm may not satisfy the diverse requirements of users; a user often cannot relate to the answers, and is left wondering about the meaning and value of the so-called discovered knowledge. In this chapter, we propose a framework of human-machine interactive classification. Although human-machine interaction has been emphasized in many disciplines, such as information retrieval and pattern recognition, it has received some, though not yet enough, attention in the domain of data mining (Ankerst et al., 1999; Brachmann & Anand, 1996; Han, Hu & Cercone, 2003; Zhao & Yao, 2006). The fundamental idea of interactive classification is: on one hand, computers can help users to carry out description, prediction and explanation activities efficiently; on the other hand, human insights, judgements and preferences can effectively interfere with method selection, application and adjustment, thus improving the existing methods and generating new methods. Interactive classification uses the advantages of both a computer system and a human user. A foundation of human-computer interaction in data mining may be provided by cognitive informatics (Wang, 2002, 2007a, 2007b; Wang & Kinsner, 2006; Wang et al., 2006). As Wang suggests, for cognitive informatics, the relations and connections of neurons that represent information and knowledge in the human brain might be more important than the neurons themselves. Following the same way of thinking, we believe that interactive data mining is sensitive to the capacities and needs of humans and machines. A critical issue is not how intelligent the user is, or how efficient the algorithm is, but how well these two parts can be connected and communicated, stimulated and improved. More specifically, interactive classification systems allow users to suggest preferred classifiers and knowledge structures, and use machines to assist calculation and analysis during the discovery process. A user can freely explore the dataset according to his/her preference and priority, ensuring that each classification stage and the corresponding result are all understandable and comprehensible. The constructed classifier is not necessarily efficient when compared with most of the automatic classifiers. However, it is close to human thinking by its very nature. The evaluation of an interactive classification, involving the understandability and applicability of the final classification results, relies heavily on the interaction between the computer and the human user, not just on one single factor. In the rest of this chapter, we discuss interactive classification at three levels: the philosophical level, the technical level and the application level. At the philosophical level (Section 2), we discuss the motivation of interactive classification and present the process-based framework. At the technical level (Section 3), we apply granular computing as the methodology for examining the search space and complexity issues of interactive classification. At the application level (Section 4), an interactive classification implementation based on a granule network is introduced. The main results demonstrate the usefulness of the proposed approach. The conclusion is in Section 5.
Study of Interactive Classification at the Philosophical Level

For an interactive system, the most critical aspects are not only how intelligent the user is, or how efficient the system is, but also how well these two parts connect and communicate. Through interaction and communication, computers and users can divide labour in order to achieve a good balance of automation and human control.
Motivations for Interactive Classification

Human-computer interaction should cope with users' either vague or clear requirements, and fixed or changing goals, as well as encouraging the curiosity of exploration, and supporting multiple views and operations for different inquiries. The interaction allows users to engage in all important processes of classification. Typically, a user can design and conduct a specific classifier, think critically and logically in his/her own way, and compare the different sets of classification rules based on his/her own understanding. These interactive processes encourage human learning, improve insight and understanding of the domain, and stimulate users to explore creative possibilities. User feedback can be used to improve the system. The interaction is bi-beneficial. It is important to note that users possess a variety of skills, intelligence, cognitive styles, frustration tolerances and other mental abilities. They come to a classification problem with varying preferences, requirements and background knowledge. Given a set of data, each user will try to make sense of the data by seeing it from different
angles, in different aspects, and under different views. Based on this diversity, there is not one universally applicable theory or method that is able to serve the needs of all users. This provides motivation for and justification of the co-existence of many theories and methods for classification systems, as well as the exploration of new theories and methods. The existing classification algorithms simply represent various heuristics that can be applied as rule searching strategies. An interactive classification approach allows a free choice of any existing classification algorithm, or even a new algorithm; it allows execution or reversal of construction steps; and it permits the user to save or test the current classification results at any time. According to a particular classification problem, a user may prefer one solution to another, one arrangement to another, one attribute order to another, or one evaluation to another. An interactive classifier should be able to allow a user to state his/her preference, qualitatively or quantitatively. That is, A is preferred to B, or by how much A is preferred to B. The preference can even be stated as an order. For instance, the preference order may be a total order, i.e., A, then B, then C, or a weak order: A is as good as B, and both A and B are better than C. An interactive data mining system should be able to support a specific user preference. For example, it should embed the preferred attributes in discovered rules and remove the unfavoured ones, as well as comparing and evaluating the preferred rules with the "standard" good rules that have strong support or confidence. Having specific requirements and user preferences targeted, an interactive classifier is able to carry out precise calculations and inferences. By providing a default control flow, an interactive classification system allows the user to establish a workable control flow when exploring a special control structure. In many real applications, ideal situations do not exist. Instead, users must choose a strategy for handling abnormal situations. Some possible strategies include retaining all long and specific rules; keeping only general rules and exceptions to the general rules; and sacrificing rule accuracy for other properties, such as low cost, low risk, or better understandability.
A Process-Based Framework for Interactive Classification

Interactive classification has received more attention in recent years (Ankerst et al., 1999; Brachmann & Anand, 1996; Finin & Silverman, 1986; Han, Hu & Cercone, 2003; Hatano et al., 1999; Zhao & Yao, 2005). Most of the existing interactive classifiers add visual functionality into the process, which enables users to invigilate the classification process at some stages (Brachmann & Anand, 1996; Elm et al., 2004; Han, Hu & Cercone, 2003). Han et al. pointed out that "most visualization systems concentrate on the raw data visualization and/or the final results visualization, but lack the ability to visualize the entire process of knowledge discovery" (Han, Hu & Cercone, 2003). Graphical visualization makes it easy to identify and distinguish the trend and distribution. Although this is a necessary feature for human-computer interaction, it is not sufficient. Classification, as well as knowledge discovery in general, includes six phases. They are: data preparation, data selection and reduction, data preprocessing and transformation, pattern discovery, pattern evaluation and pattern visualization (Brachmann & Anand, 1996; Fayyad et al., 1996; Han, Hu & Cercone, 2003; Mannila, 1997; Yao, Zhao & Maguire, 2003; Yao, Zhong & Zhao, 2004). In an interactive classification system, these phases can be carried out as follows:

• Data preparation visualizes the raw data with a specific format. Data distribution and some relationships between attributes can be easily observed.
• Data selection and reduction involves the reduction of the number of attributes and/or the number of tuples. Users can specify particular attributes and data areas, and remove other data that is outside the area of interest.
• Data preprocessing and transformation determines the number of intervals as well as cut-points for continuous datasets, and transforms the dataset into a workable dataset.
• Pattern discovery visualizes the preprocessed data and interactively constructs classification patterns under user guidance, monitoring and supervision.
• Pattern evaluation evaluates the discovered classification results whenever the user wishes. The usefulness of this is subject to the user's judgement.
• Pattern visualization visualizes the classification results in the form of decision trees, rules and/or graphs. Patterns can be stored for later reference, analysis and comparison.
The process of an interactive system is virtually a dynamic loop, which is iterated until satisfactory results are obtained. The interactive classification process is achieved by different kinds of interactive forms, including: idea proposition, information acquisition, data manipulation, guidance acquisition, and evaluation and decision making. They proceed through the entire data mining process mentioned above, and produce desirable mining results.
A Granular Computing Model for Interactive Classification

Ever since the introduction of the term Granular Computing (GrC), a rapid development of this topic has been observed (Lin, 1997; Nguyen, Skowron & Stepaniuk, 2001; Pedrycz, 2001; Yao, 2006; Yao & Yao, 2002; Yao & Zhong, 1999; Zadeh, 1997). It is used by researchers in computational intelligence to explore varying levels of granularity in human-centered problem solving and information processing. A granule, a subset of the universe, is regarded as the primitive notion of granular computing. We refer to a level with a family of granules as a granulated view. Granules in different levels are linked by order relations in a hierarchy. A granule in a higher level can be decomposed into many granules in a lower level, and, conversely, many granules in a lower level can be combined into granules in a higher level. A granule in a lower level provides a detailed description of the granule in a higher level, and a granule in a higher level has a more abstract description than the granules in a lower level. From the standpoint of granular computing, a concept may be exemplified by a granule, and be described or labelled by a formula. Once concepts are constructed and described, one can develop computational methods for the granule and the formula, such as sub- and super-concepts, and disjoint and overlapped concepts (Yao, 2006). These relationships can be conveniently expressed in the form of rules, with some associated quantitative measures indicating the strength. By combining the results from granular computing and formal concept analysis, knowledge discovery and data mining, especially rule mining, can be viewed as a process of forming concepts and finding relationships between concepts in terms of granules and formulas. Specifically, classification deals with the grouping or clustering of objects based on certain criteria. It is directly related to concept formation and concept relationship identification (Yao & Yao, 2002). While concept formation involves the construction of classes and the description of classes, concept relationship identification involves the connections between classes. The classification problem is then properly modelled by the granular computing theory.
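Before the formal definitions below, a minimal sketch of an information table, of meaning sets, and of a consistency check may help fix ideas; the table, attribute names and values here are hypothetical illustrations of ours.

```python
# A hypothetical information table, sketching the coming definitions:
# objects described by attributes, meaning sets m(phi) of conjunctors,
# and a consistency check of a granule against class labels.

table = {
    'o1': {'height': 'tall',  'hair': 'dark',  'class': '+'},
    'o2': {'height': 'tall',  'hair': 'blond', 'class': '-'},
    'o3': {'height': 'short', 'hair': 'dark',  'class': '+'},
}

def meaning(conjunctor):
    """m(phi) for a conjunctor phi given as {attribute: value} pairs."""
    return {x for x, row in table.items()
            if all(row[a] == v for a, v in conjunctor.items())}

def consistently_classified(granule, label):
    """True iff every object of the granule carries the class label."""
    return all(table[x]['class'] == label for x in granule)

X = meaning({'hair': 'dark'})               # a 1-conjunctor granule
print(X, consistently_classified(X, '+'))   # {'o1', 'o3'} True
```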
Information Tables

Information tables are used in granular computing models. An information table provides a convenient way to describe a finite set of objects, called a universe, by a finite set of attributes. It represents all available information and knowledge; that is, objects can only be perceived, observed, or measured through a finite set of properties.

Definition 1. An information table is the following tuple: S = (U, At, {Va | a ∈ At}, {Ia | a ∈ At}), where U is a finite nonempty set of objects, At is a finite nonempty set of attributes, Va is a nonempty set of values of a ∈ At, and Ia : U → Va is an information function. The mapping Ia(x) = v means that the value of object x on attribute a is v, where v ∈ Va.

Definition 2. A decision logic language defines the information in an information table S. An atomic formula is given by a = v, where a ∈ At and v ∈ Va. If φ and ψ are formulas, then so are ¬φ, φ ∧ ψ and φ ∨ ψ.

Definition 3. Given a formula φ, if an object x satisfies φ, we write x |= φ. The set mS(φ) of objects, defined by
mS(φ) = {x ∈ U | x |= φ}, is called the meaning of the formula φ in S. If S is understood, we simply write m(φ). For classification tasks, it is assumed that each object in an information table is associated with a unique class label. Objects can thus be divided into classes, which form a granulation of the universe. Without loss of generality, we assume that there is a unique attribute class taking class labels as its values. The set of attributes is expressed as At = A ∪ {class}, where A is the set of attributes used to describe the objects, also called the set of descriptive attributes.

Definition 4. A granule X is a definable granule in an information table S if it is associated with at least one formula φ, i.e., X = m(φ).

A formula φ can be viewed as the description of a granule m(φ); a granule m(φ) contains the set of objects having the property expressed by φ. A connection between formulas and subsets of U is thus established. This formulation enables us to study formal concepts in a logic setting in terms of formulas, and also in a set-theoretic setting in terms of granules. In many data mining applications, one is only interested in formulas of a certain form. Suppose we restrict the connectives of the language to the conjunction connective ∧ alone, so that each formula is a conjunction of atomic formulas. Such a formula is referred to as a conjunctor.

Definition 5. A subset X ⊆ U is a conjunctively definable granule in an information table S if there exists a conjunctor φ such that m(φ) = X.

Similarly, we can define disjunctively definable granules and other classes of definable sets (Yao, 2006). A conjunctor that contains only one atomic formula defines a 1-conjunctor granule; a conjunctor that conjoins n atomic formulas defines an n-conjunctor granule. The most general granule is the 0-conjunctor granule, the whole universe. With respect to the descriptive attribute set A, the most specific granules in an information table are the |A|-conjunctor granules, where |·| denotes the cardinality of a set.
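To make these notions concrete, the following is a minimal Python sketch; the toy table, its attribute names and its values are invented for illustration and do not come from the chapter. A conjunctor is modelled as a dictionary of atomic formulas, and its meaning set m(φ) is computed by filtering the universe.

```python
# A toy information table S = (U, At, {Va}, {Ia}); the data are invented.
table = {
    "height": {0: "short", 1: "short", 2: "tall", 3: "tall"},
    "hair":   {0: "blond", 1: "dark",  2: "blond", 3: "dark"},
    "class":  {0: "+",     1: "-",     2: "+",     3: "-"},
}
U = [0, 1, 2, 3]

def meaning(conjunctor):
    """m(phi) for a conjunctor phi, given as a dict {attribute: value} of
    atomic formulas: the set of objects satisfying all of them."""
    return {x for x in U
            if all(table[a][x] == v for a, v in conjunctor.items())}

print(meaning({"hair": "blond"}))                    # 1-conjunctor: {0, 2}
print(meaning({"height": "tall", "hair": "blond"}))  # 2-conjunctor: {2}
print(meaning({}))                                   # 0-conjunctor: the whole universe
```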
Classification

Definition 6. A granule X (defined by attributes in the set A of descriptive attributes) is consistently classified into a class ci if Iclass(x) = ci for all x ∈ X; otherwise, the granule is inconsistently classified. A classification rule is of the form φ ⇒ class = ci, where φ defines X.

Definition 7. Let ρ : 2^U → ℜ+ be a function such that ρ(X) measures the degree to which φ ⇒ class = ci is true, where ℜ+ denotes the non-negative reals.

The measure ρ can be defined to capture various aspects of classification. Two such measures are discussed below: the ratio of sure classification, and the accuracy of classification.

Definition 8. The ratio of sure classification of a granule X is given by:
ρ1(X) = 1, if ∃ci ∈ Vclass such that X ⊆ m(class = ci); ρ1(X) = 0, otherwise.

The ratio of sure classification indicates whether the objects of X can be classified by some ci without any uncertainty. The measure ρ1(X) reaches the maximum value 1 if X is consistent; the granule X is then said to be a certain solution for the class label ci. The measure reaches the minimum value 0 if X is inconsistent.
Definition 9. The accuracy of classification of a granule X is defined by:
ρ2(X) = |X ∩ ci(X)| / |X|,
The accuracy of X is, in fact, the accuracy of the rule associated with the class label shared by the majority of the objects in X, denoted ci(X). The measure ρ2(X) reaches the maximum value 1 if X is consistent; the granule X is then said to be a certain solution for its associated class label ci(X). When ρ2(X) < 1, X is inconsistent, and is said to be an approximate solution for the class label ci(X). For two granules with X1 ⊆ X2, we have ρ2(X1) ≥ ρ2(X2). A certain solution is a special case of an approximate solution with ρ1 = 100% and ρ2 = 100%.
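The two measures translate directly into code. The sketch below reuses the invented `table` and `meaning` from the earlier sketch; it is an illustration, not the authors' implementation.

```python
from collections import Counter

def rho1(X):
    """Ratio of sure classification: 1 if X is consistent (all objects in X
    share a single class label), 0 otherwise."""
    return 1 if len({table["class"][x] for x in X}) == 1 else 0

def rho2(X):
    """Accuracy of classification: the fraction of X belonging to the
    majority class ci(X)."""
    counts = Counter(table["class"][x] for x in X)
    return max(counts.values()) / len(X)

X = meaning({"hair": "blond"})   # {0, 2}, both labelled '+'
print(rho1(X), rho2(X))          # 1 1.0  -> a certain solution
Y = meaning({})                  # the whole universe
print(rho1(Y), rho2(Y))          # 0 0.5  -> an approximate solution
```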
A Granule Network

A granule network systematically organizes all the granules and atomic formulas with respect to a given information table S = (U, At = A ∪ {class}, {Va | a ∈ At}, {Ia | a ∈ At}). A granule network has at most |At| levels. Each node consists of a granule, and each arc leading from a granule to a child granule is labelled by an atomic formula. A path from a coarse granule to a fine granule thus indicates a conjunctive relation. In this hierarchical structure, the root node is the universe, the second level contains all the 1-conjunctor granules, the third level contains all the 2-conjunctor granules, and so on, until the |At|-th level contains all the |A|-conjunctor granules. To create a granule network, we need to know the number of conjunctors (conjunctive formulas) and the number of conjunctively definable granules; they determine the size of the granule network.

The number of conjunctors: There is only one 0-conjunctor, ∅, and there are ∑_{a∈A} |Va| 1-conjunctors. The number of 2-conjunctors is ∑_{ai,aj∈A, ai≠aj} |Vai| * |Vaj|. In general, the number of n-conjunctors is the sum, over all sets of n pairwise distinct attributes a1, ..., an ∈ A, of |Va1| * ... * |Van|.

The number of conjunctively definable granules: Because of the commutativity of conjunction, the two conjunctors φ ∧ ψ and ψ ∧ φ are associated with the same granule. In other words, a 2-conjunctor granule can be defined by 2! conjunctors, a 3-conjunctor granule by 3! conjunctors, and, in general, an n-conjunctor granule by n! conjunctors. It is easy to verify that the total number of conjunctively definable granules is the product, over the descriptive attributes, of the number of possible values of each attribute plus one: ∏_{a∈A} (|Va| + 1). For example, with two binary descriptive attributes there are (2 + 1)(2 + 1) = 9 conjunctively definable granules.
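For the invented two-attribute toy table used in the earlier sketches, this count can be checked directly:

```python
from math import prod

# Number of conjunctively definable granules over the descriptive
# attributes A of the toy table: prod over a in A of (|Va| + 1).
A = [a for a in table if a != "class"]
print(prod(len(set(table[a].values())) + 1 for a in A))   # (2+1)*(2+1) = 9
```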
Regarding classification tasks, the order of the atomic formulas forming a conjunctor does affect the efficiency and effectiveness of search and retrieval. For example, one may find that first retrieving the granule m(φ) and then the finer granule m(φ ∧ ψ) is faster, more reasonable, or more feasible than the reverse order. Furthermore, if a granule is consistently classified, then all its finer granules are also consistently classified, and the finer solutions are thus trivial for classification purposes. Suppose one does not find this general granule first but instead obtains some finer granules; as a result, one holds many finer granules that are trivial. Finally, a set of formulas may be associated with a single granule, and the expressive powers of these formulas are not equal.
Heuristic Search in a Granule Network

Partition-based algorithms look, at each level, for the most promising attribute to split the examined granules; each child granule is labelled by one of the possible values of the selected attribute. The child granules naturally cover their parent granule, and are pairwise disjoint. Various measures are applied to find the most promising attributes. For example, ID3 (Quinlan, 1983), ASSISTANT (Cestnik, Kononenko & Bratko, 1987) and C4.5 (Quinlan, 1993) use information entropy measures, and CART (Breiman et al., 1984) uses the Gini index.
Covering-based algorithms look, at each level, for the most promising attribute-value pairs that best classify a particular class. The granules being searched may overlap. It is easy to see that covering-based algorithms search a bigger space than partition-based algorithms; covering-based rules therefore tend to be larger in quantity and more general in quality than partition-based rules. Many measures are applied when looking for the most promising attribute-value pairs. For example, PRISM (Cendrowska, 1987) uses the confidence measure, and CN2 (Clark & Niblett, 1989) uses the information entropy of an attribute-value pair over all classes. One may also consider the coverage measure of an attribute-value pair given a certain class.

Top-down algorithms start the search from the 0-conjunctor granule, then heuristically search downwards for the most promising attributes (partition-based) or attribute-value pairs (covering-based) to restrict the granule, until consistent classification solutions are found. Some top-down criteria are based on local optimization. An important feature of ID3-like algorithms is that when splitting a node, an attribute is chosen based only on information about that node, not on any other nodes at the same level. The consequence is that in the resulting decision tree, different nodes in the same level may use different attributes; moreover, the same attribute may be used at different levels. Other top-down criteria are based on global optimization, choosing the attribute in favour of all nodes at the same level. One example is the kLR algorithm (Yao, Zhao & Yao, 2004), which can construct a decision tree and evaluate the accuracy level by level.

Bottom-up algorithms start the search from the |A|-conjunctor granules, i.e., the individual objects in the information table. The idea of bottom-up algorithms is that if all the finer granules are consistent solutions, then they are trivial solutions and can be described inductively by their coarsened granule. The AQ algorithm (Michalski, 1983) is a classic example of a bottom-up algorithm.

Another heuristic for efficient search is that instead of searching for consistent solutions, one can search for satisfactory inconsistent solutions. Pre-pruning methods are used by top-down algorithms that halt the search prematurely when a predefined threshold is met. For example, ASSISTANT (Cestnik, Kononenko & Bratko, 1987) and kLR (Yao, Zhao & Yao, 2004) use an accuracy threshold as a cutoff criterion, while CN2 (Clark & Niblett, 1989) uses the Laplacian function. One can also use a generality threshold, which indicates the size of the granule: when the generality of a certain granule is too small to be significant as a good solution, one may ignore it.

Post-pruning methods consist of three parts. First, grow a decision tree or decision rules for the data. Second, prune from the tree/rules a sequence of subtrees/sub-rules. Finally, select from this sequence the subtrees/sub-rules that estimate the true regression function as well as possible. Examples of post-pruning algorithms are CART (Breiman et al., 1984) and C4.5 (Quinlan, 1993). Given these three steps, post-pruning does not reduce the space complexity of constructing the tree; only the rule presentation is simplified and coarsened.
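The sketch below illustrates a top-down, partition-based search of the granule network on the invented toy table from the earlier sketches. For brevity it uses the average child accuracy ρ2 as the local splitting heuristic, standing in for the entropy or Gini measures named above; it is an illustration, not any of the cited algorithms.

```python
from collections import Counter

def grow(conjunctor, attrs, rules, threshold=1.0):
    """Split the granule m(conjunctor) on the locally best attribute until
    each granule meets the accuracy threshold, then emit a rule
    'conjunctor => class = majority label'."""
    X = meaning(conjunctor)
    if not X:
        return
    if rho2(X) >= threshold or not attrs:
        majority = Counter(table["class"][x] for x in X).most_common(1)[0][0]
        rules.append((dict(conjunctor), majority))
        return
    def avg_child_accuracy(a):
        parts = [meaning({**conjunctor, a: v}) for v in set(table[a].values())]
        parts = [p for p in parts if p]
        return sum(rho2(p) for p in parts) / len(parts)
    best = max(attrs, key=avg_child_accuracy)
    for v in set(table[best].values()):
        grow({**conjunctor, best: v}, [a for a in attrs if a != best],
             rules, threshold)

rules = []
grow({}, ["height", "hair"], rules)
print(rules)  # e.g. [({'hair': 'blond'}, '+'), ({'hair': 'dark'}, '-')]
```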
An Implementation of Interactive Classification: ICS

An interactive classification system, ICS, using a granule network as its search space, has been implemented to demonstrate the power of human-computer interactivity. Domain knowledge and user preferences can be profitably included in the search phases. The graphical user interface of ICS is shown in Figure 1.
Interactive Pattern Discovery

In the process of interactive pattern discovery, users require three types of information.

First, information about the constructed granule tree. The name granule tree derives from the fact that the classification results are searched for in a granule network and can be arranged into a tree structure. At the initial stage, the granule tree has only a root: the entire universe, the biggest granule. If it is not consistently classified by any class, we start the classification process, and the granule tree grows branches.

Second, information about a selected branch. Initially, the only branch is the root, which is a 0-conjunctor. If an atomic formula is conjuncted to the root, we obtain a 1-conjunctor associated with a subset of the universe. Users can choose any one of the branches to investigate. As a result, they can either create a 2-conjunctor granule based on this 1-conjunctor granule, or originate another 1-conjunctor granule based on the root.

Third, information about the available atomic formulas with respect to a selected branch. When a branch is consistently classified, all its finer granules are also consistently classified, and thus are not meaningful. When a branch is not yet consistently classified, the user needs to decide which atomic formula should be conjuncted to the selected branch.

Users require these three types of information to make classification decisions. Sometimes, not all the classes and measures are equally important and meaningful to a user; in the interactive pattern discovery phase, it is therefore necessary to allow the user to select one specific class and the measurements of interest. Both the manipulations "Add to tree" and "Add to tree as a rule" conjunct an atomic formula to the selected branch. The difference between the two is that by adding to tree, the consistent granules with their labels are added to the rule set automatically; by adding to tree as a rule, the consistent granules with their labels, as well as the picked inconsistent granule with its dominating class label, are added to the rule set. To describe and classify all the classes, partition-based granule trees are easily constructed: whenever an attribute of interest is selected, all of its possible values are added to the granule tree, with the corresponding granules being pairwise disjoint. The partition-based approach ensures that no portion of the universe will be missed by the classification. If one only needs partial success, namely, to describe and classify one particular class, then covering-based granule trees are more suitable; in this case, the active attribute-value pairs and their measurements are of concern. Normally, the investigation progresses from the most promising branch to the less promising ones. One can apply the depth-first mode to explore each branch sequentially, or the breadth-first mode to explore the granules at the same level; a mixture of the two is also allowed.

Figure 1. The graphical user interface of ICS
Interactive Pattern Evaluation

In the process of interactive evaluation, users need two types of evaluations.

Evaluation of the description of the training data: ICS keeps track of the number of objects that have been processed and covered by any one of the constructed rules, the number of correctly classified objects covered by consistent solutions and/or manually set solutions, and the number of correctly classified objects covered by consistent solutions only. These values are updated whenever a new node is added to a selected branch, a label is manually set for a branch, or a branch is deleted from the granule tree.

Evaluation of the prediction on the testing data with respect to the constructed classification rules: With the interactive method, one does not need to complete the whole training process before testing. The test process can be carried out whenever the user wishes. Similarly, the classification process can be stopped manually when the accuracy of the testing result is acceptable. The manipulations involved in this interactive evaluation phase are "TEST" and "DELETE", i.e., test the constructed rules at a specific time, and delete one or more rules that are not satisfactory, respectively. The user can decide either to use a random leave-out method or 5-fold cross-validation to divide the dataset into two parts: one part for constructing classification rules and trees, and the other for testing the accuracy of the learned rules and the granule tree. For covering-based granule trees, the order in which granules are added to the tree affects the accuracy of classification; this happens when inconsistent solutions are manually set.
Interactive Pattern Representation

Two threads are clearly shown in ICS. The first is the tree-branch-node thread. The constructed granule tree is the main result of the interactive classification. It is composed of many branches, and each branch has many nodes. We can consider the tree as a whole, or as many parts; each part can be represented, evaluated and manipulated separately. The second thread is the training-testing thread. A set of classification rules can be extracted from the granule tree; this rule set is the other main result of interactive classification. An approximate classification rule accepted by a user can also be used for testing, in the same way as a certain rule. The central part of ICS's user interface presents the first thread; the bottom part contains the second thread.
Conclusion

The contributions of this chapter are twofold. First, a framework for using a granule network for classification tasks is developed that is theoretically sound and technically practical. A classification problem can be modelled as a search for a partition or a covering, defined by a set of attribute-value pairs, in a granule network. The entire search space and many of the existing heuristics are studied. Second, an interactive classification system, ICS, is implemented for classifier construction. The proposed approach provides more freedom of choice regarding heuristics and measures according to the user's needs. The process is based on the idea that the classification task can be more useful if it incorporates user preference and user interaction. It overcomes the limitation of most existing automatic classification algorithms, which fix on one heuristic to decide where and how to classify.
References

Ankerst, M., Elsen, C., Ester, M., & Kriegel, H.P. (1999). Visual classification: An interactive approach to decision tree construction. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 392-396).
Brachman, R., & Anand, T. (1996). The process of knowledge discovery in databases: A human-centered approach. In Advances in knowledge discovery and data mining (pp. 37-57). Menlo Park, CA: AAAI Press & MIT Press.

Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and regression trees. Belmont, CA: Wadsworth Int. Group.

Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27, 349-370.

Cestnik, B., Kononenko, I., & Bratko, I. (1987). ASSISTANT 86: A knowledge-elicitation tool for sophisticated users. Proceedings of the 2nd European Working Session on Learning (pp. 31-45). Yugoslavia.

Chiew, V., & Wang, Y. (2004). Formal description of the cognitive process of problem solving. Proceedings of ICCI'04 (pp. 74-83).

Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4), 261-283.

Elm, W.C., Cook, M.J., Greitzer, F.L., Hoffman, R.R., Moon, B., & Hutchins, S.G. (2004). Designing support for intelligence analysis. Proceedings of the Human Factors and Ergonomics Society (pp. 20-24).

Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (Eds.) (1996). Advances in knowledge discovery and data mining. AAAI/MIT Press.

Finin, T., & Silverman, D. (1986). Interactive classification of conceptual knowledge. Proceedings of the First International Workshop on Expert Database Systems (pp. 79-90).

Han, J., Hu, X., & Cercone, N. (2003). A visualization model of interactive knowledge discovery systems and its implementations. Information Visualization, 2(2), 105-125.

Hatano, K., Sano, R., Duan, Y., & Tanaka, K. (1999). An interactive classification of Web documents by self-organizing maps and search engines. Proceedings of the 6th International Conference on Database Systems for Advanced Applications (pp. 19-22).

Lin, T.Y. (1997). Granular computing. Announcement of the BISC special interest group on granular computing.

Mannila, H. (1997). Methods and problems in data mining. Proceedings of the International Conference on Database Theory '97 (pp. 41-55).

Matlin, M.V. (1998). Cognition (4th ed.). Harcourt Brace and Company.

Mayer, R.E. (1992). Thinking, problem solving, cognition (2nd ed.). W.H. Freeman and Company.

Michalski, R.S., Carbonell, J.G., & Mitchell, T.M. (Eds.) (1983). Machine learning: An artificial intelligence approach. Palo Alto, CA: Morgan Kaufmann.

Nguyen, S.H., Skowron, A., & Stepaniuk, J. (2001). Granular computing: A rough set approach. Computational Intelligence, 17, 514-544.

Pedrycz, W. (Ed.) (2001). Granular computing: An emerging paradigm. Heidelberg: Physica-Verlag.

Quinlan, J.R. (1983). Learning efficient classification procedures and their application to chess end-games. In R.S. Michalski, J.G. Carbonell, & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (pp. 463-482). Palo Alto, CA: Morgan Kaufmann.

Quinlan, J.R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann.

Wang, Y. (2002a, August). On cognitive informatics. Keynote speech, Proceedings of the 1st IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 34-42). Calgary, Canada: IEEE CS Press.

Wang, Y. (2007a). The theoretical framework of cognitive informatics. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(1), 1-27. Hershey, PA: IGI Publishing.

Wang, Y. (2007b, July). The OAR model of neural informatics for internal knowledge representation in the brain. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(3), 64-75. Hershey, PA: IGI Publishing.

Wang, Y., & Kinsner, W. (2006, March). Recent advances in cognitive informatics. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 121-123.

Wang, Y., Wang, Y., Patel, S., & Patel, D. (2006, March). A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 124-133.

Yao, Y.Y. (2006). Granular computing for data mining. Proceedings of the SPIE Conference on Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, paper 624105.

Yao, Y.Y., & Yao, J.T. (2002). Induction of classification rules by granular computing. Proceedings of the 3rd International Conference on Rough Sets and Current Trends in Computing (pp. 331-338).

Yao, Y.Y., Zhao, Y., & Maguire, R.B. (2003). Explanation-oriented association mining using rough set theory. Proceedings of Rough Sets, Fuzzy Sets and Granular Computing (pp. 165-172).

Yao, Y.Y., Zhao, Y., & Yao, J.T. (2004). Level construction of decision trees in a partition-based framework for classification. Proceedings of SEKE'04 (pp. 199-205).

Yao, Y.Y., & Zhong, N. (1999). Potential applications of granular computing in knowledge discovery and data mining. Proceedings of the World Multiconference on Systemics, Cybernetics, and Informatics, 5, Computer Science and Engineering (pp. 573-580).

Yao, Y.Y., Zhong, N., & Zhao, Y. (2004). A three-layered conceptual framework of data mining. Proceedings of the ICDM Workshop on Foundations of Data Mining (pp. 215-221).

Zadeh, L.A. (1997). Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems, 90, 111-127.

Zhao, Y., & Yao, Y.Y. (2005). Interactive user-driven classification using a granule network. Proceedings of ICCI'05 (pp. 250-259).
Section IV
Knowledge Science
Chapter XVII
A Cognitive Computational Knowledge Representation Theory Mehdi Najjar University of Sherbrooke, Canada André Mayers University of Sherbrooke, Canada
Abstract

Encouraging results obtained in recent years in the field of knowledge representation within virtual learning environments confirm that artificial intelligence research on this topic benefits greatly from integrating the knowledge that psychological research has accumulated on the cognitive mechanisms of human learning, together with the positive results obtained in computational modelling theories. This chapter introduces a novel cognitive and computational knowledge representation approach inspired by cognitive theories that explain human cognitive activity in terms of memory subsystems and their processes, and whose aim is to suggest formal computational models of knowledge that offer efficient and expressive representation structures for virtual learning. Practical studies both validate the novel approach and permit general conclusions to be drawn.
Introduction

Almost since the advent of the computer age, researchers have recognised the computer's enormous potential as an educational aid. Although the idea of using software resources for teaching and learning purposes dates back more than three decades, recourse to virtual learning environments (VLE) in teaching and training constitutes an axis of interest that has not stopped growing. Indeed, this important technological concept is being considered by an increasing number of universities and colleges. Various attempts (Wells & Travis, 1996; Rzepa & Tonge, 1998; Lintermann & Deussen, 1999; Heermann & Fuhrmann, 2000) to create strongly interactive VLE have been made, generating a remarkable enthusiasm within the educational community. However, if one has the ambition to build such environments that provide specific teaching material and exploit
technology-based features, and which are equipped with tutorial strategies able to interact with learners having various levels of intelligence and different capacities of knowledge acquisition (especially, to adapt content to each student's profile and needs (Brusilovsky & Peylo, 2003) and to provide tailored aid to learners according to their cognitive states (de Rosis, 2001)), then understanding the human learning processes, and the manner in which knowledge is structured and handled during those processes, is a fundamental task. Recent multidisciplinary research on cognitive informatics (Wang, 2003; Wang et al., 2003; Wang & Wang, 2006; Wang & Kinsner, 2006), which studies the internal information processing mechanisms and processes of the brain (and investigates how human beings acquire, interpret and express knowledge by using the memory and the mind), leads us to seriously consider adopting a memory-based approach that perceives the memory as the foundation for any kind of intelligence. Incontestably, representing the knowledge acquired and handled by students during learning constitutes a real challenge. One solution to the issues expressed above could be offered by adopting a cognitive, computational and memory-based knowledge representation approach that formalises the structuring of the domain knowledge handled and/or acquired by learners during training activities via VLE. In this chapter, we introduce AURELLIO1, a cognitive and computational knowledge representation approach inspired by cognitive theories that explain human cognitive activity in terms of memory subsystems and their processes, and whose aim is to suggest formal computational models of knowledge representation. The proposed models are innovative in many respects: they (1) make parsimonious use of the cognitive structures suggested by psychology to dynamically encode knowledge, (2) take into account episodic knowledge (defined within a novel context) whose analysis serves a better understanding of learner behaviour (Najjar et al., 2006), and (3) treat the student's goals explicitly for reasoning purposes. The rest of the chapter is organised as follows. First, we present the AURELLIO knowledge representation theoretical approach. Second, we describe an AURELLIO-based authoring tool whose purpose is to facilitate modelling the domain knowledge via a user-centered graphical interface that is ergonomic and easy to use by non-experts in informatics; the objective of this section is to present in detail the various knowledge representation structures proposed by AURELLIO. Third, we report on two practical studies that aim to validate AURELLIO-based models of knowledge representation with respect to expressivity and efficiency. In the first study, the objective was to conceive an AURELLIO model that represents the domain knowledge of a technical and rigorous discipline: the usage of reduction rules for algebraic Boolean expressions. In the second study, the focus was on the cognitive aspects of AURELLIO knowledge representation and reasoning in comparison with ACT-R (Anderson, 1993), a famous and widely acknowledged cognitive architecture; here, the interest was in modelling interrupted activities and the consequences of interruptions on task achievement. Fourth, we underline some originalities of AURELLIO and discuss relations between our approach and the ACT-R knowledge representation theory.
In the last section, by way of conclusion, we mention our current work.
The AURELLIO Knowledge Representation Approach

If we are interested in education and teaching, and have the ambition to endow an artificial system with competence in those fields, we cannot ignore all that concerns training, cognition and memory. The latter is one of the most enthralling properties of the human brain: while it governs the essence of our daily activities, it also builds the identity, the knowledge, the intelligence and the affectivity of the human being (Baddeley, 1990). Rather than being a simple hardware device for data storage (as in the computer's case), the principal characteristic of this memory is that it carries out categorisation, generalisation and abstraction processes (Gagné et al., 1992). However, even though human memory has extraordinary faculties of conservation, we sometimes forget. This phenomenon occurs when information has not undergone suitable treatment; indeed, the organisation process is essential to the success of the mechanism of recall. In other words, the chances of retrieving a recollection (a fact in memory) depend on the specificity of the elements with which it has been linked. Such facts can be acquired explicitly (for example, by speech); they correspond to an explicit memory called declarative memory (whose contents are declarative knowledge, in the AI paradigm). Moreover, our practice and savoir-faire are largely implicit. They are acquired by repetitive exercise rather than
consciously, and correspond to an implicit memory called procedural memory. Whereas the latter is mainly made up of procedures acquired by practice, declarative memory can be subdivided into several types, such as semantic memory and episodic memory. Different approaches in cognitive psychology propose various sets of knowledge representation structures. Nevertheless, these sets are not necessarily compatible. Depending on the author, the semantic memory is sometimes called "declarative memory" and may contain an episodic memory (Anderson & Ross, 1980). The episodic/semantic distinction was debated for decades. Sophisticated experiments (Herrmann & Harwood, 1980; Tulving, 1983) tried to show that the two memory subsystems are functionally separate; other surveys argued against the distinction between them (Anderson & Ross, 1980). Recent neurological research (Shastri, 2002) showed that the episodic memory is distinct, by its neuronal characteristics, from the semantic memory. However, it seems that there is at least a significant overlap between the two memories, even if they are functionally different (Neely, 1989). Basically, it has been argued that knowledge is encoded in the various memory subsystems not according to its contents but according to the way in which these contents are handled and used, making the memory a large set of complex processes and modules in continual interaction (Baddeley, 1990). These subsystems are mainly divided into three main sections, each presenting a particular type of knowledge: (1) semantic knowledge (Neely, 1989), (2) procedural knowledge (Anderson, 1993) and (3) episodic knowledge (Tulving, 1983). Although there is consensus neither on the number of subsystems nor on their organisation, the majority of authors in psychology mention, in one form or another, these three types of knowledge.
The Semantic Knowledge Representation

Our knowledge representation approach regards semantic knowledge as concepts taken in a broad sense; thus, they can be any category of objects. Moreover, we subdivide concepts into two categories: primitive concepts and described concepts. The first is defined as a syntactically non-split representation; i.e., a primitive concept representation cannot be divided into parts. For example, in propositional calculus, the symbols "a" and "b" of the expression "(a & b)" are non-split representations of the corresponding propositions. On the other hand, we define a described concept as a syntactically decomposable representation. For example, the expression "(a & F)" is a decomposable representation of the conjunction between the proposition "a" and the truth constant "False" (two primitive concepts); the symbol "&" represents the logical conjunction operator (AND) and is also a primitive concept. In this way, the semantics of a described concept is given by the semantics of its components and of the relations which take those components as arguments to create the described concept. Thus, primitive or described concepts can be combined to represent any other described concept.
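A minimal sketch of the two concept categories is given below, under the assumption of invented Python class names (the chapter itself prescribes no programming-level encoding):

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class PrimitiveConcept:
    """Syntactically non-split representation, e.g. the proposition 'a',
    the truth constant False, or the conjunction operator '&'."""
    name: str

@dataclass(frozen=True)
class DescribedConcept:
    """Syntactically decomposable representation whose semantics is given
    by its components and the relation combining them."""
    relation: PrimitiveConcept
    components: Tuple[Union["PrimitiveConcept", "DescribedConcept"], ...]

a = PrimitiveConcept("a")
F = PrimitiveConcept("False")
AND = PrimitiveConcept("&")
a_and_F = DescribedConcept(AND, (a, F))   # the expression "(a & F)"
```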
The Procedural Knowledge Representation

The procedural memory subsystem serves to automate problem solving processes by decreasing the quantity of information handled and the time taken by resolutions (Sweller, 1988). In opposition to semantic knowledge, which can be expressed explicitly, procedural knowledge becomes apparent through a succession of actions achieved automatically, following the perception of internal and/or external stimuli, to reach desirable states. A procedure is a means of satisfying needs without using attentional resources. For example, procedural knowledge enables us to recognise words in a text, to write by means of a keyboard, to drive a car, or to add "42 + 11" mechanically without having to recall the algorithm explicitly, i.e., mentally summing the units, then the tens, and combining the two sums. Performing the addition of "42" and "11" automatically can be seen as procedural knowledge acquired by repetitive doing. This automation, via the use of procedures, reduces the cognitive complexity of problem solving. In its absence, the entire set of semantic knowledge must be interpreted in order to extract the relevant knowledge able to specify the real or cognitive actions necessary to achieve the task. In that case, the interpretation of semantic knowledge often exceeds the available cognitive capacities; this surpassing is one of the most frequent causes of students' errors during problem resolution (Sweller, 1988). However, a procedure can be transformed into semantic knowledge by means of reification. For example, a teacher who explains the sequence of actions to solve a problem reifies the corresponding procedure. Nevertheless, these
two types of knowledge (semantic and procedural) can coexist, since automation is not instantaneous; rather, it occurs over time and with frequency of use (Anderson, 1993). In our approach, we subdivide procedures into two main categories: primitive procedures and complex procedures. Executions of the former are seen as atomic actions. Executions of the latter are carried out by sequences of actions which satisfy scripts of goals. Each of those actions results from the execution of a primitive or complex procedure, and each of those goals is perceived as an intention of the student's cognitive system.
The Episodic Knowledge Representation

The episodic memory retains details about our experiences and preserves temporal relations, allowing the reconstruction of previously experienced events as well as the time and context in which they took place (Tulving, 1983). Note that the episode, seen as a specific form of knowledge, has been extensively used in a wide variety of research domains, such as modelling the cognitive mechanisms of analogy-making (Kokinov & Petrov, 2000), artificial intelligence planning (Garagnani et al., 2002), student modelling within ITS (Weber & Brusilovsky, 2001) and neuro-computing (Shastri, 2002). In our proposed approach, the episode representation is based on the instantiation of goals. These are seen as generic statements retrieved from semantic memory. Whereas the latter contains information about classes of instances (concepts), the episodic memory contains statements about instances of concepts (cognitions). As episodic knowledge is organised according to goals, each episode specifies a goal that translates an intention and gives a sense to the underlying events and actions. If the goal realisation requires the execution of a complex procedure formed by a set of n actions, then the goal will be composed of n subgoals whose realisations will be stored in n sub-episodes. Thus, executions of procedures are encoded in episodic memory, and each goal realisation is explicitly encoded in an episode. In this way, the episodic memory of a student model can store all the facts of a learning activity.
The Explicit Representation of Goals

In theory, a goal can be described using a relation as follows: (R X, A1, A2, .., An). This relation R allows goal X to be specified according to the primitive or described concepts A1, A2, .., An which characterise the initial state. Nevertheless, in practice, the stress is often laid on the methods for achieving the goal rather than on the goal itself, since these methods are, in general, what is practised. Consequently, the term "goal" is used to refer to the intention to achieve the goal rather than to the goal itself. Thus, procedures become methods carrying out this intention, which is noted R(A1, A2, .., An); a goal can be seen as a generic function for which the procedures play the role of methods. To underline the idea of intention, the expression representing R is an action verb. For example, the goal "reduce (F & T)" means the intention to simplify the conjunction of the truth constant "False" with the truth constant "True".
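Continuing the same invented encoding, goals can be rendered as generic functions whose methods are procedures, and episodes as records of goal realisations. The procedure names below are those used later in the chapter; the class names are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Procedure:
    name: str
    script: List["Goal"] = field(default_factory=list)  # empty => primitive

@dataclass
class Goal:
    intention: str            # an action verb, e.g. "reduce"
    arguments: Tuple          # cognitions, i.e. instances of concepts
    procedures: List[Procedure] = field(default_factory=list)

@dataclass
class Episode:
    goal: Goal                # the intention giving sense to the events
    procedure: Procedure      # the method that realised (or tried to realise) it
    sub_episodes: List["Episode"] = field(default_factory=list)

# The goal "reduce (F & T)", achievable by two alternative procedures:
reduce_f_and_t = Goal("reduce", ("F", "&", "T"),
                      [Procedure("AppRedConjTrue"),
                       Procedure("AppRedConjFalse")])
```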
The Authoring Tool

We have designed a domain knowledge generation authoring tool which attempts to offer a user-friendly environment allowing any subject-matter domain knowledge to be modelled graphically (according to our proposed knowledge representation approach) and transposed automatically into related XML files. These files serve as the knowledge support for a tutor's reasoning. The authoring tool lets experts represent and model knowledge without requiring advanced capabilities in computer science (for example, mastery of specification languages such as UML, and/or of programming languages). In this section, we use the description of the authoring tool environment (1) to present in detail the various knowledge representation structures proposed by the theory and (2) to highlight the ergonomic aspect of assisted modelling.
The Graphical Part

The left-hand side of the environment consists of a drawing pane where the various types of knowledge can be represented. Concepts and cognitions are represented by triangles. As mentioned, cognitions are concrete instances of concepts and are taken as parameters by goals, which pass them to the procedures that achieve those goals. Procedures are represented by circles and goals by squares. Abstract concepts and abstract goals are delimited by dashed contours; these abstract objects stand for categories of similar knowledge entities and thus have no concrete instances. Complex procedures and described concepts are delimited by bold contours. Our structural model offers two types of diagrams: procedural diagrams and conceptual diagrams. The former contain (1) specification links connecting a complex procedure to all the subgoals it specifies, (2) satisfaction links associating a goal with all the procedures which attempt to achieve it, and, optionally, (3) handling links involving a goal and its handled cognitions. Figure 1 shows a general view of the authoring tool environment, in which a procedural diagram defines that the goal "reduce (F & T)" (a specification of the abstract goal "reduce_Expression-Conjunction") can be achieved by means of two procedures: "AppRedConjTrue" and "AppRedConjFalse". The former applies the reduction rule for a conjunction with the "True" truth constant; the latter reduces conjunctions of expressions with the "False" truth constant. The diagram also defines that the goal "reduce (F & T)" handles three cognitions: "cst_t", "cst_f" and "oper_and". The first is a concrete instance of the concept "Constant_True", the second of the concept "Constant_False" and the third of the concept "Operator_Conjunction". Conceptual diagrams specify hierarchical links ("is-a") and aggregation links ("part-of") between primitive and/or described concepts of the domain knowledge. Figure 2 illustrates that the concepts "Constant_False" and "Constant_True" are specifications of the primitive concept "Truth_Constant", which inherits from the abstract concept "Object", and that "Operator_Conjunction" is a specification of the primitive concept "Logical_Operator", itself also a sub-concept of "Object".

Figure 1. A general view of the authoring tool environment
The Data Specification Part

The right-hand side of the environment permits the author(s) to specify detailed information about the knowledge entities. This information is organised in slots (see Figure 1 and Figure 2). The first four slots of a concept are metadata that provide general information about it. The "Identifier" slot is a character string used as a unique reference to the concept, "Name" contains the concept name (as presented to the learner), "Description" specifies its textual description and "Author" refers to its creator. The remaining slots are concept-specific data. "Type" indicates the concept type, which can be either primitive or described. The "Goals" slot contains a list of goal prototypes, providing information about goals that students could have which use the concept. While "Super-concepts" contains the list of concepts from which the concept inherits, "Sub-concepts" contains the list of concepts which inherit from it. This notion of inheritance between concepts can be seen as a shortcut available to authors to simplify modelling, but should not be regarded as a way to model the organisation of concepts in the semantic memory. The organisation currently accepted by the majority of psychologists is the Collins and Loftus model of spreading activation (Collins & Loftus, 1975), which states that inheritance links are a particular form of semantic knowledge that can be acquired and encoded as concepts. The "Components" slot is only significant for described concepts: it indicates, for each component of the concept, its concept type. Finally, "Teaching" points to didactic resources which can be used to teach the concept. Goals have four specific slots (in addition to all the concept slots). "Skill" describes the skill necessary to accomplish the goal, "Parameters" indicates the types of its parameters, "Procedures" contains the set of procedures which can be used to attain it and "Didactic-Strategies" suggests strategies for teaching how to realise the goal. Apart from the metadata slots, where "Description", "Author" and "Name" are identical to those of concepts and goals, each procedure is characterised by its specific data. The "Goal" slot indicates the goal for which the procedure was defined. "Parameters" specifies the concept types of the arguments. For complex procedures, "Script" indicates a sequence of goals to achieve. For primitive procedures, "Method" points to a Java method
that executes an atomic action. "Validity" is a pair of Boolean values: the first indicates whether the procedure is valid, so that it always gives the expected result; the second indicates whether it always terminates. "Context" fixes constraints on the use of the procedure. "Diagnosis-Solution" contains a list of pairs [diagnosis, strategy], indicating for each diagnosis the suitable teaching strategy to adopt. Finally, "Didactic-Resources" points to additional resources (examples, tests, etc.) for teaching the procedure.

Figure 2. An example of a conceptual diagram
Practical Validations

The experiments comprise two parts. In the first, the goal was to perform practical validations of an AURELLIO-based model in a real learning context. This first validation consists of: (1) modelling, by means of the authoring tool, the knowledge necessary for the usage of the reduction rules for algebraic Boolean expressions, which are taught to undergraduate students (at the University of Sherbrooke) by professors of mathematics or philosophy; (2) building a VLE which presents a problem solving milieu for the simplification of Boolean expressions; and (3) testing the VLE with students engaged in real learning activities. In the second part of the experiments, the interest was to emphasise the cognitive aspects of AURELLIO in comparison with ACT-R, a very popular and recognised cognitive approach to knowledge representation and reasoning. Here we were interested in modelling the progress of culinary activities, even though this does not fall within the framework of learning by means of virtual environments (as in the first part of our experiments).
The Learning Environment

The goal of our tool is both to help students learn Boolean reduction techniques and to increase the confidence of learners using the VLE. Preliminary notions, definitions and explanations constitute the theoretical background necessary to approach the Boolean reduction problem. This knowledge is organised into sections and is available for exploration via clickable buttons.
General Overview

In the theory section, the rules of Boolean simplification are stated and presented to students. In the explanation section, hints and thorough explanations on the suitable usage of the Boolean reduction rules are provided. In the examples section, examples are given; these are generated randomly (with a variable degree of difficulty chosen by the tutor). Students can also enter, by means of a visual keyboard, any Boolean expression they want and ask the system to solve it. The problem solving steps and the applied rules are shown on a blackboard. Examples show optimal solutions (a minimum of resolution steps and applied simplification rules) for simplifying Boolean expressions, and are provided to guide the learner during problem solving, which begins by clicking on the exercise button to access the corresponding section. Exercises, also of variable complexity levels, give learners opportunities to practise the tasks. Via the visual keyboard, students reduce a randomly generated Boolean expression by choosing suitable simplification rules and applying them in the order they want. Although various tutorial strategies could be considered, we currently use the "Cognitive Tutor" strategy (Anderson et al., 1995), implemented within several ITS and whose effectiveness has been largely proven (Aleven & Koedinger, 2002; Corbett, McLaughlin & Scarpinatto, 2000). Consequently, in the case of an erroneous rule choice (or application) on any of the sub-expressions forming the initial expression, the system notifies the learner and shows her/him (1) the selected sub-expression, (2) the rule applied to reduce it, (3) the resulting simplified sub-expression and (4) the current state of the global expression. For example, Figure 3 shows the resolution steps made by a learner (Marie) to reduce an expression in the exercise section.
Figure 3. The resolution steps made by Marie to reduce an expression
The Tests' Description

To test and experiment with our approach, we asked students in mathematics, who attend the courses "MAT-113" or "MAT-114" dedicated to logic calculus, to practise the reduction of Boolean expressions using our software. An assisted training session, aiming to familiarise them with the various components and tools available in the lab, was offered. Then we left them to practise individually with the learning environment. The data and parameters of this experiment are reported in Table 1. Through this experiment, our interest was in evaluating the authoring tool's effectiveness in generating knowledge XML files that transpose the dynamic aspect of the representational model (which will be explained and discussed in the fifth section). According to the theoretical approach described above, each step in a learner's resolution process (during a problem solving task) corresponds to a transition realisable by means of a primitive or complex procedure applied to satisfy a goal or a subgoal. This procedure handles primitive and/or described concepts such as rules, propositions, logical operators and truth constants. The results of the experimentation indicate that the generated files, produced via the authoring tool, permit a tutor prototype (embedded within the VLE) to properly interpret students' actions during task accomplishment and to offer personalised remediation for each student's errors.
Table 1. Main parameters of the experiment
Complexity            1    2    3    4    5
Number of exercises   4    4    5    6    6
Number of students    10   10   10   10   10
For each learner and each exercise done, the system records the procedures used as well as the cognitions created and handled. Since a procedure is generally called to achieve a goal, the collected data allows the deduction of the goals (and their subgoals) formulated during the Boolean reduction process. Once a trace of the resolution is saved in an "episodic" XML file, the error analysis consists of (1) scanning the content of the XML file to find errors that occurred during the reduction of the expression and, for each detected error, (2) identifying a valid procedure (which we note "P_valid") allowing the student's goal to be achieved and which could have been used (by the learner) instead of the erroneous procedure (which we note "P_error"). The identification of a correct procedure, one which makes use of a Boolean reduction rule, is made thanks to the XML file containing the domain knowledge (produced via the authoring tool). When the student makes errors, the tutor proposes to her/him a new Boolean expression (which we note "Expr_FBack") whose simplification will (in theory) make use of "P_valid". In this sense, "Expr_FBack" can be seen as a personalised remediation following the occurrence of "P_error".
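A sketch of this analysis loop is given below; since the chapter does not specify the XML schema of the episodic file, the tag and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

def analyse_episodes(episode_file, valid_procedures):
    """Scan an episodic trace and pair every erroneous step (P_error) with
    a valid procedure (P_valid) achieving the same goal, as a basis for
    generating a remediation expression Expr_FBack."""
    remediations = []
    for step in ET.parse(episode_file).getroot().iter("step"):
        if step.get("status") == "error":
            goal = step.get("goal")               # e.g. "reduce (F & T)"
            p_error = step.get("procedure")       # the erroneous procedure
            p_valid = valid_procedures.get(goal)  # looked up in the domain file
            if p_valid is not None:
                remediations.append((goal, p_error, p_valid))
    return remediations
```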
The Tiramisu Experiment

In a second practical validation, we were interested in the computational and cognitive modelling of behaviour during a food preparation that is suspended by an interruption, and in the impact of the interruption on task accomplishment. Through this experiment, the interest was to bring out the advantages of our knowledge representation approach in comparison with the popular ACT-R cognitive theory (Anderson, 1993).
The Knowledge Representation in ACT-R

The ACT-R theory is based on a great number of studies in experimental psychology and has had great success as a cognitive modelling approach. As mentioned, it allows a wide range of phenomena to be expressed, such as the acquisition and transfer of particular types of knowledge. The basic architecture of ACT-R consists of a set of modules, each devoted to processing a different kind of information; coordination of the behaviour of these modules is achieved through a central production system. Knowledge in long-term memory is divided into two distinct categories: declarative knowledge and procedural knowledge. The former is composed of elements of a descriptive nature called chunks (Halford, 1993). A chunk must be an instance of a chunk type, which defines the chunk attributes whose values are specified when the chunk is declared. Communication between the various modules is done via buffers; in each buffer, only one chunk at a time may be deposited and/or retrieved. New declarative knowledge is acquired by interpreting stimuli from the environment or by calling procedural knowledge. The latter consists of production rules that manipulate declarative knowledge. Each production has a "conditions → actions" representation. Whereas the conditions specify the buffer contents necessary to fire the rule, the actions (i) detail facts to add to buffers after the rule is achieved or (ii) modify the value of one or several chunk attributes in a buffer. In ACT-R, procedural knowledge is acquired in situations of learning by doing. Novel knowledge is initially stored in declarative form; then, with the frequency of activation during the learning process, this knowledge is compiled, generalised into rules, and finally treated as procedures. Thus, every capability is decomposable into minimal elements which are learned by doing. In this sense, learning consists of compiling several particular cases into a general production rule and practising with it until automation. For instance, the realisation of a complex task, such as LISP programming, corresponds to sequences of hundreds of production rules learned independently.
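The chunk/production distinction can be sketched as follows. Real ACT-R models are written in ACT-R's own modelling language, so this Python rendering is only an analogy, with invented buffer and slot names:

```python
# Declarative knowledge: a chunk is an instance of a chunk type.
chunk = {"isa": "last-ingredient", "value": "base"}

# Procedural knowledge: a "conditions -> actions" production rule.
def add_cocoa_on_base(buffers):
    """If the goal is to add a layer and the retrieval buffer holds
    last-ingredient = base, request placing cocoa (the layer that
    follows base in the recipe)."""
    goal, retrieval = buffers["goal"], buffers["retrieval"]
    if goal.get("isa") == "add-layer" and retrieval.get("value") == "base":
        buffers["manual"] = {"action": "place", "ingredient": "cocoa"}
        return True   # the rule fired
    return False

buffers = {"goal": {"isa": "add-layer"}, "retrieval": chunk}
print(add_cocoa_on_base(buffers), buffers["manual"]["ingredient"])  # True cocoa
```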
The Tiramisu Realisation

Abstractly, the realisation of the Tiramisu recipe consists of well-ordered layers of ingredients: placing twice the set <"base", "cocoa", "biscuit", "cocoa", "base", "cocoa", "biscuit", "cocoa"> and then adding the set <"base", "cocoa", "sugar">, where "cocoa" refers to a layer of cocoa powder, "biscuit" denotes a layer of coffee-dipped biscuits, and "base" symbolises the mascarpone cream mixed with egg yolk and sugar (see Figure 4). The experiment supposes that the cook (i) initially has these four ingredient mixtures ready to use and (ii) uses an opaque baking dish that allows only the last added layer to be noticed. A first type
of frequent fault due to an interruption (for example, a telephone call during the preparation of the recipe) is an erroneous layer order. In fact, a problem emerges when the task is continued (after the interruption) and the last placed ingredient (the only observable one) is cocoa (see Figure 5). As in this case three possibilities are to be considered (adding biscuit, adding base, or adding sugar), an error is highly probable in a lapse-of-memory situation, if there is no information about what has been done before. A second kind of error consists in being confused about the number of sets already placed and/or those remaining to be placed, without necessarily being mistaken about the order of the mixtures; for example, duplicating the set <"base", "cocoa", "biscuit", "cocoa"> more than twice, or adding it only once.

Figure 4. The well-ordered layers of the Tiramisu

Figure 5. A problematic situation within the Tiramisu's realisation
The ACT-R Based Model

To model the Tiramisu recipe preparation, we have defined various types of chunks. A first set of chunks contains information that makes it possible to know the number of ingredients already placed in the baking dish. A second set of chunks memorises the last mixture placed before the <cocoa> ingredient. To cause a fault due to an interruption, the simulation was stopped (at a randomly chosen time) and the model was forced to compute some arithmetical calculus (for a few seconds) before returning to the suspended Tiramisu realisation. Experimental tests show that the model was unable to remember the last chunk used before the interruption. In fact, the chunk that was always recalled was the one with the greatest activation value. According to the ACT-R theory, the activation of a chunk reflects (i) its general usefulness in the past and (ii) its relevance to the current context (Anderson et al., 2004). This activation is given by equation (1):

Ai = Bi + ∑j Wj Sji    (1)
where Bi is the base-level activation of the chunk i, Wj reflect the attentional weighting of the elements that are part of the current goal, and Sji are the strengths of association from the elements j to chunk i. The activation of a chunk controls both its probability of being retrieved and its speed of retrieval. In this sense, it is impossible to recover a given chunk according to its occurrence in time: only the activation law determines which chunk will be recovered. Since the impact of time on the calculation of the Bi factor decreases rapidly, after an interruption the number of uses of a chunk becomes the most decisive feature affecting its recall. This induces abnormal behaviours in our model, especially in the situation where the last placed, only observable, ingredient after the interruption was <cocoa>. In theory, to decide what to do next, the model must recall a chunk that memorises the last added ingredient. In practice, the recovered chunk will not be the one desired: the chunk recalled was the one used for deciding to lay <cocoa> in the previous step (prior to the interruption), and, as it has already been utilised, its activation value is increased. Each time an inappropriate chunk is recalled, its activation augments further; gradually, this chunk becomes the only one remembered. For example, if it specifies that the last ingredient is <base>, this will result in placing an infinite sequence of <biscuit> followed by <cocoa>. To resolve the mentioned problem, it was indispensable to enrich the defined chunks with additional slots, for example, adding the slot “is-last-created” to chunks of the type “last-ingredient” to permit the model to retain the last ingredient placed before the <cocoa> mixture. The “is-last-created” slot values are exclusively Booleans. However, this proposed solution, even if it demonstrates efficiency in memorising the ultimate steps done before interruptions, does not translate a right use of the human cognitive structures: by handling Boolean parameters, the model behaves and reasons more like a machine than like a human being.
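To illustrate why the activation law favours frequency of use over recency after an interruption, here is a small Python sketch of equation (1); the decaying base-level form and all numeric values are assumptions chosen for illustration, not parameters reported by the authors.

# A sketch of the activation law of equation (1): Ai = Bi + sum_j Wj * Sji.
# The base-level form below (log of a sum of power-law decays) is one
# common way to model Bi; decay parameter d and all numbers are made up.

import math

def activation(base_level, context):
    """context: iterable of (attentional weight Wj, association strength Sji)."""
    return base_level + sum(w * s for w, s in context)

def base_level(times_since_uses, d=0.5):
    """Base-level activation from the times elapsed since each use of the chunk."""
    return math.log(sum(t ** -d for t in times_since_uses))

# A chunk used four times in the past outscores a chunk used once, recently:
frequent = base_level([5.0, 10.0, 15.0, 20.0])   # the often-reused chunk
recent = base_level([1.0])                        # the chunk used just before the interruption
print(activation(frequent, [(0.5, 1.0)]), activation(recent, [(0.5, 1.0)]))

Running the sketch shows the often-used chunk outscoring the chunk used once just before the interruption, which is exactly the faulty recall behaviour observed in the experiment.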
The AURELLIO Based Model

The experiment of simulating the Tiramisu realisation shows that our ACT-R based model cannot memorise particular events in a temporal context. In particular, it cannot remember its last actions: for example, what it has just placed before adding <cocoa>, the last operation before the interruption occurred. Even if it is possible to simulate a memory of events by distinguishing between various occurrences of the same chunk (by defining several instances of this declarative knowledge entity), establishing the relation between each occurrence and the context in which it was created and handled remains missing from the model. By encoding each event in a suitable structure, it would be possible (i) to find it later by scanning the structure and (ii) to recreate all the characteristics of the context to which it was attached, i.e., the intention or need leading to the event's creation, the means used to satisfy (or try to satisfy) this need, the generated consequences, etc.

Modelling the Knowledge

We have used the authoring tool to model the knowledge related to the Tiramisu realisation. For example, Figure 6 shows a structural diagram which defines that the concepts “sugar”, “cocoa”, “none” (which stands for the emptiness of the baking dish), “biscuit” and “base” are five primitive concepts that inherit from the abstract concept “Ingredient”. The structural diagram also establishes that “OneBaseAdded”, “TwoBasesAdded” and “ThreeBasesAdded” are primitive sub-concepts of the abstract concept “BaseAddedRememberance”, which refers to the remembrance of the number of <base> layers already added in the baking dish. Figure 7 illustrates a part of the procedural diagram which defines that the subgoal “G_AddRightIngredient” (specified by the complex procedure “P_AddIngredient” called to achieve the goal “G_AddIngredient”) can be attained by means of one complex procedure (“P_AddOnCocoa”) or three others which are primitive (“P_AddBaseOnNone”, “P_AddCocoaOnBiscuit” and “P_AddCocoaOnBase”). Whereas the complex procedure “P_AddOnCocoa” decides what to do after an interruption if the last placed, only observable, ingredient is <cocoa>, “P_AddBaseOnNone” covers the bottom of an empty baking dish with <base>, “P_AddCocoaOnBiscuit” adds <cocoa> over <biscuit>, and “P_AddCocoaOnBase” spreads <cocoa> over <base>.

Figure 6. The Tiramisu structural diagram

Figure 7. Part of the Tiramisu procedural diagram

Implementing the Model

An AURELLIO-based simulator was designed. Its purpose is to emulate the realisation of the Tiramisu recipe and its temporary interruption by a secondary task. As shown in figure 8, the graphical interface of the simulator is divided into five panes and a legend that defines and explains the symbols used. The “Process Log” pane shows the handled procedural knowledge as well as the goals achieved by procedures. The “Human Actions” pane simulates a scenario of the recipe realisation and describes both the execution of the recipe’s steps and the internal operations of the model (data handling). The “Last Episode Remembered” pane shows an episodic knowledge entity which represents a past event that the model succeeded in remembering. The “Tiramisu’s Realisation” pane shows (i) the cook’s view of the baking dish (in which only the last added ingredient is observable) and (ii) the order—in relation to the given scenario—of the layers of ingredients. The interruption is represented in the diagram by a continuous red line. The interrupting task is represented in the “Secondary Task” pane; the reduction of Boolean expressions was chosen as the secondary task. All events which occur during a simulation are stored in an XML file.

Figure 8. The graphical interface of the simulator

Memorising, rather than being a mechanical act of storage, assumes the perception of coherent and logical structures; the human brain does not record information as a computer does, but assimilates, sorts and organises knowledge entities in various systems that link them and give them a sense. Accordingly, retrieving data in our model especially requires the initial context in which episodes were encoded. To remember the second last ingredient after an interruption, the model’s reasoning is based on the interrupted goal. Once this goal has been identified by scanning the episodic contents of the XML file, the model (1) looks for an episode whose goal is of type “G_AddIngredient” and which is chronologically nearest to the interrupted goal; (2) identifies the cognition created when achieving that goal (through the realisation of its sub-goals, if the need arises); this cognition represents the last added ingredient; and (3) expresses the intention to add a new ingredient over the last one, and therefore formulates a new goal of type “G_AddIngredient”. To remember the second last ingredient in the critical situation where the last added layer was <cocoa>, the model creates a goal of type “G_Remember_SecondLastAdded”, which can be achieved by the procedure “P_Get_SecondLastAdded”. The latter scours the episodic memory for the second chronologically nearest goal of type “G_AddIngredient” to the interrupted one. This goal represents the intention to add the second last ingredient before the interruption, and the model again identifies the cognition created after the satisfaction of that goal. The search process in the episodic memory is facilitated by (1) the hierarchical chaining between episodes, which links any episode to its predecessor, and (2) the chronological order of episodes, which is determined by the “Time” slot of each episode. In addition to this slot, which indicates the time at which the episode occurred, episodes are characterised by the following slots: (1) “Identifier” is a unique number randomly generated by the system; (2) “Goal-Episode” points to the goal identifier and to those of the handled cognitions; (3) “Procedure-Episode” contains a reference to the selected procedure; (4) the cognition obtained by the application
of the chosen procedure is stored in “Result”; (5) “Super-Episode” and (6) “Sub-Episodes” contain links to the outer and the inner hierarchical episodes, respectively; (7) “Status” takes a qualitative value (success, failure, on standby or aborted) according to the status of the current episode; and finally, (8) “Cost” comprises an estimated cost of the procedure usage.

Figure 9. A part of the episodic history

For example, figure 9 shows the episodic history of layering <sugar> after verifying that the second last added ingredient was the third layer of <base>. In that particular case (where the second last added ingredient was <base>), and in order to recall the number of its already added layers (to decide to add either <biscuit> or <sugar> over <cocoa>), the model starts the process of remembering the number of <base> layers placed. It combines this second last added ingredient with the cognition “OneBaseAdded”. Then, the procedure “P_Get_BaseAddedRememberance” is called to achieve the goal “G_Remember_BaseAdded”. This procedure seeks, in the XML file of the episodic memory, every episode with a goal of type “G_AddIngredient” whose result is a cognition of type <base>. If need be, the model associates, in order, with the found episodes the cognitions “TwoBasesAdded” and “ThreeBasesAdded”, which represent the fact of having already placed two layers—respectively, three layers—of <base>.
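The episode structure and the chronological scan described above can be sketched as follows in Python; the in-memory list stands in for the XML episodic store, and the field types are assumptions.

# A sketch of an episode with the eight slots listed above, plus the scan for
# the chronologically nearest "G_AddIngredient" episode (rank=2 gives the
# second last ingredient). Storage is a plain list standing in for the XML file.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Episode:
    identifier: int                  # unique number randomly generated by the system
    time: float                      # when the episode occurred (chronological order)
    goal_episode: str                # goal identifier (handled cognitions omitted here)
    procedure_episode: str           # reference to the selected procedure
    result: Optional[str]            # cognition obtained by applying the procedure
    super_episode: Optional[int]     # link to the outer hierarchical episode
    sub_episodes: list               # links to the inner hierarchical episodes
    status: str                      # success, failure, on standby, or aborted
    cost: float                      # estimated cost of the procedure usage

def nearest_add_ingredient(history, interrupted_time, rank=1):
    """Return the rank-th chronologically nearest 'G_AddIngredient' episode
    that occurred before the interrupted goal, or None if there is none."""
    candidates = [e for e in history
                  if e.goal_episode.startswith("G_AddIngredient")
                  and e.time < interrupted_time]
    candidates.sort(key=lambda e: interrupted_time - e.time)
    return candidates[rank - 1] if len(candidates) >= rank else None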
Concluding Remarks

Through the Tiramisu experiment, we have shown that in momentarily interrupted realisations of a cooking recipe, the widely acknowledged ACT-R knowledge representation approach cannot offer a model that faithfully reproduces the usual human behaviour. More precisely, we highlighted the incapacity of the ACT-R theory to reproduce properly the recall of information in a temporal context (Najjar et al., 2005). We emphasized its failure in memorising remembrances, and we have shown that AURELLIO—which uses additional knowledge structures that are inspired by the human memory—allows recollections to be properly encoded, retrieved and reproduced, and thus attempts to offer models that come close to natural human behaviour. Even if our model has proved its success in remembering events in a much more natural way than the ACT-R model, its behaviour during the recall process, although it tries to resemble the human one, remains somewhat computational. In fact, the human localisation of remembrances does not consist of exhaustively exploring the memory to extract the interrelated events among which the recollection to be located lies. In reality, the use of points of reference facilitates the expression of a given memory. These markers are states of consciousness imperatively necessary to the correct functioning of the recollection mechanism. If, to reach a distant recollection, one had to follow the whole series of events which separate him from it, memory would be impossible as a structure, because of the complexity of the operation. In this sense, our model should memorise knowledge in order to make it operational by a complex act of recall which unfolds in successive stages and which, ideally, considers phases of forgetfulness that belong to the process of memorising and recollection. Thus, it would be necessary to take account of an equation of recall using a law of activation which controls access to episodes. In this way, a distant event, or a recollection whose encoding was not made properly, could not be completely and/or always recalled by the model (which is quite natural and usual for a human being).
Discussion

Cognitive informatics has proved that it is very beneficial to integrate, into the new generation of software and information technologies, the encouraging knowledge that studies of the internal information processing mechanisms and processes of the brain have accumulated (Shao & Wang, 2003; Wang, 2005). In this sense, we think that it would be advantageous and practical to be inspired by a psychological cognitive approach, which offers a fine modelling of the human process of knowledge handling, for representing both the learner and the domain knowledge within virtual learning environments. Our hypothesis is that the proposed knowledge structures, because they are quite similar to those used by human beings, offer a more effective knowledge representation (for example, for a tutoring purpose). In addition, we chose a parsimonious use of the cognitive structures suggested by psychology to encode knowledge. Indeed, we divide these structures into two categories: on one hand, semantic and procedural knowledge, which is common, potentially accessible and can be shared—with various mastery degrees—by all learners; and, on the other hand, episodic knowledge, which is specific to each learner and whose contents depend on the way in which the common knowledge (semantic and procedural) is perceived and handled. More precisely, primitive units of semantic and procedural knowledge—chosen with a small level of granularity—are used to build complex knowledge entities which are dynamically combined in order to represent the learner's knowledge. The dynamic aspect is seen in the non-predefined combinations between occurrences of concepts and the procedures applied to handle them, which translate the learner's goals. Generally, the complex procedure “P” selected to achieve a given goal “G” determines the number and order of the subgoals of “G” (each of which can be achieved, in turn, by a procedure called, in this case, a sub-procedure of “P”). The choice of “P” depends on the learner's practices and preferences when s/he achieves the task. This means that goal realisation can be made in various ways, by various scenarios of procedure execution sequences. Therefore, the number and chronological order of the subgoals of “G” are not predefined; thus, the learner's cognitive activity is not determined systematically, in a static way, starting from her/his main goal. Traces of this cognitive activity during problem solving are stored as specific episodic knowledge. This allows a tutor to scan the episodic knowledge model that the system formulates in its representation of the learner, to determine—via reasoning strategies—the degree of mastery of procedural knowledge and/or the acquisition level of semantic knowledge. Our distinction between semantic and procedural knowledge is mainly based on the criteria of the ACT-R theory (Anderson, 1993). However, AURELLIO takes into account an additional component of the declarative memory: the episodic memory, a structure which is characterised by the capacity to encode information about lived facts (Richards & Goldfarb, 1986). Humphreys et al. (1989), for instance, affirm that cognitive-based models which do not make a distinction between declarative and episodic memory cannot distinguish various occurrences of the same element of knowledge2. An episodic structure appears advantageous for the analysis of the trace of learning tasks: this trace can be examined for a better understanding of the learner's reasoning.
In addition, the episodic knowledge structuring suggested by AURELLIO places any episode in a novel hierarchical context. The proposed structures connect concrete episodes in a hierarchy which is not of the generalisation/specialisation type (Weber & Brusilovsky, 2001), but rather of the event/sub-events type, where the event is represented by an ancestor episode and the sub-events are represented by a line of descendant episodes. Thus, for an adequate strategic reasoning, it is possible—for example—for an intelligent agent to scan and scrutinise the episodic history in order to extract relevant indices directly from concrete episodes. Another original aspect of our approach is the explicit introduction of goals into the knowledge representation. Although they are treated by means of procedures, we consider goals to be a special case of knowledge that represents the intentions behind the actions of the cognitive system; i.e., a goal is seen as a semantic knowledge entity which describes a state to be reached. The fact that there exists a particular form of energy employed to acquire goals distinguishes them from any standard form of knowledge. This distinction involves a different treatment for goals in the human cognitive architecture (Altmann & Trafton, 2002). We propose to treat goals explicitly and to reify them as particular semantic knowledge which is totally distinct from that which represents objects.
Conclusion

We have presented a knowledge representation approach which combines some cognitive theories on knowledge handling and acquisition with computational modelling techniques. We have described an authoring tool which makes it possible to represent—according to the proposed approach—any domain knowledge. We have depicted practical validations of the authoring tool and the related knowledge representation models. We have finally emphasized some original aspects of our theory. We are currently investigating a new idea for integrating pedagogic and didactic knowledge in our knowledge representation approach. We are also performing advanced user tests with the authoring tool. Test results will undoubtedly lead to the software's improvement. In parallel, we are elaborating the equation of recall which uses an activation law to control the remembrance degree of episodes in the memory.
Acknowledgment

The authors wish to thank (1) Froduald Kabanza for his instructive comments on an early version of this chapter, (2) Jean-François Lebeau and Francis Bouchard for their help on the realisation of the Tiramisu experiment and (3) Philippe Fournier-Viger for his contribution to the realisation of the Boolean reduction VLE and the Tiramisu experiment.
References

Aleven, V. & Koedinger, K. 2002. An Effective Metacognitive Strategy: Learning by doing and explaining with computer-based Cognitive Tutors. Cognitive Science, 26(2): 147-179.
Altmann, E. & Trafton, J. 2002. Memory for goals: An Activation-Based Model. Cognitive Science, 26: 39-83.
Anderson, J. R. 1993. Rules of the Mind. Lawrence Erlbaum Eds.
Anderson, J. R., Corbett, A. T., Koedinger, K. R. & Pelletier, R. 1995. Cognitive Tutors: Lessons learned. The Journal of Learning Sciences, 4(2): 167-207.
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C. & Qin, Y. 2004. An integrated theory of the mind. Psychological Review, 111(4): 1036-1060.
Anderson, J. R. & Ross, B. H. 1980. Evidence against a semantic-episodic distinction. Journal of Experimental Psychology: Human Learning and Memory, 6: 441-466.
Baddeley, A. 1990. Human Memory: Theory and Practice. Hove (UK): Lawrence Erlbaum.
Brusilovsky, P. & Peylo, C. 2003. Adaptive and intelligent Web-based educational systems. International Journal of AI in Education, 13(2): 159-172.
Collins, M. & Loftus, F. 1975. A spreading activation theory of semantic processing. Psychological Review, 82: 407-428.
Corbett, A., Mclaughlin, M. & Scarpinatto, K. C. 2000. Modeling Student Knowledge: Cognitive Tutors in High School and College. Journal of User Modeling and User-Adapted Interaction, 10: 81-108.
de Rosis, F. 2001. Towards adaptation of interaction to affective factors. Journal of User Modeling and User-Adapted Interaction, 11(4).
Gagné, R., Briggs, L. & Wager, W. 1992. Principles of Instructional Design (4th edition). New York: Holt, Rinehart & Winston.
Garagnani, M., Shastri, L. & Wendelken, C. 2002. A connectionist model of planning as back-chaining search. Proceedings of the 24th Conference of the Cognitive Science Society, Fairfax, Virginia, USA, pp. 345-350.
Halford, G. S. 1993. Children’s understanding: The development of mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.
Heermann, D. & Fuhrmann, T. 2000. Teaching physics in the virtual university: the Mechanics toolkit. Computer Physics Communications, 127: 11-15.
Hermann, D. & Harwood, J. 1980. More evidence for the existence of separate semantic and episodic stores in long-term memory. Journal of Experimental Psychology, 6(5): 467-478.
Humphreys, M. S., Bain, J. D. & Pike, R. 1989. Different ways to cue a coherent memory system: A theory for episodic, semantic and procedural tasks. Psychological Review, 96: 208-233.
Kokinov, B. & Petrov, A. 2000. Dynamic extension of episode representation in analogy-making in AMBR. Proceedings of the 22nd Conference of the Cognitive Science Society, NJ, pp. 274-279.
Lintermann, B. & Deussen, O. 1999. Interactive Structural and Geometrical Modeling of Plants. IEEE Computer Graphics and Applications, 19(1).
Najjar, M., Fournier-Viger, P., Lebeau, J. F. & Mayers, A. 2006. Recalling Recollections According to Temporal Contexts: Applying of a Novel Cognitive Knowledge Representation Approach. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI’06), July 17-19, Beijing, China.
Najjar, M., Fournier-Viger, P., Mayers, A. & Bouchard, F. 2005. Memorising Remembrances in Computational Modelling of Interrupted Activities. Proceedings of the 7th International Conference on Computational Intelligence and Natural Computing, July 21-26, Salt Lake City, Utah, USA, pp. 483-486.
Neely, J. H. 1989. Experimental dissociations and the episodic/semantic memory distinction. Experimental Psychology: Human Learning and Memory, 6: 441-466.
Richards, D. D. & Goldfarb, J. 1986. The episodic memory model of conceptual development: an integrative viewpoint. Cognitive Development, 1: 183-219.
Rzepa, H. & Tonge, A. 1998. VChemlab: A virtual chemistry laboratory. Journal of Chemical Information and Computer Sciences, 38(6): 1048-1053.
Shao, J. & Wang, Y. 2003. A New Measure of Software Complexity based on Cognitive Weights. IEEE Canadian Journal of Electrical and Computer Engineering, 28(2): 69-74.
Shastri, L. 2002. Episodic memory and cortico-hippocampal interactions. Trends in Cognitive Sciences, 6: 162-168.
Sweller, J. 1988. Cognitive load during problem solving: effects on learning. Cognitive Science, 12: 257-285.
Tulving, E. 1983. Elements of Episodic Memory. New York: Oxford University Press.
Weber, G. & Brusilovsky, P. 2001. ELM-ART: An adaptive versatile system for Web-based instruction. International Journal of AI in Education, 12(4): 351-384.
Wang, Y. 2003. Cognitive Informatics: A New Transdisciplinary Research Field. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2): 115-127.
Wang, Y. & Kinsner, W. 2006. Recent Advances in Cognitive Informatics. IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), March: 121-123.
Wang, Y. & Wang, L. 2006. Cognitive Informatics Models of the Brain. IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), March: 203-207.
Wang, Y., Liu, D. & Wang, Y. 2003. Discovering the Capacity of Human Memory. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2): 189-198.
Wang, Y. 2005. The Development of the IEEE/ACM Software Engineering Curricula. IEEE Canadian Review, 51(2), May: 16-20.
Wells, L. K. & Travis, J. 1996. LabVIEW for Everyone: Graphical Programming Made Even Easier. Prentice Hall Eds., NJ.
Endnotes

1. Semantic, procedural, episodic and goal-based knowledge representation theory.
2. To fill this gap, ACT-R permits the creation of several instances of the same chunk type in order to indirectly simulate the existence of an episodic memory. However, this form of memory does not have an explicit structure.
Chapter XVIII
A Fixpoint Semantics for Rule-Base Anomalies

Du Zhang
California State University, USA
Abstract

A crucial component of an intelligent system is its knowledge base, which contains knowledge about a problem domain. Knowledge base development involves domain analysis, context space definition, ontological specification, and knowledge acquisition, codification and verification. Knowledge base anomalies can affect the correctness and performance of an intelligent system. In this chapter, we describe a fixpoint semantics for a knowledge base that is based on a multi-valued logic. We then use the fixpoint semantics to provide formal definitions for four types of knowledge base anomalies: inconsistency, redundancy, incompleteness, and circularity. We believe such formal definitions of knowledge base anomalies will help pave the way for a more effective knowledge base verification process.
Introduction

Computing plays a pivotal role in our understanding of human cognition (Pylyshyn, 1989). The classical cognitive architecture for intelligent behavior assumes that both computers and minds have at least the following three distinct levels of organization (Pylyshyn, 1989). (a) The semantic level, or the knowledge level, where the behavior of human beings or appropriately programmed computers can be explained through the things they know and the goals they have. It attempts to establish, in some meaningful or even rational ways, connections between the actions (by human or computer) and what they know about their world. (b) The symbol level, where the semantic content of knowledge and goals is assumed to be encoded through structured symbolic expressions. It deals with representation, structure and manipulation of symbolic expressions. (c) The physical or biological level, where the physical embodiment of an entire system (human or computer) is considered. It encompasses the structure and the principles by which a physical object functions. Pylyshyn's cognitive penetrability criterion states that “the pattern of behavior can be altered in a rational way by changing subjects’ beliefs about the task” (Pylyshyn, 1989). It is the subjects’ tacit knowledge about the world, not the properties of the architecture, that enables such behavior adjustment.
The hallmark of a knowledge-based system is that by design it possesses the ability to be told facts about its world and to alter its behavior accordingly (Brachman & Levesque, 2004). It exhibits the property of cognitive penetrability. Today, knowledge-based systems not only play an important role in furthering the study in cognitive informatics (Wang et al., 2002; Patel et al., 2003; Chan et al., 2004; Kinsner et al., 2005; Wang, 2002, 2007; Wang and Kinsner, 2006), but also have found their way into many problem domains (Cycorp, 2006) and have been utilized to generate numerous successful applications (IBM, 2006; Ross, 2003). A crucial component of an intelligent system or a knowledge-based system is its knowledge base (KB), which contains knowledge about a problem domain (Brachman & Levesque, 2004; Fagin et al., 1995; Levesque & Lakemeyer, 2000). Knowledge base development involves domain analysis, context space definition, ontological specification, and knowledge acquisition, codification and verification (Zhang, 2005). When developing a KB for an application, it is important to recognize the context under which we formulate and reason about domain-specific knowledge. A context is a region in some n-dimensional space (Lenat, 1998). In a KB development process, domain analysis should result in identification of the region of interest in the context space. Specifying a context entails specifying or locating a point or region along each of those n dimensions. Once the context (or contexts) for a problem domain is identified, ontological development is in order. An ontology is a formal, explicit specification of a shared conceptualization (Chandrasekaran et al., 1999; Gomez-Perez et al., 2004; O’Leary, 1998). After the conceptualization is in place, knowledge acquisition, codification and verification can be carried out to build the KB for some application. Inevitably, there will be anomalies in a KB as a result of existing practices in its development process. Knowledge base anomalies can affect the correctness and performance of an intelligent system, though some systems are robust enough to perform rationally in the presence of the anomalies. It is necessary to define KB anomalies formally before identifying where they are in a KB and deciding what to do with them. In this chapter, our focus is on formal definitions of KB anomalies and on the issue of how to identify them. Our attention is on rule-based KBs. A rule-based KB has a set of facts that is stored in a working memory (WM) and a set of rules stored in a rule base (RB). Rules represent general knowledge about an application domain. They are entered into a RB during initial knowledge acquisition or subsequent KB updates. Facts in a WM provide specific information about the problems at hand and may be elicited either dynamically from the user during each problem-solving session, or statically from the domain expert during the knowledge acquisition process, or derived through rule deduction. We assume that rules in a KB have the following format: P1 ∧...∧ Pn → R, where the Pis are the conditions (collectively, the left-hand side, LHS, of a rule), R is the conclusion (or right-hand side, RHS, of a rule), and the symbol “→" is understood as the logical implication. The Pis and R are literals. If the conditions of a rule instance are satisfied by facts in WM, then the rule is enabled and its firing deposits its conclusion into WM. A fact is represented as a ground atom.
It specifies an instance of a relationship among particular objects in the problem domain. WM contains a collection of positive ground atoms which are deposited through either assertion (initially or dynamically) or rule deduction. A negated condition ¬p(x) in the LHS of a rule is satisfied if p(x) is not in WM for any x. A negated ground atom ¬p(a) in the LHS of a rule is satisfied if p(a) is not in WM. A negated conclusion ¬R in the RHS of a rule results in the removal of R from WM, when the LHS of the rule is satisfied1. Rule instances and negated literals can be utilized by the inference system, but are never deposited into WM (Ginsberg & Williamson, 1993). Let WM0 denote the initial state of WM. We use WMi (i = 1, 2, 3, …) to represent subsequent states of WM as a result of firing all enabled rules under the state WMi-1. For the basic concepts and terminology of first-order predicate logic, readers are referred to (Ben-Ari, 1993; Chang & Lee, 1973). The rest of the chapter is organized as follows. Section 2 offers a brief review of the related work. Section 3 discusses the four types of KB anomalies. Section 4 describes the fixpoint semantics we adopt for a KB. Formal definitions of the KB anomalies are given in Section 5 in terms of the fixpoint semantics. Finally, Section 6 concludes the chapter with remarks on future work.
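The WM transition just described can be sketched as follows in Python; the sketch handles only ground, positive rules, leaving out variables and negated conclusions.

# A sketch of one WM transition (WMi -> WMi+1): fire all enabled rules and
# deposit their conclusions. Facts are ground atoms encoded as tuples;
# a rule is a (conditions, conclusion) pair.

def fire_all(wm, rules):
    """Return WMi+1 given WMi: add the conclusion of every enabled rule."""
    new_wm = set(wm)
    for conditions, conclusion in rules:
        if all(c in wm for c in conditions):   # LHS satisfied by facts in WM
            new_wm.add(conclusion)             # firing deposits the RHS into WM
    return new_wm

rules = [((("p", "a"),), ("q", "a")),                  # p(a) -> q(a)
         ((("q", "a"), ("r", "a")), ("s", "a"))]       # q(a) ^ r(a) -> s(a)
wm0 = {("p", "a"), ("r", "a")}
wm1 = fire_all(wm0, rules)     # adds q(a)
wm2 = fire_all(wm1, rules)     # adds s(a)
print(wm2)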
Related Work

Early work in rule-base verification and validation treated the anomalies as specific deficiencies, and focused on devising algorithms to detect them. The approaches were based on formalizing KB either in terms of some graphical model (Zhang & Nguyen, 1994), or as a quasi logic theory (Ginsberg & Williamson, 1993), or in some knowledge representation formalism (Nguyen et al., 1987). For a summary of the previous results and additional references, please refer to (Menzies & Pecheur, 2005). There are three ways to assign meanings to a logical theory: operational, model-theoretic, and fixpoint (van Emden & Kowalski, 1976). Operational semantics is based on the deduction of a set of ground atoms through some inference method. Model-theoretic semantics defines a set of ground atoms that are a logical consequence of the given logical theory. Fixpoint semantics associates a transformation with the given logical theory and uses the transformation's fixpoint to denote the meanings of components in the theory. All three semantics are proved to be equivalent for certain logical theories (e.g., definite clause programs) (van Emden & Kowalski, 1976). In goal-oriented inference systems, various computation rules compute the relations determined by the fixpoint semantics (van Emden & Kowalski, 1976). However, because of the discrepancy between a first-order logical theory and a KB (Rushby & Whitehurst, 1989), the existing results thus far include some conservative semantics for KB. Model-theoretic flavored approaches include (Zhang & Luqi, 1999; Levy & Rousset, 1998; Rushby & Whitehurst, 1989). Both consistency and completeness are defined in (Rushby & Whitehurst, 1989), whereas the work in (Levy & Rousset, 1998) addresses consistency, completeness and dependency. The results in (Zhang & Luqi, 1999) discuss not only consistency and completeness, but also redundancy and circularity. The difficulties in defining an operational semantics for KB are discussed in (Rushby & Whitehurst, 1989). An approximate imperative semantics for KB is outlined in (Rushby & Whitehurst, 1989), which relies on establishing an invariant for the rule base. Though the fixpoint approach has been used to delineate semantics in logic programming (see Section 4) and to underpin the concept of common knowledge in (Fagin et al., 1995), it has not been utilized to characterize KB anomalies. Finally, the issues of knowledge consistency and completeness are also dealt with in the context of default reasoning (Brachman & Levesque, 2004).
Rule-Base Anomalies

In this chapter, we are interested in the formal definitions of the following KB anomalies:

1. Inconsistency.
2. Redundancy.
3. Incompleteness.
4. Circularity.
Before we proceed to the definitions of KB anomalies in terms of the fixpoint semantics, we need to say a few words about the language used for KB. We assume that the language is expressive enough to allow the following terms to be properly defined: same, synonymous, complementary, mutual exclusive, and incompatible literals (L1 and L2 in Table 1 are literals).
Inconsistency

Under the semantics of classical logic, if a KB derives conclusions that are contradictory, the KB is said to contain inconsistent knowledge. The root cause of KB inconsistency lies in the rules in RB, but its manifestation is through WM. For instance, the inconsistency of a RB containing a pair of rules {p(x) → q(x), p(x) → ¬q(x)} is not apparent until a fact p(a) is asserted into WM and both rules are enabled and fired. In general, although the rules in a RB are consistent on their own (because there exists a model for them), they can form an inconsistent
theory when combined with certain facts in WM. In order for a KB to be consistent, there needs to be a model for both RB and WM. On the other hand, facts in WM are changing over time due to dynamic assertions and retractions. RB may be consistent with WMi, but inconsistent with WMj where i ≠ j. Thus, relying on a particular WM state in verifying the consistency of RB may not produce an accurate result. KB consistency has both spatial and temporal properties. Spatially, KB consistency can be local within a context or global among several contexts. Temporally, it can be transient with regard to some WMi or persistent for all WMi. There are logical consistency and output consistency (whether the same set of inputs produces the same set of outputs or several sets of outputs) (Rushby & Whitehurst, 1989). In this chapter, we are interested in characterizing logical, intra-context and persistent consistency in a KB. Our goal is to find a way to identify the types of inconsistency that result in derivations of complementary, mutual exclusive, or incompatible conclusions. KB inconsistency can be attributed to a number of factors:

• Merging or fusing of knowledge bases results in disagreeing rules being introduced.
• In a distributed knowledge base environment where a federation of geographically dispersed KBs is deployed, contradictive knowledge can be present at different sites.
• The development language for a KB allows for certain literals, such as complementary, mutual exclusive, or incompatible, to be expressively used in the KB, and the ontology explicitly sanctions those concepts.
• There is a lack of explicit constraints in the ontology specification (e.g., the ontology does not specify that animals and vegetables are mutually exclusive and jointly exhaustive in living things).
• Honest programming errors in complex applications may be possible causes.
• The need for assertion lifting (i.e., lifting, or importing, assertions from one context to another (Lenat, 1998)) may also be a cause.
• Redundancy-induced circumstances where some redundant rule is modified, but others are not (see below).
Redundancy

Declaratively, KB redundancy neither diminishes nor increases the set of KB entailments. Operationally, KB redundancy may lead to the following anomalous situations. (a) During KB maintenance or evolution, if one of the redundant rules is modified and the others remain unchanged, then the updated KB will not correspond to the intended change, and inconsistencies can be introduced as well. (b) For a KB where no certainty factors are utilized, redundant rules may be enabled under a given state, thus resulting in a performance slowdown, because all the enabled redundant rules may be fired even though their firings will yield the same set of literals (conclusions). (c) For a KB containing certainty factors, redundancy becomes a serious problem: each redundant rule's firing results in a duplicate counting of the same information, which, in turn, erroneously increases the level of confidence assigned to the derived literals (conclusions). This may ultimately affect the set of deducible literals. Spatially, KB redundancy can occur within a context or among several contexts. There are several types of redundancy (Zhang & Luqi, 1999). Causes for redundancy include knowledge base merging, programming errors, and assertion lifting.

Table 1. Same, synonymous, complementary, mutual exclusive, and incompatible literals (L1 and L2 are literals)

Identical syntax, equivalent semantics. Same: denoted as L1 = L2. L1 and L2 are syntactically identical (same predicate symbol, same arity, and same terms at corresponding positions).*
Different syntax, equivalent semantics. Synonymous: denoted L1 ≅ L2. L1 and L2 are syntactically different, but logically equivalent.†
Identical syntax, conflicting semantics. Complementary: denoted L1 # L2. L1 and L2 are an atom and its negation.
Different syntax, conflicting semantics. Mutual exclusive: L1 and L2 are syntactically different and semantically have opposite truth values. Incompatible: denoted L1 ≭ L2. L1 and L2 are a complementary pair of synonymous literals.

* Given two rules ri and rk, if LHS(ri) = {P1,...,Pn} and LHS(rk) = {P1′,...,Pn′}, then LHS(ri) = LHS(rk) iff ∀i ∈ [1, n] Pi = Pi′.
† Given two rules ri and rk, if LHS(ri) = {P1,...,Pn} and LHS(rk) = {P1′,...,Pn′}, then LHS(ri) ≅ LHS(rk) iff ∀i ∈ [1, n] Pi ≅ Pi′.
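Only sameness and complementarity in Table 1 are purely syntactic; synonymy, mutual exclusion and incompatibility require domain knowledge. The Python sketch below therefore stubs that knowledge in as lookup tables; the table contents are hypothetical.

# A sketch of the literal relations of Table 1. A literal is encoded as a
# (sign, predicate, arguments) triple. SYNONYM and EXCLUSIVE stand in for
# domain/ontology knowledge and hold made-up entries.

SYNONYM = {("grown_up", ("a",)): ("adult", ("a",))}                # hypothetical
EXCLUSIVE = {frozenset([("animal", ("a",)), ("vegetable", ("a",))])}  # hypothetical

def canon(pred, args):
    """Map an atom to a canonical representative of its synonym class."""
    return SYNONYM.get((pred, args), (pred, args))

def same(l1, l2):
    return l1 == l2                               # identical syntax and sign

def complementary(l1, l2):
    (s1, p1, a1), (s2, p2, a2) = l1, l2
    return (p1, a1) == (p2, a2) and s1 != s2      # an atom and its negation

def synonymous(l1, l2):
    (s1, p1, a1), (s2, p2, a2) = l1, l2
    return l1 != l2 and s1 == s2 and canon(p1, a1) == canon(p2, a2)

def incompatible(l1, l2):
    # complementary pair of synonymous literals
    (s1, p1, a1), (s2, p2, a2) = l1, l2
    return s1 != s2 and (p1, a1) != (p2, a2) and canon(p1, a1) == canon(p2, a2)

def mutual_exclusive(l1, l2):
    (s1, p1, a1), (s2, p2, a2) = l1, l2
    return frozenset([(p1, a1), (p2, a2)]) in EXCLUSIVE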
Incompleteness

A KB is incomplete when it does not have all the necessary information to answer a question of interest in an intended application (Brachman & Levesque, 2004; Levesque, 1984; Rushby & Whitehurst, 1989). Thus, completeness represents a query-centric measure of the quality of a KB. It is a challenging issue for the following reasons: (a) In many applications, the KB is built in an incremental and piecemeal fashion and undergoes continual evolution; the information acquired at each stage of the evolution may be vague or indefinite in nature. (b) The deployment of a KB system cannot simply wait for the KB to be stabilized in some final and complete form, since this may never happen. Despite the fact that a practical KB may never exhaustively capture knowledge in all aspects of a real problem domain, it is still possible for a KB to be complete for a specific area in the domain. The boundaries of this specific area can be defined in terms of all relevant queries that can be asked during problem-solving sessions. If a KB has all the information to answer those relevant queries definitely, then the KB is complete with regard to those queries. The concepts of relevant queries and the ability of a KB to answer those queries are what underpin our discussion of KB completeness. Given a KB, we define ℙKB and ℙA as the sets of all predicate symbols and of askable predicate symbols in the KB, respectively. An askable predicate symbol is one that can appear in a query. Usually it is the case that ℙKB ⊇ ℙA. A query ǭ containing predicate symbols pi,…, pj ∈ ℙA is denoted as ǭ ≃ ǫ(pi,…, pj). A set ℚ of relevant queries is now defined as follows: ℚ = {ǭ | ǭ appears in some query session ∧ ǭ ≃ ǫ(pi,…, pj) ∧ pi,…, pj ∈ ℙA}. Given a query ǭ ∈ ℚ, the answer to ǭ, denoted as α(ǭ), can be either definite or unknown. α(ǭ) is definite if either KB ⊢ ǭ or KB ⊢ ¬ǭ; α(ǭ) is unknown if neither KB ⊢ ǭ nor KB ⊢ ¬ǭ.
Circularity

There are many circular phenomena in computer science and AI. The theory of non-well-founded sets has been utilized to study such phenomena in (Barwise & Moss, 1996). Circularity in a KB has been informally defined as a set of rules forming a cycle (Chang et al., 1990; Nguyen et al., 1987; Rushby, 1988). What exactly KB circularity entails semantically is not that clear in the literature. In (Zhang & Luqi, 1999), KB circularity is defined in terms of the derivation of tautologous rules. The phenomenon reflects an anomalous situation in a KB and has both operational and semantic ramifications. Operationally speaking, circular rules may result in infinite loops (if an exit condition is not properly defined) during inference, thus hampering the problem-solving process. Semantically speaking, the fact that a tautologous formula is derivable indicates that the circular rule set encompasses knowledge that is always true regardless of any problem-specific information. In general, tautologous formulas are those that are true by virtue of their logical form and thus provide no useful information about the domain being described (Genesereth & Nilsson, 1987). Therefore, circular rules prove to be less useful in the problem-solving process. What is needed, as evidenced in many real KB systems, are consistent rules that are triggered by problem-specific information (facts) rather than tautologous rules that are true regardless of the problem to be solved.
Fixpoint Semantics for Knowledge Base

There are a number of fixpoint semantics for a logical theory (Fitting, 1991; Fitting, 2002): classical two-valued, two-valued with stratification, three-valued for handling negation, four-valued for dealing with inconsistency and incompleteness, and the truth value space of [0, 1]. In this chapter, we adopt the four-valued logic FOUR as defined in (Belnap, 1977; Ginsberg, 1988). FOUR has the truth value set {true, false, ⊥, ⊤}, where true and false have their canonical meanings in the classical two-valued logic, ⊥ indicates undefined or don't know, and ⊤ is overdefined or contradiction (Figure 1). The four-valued logic FOUR is the smallest nontrivial bilattice, a member of a family of similar structures called bilattices (Ginsberg, 1988). Bilattices offer a general framework for reasoning with multi-valued logics and have many theoretical and practical benefits (Ginsberg, 1988). As can be seen later in Section 5, the fixpoint characterizations of rule-base anomalies will make an automated anomaly detection process possible. According to (Belnap, 1977), there are two natural partial orders in FOUR: the knowledge ordering ≤k (vertical) and the truth ordering ≤t (horizontal), such that: ⊥ ≤k false ≤k ⊤, ⊥ ≤k true ≤k ⊤ and false ≤t ⊤ ≤t true, false ≤t ⊥ ≤t true. Both partial orders offer a complete lattice. The meet and join for ≤k, denoted as ⊗ and ⊕, respectively, yield: false ⊗ true = ⊥ and false ⊕ true = ⊤. The meet and join for ≤t, denoted as ∧ and ∨, respectively, result in: ⊤ ∧ ⊥ = false and ⊤ ∨ ⊥ = true. The knowledge negation reverses the ≤k ordering while preserving the ≤t ordering. The truth negation reverses the ≤t ordering while preserving the ≤k ordering. For a knowledge base Ω, we define a transformation TΩ, which is a “revision operator” (Fitting, 2002) that revises our beliefs based on the rules in RB and established facts in WM. The interpretation of TΩ can be understood in the following sense. A single step of TΩ applied to Ω amounts to generating a set of ground literals, denoted as ⊢WMiRB, which is obtained by firing all enabled rules in RB under WMi. It can be shown that TΩ is monotonic and has a least fixpoint lfp(TΩ) with regard to ≤k (Fitting, 1991; Fitting, 2002). Since a monotonic operator also has a greatest fixpoint, denoted as gfp(), gfp(TΩ) exists and can be expressed as follows: gfp(TΩ) = {B | TΩ(B) = B}.

Figure 1. The four-valued logic FOUR
Because of the definition of TΩ, lfp(TΩ) is identical to gfp(TΩ) for a given knowledge base Ω. Operationally, the fixpoint of TΩ for a KB can be obtained as follows. Given a set G of initial facts, WM0 gets initialized based on G.

i = 0; Φ0 = G; Φ1 = Φ0 ∪ ⊢WM0RB;
while (Φi+1 != Φi) do { i++; Φi+1 = Φi ∪ ⊢WMiRB };
lfp(TΩ) = gfp(TΩ) = Φi;

lfp(TΩ) (gfp(TΩ)) contains all the derivable conclusions from the KB through some inference method, thus constituting the semantics for the KB. In the following fixpoint descriptions of rule-base anomalies, we simply utilize lfp() in the definitions. Let ν be a mapping from ground atomic formulas to FOUR. Given a ground atomic formula A:

ν(A) = true, if A ∈ lfp(TΩ) ∧ ¬A ∉ lfp(TΩ).
ν(A) = false, if ¬A ∈ lfp(TΩ) ∧ A ∉ lfp(TΩ).
ν(A) = ⊤, if A ∈ lfp(TΩ) ∧ ¬A ∈ lfp(TΩ).
ν(A) = ⊥, if A ∉ lfp(TΩ) ∧ ¬A ∉ lfp(TΩ).
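The iteration and the mapping ν translate directly into code. The following Python sketch represents a ground literal as a (sign, predicate, arguments) triple and, as a simplification of the WM semantics given in the introduction, matches negated conditions against explicitly asserted negative facts.

# A sketch of the lfp iteration and of the valuation into FOUR.

def consequences(wm, rules):
    """One step of T_Omega: the ground literals |-_WMi RB."""
    return {head for body, head in rules if all(lit in wm for lit in body)}

def lfp(facts, rules):
    """Iterate Phi_{i+1} = Phi_i U |-_WMi RB until a fixpoint is reached."""
    phi = set(facts)
    while True:
        nxt = phi | consequences(phi, rules)
        if nxt == phi:
            return phi
        phi = nxt

def v(atom, fix):
    """Map a positive ground atom (predicate, args) to FOUR:
    'true', 'false', 'top' (overdefined), or 'bottom' (undefined)."""
    pos = ((True,) + atom) in fix
    neg = ((False,) + atom) in fix
    return {(True, False): "true", (False, True): "false",
            (True, True): "top", (False, False): "bottom"}[(pos, neg)]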
Fixpoint Semantics for Rule-Base Anomalies

With the least fixpoint semantics for a KB in place, we can study the formal definitions of knowledge base anomalies, whose detection and resolution are among the important issues in developing reliable and correct knowledge bases.
Consistency

Given a knowledge base Ω, we obtain its least fixpoint lfp(TΩ). The KB is said to contain inconsistent knowledge if, for some hi and hk ∈ lfp(TΩ) (i ≠ k), the following holds:

∃ hi, hk ∈ lfp(TΩ) [(ν(hi) = ⊤) ∨ (ν(hk) = ⊤) ∨ (hi and hk are mutual exclusive) ∨ (hi ≭ hk)]

When lfp(TΩ) contains a pair of either complementary, or mutual exclusive, or incompatible literals, it indicates that the KB contains inconsistency.
Redundancy

Let FB stand for a list of facts and ri a rule in RB. For a knowledge base Ω = RB ∪ FB, let us consider another knowledge base Ω′ = RB′ ∪ FB, where RB′ = RB − ri. Ω contains redundancy if lfp(TΩ) = lfp(TΩ′). Since Ω′ yields the same set of derivable assertions as Ω and |Ω′| < |Ω|, Ω has some unnecessary information, thus containing redundancy.
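Reusing the lfp function from the sketch above, the redundancy test can be phrased as a direct, if naive, recomputation over the rule set:

# A sketch of the redundancy test: a rule is redundant if removing it leaves
# the least fixpoint unchanged. This recomputes lfp once per rule, which is
# only meant to mirror the definition, not to be efficient.

def redundant_rules(facts, rules):
    full = lfp(facts, rules)
    return [r for r in rules
            if lfp(facts, [s for s in rules if s is not r]) == full]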
Completeness

A KB is complete with regard to a relevant query set ℚ if ∀ǭ ∈ ℚ [α(ǭ) is definite]. A KB is complete with regard to ℚ if and only if the following holds:

∀q ∈ ℚ [ν(q) = true ∨ ν(q) = false].

The KB is incomplete when ∃q ∈ ℚ [ν(q) = ⊥], that is, when lfp(TΩ) lacks definite information for some relevant query.
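Reusing lfp and the valuation v from the earlier sketch, the completeness check over a relevant query set reduces to looking for queries valued ⊥ ("bottom"):

# A sketch of the completeness test: return the relevant queries for which
# lfp(T_Omega) lacks definite information.

def incomplete_queries(facts, rules, queries):
    fix = lfp(facts, rules)
    return [q for q in queries if v(q, fix) == "bottom"]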
Circularity

A rule is tautologous if it contains a complementary or an incompatible pair of literals. A nonempty set RB of rules is circular if the deduction of a tautologous rule rj from RB is possible (RB ⊢ rj). A nonempty set of rules is minimally circular, denoted as RBminC (⊆ RB), if RBminC is circular and no proper subset of RBminC is circular (Zhang & Luqi, 1999). Given RBminC, we define two sets of literals LHS(RBminC) and RHS(RBminC) as follows:

LHS(RBminC) = {L | L ∈ LHS(r) ∧ r ∈ RBminC},
RHS(RBminC) = {L | L ∈ RHS(r) ∧ r ∈ RBminC}.

Let ∆ = {hi,…, hk}. A KB Ω contains circular rules if and only if the following holds:

∃∆ ⊆ lfp(TΩ) {[∆ ⊆ RHS(RBminC) ∧ ∆ ⊆ LHS(RBminC)] ∧ [∀hj ∈ ∆ ∃hj′ ∈ ∆ [[(hj # hj′) ∧ (RBminC ⊢ rl) ∧ (hj ∈ rl) ∧ (hj′ ∈ rl)] ∨ [(hj ≭ hj′) ∧ (RBminC ⊢ rl) ∧ (hj ∈ rl) ∧ (hj′ ∈ rl)]]]}

Since rl is tautologous, its derivation indicates the existence of circular rules. Knowledge base anomaly detection can be carried out by first deriving lfp(TΩ) for a KB, and then verifying whether all or any of the aforementioned conditions hold.
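The tautology-derivation condition above is the authoritative definition. A cheaper check that is often used in practice, and is only an approximation of it, looks for cycles in the predicate dependency graph of RB; the sketch below (same rule encoding as before) implements that approximation and flags rule sets that may be circular.

# A rough circularity check: build edges from each LHS predicate to the RHS
# predicate of every rule and run a depth-first search for a cycle. A cycle
# is necessary, but not sufficient, for circularity in the chapter's sense.

def has_predicate_cycle(rules):
    edges = {}
    for body, head in rules:
        _, hpred, _ = head
        for _, bpred, _ in body:
            edges.setdefault(bpred, set()).add(hpred)
    visiting, done = set(), set()
    def dfs(p):
        if p in done:
            return False
        if p in visiting:        # back edge: a cycle through p
            return True
        visiting.add(p)
        cyclic = any(dfs(q) for q in edges.get(p, ()))
        visiting.discard(p)
        done.add(p)
        return cyclic
    return any(dfs(p) for p in list(edges))

demo = [([(True, "p", ("x",))], (True, "q", ("x",))),
        ([(True, "q", ("x",))], (True, "p", ("x",)))]
print(has_predicate_cycle(demo))   # True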
Example 1. Following is an example from (Murata et al., 1991). Given is a medical diagnosis KB consisting of seven rules r1, …, r72, where r1, …, r4 are from doctor one and r5, …, r7 from doctor two; d1 through d3 indicate different diagnosis results and s1 through s5 represent different symptoms.

r1: s1(x) ∧ s2(x) → d1(x)
r2: s1(x) ∧ s3(x) → d2(x)
r3: d2(x) → ¬d1(x)
r4: d1(x) → ¬d2(x)
r5: s1(x) ∧ s4(x) → d1(x)
r6: ¬s1(x) ∧ s3(x) → d2(x)
r7: s1(x) ∧ s5(x) → d3(x)

Now a set of lab test results is made available about John and Bill as follows:
G = {s1(john), ¬s1(bill), ¬s2(john), ¬s2(bill), s3(john), s3(bill), s4(john), ¬s4(bill)}.

The least fixpoint for the KB is obtained below:

lfp(TΩ) = {s1(john), ¬s1(bill), ¬s2(john), ¬s2(bill), s3(john), s3(bill), s4(john), ¬s4(bill), d1(john), d2(john), d2(bill), ¬d1(john), ¬d2(john), ¬d1(bill)}.

If all the di are in the query set, then we have the following values for the diagnosis results:

ν(d1(john)) = ⊤, ν(d2(john)) = ⊤, ν(d3(john)) = ⊥,
ν(d1(bill)) = false, ν(d2(bill)) = true, ν(d3(bill)) = ⊥.

Thus, the KB contains inconsistent knowledge and is also incomplete.
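Example 1 can be replayed with the lfp and v sketches given earlier; since the sketch has no variable mechanism, the rules are instantiated explicitly for john and bill.

# Replaying Example 1 with the earlier lfp/v sketch. Negative facts are
# explicit literals, so r6's negated condition matches the asserted ¬s1(bill).

def T(p, a): return (True, p, (a,))    # positive ground literal p(a)
def F(p, a): return (False, p, (a,))   # negative ground literal ¬p(a)

def rules_for(a):
    return [
        ([T("s1", a), T("s2", a)], T("d1", a)),   # r1
        ([T("s1", a), T("s3", a)], T("d2", a)),   # r2
        ([T("d2", a)], F("d1", a)),               # r3
        ([T("d1", a)], F("d2", a)),               # r4
        ([T("s1", a), T("s4", a)], T("d1", a)),   # r5
        ([F("s1", a), T("s3", a)], T("d2", a)),   # r6
        ([T("s1", a), T("s5", a)], T("d3", a)),   # r7
    ]

G = {T("s1", "john"), F("s1", "bill"), F("s2", "john"), F("s2", "bill"),
     T("s3", "john"), T("s3", "bill"), T("s4", "john"), F("s4", "bill")}

fix = lfp(G, rules_for("john") + rules_for("bill"))
for d in ("d1", "d2", "d3"):
    for who in ("john", "bill"):
        print(d, who, v((d, (who,)), fix))

The printed valuations match the values above: "top" for d1(john) and d2(john), false/true for d1(bill)/d2(bill), and "bottom" for both d3 atoms.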
Example 2. Given a KB consisting of the following RB

r1: c1(x) ∧ c2(x) → c3(x)
r2: c4(x) ∧ c5(x) → c6(x)
r3: c3(x) ∧ c6(x) → p(x)
r4: c1(x) ∧ c2(x) ∧ c5(x) → p(x)
and a set FB of facts {c1(a), c2(a), c4(a), c5(a)}, it has its lfp(TΩ) below:

lfp(TΩ) = {c1(a), c2(a), c3(a), c4(a), c5(a), c6(a), p(a)}.

Let Ω′ = RB′ ∪ FB, where RB′ = RB − r4. The KB contains redundancy because lfp(TΩ) = lfp(TΩ′).
Example 3. For the RB below

r1: e11(x) → e10(x)
r2: e7(x) ∧ e1(x) → e9(x)
r3: e8(x) ∧ e3(x) → e11(x)
r4: e9(x) ∧ e2(x) → e8(x)
r5: e10(x) ∧ e4(x) ∧ e5(x) ∧ e6(x) → e7(x)

and H = {e1(a), e2(a), e3(a), e4(a), e5(a), e6(a), e11(a)} as a given set of facts, the KB has its lfp(TΩ):

lfp(TΩ) = {e1(a), e2(a), e3(a), e4(a), e5(a), e6(a), e7(a), e8(a), e9(a), e10(a), e11(a)}.

Let ∆ = {e7(a), e8(a), e9(a), e10(a), e11(a)}. ∆ ⊆ lfp(TΩ) and satisfies the circularity conditions with regard to RB, thus indicating that RB is circular. It can be shown that RB is also minimally circular.
Example 4. For the following RB:

r1: m(x) → ¬h(x)
r2: ¬m(x) → ¬q(x)

and a set of facts {h(a), q(a)}, even though we cannot conclude that the KB contains inconsistency in terms of the lfp characterization (because lfp(TΩ) = ∅), the KB is inconsistent under the model-theoretic semantics. This becomes clear if we apply some logical-equivalence-preserving transformation (say, the contrapositive law) to the original KB to obtain the following equivalent KB′:

KB′ = {h(a), h(x) → ¬m(x), q(a), q(x) → m(x)}

KB′ has its lfp(TΩ′) that contains {m(a), ¬m(a)}, thus ν(m(a)) = ⊤. What this case tells us is two-fold. First, although the control mechanism of the knowledge-based system will not derive any contradiction from the original KB, the knowledge in the KB is nevertheless inherently inconsistent, and this information is important for the knowledge engineer to know. Second, the inconsistency characterization under the lfp semantics is a sufficient but not necessary condition for the inconsistency characterization under the model-theoretic semantics.
Conclusions

This chapter describes a fixpoint-semantics based approach to the formal definitions of four types of knowledge base anomalies. Such formal definitions will help pave the way for a more effective knowledge base verification process. Compared with the existing results, our contributions can be summarized as follows.

• The fixpoint approach offers a unique perspective on the issue of formally denoting KB anomalies and is conducive to an automated anomaly detection process. On the other hand, even though the model-theoretic semantics yields a stronger result on KB consistency, as exhibited by Example 4 above, that approach does not lend itself to a practical anomaly detection process.
• The proposed approach covers all four types of anomalies.
Automated tools will be developed based on the proposed approach to assist KB anomaly detection.
Acknowledgment

The author would like to acknowledge the support from California State University, Sacramento for the work reported in this chapter. Comments from anonymous reviewers are greatly appreciated.
References

Barwise, J. & Moss, L. (1996). Vicious circles. Stanford, CA: CSLI Publications.
Belnap, N. D. (1977). A useful four-valued logic. In G. Epstein & J. Dunn (Eds.), Modern uses of multiple-valued logic (pp. 8-37). Dordrecht: D. Reidel.
Ben-Ari, M. (1993). Mathematical logic for computer science. UK: Prentice Hall International.
Brachman, R. J. & Levesque, H. J. (2004). Knowledge representation and reasoning. San Francisco: Morgan Kaufmann Publishers.
Chan, C., Kinsner, W., Wang, Y. & Miller, D. M. (Eds.) (2004). Cognitive Informatics: Proc. 3rd IEEE International Conference on Cognitive Informatics (ICCI'04), IEEE CS Press, Victoria, Canada, August.
Chandrasekaran, B., Josephson, J. R. & Benjamins, V. R. (1999). What are ontologies, and why do we need them? IEEE Intelligent Systems, 14(1), pp. 20-26.
Chang, C. L. & Lee, R. C. T. (1973). Symbolic logic and mechanical theorem proving. New York: Academic Press.
Chang, C. L., Combs, J. B. & Stachowitz, R. A. (1990). A report on the expert systems validation associate (EVA). Expert Systems with Applications, 1, pp. 217-230.
Cycorp, http://www.cyc.com/
Fagin, R., Halpern, J. Y., Moses, Y. & Vardi, M. Y. (1995). Reasoning about knowledge. Cambridge, MA: MIT Press.
Fitting, M. (1991). Bilattices and the semantics of logic programming. Journal of Logic Programming, 11, pp. 91-116.
Fitting, M. (2002). Fixpoint semantics for logic programming: a survey. Theoretical Computer Science, 278(1-2), pp. 25-51.
Genesereth, M. R. & Nilsson, N. J. (1987). Logical foundations of artificial intelligence. Los Altos, CA: Morgan Kaufmann Publishers.
Ginsberg, A. & Williamson, K. (1993). Inconsistency and redundancy checking for quasi-first-order-logic knowledge bases. International Journal of Expert Systems, 6(3), pp. 321-340.
Ginsberg, M. L. (1988). Multivalued logics: a uniform approach to inference in artificial intelligence. Computational Intelligence, 4(3), pp. 265-316.
Gomez-Perez, A., Fernandez-Lopez, M. & Corcho, O. (2004). Ontological engineering. London: Springer-Verlag.
IBM's research project on Information Economies, http://www.research.ibm.com/infoecon/index.html
Kinsner, W., Zhang, D., Wang, Y. & Tsai, J. (Eds.) (2005). Cognitive Informatics: Proc. 4th IEEE International Conference on Cognitive Informatics (ICCI'05), IEEE CS Press, Irvine, California, USA, August.
Lenat, D. (1998, October). The dimensions of context-space. CYCorp Report.
Levesque, H. J. (1984). The logic of incomplete knowledge bases. In M. L. Brodie, J. Mylopoulos & J. W. Schmidt (Eds.), On Conceptual Modeling. New York: Springer-Verlag.
Levesque, H. J. & Lakemeyer, G. (2000). The logic of knowledge bases. Cambridge, MA: MIT Press.
Levy, A. Y. & Rousset, M.-C. (1998). Verification of knowledge bases based on containment checking. Artificial Intelligence, 101(1-2), pp. 227-250.
Menzies, T. & Pecheur, C. (2005). In M. Zelkowitz (Ed.), Advances in computers, Vol. 65. Amsterdam, the Netherlands: Elsevier.
Murata, T., Subrahmanian, V. S. & Wakayama, T. (1991). A Petri net model for reasoning in the presence of inconsistency. IEEE Transactions on Knowledge and Data Engineering, 3(3), pp. 281-292.
Nguyen, T. A., Perkins, W. A., Laffey, T. J. & Pecora, D. (1987). Knowledge base verification. AI Magazine, 8(2), pp. 69-75.
O'Leary, D. E. (1998). Using AI in knowledge management: knowledge bases and ontologies. IEEE Intelligent Systems, 13(3), pp. 34-39.
Patel, D., Patel, S. & Wang, Y. (Eds.) (2003). Cognitive Informatics: Proc. 2nd IEEE International Conference on Cognitive Informatics (ICCI'03), IEEE CS Press, London, UK, August.
Pylyshyn, Z. W. (1989). Computing in cognitive science. In M. I. Posner (Ed.), Foundations of cognitive science (pp. 49-92). Cambridge, MA: MIT Press.
Ross, R. G. (2003). Principles of the business rules approach. Boston, MA: Addison-Wesley.
Rushby, J. (1988, October). Quality measures and assurance for AI software. NASA Contractor Report 4187.
Rushby, J. & Whitehurst, R. A. (1989, February). Formal verification of AI software. NASA Contractor Report 181827.
van Emden, M. H. & Kowalski, R. (1976). The semantics of predicate logic as a programming language. Journal of the ACM, 23, pp. 733-742.
Wang, Y., Johnston, R. & Smith, M. (Eds.) (2002). Cognitive Informatics: Proc. 1st IEEE International Conference on Cognitive Informatics (ICCI'02), IEEE CS Press, Calgary, AB, Canada, August.
Wang, Y. (2002a). Keynote speech: On cognitive informatics. Proc. 1st IEEE International Conference on Cognitive Informatics (ICCI'02), Calgary, Canada, IEEE CS Press, August, pp. 34-42.
Wang, Y. (2007). The Theoretical Framework of Cognitive Informatics. The International Journal of Cognitive Informatics and Natural Intelligence (IJCiNi), IGP, Hershey, PA, USA, 1(1), Jan., pp. 1-27.
Wang, Y. & Kinsner, W. (2006). Recent Advances in Cognitive Informatics. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), March, pp. 121-123.
Zhang, D. & Nguyen, D. (1994). PREPARE: a tool for knowledge base verification. IEEE Transactions on Knowledge and Data Engineering, 6(6), pp. 983-989.
Zhang, D. & Luqi (1999). Approximate declarative semantics for rule base anomalies. Knowledge-Based Systems, 12(7), pp. 341-353.
Zhang, D. (2005). Fixpoint semantics for rule base anomalies. In Proceedings of the Fourth IEEE International Conference on Cognitive Informatics, Irvine, CA, pp. 10-17.
Endnotes

1. There would be no effect on WM if R is not in WM when ¬R is derived.
2. r7 was not in the original example and was added to illustrate the completeness property.
Chapter XIX
Development of an Ontology for an Industrial Domain Christine W. Chan University of Regina, Canada
Abstract

This chapter presents a method for ontology construction and its application in developing an ontology for the domain of natural gas pipeline operations. Both the method and the resulting application ontology contribute to the infrastructure of the Semantic Web, which provides a semantic foundation for supporting information processing by autonomous software agents. The chapter presents the processes of knowledge acquisition and ontology construction for developing a knowledge-based decision support system for monitoring and control of natural gas pipeline operations. Knowledge about the problem domain was acquired and analyzed using the Inferential Modeling Technique, and the analyzed knowledge was then organized into an application ontology and represented in the Knowledge Modeling System. Since an ontology is an explicit specification of a conceptualization that provides a comprehensive foundation for specifying knowledge in a domain, it supplies semantic clarification for autonomous software agents that process information on the Internet.
Introduction

The vast amount of information on the World Wide Web has made it increasingly difficult to access and retrieve required information or data. In response to this problem, the World Wide Web Consortium (W3C) formally proposed the Semantic Web in 2001 as the next evolutionary step for the Web. The Semantic Web aims to attach semantic information to Web resources, providing the semantic structure or scaffolding that would enable autonomous software agents to traverse the Web to search for or process information on behalf of users or other systems. Within this context, ontologies are important from at least two perspectives. First, they provide a key component of the Semantic Web, because an ontology can function as a repository of vocabulary of unambiguous domain-related concepts and their meanings, anchored in consensus domain knowledge (Jasper & Uschold, 1999). This semantic structure would enable autonomous agents to access and process information on the Web; an example is the use of ontologies to support negotiation in e-commerce (Tamma et al., 2005). Second, ontologies provide sharable knowledge models of particular problem domains for the construction of knowledge-based systems in a
distributed and open environment such as the Internet. Therefore, from the knowledge engineering perspective, an ontology constitutes a crucial building block for developing knowledge-based systems. Knowledge engineering is the process of eliciting expertise, organizing it into a computational structure, and representing it in a knowledge-based system. The process of knowledge engineering can be viewed from the cognitive informatics perspective (Wang et al., 2002/06; Patel et al., 2003; Chan et al., 2004; Kinsner et al., 2005; Yao et al., 2006; Wang & Kinsner, 2006; Wang, 2002/03/06/07), with a focus on the problem-solving expertise in cognition (Chan, 2002). The effort spent in engineering knowledge is often substantial due to the tacit nature of expertise, and the process of acquiring knowledge for building the knowledge base is widely recognized as a major bottleneck in the development process. If several knowledge-based systems on the same problem domain are to be constructed, the effort required to build the knowledge bases for the different systems is often duplicated. A possible solution is to share the knowledge that has been acquired on a given problem domain among systems. Four different approaches for sharing knowledge have been adopted within the Knowledge Sharing Effort sponsored by the Air Force Office of Scientific Research, the Defense Advanced Research Projects Agency, the Corporation for National Research Initiatives, and the National Science Foundation (Neches et al., 1991). Similar to their objectives, the work presented here aims to construct ontologies which can overcome the barriers to sharing that arise from a lack of consensus across knowledge bases on vocabulary and semantic interpretations in domain models. A critical step in the process of developing an ontology is performing a detailed analysis of the domain. In this chapter, we present a method for knowledge acquisition and ontology construction to support the development of a knowledge-based system, and we demonstrate the application of the method in the domain of natural gas pipeline operations. The proposed method involves first eliciting and organizing knowledge using the Inferential Modeling Technique (IMT), and then, based on the initial classification of knowledge elements in the problem domain, constructing an application ontology using an automated knowledge modeling tool called the Knowledge Modeling System. The knowledge represented in the application ontology provides the basis for implementing the advisory system. Long-term objectives of the work include developing a general method for ontology development that may be applied to other domains; contributing to the repository of application ontologies that can provide the basis for developing knowledge-based systems on the Internet; and contributing to the infrastructure needed to facilitate ontology development as part of the Semantic Web for autonomous machine intelligence. This chapter presents the process of developing part of the domain ontology for the natural gas transmission domain. Understanding the ontology development process for this application domain contributes to the long-term objective of deriving a method of ontology development by identifying the knowledge types in this domain. The assumption is made that natural gas domains share a common basic vocabulary, even though different application problems may use particular subsets of it.
The knowledge types clarified in this natural gas transmission domain constitute one such subset of the general ontology for natural gas domains. Hence, whether this problem domain is typical of natural gas problems matters less than the fact that it is one of the problems in the natural gas domain. This chapter is organized as follows: Section 1 gives background on ontological studies within the context of knowledge-level modeling and knowledge acquisition. Section 2 briefly describes the application domain of natural gas pipeline network operations. Section 3 presents knowledge acquisition and the initial classification of the knowledge using the IMT. Section 4 discusses knowledge representation using the Knowledge Modeling System. Section 5 briefly presents the automated system, called the Gas Pipeline Operation Advisor (GPOA), that was developed based on the application ontology. Section 6 gives concluding remarks and suggests directions for future work.
Background

Ontological Studies

Ontological studies evolved over the past two decades from a knowledge acquisition (KA) approach that emphasized knowledge-level modeling for knowledge-based system development. Knowledge modeling refers to an
approach for knowledge acquisition and analysis of a domain, which includes the KADS methodology (Schreiber et al., 1988), the generic task approach (Chandrasekaran, 1986), and the components-of-expertise approach (Steels, 1990). Within the framework of knowledge-level modeling, two major lines of research have developed. One refines the existing knowledge-level frameworks and emphasizes their formalization; for example, ML2 has been developed as a formal implemented language based on the KADS methodology (Flores-Mendez et al., 1998). Another line of research aims at developing knowledge-level models for a range of tasks and domains in order to uncover generic components, problem-solving methods, and ontologies that enable reuse across applications. The objective is to facilitate knowledge acquisition by providing domain-independent generic models that guide knowledge engineers in the construction of knowledge models for a particular domain. We may distinguish knowledge modeling in terms of problem-solving methods and domain ontology. Briefly, a problem-solving method can be seen as an abstract model which provides a means of identifying, at each step, candidate actions in a sequence of actions that accomplish some task within a specific domain (McDermott, 1998); an ontology, by contrast, defines the vocabulary of representational terms, with agreed-upon definitions in human- and machine-readable forms (Gruber, 1992). Within this context, the Inferential Modeling Technique (IMT) is a knowledge modeling technique that supports developing knowledge-level models for primarily industrial tasks and domains. The IMT emphasizes both domain- and task-specific knowledge elements in an industrial application. For a detailed discussion of the IMT, see Chan (1992, 2000, 2002). Both knowledge modeling and ontology studies emphasize the primacy of developing problem-solving models of real-world application domains in terms of knowledge rather than representation mechanisms (Clancey, 1985; Newell, 1982). Van de Velde and Schreiber (1997) suggested that ontologies are basically "practical knowledge level models" which "reflect knowledge content of a system…and make explicit the structure within which the knowledge operates in solving particular classes of problems." Hence, knowledge or ontology models are considered important not only for developing effective knowledge bases but also for enabling knowledge sharing and reuse, because they "make explicit the structures within which the knowledge operates in solving particular classes of problems, which enables reuse of models across applications" (Van de Velde & Schreiber, 1997). Chandrasekaran et al. (1998) stated that an ontology for a domain provides a vocabulary for the task and the types of knowledge needed for the task; it identifies knowledge types and how to use them. Hence, an ontology is a central component of knowledge-based system construction because it is the "heart of a knowledge representation system" (Chandrasekaran et al., 1998). The explicit conceptualization embodied in an ontology facilitates interoperability, enables reusability among software components, and enhances communication among developers (Guarino & Giaretta, 1995). An ontology can be regarded as a description of the most useful, or at least the most well-trodden, organization of knowledge in a given domain. This organization of knowledge can be reused and shared among different development groups that work on different application tasks but in the same domain.
Since the development groups may be geographically dispersed and work on heterogeneous software platforms and programming environments, it is critical that they have a common understanding of the terms and concepts that describe a domain. The core knowledge represented by an ontology can become the basis of diverse knowledge-based systems addressing the same problem domain. Application ontologies have been built for a number of problem domains, including the medical domain (Hahn et al., 1999), the physical domain (Borgo et al., 1997), enterprise modeling (Uschold et al., 1998), natural language (Bateman, 1990), law (Valente & Breuker, 1996), molecular biology (Altman et al., 1999), and technical domains (Laresgoiti et al., 1996).
Automation in Natural Gas Pipeline Network Operations

Although ontologies are increasingly recognized as important for knowledge-based system development, research effort devoted to ontology construction in the area of natural gas pipeline operations is scant. While much research has been done on the application of artificial intelligence (AI) in different areas of chemical engineering, such as process design and control, manufacturing, and production, the application of information technology to the natural gas transmission domain has been mostly restricted to the areas of simulation, mathematical analysis, and forecast studies. Subramanian et al. (2000) presented a computing architecture for pipeline system control and management, called Sim-Opt, which combined a discrete-event simulation module and an optimizer module to assist operators in decision-making processes. Zhu et al. (2001) modeled a predictive control strategy for a
large-scale gas pipeline network system; the function of a gas control unit is to maintain pipeline pressure at a desired level to ensure both safety and customer demand satisfaction. Tao and Ti (1998) analyzed fast transients in natural gas pipelines, especially where breakdowns or peak consumption periods occur. Zhou and Adewumi (1998) modeled a two-phase flow phenomenon in a natural gas pipeline system. Liu and Lin (1991) developed a forecast model for the residential consumption rate of natural gas using time-series analysis. Sailor and Munoz (1997) developed a methodology for assessing the sensitivity of natural gas consumption to climate, based on a multiple regression analysis of historical energy consumption and climate data. This brief synopsis of relevant work reveals that research in this area has adopted a primarily quantitative approach, and little research has been devoted to a detailed analysis of knowledge characteristics in the domain of natural gas pipeline system operations. This study fills this gap by developing at least part of an ontology of the domain. In this chapter, the process of knowledge acquisition based on the Inferential Modeling Technique (IMT) provided the basis for ontology construction for the domain. The dual-phase process of knowledge modeling, consisting of knowledge acquisition and analysis followed by configuration of the knowledge elements into a domain ontology, illustrates the method of constructing an ontology for the application problem domain, which is presented in Section 3.
Inferential Modeling Technique

The Inferential Model has been presented in (Chan 1992, 1995, 2000); here we briefly summarize the procedure of the Inferential Modeling Technique (IMT). The template of knowledge types specified by the Inferential Model serves to guide the elucidation of the search space for a given problem domain, and it was used as the basis for the design of the Knowledge Modeling System (KMS). The model can be operationalized as a procedure, the IMT, which facilitates the development of the "specific categories" for a given domain by presenting the knowledge engineer (KE) with a template of knowledge types and a sequence of steps whereby the elicited units can be classified. The technique consists of the following steps:

1. Specify the physical objects in the domain.
2. Specify the properties of the objects identified in Step 1.
3. Specify the values of the properties identified in Step 2, or,
4. Define the properties as functions or equations.
5. Specify the relations associated with the objects and properties identified in Steps 1 and 2 as functions or equations.
6. Specify the partial order of the relations identified in Step 5 in terms of strength factors and criteria associated with the relations.
7. Specify the inference relations derived from the objects and properties identified in Steps 1 and 2.
8. Specify the partial order of the inference relations identified in Step 7 in terms of strength factors and criteria associated with the relations.
9. Specify the tasks in the problem.
10. Decompose the tasks identified in Step 9 into inference structures or subtasks (which invoke units identified in Steps 1, 2, 5, and 7).
11. Specify the partial order of the inference and subtask structures identified in Step 10 in terms of strength factors and criteria.
12. Specify strategic knowledge in the domain.
13. Specify how the strategic knowledge identified in Step 12 is related to the task and inference structures specified in Steps 9 and 10.
14. Return to Step 1 until the specification of knowledge types is satisfactory to both the expert and the KE.

This procedure involves iterative refinement of the knowledge elements in a problem domain and provides top-down guidance on the knowledge types required for problem solving. The procedure terminates when both the knowledge engineer and the expert are reasonably satisfied that the emerging knowledge model represents the problem-solving expertise. The knowledge types specified in the sequence of steps above can be specified in the KMS.
Application Problem Domain

In natural gas pipeline operations, natural gas is transmitted from a set of source nodes to a set of sink nodes in order to satisfy customer demand. The gas is transmitted through a network of compressor stations. The operation of natural gas transmission is continuously monitored and controlled by dispatchers, who are responsible for (1) operating the system in a safe, economic, and cost-effective manner, and (2) satisfying customer load requirements. During operation, the dispatcher needs to make two decisions: (1) whether to increase or decrease compression, and (2) which individual compressor units to turn on or off. These decisions determine the effectiveness of the natural gas pipeline
operation. The dispatcher adds compression to the pipeline system by turning on one or more compressors when customer demand for natural gas increases, and turns off one or more compressors to reduce compression when customer demand decreases. Operating a natural gas pipeline is a tedious and stressful job, and the monitoring and control decisions made by the operators are often based on subjective judgments about the state of operations. A dispatcher is expected to operate several systems over a long period of time. Often a dispatcher operates the system without understanding the theory behind it, and since there are no standardized procedures for making control decisions, a dispatcher can perform the task according to his or her preferences. Hence, consistent performance across dispatchers is rare and mistakes are common. For example, a dispatcher can issue an unnecessary start- or stop-compression command because he or she is not certain what the appropriate action should be under particular system conditions. Therefore, an expert decision support system is needed to aid the dispatcher in the decision-making process by optimizing natural gas pipeline operations to satisfy customer demand with minimal operating costs. The decision support system we developed is called the Gas Pipeline Operation Advisor (GPOA); it can support the dispatcher in determining the state of the linepack and in selecting compressors to turn on or off. The knowledge base of the expert system was developed based on analysis of historical data, heuristic knowledge acquired from experts, and a computer simulation model. Based on the total inline flows and current system conditions, the GPOA can inform the dispatcher whether compression should be added or decreased in the pipelines and what horsepower requirement is needed to satisfy customer demand. To better focus the research effort, the GPOA addressed only a small section of the natural gas pipeline in Saskatchewan, Canada, called the St. Louis East compressor station. Customer demand for natural gas fluctuates with the season; demand in winter is usually higher than in summer. It also changes with the time of day. The gas transmission company used a simulator program that generated a compressor discharge pressure curve versus a customer station pressure curve to illustrate pipeline operations, as shown in Figure 1. Based on this demand-versus-supply information, the dispatcher controls the volume of natural gas supplied to the St. Louis East customers. The gap between the two curves is called the "comfort zone." If the comfort zone is wide, then customer satisfaction is guaranteed but the cost is high; on the other hand, if the gap is small, then operation is cost-effective but future customer demand may not be satisfied.
Figure 1. Demand versus supply graph
Knowledge Acquisition Using the IMT

The knowledge acquisition process consisted of three phases: knowledge elicitation, knowledge analysis, and knowledge representation. The primary knowledge source was the expert dispatchers; relevant documents and technical reports were also consulted. The expert dispatchers were interviewed on a weekly basis over a period of eighteen months. During the interviews, the expert dispatchers described the natural gas transmission system and its components. This information provided the basis for the subsequent step of knowledge analysis using the IMT, which supports the knowledge engineer in identifying the different knowledge types of the domain. The role that the IMT plays in the KA process is depicted in Figure 2. According to the IMT, sample knowledge elements in this domain include the following:

• Classes of concrete or abstract objects. For example, a pipeline, which is a medium between a compressor station and customers, is represented as a concrete class.
• Attributes that describe a class. For example, temperature is a descriptor for the pipeline.
• Values of attributes, which can be numeric, symbolic, or logical. For example, the value of the capacity of a gas compressor at the St. Louis station is numeric: 600 brake horsepower (BHP). An example of a logical value is the status of the gas compressor at St. Louis, which is either on or off.
• Relations between two or more classes of objects, for example, the inheritance relationship between an electrical compressor and its parent, the compressor, which is expressed as "an electrical-compressor is a compressor."
• Tasks, which are sets of activities that accomplish some objectives. For example, the task of compressor selection involves selecting a compressor in order to put additional pressure into the pipelines.
Figure 2. Inferential modeling technique for KA and ontology construction
The acquired knowledge analyzed with the IMT was represented in tables. Table 1 is a sample table that represents the class of compressors in the domain of natural gas pipeline operations; it includes details on the class such as the related superclass or subclasses, attributes, and values. Some notes on the compressor class are included in the column "definition of domain items". The IMT was used to clarify knowledge about classes, tasks, and strategies. The dispatcher performed the tasks involved in monitoring and control of gas pipelines according to some implicit strategies. In the following, some sample tasks and a strategy are described in detail to illustrate the considerations a dispatcher has in operating the natural gas pipelines. The most important task for a dispatcher is monitoring the gas pipeline operations. The expert dispatchers suggested that the key variable to observe in monitoring pipeline operations was the linepack level, defined as the volume of natural gas that exists between the compressor discharge pressure and the customer end-point delivery pressure. In addition to the linepack level, three other conditional variables and one decision variable are crucial for gas pipeline operations. The conditional variables are (a) the rate of change of pressure at the end point, (b) the change of pressure at the end point, and (c) the total flow in the pipeline; the decision variable is the state of the linepack, which measures the value of the comfort zone. The causal relationships among these variables are represented in ten decision tables, a sample of which is shown in Table 2. Another sample task is that of controlling compressors at the compressor stations, which provides guidelines to operators on deciding which compressor units to turn on or off based on the run time and operating cost of each unit; the prioritized order for operating the St. Louis compressor units is given after Table 2.
Table 1. Sample domain class with attributes and values

Object No.: O1
Raw data of domain items: Compressor
Ontology items: Class compressor, with the following attributes (A) and values (V):
  - Type: reciprocating, rotary vane, rotary screw, injection, liquid ring
  - Horsepower: 250, 250, 600, 600, 600
  - Compression capacities: (x in msscsfd or m3/d)
  - Model number: (alphanumeric)
Definition of domain items: Subclasses of compressor are electrical-compressor and gas-compressor. Has attributes. Has relations with Gas_pipeline and Gas_supplier_station.
Table 2. A sample decision table relating key variables (current linepack: 510 to 550; rate of change of pressure: +)

Total Flow      Change of Pressure
                -200 to -125   -125 to -50   -50 to 50   50 to 125   125 to 200
400 to 560      E              H             H           H           H
560 to 720      E              E             E           H           H
720 to 880      E              E             E           E           H
880 to 1040     L              E             E           E           H
1040 to 1200    L              E             E           E           H
The prioritized order is:

1. When the volume of gas in the system is less than 400 cubic metres, the brake horsepower (BHP) requirement is zero; this situation means free flow.
2. When the BHP requirement is between 0 and 800, turn on gas compressor number 1 at the St. Louis station.
3. When the BHP requirement is between 800 and 1200, turn on gas compressor number 1 at the St. Louis station and gas compressor number 2 at the Melfort station.
4. When the BHP requirement is between 1200 and 1600, turn on gas compressor number 1 at the St. Louis station, gas compressor number 2 at the Melfort station, and electrical compressor number 1 at the St. Louis station.
5. When the BHP requirement is between 1600 and 2200, turn on gas compressor number 2 and electrical compressor number 1 at the St. Louis station, and gas compressors number 2 and 3 at the Melfort station.

It was assumed that the BHP requirement would not exceed 2000 under normal conditions. A sample strategy in the problem domain indicates that three steps are involved in determining whether the linepack level in the pipeline at the St. Louis East system is high, low, or enough:

1. Obtain data on pressure, gas temperature, ambient temperature, and flows from the natural gas pipeline system.
2. Based on the pressure at the St. Louis and Melfort stations and the flow at Nipawin and Hudson Bay, calculate the conditional variables of linepack level, change of pressure, rate of change of pressure, and total inline flow.
3. Based on the values of the four conditional variables, identify from the decision tables the corresponding state of the linepack (E for enough, H for high, or L for low).
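Step 3 of this strategy is essentially a table lookup. The following sketch is a hypothetical Python encoding of Table 2; the band boundaries and E/H/L entries come from the table, while the function and variable names are invented for illustration.

# Hypothetical encoding of Table 2 (current linepack 510-550, rate of change +).
# Rows are total-flow bands, columns are change-of-pressure bands, and the
# entries are linepack states: E = enough, H = high, L = low.
FLOW_BANDS = [(400, 560), (560, 720), (720, 880), (880, 1040), (1040, 1200)]
COP_BANDS = [(-200, -125), (-125, -50), (-50, 50), (50, 125), (125, 200)]
STATES = [
    ["E", "H", "H", "H", "H"],
    ["E", "E", "E", "H", "H"],
    ["E", "E", "E", "E", "H"],
    ["L", "E", "E", "E", "H"],
    ["L", "E", "E", "E", "H"],
]

def band_index(bands, value):
    # Bands are treated here as half-open intervals [low, high).
    for i, (low, high) in enumerate(bands):
        if low <= value < high:
            return i
    raise ValueError("value %s lies outside the table" % value)

def linepack_state(total_flow, change_of_pressure):
    row = band_index(FLOW_BANDS, total_flow)
    col = band_index(COP_BANDS, change_of_pressure)
    return STATES[row][col]

print(linepack_state(650, 60))  # -> 'H'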
Design and Implementation of the Knowledge Modeling System

The IMT provided the theoretical basis for developing the Knowledge Modeling System (KMS), which gives the knowledge engineer automated support in designing and implementing an application ontology. The KMS enables the user to specify all the static and dynamic knowledge elements specifiable in the IMT. The tool consists of two primary modules, the class module and the task module, and an auxiliary conversion module. The class module enables specification and storage of static knowledge elements such as classes, objects, attributes, and their values, and supports specification of semantic relationships among classes. The task module supports specification and storage of the dynamic knowledge elements of a domain, including the objectives, tasks, subtasks, and processing details of the tasks. The class and task modules are linked so that the task module calls upon the objects specified in the class module. The main components of class and task in the KMS correspond to the orthogonal axes of static and dynamic knowledge in a domain as specified in the IMT. The two components are complementary and together can adequately represent most types of knowledge implicit in an industrial application domain. The system accepts user input through the user interface to either the class or the task component. The design of the KMS is shown in Figure 3.
Class Module

The class module corresponds to the domain and inference levels in the IMT and consists of the domain and inference classes, attributes, and values. This module documents the user's static knowledge of an application domain, such as classes of objects, the attributes and values associated with each class, and the relationships between the classes. The classes specified can refer to either concrete or conceptual entities in the real world. The KMS also supports specification of binary relationships between classes. In the current version of the system, only the inheritance or a-kind-of relationship between parent and child classes is supported. Hence, the attributes specified for the parent or superclass can be inherited by the children or subclasses. In addition, a class can also have its own specific attributes. Based on the inheritance relationships, the KMS automatically configures a classification hierarchy of all the classes and subclasses in the domain. If a class is involved in a ternary or higher-order relationship, it is specified as a "constraint" in text. The operations supported in the class module include addition, modification, and removal of the knowledge elements of: (a) a class, (b) an attribute for a class, (c) a possible value for an attribute, and (d) a relationship between classes.
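A minimal sketch of this inheritance behavior follows; it is illustrative Python, not the KMS implementation, and the method names are invented. Inherited attributes are collected by walking the a-kind-of links toward the root, with child-specific definitions taking precedence.

# Illustrative model of the class module's a-kind-of inheritance.
class ClassModule:
    def __init__(self):
        self.attrs = {}   # class name -> {attribute: possible values}
        self.parent = {}  # class name -> parent class name (a-kind-of)

    def add_class(self, name, parent=None, **attrs):
        self.attrs[name] = dict(attrs)
        if parent is not None:
            self.parent[name] = parent

    def all_attributes(self, name):
        # Class-specific attributes plus attributes inherited from every ancestor;
        # setdefault lets the child's own definition win over an inherited one.
        merged = {}
        while name is not None:
            for attr, values in self.attrs.get(name, {}).items():
                merged.setdefault(attr, values)
            name = self.parent.get(name)
        return merged

m = ClassModule()
m.add_class("compressor", status=("on", "off"))
m.add_class("electrical-compressor", parent="compressor", horsepower=(250, 600))
print(m.all_attributes("electrical-compressor"))
# {'horsepower': (250, 600), 'status': ('on', 'off')}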
Figure 3. Design of KMS (user input passes through the user interface to the class and task components; data is written to and read from user data files (*.mdb), exported to XML and HTML, and imported from XML)
Task Module

The task component of the system documents knowledge about the dynamic aspects of an application domain. According to the IMT, a task is an organized structure or sequence of activities performed to accomplish some objective. In the KMS, objectives and tasks are independent and managed separately; a task is linked to an objective when the objective requires the task for its completion. The detailed steps involved in a task structure are described as behaviour. Similar to the hierarchical structure relating classes and subclasses, tasks are organized into a hierarchical structure so that a task can be divided into subtasks. Also, similar to the notion of attributes for classes, properties can be defined in the KMS to describe tasks. The properties or characteristics associated with each task can be either static or dynamic. The task module consists of operations to add, modify, and remove the knowledge elements of: (a) an objective, (b) a task, (c) the behaviour of the associated task, (d) the percentage of completion of the associated task, and (e) the priority of the associated task. The module also supports modifying the task hierarchy and specification, disassociating objects from a task, and disassociating tasks from an objective.
Interaction Between Class and Task Modules

According to the IMT, the dynamic knowledge of a problem domain is intricately intertwined with the static knowledge. That is, the tasks and subtasks manipulate classes of objects in order to accomplish an objective. In the KMS, the interaction between tasks and classes is implemented as the task component invoking particular class objects defined in the class component of the tool.
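The sketch below mirrors this coupling in illustrative Python (not the KMS code): tasks decompose into subtasks, and each task records the class objects it invokes from the class component.

# Illustrative coupling of the task and class components; all names are
# invented for exposition.
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    name: str
    behaviour: str = ""                        # detailed steps of the task
    uses: list = field(default_factory=list)   # class objects invoked from the class component
    subtasks: list = field(default_factory=list)

select = TaskNode("compressor selection",
                  behaviour="follow the prioritized order 1 to 5 based on BHP requirement",
                  uses=["compressor", "gas-pipeline"])
monitor = TaskNode("monitor gas pipeline operations",
                   uses=["gas-pipeline"],
                   subtasks=[select])

def print_hierarchy(task, depth=0):
    # Walk the task hierarchy, showing which class objects each task invokes.
    print("  " * depth + task.name + " (uses: " + ", ".join(task.uses) + ")")
    for sub in task.subtasks:
        print_hierarchy(sub, depth + 1)

print_hierarchy(monitor)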
Knowledge Representation Using the Knowledge Modeling System

The knowledge elements clarified using the IMT provided the basis for an ontology of the domain, which was designed and documented in the Knowledge Modeling System; at the time of writing, the KMS is a first-version prototype of an ontology construction tool (Chan 2004). The tool aims to support the knowledge engineer in documenting and configuring the knowledge elements clarified during knowledge analysis into a domain ontology. The design of
the system is based on the IMT. Figure 4 shows a sample input screen of the Knowledge Modeling System (KMS) that allows a knowledge engineer to enter information on the classes, subclasses, attributes, and values of a domain. The top left panel of the screen shown in Figure 4 shows the classes and subclasses, some samples of which were shown in Table 1. All the classes are listed on the lower left panel of the screen. The highlighted class is gas-pipeline, and its class-specific and inherited attributes are listed on the lower right panel of the screen. The inherited attributes are prefixed with "#". The attribute "change of pressure at end point (COP)" is highlighted, and its possible values are listed on the top right panel of the screen. The sample task of determining the brake horsepower (BHP) requirement is presented as follows. On reading the volume or load from the SCADA system, a dispatcher can determine the BHP requirement using the following equation:

BHP = 277411 × (St. Louis Flow + Melfort Flow) − 1132

For example, if the load is 900 × 10³ m³/day, the BHP requirement can be calculated to be 1400. The dispatcher can also consult the following prioritized list of compressors to be turned on at different ranges of BHP requirement (where G1 is gas compressor number 1, E1 is electrical compressor number 1, and so on):

1. Free flow (no compression)
2. (0 < BHP ≤ 800) St. Louis G1
3. (800 < BHP ≤ 1200) St. Louis G1 and Melfort G2
4. (1200 < BHP ≤ 1600) St. Louis G1, Melfort G2, and St. Louis E1
5. (1600 < BHP ≤ 2000) St. Louis G1 and E1, Melfort G2 and G3
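For illustration, the prioritized list can be encoded as a table of lower bounds; the sketch below (hypothetical Python with invented names, not the GPOA code) returns the configuration for a given BHP requirement.

# Hypothetical encoding of the prioritized compressor list above; the bands
# follow the text, and a BHP of zero or below means free flow.
PRIORITY = [
    (0, "St. Louis G1"),
    (800, "St. Louis G1 and Melfort G2"),
    (1200, "St. Louis G1, Melfort G2, and St. Louis E1"),
    (1600, "St. Louis G1 and E1, Melfort G2 and G3"),
]

def compressors_to_run(bhp):
    # The last lower bound exceeded selects the configuration.
    configuration = "free flow (no compression)"
    for lower_bound, units in PRIORITY:
        if bhp > lower_bound:
            configuration = units
    return configuration

print(compressors_to_run(1400))  # -> 'St. Louis G1, Melfort G2, and St. Louis E1'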
Figure 4. Representation of classes, attributes, and values in the KMS
Figure 5. User interface for representing task knowledge
The prioritized list for compressor operation constitutes important information for the dispatcher and is documented in the task component of the KMS. The top left panel of the screen in Figure 5 shows the task objectives in the problem domain of monitoring and control of gas pipelines. All the tasks involved in the highlighted objective of "determine horsepower requirement" are listed in the middle panel of the screen. On the top right panel, a decomposed list of the tasks and subtasks in the domain is shown. For example, the second task in the top right panel states "compressor selection"; its subtask includes the prioritized list of compressors. By clicking on "follow the order from 1 to 5 based on BHP requirement," the prioritized list for operating the compressors at the two stations is displayed in the bottom right panel of the screen.
The Gas Pipeline Operation Advisor

The knowledge model provided the conceptual basis for the prototype expert system called the Gas Pipeline Operation Advisor (GPOA). The objective of the GPOA was to assist the dispatcher in two tasks: (1) determining whether sufficient natural gas is in the pipeline system and making a recommendation on the linepack level, and (2) determining the horsepower requirement and selecting compressor(s) to turn on or off. The knowledge modeling process clarified knowledge in the domain and provided the basis for building an expert system for monitoring and control of the natural gas pipeline operations. Knowledge items identified using the IMT were used directly in developing the knowledge model and later in the construction of the expert decision support system. For example, the St. Louis and Melfort compressor stations and their discharge and suction pressures were first defined using the IMT, represented in the knowledge tables, documented in the Knowledge Modeling System, and finally became objects and attributes in the implemented GPOA. The system was implemented in G2 from Gensym Corporation, U.S.A.; some details on the system can be found in (Uraikul et al. 2000).
Preliminary Evaluation of the GPOA

A preliminary evaluation was done by a group of experts from the natural gas transmission company. They were presented with some details of the knowledge acquisition process, such as the decision tables, and with the implemented
GPOA. They were satisfied with the conceptual model upon which the system was built. The experts also tested the GPOA using different scenarios, and its performance was satisfactory. They also evaluated the system's outputs of alarms and suggested solutions, and judged them useful and valid.
Conclusion and Future Work

This chapter has presented a method of ontology construction based on the IMT and discussed the application of the method in developing an application ontology in the domain of natural gas pipeline operations. The ontology construction process involved two stages. First, acquisition and analysis of domain knowledge were conducted using the IMT. Second, the knowledge items identified were modeled and represented in a domain ontology, which was documented in the KMS. In this approach, the IMT provided the scaffolding upon which the ontology was built, and the clarified knowledge was represented using tables. The knowledge explicated in the tables was documented and formalized in the automated tool, the Knowledge Modeling System. The knowledge and ontology models provided the basis for building an expert decision support system for monitoring and control of natural gas pipeline operations. Domain knowledge obtained and clarified during the processes of knowledge acquisition and analysis using the IMT contributed directly to the ontology modeling and construction phases of the system development process. Ontology construction methods and application ontologies support autonomous computing in that they form components of the Semantic Web deemed necessary for providing a semantic anchor for autonomous software agents. Application ontologies are explicit specifications of the vocabulary and constraints in a problem domain. If diverse systems or agents require information on a particular problem domain, the semantic structure furnished by an application ontology is an indispensable foundation upon which the systems or agents can obtain semantic clarification for information processing. Alternatively, semantic information captured in one or more ontologies can be used by autonomous tools such as agents for inference operations or for generating solutions to user queries.
Acknowledgment

The author would like to acknowledge the contributions of U. Varanon and W. Jin to this work. We are grateful for the cooperation and support of SaskEnergy/Transgas in this project. We also gratefully acknowledge the financial support of a research grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada.
References

Altman, R.B., Bada, M., Chai, X.J., Carillo, M.W., Chen, R.O., & Abernethy, N.F. (1999). RiboWeb: An ontology-based system for collaborative molecular biology. IEEE Intelligent Systems, 14(5), 68-76.

Bateman, J. (1990). Upper modeling: A general organization of knowledge for natural language processing. Paper prepared for the Workshop on Standards for Knowledge Representation Systems, Santa Barbara.

Borgo, S., Guarino, N., & Masolo, C. (1997). An ontological theory of physical objects. In L. Ironi (Ed.), Proceedings of the Eleventh International Workshop on Qualitative Reasoning (QR'97) (pp. 223-231). Cortona, Italy.

Chan, C.W. (1992). Knowledge acquisition by conceptual modeling. Applied Mathematics Letters Journal, 3, 7-12.

Chan, C.W. (1995). Development and application of a knowledge modeling technique. Journal of Experimental and Theoretical Artificial Intelligence, 7, 217-236.

Chan, C.W. (2000). A knowledge modelling technique and industrial applications. In C. Leondes (Ed.), Knowledge-Based Systems Techniques and Applications, 34(4), 1109-1141. USA: Academic Press.

Chan, C.W. (2002, August 19-20). Cognitive informatics: A knowledge engineering perspective. Proceedings of the First IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 49-56). Calgary, Alberta.

Chan, C.W. (2004, May 2-4). A knowledge modeling system. Proceedings of the IEEE Canadian Conference on Electrical and Computer Engineering (CCECE '04) (pp. 1353-1356). Niagara Falls, Ontario.

Chan, C., Kinsner, W., Wang, Y., & Miller, D.M. (Eds.) (2004, August). Cognitive informatics: Proceedings of the 3rd IEEE International Conference (ICCI'04). Victoria, Canada: IEEE CS Press.

Chandrasekaran, B. (1986). Generic tasks in knowledge-based reasoning: High-level building blocks for expert systems design. IEEE Expert, 1(3), 23-30.

Chandrasekaran, B., Josephson, J.R., & Benjamins, V.R. (1998). Ontology of tasks and methods. 11th Knowledge Acquisition for Knowledge-Based Systems Workshop (KAW'98) (pp. 6.1-6.21). Banff, Canada.

Clancey, W.J. (1985). Heuristic classification. Artificial Intelligence, 27, 289-350.

Flores-Mendez, R.A., Van Leeuwee, P., & Lukose, D. (1998). Modeling expertise using KADS and MODEL-ECS. In B.R. Gaines & M. Musen (Eds.), Proceedings of the 11th Knowledge Acquisition for Knowledge-Based Systems Workshop (KAW'98). Banff, Canada.

Gruber, T. (1992, October). A translation approach to portable ontology specifications. Proceedings of the 7th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop '92 (pp. 11-16), Paper No. 12. Banff, Canada.

Guarino, N., & Giaretta, P. (1995). Ontologies and knowledge bases: Towards a terminological clarification. In N. Mars (Ed.), Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing (pp. 25-32). Amsterdam, The Netherlands: IOS Press.

Hahn, U., Schulz, S., & Romacker, M. (1999). Part-whole reasoning: A case study in medical ontology engineering. IEEE Intelligent Systems, 14(5), 59-67.

Jasper, R., & Uschold, M. (1999, August). A framework for understanding and classifying ontology applications. In Proceedings of the IJCAI99 Workshop on Ontologies and Problem-Solving Methods (KRR5) (pp. 11.1-11.12). Stockholm, Sweden.

Kinsner, W., Zhang, D., Wang, Y., & Tsai, J. (Eds.) (2005, August). Cognitive informatics: Proceedings of the 4th IEEE International Conference (ICCI'05). Irvine, CA: IEEE CS Press.

Laresgoiti, I., Anjewierden, A., Bernaras, A., Corera, J., Schreiber, A.Th., & Wielinga, B.J. (1996). Ontologies as vehicles for reuse: A mini-experiment. In B.R. Gaines & M.A. Musen (Eds.), Proceedings of the 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop (KAW-96) (pp. 30.1-30.21). Banff, Canada.

Liu, L., & Lin, M. (1991). Forecasting residential consumption of natural gas using monthly and quarterly time series. International Journal of Forecasting, 7, 3-16.

McDermott, J. (1998). Preliminary steps toward a taxonomy of problem-solving methods. In S. Marcus (Ed.), Automating Knowledge Acquisition for Expert Systems (pp. 225-255). Boston: Kluwer.

Neches, R., Fikes, R., Finin, T., Gruber, T., Patil, R., Senator, T., & Swartout, W.R. (1991). Enabling technology for knowledge sharing. AI Magazine, 12(3), 37-56.

Newell, A. (1982). The knowledge level. Artificial Intelligence, 18(1), 87-127.

Patel, D., Patel, S., & Wang, Y. (Eds.) (2003, August). Cognitive informatics: Proceedings of the 2nd IEEE International Conference (ICCI'03). London, UK: IEEE CS Press.
Sailor, D.J., & Munoz, J.R. (1997). Sensitivity of electricity and natural gas consumption to climate in the U.S.A.: Methodology and results for eight states. Energy, 22(10), 987-998.

Schreiber, G., Breuker, J., Bredeweg, B., & Wielinga, B. (1988, June 19-23). Modeling in knowledge based systems development. In J. Boose, B. Gaines, & M. Linster (Eds.), Proceedings of the European Knowledge Acquisition Workshop (EKAW '88) (pp. 7.1-7.15). Gesellschaft für Mathematik und Datenverarbeitung.

Steels, L. (1990). Components of expertise. AI Magazine, 11(2), 29-49.

Subramanian, D., Pekny, J.F., & Reklaitis, G.V. (2000). A simulation-optimization framework for addressing combinatorial and stochastic aspects of a research & development pipeline management. Computers and Chemical Engineering, 24(7), 1005-1011.

Tamma, V., Phelps, S., Dickinson, I., & Wooldridge, M. (2005). Ontologies for supporting negotiation in e-commerce. Engineering Applications of Artificial Intelligence, 18(2), 223-236.

Tao, W.O., & Ti, H.C. (1998). Transients analysis of gas pipeline network. Chemical Engineering Journal, 69, 47-52.

Uraikul, V., Chan, C.W., & Tontiwachwuthikul, P. (2000). Development of an expert system for optimizing natural gas operations. Expert Systems with Applications, 18(4), 271-282.

Uschold, M. (2003). Where are the semantics in the Semantic Web? AI Magazine, 24(3), 25-36.

Uschold, M., King, M., Morale, S., & Zorgios, Y. (1998). The enterprise ontology. The Knowledge Engineering Review, 13(1), 31-89.

Valente, A., & Breuker, J. (1996). Towards principled core ontologies. In B.R. Gaines & M. Musen (Eds.), Proceedings of KAW-96. Banff, Canada.

Van de Velde, W., & Schreiber, G. (1997). The future of knowledge acquisition: A European perspective. IEEE Expert, 1, 1-3.

Wang, Y. (2002). On cognitive informatics. Keynote speech, Proceedings of the 1st IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 34-42). IEEE CS Press.

Wang, Y. (2003). Cognitive informatics: A new transdisciplinary research field. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 151-167.

Wang, Y. (2006a, July). Cognitive informatics: Towards the future generation computers that think and feel. Keynote speech, Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 3-7). Beijing, China: IEEE CS Press.

Wang, Y. (2007a, January). The theoretical framework of cognitive informatics. The International Journal of Cognitive Informatics and Natural Intelligence (IJCiNi), 1(1), 1-27. USA: IGI Publishing.

Wang, Y., & Kinsner, W. (2006, March). Recent advances in cognitive informatics. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 121-123.

Wang, Y., Johnston, R., & Smith, M. (Eds.) (2002, August). Cognitive informatics: Proceedings of the 1st IEEE International Conference (ICCI'02). Calgary, AB, Canada: IEEE CS Press.

Wang, Y., Wang, Y., Patel, S., & Patel, D. (2006, March). A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 124-133.

Yao, Y.Y., Shi, Z., Wang, Y., & Kinsner, W. (Eds.) (2006). Cognitive informatics: Proceedings of the 5th IEEE International Conference (ICCI'06), Vols. 1-2. Beijing, China: IEEE CS Press.
Zhou, J., & Adewumi, M.A. (1998). Transients in gas-condensate natural gas pipelines. Journal of Energy Resources Technology, 120, 32-40.

Zhu, G., Henson, M.A., & Megan, L. (2001). Dynamic modeling and linear model predictive control of gas pipeline networks. Journal of Process Control, 11, 129-148.
Chapter XX
Constructivist Learning During Software Development Václav Rajlich Wayne State University, USA Shaochun Xu Laurentian University, Canada
Abstract

This article explores the non-monotonic nature of the programmer learning that takes place during incremental program development. It uses a constructivist learning model that consists of four fundamental cognitive activities: absorption, which adds new facts to the knowledge; denial, which rejects facts that do not fit in; reorganization, which reorganizes the knowledge; and expulsion, which rejects obsolete knowledge. A case study of an incremental program development illustrates the application of the model and demonstrates that it can explain a learning process with episodes of both increase and decrease in knowledge. Implications for documentation systems are discussed in the conclusions.
Introduction

One of the puzzling issues of software engineering is the nature of the knowledge that is needed in order to develop and evolve a program. The program itself is a repository of knowledge about the program domain and may contain knowledge that is not available elsewhere, as documented by Kozaczynski and Wilde (1992). It also contains knowledge of all design decisions that were made during the program's development and subsequent evolution (Rugaber, Ornburn, & LeBlanc, 1990). When evolving or maintaining the program, it is necessary to recover this knowledge; otherwise, maintenance or evolution will be impossible. It is also necessary to communicate this knowledge to all new programmers who join an existing software project. The loss of programming knowledge can be a serious problem and has been identified as a leading cause of code decay (Rajlich & Bennett, 2000). Although the knowledge is embedded in the program, it cannot be easily recovered, since it is encoded in programming structures and delocalized into different components of the program. Moreover, the consequences of the decisions, rather than the decisions themselves, appear in the code. In many ways, the recovery of knowledge from the code is similar to solving a puzzle, and it is laborious and error prone.
One of the most basic questions concerning the nature of programmer knowledge is the issue of its monotonicity. According to a naïve view, the knowledge steadily increases as new facts emerge and are absorbed by the programming team; many current documentation systems are geared towards that view (Ye, 2006). However, in this article we show that there are also episodes of knowledge retraction, and documentation systems should provide adequate support for those as well. Our approach in this article is based on cognitive informatics (CI). CI is a multidisciplinary study of cognition and information sciences, which investigates human information processing mechanisms and processes and their applications in computing (Wang & Kinsner, 2006); studying the knowledge and cognitive processes involved in software development is one of the goals of cognitive informatics. In order to understand the nature of programming knowledge and its acquisition, we adopted and further developed a constructivist model of programmer learning that is based on four basic cognitive activities: absorption, denial, reorganization, and expulsion of knowledge. We validated this model in a case study of pair programming, which is a part of eXtreme Programming (Martin, 2002). In pair programming, two programmers work side by side at one machine as they collaborate on program design, implementation, and testing. The programming pair has to communicate and share knowledge, and this provides an opportunity to unobtrusively analyze their dialog for indications of programmer knowledge and learning. The first section of this article describes our theory of constructivist learning. The second section describes the case study. The third section discusses the results of the case study, and the fourth section gives an overview of the related literature. The fifth section contains general conclusions and future work.
Theory of Constructivist Learning

The constructivist learning model is based on the work of Piaget (1954). Piaget's original aim was to explain learning in children, but constructivist theory extends to adult learning and to epistemology (von Glasersfeld, 1995). The theory assumes that learners actively and incrementally construct their knowledge. They start from some preliminary knowledge and extend it by adding new facts; they may go through stages in which they accept ideas that they will later discard as wrong. The two main activities are assimilation and accommodation, where assimilation describes how learners deal with new knowledge, and accommodation describes how learners reorganize their existing knowledge. We modified this theory by dividing assimilation into two separate activities. Absorption means that the learners add new facts to their knowledge. However, if the new facts do not fit in, the learners may reject them; we call this second activity denial. We also divided accommodation into two separate activities. Reorganization means that the learners reorganize their knowledge to aid future absorption of new facts. Expulsion is the process where part of the knowledge becomes obsolete or provably incorrect and the learners reject it. Of course, there are also mixed activities: learners may absorb a modified fact rather than make an outright denial; learners may reorganize their knowledge while absorbing new facts; and so forth. Table 1 lists the four basic learning activities.
Table 1. Learning activities

Activity        Symbol   Characterization of the activity
Absorption      a        The learners add new facts to their knowledge.
Denial          d        The learners reject the new facts that do not fit in.
Reorganization  r        The learners reorganize their knowledge to aid future absorption.
Expulsion       e        A part of the knowledge becomes obsolete and the learners reject it.
Table 2. Analogy between programming activities and cognitive activities

Programming Activities        Cognitive Activities
Incremental change            Absorption
Rejection of change request   Denial
Refactoring                   Reorganization
Retraction                    Expulsion
In order for learning to occur, the learners must possess preliminary knowledge. Preliminary knowledge makes learning possible: the more the learners know, the more they can learn. Sometimes this preliminary knowledge turns out to be inaccurate or even completely wrong, and the learners employ the four cognitive activities to build more accurate knowledge. The theory is particularly suitable for situations where learners must discover the facts on their own, without a teacher, which is a common situation in software engineering. The assertion of this theory is the following: when the process of learning is recorded and divided into episodes, every episode can be classified in terms of the categories of the constructivist learning model. The model would be falsified (Popper, 2003) if there were episodes that lay outside of our classification scheme, or if independent observers frequently arrived at different conclusions about the same observed episode, indicating that the classifications are arbitrary. Each episode deals with specific concepts that are part of the knowledge. In the context of program development, the concepts can be classified as either domain concepts or programming concepts. Domain concepts belong to the specific domain that the program addresses (Biggerstaff, Mitbander, & Webster, 1994), while programming concepts belong to the knowledge of programming, such as the programming language, the program development process, design decisions, and so forth. Design decisions involve the program architecture, the selection of the program classes, methods, or attributes, and so forth (Ran & Kuusela, 1996). The learning process can be explained in terms of an analogy with incremental program development. The knowledge that the programmers construct is analogous to the program they incrementally develop, and the activities of their learning are analogous to the activities of incremental development. The analogy is summarized in Table 2, where analogous terms are in the same row. The case study in the next section examines an application of this model.
The Case Study

For our case study, we utilized the eXtreme Programming process (Beck, 2000), in which pair programming is one of the recommended practices. Pair programming offers a unique opportunity to study the parallel construction of both the program and the programmer knowledge. In pair programming, the programmers communicate with each other about the evolving program; by recording and analyzing their dialog, we study how both the knowledge and the program grow. The program developed during this case study records bowling scores (Martin, 2002). The programming was done by two experts who had been working in the industry for more than 25 years (Martin, 2002). The programming pair started the process with preliminary knowledge of the domain (bowling rules), represented by the concept map in Figure 1 (Novak, 1998), where concepts are represented by rectangles and arrows represent the dependencies. The dependencies stand for the order in which the concepts have to be explained to somebody who is not familiar with the domain; a concept can be explained only if all previous concepts on which it depends have been explained and understood. Preliminary programming knowledge includes knowledge of the programming language, algorithms, eXtreme Programming practices, and so forth.
Figure 1. Domain concepts (a concept map of the bowling domain: game, score, team, scorer, current score, strike, spare, frame, throw, pin, ball)
Figure 2. UML diagram of the first version (classes TestThrow and TestFrame extend TestCase; TestThrow declares testThrow(); TestFrame declares TestFrame(), testScoreNoThrow(), and testAddOneThrow(); Frame provides getScore() and add() and is associated with Throw and Score)
Equipped with this knowledge, the programmers implemented a sequence of program versions and recorded their dialog. The UML class diagram (Fowler, 1999) of the first version is in Figure 2. The design decisions that led to this diagram were extracted from the recorded dialog and appear in Figure 3. The rectangles represent design decisions, while the arrows represent the order in which the design decisions were made. Dark rectangles represent domain concepts that serve as the basis of some of the design decisions. Please note that the order of the design decisions does not correspond to the order of the dependencies in the UML diagram; if the two orders were identical, that would mean that all design decisions were made in bottom-up order.
Figure 3. Design decisions for the first version (design decisions: testScoreOneThrow, testScoreNoThrow, TestFrame, add, Score, getScore, Frame, TestThrow, Throw; underlying domain concepts: frame, score, throw)
The development continued through several steps; see the progress of the learning in Figure 4, which shows the situation toward the end of the dialog. The programmers realized that the class "Frame" and the related classes did not contribute toward the functionality of the program and deleted them. The resulting UML class diagram of the program is in Figure 5. We divided this dialog into 94 small episodes, and then three independent observers classified each episode as one of the four cognitive activities, using the characterization of the activities in Table 1. Out of 94 episodes, only 5 were classified differently by each of the observers; in 30 episodes there was a partial disagreement, where one observer differed from the other two; in those cases we discussed the episode and chose the majority opinion as the final classification. While acknowledging these differences, we found the classification to be generally acceptable and reliable. A sample of the dialog and its resulting classification are shown in Table 3.
The Result of the Case Study

In the case study, we observed that the knowledge required by even a small program is quite extensive, as demonstrated in Figure 4. It should be remembered that these figures are only the "tip of the iceberg," as there is a large body of preliminary knowledge of the domain and of programming that these figures do not capture. Yet all that knowledge is necessary to evolve the program.

During the incremental program development, the most common cognitive activity was absorption (about 71.3%), as shown in Table 4, and it dominated the programming process from beginning to end. This coincides with the intuition that knowledge increases during program implementation. While absorption was the driving force of the learning, there was one large episode of knowledge expulsion in the last part of the dialog. The class "Frame" of Figure 4 and six related design decisions, out of the total of 39, were retracted as a part of code "cleaning," resulting in a substantial knowledge change. This episode illustrates the non-monotonic nature of constructivist learning. The changing number of concepts is presented in Figure 6. There were also episodes of reorganization, which amounted to 17% of all episodes. The rejections of the concepts "Score Card" and "Team" in the first part of the dialog are examples of denial. We observed that absorption occurred mostly at the beginning and middle of the dialog, while expulsion happened most often in the last part. Reorganization appeared mostly in the middle and last parts, and denial appeared only in the first and middle parts of the dialog.
Figure 4. Design decisions after episode 87 (the graph grew to include, among others, the classes Game, TestGame, Frame, TestFrame, Throw, TestThrow, Score, Ball, and Pins; test cases such as testSimpleSpare, testSimpleStrike, testPerfectGame, testEndOfArray, testTenthFrameSpare, and testHeartBreak; and domain concepts such as strike, spare, ball, pins, game, frame, score, and throw)
Programmers discussed nine domain concepts in three episodes before the coding started, but during the rest of the development almost all the domain concepts were revisited, and their comprehension was updated. The domain concept “Frame” was updated as many as nine times, with each update resulting in more accurate knowledge. Many design decisions were also updated, but less frequently than domain concepts. As a result of the case study, we conclude that the constructivist learning model explains the learning that takes place in incremental software development and that the learning is significantly non-monotonic.
Related Work

CI is a multidisciplinary research area that is at the intersection of cognitive science, neural psychology, philosophy, software engineering, and other disciplines (Wang, 2002). Wang (2004) previously investigated the relationship between software engineering and cognitive informatics.
Figure 5. Final program (class TestGame, with g : Game, setUp(), and the test methods testTwoThrowsNoMark(), testFourThrowsNoMark(), testSimpleSpare(), testSimpleFrameAfterSpare(), testSimpleStrike(), testPerfectGame(), testEndOfArray(), testSampleGame(), testHeartBreak(), and testTenthFrameSpare(); class Game, with itsCurrentFrame : int, firstThrowInFrame : boolean, itsScorer : Scorer, and the methods score(), add(), adjustCurrentFrame(), handleSecondThrow(), advanceFrame(), and adjustFrameForStrike(); class Scorer, with ball : int, itsThrows : []int, itsCurrentThrow : int, and the methods addThrow(), scoreForFrame(), strike(), spare(), nextTwoBalls(), nextBall(), and twoBallsInFrame())
There have been numerous publications on constructivist learning, including the classical work of Piaget (1954). von Glasersfeld (1995) described constructivism as a theory of knowledge and defined radical constructivism as a theory of learning. Novak (1998) addressed theories of learning, knowledge, and instruction and used concept maps as tools to describe knowledge. The knowledge graphs of this article follow Novak's approach.
Table 3. Fragment of the dialog and the classification (activity codes: a = absorption, d = denial, r = reorganization)

No | Programmers' actions | Activity | Concepts updated
35 | K: Why the magic number 21? M: That's the maximum possible number of throws in a game. K: scoreForFrame needs to be refactored to be more communicative. | r |
36 | K: But before we consider refactoring, let me ask another question: Is Game the best place for this method? In my mind, Game is violating Bertrand Meyer's SRP (Single Responsibility Principle). It is accepting throws and it knows how to score for each frame. What would you think about a Scorer object? | a | Game, Scorer
37 | M: But there are side-effects in the score+= expression. They don't matter here because it doesn't matter which order the two addend expressions are evaluated in. K: I suppose we could do an experiment to verify that there aren't any side-effects, but that function isn't going to work with spares and strikes. Should we keep trying to make it more readable or should we push further on its functionality? | d |
38 | M: The experiment would only have meaning on certain compilers. Other compilers might use different evaluation orders. Let's get rid of the order dependency and then push on with more test cases. | r |
39 | M: Next test case. Let's try a spare. K: Let's refactor the test and put the creation of the game in a setUp function. | r |
40 | M: That's better now. Let's write the spare test case. I think the increment of ball in the frameScore==10 case shouldn't be there. Here's a test case that proves my point. | a | Spare
41 | M: See, that fails. Now if we just take out that pesky extra increment. It still fails.... Could it be that the score method is wrong? | d | Score
42 | M: I'll test that by changing the test case to use scoreForFrame (2). | a |
43 | M: That passes. The score method must be messed up. | a |
44 | M: That's wrong. The score method is just returning the sum of the pins, not the proper score. What we need score to do is call scoreForFrame with the current frame. | a | Score
45 | K: We don't know what the current frame is. Let's add that message to each of our current tests, one at a time. | a |
46 | M: OK, that works. But it's stupid. Let's do the next test case. | a |
47 | M: Let's try the next. This one fails. Now let's make it pass. | a |
48 | K: I think the algorithm is trivial. Just divide the number of throws by two, since there are two throws per frame. Unless we have a strike ... but we don't have strikes yet, so let's ignore them here too. What if we don't calculate it each time? What if we adjust a currentFrame member variable after each throw? | a | Frame
Other notable learning theories include that of Vygotsky (1978), who proposed social cognition learning theory, which emphasizes social interaction in the development of cognition. Although social interaction is an important element of software engineering processes, we believe that our emphasis must be on the individual programmers and their autonomous construction of their knowledge, and hence we found this theory less applicable.

Observational learning theory states that learning occurs through the simple process of observing someone else's activity (Biederman, Stepaniuk, Davey, Raven, & Ahn, 1999). However, software engineering involves processes more complicated than simple observation; therefore, observational learning is not suitable here. Behaviorism focuses on objectively observable behaviors at the expense of mental activities; therefore, we do not use it to explain software engineering activities.

Brain-based learning is based on the structure and function of the brain and emphasizes the fact that the brain can perform several activities at once, like tasting and smelling; learning involves both focused attention and peripheral perception, and both conscious and unconscious processes (Jensen, 2000). Because brain-based learning deals with the functions of the brain rather than the cognitive activity of learners, we did not use it. Control theory claims that behavior is not caused by a response to an outside stimulus, but is inspired by what a person wants most, such as survival, love, and freedom (Glasser, 1998). Again, we did not find that view useful for the study of learning in software engineering.
Table 4. The distribution of the cognitive activities in the case study

Activities     | beginning | middle | end | total | percentage
Absorption     | 27        | 18     | 22  | 67    | 71.3%
Denial         | 1         | 4      | 0   | 5     | 5.3%
Reorganization | 0         | 7      | 9   | 16    | 17.0%
Expulsion      | 2         | 1      | 3   | 6     | 6.4%
Total          | 30        | 30     | 34  | 94    | 100%
Figure 6. Changing numbers of the concepts and their dependencies (number of concepts, 0 to 50, plotted against episode number, 1 to 94)
We concluded that although these theories contain valuable insights, they are not directly applicable to our purposes. The most promising theory we found was constructivist learning, which we adopted as the basis of our model.

Incremental software development has been described in numerous publications, for example Beck (2000), Martin (2002), and Williams, Kessler, Cunningham, and Jeffries (2000). Beck (2000) introduced a new approach to software development, called eXtreme Programming, that is based on incremental development, continuous testing, pair programming, and several other practices.

Several researchers have studied the knowledge contained in a program. According to Brooks (1983), a programmer understands a program through the construction of a mental model that consists of successive knowledge domains. Fischer, McCall, Ostwald, Reeves, and Shipman (1994) proposed support for incremental development based on a specific knowledge model, where seeding, evolution, and reseeding are the three stages of knowledge capture and transformation. Henninger (1997) recognized that software development is a process involving various knowledge resources, which keep changing during the process. Robillard (1999) identified two types of knowledge: topical and episodic. Topical knowledge refers to the meaning of words, and episodic knowledge consists of people's experiences.
In our previous work, Rajlich (2002) presented program comprehension as a learning process. In a case study of incremental software development, Rajlich and Xu (2003) described an analogy between incremental software development and constructivist learning. A study of program debugging was based on Bloom's taxonomy (Bloom, 1956; Xu & Rajlich, 2004). Xu and Rajlich (2005) developed the dialog-based protocol, a novel empirical research method that is based on the analysis of the dialog in a programming pair and gives insight into the programmers' cognitive activities. This approach may reduce the Hawthorne and placebo effects that are present in other empirical techniques.

Many existing software documentation tools do not pay sufficient attention to changes in knowledge and essentially assume that the knowledge is unchanged (Forward & Lethbridge, 2002). Among them, JavaDoc extracts documentation statically from Java source files and produces formatted HTML output (Gosling, Joy, & Steele, 1996); Doc++ (Wunderling & Zöckler, n.d.) is useful for creating hierarchical documentation of class libraries; and Doxygen (2004) is used for documentation of Java, C, and C++ programs. Donald Knuth's Literate Programming (Knuth, 1984) places both source code and documentation in the same file. None of these systems directly supports non-monotonicity in the construction of knowledge. PAS (Rostkowycz, Rajlich, & Marcus, 2004) is a hypertext-based documentation system suitable for incremental redocumentation.
Conclusion and Future Work

In this article, we introduced a model of constructivist learning and used it to explain programmer learning during a software engineering process. The model is based on four cognitive activities: absorption, which adds new facts to the knowledge; denial, which rejects facts that do not fit in; reorganization, which reorganizes the knowledge; and expulsion, which rejects obsolete knowledge. We validated the model in a case study of pair programming during incremental program development. The data of this article are based on the analysis of the dialog in a programming pair. From the data we concluded that the classification of programmers' actions according to the constructivist model of learning reflects the process of learning well.

We noted that the knowledge required for incremental software development is large and changes rapidly. We plan to investigate the nature of this knowledge in more detail and to see whether there are any particular substructures of this knowledge that require greater programmer attention. We observed episodes of both knowledge increase and decrease, and hence the growth of the knowledge is non-monotonic. This fact is important for program documentation systems. Future work includes additional studies of pairs of programmers during the development of larger programs, in order to further assess the insights provided by constructivist learning and to analyze the differences between novices and experts. We also plan to develop specialized documentation tools that will support the non-monotonic nature of knowledge growth during program development.
Acknowledgment

The authors would like to acknowledge Jay Rajnovich for his help with writing this article and Claudia Iacob for being one of the observers who classified the episodes. We also thank the anonymous reviewers for their comments that significantly improved this article. This research was supported in part by grants from the National Science Foundation (CCF-0438970), the National Institutes of Health (NHGRI 1R01HG003491), and by 2005 and 2006 IBM Faculty Awards. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF, NIH, or IBM.
References

Beck, K. (2000). Extreme programming explained. MA: Addison-Wesley.

Biederman, G. B., Stepaniuk, S., Davey, V. A., Raven, K., & Ahn, D. (1999). Observational learning in children with Down syndrome and developmental delays: The effect of presentation speed in videotaped modeling. Down Syndrome Research and Practice, 6(1), 12-18.

Biggerstaff, T. J., Mitbander, B. G., & Webster, D. E. (1994). Program understanding and the concept assignment problem. Communications of the ACM, 37(5), 72-82.

Bloom, B. S. (Ed.). (1956). Taxonomy of educational objectives: The classification of educational goals: Handbook I, Cognitive domain. New York, Toronto: Longmans, Green.

Brooks, R. (1983). Towards a theory of the comprehension of computer programs. International Journal of Man-Machine Studies, 18(6), 543-554.

Doxygen. (2004). Doxygen Web site. Retrieved January 7, 2005, from http://www.stack.nl/~dimitri/doxygen/

Fischer, G., McCall, R., Ostwald, J., Reeves, B., & Shipman, F. (1994). Seeding, evolutionary growth and reseeding: Supporting the incremental development of design environments. Paper presented at the Conference on Computer-Human Interaction (CHI '94), Boston, MA.

Forward, A., & Lethbridge, T. (2002). The relevance of software documentation, tools and techniques: A survey. Paper presented at the ACM Symposium on Document Engineering, McLean, VA.

Fowler, M. (1999). Refactoring: Improving the design of existing code. MA: Addison-Wesley.

Glasser, W. (1998). The quality school. Perennial.

Gosling, J., Joy, B., & Steele, G. (1996). Java language specification. MA: Addison-Wesley.

Henninger, S. (1997). Tools supporting the creation and evolution of software development knowledge. Paper presented at the International Conference on Automated Software Engineering (ASE '97), Incline Village, NV.

Jensen, E. (2000). Brain-based learning: The new science of teaching and training (rev. ed.). Brain Store Inc.

Knuth, D. (1984). Literate programming. The Computer Journal, 27(2), 97-111.

Kozaczynski, W., & Wilde, N. (1992). On the re-engineering of transaction systems. Journal of Software Maintenance, 4, 143-162.

Martin, R. C. (2002). Agile software development: Principles, patterns, and practices. MA: Addison-Wesley.

Novak, J. D. (1998). Learning, creating, and using knowledge. Mahwah, NJ: Lawrence Erlbaum Associates.

Piaget, J. (1954). The construction of reality in the child. New York: Basic Books.

Popper, K. (2003). The logic of scientific discovery. Taylor & Francis Books Ltd.

Rajlich, V. (2002). Program comprehension as a learning process. Paper presented at the First IEEE International Conference on Cognitive Informatics, Calgary, Alberta.

Rajlich, V., & Bennett, K. H. (2000). A staged model for the software lifecycle. Computer, 33(7), 66-71.

Rajlich, V., & Xu, S. (2003). Analogy of incremental program development and constructivist learning. Paper presented at the Second IEEE International Conference on Cognitive Informatics, London, UK.
Ran, A., & Kuusela, J. (1996). Design decision trees. Paper presented at the Eighth International Workshop on Software Specification and Design, Paderborn, Germany.

Robillard, P. N. (1999). The role of knowledge in software development. Communications of the ACM, 42(1), 87-92.

Rostkowycz, A. J., Rajlich, V., & Marcus, A. (2004). Case study on the long-term effects of software redocumentation. Paper presented at the 20th IEEE International Conference on Software Maintenance, Chicago, IL.

Rugaber, S., Ornburn, S. B., & LeBlanc, R. J. (1990). Recognizing design decisions in programs. IEEE Software, 7(1), 46-54.

von Glasersfeld, E. (1995). Radical constructivism. London: The Falmer Press.

Vygotsky, L. S. (1978). Mind in society. Cambridge, MA: Harvard University Press.

Wang, Y. (2002, August). On cognitive informatics. Keynote speech. In Proceedings of the First IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 34-42). Calgary, AB, Canada: IEEE CS Press.

Wang, Y. (2004, August). On cognitive informatics foundations of software engineering. In Proceedings of the 3rd IEEE International Conference on Cognitive Informatics (ICCI'04) (pp. 22-31). Canada: IEEE CS Press.

Wang, Y., & Kinsner, W. (2006). Recent advances in cognitive informatics. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 121-123.

Williams, L., Kessler, R., Cunningham, W., & Jeffries, R. (2000). Strengthening the case for pair-programming. IEEE Software, 17(4), 19-25.

Wunderling, R., & Zöckler, M. (n.d.). Doc++. Retrieved from http://www.zib.de/visual/software/doc++/

Xu, S., & Rajlich, V. (2004). Cognitive process during program debugging. Paper presented at the Third IEEE International Conference on Cognitive Informatics, Victoria, BC.

Xu, S., & Rajlich, V. (2005). Dialog-based protocol: An empirical research method for cognitive activity in software engineering. Paper presented at the Fourth ACM/IEEE International Symposium on Empirical Software Engineering, Noosa Heads, Queensland.

Ye, Y. (2006). Supporting software development as knowledge-intensive and collaborative activity. Paper presented at the 2006 International Workshop on Interdisciplinary Software Engineering Research, Shanghai, China.
This work was previously published in International Journal of Cognitive Informatics and Natural Intelligence, Vol. 1, Issue 3, edited by Y. Wang, pp. 78-101, copyright 2007 by IGI Publishing (an imprint of IGI Global).
Chapter XXI
A Unified Approach to Fractal Dimensions
Witold Kinsner, University of Manitoba, Canada
Abstract

Many scientific chapters treat the diversity of fractal dimensions as mere variations on either the same theme or a single definition. There is a need for a unified approach to fractal dimensions, for there are fundamental differences between their definitions. This chapter presents a new description of three essential classes of fractal dimensions based on: (a) morphology, (b) entropy, and (c) transforms, all unified through the generalized-entropy-based Rényi fractal dimension spectrum. It discusses practical algorithms for computing 15 different fractal dimensions representing the classes. Although the individual dimensions have already been described in the literature, the unified approach presented in this chapter is unique in terms of its progressive development of the fractal dimension concept, similarity in the definitions and expressions, analysis of the relation between the dimensions, and their taxonomy. As a result, a number of new observations have been made, and new applications discovered. Of particular interest are behavioral processes (such as dishabituation), irreversible and birth-death growth phenomena (e.g., diffusion-limited aggregates, DLAs, dielectric discharges, and cellular automata), as well as dynamical nonstationary transient processes (such as speech and transients in radio transmitters), multifractal optimization of image compression using learned vector quantization with Kohonen's self-organizing feature maps (SOFMs), and multifractal-based signal denoising.
1. Introduction

This chapter is concerned with measuring the quality of various multimedia materials used in perception, cognition and evolutionary learning processes. The multimedia materials may include temporal signals such as sound,
speech, music, biomedical and telemetry signals, as well as spatial signals such as still images, and spatio-temporal signals such as animation and video. A comprehensive review of the scope of multimedia storage and transmission, as well as quality metrics, is presented by Kinsner (2002). Most such original materials are altered (compressed or enhanced) either to fit the available storage or bandwidth during their transmission, or to enhance perception of the materials. Since the signals may also be contaminated by noise during different stages of their processing and transmission, various denoising techniques must be used to minimize the noise without affecting the signal itself (Kinsner, 2002). Different classes of colored and fractal noise are described by Kinsner (1994c). A review of approaches to distinguish broadband signals and noise from chaos was provided by Kinsner (2003).

Multimedia compression is often lossy in that the signals are altered with respect not only to their redundancy, but also to their perceptual and cognitive relevancy. Since the signals are presented to humans (rather than machines), cognitive processes must be considered in the development of suitable quality metrics. Energy-based metrics are not suitable for such cognitive processes. A very fundamental class of metrics based on entropy was described by Kinsner (2004), with a discussion of its usefulness and limitations in the area of cognitive informatics (CI) as defined in (Wang, 2002; Wang & Kinsner, 2006), and autonomic computing (Kinsner, Potter, & Faghfouri, 2005; Wang, 2007). This chapter is an extension of the single-scale entropy-based metrics to multiscale metrics through fractal dimensions. Many experimental results obtained by the author and his collaborators indicate that quality metrics based on fractal dimensions appear to be best suited for perception. Further research on their suitability for cognition is being conducted.

A topological dimension is by definition a non-negative integer 0, 1, 2, .... The dimension of a general abstract vector space is the number of linearly independent vectors required for a basis, and is n for R^n. An orthonormal basis is by definition a basis that is an orthonormal set. The space-time we live in is often characterized by four Euclidean integer dimensions.

At the end of the 19th century, it was believed that one could define the dimension of a space as the number of continuous parameters required to describe it. However, with the introduction of space-filling curves by Peano, Hilbert, Minkowski, Sierpinski and others, as well as the discovery of continuous but nowhere differentiable curves by Weierstrass, and dusts by Cantor and Julia, the notion of the integer dimension had to be refined. Consequently, a real number was then introduced as a measure of the degree (index) of space filling (or meandering, or roughness, or brokenness, or irregularity). Today, chaos in a dynamical system is characterized by such numbers as a measure of the corresponding complexity of a strange attractor of the system. In 1928, Bouligand called the space-filling index the Cantor-Minkowski order. It was also known as fractional dimension (Besicovitch in the 1930s), logarithmic density, and even capacity (Frostman) and KS entropy (after A.N. Kolmogorov 1958 and Sinai 1959). In the 1960s, Benoît B.
Mandelbrot reformulated the number in terms of the Hausdorff-Besicovitch fractal dimension to acknowledge the fundamental contribution of Hausdorff in 1919 in establishing a measure of a set through successive coverings of the set by volume elements (abbreviated to vels), and the refinement of the idea by A.S. Besicovitch ten years later.

In 1979, Mandelbrot defined a fractal as "a set for which the Hausdorff-Besicovitch dimension strictly exceeds the topological dimension" (Mandelbrot, 1982). Today, we also recognize that some (fat) fractals may have integer dimensions. Since the dimension plays such an important role in fractals, it has been described in numerous books and other sources such as Peitgen, Jürgens, & Saupe (1992, p. 216); Hoggar (1992, p. 392); Falconer (1990, Ch. 2); Edgar (1990, Ch. 6); Feder (1988, p. 14); Barnsley (1988, p. 200); Wornell (1996); Kinsner (1994b); and Kinsner (1995).

In the above approaches, the entire fractal object is characterized by a single number. This characterization is adequate for strictly self-similar objects only. Natural fractals are not self-similar everywhere, and require more than one number to describe them completely. In the late 1970s, Mandelbrot stressed that complex objects and dynamical systems should be characterized by a spectrum of numbers, rather than a single dimension. Hentschel & Procaccia (1983) adapted the Schutzenberger-Rényi generalized entropy to characterize the distribution of multifractals. We have also contributed to the study of such multifractal measures.

The desire to implement the Hausdorff-Besicovitch fractal dimension has led to various classes of definitions, including (1) length dimension DL, (2) similarity dimension DS, (3) Hausdorff dimension DH, (4) Minkowski-Bouligand dimension DMB, (5) mass dimension DM, (6) gyration dimension DG, (7) information dimension DI, (8) correlation dimension DC, (9) Rényi dimension spectrum Dq, (10) Mandelbrot singularity spectrum Sq, (11) spectral dimension Dβ, (12) variance dimension Dσ, and (13) Lyapunov dimension DΛ. Many other definitions
can also be found in the literature. Notice that each class may have different methods of computing the dimension, thus resulting in different numbers. Has this proliferation of the definitions of fractal dimensions always resulted in a better representation of fractal objects? Yes. However, since the definitions produced values that were not the same for a given fractal, as reported in the literature, they caused confusion and even mistrust. Consequently, the main objective of this chapter is to show that the different definitions of fractal dimensions should produce the same values if and only if the measured object is monofractal (i.e., an object with the same complexity everywhere), but should produce different values if the object is multifractal (i.e., a mixture of monofractals). Another objective of this chapter is to show how those dimensions are related, and how they can be classified into groups according to distinct features. They shall be grouped into (i) morphological dimensions, (ii) entropy-based dimensions, and (iii) transform-based dimensions.
2. Morphological Fractal Dimensions

Morphological fractal dimensions are concerned with the geometry of fractal objects only. They are purely shape-related concepts. They utilize no information about the distribution of a measure over a spatial fractal or the time behavior of a dynamical system. Although limited to the morphology of objects, they are the most widely used today in the form of the box-counting dimension. This section describes the major variations on this class of dimensions, and establishes a number of characteristic attributes of the morphological fractal dimensions. Although this class is easy to compute and sufficient to characterize monofractals, it is not sufficient to characterize multifractal objects such as strange attractors pertinent to perception and cognition.
2.1 Length Fractal Dimension, DL

Measuring objects (i.e., their forms or patterns) is a fundamental part of any geometry, such as Euclidean (325-265 BC), hyperbolic Lobachevskian (1792-1856), and spherical Riemannian (1826-1866). Simple objects such as triangles, polygons and circles allow unambiguous measurements, regardless of their scale. This can be demonstrated by the well-known Archimedean approximation of the length of a circle of radius R by inscribed regular polygons of length L_N = 2NR sin(180°/N), where N is the number of sides of length r in the polygon, as shown in Fig. 1a. It is seen that an initial rough estimate L6 is replaced by successive values of L12, L24, L48, L96, ... that converge rapidly to the true length of the circle L∞ = 2πR. Notice that the values of (1/r) along the x-axis in Fig. 1b can be interpreted as the precision of the measurements.
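To make the convergence concrete, the following short Python sketch (an illustration added here, not part of the original text) evaluates the successive Archimedean estimates L_N = 2NR sin(180°/N); the starting polygon and the number of side doublings are arbitrary choices.

```python
# Archimedean approximation of a circle's circumference by inscribed
# regular N-gons; L_N converges rapidly to 2*pi*R as N doubles.
import math

R = 1.0
N = 6
for _ in range(5):
    L = 2 * N * R * math.sin(math.pi / N)   # 180°/N expressed in radians
    print(f"N = {N:3d}  L_N = {L:.6f}")
    N *= 2
print(f"limit 2*pi*R = {2 * math.pi * R:.6f}")
```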
Figure 1. (a) Unambiguous Archimedean approximation of the length a circle. (b) Log-log plot of the successive measurements
Figure 2. Ambiguous measurement of the length of a fractal
Figure 3. Approximation of the length of the coast of Britain and a circle
In general, the length of any simple curve (i.e., a curve that is continuous and differentiable everywhere) can be approximated by taking a ruler of length r and counting the number N(r) of such rulers required to step along the curve from one end to the other (i.e., to cover the curve). As the ruler size approaches zero, L(r) approaches a finite limit according to

$$ L(r) = \lim_{r \to 0} N(r)\, r^{D_E} = \text{const} < \infty \qquad (1) $$
Clearly, the value of the exponent of r is DE = 1 in the Euclidean space. Does this unambiguous measurement of a simple object in the Euclidean geometry extend to a complex object in the fractal geometry, such as the Koch curve or a coastline? No, because measuring such objects reveals rapidly escalating details when the measuring unit r decreases, as illustrated in Fig. 2. So, even if absolutely precise, the measurement of L at two different sizes of r will be different (a larger r results in a shorter L). Similarly, for two different scale maps and the same r, the measurements will also be different (a larger scale results in a shorter L). If we start measuring the coast of Britain with a ruler of length r1 = 500 km, then successive length measurements at ruler sizes of 100, 54 and 17 km plotted in log-log coordinates are scattered around a line with a slope m ≈ 0.361900, as shown in Fig. 3 (for comparison, the graph also shows the Archimedean approximation of the circle). Notice that the slope is denoted by the usual m to signify “modulus of slope,” or “move,” or the French “monter,” “to climb,” and to avoid confusion with other symbols used in this chapter. This line implies the following power-law relationship between L and r at the successive scale stages k
$$ L_k = c\, r_k^{\,m} \quad \text{or} \quad L_k \sim r_k^{\,m} \qquad (2) $$

where c is the intercept. Thus, the exponent m can be expressed as

$$ m = \lim_{k \to \infty} \frac{\log L_k}{\log (1/r_k)} \qquad (3) $$
to the accuracy of log c. So, if the exponent of r is DE = 1 in Eq. (1), the measured length of a fractal diverges, L(r) → ∞. What happens if we change the exponent from 1 to some value D in Eq. (1)?

$$ L(r) = \lim_{r \to 0} N(r)\, r^{D} \qquad (4) $$
The new exponent D is now capable of suppressing the rate of the diverging sequence to a point at which it becomes constant, exactly as it is with simple objects, but beyond which it vanishes. This critical value DL is the length fractal dimension. This new dimension, though no longer an integer, reconciles the difference between the length measurement of simple and fractal objects! This appears to be one of the simplest explanations of the meaning of the fractal dimension concept. It clearly involves the successive (Hausdorff) coverings by volume elements (vels for short), which are in this case line segments of size r, with r → 0 at successive coverings. It can be shown that DL is related to the slope m of the log-log plot according to the following simple relation (Kinsner, 1994a)

$$ D_L = 1 + m \qquad (5) $$
Thus, the coast of Britain can be characterized by DL ≈ 1.36, and the more meandering coast of Norway by DL ≈ 1.57. It is seen that the fractal dimension DL can be interpreted as a measure of the complexity of the curve (i.e., the level of meandering, or roughness, or singularity). This measure has a lower bound of DLmin = 1 (i.e., the Euclidean dimension in which its smooth relative could exist), and an upper bound DLmax = De (i.e., the embedding dimension De in which the actual fractal curve exists). For example, a space-filling curve such as the Peano or Hilbert curve has DLmin = 1 (because its generator consists of line segments) and DLmax = 2, if it is defined on a surface. It means that a continuous, never intersecting, nowhere differentiable curve fills the entire surface, thus being a curve of the largest possible complexity. On the other hand, most mathematical and natural fractals have DLmax < De. Notice that computational constraints may lead to values that exceed the theoretical bounds slightly. Since this fundamental bounding property of fractal dimensions also applies to all the other fractal dimensions discussed in this chapter, it is very useful in fractal object characterization and classification using crisp and fuzzy classifiers.

Another fundamental property of fractal dimensions is that they are independent of the energy of the object (signal). Instead of measuring the energy of the object, they measure its complexity through its information, or entropy, or other metrics. Still another fundamental property of fractal dimensions is that they can be obtained only through measurements at multiple scales, thus revealing any long-range dependencies in the fractal objects. This property is critical in the analysis of chaotic time series (e.g., Kantz & Schreiber, 2004; and Sprott, 2003). Notice that since the concept of length does not discriminate between different parts of the object, the above approach to computing DL ignores any nonuniformity of the distribution of a measure along the object; this is equivalent to the assumption that the entire length has an equal (uniform) distribution of properties such as the density of the line. This property makes DL the first example of the morphological dimensions.
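The ruler (divider) procedure behind Eqs. (2)-(5) can be sketched as follows; the test curve, the ruler sizes, and the helper names are illustrative assumptions, not the chapter's implementation.

```python
# A minimal divider-method sketch: step rulers of decreasing length r
# along a digitized curve, regress log L(r) on log(1/r) to get the
# slope m, and set D_L = 1 + m (Eq. 5). For a smooth curve the slope
# is near 0 (D_L ~= 1); a coastline-like curve yields D_L > 1.
import math

def ruler_length(points, r):
    """Total length measured by stepping a ruler of length r along the curve."""
    total, i = 0.0, 0
    while i < len(points) - 1:
        j = i + 1
        while j < len(points) - 1 and math.dist(points[i], points[j]) < r:
            j += 1
        total += math.dist(points[i], points[j])
        i = j
    return total

def slope(xs, ys):
    """Least-squares slope of ys versus xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

# illustrative jagged curve (a crude coastline stand-in)
points = [(0.01 * t, 0.05 * math.sin(0.4 * t) + 0.2 * math.sin(0.03 * t))
          for t in range(1001)]
rulers = [0.2, 0.1, 0.05, 0.025]
m = slope([math.log(1 / r) for r in rulers],
          [math.log(ruler_length(points, r)) for r in rulers])
print("D_L estimate:", 1 + m)
```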
2.2 Self-Similarity Dimension, DS

While the length fractal dimension DL applies to any line fractal, the self-similarity fractal dimension DS applies to either regular self-similar objects, or asymptotically self-similar objects of any embedding dimension De. This is the reason why we can abandon the concept of length, surface, volume, or any other higher-order dimensional object, and concentrate on the relationship between the number of vels and their size (or any other equivalent scale metric) at multiple coverings. If the following power-law relation is satisfied

$$ N(r) \sim \left(\frac{1}{r}\right)^{D_S} \quad \text{for } r \to 0 \qquad (6) $$

then the self-similarity dimension DS is given by

$$ D_S = \lim_{k \to \infty} \frac{\log N_k}{\log (1/r_k)} = \frac{\log N}{\log (1/r)} \qquad (7) $$
where the sign ~ reads "is proportional to," k is the covering count, r is the reduction rate in the size of a vel between two successive coverings (r = r_k / r_{k-1} for k > 0; e.g., r = (1/4) / (1/2) = 1/2, as shown in Fig. 4), and N is the increase rate of the number of vels between two successive coverings (N = N_k / N_{k-1}; e.g., N = 9 / 3 = 3, as shown in Fig. 4). For the Sierpinski gasket shown in Fig. 4, having complexity between a line and a plane, the self-similarity dimension can be computed from

$$ D_S = \lim_{k \to \infty} \frac{\log 3^k}{\log \left( 1 / (1/2)^k \right)} = \lim_{k \to \infty} \frac{k \log 3}{k \log 2} = \frac{\log 3}{\log 2} = 1.5850\ldots \qquad (8) $$
It is seen that the self-similarity fractal dimension of a pure mathematical fractal can be obtained from a single measurement at any scale for k > 0. For the Cantor set (see Kinsner, 1994a), having complexity between a point and a line, the self-similarity dimension is DS ≈ 0.6309.
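A quick numerical check of Eq. (7), added here for illustration, confirms the two values quoted in the text; the function name is an assumption of this sketch.

```python
# Self-similarity dimension of strictly self-similar generators:
# N pieces, each scaled down by r_ratio (Eq. 7).
import math

def d_s(n_pieces, r_ratio):
    return math.log(n_pieces) / math.log(1 / r_ratio)

print("Sierpinski gasket:", d_s(3, 1 / 2))  # 3 copies at half size   -> 1.5849...
print("Cantor set:       ", d_s(2, 1 / 3))  # 2 copies at third size  -> 0.6309...
```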
Figure 4. Construction of the Sierpinski triangular gasket
2.3 Hausdorff Dimension, DH

In 1919, Felix Hausdorff proposed an iterative multiple-scale subdivision procedure to define a measure of any irregular object. Since this multiscale measure is fundamental to a group of morphological fractal dimensions, these fractal dimensions bear his name. The concept of successive coverings of a given fractal object is very simple:

(i) COVER the object by using the concept of a neighbourhood, which could be a small region of any shape (often called the Borel ball, or a volume element, vel for short), centered on a point either on or in the vicinity of the fractal. If the fractal is embedded in a specific Euclidean dimension (e.g., the Koch curve is embedded in DE = 2), use a neighbourhood of the same embedding dimension and size r to cover the object. Notice that r must be smaller than the fractal object itself; otherwise, Nk reaches a saturation level at 1.
(ii) For a given size rk, COUNT the number Nk of vels required to cover the object.
(iii) REDUCE the size of the vel, and REPEAT Step (ii) UNTIL no further detail is seen (i.e., another saturation level is reached).

If we have any two successive measurements for k-1 and k, then the Hausdorff fractal dimension can be computed from

$$ D_{Hk} = \frac{\log \left( N_k / N_{k-1} \right)}{\log \left( r_{k-1} / r_k \right)} \qquad (9) $$

In the limit, the expression becomes

$$ D_H = \lim_{k \to \infty} \frac{\log N_k}{\log (1/r_k)} \qquad (10) $$
As it should, the expression resembles both the length dimension and the self-similarity dimension, but it now relates to more general irregular fractals. It is important to notice that while the fractal dimension of a strict monofractal can be computed from a single scale, an irregular or stochastic fractal must be computed from at least three scales rk. Each scale produces just a single point in the log-log plot of Nk vs. rk, and a single point must never be taken as DH, because it is a function of the scale rk (a common mistake made by novices in this area). Instead, a linear regression must be performed on the points (excluding any saturation points at the extreme values of the scale) to obtain the slope m; the dimension is then DH = m. This procedure applies to all the successive multiscale calculations.

Another important problem is related to the concept of covering. Since there is no single definition of the covering, several distinct implementation techniques have evolved, three of which are shown in Fig. 5 (Kinsner, 1994a). Figure 5a shows the minimum number of regular vels of radius r required to cover a fractal F completely, while Fig. 5b shows the opposite extremum: the maximum number of non-overlapping vels that can be used to cover F. Figure 5c shows adjacent vels forming a mesh, in which case the dimension is often called the box-counting dimension. Notice that the first two techniques use r as the radius of the vels, while the latter technique uses r as their diameter. Although the numbers of intersecting vels are different for each covering type, the three Hausdorff dimensions are closely related, as long as the same technique is used throughout the experiment. This technique applies to objects in any embedding dimension.
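The mesh (box-counting) variant of this procedure can be sketched in a few lines; the chaos-game test set, the box sizes, and the function names below are illustrative assumptions, and the saturation-point trimming discussed above is left to the caller.

```python
# Box-counting sketch: cover a point set with adjacent boxes of side r
# (Fig. 5c), count the occupied boxes N(r) at several scales, and
# regress log N(r) on log(1/r); the slope estimates D_H (Eq. 10).
import math
import random

def box_count(points, r):
    """Number of mesh boxes of side r intersected by the point set."""
    return len({(math.floor(x / r), math.floor(y / r)) for x, y in points})

def box_counting_dimension(points, sizes):
    xs = [math.log(1 / r) for r in sizes]
    ys = [math.log(box_count(points, r)) for r in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
           sum((a - mx) ** 2 for a in xs)

# illustrative test set: Sierpinski gasket points via the chaos game
random.seed(1)
verts = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.866)]
x, y, pts = 0.3, 0.3, []
for _ in range(100_000):
    vx, vy = random.choice(verts)
    x, y = (x + vx) / 2, (y + vy) / 2
    pts.append((x, y))
print(box_counting_dimension(pts, [1/8, 1/16, 1/32, 1/64]))  # ~ 1.58
```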
Figure 5. Examples of covering schemes for the Hausdorff dimension: (a) Minimum overlapping vels. (b) Maximum non-overlapping vels. (c) Adjacent mesh for the mesh-counting dimension

2.4 Minkowski-Bouligand Dimension, DMB

This dimension is touted as the fundamental approach by some authors (e.g., Tricot, 1995, Sec. 2.5). It is based on orders of growth and scale of functions at 0 (i.e., when r → 0) to measure the degree of space filling of a curve. One embodiment of this approach is the Minkowski sausage procedure, in which the centre of a small disk with radius rk is allowed to follow the curve to measure the Minkowski content (i.e., the area Ak of the resulting Minkowski sausage, as shown in Fig. 6). A measure of the space filling of the curve can be calculated by dividing the area by the diameter of the disk

$$ A_{mk} = \frac{A_k}{2 r_k} \qquad (11) $$
and a rate of this change can be established by reducing the size of the disk. If this follows the power-law relationship, the Minkowski-Bouligand dimension (also known as the Cantor-Minkowski-Bouligand dimension) is given by
Figure 6. “Minkowski sausage” covering
$$ D_{MB} = \lim_{r \to 0} \frac{\log A_{mk}}{\log (1/r_k)} + 2 \qquad (12) $$
Notice that for a smooth curve, $A_{mk} \sim r_k$ and $D_{MB} = \frac{\log r_k}{-\log r_k} + 2 = -1 + 2 = 1$. It can be shown that for strictly self-similar fractals, DH = DMB, but for natural fractals, DH < DMB. Based on the concept of dilation, we have also tried a variation on this scheme to determine the rate of space-filling of electrical discharges in dielectrics, as they are related to diffusion-limited aggregates (DLAs), and obtained very-high-accuracy estimates of the fractal dilation dimension (Stacey, 1994).
2.5 Mass Dimension, DM

The mass fractal dimension is another embodiment of the Minkowski sausage approach. It is often used in measuring the complexity of natural fractals such as the Lichtenberg figure (Fig. 7), dielectric discharges, and fractal growth phenomena in general. An estimate of the "mass" Ak contained in the fractal is obtained by measuring the area of the fractal branches contained within a corresponding circle of radius rk centered at the seed of the fractal (i.e., the point where the branches merge). This measurement is repeated for different radii, and the mass fractal dimension is computed from

$$ D_{Mk} = \frac{\log \left( A_k / A_{k-1} \right)}{\log \left( r_k / r_{k-1} \right)} \qquad (13) $$

or

$$ D_M = \lim_{r \to 0} \frac{\log A_r}{\log r} \qquad (14) $$
in which case the mass dimension is calculated from the slope of the corresponding log-log plot. Clearly, the plot will saturate when the radius exceeds the size of the fractal, and when the radius becomes smaller than the lattice on which the fractal is observed or simulated. The mass dimension is DM ≈ 1.7 to 1.9 for the Lichtenberg figure,
Figure 7. Successive coverings of a growth fractal for mass dimension
Figure 8. Comparison of coverings in mass dimension (rk) and gyration dimension (RG)
DM ≈ 1.6 for both the diffusion-limited aggregates (DLAs) in two-dimensional (2D) embedding space and for the natural down feather, DM ≈ 2.4 for DLAs in 3D, and DM ≈ 2.73 for the Sierpinski sponge in 3D.
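A hedged sketch of the measurement behind Eqs. (13)-(14) follows; the toy lattice aggregate and helper names are assumptions of this illustration (a real DLA cluster would be used in practice), and the saturation at the extreme radii noted above is left untreated.

```python
# Mass-dimension sketch: count occupied lattice sites ("mass") inside
# circles of growing radius centered at the seed, then regress
# log A(r) on log r; the slope estimates D_M (Eq. 14).
import math

def mass_dimension(sites, seed, radii):
    sx, sy = seed
    xs, ys = [], []
    for r in radii:
        a = sum(1 for (x, y) in sites if (x - sx) ** 2 + (y - sy) ** 2 <= r * r)
        xs.append(math.log(r))
        ys.append(math.log(a))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((p - mx) * (q - my) for p, q in zip(xs, ys)) / \
           sum((p - mx) ** 2 for p in xs)

# sanity check: a filled disk is not fractal, so D_M should be ~ 2
disk = {(i, j) for i in range(-60, 61) for j in range(-60, 61)
        if i * i + j * j <= 3600}
print(mass_dimension(disk, (0, 0), [5, 10, 20, 40]))  # ~ 2.0
```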
2.6 Gyration Dimension, DG

Similarly to the mass fractal dimension, the gyration fractal dimension is ideally suited for fractal growth phenomena with asymmetry. Furthermore, we also consider the gyration dimension a major extension of the mass dimension. Although both seem to be related, the differences are significant in that the gyration dimension uses a statistical measure of the spread of the fractal during its growth, rather than a priori selected circles centered at its seed. The two techniques are compared in Fig. 8. The radius of gyration RG is equal to the standard deviation of the spread of the fractal, and has its origin at the center of mass (alias centre of gravity, or centroid) of the fractal. It is seen from Fig. 8 that a circle with radius RG covers an asymmetrical fractal much better than the concentric circles coincident with the seed, as used in the mass dimension. For a given number Nk of discharged sites at stage k in a dielectric discharge simulation, the radius of gyration is defined as (Vicsek, 1992, p. 84)

$$ R_{Gk}(N_k) = \left[ \frac{1}{N_k} \sum_{j=1}^{N_k} r_j^2 \right]^{1/2} \qquad (15) $$
where rj is the distance of the jth bond from the centroid of the fractal. The location of the centroid µ is defined as the arithmetic mean of all the discharged sites along the x and y directions

$$ \mu_{ck}(N_k) \equiv \left( x_{ck}, y_{ck} \right) \qquad (16) $$

where

$$ x_{ck}(N_k) = \frac{1}{N_k} \sum_{j=1}^{N_k} x_j \quad \text{and} \quad y_{ck}(N_k) = \frac{1}{N_k} \sum_{j=1}^{N_k} y_j \qquad (17) $$
DG = lim
k→∞
log N k log RGk
(18)
As before, DG can be obtained from the slope of a log-log plot. Notice that Eq. (15) does not reveal its relation to variance. However, it can be rewritten into the following more convenient and practical form (Kinsner, 1994a) R Gk (N k ) =
1 Nk
=
2 x
2 y
Nk
Σ x 2j − N1k j=1
Nk
+
+
Nk
2
Nk
Σ
j=1
Σ y 2j − N1k jΣ= 1 y j j=1
xj
2
1/2
(19)
Since the four sums in Eq. (19) can now be computed during the simulation of the random fractal, the radius of gyration can be computed at any desired value of k. We term this type of computation as being real time as opposed to batch processing if the averages must be computed first. Furthermore, this mode of computation is applicable not only to birth processes in which the number of elements always expands (e.g., DLAs), but also to birth-death processes in which the number of occupied sites can both grow and diminish (e.g., cellular automata). An incremental real-time procedure for computing the gyration dimension is described in (Kinsner, 1994a). Another important observation is that the gyration dimension is the only morphological dimension that lends itself to a modification to include a nonuniform probability distribution within the fractal. The modified expression for the radius of gyration is the square root of the weighted average R Gk (N k ) =
2 r
=
1 Nk
Nk
Σ p(r j) j=1
rj −
2
(20)
Observe that Eq. (20) no longer relates to a morphological dimension (uniform distribution); it now relates to an entropy-based dimension as discussed next.
3. Entropy-Based Fractal Dimensions Entropy-based fractal dimensions differ significantly from the morphological dimensions discussed in the previous section in that they can deal with nonuniform distributions in the fractals, while the morphological dimensions show the shape of a projection of the fractal only. This is understandable because the morphological dimensions
314
A Unified Approach to Fractal Dimensions
are purely metric (and not probabilistic or possibilistic) concepts. Since this distinction has not been appreciated uniformly in the literature, one should be aware of possible fundamental errors in the results and conclusions there.
3.1 IDI The simplest entropy-based fractal dimension is related to the first-order Shannon entropy. Let us consider an arbitrary fractal that is covered by Nk vels, each with a diameter rk, at the kth covering (which is a setting similar to that used to determine the Hausdorff dimension, DH ). Recall that DH was estimated from the number of vels intersected by the fractal, regardless of the density of the fractal in each vel. In contrast, the estimation of the information dimension, DI, considers the density of the fractal, as determined from the relative frequency of occurrence of the fractal in each intersecting vel. If njk is the frequency with which the fractal enters (intersects) the jth vel of size rk in the kth covering, then its ratio to the total number NTk of intersects of the fractal with all the vels is an estimate of the probability pjk of the fractal within the jth vel, and is given by n jk N Tk
(21)
Σ n jk j=1
(22)
p jk def lim
k→∞
where Nk
N Tk def
Notice that this total number NTk must be recalculated for each kth covering because, in general, it can change substantially on dense fractals. With this probability distribution at the kth covering, the average (expected) self-information (i.e., Ijk = log (1 / pjk) of the fractal contained in the Nk vels can be expressed by the Shannon entropy H1k as given by Nk
H 1k def
Σ p jk I jk = j=1
Nk
− Σ p jk log p jk j=1
(23)
Notice that the subscript 1 in H denotes that the Shannon entropy is of the first order which assumes independence between all the vels. If the following power-law relationship holds c H 1k ~ r1 k
DI
(24)
where c is a constant, then the information fractal dimension is D I = lim
k→∞
H 1k log 1/r k
(25a)
H 1r log 1/r
(25b)
or D I = lim
r→0
As before, DI can be obtained from the slope m of a log-log plot of Shannon’s entropy H1k vs precision (1/rk) as DI = m.
315
A Unified Approach to Fractal Dimensions
The difference between the self-similarity dimension and the information dimension can be illustrated by studying fractals and nonfractals such as an ensemble of a unit interval and an isolated point with equal probability distribution between the interval and the point. It can be shown (Kinsner, 1994a) that the self-similarity dimension masks out the point completely (DS = 1), while the information dimension preserves the presence of the point (DI = 1/2). This also applies to the Hausdorff dimension. In general, DI ≤ DS and DI ≤ DH, with the equality occurring only for fractals with uniform probability distributions. Such fractals are called monofractals.
3.2 Correlation Dimension, DC The information dimension reveals the expected spread in the nonuniform probability distribution of the fractal, but not its correlation. The correlation fractal dimension was introduced to address this problem. Let us consider a setting identical to that required to define the information dimension, DI. If we assume the following powerlaw relationship −1
Nk
Σ
j=1
p jk2
DC
~ 1r
(26)
then the correlation dimension is −log D C = lim
k→∞
Nk
Σ p jk2 j=1
log 1/r k
(27a)
(27b)
or Nr
log D C = lim
k→ ∞
Σ p jk2 j=1
log r k
As before, DC can be obtained from the slope m of a log-log plot of the second-order entropy H 2 vs precision (1/rk) as DC = m. It is clear that the numerator is different from the Shannon first-order entropy in the information dimension. It can be shown that it has the meaning of a correlation between pairs of neighboring points on the fractal F (Kinsner, 1994a). This correlation can be expressed in terms of a density-density correlation (or pair correlation) function. It is also known as the correlation sum, or correlation integral. This interpretation can lead to a very fast algorithm for computing the correlation dimension (Grassberger & Procaccia,1983; Kinsner, 1994a; Kinsner, 1994b). There are numerous examples in the literature of computing the correlation dimension for natural fractals, including DLAs, dielectric discharges, retinal vessels, damped pendulum, and the Hénon strange attractor (for a review, see Kinsner, 1994a). In general, DC ≤ DI ≤ DH, with the equality occurring for monofractals only.
3.3 Rényi Dimension Spectrum, Dq Since the correlation dimension is an extension of the Shannon-entropy-based dimension, could we gain anything by generalizing the concept further? Yes, we could see the entire spectrum of power-law relationships if we use the generalized higher-order entropy concept, as introduced by Alfred Rényi in 1955, and prior to him by M.P. Schutzenberger, and given by
316
A Unified Approach to Fractal Dimensions
Hq =
1 log 1−q
Nk
Σ p jkq j=1
0≤q≤∞
(28)
where q is called the moment order. Notice that although the Rényi entropy becomes singular for q = 1, it can be shown that it is the Shannon entropy (as in Eq. 23) (Kinsner, 1994a). Let us consider a setting identical to the previous two fractal dimensions. If the following power-law relationship holds for the expanded range of q (to cover all its negative values) Nk
Σ
j=1
p jkq
1 1−q
~ 1r
Dq
−∞ ≤ q ≤ ∞
(29)
then the Rényi fractal dimension spectrum is Nk
log
D q = lim
k→∞
Σ p jkq j=1
1 1 − q log 1/r k
(30a)
or Nk
D q = lim
k→∞
1 q−1
log
Σ p jkq j=1
log r k
(30b)
Once again, for a given order q, Dq can be obtained from the slope m of a log-log plot of the q-order entropy Hq vs the scale reduction rk, as Dq = m. It should be clear that the process must be repeated over the desired range of q, often −10 ≤ q ≤ 10, to contain the numerical errors that high powers produce on some computers. Also notice that special attention must be given to q = 1, at which Eq. (30) becomes singular.

It can be shown (Kinsner, 1994a) that for a fractal with a nonuniform probability distribution function, Dq decreases monotonically from D−∞ to D∞, resembling an inverted S curve, as shown in Fig. 9. For a pure self-similar monofractal, all the dimensions become equal to D0 = DH, as shown by the horizontal line in Fig. 9. One by one, we can show (Kinsner, 1994a) that for q = 0 the Rényi dimension is equivalent to the morphological Hausdorff dimension, D0 ≡ DH; for q = 1, it is equivalent to the information dimension, D1 ≡ DI; for q = 2, it is the correlation dimension, D2 ≡ DC; and for q = ±∞, Dq becomes what we call the Chebyshev dimension, computed from the maximum and minimum probabilities, respectively. The Chebyshev extreme dimensions provide the bounds of the Rényi spectrum, which are very useful in classification by neural networks. The other fractal dimensions discussed in this chapter are also equivalent to Rényi dimensions at some values of q, including noninteger values (e.g., the gyration and variance dimensions). Thus, since the Rényi fractal dimension spectrum covers all the known dimensions, it can be seen as the unifying framework.

Figure 9. A typical Rényi dimension spectrum for a multifractal

The significance of the Rényi dimension spectrum is that it is no longer a single-valued dimension, but a single-valued, monotonically decreasing function. Without any assumptions, it reveals the nature of the object either as a monofractal (a straight line) or as a mixture of fractals (a multifractal) whenever the function looks like an inverted S curve. This S-curve can be interpreted as a bounded signature of the fractal, in contrast to the Rényi entropy, which is unbounded. The boundedness of the signature is of particular importance to the classification of fractal objects. The inverted S-curve may or may not be antisymmetric about D0 (at q = 0); any asymmetry is an indicator of an asymmetrical (skewed) probability distribution. Whenever a fractal has the complexity of a multifractal strange attractor in chaos, with varying densities, a single-valued fractal dimension can no longer describe the fractal adequately, and the Rényi dimension spectrum should be used for its characterization. For example, the multifractal spectrum can be used to characterize the multifractal structure and dynamics of the majority of non-equilibrium heterogeneous stochastic phenomena in physics and chemistry (including DLAs and dielectric discharges, viscous fingering, solidification, and surface growth), as well as in biology, medicine, and CI (including the Lévy walks of ion pumps in a cell; electrical signals from muscular, cardiac, and brain activities; perception processes; and cognition dynamics). It can also be used in the study of percolation, cellular automata, the textures of images and surfaces of materials, non-stationary processes such as speech, and electrical signals in biological organisms. We have studied several such phenomena using the Rényi dimension spectrum, and found it to be extremely useful both as a detector of multifractality and as a source of input vectors for neural network classifiers.

As described in Sec. 3.2, the correlation dimension can be interpreted in terms of the pair-correlation function. One can also show that the Rényi dimension spectrum can be interpreted as a q-tuple correlation function for q > 0. Once again, this interpretation may lead to a fast algorithm for computing the Rényi dimension spectrum. The Rényi dimension may also be instrumental in revealing the spectrum of fractals contained in an object, as discussed next.
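The slope-fitting procedure described above is straightforward to prototype. The following Python sketch is our own illustration, not code from the chapter (the function and variable names, the grid sizes, and the test data are our assumptions): it estimates Dq for a two-dimensional point set by covering it with grids of shrinking vel size r, forming the q-order sums of Eq. (30), and regressing against log(1/r), with q = 1 handled through the Shannon-entropy limit.

```python
import numpy as np

def renyi_dimension_spectrum(points, q_values, sizes):
    """Estimate D_q of a 2-D point set by box counting at several vel sizes."""
    D = []
    for q in q_values:
        logs, logr = [], []
        for r in sizes:
            # Assign each point to a grid cell of side r and estimate p_jk.
            cells = np.floor(points / r).astype(int)
            _, counts = np.unique(cells, axis=0, return_counts=True)
            p = counts / counts.sum()
            if abs(q - 1.0) < 1e-9:            # Shannon limit of the Rényi entropy
                h = -np.sum(p * np.log(p))
            else:
                h = np.log(np.sum(p ** q)) / (1.0 - q)
            logs.append(h)
            logr.append(np.log(1.0 / r))
        # D_q is the slope m of H_q versus log(1/r), per Eq. (30).
        m, _ = np.polyfit(logr, logs, 1)
        D.append(m)
    return np.array(D)

# Example: a uniform random cloud should give D_q close to 2 for all q.
rng = np.random.default_rng(0)
pts = rng.random((20000, 2))
q = np.arange(-10, 11)
print(renyi_dimension_spectrum(pts, q, sizes=[1/8, 1/16, 1/32, 1/64]))
```

For the uniform cloud the estimated spectrum is nearly flat; an inverted S-shaped output over the same range of q would indicate multifractality, as discussed above.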
3.4 Mandelbrot Singularity Spectrum, Sq
We shall now show that the Rényi fractal dimension spectrum is closely related to the Mandelbrot singularity spectrum. If a fractal object has different local measures (such as a probabilistic weight) in its different regions, the distribution of the measures can be described by a multifractal distribution function f(α) (which we also call the multifractal singularity spectrum). As will be seen, we can interpret α as the strength of the local singularity of the measure, and thus call it the Hölder exponent or the crowding index. Since the idea of multifractality was first proposed by Mandelbrot (e.g., Mandelbrot, 1974), and later described by many others (e.g., Hentschel & Procaccia, 1983; Grassberger & Procaccia, 1983; Vicsek, 1992), we would like to call the multifractal dimension f(α) the Mandelbrot singularity spectrum, Sq.

Consider a recursive (multiplicative) process generating a non-uniform fractal (i.e., with rescaled regions of different sizes rj) with inhomogeneous measures (i.e., regions with different probabilities pj) at each of the rescaled regions. An example of such a process over a square of size L at its commencement (the second iteration, k = 2) is shown in Fig. 10. We have seen that for a uniform fractal with homogeneous measures, the distribution of probabilities p for a given vel of size r satisfies the single-valued power-law relation

$$p(r) \equiv \frac{1}{N_r} \sim r^{D_S} \qquad (31)$$
where DS is the self-similarity fractal dimension. However, for a non-uniform fractal with an inhomogeneous distribution, the local relationship is

$$p_j(r_j) \sim r_j^{\alpha_j} \qquad \text{for} \quad \alpha_j \in [-\infty, \infty] \qquad (32)$$

Figure 10. A recursive process producing a multifractal (after Vicsek, 1992, p. 50)

Figure 11. The Mandelbrot singularity spectrum
In addition, we can consider how many vels share the same αj. For example, for k = 2 in Fig. 10, there are two vels with p1, three vels with p2, and one vel with p3. In general, the number of vels with a specific α follows the power-law relation

$$N_\alpha(r) \sim \frac{1}{r^{f(\alpha)}} \qquad (33)$$
where f(α) is the Mandelbrot singularity spectrum, Sq. If we perform a set of measurements, a plot of f(α) can be constructed, as shown in Fig. 11. Notice that the maximum Sqmax is reached for the Hausdorff dimension DH ≡ D0, while the minimum Sqmin = 0 is reached for the extreme values of probability (pmax on the left of the maximum, and pmin on the right). So, Sq is not only bounded, but has finite support! From the application point of view of fractal object classification, this finite support makes this characterization more desirable than the Rényi spectrum Dq, which is also bounded but has infinite support. Also notice that the exponent α is analogous to energy, while f(α) is analogous to entropy as a function of energy, reminiscent of plots in thermodynamical systems (Stanley & Meakin, 1988). Since the Rényi dimension spectrum, Dq, contains all the information about the multiscale dimensional analysis of the fractal, it also contains the information about its singularity spectrum. Consequently, the Mandelbrot singularity spectrum, Sq, can be computed directly from the Rényi dimension Dq. The Hölder exponent can be
obtained by taking the derivative of (q − 1)Dq with respect to the moment order q

$$\alpha_q \overset{\text{def}}{=} \frac{d}{dq}\left[ (q-1)\, D_q \right] \qquad (34)$$

and f(α) ≡ Sq is obtained from

$$S_q \overset{\text{def}}{=} q\,\alpha_q - (q-1)\, D_q \qquad (35)$$
The Rényi and Mandelbrot spectra represent equivalent descriptions of multifractals, as they are Legendre transforms of each other (Halsey, Jensen, Kadanoff, Procaccia & Shraiman, 1986; Stanley & Meakin, 1988). However, since Eq. (34) involves numerical differentiation, it is often better to compute the Mandelbrot singularity spectrum directly, using wavelets and their modulus maxima (Faghfouri & Kinsner, 2005; Mallat, 1998).
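To make Eqs. (34) and (35) concrete, the following sketch is our own illustration (the helper name is ours, and central differences via np.gradient are only one reasonable choice for the derivative); it converts a sampled Rényi spectrum Dq into the Hölder exponents αq and the Mandelbrot spectrum Sq:

```python
import numpy as np

def mandelbrot_spectrum(q, Dq):
    """Legendre transform of the Rényi spectrum: alpha_q and S_q per Eqs. (34)-(35)."""
    tau = (q - 1.0) * Dq                 # mass exponent tau(q) = (q - 1) D_q
    alpha = np.gradient(tau, q)          # Eq. (34): alpha_q = d tau / dq (central differences)
    S = q * alpha - tau                  # Eq. (35): S_q = q alpha_q - (q - 1) D_q
    return alpha, S

# Sanity check: a monofractal with constant D_q collapses to the single
# point (alpha, S) = (D, D), i.e., Fig. 11 degenerates to its apex.
q = np.linspace(-10, 10, 201)
alpha, S = mandelbrot_spectrum(q, np.full_like(q, 1.5))
print(alpha[:3], S[:3])   # both are 1.5 everywhere
```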
4. Transform-Based Fractal Dimensions
We shall concentrate on three very practical dimensions, based on: (i) the power spectrum density, (ii) the multiscale variance, and (iii) the Lyapunov exponents.
4.1 Spectral Dimension, Dβ
A time series representing a chaotic or non-chaotic process can be transformed into its power spectrum density using spectral analysis techniques such as the Fourier transforms (including the short-term windowed fast Fourier transform, FFT, and the discrete cosine transform, DCT) or the time-scale transforms such as wavelets (Wornell, 1996). If the power spectrum has equally spaced harmonics, the underlying process is periodic or quasiperiodic (nonchaotic). On the other hand, if the power spectrum is broadband, with substantial power at low frequencies, it may originate from chaos, although a broadband power spectrum does not guarantee sensitivity to initial conditions, and therefore chaos. A broadband signal v(t) can be characterized by its energy spectrum E(f) = |V(f)|², its power spectrum P*(f) = |V(f)|²/T, or its power spectrum density given by

$$P(f) = \lim_{T \to \infty} \frac{1}{T} \left| V(f) \right|^2 \qquad (36)$$
where |V(f)| is the Fourier transform amplitude. The spectral density gives an estimate of the mean-square fluctuations of the signal at a frequency f. If we assume that the power spectrum density has the following power-law form

$$P(f) \sim \frac{1}{f^{\beta}} \qquad (37)$$
then we can use the exponent β to define the spectral fractal dimension as

$$D_\beta = D_E + \frac{3 - \beta}{2} \qquad (38)$$
where DE = 1 is the embedding Euclidean dimension for a time series. It is now customary to characterize such broadband signals as colored noise, according to the value of β: for β = 0, 1, 2, and 3, the noise is white, pink, brown, and black, respectively, as shown in Fig. 12. White noise is completely random, while the colored noises have more persistence (defined as the tendency of a process to continue in the direction upon which it has embarked). Black noise is often representative of natural and unnatural catastrophes such as floods and droughts. For a fractal time series, the exponent β may also be fractional. A single value of β indicates self-similarity or self-affinity of the noise everywhere. A complicated phenomenon may exhibit more than one β in its power spectrum density, thus facilitating the search for critical points in the process. A simple estimation procedure is sketched after the figure.

Figure 12. (a) White, pink, brown, and black noise defined by their power spectrum. (b) Example of a black noise
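As a worked illustration of Eqs. (37) and (38), the sketch below is our own code, not the chapter's (the names are ours, and fitting the whole spectrum is a simplifying choice; a robust estimator would fit only a band of frequencies). It estimates β from the slope of the log-log power spectrum density and converts it into the spectral dimension:

```python
import numpy as np

def spectral_dimension(v, fs=1.0):
    """Estimate beta from the PSD slope and return D_beta = D_E + (3 - beta)/2."""
    n = len(v)
    V = np.fft.rfft(v - np.mean(v))            # remove DC, one-sided spectrum
    psd = (np.abs(V) ** 2) / n
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    mask = f > 0                               # exclude the zero-frequency bin
    # P(f) ~ 1/f^beta means log P = -beta log f + const.
    slope, _ = np.polyfit(np.log(f[mask]), np.log(psd[mask]), 1)
    beta = -slope
    D_E = 1.0                                  # embedding dimension of a time series
    return D_E + (3.0 - beta) / 2.0            # Eq. (38)

# White noise has beta near 0, so Eq. (38) gives D_beta close to 2.5.
rng = np.random.default_rng(1)
print(spectral_dimension(rng.standard_normal(4096)))
```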
4.2 Variance Dimension, Dσ
As we have seen, a time series representing a chaotic or non-chaotic process can be characterized through the power spectrum exponent β. It can also be characterized directly, in real time, by analyzing the spread of the increments in the signal amplitude (the variance, σ²). Let us assume that the signal v(t) is discrete. If the variance of its amplitude increments is related to the time interval according to

$$\operatorname{Var}\!\left[ v(t_2) - v(t_1) \right] \sim \left| t_2 - t_1 \right|^{2H} \qquad (39)$$

or, for short,

$$\operatorname{Var}\!\left[ \Delta v \right]_{\Delta t} \sim (\Delta t)^{2H} \qquad (40)$$

then the Hurst exponent H can be calculated from a log-log plot using

$$H = \lim_{\Delta t \to 0} \frac{1}{2}\, \frac{\log \operatorname{Var}\!\left[ \Delta v \right]_{\Delta t}}{\log \Delta t} \qquad (41)$$
Finally, for the embedding Euclidean dimension DE, the variance dimension Dσ can be computed from

$$D_\sigma = D_E + 1 - H \qquad (42)$$
The technique of computing the variance dimension is so simple that it lends itself to real-time fractal analysis of a time series (Kinsner, 1994c). Although the spectral dimension may reveal the multifractal nature of the underlying process through estimates of β over different short-term (windowed) Fourier analyses, the choice of the window is difficult and may introduce artifacts. The variance dimension, on the other hand, does not require a window in the Fourier sense, and thus avoids the windowing problem. This technique can also be used to calculate a variance fractal dimension trajectory (VFDT) for a process that is piecewise stationary (Kinsner, 1994a). Such a trajectory can be used to analyze the temporal or spatial
multifractality in the study of dishabituation in behavior modification (e.g., Kinsner, Cheung, Cannons, Pear, & Martin, 2003).
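The simplicity claimed above is easy to verify. The following sketch is our own illustration (the lag set and the naming are our assumptions): it estimates the Hurst exponent from the variances of amplitude increments at several lags, per Eq. (41), and then applies Eq. (42):

```python
import numpy as np

def variance_dimension(v, lags=(1, 2, 4, 8, 16, 32)):
    """Estimate the Hurst exponent from increment variances, then D_sigma = D_E + 1 - H."""
    log_var, log_dt = [], []
    for dt in lags:
        inc = v[dt:] - v[:-dt]                 # amplitude increments over lag dt
        log_var.append(np.log(np.var(inc)))
        log_dt.append(np.log(dt))
    # Eq. (41): H is half the slope of log Var vs log dt.
    slope, _ = np.polyfit(log_dt, log_var, 1)
    H = 0.5 * slope
    D_E = 1.0
    return D_E + 1.0 - H                       # Eq. (42)

# Brownian motion (cumulative sum of white noise) has H close to 0.5,
# so its variance dimension should be near 1.5.
rng = np.random.default_rng(2)
print(variance_dimension(np.cumsum(rng.standard_normal(100000))))
```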
4.3 Lyapunov Dimension, DΛ
The Lyapunov dimension is another useful fractal dimension because it can be derived from the Lyapunov exponents which, in turn, can be calculated directly from the strange attractor or the orbit (phase trajectory) of a dynamical system, without explicit knowledge of the underlying nonlinear system of coupled differential equations (for flows) or recursive difference equations (for maps) (Kinsner, 2003). This is done by measuring quantitatively the stretching and contracting of neighbouring orbits as the dynamical system evolves. The spectrum of Lyapunov exponents can then be used to distinguish between nonchaotic, chaotic, and even hyperchaotic processes. If all the exponents are negative, the system converges to a point or a cyclic attractor. If one of the exponents is positive, the system is divergent; if, in addition, the orbit forms a strange attractor, the system is chaotic. If two or more exponents are positive, the system is hyperchaotic. The Lyapunov dimension is defined as (Kaplan & Yorke, 1979; Ott, 1993, p. 134)
$$D_\Lambda = K + \frac{1}{\left| \lambda_{K+1} \right|} \sum_{j=1}^{K} \lambda_j \qquad (43)$$
where λj denotes the jth Lyapunov exponent, arranged as a spectrum from the largest λ1 to the smallest λm for an m-dimensional system, and K is the largest integer index which keeps the sum of the exponents non-negative according to

$$\sum_{j=1}^{K} \lambda_j \ge 0 \qquad (44)$$
and λK+1 is the first negative exponent. The advantage of the Lyapunov dimension is that it characterizes the complexity of a strange attractor without much computational effort. However, since this scalar cannot possibly reveal the multifractality of a strange attractor, the Rényi or Mandelbrot spectrum should be used.
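The definition in Eqs. (43) and (44) translates directly into a few lines of code. This sketch is our own illustration (the sample exponents are approximate published values for the Lorenz attractor, used here only as a plausibility check):

```python
import numpy as np

def lyapunov_dimension(exponents):
    """Kaplan-Yorke (Lyapunov) dimension, Eq. (43), from a Lyapunov spectrum."""
    lam = np.sort(np.asarray(exponents, dtype=float))[::-1]  # descending spectrum
    cumsum = np.cumsum(lam)
    K_candidates = np.nonzero(cumsum >= 0)[0]
    if len(K_candidates) == 0:
        return 0.0                        # all partial sums negative: point attractor
    K = K_candidates[-1] + 1              # largest K with a non-negative sum, Eq. (44)
    if K == len(lam):
        return float(K)                   # no negative exponent left to divide by
    return K + cumsum[K - 1] / abs(lam[K])

# Approximate Lorenz-system exponents (0.906, 0, -14.572) give D close to 2.06.
print(lyapunov_dimension([0.906, 0.0, -14.572]))
```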
5. Concluding Remarks
The main objective of this chapter was to present a unified framework for fractal dimensions, in order to identify the conditions under which the different types of fractal dimensions can be either equal or unequal. We have seen that only pure self-similar monofractals have all the distinct fractal dimensions equal (within the computational accuracy used). On the other hand, the different types of fractal dimensions of multifractals cannot be equal. Since the Rényi fractal dimension spectrum includes all the dimensions, it is used as the foundation for the unified approach in this chapter. Like a singularity filter, the Rényi dimension spectrum sifts out each monofractal from the multifractal object. This spectrum includes not only the morphological fractal dimensions, but also the entropy-based and transform-based fractal dimensions.

Another objective of this chapter was to develop a taxonomy of fractal dimensions. There are at least three approaches to the classification of fractal dimensions: according to the information content of the fractal F under consideration, according to the method of computing the dimension, or according to the applicability of the dimension to specific processes and objects. We have taken the first approach for the taxonomy, and identified three classes of dimensions based on: (i) the morphology of an object, (ii) its entropy, and (iii) its transform.

The morphological dimensions are based on purely geometric concepts, and they emphasize the shape (morphology) of the object. This applies to objects whose distribution of a measure (such as probability) is uniform (i.e., the fractal is homogeneous), or for which information about the distribution is not available. It should be stressed that not all morphological dimensions produce the
same values for the same object. For example, the Hausdorff dimension of a dielectric discharge is different from its mass dimension and gyration dimension. We also established that the gyration dimension is a good candidate for a bridge between the morphological and entropy dimensions, in that it can be modified from its purely geometrical form to an information-based form.

The entropy-based dimensions take into account a probability measure or a correlation measure of F. All the entropy dimensions considered in this chapter are defined in terms of the relative frequency of visitation of a typical trajectory (in temporal fractals) or the distribution measure (in spatial fractals), so they use either information about the time behavior of a dynamical system or the measure describing the inhomogeneity of a spatial fractal. The use of q-tuple correlation functions is covered elsewhere (Kinsner, 1994b). Notice that the entropy dimensions include the morphological dimensions as special cases of the Rényi dimension spectrum. The Mandelbrot singularity spectrum is closely related to the Rényi dimension spectrum.

Finally, the transform fractal dimensions rely on moving the original fractal into a different domain in which its properties can be observed. One example is the spectral fractal dimension derived from the frequency domain. The variance fractal dimension is another, though less conventional, example of moving from the time domain to the variance domain. It should be stressed that the variance dimension is not the same as the variance of a process, or even the variance index, as used in the literature. There are many other fractal dimensions in this class that are not discussed in this chapter. Notice that all three classes of dimensions use the Hausdorff covering procedure to extract the dimensions.

Each class of fractal dimensions has its own range of applications. For example, the very popular box-counting dimension (called the Hausdorff mesh dimension in this chapter) has been applied to nearly everything that looks like a fractal. It produces reliable results when analyzing contours and n-dimensional projections of fractal objects, as long as they are single fractals. However, this morphological dimension cannot reveal the intricate structures of multifractal objects. Consequently, if such morphological dimensions are used to characterize the shape and texture of malignant cancerous cells, the results are unacceptable and lead to skepticism among pathologists. Multifractal objects such as malignant cells must be characterized with the entropy dimensions. But we must caution that the single-valued information dimension or correlation dimension cannot reveal the multifractal complexity either; instead, the Rényi dimension spectrum or the Mandelbrot singularity spectrum can show the spectrum of fractals contained in a multifractal object.

The transform dimensions have their applicability too. For example, the spectral dimension or the variance dimension of a temporal or spatial signal reveals the persistence (i.e., the likelihood that the present trend continues) or antipersistence (the likelihood that the present trend will reverse) of the corresponding fractal object. The variance dimension has certain advantages over the spectral dimension. The fractal dimensions presented in this chapter constitute a sample from an even larger family of dimensions. This sample was intended to show the relative merits of the dimensions, and how they relate to one another.
There are many other issues not discussed here. One of the major issues is the accuracy of computing the dimensions. The problem of saturation at the boundaries of natural fractals is discussed further in (Kinsner, 1994c). Another problem, that of representation (i.e., the number of points required to represent the fractal, and the vel sizes used to compute the dimensions), is still being investigated. Our study of fractal dimension spectra indicates that they could be very good candidates for the characterization of perceptual and cognitive processes (Kinsner & Dansereau, 2006), because they reveal long-term relations in those processes through fundamental multi-scale measurements. They are superior not only to any energy-based metrics, but also to entropy-based metrics (Kinsner, 2004).
Acknowledgment
This work was supported in part through a research grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada.
References
Barnsley, M. (1988). Fractals everywhere (p. 396). Boston, MA: Academic.
Edgar, G.A. (1990). Measure, topology, and fractal geometry (p. 230). New York, NY: Springer Verlag.
Falconer, K. (1990). Fractal geometry: Mathematical foundations and applications (p. 288). New York, NY: Wiley.
Faghfouri, A., & Kinsner, W. (2005, May 2-5). Local and global analysis of multifractal singularity spectrum through wavelets. In Proc. IEEE 2005 Can. Conf. Electrical & Computer Eng. (pp. 2157-2163). Saskatoon, SK.
Feder, J. (1988). Fractals (p. 238). New York, NY: Plenum.
Grassberger, P., & Procaccia, I. (1983, January 31). Characterization of strange attractors. Phys. Rev. Lett., 50(5), 346-349.
Halsey, T.C., Jensen, M.H., Kadanoff, L.P., Procaccia, I., & Shraiman, B. (1986, February). Fractal measures and their singularities: The characterization of strange sets. Phys. Rev., A33(2), 1141-1151.
Hentschel, H.G.E., & Procaccia, I. (1983). The infinite number of generalized dimensions of fractals and strange attractors. Physica, 8D, 435-444.
Hoggar, S.G. (1992). Mathematics for computer graphics (p. 472). Cambridge, UK: Cambridge University Press.
Kantz, H., & Schreiber, T. (1997). Nonlinear time series analysis (p. 304). Cambridge, UK: Cambridge Univ. Press.
Kaplan, J.L., & Yorke, J.A. (1979). Chaotic behavior of multidimensional difference equations. In Peitgen, H.-O., & Walther, H.O. (Eds.), Functional differential equations and approximations of fixed points (pp. 204-227, 503). New York, NY: Springer Verlag.
Kinsner, W., Cheung, V., Cannons, K., Pear, J., & Martin, T. (2003, August 18-20). Signal classification through multifractal analysis and complex domain neural networks. In Proc. IEEE 2003 Intern. Conf. Cognitive Informatics, ICCI03 (pp. 41-46). London, UK. ISBN: 0-7803-1986-5.
Kinsner, W. (1994a, May). A unified approach to fractal and multifractal dimensions. Technical Report DEL94-4 (p. 147). Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, Manitoba, Canada (abbreviated to UofM in the references below).
Kinsner, W. (1994b, June 7). Entropy-based fractal dimensions: Probability and pair-correlation algorithms for E-dimensional images and strange attractors. Technical Report DEL94-5 (p. 44). UofM.
Kinsner, W. (1994c, June 15). Batch and real-time computation of a fractal dimension based on variance of a time series. Technical Report DEL94-6 (p. 22). UofM.
Kinsner, W. (1994d, June 20). The Hausdorff-Besicovitch dimension formulation for fractals and multifractals. Technical Report DEL94-7 (p. 12). UofM.
Kinsner, W. (1995, January). Self-similarity: The foundation for fractals and chaos. Technical Report DEL95-2 (p. 113). UofM.
Kinsner, W. (2002, August 19-20). Compression and its metrics for multimedia. In Proc. IEEE 2002 Intern. Conf. Cognitive Informatics, ICCI02 (pp. 107-121). Calgary, AB. ISBN: 0-7695-1724-2.
Kinsner, W. (2003, August 18-20). Characterizing chaos through Lyapunov metrics. In Proc. IEEE 2003 Intern. Conf. Cognitive Informatics, ICCI03 (pp. 189-201). London, UK. ISBN: 0-7803-1986-5.
Kinsner, W. (2004, August 16-18). Is entropy suitable to characterize data and signals for cognitive informatics? In Proc. IEEE 2004 Intern. Conf. Cognitive Informatics, ICCI04 (pp. 6-21). Victoria, BC. ISBN: 0-7695-2190-8.
Kinsner, W., & Dansereau, R. (2006, July 17-19). A relative fractal dimension spectrum as a complexity measure. In Proceedings of the 5th IEEE International Conference on Cognitive Informatics. Beijing, China. ISBN: 1-4244-0475-4.
Kinsner, W., Potter, M., & Faghfouri, A. (2005, June 16-18). Signal processing for autonomic computing. In Rec. Can. Applied & Industrial Mathematical Sciences, CAIMS05. Winnipeg, MB.
Mallat, S. (1998). A wavelet tour of signal processing (p. 577). San Diego, CA: Academic.
Mandelbrot, B.B. (1974). Intermittent turbulence in self-similar cascades: Divergence of higher moments and dimension of the carrier. J. Fluid Mech., 62(2), 331-358.
Mandelbrot, B.B. (1982). The fractal geometry of nature (p. 468). New York, NY: W.H. Freeman.
Ott, E. (1993). Chaos in dynamical systems (p. 385). Cambridge, UK: Cambridge University Press.
Peitgen, H.-O., Jürgens, H., & Saupe, D. (1992). Chaos and fractals: New frontiers of science (p. 984). New York, NY: Springer-Verlag.
Sprott, J.C. (2003). Chaos and time-series analysis (p. 507). Oxford, UK: Oxford University Press.
Stacey, G. (1994, November). Stochastic fractal modelling of dielectric discharges (p. 308). Master's Thesis. Winnipeg, MB: University of Manitoba.
Stanley, H.E., & Meakin, P. (1988, September 29). Multifractal phenomena in physics and chemistry. Nature, 335, 405-409.
Tricot, C. (1995). Curves and fractal dimension (p. 323). New York, NY: Springer-Verlag.
Vicsek, T. (1992). Fractal growth phenomena (2nd ed.) (p. 488). Singapore: World Scientific.
Wang, Y. (2002, August 19-20). On cognitive informatics. In Proc. 1st IEEE Intern. Conf. Cognitive Informatics (pp. 34-42). Calgary, AB.
Wang, Y. (2007). Toward theoretical foundations of autonomic computing. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(3), 1-16. USA: IGI Publishing.
Wang, Y., & Kinsner, W. (2006, March). Recent advances in cognitive informatics. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 121-123.
Wornell, G.W. (1996). Signal processing with fractals: A wavelet-based approach (p. 177). Upper Saddle River, NJ: Prentice-Hall.
Section V
Relevant Development
Chapter XXII
Cognitive Informatics: Four Years in Practice: A Report on IEEE ICCI’05
Du Zhang, California State University, USA
Witold Kinsner, University of Manitoba, Canada
Jeffrey Tsai, University of Illinois at Chicago, USA
Yingxu Wang, University of Calgary, Canada
Philip Sheu, University of California, USA
Taehyung Wang, California State University, USA
The 2005 IEEE International Conference on Cognitive Informatics (ICCI’05) was held during August 8-10, 2005, on the campus of the University of California, Irvine. This was the fourth conference in the ICCI series [Kinsner et al. 05]. The previous conferences were held at Calgary, Canada (ICCI’02) [Wang et al. 02], London, UK (ICCI’03) [Patel et al. 03], and Victoria, Canada (ICCI’04) [Chan et al. 04], respectively. ICCI’05 was organized by General Co-Chairs Jeffrey Tsai (University of Illinois) and Yingxu Wang (University of Calgary), Program Co-Chairs Du Zhang (California State University) and Witold Kinsner (University of Manitoba), and Organization Co-Chairs Philip Sheu (University of California), Taehyung Wang (California State University, Northridge), and Shangping Ren (Illinois Institute of Technology). Cognitive informatics (CI) is a cutting-edge and multidisciplinary research area that tackles the fundamental problems shared by modern informatics, computation, software engineering, AI, cybernetics, cognitive science,
neuropsychology, medical science, systems science, philosophy, linguistics, economics, management science, and life sciences [Wang02]. CI is defined as a transdisciplinary enquiry of cognitive and information sciences that investigates the internal information processing mechanisms and processes of the brain and natural intelligence, and their engineering applications via an interdisciplinary approach [Wang03]. CI is the transdisciplinary study of the internal information processing mechanisms and processes of the natural intelligence – human brains and minds – and their engineering applications.

Since its inception in 2002 [Wang et al. 02], ICCI has been growing steadily in both size and scope. It attracts researchers from academia and government agencies, as well as industry practitioners, from many countries. The conference provides a main forum for the exchange and cross-fertilization of ideas in the new research areas of CI. To facilitate a more concerted effort toward a particular focus, each conference has had its own theme for the paper solicitation and final program organization. For ICCI’05 [Kinsner et al. 05], the theme was natural intelligence and autonomous computing.

ICCI’05 had three insightful keynote speeches. The first, entitled “Cognitive Computation: The Ersatz Brain Project” and presented by James A. Anderson of Brown University, offered some exciting glimpses of an ambitious project to build a brain-like computing system. The talk focused on the progress made in three areas: preliminary hardware design, programming techniques, and software applications. The proposed hardware architecture, based on ideas from the mammalian neocortex, is a massively parallel, two-dimensional, locally connected array of CPUs and their associated memory. What makes this design feasible is an approximation to cortical function called the Network of Networks, which holds that the basic computing unit in the cortex is not a single neuron but small groups of neurons working together to form attractor networks. Thus, each network-of-networks module corresponds to a single CPU in the hardware design. A system with approximately the power of a human cerebral cortex would require about a million CPUs and a terabyte of data specifying the connection strengths under the network-of-networks approach. To develop “cognitive” software for such a brain-like computing system, new programming techniques are needed, such as topographic data representation, lateral data movement, and the use of interconnected modules for computation. The software applications involve language, cognitive data analysis, visual information processing, decision making, and knowledge management.

The second keynote speech at ICCI’05 was given by Yingxu Wang of the University of Calgary on “Psychological Experiments on the Cognitive Complexities of Fundamental Control Structures of Software Systems.” To tackle the fundamental issue of the cognitive complexity of software systems, a set of concepts was introduced. There are ten basic control structures (BCS), each of which has a cognitive weight that describes the extent of difficulty, or the time and effort required, in comprehending the functionality and semantics of a given structure in programs. Through cognitive psychological experiments, the weights were calibrated. From the cognitive complexities of BCSs, the cognitive functional size (CFS) of a software system can be established, modeled as a product of its architectural and operational complexities.
Results of case studies indicated that CFS is the most sensitive measure for representing the real complexity of a software system.

The third keynote speech at ICCI’05, by Witold Kinsner of the University of Manitoba, summarized the recent developments in CI. In his talk, “Some Advances in Cognitive Informatics,” Kinsner took a closer look at some recent advances in signal processing for autonomic computing and its metrics. Autonomic computing is geared toward mitigating the escalating complexity of a software system, in both its features and interfaces, by making the system self-configuring, self-optimizing, self-organizing, self-healing, self-protecting, and self-communicating. Because signal processing is used in nearly all fields of human endeavor, ranging from signal detection, fault diagnosis, advanced control, audio and image processing, communications engineering, and intelligent sensor systems to business, it will play a pivotal role in developing autonomic computing systems. Classical statistical signal processing is nowadays augmented by intelligent signal processing, which utilizes supervised and unsupervised learning through adaptive neural networks, wavelets, fuzzy rule-based computation, rough sets, genetic algorithms, and blind signal estimation. Quality metrics are needed to measure the quality of various multimedia materials in perception, cognition, and evolutionary learning processes, to gauge self-awareness in autonomic systems, and to assess symbiotic cooperation in evolutionary systems. Instead of energy-based metrics, the multiscale metrics based on fractal dimensions are found to be best suited for perception.

The technical program of ICCI’05 [Kinsner et al. 05] included 42 papers from researchers and industrial practitioners in this growing field. The accepted papers covered a wide spectrum of topics in CI: from topics on
processes of the natural intelligence (brain organization, cognitive mechanisms and processes, memory and learning, thinking and reasoning, cognitive linguistics, and neuropsychology), to topics on the internal information processing mechanisms (information model of the brain, knowledge representation and engineering, machine learning, neural networks and neural computation, pattern recognition, and fuzzy logic), and to topics on the engineering applications (autonomic computing, informatics foundations of software engineering, software agent systems, quantum information processing, bioinformatics, web-based information systems, and agent technologies). During the conference, presentations were arranged into the following nine sessions: (1) Cognitive informatics; (2) Information and signal theories; (3) Intelligent systems; (4) Applications of cognitive informatics; (5) Human factors in engineering; (6) Cognitive learning; (7) Knowledge and concept modeling; (8) Intelligent decision making; and (9) Cognitive software engineering.

The past four years have witnessed some exciting results in the trenches of CI. The research interest in this niche area from all over the world is growing, and the body of work produced thus far is taking shape in both quality and quantity. ICCI’06 will be held during July 17-19, 2006 in Beijing, China, with its theme on natural intelligence, autonomic computing, and neuroinformatics [Yao et al. 06], while ICCI’07 and ICCI’08 are slated for Sydney, Australia in 2007, and Madrid, Spain in 2008, respectively.

The ICCI’05 program and proceedings are the result of the great effort and contributions of many people. We would like to thank all authors who submitted interesting papers to ICCI’05. We acknowledge the professional work of the Program Committee and external reviewers in effectively reviewing and improving the quality of submitted papers. Our acknowledgement also goes to the invaluable sponsorships of the IEEE Computer Society, UCI, UIC, Univ. of Calgary, Univ. of Manitoba, IEEE Canada, and IEEE CS Press. We acknowledge the organizing committee, the ICCI’05 secretariat, and the student volunteers who helped to make the event a success.
References
Chan, C., Kinsner, W., Wang, Y., & Miller, D.M. (Eds.) (2004, August). Cognitive informatics. Proceedings of the 3rd IEEE International Conference (ICCI’04). Victoria, Canada: IEEE CS Press.
Kinsner, W., Zhang, D., Wang, Y., & Tsai, J. (Eds.) (2005). Cognitive informatics. Proceedings of the 4th IEEE International Conference (ICCI’05). Irvine, CA: IEEE CS Press.
Patel, D., Patel, S., & Wang, Y. (Eds.) (2003, August). Cognitive informatics. Proceedings of the 2nd IEEE International Conference (ICCI’03). London, UK: IEEE CS Press.
Wang, Y. (2002, August). On cognitive informatics. Keynote speech, Proc. 1st IEEE International Conference on Cognitive Informatics (ICCI’02) (pp. 34-42). Calgary, Canada: IEEE CS Press.
Wang, Y. (2003). Cognitive informatics: A new transdisciplinary research field. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 115-127.
Wang, Y., Johnston, R., & Smith, M. (Eds.) (2002, August). Cognitive informatics. Proceedings of the 1st IEEE International Conference (ICCI’02). Calgary, AB: IEEE CS Press.
Yao, Y., Shi, Z., Wang, Y., & Kinsner, W. (Eds.) (2006, July). Cognitive informatics. Proceedings of the 5th IEEE International Conference (ICCI’06). Beijing, China: IEEE CS Press.
Chapter XXIII
Toward Cognitive Informatics and Cognitive Computers: A Report on IEEE ICCI’06
Yiyu Yao, University of Regina, Canada
Zhongzhi Shi, Chinese Academy of Sciences, China
Yingxu Wang, University of Calgary, Canada
Witold Kinsner, University of Manitoba, Canada
Yixin Zhong, Beijing University of Posts and Telecommunications, China
Guoyin Wang, Chongqing University of Posts and Telecommunications, China
Zeng-Guang Hou, Chinese Academy of Sciences, China
Cognitive informatics (CI) is a cutting-edge and multidisciplinary research area that tackles the fundamental problems shared by modern informatics, computation, software engineering, AI, cybernetics, cognitive science, neuropsychology, medical science, systems science, philosophy, linguistics, economics, management science, and life sciences [Wang, 2002]. CI can be viewed as a transdisciplinary enquiry of cognitive and information sciences that investigates the internal information processing mechanisms and processes of the brain and natural intelligence, and their engineering applications [Wang, 2003, 2007a; Wang and Kinsner, 2006]. It is a transdisciplinary study of the internal information processing mechanisms and processes of the natural intelligence – human brains and minds – and their engineering applications.
The IEEE International Conference on Cognitive Informatics (ICCI) series was established in 2002 [Wang et al., 2002; Patel et al., 2003; Chan et al., 2004; Kinsner et al., 2005; Yao et al., 2006]. The conference provides the main forum for the exchange and cross-fertilization of ideas in CI. ICCI’06, the fifth conference of the series, was held at the Institute of Automation, Chinese Academy of Sciences, Beijing, China, during July 17-19, 2006. ICCI’06 was organized by conference Co-Chairs Yingxu Wang (University of Calgary), Yixin Zhong (Beijing University of Posts and Telecommunications), and Witold Kinsner (University of Manitoba), and Program Co-Chairs Zhongzhi Shi (Chinese Academy of Sciences) and Yiyu Yao (University of Regina), with the valuable support of Organization Co-Chairs Yuyu Yuan (Beijing University of Posts and Telecommunications), Guoyin Wang (Chongqing University of Posts and Telecommunications), and Zeng-Guang Hou (Chinese Academy of Sciences, China). The program committee of ICCI’06 consisted of over 50 experts in various areas of CI from around the world.

The theme of ICCI’06 was natural intelligence, autonomic computing, and neural informatics. The objectives of ICCI’06 were to draw the attention of researchers, practitioners, and graduate students to the investigation of cognitive mechanisms and processes of human information processing, and to stimulate the international effort on cognitive informatics research and engineering applications. The ICCI’06 program encompassed 40 regular papers and 55 short papers selected from 276 submissions from 18 countries, based on rigorous reviews by program committee members and external reviewers. Two-volume proceedings have been published by IEEE CS Press [Yao et al., 2006]. During the conference, presentations were arranged into the following 18 sessions:

1. Cognitive Models
2. Pattern and Emotion Recognition
3. Computational Intelligence
4. CI Foundations of Software Engineering
5. Autonomic Agents
6. Biosignal Processing
7. Cognitive Complexity of Software
8. Knowledge Manipulation
9. Rough Sets and Problem Solving
10. Descriptive Mathematics for CI
11. Visual Information Processing
12. Knowledge Representation
13. Cognitive Data Mining
14. Neural Networks
15. Pattern Classification
16. Machine Learning
17. Intelligent Algorithms
18. Intelligent Decision-Making
The ICCI’06 program covered a wide spectrum of topics that contribute to cognitive informatics and cognitive computers. Researchers exchanged ideas on processes of the natural intelligence (i.e., brain organization, cognitive mechanisms and processes, memory and learning, thinking and reasoning, cognitive linguistics, and neuropsychology), internal information processing mechanisms (i.e., the cognitive informatics model of the brain, the OAR model, knowledge representation and engineering, machine learning, neural networks and neural computation, pattern recognition, and fuzzy logic), and engineering applications of CI (i.e., autonomic computing, informatics foundations of software engineering, software agent systems, quantum information processing, bioinformatics, web-based information systems, and agent technologies). ICCI’06 brought together a group of over 100 researchers and graduate students to exchange the latest research results and to explore new ideas in CI. Through stimulating discussion and a panel session on the future of cognitive informatics, the participants were excited about the current advances, the future trends, and the expected development of CI.
The ICCI’06 program was enriched by three keynotes and three special lectures. Jean-Claude Latombe, Professor at Stanford University, presented the keynote speech entitled “Probabilistic Roadmaps: A Motion Planning Approach Based on Active Learning” [Latombe, 2006]. This talk focused on the motion planning of autonomous robots. A new motion-planning approach – Probabilistic RoadMap (PRM) planning – was presented. PRM planning trades the prohibitive cost of computing the exact shape of a feasible space against the cost of learning its connectivity from dynamically chosen examples (the sampled configurations). PRM planning, a widely popular approach in robotics, is extremely successful in solving apparently very complex motion planning problems with various feasibility constraints. The talk touched on the foundations of PRM planning and explained why it is so successful. The PRM success reveals key properties satisfied by feasible spaces encountered in practice, and a better understanding of these properties is already making it possible to design faster PRM planners capable of solving increasingly complex problems.

In his keynote speech on “Cognitive Informatics: Towards Future Generation Computers that Think and Feel,” Yingxu Wang, Professor at the University of Calgary, presented a set of the latest advances in CI that may lead to the design and implementation of cognitive computers capable of thinking and feeling [Wang, 2006a]. He pointed out that CI provides the theory and philosophy for the next generation of computers and computing paradigms. In particular, recent advances in CI were discussed in two groups, namely, an entire set of cognitive functions and processes of the brain, and an enriched set of denotational mathematics. He described the approach to designing cognitive computers for cognitive and perceptible concept/knowledge processing, based on denotational mathematics such as Concept Algebra [Wang, 2006b, 2006d], Real-Time Process Algebra (RTPA) [Wang, 2002b, 2006d], and System Algebra [Wang, 2006c]. Cognitive computers implement the fundamental cognitive processes of the natural intelligence, such as the learning, thinking, formal inference, and perception processes. They are novel information processing systems that think and feel. In contrast to the von Neumann architecture of traditional stored-program-controlled imperative computers, the speaker elaborated on the Wang architecture for cognitive computers, consisting of the Knowledge Manipulation Unit (KMU), the Behavior Manipulation Unit (BMU), the Experience Manipulation Unit (EMU), the Skill Manipulation Unit (SMU), the Behavior Perception Unit (BPU), and the Experience Perception Unit (EPU) [Wang, 2006a]. This is a promising step toward the study of cognitive computers.

Witold Kinsner, Professor at the University of Manitoba, presented a keynote speech entitled “Towards Cognitive Machines: Multiscale Measures and Analysis” [Kinsner, 2006]. He showed that computer science and computer engineering have contributed to many shifts in technological and computing paradigms; the next paradigm shift is the cognitive machine. Such cognitive machines must be able to be aware of their environments as human beings are, and ought to understand the meaning of information in more human-like ways.
The motivations for developing such machines range from self-evident practical reasons, such as the expense of computer maintenance, to wearable computing in healthcare, to gaining a better understanding of the cognitive capabilities of the human brain. In designing such machines, we are faced with many problems, ranging from human perception, attention, concept creation, cognition, and consciousness, to executive processes guided by emotions and values, and symbiotic conversational human-machine interactions. Kinsner suggested that cognitive machine research should include multiscale measures and analysis. More specifically, he discussed definitions of cognitive machines and representations of processes, as well as their measurement and analysis. Application paradigms of cognitive machines were given, including cognitive radio, cognitive radar, and cognitive monitors.

Three special lectures stimulating discussions on CI were invited. Professor Yixin Zhong’s talk was on “A Cognitive Approach to NI and AI Research.” He reviewed the major AI approaches, namely structuralism, functionalism, behaviorism, and cognitivism, which seemed contradictory to each other from the traditional point of view. He then proposed the mechanism approach of AI, expressed in the form of information-knowledge-intelligence transformation. He showed that the four major approaches may constitute an incorporation based on the new approach, which may be of great significance to both NI and AI research. Professor Zhongzhi Shi presented a talk, “On Intelligence Science and Recent Progresses.” He proposed that intelligence science is a cross-fertilized discipline dedicated to joint research on the basic theory and technology of intelligence by brain science, cognitive science, cognitive informatics, AI, etc. Within this framework, he reported recent advances in visual perception, introspective learning, linguistic cognition, consciousness models, and a platform of agent-grid intelligence. Professor Yiyu Yao talked on “Granular Computing and Cognitive Informatics.”
A framework of granular computing was discussed from three perspectives. From the philosophical perspective, granular computing concerns structured thinking; from the methodological perspective, structured problem solving; and from the computational perspective, structured information processing. In summary, the plenary lectures attempted to connect CI research to artificial intelligence, natural intelligence, and intelligence science in general, and to granular computing in particular. They demonstrated that CI may provide an overarching theoretical framework for the studies related to the above research areas.

The participants of ICCI’06 witnessed exciting results from the exploration of many perspectives of CI. The research interest all over the world is growing rapidly, and the body of work produced thus far is taking shape in both quality and quantity. Several proposals were discussed during the conference, and it was decided that ICCI’07 will be held at Lake Tahoe, California, USA. A special issue of selected papers from ICCI’06 is planned for The International Journal of Cognitive Informatics and Natural Intelligence (IJCiNi, http://www.enel.ucalgary.ca/IJCINI/). More information about CI may be found at http://www.enel.ucalgary.ca/ICCI2006/.

The ICCI’06 program as presented in the proceedings is the result of the great effort and contributions of many people. We would like to thank all authors who submitted interesting papers to ICCI’06. We acknowledge the professional work of the Program Committee and external reviewers for their effective review and improvement of the quality of submitted papers. Our acknowledgement also goes to the invaluable sponsorships of the IEEE Computer Society, the IEEE ICCI Steering Committee, the Chinese Academy of Sciences, IEEE Canada, IEEE CS Press, and The International Journal of Cognitive Informatics and Natural Intelligence (IJCiNi). We thank the keynote speakers and invited lecturers for presenting their visions and insights on fostering this emerging interdisciplinary area. We acknowledge the organizing committee members, and particularly the ICCI’06 secretariats and student volunteers, who helped to make the event a success. The ICCI Steering Committee welcomes contributions and suggestions from researchers around the world in planning future events. Multidisciplinary researchers and practitioners are invited to join the CI community and participate in future conferences in the IEEE ICCI series.
References
Chan, C., Kinsner, W., Wang, Y., & Miller, D.M. (Eds.) (2004, August). Cognitive informatics. Proceedings of the 3rd IEEE International Conference (ICCI’04). Victoria, Canada: IEEE CS Press.
Kinsner, W. (2006, July). Towards cognitive machines: Multiscale measures and analysis. Keynote speech, Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI’06) (pp. 8-14). Beijing, China: IEEE CS Press.
Kinsner, W., Zhang, D., Wang, Y., & Tsai, J. (Eds.) (2005, August). Cognitive informatics. Proceedings of the 4th IEEE International Conference (ICCI’05). Irvine, CA: IEEE CS Press.
Latombe, J.-C. (2006, July). Probabilistic roadmaps: A motion planning approach based on active learning. Keynote speech, Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI’06) (pp. 1-2). Beijing, China: IEEE CS Press.
Patel, D., Patel, S., & Wang, Y. (Eds.) (2003, August). Cognitive informatics. Proceedings of the 2nd IEEE International Conference (ICCI’03). London, UK: IEEE CS Press.
Wang, Y. (2002a). On cognitive informatics. Keynote speech, Proceedings of the 1st IEEE International Conference on Cognitive Informatics (ICCI’02) (pp. 34-42). IEEE CS Press.
Wang, Y. (2002b, October). The real-time process algebra (RTPA). Annals of Software Engineering: An International Journal, 14, 235-274. Oxford: Baltzer Science Publishers.
Wang, Y. (2003). Cognitive informatics: A new transdisciplinary research field. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 115-127.
Wang, Y. (2006a, July). Cognitive informatics: Towards the future generation computers that think and feel. Keynote speech, Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI’06) (pp. 3-7). Beijing, China: IEEE CS Press.
Wang, Y. (2006b, July). On concept algebra and knowledge representation. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI’06) (pp. 320-331). Beijing, China: IEEE CS Press.
Wang, Y. (2006c, July). On abstract systems and system algebra. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI’06) (pp. 332-343). Beijing, China: IEEE CS Press.
Wang, Y. (2006d, March). On the informatics laws and deductive semantics of software. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 161-171.
Wang, Y. (2007a, January). The theoretical framework of cognitive informatics. The International Journal of Cognitive Informatics and Natural Intelligence (IJCiNi), 1(1), 1-27. USA: IGI Publishing.
Wang, Y. (2007b, July). The OAR model of neural informatics for internal knowledge representation in the brain. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(3), 64-75. USA: IGI Publishing.
Wang, Y., & Kinsner, W. (2006, March). Recent advances in cognitive informatics. IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 121-123.
Wang, Y., Johnston, R., & Smith, M. (Eds.) (2002, August). Cognitive informatics. Proceedings of the 1st IEEE International Conference (ICCI’02). Calgary, AB, Canada: IEEE CS Press.
Wang, Y., Wang, Y., Patel, S., & Patel, D. (2006, March). A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (C), 36(2), 124-133.
Yao, Y.Y., Shi, Z., Wang, Y., & Kinsner, W. (Eds.) (2006). Cognitive informatics. Proceedings of the 5th IEEE International Conference (ICCI’06). Beijing, China: IEEE CS Press.
Compilation of References
Aleksander, I. (1989). Neural computing architectures: The design of brain-like machines (p. 401). Cambridge, MA: MIT Press.
Aleksander, I. (1998, August). From WISARD to MAGNUS: A family of weightless neural machines (pp. 18-30).
Aleksander, I. (2003). How to build a mind: Towards machines with imagination (Maps of the Mind, 2nd ed.) (p. 192). New York, NY: Columbia University Press.
Aleksander, I. (2006). Artificial consciousness: An update. Available as of May 2006 from http://www.ee.ic.ac.uk/research/neural/publications/iwann.html (This is an update on his paper “Towards a neural model of consciousness,” in Proc. ICANN94, New York, NY: Springer, 1994.)
Aleven, V., & Koedinger, K. (2002). An effective metacognitive strategy: Learning by doing and explaining with computer-based cognitive tutors. Cognitive Science, 26(2), 147-179.
Alligood, K.T., Sauer, T.D., & Yorke, J.A. (1996). Chaos: An introduction to dynamical systems (p. 603). New York, NY: Springer Verlag.
Altman, R.B., Bada, M., Chai, X.J., Carillo, M.W., Chen, R.O., & Abernethy, N.F. (1999). RiboWeb: An ontology-based system for collaborative molecular biology. IEEE Intelligent Systems, 14(5), 68-76.
Altmann, E., & Trafton, J. (2002). Memory for goals: An activation-based model. Cognitive Science, 26, 39-83.
Anderson, J. (2002, August 19-20). Hybrid computation with an attractor neural network. In Proceedings of the 1st IEEE Intern. Conf. Cognitive Informatics (pp. 3-12). Calgary, AB. ISBN: 0-7695-1724-2.
Anderson, J.A. (2005). Cognitive computation: The Ersatz brain project. In Proceedings of the IEEE 2005 International Conference on Cognitive Informatics (pp. 2-3). IEEE Computer Society.
Anderson, J.A. (2005). A brain-like computer for cognitive software applications: The Ersatz brain project. In Proceedings of the IEEE 2005 International Conference on Cognitive Informatics (pp. 27-36). IEEE Computer Society.
Anderson, J.R. (1993). Rules of the mind. Lawrence Erlbaum.
Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111(4), 1036-1060.
Anderson, J.R., Corbett, A.T., Koedinger, K.R., & Pelletier, R. (1995). Cognitive tutors: Lessons learned. The Journal of the Learning Sciences, 4(2), 167-207.
Anderson, J.R., & Ross, B.H. (1980). Evidence against a semantic-episodic distinction. Journal of Experimental Psychology: Human Learning and Memory, 6, 441-466.
Ankerst, M., Elsen, C., Ester, M., & Kriegel, H.P. (1999). Visual classification: An interactive approach to decision tree construction. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 392-396).
Atmanspacher, H., & Scheingraber, H. (1987). A fundamental link between system theory and statistical mechanics. Foundations of Physics, 17, 939-963.
Austin, J. (Ed.) (1998). RAM-based neural networks (p. 240). Singapore: World Scientific.
Baars, B. (1988). A cognitive theory of consciousness. Cambridge, UK: Cambridge University Press. Available as of May 2006 from http://nsi.edu/users/baars/BaarsConsciousnessBook1988/index.html
Badalamente, R.V., & Greitzer, F.L. (2005). Top ten needs for intelligence analysis tool development. In Proceedings of the 2005 International Conference on Intelligence Analysis, McLean, Virginia.
Baddeley, A. (1990). Human memory: Theory and practice. Hove, UK: Lawrence Erlbaum.
Baeten, J., & Weijland, W. (1990). Process algebra. Cambridge Tracts in Computer Science, 18. Cambridge University Press.
Baeten, J., & Middelburg, C. (2002). Process algebra with timing. EATCS Monograph. Springer.
Baillieul, J. (1985). Kinematic programming alternatives for redundant manipulators. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 722-728). St. Louis, MO.
Bargiela, A., & Pedrycz, W. (2002). Granular computing: An introduction (p. 480). New York, NY: Springer.
Barlow, H.B. (1961). Possible principles underlying the transformation of sensory messages. In Rosenblith, W.A. (Ed.), Sensory communication (pp. 217-234). Cambridge, MA: The MIT Press.
Barnsley, M. (1988). Fractals everywhere (p. 396). Boston, MA: Academic.
Barwise, J., & Moss, L. (1996). Vicious circles. Stanford, CA: CLSI Publications.
Bateman, J. (1990). Upper modeling: A general organization of knowledge for natural language processing. Paper prepared for the Workshop on Standards for Knowledge Representation Systems, Santa Barbara.
Beck, K. (2000). Extreme programming explained. MA: Addison-Wesley.
Bell, A.J., & Sejnowski, T.J. (1997). The ‘independent components’ of natural scenes are edge filters. Vision Research, 37(23), 3327-3338.
Bell, D.A. (1953). Information theory. London: Pitman.
Belnap, N.D. (1977). A useful four-valued logic. In G. Epstein & J. Dunn (Eds.), Modern uses of multiple-valued logic (pp. 8-37). D. Reidel, Dordrecht.
Ben-Ari, M. (1993). Mathematical logic for computer science. UK: Prentice Hall International.
Bender, E.A. (1996). Mathematical methods in artificial intelligence. Los Alamitos, CA: IEEE CS Press.
Berger, J. (1990). Statistical decision theory – Foundations, concepts, and methods. Springer-Verlag.
Bergson, H. (1960). Time and free will: An essay on the immediate data of consciousness. New York, NY: Harper Torchbooks (original edition 1889, translated by F.L. Pogson).
Bergstra, J., Ponse, A., & Smolka, S. (Eds.) (2001). Handbook of process algebra. North Holland.
Bernardo, M., & Gorrieri, R. (1998). A tutorial on EMPA: A theory of concurrent processes with nondeterminism, priorities, probabilities and time. Theoretical Computer Science, 202, 1-54.
Bestaoui, Y. (1991). An unconstrained optimization approach to the resolution of the inverse kinematic problem of redundant and non-redundant robot manipulators. Int. J. Robotics, Autonomous Systems, 7, 37-45.
Biederman, G.B., Stepaniuk, S., Davey, V.A., Raven, K., & Ahn, D. (1999). Observational learning in children with Down syndrome and developmental delays: The effect of presentation speed in videotaped modeling. Down Syndrome Research and Practice, 6(1), 12-18.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115-147.
Biggerstaff, T. J., Mitbander, B. G., & Webster, D. E. (1994). Program understanding and the concept assignment problem. Communications of the ACM, 37(5), 72-82.
Bishop, C.M. (1995). Neural networks for pattern recognition (p. 482). Oxford, UK: Oxford University.
Bloom, B. S. (Ed.) (1956). Taxonomy of educational objectives: The classification of educational goals: Handbook I, Cognitive domain. New York, Toronto: Longmans, Green.
Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1987). Occam's razor. Information Processing Letters, 24, 377-380.
Boole, G. (1854). An investigation of the laws of thought, on which are founded the mathematical theories of logic and probabilities. New York: Dover Publications, Inc.
Boothe, R.G. (2002). Perception of the visual environment. New York: Springer-Verlag.
Borgo, S., Guarino, N., & Masolo, C. (1997). An ontological theory of physical objects. In L. Ironi (Ed.), Proceedings of the Eleventh International Workshop on Qualitative Reasoning (QR'97) (pp. 223-231). Cortona, Italy.
Brachman, R. J., & Levesque, H. J. (2004). Knowledge representation and reasoning. San Francisco: Morgan Kaufmann Publishers.
Brachmann, R., & Anand, T. (1996). The process of knowledge discovery in databases: A human-centered approach. In Advances in knowledge discovery and data mining (pp. 37-57). Menlo Park, CA: AAAI Press & MIT Press.
Braine, M. D. S., & Rumain, B. (1989). Development of comprehension of "or": Evidence for a sequence of competencies. Journal of Experimental Child Psychology, 31, 46-70.
Bravetti, M., Bernardo, M., & Gorrieri, R. (1998). Towards performance evaluation with general distributions in process algebras. In CONCUR'98, LNCS 1466 (pp. 405-422). Springer.
Bravetti, M., & Gorrieri, R. (2002). The theory of interactive generalized semi-Markov processes. Theoretical Computer Science, 282(1), 5-32.
Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and regression trees. Belmont, CA: Wadsworth Int. Group.
Brillouin, L. (1964). Scientific uncertainty and information. New York, NY: Academic.
Britten, K.H. (1996). Attention is everywhere. Nature, 382, 497-498.
Brooks, R. (1983). Towards a theory of the comprehension of computer programs. International Journal of Man-Machine Studies, 18(6), 543-554.
Brooks, R.A. (1970). New approaches to robotics. New York: American Elsevier, 5, 3-23.
Brusilovsky, P., & Peylo, C. (2003). Adaptive and intelligent Web-based educational systems. International Journal of AI in Education, 13(2), 159-172.
Buelthoff, H. H., & Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. In Proceedings of the National Academy of Sciences, USA (pp. 60-64).
Burt, P.J. (1985). Smart sensing within a pyramid vision machine. Proceedings of the IEEE, 76(8), 1006-1015.
Calvin, W. H. (1996). How brains think: Evolving intelligence, then and now. New York: Basic Books.
Calvin, W. H. (1996). The cerebral code: Thinking a thought in the mosaics of the mind. Cambridge, MA: MIT Press.
Calvin, W. H., & Bickerton, D. (2000). Lingua ex machina: Reconciling Darwin and Chomsky with the human brain. Cambridge, MA: MIT Press.
Cameron, P. J. (1999). Sets, logic, and categories. Springer.
Carlsson, C., & Turban, E. (2002). DSS: Directions for the next decade. Decision Support Systems, 33, 105-110.
Cazorla, D., Cuartero, F., Valero, V., Pelayo, F., & Pardo, J. (2003). Algebraic theory of probabilistic and non-deterministic processes. Journal of Logic and Algebraic Programming, 55(1-2), 57-103.
Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27, 349-370.
Cestnik, B., Kononenko, I., & Bratko, I. (1987). ASSISTANT 86: A knowledge-elicitation tool for sophisticated users. Proceedings of the 2nd European Working Session on Learning (pp. 31-45). Yugoslavia.
Chalmers, D. (1997). The conscious mind: In search of a fundamental theory (p. 432). Oxford, UK: Oxford University Press.
Chan, C., Kinsner, W., Wang, Y., & Miller, D.M. (Eds.) (2004, August). Cognitive informatics: Proceedings of the 3rd IEEE International Conference on Cognitive Informatics (ICCI'04), Victoria, Canada. Los Alamitos, CA: IEEE Computer Society Press.
Chan, C.W. (1992). Knowledge acquisition by conceptual modeling. Applied Mathematics Letters, 3, 7-12.
Chan, C.W. (1995). Development and application of a knowledge modeling technique. Journal of Experimental and Theoretical Artificial Intelligence, 7, 217-236.
Chan, C.W. (2000). A knowledge modelling technique and industrial applications. In C. Leondes (Ed.), Knowledge-Based Systems Techniques and Applications, 34(4), 1109-1141. USA: Academic Press.
Chan, C.W. (2002, August 19-20). Cognitive informatics: A knowledge engineering perspective. Proceedings of the First IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 49-56). Calgary, Alberta.
Chan, C.W. (2004, May 2-4). A knowledge modeling system. Proceedings of the IEEE Canadian Conference on Electrical and Computer Engineering (CCECE '04) (pp. 1353-1356). Niagara Falls, Ontario.
Chandrasekaran, B. (1986). Generic tasks in knowledge-based reasoning: High-level building blocks for expert systems design. IEEE Expert, 1(3), 23-30.
Chandrasekaran, B., Josephson, J.R., & Benjamins, V.R. (1998). Ontology of tasks and methods. 11th Knowledge Acquisition for Knowledge-Based Systems Workshop '98 (KAW'98) (pp. 6.1-6.21). Banff, Canada.
Chandrasekaran, B., Josephson, J. R., & Benjamins, V. R. (1999). What are ontologies, and why do we need them? IEEE Intelligent Systems, 14(1), 20-26.
Chang, C.L., Combs, J.B., & Stachowitz, R.A. (1990). A report on the expert systems validation associate (EVA). Expert Systems with Applications, 1, 217-230.
Chang, C.L., & Lee, R.C.T. (1973). Symbolic logic and mechanical theorem proving. New York: Academic Press.
Chatry, N., Perdereau, V., Drouin, M., Milgram, M., & Riat, J. C. (1996, May). A new design method for dynamical feedback networks. In Proc. Int. Sym. Soft Computing for Industry, Montpellier, France.
Chen, Y.-C., & Walker, I. D. (1993). A consistent null-space approach to inverse kinematics of redundant robots. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 374-381). Atlanta, USA.
Chevallereau, C., & Khalil, W. (1988). A new method for the solution of the inverse kinematics of redundant robots. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 37-42). Philadelphia, USA.
Chiew, V., & Wang, Y. (2003, August). A multi-disciplinary perspective on cognitive informatics. The 2nd IEEE International Conference on Cognitive Informatics (ICCI'03) (pp. 114-120). London, UK: IEEE CS Press.
Chiew, V., & Wang, Y. (2004). Formal description of the cognitive process of problem solving. Proceedings of ICCI'04 (pp. 74-83).
Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.
Chomsky, N. (1965). Aspects of the theory of syntax. MIT Press.
Chomsky, N. (1988). Language and mind (p. 208). Cambridge, UK: Cambridge University Press (3rd ed., 2006).
Clancey, W.J. (1985). Heuristic classification. Artificial Intelligence, 27, 289-350.
Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4), 261-283.
Clarke, B. L. (1981). A calculus of individuals based on connection. Notre Dame Journal of Formal Logic, 23(3), 204-218.
Clarke, B. L. (1985). Individuals and points. Notre Dame Journal of Formal Logic, 26(1), 61-75.
Cleaveland, R., Dayar, Z., Smolka, S., & Yuen, S. (1999). Testing pre-orders for probabilistic processes. Information and Computation, 154(2), 93-148.
Collins, A.M., & Loftus, E.F. (1975). A spreading activation theory of semantic processing. Psychological Review, 82, 407-428.
Corballis, M. C. (2002). From hand to mouth: The origins of language. Princeton/Oxford: Princeton University Press.
Corbett, A., McLaughlin, M., & Scarpinatto, K. C. (2000). Modeling student knowledge: Cognitive tutors in high school and college. User Modeling and User-Adapted Interaction, 10, 81-108.
Cotterill, R. (Ed.) (1988). Computer simulations in brain science (p. 566). Cambridge, UK: Cambridge University Press.
Cotterill, R. (2003). CyberChild: A simulation test-bed for consciousness studies. Journal of Consciousness Studies, 10(4-5), 31-45.
Cover, T.M., & Thomas, J.A. (1991). Elements of information theory (p. 542). New York, NY: Wiley.
Cox, J. R., & Griggs, R. A. (1989). The effects of experience on performance in Wason's selection tasks. Memory and Cognition, 10, 496-503.
Croft, W. B. (1984). The role of context and adaptation in user interfaces. International Journal of Man-Machine Studies, 21, 283-292.
D'Argenio, P., Katoen, J.-P., & Brinksma, E. (1998). An algebraic approach to the specification of stochastic systems. In Programming Concepts and Methods (pp. 126-147). Chapman & Hall.
Dansereau, R., & Kinsner, W. (2001, May 7-11). New relative multifractal dimension measures. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, ICASSP2001 (pp. 1741-1744). Salt Lake City, UT.
Dansereau, R.M., Kinsner, W., & Cevher, V. (2002, May 12-15). Wavelet packet best basis search using Rényi generalized entropy. In Proceedings of the IEEE 2002 Canadian Conference on Electrical & Computer Engineering, CCECE02 (Vol. 2, pp. 1005-1008). Winnipeg, MB.
Davies, J., & Schneider, S. (1995). A brief history of timed CSP. Theoretical Computer Science, 138, 243-271.
Dawkins, R. (1990). The selfish gene (2nd ed.) (p. 368). Oxford, UK: Oxford University Press.
de Farias, D.P. (2002). The linear programming approach to approximate dynamic programming: Theory and application. Doctoral dissertation (p. 146). Stanford, CA: Stanford University. Available as of May 2006 from http://web.mit.edu/~pucci/www/daniela_thesis.pdf
de Farias, D.P., & Van Roy, B. (2003). The linear programming approach to approximate dynamic programming. Operations Research, 51(6), 850-865.
de Farias, D.P., & Van Roy, B. (2004, August). On constraint sampling in the linear programming approach to approximate dynamic programming. Mathematics of Operations Research, 29(3), 462-478.
de Rosis, F. (2001). Towards adaptation of interaction to affective factors. User Modeling and User-Adapted Interaction, 11(4).
Dennett, D.C. (1991). Consciousness explained (p. 528). London, UK: Allen Lane/Penguin.
Dillinger, M., Madani, K., & Alonistioti, N. (Eds.) (2003). Software defined radio: Architectures, systems and functions (p. 454). New York, NY: Wiley.
Domingos, P. (1999). The role of Occam's razor in knowledge discovery. Data Mining and Knowledge Discovery, 3(4), 409-425.
Dong, T. (2005). Recognizing variable spatial environments: The theory of cognitive prism. Unpublished doctoral dissertation. University of Bremen, Germany.
Dong, T. (2005). SNAPVis and SPANVis: Ontologies for recognizing variable vista spatial environments. In C. Freksa, M. Knauff, B. Krieg-Brückner, B. Nebel, & T. Barkowsky (Eds.), International Conference Spatial Cognition, 4, 344-365. Berlin: Springer.
Dong, T. (2006). The theory of cognitive prism: Recognizing variable spatial environments. In Proceedings of the Nineteenth International Florida Artificial Intelligence Research Society Conference (pp. 719-724). Menlo Park, CA: AAAI Press.
Dong, T. (2007). Knowledge representation of distances and orientation of regions. International Journal of Cognitive Informatics and Natural Intelligence, 1(2), 86-99.
Doxygen. (2004). Doxygen Web site. Retrieved January 7, 2005, from http://www.stack.nl/~dimitri/doxygen/
Dreyfus, H. L., & Dreyfus, S. E. (1986). Mind over machine: The power of human intuition and expertise in the era of the computer. New York: The Free Press.
Dreyfus, H.L. (1992). What computers still can't do. Cambridge, MA: MIT Press.
Dubey, R. V., Euler, J. A., & Babcock, S. M. (1991). Real-time implementation of an optimization scheme for seven-degree-of-freedom redundant manipulators. IEEE Transactions on Robotics and Automation, 7(5), 579-588.
Dubois, D., & Prade, H. (1988). Possibility theory: An approach to computerized processing of uncertainty (p. 263). New York, NY: Plenum.
Eagly, A.H., & Chaiken, S. (1992). The psychology of attitudes. San Diego: Harcourt, Brace.
Ebbinghaus, H. D., Flum, J., & Thomas, W. (1984). Mathematical logic. Springer.
Edgar, G.A. (1990). Measure, topology, and fractal geometry (p. 230). New York, NY: Springer Verlag.
Edwards, W., & Fasolo, B. (2001). Decision technology. Annual Review of Psychology, 52, 581-606.
Elm, W.C., Cook, M.J., Greitzer, F.L., Hoffman, R.R., Moon, B., & Hutchins, S.G. (2004). Designing support for intelligence analysis. Proceedings of the Human Factors and Ergonomics Society (pp. 20-24).
Faghfouri, A., & Kinsner, W. (2005, May 2-5). Local and global analysis of multifractal singularity spectrum through wavelets. In Proc. IEEE 2005 Can. Conf. Electrical & Computer Eng. (pp. 2157-2163). Saskatoon, SK.
Fagin, R., Halpern, J. Y., Moses, Y., & Vardi, M. Y. (1995). Reasoning about knowledge. Cambridge, MA: MIT Press.
Falconer, K. (1990). Fractal geometry: Mathematical foundations and applications (p. 288). New York, NY: Wiley.
Fang, G., & Dissanayake, M.W.M.G. (1993, July). A neural network-based algorithm for robot trajectory planning. Proceedings of the International Conference of Robots for Competitive Industries (pp. 521-530). Brisbane, Qld, Australia.
Fang, G., & Dissanayake, M.W.M.G. (1998). Experiments on a neural network-based method for time-optimal trajectory planning. Robotica, 16, 143-158.
Farrell, J.E., & Van Den Branden Lambrecht, C.J. (Eds.) (2002, January). Translating human vision research into engineering technology [Special issue]. Proceedings of the IEEE, 90(1).
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (Eds.) (1996). Advances in knowledge discovery and data mining. AAAI/MIT Press.
Fazio, R.H. (1986). How do attitudes guide behavior? In R.M. Sorrentino & E.T. Higgins (Eds.), The handbook of motivation and cognition: Foundations of social behavior. New York: Guilford Press.
Featherstone, R. (1994). Accurate trajectory transformations for redundant and non-redundant robots. In Proc. IEEE Int. Conf. on Robotics and Automation (pp. 1867-1872). San Diego, USA.
Feder, J. (1988). Fractals (p. 238). New York, NY: Plenum.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12), 2379-2394.
Field, D.J. (1994). What is the goal of sensory coding? Neural Computation, 6, 559-601.
Finin, T., & Silverman, D. (1986). Interactive classification of conceptual knowledge. Proceedings of the First International Workshop on Expert Database Systems (pp. 79-90).
Fischer, G., McCall, R., Ostwald, J., Reeves, B., & Shipman, F. (1994). Seeding, evolutionary growth and reseeding: Supporting the incremental development of design environments. Paper presented at the Conference on Computer-Human Interaction (CHI'94), Boston, MA.
Fischer, K.W., Shaver, P.R., & Carnochan, P. (1990). How emotions develop and how they organize development. Cognition and Emotion, 4, 81-127.
Fischler, M.A., & Firschein, O. (1981). Intelligence: The eye, the brain and the computer (p. 331). Reading, MA: Addison-Wesley.
Fishbein, M., & Ajzen, I. (1975). Belief, attitude, intention, and behavior: An introduction to theory and research. Reading, MA: Addison-Wesley.
Fisher, D., & Schlimmer, J. (1988). Concept simplification and prediction accuracy. In Proceedings of the Fifth International Conference on Machine Learning (pp. 22-28). Morgan Kaufmann.
Fitter, M. J., & Sime, M. E. (1980). Creating responsive computers: Responsibility and shared decision-making. In H. T. Smith & T. R. G. Green (Eds.), Human interaction with computers. London: Academic Press.
Fitting, M. (1991). Bilattices and the semantics of logic programming. Journal of Logic Programming, 11, 91-116.
Fitting, M. (2002). Fixpoint semantics for logic programming: A survey. Theoretical Computer Science, 278(1-2), 25-51.
Fitts, P. M. (1951). Engineering psychology and equipment design. In S. S. Stevens (Ed.), Handbook of experimental psychology (pp. 1287-1340). New York: Wiley.
Fitts, P. M. (Ed.) (1951). Human engineering for an effective air navigation and traffic control system. Washington, DC: National Academy Press, National Academy of Sciences.
Flax, L. (2004, September). Algebraic belief revision and nonmonotonic entailment results and proofs. Technical Report C/TR04-01, Macquarie University. Retrieved from http://www.comp.mq.edu.au/~flax/techReports/brNm.pdf
Flores-Mendez, R.A., Van Leeuwee, P., & Lukose, D. (1998). Modeling expertise using KADS and MODEL-ECS. In B.R. Gaines & M. Musen (Eds.), Proceedings of the 11th Knowledge Acquisition for Knowledge-based Systems Workshop (KAW'98), 1, 3-14, 19-23. Banff, Canada.
Forward, A., & Lethbridge, T. (2002). The relevance of software documentation, tools and techniques: A survey. Paper presented at the ACM Symposium on Document Engineering, McLean, VA.
Fowler, M. (1999). Refactoring: Improving the design of existing code. MA: Addison-Wesley.
Franklin, N., & Tversky, B. (1990). Searching imagined environments. Journal of Experimental Psychology: General, 119, 63-76.
Franklin, S. (1995). Artificial minds (p. 464). Cambridge, MA: MIT Press.
Franklin, S. (2003). IDA: A conscious artefact.
Freeman, W. (2001). How brains make up their minds (2nd ed.) (p. 146). New York, NY: Columbia University Press.
Frith, C. D. (1992). The cognitive neuropsychology of schizophrenia. Lawrence Erlbaum Associates.
Gabrieli, J.D.E. (1998). Cognitive neuroscience of human memory. Annual Review of Psychology, 49, 87-115.
Gadhok, N., & Kinsner, W. (2006, May 10-12). An implementation of beta-divergence for blind source separation. In Proceedings of the IEEE Can. Conf. Electrical & Computer Eng., CCECE06 (pp. 642-646). Ottawa, ON.
Gagné, R., Briggs, L., & Wager, W. (1992). Principles of instructional design (4th ed.). New York: Holt, Rinehart & Winston.
Ganek, A.G., & Corbi, T.A. (2003). The dawning of the autonomic computing era. IBM Systems Journal, 42(1), 34-42. Available as of May 2006 from http://www.research.ibm.com/journal/sj/421/ganek.pdf
Ganter, B., & Wille, R. (1999). Formal concept analysis (pp. 1-5). Springer.
Garagnani, M., Shastri, L., & Wendelken, C. (2002). A connectionist model of planning as back-chaining search. Proceedings of the 24th Conference of the Cognitive Science Society (pp. 345-350). Fairfax, Virginia, USA.
Gärdenfors, P. (1988). Knowledge in flux. MIT Press.
Gärdenfors, P., & Rott, H. (1995). Belief revision. In M. Dov, C. Gabbay, J. Hogger, & J. A. Robinson (Eds.), Handbook of logic in artificial intelligence and logic programming (Vol. 4, pp. 35-132). Oxford University Press.
Gasson, M., Hutt, B., Goodhew, I., Kyberd, P., & Warwick, K. (2002, September). Bi-directional human machine interface via direct neural connection. In Proceedings of the IEEE Workshop on Robot and Human Interactive Communication (pp. 265-270), Berlin, Germany.
Genesereth, M. R., & Nilsson, N. J. (1987). Logical foundations of artificial intelligence. Los Altos, CA: Morgan Kaufmann Publishers.
Gershon, N. (1995). Human information interaction. In Proceedings of the WWW4 Conference, Boston, MA.
Giarratano, J., & Riley, G. (1989). Expert systems: Principles and programming. Boston: PWS-KENT Pub. Co.
Ginsberg, A., & Williamson, K. (1993). Inconsistency and redundancy checking for quasi-first-order-logic knowledge bases. International Journal of Expert Systems, 6(3), 321-340.
Ginsberg, M. L. (1988). Multivalued logics: A uniform approach to inference in artificial intelligence. Computational Intelligence, 4(3), 265-316.
Glabbeek, R.V., Smolka, S., & Steffen, B. (1995). Reactive, generative and stratified models of probabilistic processes. Information and Computation, 121(1), 59-80.
Glasser, W. (1998). The quality school. Perennial.
Gold, E. M. (1978). Complexity of automaton identification from given data. Information and Control, 37, 302-320.
Goldberg, D.E. (2002). The design of innovation: Genetic algorithms and evolutionary computation (p. 272). New York, NY: Springer.
Gomez-Perez, A., Fernandez-Lopez, M., & Corcho, O. (2004). Ontological engineering. London: Springer-Verlag.
Gosling, J., Joy, B., & Steele, G. (1996). Java language specification. MA: Addison-Wesley.
Grassberger, P., & Procaccia, I. (1983, January 31). Characterization of strange attractors. Physical Review Letters, 50(5), 346-349.
Greitzer, F. L. (2005). Toward the development of cognitive task difficulty metrics to support intelligence analysis research. In Proceedings of the IEEE 2005 International Conference on Cognitive Informatics (pp. 315-320). IEEE Computer Society.
Greitzer, F. L. (2005). Extending the reach of augmented cognition to real-world decision making tasks. In Proceedings of the HCI International 2005/Augmented Cognition Conference, Las Vegas, NV.
Greitzer, F. L., Hershman, R. L., & Kaiwi, J. (1985). Intelligent interfaces for C2 operability. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics.
Griffith, D. (1990). Computer access for persons who are blind or visually impaired: Human factors issues. Human Factors, 32, 467-475.
Griffith, D. (2005). Beyond usability: The new symbiosis. Ergonomics in Design, 13, 3.
Griffith, D. (2005). Neo-symbiosis: A tool for diversity and enrichment. Retrieved August 6, 2006, from http://2005.cyberg.wits.ac.za
Griffith, D., Gardner-Bonneau, D. J., Edwards, A. D. N., Elkind, J. I., & Williges, R. C. (1989). Human factors research with special populations will further advance the theory and practice of the human factors discipline. In Proceedings of the Human Factors 33rd Annual Meeting (pp. 565-566). Santa Monica, CA: Human Factors Society.
Grossberg, S. (1982). Studies of mind and brain: Neural principles of learning, perception, development, cognition and motor control (p. 662). Boston, MA: D. Reidel Publishing.
Grossberg, S. (Ed.) (1988). Neural networks and natural intelligence (p. 637). Cambridge, MA: MIT Press.
Gruber, T. (1992, October). A translation approach to portable ontology specifications. Proceedings of the 7th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop '92 (pp. 11-16). Paper No. 12. Banff, Canada.
Guarino, N., & Giaretta, P. (1995). Ontologies and knowledge bases: Towards a terminological clarification. In N. Mars (Ed.), Towards very large knowledge bases: Knowledge building and knowledge sharing (pp. 25-32). Amsterdam, The Netherlands: IOS Press.
Guez, A., & Ahmad, Z. (1989, June). Accelerated convergence in the inverse kinematics via multilayer feedforward networks. In Proc. IEEE Int. Conf. Neural Networks (pp. 341-344). Washington, USA.
Götz, N., Herzog, U., & Rettelbach, M. (1993). Multiprocessor and distributed system design: The integration of functional specification and performance analysis using stochastic process algebras. In 16th Int. Symp. on Computer Performance Modelling, Measurement and Evaluation (PERFORMANCE'93), LNCS 729 (pp. 121-146). Springer.
Hahn, U., Schulz, S., & Romacker, M. (1999). Part-whole reasoning: A case study in medical ontology engineering. IEEE Intelligent Systems, 14(5), 59-67.
Haikonen, P.O.A. (2003). The cognitive approach to conscious machines (p. 294). New York, NY: Academic. (See also http://personal.inet.fi/cool/pentti.haikonen/)
Haikonen, P.O.A. (2004, June). Conscious machines and machine emotions. Workshop on Models for Machine Consciousness, Antwerp, BE.
Halford, G. S. (1993). Children's understanding: The development of mental models. Hillsdale, NJ: Lawrence Erlbaum Associates.
Halsey, T.C., Jensen, M.H., Kadanoff, L.P., Procaccia, I., & Shraiman, B. (1986, February). Fractal measures and their singularities: The characterization of strange sets. Physical Review A, 33(2), 1141-1151.
Han, C.-h., Lidz, J., & Musolino, J. (2003). Verb-raising and grammar competition in Korean: Evidence from negation and quantifier scope. Unpublished manuscript. Simon Fraser University, Northwestern University, Indiana University.
Han, C.-h., Ryan, D., Storoshenko, S., & Yasuko, S. (in press). Scope of negation, and clause structure in Japanese. In Proceedings of the 30th Berkeley Linguistics Society.
Han, J., Hu, X., & Cercone, N. (2003). A visualization model of interactive knowledge discovery systems and its implementations. Information Visualization, 2(2), 105-125.
Hancock, P. A., Pepe, A. A., & Murphy, L. (2005). Hedonomics: The power of positive and pleasurable ergonomics. Ergonomics in Design, 13, 8-14.
Harrison, P., & Strulo, B. (2000). SPADES: A process algebra for discrete event simulation. Journal of Logic and Computation, 10(1), 3-42.
Hartley, R.V.L. (1928). Transmission of information. Bell System Technical Journal, 7, 535-563.
Hastie, R. (2001). Problems for judgment and decision making. Annual Review of Psychology, 52, 653-683.
Hatano, K., Sano, R., Duan, Y., & Tanaka, K. (1999). An interactive classification of Web documents by self-organizing maps and search engines. Proceedings of the 6th International Conference on Database Systems for Advanced Applications (pp. 19-22).
Hawking, S. (1996). The illustrated A Brief History of Time (2nd ed.) (p. 248). New York, NY: Bantam.
Haykin, S. (2005, February). Cognitive radio: Brain-empowered wireless communications. IEEE Journal on Selected Areas in Communications, 23(2), 201-220.
Haykin, S. (2005, September 28-30). Cognitive machines. In IEEE Intern. Workshop on Machine Intelligence & Signal Processing, IWMISP05, Mystic, CT. Available as of May 2006 from http://soma.crl.mcmaster.ca/ASLWeb/Resources/data/Cognitive_Machines.pdf
Haykin, S. (2006, January). Cognitive radar. IEEE Signal Processing Magazine (pp. 30-40).
Haykin, S., & Chen, Z. (2005). The cocktail party problem. Neural Computation, 17, 1875-1902.
Haykin, S., & Kosko, B. (2001). Intelligent signal processing (p. 553). New York, NY: Wiley.
Haykin, S., Principe, C.J., Sejnowski, T.J., & McWhirter, J. (2006). New directions in statistical signal processing (p. 544). Cambridge, MA: MIT Press.
Heath, P. (Ed.) (1966). On the syllogism and other logical writings by Augustus De Morgan. New Haven: Yale University Press.
Heermann, D., & Fuhrmann, T. (2000). Teaching physics in the virtual university: The mechanics toolkit. Computer Physics Communications, 127, 11-15.
Held, G. (1987). Data compression: Techniques and applications, hardware and software considerations (2nd ed.) (p. 206). New York, NY: Wiley.
Henninger, S. (1997). Tools supporting the creation and evolution of software development knowledge. Paper presented at the International Conference on Automated Software Engineering (ASE'97), Incline Village, NV.
Hentschel, H.G.E., & Procaccia, I. (1983). The infinite number of generalized dimensions of fractals and strange attractors. Physica, 8D, 435-444.
Hermann, D., & Harwood, J. (1980). More evidence for the existence of separate semantic and episodic stores in long-term memory. Journal of Experimental Psychology, 6(5), 467-478.
Hillston, J. (1996). A compositional approach to performance modelling. Cambridge University Press.
Hinton, G.E., & Anderson, J.A. (1981). Parallel models of associative memory (p. 295). Hillsdale, NJ: Lawrence Erlbaum Associates.
Hoare, C.A.R. (1985). Communicating sequential processes. Prentice-Hall Inc.
Hoffman, R. R., Feltovich, P. J., Ford, K. M., Woods, D. D., Klein, G., & Feltovich, A. (2002). A rose by any other name… would probably be given an acronym. Retrieved August 6, 2006, from http://www.ihmc.us/research/projects/EssaysOnHCC/TheRose.pdf
Hoffman, R.R., Klein, G., & Laughry, K.R. (2002, January/February). The state of cognitive systems engineering. IEEE Intelligent Systems Magazine (pp. 73-75).
Hoggar, S.G. (1992). Mathematics for computer graphics (p. 472). Cambridge, UK: Cambridge University Press.
Holland, O. (Ed.) (2003). Machine consciousness (p. 192). Exeter, UK: Imprint Academic.
Hollnagel, E., & Woods, D. D. (1983). Cognitive systems engineering: New wine in new bottles. International Journal of Man-Machine Studies, 18, 583-600. Reprinted (1999) in the 30th anniversary issue of the International Journal of Human-Computer Studies, 51, 339-356. Retrieved August 6, 2006, from http://www.idealibrary.com
Hubel, D.H. (1995). Eye, brain and vision (Reprint ed.). W.H. Freeman & Company.
Huffman, D. (1954). The synthesis of sequential switching circuits. Journal of the Franklin Institute, 257(3-4), 161-190, 275-303.
Hughes, F. J., & Schum, D. A. (2003). Preparing for the future of intelligence analysis: Discovery – Proof – Choice. Unpublished manuscript. Joint Military Intelligence College.
Humphreys, G. K., & Khan, S. C. (1992). Recognizing novel views of three-dimensional objects. Canadian Journal of Psychology, 46, 170-190.
Humphreys, M. S., Bain, J. D., & Pike, R. (1989). Different ways to cue a coherent memory system: A theory for episodic, semantic and procedural tasks. Psychological Review, 96, 208-233.
Hunt, K. H. (1987, March). Robot kinematics: A compact analytic inverse solution for velocities. ASME Journal of Mechanisms, Transmissions and Automation in Design, 109, 42-49.
Hunt, K. J., et al. (1992, November). Neural networks for control systems. Automatica, 28(2), 1083-1112.
Hurley, P.J. (1997). A concise introduction to logic (6th ed.). London: Wadsworth Publishing Co., ITP.
Hyvarinen, A., & Hoyer, P.O. (2001). A two-layer sparse coding model learns simple and complex cell receptive fields and topography from natural images. Vision Research, 41(18), 2413-2423.
Hyvarinen, A., Karhunen, J., & Oja, E. (2001). Independent component analysis (p. 481). New York, NY: Wiley.
IBM (2001). IBM autonomic computing manifesto. Available as of May 2006 from http://www.research.ibm.com/autonomic/
IBM (2006, June). Autonomic computing white paper: An architectural blueprint for autonomic computing (4th ed.) (pp. 1-37).
IBM Corp (2005). Autonomic computing. Retrieved April 2005, from http://www.research.ibm.com/autonomic/glossary.html
IBM Corp (2005). The eight elements. Retrieved April 2005, from http://www.research.ibm.com/autonomic/manifesto/autonomic_computing.pdf
ISO/IEC 11172-3 (1993). Information technology: Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s. Part 3: Audio.
Itti, L. (2001). Visual attention and target detection in cluttered natural scenes. Optical Engineering, 40(9), 1784-1793.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on PAMI, 20(11), 1254-1259.
Jasper, R., & Uschold, M. (1999, August). A framework for understanding and classifying ontology applications. In Proceedings of the IJCAI99 Workshop on Ontologies and Problem-Solving Methods (KRR5) (pp. 11.1-11.12). Stockholm, Sweden.
Jayant, N. (1992, June). Signal compression: Technology targets and research directions. IEEE Journal on Selected Areas in Communications, 10, 796-818.
Jayant, N. (Ed.) (1997). Signal compression: Coding of speech, audio, text, image and video (p. 231). Singapore: World Scientific.
Jayant, N.S., Johnson, J.D., & Safranek, R.S. (1993, October). Signal compression based on models of human perception. Proceedings of the IEEE, 81(10), 1385-1422.
Jennings, N.R. (2000). On agent-based software engineering. Artificial Intelligence, 117(2), 277-296.
Jennings, R. E. (1994). The genealogy of disjunction. New York: Oxford University Press.
Jennings, R. E. (2004). The meaning of connectives. In S. Davis & B. Gillon (Eds.), Semantics: A reader. New York: Oxford University Press.
Jennings, R. E. (in press). The semantic illusion. In A. Irvine & K. Peacock (Eds.), Errors of reason. Toronto: University of Toronto Press.
Jennings, R. E., & Friedrich, N. A. (2006). Proof and consequence: An introduction to classical logic. Peterborough: Broadview Press.
Jennings, R. E., & Schapansky, N. (2000). Without: From separation to negation, a case study in logicalization. In Proceedings of the CLA 2000 (pp. 147-158). Ottawa: Cahiers Linguistiques d'Ottawa.
Jensen, E. (2000). Brain-based learning: The new science of teaching and training (revised ed.). Brain Store Inc.
Jordan, D. W., & Smith, P. (1997). Mathematical techniques: An introduction for the engineering, physical, and mathematical sciences (2nd ed.). UK: Oxford University Press.
Joy, B. (2000, April). Why the future doesn't need us. Wired, 8.04.
Jung, S., & Hsia, T.C. (2000). Neural network inverse control techniques for PD controlled robot manipulator. Robotica, 18, 305-314.
Kadanoff, L.P. (1993). From order to chaos: Essays (p. 555). Singapore: World Scientific.
Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice Hall, Inc.
Kahneman, D. (2002, December 8). Maps of bounded rationality: A perspective on intuitive judgment and choice. Nobel Prize lecture.
Kahneman, D. (2003). A perspective on judgment and choice: Mapping bounded rationality. American Psychologist, 58, 697-720.
Kahneman, D., & Frederick, S. (2002). Representativeness revisited: Attribute substitution in intuitive judgment. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases (pp. 49-81). New York: Cambridge University Press.
Kantor, P. B. (1980). Availability analysis. Journal of the American Society for Information Science, 27(6), 311-319. Reprinted (1980) in Key papers in information science (pp. 368-376). White Plains, NY: Knowledge Industry Publications, Inc.
Kantz, H., & Schreiber, T. (1997). Nonlinear time series analysis (p. 304). Cambridge, UK: Cambridge Univ. Press.
Kaplan, J.L., & Yorke, J.A. (1979). Chaotic behavior of multidimensional difference equations. In H.-O. Peitgen & H.O. Walther (Eds.), Functional differential equations and approximations of fixed points (pp. 204-227, 503). New York, NY: Springer Verlag.
Kawato, M., Maeda, Y., Uno, Y., & Suzuki, R. (1990). Trajectory formation of arm movement by cascade neural network model based on minimum torque-change criterion. Biological Cybernetics, 62, 275-288.
Kephart, J., & Chess, D. (2003, January). The vision of autonomic computing. IEEE Computer, 36(1), 41-50.
Kieffer, S., Morellas, V., & Donath, M. (1991, April). Neural network learning of the inverse kinematic relationships for a robot arm. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 2418-2425). Sacramento, CA.
Kim, S. W., Park, K. B., & Lee, J. J. (1994). Redundancy resolution of robot manipulators using optimal kinematic control. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 683-688). San Diego, USA.
Kinsner, W. (1991). Review of data compression methods, including Shannon-Fano, Huffman, arithmetic, Storer, Lempel-Ziv-Welch, fractal, neural network, and wavelet algorithms. Technical Report DEL91-1 (p. 157). Winnipeg, MB, Canada: Dept. Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (1994). Fractal dimensions: Morphological, entropy, spectrum, and variance classes. Technical Report DEL94-4 (p. 146). Winnipeg, MB, Canada: Dept. of Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (1994, May). A unified approach to fractal and multifractal dimensions. Technical Report DEL94-4 (p. 147). Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, Manitoba, Canada (abbreviated to UofM in the references below).
Kinsner, W. (1994, June 7). Entropy-based fractal dimensions: Probability and pair-correlation algorithms for E-dimensional images and strange attractors. Technical Report DEL94-5; UofM (p. 44).
Kinsner, W. (1994, June 15). Batch and real-time computation of a fractal dimension based on variance of a time series. Technical Report DEL94-6; UofM (p. 22).
Kinsner, W. (1994, June 20). The Hausdorff-Besicovitch dimension formulation for fractals and multifractals. Technical Report DEL94-7; UofM (p. 12).
Kinsner, W. (1995, January). Self-similarity: The foundation for fractals and chaos. Technical Report DEL95-2; UofM (p. 113).
Kinsner, W. (1996). Fractal and chaos engineering: Postgraduate lecture notes (p. 760). Winnipeg, MB, Canada: Department of Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (1998). Signal and data compression: Postgraduate lecture notes (p. 642). Winnipeg, MB, Canada: Department of Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (2002, August 19-20). Compression and its metrics for multimedia. In Proceedings of the 1st IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 107-121). Calgary, AB.
Kinsner, W. (2003, August 18-20). Characterizing chaos through Lyapunov metrics. In Proceedings of the 2nd IEEE International Conference on Cognitive Informatics (ICCI'03) (pp. 189-201). London, UK.
Kinsner, W. (2003). Characterizing chaos with Lyapunov exponents and Kolmogorov-Sinai entropy. Technical Report DEL03-1 (p. 76). Winnipeg, MB, Canada: Dept. Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (2003). Is it noise or chaos? Technical Report DEL03-2 (p. 98). Winnipeg, MB, Canada: Dept. Electrical & Computer Engineering, University of Manitoba.
Kinsner, W. (2004, August 16-18). Is entropy suitable to characterize data and signals for cognitive informatics? In Proceedings of the 3rd IEEE International Conference on Cognitive Informatics (ICCI'04) (pp. 6-21). Victoria, BC.
Kinsner, W. (2005). Some advances in cognitive informatics. In Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 6-7). IEEE Press.
Kinsner, W. (2005, June 16-18). Signal processing for autonomic computing. In Proceedings 2005 Meet. Can. Applied & Industrial Math Soc., CAIMS 2005, Winnipeg, MB. Available as of May 2006 from http://www.umanitoba.ca/institutes/iims/caims2005_theme_signal.shtml
Kinsner, W. (2005, August 8-10). A unified approach to fractal dimensions. In Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 58-72). Irvine, CA.
Kinsner, W. (2006, July). Towards cognitive machines: Multiscale measures and analysis. Keynote speech at the Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 8-14). Beijing, China: IEEE CS Press.
Kinsner, W. (2007). Towards cognitive machines: Multiscale measures and analysis. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(1), 28-38.
Kinsner, W., & Dansereau, R. (2006, July 17-19). A relative fractal dimension spectrum as a complexity measure. In Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06). Beijing, China.
Kinsner, W., Cheung, V., Cannons, K., Pear, J., & Martin, T. (2003, August 18-20). Signal classification through multifractal analysis and complex domain neural networks. In Proceedings of the 2nd IEEE International Conference on Cognitive Informatics (ICCI'03) (pp. 41-46). London, UK.
Kinsner, W., Potter, M., & Faghfouri, A. (2005, June 16-18). Signal processing for autonomic computing. In Rec. Can. Applied & Industrial Mathematical Sciences, CAIMS05. Winnipeg, MB.
Kinsner, W., Zhang, D., Wang, Y., & Tsai, J. (Eds.) (2005, August). Cognitive informatics: Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05), Irvine, California, USA. Los Alamitos, CA: IEEE Computer Society Press.
Kircanski, M., & Petrovic, T. (1993). Combined analytical-pseudoinverse inverse kinematic solution for simple redundant manipulators and singularity avoidance. International Journal of Robotics Research, 12(1), 188-196.
Kleene, S.C. (1956). Representation of events by nerve nets. In C.E. Shannon & J. McCarthy (Eds.), Automata studies (pp. 3-42). Princeton Univ. Press.
Klein, C. A., & Huang, C. H. (1983, April). Review of pseudoinverse control for use with kinematically redundant manipulators. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(3), 245-250.
Klein, C., Chu-Jenq, C., & Ahmed, S. (1993). Use of an extended Jacobian method to map algorithmic singularities. In Proc. IEEE Int. Conf. Robotics and Automation (pp. 632-637). Atlanta, USA.
Klir, G.J. (1992). Facets of systems science. New York: Plenum.
Klivington, K. (1989). The science of mind (p. 239). Cambridge, MA: MIT Press.
Knuth, D. (1984). Literate programming. The Computer Journal, 27(2), 97-111.
Koehler, W. (1929). Gestalt psychology. London: Liveright.
Koffka, K. (1935). Principles of Gestalt psychology. New York: Brace & World.
Kohonen, T. (2002). Self-organization and associative memory (2nd ed.) (p. 312). New York, NY: Springer Verlag.
Kokinov, B., & Petrov, A. (2000). Dynamic extension of episode representation in analogy-making in AMBR. Proceedings of the 22nd Conference of the Cognitive Science Society (pp. 274-279). NJ.
Kort, B., & Reilly, R. (2002). Theories for deep change in affect-sensitive cognitive machines: A constructivist model. Educational Technology & Society, 5(4), 3.
Kozaczynski, W., & Wilde, N. (1992). On the re-engineering of transaction systems. Journal of Software Maintenance, 4, 143-162.
Kronauer, R.E., & Zeevi, Y.Y. (1985). Reorganization and diversification of signals in vision. IEEE Trans. on Sys., Man, and Cyber., SMC-15(1), 91-101.
Kurzweil, R. (1990). The age of intelligent machines (p. 565). Cambridge, MA: MIT Press.
Kurzweil, R. (1999). The age of spiritual machines: When computers exceed human intelligence. New York: Penguin Group.
Laresgoiti, I., Anjewierden, A., Bernaras, A., Corera, J., Schreiber, A.Th., & Wielinga, B.J. (1996). Ontologies as vehicles for reuse: A mini-experiment. In B.R. Gaines & M.A. Musen (Eds.), Proceedings of the 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop (KAW-96) (pp. 30.1-30.21). Banff, Canada.
Latombe, J.-C. (2006, July). Probabilistic roadmaps: A motion planning approach based on active learning. Keynote speech at the Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 1-2). Beijing, China: IEEE CS Press.
Lau, C. (1991). Neural networks: Theoretical foundations and analysis. IEEE Press.
Lenat, D. (1998, October). The dimensions of context-space. Cycorp report.
Levesque, H. J. (1984). The logic of incomplete knowledge bases. In M. L. Brodie, J. Mylopoulos, & J.W. Schmidt (Eds.), On conceptual modeling. New York: Springer-Verlag.
Levesque, H. J., & Lakemeyer, G. (2000). The logic of knowledge bases. Cambridge, MA: MIT Press.
Levy, A. Y., & Rousset, M.-C. (1998). Verification of knowledge bases based on containment checking. Artificial Intelligence, 101(1-2), 227-250.
Licklider, J. C. R. (1960). Man-computer symbiosis. IRE Transactions on Human Factors in Electronics, HFE-1, 4-11.
Licklider, J. C. R., & Taylor, R. G. (1968, April). The computer as a communication device. Science & Technology, 76, 21-31.
Lieberman, P. (1984). The biology and evolution of language. Cambridge, MA: Harvard University Press.
Lieberman, P. (1991). Uniquely human: The evolution of speech, thought, and selfless behavior. Cambridge, MA: Harvard University Press.
Lieberman, P. (2000). Human language and our reptilian brain: The subcortical bases of speech, syntax and thought. Cambridge, MA: Harvard University Press.
Liegeois, A. (1986, December). Automatic supervisory control of the configuration and behavior of multi-body mechanisms. IEEE Trans. Sys. Man and Cyb., SMC-7(3), 868-871.
Lin, T.Y. (1997). Granular computing. Announcement of the BISC special interest group on granular computing.
Lintermann, B., & Deussen, O. (1999). Interactive structural and geometrical modeling of plants. IEEE Computer Graphics and Applications, 19(1).
Lipschutz, S. (1964). Schaum's outline of theories and problems of set theory and related topics. New York, NY: McGraw-Hill Inc.
Lipschutz, S. (1967). Schaum's outline of set theory and related topics. McGraw-Hill Inc.
Liu, L., & Lin, M. (1991). Forecasting residential consumption of natural gas using monthly and quarterly time series. International Journal of Forecasting, 7, 3-16.
López, N., & Núñez, M. (2001). A testing theory for generally distributed stochastic processes. In CONCUR 2001, LNCS 2154 (pp. 321-335). Springer.
López, N., Núñez, M., & Rubio, F. (2004). An integrated framework for the analysis of asynchronous communicating stochastic processes. Formal Aspects of Computing, 16(3), 238-262.
Mackey, M.C. (1992). Time's arrow: The origin of thermodynamic behavior (p. 175). New York, NY: Springer Verlag.
Mainzer, K. (2004). Thinking in complexity (4th ed.) (p. 456). New York, NY: Springer Verlag.
Mallat, S. (1998). A wavelet tour of signal processing (p. 577). San Diego, CA: Academic.
Mandelbrot, B.B. (1974). Intermittent turbulence in self-similar cascades: Divergence of higher moments and dimension of the carrier. Journal of Fluid Mechanics, 62(2), 331-358.
Mandelbrot, B.B. (1982). The fractal geometry of nature (p. 468). New York, NY: W.H. Freeman.
Mann, S. (2002). Intelligent image processing (p. 339). New York, NY: Wiley/IEEE.
Mannila, H. (1997). Methods and problems in data mining. Proceedings of the International Conference on Database Theory '97 (pp. 41-55).
Martin, R. C. (2002). Agile software development: Principles, patterns, and practices. MA: Addison Wesley.
Maslow, A. H. (1970). Motivation and personality (2nd ed.). New York: Viking.
Matlin, M.V. (1998). Cognition (4th ed.). Harcourt Brace and Company.
Mayer, R.E. (1992). Thinking, problem solving, cognition (2nd ed.). W.H. Freeman and Company.
McCulloch, W.S. (1965). Embodiments of mind. Cambridge, MA: MIT Press.
McCulloch, W.S. (1993). The complete works of Warren S. McCulloch. Salinas, CA: Intersystems Pub.
McCulloch, W.S., & Pitts, W.H. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
McDermott, J. (1988). Preliminary steps toward a taxonomy of problem-solving methods. In S. Marcus (Ed.), Automating knowledge acquisition for expert systems (pp. 225-255). Boston: Kluwer.
Menzies, T., & Pecheur, C. (2005). In M. Zelkowitz (Ed.), Advances in computers (Vol. 65). Amsterdam, The Netherlands: Elsevier.
Meystel, A.M., & Albus, J.S. (2002). Intelligent systems: Architecture, design, and control. John Wiley & Sons, Inc.
Michalski, R.S., Carbonell, J.G., & Mitchell, T.M. (Eds.) (1983). Machine learning: An artificial intelligence approach (pp. 463-482). Palo Alto, CA: Morgan Kaufmann.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity to process information. Psychological Review, 63, 81-97.
Milner, R. (1989). Communication and concurrency. Englewood Cliffs, NJ: Prentice-Hall.
Mingers, J. (1989). An empirical comparison of pruning measures for decision tree induction. Machine Learning, 4, 227-243.
Minsky, M. (1986). The society of mind (p. 339). New York, NY: Touchstone.
Mitra, S.K. (1998). Digital signal processing: A computer-based approach (p. 864). New York: McGraw-Hill (MatLab Series).
Montello, D. (1993). Scale and multiple psychologies of space. In A. Frank & I. Campari (Eds.), Spatial information theory: A theoretical basis for GIS (pp. 312-321). Berlin: Springer.
Murata, T., Subrahmanian, V. S., & Wakayama, T. (1991). A Petri net model for reasoning in the presence of inconsistency. IEEE Transactions on Knowledge and Data Engineering, 3(3), 281-292.
Murch, R. (2004). Autonomic computing. London: Pearson Education.
Najjar, M., Fournier-Viger, P., Mayers, A., & Bouchard, F. (2005). Memorising remembrances in computational modelling of interrupted activities. Proceedings of the 7th International Conference on Computational Intelligence and Natural Computing (pp. 483-486). July 21-26, Salt Lake City, Utah, USA.
Najjar, M., Fournier-Viger, P., Lebeau, J. F., & Mayers, A. (2006). Recalling recollections according to temporal contexts: Applying a novel cognitive knowledge representation approach. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06), July 17-19, Beijing, China.
Nakamura, Y., & Hanafusa, H. (1987). Optimal redundancy control of robot manipulators. International Journal of Robotics Research, 6(1), 32-42.
Neches, R., Fikes, R., Finin, T., Gruber, T., Patil, R., Senator, T., & Swartout, W.R. (1991). Enabling technology for knowledge sharing. AI Magazine, 12(3), 37-56.
Neely, J. H. (1989). Experimental dissociations and the episodic/semantic memory distinction. Experimental Psychology: Human Learning and Memory, 6, 441-466.
Newell, A. (1982). The knowledge level. Artificial Intelligence, 18(1), 87-127.
Ngo-The, A., & Ruhe, G. (2006). A systematic approach for solving the wicked problem of software release planning. Submitted to Journal of Soft Computing.
Nguyen, S.H., Skowron, A., & Stepaniuk, J. (2001). Granular computing: A rough set approach. Computational Intelligence, 17, 514-544.
Nguyen, T. A., Perkins, W. A., Laffey, T. J., & Pecora, D. (1987). Knowledge base verification. AI Magazine, 8(2), 69-75.
Nicollin, X., & Sifakis, J. (1991). An overview and synthesis on timed process algebras. In Computer Aided Verification '91, LNCS 575 (pp. 376-398). Springer.
Nielsen, J. (1993). Usability engineering. Cambridge, MA: Academic Press/AP Professional.
Nielsen, M.A., & Chuang, I.L. (2000). Quantum computation and quantum information (p. 676). Cambridge, UK: Cambridge University Press.
Norman, D. A. (2004). Emotional design: Why we love (or hate) everyday things. New York: Basic Books.
Norman, D. A. (2005). Human-centered design considered harmful. Interactions. Retrieved August 6, 2006, from http://delivery.acm.org/10.1145/1080000/1070976/p14-norman.html
Norman, D. A., & Draper, S. W. (1986). User-centered system design: New perspectives on human-computer interaction. Mahwah, NJ: Lawrence Erlbaum.
Novak, J. D. (1998). Learning, creating, and using knowledge. Mahwah, NJ: Lawrence Erlbaum Associates.
Núñez, M. (2003). Algebraic theory of probabilistic processes. Journal of Logic and Algebraic Programming, 56(1-2), 117-177.
Núñez, M., & de Frutos, D. (1995). Testing semantics for probabilistic LOTOS. In Formal Description Techniques 8 (pp. 365-380). Chapman & Hall.
Núñez, M., de Frutos, D., & Llana, L. (1995). Acceptance trees for probabilistic processes. In CONCUR'95, LNCS 962 (pp. 249-263). Springer.
Núñez, M., Rodríguez, I., & Rubio, F. (2003). Towards the identification of living agents in complex computational environments. In 2nd IEEE Int. Conf. on Cognitive Informatics (pp. 151-160). IEEE Computer Society Press.
Núñez, M., Rodríguez, I., & Rubio, F. (2004). Applying Occam's razor to FSMs. In International Conference on Cognitive Informatics (pp. 138-147). IEEE Press.
O'Leary, D. E. (1998). Using AI in knowledge management: Knowledge bases and ontologies. IEEE Intelligent Systems, 13(3), 34-39.
Olshausen, B.A., & Field, D.J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607-609.
Oppenheim, A.V., & Schafer, R.W. (1975). Digital signal processing (p. 585). Englewood Cliffs, NJ: Prentice-Hall.
Oppenheim, A.V., & Schafer, R.W. (1989). Discrete-time signal processing (p. 879). Englewood Cliffs, NJ: Prentice-Hall.
Oppenheim, A.V., & Willsky, A.S. (1983). Signals and systems (p. 796). Englewood Cliffs, NJ: Prentice-Hall.
Oppenheim, A.V., Schafer, R.W., & Buck, J.R. (1999). Discrete-time signal processing (2nd ed.) (p. 870). Prentice Hall.
Osborne, M., & Rubinstein, A. (1994). A course in game theory. MIT Press.
Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1), 62-66.
Ott, E. (1993). Chaos in dynamical systems (p. 385). Cambridge, UK: Cambridge University Press.
Painter, T., & Spanias, A. (1998, April). Perceptual coding of digital audio. Proceedings of the IEEE, 88(4), 451-513.
Parasuraman, R. (2003). Neuroergonomics: Research and practice. Theoretical Issues in Ergonomics Science, 4(1-2), 5-20.
Parsell, M. (2005, March). Review of P.O. Haikonen, The cognitive approach to conscious machines. Psyche, 11(2), 1-6. Available as of May 2006 from http://psyche.cs.monash.edu.au/book_reviews/haikonen/haikone.pdf
Patel, D., Patel, S., & Wang, Y. (Eds.) (2003, August). Cognitive informatics: Proceedings of the 2nd IEEE International Conference on Cognitive Informatics (ICCI'03). IEEE Computer Society Press.
Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data (p. 252). New York, NY: Springer.
Payne, D.G., & Wenger, M.J. (1998). Cognitive psychology. New York: Houghton Mifflin Co.
Pedrycz, W. (Ed.) (2001). Granular computing: An emerging paradigm. Heidelberg: Physica-Verlag.
Pedrycz, W., & Gomide, F. (1998). An introduction to fuzzy sets: Analysis and design (p. 465). Cambridge, MA: MIT Press.
Peitgen, H.-O., Jürgens, H., & Saupe, D. (1992). Chaos and fractals: New frontiers of science (p. 984). New York, NY: Springer-Verlag.
Pelayo, F.L., Cuartero, F., Valero, V., & Cazorla, D. (2000). An example of performance evaluation by using the stochastic process algebra ROSA. In 7th Int. Conf. on Real-Time Systems and Applications (pp. 271-278). IEEE Computer Society Press.
Pennebaker, W.B., & Mitchell, J.L. (1993). JPEG still image data compression standard (p. 638). New York, NY: Van Nostrand Reinhold.
Penrose, R. (1989). The emperor's new mind (p. 480). Oxford, UK: Oxford University Press.
Penrose, R. (1994). The shadows of the mind: A search for the missing science of consciousness (p. 457). Oxford, UK: Oxford University Press.
Perdereau, V., Passi, C., & Drouin, M. (2002). Real-time control of redundant robotic manipulators for mobile obstacle avoidance. Robotics and Autonomous Systems, 41, 41-59.
Pescovitz, D. (2002). Autonomic computing: Helping computers help themselves. IEEE Spectrum, 39(9), 49-53.
Pesin, Y.B. (1977). Characteristic Lyapunov exponents and smooth ergodic theory. Russian Mathematical Surveys, 32, 55-114.
Pfeifer, R., & Scheier, C. (1999). Understanding intelligence (p. 720). Cambridge, MA: MIT Press.
Piaget, J. (1954). The construction of reality in the child. New York: Basic Books.
Piaget, J., & Inhelder, B. (1948). La représentation de l'espace chez l'enfant. Paris: PUF, Bibliothèque de Philosophie Contemporaine.
Pinel, J.P.J. (1997). Biopsychology (3rd ed.). Needham Heights, MA: Allyn and Bacon.
Pirolli, P., & Card, S. K. (1999). Information foraging. Psychological Review, 106(4), 643-675.
Plotkin, G.D. (1981). A structural approach to operational semantics. Technical Report DAIMI FN-19, Computer Science Department, Aarhus University.
Popper, K. (2003). The logic of scientific discovery. Taylor & Francis Books Ltd.
Posner, M. (Ed.) (1989). Foundations of cognitive science (p. 888). Cambridge, MA: MIT Press.
Prigogine, I. (1996). The end of certainty: Time, chaos, and the new laws of nature (p. 228). New York, NY: The Free Press.
Prigogine, I., & Stengers, I. (1984). Order out of chaos: Man's new dialogue with nature (p. 349). New York, NY: Bantam.
Principe, J.C., Euliano, N.R., & Lefebvre, W.C. (2000). Neural and adaptive systems: Foundations through simulations (p. 656). New York, NY: Wiley.
Proakis, J.G., & Manolakis, D.G. (1995). Digital signal processing: Principles, algorithms and applications (3rd ed.) (p. 1016). Upper Saddle River, NJ: Prentice-Hall.
Pylyshyn, Z. W. (1989). Computing in cognitive science. In M. I. Posner (Ed.), Foundations of cognitive science (pp. 49-92). Cambridge, MA: MIT Press.
Quillian, M.R. (1968). Semantic memory. In M. Minsky (Ed.), Semantic information processing. Cambridge, MA: MIT Press.
Quinlan, J.R. (1983). Learning efficient classification procedures and their application to chess end-games. In R.S. Michalski, J.G. Carbonell, & T.M. Mitchell (Eds.), Machine learning: An artificial intelligence approach (Vol. 1). Palo Alto, CA: Morgan Kaufmann.
Quinlan, J.R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann.
Rabin, M.O., & Scott, D. (1959). Finite automata and their decision problems. IBM Journal of Research and Development, 3, 114-125.
Rajlich, V. (2002). Program comprehension as a learning process. Paper presented at the First IEEE International Conference on Cognitive Informatics, Calgary, Alberta.
Rajlich, V., & Bennett, K. H. (2000). A staged model for the software lifecycle. Computer, 33(7), 66-71.
Rajlich, V., & Xu, S. (2003). Analogy of incremental program development and constructivist learning. Paper presented at the Second IEEE International Conference on Cognitive Informatics, London, UK.
Ralston, A., Reilly, E.D., & Hemmendinger, D. (Eds.) (2003). Encyclopedia of computer science (4th ed.) (p. 2064). New York, NY: Wiley.
Ramdane-Cherif, A., Perdereau, V., & Drouin, M. (1995, November-December). Optimization schemes for learning the forward and inverse kinematic equations with neural network. In Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia.
Ramdane-Cherif, A., Perdereau, V., & Drouin, M. (1996, April). Penalty approach for a constrained optimization to solve on-line the inverse kinematic problem of redundant manipulators. In Proceedings of the IEEE International Conference on Robotics and Automation (pp. 133-138). Minneapolis, USA.
Ran, A., & Kuusela, J. (1996). Design decision trees. Paper presented at the Eighth International Workshop on Software Specification and Design, Paderborn, Germany.
Randell, D., Cui, Z., & Cohn, A. (1992). A spatial logic based on regions and connection. In B. Nebel, W. Swartout, & C. Rich (Eds.), Proceedings of the 3rd International Conference on Knowledge Representation and Reasoning (pp. 165-176). San Mateo: Morgan Kaufmann.
Rao, R., Gordon, D., & Spears, W. (1995). For every generalization action, is there really an equal or opposite reaction? Analysis of conservation law. In Proceedings of the Twelfth International Conference on Machine Learning (pp. 471-479). Morgan Kaufmann.
Reed, G., & Roscoe, A. (1988). A timed model for communicating sequential processes. Theoretical Computer Science, 58, 249-261.
Richards, D. D., & Goldfarb, J. (1986). The episodic memory model of conceptual development: An integrative viewpoint. Cognitive Development, 1, 183-219.
Rissanen, J. (1978). Modelling by shortest data description. Automatica, 14, 465-471.
Rittel, H., & Webber, M. (1984). Planning problems are wicked problems. In N. Cross (Ed.), Developments in design methodology (pp. 135-144). Chichester, UK: Wiley.
Robillard, P. N. (1999). The role of knowledge in software development. Communications of the ACM, 42(1), 87-92.
Rosch, E., Mervis, C. B., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.
Ross, R. G. (2003). Principles of the business rules approach. Boston, MA: Addison-Wesley.
Rostkowycz, A. J., Rajlich, V., & Marcus, A. (2004). Case study on the long-term effects of software redocumentation. Paper presented at the 20th IEEE International Conference on Software Maintenance, Chicago, IL.
Roy, B. (1991). The outranking approach and the foundations of ELECTRE methods. Theory and Decision, 31, 49-73.
Roy, D.K. (2005, August). Grounding words in perception and action: Insight for computational models. Trends in Cognitive Sciences, 9(8), 389-396.
Roy, D.K., & Pentland, A.P. (2002). Learning words from sights and sounds: A computational model. Cognitive Science, 26, 113-146.
Ruaro, M.E., Bonifazi, P., & Torre, V. (2005, March). Toward the neurocomputer: Image processing and pattern recognition with neuronal cultures. IEEE Transactions on Biomedical Engineering, 52(3), 371-383.
Ruelle, D. (1978). Thermodynamic formalism (p. 183). Reading, MA: Addison-Wesley-Longman; Cambridge, UK: Cambridge University Press.
Rugaber, S., Ornburn, S. B., & LeBlanc, R. J. (1990). Recognizing design decisions in programs. IEEE Software, 7(1), 46-54.
Ruhe, G. (2003). Software engineering decision support: Methodologies and applications. In Tonfoni & Jain (Eds.), Innovations in Decision Support Systems, 3, 143-174.
Ruhe, G., & An, N.-T. (2004). Hybrid intelligence in software release planning. International Journal of Hybrid Intelligent Systems, 1(2), 99-110.
Rumelhart, D.E., & McClelland, J.L. (1986). Parallel distributed processing (Vols. 1-2). Cambridge, MA: MIT Press.
Rushby, J., & Whitehurst, R.A. (1989, February). Formal verification of AI software. NASA Contractor Report 181827.
Rushby, J. (1988, October). Quality measures and assurance for AI software. NASA Contractor Report 4187.
Rybak, G., & Golovan, P. (1998). A model of attention-guided visual perception and recognition. Vision Research, 38, 2387-2400.
Rzepa, H., & Tonge, A. (1998). VChemLab: A virtual chemistry laboratory. Journal of Chemical Information and Computer Sciences, 38(6), 1048-1053.
Sailor, D.J., & Munoz, J.R. (1997). Sensitivity of electricity and natural gas consumption to climate in the U.S.A.: Methodology and results for eight states. Energy, 22(10), 987-998.
Sandia National Laboratories (2006). Projects. Available as of May 2006 from http://www.sandia.gov/cog.systems/Projects.htm
Sanquist, T. F., Greitzer, F. L., Slavich, A., Littlefield, R., Littlefield, J., & Cowley, P. (2004). Cognitive tasks in information analysis: Use of event dwell time to characterize component activities. In Proceedings of the Human Factors and Ergonomics Society 48th Annual Meeting, New Orleans, Louisiana.
Sanz, R., Chrisley, R., & Sloman, A. (2003). Models of consciousness: Scientific report (p. 37). European Science Foundation. Available as of May 2006 from http://www.esf.org/generic/1650/EW0296Report.pdf
Sastry, P. S., Santharam, G., & Unnikrishnan, K. P. (1994, March). Memory neuron networks for identification and control of dynamical systems. IEEE Transactions on Neural Networks, 5(2), 306-319.
Sayood, K. (2000). Introduction to data compression (2nd ed.) (p. 636). San Francisco, CA: Morgan Kaufmann.
Schaffer, J. (1994). A conservation law for generalization performance. In Proceedings of the 11th International Conference on Machine Learning (pp. 259-265). Morgan Kaufmann.
Schank, R., & Abelson, R. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ: Erlbaum.
Schmorrow, D. D., & Kruse, A. A. (2004). Augmented cognition. In W. S. Bainbridge (Ed.), Berkshire encyclopedia of human computer interaction (pp. 54-59). Great Barrington, MA: Berkshire Publishing Group.
Schmorrow, D., & McBride, D. (2005). Introduction to special issue on augmented cognition. International Journal of Human-Computer Interaction, 17(2).
Scholtz, J., Morse, E., & Hewett, T. (2004, March). In depth observational studies of professional intelligence analysts. Paper presented at Human Performance, Situation Awareness, and Automation (HPSAA), Daytona Beach, FL. Retrieved August 6, 2006, from http://www.itl.nist.gov/iad/IADpapers/2004/scholtz-morse-hewett.pdf
Schöning, U. (1989). Logic for computer scientists. Boston: Birkhäuser.
Schreiber, G., Breuker, J., Bredeweg, B., & Wielinga, B. (1988, June 19-23). Modeling in knowledge based systems development. In J. Boose, B. Gaines, & M. Linster (Eds.), Proceedings of the European Knowledge Acquisition Workshop (EKAW '88) (pp. 7.1-7.15). Gesellschaft für Mathematik und Datenverarbeitung mbH.
Schrödinger, E. (1944). What is life? With mind and matter and autobiographical sketches (p. 184). Cambridge, UK: Cambridge University Press.
Schroeder, M.R. (1991). Fractals, chaos, power laws (p. 429). New York, NY: W.H. Freeman.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461-464.
Searle, J.R. (1980). Minds, brains and programs. Behavioral & Brain Sciences, 3, 417-424.
Searle, J.R. (1992). The rediscovery of the mind (p. 288). Cambridge, MA: MIT Press.
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423, 623-656.
Shannon, C.E. (Ed.) (1956). Automata studies. Princeton: Princeton University Press.
Shao, J., & Wang, Y. (2003). A new measure of software complexity based on cognitive weights. IEEE Canadian Journal of Electrical and Computer Engineering, 28(2), 69-74.
Shastri, L. (2002). Episodic memory and cortico-hippocampal interactions. Trends in Cognitive Sciences, 6, 162-168.
Sienko, T., Adamatzky, A., Rambidi, N.G., & Conrad, M. (2003). Molecular computing (p. 257). Cambridge, MA: MIT Press.
Simon, H. (1998). Neural networks: A comprehensive foundation. Upper Saddle River, NJ: Prentice Hall PTR.
Simoncelli, E. P. (2003). Vision and statistics of the visual environment. Current Opinion in Neurobiology, 13, 144-149.
Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193-1216.
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119, 3-22.
Sloman, S. A. (2002). Two systems of reasoning. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases (pp. 379-396). New York: Cambridge University Press.
Smith, K.J. (2001). The nature of mathematics (9th ed.). CA: Brooks/Cole, Thomson Learning Inc.
Smith, R.E. (1993). Psychology. St. Paul, MN: West Publishing Co.
Soloman, S. (1999). Sensor handbook (p. 1486). New York, NY: McGraw-Hill.
Solso, R. (Ed.) (1999). Mind and brain science in the 21st century. MIT Press.
Spelke, E. S. (1990). Principles of object perception. Cognitive Science, 14, 29-56.
Sperschneider, V., & Antoniou, G. (1991). Logic: A foundation for computer science. Reading, MA: Addison-Wesley.
Sprott, J.C. (2003). Chaos and time-series analysis (p. 507). Oxford, UK: Oxford University Press.
Squire, L., Knowlton, B., & Musen, G. (1993). The structure and organization of memory. Annual Review of Psychology, 44, 453-459.
Stacey, G. (1994, November). Stochastic fractal modelling of dielectric discharges (p. 308). Master's thesis, University of Manitoba, Winnipeg, MB.
Stanley, H.E., & Meakin, P. (1988, September 29). Multifractal phenomena in physics and chemistry. Nature, 335, 405-409.
Stanovich, K. E. (1999). Who is rational? Studies of individual differences in reasoning. Mahwah, NJ: Erlbaum.
Stanovich, K. E., & West, R. F. (2002). Individual differences in reasoning: Implications for the rationality debate. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and biases. New York: Cambridge University Press.
Steels, L. (1990). Components of expertise. AI Magazine, 11(2), 29-49.
Sternberg, R.J. (1998). In search of the human mind (2nd ed.). Orlando, FL: Harcourt Brace & Co.
Stevens, S. S. (1975). Psychophysics: Introduction to its perceptual, neural, and social prospects. New York: Wiley.
Stonier, T. (1990). Information and the internal structure of the universe: An exploration into information physics (p. 155). New York, NY: Springer-Verlag.
Subramanian, D., Pekny, J.F., & Reklaitis, G.V. (2000). A simulation-optimization framework for addressing combinatorial and stochastic aspects of a research & development pipeline management. Computers and Chemical Engineering, 24(7), 1005-1011.
Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12, 257-285.
Tamma, V., Phelps, S., Dickinson, I., & Wooldridge, M. (2005). Ontologies for supporting negotiation in e-commerce. Engineering Applications of Artificial Intelligence (Special Issue), 18(2), 223-236.
Tao, W.O., & Ti, H.C. (1998). Transients analysis of gas pipeline network. Chemical Engineering Journal, 69, 47-52.
Tarr, M. J. (1995). Rotating objects to recognize them: A case study of the role of mental transformations in the recognition of three-dimensional objects. Psychonomic Bulletin and Review, 2, 55-82.
Tarr, M. J., & Buelthoff, H. H. (1995). Is human object recognition better described by geon structural descriptions or by multiple views? Comment on Biederman and Gerhardstein (1993). Journal of Experimental Psychology: Human Perception and Performance, 21(6), 1494-1505.
Taylor, J.G. (2001). The race to consciousness (p. 392). Cambridge, MA: MIT Press.
Taylor, J.G. (2002). Paying attention to consciousness. Trends in Cognitive Sciences, 6, 206-210.
Taylor, J.G. (2003, June 20-24). The CODAM model of attention and consciousness. In Proceedings of the International Joint Conference on Neural Networks (IJCNN03), 1, 292-297. Portland, OR.
Tekalp, A.M. (Ed.) (1998, May). Multimedia signal processing [Special issue]. Proceedings of the IEEE, 86(5).
Thelen, E., & Smith, L.B. (2002). A dynamic systems approach to the development of cognition and action (p. 376). Cambridge, MA: MIT Press.
Tomassi, P. (1999). Logic. London and New York: Routledge.
Tornay, S. (1938). Ockham: Studies and selections. La Salle, IL: Open Court Publishers.
Treisman, A.M. (1964). Verbal cues, language, and meaning in selective attention. American Journal of Psychology, 77, 206-219.
Tricot, C. (1995). Curves and fractal dimension (p. 323). New York, NY: Springer-Verlag.
Tulving, E. (1983). Elements of episodic memory. New York: Oxford University Press.
Turcotte, D.L. (1997). Fractals and chaos in geology and geophysics (2nd ed.) (p. 398). Cambridge, UK: Cambridge University Press.
Turing, A.M. (1950). Computing machinery and intelligence. Mind, 59, 433-460.
Tversky, B. (2005). Functional significance of visuospatial representation. In P. Shah & A. Miyake (Eds.), Handbook of higher-level visuospatial thinking. Cambridge: Cambridge University Press.
Tversky, B., & Lee, P. (1999). How space structures language. In C. Freksa, C. Habel, & K. F. Wender (Eds.), Spatial cognition: An interdisciplinary approach to representation and processing of spatial knowledge (pp. 157-176). Springer-Verlag.
Tversky, B., Morrison, J. B., Franklin, N., & Bryant, D. (1999). Three spaces of spatial cognition. Professional Geographer, 51, 516-524.
UCLA Cognitive Systems Laboratory (2006). Available as of May 2006 from http://singapore.cs.ucla.edu/cogsys.html
Uraikul, V., Chan, C.W., & Tontiwachwuthikul, P. (2000). Development of an expert system for optimizing natural gas operations. Expert Systems with Applications, 18(4), 271-282.
Uschold, M. (2003). Where are the semantics in the Semantic Web? AI Magazine, 24(3), 25-36.
Uschold, M., King, M., Moralee, S., & Zorgios, Y. (1998). The enterprise ontology. The Knowledge Engineering Review, 13(1), 31-89.
Valente, A., & Breuker, J. (1996). Towards principled core ontologies. In B.R. Gaines & M. Musen (Eds.), Proceedings of KAW-96, Banff, Canada.
Van de Velde, W., & Schreiber, G. (1997). The future of knowledge acquisition: A European perspective. IEEE Expert, 1, 1-3.
van Emden, M. H., & Kowalski, R. (1976). The semantics of predicate logic as a programming language. Journal of the ACM, 23, 733-742.
van Heijenoort, J. (1997). From Frege to Gödel: A source book in mathematical logic, 1879-1931. Cambridge, MA: Harvard University Press.
Vapnik, V. (1995). The nature of statistical learning theory. Springer.
Velmans, M. (2000). Understanding consciousness (p. 296). New York, NY: Routledge.
Vicsek, T. (1992). Fractal growth phenomena (2nd ed.) (p. 488). Singapore: World Scientific.
Vinje, W.E., & Gallant, J.L. (2000). Sparse coding and decorrelation in primary visual cortex during natural vision. Science, 287, 1273-1276.
von Bertalanffy, L. (1952). Problems of life: An evaluation of modern biological and scientific thought. London: C.A. Watts.
von Glasersfeld, E. (1995). Radical constructivism. London: The Falmer Press.
von Neumann, J. (1946). The principles of large-scale computing machines. Reprinted in Annals of the History of Computing, 3(3), 263-273.
von Neumann, J. (1958). The computer and the brain. New Haven: Yale University Press.
von Neumann, J. (1963). The general and logical theory of automata. In A.H. Taub (Ed.), Collected works (Vol. 5, pp. 288-328). Pergamon.
von Neumann, J., & Morgenstern, O. (1980). Theory of games and economic behavior. Princeton University Press.
von Neumann, J., & Burks, A.W. (1966). Theory of self-reproducing automata. Urbana, IL: University of Illinois Press.
Vygotsky, L. S. (1978). Mind in society. Cambridge, MA: Harvard University Press.
Wald, A. (1950). Statistical decision functions. John Wiley & Sons.
Wang, Y. (2003). Cognitive informatics models of software agents and autonomic computing. Keynote speech at the First International Conference on Agent-Based Technologies and Systems (ATS'03) (pp. 25-26). Canada: University of Calgary Press.
Wang, Y. (2003). Cognitive informatics: A new transdisciplinary research field. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 115-127.
Wang, Y. (2003). Using process algebra to describe human and software system behaviors. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 199-213.
Wang, Y. (2004, August). Autonomic computing and cognitive processes. Keynote speech at the 3rd IEEE International Conference on Cognitive Informatics (ICCI'04) (pp. 3-4). Victoria, Canada: IEEE CS Press.
Wang, Y. (2005, August). On cognitive properties of human factors in engineering. Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 174-182). Irvine, CA: IEEE CS Press.
Wang, Y. (2005, August). The cognitive processes of abstraction and formal inferences. Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 18-26). Irvine, CA: IEEE CS Press.
Wang, Y. (2005, May). On the mathematical laws of software. Proceedings of the 18th Canadian Conference on Electrical and Computer Engineering (CCECE'05) (pp. 1086-1089). Saskatoon, SK, Canada.
Wang, Y. (2005). Mathematical models and properties of games. Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 294-300). Irvine, CA: IEEE CS Press.
Wang, Y. (2005, August). A novel decision grid theory for dynamic decision making. Proceedings of the 4th IEEE International Conference on Cognitive Informatics (ICCI'05) (pp. 308-314). Irvine, CA: IEEE CS Press.
Wang, Y. (2006, March). On the informatics laws and deductive semantics of software. IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), 161-171.
Wang, Y. (2006, July). Cognitive complexity of software and its measurement. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 226-235). Beijing, China: IEEE CS Press.
Wang, Y. (2006, July). Cognitive informatics: Towards the future generation computers that think and feel. Keynote speech at the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 3-7). Beijing, China: IEEE CS Press.
Wang, Y. (2006, July). Cognitive informatics and contemporary mathematics for knowledge representation and manipulation. Invited plenary talk at the 1st International Conference on Rough Set and Knowledge Technology (RSKT'06) (pp. 69-78). Lecture Notes in Artificial Intelligence, LNAI 4062. Chongqing, China: Springer.
Wang, Y. (2006, July). On abstract systems and system algebra. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 332-343). Beijing, China: IEEE CS Press.
Wang, Y. (2006, July). On concept algebra and knowledge representation. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 320-331). Beijing, China: IEEE CS Press.
Wang, Y. (2006, July). On the Big-R notation for describing iterative and recursive behaviors. Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (pp. 132-140). Beijing, China: IEEE CS Press.
Wang, Y. (2006, May). The OAR model for knowledge representation. Proceedings of the 19th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE'06) (pp. 1696-1699). Ottawa, Canada.
Wang, Y. (2006, May). A unified mathematical model of programs. Proceedings of the 19th Canadian Conference on Electrical and Computer Engineering (CCECE'06) (pp. 2346-2349). Ottawa, ON, Canada.
Wang, Y. (2007). Exploring machine cognition mechanisms for autonomic computing. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(2), i-v.
Wang, Y. (2007). Toward theoretical foundations of autonomic computing. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(3), 1-16. Hershey, PA: IGI Publishing.
Wang, Y. (2007). Software engineering foundations: A software science perspective. CRC Book Series in Software Engineering, Vol. II. USA: Auerbach Publications.
Wang, Y. (2007, January). The theoretical framework of cognitive informatics. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(1), 1-27. Hershey, PA: IGI Publishing.
Wang, Y. (2007, July). The OAR model of neural informatics for internal knowledge representation in the brain. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(3), 64-75. Hershey, PA: IGI Publishing.
Wang, Y. (2007, August). Formal description of the cognitive process of memorization. Proceedings of the 6th IEEE International Conference on Cognitive Informatics (ICCI'07), Lake Tahoe, CA. Los Alamitos, CA: IEEE Computer Society Press.
Wang, Y. (Ed.) (2007). Special issue on autonomic computing. The International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), 1(3).
Wang, Y., & Liu, D. (2003, August). On information and knowledge representation in the brain. Proceedings of the 2nd IEEE International Conference on Cognitive Informatics (ICCI'03) (pp. 26-31). London, UK: IEEE CS Press.
Wang, Y., & Gafurov, D. (2003, August). The cognitive process of comprehension. Proceedings of the 2nd IEEE International Conference on Cognitive Informatics (ICCI'03) (pp. 93-97). London, UK: IEEE CS Press.
Wang, Y., & Kinsner, W. (2006, March). Recent advances in cognitive informatics. IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), 121-123.
Wang, Y., & Wang, Y. (2002, August). Cognitive models of the brain. Proceedings of the First IEEE International Conference on Cognitive Informatics (ICCI'02) (pp. 259-269). Calgary, AB, Canada: IEEE CS Press.
Wang, Y., & Wang, Y. (2006, March). Cognitive informatics models of the brain. IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), 203-207.
Wang, Y., & Wang, Y. (2006, March). On cognitive informatics models of the brain. IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), 16-20.
Wang, Y., Liu, D., & Wang, Y. (2003). Discovering the capacity of human memory. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy, 4(2), 189-198.
Warwick, K., & Gasson, M. (2005). Human-machine symbiosis overview. In Proceedings of the HCI International 2005/Augmented Cognition Conference, Las Vegas, NV.
Wang, Y., Dong, L., & Ruhe, G. (2004, July). Formal description of the cognitive process of decision making. Proceedings of the 3rd IEEE International Conference on Cognitive Informatics (ICCI’04) (pp. 124-130). Victoria, Canada: IEEE CS Press.
Wason, P. (1966). Reasoning. In B. M. Foss (Ed.), New horizons in psychology (pp. 135-151). London: Penguin.
Wang, Y., Wang, Y., Patel, S., & Patel, D. (2006, March). A layered reference model of the brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (Part C), 36(2), 124-133.
Wang, Y., Johnston, R., & Smith, M. (Eds.) (2002, August). Cognitive informatics: Proceedings of the 1st IEEE International Conference on Cognitive Informatics (ICCI'02). Calgary, AB, Canada: IEEE CS Press.
Wang, Y., Zhang, D., Kinsner, W., & Latombe, J-C. (Eds.) (2008, August). Proceedings of the 7th IEEE International Conference on Cognitive Informatics (ICCI'08), Stanford University. Los Alamitos, CA: IEEE Computer Society Press.
Weber, G., & Brusilovsky, P. (2001). ELM-ART: An adaptive versatile system for Web-based instruction. International Journal of AI in Education, 12(4), 351-384.
Wells, L.K., & Travis, J. (1996). LabVIEW for everyone: Graphical programming made even easier. NJ: Prentice Hall.
Wertheimer, M. (1958). Principles of perceptual organization. In Readings in perception. New York: Van Nostrand.
Westen, D. (1999). Psychology: Mind, brain, and culture (2nd ed.). New York: John Wiley & Sons, Inc.
Widrow, B., & Lehr, M.A. (1990, September). 30 years of adaptive neural networks: Perceptron, madaline, and backpropagation. Proceedings of the IEEE, 78(9), 1415-1442.
Wiener, N. (1948). Cybernetics: Or control and communication in the animal and the machine. Cambridge, MA: MIT Press.
Wiggins, J.A., Wiggins, B.B., & Vander Zanden, J. (1994). Social psychology (5th ed.). New York: McGraw-Hill, Inc.
Wilkins, D. J. (2002, November). The bathtub curve and product failure behavior. Reliability HotWire, 21. Retrieved August 6, 2006, from http://www.weibull.com/hotwire/issue21/hottopics21.htm
Williams, G.P. (1997). Chaos theory tamed (p. 499). Washington, DC: Joseph Henry Press.
Williams, L., Kessler, R., Cunningham, W., & Jeffries, R. (2000). Strengthening the case for pair programming. IEEE Software, 17(4), 19-25.
Willmore, B., & Tolhurst, D.J. (2001). Characterizing the sparseness of neural codes. Network, 12(3), 255-270.
Wilson, R.A., & Keil, F.C. (2001). The MIT encyclopedia of the cognitive sciences. MIT Press.
Wittig, A.F. (2001). Schaum's outlines of theory and problems of introduction to psychology (2nd ed.). New York: McGraw-Hill.
Woolf, H. B. (1980). Webster's new collegiate dictionary. Springfield, MA: G. & C. Merriam Company.
Wornell, G.W. (1996). Signal processing with fractals: A wavelet-based approach (p. 177). Upper Saddle River, NJ: Prentice-Hall.
Xu, S., & Rajlich, V. (2004). Cognitive process during program debugging. Paper presented at the Third IEEE International Conference on Cognitive Informatics, Victoria, BC.
Xu, S., & Rajlich, V. (2005). Dialog-based protocol: An empirical research method for cognitive activity in software engineering. Paper presented at the Fourth ACM/IEEE International Symposium on Empirical Software Engineering, Noosa Heads, Queensland.
Yao, Y.Y., Shi, Z., Wang, Y., & Kinsner, W. (Eds.) (2006, July). Cognitive informatics: Proceedings of the 5th IEEE International Conference on Cognitive Informatics (ICCI'06) (1,018 pp.), Beijing, China. Los Alamitos, CA: IEEE Computer Society Press.
Yao, Y.Y. (2006). Granular computing for data mining. Proceedings of the SPIE Conference on Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, paper 624105.
Yao, Y.Y., & Yao, J.T. (2002). Induction of classification rules by granular computing. Proceedings of the 3rd International Conference on Rough Sets and Current Trends in Computing (pp. 331-338).
Yao, Y.Y., & Zhong, N. (1999). Potential applications of granular computing in knowledge discovery and data mining. Proceedings of the World Multiconference on Systemics, Cybernetics, and Informatics, 5, Computer Science and Engineering (pp. 573-580).
Yao, Y.Y., Zhao, Y., & Maguire, R.B. (2003). Explanation-oriented association mining using rough set theory. Proceedings of Rough Sets, Fuzzy Sets and Granular Computing (pp. 165-172).
Yao, Y.Y., Zhao, Y., & Yao, J.T. (2004). Level construction of decision trees in a partition-based framework for classification. Proceedings of SEKE'04 (pp. 199-205).
Yao, Y.Y., Zhong, N., & Zhao, Y. (2004). A three-layered conceptual framework of data mining. Proceedings of the ICDM Workshop on Foundations of Data Mining (pp. 215-221).
Ye, Y. (2006). Supporting software development as knowledge-intensive and collaborative activity. Paper presented at the 2006 International Workshop on Interdisciplinary Software Engineering Research, Shanghai, China.
Yi, W. (1991). CCS + time = an interleaving model for real time systems. In 18th ICALP, LNCS 510 (pp. 217-228). Springer.
Zachary, W., Wherry, R., Glenn, F., & Hopson, J. (1982). Decision situations, decision processes, and decision functions: Towards a theory-based framework for decision-aid design. Proceedings of the 1982 Conference on Human Factors in Computing Systems.
Zadeh, L.A. (1997). Towards a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems, 90, 111-127.
Zhang, D., & Luqi (1999). Approximate declarative semantics for rule base anomalies. Knowledge-Based Systems, 12(7), 341-353.
Zhang, D., & Nguyen, D. (1994). PREPARE: A tool for knowledge base verification. IEEE Transactions on Knowledge and Data Engineering, 6(6), 983-989.
Zhang, D. (2005). Fixpoint semantics for rule base anomalies. In Proceedings of the Fourth IEEE International Conference on Cognitive Informatics (pp. 10-17). Irvine, CA.
Zhang, J., & Norman, D. (1994). Representations in distributed cognitive tasks. Cognitive Science, 18(1), 87-122.
Zhao, Y., & Yao, Y.Y. (2005). Interactive user-driven classification using a granule network. Proceedings of ICCI'05 (pp. 250-259).
Zhou, J., & Adewumi, M. A. (1998). Transients in gas-condensate natural gas pipelines. Journal of Energy Resources Technology, 120, 32-40.
Zhu, G., Henson, M.A., & Megan, L. (2001). Dynamic modeling and linear model predictive control of gas pipeline networks. Journal of Process Control, 11, 129-148.
Zsambok, C. E. (1997). Naturalistic decision making: Where are we now? In C. Zsambok & G. Klein (Eds.), Naturalistic decision making. Mahwah, NJ: Erlbaum.
About the Contributors
Yingxu Wang is professor of cognitive informatics and software engineering, director of the International Center for Cognitive Informatics (ICfCI), and director of the Theoretical and Empirical Software Engineering Research Center (TESERC) at the University of Calgary. He received a PhD in software engineering from the Nottingham Trent University, UK, in 1997, and a BSc in electrical engineering from Shanghai Tiedao University in 1983. He was a visiting professor in the Computing Laboratory at Oxford University in 1995 and in the Dept. of Computer Science at Stanford University in 2008, and has been a full professor since 1994. Wang is a Fellow of WIF, a P.Eng of Canada, a Senior Member of IEEE, and a member of ACM, ISO/IEC JTC1, the Canadian advisory committee (CAC) for ISO, the advisory committee of the IEEE Canadian Conferences on Electrical and Computer Engineering (CCECE), and the National Committee of Canadian Conferences on Computer and Software Engineering Education (C3SEE). He is the founder and steering committee chair of the annual IEEE International Conference on Cognitive Informatics (ICCI). He is the founding editor-in-chief of the International Journal of Cognitive Informatics and Natural Intelligence (IJCINI), founding editor-in-chief of the International Journal of Software Science and Computational Intelligence (IJSSCI), associate editor of IEEE TSMC-A, and editor-in-chief of the CRC book series in Software Engineering. He has accomplished a number of European Union, Canadian, and industry-funded research projects as principal investigator and/or coordinator, and has published over 300 journal and conference papers and 12 books in software engineering and cognitive informatics. He has served on numerous editorial boards and program committees, and as guest editor for a number of academic journals. He has won dozens of research achievement, best paper, and teaching awards in the last 36 years, notably the 1994 National Zhan Tianyou Young Scientist Prize, China, and is the author of the groundbreaking book Software Engineering Foundations: A Software Science Perspective.

***

Patricia Boechler received her PhD in psychology from the University of Alberta in 2002, and is currently a member of that university's educational psychology department. Her general research interests include cognition, memory, learning and developmental psychology. For the last few years her research has centered on the study of cognition and learning in educational hypermedia. She has been investigating the effects of different types of interfaces on learning, taking into account individual differences in spatial and literacy skills. Boechler is also interested in developing novel research and statistical methods for uncovering regularities in user navigation behaviours. She is applying her background in the study of spatial cognition to the use of neural networks for understanding students' path patterns in educational hypermedia.

Christine W. Chan is professor of engineering at the University of Regina, Regina, Saskatchewan, Canada. She is an adjunct scientist of the Telecommunications Research Laboratory, adjunct professor of the electrical and computer engineering department of the University of Calgary, and associate member of the Laboratory for Logic and Experimental Philosophy at Simon Fraser University. She obtained her PhD degree in applied sciences from Simon Fraser University in 1992.
She was assistant professor of computer science at the University of Regina in 1993, and professor of computer science from 2000 to 2003. Chan founded the Energy Informatics Laboratory at the University of Regina in 1995 and has served as principal investigator of the laboratory. Chan has been involved in research on applications of artificial intelligence and knowledge-based technologies to energy and the environment, industrial applications of artificial intelligence, ontological engineering, knowledge and software engineering, intelligent data analysis using artificial intelligence techniques, object-oriented analysis and design, socio-economic impacts of information technology, and the development and impacts of educational instructional software. She has published or presented over 190 technical papers, of which over 50 are international journal articles and over 90 are refereed conference papers. She presently serves as editor of Engineering Applications of Artificial Intelligence and associate editor of the Journal of Environmental Informatics. In 2003, she was co-guest editor of a special issue of Engineering Applications of Artificial Intelligence. In 2004-2006, she was awarded the President's Scholar award at the University of Regina. Dr. Chan is a member of the Institute of Electrical and Electronics Engineers (IEEE) Computer Society and the American Association of Artificial Intelligence.

Tiansi Dong is the key software developer at Cognitive Ergonomic Systems, Germany, and a professional member of ACM and IEEE. He received a BS from the Department of Computer Science and Technology, Nanjing University, in 1997; an ME from the Department of Computer Science and Engineering, Shanghai Jiaotong University, China, in 2000; and a Dr. rer. nat. from the Department of Mathematics and Informatics, University of Bremen, Germany, in 2005 for the grounding of the theory of the cognitive prism.

Lee Flax is a senior lecturer in the computing department at Macquarie University, Sydney, Australia. He started work in 1970 for the computer company ICL. Over the years he has been a programmer, systems analyst, project leader and office manager, all in the information systems area. In the early 1980s he became an academic. His current research interests lie in logic and artificial intelligence. Some specific areas are: cognitive modelling using symbolic methods, computable agent reasoning, algebraic belief revision and non-monotonic entailment.

Frank L. Greitzer, PhD, is a chief scientist at the Pacific Northwest National Laboratory (PNNL), where he conducts R&D in human-information interaction for diverse problem domains. He holds a PhD degree in mathematical psychology with specialization in memory and cognition and a BS degree in mathematics. Dr. Greitzer leads an R&D focus area of cognitive informatics that addresses human factors and social/behavioral science challenges through modeling and advanced engineering/computing approaches. His research interests include human-information interaction, human behavior modeling to support intelligence analysis, and evaluation methods and metrics for assessing the effectiveness of decision and information analysis tools. In the area of cyber security, Greitzer serves as predictive defense focus area lead for the PNNL Information and Infrastructure Integrity Initiative. Greitzer also conducts research to improve training effectiveness by applying cognitive principles in innovative, interactive, scenario-based training and serious gaming approaches. Representative project descriptions and publications may be found at the cognitive informatics Web site, http://www.pnl.gov/cogInformatics.
In addition to his work at PNNL, Greitzer serves as an adjunct faculty member at Washington State University, Tri-Cities campus, where he teaches courses for the computer science department (interaction design) and for the psychology department (cognition, human factors). Greitzer also serves on the editorial board of the Journal of Cognitive Informatics & Natural Intelligence.

Douglas Griffith is an applied cognitive psychologist in the Cognitive Solutions Laboratory of General Dynamics Advanced Information Systems. He holds a PhD from the University of Utah and has 32 years of applied experience in government and industry. A former president of Division 21 (applied experimental and engineering psychology) of the American Psychological Association, he is particularly interested in systems that produce a synergism between the human and the machine. One project was the Computer Aids for Vision and Employment (CAVE) Program, whose goal was to design better computer systems and training packages for the visually impaired. He managed a subcontract on a project to study cognitive aids for intelligence analysts to counter denial and deception. The work consisted of a review of human information processing shortcomings, with an emphasis on those shortcomings that make analysts vulnerable to denial and deception techniques. Remedies, the cognitive aids, were then identified to compensate for these shortcomings and increase the analysts' awareness of the likelihood of denial and deception activities. In addition to neo-symbiotic systems, he is currently working on collaborative technologies, metrics for collaboration, and the analysis of nonconventional imagery.
Zeng-Guang Hou received the BE and ME degrees in electrical engineering from Yanshan University, Qinhuangdao, China, in 1991 and 1993, respectively, and the PhD degree in electrical engineering from Beijing Institute of Technology, Beijing, China, in 1997. From May 1997 to June 1999, he was a postdoctoral research fellow at the Laboratory of Systems and Control, Institute of Systems Science, Chinese Academy of Sciences, Beijing. He was a research assistant at the Hong Kong Polytechnic University, Hong Kong, China, from May 2000 to January 2001. From July 1999 to May 2004, he was an associate professor at the Institute of Automation, Chinese Academy of Sciences, and has been a full professor since June 2004. From September 2003 to October 2004, he was a visiting professor at the Intelligent Systems Research Laboratory, College of Engineering, University of Saskatchewan, Saskatoon, SK, Canada. His current research interests include neural networks, optimization algorithms, robotics, and intelligent control systems.

Ray Jennings is professor of philosophy and director of the Laboratory for Logic and Experimental Philosophy at Simon Fraser University, where he supervises research in logic and the biology of language. He was co-founder (with P.K. Schotch) of the preservationist approach to paraconsistency. He has published in, among others, the Journal of Philosophical Logic, Notre Dame Journal of Formal Logic, Logique et Analyse, Journal of the IGPL, Studia Logica, Zeitschrift für Mathematische Logik und Grundlagen der Mathematik, Analysis, Fundamenta Informaticae, and Synthese. His papers on language are in numerous journals and collections including, most recently, Mistakes of Reason (UTP) and A Semantics Reader (OUP). He is the author of The Genealogy of Disjunction (OUP) and of the Stanford Encyclopedia of Philosophy article Disjunction. He is co-author (with N. A. Friedrich) of Proof and Consequence (Broadview). He gave a set of lectures on the biology of language (Logicalization) at NASSLLI'02, Stanford University.

Witold Kinsner is professor and associate head at the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, Canada. He is also affiliate professor at the Institute of Industrial Mathematical Sciences, and adjunct scientist at the Telecommunications Research Laboratories, Winnipeg. He obtained a PhD in electrical engineering from McMaster University in 1974. He has authored and co-authored over 500 publications in his research areas. Dr. Kinsner is a senior member of the Institute of Electrical & Electronics Engineers (IEEE), a member of the Association for Computing Machinery (ACM), a member of the Mathematical and Computer Modelling Society, a member of Sigma Xi, and a life member of the Radio Amateurs of Canada.

Qingyong Li is a lecturer in the School of Computer and Information Technology at Beijing Jiaotong University. He holds a PhD from the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include cognitive informatics, machine learning and image processing.

Natalia López was born in Alcalá de Henares, Spain. She obtained her MS in mathematics in 1997 and her PhD in computer science in 2003 from the Universidad Complutense de Madrid. Since 1998, she has been in the Computer Systems and Computation Department, Universidad Complutense de Madrid (Spain), where she is an associate professor. Her topics of interest include process algebra, stochastic temporal systems, and formal testing methodologies.
André Mayers is a professor of computer science at the University of Sherbrooke and a founder of a research group on intelligent tutoring systems, mainly focused on knowledge representation structures that simultaneously ease students' acquisition of knowledge, the identification of their plans during problem-solving activities, and the diagnosis of knowledge acquisition.

Mehdi Najjar is currently a postdoctoral researcher in cognitive and computational modelling. He received his PhD in artificial intelligence from the University of Sherbrooke (Canada). He is also interested in knowledge representation, management and engineering within virtual learning environments, and he collaborates with other researchers on the refinement of knowledge representation structures within intelligent systems.
Manuel Núñez is an associate professor in the computer systems and computation department, Universidad Complutense de Madrid (Spain). He obtained his MS degree in mathematics in 1992 and his PhD in computer science in 1996. Afterwards, he also studied economics, obtaining his MS in economics in 2002. Dr. Núñez has published more than 70 papers in international refereed conferences and journals. In the last years, he has been co-chair of the Forte 2004 conference and of FATES/RV 2006. His research interests cover both theoretical and applied issues, including testing techniques, formal methods, e-learning environments, and e-commerce.

Fernando Lopez Pelayo holds an MSc in mathematics from the Complutense University of Madrid (UCM) and a European PhD in computer science from the University of Castilla–La Mancha (UCLM), Spain. He currently teaches at UCLM and at the Spanish distance-learning university, UNED. His main research interests are focused on formal aspects of concurrency and performance, cognitive informatics, grid computing and symbolic computation. He has published about fifty scientific papers, a third of them in international journals and the rest in refereed international workshops/conferences. He is a member of the scientific committees of a couple of journals and five workshops/conferences.

Vaclav Rajlich received the PhD degree in mathematics from Case Western Reserve University. He is a professor and former chair of the computer science department of Wayne State University. His research interests are software evolution and program comprehension. He has published approximately 70 peer-reviewed articles and one book. He is a member of ACM and the IEEE Computer Society.

Amar Ramdane-Cherif received his PhD from Pierre and Marie Curie University, Paris, in 1998, in neural networks and AI optimization for robotic applications. Since 2000, he has been associate professor in the laboratory PRISM, University of Versailles, Saint-Quentin en Yvelines, France. His main current research interests include software architecture and formal specification, dynamic architecture, architectural quality attributes, architectural style, neural networks, and agent paradigms.

Ismael Rodríguez is an associate professor in the computer systems and computation department, Universidad Complutense de Madrid (Spain). He obtained his MS degree in computer science in 2001 and his PhD in the same subject in 2004. Dr. Rodríguez received the Best Thesis Award of his faculty in 2004. He also received the Best Paper Award at the IFIP WG 6.1 FORTE 2001 conference. Rodríguez has published more than 40 papers in international refereed conferences and journals. His research interests cover formal methods, testing techniques, e-learning environments, and e-commerce.

Fernando Rubio is an associate professor in the computer systems and computation department, Universidad Complutense de Madrid (Spain). He obtained his MS degree in computer science in 1997 and his PhD in the same subject in 2001. Dr. Rubio received the National Degree Award on the subject of computer science from the Spanish Ministry of Education in 1997, as well as the Best Thesis Award of his faculty in 2001. Dr. Rubio has published more than 40 papers in international refereed conferences and journals. His research interests cover functional programming, testing techniques, e-learning environments, and e-commerce.
Guenther Ruhe received a doctorate in mathematics with emphasis on operations research from Freiberg University, Germany, and a doctorate degree from both the Technical University of Leipzig and the University of Kaiserslautern, Germany. From 1996 until 2001, he was deputy director of the Fraunhofer Institute for Experimental Software Engineering (Fh IESE). Ruhe holds an Industrial Research Chair in Software Engineering at the University of Calgary, a joint position between the Department of Computer Science and the Department of Electrical and Computer Engineering. His Laboratory for Software Engineering Decision Support (see www.seng-decisionsupport.ucalgary.ca) focuses on research in the area of intelligent support for the early phases of software system development, analysis of software requirements, empirical evaluation of software technologies, and selection of components-off-the-shelf (COTS) software products. He is the main inventor of a new generation of intelligent decision support tools for software release planning and prioritization, ReleasePlanner® (www.releaseplanner.com). Ruhe has published more than 155 reviewed research papers in journals, workshops, and conferences. Ruhe is a member of the ACM, the IEEE Computer Society, and the German Computer Society GI.
Phillip C-Y. Sheu is currently a professor of computer engineering, information and computer science, and biomedical engineering at the University of California, Irvine. He received his PhD and MS from the University of California at Berkeley in electrical engineering and computer science in 1986 and 1982, respectively, and his BS from National Taiwan University in electrical engineering in 1978. He has published two books, Intelligent Robotic Planning Systems and Software Engineering and Environment: An Object-Oriented Perspective, and more than 100 papers on object-relational data and knowledge engineering and their applications, and biomedical computations. He is currently active in research related to complex biological systems, knowledge-based medicine, semantic software engineering, proactive web technologies, and large real-time knowledge systems for defense and homeland security. Dr. Sheu is a Fellow of IEEE.

Zhiping Shi is an assistant professor at the Key Lab of Intelligent Information Processing of the Institute of Computing Technology, Chinese Academy of Sciences. He received his PhD in computer software and theory from the Institute of Computing Technology, Chinese Academy of Sciences, in 2005. His research interests include content-based visual information retrieval, image understanding, machine learning and cognitive informatics.

Zhongzhi Shi is a professor at the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. His research interests include intelligence science, multi-agent systems, and the semantic web. He has published 10 books, edited 11 books, and has more than 300 technical papers. His most recent books are Intelligent Agent and Applications and Knowledge Discovery (in Chinese). Shi is a member of the AAAI. He is the chair of WG 12.3 of IFIP. He also serves as vice president of the Chinese Association for Artificial Intelligence. He received the 2nd Grade National Award of Science and Technology Progress in 2002. In 1998 and 2001 he received the 2nd Grade Award of Science and Technology Progress from the Chinese Academy of Sciences.

Jeffrey J.P. Tsai received his PhD in computer science from Northwestern University, Evanston, Illinois. He is a professor in the Department of Computer Science at the University of Illinois at Chicago, where he is also the director of the Distributed Real-Time Intelligent Systems Laboratory. He co-authored Knowledge-Based Software Development for Real-Time Distributed Systems (World Scientific, 1993), Distributed Real-Time Systems: Monitoring, Visualization, Debugging, and Analysis (John Wiley and Sons, Inc., 1996), Compositional Verification of Concurrent and Real-Time Systems (Kluwer, 2002), and Security Modeling and Analysis of Mobile Agent Systems (Imperial College Press, 2006), and co-edited Monitoring and Debugging Distributed Real-Time Systems (IEEE/CS Press, 1995) and Machine Learning Applications in Software Engineering (World Scientific, 2005).

Taehyung Wang is an assistant professor in the Department of Computer Science at California State University, Northridge (CSUN). His research interests include cognitive informatics, biomedical information systems, software engineering, data mining, data warehousing, object-oriented design and analysis methodology, location-based service, data visualization, and Web technologies.
Before joining CSUN, he worked as a researcher for the Visual Interactive Data Engineering Lab and the Center of Bioengineering at the University of California, Irvine. Dr. Wang received a PhD from the University of California at Irvine in 1998, an MS in computer science from Western Illinois University, and a BS in control and instrumentation from Seoul National University in 1985.

Shaochun Xu received the PhD degree in computer science from Wayne State University, Detroit, USA, the PhD in geology from the University of Liege, Liege, Belgium, and the MSc degree in computer science from the University of Windsor, Windsor, Canada. From 1997 to 1999, he was a post-doctoral fellow in the Department of Geological Sciences at the University of Manitoba, Winnipeg, Canada. He is currently an assistant professor in the computer science department at Algoma University College, Laurentian University, Sault Ste. Marie, Canada. His research focuses on cognitive aspects of software engineering.

Yiyu Yao received his BEng from Xi'an Jiaotong University, People's Republic of China, in 1983, and his MSc and PhD from the University of Regina, Canada (1988 and 1991, respectively). He was an assistant professor and an associate professor at Lakehead University, Canada (1992-1998). He joined the Department of Computer Science at the University of Regina in 1998, where he is currently a professor of computer science. His research interests include data mining, rough sets, Web intelligence, granular computing, machine learning and information retrieval.

Du Zhang received a PhD in computer science from the University of Illinois. He is a professor and chair of the Department of Computer Science, California State University, Sacramento. He has authored or co-authored over 100 publications in journals, conference proceedings, and book chapters, and has edited or co-edited two books, five special issues for five journals, and five IEEE conference proceedings. Du Zhang is a senior member of IEEE and a member of ACM. He is an associate editor for the International Journal on Artificial Intelligence Tools and a member of the editorial board of the International Journal of Cognitive Informatics and Natural Intelligence.

Yan Zhao received her BEng from the Shanghai University of Engineering and Science, People's Republic of China, in 1994, and her MSc and PhD from the University of Regina, Canada, in 2003 and 2007, respectively. Her research interests include data mining and machine learning.
Index

A
abstraction 92, 93
activity-centered design 108
algebra, concept (CA) 1, 2, 10, 11, 12, 16, 18, 20, 24, 50, 51, 104, 115, 155, 185, 195, 197, 218, 219, 244, 274, 275, 276, 289, 325, 329, 333, 336, 337, 339, 340, 342, 343, 346, 347, 348, 350, 355, 356, 358, 362
algebra, real-time process (RTPA) 1, 2, 10, 12, 13, 14, 16, 18, 20, 21, 22, 23, 24, 25, 65, 66, 71, 76, 77, 92, 94, 97, 98, 99, 101, 102, 103, 104, 130, 131, 137, 139, 140, 157, 158, 159, 162, 170, 175, 182, 187, 332, 333, 358
algebra, system (SA) 1, 2, 10, 14, 15, 16, 18, 25, 26, 358
algebra PSEN 228
attention-guided sparse coding (AGSC) model 81, 82, 85, 86, 87, 88, 89
attitude 8, 21, 65, 66, 69, 70, 71, 73, 74, 75, 76, 77, 341
AURELLIO 248, 253, 257, 260, 261
autonomic computing (AC) 1, 2, 24, 32, 172, 173, 174, 175, 177, 178, 179, 180, 181, 182, 183, 184, 185, 192, 193, 194, 200
axiom of choice 131
B
Bayesian method 130
behaviors, cognitive 175
behaviors, instructive 175
behaviors, perceptive 175
behaviors, reflective 175
Boole, George 118
C
Carnot, Sadi 45, 46
cognition 7, 24, 28, 33, 38, 47, 51, 108, 109, 112, 113, 115, 116, 156, 182, 186, 188, 189, 191, 193, 194, 196, 198, 244, 248, 259, 260, 265, 278, 293, 299, 304, 305, 306, 318, 328, 332, 343, 350, 355, 357, 359
cognitive informatics (CI) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 16, 17, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 33, 34, 35, 45, 46, 48, 49, 50, 51, 53, 54, 55, 63, 64, 77, 78, 90, 104, 105, 114, 115, 117, 129, 141, 143, 155, 170, 173, 179, 185, 186, 187, 193, 194, 195, 197, 198, 199, 200, 218, 219, 234, 245, 261, 263, 264, 275, 276, 289, 290, 293, 297, 302, 303, 305, 318, 324, 325, 327, 328, 329, 330, 331, 332, 333, 334, 335, 338, 340, 342, 347, 348, 349, 350, 351, 353, 358, 359, 360, 361, 362
cognitive machine model, Haikonen 190
cognitive machines 185, 188, 189, 190, 192, 193, 194, 197, 332, 333, 347, 348
cognitive modelling 220–234
cognitive scientist 118
constructivist learning, during software development 292–303
control theory 299
D
deduction 95, 97
digital sentience 189
discrepancy distance 81, 83, 86
E
emotion 8, 20, 66, 67, 68, 69, 70, 71, 76, 108, 174, 179
emotion, strength of 67, 68, 71
emotional system, human 67, 76
entropy, Boltzmann 46
entropy, Boltzmann-Gibbs 46
entropy, Clausius 45
entropy, conditional 40, 41
entropy, higher-order message 40
entropy, joint 40, 41
entropy, Kolmogorov 44
entropy, Kolmogorov-Sinai (KS) 44, 45, 305
entropy, mutual 40, 42
entropy, negative (negentropy) 45, 46
entropy, Prigogine 45
entropy, relative 42
entropy, Stonier 47
entropy spectrum, Rényi 42, 43, 48, 194
eXtreme Programming process 294
F
fiat projection 142, 145, 153
finite state machines (FSM) 52, 53, 54, 55, 56, 57, 58, 61
first-order logic 226
folding machine 57, 58, 59, 60, 61
formal logical inferences 92
G
gas pipeline operation advisor (GPOA) 278, 281, 287, 288
granular computing (GrC) 238
H
Hartley, Ralph 36, 37, 49, 344
human/machine symbiosis 109
I
imperative computing (IC) 173, 174, 175, 176, 177, 178, 182
inferences 92, 93, 94, 95, 97, 103, 104, 358
inferential modelling technique (IMT) 278, 279, 280, 282, 283, 284, 285, 286, 287, 288
information-matter-energy (IME) model 2, 3, 4, 24, 33
intelligence, artificial (AI) 1, 6, 24, 132, 172, 174, 184, 248, 262, 264, 269, 276, 279, 289, 290, 327, 330, 332, 337, 351, 354, 356, 357, 360
intelligence, natural (NI) 1, 2, 5, 6, 7, 24, 174, 180, 181, 182, 184, 332
interactive classification system (ICS) 235, 241, 242, 243
interactive motivation-attitude theory 65, 76
K
knowledge acquisition (KA) 278, 282
knowledge base (KB) 266, 267, 268, 269, 270, 271, 272, 273, 274
Kolmogorov, Andrei N. 28, 44, 48, 50, 305, 347
L
language, and semantics 119
layered reference model of the brain (LRMB) 1, 2, 5, 8, 18, 20, 21, 24, 27, 65, 66, 68, 76, 77, 93, 97, 103, 105, 130, 136, 137, 141, 156, 174, 179, 180, 181, 182, 183, 187, 245, 290, 334, 360
learning, brain-based 299
learning, observational 299
learning process 294
Licklider, J.C.R. 106
M
man/machine symbiosis 109
memory, long term (LTM) 6, 7, 136, 137, 167, 168, 181
memory, short term (STM) 6, 167, 181
memory neural network (MNN) 200, 202, 204, 205, 210, 217
modified memory neural network (MMNN) 201, 204, 217
motivation 8, 20, 65, 66, 67, 68, 69, 70, 71, 74, 75, 76, 174, 179, 188, 190, 236, 237, 332
motivation/attitude-driven behavioral (MADB) model 8, 69, 71
N
natural language 120
neo-symbiosis 106–117
neo-symbiosis, implementation of 110
neural network (NN) paradigm 200, 201, 205, 210, 213, 215
neural systems 78
neurons 6, 18, 78, 79, 80, 82, 85, 136, 201, 204, 205, 210, 220, 221, 222, 233, 236, 328
O
object-attribute-relation (OAR) model 1, 2, 5, 6, 7, 10, 18, 24, 26, 65, 71, 76, 77, 97, 99, 100, 103, 105, 136, 137, 167, 168, 245, 331, 334, 359
Occam’s razor principle 52, 53, 54, 55
Occam, William of 52, 53, 54, 55, 63, 337, 340, 351
P
pathology, in cognitive function 223–234
perception 1, 2, 5, 8, 17, 18, 20, 24, 28, 33, 38, 47, 49, 65, 66, 68, 79, 81, 82, 86, 89, 90, 103, 107, 108, 109, 117, 137, 143, 167, 173, 174, 179, 188, 190, 191, 194, 196, 198, 223, 224, 226, 230, 249, 259, 299, 304, 305, 306, 318, 328, 332, 343, 345, 354
preliminary knowledge, and learning 294
Prigogine, Ilya 28, 33, 34, 44, 45, 51, 353
programmer learning 292
psychology, engineering of 108
R
region connection calculus (RCC-8) theory 143, 146, 147, 151, 152, 153
response saliency 82, 86
rule base (RB) 266, 267, 268, 270, 271, 272, 273, 274

S
schizophrenia 220–234
Shannon, Claude 4, 25, 28, 29, 33, 36, 37, 38, 39, 42, 43, 44, 45, 47, 48, 50, 51, 172, 185, 186, 315, 316, 317, 346, 348, 355
Shannon’s code entropy 39
Shannon’s self-information 37, 38, 39, 41, 42, 44, 47, 315
Shannon’s source entropy and redundancy 39
sparse coding theory 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 345
spatial environments 142, 143, 153, 155, 340
stochastic process algebra (STOPA) 157, 158, 168, 169

T
time delay neural network (TDNN) 204

V
virtual learning environments (VLE) 247, 248, 253, 254, 262

W
working memory (WM) 266, 267, 268, 270, 276