From Brains to Systems
Advances in Experimental Medicine and Biology Editorial Board: IRUN R. COHEN, The Weizmann Institute of Science, Rehovot, Israel ABEL LAJTHA, N.S. Kline Institute for Psychiatric Research, Orangeburg, NY, USA JOHN D. LAMBRIS, University of Pennsylvania, Philadelphia, PA, USA RODOLFO PAOLETTI, University of Milan, Milan, Italy
For further volumes: www.springer.com/series/5584
Carlos Hernández Ricardo Sanz Jaime Gómez-Ramirez Leslie S. Smith Amir Hussain Antonio Chella Igor Aleksander Editors
From Brains to Systems Brain-Inspired Cognitive Systems 2010
Editors Carlos Hernández Departamento de Automática Universidad Politécnica de Madrid Jose Gutierrez Abascal 2 28006 Madrid Spain
[email protected] Prof. Dr. Ricardo Sanz Departamento de Automática Universidad Politécnica de Madrid Jose Gutierrez Abascal 2 28006 Madrid Spain
[email protected] Dr. Jaime Gómez-Ramirez Universidad Politécnica de Madrid C/Jose Gutierrez Abascal 2 28006 Madrid Spain
[email protected]
Dr. Amir Hussain Dept. Computing Science University of Stirling Stirling UK
[email protected] Dr. Antonio Chella Dipto. Ingegneria Automatica e Informatica Università Palermo Corso Pizasi 106 90144 Palermo Italy
[email protected] Prof. Dr. Igor Aleksander Dept. Electrical & Electronic Engineering Imperial College of Science, Technology & Medicine Exhibition Rd., Building Room 1009 SW7 2BT London UK
[email protected]
Dr. Leslie S. Smith Dept. Computing Science University of Stirling Stirling UK
[email protected]
ISSN 0065-2598 ISBN 978-1-4614-0163-6 e-ISBN 978-1-4614-0164-3 DOI 10.1007/978-1-4614-0164-3 Springer New York Dordrecht Heidelberg London Library of Congress Control Number: 2011932797 © Springer Science+Business Media, LLC 2011 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
To our beloved families and caring friends.
Preface
The chapters included in this book are extended versions of the most relevant works presented at the Brain-inspired Cognitive Systems Conference held in July 2010 in Madrid, during mild estival days. BICS 2010 was a multitrack conference organised around four strongly related symposia:
• The Sixth International Symposium on Neural Computation (NC 2010)
• The Fifth International Symposium on Biologically Inspired Systems (BIS 2010)
• The Fourth International Symposium on Cognitive Neuroscience (CNS 2010)
• The Third International Symposium on Models of Consciousness (MoC 2010)
BICS 2010 was the fourth of a series of BICS events taking place biennially. The three previous BICS conferences were BICS 2008 (Sao Luis, Brazil), BICS 2006 (Lesbos, Greece) and BICS 2004 (Stirling, UK). The Brain Inspired Cognitive Systems Conference in Madrid brought together a group of leading scientists and engineers who use analytic and synthetic methods both to understand the astonishing cognitive processing properties of biological systems, and specifically those of the living brain, and to exploit such knowledge to advance engineering methods for building artificial systems with higher levels of cognitive competence. The four BICS 2010 Conference Symposia were closely connected events around different aspects of the relation between brain science and the engineering of cognitive systems. The scientific program tried to encourage cross-fertilization across the many symposia topics. This emphasized the role of BICS as a major meeting point for researchers and practitioners in the areas of biological and artificial cognitive systems, encouraging debates across disciplines so as to enrich researchers with complementary perspectives from the diverse scientific fields: NC 2010 presented realistic neural network models and applications. In particular, the symposium focussed on pattern onset learning, structural analyses of Spike-Timing-Dependent Plasticity (STDP) and computational implementations of the Continuum Neural Field Theory. BIS 2010 was mainly devoted to neuromorphic systems and neurophysiologically inspired models. The symposium explored biologically inspired architectures for the simulation of object perception, decision making, attention, language or emotions in autonomous agents. CNS 2010 covered both computational models of the brain and brain-inspired algorithms and artifacts. This symposium presented a wide-ranging set of empirical and theoretical papers on key topics in the field of cognitive neuroscience such as perception, attention, memory or cognitive impairment. MoC 2010 shed light on both the philosophical and the neurological bases of consciousness. Machine Consciousness focusses on both aspects by investigating how to build self-aware machines. The symposium focused on Machine Consciousness and presented papers such as a metric of visual qualia in artificial cognitive architectures, and design and implementation principles for self-conscious robots or machine free will.
BICS 2010 gathered cognitive systems engineers and brain scientists in sessions where cross-domain ideas were fostered in the hope of gaining new emerging insights into the nature, operation and extractable capabilities of brains. This multiple-perspective approach is necessary in complex cognitive systems engineering because the progressively more accurate data about brains is producing a growing need for both a quantitative and theoretical understanding and an associated capacity to manipulate these data and translate them into engineering applications rooted in sound theories. The conference hosted both researchers who aim to build brain-inspired systems with higher cognitive competences and life scientists who use and develop mathematical and engineering approaches for a better understanding of complex biological systems like the brain, all of them trying to meet at the point of rigorous theorising necessary both to understand biology and to support engineering. The four symposia and this resulting book—a collection of selected and extended papers—are an attempt to provide a broader perspective on these issues, which are at the core of twenty-first-century science: the discovery of the organisational principles governing the neural dynamics that mediate cognition and the potential application of these principles to technical systems.
Madrid, Spain
Autonomous Systems Laboratory www.aslab.org
Ricardo Sanz Carlos Hernández Jaime Gómez-Ramirez
Acknowledgements
This book would not have been possible without the dedicated effort of many people, to whom we can but express our gratitude here. First and foremost, these are the authors of the chapters, who applied themselves thoroughly to adapting the extraordinary work they presented at the BICS 2010 Conference that gave birth to this book. We acknowledge the support provided by the Autonomous Systems Laboratory, the Escuela Técnica Superior de Ingenieros Industriales of the Universidad Politécnica de Madrid and the Ministerio de Ciencia y Tecnologia for the organisation of the BICS 2010 Conference. We also owe our gratitude to all the people who collaborated in the celebration of the BICS conference, the organisation, the reviewers and especially the contributing participants. Finally, very special thanks to Julia Bermejo, whose dedicated help in the revision of the material has ensured the quality of this publication, and to Ann Avouris, the Springer Publishing Editor, whose support has made this book possible.
Contents

1 Introduction
Ricardo Sanz, Carlos Hernández, and Jaime Gómez-Ramirez

2 Emergent Feature Sensitivity in a Model of the Auditory Thalamocortical System
Martin Coath, Robert Mill, Susan L. Denham, and Thomas Wennekers

3 STDP Pattern Onset Learning Depends on Background Activity
James Humble, Steve Furber, Susan L. Denham, and Thomas Wennekers

4 Emergence of Small-World Structure in Networks of Spiking Neurons Through STDP Plasticity
Gleb Basalyga, Pablo M. Gleiser, and Thomas Wennekers

5 Coupling BCM and Neural Fields for the Emergence of Self-organization Consensus
Mathieu Lefort, Yann Boniface, and Bernard Girau

6 Alpha and Theta Rhythm Abnormality in Alzheimer's Disease: A Study Using a Computational Model
Basabdatta Sen Bhattacharya, Damien Coyle, and Liam P. Maguire

7 Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention
Tapani Raiko and Harri Valpola

8 Internal Simulation of Perceptions and Actions
Magnus Johnsson and David Gil

9 Building Neurocognitive Networks with a Distributed Functional Architecture
Marmaduke Woodman, Dionysios Perdikis, Ajay S. Pillai, Silke Dodel, Raoul Huys, Steven Bressler, and Viktor Jirsa

10 Reverse Engineering for Biologically Inspired Cognitive Architectures: A Critical Analysis
Andreas Schierwagen

11 Competition in High Dimensional Spaces Using a Sparse Approximation of Neural Fields
Jean-Charles Quinton, Bernard Girau, and Mathieu Lefort

12 Informational Theories of Consciousness: A Review and Extension
Igor Aleksander and David Gamez

13 Hippocampal Categories: A Mathematical Foundation for Navigation and Memory
Jaime Gómez-Ramirez and Ricardo Sanz

14 The Role of Feedback in a Hierarchical Model of Object Perception
Salvador Dura-Bernal, Thomas Wennekers, and Susan L. Denham

15 Machine Free Will: Is Free Will a Necessary Ingredient of Machine Consciousness?
Riccardo Manzotti

16 Natural Evolution of Neural Support Vector Machines
Magnus Jändel

17 Self-conscious Robotic System Design Process—From Analysis to Implementation
Antonio Chella, Massimo Cossentino, and Valeria Seidita

18 Simulating Visual Qualia in the CERA-CRANIUM Cognitive Architecture
Raúl Arrabales, Agapito Ledezma, and Araceli Sanchis

19 The Ouroboros Model, Selected Facets
Knud Thomsen

Index
Contributors
Igor Aleksander Department of Electrical Engineering, Imperial College, London SW7 2BT, UK,
[email protected] Raúl Arrabales Carlos III University of Madrid, Avda. Universidad, 30, 28911 Leganés, Spain,
[email protected] Gleb Basalyga Centre for Robotics and Neural Systems (CRNS), University of Plymouth, Plymouth, PL4 8AA, UK,
[email protected] Basabdatta Sen Bhattacharya University of Ulster, Magee Campus, Northland Road, Derry BT48 7JL, Northern Ireland, UK,
[email protected] Yann Boniface LORIA, Campus Scientifique–BP 239, 54506 Vandoeuvre-lès-Nancy Cedex, France,
[email protected] Steven Bressler Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton FL, USA Antonio Chella Dipartimento di Ingegneria Chimica Gestionale Informatica Meccanica, Universitá degli Studi di Palermo, Viale delle Scienze, 90128 Palermo, Italy,
[email protected] Martin Coath University of Plymouth, Drake Circus, PL4 8AA, UK,
[email protected] Massimo Cossentino Istituto di Calcolo e Reti ad Alte Prestazioni, Consiglio Nazionale delle Ricerche, Viale delle Scienze, 90128 Palermo, Italy,
[email protected] Damien Coyle University of Ulster, Magee Campus, Northland Road, Derry BT48 7JL, Northern Ireland, UK,
[email protected] Susan L. Denham Centre for Robotics and Neural Systems, University of Plymouth, Drake Circus, Plymouth, Devon PL4 8AA, UK,
[email protected] Silke Dodel Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton FL, USA Salvador Dura-Bernal Centre for Robotics and Neural Systems, University of Plymouth, Drake Circus, Plymouth, Devon PL4 8AA, UK,
[email protected] Steve Furber School of Computer Science, University of Manchester, Manchester, UK,
[email protected] Jaime Gómez-Ramirez Autonomous Systems Laboratory, Universidad Politécnica de Madrid, José Gutiérrez Abascal 2, 28006 Madrid, Spain,
[email protected]
David Gamez Department of Computing, Imperial College, London SW7 2BT, UK,
[email protected] David Gil Computing Technology and Data Processing, University of Alicante, Alicante, Spain,
[email protected] Bernard Girau INRIA/LORIA Laboratory, Campus Scientifique, B.P. 239, 54506 Vandoeuvre-lèsNancy Cedex, France,
[email protected] Pablo M. Gleiser CONICET, Centro Atomico Bariloche, Bariloche, Argentina,
[email protected] Carlos Hernández UPM Autonomous Systems Laboratory, José Gutierrez Abascal 2, 28006 Madrid, Spain,
[email protected] James Humble School of Computing and Mathematics, University of Plymouth, Plymouth, UK,
[email protected] Raoul Huys Theoretical Neuroscience Group, Université de la Méditerranée, Marseille, France Magnus Jändel Swedish Defence Research Agency, 164 90 Stockholm, Sweden,
[email protected] Viktor Jirsa Theoretical Neuroscience Group, Université de la Méditerranée, Marseille, France; Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton FL, USA Magnus Johnsson Lund University Cognitive Science, Kungshuset, Lundagård, 222 22 Lund, Sweden,
[email protected] Agapito Ledezma Carlos III University of Madrid, Avda. Universidad, 30, 28911 Leganés, Spain,
[email protected] Mathieu Lefort INRIA/LORIA Laboratory, Campus Scientifique, B.P. 239, 54506 Vandoeuvre-lèsNancy Cedex, France,
[email protected] Liam P. Maguire University of Ulster, Magee Campus, Northland Road, Derry BT48 7JL, Northern Ireland, UK,
[email protected] Riccardo Manzotti Institute of Communication and Behaviour, IULM University, Via Carlo Bo, 8, 20143 Milan, Italy,
[email protected] Robert Mill University of Plymouth, Drake Circus, PL4 8AA, UK Dionysios Perdikis Theoretical Neuroscience Group, Université de la Méditerranée, Marseille, France Ajay S. Pillai Brain Imaging and Modeling Section, National Institute on Deafness and Other Communication Disorders, National Institutes of Health, Bethesda MD, USA Jean-Charles Quinton INRIA/LORIA Laboratory, Campus Scientifique, B.P. 239, 54506 Vandoeuvre-lès-Nancy Cedex, France,
[email protected] Tapani Raiko Department of Information and Computer Science, Aalto University, Helsinki, Finland,
[email protected]; ZenRobotics Ltd., Helsinki, Finland Araceli Sanchis Carlos III University of Madrid, Avda. Universidad, 30, 28911 Leganés, Spain,
[email protected] Ricardo Sanz Autonomous Systems Laboratory, Universidad Politécnica de Madrid, José Gutiérrez Abascal 2, 28006 Madrid, Spain,
[email protected]
Andreas Schierwagen Institute for Computer Science, Intelligent Systems Department, University of Leipzig, Leipzig, Germany,
[email protected] Valeria Seidita Dipartimento di Ingegneria Chimica Gestionale Informatica Meccanica, Universitá degli Studi di Palermo, Viale delle Scienze, 90128 Palermo, Italy,
[email protected] Knud Thomsen Paul Scherrer Institut, 5232 Villigen PSI, Switzerland,
[email protected] Harri Valpola ZenRobotics Ltd., Helsinki, Finland,
[email protected]; Department of Biomedical Engineering and Computational Science, Aalto University, Helsinki, Finland Thomas Wennekers Centre for Robotics and Neural Systems, University of Plymouth, Drake Circus, Plymouth, Devon PL4 8AA, UK,
[email protected] Marmaduke Woodman Theoretical Neuroscience Group, Université de la Méditerranée, Marseille, France,
[email protected]; Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton FL, USA
Chapter 1
Introduction: From Brains to the Machines of the Future
Ricardo Sanz, Carlos Hernández, and Jaime Gómez-Ramirez
1.1 Introduction

Real-world, optimally performant, mission-flexible robots in open-ended environments have been predicted by many technologists to arrive within a short time. Indeed, for at least the last 30 years they have been arriving "in 25 years" [11]. This is a scenario similar to what has been happening with controlled nuclear fusion. Like fusion reactors, the promised robots are not yet here. The machines of the future are still inside the movies. Commercial robots—the robots that people will pay for today—are still only able to operate in controlled or semi-controlled environments doing quite simple tasks: welding car parts in factories or cleaning bathroom floors. The complexities of dwelling in the real world, performing heterogeneous tasks in open-ended, dynamic environments, have proven too difficult for the control technologies available these days. However, even the minute animals in our environment are perfectly able to manage in these conditions. They prove that the problems of real-world activity can be solved in economic ways. The bio-inspired systems research programme is guided by the idea that the solution can be found in their senses, legs or brains. It should be possible to leverage the technologies produced by Darwinian evolution to improve the behavior of our machines.
1.2 Going From Brains to Machines

The behavior of a machine is determined by the interaction between its internal dynamics and the environment it is coupled to. The design process of machines starts with the identification of the desired environmental changes—e.g. moving water from here to there—and proceeds to the design of a structure for the machine that will bring forth such an effect through the emerging machine-environment interaction dynamics. When the desired behavior is complex, the design of the machine structure is divided into two parts: a physical subsystem capable of producing the environmental changes and an informational subsystem that forces the machine to behave in a certain way.
Fig. 1.1 Brain-inspired robotics serves both as an engineering method and as an experimental testbed for biological theories. Theories of brain function can be tested in robot-based controllers implementing transferred theoretical models
The term controller is used for the informational subsystem in charge of forcing behavior. There are different strategies to build controllers. The conventional engineering strategy is to build the controller in such a way that the machine will necessarily behave as desired. Let's call this the design-and-build strategy for artificial minds. It relies on first principles and classic engineering strategies for design [9]. However, when the behavior or the environment is too complex, the design strategy does not cope well. An alternative strategy is used when the task requirements and constraints are so complex that we cannot apply the design-and-build strategy. This strategy is based on reverse-engineering systems that manifest the desired behaviour and copying their functional organisations. This second strategy—let's call it reverse-engineer and copy—is what is addressed in this volume, taking the brain as a source of design inspiration. The study of the brain as a source of design knowledge for more effective machines offers the possibility of addressing extremely complex, real-world tasks that only animals can perform so far. Consider, for example, the apparently simple task of going around picking up waste objects to dispose of them in the recycling bin. This is a task that can be done by a toddler or a trained dog. It is also easily done by a robot when the objects and the recycling bin fit certain predefined perceptual categories and the environmental conditions are kept in a narrow operative band (e.g. illumination is broadly uniform and sufficient, or the ground is even enough and uncluttered). However, when these operational conditions are not met, the designed-and-built robot, whose design departs from assumptions that no longer hold, will fail to perform the task. Do not try to find articles about robots failing in the scholarly journals of the field. From time to time they appear in funny videos on YouTube, but nobody is willing to publish about failures—neither authors nor editors. But failures are there: robots are not functionally robust enough except when performing simple tasks. There is a strong need to improve mission-level robustness if robots are going to provide their services in open-ended conditions. It is in this context that we resort to the second strategy: copying brains. Bioinspired cognitive architectures offer the promise of solving this kind of problem because the original architectures—those of brains—are already solving them. There are plenty of threads in this research strategy. Some of them are focused on the physical competences of agents, but most of them are related to mental competences. Current trends tend to depart from the exploration of abstractions about mind and intelligence (as embraced by the AI of the sixties), turning instead to the insights gained by exploring the brain [14]. Some will argue that by focusing on the brain we are losing the necessary holistic picture of biological agents. Beyond discussions about embodiment and disembodiment [1, 6] there is a clear need to focus on the cognitive organisation of the agent (which obviously encompasses the body [12]). While the body is enormously relevant in cognitive processes [2, 10], the role of the brain in higher-level cognition is indubitable. The flow from brain knowledge to robotics is a potential source of technological assets. Also, while brain-inspired robotics is a very promising engineering method, it is also well established as an experimental testbed for biological theories (see Fig. 1.1).
Fig. 1.2 Brain-inspired robotics will be a technology sensu stricto when biological implementation details are abstracted out and only systemic aspects prevail. This implies rendering the theories in cognitive neuroscience in a form that is devoid of biological ties (the abstract Theory level shown in the figure above)
Building robot controllers by implementing theories about the brain will serve two basic purposes: (i) controlling the robots and (ii) exploring the implications of the theories and, in a sense, validating them in their ecological contexts [4]. However, systematic cognitive systems engineering requires solid theories and not just a collection of inexplicable designs. It is necessary to transition from a catalog of ad-hoc cognitive mechanisms to a rigorous cognitive science; this science may later be applied in the requirements-driven design process necessary for attaining pre-specified performance levels (see Fig. 1.2). In bioinspired cognitive systems engineering it is necessary to extract basic design principles [7, 8]. It is not enough to copy the organisations of animals' brains or bodies [13, 17]. This is the fundamental methodological doctrine behind the several works included in this book: all of them try to go beyond the shallow analysis of biological structures, offering more profound, rigorous visions of cognitive system operation.
1.3 Book Contents

The book contains eighteen chapters that cover the whole spectrum of the conference, from models of biological aspects at molecular levels to philosophical considerations about the most abstract aspects of minds.
Coath et al.—Emergent Feature Sensitivity in a Model of the Auditory Thalamocortical System—investigate plasticity of the brain's auditory system. They address the question of whether a recurrently connected thalamocortical model exhibiting spike-timing-dependent plasticity can be tuned to specific features of a stimulus. This is of relevance to the understanding of the post-natal—and beyond—construction of cortical and thalamic representations of the features of auditory stimuli that will be available for cortex-related auditory processing. This work is relevant for the understanding of continuous, post-developmental plasticity that may be critical for robust autonomous systems in changing environments.
Humble et al.—STDP Pattern Onset Learning Depends on Background Activity—study to what extent the well-known spike-timing-dependent plasticity [5] depends on background activity, leading
even to instabilities. From their results the authors present preliminary insights into the neuron's encoding of temporal patterns of coincidence of spikes and how the temporal precision of the onset response depends on the background activity.
Basalyga et al.—Emergence of Small-World Structure in Networks of Spiking Neurons Through STDP Plasticity—investigate how a neural network structure changes under synaptic plasticity. They use complex networks of conductance-based, single-compartment integrate-and-fire excitatory and inhibitory neurons, showing that under certain conditions a nontrivial small-world structure can emerge from a random initial network by learning.
Lefort et al.—Coupling BCM and Neural Fields for the Emergence of Self-organization Consensus—focus on the integration of multimodal perception. They propose a cortex-inspired model for multi-modality association. The model integrates modality maps using an associative map to raise a consistent multimodal perception of the environment. They couple the BCM learning rule and neural maps to obtain a decentralized and unsupervised self-organization.
Bhattacharya et al.—Alpha and Theta Rhythm Abnormality in Alzheimer's Disease: A Study Using a Computational Model—address theoretical model construction towards solving clinical issues of disease. Their models—of thalamocortical circuitry which exhibits oscillation within the theta and the alpha bands—are aimed at gaining a better understanding of the neuronal mechanisms underlying EEG band power changes. Their work shows how the change in model oscillatory behaviour is related to changes in the connectivity parameters in the thalamocortical as well as the sensory input pathways. This understanding of the mechanics underlying the disease symptomatology may in the future provide useful biomarkers towards early detection of Alzheimer's disease and for neuropharmaceutical investigations.
Raiko and Valpola—Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention—study the emergent properties of a cortex-inspired artificial neural network for image segmentation. They combine segmentation by oscillations and biased competition for perceptual processing. They show encouraging results of experiments using artificial image data.
Johnsson and Gil—Internal Simulation of Perceptions and Actions—address the architectural aspects of neural networks based on associative self-organising maps that are able to internally simulate perceptions and actions. They present several topologies—mostly recurrently connected—such as a bimodal perceptual architecture and action neural networks adapted by the delta rule. Their simulation tests show encouraging experimental results.
Woodman et al.—Building Neurocognitive Networks with a Distributed Functional Architecture—suggest that the very possibility of successfully modeling human behavior with reduced-dimensionality models is a key point in understanding the implementation of cognitive processes in general. They suggest that this is due to a separation in the time scales of the dynamics guiding neural processes and the overall behavioral expression, offering a distributed model based on structured flows on manifolds to understand the organization of this class of behavioral dynamics. They demonstrate this model in a functional architecture of handwriting showing hierarchical sequencing of behavioral processes.
Schierwagen—Reverse Engineering for Biologically Inspired Cognitive Architectures: A Critical Analysis—analyses methodological and theoretical issues in the development of biologically inspired cognitive systems. He is concerned about the very possibility of reverse-engineering brains by conventional decompositional analysis. Schierwagen concludes that this approach is a no-go, discussing the implications for investigations of organisms and behavior as sources of engineering knowledge.
Quinton et al.—Competition in High Dimensional Spaces Using a Sparse Approximation of Neural Fields—address the computational tractability of implementations of the continuum neural field theory when an adaptive resolution or an arbitrary number of input dimensions is required.
1 A small-world network is a type of mathematical graph in which most nodes are not neighbors of one another, but most nodes can be reached from every other by a small number of hops or steps.
They propose a more economic alternative to self-organizing maps using a sparse implementation based on Gaussian mixture models. They test the proposed algorithm in a reactive color tracking application, using spatially distributed color features.
Aleksander and Gamez—Informational Theories of Consciousness: A Review and Extension—analyse recent theories that establish a systematic link between conscious experience and the flow of information—differentiation and integration—in certain areas of the brain. They analyse measures of information integration [15, 16] and some related algorithms for providing quantitative measures of information integration or causal density, hopefully to be used to make predictions about consciousness. They analyse the computational complexity of these algorithms, which limits their application to just small datasets—networks of around a dozen neurons—and implement one of the better-known algorithms in the SpikeStream neural simulator to carry out some experimental comparisons.
Gómez and Sanz—Hippocampal Categories: A Mathematical Foundation for Navigation and Memory—address the theoretical tools necessary for capturing the theories of cognition that span from neurons to psychological aspects. The mathematical theory of categories is proposed as a valid foundational framework for theoretical modeling in brain sciences, and demonstrated by presenting a category-based formal model of grid cells and place cells in the hippocampus.
Dura-Bernal et al.—The Role of Feedback in a Hierarchical Model of Object Perception—address the question of robust object recognition—including occluded and illusory images, or position and size invariances. They propose a model derived from the HMAX model showing how this feedforward system can include feedback, by means of an architecture which reconciles biased competition and predictive coding approaches. This work provides a biologically plausible model of the interaction between top-down global feedback and bottom-up local evidence in the context of hierarchical object perception.
Manzotti—Machine Free Will: Is Free Will a Necessary Ingredient of Machine Consciousness?—addresses the elusive concept of free will in a mechanistic context. Manzotti analyses whether freedom and consciousness are independent aspects of the human mind or by-products of the same underlying structure; this analysis leads the author to outline a proposal for an architecture sustaining machine free will.
Jändel—Natural Evolution of Neural Support Vector Machines—describes two different neural implementations of support vector machines for one-shot trainable pattern recognition. One is based on oscillating associative memory—inspired by the olfactory system—and the second is founded on competitive queuing memory—originally employed for generating motor action sequences in the brain. For both support vector machine models they show that there is a plausible evolutionary path, suggesting that they can emerge by natural processes.
Chella et al.—Self-Conscious Robotic System Design Process—from Analysis to Implementation—address some of the engineering issues concerning the development of robots endowed with self-conscious capabilities. They analyse the whole engineering lifecycle (from analysis to implementation) focusing on aspects that are specific to the development of self-conscious robotic systems.
They propose a new design process—PASSIC—offering custom software engineering techniques for realizing the complex sub-systems needed. This work binds the studies of consciousness with the engineering methods necessary to apply them.
Arrabales et al.—Simulating Visual Qualia in the CERA-CRANIUM Cognitive Architecture—touch upon the elusive problem of hard consciousness in robots. They attack qualia through a complementary study, building "artificial visual qualia" using their cognitive architecture CERA-CRANIUM, based on Baars' global workspace theory [3]. They study artificial qualia as simulated, synthetic visual experience. The inspection of the dynamics and transient inner states of the cognitive artificial system lets them discuss the possible existence of similar mechanisms in human brains.
Thomsen—The Ouroboros Model, Selected Facets—describes some fundamental aspects of the Ouroboros cognitive architecture: self-referential recursive processes, schema-based memory organisation,
feature-driven expectations, etc. Thomsen shows how the Ouroboros Model can address biological cognitive system aspects like attention, emotion, priming, masking, learning, sleep and consciousness.
1.4 Value and Perspectives

Science moves in little steps, but it also makes progress through revolutionary discoveries and concepts that sweep away whole edifices of thinking and replace them with new theories that explain more with less. There is, however, a constant in this march: the striving for mathematisation and unification. The extent to which the reverse-engineering of brains will help with technological advances in the engineering of more robust autonomous systems is not yet clear. Nevertheless, the different approaches offered in this book show a steady progress toward more rigorous methods of analysis and synthesis. This rigour suggests that they may eventually converge into a single, unified theory of cognition: the very holy grail of cognitive science and engineering.
References

1. Anderson, M.L.: Embodied cognition: A field guide. Artif. Intell. 149, 91–130 (2003)
2. Arbib, M.A., Liaw, J.S.: Sensorimotor transformations in the worlds of frogs and robots. Artif. Intell. 72(1–2), 53–79 (1995)
3. Baars, B.J.: In the theatre of consciousness. Global workspace theory, a rigorous scientific theory of consciousness. J. Conscious. Stud. 4, 292–309 (1997)
4. Banerjee, R., Chakrabarti, B.K. (eds.): Models of Brain and Mind—Physical, Computational and Psychological Approaches. Progress in Brain Research, vol. 168. Elsevier, Amsterdam (2008)
5. Bi, G., Poo, M.: Synaptic modification of correlated activity: Hebb's postulate revisited. Annu. Rev. Neurosci. 24, 139–166 (2001)
6. Chrisley, R.L.: Taking embodiment seriously: nonconceptual content and robotics. In: Android Epistemology, pp. 141–166. MIT Press, Cambridge (1995)
7. Gómez, J., Sanz, R., Hernández, C.: Cognitive ontologies: mapping structure and function of the brain from a systemic view. In: AAAI 2008 Fall Symposium on Biologically Inspired Cognitive Architectures (2008)
8. Hernández, C., López, I., Sanz, R.: The operative mind: a functional, computational and modelling approach to machine consciousness. Int. J. Mach. Conscious. 1(1), 83–98 (2009)
9. Kuo, B.: Automatic Control Systems. Prentice-Hall, Englewood Cliffs (1991)
10. Lettvin, J.Y., Maturana, H.R., McCulloch, W.S., Pitts, W.: What the frog's eye tells the frog's brain. Proc. Inst. Radio Eng. 47, 1940–1959 (1959)
11. Moravec, H.P.: Mind Children: The Future of Robot and Human Intelligence. Harvard University Press, Cambridge (1999)
12. Sanz, R., Gómez, J., Hernández, C., Alarcón, I.: Thinking with the body: towards hierarchical scalable cognition. In: Handbook of Cognitive Science: An Embodied Approach. Elsevier, Amsterdam (2008)
13. Sanz, R., Hernández, C., Gómez, J., Bedia, M.G.: Against animats. In: Proceedings of CogSys 2010, 4th International Conference on Cognitive Systems. Zurich, Switzerland (2010)
14. Sanz, R., Meystel, A.: Modeling, self and consciousness: Further perspectives of AI research. In: Proceedings of PerMIS '02, Performance Metrics for Intelligent Systems Workshop. Gaithersburg, MD, USA (2002)
15. Seth, A.K.: Measuring autonomy and emergence via Granger causality. Artif. Life 16, 179–196 (2010)
16. Tononi, G.: Consciousness as integrated information: a provisional manifesto. Biol. Bull. 215, 216–242 (2008)
17. Webb, B., Consi, T.R. (eds.): Biorobotics. Methods and Applications. MIT Press, Cambridge (2001)
Chapter 2
Emergent Feature Sensitivity in a Model of the Auditory Thalamocortical System
Martin Coath, Robert Mill, Susan L. Denham, and Thomas Wennekers
Abstract If, as is widely believed, perception is based upon the responses of neurons that are tuned to stimulus features, then precisely what features are encoded and how do neurons in the system come to be sensitive to those features? Here we show that differential responses to ripple stimuli can arise through exposure to formative stimuli in a recurrently connected model of the thalamocortical system which exhibits delays, lateral and recurrent connections, and learning in the form of spike-timing-dependent plasticity.
2.1 Introduction

Since Hubel and Wiesel [11] showed that, for neurons in visual cortex, there were 'preferred stimuli' which evoked a more vigorous response than all other stimuli, it has become commonplace to think of neurons, or clusters of neurons, as having stimulus preferences—or alternatively as responding to 'features' of the stimulus. Although it is widely believed that auditory perception is based on the responses of neurons that are tuned to features of the stimulus, it is not clear what these features are or how they might come into existence. There is, however, evidence that cortical responses develop to reflect the nature of stimuli in the early post-natal period [12, 24, 25] and that this plasticity persists beyond early development [20]. In addition it has been shown that excitatory corticofugal projections to the thalamus are likely to be crucial in thalamic plasticity and hence in the representation of the stimulus that is available to the cortex [7]. The work presented here is motivated by the desire to investigate whether a recurrently connected thalamocortical model exhibiting spike timing dependent plasticity (STDP) can be sensitized to specific features of a stimulus by exposure. Modelling studies have suggested [4, 5] that the spectro-temporal patterns found in a limited number of stimuli, which reflect some putative early auditory environment, may bootstrap the formation of neural responses and that unsupervised, correlation-based learning leads to a range of responses with features similar to those reported from measurements in vivo. However, in this previously reported work the model of STDP adopted, mostly for reasons of computational efficiency, was based on average activity over a period of time rather than on the times of the spikes themselves. In addition, this model also led to some synaptic weights increasing without limit and hence required an arbitrary cut-off in the time used for training. Here we employ a model of plasticity that depends on the times of pre-synaptic spikes and a variable representing the post-synaptic activity [2] and avoids the problem of unlimited weights by using synapses that are bi-stable, that is, over time the weights of all synapses tend to one or zero.
Fig. 2.1 Each vertical sub-unit of the network consists of eight neurons. The sub-cortical section receives input from one stimulus channel representing a position on the tonotopic axis. Each thalamic (MGB) cell is connected to a number of cortical cells representing layer IV, the principal receiving layer. Layer VI cells recurrently connect the cortex to the thalamus via NMDA synapses which exhibit STDP and thus are the loci of the correlation-based learning in the network
We show that a model of auditory cortex incorporating lateral spread of excitation with associated delays, recurrent connections between layers, and STDP (learning) adapts during exposure to training patterns (stimuli) in a way that is determined partly by the stimuli themselves, and that the resulting network exhibits 'feature preferences' that could support the representation of the input in a high-dimensional feature space.
2.2 Methods

2.2.1 The Network

2.2.1.1 Network Architecture

The model auditory cortex consists of five hundred repeating units, each consisting of eight neurons arranged in layers, as illustrated in Fig. 2.1. The lower, sub-cortical, section represents the junction of the inferior colliculus (IC) with the medial geniculate body of the thalamus (MGB). The upper section represents a two-layer cortical structure consisting of a receiving layer (layer IV [22], marked simply as P4 in Fig. 2.1) and a second layer (marked as P6 in the figure) providing a recurrent excitatory connection to the thalamus [10], and a recurrent inhibitory connection to the thalamus via the thalamic reticular nucleus (RTN) [9, 10]. Inhibitory inputs to the thalamus also come from the IC, in this case via a GABA-type interneuron, although there is evidence for direct connections from GABAergic cells in IC [14, 21]. The recurrent excitatory connections from P6 to MGB are mediated by NMDA-type synapses that are the loci of the STDP (see Sect. 2.2.2). This approach reflects the belief that the principal role of such corticofugal connections is to modulate thalamocortical transmission and that "corticofugal
modulation is an important mechanism for learning induced or experience-dependent auditory plasticity” [17, 26]. Although it is clear that some of the changes associated with this plasticity must be located in the cortex, there is recent evidence that corticothalamic synapses are regulated by cortical activity during the early developmental period [23].
2.2.1.2 Neurons

The neurons used are linear integrate-and-fire units and use a stimulation paradigm not of current injection, but of conductance injection, which moves integrate-and-fire models closer to a situation that cortical neurons would experience in vivo [6]. This modification also allows the use of conductance-based synapses as described in Sect. 2.2.1.3 below. The behaviour of the neurons can be described by:
$$\tau \frac{dV}{dt} = -(V(t) - E_L) - \sum_i w_i(t)\,\big(V(t) - E_{R_i}\big)$$
$$\text{if } V > V_T \text{ then } V \to E_L,\; Z(t) \to 1;\ \text{else } Z(t) \to 0 \qquad (2.1)$$
where τ is the membrane time constant, V the membrane potential, E_L = 0 the leak reversal potential, w_i(t) is the weight of the ith synapse—this is a function of time because the value of w subsumes not only the weight constant but also the time-varying conductance of the synapse (see Sect. 2.2.1.3)—V_T = 1 is the firing threshold potential, and Z(t) is the output of the neuron expressed as delta functions at firing times. Values for τ were assigned independently and identically at random (i.i.d.) from a uniform distribution in the range 9–11 ms. The value E_Ri is the reversal potential of the ith synapse. In addition, all neurons received i.i.d. current injections representing the sum of non-stimulus-specific activity. This has the effect of bringing the neurons closer to threshold, and the range of values was chosen such that a low level (<1 Hz) of spontaneous action potentials was evoked.
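To make the neuron model concrete, the following is a minimal numerical sketch of (2.1) using forward-Euler integration. It is an illustration only: the integration scheme, time step, array layout and the background-current term are assumptions and are not taken from the original implementation.

```python
import numpy as np

def simulate_lif(weights, reversal, dt=1e-4, tau=0.010,
                 e_leak=0.0, v_thresh=1.0, i_background=0.0):
    """Forward-Euler integration of the conductance-based unit in (2.1).

    weights : array (n_syn, n_steps) of time-varying synaptic weights w_i(t)
              (weight constant times synaptic conductance, see Sect. 2.2.1.3).
    reversal: array (n_syn,) of reversal potentials E_Ri.
    Returns the membrane trace V and the spike indicator Z.
    """
    n_syn, n_steps = weights.shape
    v = e_leak
    v_trace = np.zeros(n_steps)
    z_trace = np.zeros(n_steps)
    for k in range(n_steps):
        # tau dV/dt = -(V - E_L) - sum_i w_i(t) (V - E_Ri)  (+ background drive)
        dv = (-(v - e_leak)
              - np.sum(weights[:, k] * (v - reversal))
              + i_background) / tau
        v += dt * dv
        if v > v_thresh:        # threshold crossing: emit a spike and reset
            z_trace[k] = 1.0
            v = e_leak
        v_trace[k] = v
    return v_trace, z_trace
```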
2.2.1.3 Synapses There are four types of synapse present in the model. Each exhibits a time dependent conductance which is derived from the train of spikes (delta functions) originating in the pre-synaptic neuron. The conductance is the output of a second-order low-pass filter and the resulting temporal response function for a single spike is an alpha-function characterised by two parameters: the rise-time τr and the decay-time τd . The majority of excitatory synapses have fast rise and fall times and are designated AMPA types. Other excitatory synapses in the thalamocortical projections have longer rise and fall times and are designated as NMDA synapses. Inhibitory synapses are all of the same type which have very fast rise times and intermediate fall time and these are designated as GABA. The time constants are given in Table 2.1 [8]. Table 2.1 Time constants used in synapse models
τr
τd
AMPA
0.90 ms
GABA
0.01 ms
1.50 ms 5.00 ms
NMDA
3.00 ms
40.00 ms
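The synaptic conductances described above can be illustrated with a short sketch that builds the impulse response of a second-order low-pass filter for the time constants in Table 2.1; the discretisation and the peak normalisation are assumptions. The conductance driving a synapse would then be the convolution of its pre-synaptic spike train with the corresponding kernel.

```python
import numpy as np

def conductance_kernel(tau_r, tau_d, dt=1e-4, t_max=0.2):
    """Single-spike conductance response characterised by a rise time tau_r
    and a decay time tau_d (an alpha function when the two are equal)."""
    t = np.arange(0.0, t_max, dt)
    if np.isclose(tau_r, tau_d):
        g = (t / tau_r) * np.exp(1.0 - t / tau_r)    # alpha function, peak = 1
    else:
        g = np.exp(-t / tau_d) - np.exp(-t / tau_r)  # difference of exponentials
        g /= g.max()                                 # normalise the peak to 1
    return g

# Time constants from Table 2.1, expressed in seconds
KERNELS = {name: conductance_kernel(tr, td)
           for name, (tr, td) in {"AMPA": (0.90e-3, 1.50e-3),
                                  "GABA": (0.01e-3, 5.00e-3),
                                  "NMDA": (3.00e-3, 40.00e-3)}.items()}
```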
2.2.1.4 Depressing Synapses

The axons that project from MGB to layer IV of the cortex have the same time constants as other AMPA synapses but exhibit synaptic depression and are referred to as dAMPA. The dynamical properties of cortical synapses can influence the temporal sensitivity of cortical circuitry. Here we use a model of synaptic depression which is characterised by a variable representing the running fraction of available neurotransmitter, x(t), that recovers to unity with a time constant τA [19]:

$$\frac{dx}{dt} = \frac{1 - x}{\tau_A} - x \cdot Z(t) \qquad (2.2)$$
The time constant τA was adjusted so as to be consistent with paired-pulse ratios reported in in vivo studies of pyramidal neurons [1]. All simulations were run with τA = 30 ms.
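A one-line sketch of the update implied by (2.2), assuming a forward-Euler step and a 0/1 spike indicator per time step; this is an illustration, not the authors' code.

```python
def depression_step(x, spike, dt, tau_a=0.030):
    """One Euler step of dx/dt = (1 - x)/tau_A - x * Z(t), eq. (2.2).
    `spike` is 1.0 in time steps containing a pre-synaptic spike, else 0.0.
    On a spike the available fraction x is released and then recovers towards
    unity with time constant tau_A (30 ms in all simulations)."""
    return x + dt * (1.0 - x) / tau_a - x * spike
```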
2.2.1.5 Connections Between Columns

The excitatory afferents from the thalamus to each cell in the cortical receiving layer come from a number of MGB cells, as indicated in Fig. 2.1. These are selected based on connection probabilities that vary with the distance between cells as shown in (2.3), i.e. falling as the inter-column distance d increases. The maximum probability of a connection being made is at d = 0; this value is controlled by the variable C, and the 'width' of the function is determined by s. All simulations were run with C = 0.2, s = 20. The corticothalamic connections to each MGB cell also come from a number of P6 cells selected in a similar way. For these connections C = 0.1, s = 100. The probability of connection is given by:

$$P = C \cdot \exp\left(\frac{-0.5 \cdot d^2}{s^2}\right) \qquad (2.3)$$

For each of the 500,000 possible connections in the corticothalamic and thalamocortical projections a Boolean value was chosen with the probability of TRUE being P, and a synapse created, or not, accordingly. These 'fan out' connections give the opportunity for cortical neurons to integrate information from heterotopic areas of the thalamus and also stand as surrogates for the cortico–cortical connections [18], which have no explicit representation in this model. The cortico–thalamic connections are mediated via NMDA-type synapses which are the loci of the STDP and hence of the correlation-based learning in the network; see Sect. 2.2.2.
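The fan-out connectivity of (2.3) can be sampled as below; the matrix orientation and the use of a single Boolean matrix per projection are assumptions made for illustration.

```python
import numpy as np

def sample_fanout(n_cols=500, c_max=0.2, s=20.0, rng=None):
    """Boolean connectivity matrix for one projection: entry [i, j] is True when
    the cell in column j projects to the cell in column i, drawn with
    probability P = C * exp(-0.5 * d**2 / s**2), where d = |i - j| (eq. 2.3)."""
    rng = np.random.default_rng() if rng is None else rng
    idx = np.arange(n_cols)
    d = np.abs(idx[:, None] - idx[None, :])        # inter-column distances
    p = c_max * np.exp(-0.5 * d ** 2 / s ** 2)
    return rng.random((n_cols, n_cols)) < p

thalamocortical = sample_fanout(c_max=0.2, s=20.0)    # MGB -> layer IV (P4)
corticothalamic = sample_fanout(c_max=0.1, s=100.0)   # P6  -> MGB (NMDA, plastic)
```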
2.2.1.6 Delays

In order to investigate the role played by the temporal structure of the stimuli in the emergent stimulus preferences of the network, delays were incorporated into the network. Assumptions were made about the dimensions of the cortical area represented by the model and the range of values for axonal propagation rates. Using these two figures, distance-dependent delays were introduced for fan-out connections in the model based on the inter-column distance. Under the simplifying assumption that the delay increases linearly with d, we have assumed a maximum separation between neurons of 1 cm and values of axonal propagation rate from 0.5–10.0 m/s.
2.2.2 Synaptic Plasticity

Spike-timing-dependent plasticity (STDP) is the modification of synaptic weights based on the correlation between pre- and post-synaptic firing times. Evidence for this has been gathered in vitro, and is beginning to emerge in vivo [13], and it is believed to be a feature of synapses which have NMDA receptors that regulate the genes required for long-term maintenance of these changes [15]. In general, if a pre-synaptic spike precedes a post-synaptic spike then the synapse is potentiated; if the timing of the spikes is reversed then the synapse is depressed. One problem with correlation-based learning is that the weight changes are unstable and additional mechanisms have to be invoked to ensure that weights do not increase in an uncontrolled manner. Our approach in earlier work was to start with very low weights and keep the training short [3]. In this way we see how the pattern of weight changes establishes itself in the early stages of training. Another possibility, the approach that is adopted here, is to implement a form of STDP in which the weights are bi-stable [2]. The learning rule used in the results presented here is summarized in (2.4), (2.5), and (2.6). At the arrival time of each pre-synaptic spike the synaptic efficacy X is modified based on the post-synaptic neuron membrane potential V and the post-synaptic neuron internal state variable C. The variable C is identified with the calcium concentration [16] and is determined by a leaky integration of post-synaptic spiking activity with a relatively slow time constant τ_C:

$$\frac{dC(t)}{dt} = -\frac{1}{\tau_C}\, C(t) + J_C \sum_i \delta(t - t_i) \qquad (2.4)$$
where J_C is the contribution of a single post-synaptic spike. The synapse is potentiated by a small amount a if V is above a pre-determined threshold θ_V and C is within set limits θ_up^l and θ_up^h. Similarly, the synapse is de-potentiated by an amount b if V is less than or equal to θ_V and C is within a different pair of bounds θ_down^l and θ_down^h:

$$X \to X + a \quad \text{if } V(t_{pre}) > \theta_V \ \text{and}\ \theta_{up}^{l} < C(t_{pre}) < \theta_{up}^{h}$$
$$X \to X - b \quad \text{if } V(t_{pre}) \le \theta_V \ \text{and}\ \theta_{down}^{l} < C(t_{pre}) < \theta_{down}^{h} \qquad (2.5)$$
If no modification is triggered by the conditions in (2.5) (including in the absence of pre-synaptic spikes), X drifts towards one of two stable states depending on whether it is greater than a threshold value θ_X:

$$\frac{dX}{dt} = \alpha \quad \text{if } X > \theta_X$$
$$\frac{dX}{dt} = -\beta \quad \text{if } X \le \theta_X \qquad (2.6)$$
where α and β are positive constants.
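The rule of (2.4)–(2.6) can be gathered into a single per-time-step update for one synapse, as sketched below. The parameter names are placeholders and the clamping of X to [0, 1] is an assumption consistent with the description of bi-stable weights; no values from the study are implied.

```python
def bistable_stdp_step(X, C, V_post, pre_spike, post_spike, dt, p):
    """One time step of the bi-stable STDP rule of Sect. 2.2.2 for one synapse.
    X: synaptic efficacy, C: calcium-like variable, V_post: post-synaptic
    membrane potential; pre_spike / post_spike are 0/1 indicators; p is a dict
    of parameters (placeholder names)."""
    # (2.4): leaky integration of post-synaptic spikes
    C = C - dt * C / p["tau_C"] + p["J_C"] * post_spike

    modified = False
    if pre_spike:  # (2.5): evaluated at the arrival of a pre-synaptic spike
        if V_post > p["theta_V"] and p["theta_up_l"] < C < p["theta_up_h"]:
            X += p["a"]
            modified = True
        elif V_post <= p["theta_V"] and p["theta_down_l"] < C < p["theta_down_h"]:
            X -= p["b"]
            modified = True

    if not modified:  # (2.6): drift towards one of the two stable states
        X += dt * (p["alpha"] if X > p["theta_X"] else -p["beta"])

    return min(max(X, 0.0), 1.0), C
```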
2.2.3 Training

Each of the stimuli used in these experiments consists of a pattern of current injection into the units representing neurons of the inferior colliculus; these are marked IC in Fig. 2.1. Although, for simplicity, these patterns of current injection are not derived from audio files via a cochlear model, they can be thought of as time-varying patterns of activity across the tonotopic axis represented by the one-dimensional array of IC cells.
Fig. 2.2 Example stimuli used in training the network. Each example, in common with all such stimuli used in the experiments, has a 5 Hz amplitude modulation rate. (a) A stimulus with no FM component, and (b) a stimulus with a slowly upward-moving FM component (determined by θd). In the work reported here stimuli varied only in the value of θd
2.2.3.1 Parametric AM/FM Stimuli

The stimuli were all of the form given by (2.7) below:

$$z(t, c) = \frac{\big(\cos(2\pi t \theta_x) + 1\big)\,\big(\cos(2\pi(c\theta_c + t\theta_d)) + 1\big)}{4} \qquad (2.7)$$
where z(t, c) is the value of the current injection at time t and in channel c. The parameters θx, θc and θd can be adjusted to give sweeps or gratings that move along the tonotopic axis with time, and also patches of stimulation that have a temporal amplitude modulation (AM) but no frequency modulation (FM) component. Examples of such stimuli are shown in Fig. 2.2. The value of θx, the temporal modulation rate, was fixed at 5 for all experiments. This value was chosen because of the inherent low-pass nature of the thalamocortical projections caused by the depressing synapses (see Fig. 2.1); hence stimuli with temporal modulation rates much greater than 5 would drive the cortical receiving layer only weakly. In addition, rates of temporal modulation around 4–5 Hz are important for communication signals, such as the syllable rate of human speech. The value of θc, the spectral density, was fixed at 2. For each experiment one value of θd was chosen as the training stimulus. The network was then exposed to 50 epochs (each 2 seconds) of this stimulus with the learning rule turned on. Between each of these learning phases the response of the network was recorded to 10 other stimuli. These are referred to as test stimuli, with a range of values for θd, both positive and negative. Each test and training stimulus was separated from the previous one by ≈300 ms of random current injection at the same mean level as the stimuli, and the phase of the stimulus was advanced by a random value from 0 to 2π to ensure that neither the training nor the test stimuli were presented starting at the same phase in all cases; this process is summarised in Fig. 2.3. In the results section we consider networks trained with the values for θd of −10, −5, 0, 5, 10. Fixing θd at integer multiples of θx produces stimuli with similar temporal characteristics in that the maxima of the current injections occur in the same channels with each presentation.

2.2.3.2 Random Chord Stimuli

We also consider results of training with stimuli that consist of injections of current in channels chosen at random (P = 0.1) for short periods of time chosen from a uniform distribution from 20–60 ms. These noise-like 'random chord' stimuli are more suitable than random current injections representing white noise, which drive the cortical receiving layer only very weakly.
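The two stimulus classes of this section can be sketched as below. The normalisation of the channel coordinate to [0, 1], the sampling rate, and the reading of the 'random chords' as successive chords of a common random duration are assumptions made for illustration.

```python
import numpy as np

def am_fm_stimulus(theta_d, theta_x=5.0, theta_c=2.0, n_channels=500,
                   duration=2.0, dt=1e-3, t0=0.0):
    """Current injection z(t, c) of (2.7); t0 implements the random phase
    advance between presentations.  theta_d = 0 gives a pure AM stimulus."""
    t = (np.arange(0.0, duration, dt) + t0)[:, None]      # time (rows)
    c = np.linspace(0.0, 1.0, n_channels)[None, :]        # tonotopic axis (columns)
    am = np.cos(2 * np.pi * t * theta_x) + 1.0
    fm = np.cos(2 * np.pi * (c * theta_c + t * theta_d)) + 1.0
    return am * fm / 4.0

def random_chords(n_channels=500, duration=2.0, dt=1e-3, p=0.1,
                  dur_range=(0.020, 0.060), rng=None):
    """Noise-like stimulus of Sect. 2.2.3.2: channels chosen with probability p
    are driven for a duration drawn uniformly from 20-60 ms, then redrawn."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(duration / dt)
    z = np.zeros((n_steps, n_channels))
    k = 0
    while k < n_steps:
        active = rng.random(n_channels) < p
        length = max(1, int(rng.uniform(*dur_range) / dt))
        z[k:k + length, active] = 1.0
        k += length
    return z
```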
Fig. 2.3 One epoch of the training process. The sequence consists of (a) noise; before every stimulus there was 300 ms of random current injection at a mean level the same as the stimuli, (b) training; one value of θd was chosen as the FM component of the training stimulus, (c1 , . . . , cn ) testing; the response of the network was recorded during 10 test stimuli with different values of θ with the learning rule switched off. Each experiment consisted of 30 of these epochs Fig. 2.4 The distribution of the directional sensitivity (Sθ , see (2.8)) after training with ‘noise like’ random chord stimulus. The results obtained after 30 epochs (see Fig. 2.3) show no clear sensitivity to any preferred FM sweep rate
2.3 Results

Of interest are both the responses of the network and the responses of the individual neurons. In both cases the questions are: (a) what, if any, aspect of the stimulus is the individual neuron or network of neurons sensitive to after training, and (b) is the distribution of sensitivities different for different training stimuli? In the results presented here the responses were measured in the thalamic (MGB) section of the network, as this is the locus of plasticity in the current study, and we will consider results using the training stimuli θd = −10, −7, −5, 0, 5, 7, 10. We will also consider results from training with random chord stimuli (see Sect. 2.2.3). To help us describe the results we can define the direction sensitivity S of a neuron as the log ratio of the spike count R at any given θ for the up and down versions of the stimulus:

$$S = \log \frac{R_{\theta}^{\max}}{R_{\theta}^{\min}} \qquad (2.8)$$
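For concreteness, the sketch below computes S and the signed preferred rate described in the following paragraph from spike counts recorded for the up and down versions of each tested |θ|; the data layout and the small constant guarding against zero counts are assumptions.

```python
import numpy as np

def direction_sensitivity(counts_up, counts_down, thetas, eps=1e-9):
    """S(theta) = log(R_max / R_min) per eq. (2.8), with the larger of the up
    and down spike counts in the numerator, and the signed preferred rate
    S_theta (positive for an 'up' preference, negative for 'down')."""
    up = np.asarray(counts_up, dtype=float)
    down = np.asarray(counts_down, dtype=float)
    s = np.log((np.maximum(up, down) + eps) / (np.minimum(up, down) + eps))
    best = int(np.argmax(np.abs(s)))
    sign = 1.0 if up[best] >= down[best] else -1.0
    return s, sign * thetas[best]
```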
The value of θ which gives the maximum absolute value of S is the rate of change to which the neuron is most sensitive, which we will indicate as Sθ. Because the highest spike count is always in the numerator of (2.8), we append the sign representing the direction which gives the highest spike count to indicate the preferred direction. The first experiment was to determine the stimulus preferences of the MGB neurons in the case where the training stimulus consists of random chords. The resulting directional sensitivities are shown in Fig. 2.4. As can be seen in Fig. 2.4, there are neurons exhibiting all directional sensitivities but no clear pattern emerges during training. The network was then exposed to stimuli having no FM component, that is, with θd = 0. Figure 2.5(a) shows the spike count for each test stimulus at each training epoch and Fig. 2.5(b) shows the resulting distribution of values for Sθ.
Fig. 2.5 Training with stimulus θd = 0. (a) The summed spike count for the whole network for each of the ten test stimuli over the training period, and (b) the resulting distribution of directional sensitivities among individual MGB cells. No error bars are shown in (a) as this illustrates a single example training run. Note that the symbols representing up and down stimuli for all θ are superimposed, indicating no overall direction preference
Fig. 2.6 Weights of NMDA synapses after training with (a) stimuli with θd = 0 (no FM component) and (b) with θd = +5. The dotted diagonal indicates connections between cortical and thalamic cells at the same position in the tonotopic axis. In the case of training with θd = −5 (not shown) a pattern similar to (b) emerges but as a mirror image reflected in the dotted diagonal
distribution of values for Sθ. The response of the network shows no overall sensitivity to the direction of frequency modulation, although Fig. 2.5(b) shows that there are direction-sensitive neurons in the MGB after training and that these are predominantly at θ = ±5 and θ = ±10. In the previous section the training stimuli had no FM component, but patterns with a range of values for θd were used as test stimuli. In the next set of experiments the same test stimuli were used, but stimuli with a single value of θd were used for training. To illustrate the influence of the FM component of the stimulus on the pattern of weights, Fig. 2.6 shows the pattern of corticothalamic projection weights in the NMDA synapses after exposure to stimuli with and without FM components. The nature of the learning rule is such that the overwhelming majority of synapses will have connection weights of either one or zero, so the patterns of weights can be thought of as a connectivity matrix. It can be seen that the emergent connectivity for the stimulus without an FM component, Fig. 2.6(a), is symmetrical around the diagonal indicated with a dashed line. Points on this diagonal represent connections between cortical cells and MGB cells at the same position on the tonotopic axis, that is, with a column separation d = 0 (see (2.3)). In contrast, the connectivity pattern that emerges after training using a stimulus with an FM component, Fig. 2.6(b), exhibits an asymmetry about this same diagonal. In the case illustrated the FM component used in training was
Fig. 2.7 Training with stimulus θd = ±5. (a) and (c) the summed spike count for the whole network for each of the ten test stimuli over an example training period. A small difference in the network response to up and down stimuli is evident at θ = ±5 and θ = ±10. (b) and (d) the resulting distribution of directional sensitivities among individual MGB cells
'up', and in the case where the equivalent 'down' stimulus was used the pattern was the mirror image (not shown). Figure 2.7 shows two results corresponding to Fig. 2.5 but after training with stimuli having θd = ±5. An asymmetry emerges in the response of the whole network, and this can be seen in the distribution of responses of the individual neurons. The network trained with up stimuli exhibits a majority of neurons with a greater sensitivity for down stimuli, but only at θ = 5; however, an emergent preference for up stimuli is visible at θ = 10. This apparent contradiction might be due to elevated firing rates in the first few milliseconds of the preferred stimulus causing the thalamocortical synapses to depress, thus lowering the overall spike count for the training stimulus. Work is underway to confirm this hypothesis. A corresponding result can be seen in Fig. 2.8 for training with stimuli θd = ±10, which shows a pattern consistent with the interpretation of Fig. 2.7. Few neurons show directional sensitivity for the training stimulus, but there is an increase in those with maximal directional sensitivity in the opposite direction, and in the same direction for twice the FM rate.
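To make the directional-sensitivity measure of (2.8) concrete, the following hedged Python sketch computes the signed sensitivity S and the preferred FM rate Sθ from spike counts recorded for up- and down-moving test stimuli. The data layout (dictionaries keyed by |θ|) and the handling of zero counts are assumptions, not part of the original analysis code.

```python
import numpy as np

def directional_sensitivity(rates_up, rates_down):
    """Signed direction sensitivity per |theta| and the preferred FM rate S_theta.

    rates_up / rates_down map an FM rate magnitude |theta| to the spike count for
    the up- and down-moving versions of the test stimulus (cf. (2.8)).
    Assumes both counts are nonzero; add a small epsilon otherwise.
    """
    S = {}
    for theta in rates_up:
        r_up, r_down = rates_up[theta], rates_down[theta]
        s = np.log(max(r_up, r_down) / min(r_up, r_down))  # highest count in numerator
        sign = 1.0 if r_up >= r_down else -1.0             # sign of preferred direction
        S[theta] = sign * s
    best = max(S, key=lambda th: abs(S[th]))               # |theta| with max |S|
    s_theta = np.sign(S[best]) * best                      # signed preferred rate, e.g. +5, -10
    return S, s_theta
```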
2.4 Discussion
The functional organization of the auditory system can be altered in vivo by repeated stimulation (e.g. [26]), and it has also been shown in studies of the visual system [23] that activity-dependent plasticity in cortico-thalamic connections operates during early developmental stages. The work presented here
Fig. 2.8 Training with stimulus θd = ±10. (a) and (c) the summed spike count for the whole network for each of the ten test stimuli over an example training period. A small difference in the network response to up and down stimuli is evident at θ = ±10. (b) and (d) the resulting distribution of directional sensitivities among individual MGB cells
represents the first attempt to model this sort of developmental plasticity in a biophysically motivated artificial network of spiking units. The results show that STDP allows the model, which exhibits lateral connectivity and recurrent connections, to adapt to the stimuli to which it is exposed. This adaptation can, we have shown, exploit a range of axonal delays in order to represent correlations in activity at different times along a spatially defined axis, such as the tonotopic axis reported in primary auditory cortex. In this way sensitivity to spectrotemporal features can emerge through exposure to stimuli, both in individual neurons and in neuronal ensembles. The results provide a model which may help us understand how efferent cortical pathways to thalamic nuclei might be crucial in the development of the auditory and other sensory systems, and also how these might support other forms of plasticity such as attentional, or task-related, modulation of thalamocortical transmission.
Acknowledgements This work is funded by EU FP7-ICT-231168 SCANDLE, and EPSRC EP/C010841/1 COLAMN.
References 1. Atzori, M., Lei, S., Evans, D.I., Kanold, P.O., Phillips-Tansey, E., McIntyre, O., McBain, C.J.: Differential synaptic processing separates stationary from transient inputs to the auditory cortex. Nat. Neurosci. 4(12), 1230–1237 (2001). doi:10.1038/nn760
2. Brader, J.M., Senn, W., Fusi, S.: Learning real-world stimuli in a neural network with spike-driven synaptic dynamics. Neural Comput. 19(11), 2881–2912 (2007). doi:10.1162/neco.2007.19.11.2881 3. Coath, M., Balaguer-Ballester, E., Denham, S.L., Denham, M.: The linearity of emergent spectro-temporal receptive fields in a model of auditory cortex. Biosystems 94(1–2), 60–67 (2008). doi:10.1016/j.biosystems.2008.05.011 4. Coath, M., Brader, J.M., Fusi, S., Denham, S.L.: Multiple views of the response of an ensemble of spectro-temporal features support concurrent classification of utterance, prosody, sex and speaker identity. Network 16(2–3), 285– 300 (2005) 5. Coath, M., Denham, S.L.: Robust sound classification through the representation of similarity using response fields derived from stimuli during early experience. Biol. Cybern. 93(1), 22–30 (2005). doi:10.1007/s00422-005-0560-4 6. Destexhe, A., Rudolph, M., Paré, D.: The high-conductance state of neocortical neurons in vivo. Nat. Rev., Neurosci. 4(9), 739–751 (2003). doi:10.1038/nrn1198 7. Ergenzinger, E.R., Glasier, M.M., Hahm, J.O., Pons, T.P.: Cortically induced thalamic plasticity in the primate somatosensory system. Nat. Neurosci. 1(3), 226–229 (1998). doi:10.1038/673 8. Gerstner, W., Kistler, M.: Spiking Neuron Models. Cambridge University Press, Cambridge (2002) 9. Guillery, R.W., Feig, S.L., Lozsádi, D.A.: Paying attention to the thalamic reticular nucleus. Trends Neurosci. 21(1), 28–32 (1998) 10. Guillery, R.W., Sherman, S.M.: Thalamic relay functions and their role in corticocortical communication: generalizations from the visual system. Neuron 33(2), 163–175 (2002) 11. Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154 (1962) 12. Illing, R.-B.: Maturation and plasticity of the central auditory system. Acta Oto-Laryngol., Suppl. 552(552), 6–10 (2004) 13. Jacob, V., Brasier, D.J., Erchova, I., Feldman, D., Shulz, D.E.: Spike timing-dependent synaptic depression in the in vivo barrel cortex of the rat. J. Neurosci. 27(6), 1271–1284 (2007). doi:10.1523/JNEUROSCI.4264-06.2007 14. Marie, R.L.S., Stanforth, D.A., Jubelier, E.M.: Substrate for rapid feedforward inhibition of the auditory forebrain. Brain Res. 765(1), 173–176 (1997) 15. Rao, V.R., Finkbeiner, S.: NMDA and AMPA receptors: old channels, new tricks. Trends Neurosci. 30(6), 284–291 (2007). doi:10.1016/j.tins.2007.03.012 16. Shouval, H.Z., Bear, M.F., Cooper, L.N.: A unified model of MNDA receptor-dependent bidirectional synaptic plasticity. Proc. Natl. Acad. Sci. USA 99(16), 10831–10836 (2002). doi:10.1073/pnas.152343099 17. Suga, N., Xiao, Z., Ma, X., Ji, W.: Plasticity and corticofugal modulation for hearing in adult animals. Neuron 36(1), 9–18 (2002) 18. Thomson, A.M., Bannister, A.P.: Interlaminar connections in the neocortex. Cereb. Cortex 13(1), 5–14 (2003) 19. Tsodyks, M.V., Markram, H.: The neural code between neocortical pyramidal neurons depends on neurotransmitter release probability. Proc. Natl. Acad. Sci. USA 94(2), 719–723 (1997) 20. Wang, X.: The unexpected consequences of a noisy environment. Trends Neurosci. 27(7), 364–366 (2004). doi:10.1016/j.tins.2004.04.012 21. Winer, J.A., Marie, R.L.S., Larue, D.T., Oliver, D.L.: GABAergic feedforward projections from the inferior colliculus to the medial geniculate body. Proc. Natl. Acad. Sci. USA 93(15), 8005–8010 (1996) 22. 
Winer, J.A., Miller, L.M., Lee, C.C., Schreiner, C.E.: Auditory thalamo-cortical transformation: structure and function. Trends Neurosci. 28(5), 255–263 (2005). doi:10.1016/j.tins.2005.03.009 23. Yoshida, M., Satoh, T., Nakamura, K.C., Kaneko, T., Hata, Y.: Cortical activity regulates corticothalamic synapses in dorsal lateral geniculate nucleus of rats. Neurosci. Res. 64(1), 118–127 (2009). doi:10.1016/j.neures.2009.02.002 24. Zhang, L., Bao, S., Merzenich, M.: Persistent and specific influences of early acoustic environments on primary auditory cortex. Nat. Neurosci. 4(11), 1123–1130 (2001) 25. Zhang, L.I., Bao, S., Merzenich, M.M.: Disruption of primary auditory cortex by synchronous auditory inputs during a critical period. Proc. Natl. Acad. Sci. USA 99(4), 2309–2314 (2002) 26. Zhang, Y., Yan, J.: Corticothalamic feedback for sound-specific plasticity of auditory thalamic neurons elicited by tones paired with basal forebrain stimulation. Cereb. Cortex 18(7), 1521–1528 (2008). doi:10.1093/cercor/bhm188
Chapter 3
STDP Pattern Onset Learning Depends on Background Activity
James Humble, Steve Furber, Susan L. Denham, and Thomas Wennekers
Abstract Spike-timing-dependent plasticity is a learning mechanism used extensively within neural modelling. The learning rule has previously been shown to allow a neuron to learn a repeated spatiotemporal pattern among its afferents and to respond at its onset. In this study we reconfirm these previous results and additionally show that such learning is dependent on background activity. Furthermore, we found that the onset learning is unstable in a noisy setting. Specifically, if the level of background activity changes during learning, the response latency of a neuron may increase, and as additional noise is added the distribution of response latencies degrades. Consequently, we present preliminary insights into the neuron's encoding: a neuron may encode the coincidence of spikes from a subsection of a stimulus' afferents, but the temporal precision of the onset response depends on some background activity, which must be similar to that present during learning.
3.1 Introduction
Spike-timing-dependent plasticity (STDP) is a well-established mechanism that permits spike time differences between pre- and postsynaptic activity to effect changes in synaptic efficacy [1–3, 10, 15, 21]. As with other forms of synaptic plasticity (such as long-term potentiation and long-term depression), a synapse's efficacy can either increase or decrease; with STDP, however, if presynaptic activity precedes postsynaptic activity the conjoining synapse is strengthened, and if the reverse is observed the synapse is depressed. Furthermore, the amount of efficacy change depends on the precise timing between the pre- and postsynaptic activity: if presynaptic activity occurs just a few milliseconds before postsynaptic activity, the resulting synaptic change will be greater than in a case where the interval was much longer. This amount of change for different intervals is given by an STDP function. A commonly used STDP function is an exponential decay, with maximal synaptic change for minimal temporal difference between pre- and postsynaptic activity. This form of synaptic plasticity relying on temporal differences between pre- and postsynaptic activity was found using electrophysiology; for example, Bell et al. [1] found that in cerebellum-like structures in fish synaptic plasticity depended on the sequence of pre- and postsynaptic events, and Markram et al. [10] found a similar effect in pyramidal neurons. Plasticity rules have been studied extensively and are commonly used as a substrate of many forms of learning [8, 9, 19]. Linsker [8] demonstrated that structured receptive fields could be developed through Hebb-type synaptic plasticity, and MacKay and Miller [9] analysed these results and Linsker's simulations in terms of eigenvectors and their eigenvalues. Further, since the discovery of temporal
J. Humble () School of Computing and Mathematics, University of Plymouth, Plymouth, UK e-mail:
[email protected]
synaptic plasticity, many potential benefits have been proposed. For example, STDP can (1) increase the mutual information between the inputs and outputs of simple networks ([13] used information theory to quantify learning performance), (2) provide a function for Hebbian learning and development, and (3) capture the causality in determining the direction of synaptic change that is implied by Hebb's original postulate. Plasticity rules based on such temporal differences have consequently been studied extensively. Plasticity rules have been applied successfully to simple pattern learning and to relatively more complicated competitive pattern learning [5, 6, 11, 12, 14, 18]. Guyonneau et al. [6] found that when a neuron with STDP synapses is presented with a repeated stimulus, its synapses come to favour the first spikes in the stimulus, and Masquelier et al. [14] found that a trace rule was able to learn visual complex-cell-like receptive fields. Of particular interest to this study is research by Masquelier and colleagues. Masquelier et al. demonstrated that a leaky integrate-and-fire neuron equipped with STDP can learn repeated spatio-temporal spike patterns even when embedded in a statistically identical distractor signal [11]—a somewhat non-trivial task. The neuron was able to distinguish the repeated pattern from a background signal even though the firing rates of both were equal. Furthermore, after the neuron had learnt a part of the pattern it reduced its response latency—relative to the beginning of the pattern—until it responded within a few milliseconds after the onset of the pattern. It is proposed that such learning may take advantage of the view that a pattern is a succession of spike coincidences; these coincidences, combined with STDP's pre- and postsynaptic temporal considerations, form the basis of many synaptic plasticity learning rules. Masquelier et al. [12] extended their network with multiple neurons. As in the previous work each neuron was learning a repeated spatio-temporal spike pattern; however, the neurons competed to learn a pattern through lateral inhibition. Due to this inhibitory competition each neuron learnt a different segment of a pattern. Specifically, they found that one neuron learnt to respond near the onset, as in their previous work, while the remaining neurons then 'stacked' on this neuron so that their response latencies increased. Overall, the neurons therefore learnt the whole pattern. Envisage a neuron that has learnt to fire after a subset of its afferents present a pattern, as in the work by Masquelier et al. [11]. In the present work we first reconfirm the results of Masquelier and colleagues. We subsequently demonstrate this neuron's high sensitivity to, and dependence on, background activity unrelated to the pattern: we found that if the level of background activity differs between learning and recall, the latency of recall is modified. Furthermore, we suggest that the temporal accuracy of the onset learning is unstable in a noisy framework, viz. that the latency distribution degrades with noise. Subsequently, insights into the neuron's encoding are also presented.
3.2 Methods
Simulations were performed using custom-made C software. Source code is available from the authors upon request.
3.2.1 Network Structure and Afferent Input The neural network structure was similar to that of Masquelier et al. [11]: 2000 afferents converge on one neuron. These 2000 afferents carry Poisson spike trains which were produced ‘on the fly’, with 1000 of these afferents occasionally conveying a repeated spatio-temporal pattern of 50 ms duration. To produce this input, a simulation is segmented into 50 ms windows. During each subsequent bin, afferents communicating the pattern have a 0.25 probability of displaying the pattern; the remaining
afferents consistently project random Poisson activity. (The pattern was not presented consecutively: this ensured that each pattern presentation was preceded and followed by random activity.) Poisson spike trains were 54 Hz initially, with 10 Hz further noise added to all afferents, including those of the pattern. This protocol is similar to that of Masquelier et al. but allows on-line production of the afferents' inputs. Analysis was carried out to ensure that the pattern was statistically identical to the random distractor signal. Each full simulation ran for up to 3000 s. During typical simulations it was found that any individual synaptic weight could converge (change from one synaptic weight limit to the other) in 200 s; therefore synaptic weight values were recorded every 2 s, allowing sufficient visualisation of synaptic weight trajectories and distributions.
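A minimal sketch of this input-generation protocol is given below, assuming a 1 ms time bin and a binary spike representation; the exact way consecutive pattern presentations are avoided and the frozen-pattern statistics are simplifying assumptions.

```python
import numpy as np

def make_afferent_input(total_ms=2000, dt=1.0, n_aff=2000, n_patt=1000,
                        patt_ms=50, base_hz=54.0, noise_hz=10.0,
                        p_patt=0.25, rng=None):
    """Binary spike matrix (time steps x afferents) with an embedded repeated pattern.

    The simulation is split into 50 ms windows; in each window the first n_patt
    afferents replay a frozen pattern with probability p_patt (never in two
    consecutive windows), otherwise all afferents fire as 54 Hz Poisson trains.
    10 Hz Poisson noise is then added to every afferent, including pattern afferents.
    Returns the spike matrix and the pattern onset times in ms.
    """
    rng = np.random.default_rng() if rng is None else rng
    steps, win = int(total_ms / dt), int(patt_ms / dt)
    p_base = base_hz * dt / 1000.0
    spikes = (rng.random((steps, n_aff)) < p_base).astype(np.uint8)
    pattern = (rng.random((win, n_patt)) < p_base).astype(np.uint8)  # frozen pattern
    onsets, prev_was_pattern = [], True   # keep the first window random
    for w in range(steps // win):
        if not prev_was_pattern and rng.random() < p_patt:
            spikes[w * win:(w + 1) * win, :n_patt] = pattern
            onsets.append(w * win * dt)
            prev_was_pattern = True
        else:
            prev_was_pattern = False
    spikes |= (rng.random((steps, n_aff)) < noise_hz * dt / 1000.0).astype(np.uint8)
    return spikes, onsets
```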
3.2.2 Neuron Model
A leaky integrate-and-fire neuron model was used, with the membrane potential modelled by (3.1) and the synapses modelled as alpha functions (3.2), where τm = 10 ms is the membrane time constant, τr = 1 ms is the rise time constant of the alpha synapse and τf = 5 ms is the fall time constant; I is a spike of value equal to the synaptic weight. Input activity consists of pulses of current of duration dt. A threshold of θ = 1 was used.

τm dV/dt = −V + Sf;   if V ≥ θ then reset V = 0    (3.1)
τr dSr/dt = −Sr + I
τf dSf/dt = −Sf + Sr    (3.2)
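The following forward-Euler sketch of (3.1)–(3.2) is illustrative only; the integration scheme, the 1 ms default step and the way each presynaptic spike is injected as a one-step current pulse are assumptions rather than the authors' C implementation.

```python
import numpy as np

def simulate_lif(spike_input, weights, dt=1.0, tau_m=10.0, tau_r=1.0,
                 tau_f=5.0, theta=1.0):
    """Forward-Euler integration of the LIF neuron (3.1) with alpha synapses (3.2).

    spike_input: (steps x n_aff) 0/1 matrix; weights: (n_aff,) synaptic weights.
    Each presynaptic spike injects I equal to its synaptic weight for one step.
    Returns the membrane trace and the postsynaptic spike times in ms.
    """
    steps = spike_input.shape[0]
    V = Sr = Sf = 0.0
    v_trace, out_spikes = np.zeros(steps), []
    for t in range(steps):
        I = float(spike_input[t] @ weights)     # summed weighted input this step
        Sr += dt / tau_r * (-Sr + I)            # rise stage of the alpha synapse
        Sf += dt / tau_f * (-Sf + Sr)           # fall stage drives the membrane
        V += dt / tau_m * (-V + Sf)
        if V >= theta:                          # threshold crossing: spike and reset
            out_spikes.append(t * dt)
            V = 0.0
        v_trace[t] = V
    return v_trace, out_spikes
```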
3.2.3 Spike-Timing Dependent Plasticity
Spike-timing-dependent plasticity was modelled by an additive exponential STDP rule: Wt+dt = Wt + f(τ). Equation (3.3) describes the STDP function used, where τp = τm = 20 ms. These values of τp and τm were chosen to be similar to those observed experimentally [4, 7, 15, 20, 22], where the strongest synaptic modifications occurred within a window of ±20 ms. These constants differ from other studies such as Masquelier et al. [11, 12], who used τp = 16.8 ms and τm = 33.7 ms.

f(τ) = Ap × exp(−τ/τp)   if τ ≥ 0
f(τ) = Am × exp(τ/τm)    if τ < 0    (3.3)

Learning rates were assigned with (3.4), where Wmax is the maximum synaptic weight (cf. Sect. 3.2.4). The assignment of Am is in accordance with the finding by Song et al. [18] that the ratio |Am|/Ap needs to be slightly larger than one to maintain reasonable postsynaptic activity.

Ap = 0.002 × Wmax
Am = −Ap × (τp/τm) × 1.05    (3.4)
Furthermore, the cumulative change in synaptic weights resulting from random activity should be negative; see (3.5).

∫ from −τm to τp of f(τ) dτ < 0    (3.5)
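A short sketch of the STDP window (3.3), the learning-rate assignment (3.4) and the sanity check (3.5) might look as follows; the numerical integration of the window is an illustrative approximation.

```python
import numpy as np

tau_p = tau_m = 20.0   # ms, STDP time constants of this chapter (not the membrane tau_m)

def stdp_f(tau, A_p, A_m):
    """Additive exponential STDP window (3.3); tau = t_post - t_pre in ms."""
    return np.where(tau >= 0, A_p * np.exp(-tau / tau_p), A_m * np.exp(tau / tau_m))

def learning_rates(w_max):
    """Learning-rate assignment of (3.4)."""
    A_p = 0.002 * w_max
    A_m = -A_p * (tau_p / tau_m) * 1.05
    return A_p, A_m

# Sanity check of (3.5): the window integrated over [-tau_m, tau_p] is negative,
# so uncorrelated (random) pre/post activity depresses synapses on average.
A_p, A_m = learning_rates(w_max=1.0)
taus = np.linspace(-tau_m, tau_p, 4001)
integral = stdp_f(taus, A_p, A_m).sum() * (taus[1] - taus[0])
assert integral < 0
```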
3.2.4 Maximum Synaptic Weight Many studies into STDP pattern learning use weight bounding to control synaptic change; such saturation limits stop unstable forms of STDP based learning changing synaptic efficacies ad infinitum. Furthermore, the maximum synaptic weight can have a great impact on the learning process. If the maximum synaptic weight is too low, postsynaptic activity may not reach threshold, and if it is too high, postsynaptic activity may be erratic and uninformative due to a neuron reaching threshold from noisy input. We therefore implemented the maximum synaptic weight with (3.6), where v is the difference in membrane potential required to go from resting to threshold (v = 1), r(t) is the average firing rate of afferents, Ninput is the number of afferents in the pattern (here 1000) and A is an additional constant with arbitrary units. Wmax =
( [v / (τm × r(t) × dt)] + A ) / Ninput    (3.6)
Therefore, after learning, one envisages that all synapses participating in a pattern are at Wmax and the rest are at, or near, 0, and thus just the afferents in the pattern will evoke an action potential from the output neuron. Furthermore, A can be adjusted to allow fluctuations in the membrane potential to reach the neuron's threshold because (1) not all afferents in the pattern may be learnt through STDP, (2) synapses are not guaranteed to converge fully to the weight boundaries, (3) the afferents are not guaranteed to fire at the average firing rate or sufficiently within the membrane time constant, and (4) each presynaptic spike is represented by an alpha function, not a Dirac delta spike. To this end, an appropriately selected value (considering the above four points) can allow STDP to be affected by positive fluctuations in input and membrane potential—found to be crucial by Song et al. [18] and designed into a learning rule by Senn et al. [17]. It was found that STDP trained best with A = 20. Crucially, (3.6) is a 'best guess' approach to assigning maximum synaptic weights because the 'best' value will depend on how many afferents are fully potentiated at the end of learning, and this changes between simulations depending on the pattern, chance and the random initial synaptic weights. Specifically, the rheobase will stay the same across simulations but the number of synapses contributing and their individual contributions will change. Ideally some neural mechanism could be used to change the maximum synaptic weight during learning, possibly depending on pre- and/or postsynaptic activity.
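As a worked example of (3.6), the snippet below evaluates Wmax using the parameters quoted in this chapter; the time step dt = 1 ms and the use of the 64 Hz background rate for r(t) are assumptions, since the text leaves both implicit.

```python
def w_max(v=1.0, tau_m=10.0, r_hz=64.0, dt=1.0, n_input=1000, A=20.0):
    """Maximum synaptic weight of (3.6); times in ms, rate converted to spikes/ms."""
    r = r_hz / 1000.0
    return (v / (tau_m * r * dt) + A) / n_input

print(w_max())   # (1/(10 * 0.064 * 1) + 20) / 1000, i.e. roughly 0.022
```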
3.2.5 Initial Synaptic Weights The initial strength of synaptic weights can be as important as their maximum efficacy. Several different approaches are commonly used to set initial synaptic weights; the approach we used is to set synaptic weights w with random values drawn from a Uniform distribution between 0 and Wmax as in (3.7). This results in the output neuron firing frequently during the first phase of the simulation (cf. Sect. 3.3.1). 0 < w ≤ Wmax
(3.7)
Initial synaptic weights should be set such that fluctuations from all afferents cause the output neuron to fire frequently. This is important because it allows all synapses contributing to postsynaptic activity to be affected by STDP [18], and thus potentially potentiated or depressed.
Fig. 3.1 Postsynaptic latency—relative to the pattern start—as a function of discharges. When the neuron discharges outside of the pattern a latency of 0 is shown. The STDP clearly learns the pattern and has similar periods to those observed by Masquelier et al.: (1) when the neuron is non-selective to the pattern and most synaptic weights are being depressed due to ∫ from −τm to τp of f(τ) dτ < 0; (2) when the neuron is training to the beginning of the pattern; and (3) when the neuron consistently fires within the pattern
3.3 Results
During any typical simulation, a neuron's responses can be characterised by three temporal phases—first observed by Masquelier et al. [11]. In our simulations the first phase is identical to [11], the second differs slightly and the third is unstable; it is these latter two phases that we investigate in more detail in this section. Specifically, the firing rate of the background activity—consisting of the afferents not transmitting the pattern during a pattern presentation, and of all afferents otherwise—stays constant (64 Hz), as described in the methods, but its effect on the neuron's membrane potential is modulated through the synapses, and as these are plastic the resulting effect is non-trivial.
3.3.1 Typical Results
A typical simulation of the spiking model for 1000 s is depicted in Fig. 3.1. There are three clear phases within the simulation: (1) the neuron fires non-specifically to the pattern, as the initial synaptic weights are high; (2) the neuron has recognised the repeating pattern and reduces its latency to the onset, until ≈10 ms, where it can no longer reduce the latency, possibly due to τm = 10 ms; and (3) the neuron responds near the onset of the pattern but then drifts counterproductively, increasing the latency a little. These results reinforce those of Masquelier et al., viz. that a neuron equipped with STDP is able to effectively find and train to a repeated spatio-temporal pattern embedded within a statistically identical signal. This is not surprising, as the mechanism presented by Masquelier et al. is very general. However, Masquelier et al. found that once a neuron had trained to a pattern it had "converged towards a fast and reliable pattern detector". In our simulations this was not the full story: we found that a learnt neuron's firing latency appeared to drift backward through the pattern. For example, in phase 3 of the simulation depicted in Fig. 3.1, the neuron does respond as early as ≈12 ms but then drifts towards the end of the pattern by ≈5 ms.
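The latency plots of Figs. 3.1 and 3.2e can be reproduced from spike times with a few lines of code; the sketch below follows the convention of reporting latency 0 for discharges outside any pattern presentation, and its interface (millisecond spike times, known pattern onsets) is an assumption.

```python
import numpy as np

def response_latencies(post_spikes_ms, pattern_onsets_ms, patt_ms=50.0):
    """Latency of each postsynaptic discharge relative to the enclosing pattern onset.

    Discharges that fall outside every presentation are reported with latency 0,
    mirroring the plotting convention of Fig. 3.1.
    """
    onsets = np.asarray(pattern_onsets_ms)
    latencies = []
    for t in np.asarray(post_spikes_ms):
        offsets = t - onsets
        inside = offsets[(offsets >= 0) & (offsets < patt_ms)]
        latencies.append(float(inside[0]) if inside.size else 0.0)
    return latencies
```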
Fig. 3.2 Results of a longer simulation (3000 s) including synaptic weight distributions and trajectories. (a) Depicts the evolution of the synaptic weight distribution of afferents in the pattern and (b) of those not in the pattern. (c) Shows the synaptic weight trajectories of 50 afferents in the pattern and (d) 50 of those not in the pattern. (e) Depicts discharge latency as a function of simulation time (cf. Fig. 3.1)
3.3.2 Long Simulation To study this drift in more detail longer simulations (3000 s) were run to see how much drift would occur and whether the latency would stabilise—Fig. 3.2 shows the results of such a simulation. Figure 3.2e depicts a neuron’s discharges as a function of simulation time. The typical phases described earlier (Sect. 3.3.1 and Fig. 3.1) are still visible, albeit compressed due to a change in the independent variable. Notably, although the neuron does respond as early as 13 ms it saturates at ≈18 ms. This drift backward through the pattern has a longer time scale than the second period where STDP reduces the latency.
To elucidate the cause of this backward drift, synaptic weight values were recorded every 2 s. Synaptic weight distributions and trajectories are fairly stable after 1000–2000 s and are bimodal (as found by [11, 16]). The period when STDP learns the pattern can be seen in the synaptic weight distributions and trajectories of the afferents transmitting the pattern (Figs. 3.2a and 3.2c): the synaptic weights quickly converge to either synaptic weight bound. The convergence of synaptic weights to the pattern is mostly complete before 250 s, with a small fraction of synaptic weights still varying throughout the rest of the simulation. In contrast, synapses not in the pattern take relatively longer to stabilise (≈1000–2000 s), with some never stabilising (Figs. 3.2b and 3.2d). These non-pattern synaptic weights are depressed over this longer period; this is the desired effect of STDP, as they are not conveying the repeated pattern. This longer time scale has a possible side effect, however: the latency drift described above. Explicitly, during the first phase the neuron does not fire specifically to the pattern, and thus STDP depresses most synaptic weights. This continues until pattern-driven fluctuations in the neuron's membrane potential facilitate STDP learning [18]: phase two. These fluctuations are based on a background activity level determined by the synaptic weights of all afferents at that time (≈200–250 s). Therefore, when STDP depresses non-pattern synaptic weights after the initial pattern learning has completed, this background level is consequently modified: it is reduced. We hypothesised that this reduction in background activity increases the time the neuron takes to temporally sum incoming spikes to threshold, and consequently produces the increase in latency visible in Fig. 3.2e.
3.3.3 Analysis of Drift
To test the hypothesised reliance of the output neuron's response latency on the level of background activity, an additional constant electrical current was applied to the output neuron after it had learnt a pattern, and remained present for the rest of the simulation. This additional constant current is analogous to raising the level of background activity. (We chose to inject current into the neuron instead of raising the background activity level through the afferents, as the former guarantees a set effective amount.) If a trained neuron has no significant reliance on the level of background activity and responds only to a short succession of coincidences from the subset of afferents it has learnt, such changes—if small relative to the membrane potential range—should not disturb the neuron's response latency; Fig. 3.3 shows the results of such an experiment. When the constant current is applied to the neuron at 2000 s there is a significant change in synaptic weights (Fig. 3.3a–d). Moreover, Fig. 3.3e shows a significant change in the neuron's firing latency during this additional current: the latency is reduced. Interestingly, the neuron's latency then drifts again. The synapses are plastic for the entirety of the simulation; if the synapses are fixed at their current strengths from 2000 s onwards this drift does not occur and the neuron consistently fires at the initially reduced latency (results not shown herein). An analysis of the synaptic weight distributions and trajectories from 2000 s suggests an explanation for this second drift: many afferents which convey the pattern change their efficacy significantly—either going from 0 to Wmax or vice versa. Accompanying these changes is a reduction in synaptic weights for afferents not conveying the pattern—similar to the initial training phases. The neuron is adjusting to its new background activity. To further clarify the reliance on background activity, Fig. 3.4 depicts a neuron's membrane potential. Figure 3.4a is from just after the neuron has trained to the pattern, and Fig. 3.4b is from later in the simulation, after some drift has occurred. The background activity is lower in Fig. 3.4b than in Fig. 3.4a, leading to a longer integration time to threshold and thus increasing the firing latency by ≈5 ms.
Fig. 3.3 Results of a long simulation (3000 s) with an additional injection of constant electrical current at 2000 s. As with Fig. 3.2, synaptic weight distributions ((a) and (b)) and synaptic weight trajectories ((c) and (d)) were recorded every 2 s. (e) clearly shows a decrease in latency at the current onset; the latency then increases and levels off
3.3.4 Impact of Noise
Up to this point neurons trained to the beginning of the pattern; however, if a small amount of Gaussian white noise is added to a neuron's membrane potential, this is no longer the case. Figure 3.5 depicts two typical results with noise: in Fig. 3.5a the shortest latency is ≈20 ms, and in Fig. 3.5b it is ≈38 ms. It appears the noise interferes with the learning phase (Fig. 3.1). In fact, with this noise we found that a neuron's learnt latency varied greatly; increasing amounts of noise changed the distribution of learnt response latencies. Depicted in Fig. 3.6a are learnt response latency distributions for simulations with different amounts of noise added to a neuron learning a 50 ms long pattern. Initially, with no or little noise, the neurons respond near the onset (Fig. 3.6a). However, as the amount of noise increases the neurons stop responding near the onset and respond later and later in the pattern. Furthermore, the structure of
Fig. 3.4 A neuron's membrane potential, illustrating how lower background activity leads to a longer integration time. (a) reaches threshold with a latency of ≈10 ms, whereas (b) does so with ≈15 ms. The patterns started at 251.25 s and 952.35 s
Fig. 3.5 Two typical results when Gaussian noise is added to the membrane potential. The neuron starts training to the beginning of the pattern but stops before reaching it
the distribution is lost at the highest levels of noise. To test whether this was an artefact of a pattern of length 50 ms, we also tried adding varying amounts of noise to patterns of length 75 and 100 ms (Fig. 3.6b): the amount of noise affected the longer patterns in the same manner. Further, we fitted right-censored gamma distributions to the response latency distributions. The peaks of these distributions are shown in Fig. 3.7a and show a general result, viz. that an increase in noise disrupts the neuron's ability to train to the beginning of the pattern and the peak latency drifts backwards through the pattern. In addition, there is a negative side effect of this additional noise: a drop in learning performance. Figure 3.7b depicts a major drop in performance for noise with σ > 0.4. Specifically, for σ = 0.9 performance drops to just below 20% for 50 ms patterns and to ≈3% for 75 ms patterns, and a 100 ms pattern is not learnt at all. Masquelier et al. [11] also noted that the final learnt latency may not be at the onset of the pattern and suggested this might be due to a zone of low spike density. We postulate that adding extra noise to a neuron increases the likelihood that the neuron stops during one of these low-spike-density zones, but high amounts of noise also reduce its ability to learn the pattern at all.
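A hedged sketch of fitting a right-censored gamma distribution to learnt latencies is shown below; the choice of the pattern length as the censoring point, the log-parameterisation and the Nelder–Mead optimiser are assumptions, not the authors' exact fitting procedure.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

def fit_censored_gamma(latencies_ms, censor_ms=50.0):
    """Fit a right-censored gamma distribution to learnt response latencies.

    Latencies at or beyond censor_ms (here the pattern length) are treated as
    censored and contribute the survival function rather than the density.
    Returns the fitted (shape, scale) and the peak (mode) of the distribution in ms.
    """
    x = np.asarray(latencies_ms, dtype=float)
    obs, n_cens = x[x < censor_ms], np.sum(x >= censor_ms)

    def nll(params):
        shape, scale = np.exp(params)                       # keep parameters positive
        ll = gamma.logpdf(obs, shape, scale=scale).sum()
        ll += n_cens * gamma.logsf(censor_ms, shape, scale=scale)
        return -ll

    res = minimize(nll, x0=np.log([2.0, 10.0]), method="Nelder-Mead")
    shape, scale = np.exp(res.x)
    mode = (shape - 1.0) * scale if shape > 1.0 else 0.0    # peak of the fitted gamma
    return shape, scale, mode
```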
3.4 Discussion The results we present reconfirm the findings by Masquelier et al. [11], and even though we used a different STDP window and different learning rates, STDP was able to distinguish a repeated pattern
Fig. 3.6 When additional noise is added the response latency distributions are affected (N = 100). (a) Response latency distributions for varying amounts of background noise (from σ = 0.0 to σ = 0.9) for a pattern of length 50 ms. (b) Fitted bezier curves to response latency distributions for patterns of length 50, 75 and 100 ms
from random activity, learn it and fire consistently within it. However, we found that with long simulations the latency of a neuron’s firing increased with respect to the beginning of the pattern. It is not clear if Masquelier et al. [11] ran their simulations for longer than 450 s or whether they found this drift.
Fig. 3.7 Adding increasing amounts of noise not only disrupts the response latency distribution but also produces a drop in performance. (a) The peak of fitted right censored gamma distributions and (b) the learning performance
To eliminate this drift as an artefact of the model, parameters were adjusted as follows:
• Different learning rates had no effect. As the rates are assigned with Am = −Ap × (τp/τm) × 1.05, Am ∝ Ap, so the integral of f(τ) over [−τm, τp] was always < 0; thus different learning rates changed only the convergence/learning time.
• Different values of Wmax had no effect on the drift.
To elucidate the cause of this drift we examined synaptic weight distributions and trajectories; these imply that the drift is due to two components: fast learning of the pattern's afferents and slower unlearning of synapses from afferents firing randomly. Initially, STDP modifies the weights of the afferents which are transmitting the stimulus, promptly (within 250 s) stabilising the appropriate synaptic weights at either bound. Then, the remaining synaptic weights are depressed on a longer time scale (within 1000–2000 s). This second, longer time scale adjusts the level of background activity, and we hypothesised that this causes the drift. We tested this hypothesis and found that an artificial change in the level of background activity can change the response latency. In addition, we found that the tuning of the pattern detector towards the pattern onset is unstable against noise. A reduction in background activity increased the latency of a neuron's response, so one may ask what would happen if synapses which are depressed—and determined not to carry information—were eliminated. If this were to occur, a neuron may either not fire at all or fire with a much greater—
perhaps uninformative—latency; thus, the STDP-based learning scheme used in this study may have less biological plausibility than previously thought when used on its own. Throughout this study the emphasis has been on pattern onset learning; however, if the task were simply to fire within a learnt pattern, the drift described herein does not present a problem. Furthermore, when additional noise was added—if relatively small (σ < 0.4)—it did not greatly impact performance, but it did distort the response latency distributions in favour of responding with an increased latency. A complete lack of background activity, on the other hand, may pose a significant complication, and indeed a great amount of additional noise severely impacted learning performance. An offshoot of the findings presented in this study is a glimpse at the pattern encoding used by its neurons. A neuron may encode the coincidence of spikes from a subsection of the pattern's afferents, and when these spikes occur the neuron fires. However, the temporal precision of this response depends on some background activity, which must be similar to that present during learning. Consequently, in a noisier framework—perhaps with greatly fluctuating afferent activity, as may be the case in biological situations—a neuron may no longer respond to its learnt pattern or may respond after an unacceptable delay. In addition, we found that a relatively small amount of noise added to the neuron's membrane potential produced less stable learning; in fact the earliest trained latency appears to be quite unstable.
References 1. Bell, C., Han, V., Sugawara, Y.: Synaptic plasticity in a cerebellum-like structure depends on temporal order. Nature 387(6630), 278 (1997) 2. Bi, G., Poo, M.: Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci. 18(24), 10464–10472 (1998) 3. Bi, G., Poo, M.: Distributed synaptic modification in neural networks induced by patterned stimulation. Nature 401(6755), 792 (1999) 4. Bi, G., Poo, M.: Synaptic modification by correlated activity: Hebb’s postulate revisited. Annu. Rev. Neurosci. 24, 129–66 (2001) 5. Gerstner, W., Kistler, W.: Spiking Neuron Models. Cambridge University Press, Cambridge (2002) 6. Guyonneau, R., VanRullen, R., Thorpe, S.: Neurons tune to the earliest spike through STDP. Neural Comput. 17, 859–879 (2005) 7. Levy, W., Steward, O.: Temporal contiguity requirements for long-term associative potentiation/depression in the hippocampus. Neuroscience 8(4), 791–7 (1983) 8. Linsker, R.: From basic network principles to neural architecture: Emergence of spatial-opponent cells. Proc. Natl. Acad. Sci. USA 83(19), 7508–7512 (1986) 9. Mackay, D., Miller, K.: Analysis of Linsker’s application of Hebbian rules to linear networks. Network: Comput Neural Syst. 1(3), 257–297 (1990) 10. Markram, L.: Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275(5297), 213 (1997) 11. Masquelier, T., Guyonneau, R., Thorpe, S.: Spike timing dependent plasticity finds the start of repeating patterns in continuous spike trains. PLoS ONE 3(1), e1377 (2008) 12. Masquelier, T., Guyonneau, R., Thorpe, S.: Competitive STDP-based spike pattern learning. Neural Comput. 21(5), 1259 (2009) 13. Masquelier, T., Hughes, E., Deco, G., Thorpe, S.: Oscillations, phase-of-firing coding, and spike timing dependent plasticity: An efficient learning scheme. J. Neurosci. 29(43), 13484–13493 (2009) 14. Masquelier, T., Thorpe, S.: Unsupervised learning of visual features through spike timing dependent plasticity. PLoS Comput. Biol. 3(2), 31 (2007) 15. Pratt, K., Dong, W., Aizenman, C.: Development and spike timing-dependent plasticity of recurrent excitation in the xenopus optic tectum. Nat. Neurosci. 11(4), 467–475 (2008) 16. Rubin, J., Lee, D., Sompolinsky, H.: Equilbrium properties of temporally asymmetric hebbian plasticity. Phys. Rev. Lett. 86(2), 264–267 (2001) 17. Senn, W., Fusi, S.: Learning only when necessary: better memories of correlated patterns in networks with bounded synapses. Neural Comput. 17, 2106–2138 (2005) 18. Song, S., Miller, K., Abbott, L.: Competitive Hebbian learning through spike-timing dependent synaptic plasticity. Nat. Neurosci. 3, 919–926 (2000)
19. Swindale, N.: A model for the formation of orientation columns. Proc. R. Soc. Lond. 215(1199), 211–230 (1982) 20. Wittenberg, G., Wang, S.: Malleability of spike-timing-dependent plasticity at the CA3-CA1 synapse. J. Neurosci. 26(24), 6610–6617 (2006) 21. Zhang, L., Tao, H., Holt, C.: A critical window for cooperation and competition among developing retinotectal synapses. Nature 395(6697), 37 (1998) 22. Zhang, X.: Long-term potentiation at hippocampal perforant path-dentate astrocyte synapses. Biochem. Biophys. Res. Commun. 383(3), 326 (2009)
Chapter 4
Emergence of Small-World Structure in Networks of Spiking Neurons Through STDP Plasticity
Gleb Basalyga, Pablo M. Gleiser, and Thomas Wennekers
Abstract In this work, we use a complex network approach to investigate how a neural network structure changes under synaptic plasticity. In particular, we consider a network of conductance-based, single-compartment integrate-and-fire excitatory and inhibitory neurons. Initially the neurons are connected randomly with uniformly distributed synaptic weights. The weights of excitatory connections can be strengthened or weakened during spiking activity by the mechanism known as spike-timing-dependent plasticity (STDP). We extract a binary directed connection matrix by thresholding the weights of the excitatory connections at every simulation step and calculate its major topological characteristics, such as the network clustering coefficient, characteristic path length and small-world index. We numerically demonstrate that, under certain conditions, a nontrivial small-world structure can emerge from a random initial network subject to STDP learning.
4.1 Introduction
In recent years there has been a growing interest in modeling the brain as a complex network of interacting dynamical systems [1–4]. Data on anatomical and functional connectivity show that the brain, at many different levels, exhibits a small-world structure, characterized by the presence of highly clustered modules and short distances between nodes [5, 6]. This complex network structure is hypothesized to be an optimal configuration that allows for the localization of function, such as visual or auditory, to specific brain areas, a concept known as functional segregation. At the same time, such a network structure is thought to maximize information flow across different areas; the latter is termed functional integration [7, 8]. The brain's ability to rapidly and efficiently combine specialized information from different brain areas is called the information integration property and is even considered to be crucial for consciousness [9, 10]. The complex network approach [11–13] applies the mathematical methods of graph theory to the analysis of brain networks. Two statistical measures are often used in order to characterize the functional properties of a complex network [14]: the clustering coefficient C and the characteristic path length L. The clustering coefficient measures how densely neighboring neurons are connected in the network and, therefore, can be used to quantify the functional segregation property of the network. The clustering coefficient of an individual neuron is equivalent to the fraction of connections of the neuron's neighbors that are also connected to each other [14, 15]. The network clustering coefficient is defined as the mean clustering coefficient averaged over all neurons in the network. The characteristic path length (or average shortest distance) is defined as the shortest connection path length between two neurons, averaged over all pairs of neurons in the network [14], and, therefore, can be used to quantify
G. Basalyga () Centre for Robotics and Neural Systems (CRNS), University of Plymouth, Plymouth, PL4 8AA, UK e-mail:
[email protected]
the integration property of the network. Highly integrated networks have a short characteristic path length. For example, neurons arranged into a ring form a regular network with a high clustering coefficient but a long characteristic path length, and activity needs a longer time to be integrated by the network. In contrast, random networks have a short characteristic path length. This leads to fast spreading of activity over the entire network. However, random networks have a small clustering coefficient and, therefore, a limited ability to segregate and specialize. Small-world networks are formally defined as networks that present a characteristic path length of the same order as a random network, but are significantly more clustered than random networks [13]. In order to quantify this property, we use the small-world index S, which is defined for a given network as follows [16]:

S = (C/Cr) / (L/Lr),    (4.1)

where C and L are the clustering coefficient and characteristic path length of the given network, and Cr and Lr are the clustering coefficient and characteristic path length of a random reference network. This reference network is usually created from the given network by a random rewiring algorithm that keeps the degree distribution and connection probability the same [12, 17]. For a random network, C = Cr and L = Lr and clearly S = 1. For a small-world network, the structure is more clustered than a random network, thus C > Cr; also, the mean distance is L ≈ Lr, and as a consequence S is greater than 1. Experimental data on functional cortical connectivity estimate the small-world index of cat visual cortex to be in the range from 1.1 to 1.8 (see [6], Table 1). For the anatomical connectivity of cat and macaque cortices, the estimated range of S is from 1.4 to 2.6 (see [7], Table 1). Understanding the basic processes that allow for the emergence of these nontrivial cortical structures can provide much insight in the study of the brain. In particular, in this work we are interested in a line of research which indicates that a small-world structure can evolve from a random network as a result of specific synaptic plasticity [18–20]. This plasticity, known as spike-timing-dependent plasticity (STDP) [21, 22], manifests itself in changes in the synaptic weights, depending on the spiking activity of the network. In this work we use a complex network approach [12, 13] to study how the underlying network topology changes during STDP learning. As quantitative measures to characterize the time evolution of the network structure, we consider the distribution of synaptic weights, the clustering coefficient and the mean path length at every simulation step. We observe that under specific conditions a nontrivial small-world structure emerges from a random initial network.
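A minimal sketch of computing the small-world index of (4.1) with networkx is given below. It restricts C and L to the largest strongly connected component and uses an Erdős–Rényi reference with the same numbers of nodes and edges, rather than the degree-preserving rewiring used in this chapter; both simplifications are assumptions of the sketch.

```python
import networkx as nx

def small_world_index(A, n_ref=50, seed=0):
    """Small-world index S of (4.1) for a binary directed adjacency matrix A.

    C and L are computed on the largest strongly connected component; Cr and Lr
    are averaged over n_ref random reference graphs with the same numbers of
    nodes and edges (a simpler reference than degree-preserving rewiring).
    """
    G = nx.from_numpy_array(A, create_using=nx.DiGraph)
    core = G.subgraph(max(nx.strongly_connected_components(G), key=len))
    C = nx.average_clustering(core)
    L = nx.average_shortest_path_length(core)
    n, m = core.number_of_nodes(), core.number_of_edges()
    Cr = Lr = 0.0
    for i in range(n_ref):
        R = nx.gnm_random_graph(n, m, directed=True, seed=seed + i)
        Rc = R.subgraph(max(nx.strongly_connected_components(R), key=len))
        Cr += nx.average_clustering(Rc) / n_ref
        Lr += nx.average_shortest_path_length(Rc) / n_ref
    return (C / Cr) / (L / Lr)
```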
4.2 Model
The model consists of 100 conductance-based single-compartment leaky integrate-and-fire (LIF) neurons (80% excitatory and 20% inhibitory), connected randomly with a connection probability of 10%. Figure 4.1(a) presents a visualization of the network in neuroConstruct [23, 24]. The equation for the membrane potential Vm of each neuron is:

Cm dVm/dt = −gL (Vm − EL) + S(t) + G(t),    (4.2)

where Cm = 1 µF/cm² is the specific capacitance, gL = 5 × 10⁻⁴ S/cm² is the leak conductance density and EL = −60 mV is the leak reversal potential. The total surface area of a neuron is 1000 µm². The function S(t) represents the spiking mechanism, which is based on the implementation in NEURON of the conductance-based LIF cells as described in Brette et al. [25]. A spike is emitted when Vm reaches a threshold of −50 mV. After firing, the membrane potential is reset to −60 mV. The function G(t) in (4.2) represents the conductance-based synaptic interactions:

G(t) = − Σj gji(t) (Vi − Ej),    (4.3)
Fig. 4.1 (a) Three-dimensional visualization of the model in neuroConstruct. The excitatory neurons are shown in blue, the inhibitory neurons are shown in red. Links between the nodes represent the synaptic connections. (b) Raster plot of the spiking activity of 80 excitatory neurons during 5 seconds of STDP learning, driven by a 50 Hz Poissonian spiking input
where Vi is the membrane potential of neuron i, gji(t) is the synaptic conductance of the synapse from neuron j to neuron i, and Ej is the reversal potential of that synapse. Ej = 0 mV was set for excitatory synapses, and Ej = −60 mV for inhibitory synapses. Each cell in the model has three types of synapses: fixed excitatory synapses, plastic (STDP) excitatory synapses and fixed inhibitory synapses. The fixed excitatory synapses receive Poissonian random spike inputs and are described by a double exponential "alpha" function with maximum conductance gmax^input = 0.0005 µS, rise time constant τrise = 1 ms and decay constant τdecay = 5 ms. The inhibitory synapses are not plastic and the inhibitory conductances gji are described by a double exponential "alpha" function with fixed maximum conductance gmax^inh = 0.067 µS, rise time constant τrise = 1 ms and decay constant τdecay = 10 ms. The excitatory recurrent connections in the network are plastic and the synaptic conductances change at every firing event in the following way:

gij = gmax^exc wij(t),    (4.4)

where gmax^exc = 0.006 µS is the maximum excitatory conductance;

wij = wij + Δwij(Δt),    (4.5)

and the amount of the synaptic modification Δwij is defined by the STDP function [22, 26], which depends on the time difference between pre- and postsynaptic spikes, Δt = tpost − tpre:

Δwij(Δt) = Ap exp(−Δt/τp)   if Δt ≥ 0
Δwij(Δt) = −Ad exp(Δt/τd)   if Δt < 0    (4.6)

In order to avoid weight divergence during learning, the synaptic weights wij are bounded in the range 0 ≤ wij ≤ wmax, with wmax = 2. The constants Ap and τp set the amount and duration of long-term potentiation. The constants Ad and τd set the amount and duration of long-term depression. We set Ap = Ad = 0.1. Experiments suggest that the STDP potentiation time window is typically shorter than the depression time window: for example, τp = 17 ± 9 ms and τd = 34 ± 13 ms [21]. Therefore, we set τp = 10.0 ms and τd = 20.0 ms. Further details on the implementation of the synaptic interactions can be found in [24]. The model was constructed using the neuroConstruct software [23, 24] and simulated using NEURON v6.2 [27]. Complex network analysis was performed in Matlab (The MathWorks) using the Brain Connectivity Toolbox [12].
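The bounded additive weight update of (4.4)–(4.6) can be summarised in a few lines; the per-spike-pair application shown here is an illustrative assumption (the simulation applies the rule at every firing event through NEURON's mechanisms).

```python
import numpy as np

A_p = A_d = 0.1
tau_p, tau_d = 10.0, 20.0      # ms
w_max, g_max_exc = 2.0, 0.006  # dimensionless bound; conductance in microsiemens

def stdp_weight_update(w, dt_post_pre):
    """Apply (4.5)-(4.6) for one pre/post spike pair, then clip to [0, w_max]."""
    if dt_post_pre >= 0:
        dw = A_p * np.exp(-dt_post_pre / tau_p)   # potentiation
    else:
        dw = -A_d * np.exp(dt_post_pre / tau_d)   # depression
    return float(np.clip(w + dw, 0.0, w_max))

def conductance(w):
    """Effective excitatory conductance of (4.4), in microsiemens."""
    return g_max_exc * w
```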
Fig. 4.2 The distribution of weights, wij /wmax , of STDP synapses at the beginning (a), and at the end (b) of STDP learning. The corresponding adjacency matrix Aij (c)–(d) is obtained after thresholding the connection matrix wij (t) with wc = 0.01
4.3 Results
We start from a randomly connected network of 100 LIF neurons, stimulated by 50 Hz Poissonian spike trains. The spiking activity of the excitatory neurons is illustrated in Fig. 4.1(b). Initially the coupling strengths were uniformly distributed (see Fig. 4.2(a)). However, after a certain period of STDP learning, some synapses are strengthened to the maximum weight value wmax, while the majority of the synapses are weakened to near zero. Therefore, as shown in Fig. 4.2(b), the resulting distribution of synaptic weights becomes bimodal. A binary directed adjacency matrix Aij(t) can be constructed at every simulation step t by thresholding the real values of the connection matrix wij(t). If the synaptic weight of the connection between cells i and j is larger than a threshold value wc, then the connection is regarded as functional and Aij(t) is set to 1. On the other hand, if wij(t) is less than wc, then Aij(t) is set to 0. In Fig. 4.2(c)–(d) we present the networks corresponding to the adjacency matrices obtained by thresholding with wc = 0.01 at the beginning (t = 0 s) and at the end (t = 5 s) of the simulation. The figures clearly show that, after STDP learning, the network becomes sparser. This effect can be quantified by measuring the average connection density kden, which is defined as the number of connections K present in Aij out of all possible connections, kden = K/Kmax (where Kmax = N² − N for a directed graph, excluding self-connections). Figure 4.3(a) shows how kden drops quickly during STDP learning and reaches a minimum value near 0.01. However, neurons that appear to be completely disconnected from the network at a particular time may reconnect at other times due to STDP weight modification. The temporal evolution of the network clustering coefficient is shown in Fig. 4.3(b). Network weights wij(t) are sampled every Δt = 50 ms and, after thresholding with wc = 0.01, the clustering coefficient C(t) is calculated from the obtained adjacency matrix Aij(t). A random reference network is generated from Aij(t) by arbitrarily rewiring the connections but keeping the degree distribution and connection probability the same [12, 17]. In order to avoid the statistical fluctuations due to rewiring, we usually generate 50 reference networks from a given network Aij(t), and calculate the mean values of Cr(t) and Lr(t), averaged over all generated reference networks. As we see from Fig. 4.3(b) and
Fig. 4.3 The temporal evolution of the connection density kden (t) (a) and the network clustering coefficient C(t) (b) during STDP learning. Input spike frequency is 50 Hz. The connection delays in the network are set to 5 ms everywhere. The values of the clustering coefficient of corresponding random network Cr (t) are averaged over 50 reference networks at every 50 ms
Fig. 4.4 The temporal evolution of the complex network measures during STDP learning for the same model as in Fig. 4.3. (a) The temporal evolution of the ratios for the clustering coefficient, C(t)/Cr (t), and characteristic path length, L(t)/Lr (t). The values Cr (t) and Lr (t) are averaged over 50 random reference networks at every 50 ms. (b) The temporal evolution of the small-world index, S(t). The data points are sampled every 50 ms and averaged over every 10 sample points. The mean value of S, averaged over the entire stimulation time, is 1.86
Fig. 4.4(a), during STDP learning the clustering coefficient of the network becomes larger than the typical value calculated from random reference networks, and the ratio C(t)/Cr(t) grows to exceed 1. At the same time, the characteristic path length becomes less than or similar to that of a random network, as shown in Fig. 4.4(a). Therefore, the small-world index S(t) grows above 1, as illustrated in Fig. 4.4(b), and the functional structure organized by STDP becomes more small-world like. The simulations for longer times (up to 50 seconds) show that the small-world index fluctuates significantly during learning. Figure 4.5(a) shows the mean values of S, averaged over the entire simulation time, for different input spike frequencies. One can see that the values of S are greater than 1 only in the medium range of input spike frequencies, from 10 to 50 Hz. This can be explained in the following way. For low input frequencies (< 10 Hz), there are not enough spikes to reinforce small-world connectivity. For high input frequencies (> 60 Hz), there are too many spikes, so the emerging small-world connectivity gets quickly destroyed by noisy spikes and the small-world index just fluctuates around 1 during the entire simulation time. The numerical simulations indicate that the emergence of small-world structure also depends on the choice of model parameters such as connection delays. For the model in Fig. 4.5(a), all connection delays are fixed at 5 ms. Figure 4.5(b) shows that the effect is different for a model with randomly distributed connection delays.
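To make the thresholding step of Sect. 4.3 concrete, the sketch below builds the binary adjacency matrix Aij and the connection density kden from a weight matrix; the orientation convention W[i, j] for the connection i → j is an assumption.

```python
import numpy as np

def adjacency_and_density(W, w_c=0.01):
    """Binary directed adjacency matrix A and connection density k_den from weights W.

    A connection i -> j is regarded as functional when W[i, j] > w_c; self-connections
    are excluded, and k_den = K / (N^2 - N) as in the text.
    """
    A = (W > w_c).astype(int)
    np.fill_diagonal(A, 0)
    N = A.shape[0]
    k_den = A.sum() / (N * N - N)
    return A, k_den
```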
Fig. 4.5 The mean values of the small-world index (averaged over 50 s simulation time), as a function of the input spike frequency for two models with different connection delay distributions. (a) The standard model described in Sect. 4.2 with all connection delays fixed at 5 ms. (b) The model with a random uniform distribution of delays in the range from 1 ms to 10 ms
4.4 Discussion In this paper, we analyzed how a neural network structure evolves under spike-timing-dependent plasticity using the complex network approach. We started from a typical random neural network and demonstrated that a small-world structure can emerge through STDP learning under certain conditions. However, the numerical simulations indicate that this emergence is sensitive to the choice of model parameters. Input statistics can interact with the time constants of neurons and synapses so that, during STDP, the small-world index simply fluctuates around 1 and the network structure becomes temporarily small-world-like but then tends to return to a random organization. Also, as demonstrated in similar studies [18], the balance between excitation and inhibition in the model is important to achieve the effect. Further studies are required to establish the relationship between the formation of a nontrivial network structure and the dynamical properties of a neural network. It would be interesting to measure the temporal evolution of network information integration capacity during STDP learning. However, currently available methods for calculating information integration measures are valid only for small networks of 8 to 12 nodes [28, 29]. New algorithms for estimating information integration in large realistic neural networks need to be developed in the future to address this issue. Acknowledgement
This work was supported by an EPSRC research grant (Ref. EP/C010841/1).
References 1. Sporns, O., Chialvo, D.R., Kaiser, M., Hilgetag, C.C.: Organization, development and function of complex brain networks. Trends Cogn. Sci. 8, 418–425 (2004) 2. Reijneveld, J.C., Ponten, S.C., Berendse, H.W., Stam, C.J.: The application of graph theoretical analysis to complex networks in the brain. Clin. Neurophysiol. 118, 2317–2331 (2007) 3. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev., Neurosci. 10(3), 186–198 (2009) 4. Gomez Portillo, I.J., Gleiser, P.M.: An adaptive complex network model for brain functional networks. PLoS ONE 4(9), e6863 (2009). doi:10.1371/journal.pone.0006863 5. Sporns, O., Honey, C.J.: Small worlds inside big brains. Proc. Natl. Acad. Sci. USA 103(51), 19219–19220 (2006) 6. Yu, S., Huang, D., Singer, W., Nikolic, D.: A small world of neuronal synchrony. Cereb. Cortex 18(12), 2891–2901 (2008) 7. Bassett, D.S., Bullmore, E.: Small-world brain networks. Neuroscientist 10, 512–523 (2006)
8. Sporns, O., Tononi, G., Edelman, G.M.: Connectivity and complexity: the relationship between neuroanatomy and brain dynamics. Neural Netw. 13(8–9), 909–922 (2000) 9. Tononi, G., Edelman, G.M., Sporns, O.: Complexity and coherency: integrating information in the brain. Trends Cogn. Sci. 2, 474–484 (1998) 10. Tononi, G.: An information integration theory of consciousness. BMC Neurosci. 5, 42 (2004) 11. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., Hwang, D.-U.: Complex networks: Structure and dynamics. Phys. Rep. 424, 175–308 (2006) 12. Rubinov, M., Kotter, R., Hagmann, P., Sporns, O.: Brain connectivity toolbox: a collection of complex network measurements and brain connectivity datasets. NeuroImage 47(Suppl 1), 39–41 (2009) 13. Rubinov, M., Sporns, O.: Complex network measures of brain connectivity: Uses and interpretations. NeuroImage 52(3), 1059–1069 (2010) 14. Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998) 15. Fagiolo, G.: Clustering in complex directed networks. Phys. Rev. E, Stat. Nonlinear Soft Matter Phys. 76(2), 026107 (2007) 16. Humphries, M.D., Gurney, K.: Network ‘small-world-ness’: A quantitative method for determining canonical network equivalence. PLoS ONE 3(4), 0002051 (2008). doi:10.1371/journal.pone.0002051 17. Maslov, S., Sneppen, K.: Specificity and stability in topology of protein networks. Science 296(5569), 910–913 (2002) 18. Shin, C.-W., Kim, S.: Self-organized criticality and scale-free properties in emergent functional neural networks. Phys. Rev. E 74(4), 45101 (2006) 19. Kato, H., Kimura, T., Ikeguchi, T.: Self-organized neural network structure depending on the STDP learning rules. In: Visarath, X., et al. (eds.) Applications of Nonlinear Dynamics. Model and Design of Complex Systems. Understanding Complex Systems, pp. 413–416. Springer, Berlin (2009) 20. Kato, H., Ikeguchi, T., Aihara, K.: Structural analysis on STDP neural networks using complex network theory. In: Artificial Neural Networks—ICANN 2009. Lecture Notes in Computer Science, vol. 5768, pp. 306–314. Springer, Berlin (2009) 21. Bi, G., Poo, M.: Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci. 18, 10464–10472 (1998) 22. Song, S., Miller, K.D., Abbott, L.F.: Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nat. Neurosci. 3, 919–926 (2000) 23. Gleeson, P., Steuber, V., Silver, R.A.: neuroConstruct: a tool for modeling networks of neurons in 3D space. Neuron 54(2), 219–235 (2007) 24. Gleeson, P., Crook, S., Cannon, R.C., Hines, M.L., Billings, G.O., Farinella, M., Morse, T.M., Davison, A.P., Ray, S., Bhalla, U.S., Barnes, S.R., Dimitrova, Y.D., Silver, R.A.: NeuroML: A language for describing data driven models of neurons and networks with a high degree of biological detail. PLoS Comput. Biol. 6(6), 1000815 (2010). doi:10.1371/journal.pcbi.1000815 25. Brette, R., Rudolph, M., Carnevale, T., Hines, M., Beeman, D., Bower, J., Diesmann, M., Morrison, A., Goodman, P., Harris, F., Zirpe, M., Natschlager, T., Pecevski, D., Ermentrout, B., Djurfeldt, M., Lansner, A., Rochel, O., Vieville, T., Muller, E., Davison, A., El Boustani, S., Destexhe, A.: Simulation of networks of spiking neurons: A review of tools and strategies. J. Comput. Neurosci. 23, 349–398 (2007) 26. Billings, G., van Rossum, M.C.W.: Memory retention and spike-timing-dependent plasticity. J. Neurophysiol. 101, 2775–2788 (2009) 27. 
Carnevale, T., Hines, M.: The NEURON Book. Cambridge University Press, Cambridge (2006) 28. Tononi, G., Sporns, O.: Measuring information integration. BMC Neurosci. 4, 31–51 (2003) 29. Balduzzi, D., Tononi, G.: Integrated information in discrete dynamical systems: Motivation and theoretical framework. PLoS Comput. Biol. 4(6), 1000091 (2008). doi:10.1371/journal.pcbi.1000091
Chapter 5
Coupling BCM and Neural Fields for the Emergence of Self-organization Consensus Mathieu Lefort, Yann Boniface, and Bernard Girau
Abstract Human beings interact with the environment through different modalities, i.e. perceptions and actions, processed in the cortex by dedicated brain areas. These areas are self-organized, so that spatially close neurons are sensitive to close stimuli, providing generalization from previously learned examples. Although perceptive flows are picked up by different spatially separated sensors, their processing is not isolated. On the contrary, they are constantly interacting, as illustrated by the McGurk effect. When the auditory stimulus /ba/ and the /ga/ lip movement are presented simultaneously, people perceive a /da/, which does not correspond to either of the stimuli. Merging several stimuli into one multimodal perception reduces ambiguity and noise and is essential for interacting with the environment. This article proposes a model for modality association, inspired by the biological properties of the cortex. The model consists of modality maps interacting through an associative map to raise a consistent multimodal perception of the environment. We propose the coupling of the BCM learning rule and neural maps to obtain the decentralized and unsupervised self-organization of each modal map influenced by the multisensory context. We obtain local self-organization of modal maps with various inputs and discretizations.
5.1 Introduction The cortex is divided into several areas, each one corresponding to a main function. Some of them are dedicated to processing a perceptive flow, such as visual areas V1 to V5. Each area is made up of multilayer cortical columns (see Mountcastle [15] for an overview). Thanks to this generic structure, areas are able to process other functions, depending on the type of inputs they receive, as can be seen in the perceptual cortical areas of disabled people (see Elbert and Rockstroh [9] for example). An object is defined by a set of perceptions and affordances. This term, defined in Gibson's theory [10], refers to the possible interactions between an object and a living being. Examples of affordances of a tree are "lie down" or "climb on" for a human or "perch on" for a bird. In order to perceive an object, the cortex has to compute this set of modalities as a single multimodal stimulus. Although sensations from an object reach the cortex through different sensors, the cortex merges them in associative areas to form a unified representation. This reduces input noise and makes it possible to recall one perception from another one. For example, an adult is able to identify an object from a set of previously seen objects only by touching it. Perceptual flows are not only associated in the cortex but also interact with each other. These influences are easily detectable. For example, when someone is watching a ventriloquist show, his ears perceive a sound coming from the ventriloquist and his eyes see the lip movements of the puppet. The visual perception influences the auditory one, so that the sound appears to come from the puppet. Bonath et al. [4]
showed that, during the ventriloquist illusion, activity in the auditory area first matches the real sound sensation and then, after the associative feedback, activity increases in the zone that corresponds to the perceived sound location. Moreover, merging perceptions is useful to improve our perceptual performance. Goldring et al. [12] showed that the reaction time to a monomodal stimulus (visual or auditory) is longer than to a multimodal one. The gain may be the consequence of the reduction of noise in multimodal stimuli due to an increase of information quantity. In the active perception theory described in [16], associating perceptions and actions to make a unique multimodal perception is an essential mechanism for interacting with the environment. Obtaining a coherent perception of an object requires exploring it with actions, such as visual saccades. This allows the cortex to detect correlations between actions and perceptions and to act in a sensorimotor fashion. The context of our study is the design of a multimap, multimodal and modular architecture to process perception/action tasks. The aim is not to model the cortex but to be inspired by its properties such as the genericity of its architecture, the self-organization and plasticity of perceptive areas, the merging of perceptions and their reciprocal influences to form a unified view of the environment. This paradigm raises the question of how to associate different modalities, and especially of how to obtain the self-organization of a perceptive map in a multimodal context through continuous learning. To answer this question, we propose a model of neural map which self-organizes thanks to a local, continuous, decentralized and unsupervised learning, modulated by a high-level spatially consistent signal. By assembling several modality maps around an associative one, we create a multimodal context that emerges thanks to a competitive mechanism. In return, this multimodal context influences the perception in the modality maps. This perception is used as a modulating signal to influence the self-organization of each modality map. In Sect. 5.2, we first explain the general architecture of our model. In Sect. 5.3, we discuss how a perception arises within a neural map. In Sect. 5.4, we describe more precisely the architecture of a modality map. In Sects. 5.5 and 5.6, we introduce the BCM learning rule that we choose to detect sensations in the modality map, and the modification we made to it. In Sect. 5.7, we present the modality association architecture we use. Section 5.8 shows and analyses several experimental self-organizations obtained with our model.
5.2 General Description of the Model Two points appear to be essential for an agent to interact with a continuously changing environment. The first point is that perception is active, meaning that the agent's actions are essential for its perceptions (see Noë [16] for an overview). Acting within an environment leads to changes that the cortex is able to correlate with the changes in sensation to raise perception. The second point is the capacity of generalization of the cortex. From a reduced finite set of examples, we are able to act in a continuous and unknown environment. Our goal is to develop a multimap and multimodal architecture inspired by the properties of the cortex to perform perception/action tasks with generalization (see Fig. 5.1). The system consists of neural maps composed of generic multilayer structures inspired by cortical columns. Each map can be a modality map that processes any kind of perception or action, or an associative map that links several modality maps. Each map has a 2D architecture and intra- and extra-map connections, which are all generic. Each modality map can result in a perception, and the associative map represents the multimodal context. Modal perceptions have to be globally consistent to obtain a multimodal perception of the environment. The multimodal correlations are learned through the weights of the lateral connections, whereas generalization is provided by the self-organization of each modality. The self-organization appears under the constraints of the multimodal context, meaning that each modality map self-organizes
Fig. 5.1 Example of use of the multimap architecture for a sensory motor integration. Each map is composed of generic cortical columns and the intra and extra map connections are also generic. The perception in each modality map has to be consistent with the multimodal context represented in the associative map
in a way such that it is consistent with the other ones. These constraints provide a generalization at the multimodal level. Computation of neural activity and learning are local, continuous, decentralized and unsupervised, to provide a robust, autonomous and adaptive system. The design of such an architecture raises several problems. The first one is how to obtain the self-organization of a map with a local, continuous, decentralized and unsupervised learning, especially within a multimodal context. For example, the self-organizing map model proposed by Kohonen [13] works with a centralized learning and a decreasing learning distance. The second main problem is how to associate modality maps to influence perceptions and to provide a recall mechanism. This article focuses on the first point and answers it by coupling neural maps with the BCM learning rule. The activity in a modality map has to be representative of the current perception. For that, we use the dynamic neural field (DNF) theory, which creates a localized and spatially consistent activity in a neural map from the activity of the current sensation (see Sect. 5.3.2). To obtain a sensation, we use the BCM learning rule, which raises selectivity to a stimulus (see Sect. 5.5). We introduce in this rule a modulation by the high-level spatially consistent activity of the DNF, to obtain a self-organization of the BCM selectivity at the map level (see Sect. 5.6). One answer to the second question is provided by the Bijama model developed by Ménard and Frezza-Buet [14], which consists of a multimap and multilayer architecture for multimodal integration. We use their multimap architecture to test our model of a modality map in a multimodal context (see Sect. 5.7).
5.3 Emerging Perception Within Neural Maps 5.3.1 Expected Properties The activity inside the modality map represents the current perception. This perception is generated from the sensory activity influenced by the multimodal context. These two information flows may be contradictory and a consensus has to emerge. Moreover, the sensory information can be noisy and needs to be filtered. We need a continuous and decentralized mechanism that is able to raise one unique perception in the neural map from the current noisy and contradictory input. It will reduce a complex and multidimensional input to a single and consistent stereotyped perception. Moreover, we want the activity to be spatially consistent, meaning that two close neurons have close activities, in order to use it as a high-level signal to obtain a self-organization of the sensory layer.
A winner-take-all mechanism is a simple way to compute a competition between inputs. However, it is a centralized computation, and therefore does not satisfy our constraints. Instead, we use a dynamic neural field, which provides the most important of the expected properties.
5.3.2 Dynamic Neural Field Theory The dynamic neural field (DNF) theory, developed by Wilson and Cowan [21] and Amari [1], provides a decentralized competitive mechanism that raises an activity bump on a continuous manifold M which takes as input a manifold M′. The equation is the following:

(1/τ) ∂u(x, t)/∂t = −u(x, t) + ∫_M w(x − x′) f(u(x′, t)) dx′ + ∫_M′ s(x, y) I(y, t) dy + h

with τ the temporal decay, u(x, t) the membrane potential at the point x at time t, w the weights of the lateral connectivity, s the weights of the afferent connections, f representing the mean firing rate as some function of the membrane potential, I the activity of the input and h the mean neuron threshold. f is a sigmoid or a linear function of u and w is a difference of Gaussians with local excitation and global inhibition:

w(x) = A e^(−|x|²/a²) − B e^(−|x|²/b²)

with A, a, B, b ∈ R+*, a < b and A > B. In our case, as the map is at a mesoscopic level with a model of cortical columns, we use a simple discrete modeling of the DNF and we assume that w and s are constant in time. The equation becomes:

(1/τ) dU(t)/dt = −U(t) + W ∗ U(t) + S ∗ I(t)

with U(t) an m1 × m2 array of the map activity at time t, ∗ the convolution product, W the m1 × m2 × m1 × m2 array of lateral connectivity with a difference-of-Gaussians shape, I an m1 × · · · × mn array of the input activity and S the m1 × m2 × m1 × · · · × mn array of the afferent weights. To avoid activity explosion, the activity of each column is bounded between 0 and 1.
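As an illustration of how the discrete DNF equation can be simulated, the Python/NumPy sketch below performs Euler integration steps on a flattened m1 × m2 map. It is a minimal reading of the model rather than the authors' implementation: the kernel gains and widths, the time constant, the step size and the placeholder afferent weights are all illustrative assumptions.

```python
import numpy as np

def dog_weights(shape, A=1.0, a=2.0, B=0.5, b=6.0):
    """Lateral weight matrix W between all pairs of map positions: a difference of
    Gaussians with local excitation and broader inhibition (A > B and a < b)."""
    m1, m2 = shape
    ys, xs = np.mgrid[0:m1, 0:m2]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    return A * np.exp(-d2 / a ** 2) - B * np.exp(-d2 / b ** 2)

def dnf_step(U, I, W, S, tau=10.0, dt=0.1):
    """One Euler step of (1/tau) dU/dt = -U + W*U + S*I, with the column activities
    clipped to [0, 1] to avoid activity explosion."""
    dU = -U + W @ U + S @ I
    return np.clip(U + dt * tau * dU, 0.0, 1.0)

# Example: a 10 x 10 map driven by a flattened input vector through afferent weights S.
shape = (10, 10)
n = shape[0] * shape[1]
W = dog_weights(shape)
S = np.eye(n)                                   # placeholder afferent weights
U = np.zeros(n)
I = np.random.default_rng(0).random(n)          # placeholder input activity
for _ in range(200):
    U = dnf_step(U, I, W, S)
```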
5.3.3 Properties This equation, with a difference-of-Gaussians lateral connectivity, provides a decentralized competitive mechanism between inputs and raises an activity bump at the map level. This activity bump is created and sustained thanks to the local lateral excitation. The width of the inhibitory Gaussian connections determines the minimum distance between two activity bumps. This ensures that the activity bump is unique if the inhibition is wide enough for the map. This bump has a stereotyped Gaussian shape, because of the excitatory lateral connectivity, and it appears where the input is the most spatially and temporally consistent. A DNF is able to raise one stereotyped and spatially consistent activity bump whatever the input, provided that it contains information, meaning that the activity of some neurons is significantly higher than that of the others. A DNF filters the input, since this property remains true with noisy inputs (see Rougier [17] for more details). A DNF thus provides a way to raise one stereotyped perception through a decentralized competition mechanism. Depending on the parameters, the DNF adopts one of two behaviours. When the input activity is predominant, the activity of the DNF is very reactive to input changes, but does not present a
Fig. 5.2 Generic layered structure of a cortical column in a modality map
perfectly stereotyped activity bump, especially when the input is not spatially consistent. On the contrary, when the lateral activity is predominant, the activity bump is stereotyped but possesses a strong inertia and tends to be stable, which leads to inconsistency between the sensation and the perception. Since we want a stereotyped bump which reacts to stimulus changes, we have to parameterize the DNF equation correctly in order to obtain a good balance between the two expected properties and their consequences on the activity and the dynamics.
5.4 Modality Map A modality map has a generic architecture and can represent either a perception or an action. The activity bump in the map stands for the current perception. This perception will be influenced by the multimodal constraints so as to be consistent at the multimodal level. The mechanisms for the emergence of the multimodal constraints are explained in Sect. 5.7. The sensory representations have to self-organize with a local, continuous, decentralized and unsupervised learning to provide a generalization of the environment.
5.4.1 Description A cortical column of a modality map is made up of two main layers (Fig. 5.2). The first one is a sensory layer, which provides an activity representative of the current input. The expected properties of this layer are described in the next section. The competitive layer takes the sensory layer as input and computes the DNF equation, providing a spatially consistent activity bump as output of the map, representing the current perception (see Sect. 5.3.2). This perception will be influenced by the other ones in a multimodal context (see Sect. 5.7 for details about the multimodal association). As there is a feedback from the competitive to the sensory layer, the constrained perception will influence the sensory selectivity self-organization such that it becomes consistent in a multimodal context.
5.4.2 Expected Properties of the Sensory Layer The sensory layer provides an activity which is representative of the current input. This layer has to self-organize thanks to a continuous, decentralized and unsupervised learning, meaning that two close neurons become sensitive to close stimuli. This self-organization provides a generalization mechanism, since the sensory layer is able to provide a spatially consistent activity even for unknown inputs. To provide a multimodal generalization, the self-organization of each modality has to be consistent with the others. As the activity of the competitive layer represents the local perception influenced by
Fig. 5.3 The LTP/LTD threshold θ slides depending on the recent neuron activity. If LTP recently occurred (Δw > 0), LTD is favored by the increase of θ (adapted from Bear [2])
the multimodal context (see previous section), it is used to drive the self-organization of the sensory layer. The DNF takes the output of the sensory layer as input, directly if the map is isolated or through the merging layer in the multimodal case (see Sect. 5.7.1). Therefore, the sensory layer has to provide information to the competitive layer, meaning that the activity of some columns is significantly higher than that of the others for a given stimulus. The sensory layer of a cortical column has to be able to autonomously develop a selectivity to some of the environmental stimuli. This idea is consistent with Wallace's observations [19], which show that the ability of neurons to integrate multisensory information, and thus the dependency on multisensory merging, increases with age. Kohonen maps [13] are self-organizing maps. Even if this learning rule is centralized, it can be decentralized, as has been done in the Bijama model [14]. Nevertheless, since the Kohonen model involves a leader neuron that continuously switches according to the current stimulus, whereas the DNF activity bump possesses inertia, inconsistencies may appear between the current perception and sensation, which may cause mislearning. Moreover, since every isolated neuron using the Kohonen learning rule converges to the mean of its inputs, there is no increase of information at the map level, which is incompatible with DNF coupling. We propose to define our sensory layer by using the BCM learning rule, which is able to develop a selectivity, and to adapt it to obtain a self-organization of the independently raised selectivities, driven by the activity bump of the competitive layer.
5.5 BCM Learning Rule 5.5.1 Biological Inspiration Hebbian learning is based on the correlation between the pre-synaptic and the post-synaptic neural activities. If both neurons are coactivated, long term potentiation (LTP) occurs, meaning that the weight of the synapse between them is increased. The Cooper-Liberman-Oja (CLO) rule [8] introduces a fixed threshold. When the pre-synaptic activities do not succeed in activating the post-synaptic neuron above this threshold, weights decrease (LTD, for long term depression), whereas LTP occurs when the post-synaptic neuron activity is above the threshold. Biological experiments [2] show that this LTD/LTP threshold slides in the direction opposite to the previous activity history of the neuron, meaning that if LTP recently occurred, the threshold increases to favour LTD (see Fig. 5.3). The Bienenstock Cooper Munro (BCM) rule [3] is based on this biological fact and has the property of self-regulating weights, contrary to the Hebbian rule.
5.5.2 Equations The output activity u of a neuron is equal to the weighted sum of its input x: u = w.x
The value θ of the sliding LTP/LTD threshold is equal to the expectation of u² over a temporal window τ. This provides the biological property of the LTD/LTP threshold sliding in the opposite direction of the neuron's previous activity history:

θ = E_τ[u²]

The weight modification is defined by:

Δw = η x φ(u, θ) with φ(u, θ) = u(u − θ)   (5.1)

with η the learning rate and φ(u, θ) a function that defines the proportion of LTP and LTD as an approximation of the biological observations of Fig. 5.3.
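A compact way to see how (5.1) behaves is to implement it directly. The Python sketch below is illustrative only: the learning rate, the weight initialisation and the use of an exponential moving average to approximate E_τ[u²] are our assumptions, not details given in the chapter.

```python
import numpy as np

class BCMNeuron:
    """Minimal BCM unit: u = w.x, sliding threshold theta ~ E_tau[u^2],
    weight update dw = eta * x * u * (u - theta) as in Eq. (5.1)."""
    def __init__(self, n_inputs, eta=0.001, tau=100.0, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.w = rng.uniform(0.0, 0.1, n_inputs)
        self.eta = eta
        self.tau = tau        # temporal window of the expectation E_tau[u^2]
        self.theta = 0.0

    def step(self, x):
        u = float(self.w @ x)
        self.theta += (u * u - self.theta) / self.tau   # moving average of u^2
        self.w += self.eta * x * u * (u - self.theta)
        return u

# Example: present random discretized Gaussian stimuli; the unit becomes selective to one of them.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 50)
neuron = BCMNeuron(n_inputs=50)
for _ in range(2000):
    centre = rng.uniform(0.0, 1.0)
    x = np.exp(-((grid - centre) ** 2) / (2 * 0.1 ** 2))
    neuron.step(x)
```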
5.5.3 Properties The first interesting property of the BCM learning rule is that a neuron independently develops a selectivity to a specific input, which depends on the initial weights and the temporal sequence of stimuli. In the BCM learning rule (5.1), neuron weights have converged when Δw = 0, which corresponds to u = 0 or u = θ. In the case of independent noisy input vectors, the only stable weight vector contains only one non-zero value. Indeed, the sliding threshold regulates the weights but also introduces a spatial competition between stimuli: only the activities for one stimulus can be at the threshold level (see Cooper et al. [7] for more details). The second property is the plasticity of the BCM learning rule. Equation (5.1) uses a constant learning rate and its stabilization is reached when the neuron is selective to one input. The activity of the neuron for this input is then inversely proportional to the probability of its appearance in the environment. If the input distribution changes, corresponding to an environmental change, the neural weights are no longer stable and the response value adapts. If the stimulus completely disappears from the environment, the neuron is able to develop another selectivity.
5.5.4 Self-organization Autonomous selectivity and plasticity are two desirable properties for our system, but the BCM rule only considers independent neurons. This means that, at the map level, BCM neurons map the input space without any kind of organization. To achieve self-organization, BCM neurons must influence each other. In their book, Cooper et al. [7] present an adaptation of their rule to obtain such a self-organization of a neural map. It consists of the addition of a lateral connectivity in order to influence the output. The lateral connectivity used is a difference of Gaussians with short-range excitation and long-range inhibition. It adds excitation to close neurons to favour the learning of close inputs, by raising their output u above θ. On the contrary, distant neurons are inhibited, which leads them to learn dissimilar inputs. Since we want to be able to influence the self-organization using the multimodal context, we choose to use the perceptive activity bump of the competitive layer, instead of a lateral connectivity, to modulate the learning rule. This activity bump reflects the local sensation constrained by the multisensory context. It is also spatially consistent, meaning that the activity is spatially localized and decreases with the distance to the center of the bump, which is necessary to obtain a self-organization.
5.6 BCM Feedback Modulation 5.6.1 Principle The activity bump in the modality map results from relaxation and competition between the local sensation and the multimodal context, so that the two become consistent. This activity bump of the competitive layer still has to be introduced as an influence upon the self-organization of the perceptive layer. Moreover, this bump is spatially consistent with a Gaussian shape. By using it as feedback for the BCM learning, the spatial consistency is propagated to the sensory layer, which will self-organize. Girod and Alexandre [11] show that it is possible to influence the selectivity of a neuron using the BCM rule by modulating its output. The idea is to favour LTP (respectively LTD), by increasing u above θ, when the feedback activity is high (respectively low). Mathematically, the feedback modifies the zeros of the equation and thus the basins of attraction of each input. We choose to use a multiplicative modulation mechanism. When testing the self-organization with additive lateral connectivity (see Sect. 5.5.4), we noticed some problems due to a possible negative output and to the modification of the equilibrium points, which prevents the modulation from being as high as needed. Moreover, multiplicative modulation occurs in the cortex, especially for coordinate transformation (see for example Salinas and Thier [18]). Using the multimodal constraints, we create common multimodal coordinates in which each modality, initially perceived in its own coordinates, is able to self-organize in such a way as to be spatially consistent with the other modalities.
5.6.2 Equation We introduce a feedback modulation of the BCM learning rule with a bump-activity-dependent term. This modulation can be additive or multiplicative. We choose the second option because it allows a modulation as high as needed (see previous section). The equation is the following:

u = w.x × mod(A∗)   (5.2)

with A∗ the activity of the upper layer and mod a modulation function dependent on A∗. Any increasing function of A∗ may be used as feedback modulation. The feedback modulation influences the probability of the neuron becoming selective to one stimulus: the higher the feedback, the higher the probability for the neuron to develop the selectivity. Moreover, as the BCM learning rule regulates its activity to be one on temporal average, independently of the input values, the value of the modulation when A∗ = 0 does not make any difference. In this article, the modulating term used is a sigmoid parametrized by α and β (see (5.3) and Fig. 5.4), which is bounded below by 0, to avoid negative outputs, and satisfies mod(A∗) > 1 for A∗ > 0:

mod(A∗) = α / (1 + (α − 1) e^(−βA∗))   (5.3)

As a consequence, when there is an activity bump, the value of the output u increases because of the modulating term. It may then overcome θ, and the neuron thus learns the current perception (LTP).
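Equations (5.2) and (5.3) can be expressed directly in code. The sketch below is illustrative; the values of α and β are placeholders, since the chapter does not state the values used in the experiments.

```python
import numpy as np

def mod(A_star, alpha=4.0, beta=10.0):
    """Sigmoid modulation of Eq. (5.3): mod(A*) = alpha / (1 + (alpha - 1) * exp(-beta * A*)).
    It satisfies mod(0) = 1 and mod(A*) > 1 for A* > 0."""
    return alpha / (1.0 + (alpha - 1.0) * np.exp(-beta * A_star))

def modulated_output(w, x, A_star, alpha=4.0, beta=10.0):
    """Modulated BCM output of Eq. (5.2): u = (w.x) * mod(A*)."""
    return (w @ x) * mod(A_star, alpha, beta)
```

Wherever the competitive-layer bump is active (A∗ > 0), the output is amplified and is thus more likely to cross the sliding threshold θ, so that the corresponding stimulus is learned (LTP).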
5.7 Modality Association The modality association architecture has to be generic and scalable and to provide a way to create a multisensory perception which influences each local perception to be globally consistent. Indeed,
Fig. 5.4 Modulating function mod used for the BCM rule
Fig. 5.5 Generic layered structure of a cortical column in a modality map with multimodal constraints
by influencing the localization of the activity bumps (which represent the current perceptions) with the multimodal context, the self-organization of each modality map may be guaranteed to be consistent with the other ones. The Bijama model, developed by Ménard and Frezza-Buet [14], proposes an architecture for multimodal integration. Their association architecture is adopted here as one possible solution to create a constrained multimodal context. To obtain a multimodal perception, each modality map is linked to an associative map with constrained topographic connections. Multimodal constraints are created with reciprocal connections from the associative map to the modality maps, which influence the bump localization through the addition of two layers in the modality maps.
5.7.1 Modality Map with Multimodal Constraints The structure of a modality map is the same as in the isolated case (see Sect. 5.4), with the addition of two middle layers. They are used to connect modality maps and to influence the local sensation with the other perceptions (Fig. 5.5). More precisely, the cortical layer integrates the multimodal constraints. Its output is the value of the lateral connection to the associative map, i.e. the weighted sum of the outputs of the cortical columns that are connected to it (see Sect. 5.7.3 for details on inter-map connectivity). Its value represents the multimodal context for this perception. The merging layer merges the first and second layer activities to influence the local sensation by the multimodal context. It is inspired by the three cortical column states described by Burnod [6]. This is done by the product of the cortical layer activity, plus a leaking term, with the sensory activity, as sketched below. The leaking term is useful when the multimodal context is not yet consistent. Finally, the competitive layer takes as input the merging layer output, which represents the local sensation influenced by the multimodal context so as to be consistent with the other perceptions. This means that the localization of the DNF activity bump, representing the current perception, is influenced by the multisensory perception.
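Read literally, the merging operations can be sketched as below. This is our reading of the text rather than a specification from the authors: the value of the leaking term and the exact place where it is added are assumptions, and the associative-map version (described in Sect. 5.7.2) is included under the same caveat.

```python
import numpy as np

def modality_merging(sensory, cortical, leak=0.1):
    """Merging-layer activity of a modality column: the product of the multimodal
    (cortical-layer) activity, offset by a leaking term, with the sensory activity."""
    return np.asarray(sensory, dtype=float) * (np.asarray(cortical, dtype=float) + leak)

def associative_merging(cortical_layers, leak=0.1):
    """Merging-layer activity of an associative column: the product of the cortical-layer
    activities coming from all connected modality maps, each offset by the leaking term."""
    out = np.ones_like(np.asarray(cortical_layers[0], dtype=float))
    for c in cortical_layers:
        out = out * (np.asarray(c, dtype=float) + leak)
    return out
```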
Fig. 5.6 Generic layered structure of a cortical column in an associative map
Fig. 5.7 The different modality maps are reciprocally connected to a unique associative map with a topographically organized strip connectivity. (A) The associative map is connected to all modality maps. The first layers (in gray levels) of an associative cortical column are each connected to a different modality map with a strip connectivity depicted in the corresponding gray. (B) Each modality map is connected to the associative map using the same strip connectivity. See Sect. 5.7.1 (resp. Sect. 5.7.2) for details on the cortical column architecture of the modality (resp. associative) map
5.7.2 Associative Map An associative cortical column has a variable number of layers depending on how many modality maps are connected to it (see Fig. 5.6). Each of the first layers, named cortical layers, computes the lateral influence of the corresponding modality map as a weighted sum of the outputs of the cortical columns that are connected to it (see Sect. 5.7.3 for details on inter-map connectivity). The merging layer integrates all the perceptive information by multiplying the values of the previous layers plus a leaking term. The leaking term, as in the modality cortical column, is useful to obtain an activity even when the different perceptions are inconsistent. The competitive layer is the same as in a cortical column of a modality map, taking the merging layer as input and providing, as map output, a competitive activity bump that represents the multimodal context.
5.7.3 Lateral Connectivity Modality maps are reciprocally connected to an associative map as described in Fig. 5.7. These connections have strip shapes, inspired by the cortical column model of Burnod [6] and are representative of long-range connectivity between cortical areas. They are topographically organized, meaning that two close cortical columns are connected to close cortical columns in the distant map. Strip connectivity introduces constraints in the multimodal integration. Indeed, if perceptive activity bumps are not fully consistent, meaning that they are not located in strips that cross in a single
Fig. 5.8 (A) The activity in the associative map is high if the different perceptions in the modality maps are consistent, i.e. the activities are located in strips with an unique intersection point. (B) In other cases, integrating perceptions provides a badly located and low level activity bump
point (see Fig. 5.8(B)), activity in the associative map remains low and is badly located. In this case, the relaxation of the strip constraints, through the dedicated layer in each cortical column, leads the perceptive bumps to move until they reach an equilibrium point (see Fig. 5.8(A)). Since the activity bump is used as a high-level signal to obtain a self-organization of the perceptive layer, changing its localization influences the self-organization. This means that the lateral connectivity forces the self-organization of each modality map to be globally consistent, by choosing a local self-organization which is consistent with the context. This provides consistency and generalization at the multimodal level. The learning of correlations between perceptions is performed on the lateral weights of the strips that connect the maps. The use of Widrow and Hoff's rule [20] makes the strips learn presences and not values. The activity in the strip, i.e. the weighted sum of the output activities of the cortical columns that lie in the strip, will be high only when an activity bump is present in the strip at some learned positions. Moreover, the number of such learned positions is not limited, so that a single cortical layer may be active (high activity) for multiple distant perceptions. This allows the system to learn multiple correlations for a single perception, like for example a red box and a blue one.
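The strip learning can be written as a standard delta-rule update, as in the sketch below. The Widrow-Hoff rule itself is as in [20], but the pairing of signals (strip activities as input, the receiving column's activity as target) and the learning rate are assumptions on our part, not details given in the chapter.

```python
import numpy as np

def widrow_hoff_update(w, strip_activity, target, eta=0.05):
    """Delta-rule (Widrow-Hoff) update of the lateral strip weights:
    w <- w + eta * (target - w . strip_activity) * strip_activity.
    'strip_activity' is the vector of output activities of the cortical columns lying in
    the strip; 'target' is the activity of the receiving column (an assumed pairing)."""
    strip_activity = np.asarray(strip_activity, dtype=float)
    error = float(target) - float(w @ strip_activity)
    return w + eta * error * strip_activity
```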
5.8 Experiments First, we test our modality map model in an isolated case, meaning that the map is not connected to other perceptions (for the architecture see Sect. 5.4 and Fig. 5.10). This is intended to validate the coupling of the BCM learning rule with the DNF competitive mechanism and allows us to divide the parameter space. Secondly, we connect several modality maps with the Bijama assembling architecture (see Sect. 5.7) to validate the constrained self-organization in a multimodal context.
5.8.1 Protocol We use Gaussians centered within a finite range as inputs to the model. Each Gaussian is discretized on a constant finite number of neurons, whose activity is equal to the Gaussian value at the point of discretization (Fig. 5.9). The number of different inputs is infinite and they can be 1D, 2D or more. One-dimensional inputs may be interpreted as the orientation of a bar coded by a neural population in the visual cortex. Two-dimensional inputs may correspond to spatially localized spot stimuli on the retina.
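A possible way to generate such stimuli is sketched below; the number of input neurons, the Gaussian width and the input range are illustrative placeholders rather than the values used in the experiments.

```python
import numpy as np

def gaussian_input(n_points=50, sigma=0.1, rng=None):
    """One random 1D stimulus: a Gaussian of width sigma, centered uniformly within [0, 1]
    and discretized on a fixed number of input neurons."""
    if rng is None:
        rng = np.random.default_rng()
    centre = rng.uniform(0.0, 1.0)
    grid = np.linspace(0.0, 1.0, n_points)
    return np.exp(-((grid - centre) ** 2) / (2.0 * sigma ** 2))
```

A 2D stimulus can be built in the same way as the outer product of two such 1D Gaussians.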
Fig. 5.9 Inputs used to test our map model are successive random discretized Gaussian centered within a finite range
Fig. 5.10 Architecture of the modality map isolated. The connections are shown for one cortical column and are generic
Inputs are presented to the model for a finite period. The presentation period has to be long enough to ensure the stabilization of the DNF. This may represent an attentional system that focusses on a stimulus. The input sequence is random because, when people perform saccades, there is no temporal continuity between two consecutive stimuli. Since BCM learning is based on spatial competition, the variance of the input Gaussian has to be large enough to ensure the partial overlapping of close inputs. For the weights of the afferent connections of the competitive layer (see Sect. 5.3.2) we choose a Gaussian receptive field (Fig. 5.10). This is biologically inspired and it creates spatial consistency for the DNF input. Indeed, as each BCM neuron develops a selectivity independently, there is no spatial consistency at the map level. With a Gaussian receptive field, the activity of one column in the sensory layer is spread to close columns at the competitive layer. This mechanism helps the DNF raise an activity bump from the activity of the sensory layer.
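In a 1D simplification, such Gaussian receptive fields for the afferent weights can be sketched as below; the sizes, the width and the regular placement of the receptive-field centers are illustrative assumptions. A matrix of this kind could play the role of the afferent weights S in the discrete DNF equation of Sect. 5.3.2.

```python
import numpy as np

def gaussian_receptive_fields(n_columns=100, n_inputs=100, sigma=3.0):
    """Afferent weight matrix S: each competitive-layer column pools the sensory layer
    through a Gaussian receptive field centered on a regularly spaced position, so that
    the activity of one sensory column is spread to its neighbours."""
    centres = np.linspace(0.0, n_inputs - 1.0, n_columns)
    idx = np.arange(n_inputs, dtype=float)
    return np.exp(-((idx[None, :] - centres[:, None]) ** 2) / (2.0 * sigma ** 2))
```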
Fig. 5.11 Representation in gray scale of the center value of the discriminated Gaussian by the sensory layer of each column of the map using (A) the BCM rule, (B) the modulated BCM rule
Fig. 5.12 Representation of the discriminated 2D-input by the sensory layer of each column of the modality map. The first dimension of the discriminated input is represented by the orientation of the bar, the second one by a toric color coding. The two scales are printed below the self-organization
5.8.2 Results 5.8.2.1 Modality Map Isolated We test our modality map model in isolation in order to validate our approach of coupling BCM and DNF to obtain a self-organization. The modality map model is described in Sect. 5.4 and the inputs used in the previous section. We compare the organization of our modality map with that of isolated BCM neurons to see the influence of the DNF modulation on the organization. We first test our model with 1D inputs. The results are presented in Fig. 5.11, where the stimulus to which the sensory layer of each column is selective is represented in gray scale. As expected, with no lateral interactions, BCM neurons develop a selectivity and match the input space but do not present any organization (A). On the contrary, the sensory layer of our modality map is self-organized (B). The self-organization reached is stable. Indeed, since the BCM equation converges toward a stable selectivity, it is robust to the feedback inconsistency generated by the DNF inertia once the selectivity is reached. We also test our modality map architecture with 2D inputs. The results are presented in Fig. 5.12, where the first coordinate of the selected stimulus of each sensory layer is represented by an orientation, and the second coordinate by a toric color coding. We can see a self-organization over both dimensions. However, the organization is more local than previously because the mapping of a 2D input space is more constrained (see Sect. 5.8.3 for more explanations). With the 2D inputs, we use a 20 × 20 modality map. This increases the discretization of the neural field and consequently of the self-organization of the sensory layer. Indeed, the DNF parameters are scaled
Fig. 5.13 Representation in gray scale of the discriminated input by the sensory layer of each column of a map in (A) an isolated context, (B) a multimodal context
from 10 × 10 to 20 × 20, meaning that the width of the difference-of-Gaussians connectivity relative to the map size is the same. Thanks to the genericity of the architecture, it is consequently easy to increase the resolution of the map without having to parametrize the model again.
5.8.2.2 Modality Map with Multimodal Context To test our map model within a multimodal context, we use the inter-map strip connectivity around an associative map proposed in the Bijama architecture (see Sect. 5.7). We use three perceptive maps which receive the same 1D inputs, for simplicity. We compare the self-organization of a map within and outside the multimodal context, using the same common parameters and receiving the same input flow (see Fig. 5.13). As expected, we observe that the multimodal context biases the self-organization of the modality map by constraining the localization of the activity bumps of the system, so that the bumps are globally consistent, as in Fig. 5.8(A).
5.8.3 Analysis It has to be noticed that, for the task of mapping a 1D input space into a 2D map, the Kohonen learning squashes the map into one dimension. This organization can be stable only thanks to the decreasing learning width. Indeed, as each cortical column is not at the center of its response space, a constant learning width would lead to instability. The BCM learning rule succeeds in overcoming this thanks to the stability of the selectivity of each cortical column. For the mapping of a 2D input space into the 2D modality map, more discontinuities appear compared to 1D inputs. This means that the organization is locally continuous but not globally. Here, this is partially due to the dimensional constraints of the mapping: for the 1D inputs, the degree of freedom of the mapping is higher than for the 2D inputs. More generally, these discontinuities can be explained by the use of the BCM learning rule. The first point is that the competitive layer is able to raise an activity bump only when its input contains information. This means that the BCM equation has to begin to develop a selectivity before being modulated. Since the selectivity is stable once reached, the earlier the feedback is effective, the more efficient the modulation is on the selectivity choice. A solution may be to introduce an unlearning of the current sensory representation when it is inconsistent with the received feedback. The second point is that the BCM learning rule uses spatial competition to improve selectivity. This means that a neuron which receives feedback for two spatially non-overlapping inputs cannot learn the mean of the two inputs, but becomes selective to one of them. Therefore, when a discontinuity appears between two columns, there is no way to remove it. The width of the Gaussian inputs influences their overlapping and therefore the appearance of discontinuities in the self-organization.
The number of discontinuities in the self-organization may be minimized by the use of an unlearning mechanism. However, the spatial competition performed by the BCM equation, which allows the coupling with the DNF, will necessarily introduce some discontinuities. Since we want to associate different modalities, representing different space topologies, potentially of high dimensionality, that may be incompatible, a consensus has to emerge. In this context, the self-organization discontinuities at the level of a modality map, caused by the use of BCM, are not a limitation and can be observed biologically, for example in the pinwheel organization of V1 (see Bosking et al. [5]).
5.9 Conclusion We propose the coupling of the Bienenstock, Cooper and Munro (BCM) learning rule with dynamic neural fields (DNF) into a modality map model as a self-organization mechanism within a multimodal context. The BCM learning rule develops a selectivity to an input, which provides sensory information to the DNF. This layer reduces the complexity and the noise of the sensory activity to a spatially localized and stereotyped activity bump representing the current perception. This perception is influenced by the multimodal context provided by an associative map linked to all modalities. We modify the BCM learning rule with the introduction of the perceptive activity feedback. This multiplicative high-level modulation favours or disfavours the increase of the neuron output above the LTD/LTP sliding threshold. The spatial consistency of this activity bump is then propagated to the selectivity of the sensory layer, which self-organizes. This generic architecture may enable the learning of modality associations and the recall of missing perceptions. The spatial competition used by the BCM learning rule provides the autonomous emergence of information and allows the coupling with the DNF. This competition introduces some discontinuities in the self-organization of the modality map. This is not a limitation, since for the association of different modalities, representing different space topologies that may be incompatible, a consensus has to be found. Moreover, an unlearning of the current selectivity when it is inconsistent with the received feedback may reduce these discontinuities.
References 1. Amari, S.-I.: Dynamics of pattern formation in lateral-inhibition type neural fields. Biol. Cybern. 27, 77–87 (1977) 2. Bear, M.F.: Mechanism for a sliding synaptic modification threshold. Neuron 15(1), 1–4 (1995) 3. Bienenstock, E.L., Cooper, L.N., Munro, P.W.: Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. J. Neurosci. 2(1), 32–48 (1982) 4. Bonath, B., Noesselt, T., Martinez, A., Mishra, J., Schwiecker, K., Heinze, H.-J., Hillyard, S.A.: Neural basis of the ventriloquist illusion. Curr. Biol. 17(19), 1697–1703 (2007). doi:10.1016/j.cub.2007.08.050 5. Bosking, W.H., Zhang, Y., Schofield, B., Fitzpatrick, D.: Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. J. Neurosci. 17(6), 2112–2127 (1997) 6. Burnod, Y.: An Adaptive Neural Network: The Cerebral Cortex. Masson, Paris (1990) 7. Cooper, L.N., Intrator, N., Blais, B.S., Shouval, H.Z.: Theory of Cortical Plasticity. World Scientific, Singapore (2004) 8. Cooper, L.N., Liberman, F., Oja, E.: A theory for the acquisition and loss of neuron specificity in visual cortex (1979) 9. Elbert, T., Rockstroh, B.: Reorganization of human cerebral cortex: the range of changes following use and injury. The Neuroscientist 10(2), 129 (2004) 10. Gibson, J.J.: The Theory of Affordances. In: Shaw, R., Bransford, J. (eds.) Perceiving, Acting, and Knowing: Toward an Ecological Psychology, pp. 67–82 (1977) 11. Girod, T., Alexandre, F.: Effects of a modulatory feedback upon the BCM learning rule. In: CNS, 2009 12. Goldring, J.E., Dorris, M.C., Corneil, B.D., Ballantyne, P.A., Munoz, D.R.: Combined eye-head gaze shifts to visual and auditory targets in humans. Exp. Brain Res. 111, 68–78 (1996). doi:10.1007/BF00229557
13. Kohonen, T.: Self-organization and associative memory. Appl. Opt. 24, 145–147 (1985) 14. Ménard, O., Frezza-Buet, H.: Model of multi-modal cortical processing: coherent learning in self-organizing modules. Neural Netw. 18(5–6), 646–655 (2005). doi:10.1016/j.neunet.2005.06.036 15. Mountcastle, V.B.: The columnar organization of the neocortex. Brain 120(4), 701 (1997) 16. Noë, A.: Action in Perception. MIT Press, Cambridge (2004) 17. Rougier, N.P., Vitay, J.: Emergence of attention within a neural population. Neural Netw. 19(5), 573–581 (2006) 18. Salinas, E., Thier, P.: Gain modulation: a major computational principle of the central nervous system. Neuron 27(1), 15–21 (2000) 19. Wallace, M.T.: The development of multisensory processes. Cogn. Process. 5, 69–83 (2004). doi:10.1007/ s10339-004-0017-z 20. Widrow, B., Hoff, M.E.: Adaptive switching circuits. Neurocomp. Fund. Res. 123–134 (1988) 21. Wilson, H.R., Cowan, J.D.: A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue. Biol. Cybern. 13(2), 55–80 (1973)
Chapter 6
Alpha and Theta Rhythm Abnormality in Alzheimer’s Disease: A Study Using a Computational Model Basabdatta Sen Bhattacharya, Damien Coyle, and Liam P. Maguire
Abstract Electroencephalography (EEG) studies in Alzheimer's Disease (AD) patients show an attenuation of average power within the alpha band (7.5–13 Hz) and an increase of power in the theta band (4–7 Hz). A significant body of evidence suggests that the thalamocortical circuitry underpins the generation and modulation of alpha and theta rhythms. The research presented in this chapter is aimed at gaining a better understanding of the neuronal mechanisms underlying EEG band power changes in AD, which may in the future provide useful biomarkers towards early detection of the disease and for neuropharmaceutical investigations. The study is based on a classic computational model of the thalamocortical circuitry which exhibits oscillations within the theta and the alpha bands. We are interested in the change in the model's oscillatory behaviour corresponding to changes in the connectivity parameters in the thalamocortical as well as sensory input pathways. The synaptic organisation as well as the connectivity parameter values in the model are modified based on recent experimental data from the cat thalamus. We observe that the inhibitory population in the model plays a crucial role in mediating the oscillatory behaviour of the model output. Further, an increase in the connectivity parameters in the afferent and efferent pathways of the inhibitory population induces a slowing of the output power spectra. These observations may have implications for extending the model for further AD research.
6.1 Introduction One of the major challenges posed by Alzheimer's Disease (AD) is early diagnosis. Early clinical symptoms of AD cannot be distinguished from other forms of mental dementia related to advancing age. Moreover, such symptoms are often also associated with ageing of normal adults [28]. According to current diagnostic criteria, clinical confirmation of AD is possible only when the disease is in an intermediate stage of progression, where irreversible damage to cells in vital cognitive areas has already occurred [13]. Currently used therapies mainly aim to delay symptomatic degradation. It is strongly believed that early detection of AD might help in specifying drugs to prevent or delay onset of the disease [12, 14, 46]. Currently a major focus is on finding biomarkers associated with neuropathological changes present in the brain of AD patients several years or even decades prior to the onset of cognitive deterioration. Diminishing power within the alpha band (≈7.5–13 Hz), commonly referred to in the literature as 'slowing' of alpha rhythms, is identified as a definite marker in the EEG of AD patients [10, 27, 36]. Thus, using alpha band slowing as a marker for early stage AD has been proposed [7–9, 27]. Although such slowing of alpha rhythms is an indicator of underlying pathological aberration related to many neurological as well as psychological disorders [16, 26], early stage AD patients are 'fairly'
distinguishable from a set of control patients and patients affected with mental depression, with correct classification rates of 77% and 72% respectively, albeit in a clinically restrained environment [27, 36]. Also, the only significant difference between untreated AD patients and those treated with a specific drug (a cholinesterase inhibitor) is observed in the alpha frequency band [1]. Again, an increase in power within the theta band (≈4–7 Hz) in early stages of AD has been reported [19, 38]. Furthermore, a recent study on early stage AD shows abnormal EEG correlations within the alpha and theta bands in parietal and frontal regions [25]. Thus it is hypothesised that a better understanding of the underlying cause of diminishing power in the alpha band and increasing power within the theta band in AD may inform future therapeutic techniques as well as early diagnosis of the disease. Research suggests that the thalamus and the thalamocortical circuitry have a close functional relationship with cortical alpha rhythms [2, 22, 30, 31, 34]. Further, similar mechanisms in the thalamocortical circuitry are associated with θ and α rhythms [16, 17]. We propose using computational models of the thalamocortical circuitry to understand the underlying neuronal behaviour associated with changes in the EEG in AD. More specifically, we aim to investigate the relation between the synaptic activity in the thalamocortical circuitry and EEG changes within the alpha and the theta frequency bands as seen in Alzheimer's Disease. The hope is that the model can be developed to help understand the origins of EEG changes in AD and establish early signatures of these changes. Previously [4, 6], we validated a classic computational model of the thalamocortical circuitry proposed by Lopes da Silva [35], based on his studies on relaxed awake dogs. The biological data for the model were based on Tombol's Golgi study of the thalamocortical neurons in an adult cat [41]. The model output mimicked alpha rhythms recorded from the dog's occipital cortex. In this work, we first present Lopes da Silva's Alpha Rhythm model (ARm) mimicking occipital EEG and study the variation in output power corresponding to variation of the excitatory and inhibitory connectivity parameters in the model. We observe a slowing of the mean EEG frequency with an increase in the inhibitory parameter. The result is thus consistent with earlier reports [4]. However, an increase in the excitatory parameter shows an increase in the average 'dominant frequency' (the frequency band having the maximum power content) of the model output power spectrum. Also, the output shows remarkable sensitivity to the model input, the average dominant frequency increasing with increasing values of the input mean. Next, we present a more biologically plausible structure of the ARm (consistent with recent work in [44]) as well as biologically plausible connectivity parameters based on the most recent available experimental data from the cat thalamus (as reported in [20, 33]). We observe that the model output mimics the slowing in the mean EEG observed in AD with increasing values of the connectivity parameters in both the efferent and afferent pathways of the inhibitory thalamocortical cell population. Furthermore, an increase in the connectivity of the sensory afferents shows a progressive increase of the mean EEG frequency, indicating the disappearance of alpha rhythms associated with opening of the eyes and attentive states. In Sect.
6.2, we present the ARm as validated in our work followed by the modified ARm structure. In Sect. 6.3, we present the results from both the models and discuss the results in Sect. 6.4. The conclusions from the work and future directions are presented in Sect. 6.5.
6.2 Neural Mass Modelling of the Thalamocortical Circuitry

Based on Golgi studies on the thalamus of an adult cat, Tombol reported that two typical kinds of nerve cell generally occur in all areas of the thalamus: the thalamocortical relay (TCR) neurons with large axons and internuncial (IN) neurons with short axons [41]. Furthermore, she estimated that more than ten afferents from outside the thalamus project onto the dendrites of one TCR cell, making strong excitatory synapses on both TCRs and INs. The axons of the IN population also make strong synaptic contacts with the TCR dendrites. From the physiological dimensions reported by Tombol,
Lopes da Silva estimated that each TCR neuron receives inhibitory input from three IN neurons. The TCR population was also estimated to make excitatory synaptic contact with the IN cell population via recurrent collaterals of its axons, although fibre-count estimates of such connections were not reported. In Sect. 6.2.1, we provide an overview of Lopes da Silva's Alpha Rhythm model, where the connectivity between the two cell populations is based on Tombol's research and represents 'fibre connectivity', i.e. the proportion of fibres (axons) converging from a source cell population onto one fibre (dendrite) of a receiving cell population. Subsequent to Tombol's Golgi studies and Lopes da Silva's seminal work, several research studies on thalamocortical connectivities were reported (see [20] for details). Furthermore, the thalamocortico-thalamic loop connectivities are now much better understood; it is now believed that the inhibitory connections from the Thalamic Reticular Nucleus (TRN) as well as the excitatory feedback from the cortical layers to the TCR cell population play a crucial role in thalamic oscillations. Afferents from the sub-thalamic structures, primarily the Brain Reticular Formation (BRF), are believed to play a secondary but vital role in modulating thalamic activity and are the main sources of cholinergic1 input to the thalamus [32, 33]. Thus, although it has been possible to estimate the total fibre count converging on a thalamic cell population [18], the proportional representation of the fibre sources at the converging site is difficult to estimate [20]. On the other hand, there is a substantial amount of data available on 'synaptic connectivity', i.e. the total number of synapses made by afferent fibres on the dendrites of TCR, IN and TRN cells in rats, cats and monkeys. In Sect. 6.2.2, we present a modified version of the ARm where the two cell populations are linked through synaptic connectivity rather than the fibre connectivity considered in the ARm. The synaptic connectivity parameter values are based on the most recent experimental data available from the Lateral Geniculate Nucleus (LGN) of the cat. Further, unlike the ARm, the modified ARm has a definite representative structure for a single neuronal population (as in [44]). This is due to the altered nature of connectivity between the two models.
6.2.1 The Alpha Rhythm Model: An Overview

Lopes da Silva presented a neural-mass model of the alpha rhythms based on Wilson and Cowan's neural field equations, which provide a mathematical description of the behaviour of neuronal populations in thalamic and cortical tissue so densely packed that they may be treated as a continuum [47, 48]. Thus a neuronal population is considered as a single entity having a soma which generates a membrane potential V, where V is the sum of the potential changes due to extrinsic and intrinsic inputs and is subsequently transformed into a mean firing rate using a sigmoid function (see (6.3)). The Alpha Rhythm model (ARm), shown in Fig. 6.1, has two cell populations: an excitatory cell population representing the TCR cells and an inhibitory cell population representing the IN cells. The TCR cell population receives excitatory input from an extrinsic source N_r, defined as:

N_r = \Psi_{\mu,\varphi}, \qquad (6.1)
where \Psi_{\mu,\varphi} represents the background firing activity of neurons in the sensory pathway2 and is simulated with Gaussian white noise having mean μ and standard deviation ϕ. The collective synapses in the model are represented by a kernel h, which was originally defined by Wilson and Cowan as h(t) = A e^{-at}, where A is the synaptic amplitude and a = 1/\tau_{rd} is the delay term,

1 Neurons that use Acetylcholine (ACh) as the synaptic neurotransmitter are called cholinergic neurons, while the synapse is called a cholinergic synapse (see [23] for a review).

2 Alpha rhythms are dominant in cortical EEG while a subject is in a relaxed but awake state with eyes closed, and thus correspond to a lack of visual input from the external world.
Fig. 6.1 The Alpha Rhythm model (ARm) [35, 39] as simulated in our work with Simulink® in Matlab
where \tau_{rd} is the rise-time of the kernel function h(t) [29, 47]. This was later modified by Lopes da Silva and others [35, 39, 49] to incorporate another delay term b = 1/\tau_{fd}, where \tau_{fd} is the decay-time of the function h(t); the kernel is then defined as:

h_{e,i}(t) = A_{e,i} \left( e^{-a_{e,i} t} - e^{-b_{e,i} t} \right), \qquad (6.2)
where the suffixes e or i represent parameters corresponding to either excitatory or inhibitory synapses respectively. Each neuronal population is considered to be composed of spiking neurons. Thus, the average membrane potential of each cell population is converted to a mean firing rate E(t) using a sigmoid function S, defined as [11, 47, 49]:

E(t) = S[V(t)] =
\begin{cases}
g_0 \, e^{\gamma [V(t) - V_0]} & \forall V \le V_0 \\
g_0 \left( 2 - e^{-\gamma [V(t) - V_0]} \right) & \forall V > V_0,
\end{cases} \qquad (6.3)
where V_0 is the threshold spiking voltage of the neuronal population, g_0 is the maximum firing rate and γ is the steepness parameter of the sigmoid function. It should be mentioned that Lopes da Silva's implementation of the model was linear, as he approximated the operating point of the sigmoid to lie within its linear region. Thus, the output of the sigmoid function E(t) was assumed to be directly proportional to V(t) with a proportionality constant q = 1/2130.

The total number of efferent axons from the TCR cell population making excitatory synapses on a single neuron of the IN population is represented by the connectivity parameter C1. The mean firing rate is thus scaled up by a factor C1 before synaptic contact with the subsequent cell population. The total number of IN cells sending afferents to a single neuron of the TCR population is represented by the connectivity parameter C2. In this work we refer to C1 and C2 as the excitatory and inhibitory connectivity parameters respectively, as C1 precedes the excitatory synapse (from TCR to IN cells) while C2 precedes the inhibitory synapse (from IN to TCR cells). The summed membrane potential V1 at the output of the collective soma of the TCR cell population (represented by the summation block in Fig. 6.1) is taken as the output of the model, representing occipital EEG with a dominant frequency within the alpha band.
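To make the two building blocks concrete, the short Python/NumPy sketch below evaluates the PSP kernel of (6.2) and the piecewise sigmoid of (6.3) with the parameter values of Table 6.1. This is an illustrative re-implementation, not the authors' Matlab/Simulink code; the function names are ours.

```python
import numpy as np

# Parameter values from Table 6.1
A_e, a_e, b_e = 1.65, 55.0, 605.0   # excitatory synapse: mV, s^-1, s^-1
A_i, a_i, b_i = 32.0, 27.5, 55.0    # inhibitory synapse: mV, s^-1, s^-1
g0, gamma, V0 = 25.0, 0.34, 7.0     # sigmoid: s^-1, mV^-1, mV

def psp_kernel(t, A, a, b):
    """Synaptic kernel of (6.2): h(t) = A (exp(-a t) - exp(-b t)) for t >= 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0.0, A * (np.exp(-a * t) - np.exp(-b * t)), 0.0)

def sigmoid(V):
    """Piecewise sigmoid of (6.3): average membrane potential -> mean firing rate."""
    V = np.asarray(V, dtype=float)
    return np.where(V <= V0,
                    g0 * np.exp(gamma * (V - V0)),
                    g0 * (2.0 - np.exp(-gamma * (V - V0))))

if __name__ == "__main__":
    t = np.linspace(0.0, 0.3, 1000)                         # 300 ms
    print("peak excitatory PSP (mV):", psp_kernel(t, A_e, a_e, b_e).max())
    print("firing rate at threshold (s^-1):", sigmoid(V0))  # equals g0
```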
Table 6.1 Values of the parameters defined in (6.1), (6.2) and (6.3). In the present work, values of the parameters defined in (6.1) are as in [5, 40] while those in (6.2) and (6.3) are as in [4, 6, 39]

Parameter:  μ     ϕ     Ae    ae    be    Ai   ai    bi    γ     V0   g0
Unit:       pps   pps²  mV    s⁻¹   s⁻¹   mV   s⁻¹   s⁻¹   mV⁻¹  mV   s⁻¹
Value:      312   169   1.65  55    605   32   27.5  55    0.34  7    25
Table 6.2 The values of the connectivity parameters in the two models. The 'basal' values of the parameters of the ARm are those for which the model showed a peak within the alpha band and are as in the original work [35]. In the modified ARm, the connectivity parameter values are based on the most recent available data from the Lateral Geniculate Nucleus of a cat (as reported in [20, 33]). The range of values of each parameter in the normal (pathologically unaltered) brain is referred to here as its 'normal range'. The basal value of each parameter in the modified ARm is set (arbitrarily) to the lowest value within the range. The 'X's indicate that normal-range values are not relevant to the connectivity parameters in the ARm

                  ARm                  modified ARm
                  C1       C2          C1        C2        C3
Basal             30       3           30        24        7
Normal range      X        X           30–40%    24–31%    7–12%
The system may be defined as follows:

E_1(t) = S[V_1(t)]
V_2(t) = [C_1 E_1(t) \otimes h_{e2}(t)]
E_2(t) = S[V_2(t)]
V_1(t) = [N_r \otimes h_{e1}(t)] + [C_2 E_2(t) \otimes h_i(t)], \qquad (6.4)

where N_r, h_{e,i}, E and S are as defined in (6.1)–(6.3). The parameter values used in these equations are given in Table 6.1. C1 and C2 are the connectivity parameters and are defined in Table 6.2. The model is simulated with a 4th/5th-order Runge-Kutta ODE solver within the Simulink® environment in Matlab. The results are presented in Sect. 6.3.1.
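For readers without access to Simulink, the following sketch integrates the ARm equations (6.4) in Python with a simple forward-Euler scheme (the authors used a 4th/5th-order Runge-Kutta solver, so this is only an approximation). Two points are our assumptions rather than statements from the chapter: ϕ = 169 pps² is interpreted as the variance of the input noise, and the IPSP is subtracted from the TCR membrane potential.

```python
import numpy as np

# Parameter values from Tables 6.1 and 6.2 (basal ARm)
mu, sigma = 312.0, np.sqrt(169.0)    # input mean; 169 pps^2 read as a variance (assumption)
A_e, a_e, b_e = 1.65, 55.0, 605.0
A_i, a_i, b_i = 32.0, 27.5, 55.0
g0, gamma, V0 = 25.0, 0.34, 7.0
C1, C2 = 30.0, 3.0

fs_sim = 2000.0                      # integration rate, chosen for Euler stability
dt = 1.0 / fs_sim
n = int(10.0 * fs_sim)               # 10 s of activity, as in the chapter

def sigmoid(V):
    # Piecewise sigmoid of (6.3)
    return np.where(V <= V0,
                    g0 * np.exp(gamma * (V - V0)),
                    g0 * (2.0 - np.exp(-gamma * (V - V0))))

def psp_step(x, xd, u, A, a, b):
    # One Euler step of the 2nd-order ODE equivalent to convolution with the
    # kernel h(t) = A (exp(-a t) - exp(-b t)):  x'' + (a+b) x' + a*b*x = A (b-a) u
    xdd = A * (b - a) * u - (a + b) * xd - a * b * x
    return x + dt * xd, xd + dt * xdd

rng = np.random.default_rng(0)
Nr = rng.normal(mu, sigma, n)                    # extrinsic input of (6.1)

V1 = np.zeros(n)                                 # TCR membrane potential = model output
V2 = Ve = Vi = V2d = Ved = Vid = 0.0
for k in range(1, n):
    E1, E2 = sigmoid(V1[k - 1]), sigmoid(V2)     # TCR and IN firing rates
    V2, V2d = psp_step(V2, V2d, C1 * E1, A_e, a_e, b_e)   # TCR -> IN, scaled by C1
    Ve, Ved = psp_step(Ve, Ved, Nr[k], A_e, a_e, b_e)     # extrinsic EPSP on TCR
    Vi, Vid = psp_step(Vi, Vid, C2 * E2, A_i, a_i, b_i)   # IN -> TCR, scaled by C2
    V1[k] = Ve - Vi                              # IPSP subtracted (sign is our assumption)

eeg = V1[:: int(fs_sim / 250.0)]                 # resample to the 250 Hz used in the chapter
```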
6.2.2 A Modified Version of the ARm

A recent study on the cat thalamic Lateral Geniculate Nucleus (dorsal) (Van Horn et al. 2000, as mentioned in [20, 33]) has reported that around 24–31% of the total synapses received by the TCR cell population are from the TRN and IN combined, while only 7–12% are from the retinal afferents (the remaining, which is the majority, being from cortical and sub-thalamic sources; these are not considered in this work). There seems to be a lack of consensus in the available literature on the synaptic connections from TCR cells to the IN population [32]. However, the TCR cells make excitatory contact with the TRN cells via collaterals of their axons en route to the cortex, and these contacts comprise 30–40% of the total synaptic afferents to the TRN cells. In a modified version of the ARm, we assign these biologically plausible values to the parameters C1 and C2, where C1 represents the proportion of excitatory thalamocortical synapses on a dendrite of a TRN cell while C2 represents the inhibitory synapses from the TRN population on a TCR cell dendrite. Unlike the ARm, the inhibitory population is considered to represent the TRN, since there are no available data on the synaptic input from the TCR to the IN cell population. Further,
Fig. 6.2 The structure of a single neuron in the circuit, slightly altered from ARm, and consistent with more recent research involving similar models [44]
we introduce another connectivity parameter C3, which represents the retinal/sensory synapses on the TCR cell population. The structure of a single neuronal population in the model is as in [44] and is shown in Fig. 6.2. Each cell population has two sets of connectivity parameters at its input, where each set corresponds to either an extrinsic input (e.g. sensory afferents) or an intrinsic input (e.g. intra-population collaterals or inter-population feedback). The afferent fibres to the population from a certain type of input (extrinsic or intrinsic) are collectively considered as a unit fibre having a mean firing rate. The synapse made by this fibre is then scaled by the corresponding connectivity parameter, representing the total number of synaptic contacts corresponding to this input. This is unlike the ARm, where the unit fibre carrying the mean firing rate is scaled by a connectivity parameter representing the total number of fibres in the unit prior to making a synapse (see Fig. 6.1).

Implementing the above-mentioned modifications in the ARm, we obtain the modified model structure shown in Fig. 6.3. The TCR cell population has two inputs, one each from an extrinsic (sensory afferents) and an intrinsic (feedback from the TRN cells) source. The TRN cell population, however, has only one input, which is intrinsic (feed-forward input from the TCR cells). The output of the model is the summed membrane potential at the TCR soma (as in the ARm) and may be defined by the following set of equations:

E_1(t) = S[V_1(t)]
V_2(t) = [E_1(t) \otimes h_{e2}(t)] \, C_1
E_2(t) = S[V_2(t)]
V_1(t) = [N_r \otimes h_{e1}(t)] \, C_3 + [E_2(t) \otimes h_i(t)] \, C_2, \qquad (6.5)

where all parameters are as defined in (6.1)–(6.3) and parameter values are as in Table 6.1. The connectivity parameters C1, C2 and C3 are set at the lowest value of their respective biologically plausible ranges as provided in Table 6.2; these values are referred to as their 'basal' values. Comparing (6.4) and (6.5), we see that the only difference is due to the parameter C3 in the last equation of the set. The differences in the outputs of the two models (discussed in Sect. 6.3) arise from this parameter (C3) as well as from the different parameter ranges used for C1 and C2 compared with the ARm studies. Further, as in the ARm, the equations in (6.5) are 2nd-order ODEs and are solved with a 4th/5th-order Runge-Kutta ODE solver within the Simulink® environment in Matlab; results are presented in Sect. 6.3.2 and discussed in Sect. 6.4.
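A hedged sketch of how the inner simulation loop changes for the modified ARm of (6.5), reusing sigmoid, psp_step, Nr and n from the previous listing. Each PSP is scaled after the convolution to mirror the bracketing of (6.5); because the kernels are linear this is mathematically equivalent to scaling the input, so the only substantive change with respect to the ARm loop is the new parameter C3 and the new parameter values.

```python
# Basal connectivity values of the modified ARm (Table 6.2); C3 is new.
C1, C2, C3 = 30.0, 24.0, 7.0

V1 = np.zeros(n)
V2 = v2_raw = ve_raw = vi_raw = V2d = Ved = Vid = 0.0
for k in range(1, n):
    E1, E2 = sigmoid(V1[k - 1]), sigmoid(V2)                   # TCR and TRN firing rates
    v2_raw, V2d = psp_step(v2_raw, V2d, E1, A_e, a_e, b_e)     # TCR -> TRN PSP
    ve_raw, Ved = psp_step(ve_raw, Ved, Nr[k], A_e, a_e, b_e)  # retinal/sensory EPSP
    vi_raw, Vid = psp_step(vi_raw, Vid, E2, A_i, a_i, b_i)     # TRN -> TCR IPSP
    V2 = C1 * v2_raw                                           # [E1 ⊗ he2] C1 in (6.5)
    V1[k] = C3 * ve_raw - C2 * vi_raw                          # IPSP sign is our assumption
```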
6.3 Empirical Methods and Results

We vary the connectivity parameters both above and below their respective basal values as defined in Table 6.2 and study the effects on the output power spectra of both models. We start by presenting the
Fig. 6.3 The modified thalamocortical circuitry as an extension of the ARm and as simulated in our work with Simulink® in Matlab. The connectivity parameter values in the model are based on most recent experimental data (from the cat Lateral Geniculate Nucleus (LGN)) available from literature [20, 33] and represent the proportion of excitatory or inhibitory synapses from respective afferents on a single dendritic terminal of a cell. An additional connectivity parameter (w.r.t. ARm) C3 is also incorporated in the model to represent proportion of synapses from sensory afferents. The structure of a single neuron in the circuit is slightly altered from ARm and is consistent with more recent research involving similar models [44]
ARm in Sect. 6.3.1, followed by the modified ARm in Sect. 6.3.2. Each set of connectivity parameter values is considered to correspond to a different group of subjects, each group having 20 members. Thus, for each set of connectivity parameter values, the model is simulated for 20 different random inputs with the same mean μ, simulating EEG from 20 subjects in a group corresponding to these parameter values. Further, we present a study where we vary the mean μ of the extrinsic input and observe the dependence of the output of each model on this parameter when the connectivity parameters in each model are kept at their respective basal values.

For each model, the output behaviour within the theta (4–7 Hz), lower alpha (7.5–10 Hz) and upper alpha (10.5–13 Hz) bands is observed, the band frequency ranges being set as in [3, 15, 45]. The total simulation time is 10 seconds with a sampling frequency of 250 Hz. The output from each simulation is clipped on the time axis so as to extract the values from the start of the 4th second to the end of the 8th second, as is done in experimental studies of EEG. These extracted outputs from the 20 simulations are then averaged and the resulting output vector is bandpass filtered between 3 and 14 Hz using a Butterworth filter of order 10 in Matlab. The power spectrum of this filtered output is computed in Matlab using a Welch periodogram with a Hamming window of segment length one quarter of the sampling frequency and an overlap of 50% [4, 6, 10].
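The processing chain just described (averaging over the 20 simulated subjects, clipping to the 4th–8th second, band-pass filtering and Welch estimation) was implemented by the authors in Matlab. The sketch below shows an equivalent pipeline in Python with SciPy; it is illustrative only, and the second-order-sections form of the Butterworth filter is our choice for numerical robustness.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, welch

fs = 250  # sampling frequency (Hz)

def averaged_power_spectrum(trials, fs=fs):
    """`trials` holds the 20 simulated 'subjects' as an array of shape
    (20, n_samples). Average them, keep the 4th-8th second, band-pass
    3-14 Hz with a 10th-order Butterworth filter, and estimate the PSD with
    Welch's method (Hamming window, segment length fs/4, 50% overlap)."""
    seg = trials[:, 4 * fs:8 * fs].mean(axis=0)
    sos = butter(10, [3.0, 14.0], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, seg)
    nperseg = fs // 4
    return welch(filtered, fs=fs, window="hamming",
                 nperseg=nperseg, noverlap=nperseg // 2)

def band_power(f, pxx, lo, hi):
    """Total power within a band, e.g. theta = (4, 7), lower alpha = (7.5, 10),
    upper alpha = (10.5, 13) Hz."""
    mask = (f >= lo) & (f <= hi)
    return np.trapz(pxx[mask], f[mask])
```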
6.3.1 Study of the ARm

The excitatory parameter C1 is varied over a range of 0–100 at intervals of 10 while the inhibitory parameter C2 is held constant at its basal value of 3. The variation in the output power in each of the three bands is shown in Fig. 6.4(a). When C1 = 0, the maximum power is in the lower end of the spectrum and within the theta band. However, as the parameter value is increased, there is an almost linear increase of the power within the lower alpha band, while the power within the theta and upper alpha bands appears to saturate. Figure 6.4(b) shows the power spectra of the model output with varying values of C1. A gradual shift of the dominant frequency from the theta to the lower alpha
Fig. 6.4 (a) The average power spectral density within the theta (θ ), lower alpha (α1 ) and the upper alpha (α2 ) bands as the excitatory parameter C1 in the ARm is varied from 0–100 at intervals of 10 while the inhibitory parameter C2 is held at its basal value of 3. For each pair of parameter values, the output is averaged over a total of 20 simulations which may be thought of as 20 ‘simulated’ subjects as done in experimental set-up. (b) The output power spectral density of the ARm with varying values of C1 and bandpass filtered with low and high cut-off frequencies of 3 and 14 Hz respectively
Fig. 6.5 (a) The average power spectral density within the theta (θ ), lower alpha (α1 ) and the upper alpha (α2 ) bands as the inhibitory parameter C2 in the ARm is varied from 0–10 while the excitatory parameter C1 is held at its basal value of 30. (b) The power spectral density of the model output with varying values of C2 and bandpass filtered to retain frequency components within the theta and alpha bands
band is observed, along with a progressive increase in band power. Thus, there is a gradual increase in the mean frequency of the output spectrum with increasing excitatory connectivity. Next, the inhibitory parameter C2 is varied over a range of 0–10 while C1 is held constant at its basal value of 30. Analyses similar to those performed for C1 are shown in Figs. 6.5(a) and 6.5(b). For C2 = 0, the maximum power is within the theta band, as shown in Fig. 6.5(a), with a broad peak over the theta and lower alpha bands as shown in Fig. 6.5(b). In Fig. 6.5(a), the maximum power corresponds to C2 = 1, with a distinct peak within the lower alpha band. Thereafter the power within the alpha band falls sharply until, at C2 = 6, it is less than that of the theta band. This gradual shift from peak
Fig. 6.6 The average dominant frequency in the output power spectra of the ARm shown as bar plots with increasing values of the excitatory parameter C1 (top) and the inhibitory parameter C2 (bottom). The error-bars show the variance in the dominant frequency over the total of 20 different inputs simulating 20 subjects in experimental set-up. The output power spectrum is bandpass filtered over the theta (4–7 Hz) and alpha (7.5–13 Hz) bands
power within the lower alpha band to the theta band with increasing values of the inhibitory parameter is also seen in the output power spectra in Fig. 6.5(b), thus indicating a slowing of the mean output spectrum. Figure 6.6 shows the average dominant frequency of the output power spectra with variation of the excitatory (top) and the inhibitory (bottom) parameters. A distinct displacement from the theta to the lower alpha band is seen with increasing values of the excitatory parameter, while an increase in the inhibitory parameter shows a slowing of the output spectra with a drop in dominant frequency from the lower alpha band to the theta band. The results thus agree with those shown in Figs. 6.4 and 6.5.

One-way Analysis of Variance (ANOVA) examining the effects of the parameters C1 and C2 on power in each of the three bands of interest confirmed systematic effects of parameter alterations on band power (P < 0.05 in all cases). Pairwise tests are done to compare the significance of the power difference within a frequency band when the parameters are varied about their basal values. Varying the excitatory parameter above and below C1 = 30 shows that the power difference within the theta band is statistically significant (P < 0.05) only when C1 = 90 (we ignore the case C1 = 0, as this would mean an absence of excitatory connectivity and thus has no biological significance). However, within both the lower and upper alpha bands, the power difference is significant (P < 0.05) for the range C1 = 70–100. For the inhibitory parameter, when varied about its basal value C2 = 3, the results show statistical significance for values in the range C2 = 5–10 for all bands. In the lower and upper alpha bands, the results for C2 = 1–2 are also significant (ignoring the case C2 = 0). The significance tests are done using the Statistics Toolbox™ in Matlab. An overview of the results of the pairwise significance tests is shown in Table 6.3.

So far, the mean μ of the Gaussian noise input to the model has been held constant at its basal value (312 pps). At this point, we vary μ over a range of 0–700 pps at intervals of 100 while the excitatory and inhibitory parameters are set at their basal values of 30 and 3 respectively. This may be thought of as a change of initial conditions corresponding to different states of the brain. From Figs. 6.7(a) and 6.7(b) we observe that the dominant frequency in the output power spectra shows a progressive increase with increasing values of μ. Furthermore, we observe that for 250 ≤ μ ≤ 350 the average dominant frequency lies within the range 6.5–7.5 Hz, i.e. at the junction of the theta and lower alpha bands. Thus the choice of basal value for μ seems to be crucial to the model behaviour. A discussion of the results is presented in Sect. 6.4.
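The ANOVA and pairwise comparisons above were run with the Statistics Toolbox in Matlab. A roughly equivalent analysis can be sketched in Python with SciPy as below; the dictionary layout of the data and the use of a two-sample t-test for the pairwise comparison are our assumptions, since the chapter does not name the pairwise test used.

```python
import numpy as np
from scipy import stats

def anova_across_levels(band_power_by_level):
    """One-way ANOVA across parameter levels. `band_power_by_level` maps a
    parameter value (e.g. C1 = 0, 10, ..., 100) to the 20 per-'subject'
    band-power values obtained at that level."""
    groups = [np.asarray(v) for v in band_power_by_level.values()]
    return stats.f_oneway(*groups)

def pairwise_vs_basal(band_power_by_level, basal):
    """Compare every level against the basal level (e.g. C1 = 30) and return
    the p-value of a two-sample t-test for each comparison."""
    base = np.asarray(band_power_by_level[basal])
    return {level: stats.ttest_ind(np.asarray(v), base).pvalue
            for level, v in band_power_by_level.items() if level != basal}
```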
Table 6.3 A summary of the results from the pairwise statistical significance tests done on the output power spectra in the theta, lower alpha and upper alpha bands with variation of the connectivity parameters about their respective basal values in both models. Each entry shows the range of values of the corresponding parameter over which the output power within a band does not show a significant difference (P > 0.05). A single entry for more than one frequency band in a row indicates similar significance ranges within those bands

                              θ               Lower α          Upper α
ARm            C1             0 < C1 < 90     0 < C1 < 60      10 < C1 < 60
               C2             C2 < 5          2 < C2 < 5       2 < C2 < 5
Modified ARm   C1             23 < C1 < 38    20 < C1 < 38     23 < C1 < 44
               C2             20 < C2 < 40    18 < C2 < 28     C2 > 18
               C3             7 ≤ C3 < 10     C3 = 7           3 < C3 < 11
Fig. 6.7 (a) The power spectral density of the ARm output as the input mean μ is varied from 0–700 at intervals of 100 and while the excitatory and inhibitory parameters are held at their basal values (defined in Table 6.2). (b) The average dominant frequency in the output power spectra of the ARm shown as bar plots with increasing values of μ, while the error-bars show the variance
6.3.2 Study of the Modified ARm

The 'normal range' (the range of values observed in pathologically unaffected brains of animals, as reported in [20, 33]) of each connectivity parameter is as provided in Table 6.2. We vary C1 and C2 within ±10 of the maximum and minimum values of their respective normal ranges.3 For C3 the variation range is ±5 of the maximum and minimum values respectively (since the minimum value within the normal range is 7). The varying values of C1, C2 and C3 are provided along the abscissae of Figs. 6.8(a), 6.8(c) and 6.8(e) respectively (see figure caption). The bar plots in Fig. 6.9 show the mean dominant frequency with variation of the three parameters, along with the standard error about the mean. The results are explained below and discussed further in Sect. 6.4.

In Fig. 6.8(a) we observe that the dominant frequency lies within the theta band and increases with increasing values of the parameter. Power within both the lower and upper alpha bands shows a very slow increase with increasing values of the parameter. The power spectrum in Fig. 6.8(b) shows a slight

3 The sets of varying values for the connectivity parameters are different from, and cover a smaller range than, those used for the ARm. Preliminary investigations over smaller parameter ranges with the ARm are reported elsewhere [4, 6].
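As a sketch of how the sweep described in this section can be organised, the loop below varies C1 over the values used in Fig. 6.8(a) while C2 and C3 stay at their basal values, simulating 20 random inputs per setting. simulate_modified_arm is a hypothetical wrapper around the simulation loop of Sect. 6.2.2, and averaged_power_spectrum and band_power are the helpers sketched in Sect. 6.3; none of these names come from the chapter.

```python
import numpy as np

C1_values = [20, 23, 26, 28, 30, 32, 34, 36, 38, 40, 41, 44, 47, 50]  # as in Fig. 6.8(a)
n_subjects = 20

results = {}
for C1 in C1_values:
    trials = []
    for subject in range(n_subjects):
        rng = np.random.default_rng(subject)     # a fresh random input per 'subject'
        trials.append(simulate_modified_arm(C1=C1, C2=24, C3=7, rng=rng))  # hypothetical
    f, pxx = averaged_power_spectrum(np.array(trials))
    results[C1] = {"theta":       band_power(f, pxx, 4.0, 7.0),
                   "lower_alpha": band_power(f, pxx, 7.5, 10.0),
                   "upper_alpha": band_power(f, pxx, 10.5, 13.0)}
```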
Fig. 6.8 Power within the theta (θ ), lower alpha (α1 ) and the upper alpha (α2 ) bands with variation of the parameters (a) C1 , (c) C2 and (e) C3 of the modified ARm about their normal range (defined in Table 6.2). The values of each parameter while varying as presented here are: C1 = [20 23 26 28 30 32 34 36 38 40 41 44 47 50]; C2 = [14 16 18 20 22 24 26 28 30 32 34 36 38 41]; C3 = [1 3 5 7 8 9 10 11 12 13 14 16 18 20]. The power spectral densities of the model output with variation of (b) C1 , (d) C2 and (f) C3
Fig. 6.9 The average dominant frequency within the theta (4–7 Hz) and the alpha (7.5–13 Hz) bands in the output power spectra of the modified ARm shown as bar plots with increasing values of the parameters (a) C1, (b) C2 and (c) C3. The error-bars show the variance in the dominant frequency over the total of 20 different inputs simulating 20 subjects in an experimental set-up
slowing in the mean dominant frequency for 20 ≤ C1 ≤ 30. There is a significant increase in the peak power for C1 > 30 (Fig. 6.8(b)) while the average dominant frequency decreases slightly until approximately C1 = 40 (Fig. 6.9(a)). For C1 > 40, the average dominant frequency plot is fairly flat, as observed in Fig. 6.9(a). For the parameter C2, the power within the theta band shows band-pass characteristics in Fig. 6.8(c): for C2 = 24–34, the power shows a very slight increase with increasing values of the parameter, while for C2 < 24 and C2 > 34 the power decreases linearly. On the other hand, power within the lower alpha band shows low-pass characteristics, i.e. the power is maximum within the lower alpha band for C2 < 20. At C2 ≈ 20 the plot intersects the theta-band power plot and rolls over, showing a constant decrease with further increase in C2 values. Power within the upper alpha band shows band-pass characteristics with a pass band of 24 ≤ C2 ≤ 34. The power spectrum in Fig. 6.8(d) also shows a progressive decrease of the dominant frequency for increasing values of C2, although the peak power first increases and then decreases. An overall slowing of the mean EEG is observed in Fig. 6.9(b), where we see an approximately linear decrease in the average dominant frequency with increasing values of the inhibitory parameter. In Fig. 6.8(e), a sharp decrease in power in the theta and lower alpha bands is observed for C3 < 7. Overall power is maximum within the lower alpha band and increases linearly for 5 ≤ C3 ≤ 18, after which it rolls over. The power within the theta band shows band-pass characteristics with a very
slight increase in power as C3 increases from 7 to 11. For C3 < 7, there is a steep roll-over; for C3 ≥ 12, the roll-over is more subtle. The power within this band is greater than that within the lower alpha band for C3 ≤ 9; beyond this point, the dominant frequency is within the lower alpha band. The power within the upper alpha band rises faster for C3 > 12 until it equals the (falling) theta-band power at C3 = 20. The power spectrum in Fig. 6.8(f) shows a steady shift in the dominant frequency from the theta to the lower alpha band, along with a steady increase in the peak power. The increase of the average dominant frequency for C3 > 5 is also shown in Fig. 6.9(c); the plot is fairly flat for C3 ≤ 5.

A series of one-way ANOVAs (done using the Statistics Toolbox™ in Matlab) shows that the effects of variation of all three parameters on band power are statistically significant (P < 0.05). A pairwise comparison for statistical significance shows that the power in all three bands corresponding to values of C1 < 20 and C1 > 44 is significantly different (P < 0.05) from that for C1 = 30, the basal value of the parameter in this work. For C2 = 24 (basal value), the power difference in all three bands is significant for C2 < 20, while the differences in the theta and alpha band powers are also significant for C2 > 40 and C2 > 28 respectively. When C3 = 7 (basal value), a statistically significant difference in theta-band power (P < 0.05) is observed for C3 > 10, while within the lower and upper alpha bands statistically significant differences are observed for C3 > 7 and C3 > 11 respectively. A summary of these results is provided in Table 6.3. Varying the input mean in this model gave results similar to those shown in Fig. 6.7; these are not shown here for brevity.
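The 'dominant frequency' plotted in Figs. 6.6 and 6.9 is the frequency carrying the maximum power in the band-passed spectrum. A minimal sketch of how it can be computed from a Welch estimate, with the 3–14 Hz limits taken from the filtering step of Sect. 6.3:

```python
import numpy as np

def dominant_frequency(f, pxx, fmin=3.0, fmax=14.0):
    """Frequency bin with the maximum power within the theta-alpha range,
    i.e. the 'dominant frequency' shown as bars in Figs. 6.6 and 6.9."""
    mask = (f >= fmin) & (f <= fmax)
    return f[mask][np.argmax(pxx[mask])]

# Hypothetical usage: mean and variance over the 20 simulated subjects give
# the bar heights and error bars of Fig. 6.9.
# dom = [dominant_frequency(*averaged_power_spectrum(t[None, :])) for t in trials]
# print(np.mean(dom), np.var(dom))
```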
6.4 Discussion

In this section we discuss the results in the biological context and address the question: does the model output corresponding to variation of the connectivity parameters show EEG characteristics associated with early stage Alzheimer's Disease? We first discuss the results of the ARm, followed by those of the modified ARm.

The output power spectra of the ARm show an overall decrease in the average dominant frequency for increasing values of the inhibitory parameter. Further, an increase in the average dominant frequency is seen with increasing values of the excitatory parameter, although the increase is much slower and shows a 'roll-over' characteristic for larger values of the parameter. These results are in agreement with our previous reports [4, 6], where we showed slowing of EEG within the alpha band with an increase of the inhibitory parameter in the ARm. The overall slowing of the power spectra corresponds to an almost exponential decrease of power within the lower alpha band, so that it falls below the theta-band power for higher values of the inhibitory parameter. The theta-band power, on the other hand, first decreases and then remains fairly 'flat' with increasing values of the inhibitory parameter, while power within the upper alpha band decreases. However, early stage AD is associated with an increase in power within the theta and lower alpha bands and a decrease in power within the upper alpha band [19, 27, 37]. Thus, although there is an overall slowing of the power spectrum, the behaviour within individual frequency bands does not conform to that associated with AD. Moreover, an increase in power within both alpha bands is seen with increasing values of the excitatory parameter, while the power within the theta band does not show any marked change; this leads to an overall increase in the mean dominant frequency of the power spectrum. Again, power within the theta and lower alpha bands does not show the EEG characteristics associated with AD.

In the modified ARm, we observe a slowing in the average dominant frequency of the output power spectra with increasing values of both the excitatory and inhibitory connectivity parameters. Furthermore, we observe an increase in the theta-band power when the parameter values are within a certain range, while the power within the lower alpha band shows a 'suppressed' (fairly flat with
little fluctuation) behaviour with increasing values of the excitatory parameter. Increasing the values of the inhibitory parameter beyond a threshold shows a decrease in the power within the lower alpha band. Power within the upper alpha band is significantly smaller than that within the other two bands and seems to have little effect on the overall dominant frequency characteristics of the model output. These observations are very much in agreement with the slowing of mean EEG reported in AD, with an increase in power within the theta band in early stages of the disease. Such biologically conforming behaviour in the model may be attributed to the introduction of a biologically plausible structure and parameter values in the model (see Sect. 6.2). We explain the results as follows.

The inhibitory population in the modified ARm corresponds to cells of the TRN. An increase in excitatory parameter values would thus indicate increased synaptic activation of the TRN cells by the TCR cells. This increased excitatory input to the TRN cells seems to induce suppression of higher frequency components in the model output while enhancing oscillatory components within the lower frequency bands. The results indicate an involvement of the TRN population in oscillatory behaviour within lower frequency bands, as reported in experimental studies [24]. Moreover, afferents to the TRN are reported to be pathologically altered in AD [42]. Thus an increase in excitatory synaptic activity of the thalamocortical afferents to the TRN might also correspond to a pathological alteration of this pathway in AD, leading to a slowing of the EEG power spectra. An increase in the inhibitory parameter in the model, on the other hand, corresponds to an increase in the inhibitory feedback from the TRN cell population to the TCR population. Again, this results in suppression or decrease of the power within the alpha band and an increase of power within the theta band, leading to an overall decrease in the dominant frequency of the power spectra. Thus, the TRN seems to play a crucial role in mediating the behavioural traits of the model output, which conforms to biological studies showing the important role of inhibitory feedback from the TRN in shaping the oscillatory behaviour of the thalamo-cortico-thalamic circuitry [21]. Overall, the results suggest that synaptic connectivity in the afferent and efferent pathways of the Thalamic Reticular Nucleus might provide crucial clues to a better understanding of EEG slowing in AD.

For the parameter C3 in the model, representing the excitatory synaptic connectivity of the retinal afferents with the TCR cells, the average dominant frequency shows a significant increase with increasing values of the parameter. Conversely, we may state that the power spectrum shows a diminished dominant frequency as the retinal synaptic connectivity is decreased. Other than reduced connectivity, a decrease in C3 might also correspond to decreased synaptic 'activity' in the retinal pathway and thus could relate to the deficiency in visual cognition observed in AD [43]. It may be noted that there is an interesting difference in the behaviour of the model output spectra corresponding to increasing the excitatory synaptic connectivities C1 and C3, albeit in different pathways (see Figs. 6.9(a) and 6.9(c)). This supports our contention of the significant role that the TRN plays in controlling thalamocortical oscillatory behaviour.
A crucial parameter in both models seems to be the mean μ of the input, corresponding to the background firing frequency in the sensory pathway (when the eyes are closed and there is a lack of visual information). Higher values of the input would mean an increase in activity in the sensory pathway, probably implying a resumption of sensory information associated with opening of the eyes. Thus the increase in dominant frequency with increasing values of the mean input might indicate a disappearance of low-frequency components from the EEG, associated with eye opening and augmented attentive states. In the context of AD, we speculate that the reduced levels of visual cognition observed in AD [43] might be correlated with a lower firing rate in the sensory pathway, leading to a lower mean of the dominant frequency in the cortical EEG. From an engineering perspective, the results indicate that, to design a model that oscillates within a desired frequency band for a given set of basal parameter values, the mean of the input to the model needs to be tuned such that the output oscillates within this band.
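The tuning step mentioned at the end of the paragraph can be automated by sweeping the input mean and checking where the dominant frequency lands. In the sketch below, simulate_psd is a hypothetical function returning a Welch estimate (f, pxx) for a given μ, and dominant_frequency is the helper sketched earlier; this is our illustration, not the authors' procedure.

```python
def tune_input_mean(simulate_psd, target_band=(7.5, 13.0), mu_grid=range(0, 701, 100)):
    """Return the values of the input mean for which the model's dominant
    output frequency falls inside the target band (alpha by default),
    cf. the mu sweep of Fig. 6.7."""
    suitable = []
    for mu in mu_grid:
        f, pxx = simulate_psd(mu)           # hypothetical simulation wrapper
        if target_band[0] <= dominant_frequency(f, pxx) <= target_band[1]:
            suitable.append(mu)
    return suitable
```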
6.5 Conclusion and Future Work

The goal of the current work is to mimic the slowing of mean EEG observed in patients with Alzheimer's Disease (AD) using thalamocortical neural mass models, and thus to try to understand the underlying neuronal mechanisms associated with such EEG changes. In this work, we have presented a study of two models: (a) the Alpha Rhythm model (ARm) originally proposed by Lopes da Silva, which is a classic computational model of the thalamocortical circuitry mimicking the cortical alpha rhythms observed in the EEG of a subject in a relaxed but awake state with eyes closed; (b) a modification of the ARm introducing biologically plausible synaptic connectivity parameters and parameter values based on the most recent available data. We study the power spectra of the output of each model in response to variation in the connectivity parameters about their 'basal' values (defined in Table 6.2). The power spectra are studied within the theta (4–7 Hz), lower alpha (7.5–10 Hz) and upper alpha (10.5–13 Hz) bands.

While the slowing in the power spectra of the ARm output corresponding to an increase in inhibitory parameter values conforms to the slowing of EEG observed in AD, the behaviour of the power content within individual bands does not agree with reports from experimental studies. In the modified version of the ARm, however, we see a slowing of the power spectrum corresponding to an increase in both the excitatory and inhibitory parameters in the thalamocortical pathway, implying an important role of the Thalamic Reticular Nucleus (TRN) in defining the oscillatory characteristics of the thalamo-cortico-thalamic circuitry. Furthermore, the results support reports suggesting a pathological alteration in the TRN afferent pathways in AD. The connectivity in the sensory afferent pathway also affects the output, implying a possible effect on EEG characteristics of the visual cognition deficiency reported in AD. Thus, we observe greater biological plausibility in the behaviour of the modified model output compared to that of the ARm. This biological conformity may be attributed firstly to the introduction of the connectivity parameter corresponding to sensory afferents in the model, and secondly to the biologically plausible values assigned to the connectivity parameters, informed by recent studies on the mammalian thalamus.

Although the modification to the ARm introduced in this work is a step towards implementing biological plausibility in the model, it remains a fairly simple model and is constrained by its lack of cortical as well as cholinergic inputs, both of which are believed to be major factors in the functional changes associated with AD. While cholinergic pathways are the most commonly and widely discussed within the neuropathology of AD, cortico-cortical and cortico-thalamic connectivities have been reported to play a vital role in thalamocortical oscillatory behaviour. This is supported by experimental studies showing that the majority of synaptic contacts on thalamic dendrites are from cortical sources. Thus, future work will involve incorporating these circuit elements in the model. We anticipate that this will shed further light on the interplay between the various connectivity parameters and afferent pathways underlying the EEG changes associated with AD.

Acknowledgements This work is supported by the Northern Ireland Department for Education and Learning under the Strengthening the All Island Research Base programme. B. Sen Bhattacharya would like to thank Dr. David Watson for valuable comments and suggestions on the work and several useful discussions from time to time.
References

1. Basar, E., Guntekin, B.: A review of brain oscillations in cognitive disorders and the role of neurotransmitters. Brain Res. 1235, 172–193 (2008) 2. Basar, E., Schurmann, M., Basar-Eroglu, C., Karakas, S.: Alpha oscillations in brain functioning: an integrative theory. Int. J. Psychophysiol. 26, 5–29 (1997) 3. Bennys, K., Rondouin, G., Vergnes, C., Touchon, J.: Diagnostic value of quantitative EEG in Alzheimer's Disease. Clin. Neurophysiol. 31, 153–160 (2001)
4. Bhattacharya, B.S., Coyle, D., Maguire, L.P.: A computational modelling approach to investigate alpha rhythm slowing associated with Alzheimer’s Disease. In: Proceedings of the Conference on Brain Inspired Cognitive Systems (BICS), Madrid, Spain, pp. 382–392 (2010) 5. Bhattacharya, B.S., Coyle, D., Maguire, L.P.: Intra- and inter-connectivity influences on event related changes in thalamocortical alpha rhythms. In: Proceedings of the conference on Biologically-Inspired Computation: Theories and Applications (BIC-TA), Liverpool, United Kingdom, pp. 1685–1692 (2010). ISBN 978-1-4244-6439-5 6. Bhattacharya, B.S., Coyle, D., Maguire, L.P.: Thalamocortical circuitry and alpha rhythm slowing: an empirical study based on a classic computational model. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, pp. 3912–3918 (2010) 7. Brenner, R.P., Reynolds, C.F., Ulrich, R.F.: Diagnostic efficacy of computerized spectral versus visual EEG analysis in elderly normal, demented and depressed subjects. Electroencephalogr. Clin. Neurophysiol. 69, 110–117 (1988) 8. Brenner, R.P., Reynolds, C.F., Ulrich, R.F.: EEG findings in depressive pseudodementia and dementia with secondary depression. Electroencephalogr. Clin. Neurophysiol. 72, 298–304 (1989) 9. Brenner, R.P., Ulrich, R.F., Spiker, D.G., Sclabassi, R.J., Reynolds, C.F., Marin, R.S., Boller, F.: Computerized EEG spectral analysis in elderly normal, demented and depressed subjects. Electroencephalogr. Clin. Neurophysiol. 64, 483–492 (1986) 10. Cantero, J.L., Atienza, M., Gomez-Herrero, G., Cruz-Vadell, A., Gil-Neciga, E., Rodriguez-Romero, R., GarciaSolis, D.: Functional integrity of thalamocortical circuits differentiates normal ageing from mild cognitive impairment. Hum. Brain Mapp. 30, 3944–3957 (2009) 11. Stam, C.J., Pijn, J.P.M., Suffczy´nski, P., da Silva, F.H.L.: Dynamics of the human alpha rhythm: evidence for non-linearity? Clin. Neurophysiol. 110, 1801–1813 (1999) 12. Cummings, J.L., Vinters, H.V., Cole, G.M., Khachaturian, Z.S.: Alzheimer’s Disease: Etiologies, pathophysiology, cognitive reserve, and treatment opportunities. Neurology 51(Suppl 1), 2–17 (1998) 13. Siemers, E.M.: Advances in biomarkers and modelling for the development of improved therapeutics: early Alzheimer’s treatment. Abstract of talk at: 1st International Congress on Alzheimer’s Disease and Advanced Neurotechnologies (2010) 14. Geula, C.: Abnormalities of neural circuitry in Alzheimer’s Disease. Neurology 51(Suppl 1), 18–29 (1998) 15. Hogan, M., Swanwick, G.R.J., Kaiser, J., Rowan, M., Lawlor, B.: Memory-related EEG power and coherence reductions in mild Alzheimer’s Disease. Int. J. Psychophysiol. 49, 147–163 (2003) 16. Hughes, S.W., Crunelli, V.: Thalamocortical mechanisms in EEG alpha rhythms and their pathological implications. The Neuroscientist 11(4), 357–372 (2005) 17. Hughes, S.W., Lorincz, M., Cope, D.W., Blethyn, K.L., Kekesi, K.A., Parri, H.R., Juhasz, G., Crunelli, V.: Synchronised oscillations at α and θ frequencies in the Lateral Geniculate Nucleus. Neuron 42, 253–268 (2004) 18. Izhikevich, E.M., Edelman, G.M.: Large-scale model of mammalian thalamocortical systems. Proc. Natl. Acad. Sci. USA 105(9), 3593–3598 (2008) 19. Jeong, J.: EEG dynamics in patients with Alzheimer’s disease. Clin. Neurophysiol. 115, 1490–1505 (2004) 20. Jones, E.G.: The Thalamus, Vols. I and II, 1st edn. Cambridge University Press, Cambridge (2007) 21. 
Kim, U., Sanchez-Vives, M.V., McCormick, D.A.: Functional dynamics of gabaergic inhibition in the thalamus. Science 278, 130–134 (1997) 22. Llinas, R.: The intrinsic electrophysiological properties of mammalian neurons: insights into central nervous system function. Science 242, 1654–1664 (1988) 23. McCormick, D.A.: Acetylcholine: distribution, receptors, and action, pp. 91–101 (1989) 24. McCormick, D.A., Bal, T.: Sleep and arousal: thalamocortical mechanisms. Annu. Rev. Neurosci. 20, 185–215 (1997) 25. Montez, T., Poil, S.S., Jones, B.F., Manshanden, I., Verbunt, J.P., van Dijk, B.W., Brussaard, A.B., van Ooyen, A., Stam, C.J., Scheltens, P., Linkenkaer-Hansen, K.: Altered temporal correlations in parietal alpha and prefrontal theta oscillations in early-stage Alzheimer’s Disease. Proc. Natl. Acad. Sci. USA 106(5), 1614–1619 (2009) 26. Niedermeyer, E.: Alpha rhythms as physiological and abnormal phenomena. Int. J. Psychophysiol. 26, 31–49 (1997) 27. Prinz, P.N., Vitiello, M.V.: Dominant occipital (alpha) rhythm frequency in early stage Alzheimer’s Disease and depression. Electroencephalogr. Clin. Neurophysiol. 73, 427–432 (1989) 28. Raji, C.A., Lopez, O.L., Kuller, L.H., Carmichael, O.T., Becker, J.T.: Age, Alzheimer Disease, and brain structure. Neurology 73, 1899–1905 (2009) 29. Rodrigues, S., Chizhov, A.V., Marten, F., Terry, J.R.: Mappings between a macroscopic neural-mass model and a reduced conductance-based model. NeuroImage 102, 361–371 (2010) 30. Romei, V., et al.: On the role of prestimulus alpha rhythms over occipito-parietal areas in visual input regulation: correlation or causation? J. Neurosci. 30, 8692–8697 (2010) 31. Schreckenberger, M., Lange-Asschenfeld, C., Lochmann, M., Mann, K., Siessmeier, T., Buchholz, H.-G., Bartenstein, P., Gründer, G.: The thalamus as the generator and modulator of EEG alpha rhythm: a combined PET/EEG study with lorazepam challenge in humans. NeuroImage 22(2), 637–644 (2004)
32. Sherman, S.M.: Thalamus. Scholarpedia 1(9), 1583 (2006) 33. Sherman, S.M., Guillery, R.W.: Exploring the Thalamus, 1st edn. Academic Press, New York (2001) 34. da Silva, F.H.L.: Neural mechanisms underlying brain waves: from neural membranes to networks. Electroencephalogr. Clin. Neurophysiol. 79, 81–93 (1991) 35. da Silva, F.H.L., van Lierop, T.H.M.T., Schrijer, C.F., van Leeuwen, W.S.: Essential differences between alpha rhythms and barbiturate spindles: spectra and thalamo-cortical coherences. Electroencephalogr. Clin. Neurophysiol. 35, 641–645 (1973) 36. Soininen, H., Reinikainen, K., Partanen, J., Helkala, E.-L., Paljärvi, L., Riekkinen, P. Sr.: Slowing of electroencephalogram and choline acetyltransferase activity in post mortem frontal cortex in definite Alzheimer’s Disease. Neuroscience 49(3), 529–535 (1992) 37. Soininen, H., Reinikainen, K., Partanen, J., Mervaala, E., Paljarvi, L., Helkala, E.-L., Riekkinen, P. Sr.: Slowing of the dominant occipital rhythm in electroencephalogram is associated with low concentration of noradrenaline in the thalamus in patients with Alzheimer’s disease. Neurosci. Lett. 137, 5–8 (1992) 38. Stam, C.J.: Use of magnetoencephalography (MEG) to study functional brain networks in neurodegenerative disorders. J. Neurolog. Sci. 289, 128–134 (2010) 39. Suffczy´nski, P.: Neural dynamics underlying brain thalamic oscillations investigated with computational models. Ph.D. thesis, Institute of Experimental Physics, University of Warsaw (October 2000) 40. Suffczy´nski, P., et al.: Event-related dynamics of alpha rhythms: a neuronal network model of focal ERD/surround ERS. In: Pfurtscheller, G., da Silva, F.H.L. (eds.) Handbook of Electroencephalography and Clinical Neurophysiology, Revised series, pp. 67–85. Elsevier, Amsterdam (1999) 41. Tombol, T.: Short neurons and their synaptic relations in the specific thalamic nuclei. Brain Res. 3, 307–326 (1967) 42. Tourtellotte, W.G., Hoesen, G.W.V., Hyman, B.T., Tikoo, R.K., Damasio, A.R.: Afferents of the thalamic reticular nucleus are pathologically altered in Alzheimer’s Disease. J. Neuropathol. Exp. Neurol. 48(3), 336 (1989) 43. Uhlhaas, P.J., Pantel, J., Lanfermann, H., Prvulovic, D., Haenschel, C., Maurer, K., Linden, D.E.J.: Visual perceptual organization deficits in Alzheimer’s dementia. Dement. Geriatr. Cogn. Disord. 25(5), 465–475 (2008) 44. Ursino, M., Cona, F., Zavaglia, M.: The generation of rhythms within a cortical region: Analysis of a neural mass model. NeuroImage 52(3), 1080–1094 (2010) 45. Wada, Y., Nanbu, Y., Jiang, Z.-Y., Koshino, Y., Yamaguchi, N., Hashimoto, T.: Electroencephalographic abnormalities in patients with presenile dementia of the Alzheimer type: quantitative analysis at rest and during photic stimulation. Biol. Psychiatry 41, 217–225 (1997) 46. Waugh, W.H.: A call to reduce the incidence of Alzheimer’s Disease. J. Appl. Res. 10(2), 53–57 (2010) 47. Wilson, H.R., Cowan, J.D.: Excitatory and inhibitory interaction in localized populations of model neurons. J. Biophys. 12, 1–23 (1972) 48. Wilson, H.R., Cowan, J.D.: A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue. Kybernetik 13, 55–80 (1973) 49. Zetterberg, L.H., Kristiansson, L., Mossberg, K.: Performance of a model for a local neuron population. Biol. Cybern. 31, 15–26 (1978)
Chapter 7
Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention

Tapani Raiko and Harri Valpola
Abstract We study the emergent properties of an artificial neural network which combines segmentation by oscillations and biased competition for perceptual processing. The aim is to progress in image segmentation by mimicking abstractly the way how the cerebral cortex works. In our model, the neurons associated with features belonging to an object start to oscillate synchronously, while competing objects oscillate with an opposing phase. The emergent properties of the network are confirmed by experiments with artificial image data.
7.1 Introduction

The success of animals depends on their ability to extract relevant information from their environment. Since animals have become very good at this during evolution, engineers have a lot to learn from the computational principles underlying the perceptual abilities of animals. In mammals, the cerebral cortex is largely responsible for the interpretation and extraction of relevant information. The perceptual abilities of the cerebral cortex include learning to make relevant sensory discriminations, segmentation of objects and attentional selection of relevant objects.

In this article, we focus on segmentation. However, it is important to recognise that segmentation cannot be isolated from the other components. For example, when segmenting objects in a visual scene, one typically has to be able to recognise the individual objects to know which parts go with which object. On the other hand, recognition usually requires one to segment out the object first. It is therefore impossible to achieve good performance by neatly separating the process into two consecutive stages of segmentation and recognition. Nevertheless, this is what is usually done in machine vision: first segment the object and then recognise it. The cerebral cortex seems to take a different approach: segmentation and recognition are combined into an iterative, dynamical process. The goal of this article is to study such a process from an engineering perspective. We combine bottom-up biological inspiration with the top-down demands and restrictions of the engineering problem of interpreting and extracting useful information from the environment.

This article is an extension of a conference paper [10]. In the conference version, we only had results on the simplified segmentation problem (see Sect. 7.5.1), where only one feature for each object in each area could be active, which makes the problem unrealistic. In the current version, we also include a more realistic segmentation problem (see Sect. 7.5.2) where many features can be active.

The rest of the article is structured as follows. Section 7.2 gives background on segmentation by coupled oscillators and on the biased competition model for attention and learning. In Sect. 7.3 we

T. Raiko, Department of Information and Computer Science, Aalto University, Helsinki, Finland
Fig. 7.1 (a) Because of the Gestalt principles, a circle is perceived rather than some other grouping of the lines. (b) Gestalt grouping of the neurons. The lines are features coded by different neurons. The shades of gray illustrate the connection strengths between the neurons on the right and the neuron on the left, darker meaning stronger. The lateral connections are stronger when the Gestalt principle is better fulfilled
list the desired emergent properties that we would like to have in our model. We start from high-level phenomena and work our way towards details. Section 7.4 gives the definition of a model that has such properties, which is then experimentally confirmed in Sect. 7.5, followed by discussion in Sect. 7.6.
7.2 Background

In this section, we will first introduce the Gestalt principles, which form the basis for segmentation. Next, we discuss segmentation by coupled oscillators. Finally, we introduce the biased competition model for attention and learning. Our goal in this paper is to apply segmentation with coupled oscillators to models which use biased competition for perception and learning.
7.2.1 Gestalt Principles

When we humans see a new object, we may not know its identity but we can nevertheless tell what is part of the object and what is not. In other words, we are able to segment out an object without having seen it before. In perceptual psychology, the rules of the organisation of perceptual scenes are called Gestalt principles [12]. Psychologists have identified several principles, such as proximity, common fate, similarity, continuity, closure, symmetry and convexity. The Gestalt principle of continuity is illustrated in Fig. 7.1a, where the human visual system groups some of the line segments to form a circle.

What makes the Gestalt principles interesting in the current context is that they can be learnt from data. In neural terms, the Gestalt principles can be implemented by giving positive connections between certain neurons in one area and some other neurons in an adjacent area. Learning the connections can be based on simple correlations found in the data. For example, features responding to lines of a certain orientation in one part of the visual field are more probably co-activated with features of similar orientation in some other part of the visual field. This mechanism is illustrated in Fig. 7.1b. These "neural" Gestalt rules can be learnt from the data and they operate on the level of individual feature-coding neurons. The principle is therefore applicable to any modality and also between modalities unlike, for example, many segmentation procedures that make use of the spatial structure
of visual images. Moreover, the neural Gestalt rules can be learnt locally and in parallel. In the visual domain this means that the local correlations found in familiar objects generalise to new objects which have different overall shapes but nevertheless obey the same local correlations.
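A minimal sketch of how such lateral 'Gestalt' weights could be learnt from co-activation statistics. The Hebbian-style rule and all names here are ours, not the authors'; it only illustrates that locally computed correlations suffice.

```python
import numpy as np

def update_lateral_weights(W, x_area1, x_area2, lr=0.01, decay=0.1):
    """Strengthen the lateral connection between feature i in one area and
    feature j in an adjacent area in proportion to their co-activation;
    a small decay keeps the weights bounded."""
    return W + lr * (np.outer(x_area1, x_area2) - decay * W)

# Example: two areas with 8 feature-coding neurons each. Features that tend to
# be co-active (here: a fixed shift) end up with the strongest lateral weights.
rng = np.random.default_rng(0)
W = np.zeros((8, 8))
for _ in range(1000):
    x1 = (rng.random(8) < 0.2).astype(float)   # sparse activations in area 1
    x2 = np.roll(x1, 1)                        # correlated activations in area 2
    W = update_lateral_weights(W, x1, x2)
```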
7.2.2 Segmentation by Coupled Oscillators

In the cerebral cortex, the representation of objects is distributed. For instance, colour and movement are represented in different visual cortical areas in humans and other primates. Moreover, the brain represents objects at many levels of abstraction. For instance, the cerebral cortex can recognise and represent the identity (lion or hammer), category (predator or tool) and emotional significance (dangerous or useful) of objects. Since there are typically multiple objects present in the world, the brain needs to represent which features belong together. Von der Malsburg [9] suggested that the brain would achieve this with a temporal code, by synchronising the firing of the neurons which represent features belonging together. Such object-specific synchronisation was found experimentally by Gray and Singer [6] and Eckhorn et al. [5].

From an engineering perspective, segmentation by synchronisation of coupled oscillators seems to be an attractive option because it should be able to work in a hierarchical neural network. Such networks are used in practice when recognising objects. It should therefore be possible to combine segmentation and recognition into an iterative process that solves both problems at the same time. Modelling work has shown that coupled oscillators are able to synchronise their firing under suitable conditions (for a review, see e.g. [7]). This property has also been used for segmentation in many models (e.g., [2, 15]; for a recent review, see [13]).
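To illustrate the basic synchronisation phenomenon, the sketch below simulates a small Kuramoto-style network of coupled phase oscillators: with positive coupling and identical intrinsic frequencies the phases lock. This generic toy model is our illustration and is not the network defined in this chapter.

```python
import numpy as np

def kuramoto_step(theta, K, omega, dt=0.001):
    """One Euler step of d(theta_i)/dt = omega_i + sum_j K_ij sin(theta_j - theta_i)."""
    diff = theta[None, :] - theta[:, None]          # pairwise phase differences
    return theta + dt * (omega + (K * np.sin(diff)).sum(axis=1))

rng = np.random.default_rng(0)
n = 10
theta = rng.uniform(0.0, 2.0 * np.pi, n)            # random initial phases
omega = np.full(n, 2.0 * np.pi * 10.0)              # ~10 Hz intrinsic frequency
K = np.full((n, n), 0.5)                            # uniform positive coupling
for _ in range(20000):                              # 20 s of simulated time
    theta = kuramoto_step(theta, K, omega)
print("phase coherence:", abs(np.mean(np.exp(1j * theta))))  # close to 1 when synchronised
```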
7.2.3 Biased Competition Model for Attention and Learning

Based on psychophysical experiments, contextual (predominantly top-down) biasing of local lateral competition has been proposed as a model of covert attention in humans [4]. Usher and Niebur [14] then suggested a computational model of biased competition that has been shown to replicate many attentional phenomena, for instance both bottom-up and top-down aspects of attention [3]. Deco and Rolls [3] also showed that it is possible to learn the weights for contextual biasing by the mechanism outlined in Sect. 7.2.1. In other words, the neural Gestalt rules can be applied in a relatively straightforward manner to implement selective attention.

Yli-Krekola [17] combined the biased competition model with competitive learning. This model not only supports attention but forms a good basis for learning hierarchical feature representations suitable for categorising objects. The model already includes the lateral connections which can be used for segmentation. However, the model selects just one object. This makes it difficult to correctly segment objects if there are features which could belong to more than one object. For segmentation, it would usually be better that each feature is assigned to the object where it fits best. Such an assignment is in practice possible only if the model represents several objects at once or sequentially. Yli-Krekola et al. [18] added a habituation mechanism to the model. The purpose was to make the model switch between different objects and thus improve the segmentation capability.

In this paper we further develop the model by replacing the habituation mechanism with oscillators. Essentially, we are combining coupled-oscillator models with the biased competition model. This development is a step towards models which would support learning, segmentation and selection of relevant objects in a hierarchical network.
7.3 Design for Emergence Our goal is to combine the oscillator ideas (Sect. 7.2.2) with the biased competition model (Sect. 7.2.3). Both oscillations and competition are found in the cerebral cortex, so they should be compatible. The overall structure of our network is such that there are so-called areas that correspond to patches in the image. The areas get bottom-up input from the pixels. The areas should be connected to each other with local interactions only, that is, there is no hierarchy or global signals. The different areas should work in the same way, using the same algorithms. We start from four high-level emergent properties and continue with requirements for one area that would lead to those properties.
7.3.1 Expected Emergent Properties

A1 The network should integrate information from local patches. When seeing an object, the local features belonging to it should start to oscillate. The local features strengthen each other if the features are compatible, for instance in the sense that an edge is continuous. These pairwise connections translate into the gain of the whole population corresponding to the object.

A2 When a scene contains many objects or many different interpretations, the object with the globally highest gain emerges. The attention is thus drawn to the most obvious object.

A3 The oscillations of an object should synchronise internally and completely. Not only should pieces of objects synchronise to each other, but the synchrony should spread to the whole population.

A4 When there are multiple objects in a scene, they should desynchronise from each other. When using only local connections, this can of course only happen when the objects overlap. This should not be considered as restrictive as it might first sound. When building a hierarchical representation of the scene, objects do "overlap" on high-level representations with large receptive fields, even if they do not on the pixel level or on the local feature level.
7.3.2 Requirements for One Area

We hope to accomplish the above emergent high-level properties by incorporating the following low-level properties in a single area.

B1 Not many neurons (e.g., less than one percent) should be active at the same time. This is required for attention (A2).

B2 The activity or oscillation of a single neuron should not grow without limits regardless of the strength of the bottom-up input or lateral support.

B3 Without any lateral support from other areas, the neuron should not start to oscillate clearly.

B4 With constant lateral support, the oscillation should become clearer.

B5 With oscillating lateral support, the neuron should oscillate with the same frequency and phase as the lateral input.

B6 The amplitude of the oscillation should describe the bottom-up activation and the oscillation of the lateral support. The requirements B2–B6 are the basic requirements for the oscillatory system in general (A1).

B7 As the different neurons in an area compete for their time to become active (as required in B1), their oscillations should become antisynchronised, that is, the phase differences in lateral inputs should become even stronger in the activations. This should lead to global desynchronisation (A4).
Table 7.1 Values of parameters used in the experiments

α1 = 0.4, α2 = 0.05, α3 = 0.01, β1 = 0.2, β2 = 0.2, γ = 0.05, δ = 0.3, ε1 = 0.01, ε2 = 0.01, ζ1 = 0.4, ζ2 = 0.12, η1 = 0.09, η2 = 0.09, θ = 0.02, κ = 0.7
Fig. 7.2 An experiment with a single neuron. The time series of the activity x(t) is shown in black, and the inhibition y(t) in green. The bottom-up input u(t) is one until t = 100 and zero after that
B8 The change from one neuron being active to another being active should happen quickly and completely. This should help in finding a clear separation between the segments (A3).
7.4 Implementation We propose a model that has the above properties. Since the full model is rather complicated, we start by defining a single oscillator and show how it works, and only then expand it to the full model with several areas each consisting of several neurons.
7.4.1 Single Oscillator Model

We start by introducing a single oscillator (or neuron) consisting of a pair of signals, called activity x(t) and inhibition y(t). The pair works a bit like the cos and sin functions, which are each other's derivatives with one minus sign: cos'(t) = −sin(t) and sin'(t) = cos(t). The system is activated by a bottom-up input u(t) and it has intrinsic Gaussian noise n_k(t).

x(t+1) = \bigl[(1 - \alpha_1)x(t) - \beta_1 y(t) + \gamma u(t) + \varepsilon_1 n_1(t)\bigr]_+ / \delta    (7.1)

y(t+1) = \bigl[(1 - \alpha_2)y(t) + \beta_2 x(t) + \varepsilon_2 n_2(t)\bigr]_+    (7.2)

n_k(t) \sim N(0, 1)    (7.3)

[x]_+ = \max(0, x) = \frac{x + |x|}{2}    (7.4)
We use the nonlinearity [·]_+ to keep all the signals positive. This is important to ensure that negative terms such as the inhibition −β1 y(t) cannot activate the system. Parameters α, β, . . . are positive numbers whose values are given in Table 7.1. Figure 7.2 shows how the system responds to a bottom-up input that activates the oscillator after t = 0, and then shuts it down after t = 100. As required in B3, the oscillation does not have a constant frequency, but it is heavily affected by noise instead. This will help the system to adapt to other signals when the model is enriched below. Note also that the response is true to the bottom-up input in the sense that when the input disappears, the activity x(t) diminishes very quickly.
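As an illustration only (a sketch, not the authors' implementation), the single-oscillator equations (7.1)–(7.2) can be simulated directly with the Table 7.1 values and the input protocol of Fig. 7.2:

```python
import numpy as np

# Table 7.1 values used by the single-oscillator equations (7.1)-(7.2)
a1, a2, b1, b2, gamma, delta, e1, e2 = 0.4, 0.05, 0.2, 0.2, 0.05, 0.3, 0.01, 0.01

def pos(z):
    """The nonlinearity (7.4): [z]_+ = max(0, z)."""
    return max(0.0, z)

rng = np.random.default_rng(0)
T = 200
x = np.zeros(T + 1)                                 # activity
y = np.zeros(T + 1)                                 # inhibition
u = np.where(np.arange(T) < 100, 1.0, 0.0)          # bottom-up input: one until t = 100, as in Fig. 7.2
for t in range(T):
    x[t + 1] = pos((1 - a1) * x[t] - b1 * y[t] + gamma * u[t] + e1 * rng.normal()) / delta   # (7.1)
    y[t + 1] = pos((1 - a2) * y[t] + b2 * x[t] + e2 * rng.normal())                          # (7.2)
```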
7.4.2 Full Model Definition

In the full model, each neuron i = 1, . . . , n has five non-negative signals at time t: the bottom-up input ui(t), the output activity xi(t), the inhibition yi(t), the gain control gi(t), and the lateral support vi(t). The neurons are organised in areas a = 1, . . . , A such that each neuron i belongs to exactly one area a(i). W is an n × n non-negative¹ weight matrix that has been learned separately. The weights are assumed symmetric (Wij = Wji) and the diagonal elements are zero (Wii = 0). The discrete-time dynamics of the signals are as follows:

x_i(t+1) = \Bigl[(1 - \alpha_1)x_i(t) - \beta_1 y_i(t) + \bigl(\zeta_1 v_i(t) + \gamma\bigr)u_i(t) - \eta_1 \sum_{j \in a(i)} x_j(t) + \varepsilon_1 n_{1i}(t)\Bigr]_+ \Big/ \bigl[g_i(t+1) + \delta\bigr]    (7.5)

y_i(t+1) = \Bigl[(1 - \alpha_2)y_i(t) + \beta_2 x_i(t) + \zeta_2 v_i(t)u_i(t) - \eta_2 \sum_{j \in a(i)} y_j(t) - \theta \sum_{j \in a(i)} x_j(t) + \varepsilon_2 n_{2i}(t)\Bigr]_+    (7.6)

g_i(t+1) = (1 - \alpha_3)g_i(t) + \alpha_3\Bigl[(1 - \kappa)x_i(t) + \kappa \frac{\sum_{j \in a(i)} x_j(t)}{|a(i)|}\Bigr]    (7.7)

v_i(t) = \sum_j W_{ij} x_j(t)    (7.8)

n_{ki}(t) \sim N(0, 1)    (7.9)
The nonlinearity [·]_+ picks the positive part of the input. The bottom-up input feeds the activity especially if it has lateral support, see the term (ζ1 vi(t) + γ)ui(t). A competition of activities within an area (B1) is produced by the term −η1 Σ_{j∈a(i)} xj(t). The activities are saturated (B2) by dividing by the gain control signal plus a constant, gi(t) + δ. The inhibition signal yi(t) has many terms similar to those of the activity itself. The gain control signal gi(t) is a running average of the activity of the single neuron and the average activity in the area. The lateral support signal vi(t) is only an auxiliary variable that collects the activities of other neurons mapped through the weight matrix W. We used the parameter values given in Table 7.1. The parameters were found by tuning them by hand in order to achieve the desired properties described in the previous section. The signals were initialized to zero, that is, xi(1) = yi(1) = gi(1) = 0.
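A compact sketch of one update of the full dynamics (7.5)–(7.9) is given below. It uses dense numpy arrays for clarity (the paper stresses that W should be sparse), and whether the within-area sums include neuron i itself is an interpretation of the notation j ∈ a(i); this is an illustration, not the authors' code.

```python
import numpy as np

# Table 7.1 parameters
P = dict(a1=0.4, a2=0.05, a3=0.01, b1=0.2, b2=0.2, gamma=0.05, delta=0.3,
         e1=0.01, e2=0.01, z1=0.4, z2=0.12, eta1=0.09, eta2=0.09, theta=0.02, kappa=0.7)

def step(x, y, g, u, W, area_of, rng, p=P):
    """One step of (7.5)-(7.9). x, y, g, u are length-n vectors, W a symmetric
    non-negative n x n matrix with zero diagonal, area_of[i] the area index a(i)."""
    v = W @ x                                                            # lateral support, (7.8)
    n_areas = area_of.max() + 1
    counts = np.bincount(area_of, minlength=n_areas)[area_of]            # |a(i)| for each neuron
    sum_x = np.bincount(area_of, weights=x, minlength=n_areas)[area_of]  # sum of x_j over j in a(i)
    sum_y = np.bincount(area_of, weights=y, minlength=n_areas)[area_of]
    g_new = (1 - p['a3']) * g + p['a3'] * ((1 - p['kappa']) * x + p['kappa'] * sum_x / counts)          # (7.7)
    x_new = np.maximum(0.0, (1 - p['a1']) * x - p['b1'] * y + (p['z1'] * v + p['gamma']) * u
                       - p['eta1'] * sum_x + p['e1'] * rng.normal(size=x.size)) / (g_new + p['delta'])  # (7.5)
    y_new = np.maximum(0.0, (1 - p['a2']) * y + p['b2'] * x + p['z2'] * v * u
                       - p['eta2'] * sum_y - p['theta'] * sum_x + p['e2'] * rng.normal(size=x.size))    # (7.6)
    return x_new, y_new, g_new
```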
7.4.3 Example with a Single Area

As a simple experiment, we used only a single area a = 1 and three neurons i = 1, 2, 3. The bottom-up input ui(t) = 1 is constant for each neuron the whole time. Instead of modelling the lateral support with (7.8), we used the signals vi(t) shown in the top-most subfigure of Fig. 7.3, where two neurons get oscillating lateral support but the third one does not. The resulting signals are shown in Fig. 7.3. The two neurons with oscillating lateral support start to oscillate strongly, whereas the third neuron oscillates only very mildly (see requirements B3–B6). Note that the active phases in xi(t) are better separated than in the lateral support vi(t) (B7). This is a good property that will help the oscillations of separate objects to settle into separate phases (A4).

¹ Negative coupling strengths were found harmful for synchronisation in [1] in a slightly different context.

Fig. 7.3 An experiment with a single area. Each subplot shows the signals as a function of time t, from top to bottom: lateral support vi(t), output activity xi(t), inhibition yi(t), and gain control gi(t). Each colour corresponds to one neuron i. Note that the bottom-up input ui(t) is a constant 1 for all neurons

Fig. 7.4 Left: Example images from the training data. Right: Some of the learned weights. Each subfigure shows the sum of all the weights corresponding to the feature in the middle patch
7.5 Experiments in Image Segmentation

We generated artificial data for the image segmentation problem: grey scale images of 30 × 30 pixels, each containing one object, as shown in Fig. 7.4. The images were divided into 5 × 5 areas of 10 × 10 pixels each, with a 5 pixel overlap in each direction. Because of the overlapping areas, the grey scale values were multiplied with a kernel √((1 + cos(φ))/2) in both the horizontal and vertical directions, where φ goes from −π to π within the 10 pixel segment. The patches from each area of the 10000 examples were pooled into one set, and 64 features were learned with k-means clustering. The features looked like 63 different curve segments and one empty feature.
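The windowing and feature-learning steps can be sketched as follows (the ring-image generator draw_ring is hypothetical, and k-means via scikit-learn is just one readily available implementation, not necessarily what the authors used):

```python
import numpy as np

def extract_patches(image):
    """Cut a 30x30 image into 5x5 = 25 overlapping 10x10 patches (5-pixel stride),
    each weighted by the separable raised-cosine kernel sqrt((1 + cos(phi)) / 2)."""
    phi = np.linspace(-np.pi, np.pi, 10)
    k1d = np.sqrt((1 + np.cos(phi)) / 2)
    kernel = np.outer(k1d, k1d)
    patches = [image[r:r + 10, c:c + 10] * kernel
               for r in range(0, 25, 5) for c in range(0, 25, 5)]
    return np.stack(patches).reshape(25, -1)

# images = [draw_ring() for _ in range(10000)]          # hypothetical ring-image generator
# data = np.concatenate([extract_patches(im) for im in images])
# from sklearn.cluster import KMeans                    # one possible k-means implementation
# features = KMeans(n_clusters=64).fit(data).cluster_centers_
```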
Fig. 7.5 Two objects are inserted into the same image as a test case for segmentation. The top row shows the generated and combined images, the middle row shows the activated features without the overlap, and the bottom row shows the activated features with overlap which is also the reconstruction of the images based on the feature activations
7.5.1 Simplified Segmentation

We transformed the data into binary activations of the features such that exactly one of the 64 features i got ui(t) = 1 for all t (the nearest prototype vector in k-means), and the others got ui(t) = 0 for all t. The empty feature was then dropped out. Weights were learned by computing the correlation coefficient ρij for each feature pair over the data set and setting Wij = ρij^0.7 when ρij ≥ 0.06 and Wij = 0 otherwise. Using sparse connectivity is important for computational efficiency. Some of the weights are visualized in Fig. 7.4. Note that all the weights within each area are zero, because there can be no correlations when only one unit in an area can be active at a time. For testing the segmentation, we introduced two objects into the same image, as shown in Fig. 7.5. We simplified the problem such that we found (at most) one active feature for each object in each area, and combined the activations only afterwards (see the middle row in Fig. 7.5). This way we ensured that similar features activated by just one contour do not start to compete with each other. The results are shown in Fig. 7.6. The activities caused by the two objects start to oscillate with opposite phases (A1–A4) and the segmentation happens in just a few wavelengths of the system. The activities xi(t) at times t = 101, . . . , 300 were fed into non-negative matrix factorisation [8] (NMF) in order to represent them compactly as a product of two time series and two sets of features. The two sets of features are shown in the result figure and they show the two objects well separated.
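The lateral-weight rule and the NMF read-out described above can be sketched like this (U is a samples × features matrix of activations; the NaN handling for never-active features is an added assumption, and scikit-learn's NMF is only one possible implementation of [8]):

```python
import numpy as np

def learn_lateral_weights(U, threshold=0.06, power=0.7):
    """W_ij = rho_ij^0.7 for correlations rho_ij >= 0.06, zero otherwise (sparse, symmetric, zero diagonal)."""
    rho = np.nan_to_num(np.corrcoef(U, rowvar=False))      # pairwise correlation coefficients rho_ij
    W = np.where(rho >= threshold, np.clip(rho, 0.0, None) ** power, 0.0)
    np.fill_diagonal(W, 0.0)                               # W_ii = 0, as required by the model
    return W

# Segmentation read-out: decompose the activities x_i(t), t = 101..300, into two parts.
# from sklearn.decomposition import NMF
# parts = NMF(n_components=2).fit(X)                       # X: (time, neurons), non-negative
```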
7.5.2 Real Segmentation

The previous experiment was unrealistic in the sense that we had at most one feature active for each object. Next, we made the combination of the objects on the pixel level (see the top right patch in Fig. 7.5 for an example). Using the same features learned by k-means, we still gave activity ui(t) = 1 to the best matching feature i, but we also gave activities uj(t) between 0 and 1 depending on how well they match the data.² The empty feature was dropped out again. Lateral weights (see the right side of Fig. 7.4) were relearned with these new activations. The most notable difference of the lateral weights compared to the simplified case is that now the connections within a single area have non-zero weights.

² u_i(t) = \exp[-8\|p - f_i\|^2/(0.3 + \|p\|^2)] / \max_j \exp[-8\|p - f_j\|^2/(0.3 + \|p\|^2)], where p is the patch of data and f_i is the feature i.

Figure 7.7 shows the results. The shown activities are not quite as clear and regular as in the simplified segmentation problem. This can be explained by the fact that when one object activates more than one feature in an area, the neurons' activations start to compete a little. On the other hand, the features that are correlated in the data support each other with lateral weights, which seems to overcome the disturbing competition. The segmentation is successful in most cases, but fails with difficult cases like the bottom example. Figure 7.8 reveals some more details about the oscillations. Synchronisation of activities within an object is almost perfect, but in the second case, the different objects are not in complete antisynchrony; in fact they have different frequencies.

Fig. 7.6 Simplified segmentation. Left: The activities xi(t) corresponding to the middle patch (top) and the patch below it (bottom) plotted as a function of t. Note that the largeness of the first peaks could easily be avoided with a reasonable non-zero initialisation of the gain control gi(1). Top right: The reconstruction of images based on the activities xi(t) at different times t. Bottom right: The segmentation result obtained from non-negative matrix factorisation of the signals xi(t)

Fig. 7.7 Real segmentation. Left: The activities xi(t) corresponding to the middle patch (top) and the patch below it (bottom) plotted as a function of t. The activities are given for the top-most data sample (same as in Fig. 7.6). Top right: The segmentation result obtained from NMF analysis of the signals xi(t). The first column from the left is the data, the second column is the reconstruction from the feature activities, and the third and fourth columns are the segmented objects

Fig. 7.8 Real segmentation. Horizontal rows show the signals xi(t) as a function of t for neurons whose (constant) input activity ui(t) is greater than 0.2. The neurons are sorted based on the NMF results such that the neurons corresponding more to the first segmented object (third column in the right subfigure of Fig. 7.7) are on top. The left subfigure is for the first sample (top row in the right subfigure of Fig. 7.7), and the right subfigure is for the second sample
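For concreteness, the graded activation of footnote 2 above can be written as the following sketch (the norms are assumed to be Euclidean over the flattened 10 × 10 patch; this is an illustration, not the authors' code):

```python
import numpy as np

def soft_activations(patch, features):
    """patch: flattened data patch p; features: (n_features, patch_dim) prototypes f_i.
    Returns u_i in [0, 1], equal to 1 for the best-matching feature."""
    e = np.exp(-8 * np.sum((features - patch) ** 2, axis=1) / (0.3 + np.sum(patch ** 2)))
    return e / e.max()
```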
7.6 Conclusion and Discussion We have shown preliminary but promising results of a system that combines oscillations for segmentation and biased competition for attention. The problem of finding the subset of features that best support each other to form objects is a very difficult one. Instead of making an exhaustive (exponentially difficult) search for such subsets, our approach solves it by using the emergent properties of oscillators. Experiments confirm that it takes only a few wavelengths before the oscillations synchronise to find the objects. The authors are not aware of any previous work on using oscillating neural networks for image segmentation that is able to use the continuity of edges. Most work seems to consider pixel intensities directly. Also, many works seem to have some global signals, whereas the current work is completely based on local computations. Our current implementation may be excessively complex; for instance, it is very likely that some of the terms in the system dynamics described in (7.5)–(7.9) could be left out by changing the other parameters. In the future, we hope to make it as simple as possible. We would also like to set the parameter values automatically in order to achieve as fast and accurate segmentation as possible. A non-zero initial value for the gain control signal gi(1) would diminish the large first peaks seen in Figs. 7.6–7.8. The computational complexity of the proposed system is very high even though we already use a sparse connection matrix W. There are many ways to make the system faster. Firstly, we could skip modelling the activations of neurons that do not get enough bottom-up input. This way, we would miss the phenomenon of false contours, though. Secondly, we could change the time-scale of the system such that one would need fewer time points to see the oscillations settle down. Thirdly, a parallel implementation of the system would be very efficient due to the design of local computations only. Fourthly, instead of modelling the oscillation explicitly, one could model variables such as the amplitude and the phase of a neuron's activity, just like in the Kuramoto model [11, 16]. This way we
would lose some of the expressiveness of the model. For instance, in Fig. 7.7, one of the neurons has two differently sized peaks within one period. Experimentation with real (or at least more realistic) data would reveal more properties of the proposed model. The fact that the image segmentation experiment concentrated on the continuity of edges was an outcome of the ring-like data used. The Gestalt principle was not imposed on the model but was learned from the regularities in the data. Simply by changing the data, for instance such that each object is filled with a particular colour, the same model would have worked quite differently: we would expect it to segment objects based on the colour. A natural extension of the current work is to further bias the competition with connections from other parts of a larger network. For instance, if the system is looking for food, the food-like objects should draw the attention more easily than others. We hypothesise that oscillations would synchronise also hierarchically and not just laterally as in the current model, but testing this is left as future work.
References

1. Breve, F.A., Zhao, L., Quiles, M.G., Macau, E.E.N.: Chaotic phase synchronization and desynchronization in an oscillator network for object selection. Neural Netw. 22(5–6), 728–737 (2009)
2. Choe, Y., Miikkulainen, R.: Self-organization and segmentation in a laterally connected orientation map of spiking neurons. Neurocomputing 21(1–3), 139–158 (1998)
3. Deco, G., Rolls, E.T.: A neurodynamical cortical model of visual attention and invariant object recognition. Vis. Res. 44, 621–642 (2004)
4. Desimone, R., Duncan, J.: Neural mechanisms of selective visual attention. Annu. Rev. Neurosci. 18, 193–222 (1995)
5. Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M., Reitboeck, H.J.: Coherent oscillations: A mechanism of feature linking in the visual cortex? Multiple electrode and correlation analyses in the cat. Biol. Cybern. 60, 121–130 (1989)
6. Gray, C.M., Singer, W.: Stimulus-specific neuronal oscillations in orientation columns of cat visual cortex. Proc. Natl. Acad. Sci. USA 86, 1698–1702 (1989)
7. Hoppensteadt, F.C., Izhikevich, E.M.: Weakly Connected Neural Networks. Springer, Berlin (1997)
8. Hoyer, P.O.: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)
9. von der Malsburg, C.: The correlation theory of brain function. Internal Report 81-2, MPI Biophysical Chemistry, Göttingen, Germany (1981)
10. Raiko, T., Valpola, H.: Oscillatory neural network for image segmentation with biased competition for attention. In: Proceedings of the Brain Inspired Cognitive Systems (BICS 2010) Symposium, Madrid, Spain, July (2010)
11. Strogatz, S.H.: From Kuramoto to Crawford: Exploring the onset of synchronization in populations of coupled oscillators. Physica D 143, 1–20 (2000)
12. Todorovic, D.: Gestalt principles. Scholarpedia 3(12), 5345 (2008)
13. Ursino, M., Magosso, E., Cuppini, C.: Recognition of abstract objects via neural oscillators: Interaction among topological organization, associative memory and gamma band synchronization. IEEE Trans. Neural Netw. 20(2), 316–335 (2009)
14. Usher, M., Niebur, E.: Modeling the temporal dynamics of IT neurons in visual search: A mechanism for top-down selective attention. J. Cogn. Neurosci. 8, 311–327 (1996)
15. Wang, D.L., Terman, D.: Locally excitatory globally inhibitory oscillator networks. IEEE Trans. Neural Netw. 6, 283–286 (1995)
16. Kuramoto, Y.: Chemical Oscillations, Waves and Turbulence. Springer, Berlin (1984)
17. Yli-Krekola, A.: A bio-inspired computational model of covert attention and learning. Master's thesis, Helsinki University of Technology, Finland (2007)
18. Yli-Krekola, A., Särelä, J., Valpola, H.: Selective attention improves learning. In: Proceedings of the International Conference on Artificial Neural Networks (ICANN), pp. 285–294 (2009)
Chapter 8
Internal Simulation of Perceptions and Actions Magnus Johnsson and David Gil
Abstract We present a study of neural network architectures able to internally simulate perceptions and actions. All these architectures employ the novel Associative Self-Organizing Map (A-SOM) as a perceptual neural network. The A-SOM develops a representation of its input space, but in addition also learns to associate its activity with an arbitrary number of additional (possibly delayed) inputs. One architecture is a bimodal perceptual architecture whereas the others include an action neural network adapted by the delta rule. All but one architecture are recurrently connected. We have tested the architectures with very encouraging simulation results. The bimodal perceptual architecture was able to simulate appropriate sequences of activity patterns in the absence of sensory input for several epochs in both modalities. The architecture without recurrent connections correctly classified 100% of the training samples and 80% of the test samples. After ceasing to receive any input the best of the architectures with recurrent connections was able to continue to produce 100% correct output sequences for 28 epochs (280 iterations), and then to continue with 90% correct output sequences until epoch 42.
8.1 Introduction

An idea that has been gaining popularity in cognitive science in recent years is that higher organisms are able to internally simulate interactions with their environment. According to the simulation hypothesis [10] this can be done by utilizing three proposed mechanisms. First (simulation of actions), it is assumed that brain activity can occur that resembles the activity that normally occurs when actions are performed, except that the motor output is suppressed during simulation. Second (perceptual simulation), the brain can elicit activity in sensory cortex that resembles the activity that would normally occur as a consequence of sensory input. There is a wealth of evidence that interaction between sensory modalities may be important for perceptual simulation. For instance, several neuroimaging experiments have demonstrated that activity in visual cortex when a subject imagines a visual stimulus resembles the activity elicited by a corresponding ancillary stimulus (for a review of this evidence see e.g. [16]; for a somewhat different interpretation, see [2]). A critical question here is how simulated perceptual activity might be elicited. One possibility is that signals arising in the frontal lobe in anticipation of consequences of incipient actions are sent back to sensory areas [10]. Another possibility is that perceptual activity in one sensory area can influence activity in another. A dramatic illustration of the interaction of different modalities in humans can be seen in the McGurk-MacDonald effect. If you hear a person making the sound /ba/ but the sound is superimposed on a video recording on which you do not see the lips closing, you may hear the sound /da/ instead [17].
Fig. 8.1 (a) Real interaction with environment. Stimulus S1 causes perceptual activity s1 , which causes preparatory response r1 and overt response R1 . R1 causes predictable new stimulus S2 , which causes new sensory activity etc. (b) Simulated interaction. Preparatory response r1 elicits, via internal association mechanisms, perceptual activity s2 before overt behaviour occurs and causes new stimulus
The third assumption (anticipation) is that both overt and merely simulated actions can elicit perceptual simulation of the probable consequences of the action (simulated perception may also be elicited by sensory activity in a different modality). In overt interaction with the environment, the consequences of an action generate sensory input that can function as stimuli for new actions. In simulated interaction (Fig. 8.1b), simulated actions elicit sensory consequences via associatively learned perceptual simulations rather than via the physical consequences and the sense organs (Fig. 8.1a). An internal simulation might start with a real perception and continue with a chain of simulated actions and perceptions. Obviously this ability has survival value, since it provides the animal with a way of evaluating a potential course of action before acting it out in the physical world with perhaps lethal consequences. Inspired by these findings we suggest that in an efficient multimodal perceptual system, the subsystems of different sensory modalities should co-develop and be associated with each other. This means that suitable activity in some modalities that for the moment receive input should, at least to some degree, elicit appropriate activity in other sensory modalities as well. This would provide an ability to activate the subsystem for a modality even when its sensory input is limited or nonexistent as long as there are activities in subsystems for other modalities, which the subsystem has learned to associate with certain patterns of activity usually coming together with the patterns of activity in the other subsystems. Another desirable trait of the system would be the ability to elicit activity patterns that are normally subsequent to the present activity pattern in a subsystem even when sensory input is absent. This would mean the ability to expect future sequences of perceptions that normally follows a certain perception within a modality, but also to elicit chains of proper activity in motor systems in the absence of sensory input. It would also be desirable if the activity in a sensory modality of the system could elicit proper activity in other modalities too. For example, a gun seen to be fired from a long distance, would yield an expectation of a bang to follow soon. The ability to elicit continued and reasonable activity in different subsystems in the absence of input provides the system with an ability to internally simulate sequences of perceptions and actions as proposed in the neuroscientific simulation hypothesis. Previously we have focused on the perceptual side of the problem. Thus we have done experiments with a novel self-organizing neural network called the Associative Self-Organizing Map (A-SOM), which is an extension of the Self-Organizing Map (SOM) [15]. The SOM is a neural network that self-organizes to represent a topology preserving projection of its input space. This is similar to what have been observed in cortical areas (e.g. somatotopic maps of the body surface in the somatosensory cortex or retinotopic maps in the visual cortex). The A-SOM also organizes into a topology preserving projection of its input space, but in addition it also learns to associate its activity with an arbitrary number of additional inputs. This means that it is possible to have several connected A-SOMs that co-develop. Thus simultaneous activity in different A-SOMs due to simultaneous input will be associated by associative connections between the A-SOMs. 
When the connected A-SOMs are fully trained, it is possible to elicit proper activity in an A-SOM that for the moment lacks input due to the activity in other connected A-SOMs. Proper activity is to be understood as the activity that would be there due to the input that usually comes together with the input to the other connected A-SOMs.
The A-SOM is not restricted to be connected to other A-SOMs. For example, it could be connected to other kinds of neural networks, or the associative connections could be used to associate the A-SOM's activity at different times. In the previous experiments we did simulations with an A-SOM which received the activity of two external SOMs as additional inputs [13]. The A-SOM differs from earlier attempts to build associative maps such as the Adaptive Resonance Associative Map [20] and Fusion ART [18] in that all layers (or individual networks) share the same structure and use topologically arranged representations. Unlike ARTMAP [4], the A-SOM also allows associations to be formed in both directions. We have also tested the A-SOM together with real sensors in the context of haptic perception where we implemented a bio-inspired self-organizing texture and hardness perception system which automatically learned to associate the self-organized representations of these two submodalities (A-SOMs) with each other [11, 12]. In this text we present experiments with a number of architectures. One of these is able to internally simulate perceptual sequences as well as elicit cross-modal activity. This architecture consists of two connected A-SOMs. One of these A-SOMs also learned to associate its current activity with its activity of the previous iteration. This created a novel kind of recurrent Self-Organizing Map which was able to learn perceptual sequences. This activity sequence could be invoked by input to either of the two A-SOMs or both. The most similar existing unsupervised recurrent neural network is the Recursive SOM that feeds back its activity together with the input for the next iteration [22]. The Recursive SOM is similar but not equivalent to the A-SOM and lacks the ability to associate with the activity of other neural networks. Other examples of similar unsupervised recurrent neural networks are the Temporal Kohonen Map [5], the Recurrent Self-Organizing Map [21] and the Merge SOM [19]. A review of recursive self-organizing neural network models is presented in [9]. We also present four supervised A-SOM based architectures with an action layer added to the A-SOM, thus obtaining an architecture able to simulate chains of both perceptions and actions. This implies architectures which can elicit reasonable sequences of activity both in their perceptual network and in their action network. One of these supervised architectures only has feed-forward connections and is suitable for classification tasks. The other three architectures in addition use recurrent connections that make them able to continue with internal simulations of both perceptions and actions in the absence of input. These architectures are related to the well-known recurrent supervised neural network architectures of Jordan and Elman [6] but add the properties of the A-SOM. The implementation of all code for the experiments presented in this paper was done in C++ using the neural modelling framework Ikaros [1].
8.2 Architectures All architectures discussed in this paper are based on the A-SOM. The first architecture consists of two connected A-SOMs, one with recurrent feedback connections. Thus, this architecture could be considered a bimodal perceptual architecture. The four other architectures all have a common basic structure. Thus they consist of two separate but connected neural networks, one perceptual neural network and one action neural network. The perceptual neural network consists of an A-SOM which is fully connected with forward connections to the action neural network (with a time delay of one iteration during a simulation). The action neural network consists of a grid of neurons that are adapted by the delta rule to get an activity that converges to the provided desired output. The four different architectures that include an action neural network differ in whether they include recurrent feedback connections and if so how these are connected.
8.2.1 The Perceptual Neural Network

The perceptual neural network consists of an Associative Self-Organizing Map (A-SOM) [13], which can be considered a SOM that learns to associate its activity with (possibly delayed) additional inputs. The A-SOM consists of an I × J grid of neurons with a fixed number of neurons and a fixed topology. Each neuron n_{ij} is associated with r+1 weight vectors w^a_{ij} \in R^n and w^1_{ij} \in R^{m_1}, w^2_{ij} \in R^{m_2}, \ldots, w^r_{ij} \in R^{m_r}, weighting the main input and the r ancillary inputs. All the elements of all the weight vectors are initialized by real numbers randomly selected from a uniform distribution between 0 and 1, after which all the weight vectors are normalized, i.e. turned into unit vectors.

At time t each neuron n_{ij} receives r+1 input vectors x^a(t) \in R^n and x^1(t-d_1) \in R^{m_1}, x^2(t-d_2) \in R^{m_2}, \ldots, x^r(t-d_r) \in R^{m_r}, where d_p is the time delay for input vector x^p, p = 1, 2, \ldots, r. The main net input s_{ij} is calculated using the standard cosine metric

s_{ij}(t) = \frac{x^a(t) \cdot w^a_{ij}(t)}{\|x^a(t)\|\,\|w^a_{ij}(t)\|}.    (8.1)

The activity in the neuron n_{ij} is given by

y_{ij}(t) = \bigl[y^a_{ij}(t) + y^1_{ij}(t) + y^2_{ij}(t) + \cdots + y^r_{ij}(t)\bigr]/(r+1)    (8.2)

where the main activity y^a_{ij} is calculated by using the softmax function [3]

y^a_{ij}(t) = \frac{(s_{ij}(t))^m}{\max_{ij}(s_{ij}(t))^m}    (8.3)

where m is the softmax exponent.

The ancillary activity y^p_{ij}(t), p = 1, 2, \ldots, r, is calculated by again using the standard cosine metric

y^p_{ij}(t) = \frac{x^p(t-d_p) \cdot w^p_{ij}(t)}{\|x^p(t-d_p)\|\,\|w^p_{ij}(t)\|}.    (8.4)

The neuron c with the strongest main activation is selected:

c = \arg\max_{ij} y^a_{ij}(t).    (8.5)

The weights w^a_{ijk} are adapted by

w^a_{ijk}(t+1) = w^a_{ijk}(t) + \alpha(t)\,G_{ijc}(t)\bigl[x^a_k(t) - w^a_{ijk}(t)\bigr]    (8.6)

where 0 ≤ α(t) ≤ 1 is the adaptation strength with α(t) → 0 when t → ∞. The neighbourhood function G_{ijc}(t) = e^{-\|r_c - r_{ij}\|^2 / 2\sigma^2(t)}, where r_c \in R^2 and r_{ij} \in R^2 are the location vectors of neurons c and n_{ij}, is a Gaussian function decreasing with time.

The weights w^p_{ijl}, p = 1, 2, \ldots, r, are adapted by

w^p_{ijl}(t+1) = w^p_{ijl}(t) + \beta x^p_l(t-d_p)\bigl[y^a_{ij}(t) - y^p_{ij}(t)\bigr]    (8.7)

where β is the adaptation strength.

All weights w^a_{ijk}(t) and w^p_{ijl}(t) are normalized after each adaptation.
8.2.2 The Action Neural Network

The action neural network consists of an I × J grid of neurons with a fixed number of neurons and a fixed topology. Each neuron n_{ij} is associated with a weight vector w_{ij} \in R^n. All the elements of the weight vector
are initialized by real numbers randomly selected from a uniform distribution between 0 and 1, after which the weight vector is normalized, i.e. turned into a unit vector. At time t each neuron n_{ij} receives an input vector x(t) \in R^n. The activity y_{ij} in the neuron n_{ij} is calculated using the standard cosine metric

y_{ij}(t) = \frac{x(t) \cdot w_{ij}(t)}{\|x(t)\|\,\|w_{ij}(t)\|}.    (8.8)

During the learning phase the weights w_{ijl} are adapted by

w_{ijl}(t+1) = w_{ijl}(t) + \beta x_l(t)\bigl[y_{ij}(t) - d_{ij}(t)\bigr]    (8.9)

where β is the adaptation strength and d_{ij}(t) is the desired activity for the neuron n_{ij}.
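The action layer can be sketched in the same style (the sign of the update follows (8.9) exactly as printed; this is an illustration, not the authors' implementation):

```python
import numpy as np

def action_step(x, W, d, beta):
    """x: input vector (the perceptual network's activity); W: (N, n) weights;
    d: desired activity per output neuron; returns the output activity y."""
    y = (W @ x) / (np.linalg.norm(W, axis=1) * np.linalg.norm(x))   # (8.8): cosine activation
    W += beta * np.outer(y - d, x)                                  # (8.9): delta-rule-style update
    return y
```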
8.2.3 Tested Variants We have tested five different architectures that employ the A-SOM as a perceptual neural network. Four of the architectures also included the action neural network described above. In the first architecture we have set up a bimodal perceptual architecture consisting of two A-SOMs (Fig. 8.2a) and tested its ability to continue with reasonable sequences of activity patterns in the two A-SOMs in the absence of any input. This could be seen as an ability to internally simulate expected sequences of perceptions within a modality likely to follow the last sensory experience while simultaneously elicit reasonable perceptual expectations in the other modality. One of the A-SOMs, the A-SOM A, is a recurrent A-SOM, i.e. a set of recurrent connections feed the activity of A-SOM A back to itself as ancillary input with a time delay of one iteration. The other A-SOM, the A-SOM B, is an ordinary A-SOM without recurrent connections. A-SOM A is connected to A-SOM B (see Fig. 8.2a), i.e. A-SOM B receives the activity of A-SOM A as ancillary input without any time delay. Thus the activity in A-SOM A will elicit associated activity in A-SOM B. In the second architecture (Fig. 8.2b) we fully connected the perceptual neural network to the action neural network with feed-forward connections only, i.e. the action neural network received the activity of the perceptual neural network as input, thus creating an architecture suitable for classification. The third architecture (Fig. 8.2c) is similar to the second one but with recurrent connections added. These recurrent connections feed the activity of the perceptual neural network back to the perceptual neural network itself as ancillary input with a time delay of one iteration. This yields an architecture able to produce proper sequences of action activity even when the perceptual neural network stops receiving input. The fourth architecture (Fig. 8.2d) is similar to the third but with the recurrent connections in the third architecture replaced by recurrent connections that feed the activity of the action neural network back to the perceptual neural network as ancillary input with a time delay of one iteration. As the third architecture this also yields an architecture able to produce proper sequences of action activity even when the perceptual neural network stops receiving input. In the fifth architecture (Fig. 8.2e) the approaches of the third and the fourth architectures are combined, i.e. there are two sets of recurrent connections. Thus there are both a set of recurrent connections that feed the activity of the perceptual neural network back to itself as ancillary input with a time delay of one iteration, and a set of recurrent connections that feed the activity of the action neural network back to the perceptual neural network as ancillary input with a time delay of one iteration. Also this architecture is suitable for the production of proper sequences of action activity even when the perceptual neural network stops receiving input.
Fig. 8.2 The five tested neural network architectures. (a) The A-SOM A is connected with recurrent connections to itself as ancillary input with a time delay of one, and with another set of connections as ancillary input to A-SOM B with a time delay of zero; (b) The perceptual neural network is connected to the action neural network with feed-forward connections only; (c) The perceptual neural network is connected to the action neural network with feed-forward connections and to itself with recurrent connections; (d) The perceptual neural network is connected to the action neural network with feed-forward connections, and the action neural network is connected with recurrent connections to the perceptual neural network; (e) The perceptual neural network is connected to the action neural network with feed– forward connections and to itself with recurrent connections. In addition the action neural network is connected with recurrent connections to the perceptual neural network
8.3 Simulations

All five tested architectures used A-SOMs with 15 × 15 neurons, and the four architectures that included action neural networks used action neural networks consisting of 1 × 10 neurons. To test the architectures we constructed a set of 10 training samples by random selection, with uniform distribution, from a subset s of the plane s = {(x, y) ∈ R²; 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. The selected points were then mapped to a subset of R³ by adding a third constant element of 0.5, yielding a training set of three-dimensional vectors. The reason for this was that a Voronoi tessellation of the plane was calculated from the generated points to later aid in the determination of how new points in the plane should be classified (the second architecture described above). To make this Voronoi tessellation, which is based on the Euclidean metric, useful for this purpose with the A-SOM as a perceptual neural network, which uses a metric based on dot product, the set of points in the plane has to be mapped so that the corresponding position vectors after normalization are unique. One way to accomplish such a mapping is by adding a constant element to each vector. The result of this is that each vector will have a unique angle in R³. We chose the value 0.5 for the constant elements to maximize the variance of the angles in R³.

All architectures were trained during 20000 iterations, i.e. during 2000 epochs when receiving the sequence of 10 training samples. The softmax exponent for the A-SOMs was set to 1000. The learning rate α(0) of the A-SOMs was initialized to 0.1 with a learning rate decay of 0.9999 (i.e. multiplication of the learning rate with 0.9999 in each iteration), which means the minimum learning rate, set to 0.01, will be reached at the end of the 20000 training iterations. The neighbourhood radius, i.e. σ of the neighbourhood function G_{ijc}(t) in (8.6), was initialized to 15 for both A-SOMs and shrunk to 1 during the 20000 training iterations by using a neighbourhood decay of 0.9998 (i.e. multiplication of the neighbourhood radius with 0.9998 in each iteration). The A-SOMs used plane topology when calculating the neighbourhood. The learning rate β for the associative weights in all A-SOMs as well as for the neurons in the output layer for all architectures was set to 0.35.

In the first architecture the A-SOM A receives its main input from the constructed input set described above. In addition, its total activity is fed back as ancillary input with a time delay of one iteration. Besides the main input from the constructed input set, the A-SOM B receives the total activity of the A-SOM A as ancillary input without any time delay. Both A-SOMs were simultaneously fed with 10 samples of the training set over and over again, all the time in the same sequence, during a training phase (consisting of 2000 epochs). The two A-SOMs could as well have been fed by samples from two different sets, always receiving the same combinations of samples from the two sets (otherwise the system could not learn to associate them). This could be seen as a way of simulating simultaneous input from two different sensory modalities when an animal or a robot explores its environment. After training of the first architecture, weight adaptation was turned off and the system was tested by feeding both A-SOM A and A-SOM B with the 10 samples from the training set once again in the same sequence as during the training phase, i.e. the system received input for one epoch.
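The training-set construction and the decay schedules described above amount to the following sketch (the architecture-specific update itself is omitted; the clamping of the decays to their stated floors is an interpretation of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
samples_2d = rng.uniform(0.0, 1.0, size=(10, 2))                 # 10 points in the unit square
training_set = np.hstack([samples_2d, np.full((10, 1), 0.5)])    # constant third element 0.5

alpha, sigma = 0.1, 15.0                                         # initial learning rate and radius
for it in range(20000):                                          # 2000 epochs of the 10-sample sequence
    x = training_set[it % 10]
    # ... present x to the architecture being trained ...
    alpha = max(0.01, alpha * 0.9999)                            # learning-rate decay, floor 0.01
    sigma = max(1.0, sigma * 0.9998)                             # neighbourhood decay, floor 1
```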
The centres of activity for each sample in both A-SOMs were recorded, and the corresponding Voronoi tesselations for the A-SOMs were calculated. The centres of activity, of course, always correspond to the localizations of the neurons in the A-SOMs. However, if we consider the centres of activity to be points in the plane, then we can calculate a Voronoi tesselation of the plane according to these points. In this way we will also get a division of the grid of neurons of each A-SOM. This is because each neuron in an A-SOM will be localized in a Voronoi cell or on the border between several Voronoi cells (when we see the localizations of the neurons as points in the plane). Voronoi tesselations for the activity centres of the A-SOMs are used to assess the performance of the system. This is done in the following way: During the first epoch after training when the A-SOMs received main input, we recorded the sequences of Voronoi cells containing the centres of activity for the sequences of activity patterns in both A-SOMs. After the first epoch the A-SOMs did not receive main input anymore, i.e.
only null vectors were received as main inputs. Anyway, sequences of activity patterns continued to be elicited in both A-SOMs. This means the system continued to run with internal states only. This is possible since A-SOM A received its own total activity as ancillary input with a time delay of one iteration and the A-SOM B received the total activity of A-SOM A as ancillary input without any time delay. For each of the following 25 epochs (without any main input to the A-SOMs) we recorded whether the centres of activity for each iteration in the epoch was in the correct Voronoi cell. If the centre of activity is in the correct Voronoi cell, then it is considered correct because then it is sufficiently similar to the centre of activity (from the first test epoch) that corresponds to that Voronoi cell. This is because then it is closer to the centre of activity (from the first test epoch) that corresponds to that Voronoi cell than to any other centre of activity from the first test epoch. This procedure enabled us to calculate the percentage of correct activity patterns for each of the 25 epochs without main input to the A-SOMs during the test phase. During these 25 epochs the activity is elicited solely by recurrent connections. Since the other four architectures that include an action neural network are supervised each training sample was associated with a desired activity provided to the action neural network during the training phase. The set of desired activities D consisted of 10-dimensional vectors, where one element in each was set to 1 and the other elements were set to 0, i.e. D = {(1, 0, . . . , 0), (0, 1, . . . , 0), . . . , (0, 0, . . . , 1)}. This means that after training each sample of the training set should elicit the highest activation in a unique neuron if the trained architecture is able to distinguish between the samples in the training set. Moreover, if the trained architecture receives a new sample (this was tested in the case of the second architecture described above), not included in the training set, this should elicit activity in the same output neuron as the closest sample in the training set. In other words: a new sample located in a certain Voronoi cell of the input space should elicit the highest activity in the same neuron as the training sample corresponding to that particular Voronoi cell. If this is true for a sufficient ratio of the new samples, then the generalization ability of the architecture should be considered good. After the training phase the second architecture described above was tested with the training samples and with an additional set of 10 new samples generated in the same way as the training set. The other three architectures with an action neural network were tested to evaluate their ability to produce proper sequences of activities in their action neural networks even when their perceptual neural network stopped receiving input. Thus we only used the training sets in this evaluation, and the sequences of activities were considered proper if the same sequence of neurons in the action neural network had the highest activity as when the architectures received input. Thus these architectures were tested, after the training phase, by first feeding them the sequence of the 10 training samples once. Then the architectures did not receive any more input and we recorded the activity for the following 950 iterations (see Fig. 
8.3c, d and e) to see if the architectures were able to reproduce the same sequence of 10 activity patterns in their action neural networks over and over again. Thus we recorded the percentage of the iterations in each epoch (i.e. an epoch being 10 iterations since the training sequence was the 10 samples in the training set) with proper activity after the architecture ceased to receive any input.
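The Voronoi-cell scoring used in the evaluation reduces to a nearest-centre check, sketched here (a sketch, not the authors' code):

```python
import numpy as np

def fraction_correct(centres, reference_centres):
    """centres: (10, 2) activity centres of one no-input epoch, in presentation order;
    reference_centres: (10, 2) centres recorded during the first test epoch.
    A centre counts as correct if its nearest reference centre is the one for the same
    sample, i.e. if it falls inside that reference centre's Voronoi cell."""
    d = np.linalg.norm(centres[:, None, :] - reference_centres[None, :, :], axis=2)
    return np.mean(d.argmin(axis=1) == np.arange(len(centres)))
```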
8.3.1 Bimodal Perceptual Architecture In Fig. 8.4 (middle and right) we can see that both A-SOMs perfectly discriminate between the 10 samples in the sample set, and by comparing the Voronoi tessellations of the A-SOMs (Fig. 8.4, middle and right) with the Voronoi tessellation of the plane for the training set (Fig. 8.4, left) we can see that the ordering of the Voronoi cells for the training set is to a large extent preserved for the Voronoi cells for the centres of activation in the A-SOMs. Figure 8.3a shows the percentages of correct activity patterns in each epoch (i.e. sequence of 10 iterations) for each of the first 25 epochs when the system did not receive any more main input. The
Fig. 8.3 The simulation results. (a) The results with the Bimodal Perceptual Architecture. The percentages of correct activity patterns in each epoch (i.e. sequence of 10 iterations) for each of the first 25 epochs when the system did not receive anymore main input. The diagram to the left depicts this for A-SOM A, whereas the diagram to the right depicts it for A-SOM B; (b) The results with the Feed-Forward Architecture for Classification. 100% of the training samples and 80% of the test samples were recognised; (c) The results with the Architecture with Recurrent Perceptual Neural Network Connections. After ceasing to receive any input this architecture was able to continue to produce 100% correct sequences of activity patterns in the action neural network for 28 epochs, and then continue with 90% correct sequences until epoch 42; (d) The results with the Architecture with Recurrent Connections from the Action Neural Network to the Perceptual Neural Network. There was 100% correct reproduction of the sequence of activity patterns in the action neural network in the first 3 epochs after ceasing to receive any input; (e) The results with the Architecture with Recurrent Perceptual Neural Network Connections and Recurrent Connections from the Action Neural Network to the Perceptual Neural Network. There was 100% correct reproduction of the sequence of activity patterns in the action neural network in the first 8 epochs after ceasing to receive any input
diagram to the left in Fig. 8.3a depicts the result for A-SOM A, whereas the diagram to the right in Fig. 8.3a depicts the result for A-SOM B. As can be seen, the percentage of correct activity patterns is 100% for the first 9 epochs without main input in both A-SOM A and A-SOM B. The percentage of correct activity patterns then declines gradually in both A-SOMs, and at the 25th epoch it is 60% for A-SOM A and 20% for A-SOM B.
Fig. 8.4 Left: The Voronoi tessellation of the points used when constructing the training set used as input to the two A-SOMs. Middle and Right: The Voronoi tessellations corresponding to the centres of activity during the first epoch of the test phase for the two A-SOMs. The image in the middle depicts the Voronoi tesselation of the fully trained A-SOM A together with the 10 centres of activity corresponding to the 10 first iterations of the test phase when the system received input from the sample set. The right image depicts the same but for the fully trained A-SOM B
8.3.2 Feed-Forward Architecture for Classification Figure 8.2b shows a schematic depiction of the architecture. The simulation results with the second architecture are depicted in Fig. 8.3b. As can be seen in this figure, all samples in the training set elicited the highest activity in the proper neuron, i.e. 100% correct. In Fig. 8.3b we can also see that 8 of the 10 new samples elicited highest activity in the proper neuron of the action neural network. Sample 3 in the new sample set should have been classified as belonging to the Voronoi cell for sample 3 of the training set, but was misclassified as belonging to the Voronoi cell for sample 1 in the training set. Sample 7 in the new sample set should have been classified as belonging to the Voronoi cell for sample 7 of the training set, but was misclassified as belonging to the Voronoi cell for sample 4 in the training set.
It is worth noting that sample 10 in the new sample set lies at the border between the Voronoi cells for training samples 4 and 7. This sample was classified as belonging to the Voronoi cell for training sample 7, but it would also be considered correctly classified if it had been classified as belonging to the Voronoi cell for training sample 4. An interesting observation was that when receiving this new sample, the neuron in the output layer that represents the Voronoi cell for training sample 7 was the most activated neuron in the action neural network, and the neuron that represents the Voronoi cell for training sample 4 was the second most activated neuron in the action neural network.
8.3.3 Architecture with Feedback of Perceptual Activity The simulation results with the third architecture are depicted in Fig. 8.3c. In this figure we can see that this architecture was able to reproduce the sequence of the training samples with 100% correctness in the first 28 epochs (i.e. for 280 iterations), with 90% correctness until epoch 42, and then there was a gradual decline until it reached a level of 20% correct activities at epoch 76. This still was the performance level at epoch 95.
8.3.4 Architecture with Feedback of Action Activity The simulation results with the fourth architecture are depicted in Fig. 8.3d. In this figure we can see that this architecture was able to reproduce the sequence of the training samples with 100% correctness in the first 3 epochs (i.e. for 30 iterations) without input; then there was a rapid decline until a level of 0% correct activity patterns in the action neural network was reached at epoch 14. This level of 0% correct activity patterns in the action neural network continued until epoch 71, when it started to improve again. It reached a new peak of 50% correct activity patterns in the action neural network between epochs 79 and 86. After that there was a decline again, and the level of 0% correct activity patterns in the action neural network was reached again at epoch 95.
8.3.5 Architecture with Feedback of Action and Perceptual Activity The simulation results with the fifth architecture are depicted in Fig. 8.3e. In this figure we can see that this architecture was able to reproduce the sequence of the training samples with 100% correctness in the first 8 epochs (i.e. for 80 iterations) without input; then there was a rapid decline until a level of 0% correct activity patterns in the action neural network was reached at epoch 13. This level of 0% correct activity patterns in the action neural network continued until epoch 63, when it started to improve again. The architecture reached a new peak of 90% correct activity patterns in the action neural network between epochs 67 and 69. After that there was a decline again, and the level of 0% correct activity patterns in the action neural network was reached again at epoch 81. This level of 0% correct activity patterns in the action neural network was still the performance level at epoch 95.
8.4 Discussion We have implemented and tested five A-SOM based architectures. The first architecture was a bimodal model consisting of two A-SOMs and we tested its ability to continue with reasonable sequences of
activity patterns in the two A-SOMs in the absence of any input. This architecture could be viewed as a model of a neural system with one monomodal representation (A-SOM A) and one bimodal representation (A-SOM B) constituting a neural area that merges two sensory modalities into one representation. In our experiments with the first architecture we focused on the ability of the A-SOM to internally simulate expected sequences of perceptions likely to follow the last input while simultaneously eliciting reasonable perceptual expectations in the other A-SOM. This was accomplished by setting up a system of two connected A-SOMs. One of these A-SOMs used its own total activity as time-delayed ancillary input whereas the other used the total activity of the first one as ancillary input without any time delay. Our experiments showed that our model is able to continue to produce proper sequences of activation in both A-SOMs for several epochs even when these have stopped receiving any main input. The results are very encouraging and confirm the ability of the A-SOM to serve in a system capable of internal simulation as well as of cross-modal activation.

The second architecture was a Feed-Forward Architecture for Classification and it was able to correctly classify 100% of the training samples as well as 80% of a new set of test samples. The three other supervised architectures used recurrent connections to enable internal simulation of perceptions and actions. The third architecture used recurrent connections to feed back the activity of the A-SOM to itself as ancillary activity with a time delay of one iteration. After ceasing to receive any input this architecture was able to continue to produce 100% correct sequences of output for 28 epochs, and then continue with 90% correct sequences until epoch 42. The fourth architecture used recurrent connections to feed back the activity of the output layer to the A-SOM as ancillary input with a time delay of one iteration. This architecture was able to continue to produce 100% correct sequences of output for 3 epochs after ceasing to receive any input. The fifth architecture used two sets of recurrent connections. One set of recurrent connections was used to feed back the activity of the A-SOM to itself as ancillary activity with a time delay of one iteration. The other set of recurrent connections was used to feed back the activity of the output layer to the A-SOM as ancillary input, also with a time delay of one iteration. This architecture was able to continue to produce 100% correct sequences of output for 8 epochs after ceasing to receive any input.
This would probably yield an enhanced ability to simulate internally and to remember perceptual sequences. It should be noted that the first architecture is consistent with different views of how the sensory system is organized. The traditional view of sensory information processing has been that of a hierarchically organized system. Unimodal neurons in primary sensory cortex send signals to higher association areas where information from different modalities is eventually merged. The model presented in this paper is consistent with such a view. A-SOM B could be seen as being a step higher in the sensory hierarchy than A-SOM A and could project to other A-SOMs further up the hierarchy. However, recent neuroscientific evidence suggests that different primary sensory cortical areas can influence each other more directly. For instance, a recent fMRI study [14] showed that visual stimuli can influence activity in primary auditory cortex. The A-SOM can serve as a model of such an organization as well. As an illustration, A-SOM B could be located in an analog of a primary sensory cortical area, say an auditory area, and be influenced by signals from A-SOM A, which could be located in a different, say visual, area.
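As a minimal illustration of the wiring of the first architecture, the toy sketch below (in Python) connects two highly reduced map-like units: unit A receives its own total activity from the previous iteration as ancillary input, unit B receives A's current total activity, and each unit's total activity is taken as the average of its main and ancillary activities. The ToyASOM class, its Gaussian main activation, the delta-rule association and all parameter values are illustrative assumptions made only for this sketch, not the A-SOM algorithm itself (the actual experiments were run with full A-SOM learning in the Ikaros framework [1]).

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyASOM:
    """Highly reduced stand-in for an A-SOM: a set of units with main weight
    vectors plus one associative weight matrix trained with a delta rule to
    predict the unit activities from the ancillary input."""
    def __init__(self, n_units, main_dim, anc_dim, lr=0.2):
        self.w_main = rng.random((n_units, main_dim))
        self.w_anc = np.zeros((n_units, anc_dim))
        self.lr = lr
        self.total = np.zeros(n_units)

    def step(self, x, anc, learn=True):
        anc_act = self.w_anc @ anc
        if x is None:               # no main input: run on ancillary activity alone
            self.total = anc_act
            return self.total
        main_act = np.exp(-np.linalg.norm(self.w_main - x, axis=1) ** 2)
        if learn:                   # delta rule: ancillary activity predicts main activity
            self.w_anc += self.lr * np.outer(main_act - anc_act, anc)
        self.total = 0.5 * (main_act + anc_act)   # average of main and ancillary
        return self.total

seq_a = np.eye(4)                        # a cyclic sequence of dummy 'percepts' for A
seq_b = np.roll(np.eye(4), 1, axis=0)    # the paired inputs shown to B

A = ToyASOM(n_units=16, main_dim=4, anc_dim=16)  # ancillary = A's own delayed total
B = ToyASOM(n_units=16, main_dim=4, anc_dim=16)  # ancillary = A's current total

for epoch in range(200):                 # training phase, main input present
    for xa, xb in zip(seq_a, seq_b):
        delayed_A = A.total.copy()       # one-iteration time delay
        A.step(xa, delayed_A)
        B.step(xb, A.total)

for t in range(8):                       # rollout: main input removed
    delayed_A = A.total.copy()
    A.step(None, delayed_A)              # A driven by its own delayed activity
    B.step(None, A.total)                # B driven cross-modally by A
    print(t, int(A.total.argmax()), int(B.total.argmax()))
```

The rollout loop only exposes the mechanism: once main input stops, both units keep producing activity patterns driven entirely by the fed-back and cross-modal ancillary signals.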
When comparing the three supervised architectures with recurrent connections, the architecture with recurrent connections that feed back the activity of the A-SOM to itself is clearly the best. This is because it is able to continue with proper output sequences much longer than the other supervised recurrent architectures in the absence of input. That the correctness of the output sequences declines with time in the three supervised architectures with recurrent connections is reasonable; it is probably because the present activity elicits similar (but not exactly the same) activity in the succeeding iteration, which over time should lead to an increasing deviation from the correct activity. A reasonable guess as to why the supervised architecture with recurrent connections that feed back the activity of the A-SOM to itself is better than the architecture with recurrent connections from the output layer to the A-SOM is as follows: It should be possible to keep more information when associating the activity in the A-SOM with the activity of the A-SOM in the previous iteration than when associating the activity of the A-SOM with the activity of the output layer in the previous iteration. The reason is dimensionality, i.e. the number of connections is much larger in the former case than in the latter, because the A-SOM in our simulations contains 225 neurons whereas the output layer contains only 10. The reason that there is a second peak in the two supervised architectures with recurrent connections from the output layer to the A-SOM is probably similar. There is a higher probability that the activity pattern in the 10 output neurons starts to become proper again after some time than that the activity pattern in the 225 neurons in the A-SOM happens to become proper. However, we have no idea why the second peak comes earlier, reaches a higher level and is narrower in the fifth architecture than the second peak in the fourth architecture. A somewhat surprising result was that the fifth architecture, i.e. the one with recurrent connections from the output layer to the A-SOM as well as recurrent connections from the A-SOM to the A-SOM, is not better than the third architecture, i.e. the one with recurrent connections from the A-SOM to itself. We had expected it to be, since it should be able to retain more information. Probably this is due to the way the total activity is calculated in the A-SOM, i.e. by averaging the ancillary activities and the main activity. This means that the ancillary input from the output layer will have as much influence on the total activity of the A-SOM as the ancillary activity carrying the time-delayed A-SOM activity. Thus a performance somewhere in between the performance of the supervised architecture with recurrent connections from the A-SOM and the performance of the supervised architecture with recurrent connections from the output layer to the A-SOM would be expected. This is also what we got in the simulations. In the future we will try to extend the ideas presented in this text with a reinforcement learning mechanism that makes the architectures reward-driven. We also aim at testing our architectures together with real robots. A robot controller that utilised a simulation mechanism was designed by Ziemke et al. [23]. A genetic algorithm was used to set the weights of this ANN-based controller, which predicted the next sensory input based on its current sensory input together with its current motor output.
The controller could then run on predicted sensory input instead of real sensory input. It was shown that a simulated Khepera robot that had learnt to move around in a simple environment avoiding some obstacles could do this successfully when external sensory input was replaced by predicted (“simulated”) input. Another idea for further development of the presented architectures is to use several sets of recurrent connections with different time delays to improve the ability to continue with proper output sequences in the absence of input. One drawback of this approach is of course the increased computational burden. Thus it will be a matter of weighing the improved ability to continue with proper output sequences against the additional computational burden. Still another idea is to develop a variant of the A-SOM based on the Growing Cell Structure [7] or the Growing Grid [8]. In this way it might be possible to create an architecture that automatically creates a suitable number of neurons with a suitable topology. This would yield a suitable size of the hidden layer to represent the clusters in the particular input space.
Acknowledgements We want to express our acknowledgement to the Ministry of Science and Innovation (Ministerio de Ciencia e Innovación—MICINN) through the “Jose Castillejo” program from Government of Spain and to the Swedish Research Council through the Swedish Linnaeus project Cognition, Communication and Learning (CCL) as funders of the work exhibited in this chapter.
References 1. Balkenius, C., Morén, J., Johansson, B., Johnsson, M.: Ikaros: Building cognitive models for robots. Adv. Eng. Inform. (2009). doi:10.1016/j.aei.2009.08.003 2. Bartolomeo, P.: The relationship between visual perception and visual mental imagery: a reappraisal of the neuropsychological evidence. Cereb. Cortex 38, 357–378 (2002) 3. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, London (1995) 4. Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H., Rosen, D.B.: Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Trans. Neural Netw. 3, 698–713 (1992) 5. Chappell, G.J., Taylor, J.G.: The temporal kohonen map. Neural Netw. 6, 441–445 (1993) 6. Elman, J.L.: Finding structure in time. Cogn. Sci. 14, 179–211 (1990) 7. Fritzke, B.: Growing cell structures—a self-organizing network for unsupervised and supervised learning. Neural Netw. 7(9), 1441–1460 (1993) 8. Fritzke, B.: Growing grid—a self-organizing network with constant neighborhood range and adaptation strength. Neural Process. Lett. 2, 5 (1995) 9. Hammer, B., Micheli, A., Sperduti, A., Strickert, M.: Recursive self-organizing network models. Neural Netw. 17, 1061–1085 (2004) 10. Hesslow, G.: Conscious thought as simulation of behaviour and perception. Trends Cogn. Sci. 6, 242–247 (2002) 11. Johnsson, M., Balkenius, C.: Associating som representations of haptic submodalities. In: Ramamoorthy, S., Hayes, G.M. (eds.) Towards Autonomous Robotic Systems 2008, pp. 124–129 (2008) 12. Johnsson, M., Balkenius, C.: Experiments with self-organizing systems for texture and hardness perception. J. Comput. Sci. Technol. 1(2), 53–62 (2009) 13. Johnsson, M., Balkenius, C., Hesslow, G.: Associative self-organizing map. In: International Joint Conference on Computational Intelligence (IJCCI) 2009, pp. 363–370 (2009) 14. Kayser, C., Petkov, C.I., Augath, M., Logothetis, N.K.: Functional imaging reveals visual modification of specific fields in auditory cortex. J. Neurosci. 27, 1824–1835 (2007) 15. Kohonen, T.: Self-Organization and Associative Memory. Springer, Berlin (1988) 16. Kosslyn, S.M., Ganis, G., Thompson, W.L.: Neural foundations of imagery. Nat. Rev., Neurosci. 2, 635–642 (2001) 17. McGurk, H., MacDonald, J.: Hearing lips and seeing voices. Nature 264, 746–748 (1976) 18. Nguyen, L.D., Woon, K.Y., Tan, A.H.: A self-organizing neural model for multimedia information fusion. In: International Conference on Information Fusion 2008, pp. 1738–1744 (2008) 19. Strickert, M., Hammer, B.: Merge som for temporal data. Neurocomputing 64, 39–71 (2005) 20. Tan, A.H.: Adaptive resonance associative map. Neural Netw. 8, 437–446 (1995) 21. Varsta, M., Millan, J., Heikkonen, J.: A recurrent self-organizing map for temporal sequence processing. In: ICANN 1997 (1997) 22. Voegtlin, T.: Recursive self-organizing maps. Neural Netw. 15, 979–991 (2002) 23. Ziemke, T., Jirenhed, D., Hesslow, G.: Internal simulation of perception: a minimal neuro-robotic model. Neurocomputing 68, 85–104 (2005)
Chapter 9
Building Neurocognitive Networks with a Distributed Functional Architecture Marmaduke Woodman, Dionysios Perdikis, Ajay S. Pillai, Silke Dodel, Raoul Huys, Steven Bressler, and Viktor Jirsa
Abstract In the past few decades, behavioral and cognitive science have demonstrated that many human behaviors can be captured by low-dimensional observations and models, even though the neuromuscular systems possess orders of magnitude more potential degrees of freedom than are found in a specific behavior. We suggest that this difference, due to a separation in the time scales of the dynamics guiding neural processes and the overall behavioral expression, is a key point in understanding the implementation of cognitive processes in general. In this paper we use Structured Flows on Manifolds (SFM) to understand the organization of behavioral dynamics possessing this property. Next, we discuss how this form of behavioral dynamics can be distributed across a network, such as those recruited in the brain for particular cognitive functions. Finally, we provide an example of an SFM style functional architecture of handwriting, motivated by studies in human movement sciences, that demonstrates hierarchical sequencing of behavioral processes.
M. Woodman, Theoretical Neuroscience Group, Université de la Méditerranée, Marseille, France. e-mail: [email protected]
9.1 Introduction Human cognitive and behavioral sciences have recently found that their phenomena frequently admit descriptions in the mathematical terms of dynamical systems theory [1, 18, 30, 34], often suggesting that specific behaviors are usually low-dimensional [9, 15]. Although these cognitive studies aid in the design and construction of cognitive systems, results from the field of cognitive neuroscience have much to offer toward understanding the implementation of cognitive structures: the brain appears to produce cognition by coordinating its own high-dimensional, densely interconnected networks according to cognitive demands [3–5, 11]. Here however, we find a mirror image of Bernstein’s problem of degrees of freedom [2]: for cognitive neuroscience, how does the brain, in conjunction with the body, manage its multitude of degrees of freedom to generate cognition? For the design of cognitive systems, what various substrates and configurations are necessary to generate useful capabilities? There exists a multitude of neural and cognitive models [1, 32], all generating phenomena of a particular interest, yet the application of these mathematical tools to the problem of designing and constructing cognitive systems is still largely fragmented, and more importantly, the relation of cognition to its underlying brain networks lacks a general theoretical framework. Here, we present an approach, based on an understanding of large-scale brain networks, to the problem of constructing cognitive systems under the name of Structured Flows on Manifolds [17, 28]. Through this framework, we address a fundamental issue in cognitive neuroscience and the design of brain-inspired cognitive systems: how is behaviorally relevant function reliably and adaptively generated by systems, e.g. the human body and nervous system, with practically uncountable degrees of freedom, immense heterogeneity, and dynamics at many time scales? In this regard, biological systems have many characteristic signs to
offer us in the way that biological function often acts to maintain itself: if one or several degrees of freedom become unavailable, others in the network of underlying processes are recruited such that the system maintains its overall function [6, 26]; this occurs such that function serves as a constraint for organizing the system. The importance of this property is that it endows the system with stability and flexibility seen at both the component (e.g. neural) and behavioral level, and it has been proposed to play a central role in the function of biological systems [10, 23].
9.2 Structured Flows on Manifolds As demonstrated in behavioral and cognitive studies [1, 18, 30, 34], cognitive performance within tasks appears to operate and evolve in low-dimensional spaces [9, 15]. This is a core aspect of the approach we wish to present here; the other is the genesis of the low-dimensional space itself. In our approach, specific behaviors are identified by the dynamical flow of the variables relevant to those behaviors; such a behavior is termed a phase flow. According to dynamical systems theory, phase flows allow for an unambiguous description of deterministic, time-continuous and autonomous systems. The flow in phase space can be generally described by equations of the form x˙ = f(x)
(9.1)
where x is the vector of so-called state or phase variables and the dot indicates the derivative with respect to time. While the flow provides a quantitative system description, the flow’s topology uniquely defines a system’s qualitative behavior. The number and type of topological structures that can exist in phase space are constrained by the system’s dimension (i.e., number of state variables): in two dimensions, simple and well-known topological structures include fixed points, limit cycles and separatrices, while systems with more than two dimensions allow for other, more complex topological structures. Point attractors and stable limit cycles are generally associated with discrete behavior and rhythmic behaviors, respectively (cf. [15, 16]). A separatrix is a topological structure that divides the phase space into regions with distinct flows (Fig. 9.1, black lines), and may provide a system with threshold properties. Topology is a crucial aspect of phase spaces: the behavior of two systems map to each other if and only if their phase space is topologically equivalent. Phase flow topologies thus provide an unambiguous classification of dynamical systems by identifying the invariance that distinguishes equivalence classes, which we use here to conceptualize classes of behavior [15, 16]. While phase flows provide a description of behaviors, they do not specify how these behaviors are instantiated in a high-dimensional system, e.g. handwritten letters (Sect. 9.4): the relevant flows are low-dimensional, yet the joints and muscles in the involved hand and arm operate in a potentially highdimensional state space. Specifically, phase flows do not mathematically describe the recruitment and constraint of these different degrees of freedom required to produce the low-dimensional behaviors themselves: another insight is required to account for why the system occupies only a small subset of its state space. Jirsa, Pillai and colleagues have developed [28, 29] a general framework, under the name Structured Flows on Manifolds, for understanding the realization of functional dynamics as an adiabatic contraction (cf. [14]) to a functionally relevant subset of phase space, called the manifold, followed by functional dynamics on a slower time scale, the phase flow, illustrated in Fig. 9.1 [28]. This process is of the form ξ˙i = −f (ξ )ξi + μgi (ξ )
(9.2)
where ξ ∈ ℝ^N forms the set of functional or behavioral state variables, the function f(.) determines the shape and stability of the manifold, gi(.) is the ith component of the phase flow on the manifold, and μ ≪ 1 is a parameter creating a time scale separation between the manifold dynamics and the phase flow dynamics.
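A minimal numerical illustration of (9.2) is sketched below. The particular choices f(ξ) = κ(|ξ|² − 1), which attracts trajectories onto the unit sphere, and a slow planar rotation for g are assumptions made only for this example; the printout shows the fast contraction of |ξ| toward 1 followed by the much slower drift along the manifold.

```python
import numpy as np

# Euler integration of Eq. (9.2) with illustrative choices of f and g:
# f(xi) = kappa*(|xi|^2 - 1) shapes a unit-sphere manifold, g rotates slowly
# in the (xi_1, xi_2) plane.  kappa >> mu provides the time scale separation.
kappa, mu, dt = 10.0, 0.05, 1e-3

def f(xi):
    return kappa * (np.dot(xi, xi) - 1.0)

def g(xi):
    return np.array([-xi[1], xi[0], 0.0])

xi = np.array([2.0, 0.5, 1.0])                      # start far from the manifold
for step in range(200_000):
    xi = xi + dt * (-f(xi) * xi + mu * g(xi))       # Eq. (9.2)
    if step % 50_000 == 0:
        print(f"|xi| = {np.linalg.norm(xi):.4f}, xi = {np.round(xi, 3)}")
```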
Fig. 9.1 Structured Flows on Manifolds: In this space spanned by three behavioral dimensions, trajectories from a simulation of differential equations (lines in blue and red) show that the flow contracts quickly onto the manifold (surface in light blue) to engage one of the available phase flows; here, trajectories enter a bistable (red) or limit cycle (blue) flow. Strong perturbations can cause the flow to traverse the separatrix (black line) that distinguishes one flow from another
9.3 Distributed Functional Dynamics The explicit time scale separation of (9.2) distinguishes the low-dimensional space of the system’s dynamics from the functional dynamics within that space, but one of the goals is to obtain a relation between the network of underlying processes (for us, neural) and the emergent dynamics. To this end, we distribute the dynamics in (9.2) by relating the set of functional state variables, {ξ1, . . . , ξN}, to the neural network state space, {q1, . . . , qN}, using a linear mapping (here, matrices W and its inverse W†) between the ξ and q spaces:

\boldsymbol{\xi} = \mathbf{W}^{\dagger}\mathbf{q} = \begin{pmatrix} \mathbf{v}^{\dagger 1} \\ \vdots \\ \mathbf{v}^{\dagger N} \end{pmatrix}\begin{pmatrix} q_1 \\ \vdots \\ q_N \end{pmatrix} \quad\Longleftrightarrow\quad \mathbf{q} = \mathbf{W}\boldsymbol{\xi} = \begin{pmatrix} \mathbf{v}^{1} \\ \vdots \\ \mathbf{v}^{N} \end{pmatrix}\begin{pmatrix} \xi_1 \\ \vdots \\ \xi_N \end{pmatrix} \qquad (9.3)
where v and v† are the components of the transform. Here, we assume that this transformation is orthonormal such that W† W = I, where I is an identity matrix, and that W† and W are both square matrices, however the latter condition may not be true when considering multiple networks that produce the same function; a treatment of this case is presented in [28]. The motivation for using such a mapping is that we can map the dynamics of the underlying network to functional dynamics and vice versa.
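The sketch below illustrates the orthonormality assumption on the transform; building W from a QR decomposition of a random matrix is merely a convenient way of obtaining such a mapping for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
W, _ = np.linalg.qr(rng.normal(size=(N, N)))   # an orthonormal W
W_dag = W.T                                    # W† is then the inverse of W

assert np.allclose(W_dag @ W, np.eye(N))       # W† W = I, as assumed in the text

q = rng.normal(size=N)                         # some network state
xi = W_dag @ q                                 # functional coordinates, Eq. (9.3)
assert np.allclose(W @ xi, q)                  # and back again: q = W xi
```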
9.3.1 Functional Dynamics of Networks As an example, we start with a firing rate model, similar to a Wilson-Cowan model [35], of a neural population obeying intrinsic dynamics defined by V(.). This model uses a multiplicative form of coupling, weighted by ωij, and an unspecified synaptic function Sj(qj):

\dot{q}_i = V(q_i) + q_i \sum_{j=1}^{N} \omega_{ij} S_j(q_j) \qquad (9.4)
where qi is the firing rate of the ith neural mass, and the index i = 1, . . . , N, where N is the size of the neural network. Here, we have assumed that the intrinsic dynamics of each neural mass is the same (hence V(.) is not indexed by i). We can use the coordinate transform to identify the dynamics of the network in a functional space:

\mathbf{W}^{\dagger}\dot{\mathbf{q}} = \mathbf{W}^{\dagger}\begin{pmatrix} V(q_1) + q_1 \sum_{j=1}^{N}\omega_{1j} S_j(q_j) \\ \vdots \\ V(q_N) + q_N \sum_{j=1}^{N}\omega_{Nj} S_j(q_j) \end{pmatrix} \qquad (9.5)

\dot{\xi}_k = \sum_{i=1}^{N} v_i^{\dagger k}\Bigl[ V(\mathbf{v}^{i}\boldsymbol{\xi}) + \mathbf{v}^{i}\boldsymbol{\xi} \sum_{j=1}^{N}\omega_{ij} S(\mathbf{v}^{j}\boldsymbol{\xi}) \Bigr] \qquad (9.6)
After the intrinsic node and connectivity functions are specified, it is possible to describe in more detail both the functional dynamics produced by this network, especially with respect to the manifolds, and the phase flows it is capable of producing.
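The passage from (9.4) to (9.6) can be checked numerically on a small network; in the sketch below the intrinsic dynamics V and the synaptic function S are simple illustrative choices (the chapter leaves them unspecified), and the test only confirms that the coordinate-transformed network derivative coincides with the functional-space expression.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
W, _ = np.linalg.qr(rng.normal(size=(N, N)))    # orthonormal mapping, W† = W.T
W_dag = W.T
omega = 0.1 * rng.normal(size=(N, N))           # coupling weights omega_ij

V = lambda q: -q + np.tanh(q)                   # illustrative intrinsic dynamics
S = np.tanh                                     # illustrative synaptic function

q = rng.normal(size=N)
q_dot = V(q) + q * (omega @ S(q))               # Eq. (9.4), all nodes at once

xi = W_dag @ q
# Right-hand side of Eq. (9.6); note that v^i xi is simply q_i, since q = W xi.
rhs = np.array([np.dot(W_dag[k], V(W @ xi) + (W @ xi) * (omega @ S(W @ xi)))
                for k in range(N)])
assert np.allclose(W_dag @ q_dot, rhs)          # functional-space dynamics agree
```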
9.3.2 Implementing Functional Dynamics in a Network We are interested in understanding what dynamics and coupling terms are necessary for a network to produce structured flows on manifolds. Here, we briefly review a discussion of this topic presented in [28]. We apply the coordinate transform W to (9.2), which yields

\mathbf{W}\dot{\boldsymbol{\xi}} = \mathbf{W}\begin{pmatrix} -f(\boldsymbol{\xi})\xi_1 + \mu g_1(\boldsymbol{\xi}) \\ \vdots \\ -f(\boldsymbol{\xi})\xi_N + \mu g_N(\boldsymbol{\xi}) \end{pmatrix} \qquad (9.7)

\dot{q}_i = \sum_{k=1}^{N} v_k^{i}\Bigl[ -f(\mathbf{W}^{\dagger}\mathbf{q})\,(\mathbf{v}^{\dagger k}\mathbf{q}) + \mu g_k(\mathbf{W}^{\dagger}\mathbf{q}) \Bigr] \qquad (9.8)
Continuing from this point, it is necessary to specify f (.) and g(.) to see how particular manifolds and dynamics might map to network node dynamics and connectivity. Of interest, still in this general form, is the point that the dynamics described in Sect. 9.2 can be produced by a network of interacting neural processes.
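Conversely, once f and g are specified, (9.8) prescribes node dynamics whose projection ξ = W†q realizes the structured flow. The sketch below reuses the illustrative unit-sphere manifold and slow rotation from the example after (9.2); these choices, and all parameter values, are assumptions made only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, kappa, mu, dt = 3, 10.0, 0.05, 1e-3
W, _ = np.linalg.qr(rng.normal(size=(N, N)))
W_dag = W.T

f = lambda xi: kappa * (np.dot(xi, xi) - 1.0)     # manifold: unit sphere
g = lambda xi: np.array([-xi[1], xi[0], 0.0])     # slow flow on the manifold

q = W @ np.array([2.0, 0.5, 1.0])                 # network state encoding xi(0)
for step in range(100_000):
    xi = W_dag @ q                                # current functional state
    q = q + dt * (W @ (-f(xi) * xi + mu * g(xi))) # Eq. (9.8) in vector form
print(np.linalg.norm(W_dag @ q))                  # ~1.0: xi sits on the manifold
```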
9.4 Functional Architectures In this section, we wish to demonstrate an application of Structured Flows on Manifolds to handwriting. In the following, we assume the manifold is partially constituted by the (x, y) dimensions of the page on which the writing occurs such that the trajectories in the page are projections of the trajectories flowing on the manifold. Here, we formulate a functional (multi-time scale) architecture that is capable of generating complex, sequential behaviors using phase flows as its functional modes (building blocks) (e.g. the multiple phase flows shown in Fig. 9.2 are the pieces of larger, more complex dynamics). As in Sect. 9.2, a low-dimensional phase flow accounts for the dynamics of a functional mode of the system (following (9.1), see also Fig. 9.2), yet here, the flow is a linear composition of all phase flows available in a repertoire:

\dot{\mathbf{x}} = \mathbf{F}(\mathbf{x}, t) = \sum_i \gamma_i(t)\,\mathbf{f}_i(\mathbf{x}) \qquad (9.9)
Fig. 9.2 Illustration of a functional architecture for a sequence involving three distinct functional units. (A) The color coding indicates which subnetwork is active at each stage of the simulation. (B) Time series of the slow dynamics. (C), (D), (E) Vector fields of the 2-dimensional phase flows on a plane coding for the three functional modes: (C) a monostable point attractor (black filled circle) and a separatrix (red curve) that endows the system with threshold properties; (D) a limit cycle system exhibiting sustained oscillations with one unstable equilibrium point (open circle); (E) and a bistable system with two point attractors, a saddle node (red filled diamond) and a separatrix that partitions the state space into two different regimes of attraction. The active subnetwork uniquely determines which vector field is active (C), (D) or (E), and it thus determines the flow of the fast functional dynamics. (F): Time series of the fast functional dynamics (x, y) (same color coding) and the instantaneous non-autonomous control signal I (t), which induces one cycle per stimulus to the first (monostable) phase flow and one half cycle per stimulus to the third (bistable) one
where x and fi are defined as in (9.1) (the index i denotes the ith phase flow) and F denotes the expressed phase flow as a function of time t. The operating signals γi(t) act as weighting coefficients for the ith phase flow and operate on a time scale slower than that of the manifold and phase flow dynamics. More specifically, the vector γ(t) governs the transition among phase flows, and sequentially generates or selects functional modes, each of which can exhibit qualitatively different dynamics (Fig. 9.2B). Apart from short transitions between successive phase flows (or in case of perturbation), functional modes do not overlap in time, i.e. only one phase flow is expressed at any time t because typically γi(t) = 1 while γj(t) = 0 ∀j ≠ i.
9.4.1 Sequential Function The slow γi(t)s follow autonomous heteroclinic sequence dynamics implementing the principle of winner-less competition [13, 19, 20, 31, 33]:

\dot{\gamma}_i = \Bigl(1 - \sum_j \gamma_j^2 - c_{i+1}\gamma_{i+1}^2 - \sum_{j \neq i,\, i-1,\, i+1} r_{ij}\gamma_j^2 + e_{i-1}\gamma_{i-1}^2\Bigr)\,\gamma_i \qquad (9.10)

where ci+1, ei−1, rij > 0 are the coupling coefficients between γi and the following γi+1, the preceding γi−1 and all other γj s in the sequence, respectively (note that the indices i denote a sequential order). The conditions

\min_j \bigl(c_j,\; e_j + r_j\bigr) > \sum_j e_j \qquad (9.11)
where r_j = max_i r_ij, are necessary and sufficient for a robust heteroclinic cycle to emerge. Other options are also available, such as nonautonomous γi(t)s, synergetic oscillations as in [8], or systems with additional feedback from the fast variables in the SFM to the slow sequential dynamics of the γi s. Each scheme has different implications for the transitions between phase flows and, consequently, for the dynamics of the entire system. In all cases but one (nonautonomous operating signals), the entire system can be viewed as a single high-dimensional phase flow involving multiple time scales. Apart from the slow dynamics that changes the expressed phase flow topology, the architecture also allows for an instantaneous operating signal I(t) that leaves the flow unaffected except for a specific time during which it provides a meaningful perturbation to the dynamics, as in Fig. 9.2F. This signal can temporarily set the system into a desired state or, when combined with phase flows containing separatrices, may change the spatiotemporal dynamics of a functional mode (by, for instance, initiating a process, as in [15]). The functional architecture thus combines invariant features (the SFM) with those that are variable across repeated instances of a functional mode’s appearance in behavior. Perdikis et al. [27] provide computational evidence for the efficiency of such a control scheme.
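A minimal simulation of the sequencing dynamics (9.10) is sketched below for four modes arranged in a cycle. The coefficient values are illustrative and chosen only so that condition (9.11) is satisfied, and a little additive noise keeps the trajectory moving from saddle to saddle; the printed list of successive winners shows the winner-less switching through the sequence.

```python
import numpy as np

rng = np.random.default_rng(4)
N, dt, T = 4, 1e-3, 150.0
c = np.full(N, 1.5)            # inhibition from the following mode, c_{i+1}
e = np.full(N, 0.2)            # excitation from the preceding mode, e_{i-1}
r = np.full((N, N), 1.0)       # inhibition from the remaining modes, r_ij
assert min(c.min(), (e + r.max(axis=0)).min()) > e.sum()   # condition (9.11)

gamma = 0.1 + 0.01 * rng.random(N)
winners = []
for _ in range(int(T / dt)):
    nxt, prv = np.roll(gamma, -1), np.roll(gamma, 1)        # gamma_{i+1}, gamma_{i-1}
    # sum over j != i, i-1, i+1 (the subtraction is valid here because all r_ij = 1)
    rest = gamma**2 @ r.T - gamma**2 - nxt**2 - prv**2
    growth = (1.0 - np.sum(gamma**2) - np.roll(c, -1) * nxt**2
              - rest + np.roll(e, 1) * prv**2)
    gamma = np.clip(gamma + dt * growth * gamma + 1e-3 * rng.normal(size=N), 0.0, None)
    winners.append(int(gamma.argmax()))

# distinct successive winners, e.g. [0, 1, 2, 3, 0, 1, ...]
print([w for i, w in enumerate(winners) if i == 0 or winners[i - 1] != w])
```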
9.4.2 Handwriting Example Figure 9.3 demonstrates a ‘handwritten’ word generated by our functional architecture. Characters are modeled as 3-dimensional phase flows following the general set of differential equations:

\dot{\mathbf{u}} = \mathbf{f}(\mathbf{u}) \;\Longleftrightarrow\; \begin{cases} \dot{y} = k\bigl(z - (y - y_1^{*})(y - y_2^{*})(y - y_3^{*})\bigr),\\ \dot{z} = -\bigl(a z + b\,(y - y_0^{*})\bigr),\\ \dot{x} = f_x(y, \dot{y}), \end{cases}

where u = [x, y, z] represents the state vector, and y and z follow Excitator-like [16] dynamics (monostable or limit cycle; see Fig. 9.2C, D respectively), with parameters y0*–y3*, a, and b specifying the exact shape of the nullclines and k ≫ 1 introducing a separation of time scales that endows the phase flows with threshold-like dynamics. The function fx(y, ẏ) is designed to yield the desired letter shape in a handwriting workspace, i.e. the (x, y) plane. All the functional dynamics evolves on the surface of a cylindrical manifold, which corresponds to the handwriting workspace, aligned with the x axis. Finally, a heteroclinic sequence generates the slow dynamics that concatenates the phase flows to yield a written word. Such a decomposition of a motor sequence into simpler units (e.g. movement primitives or strokes) is well motivated in the motor control literature [7, 22, 24, 25].
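For concreteness, the sketch below integrates one Excitator-like character flow of the above form in its limit-cycle regime. The parameter values and, in particular, the letter-shaping function f_x are placeholders assumed for the example; the actual f_x behind Fig. 9.3 is letter-specific and not reproduced here.

```python
import numpy as np

k, a, b = 10.0, 0.0, 1.0                    # illustrative parameters
y0, y1, y2, y3 = 0.0, -1.0, 0.0, 1.0        # nullcline parameters y*_0 .. y*_3
f_x = lambda y, y_dot: 0.05 * abs(y_dot)    # placeholder: pen drifts rightward

dt, steps = 1e-3, 40_000
x, y, z = 0.0, 0.1, 0.0
trace = []
for _ in range(steps):
    y_dot = k * (z - (y - y1) * (y - y2) * (y - y3))   # fast variable
    z_dot = -(a * z + b * (y - y0))                    # slow recovery variable
    x_dot = f_x(y, y_dot)                              # horizontal pen progression
    x, y, z = x + dt * x_dot, y + dt * y_dot, z + dt * z_dot
    trace.append((x, y))

trace = np.array(trace)      # (x, y) pen trajectory in the writing plane
print(np.round(trace[::10_000], 3))
```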
9.5 Discussion In this paper, we have briefly presented Structured Flows on Manifolds as a new approach toward understanding the organization of functional, low-dimensional behaviors and their emergence from distributed neural networks. Our approach is based on phase flows which unambiguously describe the evolution of the system and capture its function. As phase flow topologies identify behavioral invariance, they allow for a stringent qualitative classification of behavior. By virtue of the distribution of functional dynamics, these high dimensional systems are organized by their function, yielding both stability and flexibility. According to our approach, the dynamics of a high-dimensional system generally operate on multiple time scales: the fast dynamics contract to a low-dimensional manifold (a subspace of the entire system phase space) within which the phase flow prescribes for the system a
Fig. 9.3 The “handwritten” word “flow” generated by the proposed functional architecture. (A): The output trajectory on the page, i.e. the plane (x, y). (B): The output trajectory in the 3-dimensional phase (state) space spanned by x, y and z. One can see that the functional dynamics is restricted to the surface corresponding to the handwriting workspace, the state space with behavioral semantics, of a cylindrical manifold along the x axis. (C): From top to bottom: heteroclinic sequence accounting for the slow sequential dynamics γ1−4 , output time series x, y and z, and finally, the instantaneous operating signal I (t) is used to trigger one cycle per stimulus in the monostable phase flows of the letters “l” and “o”. The letter “f” is implemented using a monostable phase flow (here, initial conditions obviate the need for an I (t) stimulus) and “w” with a limit cycle. The time series are color coded as in Fig. 9.2
slower process. This temporal structure allows for a multitude of structured flows on manifolds, which we see as potential perceptive and active building blocks for human cognition and behavior. As shown in the example of handwriting, continuously evolving sequences of behaviors can be chained together through control signals dynamics, perhaps from another system with its own functional organization, operating on a time scale slower than that of the behavioral function. Finally, we suggest how, i.e. under which constraints (explored in more detail in [28]), low-dimensional, flexible and robust behavioral function emerges from inherently high-dimensional neural networks. Many other architectures describing cognitive processes have been developed within symbolic, connectionist or dynamicist paradigms or hybrids thereof. While these existing theories demonstrate congruence with many aspects of cognition in specific demonstrations, of which our approach currently has only a few, including learning, which we have not yet addressed, our approach, very much in the dynamicist spirit, confers distinct benefits: it brings compliance with and takes advantage of biological and neurological complexity. Furthermore, the use of dynamics accommodates many of the existing modeling strategies [12]. For example, the symbolicism present in the original AI research and that of hybrid models is captured by the topology of phase flows; by working with networks, we derive benefits inherent in the connectionist paradigm. Lastly, whereas many other cognitive architectures are based on memory, representations and information processing [21], in our vocabulary, cognition is a self-organizing, spatiotemporal pattern formation process in the brain. Thus, our approach describes biological behavior, function and cognition with great explanatory power. Our framework may be read as a theory of the relation between structure and function, a long standing biological question: as shown in Sect. 9.4, variables evolving on a slower time scale can parametrize the dynamics on faster time scales. In this way, slow processes may provide structure for faster processes such that a system’s overall function organizes its component dynamics. Finally,
as demonstrated in [28], the distributed nature of neural networks can confer important functional properties, such as stability, flexibility and resilience to perturbation or lesion, on a large-scale system. This is in contrast to those systems designed by humans (e.g., cars, computers, etc.) that lack these properties. Overall, we suggest a dynamical systems framework for both reverse engineering human cognition and forward engineering new biologically inspired cognitive systems.
References 1. Tschacher, W., Dauwalder, J.-P.: The Dynamical Systems Approach to Cognition. World Scientific, Singapore (2003) 2. Bernstein, N.: The Co-ordination and Regulation of Movements. Pergamon Press, Oxford (1967) 3. Breakspear, M., Jirsa, V.K.: Neuronal Dynamics and Brain Connectivity. Springer, Berlin (2007) 4. Bressler, S.L.: Neurocognitive networks. Scholarpedia 3(2), 1567 (2008) 5. Bressler, S.L., Tognoli, E.: Operational principles of neurocognitive networks. Int. J. Psychophysiol. 60, 139–148 (2006) 6. Buchanan, J.J., Kelso, J.A.S., de Guzman, G.: Self-organization of trajectory formation. Biol. Cybern. 76(4), 257– 273 (1997) 7. Bullock, D., Grossberg, S., Mannes, C.: A neural network model for cursive script production. Biol. Cybern. 70, 15–28 (1993) 8. Ditzinger, T., Haken, H.: Oscillations in the perception of ambiguous patterns. Biol. Cybern. 61, 279–287 (1989) 9. Dodel, S.M., Pillai, A.S., Fink, P.W., Muth, E.R., Stripling, R., Schmorrow, D.D., Cohn, J.V., Jirsa, V.K.: Observerindependent dynamical measures of team coordination and performance. In: Danion, F., Latash, M.L. (eds.) Motor Control: Theories, Experiments, and Applications, pp. 72–101. Oxford University Press, London (2010) 10. Edelman, G.M., Gally, J.A.: Degeneracy and complexity in biological systems. PNAS 98(24) (2001) 11. Fuster, J.M.: Cortex and Mind: Unifying Cognition. Oxford University Press, London (2005) 12. Giunti, M.: Dynamical Models of Cognition in Mind as Motion. MIT Press, Cambridge (1998). Chap. 18 13. Grossberg, S.: Biological competition: Decision rules, pattern formation, and oscillations. Proc. Natl. Acad. Sci. USA 77, 2338–2342 (1980) 14. Haken, H.: Synergetics: Introduction and Advanced Topics. Springer, Berlin (2004) 15. Huys, R., Smeeton, N.J., Hodges, N.J., Beek, P.J., Williams, A.M.: On the dynamic information underlying visual anticipation skill. Atten. Percept. Psychophys. 70, 1217–1234 (2008) 16. Jirsa, V.K., Kelso, J.A.S.: The excitator as a minimal model for the coordination dynamics of discrete and rhythmic movement generation. J. Mot. Behav. 37(1), 35–51 (2005) 17. Jirsa, V.K., Mersmann, J.: Patent Application (2006) 18. Kelso, J.A.S.: Dynamic Patterns. MIT Press, Cambridge (1995) 19. Krupa, M.: Robust heteroclinic cycles. J. Nonlinear Sci. 7, 129–176 (1997) 20. Krupa, M., Melbourne, I.: Asymptotic stability of heteroclinic cycles in systems with symmetry. Ergod. Theory Dyn. Syst. 15, 121–147 (1995) 21. Langley, P., Laird, J.E., Rogers, S.: Cognitive architectures: Research issues and challenges. Cognitive Systems Research 10(2) (2009) 22. Lashley, K.S.: The Problem of Serial Order in Behavior. Wiley, New York (1951) 23. Maturana, H.R., Varela, F.J.: Autpoiesis and Cognition: The Realization of the Living. Springer, Berlin (1991) 24. Morasso, Mussa-Ivaldi, F.A.: Trajectory formation and handwriting, a computational model. Biol. Cybern. 45, 131–142 (1982) 25. Mussa-Ivaldi, F.A., Bizzi, E.: Motor learning through the combination of primitives. Philos. Trans. R. Soc. Lond. A 355, 1755–1769 (2000) 26. Fink, P.W., Kelso, J.A.S., Jirsa, V.K., de Guzman, G.: Recruitment of degrees of freedom stabilizes coordination. J. Exp. Psychol. Hum. Percept. Perform. 26(2), 671–692 (2000) 27. Perdikis, D., Huys, R., Jirsa, V.: Complex processes from dynamical architectures with time-scale hierarchy Brezina V. PLoS ONE 6(2) (2011). Available at: http://dx.plos.org/10.1371/journal.pone.0016589. 28. 
Pillai, A.S.: Structured flows on manifolds: Distributed functional architectures. Ph.D. thesis, Florida Atlantic University (2008). Full text available at: http://purl.fcla.edu/fau/77649 29. Pillai, A.S., Jirsa, V.K.: Structured flows on manifolds: Distributed functional architectures (In preparation) 30. Port, R.F., van Gelder, T. (eds.): Mind as Motion: Explorations in the Dynamics of Cognition. MIT Press, Cambridge (1998) 31. Rabinovich, M., Huerta, R., Varona, P., Afraimovich, V.S.: Transient cognitive dynamics, metastability and decision making. PLoS Comput. Biol. 4, e1000072 (2008)
32. Rabinovich, M.I., Varona, P., Selverston, A.I., Abarbanel, H.D.I.: Dynamical principles in neuroscience. Rev. Mod. Phys. 78(4) (2006) 33. Seliger, P., Tsimring, L.S., Rabinovich, M.I.: Dynamics-based sequential memory: winnerless competition of patterns. Phys. Rev. E, Stat. Nonlinear Soft Matter Phys. 67(1 Pt 1), 011905 (2003) 34. Spivey, M.: The Continuity of Mind. Oxford University Press, London (2008) 35. Wilson, H.R., Cowan, J.D.: Excitatory and inhibitory interactions in localized populations of model neurons. Biophys. J. 12, 1–24 (1972)
Chapter 10
Reverse Engineering for Biologically Inspired Cognitive Architectures: A Critical Analysis Andreas Schierwagen
Abstract Research initiatives on both sides of the Atlantic try to utilize the operational principles of organisms and brains to develop biologically inspired, artificial cognitive systems. This paper describes the standard way bio-inspiration is gained, i.e. decompositional analysis or reverse engineering. The indisputable complexity of brain and mind raise the issue of whether they can be understood by applying the standard method. Using Robert Rosen’s modeling relation, the scientific analysis method itself is made a subject of discussion. It is concluded that the fundamental assumption of cognitive science, i.e. complex cognitive systems are decomposable, must be abandoned. Implications for investigations of organisms and behavior as well as for engineering artificial cognitive systems are discussed.
A. Schierwagen, Institute for Computer Science, Intelligent Systems Department, University of Leipzig, Leipzig, Germany. e-mail: [email protected]
10.1 Introduction
Wer will was Lebendig’s erkennen und beschreiben, Sucht erst den Geist heraus zu treiben, Dann hat er die Teile in seiner Hand, Fehlt, leider! nur das geistige Band. [Whoever wants to know and describe a living thing first seeks to drive out its spirit; then he holds the parts in his hand, lacking, alas, only the spiritual bond.] J.W. GOETHE, Faust, Erster Teil
For some time now, computer science and engineering have devoted close attention to the functioning of the brain. It has been argued that recent advances in cognitive science and neuroscience have enabled a rich scientific understanding of how cognition works in the human brain. Thus, research programs have been initiated by leading research organizations on both sides of the Atlantic to develop new cognitive architectures and computational models of human cognition [1–6] (see also [7], and references therein). Two points are emphasized in the research programs: First, there is an impressive abundance of available experimental brain data, and second, we have the computing power to meet the enormous requirements of simulating a complex system like the brain. Given the improved scientific understanding of the operational principles of the brain as a complexly organized system, it should then be possible to build an operational, quantitative model of the brain. Tuning the model could be achieved using the deluge of empirical data. The main method used in empirical research to integrate the data derived from the different levels of brain organization is reverse engineering. Originally a concept in engineering and computer science, reverse engineering involves, as a first step, the detailed examination of a functional system and its dissection at the physical level into component parts, i.e. decompositional analysis.
In a second step, the (re-)construction of the system is attempted, see below. This principle is usually not much discussed with respect to its assumptions, conditions and range,1 but see [10–12]. In sum, according to the prevailing judgement, there is nothing in principle that we do not understand about brain organization. All the knowledge about its ‘building blocks’ and connectivity is present (or can be derived), and needs only to be put into the model. This view is widely taken; it represents the belief in the power of the reverse engineering method. As I am going to show in this paper, there is, however, substantial evidence to question this belief. It turns out that this method in fact ignores something fundamental, namely that biological and engineered systems are basically different in nature. The paper is organized as follows. Section 10.2 presents the fundamental assumption employed in the cognitive and brain sciences, i.e. the assumption that both brain and mind are decomposable. In Sect. 10.3, the concepts of decompositional analysis, reverse engineering and localization are reviewed. The following Sect. 10.4 is devoted to modularization and its relation to the superposition principle of system theory. Then, Sect. 10.5 briefly touches on Blue Brain and SyNAPSE, two leading reverse-engineering projects. Both projects are based on the hypothesis of the columnar organization of the cortex. The rationale underlying reverse engineering in neurocomputing or computational neuroscience is outlined. New findings are mentioned indicating that the concept of the basic uniformity of the cortex is untenable. Section 10.6 examines the claim that non-decomposability is not an intrinsic property of complex systems but is only in our eyes, due to insufficient mathematical techniques. For this, Rosen’s modeling relation is explained, which enables us to make the scientific analysis method itself a subject of discussion. It is concluded that the fundamental assumption of cognitive science must be abandoned. We end the paper with some conclusions about the relevance of Rosen’s [13–15] work for the study of organisms and behavior as well as for engineering artificial cognitive systems.
1 Only recently, differences between proponents of reverse engineering over how it should appropriately be accomplished became public. The prominent heads of two reverse-engineering projects, Markram [2] and Modha [8], disputed publicly as to what granularity of modeling is needed to reach a valid simulation of the brain. Markram questioned the authenticity of Modha’s respective claims [9].
10.2 Conceptual Foundations of Cognitive and Brain Sciences Brains, even those of simple animals, are enormously complex structures, and it is a very ambitious goal to cope with this complexity. The scientific disciplines involved in cognition and brain research (Fig. 10.1) are committed to a common method to explain the properties and capacities of complex systems. This method is decompositional analysis, i.e. analysis of the system in terms of its components or subsystems. Since Simon’s influential book “The Sciences of the Artificial” [17], (near-) decomposability of complex systems has been accepted as fundamental for the cognitive and brain sciences (CBS). We call this the fundamental assumption for the cognitive and brain sciences. Simon [17], Wimsatt [18] and Bechtel and Richardson [19], among others, have further elaborated this concept. They consider decomposability a continuously varying system property, and state, roughly, that systems fall on a continuum from aggregate (fully decomposable) to integrated (non-decomposable). The fundamental assumption states that cognitive and brain systems are non-ideal aggregate systems; the capacities of the components are internally realized (strong intra-component interactions), and interactions between components do not appreciably contribute to the capacities; they are much weaker than the intra-component interactions. Hence, the description of the complex system as a set of weakly interacting components seems to be a good approximation. This property of complex systems, which should have evolved through natural selection, was called near-decomposability and characterized as follows [17]:
Fig. 10.1 The Cognitive Hexagon (as of 1978 [16]). Cognitive science comprised six disciplines, all committed to decompositional analysis as the basic research method
Near-decomposability 1. In a nearly decomposable system, the short-run behavior of each of the component subsystems is approximately independent of the short-run behavior of the other components; 2. In the long run the behavior of any one of the components depends in only an aggregate way on the behavior of the other components [17, p. 100]. Thus, if the capacities of a near-decomposable system are to be explained, to some approximation its components can be studied in isolation, and based on their known interactions, their capacities eventually combined to generate the system’s behavior. In other words, the aforementioned fundamental assumption represents the conceptual basis for reverse engineering the brain and mind. Let us summarize this assumption because it is of central importance in the following: Fundamental assumption for cognitive and brain sciences Cognitive and brain systems are nonideal aggregate systems. The capacities of the components are internally realized (strong intracomponent interactions) while interactions between components are negligible with respect to capacities. Any capacity of the whole system then results from superposition of the capacities of its subsystems. This property of cognitive and brain systems should have evolved through natural selection and is called near-decomposability.
10.3 Decompositional Analysis, Localization and Reverse Engineering The primary goal of cognitive science and its subdisciplines is to understand cognitive capacities like vision, language, memory, planning etc. Capacities are considered as dispositional properties which can be explained via decompositional analysis, see Fig. 10.2. In CBS, two types of decompositional analysis are differentiated, i.e. functional analysis and structural analysis [20–22]. Functional analysis is the type of decompositional analysis that proceeds without reference to the material composition of the system. It is concerned with the sub-functions of some hypothesized components of the whole system which enable this whole system to have certain capacities and properties and to realize corresponding functions. Structural analysis involves to attempt to identify the structural, material components of the system. Thus, the material system S can be decomposed into context-independent components Sj , i.e. their individual properties are independent of the decomposition process itself and of S’s environment. Functional analysis and structural analysis must be clearly differentiated, although in practice, there is a close interplay between them (as indicated by the double arrow in Fig. 10.2). This is obvious in the
Fig. 10.2 View on decompositional analysis of brain and cognition. See text for details
localization approach, which combines both analysis types, i.e. a specific component function is linked with a specific structural component. Functional analysis should also be differentiated from capacity analysis. The former is concerned with the functions performed by components of the whole system which enable this whole system to have certain capacities and properties. The latter is concerned with the dispositions or abilities of the whole system, whereas functional and structural analysis is concerned with the functional and structural bases of those dispositions or abilities. Understating the case, the localization approach is sometimes described as a hypothetical identification which is to serve as a research heuristic [19]. In fact, however, the majority of cognitive scientists considers it fundamental and indispensable (e.g. [28]). Obviously, decompositional analysis and reverse engineering are closely related. Reverse engineering is a two-step method: It has the decompositional analysis of the original system as the first, basic step, while the second step consists in creating duplicates of the original system, including computer models. It should be noticed that there is no reason to assume that functional and structural components match up one-to-one! Of course, it might be the case that some functional components map properly onto individual structural components—the dream of any cognitive scientist working as ‘reverse engineer’. It is rather probable, however, for a certain functional component to be implemented by non-localized, spatially distributed material components. Conversely, a given structural component may implement more than one distinct function. According to Dennett [24, p. 273]: “In a system as complex as the brain, there is likely to be much ‘multiple, superimposed functionality’.” In other words, we cannot expect specific functions to be mapped to structurally bounded neuronal structures, and vice versa. It is now well known that Dennett’s caveat has proved justified (e.g. [25]). Thus the value of the localization approach as a ‘research heuristic’ seems rather dubious [26, 27].
10.4 Complex Systems and Modularization In CBS and in other fields of science, the components of near-decomposable systems are called modules. This term originates from engineering; it denotes the process of decomposing a product into building blocks—modules—with specified interfaces, driven by the designer’s interests and intended functions of the product. Modularized systems are linear in the sense that they obey an analog of the superposition principle of linear system theory in engineering [29]. The behavior of a decomposable system results from the linear combination of the behavior of the system modules. In some respects,
this principle represents a formal underpinning of the constructive step in reverse engineering2 (see Sects. 10.1, 10.3, 10.5). The terms ‘linear’ and ‘nonlinear’ are often used in this way: ‘Linear’ systems are decomposable into independent modules with linear, proportional interactions while ‘nonlinear’ systems are not3 [29, 30]. Applying this concept to the systems at the other end of the complexity scale, the integrated systems are basically not decomposable, due to the strong, nonlinear interactions involved. Thus, past or present states or actions of any or most subsystems always affect the state or action of any or most other subsystems. In practice, analyses of integrated systems nevertheless try to apply the methodology for decomposable systems, in particular if there is some hope that the interactions can be linearized. Such linearizable systems have been above denoted as nearly decomposable. However, in the case of strong nonlinear interactions, we must accept that decompositional analysis is not applicable. Already decades ago this insight was stressed. For example, Levins [31, p. 76 ff.] proposed a classification of systems into aggregate, composed and evolved systems. While the aggregate and the composed would not cause serious problems for scientific analyses, Levins emphasized the special character of evolved systems: A third kind of system no longer permits this kind of analysis. This is a system in which the component subsystems have evolved together, and are not even obviously separable; in which it may be conceptually difficult to decide what are the really relevant component subsystems. . . . The decomposition of a complex system into subsystems can be done in many ways. . . it is no longer obvious what the proper subsystems are, but these may be processes, or physical subsets, or entities of a different kind.
The question then arises: Should we care about integrated systems, given the fundamental assumption that all relevant systems are nearly decomposable? Non-decomposability then would be only in our eyes, and not an intrinsic property of strongly nonlinear systems, and—as many cognitive and computer scientists believe—scientific progress will provide us with the new mathematical techniques required to deal with nonlinear systems. We will return to this problem in Sect. 10.6.
2 A corresponding class of models in mathematics is characterized by a theorem stating that for homogeneous linear differential equations, the sum of any two solutions is itself a solution.
corresponding class of models in mathematics is characterized by a theorem stating that for homogeneous linear differential equations, the sum of any two solutions is itself a solution.
3 We
must differentiate between the natural, complex system and its description using modeling techniques from linear system theory or nonlinear mathematics.
cortical microcircuits or columns. As Maas and Markram [35] formulate, it is a “tempting hypothesis regarding the computational role of cortical microcircuits . . . that there exist genetically programmed stereotypical microcircuits that compute certain basis function.” Their paper well illustrates the modular approach fostered, e.g. by [12, 36, 38, 39]. Invoking the localization concept, the tenet is that there exist fundamental correspondences among the anatomical structure of neuronal networks, their functions, and the dynamic patterning of their active states. Starting point is the ‘uniform cortex’ with the cortical microcircuit or column as the structural component. The question for the functional component is answered by assuming that there is a one-to-one relationship between the structural and the functional component (see Sect. 10.3). Together, the modularity hypothesis of the brain is considered to be both structurally and functionally well justified. As quoted above, the goal is to examine the hypothesis that there exist genetically programmed stereotypical microcircuits that compute certain basis function.
10.5.2 Neocortical Microcircuits and Basis Functions
The general approach to cognitive capacities takes for granted that “cognition is computation”, i.e. the brain produces the cognitive capacities by computing functions.4
4 See [7] for discussion of the computational approaches (including the neurocomputational one) to brain function, and their shortcomings.
According to the scheme formulated in Sect. 10.3, reverse engineering of the cortex (or some subsystem), as based on the column concept and performed from a neurocomputational perspective, then proceeds as follows.
Reverse engineering the cortex
1. Capacity analysis: A specific cognitive capacity is identified which is assumed to be produced by the brain by computing a specific function.
2. Decompositional analysis:
a. Functional (computational) analysis: From mathematical analysis and approximation theory it is well known that a broad class of practically relevant functions f can be approximated by composition or superposition of some basis functions. If we assume that such basis functions can be identified, they provide the components of a hypothetical functional decomposition.
b. Structural analysis: Provide evidence that cortical microcircuits are the anatomical components of the cortex.
3. Localization: The next step consists in linking the component functions with the component parts by suggesting that the basis functions are computed by the structural components (columns or cortical microcircuits).
4. Synthesis/Superposition: The specific cognitive capacity or function under study can now be explained by combining the basis functions determined in step 2.a. The composition rules are implicitly contained in the interconnection pattern of the circuits, thus enabling the brain system under study to generate the specific cognitive capacity.
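To make step 2.a concrete, the snippet below approximates an arbitrary target function by a superposition of Gaussian radial basis functions and fits the composition weights by least squares. The target, the basis centers and the widths are illustrative choices; the example only shows the mathematical sense in which basis functions can be superposed, and does not bear on the critique that follows.

```python
import numpy as np

x = np.linspace(-3, 3, 200)
target = np.sin(2 * x) * np.exp(-0.2 * x**2)          # the function to be produced

centers = np.linspace(-3, 3, 15)                      # one basis function per 'unit'
width = 0.5
Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width**2))

coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)   # composition weights
approx = Phi @ coef
print("max approximation error:", float(np.abs(approx - target).max()))
```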
The question now is, however: are the assumptions and hypotheses made appropriate, or must they be considered too unrealistic? In fact, most of the underlying hypotheses have been challenged only recently. To start with the assumptions about the structural and functional composition of the cortex, the notion of a basic uniformity in the cortex with respect to the density and types of neurons per column for all species turned out to be untenable (e.g. [40–42]). It has been impossible to find the cortical microcircuit that computes a specific basis function [43]. No genetic mechanism has been deciphered that designates how to construct a column. It seems that the column structures encountered in many species (but not in all) represent spandrels (structures that arise non-adaptively, i.e. as an epiphenomenon) in various stages of evolution [44]. If we evaluate the column concept of the cortex employed in theories of brain organization, it is obvious that—employing the localization concept mentioned in Sect. 10.3—hypothesized structural components (cortical columns) have been identified with equally hypothetical functional components (basis functions). There is evidence, however, for a certain functional component to be implemented by spatially distributed networks and, vice versa, for a given structural component to implement more than one distinct function. In other words, it is not feasible for specific functions to be mapped to structurally bounded neuronal structures [25, 40–42]. This means that, although the column concept is an attractive idea from both a neurobiological and a computational point of view, it cannot be used as a unifying principle for understanding cortical function. Thus, it has been concluded that the concept of the cortex as a ‘large network of identical units’ should be replaced with the idea that the cortex consists of ‘large networks of diverse elements’ whose cellular and synaptic diversity is important for computation [45–47]. It is worth noticing that the reported claims for changes of the research concept remain completely within the framework of reverse engineering. A more fundamental point of criticism concerns the methods of decompositional analysis and reverse engineering themselves and will be discussed in the next section.
10.6 Complex Systems and Rosen's Modeling Relation

In Sect. 10.4, we concluded that integrated systems are basically non-decomposable, thus resisting the standard analysis method. We raised the question: should we care about integrated systems at all, given the fundamental assumption that all relevant systems are nearly decomposable? According to the prevalent viewpoint in CCN, non-decomposability is not an intrinsic property of complex, integrated systems but lies only in our eyes, due to insufficient mathematical techniques (e.g. [48–51]). Bechtel and Richardson, instead, warn that the assumption according to which nature is decomposable and hierarchical might be false [19, p. 27]: "There are clearly risks in assuming complex natural systems are hierarchical and decomposable."

Rosen [14, 15] has argued that understanding complex, integrated systems requires making the scientific analysis method itself a subject of discussion. His modeling relation provides a powerful method for understanding and exploring the nature of the scientific method, and of reverse engineering in particular. It is this relation by which scientists bring "entailment structures into congruence" [14, p. 152]. This can be explained as follows.

The modeling relation is the set of mappings shown in Fig. 10.3 [13, 52]. It relates two systems, a natural system N and a formal system F, by a set of arrows depicting processes or mappings. The assumption is that this diagram represents the various processes which we carry out when we perceive the world. N is a part of the physical world that we wish to understand (in our case: organism, brain), in which things happen according to rules of causality (arrow 1). On the right, F symbolically represents those parts of the natural system (observables) which we are interested in, along with formal rules of inference (arrow 3) that essentially constitute our working hypotheses about the way things work in N, i.e. the way in which we manipulate the formal system to try to mimic causal events observed or hypothesized in the natural system on the left. Arrow 2 represents the encoding of the parts of N under study into the formal system F, i.e. a mapping that establishes the correspondence between observables of N and symbols defined in F. Predictions about the behavior
Fig. 10.3 Rosen’s Modeling Relation. A natural system N is modeled by a formal system F . Each system has its own internal entailment structures (arrows 1 and 3), and the two systems are connected by the encoding and decoding processes (arrows 2 and 4). From http://www.panmere.com
in F, according to F's rules of inference, are compared to observables in N through a decoding represented by arrow 4. When the predictions match the observations on N, we say that F is a successful model for N. It is important to note that the encoding and decoding mappings are independent of the formal and natural systems, respectively. In other words, there is no way to arrive at them from within the formal system or the natural system. That is, the act of modeling is really the act of relating two systems in a subjective way. That relation is at the level of observables; specifically, observables which are selected by the modeler as worthy of study or interest.

Given the modeling relation and the detailed structural correspondence between our percepts and the formal systems into which we encode them, it is possible to make a dichotomous classification of systems into those that are simple or predicative and those that are complex or impredicative. This classification can refer to formal inferential systems such as mathematics or logic, as well as to physical systems. As Rosen showed [13], a simple system is one that is definable completely by algorithmic methods: all the models of such a system are Turing-computable or simulable. When a single dynamical description is capable of successfully modeling a system, then the behaviors of that system will, by definition, always be correctly predicted. Hence, such a system will be predicative in the sense that there will exist no unexpected or unanticipated behavior.

A complex system is, by exclusion, not a member of the syntactic, algorithmic class of systems. Its main characteristics are as follows. A complex system possesses non-computable models; it has inherent impredicative loops in it. This means that it requires multiple partial dynamical descriptions, no one of which, nor any combination of which, suffices to describe the system successfully. It is not a purely syntactic system; it necessarily includes semantic elements and is not formalizable. Complex systems also differ from simple ones in that complex systems are not simply summations of parts: they are non-decomposable. This means that when a complex system is decomposed, its essential nature is broken by breaking its impredicative loops. This has important effects. Decompositional analysis is inherently destructive to what makes the system complex: such a system cannot be decomposed without losing the essential nature of its complexity! In addition, by not being decomposable, complex systems no longer have analysis and synthesis as simple inverses of each other. Building a complex system is therefore not simply the inverse of any analytic process of decomposition into parts. In other words, reverse engineering the brain, a complex, integrated and thus non-decomposable system, must necessarily fail and will not provide the envisaged understanding!

It should be stressed that simple and complex systems according to Rosen's definition cannot be directly related to those sensu Simon (Sects. 10.2, 10.4). While Rosen's approach yields a descriptive definition of complexity, Simon's is interactional, see [53]. It seems clear, however, that Rosen's 'simple systems' comprise Simon's fully and nearly decomposable systems, and that Rosen's 'complex systems' correspond to Simon's non-decomposable, integrated systems. No matter which definition is applied, the conclusion about the brain's non-decomposability remains valid.
10.7 Conclusions

If one attempts to understand a complex system like the brain, it is of crucial importance whether general operating principles can be formulated. Traditionally, approaches to reveal such principles follow the line of decompositional analysis, as expressed in the fundamental assumption of cognitive and computational neuroscience that cognitive systems, like other truly complex systems, are decomposable. Correspondingly, reverse engineering has been considered the appropriate methodology to understand the brain and to engineer artificial cognitive systems. The claim was discussed that non-decomposability is not an intrinsic property of complex, integrated systems but lies only in our eyes, due to insufficient mathematical techniques. To assess this claim, the scientific analysis method itself was considered. Referring to results from mathematics and systems theory, I have presented arguments for the position that the dominant complexity concept of cognitive and computational neuroscience underlying reverse engineering needs revision. The updated, revised concept must comprise results from the nonlinear science of complexity, and insights expressed, e.g., in Rosen's work on life and cognition.

It was concluded that the decomposability assumption of cognitive science must be abandoned. Organisms and brains are complex, integrated systems which are non-decomposable. This insight implies that there is no 'natural' way to decompose the brain, neither structurally nor functionally. We must face the uncomfortable insight that in cognitive science and neuroscience we have to deal conceptually, theoretically, and empirically with complex, integrated systems, which is much more difficult than dealing with simple, decomposable systems of quasi-independent modules! Thus, we cannot avoid (at least in the long run) subjecting research goals such as the creation of 'brain-like intelligence' and the like to analyses which apprehend the very nature of natural complex systems.
of Goethe’s Verse by George Madison Priest
Who’ll know aught living and describe it well, Seeks first the spirit to expel. He then has the component parts in hand But lacks, alas! the spirit’s band. J.W. G OETHE, Faust, First Part
References 1. Biologically-Inspired Cognitive Architectures, Proposer Information Pamphlet (PIP) for Broad Agency Announcement 05-18. DARPA Information Processing Technology Office, Arlington, VA (2005) 2. Markram, H.: The blue brain project. Nat. Rev. Neurosci. 7, 153–160 (2006) 3. Albus, J.S., Bekey, G.A., Holland, J.H., Kanwisher, N.G., Krichmar, J.L., Mishkin, M., Modha, D.S., Raichle, M.E., Shepherd, G.M., Tononi, G.: A proposal for a decade of the mind initiative. Science 317, 1321 (2007) 4. Perry, W., Broers, A., El-Baz, F., Harris, W., Healy, B., Hillis, W.D., et al.: Grand challenges for engineering. National Academy of Engineering, Washington (2008) 5. Systems of Neuromorphic Adaptive Plastic Scalable Electronics (SyNAPSE). DARPA/IBM (2008) 6. European Commission, ICT Call 6 of the 7th Framework Programme, Objective 2.1: Cognitive Systems and Robotics (2009) 7. Schierwagen, A.: Brain organization and computation. In: Mira, J., Alvarez, J.R. (eds.) IWINAC 2007, Part I: Bio-inspired Modeling of Cognitive Tasks. LNCS, vol. 4527, pp. 31–40 (2007) 8. Ananthanarayanan, R., Esser, S.K., Simon, H.D., Modha, D.S.: The cat is out of the bag: cortical simulations with 10^9 neurons and 10^13 synapses. In: Proc. ACM/IEEE SC2009 Conference on High Performance Networking and Computing (Supercomputing 09), Nov. 14–20, 2009, Portland, OR (2009) 9. Brodkin, J.: IBM cat brain simulation dismissed as 'hoax' by rival scientist. Network World, November 24 (2009) 10. Dennett, D.C.: Cognitive science as reverse engineering: several meanings of 'top down' and 'bottom up'. In: Prawitz, D., Skyrms, B., Westerståhl, D. (eds.) Logic, Methodology and Philosophy of Science IX, pp. 679–689. Elsevier, Amsterdam (1994)
11. Marom, S., Meir, R., Braun, E., Gal, A., Kermany, E., Eytan, D.: On the precarious path of reverse neuroengineering. Front. Comput. Neurosci. 3 (2009). doi:10.3389/neuro.10.005 12. Gurney, K.: Reverse engineering the vertebrate brain: methodological principles for a biologically grounded programme of cognitive modelling. Cogn. Comput. 1, 29–41 (2009) 13. Rosen, R.: Anticipatory Systems: Philosophical, Mathematical and Methodological Foundations. Pergamon, Oxford (1985) 14. Rosen, R.: Life Itself: A Comprehensive Inquiry into the Nature, Origin, and Fabrication of Life. Columbia University Press, New York (1991) 15. Rosen, R.: Essays on Life Itself. Columbia University Press, New York (2000) 16. Keyser, S.J., Miller, G.A., Walker, E.: Cognitive Science in 1978. An unpublished report submitted to the Alfred P. Sloan Foundation, New York (1978) 17. Simon, H.: The Sciences of the Artificial. MIT Press, Cambridge (1969) 18. Wimsatt, W.: Forms of aggregativity. In: Donagan, A., Perovich, A.N., Wedin, M.V. (eds.) Human Nature and Natural Knowledge, pp. 259–291. D. Reidel, Dordrecht (1986) 19. Bechtel, W., Richardson, R.C.: Discovering complexity: Decomposition and localization as strategies in scientific research. Princeton University Press, Princeton (1993) 20. Cummins, R.: The Nature of Psychological Explanation. MIT Press, Cambridge (1983) 21. Cummins, R.: “How does it work” versus “What are the laws?”: two conceptions of psychological explanation. In: Keil, F., Wilson, R.A. (eds.) Explanation and Cognition, pp. 117–145. MIT Press, Cambridge (2000) 22. Atkinson, A.P.: Persons systems and subsystems: the explanatory scope of cognitive psychology. Acta Anal. 20, 43–60 (1998) 23. Rosen, R.: The mind-brain problem and the physics of reductionism. In: Rosen, R. (ed.) Life Itself: A Comprehensive Inquiry into the Nature, Origin, and Fabrication of Life, pp. 126–140. Columbia University Press, New York (1991) 24. Dennett, D.C.: Consciousness Explained. Brown, Boston (1991) 25. Price, C.J., Friston, K.J.: Functional ontologies for cognition: the systematic definition of structure and function. Cogn. Neuropsychol. 22, 262–275 (2005) 26. Uttal, W.R.: The New Phrenology. The Limits of Localizing Cognitive Processes in the Brain. MIT Press, Cambridge (2001) 27. Henson, R.: What can functional neuroimaging tell the experimental psychologist? Q. J. Exp. Psychol. A 58, 193– 233 (2005) 28. Ross, E.D.: Cerebral localization of functions and the neurology of language: fact versus fiction or is it something else? Neuroscientist 16, 222–243 (2010) 29. Schierwagen, A.: Real neurons and their circuitry: Implications for brain theory. iir–reporte, pp. 17–20. Akademie der Wissenschaften der DDR, Institut für Informatik und Rechentechnik, Eberswalde (1989) 30. Forrest, S.: Emergent computation: self-organizing, collective, and cooperative phenomena in natural and artificial computing networks. Physica D 42, 1–11 (1990) 31. Levins, R.: Complex Systems. In: Waddington, C.H. (ed.) Towards a Theoretical Biology, vol. 3, pp. 73–88. University of Edinburgh Press, Edinburgh (1970) 32. Hubel, D.H., Wiesel, T.N.: Shape and arrangement of columns in cat’s striate cortex. J. Physiol. 165, 559–568 (1963) 33. Mountcastle, V.B.: The columnar organization of the neocortex. Brain 120, 701–722 (1997) 34. Szenthágothai, J.: The modular architectonic principle of neural centers. Rev. Physiol., Biochem. Pharmacol. 98, 11–61 (1983) 35. Maass, W., Markram, H.: Theory of the computational function of microcircuit dynamics. 
In: Grillner, S., Graybiel, A.M. (eds.) The Interface between Neurons and Global Brain Function, Dahlem Workshop Report 93, pp. 371–390. MIT Press, Cambridge (2006) 36. Arbib, M., Érdi, P., Szenthágothai, J.: Neural Organization: Structure, Function and Dynamics. MIT Press, Cambridge (1997) 37. Rockel, A.J., Hiorns, R.W., Powell, T.P.S.: The basic uniformity in structure of the neocortex. Brain 103, 221–244 (1980) 38. Bressler, S.L., Tognoli, E.: Operational principles of neurocognitive networks. Int. J. Psychophysiol. 60, 139–148 (2006) 39. Grillner, S., Markram, H., De Schutter, E., Silberberg, G., LeBeau, F.E.N.: Microcircuits in action from CPGs to neocortex. Trends Neurosci. 28, 525–533 (2005) 40. Horton, J.C., Adams, D.L.: The cortical column: a structure without a function. Philos. Trans. R. Soc. Lond. B, Biol. Sci. 360, 386–462 (2005) 41. Rakic, P.: Confusing cortical columns. Proc. Natl. Acad. Sci. USA 105, 12099–12100 (2008) 42. Herculano-Houzel, S., Collins, C.E., Wang, P., Kaas, J.: The basic nonuniformity of the cerebral cortex. Proc. Natl. Acad. Sci. USA 105, 12593–12598 (2008)
43. de Garis, H., Shuo, C., Goertzel, B., Ruiting, L.: A world survey of artificial brain projects Part I: Large-scale brain simulations. Neurocomputing. (2010). doi:10.1016/j.neucom.2010.08.004 44. Gould, S.J., Lewontin, R.C.: The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proc. R. Soc. Lond. B, Biol. Sci. 205, 581–598 (1979) 45. Destexhe, A., Marder, E.: Plasticity in single neuron and circuit computations. Nature 431, 789–795 (2004) 46. Frégnac, Y., et al.: Ups and downs in the genesis of cortical computation. In: Grillner, S., Graybiel, A.M. (eds.) Microcircuits: The Interface between Neurons and Global Brain Function, Dahlem Workshop Report 93, pp. 397– 437. MIT Press, Cambridge (2006) 47. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev., Neurosci. 10, 186–198 (2009) 48. Bechtel, W.: Dynamics and decomposition: are they compatible? In: Proceedings of the Australasian Cognitive Science Society (1997) 49. Bechtel, W.: Decomposing the brain: A long term pursuit. Brain and Mind 3, 229–242 (2002) 50. Poirier, P.: Be there, or be square! On the importance of being there. Semiotica 130, 151–176 (2000) 51. van Vreeswijk, C.: What is the neural code. In: van Hemmen, J.L., Sejnowski, T. Jr. (eds.): 23 Problems in System Neuroscience, pp. 143–159. Oxford University Press, Oxford (2006) 52. Mikulecky, D.C.: Robert Rosen: the well posed question and its answer—why are organisms different from machines? Syst. Res. 17, 419–432 (2000) 53. Wimsatt, W.C.: Complexity and organization. Proc. Biennial Meet. Philos. Sci. Ass. 1972, 67–86 (1972)
Chapter 11
Competition in High Dimensional Spaces Using a Sparse Approximation of Neural Fields
Jean-Charles Quinton, Bernard Girau, and Mathieu Lefort
J.-C. Quinton, INRIA/LORIA Laboratory, Campus Scientifique, B.P. 239, 54506 Vandoeuvre-lès-Nancy Cedex, France
Abstract The Continuum Neural Field Theory implements competition within topologically organized neural networks with lateral inhibitory connections. However, due to the polynomial complexity of matrix-based implementations, updating dense representations of the activity becomes computationally intractable when an adaptive resolution or an arbitrary number of input dimensions is required. This paper proposes an alternative to self-organizing maps, namely a sparse implementation based on Gaussian mixture models, which trades redundancy for higher computational efficiency and alleviates constraints on the underlying substrate. This version reproduces the emergent attentional properties of the original equations by directly applying them within a continuous approximation of a high dimensional neural field. The model is compatible with preprocessed sensory flows but can also be interfaced with artificial systems. This is particularly important for sensorimotor systems, where decisions and motor actions must be taken and updated in real-time. Preliminary tests are performed on a reactive color tracking application, using spatially distributed color features.
11.1 Introduction

Most biological systems need to differentiate their environment through interaction in order to undertake adapted behaviors. Adaptation here means that this selection or regulation of behavior must be normative for the living agent to maintain its organization [3, 5]. At any time, the agent must therefore choose, from a set of potentialities, the ones that will not be detrimental to its survival. From a more continuous perspective, it must influence the dynamics of its coupling with the environment so as to bias bifurcations and sustain viable conditions for its ongoing activity. With the decentralized approach adopted in this paper, the many processes from which the behavior emerges must therefore compete through reciprocal excitations and inhibitions. Whether in an overt way, with actions guiding the agent towards meaningful situations and stimuli, or in a covert manner, with attention focusing on specific perceptual features, decisions must be taken so as to direct the sensorimotor flow.
11.1.1 Competition in Neural Fields

The model presented in this paper is derived from research done in the field of computational neuroscience. Considering competition within the brain and interactions between assemblies of neurons, we
adopt a distributed approach to cognition and focus on models of the cerebral cortex at a mesoscopic scale. We further commit to a biologically plausible though debated decomposition of the cortex sheet into cortical maps, themselves made of cortical columns [8]. Such considerations are grounded on the laminar structure of the cortex conserved throughout species evolution, on the correlated activities found across the cortical layers [22] and on the apparent topological organization at different scales deduced from cytoarchitectural and functional differences [7, 16]. The generic nature and topology of the cortical circuitry are reflected in dynamic neural field models (DNF) where the evolution of the membrane potential of neurons is described by differential equations. We here focus on a particular kind of DNF implementing dynamic competition, namely the Continuum Neural Field Theory (CNFT) [2]. The lateral connectivity pattern in the CNFT follows a difference of Gaussians profile (DoG), most often further constrained to a Mexican hat profile with local excitation and large-scale inhibition. Under adequate conditions, such neural networks are able to maintain so-called bubbles of activity [2, 29, 33]. These bubbles in fact correspond to spatiotemporally coherent patches of activity that emerge on the neural field, in response to external stimulation. The CNFT leads to robust attentional properties, bubbles tracking and focusing on stimuli despite the presence of noise or distracters [28]. Additionally, the CNFT allows non-linear bifurcations to occur when similar stimuli are presented and has the ability to switch between targets if the bottom-up stimulation disappears or if top-down modulation biases the field activity. To put it briefly, the CNFT implements all the desired properties we may expect from a dynamical competition model. However, while mathematical analysis was limited to a maximum of two dimensions [29] and most simulations used a discretized version of the continuous equations [10, 28], there is no theoretical barrier that prevents applying the CNFT equation to high dimensional continuous inputs. The idea of the current implementation is therefore to alleviate the constraints imposed by the 2D structure of the physical substrate and simultaneously reduce the computational complexity of the algorithms as to scale the model up. For this purpose, we approximate the overall field activity by Gaussian mixture models instead of considering individual units.
11.1.2 Sparse Modeling Initially, DNFs were continuous models of the neuronal activity over a manifold, therefore disconnected from any particular implementation. Although matrix-like regular meshes of the manifold are highly compliant with modern computer architectures, they do not basically allow the manipulation of unbounded and continuous coordinates in a variable or multi-resolution perspective (see Fig. 11.1). Spiking neuron based models, such as the Leaky Integrate-and-Fire (LIF)-CNFT model presented in [10], would largely benefit from the use of sparse matrices [30], but this optimization would only address the computational problem. Determining an adequate resolution a priori is indeed a concern, as on one hand computing on a coarse resolution discrete neural map introduces artifacts and strong discontinuities, and on the other hand complexity is a polynomial function of the resolution. A biologically inspired solution resides in self-organizing maps (SOM) [19] or its extensions for continuous learning such as Growing-SOM (GSOM) [1] or Dynamic-SOM [27]. These permit cortical magnification phenomena to take place and introduce a variable resolution across an otherwise fixed topology. Nevertheless, discontinuities still occur and other problems arise when projecting a high dimensional space onto a 2D manifold [21]. Whereas discontinuities are indeed found in the cortical organization (for instance in the visual cortex [16]), the model presented here adopts a more artificial yet direct approach by implementing the CNFT in a high dimensional space. It thereby avoids the need for asymmetric lateral connectivity required to establish a high dimension topology on the 2D cortical sheet (such asymmetries are found between nearby hypercolumns in the case of the primary visual cortex [6]).
Fig. 11.1 Approximation of a 1D continuous function by dense and sparse representations. The dense representation (b) approximates the continuous activity (a) by using a fixed resolution mesh with n elements taking real values; their number does not depend on the complexity of the curve. On the contrary, the sparse representation (c) gives an estimate of the activity as the sum of a reduced but varying number of components (here Gaussian), each of which needs several parameters to be defined: g_k = (x_k, I_k)
Instead of being bound to matrices, researchers have also explored variations of the basic square grid mesh of the neural maps, starting with other Bravais lattices. Moreover, because neural maps may not be studied in isolation but as forming high level networks, alternative shapes such as disklike maps may be considered to increase the symmetry of the system [21]. Furthermore, to study the robustness of the network dynamics in the presence of artificial lesions, or simply because biological systems are inherently variable at all levels, units should be allowed to take arbitrary locations on the maps. As the neural map topology is the key to distance based interactions, the sparse model presented in this paper can also simulate the dynamics of some Gas-nets models [17]. Perception systems may also benefit from only taking into account a few units at any time, for instance those above a given threshold with rate-coding or simply those firing with spike-coding models (even when sticking to time-driven models). Neglecting the influence of weakly activated units in the current context would drastically reduce the amount of computations to perform. Interfacing such competition mechanisms with artificial sensorimotor systems gives another argument for trying to use compact representations. Robots performing complex behaviors often use a closed loop to control their actions and get feedback from their environment, and most sensory inputs and motor commands take a simple real value (joint angles for example). As cortical maps use population coding, these single values need to be projected through diffusion processes and receptive fields. Computations then occur on these dense representations, but in the end, an inverse conversion is nevertheless required to synthesize the maps activity into exact commands to be sent to effectors. Abstracting from the substrate and directly approximating the field activity avoids such conversions, by considering the global influence of inputs signals on the activity dynamics. Using matrical representations, the CNFT differential equation gets easily translated into standard operations. Alas, the required convolutions result in a polynomial complexity for updating the neural fields, which further prevents a direct high dimensional implementation. Even with optimization techniques such as using a singular value decomposition (SVD) of the kernel, the polynomial power still increases with the dimensionality [20]. This is a consequence of the exponential growth in the number of connections with neighboring units, illustrating the “curse of dimensionality” [4]. Finally, when implementing such competition algorithms on parallel hardware like field-programmable gate arrays (FPGA), the benefits gained from simultaneously updating several units are attenuated by the routing constraints and additional circuitry required by the increased volume of the convolution kernel [31].
11.2 Model

This paper introduces a novel implementation of the CNFT, focusing on the increased expressiveness the model may convey for high dimensional data. It will be briefly described and analyzed in this section, as its properties derive from mathematical considerations. In this paper, we will adopt and extend the notations introduced by Amari [2]. At each time step, a focus neural field activity u is updated by the CNFT, integrating stimulations from an input neural field s. The field is represented by a manifold M in bijection with [−0.5, 0.5]^d, where d ∈ [1..+∞[ is the finite dimension of the manifold. Periodic boundary conditions are used to avoid numerical issues and dissymmetry near the edges, thus introducing a toric topology. The membrane potential at the position vector x and time t on this field is defined by u(x, t) and is maintained in [0, 1]. Similar notations are used for the input stimulation s(x, t). The dynamics of the membrane potential is then described by:

$$\tau \frac{\partial u(x,t)}{\partial t} = -u(x,t) + c(x,t) + s(x,t) + h \qquad (11.1)$$

where h is the resting potential and c the potential over an intermediate neural field, artificially introduced to simplify the explanations and defined by:

$$c(x,t) = \int_{x' \in M} w(x, x')\, u(x', t)\, dx' \qquad (11.2)$$

where w(x, x') is the lateral connection weight function satisfying (11.3):

$$w(x, x') = A\, e^{-\frac{|x - x'|^2}{a^2}} - B\, e^{-\frac{|x - x'|^2}{b^2}} \qquad (11.3)$$
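For concreteness, a minimal dense 2D discretization of (11.1)–(11.3) might look like the sketch below, using FFT-based circular convolution for the toric lateral interaction. The grid size, the time step and the parameter values (A, a, B, b, τ, h) are illustrative assumptions, not the reference implementation used by the authors.

```python
import numpy as np

# Illustrative parameters (not taken from the chapter)
n, dt, tau, h = 64, 0.1, 1.0, 0.0
A, a, B, b = 1.0, 0.05, 0.6, 0.30

# Toric displacement along one axis, so that w[0, 0] is the zero-displacement weight
disp = np.minimum(np.arange(n), n - np.arange(n)) / n
DX, DY = np.meshgrid(disp, disp, indexing="ij")
d2 = DX**2 + DY**2
w = A * np.exp(-d2 / a**2) - B * np.exp(-d2 / b**2)          # DoG kernel (11.3)
w_fft = np.fft.fft2(w)

def cnft_step(u, s):
    """One Euler step of (11.1); c is the circular convolution (11.2),
    approximated by a Riemann sum over the n x n grid (cell area 1/n^2)."""
    c = np.real(np.fft.ifft2(np.fft.fft2(u) * w_fft)) / n**2
    u = u + (dt / tau) * (-u + c + s + h)
    return np.clip(u, 0.0, 1.0)                               # potential kept in [0, 1]

# Example: a single Gaussian stimulus centered on the field
coords = np.linspace(-0.5, 0.5, n, endpoint=False)
X, Y = np.meshgrid(coords, coords, indexing="ij")
s = np.exp(-(X**2 + Y**2) / 0.05**2)
u = np.zeros((n, n))
for _ in range(100):
    u = cnft_step(u, s)
print("peak of the emerging bubble:", round(float(u.max()), 3))
```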
The realization that motivated the rough approximation of the neural field activity by a reduced number of Gaussian components comes from the observation of the CNFT dynamics. For experimental conditions where competition is effective (and it has to be for interactive systems), the focus map rapidly converges to a—potentially empty—set of distant bubbles. When global inhibition is considered (b > |M| > a and A > B), general convergence results and experimental studies have shown that a single bubble may emerge and track the associated stimulus, which is the case in the experiments presented in this paper [29]. The second mathematical result used is that arbitrary signals can be approximated by a sum of Gaussians, their number being positively correlated with the precision required [15]. Even though we do not focus on optimization procedures in this publication, efficient decomposition algorithms have been proposed when input signals are not suitable for direct manipulation [11]. Receptive fields have moreover been found to be approximately separable into a sum of amplitude modulated Gaussian components, for instance in the visual system [26, 32]. Chaotic dynamics may be argued to be the norm in natural cognitive systems at the microscopic scale, but population coding averages the variability of single neuron spike trains [18] and receptive fields further smooth out the activity.
11.2.1 Gaussian Mixture Based Three-Step Procedure

From now on, neural fields will be denoted as U (focus field produced by the CNFT), C (intermediate competition field) and S (input stimulation field), respectively associated with u, c and s of (11.1). A generic field G (either U, C or S) will be defined as a mixture of components {g_k} (respectively {u_k}, {c_k} and {s_k}). Determined by the set of parameters (x_k, I_k), g_k denotes a Gaussian function of amplitude I_k centered on x_k.
Fig. 11.2 Algorithmic decomposition of the sparse implementation. ① A competition field C is produced by propagating activity from {u_1^t, u_2^t} to {u_1^t, u_2^t, s_1^t, s_2^t}. ② Components from the focus field {u_1^t, u_2^t}, input field {s_1^t, s_2^t} and competition field {c_1, c_2, c_3} are integrated. ③ Close components are merged, and resulting components with negative intensity removed (s_2^t ∪ c_3 and u_2^t ∪ c_2). Only one component remains (u_1^{t+dt} = u_1^t ∪ s_1^t ∪ c_1), reflecting the convergence towards a single bubble of activity
Let g_k(x) be the activity propagated by the Gaussian component g_k at the point x, satisfying:

$$g_k(x) = I_k\, e^{-\frac{|x_k - x|^2}{\sigma^2}} \qquad (11.4)$$
For the focus field U, σ is fixed to a value between the excitatory and inhibitory standard deviations a and b of (11.3), so as to synthesize each stereotyped bubble by a single component. The potential at any point of a field G can then be computed as follows:

$$g(x, t) = \sum_k g_k^t(x) \qquad (11.5)$$
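Under this representation, a field is simply a list of (center, intensity) pairs and (11.4)–(11.5) translate directly. The class and function names below are illustrative, and the toric distance assumes the [−0.5, 0.5]^d manifold described above.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Component:
    x: np.ndarray   # center x_k on the manifold [-0.5, 0.5]^d
    I: float        # intensity I_k

def toric_dist2(p, q):
    # Squared distance with periodic boundary conditions along each dimension
    d = np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float))
    d = np.minimum(d, 1.0 - d)
    return float(np.dot(d, d))

def activity(components, x, sigma):
    """Potential g(x, t) as the sum of the Gaussian components, (11.4)-(11.5)."""
    return sum(c.I * np.exp(-toric_dist2(c.x, x) / sigma**2) for c in components)
```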
As we will now exclusively manipulate Gaussian components, and not directly the activity at any given point on the field, (11.1) must be translated into a three-step procedure (see Fig. 11.2). First, the competition step consists in generating the necessary components of C, i.e. wherever the focus components {u_k} would have an effect, whether on each other or on the stimulations {s_k}, through the lateral connectivity. This is a sort of sparse convolution, and therefore, for each g_k = (x_k, I_k) ∈ U ∪ S considered, an inhibitory component c_k of parameters (x_k, I_k^c) is produced. The value of I_k^c is determined by:

$$I_k^c = \frac{1}{n} \sum_{i=1}^{n} w(x_k, x_i)\, I_i \qquad (11.6)$$

where n is the number of components on the focus field. The lateral competition weight function w has already been introduced in (11.3).
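A sketch of this competition step, reusing the Component class and the toric_dist2 helper from the previous sketch; the DoG parameters passed to dog_weight are illustrative values, not taken from the chapter.

```python
def dog_weight(p, q, A=1.0, a=0.1, B=0.5, b=0.5):
    # Lateral connection weight function (11.3), evaluated on toric distances
    d2 = toric_dist2(p, q)
    return A * np.exp(-d2 / a**2) - B * np.exp(-d2 / b**2)

def competition(focus, inputs):
    """For every component of U ∪ S, produce a component c_k located at x_k whose
    intensity is the mean lateral influence (11.6) received from the n focus components."""
    n = len(focus)
    comp = []
    for g in focus + inputs:
        Ic = sum(dog_weight(g.x, u.x) * u.I for u in focus) / n if n else 0.0
        comp.append(Component(x=np.array(g.x, dtype=float), I=Ic))
    return comp
```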
Secondly, the focus field U^t generated at the last timestep, the inner field C resulting from the lateral competition and the current input field S^t must be combined and integrated over time to reproduce the dynamics of (11.1). This integration step, producing the new focus field U^{t+dt}, is described by the
following equation:

$$U^{t+dt} = U^t \cup \frac{dt}{\tau}\left(-U^t \cup C \cup S^t + h\right) \qquad (11.7)$$
where the ∪ operator applied to fields actually corresponds to the union of the component sets, which is equivalent to adding the contributions of the various components as reflected by (11.5). The scalar multiplication (by dt/τ) and the addition (of h) are directly applied to the intensities of the Gaussian components.
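Read this way, the integration step (11.7) can be sketched as below, with the same Component representation as before. How exactly the −U^t term and the resting potential h are distributed over individual components is an implementation choice, so this is only one plausible reading of the equation, not the authors' exact code.

```python
def integrate(focus, comp, inputs, dt=0.1, tau=1.0, h=0.0):
    """One possible reading of (11.7): keep U^t, and add the set -U^t ∪ C ∪ S^t
    with intensities scaled by dt/τ and shifted by the resting potential h."""
    new = [Component(x=np.array(u.x, dtype=float), I=u.I) for u in focus]          # U^t
    negated = [Component(x=np.array(u.x, dtype=float), I=-u.I) for u in focus]     # -U^t
    for g in negated + comp + inputs:                                              # -U^t ∪ C ∪ S^t
        new.append(Component(x=np.array(g.x, dtype=float), I=(dt / tau) * (g.I + h)))
    return new
```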
Finally, a merging step is performed to avoid combinatorial explosion. All operations so far increase the number of components, either because of the constant flow of new input stimulations at arbitrary locations, or because of the pairwise competition components generated. Close components need to be merged not only to account for the reinforcement of bubbles iteration after iteration, but also to easily detect and eliminate negative activity components where inhibition is dominant. This guarantees that a bounded number of components with positive activity will remain on the focus field, the positivity of the field having already been discussed in reference implementations [28]. Several merging algorithms have been proposed for Gaussian components in other contexts, but they often require an exact knowledge of the underlying distribution of point-like stimulations, which is missing here [34]. A simple Euclidean distance based criterion is thus used, since the iterative nature of the CNFT can compensate for instantaneous small errors. The threshold for the merging is chosen to match the excitatory standard deviation a, so as to facilitate the emergence of stereotyped bubbles of activity (see Algorithm 11.1).

Algorithm 11.1: Merging algorithm for the Gaussian components on the focus field
1.  % Find close Gaussian pairs
2.  P ← ∅
3.  for all (u_i, u_j) ∈ U^2
4.      if |x_i − x_j| < a
5.          P ← P ∪ (u_i, u_j)
6.      end
7.  end
8.  % Iterate the merging on pairs
9.  while P ≠ ∅
10.     u_new ← merge(u_i, u_j)
11.     P ← P \ {pairs with u_i or u_j}
12.     U ← U ∪ u_new \ {u_i, u_j}
13.     for all u_i ∈ U
14.         if |x_i − x_new| < a
15.             P ← P ∪ (u_i, u_new)
16.         end
17.     end
18. end
In practice P is kept sorted, so as to easily select and always merge the closest components when choosing (u_i, u_j) in the algorithm.

$$x_{new} = \frac{I_i}{I_i + I_j}\, x_i + \frac{I_j}{I_i + I_j}\, x_j \qquad I_{new} = I_i + I_j - I_i \times I_j \times \frac{|x_i - x_j|^2}{\alpha^2} \qquad (11.8)$$
Equation (11.8) provides the exact computations performed to determine the new parameters as a function of the two components to be merged. α is a constant higher than a, ensuring a smooth transition between the aligned and separated cases. For more details on the different steps or to better understand the subtleties of the transformed equations, please refer to [25].
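A hedged Python translation of Algorithm 11.1 and of (11.8): for brevity the sketch greedily merges the closest pair within distance a instead of maintaining a sorted pair list, and it ignores toric wrap-around when averaging positions. Function names are illustrative; the final filter on negative intensities follows the description above.

```python
def merge_pair(ui, uj, alpha):
    """Merge two components according to (11.8); assumes I_i + I_j != 0."""
    Isum = ui.I + uj.I
    x_new = (ui.I * ui.x + uj.I * uj.x) / Isum          # intensity-weighted center
    I_new = Isum - ui.I * uj.I * toric_dist2(ui.x, uj.x) / alpha**2
    return Component(x=x_new, I=I_new)

def merge_field(U, a, alpha):
    """Greedy variant of Algorithm 11.1: repeatedly merge the closest pair lying
    closer than a, then discard components whose intensity became negative."""
    U = list(U)
    while True:
        pairs = [(toric_dist2(U[i].x, U[j].x), i, j)
                 for i in range(len(U)) for j in range(i + 1, len(U))]
        pairs = [p for p in pairs if p[0] < a**2]
        if not pairs:
            break
        _, i, j = min(pairs)
        merged = merge_pair(U[i], U[j], alpha)
        U = [u for k, u in enumerate(U) if k not in (i, j)] + [merged]
    return [u for u in U if u.I > 0]
```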
Fig. 11.3 Inhibition of weak distant components (left) and merging with close components (right). The convolution of the focus components (blue curve) with the CNFT weight function (red) inhibits or reinforces the input component (plain black) based on distance. This leads either to the removal of the input component from the focus field, or its merging with the other components, thus resulting in a slight drift of the focus bubble
11.2.2 Complexity and Convergence Analysis

Contrary to the matrix implementation, the complexity of the sparse implementation no longer fundamentally depends on the number of dimensions, hence its performance for the direct manipulation of high dimensional input spaces. Let d be the number of dimensions and n the resolution along each dimension (supposed to be the same for all dimensions for simplicity). The convolution of a map of size n^d with a kernel of identical size, involved in the matrix based computations, determines the overall complexity in O(n^{2d}) using Landau notation. If we suppose that the weights are fixed, a higher-order singular value decomposition (HOSVD) of the kernel can be done at initialization [20], and the high-dimensional convolution is reduced to d linear convolutions to perform with the singular vectors, thus dropping the complexity to O(n^{d+1}), independently of the input dynamics or parameters of the model. By additionally transposing the computations to the frequency domain using a fast Fourier transform (FFT), the circular convolutions are further reduced to pointwise multiplications, but the complexity remains a monotonic function of n and d.

On the contrary, the number of components remaining at the end of the Gaussian mixture based procedure highly depends on the coherence of the stimulations. If the CNFT cannot find any structure in the input, for instance in the presence of pure noise, the lateral inhibition will not be sufficient to eliminate low activity components. In this case, and with n this time equal to 1/a so as to correspond to non-overlapping Gaussians, a maximum of O(n^d) elements might appear on the focus field, as additional components would be merged by virtue of the threshold in Algorithm 11.1. If we suppose that the number of input components is similarly bounded, we obtain a worst-case complexity of O(n^{2d}) for the competition step, which dominates asymptotically. Even though n in the sparse implementation can take much lower values than its equivalent for dense matrices, this remains prohibitive, as the atomic operations considered are also much heavier. However, the cost of updating the focus field components can drastically decrease when considering realistic inputs for which competition is indeed useful and effective.

At the other end of the complexity spectrum, a unique static and stationary stimulus will generate a component that will get maximally reinforced at each timestep and will rapidly inhibit all other components on the field (see Fig. 11.3). The minimal complexity is thus linear in the number of stimulation components, which might be very low for artificial systems.

In practice, the computational cost of the three-step procedure is of course variable and initially chaotic, but as soon as convergence towards a stimulus occurs, it induces a non-linear transition in
the number of components, allowing only a few distant elements to maintain their activity over time. This should not come as a surprise, as this observed behavior of the CNFT was the main reason for developing a Gaussian mixture based implementation. In more realistic scenarios than the idealistic case analyzed above, relaxed constraints on the spatiotemporal characteristics of the stimuli lead to local instabilities on the focus field. Whereas perfectly aligned components will simply see their activities summed, the linear weighting applied to the location vectors of the input and focus components in (11.7) (respectively dt/τ and 1 − dt/τ) generally leads to a drift of the bubble in the direction of the stimulus movement (see Fig. 11.3). For a low value of dt relative to the stimulus speed, successive components should remain close enough for the merging and the consequent drift to occur. Although the dynamics is purely reactive and the focus is thus necessarily lagging behind the stimulation, it hence robustly tracks the coherent set of stimulations it has focused upon as long as their movement is not too fast relative to the integration constant τ. The next section compiles other previously obtained experimental results and introduces a dedicated toy application for high dimensional tracking.
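The separable-kernel argument invoked for the dense implementation at the beginning of this subsection can be checked numerically: a 2D difference-of-Gaussians kernel has rank 2, so its SVD reduces the full 2D convolution to two pairs of 1D convolutions. The kernel size, the σ values and the field size below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

# Build a small DoG kernel (rank 2 by construction, since each Gaussian is separable)
m = 65
x = np.linspace(-1.0, 1.0, m)
X, Y = np.meshgrid(x, x, indexing="ij")
dog = np.exp(-(X**2 + Y**2) / 0.1**2) - 0.5 * np.exp(-(X**2 + Y**2) / 0.4**2)

U, S, Vt = np.linalg.svd(dog)
rank = int(np.sum(S > 1e-10 * S[0]))
print("numerical rank of the DoG kernel:", rank)   # expected: 2

field = np.random.rand(128, 128)
full = convolve(field, dog, mode="wrap")           # direct 2D circular convolution

# Rank-r separable reconstruction: r pairs of 1D convolutions instead of one 2D pass
sep = np.zeros_like(field)
for k in range(rank):
    col = U[:, k] * np.sqrt(S[k])
    row = Vt[k, :] * np.sqrt(S[k])
    tmp = convolve1d(field, col, axis=0, mode="wrap")
    sep += convolve1d(tmp, row, axis=1, mode="wrap")

print("max abs difference between the two results:", np.max(np.abs(full - sep)))
```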
11.3 Evaluation

In order to evaluate the competition mechanism realized by all versions of the CNFT, artificial input dynamics are provided to the system. These scenarios make it possible to test the ability to select and track what the designer considers as valid stimulations (generally those which initially have the highest spatiotemporal consistency). Various focus field characteristics are automatically computed and integrated over time, including a tracking error defined as the distance between the focus bubbles and the associated input stimuli. As we here only consider global inhibition, and therefore the emergence of a unique bubble, a single locus synthesizing the entire focus field activity is used, whether for undertaking actions or for statistical analysis. The barycenter c of the components remaining after the update procedure (those that could not be merged together) can be rapidly computed by applying (11.9). Computing this kind of center of mass gives a good indication of the ability of CNFT models to converge on a single localized bubble.

$$c = \frac{\sum_k I_k\, x_k}{\sum_k I_k} \qquad (11.9)$$
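As a small sketch, (11.9) for the component representation used earlier (assumes a non-empty list with positive total intensity, and ignores toric wrap-around):

```python
def barycenter(components):
    """Intensity-weighted center of mass (11.9) of the remaining focus components."""
    total = sum(c.I for c in components)
    return sum(c.I * c.x for c in components) / total
```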
The tracking capabilities of the Gaussian mixture based model have already been compared to previous implementations in 2D [25]. As optimal parameters for the CNFT equation depend on the task to perform, on the dynamics of the inputs and on the exact numerical model implemented, genetic algorithms have been used to find the optimal parameters in each case and compare the implementations on a fair basis [23]. The model has been shown to reproduce the following main emergent properties, which will be extended to high dimensional inputs: 1. Tracking of a spatiotemporally coherent moving stimulus whatever its trajectory under a maximal speed fixed by the CNFT equation parameters. 2. Non linear bifurcation when distant but similar stimuli are presented, leading to the selection of one stimulus (phase transition in the distributed dynamics). 3. Robust tracking despite the presence of noise (up to 100% additive Gaussian white noise to signal ratio) and distracters (as long as they do not overlap with the tracked stimulus, in which case the CNFT follows the most stable inputs).
Fig. 11.4 Sparsification of a 2D color image into a set of Gaussian components. Once the objects have been projected on a 2D plane, the image is segmented and synthesized by a set of components (black dots) in the 3D feature space defined by the dimensions (x, y, h). The size of the 3D dots reflects the intensity of the components (saturation). The background activity and the possible thresholding of low activity components are not illustrated on this figure
11.3.1 3D Tracking Application

Inputs can be considered as an arbitrary set of feature vectors, and the sparse version of the CNFT can be said to detect and track coherent clusters in the feature space. Information theorists often describe the processing within the primary visual cortex as filters extracting information from the optical flow. Even when based on bio-inspired computations, filters used in computer vision are often combined using artificial techniques (segmentation for instance in [14]), which could be replaced by distributed competition algorithms. Although the sparse implementation is made to scale up to many dimensions, we will limit ourselves to three so as to facilitate the visualization of the maps.
11.3.1.1 Input Structure and Dynamics

To show the generality of the approach, color is used to generate a third dimension (see Fig. 11.4). Using a flow of 3D points representing a 3D scene would of course have been possible in a computer generated simulation, but neither living nor artificial systems ever get instantaneous access to such information through their sensors. Methods have been developed to reconstruct the full structure of the environment, but they are generally neither bio-inspired nor efficient on unconstrained inputs. When projecting a 3D environment onto a 2D sensor, occlusions and many different kinds of ambiguities can indeed occur, reflecting a loss of informational content.

The environment is here represented as a set of layers, each composed of simple colored objects. These objects can move, change color, transparency level or shape, with occlusions additionally occurring between layers. These layers are then merged into a single colored dense map, where an HSV decomposition of the colors is performed to obtain a toric hue dimension and to let saturation directly act as a meaningful intensity component, contrary to what can be done with RGB. The CIELAB coding scheme provides a more human-inspired and uniform representation of color [12], but would have made the point less clear by lengthening the descriptions and making the preprocessing more complex. While the color saturation s acts as the stimulation intensity, the hue h is thus added as a third toric dimension to the 2D position (x, y) of the original CNFT.

To test the behavior of the sparse CNFT within a 3D feature space, a set of input scenarios is introduced. Although the results presented in this paper instantiate bell-shaped stimuli in the spatial
Fig. 11.5 Illustration of scenarios A to E (refer to the main text for their description). Each scenario is represented by one column of 2 input colored images. The first snapshot always corresponds to t = 0, where the algorithm gets some time to converge on the stimulus (except for scenario A where 2 stimuli are directly presented). The second snapshot is associated to a later time, chosen to be representative of the input dynamics
domain as defined by (11.10), other shapes and color gradients have been tested and lead to qualitatively similar results. When not specified, the hue of the stimuli should not make a difference in the dynamics and can therefore be set to any value. The following scenarios are defined:

(A) 2 bell-shaped distant stimuli s_1 and s_2 are introduced at time t = 0. Their intensities are governed by I_1 = 0.4 and I_2(t) = 0.5 + 0.5 cos(π × (t/5)).
(B) 1 bell-shaped stimulus of standard deviation 0.1 and intensity 1.0 follows a circular trajectory of radius 0.2 around the point (0, 0) at 10 deg/s from t = 0. From t = 1, 5 distracters of the exact same shape are added and take new random hues and positions on the field every 1 s.
(C) 1 moving bell-shaped stimulus (same as in B). At t = 1, Gaussian noise of amplitude 0.5 is added at each point of the field, with a random hue in [0, 1].
(D) 1 moving bell-shaped stimulus (same as in B) with an additional full range oscillation on the hue dimension, with a period of 10 seconds.
(E) 1 moving bell-shaped stimulus (same as in B) with a hue of 0.5 (cyan) is present from the beginning. At t = 1, a second bell-shaped stimulus of the same intensity is introduced. However, it has a hue of 0.0 (red) and moves at 1 deg/s on the same trajectory and in the same direction.

Input stimuli centered on (x_s, y_s) with a hue of h_s and maximal saturation of s_s are defined in (11.10). Although the hue should also vary for the three dimensions to play perfectly symmetric roles, this does not fundamentally change the dynamics, as the bubble would converge on the average color, but it leads to weird graphical representations and makes it more complex to compare with the 2D versions.

$$h(x, y) = h_s \qquad s(x, y) = s_s\, e^{-\frac{(x - x_s)^2 + (y - y_s)^2}{\sigma^2}} \qquad (11.10)$$
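A bell-shaped stimulus of (11.10) can be generated as a pair of hue and saturation maps; the grid size, the default σ and the function name are illustrative, and the example start position for scenario E is an assumption.

```python
import numpy as np

def bell_stimulus(n, xs, ys, hs, ss, sigma=0.1):
    """Hue/saturation maps of (11.10): constant hue h_s, Gaussian saturation profile
    of amplitude s_s centered on (x_s, y_s), over the [-0.5, 0.5]^2 field."""
    coords = np.linspace(-0.5, 0.5, n, endpoint=False)
    Y, X = np.meshgrid(coords, coords, indexing="ij")
    sat = ss * np.exp(-((X - xs)**2 + (Y - ys)**2) / sigma**2)
    hue = np.full_like(sat, hs)
    return hue, sat

# Example: the cyan stimulus of scenario E, assuming it starts at (0.2, 0) on a 100x100 grid
hue, sat = bell_stimulus(100, xs=0.2, ys=0.0, hs=0.5, ss=1.0)
```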
Whereas the distracters are placed on top of the tracked stimulus to generate maximum perturbations and occlusions in scenario B, the second red stimulus in scenario E is placed in a deeper layer so as not to totally occlude the tracked cyan stimulus. When several stimuli overlap or noise is added, alpha blending is used to combine the color components, where the transparency alpha value equals the saturation (see Fig. 11.5).
11.3.1.2 Input Sparsification

To reduce the number of input components and test the robustness of the algorithm, a rough sparsification process is introduced. The feature space is partitioned into cubic blocks of dimensions
(Δx, Δy, Δh). The activity in a block k is then synthesized as a 3D center of mass x_k = (x_k, y_k, h_k), with the mass here associated with the color saturation I_k. The intensity and vector components are combined in a stimulation component s_k = (x_k, I_k) and computed by applying the following equations:

$$I_k = \int_{x,y}^{x+\Delta x,\, y+\Delta y} \tilde{s}(x, y)\, dx\, dy \qquad h_k = \frac{1}{I_k} \int_{x,y}^{x+\Delta x,\, y+\Delta y} \tilde{s}(x, y)\, h(x, y)\, dx\, dy$$
$$x_k = \frac{1}{I_k} \int_{x,y}^{x+\Delta x,\, y+\Delta y} \tilde{s}(x, y)\, x\, dx\, dy \qquad y_k = \frac{1}{I_k} \int_{x,y}^{x+\Delta x,\, y+\Delta y} \tilde{s}(x, y)\, y\, dx\, dy \qquad (11.11)$$

As the original colored image can be described by two functions h(x, y) and s(x, y), the mathematical nature of h is different from that of x and y, so that s̃ must be introduced to only consider stimulations in the hue range [h, h + Δh]:

$$\tilde{s}(x, y) = \begin{cases} s(x, y) & \text{if } h(x, y) \in [h, h + \Delta h] \\ 0 & \text{otherwise} \end{cases} \qquad (11.12)$$

Fig. 11.6 Illustration of the input spatiotemporal discontinuities. The snapshots present the same partial 2D view of the input image with the input components produced by the sparsification process superimposed (black crosses). Although these correspond to successive timesteps, the number of extracted features varies because of the threshold introduced (one appears between frames 1 and 2, one disappears between frames 2 and 3). There also are strong spatial discontinuities, for instance when noise is introduced abruptly between frames 3 and 4
The white background used in the toy application generates input components with a null saturation because of the HSV decomposition. These components have no effect on the CNFT dynamics, since the propagated activity is proportional to the source intensity, and they will be eliminated immediately after the first merging step. A non-mandatory threshold is thus introduced to remove them, so as to limit the number of components and make graphs easier to interpret (see Fig. 11.4). Even though this preprocessing may seem to provide adequate inputs for the CNFT and ease the clustering/competition process, it suffers from major limitations relative to the sparse implementation. Blocks form a partition and do not provide continuity between the stimulations, whereas receptive fields usually overlap to a large extent. A shaded or localized object can therefore be discretized differently from one step to another, occupying a different number of blocks depending on its position or color. Similarly, the many components coding for close objects can expand or contract when the objects follow a linear trajectory, because their boundaries move inside a block. Indeed, although static blocks are used, the centers of mass take continuous coordinates in the feature space. For an illustration of these phenomena, please refer to Fig. 11.6. Such a rough transformation of the inputs was intentionally chosen to show that the emergent properties of the CNFT equation, combined with the Gaussian merging introduced in the three-step procedure, compensate for spatiotemporal discontinuities. When considering artificial systems, this may be useful for processing real world video inputs, as encoding (especially compression algorithms) and low framerates can lead to artifacts and discontinuities.
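The block-based sparsification of (11.11)–(11.12) can be sketched as follows, taking hue and saturation maps (as produced, e.g., by the stimulus generator above) and returning (x, y, h) components. Block counts and the empty-block threshold are illustrative choices, and each pixel is simply treated as a unit of area.

```python
import numpy as np

def sparsify(hue, sat, blocks_xy=8, blocks_h=8, eps=1e-6):
    """Rough sparsification following (11.11)-(11.12): in each (x, y, h) block,
    saturation acts as the mass and the component is the saturation-weighted
    center of mass restricted to the block's hue slab."""
    ny, nx = sat.shape
    ys, xs = np.meshgrid(np.linspace(-0.5, 0.5, ny, endpoint=False),
                         np.linspace(-0.5, 0.5, nx, endpoint=False), indexing="ij")
    xy_edges = np.linspace(-0.5, 0.5, blocks_xy + 1)
    h_edges = np.linspace(0.0, 1.0, blocks_h + 1)
    components = []
    for bx in range(blocks_xy):
        for by in range(blocks_xy):
            for bh in range(blocks_h):
                mask = ((xs >= xy_edges[bx]) & (xs < xy_edges[bx + 1]) &
                        (ys >= xy_edges[by]) & (ys < xy_edges[by + 1]) &
                        (hue >= h_edges[bh]) & (hue < h_edges[bh + 1]))
                s_tilde = np.where(mask, sat, 0.0)           # masked saturation (11.12)
                I = s_tilde.sum()
                if I > eps:                                   # drop empty/white blocks
                    xk = np.array([(s_tilde * xs).sum() / I,
                                   (s_tilde * ys).sum() / I,
                                   (s_tilde * hue).sum() / I])
                    components.append((xk, I))                # ((x_k, y_k, h_k), I_k)
    return components
```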
11.3.1.3 Results By neglecting the third hue dimension, the dynamics of the 3D sparse implementation can be compared to the dense 2D CNFT using the same parameters and input trajectories. The dense version is statistically more efficient at tracking stimuli in noisy environments as the spatial redundancy of the population coding reduces the impact of perturbations, whereas bubbles are coded by a single relatively fragile component in the sparse version. This effect could however be reduced by considering advanced merging algorithms and should be explored in further work. Results include a comparison of three implementations: the dense matrix based 2D CNFT used as a reference [28], the 2D sparse implementation presented in [25] and the 3D sparse version presented in this paper. Error distances as a function of time are shown for typical runs with input scenarios A to E on Fig. 11.7. Scenario A tests the ability of the system to rapidly decide between similar stimuli and to shift the focus to an unattended stimulus that becomes much more salient than the tracked stimulus. Local peaks on the error curves correspond to changes of target and are expected in this scenario only. Although the shifts in attention associated with a high error occur at slightly different times, the same kind of hysteresis appears for all three implementations. The shifts indeed do not correspond to the intersections between the intensity curves (represented not to scale by thin black lines on Fig. 11.7A). They always happen later, with a distracter to target intensity ratio up to 2 (0.8/0.4 and 0.4/0.2). Because the stimuli only differ by their x position and intensity, the reproduction of the results indicates that the preprocessing and merging do not slow down or hinder the convergence. Scenarios B and C show that the 3D sparse implementation is more sensitive to noise and distracters. The stimuli and distracters cannot be differentiated by their intensity alone in the sparse and dense 2D versions and are thus only interpreted as modifying the shape and position of the tracked stimulus. The third color dimension and the sparsification process here make a difference as they allow occlusions between shapes, because of the 2D projection. The full occlusion of the target by 3 distracters occurring at 11 seconds on the graph leads to the disappearance of the target for about 1 second, leaving enough time for the system to relax and focus on another target. However, due to the higher spatiotemporal continuity of the stimulus on the long-term (as distracters randomly take new positions every second), the CNFT finally focuses again on the stimulus at 15 seconds. If the stimulus and distracters were directly provided to the system by removing the sparsification process, noise would affect the component coordinates as in the 2D sparse implementation. This possibility theoretically and practically displays the best results as it benefits from the color information without suffering from projection ambiguities. However, due to the artificiality of its performance, it is not presented in the results. Scenario D simply extends the tracking results to a 3D stimulus trajectory, which means that it not only moves on the field but that its hue also changes. The bubble should thus emerge at a position close to the stimulus, but also take a similar color. 
Here again, the sparsification process introduces noise and discontinuities relative to the other implementations, but the exploitation of the color information statistically improves the performance. More interestingly, scenario E shows the advantages of merging the information in a common space to track objects defined by their position and hue. Whereas the two other implementations diverge when the stimuli overlap, the 3D sparse version keeps track of the fastest stimulus despite the natural stationarity of the CNFT bubbles. At the point where the fast cyan tracked stimulus passes over the slow red stimulus, the same tendency is found in the three curves. About three seconds before perfect superposition, the second stimulus attracts the bubble in its direction as it reaches the excitatory part of the CNFT kernel (at least for the x and y coordinates). This compensates for the inherent lag of the reactive equations relative to the movement of the stimulus and results in a decrease of the error. Just after the crossing has occurred, the 3D implementation still tracks the cyan stimulus, whereas the other versions mistake one stimulus for the other and preferentially focus on the stationary one.
Fig. 11.7 Error distance as a function of time for scenarios A to E for three different implementations: 2D dense (plain red), 2D sparse (dashed), 3D sparse (plain blue). All error distances are clamped to [0, 0.1], as divergence generally occurs above this value (up to a distance of 0.4 on the manifold). The black lines for scenario A represent the intensities of the 2 stimuli as a function of time, while they mark specific events for scenarios B and E
11.4 Discussion

The previous sections presented a computationally efficient and scalable implementation of the CNFT, as well as its application to filtered inputs. The unconstrained number and the continuous coordinates of the components are both a strength and a weakness of the algorithm, as they may be hard to parallelize. However, preliminary tests have shown that reintroducing receptive fields and aligning the input components on a fixed grid improves the tracking performance compared to the rough sparsification presented earlier, by injecting continuity between nearby components. Although this means reintroducing a matrix-like field of components for the inputs, their number can be reduced, as the
field components would still take unconstrained coordinates and assimilate the stimulations based on a continuous distance. The minimal distance between two stimulations should not exceed the excitatory standard deviation a, so as to guarantee that enough information is provided to the system for the merging step to be effective. For more complex representational content, as expected in associative maps where information from various modalities converges and gets combined, finding an adequate topology is an issue. Features and spatiotemporal relationships defining multimodal representations may indeed be highly dependent on the context and concept considered. This is highly incompatible with the kind of regular topology assumed by the projection of all features within a common space, and by the use of a single propagation function for components to interact. Nevertheless, replacing the Cartesian distance of (11.3) by a dissimilarity measure may at least partially solve the problem. For sensorimotor systems, local contingencies involving a limited number of dimensions (for instance the relationships between particular motor commands and associated proprioceptive feedback) can interact and compete by considering only their common dimensions [24]. Another issue raised by the toy application is that of the nature of dimensions. Although the saturation and hue can both be described as a function of the position on the field, the saturation is used as the component intensity (Ik), whereas the hue is translated into a spatial dimension of the feature space (xk). To put it differently, their nature in the input signal is quite similar, but they intervene in different parts of the CNFT equations. This asymmetry makes it possible to focus on salient elements, i.e. those with a high saturation. Symmetry could be restored by transforming the saturation into a fourth spatial dimension (leading to slab boundary conditions, where the topology along all dimensions but one is toric). This facilitates a Bayesian interpretation of the CNFT: the focus field activity would be equivalent to a prior probability distribution over the 4D manifold of stimulation characteristics, which is updated by integrating the information extracted from the input flow. The improvement of the results demonstrated for the color tracking scenario simply means that the additional integration of information within a single feature space made the stimuli unambiguous. If they were to remain ambiguous and differentiating them was necessary to achieve the current goal of the agent, an anticipatory model would be required. This is also the case when the tracking lag induced by the reactive dynamics of the CNFT equation is no longer negligible and impedes the real-time interactions with the agent's environment. An anticipatory model is indeed required to make sense of the interactive dynamics over several timesteps, something the stationary reentrance of the focus activity in the original equation cannot account for. The activity should then be maximal where stimuli are expected to appear, and not where they were recognized during the previous timestep. In addition to its Bayesian interpretation, the CNFT can then be assimilated to an iterative basis function (IBF) network, with the feature space representing the high-dimensional recurrent network onto which the inputs from various modalities are projected.
With adequate internal connections to provide a forward model of the dynamics, such a network can perform optimal sensorimotor integration in the spatiotemporal domain [13]. This also opens the door to sensorimotor and multimodal representations in high dimensional maps, instead of multiplying hierarchically organized 2D associative maps. To give a possible future application of the sparse implementation, local motion detectors (MT cells in the visual cortex) could be combined in a single space, where units would share the two spatial dimensions of the retina, one dimension for the direction of movement and a fourth one for the velocity. Such a 4D space would be an alternative to using a large set of maps, each dedicated to a specific direction and speed [9]. By dynamically biasing the activity in a specific part of this space, the model would track extended objects following a determined trajectory. More generally, it would be possible to combine the bottom-up emergent properties of the CNFT with top-down modulations. A bias in activity in any part of the feature space can indeed drive the selection of stimuli with specific characteristics or further increase the robustness of the attentional focus.
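As a rough, purely illustrative sketch of the kind of high-dimensional sparse competition discussed above, the following Python fragment lets a handful of components with continuous coordinates in a 4D (x, y, direction, speed) feature space interact through a difference-of-Gaussians lateral kernel and receive an additive top-down bias. The kernel shape, parameter values and function names are assumptions made for the example, not the implementation described in this chapter.

```python
import numpy as np

def dog_kernel(dist, a=0.1, b=0.4, A=1.0, B=0.6):
    """Difference-of-Gaussians lateral kernel: local excitation, broader inhibition."""
    return A * np.exp(-dist**2 / (2 * a**2)) - B * np.exp(-dist**2 / (2 * b**2))

def lateral_input(components, intensities):
    """Net lateral input received by each sparse component from all the others.

    components : (n, d) array of continuous coordinates in the feature space
                 (e.g. x, y, direction, speed for the 4D motion example).
    intensities: (n,) array of component intensities.
    """
    n = len(components)
    net = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                dist = np.linalg.norm(components[i] - components[j])
                net[i] += intensities[j] * dog_kernel(dist)
    return net

def top_down_bias(components, target, gain=0.5, width=0.3):
    """Additive bias favouring components close to a desired region of the feature space."""
    dist = np.linalg.norm(components - target, axis=1)
    return gain * np.exp(-dist**2 / (2 * width**2))

# Toy 4D feature space: (x, y, direction, speed), all normalised to [0, 1].
rng = np.random.default_rng(0)
comps = rng.random((20, 4))
inten = rng.random(20)
drive = lateral_input(comps, inten) + top_down_bias(comps, target=np.array([0.5, 0.5, 0.25, 0.8]))
print("most supported component:", np.argmax(drive))
```

Biasing a target region of the space in this way corresponds to the top-down modulation mentioned above: components near the desired direction and speed receive extra support in the competition.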
References 1. Alahakoon, D., Halgamuge, S.K., Srinivasan, B.: Dynamic self-organizing maps with controlled growth for knowledge discovery. IEEE Trans. Neural Netw. 11(3), 601–614 (2000) 2. Amari, S.-I.: Dynamics of pattern formation in lateral-inhibition type neural fields. Biol. Cybern. 27(2), 77–87 (1977) 3. Barandiaran, X.E., Di Paolo, E., Rohde, M.: Defining agency: Individuality, normativity, asymmetry, and spatiotemporality in action. Adapt. Behav. 17(5), 367–386 (2009) 4. Bellman, R.: Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton (1961) 5. Bickhard, M.H., Christensen, W.D.: Process dynamics of normative function. Monist 85(1), 3–28 (2002) 6. Bosking, W.H., Zhang, Y., Schofield, B., Fitzpatrick, D.: Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. J. Neurosci. 17(6), 2112–2127 (1997) 7. Brodmann, K.: Brodmann’s ‘Localisation in the Cerebral Cortex’. Smith-Gordon, London (1909/1994) 8. Burnod, Y.: An Adaptive Neural Network: The Cerebral Cortex. Masson, Paris (1989) 9. Castellanos Sánchez, C., Girau, B.: Dynamic pursuit with a bio-inspired neural model. In: Advanced Concepts for Intelligent Vision Systems—ACIVS 2005. Lecture Notes in Computer Science, vol. 3708, pp. 284–291 (2005) 10. Chevallier, S., Tarroux, P.: Visual focus with spiking neurons. In: European Symposium on Artificial Networks— Advances in Computational Intelligence and Learning (ESANN’2008), Bruges, April, pp. 23–25 (2008) 11. Childs, J., Lu, C.-C., Potter, J.: A fast, space-efficient algorithm for the approximation of images by an optimal sum of Gaussians. In: Graphics Interface, pp. 153–162 (2000) 12. CIE (Commission Internationale d’Eclairage): Colorimetry, 3rd edn. publication 15. Technical report, CIE Central Bureau, Vienna (2004) 13. Denève, S., Duhamel, J.-R., Pouget, A.: Optimal sensorimotor integration in recurrent cortical networks: A neural implementation of Kalman filters. J. Neurosci. 27(21), 5744–5756 (2007) 14. Díaz, J., Ros, E., Mota, S., Botella, G., Cañas, A., Sabatini, S.: Optical flow for cars overtaking monitor: the rear mirror blind spot problem. Technical report, Ecovision (European research project) (2003) 15. Goshtasby, A., O’Neill, W.D.: Curve fitting by a sum of Gaussians. CVGIP, Graph. Models Image Process. 56(4), 281–288 (1994) 16. Hubel, D.N., Wiesel, T.H.: Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–54 (1962) 17. Husbands, P., Smith, T., Jakobi, N., O’Shea, M.: Better living through chemistry: Evolving gasnets for robot control. Connect. Sci. 10(3–4), 185–210 (1998) 18. Kandel, E.R., Schwartz, J.H., Jessell, T.M.: Principles of Neural Science. McGraw-Hill, New York (2000) 19. Kohonen, T.: Self-organizing Maps. Springer, Berlin (1995) 20. Lathauwer, L.D., Moor, B.D., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21(4), 1253–1278 (2000) 21. Ménard, O., Frezza-Buet, H.: Model of multi-modal cortical processing: coherent learning in self-organizing modules. Neural Netw. 18(5–6), 646–55 (2005) 22. Mountcastle, V.B.: Modality and topographic properties of single neurons of cat’s somatic sensory cortex. J. Neurophysiol. 20(4), 408–434 (1957) 23. Quinton, J.-C.: Exploring and optimizing dynamic neural fields parameters using genetic algorithms. In: Proceedings of IEEE World Congress on Computational Intelligence (IJCNN’2010), Barcelona, Spain, 2010 24. 
Quinton, J.-C., Buisson, J.-C.: Multilevel anticipative interactions for goal oriented behaviors. In: Proceedings of EpiRob’08—International Conference on Epigenetic Robotics, Brighton, UK, pp. 103–110 (2008). Lund University Cognitive Studies 25. Quinton, J.-C., Girau, B.: A sparse implementation of dynamic competition in continuous neural fields. In: Brain Inspired Cognitive Systems (BICS’2010), Madrid, 2010 26. Rodieck, R.W.: Quantitative analysis of cat retinal ganglion cell response to visual stimuli. Vis. Res. 5(11), 583–601 (1965) 27. Rougier, N., Boniface, Y.: Dynamic self-organising map. Neurocomputing 74(11), 1840–1847 (2010) 28. Rougier, N.P., Vitay, J.: Emergence of attention within a neural population. Neural Netw. 19(5), 573–581 (2006). doi:10.1016/j.neunet.2005.04.004 29. Taylor, J.G.: Neural bubble dynamics in two dimensions: Foundations. Biol. Cybern. 80, 5167–5174 (1999) 30. Tewarson, R.P.: Sparse Matrices. Academic Press, San Diego (1973) 31. Torres-Huitzil, C., Girau, B., Castellanos Sánchez, C.: On-chip visual perception of motion: a bio-inspired connectionist model on FPGA. Neural Netw. 18, 557–565 (2005) 32. Wennekers, T.: Separation of spatio-temporal receptive fields into sums of Gaussian components. J. Comput. Neurosci. 16(1), 27–38 (2004) 33. Wilson, H.R., Cowan, J.D.: A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue. Kybernetic 13, 55–80 (1973) 34. Xu, W., Duchateau, J., Demuynck, K., Dologlou, I.: A new approach to merging Gaussian densities in large vocabulary continuous speech recognition. In: IEEE Benelux Signal Processing Symposium, pp. 231–234 (1998)
Chapter 12
Informational Theories of Consciousness: A Review and Extension Igor Aleksander and David Gamez
Abstract In recent years a number of people have suggested that there is a close link between conscious experience and the differentiation and integration of information in certain areas of the brain. The balance between differentiation and integration is often called information integration, and a number of algorithms for measuring it have been put forward, which can be used to make predictions about consciousness and to understand the relationships between neurons in a network. One of the key problems with the current information integration measures is that they take a lot of computer processing power, which limits their application to networks of around a dozen neurons. There are also more general issues about whether the current algorithms accurately reflect the consciousness associated with a system. This paper addresses these issues by exploring a new automata-based algorithm for the calculation of information integration. To benchmark different approaches we implemented the Balduzzi and Tononi algorithm as a plugin to the SpikeStream neural simulator, and used it to carry out some preliminary comparisons of the liveliness and Φ measures on simple four-neuron networks.
12.1 Introduction In recent years a number of people have suggested that there is a close link between conscious experience and the balance between the differentiation and integration of information in certain areas of the brain. This combination of integration and differentiation is often called information integration, and Tononi has claimed that: "at the fundamental level, consciousness is integrated information, and . . . its quality is given by the informational relationships generated by a complex of elements" [18, p. 217]. A number of algorithms for measuring information integration have been put forward, which can be used to make predictions about consciousness and to assist with the debugging of artificial cognitive systems [8]. A better understanding of what information integration actually is can be gained by considering an example of a digital camera sensor with a million photodiodes [18]. If each photodiode is binary, the sensor can enter a total of 2^1,000,000 states, which corresponds to 1,000,000 bits of information. One of the key differences between the photodiodes and the areas of the brain associated with consciousness is that each photodiode acts independently of the others, whereas the vast number of states in the brain's neural networks are the outcome of causal interactions between the neurons. Both the brain and the camera sensor can enter a large number of states, but Tononi [18] claims that some of the brain's states are conscious because they are both differentiated and integrated at the same time. This
I. Aleksander () Department of Electrical Engineering, Imperial College, London SW7 2BT, UK e-mail:
[email protected]
Fig. 12.1 Systems with different amounts of integration and differentiation. At the bottom left are systems with few states and low integration between these states, such as simple creatures or basic artificial systems. Bottom right are highly differentiated systems with little or no integration, such as the photodiodes in a digital camera sensor. Top left are systems whose high level of integration prevents them from entering a wide variety of states—for example, a large number of LEDs controlled by a single switch. Top right are systems with a large repertoire of states that are the result of causal interactions between their elements, such as the areas associated with consciousness in the human brain
combination of differentiation and integration is illustrated in Fig. 12.1, which contrasts conscious systems, such as parts of the brain, with systems that lack integration between their elements, or which can only enter a low number of states. A number of algorithms have been put forward to measure information integration. Tononi and Sporns [19] and Balduzzi and Tononi [2] developed a measure known as Φ, and there are a number of related measures including neural complexity [20, 21], transfer entropy [11] and causal density [13]. A major problem with many of these algorithms is that they can take an extremely long time to compute. For example, the algorithms put forward by Tononi and Sporns [19] and Balduzzi and Tononi [2] have multiple factorial dependencies, which require calculations on all possible partitions of all possible subsets of a network. It has been estimated that a full analysis of an 18,000 neuron network using Tononi and Sporns' [19] algorithm would take 10^9000 years [6], and the measured and predicted times for the more recent Balduzzi and Tononi [2] algorithm are also extremely large (see Fig. 12.3). Much more efficient ways of calculating information integration need to be found if it is to become a useful tool in research on consciousness and robotics. A second key issue is that different information integration algorithms are likely to make different predictions about which parts of a system are conscious, and more work is needed to determine the accuracy of the current measures. This issue will be easier to address when the performance issues have been solved because the current algorithms are typically only capable of analyzing systems with around a dozen elements. More discussion is also needed about Tononi's strong claim that consciousness is information integration, which will be easier to address if it can be empirically established whether information integration is consistently linked with conscious states. To address these issues we are exploring an alternative way of calculating information integration based on liveliness and an automata approach [1], which is outlined in Sect. 12.3. To benchmark different information integration algorithms we implemented the Balduzzi and Tononi [2] algorithm as a plugin to the SpikeStream neural simulator (see Sect. 12.4), and Sect. 12.5 describes some very preliminary experiments in which a simple four-neuron network was analyzed using the two different algorithms.
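To make the combinatorial cost mentioned above concrete, the short sketch below counts how many subset/bipartition pairs a naive information integration analysis would have to evaluate for a network of n elements. It counts only bipartitions of each subset, so it is a lower bound on the work done by the full partition-based algorithms; the counting formula is an illustration of the scaling, not a reproduction of the published algorithms.

```python
from math import comb

def bipartition_count(n):
    """Number of (subset, bipartition) pairs a naive phi-style analysis must examine:
    every subset of size k >= 2 of an n-element network, split into two non-empty
    halves in 2**(k-1) - 1 distinct ways."""
    return sum(comb(n, k) * (2 ** (k - 1) - 1) for k in range(2, n + 1))

for n in (4, 10, 20, 30):
    print(n, bipartition_count(n))   # grows roughly like 3**n / 2
```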
12.2 Previous Work Other analysis work based on information integration has been carried out by Lee et al. [10], who made multi-channel EEG recordings from eight sites in conscious and unconscious subjects and constructed a covariance matrix of the recordings on each frequency band that was used to identify the complexes within the 8 node network using Tononi and Sporns’ [19] algorithm. This experiment found that the information integration capacity of the network in the gamma band was significantly higher when subjects were conscious. Information integration-based predictions about the consciousness of an 18,000 neuron network have been carried out by Gamez [7] and there has been some theoretical work on information integration by Seth et al. [13], who identified a number of weaknesses in Tononi and Sporns’ [19] method and criticized the link between information integration and consciousness. A number of other measures of the information relationships between neurons have been put forward, including neural complexity [20, 21], transfer entropy [11] and causal density [13]. There has been some work comparing neural complexity measures and graph theory [14], and these measures have been used by a number of people to examine the anatomical, functional and effective connectivity of biological networks, either using scanning or electrode data, or large-scale models of the brain. One example of this type of work is Honey et al. [9], who used transfer entropy to study the relationship between anatomical and functional connections on a large-scale model of the macaque cortex, and demonstrated that the functional and anatomical connectivity of their model coincided on long time scales. Another example is Brovelli et al. [3], who used Granger causality to identify the functional relationships between recordings made from different sites in two monkeys as they pressed a hand lever during the wait discrimination task, and Friston et al. [4] modeled the interactions between different brain areas and made predictions about the coupling between them. Information-based analyses have also been used to guide and study the evolution of artificial neural networks connected to simulated robots [12, 16]. An overview of this type of research can be found in Sporns et al. [17] and Sporns [15].
12.3 Some Finite-State Discrete Automata Principles Balduzzi and Tononi's [2] recent work on discrete-state systems has made it possible to examine information integration ideas using classical automata theory. This section outlines some principles of the automata approach, which is compared with Balduzzi and Tononi's [2] algorithm in Sect. 12.5.
12.3.1 Cyclic Activity and Neuron Liveliness In 1973, Aleksander and Atlas carried out an analysis of epigenesis and differentiation in networks of interconnected genes, which were modeled by 2-input binary neurons with fixed, but randomly chosen, neuron transfer functions. When these networks were started in a random state, they went from state to state until they returned to a previously visited state, at which point the cycle was complete and repeated itself in perpetuity. In this model the states in the cycle were analogous to the chemical changes in the division of a cell, with the number of different cycles involved in a given network being analogous to the number of different cell types of a particular organism. Aleksander and Atlas [1] showed how this type of network was stable, even if its elements were interconnected at random. In Aleksander and Atlas [1] the concept of liveliness was defined for two-input network elements or 'nodes'. In this paper the concept of liveliness will be presented in terms of an arbitrary number n of inputs. Let the binary inputs of a node j be x_1^j, x_2^j, ..., x_n^j. The vectors of the 1 or 0 states of
the inputs to node j will be represented as X_0^j, X_1^j, ..., X_{2^n}^j, where the total number of possible input vectors is 2^n. For example, if n = 3 and x1 = 0, x2 = 1 and x3 = 0, the fifth input vector, X_5^j, could be [010]. A node j is said to be lively on input x_i^j if its output z^j changes value when x_i^j changes its value. This is denoted by λ(x_i^j) = 1, or λ(x_i^j) = 0 if it is not lively. Liveliness depends on the function in the node that maps the inputs X_0^j, X_1^j, ..., X_{2^n}^j to the output z^j, and it can be computed for all of the 2^n possible input vectors. The liveliness of node j for an input x_i^j, Λ(x_i^j), is the number of times, over all of the input vectors X_0^j, X_1^j, ..., X_{2^n}^j, that the node is lively, divided by the number of input combinations, 2^n. As an example consider the function described in Table 12.1. The total node liveliness for the inputs in Table 12.1 is as follows:

Λ(x_1^j) = 2/8;   Λ(x_2^j) = 6/8;   Λ(x_3^j) = 1/8.

Table 12.1 Example function to illustrate the calculation of liveliness. x_1^j, x_2^j and x_3^j are the inputs to node j and z is the output of the node. Each combination of x_1^j, x_2^j and x_3^j is one of the input vectors X_0^j, X_1^j, ..., X_{2^n}^j, and λ(x_i^j) is the liveliness of input i for node j

x_1^j  x_2^j  x_3^j   z   λ(x_1^j)  λ(x_2^j)  λ(x_3^j)
  0      0      0     0      1         0         1
  0      0      1     1      0         1         0
  0      1      0     0      0         0         0
  0      1      1     0      0         1         0
  1      0      0     1      1         1         0
  1      0      1     1      0         1         0
  1      1      0     0      0         1         0
  1      1      1     0      0         1         0
Given a group of nodes, the liveliness of a loop, L, can be found by taking the product of the liveliness of the connections in a path traversing the nodes. The liveliness of a node, N, is defined as the average liveliness of all the inputs to a node (scoring 0 for disconnections). Nodes that compute the parity function (output 1 if the number of 1s at the input is even) or the 'oddness' function (output 1 if the number of 1s at the input is odd) can be shown to have the maximum liveliness of 1 on all inputs, while 'stuck at 1' or 'stuck at 0' functions have a liveliness of 0. Common functions such as AND and OR can be shown to have a liveliness of 2/2^n. In Aleksander and Atlas [1] this approach was used to predict the probability of finding closed element loops, whose ability to transmit information gives rise to the cyclic state structures from which the cyclic activity of genetic networks could be predicted. This was seen to be a close approximation to what is known about epigenesis and differentiation across animal species.
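The liveliness calculation described above can be stated compactly in code. The sketch below counts, for each input of a node, the fraction of input vectors on which flipping that input changes the output; with this symmetric flip-counting convention it reproduces Λ(x1) = 2/8 and Λ(x2) = 6/8 for the function in Table 12.1 (the quoted Λ(x3) = 1/8 suggests the table counts the single lively transition on x3 only once, so the exact counting convention is an assumption here).

```python
from itertools import product

def liveliness(truth_table, n_inputs):
    """Per-input liveliness of a binary node.

    truth_table maps each input tuple (x1, ..., xn) to the output z.
    Input i is taken to be lively on a given input vector if flipping x_i changes z;
    its liveliness is the fraction of the 2**n input vectors on which it is lively.
    """
    counts = [0] * n_inputs
    for vec in product((0, 1), repeat=n_inputs):
        z = truth_table[vec]
        for i in range(n_inputs):
            flipped = list(vec)
            flipped[i] ^= 1
            if truth_table[tuple(flipped)] != z:
                counts[i] += 1
    total = 2 ** n_inputs
    return [c / total for c in counts]

# Output column z of Table 12.1, indexed by (x1, x2, x3).
table = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 0, (0, 1, 1): 0,
    (1, 0, 0): 1, (1, 0, 1): 1, (1, 1, 0): 0, (1, 1, 1): 0,
}
print(liveliness(table, 3))   # [0.25, 0.75, 0.25] under this symmetric convention
```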
12.3.2 Relevance to Information Integration While a formal link between liveliness and information integration is work in progress, it is possible to identify a heuristic relationship between them. The high liveliness between nodes that are in closer causal contact appears to broadly correspond to the causal interdependence between neurons that is measured by Φ. For example, the digital camera sensor mentioned earlier exhibited low Φ because there was no causal linking between the elements, and the overall liveliness of this system would be zero as well. However, whilst Φ and liveliness both measure causal interactions, they do not measure exactly the same thing: Φ indicates what information a system provides when it enters a particular state, whereas liveliness identifies the state structures that arise as a result of the causal interactions available to the neurons.
Fig. 12.2 SpikeStream analysis plugin implementing Balduzzi and Tononi’s [2] algorithm. The graphical controls are on the left above a table listing the analysis results. The 3D display on the right shows the network that has been analyzed, with one of its complexes highlighted
12.4 SpikeStream Analysis Software SpikeStream was originally developed as an open source spiking neural simulator [5], and we adapted it to support weightless neurons and added a plugin to carry out the analysis of networks using Balduzzi and Tononi's [2] information integration algorithm. In these experiments the networks were simulated using the NRM weightless neural simulator (more information about NRM is available at http://www.iis.ee.ic.ac.uk/eagle/barry_dunmall.htm), and then imported into SpikeStream for analysis. A screen shot of the plugin implementing Balduzzi and Tononi's [2] algorithm is shown in Fig. 12.2. To compare the performance of liveliness with the Balduzzi and Tononi [2] algorithm, networks with different numbers of weightless neurons were created, with each neuron receiving connections from another five randomly selected neurons. These networks were trained on 5 different patterns and the time that each algorithm took per time step was averaged over 25 runs using a Pentium IV 3.2 GHz single core computer. The analysis times for the Balduzzi and Tononi [2] algorithm are plotted in Fig. 12.3, which shows a very rapid increase in the time taken for the analysis, with networks of 30 neurons estimated to take 10^10 years to analyze on a desktop computer. Whilst some optimizations might be able to increase the calculable network size, this performance issue cannot be entirely overcome because of the factorial dependencies of the calculation. This problem is corroborated by earlier work [6], which estimated that it would take 10^9000 years to analyze a network of 18,000 neurons using Tononi and Sporns' [19] algorithm. The measured performance times for the liveliness algorithm plotted in Fig. 12.4 show a linear dependence on the number of neurons, which is also a linear dependence on the number of connections
Fig. 12.3 Measured and predicted times for the calculation of information integration on different sizes of network using Balduzzi and Tononi’s [2] algorithm and a Pentium IV 3.2 GHz single core computer. Each neuron in the network was randomly connected to five other neurons and trained to give their truth tables five entries. The results are for the analysis of a single time step with a random firing pattern
Fig. 12.4 Performance of the liveliness algorithm
because there was a fixed number of five connections per neuron in all of the networks. It took around 13 seconds to analyze a 100 neuron network for liveliness. We are currently preparing a new release of the SpikeStream software, which will be available in Windows and Linux versions from the SpikeStream website (http://spikestream.sf.net). The next step will be to explore more efficient ways of calculating information integration, which can be benchmarked and compared by implementing them as SpikeStream plugins.
12.5 Examples The experiments in this section were carried out on a 4-neuron network (A, B, C, D) that was constructed with different functions and connectivity. The SpikeStream analysis software was used to identify the networks' complexes and measure their Φ, and these results were compared to the liveliness of each system.
Fig. 12.5 (A) Connections and functions of Network 1; (B) Liveliness of connections in Network 1
Fig. 12.6 (A) Functions and connections of Network 2; (B) Liveliness of connections in Network 2
Table 12.2 Φ and loop liveliness values for Network 1. Φ and loop liveliness have a correlation coefficient of 0.92

Complex               AB     CD     AC     ACD    ABCD
Average Φ (bits)      2      1.5    1.0    0.7    0.3
Loop liveliness (%)   100    50     50     25     25
Table 12.3 Φ, loop liveliness and node liveliness values for Network 2. Φ and node liveliness have a correlation coefficient of 0.89

Complex               AB     ACD    ABCD
Average Φ (bits)      2      2      1
Loop liveliness (%)   100    100    100
Node liveliness (%)   11     7      2.2
12.5.1 Example 1 The connections in the first network are shown in Fig. 12.5A, which had an AND function in neuron C. The liveliness of this network's connections is shown in Fig. 12.5B. The Φ analysis of Network 1 was carried out for each of the 16 possible states of the network. The highest Φ complex of each set of connected neurons was identified for each state of the network, and these Φ values were averaged to produce the results shown in Table 12.2. These results show a 0.92 correlation between the loop liveliness of a set of connected elements and the average highest Φ value of the complexes in which they are involved. This suggests the possibility that the complexes discovered by the exhaustive, processor-intensive Φ calculations could be approximated using the liveliness approach.
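Assuming the quoted coefficient is an ordinary Pearson correlation computed over the five complexes of Table 12.2, the figure of 0.92 can be reproduced directly from the tabulated values:

```python
import numpy as np

# Average highest phi per complex and loop liveliness from Table 12.2.
phi = np.array([2.0, 1.5, 1.0, 0.7, 0.3])          # AB, CD, AC, ACD, ABCD (bits)
loop_liveliness = np.array([100, 50, 50, 25, 25])   # percent

r = np.corrcoef(phi, loop_liveliness)[0, 1]
print(f"correlation = {r:.2f}")   # ~0.92, matching the value quoted in the text
```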
12.5.2 Example 2 In Network 2 the AND function in neuron C in Network 1 was replaced by an XOR function (see Fig. 12.6A), which led to the loop and node liveliness values shown in Fig. 12.6B. When the results in Table 12.3 are compared with the results in Table 12.2 it can be seen that the higher loop liveliness caused by the XOR function is roughly mirrored in higher Φ values of the
Table 12.4 Comparison of Φ and average node liveliness values for ABCD with different connections and functions. Max Φ and average node liveliness have a correlation coefficient of 0.96

Connections / Functions / Max Φ (bits) / Average node liveliness (%):
1. Same as Fig. 12.5; AND function between A and C; 0.58; 25
2. Same as Fig. 12.5; XOR function between A and C; 1.0; 31
3. Fig. 12.5 network with two-way connections between AB, AC, BD, and DC; all nodes with XOR function; 0.83; 50
4. Two-way connections between each node and all other nodes; all nodes with XOR function; 4; 100
complexes AB, ACD and ABCD. The results also show that average node liveliness is a better fit with the data, which could be investigated more systematically in future work.
12.5.3 Example 3 This experiment took the highest Φ values of the complex ABCD for different network configurations and compared them to the average node liveliness. The results are shown in Table 12.4. In these results the maximum value of Φ was associated with the maximum liveliness (row 4) and the minimum value of Φ was associated with the minimum liveliness (row 1), with an overall correlation coefficient of 0.96 between Max Φ and average node liveliness. It must be stressed that this work on the link between Φ and liveliness is highly speculative at this stage, and the examples are only intended as a highly embryonic heuristic, which might eventually lead to better ways of measuring information integration.
12.6 Future Work One of the first things that is needed to take this work forward is a set of networks that can be used to benchmark and compare different methods of calculating information integration. These networks would need to be of increasing size and complexity and their topologies should be likely to exhibit different levels of information integration. These networks could be used to measure the time performance of each algorithm, and the high integration areas could be compared. Once a sensible set of benchmarks has been defined, it will be possible to develop more efficient information integration algorithms that accurately reflect the relationships between the elements in a network. Improved measures of information integration have many applications. Within neurophenomenology, information integration can be used to make predictions about what a person is conscious of based on EEG, fMRI or other data, and it is also possible to use information integration to make predictions about the consciousness of artificial systems as part of work on synthetic phenomenology [7]. Information integration can also be used to develop representations of the contents of the minds of artificial systems, which will be very useful for debugging cognitive robots that learn through interaction with their experiences [8].
12.7 Conclusions The measurement of information integration is becoming increasingly important in the study of natural and artificial consciousness, and it also has applications in network analysis and the debugging of cognitive systems. The two key challenges in this area are the development of faster information integration algorithms and the validation of these algorithms on non-trivial systems. This paper has outlined a new approach to the measurement of information integration based on liveliness and described software with a plugin framework that can be used to benchmark different types of information integration algorithm. The second half of this paper offered some preliminary comparisons between the Φ and liveliness of a simple four-neuron network. These examples are very tentative and would need to be developed in much more detail to accurately evaluate the advantages and disadvantages of the liveliness approach. Acknowledgements
This work was supported by a grant from the Association for Information Technology Trust.
References 1. Aleksander, I., Atlas, P.: Cyclic activity in nature: causes of stability. Int. J. Neurosci. 6, 45–50 (1973) 2. Balduzzi, D., Tononi, G.: Integrated information in discrete dynamical systems: motivation and theoretical framework. PLoS Comput. Biol. 4(6), 1000091 (2008) 3. Brovelli, A., Ding, M., Ledberg, A., Chen, Y., Nakamura, R., Bressler, S.L.: Beta oscillations in a large-scale sensorimotor cortical network: Directional influences revealed by Granger causality. Proc. Natl. Acad. Sci. USA 101, 9849–54 (2004) 4. Friston, K.J., Harrison, L., Penny, W.: Dynamic causal modelling. NeuroImage 19, 1273–302 (2003) 5. Gamez, D.: SpikeStream: A Fast and Flexible Simulator of Spiking Neural Networks. In: Marques de Sá, J., Alexandre, L.A., Duch, W., Mandic, D.P. (eds.) Proceedings of ICANN 2007. Lecture Notes in Computer Science, vol. 4668, pp. 370–379. Springer, Berlin (2007) 6. Gamez, D.: The development and analysis of conscious machines. Unpublished Ph.D. thesis, University of Essex, UK. Available at: http://www.davidgamez.eu/mc-thesis/ (2008) 7. Gamez, D.: Information integration based predictions about the conscious states of a spiking neural network. Conscious. Cogn. 19(1), 294–310 (2010) 8. Gamez, D., Aleksander, I.: Taking a mental stance towards artificial systems. Biologically inspired cognitive architectures. Papers from the AAAI Fall Symposium. AAAI Technical Report FS-09-01: 56–61 (2009) 9. Honey, C.J., Kötter, R., Breakspear, M., Sporns, O.: Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proc. Natl. Acad. Sci. USA 104(24), 10240–10245 (2007) 10. Lee, U., Mashour, G.A., Kim, S., Noh, G.-J., Choi, B.-M.: Propofol induction reduces the capacity for neural information integration: Implications for the mechanism of consciousness and general anesthesia. Conscious. Cogn. 18(1), 56–64 (2009) 11. Schreiber, T.: Measuring information transfer. Phys. Rev. Lett. 85(2), 461–464 (2000) 12. Seth, A.K., Edelman, G.M.: Environment and behavior influence the complexity of evolved neural networks. Adapt. Behav. 12, 5–20 (2004) 13. Seth, A.K., Izhikevich, E., Reeke, G.N., Edelman, G.M.: Theories and measures of consciousness: An extended framework. Proc. Natl. Acad. Sci. USA 103(28), 10799–10804 (2006) 14. Shanahan, M.: Dynamical complexity in small-world networks of spiking neurons. Phys. Rev. E 78, 041924 (2008) 15. Sporns, O.: Brain connectivity. Scholarpedia 2(10), 4695 (2007) 16. Sporns, O., Lungarella, M.: Evolving coordinated behavior by maximizing information structure. In: Rocha, L., Yaeger, L., Bedau, M., Floreano, D., Goldstone, R.L., Vespigniani, A. (eds.) Artificial Life X: Proceedings of the 10th International Conference on the Simulation and Synthesis of Living Systems, pp. 322–329. MIT Press, Cambridge (2006) 17. Sporns, O., Chialvo, D.R., Kaiser, M., Hilgetag, C.C.: Organization, development and function of complex brain networks. Trends Cogn. Sci. 8(9), 418–425 (2004) 18. Tononi, G.: Consciousness and integrated information: a provisional manifesto. Biol. Bull. 215, 216–242 (2008) 19. Tononi, G., Sporns, O.: Measuring information integration. BMC Neurosci. 4, 31 (2003) 20. Tononi, G., Sporns, O., Edelman, G.M.: A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proc. Natl. Acad. Sci. USA 91, 5033–7 (1994) 21. Tononi, G., Edelman, G.M., Sporns, O.: Complexity and coherency: integrating information in the brain. Trends Cogn. Sci. 2(12), 474–84 (1998)
Chapter 13
Hippocampal Categories: A Mathematical Foundation for Navigation and Memory Jaime Gómez-Ramirez and Ricardo Sanz
Abstract It goes without saying that in science, experiments are essential; hypotheses need to be contrasted against empirical results in order to build scientific theories. In a system of overwhelming complexity like the brain, it is very likely that hidden variables, unknown to the experimentalist, are interacting with those few elements whose values are expected and can be validated or rejected in the laboratory. Thus, at the end of the day, the experimentalist is refuting or validating tentative models that are somehow prisoners of the lack of knowledge about the structure of the system. With the global picture missing, the key is to look for the fundamental structure, which must be found not in the objects, but in the relationships between the objects—their morphisms. How components at the same level interact (the objects here being neurons), and how superior levels constrain those levels below and emerge from those above, is tackled here with mathematical tooling. The mathematical theory of categories is proposed as a valid foundational framework for theoretical modeling in brain sciences.
13.1 The Hippocampus as a Representational Device How does the mind represent physical space? This is a question that has kept philosophers busy for centuries. In 1975, the philosophical discussions about space representation acquired an extremely powerful and fresh insight when O'Keefe and Nadel discovered the place cells in the hippocampus of the rat [13]. The experimental study of spatial representation in the brain has since then exploded. The 70's was the decade of the place cells, neurons that discharge when the rat is in a particular position. In the 80's, head direction cells, neurons that discharge significantly whenever the rat's head points in a particular direction, acquired the attention of scholars. Since 2005 we have been in the era of the grid cells. These discoveries are of major importance in different research fields. Indeed the theory of the cognitive map [13] is rooted in the discovery of place cells in the hippocampus. One derivative of this theory is the map-based navigation capability that some animals have and that engineers have been able to replicate in robots [10]. The debate about whether the brain generates a map-like structure or not seems to have shifted in favour of those who back the cognitive map theory. Indeed the discovery of place cells, head direction cells and grid cells suggests so. Yet the underlying nature of the cognitive map remains elusive. Is the representation purely metrical or is it topological?

J. Gómez-Ramirez, Autonomous Systems Laboratory, Universidad Politécnica de Madrid, José Gutiérrez Abascal 2, 28006 Madrid, Spain. e-mail: [email protected]

Are the maps constructed in the hippocampus built without paying attention to the
features of the environment—i.e. metrical maps—or do they reflect the relationships between the environmental features—i.e. topological maps? In a sense, the role of the hippocampus is to associate internal and external coordinate systems and to accommodate cue conflict situations (reinstantiating the context when there is a mismatch between internal and external relationships). Rather than debating whether the hippocampus is the depository of declarative memory or the index to a collection of maps, it may be more productive to ask just what the role of the hippocampus is in navigation and memory. With this in mind, in The hippocampal debate: Are we asking the right questions? [16], Redish suggests that there are multiple memory systems in the brain and multiple navigation systems.
13.1.1 Place Cells Place cells are neurons located in the hippocampus that fire in complex bursts whenever an animal, for example a rat, moves through a specific location in an environment. The striking thing about place cells is that they code the spatial position of the animal, irrespective of either the direction from which the position is reached or the behavior of the rat at any precise instant. Thus, there is a direct link between the neural activity of a single cell and a Cartesian position of the rat. How does the animal know that it is in a particular position? Apparently this could be done by computing the allocentric space, landmarks or visual cues. The most important property of these place cells is their omnidirectionality, which can be observed in the conical shape of their activation landscapes (the firing rate increases when the rat approaches the location, independently of the direction it is heading in when it does so). Thus the immediate conclusion is that place cells are coding explicit (non-contextual) locations in the environment and not particular sensorial cues. The region in which a place cell fires the most is called its place field. Thus, there is a correspondence place field/place cell. What defines a place field is that the firing rate within the field is much higher than outside—e.g. from 20 Hz to 0.1 Hz. For a given environment, we can determine a collection of place cells whose associated place fields cover the whole environment. Nobody denies that under certain circumstances the hippocampal pyramidal cells show location-associated firing. However, it is less clear what they really represent; there are those who argue that place cells can be an epiphenomenon, produced by the spatial nature of the experiments in which these cells are discovered. Granted that place cells are correlated to space, the question that arises is: are place cells the only neurons correlated to space? The possible representational content of these cells, and of the assemblies they constitute, can serve to further question how the hippocampus contributes to spatial representation, navigation and episodic memory.
13.1.1.1 Place Cells as Representational Entities The interest in these cells is rooted in the fact that they are good candidates to be the direct representation of the external space—i.e. a neural correlate of spatial perception. A place cell fires maximally when the animal is in a specific position or place field, so the firing rate of a cell can be used to decode the position of the animal within the environment with striking accuracy. The existence of place cells was not accepted until Muller [12] came out with a numerical method that allows the place fields to be quantified. In this context, we can attempt to formally define the term "place field". A place field F, for a place cell, is an open ball of radius r and center x in a normed vector space V—the spatial environment—such that fr(F) > k, where k is a constant that represents a threshold for firing rate, and fr a function that returns the minimum firing rate for all the pixels (vectors) that fall into the ball F (Fig. 13.1).
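A minimal sketch of this definition in code, assuming a binned firing-rate map rather than a continuous normed space: positions whose rate exceeds the threshold k are taken as the place field. The bin size, threshold and the Gaussian toy rate map are illustrative assumptions, not data from the studies cited above.

```python
import numpy as np

def place_field(firing_rate, k=5.0):
    """Return the binned positions whose firing rate exceeds the threshold k (Hz).

    firing_rate is a 2D array giving the mean rate of one cell at each binned
    position of the environment; the connected high-rate region approximates
    the cell's place field in the sense of the definition above."""
    return np.argwhere(firing_rate > k)

# Toy rate map: a single Gaussian bump, mimicking the conical activation landscape.
xs, ys = np.meshgrid(np.arange(50), np.arange(50), indexing="ij")
rate = 20.0 * np.exp(-((xs - 30) ** 2 + (ys - 20) ** 2) / (2 * 5.0 ** 2)) + 0.1
field = place_field(rate, k=5.0)
print("place-field centre:", field.mean(axis=0), "size (bins):", len(field))
```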
Fig. 13.1 The picture shows the place fields corresponding to seven hippocampal CA1 place cells of a rat (CA1 is a region of the hippocampus; see Fig. 13.3). As is obvious from the figure, the position of the rat is encoded in the firing of these cells. The place fields have conical shapes, meaning that the neuron firing rates increase irrespective of the direction from which the rat arrives at the place
Fig. 13.2 Grid maps have been obtained from rat neurons [7]. The typical experiment uses an electrode to record the activity of an individual neuron in the dorsomedial entorhinal cortex. Spike recordings are made as the rat moves around freely in an open area. The image shows a spatial autocorrelogram of the neuronal activity of the grid cell. Image by Torkel Hafting
13.1.2 Grid Cells Grid cells, like place cells, are place-modulated neurons; however the firing location of a grid cell is multiple, contrary to place cells, which are mono-field. The multiple firing location of a grid cell is indeed a grid with a most striking property: it is an array of equilateral triangles [7] (see Fig. 13.2). It might be noted that grid cells were discovered while researchers investigated whether place cell activity was endogenous to the hippocampus. The hypothesis was that CA3 and DG are the intra-hippocampal inputs to CA1 (see Fig. 13.3), which is the area in the hippocampus where one can find most of the place cells. This idea was proven wrong: after removing CA3 and DG, the CA1 inputs, the spatial firing in CA1 persisted. So place signals did not exclusively arise within the hippocampus. The signal bearing spatial information was brought to CA1 from outside the hippocampus. In 2004 Fyhn et al. [6] discovered a group of neurons in the medial entorhinal cortex (mEC) that show space-related firing. These mEC cells have sharply tuned spatial firing, much like the hippocampal place cells do, but with one difference: each of the mEC cells has multiple firing fields rather than one, as is the case in the place cells. One year later, Hafting et al. [7] discovered that the many
Fig. 13.3 Basic circuit of the hippocampus, as drawn by Ramón y Cajal [15]
firing fields of each neuron organise into a grid. Thus, as the animal moves, the grid cells tile the environment with periodic triangles that reflect the changing position. As was said before, the grid cells have been found in the mEC. Of the six layers of this cortical structure, it is in layer II that we can find the highest density of this kind of cell. The neurons in layer II of the medial entorhinal cortex (mEC-II) are the main input of the place cells, but in contrast the entorhinal cells are activated throughout the environmental terrain, whenever the animal is at the vertex of some equilateral triangle, forming a tessellation or grid. In short, both place cells and grid cells are neurons with spatially located firing; in other words, they have spatial representational power, allowing the animal to know its position and to navigate in an environment, for example to find the way back home after eating. The difference, apart from the fact that place cells are hippocampal neurons and grid cells are in the mEC, is that whereas a place cell has a single firing location, a grid cell has multiple firing fields with a striking geometric regularity; the firing fields form periodic triangular arrays, tiling the entire environment available to the animal.
13.1.2.1 Grid Field Three parameters are necessary to fully describe the grid associated with a grid cell: spacing is the distance between contiguous fields, orientation is the rotation angle of the grid with respect to a reference axis, and spatial phase is how much the grid is translated relative to a reference point. A grid field for a grid cell is a set of open balls G_i, i = 1..n, where for every ball G_i, fr(G_i) > k, i.e. the grid cell has a significant firing rate. Thus, so far, the definition of G_i is similar to the place field seen in Sect. 13.1.1.1. Additionally, every ball G_j of a grid field forms equilateral triangles with its closest balls. The grid field G is identified by the three parameters spacing, orientation and phase, which can be trivially obtained from the metric of the space defined above. The processing of the place signal is therefore not an exclusive privilege of the hippocampus, with the mEC playing a major role in the spatial representation. The majority of cells in mEC-II and mEC-III have grid properties, and this means that most of the cortical input to the hippocampal place cells that has to do with spatial representation comes from grid cells in the mEC. Grid cells can be found just one synapse upstream of the place cells [7]. Therefore, acknowledging that grid cells and place cells are intrinsically connected, the claim that place fields may be extracted from grid fields [11] deserves to be appropriately explored (Table 13.1). Mathematically, using Fourier analysis, several grid fields with different spacing can combine linearly to yield a place field. Solstad [17] proposes a computational model showing that a place field can arise from the sum of 10–50 grid cells. When the spatial phase variation in the grid-cell input was
Table 13.1 Place cells and grid cells: similarities and differences

              Brain area     Type of map   Activation
Place cells   Hippocampus    Static        Need input from mEC
Grid cells    mEC            Dynamic       Active instantaneously in any novel environment
higher, multiple, and irregularly spaced firing fields were formed. This idea has been very appealing in the hippocampus community, and it has helped to produce a large number of models with a common tenet: place cells in the hippocampus compete to receive the summed firing pattern activity of the cortical grid cells. The problem with these kinds of models, which transform grid patterns into place patterns, is that they do not tell us that much about the mechanisms that underlie the spatial firing pattern of grid cells and place cells. Besides, it is debatable whether a linear sum of grid cell patterns, which have a metric, is the correct way to model a place cell pattern that represents the environment topologically, without metrical relations. It might be remarked here that the models of grid field formation deal with timing rather than with structure or connectivity, and this is because they assume that the structure is already known. In these models the structure is a single cell whose firing activity needs to be understood. There are a number of computational models that aim to simulate a grid field; however, they do not tell us much about the causes that originate the phenomenon, let alone offer a mechanistic explanation that unveils the real causes of the emergence of place cells in the hippocampus. As Zilli points out [20], we must be prudent ("the study of grid cells is still in its infancy"). The mechanisms that underlie the spatial firing pattern are still waiting to be discovered.
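A minimal sketch of the linear-summation idea discussed above (in the spirit of the Solstad-type models, not a reproduction of them): each grid cell is idealised as a sum of three plane waves at 60 degrees, and grid cells with different spacings but a common spatial phase are summed, which yields a single dominant peak, i.e. a place-field-like response, at the common phase point. The spacings, phases and the plane-wave parametrisation are assumptions made for the example.

```python
import numpy as np

def grid_pattern(x, y, spacing, phase=(0.0, 0.0), orientation=0.0):
    """Idealised grid-cell firing map: sum of three plane waves at 60 degrees,
    which produces the triangular (hexagonal) firing lattice."""
    rate = np.zeros_like(x, dtype=float)
    for k in range(3):
        theta = orientation + k * np.pi / 3
        kx, ky = np.cos(theta), np.sin(theta)
        rate += np.cos(4 * np.pi / (np.sqrt(3) * spacing)
                       * (kx * (x - phase[0]) + ky * (y - phase[1])))
    return rate

# Sum grid cells of different spacings whose phases coincide at (50, 50):
xs, ys = np.meshgrid(np.arange(100), np.arange(100), indexing="ij")
summed = sum(grid_pattern(xs, ys, spacing=s, phase=(50, 50)) for s in (20, 28, 39, 55))
peak = np.unravel_index(np.argmax(summed), summed.shape)
print("single dominant peak near:", peak)   # close to (50, 50)
```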
13.2 A Theory of Brain's Spatial Representation Based on Category Theory The huge amount of information on brain structure and operation that is being produced—e.g. by fMRI techniques—must be analysed from a theoretical background to have lasting impact in brain theory. Otherwise the global picture of brain operation is going to be missed. It is necessary to look for the fundamental structures, which must be found not just in the objects—the neurons—but also in the relationships between the objects and the emerging organisations. How components at the same level interact (the objects here being neurons), and how superior levels constrain those levels below and emerge from those above, is tackled here with mathematical tooling. The mathematical theory of categories is proposed as a valid foundational framework for theoretical modeling in brain sciences [2]. One of the highlights of this work is that it exemplifies the theory of categories with strongly non-algebraic categories. Indeed, the crucial aspect and novelty of this work lies in the categorical construction of biological (non-algebraic) categories.
13.2.1 The Category of Neurons For this purpose we must find a definition for a neural abstract category CAT-Neur as a category whose objects are either neurons or sets of neurons. CAT-Neur as any other category, consists of three things, i. a set of objects O, ii. a set of morphisms Mor(A, B) for any two objects A, B of O, and iii. a rule of composition that fulfills the properties of associativity and identity.
Fig. 13.4 (X, d) is a metric space where X is the set of place cells in the hippocampus and d the Euclidean metric distance; (Y, e) is a metric space in the two-dimensional plane with identical distance e = d. The mapping between the metric spaces f : X → Y preserves the distances if e(f(x1), f(x2)) = d(x1, x2). f is said to be an isometry and is immediately a monomorphism (proof: x1 ≠ x2 implies e(f(x1), f(x2)) = d(x1, x2) ≠ 0, hence f(x1) ≠ f(x2)). An isometry that is an epimorphism is an isomorphism
We identify three possible definitions for the category CAT-Neur that may be useful for the development of the theory introduced in this paper: Neur, Neur+ and Neur*. The category Neur has neurons as objects and the synaptic paths between them as morphisms, with the convolution of paths as composition. The category Neur* is the category of neurons where the objects are topological spaces of neurons (N, θ) and the morphisms are continuous maps. A function between two topological spaces f : (N, θ) → (M, υ) is continuous if f⁻¹(B) ∈ θ whenever B ∈ υ. The category Neur+ has metric spaces as objects and, as morphisms, Lipschitz maps with λ = 1 that preserve distances. Note that a Lipschitz map is always continuous but the contrary is not true. The morphisms in Neur+ preserve distances between the metric spaces which will exemplify neural assemblies.
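A toy rendering of the category Neur in code may help fix ideas: objects are neurons, morphisms are synaptic paths represented as tuples of neurons, composition is path concatenation and the identity at a neuron is the trivial path. The class and names below are illustrative assumptions, not part of the formal development.

```python
class Neur:
    """Toy version of the category Neur: objects are neurons, morphisms are
    synaptic paths (tuples of neurons), and composition is path concatenation."""

    def __init__(self, neurons, synapses):
        self.neurons = set(neurons)            # objects
        self.synapses = set(synapses)          # directed edges (pre, post)

    def identity(self, n):
        return (n,)                            # the trivial path at n

    def compose(self, p, q):
        """Compose path p : A -> B with path q : B -> C to get a path A -> C."""
        assert p[-1] == q[0], "paths are not composable"
        return p + q[1:]

cat = Neur({"A", "B", "C"}, {("A", "B"), ("B", "C")})
f = ("A", "B")
g = ("B", "C")
h = cat.compose(f, g)                          # ("A", "B", "C")
# Associativity and identity laws hold trivially for concatenation:
assert cat.compose(cat.identity("A"), f) == f
assert cat.compose(f, cat.identity("B")) == f
print(h)
```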
13.2.2 The Category of Places Now we will define a category for modeling place fields, that is, the physical locations that produce the spike firing in the grid cells and place cells. Following the previous definition for Neur, the category Field+ has as objects metric spaces (including hexagonal grids) and as morphisms contractions (a specific class of functions between metric spaces). And the category Field* is composed of topological spaces as objects and continuous functions as morphisms. The category of metric spaces is of course defined by objects and morphisms. The objects are metric spaces (X, d) and the morphisms are mappings between metric spaces (X, d) → (X′, d′) (Fig. 13.4). As in any other category, the composition of morphisms must satisfy associativity and identity. A metric space is a structure (X, d) with X being a set and the function d : X × X → R⁺ satisfying:
1. d(x, y) = 0 if and only if x = y
2. d(x, y) = d(y, x), and
3. d(x, z) ≤ d(x, y) + d(y, z).
Typically the function d is assumed to be the Euclidean distance. The Euclidean distance is a map d : Rⁿ × Rⁿ → R⁺. For n = 2 the distance is d((x1, y1), (x2, y2)) = √((x1 − x2)² + (y1 − y2)²). Of course, other distances are possible. One example of a metric that satisfies the three axioms above is the "Manhattan distance" d : Rⁿ × Rⁿ → R⁺, where for a two-dimensional space, d((x1, y1), (x2, y2)) = |x1 − x2| + |y1 − y2|.
Definition 13.1 A mapping f : (X, d) → (X′, e) preserves distances if for every pair of points x1, x2 ∈ X it holds that d(x1, x2) = e(f(x1), f(x2)).
Definition 13.2 A function f : (X, d) → (Y, e) between two metric spaces is continuous at x0 ∈ X if for all ε > 0 there exists δ > 0 such that if d(x, x0) < δ then e(f(x0), f(x)) < ε.
Definition 13.3 A contraction is a Lipschitz map with λ < 1, while a map between two metric spaces f : (X, d) → (X′, e) such that d(x1, x2) = e(f(x1), f(x2)) is a distance-preserving map. Note that every Lipschitz map is continuous, and as contractions are Lipschitz maps with λ < 1, contractions are continuous [2].
Now we are able to define the category Met of metric spaces and Lipschitz maps that are structure-preserving maps. The composition of Lipschitz maps, g ◦ f, is a Lipschitz map, and the properties of associativity of composition and identity, id_X : (X, d) → (X, d), are trivially demonstrated. Topological spaces are useful when we are interested in closeness and continuity rather than in distance, as is the case in metric spaces. The category of topological spaces Top is one that has topological spaces as objects and continuous maps as morphisms.
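The distinction between contractions, general Lipschitz maps and distance-preserving maps can be probed numerically. The sketch below estimates the worst ratio e(f(x1), f(x2)) / d(x1, x2) over sample pairs; a maximum below 1 is consistent with a contraction, and a ratio of exactly 1 on all pairs with an isometry. The sample points and test maps are arbitrary illustrations.

```python
import numpy as np

def lipschitz_estimate(f, points, d, e):
    """Estimate the Lipschitz constant of f between metric spaces (X, d) and (Y, e)
    by taking the worst ratio e(f(x1), f(x2)) / d(x1, x2) over sample pairs.
    A value <= 1 is consistent with a contraction; a ratio that is always exactly 1
    is consistent with a distance-preserving (isometric) map."""
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = d(points[i], points[j])
            if dx > 0:
                ratios.append(e(f(points[i]), f(points[j])) / dx)
    return max(ratios)

euclid = lambda p, q: float(np.linalg.norm(np.asarray(p) - np.asarray(q)))
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]

print(lipschitz_estimate(lambda p: (0.5 * p[0], 0.5 * p[1]), pts, euclid, euclid))  # 0.5 -> contraction
print(lipschitz_estimate(lambda p: (p[1], -p[0]), pts, euclid, euclid))             # 1.0 -> isometry (rotation)
```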
13.2.3 Functor Between Neur and Field At this point we wish to define the relation between the Neur and Field categories that have been defined, using the concept of functor. Let us consider that Neur+ is a category whose objects are sets of neurons and the arrows all the functions between them. In the case of considering only one place cell, the category Neur+ is a set of a single element or singleton. For an object of a given category C, there is a unique functor F : C → 1. Thus, there is a unique functor from the category of metric spaces and Lipschitz distance-preserving maps, Field+, to the category of one single place cell, 1. Functors preserve isomorphisms, so given the functor F : C → D, the isomorphisms in category C are preserved in category D. An interesting feature of functors is that they may preserve properties. For example, since functors preserve composition of morphisms, ◦, and identities, id, they preserve every property that can be positively expressed in the language of ◦ and id. In particular they preserve commutativity of diagrams [1]. So given a functor F : C → D, for certain objects, arrows or compositions of arrows in category C that have the property p, the functor F brings such property to the F-image.
Definition 13.4 Let C and C′ be two categories; a covariant functor F from C to C′ is defined as i. a rule which associates with every object A in C an object F(A) in the category C′, and ii. a rule that associates with every morphism α : A → B in C a morphism F(α) : F(A) → F(B) in the category C′. Then F must satisfy the following two conditions:
ii.a The composition is preserved: for the diagram A −α→ B −β→ C in C (the notion of diagram is formally defined in Sect. 13.3.1), F(α ◦ β) = F(α) ◦ F(β).
ii.b Identities are preserved: for any object A in the category C, F(idA) = idF(A).

Now, the functor (more precisely, a covariant functor) from a category of neurons CAT-Neur to the category Met of metric spaces, F : CAT-Neur → Met, is such that:
i. Every object N in the category of neurons CAT-Neur is mapped onto an object F(N) in the category Met.
ii. Every morphism α : N → N' in CAT-Neur is mapped onto a morphism F(α) : F(N) → F(N') in the category Met, and F preserves composition and identity:
ii.a The composition is preserved: for A −α→ B −β→ C in CAT-Neur, F(α ◦N β) = F(α) ◦M F(β) (both sides of the equation are morphisms in Met).
ii.b Identities are preserved: for any object A in the category CAT-Neur, F(idA) = idF(A) (both sides of the equation are morphisms in Met).

The physiological interpretation of the functor is as follows. Condition i means that any object N in the category of neurons CAT-Neur can have a metric space (X, d) associated with it. As stated in Sect. 13.2.1, the objects of category CAT-Neur are sets of neurons. Note that this is different from assigning a location to a set of neurons; rather, we are associating a set of neurons with a metric space, N → (X, d), where d : X × X → R+. For example, let Met1 be the category of planar metric spaces of diameter 1, (M, υ), that is, d(m, m') ≤ 1 for all m, m' ∈ M, with M an open ball. Then F(N), with F : N → (M, υ), expresses that the object N, a set of neurons, falls within a circle of diameter 1 in the two-dimensional space M. On the other hand, if we take for the category CAT-Neur the category Neur, then condition ii can be interpreted as follows: whenever there is a synapse between two neurons n, n', α : n → n', there is a relationship between the metric spaces associated with the two synaptic neurons, F(α) : F(N) → F(N'), such that F is a map that preserves composition and identity.

Condition ii.a, if A −α→ B −β→ C then F(α ◦ β) = F(α) ◦ F(β), simply means that the map associated with a synaptic path is equivalent to the composition of the maps associated with its synapses. The last requirement, that identity is preserved, can be interpreted as saying that there is always a metric space for any neuron. It should be remarked that the functor F defined here does not preserve the metric defined in the category Met. This is in accordance with the empirical fact that the brain has no metric, or at least not a Euclidean-like metric based on the concept of distance. Indeed, what F does is to carry the structure of the category of neurons over to the category of metric spaces Met. The very different nature of the two categories being mapped by F makes it difficult to see how F works, so we try to make this point clearer with an example. Let the objects of Neur be place cells, that is, neurons that fire when the animal occupies a particular position on a plane surface, for example a maze or a box. The metric space for the environment is given by the category Met. Every synapse α couples two place cells N and N' in Neur, and F(N) and F(N') are called the place fields of N and N' respectively in the category Met. Thus, the mapping F, in order to be a functor, needs to be a structure-preserving map between Neur and Met, the two categories being mapped by F. In the case that CAT-Neur is Neur, whose objects are neurons, the relationship between the place field of the postsynaptic cell F(N') and the place field of the presynaptic cell F(N) may be exemplified by d(F(Ni), F(Nj)) ≤ d(Ni, Nj), where Ni, Nj are in category Neur and F(Ni), F(Nj) in category Met.
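The functor F can also be illustrated with a toy computational sketch. The neuron names, the assignment of place-field centres and the path encoding below are all invented for illustration; the chapter only requires that F send objects to objects, arrows to arrows, and preserve composition and identities.

```python
# Toy functor F from a small category of place cells (objects: neurons,
# arrows: synaptic paths) to the category Met of planar metric spaces.
import math

neurons = ["n1", "n2", "n3"]                      # objects of Neur
synapses = [("n1", "n2"), ("n2", "n3")]           # generating arrows (synapses)

# F on objects: each place cell is sent to (the centre of) its place field,
# a point of the planar metric space representing the environment.
F_obj = {"n1": (0.0, 0.0), "n2": (0.3, 0.4), "n3": (0.6, 0.8)}

def d(p, q):
    """Metric of the target category Met (Euclidean distance)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Arrows are encoded as paths (tuples of neurons); composition is path
# concatenation, and the identity on n is the length-one path (n,).
def compose(path_ab, path_bc):
    assert path_ab[-1] == path_bc[0]
    return path_ab + path_bc[1:]

# F on arrows: a synaptic path from n to n' is sent to the pair of place
# fields (F(n), F(n')), i.e. to an arrow between the associated spaces.
def F_arrow(path):
    return (F_obj[path[0]], F_obj[path[-1]])

# Functoriality on this toy example:
ab, bc = synapses
assert F_arrow(compose(ab, bc)) == (F_obj["n1"], F_obj["n3"])   # composition preserved
assert F_arrow(("n2",)) == (F_obj["n2"], F_obj["n2"])           # identities preserved

print("distance between the place fields of n1 and n2:", d(F_obj["n1"], F_obj["n2"]))
```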
13.3 A New Framework for Place and Grid Cells
Here we propose a radically new theoretical framework for the formation of place cells from grid cells. The computational models of the hippocampus [3, 4, 18] state that the sum of a set of elements, grid cells, directly produces another element, a place cell. In doing so, these models take for granted that the properties of the sum are directly reducible to those of its components. This strict form of reductionism is at odds with the nature of complex systems. It is necessary to tackle the system as a whole, and to bring to light the way in which the components interact, producing higher levels of functionality that emerge from complexity and are exemplified in new systemic properties not present in the single components. It might be remarked here that this is not a criticism of the reductionist approach. Indeed, reductionist analysis is arguably the best plan of action one may follow in order to understand how a system works. But this is just half of the work: the synthetic endeavor must follow the analysis. In what follows, we describe the effect of injecting the concepts of coproduct and colimit from category theory into the problem of place cell formation in the hippocampus.
The classical reductionist credo states that the whole is no more than the sum of its parts. Therefore the properties of the sum are reduced to those of its components, without introducing new properties. This is what the categorical concept of coproduct exemplifies: in a given category, all one needs to know about in order to build the coproduct ∐i Ai are the components Ai; this is possible because all the components play a symmetrical role in the construction.

Definition 13.5 A coproduct of two objects A and B is an object A + B together with the arrows ι1 : A → A + B and ι2 : B → A + B, such that for any object C and any pair of arrows α : A → C, β : B → C there exists a unique morphism π : A + B → C making the diagram commute, that is, π ◦ ι1 = α and π ◦ ι2 = β.

The coproduct generalizes to the direct sum of a family of objects Ai [14]: the coproduct ∐i Ai comes with injections πi : Ai → ∐i Ai, and for any object C and any family of arrows αi : Ai → C there is a unique morphism h : ∐i Ai → C with h ◦ πi = αi for every i.
On the other hand, the more general concept of colimit embodies the collective operations made by the family of components Ai which are made possible because the components cooperate by means of the links that connect them [5]. The coproduct defined before is actually a special case of a colimit. The colimit in a category of a family of components Ai without any arrow between them is identical to the coproduct. A precise definition of colimit will be introduced later.
Fig. 13.5 The family of objects A1, A2, A3, A4 has both a colimit cP and a coproduct ∐i Ai. The coproduct is linked by s to the colimit. The link s expresses the transit from the coproduct to the colimit and embodies the symmetry breaking in the relationship between the family of objects Ai and the colimit
Fig. 13.6 A colimit K for the base diagram D. For the sake of clarity, the diagram D shown in the figure has three objects Ai, i = 1, 2, 3
The colimit, contrary to the coproduct, entails a non-symmetric relationship with its components. As Fig. 13.5 depicts, the coproduct can be compared with the colimit cP, and this symmetry-breaking process may be quantified, in a sense, by the arrow s.
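The contrast between coproduct and colimit can be made concrete in the category of finite sets, where both are easy to compute. The sketch below is illustrative only, with invented sets and an invented link: it builds the coproduct as a tagged disjoint union, in which every component enters symmetrically, and then glues two components along a link, the symmetry-breaking step that turns the construction into the colimit of the linked diagram.

```python
# Coproduct versus colimit, computed in the category of finite sets.
# The sets A1, A2 and the link relating them are invented for the example.

A1 = {"a", "b"}
A2 = {"b", "c"}   # the label "b" also appears here, but as a set A2 is unrelated to A1

# Coproduct: tagged disjoint union.  Every component enters symmetrically and
# nothing beyond the components themselves needs to be known.
coproduct = {(1, x) for x in A1} | {(2, x) for x in A2}
iota1 = lambda x: (1, x)     # injection A1 -> A1 + A2
iota2 = lambda x: (2, x)     # injection A2 -> A1 + A2

# A link between the components: identify "b" in A1 with "b" in A2.  The
# colimit of the diagram containing this link is the quotient of the
# coproduct by the equivalence the link generates (a gluing of components).
link = [("b", "b")]          # pairs (element of A1, element of A2) to be glued

def glue(disjoint_union, link):
    parent = {e: e for e in disjoint_union}
    def find(e):
        while parent[e] != e:
            e = parent[e]
        return e
    for x, y in link:
        parent[find((1, x))] = find((2, y))
    classes = {}
    for e in disjoint_union:
        classes.setdefault(find(e), set()).add(e)
    return list(classes.values())

print(sorted(coproduct))            # 4 elements: the purely symmetric sum
print(glue(coproduct, link))        # 3 classes: the link has broken the symmetry
```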
13.3.1 Place Field as Colimit of Grid Fields
The hypothesis proposed here is that the cooperation of the grid fields gives rise to a colimit which is a place field. Thus the colimit of the metric system depicted in Fig. 13.6 can be seen as an integrator of the information contained in the components of the metric system. It might be remarked that the colimit is an object of the category Field, a sort of complex object that actualizes the internal organisation of the objects it binds. Colimits and limits do not exist for all diagrams in all categories, but when they exist they are unique up to isomorphism. The mathematical definition of colimit needs a prior definition, that of diagram, which is a precise concept in category theory.

Definition 13.6 A diagram D in a category C is a collection of vertices and directed edges consistently labeled with objects and arrows of the category C. Thus, if an edge in the diagram D is labeled with an arrow f such that f : A → B in C, then the vertices of this edge in the diagram D must be labeled A and B [14].

Definition 13.7 Let D be a diagram in a category C with objects labeled Di and morphisms labeled fk : Di → Dj. A cocone K for the diagram D is an apical object B together with a set of morphisms gi : Di → B forming a commutative diagram, that is, gj ◦ fk = gi.
Fig. 13.7 A colimit K for a diagram is a cocone defined in terms of the existence of morphisms from other cocones K'
Fig. 13.8 The figure depicts a colimit where (4) acts as the place field of a place cell (6) in the hippocampus. The colimit is produced by several grid fields (one grid field (1) is produced by one grid cell (5))
Given two cocones K and K' for D, with apical objects B and B', a cocone morphism h : B → B' is a morphism in C such that g'i = h ◦ gi. To simplify the notation we denote the cocone morphism determined by h as h : K → K'. It follows directly that the cocones form a category, the category of cocones cocD.
Definition 13.8 A colimit for the diagram D is an initial object K in the category cocD, that is, for any other cocone K' for diagram D there exists a unique cocone morphism h : K → K'. It follows from the definition that all colimits are isomorphic, because all initial objects are isomorphic.
Figure 13.8 shows that grid fields and grid cells in the medial entorhinal cortex (mEC) are linked by a map, just as there is a map between place cells and place fields. Therefore for each grid cell there is a grid field, which is a metric space whose arrangement generates regular hexagons. For each place cell there is one place field, which is also an object of the category of metric spaces, Field, but in this case its geometry is a simple point rather than a hexagon.
We can assume that the neurons, place cells and grid cells, depicted at the bottom of the figure are in the category Neur, which has neurons as objects and synaptic connections as morphisms. However, this is not always the case. For example, a neural category whose objects contain several neurons connected among themselves, forming populations of neurons, has neuronal assemblies as objects rather than single neurons. Along this line, it is particularly valuable to shed light on how populations of grid cells contribute to the formation of one place cell. The colimit is the mathematical structure that allows us to encode the emergence of the place field and the relationships between grid fields.
Now let us focus on the grid fields depicted as hexagons in Fig. 13.8 and their morphisms. It has been said above that regular hexagons are objects in the category Field; now we need to investigate the morphisms between the grid-field objects in this category. A contraction between two grid-field objects (G1, d, o, ψ), (G2, d, o, ψ) is a continuous function f : (G1, d, o, ψ) → (G2, d, o, ψ) satisfying d(f(x), f(y)) ≤ d(x, y) and o(f(x), f(y)) ≤ o(x, y). This restriction is in accordance with the experimental finding that spacing in grid fields increases along the dorsoventral axis of the medial entorhinal cortex (mEC). This fact appears to be correlated with the increase in size of place fields along the dorsoventral axis of the hippocampus [8, 9]. Neighboring cells in the mEC have similar spacing and orientation. However, there is no evidence that anatomical cell clusters correspond to functionally segregated grid maps with their own spacing and orientation [11]. On the other hand, the phase of the grid does not follow the restriction of continuity that spacing and orientation obey. Indeed, the firing vertices of colocalized grid cells are shifted randomly, that is to say, the mapping between vertices in the grid field and the external reference grid is not continuous. This is in fact how the fields of neighboring hippocampal place cells behave.
That the colimit satisfies a universal property is a remarkable fact that deserves to be explained. When a mathematical construction, in our case a colimit, satisfies a universal property, one can forget the details of the structure and focus on the universal property, because everything that has to be known about the colimit is exemplified in it. One important point that needs emphasis is that the existence of a colimit imposes constraints not only on the diagram of grid cells that determines the colimit, but also on all the objects of the category. Besides, the colimit, if it exists, is uniquely determined (up to isomorphism), but the reverse is not true: one colimit can have several decompositions. Put in the context of Fig. 13.8, this means that when the coordinated activity of a group of grid cells produces a place cell, this is a colimit and it is unique. But given a place cell, its place field cannot be uniquely determined by a group of grid cells; as a matter of fact, several grid fields are possible for that place field.
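For comparison with the colimit reading, the summation scheme used by the standard computational models (e.g. [3, 17]) can be sketched numerically. The spacings, orientation, common phase and threshold below are invented for illustration; the sketch only shows that summing several periodic grid fields with coincident firing phases yields a single localized bump, the place-field-like object whose formation the colimit framework reinterprets.

```python
import numpy as np

def grid_field(x, y, spacing, orientation, phase=(0.0, 0.0)):
    """Idealized hexagonal grid field: sum of three cosine gratings at 60 degrees.

    This is the usual idealization in summation models of place-field
    formation; all parameter values here are illustrative.
    """
    g = 0.0
    for i in range(3):
        theta = orientation + i * np.pi / 3.0
        k = (4 * np.pi / (np.sqrt(3) * spacing)) * np.array([np.cos(theta), np.sin(theta)])
        g = g + np.cos(k[0] * (x - phase[0]) + k[1] * (y - phase[1]))
    return g / 3.0

# Environment: a 1 m x 1 m arena sampled on a regular grid of locations.
xs, ys = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))

# Several grid cells with increasing spacing (as along the dorsoventral mEC
# axis) but a common phase, so their peaks coincide at one location.
spacings = [0.3, 0.42, 0.59, 0.82]
fields = [grid_field(xs, ys, s, orientation=0.1, phase=(0.5, 0.5)) for s in spacings]

# Standard model: the place field is the (thresholded) sum of the grid fields.
summed = sum(fields)
place_field = summed > 0.8 * summed.max()

peak = np.unravel_index(np.argmax(summed), summed.shape)
print("place-field peak at", xs[peak], ys[peak])      # approximately (0.5, 0.5)
print("fraction of arena covered:", place_field.mean())
```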
13.4 A Theory of Declarative Memory (Episodic and Semantic) Based on Category Theory The dual role of the hippocampus in formation and retrieval of concepts is not surprising, especially considering that the formation of new memory (knowledge) requires the retrieval of the old one. Thus, memory is knowledge, and perception is a condition of possibility of memory and therefore of knowledge. Just as any other higher cognitive function, to try to give a definition of memory seems hopeless. The definition in the MIT encyclopedia of cognitive sciences [19] is open enough to satisfy everyone: “the term memory implies the capacity to encode, store, and retrieve information”. However, it is also too unrestricted to provide a clear idea about what memory is and how it works.
Certainly, memory is not a univocal term; it has several forms that depend on different brain mechanisms. So a well-founded strategy for understanding how memory works is to investigate how such a cognitive process is implemented in the brain. The idea behind this is that the layman's view of memory, which is still commonly used, will become irrelevant once the biological mechanisms of memory have been uncovered and, if possible, described in mathematical terms. The main point explored under this heading is that, despite the diverse nature of episodic and semantic memory, it is possible to connect them via categorical objects like product, pullback or colimit. Let us begin by introducing the categorical product and its application to a navigational task in one dimension, after which the results will be expanded to navigation in a two-dimensional arena and the use of the categorical concept of pullback.
13.4.1 Categorical Product in Acquisition of Middle Point Concept in 1D Navigation
Suppose a rat is placed on a track (a one-dimensional environment); the animal immediately starts moving back and forth in order to get an idea of the dimensions of the environment. As the rat moves from one point to the other, episodic memories are created. Thus the animal is able to associate self-centered information with the temporal order in which the different positions are reached. Episodic memories are not explicit; explicit memories may be retrievable independently of the internal state of the rat. Suppose there is no particular visual or olfactory stimulus that can make the rat remember any particular position. One may think that after a while the rat will acquire an explicit memory, for example the concept of middle point, which exemplifies the position on the track from which it needs the same amount of time to reach either of the extremes. A cognitive behavior involves integration of information, and the categorical concept of product is a formalisation of integration. Moreover, as will be shown later, a product in a category that admits a final object is an instance of a more general categorical form, the pullback.

Definition 13.9 In a given category C, a product of two objects A and B is another object P equipped with two morphisms, p1 : P → A and p2 : P → B, such that for any pair of morphisms x1 : X → A and x2 : X → B there is a unique morphism h : X → P making the diagram commute, that is, p1 ◦ h = x1 and p2 ◦ h = x2.
In such diagrams the morphism h is conventionally drawn as a broken (dashed) arrow to indicate that it is unique. The morphisms p1, p2 are usually called projection morphisms. The main characteristic of a product is that the constituents are retrievable via the projection morphisms.
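In the category of finite sets the product and its universal property can be computed directly. The sketch below is illustrative only: the finite sets and maps stand in for the behavioural quantities of the example and are not taken from the chapter.

```python
# Categorical product in the category of finite sets.
# W_A and W_B stand for two finite sets of observations associated with the
# walls; the elements are invented for the example.

W_A = {"touch_A_early", "touch_A_late"}
W_B = {"touch_B_early", "touch_B_late"}

# Product object P = W_A x W_B with its two projections p1, p2.
P = {(a, b) for a in W_A for b in W_B}
p1 = lambda pair: pair[0]
p2 = lambda pair: pair[1]

# Any other object X with maps x1 : X -> W_A and x2 : X -> W_B ...
X = {"run1", "run2"}
x1 = {"run1": "touch_A_early", "run2": "touch_A_late"}
x2 = {"run1": "touch_B_late", "run2": "touch_B_early"}

# ... factors uniquely through P via h(x) = (x1(x), x2(x)).
h = {x: (x1[x], x2[x]) for x in X}

# The two triangles commute: p1 . h = x1 and p2 . h = x2.
assert all(p1(h[x]) == x1[x] for x in X)
assert all(p2(h[x]) == x2[x] for x in X)
print(h)
```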
Fig. 13.9 WA and WB are the two walls that the rat will meet when moving in a single track maze. After reaching both walls, the animal would develop the concept of middle point P
The same diagram, with the walls in place of the generic objects, shows the use of the categorical product for modeling the acquisition of the place memory of the middle point between two walls WA and WB: the product object P comes with projections p1 : P → WA and p2 : P → WB, and any object X with morphisms x1 : X → WA and x2 : X → WB factors through P by a unique morphism h : X → P.
For our purpose, the categorical product given by the object P and the morphisms p1, p2 is a statement about a cognitive behavior of the rat, whereas X and x1, x2 are a constraint on what constitutes a valid product rather than a specific claim about cognition. Note that there is no particular commitment in the morphisms p1, p2. In fact, p1 can mean the travel time to reach wall A, WA, but also the number of steps needed. Figure 13.9 represents a possible experimental setting that could be used to explore the biological plausibility of our theory of the acquisition of the middle point concept in a rat moving in a single-track maze, with P, WA and WB being objects in the category C of memories or mental objects, which will be described in more detail in future work.
13.4.2 Categorical Pullback in Acquisition of Middle Point Concept in 2D Navigation
Now suppose the rat is removed from the one-dimensional track depicted in Fig. 13.9 and placed on a plane. The rat's capacity to build the explicit memory for the middle point of the arena can be seen as analogous to the generalised product, i.e. a pullback.

Definition 13.10 In a category C, a pullback of two morphisms with common codomain, f : A → C and g : B → C, is an object P together with a pair of morphisms p1 : P → A and p2 : P → B that form a commutative square, f ◦ p1 = g ◦ p2. Moreover, the morphisms p1, p2 are universal among such squares: for any pair of morphisms z1 : Z → A and z2 : Z → B such that f ◦ z1 = g ◦ z2,
there exists a unique morphism h : Z → P such that p1 ◦ h = z1 and p2 ◦ h = z2, so that the whole diagram commutes.
A pullback may be seen as a constrained product, the constraint being given by f and g through the condition f ◦ p1 = g ◦ p2. In the navigation example, the corresponding square has vertices P, WA, WB and WC, with projections p1 : P → WA and p2 : P → WB and constraining morphisms f : WA → WC and g : WB → WC.
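Again in the category of finite sets, the pullback of f and g is the subset of the product on which the two maps agree, which makes the "constrained product" reading explicit. The sets and maps below are invented for illustration.

```python
# Pullback in the category of finite sets: the subset of the product on which
# the two maps into the common codomain agree.  Sets and maps are invented.

A = {"a1", "a2", "a3"}
B = {"b1", "b2"}
C = {"c1", "c2"}

f = {"a1": "c1", "a2": "c2", "a3": "c1"}   # f : A -> C
g = {"b1": "c1", "b2": "c2"}               # g : B -> C

# Pullback object P = {(a, b) | f(a) = g(b)} with projections p1, p2.
P = {(a, b) for a in A for b in B if f[a] == g[b]}
p1 = lambda pair: pair[0]
p2 = lambda pair: pair[1]

# The square commutes: f . p1 = g . p2 on every element of P.
assert all(f[p1(pair)] == g[p2(pair)] for pair in P)

print(sorted(P))   # [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b1')]
```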
13.4.3 Pullback and Grid Cell Formation
The concept of pullback may also be useful in dealing with grid cells. The way in which grid cells are calculated in the literature is tricky: one of the three parameters refers to how accurate the representation outside the cell is, so the outcome of the system is being put into its input. In the corresponding pullback square, P can be seen as a grid cell, with the projection morphisms p1 : P → A and p2 : P → B referring to the orientation and the spacing respectively, and with the morphisms f : A → C and g : B → C imposing additional restrictions on the grid cell, such as the constant value of those parameters over the whole arena.
13.5 Discussion
A theory that fully explains and predicts highly complex cognitive abilities such as perception, memory or learning has not yet been produced. Our society needs to deal with problems such as Alzheimer's disease, which is ravaging a large sector of the population. It goes without saying that shedding light on the role played by the hippocampal system in cognitive functions like memory and learning can be of extraordinary value for the future of our own species. We must exploit the important fact that, from the point of view of neurobiological knowledge, memory and perception share the same neural substrate. The time is ripe for a mature and rigorous approach to brain structure and function that sets the basis for a shareable scientific framework, able to convey knowledge that is commonly understandable among the different actors in the brain sciences.
References
1. Adámek, J., Gumm, H.P., Trnková, V.: Presentation of set functors: a coalgebraic perspective. J. Log. Comput. 20(5), 991–1015 (2010)
2. Arbib, M.A., Manes, E.G.: Arrows, Structures, and Functors: The Categorical Imperative. Academic Press, New York (1975)
3. Arleo, A., Gerstner, W.: Spatial cognition and neuro-mimetic navigation: a model of hippocampal place cell activity. Biol. Cybern. 83, 287–299 (2000)
4. Burgess, N., Recce, M., O'Keefe, J.: A model of hippocampal function. Neural Netw. 7, 1065–1081 (1994)
5. Ehresmann, A.C., Vanbremeersch, J.-P.: Memory Evolutive Systems; Hierarchy, Emergence, Cognition. Elsevier, Amsterdam (2007)
6. Fyhn, M., Molden, S., Witter, M.P., Moser, E.I., Moser, M.-B.: Spatial representation in the entorhinal cortex. Science 305, 1258–1264 (2004)
7. Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., Moser, E.I.: Microstructure of a spatial map in the entorhinal cortex. Nature 436, 801–806 (2005)
8. Jung, S.I., Wiener, M.W., McNaughton, B.L.: Comparison of spatial firing characteristics of units in dorsal and ventral hippocampus of the rat. J. Neurosci. 14, 7347–7356 (1994)
9. Kjelstrup, K.B., et al.: Very large place fields at the ventral pole of the hippocampal CA3 area. Neurosci. Abstr. 33(93), 1 (2007)
10. Milford, M.J.: Robot Navigation from Nature. Springer, Berlin (2008)
11. Moser, E., Kropff, E., Moser, M.: Place cells, grid cells, and the brain's spatial representation system. Annu. Rev. Neurosci. 31, 69–89 (2008)
12. Muller, R.U., Kubie, J.L., Ranck, J. Jr.: Spatial firing patterns of hippocampal complex-spike cells in a fixed environment. J. Neurosci. 7, 1935–1950 (1987)
13. O'Keefe, J., Nadel, L.: The Hippocampus as a Cognitive Map. Oxford University Press, Oxford (1978)
14. Pierce, B.: Basic Category Theory for Computer Scientists. MIT Press, Cambridge (1991)
15. Ramón y Cajal, S.: Histologie du Système Nerveux de l'Homme et des Vertébrés, vols. 1 and 2. Maloine, Paris (1911)
16. Redish, A.: The hippocampal debate: are we asking the right questions? Behav. Brain Res. 127(935) (2001)
17. Solstad, T., Moser, E., Einevoll, G.: From grid cells to place cells: a mathematical model. Hippocampus 16, 1026–1031 (2006)
18. Touretzky, D.S., Redish, A.D.: Theory of rodent navigation based on interacting representations of space. Hippocampus 6(3), 247–270 (1996)
19. Wilson, R.A., Keil, F.C. (eds.): The MIT Encyclopedia of Cognitive Sciences. MIT Press, Cambridge (2001)
20. Zilli, E.A., Yoshida, M., Tahvildari, B., Giocomo, L.M., Hasselmo, M.E.: Evaluation of the oscillatory interference model of grid cell firing through analysis and measured period variance of some biological oscillators. PLoS Comput. Biol. 5(11), e1000573 (2009)
Chapter 14
The Role of Feedback in a Hierarchical Model of Object Perception Salvador Dura-Bernal, Thomas Wennekers, and Susan L. Denham
Abstract We present a model which stems from a well-established model of object recognition, HMAX, and show how this feedforward system can include feedback, using a recently proposed architecture which reconciles biased competition and predictive coding approaches. Simulation results show successful feedforward object recognition, including cases of occluded and illusory images. Recognition is both position and size invariant. The model also provides a functional interpretation of the role of feedback connectivity in accounting for several observed effects such as enhancement, suppression and refinement of activity in lower areas. The model can qualitatively replicate responses in early visual cortex to occluded and illusory contours; and fMRI data showing that high-level object recognition reduces activity in lower areas. A Gestalt-like mechanism based on collinearity, coorientation and good continuation principles is proposed to explain illusory contour formation which allows the system to adapt a single high-level object prototype to illusory Kanizsa figures of different sizes, shapes and positions. Overall the model provides a biophysiologically plausible interpretation, supported by current experimental evidence, of the interaction between top-down global feedback and bottom-up local evidence in the context of hierarchical object perception.
14.1 Introduction
Although, traditionally, models of the visual system have focused on feedforward processes, it is becoming increasingly clear that these are limited in capturing the wide range of complexities involved in visual perception. Recent reviews [4, 27] suggest that only approximately 20% of the response of a V1 neuron is determined by conventional feedforward pathways, while the rest arises from horizontal and feedback connectivity. Anatomically, feedforward sensory pathways are paralleled by a greater number of top-down connections, which provide lower areas with massive feedback from higher cortical areas [6]. Feedback terminations in the primary visual cortex (V1) are functionally organized and well-suited for centre-surround interactions, and unlike horizontal connections, their spatial and temporal properties have been found to provide an explanation for extra-classical distal surround effects [1]. Experimental evidence shows that feedback originating in higher-level areas, such as V4, IT or MT with bigger and more complex receptive fields, can modify and shape V1 responses, accounting for contextual or extra-classical receptive field effects [13, 16, 21, 25, 36, 39]. Nonetheless, the role of feedback is still far from being understood, as highlighted by the apparently contradictory effects observed in these experiments, suggesting response fields based on an intricate interaction between stimuli, surrounding context, attentional priors, and previous experience [8, 12]. A notable example
is observed in V1/V2 activity in response to illusory contours with no direct retinal stimulation (e.g. Kanizsa figures), as confirmed by ERP, EEG, MEG and fMRI studies [23]. The experiments show illusory contour-related activity emerging first in Lateral Occipital Cortex (LOC), then V2 and finally in V1, strongly suggesting that the response is driven by feedback [21, 24]. Another remarkable study showed that the feedback-mediated response in foveal retinotopic cortex contains information about objects presented in the periphery, far away from the fovea [41], revealing that even the fundamental concept of a receptive field might be an inappropriate and misleading way to characterize feedback. While there is relative agreement that feedback connections play a role in integrating global and local information from different cortical regions to generate an integrated percept [2, 22], several differing (though intersecting) approaches have attempted to explain the underlying mechanisms. In predictive coding, derived from Bayesian principles, each level attempts to predict the responses of the next lower level via feedback connections, while feedforward activity carries the error signals. Making predictions at many temporal and featural scales is an effective strategy for discovering novelty, and for verifying and refining the accuracy of internal representations, which in turn allows the system to generate better predictions. Supporting experimental evidence shows the suppression of, hypothetically error signalling, neural activity which can be contextually explained by higher-levels [11, 30]. In this paper we explore the role of feedback in object perception, taking as a starting point a biologically inspired hierarchical model of object recognition [31, 33], and extending it by adding feedback connectivity based on a biased competition architecture [5]. Hence, the model not only achieves successful feedforward recognition invariant to position and size, but is also able to reproduce modulatory effects of higher-level feedback on lower-level activity. Finally, we extend the model to integrate a mechanism based on lateral connectivity, which solves the conceptual barriers present in obtaining spatial precision from invariant high-level abstract object representations. This enables the model to simulate illusory contour completion, present in lower visual areas, even for Kanizsa figures of different sizes and at different positions.
14.2 Methods/Model This section is organized in three parts. Firstly, the feedforward architecture and operations of the hierarchical object recognition system are described. Secondly, the model is extended to include feedback connections and Belief units, which combine bottom-up and top-down information. We use a particular type of biased competition model, in which nodes suppress the inputs instead of neighbouring nodes, and the model is therefore argued to implement predictive coding. Finally, we describe the Gestalt-based algorithm employed to generate the feedback connectivity weights from complex to simple layers. These weights are used in the biased competition model to generate the feedback response in the lower simple layer, by disambiguating top-down activity from the complex layer above.
14.2.1 Feedforward Architecture
We start by describing the basic feedforward architecture and operations which serve as the backbone of the model, and which guided the design and imposed the necessary constraints on the feedback extension. The model attempts to reproduce activity and functionality observed along the ventral or 'what' visual pathway, comprising areas V1, V2, V4 and IT. It is based upon widely accepted basic principles of cortical object recognition, such as a hierarchical arrangement of these areas, with a progressive increase in receptive field size and complexity of preferred stimuli, as well as a gradual build-up of invariance to position and scale as we go further up the hierarchy.
Fig. 14.1 Feedforward architecture. Left: Structure of units. Each location of the image is encoded by a set of units tuned to different stimuli, which increase in size and complexity along the hierarchy. The number of locations and prototypes for each layer is shown. Right: Graphical representation of prototypes at bottom level (32 Gabor filters), intermediate level (200 abstract features/object parts), top level (60 2D objects)
At the lowest level we observe simple V1 neurons, with small receptive fields tuned to basic oriented gratings, while at the other end of the spectrum evidence shows IT neurons associated with complex invariant object-like representations [15]. The architecture of the model is arranged in 3 levels, roughly representing areas V1, V2/V4 and IT, and each level is composed of 2 layers, simple and complex, stemming from the basic Hubel and Wiesel [14] proposal. Two operations are performed in alternating layers of the hierarchy: the invariance operation, which occurs between layers of the same level (e.g. from S1 to C1); and the selectivity operation, implemented between layers of different levels (e.g. from C1 to S2). Invariance is implemented by applying the max function over a set of afferents selective to the same feature but with slightly different positions and sizes, achieving responses which are invariant to translation and scale. An initial training stage is required to learn the S2 prototypes in an unsupervised manner. The 200 prototypes are learned by extracting patches, each one composed of 144 elements (3 × 3 positions × 4 orientations × 4 scale bands), from the C1 layer response. To ensure the prototypes are statistically meaningful we used the k-means algorithm, an unsupervised clustering technique particularly well-suited to extracting Radial Basis Function centres. The resulting prototypes are therefore a weighted combination of C1 invariant Gabor-like features at different orientations, positions and scales. This generates units with bigger RF sizes and with a wide variety of complex shape selectivities, as shown in the small sample in Fig. 14.1. The 60 S3 prototypes are learned in a supervised way, using as input each of the objects in the training set (see Fig. 14.1) and extracting a representative 6 × 6 patch from the C2 response.
Table 14.1 Model feedforward parameters

Parameter                          S1               C1       S2       C2       S3     C3
Operation performed                Gabor filter     MAX      RBF(a)   MAX      RBF    MAX
Pooling area over previous layer   7×7 ... 37×37    14×14    3×3      3×3      6×6    6×6
Shift between units                1                8        1        2        1      1
Number of prototypes               32               16       200      200      60     60
Number of spatial locations        160×160          22×22    22×22    11×11    6×6    1

(a) Radial Basis Function
This patch, which constitutes an abstract (scale- and position-invariant) representation of each of the objects in the training set, is stored as the S3 prototype, which can be used for template matching during the recognition phase. The implementation details for each of the layers are summarized in Table 14.1. The pooling area over the previous layer refers to the number of units used as input to the function (similar to the receptive field). The shift between units is an indication of the overlap between the receptive fields of the upper-level units. The parameters of the model were based on existing HMAX implementations [3, 31, 34], to which we refer for a more detailed description. For supporting neurophysiological evidence see the Discussion section.
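The alternation of selectivity (template matching) and invariance (max pooling) can be illustrated with a stripped-down S1/C1 stage. The filter bank, toy image and pooling parameters below are placeholders, much smaller than the values in Table 14.1, and the code is a sketch of the two generic operations rather than an implementation of the trained model.

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, wavelength=4.0):
    """Small oriented Gabor filter, the S1 tuning function."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(x_t**2 + y_t**2) / (2 * sigma**2)) * np.cos(2 * np.pi * x_t / wavelength)
    return g - g.mean()

def s1_layer(image, kernels):
    """Selectivity: correlate the image with each oriented filter (valid region)."""
    k = kernels[0].shape[0]
    h, w = image.shape
    out = np.zeros((len(kernels), h - k + 1, w - k + 1))
    for f, ker in enumerate(kernels):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[f, i, j] = np.sum(image[i:i + k, j:j + k] * ker)
    return np.abs(out)

def c1_layer(s1, pool=4, shift=2):
    """Invariance: max-pool each orientation map over local neighbourhoods."""
    f, h, w = s1.shape
    rows = range(0, h - pool + 1, shift)
    cols = range(0, w - pool + 1, shift)
    return np.array([[[s1[o, i:i + pool, j:j + pool].max() for j in cols]
                      for i in rows] for o in range(f)])

# Toy input: a vertical bar on a blank background.
image = np.zeros((32, 32))
image[6:26, 15:17] = 1.0

kernels = [gabor_kernel(7, theta) for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
s1 = s1_layer(image, kernels)
c1 = c1_layer(s1)

# The strongest pooled response should come from the filter aligned with the
# bar; position tolerance is produced by the max pooling step.
print("preferred orientation index:", int(np.argmax(c1.max(axis=(1, 2)))))
print("C1 shape:", c1.shape)
```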
14.2.2 Feedback Extension The proposed extended model, shown in Fig. 14.2, employs three different types of units at each level to encode the feedforward error signal (eS and eC units), feedback predictive signal (fS and fC units) and Belief (B units). This yields an architecture similar to a recent model which reconciles biased competition with predictive coding approaches [37]. The main difference is the fact that each level now consists of two layers, i.e. simple and complex. Belief units at each level maintain an active representation of the stimuli, which is multiplicatively modulated by bottom-up error and top-down prediction. The bottom-up error at a given level is a function (fFsel ) of the complex error units (eC) in the level below and is symbolically labeled λ (from likelihood). Complex error units (eC) are a function (fFinv ) of simple error units (eS). These, in turn, are calculated by dividing the Belief (B) at that level by the top-down prediction (π ). The top-down prediction at each level is given by the simple feedback units (fS) at that level and is symbolically labeled π (from prior). Simple feedback units (fS) are a function (fBinv ) of complex feedback units (fC). These, in turn, are a function (fBsel ) of the Belief (B) in the level above. Note Fig. 14.2 is a simplified schematic version of the model architecture, where each oval represents the set of units coding all the different features and locations of that layer. The number and properties of units in each layer is defined by the feedforward parameters described in the previous section. The following simplified equations define the dynamics of the system presented in Fig. 14.2, where in order to generalize the equations, the indices i, [i − 1] and [i + 1] are used to denote the level of
Fig. 14.2 Architecture of the biased competition/predictive coding implementation of the invariant hierarchical object perception model. Left: Schematic representation of the whole model (3 levels). Right: Detailed 'zoomed in' diagram of the intermediate level. Belief units at each level maintain an active representation of the stimuli, which is multiplicatively modulated by bottom-up error and top-down prediction. See text for details
the unit:

Bi^t = Bi^{t−1} · λi^t · πi^t / N
λi^t = fFsel(W_eC[i−1], eC[i−1]^t) + kλ
eCi^t = fFinv(W_eSi, eSi^t)
eSi^t = Bi^t / fSi^t                                        (14.1)
πi^t = fSi^t + kπ
fSi^t = fBinv(W_fSi, fCi^t)
fCi^t = fBsel(W_fCi, B[i+1]^t)

where t = time in discrete steps; kλ = feedforward noise constant; kπ = feedback noise constant; N = normalizing constant; W_eSi, W_eCi, W_fSi, W_fCi = weights between units in different layers; fFsel, fFinv, fBsel, fBinv = functions linking units in different layers; and B1 = Evidence = W_eC0 · Image.
The feedforward functions for selectivity, fFsel, and invariance, fFinv, are derived from the feedforward operations in the HMAX model, namely the Radial Basis Function (approximated as a dot product) and MAX operations respectively. This means that during the initial time step, when we assume there is no top-down modulation, the error units (eS and eC) have properties equivalent to the units in the standard feedforward model (HMAX). To complete the definition of the model we need to specify the feedback selectivity operation, fBsel, which links Belief units (B) with the complex feedback units in the level below (fC). Taking into account that the feedforward Radial Basis Function can be approximated by a weighted sum operation, the feedback function can be trivially obtained by appropriately inverting the prototype weights and applying an analogous weighted sum operation. However, the feedback invariance function, fBinv, requires some additional processing to obtain the appropriate weights. The specific algorithm developed for this purpose is described later in this section. The equations for the feedforward and feedback
functions are as follows:

eS[i+1] = fFsel(eCi) = Σ_{m=1}^{M} Σ_{q=1}^{Q} eCi_{m,q} · W_{m,q}
eCi = fFinv(eSi) = max_{m=1..M} max_{q=1..Q} ( eSi_{m,q} · W_{m,q} )
fCi = fBsel(B[i+1]) = Σ_{n=1}^{N} Σ_{r=1}^{R} B[i+1]_{n,r} · W_{n,r}          (14.2)
fSi = fBinv(fCi) = max_{n=1..N} max_{r=1..R} ( fCi_{n,r} · W_{n,r} )
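A minimal numerical sketch of the update cycle of (14.1) for a single intermediate level is given below. The layer sizes, random weights, flat initialisation and the exact form of the normalisation are invented for illustration (the full model uses the layer sizes of Table 14.1, the learned prototype weights and the ambiguity-dependent constants described next); the sketch only shows the multiplicative combination of bottom-up likelihood and top-down prior into the Belief, and the division by the prediction that yields the error units.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: 8 "complex error" units below feed 5 Belief units at this level,
# which receive feedback from 3 Belief units above.  Weights are random
# placeholders standing in for the learned prototype weights.
n_below, n_here, n_above = 8, 5, 3
W_ff = rng.random((n_here, n_below))       # feedforward selectivity weights
W_fb = rng.random((n_here, n_above))       # feedback selectivity weights

eC_below = rng.random(n_below)             # bottom-up (complex error) input
B_above = np.ones(n_above) / n_above       # Belief above: initially flat
B = np.ones(n_here) / n_here               # Belief at this level: initially flat

k_pi = 1.0                                 # feedback noise constant (fixed to 1)

for t in range(3):
    # Bottom-up likelihood: weighted sum of the error units below (fFsel).
    lam = W_ff @ eC_below
    k_lambda = np.std(B) / (np.std(lam) + 1e-9)   # ambiguity-dependent gain
    lam = lam + k_lambda

    # Top-down prior: feedback reconstruction from the level above (fBsel),
    # used directly here as the simple-feedback prediction fS.
    fS = W_fb @ B_above
    pi = fS + k_pi

    # Normalisation based on the peak likelihood and mean prior, keeping the
    # population activity within range (cf. the normalising constant N).
    N = lam.max() * pi.mean()

    # Multiplicative Belief update, Eq. (14.1).
    B = B * lam * pi / N

    # Error units: Belief divided by the top-down prediction.
    eS = B / fS
    print(f"t={t}  B={np.round(B, 3)}  mean error={eS.mean():.3f}")
```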
By adjusting the values of kλ and kπ in the set of (14.1) we can vary the modulatory effect that the feedforward and feedback components will have on the Belief. We assume kπ is fixed to 1, so feedback is always excitatory, while we make kλ proportional to the ambiguity (measured as the inverse of the standard deviation) of the error signal in relation to the Belief, kλ = σ (Bit )/σ (λti ). This means when the Belief is ambiguous and the error signal is highly informative, the modulatory effect of the error signal will be high (e.g. during the initial time step). In contrast, when the Belief has been already established and the error signal is low, its modulatory effect will be limited. The normalizing constant N determines the overall level of enhancement/reduction of the Belief, and is calculated as N = 1/(max(λti + kλ ) · mean(πit + kπ )). This value tries to ensure relative stability in the resulting Belief, by maintaining the population activity within a certain range. However, due to the intrinsic dynamics of the model, certain values of the distribution will be enhanced while others will be reduced. For the feedback invariance operation, we require a weight matrix which relates a complex node with each of its simple children nodes. Note that this information is not contained in the prototype weight matrix learned during the training phase, as the feedforward invariance (max) operation is nonlinear, and doesn’t employ a weight matrix which can be inverted. In other words, the precise detailed information is lost as one moves up the hierarchy leading to scale and size invariant representations at the top level. Therefore to obtain the required feedback weights we propose a novel disambiguation algorithm which maps the response of one single complex unit to many simple units. To do this the algorithm uses existing feedforward responses as cues and implements extrapolation techniques based on collinearity, co-orientation and good continuation principles [17]. To illustrate this, Fig. 14.3 shows how activity from the C1 layer (orange circles labelled A to D) feeds back to the corresponding set of S1 units (smaller orange circles). The distributed feedback is then disambiguated by using local evidence precisely represented in the S1 layer (black line), and by applying extrapolation methods to generate the feedback weight matrix (green circles). Initially, there is not enough local evidence at t = 1 to disambiguate the feedback from units C and D. Only at t = 2, after the S1 contour has been extended due to the effect of units A and B, can this feedback be disambiguated based on the new local evidence. This effect is only possible due to the overlap present in the model, which means the same S1 unit can receive feedback from several C1 units. The response amplitude, the length and the orientation of the local activity must meet some minimum thresholds in order to be used as disambiguation cues. However, due to the small pooling area covered by each C1 unit (i.e. 14 × 14 S1 units), it is difficult to discriminate between locally similar patches such as 1, 2 and 3 (in Fig. 14.4A). By including information from the surrounding area we can discriminate the patches based on other variables such as the length of the co-oriented line segment and whether the patch belongs to a contour with different
Fig. 14.3 Disambiguation algorithm which calculates the weight matrix for the feedback invariance function. Activity from the C1 layer (orange circles labeled A to D) feeds back to the corresponding set of S1 units (smaller orange circles). The distributed feedback is then disambiguated using local evidence precisely represented in the S1 layer (black line), and applying extrapolation methods to generate the feedback weight matrix (green circles). The resulting feedback weight matrix (WFEEDBACK ) is shown under each S1 patch. See text for details
orientations (curved contour). Therefore, a so-called contextual map is built so that the value of each C1 location is based on the following 2 operations: 1. Sum of collinear adjacent units with the same orientation—this increases the weight of the unit proportional to the length of the co-oriented, collinear line segment to which it belongs. 2. Subtraction of collinear adjacent units with orientations at 45°—this decreases the weight of units which are likely to belong to curved contours. Figure 14.4B shows the horizontal orientation contextual map for the Kanizsa’s square image, where warmer colours indicate locations likely to present horizontal contours as a function of its surround. Note the locations where illusory contours are expected to arise show high values as they are summing activity from the collinear pacmen segments situated at both sides, consistent with interpolation theories. As a result, the combined operations of the disambiguation algorithm obtain a feedback weight matrix which locally maps the activity of each C1 unit to its S1 afferent units. A similar process is applied between layers C2-S2 and C3-S3, although in these cases there are no available properties (such as orientation) over which to apply extrapolation techniques. This means the weight matrix is
Fig. 14.4 Feedback disambiguation through contextual maps. (A) S1 horizontal units response to Kanizsa square. Ambiguous patches 1, 2 and 3 are difficult to differentiate locally due to the small pooling area covered by each C1 unit (i.e. 14 × 14 S1 units). (B) Contextual map for horizontal orientation for Kanizsa square, based on surrounding information helps to discriminate between patches. Warmer colours indicate locations likely to present horizontal contours
obtained simply based on the equivalence between the simple layer prototypes present at the potential feedback locations and the complex layer winner prototype.
14.3 Results
14.3.1 Feedforward Recognition
The network was trained using 60 object silhouette images from which the S2 and S3 prototypes were learned. The trained network was then tested on different transformations of the same images, including occluded, translated and scaled versions (Fig. 14.5A). The experiment was repeated for a second set of 60 natural images (Fig. 14.5B). For the occluded test set, an average of 30% of each image's black pixels is deleted using a rectangular white patch. The rectangle is placed in a position which leaves the image still identifiable to a human observer. In the translated test set the object is moved to a new position within the available image frame of 160 × 160 pixels. The displacement is near the maximum permitted in both directions but depends on the original object size, i.e. small objects allow for bigger displacements. Two different scale sets have been used: scale ±10%, where the image is scaled to either 90% or 110% of the original size and centred; and scale ±20%, where the image is scaled to either 80% or 120% of the original size and centred. Additionally, to test the degree of scale invariance we used an individual object at different scales (±10%, ±20% and ±30%) (Fig. 14.5C). The average percentage of successfully categorized objects was 98% for silhouettes and 88% for natural images, demonstrating the tolerance of the model to variations in object location, scale and occlusion. Correct categorization occurs when, during the first time step, the highest value of the probability distribution over objects at the eC3 layer matches the input object.
14.3.2 Feedback Modulation We compared the activity in the error layers before and after feedback modulation using an object present in the training set, i.e. recognized unambiguously. The Belief and feedback units are set initially to represent flat distributions, under the assumption that no high-level information is known about the input image. Error units in layers eS1 and eC1, which code the difference between the bottom-up input (B1) and the high-level prediction (B2) will show high activity in response to a novel input image which cannot yet be predicted. This activity will then propagate upwards to B2, and again
Fig. 14.5 Feedforward selectivity and invariance results. (A) The model was trained using a dataset containing 60 silhouette objects. It was then tested using occluded, translated and scaled versions of these objects. The results, obtained during the first feedforward pass in the top layer (eC3), show the percentage of correctly categorized objects for each of the categories. An example image is shown for each of the categories. (B) Analogous results for 60 natural images of objects. (C) Size-invariant recognition results. The probability distribution over the 60 objects of the dataset is shown for each of the scaled versions of the same object. Although noise increases proportionally to the scaling percentage, the image is still correctly categorized in all cases
the error populations eS2 and eC2 will yield high activity as the higher level (B3) still provides no prediction. Once B2 and B3 have been updated, the high-level predictions are fed back through the feedback units and reduce the response of the error units. The feedback prediction from B2 will not be completely accurate as the reconstruction is based on a limited number of S2 prototypes. Still, the response in eS1 and eC1 is strongly reduced indicating that the level above captures most of the information present in the input image. Analogously, the response in eS2 and eC2 is also significantly decreased as the top-down feedback from B3 manages to predict the activity in B2. A comparison between the error populations at different layers before and after feedback is shown in Figs. 14.6A and 14.6B. Three different sources are combined multiplicatively to generate the Belief in level 2: the Belief in the previous time step, the bottom-up error and top-down prediction. During the first time step the Belief is entirely determined by the bottom-up error as the other two sources are assumed to represent flat distributions. In contrast, during the next cycle feedback from B2 manages to predict lower-level input, so the bottom-up error will be strongly reduced and present a relatively flat distribution (due to the noise constant). Features which have not been reconstructed correctly will have slightly higher values, while features which are completely captured by the top-level prediction will have lower values. Generally, this means the error signal will have a low modulating effect on the resulting Belief, resulting in a relatively homogeneous suppression of all features, which can be seen as a kind of self-normalization.
Fig. 14.6 Feedback modulation results. (A) Probability distribution over the 200 S2 features for level 2 simple error units (eS2) at a specific location, before (red line) and after (blue dotted line) feedback. After feedback from level 3 (B3) manages to correctly predict activity in B2, the error population activity is reduced. (B) Visuotopic representation of the response of level 1 complex error units (eC1), summing over the 4 orientations of size 7 pixels, to a penthagon input image, before (left) and after (right) feedback. The feedback prediction from level 2 (B2) can approximately reconstruct the input image, and therefore the response of the error units is strongly reduced. (C) Response of Belief units before (red line) and after feedback (blue dotted line). Only a small subpopulation of units consistent with the high-level prediction is enhanced, while the rest are suppressed. (D) Comparison between the model and Murray’s [25] results on perceptual grouping. Activity in higher level areas increases when perceptually grouped stimuli (2D shape) are used instead of random lines. At lower levels the opposite effect is observed, as a consequence of lower-level error populations showing reduced activity due to the higher levels being able to represent and correctly predict their input (predictive coding)
On the other hand, the top-down prediction will present an excitatory signal that augments the response of features from the previous Belief. Features where the feedback modulatory effect is stronger than the bottom-up inhibition will be enhanced, while the rest will be suppressed. Figure 14.6C shows an example of Belief refinement, where only a small subpopulation of units consistent with the highlevel prediction is enhanced, while the rest are suppressed. The results are also congruent with [25] results which showed that when local information can be perceptually grouped into whole objects, activity in V1 decreases while activity in higher areas increases, and vice versa. We compared the output of the model using similar stimuli to those used in the experiment (random lines and 2D hexagon) and observed an equivalent pattern after feedback exerted its effects (see Fig. 14.6D). Layer eC3, considered here as the counterpart to the LOC area as the response of each unit is associated with a position and size invariant representation of an object, showed increased activity for the 2D hexagon. At the same time, the response in layer eC2 was strongly diminished, and the response in B2 moderately reduced, for the perceptually grouped object. Only when the object is recognized in B3 can it make an accurate prediction, which reduces the error (predictive coding) and refines the Belief. Figure 14.7A shows the temporal evolution of the lower level prediction c1F and s1F derived from the intermediate level Belief, B2, when using a Kanizsa’s illusory figure as input image. Layer c1F shows a gradual shift from the ‘4 pacmen’ to the ‘square’ figure due to the effect of high-level feedback
Fig. 14.7 Illusory contour formation in the model. (A) During the initial feedforward stage the input image (Kanizsa square) is categorized in the top level as a ‘square’. Feedback from this area modulates the Belief in level 2 by enhancing the features consistent with the ‘square’. The feedback prediction from B2 to level 1 (fC1) consequently represents a gradual shift from the ‘4 pacmen’ in the input image to the ‘square’ representation. To obtain the feedback prediction in the simple layer of level 1 (fS1), the feedback disambiguation calculates the appropriate weights based on existing local evidence and good continuation principles. (B) Feedback from single high-level prototype completing distorted version of the Kanizsa’s square. From left to right: translated 10 pixels to the right, scaled 90%, scaled 110%, with rectangular symmetry, and with blurred edges. Note the contour completion will only occur when the variations lies within the position and scale invariance range of the high-level prototype; and when there is enough local evidence to guide the extrapolation/interpolation process. The second condition is not satisfied in the figure with blurred edges. (C) Illusory contour formation of an ‘occluded illusory square’ as predicted by the model. Two of the sides presented illusory contour completion, while the other two showed a certain degree of extended illusory contour due to extrapolation, which diminishes as it reaches the occluder
from B3 imposing its Belief (the image was recognized as a ‘square’ object). Subsequently, activity in s1F shows a gradual development of the illusory contour due to the disambiguation algorithm which exploits collinearity, co-orientation and good continuation principles. Figure 14.7B shows the model response after feedback for different versions of the Kanizsa square, including translated 10 pixels to the right, scaled 90%, scaled 110%, with rectangular symmetry, and with blurred edges. Results show how a single high-level prototype manages to complete the different distorted versions of the Kanizsa square. Note the contour completion will only occur when the variation lies within the position and scale invariance range of the high-level prototype; and when there is enough local evidence to guide the extrapolation/interpolation process. The second condition is not satisfied in the figure with blurred edges. Additionally we tested a special type of figure which can be considered an occluded illusory square. Two of the sides presented illusory contour completion, while the other two showed a certain degree of extended illusory contour due to extrapolation, which diminishes as it reaches the occluder (see Fig. 14.7C). This prediction still remains to be confirmed both through psychophysical and neurophysiological studies. The example illustrates how the illusory contours don’t emerge around the occluder, despite receiving feedback activity, as interpolation and extrapolation mechanisms have no local cues available.
14.4 Discussion The model described extends an existing feedforward hierarchical model of object recognition to include feedback connectivity and provides a framework in which both sources of information can be integrated. Feedback is implemented as a biased competition architecture consistent with predictive coding principles. The model is constrained by selectivity and invariance properties found in neurons in the different simulated cortical areas, as well as by the general principles governing hierarchical object recognition in the ventral pathway [31, 34]. Integrating top-down influences, mediated by feedback projections, with bottom-up processing had been previously pointed out as one of the main limitations and future challenges for the HMAX model [32]. Our proposed model addresses this challenge. Overall, physiological data on simple and complex RF size, spatial frequency and orientation bandwidth are in good agreement with the model S1 and C1 tuning properties, as well as with the hypothesis of complex cells performing a MAX operation over simple cell afferents [34]. As for the upper levels, It has been shown that the S2-C2 hierarchy produces both selectivity and invariance that matches observed responses in V4 [3]. Although the implementation of top-level units S3-C3 varies between previous versions of HMAX, e.g. from Gaussian tuning [31] to Support Vector Machines [33], the overall concept is preserved and captured by the current model. Top level units present bigger RFs and are tuned to complex composite invariant features, which are consistent with the so-called viewtuned cells present in the higher levels of the ventral pathway, such as the infero-temporal cortex [15, 28, 35]. In relation to the model operations, neurons in area V4 in the primate [7] and complex cells in the cat visual cortex [20] have both been found to perform a MAX operation. In the latter study, when optimal and non-optimal bars were presented simultaneously, the response elicited by the complex cells closely resembled the response when the optimal stimulus was presented alone. Recently, a biophysical model, based on standard spiking and synaptic mechanisms found in the visual and barrel cortices, was shown to implement both the invariance and tuning operations, satisfying the timing and accuracy constraints required for the model of object recognition [18]. Similarly, both distinct neural operations were also approximated by a canonical circuit, which involved divisive normalization and nonlinear operations, and was in accordance with neurophysiological data [19].
Regarding feedback, we hypothesize that the proposed mechanism arises from the interaction between feedback and lateral connections in layers 2/3. Lateral connections are responsible for collinear and co-oriented facilitation in the proximal surround [1]. Additionally, the mechanism is in harmony with previous models describing the interaction between feedback and lateral connections: the preattentive-attentive mechanism resolving perceptual grouping [29], and models of spatial integration and perceptual grouping [26]. Results show that the response in eS1 and eC1 is strongly reduced after feedback, indicating that the level above captures most of the information present in the input image. Analogously, the response in eS2 and eC2 is also significantly decreased, as the top-down feedback from B3 manages to predict the activity in B2. The general refinement of the Belief is aimed at reducing both the lower-level error, eS1 and eC1, by achieving a more accurate reconstruction of the input, and the upper-level error, eS2 and eC2, by increasing the strength of features consistent with the high-level representation in B3. These results reconcile previous, apparently contradictory approaches by using two distinct populations: Belief units, which show refinement of the response due to feedback (both enhancement of consistent features and reduction of redundant ones), supported by experimental evidence suggesting that feedback enhances activity consistent with the high-level prediction [13, 16]; and the error population, which is suppressed by feedback, consistent with evidence showing reduction of lower-level activity due to feedback, and with predictive coding approaches [11, 30]. Overall, the illusory contour emerges as a consequence of the interaction between the global contextual feedback signal and horizontal connectivity, guided by existing feedforward cues (local evidence). The model is consistent with a recent review on illusory contour formation which hypothesizes that three mechanisms are responsible for the phenomenon: interpolation, extrapolation and figural feedback [10]. The model is also supported by evidence showing that the illusory contour response in V2 precedes that in V1, and by the finding that lesioning IT impaired a monkey's ability to see illusory contours [22]. Further substantiating evidence was provided by [24], who showed that illusory contour sensitivity first occurs in LOC, a high-level area in the ventral stream. In addition, [38] tested Kanizsa squares with blurred edges and found that the response in the LOC area was similar to that for sharp-edged Kanizsa squares, although psychophysical experiments demonstrated that the perceived boundary was not as sharp and was less well localized. This in turn suggests that contour-based processes supporting the perception of illusory contours are performed in early retinotopically organized visual areas (V1, V2), where it should be possible to observe differential responses to the blurred-type stimuli. Testing a blurred Kanizsa square in the model reproduced these results (right column in Fig. 14.5B): the higher level recognized the image as a square, but at the lower levels the illusory contour could not be accurately reconstructed, as there was not enough precise local information in the image to disambiguate the feedback. Another important property of the dynamic disambiguation algorithm is that feedback, which emerges from an abstract invariant high-level prototype, can adapt to match object variations which lie within the invariance range of the prototype.
This means that one single square prototype can be shaped by lateral connections, on the basis of local evidence, to complete the illusory contours of Kanizsa squares at different locations, scales and proportions, as shown in Fig. 14.5B. This contrasts with the blurred-contours figure where, despite feedback, local cues are not precise enough to enable the completion of the illusory square. Evidence for illusory contour completion suggests it is the result of the interaction of feedback and lateral connectivity (interpolation and extrapolation) [10], which is functionally approximated in the model by the disambiguation algorithm. However, the disambiguation principles used (collinearity, co-orientation, good continuation) are very basic and are just meant to illustrate how the proposed mechanism might work for the particular example of the Kanizsa square. The binding and continuity problems entail a much more complex process which must be able to perceptually group together more elaborate patterns, including the curved surfaces found in natural images. Nonetheless, the simplified model that was implemented provides a plausible solution to the problem of generating spatially precise feedback-mediated lower-level modulation through high-level abstract invariant prototypes.
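The two-population scheme discussed above can be caricatured in a few lines. The sketch below is a generic predictive-coding relaxation rather than the chapter's actual model (which also involves lateral disambiguation; the weight matrix, learning rate and dimensions here are arbitrary illustrative choices): error units carry the residual between the input and the top-down reconstruction, and Belief units are adjusted so that this residual shrinks.

```python
import numpy as np

def feedback_step(x, W, belief, lr=0.1):
    """One relaxation step of a generic two-population predictive-coding loop:
    error units hold the residual between input and top-down reconstruction,
    belief units are refined so as to reduce that residual."""
    reconstruction = W.T @ belief        # top-down prediction of the lower level
    error = x - reconstruction           # error population: suppressed when feedback fits
    belief = belief + lr * (W @ error)   # belief population: refined, not suppressed
    return belief, error

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16)) / 4.0       # generative weights from 4 beliefs to 16 inputs
x = W.T @ np.array([1.0, 0.0, 0.5, 0.0]) # input produced by a known high-level cause
belief = np.zeros(4)
for _ in range(200):
    belief, error = feedback_step(x, W, belief)
print(np.round(belief, 2), round(float(np.linalg.norm(error)), 4))
# after the loop the error norm is close to zero and the belief has been
# refined towards the cause [1, 0, 0.5, 0] that generated the input
```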
The results are also congruent with those of [25], which showed that when local information can be perceptually grouped into whole objects, activity in V1 decreases while activity in higher areas increases, and vice versa. The authors suggested different approaches to explain the phenomenon, including reduction of the lower-level error (predictive coding), sharpening and explaining away, all consistent with the model. Overall, the model provides a wide perspective, at multiple levels of description, on the integration of bottom-up and top-down information during object recognition in the ventral pathway, using predictive coding principles. It can serve as a guideline for the design of more detailed large-scale models of the visual cortex [9, 40], as the architecture and operations are well suited to an implementation using spiking biophysical circuits. The model can also be used to make predictions about the responses of different neural populations at each cortical level (e.g. error and representation units) and their expected pattern of activity in response to different stimuli (e.g. refinement or suppression). Further, the framework allows for extensions such as learning and plasticity during feedback, which can enhance the stored prototypes based on new data, thus improving recognition, and adaptation, which could lead naturally to phenomena such as sensitivity to temporal context and bistability.
References
1. Angelucci, A., Bullier, J.: Reaching beyond the classical receptive field of V1 neurons: horizontal or feedback axons? J. Physiol. 97(2–3), 141–154 (2003)
2. Bullier, J.: Integrated model of visual processing. Brain Res. Rev. 36(2–3), 96–107 (2001)
3. Cadieu, C., Kouh, M., Pasupathy, A., Connor, C.E., Riesenhuber, M., Poggio, T.: A model of V4 shape selectivity and invariance. J. Neurophysiol. 98(3), 1733–1750 (2007)
4. Carandini, M., Demb, J.B., Mante, V., Tolhurst, D.J., Dan, Y., Olshausen, B.A., Gallant, J.L., Rust, N.C.: Do we know what the early visual system does? J. Neurosci. 25(46), 10577–10597 (2005)
5. Deco, G., Rolls, E.T.: A neurodynamical cortical model of visual attention and invariant object recognition. Vis. Res. 44(6), 621–642 (2004)
6. Felleman, D.J., Van Essen, D.C.: Distributed hierarchical processing in primate cerebral cortex. Cereb. Cortex 1(1), 1–47 (1991)
7. Gawne, T.J., Martin, J.M.: Responses of primate visual cortical V4 neurons to simultaneously presented stimuli. J. Neurophysiol. 88(3), 1128–1135 (2002)
8. Gilbert, C.D., Sigman, M.: Brain states: Top-down influences in sensory processing. Neuron 54(5), 677–696 (2007)
9. Guo, K., Robertson, R.G., Pulgarin, M., Nevado, A., Panzeri, S., Thiele, A., Young, M.P.: Spatio-temporal prediction and inference by V1 neurons. Eur. J. Neurosci. 26(4), 1045–1054 (2007)
10. Halko, M.A., Mingolla, E., Somers, D.C.: Multiple mechanisms of illusory contour perception. J. Vis. 8(11), 1–17 (2008)
11. Harrison, L.M., Stephan, K.E., Rees, G., Friston, K.J.: Extra-classical receptive field effects measured in striate cortex with fMRI. Neuroimage 34(3), 1199–1208 (2007)
12. Hochstein, S., Ahissar, M.: View from the top: Hierarchies and reverse hierarchies in the visual system. Neuron 36(5), 791–804 (2002)
13. Huang, J.Y., Wang, C., Dreher, B.: The effects of reversible inactivation of postero-temporal visual cortex on neuronal activities in cat's area 17. Brain Res. 1138, 111–128 (2007)
14. Hubel, D.H., Wiesel, T.N.: Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. J. Neurophysiol. 28, 229–289 (1965)
15. Hung, C.P., Kreiman, G., Poggio, T., DiCarlo, J.J.: Fast readout of object identity from macaque inferior temporal cortex. Science 310(5749), 863–866 (2005)
16. Hupe, J.M., James, A.C., Girard, P., Lomber, S.G., Payne, B.R., Bullier, J.: Feedback connections act on the early part of the responses in monkey visual cortex. J. Neurophysiol. 85(1), 134–145 (2001)
17. Keane, B.P., Lu, H., Kellman, P.J.: Classification images reveal spatiotemporal contour interpolation. Vis. Res. 47(28), 3460–3475 (2007)
18. Knoblich, U., Bouvrie, J.V., Poggio, T.: Biophysical models of neural computation: max and tuning circuits. In: Zhong, N., Liu, J., Yao, Y., Wu, J.-L., Lu, S., Li, K. (eds.) Web Intelligence Meets Brain Informatics. Lecture Notes in Computer Science, vol. 4845, pp. 164–189. Springer, Beijing (2007)
19. Kouh, M., Poggio, T.: A canonical neural circuit for cortical nonlinear operations. Neural Comput. 20(6), 1427–1451 (2008)
20. Lampl, I., Ferster, D., Poggio, T., Riesenhuber, M.: Intracellular measurements of spatial integration and the max operation in complex cells of the cat primary visual cortex. J. Neurophysiol. 92(5), 2704–2713 (2004)
21. Lee, T., Nguyen, M.: Dynamics of subjective contour formation in the early visual cortex. Proc. Natl. Acad. Sci. USA 98(4), 1907–1911 (2001)
22. Lee, T.S.: Computations in the early visual cortex. J. Physiol. 97, 121–139 (2003)
23. Maertens, M., Pollmann, S., Hanke, M., Mildner, T., Müller, H.E.: Retinotopic activation in response to subjective contours in primary visual cortex. Frontiers in Human Neuroscience 2(2) (2008). doi:10.3389/neuro.09.002.2008
24. Murray, M.M., Wylie, G.R., Higgins, B.A., Javitt, D.C., Schroeder, C.E., Foxe, J.J.: The spatiotemporal dynamics of illusory contour processing: Combined high-density electrical mapping, source analysis, and functional magnetic resonance imaging. J. Neurosci. 22(12), 5055–5073 (2002)
25. Murray, S.O., Schrater, P., Kersten, D.: Perceptual grouping and the interactions between visual cortical areas. Neural Netw. 17(5–6), 695–705 (2004)
26. Neumann, H., Mingolla, E.: Computational neural models of spatial integration in perceptual grouping. In: Shipley, T.F., Kellman, P.J. (eds.) From Fragments to Objects: Grouping and Segmentation in Vision, pp. 353–400. Elsevier, Amsterdam (2001)
27. Olshausen, B., Field, D.: How close are we to understanding V1? Neural Comput. 17(8), 1665–1699 (2005)
28. Quiroga, Q., Reddy, L., Kreiman, G., Koch, C., Fried, I.: Invariant visual representation by single neurons in the human brain. Nature 435(7045), 1102–1107 (2005)
29. Raizada, R.D.S., Grossberg, S.: Towards a theory of the laminar architecture of cerebral cortex: computational clues from the visual system. Cereb. Cortex 13(1), 100–113 (2003)
30. Rao, R.P.N., Ballard, D.H.: Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2(1), 79–87 (1999)
31. Riesenhuber, M., Poggio, T.: Hierarchical models of object recognition in cortex. Nat. Neurosci. 2(11), 1019–1025 (1999)
32. Riesenhuber, M., Poggio, T.: Models of object recognition. Nature Neuroscience (2000). doi:10.1038/81479
33. Serre, T., Oliva, A., Poggio, T.: A feedforward architecture accounts for rapid categorization. Proc. Natl. Acad. Sci. USA 104(15), 6424–6429 (2007)
34. Serre, T., Riesenhuber, M.: Realistic modeling of simple and complex cell tuning in the HMAX model, and implications for invariant object recognition in cortex. Massachusetts Institute of Technology, Cambridge, MA. CBCL Paper 239/AI Memo 2004-017 (2004)
35. Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., Poggio, T.: Robust object recognition with cortex-like mechanisms. IEEE Trans. Pattern Anal. Mach. Intell. 29(3), 411–426 (2007)
36. Sillito, A.M., Cudeiro, J., Jones, H.E.: Always returning: feedback and sensory processing in visual cortex and thalamus. Trends Neurosci. 29(6), 307–316 (2006)
37. Spratling, M.: Reconciling predictive coding and biased competition models of cortical function. Front. Comput. Neurosci. 2(4), 1–8 (2008)
38. Stanley, D.A., Rubin, N.: fMRI activation in response to illusory contours and salient regions in the human lateral occipital complex. Neuron 37(2), 323–331 (2003)
39. Sterzer, P., Haynes, J.D., Rees, G.: Primary visual cortex activation on the path of apparent motion is mediated by feedback from hMT+/V5. Neuroimage 32(3), 1308–1316 (2006)
40. Symes, A., Wennekers, T.: A large-scale model of spatiotemporal patterns of excitation and inhibition evoked by the horizontal network in layer 2/3 of ferret visual cortex. Neural Netw. 22, 1079–1092 (2009). doi:10.1016/j.neunet.2009.07.017
41. Williams, M.A., Baker, C.I., Op de Beeck, H.P., Mok Shim, W., Dang, S., Triantafyllou, C., Kanwisher, N.: Feedback of visual object information to foveal retinotopic cortex. Nat. Neurosci. 11(12), 1439–1445 (2008)
Chapter 15
Machine Free Will: Is Free Will a Necessary Ingredient of Machine Consciousness? Riccardo Manzotti
Abstract Sooner or later, machine consciousness will have to address the elusive notion of free will, either to dismiss it or to produce a machine implementation. It is unclear whether freedom and consciousness are independent aspects of the human mind or by-products of the same underlying structure. Here, the relevant literature is reviewed, focusing on the connection between determinism and freedom—usually explored by compatibilists. Eventually, a tentative model for machine free will is outlined.
For a time I thought of the problem of the freedom of the will as the most suitable Gordian knot; but in the end I opted for the concept of the mind (Ryle 1970 [31], p. 2)
Free will might be a necessary ingredient for designing autonomous machines. Furthermore, as has been said of consciousness, the attempt to emulate free will in machines might pave the way to its understanding. Between consciousness and free will there are complex relationships that have not received enough attention in the recent cognitive literature, probably because of the demanding ontological and philosophical issues involved. The recent debate on free will often stresses the relation with consciousness, notwithstanding the skepticism of a few authors. Martin Heisenberg remarks that we need not be conscious of our decision-making to be free. What matters is that actions are self-generated. Conscious awareness may help improve behavior, but it does not necessarily do so. Why should an action become free from one moment to the next simply because we reflect upon it? [14, p. 164]. Yet the target of this skepticism seems to be the enabling role of consciousness rather than some broader and deeper connection between consciousness and free will. And it is difficult to criticize Heisenberg's view, although it must be stressed that he is probably arguing against a conceptual straw man—has anyone really maintained that consciousness is sufficient for the freedom of a choice? And what kind of consciousness? Heisenberg's criticism is fueled by a reflective model of consciousness seen as higher order thought. This is by no means the only possibility. Although self-determination is a key ingredient of free will, without a more precise understanding of consciousness any final conclusion is premature. Is it justified to rule out deep relations between consciousness and freedom? Don't our intuitions endorse some kind of identity between a free choice and a conscious one? Has anyone ever made a free choice while acting automatically and unconsciously? Is a subject conceivable who is at the same time conscious and lacking any free will, whatever that may be?
Is freedom nothing but a name given to a certain level of autonomy and control achieved by cognitive systems? Or does it imply some deeper structural requirement? Since the beginning of western philosophy, a subject has been conceived as a free agent. An action which is not free is not even considered an action but rather a mechanical effect of previous causes. Action-hood entails freedom. Yet very little of these intuitions has been of use in AI and cognitive science in general. It is true that, for a while, free will was dismissed from the scientific debate since it seemed to require some extra ingredient beyond the physical description of nature. In this respect, freedom and consciousness were held in similarly dubious repute by the scientific community. Furthermore, freedom is akin to consciousness as regards its presence or absence in animals. It was long maintained that animals were not conscious since they were not provided with a soul. Similarly, it was stated that they were not free, for the same metaphysical shortcomings. More liberal reasoning suggests that since animals possess some kind of consciousness they must also be free, at least to a certain extent. There is plenty of evidence that animal behavior cannot be reduced to totally fixed responses. A dog seems to show freer behavior than a fly. Yet the issue is vague. Is there a critical threshold? Of what kind? Structural, behavioral, or metaphysical? Is such a threshold correlated with consciousness? Finally, what about machines? Could they exploit some kind of freedom? In the following—contrary to the layman's widespread belief that free will is incompatible with determinism—it will be maintained that free will requires a deterministic world. The well-known compatibilist stance will be put to good use—namely the view holding that free will is indeed possible only because of determinism. If this view is correct, there is a conceptual and indeed cognitive space for free will in machines. Machines might offer a testbed for theories and models of free will. In particular, this chapter will consider and criticize a model of the present—here dubbed "temporal atomism"—which refrains from considering free will as a structural aspect of cognition. Taking advantage of a refutation of temporal atomism, in the last part of the chapter a few aspects of free behavior—such as integration and polytropism—will be considered in relation to a tentative implementation of free will. It is fair to warn the reader from the start that anyone who expects to find a block diagram of freedom is going to be disappointed.
15.1 What is Free Will?
Famously, Spinoza wrote that "[M]en believe themselves to be free, because they are conscious of their own actions and are ignorant of the causes by which they are determined" [33]. Yet, if freedom were a matter of pure ignorance, it would be too easy to replicate it in a machine. According to such a definition, most available artificial agents would indeed be free, since they act without the slightest knowledge of the causes of their actions. Obviously something else must be added. When do we consider an agent to be free? A simple and still inadequate answer is that an agent is free if, and only if, it is able to achieve its goals against any external constraints. Yet this is not enough, and it can lead to merely apparent freedom, as happens in all those agents where the constraints are internal or where the action space is either severely limited or fixed. The issue at stake runs much deeper. Is the agent able to choose its goals if its behavior is constrained by previous factors such as programming, environmental stimuli, upbringing, genetic blueprint, and so on? According to the philosopher Immanuel Kant, someone's actions are not free if they are determined by something or someone else. An agent is taken to be free when it seems able to be the only and ultimate cause of its behavior. It is up to the agent whether to do something. Or, at least, this is how it seems. As we have seen, this is not necessarily the case. It could be an illusion due to insufficient causal knowledge. According to Peter Hájícek [12], any acceptable explication of free will has three ingredients: (1) it must entail that an agent might have chosen otherwise; (2) it must explicate the control that
free will requires; and (3) it must explicate the "sensibleness" or "rationality" that free will involves. The first requirement thus implies that the agent is able to contemplate and weigh more than one course of action and then to single out an outcome on the basis of its own criteria. It is rather obvious from the start that there is some tension in such a formulation—on the one hand free will requires multiple outcomes, on the other hand it requires the control and the efficacy of the agent's criteria and rationality. It is plain that control and multiple outcomes are contradictory principles. By and large, there is currently an obvious difference between a human being and a machine. Human behavior is the result of mostly unknown causes that are practically unknowable because of their sheer number and their causal role in one's life. In the case of machines, since they are the result of human design or programming, it is much easier to provide an almost exhaustive causal account. This means that, going back to Spinoza, it is easier to believe (wrongly) that a human being is freer than a machine, since it is easier to ignore the causes of human behavior. However, this is just an epistemic difference and it cannot be used to endorse an ontological divide. Leaving aside Spinoza's epistemic concerns, it seems we are faced with two possibilities: either an outcome is the unavoidable result of previous causes and thus is not free, or it is the result of the free choice of the agent and thus occurs as a result of some special causal power of the agent. Both conditions seem inappropriate for machine freedom. Yet, for identical reasons, they would be inappropriate for human freedom too. This leaves open the possibility that we are either applying unjustified assumptions or drawing unwarranted conclusions. Nevertheless, if human beings and other animals are free, in some practical sense of the word, machines could exploit the same cognitive trait unless humans were powered by some unnatural force alien to the physical world, which seems pretty unlikely.
15.2 Free Will Geometry
The discussion of free will often revolves around a set of vaguely defined notions. In this section, a few notions which are crucial to the discussion of freedom are briefly listed and discussed: predictability vs. unpredictability, determinism vs. indeterminism, self-determination vs. hetero-determination, internal vs. external. The goal is to get rid of some vagueness before engaging in a more challenging debate. Furthermore, the quick outline of these notions is relevant here insofar as it shows how much the debate on free will is linked with the gap between epistemic and ontological issues. In fact, many of the following dichotomies are rooted in the divide between what is known about an agent's behavior and the processes leading to that behavior.
Predictability vs. Unpredictability If an agent is free, it is up to it to choose one outcome among many. In principle, there is no way to know in advance what the agent will do. However, in many situations, it is reasonably certain that a certain action will be preferred among many others. If someone buys the winning lottery ticket, what will she do? On the other hand, if you twist the knob to set the timer of a washing machine, assuming that everything is fine, you know perfectly well what is going to happen. Is to be free thus to be unpredictable? Only up to a point. In fact, Spinoza could be right. The fact that something cannot be predicted could be only a matter of epistemic limitation on the beholder's side. The weather cannot be predicted reliably for more than a month in advance, yet, as far as we know, the atmosphere is not a free agent. After seven bounces, a billiard ball's trajectory is unpredictable. Yet the ball is not freer because of the chaotic components of its trajectory. On the other hand, it is not inconceivable that a free choice could be rather predictable while being completely free—if such a thing exists. For instance, once confronted with the possibility of avoiding pain, most humans would gladly do so (not all, though!). Are they less free because they are all the more predictable?
Determinism vs. Indeterminism According to absolute determinism, of the kind Pierre Simon Laplace was fond of, every event is completely determined (fixed) by its predecessors. Since nothing comes out of nothing, everything is fixed. Determinism seems an easy view, but it is not [15, 17]. Does determinism imply causation? What kind of connection ties together subsequent events? Further, note that determinism does not entail predictability, as shown by chaos theory. A chaotic system can be perfectly determined and yet, since its initial state cannot be measured with perfect accuracy, its future development remains unknown. The comforting picture of a deterministic universe was jeopardized by quantum mechanics, since intrinsically random events were admitted. In other words, although the probability density function of certain events is defined, their individual occurrence is not. For instance, there is a probability of 1/2 that a certain nucleus will change its state, but there is no way to cause this to happen at a particular time t. In short, an indeterminate event is an event whose occurrence at a time t is not caused by any previous event or state of affairs. A determinate event is an event whose occurrence is exhausted by previous facts. Indeterminacy entails a rupture with the past, while determinacy seems to guarantee continuity. Yet indeterminism is no safe harbor for free will. As William James remarked, "If a free act be a sheer novelty, that comes not from me, the previous me, but ex nihilo, and simply tack itself on to me, how can I, the previous I, be responsible?" [18, p. 53]. Similarly, many authors have observed that although indeterminism gets rid of unwanted external causes, it also gets rid of the necessary determination that should proceed from the subject towards the choice. Suppose that, whenever an agent has to make a choice between two options, a device throws a quantum die. Then the agent acts accordingly. Would this behavior be free? It seems unlikely. A free choice is not a random one. A free choice is an act that is at the same time linked with the agent's past and unconstrained by external causes.
Self-Determination vs. Hetero-determination Suppose that an agent is constrained in its behavior by external causes. Surely it would fall short of any intuitive notion of freedom. A good example is being programmed by an external designer. If the agent's goals are set by an external designer, they are not the expression of the agent's freedom. On the other hand, a free agent is self-determined. Being self-determined is an interesting notion since it stresses the deep relation between determinism and freedom, rather than between indeterminism and freedom. It also stresses the fact that the notion of the self is crucial and probably prior to that of freedom. The chasm between freedom and necessity closely matches that between agents who are their own cause and agents whose behavior depends on conditions external to them.
Internal vs. External Causes external to the agent cannot contribute to its freedom. Such causes range from external constraints like chains to more subtle ways of constraining the agent's behavior, such as upbringing, education, genetic background, nature and nurture. Stephen Jay Gould once wrote that "If we are programmed to be what we are, then these traits are ineluctable." [10, p. 238]. This is not the place to engage in the nature vs. nurture debate.
For what matters here, genes, cultural environment and education are all the same: causes which we consider to be external to the agent as such. The issue at stake is that, intuitively, there seems to be a sharp distinction as to whether some event belongs to the agent. If it does, the event can be considered internal to its structure; otherwise it is usually conceived as external. To be internal is not sufficient to contribute to the freedom of the agent, yet to be external is surely sufficient to be alien to its free will. Such a sharp distinction is very Cartesian, to say the least. The boundaries of the body are too narrow and too large at the same time. While an external event could legitimately be considered part of an agent's being, a tumor growing inside the agent's body would not. On the other hand, a neuron could be external to the agent if the agent were identical to some neural activity that does not comprise that neuron. On the basis of our notion of what the agent is, a component of the agent is either rejected or accepted.
To recap, the above sketchy outline of the various views on free will suggests that—if anything like free will exists—it is determined or, better, self-determined, internal to the agent, and intertwined with its individual history and selfhood. Is anything like that available in a machine?
15.3 Temporal Atomism
As to free will, the two main views are compatibilism and libertarianism. According to compatibilism, the agent is determined and freedom is a reality that has to be explained in terms of integration, self-determination, consciousness, and other cognitive features. The opposite of compatibilism is incompatibilism, which holds that a deterministic world is incompatible with freedom. This entails that if the world were deterministic, there would be no freedom. Alternatively, if freedom does indeed exist, it would mean that the world is not completely deterministic. Incompatibilists who accept free will and deny determinism are called libertarians [1, 5, 16, 20]. From a libertarian standpoint, which is closer to the layman's view of freedom, the kind of liberty suggested by compatibilism is nothing but a "wretched subterfuge", as the philosopher Immanuel Kant was fond of saying. Many distinguished authors have defended this view, suspecting that the alternative would throw out the baby with the bathwater [19, 21]. On the other hand, from a compatibilist perspective, which has been embraced by recent scholars like Daniel Dennett or Ted Honderich, libertarianism is nothing but a flight of fancy trying to cope with the hard facts of physics by introducing vague metaphysical principles [6, 8, 17]. It has been observed that "AI depends on a compatibilist view, but having taken it, there is a lot to be learned about the specific forms of free will that can be designed" [30, p. 342]. The reason is rather obvious. Up to now, all machines have been made of deterministic components, and for good reasons too. A machine's behavior has to be as predictable as possible. Of course, if compatibilism were true, implementing free will in machines would be easier. Yet this is not a proof as to whether compatibilism is true. According to many authors, a critical aspect of free will is the so-called problem of origination. In other words, they look for a crucial place and time where the choice originates. As Robert Kane remarked, "If there is indeterminacy in free will, on my view, it must come somewhere between the input and the output" [19, p. 27]. That place somewhere in between is the special locus where free will allegedly changes the course of events. This is rather plain in Benjamin Libet's words: "The initiation of the freely voluntary act appears to begin in the brain unconsciously, well before the person consciously knows he wants to act?" [24, p. 49]. Once more there is the expectation that free will has to originate in a narrow spatio-temporal locus. The notion of free will is thus framed in terms of temporal atomism. By this term, I mean the view that the act of free will must be squeezed into a temporal atom of no width. At some point in the chain, there must have been an act of origination of a new causal chain. In this way, libertarians can defend the pristine causal role of free will. However, it could also lead to the refutation of free will as a physical possibility [34]. Given a causal chain leading to an action, it has often been claimed that free will must intervene at a given point by a voluntary act. It is as if there were a perfectly determined chain of events and then, somewhere and somewhen along the line, some unexpected event happens and possibly changes the course of events. This unexpected event is the alleged free choice of the agent. Given these (unwarranted) premises, temporal atomism is necessary in order to allow the origination of free will.
No matter what has happened before (even internally), when the agent chooses, nothing is going to determine its outcome. "If we are responsible . . . then we have a prerogative which some would attribute only to God: each of us, when we act, is a prime mover unmoved. In doing what we do, we cause certain events to happen, and nothing—or no one—causes us to cause those events to happen" [4, p. 32].
However, both the doctrine of origination and the model of temporal atomism suggest a mythical atomism of the act of free will which runs counter to any physical model of cognitive processes. In fact, any physical process requires time to take place. Complex cognitive processes require a lot of time—usually on the order of hundreds of milliseconds. Yet this should not come as a surprise. Neither should we look for some mythical temporal atom in which free will occurs, squeezed between otherwise determined causal chains. Yet most empirical research on free will seems to assume that free will, if it exists, has to be located in a temporal instant [13, 24, 32]. Whenever this is not confirmed, it is taken to be a refutation of free will. However, it is not. Rather, it is a refutation of the assumed temporal atomism. Consider the classic argument by Libet [22]—since the neural activity underpinning an action begins some time before the conscious choice, the autonomy of the will is challenged. However, it is to be expected that whatever neural activity underpins cognitive processes is spread out in time. Cognitive and neural processes are not instantaneous. According to Libet, "What we found, in short, was that the brain exhibited an initiating process, beginning 550 ms before the free voluntary act; but awareness of the conscious will to perform the act appeared only 150–200 ms before the act. The voluntary process is therefore initiated unconsciously some 400 ms before the subject becomes aware of her will or intention to perform the act" [23, p. 124]. Subsequent empirical evidence is coherent with these initial findings [11, 13, 32]. Yet why should this be so surprising? There is no reason to expect that a decisional process is not spread over an extended span of time. There is indeed no limit to how much time a cognitive process can take advantage of. Another reason to criticize libertarianism lies in its intrinsic dualism. In fact, it suggests that humans (and free agents in general) exploit some intrinsic property which is not shared by the rest of reality. This is highly dubious. From a scientific point of view, compatibilism is a stronger position since it does not require any extra hypothesis about the nature of reality.
15.4 A Model for Machine Free Will
Are there cognitive architectures suitable for supporting free will? In the above, I argued that if anything like free will exists, there is no reason why it should be present only in human beings. Further, free will is likely to be related neither to some mythical origination nor to random events. Rather, it is to be expected that free will stems from very complex causal processes akin to those exploited by human beings. However, it is plain that simple deterministic devices are not up to the task. It is as if, very loosely speaking, there were good determinism and bad determinism: the latter is the kind usually associated with machines, whereas the former is the one linked to our sense of responsibility and free will. Consider a random machine built in such a way that whenever it has to make a decision, it throws a quantum die which is genuinely random. Is it free? No, of course not. First, in such a case, to talk of an agent is purely metaphorical. Secondly, as has been argued above, to be indeterminate is not sufficient to be free. Consider a pure automaton like a simple toy, for instance a cuckoo clock. This time there is no indetermination involved. Once more there is no freedom either. The machine does not qualify in any intuitive sense as a free agent. It does not qualify as an agent either. Yet its behavior is completely determined by internal causes. However, there are at least two shortcomings. On the one hand, the causes are ultimately triggered by original previous causes external to the cuckoo clock, such as the clockmaker starting the mechanism. On the other hand, there are no choices, since all events follow a rigid course. Is it possible to design and implement a machine that is capable of true choices albeit ontologically determinate? A first step in the right direction, in our opinion, was suggested by Gary
Drescher [9], who contrasted the situation-action machine with the more complex choice machine, in which the machine generates its own reasons for doing x or y by anticipating the probable outcomes of various candidate actions and evaluating them in terms of the goals it also represents [9]. Instead of looking for a magic step that introduces a gap in the causal determinacy of events, more complex agents exploit a more structured intertwinement between their history and their structure. Using Drescher's terminology, a situation-action machine is a repository of sensory-motor responses. This means that the outcome of the machine is not the result of its individual development. The same idea matches the difference between Darwinian and Skinnerian agents outlined by Daniel Dennett [7]. While the former act purely on the basis of fixed input-output patterns, the latter build on top of their own experiences and modify their behavioural structure accordingly. Likewise, Riccardo Manzotti and Vincenzo Tagliasco suggested distinguishing between teleologically fixed and teleologically open agents [29]; the former closely resemble both Drescher's situation-action and Dennett's Darwinian/Skinnerian agents, while the latter are a further step in the direction of a thoroughly self-generated causal structure. A teleologically open agent is an agent that not only is able to learn how to behave in order to reach fixed goals, but is also capable of acquiring new goals. Its history becomes part of its teleological causal structure. The common hunch in all these approaches is that the causal structure of an agent is not fully there from the start. Rather, the agent develops as more and more causes get entangled together by means of the peculiar physical structure of the agent. There are two ways in which an agent and its environment can interact. In the first case, external events trigger responses from the agent, yet the agent's structure remains the same. Consider a laptop. Its code does not change because of the user and the data it processes. There is no agent there, since there is no development, no causal space for an agent to grow around the seeds of the external stimuli. Consider now a human being. In such a case, during development, external events trigger complex changes in the causal structure of the agent. The external world nurtures and causes the causal structure itself. Is there a disguised dualism here? Is the agent contrasted with its causal structure? Not necessarily: the two are the same. An agent is its causal structure. It seems that there are two roles for causation, and only the case in which causation changes the causal structure is relevant for free will. Or at least, this is what emerges from the available literature. This is not the place to discuss in detail what is meant by 'causal structure'. Assume that it could be something like the capability to store and manipulate internal representations, or the existence of a global workspace, or the capability to develop new goals, or something entirely unknown at present. This author is agnostic as to what is meant by 'causal structure'. Whatever it is, it is clear that a standard program does not modify it, while a human being does. Recently, Dennett wrote that, "It is commonly supposed that in a deterministic world, there are no real options, only apparent options. This is false" [8, p. 25]. I do agree. But how can a deterministic machine account for choosing among real options? The question cuts to the crux of the problem.
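The contrast between a situation-action machine, a choice machine and a teleologically open agent, as retold above, can be sketched in a few lines. The class names, the world-model signature and the adopt_goal method are illustrative inventions for the purposes of this sketch; they are not Drescher's, Dennett's or Manzotti and Tagliasco's own formulations.

```python
class SituationActionMachine:
    """A fixed repository of stimulus-response pairs: its behavior is wired in
    advance and is not the result of its individual development."""
    def __init__(self, table):
        self.table = table                      # situation -> action

    def act(self, situation):
        return self.table[situation]


class ChoiceMachine:
    """Generates its own reasons: it anticipates the probable outcome of each
    candidate action and evaluates it against goals it also represents."""
    def __init__(self, world_model, goal_value):
        self.world_model = world_model          # (situation, action) -> predicted outcome
        self.goal_value = goal_value            # outcome -> value

    def act(self, situation, candidate_actions):
        return max(candidate_actions,
                   key=lambda a: self.goal_value(self.world_model(situation, a)))

    def adopt_goal(self, extra_value):
        """A teleologically open agent can also acquire new goals from its history."""
        old = self.goal_value
        self.goal_value = lambda outcome: old(outcome) + extra_value(outcome)
```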
The proposal is that a free agent is capable of real choices. What is a real choice? It is neither a random outcome, nor the result of causes external to the agent. A choice is the outcome of a set of causes that are constitutive of the agent. Is there circularity here? Only if the agent were defined as a system capable of choices. Of course, the burden of the above definition lies on the capability of outlining the nature of the agent. Before embarking on a discussion as to the nature of real choices, it must be stressed that the above definition allows for a gradual notion of freedom. In fact, freedom does not arise out of a special component. Rather, free will depends on the complexity of the agent considered. In fact, the freedom associated with each of the agent's actions depends on how much that action is the result of the agent's self. For the sake of the argument, let us assume here that the agent's self is not a metaphysical entity but rather a quantifiable aspect of a cognitive agent, such as the total memory, or the total number of sensory-motor contingencies, or Tononi's integrated information, or whatever [35]. If this were
possible, a free choice would depend on how much that particular choice is the expression of the agent's individuality (more on this in the following). To recap, the literature suggests that there is a progression from situation-action agents to more freedom-oriented agents, and that such a progression corresponds to an increased entanglement between individual history and causal structure. This approach makes it possible to avoid the determinism vs. indeterminism dilemma. It draws on a different crucial property—namely, being free depends on being part of the agent. Since the existence of an agent is a matter of degree, the freedom associated with an action could vary. Contrary to the Duchess of Devonshire's opinion, freedom could indeed be a matter of degree. A minimum of formalization is going to help. Suppose we have an agent and the events {e_i} occurring in its surroundings. Suppose that these events can be divided into three possible categories: events {e_i*} that have no causal influence on the agent (for instance, radio waves for a worm), events {e_i**} that are causally processed by the agent but do not alter its causal structure (for instance, keystrokes on a laptop), and events {e_i***} that are going to be embedded in the causal structure of the agent either as representations or as goals (for instance, the first glance at Juliet by Romeo). The third and last category could indeed be proposed as a rough approximation of the kind of events that partake in the causal structure of the agent. In slightly more philosophical words, the events {e_i***} are those that have a counterfactual relation with the causal structure of the agent. The agent is here modeled as a collection of causal relations A = {r_i | r_i = c_j → e_k} such that they were introduced in the past by some triggering events. In this way, hard-wired causal relations are put aside—the agent is the result of its individual history. Given the agent A, there are three possible cases:
1. If any e* occurs, nothing happens.
2. If any e** occurs, ∃ j* | c_j* → e; in other words, because of the agent A, the event produces some outcome. Yet the causal structure of A remains unaltered.
3. If any e*** occurs, either one or more new causal relations r_k are added to the causal structure of A, or one or more existing causal relations r_i are modified, or both.
In the above formalization, what is interesting is that everything is a matter of degree. It is even possible to suggest a way to weigh how much an incoming event is part of the subject. The basic idea, surely to be further refined, is that only the incoming e_i*** belong to the subject, and that they belong all the more if they affect either existing or new causal relations constituting the agent. Considering the ratio between the number of causal relations affected by an event e and the total number of causal relations constituting the agent, a very rough index of belongingness could be estimated. Ideally, an event that is going to change the agent's causal structure completely is going to affect all its causal relations with the world. Consider a Paul-on-the-road-to-Damascus kind of event in one's life. After an experience like that, everything is different. Conversely, an isolated and not very important event is going to have a much more local effect. So much for the past events concocting the causal structure of the agent, but what about the choices the agent has to make?
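A minimal sketch in code of the formalization just given may help fix ideas. The class and method names are hypothetical and the code is only illustrative of the scheme described above: the agent is nothing but the dictionary of causal relations deposited by its history, e* events leave it untouched, e** events are merely processed, e*** events add or modify relations, and the belongingness index is the crude ratio just described. The last method anticipates the discussion that follows, scoring a choice by how much of the agent's own causal structure it draws on.

```python
class Event:
    def __init__(self, kind, cause=None, new_relations=None):
        self.kind = kind                        # "e*", "e**" or "e***"
        self.cause = cause
        self.new_relations = new_relations or {}


class Agent:
    """The agent as the set of causal relations A = {r_i : c_j -> e_k}
    accumulated through its individual history (no hard-wired relations)."""
    def __init__(self):
        self.relations = {}                     # cause -> effect

    def receive(self, event):
        if event.kind == "e*":                  # no causal influence on the agent
            return None
        if event.kind == "e**":                 # produces an outcome, structure unchanged
            return self.relations.get(event.cause)
        self.relations.update(event.new_relations)  # e***: embedded in the causal structure
        return event.new_relations

    def belongingness(self, event):
        """Rough index: relations affected by the event over the total
        constituting the agent."""
        if event.kind != "e***" or not self.relations:
            return 0.0
        return min(1.0, len(event.new_relations) / len(self.relations))

    def freedom_of_choice(self, relations_used):
        """The more a choice draws on the agent's own causal structure,
        the freer it is, on the account sketched in the text."""
        if not self.relations:
            return 0.0
        return len(set(relations_used) & set(self.relations)) / len(self.relations)
```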
Let us set aside the question as to whether a free choice could originate from totally endogenous and unpredicted causes, as when all of a sudden one wants to do something totally unforeseen. A more common exercise of free will is making choices in response to incoming stimuli. You see a barking dog: do you stand your ground or run away? Whenever you have to respond to an incoming stimulus, you have to make a choice. How many of your past experiences, and thus of your embedded causal relations, are involved? Keep in mind that the agent is not made of any hard-wired input-output patterns. The agent, according to the above definition, is made of what has been gobbled up during its individual history. In order to make a choice, how much of the agent's structure has to be put to use? Suppose that none of it, in fact, is used. It could happen. For instance, the agent's action could be the result of an instinct and thus of a hard-wired reflex, like when you abruptly withdraw your hand from a spider on
a wall. Would that be a free choice? Hardly. Suppose, on the contrary, that putting a wedding ring on someone's finger is the result of a well-pondered balance among most of what has constituted one's life up to that crucial moment. Would that be a free choice? I hope so and, seriously speaking, it looks like one. Intuitively, it seems that the more a choice is the result of one's previous causal structure, the more it is an expression of freedom. A free choice would then be any outcome which is the result of a relevant portion of the agent's causal structure. In turn, the agent's existence is gradually constituted by the increase in the number of causal relations embedded in its individual history. In practice, if there is no individual history, there is no agent. If there is no agent, there is no freedom. In turn, if there are no causal relations of the kind outlined above, there is neither agent nor freedom. So far so good, you may say, but clearly many questions remain unanswered. Against this model of free will in agents and machines, it may be objected that a few crucial things are blatantly lacking. I would like to mention a few.
• The whole model might seem somewhat circular since it depends on how you define the agent. This is partially true, since the agent's definition is based on the occurrence of counterfactual causal relations of a definite kind. It does not always happen that the occurrence of an event allows for the further occurrence of future events of the same kind. Nor does it often happen that a stimulus changes the perceiver's structure in any relevant way. Elsewhere, the connection between ontogenesis and epigenesis has been discussed at length [25–29]. Although, of course, the issue is far from being solved, it is likely a yet to be thoroughly understood aspect of agent development.
• There is no computational (not to speak of experimental) evidence of the efficacy of the presented model. True. Yet this is not an argument against the idea in itself but rather an encouragement to further research.
• Where is the place where the choice is up to the agent? This would be the libertarian and Cartesian objection to our proposal. There is no place where the agent exploits any magic power to elude the causal determinism of causes and effects. There is no place where the free choice originates. Yet this is neither a surprise nor a shortcoming. In fact, it was to be expected given our sympathy for compatibilism. A machine is not to be expected to elude physical laws, and there is no hope that future understanding of complexity theory or other sophisticated mathematical models will justify the emergence of any radical departure from the present standard physical model of reality.
• Would a machine satisfying the presented model be free? The suggested model is a model of free will that could express human free will as well. In this sense, being free is mostly the capability to be self-determined. Of course, the notion of self-determination carries a heavy epistemic debt to the notions of self and determination.
15.5 Conclusion
According to Greek mythology, Apollo was a single-minded god. He was monotropos. In fact, he drove the sun chariot along a fixed trajectory in the sky. And for many good reasons, as Phaeton's disastrous attempt showed to all. In contrast, the god Hermes had a devious mind. His mind was polytropos, since it was able to pursue many different goals and to choose among them. Hermes' mind offers a suitable example of what intuition suggests a free mind to be. His capability of endorsing multiple ends is a key aspect of free will. This is something that has been repeatedly stressed here although, strictly speaking, it is a necessary but not sufficient condition for free will. In fact, if there are no alternatives, there is no freedom. Yet the simple generation of multiple scenarios is not sufficient to guarantee a free choice.
Another issue that has to be mentioned, however briefly, is the role of temporal integration. Since events are likely to take place at different times, an agent made of a huge number of causal relations is likely to be the product of a relevant span of time. This is an example of temporal integration, where the causal structure of the agent is the result of events spread along a temporal interval. In turn, any outcome of such an agent is the result of the temporal integration of many otherwise separate events. It should come as no surprise that many of the conditions we have outlined for freedom have already been mentioned in previous works on machine consciousness [2, 3, 29]. For instance, polytropism is very close to teleological openness. Similarly, the kind of causal relation expressing when an event belongs to an agent is tightly tied to previous models of machine consciousness. This is to be expected and should be considered moderately encouraging. In fact, consciousness and free will share many aspects from both an empirical and a theoretical point of view. Consider the gap between automatic responses and conscious ones and how closely it matches the gap between automatic and free action. It is well known that most automatic sensory-motor reflexes do not require any awareness. In contrast, consciousness steps in whenever an individual, meditated, and original choice has to be made—that is, a free one. It is also well known that the more a behavior becomes automated (because of training or repetition), the more it fades from consciousness, thereby being subtracted from conscious and free control. Consider the scene in the movie "Burn After Reading" in which a startled Harry Pfarrer (George Clooney) unexpectedly finds Chad Feldheimer (Brad Pitt) in his closet. Since Harry had previously been trained to shoot reflexively in an emergency, he shoots Chad. He then excuses his action on the grounds that his shooting was not free, being the mandatory outcome of his training. Many more examples could be given. Many issues that have not been adequately developed here, such as temporal integration, polytropism, automatic vs. conscious responses, unity, and agency, are indeed shared between the issue of consciousness and that of free will, both in humans and in machines. This is likely not to be fortuitous. Although the only philosophical account of free will that can be adopted by AI—namely compatibilism—is still far from offering an established and detailed blueprint for machine free will, it is remarkable that machine consciousness and freedom can both be seen as attempts at finding a more efficient way to situate an agent in an unpredictable and largely unknown environment.
Acknowledgements I wish to thank Antonio Chella for his support and encouragement on various challenging and demanding topics in AI and Cognitive Sciences, as well as for his many observations on an earlier version of this paper.
References 1. Atmanspacher, H., Bishop, R. (eds.): Between Chance and Choice. Interdisciplinary Perspectives on Determinism. Imprint Academic, Exeter (2002) 2. Chella, A., Manzotti, R. (eds.): Artificial Consciousness. Imprint Academic, Exeter (2007) 3. Chella, A., Manzotti, R.: Machine consciousness: a manifesto for robotics. Int. J. Mach. Conscious. 1(1), 33–51 (2009) 4. Chisholm, R.M.: Freedom and action. In: Lehrer, K. (ed.) Freedom and Determinism, pp. 11–44. Random House, New York (1966) 5. De Caro, M.: Libero Arbitrio: Una Introduzione. Laterza, Bari (2004) 6. Dennett, D.C.: Elbow Room. The Varieties of Free Will Worth Wanting. MIT Press, Cambridge (1984) 7. Dennett, D.C.: Kinds of Minds: Toward an Understanding of Consciousness. Science Masters, 1st edn. Basic Books, New York (1996) 8. Dennett, D.C.: Freedom Evolves. Penguin Books, London (2003) 9. Drescher, G.: Made-Up Minds: A Constructivist Approach to Artificial Intelligence. MIT Press, Cambridge (1991) 10. Gould, S.J.: Ever Since Darwin. Norton, New York (1978) 11. Haggard, P.: Voluntary action and conscious awareness. Nat. Neurosci. 5(4), 382–385 (2002) 12. Hájícek, P.: Free will as relative freedom with conscious component. Conscious. Cogn. 18, 103–109 (2009) 13. Haynes, J.-D., Sakai, K., Rees, G., Gilbert, S., Frith, C., Passingham, R.E.: Reading hidden intentions in the human brain. Curr. Biol. 17, 323–328 (2007)
14. Heisenberg, M.: Is Free Will an Illusion? Nature 459, 164–165 (2009)
15. Honderich, T.: The Consequences of Determinism, vol 2. Oxford University Press, Oxford (1988)
16. Honderich, T.: How Free Are You? Oxford University Press, Oxford (2003)
17. Honderich, T.: On Determinism and Freedom. Edinburgh University Press, Edinburgh (2005)
18. James, W.: Pragmatism: A New Name for Some Old Ways of Thinking. Dover, New York (1907)
19. Kane, R.: The Significance of Free Will. Oxford University Press, Oxford (1996)
20. Kane, R. (ed.): The Oxford Handbook of Free Will. Oxford University Press, New York (2001)
21. Kane, R.: A Contemporary Introduction to Free Will. Oxford University Press, New York (2005)
22. Libet, B.: Unconscious cerebral initiative and the role of conscious will in voluntary action. Behav. Brain Sci. VIII, 529–566 (1985)
23. Libet, B.: Mind Time. The Temporal Factor in Consciousness. Harvard University Press, Cambridge (2004)
24. Libet, B., Freeman, A., Sutherland, K.: The Volitional Brain: Towards a Neuroscience of Free Will. Imprint Academic, Thorverton (1999)
25. Manzotti, R.: A process based architecture for an artificial conscious being. In: Seibt, J. (ed.) Process Theories: Crossdisciplinary Studies in Dynamic Categories, pp. 285–312. Kluwer Academic, Dordrecht (2003)
26. Manzotti, R.: From artificial intelligence to artificial consciousness. In: Chella, A., Manzotti, R. (eds.) Artificial Consciousness, pp. 174–190. Imprint Academic, London (2007a)
27. Manzotti, R.: Towards artificial consciousness. Comput. Philos. Newsl. 07(1), 12–15 (2007b)
28. Manzotti, R.: From consciousness to machine consciousness. Proc. Addresses Am. Philos. Assoc. 82(1), 54 (2008)
29. Manzotti, R., Tagliasco, V.: From "behaviour-based" robots to "motivations-based" robots. Robot. Auton. Syst. 51(2–3), 175–190 (2005)
30. McCarthy, J.: Free will—even for robots. J. Exp. Theor. Artif. Intell. 12(3), 341–352 (2000)
31. Ryle, G.: Autobiographical. In: Wood, O.P., Pitcher, G. (eds.) Ryle: A Collection of Essays, pp. 1–15. Doubleday, Garden City (1970)
32. Soon, C.S., Brass, M., Heinze, H.-J., Haynes, J.-D.: Unconscious determinants of free decisions in the human brain. Nat. Neurosci. 11, 543–545 (2008)
33. Spinoza, B.: The Ethics (Ethica Ordine Geometrico Demonstrata). Dodo Press, New York (1664)
34. Strawson, G.: Free will. In: Craig, E.M. (ed.) Routledge Encyclopedia of Philosophy. Routledge, London (1998/2004)
35. Tononi, G.: An information integration theory of consciousness. BMC Neurosci. 5(42), 1–22 (2004)
191
Chapter 16
Natural Evolution of Neural Support Vector Machines Magnus Jändel
Abstract Two different neural implementations of support vector machines are described and applied to one-shot trainable pattern recognition. The first model is based on oscillating associative memory and is mapped to the olfactory system. The second model is founded on competitive queuing memory originally employed for generating motor action sequences in the brain. Both models include forward pathways where a stream of support vectors is evoked from memory and merges with sensory input to produce support vector machine classifications. Misclassified events are imprinted as new support vector candidates. Support vector machine weights are tuned by virtual experimentation in sleep. Recalled training examples masquerade as sensor input and feedback from the classification process drives a learning process where support vector weights are optimized. For both support vector machine models it is demonstrated that there is a plausible evolutionary path from a simple hard-wired pattern recognizer to a full implementation of a biological kernel machine. Simple and individually beneficial modifications are accumulated in each step along this path. Neural support vector machines can apparently emerge by natural processes.
16.1 The Problems of One-Shot Trainable Pattern Recognition and Sleep While an increasing mass of data on brain systems is compiled, there is still a need for integrative theories of overall function. Learning to recognize new patterns in sensory inputs and to act on such classifications is a key cognitive skill. Artificial intelligence still cannot match the uncanny efficiency of pattern recognition in biological organisms. Natural pattern recognition systems are reliable, flexible and trainable. They can generalize well even in very noisy environments, and acquired skills are stable even if not rehearsed for a long time. Intervening learning of unrelated patterns causes little dilution of established abilities. These features are hard to match for most artificial neural network algorithms. Classification processes in the brain are, in spite of this impressive performance, quite fast given the comparatively slow operation of neural cells. Low-level classifications are normally completed in a few hundred milliseconds, or at most ∼100 neural cycles. An interesting aspect, which should be included in any explanatory model, is that pattern recognition, as well as many other cognitive faculties in higher vertebrates, seems to require sleep and is rapidly degraded if the animal is denied sleep. Since sleep is a potentially dangerous condition of degraded attention and is nevertheless universal in higher life forms, the need for it must be built into the fabric of the system. Few artificial neural network models include sleep as an indispensable feature. One-shot learning is ubiquitous in higher life forms but is very hard to explain in artificial neural network models. One-shot learning means that one single training example teaches the organism to
recognize a new pattern. Animals that need many examples for connecting the scent of a predator to mortal danger will succumb in circumstances where one-shot learners thrive. Lessons of high survival interest are hence frequently learnt from one single exposure. Even simple animals such as snails learn food aversion from just one contact with an unappetizing substance [1]. However, artificial neural networks typically require vigorous repetition of training examples in order to build useful skills. Artificial associative memories, built from neural components, are not stand-alone high-performance pattern recognisers but they have the key capacity of learning new patterns instantly [2]. We describe two different architectures where significant and surprising experiences are captured in associative memory. Memories are tuned and pruned in slow-wave sleep and used for feed-forward pattern recognition in the waking state. It turns out that the systems implement support vector machines. It is suggested that many such machines, each tuned to a different context, contribute to low-level pattern recognition in the brain. While speculating on intricate mathematical algorithms in living neural systems it is important to remember that complex organisms must have evolved from simpler life forms. A credible model must demonstrate an evolutionary path that starts with a very basic function and gradually builds the complex model in simple steps where each step independently provides add-on survival or reproductive value. Hypothetical evolutionary paths to neural support vector machines are therefore presented and discussed in Sects. 16.4 and 16.7. This paper is a part of a research program for investigating the hypothesis that support vector machines are used for low-level pattern recognition in biological organisms. Support vector machines are high-performance pattern recognition algorithms well grounded in mathematical generalization and optimization theory but with little obvious relation to natural neural systems. Reference [3] shows that a particular support vector machine algorithm (zero-bias ν-SVM) readily can be expressed as a neural system and that the architecture and dynamics of the system are similar to the olfactory system. In particular it is found that the system is capable of one-shot learning and that it requires sleep. Reference [4] finds that the burst mode of the thalamocortical system could be the signature of a pattern recognition mechanism that operates according to support vector machine principles. An efficient neural implementation of support vector machines that reuses memory structures otherwise employed for fluent motor action is discussed in [5]. The present paper is an expansion of [6] and contributes to the research program by demonstrating that it is comparatively easy for life forms to incrementally evolve support vector machines.
16.2 Core Kernel Machine Model Kernel machines or support vector machines (SVM) [7] are efficient pattern recognition algorithms that work by implicitly projecting inputs to a high-dimensional feature space where linear classifiers are applied. Features are typically non-linear functions of the input sample. Operating in a high or even infinite-dimensional space of features makes it easy to separate training examples with different valence by linear expressions describing hyperplanes. The optimal solution is the hyperplane in feature space that separates training example classes with a maximal margin. For simplicity we shall only allow for binary classifications. Consider a set of m training examples (xi, yi) where xi is an input vector with binary or real-valued components and yi ∈ {1, −1} is the correct binary classification of the example. A (zero-bias) SVM classifies a test input vector x as positive if and only if f(x) ≥ 0 where

f(x) = \sum_{i=1}^{m} y_i \alpha_i K(x_i, x).    (16.1)
The classification function f(x) depends on the training examples, the weights αi and the non-linear symmetric kernel function K.
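As a purely illustrative aside (not part of the original chapter), the classification rule (16.1) can be sketched in a few lines of Python/NumPy. The Gaussian kernel and the function names rbf_kernel and svm_classify are our own choices; the chapter itself leaves the kernel K abstract.

    import numpy as np

    def rbf_kernel(a, b, gamma=1.0):
        # one possible choice of the symmetric kernel K; any positive-definite kernel would do
        return np.exp(-gamma * np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2))

    def svm_classify(x, support_x, support_y, alpha, kernel=rbf_kernel):
        # Eq. (16.1): f(x) = sum_i y_i * alpha_i * K(x_i, x); the sign of f gives the class
        f = sum(y_i * a_i * kernel(x_i, x)
                for x_i, y_i, a_i in zip(support_x, support_y, alpha))
        return 1 if f >= 0 else -1

Here support_x, support_y and alpha stand for the stored training examples, their valences yi and the weights αi of (16.1).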
We shall focus on zero-bias ν-SVM—a special support vector machine that is uniquely apt for biological implementation [3]. Zero-bias means that there is no constant factor in (16.1) as for most support vector machines. Removing the bias factor is essential for enabling a neural implementation based on one single associative memory. It carries, however, a penalty in the form of reduced generalization ability as it constrains the set of allowed classification functions. The alternative neural realization suggested in [5] reinstates the bias factor at the cost of adding a third constraint. A significantly more complex architecture is needed for handling this constraint and for computing the bias factor. This extension is considered in Sects. 16.5, 16.6, 16.7. The key advantage of the support vector machine approach is that the projection to feature space and finding the optimal separating hyperplane is performed implicitly by solving a much simpler quadratic optimization problem. The weights αi define the solution to the optimization problem where the dual objective function,

W(\alpha) = -\frac{1}{2} \sum_{i,j=1}^{m} y_i y_j \alpha_i \alpha_j K(x_i, x_j),    (16.2)

is maximized subject to,

0 \le \alpha_i \le \frac{1}{m},    (16.3)

and

\sum_{i=1}^{m} \alpha_i = \nu.    (16.4)

The parameter 0 < ν < 1 controls the trade-off between accuracy and generalization. This model is a zero-bias specialization of ν-SVM [8]. The constraint (16.4) is applied as suggested by [9]. Because of the quadratic nature of the problem there are no local optima, so the solution to (16.2)–(16.4) is readily found by gradient ascent in the hyperplane defined by (16.3) and (16.4). A simple gradient ascent scheme [3, 4] incrementally updates each weight αi (subject to (16.3)) according to,

\Delta\alpha_i \sim \frac{1}{m} \sum_{s=1}^{m} C_s - C_i,    (16.5)

where Δαi is the increment of αi and Ci is the classification margin of the ith example,

C_i = y_i \sum_{j=1}^{m} y_j \alpha_j K(x_i, x_j).    (16.6)
The learning rule (16.5) drives the weights of easily classified examples to zero. The increment Δαi is always negative if the example is correctly classified with a margin larger than the average margin. Such trivial examples hence have asymptotically vanishing weights αi = 0. Note that trivial training examples will not contribute to the classification function of a fully optimized support vector machine. Memory-saving algorithms where trivial examples are discarded from the training set have been shown to be efficient [10]. Training examples with optimized weights αi > 0 are called support vectors. Only support vectors contribute to classifications. Support vectors are the unique set of training examples that define the solution by demarcating the borders of the positive valence and the negative valence domains in feature space. Support vectors are borderline events and trivial examples are commonplace events. A soft-margin support vector machine such as the zero-bias ν-SVM can handle noisy training sets with outlier examples that e.g. may have been misclassified. It therefore has two types of support vectors. Regular support vectors are training examples that delineate the classification margins in feature space. In the absence of outliers all support vectors would be regular. Outlier support vectors are training examples that violate the margins. Outliers have margins smaller than the average margin, so their weights will, according to (16.5), be driven to the maximum value of 1/m. The margins of regular support vectors are all equal to the average margin. The optimal weight of a regular support vector can fall anywhere in the allowed interval.

Fig. 16.1 Outline of an olfactory kernel machine. Solid ovals stand for known brain parts. Higher-order brain systems (HOBS) are management functions in the cortex and the limbic system. OB is the olfactory bulb. AOC is the anterior olfactory cortex. APC and PPC are the anterior and posterior piriform cortex respectively. Dashed boxes indicate hypothetical components of the kernel machine. The Trap is a register for input data in the AOC. OM is oscillating associative memory in the APC and CL is the classification logic in the PPC. Solid lines are known neural projections. Dot-dashed lines are hypothetical connections. Broad connections carrying current or recalled sensory data are D1, D2, D3, D4 and D5. Narrow modulatory projections are M1, M2 and M3. Afferents (D1) carry odour data from the OB to the Trap. Trapped inputs are forwarded to the CL (D2) and to the OM (D3). The OM projects support vectors to the CL (D4) and backwards to the Trap (D5). The CL sends results (M1) to HOBS and learning feedback (M3) to OM. HOBS trigger learning of misclassified examples (M2). The architecture is anatomically feasible but the detailed function is speculative. Note that the figure is highly simplified. Many features that are irrelevant for the present discussion are ignored. See [14] for an overview of the olfactory system.
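As a rough sketch only (our own construction, not taken from the original text), the gradient ascent (16.5) under the constraints (16.3) and (16.4) might be prototyped as below; K is assumed to be the precomputed m × m kernel matrix, and the clip-and-rescale projection is just one simple way of keeping the weights feasible.

    import numpy as np

    def margins(alpha, y, K):
        # C_i = y_i * sum_j y_j * alpha_j * K(x_i, x_j), Eq. (16.6); K is the kernel matrix
        return y * (K @ (alpha * y))

    def train_zero_bias_nu_svm(K, y, nu=0.5, eta=0.05, steps=2000):
        m = len(y)
        alpha = np.full(m, nu / m)                 # feasible start satisfying (16.3) and (16.4)
        for _ in range(steps):
            C = margins(alpha, y, K)
            alpha = alpha + eta * (C.mean() - C)   # Eq. (16.5): raise hard examples, lower easy ones
            alpha = np.clip(alpha, 0.0, 1.0 / m)   # box constraint (16.3)
            alpha *= nu / alpha.sum()              # crude re-projection onto the sum constraint (16.4)
        return alpha

In such a simulation the weights of trivial examples fall towards zero, outlier weights saturate near 1/m, and the surviving examples play the role of the support vectors discussed above.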
16.3 Olfactory Support Vector Machines Trainable olfactory pattern recognition according to the kernel machine model of Sect. 16.2 is described in [3]. This section reviews the core of the hypothesis. Many different support vector machines classify odours in a wide range of contexts. Each olfactory kernel machine includes memory for support vectors in the anterior piriform cortex (APC), sensory memory for stabilizing inputs in the anterior olfactory cortex (AOC) and classification apparatus in the posterior piriform cortex (PPC). Inputs are provided by the olfactory bulb (OB) and classifications are forwarded to higher-order brain systems (HOBS). HOBS is a placeholder for brain systems such as the amygdala, the prefrontal cortex, the perirhinal cortex and the entorhinal cortex that are bidirectionally connected to the piriform cortex (PC).
See Fig. 16.1 for details and the notation that is used in the following subsections. Section 16.3.1 describes the Classification process—how a trained system classifies inputs. The Surprise learning process in Sect. 16.3.2 performs one-shot learning of crucial incidents. Section 16.3.3 covers the Importance learning process where support vector weights are optimized and trivial examples are purged from memory.
16.3.1 The Classification Process Consider first how a fully trained olfactory support vector machine classifies inputs (D1) from the olfactory bulb. The Trap captures and holds a stable copy x of the input for the duration of a sniff cycle of 125–250 ms. The OM is an associative memory for support vectors. It oscillates rapidly between support vector states with a frequency much faster than the sniff cycle. See [11, 12] and [13] for simulations showing the feasibility of such oscillating memories and [3] for an in-depth discussion of the OM. The OM displays a memory state for a short time before it oscillates to the next state. The endurance time Ti is the average duration of memory state xi in the perpetual oscillation of the OM. The endurance time is the physical parameter that encodes the support vector weight of the memory state. In the following we use αi as a shorthand for a dimensionless parameter that is proportional to Ti and plays the part of the SVM weight of the training example that is engraved as memory state xi. The SVM kernel function K(xi, x) is computed in the CL where projections carrying support vectors xi (D4) join afferents (D2) conveying the input vector x. The classification function (16.1) is computed by temporal summation,

f(x) \sim \int_{t_0}^{t_0+T_{trap}} y_{i(t)} K(x_{i(t)}, x)\,dt,    (16.7)
where t0 is the starting time of the integration, Ttrap is the holding time of the sensory memory and i(t) is the index of the prevailing OM memory pattern at time t. As usual, yi is the valence of the memory pattern xi . The resulting classification is transmitted to HOBS (M1). Note that neural temporal summation produces an approximation of (16.1) where the non-linear summation of physical neurons and the stochastic nature of the presentation of the support vectors contribute to pattern recognition errors. Further details of the classification process will be discussed in Sect. 16.4.
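To make the temporal-summation idea concrete, here is an illustrative toy simulation (our own construction; the sampling scheme and all names are assumptions). Support vectors are drawn with probabilities proportional to their endurance times, so the accumulated sum approximates (16.1) up to stochastic noise.

    import numpy as np

    def oscillating_classification(x, support_x, support_y, alpha, kernel, n_flips=500, seed=0):
        # the OM jumps between stored patterns; pattern i is visited in proportion to its
        # weight alpha_i, and the CL accumulates y_i * K(x_i, x) over the sniff cycle (Eq. 16.7)
        rng = np.random.default_rng(seed)
        p = np.asarray(alpha, dtype=float)
        p /= p.sum()
        total = 0.0
        for _ in range(n_flips):
            i = rng.choice(len(p), p=p)
            total += support_y[i] * kernel(support_x[i], x)
        return 1 if total >= 0 else -1             # approximates the sign of f(x) in (16.1)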
16.3.2 Surprise Learning In this paper, we define a surprise as a stimulus that causes a neural classifier to make an error. New stimuli may be correctly classified, but a surprise is by definition misclassified. Misclassifications cause strong emotional responses with positive or negative valence and trigger a surge of neuromodulators (M2) causing the OM to engrave the misclassified pattern as a new support vector candidate. The Trap holds a stable copy of the surprising input that projects (D3) to the OM. The emotional valence of the surprise provides the label yi of the new memory pattern xi. Note that the new training example is learned from one single exposure. Mechanisms for such one-shot learning in artificial associative memories are described by Hopfield [2]. The SVM weights αi are sub-optimal following the addition of a new training example, so the animal may not classify scents correctly immediately after misclassification events.
16.3.3 Importance Learning From ancient times it has been surmised that memory is trimmed and consolidated in sleep [15]. We suggest a specific application of this idea—support vector weights are optimized and trivial examples are pruned from memory while the animal sleeps. As external inputs are suppressed in sleep, the Trap locks on inputs (D5) from the OM. Real-world data are replaced with support vectors. The OM keeps oscillating incessantly in the sleeping brain so that support vectors are presented stochastically. The Trap holds each such training example (xj, yj) for the duration of a sniff cycle and will then capture the next support vector that is presented by the OM. The OM oscillates much faster than the sniff cycle. The probability of trapping any given example i is hence proportional to the corresponding endurance time Ti. The CL computes the kernel K(xi, xj). Note that (xj, yj) is the example that is trapped and (xi, yi) is the example that currently is offered by the OM. A feedback signal Bij = yj K(xi, xj) is projected (M3) from the CL to the OM. Note that the kernel computation in the CL thus has dual use. Once during each OM oscillation the learning rules,

\Delta T_i \sim -y_i B_{ij} \quad\text{and}\quad \forall s: \Delta T_s \sim \frac{1}{m} y_i B_{ij},    (16.8)

are applied. In (16.8), ΔTs is the increment of the endurance time Ts. The current memory pattern i is hence depressed in proportion to yi Bij and all memory patterns are potentiated in proportion to (1/m) yi Bij. The sum of endurance times is conserved. Averaging (16.8), for any given memory pattern i, over the probability distribution of the trapped examples j gives the effective learning rules,

\Delta \bar{T}_i \sim -C_i \quad\text{and}\quad \forall s: \Delta \bar{T}_s \sim \frac{1}{m} C_i,    (16.9)

where ΔT̄s is the average increment of Ts and Ci is given by (16.6). Note that the support vector weight αs is proportional to Ts. The OM hence implements zero-bias ν-SVM gradient ascent according to (16.5). This means that the biological support vector machine eventually acquires optimal weights. Trivial examples are erased from the OM as the corresponding weights fall to zero.
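For illustration only (our own sketch; in particular, both the trapped example and the currently displayed OM state are assumed to be drawn in proportion to the endurance times), the sleep-time tuning of (16.8) could be simulated roughly as follows.

    import numpy as np

    def sleep_importance_learning(T, X, y, kernel, eta=0.01, cycles=5000, seed=1):
        # T: numpy array of endurance times of the m stored patterns; X, y: patterns and valences
        rng = np.random.default_rng(seed)
        m = len(T)
        for _ in range(cycles):
            p = T / T.sum()
            j = rng.choice(m, p=p)             # recalled example trapped by the SM for one sniff cycle
            i = rng.choice(m, p=p)             # pattern currently displayed by the OM
            B = y[j] * kernel(X[i], X[j])      # feedback B_ij = y_j * K(x_i, x_j)
            T = T + eta * y[i] * B / m         # potentiate all patterns ...
            T[i] -= eta * y[i] * B             # ... and depress the current one (sum conserved)
            T = np.clip(T, 1e-9, None)         # endurance times cannot become negative
        return T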
16.4 Evolutionary Path to Zero-Bias ν-SVM This section describes a hypothetical evolutionary path from primitive pattern recognition to a full implementation of kernel machines in low-level perception. The path consists of a sequence of simple modifications where each step brings some advantage to the life form. As an ongoing example we shall consider an organism living in a world with many different food stuffs and many different toxic substances. Using odours for distinguishing food from poison is crucial. As organisms evolve they will be equipped with increasingly sophisticated chemical pattern recognition systems. The primordial pattern recognition system consists of a sensor system SS and a pattern recognizer PR (Fig. 16.2a). The sensor system includes receptor cells and back-end layers for stabilizing and filtering the external input. A prototypical sensor system is the primary receptor cells combined with the glomerular layer of the olfactory bulb. The output of the pattern recognizer is a function f (x (t)) where x is the sensory input vector and t is time. A positive value of f could e.g. mean “safe to eat” while a negative value indicates “not safe to eat”. Turbulence in the odour carrying medium causes discontinuous and highly variable exposure of odour signals at the chemoreceptor neurons [16]. Adding sensory memory SM enables more sophisticated analysis (Fig. 16.2b). SM captures a snapshot x of the sensor signal and holds it stable for a time Ttrap until the next snapshot is trapped. More time is now available for computing a complex classification function f (x) of a significant input x. Sensory memory expands the range of features and phenomena that the system can recognize.
Fig. 16.2 Speculative evolutionary path leading to a biological kernel machine according to Sect. 16.3. The key adaptation is indicated for each step. (a) Base-line pattern recognition system consisting of a sensor system (SS) and a pattern recognizer (PR). The signal from SS to PR is the sensory vector x . (b) The system is extended with sensory memory (SM) providing a stable duplicate x of the sensory vector. (c) Associative memory (AM) is available in the brain. (d) Surprising signals from SM are stored in AM. The emotional valence y is recalled for sufficiently similar inputs. (e) The PR modulates the recalled valence with a similarity measure comparing x with the stored pattern x . (f) Oscillating memory (OM) and temporal summation in the PR enable pattern recognition based on a weighted average over many training examples. (g) Learning feedback from the PR to the OM tunes memory weights in real-world experiments. (h) Feedback from the OM to the SM enables virtual experiments in sleep thus completing a biological support vector machine
The organism can learn to recognize new scents by adapting neural networks in the PR, thus modifying f(x). Learning new pattern recognition skills by tweaking f(x) means, however, that new memories overwrite old unless all relevant training examples are repeated continuously. One-shot learning is an essential skill in a world where organisms cannot afford to repeat mistakes. As a starting point for evolving one-shot learning, we assume that associative memory (AM) is available in the brain (Fig. 16.2c). This facility has developed for some other purpose and is initially disconnected from the pattern recognition system. The next evolutionary step is to connect SM to the AM (Fig. 16.2d). Frightening, painful, pleasurable or otherwise emotional events cause a burst of neuromodulators that imprints the present sensory input x as a new memory pattern in the AM. Hopfield demonstrated the feasibility of such one-shot learning in a model of associative memory [2]. The emotional valence y of the input is a part of the memory trace. In the food search example, y = 1 indicates food and y = −1 means poison. Significant events are hence represented in persistent memory by the associated input pattern and the emotional valence. Sensory memory is essential for temporarily saving the input that caused the surprise. As the organism explores its environment, the input x falls within the basin of attraction of memory pattern x′ and cues the AM to settle into the state (x′, y′). The emotional valence y′ of the triggered memory is produced (see Fig. 16.2d). Remembering the emotional valence of a training example that is similar to the presently encountered substance helps to select food and avoid poisons. The system can use y′ directly to drive actions or, more likely, fuse it with other evidence in high-level decision modules. The system of Fig. 16.2d would work rather well in a world where all substances are known and have unambiguous sensory signatures. In a less clear-cut environment, food scents that are only
remotely similar to a known poison could fall into the basin of attraction of the corresponding memory state and thus trigger unwarranted avoidance behaviour. A successful mutation could build a connection from the associative memory to the pattern recognition module (Fig. 16.2e). The input x falls within the basin of attraction of some memory pattern x′ and causes the AM to settle in the state x′. The PR receives x, x′ and y′. The classification function would now be of the form f(x) = y′K(x′, x) where the function K measures the similarity of x′ and x. The PR outputs the recalled valence y′ tempered by a measure of similarity between the present sensory signal and the recalled example. An activation threshold could ensure that only sufficiently similar x and x′ trigger actions suggested by y′. The resulting behaviour would be more appropriate, e.g. with feeding triggered only by substances that are quite similar to known foods. A disadvantage of this system is that the organism gets little guidance if the selected pattern x′ is too dissimilar to x, since the system compares the input with just one of the training examples. The next evolutionary step is to compare the sensory input to many stored patterns. To achieve this, the associative memory transforms so that it will not settle into a stable attractor but rather perpetually oscillate between memory states. The associative memory becomes an oscillating memory OM (Fig. 16.2f). The wide-ranging phenomenon of chaotic itinerancy ([17–19]; see [20] for a review) lends credibility to the existence of such oscillating memories in brains and shows that a minor change in the dynamics of biological associative memory can cause a transition to the oscillating phase. The pattern recognizer employs temporal summation to compute

f(x) \sim \int_{t_0}^{t_0+T_{trap}} y_{i(t)} K(x_{i(t)}, x)\,dt,    (16.10)

where t0 is the starting time of the integration, Ttrap is the holding time of the sensory memory, i(t) is the index of the present memory pattern of the OM and yi is the valence of the memory pattern xi. Temporal summation is a naturally occurring property of neurons [21] and may already be available in the PR although it served no computational function in preceding systems. No change to the PR may hence be required in the transition to the system of Fig. 16.2f. If Ttrap is much larger than the OM oscillation time, (16.10) averages to

f(x) \approx c \sum_{i=1}^{m} y_i \alpha_i K(x_i, x),    (16.11)
where c is a positive constant and αi are weights proportional to the endurance times Ti of the corresponding memory patterns. The endurance time could depend on the emotional intensity of the event that imprinted the corresponding memory trace. Our organism can now perform pattern recognition based on weighted averages of similarity measures for many stored memories. It has in fact implemented the classification process of a support vector machine as described in Sects. 16.2 and 16.3. Equation (16.11) is identical to (16.1) provided that K is understood as the kernel function of the support vector machine, αi are the SVM weights and the signum function is applied for binary classifications. The Surprise learning process of Sect. 16.3 is also identical to one-shot learning as described in this section. Further evolution could exploit the fact that the same oscillating memory can serve multiple pattern recognition units, each tailored for a different purpose. Temporal integration according to (16.10) will, however, converge within reasonable time only for a limited number of memory patterns. The capacity of the oscillating memory is also finite. To handle rich and variable environments the organism needs means for trimming the content of the OM to a small and dynamically updated population of vital training examples. The next evolutionary invention is to carry a feedback signal Bij = yj K(xi, xj) from the PR to the OM (Fig. 16.2g). Note that (xi, yi) indicates the state of the OM while xj is the sensory vector held by the SM. The feedback signal Bij includes the valence yj as evaluated by higher-order brain
systems in interaction with the world. Tasting the substance with scent xj gives e.g. the classification yj (edible or toxic). The OM uses Bij to regulate endurance times according to (16.8). Consider a world with m̂ substances that the organism encounters with probability p̂j. Each substance is either edible or toxic. Averaging Bij for a given OM state i over world states j and multiplying by the valence yi of the OM state gives

\hat{C}_i = y_i \sum_{j=1}^{\hat{m}} y_j \hat{p}_j K(x_i, x_j).

Note that Ĉi is the classification margin of xi for the classifier

\hat{f}(x) = \sum_{j=1}^{\hat{m}} y_j \hat{p}_j K(x_j, x),

which averages over the valences of real-world substances weighted with the real-world probability p̂j and the similarity measure K(xj, x). We also define the average of Ĉ over all m memory patterns in the OM,

\hat{C}_{OM} = \frac{1}{m} \sum_{s=1}^{m} \hat{C}_s.

Applying the learning rules in (16.8) means that the endurance times Ti of all memory states with Ĉi > ĈOM are driven to zero. Such states are hence pruned from the OM. The endurance times of states with Ĉi < ĈOM are pushed to the maximum value Ti = Tmax. The effect of adding the feedback Bij = yj K(xi, xj) (Fig. 16.2g) and applying the OM learning rules is that training examples that are correctly classified with a good margin are purged from the OM. The system retains training examples with a narrow margin that are hard to classify correctly. Such examples mark the borderline between categories and hence provide useful information for classification purposes. Dropping the high-margin trivial examples subtracts little from pattern recognition performance but makes the system much faster and reduces the need for memory capacity. The selection of training examples and the associated endurance times (weights) is, however, not optimal. The classifier f̂(x) is different from f(x), so the learning process will drive the OM population and weights to a suboptimal state from a support vector machine point of view. Evolution completes the implementation of a biological support vector machine by adding a backward projection carrying memory patterns from the OM to SM (Fig. 16.2h). Input from the sensors SS dominates, however, in the waking state. As sensors are turned off in sleep, the SM will trap the otherwise suppressed input from the backward projection. Randomly selected training examples masquerade as actual sensory data. The optimization of the SVM weights is performed using the same learning rules as in the system of Fig. 16.2g but replacing real-world inputs with recalled training examples as described in Sect. 16.3.3. This final step provides two major advantages. Firstly, pattern recognition performance is improved since the selection of OM training examples and their weights are optimal. Secondly, the optimization process proceeds swiftly by virtual experimentation in sleep rather than by slow and dangerous trial-and-error in the real world.
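As a purely illustrative sketch (our own; all names are invented), the asymptotic effect of this waking-state feedback can be written out directly: memories whose world-averaged margin Ĉi exceeds the OM average are pruned, while the remaining borderline memories saturate at the maximal endurance time.

    import numpy as np

    def world_margins(X_mem, y_mem, X_world, y_world, p_world, kernel):
        # C_hat_i = y_i * sum_j y_j * p_hat_j * K(x_i, x_j): margin of memory i against the
        # probability-weighted classifier f_hat built from the real-world substances
        C_hat = np.empty(len(y_mem))
        for i, (xm, ym) in enumerate(zip(X_mem, y_mem)):
            C_hat[i] = ym * sum(yw * pw * kernel(xm, xw)
                                for xw, yw, pw in zip(X_world, y_world, p_world))
        return C_hat

    def prune_memory(C_hat, T_max=1.0):
        # high-margin (easy) memories are purged; borderline memories are kept at T_max
        return np.where(C_hat < C_hat.mean(), T_max, 0.0)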
16.5 Extended Kernel Machine Model Reference [5] describes an alternative architecture for biological support vector machines where the key components are structures also found in the motor system. This architecture fits naturally with the biased version of support vector machines. Hence we will first consider how to extend the constraints and the learning rules of Sect. 16.2 to the biased case where the classification function now takes the form,

f(x) = \sum_{i=1}^{m} y_i \alpha_i K(x_i, x) + b.    (16.12)

Adding the bias factor b does not change the dual objective function in (16.2) but the problem formulation is extended with a third constraint,

\sum_{i=1}^{m} y_i \alpha_i = 0.    (16.13)
It is very difficult to accommodate the extra constraint using one single memory unit but it turns out that a synchronized pair of memory units, in which positive and negative valence training examples are stored in separate memory units, will handle the biased case. Before proceeding with the architectural implementation we must, however, specify new SVM learning rules for such bisymmetric architectures. In order to express the learning rules compactly we define a symbol ∗ that can take the values + or − where + indicates positive valence training examples and − indicates negative valence training examples. The learning rule for either positive or negative valence training examples is,

\Delta\alpha_i \sim C_{*} - C_i,    (16.14)

where Ci is defined by (16.6) and C∗ is given by,

C_{*} = \frac{1}{m_{*}} \sum_{i=1}^{m} \delta_{*}(y_i) C_i.    (16.15)
The total number of positive valence training examples is m+ while the total number of negative training examples is m− . The Kronecker delta functions are defined according to δ+ (1) = 1, δ+ (−1) = 0, δ− (1) = 0 and δ− (−1) = 1. The proof of the learning rules is found in Ref. [5].
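A rough, non-authoritative sketch of one step of (16.14)–(16.15) is given below; the box constraint is assumed to carry over from (16.3), and the bias b itself is left to the architecture described in [5].

    import numpy as np

    def biased_update(alpha, y, K, eta=0.05):
        # one gradient step: Eq. (16.14) applied separately to the positive- and negative-valence
        # examples, each compared with its own class average C_* of the margins, Eq. (16.15)
        C = y * (K @ (alpha * y))                      # margins C_i, Eq. (16.6)
        new_alpha = alpha.copy()
        for sign in (+1, -1):
            mask = (y == sign)
            C_star = C[mask].mean()                    # class-restricted average, Eq. (16.15)
            new_alpha[mask] += eta * (C_star - C[mask])
        return np.clip(new_alpha, 0.0, 1.0 / len(y))   # assumed box constraint as in (16.3)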
16.6 Competitive Queuing Memory and Support Vector Machines This section describes an alternative architecture for neural support vector machines that employs the biased classification function and special learning rules of the previous section. The key difference compared to the zero-bias model of Sect. 16.3 is that the single oscillating memory is replaced with a pair of synchronized competitive queuing memories. Competitive queuing memory (CQM) [22–24] is one of the two main approaches for explaining fluent trainable motor actions. The capacity to learn and repeat temporal series of patterns is central for achieving smooth purposeful muscle action. Both competitive queuing memory and recurrent neural networks [25–27] could provide this functionality and each mechanism has its advocates. Bullock [28] compiles evidence that competitive queuing memory drives skilled motor behaviour such as drawing and playing musical instruments. A CQM consists of a memory layer and a choice layer. For the present purpose it is sufficient to employ a high-level model in which the memory layer holds a set of weighted patterns and the choice layer selects stored patterns one by one, resulting in an output stream where each pattern is exhibited for a time that is proportional to the weight of the pattern. Figure 16.3 shows a simplified architecture of a CQM-based support vector machine. Full details are found in [5]. The pattern recognition device of Fig. 16.3 performs the same three processes as the olfactory support vector machine of Sect. 16.3. The Classification process of a fully trained CQM-based support vector machine works as follows. The Trap captures a sample of the sensory input and holds a stable input vector x for the duration Ttrap of an evaluation cycle. The CQM− outputs the complete series of negative valence support vectors during the cycle. Each such support vector xi is displayed for a time that is proportional to the SVM weight αi. The Kernel− unit computes the SVM kernel function of the input vector x and the stream of negative valence support vectors x−. The CQM+ and Kernel+ modules mirror this behaviour. The Integrator computes over an evaluation cycle,

c \int_{t_0}^{t_0+T_{trap}} \bigl(K(x, x^{+}) - K(x, x^{-})\bigr)\,dt + b = \sum_{i=1}^{m} y_i \alpha_i K(x_i, x) + b = f(x),    (16.16)
Fig. 16.3 Outline of a CQM-based support vector machine. Solid ovals represent surrounding systems including Sensors and Higher-order brain systems (HOBS). Boxes are system parts. The Trap operates as the corresponding component of Fig. 16.1. CQM− stores negative valence support vectors whereas CQM+ stores positive valence support vectors. Kernel+ and Kernel− compute the kernel function of the input vector and the stream of positive and negative valence support vectors respectively. The Integrator computes the classification function.
where c is a constant and t0 is the start time of the evaluation cycle. The system outputs the SVM classification function according to (16.12). The Surprise learning process discovers new support vector candidates and works precisely as described in Sect. 16.3. Training examples are stored in CQM+ and CQM− according to the valence of the examples. The Importance learning process is also very similar to the corresponding process in Sect. 16.3. The only difference is that a separate learning process is applied to each CQM, reflecting the separate learning rules indicated by (16.14). Each CQM receives the feedback Bij = yj K(xi, xj) and applies the local learning rules,

\Delta T_i \sim -y_{*} B_{ij} \quad\text{and}\quad \forall s: \Delta T_s \sim \frac{1}{m_{*}} y_{*} B_{ij},    (16.17)

where y+ = 1 and y− = −1. Note that (xj, yj) is the example that is trapped and (xi, yi) is the example that currently is offered by the CQM. The notation is otherwise as explained in Sect. 16.3.3. Equation (16.17) ensures that the sum of endurance times is conserved for each CQM. Averaging (16.17) for a long series of trapped examples xj gives,

\Delta \bar{T}_i \sim -C_i \quad\text{and}\quad \forall s: \Delta \bar{T}_s \sim \frac{1}{m_{*}} C_i,    (16.18)
where the notation again is as in Sect. 16.3.3. Since the endurance time Ts is proportional to the support vector weight αs we find that the system implements the learning rules of (16.14). By diligently applying the local learning rule of (16.17) to each CQM, the system will eventually find the optimal support vector machine weights.
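For illustration only (function and argument names are our own), the Integrator's computation (16.16) collapses, once the time integral is carried out, to two weighted kernel sums plus the bias.

    def cqm_classify(x, pos_sv, pos_alpha, neg_sv, neg_alpha, kernel, b=0.0):
        # CQM+ and CQM- stream their support vectors, each shown for a time ~ alpha_i;
        # integrating K(x, x+) - K(x, x-) over the cycle is equivalent to the sums below
        plus = sum(a * kernel(x, s) for s, a in zip(pos_sv, pos_alpha))
        minus = sum(a * kernel(x, s) for s, a in zip(neg_sv, neg_alpha))
        f = (plus - minus) + b     # equals sum_i y_i * alpha_i * K(x_i, x) + b, Eq. (16.16)
        return 1 if f >= 0 else -1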
Fig. 16.4 Hypothetical evolutionary path to a CQM-based biological SVM. (a) The foundation is a motor subsystem where twin CQM units generate a stream of actions. (b) Internalized actions are used for pattern recognition. (c) The system learns new training examples. (d) Support vector weights are adapted in real-world experiments. (e) A complete SVM where support vector weights are optimized in sleep
16.7 Evolving a CQM-based Support Vector Machine This section describes a hypothetical scenario for incremental development of a CQM-based support vector machine. We will gloss over some of the details where they are similar to the previous scenario. The evolution of a pattern recognition apparatus according to Sect. 16.6 is facilitated by hijacking a component of the primitive motor system (Fig. 16.4a). The motor system component is bisymmetric perhaps reflecting the bisymmetric body plan of vertebrates. Each CQM generates a time sequence of patterns. Sensory input is provided by the sensor system SS and is stabilized by sensory memory SM. External input is blended with internally generated pattern sequences in units that are labelled K in anticipation of upcoming events. The action sequence is formed by merging outputs from both
sides of the system. The system of Fig. 16.4a stores and recalls temporal sequences that are tuned and modified by sensor data for the purpose of producing action sequences. The first evolutionary step transforms the motor system into a simple pattern recognizer by rerouting muscle control signals to higher-order brain systems where they are interpreted as perceptions rather than actions (Fig. 16.4b). This system has, as explained in Sect. 16.5, the capacity of computing the classification function of a support vector machine given that the CQM stores support vectors with appropriate valence and weights, the K units realize kernel functions and the I unit performs appropriate temporal integration and biasing. The system of Fig. 16.4b will, however, presently just amalgamate hard-wired CQM sequences with sensory input and output internalized actions. The second evolutionary step (Fig. 16.4c) adds the capacity for learning new training examples by connecting sensory input to the CQM units. The organism needs to establish a policy for selecting what, in the incessant stream of experiences, is sufficiently important to remember. The surprise learning procedure of Sect. 16.3.2 would be a reasonable choice. Figure 16.4d adds the ability to adapt the support vector weights in real-world experiments as detailed in connection with Fig. 16.2g. Connections from each Kernel unit to the associated CQM carry feedback data that drives the optimization process according to the learning rules of (16.17). Figure 16.4e completes a full-blown CQM-based support vector machine by adding feedback from the CQMs to the sensory memory. Stored patterns mimic sensor data by flowing back to the sensory memory while the animal sleeps. The optimal support vector weights are found according to the Importance learning process that is described in Sect. 16.6.
16.8 Discussion and Conclusion Section 16.3 reviews a new model for olfactory pattern recognition. Note, however, that there is a wealth of computational approaches to olfaction (see [29] for a review). Odour recognition models based on cortical dynamics include [30]. Associative memory in the piriform cortex is described by [31]. Models with central information processing in the olfactory bulb include [32, 33]. It should be understood that early rungs of the evolutionary ladder could coexist with modern structures. The direct connection from the olfactory bulb to the piriform cortex might e.g. be a part of a legacy discrimination system (at the level of Fig. 16.2a) that is hard-wired for detecting scents of high survival significance [14]. Later evolutionary steps employ, according to Sect. 16.3, the rewired route through the anterior olfactory cortex. The lobster olfactory system also includes "labelled lines" where dedicated subsystems handle specific odorants of particular survival value [34]. The locus for short-term sensory memory, in the olfactory model of Sect. 16.3, could also be in the olfactory bulb. Periodic signalling of the memory state to a secondary sensory memory in the anterior olfactory cortex would be consistent with the present model. The procerebral lobe, which has a similar function to the olfactory bulb in invertebrate species, seems to be the site of odour sensory memory [35]. It is also conceivable that several types of sensory memory with different time scales operate in different contexts. Support vector machine models of trainable pattern recognition match the architecture of both the thalamic [4] and the olfactory system [3]. It appears that there are several alternative evolutionary paths from simple hard-wired pattern recognition to a full-blown biological implementation of a support vector machine. Each step along the paths adds components or connections that provide some crucial advantage in pattern recognition performance. Whether evolution actually has travelled along any of these paths remains to be investigated. Acknowledgements This work was supported by the Swedish Foundation for Strategic Research. Enlightening discussions with Hans Liljenström are gratefully acknowledged.
References 1. Teyke, T.: Food-attraction conditioning in the snail. Helix Pomatia. J. Comp. Physiol. A 177, 409–414 (1995) 2. Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79, 2554–2558 (1982) 3. Jändel, M.: A neural support vector machine. Neural Netw. 23, 607–613 (2010) 4. Jändel, M.: Thalamic bursts mediate pattern recognition. In: Proceedings of the 4th International IEEE EMBS Conference on Neural Engineering, pp. 562–565 (2009) 5. Jändel, M.: Pattern recognition as an internalized motor programme. In: Proceedings International Conference on Neural Networks, pp. 828–836 (2010) 6. Jändel, M.: Evolutionary path to biological kernel machines. In: Proceedings Brain Inspired Cognitive Systems (2010) 7. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Methods. Cambridge University Press, Cambridge (2000) 8. Schölkopf, B., Smola, A.J., Williamson, R.C., Bartlett, P.L.: New support vector algorithms. Neural Comput. 12, 1207–1245 (2000) 9. Chang, C.-C., Lin, C.-J.: Training ν-support vector classifiers: theory and algorithms. Neural Comput. 13, 2119– 2147 (2001) 10. Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press, Cambridge (2002) 11. Pantic, L., Torres, J.J., Kappen, H.J., Gielen, S.: Associative memory with dynamic synapses. Neural Comput. 14, 2903–2923 (2002) 12. Horn, D., Usher, M.: Neural networks with dynamical thresholds. Phys. Rev. A 40(2), 1036–1044 (1989) 13. Liljenström, H.: Neural stability and flexibility: a computational approach. Int. J. Neuropsychopharmacol. 28, 64– 73 (2003) 14. Haberly, L.B.: Parallel-distributed processing in olfactory cortex: new insights from morphological and physiological analysis of neuronal circuitry. Chem. Senses 26, 551–576 (2001) 15. Quintilianus, M.F.: Institutio Oratoria, Book XI (English translation in The Orators Education, vol. 5, Books 11-12, Loeb classical library) (1995) 16. Koehl, M.A.R., Koseff, J.R., Grimaldi, J.P., McCay, M.G., Cooper, T., Wiley, M.B., Moore, P.A.: Lobster sniffing: antennule design and hydrodynamic filtering of information in an odor plume. Science 294, 1948–1951 (2001) 17. Ikeda, K., Matsumoto, K., Otsuka, K.: Maxwell-Bloch turbulence. Prog. Theor. Phys. Suppl. 99, 295–324 (1989) 18. Kaneko, K.: Clustering, coding, switching, hierarchical ordering, and control in a network of chaotic elements. Physica D 41, 137–172 (1990) 19. Tsuda, I.: Dynamic link of memory: chaotic memory map in nonequilibrium neural networks. Neural Netw. 5, 313–326 (1992) 20. Kaneko, K., Tsuda, I.: Chaotic itinerancy. Chaos 13, 926–936 (2003) 21. Johnston, D., Wu, S.M.-S.: Foundations of Cellular Neurophysiology. MIT Press, Cambridge (1995) 22. Grossberg, S.: A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans. In: Rosen, R., Snell, F. (eds.) Progress in Theoretical Biology, vol. 5, pp. 233–374. Academic Press, San Diego (1978) 23. Houghton, G.: The problem of serial order: A neural network model of sequence learning and recall. In: Dale, R., et al. (ed.) Current Research In Natural Language Generation, pp. 287–319. Academic Press, San Diego (1990) 24. Bullock, D., Rhodes, B.: Competitive queuing for serial planning and performance. In: Arbib, M. (ed.) The Handbook of Brain Theory and Neural Networks, pp. 241–244. MIT Press, Cambridge (2003) 25. Elman, J.: Language processing. In: Arbib, M. (ed.) The Handbook Of Brain Theory And Neural Networks, pp. 
508–512. MIT Press, Cambridge (1995) 26. Dominey, P.F.: Influences of temporal organization on sequence learning and transfer. J. Exp. Psychol. Learn. Mem. Cogn. 24, 234–248 (1998) 27. Elman, J.L.: Finding structure in time. Cogn. Sci. 14, 179–211 (1990) 28. Bullock, D.: Adaptive neural models of queuing and timing in fluent action. Trends Cogn. Sci. 8(9), 426–433 (2004) 29. Cleland, T.A., Linster, C.: Computation in the olfactory system. Chem. Senses 30, 801–813 (2005) 30. Liljenström, H.: Modeling the dynamics of olfactory cortex using simplified network units and realistic architecture. Int. J. Neural Syst. 2, 1–15 (1991) 31. Li, Z., Hertz, J.: Odour recognition and segmentation by a model olfactory bulb and cortex. Network: Comput. Neural Syst. 11, 83–102 (2000) 32. Freeman, W.J.: Mass action in the Nervous System. Academic Press, New York (1975)
33. Skarda, C.A., Freeman, W.J.: How brains make chaos to make sense of the world. Behav. Brain Sci. 10, 161–195 (1987) 34. Derby, C.D.: Learning from spiny lobsters about chemosensory coding of mixtures. Physiol. Behav. 69, 203–209 (2000) 35. Gelperin, A.: Oscillatory dynamics and information processing in olfactory systems. J. Exp. Biol. 202, 1855–1864 (1999)
Chapter 17
Self-conscious Robotic System Design Process—From Analysis to Implementation Antonio Chella, Massimo Cossentino, and Valeria Seidita
Abstract Developing robotic systems endowed with self-conscious capabilities means realizing complex sub-systems needing ad-hoc software engineering techniques for their modelling, analysis and implementation. In this chapter the whole process (from analysis to implementation) to model the development of self-conscious robotic systems is presented, and the newly created design process, PASSIC, which supports each part of it, is fully illustrated.
17.1 Introduction One of the most important topics in current robotics research is to provide a robotic system with self-conscious abilities. Our work starts from the hypothesis, also endorsed by several studies in the fields of neuroscience, psychology and philosophy, that basic conscious behaviour can be modelled and implemented by means of a continuous loop between the activity in the brain and the events perceived in the outer world (see [6]). The perception loop realizes a continuous interaction with the external environment by continuously comparing the expected behaviour with the real one. In a real robotic system there may be different perception loops concurrently in action, each of them related to different sensor modalities or considering different parameters and aspects of the same sensor modality. Higher order perceptions make the robot able to reflect on itself, in the sense that the higher order loops allow the robot to make inferences about its acting in the scene. We argue that higher order perception loops are responsible for the robot's self-consciousness. Implementing generalized higher order perception loops in a robotic system is a hard issue. We are investigating how to cope with modelling and engineering these robotic systems. The literature nowadays proposes several different software engineering techniques to develop complex robotic systems. For example, in the past the agent paradigm [1, 10, 18] has proved to be successful in developing robotic applications by considering the robotic system as a collection of agents, each of them responsible for a specific functionality. In this context the PASSI (Process for Agent Societies Specification and Implementation) [13] design process provides a means to develop multi-agent systems used within different kinds of application domains, for instance software for embedded robotics and agent-based information systems. In the presented work our aim is to model the development of a self-conscious robotic system in its entirety, and to adopt proper software engineering techniques to conceive its parts in order to obtain a multi-agent system where each agent (or a set of agents) is committed to managing the different
order of perception loops. Agents' peculiarities and characteristics such as autonomy, proactivity and situatedness make a multi-agent system suitable for implementing such systems. In the past we developed and experimented with an approach for the creation of ad-hoc agent design processes following the (Situational) Method Engineering paradigm [16, 31]; this approach is based on the use of a metamodel describing the set of elements to be instantiated during the system development. The results of the experiments realized in the past include agent design processes like Agile PASSI [7] and PASSIG [17, 30]. Both were developed starting from PASSI by extracting the essential characteristics to be applied to specific application contexts. The former is an agent-oriented design process developed taking into account what the Agile Manifesto [24] prescribes and was conceived to be used as the agile development tool for robotic systems. The latter is an evolution of PASSI created ad hoc and used to perform a goal-oriented analysis of the problem domain. The latter, together with PASSI2, the main evolution of PASSI, presents features we found useful to integrate into a new agent design process to develop self-conscious robotic systems. Therefore, the work presented in this chapter is based on the extension of the PASSI (Process for Agent Society Specification and Implementation) process and metamodel in order to include the activities and elements needed for the construction of self-conscious robotic systems. In [9] and [8] the metaphor of test has been used to develop and implement the reflective part of a robotic system. That work resulted in two different design activities, integrated in this work as a new process (PASSIC), built on the two previous PASSI evolutions (PASSI2 and PASSIG). An overview of the previous work is given in the remainder of this chapter, and the PASSIC design process is illustrated together with the whole self-conscious robotic system development process it is part of. The chapter is organized as follows: Sect. 17.2 gives some hints about the theoretical background of the presented work. In Sect. 17.3 the whole self-conscious system development process is shown and in Sect. 17.4 its central point, namely the PASSIC design process, is detailed. Finally, in Sect. 17.5 some conclusions are drawn.
17.2 Theoretical Background The aim of our work is creating multi-agent software systems able to control a robot using different perception loops. This section gives an overview of the key elements of our research: the perception loop, PASSI as the extended agent-based design process, and the techniques used to create ad-hoc design processes.
17.2.1 The Robot Perception Loop The robot perception loop described in [5, 12] (see Fig. 17.1) is composed of three parts: the perception system, the sensor and the comparative component. Through the proprioceptive sensors the perception system receives a set of data regarding the robot, such as its position, speed and other information. These data are used by the perception system to generate the anticipation of the scenes and are mapped on the effective scene the robot perceives, thus generating the robot's prediction about the relevant events around it. As can be seen in the figure there is a loop between perception and anticipation, so each time some part of a perceived scene, in what is called the current situation, matches the anticipated one, the anticipation of other parts of the same scene can be generated. According to [22, 25, 28] the perception loop realizes a loop among "brain, body and environment".
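Only as an illustrative caricature (none of these class, method or parameter names come from the chapter, and the scene representation is deliberately left abstract), the loop of Fig. 17.1 can be rendered as a predict-and-compare cycle.

    class PerceptionLoop:
        # sketch of Fig. 17.1: an anticipation is generated from proprioceptive data,
        # compared with the perceived scene, and a good match licenses anticipating more
        def __init__(self, anticipate, compare, tolerance=0.1):
            self.anticipate = anticipate      # proprioceptive data -> anticipated scene
            self.compare = compare            # (anticipated, perceived) -> mismatch score
            self.tolerance = tolerance

        def step(self, proprioceptive_data, perceived_scene):
            anticipated = self.anticipate(proprioceptive_data)
            mismatch = self.compare(anticipated, perceived_scene)
            if mismatch <= self.tolerance:
                return "extend-anticipation", anticipated   # current situation matches the expectation
            return "revise", perceived_scene                # mismatch: expectations must be updated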
Fig. 17.1 The perception loop
The generalized perception loop was tested and implemented on Cicerobot, an indoor robot offering guided tours in the Archaeological Museum of Agrigento [12], and on Robotanic, an outdoor robot offering guided tours in the Botanical Garden of the University of Palermo [2]. By implementing the perception loop the robot is endowed with the ability to sense (to perceive) the world around it; besides, it is argued in [6, 11] that in a real operating robot there can be different perception loops contemporaneously in action, thus realizing robot self-consciousness, the robot's inner world perception. Each of them is applied to different abilities of sensing and reacting to external stimuli. All of them can be managed at a higher level, letting the lower order loops perceive the environment and the higher order loops perceive the self, thus providing the robot with wide autonomous control over its own capabilities, actions and behaviours.
17.2.2 The PASSI Design Process PASSI (Process for Agent Society Specification and Implementation) [13] is a design process developed several years ago in our laboratory. It is devoted to modelling and implementing different kinds of multi-agent software systems, mainly by exploiting the possibility it offers to decompose the system requirements into functionalities that can be assigned to a set of agents, where each one can interact with another by exchanging knowledge about the environment they live in. PASSI has mainly been used to develop robotic systems. During the last years we have also experimented with the possibility of creating a process framework, whose core is PASSI, to develop a wide range of kinds of agent software (see the following site for a more detailed overview: http://www.pa.icar.cnr.it/passi/PassiExtension/extensionsIndex.html). The main PASSI phases and its lifecycle are illustrated in the following sections.
17.2.2.1 The PASSI Lifecycle The PASSI process covers all the phases from requirements analysis to deployment configuration, coding, and testing. PASSI has been designed to develop systems in the areas of robotics, workflow management, and information systems. Designers involved in the design process are supposed to have experience with object-oriented design, with processes like UP [23] and with concepts like functionality-oriented requirement analysis. PASSI mainly uses models from object-oriented software engineering and UML notation to obtain the artefacts resulting from the activities it is composed of. Figure 17.2 (see footnote 1) shows a high level decomposition of PASSI where each phase is decomposed into activities (and then into tasks) resulting in the production of one artefact (see footnote 2).

Footnote 1: The notation used in this diagram is the one proposed by the SPEM 2.0 (Software and Systems Process Engineering Metamodel) specification [26].

Footnote 2: For a detailed description of the PASSI design process refer to [13] and http://www.pa.icar.cnr.it/passi/.
Fig. 17.2 Phases of the PASSI design process
1. The System Requirements phase is devoted to producing a model of the system requirements that can be committed to agents. The activities involved in this phase are: Domain Description, Agent Identification, Role Identification and Task Specification.
2. The Agent Society phase aims at modelling the agent society's knowledge and the communications the agents take part in; it also produces models describing the structure of the roles played by agents and the protocols used to communicate. The activities involved are: Domain Ontology Description, Communication Ontology Description, Role Description and Protocol Description.
3. The Agent Implementation phase deals with the solution architecture from both the single-agent and the multi-agent points of view. The activities it is composed of are: Multi-Agent Structure Definition, Multi-Agent Behavior Description, Single-Agent Structure Definition and Single-Agent Behavior Description.
4. The Code phase provides a model of the solution at code level. It is largely supported by pattern reuse and automatic code generation. The activities are: Code Reuse and Code Completion.
5. The Deployment phase describes the distribution of the system's parts across hardware processing units, and the allocation of agents.
Several extensions to PASSI have been developed for specific application contexts. The work presented in this chapter starts from two of them: PASSI2 and PASSIG. The former was the natural evolution of PASSI after a few years of experience with it, whereas the latter results from modifying PASSI to support goal-oriented analysis (for more details see [17] and http://www.pa.icar.cnr.it/passi/PassiExtension/extensionsIndex.html). One of the most important features of PASSI2 is the possibility of identifying, early in the analysis phase, the structural description of the identified agents. PASSIG was used because it provides the means to perform a goal-oriented analysis of the features the system has to accomplish, and we found it principally useful for the identification and description of the goals the robot has to pursue.
17.2.3 Agent Oriented Situational Method Engineering

Developing a multi-agent system with an existing design process always requires a great effort to learn and use that process. It is now widely recognized that there is no single standard design process (or methodology, or method) suited to developing every kind of system and solving every kind of problem.
Fig. 17.3 The proposed self-conscious system development process
Therefore there is a need for techniques and tools that allow a designer to develop an ad hoc design process before using it. In order to solve this problem, and to give developers a means to build an agent system using the "right" design process, we adopted and extended the Situational Method Engineering (SME) approach [4, 20, 27, 32] by creating techniques and tools [14, 15, 31] that allow design processes to be created for developing and implementing any class of systems. SME is mainly based on the concept of "reuse": whenever the method engineer, the person in charge of creating and developing methodologies, wants to create his own methodology, he has to reuse portions of existing design processes that have already been used and tested, much in the same way a software designer reuses components when developing software. The SME root element, often called method fragment, chunk, process fragment or simply fragment, is generally extracted from an existing design process; this extraction, as well as the fragment's description, requires the use of appropriate techniques. Once extracted, the fragment is stored in a repository from which it can be selected whenever necessary and assembled with other fragments to form a complete, new design process.
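As a rough illustration of the reuse idea behind SME, a fragment repository and a naive assembly step might be sketched as follows. The structures and names (ProcessFragment, FragmentRepository, assemble) are hypothetical and are not the tools described in [14, 15, 31]:

# Illustrative sketch of Situational Method Engineering reuse:
# fragments extracted from existing processes are stored, selected and
# assembled into a new design process. Names are hypothetical.

from dataclasses import dataclass
from typing import List

@dataclass
class ProcessFragment:
    name: str            # e.g. "Domain Description"
    source_process: str  # the design process it was extracted from
    inputs: List[str]    # artefacts the fragment consumes
    outputs: List[str]   # artefacts the fragment produces

class FragmentRepository:
    def __init__(self) -> None:
        self._fragments: List[ProcessFragment] = []

    def store(self, fragment: ProcessFragment) -> None:
        self._fragments.append(fragment)

    def select(self, needed_outputs: List[str]) -> List[ProcessFragment]:
        """Pick fragments whose outputs cover the artefacts required by the
        new, situation-specific design process."""
        return [f for f in self._fragments
                if any(o in needed_outputs for o in f.outputs)]

def assemble(fragments: List[ProcessFragment]) -> List[str]:
    """Naively order fragments so that fragments with fewer inputs come first."""
    return [f.name for f in sorted(fragments, key=lambda f: len(f.inputs))]

# Usage: reuse PASSI and UP fragments to compose a new process.
repo = FragmentRepository()
repo.store(ProcessFragment("Domain Description", "PASSI", [], ["requirements model"]))
repo.store(ProcessFragment("Test Plan and Design", "UP", ["requirements model"], ["test plan"]))
print(assemble(repo.select(["requirements model", "test plan"])))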
17.3 The Proposed Development Process for Self-Conscious Systems

The perception loop forms the basis of the development and implementation process for self-conscious behaviour in a robotic system, as it provides the starting point that allows the system to activate the appropriate behaviours springing from the mismatch between the expected situation and the actually perceived one while pursuing a goal. In our approach, the robot can (dynamically) tune some of the mission execution parameters, decide to adopt another behaviour, or save a successful behaviour in a repository of cases for future reuse. We consider robotic systems that, in the same way biological systems do, are endowed with a set of innate capabilities; these capabilities find their practical realization in the set of activities (i.e. tasks) the robot is able to perform with a precise parameter configuration; each set of parameters allows the robot to reach one specific goal. Figure 17.3 shows the complete development process used to develop self-conscious systems, depicting the three areas the designer has to deal with while implementing such systems: (i) Problem, (ii) Design and Configuration, and (iii) Execution.
17.3.1 The Three Development Areas

The Problem area comprises all the activities devoted to eliciting system requirements and to identifying the mission the robot has to perform in order to reach its goals. During these activities the designer consults a database where the set of abilities the system possesses is stored (the Cases); the proposed development process is applied to systems owning pre-determined abilities. In more detail, the process considers two different archives: Cases and Configurations. A Case is composed of the goal description, the set of actions performed in order to reach it (a plan), pre-conditions, and the list of parameters needed for successfully applying the plan (only their names, not specific values). A Configuration is a specific set of parameter values that has proved successful for instantiating one specific case. It also includes the number of positive outcomes this configuration produced in pursuing the case goal.

The Design and Configuration area deals with the definition of the robotic system that will accomplish the required mission while fulfilling the requirement constraints. After the design has been completed, the system has to be configured in order to obtain optimal performance. The first activity in this area is the Design activity. This corresponds to the usual application of a system design process: the designer defines a software solution that could accomplish the required mission. This activity corresponds to the application of the PASSIC design process (see Sect. 17.4). The process starts with the inputs collected during the previous phase and, according to them, aims at defining two fundamental deliverables: the design of the robotic system to be built, and the design of the perception test that will drive the robot's behavioural choices. The latter artefact also includes the specification of the rules that will be used to tune system parameters when the executed behaviour does not match the anticipation.

Once the system is designed, one case has to be selected from the Cases database. This case is used both to produce the anticipated behaviour and to start the mission execution. Case selection is done on the basis of the goal(s) to be pursued. It may happen that the pursued goal cannot be satisfied by any of the cases in the database; this situation is solved by creating a new case (usually by reusing and composing existing cases). In the current implementation of the system, new cases are created by randomly selecting existing ones; we plan to adopt a more rigorous and smarter approach in the future. Usually, cases are described in terms of parameters that deeply affect the expected outcome. Such a set of parameter values defines a configuration. In other words, a configuration is a record reporting instantiation data for a case in the database, together with a score counting the successful applications of the case (with that configuration) versus the total applications of it (with that configuration). If the results obtained by applying the selected configuration are not correct (this check is performed after the Perception Test Execution), a new configuration can be tried (either by selecting a new set of parameter values or by selecting a new case). If the results of the perception test are satisfactory, the new configuration is saved in the Configurations database as a successful one.
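The two archives can be pictured with simple record types. The following is only an illustrative sketch; the field names are ours and do not reflect the system's actual schema:

# Illustrative records for the Cases and Configurations archives.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Case:
    goal: str                    # description of the goal to be reached
    plan: List[str]              # actions performed to reach the goal
    preconditions: List[str]     # conditions required before applying the plan
    parameter_names: List[str]   # names only; values belong to a Configuration

@dataclass
class Configuration:
    case_goal: str               # the case this configuration instantiates
    values: Dict[str, float]     # one value per parameter name
    successes: int = 0           # positive outcomes with this configuration
    attempts: int = 0            # total applications of the case with it

    def score(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

# Example entries.
reach_o = Case(goal="reach object O",
               plan=["go from A to B"],
               preconditions=["path A-B is free"],
               parameter_names=["speed", "tolerance"])
cfg = Configuration(case_goal=reach_o.goal,
                    values={"speed": 0.4, "tolerance": 0.05},
                    successes=3, attempts=4)
print(cfg.score())   # 0.75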
The perception test is performed within the activities of the Execution area, during which the running system performs the Anticipation Generation and executes the mission. After a case has been selected, a part of the system generates the anticipations about the mission to be performed using the case itself. For instance, if the goal is "reaching object O", the plan might be "go from point A to point B" and the corresponding expectation is "the robot position at the end of the plan execution is (x, y)". Once the anticipation is produced, the robot starts the execution of its mission. Referring to the previous example, it moves and continuously compares its real behaviour, with all the parameters involved (for instance wheel positions, proprioceptive sensor readings and so on), against the anticipated case by means of the Perception Test Execution. If it finds differences, it activates the tuning phase by changing the initial configuration. If the observed behaviour perfectly matches the anticipated one, then the configuration used has been successful and it can be saved in the database for future reuse.
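The execution cycle just described, i.e. generate the anticipation from the selected case, run the mission, compare observed and anticipated behaviour, and either tune the configuration or store it as successful, can be sketched schematically. Everything below is a hypothetical simplification (a hard-coded goal position, a stand-in mission executor, a naive tuning rule), not the actual robot code:

# Schematic sketch of the Execution area cycle (hypothetical simplification):
# select a case, generate the anticipation, execute, compare, tune or save.
import random
from typing import Dict, Tuple

def anticipate_position(case: Dict) -> Tuple[float, float]:
    """Anticipation Generation: expected position (x, y) after the plan."""
    return case["goal_position"]          # e.g. the coordinates of point B

def execute_mission(config: Dict[str, float]) -> Tuple[float, float]:
    """Stand-in for the real mission; returns the observed final position."""
    gx, gy = (1.0, 1.0)                   # toy environment: point B at (1, 1)
    noise = 0.3 * (1.0 - config["speed"])
    return (gx + random.uniform(-noise, noise), gy)

def perception_test(observed, anticipated, tolerance: float) -> bool:
    """Perception Test Execution: do observed and anticipated behaviour match?"""
    return all(abs(o - a) <= tolerance for o, a in zip(observed, anticipated))

def pursue(case: Dict, config: Dict[str, float], saved_configs: list, tries: int = 5):
    for _ in range(tries):
        anticipated = anticipate_position(case)
        observed = execute_mission(config)
        if perception_test(observed, anticipated, case["tolerance"]):
            saved_configs.append(dict(config))   # store successful configuration
            return config
        config["speed"] = min(1.0, config["speed"] + 0.1)   # tuning phase
    return None                                   # try another case/configuration

saved: list = []
case = {"goal": "reach object O", "plan": ["go from A to B"],
        "goal_position": (1.0, 1.0), "tolerance": 0.1}
print(pursue(case, {"speed": 0.4}, saved))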
17.4 The PASSIC Design Process

PASSIC is the design process created by extending PASSI in order to develop and implement self-conscious behaviour in robotic systems; it provides design activities for each portion of the system presented in the previous section.
17.4.1 The Definition of the PASSIC Design Process

In [8, 9] an experiment concerning the creation of a design methodology and a model of the perception loop was presented. The process for creating the new methodology follows the Situational Method Engineering paradigm [4, 21, 27], and extends and modifies the PASSI [13], PASSI2 [17] and PASSIG [30] processes developed by the authors in recent years, also exploiting process fragments coming from Tropos [3, 19]. As already said in Sect. 17.2.3, Situational Method Engineering is the discipline, developed in the field of information systems, aimed at creating, exploiting and evaluating techniques, methods and tools for the creation of design processes to be used in a specific application context. The SME paradigm has been extended to the agent field and a well-defined approach for the creation of agent design processes has been developed [31]; this approach is called PRoDe (Process for the Design of Design Processes) and its main element is the so-called process fragment [14]. The whole process is composed of three main phases: process requirements, fragment selection and fragment assembly. The first concerns the requirements analysis of the design process under construction; the second the selection of the right process fragments from the repository, which are then assembled in the following phase. PRoDe mainly exploits the multi-agent system (MAS) metamodel to perform the tasks within the selection and assembly phases. The MAS metamodel contains all the elements to be designed in order to develop a specific system following one specific design process. For instance, the PASSI MAS metamodel contains elements such as agent, role and task: an agent plays some roles in order to reach an objective and has some capabilities in the form of tasks it is able to perform; each of these elements has to be designed in at least one activity of the design process. In PRoDe the MAS metamodel is the result of the process requirements phase and it is used as the basis for the selection and assembly of fragments.

An extended analysis and description of the set of requirements leading to the creation of PASSIC is reported in [9, 29]; this analysis resulted in the definition of the metamodel for the perception loop (see [8] for further details), in which the elements of the perception loop were identified and reflected onto a robotic system. Briefly, some central elements of the metamodel are: the robot, which has the responsibility of pursuing one or more goals, composed of plans and actions, i.e. physical or communicative acts between the robot and external objects that result in changes of the surrounding environment. The robot also has capabilities in the form of test, simulated act and log elements that implement the robot's inner and outer reflections (i.e., the perception loop). In order to cope with the aforementioned elements of the conscious metamodel, two process fragments coming from the Unified Process (UP) [23] (Test Plan and Design, and Test Execution) have been reused, modified and integrated; the former aims at identifying the system functionalities to be tested, the available system resources and the test objective in order to design the Anticipation Generation, while the latter aims at defining the Execution Test in order to identify defects and analyze the results, also by defining criteria for evaluating perception test results.
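To make the metamodel idea more concrete, the elements mentioned above can be pictured as a handful of related types. This is only a reading aid under our own simplifications; the actual PASSI/PASSIC metamodels are defined in [8, 13]:

# Illustrative sketch of a few MAS metamodel elements (agent, role, task)
# extended with perception-loop elements (goal, plan, action, log).
# This is not the actual PASSI/PASSIC metamodel, only a reading aid.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Action:
    name: str                      # physical or communicative act

@dataclass
class Plan:
    actions: List[Action]

@dataclass
class Goal:
    name: str
    plan: Plan

@dataclass
class Task:
    name: str                      # a capability the agent can perform

@dataclass
class Role:
    name: str
    tasks: List[Task]

@dataclass
class Agent:
    name: str
    roles: List[Role]              # roles played in order to reach objectives
    goals: List[Goal] = field(default_factory=list)
    log: List[str] = field(default_factory=list)   # supports inner/outer reflection

# Example: a robot agent playing a "guide" role.
guide = Role("guide", [Task("plan_route"), Task("describe_exhibit")])
robot = Agent("Cicerobot", [guide], [Goal("reach object O", Plan([Action("go A->B")]))])
print([t.name for r in robot.roles for t in r.tasks])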
Fig. 17.4 The PASSIC design process—phases
17.4.2 The PASSIC Process Lifecycle

PASSIC includes three phases arranged in an iterative/incremental process model (see Fig. 17.4):
• System Requirements: covers all the activities related to a goal-oriented requirements analysis and to the identification of agents and roles.
• Agent Society: where all the aspects of the agent society are addressed.
• Implementation: a view of the system's architecture in terms of classes and methods describing the structure and behavior of single agents, the reusable code and source code for the target system, how the agents are deployed, and which constraints are defined/identified for their migration and mobility.
Each phase produces a document that is usually composed by aggregating the UML models and work products produced during the related activities. Moreover, each phase is composed of one or more subphases responsible for designing or refining one or more artefacts that are part of the corresponding model. The details of each phase are discussed in the following subsections.
17.4.2.1 The System Requirements Phase

The System Requirements phase aims at analyzing the problem domain through a goal-oriented analysis in order to produce the model of the system in terms of agency: the set of actors involved in the system under construction and the related goals (Fig. 17.5). Developing this phase involves eight activities:
1. Domain Description provides a means for analyzing the problem statement, that is, the description of the problem to be faced, in order to identify the actors involved in the system and their goals; an actor is an intentional entity, external or internal, that has a strategic interest, i.e. a goal.
2. Domain Analysis aims at identifying the tasks each actor has to perform in order to pursue a goal, and at applying means-end analysis in order to relate each task to (at least) one goal. A task is a specific set of actions performed in order to pursue a goal or a sub-goal.
3. Identify System, where the System-to-be actor is identified. The System-to-be actor represents the system under construction together with its dependencies on all the other actors in the environment.
4. Agent Structure Exploration, where an analysis-level description of the agent structure is produced in terms of the tasks required for accomplishing the agent's functionalities.
5. Describe Environment produces the system's actors and the goals that can be assigned to the System-to-be actor, hence identifying the dependencies between the System actor and all other actors.
Fig. 17.5 The activities of the system requirements phase
6. Identify Architecture, for decomposing the System-to-be into sub-actors, to which goals are assigned, and for identifying agents. Generally each sub-actor can be mapped onto an agent.
7. Define Agent Society aims at identifying a set of capabilities for each agent in order to establish which plans they have to follow.
8. Roles Identification provides a means for identifying the roles each agent plays and the dependencies among agents. A role represents the social behaviour of an agent.
17.4.2.2 The Agent Society Phase

The Agent Society phase introduces an agent-oriented solution for the problem described in the previous phase. This phase presents an ontological description of the domain where the agents will live and of their communications. Next, agents are described in terms of the roles they play, the services provided by those roles, resource dependencies and, finally, their structure and behaviors. Once an agent solution has been identified, the autonomous part of the system, devoted to creating the expectations about the results of plan application, and the related configuration management are designed. Developing this phase involves eight activities (Fig. 17.6):
1. Domain Ontology Description aims at identifying and describing the ontological elements the system will deal with, in order to define the pieces of knowledge of each agent and their communication ontology. The domain categories are: concepts, actions that could affect their state, and propositions about the values of categories.
2. Communication Ontology Description, for describing agents' communications in terms of the previously determined ontology, interaction protocols and message content language.
3. Perception Test Planning and Design, where the anticipation is produced starting from the agent society architecture, the knowledge about the environment and the requirements. The set of tasks each agent has to pursue is modeled from a structural point of view. The purpose is to describe the robot's actions while interacting with the environment.
4. Role Description aims at modeling the whole lifecycle of each agent, formalized by the distinct roles played, the tasks involved in those roles, communication capabilities and inter-agent dependencies in terms of services.
Fig. 17.6 The activities of the agent society phase
5. Multi-Agent Structure Definition (MASD) describes the structure of the solution agent classes at the social level of abstraction.
6. Multi-Agent Behavior Description describes the behavior of individual agents at the social level of abstraction.
7. Perception Test Execution aims at designing the portion of the system devoted to producing the results of the comparison between the observed and the expected robot/system behavior, together with the criteria to evaluate them.
8. Configuration Management designs the rules for tuning the system parameters. This activity is obviously strictly related to the specific robotic platform used to deploy the designed multi-agent system.
17.4.2.3 The Implementation Phase

The Implementation phase results in the model of the solution architecture in terms of classes, methods, deployment configuration, code and testing directives. In this phase, the agent society defined in the previous models and phases is seen as a specification for the implementation of a set of agents that must now be designed at the implementation level of detail, then coded, deployed and finally tested. The Implementation phase is composed of seven activities (Fig. 17.7):
1. Single-Agent Structure Definition describes the structure of the solution agent classes at the implementation level of abstraction.
2. Single-Agent Behavior Description describes the behavior of individual agents at the implementation level of abstraction.
3. Deployment Configuration describes the allocation of agents to the available processing units and any constraints on migration, mobility and the configuration of hosts and agent-running platforms.
4. Code Reuse uses a library of patterns with associated reusable code in order to allow the automatic generation of significant portions of code.
5. Code Completion, where the source code of the target system is manually completed.
6. Agent Test is devoted to verifying the behavior of a single agent with respect to the original requirements of the system addressed by that specific agent.
Fig. 17.7 The activities of the implementation phase
7. Society Test, where the correct interaction of the agents is validated in order to verify that they actually cooperate in solving problems that require cooperation.
More details about the PASSIC phases and activities, and about how it was created starting from its metamodel by using the PRoDe approach, can be found in [8, 9, 29]; [29] also describes the experiment carried out to test the usability of PASSIC.
17.5 Conclusion

In the past the authors developed several agent-oriented design processes enabling the design of systems working in different application contexts, mainly exploiting the fact that agent orientation can be used as a design paradigm. The work presented here focuses on the creation of a complete process for the development of a self-conscious robotic system and starts from the hypothesis that self-consciousness in a robot may be reached by means of perception loops of different orders. Each loop can be managed by an agent, or by a society of agents. The experience gained in recent years in the creation of ad hoc design processes allowed the identification and analysis of the requirements for the creation of a design process realizing such a system by following a perception-driven approach; the continuous loop between perceived events and activities in the brain is the core of the self-conscious behaviour we want to emulate in a robotic system. The result was the extension of the PASSI design process, by integrating it with new techniques for designing the robot perception loop, thus creating PASSIC, whose activities are fully described in this chapter. PASSIC contains all the activities needed for the complete development of a self-conscious robotic system and allows the perception loop to be designed and implemented, thus making a robotic system able to move in a dynamic environment by continuously detecting the differences between the expected and the real behaviour, tuning its parameters, and learning successfully experienced behaviours for later reuse in novel situations.
Acknowledgements
This work has been partially supported by the EU project FP7-Humanobs.
References
1. Alami, R., Chatila, R., Fleury, S., Ghallab, M., Ingrand, F.: An architecture for autonomy. Int. J. Robot. Res. 17(4), 315 (1998)
2. Barone, R., Macaluso, I., Riano, L., Chella, A.: A brain inspired architecture for an outdoor robot guide. In: Samsonovich, A. (ed.) Proc. of AAAI Fall Symposium on Biologically Inspired Cognitive Architectures BICA'08. AAAI Press, Menlo Park (2008)
3. Bresciani, P., Giorgini, P., Giunchiglia, F., Mylopoulos, J., Perini, A.: Tropos: An agent-oriented software development methodology. Auton. Agents Multi-Agent Syst. 3, 203–236 (2004)
4. Brinkkemper, S., Lyytinen, K., Welke, R.: Method engineering: Principles of method construction and tool support. Int. Federat. Inform. Proc. 65, 65 (1996)
5. Chella, A.: Towards robot conscious perception. In: Chella, A., Manzotti, R. (eds.) Artificial Consciousness. Imprint Academic, Exeter (2007)
6. Chella, A.: A robot architecture based on higher order perception loop. In: Hussain, A. (ed.) Brain Inspired Cognitive Systems 2008. Springer, Berlin (2009)
7. Chella, A., Cossentino, M., Sabatucci, L., Seidita, V.: Agile PASSI: An agile process for designing agents. J. Comput. Syst. Sci. Int. 21(2), 133–144 (2006). Special issue on Software Engineering for Multi-Agent Systems
8. Chella, A., Cossentino, M., Seidita, V.: Towards a methodology for designing artificial conscious robotic system. In: Samsonovich, A. (ed.) Proc. of AAAI Fall Symposium on Biologically Inspired Cognitive Architectures BICA '09. AAAI Press, Menlo Park (2009)
9. Chella, A., Cossentino, M., Seidita, V.: Towards the adoption of a perception-driven perspective in the design of complex robotic systems. In: Proc. of the 10th Workshop on Objects and Agents (WOA09) (2009)
10. Chella, A., Frixione, M., Gaglio, S.: An architecture for autonomous agents exploiting conceptual representations. Robot. Auton. Syst. 25, 231–240 (1998)
11. Chella, A., Macaluso, I.: Higher order robot perception loop. In: BICS 2008 Brain Inspired Cognitive Systems, June 24–27, 2008. Springer, Berlin (2008)
12. Chella, A., Macaluso, I.: The perception loop in Cicerobot, a museum guide robot. Neurocomputing 72, 760–766 (2009)
13. Cossentino, M.: From requirements to code with the PASSI methodology. In: Henderson-Sellers, B., Giorgini, P. (eds.) Agent Oriented Methodologies, pp. 79–106. Idea Group Publishing, Hershey (2005). Chap. IV. http://www.idea-group.com/books/details.asp?id=4931
14. Cossentino, M., Gaglio, S., Garro, A., Seidita, V.: Method fragments for agent design methodologies: from standardisation to research. Int. J. Agent-Oriented Softw. Eng. 1(1), 91–121 (2007)
15. Cossentino, M., Galland, S., Gaglio, S., Gaud, N., Hilaire, V., Koukam, A., Seidita, V.: A MAS metamodel-driven approach to process composition. In: Proc. of the Ninth International Workshop on Agent-Oriented Software Engineering (AOSE-2008) at the Seventh International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2008) (2008)
16. Cossentino, M., Seidita, V.: Composition of a new process to meet agile needs using method engineering. Softw. Eng. Large Multi-Agent Syst. 3, 36–51 (2004)
17. Cossentino, M., Seidita, V.: PASSI2—going towards maturity of the PASSI process. Technical Report ICAR-CNR (09-02) (2009)
18. Dominguez-Brito, A., Hernandez-Sosa, D., Isern-Gonzalez, J., Cabrera-Gamez, J.: Integrating robotics software. In: Proceedings of ICRA'04, 2004 IEEE International Conference on Robotics and Automation, vol. 4 (2004)
19. Giorgini, P., Kolp, M., Mylopoulos, J., Castro, J.: Tropos: a requirements-driven methodology for agent-oriented software. In: Agent Oriented Methodologies, pp. 20–45. Idea Group Publishing, Hershey (2005). Chap. II. http://www.idea-group.com/books/details.asp?id=4931
20. Harmsen, A.F., Brinkkemper, S., Oei, H.: Situational method engineering for information system projects. In: Methods and Associated Tools for the Information Systems Life Cycle, Proceedings of the IFIP WG8.1 Working Conference CRIS'94, pp. 169–194 (1994)
21. Harmsen, A.F., Ernst, M., Twente, U.: Situational Method Engineering. Moret Ernst & Young Management Consultants (1997)
22. Hurley, S.J.: Varieties of externalism. In: Menary, R. (ed.) The Extended Mind. Ashgate (in press)
23. Jacobson, I., Booch, G., Rumbaugh, J.: The Unified Software Development Process. Addison-Wesley, Reading. ISBN 0-201-57169-2
24. Manifesto for Agile Software Development: http://www.agilemanifesto.org
25. Manzotti, R., Tagliasco, V.: An externalist process-oriented framework for artificial consciousness. In: Chella, A., Manzotti, R. (eds.) AI and Consciousness: Theoretical Foundations and Current Approaches. AAAI Press, Menlo Park (2007)
26. OMG Object Management Group: Software Process Engineering Metamodel. Version 2.0. Final Adopted Specification ptc/07-03-03. March 2007
27. Ralyté, J.: Towards situational methods for information systems development: engineering reusable method chunks. In: Proc. 13th Int. Conf. on Information Systems Development. Advances in Theory, Practice and Education, pp. 271–282 (2004)
28. Rockwell, W.T.: Neither Brain nor Ghost. MIT Press, Cambridge (2005)
29. Seidita, V., Cossentino, M.: From modeling to implementing the perception loop in self-conscious systems. Int. J. Mach. Conscious. 2(2), 289–306 (2010)
30. Seidita, V., Cossentino, M., Gaglio, S.: Adapting PASSI to support a goal oriented approach: a situational method engineering experiment. In: EUMAS'07 The Fifth European Workshop on Multi-Agent Systems. Hammamet, Tunisia, 13–14 December 2007 (2007)
31. Seidita, V., Cossentino, M., Hilaire, V., Gaud, N., Galland, S., Koukam, A., Gaglio, S.: The metamodel: a starting point for design processes construction. Int. J. Softw. Eng. Knowl. Eng. 20(4), 575–608 (2010). doi:10.1142/S0218194010004785
32. ter Hofstede, A.H.M., Verhoef, T.F.: On the feasibility of situational method engineering. Inf. Syst. 22(6/7), 401–422 (1997)
Chapter 18
Simulating Visual Qualia in the CERA-CRANIUM Cognitive Architecture Raúl Arrabales, Agapito Ledezma, and Araceli Sanchis
Abstract The concept of qualia poses a central problem in the framework of consciousness studies. Despite it being a controversial issue even in the study of human consciousness, we argue that qualia can be complementarily studied using artificial cognitive architectures. In this work we address the problem of defining qualia in the domain of artificial systems, providing a model of “artificial qualia”. Furthermore, we partially apply the proposed model to the generation of visual qualia using the cognitive architecture CERA-CRANIUM, which is modeled after the global workspace theory of consciousness. It is our aim to define, characterize and identify artificial qualia as direct products of a simulated conscious perception process. Simple forms of the apparent motion effect are used as the basis for a preliminary experimental setting focused on the simulation and analysis of synthetic visual experience. In contrast with the study of biological brains, the inspection of the dynamics and transient inner states of the artificial cognitive architecture can be performed effectively, thus enabling the detailed analysis of covert and overt percepts generated by the system when it is confronted with specific visual stimuli. The observed states in the artificial cognitive architecture during the simulation of apparent motion effects are used to discuss the existence of possible analogous mechanisms in human cognition processes.
18.1 Introduction

Understanding what qualia are and identifying their functional role are key open problems in the scientific study of consciousness. Typically, research efforts have been concentrated in the domain of human consciousness, specifically from the point of view of philosophy of mind. However, little empirical work has been carried out in the field of Machine Consciousness (MC). This is not surprising given the elusive nature of the phenomenal aspects of consciousness and the relative immaturity of the MC research field. In this work we have adopted a Synthetic Phenomenology (SP) approach [1], which relies on the modeling and simulation of phenomenal states in artificial systems. The SP approach provides a pragmatic framework for the study of qualia using machines. We think that the specification of the contents of subjective experience in a machine, whether or not they can be claimed to be real phenomenal states, is a valuable approach for making progress in the understanding of qualia (both in natural and artificial organisms). Although empirical results obtained using machine consciousness models are not directly applicable to humans, building models and simulations of experiential states can potentially shed some light on the problem of human conscious perception. For instance, as illustrated in this work, the functional dimension of qualia can be effectively analyzed using controlled experimental settings. Additionally,
the design of the cognitive architecture can be verified to replicate the same perceptual illusions produced in humans when they are confronted with analogous stimuli. In this particular case, as the CERA-CRANIUM architecture is based on the Global Workspace Theory (GWT) [2], a plausible functional explanation of how conscious experience could be generated out of a global workspace is provided. In order to address the problem of simulating qualia in artificial cognitive architectures we need a model of artificial qualia that can be used as a conceptual reference framework. We argue that a formal definition, or at least a functional characterization, of artificial qualia is required in order to establish valid engineering principles for synthetic phenomenology. Having a model of artificial qualia could also be useful for understanding the differences between "natural qualia" and the specification of the qualia being simulated in an artificial system. Furthermore, we see this process as a continuous refinement of the artificial qualia model, which can be iteratively improved based on the feedback from simulation results.
18.2 A Model of Artificial Qualia

The proposed model of artificial qualia is based on the assumption that qualia play a crucial role in cognition and should not be considered epiphenomena. In other words, we argue that it makes no sense to distinguish between the "what it is like" and the functionality: the contents of experience and their associated functionality are actually two aspects of the very same thing. Cognitive science has shown us that human conscious perception is not directly based on the data acquired by the senses, but is heavily biased by psychological aspects such as cognitive context, the subject's history, and expectations. Moreover, context and subjective history are in turn shaped by the specific way in which stimuli are consciously perceived. Although we tend to believe that we perceive reality, the qualia generated in our brains are quite far from being a truthful representation of the real world. Nevertheless, our conscious experience of the world generally proves to be highly reliable in terms of survival and performing everyday tasks. In short, qualia are the way the world is interpreted by the subject, in a way that is advantageous for his or her goals.

This intuitive definition of conscious perception does not seem sufficient to build a complete and accurate model of a conscious machine including phenomenal aspects (see [3] for a discussion about the grand illusion of consciousness and perceptual phenomenology). In fact, there is no satisfactory and generally accepted model or theory of qualia that could be applicable to humans, other animals, or machines. In other words, the phenomenal dimension of consciousness, both in natural and artificial creatures, remains elusive to scientific study. Additionally, as pointed out by Sloman [4], we may often be discussing bogus concepts due to the use of misleading contexts and ill-defined terms. Although we are proposing a new model in this work, it is not our aim to add to the existing confusion in the field of MC, but to help clarify concepts from the point of view of engineering. As argued by Sloman [4], directing the basic questions about consciousness to machines with different designs can help to figure out what really needs to be explained.

While many Artificial Intelligence (AI) implementations cover some aspects of the broader picture of cognition, we have not yet reached the point where human-like conscious machines are possible [5]. One of the greatest challenges that still needs to be addressed is the design of computational models of qualia, i.e. models of artificial qualia. MC designers are in need of a practical definition of what an artificial conscious mind is. In this context, we assume that a complete and scientifically established definition of what artificial qualia could be does not yet exist. However, we propose to circumvent this problem by looking for alternative, interim, and partial definitions that could practically contribute to the advancement of the MC research field. This approach does not necessarily lead to better implementations in terms of performance, but the whole exercise is expected to provide more insight about what conscious machines could be like, what
design strategies are more promising, and how neuroscience could benefit from the use of computational models of qualia. As Chrisley has suggested [1], one of the aims of SP should be to characterize the phenomenal states possessed or modeled in MC implementations. In the next subsections we attempt to provide a decomposition of the artificial phenomenology problem into more tractable and recognizable steps. It is our hypothesis that the partial definitions of artificial qualia that we outline here may become useful conceptual tools in the domain of SP.
18.2.1 Decomposing the Concept of Qualia

Taking into account the complexity of the problem of understanding qualia, we have no other option than to bear with some of the well-known but controversial issues of the scientific study of consciousness. The study of qualia can be regarded as the specification of phenomenal states. We argue that a possible way to analyze qualia and tackle the problem of private first-person observations is to distinguish between different but interrelated components of such an intricate concept. Many problems in the science of consciousness are rooted in the fact that the term consciousness can be used to refer to multiple related aspects [6]. In other words, consciousness can be seen as different phenomena according to the perspective of the observer. If we distinguish between phenomenal and functional dimensions as suggested by Block [7], the process of conscious perception can be seen from two different perspectives:
– Conscious perception is the set of phenomenal experiences of which our inner life is made (the "what is it like" to have experiential mental states [8]), e.g. the redness of red.
– Conscious perception is the set of functional representations or internal models of the world adapted to our needs, which are made available for use in reasoning and action, e.g. the neuronal encoding of color in the brain (see [9] for details).
These two views (phenomenal consciousness and access consciousness, respectively) should not be considered exclusive or contradictory, but complementary aspects of the same complex process. In fact, natural cognition encompasses both dimensions: the conscious contents of our mind are both experiential mental states and functional representations accessible for explicit reasoning and action. As Haikonen has suggested [10], qualia are the direct products of the perception process, and without qualia there is no consciousness. Therefore, qualia cannot be neglected in the study of consciousness, especially if we want our MC computational models to be of any use in the quest for an understanding of human cognition. While qualia are usually associated with the first view (phenomenal consciousness), most of the work done in the domain of artificial cognitive systems is exclusively related to the second view (access consciousness). One of the reasons for this bias is the poor comprehension of the phenomenal aspects of consciousness. Another significant reason is that plenty of work still needs to be done on machines that apparently do not require qualia for successful performance. Determining why some machines might need qualia is a related question that is addressed below.
18.2.2 Circumventing the First-Person Problem

The problem with considering both the phenomenal and access views at the same time is that phenomenal consciousness is only available to the first-person observer, i.e. it is an inherently private property [11]. People usually infer the existence of inner experience in human counterparts using third-person observations combined with the similarity argument: if I feel pain when I get hurt, I infer other humans will
likely feel the same in the same situations (because they have a nervous system like mine). However, when it comes to detecting the presence of phenomenal states in machines we cannot count on the similarity argument during the inference process. Since qualia are inherently private, how could we determine whether a machine is experiencing any inner world? (see [12] for a comprehensive discussion of the problem of measuring machine consciousness). Essentially, this issue is the so-called hard problem of consciousness [13], described from the point of view of MC. We do not yet have any commonly accepted and convincing explanation of how phenomenal consciousness is produced, i.e. we lack a theory of qualia that could be translated into a computational model. Does this mean that any attempt to create artificial qualia will be futile? Or should we, instead of giving up the challenge, try to explore machine qualia as a means to shed light on the nature of consciousness? Do we really need to understand the very nature of consciousness in order to reproduce it in machines? Is there a real lack of scientific tools to address the problem of consciousness? Can we develop a model of phenomenal consciousness exclusively based on third-person approaches? Perhaps these questions cannot be answered yet; however, we believe that third-person approaches can be useful for making progress in the field of artificial cognitive systems and their application to phenomenal consciousness research. Typically, applying third-person approaches has involved looking exclusively at behaviour, including different forms of accurate report [14]. However, external observers are also able to inspect the architecture and inner machinery of a creature. The inner inspection and monitoring of biological living organisms, including humans, is much more problematic than the inspection of working implementations of artificial cognitive systems. Therefore, the analysis of the correlation between observed behaviour and the internal state of MC implementations has to be exploited, as it could provide valuable information about the models being tested (without the limitations of analogous experiments with biological creatures). We argue that, following this line of research, a limited definition of artificial qualia can be given using only third-person approaches. This partial definition might not explain phenomenal qualia as they are present in humans, but it could be used to develop computational models, and subsequent implementations, which could then be used to enhance our understanding of "natural consciousness".
18.2.3 The Function of Qualia

Understanding what the function of qualia is and why they emerged as part of biological evolution is an essential part of the challenge of the scientific study of consciousness. As usual, the research interplay between the natural and artificial sciences can be seen both ways: on one hand, a comprehensive understanding of qualia, as they manifest in biological creatures, might make it possible to build conscious machines; on the other hand, progress towards a complete understanding of qualia in biology might benefit from research on new computational models focused on phenomenology. These ideas about qualia are not free of controversy. While some argue that qualia are mere epiphenomena (e.g. [15]), we believe phenomenal consciousness appeared as an evolutionary advantage. One way to prove it would be to compare the performance of phenomenally conscious machines with that of unconscious machines, both confronted with complex tasks in unstructured environments. Given that such experiments are not realizable nowadays, we will focus both on evidence from the biological world and on current computational models. There are many features related to consciousness that we know are useful because they contribute to survival (for instance, Theory of Mind [16]). These cognitive capabilities have associated functions, and that is the reason why they have been selected by evolution. Is the same argument applicable to the phenomenal aspects of consciousness? Do they have a clear functional role?
Qualia, or subjective experience, should not be seen as an additional component of the complex notion of consciousness, but as a process that is present in relation to cognitive features. Qualia are experienced by a creature when it is able to introspect some of its perceptual processes and use that introspection to generate a meta-representation, which in turn is used to modulate the way the system works. Qualia are indeed the output of the perception process [10], which in some cases is made explicit thanks to transparent access to the perception process outcome (the sensory system's response to stimuli). In short, when we are aware of a red object in our field of view, we do not perceive the colour red, but the redness quale, which is the reaction of our perceptual system to the red colour stimulus. The role of qualia as described above can be studied in artificial systems. The generation of artificial qualia along these lines could provide insight about consciousness applicable to biological creatures. As argued by Sloman and Chrisley [17], a machine could even develop private ontologies for referring to its own private perceptual contents and states. The use of this ontology for modulating the system's processes is the major function of qualia. A system with qualia is a system with autonomous meta-management capabilities (a combination of introspection and active control based on self-monitoring). Making a serious effort to design and build such systems will contribute to the confirmation or refutation of this hypothesis about the role of qualia in biological creatures.
18.2.4 A Computational Model of Qualia

In the domain of MC research, designers have to deal with the concept of qualia in order to develop implementations that could be claimed to be conscious (or to model/simulate phenomenal consciousness). One way to mitigate the complexity involved in this task is to conceptually decompose the notion of qualia into different aspects that can be analyzed separately. It is our hypothesis that working with these partial views might provide useful clues for directing future research. We propose to use partial but complementary definitions of artificial qualia, distinguishing between three different stages or characterizations of the development of mechanisms that support qualia in machines (see Fig. 18.1):
– Stage 1. Perceptual Content Representation. At this stage the information acquired by the perceptual system of the machine is integrated and interpreted, generating a subjective representation. This content is built as a result of the combination of exteroceptive and proprioceptive sensing subsystems, hence giving rise to an inherently subjective content representation. The process that generates this perceptual content involves a continuous check for consistency. In other words, a number of possible partial reconstructions of the world compete to be integrated into the final consistent match. This match, or final inner-world reconstruction, is achieved by a coherent integration between what is being sensed from the external world and what is currently represented as the inner depiction (see Dennett's Multiple Drafts Model [18] for a more metaphorical description of these sorts of competitive/collaborative content creation processes). Note that the integration process seeking coherency in perception also involves feedback from higher states, i.e. although we provide here an independent description of each stage, the overall process of conscious perception has to be considered as an effective integration of the mechanisms described in all stages.
– Stage 2. Introspective Perceptual Meta-Representation. This stage refers to the mechanisms required for the monitoring of the processes described in Stage 1. These monitoring mechanisms involve the creation of meta-representations about how perceptual content representations are created. Observing how the machine's own perceptual content is created and manipulating the associated meta-representations are essential requirements for the potential development of a private ontology. Such an ontology would convey grounded meaning about what it is like for the machine to experience subjective perceptual contents.
Fig. 18.1 Stages in the development of artificial qualia
– Stage 3. Self-Modulation and Reportability. If a machine is able to achieve Stages 1 and 2, the meta-representations from Stage 2, or introspective ontologies [4], could be used to modulate the way all perceptual systems work (including all the mechanisms associated with Stages 1 and 2). This constitutes a self-regulation loop with clear functional implications; i.e. qualia as defined here are part of the causal process. Additionally, the introspective ontologies created in Stage 2 could be used to report the machine's artificial qualia.
The stages described above make no claims about the qualities or modalities of the specific contents of the artificial mind, or about the extent to which they could resemble human subjective experience. The presence of different sensory modalities (laser ranging, for instance) and different mechanisms for cognition will produce different conscious contents and associated qualities. The proposed stages are just the components of a conceptual framework, or guideline, for the design of MC architectures. Subsequent implementations and associated experiments are expected to clarify some aspects of the nature of phenomenal experience and its impact on cognitive abilities. Self-consciousness is not specifically addressed in the proposed definition because it is not considered a requirement for phenomenal consciousness. Nevertheless, self-consciousness could be described in the context of the proposed framework by having a model of the body as part of the perceptual content representation. The concept of self would be expected to arise as a meta-representation in Stage 2. Then, references to the self could be found in accurate reports generated in Stage 3. In other words, the self might arise as a stable concept in the agent's private ontology.
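A highly simplified sketch of how the three stages could be chained in software is given below. The function names and the toy modulation rule are hypothetical; the model itself makes no commitment to any particular implementation:

# Schematic three-stage pipeline for artificial qualia (illustrative only).

def stage1_percept(extero: dict, proprio: dict) -> dict:
    """Stage 1: integrate exteroceptive and proprioceptive data into a
    subjective perceptual content representation."""
    return {"content": "moving dot", "source": extero, "viewpoint": proprio}

def stage2_meta(percept: dict, reactions: list) -> dict:
    """Stage 2: introspective meta-representation of how the percept was
    formed and of the reactions it triggered ('what it is like')."""
    return {"about": percept["content"], "reactions": reactions}

def stage3_modulate_and_report(meta: dict, gains: dict) -> str:
    """Stage 3: use the meta-representation to modulate perception
    (self-regulation loop) and to produce a report."""
    if "tracker_activated" in meta["reactions"]:
        gains["motion_sensitivity"] = gains.get("motion_sensitivity", 1.0) * 1.1
    return f"I report to be watching a {meta['about']}"

gains: dict = {}
p = stage1_percept({"pixels": "..."}, {"camera_pose": (0, 0)})
m = stage2_meta(p, ["tracker_activated", "positive_affect"])
print(stage3_modulate_and_report(m, gains), gains)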
18.2.5 Detecting the Presence of Qualia

The definition of qualia that we have described represents a hypothesis to be tested and cannot be taken as established knowledge. Therefore, the presence of qualia cannot be scientifically tested just by detecting the proposed mechanisms through inspection. Instead, we suggest using this approximation to what qualia might be as a working hypothesis. This approach calls for experimentation with MC implementations that follow the proposed assumptions. The result of this experimentation
Fig. 18.2 Sequence of images used to produce apparent motion qualia in humans
process is expected to prove whether or not the original hypothesis was correct. One of the benefits of this kind of approach is that the first-person problem can be circumvented. Nevertheless, more work should be done in order to design meaningful experiments that effectively combine both behavioural outcome and architecture inspection. Also, identification of hallmarks of qualia, like bistable perception [19], could be put in the context of the model proposed in this work.
18.3 Visual Experience and Apparent Motion Effect

We have chosen to simulate the conscious perception of apparent motion in order to illustrate the proposed staged characterization of artificial qualia. Humans can perceive motion not only from really moving objects, but also from series of images containing spatially segregated stimuli [20]. Simple experiments to test this effect consist of two stationary blinking stimuli which are presented using various spatial and temporal parameters (see Fig. 18.2). At certain rates, subjects perceive (apparent) motion. The sequence of images depicted in Fig. 18.2 is used to generate apparent motion qualia in humans, and presumably also in plausible models of phenomenally conscious machines. Note that a blank inter-stimulus interval (ISI) is inserted after every dot stimulus (the looping sequence is: left dot stimulus—blank—right dot stimulus—blank). This experiment, which is usually carried out with human subjects, could also be carried out using a machine as the subject. Putting this experiment in the context of the above characterization of qualia, let us consider a robot with a visual perceptual system modelled after the human visual cortex. The basic content at each stage could be described as:
– Stage 1: "moving dot".
– Stage 2: "what it is like to see a moving dot".
– Stage 3: "I report to be watching a moving dot".
Figure 18.1 represents these different levels of content description at each of the stages that we have defined for the development of artificial qualia. The perception process in the robot would proceed as follows: first of all, the visual sensor acquires the images using its light detection elements. At the same time, the robot's somatosensory system acquires the relative position of the camera, its orientation, and its foveation. The combination of sensory data from exteroceptive sensors (pixel maps from the camera) and proprioceptive sensors (relative coordinates from the camera position and focus sensors) is then used to form depictions of percepts along the lines described by Aleksander and Dunmall [21]. As the sequence described in Fig. 18.2 is presented to the robot, single depictive percepts are created to represent the appearance of the dots and their relative positions. Subsequently, the robot's motion detectors, fed with the stream of dot percepts, will eventually create new motion percepts depending, amongst other things, on the duration of the ISI frames. These moving-dot percepts (or moving-dot representations) are the contents of Stage 1 ("moving dot"). The presence of the motion percepts will in turn trigger a set of reactions in the system. For instance, if the robot is designed to keep track of certain moving objects, or to detect certain types of trajectories, the associated detectors will be activated. Also, affective evaluations (or somatic markers [22]) of percepts could be invoked (the robot could be designed, or could have learnt, to evaluate moving dots positively, and therefore to maintain bonds with them). If the robot were endowed with a
mechanism to represent these reactions, it would generate meta-representations of "what it is like" for the robot to see a moving dot. This content corresponds to our Stage 2 definition. Finally, if the Stage 2 introspective content is used both for self-regulating the global perception-action processes and for reporting purposes, then the robot would be able to reason explicitly about what it means to it to see a moving dot. Provided with the necessary linguistic skills, the robot would also be able to report its mental content using its own ontology (Stage 3 content).
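As a toy illustration of how the Stage 1 content ("moving dot") could arise from the dot/blank sequence, a minimal motion-percept detector is sketched below. The frame duration and the maximum temporal gap are illustrative parameters, not empirical values, and the code is not part of the CERA-CRANIUM implementation:

# Toy sketch: build "moving dot" percepts from a left/blank/right/blank sequence.
# FRAME_MS and MAX_GAP_MS are illustrative parameters only.

FRAME_MS = 50        # duration of each frame (dot or blank inter-stimulus interval)
MAX_GAP_MS = 120     # if successive dots are closer in time than this,
                     # a motion percept is created (apparent motion)

# looping stimulus: left dot - blank - right dot - blank ...
frames = [("dot", "left"), ("blank", None), ("dot", "right"), ("blank", None)] * 2

def detect_motion(frames, frame_ms=FRAME_MS, max_gap_ms=MAX_GAP_MS):
    percepts, last_dot = [], None          # last_dot = (time, position)
    for i, (kind, pos) in enumerate(frames):
        t = i * frame_ms
        if kind != "dot":
            continue
        if last_dot and pos != last_dot[1] and t - last_dot[0] <= max_gap_ms:
            percepts.append({"content": "moving dot",
                             "from": last_dot[1], "to": pos, "t": t})
        last_dot = (t, pos)
    return percepts

print(detect_motion(frames))   # Stage 1 contents: "moving dot" percepts

With a longer ISI (i.e. a larger temporal gap between dot frames) no motion percept would be created, mirroring the dependence of the apparent motion effect on the temporal parameters of the stimulus.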
18.4 Modeling Qualia in the CERA-CRANIUM Cognitive Architecture

Baars originally proposed the GWT as a metaphor for the access dimension of consciousness [2] (see [7] for a definition of access consciousness in contrast with the concept of phenomenal consciousness). Taking the GWT as the main inspiration, we have developed a cognitive architecture called CERA-CRANIUM [23]. The mechanisms implemented in this architecture are used to test the proposed model for the specification of the contents of subjective experience. In the following, we briefly introduce the GWT and describe the corresponding computational model that we have used as the basis for this research work.
18.4.1 Global Workspace Theory

GWT explains access consciousness using the intuitive idea of a "theater". Baars' theater is a functional explanation of consciousness which can be considered a vision contrary to the dualism advocated by the "Cartesian theater" idea. In Baars' metaphorical theater, the scene corresponds to working memory and the spotlight on the scene represents the focus of attention. The position of the spotlight is primarily selected behind the scenes by the play director—the executive guidance processes. The (conscious) action taking place under the bright spot on stage is formed thanks to a large set of specialized (unconscious) processors—the metaphorical audience in the dark—that can form coalitions and contribute their output to the workspace. There is a permanent competition between individual processors, or coalitions of processors, to contribute to the workspace. Contextual systems behind the scenes shape the content under the spotlight, which will be globally available; i.e., once the content is shaped under the spotlight it is broadcast to the audience (see Fig. 18.3). According to the GWT, the conscious contents of the mind are formed under the bright spot and then broadcast to both the audience (specialized processors) and the management team behind the scenes (context formation and executive guidance processes). Thanks to the broadcast mechanism, the specialized processors "see" the action taking place under the focal point on stage. Depending on the information received by the processors and their potential contribution, these processors may form interim coalitions to build a new, possibly more elaborated contribution to the next steps of the performance. Individual processors can also provide their processed outputs. All contributions from the audience compete for appearance in the brightly lit area of the scene. The winning content that will show up as the main action of the play is finally shaped under the influence of the active contexts and of guidance from the director, script writer, scene designer, etc. In short, GWT is based on the idea that functional (or access) consciousness is produced in the brain thanks to the operation of a sort of blackboard architecture [24]. There seems to be evidence that such a global access mechanism actually takes place in the brain: some known neural mechanisms have been found to correlate with the functional roles described in the GWT [25, 26].
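A minimal, self-contained sketch of this competition-and-broadcast dynamic is given below. The class and method names are illustrative only; the actual CERA-CRANIUM computational model, described in the next subsection, refines this picture with explicit submission and notification messages, coordination processes and multimodal contexts:

# Minimal global-workspace sketch: processors submit percepts, an active
# context biases the competition, and the winner is broadcast back to all
# processors. Illustrative only; not the CERA-CRANIUM implementation.
from typing import Callable, Dict, List

Percept = Dict        # e.g. {"content": "dot", "side": "right", "salience": 0.4}

class Workspace:
    def __init__(self) -> None:
        self.submissions: List[Percept] = []
        self.processors: List[Callable[[Percept], List[Percept]]] = []
        self.context: Dict = {"side": None}        # e.g. attend to one side

    def submit(self, percept: Percept) -> None:
        self.submissions.append(percept)

    def _score(self, p: Percept) -> float:
        bonus = 0.5 if p.get("side") == self.context.get("side") else 0.0
        return p.get("salience", 0.0) + bonus

    def cycle(self) -> Percept:
        """One serial step: select the winning percept and broadcast it."""
        winner = max(self.submissions, key=self._score)
        self.submissions.clear()
        for proc in self.processors:               # broadcast to the "audience"
            for contribution in proc(winner):
                self.submit(contribution)
        return winner                              # the explicit ("conscious") content

# A specialized processor: builds a more integrated percept from the broadcast.
def motion_builder(p: Percept) -> List[Percept]:
    if p.get("content") == "dot":
        return [{"content": "moving dot", "side": p["side"], "salience": 0.9}]
    return []

ws = Workspace()
ws.processors.append(motion_builder)
ws.context["side"] = "right"
ws.submit({"content": "dot", "side": "left", "salience": 0.6})
ws.submit({"content": "dot", "side": "right", "salience": 0.4})
print(ws.cycle())    # the right dot wins due to the active context
print(ws.cycle())    # the broadcast produced a more integrated "moving dot" percept

In this toy run the active context biases the competition towards the right field of view, and the broadcast of the winning percept lets a specialized processor contribute a more integrated percept in the next cycle.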
Fig. 18.3 Global workspace theory
18.4.2 A Computational Model of the GWT The GWT has inspired a number of MC models and implementations [27–29], including the one we have used for this research work: CERA-CRANIUM. While GWT provides just a metaphorical description of how a blackboard system could operate within a mind, a computational model inspired by this theory has to go beyond the metaphor and describe the same processes in terms of engineering design. That is to say, the computational model must provide a fine level of description which serves as a design guideline for a real implementation. Furthermore, in the particular case of this research work, which focuses on the generation of qualia, the computational design has to provide an actual mechanism for the identification and specification of the explicit contents (overt perception) of the artificial mind. As pointed out in the introductory section, at this stage of the research we are not aiming at describing a mechanism for the production of phenomenal states, but a mechanism for the generation and specification of the contents of overt perception. In short, we characterize artificial qualia as the contents that appear under the spotlight in a GWT-inspired implementation. We also argue that the explicit content specification produced by our system can be compared to the experiential content reported by humans when confronted with the same stimuli. We think this exercise can be useful in two different but related ways: on one hand, the computational model can be improved in order to better match human qualia production; on the other hand, a significant success in the former task might help us understand how conscious contents are generated in humans. In computational terms, we interpret the GWT as follows: there is a single serial thread that corresponds to conscious perception and gives access to a very limited set of highly elaborated representations. These selected integrated contents, which here we characterize as artificial qualia, are formed thanks to the collaboration and competition of a vast number of processors that run concurrently. Working memory—or short term memory—is modeled as a specific type of blackboard system whose operation is modulated by control signals sent from a set of coordination processes (see Fig. 18.4). The GWT broadcast mechanism is modeled as messages being sent from the single serial thread to the rest of the processes running in the system (specialized processors and coordination processes). Analogously, possible contributions from processors are defined as messages sent from the processors to the workspace (these messages are called submissions—see Fig. 18.4). Note that the broadcast message from the spotlight to the specialized processors is indeed a submission message to the workspace. Each piece of content sent to the workspace using a submission message is temporarily stored and potentially accessible to any specialized processor. These pieces of content are generically called percepts, as they represent the interim and partial elaborations that can potentially be integrated into a more complex and integrated percept. Whenever a new percept enters the workspace, all relevant processors
Fig. 18.4 CERA-CRANIUM computational model
can be notified. Notification messages provide this functionality: they are used to send newly created percepts to a variable list of specialized processors. The selection of processors eligible to receive a notification message is based on the currently active context and the contents of the corresponding percept. Integrated percepts coming from a serial thread submission are sent to all available processors, i.e., broadcasted. Active contexts are generated dynamically as a function of current explicit perception (the sequence of artificial qualia percepts) and the system’s active goals. Contexts are defined as a set of criteria. For instance, a possible context based on the relative location criterion could be “right side”, meaning that percepts coming from the right field of view are more likely to be active, and therefore notified to specialized processors. The generation and application of multimodal contexts is explained in detail elsewhere [30]. In short, active contexts help solve the problem of percept selection, i.e., they direct the attention of explicit or “conscious” perception. This means that in the competition for access to the serial thread, those percepts which better fit the active context are more likely to be selected. Global workspace dynamics are modeled as an information processing system that takes a vast amount of raw sensory data as input and generates a serial and much lower bandwidth summary of integrated information (artificial qualia). Information filtering and integration is achieved by the distributed processing carried out by the specialized processors. The specific way these processors have access to the information is regulated by the application of contexts and other commands sent to the workspace from the context formation and coordination processes. Other mechanisms exist for modulating the workspace dynamics and allowing lower level feedback loops, like the modeling of unconscious reflexes (see [23]). The synthesis of more complex and elaborated information out of incoming sensory data is based on the concept of percept. The main role of the system described in Fig. 18.4 is to dynamically generate integrated percepts by iteratively processing and combining simpler percepts. The content available in working memory, which is essentially represented in the form of single percepts and complex percepts, is both the input and the output of this iterative integration process. In sum, the functionality associated with processor coalitions is the collaboration of two or more specialized processors to generate a new, more integrated percept out of a number of other existing
Fig. 18.5 Minimal configuration of CERA-CRANIUM architecture
percepts. This functionality is implemented in the model by means of the iterative generation of percepts using the workspace’s shared memory. The competition between processors is also modeled by means of the dynamically generated percepts that are stored temporarily in the workspace. The aim of this competition process is to select the contents that will be “illuminated” by the spotlight; therefore, this task can be carried out through the application of contexts. At any given time a specific context is active, inducing a bias in the workspace. The application of a context in the workspace implies that only those percepts that match the criteria of the context are likely to be sent to the processors. It could be said that the competition takes place between the percepts: they compete to be further processed and, eventually, to become part of the finally selected complex percept (which will be sent to the spotlight service and also used as input for the definition of the next active context).
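To make the preceding description more concrete, the following is a minimal, illustrative sketch of such a workspace in Python. It is not the actual CERA-CRANIUM code; the class names, fields and the toy coalition processor are assumptions introduced purely to show how submissions, context-filtered notifications and percept competition could fit together.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Percept:
    """A piece of interim content temporarily stored in the workspace."""
    modality: str            # e.g. "vision"
    location: str            # e.g. "left", "right", "center"
    features: dict           # feature values contributed by a processor
    activation: float = 0.0

class Workspace:
    """Toy blackboard: processors submit percepts; notifications are filtered by the active context."""
    def __init__(self):
        self.percepts: List[Percept] = []
        self.processors: List[Callable[[Percept, "Workspace"], None]] = []
        self.context = lambda p: True          # active context: a predicate over percepts

    def submit(self, percept: Percept) -> None:
        """Submission message: store the percept and notify processors if the context matches."""
        self.percepts.append(percept)
        if self.context(percept):
            for proc in self.processors:       # notification messages
                proc(percept, self)

    def select_winner(self) -> Optional[Percept]:
        """Competition: the most activated, context-matching percept reaches the spotlight."""
        candidates = [p for p in self.percepts if self.context(p)]
        return max(candidates, key=lambda p: p.activation) if candidates else None

def coalition_processor(new: Percept, ws: Workspace) -> None:
    """Trivial specialized processor: combines co-located percepts into a more integrated one."""
    for other in list(ws.percepts):            # snapshot to avoid re-processing new additions
        if other is not new and other.location == new.location:
            combined = Percept(new.modality, new.location,
                               {**other.features, **new.features},
                               activation=other.activation + new.activation + 1.0)
            ws.percepts.append(combined)       # the combined percept joins the competition
```

In this sketch, a relative-location context such as “right side” would simply be `ws.context = lambda p: p.location == "right"`, so that only percepts from the right field of view trigger notifications and compete for the spotlight.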
18.4.3 Minimal Implementation of the CERA-CRANIUM Architecture The CERA-CRANIUM cognitive architecture has been designed to serve as an MC research test bed in which two main components can be distinguished: CERA is a cognitive architecture structured in layers, and CRANIUM is basically an implementation of the functional consciousness model described above. CERA uses the services provided by CRANIUM in order to control an autonomous robot. Although the present implementation covers more aspects, like component reusability across robotic platforms, here we will primarily focus on CRANIUM, describing just a minimal part of CERA. Analogously, although other specialized processors have been implemented for other modalities, at this time we will focus exclusively on visual processing. The current definition of CERA comprises four layers (see Fig. 18.5): – The sensory-motor services layer encloses sensor and actuator drivers. – The physical layer hosts a CRANIUM workspace and manages representations directly related to the existing physical body of the robot. – The mission-specific layer hosts a CRANIUM workspace and manages problem domain-dependent representations. – The core layer regulates the operation of lower level workspaces and manages domain-independent representations. CERA layers are also defined as services, managing the access to CRANIUM services and establishing a hierarchy that enables the use of several workspaces. Just one workspace, located in the CERA physical layer, has been used in this work for the sake of simplicity. For preliminary experiments on the generation of artificial qualia a minimal configuration of the cognitive architecture has been used. The mission-specific layer, motor services, and actuators are not used. A CRANIUM workspace service is hosted and managed by the CERA physical layer. Information is integrated in the workspace in the form of complex percepts. Workspace modulation commands
are issued from the core layer. Basically, no actuators are used, the mission-specific layer is not used, and the core layer consists of a minimal implementation for context definition. The perceptual information flow is limited to the visual modality. Image bitmaps from the digital camera are acquired periodically thanks to a camera driver located in the CERA sensory-motor services layer (sensor services in Fig. 18.5). Proprioceptive sensing data are also acquired thanks to specific services located in the sensory-motor services layer. In the case of vision, the relative location and current orientation of the camera are provided. All sensing data are sent to the physical layer—note that CERA higher layers do not have direct access to raw sensory data. As soon as sensor data are received in the physical layer, the first single percepts are created and submitted to the workspace service. Basically, these initial single percepts are data packages combining exteroceptive sensing data with the associated proprioceptive data. For instance, one single percept might contain an image bitmap plus the camera orientation that was logged when that bitmap was acquired by the CCD sensor. Data structures enclosed in single percepts are used to implement a depictive representation of percepts in the sense described by Aleksander and Dunmall [21].
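Building on the toy Workspace sketch above, a single percept of this kind might simply pair the bitmap with the proprioceptive camera pose logged at acquisition time. The field names and the fixed forward-facing location are assumptions made for illustration, not the actual CERA data structures.

```python
import time

def make_single_percept(bitmap, camera_pan: float, camera_tilt: float) -> Percept:
    """Combine exteroceptive data (the bitmap) with the proprioceptive data
    (camera orientation) that was current when the frame was grabbed."""
    return Percept(
        modality="vision",
        location="front",                   # single forward-looking camera
        features={
            "bitmap": bitmap,               # raw 320 x 200 image data
            "camera_pan": camera_pan,       # degrees, logged at acquisition time
            "camera_tilt": camera_tilt,
            "timestamp": time.time(),
        },
    )
```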
18.5 Experimental Setting A minimal architecture along the lines described above has been implemented for the generation of visual qualia. Although the system is designed to work both with real and simulated robots, a simulated environment has been set up initially. The simulated robot, designed using Robotics Developer Studio (RDS) [31], is a modified version of a MobileRobots Pioneer 3 DX8 robot in which actuators are disabled, and the only sensor available is a single simulated camera oriented to look over the robot’s front side (providing 320 × 200 pixel bitmaps). A specific sensor service for the camera has been written, so a predefined image sequence can be injected at will (instead of ingesting the synthetic visual feed from the simulator or the feed from a real camera connected to the system). Only one meta-goal has been defined in the CERA core layer, which is based on detecting saliencies. Therefore, the operation of the workspace will be modulated by relative location contexts pointing to novelties. The kind of novelties being detected depends on the perception process; that is to say, the particular processors implemented in the system will determine the sort of novelties that can potentially be detected. Two types of processors have been used for the experiments described in this work: motion detectors and region of interest (ROI) detectors. The following types of basic visual stimuli have been used for preliminary testing: – S1. Static white object on a dark background. – S2. White object moving along a rectilinear trajectory. – S3. Two stationary white blinking rounded spots. S1 and S2 are generated using the RDS simulator, while S3 is generated using the image sequence injection mechanism. It consists of a sequence designed to induce an apparent motion effect in humans [20]. Humans can consciously perceive motion not only from real moving objects, but also from series of images containing spatially segregated visual stimuli like S3 (see Fig. 18.2). The ability to report qualia with grounded meaning is one of the key features of conscious machines [32]. In order to study artificial qualia, we have to analyze what kind of integrated content a system is able to report at any given level. The system we have implemented for this work is not yet endowed with an accurate reporting system. However, we have devised an inner state inspection mechanism: the CERA viewer (see Fig. 18.6). Although the output of a viewer cannot be directly compared with an accurate human verbal report, alternative strategies can be adopted in order to compare the contents of conscious perception in
Fig. 18.6 Artificial qualia specification produced by CERA-CRANIUM
humans with the content specification of explicit perception in our proposed system. For instance, the same human observer can confirm whether or not the content of his/her visual experience matches the integrated percepts represented using the CERA viewer. As pointed out above, the proposed comparison scheme is just an initial step towards a more complete synthetic phenomenology approach. The artificial qualia specification produced by CERA-CRANIUM can be compared with a human conscious content report using the CERA viewer. A human observer can judge the similarity between his/her own experiential content and the representation displayed on screen. Three stages have been defined as partial but complementary definitions of artificial qualia. The current implementation addresses the so-called Stage 1 (perceptual content representation) and, to some extent, Stage 3 (self-modulation and reportability). It is our aim to work towards a full implementation that addresses all three defined stages, including Stage 2 (introspective perceptual meta-representation).
18.6 Preliminary Results Preliminary experiments were conducted as follows: both a human subject without previous knowledge of the domain (H ) and the minimal CERA-CRANIUM implementation (MCC) were exposed to the S1, S2, and S3 visual stimuli (Fig. 18.7). H was asked to pay attention to white objects and verbally report the action perceived when looking at the computer screen. MCC was given just one meta-goal: focus on saliencies; that is to say, the core layer was programmed to generate spatial contexts pointing to either ROIs or detected movement. The CERA viewer was programmed to generate a camera field of view screen in which the complex percepts coming out of the physical layer are represented. Only two specialized processors were activated: a specific ROI detector for white objects and a motion detector based on pixel changes. Consequently, the current version of the CERA viewer is only able to pinpoint the location of complex percepts corresponding to integrated ROIs (using red color pixels), and the direction of movement (using a black color mark indicating the direction of movement). The working memory span (maximum age of percepts temporarily stored in the workspace and hence available to processors) was configured to 500 ms. The S3 white ball stimulus duration was 100 ms and the inter-stimulus interval (ISI) was 50 ms. When exposed to the S1 stimulus, H reported a static white object resting on the ground, located near the center of the screen. The output of the CERA viewer when MCC was exposed to the same visual stimulus matched part of H ’s report (Fig. 18.6a). Given the current implementation, all meaning that MCC can represent is exclusively about white objects and movement. Therefore no representation for such a concept as ground can appear in the viewer. When exposed to S2, H reported a round object moving uniformly from the right to the left. MCC viewer representations again matched H ’s report in part (Fig. 18.6b). Given that the motion detector processor does not provide any measure related to speed, the uniform speed was not perceived by MCC. As expected, when H was exposed to S3 she reported a ball continuously moving back and forth from the left to the right and vice versa. However, MCC did not produce a matching motion representation (Fig. 18.6c). The MCC viewer showed motion marks to the left and to the right (only when working memory span is shorter than ISI), but no continuity existed during the black ISI after every dot stimulus, i.e. the representation did not correspond to the smooth experience reported by H .
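As an aside, a pixel-change motion detector of the kind mentioned above could be as simple as the following sketch. This is not the processor actually used in the experiments; the thresholds, the grayscale frame format and the centroid heuristic are assumptions made purely for illustration.

```python
import numpy as np
from typing import Optional

def detect_motion(prev_frame: np.ndarray, curr_frame: np.ndarray,
                  change_threshold: int = 30, min_pixels: int = 20) -> Optional[dict]:
    """Compare two grayscale frames (e.g. 200 x 320 uint8 arrays) and report the
    horizontal direction of the change, or None if too few pixels changed."""
    diff = np.abs(curr_frame.astype(int) - prev_frame.astype(int)) > change_threshold
    if diff.sum() < min_pixels:
        return None                                    # not enough change: no motion percept
    ys, xs = np.nonzero(diff)
    bright = np.nonzero(prev_frame > 200)              # bright (white) region in the previous frame
    prev_cx = bright[1].mean() if bright[1].size else xs.mean()
    direction = "right" if xs.mean() > prev_cx else "left"
    return {"kind": "motion", "direction": direction,
            "centroid": (float(xs.mean()), float(ys.mean()))}
```

A frame-to-frame detector of this kind keeps no memory beyond a single pair of frames, which is consistent with the observation that no motion continuity survives a blank inter-stimulus interval.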
Fig. 18.7 S1 (a), S2 (b), and S3 (c) visual stimuli. From top to bottom: simulated scenario, robot camera image, and CERA viewer output. Corresponding H ’s report is illustrated using speech bubbles. The white object in scenarios (a) and (b) is a simulated iRobot Create robot
18.7 Conclusions and Future Work A developmental view of qualia based on former work by other authors has been defined as an attempt to provide a conceptual framework for the creation of new MC models. The definition of qualia presented in this work advocates a functional role for phenomenal consciousness. Furthermore, the self-modulation and integration of perceptual systems is posited as a process driven by the construction of introspective meta-representations. Additionally, reportability is assumed to be based on the same meta-representations. It is expected that implementations based on the proposed conceptual framework will be able to incorporate the phenomenal dimension of consciousness into their models. That is not to say that they will become phenomenally conscious just because the proposed conceptual definition is considered. Nevertheless, we argue that exploring the design space in the proposed direction might shed some light on the problem of the production of qualia, both in natural and artificial creatures. Phenomenology is typically one of the fields where advancement through the natural sciences is challenging, and we think this discipline could benefit from research with artificial systems. The research line introduced in this paper is well underway. Preliminary results indicate that the proposed GWT-based implementation needs to be enhanced in order to accurately specify human-like visual experience. Future work also includes improving the CERA viewer interface so that the overt perception content specification can be interpreted easily by humans. It is well known that human perception is dramatically affected by expectations. Therefore, the next steps planned for the enhancement of the architecture, which we think will lead to better results, include the generation of expectation-based percepts. Our hypothesis is that the use of expectations
will contribute to a more robust system when we progress on to testing with noisy real world images. Additionally, better results might be obtained in terms of reproducing some human optical illusions. If that is the case, it will help to demonstrate that the presence of perceptual illusions correlates with better perception accuracy in noisy environments, and therefore illusions could be considered a by-product of outstanding perception systems selected by evolution. Whether or not robots with human-level visual recognition skills will inevitably experience similar optical illusions remains to be seen. Once we have the expectation generation mechanism in place, we aim to test other typical visual phenomena like the color-phi effect, attentional blink, flash-lag effect, etc.
References

1. Chrisley, R.: Synthetic phenomenology. Int. J. Mach. Conscious. 1, 53–70 (2009)
2. Baars, B.J.: A Cognitive Theory of Consciousness. Cambridge University Press, Cambridge (1988)
3. Noë, A.: Is the visual world a grand illusion? J. Conscious. Stud. 9, 1–12 (2002)
4. Sloman, A.: Why some machines may need qualia and how they can have them: including a demanding new Turing test for robot philosophers. In: Chella, A., Manzotti, R. (eds.) AI and Consciousness: Theoretical Foundations and Current Approaches, AAAI Fall Symposium 2007, pp. 9–16. AAAI Press, Menlo Park (2007). Technical Report FS-07-01
5. Arrabales, R., Ledezma, A., Sanchis, A.: Establishing a roadmap and metrics for conscious machines development. In: Proceedings of the 8th IEEE International Conference on Cognitive Informatics
6. Block, N.: Some concepts of consciousness. In: Chalmers, D. (ed.) Philosophy of Mind: Classical and Contemporary Readings, vol. 2009, Oct. 20, 2001
7. Block, N.: On a confusion about a function of consciousness. Behav. Brain Sci. 18, 227–287 (1995)
8. Nagel, T.: What is it like to be a bat? Philos. Rev. 83, 435–450 (1974)
9. Lennie, P.: The physiology of color vision. In: Shevell, S.K. (ed.) The Science of Color, 2nd edn. (2003)
10. Haikonen, P.O.A.: Qualia and conscious machines. Int. J. Mach. Conscious. 1(2), 225–234 (2009)
11. Dennett, D.C.: Quining qualia. In: Consciousness in Modern Science (1988)
12. Arrabales, R., Ledezma, A., Sanchis, A.: Strategies for measuring machine consciousness. Int. J. Mach. Conscious. 1, 193–201 (2009)
13. Chalmers, D.: Facing up to the problem of consciousness. J. Conscious. Stud. 2, 200–219 (1995)
14. Seth, A., Baars, B., Edelman, D.: Criteria for consciousness in humans and other mammals. Conscious. Cogn. 14, 119–139 (2005)
15. Jackson, F.: Epiphenomenal qualia. Philos. Q. 32, 127–136 (1982)
16. Vygotsky, L.S.: Mind in Society: The Development of Higher Psychological Processes (1980)
17. Sloman, A., Chrisley, R.: Virtual machines and consciousness. J. Conscious. Stud. 10, 133–172 (2003)
18. Dennett, D.C.: Consciousness Explained. Little, Brown, Boston (1991)
19. Fürstenau, N.: A computational model of bistable perception–attention dynamics with long range correlations. In: KI 2007: Advances in Artificial Intelligence, pp. 251–263 (2007)
20. Muckli, L., Kriegeskorte, N., Lanfermann, H., Zanella, F.E., Singer, W., Goebel, R.: Apparent motion: Event-related functional magnetic resonance imaging of perceptual switches and states. J. Neurosci. 22(9), 219 (2002)
21. Aleksander, I., Dunmall, B.: Axioms and tests for the presence of minimal consciousness in agents. J. Conscious. Stud. 10, 47–66 (2003)
22. Damasio, A.R.: The Feeling of What Happens: Body and Emotion in the Making of Consciousness. Heinemann, London (1999)
23. Arrabales, R., Ledezma, A., Sanchis, A.: CERA-CRANIUM: A test bed for machine consciousness research (2009)
24. Nii, H.P.: Blackboard application systems, blackboard systems and a knowledge engineering perspective. AI Mag. 7, 82–107 (1986)
25. Dehaene, S., Naccache, L.: Towards a cognitive neuroscience of consciousness: Basic evidence and a workspace framework. Cognition 79(1–2), 1–37 (2001)
26. Baars, B.J.: The conscious access hypothesis: origins and recent evidence. Trends Cogn. Sci. 6, 47–52 (2002)
27. Dehaene, S., Sergent, C., Changeux, J.: A neuronal network model linking subjective reports and objective physiological data during conscious perception. Proc. Natl. Acad. Sci. USA 100(14), 8520–8525 (2003)
28. Shanahan, M.: Consciousness, emotion, and imagination. A brain-inspired architecture for cognitive robotics. In: AISB Workshop Next Generation Approaches to Machine Consciousness (2005)
29. Franklin, S., Ramamurthy, U., D'Mello, S.K., McCauley, L., Negatu, A., Silva, R., Datla, V.: LIDA: A computational model of global workspace theory and developmental learning. In: AAAI Fall Symposium on AI and Consciousness: Theoretical Foundations and Current Approaches (2007)
30. Arrabales, R., Ledezma, A., Sanchis, A.: A cognitive approach to multimodal attention. J. Phys. Agents 3, 53–64 (2009)
31. Microsoft Corp.: Microsoft Robotics Studio. http://msdn.microsoft.com/robotics/ (2006)
32. Haikonen, P.O.A.: Robot Brains. Circuits and Systems for Conscious Machines. Wiley, London (2007)
Chapter 19
The Ouroboros Model, Selected Facets

Knud Thomsen
Abstract The Ouroboros Model features a biologically inspired cognitive architecture. At its core lies a self-referential recursive process with alternating phases of data acquisition and evaluation. Memory entries are organized in schemata. The activation at a given time of part of a schema biases the whole structure and, in particular, its missing features, thus triggering expectations. An iterative recursive monitor process termed ‘consumption analysis’ then checks how well such expectations fit with successive activations. Mismatches between anticipations based on previous experience and actual current data are highlighted and used for controlling the allocation of attention. A measure of the goodness of fit provides feedback as a (self-)monitoring signal. The basic algorithm works for goal-directed movements and memory search as well as during abstract reasoning. It is sketched how the Ouroboros Model can shed light on characteristics of human behavior including attention, emotions, priming, masking, learning, sleep and consciousness.
19.1 Introduction The Ouroboros Model describes an algorithmic architecture for cognitive agents. Its starting points are two simple observations: animals and human beings are embodied, strongly interacting with their environment, and they can only survive if they maintain a minimum of consistency in their behavior. The first issue poses severe constraints but also offers indispensable foundations for a bootstrapping mechanism leading from simple to sophisticated behaviors. As for bodily movement, so also for cognition, some measure of coherence and consistency is indispensable; e.g. nobody can move a limb up and down simultaneously, and, at least in real-world settings, opposites cannot both be fully true at the same time. In a recent contribution, a detailed description of the principal layout of the Ouroboros Model has been given, together with a glimpse of how the proposed structures and processes can address questions distilled from 50 years of research into Artificial Intelligence [1].
19.2 Action and Memory Structure Following Hebb's law, even in quite simple animals, neurons that are concurrently active often experience an enhancement of their connection, raising the probability of later joint activation [2]. Different forms of learning act in concert. Neural assemblies are permanently linked together when once co-activated in the right manner. Later partial activation biases the whole associated neural population to fire together. K. Thomsen () Paul Scherrer Institut, 5232 Villigen PSI, Switzerland e-mail:
[email protected]
Structured memories are laid down, and this happens especially effectively when they are associated with some reward markers for success. According to the Ouroboros Model, these representations are bound together and preserved for later use, i.e. compound units, here called schemata, which join diverse slots into cohesive memory structures, are established [1]. In rather direct extension of conditioned reflexes, the processes and the resulting memory entries are the same in principle, irrespective of whether components of movements are combined into more elaborate choreographies, percepts into figures, or activations in many different brain areas into an entry for an episode. Data in brains are consequently organized into hierarchies of schemata, where activation of any part promotes the selected concept and produces graded activation of each of the linked features. As a consequence of these structures, every neural activation triggers an expectation for the other associated constituents, which are usually active in this context. Activation of part of a schema at a given time thus biases the whole structure with all its relevant slots and, in particular, its missing features. Schemata can thus be seen as an effective generalization of production rules [3]. It is important to note that in addition to old structures established well in advance due to the repeated connection of the involved attributes, schemata can also be generated on the fly, assembled from parts and existing building blocks as the occasion arises [4].
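A minimal sketch of such a schema, written in Python purely for illustration, shows how partial activation can bias the missing features. The class, its fields and the face example are assumptions introduced here; they are not part of the original model specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Schema:
    """Toy schema: named slots, some filled and some empty; partial input biases the rest."""
    name: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)  # slot -> value or None
    activation: float = 0.0

    def activate(self, observed: Dict[str, str]) -> None:
        """Fill slots matching the observed features and raise the overall activation;
        the remaining empty slots now act as biased expectations."""
        for slot, value in observed.items():
            if slot in self.slots:
                self.slots[slot] = value
                self.activation += 1.0

    def expectations(self) -> List[str]:
        """Empty slots highlighted by the partial activation, i.e. the missing features."""
        return [slot for slot, value in self.slots.items() if value is None]

# Example: seeing eyes and a mouth activates a 'face' schema and biases its nose/ears slots.
face = Schema("face", {"eyes": None, "mouth": None, "nose": None, "ears": None})
face.activate({"eyes": "pair, open", "mouth": "closed"})
print(face.expectations())   # -> ['nose', 'ears']
```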
19.3 Principal Algorithmic Backbone At the core of the Ouroboros Model lies a self-referential recursive process with alternating phases of data acquisition and evaluation. A monitor process termed ‘consumption analysis’ checks how well expectations triggered at one point in time fit with successive activations; the following principal stages are identified:
• . . . anticipation,
• action/perception,
• evaluation,
• anticipation, . . .
The sub-processes are linked into a full repeating circle, and the snake bites its end, the Ouroboros devours its tail, as in the alchemists' serpent symbol [1]. A general overview is presented in Fig. 19.1.
Start: This is the almost arbitrary entry point in the perpetual flow of the proposed data-collection and -evaluation processes: a novel episode commences with little heritage from previous activity.
Get data: In this example, perceptional data arrive first as input.
Activate Schema: Schemata are searched in parallel; the one with the strongest bottom-up activation sharing similar features is activated.
Memory Highlights Slots: Each of the features making up the selected schema is marked as relevant, and all of them are activated to some extent; this biases all features belonging to this schema even when they are not part of the current input, i.e. empty slots are thus pointed out.
Consumption Analysis: This is the distinguished recurrent point at the core of the main cyclic process constituting the Ouroboros Model. Anticipations, i.e. the bias exerted by an activated schema, are compared to current actual data. In case of satisfactory correspondence the current cycle is concluded without gaps, and a new processing round can start. If the achieved fit is not sufficient, e.g. slots are left unfilled, follow-up action is triggered. In the simplest example outlined here, more data are searched for, guided by expectations in the form of the biased empty slots.
Fig. 19.1 Structure of the basic data processing loop in the Ouroboros Model (reprinted with permission from [1])
End/new Start: In the example of Fig. 19.1, a (preliminary) end is reached when good agreement between active expectations and data is detected, e.g. when an object is recognized. A new episode can then start.
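Read as pseudocode, one pass through this loop might look like the sketch below. It builds on the toy Schema class introduced earlier; the fit threshold, the overlap-based fit measure and the get_data callback (which accepts optional hints about missing slots) are assumptions used for illustration, not the author's implementation.

```python
def ouroboros_cycle(get_data, schemata, fit_threshold=0.8, max_rounds=10):
    """One episode of the basic loop: acquire data, activate the best-matching schema,
    highlight its empty slots, and evaluate the fit via consumption analysis."""
    observed = dict(get_data(None))                    # get data, no guiding expectations yet
    for _ in range(max_rounds):
        # activate schema: the candidate with the strongest bottom-up feature overlap
        schema = max(schemata, key=lambda s: len(set(s.slots) & set(observed)))
        expected = set(schema.slots)                   # memory highlights slots
        fit = len(expected & set(observed)) / max(len(expected), 1)
        if fit >= fit_threshold:                       # consumption analysis: fit is good enough
            return schema, fit                         # end: e.g. an object is recognized
        missing = expected - set(observed)             # biased empty slots guide the follow-up
        observed.update(get_data(missing))             # search for more data
    return None, 0.0                                   # no acceptable interpretation this episode
```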
19.3.1 Consumption Analysis Any occurring activation excites associated schemata. The one with the highest activation is selected first, and other, possibly also applicable, schemata are inhibited and suppressed [1]. Taking the first selected schema and the ensuing anticipations active at the time as reference and basis, consumption analysis checks how successive activations fit into this activated schema, i.e. how well low-level input data are “consumed” by the chosen schema. Features are assigned, attributes are “explained away” [5]. If everything fits perfectly the process comes to a momentary partial standstill and continues with new input data. If discrepancies surface, they have an even more immediate impact on the following elicited actions [1]. In case of severe mismatch, the first schema is discarded and a new conceptual frame is tried. The actual appropriateness of a schema can vary over a wide range. In any case, consumption analysis delivers a graded measure of the goodness of fit between expectations and actual inputs; in sum, the acceptability of an interpretation. Thresholds for this signal are set in terms of approval levels depending on relevant experience in this context. In the real world nothing can always be perfect; nevertheless, a wrong schema has to be abandoned at some point. Consumption analysis can be understood as a particular algorithm for pattern completion and matching, which not only delivers a simple result and feedback concerning a perception but also some meta-information relating to the overall performance, the quality of results, and how steadily progress unfolds. Suitable and promising next actions in a situation are highlighted, all of that based on the available concepts, i.e. schemata, which had been established earlier [1].
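To make the graded character of this signal concrete, a weighted goodness-of-fit measure could be sketched as follows. The weighting scheme, the numbers and the idea of returning unconsumed leftovers are illustrative assumptions, not a specification taken from the model.

```python
def consumption_fit(expectations, observations, weights=None):
    """Graded goodness of fit: how well the observed features are 'consumed' by a schema.
    expectations: {slot: expected value}; observations: {slot: observed value}.
    Returns (fit, leftovers): a value in [0, 1] and the observations left unconsumed."""
    weights = weights or {slot: 1.0 for slot in expectations}
    total = sum(weights.values())
    consumed_weight = 0.0
    leftovers = dict(observations)
    for slot, expected in expectations.items():
        observed = observations.get(slot)
        if observed is not None and observed == expected:
            consumed_weight += weights[slot]       # feature assigned, attribute "explained away"
            leftovers.pop(slot, None)
    return consumed_weight / total, leftovers

# Deviations on heavily weighted slots hurt the fit more than minor ones:
fit, rest = consumption_fit({"color": "white", "shape": "round", "moving": "yes"},
                            {"color": "white", "shape": "round", "moving": "no"},
                            weights={"color": 1.0, "shape": 1.0, "moving": 3.0})
# fit = 0.4, below a typical approval threshold, so this schema would be abandoned
```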
19.4 Selected Effects in the Ouroboros Model Consumption analysis points out discrepancies between the current status and relevant anticipations derived from prior experience. As will be described in this section, the very same basic processes
work with goal-directed movements, with conceptual plans and with high-level goals, as well as during abstract rational reasoning, e.g. when weighing evidence for and against some remote theory. It is argued that the Ouroboros Model is self-referentially consistent, i.e. the purported structures and effects result from, and also lead to, these very same concepts and consequences when the proposed processes are implemented. Without going into detail, a few prominent effects, claimed to result directly from the consumption analysis process working on animal or human memory substrates and representations organized into schemata, are briefly sketched in the following subsections. For ease of reading, claimed effects of the Ouroboros Model are described as if all conjectures had already been demonstrated convincingly.
19.4.1 Attention Quite generally, contradictions signal that something unexpected is encountered and that, against the background of experience, some modification of the current behavior might be necessary as an immediate consequence. In any case, a missing feature can be seen as effectively triggering attention to that open slot in a schema structure. Faces, for example, are attentively looked at not in random scans but rather along scan paths linking the most important features like eyes, mouth, ears and nose [6]. One feature appears to direct attention to the next. Distinct discrepancies focus the gaze and mind on issues requiring further action. Unexpected peripherally perceived motion triggers attention and a saccade. Not only in natural scenes can top-down guidance override bottom-up saliency, depending on the task [7]. The amount of the resulting arousal can serve as a measure of assigned relevance; it depends in a meaningful manner on the history in the memory of the agent. If dimensions are involved which are considered or marked as important, minor deviations from expectations can cause a big effect, while discrepancies involving issues which were unimportant or boring in the past usually stir less attention. The weight of a mismatch is thus self-reflectively modulated based on the actual situation, including the associated urgency. If some parts of an activated schema fit very tightly while, at the same time, there are striking discrepancies concerning other expected features, this triggers more focused attention and causes a higher tension than if a schema fits overall with some evenly distributed minor deviations between anticipations and current observations. This particular pattern, where it is clear that only a fraction of the highlighted slots can or should be filled with material from one domain, can be identified as characteristic of analogies. Stumbling over an obstacle which usually does not lie in the middle of the way certainly draws the attention of the actor towards his feet and to the ground. The effect is basically the same when one hears a politician making a public statement running counter to expectations derived from knowing the established consensus in the party to which she belongs.
19.4.2 Emotion Monitoring the quality of congruence with experience by means of a consumption analysis, as suggested, provides a very useful feedback signal for any actor under all circumstances. Assuming as a start an average agreement of something like 80 percent, this would set the zero level of the scale: deviations for the worse as well as for the better are worth special highlighting, and these episodes should preferentially be remembered [8].
The Ouroboros Model claims that the feeling-component of emotions is primarily that: a feedback signal to the actor from a consumption analysis process. Feelings and emotions are prototypically personal and individual matters. Indisputably, some events or objects were so meaningful for a species during evolution that dedicated detectors as well as bodily reactions and associated action biases have been established and become hardwired into brains. According to the Ouroboros Model, “new” emotions, experienced in conjunction with unfamiliar objects and under novel circumstances, first occur as a consequence of a situation or an event, and they also set the stage for the activities following thereafter. Eventually, they have almost the same status and consequences as inborn (bodily and mental) reaction patterns. Discrepancies as well as achievements usually evoke attention and associated arousal; therefore these dimensions are finally linked to the relevant schemata. As a consequence, heightened arousal can boost emotions in turn; isolating this latter activation path, one arrives at somatic theories of emotions [9, 10]. The resulting hybrid picture, emphasizing feedback plus guidance, combines well-established and seemingly contradictory stances describing, and purportedly explaining, emotions [11, 12]. Accounts of affect as information resulting from appraisal, just like motivational accounts stressing behavioral dispositions, are simply different facets and effects of the basic processes in the Ouroboros Model. When discrepancies become too big or victory is established, an interruption, a kind of reset, of the ongoing activity will be triggered. Emotions appear to be tied to a particular event, object or situation; moods usually denote less focused affects and circumstances. In addition, these manifestations of one and the same basic signal differ in their time characteristics. Schemata are activated over time, some being easier and quicker to excite than others. They are regularly built up from the combination of other constituent schemata, and most schemata therefore are nested into (non-strict) hierarchies. At least at certain points in time, consistency in the Ouroboros Model is checked globally, in parallel for all (partly) activated features. As discrepancies are dealt with according to their weight, a rather fixed sequence of appraisal dimensions might be observed due to general similarities between schemata, i.e. shared parts [13]. With some abstraction, basic emotions can be distilled [12]. Actors often are not alone: any signal useful for oneself would also be of relevance to others in a community. This explains the communication value of displayed emotions [14]. As observed for the weight of dimensions in the attraction of attention, emotions also come in two versions (and their combinations). They can be inherited, i.e. forming a constitutive part of a schema as an earlier associated feature; in this case they can be activated very directly and quickly. When novel circumstances give rise to a never before experienced context and evaluation, attention will be evoked and corresponding emotions will build up; they are incorporated into the memory of the event, ready for later fast use.
19.4.3 Rational Problem Solving In many circumstances a Bayesian approach can be specified as the optimum way of considering all available evidence in order to arrive at a decision, e.g. for classification and for acting [5, 15]. At the heart of optimality lies the appropriate combination of prior probabilities with current data; assuming the right weighting, the interplay between a partially activated schema and newly observed features does exactly that. When certain combinations in general are much more likely than particular others, i.e. they have higher prior probabilities, less additional evidence is required for acceptance. Knowledge-guided perception and the organization of knowledge in useful chunks, i.e. meaningful schemata, have been found to form the very basis of expert performance [16]. Beyond the effectiveness of the proposed memory structures in the Ouroboros Model for perception and the selection of interpretations, directed progress of activation in a brain is well adapted to
more abstract problem solving. Constraint satisfaction has been proposed as a general mechanism for rational behavior and reasoning, applicable in a very wide range of settings [17]. Maximizing the satisfaction of a set of constraints can be seen as optimizing coherence [18]. The Ouroboros Model and consumption analysis not only offer an efficient implementation of constraint satisfaction but also provide a rationale and a basis for goal-directed refinement and self-steered tuning of the process. The succession of parallel data acquisition and evaluation phases, interspersed with singular decision points in an overall serial repetitive process, offers a natural explanation of seemingly contradictory observations concerning the prevalent character of human data processing: competing views stressing serial or parallel aspects can both be correct, depending on the investigated details and the specific timing. The same can be said concerning the debate on bottom-up versus top-down accounts of data processing: during well defined phases and focusing on certain aspects, each of these accounts is appropriate. In total, all of the processes interact and concur in their turn: we process data in parallel and serially and employ bottom-up as well as top-down mechanisms. In a similar vein, memories are of the past, and they serve the future [4]. Filling empty slots can be seen as theory-driven activity, and at the same time as a kind of simulation, especially in cases where no good dedicated model exists and we have to fill in default values, probably even taking them from our own self-model in order to understand the actions of others. For current overt behavior as well as during memory search or when generating anticipations of things to come, attention can be quickly and pre-consciously focused on the most pressing questions at any one time. Emotions provide feedback on how well things go, and they steer the ensuing actions; they thus are an indispensable and constitutive ingredient of rational behavior. The unfolding of all activity depends on the individual history and personal preferences of an agent. The available schemata and the specific implementation of the diverse processes vary over wide ranges for different persons. In any case, a minimum number of the slots deemed important has to be filled satisfactorily for a solution to be accepted rationally as well as emotionally. The most extensive achievable consistency is the main criterion for judging the value and reliability of primary sensory percepts, and even more so, the “truth” of theories of the highest complexity, abstraction and remoteness.
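A rough illustration of the Bayesian reading sketched at the start of this subsection: combining a schema's prior activation with the likelihood of newly observed features. The numbers and the naive independence assumption are introduced here purely for exposition; they do not come from the Ouroboros Model itself.

```python
def posterior_over_schemata(priors, likelihoods, observed):
    """Naive-Bayes-style combination: prior schema probability times the likelihood of
    each observed feature under the schema, normalized over all candidate schemata.
    priors: {schema: P(schema)}; likelihoods: {schema: {feature: P(feature | schema)}}."""
    scores = {}
    for schema, prior in priors.items():
        p = prior
        for feature in observed:
            p *= likelihoods[schema].get(feature, 0.01)   # small floor for unexpected features
        scores[schema] = p
    total = sum(scores.values())
    return {schema: p / total for schema, p in scores.items()}

# A schema with a high prior needs less feature evidence for acceptance than an unlikely one.
posterior = posterior_over_schemata(
    priors={"cat": 0.7, "fox": 0.3},
    likelihoods={"cat": {"whiskers": 0.9, "bushy_tail": 0.2},
                 "fox": {"whiskers": 0.6, "bushy_tail": 0.9}},
    observed=["whiskers", "bushy_tail"])
# posterior is roughly {'cat': 0.44, 'fox': 0.56}: strong feature evidence can overcome the prior
```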
19.4.4 Priming/Masking The Ouroboros Model claims that guidance by existing schemata is essential for effective action, but this algorithmic architecture can also have detrimental consequences, which may actually limit the performance of an agent under specific circumstances. Triggering an expectation linked to a specific location facilitates acting on the information displayed there. This is one basic finding of simple priming experiments; another one concerns semantic priming, where reactions are faster when an appropriate context is activated. In any case, after locations or features are successfully consumed, they have to be marked together as part of the entire activation associated with the relevant concept, in order to avoid their being immediately, and in that case most often improperly, considered again. Attentional blink is a direct consequence of this inhibition, and it occurs when subjects watch for targets in a stream of stimuli: they are likely to miss a target if it follows with a short delay after a first, detected target. Distracting subjects with task-irrelevant information attenuates the attentional blink [19]; as more than one activity is going on, consumption tagging is less efficient. In a straightforward manner, the Ouroboros Model can deliver a unified account of competing models of attentional blink: resource depletion, processing bottleneck, and temporary loss of control can all be understood as different facets originating in one process [20].
The proposed periodic processing entails a marked structure with respect to time and to its perception. The shortest time span that can be discerned by humans has been found to be approximately 30 ms [21]. Attentive visual perception appears to be associated with frequencies around 13 Hz in cortical electrical potentials [22]. The detection sensitivity at luminance threshold is related to the EEG phase at lower frequencies [23]. Depending on the behavioral state of an animal, variations in the strength and distribution of EEG frequency components, their coupling and coherence have been found [24]. If the time available is not enough to arrive at a conclusion because the sensory input is quickly replaced by a mask, no complete perception can be obtained. Stimuli that are difficult to perceive and require more than one cycle would correspondingly be sensitive to disruption for a longer period. Under rather special conditions, negative congruency effects can be observed, where subjects respond faster to incongruent prime stimuli compared to congruent ones. This happens when the prime is already consuming fitting parts of the mask and new priming is elicited from the mask's remainders, which are again similar to the target [25]. Non-overlapping masks, which terminate at the same time as the target, can impede target perception even if their presentation started only after the target; this can be explained by a more general reset, when the inputs for many slots become concurrently unavailable.
19.4.5 Cognitive Growth With a positive signal that everything fits nicely, the associated positive emotions mark a good basis concerning the expected future usefulness of the schema in question [8]. Whenever pre-existing structures cannot satisfactorily accommodate new data, this will be accompanied by another clear signal from the consumption analysis monitor. Negative emotions signal the need to change something, at the least the assessment of the input data, by adding new schema structures. In both cases, i.e. feedback indicating significantly better or worse than expected progress, effective self-regulated bootstrapping occurs. Useful schemata will be enhanced and preferably learned; they can serve as material for future refinement and as building blocks for more sophisticated concepts. Features where discrepancies surfaced are also identified. Extended or completely new schemata will be the consequence [26]. Learning thus does not take place indiscriminately but at the right spots, where it is most effective. Cognitive growth preferentially occurs exactly when and where it is needed and the expected usefulness is highest. The Ouroboros Model comprises an efficient learning strategy; in fact, inherent meta-cognition allocates resources, directs the effort and leads to better adapted structures in the future. New concepts and content can best be conquered and absorbed if they lie just at the boundary of the already established structures; only there are gaps rather well defined, and success is within easy reach when filling in those vacancies. This stresses the importance of self-paced exploration and of fun associated with the activity. In school, Montessori education appears to emphasize these points; it is reported to yield impressive results [27]. Effective teaching thus seems to build on the same processes as outlined above for self-paced knowledge acquisition. Natural and artificial minds can best be taught when pre-existing knowledge structures and novel input are carefully matched. A provoked aha-experience in all cases marks some content as useful, and it is an efficient trigger for durable long-term storage. Victory over a competitor of about equal strength would provide strong additional motivation [16].
19.4.6 Sleep All known agents who are able to exhibit some substantial measure of intelligence (and consciousness) spend a sizeable fraction of their life in strange states of sleep. It seems appropriate, therefore, to demand that any attempt at a comprehensive account of mental functions include a profound explanation of this fact. Given stringent time constraints for an agent in the real world concerned with survival, consumption analysis inevitably produces “leftovers”, i.e. unallocated features and unconfirmed concepts, which accumulate with time. The Ouroboros Model explains sleep as a specific and multifaceted housekeeping function for maintaining appropriate signal-to-noise conditions in the brain by actively handling relics, “data garbage” [28]. There are different possibilities for effective “cleaning up”. One, of course, is the active erasure of whatever content appears to be of no further use. As in everyday life, a second way would be to assign previously unassociated material to conceptual structures where it seems to fit best, i.e. putting order into everything and trying to keep what might be of value later. Both of these processes seem not to be possible during actual real-time performance but easily feasible while a system is off-line. Sleep is a multifaceted phenomenon, comprised of significantly different phases following each other in turn. Matching these phases to their hypothesized specific functions in any detail goes beyond the scope of this chapter. Nevertheless, some general hypotheses can be formulated, and many at first sight rather diverse observed characteristics and proposed functions of sleeping and dreaming could be seen as consequences of efficient data processing as described by the Ouroboros Model and of the resulting, basically unavoidable, appropriate clean-up:
• extra and especially novel activities necessitate an increased need for sleep, as the involved schemata overflow more, in particular when they are still developing;
• for the same reason babies and children quite generally need more sleep than adults;
• disturbing, threatening and unresolved issues would predominantly surface as dream content, as their processing could not yet be concluded;
• episodes which stir emotions, because expectations—e.g. norms—are violated, will preferably provide material for dreams;
• erasing erroneous associations and traces of expectations or of perceptions unaccounted for enhances correct and well established connections; the ensuing increase in signal/noise would make memory entries, checked for consistency by the consumption analysis, stand out and thus be easier to retrieve and reactivate;
• at the same time, the association of further material, which probably was not so tightly connected to any memory entry before, to already established conceptualizations could provide additional support to them, also increasing the signal/noise ratio.
It remains to be worked out in detail how the general observations listed above could fit with actual sleeping behavior and specific sleep phases, as well as with the variations in neuromodulator levels reported in the literature.
19.5 From Self-Awareness to Consciousness We are embodied agents. Our body anchors and grounds each of us firmly in the real world; not only statically but, most importantly, with every dynamic action. Self-awareness and consciousness come in shades of grey; they start with physiology, somatic proprioception and a sense of ownership for an action. Drawing on multiple and multimodal sources, a sensory event is attributed to one’s own agency when predicted and assessed states are congruent [29]. First intentions do not need much second-order
thought: a goal, as for a simple reaching movement, is aimed at, and it is reached or corrections are necessary. There is convincing experimental evidence that awareness of an action stems from a dynamic fusion of predictive and inferential processes [30]. It has been suggested that premotor processes lay down predictive signals in anticipation of effects, which are integrated into a coherent conscious experience encompassing actions and effects as soon as feedback sensory input becomes available [31]. Dissociations between focused attention and consciousness have been found under special conditions [32]. It fits perfectly with the tenets of the Ouroboros Model that attentional enhancement for the processing of unconscious (simple) visual features is possible, whereas spatial attention can target only consciously perceived stimuli, which appears to require activation of much more widely distributed cortical areas [33]. Taking a second look at the possible outcomes of the consumption analysis process, at the coarse level intended here, three conditions can be distinguished in the light of the Ouroboros Model [8]:
• Perfect fit between expectation and data;
• Deviations in the expected range;
• No acceptable match.
If the first (unconscious) guess as to what frame a certain feature might belong to was right, the recursive process will quickly converge and result in one strongly activated concept. Consumption analysis then yields that all features are nicely consumed and nothing is left over; all data are consistent and coherently linked in the selected interpretation. Aha-experiences relevant to the actor catch attention and become conscious by provoking higher order personality activation, “HOPA”, thus connecting the context and details of the task and also of the actor.1 The whole process of embracing and incorporating contributions from distant parts in a cortex, advancing over some time, might just be what is seen as needed to make the concurrent general activation reach a threshold for consciousness [34]. As they proved useful, these episodes are preferably committed to long-term memory [8]. If, in a specific situation, topical “local” information does not suffice to obtain a unique and coherent interpretation, activation spreads and more remote content is considered. Data from the episodic memory will always contain representations pertaining to the body and thus some anchoring to the actor. She thus becomes more involved as the need for a solution becomes more pressing and activation soars. With a high enough weight, significant gaps in the understanding of a situation will thus trigger HOPA, too [35]. In between the two extremes sketched above lies much thoughtless action, e.g. driving home the usual way without any special event. This could be called a Zombie-mode of operation. The memory entries for episodes contribute, in turn, to the complete life story of the actor, her self-concept, the narrative self. References to the body in particular provide privileged semantic content and anchoring; they ensure continuity and the uniqueness of a record [21, 35, 36, 39]. Especially in cases when a direct and unreflected impulse for action, provoked by some distinct trigger and fuelled by a strong associated emotion, is moderated and subjected to a second consideration by the actor, in which a wider context is taken into account, awareness and conscious elaboration can improve the performance of an agent [40, 41].
In sum, it can be claimed that a special quality emerges as soon as information about the owner of the processes is self-reflexively included and associated representations, i.e. autobiographic memory and self-models, are embraced in the focus of the same basic processing. This has been conjectured by many authors [42, 43]. When the actor herself is in the center of her attention, other content fades, losing importance and weight. This starts when looking at one's toes and continues to recollections and reflections on personal experiences, goals, plans, preferences and attitudes. Deeply anchored in our whole body, this self-reflective and self-relevant recursively looping activity in the brain draws on, and again induces, a sense of ownership for (mental) actions as well as “qualia” and all associated emotions: our total personal experience. Everybody who has ever burnt his finger will not primarily be bothered by doubts concerning the true essence or general validity of temperature sensations; he will certainly understand what heat feels like. Notwithstanding the finding that not all shades of consciousness depend on language, with details being subject to definitions, language certainly is the most powerful tool when it comes to consciously assembling and manipulating elaborate models in a human mind—not to speak of deliberations and explanations or their communication [34]. Emphasising diverse aspects, definite properties can be ascribed to the complex and multifaceted phenomenon of the self, which is seen by many authors as providing the unifying thread from episodic memory to consciousness and free will [44].

1 “Higher Order Personality Activation”, which includes representations and signals of the body of the actor, is claimed to lie at the basis of consciousness [35]. If possible, even more than in other global theories of consciousness [36–38], the all-embracing involvement of the actor herself is considered most important, which is accomplished by wide broadcasting and requesting of input; this becomes especially evident in the cases where the purpose of the global excitement is to bring to bear all possibly useful information in a difficult situation.
19.5.1 Pleasure in Play and Art

Perception, just like memory retrieval or movement, is an active process in living beings. It is postulated that almost any specific activation in their brain can be integrated with others, forming part of a schema. One feature which is preferentially included in relevant schemata relates to the effort usually required and the time needed to perform an action, in particular a perception. Process details are monitored and recorded into the associated schema. As for other attribute features like size or color, according to the Ouroboros Model the consumption of these meta-features is also regularly monitored. Irrespective of their content, slots in schemata can be filled with few problems, with the expected amount of difficulty, or with more problems than anticipated. This self-monitoring of (perceptual) fluency is taken as forming the basis for pleasure in play and art [45, 46].
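As one purely illustrative reading of this self-monitoring idea, the short sketch below treats expected effort and expected time as meta-feature slots of a schema and turns the difference between anticipated and actual values into a crude fluency score. The data structure, the formula and every name in it are assumptions made here for the example; none of them comes from the chapter's text.

```python
from dataclasses import dataclass

@dataclass
class SchemaSlot:
    """One slot of a schema, with a content label and two meta-features
    describing how much effort and time filling it is expected to take.
    All fields and numbers are illustrative assumptions, not model text."""
    name: str
    expected_effort: float   # normalised, 0 = trivial, 1 = very hard
    expected_time: float     # seconds anticipated for filling the slot

def fluency_signal(slot: SchemaSlot, actual_effort: float, actual_time: float) -> float:
    """Positive when filling the slot went more smoothly than anticipated,
    negative when it was harder; a crude stand-in for perceptual fluency."""
    return ((slot.expected_effort - actual_effort) +
            (slot.expected_time - actual_time) / max(slot.expected_time, 1e-6)) / 2.0

# A perception that completes faster and more easily than expected would,
# on this toy reading, yield a small positive "pleasure" signal.
slot = SchemaSlot("recognise melody", expected_effort=0.6, expected_time=2.0)
print(round(fluency_signal(slot, actual_effort=0.3, actual_time=1.0), 2))  # 0.4
```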
19.6 Discussion

The Ouroboros Model offers a novel, self-consistent and self-contained account of efficient self-referential and self-reflective data processing in human beings and artificial autonomous agents. The purpose of this short chapter is to sketch an overview of what the large puzzle could look like; working out the intricate details should follow, guided by the expectations, the gist, briefly indicated here. Self-consistently, it is claimed to be a strong argument in favor of the Ouroboros Model that one simple, general conception based on biologically plausible elements and processes can shed light on very diverse questions, ranging from sensory perception to an efficient layout for animats exhibiting artificial general intelligence [1].
Of the many objections that can be anticipated against the Ouroboros Model, only one shall be addressed briefly here. It is most important to point out that no problem arises from seemingly circular arguments: the principal recursive algorithm progresses and evolves in time. When the snake bites its tail, the teeth and the tip of the tail belong to two distinctly different points in time. Starting with any basic set of expectations, discrepancies relating to the actual input data can be determined and used for choosing and steering the following actions, including the establishment of revised expectations. This full cycle cannot in fact be depicted adequately by a circle; it is rendered more faithfully by a spiral. The intrinsically built-in potential for growth out of whatever plane of current understanding to novel heights, opening up new dimensions in a productive way, is probably one of the key and most valuable features of the Ouroboros Model.
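The time-indexed, spiral-like character of the loop can be made explicit in code. The following Python fragment is only a schematic rendering under the strong simplification that expectations and inputs are single numbers; the update rule, the threshold and all names (ouroboros_spiral, revise) are invented here for illustration and do not stand for the author's implementation.

```python
def revise(expectation, discrepancy, learning_rate=0.5):
    """Move the expectation toward what was actually observed."""
    return expectation + learning_rate * discrepancy

def ouroboros_spiral(initial_expectation, inputs):
    """Schematic rendering of the recursive core loop: the expectation that
    closes cycle t meets the input of cycle t + 1, so the 'teeth' and the
    'tip of the tail' always belong to different points in time."""
    expectation = initial_expectation
    trace = []
    for t, observed in enumerate(inputs):
        discrepancy = observed - expectation      # consumption analysis, reduced to one number
        action = "confirm" if abs(discrepancy) < 0.1 else "correct"  # steer the following action
        expectation = revise(expectation, discrepancy)               # revised expectation for t + 1
        trace.append((t, round(expectation, 3), action))
    return trace

# Each pass starts from newer expectations than the previous one: a spiral, not a circle.
print(ouroboros_spiral(0.0, [1.0, 1.0, 1.0, 1.0]))
```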
References

1. Thomsen, K.: The ouroboros model in the light of venerable criteria. Neurocomputing 74, 121–128 (2010)
2. Antonov, I., Antonova, I., Kandel, E.R., Hawkins, R.D.: Activity-dependent presynaptic facilitation and Hebbian LTP are both required and interact during classical conditioning in Aplysia. Neuron 37(1), 135–147 (2003)
3. Anderson, J.R.: The Architecture of Cognition. Harvard University Press, Cambridge (1983)
4. Schacter, D.L., Addis, D.R.: The cognitive neuroscience of constructive memory: remembering the past and imagining the future. Philos. Trans. R. Soc. Lond. B, Biol. Sci. 362, 773–786 (2007)
5. Yuille, A., Kersten, D.: Vision as Bayesian inference: analysis by synthesis? Trends Cogn. Sci. 10(7), 301–308 (2006)
6. Noton, D., Stark, L.: Eye movements and visual perception. Sci. Am. 224(6), 34–43 (1971)
7. Einhäuser, W., Rutishauser, U., Koch, C.: Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. J. Vis. 8(2), 1–19 (2008)
8. Thomsen, K.: Concept formation in the Ouroboros Model. In: Proceedings of the Third Conference on Artificial General Intelligence, Lugano, Switzerland, 5–8 March (2010)
9. James, W.: What is an emotion? Mind 9, 188–205 (1884)
10. Lange, C.: Ueber Gemüthsbewegungen. Verlag von Theodor Thomas, Leipzig (1887)
11. Scherer, K.R.: Appraisal theory. In: Dalgleish, T., Power, M. (eds.) Handbook of Cognition and Emotion, pp. 637–663. Wiley, Chichester (1999)
12. Ekman, P.: Basic emotions. In: Dalgleish, T., Power, M. (eds.) Handbook of Cognition and Emotion, pp. 45–60. Wiley, Chichester (1999)
13. Sander, D., Grandjean, D., Scherer, K.R.: A systems approach to appraisal mechanisms in emotion. Neural Netw. 18, 317–352 (2005)
14. Darwin, C.: The Expression of the Emotions in Man and Animals. John Murray, London (1872)
15. Tenenbaum, J.B., Griffiths, T.L., Kemp, C.: Theory-based Bayesian models of inductive learning and reasoning. Trends Cogn. Sci. 10(7), 309–318 (2006)
16. Ross, P.E.: The expert mind. Sci. Am. 295(2), 46–53 (2006)
17. Thagard, P.: Coherence in Thought and Action. Bradford Books/MIT Press, Cambridge (2000)
18. Thagard, P., Verbeurgt, K.: Coherence as constraint satisfaction. Cogn. Sci. 22, 1–24 (1998)
19. Olivers, N.L., Nieuwenhuis, S.: The beneficial effect of concurrent task-irrelevant mental activity on temporal attention. Psychol. Sci. 16, 265–269 (2005)
20. Kawahara, J., Enns, J.T., Di Lollo, V.: The attentional blink is not a unitary phenomenon. Psychol. Res. 70, 405–413 (2006)
21. Pöppel, E.: A hierarchical model of temporal perception. Trends Cogn. Sci. 1(2), 56–61 (1997)
22. VanRullen, R., Reddy, L., Koch, C.: The continuous wagon wheel illusion is associated with changes in electroencephalogram power at ∼13 Hz. J. Neurosci. 26, 502–507 (2006)
23. Busch, N.A., Dubois, J., VanRullen, R.: The phase of ongoing EEG oscillations predicts visual perception. J. Neurosci. 29, 7869–7876 (2009)
24. Young, C.K., McNaughton, N.: Cereb. Cortex 19, 24–40 (2009)
25. Kiesel, A., Berner, M.P., Kunde, W.: Negative congruency effects: a test of the inhibition account. Conscious. Cogn. 17, 1–21 (2006)
26. Piaget, J., Inhelder, B.: Die Psychologie des Kindes. dtv/Klett-Cotta, Stuttgart (1986)
27. Lillard, A., Else-Quest, N.: Science 313, 1893–1894 (2006)
28. Thomsen, K.: The ouroboros model needs sleep. In: TSC 2007, Budapest, 23–27 July (2007)
29. Synofzik, M., Vosgerau, G., Newen, A.: Beyond the comparator model: a multifactorial two-step account of agency. Conscious. Cogn. 17, 219–239 (2007)
30. Moore, J., Haggard, P.: Awareness of action: inference and prediction. Conscious. Cogn. 17, 136–144 (2007)
31. Haggard, P., Cole, J.: Intention, attention and the temporal experience of action. Conscious. Cogn. 16, 211–220 (2007)
32. Koch, C., Tsuchiya, N.: Attention and consciousness: two distinct processes. Trends Cogn. Sci. 11(1), 16–22 (2006)
33. Kanai, R., Tsuchiya, N., Verstraten, F.A.J.: The scope and limits of top-down attention in unconscious visual processing. Curr. Biol. 16, 2332–2336 (2006)
34. Tononi, G., Koch, C.: The neural correlates of consciousness: an update. Ann. N.Y. Acad. Sci. 1124, 239–261 (2008)
35. Thomsen, K.: Consciousness for the ouroboros model. Int. J. Mach. Consc. (2010, in print)
36. Baars, B.J.: A Cognitive Theory of Consciousness. Cambridge University Press, Cambridge (1988)
37. Dehaene, S., Naccache, L.: Towards a cognitive neuroscience of consciousness: basic evidence and a workspace framework. Cognition 79, 1–37 (2001)
38. Van Gulick, R.: Higher-order global states—an alternative higher-order view. In: Gennaro, R. (ed.) Higher-Order Theories of Consciousness. Benjamins, Amsterdam (2004)
39. Pauen, M.: Was ist der Mensch? Die Entdeckung der Natur des Geistes. Deutsche Verlagsanstalt, München (2007)
40. Lambie, J.A.: On the irrationality of emotion and the rationality of awareness. Conscious. Cogn. 17, 946–971 (2007)
41. DeWall, C.N., Baumeister, R.F., Masicampo, E.J.: Evidence that logical reasoning depends on conscious processing. Conscious. Cogn. 17, 628–645 (2008)
42. Gallagher, S.: Philosophical conceptions of the self: implications for cognitive science. Trends Cogn. Sci. 4, 14–21 (2000)
43. Chella, A., Cossentino, M., Seidita, V.: Self-Conscious Robotic Design Process—from Analysis to Implementation. Madrid, Spain, 14–16 July (2010)
44. Samsonovich, A.V., Nadel, L.: Fundamental principles and mechanisms of the conscious self. Cortex 41(5), 669–689 (2005)
45. Thomsen, K.: Beauty and art arise in the brains of beholders. http://cogprints.org/857/ (2000)
46. Reber, R., Schwarz, N., Winkielman, P.: Personal. Soc. Psychol. Rev. 8(4), 364–382 (2004)
Index
A
Action, 181–183, 185–188, 190
Activity bump, 44, 45, 47–50, 52, 55
Ad-hoc, 3
Adjacency matrix, 36
Aggregate, 112, 113
AI, 182, 185, 190
Alpha rhythm, 57–60, 71
Alzheimer's disease, 4
Analysis
  decompositional, 112–115, 117–119
  functional, 113
  structural, 113
Anticipation, 210, 214, 215, 217
Artificial qualia, 223–229, 231–235
Associative map, 41–43, 50, 54, 55
Attention, 242, 244
Attentional blink, 244
Auditory, 7–9, 16
Automata theory, 141
Autonomous machine, 181

B
Background activity, 20, 23, 25, 27, 29, 30
Basis function, 116, 117
Bayesian interpretation, 136
BCM, 55
BCM feedback modulation, 48
BCM learning rule, 43, 46, 51
Behavior, 181–186, 190
Biased competition, 75–78, 84, 165, 166, 168, 169, 176
Bio-inspired system, 1
Bioinspired cognitive architecture, 2
Biologically inspired cognitive system, 4
Blue Brain Project, 115
Brain, 139–141
Brain knowledge, 2
Brain-inspired robotic, 2

C
Capacities, 113
Causality, 117
Causation, 184, 187
CERA-CRANIUM, 5, 223, 224, 230–233, 235
Characteristic path length, 33, 34, 37
Cicerobot, 211
Clustering coefficient, 33, 34, 36, 37
CNFT, 124
Cognitive architecture, 5, 223, 224, 230, 233
Cognitive growth, 245
Colimit, 157, 158
Column, 116, 117
Compatibilism, 185, 186, 189, 190
Competition, 4, 123
Competitive layer, 45, 47
Competitive queuing memory, 193, 202
Complex network, 33–35, 37, 38
Complexity, 115, 118, 119, 129
Computational tractability, 4
Concept formation, 245, 247
Conductance, 34, 35
Connection delays, 37, 38
Connection matrix, 36
Consciousness, 139–141, 146, 147, 181, 182, 185, 190, 246
Consumption analysis, 241
Continuum Neural Field, 124
Controller, 2
Coproduct, 157, 158
Cortex, 7–10, 16
  basic uniformity, 116
  columnar organization, 112, 115
  uniform, 116
Cortical column, 42, 44, 46, 49, 50, 54
Cortical layer, 49, 50
Corticofugal, 7, 9
Corticothalamic, 9, 10, 14
Coupled oscillators, 75–77
Coupling of BCM learning rule and neural maps, 41
Cross-modal, 89, 98

D
Debugging cognitive robots, 146
Decomposability, 112, 115, 117–119
Degrees of freedom, 101, 102
Delay, 8, 10, 16
Design and build strategy, 2
Design knowledge, 2
Design principle, 3
Design process, 1, 209–216, 219
Determinism, 181–186, 188, 189
Development, 7, 9, 16, 209, 210, 212–214, 219
Distributed, 4
DNF, 45, 46, 51, 52, 55
DNF activity bump, 49
DNF modulation, 53
Dominant frequency, 58, 61, 63, 65, 66, 68–70
Dynamic neural field (DNF), 44, 124
Dynamic neural field (DNF) theory, 43
Dynamical systems, 101, 102, 108

E
Early detection, 57
EEG, 141, 146
Emergence, 78
Emergent property, 4
Emotion, 242
Entailment, 117
Episodic memory, 161
Everse engineering, 116
Evolution, 193, 200, 201, 204, 205
Excitatory neurons, 35, 36

F
Failure, 2
Feature sensitivity, 3
Feedback, 165, 166, 168–178
Feedback modulation, 48
Free agent, 182–184, 186, 187
Free choice, 181, 183–185, 188, 189
Free will, 5, 181–190
Freedom, 181–190
Functional architecture, 104–107
Functional connectivity, 33
Functional integration, 33
Functional segregation, 33
Fundamental assumption, 113, 115, 117

G
Gaussian mixture, 124
Generalization, 42, 43, 45, 51
Generic, 42, 45, 48–50, 55
Generic structure, 41
Genericity, 42, 54
Gestalt principles, 76
Global workspace, 247
Global workspace theory, 5, 223, 224, 230, 231
Goals, 182, 184, 187–189
Graph theory, 33
Grid cell, 149, 151, 152

H
High dimensional input, 129
Higher Order Personality Activation (HOPA), 247
Hippocampal category, 5
I
Illusory contour, 166, 171, 175–177
Illusory contour completion, 166, 177
Image segmentation, 4, 81, 84, 85
Impredicative, 118
Indeterminism, 183, 184, 188
Inference, 117, 118
Information integration, 5, 33, 38, 139–144, 146, 147
Inhibitory neurons, 35
Input sparsification, 132
Integrated information, 187
Internal simulation, 4, 88, 89, 98

J
James, 184

L
Laplace, 184
Lateral support, 78, 80, 81
Layer, 8, 10, 12
Learning, 7–12, 14
Libertarianism, 185, 186
Liveliness, 140–147
Local, continuous, decentralized and unsupervised learning, 42, 43, 45
Localization, 114
Loops, 118
LTD/LTP threshold, 46
LTP/LTD threshold, 47

M
Machine consciousness, 181, 190, 223, 226
Machine free will, 181, 186, 190
Masking, 244
Mathematisation, 6
Mean EEG, 58, 68, 70, 71
Memory, 150
Merging algorithm, 128
Merging layer, 50
Metamodel, 210, 211, 215, 219
Metric space, 154
Microcircuit, 116
Mission-flexible robot, 1
Mission-level robustness, 2
Modalities, 41
Modality map, 42, 43, 45, 48, 49, 51, 53–55
Modality maps, 41, 42, 50
Model, 7–11, 16
Model construction, 4
Modeling relation, 117, 118
Modulation, 43
Morphism, 154
Multimodal context, 42, 43, 45, 47, 49–51, 54, 55
Multimodal level, 51
Multimodal perception, 4, 41
Multimodal representations, 136
N
Network topology, 34
Neural maps, 42
Neural mass model, 71
Neural networks, 139, 141
Neurophenomenology, 146
Noise, 20, 21, 26–30
Non-decomposable, 112, 117, 118

O
Object perception, 5
Olfactory system, 193, 194, 196, 205
Open-ended, dynamic environment, 1
Optimal parameters, 130
Origination, 185, 186
Ouroboros, 5, 240

P
PASSI, 209–212, 215, 219
PASSIC, 5
Pattern, 20–30
Pattern onset, 3
Pattern recognition, 5
Pattern completion, 241
Pattern matching, 241
Perception loop, 209–211, 213, 215, 219
Phase flow, 102–107
Place cells, 149
Plasticity, 3, 7–9, 11, 13, 16, 19–21
Poisson, 20, 21
Predicative, 118
Predictability, 183, 184
Predictive coding, 165, 166, 168, 169, 174, 176–178
Priming, 244
Process fragment, 213, 215
Pullback, 161–163

Q
Qualia, 248

R
Random network, 34, 37
Receptive field, 126
Recurrent, 7–9, 16
Recurrent neural network, 89
Recursive SOM, 89
Reference network, 34, 36, 37
Regular network, 34
Reverse engineering, 4, 111, 113, 114, 117–119
Reverse-engineer and copy, 2
Reverse-engineering of brains, 6
Rewiring, 34, 36
Rigorous method, 6

S
Schema, 240, 242
Schemata, 240, 241
Scientific analysis method, 117, 119
Selectivity, 46–48, 52, 54, 55
Self determination, 181, 184, 189
Self-awareness, 246
Self-conscious, 5
Self-organization, 4, 41, 42, 45–47, 49, 51, 53–55
Self-organize, 45
Self-organizing map, 88–90
Sensory layer, 45, 52
Simulation hypothesis, 87, 88
Situational method engineering, 212, 213, 215
Sleep, 193, 194, 198, 199, 201, 204, 205, 246
Slowing, 57, 58, 65, 68–71
Small-world index, 34, 37, 38
Small-world network, 34
Small-world structure, 4
Sparse convolution, 127
Sparse matrices, 124
Spatially located firing, 152
SpikeStream, 140, 143, 144
Spinoza, 182, 183
Standard analysis method, 117
STDP, 7, 8, 10, 11, 16, 19–25, 27, 29, 30, 34–38
Structured flows on manifolds, 101–104, 106, 107
Superposition, 114, 116
Support vector machine, 5, 193–205
Synapse, 35, 36, 38
SyNAPSE Project, 115
Synaptic connectivity, 59, 70, 71
Synaptic plasticity, 4, 34
Synaptic weight, 34–36
Synthetic phenomenology, 146
Systems
  complex, 112, 118, 119
  integrated, 115, 117–119
  simple, 118

T
Temporal atomism, 182, 185, 186
Thalamocortical, 7, 9, 10, 12, 15, 16
Thalamocortical circuitry, 58, 63, 71
Thalamus, 7, 8, 10
Theories of consciousness, 5
Theory of categories, 5
Time, 243, 248
Tononi, 139–141, 143, 144
Tracking, 130

U
Unification, 6
Unified theory of cognition, 6
Unpredictability, 183

V
Visual qualia, 5, 223, 234